New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
[BEAM-1237] Create AmqpIO #1725
Conversation
Refer to this link for build results (access rights to CI server needed): |
Is this ready for review? Perhaps assign a reviewer? |
I'm updating/polishing this one. It should be ready for review just after. |
Resuming/finalizing my work on this one. |
f446eed
to
eb3afd6
Compare
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Thanks, here's a first round.
* AmqpIO supports AMQP 1.0 protocol using the Apache QPid Proton-J library. | ||
* | ||
* <p>If you want to use AMQP 0.9 protocol, you might also be interested in the Apache Beam | ||
* RabbitMqIO. |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Which does not exist, right? So better not mention it here
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
It's in another pull request. Ok to remove this note.
for (String address : addresses()) { | ||
stringBuilder.append(address).append(" "); | ||
} | ||
builder.add(DisplayData.item("addresses", stringBuilder.toString())); |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Joiner.on(" ").join(addresses())
Tracker tracker = messenger.incomingTracker(); | ||
checkpointMark.trackers.add(tracker); | ||
currentTimestamp = new Instant(message.getCreationTime()); | ||
if (currentTimestamp.isBefore(checkpointMark.watermark)) { |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Hm, why? This will set the watermark to the lowest message timestamp ever observed, right?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Yes, the watermark is the lowest message timestamp in the checkpoint (waiting finalize). Any other suggestion ?
* A {@link PTransform} to read/receive messages using AMQP 1.0 protocol. | ||
*/ | ||
@AutoValue | ||
public abstract static class Read extends PTransform<PBegin, PCollection<String>> { |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Can this return a PCollection<Message>
, just like Pubsub and Kafka do? Likewise for write.
@Override | ||
public boolean advance() { | ||
messenger.recv(); | ||
if (messenger.incoming() > 0) { |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Invert this to reduce nesting
for (String address : addresses()) { | ||
stringBuilder.append(address).append(" "); | ||
} | ||
builder.add(DisplayData.item("addresses", stringBuilder.toString())); |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Same
Message message = Message.Factory.create(); | ||
message.setAddress(address); | ||
if (spec.subject() != null) { | ||
message.setSubject(spec.subject()); |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Seems a bit weird that this is the only field we allow to set except body. Would be less weird if the IO took Message's rather than String :)
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
+1
public void testRead() throws Exception { | ||
PCollection<String> output = pipeline.apply(AmqpIO.read() | ||
.withMaxNumRecords(10) | ||
.withAddresses(Collections.singletonList("amqp://~localhost:" + port))); |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Does this work under ipv6 too? I think I've seen trouble before with explicitly using localhost...
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
It should work but I will double check anyway.
}; | ||
sender.start(); | ||
pipeline.run(); | ||
sender.join(); |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Should be in try/finally w.r.t. pipeline.run()
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Still relevant.
receiver.start(); | ||
|
||
List<String> data = new ArrayList<>(); | ||
for (int i = 0; i < 1000; i++) { |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Use fewer messages and make the test faster and less log-spammy?
cb1cf7e
to
694ff7e
Compare
@jkff I implemented what you suggested: dealing directly with |
I'm working on:
|
694ff7e
to
e58dda8
Compare
ffc6517
to
dd52740
Compare
@jkff I updated the PR with a |
@jkff I think the PR is ready for a new review round. |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Thanks. For the coder problem - is it possible that Message.encode/decode is non-deterministic? Are you familiar with the encoding format?
Tracker tracker = messenger.incomingTracker(); | ||
checkpointMark.trackers.add(tracker); | ||
currentTimestamp = new Instant(message.getCreationTime()); | ||
if (currentTimestamp.isBefore(checkpointMark.watermark)) { |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
This doesn't seem like a very useful definition of watermark - "min(all seen messages)" will only go backwards; watermarks should not go backwards.
} | ||
|
||
/** | ||
* Define the AMQP messages subject. |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Why do you need this? The user can already set the subject on the Message in the PCollection.
public void processElement(ProcessContext processContext) throws Exception { | ||
AmqpMessage message = processContext.element(); | ||
for (String address : spec.addresses()) { | ||
message.getMessage().setAddress(address); |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Mutating the current element is forbidden, you need to make a copy
import org.apache.qpid.proton.message.Message; | ||
|
||
/** | ||
* Extend AMQ ProtonJ Message to override the equals() method. |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
This is unnecessary per discussion on beam-dev - please remove this class and let's figure out why coder-based equality is not helping.
return new AmqpMessageCoder(); | ||
} | ||
|
||
// private static final int[] MESSAGE_SIZES = [1 << 14 /* 16 KiB */,1 << 20 /* 1 MiB */, 1 << 26 |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Seems unfinished.
@Override | ||
public List<CoderProvider> getCoderProviders() { | ||
return ImmutableList.of( | ||
CoderProviders.forCoder(TypeDescriptor.of(Message.class), |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
This provides a coder for Message, but the API is now in terms of AmqpMessage. I suppose this comment can be resolved once you remove AmqpMessage.
dd52740
to
6cc24e5
Compare
@jkff I updated according to your comments. Especially, I changed:
|
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Thanks!
@Override | ||
public void encode(Message value, OutputStream outStream) throws CoderException, IOException { | ||
byte[] data = new byte[4096]; | ||
int bytesWritten = value.encode(data, 0, data.length); |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
What will this do if the message is bigger than 4096 bytes?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
It will fail. I will add a test and a set of messages size that I will test.
|
||
private static final Logger LOG = LoggerFactory.getLogger(AmqpIOTest.class); | ||
|
||
private static int port; |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Shouldn't be static, I think
/** | ||
* Tests on {@link AmqpIO}. | ||
*/ | ||
public class AmqpIOTest { |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
@RunWith(JUnit4.class)
public void testRead() throws Exception { | ||
PCollection<Message> output = pipeline.apply(AmqpIO.read() | ||
.withMaxNumRecords(100) | ||
.withAddresses(Collections.singletonList("amqp://~localhost:" + port))); |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Where does this test start an AMQP broker?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
AMQP broker can be optional: as the Reader
create a Messenger
, the Messenger
is able to directly receive messages (point to point).
3dba8ee
to
5b250f4
Compare
@jkff I updated the |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Thanks! OK to self-merge after addressing remaining minor comments.
// they bind the listener | ||
List<UnboundedAmqpSource> sources = new ArrayList<>(); | ||
if (desiredNumSplits > 0) { | ||
for (int i = 0; i < desiredNumSplits; i++) { |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
How about for (int i = 0; i < Math.max(1, desiredNumSplits); ++i)
|
||
private static final int[] MESSAGE_SIZES = new int[]{ | ||
8 * 1024, // 8 KiB | ||
64 * 1024, // 62 KiB |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Why not "64KB" ? :) I think actually these comments are unnecessary anyway.
encode(value, outStream, maxMessageSize); | ||
return; | ||
} catch (Exception e) { | ||
if (maxMessageSize == MESSAGE_SIZES[MESSAGE_SIZES.length - 1]) { |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
You can slightly simplify:
for (maxMessageSize) {try {encode; return;} catch {continue;}}
throw new CoderException("...");
(gets rid of the if
)
}; | ||
sender.start(); | ||
pipeline.run(); | ||
sender.join(); |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Still relevant.
LOG.info("Starting pipeline"); | ||
pipeline.run(); | ||
LOG.info("Join receiver thread"); | ||
receiver.join(); |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Also should be in try/finally
message.setSubject("test"); | ||
data.add(message); | ||
} | ||
pipeline.apply(Create.of(data).withCoder(AmqpMessageCoder.of())).apply(AmqpIO.write()); |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Is withCoder still necessary here?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Yes, it's still required on the Create()
.
message.setSubject("test"); | ||
AmqpMessageCoder coder = AmqpMessageCoder.of(); | ||
|
||
byte[] encoded = CoderUtils.encodeToByteArray(coder, message); |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Message clone = CoderUtils.clone(coder, message)
|
||
@Test | ||
public void encodeDecodeLargeMessage() throws Exception { | ||
thrown.expect(CoderException.class); |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I don't think that "cannot encode more than 64MB" is part of what the user wants from this coder. I'd prefer if you had a test that it can encode a moderately sized message, say a few MB.
…r deals the whole message (not only the body) TODO: fix the Coder
… the Write, change the watermark
…ows an exception is the message is larger than supported
5b250f4
to
7b63c94
Compare
Seems like AmqpIOTest sometimes hangs on my machine, hanging ... Would you mind rolling this back / disabling the test, investigating the failure, and rolling forward? |
Let me disable the test for now. Does it occur only on your machine (I didn't see that on Jenkins) ? |
It occurs pretty reliably on my machine with |
OK, thanks, I'm disabling the tests and investigate. |
Hmmm, I don't reproduce on my machine. Let me investigate deeper. |
Can you try |
Be sure to do all of the following to help us incorporate your contribution
quickly and easily:
[BEAM-<Jira issue #>] Description of pull request
mvn clean verify
. (Even better, enableTravis-CI on your fork and ensure the whole test matrix passes).
<Jira issue #>
in the title with the actual Jira issuenumber, if there is one.
Individual Contributor License Agreement.