New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
STORM-2648/STORM-2357: Add storm-kafka-client support for at-most-onc… #2249
Conversation
@srdo can you please assign JIRA's to you and mark them as in progress as you work on them and/or submit a pull request. Thanks. |
Yes, I forgot. |
Conceptually the changes look good to me. I have not dug into it in great detail yet, but I do like the direction of the change. I would also like to see the documentation and examples updated to reflect the new change. |
+1 |
@srdo reviewing it |
Added some tests and updated the docs. |
@hmcl Are you reviewing this, or are you satisfied with it? |
Still +1 |
@hmcl I don't mean to try to rush you, but please let me know if you're still reviewing. If not I'll probably merge in the next few days. |
@srdo apologies for the delay. I will finish today. |
Thanks :) |
…e processing and a toggle for whether messages should be emitted with a message id when not using at-least-once
…code in other tests
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Still +1.
@hmcl Are you still reviewing this? It's fine if you don't have time to look at this, but please say so. I'd like to not keep holding this up. |
This PR has been waiting for about 2 months, and once it gets +1 and no -1, it can be merged. @hmcl Please vote -1 later and rollback the merge if you have concern about the patch and would want to vote -1. |
Thanks for reviews. Opened the 1.x version here #2353. |
…e processing and a toggle for whether messages should be emitted with a message id when not using at-least-once
See https://issues.apache.org/jira/browse/STORM-2357 and https://issues.apache.org/jira/browse/STORM-2648.
I'd like to get some opinions on whether this approach is a good idea, or whether I've overlooked a better option, before finishing this up with some tests. I don't love that we'll end up with 3 different committing behaviors.
In 2357 it was noted that the spout doesn't currently support true at-most-once, because using Kafka's auto commit option leaves the possibility that the spout receives a tuple, emits it to the topology, crashes and recovers, and then receives and emits the same tuple. The linked issue suggests solving this by committing polled offsets before emitting them to the topology, which is an option added here.
2648 notes that there is currently no way to make Storm track messages when using auto commit with this spout. This prevents Storm UI from showing the complete latency for the spout, and I would assume also prevents max spout pending from having an effect. I've added a toggle to KafkaSpoutConfig to force the spout to emit messages with message ids, even when using auto commit or the at-most-once option. The spout does nothing on ack or fail when not doing at-least-once.
I'd like to keep the spout config simple for the user, so I've added a processing guarantee setting corresponding to the standard at-least-once code path, the path that uses auto commit, and the path that commits offsets before emitting any tuples.