STORM-399 update KafkaConfig.maxOffsetBehind default to be Long.MAX_VALUE#183
Merged
asfgit merged 1 commit intoJul 24, 2014
Conversation
Member
|
+1 |
|
👍 |
|
+1 - though should behaviour be configurable to either skip to latest offset, or log warn that topology has fallen x behind spout. Or should we monitor offsets of broker and topology independently for long-running topologies that may consume slower than broker receives messages but cannot skip ahead? |
|
+1 on this change. |
knusbaum
pushed a commit
to knusbaum/incubator-storm
that referenced
this pull request
Feb 11, 2015
Update carbonite/kryo version [S105108]
This file contains hidden or bidirectional Unicode text that may be interpreted or compiled differently than what appears below. To review, open the file in an editor that reveals hidden Unicode characters.
Learn more about bidirectional Unicode characters
Sign up for free
to join this conversation on GitHub.
Already have an account?
Sign in to comment
Add this suggestion to a batch that can be applied as a single commit.This suggestion is invalid because no changes were made to the code.Suggestions cannot be applied while the pull request is closed.Suggestions cannot be applied while viewing a subset of changes.Only one suggestion per line can be applied in a batch.Add this suggestion to a batch that can be applied as a single commit.Applying suggestions on deleted lines is not supported.You must change the existing code in this line in order to create a valid suggestion.Outdated suggestions cannot be applied.This suggestion has been applied or marked resolved.Suggestions cannot be applied from pending reviews.Suggestions cannot be applied on multi-line comments.Suggestions cannot be applied while the pull request is queued to merge.Suggestion cannot be applied right now. Please check back later.
I've recently upgraded to storm and storm-kafka
0.9.2-incubating, replacing the https://github.com/wurstmeister/storm-kafka-0.8-plus spout I was using previously.I have a large kafka log that I needed processed. I started my topology with
I then needed to make some tweaks in my application code and restarted the topology with spoutConfig.forceFromStart = false. Expecting to pick up where I left off in my kafka log. Instead the kafka spout started from the latest offset. Upon investigation I found this log message in my storm worker logs
Digging in the storm-kafka spout I found this line
https://github.com/apache/incubator-storm/blob/v0.9.2-incubating/external/storm-kafka/src/jvm/storm/kafka/PartitionManager.java#L95
To fix this problem I ended up setting my spout config like so
Now finally to my question.
Why would the kafka spout skip to the latest offset if the current offset is more then 100000 behind by default?
This seems like a bad default value, the spout literally skipped over months of data without any warning.
This pull request sets the default value to
Long.MAX_VALUE