STORM-822: Kafka Spout New Consumer API #1131
@hmcl I haven't looked at code yet but don't see anything trident related so I was just curious if you are planning on adding trident support here or if that is separate jira? |
import org.apache.kafka.clients.consumer.ConsumerRecord;
import org.apache.storm.tuple.Values;

public class KafkaRecordTupleBuilder<K,V> implements org.apache.storm.kafka.spout.KafkaTupleBuilder<K, V> {
Don't need to qualify KafkaTupleBuilder with the full package.
@knusbaum agree. When I was moving stuff around to organize the packages this happened. It has already been cleaned up and will be in the next patch.
@tgravescs trident will be appended to this patch. In the meantime it would be helpful if you could let me know of any other requirements that you may have, and if this initial commit is covering most things that you may need. I am pushing another patch soon with better exception handling, logging, testing, etc. Thanks. |
+1 on Trident support. Thanks! |
// all the tuples that are in traffic when the rebalance occurs will be added
// to black list to be disregarded when they are either acked or failed
private boolean isInBlackList(org.apache.storm.kafka.spout.MessageId msgId) {
    return blackList.contains(msgId);
Can probably just get rid of this method and inline blackList.contains()
I don't think the blacklist is going to work in general since it is possible in Storm to get multiple ack or fail messages for the same messageId. I think it would be better to empty Everything else looks pretty good besides normal cleanup. |
public void run() {
    commit = true;
}
}, 1000, kafkaSpoutConfig.getOffsetsCommitFreqMs(), TimeUnit.MILLISECONDS);
Can we replace this with a static Timer? We might have multiple instances of the spout in a single worker, and having a thread per each just to set a volatile boolean to true, feels like overkill.
@revans2 it may be possible, but it depends on what we want to do. My initial implementation used this thread to do the commits to Kafka. However, that causes a ConcurrentModificationException on the Kafka side, as the Kafka consumer is single threaded. The immediate fix was to just create this volatile variable and leave it as is. This was meant to be temporary all along, and I just pushed it like this for an early review, given the urgency of the patch.
Nevertheless, we should discuss whether we would like to consider, in the future, using the Kafka option commitAsync, which receives as a parameter an OffsetCommitCallback that could be used to manage the bookkeeping state. I suppose that even if we consider this it won't be in this patch... but I wonder if this piece of functionality would be helpful for that scenario, if we ever want to consider it.
I am referring to this comment as C1 in another comment below.
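To make the "static Timer" suggestion concrete, here is a minimal sketch, assuming one shared daemon scheduler per worker JVM and one flag per spout instance. The class and method names (SharedCommitTimer, pollCommitDue) are hypothetical and not part of the patch. The timer thread only raises a flag; the commit itself would still run on the spout thread, which avoids touching the single-threaded Kafka consumer from another thread.

```java
import java.util.concurrent.Executors;
import java.util.concurrent.ScheduledExecutorService;
import java.util.concurrent.ScheduledFuture;
import java.util.concurrent.TimeUnit;
import java.util.concurrent.atomic.AtomicBoolean;

// Hypothetical helper (not from the patch): a single daemon thread per JVM
// raises a "commit is due" flag for every spout instance that registers.
final class SharedCommitTimer {
    // Shared by all spout instances in the same worker process.
    private static final ScheduledExecutorService SCHEDULER =
        Executors.newSingleThreadScheduledExecutor(r -> {
            Thread t = new Thread(r, "kafka-spout-commit-timer");
            t.setDaemon(true); // do not keep the worker alive on shutdown
            return t;
        });

    private final AtomicBoolean commitDue = new AtomicBoolean(false);
    private final ScheduledFuture<?> task;

    SharedCommitTimer(long initialDelayMs, long periodMs) {
        // The timer thread only sets the flag; it never touches the consumer.
        task = SCHEDULER.scheduleAtFixedRate(
            () -> commitDue.set(true), initialDelayMs, periodMs, TimeUnit.MILLISECONDS);
    }

    // Called from the spout thread; returns true if a tick happened since
    // the last call and clears the flag atomically.
    boolean pollCommitDue() {
        return commitDue.getAndSet(false);
    }

    // Stops the periodic task for this spout instance only.
    void close() {
        task.cancel(false);
    }
}
```

A spout would call pollCommitDue() from nextTuple() and commit on its own thread when it returns true.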
Looking through the code I think we are keeping track of way too much state. In some places we cannot help it, but in most places we can probably rely on Storm to keep track of the state for us. First of all, we probably don't need failed, emittedTuples, or blacklist. Just put the information we need in the message ID and let Storm keep track of it for you. The message ID that you pass in never leaves the spout and is never read by Storm, except to check whether it is null, so it becomes very convenient for implementing a best-effort replay cache.
To implement blacklisting, instead of clearing state and shifting things around, just increment an epoch counter for the entire spout. Then in the logic for fail (where we do the replay) we would have something like.
And ack would be similar
What do you think? |
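The snippet from this comment did not survive in the thread, so the following is only a sketch of the epoch idea as described in the prose, not the reviewer's original code. All names here (EpochMessageId, EpochSpoutSketch, onRebalance, the retry queue) are assumptions; markAsDone is borrowed from the later discussion. The point it illustrates is that a stale ack or fail is recognized by comparing the epoch stored in the message ID with the spout's current epoch, so no blacklist collection is needed.

```java
import java.util.ArrayDeque;
import java.util.Queue;

// Hypothetical message id: per-tuple state travels with the id, so the
// spout needs no separate failed/emitted/blacklist collections.
final class EpochMessageId {
    final long epoch;   // spout epoch at emit time
    final long offset;  // Kafka offset of the record
    final int numFails; // how many times this tuple has failed so far

    EpochMessageId(long epoch, long offset, int numFails) {
        this.epoch = epoch;
        this.offset = offset;
        this.numFails = numFails;
    }
}

final class EpochSpoutSketch {
    private final int maxRetries;
    private long epoch = 0;
    private final Queue<EpochMessageId> retryQueue = new ArrayDeque<>();
    private final Queue<Long> done = new ArrayDeque<>(); // offsets ready for commit bookkeeping

    EpochSpoutSketch(int maxRetries) {
        this.maxRetries = maxRetries;
    }

    // On rebalance, bump the epoch; every id emitted earlier becomes stale.
    void onRebalance() {
        epoch++;
        retryQueue.clear();
        done.clear();
    }

    EpochMessageId emit(long offset) {
        return new EpochMessageId(epoch, offset, 0);
    }

    void ack(EpochMessageId id) {
        if (id.epoch != epoch) {
            return; // emitted before the last rebalance: ignore
        }
        markAsDone(id);
    }

    void fail(EpochMessageId id) {
        if (id.epoch != epoch) {
            return; // emitted before the last rebalance: ignore
        }
        if (id.numFails < maxRetries) {
            // Replay with an incremented failure count carried in the id.
            retryQueue.add(new EpochMessageId(id.epoch, id.offset, id.numFails + 1));
        } else {
            markAsDone(id); // give up after maxRetries
        }
    }

    private void markAsDone(EpochMessageId id) {
        done.add(id.offset);
    }

    int pendingRetries() { return retryQueue.size(); }
    int doneCount() { return done.size(); }
}
```

Because duplicate acks or fails for a stale ID simply fail the epoch comparison each time, this approach also tolerates Storm delivering more than one ack or fail for the same message ID across a rebalance.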
+1 on trident spout |
@hmcl It's obvious people are also looking for Trident support, but I'm wondering if it would make sense to split that into a separate JIRA? That way we could get the regular spout in, so people waiting on that would be unblocked, and then Trident after. Maybe it depends on how far along you are or how much overlap they have. What do you think? Do you want help with any parts of this?
@tgravescs probably there is a bit of miscommunication here. I didn't complete the spout part yet because I have been waiting on getting answers to some follow up questions I made to the code review comments, such that I can try to address them completely. I will push what I have, and trident can go in separate if that fits your timeline better. |
@hmcl sorry I missed your questions in all my e-mail |
@revans2 no worries. Concerning your code snippet suggestion, I agree with most of it. We can definitely keep a lot of the state in the MessageId object. I agree it is indeed the ideal solution. I considered it initially but was concerned about how expensive it would be to keep all of that state in the MessageId. However, I don't think that using the markAsDone(id) strategy will suffice to address the cases where tuples get acked out of order, and the cases where a sequence of tuples gets acked in sequence, but shifted from the last offset committed. In this scenario we need to have a way to keep track of the offset sequences that are ready to be committed. I still think that keeping |
I agree that we need to keep track of outstanding tuples + offsets, and the code you have around acked feels OK, but we could possibly clean it up a little. I called it markAsDone instead of addAckedTuples because I didn't really like having a tuple that failed too many times be marked as acked, but thinking about it more it is not that big of a deal. |
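The out-of-order case discussed here can be sketched as follows. This is an illustration under stated assumptions, not the patch's code: the class AckedOffsets is hypothetical, and "committed offset" is taken to mean the last offset already committed, so the next offset eligible for commit is committed + 1. Acked offsets are kept sorted, and only the unbroken run that starts right after the committed offset may be committed.

```java
import java.util.TreeSet;

// Hypothetical per-partition tracker of acked-but-uncommitted offsets.
final class AckedOffsets {
    private long committedOffset; // assumption: last offset already committed
    private final TreeSet<Long> acked = new TreeSet<>();

    AckedOffsets(long committedOffset) {
        this.committedOffset = committedOffset;
    }

    // Record an offset as finished (acked, or failed too many times).
    void markAsDone(long offset) {
        acked.add(offset);
    }

    // Returns the highest offset such that every offset after the committed
    // one up to and including it has been marked done, or -1 if the offset
    // right after the committed one is still outstanding.
    long findNextCommitOffset() {
        long next = committedOffset;
        for (long offset : acked) { // ascending order
            if (offset == next + 1) {
                next = offset;      // extends the contiguous run
            } else if (offset > next + 1) {
                break;              // gap: an earlier tuple is still in flight
            }
            // offsets at or below 'next' are simply skipped
        }
        return next == committedOffset ? -1 : next;
    }

    // Call after the commit to Kafka succeeded.
    void commit(long offset) {
        committedOffset = offset;
        acked.headSet(offset, true).clear(); // drop everything now committed
    }
}
```

With a committed offset of 1000 and acks arriving as 1003, 1001, only 1001 is committable; once 1002 is acked the run extends to 1003.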
@tgravescs @revans2 I am just finalizing some testing and I will push in the patch after lunch. |
@tgravescs @revans2 I have pushed the latest changes. Please let me know of any feedback or further requirements you may have. Thanks. |
@jianbzhou thanks for your feedback. Let me take a look at this and I will get back to you shortly. |
Thanks, hmcl. We just found that the log below shows up constantly. It seems the spout keeps trying to commit an offset that has actually already been committed to Kafka. It might be caused by a group rebalance, so that a smaller offset (smaller than the committed offset) is acked back late. For example (this is our assumption, kindly correct me if wrong): one consumer commits offset 1000, polls messages 1001~1050 and emits them, and messages 1001~1009 are acked. Then a rebalance happens, another consumer polls messages from 1000 to 1025 and commits offset 1010, and then message 1010 (emitted before the rebalance) is acked back. As a result, 1010 will never be committed as per the logic in the findNextCommitOffset method, because this offset was already committed to Kafka successfully. Log is: We applied the fix below. For the OffsetEntry.add(KafkaSpoutMessageId msgId) method, we changed the code as per below, so that an acked message is only added when its offset is bigger than the committed offset. public void add(KafkaSpoutMessageId msgId) { // O(Log N) Could you please take a look at the above and let me know your thoughts? Thanks.
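The changed method body is not shown in the thread, so the guard described above can only be sketched. In this hypothetical GuardedOffsetEntry (the name and fields are assumptions, not the actual OffsetEntry), add() drops any ack whose offset is not bigger than the committed offset, so a late ack from before a rebalance never enters the set of offsets waiting to be committed.

```java
import java.util.TreeSet;

// Hypothetical stand-in for the spout's per-partition offset entry.
final class GuardedOffsetEntry {
    private final long committedOffset;            // last offset committed to Kafka
    private final TreeSet<Long> ackedOffsets = new TreeSet<>();

    GuardedOffsetEntry(long committedOffset) {
        this.committedOffset = committedOffset;
    }

    // O(log N). Returns false when the ack is stale and was ignored.
    boolean add(long offset) {
        if (offset <= committedOffset) {
            return false; // already committed, e.g. acked late after a rebalance
        }
        ackedOffsets.add(offset);
        return true;
    }

    int pendingCount() {
        return ackedOffsets.size();
    }
}
```

In the scenario from the comment, with offset 1010 already committed, the late ack for 1010 is rejected while an ack for 1011 is accepted.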
Hi hmcl, for all the issues identified above, we applied some quick and dirty fixes and testing is in progress. We will let you know the final testing result later.
@jianbzhou thanks once again for your feedback. Can you please share the results of your tests? I am working on adding more tests to this, as well as fix some of these corner cases, possibly using some of your suggested solutions, if they are the most suitable. |
@hmcl sorry for the late reply. We have made some quick and dirty fixes for the above issues. I will share the new spout with you via email so you can do a quick comparison. It now seems to be working for our project. Please help review and let us know your comments/concerns on the fix. One of our customers, who also uses the spout, found some other issues:
@jianbzhou can you please email me what you have such that we can provide with a fix? Is there a way you can share the kafka setup that causes the issues that you mention ? I should upload my test cases later today, and that should help us address any possible issues. Thanks. |
@hmcl, could you share your email address? I will send our latest spout so you can have a quick review. This version has been working in our testing environment for about a week. Our customer faced one issue: it seems that the load is not well distributed across all partitions in the 0.9 KafkaSpout, and some partitions have no commit, progress... I am still waiting for the Kafka setup from the customer and shall send it to you once I have it.
@jianbzhou please email me to |
@hmcl, sorry for the late reply. I was on leave, and I have just sent the updated spout to you; please help review. Below are the major changes:
@hmcl, currently, if the user sets firstPollOffsetStrategy=UNCOMMITTED_LATEST or LATEST, the spout will not work: if a Kafka consumer rebalance happens, the offset is seeked to the end, so many messages are never consumed, emitted, acked, or failed, and the spout never finds the next continuous offset to commit. The log then keeps showing "Non continuous offset found"... I have a question here: if a spout reads and emits one message, I assume Storm will ensure that the message is acked or failed without exception, right? If it is possible that, under some strange situation, an emitted message is neither acked nor failed, then we cannot find the continuous offset to commit, which will directly break the spout. Could you please confirm whether my assumption is correct? If it is not correct, meaning an emitted message may never be acked or failed back, then we must change the spout (it needs a timeout setting for when it fails to find the next continuous message to commit). Currently the spout will always look for the next continuous message to commit, and it will try forever... Because the spout will always look for the next continuous message to commit, we need to be cautious with the method below:
@jianbzhou looking at it. |
@jianbzhou I confirm that your suggested fix for doSeekRetriableTopicPartitions is correct. I am going to include that in the next patch. |
Following the Trident API support for the new KafkaSpout implementation... Is anyone working on this? Thanks. |
@connieyang I am finishing addressing some issues brought up by the initial users of this kafka spout, as well as unit test coverage, and will push the trident API right after. |
@jianbzhou any updates on `One customer of us who also use the spout they found some other issues:
This is a bit surprising. Can you elaborate on this? Thanks.
@hmcl, 1. The work load not being distributed well is not because of the spout; that was a Kafka cluster setup issue and is now resolved. 2. For the other two, I dug into the log (sent to you via email). It seems that every time a rebalance happens, the spout seeks to a bigger offset than the committed offset in this partition. Per my understanding, this causes some messages to never be consumed/emitted, so the log keeps showing "Non continuous offset found". Also, in our previous testing we once found that, after a worker died and a rebalance happened, one spout (not in the dead worker) had some messages that were neither acked nor failed back. That also caused "Non continuous offset found" to show many times in the log, and no message was committed to Kafka. The only solution was to restart the Storm topology. We emit messages in this way: kafkaSpoutStreams.emit(collector, tuple, msgId); Could you please confirm that Storm ensures all messages emitted by the spout will be acked/failed back without exception? If this is not the case, the spout will not be able to find the continuous offset to commit, and we must fix this issue urgently, as we plan to release the change early next month. Please advise. Thanks!
@jianbzhou Storm guarantees that all the messages are either acked or failed. There is the property "topology.message.timeout.secs" https://github.com/apache/storm/blob/master/storm-core/src/jvm/org/apache/storm/Config.java#L1669 If Storm is configured to use acks, and the acks don't arrive in a certain amount of time, the tuple will be failed so that it can be retried. You don't have to worry about the scenario you described or implement the timeout yourself.
@hmcl, today we found a NullPointerException and I applied a fix as below: in method doSeekRetriableTopicPartitions, if one partition was never committed before a message is failed back, we encounter the issue below. Could you please help review and let me know if this fix is okay? The code change is as below - from // kafkaConsumer.seekToEnd(rtp); // Seek to last committed offset to
@jianbzhou thanks. Looking at it. |
@hmcl, just FYI, we found one new issue: if the committedOffset is out of range (say, the Kafka log file was removed), then when the poll() method is called, the offset is reset according to the auto.offset.reset property. This causes the newly polled messages to have bigger offsets, so there is a gap between the committed offset and the acked offsets, and no continuous offset will be found.
@hmcl and all, the new spout fits the at-least-once semantics and works fine for us, thanks a lot! Very recently one of our key customers asked to use an at-most-once implementation. Do we have any plan for an at-most-once implementation? They set topology.acker.executors=0 and found the spout is not working. Could you please help evaluate: 1. Will we implement this? 2. Roughly how much time is needed? Thanks. Requirement from the customer: "topology.acker.executors" is a Storm parameter, which means at-least-once when it is not 0, and at-most-once when it is 0. We want to know whether we have an at-most-once implementation.
@jianbzhou you can obtain at-most-once semantics by setting maxRetries to zero. Here is the method to do so.
@hmcl and all, we have communicated via email for a while, and going forward let's talk in this thread so everyone is on the same page. We want to share the latest spout with you. Could you please help review it and merge any reasonable fixes into the community version? We want to avoid diverging too much from the community version. Below are our major fixes:
Could you please help review and let us know if you can merge it into the community version? If you have any comments/concerns, please feel free to let us know. Btw, I just sent the latest code to you via email.
@jianbzhou: can you please clarify the "you" in this statement?
@jianbzhou : wouldn't it be more appropriate to open a new STORM-XXXX JIRA and then communicate via that JIRA? As opposed to private emails with diffs, or adding comments into a merged and done PR? |
@jianbzhou thanks for your suggested fix and for the summary of changes. I think that the best way to go about incorporating your changes is to create a JIRA with a summary along the lines of "Kafka Spout Improvements/Fixes", and in the description put the contents you pasted above. We can then create a pull request with the suggested changes, and after it is reviewed, merge it into master. It may be a non-trivial merge because other users have already found and fixed some bugs in the code, so likely your suggested fix has a different base commit. Furthermore, one has to evaluate how to safely add your suggested fixes on top of other fixes, which apparently are currently working. There is also a refactoring of the spout going on, which will also add more diffs, so we have to take everything into consideration. @jianbzhou I suggest the following: Can you please create the JIRA, or if you don't have the permissions to do so, please let me know, and I can do it. Then you can either attach your patch to the JIRA, or create a pull request with a commit header that matches the JIRA summary. That will link the pull request with the JIRA. If you decide to attach the patch, or even the file with the Java source, to the JIRA, I will create the pull request. @jianbzhou would you consider adopting a community version which may potentially include only part of your fixes, and have other fixes committed by other contributors? Or would you rather run in your production environment the version that you currently have? @erikdw "you" would be "me" :).
@erikdw @hmcl , sorry for the late reply. |
@jianbzhou : you can file a STORM JIRA ticket yourself actually -- you just need to create an Apache JIRA account. |
Thanks erikdw! sorry i was out of office in the last couple of days. I just created a jira account and will create a jira ticket asap and assign to hmcl. |
@hmcl, I filed JIRA ticket STORM-2292 and attached the code to it... please let me know if you have any comments.
@jianbzhou Thanks for filing the JIRA. I have assigned it to me, such that it's easier to keep track and follow up on it. |
This patch is still under development and was uploaded at this moment for early testing. Please read README.
There may be a bug in the offsets management, because of a diff of 1. I am looking into it.
Currently polling from an arbitrary offset is possible but it will come in the next patch, today or tomorrow
I refactored the code a bit and left, maybe, some unnecessary locking. I am also looking into it.
@connieyang @jianbzhou @tgravescs please let me know of any other requirements you may have and I will address them soon.
Thanks