Offset Handling #72
Note that hard-coding an offset of 0 doesn't work well. You will see exceptions like:
Hi @jcrobak, it is commented out on purpose: these lines relate to the version of Kafka that will support committing offsets to the brokers. This feature is not available in 0.8.0, which should be released very shortly.

The error you are getting in the second comment happens because Kafka has rolled (deleted) the file that contained the offset you are trying to fetch. Did you get this error after using If you look at parameters for seek, using

If reprocessing messages is a problem for you, the best approach is to store the offsets your consumer has reached (in ZooKeeper, or your favorite data store) and use them the next time you restart the consumer. kafka-python does not provide this functionality, partly because the goal for Kafka is to eventually support this itself instead of relying on a third party such as ZooKeeper. The commented lines you referred to are meant exactly to support this, whenever it becomes available.

IMHO, any consumer using Kafka needs checks for duplicates, for robustness' sake.
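To make the duplicate-check advice concrete, here is a minimal Python sketch; it is not part of kafka-python. It assumes messages arrive as hypothetical (partition, offset, payload) tuples and that offsets only increase within a partition, so anything at or below the last offset recorded for a partition is a replay after a restart. The name process_without_duplicates is made up for illustration.

```python
from typing import Dict, Iterable, List, Tuple

# Hypothetical message shape: (partition, offset, payload)
Message = Tuple[int, int, bytes]


def process_without_duplicates(
    messages: Iterable[Message],
    last_processed: Dict[int, int],
) -> List[bytes]:
    """Return payloads not handled yet, updating last_processed in place.

    Offsets within one partition only grow, so a message whose offset is
    <= the last offset recorded for its partition has been seen before.
    """
    handled = []
    for partition, offset, payload in messages:
        if offset <= last_processed.get(partition, -1):
            continue  # replayed after a restart: skip the duplicate
        handled.append(payload)
        last_processed[partition] = offset
    return handled
```

For the check to survive a restart, last_processed has to be persisted together with the results of processing (ideally in the same write, if the store allows it).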
Thanks for the response @mrtheb. I have a few follow-up questions...
Is the feature available in Apache Kafka trunk? Is that why this code exists? If that's the case, it seems like there should be a version check, and if it's < 0.8.1, then offset should be determined via
As mentioned above, this seems like a bug. But it'd be a change in behavior to call
Isn't the storing of offsets implemented in
Definitely, in my case it's more a matter of speed/efficiency.
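The version check suggested in the first question could look roughly like the sketch below. It is hypothetical throughout: the (0, 8, 1) threshold is taken from the "Uncomment for 0.8.1" comment discussed in this issue, the two callables stand in for however the broker-committed offset and the fallback offset are actually obtained, and parse_version only handles plain dotted numeric versions such as "0.8.0".

```python
from typing import Callable, Tuple

# Assumption: broker-side offset commit/fetch is expected from 0.8.1 onwards
OFFSET_COMMIT_MIN_VERSION = (0, 8, 1)


def parse_version(version: str) -> Tuple[int, ...]:
    """Turn '0.8.0' into (0, 8, 0) so versions compare numerically."""
    return tuple(int(part) for part in version.split("."))


def initial_offset(
    broker_version: str,
    fetch_committed: Callable[[], int],
    fallback: Callable[[], int],
) -> int:
    """Use the broker-committed offset only when the broker can supply it."""
    if parse_version(broker_version) >= OFFSET_COMMIT_MIN_VERSION:
        return fetch_committed()
    # Older brokers: determine the offset some other way (e.g. an external store)
    return fallback()
```

Comparing integer tuples rather than strings avoids surprises such as "0.10.0" sorting before "0.8.1".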
OK, I see now that this is not implemented on the Kafka side per the docs: https://cwiki.apache.org/confluence/display/KAFKA/A+Guide+To+The+Kafka+Protocol#AGuideToTheKafkaProtocol-OffsetCommit%2FFetchAPI It'd be great if the docs for kafka-python were updated. I can put together a PR.
My turn!
Yes, you can see in Kafka trunk that there is support for an OffsetCommitRequest. However, the library is pinned to the Kafka release, and 0.8.0 (just released, yeah!) doesn't support this yet.
This really needs to be made clear on the front page. This method will work only when used with a Kafka trunk build that supports it; otherwise, it will fail. I haven't tried it personally, since I am using ZooKeeper to store the offsets, and I am doing this outside the library. Go ahead with the PR if you can.

I also think it is a bug if you perform seek(0, 0) and subsequent calls to fetch data return the error you have.

Thoughts on calling seek implicitly? It is just an opinion, but if kafka-python were mapped directly onto the Kafka clients, there would be a ZookeeperConsumer where this would make sense. kafka-python is halfway between the SimpleConsumer and the ZookeeperConsumer, in the sense that it does more than the former but less than the latter. I am not against the idea, but I think doing seek implicitly would require leaving the caller some control to modify the offsets. This is one of the areas where the Kafka ZookeeperConsumer was also criticized. You can find some info here. For this reason, I believe it is better to leave it out, even once OffsetCommit is fully supported.
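Storing offsets outside the library, as described above, boils down to loading them at startup and saving them after processing. Below is a minimal sketch that uses a local JSON file as a stand-in for ZooKeeper or another data store. The class FileOffsetStore and its methods are hypothetical, not a kafka-python API, and how the loaded offsets are handed back to the consumer is left out because it depends on the client version.

```python
import json
import os
import tempfile
from typing import Dict


class FileOffsetStore:
    """Hypothetical stand-in for a ZooKeeper-backed offset store."""

    def __init__(self, path: str) -> None:
        self.path = path

    def load(self) -> Dict[int, int]:
        """Return {partition: offset}, or {} on the first run."""
        if not os.path.exists(self.path):
            return {}
        with open(self.path) as f:
            # JSON object keys are strings, so convert partitions back to int
            return {int(p): o for p, o in json.load(f).items()}

    def save(self, offsets: Dict[int, int]) -> None:
        """Write to a temp file, then rename, so a crash never leaves half a file."""
        directory = os.path.dirname(os.path.abspath(self.path))
        fd, tmp_path = tempfile.mkstemp(dir=directory)
        with os.fdopen(fd, "w") as f:
            json.dump({str(p): o for p, o in offsets.items()}, f)
        os.replace(tmp_path, self.path)  # atomic replacement of the old file
```

Saving only after a batch has been processed gives at-least-once behavior: a crash between processing and saving replays that batch, which is why the duplicate checks mentioned earlier in the thread still matter.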
How are you storing data in ZooKeeper? Are you doing something like #38?
Not exactly. I did some custom work on top of the client before this PR appeared, and I have to admit I didn't have the courage to make it generic enough and apply the recipe to kafka-python. Also, as far as I could see, #38 only implements rebalancing and not offset commit in ZooKeeper. Still, since kazoo is integrated in this fork, you could reuse it and just override
I think this issue is resolved.
The SimpleConsumer has some commented-out code for pulling offsets from the brokers: https://github.com/mumrah/kafka-python/blob/33cde520de9067845d4c7159a2c2834846e1957f/kafka/consumer.py#L100
Is this commented out on purpose? It's unclear to me if "Uncomment for 0.8.1" is referring to a version of Kafka or a version of kafka-python.
I tried to uncomment this, and I get an error in encode_offset_fetch_request / write_short_string. Is offset handling broken, or should I be able to get this to work?

I am currently using consumer.seek(0, 0) as a work-around, but this is not ideal, since I have to reprocess every entry on each run.