Does brod in its current form support Kafka 0.7? I have it running and it seems fine, but I believe I saw somewhere in the brod GitHub repo that it explicitly doesn't support 0.7.
Any details about current and future support would be greatly appreciated.
brod doesn't support compression, which was added as an option in 0.7. Otherwise, I believe it's compatible.
Thank you David. Are you planning to include compression support at some point, and to keep brod in line with advances that take place in Kafka?
That's my intention, but I'm afraid I can't give any firm timelines.
We are looking at using kafka and brod in our environment. Would you be interested in having us develop the compression piece for brod and submitting as a pull request?
Sure thing. :-)
My coworker has been working on compression support for brod, and he has a first pass that he would like to share with you. It is a work in progress, but basic compression (producer) and decompression (consumer) are working for gzip. He has a few questions, and we are hoping to work with you on resolving them, as they could potentially impact other parts of brod. Once we have a better understanding of how to proceed, we can continue refining compression support for gzip.
My coworker's handle is rasprague.
I've branched this repo into https://github.com/rasprague/brod. This branch adds Kafka 0.7 message header support as well as preliminary gzip compression support.
GZIP compression works, but I'm not very pleased with how I've implemented it. Automatic GZIP decompression has also been implemented, but it causes the "expected offset" unit test to fail.
I'd be honored if you could take a look at my changes and maybe give me a little guidance on how to better implement auto-decompression, as well as on correcting the offset calculation for compressed messages.
Please see https://github.com/rasprague/brod/blob/master/notes.txt for more details.
Richard "spike" Sprague
Awesome, thank you. I'll look it over this weekend.
I think your troubles primarily come from problems in our implementation. Carlo and I were kind of discovering Kafka as we went along with this implementation, and we didn't really see the compression bit coming.
Breaking expected offsets with auto-decompression will cause a lot of problems, both for the simple API and the ZKConsumer.

I think the solution (and a pretty decent-sized refactoring) would be to make MessageSets smarter, and to use them more extensively. So basically, BaseKafka.fetch() would return a MessageSet, and the MessageSet would encapsulate whether something has compression, would return the appropriate offsets, etc. MessageSets should be constructable from just a byte stream.

The proper behavior for iterating a compressed set of offsets/messages should be to have the offset be the offset of the real compressed message boundary, not of the individual messages inside. So for instance:
(0, "Hello world"),
(0, "This is compressed."),
(0, "If this read fails, we're going to have to query offset 0 again.")
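The offset behavior described above could be sketched roughly like this (a hypothetical illustration, not brod's real API; the newline framing of the inner messages is a toy scheme for clarity):

```python
import gzip

def iter_message_set(blocks):
    """Yield (offset, payload) pairs from a list of fetched blocks.

    Each block is (block_offset, payload, compressed). Every message
    inside a compressed block is reported with the *block's* offset,
    so that a failed read re-queries the whole compressed set from
    its real boundary. Sketch only -- not brod's actual code; inner
    messages use a toy newline framing for illustration.
    """
    for block_offset, payload, compressed in blocks:
        if compressed:
            for msg in gzip.decompress(payload).split(b"\n"):
                yield block_offset, msg
        else:
            yield block_offset, payload
```

With one compressed block at offset 0 containing several messages, every inner message comes back tagged with offset 0, matching the example above.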
I have a little free time coming up late this week, so I should be able to take a stab at this. You're more than welcome to dive into this yourself as well. Until this is done though, I think auto-decompression shouldn't happen in brod.
As for the other code -- it looks pretty reasonable to me given the current state of things. I'd encourage you to automatically detect which header you're getting on the fetch/parse side (0.6 vs. 0.7, via the magic number). We should only need the ugly VERSION_0_7 thing on the producer side. You could even make this an optional arg to BaseKafka -- I always thought having that define was an ugly kludge on my part.
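Auto-detection on the parse side could look roughly like this (a sketch of the 0.6/0.7 wire layouts; `parse_message` is a hypothetical name, not a brod function):

```python
def parse_message(buf):
    """Detect 0.6 vs 0.7 framing from the magic byte (hypothetical sketch).

    magic 0 (Kafka 0.6):  magic | crc32 | payload
    magic 1 (Kafka 0.7):  magic | attributes | crc32 | payload

    In 0.7 the low two bits of the attributes byte carry the
    compression codec (0 = none, 1 = gzip).
    """
    magic = buf[0]
    if magic == 0:
        return 0, buf[5:]              # 0.6: compression didn't exist yet
    if magic == 1:
        return buf[1] & 0x03, buf[6:]  # 0.7: codec from attributes byte
    raise ValueError("unknown magic byte: %d" % magic)
```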
Thank you very much for taking an interest in brod. :-)
Great! I can start work on factoring out the VERSION_0_7 boolean from the code, then I'll see about tackling the MessageSet refactoring. I'll keep you posted on my progress.
I've finally been able to put some time into refactoring the code. I've moved message parsing into MessageSet, and added Message and CompressedMessage objects which get returned by MessageSet's parse method. CompressedMessage has a method to decompress itself and return another MessageSet. I've left automatic decompression out of brod, so as not to mess with the expected offset functionality, but the CompressedMessage objects make handling compression a little easier. I've also had to modify TestKafkaBlockingWithCompression in test_brod.py to do the offset and message checking in two distinct steps.
I'd like to know what you think, if and when you have time to take a look at my changes.
I don't know if my last post may have fallen through the cracks. I've made quite a bit of progress getting message compression support into brod. When you have time, I'd like it if you could take a look at my changes, make sure I'm headed in the right direction, and maybe make some recommendations on how to get compression support merged into the main line.
Here's the link to my branch: https://github.com/rasprague/brod.
Thanks a lot for your time, I look forward to hearing from you soon.
I've taken over as maintainer of brod and I'm just seeing this thread now. I'll take a look at your changes, do some testing on my end over the next few days, and get back to you. I haven't fully caught up on the thread, but are your changes backwards compatible with 0.6?
Yes, I've made sure that my changes are backwards compatible with 0.6. Message parsing will read the magic byte to automatically determine if it's dealing with a 0.6 or a 0.7 message. As for producing messages, the BaseKafka constructor takes in an optional param to allow for either 0.6 or 0.7 message producing.
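On the producer side, the version switch amounts to choosing a header layout, roughly like this (a hypothetical sketch mirroring the 0.6/0.7 message layouts, not brod's actual code):

```python
import struct
import zlib

def build_message(payload, version="0.7", codec=0):
    """Frame a payload as a 0.6 or 0.7 Kafka message body.

    Hypothetical sketch: field layout follows the 0.6/0.7 protocol
    (magic byte, optional attributes byte with the compression codec
    in its low two bits, CRC32 of the payload, then the payload).
    """
    crc = struct.pack(">I", zlib.crc32(payload) & 0xFFFFFFFF)
    if version == "0.6":
        return b"\x00" + crc + payload               # magic 0, no attributes
    return b"\x01" + bytes([codec]) + crc + payload  # magic 1 + attributes
```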
Richard "spike" Sprague
Just wanted to check in to see if you were still interested in the 0.7 compatible branch that I cooked up. Ping me to let me know what's up.
In one of our internal tests, I'm getting some errors trying to run your brod branch against a 0.6 kafka broker:
```
Traceback (most recent call last):
  File "prozess/daniels/test.py", line 348, in <module>
  File "prozess/daniels/test.py", line 64, in test_daniels
    offsets1 = get_latest_offsets([metrics_bp, events_bp])
  File "prozess/daniels/test.py", line 313, in get_latest_offsets
    offsets[broker_partition] = kafka.latest_offset(broker_partition.topic, broker_partition.partition)
  File "/Users/carlo/Projects/datadog/brod/brod/base.py", line 489, in latest_offset
    return self.offsets(topic, LATEST_OFFSET, max_offsets=1, partition=partition)
  File "/Users/carlo/Projects/datadog/brod/brod/base.py", line 478, in offsets
  File "/Users/carlo/Projects/datadog/brod/brod/blocking.py", line 104, in _write
  File "/Users/carlo/Projects/datadog/brod/brod/base.py", line 655, in _wrote_request_size
    return self._write(request, partial(self._wrote_request, callback))
  File "/Users/carlo/Projects/datadog/brod/brod/blocking.py", line 104, in _write
  File "/Users/carlo/Projects/datadog/brod/brod/base.py", line 660, in _wrote_request
  File "/Users/carlo/Projects/datadog/brod/brod/blocking.py", line 55, in _read
    chunk = self._socket.recv(length)
socket.error: [Errno 54] Connection reset by peer
```
In kafka's logs:
```
[2012-07-16 15:22:27,122] ERROR kafka.message.InvalidMessageException (kafka.network.Processor)
```
Has querying for the latest offset changed?
(@rasprague still interested btw, just wanna make sure it doesn't break my existing stuff)
@rasprague have you had success decoding gzipped messages?
I am on Kafka 0.7.2. datadog/brod only retrieves one message, I guess because it doesn't understand 0.7 headers. I'm trying your fork, and it is able to retrieve a full list of messages at least, but they don't appear to be decoded properly, just gobbledegook.
I put some debugging in: the messages are detected by brod as gzip-compressed and CompressedMessage objects are instantiated, but the code path that decompresses the message is never reached.
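One way to narrow this down is to check the raw payload for the gzip magic bytes and decompress it by hand (a debugging sketch, not part of brod's API; note that what comes out of the gzip stream is itself a framed message set, not plain text):

```python
import gzip

def maybe_decompress(payload):
    """Debugging helper (hypothetical, not part of brod): if the raw
    payload starts with the gzip magic bytes (0x1f 0x8b), decompress
    it manually; otherwise return it untouched."""
    if payload[:2] == b"\x1f\x8b":
        return gzip.decompress(payload)
    return payload
```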
Shame the API is uglier, but I just found that this client works perfectly: https://github.com/mumrah/kafka-python
Hi @ccarollo - Thanks for your interest. This repo is no longer maintained and will be removed in 3 months (on or soon after June 30, 2015). If you are still interested in the functionality offered, we suggest looking at https://github.com/mumrah/kafka-python which is being actively maintained.