Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Tornado #731

Open
mrocklin opened this issue Oct 2, 2017 · 4 comments

Comments

@mrocklin
Copy link

commented Oct 2, 2017

I would like to consume and produce messages using PyKafka from within a Tornado application. I am potentially willing to contribute code for this. I have a few questions.

  1. Are the maintainers of this library comfortable adding tornado as an optional dependency? This would bloat your testing stack a bit. I'm also happy to do this externally if preferred.
  2. Is there already a mechanism to run a callback whenever data arrives? Using the non-blocking consume will work in most cases, but when it returns None we'll want to be triggered when data does arrive. Sometimes frameworks like this provide a mechanism for a callback function, which in our case could be setting an Event object. I'm particularly interested in the librdkafka consumer. As far as I can tell looking at the code there is no such mechanism exposed at the Python level, but I thought I would check. If forced we can always use a separate thread for this.

Separately, I noticed that there is no librdkafka solution for balanced consumers. Is this a limitation of librdkafka or is it that pykafka has not wrapped this functionality yet? If the latter then is there any near-term plan to support this?

@mrocklin

This comment has been minimized.

Copy link
Author

commented Oct 2, 2017

On the producer side is there any back pressure? There doesn't seem to be any block= option when calling the produce(...) method. Is there a way to emit a message in a robustly non-blocking way and err if blocking would be inevitable?

@emmett9001

This comment has been minimized.

Copy link
Member

commented Oct 2, 2017

In general I'd like to avoid adding additional dependencies where possible, but it will be helpful to understand exactly what a Tornado dependency would enable. If there's a good tradeoff between tornado-specific code in pykafka and ease of use for users of both, it could be worth the added dependency. Without knowing what this will look like, though, I'm wary of adding even more to our already somewhat bloated test requirements.

Some work was started a while ago by @mikepk on a callback mechanism similar to the one you're describing but for produced messages in #506. If it seems necessary, we can either adapt that work or start from scratch on a callback interface that would meet your needs. I think when we chat tomorrow we'll have a better understanding of the specific requirements around this.

You might be noticing that there's no balanced_consumer.py in the rdkafka directory. You can still use rdkafka with balanced consumers through the use_rdkafka kwarg on BalancedConsumer.

@jeffwidman

This comment has been minimized.

Copy link
Contributor

commented Oct 3, 2017

I noticed that there is no librdkafka solution for balanced consumers

That's because those should go the way of the dodo bird. Much better to use Kafka's native Consumer Group API's which are supported by librdkafka and also exposed by pykafka as ManagedBalancedConsumers. Pykafka's BalancedConsumer was built before these native Kafka API's existed, and (IMHO based on our production usage) has fairly serious problems such as #354

@emmett9001

This comment has been minimized.

Copy link
Member

commented Oct 5, 2017

To clarify, the callback that would solve this issue would be called here when a message becomes available on one or more partitions.

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Projects
None yet
3 participants
You can’t perform that action at this time.