
async producer has unbounded internal queue #297

Closed
milyenpabo opened this issue Jan 19, 2015 · 10 comments

@milyenpabo

I'm using the provided example code for SimpleProducer to send out 1 kB messages in an infinite loop, as fast as possible. What I see is that I can pass ~60k messages/sec to the kafka-python library, although only ~1k messages are actually sent out towards the kafka brokers. In the meantime, resident memory usage of the producer is going up steadily and without an upper bound.

I'm new to kafka-python and I'm not familiar with its internals, but it seems as if the communication between my client code and kafka-python were asynchronous (regardless of instantiating the SimpleProducer with async=False), and messages were queued internally by the kafka-python library in an unlimited queue.

Is that intuition right? Is there a way to use the library in a kind of "blocking" mode where memory usage is limited?

I'm using kafka-python v0.9.2 with two v0.8.2 kafka brokers, all nodes are connected with a 10 Gbps network.

@milyenpabo milyenpabo changed the title Leaking memory in producer? Leaking memory in producer Jan 19, 2015
@dpkp dpkp added the producer label Jan 22, 2015
@dpkp
Owner

dpkp commented Jan 22, 2015

this sounds odd -- the SimpleProducer should operate in a sync / blocking mode by default. can you paste a test case to reproduce?

@ediskandarov
Contributor

I can confirm the memory leak in the producer with batch_send=True.

@ediskandarov
Contributor

I guess this happens because of the growing queue.
We need to add a queue maxsize parameter.
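The bounded-queue behaviour proposed here can be sketched with Python's standard-library queue.Queue. This models the idea only, not kafka-python's actual internals:

```python
import queue

# A bounded queue caps memory use: once maxsize items are pending,
# put() either blocks for space or raises queue.Full, instead of
# letting unsent messages accumulate without limit.
q = queue.Queue(maxsize=2)
q.put("msg-1")
q.put("msg-2")

try:
    q.put("msg-3", block=False)  # third put: queue is already full
    overflowed = False
except queue.Full:
    overflowed = True

print(overflowed)  # True: the message was rejected rather than buffered
```

Whether a full queue should block, raise, or drop is exactly the design question debated below.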

@milyenpabo
Author

Thanks for the quick patch!

Though from a client's point of view, blocking on send might be more desirable than raising an error. What do you think?

@ediskandarov
Contributor

Kafka is about big data.

If you cannot send your data to Kafka, it is better to spill the data to disk and try to resend later.
In my opinion there are more cases where there is no time to wait.

@dpkp
Owner

dpkp commented Jan 23, 2015

there is currently no bound on the async producer's internal queue size. setting either async=True or batch_send=True uses async mode and will not block, but instead adds messages to the internal queue. If you add messages to the producer's queue faster than it can send them over the network to kafka then you will see memory increasing as the unsent queue grows.

honestly I would not recommend the current async producer code for production use -- it needs to be fixed for several issues (this one included). at my company we wrote our own async producer class that wraps the sync SimpleProducer. I will see if we can get that merged back upstream.
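The growth described here can be modelled with a plain unbounded queue.Queue from the standard library. This is an illustration of the memory behaviour, not kafka-python's actual code:

```python
import queue

# maxsize=0 (the default) means unbounded -- the same behaviour as the
# async producer's internal queue. If nothing drains the queue as fast
# as the client fills it, every unsent message stays resident in memory.
q = queue.Queue()
for _ in range(100_000):
    q.put(b"x" * 1024)  # ~1 kB per message, as in the original report

print(q.qsize())  # 100000 pending messages, roughly 100 MB of payload
```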

@dpkp dpkp changed the title Leaking memory in producer async producer has unbounded internal queue Jan 23, 2015
@wizzat
Collaborator

wizzat commented Jan 23, 2015

FWIW, async and multiprocessing both have a host of issues. I'm still firmly convinced that the multiprocessing consumer is worse for performance in every way possible.


@dpkp dpkp added this to the 0.9.4 Release milestone May 12, 2015
@dpkp
Owner

dpkp commented May 12, 2015

several PRs to fix this issue: #283, #304, #375 . will need to pick one and merge. I'm going to try to take a closer look tonight. if anyone has thoughts on the relative merits, please drop a note. Otherwise I'll just make an executive decision.

@dpkp
Owner

dpkp commented May 13, 2015

#331 also addresses this issue

@dpkp
Owner

dpkp commented Jun 7, 2015

#331 is merged and supports this via the async_queue_maxsize and async_queue_put_timeout configuration parameters to the producer.
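A configuration sketch only, not a tested example: the broker address and topic below are hypothetical, and this assumes the 0.9.x-era SimpleProducer API discussed in this thread (note that the literal async keyword argument from that era became a reserved word in Python 3.7, so this applies only to the old library versions):

```python
# Sketch: assumes kafka-python with the #331 changes merged and a
# broker reachable at the given (hypothetical) address.
from kafka import KafkaClient, SimpleProducer

client = KafkaClient("localhost:9092")    # hypothetical broker address
producer = SimpleProducer(
    client,
    async=True,                   # route sends through the internal queue
    async_queue_maxsize=10000,    # bound the queue at 10k pending messages
    async_queue_put_timeout=1.0,  # wait up to 1s for space before failing
)
producer.send_messages("my-topic", b"1 kB payload")
```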

@dpkp dpkp closed this as completed Jun 7, 2015