High CPU Usage with RabbitMQ Connector (polling interface implemented incorrectly?) #414
Comments
This is because we created a tiny layer based on Camel; that's how a Camel consumer works in plain Camel. There is room for improvement, but I don't think it's implemented incorrectly. The aim of this project is to leverage the Camel components and the engine we already have.
By the way, I'll try to reproduce and check.
Thanks for reporting.
I tested the connector yesterday for about 40 minutes with this configuration.
The connector didn't consume anything because nothing was in RabbitMQ. This is the result with Java VisualVM, and the monitoring part. As you can see, the CPU is constantly at 12%. How are you checking the used resources? Through top? Are you using top's Irix mode?
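The 12% figure above is consistent with one fully busy core on an 8-core machine when CPU usage is normalized across all cores, which is what VisualVM and top with Irix mode off (Solaris mode) report; top with Irix mode on reports the same load as a percentage of a single core. A toy sketch of that relationship, with made-up method and class names:

```java
// Illustrative only: converting top's Irix-mode CPU% (percent of one core)
// to Solaris-mode CPU% (percent of all cores). Names are made up.
public class CpuModes {
    // Irix mode: 100% means one fully busy core.
    // Solaris mode: the same load divided by the number of cores.
    static double irixToSolaris(double irixPercent, int cores) {
        return irixPercent / cores;
    }

    public static void main(String[] args) {
        // One core busy-looping on an 8-core machine:
        System.out.println(irixToSolaris(100.0, 8)); // prints 12.5
    }
}
```

So the same busy-looping task can look like "100% CPU" in one tool and "12%" in another.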
Hi, thank you very much for looking into the issue. I've prepared a docker-compose file to reproduce my issue: https://github.com/Thylossus/docker-kafka-camel-rabbitmq-connector-cpu-usage-scenario. I did not have the time yesterday to do so. When I start all containers and set up the RabbitMQ connector, the CPU usage almost immediately increases significantly. I use top in Irix mode to check the resources.
However, on our test systems we also have New Relic monitoring, which reports high CPU usage as well. With the docker-compose setup on my local machine, you can also see the heavy load on the host system. Maybe the cause lies within my connector configuration (I've also tried adding …). Best regards, Tobias
Thanks for this. I forgot to say that I was testing on Kafka 2.4.0.
We'll take a look. @orpiske do you want to try to reproduce?
Yes, I do! I will take a look at that.
I've just tested Kafka version 2.4.0 in the Connect Docker container and still got the same results. Thank you both for investigating this 👍
After ten minutes of running I'm still …
I just modified the cpu_quota in the kafka_connect service here:
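The exact change isn't shown above; as a reference only, a minimal sketch of what limiting the CPU quota in a version 2 docker-compose file can look like — the image name and quota value are illustrative, not the values used in this thread:

```yaml
# Hypothetical docker-compose (v2 format) fragment: cap the Connect
# container's CPU. cpu_quota is in microseconds per 100 ms cpu_period,
# so 75000 means at most 0.75 of one core.
services:
  kafka_connect:
    image: my-kafka-connect:latest   # placeholder image name
    cpu_quota: 75000
```

In the v3 compose format the equivalent setting lives under `deploy.resources.limits.cpus`.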
@oscerd I managed to reproduce it.
Me too. But with a little modification to the conf it seems better.
After one hour …
@oscerd Did you get your results with or without the …
When I set …
I don't think it's a workaround at all, because basically this happens only in a container context. If you run the same configuration in standalone Kafka Connect, you won't get the same resource consumption. It's something we need to investigate, for sure.
I agree that we need to investigate this. I am not sure if it's a bug either. As usual for performance issues, there are a lot of things involved. I did try with a different connector to see if there would be any similarity, and there was none with regard to CPU, but there were other strange things nonetheless. So, yeah, I guess we need to take a closer look.
Another thing I noticed: the reproducer has tasks.max set to 10. This is not yet supported. After I decreased it to 1, the CPU usage in my case decreased significantly.
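For reference, a minimal sketch of a source-connector config with `tasks.max` reduced to 1. `tasks.max` is a standard Kafka Connect property; the connector class, topic, and endpoint URL below are illustrative placeholders, not the reproducer's exact values:

```json
{
  "name": "rabbitmq-source-connector",
  "config": {
    "connector.class": "org.apache.camel.kafkaconnector.CamelSourceConnector",
    "tasks.max": "1",
    "topics": "rabbitmq-events",
    "camel.source.url": "rabbitmq://..."
  }
}
```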
Yes, those connectors actually don't work in distributed mode; we need to support it in some way. Still WIP.
Have you had a look at the load on the individual cores of your system? Setting …
I did not look individually at a specific core, but the load did decrease by a factor of 10 (i.e. from ~31% previously to ~3% after that). Good catch.
Thank you both for spending so much time investigating this issue 👍 I've configured our deployment so that the Docker container's quota is fixed to 0.75 CPU cores, and it works fine:
I run the connector with … I've also updated the sample with the latest settings. While we can work with the current solution, I'm not sure if this issue can be closed yet, but I'll leave the decision with you.
Let's leave this open.
I would focus on understanding whether this is specific to the RabbitMQ connector/component.
Today I played a bit with this and I think we have some room to improve the CamelSourceTask. I adjusted the code a little and managed to reduce the CPU usage (from ~9% to ~0.4% on this computer), GC allocations, and heap usage (from ~350 MB to ~290 MB) while idle. With the change, the code spends far less time in CamelSourceTask.poll() than the current one does. Of course, this is just an initial proof of concept. Although it passes all the tests, we need to do more serious testing, reviews, cleanups, etc., but it looks like there's something we can do here.
Really well done, @orpiske. Can you open a PR with your updates?
And thanks a lot!
Thanks @oscerd! That was a fun one to play with. Yes, I can. I will clean up the patches a little and send one for review on Monday … Tuesday at worst. I also put some code in place to make it easier for us to debug problems like this in the future.
This adds a reference implementation for checking the resource usage of the RabbitMQ component while idle. The motivation for this is related to the GitHub issue apache#414.
Thank you for looking into it! The PR looks good. Looking forward to version 0.5.0 :)
I think we can close this one.
Hi,
I'm using the Camel RabbitMQ Connector as a source connector for my Kafka cluster. Kafka Connect is running in a Docker container built with the following Dockerfile.
When running this configuration and adding a RabbitMQ connector with the following configuration, the connector task created by this configuration consumes all resources of its assigned CPU core.
I've looked at the thread dumps, activated `TRACE` logging, and came up with the following conclusion: the CamelSourceTask does not seem to properly implement the `poll` method of the abstract SourceTask class. The JavaDoc for the `poll` method is as follows (source): … Looking at the implementation, I cannot see how the contract w.r.t. "should block, but return control to the caller regularly" is fulfilled. Since `consumer.receiveNoWait()` is called, exchanges are not read in a blocking manner, and the subsequent code is not blocking either. This causes the `execute` method of WorkerSourceTask to continuously call the `poll` method, which immediately returns if no exchange is available (due to the `break` statement in the while loop), effectively creating an infinite loop that consumes all CPU resources.
I would be grateful if someone could take a look at my analysis. Maybe there is some configuration option that solves my issue, but I was not able to find it. Looking forward to hearing from anyone.
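A sketch of how a source task can honor the "block, but return control to the caller regularly" contract: wait for the first record with a timeout, then drain whatever else is immediately available. This is not the connector's actual code — a `BlockingQueue` stands in for the Camel consumer, and all names and values are illustrative:

```java
import java.util.ArrayList;
import java.util.List;
import java.util.concurrent.BlockingQueue;
import java.util.concurrent.LinkedBlockingQueue;
import java.util.concurrent.TimeUnit;

// Illustrative sketch: a poll() that waits with a timeout instead of
// spinning on receiveNoWait(), so an idle task does not busy-loop.
public class BlockingPollSketch {
    private static final long POLL_TIMEOUT_MS = 1000;
    private static final int MAX_BATCH = 10;

    // Stand-in for the Camel consumer's incoming exchanges.
    private final BlockingQueue<String> exchanges = new LinkedBlockingQueue<>();

    public void deliver(String exchange) {
        exchanges.offer(exchange);
    }

    // Blocks up to POLL_TIMEOUT_MS when idle, so the caller's loop
    // (e.g. WorkerSourceTask.execute) no longer burns a full core
    // calling poll() back-to-back.
    public List<String> poll() throws InterruptedException {
        List<String> batch = new ArrayList<>();
        String first = exchanges.poll(POLL_TIMEOUT_MS, TimeUnit.MILLISECONDS);
        if (first == null) {
            return batch; // idle: return control to the caller regularly
        }
        batch.add(first);
        // Drain whatever is already queued, without blocking again.
        String next;
        while (batch.size() < MAX_BATCH && (next = exchanges.poll()) != null) {
            batch.add(next);
        }
        return batch;
    }
}
```

With this shape, an idle task wakes at most once per timeout interval instead of millions of times per second, while a busy task still returns full batches promptly.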
Best regards,
Tobias
P.S.: when `TRACE` logging is activated, the logs immediately show that there is some kind of resource-intensive loop. The corresponding log statements can be found in the `execute` method of WorkerSourceTask.