Potential bug in HttpPostEmitter causing high CPU usage #5338
Comments
I reviewed the code quickly but couldn't find a bug. Could you add some logging code: add else { log.info("failed to emit in batch [%s]", batch); }. Also, add the batchNumber field to the Batch.toString() implementation. Also, add logging after successful |
We are facing the same issue and had an outage: many of the historicals started spinning at 100% CPU utilization and queries couldn't return. A lot of threads are stuck here
|
@niketh could you do what I asked in the comment above? |
Added it to the 0.12.0 milestone. Technically it appears to be a regression in 0.11.0, but it would only be patched in 0.12.0. |
@leventov I can try doing what you suggested. However, it is easily reproducible locally: just add the following code in some executor -
Please see if you can reproduce at your end. I'll let you know the outcome of extra logging. |
So far I have been unable to reproduce this issue with the latest master or with a unit test for emitter 0.6.0. So I took the log statements from PR #5365 and added them to emitter 0.6.0, which Druid 0.11.0 uses, and I can see the problem happening. Apart from what you suggested, I also added statement Here's the output -
|
Here's the output with more information, with druid 0.11.0 and emitter 0.6.0 -
|
@pjain1 thanks, is it possible to add threads and time information to the output? |
Like this: |
OK, I will do that and post updated output. |
|
Here are multiple thread dumps from multiple runs when this situation happens - https://gist.github.com/pjain1/534c5941390037663b94c7ddcf6a0a3c |
I have been trying to reproduce this issue on master but have succeeded only once in multiple tries. |
@pjain1 "s" thread must complete |
After wrapping the while loop in try-catch I found that the
It looks like, because of #5300, the Java heap space problem does not occur in master, which is why the issue is not easily reproducible there. Nonetheless, there is a bug: if any exception happens while creating a new batch, |
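To make the point about exceptions during batch creation concrete, here is a minimal sketch. It is not the HttpPostEmitter code: the Batch class, the factory, and both method names are hypothetical, and it assumes the old batch is sealed before its replacement is created. Under that assumption, a failure in the factory leaves the shared reference pointing at a sealed batch, whereas creating the replacement first leaves the state untouched when creation fails.

```java
import java.util.concurrent.atomic.AtomicReference;
import java.util.function.Supplier;

final class BatchSwapSketch {
    /** Hypothetical stand-in for a batch; only the sealed flag matters here. */
    static final class Batch {
        volatile boolean sealed;
    }

    /** Seals first, then creates. If the factory throws, ref still holds the sealed batch. */
    static void sealThenCreate(AtomicReference<Batch> ref, Supplier<Batch> factory) {
        Batch old = ref.get();
        old.sealed = true;
        ref.set(factory.get());
    }

    /** Creates first, then seals. If the factory throws, no state has been changed yet. */
    static void createThenSeal(AtomicReference<Batch> ref, Supplier<Batch> factory) {
        Batch old = ref.get();
        Batch fresh = factory.get();
        old.sealed = true;
        ref.set(fresh);
    }

    /** Returns true if, after a failed swap, the reference points at a sealed batch. */
    static boolean stuckAfterFailure(boolean sealFirst) {
        AtomicReference<Batch> ref = new AtomicReference<>(new Batch());
        // Simulated creation failure (assumption: stands in for any exception during creation).
        Supplier<Batch> failing = () -> {
            throw new IllegalStateException("simulated failure while creating batch");
        };
        try {
            if (sealFirst) {
                sealThenCreate(ref, failing);
            } else {
                createThenSeal(ref, failing);
            }
        } catch (IllegalStateException expected) {
            // The swap failed; what matters is the state it left behind.
        }
        return ref.get().sealed;
    }

    public static void main(String[] args) {
        System.out.println("seal-first stuck: " + stuckAfterFailure(true));
        System.out.println("create-first stuck: " + stuckAfterFailure(false));
    }
}
```

Whether reordering like this is appropriate for the real emitter is a separate question; the sketch only shows how a failed creation step can leave a sealed batch installed.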
After upgrading to 0.11.0, some deployments are facing a high CPU usage issue.
After taking a thread dump, we suspected that the emitter thread might be causing it.
To verify this, we added an executor in the DruidCoordinator class that just keeps emitting events in a while(true) loop, like this -
We found that after some time, the batch.tryAddEvents method always returns false, the reference in concurrentBatch never changes, and the while(true) loop just keeps spinning without sending anything or creating a new batch. We are still not sure why this happens, as it does not happen for all deployments; it might be a concurrency issue.
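The symptom can be modeled in a few lines. This is only a sketch and not the emitter's actual code: the Batch class, tryAddEvent, and the maxSpins cap are hypothetical, and the cap exists only so the example terminates. It shows that once the batch held by the shared reference rejects every event and nothing replaces it, each retry sees the same batch and makes no progress.

```java
import java.util.concurrent.atomic.AtomicReference;

final class SpinSketch {
    /** Hypothetical stand-in for a batch: once sealed, it rejects every event. */
    static final class Batch {
        private volatile boolean sealed;
        private int size;

        void seal() {
            sealed = true;
        }

        synchronized boolean tryAddEvent(String event) {
            if (sealed) {
                return false;
            }
            size++;
            return true;
        }
    }

    /**
     * Tries to add one event, giving up after maxSpins failed attempts.
     * Returns the number of failed attempts (0 means the first attempt succeeded).
     */
    static int emit(AtomicReference<Batch> concurrentBatch, String event, int maxSpins) {
        int spins = 0;
        while (spins < maxSpins) {
            Batch batch = concurrentBatch.get();
            if (batch.tryAddEvent(event)) {
                return spins;
            }
            // Nothing in this sketch replaces the sealed batch, so the next
            // iteration reads the same reference and fails in the same way.
            spins++;
        }
        return spins;
    }

    public static void main(String[] args) {
        AtomicReference<Batch> concurrentBatch = new AtomicReference<>(new Batch());
        System.out.println(emit(concurrentBatch, "a", 1000)); // prints 0
        concurrentBatch.get().seal();
        System.out.println(emit(concurrentBatch, "b", 1000)); // prints 1000
    }
}
```

Without the cap, the second call would loop forever at full CPU, which matches the behavior described above.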
@leventov