Max loading factor steps reached: 100/100 #402
Comments
Can you provide more logs / did you capture them? earlier entries? (you can email them to me) |
Attached log, maybe it helps. |
Any idea when this WARN message could occur? |
Hi, yes it's supposed to occur when processing is going faster than records can be queued up for processing - which is usually an indication of something going wrong. This issue is in the queue, will look more into it soon. However, it's a bit of a doozy. I don't suppose you noticed anything weird with the record processing when this happened? Was everything else appearing to be fine? We need to add the metrics endpoint / info so when this happens we can get more direct info from the internal state of PC. |
How often has this happened? It's a relatively arbitrary max for the loading factor - but usually 3-5x is enough. 100 should be way too much. It prevents potential OOM errors with the queue sizes. |
Hi, unfortunately we didn't notice anything strange before this happened. How this works for us is that throughout the night we get a batch of data that our services process. "when the processing is going faster than records can be queued up for processing" This is interesting information, but why does the processing completely stop when it reaches the max? How often has this happened? Was throughput going slow when this happened? Thx for the response :) |
So when this happens - no messages are getting processed? That's not what I expected 🤔
ah - if this flag is set, your function returns immediately, without really doing much? Yes that could cause it. However, it shouldn't prevent any progress being made - it's just the system telling you that it's not going to increase the loading factor beyond 100 (which means that the number of queued messages is about 100 * the maxConcurrency setting)...
yeah that shouldn't be the case.. what indication do you see that no progress is being made? |
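To make the arithmetic in the comment above concrete, here is a minimal Java sketch of a capped loading factor. It is not Parallel Consumer's actual implementation: the class `LoadingFactorSketch`, its method names, the starting factor of 1, and the step size of 1 are all assumptions made for illustration. Only the cap of 100 and the relation "queued messages is about loading factor * maxConcurrency" come from the thread.

```java
// Hypothetical sketch only; NOT Parallel Consumer's real code.
final class LoadingFactorSketch {
    // Cap mentioned in the thread ("100/100").
    static final int MAX_LOADING_FACTOR = 100;

    private final int maxConcurrency;
    private int loadingFactor = 1; // assumed starting value

    LoadingFactorSketch(int maxConcurrency) {
        this.maxConcurrency = maxConcurrency;
    }

    // Target number of queued records: loadingFactor * maxConcurrency.
    int targetQueueSize() {
        return loadingFactor * maxConcurrency;
    }

    // The queue counts as "low" when it holds fewer records than the target.
    boolean isQueueLow(int queuedRecords) {
        return queuedRecords < targetQueueSize();
    }

    // Raises the factor by one step; returns false once the cap is reached,
    // which is the point where a "max loading steps reached" warning would fire.
    boolean tryStepUp() {
        if (loadingFactor >= MAX_LOADING_FACTOR) {
            return false;
        }
        loadingFactor++;
        return true;
    }

    public static void main(String[] args) {
        LoadingFactorSketch sketch = new LoadingFactorSketch(16);
        // Simulate a queue that never fills: the factor climbs until it hits the cap.
        while (sketch.isQueueLow(0) && sketch.tryStepUp()) {
            // keep stepping
        }
        System.out.println("target queue size at cap: " + sketch.targetQueueSize());
    }
}
```

Under these assumptions, a queue that never fills drives the factor to the cap, after which every further check would only repeat the warning. That matches the suggestion later in the thread that the warning may be a symptom of the queue not being filled, not the cause of the stall.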
So we are using KafkaHQ to access our Kafka topics and there we see that our service consumer group has a lag which doesn't decrease. And also we no longer see requests being received at the target endpoint. Maybe the reason for the stop of processing is different and the WARN log message is just a result of that, because the queue doesn't get filled... |
Which version are you running? Are you not getting any failed processing logs? Can you show your processing code? |
We were running version 0.5.2.0, just upgraded to 0.5.2.3 |
yes, sure we can do that - please email me directly, or contact me on community slack. |
Ok thanks, I will ping you, but currently I am busy with other topics ;) |
FYI, I'm going to add a limiter so that the logs aren't spammed with this. |
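One common way to keep a repeated warning from flooding the logs is to rate limit it by time. The sketch below shows that general technique only; it is not the limiter that was added to Parallel Consumer. The class `RateLimitedWarning`, the method `shouldLog()`, and the injected clock are hypothetical names chosen for this example.

```java
import java.util.function.LongSupplier;

// Hypothetical sketch of time-based log rate limiting; NOT the library's actual limiter.
final class RateLimitedWarning {
    private final long minIntervalNanos;
    private final LongSupplier nanoClock; // injected so the clock can be faked in tests
    private boolean loggedBefore = false;
    private long lastLoggedNanos;

    RateLimitedWarning(long minIntervalNanos, LongSupplier nanoClock) {
        this.minIntervalNanos = minIntervalNanos;
        this.nanoClock = nanoClock;
    }

    // Returns true at most once per interval; callers log only when it returns true.
    synchronized boolean shouldLog() {
        long now = nanoClock.getAsLong();
        if (loggedBefore && now - lastLoggedNanos < minIntervalNanos) {
            return false; // still inside the quiet period
        }
        loggedBefore = true;
        lastLoggedNanos = now;
        return true;
    }

    public static void main(String[] args) {
        // Allow the warning through at most once per second.
        RateLimitedWarning limiter = new RateLimitedWarning(1_000_000_000L, System::nanoTime);
        for (int i = 0; i < 1000; i++) {
            if (limiter.shouldLog()) {
                System.out.println("WARN: max loading factor steps reached");
            }
        }
    }
}
```

Injecting the clock as a `LongSupplier` keeps the limiter deterministic under test, while production code can pass `System::nanoTime`.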
Hi, @astubbs. |
Thanks @colinkuo for the report! Just to check - was PC still making progress? And you just see it spamming the logs with this error? Can you tell me what your processing function does, or was doing at the time? Is it possible your function was completing records immediately with no delay? I’m very much looking forward to merging the changes in 0.6 which will remove this system :) |
Hi, @astubbs. Our user function takes about five ms on average, so I'd say there is some delay in processing a message in the user function. Would you please elaborate on how removing the system can address this issue? Thanks! |
Hi @colinkuo, @bartman64 - do you still experience this issue with version 0.5.2.7, which has fixes for #547 and #606? I am wondering if the issue here is the same as reported in #547... #637 seems to be somewhat related, in that it gets the same warning logged, but it seems to be caused by a commit during rebalance handling rather than PC not dropping stale work from queues correctly. |
Hi, we didn't run into that particular issue anymore :) |
Hi @rkolesnev, we are starting to test 0.5.2.7 and will keep you posted on any updates. Thanks! |
Hi,
today we noticed an issue on our service. It stopped processing data and when we looked into the logs we saw a lot of messages stating:
"isPoolQueueLow(): Max loading steps reached 100/100"
After restarting the service it ran properly again, but I still wanted to know whether the logs were an indicator of the error, or if it was something else. When does this WARN message occur?