Change the algorithm used for byte-based log rate limiting #727
Comments
Started a discussion in CF Slack.
Added a new step to the proposal based off Jochen's suggestion in Slack:
For hypothetical scenario 1, where no log lines are allowed: I can imagine customers being confused that they're missing log lines entirely. If one log line is larger in bytes than the max bytes per second, can we truncate the line so they know the line exceeds the limit?
I think we've been explicitly avoiding truncating logs to meet the rate limit. Maybe it's something we should have considered more, fair. TY for bringing it up!!
This makes sense to me. Having some kind of API level failures where the minimum has to be met.
I think this comment makes sense as the goal of this feature. Instead of emitting a message every second, it makes sense to emit one when rate limiting starts and one when it stops, if we can figure out when that is.
Can you elaborate on what type of meaningful information this could be?
I'm not sure I agree with a minimum. People should feel free to set their limit low, or entirely off if need be. I'm not sure we have a good argument as to why that shouldn't be allowed. Perhaps a "hey, you sure about that" on the CLI? I don't feel like it should be a "never".
An example we've talked about: if we want to go to a timeout-type limit, we could include that information ("You are being timed out for 1 second") in the outage warning log.
PSA: planning to discuss this proposal at the working group meeting on Wednesday, May 5th.
Just for the record, here's a rephrasing of the state we're in: as part of implementing app-level rate limiting with quotas, some changes to the algorithm of log rate limiting were made.
There's some future thoughts as well on:
I think that's a good rephrase, thanks Ben! Having a clear idea of (what we think are) the benefits of the proposal definitely helps with evaluating it.
That's why we're talking about doing a timeout box. It both allows us to reduce the number of messages saying when the logs are being emitted, and allows us to demarcate more clearly when logs are being dropped.
The outcome of the proposal discussion at the working group was general agreement on the proposal (by the stakeholders present at the meeting).
cloudfoundry/executor#73 was merged, which implements (2).
cloudfoundry/executor#73 has now been released in v2.77.0.
I am going to close this issue since all related PRs have been merged. Please re-open if that's not the case.
Change the algorithm used for byte-based log rate limiting.
Summary
Context
Diego v2.66.0 added support for a new byte-based log rate limiting mechanism with per-LRP limits (see: proposal). That mechanism currently implements a "token bucket" of size r, initially full and refilled at rate r tokens per second, where r is the log rate limit for the LRP in question. Whenever the log rate limit for an LRP is exceeded, an error message (i.e. "app instance exceeded log rate limit") is inserted into the LRP's log stream.
Problem
Hypothetical scenario where many/all log lines are dropped
No log line can be emitted unless its byte size is below the log rate limit for the LRP. This means that if the log rate limit for an LRP is set below the max log line size, 61440 B (60 KiB), then potentially many (or even all) log lines will be dropped for being too large. We could consider this as either a feature or a bug.
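To make this scenario concrete, here is a minimal sketch of a token bucket as described in the Context section (capacity r, initially full, refilled at r tokens per second). It is not Diego's actual implementation: the `TokenBucket` class, its `allow` method, the explicit `now` timestamps, and the choice to admit a line whose size exactly equals the available tokens are all assumptions made for illustration.

```python
class TokenBucket:
    """Hypothetical token bucket: capacity r bytes, initially full, refilled at r bytes/s."""

    def __init__(self, rate_bytes_per_sec):
        self.rate = rate_bytes_per_sec
        self.capacity = rate_bytes_per_sec  # bucket size equals the rate, per the issue text
        self.tokens = float(rate_bytes_per_sec)  # starts full
        self.last = 0.0  # timestamp of the previous call, in seconds

    def allow(self, now, size_bytes):
        # Refill for the elapsed time, never exceeding the capacity.
        elapsed = now - self.last
        self.tokens = min(self.capacity, self.tokens + elapsed * self.rate)
        self.last = now
        # Admit the line only if enough tokens are available (boundary case is an assumption).
        if size_bytes <= self.tokens:
            self.tokens -= size_bytes
            return True
        return False


MAX_LOG_LINE_BYTES = 61440  # max log line size quoted in the issue

bucket = TokenBucket(rate_bytes_per_sec=1024)
small_ok = bucket.allow(now=0.0, size_bytes=512)

# Offer one max-size line per second for a minute. Because the bucket can
# never hold more than 1024 tokens, waiting longer does not help.
large_results = [
    bucket.allow(now=float(t), size_bytes=MAX_LOG_LINE_BYTES) for t in range(1, 61)
]
print(small_ok, any(large_results))
```

Because the capacity is capped at r, any line larger than r is dropped no matter how long the app stays quiet beforehand, which is the behaviour this scenario describes.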
Too many log rate limit exceeded error messages
The error messages are emitted every time the log rate limit is exceeded. If that's happening consistently then the error messages – which are not counted against the log rate limit – may significantly increase the actual log rate of the app. This definitely seems like a bug that we should fix.
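A rough simulation can show how large this effect might get. The sketch below assumes one error message is inserted per dropped line and counts only the message text (no envelope overhead); the `simulate` helper and the numbers fed into it are invented for illustration and are not taken from Diego.

```python
ERROR_MESSAGE = "app instance exceeded log rate limit"


def simulate(rate, line_size, lines_per_sec, seconds):
    """Return (delivered_bytes, error_bytes) for a steadily logging app.

    Hypothetical model: the bucket holds at most `rate` tokens, starts full,
    and is refilled in equal integer steps before each line is offered.
    Assumes `rate` is divisible by `lines_per_sec`.
    """
    refill = rate // lines_per_sec
    tokens = rate
    delivered_bytes = 0
    error_bytes = 0
    for _ in range(lines_per_sec * seconds):
        tokens = min(rate, tokens + refill)
        if line_size <= tokens:
            tokens -= line_size
            delivered_bytes += line_size
        else:
            # The error message is not counted against the limit.
            error_bytes += len(ERROR_MESSAGE)
    return delivered_bytes, error_bytes


# An app writing 100 lines/s of 100 bytes each against a 1000 B/s limit.
delivered, errors = simulate(rate=1000, line_size=100, lines_per_sec=100, seconds=10)
print(delivered, errors)
```

With these made-up inputs the app offers ten times more than its limit, and the bytes spent on error messages end up exceeding the bytes of app logs actually delivered.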
Proposal
Diego repo
Describe alternatives you've considered (optional)
Additional Text Output, Screenshots, or contextual information (optional)