Throttling logging appender #2384

Merged: 12 commits merged into dropwizard:master on Jun 21, 2018
Conversation

@ochedru (Contributor) commented Jun 11, 2018

See #2376.

This PR implements a throttling mechanism for logging appenders.

The existing parameters limiting the queue size (queueSize and discardingThreshold) do not prevent the application from flooding a remote logging service. Such services usually become expensive when their usage quota is exceeded.

The proposed feature aims to provide a safety net when application logging goes out of control.

Two new logging configuration parameters are introduced and can apply to any logging appender:

  • throttlingTimeWindow is a Duration defining a sliding window for throttling. By default, it is not set and throttling is disabled.
  • maxMessagesPerThrottlingTimeWindow is the maximum number of messages sent during the throttling time window. Once this number is reached, messages are silently discarded until there is room for new messages in the sliding time window.

@nickbabcock (Contributor) commented Jun 11, 2018

I haven't looked too closely at the impl, but have you tested to see if Guava's RateLimiter is sufficient?

@ochedru (Contributor, author) commented Jun 11, 2018

The problem with Guava's RateLimiter is that it will not allow small bursts of messages without blocking, so I went for a simple implementation based on a ring buffer of timestamps. It never blocks; messages are only discarded once the time window is "full".
I still have to fix the tests in the PR, though.
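
For illustration, a minimal sketch of the ring-buffer idea (not the PR code; the class and field names are placeholders):

// Sliding-window throttle backed by a ring buffer of timestamps.
// Illustrative sketch only, not the PR implementation.
class SlidingWindowThrottle {
    private final long[] timestamps;   // one slot per allowed message in the window
    private final long windowNanos;
    private int index = 0;             // points at the oldest timestamp

    SlidingWindowThrottle(int maxMessages, long windowNanos) {
        this.timestamps = new long[maxMessages];
        this.windowNanos = windowNanos;
    }

    // Returns true if the message may be sent; never blocks.
    synchronized boolean tryAcquire() {
        long now = System.nanoTime();
        if (now - timestamps[index] < windowNanos) {
            return false;              // window is full: silently discard
        }
        timestamps[index] = now;       // overwrite the oldest slot
        index = (index + 1) % timestamps.length;
        return true;
    }
}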

@nickbabcock (Contributor):
Oh man, now I'm kinda torn between the two philosophies. I have an example in my head:

Let's say you set a throttle of 100 messages over 10 seconds (i.e. an average of 10 messages a second), but the actual log rate is 100 per second.

With this impl, you see all 100 messages for the first second, but the logs are then silent for 9 seconds.

With Guava, there is no silent period, but if the log rate of 100 per second lasts only 1 second and subsequently drops to 0, then Guava will drop 90% of the logs in that 10-second period.

So each impl appears to have pros and cons. But in my opinion, the Guava RateLimiter seems in better control of bursts -- especially if the infrastructure hosting the logs is measured / charged in ops, the RateLimiter would ensure those ops weren't exceeded. There is also no period of time with the RateLimiter where all logs are dropped until the next window.

Can you comment as to why you prefer one side over the other, so we get the whole story 😄

@ochedru (Contributor, author) commented Jun 11, 2018

Your approach using ops makes sense as well. However, I have two problems with RateLimiter:

  • Guava will block temporarily even if we are under the limit on average. For example, if the limit is 10 logs per second but the application logs 5 messages in a row each second, acquiring those 5 permits will take about 0.5 seconds with the rate limiter (see the sketch after this list).
    Ideally, we would implement throttling in the worker thread, and this would not be a problem. But that is not possible because the worker code is deep inside logback.
    That is why I have to throttle before enqueuing the message. At this point, I think it is not acceptable to wait: blocking there would make the whole async logging useless. My throttle implementation decides, without waiting, whether the incoming message is kept and sent to the queue or discarded.
    (Yes, RateLimiter has the tryAcquire method to avoid blocking, but then we will likely discard messages even though we are under the limit on average.)

  • RateLimiter is configured with an integer number of permits per second. This is less flexible than a time window plus a number of messages. If I want to configure throttling for 50000 messages per day, I cannot do that directly with RateLimiter.
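
As a small illustration of the blocking concern (a sketch assuming Guava's RateLimiter; numbers are approximate):

// With a limit of 10 permits/second, acquiring 5 permits back-to-back blocks
// the calling application thread for roughly 0.4-0.5 seconds in total.
RateLimiter limiter = RateLimiter.create(10);
for (int i = 0; i < 5; i++) {
    limiter.acquire();   // may sleep ~0.1s per call after the first
}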

@nickbabcock (Contributor):
(Yes, RateLimiter has the tryAcquire method to avoid blocking, but then we will likely discard messages even though we are under the limit on average.)

If you're implying that the RateLimiter prefers under-utilization -- I believe the opposite is true:

You can see in this sample that even though I only allow one qps, I can immediately acquire 1000 seconds' worth:

final RateLimiter rateLimiter = RateLimiter.create(1);
final boolean acquired = rateLimiter.tryAcquire(1000);
assertThat(acquired).isTrue();

Subsequent requests will not be successfully acquired for the next 999 seconds, so for those 999 seconds the rate limiter is over-utilized. The example is a bit contrived (we're not trying to acquire 1000 at a time), but it is demonstrative.

On the topic of under-utilization and bursts, the RateLimiter rolls over some of that under-utilization to handle a burst:

final RateLimiter rateLimiter = RateLimiter.create(1);
Thread.sleep(1000);
final boolean acquired = rateLimiter.tryAcquire();
final boolean acquired2 = rateLimiter.tryAcquire();
assertThat(acquired).isTrue();
assertThat(acquired2).isTrue();

Without the sleep simulating under-utilization, the previous sample would fail. There is quite a lengthy design javadoc on utilization.

RateLimiter is configured with an integer number of permits per second

I don't think this is accurate. The javadoc has create taking in a double, and you can set the permitsPerSecond to be less than one. For your example, RateLimiter.create(50000.0 / 86400) should do it.


I'm not trying to push the use of RateLimiter here; I just thought it fit nicely into this situation. It has an O(1) space requirement, whereas the PR, as it stands, has an O(n) space requirement, where n is the number of messages per unit of time. That is a pitfall for those who may want to constrain the number of messages per day.

@ochedru (Contributor, author) commented Jun 12, 2018

My bad, I did not read the documentation thoroughly: RateLimiter is indeed suitable for our use case. Thank you for pointing me to this page!
I updated the PR, but I still have to fix the tests with respect to timing differences between the CI platforms... I will try to get rid of the ugly Thread.sleep() calls using a CountDownLatch or something.
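
For reference, the usual shape of that pattern (a generic sketch, not the actual test code; "executor" and "doAsyncWork" are placeholders):

// Wait for asynchronous work to finish without Thread.sleep().
CountDownLatch latch = new CountDownLatch(1);
executor.submit(() -> {
    doAsyncWork();
    latch.countDown();
});
assertThat(latch.await(5, TimeUnit.SECONDS)).isTrue();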

@@ -137,6 +144,17 @@ public void setDiscardingThreshold(int discardingThreshold) {
this.discardingThreshold = discardingThreshold;
}

@JsonProperty
@Nullable
public BigDecimal getMaxMessagesPerSecond() {

Review comment (Contributor):
Are there benefits to using a BigDecimal when it is converted into a double straightaway?

@@ -111,6 +114,8 @@

private int discardingThreshold = -1;

private double maxMessagesPerSecond = -1;

Review comment (Contributor):
I can't help but think that if someone set

maxMessagesPerSecond: -2

then this should fail fast instead of being ignored (e.g. the dash was a typo). So I was thinking of using a validation constraint for the range (0, ∞). Unfortunately, the Min annotation cannot represent an exclusive range, and DecimalMin isn't technically supported for double according to the bean validation spec -- but Hibernate Validator might support it.

so you might want to try something like:

    @DecimalMin(value = "0", inclusive = false)
    private Double maxMessagesPerSecond;

And write a test that ensures that validation fails if someone enters 0 or below. What do you think?
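
A rough sketch of such a test (hypothetical; it assumes a plain javax.validation Validator and that the appender factory exposes a setMaxMessagesPerSecond setter, which may differ from the actual PR code):

// Hypothetical validation test sketch; the factory class and setter name are placeholders.
Validator validator = Validation.buildDefaultValidatorFactory().getValidator();
ConsoleAppenderFactory<ILoggingEvent> factory = new ConsoleAppenderFactory<>();
factory.setMaxMessagesPerSecond(0.0);
assertThat(validator.validate(factory)).isNotEmpty();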


Reply (Contributor, author):
Sounds good. It must be @Nullable as well, I guess.

@jplock jplock added the feature label Jun 15, 2018
@jplock jplock added this to the 1.4.0 milestone Jun 15, 2018
@nickbabcock (Contributor) left a review comment:
Excellent, looks in great shape!

@@ -111,6 +115,10 @@

private int discardingThreshold = -1;

@Nullable
@DecimalMin(value="0", inclusive = false)
private Double maxMessagesPerSecond;

Review comment (Member):
does it make sense to support a fractional number of messages per second instead of making this an Integer or even an OptionalInt?


@nickbabcock (Contributor) replied on Jun 18, 2018:
Yeah, fractional makes sense: if you want 30 messages per minute, you'd set this to 0.5


Reply (Member):
It’s too bad a Duration wouldn’t work here somehow


Reply (Contributor):
I think you could be on to something; you could have it like this (may not be the best name):

messageThrottle: 1ms

Here you'd be throttling logging to 1 message per millisecond.


Reply (Member):
Exactly what I thought. Duration has a count and a time unit, which is what we need in this case, even though it's not the best name.

@ochedru (Contributor, author) commented Jun 19, 2018

Following your comments, I amended the PR to use a throttle defined by a Duration.

@nickbabcock (Contributor):
Excellent! Does anyone have suggestions for a name other than messageThrottle?

  • messageRate
  • messageFrequency
  • frequencyThrottle
  • frequencyDivisor

@jplock (Member) commented Jun 19, 2018

messageRate or messageFrequency might make sense. We should probably also add a comment (with an example) explaining how to interpret the Duration.

A Duration of 1m means only allow one message per minute, while a Duration of 30s means allow 30 messages per second

@nickbabcock (Contributor):
Duration of 30s means allow 30 messages per second

Slight slip-up, 30s would mean 1 message every 30 seconds 😄
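
To make that semantics concrete, a hedged sketch of how the messageRate Duration could map to a rate (it assumes io.dropwizard.util.Duration and a Guava RateLimiter underneath, which may differ from the merged implementation):

// One permit per messageRate interval: 1m -> 1/60 permits per second,
// 30s -> 1/30 permits per second. Illustrative sketch only.
Duration messageRate = Duration.minutes(1);
double permitsPerSecond = 1_000_000_000.0 / messageRate.toNanoseconds();
RateLimiter limiter = RateLimiter.create(permitsPerSecond);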

@jplock (Member) commented Jun 20, 2018

@nickbabcock this looks good to me 👍

Great job @ochedru!

@nickbabcock nickbabcock merged commit c448456 into dropwizard:master Jun 21, 2018
@jplock (Member) commented Jun 21, 2018

We also need to update the release notes with this change.

@nickbabcock (Contributor):
Updated in bfce31f

@jplock jplock modified the milestones: 1.4.0, 2.0.0 Jun 22, 2018
@nickbabcock nickbabcock mentioned this pull request Sep 14, 2018
joschi added a commit to dropwizard/logback-throttling-appender that referenced this pull request Dec 30, 2018
Extract `ThrottlingAppenderWrapper` from Dropwizard and keep it as a separate (tiny) project.

Refs dropwizard/dropwizard#2376
Refs dropwizard/dropwizard#2384
Refs dropwizard/dropwizard#2458