Throttling logging appender #2384

Merged: 12 commits merged into dropwizard:master on Jun 21, 2018
Conversation

@ochedru (Contributor) commented Jun 11, 2018

See #2376.

This PR implements a throttling mechanism for logging appenders.

The existing parameters limiting the queue size (queueSize and discardingThreshold) do not prevent the application from flooding a remote logging service. Such services usually become expensive when their usage quota is exceeded.

The proposed feature aims to provide a safety net when application logging goes out of control.

Two new logging configuration parameters are introduced and can apply to any logging appender:

  • throttlingTimeWindow is a Duration defining a sliding window for throttling. By default, it is not set and throttling is disabled.
  • maxMessagesPerThrottlingTimeWindow is the maximum number of messages sent during the throttling time window. Once this number is reached, messages are silently discarded until there is room for new messages in the sliding time window.

@nickbabcock (Contributor) commented Jun 11, 2018

I haven't looked too closely at the impl, but have you tested to see if Guava's RateLimiter is sufficient?

@ochedru (Contributor, author) commented Jun 11, 2018

The problem with Guava's RateLimiter is that it will not allow small bursts of messages without blocking, so I went for a simple implementation based on a ring buffer of timestamps. It never blocks; messages are only discarded once the time window is "full".
I still have to fix the tests in the PR, though.
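
For illustration, a minimal sketch of the ring-buffer idea (not the PR code; the class and field names are placeholders):

// Sliding-window throttle backed by a ring buffer of timestamps.
// Illustrative sketch only, not the PR implementation.
class SlidingWindowThrottle {
    private final long[] timestamps;   // one slot per allowed message in the window
    private final long windowNanos;
    private int index = 0;             // points at the oldest timestamp

    SlidingWindowThrottle(int maxMessages, long windowNanos) {
        this.timestamps = new long[maxMessages];
        this.windowNanos = windowNanos;
    }

    // Returns true if the message may be sent; never blocks.
    synchronized boolean tryAcquire() {
        long now = System.nanoTime();
        if (now - timestamps[index] < windowNanos) {
            return false;              // window is full: silently discard
        }
        timestamps[index] = now;       // overwrite the oldest slot
        index = (index + 1) % timestamps.length;
        return true;
    }
}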

@nickbabcock (Contributor):
Oh man, now I'm kinda torn between the two philosophies. I have an example in my head:

Let's say you set a throttle of 100 messages over 10 seconds (i.e. an average of 10 messages a second), but the actual log rate is 100 per second.

With this impl, you see all 100 messages for the first second, but the logs are then silent for 9 seconds.

With Guava, there is no silent period, but if the log rate of 100 per second lasts only 1 second and subsequently drops to 0, then Guava will drop 90% of the logs in that 10-second period.

So each impl appears to have pros and cons. But in my opinion, the Guava RateLimiter seems in better control of bursts -- especially if the infrastructure hosting the logs is measured / charged in ops, the RateLimiter would ensure those ops weren't exceeded. There is also no period of time with the RateLimiter where all logs are dropped until the next window.

Can you comment as to why you prefer one side over the other, so we get the whole story 😄

@ochedru (Contributor, author) commented Jun 11, 2018

Your approach using ops makes sense as well. However, I have two problems with RateLimiter:

  • Guava will block temporarily even if we are under the limit on average. For example, if the limit is 10 logs per second but the application logs 5 messages in a row each second, acquiring those 5 permits will take about 0.5 seconds with the rate limiter (see the sketch after this list).
    Ideally, we would implement throttling in the worker thread, and this would not be a problem. But that is not possible because the worker code is deep inside logback.
    That is why I have to throttle before enqueuing the message. At this point, I think it is not acceptable to wait: blocking there would make the whole async logging useless. My throttle implementation decides, without waiting, whether the incoming message is kept and sent to the queue or discarded.
    (Yes, RateLimiter has the tryAcquire method to avoid blocking, but then we will likely discard messages even though we are under the limit on average.)

  • RateLimiter is configured with an integer number of permits per second. This is less flexible than a time window plus a number of messages. If I want to configure throttling for 50000 messages per day, I cannot do that directly with RateLimiter.
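
As a small illustration of the blocking concern (a sketch assuming Guava's RateLimiter; numbers are approximate):

// With a limit of 10 permits/second, acquiring 5 permits back-to-back blocks
// the calling application thread for roughly 0.4-0.5 seconds in total.
RateLimiter limiter = RateLimiter.create(10);
for (int i = 0; i < 5; i++) {
    limiter.acquire();   // may sleep ~0.1s per call after the first
}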

@nickbabcock (Contributor):
(Yes, RateLimiter has the tryAcquire method to avoid blocking, but then we will likely discard messages even though we are under the limit on average.)

If you're implying that the RateLimiter prefers under-utilization -- I believe the opposite is true:

You can see in this sample that even though I only allow one qps, I can immediately acquire 1000 seconds' worth:

final RateLimiter rateLimiter = RateLimiter.create(1);
final boolean acquired = rateLimiter.tryAcquire(1000);
assertThat(acquired).isTrue();

Subsequent requests will not be successfully acquired for the next 999 seconds, so for those 999 seconds the rate limiter is over-utilized. The example is a bit contrived (we're not trying to acquire 1000 at a time), but it is demonstrative.

On the topic of under-utilization and bursts, the RateLimiter rolls over some of that under-utilization to handle a burst:

final RateLimiter rateLimiter = RateLimiter.create(1);
Thread.sleep(1000);
final boolean acquired = rateLimiter.tryAcquire();
final boolean acquired2 = rateLimiter.tryAcquire();
assertThat(acquired).isTrue();
assertThat(acquired2).isTrue();

Without the sleep simulating under-utilization, the previous sample would fail. There is quite a lengthy design javadoc on utilization.

RateLimiter is configured with an integer number of permits per second

I don't think this is accurate. The javadoc has create taking in a double, and you can set the permitsPerSecond to be less than one. For your example, RateLimiter.create(50000.0 / 86400) should do it.


I'm not trying to push the use of RateLimiter here; I just thought it fit nicely into this situation. It has an O(1) space requirement, whereas the PR, as it stands, has an O(n) space requirement, where n is the number of messages per unit of time. That is a pitfall for those who may want to constrain the number of messages per day.

@ochedru (Contributor, author) commented Jun 12, 2018

My bad, I did not read the documentation thoroughly: RateLimiter is indeed suitable for our use case. Thank you for pointing me to this page!
I updated the PR, but I still have to fix the tests with respect to timing differences between the CI platforms... I will try to get rid of the ugly Thread.sleep() calls using a CountDownLatch or something.
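
For reference, the usual shape of that pattern (a generic sketch, not the actual test code; "executor" and "doAsyncWork" are placeholders):

// Wait for asynchronous work to finish without Thread.sleep().
CountDownLatch latch = new CountDownLatch(1);
executor.submit(() -> {
    doAsyncWork();
    latch.countDown();
});
assertThat(latch.await(5, TimeUnit.SECONDS)).isTrue();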

@@ -137,6 +144,17 @@ public void setDiscardingThreshold(int discardingThreshold) {
this.discardingThreshold = discardingThreshold;
}

@JsonProperty
@Nullable
public BigDecimal getMaxMessagesPerSecond() {

Review comment (Contributor):
Are there benefits to using a BigDecimal when it is converted into a double straightaway?

@@ -111,6 +114,8 @@

private int discardingThreshold = -1;

private double maxMessagesPerSecond = -1;

Review comment (Contributor):
I can't help but think that if someone set

maxMessagesPerSecond: -2

then this should fail fast instead of being ignored (e.g. the dash was a typo). So I was thinking of using a validation constraint for the range (0, ∞). Unfortunately, the Min annotation cannot represent an exclusive range, and DecimalMin isn't technically supported for double according to the bean validation spec -- but Hibernate Validator might support it.

so you might want to try something like:

    @DecimalMin(value = "0", inclusive = false)
    private Double maxMessagesPerSecond;

And write a test that ensures that validation fails if someone enters 0 or below. What do you think?
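
A rough sketch of such a test (hypothetical; it assumes a plain javax.validation Validator and that the appender factory exposes a setMaxMessagesPerSecond setter, which may differ from the actual PR code):

// Hypothetical validation test sketch; the factory class and setter name are placeholders.
Validator validator = Validation.buildDefaultValidatorFactory().getValidator();
ConsoleAppenderFactory<ILoggingEvent> factory = new ConsoleAppenderFactory<>();
factory.setMaxMessagesPerSecond(0.0);
assertThat(validator.validate(factory)).isNotEmpty();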


Reply (Contributor, author):
Sounds good. It must be @Nullable as well, I guess.

@jplock jplock added the feature label Jun 15, 2018
@jplock jplock added this to the 1.4.0 milestone Jun 15, 2018
@nickbabcock (Contributor) left a review comment:
Excellent, looks in great shape!

@@ -111,6 +115,10 @@

private int discardingThreshold = -1;

@Nullable
@DecimalMin(value="0", inclusive = false)
private Double maxMessagesPerSecond;

Review comment (Member):
does it make sense to support a fractional number of messages per second instead of making this an Integer or even an OptionalInt?


@nickbabcock (Contributor) replied on Jun 18, 2018:
Yeah, fractional makes sense: if you want 30 messages per minute, you'd set this to 0.5


Reply (Member):
It’s too bad a Duration wouldn’t work here somehow


Reply (Contributor):
I think you could be on to something; you could have it like this (may not be the best name):

messageThrottle: 1ms

Here you'd be throttling logging to 1 message per millisecond.


Reply (Member):
Exactly what I thought. Duration has a count and a time unit, which is what we need in this case, even though it's not the best name.

@ochedru (Contributor, author) commented Jun 19, 2018

Following your comments, I amended the PR to use a throttle defined by a Duration.

@nickbabcock (Contributor):
Excellent! Does anyone have suggestions for a name other than messageThrottle?

  • messageRate
  • messageFrequency
  • frequencyThrottle
  • frequencyDivisor

@jplock (Member) commented Jun 19, 2018

messageRate or messageFrequency might make sense. We should probably also add a comment (with an example) explaining how to interpret the Duration.

A Duration of 1m means only allow one message per minute, while a Duration of 30s means allow 30 messages per second

@nickbabcock (Contributor):
Duration of 30s means allow 30 messages per second

Slight slip-up, 30s would mean 1 message every 30 seconds 😄
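
To make that semantics concrete, a hedged sketch of how the messageRate Duration could map to a rate (it assumes io.dropwizard.util.Duration and a Guava RateLimiter underneath, which may differ from the merged implementation):

// One permit per messageRate interval: 1m -> 1/60 permits per second,
// 30s -> 1/30 permits per second. Illustrative sketch only.
Duration messageRate = Duration.minutes(1);
double permitsPerSecond = 1_000_000_000.0 / messageRate.toNanoseconds();
RateLimiter limiter = RateLimiter.create(permitsPerSecond);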

@jplock (Member) commented Jun 20, 2018

@nickbabcock this looks good to me 👍

Great job @ochedru!

@nickbabcock nickbabcock merged commit c448456 into dropwizard:master Jun 21, 2018
@jplock (Member) commented Jun 21, 2018

We also need to update the release notes with this change.

@nickbabcock (Contributor):
Updated in bfce31f

@jplock jplock modified the milestones: 1.4.0, 2.0.0 Jun 22, 2018
@nickbabcock nickbabcock mentioned this pull request Sep 14, 2018
joschi added a commit to dropwizard/logback-throttling-appender that referenced this pull request Dec 30, 2018
Extract `ThrottlingAppenderWrapper` from Dropwizard and keep it as a separate (tiny) project.

Refs dropwizard/dropwizard#2376
Refs dropwizard/dropwizard#2384
Refs dropwizard/dropwizard#2458