
KAFKA-3720: Change TimeoutException to BufferExhaustedException when no memory can be allocated for a record within max.block.ms #8399

Merged: 4 commits merged into apache:trunk on Apr 10, 2020

Conversation

soenkeliebau
Contributor

Change TimeoutException to BufferExhaustedException when no memory can be allocated for a record within max.block.ms

Refactored BufferExhaustedException to be a subclass of TimeoutException so existing code that catches TimeoutExceptions keeps working.

Added handling to count these exceptions in the "buffer-exhausted-records" metric.
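To illustrate the compatibility argument, here is a minimal sketch (not part of the PR; broker address, topic, and serializer settings are illustrative) of how existing and new client code interact with the new subclass relationship:

import java.util.Properties;
import java.util.concurrent.Future;

import org.apache.kafka.clients.producer.BufferExhaustedException;
import org.apache.kafka.clients.producer.KafkaProducer;
import org.apache.kafka.clients.producer.ProducerRecord;
import org.apache.kafka.clients.producer.RecordMetadata;
import org.apache.kafka.common.errors.TimeoutException;

public class CompatibilitySketch {
    public static void main(String[] args) {
        Properties props = new Properties();
        props.put("bootstrap.servers", "localhost:9092"); // illustrative broker address
        props.put("key.serializer", "org.apache.kafka.common.serialization.StringSerializer");
        props.put("value.serializer", "org.apache.kafka.common.serialization.StringSerializer");

        try (KafkaProducer<String, String> producer = new KafkaProducer<>(props)) {
            Future<RecordMetadata> result =
                producer.send(new ProducerRecord<>("test-topic", "value"));
        } catch (BufferExhaustedException e) {
            // New code can catch the specific exception when no memory could be
            // allocated for the record within max.block.ms ...
        } catch (TimeoutException e) {
            // ... while existing code that only catches TimeoutException still sees
            // buffer exhaustion here, because BufferExhaustedException extends it.
        }
    }
}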

Test Strategy
There were existing test cases to check this behavior, which I refactored.
I then added an extra case to check that the expected exception is actually thrown, which was not covered by the existing tests.

Committer Checklist (excluded from commit message)

  • Verify design and implementation
  • Verify test coverage and CI build status
  • Verify documentation (including upgrade notes)

…ncrease buffer-exhausted-records metric when no memory can be allocated for a record within max.block.ms.

/**
* This exception is thrown if the producer is in non-blocking mode and the rate of data production exceeds the rate at
* which data can be sent for long enough for the allocated buffer to be exhausted.
*/
-public class BufferExhaustedException extends KafkaException {
+public class BufferExhaustedException extends TimeoutException {
Contributor

Could you add a comment for this inheritance? This change is for keeping compatibility.

Contributor Author

Done

Contributor

Is this for backwards compatibility? It seems to me that part of the reason is to allow users to handle all timeout-related exceptions with one catch clause.

Contributor

Is this for backwards compatibility?

Client code that catches TimeoutException to handle memory issues would break if BufferExhaustedException did not extend TimeoutException.

Contributor Author

It is a bit of a chicken-and-egg question, I guess. The main reason for adding it in this case was compatibility (at least in my mind), but what you describe is a very welcome side effect.
Happy to add a clarification to the comment if we feel this makes sense.

@chia7712
Contributor

chia7712 commented Apr 1, 2020

On another note, the purpose of this PR is quite different from the issue title (Remove BufferExhaustedException from doSend() in KafkaProducer). Please update the title of the issue after this PR is accepted.

@chia7712
Contributor

chia7712 commented Apr 1, 2020

Tracing the related PR #1417, I noticed the following significant comment:

It sounds more semantically correct. Callback and Future should be used for providing the result of asynchronous (background) processing. However, these two TimeoutExceptions occur while a KafkaProducer is still doing synchronous (foreground) processing, and that (the producer has to do some foreground work before it appends the record to the accumulator) is the reason why a caller of producer#send is forced to wait until the result turns out, so the caller should receive the result of that call in a synchronous way.

WDYT?

@soenkeliebau
Contributor Author

Hi @chia7712, thanks for your comment!

I've added a comment on why the Exception extends TimeoutException and generally updated the javadoc for that class a little.

Renaming the jira should not be an issue.

Regarding your comment on whether to catch and rethrow the exception or wrap it in the Future, I've written my thoughts on that in the jira covering this PR. In short, I agree that it would make sense to rethrow, but that could potentially break user implementations, as it changes the current functionality.

… used to test two different failure scenarios that both used to throw TimeoutException, but now one of them throws BufferExhaustedException.
@ijuma
Contributor

ijuma commented Apr 1, 2020

@soenkeliebau In the original PR (#1417), there was a discussion about whether this exception should be thrown within the future or not. What's your reasoning for choosing the option you did?

@soenkeliebau
Contributor Author

@ijuma My main reason was that I wanted to keep current behavior as it is.
I agree with the original discussion that it would make sense to rethrow the exception, but current user code that checks the Futures outside the try-catch block which contains the call to .send() would miss the exception if we made that change.
I feel like this would be a potentially breaking change that would require proper communication.

Granted, the number of people this might affect would probably be extremely small, but still…

@ijuma
Contributor

ijuma commented Apr 1, 2020

@soenkeliebau Is that true? We currently throw TimeoutException from send if the metadata is not available within max.block.ms, so the user code already has to handle that.

@soenkeliebau
Contributor Author

soenkeliebau commented Apr 1, 2020

That is true, but the code in that catch block can currently only be triggered by a metadata refresh taking too long. If we rethrow, we add a second scenario in which that block might be executed, while bypassing the catch (TimeoutException e) block further down that has been tailored to this scenario.

Something like this for example (full disclosure, haven't tested it):

Future<RecordMetadata> sendResult = null;
try {
    sendResult = producer.send(new ProducerRecord<>(topic, null, "test"));
} catch (TimeoutException e) {
    System.out.println("Metadata refresh took too long!");
    // ...
}
// ...
producer.flush();
try {
    sendResult.get();
} catch (TimeoutException e) {
    System.out.println("Couldn't allocate memory for record within max.block.ms");
} catch (Exception e) {
    e.printStackTrace();
}

We'd have the same issue with Callbacks: all code looking for TimeoutException in a Callback would be bypassed if we rethrow.

I freely admit that I'm probably harping on about fringe cases here!! If everybody else thinks this is a non-issue I am happy to accept that!
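For completeness, a hypothetical callback in the same vein as the snippet above (again untested and illustrative; producer, topic, and message value are assumptions) that would be bypassed for buffer exhaustion if the exception were thrown in the foreground rather than delivered asynchronously:

// Hypothetical helper; imports match the snippet above.
static void sendWithCallback(KafkaProducer<String, String> producer, String topic) {
    producer.send(new ProducerRecord<>(topic, null, "test"), (metadata, exception) -> {
        if (exception instanceof TimeoutException) {
            // Only reached if the timeout is delivered through the callback; this
            // branch is skipped entirely if send() throws in the foreground instead.
            System.out.println("Record timed out: " + exception.getMessage());
        } else if (exception != null) {
            exception.printStackTrace();
        }
    });
}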

@ijuma
Contributor

ijuma commented Apr 1, 2020

@soenkeliebau Metadata refresh taking too long and allocation being blocked for too long are pretty similar. The timeout config for both is even the same (max.block.ms). I'd probably have all the exceptions within the Future if we were designing this today. But given what we have, it seems like things that block in the foreground should probably all throw the exception in the foreground.

@hachikuji @junrao Do you have any thoughts on this?

@soenkeliebau
Contributor Author

@ijuma As I said, I agree with you that this should throw in the foreground, I just wanted to be conservative about any chance of breaking user code, maybe too conservative :)

If we make sure to mention this in the notable changes and don't include it in a bugfix release, that should be fine, I think.

Contributor

@junrao junrao left a comment

@soenkeliebau : Thanks for the PR. Looks good overall. Just a couple of comments below.

@@ -947,6 +947,8 @@ private void throwIfProducerClosed() {
             // for other exceptions throw directly
         } catch (ApiException e) {
             log.debug("Exception occurred during message send:", e);
+            if (e instanceof BufferExhaustedException)
+                this.metrics.sensor("buffer-exhausted-records").record();
Contributor

Could we do this inside RecordAccumulator, which is where the "buffer-exhausted-records" sensor is defined?

Contributor Author

@junrao, a counter-proposal, if I may: could we move the sensor definition to BufferPool and record the metric there, as that is where the exception is originally thrown?
In RecordAccumulator we would have to define a new catch block just for this purpose, whereas BufferPool has a dedicated conditional ready to use.
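For reference, a standalone sketch of the sensor pattern being discussed, using the public Kafka Metrics API; apart from "buffer-exhausted-records", which appears in the diff above, the metric name, group, and description are illustrative rather than the PR's exact code:

import org.apache.kafka.common.metrics.Metrics;
import org.apache.kafka.common.metrics.Sensor;
import org.apache.kafka.common.metrics.stats.Rate;

public class SensorSketch {
    public static void main(String[] args) {
        Metrics metrics = new Metrics();

        // Register the sensor once, e.g. where the owning component is constructed ...
        Sensor bufferExhausted = metrics.sensor("buffer-exhausted-records");
        bufferExhausted.add(
            metrics.metricName("buffer-exhausted-rate", "producer-metrics",
                "Rate of record sends dropped due to buffer exhaustion (illustrative description)"),
            new Rate());

        // ... and record it where the allocation times out, right before
        // BufferExhaustedException is thrown.
        bufferExhausted.record();
    }
}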

Contributor

Yes, that seems reasonable.

Contributor Author

I've moved things around to this effect.

Contributor Author

ping @junrao

@@ -947,6 +947,8 @@ private void throwIfProducerClosed() {
             // for other exceptions throw directly
         } catch (ApiException e) {
             log.debug("Exception occurred during message send:", e);
+            if (e instanceof BufferExhaustedException)
Contributor

It's fine to keep compatibility for now. However, I do wonder if we should just throw ApiException to the caller since the producer can block for max.block.ms. This needs a KIP discussion.

Contributor Author

@soenkeliebau soenkeliebau Apr 3, 2020

I haven't looked at it, but do all ApiExceptions that may be caught here observe max.block.ms?
Or do you mean just throw ApiException in case the buffer is exhausted?

Contributor

I was thinking that all retriable errors should observe max.block.ms, if they don't already. For non-retriable errors, just throw the exception immediately.

Contributor Author

I'll do some digging on this topic. But if I am not misunderstanding you, we can handle that in a separate change and merge this for now, right?

…here the exception originates.

Reordered imports in PlaintextProducerSendTest.
Contributor

@junrao junrao left a comment

@soenkeliebau : Thanks for the updated PR. One more comment below.

Contributor

@junrao junrao left a comment

@soenkeliebau : Just a couple of more comments on unit tests.

public void testBufferExhaustedExceptionIsThrown() throws Exception {
    BufferPool pool = new BufferPool(2, 1, metrics, time, metricGroup);
    pool.allocate(1, maxBlockTimeMs);
    pool.allocate(2, maxBlockTimeMs);
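An illustrative completion of the excerpt above, assuming JUnit's assertThrows and the fields (metrics, time, metricGroup, maxBlockTimeMs) defined elsewhere in BufferPoolTest; a sketch, not necessarily the exact committed code:

@Test
public void testBufferExhaustedExceptionIsThrown() throws Exception {
    // Pool of 2 bytes total; the first allocation leaves only 1 byte free.
    BufferPool pool = new BufferPool(2, 1, metrics, time, metricGroup);
    pool.allocate(1, maxBlockTimeMs);
    // Requesting 2 bytes can never be satisfied within maxBlockTimeMs,
    // so the allocation should surface as a BufferExhaustedException.
    assertThrows(BufferExhaustedException.class, () -> pool.allocate(2, maxBlockTimeMs));
}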
Contributor

This waits on a real Timer, so waiting 2 seconds in a unit test is too long. Perhaps try 10ms?

Contributor Author

Happy to do that, of course. The maxBlockTimeMs value affects the entire test class though, so I'd like to move that to a separate PR to make the change easier to trace in case tests become unstable because of it.

I ran a couple thousand runs with different values, and for me testBlockTimeout became unstable at a value of 10ms; as far as I can tell we are betting on a race condition in line 188. We allocate three bytes, start delayed deallocations, wait a little, and then hope that at least one deallocation took place by the time we check. That worked for 2000 ms, but apparently breaks sometimes for 10 ms.
It would be an easy fix to change the condition to 7 instead of 8, but I'm not sure how much value that assertion has after that.

Happy to discuss this further, but maybe we can first agree on whether a new PR makes sense. I think it makes sense to separate this out, to be honest.

@@ -171,14 +181,14 @@ public void testBlockTimeout() throws Exception {
         try {
             pool.allocate(10, maxBlockTimeMs);
             fail("The buffer allocated more memory than its maximum value 10");
-        } catch (TimeoutException e) {
+        } catch (BufferExhaustedException e) {
Contributor

Similar to the above, perhaps reduce maxBlockTimeMs to 10ms?

Contributor Author

Thanks @junrao, I answered in your related comment above.

Contributor

@junrao junrao left a comment

@soenkeliebau : Thanks for the reply. I agree that we can address maxBlockTimeMs in a separate PR. Could you file a jira for that? For this PR, LGTM

@junrao junrao merged commit e032a36 into apache:trunk Apr 10, 2020
@soenkeliebau
Contributor Author

Thanks for your help Jun!

@ijuma
Contributor

ijuma commented Aug 2, 2020

@soenkeliebau did you file the JIRA @junrao mentioned above? It would be good to reference it here.

@soenkeliebau
Contributor Author

@ijuma apologies for the delayed response

I did indeed open that pull request and an accompanying issue.

The PR has been merged by now, but it appears I forgot to close the ticket; I'll do that now.
