
KAFKA-5886: Implement KIP-91 delivery.timeout.ms #3849

Closed
wants to merge 1 commit

Conversation

@sutambe (Contributor) commented Sep 13, 2017

First shot at implementing KIP-91 delivery.timeout.ms. Summary:

  1. Added the delivery.timeout.ms config. Default 120,000 (see the config sketch after this list).
  2. Changed retries to MAX_INT.
  3. Batches may expire whether they are in-flight or not, so muted is no longer used in RecordAccumulator.expiredBatches.
  4. In some rare situations batch.done may be called twice. Attempted transitions from failed to succeeded are logged. Succeeded to succeeded is an error (exception, as before). Other transitions (failed --> aborted, aborted --> failed) are ignored.
  5. The old test in RecordAccumulatorTest is removed and replaced with three new tests: testExpiredBatchSingle, testExpiredBatchesSize, and testExpiredBatchesRetry. All of them verify that expiry is independent of muted.
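
For reference, a minimal sketch of opting into the new timeout from application code. The config key and its 120 s default come from this patch; the rest is generic producer setup and the broker address is illustrative:

        import java.util.Properties;
        import org.apache.kafka.clients.producer.KafkaProducer;
        import org.apache.kafka.clients.producer.Producer;

        Properties props = new Properties();
        props.put("bootstrap.servers", "localhost:9092");
        props.put("key.serializer", "org.apache.kafka.common.serialization.StringSerializer");
        props.put("value.serializer", "org.apache.kafka.common.serialization.StringSerializer");
        // Upper bound on the total time to report success or failure after send() returns.
        props.put("delivery.timeout.ms", "120000");
        // With KIP-91, retries can stay at the new default (effectively unlimited) and
        // delivery.timeout.ms becomes the knob that bounds total delivery time.
        Producer<String, String> producer = new KafkaProducer<>(props);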

@sutambe changed the title from "Implement KIP-91 delivery.timeout.ms" to "KAFKA-5886: Implement KIP-91 delivery.timeout.ms" on Sep 13, 2017
abortRecordAppends();
return expired;
boolean maybeExpire(long deliveryTimeoutMs, long now) {
if (deliveryTimeoutMs < (now - this.createdMs)) {
Contributor:
Should <= be used?

Contributor Author:
agreed

break;
}
synchronized (dq) {
// iterate over the batches and expire them if they have been in the accumulator for more than requestTimeOut
Contributor:
deliveryTimeoutMs should be mentioned

@apurvam (Contributor) commented Sep 13, 2017

Thanks for the PR, @sutambe. Looking over the changes, it seems that there are two cases from the KIP which don't appear to be covered:

  1. Setting the pollTimeout to be the expiryTime of the oldest batch being sent in the produce request (see the sketch below). I think we need this to make sure that we expire batches in a timely manner.
  2. Related to the previous point, the current patch doesn't seem to expire in-flight requests, which is another feature that the KIP seems to promise.

Have I missed something? Or are you planning on adding the functionality above?
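
For point 1, the idea is roughly the following (a sketch only; earliestInFlightExpiryMs and defaultPollTimeout are illustrative names, not code from this patch):

        // Bound how long the sender blocks in poll() so an in-flight batch can be
        // expired promptly once its delivery timeout elapses.
        long nextExpiryMs = accumulator.earliestInFlightExpiryMs(); // absolute time; Long.MAX_VALUE if none
        long pollTimeout = Math.min(defaultPollTimeout, Math.max(0L, nextExpiryMs - now));
        client.poll(pollTimeout, now);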

@becketqin (Contributor) left a comment
Thanks for the patch. Left some comments.

/** <code>delivery.timeout.ms</code> */
public static final String DELIVERY_TIMEOUT_MS_CONFIG = "delivery.timeout.ms";
private static final String DELIVERY_TIMEOUT_MS_DOC = "An upper bound on the time to report success or failure to send a message. "
+ "The time is measured from the point when send returns. "
Contributor:
It seems better to say Producer.send() instead of send.

@@ -76,13 +76,13 @@
private String expiryErrorMessage;
private boolean retry;

public ProducerBatch(TopicPartition tp, MemoryRecordsBuilder recordsBuilder, long now) {
this(tp, recordsBuilder, now, false);
public ProducerBatch(TopicPartition tp, MemoryRecordsBuilder recordsBuilder, long createTime) {
Contributor:
We are passing now everywhere else. Maybe we can just keep the argument name the same.

Contributor Author:
The actual argument is now. However, I prefer the formal parameter name createTime because it is an immutable value while constructing a batch; now is, by definition, changing.

@@ -232,6 +238,7 @@
.define(COMPRESSION_TYPE_CONFIG, Type.STRING, "none", Importance.HIGH, COMPRESSION_TYPE_DOC)
.define(BATCH_SIZE_CONFIG, Type.INT, 16384, atLeast(0), Importance.MEDIUM, BATCH_SIZE_DOC)
.define(LINGER_MS_CONFIG, Type.LONG, 0, atLeast(0L), Importance.MEDIUM, LINGER_MS_DOC)
.define(DELIVERY_TIMEOUT_MS_CONFIG, Type.LONG, 120 * 1000, atLeast(0L), Importance.MEDIUM, DELIVERY_TIMEOUT_MS_DOC)
Contributor:
Should we validate that delivery.timeout.ms is greater than request.timeout.ms?
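
If we do add that validation, a minimal sketch of what it could look like during producer construction (the check itself is hypothetical; the config constants already exist):

        long deliveryTimeoutMs = config.getLong(ProducerConfig.DELIVERY_TIMEOUT_MS_CONFIG);
        int requestTimeoutMs = config.getInt(ProducerConfig.REQUEST_TIMEOUT_MS_CONFIG);
        if (deliveryTimeoutMs < requestTimeoutMs) {
            // org.apache.kafka.common.config.ConfigException
            throw new ConfigException("delivery.timeout.ms should be greater than or equal to request.timeout.ms ("
                    + deliveryTimeoutMs + " < " + requestTimeoutMs + ")");
        }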

@@ -164,25 +164,29 @@ public void abort(RuntimeException exception) {
* @param exception The exception that occurred (or null if the request was successful)
*/
public void done(long baseOffset, long logAppendTime, RuntimeException exception) {
final FinalState finalState;
if (exception == null) {
final FinalState tryFinalState = (exception == null) ? FinalState.SUCCEEDED : FinalState.FAILED;
Contributor:
It is probably cleaner to have an explicit EXPIRED state.

@sutambe (Contributor Author), Sep 14, 2017:
I did some digging around. An expired batch's final state is FAILED. I don't feel great about adding yet another final state; we already have ABORTED and FAILED, and ProducerBatch.done would get even more complicated.

Contributor:
Maybe it's not a big deal, but I just want to call out that this is a behavior change. Currently the producer will throw an exception when transitioning from the FAILED state to another state for a reason other than expiration. If we change this logic, we may miss cases that are not failed by expiration but still get a state update twice. It may not be that important if we do not have programming bugs.

Personally I think it is better to clearly define the states of the batches even if additional complexity is necessary.

The comments should probably also cover the force-close case for completeness.

if (this.finalState.get() == FinalState.ABORTED) {
log.debug("ProduceResponse returned for {} after batch had already been aborted.", topicPartition);
return;
if (!this.finalState.compareAndSet(null, tryFinalState)) {
Contributor:
The logic here probably needs more comments. We may have the following three cases where the state of a batch has been updated before the ProduceResponse returns:

  1. A transaction abort happens. The state of the batches would have been updated to ABORTED.
  2. The producer is closed forcefully. The state of the batches would have been updated to ABORTED.
  3. The batch is expired while it is in flight. The state of the batch would have been updated to EXPIRED.

In the other cases, we should throw an IllegalStateException.
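
A sketch of how done() could document and enforce those cases on top of the compareAndSet (the exact states and messages here are illustrative, not the final patch):

        if (!this.finalState.compareAndSet(null, tryFinalState)) {
            FinalState previousState = this.finalState.get();
            if (previousState == FinalState.ABORTED || previousState == FinalState.FAILED) {
                // Already completed by a transaction abort, a forced close, or an in-flight
                // expiration (FAILED here; EXPIRED if a dedicated state is introduced).
                // The callback has fired, so just ignore the late ProduceResponse.
                log.debug("ProduceResponse returned for {} after batch had already been completed as {}.",
                        topicPartition, previousState);
                return;
            }
            // Completing a batch that already SUCCEEDED indicates a bug.
            throw new IllegalStateException("Batch for " + topicPartition + " completed twice, "
                    + previousState + " -> " + tryFinalState);
        }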

Contributor Author:
Please review the updated method documentation.

// Stop at the first batch that has not expired.
break;
}
synchronized (dq) {
Contributor:
The batches still need to be expired in order if max.in.flight.requests.per.connection is set to 1, so we probably still want to check whether the partition is muted or not. That said, if we guarantee that when RecordAccumulator.expiredBatches() returns a non-empty list, all the earlier batches have already been expired, we can remove the muted check here.

BTW, I did not see the logic for expiring an in-flight batch in the current patch. Am I missing something?

Contributor Author:
It's there now

Contributor Author:
@becketqin Added the muted check back
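
For reference, the muted check discussed above would sit in RecordAccumulator.expiredBatches roughly like this (a sketch over the surrounding diff, not the exact code; tp and dq come from the per-partition iteration):

        // Skip partitions with an in-flight request so that, with
        // max.in.flight.requests.per.connection=1, batches are still expired strictly in order.
        if (!muted.contains(tp)) {
            synchronized (dq) {
                Iterator<ProducerBatch> batchIterator = dq.iterator();
                while (batchIterator.hasNext()) {
                    ProducerBatch batch = batchIterator.next();
                    if (!batch.maybeExpire(deliveryTimeoutMs, now))
                        break; // Stop at the first batch that has not expired.
                    expiredBatches.add(batch);
                    batchIterator.remove();
                }
            }
        }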

Iterator<ProducerBatch> batchIterator = dq.iterator();
while (batchIterator.hasNext()) {
ProducerBatch batch = batchIterator.next();
boolean isFull = batch != lastBatch || batch.isFull();
Contributor:
isFull is no longer used.

Contributor:
Still not used.

@apurvam (Contributor) commented Sep 14, 2017

Heads up @sutambe, the following PR just got merged and may have some conflicts with the current patch: #3743

There shouldn't be any impact on the logic, however.

@ijuma (Contributor) commented Sep 18, 2017

Friendly reminder that the feature freeze is this Wednesday.

@becketqin (Contributor):
@ijuma Just want to check. Do you think this feature is a "minor" feature?

@ijuma (Contributor) commented Sep 19, 2017

@becketqin, it is possible to classify this as a minor feature, but the fact that it affects a core part of the Producer puts it in a bit of a grey area. If the PR is almost ready and we miss the feature freeze, my take is that it would be OK to merge it by the end of this week. Later than that and it seems a bit risky.

It's a bit worrying that the merge conflicts haven't been resolved since last week.

@sutambe (Contributor Author) commented Sep 19, 2017

@ijuma @becketqin I have a new PR, but after a rebase I have to fix one more test. Working on that now.

@ijuma (Contributor) commented Sep 19, 2017

Thanks @sutambe!

@sutambe (Contributor Author) commented Sep 19, 2017

@apurvam It's not clear to me why testExpiryOfFirstBatchShouldCauseResetIfFutureBatchesFail and testExpiryOfFirstBatchShouldNotCauseUnresolvedSequencesIfFutureBatchesSucceed are failing. It looks like a batch that should be in IncompleteBatches isn't there. Any thoughts?

@@ -271,7 +276,8 @@ private long sendProducerData(long now) {
}
}

List<ProducerBatch> expiredBatches = this.accumulator.expiredBatches(this.requestTimeout, now);
List<ProducerBatch> expiredBatches = this.accumulator.expiredBatches(this.deliveryTimeoutMs, now);
boolean needsTransactionStateReset = false;
Contributor:
This variable can be dropped.

@tedyu (Contributor) commented Sep 19, 2017

I added the following to testExpiryOfFirstBatchShouldCauseResetIfFutureBatchesFail before the first sender.run() call

        Sender sender = new Sender(logContext, client, metadata, this.accumulator, true, MAX_REQUEST_SIZE, ACKS_ALL, 10,
            new Metrics(), new SenderMetricsRegistry(), time, REQUEST_TIMEOUT, DELIVERY_TIMEOUT, 50, transactionManager, apiVersions);

The test still fails.

@apurvam (Contributor) commented Sep 19, 2017

@sutambe where are those tests failing? The latest PR builder suggests that the clients and core tests all passed.

@sutambe (Contributor Author) commented Sep 19, 2017

@apurvam @ijuma @becketqin The Sender and RecordAccumulator tests are passing now. The failing tests are Connect tests that are unrelated.

@apurvam (Contributor) commented Sep 19, 2017

@sutambe I don't think the test failures are irrelevant, since the same 3 tests failed in all the runs. Further, the cause of the failure is:

java.lang.AssertionError: 
  Unexpected method call Listener.onFailure(job-0, org.apache.kafka.common.KafkaException: Failed to construct kafka producer):

I think their mocks may need to be updated to account for the new configs and attendant ConfigExceptions.

List<ProducerBatch> expiredBatches = new ArrayList<>();

// Expiration of inflight batches happens here in the order of createdMs if the final state of the
// batch in not known by (batch's creation time + deliveryTimeoutMs).
Contributor:
in not -> is not

Map.Entry<Long, List<ProducerBatch>> expiredInflightBatches = inFlightBatches.pollFirstEntry();
for (ProducerBatch batch: expiredInflightBatches.getValue()) {
if (batch.getFinalState() == null &&
batch.maybeExpire(deliveryTimeoutMs, now)) {
Contributor:
The check 'if (deliveryTimeoutMs <= (now - this.createdMs))' inside maybeExpire() would be true.
Looks like another method could be created inside ProducerBatch which expires the batch.

Contributor Author:
maybeExpire has a side effect of setting errorMessage internally, hence calling it again in the if.

Contributor:
Understood.
That part can be refactored; the goal is to reduce unnecessary comparisons.
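
One way to split the check from the side effect, along the lines suggested (a sketch; the names hasReachedDeliveryTimeout and expire are hypothetical here):

        // Pure check with no side effects, cheap enough to call from the hot path.
        boolean hasReachedDeliveryTimeout(long deliveryTimeoutMs, long now) {
            return deliveryTimeoutMs <= now - this.createdMs;
        }

        // Unconditionally marks the batch as expired; callers that already performed the
        // check above do not pay for a second timestamp comparison.
        void expire(long now) {
            this.expiryErrorMessage = (now - this.createdMs) + " ms has passed since batch creation";
            abortRecordAppends();
        }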

Contributor Author:
@apurvam Those tests don't even compile or run on my machine. What's up with those tests?

Contributor:
They can't construct a Kafka producer with the changes made in this PR.

Contributor:
Assuming inFlightBatches is a TreeSet as suggested above, this code can be simplified to:

        while (!inFlightBatches.isEmpty() &&
               inFlightBatches.first().maybeExpire(deliveryTimeoutMs, now)) {
            expiredBatches.add(inFlightBatches.pollFirst());
        }

@becketqin (Contributor) left a comment
@sutambe Thanks for updating the patch. Made a pass on the non-test files and left some comments. Will review the tests tomorrow. We may need a quick turnaround to get this into 1.0.0.

} else {
// A SUCCESSFUL batch must not succeed again.
Contributor:
Is this comment accurate? The new state is not necessarily SUCCEEDED.


* <li> the batch is not in retry AND request timeout has elapsed after it is ready (full or linger.ms has reached).
* <li> the batch is in retry AND request timeout has elapsed after the backoff period ended.
* </ol>
* Expire the batch if it no outcome is known in within delivery.timeout.ms.
Contributor:
Some typos in this comments. "Expire the batch if no outcome is known within delivery.timeout.ms"

@@ -85,6 +90,10 @@
private int drainIndex;
private final TransactionManager transactionManager;

// A mapping of batch's creation time to the batch for quick access of the oldest batch
// and lookup in the order of creation
private final NavigableMap<Long, List<ProducerBatch>> inFlightBatches;
Contributor:
Does this have to be a per-partition Map? Intuitively we just need a TreeSet<ProducerBatch> with a comparator?

Contributor:
Apparently my understanding of TreeSet is not accurate. It uses the comparator to decide whether the entries are the same or not. We can use a TreeMap<Long, Set> then. We may also want to bucket the timestamps a little, say by one second, to avoid a huge number of Sets being created for each ms in the TreeMap.

Contributor:
I was thinking about this too. Using milliseconds as the unit for the Map key is not prudent.

After the switch to seconds as the unit, we may need to check the two adjacent buckets keyed by ts-1 (sec) and ts+1 (sec).
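
A sketch of that bucketing idea, keyed by one-second buckets of the creation time (structure and names are illustrative, not code from the patch):

        // Hypothetical second-granularity index of in-flight batches by creation time.
        private final NavigableMap<Long, List<ProducerBatch>> inFlightBatches = new TreeMap<>();

        private void trackInFlight(ProducerBatch batch) {
            long bucket = batch.createdMs / 1000;
            inFlightBatches.computeIfAbsent(bucket, k -> new ArrayList<>()).add(batch);
        }

        // A whole bucket is certainly expired once even its newest possible createdMs has passed
        // the delivery timeout; the boundary bucket still needs a per-batch check against createdMs.
        private boolean bucketExpired(long bucket, long deliveryTimeoutMs, long now) {
            return (bucket + 1) * 1000 + deliveryTimeoutMs <= now;
        }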

@sutambe (Contributor Author), Sep 21, 2017:
As we discussed, TreeSet does not cut it. The naming is consistent: a TreeSet is a set; it's just that the equality criterion is different.


*/
public void reenqueue(ProducerBatch batch, long now) {
batch.reenqueued(now);
List<ProducerBatch> inflight = inFlightBatches.get(batch.createdMs);
Contributor:
This logic would become inFlightRequests.remove(batch) when a TreeSet is used for this.

@@ -572,6 +616,16 @@ public boolean hasUndrained() {
batch.close();
size += batch.records().sizeInBytes();
ready.add(batch);
// Put this batch in the list of inflight batches because we might have
Contributor:
This would be just inFlightBatches.add(batch)

@@ -586,6 +640,10 @@ public boolean hasUndrained() {
return batches;
}

public Long getEarliestDeliveryTimeoutMs() {
Contributor:
We usually just use earliestDeliveryTimeout in Kafka.

@@ -132,6 +135,7 @@ public Sender(LogContext logContext,
SenderMetricsRegistry metricsRegistry,
Time time,
int requestTimeout,
long deliveryTimeoutMs,
Contributor:
It seems we don't need the deliveryTimeoutMs in the sender. It is only used as an argument passed to the accumulator. But the accumulator already has the config.

@@ -271,7 +276,7 @@ private long sendProducerData(long now) {
}
}

List<ProducerBatch> expiredBatches = this.accumulator.expiredBatches(this.requestTimeout, now);
List<ProducerBatch> expiredBatches = this.accumulator.expiredBatches(this.deliveryTimeoutMs, now);
Contributor:
This seems to be an existing issue. When we expire the batches here, the memory of those batches will be deallocated. It seems that we will deallocate the same batch again when the ProduceResponse returns?

@apurvam (Contributor) commented Sep 21, 2017

@sutambe I had a look at the failing Sender expiry tests. What is happening is that the tests were not modified to account for the fact that in-flight batches can be expired. In the tests, we used to expire a batch sitting in the accumulator but not the in-flight batch. When the in-flight batch returned, it would be re-queued.

But now, the test sends the response for the in-flight batch, but when it goes to requeue, it discovers that there shouldn't be an in-flight request and raises an exception.

The tests should be updated to account for the new behavior and make sure that the in-flight batch is not expired.

@apurvam (Contributor) commented Sep 21, 2017

Actually, the test reveals a bug in the current patch: the response for the in-flight batch which expired is not being handled correctly. We should not be trying to requeue it in the first place.

So we need two tests: one where the in-flight batch is not expired, and the current case. The reenqueue logic in the Sender needs to be updated to not reenqueue expired batches.
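
The fix would roughly be a guard like this in the Sender's response handling (a sketch; the surrounding completeBatch/canRetry/failBatch logic is simplified and the names are only indicative):

        if (batch.finalState() != null) {
            // The batch already expired (or was aborted) while in flight: its callback has
            // fired, so do not re-enqueue it; just drop the late response.
            log.debug("Ignoring produce response for already-completed batch for partition {}",
                    batch.topicPartition);
        } else if (canRetry(batch, response)) {
            reenqueueBatch(batch, now);
        } else {
            failBatch(batch, response);
        }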

@becketqin (Contributor) left a comment
@sutambe Thanks for updating the patch. A few comments:

  1. For a batch that got expired prematurely, we should not re-enqueue the batch (as @apurvam noticed), and we should not deallocate the memory twice.
  2. There are a few earlier review comments that are not addressed yet (such as unused local variables).
  3. We may want to revisit some of the tests and see if they still make sense.
  4. It would be good to add more unit tests to the patch. More specifically, we may want to have the following tests:
  • Test that a batch is correctly inserted into in.flight.batches when needed, and not inserted when not needed.
  • Test that the callback of an expired batch is fired in time when it is in-flight/not in-flight.
  • Test that when we expire an in-flight batch, we still wait for the request to finish before sending the next batch.
  • Test that we are not going to retry an already expired batch.
  • Test that when a batch is expired prematurely, the buffer pool is only deallocated after the response is returned (because we are still holding the batch until the response is returned).

* time is interpreted correctly as not expired when the linger time is larger than the difference
* between now and create time by {@link ProducerBatch#maybeExpire(int, long, long, long, boolean)}.
* A {@link ProducerBatch} configured using a timestamp preceding its create time is interpreted correctly
* as not expired by {@link ProducerBatch#maybeExpire(long, long)}.
*/
@Test
public void testLargeLingerOldNowExpire() {
Contributor:
This test has nothing to do with linger.ms anymore...

Contributor:
We should change the test name to something like testBatchExpiration, and the test below to testBatchExpirationAfterReenqueue.

* preceding its create time is interpreted correctly as not expired when the retryBackoff time is larger than the
* difference between now and create time by {@link ProducerBatch#maybeExpire(int, long, long, long, boolean)}.
* A {@link ProducerBatch} configured using a timestamp preceding its create time is interpreted correctly
* as not expired by {@link ProducerBatch#maybeExpire(long, long)}.
*/
@Test
public void testLargeRetryBackoffOldNowExpire() {
Contributor:
Similar to above we should rename this.

long lingerMs = Integer.MAX_VALUE / 16;
long retryBackoffMs = Integer.MAX_VALUE / 8;
int requestTimeoutMs = Integer.MAX_VALUE / 4;
long deliveryTimoutMs = Integer.MAX_VALUE;
Contributor:
typo: timeout

Contributor:
The typo is still there.

long lingerMs = 3000L;
int requestTimeout = 60;
long deliveryTimoutMs = 3200L;
Contributor:
typo: timeout

// If the final state of the batch is known (success or failed) just skip past it because
// the callback has already been invoked. This allows the clean up of the map in bulk right here.
if (batch.finalState() == null) {
batch.expire(now);
Contributor:
Should we still expire only the batches that are actually expired, instead of expiring the whole bucket? Having a second-granularity bucket does not prevent us from doing that, right?

@sutambe (Contributor Author) commented Dec 20, 2017

@apurvam @becketqin I updated the implementation to use ConcurrentMap<TopicPartition, Deque<ProducerBatch>>. Please take a look. I don't see the above test failures on my machine.

@becketqin (Contributor) left a comment
Thanks for updating the patch. Left some comments.

Iterator<ProducerBatch> batchIterator = dq.iterator();
while (batchIterator.hasNext()) {
ProducerBatch batch = batchIterator.next();
boolean isFull = batch != lastBatch || batch.isFull();
Contributor:
Still not used.

@@ -85,12 +89,21 @@
private int drainIndex;
private final TransactionManager transactionManager;

// A per-partition queue of batches ordered by creation time for quick access of the oldest batch
private final ConcurrentMap<TopicPartition, PriorityQueue<ProducerBatch>> soonToExpireInFlightBatches;
Contributor:
We don't need a PriorityQueue for this because the batches in the RecordAccumulator are already in order. So we just need to keep the draining order.

private void maybeRemoveFromSoonToExpire(ProducerBatch batch) {
PriorityQueue<ProducerBatch> soonToExpireQueue = soonToExpireInFlightBatches.get(batch.topicPartition);
if (soonToExpireQueue != null) {
soonToExpireQueue.remove(batch);
Contributor:
If we always insert the batch into the inFlightBatches queue and there is no bug, the batch to be removed should always be the first batch. Can we assert on that?

// If the expiry is farther than requestTimeoutMs, we don't have to keep
// track of this batch because it will either succeed or fail (due to
// request timeout) much sooner.
if (batch.soonToExpire(deliveryTimeoutMs, now, requestTimeoutMs)) {
Contributor:
The original reason we had this optimization is that we used to have a big sorted data structure, so avoiding inserting elements into it made sense. Given that the batch order in the RecordAccumulator is now already guaranteed, it seems we can just put all the drained batches into the inFlightBatches queue, which is simpler.
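
Under that simplification, drain() would just record every drained batch, e.g. (a sketch around the existing drain loop; the per-partition deque mirrors the ConcurrentMap<TopicPartition, Deque<ProducerBatch>> mentioned above):

        batch.close();
        size += batch.records().sizeInBytes();
        ready.add(batch);
        // Track every drained batch. Order per partition is preserved because drain() takes
        // batches from the head of each deque, so no "soon to expire" filtering is needed.
        inFlightBatches
            .computeIfAbsent(batch.topicPartition, tp -> new ArrayDeque<ProducerBatch>())
            .addLast(batch);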

// We assume that the batch at the front of the deque will always be the next to expire.
// This may not be true if max.in.flight.requests.per.connection > 1 and retries happen.
// Watch for overflow in createdMs + deliveryTimeoutMs when deliveryTimeoutMs is Long.MAX_VALUE
nextBatchExpiryTimeMs = (first.createdMs + deliveryTimeoutMs < 0) ? nextBatchExpiryTimeMs
Contributor:
The while loop may break if the request size limit has been reached, so there is no guarantee that it will iterate over all the partitions. One alternative is to find the nextBatchExpiryTimeMs in expiredBatches().

void maybeUpdateNextExpiryTime(long now) {
if (now >= nextBatchExpiryTimeMs) {
if (soonToExpireInFlightBatches.isEmpty()) {
nextBatchExpiryTimeMs = Long.MAX_VALUE;
Contributor:
It seems intuitively this should be the earliest batch in the entire record accumulator?

@@ -277,7 +277,7 @@ private long sendProducerData(long now) {
}
}

List<ProducerBatch> expiredBatches = this.accumulator.expiredBatches(this.requestTimeout, now);
List<ProducerBatch> expiredBatches = this.accumulator.expiredBatches(now);
Contributor:
It seems we may release the memory for the expired batches before the response is returned. This means the underlying ByteBuffer is still referenced by the ProducerBatch instance in the inFlightRequests. I am not sure if this would cause any problem, but it seems a little dangerous.

sender.run(time.milliseconds()); // send request
assertEquals(1, client.inFlightRequestCount());

Map<TopicPartition, ProduceResponse.PartitionResponse> responseMap = new HashMap<>();
Contributor:
Is the response preparation needed in this case?

@apurvam (Contributor) commented Jan 16, 2018

retest this please

1 similar comment
@apurvam (Contributor) commented Jan 17, 2018

retest this please

@apurvam (Contributor) commented Jan 17, 2018

So the org.apache.kafka.clients.producer.internals.SenderTest.testMetadataTopicExpiry test has failed twice in a row with:

java.lang.ArrayIndexOutOfBoundsException
	at java.base/java.util.zip.CRC32C.update(CRC32C.java:151)
	at org.apache.kafka.common.utils.Checksums.update(Checksums.java:42)
	at org.apache.kafka.common.utils.Crc32C.compute(Crc32C.java:72)
	at org.apache.kafka.common.record.DefaultRecordBatch.writeHeader(DefaultRecordBatch.java:468)
	at org.apache.kafka.common.record.MemoryRecordsBuilder.writeDefaultBatchHeader(MemoryRecordsBuilder.java:357)
	at org.apache.kafka.common.record.MemoryRecordsBuilder.close(MemoryRecordsBuilder.java:311)
	at org.apache.kafka.clients.producer.internals.ProducerBatch.close(ProducerBatch.java:427)
	at org.apache.kafka.clients.producer.internals.RecordAccumulator.drain(RecordAccumulator.java:614)
	at org.apache.kafka.clients.producer.internals.Sender.sendProducerData(Sender.java:270)
	at org.apache.kafka.clients.producer.internals.Sender.run(Sender.java:238)
	at org.apache.kafka.clients.producer.internals.SenderTest.testMetadataTopicExpiry(SenderTest.java:473)

Given that these changes are on the same code, and given the consistent failure of this test, it is probably a regression. @sutambe can you reproduce the failure locally?

@apurvam (Contributor) commented Jan 17, 2018

Just looking at the stack trace and the test, it may be that an expired batch is being closed twice in some cases.
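
If that is the cause, one way to make the failure mode explicit is to make ProducerBatch.close() idempotent, e.g. (a hypothetical sketch; the closed flag does not exist in the current code and the real close() does more work):

        private boolean closed = false;

        public void close() {
            // Guard against closing twice, e.g. once when the batch is drained and once
            // when it is expired while in flight.
            if (closed)
                return;
            recordsBuilder.close();
            closed = true;
        }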

@hachikuji (Contributor):
@sutambe @becketqin It would be nice to unblock this. Can someone else pick up the work?

@becketqin (Contributor):
@hachikuji Yeah, this has been pending for too long. I have spoken to @sutambe and he said he still wants to finish the patch. He will figure out the ETA and see if that works.

@bbejeck (Contributor) commented May 1, 2018

@sutambe @becketqin is there any update on the status of this PR? It would be great if we could get this in the next release.

@Ishiihara (Contributor):
@guozhangwang @junrao @bbejeck @becketqin We also hit this issue when running Kafka Streams library with some high volume output topics. It would be nice to get this moving and push it to the next release.

@radai-rosenblatt (Contributor):
Becket can't load this page for some reason (some weird issue with his GitHub profile?).
We are OK with you taking over this patch.

@sutambe (Contributor Author) commented May 9, 2018

@bbejeck @apurvam @becketqin @hachikuji I don't have any update since Dec last year. Sorry, the work has stalled and it has been very hard to find cycles for this effort. I don't mind if Confluent wants to take this effort forward. Better later than never.

Avoiding overflow when deliveryTimeoutMs is MAX_VALUE
per-partition map for tracking soon to expire batches
Updated tests
@Ishiihara (Contributor):
cc @abbccdda @yuyang08

@yuyang08 (Contributor) commented May 31, 2018

@sutambe I made some changes based on your pull request to fix the style check and test failures. Do you mind if I amend the change to this pull request? cc @becketqin @apurvam @hachikuji
https://github.com/yuyang08/kafka/commit/69fc79a91d0556408c8037649f1e03aa56206ef2

@guozhangwang (Contributor):
@yuyang08 I'd suggest you create your own PR against Apache Kafka trunk and let the other reviewers continue reviewing that one.

@yuyang08 (Contributor) commented Jun 1, 2018

@guozhangwang Sure, will create a separate pull request.

@yuyang08 (Contributor):
@guozhangwang @apurvam @becketqin Created new PR #5270 for KAFKA-5886.

@ijuma (Contributor) commented Feb 18, 2019

This has been merged via a different PR, closing.

@ijuma ijuma closed this Feb 18, 2019