KAFKA-4586; Add purgeDataBefore() API (KIP-107) #2476
Conversation
@ijuma @ewencp @jjkoshy @radai-rosenblatt I have manually tested it successfully using the …
@lindong28, thanks for the PR. I probably won't have time to review before next week. cc @junrao as well since he reviewed the KIP.
Branch force-pushed from 1146e74 to ed11672
Branch force-pushed from ed11672 to 478300f
@jjkoshy @junrao @becketqin @ijuma @radai-rosenblatt I have added tests and the patch is fully ready for review. Would you have time to review this patch?
@lindong28 Thanks for the patch. It seems the patch has conflicts. Could you rebase?
@becketqin I thought it would take a week or more for the patch to be reviewed, and there would be conflicts again anyway, so I was going to rebase it after the first round of review. What is our general guideline for rebasing big patches? I can certainly rebase it now if you think it is useful.
Branch force-pushed from 478300f to 74b1229
@becketqin All conflicts have been resolved and all tests have passed. Thanks!
@lindong28 Thanks for updating the patch. I'll take a look. Usually, if there are multiple big patches in flight in parallel, the committers who are reviewing the code will hold back some of the patches to avoid unnecessary rebases.
@lindong28 Thanks for the patch. Left some comments.
It looks like the patch again has conflicts with trunk. To avoid frequent rebasing, let's finish a few iterations of review before we rebase.
if (!errors.isEmpty)
  error(s"Metadata request contained errors: $errors")

val (authorizedPartitions, unauthorizedPartitions) = purgeOffsets.partition{partitionAndOffset =>
The variable names seem a little misleading. Are all partitions without leader information unauthorized?
Agree. I have updated the names.
  response.cluster().leaderFor(partitionAndOffset._1) != null}

val unauthorizedPartitionResults = unauthorizedPartitions.mapValues( _ =>
  PurgeDataResult(PurgeResponse.INVALID_LOW_WATERMARK, Errors.UNKNOWN_TOPIC_OR_PARTITION.exception()))
Should we use the exception returned by the broker in this case?
Good point. I have updated the code to use the error from the MetadataRequest.
val observedResults = futures.flatMap{ future =>
  val elapsed = time.milliseconds() - start
  remaining = timeoutMs - elapsed
  if (remaining > 0 && client.poll(future, remaining)) future.value()
When client.poll(future, remaining) returns true, the future may contain either a value (succeeded) or an error (failed). If the future has an error, calling future.value() will throw an exception. It seems better to return the full results to the users even if some of the requests failed, so the users can tell which partitions failed to purge.
This should not be a problem for purgeDataBefore(), because the futures provided to the CompositeFuture have been constructed in such a way that they never raise exceptions. Each of those futures calls future.complete(result) in onFailure, where the result carries the error information.
I agree it would make CompositeFuture more useful if this class handled the conversion of errors to results and returned the full results to the user as you suggested. But I don't have a good way to do that now because CompositeFuture doesn't know the type of the return value -- it currently uses the type parameter T.
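To illustrate the pattern described in this reply, here is a minimal Scala sketch. It uses scala.concurrent.Future in place of Kafka's internal request futures and a simplified PurgeDataResult stand-in; both names are assumptions for the example, not the actual classes in the patch. Failures are folded into the result value, so reading the completed future never throws.

import scala.concurrent.{ExecutionContext, Future}

object NeverThrowingFutures {
  // Simplified stand-in for the PR's result type: the error travels inside
  // the value, so the outer future always completes successfully.
  case class PurgeDataResult(lowWatermark: Long, error: Option[Throwable])

  // Wrap a future that may fail into one that always succeeds with a result
  // carrying the error information, mirroring the onFailure -> complete(result)
  // construction described above.
  def neverThrowing(inner: Future[Long])(implicit ec: ExecutionContext): Future[PurgeDataResult] =
    inner
      .map(lowWatermark => PurgeDataResult(lowWatermark, None))
      .recover { case e => PurgeDataResult(-1L, Some(e)) } // -1L stands in for INVALID_LOW_WATERMARK
}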
}

// send requests over network to brokers
client.poll(0)
It seems a single consumerNetworkClient.poll(0) cannot guarantee that all the requests are sent out. Also, the interface might be a little awkward: after purgeDataBefore() returns, the users have to keep calling client.poll(), otherwise the futures will never be completed. I am wondering how a user would use the asynchronous purge in this case. At the very least we should document this clearly.
Great point. I have updated the AdminClient to create its own thread to do client.poll(retryBackoffMs). I find it necessary for the AdminClient to have its own thread in order to support both sync and async operations.
I have also added testLogStartOffsetAfterAsyncPurge() to validate the async purge operation.
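As a rough sketch of the "AdminClient owns its own poll thread" idea discussed here, under the assumption of a hypothetical NetworkClientLike trait (the real patch uses Kafka's network client, and the thread name below is made up):

import java.util.concurrent.atomic.AtomicBoolean

// Hypothetical minimal interface standing in for the network client used
// by the AdminClient; only poll() matters for this sketch.
trait NetworkClientLike {
  def poll(timeoutMs: Long): Unit
}

// A dedicated thread keeps polling so that futures returned by async calls
// complete without the caller having to poll, and sync calls can simply
// block on their future.
class BackgroundPoller(client: NetworkClientLike, retryBackoffMs: Long) {
  private val running = new AtomicBoolean(true)

  private val thread = new Thread("admin-client-network-thread") {
    override def run(): Unit = {
      while (running.get()) client.poll(retryBackoffMs)
    }
  }
  thread.setDaemon(true)
  thread.start()

  def shutdown(): Unit = {
    running.set(false)
    thread.join()
  }
}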
@Override
public String toString() {
    StringBuilder builder = new StringBuilder();
This could just be a string concatenation.
Is there any negative impact to using StringBuilder as compared to string concatenation? Using StringBuilder here lets us keep the same code style as the toString() of other requests such as ProduceRequest and LeaderAndIsrRequest.
Not much difference; just for readability (see below) we can keep them the same as other requests.
http://stackoverflow.com/questions/1532461/stringbuilder-vs-string-concatenation-in-tostring-in-java
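For illustration only, the two styles being compared produce the same string; the class and field names below are hypothetical and the choice is purely about matching the existing toString() style of other request classes.

// Hypothetical request-info holder used only to contrast the two styles.
case class PurgeRequestInfo(topic: String, partition: Int, offset: Long) {

  // Builder style, matching other request classes' toString().
  def toStringWithBuilder: String = {
    val builder = new StringBuilder
    builder.append("PurgeRequestInfo(topic=").append(topic)
      .append(", partition=").append(partition)
      .append(", offset=").append(offset)
      .append(")")
    builder.toString
  }

  // Equivalent plain concatenation; readability is the only difference.
  def toStringWithConcat: String =
    "PurgeRequestInfo(topic=" + topic + ", partition=" + partition + ", offset=" + offset + ")"
}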
 * Return low watermark of the partition.
 */
def purgeRecordsOnLeader(offset: Long): Long = {
  inReadLock(leaderIsrUpdateLock) {
Should this be in a write lock?
I think we should use the read lock since this method doesn't update the leader or ISR of the partition, right?
Yes, you are right. Read lock here is fine.
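A simplified illustration of the locking choice settled on here; LogLike and PartitionLike are hypothetical stand-ins for the real Partition and Log classes, and inReadLock is reimplemented inline so the sketch is self-contained.

import java.util.concurrent.locks.ReentrantReadWriteLock

// Hypothetical stand-in for the log: purging only advances the start offset.
trait LogLike {
  def maybeIncrementLogStartOffset(offset: Long): Unit
  def logStartOffset: Long
}

class PartitionLike(log: LogLike) {
  private val leaderIsrUpdateLock = new ReentrantReadWriteLock()

  private def inReadLock[T](body: => T): T = {
    val lock = leaderIsrUpdateLock.readLock()
    lock.lock()
    try body finally lock.unlock()
  }

  // Purging does not modify the leader or ISR, so the read lock is enough;
  // it only prevents a concurrent leader/ISR change during the purge.
  def purgeRecordsOnLeader(offset: Long): Long = inReadLock {
    log.maybeIncrementLogStartOffset(offset)
    log.logStartOffset // return the new low watermark
  }
}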
 */
def maybeIncrementLogStartOffsetAndPurge(offset: Long) {
  // We don't have to write the log start offset to log-start-offset-checkpoint immediately.
  // The purgeOffset may be lost only if all replicas of this broker are shutdown
Is this comment accurate?
It was not accurate. I have updated the comment to replace "all replicas of this broker" with "in-sync replicas of this broker".
val localPurgeResults = purgeOnLocalLog(offsetPerPartition, metadataCache)
debug("Purge on local log in %d ms".format(time.milliseconds - sTime))

val purgeStatus = localPurgeResults.map { case (topicPartition, result) =>
Nit: could be mapValues.
I validated that we cannot use mapValues() here. This is because mapValues() returns a map view which maps every key of this map to f(this(key)). The resulting map wraps the original map without copying any elements. As a result, status.acksPending = true in the constructor of DelayedPurge becomes a no-op.
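A small self-contained Scala example of the mapValues pitfall explained above; the Status class and map contents are made up. mapValues returns a lazy view that re-applies the function on every access, so side effects inside it do not stick.

object MapValuesViewDemo extends App {
  // Made-up mutable status, playing the role of the per-partition purge status.
  class Status { var acksPending = false }

  val results = Map("tp-0" -> 1L, "tp-1" -> 2L)

  // mapValues builds a view: the function runs again on every lookup,
  // so each access returns a brand-new Status instance.
  val viaMapValues = results.mapValues { _ =>
    val s = new Status
    s.acksPending = true
    s
  }
  assert(!(viaMapValues("tp-0") eq viaMapValues("tp-0")))

  // map materializes the values once, so the same instance (and any
  // mutation made to it) is preserved across lookups.
  val viaMap = results.map { case (tp, _) =>
    val s = new Status
    s.acksPending = true
    tp -> s
  }
  assert(viaMap("tp-0") eq viaMap("tp-0"))
}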
  tryCompleteDelayedRequests()
}

def checkLowWatermarkReachOffset(requiredOffset: Long): (Boolean, Errors, Long) = {
Methods that return a three-element tuple may be a little hard to follow; maybe we can create a case class. At the very least we should document each field of the return value.
Good point. I have updated the code to make it more readable.
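One possible shape for that refactor, purely illustrative: the type and field names below are assumptions, not necessarily the ones used in the final patch.

// Named result type replacing the (Boolean, Errors, Long) tuple, so call
// sites read result.lowWatermarkReached instead of result._1.
case class LowWatermarkCheckResult(
  lowWatermarkReached: Boolean,  // has the low watermark reached requiredOffset?
  error: Option[Throwable],      // failure encountered while checking, if any
  lowWatermark: Long             // current low watermark of the partition
)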
// Default throttle time
private static final int DEFAULT_THROTTLE_TIME = 0;
// Default low watermark
private static final long DEFAULT_LOG_START_OFFSET = 0L;
Do we want to distinguish between NO_LOG_START_OFFSET and LOG_START_OFFSET = 0? Would it be clearer to define NO_LOG_START_OFFSET as -1?
I don't think it is necessary to distinguish between NO_LOG_START_OFFSET and LOG_START_OFFSET = 0. Is there any use case for NO_LOG_START_OFFSET?
In general, we want to identify the state of the system as clearly as possible. The follower should not take any action if the LOG_START_OFFSET on the broker is NO_LOG_START_OFFSET. But if the follower sees the leader returning a starting offset of 0 while the actual starting offset on the leader is not 0, this introduces confusion.
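A tiny sketch of the sentinel convention being argued for; the constant name and helper below are illustrative, not the actual field definitions. Using -1 for "no log start offset" keeps a real offset of 0 unambiguous.

object DeleteRecordsSentinels {
  // -1 explicitly means "no log start offset reported"; 0 always means a
  // real offset, so the follower never confuses the two states.
  val InvalidLowWatermark: Long = -1L

  def describe(logStartOffset: Long): String =
    if (logStartOffset == InvalidLowWatermark) "no log start offset reported"
    else s"log start offset = $logStartOffset"
}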
@becketqin Thanks so much for taking the time to review the patch! Can you check whether the updated patch has addressed your comments?
It seems that right now, for a compacted topic, the base offset of the first segment is always 0. So, the patch is fine.
@lindong28 : Thanks for the patch. LGTM. @becketqin : Do you want to make another pass and then merge?
@lindong28 Thanks for the patch. LGTM except a very rare corner case.
new Field("timeout", INT32, "The maximum time to await a response in ms.")); | ||
|
||
public static final Schema DELETE_RECORDS_RESPONSE_PARTITION_V0 = new Schema(new Field("partition", INT32, "Topic partition id."), | ||
new Field("low_watermark", INT64, "Smallest available offset"), |
Can we be clearer about this field? In the FetchResponse we have log_start_offset, which has almost the same description. Maybe here we can say "The smallest available offset across all live replicas."
Sure. Updated now.
public static final Schema DELETE_RECORDS_RESPONSE_PARTITION_V0 = new Schema(new Field("partition", INT32, "Topic partition id."),
        new Field("low_watermark", INT64, "Smallest available offset"),
        new Field("error_code", INT16, "The error code for the given topic."));
for the given topic => for the given partition.
Good catch. Fixed now.
    val recoveryPoints = this.logsByDir.get(dir.toString)
    if (recoveryPoints.isDefined) {
      this.recoveryPointCheckpoints(dir).write(recoveryPoints.get.mapValues(_.recoveryPoint))
    }
  }

  /**
   * Checkpoint log start offset for all logs in provided directory.
   */
  private def checkpointLogStartOffsetsInDir(dir: File): Unit = {
There seems to be a very rare case that may result in message loss. Assuming there is only one replica, consider the following sequence:
- The user deletes a topic; we do not delete the log start offset from the checkpoint file.
- The topic is created again with the same name, and the partitions happen to land on the same broker.
- The user produces some messages, and before the log start offset is checkpointed, the broker goes down.
- Now when the broker restarts, the old checkpointed log start offset may be applied to the newly created topic, which may cause messages that have been produced into the log to become unavailable to users.
This is a very rare corner case, though.
Good point. I fixed the problem by always doing checkpointLogStartOffsetsInDir(removedLog.dir.getParentFile) when a partition is deleted. The overhead will probably be smaller than checkpointing the cleaner offset, which we already do every time we delete a partition.
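A hedged sketch of that fix, with simplified method names standing in for LogManager's: re-checkpoint the log-start-offset file for the affected log directory as soon as a partition's log is removed, so a stale entry cannot outlive the topic.

import java.io.File

trait CheckpointingLogManager {
  // Writes the log-start-offset checkpoint file for all logs in `dir`.
  def checkpointLogStartOffsetsInDir(dir: File): Unit

  // Called when a partition is deleted; `removedLogDir` is the deleted log's
  // directory. Checkpointing immediately (instead of waiting for the periodic
  // task) prevents a stale start offset from being applied to a recreated
  // topic with the same name after a broker restart.
  def onLogDeleted(removedLogDir: File): Unit =
    checkpointLogStartOffsetsInDir(removedLogDir.getParentFile)
}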
@junrao @becketqin All integration tests have passed except …
@lindong28 Thanks for updating the patch. Merged to trunk.