
KAFKA-4586; Add purgeDataBefore() API (KIP-107) #2476

Closed
wants to merge 6 commits

Conversation

lindong28
Member

No description provided.

@asfbot

asfbot commented Feb 1, 2017

Refer to this link for build results (access rights to CI server needed):
https://builds.apache.org/job/kafka-pr-jdk8-scala2.11/1398/
Test PASSed (JDK 8 and Scala 2.11).

@asfbot

asfbot commented Feb 1, 2017

Refer to this link for build results (access rights to CI server needed):
https://builds.apache.org/job/kafka-pr-jdk7-scala2.10/1395/
Test PASSed (JDK 7 and Scala 2.10).

@asfbot

asfbot commented Feb 1, 2017

Refer to this link for build results (access rights to CI server needed):
https://builds.apache.org/job/kafka-pr-jdk8-scala2.12/1395/
Test FAILed (JDK 8 and Scala 2.12).

@lindong28
Member Author

@ijuma @ewencp @jjkoshy @radai-rosenblatt I have manually tested it successfully using the ./bin/kafka-purge-data.sh script included in the patch. I am going to add unit tests and probably a ducktape integration test as well, but the core code should be ready for review. Would you have time to review the patch?

@asfbot

asfbot commented Feb 2, 2017

Refer to this link for build results (access rights to CI server needed):
https://builds.apache.org/job/kafka-pr-jdk8-scala2.11/1422/
Test PASSed (JDK 8 and Scala 2.11).

@asfbot

asfbot commented Feb 2, 2017

Refer to this link for build results (access rights to CI server needed):
https://builds.apache.org/job/kafka-pr-jdk7-scala2.10/1419/
Test PASSed (JDK 7 and Scala 2.10).

@asfbot

asfbot commented Feb 2, 2017

Refer to this link for build results (access rights to CI server needed):
https://builds.apache.org/job/kafka-pr-jdk8-scala2.12/1419/
Test PASSed (JDK 8 and Scala 2.12).

@ijuma
Contributor

ijuma commented Feb 2, 2017

@lindong28, thanks for the PR. I probably won't have time to review before next week. cc @junrao as well since he reviewed the KIP.

@asfbot

asfbot commented Feb 10, 2017

Refer to this link for build results (access rights to CI server needed):
https://builds.apache.org/job/kafka-pr-jdk8-scala2.11/1639/
Test FAILed (JDK 8 and Scala 2.11).

@asfbot

asfbot commented Feb 10, 2017

Refer to this link for build results (access rights to CI server needed):
https://builds.apache.org/job/kafka-pr-jdk8-scala2.11/1640/
Test FAILed (JDK 8 and Scala 2.11).

@asfbot

asfbot commented Feb 11, 2017

Refer to this link for build results (access rights to CI server needed):
https://builds.apache.org/job/kafka-pr-jdk8-scala2.12/1637/
Test FAILed (JDK 8 and Scala 2.12).

@asfbot

asfbot commented Feb 11, 2017

Refer to this link for build results (access rights to CI server needed):
https://builds.apache.org/job/kafka-pr-jdk7-scala2.10/1637/
Test FAILed (JDK 7 and Scala 2.10).

@asfbot

asfbot commented Feb 13, 2017

Refer to this link for build results (access rights to CI server needed):
https://builds.apache.org/job/kafka-pr-jdk8-scala2.11/1656/
Test FAILed (JDK 8 and Scala 2.11).

@lindong28
Member Author

@jjkoshy @junrao @becketqin @ijuma @radai-rosenblatt I have added tests and the patch is fully ready for review. Would you have time to review this patch?

@asfbot

asfbot commented Feb 13, 2017

Refer to this link for build results (access rights to CI server needed):
https://builds.apache.org/job/kafka-pr-jdk7-scala2.10/1653/
Test PASSed (JDK 7 and Scala 2.10).

@asfbot

asfbot commented Feb 13, 2017

Refer to this link for build results (access rights to CI server needed):
https://builds.apache.org/job/kafka-pr-jdk8-scala2.12/1653/
Test PASSed (JDK 8 and Scala 2.12).

@becketqin
Contributor

@lindong28 Thanks for the patch. It seems the patch has conflicts. Could you rebase?

@lindong28
Member Author

lindong28 commented Feb 13, 2017

@becketqin I thought it would take a week or more for the patch to be reviewed and there would be conflicts again anyway, so I was going to rebase it after the first round of review. What is our general guideline for rebasing big patches? I can certainly rebase it now if you think it is useful.

@lindong28
Member Author

@becketqin All conflicts have been resolved and all tests have passed. Thanks!

@asfbot

asfbot commented Feb 13, 2017

Refer to this link for build results (access rights to CI server needed):
https://builds.apache.org/job/kafka-pr-jdk7-scala2.10/1664/
Test FAILed (JDK 7 and Scala 2.10).

@asfbot

asfbot commented Feb 13, 2017

Refer to this link for build results (access rights to CI server needed):
https://builds.apache.org/job/kafka-pr-jdk8-scala2.11/1667/
Test PASSed (JDK 8 and Scala 2.11).

@asfbot

asfbot commented Feb 13, 2017

Refer to this link for build results (access rights to CI server needed):
https://builds.apache.org/job/kafka-pr-jdk8-scala2.12/1664/
Test PASSed (JDK 8 and Scala 2.12).

@becketqin
Contributor

@lindong28 Thanks for updating the patch. I'll take a look. Usually, if there are multiple big patches in flight in parallel, the committers reviewing the code will hold back some of the patches to avoid unnecessary rebases.

Contributor

@becketqin becketqin left a comment

@lindong28 Thanks for the patch. Left some comments.
It looks like the patch again has conflicts with trunk. To avoid frequent rebasing, let's finish a few iterations of review before we rebase.

if (!errors.isEmpty)
error(s"Metadata request contained errors: $errors")

val (authorizedPartitions, unauthorizedPartitions) = purgeOffsets.partition{partitionAndOffset =>
Contributor

The variable names seem a little misleading. Are all partitions without leader information unauthorized?

Member Author

Agree. I have updated the names.

response.cluster().leaderFor(partitionAndOffset._1) != null}

val unauthorizedPartitionResults = unauthorizedPartitions.mapValues( _ =>
PurgeDataResult(PurgeResponse.INVALID_LOW_WATERMARK, Errors.UNKNOWN_TOPIC_OR_PARTITION.exception()))
Contributor

Should we use the exception returned by the broker in this case?

Member Author

Good point. I have updated the code to use the error returned for the MetadataRequest.

val observedResults = futures.flatMap{ future =>
val elapsed = time.milliseconds() - start
remaining = timeoutMs - elapsed
if (remaining > 0 && client.poll(future, remaining)) future.value()
Contributor

When client.poll(future, remaining) returns true, the future may contain either a value (succeeded) or an error (failed). If the future has an error, calling future.value() will throw an exception. It seems better if we can return the full results to the users even if some of the requests failed, so that the users know which partitions failed to purge.

Member Author

This should not be a problem for purgeDataBefore(), because the futures provided to the CompositeFuture have been constructed in such a way that they never raise exceptions. Each of those futures calls future.complete(result) in onFailure, where the result carries the error information.

I agree it would make CompositeFuture more useful if this class handled the logic of converting errors to results and returned the full results to the user as you suggested. But I don't have a good way to do it now because CompositeFuture doesn't know the type of the return value -- it currently uses a type parameter T.
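As an aside, the pattern described above can be sketched with the Scala standard library alone. The snippet below is purely illustrative (PurgeOutcome and neverFailing are invented names, not the patch's PurgeDataResult or CompositeFuture): the future always completes with a result value that carries any error, so reading the completed value never throws.

object NeverFailingFutures {
  import scala.concurrent.{Future, Promise}
  import scala.concurrent.ExecutionContext.Implicits.global
  import scala.util.{Failure, Success}

  // Hypothetical result type mirroring the idea of PurgeDataResult(lowWatermark, error)
  case class PurgeOutcome(lowWatermark: Long, error: Option[Throwable])

  // Wrap an operation so the returned future always completes with a PurgeOutcome:
  // failures are folded into the value instead of failing the future.
  def neverFailing(op: => Long): Future[PurgeOutcome] = {
    val p = Promise[PurgeOutcome]()
    Future(op).onComplete {
      case Success(lowWatermark) => p.success(PurgeOutcome(lowWatermark, None))
      case Failure(e)            => p.success(PurgeOutcome(-1L, Some(e)))
    }
    p.future
  }
}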

}

// send requests over network to brokers
client.poll(0)
Contributor

It seems a single consumerNetworkClient.poll(0) cannot guarantee that all the requests are sent out. Also, the interface might be a little awkward: after purgeDataBefore() returns, the users have to keep calling client.poll(), otherwise the futures will never complete. I am wondering how users would use the asynchronous purge in this case? At the very least we should document this clearly.

Member Author

Great point. I have updated the AdminClient to create its own thread that calls client.poll(retryBackoffMs). I find it necessary for AdminClient to have its own thread in order to support both sync and async operations.

I have also added testLogStartOffsetAfterAsyncPurge() to validate the async purge operation.
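For illustration, here is a minimal sketch of that design with invented names (not the actual AdminClient internals): a daemon thread keeps driving the network client so that futures returned by async calls complete without the caller having to poll.

class AdminPollingThread(retryBackoffMs: Long, poll: Long => Unit)
    extends Thread("admin-client-network-thread") {
  @volatile private var running = true
  setDaemon(true)

  override def run(): Unit =
    while (running)
      poll(retryBackoffMs)   // sends pending requests, handles responses, completes futures

  def shutdown(): Unit = running = false
}

// Hypothetical usage: new AdminPollingThread(100L, ms => networkClient.poll(ms)).start()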


@Override
public String toString() {
StringBuilder builder = new StringBuilder();
Contributor

This could just be a string concatenation.

Member Author

Is there any negative impact to using StringBuilder compared to string concatenation? Using StringBuilder here gives us the same code style as the toString() of other requests such as ProduceRequest and LeaderAndIsrRequest.

Contributor

Not much difference; it's just about readability (see the link below), so we can keep it the same as the other requests.
http://stackoverflow.com/questions/1532461/stringbuilder-vs-string-concatenation-in-tostring-in-java

* Return low watermark of the partition.
*/
def purgeRecordsOnLeader(offset: Long): Long = {
inReadLock(leaderIsrUpdateLock) {
Contributor

Should this be in write lock?

Member Author

I think we should use the read lock since this method doesn't update the leader or ISR of the partition, right?

Contributor

Yes, you are right. Read lock here is fine.

*/
def maybeIncrementLogStartOffsetAndPurge(offset: Long) {
// We don't have to write the log start offset to log-start-offset-checkpoint immediately.
// The purgeOffset may be lost only if all replicas of this broker are shutdown
Contributor

Is this comment accurate?

Member Author

It was not accurate. I have updated the comment to replace "all replicas of this broker" with "in-sync replicas of this broker".

val localPurgeResults = purgeOnLocalLog(offsetPerPartition, metadataCache)
debug("Purge on local log in %d ms".format(time.milliseconds - sTime))

val purgeStatus = localPurgeResults.map { case (topicPartition, result) =>
Contributor

Nit: could be mapValues.

Member Author

I verified that we cannot use mapValues() here. This is because mapValues() returns a map view that maps every key of this map to f(this(key)); the resulting map wraps the original map without copying any elements, so the mapping function is re-evaluated on every access. As a result, the status.acksPending = true assignment in the constructor of DelayedPurge becomes a no-op.
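To make the pitfall concrete, here is a small self-contained Scala example (invented names; the view behavior is that of the Scala 2.10-2.12 versions this PR builds against):

object MapValuesViewPitfall {
  class Status { var acksPending: Boolean = false }

  def main(args: Array[String]): Unit = {
    val offsets = Map("tp-0" -> 10L, "tp-1" -> 20L)

    // mapValues returns a lazy view: the function runs again on every lookup,
    // so each access yields a fresh Status and earlier mutations are lost.
    val viaView = offsets.mapValues { _ => val s = new Status; s.acksPending = true; s }
    viaView("tp-0").acksPending = false
    println(viaView("tp-0").acksPending)   // prints true: the assignment hit a throwaway object

    // map materializes the result once, so mutations on its values stick.
    val viaMap = offsets.map { case (tp, _) => tp -> { val s = new Status; s.acksPending = true; s } }
    viaMap("tp-0").acksPending = false
    println(viaMap("tp-0").acksPending)    // prints false
  }
}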

tryCompleteDelayedRequests()
}

def checkLowWatermarkReachOffset(requiredOffset: Long): (Boolean, Errors, Long) = {
Contributor

Methods returning a three-element tuple may be a little hard to follow; maybe we can create a case class. At the very least we should document each field of the return value.

Member Author

Good point. I have updated the code to make it more readable.
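One possible shape for such a case class, shown as a sketch with illustrative names (not necessarily what the patch ended up with):

object LowWatermarkChecks {
  // Named fields replace the (Boolean, Errors, Long) tuple so call sites stay readable.
  case class LowWatermarkCheckResult(requiredOffsetReached: Boolean,
                                     error: Option[Throwable],
                                     lowWatermark: Long)

  def checkLowWatermarkReachOffset(requiredOffset: Long, lowWatermark: Long): LowWatermarkCheckResult =
    LowWatermarkCheckResult(
      requiredOffsetReached = lowWatermark >= requiredOffset,
      error = None,
      lowWatermark = lowWatermark)
}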


// Default throttle time
private static final int DEFAULT_THROTTLE_TIME = 0;
// Default low watermark
private static final long DEFAULT_LOG_START_OFFSET = 0L;
Contributor

Do we want to distinguish between NO_LOG_START_OFFSET and LOG_START_OFFSET = 0? Would it be clearer to define NO_LOG_START_OFFSET as -1?

Member Author

I don't think it is necessary to distinguish between NO_LOG_START_OFFSET and LOG_START_OFFSET = 0. Is there any use case for NO_LOG_START_OFFSET?

Contributor

In general, we want to identify the state of the system as clearly as possible. The follower should not take any action if the LOG_START_OFFSET on the broker is NO_LOG_START_OFFSET. But if the follower sees the leader returning a start offset of 0 while the actual start offset on the leader is not 0, this introduces confusion.
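The sentinel being discussed can be written down as a tiny illustrative sketch (names invented): -1 means "no log start offset reported", so a genuine offset of 0 is never ambiguous.

object LogStartOffsets {
  val Invalid: Long = -1L                                    // "unknown / not reported"
  def isKnown(logStartOffset: Long): Boolean = logStartOffset != Invalid
}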

@lindong28
Member Author

@becketqin Thanks so much for taking time to review the patch! Can you check if the updated patch has addressed your comments?

@asfbot

asfbot commented Feb 21, 2017

Refer to this link for build results (access rights to CI server needed):
https://builds.apache.org/job/kafka-pr-jdk7-scala2.10/1771/
Test FAILed (JDK 7 and Scala 2.10).

@asfbot

asfbot commented Mar 28, 2017

Refer to this link for build results (access rights to CI server needed):
https://builds.apache.org/job/kafka-pr-jdk7-scala2.10/2433/
Test PASSed (JDK 7 and Scala 2.10).

@asfbot

asfbot commented Mar 28, 2017

Refer to this link for build results (access rights to CI server needed):
https://builds.apache.org/job/kafka-pr-jdk8-scala2.11/2437/
Test PASSed (JDK 8 and Scala 2.11).

@asfbot

asfbot commented Mar 28, 2017

Refer to this link for build results (access rights to CI server needed):
https://builds.apache.org/job/kafka-pr-jdk8-scala2.11/2440/
Test PASSed (JDK 8 and Scala 2.11).

@asfbot

asfbot commented Mar 28, 2017

Refer to this link for build results (access rights to CI server needed):
https://builds.apache.org/job/kafka-pr-jdk7-scala2.10/2436/
Test PASSed (JDK 7 and Scala 2.10).

@asfbot

asfbot commented Mar 28, 2017

Refer to this link for build results (access rights to CI server needed):
https://builds.apache.org/job/kafka-pr-jdk8-scala2.12/2436/
Test PASSed (JDK 8 and Scala 2.12).

@junrao
Contributor

junrao commented Mar 28, 2017

It seems that right now, for a compacted topic, the base offset of the first segment is always 0. So, the patch is fine.

@junrao
Contributor

junrao commented Mar 28, 2017

@lindong28 : Thanks for the patch. LGTM. @becketqin : Do you want to make another pass and then merge?

Contributor

@becketqin becketqin left a comment

@lindong28 Thanks for the patch. LGTM except for a very rare corner case.

new Field("timeout", INT32, "The maximum time to await a response in ms."));

public static final Schema DELETE_RECORDS_RESPONSE_PARTITION_V0 = new Schema(new Field("partition", INT32, "Topic partition id."),
new Field("low_watermark", INT64, "Smallest available offset"),
Contributor

Can we be clearer on this field? In the FetchResponse we have log_start_offset, which has almost the same comment. Maybe here we can say "The smallest available offset across all live replicas."

Member Author

Sure. Updated now.


public static final Schema DELETE_RECORDS_RESPONSE_PARTITION_V0 = new Schema(new Field("partition", INT32, "Topic partition id."),
new Field("low_watermark", INT64, "Smallest available offset"),
new Field("error_code", INT16, "The error code for the given topic."));
Contributor

for the given topic => for the given partition.

Member Author

Good catch. Fixed now.

val recoveryPoints = this.logsByDir.get(dir.toString)
if (recoveryPoints.isDefined) {
this.recoveryPointCheckpoints(dir).write(recoveryPoints.get.mapValues(_.recoveryPoint))
}
}

/**
* Checkpoint log start offset for all logs in provided directory.
*/
private def checkpointLogStartOffsetsInDir(dir: File): Unit = {
Contributor

There seems to be a very rare case that may result in message loss. Assuming there is only one replica, consider the following sequence:

  1. A user deletes a topic, and we do not delete the log start offset from the checkpoint file.
  2. The topic is created again with the same name, and the partitions happen to land on the same broker.
  3. The user produces some messages, and before the new log start offset is checkpointed, the broker goes down.
  4. Now when the broker restarts, the old checkpointed log start offset may be applied to the newly created topic, which may make messages that have been produced into the log unavailable to the users.

This is a very rare corner case, though.

Member Author

Good point. I fixed the problem by always calling checkpointLogStartOffsetsInDir(removedLog.dir.getParentFile) when a partition is deleted. The overhead is probably smaller than checkpointing the cleaner offset, which we already do every time we delete a partition.
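A rough sketch of that fix, with invented helper names standing in for the LogManager internals: drop the removed partition's entry and rewrite the per-directory log-start-offset checkpoint immediately, so a stale offset cannot be re-applied to a re-created topic after a restart.

import java.io.File
import scala.collection.mutable

class LogStartOffsetCheckpointer(writeCheckpoint: (File, Map[String, Long]) => Unit) {
  private val logStartOffsets = mutable.Map.empty[String, Long]   // topic-partition -> log start offset

  def updateLogStartOffset(topicPartition: String, offset: Long): Unit =
    logStartOffsets(topicPartition) = offset

  def removePartition(topicPartition: String, logDir: File): Unit = {
    logStartOffsets.remove(topicPartition)
    // Checkpoint right away, analogous to calling checkpointLogStartOffsetsInDir(removedLog.dir.getParentFile)
    writeCheckpoint(logDir, logStartOffsets.toMap)
  }
}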

@asfbot

asfbot commented Mar 28, 2017

Refer to this link for build results (access rights to CI server needed):
https://builds.apache.org/job/kafka-pr-jdk8-scala2.11/2443/
Test PASSed (JDK 8 and Scala 2.11).

@asfbot

asfbot commented Mar 28, 2017

Refer to this link for build results (access rights to CI server needed):
https://builds.apache.org/job/kafka-pr-jdk7-scala2.10/2439/
Test PASSed (JDK 7 and Scala 2.10).

@asfbot

asfbot commented Mar 28, 2017

Refer to this link for build results (access rights to CI server needed):
https://builds.apache.org/job/kafka-pr-jdk8-scala2.12/2439/
Test FAILed (JDK 8 and Scala 2.12).

@asfbot

asfbot commented Mar 28, 2017

Refer to this link for build results (access rights to CI server needed):
https://builds.apache.org/job/kafka-pr-jdk8-scala2.11/2444/
Test FAILed (JDK 8 and Scala 2.11).

@asfbot

asfbot commented Mar 28, 2017

Refer to this link for build results (access rights to CI server needed):
https://builds.apache.org/job/kafka-pr-jdk8-scala2.12/2440/
Test FAILed (JDK 8 and Scala 2.12).

@asfbot

asfbot commented Mar 28, 2017

Refer to this link for build results (access rights to CI server needed):
https://builds.apache.org/job/kafka-pr-jdk7-scala2.10/2440/
Test FAILed (JDK 7 and Scala 2.10).

@asfbot

asfbot commented Mar 28, 2017

Refer to this link for build results (access rights to CI server needed):
https://builds.apache.org/job/kafka-pr-jdk7-scala2.10/2456/
Test PASSed (JDK 7 and Scala 2.10).

@asfbot

asfbot commented Mar 28, 2017

Refer to this link for build results (access rights to CI server needed):
https://builds.apache.org/job/kafka-pr-jdk8-scala2.12/2456/
Test PASSed (JDK 8 and Scala 2.12).

@asfbot

asfbot commented Mar 28, 2017

Refer to this link for build results (access rights to CI server needed):
https://builds.apache.org/job/kafka-pr-jdk8-scala2.11/2460/
Test FAILed (JDK 8 and Scala 2.11).

@lindong28
Member Author

@junrao @becketqin All integration tests have passed except org.apache.kafka.connect.runtime.WorkerTest.testAddRemoveTask with Scala 2.11. I don't think this is caused by this patch, because this test passed in the other recent builds. Thanks so much for taking the time to review this patch!

@asfgit asfgit closed this in 8b05ad4 Mar 28, 2017
@becketqin
Contributor

@lindong28 Thanks for updating the patch. Merged to trunk.
@junrao @ijuma Thanks for the review!

@lindong28 lindong28 deleted the KAFKA-4586 branch June 29, 2017 03:56