
KAFKA-15352: Update log-start-offset before initiating deletion of remote segments #14349

Merged
merged 3 commits into apache:trunk Sep 12, 2023

Conversation

@clolov clolov (Collaborator) commented Sep 6, 2023

This pull request changes the cleanup order to first update the log-start-offset and only then delete the remote segments.
In the previous version, if a read request arrived between us deleting a segment and updating the log-start-offset, we would not be able to service it.
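For context, a minimal self-contained sketch of the reordering (illustrative names, not the actual RemoteLogManager code):

class CleanupOrderSketch {
    private volatile long logStartOffset = 0L;

    // Old order (race-prone): a fetch arriving between the two steps targets a
    // segment that is already gone while the offset still claims it is readable.
    void cleanupOld(long newStartOffset, Runnable deleteRemoteSegments) {
        deleteRemoteSegments.run();
        logStartOffset = newStartOffset;
    }

    // New order (this PR): advance the offset first, so in-flight reads below it
    // are rejected as out-of-range instead of failing against a deleted segment.
    void cleanupNew(long newStartOffset, Runnable deleteRemoteSegments) {
        logStartOffset = newStartOffset;
        deleteRemoteSegments.run();
    }
}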

@clolov clolov marked this pull request as draft September 6, 2023 15:22
@satishd satishd added the tiered-storage Pull requests associated with KIP-405 (Tiered Storage) label Sep 7, 2023
@kamalcph kamalcph (Collaborator) left a comment

The changes LGTM. Could you please cover the patch with unit tests?

@@ -1006,6 +1005,10 @@ private void cleanupExpiredRemoteLogSegments() throws RemoteStorageException, Ex

// Update log start offset with the computed value after retention cleanup is done
remoteLogRetentionHandler.logStartOffset.ifPresent(offset -> handleLogStartOffsetUpdate(topicIdPartition.topicPartition(), offset));

for (RemoteLogSegmentMetadata segmentMetadata : segmentsToDelete) {
remoteLogRetentionHandler.deleteRemoteLogSegment(segmentMetadata, x -> true);
@kamalcph kamalcph (Collaborator) Sep 7, 2023

Can we AND the results of remoteLogRetentionHandler.deleteRemoteLogSegment across all the segments and log an error if it was not able to delete them?

remoteLogRetentionHandler.deleteRemoteLogSegment(segmentMetadata, x -> !isCancelled() && isLeader());
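For illustration, a self-contained sketch of that suggestion; the SegmentDeleter interface stands in for remoteLogRetentionHandler.deleteRemoteLogSegment, which is assumed here to report success as a boolean:

import java.util.List;
import java.util.function.Predicate;

class DeleteAllSketch {
    // Stand-in for remoteLogRetentionHandler.deleteRemoteLogSegment(segmentMetadata, predicate).
    interface SegmentDeleter<T> {
        boolean delete(T segment, Predicate<T> shouldDelete);
    }

    // AND the per-segment results together and log once if any deletion failed.
    static <T> boolean deleteAll(List<T> segmentsToDelete, SegmentDeleter<T> deleter, Predicate<T> shouldDelete) {
        boolean allDeleted = true;
        for (T segment : segmentsToDelete) {
            // call delete first so a single failure does not short-circuit the remaining deletions
            allDeleted = deleter.delete(segment, shouldDelete) && allDeleted;
        }
        if (!allDeleted) {
            System.err.println("Failed to delete one or more remote log segments");
        }
        return allDeleted;
    }
}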

@kamalcph kamalcph (Collaborator) commented Sep 7, 2023

@divijvaidya

Could you please review this patch? Discussion thread:

This patch partially addresses case 2 mentioned in the KAFKA-15352 ticket. Only the leader moves the log-start-offset and starts deleting the remote log segments. If a leader switch happens in the middle of deleting the remote log segments, there can be two cases:

  1. The log-start-offset change was propagated to all the replicas, including the new leader. In this case, both the old leader and the new leader can delete the remote log segments, since the log-start-offset can only be updated with a monotonically increasing value (see the sketch after this list).
  2. The log-start-offset change was not propagated to the new leader. In this case, the new leader assumes a log-start-offset (with KAFKA-15351) whose segments may already have been deleted by the old leader, and the log-start-offset can be stale until the next segment deletion. We can address this case by ensuring that the log-start-offset propagates to all the replicas before deleting the remote log segments; we can take this up later (or) file a JIRA.
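A self-contained sketch of the monotonicity guard referenced in case 1 (illustrative only, not the actual broker code):

import java.util.concurrent.atomic.AtomicLong;

class MonotonicLogStartOffset {
    private final AtomicLong logStartOffset = new AtomicLong(0L);

    // Accept an update only if it moves the offset forward; a stale value
    // (e.g. from a deposed leader) is silently ignored.
    boolean maybeAdvance(long candidate) {
        long current = logStartOffset.get();
        while (candidate > current) {
            if (logStartOffset.compareAndSet(current, candidate)) {
                return true;
            }
            current = logStartOffset.get();
        }
        return false;
    }
}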

@clolov clolov marked this pull request as ready for review September 7, 2023 12:09
@divijvaidya divijvaidya (Contributor) left a comment

I have some meta questions:

  1. The race condition between the start offset getting updated, leadership moving, and deletion by the RLM is a tricky one. Is there a way to add integration tests for the different scenarios?

  2. The log start offset is updated only by expiration-related work. In this PR we update the log start offset even though we might not perform expiration. Isn't it safer to update the log start offset only if we are the leader? Yes, we will still have a case where leadership changes between updating the log start offset and deleting the segments, which is handled by this PR, but the probability of that happening is low since there are no compute-intensive statements between the two steps. My point is: should we check for leadership even before updating the log start offset?

@showuon showuon self-assigned this Sep 7, 2023
@satishd satishd (Member) left a comment

Thanks @clolov for the PR. Please rebase it with the latest trunk and resolve the conflicts to make it available for review.

@kamalcph kamalcph (Collaborator) commented Sep 7, 2023

My point is, should we check for leadership even before updating the log start offset?

Yes, this is the expectation, and it is already done inside the handleLogStartOffsetUpdate method:

public void handleLogStartOffsetUpdate(TopicPartition topicPartition, long remoteLogStartOffset) {
    if (isLeader()) {
        logger.debug("Updating {} with remoteLogStartOffset: {}", topicPartition, remoteLogStartOffset);
        updateRemoteLogStartOffset.accept(topicPartition, remoteLogStartOffset);
    }
}

@satishd satishd (Member) left a comment

Thanks @clolov for the PR. Had an initial review of the changes.

core/src/main/java/kafka/log/remote/RemoteLogManager.java (outdated; resolved)
@@ -179,6 +180,8 @@ public List<EpochEntry> read() {

private final UnifiedLog mockLog = mock(UnifiedLog.class);

private final List<Map<TopicPartition, Long>> events = new ArrayList<>();
Collaborator

Can we use a list of tuples instead of a list of maps?
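For illustration, a hypothetical shape of the list-of-tuples alternative, using Map.Entry as the tuple type since plain Java has none (names are illustrative, not the actual test code):

import java.util.AbstractMap;
import java.util.ArrayList;
import java.util.List;
import java.util.Map;

class EventsAsTuples {
    // One entry per log-start-offset event: (partition name, offset).
    private final List<Map.Entry<String, Long>> events = new ArrayList<>();

    void record(String topicPartition, long offset) {
        events.add(new AbstractMap.SimpleEntry<>(topicPartition, offset));
    }

    long offsetOf(int eventIndex) {
        return events.get(eventIndex).getValue();
    }
}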

.thenAnswer(answer -> {
// assert that log-start-offset has been moved accordingly
// we skip the first entry as it is the local replica ensuring it has the correct log start offset
assertEquals(200, events.get(1).get(leaderTopicIdPartition.topicPartition()));
@kamalcph kamalcph (Collaborator) Sep 8, 2023

Instead of asserting the log-start-offset via events, can we assert it similarly to the testLogStartOffsetUpdatedOnStartup method?

AtomicLong logStartOffset = new AtomicLong(0);
try (RemoteLogManager remoteLogManager = new RemoteLogManager(remoteLogManagerConfig, brokerId, logDir, clusterId, time,
        tp -> Optional.of(mockLog),
        (topicPartition, offset) -> logStartOffset.set(offset), // capture log-start-offset updates
        brokerTopicStats) {
    public RemoteLogMetadataManager createRemoteLogMetadataManager() {
        return remoteLogMetadataManager;
    }
}) {
    RemoteLogManager.RLMTask task = remoteLogManager.new RLMTask(leaderTopicIdPartition, 128);
    task.convertToLeader(0);
    task.run();
    assertEquals(200L, logStartOffset.get());
    verify(remoteStorageManager).deleteLogSegmentData(remoteLogSegmentMetadatas.get(0));
    verify(remoteStorageManager, never()).deleteLogSegmentData(remoteLogSegmentMetadatas.get(1));
}

@clolov (Collaborator, Author)

I am happy to change it to this. The reason why I implemented it with events is that it allowed me to carry out the assertion before deletes were initiated. With your approach we assert that the log start offset has been updated, but we do not assert that it was updated before deletes were carried out. Would you still like me to change it to your proposal?

@clolov (Collaborator, Author)

Or do you mean that the problem is not with the location of the assertion, but that I do not need a list of events because a single AtomicLong will be sufficient?

Collaborator

Yes, a single AtomicLong will be sufficient. We can assert before and after calling the task.run() method.
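A self-contained illustration of that before/after pattern; the Runnable stands in for RLMTask.run(), and 200L matches the offset used in the test above:

import java.util.concurrent.atomic.AtomicLong;

public class BeforeAfterAssertSketch {
    public static void main(String[] args) {
        AtomicLong logStartOffset = new AtomicLong(0L);
        Runnable task = () -> logStartOffset.set(200L); // stand-in for the retention task

        if (logStartOffset.get() != 0L) throw new AssertionError("offset moved before the task ran");
        task.run();
        if (logStartOffset.get() != 200L) throw new AssertionError("offset not advanced by the task");
        System.out.println("before/after assertions passed");
    }
}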

@kamalcph kamalcph (Collaborator) left a comment

LGTM. Thanks @clolov for addressing the review comments.

@showuon showuon (Contributor) left a comment

Overall LGTM! Thanks for the fix. Left some comments.

Comment on lines 1015 to 1017
if (shouldDeleteSegment) {
segmentsToDelete.add(metadata);
}
Contributor

nit: This if block and L1004-1006 could be moved before canProcess = isSegmentDeleted || !isValidSegment;. I.e.

boolean shouldDeleteSegment = remoteLogRetentionHandler.deleteLogStartOffsetBreachedSegments(
        metadata, logStartOffset, epochWithOffsets);
boolean isValidSegment = false;

if (!shouldDeleteSegment) {
    ...
    if (isValidSegment) {
        shouldDeleteSegment = remoteLogRetentionHandler.deleteRetentionTimeBreachedSegments(metadata) || remoteLogRetentionHandler.deleteRetentionSizeBreachedSegments(metadata);
    }
}
if (shouldDeleteSegment) {
    segmentsToDelete.add(metadata);
}

canProcess = isSegmentDeleted || !isValidSegment;

@clolov (Collaborator, Author)

Thanks for the spot! Hopefully addressed in the next commit

Comment on lines +1665 to +1667
assertEquals(200L, logStartOffset.get());
verify(remoteStorageManager).deleteLogSegmentData(remoteLogSegmentMetadatas.get(0));
verify(remoteStorageManager, never()).deleteLogSegmentData(remoteLogSegmentMetadatas.get(1));
@showuon showuon (Contributor) Sep 9, 2023

I think this is a good place to continue to verify the situation we want to protect:

// If the follower HAS picked up the changes and becomes the leader, this replica won't successfully complete the deletion.
// However, the new leader will correctly pick up all breaching segments as log-start-offset breaching ones
// and delete them accordingly.

// If the follower HAS NOT picked up the changes and becomes the leader, it will go through this process
// again and delete the segments with the original deletion reason, i.e. size, time, or log-start-offset breach.

So, I'm thinking we can continue the test with something like:

RemoteLogManager.RLMTask task = remoteLogManager.new RLMTask(followerTopicIdPartition, 128);
task.convertToLeader(1);
....
task.run();

assertEquals(200L, logStartOffset.get());

// verify the 2nd log segment will be deleted by the new leader.
verify(remoteStorageManager).deleteLogSegmentData(remoteLogSegmentMetadatas.get(1));

WDYT?

@clolov (Collaborator, Author)

This is a great idea; apologies for not doing it myself. Hopefully the next commit addresses this.

@clolov clolov (Collaborator, Author) commented Sep 11, 2023

Thanks for the latest comments! I will review and respond today.

@divijvaidya divijvaidya (Contributor) left a comment

Please don't wait for comments from me to merge this. I am good with the code changes here.

@showuon showuon (Contributor) left a comment

LGTM! Thanks for the fix!

@showuon showuon (Contributor) commented Sep 12, 2023

@satishd , do you want to have another look?

@satishd satishd (Member) left a comment

Thanks @clolov, went through the source code changes and LGTM.

This change updates the log-start-offset before the segments are deleted from remote storage. It provides a best-effort mechanism for followers to receive the log-start-offset so they can update their own log-start-offset before becoming the leader.

I could not take a closer look at the tests. I do not want to block the PR; I am fine since others have already reviewed the tests.

@satishd satishd (Member) commented Sep 12, 2023

There are a few unrelated test failures. Merging it to trunk and 3.6.

@satishd satishd merged commit 7483991 into apache:trunk Sep 12, 2023
1 check failed
satishd pushed a commit that referenced this pull request Sep 12, 2023
…mote segments (#14349)

This change is about the current leader updating the log-start-offset before the segments are deleted from remote storage. It provides a best-effort mechanism for followers to receive the log-start-offset from the leader so they can update their own log-start-offset before becoming the leader.

Reviewers: Kamal Chandraprakash<kamal.chandraprakash@gmail.com>, Divij Vaidya <diviv@amazon.com>, Luke Chen <showuon@gmail.com>, Satish Duggana <satishd@apache.org>
mjsax pushed a commit to confluentinc/kafka that referenced this pull request Nov 22, 2023
Cerchie pushed a commit to Cerchie/kafka that referenced this pull request Feb 22, 2024