KAFKA-15802: validate remote segment state before fetching index #14727
Conversation
LGTM!
LGTM
1. You will need a similar filter in the cleanupExpiredRemoteLogSegments() method as well, where listRemoteLogSegments() is called to fetch the metadata. There we want to iterate only through segments that are in COPY_SEGMENT_FINISHED state OR DELETE_SEGMENT_STARTED state (otherwise the same error you faced in this JIRA will surface there).
2. Please add integration tests which fail prior to this change for both of the changes: the one you have added and the one I suggested in point 1 above.
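The filter suggested in point 1 can be sketched roughly as follows. This is illustrative only: the class name, the `State` enum (a stand-in for Kafka's `RemoteLogSegmentState`), and the helper method are hypothetical, not the PR's actual code.

```java
import java.util.EnumSet;
import java.util.List;
import java.util.Set;
import java.util.stream.Collectors;

public class SegmentDeletionFilterSketch {

    // Simplified stand-in for RemoteLogSegmentState; the real enum lives in
    // org.apache.kafka.server.log.remote.storage.
    enum State { COPY_SEGMENT_STARTED, COPY_SEGMENT_FINISHED, DELETE_SEGMENT_STARTED, DELETE_SEGMENT_FINISHED }

    // States whose segments may be picked up for deletion, per the review
    // comment: copy finished OR delete started.
    static final Set<State> DELETABLE_STATES =
            EnumSet.of(State.COPY_SEGMENT_FINISHED, State.DELETE_SEGMENT_STARTED);

    // Filter the metadata returned by a listRemoteLogSegments()-style call,
    // skipping segments in any other state.
    static List<State> deletable(List<State> listed) {
        return listed.stream()
                .filter(DELETABLE_STATES::contains)
                .collect(Collectors.toList());
    }

    public static void main(String[] args) {
        List<State> listed = List.of(State.COPY_SEGMENT_STARTED, State.COPY_SEGMENT_FINISHED,
                State.DELETE_SEGMENT_STARTED, State.DELETE_SEGMENT_FINISHED);
        // Only the copy-finished and delete-started segments survive the filter.
        System.out.println(deletable(listed)); // [COPY_SEGMENT_FINISHED, DELETE_SEGMENT_STARTED]
    }
}
```

Note the set of deletable states is later revisited in this thread (COPY_SEGMENT_STARTED also ends up being eligible).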
@divijvaidya good catch!
// since the checkpoint file was already truncated.
boolean shouldDeleteSegment = remoteLogRetentionHandler.isSegmentBreachByLogStartOffset(
if (SEGMENT_DELETION_VALID_STATES.contains(metadata.state())) {
Since there is no "else" part to it, perhaps short-circuiting the if condition with
if (!SEGMENT_DELETION_VALID_STATES.contains(metadata.state())) {
    continue;
}
would provide more readability.
(I don't have a strong opinion on this, so let me know if you wish to keep it as it is.)
Agree, applying this fix.
@@ -133,6 +134,10 @@ public class RemoteLogManager implements Closeable {
private static final Logger LOGGER = LoggerFactory.getLogger(RemoteLogManager.class);
private static final String REMOTE_LOG_READER_THREAD_NAME_PREFIX = "remote-log-reader";
private static final Set<RemoteLogSegmentState> SEGMENT_DELETION_VALID_STATES = Collections.unmodifiableSet(EnumSet.of(
In this diagram (the RemoteLogSegmentState state-transition ASCII diagram, line 74 in 49d3122):
Great catch. I guess the transition hasn't considered the current implementation where copying and deletion are sequential. If the implementation changes, it may be possible to have this scenario.
I'd say it should be considered a valid transition. I will include it in the valid states and adapt the test case.
COPY_SEGMENT_STARTED segments are eligible for deletion when those segments could not be copied by the leader because the leader went through an ungraceful shutdown, or for any other reason. A new leader may pick up the respective segments for the targeted offsets that need to be copied, and the earlier failed segment will remain in the COPY_SEGMENT_STARTED state until it is eventually deleted by the retention cleanup logic.
So, COPY_SEGMENT_STARTED is a valid transition even now, when copy and deletion happen sequentially.
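The transition set under discussion might look like the sketch below, with COPY_SEGMENT_STARTED -> DELETE_SEGMENT_STARTED included as valid. The class, enum, and transition map are illustrative stand-ins, not the actual RemoteLogSegmentState implementation.

```java
import java.util.EnumSet;
import java.util.Map;
import java.util.Set;

public class StateTransitionSketch {

    // Simplified stand-in for RemoteLogSegmentState.
    enum State { COPY_SEGMENT_STARTED, COPY_SEGMENT_FINISHED, DELETE_SEGMENT_STARTED, DELETE_SEGMENT_FINISHED }

    // Valid transitions, including COPY_SEGMENT_STARTED -> DELETE_SEGMENT_STARTED:
    // a segment abandoned by a leader that died mid-copy can go straight to deletion.
    static final Map<State, Set<State>> VALID_TRANSITIONS = Map.of(
            State.COPY_SEGMENT_STARTED, EnumSet.of(State.COPY_SEGMENT_FINISHED, State.DELETE_SEGMENT_STARTED),
            State.COPY_SEGMENT_FINISHED, EnumSet.of(State.DELETE_SEGMENT_STARTED),
            State.DELETE_SEGMENT_STARTED, EnumSet.of(State.DELETE_SEGMENT_FINISHED),
            State.DELETE_SEGMENT_FINISHED, EnumSet.noneOf(State.class));

    static boolean isValidTransition(State from, State to) {
        return VALID_TRANSITIONS.get(from).contains(to);
    }

    public static void main(String[] args) {
        // The transition debated in this thread:
        System.out.println(isValidTransition(State.COPY_SEGMENT_STARTED, State.DELETE_SEGMENT_STARTED)); // true
    }
}
```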
@satishd in that case, wouldn't the new leader retry uploading the segment? If a segment is left in COPY_SEGMENT_STARTED state, we'd risk some segments not being uploaded. Is this expected?
Aha! I can see a new segment id being generated here.
Got it, so there may be a segment that stays in COPY_SEGMENT_STARTED until it is eventually deleted. Then, the way to assess that this is not an issue is to check that another topic-id-partition-offset combination with a different segment id has reached COPY_SEGMENT_FINISHED, isn't it?
Right, one way to check is whether there is a segment associated with the target offset or not by using RLMM#remoteLogSegmentMetadata().
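That check could be sketched as below. The real lookup is RemoteLogMetadataManager#remoteLogSegmentMetadata(); here the metadata store is mocked with an in-memory list, and the `Segment` class and `segmentFor` helper are hypothetical simplifications.

```java
import java.util.List;
import java.util.Optional;

public class OffsetCoverageSketch {

    enum State { COPY_SEGMENT_STARTED, COPY_SEGMENT_FINISHED }

    // Minimal stand-in for RemoteLogSegmentMetadata: an offset range plus a state.
    static final class Segment {
        final long startOffset;
        final long endOffset;
        final State state;
        Segment(long startOffset, long endOffset, State state) {
            this.startOffset = startOffset;
            this.endOffset = endOffset;
            this.state = state;
        }
    }

    // Mimics the RLMM#remoteLogSegmentMetadata() lookup: find a segment that
    // covers the target offset and actually finished uploading, ignoring
    // stale COPY_SEGMENT_STARTED attempts left behind by a failed leader.
    static Optional<Segment> segmentFor(List<Segment> segments, long offset) {
        return segments.stream()
                .filter(s -> s.state == State.COPY_SEGMENT_FINISHED)
                .filter(s -> s.startOffset <= offset && offset <= s.endOffset)
                .findFirst();
    }

    public static void main(String[] args) {
        List<Segment> segments = List.of(
                new Segment(0, 99, State.COPY_SEGMENT_STARTED),   // stale attempt (different segment id)
                new Segment(0, 99, State.COPY_SEGMENT_FINISHED)); // successful retry by the new leader
        System.out.println(segmentFor(segments, 50).isPresent()); // true: offset 50 is covered
    }
}
```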
Thanks @jeqo for the PR. Left a couple of comments/clarifications.
Currently, the RLMM is fetching the index even when the remote segment metadata state says the upload has not completed yet. As a workaround using the current APIs, only fetch remote indexes when the segment has been properly uploaded. Also, the RLMM is deleting segments even when the remote segment metadata state says the upload has not completed yet. Only consider segments where the copy has started or finished, or where the delete has started. [KAFKA-15802](https://issues.apache.org/jira/browse/KAFKA-15802)
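The fetch-side guard described above can be sketched as a simple predicate. The class and method names are hypothetical, and which states count as "properly uploaded" here is an assumption drawn from the description, not the PR's literal code.

```java
public class IndexFetchGuardSketch {

    // Simplified stand-in for RemoteLogSegmentState.
    enum State { COPY_SEGMENT_STARTED, COPY_SEGMENT_FINISHED, DELETE_SEGMENT_STARTED, DELETE_SEGMENT_FINISHED }

    // Only fetch indexes for segments whose upload actually completed; a segment
    // mid-deletion is still readable, but one whose copy never finished is not.
    static boolean indexFetchable(State state) {
        return state == State.COPY_SEGMENT_FINISHED || state == State.DELETE_SEGMENT_STARTED;
    }

    public static void main(String[] args) {
        System.out.println(indexFetchable(State.COPY_SEGMENT_STARTED));  // false: upload incomplete
        System.out.println(indexFetchable(State.COPY_SEGMENT_FINISHED)); // true
    }
}
```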
@jeqo Please maintain the commit history if possible, as it helps in reviewing the updated commits in the PR. It is fine if you need to rebase and merge the commits for valid reasons.
Overall LGTM, left minor comments.
verifyDeleteLogSegment(segmentMetadataList, deletableSegmentCount, currentLeaderEpoch);
}

@ParameterizedTest(name = "testDeleteLogSegmentDueToRetentionTimeBreach segmentCount={0} deletableSegmentCount={1}")
- @CsvSource(value = {"50, 0", "50, 1", "50, 23", "50, 50"})
+ @CsvSource(value = {"50, 0, DELETE_SEGMENT_STARTED", "50, 1, COPY_SEGMENT_FINISHED", "50, 23, DELETE_SEGMENT_STARTED", "50, 50, COPY_SEGMENT_FINISHED"})
Can we skip updating the existing tests?
There is already a test (testDeleteRetentionMsBeforeSegmentReady) to verify that if the state is DELETE_SEGMENT_FINISHED, no interaction is expected with the RemoteStorageManager.
Can we also rename the test from testDeleteRetentionMsBeforeSegmentReady to testDeleteRetentionMsOnExpiredSegment (expired/invalid/something similar)?
Agree, applying suggestion
@@ -988,6 +998,10 @@ void cleanupExpiredRemoteLogSegments() throws RemoteStorageException, ExecutionE
return;
}
RemoteLogSegmentMetadata metadata = segmentsIterator.next();

if (!SEGMENT_DELETION_VALID_STATES.contains(metadata.state())) {
Can this condition be simplified to the following?
if (RemoteLogSegmentState.DELETE_SEGMENT_FINISHED.equals(metadata.state())) {
    continue;
}
If we want to add or remove a state, we can change the logic later.
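Assuming SEGMENT_DELETION_VALID_STATES covers every state except DELETE_SEGMENT_FINISHED, the two guards skip exactly the same states. A quick sketch demonstrating the equivalence (a local enum stands in for RemoteLogSegmentState; the class and method names are illustrative):

```java
import java.util.EnumSet;
import java.util.Set;

public class GuardEquivalenceSketch {

    enum State { COPY_SEGMENT_STARTED, COPY_SEGMENT_FINISHED, DELETE_SEGMENT_STARTED, DELETE_SEGMENT_FINISHED }

    // Every state except DELETE_SEGMENT_FINISHED: the assumption behind the simplification.
    static final Set<State> SEGMENT_DELETION_VALID_STATES = EnumSet.of(
            State.COPY_SEGMENT_STARTED, State.COPY_SEGMENT_FINISHED, State.DELETE_SEGMENT_STARTED);

    // The original guard: skip states outside the valid set.
    static boolean skipBySet(State s) { return !SEGMENT_DELETION_VALID_STATES.contains(s); }

    // The suggested guard: skip only DELETE_SEGMENT_FINISHED.
    static boolean skipByFinished(State s) { return State.DELETE_SEGMENT_FINISHED.equals(s); }

    public static void main(String[] args) {
        // Both guards agree for every state.
        for (State s : State.values()) {
            System.out.println(s + ": " + (skipBySet(s) == skipByFinished(s)));
        }
    }
}
```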
Good idea, applying change.
…ex/cleanup Align code style
…ex/cleanup Apply latest suggestions
LGTM!
Thanks @jeqo for addressing the review comments. LGTM.
There are a few unrelated test failures; merging the changes to the 3.6 branch. @jeqo Please raise a PR against trunk by picking this change and let me know. I will review and merge it when there are no related failures in the Jenkins job.
…che#14727) Reviewers: Satish Duggana <satishd@apache.org>, Divij Vaidya <diviv@amazon.com>, Christo Lolov <lolovc@amazon.com>, Luke Chen <showuon@gmail.com>, Kamal Chandraprakash <kamal.chandraprakash@gmail.com>
See commits for more details
[KAFKA-15802]