RATIS-1025. Ratis logs may not be purged completely #170
+1 |
Newbie question: if the UTs pass both with and without this change, does that mean the complete purge logic is not properly tested? Can you point me to the UT that covers the purge logic? |
@amaliujia Please check TestSegmentedRaftLog. Purge currently considers only cached segments, but the code evicts segments from the cache. This leads to scenarios where segments are never purged. |
@lokeshj1703, if I understand correctly, this fix will purge the logs instead of evicting them from the cache. This might lead to more frequent install snapshot requests if a follower is lagging behind by a few log segments. Currently in Ozone, the maximum number of cached segments is set to 2 in both Datanodes and Ozone.
I think the first approach would be the clean solution. What are your thoughts? |
I think you are referring to the change in TestSegmentedRaftLog#syncWithSnapshot. This function is called in the follower after install snapshot or notify install snapshot. Since the snapshot has already been installed in the follower, I think it is ok to purge the logs. The leader would have the snapshot too.
The change will not purge logs when they need to be evicted. If you look at SegmentedRaftLogCache#evictCache it only evicts the entry cache. It still tracks the log segment in the SegmentedRaftLogCache#closedSegments list. Log segments are removed from closedSegments during purge or truncate. |
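The distinction between evicting and purging can be illustrated with a toy model. This is only a sketch, not the Ratis implementation: the class `SegmentCacheSketch`, the `entriesCached` flag, and the method signatures are hypothetical stand-ins for SegmentedRaftLogCache and its closedSegments list.

```java
import java.util.ArrayList;
import java.util.Iterator;
import java.util.List;

/** Toy model (not the Ratis classes): eviction and purge are separate operations. */
class SegmentCacheSketch {
  /** A closed log segment covering the index range [startIndex, endIndex]. */
  static final class Segment {
    final long startIndex;
    final long endIndex;
    boolean entriesCached = true; // hypothetical flag standing in for the entry cache

    Segment(long startIndex, long endIndex) {
      this.startIndex = startIndex;
      this.endIndex = endIndex;
    }
  }

  // Every closed segment stays in this list until it is purged (truncate is omitted here).
  private final List<Segment> closedSegments = new ArrayList<>();

  void addClosedSegment(long startIndex, long endIndex) {
    closedSegments.add(new Segment(startIndex, endIndex));
  }

  /** Drops cached entries from the oldest segments until at most maxCached remain cached. */
  void evictCache(int maxCached) {
    int cached = 0;
    for (Segment s : closedSegments) {
      if (s.entriesCached) {
        cached++;
      }
    }
    for (Segment s : closedSegments) {
      if (cached <= maxCached) {
        break;
      }
      if (s.entriesCached) {
        s.entriesCached = false; // the segment itself is still tracked
        cached--;
      }
    }
  }

  /** Removes every segment that ends at or below purgeIndex, cached or not. Returns the count. */
  int purge(long purgeIndex) {
    int removed = 0;
    for (Iterator<Segment> it = closedSegments.iterator(); it.hasNext(); ) {
      if (it.next().endIndex <= purgeIndex) {
        it.remove();
        removed++;
      }
    }
    return removed;
  }

  int trackedSegments() {
    return closedSegments.size();
  }
}
```

In this model, a segment whose entries were evicted is still visible to a later purge because it never leaves the tracked list.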
Consider the following scenario.
|
Good point! I think this can be fixed by the following change.
The point is that if we have a list of all log segments at startup, purge works correctly. The log segment references should be light-weight once the entry cache is evicted from the log segment. I think we can avoid a directory scan during purge with this precondition. |
Agreed. It's best if the directory scan can be avoided. |
@hanishakoneru Thanks for the review! I have added a commit which fixes the logic for loading log segments. |
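A minimal sketch of that idea, under stated assumptions: segment files are listed once at startup and kept as light-weight references with no entries loaded, so purge can select from the in-memory list. The class `SegmentStartupSketch`, the `SegmentRef` type, and the `log_<start>-<end>` file name pattern are assumptions made for illustration and are not taken from this PR.

```java
import java.util.ArrayList;
import java.util.Comparator;
import java.util.List;
import java.util.regex.Matcher;
import java.util.regex.Pattern;

/** Sketch only: build light-weight segment references from file names at startup. */
class SegmentStartupSketch {
  /** Holds just the index range of a closed segment; no log entries are loaded. */
  static final class SegmentRef {
    final long startIndex;
    final long endIndex;

    SegmentRef(long startIndex, long endIndex) {
      this.startIndex = startIndex;
      this.endIndex = endIndex;
    }
  }

  // Assumed naming scheme for closed segment files: log_<startIndex>-<endIndex>
  private static final Pattern CLOSED_SEGMENT = Pattern.compile("log_(\\d+)-(\\d+)");

  /** Parses closed-segment file names and returns references sorted by start index. */
  static List<SegmentRef> loadClosedSegments(List<String> fileNames) {
    List<SegmentRef> refs = new ArrayList<>();
    for (String name : fileNames) {
      Matcher m = CLOSED_SEGMENT.matcher(name);
      if (m.matches()) { // anything else (open segment, metadata file) is skipped
        refs.add(new SegmentRef(Long.parseLong(m.group(1)), Long.parseLong(m.group(2))));
      }
    }
    refs.sort(Comparator.comparingLong(r -> r.startIndex));
    return refs;
  }

  /** Selects segments that end at or below snapshotIndex, without touching the directory. */
  static List<SegmentRef> purgeable(List<SegmentRef> refs, long snapshotIndex) {
    List<SegmentRef> result = new ArrayList<>();
    for (SegmentRef r : refs) {
      if (r.endIndex <= snapshotIndex) {
        result.add(r);
      }
    }
    return result;
  }
}
```

The directory is read once when the names are collected; after that, deciding what to purge needs only the in-memory references.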
Thanks @lokeshj1703. |
@lokeshj1703 are the CI failures related? |
@hanishakoneru The failure does not seem to be related. Retriggered CI. |
@hanishakoneru Thanks for the review! I have committed the PR to the master branch. |
What changes were proposed in this pull request?
The Ozone Manager Ratis server tries to purge logs up to the snapshotIndex after a snapshot is taken, but it only purges the logs that are cached in memory. As a result, older logs may never be purged and keep consuming disk space.
What is the link to the Apache JIRA?
https://issues.apache.org/jira/browse/RATIS-1025
How was this patch tested?
Existing UT