
RATIS-1025. Ratis logs may not be purged completely #170

Closed
wants to merge 4 commits

Conversation

lokeshj1703
Contributor

What changes were proposed in this pull request?

The Ozone Manager Ratis server tries to purge logs up to the snapshotIndex after a snapshot is taken, but it only purges the logs that have been cached in memory. This can leave older logs unpurged and consuming disk space.
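
For readers less familiar with this code path, below is a minimal, hypothetical sketch of the failure mode (class and field names such as InMemoryPurgeSketch are illustrative, not the Ratis API): purge only walks the segments tracked in memory, so a segment file left on disk but missing from that list is never deleted.

import java.io.File;
import java.util.Iterator;
import java.util.List;

// Illustrative only: purge walks an in-memory segment list, so any segment file
// on disk that is missing from this list can never be deleted by purge.
class InMemoryPurgeSketch {
  static final class ClosedSegment {
    final long endIndex;
    final File file;
    ClosedSegment(long endIndex, File file) {
      this.endIndex = endIndex;
      this.file = file;
    }
  }

  // Must be a mutable list of the segments the server currently tracks.
  private final List<ClosedSegment> trackedSegments;

  InMemoryPurgeSketch(List<ClosedSegment> trackedSegments) {
    this.trackedSegments = trackedSegments;
  }

  void purge(long purgeIndex) {
    for (Iterator<ClosedSegment> it = trackedSegments.iterator(); it.hasNext(); ) {
      ClosedSegment s = it.next();
      if (s.endIndex <= purgeIndex) {
        it.remove();
        s.file.delete(); // untracked on-disk segments are never reached here
      }
    }
  }
}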

What is the link to the Apache JIRA?

https://issues.apache.org/jira/browse/RATIS-1025

How was this patch tested?

Existing UT

@runzhiwang
Contributor

+1

@amaliujia
Contributor

amaliujia commented Aug 7, 2020

Newbie question:

With or without this change, the UTs still pass, which means the complete purge logic is not properly tested, right? Can you point me to the UT which covers the purge logic?

@lokeshj1703
Contributor Author

With or without this change, the UTs still pass, which means the complete purge logic is not properly tested, right? Can you point me to the UT which covers the purge logic?

@amaliujia Please check TestSegmentedRaftLog. Purge currently considers only the cached segments, and the code currently evicts the cache. This leads to scenarios where segments are never purged.

@hanishakoneru
Contributor

@lokeshj1703, if I understand correctly, this fix will purge the logs instead of evicting them from cache.

This might lead to more frequent install snapshot requests if a follower is lagging behind by a few log segments. Currently in Ozone, the maximum number of cached segments is set to 2 in both the Datanodes and the Ozone Manager.

  1. Instead of purging logs when they need to be evicted from the cache, can the purge implementation be fixed to also consider the logs that are on disk but not in the cache? With this approach, we would have to take care of tracking on-disk logs for other cases as well (for example RaftLog#getStartIndex).
  2. Another option would be to align the cache eviction logic with the log purge logic, but this might not be optimal if the purge limit is set very high.

I think the first approach would be the cleaner solution. What are your thoughts?

@lokeshj1703
Contributor Author

@lokeshj1703, if I understand correctly, this fix will purge the logs instead of evicting them from cache.

This might lead to more frequent install snapshot requests if a follower is lagging behind by a few log segments. Currently in Ozone, the maximum number of cached segments is set to 2 in both the Datanodes and the Ozone Manager.

I think you are referring to the change in TestSegmentedRaftLog#syncWithSnapshot. This function is called on the follower after install snapshot or notify install snapshot. Since the snapshot has already been installed on the follower, I think it is ok to purge the logs; the leader would have the snapshot too.
Currently purge is called in StateMachineUpdater#takeSnapshot. A follower where a snapshot is being installed would not call takeSnapshot until later and thus may not be able to purge its logs.

  1. Instead of purging logs when they need to be evicted from the cache, can the purge implementation be fixed to also consider the logs that are on disk but not in the cache? With this approach, we would have to take care of tracking on-disk logs for other cases as well (for example RaftLog#getStartIndex).

The change will not purge logs when they need to be evicted. If you look at SegmentedRaftLogCache#evictCache, it only evicts the entry cache; the log segment is still tracked in the SegmentedRaftLogCache#closedSegments list. Log segments are removed from closedSegments only during purge or truncate.
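
To make the distinction concrete, here is a hypothetical sketch (the real classes are LogSegment and SegmentedRaftLogCache, but these names and fields are only illustrative): evicting the entry cache drops the in-memory log entries, while the segment reference, and therefore its eligibility for purge, stays in the closed-segments list.

import java.io.File;
import java.util.ArrayDeque;
import java.util.Deque;

// Illustrative sketch only, not the Ratis implementation.
class SegmentTrackingSketch {

  static final class Segment {
    final long startIndex;
    final long endIndex;
    final File file;
    private Object[] entryCache; // in-memory copy of the entries, may be dropped

    Segment(long startIndex, long endIndex, File file, Object[] entries) {
      this.startIndex = startIndex;
      this.endIndex = endIndex;
      this.file = file;
      this.entryCache = entries;
    }

    // Eviction only releases the in-memory entries; the segment object itself
    // (with its start/end indices and file reference) stays around.
    void evictEntryCache() {
      this.entryCache = null;
    }
  }

  // Closed segments remain tracked after eviction ...
  private final Deque<Segment> closedSegments = new ArrayDeque<>();

  void evictAll() {
    closedSegments.forEach(Segment::evictEntryCache); // lightweight references remain
  }

  // ... so purge can still walk the list and delete the files on disk.
  void purge(long purgeIndex) {
    while (!closedSegments.isEmpty() && closedSegments.peekFirst().endIndex <= purgeIndex) {
      closedSegments.pollFirst().file.delete();
    }
  }
}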

@hanishakoneru
Contributor

Consider the following scenario.
A snapshot is taken every 25 transactions and PurgeUptoSnapshotIndex is set to false. This implies that when a snapshot is taken, only logs up to the minimum commit index across all nodes will be purged.
Now let's say we have log segments log_0_10, log_11_20, log_21_30 and log_31_40 on disk. A snapshot was taken at index 25, but one of the nodes had commit index 12, so only log_0_10 is purged. When this node is restarted, it will not add log_11_20 to its closedSegments because of the following logic in SegmentedRaftLogCache (log_11_20 ends at index 20, which is not greater than the snapshot index 25). When purge is called next, log_11_20 will not be purged as it is not in the cached closedSegments list.

void loadSegment(LogPathAndIndex pi, boolean keepEntryInCache,
      Consumer<LogEntryProto> logConsumer, long lastIndexInSnapshot) throws IOException {
    LogSegment logSegment = LogSegment.loadSegment(storage, pi.getPath().toFile(),
        pi.getStartIndex(), pi.getEndIndex(), pi.isOpen(), keepEntryInCache, logConsumer, raftLogMetrics);
    // Segments whose endIndex is not greater than lastIndexInSnapshot are skipped here,
    // so they are never tracked in closedSegments and can never be purged later.
    if (logSegment != null && logSegment.getEndIndex() > lastIndexInSnapshot) {
      addSegment(logSegment);
    }
  }

@lokeshj1703
Contributor Author

When purge is called next, log_11_20 will not be purged as it is not in the cached closedSegments list.

Good point! I think this can be fixed by the following change.

void loadSegment(LogPathAndIndex pi, boolean keepEntryInCache,
      Consumer<LogEntryProto> logConsumer, long lastIndexInSnapshot) throws IOException {
    LogSegment logSegment = LogSegment.loadSegment(storage, pi.getPath().toFile(),
        pi.getStartIndex(), pi.getEndIndex(), pi.isOpen(), keepEntryInCache, logConsumer, raftLogMetrics);
    // The lastIndexInSnapshot check is removed, so every closed segment found on disk
    // is tracked in the cache and remains eligible for purge.
    if (logSegment != null) {
      addSegment(logSegment);
    }
  }

The point is that if we have a list of all log segments at startup, purge works correctly. The log segment references should be lightweight once the entry cache is evicted from the log segment. I think we can avoid a directory scan during purge with this precondition.
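
As a rough illustration of that precondition (names are hypothetical, not the Ratis API): if every closed segment found on disk is tracked in memory at startup, purge can be served entirely from that list, and the scenario above resolves itself without a directory scan.

import java.io.File;
import java.util.ArrayList;
import java.util.Arrays;
import java.util.List;

// Illustrative sketch of startup loading with no snapshot-index filter.
class StartupLoadSketch {

  static final class SegmentRef {
    final long startIndex, endIndex;
    final File file;

    SegmentRef(long startIndex, long endIndex, File file) {
      this.startIndex = startIndex;
      this.endIndex = endIndex;
      this.file = file;
    }
  }

  final List<SegmentRef> closedSegments = new ArrayList<>();

  // Startup: add every closed segment, with no "endIndex > lastIndexInSnapshot" filter.
  void loadAtStartup(List<SegmentRef> segmentsOnDisk) {
    closedSegments.addAll(segmentsOnDisk);
  }

  // Purge from the in-memory list only; no directory scan needed.
  void purgeUpTo(long purgeIndex) {
    closedSegments.removeIf(s -> {
      if (s.endIndex <= purgeIndex) {
        s.file.delete();
        return true;
      }
      return false;
    });
  }

  public static void main(String[] args) {
    // Scenario from the discussion: log_11_20, log_21_30, log_31_40 on disk,
    // snapshot taken at index 25.
    StartupLoadSketch log = new StartupLoadSketch();
    log.loadAtStartup(Arrays.asList(
        new SegmentRef(11, 20, new File("log_11_20")),
        new SegmentRef(21, 30, new File("log_21_30")),
        new SegmentRef(31, 40, new File("log_31_40"))));
    log.purgeUpTo(25);
    // log_11_20 is purged because it was tracked at startup; with the old
    // loadSegment filter it would never have been added, hence never purged.
    System.out.println(log.closedSegments.size()); // prints 2
  }
}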

@hanishakoneru
Contributor

I think we can avoid a directory scan during purge with this precondition.

Agreed. It's best if a directory scan can be avoided.

@lokeshj1703
Contributor Author

@hanishakoneru Thanks for the review! I have added a commit that fixes the logic for loading log segments.

@hanishakoneru
Contributor

Thanks @lokeshj1703.
LGTM. +1 pending CI.

@hanishakoneru
Contributor

@lokeshj1703 are the CI failures related?

@lokeshj1703
Contributor Author

@hanishakoneru The failure does not seem to be related. Retriggered CI.

@lokeshj1703
Contributor Author

@hanishakoneru Thanks for the review! I have committed the PR to the master branch.

@lokeshj1703 closed this Sep 8, 2020
@lokeshj1703 deleted the RATIS-1025 branch September 8, 2020 06:59
symious pushed a commit to symious/ratis that referenced this pull request Feb 5, 2024