
HDDS-11201. optimize fulltable cache eviction, scheduler and lock #6962

Merged
merged 2 commits into apache:master on Aug 2, 2024

Conversation

sumitagrawl
Contributor

What changes were proposed in this pull request?

Full-table cache eviction is used to clean up a cache entry when its value is null and the same deletion has been persisted to the DB. The current logic keeps and iterates over all epochs to perform this update.

As an optimization, the logic is refactored as follows (a minimal combined sketch is shown after this list):

  • Keep epochEntries only for the case where a key is deleted (i.e. the value is null, marking it for deletion).

  • The lock taken in put only needs to be a read lock: protection is required because cleanup of a cache entry (which is not atomic) can race with the put logic, but parallel puts themselves are safe.
    1. The cache with cleanup policy NEVER uses a ConcurrentSkipListMap, and its computeIfPresent is not atomic, so there can be a race condition between cleanup and requests adding to the cache. (This might clean up entries that are not yet flushed to the DB, which would be a correctness issue.)
    Reference: HDDS-4583

  • The write lock is moved up, outside the loop over epochEntries, to avoid repeated lock/unlock cycles and their memory operations; the loop body itself is very fast, so this reduces lock contention.

  • The scheduler is changed to be timer based, to avoid continually adding new tasks to the worker queue. Eviction also needs only the most recent epoch to clear older epochs from epochEntries; it is not mandatory to process every epoch in order for cleanup.

With the above changes (see the combined sketch below):

  • epochEntries are kept only for the delete case, not for new or modified key-values. Since deletes are rare for volumes and buckets, this avoids most per-epoch cleanup work.
  • If cleanup suffers lock starvation, the scheduler's worker queue does not grow, keeping memory usage minimal and consistent.
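
Roughly, these pieces fit together as in the minimal Java sketch below. This is not the actual FullTableCache code; the class name FullTableCacheSketch, the 100 ms timer interval, and the use of Optional for values are illustrative assumptions for this example only.

```java
import java.util.List;
import java.util.NavigableSet;
import java.util.Optional;
import java.util.concurrent.ConcurrentNavigableMap;
import java.util.concurrent.ConcurrentSkipListMap;
import java.util.concurrent.ConcurrentSkipListSet;
import java.util.concurrent.Executors;
import java.util.concurrent.ScheduledExecutorService;
import java.util.concurrent.TimeUnit;
import java.util.concurrent.locks.ReadWriteLock;
import java.util.concurrent.locks.ReentrantReadWriteLock;

/** Simplified, stand-alone sketch of the cache changes described above. */
public class FullTableCacheSketch<K extends Comparable<K>, V> {

  // Cache entries; an empty Optional marks a key deleted but not yet evicted.
  private final ConcurrentSkipListMap<K, Optional<V>> cache = new ConcurrentSkipListMap<>();
  // epochEntries tracks deleted keys only, grouped by the epoch that deleted them.
  private final ConcurrentSkipListMap<Long, NavigableSet<K>> epochEntries = new ConcurrentSkipListMap<>();
  // Only the most recently notified epoch survives between cleanup runs.
  private final ConcurrentSkipListSet<Long> epochCleanupQueue = new ConcurrentSkipListSet<>();
  private final ReadWriteLock lock = new ReentrantReadWriteLock();
  private final ScheduledExecutorService scheduler = Executors.newSingleThreadScheduledExecutor();

  public FullTableCacheSketch() {
    // Timer-based cleanup: one fixed-delay task drains whatever epoch is pending,
    // instead of submitting a new worker task for every cleanup() call.
    scheduler.scheduleWithFixedDelay(this::cleanupTask, 0, 100, TimeUnit.MILLISECONDS);
  }

  public void put(K key, Optional<V> value, long epoch) {
    // Read lock only: parallel puts are already safe on ConcurrentSkipListMap;
    // the lock just keeps puts from racing with the non-atomic eviction below.
    lock.readLock().lock();
    try {
      cache.put(key, value);
      if (!value.isPresent()) {
        // Track epoch entries only for deletes (value marked null / empty).
        epochEntries.computeIfAbsent(epoch, e -> new ConcurrentSkipListSet<>()).add(key);
      }
    } finally {
      lock.readLock().unlock();
    }
  }

  public void cleanup(List<Long> epochs) {
    // Keep only the latest epoch; it implies all earlier epochs are flushed to DB.
    epochCleanupQueue.clear();
    epochCleanupQueue.add(epochs.get(epochs.size() - 1));
  }

  private void cleanupTask() {
    Long latest = epochCleanupQueue.pollLast();
    if (latest != null) {
      evictCache(latest);
    }
  }

  private void evictCache(long latestEpoch) {
    // One write lock taken outside the loop over epochs, reducing lock churn.
    lock.writeLock().lock();
    try {
      ConcurrentNavigableMap<Long, NavigableSet<K>> old = epochEntries.headMap(latestEpoch, true);
      for (NavigableSet<K> deletedKeys : old.values()) {
        for (K key : deletedKeys) {
          // Evict only if the entry is still marked deleted in the cache.
          cache.computeIfPresent(key, (k, v) -> v.isPresent() ? v : null);
        }
      }
      old.clear();
    } finally {
      lock.writeLock().unlock();
    }
  }
}
```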

What is the link to the Apache JIRA

https://issues.apache.org/jira/browse/HDDS-11201

How was this patch tested?

  • Test cases updated to cover the changes.

@sumitagrawl sumitagrawl marked this pull request as draft July 18, 2024 11:55
@sumitagrawl sumitagrawl marked this pull request as ready for review July 18, 2024 13:29
Contributor

@sadanand48 sadanand48 left a comment

Thanks @sumitagrawl for this patch. LGTM.

@sadanand48 sadanand48 merged commit d38372a into apache:master Aug 2, 2024
50 checks passed
@@ -111,18 +116,31 @@ public void loadInitial(CacheKey<KEY> key, CacheValue<VALUE> value) {
   @Override
   public void put(CacheKey<KEY> cacheKey, CacheValue<VALUE> value) {
     try {
-      lock.writeLock().lock();
+      lock.readLock().lock();
Contributor

Currently, there are no parallel calls to the TableCache#put method. What will happen if someone makes parallel put calls with the same key? Will we end up in an inconsistent state here with the readLock?

Contributor Author

Currently there is no such use case, but if someone does make parallel calls, the last one wins because the collection is a ConcurrentSkipListMap, so there is no corruption (see the small illustration below).
This is the same situation as PartialTableCache, which has the same logic but no lock. The lock's purpose was the eviction issue reported in the HDDS-4583 Jira, where the target problem is described; hence a write lock is not required here.
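
For illustration only (this program is not from the PR; ParallelPutDemo and the key name are invented), a tiny stand-alone example of the "last writer wins, no corruption" behaviour of ConcurrentSkipListMap under parallel puts to the same key:

```java
import java.util.concurrent.ConcurrentSkipListMap;
import java.util.concurrent.CountDownLatch;

/** Toy demo: racing put() calls on a ConcurrentSkipListMap never corrupt it. */
public class ParallelPutDemo {
  public static void main(String[] args) throws InterruptedException {
    ConcurrentSkipListMap<String, Long> cache = new ConcurrentSkipListMap<>();
    CountDownLatch start = new CountDownLatch(1);
    Runnable writer = () -> {
      try {
        start.await();
      } catch (InterruptedException e) {
        Thread.currentThread().interrupt();
        return;
      }
      for (long epoch = 0; epoch < 10_000; epoch++) {
        cache.put("/vol1/bucket1", epoch);    // same key, two racing writers
      }
    };
    Thread t1 = new Thread(writer);
    Thread t2 = new Thread(writer);
    t1.start();
    t2.start();
    start.countDown();
    t1.join();
    t2.join();
    // The map stays consistent: a single entry holding whichever write landed last.
    System.out.println(cache);                // {/vol1/bucket1=9999}
  }
}
```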

     }
   }

   @Override
   public void cleanup(List<Long> epochs) {
-    executorService.execute(() -> evictCache(epochs));
+    epochCleanupQueue.clear();
Contributor

What will happen to the epochs that were in the epochCleanupQueue but were not picked up by the cleanup tasks?

Contributor Author

We give preference to the latest epoch: if the latest epoch has been received, all previous epochs have already been notified. Because of this, intermediate epochs are not needed; the eviction logic uses only the latest epoch to handle all previous epochs and perform the cleanup (see the small example below).
Checking for sequential epochs is no longer required, and the logic has been modified accordingly.
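
A toy example (again not PR code; LatestEpochDemo and the key names are invented) of why skipped intermediate epochs need no separate handling: one sweep of epochEntries up to the latest epoch covers every earlier epoch at once.

```java
import java.util.NavigableSet;
import java.util.concurrent.ConcurrentSkipListMap;
import java.util.concurrent.ConcurrentSkipListSet;

/** Toy demo: cleaning up with only the latest epoch clears all older epochs. */
public class LatestEpochDemo {
  public static void main(String[] args) {
    ConcurrentSkipListMap<Long, NavigableSet<String>> epochEntries = new ConcurrentSkipListMap<>();
    epochEntries.computeIfAbsent(1L, e -> new ConcurrentSkipListSet<>()).add("/vol1");
    epochEntries.computeIfAbsent(3L, e -> new ConcurrentSkipListSet<>()).add("/vol2");
    epochEntries.computeIfAbsent(5L, e -> new ConcurrentSkipListSet<>()).add("/vol3");

    long latestEpoch = 5L;                        // epochs 2 and 4 were never queued
    // One sweep over headMap(latest) covers epochs 1, 3 and 5 together; the
    // epochs do not have to arrive, or be processed, sequentially.
    epochEntries.headMap(latestEpoch, true).clear();

    System.out.println(epochEntries.isEmpty());   // true: nothing left for old epochs
  }
}
```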
