
HDDS-11201. optimize fulltable cache eviction, scheduler and lock #6962

Merged
merged 2 commits into apache:master on Aug 2, 2024

Conversation

sumitagrawl
Contributor

What changes were proposed in this pull request?

Full-table cache eviction is used to clean up a cache entry when its value is null and the same deletion has been persisted to the DB. The current logic keeps and iterates over all epochs to perform this update.

As an optimization, the logic is refactored as follows (a minimal combined sketch is shown after this list):

  • Keep epochEntries only for the case where a key is deleted (i.e. the value is null, marking it for deletion).

  • The lock taken in put only needs to be a read lock: protection is required because cleanup of a cache entry (which is not atomic) can race with the put logic, but parallel puts themselves are safe.
    1. The cache with cleanup policy NEVER uses a ConcurrentSkipListMap, and its computeIfPresent is not atomic, so there can be a race condition between cleanup and requests adding to the cache. (This might clean up entries that are not yet flushed to the DB, which would be a correctness issue.)
    Reference: HDDS-4583

  • The write lock is moved up, outside the loop over epochEntries, to avoid repeated lock/unlock cycles and their memory operations; the loop body itself is very fast, so this reduces lock contention.

  • The scheduler is changed to be timer based, to avoid continually adding new tasks to the worker queue. Eviction also needs only the most recent epoch to clear older epochs from epochEntries; it is not mandatory to process every epoch in order for cleanup.

With the above changes (see the combined sketch below):

  • epochEntries are kept only for the delete case, not for new or modified key-values. Since deletes are rare for volumes and buckets, this avoids most per-epoch cleanup work.
  • If cleanup suffers lock starvation, the scheduler's worker queue does not grow, keeping memory usage minimal and consistent.
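
Roughly, these pieces fit together as in the minimal Java sketch below. This is not the actual FullTableCache code; the class name FullTableCacheSketch, the 100 ms timer interval, and the use of Optional for values are illustrative assumptions for this example only.

```java
import java.util.List;
import java.util.NavigableSet;
import java.util.Optional;
import java.util.concurrent.ConcurrentNavigableMap;
import java.util.concurrent.ConcurrentSkipListMap;
import java.util.concurrent.ConcurrentSkipListSet;
import java.util.concurrent.Executors;
import java.util.concurrent.ScheduledExecutorService;
import java.util.concurrent.TimeUnit;
import java.util.concurrent.locks.ReadWriteLock;
import java.util.concurrent.locks.ReentrantReadWriteLock;

/** Simplified, stand-alone sketch of the cache changes described above. */
public class FullTableCacheSketch<K extends Comparable<K>, V> {

  // Cache entries; an empty Optional marks a key deleted but not yet evicted.
  private final ConcurrentSkipListMap<K, Optional<V>> cache = new ConcurrentSkipListMap<>();
  // epochEntries tracks deleted keys only, grouped by the epoch that deleted them.
  private final ConcurrentSkipListMap<Long, NavigableSet<K>> epochEntries = new ConcurrentSkipListMap<>();
  // Only the most recently notified epoch survives between cleanup runs.
  private final ConcurrentSkipListSet<Long> epochCleanupQueue = new ConcurrentSkipListSet<>();
  private final ReadWriteLock lock = new ReentrantReadWriteLock();
  private final ScheduledExecutorService scheduler = Executors.newSingleThreadScheduledExecutor();

  public FullTableCacheSketch() {
    // Timer-based cleanup: one fixed-delay task drains whatever epoch is pending,
    // instead of submitting a new worker task for every cleanup() call.
    scheduler.scheduleWithFixedDelay(this::cleanupTask, 0, 100, TimeUnit.MILLISECONDS);
  }

  public void put(K key, Optional<V> value, long epoch) {
    // Read lock only: parallel puts are already safe on ConcurrentSkipListMap;
    // the lock just keeps puts from racing with the non-atomic eviction below.
    lock.readLock().lock();
    try {
      cache.put(key, value);
      if (!value.isPresent()) {
        // Track epoch entries only for deletes (value marked null / empty).
        epochEntries.computeIfAbsent(epoch, e -> new ConcurrentSkipListSet<>()).add(key);
      }
    } finally {
      lock.readLock().unlock();
    }
  }

  public void cleanup(List<Long> epochs) {
    // Keep only the latest epoch; it implies all earlier epochs are flushed to DB.
    epochCleanupQueue.clear();
    epochCleanupQueue.add(epochs.get(epochs.size() - 1));
  }

  private void cleanupTask() {
    Long latest = epochCleanupQueue.pollLast();
    if (latest != null) {
      evictCache(latest);
    }
  }

  private void evictCache(long latestEpoch) {
    // One write lock taken outside the loop over epochs, reducing lock churn.
    lock.writeLock().lock();
    try {
      ConcurrentNavigableMap<Long, NavigableSet<K>> old = epochEntries.headMap(latestEpoch, true);
      for (NavigableSet<K> deletedKeys : old.values()) {
        for (K key : deletedKeys) {
          // Evict only if the entry is still marked deleted in the cache.
          cache.computeIfPresent(key, (k, v) -> v.isPresent() ? v : null);
        }
      }
      old.clear();
    } finally {
      lock.writeLock().unlock();
    }
  }
}
```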

What is the link to the Apache JIRA

https://issues.apache.org/jira/browse/HDDS-11201

How was this patch tested?

  • Test cases updated to cover the changes.

@sumitagrawl sumitagrawl marked this pull request as draft July 18, 2024 11:55
@sumitagrawl sumitagrawl marked this pull request as ready for review July 18, 2024 13:29
Contributor

@sadanand48 sadanand48 left a comment

Thanks @sumitagrawl for this patch. LGTM.

@sadanand48 sadanand48 merged commit d38372a into apache:master Aug 2, 2024
50 checks passed
@@ -111,18 +116,31 @@ public void loadInitial(CacheKey<KEY> key, CacheValue<VALUE> value) {
   @Override
   public void put(CacheKey<KEY> cacheKey, CacheValue<VALUE> value) {
     try {
-      lock.writeLock().lock();
+      lock.readLock().lock();
Contributor

Currently, there are no parallel calls to the TableCache#put method. What will happen if someone makes parallel put calls with the same key? Will we end up in an inconsistent state here with the readLock?

Contributor Author

Currently there is no such use case, but if someone does make parallel calls, the last one wins because the collection is a ConcurrentSkipListMap, so there is no corruption (see the small illustration below).
This is the same situation as PartialTableCache, which has the same logic but no lock. The lock's purpose was the eviction issue reported in the HDDS-4583 Jira, where the target problem is described; hence a write lock is not required here.
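
For illustration only (this program is not from the PR; ParallelPutDemo and the key name are invented), a tiny stand-alone example of the "last writer wins, no corruption" behaviour of ConcurrentSkipListMap under parallel puts to the same key:

```java
import java.util.concurrent.ConcurrentSkipListMap;
import java.util.concurrent.CountDownLatch;

/** Toy demo: racing put() calls on a ConcurrentSkipListMap never corrupt it. */
public class ParallelPutDemo {
  public static void main(String[] args) throws InterruptedException {
    ConcurrentSkipListMap<String, Long> cache = new ConcurrentSkipListMap<>();
    CountDownLatch start = new CountDownLatch(1);
    Runnable writer = () -> {
      try {
        start.await();
      } catch (InterruptedException e) {
        Thread.currentThread().interrupt();
        return;
      }
      for (long epoch = 0; epoch < 10_000; epoch++) {
        cache.put("/vol1/bucket1", epoch);    // same key, two racing writers
      }
    };
    Thread t1 = new Thread(writer);
    Thread t2 = new Thread(writer);
    t1.start();
    t2.start();
    start.countDown();
    t1.join();
    t2.join();
    // The map stays consistent: a single entry holding whichever write landed last.
    System.out.println(cache);                // {/vol1/bucket1=9999}
  }
}
```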

     }
   }

   @Override
   public void cleanup(List<Long> epochs) {
-    executorService.execute(() -> evictCache(epochs));
+    epochCleanupQueue.clear();
Contributor

What will happen to the epochs that were in the epochCleanupQueue but were not picked up by the cleanup tasks?

Contributor Author

We give preference to the latest epoch: if the latest epoch has been received, all previous epochs have already been notified. Because of this, intermediate epochs are not needed; the eviction logic uses only the latest epoch to handle all previous epochs and perform the cleanup (see the small example below).
Checking for sequential epochs is no longer required, and the logic has been modified accordingly.
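
A toy example (again not PR code; LatestEpochDemo and the key names are invented) of why skipped intermediate epochs need no separate handling: one sweep of epochEntries up to the latest epoch covers every earlier epoch at once.

```java
import java.util.NavigableSet;
import java.util.concurrent.ConcurrentSkipListMap;
import java.util.concurrent.ConcurrentSkipListSet;

/** Toy demo: cleaning up with only the latest epoch clears all older epochs. */
public class LatestEpochDemo {
  public static void main(String[] args) {
    ConcurrentSkipListMap<Long, NavigableSet<String>> epochEntries = new ConcurrentSkipListMap<>();
    epochEntries.computeIfAbsent(1L, e -> new ConcurrentSkipListSet<>()).add("/vol1");
    epochEntries.computeIfAbsent(3L, e -> new ConcurrentSkipListSet<>()).add("/vol2");
    epochEntries.computeIfAbsent(5L, e -> new ConcurrentSkipListSet<>()).add("/vol3");

    long latestEpoch = 5L;                        // epochs 2 and 4 were never queued
    // One sweep over headMap(latest) covers epochs 1, 3 and 5 together; the
    // epochs do not have to arrive, or be processed, sequentially.
    epochEntries.headMap(latestEpoch, true).clear();

    System.out.println(epochEntries.isEmpty());   // true: nothing left for old epochs
  }
}
```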
