HDDS-11201. optimize fulltable cache eviction, scheduler and lock #6962
Conversation
Thanks @sumitagrawl for this patch. LGTM.
@@ -111,18 +116,31 @@ public void loadInitial(CacheKey<KEY> key, CacheValue<VALUE> value) {
   @Override
   public void put(CacheKey<KEY> cacheKey, CacheValue<VALUE> value) {
     try {
-      lock.writeLock().lock();
+      lock.readLock().lock();
Currently, there are no parallel calls to the TableCache#put method. What will happen if someone makes parallel put calls with the same key? Will we end up in an inconsistent state here with the readLock?
Currently there is no such use case, but if someone does make parallel calls, the last one wins since the collection is a ConcurrentSkipListMap, so there is no corruption.
The same holds for PartialTableCache, which has the same logic but no lock. The lock's purpose was the eviction issue reported in the HDDS-4583 Jira, where the target problem is described. Hence a write lock is not required here.
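To illustrate the locking intent, a minimal sketch (not the actual FullTableCache code; the field names cache and lock are assumptions modeled on the diff above): parallel puts share the read lock and rely on the concurrent map for per-key atomicity, while eviction takes the write lock so it can never interleave with a put.

```java
import java.util.concurrent.ConcurrentSkipListMap;
import java.util.concurrent.locks.ReentrantReadWriteLock;

// Minimal sketch of the read/write-lock split discussed above.
// Many threads may hold the read lock at once, so puts run in
// parallel; the concurrent map makes each put atomic per key
// (last writer wins, no corruption). Eviction takes the write
// lock, so no put can interleave with its compound check-then-act.
class LockSketch<K extends Comparable<K>, V> {
  private final ConcurrentSkipListMap<K, V> cache = new ConcurrentSkipListMap<>();
  private final ReentrantReadWriteLock lock = new ReentrantReadWriteLock();

  void put(K key, V value) {
    lock.readLock().lock();        // shared: parallel puts allowed
    try {
      cache.put(key, value);
    } finally {
      lock.readLock().unlock();
    }
  }

  void evictIfUnchanged(K key, V expected) {
    lock.writeLock().lock();       // exclusive: blocks all puts
    try {
      // Compound check-then-act; the write lock keeps a concurrent
      // put from slipping in between the check and the remove.
      if (expected.equals(cache.get(key))) {
        cache.remove(key);
      }
    } finally {
      lock.writeLock().unlock();
    }
  }
}
```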
     }
   }

   @Override
   public void cleanup(List<Long> epochs) {
-    executorService.execute(() -> evictCache(epochs));
+    epochCleanupQueue.clear();
What will happen to the epochs that were in the epochCleanupQueue which were not picked up by the cleanupTasks?
We give preference to the latest epoch: since the latest epoch has been received, all previous epochs have already been notified. Given this, intermediate epochs are not required; in the eviction logic, only the latest epoch is used to cover all previous epochs and drive the cleanup action.
The check for sequential epochs is no longer required, and the logic has been modified accordingly.
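A short sketch of that idea (the epochEntries shape and key type are assumptions for illustration): because receiving epoch N implies every epoch <= N was already notified, eviction can drop the whole head of the map in one pass instead of walking each intermediate epoch.

```java
import java.util.Map;
import java.util.Set;
import java.util.concurrent.ConcurrentSkipListMap;

// Sketch: epochEntries maps an epoch to the keys marked deleted at
// that epoch. Receiving epoch N means every epoch <= N was already
// notified, so one pass over the head view covers all intermediate
// epochs; they never need to be tracked or processed individually.
class EpochEvictionSketch {
  private final ConcurrentSkipListMap<Long, Set<String>> epochEntries =
      new ConcurrentSkipListMap<>();

  void evictUpTo(long latestEpoch) {
    // headMap(latestEpoch, true) views all entries with epoch <= latestEpoch.
    Map<Long, Set<String>> covered = epochEntries.headMap(latestEpoch, true);
    covered.forEach((epoch, keys) -> {
      // remove each deleted key from the cache here (elided)
    });
    covered.clear();   // drop every covered epoch in one shot
  }
}
```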
What changes were proposed in this pull request?
FullTable cache eviction is used to clean up an entry in the cache when its value is null (marked for deletion) and the same change has been persisted to the DB. The current logic keeps and iterates over all epochs on every update.
As an optimization, the logic is refactored as follows:
- Keep epochEntries only for the case where a key is deleted (i.e. its value is null, marking it for deletion).
- The lock in put only needs to be a read lock; protection is required between cleanup of a cache entry (which is not thread-safe) and the put logic, while parallel puts themselves are safe.
  1. Cleanup policy NEVER uses a ConcurrentSkipListMap, and its computeIfPresent is not atomic, so there can be a race condition between cleanup and requests adding to the cache. (This might clean up entries which are not yet flushed to the DB, causing a correctness issue.)
  Reference: HDDS-4583
- The write lock is moved to a higher level, outside the loop over epochEntries, to avoid repeated write lock/unlock and the associated memory operations; the loop body becomes very fast, reducing lock contention.
- The scheduler is changed to be timer-based, to avoid continually adding new tasks to the worker. Also, only the most recent epoch is needed to clear older epochs from epochEntries during eviction, since it is not mandatory to have all epochs in order for cleanup (see the sketch below).
With the above changes, eviction work is minimized and lock contention on the put path is reduced.
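For the timer-based scheduler point above, a hedged sketch (queue name, period, and method names are assumptions; the actual patch defines its own): one fixed-delay task drains whatever epochs accumulated since the last tick and evicts once using only the newest.

```java
import java.util.List;
import java.util.concurrent.ConcurrentLinkedQueue;
import java.util.concurrent.Executors;
import java.util.concurrent.ScheduledExecutorService;
import java.util.concurrent.TimeUnit;

// Sketch: instead of submitting one worker task per cleanup() call,
// a single fixed-delay timer drains the queue each tick and runs one
// eviction using only the latest epoch seen, which covers the rest.
class TimerCleanupSketch {
  private final ConcurrentLinkedQueue<Long> epochCleanupQueue =
      new ConcurrentLinkedQueue<>();
  private final ScheduledExecutorService scheduler =
      Executors.newSingleThreadScheduledExecutor();

  TimerCleanupSketch() {
    // The 100 ms period is an arbitrary choice for this sketch.
    scheduler.scheduleWithFixedDelay(
        this::drainAndEvict, 100, 100, TimeUnit.MILLISECONDS);
  }

  void cleanup(List<Long> epochs) {
    epochCleanupQueue.addAll(epochs);   // cheap; no per-call task
  }

  private void drainAndEvict() {
    Long latest = null;
    for (Long e; (e = epochCleanupQueue.poll()) != null; ) {
      if (latest == null || e > latest) {
        latest = e;
      }
    }
    if (latest != null) {
      evictCache(latest);   // the latest epoch covers all earlier ones
    }
  }

  private void evictCache(long latestEpoch) {
    // eviction over epochEntries would go here (elided)
  }
}
```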
What is the link to the Apache JIRA
https://issues.apache.org/jira/browse/HDDS-11201
How was this patch tested?