HDDS-1499. OzoneManager Cache. #798
Conversation
💔 -1 overall
This message was automatically generated.
* @return - TableStore.
* @throws IOException on Failure
*/
<KEY, VALUE> Table<KEY, VALUE> getTable(String name,
-    Class<KEY> keyType, Class<VALUE> valueType) throws IOException;
+    Class<KEY> keyType, Class<VALUE> valueType,
+    TableCache.CACHETYPE cachetype) throws IOException;
Why do we need an externally visible TableCache.CACHETYPE? Shouldn't this be an implementation detail of the tables that have a cache?
Added this because for a few tables, like the bucket and volume tables, the plan is to maintain full table information in the cache; for other tables we maintain a partial cache; and for a few tables we don't want to maintain a cache at all. (This is a common interface for all tables in Ozone SCM/OM, so having this option helps indicate which kind of cache should be used for each table.)
These tables are used to validate almost every operation in OM, so caching them might speed up validations like checking whether a bucket/volume exists.
Removed the cache type.
* If the cacheKey already exists, it will override the entry.
* @param cacheKey
* @param cacheValue
*/
well, I was really hoping that the fact that there is a cache is not visible to the layer that is reading and writing.
Is there a reason why that should be exposed to calling applications?
After the operation is executed in applyTransaction, just before releasing the lock and sending the response to the client, we need to add the response to the cache. That way, validation of subsequent read/write requests can be done against the cache/DB data.
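A minimal, self-contained sketch of the ordering described above; the class, lock, and method names are hypothetical stand-ins for illustration, not the actual OM API:

```java
import java.util.Optional;
import java.util.concurrent.ConcurrentHashMap;

// Sketch only: the result is cached BEFORE the lock is released and the
// response is sent, so later validations observe state that the double
// buffer has not yet flushed to RocksDB. All names are hypothetical.
public class ApplySketch {
    static final ConcurrentHashMap<String, Optional<String>> cache =
        new ConcurrentHashMap<>();

    static String applyTransaction(String key, String value) {
        synchronized (ApplySketch.class) { // stands in for the OM lock
            // ... execute the operation against the state machine here ...
            cache.put(key, Optional.ofNullable(value)); // cache before replying
            return "OK"; // response goes out only after the cache update
        }
    }

    public static void main(String[] args) {
        applyTransaction("vol1/bucket1", "bucketInfo");
        // A subsequent validation reads the cache, not the unflushed DB.
        System.out.println(cache.get("vol1/bucket1").orElse("MISSING"));
    }
}
```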
thx makes sense.
this.rawTable = rawTable;
this.codecRegistry = codecRegistry;
this.keyType = keyType;
this.valueType = valueType;
if (cachetype == TableCache.CACHETYPE.FULLCACHE) {
It is impossible for the user to tell you a priori whether they want a full cache or a partial cache. When you start a cluster you always want a full cache. We should take a cache size -- or a percentage of memory from the OM cache size -- and use that if needed. Or, for the time being, rely on RocksDB doing the right thing.
Thanks, Anu for the comment, removed the cache type.
} else {
  // If the cache value's last operation is DELETED, the key will
  // eventually be removed from the DB, so we should return null.
  if (cacheValue.getLastOperation() != CacheValue.OperationType.DELETED) {
Why do we even cache delete operations? Delete is not in the performance-critical path at all. If you can instruct the system to make a full commit or flush the buffer when there is a delete op, you don't need to keep this extra state in the cache. Yes, repeated deletes will invoke the state machine callback. When do we actually flush/clear this entry?
    }
  }
} else {
  return getFromTable(key);
Not sure if you need this get again?
For tables where the cache is disabled, we need to behave as before: just read from the DB and return the data.
Understood the comment, updated the code to remove getTable in multiple places.
@Private
@Evolving
public class PartialTableCache<CACHEKEY, CACHEVALUE>
    implements TableCache<CACHEKEY, CACHEVALUE> {
Not sure if you have seen this, https://github.com/facebook/rocksdb/wiki/Block-Cache
We already do this cache control in RocksDB. I am not sure we should do this twice. Unless you have a lookup problem that cannot be solved by hashing or prefix lookup, we will use memory more efficiently by relying on the underlying layer; moreover, having a unified cache layer will lead to better cache utilization.
The cache is maintained for correctness purposes: for reads and for validation of subsequent requests. The cache will be cleared once we flush to the DB.
  return cache.size();
}

private void evictCache(long epoch) {
Shouldn't a key be evicted if it was a delete operation and the state machine commit has taken place?
Yes, the key will be evicted once the double buffer flushes to disk.
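A sketch of the epoch-based eviction discussed here, under the assumption (illustrative, not the PR's exact data structure) that each cache entry records the epoch that created it; once the double buffer has flushed everything up to `epoch`, all entries at or below it can be dropped:

```java
import java.util.concurrent.ConcurrentSkipListMap;

// Hypothetical sketch: a sorted map from epoch to cache key makes
// eviction after a double-buffer flush a cheap prefix clear.
public class EvictSketch {
    static final ConcurrentSkipListMap<Long, String> epochEntries =
        new ConcurrentSkipListMap<>();

    static void evictCache(long epoch) {
        // headMap(epoch, true) is the view of entries flushed to disk.
        epochEntries.headMap(epoch, true).clear();
    }

    public static void main(String[] args) {
        epochEntries.put(1L, "key1");
        epochEntries.put(2L, "key2");
        epochEntries.put(3L, "key3");
        evictCache(2L); // double buffer flushed epochs 1..2
        System.out.println(epochEntries.size()); // only epoch 3 remains
    }
}
```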
Thanks @bharatviswa504 for working on this.
@@ -60,6 +62,9 @@ void putWithBatch(BatchOperation batch, KEY key, VALUE value)
* Returns the value mapped to the given key in byte array or returns null
* if the key is not found.
*
+ * First it will check the cache; if it has an entry, return the value.
+ * Otherwise, get it from the RocksDB table.
*
The RDBTable implementation of Table does not check the cache. We should probably move this statement to TypedTable which implements the cache.
Done
// If the current entry's epoch is greater than the specified epoch, we
// have deleted all entries less than that epoch, so we can break.
if (currentEntry.getEpoch() > epoch) {
We can avoid the 2nd if check and put it in else block.
Done
GenericTestUtils.waitFor(() ->
    ((TypedTable<String, String>) testTable).getCache().size() == 4,
    100, 5000);
}
Can we also check that the entries remaining in the cache are the expected entries?
Done
@Override
public void put(CACHEKEY cacheKey, CACHEVALUE value) {
  cache.put(cacheKey, value);
  CacheValue cacheValue = (CacheValue) cache.get(cacheKey);
Instead of casting the cache.get() result to CacheValue, I think CACHEVALUE itself should extend CacheValue so that it is guaranteed that the value part of TableCache is an instance of CacheValue. The same applies to CACHEKEY.
Done.
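The bounded-generics suggestion above can be sketched as follows; `CacheKey` and `CacheValue` here are simplified stand-ins for the real classes, not the PR's actual code:

```java
import java.util.HashMap;
import java.util.Map;

// Sketch: bounding CACHEKEY/CACHEVALUE means cache.get() already
// returns a CacheValue subtype, so no cast is needed.
public class BoundSketch {
    static class CacheKey<K> {
        final K key;
        CacheKey(K k) { key = k; }
        @Override public int hashCode() { return key.hashCode(); }
        @Override public boolean equals(Object o) {
            return o instanceof CacheKey && ((CacheKey<?>) o).key.equals(key);
        }
    }
    static class CacheValue<V> {
        final V value;
        CacheValue(V v) { value = v; }
    }

    static class TableCache<CACHEKEY extends CacheKey<?>,
                            CACHEVALUE extends CacheValue<?>> {
        private final Map<CACHEKEY, CACHEVALUE> cache = new HashMap<>();
        void put(CACHEKEY k, CACHEVALUE v) { cache.put(k, v); }
        CACHEVALUE get(CACHEKEY k) { return cache.get(k); } // no cast needed
    }

    public static void main(String[] args) {
        TableCache<CacheKey<String>, CacheValue<String>> c = new TableCache<>();
        c.put(new CacheKey<>("k"), new CacheValue<>("v"));
        System.out.println(c.get(new CacheKey<>("k")).value);
    }
}
```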
Thank You @hanishakoneru for the review.
Added a few comments. Overall the patch looks quite good to me.
}

/**
 * Last happened Operation.
Bharat, what if we support further operation types in the future? I was wondering whether we really need this lastOperation field.
Removed this lastOperation field.
/**
 * Defines the type of cache to be used by OM RocksDB tables.
 */
enum CACHETYPE {
One piece of feedback from Anu was to not differentiate between full and partial cache at initialization, so the caller does not have to make a choice.
Instead we try to limit the cache size via configuration and load as many entries as possible. If we cannot load all entries, we could set a flag indicating that it is a partial cache. The implementation will certainly be a bit more complex.
I don't have a strong opinion either way; I am +1 with the current approach also.
On second thought, I am leaning towards removing this distinction.
Actually, this cache is used for correctness: for validating further operations and for read correctness. (Once the double buffer comes into place, we don't commit to the DB immediately.)
Also, since we clean up the cache after a flush, I am not sure what the use of pre-loading entries would be. This cache is mainly for correctness.
Okay let's skip the full cache then for now. We can add it later as an optimization. All caches will be partial, and if an entry is not found in the cache we always go to RocksDB. It should simplify the patch a bit. What do you think?
The logic for get() actually takes care of that. I feel that for the bucket/volume tables, having a full cache will be useful since they are used for all request validations. If the entry is in the cache, validation is a quick operation instead of a disk access.
Removed FullCache; now the only implementation of TableCache is PartialCache.
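The agreed lookup path (check the partial cache first, fall back to RocksDB on a miss) can be sketched like this; the `db` map is an illustrative stand-in for the RocksDB table, not the real API:

```java
import java.util.HashMap;
import java.util.Map;
import java.util.Optional;

// Sketch only: on a cache miss we always go to the underlying table,
// so the cache never needs to be complete for reads to be correct.
public class GetSketch {
    static final Map<String, Optional<String>> cache = new HashMap<>();
    static final Map<String, String> db = new HashMap<>(); // stand-in for RocksDB

    static String get(String key) {
        Optional<String> cached = cache.get(key);
        if (cached != null) {
            // Hit: the value may be empty if the last operation was a delete.
            return cached.orElse(null);
        }
        return db.get(key); // miss: read from the DB
    }

    public static void main(String[] args) {
        db.put("onDisk", "1");
        cache.put("inCache", Optional.of("2"));
        System.out.println(get("onDisk") + "," + get("inCache"));
    }
}
```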
hadoop-hdds/common/src/main/java/org/apache/hadoop/utils/db/cache/FullTableCache.java
@Override
public void cleanup(long epoch) {
  // Do nothing
Can you throw NotImplementedYetException here to make it clear that the implementation is pending?
For the full table cache we never want to clean up, so it is a do-nothing operation.
Added this to the javadoc.
@@ -245,42 +247,50 @@ protected DBStoreBuilder addOMTablesAndCodecs(DBStoreBuilder builder) {
*/
protected void initializeOmTables() throws IOException {
  userTable =
-      this.store.getTable(USER_TABLE, String.class, VolumeList.class);
+      this.store.getTable(USER_TABLE, String.class, VolumeList.class,
This is one argument for why we should not have partial/full distinction. 🙂 It makes initialization of the tables a little clunky.
Some tables in SCM do not need a cache; that is the reason for introducing CacheType.
Also, some tables, like the deleted table, are used by a background thread in OM and do not need a cache.
And for the volume/bucket tables we plan to keep the entire table info, and cleanup for them is a no-op even after flushing to the DB, whereas for a partial cache the entries are cleaned up. This is the only difference between a full table cache and a partial table cache.
Done. Callers no longer need to specify a CacheType.
hadoop-hdds/common/src/main/java/org/apache/hadoop/utils/db/TypedTable.java
if (cacheValue.getLastOperation() != CacheValue.OperationType.DELETED) {
  return cacheValue.getValue();
} else {
  return null;
Instead of making the delete operation a special case, in the case of a delete we can just push a null value into the cache. Then it will automatically return null.
Yes, we can pass a null value as the CacheValue's actual value. This will work.
Done.
Thanks. Also, instead of null we can pass Optional.absent(); it is the same thing, but it makes it very clear that the value can be missing.
Done.
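The "delete stores an empty value" approach agreed above can be sketched as follows. The PR uses Guava's Optional.absent(); this dependency-free sketch substitutes the JDK's java.util.Optional.empty(), and the map is an illustrative stand-in for the table cache:

```java
import java.util.HashMap;
import java.util.Map;
import java.util.Optional;

// Sketch: a delete puts an empty Optional into the cache, so lookups
// naturally yield null with no OperationType special-casing.
public class DeleteAsEmptySketch {
    static final Map<String, Optional<String>> cache = new HashMap<>();

    static void put(String key, String value) {
        cache.put(key, Optional.ofNullable(value)); // delete => empty value
    }

    static String get(String key) {
        Optional<String> v = cache.get(key);
        // Deleted keys fall out here without any extra state.
        return (v == null) ? null : v.orElse(null);
    }

    public static void main(String[] args) {
        put("a", "1");
        put("a", null); // models a delete of key "a"
        System.out.println(get("a"));
    }
}
```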
Thank You @arp7 for the review.
Thank You @arp7 for the offline discussion.
It seems cache.put is never called except in test cases, so the cache will always be empty.
This jira adds the Table Cache; it is not integrated into the OM code yet. It will be integrated in further jiras, where we shall call this put API. The jira description has the same information.
hadoop-hdds/common/src/main/java/org/apache/hadoop/utils/db/TypedTable.java
hadoop-hdds/common/src/main/java/org/apache/hadoop/utils/db/TypedTable.java
hadoop-hdds/common/src/main/java/org/apache/hadoop/utils/db/TypedTable.java
hadoop-hdds/common/src/main/java/org/apache/hadoop/utils/db/cache/PartialTableCache.java
hadoop-hdds/common/src/main/java/org/apache/hadoop/utils/db/cache/TableCache.java
Thank You @arp7 for the review.
+1, LGTM. Probably should wait for comments from @arp7 and @hanishakoneru
@@ -44,6 +44,7 @@
*/
Table<byte[], byte[]> getTable(String name) throws IOException;
Nit: Space only change?
Committing this for now since Anu +1'ed.
+1 LGTM.