HDDS-1499. OzoneManager Cache. #798
Changes from 3 commits
DBStore.java:

@@ -24,6 +24,7 @@
 import java.util.ArrayList;

 import org.apache.hadoop.classification.InterfaceStability;
+import org.apache.hadoop.utils.db.cache.TableCache;

 /**
  * The DBStore interface provides the ability to create Tables, which store
@@ -44,17 +45,20 @@ public interface DBStore extends AutoCloseable {
    */
   Table<byte[], byte[]> getTable(String name) throws IOException;

   /**
    * Gets an existing TableStore with implicit key/value conversion.
    *
    * @param name - Name of the TableStore to get
    * @param keyType
    * @param valueType
+   * @param cachetype - Type of cache need to be used for this table.
    * @return - TableStore.
    * @throws IOException on Failure
    */
   <KEY, VALUE> Table<KEY, VALUE> getTable(String name,
-      Class<KEY> keyType, Class<VALUE> valueType) throws IOException;
+      Class<KEY> keyType, Class<VALUE> valueType,
+      TableCache.CACHETYPE cachetype) throws IOException;
Reviewer: Why do we need an externally visible TableCache.CACHETYPE? Shouldn't this be an implementation detail of the tables that have a cache?

Author: I added this because for a few tables, like the bucket and volume tables, the plan is to maintain the full table contents in the cache; for other tables we maintain a partial cache; and for a few tables we don't want to maintain a cache at all. (This is a common interface for all tables in Ozone SCM/OM, so having this option indicates which kind of cache should be used for each table.) These tables are used to validate almost every operation in OM, so this should speed up checks such as whether a bucket or volume exists.

Author: Removed the cache type.
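The full/partial/no-cache distinction argued for above can be sketched as a per-table policy lookup. This is a minimal illustration only: the `CachePolicySketch` class, the `CacheType` enum, and the table names are assumptions for the example, not the actual Ozone API (and, as the thread concludes, the knob was ultimately removed from the public interface).

```java
import java.util.HashMap;
import java.util.Map;

// Hypothetical sketch of choosing a cache type per table: small, hot tables
// (volume/bucket) keep a full cache, larger tables a partial one, and any
// table not listed gets no cache at all.
class CachePolicySketch {
  enum CacheType { FULL, PARTIAL, NONE }

  private static final Map<String, CacheType> POLICY = new HashMap<>();
  static {
    POLICY.put("volumeTable", CacheType.FULL);
    POLICY.put("bucketTable", CacheType.FULL);
    POLICY.put("keyTable", CacheType.PARTIAL);
  }

  static CacheType policyFor(String tableName) {
    // tables without an explicit policy default to no cache
    return POLICY.getOrDefault(tableName, CacheType.NONE);
  }
}
```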
   /**
    * Lists the Known list of Tables in a DB.
Table.java:

@@ -21,8 +21,10 @@
 import java.io.IOException;

+import org.apache.commons.lang3.NotImplementedException;
 import org.apache.hadoop.classification.InterfaceStability;

+import org.apache.hadoop.utils.db.cache.CacheKey;
+import org.apache.hadoop.utils.db.cache.CacheValue;
 /**
  * Interface for key-value store that stores ozone metadata. Ozone metadata is
  * stored as key value pairs, both key and value are arbitrary byte arrays. Each
@@ -60,6 +62,7 @@ void putWithBatch(BatchOperation batch, KEY key, VALUE value)
    * Returns the value mapped to the given key in byte array or returns null
    * if the key is not found.
    *
+   *
    * @param key metadata key
    * @return value in byte array or null if the key is not found.
    * @throws IOException on Failure

Reviewer: The RDBTable implementation of Table does not check the cache. We should probably move this statement to TypedTable, which implements the cache.

Author: Done.
@@ -97,6 +100,28 @@ void putWithBatch(BatchOperation batch, KEY key, VALUE value)
    */
   String getName() throws IOException;

+  /**
+   * Add entry to the table cache.
+   *
+   * If the cacheKey already exists, it will override the entry.
+   * @param cacheKey
+   * @param cacheValue
+   */
+  default void addCacheEntry(CacheKey<KEY> cacheKey,
+      CacheValue<VALUE> cacheValue) {
+    throw new NotImplementedException("addCacheEntry is not implemented");
+  }
+
+  /**
+   * Removes all the entries from the table cache which are having epoch value
+   * less than or equal to specified epoch value.
+   * @param epoch
+   */
+  default void cleanupCache(long epoch) {
+    throw new NotImplementedException("cleanupCache is not implemented");
+  }

Reviewer: Well, I was really hoping that the fact that there is a cache would not be visible to the layer that is reading and writing.

Author: Once the operation is executed in applyTransaction, just before releasing the lock and sending the response to the client, we need to add the response to the cache, so that validation for subsequent read/write requests can be done against the cache and DB data.

Reviewer: Thx, makes sense.
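The `cleanupCache(long epoch)` contract above (evict every cached entry whose epoch is at or below the given value) can be sketched with a plain map. `EpochCacheSketch` below is an illustrative stand-in, not the actual cache implementation in this PR; as noted later in the thread, the epoch would come from the Ratis transaction-log entry index.

```java
import java.util.Map;
import java.util.concurrent.ConcurrentHashMap;

// Hypothetical sketch of epoch-based cache cleanup: each entry remembers the
// epoch (e.g. a Ratis log index) at which it was cached, and cleanup(epoch)
// drops every entry already flushed to the DB at that point.
class EpochCacheSketch<K, V> {
  private static final class Entry<V> {
    final V value;
    final long epoch;
    Entry(V value, long epoch) { this.value = value; this.epoch = epoch; }
  }

  private final Map<K, Entry<V>> map = new ConcurrentHashMap<>();

  void put(K key, V value, long epoch) {
    map.put(key, new Entry<>(value, epoch));
  }

  V get(K key) {
    Entry<V> e = map.get(key);
    return e == null ? null : e.value;
  }

  // Remove all entries whose epoch is <= the given epoch.
  void cleanup(long epoch) {
    map.values().removeIf(e -> e.epoch <= epoch);
  }

  int size() { return map.size(); }
}
```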
   /**
    * Class used to represent the key and value pair of a db entry.
    */
TypedTable.java:

@@ -20,6 +20,13 @@
 import java.io.IOException;

+import com.google.common.annotations.VisibleForTesting;
+import org.apache.hadoop.utils.db.cache.CacheKey;
+import org.apache.hadoop.utils.db.cache.CacheValue;
+import org.apache.hadoop.utils.db.cache.FullTableCache;
+import org.apache.hadoop.utils.db.cache.PartialTableCache;
+import org.apache.hadoop.utils.db.cache.TableCache;

 /**
  * Strongly typed table implementation.
  * <p>
@@ -31,22 +38,40 @@
  */
 public class TypedTable<KEY, VALUE> implements Table<KEY, VALUE> {

-  private Table<byte[], byte[]> rawTable;
-  private CodecRegistry codecRegistry;
-  private Class<KEY> keyType;
-  private Class<VALUE> valueType;
+  private final Table<byte[], byte[]> rawTable;
+  private final CodecRegistry codecRegistry;
+  private final Class<KEY> keyType;
+  private final Class<VALUE> valueType;
+  private final TableCache<CacheKey<KEY>, CacheValue<VALUE>> cache;

+  public TypedTable(
+      Table<byte[], byte[]> rawTable,
+      CodecRegistry codecRegistry, Class<KEY> keyType,
+      Class<VALUE> valueType) {
+    this(rawTable, codecRegistry, keyType, valueType, null);
+  }

   public TypedTable(
       Table<byte[], byte[]> rawTable,
       CodecRegistry codecRegistry, Class<KEY> keyType,
-      Class<VALUE> valueType) {
+      Class<VALUE> valueType, TableCache.CACHETYPE cachetype) {
     this.rawTable = rawTable;
     this.codecRegistry = codecRegistry;
     this.keyType = keyType;
     this.valueType = valueType;
+    if (cachetype == TableCache.CACHETYPE.FULLCACHE) {
+      cache = new FullTableCache<>();
+    } else if (cachetype == TableCache.CACHETYPE.PARTIALCACHE) {
+      cache = new PartialTableCache<>();
+    } else {
+      cache = null;
+    }
   }

   @Override

Reviewer: It is impossible for the user to tell you a priori whether they want a full cache or a partial cache. When you start a cluster you always want a full cache. We should take a cache size -- or a percentage of memory from the OM cache size -- and use that if needed. Or, for the time being, rely on RocksDB doing the right thing.

Author: Thanks, Anu, for the comment; removed the cache type.
@@ -69,8 +94,40 @@ public boolean isEmpty() throws IOException {
     return rawTable.isEmpty();
   }

+  /**
+   * Returns the value mapped to the given key in byte array or returns null
+   * if the key is not found.
+   *
+   * First it will check from cache, if it has entry return the value
+   * otherwise, get from the RocksDB table.
+   *
+   * @param key metadata key
+   * @return VALUE
+   * @throws IOException
+   */
   @Override
   public VALUE get(KEY key) throws IOException {
+    // Here the metadata lock will guarantee that cache is not updated for same
+    // key during get key.
+    if (cache != null) {
+      CacheValue<VALUE> cacheValue = cache.get(new CacheKey<>(key));
+      if (cacheValue == null) {
+        return getFromTable(key);
+      } else {
+        // Doing this because, if the Cache Value Last operation is deleted,
+        // it will eventually be removed from DB. So, we should return null.
+        if (cacheValue.getLastOperation() != CacheValue.OperationType.DELETED) {
+          return cacheValue.getValue();
+        } else {
+          return null;
+        }
+      }
+    } else {
+      return getFromTable(key);
+    }
+  }

+  private VALUE getFromTable(KEY key) throws IOException {
+    byte[] keyBytes = codecRegistry.asRawData(key);
+    byte[] valueBytes = rawTable.get(keyBytes);
+    return codecRegistry.asObject(valueBytes, valueType);

Reviewer: Why do we even cache the deleted operations? Delete is not in the performance-critical path at all. If you can instruct the system to make a full commit or flush the buffer when there is a delete op, you don't need to keep this extra state in the cache. Yes, repeated deletes will call the state machine callback. When do we actually flush/clear this entry?

Reviewer: Instead of making the delete operation a special case, in the case of delete we can just push a null value into the cache, so it will automatically return null.

Author: Yes, we can pass a null value as the CacheValue's actual value. This will work.

Reviewer: Thanks. Also, instead of null we can pass in Optional.absent(); same thing, but it makes it very clear that the value can be missing.

Author: Done.

Reviewer: Not sure if you need this get again?

Author: For tables where the cache is disabled, we need to do as before: just read from the DB and return the data.

Author: Understood the comment; updated the code to remove getTable in multiple places.
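The read path above, combined with the review suggestion to cache an absent value for deletes instead of a special `DELETED` flag, boils down to a read-through lookup. The sketch below uses `java.util.Optional` in place of Guava's `Optional.absent()`; the `ReadThroughSketch` class and its string maps are illustrative stand-ins, not the TypedTable code.

```java
import java.util.HashMap;
import java.util.Map;
import java.util.Optional;

// Hypothetical read-through sketch: the cache maps a key to Optional.empty()
// when the last operation on it was a delete, so the lookup resolves to null
// without consulting the backing table.
class ReadThroughSketch {
  private final Map<String, Optional<String>> cache = new HashMap<>();
  private final Map<String, String> db = new HashMap<>();

  void cachePut(String key, Optional<String> value) { cache.put(key, value); }
  void dbPut(String key, String value) { db.put(key, value); }

  String get(String key) {
    Optional<String> cached = cache.get(key);
    if (cached == null) {
      return db.get(key);       // cache miss: fall through to the table
    }
    return cached.orElse(null); // cached delete resolves to null
  }
}
```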
@@ -106,6 +163,40 @@ public void close() throws Exception {

   }

+  @Override
+  public void addCacheEntry(CacheKey<KEY> cacheKey,
+      CacheValue<VALUE> cacheValue) {
+    // This will override the entry if there is already an entry for this key.
+    cache.put(cacheKey, cacheValue);
+  }
+
+  @Override
+  public void cleanupCache(long epoch) {
+    cache.cleanup(epoch);
+  }
+
+  @VisibleForTesting
+  TableCache<CacheKey<KEY>, CacheValue<VALUE>> getCache() {
+    return cache;
+  }
+
+  public Table<byte[], byte[]> getRawTable() {
+    return rawTable;
+  }
+
+  public CodecRegistry getCodecRegistry() {
+    return codecRegistry;
+  }
+
+  public Class<KEY> getKeyType() {
+    return keyType;
+  }
+
+  public Class<VALUE> getValueType() {
+    return valueType;
+  }

   /**
    * Key value implementation for strongly typed tables.
    */
CacheKey.java (new file):

@@ -0,0 +1,56 @@
/**
 * Licensed to the Apache Software Foundation (ASF) under one
 * or more contributor license agreements. See the NOTICE file
 * distributed with this work for additional information
 * regarding copyright ownership. The ASF licenses this file
 * to you under the Apache License, Version 2.0 (the
 * "License"); you may not use this file except in compliance
 * with the License. You may obtain a copy of the License at
 *
 * http://www.apache.org/licenses/LICENSE-2.0
 *
 * Unless required by applicable law or agreed to in writing, software
 * distributed under the License is distributed on an "AS IS" BASIS,
 * WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
 * See the License for the specific language governing permissions and
 * limitations under the License.
 */

package org.apache.hadoop.utils.db.cache;

import java.util.Objects;

/**
 * CacheKey for the RocksDB table.
 * @param <KEY>
 */
public class CacheKey<KEY> {

  private final KEY key;

  public CacheKey(KEY key) {
    Objects.requireNonNull(key, "Key Should not be null in CacheKey");
    this.key = key;
  }

  public KEY getKey() {
    return key;
  }

  @Override
  public boolean equals(Object o) {
    if (this == o) {
      return true;
    }
    if (o == null || getClass() != o.getClass()) {
      return false;
    }
    CacheKey<?> cacheKey = (CacheKey<?>) o;
    return Objects.equals(key, cacheKey.key);
  }

  @Override
  public int hashCode() {
    return Objects.hash(key);
  }
}
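Because `CacheKey` delegates `equals` and `hashCode` to the wrapped key, two distinct `CacheKey` instances wrapping equal keys address the same map slot, which is what lets the table cache be keyed by value. A small self-contained demo, using a trimmed copy of the class above (the demo class and the sample key string are illustrative only):

```java
import java.util.HashMap;
import java.util.Map;
import java.util.Objects;

class CacheKeyDemo {
  // trimmed copy of CacheKey, kept local so the demo is standalone
  static final class CacheKey<K> {
    private final K key;
    CacheKey(K key) { this.key = Objects.requireNonNull(key); }
    @Override
    public boolean equals(Object o) {
      if (this == o) {
        return true;
      }
      if (o == null || getClass() != o.getClass()) {
        return false;
      }
      return Objects.equals(key, ((CacheKey<?>) o).key);
    }
    @Override
    public int hashCode() { return Objects.hash(key); }
  }

  // Two separate CacheKey objects wrapping equal strings collapse
  // to a single map entry.
  static int demo() {
    Map<CacheKey<String>, Integer> map = new HashMap<>();
    map.put(new CacheKey<>("/vol1/bucket1"), 1);
    map.put(new CacheKey<>("/vol1/bucket1"), 2); // overwrites the first entry
    return map.size();
  }
}
```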
CacheValue.java (new file):

@@ -0,0 +1,63 @@
/**
 * Licensed to the Apache Software Foundation (ASF) under one
 * or more contributor license agreements. See the NOTICE file
 * distributed with this work for additional information
 * regarding copyright ownership. The ASF licenses this file
 * to you under the Apache License, Version 2.0 (the
 * "License"); you may not use this file except in compliance
 * with the License. You may obtain a copy of the License at
 *
 * http://www.apache.org/licenses/LICENSE-2.0
 *
 * Unless required by applicable law or agreed to in writing, software
 * distributed under the License is distributed on an "AS IS" BASIS,
 * WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
 * See the License for the specific language governing permissions and
 * limitations under the License.
 */

package org.apache.hadoop.utils.db.cache;

import java.util.Objects;

/**
 * CacheValue for the RocksDB Table.
 * @param <VALUE>
 */
public class CacheValue<VALUE> {

  private VALUE value;
  private OperationType lastOperation;
  // This value is used for evict entries from cache.
  // This value is set with ratis transaction context log entry index.
  private long epoch;

  public CacheValue(VALUE value, OperationType lastOperation, long epoch) {
    Objects.requireNonNull(value, "Value Should not be null in CacheValue");
    this.value = value;
    this.lastOperation = lastOperation;
    this.epoch = epoch;
  }

  public VALUE getValue() {
    return value;
  }

  public OperationType getLastOperation() {
    return lastOperation;
  }

  public long getEpoch() {
    return epoch;
  }

  /**
   * Last happened Operation.
   */
  public enum OperationType {
    CREATED,
    UPDATED,
    DELETED
  }

}

Reviewer: Bharat, what if we support further operation types in the future? I was wondering whether we really need this lastOperation field.

Author: Removed the lastOperation field.
Reviewer: Nit: Space-only change?

Reviewer: Committing this for now since Anu +1'ed.