HDDS-1499. OzoneManager Cache. #798

Merged
merged 8 commits on May 20, 2019
Changes from 3 commits
@@ -24,6 +24,7 @@
import java.util.ArrayList;

import org.apache.hadoop.classification.InterfaceStability;
import org.apache.hadoop.utils.db.cache.TableCache;

/**
* The DBStore interface provides the ability to create Tables, which store
@@ -44,17 +45,20 @@ public interface DBStore extends AutoCloseable {
*/
Table<byte[], byte[]> getTable(String name) throws IOException;


Contributor:

Nit: Space only change?

Contributor:

Committing this for now since Anu +1'ed.

/**
* Gets an existing TableStore with implicit key/value conversion.
*
* @param name - Name of the TableStore to get
* @param keyType - Class of the key type
* @param valueType - Class of the value type
* @param cachetype - Type of cache to be used for this table.
* @return - TableStore.
* @throws IOException on Failure
*/
<KEY, VALUE> Table<KEY, VALUE> getTable(String name,
Class<KEY> keyType, Class<VALUE> valueType) throws IOException;
Class<KEY> keyType, Class<VALUE> valueType,
TableCache.CACHETYPE cachetype) throws IOException;
Contributor:

Why do we need an externally visible TableCache.CACHETYPE? Shouldn't this be an implementation detail of the Tables that have a cache?

Contributor (Author):

Added this because for a few tables, like the bucket and volume tables, the plan is to maintain full table information; for other tables we maintain a partial cache; and for a few tables we don't want to maintain a cache at all. (This is a common interface for all tables in Ozone SCM/OM, so having this option helps indicate which kind of cache should be used for each table.)

These tables are consulted to validate almost every operation in OM, so caching them should speed up checks such as whether a bucket or volume exists.
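To make the per-table choice concrete, here is a hedged usage sketch against the getTable signature above; the table names and String value types are illustrative assumptions, not taken from this patch.

import java.io.IOException;

import org.apache.hadoop.utils.db.DBStore;
import org.apache.hadoop.utils.db.Table;
import org.apache.hadoop.utils.db.cache.TableCache;

public final class CacheTypeUsageSketch {

  private CacheTypeUsageSketch() {
  }

  static void openTables(DBStore store) throws IOException {
    // Small, hot tables consulted on almost every OM request:
    // keep the whole table in memory.
    Table<String, String> volumeTable = store.getTable(
        "volumeTable", String.class, String.class,
        TableCache.CACHETYPE.FULLCACHE);

    // Large tables: cache only the entries not yet flushed to RocksDB.
    Table<String, String> keyTable = store.getTable(
        "keyTable", String.class, String.class,
        TableCache.CACHETYPE.PARTIALCACHE);
  }
}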

Contributor (Author):

Removed the cache type.


/**
* Lists the Known list of Tables in a DB.
@@ -38,6 +38,7 @@
import org.apache.hadoop.utils.RocksDBStoreMBean;

import com.google.common.base.Preconditions;
import org.apache.hadoop.utils.db.cache.TableCache;
import org.apache.ratis.thirdparty.com.google.common.annotations.VisibleForTesting;
import org.rocksdb.ColumnFamilyDescriptor;
import org.rocksdb.ColumnFamilyHandle;
@@ -258,9 +259,10 @@ public Table<byte[], byte[]> getTable(String name) throws IOException {

@Override
public <KEY, VALUE> Table<KEY, VALUE> getTable(String name,
Class<KEY> keyType, Class<VALUE> valueType) throws IOException {
Class<KEY> keyType, Class<VALUE> valueType,
TableCache.CACHETYPE cachetype) throws IOException {
return new TypedTable<KEY, VALUE>(getTable(name), codecRegistry, keyType,
valueType);
valueType, cachetype);
}

@Override
@@ -21,8 +21,10 @@

import java.io.IOException;

import org.apache.commons.lang3.NotImplementedException;
import org.apache.hadoop.classification.InterfaceStability;

import org.apache.hadoop.utils.db.cache.CacheKey;
import org.apache.hadoop.utils.db.cache.CacheValue;
/**
* Interface for key-value store that stores ozone metadata. Ozone metadata is
* stored as key value pairs, both key and value are arbitrary byte arrays. Each
@@ -60,6 +62,7 @@ void putWithBatch(BatchOperation batch, KEY key, VALUE value)
* Returns the value mapped to the given key in byte array or returns null
* if the key is not found.
*
*
Contributor:

The RDBTable implementation of Table does not check the cache. We should probably move this statement to TypedTable which implements the cache.

Contributor (Author):

Done

* @param key metadata key
* @return value in byte array or null if the key is not found.
* @throws IOException on Failure
@@ -97,6 +100,28 @@ void putWithBatch(BatchOperation batch, KEY key, VALUE value)
*/
String getName() throws IOException;

/**
* Add entry to the table cache.
*
* If the cacheKey already exists, the existing entry is overwritten.
* @param cacheKey - key of the cache entry
* @param cacheValue - value of the cache entry
*/
Contributor:

Well, I was really hoping that the existence of the cache would not be visible to the layer that reads and writes.
Is there a reason why it should be exposed to calling applications?

Contributor (Author):

After the operation is executed in applyTransaction, just before releasing the lock and sending a response to the client, we need to add the response to the cache, so that validation for subsequent read/write requests can be done against cache/DB data.

Contributor:

thx makes sense.

default void addCacheEntry(CacheKey<KEY> cacheKey,
CacheValue<VALUE> cacheValue) {
throw new NotImplementedException("addCacheEntry is not implemented");
}
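A hedged sketch of the write path the author describes in the thread above: the OM request handler (names below are illustrative) applies the change to the cache with the Ratis log index as the epoch before replying, and the RocksDB write is batched and flushed later.

import java.io.IOException;

import org.apache.hadoop.utils.db.Table;
import org.apache.hadoop.utils.db.cache.CacheKey;
import org.apache.hadoop.utils.db.cache.CacheValue;

public class ApplyTransactionSketch {

  private final Table<String, String> bucketTable;

  public ApplyTransactionSketch(Table<String, String> bucketTable) {
    this.bucketTable = bucketTable;
  }

  // Called from applyTransaction while holding the bucket lock.
  public void createBucket(String bucketKey, String bucketInfo, long logIndex)
      throws IOException {
    if (bucketTable.get(bucketKey) != null) {
      // get() consults the cache first, so it sees entries that are
      // not yet flushed to RocksDB.
      throw new IOException("Bucket already exists: " + bucketKey);
    }
    bucketTable.addCacheEntry(new CacheKey<>(bucketKey),
        new CacheValue<>(bucketInfo, CacheValue.OperationType.CREATED,
            logIndex));
    // ... release the lock and respond to the client; the actual RocksDB
    // put happens later as part of a batched flush ...
  }

  // Once the batch up to flushedIndex is durable in RocksDB, evict the
  // cache entries that the DB read path can now serve.
  public void onBatchFlushed(long flushedIndex) {
    bucketTable.cleanupCache(flushedIndex);
  }
}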

/**
* Removes all entries from the table cache whose epoch value is less than
* or equal to the specified epoch value.
* @param epoch - epoch up to which entries are evicted
*/
default void cleanupCache(long epoch) {
throw new NotImplementedException("cleanupCache is not implemented");
}
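The FullTableCache and PartialTableCache implementations are not part of this hunk; the following is only a sketch of how epoch-based cleanup could work, assuming a concurrent map keyed by cache key.

import java.util.Iterator;
import java.util.Map;
import java.util.concurrent.ConcurrentHashMap;

// Hedged sketch only: the real PartialTableCache in this PR may differ.
public class PartialCacheSketch<K, V> {

  private final Map<K, Entry<V>> map = new ConcurrentHashMap<>();

  static final class Entry<V> {
    final V value;
    final long epoch;

    Entry(V value, long epoch) {
      this.value = value;
      this.epoch = epoch;
    }
  }

  public void put(K key, V value, long epoch) {
    map.put(key, new Entry<>(value, epoch));
  }

  public V get(K key) {
    Entry<V> e = map.get(key);
    return e == null ? null : e.value;
  }

  // Remove every entry whose epoch is <= the given epoch: those writes
  // have been flushed to RocksDB, so the DB read path now serves them.
  public void cleanup(long epoch) {
    Iterator<Map.Entry<K, Entry<V>>> it = map.entrySet().iterator();
    while (it.hasNext()) {
      if (it.next().getValue().epoch <= epoch) {
        it.remove();
      }
    }
  }
}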

/**
* Class used to represent the key and value pair of a db entry.
*/
@@ -20,6 +20,13 @@

import java.io.IOException;

import com.google.common.annotations.VisibleForTesting;
import org.apache.hadoop.utils.db.cache.CacheKey;
import org.apache.hadoop.utils.db.cache.CacheValue;
import org.apache.hadoop.utils.db.cache.FullTableCache;
import org.apache.hadoop.utils.db.cache.PartialTableCache;
import org.apache.hadoop.utils.db.cache.TableCache;

/**
* Strongly typed table implementation.
* <p>
@@ -31,22 +38,40 @@
*/
public class TypedTable<KEY, VALUE> implements Table<KEY, VALUE> {

private Table<byte[], byte[]> rawTable;
private final Table<byte[], byte[]> rawTable;

private final CodecRegistry codecRegistry;

private CodecRegistry codecRegistry;
private final Class<KEY> keyType;

private Class<KEY> keyType;
private final Class<VALUE> valueType;

private Class<VALUE> valueType;
private final TableCache<CacheKey<KEY>, CacheValue<VALUE>> cache;

public TypedTable(
Table<byte[], byte[]> rawTable,
CodecRegistry codecRegistry, Class<KEY> keyType,
Class<VALUE> valueType) {
this(rawTable, codecRegistry, keyType, valueType,
null);
}


public TypedTable(
Table<byte[], byte[]> rawTable,
CodecRegistry codecRegistry, Class<KEY> keyType,
Class<VALUE> valueType, TableCache.CACHETYPE cachetype) {
this.rawTable = rawTable;
this.codecRegistry = codecRegistry;
this.keyType = keyType;
this.valueType = valueType;
if (cachetype == TableCache.CACHETYPE.FULLCACHE) {
Contributor:

It is impossible for the user to tell you a priori whether they want a full cache or a partial cache. When you start a cluster you always want a full cache. We should get a cache size -- or get a percentage of memory from the OM cache size -- and use that if needed. Or, for the time being, rely on RocksDB doing the right thing.

Contributor (Author):

Thanks, Anu, for the comment; removed the cache type.

cache = new FullTableCache<>();
} else if (cachetype == TableCache.CACHETYPE.PARTIALCACHE) {
cache = new PartialTableCache<>();
} else {
cache = null;
}
}
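As a side note on the sizing idea raised in the thread above, a hedged sketch of deriving a bounded entry count from a heap fraction; the fraction and average entry size are illustrative assumptions, not values from this patch.

public final class CacheSizingSketch {

  private CacheSizingSketch() {
  }

  // E.g. maxCacheEntries(0.1, 1024): spend ~10% of the max heap on the
  // cache, assuming roughly 1 KB per cached entry.
  static long maxCacheEntries(double heapFraction, long avgEntryBytes) {
    long heapBytes = Runtime.getRuntime().maxMemory();
    return (long) (heapBytes * heapFraction) / avgEntryBytes;
  }
}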

@Override
@@ -69,8 +94,40 @@ public boolean isEmpty() throws IOException {
return rawTable.isEmpty();
}

/**
* Returns the value mapped to the given key in byte array or returns null
* if the key is not found.
*
* It first checks the cache; if the cache has an entry, that value is
* returned, otherwise the value is read from the RocksDB table.
*
* @param key metadata key
* @return VALUE
* @throws IOException
*/
@Override
public VALUE get(KEY key) throws IOException {
// The metadata lock guarantees that the cache is not updated for the
// same key while this get is in progress.
if (cache != null) {
CacheValue<VALUE> cacheValue = cache.get(new CacheKey<>(key));
if (cacheValue == null) {
return getFromTable(key);
} else {
// If the last operation on this cache value was a delete, the key will
// eventually be removed from the DB, so we should return null.
if (cacheValue.getLastOperation() != CacheValue.OperationType.DELETED) {
Contributor:

Why do we even cache the delete operations? Delete is not in the performance-critical path at all. If you can instruct the system to make a full commit or flush the buffer when there is a delete op, you don't need to keep this extra state in the cache. Yes, repeated deletes will call the state machine callback. When do we actually flush/clear this entry?

return cacheValue.getValue();
} else {
return null;
Contributor:

Instead of making the delete operation a special case, in the case of a delete we can just push a null value into the cache, so it will automatically return null.

Contributor (Author):

Yes, we can pass a null value as the CacheValue's actual value. That will work.
Done.

Contributor:

Thanks. Also, instead of null we can pass in Optional.absent(); same thing, but it makes it very clear that the value can be missing.

Contributor (Author):

Done.

}
}
} else {
return getFromTable(key);
Contributor:

Not sure if you need this get again?

Contributor (Author):

For tables where the cache is disabled, we need to do as before: just read from the DB and return the data.

Contributor (Author):

Understood the comment; updated the code to remove getTable in multiple places.

}
}
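A hedged sketch of the read path the threads above converge on: a delete is cached as an absent Optional instead of a special operation type, so the lookup needs no delete-specific branch. The map field here is only a stand-in for the real table cache.

import java.io.IOException;
import java.util.Map;

import com.google.common.base.Optional;

public class OptionalReadPathSketch<KEY, VALUE> {

  // Stand-in for the table cache; absent() marks a key deleted in the
  // cache but not yet removed from RocksDB.
  private final Map<KEY, Optional<VALUE>> cache;

  public OptionalReadPathSketch(Map<KEY, Optional<VALUE>> cache) {
    this.cache = cache;
  }

  public VALUE get(KEY key) throws IOException {
    Optional<VALUE> cached = cache.get(key);
    if (cached != null) {
      return cached.orNull(); // absent() => deleted => null
    }
    return getFromTable(key);
  }

  private VALUE getFromTable(KEY key) throws IOException {
    return null; // placeholder for the RocksDB read + decode
  }
}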

private VALUE getFromTable(KEY key) throws IOException {
byte[] keyBytes = codecRegistry.asRawData(key);
byte[] valueBytes = rawTable.get(keyBytes);
return codecRegistry.asObject(valueBytes, valueType);
@@ -106,6 +163,40 @@ public void close() throws Exception {

}

@Override
public void addCacheEntry(CacheKey<KEY> cacheKey,
CacheValue<VALUE> cacheValue) {
// This will overwrite the entry if there is already an entry for this key.
cache.put(cacheKey, cacheValue);
}


@Override
public void cleanupCache(long epoch) {
cache.cleanup(epoch);
}

@VisibleForTesting
TableCache<CacheKey<KEY>, CacheValue<VALUE>> getCache() {
return cache;
}

public Table<byte[], byte[]> getRawTable() {
return rawTable;
}

public CodecRegistry getCodecRegistry() {
return codecRegistry;
}

public Class<KEY> getKeyType() {
return keyType;
}

public Class<VALUE> getValueType() {
return valueType;
}

/**
* Key value implementation for strongly typed tables.
*/
@@ -0,0 +1,56 @@
/**
* Licensed to the Apache Software Foundation (ASF) under one
* or more contributor license agreements. See the NOTICE file
* distributed with this work for additional information
* regarding copyright ownership. The ASF licenses this file
* to you under the Apache License, Version 2.0 (the
* "License"); you may not use this file except in compliance
* with the License. You may obtain a copy of the License at
*
* http://www.apache.org/licenses/LICENSE-2.0
*
* Unless required by applicable law or agreed to in writing, software
* distributed under the License is distributed on an "AS IS" BASIS,
* WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
* See the License for the specific language governing permissions and
* limitations under the License.
*/

package org.apache.hadoop.utils.db.cache;

import java.util.Objects;

/**
* CacheKey for the RocksDB table.
* @param <KEY>
*/
public class CacheKey<KEY> {

private final KEY key;

public CacheKey(KEY key) {
Objects.requireNonNull(key, "Key Should not be null in CacheKey");
this.key = key;
}

public KEY getKey() {
return key;
}

@Override
public boolean equals(Object o) {
if (this == o) {
return true;
}
if (o == null || getClass() != o.getClass()) {
return false;
}
CacheKey<?> cacheKey = (CacheKey<?>) o;
return Objects.equals(key, cacheKey.key);
}

@Override
public int hashCode() {
return Objects.hash(key);
}
}
@@ -0,0 +1,63 @@
/**
* Licensed to the Apache Software Foundation (ASF) under one
* or more contributor license agreements. See the NOTICE file
* distributed with this work for additional information
* regarding copyright ownership. The ASF licenses this file
* to you under the Apache License, Version 2.0 (the
* "License"); you may not use this file except in compliance
* with the License. You may obtain a copy of the License at
*
* http://www.apache.org/licenses/LICENSE-2.0
*
* Unless required by applicable law or agreed to in writing, software
* distributed under the License is distributed on an "AS IS" BASIS,
* WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
* See the License for the specific language governing permissions and
* limitations under the License.
*/

package org.apache.hadoop.utils.db.cache;

import java.util.Objects;

/**
* CacheValue for the RocksDB Table.
* @param <VALUE>
*/
public class CacheValue<VALUE> {

private VALUE value;
private OperationType lastOperation;
// This value is used to evict entries from the cache.
// It is set to the Ratis transaction context log entry index.
private long epoch;

public CacheValue(VALUE value, OperationType lastOperation, long epoch) {
Objects.requireNonNull(value, "Value Should not be null in CacheValue");
this.value = value;
this.lastOperation = lastOperation;
this.epoch = epoch;
}

public VALUE getValue() {
return value;
}

public OperationType getLastOperation() {
return lastOperation;
}

public long getEpoch() {
return epoch;
}

/**
* Last operation performed on this entry.
Contributor:

Bharat, what if we support further operation types in the future? I was wondering whether we really need this lastOperation field.

Contributor (Author):

Removed this lastOperation field.

*/
public enum OperationType {
CREATED,
UPDATED,
DELETED
}

}
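Following the review comments, the lastOperation field was dropped; below is a hedged sketch of what CacheValue could look like after that change, with Guava's Optional carrying a possibly-deleted value. The exact final shape in the merged patch may differ.

package org.apache.hadoop.utils.db.cache;

import com.google.common.base.Optional;

// Hedged sketch of the revised CacheValue: no lastOperation field; a
// delete is represented by Optional.absent().
public class CacheValueSketch<VALUE> {

  private final Optional<VALUE> value;
  // Set from the Ratis transaction log entry index; entries whose epoch
  // is <= the flushed index are evicted by cleanupCache.
  private final long epoch;

  public CacheValueSketch(Optional<VALUE> value, long epoch) {
    this.value = value;
    this.epoch = epoch;
  }

  public VALUE getCacheValue() {
    return value.orNull();
  }

  public long getEpoch() {
    return epoch;
  }
}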