New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
HDDS-2380. Use the Table.isExist() API instead of get() API while checking for presence of key. #102
Conversation
…r pre-existing files in the directory path.
...con/src/main/java/org/apache/hadoop/ozone/recon/spi/impl/ContainerDBServiceProviderImpl.java
Show resolved
Hide resolved
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Overall LGTM. One minor comment.
And also can you confirm if UT failure is related, as I don't see UT's failing in other CI runs.
Changed the title of the JIRA instead :) |
Ready to go, if UT failures are not related. |
/retest |
@@ -60,15 +60,15 @@ public static OMDirectoryResult verifyFilesInPath( | |||
String dbDirKeyName = omMetadataManager.getOzoneDirKey(volumeName, | |||
bucketName, pathName); | |||
|
|||
if (omMetadataManager.getKeyTable().get(dbKeyName) != null) { | |||
if (omMetadataManager.getKeyTable().isExist(dbKeyName)) { |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
This patch looks correct, but I have a question about the earlier patch that went in to support this.
The documentation of keyMayExist -- says this.
public boolean keyMayExist(ColumnFamilyHandle columnFamilyHandle,
byte[] key,
java.lang.StringBuilder value)
If the key definitely does not exist in the database, then this method returns false, else true. This check is potentially lighter-weight than invoking DB::Get(). One way to make this lighter weight is to avoid doing any IOs.
Parameters:
columnFamilyHandle - ColumnFamilyHandle instance
key - byte array of a key to search for
value - StringBuilder instance which is a out parameter if a value is found in block-cache.
There is a value parameter in the call, which is documented as -- out parameter if the value is found in the block cache.
The objective of calling into this function, I suppose to leverage that -- Mind you the keyMayExist is not bloom filter lookup, it says If the value is in memory already you get access to it.
However, if you read the code -- it looks like we ignore the possiblity of gettting the value back from the block cache.
Here is the code that was committed via
"HDDS-1691 : RDBTable#isExist should use Rocksdb#keyMayExist"
// RocksDB#keyMayExist // If the key definitely does not exist in the database, then this // method returns false, else true. return db.keyMayExist(handle, key, new StringBuilder()) && db.get(handle, key) != null;
So how does this call eliminate the IO cost, since the new StringBuilder is completely ignored ?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Thank you @anuengineer. This is a valid observation. I will fix the working of the isExist method in #101 where we are modifying the implementation.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
yes, thanks @anuengineer for flagging it. Yes in true case if we get value from block cache we can use that.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
perfect, let me go ahead and commit this patch, since you will fix that issue in the #101 pull request.
I have committed this patch to the master branch. @bharatviswa504 Thanks for the review. @avijayanhwx Thank you for the contribution. |
What changes were proposed in this pull request?
Currently, when OM creates a file/directory, it checks the absence of all prefix paths of the key in its RocksDB. Since we don't care about the deserialization of the actual value from the DB, we should use the isExist API added in org.apache.hadoop.hdds.utils.db.Table which internally uses the more performant keyMayExist API of RocksDB. More documentation on the keyMayExist API can be found here This API was already used in the old OM create file flow. This patch adds it to the new OM code flow that uses Double buffering.
There is a similar check in Recon container DB. Refactored that as well.
What is the link to the Apache JIRA
https://issues.apache.org/jira/browse/HDDS-2380
How was this patch tested?
Manually tested.