HDDS-6428. [Merge rocksdb in datanode] Add prefix iterator support to RDBTable. #3176
nandakumar131 merged 2 commits into apache:HDDS-3630
Force-pushed from 07348c4 to 7109588.
@guihecheng can we please add an example of the entries we expect in the table? And what is the need for fixed-length prefix support? That will help in reviewing this further.
@mukul1987 Oh, sure. I'll soon add more description of the sample prefixed keys after rocksdb-merge and explain why we need fixed-length prefixes.
@mukul1987 Descriptions updated above, thanks~
nandakumar131 left a comment:
Overall the patch looks good @guihecheng. Thanks for working on this.
Can we avoid Arrays.copyOf here, since the value is not modified or passed outside? We can implement our own comparison logic instead.
```java
(prefix == null || Arrays.equals(
    Arrays.copyOf(rocksDBIterator.key(), prefix.length), prefix));
```

```java
private static boolean startsWith(byte[] prefix, byte[] value) {
  // A null prefix matches every key.
  if (prefix == null) {
    return true;
  }
  if (value == null) {
    return false;
  }
  int length = prefix.length;
  // A key shorter than the prefix can never start with it.
  if (value.length < length) {
    return false;
  }
  // Compare byte by byte, without copying the key.
  for (int i = 0; i < length; i++) {
    if (prefix[i] != value[i]) {
      return false;
    }
  }
  return true;
}
```
Ah, that's a good idea to avoid the copy, thanks~
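A minimal sketch comparing the two checks. Besides allocating a new array for every key, Arrays.copyOf zero-pads a key that is shorter than the prefix, so the copy-based comparison can report a false match when the prefix itself ends in a zero byte; the byte-wise startsWith rejects such keys through its length check. The class name and the copyBasedMatch helper are hypothetical, and the byte values are made up for illustration.

```java
import java.util.Arrays;

class PrefixMatchDemo {

  // Copy-based check as in the patch under review: copies the first
  // prefix.length bytes of the key, zero-padding keys that are shorter.
  static boolean copyBasedMatch(byte[] prefix, byte[] key) {
    return prefix == null
        || Arrays.equals(Arrays.copyOf(key, prefix.length), prefix);
  }

  // Allocation-free check following the reviewer's suggestion.
  static boolean startsWith(byte[] prefix, byte[] value) {
    if (prefix == null) {
      return true;   // no prefix: every key matches
    }
    if (value == null || value.length < prefix.length) {
      return false;  // a shorter key cannot contain the whole prefix
    }
    for (int i = 0; i < prefix.length; i++) {
      if (prefix[i] != value[i]) {
        return false;
      }
    }
    return true;
  }

  public static void main(String[] args) {
    byte[] prefix = {0, 0, 1, 0};        // ends in a zero byte
    byte[] shortKey = {0, 0, 1};         // shorter than the prefix
    byte[] longKey = {0, 0, 1, 0, '#'};  // really starts with the prefix
    System.out.println(copyBasedMatch(prefix, shortKey)); // true (false match)
    System.out.println(startsWith(prefix, shortKey));     // false
    System.out.println(startsWith(prefix, longKey));      // true
  }
}
```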
We don't need to associate prefixLength with RDBTable. A prefix should be passed as an argument; it is not applicable at the table level.
For getRangeKVs calls, we can use MetadataKeyFilters.MetadataKeyFilter to pass the prefix to be matched.
If we need the iterator-optimized prefix filter, we can add a new getSequentialRangeKVs method that also takes the prefix as an argument (since we cannot use MetadataKeyFilters.MetadataKeyFilter as a prefix filter when creating the iterator).
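Why a filter cannot drive iterator creation can be shown with a hypothetical in-memory sketch (not Ozone's API), assuming the filter is consulted only as a yes/no test per key: such an opaque predicate can only be applied by visiting every key, whereas a known prefix lets a sorted iterator seek straight to the first candidate and stop at the first key that no longer matches. A TreeMap with unsigned lexicographic byte ordering stands in for a RocksDB column family; KeyPredicate, scanWithPredicate, scanWithPrefix, and the key layout are invented for this example.

```java
import java.nio.ByteBuffer;
import java.nio.charset.StandardCharsets;
import java.util.ArrayList;
import java.util.Arrays;
import java.util.List;
import java.util.TreeMap;

class PrefixVsPredicate {

  // An opaque yes/no test per key: the scan cannot tell where matches start.
  interface KeyPredicate {
    boolean accept(byte[] key);
  }

  // Matching keys plus the number of keys the scan had to examine.
  static final class ScanResult {
    final List<byte[]> keys = new ArrayList<>();
    int visited;
  }

  static boolean startsWith(byte[] prefix, byte[] key) {
    if (key.length < prefix.length) {
      return false;
    }
    for (int i = 0; i < prefix.length; i++) {
      if (prefix[i] != key[i]) {
        return false;
      }
    }
    return true;
  }

  // With only a predicate, every key in the table must be examined.
  static ScanResult scanWithPredicate(TreeMap<byte[], byte[]> table, KeyPredicate filter) {
    ScanResult r = new ScanResult();
    for (byte[] key : table.keySet()) {
      r.visited++;
      if (filter.accept(key)) {
        r.keys.add(key);
      }
    }
    return r;
  }

  // With a prefix, seek to the first key >= prefix and stop at the first key
  // that does not match: keys sharing a prefix are contiguous in sort order.
  static ScanResult scanWithPrefix(TreeMap<byte[], byte[]> table, byte[] prefix) {
    ScanResult r = new ScanResult();
    for (byte[] key : table.tailMap(prefix, true).keySet()) {
      r.visited++;
      if (!startsWith(prefix, key)) {
        break;
      }
      r.keys.add(key);
    }
    return r;
  }

  // Hypothetical key layout: 8-byte big-endian container id + ASCII name.
  static byte[] key(long containerId, String name) {
    byte[] n = name.getBytes(StandardCharsets.US_ASCII);
    return ByteBuffer.allocate(Long.BYTES + n.length).putLong(containerId).put(n).array();
  }

  // Three containers with three metadata keys each, in unsigned byte order.
  static TreeMap<byte[], byte[]> sampleTable() {
    TreeMap<byte[], byte[]> table =
        new TreeMap<byte[], byte[]>((a, b) -> Arrays.compareUnsigned(a, b));
    for (long id = 1; id <= 3; id++) {
      for (String name : new String[] {"#BCSID", "#BLOCKCOUNT", "#BYTESUSED"}) {
        table.put(key(id, name), new byte[0]);
      }
    }
    return table;
  }

  public static void main(String[] args) {
    TreeMap<byte[], byte[]> table = sampleTable();
    byte[] prefix = key(2, "");
    ScanResult byPredicate = scanWithPredicate(table, k -> startsWith(prefix, k));
    ScanResult byPrefix = scanWithPrefix(table, prefix);
    System.out.println("predicate: matched=" + byPredicate.keys.size()
        + ", visited=" + byPredicate.visited); // predicate: matched=3, visited=9
    System.out.println("prefix: matched=" + byPrefix.keys.size()
        + ", visited=" + byPrefix.visited);    // prefix: matched=3, visited=4
  }
}
```

Both scans find the same three keys, but only the prefix-aware one can skip the other containers, which is the property a prefix iterator in RocksDB provides natively.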
Yes, it is better to avoid setting prefixLength at the table level.
Since I'm aiming for a unified interface across schemas, I'll try to pass the prefix through MetadataKeyFilters.MetadataKeyFilter rather than introduce a new method dedicated to schema V3.
OK... I found it hard to pass and then extract the prefix through MetadataKeyFilters.MetadataKeyFilter: the KeyFilter interface offers no convenient way to do that and is hard to refactor. Also, we haven't introduced fixed-length prefixes for schema V3 keys yet, and we can't assume a char encoding in general-purpose classes like MetadataKeyFilters.MetadataKeyFilter or RDBTable.
So I'm adding a new prefix parameter to the getRangeKVs and getSequentialRangeKVs interfaces instead.
It is painful to do so, but I would rather have abstracted interfaces for all schemas than if-else checks at each call site.
We should not set the prefix length at the Table level.
Pass the prefix as a new parameter to getRangeKVs.
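A hypothetical sketch of the direction agreed in this thread: a range scan that takes the prefix as an explicit argument, where null means "no prefix", so the same call site serves a per-container DB and a shared per-disk DB. The names rangeScan, KeyFilter, and containerPrefix are invented for this example and are not Ozone's getRangeKVs signature; a TreeMap with unsigned byte ordering stands in for the RocksDB table, and the 8-byte big-endian containerID layout is an assumption.

```java
import java.nio.ByteBuffer;
import java.nio.charset.StandardCharsets;
import java.util.ArrayList;
import java.util.Arrays;
import java.util.List;
import java.util.Map;
import java.util.TreeMap;

class UnifiedRangeScan {

  // Simplified stand-in for a key filter: a yes/no test per key.
  interface KeyFilter {
    boolean accept(byte[] key);
  }

  // A null prefix matches every key.
  static boolean startsWith(byte[] prefix, byte[] key) {
    return prefix == null || (key.length >= prefix.length
        && Arrays.equals(key, 0, prefix.length, prefix, 0, prefix.length));
  }

  // Hypothetical range scan with a prefix parameter: start at startKey (never
  // before the prefix), stop when leaving the prefix or after count results,
  // and apply any extra filters to the keys inside the range.
  static List<Map.Entry<byte[], byte[]>> rangeScan(TreeMap<byte[], byte[]> table,
      byte[] startKey, int count, byte[] prefix, KeyFilter... filters) {
    byte[] seek = startKey;
    if (prefix != null && (seek == null || Arrays.compareUnsigned(seek, prefix) < 0)) {
      seek = prefix;
    }
    Map<byte[], byte[]> view = (seek == null) ? table : table.tailMap(seek, true);
    List<Map.Entry<byte[], byte[]>> result = new ArrayList<>();
    for (Map.Entry<byte[], byte[]> e : view.entrySet()) {
      if (result.size() >= count || !startsWith(prefix, e.getKey())) {
        break;
      }
      boolean accepted = true;
      for (KeyFilter f : filters) {
        accepted = accepted && f.accept(e.getKey());
      }
      if (accepted) {
        result.add(e);
      }
    }
    return result;
  }

  // Hypothetical schema switch: a per-container DB needs no prefix, a shared
  // per-disk DB scopes every query with the 8-byte big-endian containerID.
  static byte[] containerPrefix(boolean sharedDb, long containerId) {
    return sharedDb ? ByteBuffer.allocate(Long.BYTES).putLong(containerId).array() : null;
  }

  static byte[] key(byte[] prefix, String name) {
    byte[] p = (prefix == null) ? new byte[0] : prefix;
    byte[] n = name.getBytes(StandardCharsets.US_ASCII);
    return ByteBuffer.allocate(p.length + n.length).put(p).put(n).array();
  }

  // One container's own DB, or a shared DB holding containers 4 and 5.
  static TreeMap<byte[], byte[]> sampleTable(boolean sharedDb) {
    TreeMap<byte[], byte[]> table =
        new TreeMap<byte[], byte[]>((a, b) -> Arrays.compareUnsigned(a, b));
    long[] ids = sharedDb ? new long[] {4, 5} : new long[] {4};
    for (long id : ids) {
      for (String name : new String[] {"#BCSID", "#BLOCKCOUNT", "#BYTESUSED"}) {
        table.put(key(containerPrefix(sharedDb, id), name), new byte[0]);
      }
    }
    return table;
  }

  public static void main(String[] args) {
    for (boolean shared : new boolean[] {false, true}) {
      // Same call for both layouts; only the (possibly null) prefix differs.
      int n = rangeScan(sampleTable(shared), null, 10, containerPrefix(shared, 4)).size();
      System.out.println((shared ? "shared DB: " : "per-container DB: ") + n);
    }
  }
}
```

Running main prints 3 for both layouts; in this sketch the schema difference is confined to containerPrefix instead of being repeated as if-else checks at every call site.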
Hi @nandakumar131, sorry for the force push: I rebased onto HDDS-3630 because the new changes would conflict with the already merged HDDS-6404. Here I did remove the
nandakumar131 left a comment:
Thanks @guihecheng for updating the PR.
What changes were proposed in this pull request?
Add prefix iterator support to RDBTable.
A sample from the metadata column family of the current per-container rocksdb instance is as follows:
"#BCSID" -> 9
"#BLOCKCOUNT" -> 1
"#BYTESUSED" -> 10240000
After rocksdb-merge, the same entries will look like this:
"\u0000\u0000\u0000\u0000\u0000\u0000\u0000\u0004#BCSID" -> 9
"\u0000\u0000\u0000\u0000\u0000\u0000\u0000\u0004#BLOCKCOUNT" -> 1
"\u0000\u0000\u0000\u0000\u0000\u0000\u0000\u0004#BYTESUSED" -> 10240000
The eight prefix characters are the containerID encoded in ISO-8859-1 (an extension of the ASCII charset).
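The sample keys can be reproduced with a short sketch, under the assumption (inferred from the eight-character prefix in the sample, not taken from the patch) that the containerID is written as an 8-byte big-endian long and the bytes are then decoded as ISO-8859-1. That charset maps every byte 0x00 to 0xFF to the character with the same value, so the prefix round-trips back to the same containerID without loss. The class and method names are hypothetical.

```java
import java.nio.ByteBuffer;
import java.nio.charset.StandardCharsets;

class ContainerKeyPrefix {

  // Assumed layout: 8-byte big-endian containerID, decoded as ISO-8859-1 so
  // that each byte becomes exactly one char.
  static String prefixFor(long containerId) {
    byte[] bytes = ByteBuffer.allocate(Long.BYTES).putLong(containerId).array();
    return new String(bytes, StandardCharsets.ISO_8859_1);
  }

  static String schemaV3Key(long containerId, String key) {
    return prefixFor(containerId) + key;
  }

  // Recovers the containerID from the first eight chars of a prefixed key.
  static long containerIdOf(String prefixedKey) {
    byte[] bytes = prefixedKey.substring(0, Long.BYTES)
        .getBytes(StandardCharsets.ISO_8859_1);
    return ByteBuffer.wrap(bytes).getLong();
  }

  // Renders chars outside printable ASCII as backslash-u escapes, like the sample.
  static String escape(String s) {
    StringBuilder sb = new StringBuilder();
    for (char c : s.toCharArray()) {
      if (c < 0x20 || c > 0x7e) {
        sb.append(String.format("\\u%04X", (int) c));
      } else {
        sb.append(c);
      }
    }
    return sb.toString();
  }

  public static void main(String[] args) {
    String key = schemaV3Key(4L, "#BCSID");
    System.out.println(escape(key));
    System.out.println(containerIdOf(key));
  }
}
```

Running main prints the first post-merge sample key exactly as listed above, followed by 4.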
After rocksdb-merge, all metadata KV pairs of every container on a disk will live in a single per-disk rocksdb instance.
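With all containers of a disk sharing one instance, the first eight bytes of every key identify its container, which is what a fixed-length prefix extractor in rocksdb's Prefix Seek feature keys on. One optimization described on the Prefix Seek wiki page is a Bloom filter built over extracted prefixes, which lets a seek skip files that cannot contain the target prefix. The sketch below emulates that idea in plain Java without calling RocksDB: each file's filter is an exact set (a real Bloom filter may return false positives), and FixedPrefix, fileIndex, and filesToSearch are hypothetical names. In RocksJava the related settings would presumably be ColumnFamilyOptions.useFixedLengthPrefixExtractor(8) and ReadOptions.setPrefixSameAsStart(true), but how this patch configures rocksdb is not shown in this thread.

```java
import java.nio.ByteBuffer;
import java.nio.charset.StandardCharsets;
import java.util.ArrayList;
import java.util.Arrays;
import java.util.HashSet;
import java.util.List;
import java.util.Set;

class FixedPrefixSketch {

  // Fixed-length prefix extraction: the prefix of a key is its first
  // `length` bytes, and keys shorter than that are out of domain.
  static final class FixedPrefix {
    final int length;

    FixedPrefix(int length) {
      this.length = length;
    }

    boolean inDomain(byte[] key) {
      return key.length >= length;
    }

    // Wrapped in a ByteBuffer so prefixes compare by content in a HashSet.
    ByteBuffer transform(byte[] key) {
      return ByteBuffer.wrap(Arrays.copyOf(key, length));
    }
  }

  // Hypothetical 8-byte big-endian containerID followed by an ASCII name.
  static byte[] key(long containerId, String name) {
    byte[] n = name.getBytes(StandardCharsets.US_ASCII);
    return ByteBuffer.allocate(Long.BYTES + n.length).putLong(containerId).put(n).array();
  }

  // Each set stands in for the prefix filter of one sorted file.
  static Set<ByteBuffer> fileIndex(FixedPrefix extractor, long... containerIds) {
    Set<ByteBuffer> prefixes = new HashSet<>();
    for (long id : containerIds) {
      for (String name : new String[] {"#BCSID", "#BLOCKCOUNT", "#BYTESUSED"}) {
        byte[] k = key(id, name);
        if (extractor.inDomain(k)) {
          prefixes.add(extractor.transform(k));
        }
      }
    }
    return prefixes;
  }

  static List<Set<ByteBuffer>> sampleFiles(FixedPrefix extractor) {
    List<Set<ByteBuffer>> files = new ArrayList<>();
    files.add(fileIndex(extractor, 1, 2));
    files.add(fileIndex(extractor, 3));
    files.add(fileIndex(extractor, 4, 5));
    return files;
  }

  // Number of files a seek would have to search; an out-of-domain lookup key
  // yields no prefix to test, so every file must be searched.
  static int filesToSearch(List<Set<ByteBuffer>> files, FixedPrefix extractor, byte[] lookup) {
    if (!extractor.inDomain(lookup)) {
      return files.size();
    }
    ByteBuffer prefix = extractor.transform(lookup);
    int n = 0;
    for (Set<ByteBuffer> file : files) {
      if (file.contains(prefix)) {
        n++;
      }
    }
    return n;
  }

  public static void main(String[] args) {
    FixedPrefix extractor = new FixedPrefix(Long.BYTES);
    List<Set<ByteBuffer>> files = sampleFiles(extractor);
    System.out.println(filesToSearch(files, extractor, key(4, "#BCSID")));  // 1
    System.out.println(filesToSearch(files, extractor, key(9, "#BCSID")));  // 0
    System.out.println(filesToSearch(files, extractor, new byte[] {0, 0})); // 3
  }
}
```

The fixed length matters here: because every key's container part is exactly eight bytes, the extractor can cut the prefix without knowing anything about the rest of the key.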
When we want to access the metadata KVs of a specific container, we have to do a seek operation in the column family to the right position.
Here we use a fixed-length prefix (an encoded containerID) so as to utilize the Prefix Seek feature of rocksdb to speed up the seek operation. FYI: https://github.com/facebook/rocksdb/wiki/Prefix-Seek
What is the link to the Apache JIRA
https://issues.apache.org/jira/browse/HDDS-6428
How was this patch tested?
Added new unit tests.