[core] add endkey to scan API #568
Comments
Any ideas on how we work around the hashed keys issue?
I can't think of a way to find a reliable endkey if the keys have been hashed. That's just the nature of hashing them: they scatter everywhere. The best idea I can think of is to check the properties. If we see One step further would be to check that configuration as well as the requested client and send the warning. But this would require upkeep whenever a new binding starts to take advantage of endkey. I guess one more idea would be to check the configuration in CoreWorkload and just send null as the endkey when it can't be computed, so that downstream clients can't use it. This would force binding developers to account for this scenario. Of course, this would need to be documented in the developers guide.
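The "send null as the endkey" idea could look roughly like the sketch below. It is a standalone illustration, not YCSB code: the class name, the `endKeyOrNull` helper, and the zero-padded key format are assumptions made for the example. Only the `insertorder` property and its `hashed` default come from this thread.

```java
import java.util.Properties;

// Hypothetical helper, not part of YCSB: decides whether an endkey can be offered.
class EndKeyPolicySketch {

    // Assumed key format: fixed-width zero padding so that numeric order
    // and lexicographic order agree for ordered inserts.
    static String orderedKey(long keyNum) {
        return String.format("user%019d", keyNum);
    }

    // Returns null when keys are hashed, because no endkey would reliably
    // bound exactly `len` records. Bindings must then fall back to recordcount.
    static String endKeyOrNull(Properties props, long startKeyNum, int len) {
        String insertOrder = props.getProperty("insertorder", "hashed");
        if ("hashed".equals(insertOrder)) {
            return null;
        }
        return orderedKey(startKeyNum + len - 1);
    }

    public static void main(String[] args) {
        Properties props = new Properties();
        System.out.println("default:  " + endKeyOrNull(props, 10, 5));
        props.setProperty("insertorder", "ordered");
        System.out.println("ordered:  " + endKeyOrNull(props, 10, 5));
    }
}
```

With this shape, a binding that ignores the null case fails fast during development instead of silently scanning the wrong range.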
Another negative consideration for this idea is even with I think this may mean this is a dead idea, and that startkey and recordcount are the best we can do to formulate a scan operation in most cases. If bindings can't completely fit that mold, then a performance warning, or simply not implementing scan, should suffice.
I think that part's not so bad. Scan by definition presumes the primary key is sorted, I think. Whether that happens in the way the datastore happens to store things isn't our concern. Bindings that have a problem with that already bail on implementing scan, so this doesn't make that any worse.
This and #402 are after the same thing. We should consolidate.
Currently the scan API gives the bindings a startkey and a recordcount. Some databases could benefit from having an endkey instead of the recordcount. OrientDB for sure could be more efficient in scans with an endkey, and I think HBase could do things in a safer way as well. There may be more.
My proposal would be to just add endkey into the scan API alongside recordcount so that bindings could use one or the other. Currently bindings have no easy way to decipher the endkey on their own.
doTransactionScan has all the information to easily provide the endkey without much more work: https://github.com/brianfrankcooper/YCSB/blob/master/core/src/main/java/com/yahoo/ycsb/workloads/CoreWorkload.java#L744. You would then have to update all the bindings, but it should be a change that doesn't affect much unless the binding chooses to use the endkey.
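To make the proposal concrete, here is a minimal standalone sketch of a scan call that carries both an endkey and a recordcount. It does not use the real YCSB classes: the `ScanBinding` interface, the `InMemoryBinding` store, the simplified `doTransactionScan`, and the zero-padded key format are all hypothetical, and it assumes ordered inserts.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.NavigableMap;
import java.util.TreeMap;

class EndKeyScanSketch {

    // Hypothetical binding interface: endKey is offered alongside recordCount,
    // and a binding may use whichever suits its datastore.
    interface ScanBinding {
        List<String> scan(String table, String startKey, String endKey, int recordCount);
    }

    // Assumed key format: zero padded so lexicographic order matches numeric order.
    static String buildKey(long keyNum) {
        return String.format("user%019d", keyNum);
    }

    // Toy sorted store standing in for a database that supports range scans.
    static final class InMemoryBinding implements ScanBinding {
        final NavigableMap<String, String> rows = new TreeMap<>();

        @Override
        public List<String> scan(String table, String startKey, String endKey, int recordCount) {
            if (endKey != null) {
                // Range scan bounded by the endkey (both ends inclusive).
                return new ArrayList<>(rows.subMap(startKey, true, endKey, true).keySet());
            }
            // Fallback: walk forward from startKey and stop after recordCount rows.
            List<String> out = new ArrayList<>();
            for (String key : rows.tailMap(startKey, true).keySet()) {
                if (out.size() >= recordCount) {
                    break;
                }
                out.add(key);
            }
            return out;
        }
    }

    // Simplified stand-in for the workload side: it already knows the start
    // key number and the scan length, so the endkey is one addition away.
    static List<String> doTransactionScan(ScanBinding db, long startKeyNum, int len) {
        String startKey = buildKey(startKeyNum);
        String endKey = buildKey(startKeyNum + len - 1);
        return db.scan("usertable", startKey, endKey, len);
    }

    public static void main(String[] args) {
        InMemoryBinding db = new InMemoryBinding();
        for (long i = 0; i < 100; i++) {
            db.rows.put(buildKey(i), "value" + i);
        }
        System.out.println(doTransactionScan(db, 10, 5));
    }
}
```

Bindings that don't care about the endkey can ignore the extra argument and keep their recordcount logic unchanged.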
A major consideration: with insertorder=hashed, which is the default, you can't provide a reliable endkey that matches recordcount because the keys are no longer lexicographically ordered. This may blow the whole idea out of the water; we would definitely need to put a large warning on this case.
Are there other considerations I missed?
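A small experiment shows why hashing breaks the endkey. The sketch below is an illustration under assumptions: it uses a 64-bit FNV-1a hash as a stand-in (not necessarily the hash YCSB applies), a zero-padded key format, and hypothetical helper names. It counts how often the naive endkey, built from start key number + scan length - 1, bounds exactly the requested number of stored keys.

```java
import java.util.NavigableSet;
import java.util.TreeSet;

class HashedOrderSketch {

    // Stand-in hash (64-bit FNV-1a over the 8 bytes of the value),
    // masked to be non-negative so it formats as a plain number.
    static long fnv1a64(long value) {
        long hash = 0xcbf29ce484222325L;
        for (int i = 0; i < 8; i++) {
            hash ^= (value & 0xff);
            hash *= 0x100000001b3L;
            value >>>= 8;
        }
        return hash & Long.MAX_VALUE;
    }

    // Assumed key format: zero padded to 19 digits.
    static String key(long keyNum, boolean hashed) {
        long n = hashed ? fnv1a64(keyNum) : keyNum;
        return String.format("user%019d", n);
    }

    // Number of stored keys in [startKey, endKey]; an inverted range counts as empty.
    static int countInRange(NavigableSet<String> stored, String startKey, String endKey) {
        if (startKey.compareTo(endKey) > 0) {
            return 0;
        }
        return stored.subSet(startKey, true, endKey, true).size();
    }

    // How many of the first `starts` scans would NOT return exactly scanLen
    // records if bounded by the naive endkey.
    static int mismatches(int recordCount, int scanLen, int starts, boolean hashed) {
        NavigableSet<String> stored = new TreeSet<>();
        for (long i = 0; i < recordCount; i++) {
            stored.add(key(i, hashed));
        }
        int bad = 0;
        for (long s = 0; s < starts; s++) {
            String startKey = key(s, hashed);
            String endKey = key(s + scanLen - 1, hashed);
            if (countInRange(stored, startKey, endKey) != scanLen) {
                bad++;
            }
        }
        return bad;
    }

    public static void main(String[] args) {
        System.out.println("ordered mismatches: " + mismatches(1000, 10, 100, false));
        System.out.println("hashed mismatches:  " + mismatches(1000, 10, 100, true));
    }
}
```

With ordered keys every range holds exactly the requested count. With hashed keys the endkey may even sort before the startkey, so the range is arbitrary.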