[core] add endkey to scan API #568

Open
kruthar opened this issue Dec 31, 2015 · 5 comments

Comments

@kruthar
Collaborator

kruthar commented Dec 31, 2015

Currently the scan API gives the bindings a startkey and a recordcount. Some databases could benefit from having an endkey instead of the recordcount. OrientDB for sure could scan more efficiently with an endkey, and I think HBase could do things in a safer way as well. There may be more.

My proposal would be to just add endkey into the scan API alongside recordcount so that bindings could use one or the other. Currently bindings have no easy way to decipher the endkey on their own.

doTransactionScan has all the information needed to provide the endkey without much more work: https://github.com/brianfrankcooper/YCSB/blob/master/core/src/main/java/com/yahoo/ycsb/workloads/CoreWorkload.java#L744. You would then have to update all the bindings, but the change shouldn't affect much unless a binding chooses to use the endkey.
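
To make the idea concrete, here is a minimal sketch of how an endkey could be derived from the values a scan already has. It is not YCSB code: the class name, both method names, and the zero-padded key format are assumptions for illustration, and it assumes ordered inserts with no gaps in the key space.

```java
import java.util.Locale;

// Hypothetical helper, not part of YCSB: derives an inclusive endkey
// from the start key number and the number of records to scan.
final class ScanRangeSketch {

    // Assumed key format: fixed-width zero padding, so that
    // lexicographic string order matches numeric order.
    static String buildOrderedKey(long keynum) {
        return String.format(Locale.ROOT, "user%019d", keynum);
    }

    // Inclusive endkey for a scan that starts at startKeyNum and covers
    // recordCount consecutive key numbers (assumes a dense key space).
    static String endKeyFor(long startKeyNum, int recordCount) {
        if (recordCount < 1) {
            throw new IllegalArgumentException("recordCount must be at least 1");
        }
        return buildOrderedKey(startKeyNum + recordCount - 1);
    }
}
```

With this sketch, a scan starting at key number 10 with a recordcount of 5 would get the key for number 14 as its endkey. A binding could then ignore either the endkey or the recordcount, whichever suits its datastore less.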

A major consideration:

  • With insertorder=hashed, which is the default, you can't provide a reliable endkey that matches recordcount because the keys are no longer lexicographically ordered. This may blow the whole idea out of the water; at a minimum we would need a prominent warning about this case.
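
The hashed-key problem can be illustrated with a small experiment. This is a sketch only: the FNV-1a function below is a stand-in hash written for this example (not YCSB's own hashing code), and the class name, method names, and hex key format are assumptions. It counts how many keys fall lexicographically between the first and last key of a scan.

```java
import java.util.Locale;

// Hypothetical experiment, not part of YCSB.
final class HashedKeyOrderSketch {

    // Stand-in 64-bit FNV-1a hash over the 8 bytes of the key number
    // (low byte first). An assumption for illustration only.
    static long fnv1a64(long value) {
        long hash = 0xcbf29ce484222325L;
        for (int i = 0; i < 8; i++) {
            hash ^= (value & 0xff);
            hash *= 0x100000001b3L;
            value >>>= 8;
        }
        return hash;
    }

    // Fixed-width hex keys, so string order matches unsigned numeric order.
    static String key(long keynum, boolean hashed) {
        long n = hashed ? fnv1a64(keynum) : keynum;
        return String.format(Locale.ROOT, "user%016x", n);
    }

    // Counts the keys for key numbers 0..totalKeys-1 that sort between
    // key(start) and key(start + len - 1), both inclusive.
    static int keysInRange(long totalKeys, long start, int len, boolean hashed) {
        String lo = key(start, hashed);
        String hi = key(start + len - 1, hashed);
        int count = 0;
        for (long k = 0; k < totalKeys; k++) {
            String s = key(k, hashed);
            if (s.compareTo(lo) >= 0 && s.compareTo(hi) <= 0) {
                count++;
            }
        }
        return count;
    }
}
```

With ordered keys, the range for a 10-record scan contains exactly 10 keys. With hashed keys, the two endpoints land at unrelated positions in the sorted key space, so the count has no fixed relationship to the requested length and can even be zero when the hashed "end" sorts before the hashed "start".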

Are there other considerations I missed?

@busbey
Collaborator

busbey commented Dec 31, 2015

Any ideas on how we work around the hashed keys issue?

@kruthar
Collaborator Author

kruthar commented Dec 31, 2015

I can't think of a way to find a reliable endkey if the keys have been hashed. That's just the nature of hashing: the keys scatter everywhere.

The best idea I can think of is to check the properties. If we see insertorder=hashed and a non-zero value for scanproportion, then print a warning that some bindings will be negatively affected by this configuration. This isn't a great idea, though: there will likely be more bindings that are NOT affected, so for most users the warning would be needless and confusing.

One step further would be to check that configuration as well as the requested client, and only then send the warning. But this would require upkeep whenever a new binding starts to take advantage of endkey.
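
A minimal sketch of that properties check, assuming the property names insertorder and scanproportion as used in this thread. The class and method names are hypothetical, and the fallback values ("hashed" and "0.0") are assumptions about the defaults.

```java
import java.util.Properties;

// Hypothetical check, not part of YCSB.
final class EndKeyConfigCheck {

    // Returns true when scans are requested but keys are hashed, which is
    // the configuration in which no reliable endkey can be computed.
    static boolean endKeyUnavailable(Properties props) {
        // Assumed defaults: hashed insert order, no scans.
        String insertOrder = props.getProperty("insertorder", "hashed");
        double scanProportion =
            Double.parseDouble(props.getProperty("scanproportion", "0.0"));
        return "hashed".equals(insertOrder) && scanProportion > 0.0;
    }
}
```

A caller could log a warning once at startup when this returns true, rather than on every scan.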

I guess one more idea would be to check the configuration in CoreWorkload, and just send null as the endkey when the endkey can't be computed so that downstream clients can't use it. This would force binding developers to account for this scenario. Of course this would need to be documented in the developers guide.
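
Here is a sketch of what accounting for a null endkey could look like on the binding side. The in-memory TreeMap stands in for a sorted datastore, and the class name and the three-argument scan signature are hypothetical, not the actual YCSB DB interface.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.NavigableMap;
import java.util.TreeMap;

// Hypothetical binding over a sorted in-memory store, for illustration only.
final class EndKeyAwareBindingSketch {
    private final NavigableMap<String, String> store = new TreeMap<>();

    void insert(String key, String value) {
        store.put(key, value);
    }

    // Returns the keys visited by the scan. When endKey is null the binding
    // falls back to startKey plus recordCount; otherwise it bounds the range
    // on both sides and still caps the result at recordCount.
    // Note: subMap throws IllegalArgumentException if startKey > endKey.
    List<String> scan(String startKey, String endKey, int recordCount) {
        NavigableMap<String, String> view = (endKey != null)
            ? store.subMap(startKey, true, endKey, true)
            : store.tailMap(startKey, true);
        List<String> keys = new ArrayList<>();
        for (String k : view.keySet()) {
            if (keys.size() >= recordCount) {
                break;
            }
            keys.add(k);
        }
        return keys;
    }
}
```

The null case keeps today's behavior, so a binding that never looks at the endkey would be unaffected by the change.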

@kruthar
Collaborator Author

kruthar commented Jan 6, 2016

Another negative consideration for this idea: even with insertorder=ordered, you are only guaranteed that your key set forms an ordered set when sorted. You are not guaranteed that the records are stored in any particular order in the database. So having a startkey and endkey may still require some sort of sorting in the database to retrieve the keys between them, which is not representative of a scan.

I think this may mean this is a dead idea, and that startkey and recordcount are the best we can do to formulate a scan operation in most cases. If a binding can't completely fit that mold, then a performance warning, or simply not implementing scan, should suffice.

@busbey
Collaborator

busbey commented Jan 6, 2016

I think that part's not so bad. Scan by definition presumes the primary key is sorted, I think. Whether that is reflected in how the datastore happens to store things isn't our concern. Bindings that have a problem with that already bail on implementing scan, so this doesn't make that any worse.

@busbey
Collaborator

busbey commented Feb 1, 2016

This and #402 are after the same thing. We should consolidate.
