Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

providing an alternative tiKV store #2

Closed
docteurklein opened this issue Nov 8, 2022 · 9 comments
Closed

providing an alternative tiKV store #2

docteurklein opened this issue Nov 8, 2022 · 9 comments

Comments

@docteurklein
Copy link

Hello :) Lovely project, thank you for this!

Could it be possible to switch stores with a distributed remote one like tiKV?

@zh217
Copy link
Contributor

zh217 commented Nov 8, 2022

Thank you for your interest!

In principle, yes, any KV store can be used as the storage layer. The only requirement is the ability
to store binary key-value pairs and to scan keys by bytes order (some of the ACID properties may have to be relaxed if the store does not have the required operations). We are currently refactoring in order to make an internal storage API available. When that is done, swapping should be trivial. We may even provide a ready-to-run distribution!

Would you mind describing your use case for a distributed store a little bit? And do you have any consistency/latency/throughput requirements? What is the maximum amount of data that a single query could touch for your use case?

@docteurklein
Copy link
Author

docteurklein commented Nov 8, 2022

Thanks for the answers :)

A use-case I had in mind was to have a scalable, multi-tenant database for web frontends.
Having separate storage from compute would allow scalability of stateless compute nodes (cozo), and let storage backends like foundationDB or tiKV handle the ACID concurrency stuff.

The advantage is bottomless scalability, and not being bottlenecked by single node I/O limits, at the cost of network latency.

@zh217
Copy link
Contributor

zh217 commented Nov 13, 2022

The storage layer API rewrite is now complete. It works very well with other embedded engines such as SQLite, which opens up way to have Cozo running on mobile (compiling rocksdb for mobile is too much trouble, and the mobile version of rocksdb lacks critical features).

However, I'm not very positive about the performance when the backend is a distributed store. It works, sure, but our smoke tests complete in around 130 seconds under TiKV backend. After we did some obvious optimizations it went down to around 40 seconds, but going beyond that could take lots of work and may only yield meager returns. For comparison, with purely in-memory storage it completes about 0.6 seconds, with RocksDB backend about 0.8 seconds, and with SQLite backend about 1.2 seconds.

@docteurklein is 1 to 2 orders of magnitude decrease in performance acceptable for your use case? It certainly isn't for us, as we can pretty much forget about running most of the graph algorithms. Currently under distributed storage the only queries that can still run efficiently are simple point queries that yield small number of rows.

To regain performance under the distributed setting really requires a distributed and replicated KV store. Unfortunately there seems to be no suitable open source projects that we can just use, and writing one is significantly more work than this project, and is certainly outside the scope of this project.

@zh217
Copy link
Contributor

zh217 commented Nov 16, 2022

Close this, as nothing more can be done at the moment.

@zh217 zh217 closed this as completed Nov 16, 2022
@docteurklein
Copy link
Author

Thanks for the feedback! Looks like shared-nothing architectures always win when it comes to perf.
I'd love to give tikv a spin anyway, did you publish the pluggable tikv backend?

@zh217
Copy link
Contributor

zh217 commented Nov 18, 2022

@docteurklein It will be available with the next release, but it will be highly experimental (and untested).

@dumblob
Copy link

dumblob commented Jan 24, 2023

It has been already several releases since @zh217 tried tikv as backend. @docteurklein did you find some time to try it yourself?

@docteurklein
Copy link
Author

not yet because I would have to recompile cozo with the correct features.

@tluyben
Copy link

tluyben commented Jul 9, 2023

@zh217 to do this properly (without the drop in performance), it would need some more abstract distribution right? Like replication & rebalancing based on usage? Seems unlikely it can be fast, ever, based on low level kv like that.

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

No branches or pull requests

4 participants