Skip to content

This issue was moved to a discussion.

You can continue the conversation there. Go to discussion →

New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

SQLite as key-value store for concurrent Rust programs #1

Closed
Byron opened this issue Mar 8, 2020 · 5 comments
Closed

SQLite as key-value store for concurrent Rust programs #1

Byron opened this issue Mar 8, 2020 · 5 comments
Labels
article not an issue wontfix This will not be worked on

Comments

@Byron
Copy link
Contributor

Byron commented Mar 8, 2020

The motivation for this article is to save you a day or two when making SQLite ready for your concurrent, async Rust program, using it as key-value store minimising the "SQL" part.

If you want to get to the advertised content, skip the 'Why…' sections.

Why not Sled?

Maybe some of you may think why one would chose a 20 year old embedded SQL database written in "C" over Sled, a modern pure Rust key-value store!

The short answer is: Do try it, we need an alternative and every user makes it better. SQLite is not a key-value store, and it's not suitable for concurrent application out of the box. Sled can save you a lot of trouble, and it's "just working" in concurrent programs with an API that feels natural and powerful.

However, when I did try to use it (in a version from 2020-02-27), I encountered a few issues which taken together made using sled too much of a risk given the requirements of my program.

  • There may still be some time till v1.0.0 and one or the other issue to encounter
  • My database grew to 14.1 GB and the time to open it was dominated by IO, delaying application startup by seconds. In an earlier version that stored additional 16GB of assets, sled would read ~22GB on startup and write 7.5GB when opening the database.
  • Migrating big sled databases (which may be required between releases) can take a lot of memory. In one instance when migrating a 27GB database, it would use 50GB of RAM - it was a miracle my now deceased MBP with 16GB of RAM managed to complete the job, and for the first time I was afraid to lose data that way.
  • When storing about 210 thousand little marker objects that are nothing more than a key, accessing all of these in a random access manner caused the memory consumption to peak at 5GB, never going down below 3GB after that point despite the application being done.
  • By default, the application's memory consumption would be around 300MB, and as it turned out most of it came from Sled. There are a few configuration options, but they didn't seem to allow me trading speed for memory to the extend I would have wanted.

Given that I would like to run my program on small hardware with no more than 512MB of RAM, the sometimes unexpectedly high use of memory made it a deal breaker, so I had to look elsewhere :/.

You, however, should really give it a try, as by the time you do, none of the above might still be an issue. Maybe nothing of the above would be an issue for you. I think Rust does need Sled, which right now is our best bet and I hope we will get there sooner than later.

Why SQLite ?

A venerable 20+ year old embedded SQL database written in C doesn't seem like the go-to solution for a key-value store for concurrent applications.
And it isn't out of the box, but can be performing admirably when configured correctly.

On top of that, it's well-tested, proven and stable - nice traits for a database to store your valuable data.

The last paragraph from the about-page reads…

We the developers hope that you find SQLite useful and we entreat you to use it well: to make good and beautiful products that are fast, reliable, and simple to use. Seek forgiveness for yourself as you forgive others. And just as you have received SQLite for free, so also freely give, paying the debt forward.

…which made me fall in love with it a little.

From Rust one of the best ways to interact with it is Rusqlite, which puts a very rusty API on top of the C-interface SQLite exposes naturally. It's also the only option I tried, there was no other need.

Let's start from there!

Configuring SQLite with concurrent applications

Many concurrent writers

By default, concurrent writes will lock the database on the first writer, and as writers don't queue up, all but the first one will fail. There can be any amount of readers while there is no writer, which is exactly the rules that Rust enforces in your program.

A first step towards the solution is this chunk of SQL that wants to be at the start of your application

https://github.com/crates-io/criner/blob/c36b108c763270a3e5b3adc9c562e0a12646a00d/criner/src/persistence/mod.rs#L25-L28

  • journal_mode = WAL
    • This allows any amount of readers even in the presence of one or more writers. Writers obtain locks per table, allowing concurrent writes to different tables with automatic queuing for the actual writes. Multiple writers to the same table are not allowed. Transactions that include reads followed by writes will fail if a write happened in the mean time, which may have invalidated the read.
  • synchronous = NORMAL
    • Only sync data to disk when absolutely necessary, at least at the end of each transaction.
    • This mode can still lead to bad write performance if there are many small transactions, and depending on your application, you might want to turn that off entirely and trigger syncs yourself.
  • wal_autocheckpoint = 1000
    • This line makes an implicit value explicit, and means that every thousand write transactions it will write the changes in the WAL back to the actual database file, if possible. As this operation locks the whole database, by default it will fail if there are still readers around, which may cause the WAL file to grow quite large.
  • wal_checkpoint(TRUNCATE)
    • On startup of the application, write all changes in the WAL file back to the database and truncate the WAL file to zero size. This helps to avoid unbounded growth and assures your database eventually gets to see the changes committed to the WAL file as well. Please note that keeping commits in the WAL file doesn't affect what readers see - the database appears consistent even though there are two files with data, instead of one.

Now most direct writes will work concurrently, but transactions who read before they write are still likely to fail in the presence of other writes performed while reading.

There are two measures to make this work. Firstly, there should be a busy handler which, in the simplest case, sleeps a little. It should be installed when opening a connection to the database.

https://github.com/crates-io/criner/blob/c36b108c763270a3e5b3adc9c562e0a12646a00d/criner/src/persistence/mod.rs#L59-L60

Secondly each transaction that plans to read before writing should begin in IMMEDIATE mode. Without it, it will first get a read lock for the read portion, and then try to upgrade to a write lock when needed. In my case, this didn't trigger the busy handler and would fail permanently.

https://github.com/crates-io/criner/blob/c36b108c763270a3e5b3adc9c562e0a12646a00d/criner/src/persistence/table.rs#L135-L160

connection.transaction_with_behavior(rusqlite::TransactionBehavior::Immediate)?

The above is the key to making this work. In conjunction with a busy handler that sleeps…

https://github.com/crates-io/criner/blob/c36b108c763270a3e5b3adc9c562e0a12646a00d/criner/src/persistence/mod.rs#L96-L100

…this call will block until a write lock was obtained before any read happened.

Avoiding blocking busy handler for a better Future

In a threaded application, blocking the whole thread might be the way to go. With Futures, however, that is not advised in the common case as that may prevent all futures to make progress, depending on your executor.

Additionally using busy handlers per connection restricts the knowledge about what's going on within the handler, as these cannot be closures (i.e. they must be fn(i32) -> bool).

To solve this, one can refrain from installing a busy handler…

https://github.com/crates-io/criner/blob/c36b108c763270a3e5b3adc9c562e0a12646a00d/criner/src/persistence/mod.rs#L53-L55

…and handle SQLite Busy failures by yourself:

https://github.com/crates-io/criner/blob/c36b108c763270a3e5b3adc9c562e0a12646a00d/criner/src/persistence/table.rs#L134-L137

The retry-utility itself can receive additional context to better integrate with your program.

https://github.com/crates-io/criner/blob/88c55a3c129622557fccd72964ffd28ee37d30da/criner/src/persistence/table.rs#L223-L274

Please note that even though I chose to block, it should be possible to do an async sleep instead of a blocking one. For my program, however, that wasn't required as tasks are distributed to threaded executors, where blocking futures will not prevent all other futures from making progress.

Writing many small objects, fast

When writing many small objects intuitively (that is, naively unfortunately), performance will be low and dominated by IO as every write commits by default which triggers an fsync.

Something that may improve the situation at the expense of safety is to set synchronous = OFF and skip fsync() all together.

I took a different route and decided to go with prepared statements within a transaction.

https://github.com/crates-io/criner/blob/c36b108c763270a3e5b3adc9c562e0a12646a00d/criner/src/engine/stage/changes.rs#L126-L143

Much of this code is boilerplate to communicate the application state to prodash, and it relies on using a connection that blocks while the we try to get an immediate-mode transaction. From there, a prepared statement is used to insert changing key-value pairs followed by a single commit in the end. This performs admirably and I have seen 220k objects inserted in less than a second.

Gains and losses compared to using Sled

The sled version I used previously was a version from 2020-02-27. It must be clear to the reader that I am a big fan of Sled, and I can't wait to try it again. For this project, however, I wasn't able to and find it very unfortunate.

  • Gains
    • predictable, low memory consumption
    • exhaustive documentation of expected runtime behaviour
    • connect to the same database from multiple processes
    • Easy introspection thanks to sqlitebrowser
    • Database size about 4 times smaller (14.11GB down to 3.4GB, 3.1GB zipped down to 700MB)
    • No zero-copy
      • SQlite does not support this at all as values are copied out of the database. Owning everything makes Rust code easier to read due to the lack of lifetimes, but certainly costs you in terms of allocations and memcopies. In retrospect, I consider making use of zero-copy from the get-go a premature optimization - the added code complexity isn't worth it with the Rust of today.
  • Losses
    • Zero-copy
      • Originally I intended to use Sled as backing store for rust-messagepack encoded blobs which are deserialized into structs that use a lot of Cow's. It worked perfectly and was too fast for progress bars :D.
    • Easy-to-use concurrent-by-default API that is a pleasure to use and 'just works™️'. SQLite needs a lot of massaging and your app to embrace the SQLite API to support that. About a day was spent on figuring out how to get things back to where they were.
    • Now you have to study the manual and solve concurrency issue that weren't there before.
    • No-sql - having a native key-value store is a big bonus to me
    • No lifetime issues - as SQLite transactions keep their connection as reference, refactoring gets awkwardly hard where I simply fail to express the lifetimes necessary.

A feel for the achieved parallelism and performance…

asciicast

@djc
Copy link

djc commented Mar 8, 2020

Have you looked at LMDB at all? I think there are some crates that provide a safe zero-copy interface, and it would be a proven actual key-value store.

@Byron
Copy link
Contributor Author

Byron commented Mar 8, 2020

Thanks for the hint, I didn't have it on my radar at all!

I think for the next project that needs a database, there is a whole slew of additional options to evaluate.

@Byron
Copy link
Contributor Author

Byron commented Mar 8, 2020

Oh, and now that I see that @djc is the author of Askama - I do hope that will come in handy soon.
Something that swayed me to look at Sqlite more closely certainly is crates.rs, which is the engine behind lib.rs which I admire. It uses Sqlite for a bunch of things, including producing actual insights.

blakesweeney added a commit to blakesweeney/rnc-utils that referenced this issue Oct 4, 2020
Seems that sled is too slow and uses too much memory. This appears to be
a fundamental issue with sled, see:

spacejam/sled#986
the-lean-crate/criner#1

as these discuss the issue pretty clearly. Until the issue is fixed sled
isn't a good choice for our usage. I could switch to sqlite, as I've
written things like that before but I don't want to have. It is harder
than just using a get/merge that is builtin to the database. Rocks has
a similar API but is much faster than sled.

I've explored LMDB before (which is very fast) but it has some sharp
edges that I don't want to deal with, RocksDB while having a weird API
is at least much cleaner to work with I think.
@avinassh
Copy link

Hey @Byron, thanks for this detailed write up and it was a quite informative read. Taught me more about SQLite using in multiple writer situations.

PS: Thanks for dua-cli, I use it very often. Also, thanks for GitPython, which I use in my joke project :P

@Byron
Copy link
Contributor Author

Byron commented May 23, 2021

Thanks a lot for your kind words :). Actually I forgot about this post entirely and but am pleasantly surprised it exists - sometimes when running criner I wonder how it interacts with sqlite just vaguely remembering that it put up a little bit of a fight.

It's great to hear dua-cli is useful to you - I didn't regret writing it thus far (as I managed to not delete my disk with it yet ;)), and use it many times a day myself.
It's good to see I can finally be a rust rockstar - maybe one day you can rewrite that one using gitoxide and cut the 'time to rockstar' by a factor of ten :D.

@Byron Byron closed this as completed Jun 5, 2021
@the-lean-crate the-lean-crate locked and limited conversation to collaborators Jun 5, 2021

This issue was moved to a discussion.

You can continue the conversation there. Go to discussion →

Labels
article not an issue wontfix This will not be worked on
Projects
None yet
Development

No branches or pull requests

3 participants