
Sync frequency #49

Closed
wvh opened this issue May 30, 2017 · 6 comments

Labels
kind/question Something requiring a response

Comments

@wvh commented May 30, 2017

Hello,

I ran a benchmark writing 1M records to badger, which took about 18 minutes. I forgot to close the database, and the database size on disk was 0, i.e. badger didn't write anything to disk.

Doesn't badger sync anything to disk over the course of 18 minutes and 1M records?

And isn't 18 minutes quite slow for operations in memory only?

PS: default options, simple Put() of random strings of the form "event RANDOM_INTEGER".
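
(For illustration, a minimal sketch of the kind of unbatched loop described above. This is a hypothetical reconstruction, written against the current badger.Open/Txn API, which postdates this thread; paths and key format are made up.)

```go
package main

import (
	"fmt"
	"math/rand"

	badger "github.com/dgraph-io/badger/v4"
)

func main() {
	db, err := badger.Open(badger.DefaultOptions("/tmp/badger-bench"))
	if err != nil {
		panic(err)
	}
	defer db.Close() // the step the original benchmark skipped

	for i := 0; i < 1000000; i++ {
		key := []byte(fmt.Sprintf("key%d", i))
		val := []byte(fmt.Sprintf("event %d", rand.Intn(1000000))) // ~12-15 byte values
		// One transaction per record: per-request overhead dominates.
		err := db.Update(func(txn *badger.Txn) error {
			return txn.Set(key, val)
		})
		if err != nil {
			panic(err)
		}
	}
}
```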

@manishrjain (Contributor) commented May 30, 2017

In async mode, if the value size is lower than option.ValueThreshold, the key-value pair doesn't get written out to the WAL. In this case, it sounds like all of your values were lower than 20 bytes (if you used the default options).

Instead, they get written to the LSM tree, starting with a memtable. Only once a memtable fills up is it synced to disk, and 1M keys isn't sufficient for that: a 64MB memtable can take about 3M keys (roughly 20 bytes per entry) before it fills up.

This should all be really fast. I suspect the reason it took you 18 mins is that you didn't batch your requests. Batching is absolutely critical to getting better performance: there's quite a bit of overhead cost per request, which you can amortize well by sending 1000 keys in one request. You can send many requests in parallel to achieve even better performance.

Try these out. It shouldn't take more than 20 seconds or so. Do remember to Close the store.

P.S. You can look at the populate code here to see how we populated data to run Badger v/s RocksDB benchmarks.
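
(For concreteness, a sketch of the batched approach being suggested, not the linked populate code. It assumes the current WriteBatch API, which postdates this thread; names are illustrative.)

```go
import (
	"fmt"
	"math/rand"

	badger "github.com/dgraph-io/badger/v4"
)

// populateBatched writes n small records through a WriteBatch, amortizing the
// per-request overhead that dominates the unbatched loop.
func populateBatched(db *badger.DB, n int) error {
	wb := db.NewWriteBatch()
	defer wb.Cancel() // safe to call even after a successful Flush

	for i := 0; i < n; i++ {
		key := []byte(fmt.Sprintf("key%d", i))
		val := []byte(fmt.Sprintf("event %d", rand.Intn(1000000)))
		if err := wb.Set(key, val); err != nil {
			return err
		}
	}
	return wb.Flush() // commit whatever is still buffered
}
```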

@manishrjain added the kind/question label on May 30, 2017
@wvh (Author) commented May 31, 2017

First of all, thanks for your insights.

My values are about 12-15 bytes. Is there a technical reason for the default ValueThreshold = 20 option? You'd need to document this clearly, as those records will get lost if the server goes down. 18 minutes without a single sync is – I'm sure you agree – pretty long, and the same goes for the 64MB memtable. I understand speed is a priority, but it's not entirely clear to me as a casual user that it might take a long time before anything actually gets synced to disk... I'd not expect to lose more than perhaps a few seconds' worth of records with default settings. I'm not sure how often LevelDB syncs to disk.

I've tried a batch approach, as you suggested. Writing 10000 records in batches of 1000:

```
start: populateLevelDB
  end: populateLevelDB Δt: 110.948957ms ops: 90131.53679308585
start: populateBadger
  end: populateBadger Δt: 10.896048533s ops: 917.7639003455051
start: populateBadgerBatch
  end: populateBadgerBatch Δt: 18.422385ms ops: 542817.8816152198
start: populateBoltDB
  end: populateBoltDB Δt: 1m46.367649028s ops: 94.01354727100926
start: populateBoltDBOneTrans
  end: populateBoltDBOneTrans Δt: 212.684806ms ops: 47017.93319453201
```

That's 400,000-500,000 ops/s, a lot faster than non-batched inserts and also a lot faster than the other KV stores.

The problem in my use case is that the database stores user-generated events, which can't be batched easily as they happen sporadically.

@szymonm (Contributor) commented May 31, 2017

@wvh AFAIK, there is no direct technical reason behind ValueThreshold = 20. However, you should know that if we store a value in the ValueLog, we need to keep a pointer to this value in the LSM tree. The size of the pointer is 10 bytes, so it makes no sense to have ValueThreshold below that. We just assumed the value should be at least 2x bigger for it to make sense to keep it in a separate place at the cost of keeping the pointer in the LSM tree.

When tuning this parameter, you should know that when retrieving a KV pair there are 3 possible execution paths (I'm simplifying a bit here to give you the intuition):

  1. If size(value) < ValueThreshold (i.e. the KV pair is stored in the LSM tree) and the pair is in memory, we retrieve it straight from memory, so you get the best Get time.
  2. If size(value) < ValueThreshold but the pair didn't fit in memory, we have to do a random read to fetch it from disk.
  3. If size(value) > ValueThreshold, we have to retrieve the pointer to the value from the LSM tree (from memory or disk) and then the value itself from disk, so there is at least one disk call.

There is an order-of-magnitude difference in Get time between 1 and 2, but only a linear difference between 2 and 3. If you increase ValueThreshold, more KV pairs move from 3 to 2, but at the same time you can fit fewer of them in memory, so more of them move from 1 to 2.
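
(To make the knob concrete, a hypothetical tuning sketch using the current options API; note the default threshold has changed across Badger versions since this thread, and 64 is an arbitrary illustrative value, not a recommendation.)

```go
// Raising ValueThreshold keeps more values inline in the LSM tree
// (paths 1 and 2 above) instead of behind a value-log pointer (path 3).
opts := badger.DefaultOptions("/tmp/badger").WithValueThreshold(64)

db, err := badger.Open(opts)
if err != nil {
	panic(err)
}
defer db.Close()
```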

The documents will not get lost if the server goes down when the SyncWrites option is true. So you can populate your database with SyncWrites = false and then restart it with SyncWrites = true to avoid losing production data.
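
(A sketch of that populate-then-restart pattern, assuming the current options API, which postdates this thread; populate is a hypothetical bulk-load helper.)

```go
// Bulk-load with async writes, then reopen with SyncWrites so every
// committed write is fsynced before the write call returns.
opts := badger.DefaultOptions("/data/events").WithSyncWrites(false)

db, err := badger.Open(opts)
if err != nil {
	panic(err)
}
populate(db) // hypothetical bulk-load helper
if err := db.Close(); err != nil {
	panic(err)
}

// Restart with durable writes for production traffic.
db, err = badger.Open(opts.WithSyncWrites(true))
if err != nil {
	panic(err)
}
defer db.Close()
```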

Regarding your problem that events are generated by users: you could do some kind of buffering to get good write speed and low latency at the same time. You could, for example, flush buffered data every 1ms, so that users won't notice the delay, while data is written in batches when the load is high. See the sketch below.
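
(A hypothetical sketch of that buffering idea: collect sporadic events on a channel and commit them in one batch per tick. Function names and the key scheme are made up for illustration; the WriteBatch API postdates this thread.)

```go
import (
	"fmt"
	"time"

	badger "github.com/dgraph-io/badger/v4"
)

// bufferEvents gathers sporadic events and commits them in one WriteBatch per
// tick, so individual writers see ~1ms latency while the store sees batches.
func bufferEvents(db *badger.DB, events <-chan []byte) error {
	ticker := time.NewTicker(time.Millisecond)
	defer ticker.Stop()

	var pending [][]byte
	seq := 0
	flush := func() error {
		if len(pending) == 0 {
			return nil
		}
		wb := db.NewWriteBatch()
		defer wb.Cancel() // safe even after a successful Flush
		for _, ev := range pending {
			seq++
			key := []byte(fmt.Sprintf("event-%d", seq)) // hypothetical key scheme
			if err := wb.Set(key, ev); err != nil {
				return err
			}
		}
		pending = pending[:0]
		return wb.Flush()
	}

	for {
		select {
		case ev, ok := <-events:
			if !ok {
				return flush() // channel closed: flush the remainder and stop
			}
			pending = append(pending, ev)
		case <-ticker.C:
			if err := flush(); err != nil {
				return err
			}
		}
	}
}
```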

@szymonm (Contributor) commented Jun 5, 2017

@wvh If you have no more questions, let me close the issue.

@szymonm closed this as completed on Jun 5, 2017
@manishrjain (Contributor) commented:
> The problem in my use case is that the database stores user-generated events, which can't be batched easily as they happen sporadically.

I'd recommend you set SyncWrites=true. It seems like you don't need very high write throughput, but you do care about your writes being persisted. In that case, a slightly higher write latency shouldn't be that big of a deal.

You can't have it both ways: either you choose sync writes, with higher write latency and guaranteed persistence, or you choose async writes, with lower write latency but no immediate persistence.

Even with sync writes, writes to SSDs (if that's what you're using) are pretty fast, so I don't see much downside for you in setting SyncWrites=true.

@wvh (Author) commented Jun 5, 2017

szymonm: That's a pretty helpful explanation. It would be great if this turned into documentation for people who understand the bigger picture but not the specific implementation details of each individual KV store.

manishrjain: The performance difference is pretty minor, so yes, I very much prefer the sync option. Writes are pretty variable in my use case, and while not every write is absolutely crucial, it should be possible to kill the server and preferably not lose any events. Please make sure people understand that this sync is off by default.

Thanks to both of you for your time!
