Sync frequency #49
Comments
In async mode, if the value size is lower than option.ValueThreshold, the key-value pair doesn't get written out to the WAL. In this case, it sounds like all of your values were smaller than 20 bytes (if you used the default options), so they get written to the LSM tree instead. They would be written to a memtable, and only once a memtable fills up would it be synced to disk. 1M keys isn't sufficient to do that: a 64MB memtable can take 3M keys before it fills up. This should all be really fast. I suspect the reason it took you 18 minutes is that you didn't batch your requests. Batching is absolutely critical to getting better performance. There's quite a bit of overhead per request, which you can amortize well by sending 1000 keys in one request. You can also send many requests in parallel to achieve even better performance. Try these out. It shouldn't take more than 20 seconds or so. Do remember to P.S. You can look at the populate code here to see how we populated data to run the Badger vs. RocksDB benchmarks.
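To illustrate the batching point, here is a minimal sketch of how chunking cuts the number of requests. It does not use Badger's API: `entry`, `countingStore`, `writeInBatches` and `makeEntries` are hypothetical names, and the store only counts requests, since the fixed per-request cost is what batching amortizes.

```go
package main

import "fmt"

// entry is a hypothetical key-value pair; it is not Badger's type.
type entry struct {
	Key, Value []byte
}

// countingStore is a stand-in for a real store. It only records how many
// requests and entries it received.
type countingStore struct {
	requests int
	entries  int
}

// Write accepts one request carrying any number of entries.
func (s *countingStore) Write(batch []entry) {
	s.requests++
	s.entries += len(batch)
}

// writeInBatches splits entries into chunks of at most batchSize and sends
// each chunk as a single request.
func writeInBatches(s *countingStore, entries []entry, batchSize int) {
	if batchSize < 1 {
		batchSize = 1
	}
	for start := 0; start < len(entries); start += batchSize {
		end := start + batchSize
		if end > len(entries) {
			end = len(entries)
		}
		s.Write(entries[start:end])
	}
}

// makeEntries builds n entries with values of the form "event N".
func makeEntries(n int) []entry {
	out := make([]entry, n)
	for i := range out {
		out[i] = entry{
			Key:   []byte(fmt.Sprintf("key-%d", i)),
			Value: []byte(fmt.Sprintf("event %d", i)),
		}
	}
	return out
}

func main() {
	entries := makeEntries(10000)
	unbatched, batched := &countingStore{}, &countingStore{}
	writeInBatches(unbatched, entries, 1)
	writeInBatches(batched, entries, 1000)
	// Same data, but 10000 requests versus 10.
	fmt.Println(unbatched.requests, batched.requests)
}
```

With a real store, each call to `Write` would pay the per-request overhead once, so the second variant pays it 10 times instead of 10000.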
First of all, thanks for your insights. My values are about 12-15 bytes. Is there a technical reason for the default I've tried a batch approach, as you suggested. Writing 10000 records in batches of 1000:
400000-500000 ops, a lot faster than non-batched inserts and also a lot faster than other KV stores. The problem in my use case is that the database stores user-generated events, which can't be batched easily as they happen sporadically.
@wvh AFAIK, there is no direct technical reason behind When tuning this parameter, you should know that when retrieving a KV pair there are 3 possible execution paths (I'm simplifying a bit here to give you intuition).
There is an order of magnitude difference in The documents will not get lost if the server goes down when Regarding your problem that events are generated by users, you could buffer writes to get good write speed and low latency at the same time. You could, for example, flush the buffered data every 1ms, so that users won't notice the delay, while the data still gets written in batches when the load is high.
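A rough sketch of that buffering idea, under assumptions: events are flushed either when the buffer reaches a maximum size or when a short interval has passed, whichever comes first. The `buffer` type and the `flush` callback are hypothetical names; in a real setup `flush` would be whatever batched write call your store offers.

```go
package main

import (
	"fmt"
	"time"
)

// buffer collects events and hands them to flush in batches, either when
// maxSize events are pending or when the interval has elapsed.
type buffer struct {
	in   chan string
	done chan struct{}
}

func newBuffer(maxSize int, interval time.Duration, flush func([]string)) *buffer {
	b := &buffer{in: make(chan string), done: make(chan struct{})}
	go func() {
		defer close(b.done)
		ticker := time.NewTicker(interval)
		defer ticker.Stop()
		pending := make([]string, 0, maxSize)
		emit := func() {
			if len(pending) == 0 {
				return
			}
			flush(pending)
			// Start a fresh slice so flush may keep the old one.
			pending = make([]string, 0, maxSize)
		}
		for {
			select {
			case ev, ok := <-b.in:
				if !ok {
					emit() // flush whatever is left on Close
					return
				}
				pending = append(pending, ev)
				if len(pending) >= maxSize {
					emit()
				}
			case <-ticker.C:
				emit() // low load: bound the delay to one interval
			}
		}
	}()
	return b
}

// Add queues one event for the next flush.
func (b *buffer) Add(ev string) { b.in <- ev }

// Close flushes the remaining events and waits for the worker to exit.
func (b *buffer) Close() {
	close(b.in)
	<-b.done
}

func main() {
	var batches, events int
	b := newBuffer(1000, time.Millisecond, func(batch []string) {
		batches++
		events += len(batch)
	})
	for i := 0; i < 10000; i++ {
		b.Add(fmt.Sprintf("event %d", i))
	}
	b.Close()
	fmt.Printf("%d events written in %d batches\n", events, batches)
}
```

Note that events still sitting in the buffer are lost if the process dies before the next flush, so the interval is also the window of data you accept losing.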
@wvh If you have no more questions, I'll close the issue.
I'd recommend setting SyncWrites=true. It seems like you don't need very high write throughput, but you do care about your writes being persisted. In that case, a slightly higher write latency shouldn't be a big deal. You can't have it both ways: either you choose sync writes, with higher write latency and guaranteed persistence, or you choose async writes, with lower write latency, and give up immediate persistence. Even with sync writes, writes to SSDs (if that's what you're using) are pretty fast, so I don't see much downside for you in setting SyncWrites=true.
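To get a feel for that trade-off on your own hardware, here is a small sketch using only the Go standard library. It is not Badger's write path: `appendRecords` is a hypothetical helper that appends records to a temporary file and optionally calls `Sync` after every write, which is the general cost a sync-on-write setting implies.

```go
package main

import (
	"fmt"
	"os"
	"time"
)

// appendRecords appends n records to a temporary file and reports how long
// it took. With syncEach set, every record is forced to stable storage
// before the next one is written.
func appendRecords(n int, syncEach bool) (time.Duration, error) {
	f, err := os.CreateTemp("", "wal-*.log")
	if err != nil {
		return 0, err
	}
	// Deferred calls run in reverse order: close first, then remove.
	defer os.Remove(f.Name())
	defer f.Close()

	start := time.Now()
	for i := 0; i < n; i++ {
		if _, err := fmt.Fprintf(f, "event %d\n", i); err != nil {
			return 0, err
		}
		if syncEach {
			if err := f.Sync(); err != nil {
				return 0, err
			}
		}
	}
	return time.Since(start), nil
}

func main() {
	for _, syncEach := range []bool{false, true} {
		d, err := appendRecords(1000, syncEach)
		if err != nil {
			fmt.Println("error:", err)
			return
		}
		fmt.Printf("syncEach=%v: %v for 1000 records\n", syncEach, d)
	}
}
```

The absolute numbers depend heavily on the disk and filesystem, so treat the output as a relative comparison only.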
@szymonm: That's a pretty helpful explanation. It would be interesting if this would turn into documentation for people who understand the bigger picture but not the specific implementation details of each individual KV-store. @manishrjain: The difference is pretty minor, so yes, I very much prefer the sync option. Writes are pretty variable in my use case and while not every write is absolutely crucial, it should still be possible to kill the server and preferably not lose any events. Please make sure people understand that this sync is off by default. Thanks to both of you for your time!
Hello,
I ran a benchmark writing 1M records to badger, which took about 18 minutes. I forgot to close the database. The database size on disk was 0, i.e. badger didn't write anything to disk.
Doesn't badger sync anything to the disk in 18 minutes and 1M records?
And isn't 18 minutes quite slow for operations in memory only?
PS: default options, simple Put() of random strings of the form "event RANDOM_INTEGER".