
Streaming replication is pathologically slow #153

Closed
wants to merge 2 commits

Conversation

@rvagg (Member) commented Jul 3, 2013

Streaming one level instance into another is much slower than expected. @dominictarr was looking into this with me at Nodeconf using this gist https://gist.github.com/brycebaril/5893248

The gist has two files: one creates a small leveldb instance of 1e6 records (~24MB on disk), and the other attempts to clone it into a second level instance using db1.createReadStream().pipe(db2.createWriteStream()).

What we were seeing is that once the initial batch is sent, each subsequent buffered batch typically contains only a single record, which slows the transfer dramatically. In our case the clone was taking ~75 seconds.

@rvagg (Member) commented Jul 3, 2013

I've turned this issue into a pull request with a new branch that I hacked on in LAX.

I'd like to make it more adaptive, but even now, if you try it out you should see huge improvements in speed. There is a test/benchmarks/stream-test.js benchmark that does basically what your gist was doing. There are also some console.log calls in write-stream.js that you can uncomment to see some interesting output.

Comments & improvements welcome.
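The core idea behind the branch, as described above, is to coalesce writes that arrive while a db.batch() is in flight into a single follow-up batch, instead of writing records one at a time. The sketch below is not the actual write-stream.js code; it's a minimal illustration of that buffering pattern, with `createBufferedWriter` and `onDrain` invented here for the example (any levelup instance exposing `batch(ops, cb)` would fit the `db` parameter):

```javascript
// Hypothetical sketch of write coalescing: queue writes that arrive
// while a db.batch() is outstanding, then flush them all at once.
function createBufferedWriter (db, onDrain) {
  var queue = []        // writes that arrived while a batch was in flight
  var writing = false   // is a db.batch() currently outstanding?

  function flush () {
    if (writing || queue.length === 0) return
    writing = true
    var batch = queue   // take everything queued so far
    queue = []
    db.batch(batch, function (err) {
      writing = false
      if (err) return onDrain(err)
      if (queue.length > 0) flush()   // coalesce whatever piled up meanwhile
      else onDrain(null)              // nothing left: fully drained
    })
  }

  return {
    write: function (key, value) {
      queue.push({ type: 'put', key: key, value: value })
      flush()
    }
  }
}
```

With a thousand synchronous writes against an async `batch`, the first batch carries one record and the second carries the other 999, rather than issuing a thousand single-record batches, which is the pathology the issue describes.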

@juliangruber (Member) commented Jul 3, 2013

sweet, stream performance improvements are huge!

@mcollina (Member) commented Jul 4, 2013

Super cool! :)

How was the magic number of concurrent batches determined? Does it depend on the underlying system in some way?

@rvagg (Member) commented Jul 7, 2013

MAGIC!

The hardwired numbers in write-stream.js really are magic; I'd love help and/or thoughts about how to make them less magic. I want to make it adaptive to load.

@mcollina (Member) commented Jul 8, 2013

I fear that it is hardware-dependent, too.
We could build a short benchmark to find out which setups work better and get some feedback from everybody.
Maybe that's something for another PR?

@dominictarr (Contributor) commented Jul 8, 2013

What if it used a k-armed bandit to decide between large, medium, and small batches, with each arm measured in writes/second?

http://en.wikipedia.org/wiki/Multi-armed_bandit

Basically: normally you do what worked best, but some of the time you randomly try a different strategy and check whether you get a better result. You'd just have to measure how long each write took and which strategy it was using.

But this is mad science. We should merge this PR and try crazy stuff later (we'd need benchmarks to verify that stuff too).
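The strategy described above, "normally do what worked best, sometimes try something random", is essentially an epsilon-greedy bandit, one common multi-armed-bandit policy. Nothing like this exists in levelup; the sketch below is purely illustrative, and names like `BatchSizeBandit` and the candidate sizes `[16, 256, 4096]` are invented for the example:

```javascript
// Hypothetical epsilon-greedy bandit over candidate batch sizes,
// rewarding each arm by its measured throughput (writes/second).
function BatchSizeBandit (arms, epsilon, random) {
  this.arms = arms                      // candidate batch sizes, e.g. [16, 256, 4096]
  this.epsilon = epsilon                // probability of exploring a random arm
  this.random = random || Math.random   // injectable for deterministic tests
  this.counts = arms.map(function () { return 0 })
  this.values = arms.map(function () { return 0 })  // running mean writes/sec per arm
}

// Pick an arm index: usually the best-performing one, occasionally random.
BatchSizeBandit.prototype.choose = function () {
  if (this.random() < this.epsilon) {
    return Math.floor(this.random() * this.arms.length)  // explore
  }
  var best = 0
  for (var i = 1; i < this.arms.length; i++) {
    if (this.values[i] > this.values[best]) best = i     // exploit
  }
  return best
}

// After a batch completes, credit the arm with its observed writes/second
// using an incremental mean update.
BatchSizeBandit.prototype.reward = function (arm, writesPerSec) {
  this.counts[arm]++
  this.values[arm] += (writesPerSec - this.values[arm]) / this.counts[arm]
}
```

A write stream could call `choose()` before each flush to pick a batch size and `reward()` with the measured throughput afterwards; over time the best-performing size dominates while exploration keeps probing for a better one.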

@mcollina (Member) commented Jul 8, 2013

This is mad science :), but it may work very well :).
👍 for merging!

@rvagg referenced this pull request Jul 16, 2013
@dominictarr (Contributor) commented Jul 22, 2013

So, I think we should merge this...
we can always switch to something smarter later.

@rvagg (Member) commented Jul 23, 2013

I'm interested to see if @pgte can improve on this with #161; my solution still feels a bit hacky with its arbitrary timeout.

This was referenced Aug 14, 2013
@mcollina (Member) commented Feb 2, 2014

Should we close this one? WriteStream is going away in any case.

@ralphtheninja (Member) commented Mar 16, 2015

@mcollina Yes we should :)

@ralphtheninja deleted the write-stream-optimisation branch Aug 29, 2017
