Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Move commit processing to other threads #593

Closed
keith-turner opened this issue Dec 21, 2015 · 2 comments
Closed

Move commit processing to other threads #593

keith-turner opened this issue Dec 21, 2015 · 2 comments

Comments

@keith-turner
Copy link
Contributor

After working on #592 I am thinking that all commit processing could be pushed off to other threads. Currently the threads processing transactions also do the commit. Doing the commit involves waiting on results to come back from the conditional writer multiple times. I am thinking that instead of doing this threads should push transactions that are ready to commit on to a queue where a few threads process them. This queue would be memory limited. This would allow more work to be built up for the conditional writers and batch writers (as much as will fit in memory). As soon as a thread is ready to commit a tx, it would throw that on a queue and immediately go start working on getting another transaction ready to commit. I have a strong suspicion that this could really increase throughput.

@keith-turner
Copy link
Contributor Author

This approach for committing transactions would be similar to async io methods. No threads would wait on a transaction when the conditional writer is working on it. Conceptually a transaction would be submitted to the conditional writer, then when that completes a thread would pick the transaction up and start working on its next step.

@keith-turner
Copy link
Contributor Author

I implemented async commit. With async commit changes I am seeing much improved processing rates and much higher resource utilization on the cluster. I ran webindex on a 20 node EC2 cluster and saw the following. Previously with a similar setup I saw a peak of processing around 1000 web pages per second. In the test below I loaded 80 common crawl files in two batches of 40.

tx_per_sec

I set the cluster to have 100M of commit memory. Below is a new plot for this change that shows how many transactions are working their way through the commit process.

committing

Of course seeing much higher rates for setting. Estimate below is indexing 50,000 links per second peak from web pages.

set

While the new async commit changes significantly increase throughput, they make transaction commits take longer. This impacts follow on observers by increasing their lock wait time. Need to look into ways to minimize the impact of this.

lock_wait

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

No branches or pull requests

1 participant