-
Notifications
You must be signed in to change notification settings - Fork 78
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
Move commit processing to other threads #593
Comments
This approach for committing transactions would be similar to async io methods. No threads would wait on a transaction when the conditional writer is working on it. Conceptually a transaction would be submitted to the conditional writer, then when that completes a thread would pick the transaction up and start working on its next step. |
I implemented async commit. With async commit changes I am seeing much improved processing rates and much higher resource utilization on the cluster. I ran webindex on a 20 node EC2 cluster and saw the following. Previously with a similar setup I saw a peak of processing around 1000 web pages per second. In the test below I loaded 80 common crawl files in two batches of 40. I set the cluster to have 100M of commit memory. Below is a new plot for this change that shows how many transactions are working their way through the commit process. Of course seeing much higher rates for setting. Estimate below is indexing 50,000 links per second peak from web pages. While the new async commit changes significantly increase throughput, they make transaction commits take longer. This impacts follow on observers by increasing their lock wait time. Need to look into ways to minimize the impact of this. |
closes #593 implemented async commit
After working on #592 I am thinking that all commit processing could be pushed off to other threads. Currently the threads processing transactions also do the commit. Doing the commit involves waiting on results to come back from the conditional writer multiple times. I am thinking that instead of doing this threads should push transactions that are ready to commit on to a queue where a few threads process them. This queue would be memory limited. This would allow more work to be built up for the conditional writers and batch writers (as much as will fit in memory). As soon as a thread is ready to commit a tx, it would throw that on a queue and immediately go start working on getting another transaction ready to commit. I have a strong suspicion that this could really increase throughput.
The text was updated successfully, but these errors were encountered: