Don't take the GIL when iterating over items #418
Conversation
Nice catch, that's a huge improvement. So yes, I agree we need a list of dataflows that we know are problematic, or were in the past, so that we can monitor the impact of performance work on all of them. I could also update the Kafka throughput benchmarks to the latest bytewax version and check if and how this change impacts those, since that's a bit closer to a real-world usage scenario.
Adds GitHub action to collect performance test information with https://codspeed.io
CodSpeed Performance Report

Congrats! CodSpeed is installed 🎉
You will start to see performance impacts in the reports once the benchmarks are run from your default branch.
Great find and benchmarking. I don't think there's any downside here.
Looking at our other Timely operators, I would also look into refactoring our branch and extract_key Timely operators to use OperatorBuilder so we can do this same transformation. I think all the other Timely operators already avoid taking the GIL in an inner loop.
Great idea. I'd like to merge this PR and open a separate one that looks into refactoring those two operators. I'd like to establish a baseline on
In addition to the previous PR #414, I spent some time looking at GIL contention that is evidenced when running this dataflow: https://gist.github.com/damiondoesthings/b5d16d22d18f37675a8a76e318c1bc8c
The changes in this PR dramatically reduce the cost of acquiring and releasing the GIL when looping over many items; in this case, when determining the time for each item in a large batch.
The bulk of the difference for this particular dataflow comes from the change to stateful_unary.rs:429, where we preemptively take the GIL and pass the token to logic.on_awake. The other changes in this PR are other areas I found where we were looping over items and taking the GIL within that loop.

I'm not sure where, in the general case, the best place would be to take a coarse-grained GIL lock and pass the token to the methods that need it. I think we would need a few different benchmarking dataflows that exercise different parts of the codebase to determine where the best place to take the GIL is.
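To illustrate the transformation described above, here is a minimal, self-contained Rust sketch. It uses a `std::sync::Mutex` as a stand-in for the GIL (the real code would hoist a PyO3 `Python::with_gil` acquisition out of the item loop instead); the item type, function names, and summing workload are all hypothetical, chosen only to show the per-item versus per-batch locking pattern.

```rust
use std::sync::Mutex;

/// Old pattern: acquire the lock (analogous to the GIL) once per item.
/// Each iteration pays the full acquire/release cost.
fn sum_lock_per_item(counter: &Mutex<u64>, items: &[u64]) -> u64 {
    for &x in items {
        let mut guard = counter.lock().unwrap();
        *guard += x;
    }
    *counter.lock().unwrap()
}

/// New pattern: acquire the lock once, then pass the guard (analogous to
/// passing the GIL token into logic.on_awake) through the whole batch.
fn sum_lock_hoisted(counter: &Mutex<u64>, items: &[u64]) -> u64 {
    let mut guard = counter.lock().unwrap();
    for &x in items {
        *guard += x;
    }
    *guard
}

fn main() {
    let items: Vec<u64> = (1..=100).collect();
    // Both variants compute the same result; only the locking pattern differs.
    let a = sum_lock_per_item(&Mutex::new(0), &items);
    let b = sum_lock_hoisted(&Mutex::new(0), &items);
    assert_eq!(a, 5050);
    assert_eq!(b, 5050);
    println!("per-item: {}, hoisted: {}", a, b);
}
```

With an uncontended `Mutex` the difference is small, but the GIL is a heavily contended, process-wide lock, which is why hoisting the acquisition out of the inner loop matters so much in the benchmarks above.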