
Don't take the GIL when iterating over items #418

Merged
merged 3 commits into from Mar 11, 2024

Conversation

whoahbot
Contributor

@whoahbot whoahbot commented Mar 9, 2024

In addition to the previous PR #414, I spent some time looking at GIL contention that is evidenced when running this dataflow: https://gist.github.com/damiondoesthings/b5d16d22d18f37675a8a76e318c1bc8c

The changes in this PR dramatically reduce the overhead of acquiring and releasing the GIL when looping over many items, in this case while determining the time for each item in a large batch:

❯ hyperfine \
                       --parameter-list branch_name main,coarse_gil \
                       --setup "git checkout {branch_name}; maturin develop --release" \
                       "python -m bytewax.run examples.slow_wax -i0 -a\"localhost:9999\""
Benchmark 1: python -m bytewax.run examples.slow_wax -i0 -a"localhost:9999" (branch_name = main)
  Time (mean ± σ):     10.034 s ±  1.073 s    [User: 3.332 s, System: 10.033 s]
  Range (min … max):    7.420 s … 10.976 s    10 runs

Benchmark 2: python -m bytewax.run examples.slow_wax -i0 -a"localhost:9999" (branch_name = coarse_gil)
  Time (mean ± σ):      1.196 s ±  0.006 s    [User: 1.081 s, System: 0.119 s]
  Range (min … max):    1.186 s …  1.207 s    10 runs

Summary
  python -m bytewax.run examples.slow_wax -i0 -a"localhost:9999" (branch_name = coarse_gil) ran
    8.39 ± 0.90 times faster than python -m bytewax.run examples.slow_wax -i0 -a"localhost:9999" (branch_name = main)

The bulk of the difference for this particular dataflow comes from the change to stateful_unary.rs:429, where we preemptively take the GIL and pass the token to logic.on_awake. The remaining changes in this PR cover other areas I found where we were looping over items and taking the GIL inside that loop.

I'm not sure where, in the general case, the best place is to take a coarse-grained GIL lock and pass the token to the methods that need it. We would probably need a few different benchmarking dataflows that exercise different parts of the codebase to determine where the GIL is best acquired.
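The coarse-grained pattern described above can be sketched in plain Rust. The real change uses PyO3's GIL token, but since pyo3 isn't available in a self-contained snippet, this sketch substitutes a stdlib `Mutex<()>` for the GIL and a hypothetical `on_awake` function for the operator logic; the names and the summing workload are illustrative only.

```rust
use std::sync::Mutex;

// Stand-in for the GIL: a global lock guarding shared interpreter state.
// In the real code this is PyO3's `Python::with_gil`, which hands out a
// `Python<'_>` token; here a `Mutex<()>` plays the same role.
static GIL: Mutex<()> = Mutex::new(());

// Hypothetical per-item logic, standing in for `logic.on_awake`.
fn on_awake(item: i64) -> i64 {
    item
}

// Fine-grained: lock and unlock once per item (the pattern being replaced).
fn process_fine_grained(items: &[i64]) -> i64 {
    let mut total = 0;
    for &item in items {
        let _token = GIL.lock().unwrap(); // acquire per item
        total += on_awake(item);
    } // lock released here, on every iteration
    total
}

// Coarse-grained: acquire once for the whole batch and let the per-item
// logic run under the already-held lock (the pattern this PR applies by
// passing the GIL token down into `on_awake`).
fn process_coarse_grained(items: &[i64]) -> i64 {
    let _token = GIL.lock().unwrap(); // acquire once for the whole batch
    let mut total = 0;
    for &item in items {
        total += on_awake(item); // no re-locking per item
    }
    total
}

fn main() {
    let items: Vec<i64> = (1..=100).collect();
    // Both strategies compute the same result; only the locking cost differs.
    assert_eq!(process_fine_grained(&items), process_coarse_grained(&items));
    println!("total = {}", process_coarse_grained(&items));
}
```

The trade-off is the usual one for lock coarsening: fewer acquire/release cycles per batch, at the cost of holding the lock for longer stretches, which is why the PR discussion hedges on where the coarse lock should live in the general case.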

@Psykopear
Contributor

Nice catch, that's a huge improvement.
The fix in this PR feels like the right thing here, but as you say, we probably shouldn't draw general conclusions from this test case alone.

So yes, I agree we need a list of dataflows that we know are problematic, or were in the past, so that we can monitor the impact of performance work on all of them.

I could also update the kafka throughput benchmarks to the latest bytewax version and check if and how this change impacts them, as that's a bit closer to a real-world usage scenario.

Adds GitHub action to collect performance test
information with https://codspeed.io

codspeed-hq bot commented Mar 11, 2024

CodSpeed Performance Report

Congrats! CodSpeed is installed 🎉

🆕 0 new benchmarks were detected.

You will start to see performance impacts in the reports once the benchmarks are run from your default branch.

Detected benchmarks

@whoahbot whoahbot marked this pull request as ready for review March 11, 2024 17:24
Contributor

@davidselassie davidselassie left a comment

Great find and benchmarking. I don't think there's any downside here.

Looking at our other Timely operators, I would also look into refactoring our branch and extract_key Timely operators to use OperatorBuilder so we can do this same transformation. I think all the other Timely operators already avoid taking the GIL in an inner loop.

@whoahbot
Contributor Author

Great idea. I'd like to merge this PR and open a separate one that looks into refactoring those two operators. I'd like to establish a baseline on main and see if our benchmarking workflow will report differences when refactoring those.

@whoahbot whoahbot merged commit 0f9cf2d into main Mar 11, 2024
29 checks passed
@whoahbot whoahbot deleted the coarse_gil branch March 11, 2024 21:16