
Don't take the GIL when iterating over items #418

Merged
merged 3 commits into from Mar 11, 2024

Conversation

whoahbot
Contributor

@whoahbot whoahbot commented Mar 9, 2024

In addition to the previous PR #414, I spent some time looking at GIL contention that is evidenced when running this dataflow: https://gist.github.com/damiondoesthings/b5d16d22d18f37675a8a76e318c1bc8c

The changes in this PR dramatically reduce the overhead of acquiring and releasing the GIL when looping over many items, in this case while determining the time for each item in a large batch:

❯ hyperfine \
                       --parameter-list branch_name main,coarse_gil \
                       --setup "git checkout {branch_name}; maturin develop --release" \
                       "python -m bytewax.run examples.slow_wax -i0 -a\"localhost:9999\""
Benchmark 1: python -m bytewax.run examples.slow_wax -i0 -a"localhost:9999" (branch_name = main)
  Time (mean ± σ):     10.034 s ±  1.073 s    [User: 3.332 s, System: 10.033 s]
  Range (min … max):    7.420 s … 10.976 s    10 runs

Benchmark 2: python -m bytewax.run examples.slow_wax -i0 -a"localhost:9999" (branch_name = coarse_gil)
  Time (mean ± σ):      1.196 s ±  0.006 s    [User: 1.081 s, System: 0.119 s]
  Range (min … max):    1.186 s …  1.207 s    10 runs

Summary
  python -m bytewax.run examples.slow_wax -i0 -a"localhost:9999" (branch_name = coarse_gil) ran
    8.39 ± 0.90 times faster than python -m bytewax.run examples.slow_wax -i0 -a"localhost:9999" (branch_name = main)

The bulk of the difference for this particular dataflow comes from the change to stateful_unary.rs:429, where we preemptively take the GIL and pass the token to logic.on_awake. The remaining changes in this PR cover other areas I found where we were looping over items and taking the GIL inside that loop.

I'm not sure where, in the general case, the best place is to take a coarse-grained GIL lock and pass the token to the methods that need it. We would probably need a few different benchmarking dataflows that exercise different parts of the codebase to determine where the GIL is best acquired.
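The coarse-grained pattern described above can be sketched in plain Rust. The real change uses PyO3's GIL token, but since pyo3 isn't available in a self-contained snippet, this sketch substitutes a stdlib `Mutex<()>` for the GIL and a hypothetical `on_awake` function for the operator logic; the names and the summing workload are illustrative only.

```rust
use std::sync::Mutex;

// Stand-in for the GIL: a global lock guarding shared interpreter state.
// In the real code this is PyO3's `Python::with_gil`, which hands out a
// `Python<'_>` token; here a `Mutex<()>` plays the same role.
static GIL: Mutex<()> = Mutex::new(());

// Hypothetical per-item logic, standing in for `logic.on_awake`.
fn on_awake(item: i64) -> i64 {
    item
}

// Fine-grained: lock and unlock once per item (the pattern being replaced).
fn process_fine_grained(items: &[i64]) -> i64 {
    let mut total = 0;
    for &item in items {
        let _token = GIL.lock().unwrap(); // acquire per item
        total += on_awake(item);
    } // lock released here, on every iteration
    total
}

// Coarse-grained: acquire once for the whole batch and let the per-item
// logic run under the already-held lock (the pattern this PR applies by
// passing the GIL token down into `on_awake`).
fn process_coarse_grained(items: &[i64]) -> i64 {
    let _token = GIL.lock().unwrap(); // acquire once for the whole batch
    let mut total = 0;
    for &item in items {
        total += on_awake(item); // no re-locking per item
    }
    total
}

fn main() {
    let items: Vec<i64> = (1..=100).collect();
    // Both strategies compute the same result; only the locking cost differs.
    assert_eq!(process_fine_grained(&items), process_coarse_grained(&items));
    println!("total = {}", process_coarse_grained(&items));
}
```

The trade-off is the usual one for lock coarsening: fewer acquire/release cycles per batch, at the cost of holding the lock for longer stretches, which is why the PR discussion hedges on where the coarse lock should live in the general case.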

@Psykopear
Contributor

Nice catch, that's a huge improvement.
The fix in this PR feels like the right thing here, but as you say, we probably shouldn't draw general conclusions from this test case alone.

So yes, I agree we need a list of dataflows that we know are problematic, or were in the past, so that we can monitor the impact of performance work on all of them.

I could also update the kafka throughput benchmarks to the latest bytewax version and check if and how this change impacts them, as that's a bit closer to a real-world usage scenario.

Adds GitHub action to collect performance test
information with https://codspeed.io

codspeed-hq bot commented Mar 11, 2024

CodSpeed Performance Report

Congrats! CodSpeed is installed 🎉

🆕 0 new benchmarks were detected.

You will start to see performance impacts in the reports once the benchmarks are run from your default branch.

Detected benchmarks

@whoahbot whoahbot marked this pull request as ready for review March 11, 2024 17:24
Contributor

@davidselassie davidselassie left a comment

Great find and benchmarking. I don't think there's any downside here.

Looking at our other Timely operators, I would also look into refactoring our branch and extract_key Timely operators to use OperatorBuilder so we can do this same transformation. I think all the other Timely operators already avoid taking the GIL in an inner loop.

@whoahbot
Contributor Author

Great idea. I'd like to merge this PR and open a separate one that looks into refactoring those two operators. I'd like to establish a baseline on main and see if our benchmarking workflow will report differences when refactoring those.

@whoahbot whoahbot merged commit 0f9cf2d into main Mar 11, 2024
29 checks passed
@whoahbot whoahbot deleted the coarse_gil branch March 11, 2024 21:16