Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Release the Python interpreter lock while waiting for data #39

Merged
merged 1 commit into from
Jun 29, 2022

Conversation

phil-opp
Copy link
Collaborator

As long as we keep the global interpreter lock (GIL) active, no other Python thread can make progress. This lead to starvation when there are multiple Python operators on the same runtime node. This PR fixes this by holding the GIL only as long as needed, i.e. when calling code of the operator. Most importantly, we don't hold the GIL anymore when the operator is idle and waiting for new inputs.

Fixes #38

As long as we keep the global interpreter lock (GIL) active, no other Python thread can make progress. This lead to starvation when there are multiple Python operators on the same runtime node. This PR fixes this by holding the GIL only as long as needed, i.e. when calling code of the operator. Most importantly, we don't hold the GIL anymore when the operator is idle and waiting for new inputs.
@phil-opp
Copy link
Collaborator Author

Note that Python operators still share the same global interpreter lock, they just try to hold that lock as short as possible now. Ideally, we would use a separate Python runtime for each operator, but unfortunately this isn't supported in pyo3 yet (see PyO3/pyo3#879 and PyO3/pyo3#576).

@phil-opp phil-opp changed the title Release the Python interpreter lock when waiting for data Release the Python interpreter lock while waiting for data Jun 29, 2022
@haixuanTao
Copy link
Collaborator

So this fix the issue, I was previously facing, but, after testing on dora-drives, i'm not sure it is fast enough and scalable enough to be a proposed API.

@phil-opp
Copy link
Collaborator Author

Ok, thanks for trying! How about we merge this PR for now (since it at least improves things) and look into ways to make this faster later?

Right now, I can only think of one solution/workaround for the GIL problem: spawn multiple processes. We could achieve that by changing our dataflow YAML to run each Python operator on its own runtime node.

@haixuanTao haixuanTao self-assigned this Jun 29, 2022
@phil-opp phil-opp merged commit cfe8e8c into main Jun 29, 2022
@phil-opp phil-opp deleted the python-gil branch June 29, 2022 12:48
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

Successfully merging this pull request may close these issues.

2 Python Operators will share the same GIL
2 participants