Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Fix Issue #43: non-reentrant with-gil causes deadlocks with pytorch #64

Merged
merged 1 commit into from
Jan 25, 2020

Conversation

fonghou
Copy link
Contributor

@fonghou fonghou commented Jan 25, 2020

Further research python c api (esp. https://docs.python.org/3/c-api/init.html#non-python-created-threads), base on torch developers who point out that PyGILState_Ensure and PyGILState_Release are reentrant (ie. recursive calls are allowed).

  • Change with-gil (acquire-gil! and release-gil!) to reentrant PyGILState api.
  • Remove unnecessary acquire-gil! from interpreter/finalize!
  • Remove unnecessary nil from with-wil in (defn generic-python-as-jvm)
  • lein test passes

I've been using the patch in a nrepl session that runs for 6 hours with multiple clients (vscode calva remote ssh, and emacs cider-connect). All interactive evals are very responsive, no deadlocks. It used to deadlock within a few tools.namespace/refresh-all calls.

  * Change with-gil (acquire-gil! and release-gil!) to reentrant PyGILState api.
  * Remove unnecessary acquire-gil! in interpreter/finalize!
  * Remove unnecessary nil from with-wil in (defn generic-python-as-jvm)
@fonghou
Copy link
Contributor Author

fonghou commented Jan 25, 2020

BTW, :thread-state field in Interpreter atom has no value in this patch. I scan the whole repo, only python-thread-state and swap-interpreters! use this field, but these two functions are not called anywhere. I left them there, however want to mention "PyThreadState* PyGILState_GetThisThreadState()" can get it from native thread local var managed by PyGILState_Ensure/PyGILState_Release. I think it becomes redundant to keep trace of it on java side.

@jjtolton
Copy link
Contributor

This looks very promising. This might solve a few long standing stability issues. I’m not sure how to evaluate/test it though! Chris will probably have a much better idea.

@cnuernber
Copy link
Collaborator

This is really great. We have another person who I have been working with who is working with c++ threads and working we are working through some issues related to the GIL. Since this is reentrant, the pyGILState_Check call isn't even needed any more and all the code is simpler.

As far as thoroughly testing this, we don't have an answer aside from the unit tests. What I know is @fonghou is attempting to use pytorch from multiple repl threads which before this change definitely didn't work. So I am going to merge this in and assume that this is a better pathway forward. Real verification across many libraries and threaded uses cases will have to wait for someone to pay for it.

@cnuernber cnuernber closed this Jan 25, 2020
@cnuernber cnuernber reopened this Jan 25, 2020
@cnuernber cnuernber merged commit 0981376 into clj-python:master Jan 25, 2020
@cnuernber cnuernber mentioned this pull request Sep 6, 2022
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

Successfully merging this pull request may close these issues.

3 participants