Support for multiple interpreters #5
Comments
Hello Chris, I would love to contribute. Do you have any idea how we might achieve this? Is there any documentation on the background knowledge one needs before tackling the problem? From what I have seen in the source code, you have already built some infrastructure for accepting multiple interpreters. |
@davidpham87 - The description on this issue contains documentation and links. What specifically is missing or what didn't make sense? |
Probably the issue is that I have zero experience with JNA/JNI, so I am totally lost on how I could solve the issue. Maybe I will start exploring on my own whenever I have free time and see whether I can come up with a solution. |
OK, I understand. We probably don't need new JNI functions. I believe all of the ones that need to be bound (all of the ones that are linked to from the python threading page) are already done. The very first step IMO is to read the python threading page with the assumption that we can get to any function that we want to. These are the base rules of the system and we can't violate them. Because of this I believe we should discuss that section at a high level before moving on. |
1, 2: See two (S)trategies:
S.1: @cnuernber, can you recommend some of the source in libpython-clj and its dependencies that is relevant to this task? I'm interested in reading more to help evaluate possible next steps. |
@skullgoblet1089 Your analysis is greatly appreciated and seems spot-on to me. It isn't this library's job to do multiple python processes, as I think that equates to multiple jvm processes. We can, however, gain some of the benefits of multiple processes by having multiple interpreters, and if libraries doing long-running operations are disciplined about releasing the GIL (like numpy and tensorflow) then we can actually do some meaningful multithreading. I imagine most people will do their multithreading in Clojure. My understanding is that this leaves sub-interpreters primarily useful for memory management, but they also enable some level of concurrency, potentially a lot depending on the library that someone is using.
Relevant Sources
Sub-interpreters
This is the best we can do. It does enable some level of memory management and some (far lesser and library-dependent) level of concurrency. On the other hand, now we have to worry about things like what happens when someone uses an object created in one interpreter from another. I believe the actual management is straightforward and I think the steps I proposed in the initial commit message are reasonable:
I should add that the decision of when to release interpreters is also an open question. Should the interpreters be bound to the GC, have an explicit release call, or both? |
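To make the "GC, explicit release, or both" question concrete, here is a minimal Python sketch of the "both" option. It is not libpython-clj code: `InterpreterHandle` and `released_log` are hypothetical names, and the "release" only appends to a list where a real system would tear down the interpreter. It relies on `weakref.finalize`, which runs its callback at most once, whether triggered explicitly or by collection.

```python
import weakref


class InterpreterHandle:
    """Hypothetical stand-in for a sub-interpreter handle (illustration only)."""

    def __init__(self, name, released_log):
        self.name = name
        # The callback must not reference `self`, otherwise the handle could
        # never be collected. finalize guarantees it runs at most once.
        self._finalizer = weakref.finalize(
            self, InterpreterHandle._release, name, released_log
        )

    @staticmethod
    def _release(name, released_log):
        # A real implementation would end the interpreter here.
        released_log.append(name)

    def close(self):
        """Explicit release; calling it again is a no-op."""
        self._finalizer()

    @property
    def closed(self):
        return not self._finalizer.alive

    def __enter__(self):
        return self

    def __exit__(self, *exc_info):
        self.close()
```

With this shape, `with InterpreterHandle(...)` gives deterministic release, while a handle that is simply dropped is still released when it is collected. The cost is that GC-triggered release happens at an unpredictable time and on an unpredictable thread, which matters if release has to take the GIL.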
@cnuernber, thank you. I've started reading through the libpython-clj source. Your outline of 1 - 4 makes sense to me.
Sub-Interpreters:
Reading through PEP 554 -- Multiple Interpreters in the Stdlib helped me wrap my head around this some more.
Concurrency
One particular example caught my eye: Running in a threadpool executor. It sounds like all of the sub-interpreters are subject to the GIL within CPython's main interpreter, though they have isolated state aside from the global modules etc. described in the caveats. I can imagine two use cases where sub-interpreters might achieve concurrency >= that attainable in CPython: (1) as you mention, stdlib / extensions like numpy that "aggressively" release the GIL, and (2) stdlib / extensions that are inherently "threadsafe" because they do not manipulate shared state (usually not possible with threads, as they share process state by design). I'm probably missing some of the complexity of thread state management related to sub-interpreters that would make a case against this claim, but sub-interpreters themselves naturally seem to fall into the second category due to their described isolation, i.e. as long as they are not sharing state with other sub-interpreters or the main interpreter, which would require synchronization. Any other use case seems to collapse to the multithreading problem in CPython. The references to CSP in PEP 554 sound like something that comes after 1 - 4 in your outline and which might benefit from Clojure interop.
Memory
As we've alluded, none of the above related to concurrency discounts the advantages of using a sub-interpreter to independently manage a long-running process, not unlike running subprocess.Popen in Python but minus IPC etc. If the memory released by destroying a sub-interpreter can be garbage collected or released back to the OS, that is also a win.
W/r/t your questions re interpreter management, I found a model for comparison, detailed below.
Jep:
PEP 554 -- Existing Usage led me to discover Jep. Some of Jep's documented features sound similar to this proposal. A few relevant articles from the project Wiki:
@cnuernber can you comment on Jep vs. libpython-clj? If we can't directly consume the Jep API as-is, what do you think about its design with regards to sub-interpreter management? |
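As a small illustration of the concurrency point discussed in this thread (threads only overlap while the GIL is released), here is a hedged Python sketch. It uses a `threading.Barrier` as a stand-in for any blocking call that releases the GIL; `run_overlapping` is a hypothetical helper name and no sub-interpreters are involved. The barrier only opens once every worker is waiting at the same time, so the function can only return if the threads really do overlap.

```python
import threading
from concurrent.futures import ThreadPoolExecutor


def run_overlapping(n_workers=4, timeout=5.0):
    """Run n_workers tasks that must all be blocked simultaneously to finish."""
    barrier = threading.Barrier(n_workers)

    def worker(index):
        # Blocking here releases the GIL, so the other workers can run and
        # reach the barrier too. If threads could not overlap, this wait
        # would time out and raise BrokenBarrierError.
        barrier.wait(timeout=timeout)
        return index

    with ThreadPoolExecutor(max_workers=n_workers) as pool:
        return sorted(pool.map(worker, range(n_workers)))
```

Pure-Python compute loops would not overlap like this in a single CPython process; the benefit depends on the library releasing the GIL, as noted above for numpy and tensorflow.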
Jep is a mature system for interacting with the python interpreter but it has a couple fairly serious drawbacks:
Because of all of this it doesn't have the concept of generic bridged objects and it doesn't work well from the repl (which can run your code in arbitrary REPL threads). It is just a far more static and limited design, from the conceptualization of the system to its actual implementation (JNI vs. JNA, java vs. Clojure). On the other hand, they have explored the sub-interpreter pathway quite considerably and we want to learn from them as much as we can. In your exploration I recommend also checking out their JNI pathways, as the Java doesn't tell (even close to) the whole story:
Basically, using JNA, which is more flexible and more dynamic, we can get more than Jep can offer (for instance not having to be bound to a particular thread, bridging objects from one language to another, less setup and no native compile step, multiple python version support from one jar, etc.), all without needing to write a custom C layer and have a pip module and all this other nonsense. Plus Clojure is just a closer match to Python than Java in a lot of ways, and you see the effects of this in how well the python objects integrate with the Clojure REPL and how little code it took Alan to build Panthera. Obviously I am biased, however. Maybe the most instructive thing would be to attempt some of Alan's Panthera tutorials via Jep if you really want to understand what it is like to use that system. |
Thanks @cnuernber. That explanation was very helpful. Makes sense, I had a feeling there were "drawbacks", otherwise you would have already been using it. I have a takeaway to analyze the Jep sub-interpreter "pathway". And with that, I'm ready to start writing code. To get the ball rolling, I'm going to try an approach emulating the Jep implementation in a simple test a la (2.a) below:
|
Closing with wontfix for now. |
As of
|
Some additional work would be required for interprocess communication, but that can be accomplished via the standard techniques. |
The libpython system was built with incomplete support for multiple interpreters.
Filling out this support will allow one to control python from multiple threads and use some serializable communication format (like pickle or json) to communicate python objects between interpreters.
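As a rough sketch of what "some serializable communication format" could look like, the Python snippet below round-trips an object through bytes with either pickle or json. The `send` and `receive` names are hypothetical and the transport between interpreters is left out entirely; the point is only that objects cross the boundary as bytes, not as shared references.

```python
import json
import pickle


def send(obj, fmt="pickle"):
    """Serialize obj to bytes that could be handed to another interpreter."""
    if fmt == "pickle":
        return pickle.dumps(obj)
    if fmt == "json":
        return json.dumps(obj).encode("utf-8")
    raise ValueError(f"unknown format: {fmt!r}")


def receive(payload, fmt="pickle"):
    """Rebuild a fresh copy of the object from bytes on the receiving side."""
    if fmt == "pickle":
        return pickle.loads(payload)
    if fmt == "json":
        return json.loads(payload.decode("utf-8"))
    raise ValueError(f"unknown format: {fmt!r}")
```

The two formats trade off differently: pickle preserves Python-specific types such as tuples but should only be used on trusted data, while json is portable but lossy (a tuple comes back as a list).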
The rules for libpython w/r/t threading are fairly clearly written on their threading API page. Interpreters have some scope of memory allocated to them, and their threading primitives include at least:
libpython-clj's behavior w/r/t all this is somewhat summarized here.
The gist is this - libpython binds the current interpreter to a thread-local variable when it acquires the gil. If one is already bound then it attempts a swap. It does this with a release on try...finally. It also uses Java synchronization to make sure that interpreters are not reentrant in some unexpected way.
So, the work required here is to build out the pathway further for multiple interpreters, using the libc bindings. A rough outline may be:
(defn create-interpreter ...)
My opinion is that 1 and 2 would be useful enough to be a clear PR. Then we can carry it forward with 3 and 4 in separate stages, being careful to check and be as correct as we can w/r/t memory and threading semantics at each step. I also believe this is an important and necessary step in the growth of this library.
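The thread-local binding described above (bind on acquire, swap if one is already bound, release in try...finally, synchronize access) can be emulated in a few lines of Python. This is an illustrative sketch only, not libpython-clj's implementation: `Interp`, `with_interpreter` and `current_interpreter` are hypothetical names, and a per-interpreter `RLock` stands in for both the GIL acquisition and the Java synchronization.

```python
import threading
from contextlib import contextmanager

# Each thread sees its own "currently bound interpreter".
_current = threading.local()


class Interp:
    """Hypothetical interpreter record; a real one would wrap a native handle."""

    def __init__(self, name):
        self.name = name
        # RLock: other threads are excluded while the owning thread may nest.
        self.lock = threading.RLock()


def current_interpreter():
    return getattr(_current, "interp", None)


@contextmanager
def with_interpreter(interp):
    """Bind interp to this thread, restoring the previous binding on exit."""
    with interp.lock:
        previous = current_interpreter()
        _current.interp = interp  # the swap, if something was already bound
        try:
            yield interp
        finally:
            _current.interp = previous  # always restore, even on error
```

Nesting `with_interpreter(a)` around `with_interpreter(b)` swaps to `b` and back to `a`, and an exception inside the block still restores the earlier binding.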
The relevant portions of the zulip chat that prompted this are pasted below.