
Support for multiple interpreters #5

Closed
cnuernber opened this issue Aug 19, 2019 · 12 comments
Labels: enhancement, harder, help wanted

Comments

@cnuernber
Collaborator

The libpython system was built with incomplete support for multiple interpreters.

Filling out this support will allow one to control python from multiple threads and use some serializable communication format (like pickle or json) to communicate python objects between interpreters.

The rules for libpython w/r/t threading are fairly clearly written on their threading API page. Interpreters own some scope of memory allocated to them, and their threading primitives include at least the following (sketched in code just after this list):

  1. save the interpreter state to a variable and release the lock.
  2. load the interpreter state from a variable and acquire the lock.
  3. swap the active interpreter state in the current thread (this assumes the lock is already acquired).
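
For concreteness, here is a minimal Clojure sketch of those three primitives, assuming JNA is on the classpath. The C functions (PyEval_SaveThread, PyEval_RestoreThread, PyThreadState_Swap) are the real CPython API; the library name and the Clojure wrapper names below are assumptions made purely for illustration and are not libpython-clj's actual API.

(ns multi-interp.sketch
  (:import [com.sun.jna NativeLibrary Function Pointer]))

;; Load libpython via JNA.  The library name is platform/version dependent;
;; "python3.10" is only an example.
(def ^NativeLibrary libpython (NativeLibrary/getInstance "python3.10"))

(defn- native-fn ^Function [^String fn-name]
  (.getFunction libpython fn-name))

(defn save-thread-state!
  "1. Save the current thread's interpreter state and release the GIL.
   Returns the opaque PyThreadState pointer (PyEval_SaveThread)."
  []
  (.invokePointer (native-fn "PyEval_SaveThread") (object-array 0)))

(defn restore-thread-state!
  "2. Make a saved thread state current again and acquire the GIL
   (PyEval_RestoreThread)."
  [^Pointer tstate]
  (.invokeVoid (native-fn "PyEval_RestoreThread") (object-array [tstate])))

(defn swap-thread-state!
  "3. Swap the active thread state on this thread; assumes the GIL is already
   held.  Returns the previously active state (PyThreadState_Swap)."
  [^Pointer tstate]
  (.invokePointer (native-fn "PyThreadState_Swap") (object-array [tstate])))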

libpython-clj's behavior w/r/t all this is somewhat summarized here.

The gist is this - libpython-clj binds the current interpreter to a thread-local variable when it acquires the GIL. If one is already bound then it attempts a swap instead. The release happens in a try...finally. It also uses Java synchronization to make sure that interpreters are not re-entered in some unexpected way.
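
As an illustration of that discipline (and only an illustration, not libpython-clj's actual implementation), a with-interpreter style macro built on the hypothetical wrappers sketched above might look like this, where an "interpreter" is represented by its PyThreadState pointer:

(def ^:dynamic *bound-thread-state*
  "The interpreter thread state currently active on this thread, if any."
  nil)

(defmacro with-interpreter
  "Run body with the given thread state active on this thread.  If nothing is
   bound, acquire the GIL by restoring the state; if something is already
   bound, swap to the new state instead.  Either way the reverse operation
   runs in a finally block, and locking guards against an interpreter being
   entered re-entrantly in some unexpected way."
  [tstate & body]
  `(let [tstate# ~tstate]
     (locking tstate#
       (if *bound-thread-state*
         (let [previous# (swap-thread-state! tstate#)]
           (try
             (binding [*bound-thread-state* tstate#] ~@body)
             (finally (swap-thread-state! previous#))))
         (do
           (restore-thread-state! tstate#)   ; acquire the GIL for this state
           (try
             (binding [*bound-thread-state* tstate#] ~@body)
             ;; release the GIL again on the way out
             (finally (save-thread-state!))))))))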

So, the work required here is to use the libc bindings to build out the pathway for multiple interpreters further. A rough outline (step 1 is sketched just after this list) may be:

  1. (defn create-interpreter ...)
  2. check that two interpreters can execute serially first, then simultaneously in multiple threads.
  3. Attempt some limited communication between them.
  4. Test and ensure bridging works appropriately between them.
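
To make step 1 concrete, here is a minimal sketch of what create-interpreter could look like, again via the hypothetical JNA helper above. Py_NewInterpreter and Py_EndInterpreter are real CPython functions; the Clojure names and error handling are assumptions made for this outline, not a definitive design.

(defn create-interpreter
  "Create a new sub-interpreter.  Must be called with the GIL held.  Returns
   the PyThreadState pointer identifying the new sub-interpreter, which
   Py_NewInterpreter also leaves current on this thread."
  []
  (let [tstate (.invokePointer (native-fn "Py_NewInterpreter") (object-array 0))]
    (when (nil? tstate)
      (throw (ex-info "Py_NewInterpreter failed" {})))
    tstate))

(defn destroy-interpreter!
  "Destroy a sub-interpreter.  Its thread state must be current and the GIL
   must be held when this is called (Py_EndInterpreter)."
  [^Pointer tstate]
  (.invokeVoid (native-fn "Py_EndInterpreter") (object-array [tstate])))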

My opinion is that 1 and 2 would be useful enough to make a clear PR. Then we can carry it forward with 3 and 4 in separate stages, being careful to check that we are as correct as we can be w/r/t memory and threading semantics at each step. I also believe this is an important and necessary step in the growth of this library.

The relevant portions of the zulip chat that prompted this are pasted below.

I was thinking about it, but would it be possible to create multiple python interpreters/processes in a single namespace or across multiple namespaces? If yes, we could sell it as a solution for concurrency in python :) That being said, it would be interesting to be able to explicitly provide an interpreter/python process to a module, or to have the ability to explicitly attach a module to an interpreter.

It would also mean we could have isolation of data in the data analysis processes and avoid the namespace pollution that is all too common in python.

(By the way, not being able to do that seems to be one of the limitations of Reticulate, the R package for Python interop:
https://stackoverflow.com/questions/50762140/r-reticulate-how-can-i-close-restart-the-python-console
Hearing from people who use it intensively, that is indeed a practical problem. In cases of long running computations, as well as of memory leaks, being able to get rid of a python process and create a new one would be useful.)

@cnuernber added the enhancement, help wanted, and harder labels on Aug 19, 2019
@davidpham87
Contributor

Hello Chris,

I would love to contribute, however, do you have any idea how we might achieve this? Is there any documentation on the knowledge one has to gather before solving the problem? From what I have seen from the source code, you already made some infrastructure for accepting multiple interpreters.

@cnuernber
Collaborator Author

@davidpham87 - The description on this issue contains documentation and links. What specifically is missing or what didn't make sense?

@davidpham87
Contributor

Probably the issue is I have 0 experience with JNA/JNI, so I am totally lost on how I could solve the issue. Maybe I will start to explore on my own whenever I have free time and see how I can come up with a solution.

@cnuernber
Collaborator Author

OK, I understand.

We probably don't need new JNI functions. I believe all of the ones that need to be bound (all of the ones that are linked to from the python threading page) are already done.

The very first step IMO is to read the python threading page with the assumption that we can get to any function that we want to. These are the base rules of the system and we can't violate them. Because of this I believe we should discuss that section at a high level before moving on.

@skullgoblet1089
Contributor

Hearing from people who use it intensively, that is indeed a practical problem. In cases of long running computations, as well as of memory leaks, being able to get rid of a python process and create a new one would be useful.)

1, 2:
In Python, the above makes sense. It is common practice to manage an SMP process pool and then kill off memory-intensive workers after they complete tasks in order to free memory back to the OS. The reason is that the python process will always operate at its last high-water mark. There are ways other than JNA to achieve an SMP arrangement with Python from the Clojure runtime (e.g. protobuf). So I assume the goal is to have the multiple python interpreters reside within the jvm process, where we can use the python memory allocators to achieve the same, and then use existing bridges to manage them. This could result in a behemoth process. But if garbage collection is working properly, I can see a situation where a user creates multiple small to mid-sized python interpreters as a specialized pool to distribute work. Worth the experiment IMO.

I see two (S)trategies:

  1. Creating and destroying isolated interpreters that can only communicate with the Clojure runtime environment.
  2. The Sub-interpreter Support section of the threading docs also seems relevant, although the caveats seem too prohibitive to comprehensively answer the full intent of this proposal.

S.1: @cnuernber can you recommend some of the source in libpython-clj and its dependencies that is relevant to this task? Interested to read more to help evaluate possible next steps.
S.2: @cnuernber what do you think about sub-interpreters?

@cnuernber
Collaborator Author

cnuernber commented Aug 23, 2019

@skullgoblet1089 Your analysis is greatly appreciated and seems spot-on to me. It isn't this library's job to do multiple python processes as that equates to multiple jvm processes I think.

We can, however, gain some of the benefits of multiple processes by having multiple interpreters and if libraries doing long running operations are disciplined about releasing the GIL (like numpy and tensorflow) then we can actually do some meaningful multithreading. I imagine most people will do their multithreading in Clojure.

My understanding is that this leaves sub-interpreters primarily as a memory management tool, but there is also some level of concurrency enabled by using them, and potentially a lot depending on the library that someone is using.

Relevant Sources

Sub-interpreters

This is the best we can do. It does enable some level of memory management and some (far lesser and library-dependent) level of concurrency. On the other hand, now we have to worry about things like what happens when someone uses an object created in one interpreter from another. I believe the actual management is straightforward and I think the steps I proposed in the issue description are reasonable:

  1. (defn create-interpreter ...)
  2. check that two interpreters can execute serially first, then simultaneously in multiple threads.
  3. Attempt some limited communication between them.
  4. Test and ensure bridging works appropriately between them.

I should add that the question of when to release interpreters is also open. Should the interpreters be bound to the GC, have an explicit release call, or both?
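
One possible shape for "both", purely as an illustration (reusing the hypothetical create-interpreter / destroy-interpreter! from the sketch earlier in this thread, and glossing over the fact that a real release would have to re-acquire the GIL and make the interpreter's thread state current first): an explicit close plus a GC-driven fallback via java.lang.ref.Cleaner.

(defonce ^java.lang.ref.Cleaner interpreter-cleaner (java.lang.ref.Cleaner/create))

(defn tracked-interpreter
  "Return a java.io.Closeable wrapping a new sub-interpreter.  The interpreter
   is destroyed at most once: either on an explicit .close, or, failing that,
   when the wrapper becomes unreachable and the Cleaner runs."
  []
  (let [tstate    (create-interpreter)
        released? (atom false)
        cleanup   (fn []
                    (when (compare-and-set! released? false true)
                      ;; Real code would re-enter the GIL and make tstate
                      ;; current before calling Py_EndInterpreter.
                      (destroy-interpreter! tstate)))
        ;; Note: cleanup must not capture the Closeable itself, or the wrapper
        ;; would never become unreachable.
        closeable (reify java.io.Closeable
                    (close [_] (cleanup)))]
    (.register interpreter-cleaner closeable ^Runnable cleanup)
    closeable))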

@skullgoblet1089
Contributor

skullgoblet1089 commented Aug 23, 2019

@cnuernber , thank you. I've started reading through the libpython-clj source. Your outline of 1-4 makes sense to me.

Sub-Interpreters:

Reading through PEP 554 -- Multiple Interpreters in the Stdlib helped me wrap my head around this some more.

Concurrency

One particular example caught my eye: running in a threadpool executor. It sounds like all of the sub-interpreters are subject to the GIL within CPython's main interpreter, though they have isolated state aside from the global modules etc. described in the caveats. I imagine there are two use cases where the use of sub-interpreters might achieve concurrency >= that attainable in CPython: (1) as you mention, stdlib / extensions like numpy that “aggressively” release the GIL, and (2) stdlib / extensions that are known to be inherently “threadsafe” by definition because they do not manipulate shared state (usually not possible with threads, as they share process state by design). I'm probably missing some of the complexity of thread state management related to sub-interpreters that would make a case against this claim, but sub-interpreters themselves naturally seem to fall into the second category due to their described isolation - i.e. as long as they are not sharing state with other sub-interpreters or the main interpreter, which would require synchronization. Any other use cases seem to me like they would just collapse to a multithreading problem in CPython. The references to CSP in PEP 554 sound like something that comes after 1-4 in your outline and which might benefit from Clojure interop.

Memory

As we've alluded to, none of the above related to concurrency discounts the advantages of using a sub-interpreter to independently manage a long-running process, not unlike running subprocess.Popen in Python but minus IPC etc. If the memory released by destroying a sub-interpreter can be garbage collected or released back to the OS, that is also a win. W/r/t your questions re interpreter management, I found a model for comparison, detailed below.

Jep:

PEP 554 -- Existing Usage led me to discover Jep. Some of Jep's documented features sound similar to this proposal. A few relevant articles from the project Wiki:

@cnuernber can you comment on Jep vs. libpython-clj? If we can't directly consume the Jep API as-is, what do you think about its design with regards to sub-interpreter management?

@cnuernber
Collaborator Author

@skullgoblet1089

Jep is a mature system for interacting with the python interpreter but it has a few fairly serious drawbacks:

  1. Jep interpreters have to be accessed only from the thread they were created in.
  2. JNI backend means that adding new functionality requires coding it in C.
  3. JNI backend also means that you have to install a python module via pip that is python-version dependent. It means you have some set of shared dependencies of python, numpy, and Jep altogether to manage.
  4. Jep stdout/stderr handling is a mess; it doesn't forward the python stdout to java so a lot of things just die silently (!!).
  5. When I tested the numpy zero-copy pathway it was somewhat unpredictable when it would work and when it wouldn't. In addition, the Jep mirrors returned don't have nearly the functionality of the tech tensor/datatype system (like no support for strides) and as such fewer numpy arrays can be moved across without more manipulations.

Because of all of this it doesn't have the concept of generic bridged objects and it doesn't work well from the REPL (which can run your code in arbitrary REPL threads). It is just a far more static and limited design, from the conceptualization of the system to its actual implementation (JNI vs. JNA, Java vs. Clojure). On the other hand, they have explored the sub-interpreter pathway quite considerably and we want to learn from them as much as we can.

In your exploration I recommend also checking out their JNI pathways, as the Java side doesn't come close to telling the whole story:

Basically, using JNA, which is more flexible and more dynamic, we can get more than Jep can offer (for instance, not having to be bound to a particular thread, bridging objects from one language to another, less setup and no native compile step, multiple python version support from one jar, etc.), all without needing to write a custom C layer, maintain a pip module, and all this other nonsense. Plus Clojure is just a closer match to Python than Java in a lot of ways, and you see the effects of this in how well the python objects integrate with the Clojure REPL and how little code it took Alan to build Panthera.

Obviously I am biased, however. Maybe the most instructive thing would be to attempt to do some of Alan's Panthera tutorials via Jep if you really want to understand what it is like to use that system.

@skullgoblet1089
Contributor

Thanks @cnuernber. That explanation was very helpful. Makes sense, I had a feeling there were "drawbacks", otherwise you would have already been using it. I have a takeaway to analyze the Jep sub-interpreter "pathway".

And with that, I'm ready to start writing code. To get the ball rolling, I'm going to try an approach emulating the Jep implementation in a simple test a la (2.a) below:

  1. (defn create-interpreter ...)
  2. (a.) check that two interpreters can execute serially first (a rough shape for this test is sketched just below), then (b.) simultaneously in multiple threads.
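
Purely as a rough shape for that 2.a test, reusing the hypothetical create-interpreter, with-interpreter, and JNA helpers sketched earlier in this thread: PyRun_SimpleString is the real CPython call, run-string! is an assumed wrapper, and the caller is assumed to hold the GIL (e.g. the main thread right after initialization).

(defn run-string!
  "Run python source in whichever interpreter is currently active.
   Returns 0 on success, -1 if an exception was raised (PyRun_SimpleString)."
  [^String src]
  (.invokeInt (native-fn "PyRun_SimpleString") (object-array [src])))

(defn serial-two-interpreter-check
  "Create two sub-interpreters, run code in each in turn, and verify that the
   module-level state of one is not visible from the other."
  []
  (let [a (create-interpreter)
        b (create-interpreter)]
    ;; Py_NewInterpreter leaves `b` current with the GIL held; release it so
    ;; with-interpreter can manage acquisition from here on.
    (save-thread-state!)
    (with-interpreter a (run-string! "x = 1"))
    (with-interpreter b (run-string! "x = 2"))
    ;; Isolation check: each interpreter should still see only its own `x`,
    ;; so both of these should return 0.  (Teardown via destroy-interpreter!
    ;; is omitted from this sketch.)
    [(with-interpreter a (run-string! "assert x == 1"))
     (with-interpreter b (run-string! "assert x == 2"))]))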

@cnuernber
Collaborator Author

Closing with wontfix for now.

@jjtolton
Contributor

As of 2.00-beta-11, this can now be accomplished using the cljbridge technology.

 $ ipython3 -i cljbridge.py
Python 3.6.9 (default, Oct  8 2020, 12:12:24)
Type 'copyright', 'credits' or 'license' for more information
IPython 7.1.1 -- An enhanced Interactive Python. Type '?' for help.

In [1]: import multiprocessing

In [2]: for i in range(4):
   ...:     multiprocessing.Process(target=init_clojure_repl).start()
   ...: 

In [3]: Mar 11, 2021 9:10:53 PM clojure.tools.logging$eval3184$fn__3187 invoke
INFO: nREPL server started on port 38811 on host localhost - nrepl://localhost:38811
Mar 11, 2021 9:10:53 PM clojure.tools.logging$eval3184$fn__3187 invoke
INFO: nREPL server started on port 39139 on host localhost - nrepl://localhost:39139
Mar 11, 2021 9:10:54 PM clojure.tools.logging$eval3184$fn__3187 invoke
INFO: nREPL server started on port 45259 on host localhost - nrepl://localhost:45259
Mar 11, 2021 9:10:54 PM clojure.tools.logging$eval3184$fn__3187 invoke
INFO: nREPL server started on port 39257 on host localhost - nrepl://localhost:39257

@jjtolton
Contributor

Some additional work would be required for interprocess communication, but that can be accomplished via standard techniques.
