preloading '__main__' with forkserver has been broken for a long time #98552
Comments
A PR fixing this and adding unittest coverage would be great. BTW, as you are using forkserver, be aware of the recently disclosed #97514 if you ever run on 3.9 or later. (That issue can be worked around.)
Thanks for the pointer on the CVE. The simple fix for this is pretty straightforward and I'll try to get it together quickly (first cpython PR, so it may take a little longer). It is just to deal with the renamed dict entry. However, I've discovered some other "quirks" with forkserver, and was wondering if they are known or being worked on, if they could use improvement, or whether they are expected and it's just my newness in looking at it. (I'm on Linux but trying to use multiprocessing in programs that also use threading, and have gotten burned a few times by deadlocks because of it, leading me to move to using forkserver. But maybe I should be considering another way forward?) Anyways, with the break fixed by using the right dict key to get the main path, preloading of '__main__' works again.
I did some experiments. Using `spawn.get_preparation_data()` and applying the values from that dictionary in the forkserver process handles the general case. I'll put together a PR for this approach too, to solicit input. The downside (for me) of this more general solution is that it alters code on the new-process side, versus the less complete fix, which is on the original-process side. This means the latter can be done with monkey-patching, but the former cannot, as far as I understand.
Updating the tests for these changes led to some comical confusion. Turns out there was another bug waiting... forkserver does not flush its stdout/stderr before each fork, and that makes things really confusing. I was using print statements in files to make sure preload was working and that they were only imported in the forkserver and not in the newly forked processes. But the output said otherwise, even though everything was actually working...
…n for a long time (9 years). We do this by using spawn.get_preparation_data() and then using the values in that dictionary appropriately. This lets us also fix the setting of sys.path (which never worked) and other minor properties of the forkserver process. While updating the test to verify these fixes, we also discovered that forkserver was not flushing stdout/stderr before it forked to create a new process. This caused the updated test to fail because unflushed output in the forkserver would then be printed by the forked process as well. This is now fixed too.
Bug report
The `forkserver` start method provides the ability to call `set_forkserver_preload` on the multiprocessing context to load modules into and configure the forkserver process. By doing this carefully, you can avoid having to do module loading and other work each time the forkserver process is forked to create a new process. Without doing such work, `forkserver` can be much slower than the traditional `fork` start method.

You can specify the module `'__main__'` in the `set_forkserver_preload` list, and the forkserver source has special code for this case. It ensures that the main file path does not have to be configured/loaded after each fork. To do this, in `multiprocessing.forkserver.ensure_running`, it calls `multiprocessing.spawn.get_preparation_data` and then uses the `main_path` entry that may be in the returned dict.

Unfortunately, 3 months after it was introduced, this functionality was broken in commit 9a76735. That commit renamed the `main_path` dictionary entry returned by `get_preparation_data` to `init_main_from_path`, but didn't update the use in `multiprocessing.forkserver`.
Not having the ability to load and configure main on the forkserver ends up being unusually painful in my recent scenario, which led to tracking this down. I have a Python program on a share that spawns short-lived processes at a high rate, and multiple machines run this program from the share. A huge slowdown ensues as smbd processes on the server go crazy responding to every new process on every client reading the file and stat-ing the directory the file is contained in.
A simple fix in `multiprocessing.forkserver` accounting for the changed name rectifies the problem. I'll work on putting that PR together. Any thoughts on a workaround that doesn't require modifying the Python source are welcome, as I imagine it will be a while until I'm on a Python with the fix.

Here's a simple repro.
Your environment
Ubuntu 20.04.4 LTS, CPython 3.8
Python source code examination indicates this bug is still present in the current version of CPython.