fork support #8295

Closed
wavexx opened this issue Sep 10, 2014 · 18 comments
Labels
status:won't change Indicates that work won't continue on an issue or pull request

Comments

@wavexx
Contributor

wavexx commented Sep 10, 2014

I was looking into how much work it would take to support fork() in julia as a first-class operation.
fork would be needed for higher-performance multi-processing (#7616, #985), thanks to a shared initial memory state.

I'd like to discuss 2 things here: what the fork semantics should be and the actual implementation details.

What should be the state of open file descriptors after a fork? Similarly, what should we do with current tasks?

Ideally fork would return pid, stdout and stderr to the forked process, while adding the PID to the current list of managed processes. The list of pending tasks would be reset in the child process (you obviously don't want to continue tasks in the new environment), as would the list of managed processes (otherwise there would be 2 processes managing the same pids). File descriptors coming from libuv are already marked FD_CLOEXEC, so there should be nothing much to do here.
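To make the proposed semantics concrete, here is a minimal sketch in Python rather than julia (only because `os.fork` is readily available there). The names `fork_child`, `pending_tasks` and `managed_pids` are hypothetical stand-ins for runtime state, not anything that exists in julia or libuv. The child's output goes through a single pipe, where the proposal would have separate stdout and stderr handles.

```python
import os

# Hypothetical stand-ins for runtime state; these names are illustrative only.
pending_tasks = ["task-a", "task-b"]
managed_pids = []


def fork_child(work):
    """Fork; run work() in the child and return (pid, output) in the parent."""
    read_fd, write_fd = os.pipe()
    pid = os.fork()
    if pid == 0:
        # Child: drop inherited scheduler state so no parent task resumes here
        # and the child does not try to manage the parent's children.
        status = 0
        try:
            os.close(read_fd)
            pending_tasks.clear()
            managed_pids.clear()
            os.write(write_fd, work().encode())
        except BaseException:
            status = 1
        finally:
            # _exit skips atexit handlers and buffered-IO flushes that were
            # inherited from the parent.
            os._exit(status)
    # Parent: track the child and collect its output until EOF.
    os.close(write_fd)
    managed_pids.append(pid)
    chunks = []
    while True:
        chunk = os.read(read_fd, 4096)
        if not chunk:
            break
        chunks.append(chunk)
    os.close(read_fd)
    os.waitpid(pid, 0)
    managed_pids.remove(pid)
    return pid, b"".join(chunks).decode()
```

One detail worth keeping in mind: FD_CLOEXEC only takes effect on exec, so a forked child that never execs still inherits every open descriptor, flag or not. The child in this sketch closes the one descriptor it knows it does not need.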

libuv doesn't support forking (there's no uv_fork). However, re-initializing libuv from scratch on the child process looks do-able if we also aim to reset any pending task anyway.

Is there anything that's escaping me?

@ihnorton
Member

See JuliaLang/libuv#19 and the other PRs referenced therein (for some related discussion).

@vtjnash
Sponsor Member

vtjnash commented Sep 11, 2014

fork has no equivalent on windows (which is why it doesn't exist in libuv – only things that can be done on all platforms are allowed)

@wavexx
Contributor Author

wavexx commented Sep 11, 2014

On 09/11/2014 04:53 AM, Jameson Nash wrote:

fork has no equivalent on windows (which is why it doesn't exist in
libuv – only things that can be done on all platforms are allowed)

Yes, though allowing an existing process to restart libuv wouldn't be a
deal breaker in this case.

Forking is quite a big deal for several performance tweaks.
Is this also a fixed stance in julia, or are we allowed to introduce
platform specific tuning as long as we do it out of libuv?

@StefanKarpinski
Sponsor Member

I would be supportive of having non-portable features as long as it's clear that they are not portable.

@amitmurthy
Contributor

Threading in libraries is an issue you may need to consider. For example, OpenBLAS creates its own threads. These will not be active in the forked child.
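A small sketch of that effect, in Python for convenience: only the thread that calls fork exists in the child. The function name `thread_counts_across_fork` is made up for illustration, and the count comes from Python's own `threading` bookkeeping (which is reset after fork), not from OpenBLAS.

```python
import os
import threading
import warnings


def thread_counts_across_fork():
    """Return (threads seen by the parent, threads seen by the forked child)."""
    stop = threading.Event()
    worker = threading.Thread(target=stop.wait, daemon=True)
    worker.start()
    read_fd, write_fd = os.pipe()
    with warnings.catch_warnings():
        # Newer Python versions warn when a multithreaded process forks.
        warnings.simplefilter("ignore", DeprecationWarning)
        pid = os.fork()
    if pid == 0:
        status = 1
        try:
            # Child: the worker thread was not carried across the fork.
            os.write(write_fd, str(threading.active_count()).encode())
            status = 0
        finally:
            os._exit(status)
    os.close(write_fd)
    parent_count = threading.active_count()
    child_count = int(os.read(read_fd, 16).decode())
    os.close(read_fd)
    os.waitpid(pid, 0)
    stop.set()
    worker.join()
    return parent_count, child_count
```

A library that keeps a thread pool would find its pool gone in the child while its internal state still claims the threads exist, which is the hazard described above.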

@wavexx
Contributor Author

wavexx commented Sep 11, 2014

On 09/11/2014 11:38 AM, Amit Murthy wrote:

Threading in libraries is an issue you may need to consider. For
example, OpenBLAS creates its own threads. These will not be active
in the forked child.

Good point, though I was more worried about open IPC handles after the
fork that might interfere with the parent process. As an example, a
message queue that could still be used in the child due to a pending
task. I'm also worried about any atexit() handler that could perform
unwarranted cleanup on behalf of the parent.

@vtjnash added the status:won't change label Mar 25, 2016
@vtjnash
Sponsor Member

vtjnash commented Mar 25, 2016

it's worth noting here that C doesn't even support fork (http://man7.org/linux/man-pages/man2/fork.2.html)

@vtjnash closed this as completed Mar 25, 2016
@wavexx
Contributor Author

wavexx commented Mar 25, 2016

On Fri, Mar 25 2016, Jameson Nash notifications@github.com wrote:

it's worth noting here that C doesn't even support fork
(http://man7.org/linux/man-pages/man2/fork.2.html)

I'm not sure I follow.

@vtjnash
Sponsor Member

vtjnash commented Mar 25, 2016

After a fork(2) in a multithreaded program, the child can safely
call only async-signal-safe functions (see signal(7)) until such
time as it calls execve(2).
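The pattern this restriction leads to is fork followed immediately by exec. A rough Python sketch follows; `fork_then_exec` is a hypothetical helper name. Note that the Python interpreter itself runs a little code between the two calls, so this only illustrates the shape of the pattern; a strict version would be written in C and would call nothing but async-signal-safe functions in the child.

```python
import os
import sys


def fork_then_exec(argv):
    """Fork, exec argv in the child right away, and return the child's exit code."""
    pid = os.fork()
    if pid == 0:
        try:
            # Child: replace the process image immediately.
            os.execv(argv[0], argv)
        finally:
            # Reached only if exec failed.
            os._exit(127)
    _, status = os.waitpid(pid, 0)
    if os.WIFEXITED(status):
        return os.WEXITSTATUS(status)
    # Killed by a signal: report it as a negative number.
    return -os.WTERMSIG(status)
```

For example, `fork_then_exec([sys.executable, "-c", "raise SystemExit(7)"])` runs a fresh interpreter in the child and hands its exit code back to the parent.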

@wavexx
Contributor Author

wavexx commented Mar 25, 2016

On Fri, Mar 25 2016, Jameson Nash notifications@github.com wrote:

After a fork(2) in a multithreaded program, the child can safely
call only async-signal-safe functions (see signal(7)) until such
time as it calls execve(2).

Well, yes, but it's a bit of a stretch.

But does this hint that we're getting true multithreaded runtime support
instead? ;)

@yuyichao
Contributor

We're already using many libraries with true multithreaded support.

@giordano
Contributor

I know that this issue has been closed as won't fix, but I'd like to point out that supporting fork would most probably let the Cuba.jl package support parallelization and greatly speed up the computation of numerical integrals, see giordano/Cuba.jl#1 (at least I suspect this, I'm not 100% sure)

@Keno
Member

Keno commented Jul 18, 2016

fork is not the correct way to go about parallelization. The same should be easily achievable by using threads.

@giordano
Contributor

I don't have much choice. The C library on which Cuba.jl is based makes use of fork.

@Keno
Member

Keno commented Jul 18, 2016

What's the reason it uses fork over threads? Are there global buffers that get clobbered?

@giordano
Contributor

The parallelization model used in Cuba is described in this paper: https://arxiv.org/abs/1408.6373 Quoting from page 1:

Cuba uses fork/wait rather than the pthread* functions. The latter are slightly
more efficient because parent and child share their memory space, but for the same reason
they also require a reentrant integrand function, and the programmer may not have control
over reentrancy in all languages (e.g. Fortran’s I/O is typically non-reentrant). fork on the
other hand creates a completely independent copy of the running process and thus works
for any integrand function.

@Keno
Member

Keno commented Jul 18, 2016

That's an understandable reason, but that fails to take into account that many programs can't be forked in general (in particular all multithreaded programs). Ideally the library should provide both options (or even better have hooks to bring your own).

@habemus-papadum

habemus-papadum commented Oct 13, 2016

One (potentially) very nice application of fork in julia would be the following "flow" for interactive visualizations and audio (or anything else that could benefit from a programmable "realtime infinite runloop" that is typically difficult in a single threaded environment like julia)

  • as normal in a repl, load data, define functions, etc specific to your domain
  • when you need to visualize/interact -- create a lambda that knows how to "render" (via gl, etc)
  • fork, and go along your merry way in the parent process
  • The child process however will create a gl window (or whatever) and then enter an infinite runloop, in which it periodically renders using the lambda carefully prepared by the parent. If the interactivity is provided by a native api (e.g. glfw gives keypresses and mouse events) then libuv being in a broken state no longer matters (though it would be nice to do things more cleanly if possible)

"fork and spin" might be an apt name for this strategy.
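A toy version of the flow above, sketched in Python since julia has no fork to demonstrate with. The child "renders" by writing lines to a pipe instead of drawing to a gl window, and the loop is bounded by `frames` so the example terminates, where the real strategy would spin forever. `fork_and_spin` and `drain` are hypothetical names.

```python
import os
import time


def fork_and_spin(render, frames, interval=0.01):
    """Fork a child that calls render(frame) in a loop; return (pid, read_fd)."""
    read_fd, write_fd = os.pipe()
    pid = os.fork()
    if pid == 0:
        status = 1
        try:
            os.close(read_fd)
            # Child: the runloop, using the closure prepared by the parent.
            for frame in range(frames):
                os.write(write_fd, (render(frame) + "\n").encode())
                time.sleep(interval)
            status = 0
        finally:
            os._exit(status)
    # Parent: returns immediately and goes along its merry way.
    os.close(write_fd)
    return pid, read_fd


def drain(pid, read_fd):
    """Read everything the child wrote, then reap it."""
    chunks = []
    while True:
        chunk = os.read(read_fd, 4096)
        if not chunk:
            break
        chunks.append(chunk)
    os.close(read_fd)
    os.waitpid(pid, 0)
    return b"".join(chunks).decode().splitlines()
```

The child works on a snapshot of the parent's memory taken at fork time, so data the parent changes afterwards does not show up in what the child renders.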

In any case, my point is that fork may not be the proper approach to all parallelism, but fork is a magic super power, particularly in a repl world, and should be guarded carefully and used liberally (if you ❤️ system images (and how could you not?), then you must ❤️ fork!)

Obviously, I like to write flippantly, but hopefully there's something useful here.

cheers, nehal


9 participants