Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Process.fork has dangerous semantics #6421

Closed
forksaber opened this issue Jul 21, 2018 · 18 comments · Fixed by #12934
Closed

Process.fork has dangerous semantics #6421

forksaber opened this issue Jul 21, 2018 · 18 comments · Fixed by #12934

Comments

@forksaber
Copy link

forksaber commented Jul 21, 2018

forkall(Current)

Process.fork currently has forkall semantics with respect to fibers. What this means is that all the fibers which were alive during the Process.fork call remain alive in the forked process.

Advantages

  • no deadlocks possible due to locked mutexes in contrast to when only a single fiber survives the fork

Disadvantages

Since fd/sockets are shared across forks, fibers which were doing writes will do double writes which will cause:

  • broken http responses
  • duplicated data in files
  • corrupted db queries

Similarly, fd/sockets reads may get divided across forks. So, both processes will see

  • partial http requests
  • partial db responses
  • partial file data

An example of double writes:

spawn do
  5.times do |i|
    sleep rand
    puts "line #{i}"
  end 
end
process = Process.fork do
  sleep 10
end

process.wait

Expected output:

line 0
line 1
line 2
line 3
line 4

Actual output:

line 0
line 0
line 1
line 2
line 1
line 3
line 4
line 2
line 3
line 4

IMO, these are really severe disadvantages so these semantics ought to change.

Alternatives

1. fork like semantics (linux/unix)

In this scenario, only the fiber invoking Process.fork lives on in the forked process. Rest of the fibers are either collected(majority of them) and a few core runtime fibers like SignalHandler loop are recreated.

Advantages

  • no unintentional socket/fd corruption
  • easy to reason about
  • similar to how threads would behave once thread-support lands

Disadvantages

  • Deadlocks: Any attempt to lock a mutex which was locked in non forking Fiber will result in a deadlock. This is manageable when you own the mutexes but if mutexes belong to stdlib/shard then deadlocks are more or less imminent. pthread_atfork like api for fibers may help here though i am not sure how successful that will be.

2. Disable fork without exec (go/erlang)

This approach is in line with what go and erlang takes. They find fork too difficult to implement correctly in a multi-threaded/multi-fiber/multi-actor environment that they don't even bother implementing it.

Disadvantages

  • won't be able to daemonize the process using the double fork technique but can definitely use a C binary which exec()s to a crystal binary.

Conclusion

Personally, I would prefer that fork without exec be removed from the stdlib so there's no risk of deadlocks or data corruption. If that's unacceptable, even implementing fork semantics is still a step in the right direction.

PS: I've been implementing a supervisor clone in crystal which gave me an opportunity to look deeper into Crystal's internals.

#6416 is also caused by forkall behaviour. I'll provide a root cause analysis in that thread.

@asterite
Copy link
Member

Yes, I'd like that too. We have fork because we had it before we went with non-blocking IO and fibers like Go. Now that did that, it's probably time to remove fork (which also doesn't exist in Windows...)

@asterite
Copy link
Member

Another reason: the experiments on trying to bring multithread support to Crystal were difficult to achieve, and getting fork to work was one of the reasons (not the reason, but one of them).

@98-f355-f1
Copy link

98-f355-f1 commented Jul 21, 2018

I used your example and ran it... got a similar input... I have scanned over the source code to see what's going on.
I do kind of see what you are suggesting, and how it's possibly happening. I agree with the fact that the implementation of fork needs to be improved, but I don't think that we have to throw it away either.

I tweaked your example and ran a comparable in Ruby for expected output. I am fearful of runaway processes, so I will always use a wait.

`
chan = Channel(Nil).new
spawn do
10.times do |i|
sleep rand
puts "line #{i}"
end
chan.send(nil)
end

t = Process.fork do
sleep 1
end

chan.receive
t.wait
`

I didn't quite the same output:

line 0 line 0 line 1 line 1 line 2 line 3 line 4 line 5 line 6 line 7 line 8 line 9
Do you think that the differences my be OS related?

These are going to be fun bugs to eradicate... :P

@asterite I think that the lack of windows operability shouldn't mean that we need to discard the whole thing. Now that Microsoft is going into the deepend on interoperability with Linux... these functions will be able to be accessed by MS users. There is always spawn or other ways to do it also for windows users.

@forksaber
Copy link
Author

Ah, the output was from 5.times and not 10.times. Edited my original post to reflect that. Also added process.wait.
@98-f355-f1 You are only getting line 0 twice because your forked process is only sleeping for 1 sec. Increase it to see similar output.

I think the only way to improve fork while keeping it would be implement forkone semantics but then we'll have to worry about mutexes and deadlocks. I am not sure if that will be considered an improvement.
So removing it is the safest choice.

see
https://thorstenball.com/blog/2014/10/13/why-threads-cant-fork/
https://news.ycombinator.com/item?id=8449164

@NIFR91
Copy link

NIFR91 commented Jul 21, 2018

I think fork should be kept if possible, maybe as suggested by enforcing fork and exec pattern.

@jhass
Copy link
Member

jhass commented Jul 21, 2018

Random idea: can we allow it only before any user fibers, or generally any fibers not explicitly marked as safe for fork, are created?

@ysbaddaden
Copy link
Contributor

ysbaddaden commented Jul 21, 2018

I'm leaning towards both positions (keep or remove) but I still lean a bit on the keep it side.

Yes, fork is dangerous, it can lead to disasters if invoked at whatever time; you should never have to use it; except for some rare edge cases where it makes sense on a POSIX system in a controlled environment (e.g. zero downtime binary reloads) without requiring C.

Just like daemonizing, I believe we should have a proper fork handling (that fixes a few bolts, like reseeding rand), mark it as unsafe and document it as very dangerous. Maybe remove the global ::fork and keep only Process.fork. I think it's better than having users just call LibC.fork manually (never do that).

I think it's possible to keep fork with a multithreaded event loop. The GC must signal all threads to stop before it can start collecting memory, fork must do the same (unless it's a fork/exec).

@kostya
Copy link
Contributor

kostya commented Jul 21, 2018

fork is need for example here: https://github.com/kostya/curl-downloader, i not find any efficient ways except fork (i tried scheduler mapper to events, threads, curl multi). just write some danger comment for all who wants to use fork.

@98-f355-f1
Copy link

@forksaber you wrote above "PS: I've been implementing a supervisor clone in crystal which gave me an opportunity to look deeper into Crystal's internals." Could you explain how you did this?

@forksaber on the second note, I got similar type erratic behavior also, but I am impatient so I wasn't going to wait 14 min for the sleep LOL. I tried it at different times, but curiously when I rewrote with the wait, it sort of straightened up the behavior.

I guess that it seems that most folks want to keep the fork, perhaps remove the exec, and put in strong warning, use at your own risk, explaining some possible side effects.

@RX14
Copy link
Contributor

RX14 commented Jul 21, 2018

I agree with the "keep fork as-is (but fix as many bugs as we can) but mark it as super dangerous" idea. We should provide a "sanitized" safe fork that works before fibers have been spawned too.

@forksaber
Copy link
Author

@98-f355-f1 You can check out my supervisor implementation here https://github.com/forksaber/supervisor
Almost feature complete but not much in the way of documentation and tests.

@straight-shoota
Copy link
Member

I think we should remove plain Process.fork from the stdlib.

fork is not supported on win32 at all. That makes it a very platform-specific feature which we would like to avoid in stdlib.
On other platforms, there are difficulties with fibers and it's entirely unsupported with multithreading enabled. AFAIK this can't work reliably ever. In a forked process only one thread continues which makes it error-prone. So with MT moving forward, future use cases look very slim.

This answer on SO explains the issues with fork in Golang, which should apply exactly to Crystal as well: https://stackoverflow.com/a/28371586/7401689

As a replacement, fork could be made available in a platform-specific shard which can be added as a dependency for the few use cases where you would really want to use bare fork behaviour.

fork immediately followed by exec would obviously still be required as implementation for Process.exec on POSIX platforms.

@j8r
Copy link
Contributor

j8r commented Nov 26, 2019

fork can be kept under a platform-specific module. This modularization is common in other languages.

@jan-zajic
Copy link
Contributor

I think we can keep platform depended apis along with their logical modules but mark them as so. If someone uses them he should know about consequences (i.e non-portability). See my proposal about this concept - #8524.

@ysbaddaden
Copy link
Contributor

ysbaddaden commented Nov 28, 2019

I finally changed my mind on fork. Keeping support would require a lot of work and lots of bugs to fix, especially with MT (we already had a bunch with a single thread) but... for what purposes?

  • Daemonizing a process? There are many tools specialized to spawn, monitor and log processes starting from init systems themselves nowadays. They'll do a better job, and lets users use whatever they prefer, rather than each application re-doing everything (that probably won't be used because users will use external tools).
  • Sharing a server socket? This is a legit use-case (and performance easily beat MT), but we can create the socket with SO_REUSEPORT and just start multiple processes.
  • Take advantage of multiple cores, then communicating in some ways (e.g. through sockets)? That's what MT is for! Otherwise we can use an IPC protocol (e.g. ZMQ, AMQP server) and start multiple processes, or plain UNIX sockets.

Let's not bother.

@RX14
Copy link
Contributor

RX14 commented Dec 4, 2019

The only concevable use of fork would be to insert safe privilege-dropping commands like a chroot or setrlimit between the fork and the exec. This would be useful for a container scheduler for example.

I would support an unsafe non-documented API to make these happen if someone was interested in the capability, but otherwise i'm fully in support of pulling fork.

@straight-shoota
Copy link
Member

I would support an unsafe non-documented API to make these happen if someone was interested in the capability, but otherwise i'm fully in support of pulling fork.

That would certainly be better placed in a documented API of a dedicated shard.

@RX14
Copy link
Contributor

RX14 commented Dec 6, 2019

@straight-shoota well Process.exec needs to be part of the stdlib, and it can't be obvious or documented how to use a post-fork exec callback since it's very unsafe. I don't see how this could be a shard.

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Projects
None yet
Development

Successfully merging a pull request may close this issue.