New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Fix: fork and signal child handlers #6426

Merged
merged 5 commits into from Sep 6, 2018

Conversation

Projects
None yet
9 participants
@ysbaddaden
Member

ysbaddaden commented Jul 22, 2018

Fixes an issue where a parent process could receive signals directed to the child or vice-versa.

fixes #6416

@RX14

RX14 approved these changes Jul 22, 2018

@RX14

This comment has been minimized.

Member

RX14 commented Jul 22, 2018

Thanks for doing the debugging! It really shows how dangerous fork is with these subtle ordering and error handling issues.

@forksaber

This comment has been minimized.

forksaber commented Jul 23, 2018

I am not sure if this should be handled as part of this PR, but I think there still exists a subtle race condition in the signal handler implementation. Consider the following order of events:

Parent  : fork()
Child   : spawns
Child   : receives a signal
Child   : writes the signal to parent's pipe
Child   : Signal.after_fork is called
Parent  : processes the signal meant for child (**BUG**)

I haven't actually tried writing down a fix but something like this should work:

#Pseudo code

# block all the signals for the current thread
pthread_sigmask(block all signals) 
pid =LibC.fork
if pid > 0
  #Parent
  pthread_sigmask(unblock all signals)
else
 #child
  if run_hooks       # (Process.fork)
    Signal.after_fork
    pthread_sigmask(unblock all signals)
    run_remaining_hooks
  else                    #(Process.exec)
    # reset all signal handlers  to SIG_DFL or  call Signal.after_fork  
    (signals.each { |signal| signal.reset } ) OR (Signal.after_fork) 
    pthread_sigmask(unblock all signals)
    execvp
  end
end
   

References:
https://www.redhat.com/archives/libvir-list/2008-August/msg00303.html
https://github.com/libvirt/libvirt/blob/e83da1990cfb75ae9a83559b6b702bb9fc4bc61e/src/util/vircommand.c#L297

@jhass

jhass approved these changes Jul 24, 2018

@ysbaddaden

This comment has been minimized.

Member

ysbaddaden commented Jul 24, 2018

You're right again!

I added a commit to disable signals then reenable them after fork (parent and child). Now we should be fine 🤞

I chose to continue to not reset signal handlers in the child process. Either developers don't care about signals and we'll keep the default handlers (i.e. SIGSEGV, SIGCHLD) or she cares about them and she's responsible for resetting them if needed.

@forksaber

This comment has been minimized.

forksaber commented Jul 25, 2018

Thank you @ysbaddaden for working on this PR. Everything is fine except that the race condition still exists in fork+exec scenario as signals are not blocked before forking when run_hooks is false. So the following changes should make this patch perfect:

module Crystal::Signal
  def self.clear_signal_handlers
    @@handlers.keys.each { |sig| sig.reset }
  end
end
-if run_hooks
-  LibC.sigfillset(pointerof(newmask))
-  ret = LibC.pthread_sigmask(LibC::SIG_SETMASK, pointerof(newmask), pointerof(oldmask))
-  raise Errno.new("pthread_sigmask") unless ret == 0
-end
+LibC.sigfillset(pointerof(newmask))
+ret = LibC.pthread_sigmask(LibC::SIG_SETMASK, pointerof(newmask), pointerof(oldmask))
+raise Errno.new("pthread_sigmask") unless ret == 0
 
 pid = LibC.fork
 errno = Errno.value
 
 case pid
 when 0
   # child:
   pid = nil
   if run_hooks
     Process.after_fork_child_callbacks.each(&.call)
-    LibC.pthread_sigmask(LibC::SIG_SETMASK, pointerof(oldmask), nil)
+  else
+    # exec scenario
+    Crystal::Signal.clear_signal_handlers
   end
+  LibC.pthread_sigmask(LibC::SIG_SETMASK, pointerof(oldmask), nil)
 when -1
   # error:
-  LibC.pthread_sigmask(LibC::SIG_SETMASK, pointerof(oldmask), nil) if run_hooks
+  LibC.pthread_sigmask(LibC::SIG_SETMASK, pointerof(oldmask), nil)
   raise Errno.new("fork", errno)
 else
   # parent:
-  LibC.pthread_sigmask(LibC::SIG_SETMASK, pointerof(oldmask), nil) if run_hooks
+  LibC.pthread_sigmask(LibC::SIG_SETMASK, pointerof(oldmask), nil)
 end

Someone experienced a similar in go:
golang/go@df0892c

PS: Ideally, the method clear_signal_handlers should be implemented using an array of atomic flags for registered handlers as when multiple threads are present the @@handlers hash could be in an inconsistent state and might cause segfaults. But that can be tackled in a later PR.

# Psuedo code for thread-safe  clear_signal_handlers
module Crystal::Signal
  @@handled = StaticArray(AtomicFlag, 32).new(AtomicFlag.new(0_u8))
  
  def self.trap(signal, handler) : Nil
    @@handled[signal].set(1_u8)
    previous_def
  end
       
  def self.clear_signal_handlers
    @@handled.each_with_index do |flag, signal|
      next if flag.get == 0_u8
      signal.reset
     end
   end
end
@bararchy

This comment has been minimized.

Contributor

bararchy commented Jul 26, 2018

@RX14 or @jhass
Two approves here, LGTM?

@Sija

This comment has been minimized.

Contributor

Sija commented Jul 26, 2018

@bararchy Hold your horses, let's wait for conclusions regarding newest @forksaber's comment.

@bararchy

This comment has been minimized.

Contributor

bararchy commented Jul 26, 2018

@Sija sure, I was under the impression this has already been reviewed.

@Sija

This comment has been minimized.

Contributor

Sija commented Jul 26, 2018

Well, it was. Before that comment ;)

@ysbaddaden

This comment has been minimized.

Member

ysbaddaden commented Jul 27, 2018

Yes, @forksaber is right again! I'm fixing this.

if run_hooks
Process.after_fork_child_callbacks.each(&.call)
LibC.pthread_sigmask(LibC::SIG_SETMASK, pointerof(oldmask), nil)
end

This comment has been minimized.

@ysbaddaden

ysbaddaden Jul 27, 2018

Member

In chose to not reenable signals in a fork/exec situation. I just leave the handlers and mask in place, and let execve(2) override that for the new process.

This comment has been minimized.

@forksaber

forksaber Jul 27, 2018

I think if you don't reenable the signals, they will remain blocked across the exec. So, it is necessary to reenable the signals and reset them.

The new process shall inherit at least the following attributes from 
the calling process image:

Nice value (see nice())
semadj values (see semop())
Process ID
Parent process ID
Process group ID
Session membership
Real user ID
Real group ID
Supplementary group IDs
Time left until an alarm clock signal (see alarm())
Current working directory
Root directory
File mode creation mask (see umask())
File size limit (see ulimit())
Process signal mask (see sigprocmask())
Pending signal (see sigpending())
tms_utime, tms_stime, tms_cutime, and tms_cstime (see times())
Resource limits
Controlling terminal
Interval timers

Reference:
http://pubs.opengroup.org/onlinepubs/009695399/functions/exec.html

@ysbaddaden

This comment has been minimized.

Member

ysbaddaden commented Jul 27, 2018

Sigh... Okay.

@ysbaddaden ysbaddaden force-pushed the ysbaddaden:fix-fork-and-signal-child-handlers branch from 87e6430 to 5d101bf Jul 27, 2018

@forksaber

This comment has been minimized.

forksaber commented Jul 28, 2018

Sorry for being such an annoying person/reviewer :(
I can't help but point out that the race condition(fork/exec scenario) has shifted from

# run_hooks: false
pid = LibC.fork
case pid
when 0
  # child
  # RACE here
  exec
end

to

# run_hooks: false
block_signals
pid = LibC.fork
case pid
when 0
  # child
  unblock_signals
  # RACE here : since parent's pipe exists, any signal 
  # delivered here will end up in parent process
  exec
end

A crucial step is missing to fix this race condition is to reset the signal handlers to SIG_DFL before unblocking the signals:

case pid
when 0
  # child
+  Crystal::Signal.reset_signal_handlers!
  unblock_signals
  # RACE here : since parent's pipe exists, any signal 
  # delivered here will end up in parent process
  exec
end

An attempt to fix this:

module Crystal::Signal
  def self.mutex
    @@mutex
  end

  # safe to invoke only in post-fork/pre-exec state
  def self.reset_signal_handlers!
    @@handlers.keys.each { |sig| sig.reset }
  end
end
 protected def self.fork_internal(run_hooks : Bool = true)
   newmask = uninitialized LibC::SigsetT
   oldmask = uninitialized LibC::SigsetT
 
   LibC.sigfillset(pointerof(newmask))
   ret = LibC.pthread_sigmask(LibC::SIG_SETMASK, pointerof(newmask), pointerof(oldmask))
   raise Errno.new("Failed to disable signals") unless ret == 0
 
+  # lock signal mutex before forking to ensure @@handlers is in
+  # a consistent state
+  mutex = Crystal::Signal.mutex
+  mutex.lock if ! run_hooks
+
   case pid = LibC.fork
   when 0
     # child:
     pid = nil
     if run_hooks
       Process.after_fork_child_callbacks.each(&.call)
       LibC.pthread_sigmask(LibC::SIG_SETMASK, pointerof(oldmask), nil)
     else
+      # no need to unlock mutex as we won't be registering
+      # any custom signal handlers before exec.
+      # Also unlocking mutex not safe here as scheduler is not reinitialized
+      Crystal::Signal.reset_signal_handlers!
       # sigmask is inherited: must reset it
       LibC.sigemptyset(pointerof(newmask))
       LibC.pthread_sigmask(LibC::SIG_SETMASK, pointerof(newmask), nil)
     end
   when -1
+    # unlock mutex immediately after fork in parent
+    mutex.unlock if ! run_hooks
     # error:
     errno = Errno.value
     LibC.pthread_sigmask(LibC::SIG_SETMASK, pointerof(oldmask), nil)
     raise Errno.new("fork", errno)
   else
+    # unlock mutex immediately after fork in parent
+    mutex.unlock if ! run_hooks
     # parent:
     LibC.pthread_sigmask(LibC::SIG_SETMASK, pointerof(oldmask), nil)
   end
 
   pid
 end
@RX14

This comment has been minimized.

Member

RX14 commented Jul 28, 2018

@forksaber honestly we really appreciate your comments! It's much nicer to sort this out now instead of debugging an intermittent race condition in production years later.

@ysbaddaden

This comment has been minimized.

Member

ysbaddaden commented Jul 29, 2018

Yes, you're right again! Sigh me, too focused on the child state than on the child -> parent leak.

@ysbaddaden ysbaddaden force-pushed the ysbaddaden:fix-fork-and-signal-child-handlers branch from 5d101bf to a146666 Jul 29, 2018

@ysbaddaden

This comment has been minimized.

Member

ysbaddaden commented Jul 29, 2018

Now I hope it's fine 🤞

@forksaber

This comment has been minimized.

forksaber commented Jul 29, 2018

The patch LGTM 👍 when there exists only a single thread (which is currently the case).

However, consider this multi-threaded case

Thread-1 : locks Crystal::Signal mutex

Thread-1 : is in the middle of modifying @@handlers hash....execution state could be 
           inside any method of the Hash implementation ( hash's internal state might 
           be inconsistent depending on the implementation)

Thread-2 : fork

Child : now accesses said inconsistent @@handlers hash possibly 
        causing segfault or undefined behaviour

Thread-1 : finishes modifying @@handlers

So, mutex.lock is needed before fork to ensure that no other thread modifies the @@handlers hash while the current thread forks.

I can understand that you may not want to include the mutex.lock since Crystal is currently single threaded.
An alternative is to add some TODO comment related to this, so we can revisit this once thread-support lands.

@ysbaddaden ysbaddaden force-pushed the ysbaddaden:fix-fork-and-signal-child-handlers branch 2 times, most recently from 81d93d5 to 9aa18c2 Jul 30, 2018

@ysbaddaden

This comment has been minimized.

Member

ysbaddaden commented Jul 30, 2018

Let's reset all signals and be done with it now.

@forksaber

This comment has been minimized.

forksaber commented Jul 30, 2018

@ysbaddaden Agreed

@forksaber

This comment has been minimized.

forksaber commented Jul 30, 2018

Assumption: An average crystal program defines 3 custom signal handlers

To make this decision easier, I am adding a benchmark that compares the following approaches:

  • hash: @@handlers.keys is used to find signals and reset them
  • reset all: reset all signal handlers irrespective of whether they have non default values
  • array of atomic flags: used alongside @@handlers hash to register and query handler state change
require "benchmark"
class AtomicFlag
  @flag = Atomic(UInt8).new(0_u8)

  def get : Bool
    @flag.get == 1_u8
  end

  def set(val : Bool)
    if val
      @flag.set(1_u8)
    else
      @flag.set(0_u8)
    end
  end
end

arr = StaticArray(AtomicFlag, 32).new { AtomicFlag.new }
handlers = Hash(Int32, Proc(Signal, Nil)).new

n = 3
(1..n).each do |i|
  arr[i].set true
  handlers[i] = ->(sig : Signal) { nil }
end

Benchmark.ips do |x|
  x.report("reset changed - hash  ") do
    handlers.each_key do |i|
      LibC.signal(i, LibC::SIG_DFL)
    end
  end

  x.report("reset changed - atomic") do
    (1..31).each do |i|
      flag = arr[i-1].get
      LibC.signal(i, LibC::SIG_DFL) if flag
    end
  end

  x.report("reset all             ") do
    (1..31).each { |i| LibC.signal(i, LibC::SIG_DFL) }
  end
end

Output

reset changed - hash     1.47M (678.12ns) (± 0.42%)  0 B/op        fastest
reset changed - atomic   1.42M (705.08ns) (± 0.29%)  0 B/op   1.04× slower
reset all              144.33k (  6.93µs) (± 0.42%)  0 B/op  10.22× slower

I am fine with either solution....using atomics to reset select signals or resetting all the signals.

@ysbaddaden ysbaddaden force-pushed the ysbaddaden:fix-fork-and-signal-child-handlers branch from 1905a26 to 17c7fee Aug 1, 2018

@ysbaddaden

This comment has been minimized.

Member

ysbaddaden commented Aug 1, 2018

Thanks for the benchmark!

I maintain a sigset_t, because signals are tricky: they usually range 1-32, but there ain't no guarantee about that. In addition the POSIX functions are fast enough. With this patch it's only 1.07× slower than the unsafe @@handlers.each_key, so Process.run won't see a significant performance loss.

The set is only mutated within a mutex so mutation is thread safe. It's also modified before the actual trap and after the actual reset. I assume it's fine to use without the mutex in fork/exec. It's just a fixed set of longs, and either a thread is adding a signal to the set and we don't care (the signal isn't trapped, yet) or it's being removed and we'll just have a useless reset.

All cases look fine to me. I hope you agree @forksaber 🤞

@ysbaddaden ysbaddaden force-pushed the ysbaddaden:fix-fork-and-signal-child-handlers branch from 17c7fee to 7d17917 Aug 1, 2018

@bcardiff bcardiff added this to the 0.26.0 milestone Aug 4, 2018

@bcardiff

This comment has been minimized.

Member

bcardiff commented Aug 4, 2018

@ysbaddaden unless the last message from @forksaber push you to do something else in this PR we are good to go and include this PR as is.

@forksaber you probably know this already, but you input was of great great value. Thanks! 🙇

@ysbaddaden

This comment has been minimized.

Member

ysbaddaden commented Aug 5, 2018

It condos my understanding: Go is using an atomic to avoid a mutex that would block.

The race is quite edge case: it requires that a thread would modify a signal while another is exactly calling fork.

I naively assume that a program does setup a few signals at startup and won't change them much afterwards.

Maybe we can just close the pipe in fork/exec and add a "write byte unless closed" in the signal handler. That would avoid passing child signal to parent and exec will reset any signals we missed to SIG_DFL anyway.

@forksaber

This comment has been minimized.

forksaber commented Aug 5, 2018

Nice idea @ysbaddaden
Building on your idea....something like this should work great.

module Crystal::Signal
  @@after_fork_before_exec = false

  def self.after_fork_before_exec
    @@after_fork_before_exec = true
    ::Signal.each do |signal|
      LibC.signal(signal, LibC::SIG_DFL) if signal.set?
    end
  end

  def self.trap(signal, handler) : Nil
    @@mutex.synchronize do
      unless @@handlers[signal]?
        signal.set_add
####
        LibC.signal(signal.value, ->(value : Int32) {
          if @@after_fork_before_exec
             # signal handler was not reset due to the above race...lets reset it
             LibC.signal(value, LibC::SIG_DFL)
             # re raise the signal so that its handled with SIG_DFL semantics
             LibC.raise(value)
          else
             writer.write_bytes(value)
          end
        })
####
      end
      @@handlers[signal] = handler
    end
  end

end

We can even double down on this idea and forget about maintaining a sigset. Then, every signal handled in after_fork_before_exec state will be reset dynamically. 😁

Thanks @bcardiff, I appreciate it 😃

@sdogruyol

Some really great improvements here 🌮 Thank you @ysbaddaden @forksaber 👍

@bcardiff

This comment has been minimized.

Member

bcardiff commented Aug 6, 2018

@ysbaddaden given that we should release soon, maybe either we merge as is this pr and defer future work, or we release 0.26 without this. Your call.

@bcardiff bcardiff removed this from the 0.26.0 milestone Aug 8, 2018

@ysbaddaden ysbaddaden force-pushed the ysbaddaden:fix-fork-and-signal-child-handlers branch from 7d17917 to fa84884 Aug 8, 2018

@ysbaddaden

This comment has been minimized.

Member

ysbaddaden commented Aug 8, 2018

I pushed a commit to close the pipe anyway, and to check if it's closed when writing the signal. I'm fixing I fixed the conflict.

@ysbaddaden ysbaddaden force-pushed the ysbaddaden:fix-fork-and-signal-child-handlers branch from fa84884 to 8184d6e Aug 8, 2018

@ysbaddaden

This comment has been minimized.

Member

ysbaddaden commented Aug 8, 2018

I think it's okay for now. An atomic sigset would allow to avoid a mutex, but that can wait for multithreaded fibers. We can probably avoid more race conditions, but that's really an edge case, and we figure something better someday if it proves to really be an issue —I doubt an app would trap/reset signals in loops.

@RX14

RX14 approved these changes Aug 10, 2018

@ghastmask

This comment has been minimized.

ghastmask commented Aug 21, 2018

Hi it sounds like you guys already got everything figured out but I wanted to point you to the facebook folly Subprocess c++ implementation which is pretty well documented in case you hadn't seen it:

https://github.com/facebook/folly/blob/master/folly/Subprocess.cpp

@RX14 RX14 added this to the 0.26.1 milestone Aug 21, 2018

@RX14 RX14 requested a review from bcardiff Aug 21, 2018

@bcardiff

This comment has been minimized.

Member

bcardiff commented Aug 22, 2018

I would rather include this in 0.27. It is a bit sensible and I would prefer to minimize the risk of needing a 0.26.2 for this.

@bcardiff bcardiff modified the milestones: 0.26.1, 0.27.0 Aug 22, 2018

@bcardiff

This comment has been minimized.

Member

bcardiff commented Sep 6, 2018

@ysbaddaden would you rebase on master so we merge this without squashing?

ysbaddaden added some commits Jul 22, 2018

Fix: always close parent signal pipe after fork
Fixes read operation after fork, forcing the signal handler fiber to
retry on the new pipe.

Also avoids the child from receiving a parent signal from the pipe.

refs #6416
Fix: always reset signal handlers before scheduler reset on fork
This will avoid a future issue where a thread could process signal
in parallel to the signal resets.
Fix: disable/restore signals when forking process
Always block signal reception before forking, even in fork/exec
cases, so the child will never receive a signal and wrongly forward
it to the parent process.

Always restores the sigmask after fork. During fork/exec the signal
pipe isn't replaced but signal handlers are resetted to `SIG_DFL` so
they won't be received anymore (and won't be forwarded to the
parent) before we restore the sigmask (inherited after exec).
Only reset trapped signals
Improves fork/exec executation by limiting the number of signals to
reset to SIG_DFL to the signals that have actually been trapped. We
now reset only 1 signal by default, instead of 32.

@ysbaddaden ysbaddaden force-pushed the ysbaddaden:fix-fork-and-signal-child-handlers branch from 8184d6e to e5b401f Sep 6, 2018

@ysbaddaden

This comment has been minimized.

Member

ysbaddaden commented Sep 6, 2018

I just did. There was no conflicts, thought.

@bcardiff bcardiff merged commit a350db8 into crystal-lang:master Sep 6, 2018

4 checks passed

ci/circleci: test_darwin Your tests passed on CircleCI!
Details
ci/circleci: test_linux Your tests passed on CircleCI!
Details
ci/circleci: test_linux32 Your tests passed on CircleCI!
Details
continuous-integration/travis-ci/pr The Travis CI build passed
Details

@ysbaddaden ysbaddaden deleted the ysbaddaden:fix-fork-and-signal-child-handlers branch Sep 7, 2018

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment