Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Shall we switch back to fork()/exec() ? #468

Closed
siteshwar opened this issue Apr 11, 2018 · 6 comments
Closed

Shall we switch back to fork()/exec() ? #468

siteshwar opened this issue Apr 11, 2018 · 6 comments

Comments

@siteshwar
Copy link
Contributor

Forking a new process with posix_spawn() can result in a race condition while setting terminal foreground process group. It was discussed earlier in this thread. This is dgk's response from the thread:

ksh93 uses posix_spawn() rather than fork/exec to run a simple command for
performance reasons.  However, posix_spawnv() doesn't have a way to
set the terminal group before the exec().  The parent process sets the
terminal group which leads to the race.

There is a simple change you can make to a program to guarentee that
the terminal will get set at the beginning of the program.
You can add two lines.  One line is
        #include        <termios.h>

and for the other line  
                tcsetpgrp(2,getpgrp());
Alternatively, you can write a program that does these two lines and
then execs the original program.

This problem has been seen by multiple users and reported in Red Hat bugzilla 1295563. Only reliable way to fix this problem is to switch back to fork()/exec(). I propose that we should enable _AST_no_spawnveg flag by default.

@siteshwar
Copy link
Contributor Author

posix_spawn() is used due to performance reasons. We should analyze how much performance impact we will have if we stop using it. We can enable _AST_no_spawnveg as an experiment and see if it causes performance degradation or any other regressions.

@siteshwar siteshwar added the bug label Apr 11, 2018
@kernigh
Copy link
Contributor

kernigh commented Apr 11, 2018

I reproduced this bug by adding a sleep(1) so ksh loses the race with the child.

diff --git a/src/cmd/ksh93/sh/xec.c b/src/cmd/ksh93/sh/xec.c
index 9d037351..4574ce6e 100644
--- a/src/cmd/ksh93/sh/xec.c
+++ b/src/cmd/ksh93/sh/xec.c
@@ -3358,6 +3358,7 @@ static pid_t sh_ntfork(Shell_t *shp, const Shnode_t *t, char *argv[], int *jobid
         if (grp == 1) job.curpgid = spawnpid;
 #ifdef SIGTSTP
         if (grp > 0 && !(otype & FAMP)) {
+            sleep(1);
             while (tcsetpgrp(job.fd, job.curpgid) < 0 && job.curpgid != spawnpid) {
                 job.curpgid = spawnpid;
             }

Then I entered this Python command in an interactive ksh:

$ python3 -c 'import os; print(os.getpgrp(), os.tcgetpgrp(0))'
32695 98273

The bug is that the two numbers are different: ksh lost the race to python, so python did tcgetpgrp() before ksh did tcsetpgrp(). If I enter the same command in some other shell, the two numbers are equal.

@krader1961 krader1961 added the RFC label Apr 12, 2018
@krader1961
Copy link
Contributor

Below is the comment I made in issue #34 where posix_spawn was mentioned. I'll also add that the fish shell has had problems like the one described above from the use of posix_spawn. And, as with the ksh code, supporting both fork and posix_spawn greatly complicated the fish code. Despite the fact that quick and dirty benchmarking suggested that, for fish at least, posix_spawn provided very little benefit I was unable to convince them to remove it.

BTW, the intended use case for posix_spawn() is spawning "helper processes" from processes with a large virtual address space (e.,g., a server like a SQL RDBMS), not starting new process groups by a CLI shell. Notably it doesn't provide the equivalent of setsid() and tcsetpgrp(). Which makes it a poor choice for shells like ksh. It's really meant to provide a cheap way for potentially large programs to run external commands without incurring the cost of duplicating its virtual memory structures (e.g., the process page tables).

While it is more efficient than fork() when the virtual process size is large that will not normally be an issue for a ksh process. And we still have to support fork() because posix_spawn() isn't available on all systems we want to support or is broken (e.g., on Cygwin). Also, note that on Linux it's actually implemented as a function not a syscall. On Linux we should be using clone(). And on other systems (e.g., BSD) we should probably be using vfork().

See https://bugzilla.redhat.com/show_bug.cgi?id=1295563 for another example of how posix_spawn() causes problems.

There was a long discussion about using posix_spawn() in the Go language: golang/go#5838. It looks like ultimately they chose to use the Linux clone() function.

@krader1961
Copy link
Contributor

I propose that we should enable _AST_no_spawnveg flag by default.

I agree. That should be the default build config. Anyone wanting to risk flakey behavior because they think the performance boost is worth it is welcome to enable posix_spawn support for their build. But it's more important that the shell behave correctly and predictably than that it run a tiny percentage faster. And it should be easy to do some microbenchmarks with and without posix_spawn to quantify the difference. I'll bet that unless you do something like cache hundreds of megabytes of data in ksh vars the performance will be close enough not to matter.

siteshwar added a commit that referenced this issue Apr 12, 2018
spawnvex family of function were introduced after the last stable
release as an improvement over spawnveg. They were being compiled even
when _AST_no_spawnveg macro was defined. This was causing test cases to
fail if ksh is compiled with '-D_AST_no_spawnveg=1' flag. This commit
makes compilation of these functions conditional.

Related: #468
@DavidMorano
Copy link

@krader1961

Please make FORK/EXEC the compilation default. There are enough flakey problems already without having to worry about whether a new process is in the proper process group or whether it is in the foreground process group for TTY purposes. Enough problems are enough, at least for now (I would think).

Also, from a performance perspective, KSH will always be a relatively tiny program (in memory size) compared with most any other real programs running on the system now-a-days. A modern fork of KSH will be one of the cheapest things that will ever happen on a modern UNIX box!

Thank you.

siteshwar added a commit that referenced this issue Apr 13, 2018
Using posix_spawn() may result in a race condition while setting
terminal foreground process group. Copying dgk's response from related
discussion in ast-developers mailing list:

"
ksh93 uses posix_spawn() rather than fork/exec to run a simple command
for performance reasons.  However, posix_spawnv() doesn't have a way to
set the terminal group before the exec().  The parent process sets the
terminal group which leads to the race.

There is a simple change you can make to a program to guarentee that
the terminal will get set at the beginning of the program.
You can add two lines.  One line is
        #include        <termios.h>

and for the other line
        tcsetpgrp(2,getpgrp());

Alternatively, you can write a program that does these two lines and then
execs the original program.
"

Only way to reliably fix this problem in ksh is to switch back to fork()/
exec().

If there are any performance regressions or bugs caused by this change,
you can switch to posix_spawn() by setting '-D_AST_no_spawnveg=0'.

Related: #468
Related: rhbz#1160482
siteshwar added a commit that referenced this issue Apr 13, 2018
spawnvex family of function were introduced after the last stable
release as an improvement over spawnveg. They were being compiled even
when _AST_no_spawnveg macro was defined. This was causing test cases to
fail if ksh is compiled with '-D_AST_no_spawnveg=1' flag. This commit
makes compilation of these functions conditional.

Related: #468
@siteshwar
Copy link
Contributor Author

I am closing this issue as we have already switched to fork()/exec() through #470. Any side effects of this change should go as a separate bug.

JohnoKing added a commit to JohnoKing/ksh that referenced this issue Jul 20, 2020
WARNING: This is an experimental fix for the race conditions
         reported at ksh93#79 and att#468. There may
         be bugs that have not yet been caught.

This patch replaces usage of posix_spawn with vfork, since the
former is prone to race conditions.

src/lib/libast/features/lib,
src/lib/libast/comp/spawnveg.c:
- Remove the posix_spawn implementation of spawnveg.
- Set the terminal process group between fork/vfork and execve
  to fix a race condition.

src/cmd/ksh93/sh/xec.c:
- Set the terminal process group on error to fix a vfork(2) lockup
  and crash with fork(2); both occured after the to the spawnveg fix.
JohnoKing added a commit to JohnoKing/ksh that referenced this issue Jul 20, 2020
WARNING: This is an experimental fix for the race conditions
         reported at ksh93#79 and att#468. There may
         be bugs that have not yet been caught.

This patch replaces usage of posix_spawn with vfork, since the
former is prone to race conditions.

src/lib/libast/features/lib,
src/lib/libast/comp/spawnveg.c:
- Remove the posix_spawn implementation of spawnveg.
- Set the terminal process group between fork/vfork and execve
  to fix a race condition.

src/cmd/ksh93/sh/xec.c:
- Set the terminal process group on error to fix a vfork(2) lockup
  and crash with fork(2); both occured after the spawnveg fix.
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Projects
None yet
Development

No branches or pull requests

4 participants