Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

beta branch: Building ksh93 fails at libast/comp/tmpnam.c #2

Closed
jens-maus opened this issue Mar 8, 2016 · 3 comments
Closed

beta branch: Building ksh93 fails at libast/comp/tmpnam.c #2

jens-maus opened this issue Mar 8, 2016 · 3 comments
Labels

Comments

@jens-maus
Copy link

When trying to build the beta branch under Ubuntu/Linux 15.10 using bin/package make the build fails with the following error message at tmpnam.c:

+ cc -D_BLD_DLL -fPIC -D_BLD_ast -O -I. -I/home/maus/projekte/ksh/src/lib/libast -Icomp -I/home/maus/projekte/ksh/src/lib/libast/comp -Iinclude -I/home/maus/projekte/ksh/src/lib/libast/include -Istd -I/home/maus/projekte/ksh/src/lib/libast/std -D_PACKAGE_ast -c /home/maus/projekte/ksh/src/lib/libast/comp/tmpnam.c
/home/maus/projekte/ksh/src/lib/libast/comp/tmpnam.c: In function '_ast_tmpnam':
/home/maus/projekte/ksh/src/lib/libast/comp/tmpnam.c:48:14: error: storage size of 'buf' isn't known
  static char buf[L_tmpnam];
              ^
/home/maus/projekte/ksh/src/lib/libast/comp/tmpnam.c:50:39: error: expected expression before ',' token
  return pathtemp(s ? s : buf, L_tmpnam, NiL, "tn", NiL);
                                       ^
mamake [lib/libast]: *** exit code 1 making tmpnam.o

After analyses the L_tmpnam, P_tmpdir and L_ctermid seem to be defined empty as can be seen by looking at arch/linux.i386-64/src/lib/libast/FEATURE/stdio:

#if defined(__STDPP__directive) && defined(__STDPP__initial)
__STDPP__directive pragma pp:initial
#endif
#ifndef P_tmpdir
#define P_tmpdir
#endif
#ifndef L_ctermid
#define L_ctermid
#endif
#ifndef L_tmpnam
#define L_tmpnam
#endif
#if defined(__STDPP__directive) && defined(__STDPP__initial)
__STDPP__directive pragma pp:noinitial
#endif

After manually patching these values to sensible values like the following the build succeeds:

#if defined(__STDPP__directive) && defined(__STDPP__initial)
__STDPP__directive pragma pp:initial
#endif
#ifndef P_tmpdir
#define P_tmpdir "/tmp"
#endif
#ifndef L_ctermid
#define L_ctermid 1024
#endif
#ifndef L_tmpnam
#define L_tmpnam 1024
#endif
#if defined(__STDPP__directive) && defined(__STDPP__initial)
__STDPP__directive pragma pp:noinitial
#endif

After having applied these changes the build succeeds and ksh93 is correctly build.

@jens-maus
Copy link
Author

Please note that here the same issue seems to have been reported to the Debian project:
https://bugs.debian.org/cgi-bin/bugreport.cgi?bug=777930

The same problems seems to have been reported to the SUSE project as well:
https://forums.opensuse.org/showthread.php/508185-GCC-5

Both of these issues seem to be related to GCC5 compilation which might trigger the issue.

@jens-maus
Copy link
Author

After further investigation I think I have found a working patch which fixes the issue. The openSUSE developers seem to have developed a working patch for this. See here:

https://build.opensuse.org/package/view_file/home:Andreas_Schwab:Factory/ksh/cpp.patch?expand=1

So please consider integrating this patch.

@siteshwar
Copy link
Contributor

It has been fixed in beta branch by 477c024

gkamat pushed a commit to gkamat/ast that referenced this issue Apr 28, 2021
ksh segfaults in job_chksave after receiving SIGCHLD

https://bugs.launchpad.net/ubuntu/+source/ksh/+bug/1697501
Eric Desrochers wrote on 2017-06-12:

[Impact]
* The compiler optimization dropped parts from the ksh job
  locking mechanism from the binary code. As a consequence, ksh
  could terminate unexpectedly with a segmentation fault after
  it received the SIGCHLD signal.

[Test Case]

Unfortunately, there is no clear and easy way to reproduce the
segfault.
* But the original reporter of this bug can randomly reproduce
  the problem using an in-house ksh script that only works
  inside his infrastructure as follow : "ksh
  <in-house-script.ksh>" and then once in a while ksh will
  segfault as follow :

(gdb) bt
#0 job_chksave (pid=pid@entry=19003) at /build/ksh-6IEHIC/ksh-93u+20120801/src/cmd/ksh93/sh/jobs.c:1948
att#1 0x00000000004282ab in job_reap (sig=17) at /build/ksh-6IEHIC/ksh-93u+20120801/src/cmd/ksh93/sh/jobs.c:428
att#2 <signal handler called>
...

[Regression Potential]
* Regression risk : low/none expected, the package has been
  highly/intensively tested by a user who run over 18M ksh
  scripts a day on each of their clusters.
[...]
* The fix has been written by RH and has been proven to work for
  them for the last 3 years.
* A test package including the RH fix has been intensively tested
  and verified (pre-SRU) by an affected user with positive
  feedbacks using a reproducer that segfault without the RH
  patch.
* Test package (pre-SRU) feedbacks :
  https://bugs.launchpad.net/ubuntu/xenial/+source/ksh/+bug/1697501/comments/7

[Other Info]
* Details about the RH bug :
  - https://bugzilla.redhat.com/show_bug.cgi?id=1123467
  - https://bugzilla.redhat.com/show_bug.cgi?id=1112306
  - https://access.redhat.com/solutions/1253243
  - http://rhn.redhat.com/errata/RHBA-2014-1015.html
  - ksh.spec
    * Fri Jul 25 2014 Michal Hlavinka <email address hidden> - 20120801-10.8
    * job locking mechanism did not survive compiler optimization (#1123467)
  - patch
    * ksh-20120801-locking.patch

Debian bug:
https://bugs.debian.org/cgi-bin/bugreport.cgi?bug=867181

[Original Description]

# gdb
[New LWP 3882]
Core was generated by `/bin/ksh <KSH_SCRIPT>.ksh'.
Program terminated with signal SIGSEGV, Segmentation fault.
#0 job_chksave (pid=pid@entry=19385) at /build/ksh-6IEHIC/ksh-93u+20120801/src/cmd/ksh93/sh/jobs.c:1948
1948 if(jp->pid==pid)

(gdb) p *jp
Cannot access memory at address 0xb

(gdb) p *jp->pid
Cannot access memory at address 0x13

(gdb) p pid
$2 = 19385

(gdb) p *jpold
$1 = {next = 0xb, pid = -604008960, exitval = 11124}

The struct is corrupted at some point looking at the next,pid and
exitval struct members values which isn't valid data.

# assembly code
=> 0x0000000000427159 <+41>: cmp %edi,0x8(%rdx)

(gdb) p $edi ## pid variable
$1 = 19385

(gdb) p *($rdx + 8) ## jp->pid struct
Cannot access memory at address 0x13
--

ksh is segfaulting because it can't access struct "jp" ($rdx)
thus cannot de-reference the struct member "jp>pid" ($rdx + 8) at
line : src/cmd/ksh93/sh/jobs.c:1948 when looking if jp->pid is
equal to pid ($edi) variable.
gkamat pushed a commit to gkamat/ast that referenced this issue Apr 28, 2021
The code for handling process substitution with redirection was
never being run because IORAW is usually set when IOPROCSUB is
set. This commit fixes the problem by moving the required code
out of the !IORAW if statement. The following command now prints
'good' instead of writing 'ok' to a bizzare file:

$ ksh -c 'echo ok > >(sed s/ok/good/); wait'
good

This commit also fixes a bug that caused the process ID of the
asynchronous process to print when the shell was in interactive
mode. The following command no longer prints a process ID,
behaving like in Bash and zsh:

$ echo >(true)
/dev/fd/5

src/cmd/ksh93/sh/args.c:
 - Temporarily turn off the interactive state while in a process
   substitution to prevent the shell from printing the PID of
   the asynchronous process.

src/cmd/ksh93/sh/io.c:
 - Move the code for process substitution with redirection into
   a separate if statement.

src/cmd/ksh93/tests/io.sh:
 - Add two tests for both process substitution bugs fixed by this
   commit.

src/cmd/ksh93/tests/shtests:
 - Update shtests with a patch from Martijn Dekker to use
   pretty-printing for the output from the times builtin (if it
   is available).

Fixes att#2
gkamat pushed a commit to gkamat/ast that referenced this issue Apr 28, 2021
Hopefully this doesn't introduce new bugs, but it does fix at
least the following:

1. When whence -v/-a found an "undefined" (i.e. autoloadable)
   function in $FPATH, it actually loaded the function as a side
   effect of reporting on its existence (!). Now it only reports.

2. 'whence' will now canonicalise paths properly. Examples:
	$ whence ///usr/lib/../bin//./env
	/usr/bin/env
	$ (cd /; whence -v dev/../usr/bin//./env)
	dev/../usr/bin//./env is /usr/bin/env

3. 'whence' no longer prefixes a spurious double slash when doing
   something like 'cd / && whence bin/echo'. On Cygwin, an initial
   double slash denotes a network server, so this was not just a
   cosmetic problem.

4. 'whence -a' now reports a "tracked alias" (a.k.a. hash table
   entry, i.e. cached $PATH search) even if an actual alias by the
   same name exists. This needed fixing because in fact the hash
   table entry continues to be used when bypassing the alias.
   Aliases and "tracked aliases" are not remotely the same thing;
   confusing nomenclature is not a reason to report wrong results.

5. When using 'hash' or 'alias -t' on a command that is also a
   builtin to force caching a $PATH search for the external
   command, 'whence -a' double-reported the path:
	$ hash printf; whence -a printf
	printf is a shell builtin
	printf is /usr/bin/printf
	printf is a tracked alias for /usr/bin/printf
   This is now fixed so that the second output line is gone.
   Plus, if there were multiple versions of the command on $PATH,
   the tracked alias was reported at the end, which is the wrong
   order. This is also fixed.

src/cmd/ksh93/bltins/whence.c: whence():
- Refactor the do...while loop that handles whence -v/-a for path
  searches in such a way that the code actually makes sense and
  stops looking like higher esotericism. Just doing this fixed att#2,
  att#4 and att#5 above (the latter two before I even noticed them). For
  instance, the path_fullname() call to canonicalise paths was
  already there; it was just never used.
- Remove broken 'notrack' flaggery for deciding whether to report a
  hash table entry a.k.a. "tracked alias"; instead, check the hash
  table (shp->track_tree).

src/cmd/ksh93/sh/path.c:
- path_search(): Re att#3: When prefixing the PWD, first check if
  we're in '/' and if so, don't prefix it; otherwise, adding the
  next slash causes an initial double slash. (Since '/' is the only
  valid single-character absolute path, all we need to do is check
  if the second character pwd[1] is non-null.)
- path_search(): Re att#1: Stop autoloading when called by 'whence':
  * The 'flag==2' check to avoid autoloading a function was
    broken. The flag value is 2 on the first whence() loop
    iteration, but 3 on subsequent ones. Change to 'flag >= 2'.
  * However, this only fixes it if the function file does not have
    the x permission bit, as executable files are handled by
    path_absolute() which unconditionally autoloads functions!
    So, pass on our flag parameter when callling path_absolute().
- path_absolute(): Re att#1: Add flag parameter. Do not autoload
  functions if flag >= 2.

src/cmd/ksh93/include/path.h,
src/cmd/ksh93/bltins/typeset.c,
src/cmd/ksh93/sh/main.c,
src/cmd/ksh93/sh/xec.c:
- Re att#1: Update path_absolute() calls, adding a 0 flag parameter.

src/cmd/ksh93/include/name.h:
- Remove now-unused pathcomp member from union Value. It was
  introduced in 9906535 to allow examining the value of a tracked
  alias. This commit uses nv_getval() instead.

src/cmd/ksh93/tests/builtins.sh,
src/cmd/ksh93/tests/path.sh:
- Add and tweak various related tests.

Fixes: ksh93#84
gkamat pushed a commit to gkamat/ast that referenced this issue Apr 28, 2021
For one Red Hat customer, the following reproducer consistently
crashed, tough I was not able to reproduce it and neither was RH.
However, the crash analysis is sound (see below).

    function dlog
    {
      fc -ln -0
    }
    trap dlog DEBUG
    >/tmp/blah

Original patch: https://src.fedoraproject.org/rpms/ksh/blob/642af4d6/f/ksh-20140801-arraylen.patch

The Red Hat bug thread is closed to the public as it also contains
some correspondence with their customer. But it has an excellent
crash analysis from Thomas Gardner which I'm including here for the
record (the line numbers are for their ksh at the time, not 93u+m).

===begin analysis===
> The creation of an empty file instead of a command that executes
> anything causes the coredump.
[...]
> Here is my analysis on the core that was provided by the customer:
>
> (gdb) bt
> #0  sh_fmtq (string=0x1 <Address 0x1 out of bounds>)
>     at /usr/src/debug/ksh-20120801/src/cmd/ksh93/sh/string.c:340
> att#1  0x0000000000457e96 in out_string (cp=<value optimized out>, c=32,
>     quoted=<value optimized out>, iop=<value optimized out>)
>     at /usr/src/debug/ksh-20120801/src/cmd/ksh93/sh/xec.c:444
> att#2  0x000000000045804c in sh_debug (shp=0x76d180, trap=0x7f2f13a821e0 "dlog",
>     name=<value optimized out>, subscript=<value optimized out>,
>     argv=0x76e070, flags=<value optimized out>)
>     at /usr/src/debug/ksh-20120801/src/cmd/ksh93/sh/xec.c:548
> att#3  0x000000000045a867 in sh_exec (t=0x7f2f13aafad0, flags=4)
>     at /usr/src/debug/ksh-20120801/src/cmd/ksh93/sh/xec.c:1265
> [...need go no further...]
>
> In frame 2, we can see it cycling through your classic
> (char **)argv array like:
>
> 543		while(cp = *argv++)
> 544		{
> 545			if((flags&ARG_EXP) && argv[1]==0)
> 546				out_pattern(iop, cp,' ');
> 547			else
> 548				out_string(iop, cp,' ',n?0: (flags&(ARG_RAW|ARG_NOGLOB))||*argv);
> 549		}
> 550		if(flags&ARG_ASSIGN)
> 551			sfputc(iop,')');
> 552		else if(iop==stkstd)
>
> (we seg-fault after going down the out_string function in line
> 548 up there). The string pointer that points to = 0x1 up in
> frame #0 (sh_fmtq) traces back to the "cp" variable in line 548
> up there. The "argv" variable being referenced up there just gets
> passed in as the fifth argument to this function.
>
> In frame att#3 (sh_exec, line 1265), the line that makes the call
> that takes us to frame 2 is:
>
> 1265                     int n = sh_debug(shp,trap,(char*)0,(char*)0, com, ARG_R     AW);
>
> so "com" (the fifth argument) is what's going wrong as it
> descends down through these calls. Looking at where it comes
> from, well, it's assigned here:
>
> 1241                 if(argn==0)
> 1242                 {
> 1243                     /* fake 'true' built-in */
> 1244                     np = SYSTRUE;
> 1245                     *argv = nv_name(np);
> 1246                     com = argv;
> 1247                 }
>
> because as we can see:
>
> (gdb) f 3
> att#3  0x000000000045a867 in sh_exec (t=0x7f2f13aafad0, flags=4)
>     at /usr/src/debug/ksh-20120801/src/cmd/ksh93/sh/xec.c:1265
> 1265						int n = sh_debug(shp,trap,(char*)0,(char*)0, com, ARG_RAW);
> (gdb) p argn
> $2 = 0
> (gdb)
>
> argn is == 0 here. The tip-off here is that nv_name clearly
> returns a simple pointer to an array of characters, not an array
> of pointers to arrays of characters as is evidenced by the fact
> that the assignment is "*argv = nv_name(np);" not "argv =
> nv_name(np);". Looking at the function nv_name proves that it
> does indeed return a single pointer to an array of characters,
> not a pointer to an array of pointers to arrays of characters.
> Now, com is defined as a 'char **':
>
> 1002         char        *cp=0, **com=0, *comn;
>
> (as it is expected to be in the calls that follow) also, that
> argv is also defined as the effective equivalent a 'char **':
>
> 1237                 static char *argv[1];
>
> Yup, argv is actually an array of pointers (char ** equivalent),
> but that array is restricted to having exactly one element.
> Recalling the assignment in the previously quoted line:
>
> 1245                     *argv = nv_name(np);
>
> we see that the one and only element in that argv array is
> getting assigned a pointer to an array of characters here.
> Nothing necessarily wrong with that, but remember the loop we
> looked at earlier in frame att#2 (sh_debug). It went like:
>
> 543		while(cp = *argv++)
> 544		{
> 545			if((flags&ARG_EXP) && argv[1]==0)
> 546				out_pattern(iop, cp,' ');
> 547			else
> 548				out_string(iop, cp,' ',n?0: (flags&(ARG_RAW|ARG_NOGLOB))||*argv);
> 549		}
>
> which is clearly expecting argv in this context (com in frame 3,
> which really points to that static local single element array
> that is also pointed to by argv in frame 2) to be an array of
> pointers of indefinite size, each element being a pointer, but
> whose last element will be a null pointer. Well, in frame 3 it is
> clearly an array with only a single element, and that one element
> is NOT pointing to null. Watch this:
>
> (gdb) f 3
> att#3  0x000000000045a867 in sh_exec (t=0x7f2f13aafad0, flags=4)
>     at /usr/src/debug/ksh-20120801/src/cmd/ksh93/sh/xec.c:1265
> 1265						int n = sh_debug(shp,trap,(char*)0,(char*)0, com, ARG_RAW);
> (gdb) p com
> $8 = (char **) 0x76e060
> (gdb) p &argv
> $9 = (char *(*)[1]) 0x76e060
> (gdb) p com[0]
> $11 = 0x5009c6 "true"
> (gdb) p com[1]
> $10 = 0x1 <Address 0x1 out of bounds>
> (gdb) p argv[0]
> $12 = 0x5009c6 "true"
> (gdb) p argv[1]
> $13 = 0x1 <Address 0x1 out of bounds>
> (gdb)
>
> So, as expected, com and &argv point to the same place, the first
> element points to the constant string "true", but since the array
> is defined as having only one element, when you refer to a second
> element in that array, you get well, whatever random crap happens
> to be in that memory location. When we try to reproduce this
> problem, apparently we're getting 0 there (or we're not quite
> following this same code path, which is also possible), but the
> customer happens to have a "1" in that memory location.
===end analysis===

src/cmd/ksh93/sh/xec.c: sh_exec():
- When processing TCOM (simple command) with an empty/null command,
  increase the size of the static dummy argv[1] array to argv[2],
  ensuring a terminating NULL element so that 'while(cp = *argv++)'
  loops don't crash. (Note that static objects are automatically
  initialised to zero in C.)

src/cmd/ksh93/tests/io.sh:
- Adapt the reproducer, testing a null-command redirection 1000x.
gkamat pushed a commit to gkamat/ast that referenced this issue Apr 28, 2021
Original patch:
https://src.fedoraproject.org/rpms/ksh/blob/642af4d6/f/ksh-20140801-diskfull.patch

Prior discussion:
https://www.mail-archive.com/ast-users@lists.research.att.com/msg01037.html
https://www.mail-archive.com/ast-users@lists.research.att.com/msg01038.html
https://www.mail-archive.com/ast-users@lists.research.att.com/msg01042.html
https://bugzilla.redhat.com/1212992

On Fri, 08 May 2015 14:37:45 -0700, Paulo Andrade wrote:
> I have a user with a ksh crashing problem, and that has
> some "Write error: No space left on device" messages
> in /var/log/messages.
>
> After some debugging, and creating a chroot on a file
> disk image, and a test user, and slowly filling the
> "on file" filesystem, e.g.
>
> dd if=/dev/zero of=/mnt/tmp/zerosN bs=1M count=1024
> dd if=/dev/zero of=/mnt/tmp/zerosN bs=1K count=2
>
> until leaving just around 12K, I managed to reproduce the
> problem, and be able to debug it with valgrind and vgdb;
> debugging on these conditions is tricky, as cannot tell
> valgrind to spawn gdb, because then gdb itself would fail
> to start.
>
> So, after following the code enough, I learned that at places
> it handles SH_JMPEXIT, there was almost non existing
> handling of SH_JMPERREXIT.
>
> ksh would evently cause a crash due to the struct
> subshell allocated on stack, in sh/subshell.c:sh_subshell
> kept set to the global subshell_data, after it siglongjmp
> back the stack due to, not fully handling the out of disk
> space errors. It would print a few messages, everytime
> a pipe was created, e.g.:
>
> /etc/profile: line 28: write to 3 failed [No space left on device]
>
> until eventually crashing due to corrupted memory; e.g. the
> references to stack data from sh_subsell in the global
> subshell_data. One strange thing to me in coredump analysis
> was that subshell_data prev field was pointing to itself when
> it eventually crashed, what later was understood and expected...
>
> The attached patch handles SH_JMPERREXIT in the code
> paths SH_JMPEXIT is handled, and the failed login, on
> full disk, ends in a pause() call:
>
> ---terminal 1---
> $ valgrind -q --leak-check=full --free-fill=0x5a --vgdb=full
> --vgdb-error=0 /bin/ksh -l
> ==17730== (action at startup) vgdb me ...
> ==17730==
> ==17730== TO DEBUG THIS PROCESS USING GDB: start GDB like this
> ==17730==   /path/to/gdb /bin/ksh
> ==17730== and then give GDB the following command
> ==17730==   target remote | /usr/lib64/valgrind/../../bin/vgdb --pid=17730
> ==17730== --pid is optional if only one valgrind process is running
> ==17730==
> ==17730== Syscall param mount(type) points to unaddressable byte(s)
> ==17730==    at 0x563377A: mount (in /usr/lib64/libc-2.17.so)
> ==17730==    by 0x493E58: fs3d_mount (fs3d.c:115)
> ==17730==    by 0x493C8B: fs3d (fs3d.c:57)
> ==17730==    by 0x423E41: sh_init (init.c:1302)
> ==17730==    by 0x405CD3: sh_main (main.c:141)
> ==17730==    by 0x405B84: main (pmain.c:45)
> ==17730==  Address 0x0 is not stack'd, malloc'd or (recently) free'd
> ==17730==
> ==17730== (action on error) vgdb me ...
> ==17730== Continuing ...
> /etc/profile: line 28: write to 3 failed [No space left on device]
> ---8<---
>
> ---terminal 2---
> (gdb) c
> Continuing.
> ^C
> Program received signal SIGTRAP, Trace/breakpoint trap.
> 0x00000000055fa470 in __pause_nocancel () from /lib64/libc.so.6
> (gdb) bt
> #0  0x00000000055fa470 in __pause_nocancel () from /lib64/libc.so.6
> att#1  0x000000000041e73d in sh_done (ptr=0x793360 <sh>, sig=255) at
> /home/pcpa/rhel/ksh/ksh-20120801/src/cmd/ksh93/sh/fault.c:665
> att#2  0x0000000000407407 in exfile (shp=0x4542, iop=0xff, fno=0) at
> /home/pcpa/rhel/ksh/ksh-20120801/src/cmd/ksh93/sh/main.c:604
> att#3  0x0000000000405c43 in sh_source (shp=0x793360 <sh>, iop=0x0,
> file=0x524804 <e_sysprofile> "/etc/profile")
>     at /home/pcpa/rhel/ksh/ksh-20120801/src/cmd/ksh93/sh/main.c:109
> att#4  0x00000000004060e4 in sh_main (ac=2, av=0xfff000498, userinit=0x0)
> at /home/pcpa/rhel/ksh/ksh-20120801/src/cmd/ksh93/sh/main.c:202
> att#5  0x0000000000405b85 in main (argc=2, argv=0xfff000498) at
> /home/pcpa/rhel/ksh/ksh-20120801/src/cmd/ksh93/sh/pmain.c:45
> (gdb)
> ---8<---
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
Projects
None yet
Development

No branches or pull requests

3 participants