beta branch: Building ksh93 fails at libast/comp/tmpnam.c #2

jens-maus · 2016-03-08T12:38:35Z

When trying to build the beta branch under Ubuntu/Linux 15.10 using bin/package make the build fails with the following error message at tmpnam.c:

+ cc -D_BLD_DLL -fPIC -D_BLD_ast -O -I. -I/home/maus/projekte/ksh/src/lib/libast -Icomp -I/home/maus/projekte/ksh/src/lib/libast/comp -Iinclude -I/home/maus/projekte/ksh/src/lib/libast/include -Istd -I/home/maus/projekte/ksh/src/lib/libast/std -D_PACKAGE_ast -c /home/maus/projekte/ksh/src/lib/libast/comp/tmpnam.c
/home/maus/projekte/ksh/src/lib/libast/comp/tmpnam.c: In function '_ast_tmpnam':
/home/maus/projekte/ksh/src/lib/libast/comp/tmpnam.c:48:14: error: storage size of 'buf' isn't known
  static char buf[L_tmpnam];
              ^
/home/maus/projekte/ksh/src/lib/libast/comp/tmpnam.c:50:39: error: expected expression before ',' token
  return pathtemp(s ? s : buf, L_tmpnam, NiL, "tn", NiL);
                                       ^
mamake [lib/libast]: *** exit code 1 making tmpnam.o

After analyses the L_tmpnam, P_tmpdir and L_ctermid seem to be defined empty as can be seen by looking at arch/linux.i386-64/src/lib/libast/FEATURE/stdio:

#if defined(__STDPP__directive) && defined(__STDPP__initial)
__STDPP__directive pragma pp:initial
#endif
#ifndef P_tmpdir
#define P_tmpdir
#endif
#ifndef L_ctermid
#define L_ctermid
#endif
#ifndef L_tmpnam
#define L_tmpnam
#endif
#if defined(__STDPP__directive) && defined(__STDPP__initial)
__STDPP__directive pragma pp:noinitial
#endif

After manually patching these values to sensible values like the following the build succeeds:

#if defined(__STDPP__directive) && defined(__STDPP__initial)
__STDPP__directive pragma pp:initial
#endif
#ifndef P_tmpdir
#define P_tmpdir "/tmp"
#endif
#ifndef L_ctermid
#define L_ctermid 1024
#endif
#ifndef L_tmpnam
#define L_tmpnam 1024
#endif
#if defined(__STDPP__directive) && defined(__STDPP__initial)
__STDPP__directive pragma pp:noinitial
#endif

After having applied these changes the build succeeds and ksh93 is correctly build.

The text was updated successfully, but these errors were encountered:

jens-maus · 2016-03-08T12:49:32Z

Please note that here the same issue seems to have been reported to the Debian project:
https://bugs.debian.org/cgi-bin/bugreport.cgi?bug=777930

The same problems seems to have been reported to the SUSE project as well:
https://forums.opensuse.org/showthread.php/508185-GCC-5

Both of these issues seem to be related to GCC5 compilation which might trigger the issue.

jens-maus · 2016-03-08T13:45:20Z

After further investigation I think I have found a working patch which fixes the issue. The openSUSE developers seem to have developed a working patch for this. See here:

https://build.opensuse.org/package/view_file/home:Andreas_Schwab:Factory/ksh/cpp.patch?expand=1

So please consider integrating this patch.

siteshwar · 2017-06-30T21:33:40Z

It has been fixed in beta branch by 477c024

ksh segfaults in job_chksave after receiving SIGCHLD https://bugs.launchpad.net/ubuntu/+source/ksh/+bug/1697501 Eric Desrochers wrote on 2017-06-12: [Impact] * The compiler optimization dropped parts from the ksh job locking mechanism from the binary code. As a consequence, ksh could terminate unexpectedly with a segmentation fault after it received the SIGCHLD signal. [Test Case] Unfortunately, there is no clear and easy way to reproduce the segfault. * But the original reporter of this bug can randomly reproduce the problem using an in-house ksh script that only works inside his infrastructure as follow : "ksh <in-house-script.ksh>" and then once in a while ksh will segfault as follow : (gdb) bt #0 job_chksave (pid=pid@entry=19003) at /build/ksh-6IEHIC/ksh-93u+20120801/src/cmd/ksh93/sh/jobs.c:1948 att#1 0x00000000004282ab in job_reap (sig=17) at /build/ksh-6IEHIC/ksh-93u+20120801/src/cmd/ksh93/sh/jobs.c:428 att#2 <signal handler called> ... [Regression Potential] * Regression risk : low/none expected, the package has been highly/intensively tested by a user who run over 18M ksh scripts a day on each of their clusters. [...] * The fix has been written by RH and has been proven to work for them for the last 3 years. * A test package including the RH fix has been intensively tested and verified (pre-SRU) by an affected user with positive feedbacks using a reproducer that segfault without the RH patch. * Test package (pre-SRU) feedbacks : https://bugs.launchpad.net/ubuntu/xenial/+source/ksh/+bug/1697501/comments/7 [Other Info] * Details about the RH bug : - https://bugzilla.redhat.com/show_bug.cgi?id=1123467 - https://bugzilla.redhat.com/show_bug.cgi?id=1112306 - https://access.redhat.com/solutions/1253243 - http://rhn.redhat.com/errata/RHBA-2014-1015.html - ksh.spec * Fri Jul 25 2014 Michal Hlavinka <email address hidden> - 20120801-10.8 * job locking mechanism did not survive compiler optimization (#1123467) - patch * ksh-20120801-locking.patch Debian bug: https://bugs.debian.org/cgi-bin/bugreport.cgi?bug=867181 [Original Description] # gdb [New LWP 3882] Core was generated by `/bin/ksh <KSH_SCRIPT>.ksh'. Program terminated with signal SIGSEGV, Segmentation fault. #0 job_chksave (pid=pid@entry=19385) at /build/ksh-6IEHIC/ksh-93u+20120801/src/cmd/ksh93/sh/jobs.c:1948 1948 if(jp->pid==pid) (gdb) p *jp Cannot access memory at address 0xb (gdb) p *jp->pid Cannot access memory at address 0x13 (gdb) p pid $2 = 19385 (gdb) p *jpold $1 = {next = 0xb, pid = -604008960, exitval = 11124} The struct is corrupted at some point looking at the next,pid and exitval struct members values which isn't valid data. # assembly code => 0x0000000000427159 <+41>: cmp %edi,0x8(%rdx) (gdb) p $edi ## pid variable $1 = 19385 (gdb) p *($rdx + 8) ## jp->pid struct Cannot access memory at address 0x13 -- ksh is segfaulting because it can't access struct "jp" ($rdx) thus cannot de-reference the struct member "jp>pid" ($rdx + 8) at line : src/cmd/ksh93/sh/jobs.c:1948 when looking if jp->pid is equal to pid ($edi) variable.

The code for handling process substitution with redirection was never being run because IORAW is usually set when IOPROCSUB is set. This commit fixes the problem by moving the required code out of the !IORAW if statement. The following command now prints 'good' instead of writing 'ok' to a bizzare file: $ ksh -c 'echo ok > >(sed s/ok/good/); wait' good This commit also fixes a bug that caused the process ID of the asynchronous process to print when the shell was in interactive mode. The following command no longer prints a process ID, behaving like in Bash and zsh: $ echo >(true) /dev/fd/5 src/cmd/ksh93/sh/args.c: - Temporarily turn off the interactive state while in a process substitution to prevent the shell from printing the PID of the asynchronous process. src/cmd/ksh93/sh/io.c: - Move the code for process substitution with redirection into a separate if statement. src/cmd/ksh93/tests/io.sh: - Add two tests for both process substitution bugs fixed by this commit. src/cmd/ksh93/tests/shtests: - Update shtests with a patch from Martijn Dekker to use pretty-printing for the output from the times builtin (if it is available). Fixes att#2

Hopefully this doesn't introduce new bugs, but it does fix at least the following: 1. When whence -v/-a found an "undefined" (i.e. autoloadable) function in $FPATH, it actually loaded the function as a side effect of reporting on its existence (!). Now it only reports. 2. 'whence' will now canonicalise paths properly. Examples: $ whence ///usr/lib/../bin//./env /usr/bin/env $ (cd /; whence -v dev/../usr/bin//./env) dev/../usr/bin//./env is /usr/bin/env 3. 'whence' no longer prefixes a spurious double slash when doing something like 'cd / && whence bin/echo'. On Cygwin, an initial double slash denotes a network server, so this was not just a cosmetic problem. 4. 'whence -a' now reports a "tracked alias" (a.k.a. hash table entry, i.e. cached $PATH search) even if an actual alias by the same name exists. This needed fixing because in fact the hash table entry continues to be used when bypassing the alias. Aliases and "tracked aliases" are not remotely the same thing; confusing nomenclature is not a reason to report wrong results. 5. When using 'hash' or 'alias -t' on a command that is also a builtin to force caching a $PATH search for the external command, 'whence -a' double-reported the path: $ hash printf; whence -a printf printf is a shell builtin printf is /usr/bin/printf printf is a tracked alias for /usr/bin/printf This is now fixed so that the second output line is gone. Plus, if there were multiple versions of the command on $PATH, the tracked alias was reported at the end, which is the wrong order. This is also fixed. src/cmd/ksh93/bltins/whence.c: whence(): - Refactor the do...while loop that handles whence -v/-a for path searches in such a way that the code actually makes sense and stops looking like higher esotericism. Just doing this fixed att#2, att#4 and att#5 above (the latter two before I even noticed them). For instance, the path_fullname() call to canonicalise paths was already there; it was just never used. - Remove broken 'notrack' flaggery for deciding whether to report a hash table entry a.k.a. "tracked alias"; instead, check the hash table (shp->track_tree). src/cmd/ksh93/sh/path.c: - path_search(): Re att#3: When prefixing the PWD, first check if we're in '/' and if so, don't prefix it; otherwise, adding the next slash causes an initial double slash. (Since '/' is the only valid single-character absolute path, all we need to do is check if the second character pwd[1] is non-null.) - path_search(): Re att#1: Stop autoloading when called by 'whence': * The 'flag==2' check to avoid autoloading a function was broken. The flag value is 2 on the first whence() loop iteration, but 3 on subsequent ones. Change to 'flag >= 2'. * However, this only fixes it if the function file does not have the x permission bit, as executable files are handled by path_absolute() which unconditionally autoloads functions! So, pass on our flag parameter when callling path_absolute(). - path_absolute(): Re att#1: Add flag parameter. Do not autoload functions if flag >= 2. src/cmd/ksh93/include/path.h, src/cmd/ksh93/bltins/typeset.c, src/cmd/ksh93/sh/main.c, src/cmd/ksh93/sh/xec.c: - Re att#1: Update path_absolute() calls, adding a 0 flag parameter. src/cmd/ksh93/include/name.h: - Remove now-unused pathcomp member from union Value. It was introduced in 9906535 to allow examining the value of a tracked alias. This commit uses nv_getval() instead. src/cmd/ksh93/tests/builtins.sh, src/cmd/ksh93/tests/path.sh: - Add and tweak various related tests. Fixes: ksh93#84

For one Red Hat customer, the following reproducer consistently crashed, tough I was not able to reproduce it and neither was RH. However, the crash analysis is sound (see below). function dlog { fc -ln -0 } trap dlog DEBUG >/tmp/blah Original patch: https://src.fedoraproject.org/rpms/ksh/blob/642af4d6/f/ksh-20140801-arraylen.patch The Red Hat bug thread is closed to the public as it also contains some correspondence with their customer. But it has an excellent crash analysis from Thomas Gardner which I'm including here for the record (the line numbers are for their ksh at the time, not 93u+m). ===begin analysis=== > The creation of an empty file instead of a command that executes > anything causes the coredump. [...] > Here is my analysis on the core that was provided by the customer: > > (gdb) bt > #0 sh_fmtq (string=0x1 <Address 0x1 out of bounds>) > at /usr/src/debug/ksh-20120801/src/cmd/ksh93/sh/string.c:340 > att#1 0x0000000000457e96 in out_string (cp=<value optimized out>, c=32, > quoted=<value optimized out>, iop=<value optimized out>) > at /usr/src/debug/ksh-20120801/src/cmd/ksh93/sh/xec.c:444 > att#2 0x000000000045804c in sh_debug (shp=0x76d180, trap=0x7f2f13a821e0 "dlog", > name=<value optimized out>, subscript=<value optimized out>, > argv=0x76e070, flags=<value optimized out>) > at /usr/src/debug/ksh-20120801/src/cmd/ksh93/sh/xec.c:548 > att#3 0x000000000045a867 in sh_exec (t=0x7f2f13aafad0, flags=4) > at /usr/src/debug/ksh-20120801/src/cmd/ksh93/sh/xec.c:1265 > [...need go no further...] > > In frame 2, we can see it cycling through your classic > (char **)argv array like: > > 543 while(cp = *argv++) > 544 { > 545 if((flags&ARG_EXP) && argv[1]==0) > 546 out_pattern(iop, cp,' '); > 547 else > 548 out_string(iop, cp,' ',n?0: (flags&(ARG_RAW|ARG_NOGLOB))||*argv); > 549 } > 550 if(flags&ARG_ASSIGN) > 551 sfputc(iop,')'); > 552 else if(iop==stkstd) > > (we seg-fault after going down the out_string function in line > 548 up there). The string pointer that points to = 0x1 up in > frame #0 (sh_fmtq) traces back to the "cp" variable in line 548 > up there. The "argv" variable being referenced up there just gets > passed in as the fifth argument to this function. > > In frame att#3 (sh_exec, line 1265), the line that makes the call > that takes us to frame 2 is: > > 1265 int n = sh_debug(shp,trap,(char*)0,(char*)0, com, ARG_R AW); > > so "com" (the fifth argument) is what's going wrong as it > descends down through these calls. Looking at where it comes > from, well, it's assigned here: > > 1241 if(argn==0) > 1242 { > 1243 /* fake 'true' built-in */ > 1244 np = SYSTRUE; > 1245 *argv = nv_name(np); > 1246 com = argv; > 1247 } > > because as we can see: > > (gdb) f 3 > att#3 0x000000000045a867 in sh_exec (t=0x7f2f13aafad0, flags=4) > at /usr/src/debug/ksh-20120801/src/cmd/ksh93/sh/xec.c:1265 > 1265 int n = sh_debug(shp,trap,(char*)0,(char*)0, com, ARG_RAW); > (gdb) p argn > $2 = 0 > (gdb) > > argn is == 0 here. The tip-off here is that nv_name clearly > returns a simple pointer to an array of characters, not an array > of pointers to arrays of characters as is evidenced by the fact > that the assignment is "*argv = nv_name(np);" not "argv = > nv_name(np);". Looking at the function nv_name proves that it > does indeed return a single pointer to an array of characters, > not a pointer to an array of pointers to arrays of characters. > Now, com is defined as a 'char **': > > 1002 char *cp=0, **com=0, *comn; > > (as it is expected to be in the calls that follow) also, that > argv is also defined as the effective equivalent a 'char **': > > 1237 static char *argv[1]; > > Yup, argv is actually an array of pointers (char ** equivalent), > but that array is restricted to having exactly one element. > Recalling the assignment in the previously quoted line: > > 1245 *argv = nv_name(np); > > we see that the one and only element in that argv array is > getting assigned a pointer to an array of characters here. > Nothing necessarily wrong with that, but remember the loop we > looked at earlier in frame att#2 (sh_debug). It went like: > > 543 while(cp = *argv++) > 544 { > 545 if((flags&ARG_EXP) && argv[1]==0) > 546 out_pattern(iop, cp,' '); > 547 else > 548 out_string(iop, cp,' ',n?0: (flags&(ARG_RAW|ARG_NOGLOB))||*argv); > 549 } > > which is clearly expecting argv in this context (com in frame 3, > which really points to that static local single element array > that is also pointed to by argv in frame 2) to be an array of > pointers of indefinite size, each element being a pointer, but > whose last element will be a null pointer. Well, in frame 3 it is > clearly an array with only a single element, and that one element > is NOT pointing to null. Watch this: > > (gdb) f 3 > att#3 0x000000000045a867 in sh_exec (t=0x7f2f13aafad0, flags=4) > at /usr/src/debug/ksh-20120801/src/cmd/ksh93/sh/xec.c:1265 > 1265 int n = sh_debug(shp,trap,(char*)0,(char*)0, com, ARG_RAW); > (gdb) p com > $8 = (char **) 0x76e060 > (gdb) p &argv > $9 = (char *(*)[1]) 0x76e060 > (gdb) p com[0] > $11 = 0x5009c6 "true" > (gdb) p com[1] > $10 = 0x1 <Address 0x1 out of bounds> > (gdb) p argv[0] > $12 = 0x5009c6 "true" > (gdb) p argv[1] > $13 = 0x1 <Address 0x1 out of bounds> > (gdb) > > So, as expected, com and &argv point to the same place, the first > element points to the constant string "true", but since the array > is defined as having only one element, when you refer to a second > element in that array, you get well, whatever random crap happens > to be in that memory location. When we try to reproduce this > problem, apparently we're getting 0 there (or we're not quite > following this same code path, which is also possible), but the > customer happens to have a "1" in that memory location. ===end analysis=== src/cmd/ksh93/sh/xec.c: sh_exec(): - When processing TCOM (simple command) with an empty/null command, increase the size of the static dummy argv[1] array to argv[2], ensuring a terminating NULL element so that 'while(cp = *argv++)' loops don't crash. (Note that static objects are automatically initialised to zero in C.) src/cmd/ksh93/tests/io.sh: - Adapt the reproducer, testing a null-command redirection 1000x.

Original patch: https://src.fedoraproject.org/rpms/ksh/blob/642af4d6/f/ksh-20140801-diskfull.patch Prior discussion: https://www.mail-archive.com/ast-users@lists.research.att.com/msg01037.html https://www.mail-archive.com/ast-users@lists.research.att.com/msg01038.html https://www.mail-archive.com/ast-users@lists.research.att.com/msg01042.html https://bugzilla.redhat.com/1212992 On Fri, 08 May 2015 14:37:45 -0700, Paulo Andrade wrote: > I have a user with a ksh crashing problem, and that has > some "Write error: No space left on device" messages > in /var/log/messages. > > After some debugging, and creating a chroot on a file > disk image, and a test user, and slowly filling the > "on file" filesystem, e.g. > > dd if=/dev/zero of=/mnt/tmp/zerosN bs=1M count=1024 > dd if=/dev/zero of=/mnt/tmp/zerosN bs=1K count=2 > > until leaving just around 12K, I managed to reproduce the > problem, and be able to debug it with valgrind and vgdb; > debugging on these conditions is tricky, as cannot tell > valgrind to spawn gdb, because then gdb itself would fail > to start. > > So, after following the code enough, I learned that at places > it handles SH_JMPEXIT, there was almost non existing > handling of SH_JMPERREXIT. > > ksh would evently cause a crash due to the struct > subshell allocated on stack, in sh/subshell.c:sh_subshell > kept set to the global subshell_data, after it siglongjmp > back the stack due to, not fully handling the out of disk > space errors. It would print a few messages, everytime > a pipe was created, e.g.: > > /etc/profile: line 28: write to 3 failed [No space left on device] > > until eventually crashing due to corrupted memory; e.g. the > references to stack data from sh_subsell in the global > subshell_data. One strange thing to me in coredump analysis > was that subshell_data prev field was pointing to itself when > it eventually crashed, what later was understood and expected... > > The attached patch handles SH_JMPERREXIT in the code > paths SH_JMPEXIT is handled, and the failed login, on > full disk, ends in a pause() call: > > ---terminal 1--- > $ valgrind -q --leak-check=full --free-fill=0x5a --vgdb=full > --vgdb-error=0 /bin/ksh -l > ==17730== (action at startup) vgdb me ... > ==17730== > ==17730== TO DEBUG THIS PROCESS USING GDB: start GDB like this > ==17730== /path/to/gdb /bin/ksh > ==17730== and then give GDB the following command > ==17730== target remote | /usr/lib64/valgrind/../../bin/vgdb --pid=17730 > ==17730== --pid is optional if only one valgrind process is running > ==17730== > ==17730== Syscall param mount(type) points to unaddressable byte(s) > ==17730== at 0x563377A: mount (in /usr/lib64/libc-2.17.so) > ==17730== by 0x493E58: fs3d_mount (fs3d.c:115) > ==17730== by 0x493C8B: fs3d (fs3d.c:57) > ==17730== by 0x423E41: sh_init (init.c:1302) > ==17730== by 0x405CD3: sh_main (main.c:141) > ==17730== by 0x405B84: main (pmain.c:45) > ==17730== Address 0x0 is not stack'd, malloc'd or (recently) free'd > ==17730== > ==17730== (action on error) vgdb me ... > ==17730== Continuing ... > /etc/profile: line 28: write to 3 failed [No space left on device] > ---8<--- > > ---terminal 2--- > (gdb) c > Continuing. > ^C > Program received signal SIGTRAP, Trace/breakpoint trap. > 0x00000000055fa470 in __pause_nocancel () from /lib64/libc.so.6 > (gdb) bt > #0 0x00000000055fa470 in __pause_nocancel () from /lib64/libc.so.6 > att#1 0x000000000041e73d in sh_done (ptr=0x793360 <sh>, sig=255) at > /home/pcpa/rhel/ksh/ksh-20120801/src/cmd/ksh93/sh/fault.c:665 > att#2 0x0000000000407407 in exfile (shp=0x4542, iop=0xff, fno=0) at > /home/pcpa/rhel/ksh/ksh-20120801/src/cmd/ksh93/sh/main.c:604 > att#3 0x0000000000405c43 in sh_source (shp=0x793360 <sh>, iop=0x0, > file=0x524804 <e_sysprofile> "/etc/profile") > at /home/pcpa/rhel/ksh/ksh-20120801/src/cmd/ksh93/sh/main.c:109 > att#4 0x00000000004060e4 in sh_main (ac=2, av=0xfff000498, userinit=0x0) > at /home/pcpa/rhel/ksh/ksh-20120801/src/cmd/ksh93/sh/main.c:202 > att#5 0x0000000000405b85 in main (argc=2, argv=0xfff000498) at > /home/pcpa/rhel/ksh/ksh-20120801/src/cmd/ksh93/sh/pmain.c:45 > (gdb) > ---8<---

stephane-chazelas mentioned this issue Apr 27, 2016

Issue with printf %Lb "\0200" in UTF-8 locales #14

Closed

dannyweldon added the bug label Mar 5, 2017

siteshwar closed this as completed Jun 30, 2017

siteshwar mentioned this issue Nov 30, 2017

path.sh test fails on macOS with a call to abort() #169

Closed

philippe56 mentioned this issue Apr 13, 2018

ksh93: Local variables are passed on to called functions when they share their name with an invocation-level binding #38

Open

krader1961 mentioned this issue Apr 25, 2019

"print -v" outputs incorrect binary value for integer variable #1291

Closed

krader1961 mentioned this issue Nov 16, 2019

Crash when attempting to open a TCP socket. #1438

Open

Provide feedback

Saved searches

Use saved searches to filter your results more quickly

beta branch: Building ksh93 fails at libast/comp/tmpnam.c #2

beta branch: Building ksh93 fails at libast/comp/tmpnam.c #2

jens-maus commented Mar 8, 2016

jens-maus commented Mar 8, 2016

jens-maus commented Mar 8, 2016

siteshwar commented Jun 30, 2017

beta branch: Building ksh93 fails at libast/comp/tmpnam.c #2

beta branch: Building ksh93 fails at libast/comp/tmpnam.c #2

Comments

jens-maus commented Mar 8, 2016

jens-maus commented Mar 8, 2016

jens-maus commented Mar 8, 2016

siteshwar commented Jun 30, 2017