nmake needs to be built with -D_FORTIFY_SOURCE=0 on MacOSX or buffer overlap protection kills it #4

Closed
jens-maus opened this issue Mar 9, 2016 · 2 comments

@jens-maus

While trying to compile ksh on OSX El Capitan using the ast build environment, I get "Abort trap: 6" messages as soon as nmake runs on the ksh sources:

$ bin/package make ksh93 SHELL=sh

package: initialize the /Users/maus/Documents/projekte/ksh-beta/arch/darwin.i386 view
package: update /Users/maus/Documents/projekte/ksh-beta/arch/darwin.i386/bin/cc
package: update /Users/maus/Documents/projekte/ksh-beta/arch/darwin.i386/bin/ldd
package: update /Users/maus/Documents/projekte/ksh-beta/arch/darwin.i386/lib/probe/C/make/probe
package: update /Users/maus/Documents/projekte/ksh-beta/arch/darwin.i386/bin/mamake
[...]
probing C language processor /Users/maus/Documents/projekte/ksh-beta/arch/darwin.i386/bin/cc for make information
++ set -
cmd/INIT:
sh: line 114: 68354 Abort trap: 6           /Users/maus/Documents/projekte/ksh-beta/arch/darwin.i386/bin/nmake --ignorelock --keepgoing --errorid=cmd/INIT .RWD.=cmd/INIT RECURSEROOT=.. believe
make: *** termination code 6 making cmd/INIT

Looking at the MacOSX system log files a crash is reported within nmake:

Process:               nmake [68354]
Path:                  /Users/USER/Documents/*/nmake
Identifier:            nmake
Version:               0
Code Type:             X86 (Native)
Parent Process:        ??? [68353]
Responsible:           nmake [68354]
User ID:               501

Date/Time:             2016-03-09 18:24:05.117 +0100
OS Version:            Mac OS X 10.11.4 (15E49a)
Report Version:        11
Anonymous UUID:        EDEE8ECF-E07E-787D-E6DF-2B5B6B158D92


Time Awake Since Boot: 1200000 seconds

System Integrity Protection: disabled

Crashed Thread:        0  Dispatch queue: com.apple.main-thread

Exception Type:        EXC_CRASH (SIGABRT)
Exception Codes:       0x0000000000000000, 0x0000000000000000
Exception Note:        EXC_CORPSE_NOTIFY

Application Specific Information:
detected source and destination buffer overlap

Thread 0 Crashed:: Dispatch queue: com.apple.main-thread
0   libsystem_kernel.dylib          0x9e3dd572 __pthread_kill + 10
1   libsystem_pthread.dylib         0x92438654 pthread_kill + 101
2   libsystem_c.dylib               0x9a5dbd00 __abort + 187
3   libsystem_c.dylib               0x9a5dbc45 abort + 173
4   libsystem_c.dylib               0x9a5dbd7f abort_report_np + 82
5   libsystem_c.dylib               0x9a60aad1 __chk_fail + 54
6   libsystem_c.dylib               0x9a60aae8 __chk_fail_overlap + 23
7   libsystem_c.dylib               0x9a60ab23 __chk_overlap + 59
8   libsystem_c.dylib               0x9a60ad29 __strcpy_chk + 72
9   nmake                           0x000d5515 resetvar + 341
10  nmake                           0x000d4ca8 setvar + 1576
11  nmake                           0x000b16cf assignment + 1295
12  nmake                           0x000ab8e9 parse + 1433
13  nmake                           0x00064432 apply + 706
14  nmake                           0x000ade10 assertion + 224
15  nmake                           0x000ab8cb parse + 1403
16  nmake                           0x000baf2e readfp + 6590
17  nmake                           0x000b9197 readfile + 1335
18  nmake                           0x000884b2 main + 7938
19  libdyld.dylib                   0x95e2c6ad start + 1

This crash log suggests that the source and destination buffers in the strcpy() call in resetvar() overlap, so MacOSX terminates the nmake process, resulting in the "Abort trap: 6" messages above.
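
A minimal sketch of the kind of change the backtrace points to, assuming the copy in resetvar() moves a string within the same buffer (the nmake source is not shown here, so that is a guess). `copy_overlapping` is a hypothetical helper, not an nmake function. It relies on `memmove()`, which the C standard defines for overlapping regions, whereas `strcpy()` on overlapping buffers is undefined behavior and is the case `__strcpy_chk` aborts on.

```c
#include <assert.h>
#include <stddef.h>
#include <string.h>

/*
 * Hypothetical helper (not taken from nmake): copy the NUL-terminated
 * string at src to dst, even when the two regions overlap.
 * Returns dst, like strcpy().
 */
char *copy_overlapping(char *dst, const char *src)
{
    assert(dst != NULL && src != NULL);

    /* Measure first, then move; include the terminating NUL. */
    size_t n = strlen(src) + 1;

    /* memmove() is specified to handle overlapping source and destination. */
    return memmove(dst, src, n);
}
```

For example, with `char buf[32] = "  value";`, calling `copy_overlapping(buf, buf + 2)` leaves `buf` holding `"value"`. The equivalent `strcpy(buf, buf + 2)` is the sort of call the fortified libc rejects.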

Using -D_FORTIFY_SOURCE=0 when calling bin/package make seems to work around this problem. However, the build then fails at a later point (probably because the buffer overlap still exists and is simply no longer reported):

$ bin/package make ksh93 SHELL=sh CCFLAGS=-D_FORTIFY_SOURCE=0

[...]
cpp: "/Users/maus/Documents/projekte/ksh-beta/src/cmd/ksh93/include/shell.h", line 172: cmd.h: cannot find include file
cpp: "FEATURE/dynamic", line 10: dlldefs.h: cannot find include file
cpp: "/Users/maus/Documents/projekte/ksh-beta/src/cmd/ksh93/include/shell.h", line 172: cmd.h: cannot find include file
cpp: "FEATURE/dynamic", line 10: dlldefs.h: cannot find include file
cpp: "/Users/maus/Documents/projekte/ksh-beta/src/cmd/ksh93/include/shell.h", line 172: cmd.h: cannot find include file
make [cmd/ksh93]: *** exit code 2 making cd_pwd.o
make [cmd/ksh93]: *** exit code 2 making cflow.o
make [cmd/ksh93]: *** exit code 1 making deparse.o
cpp: "/Users/maus/Documents/projekte/ksh-beta/src/cmd/ksh93/include/shell.h", line 172: cmd.h: cannot find include file
cpp: "/Users/maus/Documents/projekte/ksh-beta/src/cmd/ksh93/include/shell.h", line 172: cmd.h: cannot find include file
cpp: "FEATURE/dynamic", line 10: dlldefs.h: cannot find include file
[...]

I am using MacOSX 10.11.4 with Xcode 7.2.1 (7C1002), so Apple LLVM version 7.0.2 (clang-700.1.81) is used for compilation.

@jens-maus
Author

Please note that the above-mentioned commit seems to work around the buffer overlap problems I was seeing when trying to compile ksh for MacOSX 10.11.4, where the build failed when executing nmake.

@krader1961
Contributor

This can be closed. See my comment on issue #78.

krader1961 self-assigned this Mar 21, 2019
gkamat pushed a commit to gkamat/ast that referenced this issue Apr 28, 2021
src/cmd/ksh93/data/variables.c:
 - Running 'unset .sh.lineno' creates a memory fault, so fix that
   by giving it the NV_NOFREE attribute. This crash was happening
   because ${.sh.lineno} is an integer that cannot be freed from
   memory with free(3).

src/cmd/ksh93/sh/init.c:
 - Tell _nv_unset to ignore NV_RDONLY when $RANDOM and $LINENO are
   restored from the subshell scope. This is required to fully
   restore the original state of these variables after a virtual
   subshell finishes.

src/cmd/ksh93/bltins/typeset.c,
src/cmd/ksh93/sh/subshell.c:
 - Disabled some optimizations for two instances of 'sh_assignok' to
   fix 'readonly' in virtual subshells and '(unset .sh.level)' in
   nested functions. This fixes the following variables when
   '(readonly $varname); enum varname=' is run:

   $_
   ${.sh.name}
   ${.sh.subscript}
   ${.sh.level}

   The optimization in question prevents sh_assignok from saving the
   original state of these variables by making the sh_assignok call
   a no-op. Ksh needs the original state of a variable for it to be
   properly restored after a virtual subshell has run, otherwise ksh
   will simply carry over any new flags (being NV_RDONLY in this case)
   from the subshell into the main shell.

src/cmd/ksh93/tests/variables.sh:
 - Add regression tests from Martijn Dekker for setting special
   variables as readonly in virtual subshells and for unsetting
   special variables in general.

Fixes att#4
gkamat pushed a commit to gkamat/ast that referenced this issue Apr 28, 2021
Hopefully this doesn't introduce new bugs, but it does fix at
least the following:

1. When whence -v/-a found an "undefined" (i.e. autoloadable)
   function in $FPATH, it actually loaded the function as a side
   effect of reporting on its existence (!). Now it only reports.

2. 'whence' will now canonicalise paths properly. Examples:
	$ whence ///usr/lib/../bin//./env
	/usr/bin/env
	$ (cd /; whence -v dev/../usr/bin//./env)
	dev/../usr/bin//./env is /usr/bin/env

3. 'whence' no longer prefixes a spurious double slash when doing
   something like 'cd / && whence bin/echo'. On Cygwin, an initial
   double slash denotes a network server, so this was not just a
   cosmetic problem.

4. 'whence -a' now reports a "tracked alias" (a.k.a. hash table
   entry, i.e. cached $PATH search) even if an actual alias by the
   same name exists. This needed fixing because in fact the hash
   table entry continues to be used when bypassing the alias.
   Aliases and "tracked aliases" are not remotely the same thing;
   confusing nomenclature is not a reason to report wrong results.

5. When using 'hash' or 'alias -t' on a command that is also a
   builtin to force caching a $PATH search for the external
   command, 'whence -a' double-reported the path:
	$ hash printf; whence -a printf
	printf is a shell builtin
	printf is /usr/bin/printf
	printf is a tracked alias for /usr/bin/printf
   This is now fixed so that the second output line is gone.
   Plus, if there were multiple versions of the command on $PATH,
   the tracked alias was reported at the end, which is the wrong
   order. This is also fixed.

src/cmd/ksh93/bltins/whence.c: whence():
- Refactor the do...while loop that handles whence -v/-a for path
  searches in such a way that the code actually makes sense and
  stops looking like higher esotericism. Just doing this fixed #2,
  #4 and #5 above (the latter two before I even noticed them). For
  instance, the path_fullname() call to canonicalise paths was
  already there; it was just never used.
- Remove broken 'notrack' flaggery for deciding whether to report a
  hash table entry a.k.a. "tracked alias"; instead, check the hash
  table (shp->track_tree).

src/cmd/ksh93/sh/path.c:
- path_search(): Re #3: When prefixing the PWD, first check if
  we're in '/' and if so, don't prefix it; otherwise, adding the
  next slash causes an initial double slash. (Since '/' is the only
  valid single-character absolute path, all we need to do is check
  if the second character pwd[1] is non-null.)
- path_search(): Re #1: Stop autoloading when called by 'whence':
  * The 'flag==2' check to avoid autoloading a function was
    broken. The flag value is 2 on the first whence() loop
    iteration, but 3 on subsequent ones. Change to 'flag >= 2'.
  * However, this only fixes it if the function file does not have
    the x permission bit, as executable files are handled by
    path_absolute() which unconditionally autoloads functions!
    So, pass on our flag parameter when calling path_absolute().
- path_absolute(): Re #1: Add flag parameter. Do not autoload
  functions if flag >= 2.

src/cmd/ksh93/include/path.h,
src/cmd/ksh93/bltins/typeset.c,
src/cmd/ksh93/sh/main.c,
src/cmd/ksh93/sh/xec.c:
- Re #1: Update path_absolute() calls, adding a 0 flag parameter.

src/cmd/ksh93/include/name.h:
- Remove now-unused pathcomp member from union Value. It was
  introduced in 9906535 to allow examining the value of a tracked
  alias. This commit uses nv_getval() instead.

src/cmd/ksh93/tests/builtins.sh,
src/cmd/ksh93/tests/path.sh:
- Add and tweak various related tests.

Fixes: ksh93#84
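
The `pwd[1]` check described for path_search() in the commit message above can be sketched roughly as follows. `join_pwd` is a hypothetical helper written for illustration and is not ksh's actual code; it only shows why testing the second character of an absolute PWD is enough to avoid an initial double slash.

```c
#include <assert.h>
#include <stdio.h>
#include <string.h>

/*
 * Hypothetical helper (not ksh's path_search()): build "<pwd>/<rel>" in out.
 * pwd must be an absolute path. Returns 0 on success, -1 if out is too small.
 */
int join_pwd(char *out, size_t outsz, const char *pwd, const char *rel)
{
    assert(out != NULL && pwd != NULL && rel != NULL && pwd[0] == '/');

    /*
     * "/" is the only valid one-character absolute path, so pwd[1] is NUL
     * exactly when we are in the root directory. In that case, skip the
     * prefix; otherwise the slash added below would produce "//...".
     */
    const char *prefix = pwd[1] ? pwd : "";

    /* prefix + '/' + rel + terminating NUL must fit in out. */
    if (strlen(prefix) + 1 + strlen(rel) + 1 > outsz)
        return -1;

    snprintf(out, outsz, "%s/%s", prefix, rel);
    return 0;
}
```

With `pwd` set to `"/"` and `rel` set to `"bin/echo"`, this yields `/bin/echo` rather than `//bin/echo`; with `pwd` set to `"/usr"` and `rel` set to `"bin/env"`, it yields `/usr/bin/env`.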
gkamat pushed a commit to gkamat/ast that referenced this issue Apr 28, 2021
Original patch:
https://src.fedoraproject.org/rpms/ksh/blob/642af4d6/f/ksh-20140801-diskfull.patch

Prior discussion:
https://www.mail-archive.com/ast-users@lists.research.att.com/msg01037.html
https://www.mail-archive.com/ast-users@lists.research.att.com/msg01038.html
https://www.mail-archive.com/ast-users@lists.research.att.com/msg01042.html
https://bugzilla.redhat.com/1212992

On Fri, 08 May 2015 14:37:45 -0700, Paulo Andrade wrote:
> I have a user with a ksh crashing problem, and that has
> some "Write error: No space left on device" messages
> in /var/log/messages.
>
> After some debugging, and creating a chroot on a file
> disk image, and a test user, and slowly filling the
> "on file" filesystem, e.g.
>
> dd if=/dev/zero of=/mnt/tmp/zerosN bs=1M count=1024
> dd if=/dev/zero of=/mnt/tmp/zerosN bs=1K count=2
>
> until leaving just around 12K, I managed to reproduce the
> problem, and be able to debug it with valgrind and vgdb;
> debugging on these conditions is tricky, as cannot tell
> valgrind to spawn gdb, because then gdb itself would fail
> to start.
>
> So, after following the code enough, I learned that at places
> it handles SH_JMPEXIT, there was almost non existing
> handling of SH_JMPERREXIT.
>
> ksh would eventually cause a crash due to the struct
> subshell allocated on stack, in sh/subshell.c:sh_subshell
> kept set to the global subshell_data, after it siglongjmp
> back the stack due to, not fully handling the out of disk
> space errors. It would print a few messages, everytime
> a pipe was created, e.g.:
>
> /etc/profile: line 28: write to 3 failed [No space left on device]
>
> until eventually crashing due to corrupted memory; e.g. the
> references to stack data from sh_subshell in the global
> subshell_data. One strange thing to me in coredump analysis
> was that subshell_data prev field was pointing to itself when
> it eventually crashed, what later was understood and expected...
>
> The attached patch handles SH_JMPERREXIT in the code
> paths SH_JMPEXIT is handled, and the failed login, on
> full disk, ends in a pause() call:
>
> ---terminal 1---
> $ valgrind -q --leak-check=full --free-fill=0x5a --vgdb=full
> --vgdb-error=0 /bin/ksh -l
> ==17730== (action at startup) vgdb me ...
> ==17730==
> ==17730== TO DEBUG THIS PROCESS USING GDB: start GDB like this
> ==17730==   /path/to/gdb /bin/ksh
> ==17730== and then give GDB the following command
> ==17730==   target remote | /usr/lib64/valgrind/../../bin/vgdb --pid=17730
> ==17730== --pid is optional if only one valgrind process is running
> ==17730==
> ==17730== Syscall param mount(type) points to unaddressable byte(s)
> ==17730==    at 0x563377A: mount (in /usr/lib64/libc-2.17.so)
> ==17730==    by 0x493E58: fs3d_mount (fs3d.c:115)
> ==17730==    by 0x493C8B: fs3d (fs3d.c:57)
> ==17730==    by 0x423E41: sh_init (init.c:1302)
> ==17730==    by 0x405CD3: sh_main (main.c:141)
> ==17730==    by 0x405B84: main (pmain.c:45)
> ==17730==  Address 0x0 is not stack'd, malloc'd or (recently) free'd
> ==17730==
> ==17730== (action on error) vgdb me ...
> ==17730== Continuing ...
> /etc/profile: line 28: write to 3 failed [No space left on device]
> ---8<---
>
> ---terminal 2---
> (gdb) c
> Continuing.
> ^C
> Program received signal SIGTRAP, Trace/breakpoint trap.
> 0x00000000055fa470 in __pause_nocancel () from /lib64/libc.so.6
> (gdb) bt
> #0  0x00000000055fa470 in __pause_nocancel () from /lib64/libc.so.6
> #1  0x000000000041e73d in sh_done (ptr=0x793360 <sh>, sig=255) at
> /home/pcpa/rhel/ksh/ksh-20120801/src/cmd/ksh93/sh/fault.c:665
> #2  0x0000000000407407 in exfile (shp=0x4542, iop=0xff, fno=0) at
> /home/pcpa/rhel/ksh/ksh-20120801/src/cmd/ksh93/sh/main.c:604
> #3  0x0000000000405c43 in sh_source (shp=0x793360 <sh>, iop=0x0,
> file=0x524804 <e_sysprofile> "/etc/profile")
>     at /home/pcpa/rhel/ksh/ksh-20120801/src/cmd/ksh93/sh/main.c:109
> #4  0x00000000004060e4 in sh_main (ac=2, av=0xfff000498, userinit=0x0)
> at /home/pcpa/rhel/ksh/ksh-20120801/src/cmd/ksh93/sh/main.c:202
> #5  0x0000000000405b85 in main (argc=2, argv=0xfff000498) at
> /home/pcpa/rhel/ksh/ksh-20120801/src/cmd/ksh93/sh/pmain.c:45
> (gdb)
> ---8<---