segfault in certain programs when used with shim library #69

Closed
petersilva opened this issue Jul 21, 2018 · 12 comments

@petersilva
Contributor

petersilva commented Jul 21, 2018

Running in an Ubuntu xenial container on certain servers (not reproducible on an 18.04 test machine).
Given the following script:
2018071712_obsmap_0008

(Rename it to get around filtering.)

On the systems where it happens, the third invocation (via tcl) fails like so:

2018-07-21 12:49:00,298 [DEBUG] sr_post file2message start with: /fs/home/fs1/ssc/di/pas037/test/cmoi_convert/2018071712_ObsMap_0008_tcl.png sb=0x7ffeada6e820 islnk=0, isdir=0, isreg=1
2018-07-21 12:49:00,302 [INFO] published: 20180721124900.298754778 sftp://peter@localhost/ /fs/home/fs1/ssc/di/pas037/test/cmoi_convert/2018071712_ObsMap_0008_tcl.png topic=v02.post.fs.home.fs1.ssc.di.pas037.test.cmoi_convert sum=s,842a30434757bd5f9bcedc7cccc987fb03fe568b6817fc567b61e98dfc998cd3c83e3f54d64e18a98795628a59f06290231512ab553021fdc1788913244f3030 source=pas037 to_clusters=hpfx1.science.gc.ca from_cluster=hpfx1.science.gc.ca mtime=20180721124900.263802 atime=20180721122726.586971969 mode=0644 parts=1,49229,1,0,0
while executing
"exec convert $File.ppm ${File}_tcl.png"
(file "/home/pas037/test/cmoi_convert/convert.tcl" line 3)
output3:1

@petersilva
Contributor Author

Modified the original scripts: ksh was noise, and the variables all just use $HOME now.
You need to install imagemagick and tcl.

convert.sh.txt
convert.tcl.txt

now it is reproducible on 14.04 and my 18.04 laptop. Real bug.

@petersilva
Contributor Author

Well, it has nothing to do with AMQP or log files. I removed the SR_POST_CONFIG setting so that the shim does not initialize its configuration or attempt any sort of connection to a broker. I also replaced the log file handling with plain fprintf(stderr, ...) by defining a macro: fb5550169bdb56207197b168fdb637d4a5d7a856

@petersilva
Contributor Author

this should actually be in the sarrac project. oops.

@petersilva
Contributor Author

If I add gdb to the beginning of the line that fails (compiled with SR_POST_CONFIG, and with SR_SHIMDEBUG set), the listing ends with:

SR_SHIMDEBUG fclose NO POST read-only.
SR_SHIMDEBUG fclose 0x5555557b2cf0 fd=32768 starting
SR_SHIMDEBUG fclose NO POST read-only.
invalid command name "ELF����>�@8@8"
while executing
"ELF����>�@8@8 @�"
(file "/usr/bin/tclsh" line 1)
[Inferior 1 (process 25328) exited with code 01]
FIXME srshim_initialize post
(gdb)

petersilva pushed a commit to MetPX/sarrac that referenced this issue Jul 21, 2018
something the library is calling that should not reflect the exit status.
this is for

MetPX/sarracenia#69

and it helps in the sense that with the previous patch and this one,
GDB no longer segfaults, but the tclsh case still does.
@petersilva
Contributor Author

Whatever the problem is, it is in libsrshim.c itself. The working file I have doesn't even call any of the rest of the project. It is something to do with stderr, errno, exit status and/or logging.

@petersilva
Contributor Author

I replaced all occurrences of stderr with an srlog file descriptor, and it then runs fine.
It's something to do with the stderr pointer.
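
For illustration, routing the shim's messages to a private log stream instead of stderr might look roughly like the sketch below. This is not the actual sarrac code: the names srlog_open, srlog_printf and srlog_close are made up for the example, and the only assumption is that the library owns one FILE pointer for its own logging.

```c
#include <stdarg.h>
#include <stdio.h>

/* Hypothetical private log stream; NULL means logging is disabled. */
static FILE *srlog = NULL;

/* Open the log file in append mode. Returns 0 on success, -1 on failure. */
int srlog_open(const char *path)
{
    srlog = fopen(path, "a");
    return srlog ? 0 : -1;
}

/*
 * printf-style logging that never touches the host program's stderr.
 * Returns the number of bytes written, or 0 when logging is disabled.
 */
int srlog_printf(const char *fmt, ...)
{
    va_list ap;
    int n;

    if (!srlog)
        return 0;
    va_start(ap, fmt);
    n = vfprintf(srlog, fmt, ap);
    va_end(ap);
    fflush(srlog); /* keep the file current in case the host process dies */
    return n;
}

/* Close the log stream if it is open. */
void srlog_close(void)
{
    if (srlog) {
        fclose(srlog);
        srlog = NULL;
    }
}
```

With something like this, a call such as srlog_printf("fclose fd=%d\n", fd) would take the place of fprintf(stderr, ...), so the wrapped program's stderr stays exactly as it would be without the shim.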

@petersilva
Contributor Author

petersilva commented Jul 21, 2018

OK, so ImageMagick isn't needed either; tclsh by itself is enough.
hello.tcl contains: exec tclsh hello2.tcl
hello2.tcl contains: puts "hello"
and the main hello.sh script is just:
'#!/bin/bash

export SR_SHIMDEBUG=1
export LD_PRELOAD=$HOME/src/sarrac/libsrshim.so.1.0.0

tclsh $HOME/test/cmoi_convert/hello.tcl
echo output3:$?
'

gives same error:

...
while executing
"exec tclsh hello2.tcl"
(file "/home/peter/test/cmoi_convert/hello.tcl" line 1)
output3:1
blacklab%

@petersilva
Contributor Author

http://wiki.tcl.tk/8489
"Arjen Markus (27 february 2003) The exec command will return an error whenever the program (or process) that was invoked writes to standard error (or exits with a non-zero value). Some programs, like compilers, use this output channel not only to report errors but to report progress as well."

@petersilva
Contributor Author

petersilva commented Jul 21, 2018

It looks like Tcl's exec reports an error, so tclsh exits non-zero (failure), if the child process writes anything to stderr, even when the child itself exits with status 0.
Not sure if there is anything we can do about this. Suggestions welcome.
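
To make the rule concrete, here is a rough C emulation of the default check described in the wiki quote: a command counts as failed if it exits non-zero or writes anything to stderr. It is only a sketch of the rule, not how Tcl implements exec. The function name exec_would_fail is invented for the example, and it assumes a POSIX system where popen runs the command through /bin/sh.

```c
#define _POSIX_C_SOURCE 200809L
#include <stdio.h>
#include <stdlib.h>
#include <sys/wait.h>

/*
 * Returns 1 if the default Tcl exec rule would report an error for this
 * shell command, 0 if it would not, and -1 if the command could not be run.
 */
int exec_would_fail(const char *cmd)
{
    char wrapped[1024];
    char buf[256];
    size_t stderr_bytes = 0, n;
    FILE *p;
    int len, status;

    /* Send the child's stderr into the pipe and discard its stdout. */
    len = snprintf(wrapped, sizeof wrapped, "( %s ) 2>&1 >/dev/null", cmd);
    if (len < 0 || len >= (int)sizeof wrapped)
        return -1;

    p = popen(wrapped, "r");
    if (!p)
        return -1;

    /* Count everything the child wrote to stderr. */
    while ((n = fread(buf, 1, sizeof buf, p)) > 0)
        stderr_bytes += n;

    status = pclose(p);
    if (status == -1)
        return -1;

    /* A non-zero or abnormal exit is an error. */
    if (!WIFEXITED(status) || WEXITSTATUS(status) != 0)
        return 1;

    /* So is any stderr output, even with exit status 0. */
    return stderr_bytes > 0 ? 1 : 0;
}
```

Under this rule exec_would_fail("echo hello") returns 0, while exec_would_fail("echo SR_SHIMDEBUG >&2") returns 1 even though the command exits with 0. That is the situation the shim creates when SR_SHIMDEBUG output goes to stderr.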

@petersilva
Contributor Author

Found this: If we add the -ignorestderr option to exec, all is fine.

exec -ignorestderr convert $File.ppm ${File}_tcl.png

@petersilva
Contributor Author

Though it turns out tclsh was a poor example, it did serve to illustrate a real issue, as gdb was also segfaulting. The previous fixes have addressed that, and need to get into a future release.

@petersilva
Contributor Author

2.18.07b4 was released a few weeks ago. The client reports other things crashing now. I believe this is related to those binaries using uninitialized values, where the stack memory they read has already been used by the shim library. Will close for now, but if that proves false, this will need to be re-opened.
