New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

CTRL+C during scraping may trigger the exit of the main script later on #2524

Open
cmitu opened this Issue Oct 30, 2018 · 12 comments

Comments

Projects
None yet
3 participants
@cmitu

cmitu commented Oct 30, 2018

(Started from #2520 (comment)).

I've noticed that using Ctrl+C while scraping, either using scraper or skyscraper, has the side effect of exiting the RetroPie-Setup script after choosing any subsequent option that produces any output. The script exits with exit code 141 (SIGPIPE), seemingly because of the tee command that pipes the output to the RetroPie-Setup log.

Steps to reproduce (using scraper, since it's fewer clicks):

  1. Open the RetroPie-Setup script, then open the scraper module (Configuration/Tools).
  2. Start a scraping session via Scrape all system or Scrape chosen system.
  3. Interrupt the scraping with Ctrl+C. The script correctly shows that Scraping was aborted and returns to the scraper selection GUI.
  4. Choose Upgrade scraper to the last version (last option in the GUI). The RetroPie-Setup script exits. ($?=141)

The problem doesn't appear when running the scraper gui with sudo ./retropie_packages.sh scraper gui.

The problem seems to be in

rps_logInit
{
rps_logStart
if fnExists "gui_${__mod_id[choice]}"; then
rp_callModule "$choice" depends
rp_callModule "$choice" gui
else
rp_callModule "$idx" clean
rp_callModule "$choice"
fi
rps_logEnd
} &> >(tee >(gzip --stdout >"$logfilename"))

tee catches somehow the SIGINT and stops, then any output from the rp_callModule "$choice" gui call will trigger a SIGPIPE in the gzip call and ends the script.

A possible fix would be to replace the tee call with

{
# call anything here
} | (tee -i >(gzip --stdout >"$logfilename"))

which doesn't trigger the script to exit, but still mangles the next gui command that has any output (after which GUI options are working fine)

Do you think this warrants more investigation/time or should be something that belongs to the scriptmodule to handle ?

@joolswills

This comment has been minimized.

Member

joolswills commented Oct 31, 2018

I will look into it. Thanks.

@hhromic

This comment has been minimized.

Contributor

hhromic commented Nov 5, 2018

Hi guys, I got entertained by this interesting puzzle, and I think I found out the problem.
It turns out that everything in the same cgroup receives the SIGINT and bails out accordingly. In this case, @cmitu correctly identified the piece of code and the processes involved are the userfunction, tee and gzip. @cmitu also correctly hinted that tee can be given the -i option for ignoring interrupt signals, however gzip is also involved and the one still aborting and mangling the log output. However gzip doesn't have such an option for ignoring signals, fortunately we can use the setsid command that launches a process in a new session (cgroup) and therefore gzip will not abort on the original SIGINT.

I made an isolated shell script that reproduces the original issue and shows the proposed solution:

#!/usr/bin/env bash

function userfunc() {  # this is just any user function with a trap, like scraper
  trap "trap INT; return" INT
  dd if=/dev/zero of=/dev/null bs=1M count=100000
  trap INT
}

function orig_broken() {  # this is the current code prone to fail on SIGINT
  {
    userfunc
    echo "userfunc retval: $?"
  } &> >(tee >(gzip --stdout >output.log.gz))
}

function fix_proposal() {  # this is the proposed solution using "tee -i" and "setsid"
  {
    userfunc
    echo "userfunc retval: $?"
  } &> >(tee -i >(setsid gzip --stdout >output.log.gz))
}

# if we CTRL+C during each execution, we randomly will exit the entire script or
# continue with no output (tee/gzip died too), the output.log.gz file will be most
# of the times empty because gzip/tee will abort without flushing
for i in $(seq 10); do
  echo "Starting $i"
  orig_broken
  echo "retval: $?"
done

# with the fix proposal, no matter how much we CTRL+C, the userfunc will
# abort gracefully and gzip and tee will cleanly finish-up the output.log.gz file
for i in $(seq 10); do
  echo "Starting $i"
  fix_proposal
  echo "retval: $?"
done

I hope this shed some light :)

@cmitu

This comment has been minimized.

cmitu commented Nov 6, 2018

@hhromic Thanks for digging into this. I wonder in this case - wouldn't it be easier to use setsid on the child processes - and have a new cgroup - to catch SIGINT there, without propagating it downstream (tee or gzip).

@hhromic

This comment has been minimized.

Contributor

hhromic commented Nov 6, 2018

That is a tempting idea, and I also tried it. However this won't work because setsid doesn't make the process the "leader", i.e. goes to background and won't receive CTRL+C keystrokes. You can try it in my example by adding setsid before dd and trying the orig_broken function. There is an option for running the command in the foreground (setsid -c, grab the controlling terminal) however this won't work inside a bash script ("Operation not permitted") because bash is already the process leader. From the manpage of the syscall (https://linux.die.net/man/2/setsid):

EPERM  The  process group ID of any process equals the PID of the call-
       ing process.  Thus, in particular, setsid() fails if the calling
       process is already a process group leader.

When you use setsid with gzip, you don't need gzip to be a process leader because it is just compressing the output of tee. Also, by fixing it at the RetroPie-level, you ensure any scriptmodule is protected without modifying them. I noticed there are many usages of tee >(gzip --stdout >"$logfilename") around the admin/setup.sh script, probably all of them should be modified to use tee -i and setsid gzip.

Lastly, did you try this solution in your particular failing scenario? I hope I reproduced the problem in the small snippet of bash script, but I would like to know if it actually solves your particular bug. Cheers!

Edit: the setsid command is in the util-linux package, I don't have a RetroPie image with me here, so not sure it is installed by default. If not, and this solution is accepted, probably util-linux needs to be added to the RetroPie depends list. Just checked, yes it is contained in the image :).


If you want to learn more, there is a nice explanation about leadership of processes in cgroups here: https://blog.nelhage.com/2011/02/changing-ctty/. However, I wouldn't recommend to use this util for this context, just to understand the problem.

@cmitu

This comment has been minimized.

cmitu commented Nov 6, 2018

Lastly, did you try this solution in your particular failing scenario? I hope I reproduced the problem in the small snippet of bash script, but I would like to know if it actually solves your particular bug. Cheers!

I tested the fix and it seems to work fine - interrupting the scraping session with Ctrl+C doesn't seem to have any other side effect now. I also had a test case built for this and it seems to fix that too.

@hhromic

This comment has been minimized.

Contributor

hhromic commented Nov 6, 2018

Sounds good! Lets wait then for @joolswills comments for further proceeding with a proper fix

@joolswills

This comment has been minimized.

Member

joolswills commented Nov 6, 2018

Thanks. Looks good but will have a test also :)

@joolswills

This comment has been minimized.

Member

joolswills commented Nov 8, 2018

Would this work ? Wouldn't this then run them both in a new cgroup ? If so I prefer it as it's a tiny bit shorter :)

function fix_proposal() {  # this is the proposed solution using "tee -i" and "setsid"
  {
    userfunc
    echo "userfunc retval: $?"
  } &> >(setsid tee >(gzip --stdout >output.log.gz))
}
@cmitu

This comment has been minimized.

cmitu commented Nov 8, 2018

It doesn't seem to work for the original test report (using scraper), gzip gets the same SIGPIPE. I think the sub-shell created for gzip still runs in the same cgroup as the main script.

@hhromic

This comment has been minimized.

Contributor

hhromic commented Nov 8, 2018

I also thought about making it as elegant as possible like that, but unfortunately that won't work because gzip is started by the shell, not tee, and will receive the CTRL+C and mangle the output.
In that case, setsid launches tee in the new cgroup but the shell still launches gzip in its own cgroup for the redirection. Another variation that will work is to use setsid both for tee and gzip:

>(setsid tee >(setsid gzip --stdout >output.log.gz))

Which is arguably not less short than the original fix proposal, but at least doesn't blind tee of receiving signals. I thought actually about it and I think this behaviour would be more robust.

@hhromic

This comment has been minimized.

Contributor

hhromic commented Nov 8, 2018

@cmitu is correct, as I just explained almost at the same time :)

@joolswills

This comment has been minimized.

Member

joolswills commented Nov 8, 2018

Thanks. I'd like to try a couple more things also - will get back to you :-)

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment