Make running cron jobs on demand more reliable #26

Open
bogdanghervan opened this issue Jan 11, 2015 · 3 comments

@bogdanghervan
Member

Sometimes running a cron job from the interface fails silently: the interface still reports that the job has started.

CronKeep should (better) report that a cron job failed to run and present the user with whatever information it has available (such as the command's output or the return code).

Capturing command output could be tricky with the current setup, which runs the command asynchronously (either by scheduling it to run immediately using at, or via proc_open otherwise). CronKeep currently delegates the intricacies of invoking the process to symfony/Process, which uses proc_open internally.

We should strive to keep it running separately from Apache, if possible, which is what at currently does for us.

Nice to have: stream the output to the screen while the process is running (TBD).
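
One way to keep the output and return code available without waiting on the job is to have a small wrapper write them to files that the UI could read later. A minimal shell sketch follows; the helper name `run_job_logged` and its log and status file arguments are assumptions made for illustration and are not part of CronKeep.

```shell
#!/bin/sh
# Hypothetical helper (not part of CronKeep): run a job in the background,
# saving its combined output and its exit code to files.
# Usage: run_job_logged COMMAND LOG_FILE STATUS_FILE
run_job_logged() {
    job_cmd=$1
    log_file=$2
    status_file=$3
    (
        # stdout and stderr of the job both go to the log file
        sh -c "$job_cmd" > "$log_file" 2>&1
        # $? is the exit code of the job that just finished
        echo "$?" > "$status_file"
    ) &
}
```

The call returns right away; a missing status file would mean the job is still running or never started. Note that this sketch does not detach the job from the calling process.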

@micschk

micschk commented Nov 24, 2016

What do you mean by 'running separate from Apache'? I think the cron jobs you're editing are those of the user that Apache runs PHP as anyway. Also, 'at' does its best to copy the whole environment for the command to be run in later.

I've just removed the whole dependency on the 'at' command (it doesn't work on OS X, my dev box) and changed Crontab::run() to use: sprintf('sh -c "%s" > /dev/null 2>&1 &', $command)

Then I call $process->run() instead of $process->start(), as start() doesn't seem to leave the command running in the background after returning from PHP. This could be the reason for the unreliable behaviour, e.g. when the function returns before $process->start() has finished ($process->wait() should be called for this).

$process->run() with the background command seems to work fine. If you would like, I can PR my changes.

@bogdanghervan bogdanghervan self-assigned this Dec 3, 2016
@bogdanghervan
Member Author

bogdanghervan commented Dec 4, 2016

@micschk, thank you for your detailed note!

> What do you mean by 'running separate from Apache'?

What I was trying to achieve is to "detach" any commands being run from the Apache process that triggers them, so that restarting Apache would not kill your jobs. Whenever Apache is stopped or restarted, all processes in its process group (see PGID below) receive a SIGTERM, including those launched by CronKeep.

Here's a short experiment running a script called test.sh that sleeps for 10 minutes when launched (/var/www/test.sh 2>&1 | logger -ttest).

Launched from CronKeep without at:

$ ps axfj # output truncated for brevity
 PPID   PID  PGID   SID TTY      TPGID STAT   UID   TIME COMMAND
    1  2881  2881  2881 ?           -1 Ss       0   0:00 /usr/sbin/apache2 -k start
 2881  2884  2881  2881 ?           -1 S       33   0:00  \_ /usr/sbin/apache2 -k start
 2881  2885  2881  2881 ?           -1 S       33   0:00  \_ /usr/sbin/apache2 -k start
 2881  2886  2881  2881 ?           -1 S       33   0:00  \_ /usr/sbin/apache2 -k start
 2881  2887  2881  2881 ?           -1 S       33   0:00  \_ /usr/sbin/apache2 -k start
 2881  2888  2881  2881 ?           -1 S       33   0:00  \_ /usr/sbin/apache2 -k start
 2881  2901  2881  2881 ?           -1 S       33   0:00  \_ /usr/sbin/apache2 -k start
 2881  2903  2881  2881 ?           -1 S       33   0:00  \_ /usr/sbin/apache2 -k start
 2881  2904  2881  2881 ?           -1 S       33   0:00  \_ /usr/sbin/apache2 -k start
 2881  2905  2881  2881 ?           -1 S       33   0:00  \_ /usr/sbin/apache2 -k start
 2881  2906  2881  2881 ?           -1 S       33   0:00  \_ /usr/sbin/apache2 -k start
 2881  2907  2881  2881 ?           -1 S       33   0:00  \_ /usr/sbin/apache2 -k start
 2881  2908  2881  2881 ?           -1 S       33   0:00  \_ /usr/sbin/apache2 -k start
    1  2914  2881  2881 ?           -1 S       33   0:00 /bin/bash /var/www/test.sh
 2914  2918  2881  2881 ?           -1 S       33   0:00  \_ sleep 600
    1  2915  2881  2881 ?           -1 S       33   0:00 logger -ttest

Launched from CronKeep with at:

$ ps axfj # output truncated for brevity
 PPID   PID  PGID   SID TTY      TPGID STAT   UID   TIME COMMAND
    1  2985  2985  2985 ?           -1 Ss       1   0:00 atd
 2985  3005  2985  2985 ?           -1 S        1   0:00  \_ atd
 3005  3006  2985  2985 ?           -1 SN      33   0:00      \_ sh
 3006  3007  2985  2985 ?           -1 SN      33   0:00          \_ /bin/bash /var/www/test.sh
 3007  3011  2985  2985 ?           -1 SN      33   0:00          |   \_ sleep 600
 3006  3008  2985  2985 ?           -1 SN      33   0:00          \_ logger -ttest

Admittedly, the job merely moves under atd (the at daemon), but people can then restart Apache freely, which they probably do more often than they touch atd.

The same job launched by the cron daemon:

$ ps axfj # output truncated for brevity
 PPID   PID  PGID   SID TTY      TPGID STAT   UID   TIME COMMAND
    1  2651  2651  2651 ?           -1 Ss       0   0:00 cron
 2651  3031  2651  2651 ?           -1 S        0   0:00  \_ CRON
 3031  3032  3032  3032 ?           -1 Ss      33   0:00      \_ /bin/sh -c /var/www/test.sh 2>
 3032  3033  3032  3032 ?           -1 S       33   0:00          \_ /bin/bash /var/www/test.sh
 3033  3037  3032  3032 ?           -1 S       33   0:00          |   \_ sleep 600
 3032  3034  3032  3032 ?           -1 S       33   0:00          \_ logger -ttest

We can see how the job spawned by the cron service gets its very own session (see column SID). This means that when we stop cron, the orphaned processes are inherited by init and keep on running:

$ service cron stop
$ ps axfj # output truncated for brevity
 PPID   PID  PGID   SID TTY      TPGID STAT   UID   TIME COMMAND
    1  3032  3032  3032 ?           -1 Ss      33   0:00 /bin/sh -c /var/www/test.sh 2>&1 | log
 3032  3033  3032  3032 ?           -1 S       33   0:00  \_ /bin/bash /var/www/test.sh
 3033  3037  3032  3032 ?           -1 S       33   0:00  |   \_ sleep 600
 3032  3034  3032  3032 ?           -1 S       33   0:00  \_ logger -ttest
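
To check the parent, process group, and session of a single job instead of scanning the full `ps axfj` listing, the same columns can be requested for one PID. A small sketch, assuming a `ps` that accepts `-o` and `-p` (such as procps on Linux); `show_ids` is a hypothetical helper name.

```shell
#!/bin/sh
# Hypothetical helper: print PPID, PID, PGID and SID for one process.
# Assumes a ps that accepts -o and -p (e.g. procps on Linux).
show_ids() {
    # A trailing "=" after a column name suppresses its header;
    # separate -o options avoid ambiguity in how "=" and "," are parsed.
    ps -o ppid= -o pid= -o pgid= -o sid= -p "$1"
}
```

Calling `show_ids 3032` against the listing above would be expected to report that PID as both its own PGID and SID.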

I think what we really need is to wrap the job in a script that calls setsid or setpgid as it starts, to make that process a session leader (or a process group leader, respectively). I will experiment and come back with my findings, since I'd also like to get rid of the dependency on at.
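
A minimal sketch of the setsid variant, assuming the `setsid(1)` utility from util-linux is available (it is Linux-specific, so this would not cover OS X); `launch_detached` is a hypothetical helper name, not CronKeep code.

```shell
#!/bin/sh
# Hypothetical helper: start a job in its own session so that signals sent
# to the caller's process group do not reach it.
# Assumes setsid(1) from util-linux.
launch_detached() {
    # Redirecting stdin, stdout and stderr leaves no pipe back to the caller,
    # and the trailing & returns control immediately.
    setsid sh -c "$1" < /dev/null > /dev/null 2>&1 &
}
```

Running `launch_detached '/var/www/test.sh 2>&1 | logger -ttest'` should then show the job with its own SID in `ps axfj`, much like the cron-launched case above. Output and the return code are discarded in this sketch.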

Also, please note that we cannot afford to wait for a cron job to finish in the CronKeep UI since people can have long-running scripts that can run for hours.

@RaspiGuru
Copy link

RaspiGuru commented May 2, 2017

Hi micschk,
I made the changes described in your comment of 24 Nov 2016 and everything works fine.
Many thanks
