Fix instance checking in daily pipeline #1270
Merged
Summary
This PR only impacts one function, desispec.workflow.utils.check_running(), which the daily pipeline uses to check for other running instances of itself. It didn't work in a cron environment because it looked for the Python script name in the full command line of every running process, but cron launches a bash process whose command line also contains the Python command. Since that bash process has a different PID, the pipeline always exited, incorrectly concluding that another Python instance was running.
This change requires the matching command to come from python or the script itself rather than bash. It also requires the other instance to belong to the same user (which I think is the correct thing to do, and it certainly proved useful when testing under my user last night in parallel with the official pipeline under the desi user).
Usage of scripts/functions doesn't change and no other functions are impacted.
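As a rough illustration of the matching rule described above, here is a minimal sketch. It is not the actual body of check_running(); the names ProcInfo and find_other_instance are hypothetical, and the process list is passed in as plain data so the logic can be read and tested on its own. In practice such a snapshot might be gathered with a library like psutil, but that is an assumption, not something this PR states.

```python
import os
from dataclasses import dataclass
from typing import List, Optional


@dataclass
class ProcInfo:
    """Hypothetical snapshot of one running process (not a desispec type)."""
    pid: int
    user: str
    cmdline: List[str]  # argv, e.g. ["python", "/path/to/script", "--flag"]


def find_other_instance(procs: List[ProcInfo], script_name: str,
                        my_pid: int, my_user: str) -> Optional[int]:
    """Return the PID of another instance of script_name, or None.

    A process counts as another instance only if:
      * it is not the current process,
      * it belongs to the same user,
      * its executable is python or the script itself (not a shell wrapper),
      * the script name appears as one of its arguments.
    """
    for proc in procs:
        if proc.pid == my_pid or proc.user != my_user:
            continue
        if not proc.cmdline:
            continue
        exe = os.path.basename(proc.cmdline[0])
        if not (exe.startswith("python") or exe == script_name):
            # Skips the bash/sh process that cron spawns, whose command
            # string merely mentions the script name.
            continue
        if any(os.path.basename(arg) == script_name for arg in proc.cmdline):
            return proc.pid
    return None
```

With this rule, the shell wrapper that cron starts is never mistaken for a second instance, and a test run under one user ignores the official pipeline running under another.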
Testing
I tested this last night under my user with the following cron job:
*/30 0-7,20-23 * * * source ${HOME}/bin/desi_daily_test_env.sh && nohup desi_daily_proc_manager --dry-run &>${HOME}/daily-${TIMESTAMP}.log &
The logs indicate it did the correct thing. The first job woke up and kept running. Later cron jobs woke up and exited, citing the PID of the first instance. I then killed that instance; the next cron job started running, and subsequent jobs exited, citing the new PID as already running.