Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

desi_proc_tilenight processes skysub when --laststeps=skysub #2015

Merged
merged 4 commits into from Mar 16, 2023

Conversation

marcelo-alvarez
Copy link
Contributor

@marcelo-alvarez marcelo-alvarez commented Mar 4, 2023

Fixes #2007 by propagating the --laststeps option through to tilenight. I have tested with:

% desi_run_night -n 20230210 --z-submit-types=cumulative --laststeps="all,skysub,fluxcal" --all-tiles --dry-run-level=1

and I obtain the correct flags in the tilenight batch script:

% grep desi_proc_tilenight /global/cfs/cdirs/desi/users/malvarez/spectro/redux/tilenight-skysub/run/scripts/night/20230210/tilenight-20230210-5627.slurm  | grep -v echo
 srun -N 1 -n 64 -c 2 --cpu-bind=cores desi_mps_wrapper desi_proc_tilenight -n 20230210 -t 5627 --mpi --mpistdstars --laststeps="all,skysub,fluxcal" --timingfile /global/cfs/cdirs/desi/users/malvarez/spectro/redux/tilenight-skysub/run/scripts/night/20230210/tilenight-20230210-5627-timing-$SLURM_JOBID.json

@sbailey please take a look and see if this is the behavior you are expecting for the pipeline, and then we can proceed with further testing. I would also appreciate an extra set of eyes, as the "laststeps" argument had to be copied recursively to a significant depth of function calls. Thanks.

@marcelo-alvarez
Copy link
Contributor Author

marcelo-alvarez commented Mar 10, 2023

@sbailey do the changes included in this PR reflect the behavior you are expecting for the pipeline?

Specifically I am referring to the dependence on the option provided to the --laststeps command line argument of desi_proc_tilenight, which is now described as follows in its help message:

Comma separated list of LASTSTEP values (e.g. all, skysub, fluxcalib, ignore); by default, exposures with LASTSTEP 'all' and 'fluxcalib' will be processed to the poststdstar step, and those with LASTSTEP 'skysub' to the prestdstar step. If specified, exposures with LASTSTEP values in the list will be processed to the poststdstar step, and all others will not be processed at all.

If this looks good then we can go ahead and test for the example cited in #2007. Thanks.

@@ -150,7 +150,8 @@ def submit_night(night, proc_obstypes=None, z_submit_types=None, queue='realtime

## If laststeps not defined, default is only LASTSTEP=='all' exposures for non-tilenight runs
if laststeps is None:
laststeps = ['all',]
if not use_tilenight:
laststeps = ['all',]
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

when laststeps=None and use_tilenight=True, this block leaves laststeps=None. This results in line 232 not selecting any exposures to process because nothing is in None:

    good_exps = np.isin(np.array(etable['LASTSTEP']).astype(str), laststeps)

For example:

$> desi_run_night --dry-run -n 20230210 --z-submit-types cumulative --all-tiles
Processing the following obstypes: ['bias', 'dark', 'arc', 'flat', 'science', 'twilight', 'sci', 'dither']
Redshift fitting with redshift group types: ['cumulative']
Processing exposures with the following LASTSTEP's: None
INFO:tableio.py:334:load_table: Found table: /global/cfs/cdirs/desi/users/sjbailey/spectro/redux/pr2015/exposure_tables/202302/exposure_table_20230210.csv
INFO:tableio.py:337:load_table: Table /global/cfs/cdirs/desi/users/sjbailey/spectro/redux/pr2015/processing_tables/processing_table_pr2015-20230210.csv not found, creating new table of type proctable
INFO:submit_night.py:247:submit_night: Submitting cumulative redshifts for 0/0 tiles for which 20230210 is the last night: []
Completed submission of exposures for night 20230210. 

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

@sbailey thanks for catching this. The most recent commit preserves the previous (correct) assignment to good_exps when laststeps is None, while also passing alaststeps value of None to submit_tilenight_and_redshifts.

@marcelo-alvarez
Copy link
Contributor Author

@sbailey I think I resolved the problem you cited above and the correct options are passed to desi_proc_tilenight in the slurm scripts. For example when running

desi_run_night --dry-run-level 1 -n 20230210 --z-submit-types=cumulative --all-tiles --laststeps="all,skysub,fluxcal"

files like

/global/cfs/cdirs/desi/users/malvarez/spectro/redux/tilenight-laststep/run/scripts/night/20230210/tilenight-20230210-21354.slurm

which contains the line

srun -N 1 -n 64 -c 2 --cpu-bind=cores desi_mps_wrapper desi_proc_tilenight -n 20230210 -t 21354 --mpi --cameras a0123456789 --mpistdstars --laststeps="all,skysub,fluxcal" ...

are produced. Can you please verify that the behavior of the pipeline is as expected (including when processing without the --dry-run-level 1 argument to desi_run_night) after these changes, and merge if appropriate?

@sbailey
Copy link
Contributor

sbailey commented Mar 14, 2023

Getting closer. The scripts look ok, but tilenight is still different from non-tilenight when it runs on tiles with a mix of all and skysub exposures. Consider tile 5627 on night 20230210:

TILEID EXPID  LASTSTEP
------ ------ --------
  5627 166998      all
  5627 166999      all
  5627 167000   skysub

When run with --laststeps all or not specified (defaulting to "all"), I would expect this to process exposures 166998 and 166999 but not 167000, which is what happens, good. Small issue: it still creates an output directory for 167000, but it doesn't write anything there.

When run with --laststeps skysub,all, I would expect this to process 166998 and 166999 through cframes (i.e. all), and 167000 only through sframes but not cframes. Instead, it also flux calibrates 167000 through cframes, which it can do because it has good data on the other exposures to fit standards. However, since 167000 is flagged as LASTSTEP=skysub, that generally means that something is wrong with that exposure and it shouldn't be included in downstream coadds and redshifts, even if we are still requesting sframes for debugging. I think the downstream steps do their own exposure filtering and would skip 167000 anyway, but that still leaves us in the potentially confusing state of having cframes on disk that never enter any coadd.

I tested this in $CFS/desi/users/sjbailey/spectro/redux/pr2015 with a link to daily exposure_tables and commands like

desi_run_night --dry-run-level 1 -n 20230210 --z-submit-types None --all-tiles --proc-obstypes SCIENCE --laststeps "skysub,all"
sbatch run/scripts/night/20230210/tilenight-20230210-5627.slurm

Another corner case: the above generates three run/scripts/night/20230210/tilenight*.slurm scripts, from which I submit one. However, if I add the --tiles 5627 option to try to get it to generate only that one tile that I want, it doesn't generate scripts for any tiles.

@marcelo-alvarez
Copy link
Contributor Author

@sbailey thanks for the feedback on what's expected for a given set of flags. After my most recent commit, exposure 167000 is processed only to sframes when -laststeps all,skysub is provided on the command line to desi_proc_tilenight:

% desi_proc_tilenight -n 20230210 -t 5627 --mpi --mpistdstars --laststeps all,skysub --dryrun | grep "running desi_proc"
INFO:proc_tilenight.py:183:main: running desi_proc --night 20230210 --nostdstarfit --nofluxcalib --expid 166998 --cameras a0123456789
INFO:proc_tilenight.py:183:main: running desi_proc --night 20230210 --nostdstarfit --nofluxcalib --expid 166999 --cameras a0123456789
INFO:proc_tilenight.py:183:main: running desi_proc --night 20230210 --nostdstarfit --nofluxcalib --expid 167000 --cameras a0123456789
INFO:proc_tilenight.py:219:main: running desi_proc_joint_fit --night 20230210 --mpistdstars --obstype science --expids 166998,166999 --cameras a0123456789
INFO:proc_tilenight.py:256:main: running desi_proc --night 20230210 --nostdstarfit --noprestdstarfit --expid 166998 --cameras a0123456789
INFO:proc_tilenight.py:256:main: running desi_proc --night 20230210 --nostdstarfit --noprestdstarfit --expid 166999 --cameras a0123456789

while it is not processed at all when the --laststeps all option (or no --laststeps option at all) is provided:

% desi_proc_tilenight -n 20230210 -t 5627 --mpi --mpistdstars --laststeps all --dryrun | grep "running desi_proc"
INFO:proc_tilenight.py:183:main: running desi_proc --night 20230210 --nostdstarfit --nofluxcalib --expid 166998 --cameras a0123456789
INFO:proc_tilenight.py:183:main: running desi_proc --night 20230210 --nostdstarfit --nofluxcalib --expid 166999 --cameras a0123456789
INFO:proc_tilenight.py:219:main: running desi_proc_joint_fit --night 20230210 --mpistdstars --obstype science --expids 166998,166999 --cameras a0123456789
INFO:proc_tilenight.py:256:main: running desi_proc --night 20230210 --nostdstarfit --noprestdstarfit --expid 166998 --cameras a0123456789
INFO:proc_tilenight.py:256:main: running desi_proc --night 20230210 --nostdstarfit --noprestdstarfit --expid 166999 --cameras a0123456789

% desi_proc_tilenight -n 20230210 -t 5627 --mpi --mpistdstars --dryrun | grep "running desi_proc"
INFO:proc_tilenight.py:183:main: running desi_proc --night 20230210 --nostdstarfit --nofluxcalib --expid 166998 --cameras a0123456789
INFO:proc_tilenight.py:183:main: running desi_proc --night 20230210 --nostdstarfit --nofluxcalib --expid 166999 --cameras a0123456789
INFO:proc_tilenight.py:219:main: running desi_proc_joint_fit --night 20230210 --mpistdstars --obstype science --expids 166998,166999 --cameras a0123456789
INFO:proc_tilenight.py:256:main: running desi_proc --night 20230210 --nostdstarfit --noprestdstarfit --expid 166998 --cameras a0123456789
INFO:proc_tilenight.py:256:main: running desi_proc --night 20230210 --nostdstarfit --noprestdstarfit --expid 166999 --cameras a0123456789

I am still looking into the edge cases you cited and will leave it up to you whether to merge this now and address those as part of the upcoming refactor or wait until those have been addressed and then merge.

@marcelo-alvarez
Copy link
Contributor Author

Update on this:

I am still looking into the edge cases you cited and will leave it up to you whether to merge this now and address those as part of the upcoming refactor or wait until those have been addressed and then merge.

I created separate issues (#2025 and #2026) for these.

@sbailey
Copy link
Contributor

sbailey commented Mar 16, 2023

@marcelo-alvarez thanks for the updates; this looks good. I agree with spinning off the other cases to separate tickets and not letting them block this. We can sort out their level of urgency in those tickets.

@sbailey sbailey merged commit 399ddb6 into main Mar 16, 2023
24 checks passed
@sbailey sbailey deleted the tilenight-laststep branch March 16, 2023 18:17
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

Successfully merging this pull request may close these issues.

desi_proc_tilenight doesn't process skysub exposures
2 participants