Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

fix stdstar indexing crash #1872

Merged
merged 3 commits into from Oct 12, 2022
Merged

fix stdstar indexing crash #1872

merged 3 commits into from Oct 12, 2022

Conversation

sbailey
Copy link
Contributor

@sbailey sbailey commented Oct 11, 2022

This PR fixes #1871 where an indexing bug lead to standard star crashes. It happened to be on a backup tile with Gaia standards, but the actual bug was not directly related to that. It was a problem with a pre-filter that the standard stars have valid photometry and fragility with multiple ways of filtering: slicing frames, tracking lists of indices, tracking lists of booleans. This update simplifies the tracking by dropping the independent (and inconsistent in this case) tracking of indices, treating slicing frames as the primary method. It does still keep the starfibers array which is redundant with frame.fibermap['FIBER'] but convenient for logging and convenient since there are N>1 frames but they all have the same fiber selections (which is explicitly verified).

Command that crashes on main and now runs:

cd $CFS/desi/spectro/redux/himalayas
export SP=0 && export EXPID=00109351 && desi_fit_stdstars --delta-color 0.1 \
  --frames exposures/20211118/${EXPID}/frame-?${SP}-${EXPID}.fits.gz \
  --skymodels exposures/20211118/${EXPID}/sky-?${SP}-${EXPID}.fits.gz \
  --fiberflats calibnight/20211118/fiberflatnight-?${SP}-20211118.fits \
  --starmodels /global/cfs/cdirs/desi/spectro/templates/basis_templates/v3.2/stdstar_templates_v2.2.fits \
  --outfile $SCRATCH/stdstars-${SP}-${EXPID}.fits.gz

Detail 1: with the correct indexing, none of the standard stars pass SNR(B) cuts and it exists with an informative error message about that but not a crash like before. i.e. this PR doesn't recover that exposure.

Detail 2: this expid/spectrograph did "succeed" in daily, why not now? That turns out to be due to some bugs fixed in PR #1817. Previously (at the time daily was run) stdstar was not correctly scaling ivar to account for the fiberflat and sky, and using the wrong indices too, which led to over-optimistic SNR and having stars "pass" cuts. With the correct ivar propagation, they no longer pass. I suspect we will discover more classes like this before Himalayas/Iron are done.

Cross check: For spectrographs that do succeed, this PR produces the same output as current main (himalayas), e.g.

export SP=1 && export EXPID=00109351 && desi_fit_stdstars --delta-color 0.1 \
  --frames exposures/20211118/${EXPID}/frame-?${SP}-${EXPID}.fits.gz \
  --skymodels exposures/20211118/${EXPID}/sky-?${SP}-${EXPID}.fits.gz \
  --fiberflats calibnight/20211118/fiberflatnight-?${SP}-20211118.fits \
  --starmodels /global/cfs/cdirs/desi/spectro/templates/basis_templates/v3.2/stdstar_templates_v2.2.fits \
  --outfile $SCRATCH/stdstars-${SP}-${EXPID}.fits.gz

export SP=2 && export EXPID=00109340 && desi_fit_stdstars --delta-color 0.1 \
  --frames exposures/20211118/${EXPID}/frame-?${SP}-${EXPID}.fits.gz \
  --skymodels exposures/20211118/${EXPID}/sky-?${SP}-${EXPID}.fits.gz \
  --fiberflats calibnight/20211118/fiberflatnight-?${SP}-20211118.fits \
  --starmodels /global/cfs/cdirs/desi/spectro/templates/basis_templates/v3.2/stdstar_templates_v2.2.fits \
  --outfile $SCRATCH/stdstars-${SP}-${EXPID}.fits.gz

fitsdiff -k CHECKSUM,DEP\* -c DATASUM,CHECKSUM $CFS/desi/spectro/redux/himalayas/exposures/20211118/00109351/stdstars-1-00109351.fits.gz $SCRATCH/stdstars-1-00109351.fits.gz
fitsdiff -k CHECKSUM,DEP\* -c DATASUM,CHECKSUM $CFS/desi/spectro/redux/himalayas/exposures/20211118/00109340/stdstars-2-00109340.fits.gz $SCRATCH/stdstars-2-00109340.fits.gz

(SCRATCH=/global/cscratch1/sd/sjbailey if you want to look at the files without re-running yourself).

Caveats: bitwise identical if on KNL, otherwise close but not identical. Header keywords differences are due to pipeline being run with MPI thus including an mpi4py version, and harmless timestamps and development checkout versions vs. installed versions.

Came along for the ride: io.read_frame wasn't setting frame.spectrograph, now it is (which also required test updates which were generating frame files with inconsistent spectrographs given their fibers).

@sbailey sbailey added the crash label Oct 11, 2022
@sbailey sbailey requested a review from akremin October 11, 2022 21:43
@sbailey sbailey added this to In progress in Himalayas via automation Oct 11, 2022
Copy link
Member

@akremin akremin left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

A much appreciated change to simplify the selection logic from using both indices and booleans to just using booleans (mostly). I do have a question about re-initializing those boolean arrays inline that I would like the answer to before I approve this PR. I will also run the code on a random exposure to ensure that it does the right thing (but I haven't done that yet).

#- Confirm that all fluxes have entries but trust targeting bits
#- to get basic magnitude range correct
keep_legacy = np.ones(len(frame_starindices), dtype=bool)

keep_legacy = np.ones(len(frame_fibermap), dtype=bool)
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

keep_legacy, keep_gaia, n_legacy_std, and n_gaia_std all seem to be redefined each loop here over both exposures (outer loop) and cameras (inner loop), but then used generically later in the code outside of any loops. Is there an assumption made here that allows that to be true?

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

scripts.stdstars requires all input frames to be from the same spectrograph of the same tile, so if given correct inputs then this calculation should be redundant. This block of code at the end of the loop confirms that the selected stars from each frame are consistent with the ones cached from the first frame, which is the core assumption made later in the code that needs to be checked before proceeding:

            keep = keep_legacy | keep_gaia
            ...
            frame_starindices = frame_starindices[keep]

            if spectrograph is None :
                spectrograph = frame.spectrograph
                fibermap = frame_fibermap
                starindices=frame_starindices
                starfibers=np.asarray(fibermap["FIBER"][starindices])
            elif spectrograph != frame.spectrograph :
                log.error("incompatible spectrographs {} != {}".format(spectrograph,frame.spectrograph))
                raise ValueError("incompatible spectrographs {} != {}".format(spectrograph,frame.spectrograph))
            elif starindices.size != frame_starindices.size or np.sum(starindices!=frame_starindices)>0 :
                log.error("incompatible fibermap")
                raise ValueError("incompatible fibermap")

the most common user error would be providing frames from multiple spectrographs or multiple tiles, in which case simply checking FIBER and TARGETID values would be sufficient, but this loop adds additional checks on the consistency. The calculation is ~0.005 sec per frame so the runtime is trivial. There is in theory a pathological corner case where keep_legacy and keep_gaia could be different across frames while leaving keep = keep_legacy | keep_gaia the same, which this check would not catch. It also wouldn't catch if the selected stars had "valid" but different photometry while still having the same fibers.

The structure of the loop pre-dates this PR and could be refactored for arguably better maintainability by separating out the concept of "keep" (on one frame) from checking the consistency of TARGETID, FIBER, FLUX* , GAIA_PHOT_* etc. across all frames. But unless you think the current code is actively confusing or especially fragile, I suggest leaving it as is.

Copy link
Member

@akremin akremin Oct 12, 2022

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Thank you for the explanation. I now understand the flow of this and also agree that any inefficiency is trivial in the grand scheme of things.

Just for the record, the reference snippet above is the current main version of the code before your changes, which removed keep, starindices, and redefined starfibers. I believe your points still stand, however, so I'll approve this once I test an example.

@akremin akremin self-requested a review October 12, 2022 18:50
Copy link
Member

@akremin akremin left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I tested the new code with a random example from a recent daily bright tile-exposure (Exp 146748, Night 20221008):

cd $CFS/desi/spectro/redux/daily
export SP=6; export EXPID=00146748; export NIGHT=20221008
desi_fit_stdstars --delta-color 0.1   \
     --frames exposures/${NIGHT}/${EXPID}/frame-?${SP}-${EXPID}.fits.gz  \
     --skymodels exposures/${NIGHT}/${EXPID}/sky-?${SP}-${EXPID}.fits.gz   \
     --fiberflats calibnight/${NIGHT}/fiberflatnight-?${SP}-${NIGHT}.fits   \
     --starmodels /global/cfs/cdirs/desi/spectro/templates/basis_templates/v3.2/stdstar_templates_v2.2.fits \
    --outfile $SCRATCH/stdstars-${SP}-${EXPID}.fits.gz
fitsdiff -k CHECKSUM,DEP\* -c DATASUM,CHECKSUM $CFS/desi/spectro/redux/daily/exposures/$NIGHT/$EXP/stdstars-${SP}-${EXP}.fits.gz $SCRATCH/stdstars-${SP}-${EXP}.fits.gz

The outputs were identical except for non-consequential header keywords.

@akremin akremin merged commit d37c5bf into main Oct 12, 2022
Himalayas automation moved this from In progress to Done Oct 12, 2022
@akremin akremin deleted the stdcrash branch October 12, 2022 18:54
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
Projects
Development

Successfully merging this pull request may close these issues.

stdstar crash tile 82406 20211118/00109351
2 participants