emlinefit hangs on 82406/20211118/coadd-5-82406-thru20211118.fits #1985

Closed
sbailey opened this issue Jan 27, 2023 · 4 comments

@sbailey
Contributor

sbailey commented Jan 27, 2023

@araichoor the following emlinefit command hangs (no errors, no crash, it just gets stuck doing something):

```shell
desi_emlinefit_afterburner \
  --coadd /global/cfs/cdirs/desi/spectro/redux/iron/tiles/cumulative/82406/20211118/coadd-5-82406-thru20211118.fits \
  --redrock /global/cfs/cdirs/desi/spectro/redux/iron/tiles/cumulative/82406/20211118/redrock-5-82406-thru20211118.fits \
  --output $SCRATCH/emline-5-82406-thru20211118.fits
```

The same occurs for spectrograph 6, but the other petals from that tile work fine.

This causes the ztile job to time out, leaving those two emline files missing. Please check and see if you can implement a fix that wouldn't change the answer for anything that has already been run.

@araichoor
Copy link
Contributor

araichoor commented Jan 27, 2023

investigation report:

I guess the reason is that it's a tile with insane EBV: it's a special backup test tile at Galactic b=-2...
in short:

  • in emlinefit, the flux is corrected for Galactic extinction, here:

    ```python
    if mwext_corr:
        # AR TBD: use a smarter way with no loop on tids...
        # AR TBD: but as 500 rows at most, ~ok
        tmpexts = ext_odonnell(tmpw, Rv=rv)
        for i in range(nspec):
            tmp_mw_trans = 10 ** (-0.4 * ebvs[i] * rv * tmpexts)
            tmpfl[i, :] /= tmp_mw_trans
            tmpiv[i, :] *= tmp_mw_trans ** 2
    ```

  • some fibers have flux > 1e4 (in units of 1e-17 erg/s/cm2/A), which shoots up to 1e6-1e7 after correction; this is likely why the code hangs.
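The per-spectrum loop above (flagged "AR TBD" in the code) can be written without a loop using NumPy broadcasting. A minimal sketch, assuming `exts` is the extinction curve A_lambda/A_V already evaluated on the wavelength grid (in emlinefit that array comes from `ext_odonnell(tmpw, Rv=rv)`); the helper name `mw_correct` is hypothetical:

```python
import numpy as np


def mw_correct(flux, ivar, ebvs, exts, rv=3.1):
    """Correct fluxes for Galactic extinction without a per-spectrum loop.

    Hypothetical helper mirroring the emlinefit loop:
      flux, ivar : (nspec, nwave) arrays
      ebvs       : (nspec,) E(B-V) values, one per spectrum
      exts       : (nwave,) extinction curve A_lambda / A_V on the wavelength grid
    """
    ebvs = np.asarray(ebvs, dtype=float)
    exts = np.asarray(exts, dtype=float)
    # (nspec, 1) * (1, nwave) broadcasts to a (nspec, nwave) transmission array
    mw_trans = 10 ** (-0.4 * ebvs[:, None] * rv * exts[None, :])
    # dividing the flux by the transmission dereddens it;
    # ivar scales as the inverse square of that flux scaling
    return flux / mw_trans, ivar * mw_trans ** 2
```

For example, with an illustrative E(B-V) = 1 and A_lambda/A_V = 1.5, a flux of 1e4 is multiplied by 10**(0.4 * 3.1 * 1.5), about 72, which shows how quickly the correction grows on a low-latitude tile.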

a possibility is to simply remove that tile :)
if we want a code fix for that, nulling the ivar for insane fluxes works.
i.e. add the following lines at the end of this loop:

```python
# AR nulling ivar for insane flux
sel = tmpfl[i, :] > 1e6
if sel.sum() > 0:
    tmpiv[i, sel] = 0
```

we could add a condition to perform this test only if TILEID=82406, in which case we would be 100% certain that we do not change any computation for other tiles.
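The proposed guard, including the optional restriction to TILEID=82406, could be packaged as a small helper. A minimal sketch; `null_insane_ivar`, `MAX_FLUX` and `AFFECTED_TILEIDS` are hypothetical names, and the 1e6 threshold is the one suggested above:

```python
import numpy as np

# Hypothetical constants: the flux cut proposed in the thread and the tile it targets
MAX_FLUX = 1e6  # same 1e-17 erg/s/cm2/A units as the coadd flux
AFFECTED_TILEIDS = {82406}


def null_insane_ivar(flux, ivar, tileid, max_flux=MAX_FLUX, tileids=AFFECTED_TILEIDS):
    """Return a copy of ivar with pixels whose flux exceeds max_flux set to zero.

    If tileids is not None, the cut is applied only to those tiles, so any
    other tile gets back an unchanged copy of its ivar.
    """
    ivar = np.array(ivar, copy=True)  # keep the caller's array (and dtype) untouched
    if tileids is not None and tileid not in tileids:
        return ivar
    ivar[np.asarray(flux) > max_flux] = 0
    return ivar
```

Passing `tileids=None` applies the cut everywhere; keeping the tile restriction is what guarantees that previously processed tiles see no change.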

a few further details:

here is the position of this tile in Galactic coordinates (displaying all iron tiles):
[Figure: all iron tiles in Galactic coordinates (tmp-iron-gal)]

(and here the tile design: https://desi.lbl.gov/trac/wiki/SurveyOps/TileDesigns#a82399-82411-BACKUPprogramtests)

I looked at the first row where it hangs, i.e. TARGETID=2305843012414221480 (index 103; it's a Gaia/G2 star of mag ~15).
note that it also hangs for other rows.

the code hangs here when trying to fit the OII line in the 3687.2 A < lambda_obs < 3769.6 A region:

```python
popt, pcov = curve_fit(
    myfunc,
    waves[keep_line],
    fluxes[keep_line],
    p0=p0,
    sigma=1. / np.sqrt(ivars[keep_line]),
    maxfev=10000000,
    gtol=1.49012e-8,
    bounds=bounds,
)
```
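With `bounds` given, `curve_fit` uses the `trf` method and forwards `maxfev` to `scipy.optimize.least_squares` as `max_nfev`, so `maxfev=10000000` allows up to ten million model evaluations before giving up; if the optimizer never meets its tolerances, that can look like a hang. A minimal sketch of a fail-fast variant that caps the evaluation budget and returns `None` on failure. The Gaussian model, `fit_line_or_none`, and the cap of 10000 are illustrative assumptions, not emlinefit's code and not necessarily what PR #2195 does:

```python
import numpy as np
from scipy.optimize import curve_fit


def gaussian(wave, amp, center, sigma):
    """Illustrative single-Gaussian line model (a stand-in for emlinefit's myfunc)."""
    return amp * np.exp(-0.5 * ((wave - center) / sigma) ** 2)


def fit_line_or_none(waves, fluxes, ivars, p0, bounds, max_evals=10000):
    """Fit the line model, returning (popt, pcov), or None if the fit fails.

    Pixels with ivar <= 0 (e.g. nulled by an ivar guard) are dropped so that
    sigma = 1 / sqrt(ivar) stays finite.
    """
    keep = ivars > 0
    try:
        return curve_fit(
            gaussian,
            waves[keep],
            fluxes[keep],
            p0=p0,
            sigma=1.0 / np.sqrt(ivars[keep]),
            maxfev=max_evals,  # passed on as max_nfev, since bounds select 'trf'
            bounds=bounds,
        )
    except (RuntimeError, ValueError):
        # RuntimeError: no convergence within max_evals
        # ValueError: invalid inputs, e.g. p0 outside the bounds
        return None
```

A lower cap could change the answer for fits that previously converged only after many evaluations, which matters given the request not to change results for anything already run.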

here is the flux for that spectrum, with the region used for the fit highlighted in light grey:
[Figure: flux of TARGETID 2305843012414221480 (tmp-2305843012414221480)]
the flux there can be ~1e4 (in units of 1e-17 erg/s/cm2/A); when corrected with the super-high extinction factor, it shoots up to ~1e6-1e7 in the same units.
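The quoted fluxes imply a very large extinction at the fit wavelengths. A worked check, assuming `tmpexts` in the emlinefit snippet is the curve $A_\lambda/A_V$, so that $A_V = R_V\,E(B-V)$ and dividing by `tmp_mw_trans` multiplies the flux by $10^{0.4 A_\lambda}$:

```latex
\begin{aligned}
\frac{F_{\mathrm{corr}}}{F_{\mathrm{obs}}}
  &= 10^{0.4\,E(B-V)\,R_V\,(A_\lambda/A_V)}
   = 10^{0.4\,A_V\,(A_\lambda/A_V)}
   = 10^{0.4\,A_\lambda}, \\
\frac{F_{\mathrm{corr}}}{F_{\mathrm{obs}}}
  &\approx \frac{10^{6}\text{--}10^{7}}{10^{4}} = 10^{2}\text{--}10^{3}
  \;\Longrightarrow\;
  A_\lambda = 2.5\,\log_{10}\frac{F_{\mathrm{corr}}}{F_{\mathrm{obs}}} \approx 5\text{--}7.5\ \mathrm{mag}.
\end{aligned}
```

So in the [OII] fit window this spectrum is being corrected for roughly 5 to 7.5 magnitudes of Galactic extinction.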

@moustakas
Member

I agree it's a pretty extreme position in the sky, but, FWIW, FastSpecFit doesn't crash on this tile, so I'm not sure throwing it away completely is the right choice.

@sbailey
Contributor Author

sbailey commented Jan 30, 2023

Accepting 2 missing afterburner files is indeed better than discarding the entire tile, even if it was a backup program test tile :)

Although the flux numbers are big, they don't seem big enough to run into overflow issues that would actually cause a fit failure. I think there is something else making this a pathological case.
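A quick numeric check of that claim: the largest finite values are about 3.4e38 for float32 and 1.8e308 for float64, so even squaring dereddened fluxes of ~1e7 and summing over many pixels stays far from overflow. A minimal sketch; `overflow_headroom` is a hypothetical helper, and the 1000-pixel count is a deliberately generous illustrative value:

```python
import numpy as np


def overflow_headroom(max_flux, npix, dtype):
    """Ratio of dtype's largest finite value to a worst-case sum of squared fluxes.

    Worst case here means every pixel's residual is as large as the flux itself.
    """
    f = np.full(npix, max_flux, dtype=dtype)
    sum_sq = np.sum(f ** 2)
    return float(np.finfo(dtype).max / sum_sq)


# Illustrative numbers: ~1e7 dereddened flux, a generous 1000 pixels in the fit window
for dtype in (np.float32, np.float64):
    print(np.dtype(dtype).name, overflow_headroom(1e7, 1000, dtype))
```

Even in float32 the headroom is more than twenty orders of magnitude, which supports looking for a different cause than overflow.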

@sbailey
Contributor Author

sbailey commented Mar 14, 2024

fixed by PR #2195

@sbailey sbailey closed this as completed Mar 14, 2024