Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

handle multiple ToO files #437

Merged
merged 4 commits into from Nov 1, 2022
Merged

handle multiple ToO files #437

merged 4 commits into from Nov 1, 2022

Conversation

araichoor
Copy link
Contributor

@araichoor araichoor commented Oct 3, 2022

This PR handles the use of several ToO files in the --custom_too_file argument.

This is in the context of the calibration fields (https://desi.lbl.gov/trac/wiki/SurveyOps/CalibrationFields), where we will:

  • use the tertiary program approach to design those (those feeding --custom_too_file with a list of regular targets)
  • want to add actual ToOs.

The current main code version does not allow this, as the --custom_too_file is already "taken" for the regular targets (choice made when implementing the tertiary program approach, with "hacking" the ToO channel for the tertiary program targets).
With this PR, one can provide 2 or more ToO input files.

In fba_launch_io.create_too(), all the ToO files are read (+cut), before being "vstacked" in a single catalog.

When reading the individual ToO files, if missing, we had the NUMOBS, NUMOBS_MORE, PRIORITY columns, with setting those to NUMOBS_INIT, 0, PRIORITY_INIT (otherwise, the "vstacking" would set those to zero).

And we perform two safety checks before "vstacking", as:

  • verify that the column content is similar;
  • there are no duplicated TARGETIDs.

This last check (no duplicated TARGETIDs) sounds safe to me, as I cannot foresee why we would want to provide several files with duplicates.
One smoother way to deal with such a case could be to arbitrarily pick one representation of each TARGETID.
But that could cause ambiguity/discrepancy in knowing which was the parent catalogue for that TARGETID.
Provided that, I think I ll defer to a later PR the handling of such cases.

If only one catalog is provided, the code should behave as in the current main version.

For sanity, I have assessed that the main bright/dark tiles can be rerun with this branch.


Updates from Oct., 31, 2022:

This PR also includes the following (thanks to suggestions):

  • if ToO files are provided with --custom_too_tile, they (1) have to exist; (2) have in be in $DESI_SURVEYOPS;
  • a new --custom_too_development argument to fba_launch, which disables the latter above check (2);
  • if multiple ToOs are provided, they are stored in the header as TOO, TOO2, TOO3, etc (in a similare fashion as for the SCND, SCN2, etc).

@geordie666
Copy link

If I understand correctly, the custom_too_file will create ToOs that have TARGETIDs with RELEASE=8888. Is that correct?

I'm just checking that there's no chance of creating duplicate TARGETIDs from the "regular" and "tertiary" approaches downstream of fiberassign. I think that can never be the case if RELEASE=8888 for all tertiary targets, including "custom ToOs".

@araichoor
Copy link
Contributor Author

Thanks for the question.

This part of the code just reads the ToO files, and feed them fiberassign.
i.e. --custom_too_file just takes the file names; the actual ToO files have been created beforehand.

For the tertiary program, we do use RELEASE=8888 (and BRICKID=PROGNUM) to generate the TARGETID.
For the "regular" ToO-fiber.ecsv, I believe desitarget uses RELEASE=9999, right?

So there should be no possible TARGETID issue downstream, hopefully!

@geordie666
Copy link

I think we're fine unless the --custom_too_file can be any file. It would seem fragile, to me, to be able to pass any file because then people could accidentally duplicate a TARGETID for a ToO.

Do we check, somehow, that the --custom_too_file is always from the allowed directory paths in $SURVEYOPS or $MTL_DIR?

@araichoor
Copy link
Contributor Author

thanks for the clarification.
the --custom_too_file can by any file.

no, there is no such checks for now.

note that this --custom_too_file was introduced ~six months ago, but I see your point.

I could add a check that the RELEASE from decode_targetid() only has "safe" values.
I could restrict to RELEASE in 8888,9999; or exclude ; but I m wondering if that will not block us in the future? but maybe that s ok, we will then just update the code.

about controlling that the --custom_too_file paths are from $DESI_SURVEYOPS/mtl or $DESI_SURVEYOPS/tertiary:
that could make sense; though it will prevent -- or complicate a lot -- all the development work before finalizing a tile, which I perform in some of my local folders.
so, unless suggestions of how to overcome that, I d vote against such a check...

I ll keep thinking on that; thanks for raising that!

@geordie666
Copy link

Thanks — I'm not really worried about the RELEASE number.

My main concern is that somebody uses the select_too script (or the related desitarget.ToO.ledger_to_targets()) to make a new, custom ToO file and then that file is passed as the --custom_too_file.

The issue is that the TARGETID for each ToO is based on its row number in the ToO ledger. This made sense at the time, as we only expected to ever have one ToO ledger. But, if it's possible for extra ToO ledgers to be built in a custom fashion, then we could potentially produce duplicate TARGETIDs. That's why I've always been so careful to add to the bottom of the ToO ledger in $MTL_DIR rather than ever creating a new ToO ledger or ledgers.

I think this is probably fine as long as @araichoor and I hold the line that all new ToOs are always added to the one, main ToO ledger. It is a tad fragile, though.

@araichoor
Copy link
Contributor Author

I think I better understand your concern; though:

  • I remind that the goal here is only a change to allow the simultaneous use of the "regular" ToO in a tertiary program, as I "hijacked" the too channel for the tertiary program;
  • the scenario you describe seems very unlikely to me, as one would have to (1) generate a ToO file; (2) design the tile with it; (3) svn-commit the tile; (4) svn-checkout the tile at kpno; those steps will be executed by someone expert in the surveyops team (e.g. you and me as of today; someone with transferred knowledge in the future).

but.. I would understand we want to be extra-secure about those TARGETIDs.
I could:

  • implement @geordie666 suggestion to require that any file provided in --custom_too_file comes from $DESI_SURVEYOPS;
  • add (another...) argument like --development (boolean, defaulting to False), which would disable this requirement (--custom_too_file comes from $DESI_SURVEYOPS), so that I can do development on local, not-official-yet files.

@schlafly @geordie666 : what do you think?
do we consider the current PR is "safe enough"?
or shall I implement that suggestion? (or any better!)

thanks!

@geordie666
Copy link

@araichoor: I'm actually comfortable with the current level of security (I agree that it's highly unlikely that other people than you or me, or a couple of others, would use the --custom_too_file option). I just wanted to record my concerns here so enough people are aware that we shouldn't really be making "custom" ToO files because of the TARGETID concerns I've stated.

It would be nice (and somewhat safer) to implement the suggestion to restrict path-options for the --custom_too_file. But, I don't think that should be a blocking factor for this PR. For now, we can assume that you know what you're doing and other people should be wary of using that option. I'm happy for you to merge this PR if you need to make progress.

@araichoor
Copy link
Contributor Author

@geordie666 : the code change to control the --custom_too_file paths are as expected and the addition of a --development like argument to fba_launch are minimal.
so I ll implement those now; it won t delay me much; and it will be done.

btw, you wrote: "Do we check, somehow, that the --custom_too_file is always from the allowed directory paths in $SURVEYOPS or $MTL_DIR?"
=> as of today, $MTL_DIR is not used in fiberassign; the "official" ToO file, and the tertiary target ToO files are in $DESI_SURVEYOPS, so maybe I ll even restrict to $DESI_SURVEYOPS? (as a -- good -- consequence, that will force us to store in $DESI_SURVEYOPS any future custom ToO file we would make).

@geordie666
Copy link

Yep, I think restricting to $DESI_SURVEYOPS makes a lot of sense. Thanks!

@araichoor
Copy link
Contributor Author

great.

actually, while doing the coding, I m thinking of a corner case: if we design the tile at kpno, and if we want to make a rerun at nersc (or the opposite), then check on $DESI_SURVEYOPS will fail for the rerun.
I guess that is super-unlikely, but I ll add a case-handling for that in https://github.com/desihub/fiberassign/blob/main/bin/fba_rerun (using the DESIROOT info in the header).

@araichoor
Copy link
Contributor Author

I ve pushed commits now implementing:

  • if ToO files are provided with --custom_too_tile, they (1) have to exist; (2) have in be in $DESI_SURVEYOPS
  • a new --custom_too_development argument to fba_launch, which disables the two above checks.

The requirement that the provided files have to exist is not compulsory; I thought it made sense though, as I see it a safeguard against typo in providing the file path, which would otherwise be silently ignored (the code wouldn t find any file, then proceed with adding zero targets).

If we re happy with that, I ll update the first comment of this PR with those changes.

@geordie666
Copy link

Thanks for adding these checks, Anand. I think you've addressed my concerns. Others might want to comment, of course, but I think this looks good to merge.

@moustakas
Copy link
Member

Bringing an off-list discussion to this PR:

For downstream parsing of the fiberassign file headers (e.g., for the targetphot/tractorphot VACs), it would be more convenient to have the multiple TOO filenames match the secondary targeting filename model, i.e., use TOO, TOO2, TOO3, to match SCND, SCND2, SCND3, etc., instead of using a comma-separated list of TOO files.

@araichoor
Copy link
Contributor Author

thanks for bringing that point here @moustakas.
the commit I just pushed addresses that point.

I ll update the original PR message with the changes made the -- useful! -- discussions here.

I plan to merge that PR tomorrow morning.

@araichoor araichoor merged commit 559ff09 into main Nov 1, 2022
@araichoor
Copy link
Contributor Author

for info, @moustakas:

I ve designed with those changes the following tiles (calibration fields on xmm-lss):
https://data.desi.lbl.gov/desi/survey/fiberassign/special/tertiary/0005/083/fiberassign-083000.fits.gz
https://data.desi.lbl.gov/desi/survey/fiberassign/special/tertiary/0006/083/fiberassign-083020.fits.gz

if you ve time, it would be great if you could confirm your scripts run smoothly on those!
I expect things to be fine, but better safe than sorry...

further infos, if relevant:

raichoor@cori08:~> fitsheader -e 0 -k TOO* $DESI_ROOT/survey/fiberassign/special/tertiary/0005/083/fiberassign-083000.fits.gz 
# HDU 0 in /global/cfs/cdirs/desi/survey/fiberassign/special/tertiary/0005/083/fiberassign-083000.fits.gz:
TOO     = 'DESIROOT/survey/ops/surveyops/trunk/tertiary/0005/ToO-0005-083000.e&'CONTINUE  'csv     '                                                            
TOO2    = 'DESIROOT/survey/ops/surveyops/trunk/mtl/main/ToO/ToO.ecsv'
  • those have been run with the --nosteps scnd option, ie the secondary folders/files have not been queried.

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

Successfully merging this pull request may close these issues.

None yet

3 participants