Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

auto-lookup sky locations instead of using sky catalog even for good positioners #335

Open
sbailey opened this issue Apr 30, 2021 · 6 comments

Comments

@sbailey
Copy link
Contributor

sbailey commented Apr 30, 2021

Now that @dstndstn has implemented the on-the-fly lookup for whether stuck positioners are pointing at a blank sky location, this opens up the possibility of deprecating the DR9-based sky catalog and always looking up on the fly. Several cases:

  1. stuck positioner at fixed (x,y) projected to fixed (ra,dec) -> sky or not?
  2. bad phi motor, but can reliably rotate in theta: search a circle in (x,y) projected into an ellipse in (ra,dec) for sky locations that don't collide with neighbors
  3. bad theta motor, but can reliably rotate in phi: search an arc in (x,y) projected into an ellipse in (ra,dec) for sky locations that don't collide with neighbors.
  4. fully working positioner that isn't already assigned to a science target: instead of restricting its options to ~2 pre-selected sky targets that might not actually be reachable without causing neighboring collisions, let it search its entire non-colliding x,y space for options.

All of these would help us give more good fibers to science targets and reduce the number of genuinely unassigned fibers. I think we absolutely want/need to do (1) and (2) above and those are partially covered in other tickets. Item (4) is the new piece of this ticket.

If this was implemented, I think we could completely deprecate the DR9-based sky catalog, for which we currently spend a fair amount of I/O time, output disk space (listing targets that could have been covered), and (ra,dec) -> (x,y) mapping CPU time pre-calculating sky targets that we never use.

We would still need to support the Gaia-avoidance based "SUPP_SKY" target catalog for off-footprint tiles, or otherwise implement that version of "is this a sky" on the fly (thus bringing in the entire Gaia catalog as a input, which is non-trivial size).

@dstndstn @tskisner

@sbailey sbailey added this to To do in Main Survey Start via automation Apr 30, 2021
@tskisner
Copy link
Member

I agree that all this is a good idea. Just one note: currently the RA/DEC --> X/Y mapping of all targets (including the sky) available to each tile is very cheap. This is done in threaded C++ code.

Actually doing these "test movements" of each positioner looking for a sky location may be very expensive. Maybe it doesn't matter for realtime fiber assignment of a few tiles, but replaying fiberassign could suffer a hit. Currently there are only a handful of test movements (checking reachable targets for a collision). If this proposal is implemented, I would recommend trying to minimize the number of test angles / positions that are tried.

@sbailey
Copy link
Contributor Author

sbailey commented Apr 30, 2021

Good point about watching out for performance. My initial pseudo-algorithm idea was:

  1. start with only fibers that we might want to assign to sky locations
  2. make a grid of (ra,dec) points that the fiber covers
  3. trim to those points that cover sky locations
  4. project those sky locations to x,y
  5. loop through those x,y locations until finding one that doesn't collide with neighbors, breaking out as soon as a match is found (i.e. don't do expensive collision calculations for N>>1 sky locations unless you really have to).

(5) could use some optimization, since if one grid point is colliding its neighboring grid point likely is too so we'd rather test a very different point, but for long term reproducibility we don't want to randomize the order with a random number generator. e.g. maybe use several interleaved larger grids.

An additional complication is that if we move to using desimeter for (ra,dec) -> (x,y) to get more accurate transforms, it's unclear to me how the C++ code could peek back out in to python-land to make that call. It's straightforward to do that ahead of time for known target locations, but less straightforward for on-the-fly sky locations.

@tskisner
Copy link
Member

Some comments:

for (1), you don't know that subset of fibers ahead of time, since you don't know which fibers are assigned to targets that might get bumped for sky.

The rest of the steps are basically what is already done in the code (including randomly trying the sky locations based on subpriority), except using RA/DEC points loaded from a giant FITS file. In the C++ memory, each target takes up about 500 bytes. Add another ~128 bytes to store the X/Y position on a tile (ignoring multiple tiles overlapping a target at the moment). So even rounding up to 1KB per target, we have ~1MB to store 1 million sky targets generated on the fly.

Suggestion: Why not keep all the existing infrastructure and just generate the sky targets on the fly (in python) rather than reading them from a FITS file? We should take some care that this list of generated sky targets is reproducible (i.e. set any random seeds based on the tile ID, etc).

This seems much simpler than introducing new complicated logic and special cases (we have enough of those already)...

@sbailey
Copy link
Contributor Author

sbailey commented May 4, 2021

Generating on-the-fly a bunch of skies covering the whole tile (and including the special cases for disabled theta and/or phi per positioner) is an interesting possibility; the limiting factor might be doing that while replaying on N>>1 mocks on M>>1 tiles (one tile currently requires reading O(100) skybrick files for full coverage). Note that 1M targets is O(1GB) of skies, not O(1MB), but even 100k targets would be nicely dense.

My original suggestion was to wait until science targets had been initially assigned to know which fibers might get skies, and do the possible sky lookup only in the regions covered by those fibers under the assumption that the sky lookup logic in the blob maps was semi-expensive. This would require updating the Assignment object sky target list after the initial assign_unused calls for science and standards before proceeding with sky (e.g. assign.py line 1612). Is that possible as-is, or is setting the target list only possible at the time of constructing the Assignment object?

Side note: The current pre-calculated sky files are quite large because they carry along a bunch of columns that feel optional (BGS_TARGET, FIBERFLUX_IVAR_*, PRIORITY_INIT, ...); I suspect that was driven more by I/O format consistency with normal target files rather than really needing all of those columns, though some of them (FIBERFLUX_*) were motivated by postfacto QA checks on sky spectra.

@tskisner
Copy link
Member

tskisner commented May 4, 2021

Thanks for thinking about this more @sbailey . Currently the target properties (Targets class) and available targets for each tile (TargetsAvailable class) are computed before using the Assignment class. A shared pointer to these is passed to the Assignment constructor. The constructor then computes the X/Y target positions on each tile (this information is computed once and used repeatedly when attempting to move positioners to target locations). So, to introduce new targets would need to append those to the Targets instance, construct a new TargetsAvailable, and update the information in the Assignment class.

Obviously nothing is impossible, but it does require some changes since this is a departure from the original design / requirements.

Question: do we save much in terms of skybrick I/O by loading the bricks for a subset of positioners scattered over the tile versus loading the skybricks that cover the whole tile? If this I/O is the dominant cost, then it seems like we would not save much...

@sbailey
Copy link
Contributor Author

sbailey commented May 4, 2021

Updates:

  1. Dustin's skybricks are ~1deg2, so with edge effects a tile requires reading ~16 files instead of the O(100) I had originally estimated
  2. Using Lookup for dynamic sky fiber placement desitarget#722 I did some timing tests with SKYBRICKS_DIR=/global/cscratch1/sd/dstn/skybricks/ . Looking up skies for a tile takes ~1.1 sec and scales with the number of files to read, not the number of targets that cover those tiles (e.g. 50k lookups is only 0.1s slower than 5k lookups)
  3. I haven't dug into whether the time is actually I/O, or something like WCS solutions on the skybricks after reading.

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Projects
Development

No branches or pull requests

2 participants