Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

quickcat2 #85

Merged
merged 17 commits into from
Feb 29, 2016
Merged

quickcat2 #85

merged 17 commits into from
Feb 29, 2016

Conversation

sbailey
Copy link
Contributor

@sbailey sbailey commented Feb 18, 2016

The original quickcat branch/PR had gotten old and messy to merge, so I started fresh with a new quickcat2 branch, borrowing pieces from the previous (which will be abandoned).

This PR adds:

  • desisim.quickcat module to quickly generate a redshift catalog given fiberassign tiles and input truth
  • a desisim/bin/quickcat wrapper script for use at the command line
  • tests with 100% coverage

The redshift performance is based upon Redmonster applied to zdc1 data by @gdhungana as documented in DESI-1657 (his DESI data telecon presentation from last week). I hope we'll be able to improve the zwarn efficiency numbers, but let's start realistically pessimistic for now.

quickcat includes a perfect=True option to maintain input z -> output z.

@gdhungana please review the redshift errors and zwarn fractions.

@forero please check the module interface — is this usable for quicksurvey?

@gdhungana
Copy link
Contributor

@sbailey Thanks for this reorganization. The numbers look okay other than for QSO. 255 km/s is for the distribution excluding catastrophic failures that still have zwarn=0. So for zwarn fraction of 1-4094/5000, the redshift error should be 423 km/s. We may also somehow need to address these catastrophic failures.
Also these target bits look different than what I was seeing in https://github.com/desihub/desitarget/blob/master/py/desitarget/targetmask.yaml
Are these redefined somewhere?
If that is true,
@forero When you have time, can you produce some FA output test files with now redefined target bit masks?

@sbailey
Copy link
Contributor Author

sbailey commented Feb 18, 2016

The hardcoded TYPE mapping is not formally defined; it was taken from current fiberassign usage. After discussions with @rncahn, we're working on yet another update to the formats to clean this up. More updates after that settles down.

@forero
Copy link
Member

forero commented Feb 19, 2016

@sbailey @rncahn what are exactly the planned updates?

@sbailey
Copy link
Contributor Author

sbailey commented Feb 20, 2016

this quickcat2 branch and current fiberassign use the truth file "TYPE" as an integer index into the fiberassign features file type categories such as:

0 : 'QSO',      #- QSO-LyA
1 : 'QSO',      #- QSO-Tracer
2 : 'LRG',      #- LRG
3 : 'ELG',      #- ELG
4 : 'STAR',     #- QSO-Fake
5 : 'UNKNOWN',  #- LRG-Fake
6 : 'STAR',     #- StdStar
7 : 'SKY',      #- Sky

This is convenient for fiberassign_surveysim but not for anything else. I propose to go back to file formats that most closely match the final data flow, and add a translation script to convert from the MTL target+truth files into whatever Bob needs for his "secret" file with this integer TYPE index. The point of that is to maintain a working fiberassign_surveysim while also providing a reasonable dataflow for getting the more complete quicksurvey sim going.

Everytime we try to implement this data flow some new issue comes up, but I think we are getting closer to what we want. Here is my draft of what I think should be in each file:

targets: input to merged target list (mtl)
    TARGETID            i8  -- global unique identifier
    BRICKNAME           S8
    RA                  f8
    DEC                 f8
    DESI_TARGET         i8  -- target masks, as defined in desitarget.yaml
    BGS_TARGET          i8
    MWS_TARGET          i8
    SUBPRIORITY         f8  -- [0-1] for choosing between equivalent targets

mtl: input to fiberassign
    [targets columns]
    NUMOBS              i4  -- number of additional observations wanted
    PRIORITY            i4
    LASTPASS            i4  -- this is a hack (i2, bool?)

truth: input to quickcat
    TARGETID            i8
    TRUEZ               f4
    TRUETYPE            S10 -- GALAXY, QSO, STAR, SKY [ELG, LRG, ...]
    -- optional
    RA, DEC, BRICKNAME

zcatalog: output from spectro pipeline and quickcat
    TARGETID            i8
    BRICKNAME           S8
    Z                   f4
    ZERR                f4
    ZWARN               i4
    TYPE                S8  -- (not S7, not i8)
    NUMOBS              i4  -- number observed (different from mtl.NUMOBS)

tile: output of fiberassign
    -- from targets/mtl
    TARGETID            i8
    BRICKNAME           S8  -- new
    RA                  f8
    DEC                 f8
    DESI_TARGET         i8  -- renamed
    BGS_TARGET          i8  -- new
    MWS_TARGET          i8  -- new
    -- added by fiberassign
    FIBER               i4
    POSITIONER          i4
    NUMTARGET           i4  -- how many targets does this cover
    XFOCAL_DESIGN       f4
    YFOCAL_DESIGN       f4
    -- maybe
    PRIORITY            i4
    SUBPRIORITY         f8  -- [0-1]
    -- dropping from MTL
    NUMOBS
    LASTPASS

secret: for fiberassign_surveysim row matched to mtl
    -- renaming fiberassign "id" / MTL truth "TYPE" -> "CATEGORY"
    CATEGORY            i8  -- 0..n
    -- maybe (or keep mapping in features file)
    NUMOBS_PRE          i4
    NUMOBS_POST         i4
    PRIO_PRE            i4
    PRIO_POST           i4

This impacts fiberassign, desitarget, and desisim for them all to coordinate. I'm drafting code to try this out now to see if it is viable. Please comment if you think anything is missing/wrong/ill-advised.

@moustakas
Copy link
Member

This still has the BGS,MWS != DESI bias.... (I'm thinking specifically of DESI_TARGET.) Should this be DARK_TARGET or is this choice already too entrenched...?

@forero
Copy link
Member

forero commented Feb 23, 2016

  • In targets. My only question concerns SUBPRIORITY. If it's only going to be used for choosing between equivalente targets, is it simply a random number? Why not labeling it RANDOM_NUMBER?
  • In truth. I find TRUETYPE inconvenient as string. Inside quickcat it would be more convenient to have it as a 64bit mask to use targetmask.yaml. With this in mind, perhaps we need TRUE_DESITYPE, TRUE_BGSTYPE, TRUE_MWSTYPE given that this split DESI/BGS/MWS is all over the place. But this points to @moustakas comment above, do we want to keep DESI, BGS and MWS separate?
  • In zcatalog. Same as in truth. Why not having TYPE as a 64bit mask? It would make life easier inside the MTL code.
  • In tile. I don't understand what NUMTARGET is. Is it the number of available targets per fiber?
  • In secret. There are variables labeled _PRE and _POST. Those are before and after what exactly?

@sbailey
Copy link
Contributor Author

sbailey commented Feb 23, 2016

@moustakas the choice isn't entrenched, but it also wasn't random. After a bit of discussion about DESI_TARGET vs. DARK_TARGET, we originally went with DESI_TARGET as containing a combination of:

  • Dark survey target bits
  • General calibration bits like sky and various kinds of standard stars
  • BGS_ANY and MWS_ANY bits to indicate that another bit mask contains additional information that you may want.
    So there is a whiff of a BGS,MWS != DESI bias, but it is more than just DARK_TARGET bits.

@forero, thanks for the comments.

In targets. My only question concerns SUBPRIORITY. If it's only going to be used for choosing between equivalente targets, is it simply a random number? Why not labeling it RANDOM_NUMBER?

The dark time survey will likely use this as a random number — its existence was based upon Daniel's suggestion to put the random choice in the target file itself rather than the code. However, the Milky Way Survey would like to rank order their targets, so for them it is a real priority. The overall idea is that you could add the integer PRIORITY with the [0-1] floating point SUBPRIORITY and get the order in which you should chose targets.

In truth. I find TRUETYPE inconvenient as string. Inside quickcat it would be more convenient to have it as a 64bit mask to use targetmask.yaml. With this in mind, perhaps we need TRUE_DESITYPE, TRUE_BGSTYPE, TRUE_MWSTYPE given that this split DESI/BGS/MWS is all over the place. But this points to @moustakas comment above, do we want to keep DESI, BGS and MWS separate?

The challenge that we previously didn't anticipate is that the targeting bits don't cover all the possibilities for true types. e.g. DESI_TARGET.QSO means something that was targeted as a QSO, but doesn't include things that actually are QSOs but have colors that fail the target selection. Similarly for genuine stars that pass QSO target selection cuts (DESI_TARGET.QSO) but not the star cuts (DESI_TARGET.STD_*).

So TRUETYPE was intended to be a map of what a redshift fitter would report if everything was working perfectly. Currently our redshift fitters are outputting strings, which is sometimes convenient (for humans) and sometimes a pain (slow for code).

In zcatalog. Same as in truth. Why not having TYPE as a 64bit mask? It would make life easier inside the MTL code.

These don't necessarily have to be a string, but within python it's not that bad to work with. See the mtl branch of desitarget for my current draft of working with these files. I'm willing to reconsider if it turns out to be a huge pain, but let's try the string version for a bit longer.

In tile. I don't understand what NUMTARGET is. Is it the number of available targets per fiber?

Hehe. This one is from you! Yes, it is the number of targets that are covered by this fiber, and is used in interpreting HDU2 of the tile file that lists exactly which targets those are.

In secret. There are variables labeled _PRE and _POST. Those are before and after what exactly?

The original idea was that this was the priority and numobs before taking any observations, and the priority and numobs after you have taken the first observation and determined what type it actually is. However, I'm having second (third?) thoughts about that, and am thinking about going back to having a simple integer CATEGORY in the truth file which is used to look up the pre/post values in the features files. That CATEGORY is redundant with a combination of (Z, ZWARN, TRUETYPE, and *_TARGET), but it might be the least hassle while we are transitioning from the internal fiberassign_surveysim to the full quicksurvey = mtl+fiberassign+quickcat simulation loop. I'll work on this this afternoon.

@sbailey
Copy link
Contributor Author

sbailey commented Feb 24, 2016

I updated this branch to use the new formats described in fiberassign PR #24 and desitarget PR #32.

@forero please review these 3 pull requests as a coordinated set, and let me know if you have questions!

NOTE: I have not yet updated the bin/quickcat script to match the desisim.quickcat module. I have to run right now, but please wait for the bin/quickcat script to also be updated before doing the final merge. But you can use the python module as:

zcat_new = quickcat(tilefiles, targets, truth, zcat=zcat_previous, perfect=False)

@gdhungana example files for using this version of quickcat are at NERSC in /project/projectdirs/desi/mocks/preliminary/mtl/v2

@forero
Copy link
Member

forero commented Feb 26, 2016

I tested it as shown in this python script. It worked but it took a long time to run on the whole survey. Something like 25 min on Edison (for comparison running fiberassign takes ~20min). This quickcat implementation might need some optimization in the future.

@sbailey
Copy link
Contributor Author

sbailey commented Feb 29, 2016

I'll merge this now, even if it is slow. I used astropy Tables for their database-like join syntax convenience for combining the truth catalog with the zcatalog and that may be the cause of the slowness. The other possible reason is the counter of how many times has each target been observed (I used collections.Counter from the python standard library, but there might be a faster way).

sbailey added a commit that referenced this pull request Feb 29, 2016
@sbailey sbailey merged commit 52e38dc into master Feb 29, 2016
@sbailey sbailey deleted the quickcat2 branch February 29, 2016 05:28
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Projects
None yet
Development

Successfully merging this pull request may close these issues.

4 participants