quickcat2 #85

sbailey · 2016-02-18T06:19:49Z

The original quickcat branch/PR had gotten old and messy to merge, so I started fresh with a new quickcat2 branch, borrowing pieces from the previous (which will be abandoned).

This PR adds:

desisim.quickcat module to quickly generate a redshift catalog given fiberassign tiles and input truth
a desisim/bin/quickcat wrapper script for use at the command line
tests with 100% coverage

The redshift performance is based upon Redmonster applied to zdc1 data by @gdhungana as documented in DESI-1657 (his DESI data telecon presentation from last week). I hope we'll be able to improve the zwarn efficiency numbers, but let's start realistically pessimistic for now.

quickcat includes a perfect=True option to maintain input z -> output z.

@gdhungana please review the redshift errors and zwarn fractions.

@forero please check the module interface — is this usable for quicksurvey?

gdhungana · 2016-02-18T14:55:14Z

@sbailey Thanks for this reorganization. The numbers look okay other than for QSO. 255 km/s is for the distribution excluding catastrophic failures that still have zwarn=0. So for zwarn fraction of 1-4094/5000, the redshift error should be 423 km/s. We may also somehow need to address these catastrophic failures.
Also these target bits look different than what I was seeing in https://github.com/desihub/desitarget/blob/master/py/desitarget/targetmask.yaml
Are these redefined somewhere?
If that is true,
@forero When you have time, can you produce some FA output test files with now redefined target bit masks?

sbailey · 2016-02-18T21:05:13Z

The hardcoded TYPE mapping is not formally defined; it was taken from current fiberassign usage. After discussions with @rncahn, we're working on yet another update to the formats to clean this up. More updates after that settles down.

forero · 2016-02-19T14:57:32Z

@sbailey @rncahn what are exactly the planned updates?

sbailey · 2016-02-20T01:09:39Z

this quickcat2 branch and current fiberassign use the truth file "TYPE" as an integer index into the fiberassign features file type categories such as:

0 : 'QSO',      #- QSO-LyA
1 : 'QSO',      #- QSO-Tracer
2 : 'LRG',      #- LRG
3 : 'ELG',      #- ELG
4 : 'STAR',     #- QSO-Fake
5 : 'UNKNOWN',  #- LRG-Fake
6 : 'STAR',     #- StdStar
7 : 'SKY',      #- Sky

This is convenient for fiberassign_surveysim but not for anything else. I propose to go back to file formats that most closely match the final data flow, and add a translation script to convert from the MTL target+truth files into whatever Bob needs for his "secret" file with this integer TYPE index. The point of that is to maintain a working fiberassign_surveysim while also providing a reasonable dataflow for getting the more complete quicksurvey sim going.

Everytime we try to implement this data flow some new issue comes up, but I think we are getting closer to what we want. Here is my draft of what I think should be in each file:

targets: input to merged target list (mtl)
    TARGETID            i8  -- global unique identifier
    BRICKNAME           S8
    RA                  f8
    DEC                 f8
    DESI_TARGET         i8  -- target masks, as defined in desitarget.yaml
    BGS_TARGET          i8
    MWS_TARGET          i8
    SUBPRIORITY         f8  -- [0-1] for choosing between equivalent targets

mtl: input to fiberassign
    [targets columns]
    NUMOBS              i4  -- number of additional observations wanted
    PRIORITY            i4
    LASTPASS            i4  -- this is a hack (i2, bool?)

truth: input to quickcat
    TARGETID            i8
    TRUEZ               f4
    TRUETYPE            S10 -- GALAXY, QSO, STAR, SKY [ELG, LRG, ...]
    -- optional
    RA, DEC, BRICKNAME

zcatalog: output from spectro pipeline and quickcat
    TARGETID            i8
    BRICKNAME           S8
    Z                   f4
    ZERR                f4
    ZWARN               i4
    TYPE                S8  -- (not S7, not i8)
    NUMOBS              i4  -- number observed (different from mtl.NUMOBS)

tile: output of fiberassign
    -- from targets/mtl
    TARGETID            i8
    BRICKNAME           S8  -- new
    RA                  f8
    DEC                 f8
    DESI_TARGET         i8  -- renamed
    BGS_TARGET          i8  -- new
    MWS_TARGET          i8  -- new
    -- added by fiberassign
    FIBER               i4
    POSITIONER          i4
    NUMTARGET           i4  -- how many targets does this cover
    XFOCAL_DESIGN       f4
    YFOCAL_DESIGN       f4
    -- maybe
    PRIORITY            i4
    SUBPRIORITY         f8  -- [0-1]
    -- dropping from MTL
    NUMOBS
    LASTPASS

secret: for fiberassign_surveysim row matched to mtl
    -- renaming fiberassign "id" / MTL truth "TYPE" -> "CATEGORY"
    CATEGORY            i8  -- 0..n
    -- maybe (or keep mapping in features file)
    NUMOBS_PRE          i4
    NUMOBS_POST         i4
    PRIO_PRE            i4
    PRIO_POST           i4

This impacts fiberassign, desitarget, and desisim for them all to coordinate. I'm drafting code to try this out now to see if it is viable. Please comment if you think anything is missing/wrong/ill-advised.

moustakas · 2016-02-22T04:22:06Z

This still has the BGS,MWS != DESI bias.... (I'm thinking specifically of DESI_TARGET.) Should this be DARK_TARGET or is this choice already too entrenched...?

forero · 2016-02-23T19:47:54Z

In targets. My only question concerns SUBPRIORITY. If it's only going to be used for choosing between equivalente targets, is it simply a random number? Why not labeling it RANDOM_NUMBER?
In truth. I find TRUETYPE inconvenient as string. Inside quickcat it would be more convenient to have it as a 64bit mask to use targetmask.yaml. With this in mind, perhaps we need TRUE_DESITYPE, TRUE_BGSTYPE, TRUE_MWSTYPE given that this split DESI/BGS/MWS is all over the place. But this points to @moustakas comment above, do we want to keep DESI, BGS and MWS separate?
In zcatalog. Same as in truth. Why not having TYPE as a 64bit mask? It would make life easier inside the MTL code.
In tile. I don't understand what NUMTARGET is. Is it the number of available targets per fiber?
In secret. There are variables labeled _PRE and _POST. Those are before and after what exactly?

sbailey · 2016-02-23T21:53:39Z

@moustakas the choice isn't entrenched, but it also wasn't random. After a bit of discussion about DESI_TARGET vs. DARK_TARGET, we originally went with DESI_TARGET as containing a combination of:

Dark survey target bits
General calibration bits like sky and various kinds of standard stars
BGS_ANY and MWS_ANY bits to indicate that another bit mask contains additional information that you may want.
So there is a whiff of a BGS,MWS != DESI bias, but it is more than just DARK_TARGET bits.

@forero, thanks for the comments.

In targets. My only question concerns SUBPRIORITY. If it's only going to be used for choosing between equivalente targets, is it simply a random number? Why not labeling it RANDOM_NUMBER?

The dark time survey will likely use this as a random number — its existence was based upon Daniel's suggestion to put the random choice in the target file itself rather than the code. However, the Milky Way Survey would like to rank order their targets, so for them it is a real priority. The overall idea is that you could add the integer PRIORITY with the [0-1] floating point SUBPRIORITY and get the order in which you should chose targets.

In truth. I find TRUETYPE inconvenient as string. Inside quickcat it would be more convenient to have it as a 64bit mask to use targetmask.yaml. With this in mind, perhaps we need TRUE_DESITYPE, TRUE_BGSTYPE, TRUE_MWSTYPE given that this split DESI/BGS/MWS is all over the place. But this points to @moustakas comment above, do we want to keep DESI, BGS and MWS separate?

The challenge that we previously didn't anticipate is that the targeting bits don't cover all the possibilities for true types. e.g. DESI_TARGET.QSO means something that was targeted as a QSO, but doesn't include things that actually are QSOs but have colors that fail the target selection. Similarly for genuine stars that pass QSO target selection cuts (DESI_TARGET.QSO) but not the star cuts (DESI_TARGET.STD_*).

So TRUETYPE was intended to be a map of what a redshift fitter would report if everything was working perfectly. Currently our redshift fitters are outputting strings, which is sometimes convenient (for humans) and sometimes a pain (slow for code).

In zcatalog. Same as in truth. Why not having TYPE as a 64bit mask? It would make life easier inside the MTL code.

These don't necessarily have to be a string, but within python it's not that bad to work with. See the mtl branch of desitarget for my current draft of working with these files. I'm willing to reconsider if it turns out to be a huge pain, but let's try the string version for a bit longer.

In tile. I don't understand what NUMTARGET is. Is it the number of available targets per fiber?

Hehe. This one is from you! Yes, it is the number of targets that are covered by this fiber, and is used in interpreting HDU2 of the tile file that lists exactly which targets those are.

In secret. There are variables labeled _PRE and _POST. Those are before and after what exactly?

The original idea was that this was the priority and numobs before taking any observations, and the priority and numobs after you have taken the first observation and determined what type it actually is. However, I'm having second (third?) thoughts about that, and am thinking about going back to having a simple integer CATEGORY in the truth file which is used to look up the pre/post values in the features files. That CATEGORY is redundant with a combination of (Z, ZWARN, TRUETYPE, and *_TARGET), but it might be the least hassle while we are transitioning from the internal fiberassign_surveysim to the full quicksurvey = mtl+fiberassign+quickcat simulation loop. I'll work on this this afternoon.

sbailey · 2016-02-24T23:48:58Z

I updated this branch to use the new formats described in fiberassign PR #24 and desitarget PR #32.

@forero please review these 3 pull requests as a coordinated set, and let me know if you have questions!

NOTE: I have not yet updated the bin/quickcat script to match the desisim.quickcat module. I have to run right now, but please wait for the bin/quickcat script to also be updated before doing the final merge. But you can use the python module as:

zcat_new = quickcat(tilefiles, targets, truth, zcat=zcat_previous, perfect=False)

@gdhungana example files for using this version of quickcat are at NERSC in /project/projectdirs/desi/mocks/preliminary/mtl/v2

…uickcat2

forero · 2016-02-26T20:12:53Z

I tested it as shown in this python script. It worked but it took a long time to run on the whole survey. Something like 25 min on Edison (for comparison running fiberassign takes ~20min). This quickcat implementation might need some optimization in the future.

sbailey · 2016-02-29T05:27:54Z

I'll merge this now, even if it is slow. I used astropy Tables for their database-like join syntax convenience for combining the truth catalog with the zcatalog and that may be the cause of the slowness. The other possible reason is the counter of how many times has each target been observed (I used collections.Counter from the python standard library, but there might be a faster way).

quickcat2

sbailey and others added 9 commits October 30, 2015 13:33

Adds work in progress quickcat v0

4ed1641

a print line

7d14cb6

Merge branch 'master' into quickcat

2811b2b

update FIBER_ASSIGNMENTS EXTNAME

2f334db

Merge branch 'master' into quickcat

b0605cd

checkpoint while refactoring quickcat into a module

d7554b8

adding refactored quickcat

3973666

rename apply_redshift_precision to get_observed_redshifts

7572619

Merge branch 'master' into quickcat2

391feed

sbailey added the enhancement label Feb 18, 2016

sbailey assigned forero Feb 18, 2016

sbailey mentioned this pull request Feb 18, 2016

Put a zeroth order redshift efficiency as a place holder #46

Closed

Stephen Bailey added 3 commits February 23, 2016 23:02

update quickcat to work with new truth format

7fa317c

refactor how truth columns go into zcat

3dbe624

fix tests

9f62d21

sbailey added 2 commits February 24, 2016 17:07

trying to fix dependencies for travis

98fc884

Merge branch 'quickcat2' of https://github.com/desihub/desisim into q…

c4c404f

…uickcat2

sbailey added 3 commits February 26, 2016 14:08

merging master; fixing travis desimodel install

4c4b8e1

update bin/quickcat to use desisim.quickcat module

2a442f0

updated qso sigmav number from Govinda

ed12761

sbailey added a commit that referenced this pull request Feb 29, 2016

Merge pull request #85 from desihub/quickcat2

52e38dc

quickcat2

sbailey merged commit 52e38dc into master Feb 29, 2016

sbailey deleted the quickcat2 branch February 29, 2016 05:28

Provide feedback

Saved searches

Use saved searches to filter your results more quickly

quickcat2 #85

quickcat2 #85

sbailey commented Feb 18, 2016

gdhungana commented Feb 18, 2016

sbailey commented Feb 18, 2016

forero commented Feb 19, 2016

sbailey commented Feb 20, 2016

moustakas commented Feb 22, 2016

forero commented Feb 23, 2016

sbailey commented Feb 23, 2016

sbailey commented Feb 24, 2016

forero commented Feb 26, 2016

sbailey commented Feb 29, 2016

quickcat2 #85

quickcat2 #85

Conversation

sbailey commented Feb 18, 2016

gdhungana commented Feb 18, 2016

sbailey commented Feb 18, 2016

forero commented Feb 19, 2016

sbailey commented Feb 20, 2016

moustakas commented Feb 22, 2016

forero commented Feb 23, 2016

sbailey commented Feb 23, 2016

sbailey commented Feb 24, 2016

forero commented Feb 26, 2016

sbailey commented Feb 29, 2016