Move catalog readers to GCRCatalogs #170

Closed
yymao opened this issue Jun 21, 2018 · 8 comments · Fixed by #177
Labels
Data products Tutorial Development of tutorials aimed at general DESC audience
Milestone

Comments

@yymao
Member

yymao commented Jun 21, 2018

Earlier in #157 I created a GCR reader in this repo for @wmwv's merged catalogs as a demonstration; however, now that we plan to validate the merged catalogs in DESCQA and also advertise them to the Collaboration, we should move the reader to https://github.com/LSSTDESC/gcr-catalogs as GCRCatalogs is a python package that is installed in the DESC shared environment and also directly used by DESCQA. I believe @djperrefort has been working on this.

Similarly, we should move @slosar's #169 to https://github.com/LSSTDESC/gcr-catalogs as well.

(cc @fjaviersanchez as this issue is related to #168)

@djperrefort

I'm modifying the catalog reader for the DC2 static coadds to return values specified in the Data Products Definition Document. The last value I'm adding before submitting a pull request is the covariance matrix Icov defined as the "Ixx, Iyy, Ixy covariance matrix".

The DPDD specifies Icov as type float[6], which is ambiguous: there are six unique values in the 3 x 3 covariance matrix, but there are also six photometric bands. This allows several different ways of calculating Icov. Some naive examples include:

  1. Determining a separate covariance matrix in each band and storing a different Icov entry for each one (u_Icov, g_Icov, r_Icov, etc.)
  2. Combining all the bands in some way and calculating the band to band covariance
  3. Determining a separate covariance matrix in each band and storing Icov as a single float[6] entry with some derivative value of the covariance matrix in each band

@TallJimbo do you know what method of calculating the covariance Icov is being implied by the DPDD?

@yymao
Member Author

yymao commented Jun 28, 2018

On a related note, while GCR does support multidimensional numpy array, the user won't be able to directly convert GCR's return to, say, a pandas DataFrame. This may or may not be important but we should keep this in mind.
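To illustrate the DataFrame concern, here is a minimal sketch, assuming a reader returns a dict of numpy arrays in which one column is two-dimensional. The `Icov` name follows this discussion; `flatten_columns` and the `_0`, `_1`, ... suffix scheme are hypothetical and not part of GCR. One possible workaround is to split the 2-D column into one 1-D column per component before building the DataFrame:

```python
import numpy as np
import pandas as pd


def flatten_columns(data):
    """Split each 2-D array into one 1-D column per component.

    Hypothetical helper for illustration; not part of GCR or GCRCatalogs.
    """
    flat = {}
    for name, values in data.items():
        values = np.asarray(values)
        if values.ndim == 1:
            flat[name] = values
        elif values.ndim == 2:
            # One output column per component, e.g. Icov_0 ... Icov_5
            for i in range(values.shape[1]):
                flat[f"{name}_{i}"] = values[:, i]
        else:
            raise ValueError(f"{name}: only 1-D and 2-D arrays are handled")
    return flat


# Stand-in for what a reader might return; made-up numbers, not catalog data.
data = {
    "Ixx": np.array([1.0, 2.0, 3.0]),
    "Icov": np.arange(18, dtype=float).reshape(3, 6),
}
df = pd.DataFrame(flatten_columns(data))
```

The trade-off is that the user then has to reassemble the components when the full array is needed.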

@djperrefort, also, can you review this PR first and then create your PR on top of it?

@TallJimbo
Member

I believe the intent of the DPDD is for (Ixx, Iyy, Ixy) to be some cross-band average or "reference band" shape, and hence for Icov to be the 6 unique values of the 3x3 covariance matrix for that cross-band or reference-band (Ixx, Iyy, Ixy). The most natural analog in the current DM pipeline outputs would be the shape slot values in the deepCoadd_ref catalog, which is the shape measured in whatever band was considered "best" for that object.
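As a sketch of what "the 6 unique values of the 3x3 covariance matrix" could look like in practice: a symmetric 3x3 matrix is fully determined by its upper triangle, so it can be packed into a float[6] and unpacked again. The parameter order (Ixx, Iyy, Ixy) and the row-major upper-triangle ordering used here are assumptions for illustration; the ordering the DPDD intends is not stated in this thread, and `pack_cov` / `unpack_cov` are hypothetical names.

```python
import numpy as np

# Indices of the upper triangle of a 3x3 matrix, in row-major order:
# (0,0), (0,1), (0,2), (1,1), (1,2), (2,2)
ROWS, COLS = np.triu_indices(3)


def pack_cov(cov):
    """Return the 6 unique elements of a symmetric 3x3 matrix."""
    cov = np.asarray(cov, dtype=float)
    return cov[ROWS, COLS]


def unpack_cov(packed):
    """Rebuild the full symmetric 3x3 matrix from its 6 unique elements."""
    cov = np.zeros((3, 3))
    cov[ROWS, COLS] = packed
    cov[COLS, ROWS] = packed  # mirror into the lower triangle
    return cov


# Made-up covariance of (Ixx, Iyy, Ixy) for a single object.
cov = np.array([
    [4.0, 1.0, 0.5],
    [1.0, 9.0, 0.2],
    [0.5, 0.2, 16.0],
])
packed = pack_cov(cov)
restored = unpack_cov(packed)
```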

The DM pipelines currently produce moments for all bands, of course; whether they will continue to depends on how we do deblending across bands (a naive configuration of Scarlet, for instance, would guarantee that the shape for all bands would be identical). I think it's highly likely that we'll also have per-band (Ixx, Iyy, Ixy) and associated Icov.

> On a related note, while GCR does support multidimensional numpy array, the user won't be able to directly convert GCR's return to, say, a pandas DataFrame.

Note that the DPDD is very much a conceptual document; the presence of arrays in the tables there should not be taken as an indication that we will use arrays in the actual database schema. So there's no actual gain from using arrays in your interfaces now if they're problematic.

@yymao yymao added the Tutorial Development of tutorials aimed at general DESC audience label Jun 29, 2018
@djperrefort

@TallJimbo This makes sense, thank you. To clarify one last detail, Icov should be a single matrix representing the covariance of Ixx, Iyy, and Ixy for all objects, correct?

This relates to @yymao's earlier comment on converting to a pandas data frame since there would only be a single matrix for the whole table and not one entry per object.

@TallJimbo
Member

I'm not totally sure I understand your question, but I'd say that Icov should be a single matrix for each Object containing the (correlated) uncertainties of (Ixx, Iyy, Ixy) for that Object.

@djperrefort

Thank you for clarifying. Do you know the column name for the error in the second moments? I'm using slot_Shape_xx, slot_Shape_yy, and slot_Shape_xy for the principal values, but don't see a slot_Shape_xxSigma (or equivalent). The closest thing I see is base_SdssShape_yySigma but I'm not sure what this value actually is.

The code is in place to calculate the covariance, it's just a matter of specifying the correct values to use.

@TallJimbo
Member

TallJimbo commented Jun 30, 2018

The slot_Shape_* values are actually just aliases, and I believe they're pointing at the HSM moments implementation, and that may not report uncertainties. The base_SdssShape_* ones are our other implementation, and I believe it only reports the diagonal elements of the covariance matrix (e.g. base_SdssShape_yySigma is the square root of the variance on base_SdssShape_yy).

The two implementations are pretty similar, so you're welcome to use the SDSS one instead if you do care about having uncertainties. But there's really nothing you can do to compute the uncertainties if they're not reported, unless you go back to the pixels - these are not empirical covariances, they're uncertainties propagated from the pixel uncertainties. When they're not provided, it's best to just set them to zero.
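A minimal sketch of building a per-object covariance matrix from only the reported diagonal uncertainties, under stated assumptions: the inputs stand for base_SdssShape_xxSigma, base_SdssShape_yySigma, and base_SdssShape_xySigma, where only the yySigma name appears in this thread and the other two are assumed by analogy. The off-diagonal terms are set to zero as a placeholder because they are not reported, not because the moments are known to be uncorrelated. `diagonal_cov` is a hypothetical name.

```python
import numpy as np


def diagonal_cov(xx_sigma, yy_sigma, xy_sigma):
    """Build an (N, 3, 3) covariance array for (Ixx, Iyy, Ixy).

    Each Sigma is the square root of a variance, so the diagonal is
    sigma**2. Off-diagonal elements are left at zero (placeholder).
    """
    sigmas = np.stack([xx_sigma, yy_sigma, xy_sigma], axis=-1).astype(float)
    n_objects = sigmas.shape[0]
    cov = np.zeros((n_objects, 3, 3))
    idx = np.arange(3)
    cov[:, idx, idx] = sigmas ** 2  # fill the diagonal of every object's matrix
    return cov


# Made-up uncertainties for two objects; not real catalog values.
cov = diagonal_cov(
    np.array([0.1, 0.2]),
    np.array([0.3, 0.4]),
    np.array([0.5, 0.6]),
)
```

This yields one 3x3 matrix per object, matching the per-Object interpretation of Icov given earlier in the thread.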

@RobertLuptonTheGood

RobertLuptonTheGood commented Jun 30, 2018 via email

@yymao yymao added this to the DC2 Tutorials milestone Jul 6, 2018
@yymao yymao closed this as completed in #177 Jul 6, 2018

5 participants