N(z) test #11

Closed · 4 tasks done
yymao opened this issue Nov 6, 2017 · 45 comments
@yymao (Member) commented Nov 6, 2017

I believe @evevkovacs is already working on this. This issue is to track progress.

  • code to reduce mock data
  • code that works within DESCQA framework
  • validation data
  • validation criteria
@yymao (Member, Author) commented Dec 4, 2017

Thanks to @evevkovacs. #34 closes this.

yymao closed this as completed Dec 4, 2017
@yymao (Member, Author) commented Dec 13, 2017

Reopening this issue, as we haven't finalized the validation criteria.

@yymao (Member, Author) commented Dec 14, 2017

@evevkovacs has implemented this test but we still need to impose a validation criterion.

The plot below is the Nz_i_Coil2004_maglim test, taken from this DESCQA run:

[figure: N(z) comparison for the Nz_i_Coil2004_maglim test]

The plot below is the Nz_r_DEEP2_JAN test, taken from this DESCQA run:

[figure: N(z) comparison for the Nz_r_DEEP2_JAN test]

yymao added this to the DC2 CS Code Freeze milestone Dec 14, 2017
@rmandelb commented

Just to smooth out some of the structure, could we imagine fitting the mock data to the same parametric form used for the N(z) from DEEP2? The advantage of this in my mind is that (a) we don't have to worry about structure in the mocks that is just due to the small areas involved, and (b) our test remains at least somewhat valid even with only z<1 data.

@rmandelb commented

(And then our validation criterion can be on the values of the fit parameters.)

@evevkovacs (Contributor) commented Dec 20, 2017

@rmandelb @janewman-pitt-edu Yes, that is a possibility. The plots above are both for the small-area catalogs, so are you imagining that this will be an interim criterion until cosmoDC2 is available? For comparison, here is a similar plot for Buzzard (10k sq. deg.):

[figure: N(z) comparison for Buzzard]

If you think there is a better metric for this plot than comparing fits, that would be the preferred one to implement, so that we don't have to redo it when cosmoDC2 arrives. (Or we could have one method for big catalogs and another for small ones, but I'm a bit reluctant to invest a lot of effort in tailoring something special for small catalogs when the larger one is coming.)

The other issue with comparing the fits (which are z**2 * exp(-z/z0)) is that the current values I have for the fit parameter (z0) do not include errors (with one exception), so we have no way to evaluate whether a deviation is acceptable. That is, suppose the criterion is +/- 3 sigma for z0... we don't have a value for sigma (except for i-band, over a limited range of magnitudes). So we would have to look for other validation data sets that include errors. Jeff may be able to help here.
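
For concreteness, a minimal sketch of such a shape fit (illustrative only, not the actual DESCQA test code; `z_centers` and `counts` are assumed binned inputs, with crude Poisson errors standing in where the data provide none):

```python
import numpy as np
from scipy.optimize import curve_fit

def nz_model(z, amp, z0):
    """The fitting form discussed above: A * z**2 * exp(-z/z0)."""
    return amp * z**2 * np.exp(-z / z0)

def fit_z0(z_centers, counts, count_errs=None):
    """Fit z0 to a binned N(z); returns (z0, statistical error on z0)."""
    if count_errs is None:
        count_errs = np.sqrt(np.maximum(counts, 1.0))  # avoid zero weights in empty bins
    popt, pcov = curve_fit(nz_model, z_centers, counts,
                           p0=(counts.max(), 0.3),  # rough (amplitude, z0) guess
                           sigma=count_errs, absolute_sigma=True)
    return popt[1], np.sqrt(pcov[1, 1])
```

A +/- 3 sigma criterion would then just be abs(z0_mock - z0_data) < 3 * sigma, but, as noted above, sigma is only available for one of the validation fits.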

@janewman-pitt-edu commented

The errors in z0 coming from the fit of z0 vs. magnitude limits are quite small. Inaccuracy in the model would be a bigger worry (i.e., a fit like z^2 exp(-(z/z0)^1.25) or ^1.5 would also give an acceptable solution for some value of z0, given the sample/cosmic variance in DEEP2). I actually think these plots look pretty acceptable, especially because they look better at the fainter end, which is the LSST regime anyway.

@rmandelb commented

@janewman-pitt-edu - speaking of the statement that these plots look acceptable, I had some thoughts about this:

  • Correct me if I'm wrong, but I think anything with i>24 from Coil et al is an extrapolation of a fitting formula, right? I think that any validation tests using those extrapolations should be designed to be less stringent.

  • I don't think we need validation tests for all the magnitude ranges shown. We could pick ~2 mag ranges and stick with those.

  • I was wondering what to make of the bright ones looking somewhat odd. My first thought was "OK, we don't care too much if i<21 is wonky for much of our science". But can/should these plots be used as input into the tuning of the galaxy models that @aphearin is doing?

To answer the question from @evevkovacs - I wasn't proposing that test as an interim solution until cosmoDC2 is available; I was proposing it as something we do now and keep for later. I like it because it works both for large-area and small-area catalogs, with somewhat limited or very broad redshift ranges, and probably gets at enough of the features in the N(z) to provide a good enough test for our purposes. Curious to hear if @janewman-pitt-edu agrees with this sentiment.

With that said, the bigger issues are: (a) should we allow a bit more freedom in the fitting formula? (It doesn't make sense to change the power law out front from z^2, but it could make sense to change the bit in the exponential, as Jeff says.) (b) What is our validation criterion? I have code to re-fit those distributions, and I'm sure Jeff does too, so most likely either of us could provide statistical errors. But I'm not sure that basing it on a few-sigma statistical error in the fit parameters is the way to go. For example, there is incompleteness in the spec-z sample, especially at the faint end, and that incompleteness is likely a function of z, so there is systematic uncertainty in the distributions in the data. And for some fainter magnitude ranges these aren't even parameters output from a fit to real data, but rather extrapolations of fitting formulae based on fit parameters for brighter magnitude ranges! It's really the systematic error in that extrapolation that is likely to dominate. I am not sure what's a great way to decide whether the distribution is close enough to the data to not affect our science too badly. Curious to hear if @slosar has some thoughts for LSS?

@janewman-pitt-edu commented

@rmandelb : I'm pretty sure the biggest uncertainty for the DEEP2 fits is sample/cosmic variance, not incompleteness. Without much more area it's just not possible to distinguish between exp(-(z/z0)^1) and ^1.5. (We can rule out 2 or 0.5.)

> Correct me if I'm wrong, but I think anything with i>24 from Coil et al is an extrapolation of a fitting formula, right? I think that any validation tests using those extrapolations should be designed to be less stringent.

Past i=23 (or r=24). The z0 vs magnitude limit curve is such a straight line (with very small residuals) that I have confidence in those extrapolations; what I would say is more uncertain is the functional form to use.

> I don't think we need validation tests for all the magnitude ranges shown. We could pick ~2 mag ranges and stick with those.

Absolutely.

> I was wondering what to make of the bright ones looking somewhat odd. My first thought was "OK, we don't care too much if i<21 is wonky for much of our science". But can/should these plots be used as input into the tuning of the galaxy models that @aphearin is doing?

I have thought of this as a sanity check. I think if we are really matching (a) the number counts, (b) the galaxy color distributions, and (c) the luminosity functions, we should be getting this right; but a, b, and c are all easier to measure empirically than dN/dz. I thus feel we should focus on things like those for the validation tests we want to strictly enforce. (I'll note that I prefer luminosity functions to mass functions, as the latter have systematic uncertainties in the observations and, at the high-z end, again have sample/cosmic variance issues.)

I believe it would be possible for me to do the z0 vs. r or i magnitude limit fits for ^1.5 and probably ^1.25, if I can remember how the code works :)
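
For reference, a sketch of those two pieces (illustrative only; `maglims` and `z0_values` stand in for the tabulated DEEP2 fit results, and `alpha` is the exponent under discussion):

```python
import numpy as np

def nz_model_alpha(z, amp, z0, alpha):
    """Generalized shape z^2 * exp(-(z/z0)^alpha); alpha = 1 is the original fit."""
    return amp * z**2 * np.exp(-(z / z0) ** alpha)

def z0_extrapolator(maglims, z0_values):
    """Linear fit of z0 vs. limiting magnitude, for extrapolating past i=23 / r=24."""
    slope, intercept = np.polyfit(maglims, z0_values, deg=1)
    return lambda m: slope * m + intercept
```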

@slosar (Member) commented Dec 21, 2017

I don't have much to add, except that the absolute number of objects as a function of magnitude matters much more than how they are distributed across redshift. The former affects the number of blends, etc., which propagates into pretty much everything and will be crucial for some decisions (do you just throw the really bad ones away, or are there just too many, so you need to fight them?); the latter just changes the relative SNR and perhaps photo-z accuracy, but while these affect the FoM, etc., they don't fundamentally change the way we want to do data reduction.

@evevkovacs (Contributor) commented

@rmandelb @yymao @janewman-pitt-edu @slosar So to summarize this:
i) dN/dmag is much more important than dN/dz
ii) dN/dz could be checked "by eye" if the fits to the data were more representative of the true uncertainties involved and the catalog data included error bars for cosmic variance
iii) the fits to the data for dN/dz need to explore other functional forms.

Actually, since Coil et al. do have other fits available, with exp(-(z/z0)^1.2), I propose that I should:

  1. include this variant in the plots and make some kind of shaded band for the data fits
  2. implement jackknife errors for the catalogs to get more realistic error bars on the catalog data (a minimal sketch follows below).

Then we can see how the plots look. What do you all think?
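
A generic leave-one-region-out jackknife along these lines (a minimal sketch, not the existing DESCQA1 module; it assumes the catalog footprint has already been partitioned into `region` labels, e.g. by k-means on RA/Dec):

```python
import numpy as np

def jackknife_nz(z, region, z_bins):
    """Mean normalized N(z) per bin with delete-one-region jackknife errors."""
    labels = np.unique(region)
    n = len(labels)
    # re-histogram the catalog with each region left out in turn
    samples = np.array([
        np.histogram(z[region != lab], bins=z_bins, density=True)[0]
        for lab in labels
    ])
    mean = samples.mean(axis=0)
    # standard jackknife variance: (n - 1)/n * sum of squared deviations
    err = np.sqrt((n - 1.0) / n * ((samples - mean) ** 2).sum(axis=0))
    return mean, err
```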

@aphearin commented

@evevkovacs @yymao - a lot of work needs to be done on the catalogs in order for them to meet basic specs. The test you describe is important, but, for example, implementing jackknife errors on the catalogs seems to me a much lower priority than inspecting the result and trying to make adjustments to the catalog. Worth keeping in mind, as we have very limited time remaining.

@evevkovacs (Contributor) commented

@aphearin @yymao We already have a jackknife module from DESCQA1, so I hope this will be easy.

@yymao (Member, Author) commented Dec 22, 2017

Not necessary for this specific case, but generally I agree with @aphearin that we are currently person-power limited and really need to prioritize our efforts.

@janewman-pitt-edu commented Dec 22, 2017 via email

@janewman-pitt-edu commented

@pefreeman has been testing out applying a photo-z algorithm and found some anomalies that caused us to investigate further. @sschmidt23 has done a couple of tests showing that the halo mass cutoffs in protoDC2 are having a big effect -- such that, although redshift distributions look OK for integrated bins in magnitude, they will look way off for differential bins (see https://github.com/sschmidt23/DC2stuff/blob/master/protoDC2noerrorscheck.ipynb for DC2, vs. https://github.com/sschmidt23/DC2stuff/blob/master/Buzzard_quicklook.ipynb for Buzzard).

The key issue is visible in this plot:

[figure: i magnitude vs. redshift for protoDC2]

where the solid red line indicates the LSST gold sample weak lensing limit and the dashed line represents a deeper limit that could perhaps be used for LSS studies. The mass limit causes a deficiency in faint objects at low redshift (where they do exist in real samples). Amongst other things, this will cause photo-z performance to be poor if realistic magnitude priors are used with template techniques, whereas it will be too good with training-based techniques as magnitude will be more informative about redshift than it should be.

Other surprising things that Peter and Sam have found are the gaps in the color-color diagram (g-r vs. r-i) and the jumps at what we believe are the boundaries between regions where different-redshift cubes were used to construct the light cones (most visible in the r-i vs. redshift plot at higher redshifts).

@janewman-pitt-edu commented

Here's the color-color plot:

[figure: g-r vs. r-i color-color diagram for protoDC2]

@janewman-pitt-edu commented

And here's the r-i vs. z plot:

[figure: r-i color vs. redshift for protoDC2]

@rmandelb commented

@janewman-pitt-edu - thanks for these plots. A few people had been looking at color vs. redshifts, but I'm not sure anybody has considered mag vs. redshift - and indeed that's a pretty important gap. We should not just be testing 1D dN/dmag and dN/dz because that will cause us to miss important features in these distributions.

I just tried to find how the mass limit changes from proto-DC2 to cosmo-DC2, and failed. Must have been looking in the wrong place. Hopefully @evevkovacs or @yymao will comment. It could be that the move to cosmo-DC2 will help with this problem.

@aphearin commented

@janewman-pitt-edu - Thanks for posting the tests of protoDC2. I agree with your assessment that the chunky edges shown in your z_phot vs. z_spec are most likely due to finite time-stepping of the simulation snapshots used to construct the lightcone.

I've also noticed some unrealistic features in color-color and color-magnitude space for protoDC2. I've been working on rescaling a single snapshot of protoDC2, using a mishmash of Monte Carlo resampling methods and drawing upon the UniverseMachine to resample model galaxies as a function of {M*, sSFR}, so that I can get two-point clustering correct as a function of these variables.

Since I'm currently only focused on a single snapshot, I can't plot things as a function of redshift, but this four-panel plot shows a range of quantities that are scaling better than what you're showing here. All panels show results for a volume-limited sample at z=0, complete in stellar mass down to 10**9.5 Msun (~4e5 galaxies).

[figure: four-panel color-magnitude comparison for the rescaled snapshot]

@janewman-pitt-edu - I looked at the notebooks you posted and I do not follow your argument about halo mass cutoffs. None of those plots show halo mass on any axis, and no plotted samples have done any halo mass masking. Apologies if I'm being dense or just missed something, but why are you saying that halo mass cutoffs are connected to the problems you are seeing?

CC @evevkovacs @dkorytov @katrinheitmann

@janewman-pitt-edu commented

@aphearin : My belief that this is due to a mass cutoff is based on past discussions with @evevkovacs et al. about how Galacticus was run (at least in the past). The shape looks entirely consistent with that to me; basically, the envelope of i magnitude vs. redshift is set by the mass cutoff combined with the highest mass-to-light ratio amongst the low-mass galaxies (or maybe I'm flipping that and it's lowest). I.e.: it looks like a line of ~constant luminosity. At high z that luminosity is brighter than our magnitude limit but at low z it is not (as distance is smaller).

You mentioned the UniverseMachine mapping is happening on a single box; just to confirm: galaxies will still end up with observed colors that properly k-correct to their assigned z's? Otherwise photo-z's will be very messed up...

@evevkovacs (Contributor) commented

I rechecked the parameter file that I used to run Galacticus. In order to speed up the calculation of luminosities, there is a cut-off in absolute magnitude of -10; i.e., if the galaxy is fainter than -10, it is not evaluated. However, at z ~ 0.2, I estimated that this would only cut out galaxies fainter than apparent magnitude ~ 30, so unless I made a mistake, this cut would not produce the behavior seen above. We will investigate further.

@janewman-pitt-edu commented

@evevkovacs : I come up with ~30 too. That suggests again that it might be due to a mass limit on how Galacticus was run (or the simulations/merger trees used as input) rather than a luminosity limit directly.

@evevkovacs (Contributor) commented

@dkorytov OK good. We are checking now, but I believe Dan added an additional cut of M < -16 to protoDC2, because that is where the numbers of galaxies started to fall off. And M < -16 corresponds to m ~ 24 at z = 0.2, so that is exactly where you see the cutoff. We can easily remove this cut. The number density of the resulting galaxies would probably be too low, but at least there would be some to look at.
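
Those numbers follow directly from the distance modulus; a quick back-of-the-envelope check (using astropy's Planck15 cosmology as a stand-in for the protoDC2 cosmology):

```python
from astropy.cosmology import Planck15

mu = Planck15.distmod(0.2).value  # distance modulus at z = 0.2, ~40 mag
print(-16 + mu)  # M = -16  ->  m ~ 24, matching the observed cutoff
print(-10 + mu)  # M = -10  ->  m ~ 30, too faint to explain it
```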

@janewman-pitt-edu commented

That would explain it!

Why would the number density be low?

@dkorytov (Contributor) commented Jan 15, 2018 via email

@aphearin commented

@dkorytov is correct about simulation resolution: properly resolving the halos hosting galaxies fainter than Mr = -16 is not possible for present-day simulations of cosmological volumes.

@janewman-pitt-edu commented

I know it's late for DC2, but for DC3 can we at least try to emulate the subhalos down to lower masses? Redshift distributions and photo-z tests will be off if we don't.

@aphearin commented

We may be able to supplement the existing simulation on DC2 timescales, @janewman-pitt-edu - for example using something along the lines of this paper, which is similar to what is done in Buzzard. For the purposes of the present discussion, though, I thought it was worth making sure you were aware that the specs you are asking for are beyond the current capabilities of any simulation that has ever been run.

@rmandelb commented Jan 30, 2018

@yymao and @evevkovacs - I wanted to ask you about an update to this test that could address the issue @janewman-pitt-edu raised. Since Andrew showed that it would be very challenging in DC2 to get a sample to the depth of the LSST gold sample at the lowest redshift range (see discussion in this issue), I think that for the dN/dz test we need to only use the redshift range where the sample is complete down to the magnitude we are using for the test. So basically, we'd need to make a 2D plot like the one Jeff showed in #11 (comment), and use it to find the redshift at which the limiting magnitude is the one we are using for the test, and only require the test to pass above that redshift.

In other words: if we're using i<X (X=25 or 24 or whatever), then we take narrow redshift bins, and in each one, we ask what its limiting magnitude is. Only when the limiting magnitude is X or larger do we consider the test valid. That becomes a lower limit in redshift, zmin, for this test, and we only test the dN/dz above that zmin.

Is that a simple change to the setup? Is there some obstacle? My impression is that this should still be an issue in cosmoDC2 or any current N-body-based simulation, so when devising the test to be generally applicable we should account for this.
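
A sketch of that procedure (an assumed helper, not existing DESCQA code; the limiting magnitude in each narrow z bin is crudely estimated from a high quantile of the magnitude distribution):

```python
import numpy as np

def find_zmin(z, mag, X, z_bins, quantile=0.99):
    """Smallest redshift above which the catalog is complete to mag < X.

    Returns the lower edge of the first z bin whose estimated limiting
    magnitude reaches X (assuming completeness then holds at higher z),
    or None if no bin qualifies.
    """
    idx = np.digitize(z, z_bins) - 1
    for i in range(len(z_bins) - 1):
        in_bin = idx == i
        if not in_bin.any():
            continue
        mag_lim = np.quantile(mag[in_bin], quantile)  # faint envelope of the bin
        if mag_lim >= X:
            return z_bins[i]
    return None
```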

@aphearin commented

@rmandelb - I think this is generally a good idea. I will just briefly point out that we have made some progress on this faint-end problem since receiving input from @janewman-pitt-edu and @sschmidt23. The plot here shows that we're now pushing down past the previous hard cutoff at Mi=-14 (shown with the black curve to guide the eye). The hard cut in M* is currently what drives the cut in Mi, and it should be possible to extend this further still for cosmoDC2, if not the next release of protoDC2.

[figure: updated Mi vs. redshift scatter, with the previous Mi = -14 cutoff shown as a black curve]

@yymao (Member, Author) commented Jan 30, 2018

@rmandelb the procedure you proposed is certainly doable. It does require some work to modify the current code.

Given that, I am unchecking the box for "code to reduce mock data", as it still needs to be worked out.

@yymao (Member, Author) commented Jan 31, 2018

@janewman-pitt-edu re: your comment at #50 (comment) --- I think what @rmandelb suggested above, if I interpret it correctly, is N(m < X, z), but only over the redshift range in which the catalog is complete down to X.

Is your suggestion at #50 (comment) something different?

@janewman-pitt-edu commented

It's slightly different, as Anze was requesting that everything be done in a general framework (but it's 100% inspired by Rachel's discussion here). Basically, we'd be comparing a particular simulation in a restricted z and magnitude range, but we'd define the test quantity over broader ranges than that (and just use only the relevant cells).

@yymao (Member, Author) commented Jan 31, 2018

@janewman-pitt-edu hmm, they seem the same to me operationally (we need to count the number of galaxies in bins of m and z). Maybe I missed something? Or are you suggesting a different way to compare with the validation data?

@slosar (Member) commented Jan 31, 2018

OK, it looks like we're going a bit in circles here. You count galaxies in bins of m and z, and then compare only those bins in which your test datasets are complete. Once we have more datasets, this will naturally extend to other bins.

@yymao (Member, Author) commented Jan 31, 2018

@slosar thanks --- I guess what I am trying to figure out is how to compare with data, or, more specifically, how to normalize the counts in bins of mag and z and how to define the validation criteria.

I think what you are saying is to just count the number of galaxies per sky area, in bins of mag and z. I thought Rachel's suggestion was to look at P(z) for z > z*, where z* is the redshift above which the galaxy catalog is complete to m < X. I understand that they are not that different, but the choice affects how the validation criteria are defined.
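
To make the two framings concrete (an illustrative sketch only): (a) is raw counts per unit sky area in 2-D bins of magnitude and redshift, compared bin by bin where the validation data are complete, while (b) is the normalized shape P(z | m < X, z > zmin), which drops the overall normalization:

```python
import numpy as np

def counts_per_area(mag, z, mag_bins, z_bins, sky_area_deg2):
    """(a) Number of galaxies per deg^2 in bins of mag and z."""
    h, _, _ = np.histogram2d(mag, z, bins=(mag_bins, z_bins))
    return h / sky_area_deg2

def pz_shape(z, mag, X, zmin, z_bins):
    """(b) Normalized P(z) for m < X, restricted to z > zmin."""
    sel = (mag < X) & (z > zmin)
    return np.histogram(z[sel], bins=z_bins, density=True)[0]
```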

@janewman-pitt-edu commented

I think we want to:

  1. do the analysis for integral bins in magnitude (i.e. m < X), not differential bins in magnitude (A < m < B). That is tied down better by the data than differential bins.

  2. have both minimum and maximum redshifts for any given comparison. Most observational samples will have an effective z limit.

I agree this is all delving into the details.

@yymao (Member, Author) commented Feb 14, 2018

I think we have come to some agreement on how to implement the criteria given the above discussion.

@evevkovacs what's the current status of this test? Are you actively working on it or should we find help? If you are currently working on it, can you provide a status update? Thanks!

@evevkovacs (Contributor) commented

@yymao @janewman-pitt-edu I am working on the test. I need to do a little code refactoring. I also need a source of validation data with galaxy number densities; right now, I only have shape information to compare with (from Jeff and from Coil et al. 2004). Once I have the data, I can redo the test to conform to it. Thanks.

@rmandelb commented

If we're already doing a number density validation of N(<mag) in a separate test, then can we just test N(z) based on shape, ignoring normalization?

@janewman-pitt-edu commented Feb 16, 2018 via email

@evevkovacs (Contributor) commented

@janewman-pitt-edu Is there anything more up-to-date than the Coil et al. 2004 data that I have been using for the shape test of the N(z) distributions? Thanks.

@janewman-pitt-edu commented Feb 20, 2018 via email

@yymao (Member, Author) commented Jun 14, 2018

This test has been done for a while. There's a bug fix currently open in #119, but this issue itself should be closed.

yymao closed this as completed Jun 14, 2018
patricialarsen added a commit that referenced this issue Feb 21, 2023: adding srv tests, testing add on catalogs, testing analysis_tools (th…