Missing subvolume(s) in Bolshoi-Planck z=0 catalog #598

Closed
aphearin opened this Issue Jul 20, 2016 · 7 comments

aphearin commented Jul 20, 2016

The following Halotools-provided halo catalog is missing a substantial number of halos:

simname = bolplanck
redshift = 0
halo_finder = rockstar
version_name = halotools_alpha_version2

The missing halos appear to be isolated to x, y > 200 Mpc/h. All scientific results deriving from the halotools_alpha_version2 catalog are invalid.

Until this is resolved with the v0.4 release, users should download the latest ASCII data from http://www.slac.stanford.edu/~behroozi/BPlanck_Hlists/hlist_1.00231.list.gz and reprocess it themselves using RockstarHlistReader.

CC @andrew-zentner, @duncandc, @vdbosch69.

aphearin added this to the v0.4 milestone Jul 20, 2016

aphearin self-assigned this Jul 20, 2016

aphearin referenced this issue Jul 30, 2016: Reprocess sims #614 (merged)

aphearin commented Aug 10, 2016

[image: multi_multi_scatter]

This figure shows that the v0.4 catalogs currently up on the Yale website resolve the missing subvolume problem first pointed out by @johannesulf. The quick way to read these plots is just to notice that there are no "holes" in any of them. In a little more detail, each panel shows a 2-d scatter plot of the positions of 1e4 randomly selected halos from a single snapshot; the left column shows x-y scatter plots, the middle column x-z, the right column y-z. From top to bottom, the rows show bolshoi, bolshoi-planck, consuelo and multidark. Within each panel, results for all four redshifts are shown.

aphearin commented Aug 10, 2016

To help protect against this problem in the future, Peter Behroozi now includes a file containing the result of running sha1sum on all halo catalogs posted on SLAC. Before processing these snapshots, I verified that the sha1sum run on each downloaded catalog agrees with his tabulated values, which should guarantee that the download of each catalog proceeded without interruption.

This check should be done every time any rockstar halo catalog is downloaded, either by halotools developers or users. The reason this is so important for large-scale structure statistics is that the rows of publicly available rockstar catalogs are chunked by spatial subvolume, so silently-incomplete downloads are systematically missing spatial sections of the snapshot. The above plot, and the ones to follow, provide further testing on the updated catalogs.

aphearin commented Aug 10, 2016

[image: xi_comparison_mass_1e12_multipanel]

The above plot compares the 3d clustering of halos of a fixed mass of Mvir~1e12. Different colored curves show different simulations. Different panels show results for different redshifts.

In the plot below, I show the ratio of bolshoi-planck and of consuelo to bolshoi, so that values on the vertical axis less than unity correspond to the situation in which bolshoi-planck (consuelo) has weaker clustering than bolshoi.

[image: xi_comparison_mass_1e12_multipanel_residual]

Notice that Milky Way-mass halos in bolshoi-planck show 10-15% weaker clustering than in bolshoi. That's the sense of the effect that should be expected from the shift in M*, but the magnitude is a bit surprising. This has been confirmed by @johannesulf in an independently downloaded catalog. @vandenbosch69 and/or @surhudm - does this level of difference also seem a bit high to you?

aphearin commented Aug 10, 2016

The plot below is the same as the one above, except here I go to a slightly higher mass, logMvir ~ 12.5, so that I can include multidark.

[image: xi_comparison_mass_5e12_multipanel_residual]

Even though the z=2 panel shows a larger discrepancy for multidark than for bolshoi-planck, this is reasonable: I've made no attempt to compare halo clustering at fixed peak height, and this mass range is well above the collapse mass at z=2, where bias is a more rapidly varying function of mass. The fact that the multidark discrepancy increases with redshift is comforting; note that the bolshoi-planck ratio does not show this redshift dependence.

aphearin commented Aug 10, 2016

Another simple way to check this is with counts-in-subvolume statistics. I divide each snapshot into the same subvolumes used to chunk the data hosted on SLAC, and count the number of host halos with peak mass greater than 300 particles, Mpeak > 300mp. I then compute the minimum counts divided by the median counts and plot the result below. I show results for all redshifts and simulations. At each of the four redshifts 0, 0.5, 1 and 2, I slightly stagger each simulation's bar plot to make it easier to read.

[image: subvol_counts]

In the previous buggy catalogs, the z = 0 value of bolshoi-planck would have been zero. The bolshoi(-planck) subvolumes are 50 comoving Mpc/h in size, and there's still quite a lot of cosmic structure on these scales: the typical value of the vertical axis for Poisson statistics would be ~0.95.
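
The min-over-median statistic described above can be sketched as follows. This is an illustration under stated assumptions, not the code used to make the plot: the function name is hypothetical, positions are assumed to lie in [0, lbox] in Mpc/h, and the Mpeak > 300mp host-halo cut is assumed to have been applied before the positions are passed in. It uses only the standard library; for full catalogs a vectorized histogram would be the more practical choice.

```python
from collections import Counter
from itertools import product
from statistics import median


def subvolume_count_ratio(positions, lbox, n_div):
    """Return min(counts) / median(counts) over an n_div**3 grid of cubic cells.

    positions : iterable of (x, y, z) tuples, assumed to lie in [0, lbox]
    lbox      : box side length, in the same units as the positions
    n_div     : number of subvolumes per dimension
    """
    cell = lbox / n_div
    counts = Counter()
    for x, y, z in positions:
        # Clamp so that a coordinate exactly equal to lbox lands in the last cell.
        idx = tuple(min(int(c // cell), n_div - 1) for c in (x, y, z))
        counts[idx] += 1
    # Visit every cell explicitly: a missing subvolume must register as zero,
    # not simply be absent from the tally.
    all_counts = [counts[idx] for idx in product(range(n_div), repeat=3)]
    med = median(all_counts)
    if med == 0:
        raise ValueError("median count is zero; sample too sparse for this grid")
    return min(all_counts) / med
```

For a 250 Mpc/h box split into 50 Mpc/h subvolumes, the call would be `subvolume_count_ratio(positions, 250.0, 5)`. A catalog with an entirely missing subvolume returns 0.0, which is the signature of the bug reported in this issue.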

surhudm commented Aug 11, 2016

Hi Andrew,

The Tinker 2010 bias for BolshoiP is about 4 percent lower at z=0 for 1.E12 Msun/h halos (for 200m definition), which would result in an 8 percent difference. Can you check what the rough difference is for the virial definition?

Cheers,
Surhud
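
For reference, the step from a 4 percent difference in bias to an 8 percent difference in clustering can be written out as below. This is a sketch that assumes linear halo bias and, for simplicity, the same matter correlation function in both simulations; the superscripts BP and B (bolshoi-planck and bolshoi) are labels introduced here for illustration.

```latex
\xi_{hh}(r) \simeq b^2\,\xi_{mm}(r)
\quad\Longrightarrow\quad
\frac{\xi_{hh}^{\rm BP}(r)}{\xi_{hh}^{\rm B}(r)}
\simeq \left(\frac{b_{\rm BP}}{b_{\rm B}}\right)^{2}
\approx (1 - 0.04)^{2} \approx 0.92
```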

aphearin commented Aug 11, 2016

Many thanks for the sanity check, Surhud. I figured you had code for that Tinker bias estimate at the ready. That's slightly lower than what I'm seeing here, but close enough to chalk up the remaining residual to a combination of sample variance and fitting function error, so I'm not so worried about this anymore. I think this is convincing evidence that the v0.4 catalogs have been processed properly.

aphearin closed this Aug 11, 2016
