New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Problem: seed keyword in populate/populate_mock #672

johannesulf opened this Issue Feb 23, 2017 · 0 comments


None yet
2 participants
Copy link

johannesulf commented Feb 23, 2017

Generally, it's very handy to have a seed keyword argument that can be passed to the populate/populate_mock functions of the model factories. This makes populating a halo catalogue reproducible instead of stochastic. However, in its current implementation there's a very big risk for serious bugs. The problem is that the value of seed is never changed. Most importantly, different Monte-Carlo functions use the same seed when populate/populate_mock is called. This can lead to correlations between unrelated quantities and totally wrong science results.

Take the Tinker13Cens HOD occupation component. It has 2 important Monte-Carlo functions: mc_sfr_designation and mc_occupation. The code internally first determines a SFR designation for each halo, 'active' or 'quiescent' and then draws the occupation, 0 or 1, based on the SFR designation. Here, it becomes very obvious why using the same seed is highly problematic. The SFR designation is determined by drawing a random number between 0 and 1 for each halo in the model. The SFR designation 'quiescent' is assigned if that random number is below the mean quiescent fraction. Conversely, the occupation is determined by also drawing a random number between 0 and 1 for each halo and assigning a central if that number is below the mean occupation of centrals of that SFR designation.

Let's for simplicity assume that the mean occupation for active centrals, the mean occupation for quiescent centrals and the quiescent fraction are all 0.5. We first call mc_sfr_designation. Half the halos will have a random number below 0.5 and will be 'quiescent', while the other half with have a random number above 0.5 and will be 'active'. So far so good. However, when we now call mc_occupation with a seed supplied the random number for each halo will be the same. Particularly, all haloes that previously had a random number below 0.5, will have a "random" number below 0.5 again. Consequently, all 'quiescent' halos will have a central. Conversely, no 'active' halo will have a central.

I attached a short python script that demonstrates the above effect. Change the seed keyword argument in mock_populate from None to 1 and watch how the result changes.

To summarize, I think the current implementation is way too problematic. I would strongly urge to change the value of the seed keyword argument after every function call in populate. For example, simply increasing it by 1 should do the job.

@aphearin aphearin added this to the v0.5 milestone Feb 23, 2017

@aphearin aphearin self-assigned this Feb 23, 2017

aphearin added a commit to aphearin/halotools that referenced this issue Feb 24, 2017

@aphearin aphearin closed this in d4a5160 Feb 24, 2017

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment