## Mock Population Covariance with Fixed Subhalo Catalogs

This notebook illustrates a "feature" of populating mocks using subhalo catalogs with a fixed number of objects: fixing the total number of objects induces anti-correlations between the counts in different bins.

We'll use a relatively simple model: a catalog of 10 objects, each assigned independently to one of five bins with probabilities `[0.1, 0.2, 0.4, 0.2, 0.1]`.

In the first case, we'll fix the number of objects to 10. In the second case, we'll let that total number vary using a Poisson distribution with a mean of 10. 

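What we'll see is that in the first case there is a clear anti-correlation between the bins, whereas in the second case the correlation is approximately zero. In fact, the true correlation in the second case is exactly zero: when the total is Poisson distributed, the bin counts are independent Poisson variables with means `n_mn * p_i` (the Poisson splitting, or thinning, property), so the small nonzero values in the measured correlation matrix are sampling noise.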
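
One way to see why the covariance vanishes is the law of total covariance. The sketch below uses notation that is not in the notebook and is introduced here only for illustration: $N$ is the total number of objects, $\lambda$ is its mean in the Poisson case (`n_mn` in the code), $p_i$ is the probability of bin $i$, and $h_i$ is the count in bin $i$. For fixed $N$ the counts are multinomial, so for $i \neq j$

$$
\mathrm{Var}(h_i) = N p_i (1 - p_i), \qquad
\mathrm{Cov}(h_i, h_j) = -N p_i p_j, \qquad
\mathrm{Corr}(h_i, h_j) = -\sqrt{\frac{p_i\, p_j}{(1 - p_i)(1 - p_j)}} .
$$

If instead $N$ is random, conditioning on $N$ gives

$$
\begin{aligned}
\mathrm{Cov}(h_i, h_j)
&= \mathrm{E}\!\left[\mathrm{Cov}(h_i, h_j \mid N)\right]
 + \mathrm{Cov}\!\left(\mathrm{E}[h_i \mid N],\, \mathrm{E}[h_j \mid N]\right) \\
&= -p_i p_j\, \mathrm{E}[N] + p_i p_j\, \mathrm{Var}(N) \\
&= p_i p_j \left(\mathrm{Var}(N) - \mathrm{E}[N]\right) .
\end{aligned}
$$

For a Poisson total, $\mathrm{Var}(N) = \mathrm{E}[N] = \lambda$, so the covariance is exactly zero. The same conditioning gives $\mathrm{Var}(h_i) = \lambda p_i (1 - p_i) + p_i^2 \lambda = \lambda p_i$, which is the variance of a Poisson variable with mean $\lambda p_i$.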

In [1]:
import numpy as np

In [2]:
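def sim_sample(rng, n_mn=10, pvals=(0.1, 0.2, 0.4, 0.2, 0.1), poisson_tot=True):
    """Draw the per-bin counts for one mock realization."""
    if poisson_tot:
        # total number of objects fluctuates around n_mn
        n = int(rng.poisson(lam=n_mn))
    else:
        # total number of objects is fixed at n_mn
        n = int(n_mn)

    # assign each of the n objects to a bin with probabilities pvals
    return rng.multinomial(n, pvals)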

In [3]:
rng = np.random.RandomState(seed=100)
hs = np.array([
    sim_sample(rng, poisson_tot=False)
    for _ in range(100000)
])

print("correlation matrix total fixed:")
print(np.corrcoef(hs.T))
print("mean total counts:", hs.sum(axis=1).mean())
print("std total counts:", hs.sum(axis=1).std())
print("mean counts:", hs.mean(axis=0))
print("std counts:", hs.std(axis=0))

correlation matrix total fixed:
[[ 1.         -0.16675678 -0.27026197 -0.16831935 -0.10741925]
 [-0.16675678  1.         -0.41093529 -0.24903522 -0.16360385]
 [-0.27026197 -0.41093529  1.         -0.40545936 -0.27528423]
 [-0.16831935 -0.24903522 -0.40545936  1.         -0.17071062]
 [-0.10741925 -0.16360385 -0.27528423 -0.17071062  1.        ]]
mean total counts: 10.0
std total counts: 0.0
mean counts: [0.99856 2.00447 3.99759 1.99982 0.99956]
std counts: [0.94513381 1.26538137 1.55027875 1.26517982 0.95129375]
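
As a cross-check on the fixed-total numbers above, the standard multinomial results can be evaluated directly: the correlation between bins `i` and `j` is `-sqrt(p_i p_j / ((1 - p_i)(1 - p_j)))` and the standard deviation of bin `i` is `sqrt(n p_i (1 - p_i))`. The helper `multinomial_corr` below is a hypothetical name introduced for this sketch, not part of the notebook.

```python
import numpy as np


def multinomial_corr(pvals):
    """Analytic correlation matrix of multinomial counts (fixed total).

    Hypothetical helper for illustration; the result does not depend
    on the total number of objects.
    """
    p = np.asarray(pvals, dtype=float)
    # off-diagonal entries: -sqrt(p_i p_j / ((1 - p_i)(1 - p_j)))
    odds = np.sqrt(p / (1.0 - p))
    corr = -np.outer(odds, odds)
    np.fill_diagonal(corr, 1.0)
    return corr


pvals = np.array([0.1, 0.2, 0.4, 0.2, 0.1])
n_fixed = 10

corr_fixed = multinomial_corr(pvals)
std_fixed = np.sqrt(n_fixed * pvals * (1.0 - pvals))

print("analytic correlation matrix total fixed:")
print(np.round(corr_fixed, 4))
print("analytic std counts:", np.round(std_fixed, 4))
```

The printed values can be compared entry by entry with the measured correlation matrix and `std counts` above; they should agree to within the sampling noise of 100000 realizations.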


In [4]:
rng = np.random.RandomState(seed=100)
hs = np.array([
    sim_sample(rng, poisson_tot=True)
    for _ in range(100000)
])

print("correlation matrix total varies:")
print(np.corrcoef(hs.T))
print("mean total counts:", hs.sum(axis=1).mean())
print("std total counts:", hs.sum(axis=1).std())
print("mean counts:", hs.mean(axis=0))
print("std counts:", hs.std(axis=0))

correlation matrix total varies:
[[ 1.00000000e+00 -4.91487366e-03  1.47399865e-03 -3.31234180e-03
   2.01407877e-03]
 [-4.91487366e-03  1.00000000e+00 -3.90916015e-03 -3.77722769e-03
   2.40426177e-03]
 [ 1.47399865e-03 -3.90916015e-03  1.00000000e+00  4.34238386e-03
  -4.48634761e-03]
 [-3.31234180e-03 -3.77722769e-03  4.34238386e-03  1.00000000e+00
   5.42309807e-04]
 [ 2.01407877e-03  2.40426177e-03 -4.48634761e-03  5.42309807e-04
   1.00000000e+00]]
mean total counts: 10.00473
std total counts: 3.1581493991101817
mean counts: [0.9942  2.00269 4.00372 2.00302 1.0011 ]
std counts: [1.00054303 1.41623542 2.00134109 1.41288035 1.00059922]
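
If the bin counts in the Poisson case really are independent Poisson variables with means `n_mn * p_i`, then drawing each bin separately, with no multinomial step and no shared total, should reproduce the statistics above. The following is a minimal sketch of that check; the variable names `hs_indep`, `corr_indep`, and `off_diag` are introduced here for illustration. It also prints `1 / sqrt(n_samples)`, which is roughly the size of the scatter expected in a sample correlation coefficient of uncorrelated variables.

```python
import numpy as np

pvals = np.array([0.1, 0.2, 0.4, 0.2, 0.1])
n_mn = 10
n_samples = 100000

rng = np.random.RandomState(seed=100)

# draw every bin independently from its own Poisson distribution
hs_indep = rng.poisson(lam=n_mn * pvals, size=(n_samples, pvals.size))

corr_indep = np.corrcoef(hs_indep.T)
# select the off-diagonal entries of the correlation matrix
off_diag = corr_indep[~np.eye(pvals.size, dtype=bool)]

print("max |off-diagonal correlation|:", np.abs(off_diag).max())
print("rough noise scale 1/sqrt(n_samples):", 1.0 / np.sqrt(n_samples))
print("mean total counts:", hs_indep.sum(axis=1).mean())
print("std counts:", hs_indep.std(axis=0))
print("expected std sqrt(n_mn * p):", np.sqrt(n_mn * pvals))
```

The measured `std counts` in the Poisson-total run above are close to `sqrt(n_mn * p)`, that is `[1, 1.414, 2, 1.414, 1]`, and its off-diagonal correlations are a few times `1e-3`, which is the level expected from noise alone.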
