# Derivation of Covariance of Weighted Power

Consider an EoR simulation (say from 21cmFAST), without any noise, but having been put through an instrumental setup (i.e. sampled onto chromatic baselines, and then re-gridded). The log-likelihood of a set of physical parameters $\theta$ is then

$$ \mathcal{L}(\theta|P_{\rm obs}) = -\frac{1}{2}\sum_{u,\eta} \frac{(P(u,\eta, \theta) - P^{u,\eta}_{\rm obs})^2}{\sigma^2_P(u,\eta,\theta)}. $$

Here, $P(u,\eta,\theta)$ and $\sigma^2_P(u,\eta,\theta)$ are determined uniquely by the model and the two scales involved. However, we do not have access to these true values, which is why we run a simulation in the first place. Rather, we have an estimate of each, which we shall call $\bar{P}_{u,\eta}(\theta)$ and $s^2_{u,\eta}(\theta)$ (from here on we'll drop the explicit dependence on $\theta$).
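As a minimal sketch of how the expression above could be evaluated once the model power, the observed power and the variance are available as arrays on the same $(u, \eta)$ grid: the function name `gaussian_log_likelihood` and the toy numbers are purely illustrative, not part of any existing code.

```python
import numpy as np


def gaussian_log_likelihood(p_model, p_obs, sigma2):
    """Return -0.5 * sum((model - obs)^2 / variance) over all (u, eta) cells."""
    p_model = np.asarray(p_model, dtype=float)
    p_obs = np.asarray(p_obs, dtype=float)
    sigma2 = np.asarray(sigma2, dtype=float)
    return -0.5 * np.sum((p_model - p_obs) ** 2 / sigma2)


# Toy 2x2 grid in (u, eta); the values are made up for illustration.
p_model = np.array([[10.0, 8.0], [6.0, 4.0]])
p_obs = np.array([[11.0, 8.0], [5.0, 4.0]])
sigma2 = np.array([[1.0, 2.0], [4.0, 1.0]])

lnl = gaussian_log_likelihood(p_model, p_obs, sigma2)
```

In practice `p_model` and `sigma2` would be replaced by the estimates $\bar{P}$ and $s^2$ discussed next.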

Now, it is not entirely clear how to construct the likelihood when the true $P$ and $\sigma^2$ are unknown. Some work has gone into this, especially in [Sellentin & Heavens (2016)](https://arxiv.org/pdf/1511.05969.pdf). However, it deals only with an unknown $\sigma^2$, not an unknown mean. Apparently people are working on this currently. Regardless of whether we use updated forms or simply replace the true values with their estimated counterparts, we need to calculate the estimated values.

One obvious way of calculating the estimated values would be to run several simulations per iteration of the MCMC, and calculate them directly from that ensemble. However, this is computationally expensive. Fortunately, we do have information about the distribution of $P$ within a single simulation, as we form $\bar{P}_{u,\eta}$ by summing over cells in the UV plane whose radius $\sqrt{u'^2 + v'^2}$ falls in the radial bin labelled $u$. All of these cells are assumed to be statistically equivalent, and therefore tell us something about the distribution of their sum. In what follows, we determine how to calculate these two necessary quantities from the simulation data.

We note that each grid cell, $P_{u,v,\eta}$, has an associated weight (from the number of baselines which contribute to it). These weights are calculated for the visibilities, so **there is a possibility that their squares should be used when dealing with power**. Restricting ourselves to one radial bin in the UV plane, which has $N$ cells within it (thus dropping the dependence on $u$ and $\eta$), and considering each of the cells to be indexed by a single number within this bin, we have that

$$ \bar{P} = \frac{1}{\sum w_i} \sum w_i P_i \equiv \frac{1}{V_1} \sum w_i P_i. $$

This estimator is unbiased: every cell has the same expectation $E[P_i] = \mu$, so $E[\bar{P}] = \frac{1}{V_1}\sum w_i E[P_i] = \mu$.
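The weighted mean above is a one-liner in NumPy. The sketch below assumes the cells of a single radial bin have already been flattened into 1D arrays `power` and `weights` (both names, and the values, are illustrative).

```python
import numpy as np


def weighted_mean(power, weights):
    """Weighted mean of the cells in one bin: sum(w_i * P_i) / V1."""
    power = np.asarray(power, dtype=float)
    weights = np.asarray(weights, dtype=float)
    v1 = weights.sum()  # V1 = sum of the weights
    return np.sum(weights * power) / v1


# Three cells in one bin, with made-up power values and baseline counts.
power = np.array([2.0, 4.0, 6.0])
weights = np.array([1.0, 2.0, 1.0])

p_bar = weighted_mean(power, weights)
```

This gives the same result as `np.average(power, weights=weights)`; writing it out keeps $V_1$ explicit for the variance calculation that follows.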

**TODO: remember to do the MSE; we need to know how wrong $\bar{P}$ could be.**

Now, we need to calculate the estimated variance. We can at once write down

$$ {\rm Var}(\bar{P}) = \frac{1}{V_1^2} \sum w_i^2 {\rm Var}(P_i), $$

since the weights are not stochastic (we know them if we know the instrument) and the cells are assumed to be uncorrelated (otherwise covariance terms would appear). Moreover, ${\rm Var}(P_i)$ is constant with respect to $i$ because the cells are statistically equivalent. We call this $s_p^2$, and write

$$ {\rm Var}(\bar{P}) = \frac{s_p^2 \sum w_i^2}{V_1^2} \equiv \frac{s_p^2 V_2}{V_1^2}. $$

Here we can pause and consider the case that all the weights are unity. Then we get that ${\rm Var}(\bar{P}) = s^2_p/N$, which is just the usual standard error on the mean.
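A short sketch of this scaling, taking the per-cell variance $s_p^2$ as given (here an arbitrary number) and checking the unit-weight limit numerically. The function name is illustrative.

```python
import numpy as np


def var_of_weighted_mean(s2_p, weights):
    """Variance of the weighted mean: s_p^2 * V2 / V1^2."""
    weights = np.asarray(weights, dtype=float)
    v1 = weights.sum()          # V1 = sum of weights
    v2 = np.sum(weights ** 2)   # V2 = sum of squared weights
    return s2_p * v2 / v1 ** 2


s2_p = 3.0  # arbitrary per-cell variance, for illustration only

# Unit weights over N = 12 cells: reduces to s_p^2 / N.
var_unit = var_of_weighted_mean(s2_p, np.ones(12))

# Uneven weights: V1 = 6, V2 = 14, so the result is 3 * 14 / 36.
var_uneven = var_of_weighted_mean(s2_p, [1.0, 2.0, 3.0])
```

One way to read the result is that $V_1^2 / V_2$ plays the role of an effective number of cells, which equals $N$ only when all the weights are equal.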

However, we still need to know how to calculate $s_p^2$. 

To calculate the weighted $s_p^2$, we follow [the Wikipedia article on weighted variance](https://en.wikipedia.org/wiki/Weighted_arithmetic_mean#Reliability_weights). I have checked this calculation analytically, and it is correct when an *estimate* of the weighted mean is used (i.e. $\bar{P}$ rather than the true mean $\mu$).

That is, we use

$$ s_p^2 = \frac{\sum_i w_i (P_i - \bar{P})^2}{V_1 - \frac{V_2}{V_1}}. $$
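The estimator above can be sketched and sanity-checked as follows. The check draws many independent toy "bins" with a known variance and confirms that the average of $s_p^2$ recovers it. The Gaussian draws, the random integer weights and all names are assumptions made for the test only (real power values are not Gaussian, but the check only relies on the cells being uncorrelated with a common mean and variance).

```python
import numpy as np


def weighted_variance(power, weights):
    """Estimate s_p^2 = sum(w_i (P_i - P_bar)^2) / (V1 - V2 / V1)."""
    power = np.asarray(power, dtype=float)
    weights = np.asarray(weights, dtype=float)
    v1 = weights.sum()
    v2 = np.sum(weights ** 2)
    p_bar = np.sum(weights * power) / v1  # estimated weighted mean
    return np.sum(weights * (power - p_bar) ** 2) / (v1 - v2 / v1)


# With unit weights this reduces to the usual sample variance (ddof=1).
cells = np.array([2.0, 4.0, 6.0, 8.0])
s2_unit = weighted_variance(cells, np.ones(cells.size))

# Monte Carlo check with fixed, non-uniform weights and true variance 4.
rng = np.random.default_rng(0)
n_cells, n_trials, true_var = 50, 5000, 4.0
weights = rng.integers(1, 10, size=n_cells).astype(float)
draws = rng.normal(loc=5.0, scale=np.sqrt(true_var), size=(n_trials, n_cells))
mean_s2 = np.mean([weighted_variance(row, weights) for row in draws])
```

Here `mean_s2` should sit close to `true_var`, whereas dividing by $V_1$ alone would give a value that is systematically too low.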