In [None]:
from resources.workspace import *

The ensemble approach is an approximation to Bayesian inference. Instead of computing the full posterior distributions, we try to generate ensembles (samples) from them.

An ensemble is an *iid* sample, i.e. a set of "members" ("particles", or "sample points") that have been drawn ("sampled") independently from the same distribution. With the EnKF, these assumptions are generally tenuous, but pragmatic.

Ensembles can be used to characterize uncertainty: either by reconstructing (estimating) the distribution from which they are assumed to be drawn, or by computing various *statistics* such as the mean, median, variance, covariance, skewness, confidence intervals, etc. (any function of the ensemble can be seen as a "statistic"). This is illustrated by the code below.

In [None]:
mu  = 0
P   = 25    
P12 = sqrt(P)

xx = linspace(-20,20,201)
plt.plot(xx,ss.norm.pdf(xx,mu,sqrt(P)),label="True");

M = 1   # length of state vector
N = 100 # ensemble size
E = mu + P12*randn((N,M))

plt.hist(E,density=True,bins=max(10,N//30),label="Histogram estimate")  # `normed` was removed from matplotlib; `density` replaces it
plt.plot(xx,ss.norm.pdf(xx,np.mean(E),sqrt(np.var(E))),label="Parametric estimate")
plot_ensemble(E)
plt.legend();
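Besides the density estimates plotted above, the statistics listed earlier can be computed directly from an ensemble. The sketch below uses plain NumPy (`np.random.default_rng`) rather than the workspace's `randn`, and the seed, mean, standard deviation and ensemble size are hypothetical illustrative values. Note that here `E` is a 1D array of length `N`.

```python
import numpy as np

rng = np.random.default_rng(3000)        # hypothetical seed, for reproducibility
mu, sigma, N = 0.0, 5.0, 1000            # illustrative mean, std. dev. and ensemble size
E = mu + sigma * rng.standard_normal(N)  # N iid draws from N(mu, sigma^2)

# Each of the following is a "statistic", i.e. a function of the ensemble
mean     = np.mean(E)
median   = np.median(E)
variance = np.var(E, ddof=1)              # normalized by N-1
lo, hi   = np.percentile(E, [2.5, 97.5])  # empirical 95% interval

print(f"mean={mean:.2f}, median={median:.2f}, variance={variance:.1f}")
print(f"95% interval: [{lo:.1f}, {hi:.1f}]")
```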

**Exc 2:** Which approximation to the true pdf looks better: the histogram or the parametric one?   
Does one approximation actually start with more information? The EnKF takes advantage of this.

**Exc 4*:** Suppose the histogram bars get normalized (divided) by the value of the pdf at their location. How do you expect the resulting histogram to look?

**Exc 6:** Multivariate Gaussian sampling.
Suppose $\mathbf{z}$ is a standard Gaussian,
i.e. $p(\mathbf{z}) = \mathcal{N}(\mathbf{z} \mid 0,\mathbf{I}_M)$,
where $\mathbf{I}_M$ is the $M$-dimensional identity matrix.  
Let $\mathbf{x} = \mathbf{L}\mathbf{z} + \mathbf{b}$. 
Recall [Exc 3.1](T3%20-%20Univariate%20Kalman%20filtering.ipynb#Exc-3.1:),
which yields $p(\mathbf{x}) = \mathcal{N}(\mathbf{x} \mid \mathbf{b}, \mathbf{L}\mathbf{L}^T)$.
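The covariance part of this result, $\mathrm{Cov}(\mathbf{L}\mathbf{z}) = \mathbf{L}\mathbf{L}^T$, can be checked numerically. The sketch below takes $\mathbf{b} = 0$, uses plain NumPy instead of the tutorial's `randn`, and uses an arbitrary, hypothetical $2 \times 2$ matrix `L` and seed.

```python
import numpy as np

rng = np.random.default_rng(42)    # hypothetical seed
L = np.array([[2.0, 0.0],
              [1.0, 0.5]])         # arbitrary illustrative matrix
N = 200_000                        # large sample, so the estimate is accurate

Z = rng.standard_normal((2, N))    # each column is a draw of z ~ N(0, I_2)
X = L @ Z                          # each column is a draw of x = L z (here b = 0)

print(L @ L.T)                     # theoretical covariance, [[4, 2], [2, 1.25]]
print(np.cov(X))                   # sample covariance, should be close to it
```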
    
 * (a). $\mathbf{z}$ can be sampled using `randn((M,1))`. How (where) is `randn` defined?
 * (b). Consider the above definition of $\mathbf{x}$ and the code below.
 Complete it so as to generate a random realization of $\mathbf{x}$.  
 Hint: matrix-vector multiplication can be done using the symbol `@`. 

In [None]:
M   = 3 # ndim
b   = 10*ones(M)
P   = diag(1+arange(M))
L   = np.linalg.cholesky(P)
print("True mean and cov:")
print(b)
print(P)

### INSERT ANSWER (b) ###

In [None]:
#show_answer('Gaussian sampling a')

In [None]:
#show_answer('Gaussian sampling b')

 * (c). Now sample $N = 100$ realizations of $\mathbf{x}$
 and collect them in an $M$-by-$N$ "ensemble matrix" $\mathbf{E}$.  
 The main thing to figure out here is how to add the mean vector to every column of the ensemble matrix.
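For reference, NumPy's broadcasting rules decide how a vector of length $M$ combines with an $M$-by-$N$ matrix. The sketch below uses small hypothetical sizes and a zero placeholder ensemble `Z`; it only shows the shape mechanics, not the sampling itself.

```python
import numpy as np

M, N = 3, 5                 # small illustrative sizes
b = 10 * np.ones(M)         # mean vector, shape (M,)
Z = np.zeros((M, N))        # placeholder ensemble, shape (M, N)

# b[:, None] has shape (M, 1), which NumPy broadcasts across the N columns.
# Plain `Z + b` would instead align b with the last axis (length N),
# which fails here because M != N.
E = Z + b[:, None]
print(E.shape)              # (3, 5)
```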

In [None]:
N  = 100 # ensemble size

### INSERT ANSWER (c) ###

# Use the code below to assess whether you got it right
x_bar = np.mean(E,axis=1)
P_bar = np.cov(E)
print("Estimated mean and cov:")
with printoptions(precision=1):
    print(x_bar)
    print(P_bar)
plt.matshow(P_bar,cmap="Blues"); plt.grid(False);

In [None]:
#show_answer('Gaussian sampling c')

**Exc 8*:** How erroneous are the ensemble estimates on average?
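One way to get a feel for this is a repeated-experiment sketch. It looks only at the error of the sample mean, assumes the true covariance is the identity (so the theoretical RMS error is $1/\sqrt{N}$), and uses a hypothetical seed and number of trials.

```python
import numpy as np

rng = np.random.default_rng(7)    # hypothetical seed
M, n_trials = 3, 500              # state dimension, number of repeated experiments
b = 10 * np.ones(M)               # true mean; the true covariance is taken as I here

rmses = {}
for N in [10, 100, 1000]:
    # n_trials independent M-by-N ensembles drawn from N(b, I)
    E = b[None, :, None] + rng.standard_normal((n_trials, M, N))
    x_bar = E.mean(axis=2)                           # ensemble means, shape (n_trials, M)
    rmses[N] = np.sqrt(np.mean((x_bar - b) ** 2))    # RMS error of the mean estimate
    print(f"N={N:4d}: RMS error {rmses[N]:.3f}, 1/sqrt(N) = {1/np.sqrt(N):.3f}")
```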

In [None]:
#show_answer('Average sampling error')

**Exc 10:** Given the previous ensemble matrix $\mathbf{E}$, compute its sample mean $\overline{\mathbf{x}}$ and sample covariance matrix $\overline{\mathbf{P}}$. Formulae are provided by eqn (2.9) of the [theoretical companion](./resources/DA_intro.pdf#page=11):
$$ \begin{aligned}
\overline{\mathbf{x}} &= \frac{1}{N}   \sum_{n=1}^N \mathbf{x}_n \\
\overline{\mathbf{P}} &= \frac{1}{N-1} \sum_{n=1}^N (\mathbf{x}_n - \overline{\mathbf{x}}) (\mathbf{x}_n - \overline{\mathbf{x}})^T
\end{aligned} $$
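A direct, loop-based transcription of these two sums could look like the sketch below. The function name `sample_mean_and_cov_loop` and the test ensemble are hypothetical; NumPy's `mean` and `cov` appear only in the final consistency check, not in the estimator itself.

```python
import numpy as np

def sample_mean_and_cov_loop(E):
    """Sample mean and covariance of the columns of E (shape M-by-N), via explicit sums."""
    M, N = E.shape
    x_bar = np.zeros(M)
    for n in range(N):
        x_bar += E[:, n]
    x_bar /= N

    P_bar = np.zeros((M, M))
    for n in range(N):
        d = E[:, n] - x_bar        # anomaly of member n
        P_bar += np.outer(d, d)    # (x_n - x_bar)(x_n - x_bar)^T
    P_bar /= N - 1
    return x_bar, P_bar

rng = np.random.default_rng(0)     # hypothetical seed
E = rng.standard_normal((3, 50))   # illustrative 3-by-50 ensemble
x_bar, P_bar = sample_mean_and_cov_loop(E)
print(np.allclose(x_bar, E.mean(axis=1)), np.allclose(P_bar, np.cov(E)))  # True True
```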

In [None]:
# Don't use numpy's mean, cov
def estimate_mean_and_cov(E):
    M, N = E.shape
    
    ### INSERT ANSWER ###
    
    return x_bar, P_bar

x_bar, P_bar = estimate_mean_and_cov(E)
print(x_bar)
print(P_bar)

In [None]:
#show_answer('ensemble moments')

**Exc 12:** Why does the covariance computation normalize by $(N-1)$ rather than $N$?
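A Monte Carlo sketch can illustrate the effect (it does not prove it): average the two variance estimators over many small ensembles drawn from $\mathcal{N}(0, 1)$. The ensemble size, number of trials and seed are hypothetical choices; a small $N$ makes the difference visible.

```python
import numpy as np

rng = np.random.default_rng(1)           # hypothetical seed
N, n_trials = 5, 200_000                 # small ensembles, many repetitions

X = rng.standard_normal((n_trials, N))   # n_trials ensembles of size N, true variance 1
var_N   = X.var(axis=1, ddof=0).mean()   # normalized by N, averaged over trials
var_Nm1 = X.var(axis=1, ddof=1).mean()   # normalized by N-1, averaged over trials

print(f"1/N:     {var_N:.3f}  (compare (N-1)/N = {(N-1)/N:.3f})")
print(f"1/(N-1): {var_Nm1:.3f}  (compare 1)")
```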

In [None]:
#show_answer('Why (N-1)')

**Exc 14:** Like Matlab, Python (numpy) is quicker if you "vectorize" loops. This is eminently possible with computations of ensemble moments. 
 * (a). Let $\mathbf{A} = \begin{bmatrix} \mathbf{x}_1 - \overline{\mathbf{x}} & \ldots & \mathbf{x}_n - \overline{\mathbf{x}} & \ldots & \mathbf{x}_N - \overline{\mathbf{x}} \end{bmatrix}$.
Show that $\overline{\mathbf{P}} = \mathbf{A} \mathbf{A}^T /(N-1)$.
 * (b). Code up this formula for $\overline{\mathbf{P}}$ and insert it in `estimate_mean_and_cov(E)`.
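A vectorized version of the covariance formula might be sketched as follows, with a hypothetical random ensemble whose members are columns. The mean is formed with a sum rather than `np.mean`, and `np.cov` is used only to check the result.

```python
import numpy as np

rng = np.random.default_rng(0)        # hypothetical seed
M, N = 4, 30                          # illustrative sizes
E = rng.standard_normal((M, N))       # ensemble, members as columns

x_bar = E.sum(axis=1) / N             # sample mean, shape (M,)
A = E - x_bar[:, None]                # anomaly matrix: column n is x_n - x_bar
P_bar = A @ A.T / (N - 1)             # vectorized sample covariance

print(np.allclose(P_bar, np.cov(E)))  # True
```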

In [None]:
#show_answer('ensemble moments vectorized')

**Exc 16:** Implement the cross-covariance estimator $\overline{\mathrm{Cov}(\mathbf{x}^1,\mathbf{x}^2)} = \frac{1}{N-1} \sum_{n=1}^N (\mathbf{x}^1_n - \overline{\mathbf{x}^1}) (\mathbf{x}_n^2 - \overline{\mathbf{x}^2})^T$. If you can, use a vectorized form similar to Exc 14a. 
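One possible vectorized sketch is shown below. The function name `cross_cov_sketch` is hypothetical, and it assumes that columns with the same index in `E1` and `E2` are paired members. As a check, the result should equal the off-diagonal block of the sample covariance of the stacked ensemble.

```python
import numpy as np

def cross_cov_sketch(E1, E2):
    """Sample cross-covariance of ensembles E1 (M1-by-N) and E2 (M2-by-N)."""
    N = E1.shape[1]
    A1 = E1 - E1.mean(axis=1, keepdims=True)   # anomalies of E1
    A2 = E2 - E2.mean(axis=1, keepdims=True)   # anomalies of E2
    return A1 @ A2.T / (N - 1)                 # shape (M1, M2)

rng = np.random.default_rng(0)                 # hypothetical seed
E1 = rng.standard_normal((2, 40))
E2 = rng.standard_normal((3, 40))
C = cross_cov_sketch(E1, E2)

# Off-diagonal block of the joint sample covariance of the stacked ensemble
print(np.allclose(C, np.cov(np.vstack([E1, E2]))[:2, 2:]))   # True
```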

In [None]:
def estimate_cross_cov(E1,E2):
    ### INSERT ANSWER ###
    return CC  # CC: your estimate of the cross-covariance matrix

In [None]:
#show_answer('estimate cross')

**Exc 18*:**
 * (a). What's the difference between error and residual?
 * (b). What's the difference between error and bias?
 * (c). Show `MSE = RMSE^2 = Bias^2 + Var`
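The decomposition can be checked numerically on any estimator. The sketch below repeats an experiment many times, using the $1/N$ variance estimator (chosen because it is biased) on samples from $\mathcal{N}(0,1)$. Here `var` is the spread of the estimates over the repeated experiments, normalized by the number of experiments; the seed and sizes are hypothetical.

```python
import numpy as np

rng = np.random.default_rng(2)     # hypothetical seed
true_var, N, K = 1.0, 5, 100_000   # estimand, ensemble size, number of repeated experiments

# K repeated estimates of the variance, using the biased (1/N) estimator
estimates = rng.standard_normal((K, N)).var(axis=1, ddof=0)

errors = estimates - true_var
mse  = np.mean(errors**2)
bias = np.mean(errors)
var  = np.var(estimates)           # spread of the estimates (1/K normalization)

print(f"MSE = {mse:.4f},  Bias^2 + Var = {bias**2 + var:.4f}")
```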

In [None]:
#show_answer('errors')

**Exc 20*:** Suppose $\mathbf{x}$ is $M$-dimensional and has a covariance matrix $\mathbf{B}$.
 * (a). What's the size of $\mathbf{B}$?
 * (b). How many "flops" (approximately, i.e. to leading order) are required to solve the "weighted average" form of the KF update equation, eqn (A.16a) of the [DA intro](resources/DA_intro.pdf#page=29)?
 * (c). How much memory (bytes) is required to hold its covariance matrix $\mathbf{B}$?
 * (d). How many megabytes is this if $M$ is a million?
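As a worked arithmetic example for (c) and (d), assume $\mathbf{B}$ is stored densely in double precision (8 bytes per entry) and take 1 MB $= 10^6$ bytes. Exploiting symmetry would roughly halve the figure but not change its order of magnitude.

```python
import numpy as np

M = 10**6                                        # state dimension
bytes_per_float = np.dtype(np.float64).itemsize  # 8 bytes for double precision
n_bytes = M**2 * bytes_per_float                 # B is M-by-M
print(n_bytes / 1e6, "MB")                       # 8000000.0 MB, i.e. about 8 TB
```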

In [None]:
#show_answer('Cov memory')

This is one of the principal reasons why the basic extended KF is infeasible for DA. Although not developed here, the EnKF avoids the explicit computation of covariance matrices, working instead with reduced-rank square roots.
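To illustrate the storage argument only (this is not the EnKF update itself): if the sample covariance is written as $\overline{\mathbf{P}} = \mathbf{A}\mathbf{A}^T$, where here $\mathbf{A}$ denotes the $M$-by-$N$ anomaly matrix scaled by $1/\sqrt{N-1}$, then a product $\overline{\mathbf{P}}\mathbf{v}$ can be computed as $\mathbf{A}(\mathbf{A}^T\mathbf{v})$ in $O(MN)$ operations and memory, without ever forming the $M$-by-$M$ matrix. The sizes and seed are hypothetical; $M$ is kept modest so that the explicit matrix can be formed for comparison.

```python
import numpy as np

rng = np.random.default_rng(0)          # hypothetical seed
M, N = 1000, 20                         # large-ish state, small ensemble
E = rng.standard_normal((M, N))         # illustrative ensemble
A = (E - E.mean(axis=1, keepdims=True)) / np.sqrt(N - 1)   # scaled anomalies: P_bar = A A^T
v = rng.standard_normal(M)

# Product with P_bar without forming the M-by-M matrix
Pv_lowrank = A @ (A.T @ v)

# Reference using the explicit M-by-M matrix (feasible only because M is modest here)
Pv_full = (A @ A.T) @ v
print(np.allclose(Pv_lowrank, Pv_full))   # True
print(np.linalg.matrix_rank(A))           # at most N-1, since the anomalies sum to zero
```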

### Next: [Writing your own EnKF](T8%20-%20Writing%20your%20own%20EnKF.ipynb)