# Organising my thoughts around what this project is trying to achieve, and how

### 1. Project aim

The overarching aim of this project is to demonstrate that a common block-bootstrapping approach for inferring correlation skill in multi-year forecast assessment is often not fit for purpose. The block-bootstrapping approach for a given set of N validation cases (pairs of observations and E ensemble hindcasts) is as follows:

1. Randomly sample with replacement the N validation cases. To take autocorrelation into account, this is done in blocks of five consecutive cases.

2. For each case, randomly sample with replacement E ensemble members.

3. Compute the Pearson correlation coefficient between the resampled observations and the ensemble mean of the resampled hindcasts.

4. Repeat steps (1)–(3) many times to create a probability distribution.

5. Calculate the p-value from a two-tailed test of the hypothesis that correlation is 0 (or alternatively, assign significance at $\alpha$ if the percentile-based confidence interval for $1-\alpha$ does not include 0).
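A minimal sketch of steps (1)–(5) follows, to make the procedure concrete. The function name `block_bootstrap_pvalue`, the array layout (`obs` of shape `(N,)`, `hcst` of shape `(N, E)`), the non-circular treatment of blocks and the particular two-tailed p-value formula are all assumptions made for illustration; published implementations differ in these details.

```python
import numpy as np

def block_bootstrap_pvalue(obs, hcst, n_iter=1000, block=5, seed=None):
    """Hypothetical sketch: obs has shape (N,), hcst has shape (N, E).

    Returns (p, r), where r holds the bootstrapped correlations and p is a
    two-tailed p-value for the hypothesis that the correlation is 0.
    """
    rng = np.random.default_rng(seed)
    n, e = hcst.shape
    n_blocks = int(np.ceil(n / block))
    r = np.empty(n_iter)
    for i in range(n_iter):
        # Step 1: draw block start indices with replacement, expand each
        # start into `block` consecutive cases, and trim to N cases.
        starts = rng.integers(0, n - block + 1, size=n_blocks)
        cases = (starts[:, None] + np.arange(block)).ravel()[:n]
        # Step 2: for each resampled case, draw E members with replacement.
        members = rng.integers(0, e, size=(n, e))
        ens_mean = hcst[cases[:, None], members].mean(axis=1)
        # Step 3: correlate resampled observations with the ensemble mean.
        r[i] = np.corrcoef(obs[cases], ens_mean)[0, 1]
    # Step 5 (assumed form): twice the smaller fraction of the bootstrap
    # distribution lying on either side of zero, capped at 1.
    p = min(1.0, 2.0 * min(np.mean(r <= 0), np.mean(r >= 0)))
    return p, r
```

With this assumed form, rejecting at $\alpha$ when `p < alpha` is close to checking whether the percentile-based $1-\alpha$ interval of `r` excludes 0.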

This approach can substantially underestimate p-values (i.e. infer significant skill too often) because the series being correlated are generally very short (< 50 years) and have higher serial correlation than the block-bootstrapping approach can account for, due to:

- the nature of the process being assessed
- ensemble averaging
- long temporal averages (e.g. often 8-year means are assessed)
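As a rough illustration of the last point, the sketch below applies an 8-point rolling mean to white noise and compares lag-1 autocorrelations. White noise is an assumption chosen purely for simplicity (real climate series are not white), and the helper `lag1_autocorr` is hypothetical. For white noise, a rolling mean of window $w$ has a theoretical lag-1 autocorrelation of $(w-1)/w$, so 8-year means are strongly serially correlated even when the underlying annual values are not.

```python
import numpy as np

def lag1_autocorr(x):
    """Sample lag-1 autocorrelation of a 1-D series."""
    x = x - x.mean()
    return float(np.sum(x[:-1] * x[1:]) / np.sum(x * x))

rng = np.random.default_rng(0)
white = rng.standard_normal(100_000)  # assumed: uncorrelated annual values

window = 8  # e.g. 8-year means
rolled = np.convolve(white, np.ones(window) / window, mode="valid")

raw_r1 = lag1_autocorr(white)      # expected to be close to 0
rolled_r1 = lag1_autocorr(rolled)  # expected to be close to (window - 1) / window
```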

### 2. Project approach

My approach has been to test how inadequate the block-bootstrap approach can be. The answer to this question depends, of course, on:

- the nature of the process being assessed
- the number of ensemble members averaged
- the length of the temporal averaging
- the length of the signals being correlated

To test the block-bootstrap, I generate many pairs of observations and ensemble hindcasts that are constructed to be independent of each other (i.e., they are generated such that the null hypothesis of zero correlation is true). 

I then test what proportion of pairs reject the null hypothesis using the block-bootstrapping approach with $\alpha = 0.05$. A well-calibrated statistical test should reject about 5% of the pairs (Tim had a nice suggestion to use a binomial distribution to put confidence intervals on this).
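One possible way to implement the binomial suggestion is sketched below. If the test is well calibrated and the pairs are independent, the number of rejections out of `n_pairs` follows a binomial distribution with success probability $\alpha$, so the central quantiles of that distribution give the range of rejection rates consistent with a correct test. The function name `expected_rejection_range` and the choice of a central 95% range are assumptions for illustration.

```python
from math import exp, lgamma, log

def expected_rejection_range(n_pairs, alpha=0.05, level=0.95):
    """Central `level` range of the rejection rate under Binomial(n_pairs, alpha).

    The pmf is evaluated in log space so that large n_pairs do not overflow.
    """
    tail = (1.0 - level) / 2.0
    cdf = 0.0
    lower = None
    for k in range(n_pairs + 1):
        log_pmf = (
            lgamma(n_pairs + 1) - lgamma(k + 1) - lgamma(n_pairs - k + 1)
            + k * log(alpha) + (n_pairs - k) * log(1.0 - alpha)
        )
        cdf += exp(log_pmf)
        # Lower bound: smallest count whose cumulative probability reaches `tail`.
        if lower is None and cdf >= tail:
            lower = k
        # Upper bound: smallest count whose cumulative probability reaches 1 - tail.
        if cdf >= 1.0 - tail:
            return lower / n_pairs, k / n_pairs
    return lower / n_pairs, 1.0

lo, hi = expected_rejection_range(1000)
```

An observed rejection rate above `hi` would then indicate that the block-bootstrap is rejecting the null hypothesis more often than a test at $\alpha = 0.05$ should.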

### 3. Generation of independent pairs

Key requirements for the process of generating the observations and ensemble hindcasts are:

1. For each pair, the observations and hindcasts are independent of each other

2. The pairs are independent of each other

3. Lots of pairs can be generated

4. The observation series look like real observed/reanalysis data (e.g. similar autocorrelation functions)

5. The ensemble hindcast series look like real climate model data (e.g. similar autocorrelation functions)

It would be great if the process also provides the ability to readily test different climate processes, ensemble sizes, temporal averages, signal lengths etc.
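To make these requirements concrete, here is one hypothetical generator. It assumes that both the observations and each hindcast member can be stood in for by independent AR(1) processes, which is a simplification rather than the method settled on in these notes; matching real autocorrelation functions would need parameters (or a richer model) fitted to real data. The names `ar1`, `rolling_mean`, `generate_null_pair`, `phi_obs` and `phi_hcst` are all illustrative.

```python
import numpy as np

def ar1(n, phi, rng, size=()):
    """Stationary AR(1) series (|phi| < 1) along the last axis, shape size + (n,)."""
    x = np.empty(size + (n,))
    # Draw the first value from the stationary distribution.
    x[..., 0] = rng.standard_normal(size) / np.sqrt(1.0 - phi**2)
    for t in range(1, n):
        x[..., t] = phi * x[..., t - 1] + rng.standard_normal(size)
    return x

def rolling_mean(x, window):
    """Rolling mean along the last axis; output length is n - window + 1."""
    c = np.cumsum(x, axis=-1)
    c = np.concatenate([np.zeros(x.shape[:-1] + (1,)), c], axis=-1)
    return (c[..., window:] - c[..., :-window]) / window

def generate_null_pair(n_cases, n_members, window, phi_obs, phi_hcst, rng):
    """Return obs (n_cases,) and hcst (n_cases, n_members), independent by construction."""
    n_raw = n_cases + window - 1  # extra points so the averaged series has n_cases
    obs = rolling_mean(ar1(n_raw, phi_obs, rng), window)
    members = ar1(n_raw, phi_hcst, rng, size=(n_members,))
    hcst = rolling_mean(members, window).T
    return obs, hcst

rng = np.random.default_rng(0)
obs, hcst = generate_null_pair(
    n_cases=40, n_members=10, window=8, phi_obs=0.5, phi_hcst=0.5, rng=rng
)
```

Because every call draws fresh random numbers, pairs are independent of each other and any number can be generated. The arguments `n_cases`, `n_members`, `window` and the `phi` values map onto signal length, ensemble size, temporal averaging and (crudely) the nature of the process.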

