# Module 27: FWER Correction

## This module gives more detail on how to correct for the family-wise error rate (FWER)

The FWER is the probability of making one or more Type I errors in a family of tests when the null hypothesis is true. Recall that a Type I error is rejecting the null hypothesis when it is actually true. Popular methods for controlling the FWER are:
<ul>
<li>Bonferroni correction
<li>Random Field Theory
<li>Permutation Tests
</ul>

### Bonferroni Correction

Let $H_{0i}$ be the hypothesis that there is no activation in voxel $i$, where $i\in V=\{1,\ldots,m\}$ and $m$ is the total number of voxels in the image.

Let $T_i$ be the value of the test statistic at voxel $i$.

The family-wise null hypothesis, $H_0$, states that there is no activation in any of the $m$ voxels across the brain:
$$H_0 = \underset{i\in V}{\cap}H_{0i}$$

which states that $H_{0i}$ has to be true at every voxel $i$. If we reject the null hypothesis $H_{0i}$ at even a single voxel, then we reject the family-wise null hypothesis. Therefore a false positive at any voxel gives a Family-Wise Error (FWE).

We want to control the probability of falsely rejecting the family-wise null hypothesis at a chosen level $\alpha$, so that
$$P(\underset{i\in V}{\cup}\{T_i\geq u\}|H_0)\leq \alpha$$

Here we are saying that the probability that the test statistic $T_i$ exceeds some threshold $u$ at any voxel, given that the null hypothesis is true, is bounded by $\alpha$, a value we get to set. If $T_i$ is above $u$ at some voxel we reject the null hypothesis, which is a mistake here because $H_0$ is true.

To do this we choose the threshold so that
$$P(T_i\geq u|H_0)\leq\frac{\alpha}{m}$$

where $m$ is the total number of voxels.

Therefore,
$$FWER = P(\underset{i\in V}{\cup}\{T_i\geq u\}|H_0)$$
and by Boole's inequality,
$$\leq \sum_iP(T_i\geq u|H_0)$$
and finally,
$$\leq \sum_i\frac{\alpha}{m}=\alpha$$
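The bound above holds for any dependence between voxels. As a quick numerical check, here is a minimal sketch comparing the Boole bound with the exact FWER in the special case where the $m$ tests are independent (an assumption made only for this illustration; the function name `exact_fwer_independent` is hypothetical).

```python
def exact_fwer_independent(alpha, m):
    """Exact FWER when each of m independent tests is run at level alpha / m."""
    per_test_level = alpha / m
    # P(at least one rejection) = 1 - P(no rejection in any of the m tests)
    return 1 - (1 - per_test_level) ** m


alpha = 0.05
for m in (10, 1_000, 10_000):
    boole_bound = m * (alpha / m)  # sum of the m per-test levels, equal to alpha
    exact = exact_fwer_independent(alpha, m)
    print(f"m={m:>6}  Boole bound={boole_bound:.4f}  exact (independent)={exact:.4f}")
```

Under independence the exact FWER sits slightly below $\alpha$, so the Bonferroni threshold is only mildly conservative in that case.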

<img src='uexample.png'>

In the example above each pixel is generated from a standard normal distribution, and we set $u$ to 1.645, the 95th percentile of that distribution. If $T_i$ is above 1.645 the pixel is white; otherwise it is black. Even though there is no true signal, about 5% of the pixels exceed the threshold, so we get many false positives.

To control the FWER at 0.05 across the 10,000 pixels, the Bonferroni-corrected per-pixel level is 0.05/10,000 = 0.000005, which corresponds to $u$ = 4.42. With this higher threshold we get no false positives in this image.
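The two thresholds quoted above can be reproduced with a short sketch. It assumes one-sided tests on standard normal statistics and simulates its own null image of 10,000 pixels; the seed and variable names are arbitrary choices for illustration, not the code that produced the figure.

```python
import random
from statistics import NormalDist

alpha = 0.05
m = 10_000  # number of pixels in the example image

std_normal = NormalDist()  # standard normal: mean 0, standard deviation 1
u_uncorrected = std_normal.inv_cdf(1 - alpha)     # about 1.645
u_bonferroni = std_normal.inv_cdf(1 - alpha / m)  # about 4.42

# Simulate a pure-noise image (no true activation anywhere).
rng = random.Random(0)  # fixed seed so the run is repeatable
null_image = [rng.gauss(0.0, 1.0) for _ in range(m)]

# Every pixel above the threshold is a false positive, because H0 is true.
n_uncorrected = sum(t >= u_uncorrected for t in null_image)
n_bonferroni = sum(t >= u_bonferroni for t in null_image)

print(f"uncorrected u = {u_uncorrected:.3f}: {n_uncorrected} false positives")
print(f"Bonferroni  u = {u_bonferroni:.2f}: {n_bonferroni} false positives")
```

The uncorrected count should land near 5% of the pixels, while the Bonferroni count is zero in most simulated images.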

<b>Note</b>: This means that on average no more than 5 out of every 100 datasets generated in this fashion will have one or more values above $u$. This is a very stringent way to control false positives, and if there is actual activation in the data, it becomes very hard to detect.

#### Summary

The Bonferroni correction is very conservative and results in very strict significance levels. It <i>decreases</i> the power of the test (the probability of detecting a true activation if one exists) and greatly increases the chance of false negatives.

Lastly, it does not account for correlated data, and most fMRI data has significant spatial correlation, meaning the number of independent tests is fewer than the number of voxels.
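To see how correlation makes Bonferroni conservative, here is a small Monte Carlo sketch. It uses a toy model in which every pair of statistics shares the same correlation `rho`, which is a simplifying assumption and not a realistic model of fMRI spatial correlation; `simulate_fwer` and its parameters are hypothetical names.

```python
import math
import random
from statistics import NormalDist


def simulate_fwer(m, rho, alpha=0.05, n_sim=4000, seed=0):
    """Estimate the FWER of the Bonferroni threshold on simulated null data.

    Each dataset holds m unit-variance normal statistics whose pairwise
    correlation is rho, built from one shared component plus own noise.
    """
    u = NormalDist().inv_cdf(1 - alpha / m)  # Bonferroni threshold
    rng = random.Random(seed)
    shared_weight = math.sqrt(rho)
    own_weight = math.sqrt(1 - rho)
    n_family_wise_errors = 0
    for _ in range(n_sim):
        shared = rng.gauss(0.0, 1.0)
        statistics_i = (
            shared_weight * shared + own_weight * rng.gauss(0.0, 1.0)
            for _ in range(m)
        )
        # A family-wise error occurs if any statistic crosses the threshold.
        if any(t >= u for t in statistics_i):
            n_family_wise_errors += 1
    return n_family_wise_errors / n_sim


fwer_rho0 = simulate_fwer(m=50, rho=0.0)
fwer_rho09 = simulate_fwer(m=50, rho=0.9)
print(f"independent tests (rho=0.0): estimated FWER = {fwer_rho0:.3f}")
print(f"correlated tests  (rho=0.9): estimated FWER = {fwer_rho09:.3f}")
```

With strong correlation the estimated FWER falls well below the nominal 0.05, which is the sense in which the threshold is stricter than it needs to be.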

### Random Field Theory

Random Field Theory allows us to incorporate the spatial correlation of our data in the calculation of the appropriate threshold. It is based on approximating the distribution of the <b>maximum statistic</b> over the whole image.

#### Maximum Statistic

The link between the FWER and the maximum statistic is:
$$FWER = P(FWE)$$

$$=P(\underset{i\in V}{\cup}\{T_i\geq u\}|H_0)$$ which is the probability that any t-value exceeds $u$ given $H_0$
$$=P(\max_i T_i\geq u|H_0)$$ which is the probability that the maximum t-value exceeds $u$ given $H_0$

so we can choose the threshold $u$ such that the maximum exceeds it only $100\alpha\%$ of the time under $H_0$.
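One way to make this concrete is to simulate the null distribution of the maximum and read off its upper $\alpha$ quantile. The sketch below assumes independent standard normal statistics purely so that the simulated threshold can be checked against a known exact value; it is not the Random Field Theory approximation itself, and the function name is hypothetical.

```python
import math
import random
from statistics import NormalDist


def max_stat_threshold(m, alpha=0.05, n_sim=2000, seed=0):
    """Monte Carlo threshold u such that the null maximum exceeds u
    in at most a fraction alpha of the simulated datasets."""
    rng = random.Random(seed)
    maxima = sorted(
        max(rng.gauss(0.0, 1.0) for _ in range(m)) for _ in range(n_sim)
    )
    # Index of the empirical (1 - alpha) quantile of the simulated maxima.
    k = math.ceil((1 - alpha) * n_sim) - 1
    return maxima[k]


m, alpha = 100, 0.05
u_monte_carlo = max_stat_threshold(m, alpha)
# For independent tests, P(max < u) = Phi(u)^m, which gives an exact threshold.
u_exact = NormalDist().inv_cdf((1 - alpha) ** (1 / m))
u_bonferroni = NormalDist().inv_cdf(1 - alpha / m)

print(f"Monte Carlo max-statistic threshold: {u_monte_carlo:.2f}")
print(f"Exact threshold (independent tests): {u_exact:.2f}")
print(f"Bonferroni threshold:                {u_bonferroni:.2f}")
```

The same recipe works for correlated data as long as the null maxima can be simulated or approximated, which is the role Random Field Theory plays for smooth images.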

#### Random Fields

A random field is a set of random variables defined at every point in D-dimensional space.

A Gaussian random field has a Gaussian distribution at every point, and every finite <i>collection</i> of points is jointly (multivariate) Gaussian. A Gaussian random field is defined by its mean and covariance functions.

Using random field methods we are able to:
<ol>
<li>approximate the upper tail of the distribution of the maximum, which is the part needed to find the appropriate threshold, and simultaneously
<li>account for the spatial dependence in the data
</ol>

#### Euler Characteristic

The Euler Characteristic $\chi_u$
<ul>
<li>A property of an image <b>after</b> it has been thresholded
<li>Counts #blobs - #holes
<li>At high thresholds, just counts #blobs
</ul>
In the images below, the white blobs can be thought of as the peaks in the colorful random field image.<img src='randomfield.png'>

<img src='euler.png'>

When $u = 0.5$ we get $\chi_u$ = 28 blobs - 1 hole = 27. When $u=2.75$ we get 2 blobs, and 1 blob in the image below that.
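The blobs-minus-holes count can be computed directly from a thresholded image. Here is a minimal sketch on a tiny made-up field (not the field in the figures). It treats each above-threshold pixel as a closed unit square and uses $\chi = V - E + F$ (vertices, edges, faces), so pixels touching at a corner count as one blob; that connectivity convention is an assumption of this sketch.

```python
def threshold(field, u):
    """Binary image that is True wherever the field is at or above u."""
    return [[value >= u for value in row] for row in field]


def euler_characteristic(binary_image):
    """Euler characteristic (#blobs - #holes) of a 2D binary image."""
    vertices, edges, faces = set(), set(), 0
    for r, row in enumerate(binary_image):
        for c, is_on in enumerate(row):
            if not is_on:
                continue
            faces += 1
            top_left, top_right = (r, c), (r, c + 1)
            bottom_left, bottom_right = (r + 1, c), (r + 1, c + 1)
            vertices.update([top_left, top_right, bottom_left, bottom_right])
            # Shared edges between neighbouring pixels are stored only once.
            edges.update([
                (top_left, top_right),
                (bottom_left, bottom_right),
                (top_left, bottom_left),
                (top_right, bottom_right),
            ])
    return len(vertices) - len(edges) + faces


# Toy field: a ring with a shallow dip in its middle, plus two isolated peaks.
field = [
    [0.1, 0.2, 0.1, 0.0, 0.0, 2.0],
    [0.2, 1.0, 1.2, 1.1, 0.0, 0.0],
    [0.1, 1.3, 0.3, 1.0, 0.0, 3.0],
    [0.0, 1.1, 1.4, 1.2, 0.0, 0.0],
    [0.0, 0.0, 0.1, 0.0, 0.0, 0.0],
]

for u in (0.25, 0.5, 2.75):
    print(f"u = {u}: chi_u = {euler_characteristic(threshold(field, u))}")
# u = 0.25: chi_u = 3  (3 blobs, no holes)
# u = 0.5:  chi_u = 2  (3 blobs - 1 hole)
# u = 2.75: chi_u = 1  (1 blob)
```

At the highest threshold only one peak survives and there are no holes, so $\chi_u$ simply counts blobs.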

The link between the FWER and the Euler Characteristic is that:
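$$FWER = P(\max_i T_i\geq u|H_0)$$
$$=P(\text{one or more blobs}\ |H_0)$$
$$\approx P(\chi_u\geq 1|H_0)$$
$$\approx E(\chi_u|H_0)$$
which is the <b>expected</b> Euler Characteristic. The first approximation assumes that no holes exist at high thresholds, so that $\chi_u$ counts blobs. The second assumes that there is never more than one blob, so that $\chi_u$ is either 0 or 1 and its expectation equals $P(\chi_u\geq 1|H_0)$.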

So the FWER is approximately the expected Euler characteristic, and there are closed-form results for $E(\chi_u)$ for Z, t, F, and $\chi^2$ continuous random fields.

#### 3D Gaussian Random Fields

<img src='3gdrf.png'>

Using the result above, the FWER is approximately $E(\chi_u)$, and we can use this to choose $u$ so that the Family-Wise Error Rate is controlled.

<b>Properties</b>
<ul>
<li>As $u$ increases, FWER decreases (for large $u$)
<li>As the search volume V increases, FWER increases
<li>As smoothness increases (it appears in the denominator), FWER decreases
</ul>
<b>Note</b>: this last item means that if we have an image that is very smooth and adjacent voxels behave similarly, we'll have fewer independent tests.
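These properties can be checked numerically. The sketch below uses the commonly quoted high-threshold expression for a 3D Gaussian field, $E(\chi_u)\approx R\,(4\ln 2)^{3/2}(2\pi)^{-2}(u^2-1)e^{-u^2/2}$ with $R = V/\mathrm{FWHM}^3$ resels; we assume this matches the result in the figure, and we assume isotropic smoothness measured in voxels. The function and parameter names are hypothetical.

```python
import math


def expected_ec_3d(u, volume_voxels, fwhm_voxels):
    """Approximate expected Euler characteristic of a 3D Gaussian field at
    threshold u (assumed formula, isotropic FWHM given in voxels)."""
    resels = volume_voxels / fwhm_voxels ** 3  # smoothness sits in the denominator
    return (
        resels
        * (4 * math.log(2)) ** 1.5
        / (2 * math.pi) ** 2
        * (u ** 2 - 1)
        * math.exp(-(u ** 2) / 2)
    )


def rft_threshold(volume_voxels, fwhm_voxels, alpha=0.05, lo=2.0, hi=10.0):
    """Bisection for the u with E(chi_u) = alpha.

    The formula is decreasing in u for u > sqrt(3), so the root is unique on
    [lo, hi] provided E(chi_lo) > alpha > E(chi_hi), as in the cases below.
    """
    for _ in range(100):
        mid = (lo + hi) / 2
        if expected_ec_3d(mid, volume_voxels, fwhm_voxels) > alpha:
            lo = mid  # still too many expected blobs: raise the threshold
        else:
            hi = mid
    return (lo + hi) / 2


u_fwhm3 = rft_threshold(volume_voxels=100_000, fwhm_voxels=3)
u_fwhm6 = rft_threshold(volume_voxels=100_000, fwhm_voxels=6)
u_double_volume = rft_threshold(volume_voxels=200_000, fwhm_voxels=3)

print(f"V=100,000, FWHM=3 voxels: u = {u_fwhm3:.2f}")
print(f"V=100,000, FWHM=6 voxels: u = {u_fwhm6:.2f}")
print(f"V=200,000, FWHM=3 voxels: u = {u_double_volume:.2f}")
```

A smoother image needs a lower threshold for the same FWER, and a larger search volume needs a higher one, in line with the properties listed above.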

#### RFT Assumptions

1.) The entire image is either multivariate Gaussian or derived from multivariate Gaussian images (including Chi-squared, T, and F distributions)

2.) The statistical image must be sufficiently smooth to approximate a continuous random field. This requires the FWHM to be at least twice the voxel size, and a FWHM of 3-4 times the voxel size is preferable.

3.) The amount of smoothness is assumed to be known. In practice it is estimated, and the estimate is biased when images are not sufficiently smooth.

4.) Several layers of approximations must be made.