# Module 26: Multiple Comparison Problem in fMRI

### Localizing Activation

First, we construct a model for each voxel of the brain. This is the mass univariate approach using the GLM.

Second, we perform a statistical test at each voxel to determine whether task-related activation is present. The null hypothesis is that there is no activation in the voxel: $$H_0:c^T\beta =0$$

For example, we might test whether condition A - condition B is equal to 0. Recall that $c$ is the contrast vector that determines the hypothesis test. We can do this for every voxel in the brain to produce a t-map image like this:
<img src='tmap.png'>
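
As a reminder of what each voxel of such a map holds, the statistic for the contrast $c^T\beta$ can be written as below. This is a sketch under the usual GLM assumptions (ordinary least squares with independent, identically distributed Gaussian errors), which this module does not restate. The symbols $X$ (design matrix), $\hat{\beta}$ (least-squares estimate) and $\hat{\sigma}^2$ (residual variance estimate) are introduced here for illustration.

$$t = \frac{c^T\hat{\beta}}{\sqrt{\hat{\sigma}^2\, c^T (X^TX)^{-1} c}}$$

Under these assumptions and under $H_0$, this statistic follows a t-distribution with degrees of freedom equal to the number of time points minus the rank of $X$.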

Third, we must choose an appropriate threshold for determining statistical significance. We ask what cutoff in the t-map (equivalently, how small a p-value) counts as significant activation, and we keep only the voxels that meet or exceed that threshold:
<img src='sigtmap.png'>

But how do we determine this level?

### Hypothesis Testing

#### Basic nomenclature

<ul>
<li>Null Hypothesis: $H_0$ is a statement of no effect (e.g. $\beta_1-\beta_2 = 0$).
<li>Test Statistic: $T$ measures compatibility between the null hypothesis and the data.
<li>P-value: the probability that the test statistic would take a value as extreme as or more extreme than the one actually observed if $H_0$ is true, in other words $P(T>t|H_0)$. The smaller the p-value, the less compatible the data are with $H_0$.
<li>Significance Level: the threshold $u_\alpha$ controls the false positive rate at level $\alpha=P(T>u_\alpha |H_0)$. Often $\alpha = 0.05$ is chosen, meaning that when $H_0$ is true we expect a false positive about 5% of the time.
</ul>
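
The definitions above can be made concrete with a minimal sketch. It assumes, purely for illustration, that the test statistic is standard normal under $H_0$ (a z-statistic rather than the t-statistic of the GLM) and that the test is one-sided. The function names and the observed value `t_obs` are hypothetical.

```python
from statistics import NormalDist

# Assumption: under H0 the test statistic is standard normal (a z-statistic).
null_dist = NormalDist(mu=0.0, sigma=1.0)

def p_value(t_obs):
    """One-sided p-value P(T > t_obs | H0)."""
    return 1.0 - null_dist.cdf(t_obs)

def threshold(alpha):
    """Threshold u_alpha such that P(T > u_alpha | H0) = alpha."""
    return null_dist.inv_cdf(1.0 - alpha)

alpha = 0.05
u_alpha = threshold(alpha)   # about 1.645 for alpha = 0.05
t_obs = 2.3                  # hypothetical observed statistic

print(f"u_alpha = {u_alpha:.3f}")
print(f"p-value = {p_value(t_obs):.4f}")
print("reject H0" if t_obs > u_alpha else "fail to reject H0")
```

Comparing `t_obs` to `u_alpha` and comparing the p-value to `alpha` are two views of the same decision.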


#### Errors

<ul>
<li>Type I error: occurs when $H_0$ is true, but we mistakenly reject it (False Positive). By raising the threshold $u_\alpha$ (equivalently, lowering $\alpha$), we require stronger evidence before rejecting $H_0$, which helps guard against false positives.
<li>Type II error: occurs when $H_0$ is false, but we mistakenly fail to reject it (False Negative). 
</ul>

The probability that a hypothesis test will correctly reject a false null hypothesis is the <b>power</b> of the test. In other words this is the ability of the test to discern an effect if one truly exists.
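
The trade-off between the two error types can be sketched numerically. The example below assumes a one-sided z-test in which the statistic is distributed as $N(\delta, 1)$ when an effect is present. The effect size `delta` and the function name are hypothetical choices for illustration.

```python
from statistics import NormalDist

def power_one_sided_z(delta, alpha=0.05):
    """Power of a one-sided z-test when T ~ N(delta, 1) under the alternative."""
    u_alpha = NormalDist().inv_cdf(1.0 - alpha)
    # Probability that the statistic exceeds the threshold when the effect is real.
    return 1.0 - NormalDist(mu=delta, sigma=1.0).cdf(u_alpha)

# A stricter alpha guards against Type I errors but lowers power.
for alpha in (0.05, 0.001):
    print(f"alpha = {alpha}: power = {power_one_sided_z(delta=3.0, alpha=alpha):.3f}")
```

With `delta = 0` the power equals $\alpha$, since rejecting a true null is exactly a Type I error.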

#### Multiple Comparisons

Because we are dealing with a family of tests, choosing an appropriate threshold is complicated. Assume our $\alpha$ level is set at 0.05. If we performed only one hypothesis test, the risk of making a Type I error would be 5%. However, because many hypothesis tests are performed, each with a 5% chance of a Type I error when its null hypothesis is true, the risk of making at least one Type I error is greater than 5%. The more tests performed, the greater the likelihood of getting at least one false positive.
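
To see how quickly this risk grows, here is a minimal sketch. It assumes the tests are independent and that every null hypothesis is true, in which case the chance of at least one false positive among $m$ tests is $1-(1-\alpha)^m$. Independence is a simplifying assumption (neighboring voxels are typically correlated), and the function name is hypothetical.

```python
def fwer_independent(m, alpha=0.05):
    """P(at least one false positive) for m independent tests, all nulls true."""
    return 1.0 - (1.0 - alpha) ** m

for m in (1, 10, 100, 1000):
    print(f"m = {m:>4}: P(at least one false positive) = {fwer_independent(m):.4f}")
```

Even at ten tests, the chance of at least one false positive is already about 40%.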

If we have 100,000 voxels, none of them truly active, and we test each at $\alpha =0.05$, then we can expect about 5,000 false positives. We would deem these voxels to be active even though they are not.
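
This count can be checked with a small simulation. The sketch assumes every voxel is truly inactive and that the p-values are independent and uniform on $[0, 1]$ under $H_0$, as they are for a continuous test statistic. The seed is fixed only to make the run repeatable.

```python
import random

random.seed(0)  # fixed seed so the run is repeatable

n_voxels = 100_000
alpha = 0.05

# Under H0 a p-value from a continuous test statistic is uniform on [0, 1],
# so each null voxel is simulated by one uniform draw.
p_values = [random.random() for _ in range(n_voxels)]
n_false_positives = sum(p < alpha for p in p_values)

print(f"expected:  {alpha * n_voxels:.0f}")
print(f"simulated: {n_false_positives}")
```

The simulated count varies from run to run but stays close to 5,000.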

Choosing a threshold is a balance between:
<ul>
<li><b>Sensitivity</b>: the true positive rate and
<li><b>Specificity</b>: the true negative rate
</ul>

#### Quantifying False Positive Rate

Family-Wise Error Rate (FWER): this is the probability of making <b>any</b> false positives (at least one) across the whole family of tests.

False Discovery Rate (FDR): this is the expected proportion of false positives among the rejected tests.
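
The difference between the two quantities can be illustrated with a small simulation. This is a sketch under stated assumptions, not a correction procedure: the tests are independent one-sided z-tests, thresholded at an uncorrected $\alpha = 0.05$, with 900 null tests and 100 tests carrying a hypothetical effect of 3 standard errors. All names and numbers are illustrative.

```python
import random
from statistics import NormalDist

random.seed(1)  # fixed seed so the run is repeatable

def one_experiment(m_null=900, m_active=100, effect=3.0, alpha=0.05):
    """Return (false positives V, total rejections R) for one simulated map."""
    u_alpha = NormalDist().inv_cdf(1.0 - alpha)
    # Null tests: statistic ~ N(0, 1); any rejection is a false positive.
    v = sum(random.gauss(0.0, 1.0) > u_alpha for _ in range(m_null))
    # Active tests: statistic ~ N(effect, 1); rejections are true positives.
    s = sum(random.gauss(effect, 1.0) > u_alpha for _ in range(m_active))
    return v, v + s

results = [one_experiment() for _ in range(200)]

# FWER: fraction of experiments with at least one false positive.
fwer = sum(v > 0 for v, _ in results) / len(results)
# FDR: average of V / R, counting the ratio as 0 when nothing is rejected.
fdr = sum(v / max(r, 1) for v, r in results) / len(results)

print(f"estimated FWER = {fwer:.3f}")
print(f"estimated FDR  = {fdr:.3f}")
```

In this setup nearly every simulated map contains at least one false positive, so the estimated FWER is close to 1, while the estimated FDR reports what share of the rejections are false.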

More to come on these next.