# Module 29: Pitfalls and Multiple Comparisons

Imagine a situation where the simulated signal and noise look like this:
<img src='fdr1.png'>

We can compare the sensitivity and specificity we get here using no correction for multiple comparisons, using the FWER adjustment alone, and using the FDR adjustment:
<img src='fdr2.png'>

Without correction we get a lot of false positives, so these results are not very useful. With the FWER adjustment alone, we get only 1 false positive in 10 images, which is quite good, but the sensitivity (white inside the box) is also very low, so this is not that useful either.

Using the FDR control, we get lots of sensitivity AND a relatively low rate of false positives in the black outside the box.
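
The comparison above can be sketched in a few lines. This is a minimal simulation and not the code behind the figures: the image size, the location of the box, the effect size of 3, and $\alpha = 0.05$ are all assumptions chosen for illustration. Bonferroni stands in for the FWER adjustment and Benjamini-Hochberg stands in for the FDR adjustment.

```python
import numpy as np
from scipy.stats import norm

def bonferroni_mask(p, alpha=0.05):
    """FWER control: reject where p <= alpha / (number of tests)."""
    return p <= alpha / p.size

def bh_fdr_mask(p, alpha=0.05):
    """Benjamini-Hochberg FDR control over every entry of p."""
    ranked = np.sort(p.ravel())
    m = ranked.size
    # Largest rank k whose sorted p-value satisfies p_(k) <= alpha * k / m.
    passes = ranked <= alpha * np.arange(1, m + 1) / m
    if not passes.any():
        return np.zeros(p.shape, dtype=bool)
    cutoff = ranked[np.nonzero(passes)[0][-1]]
    return p <= cutoff

# Hypothetical simulation: unit-variance noise plus a square of true signal.
rng = np.random.default_rng(0)
truth = np.zeros((100, 100), dtype=bool)
truth[40:60, 40:60] = True                     # the "box" of truly active pixels
z = rng.standard_normal(truth.shape) + 3.0 * truth
p = norm.sf(z)                                 # one-sided p-values

masks = {
    "uncorrected": p <= 0.05,
    "bonferroni": bonferroni_mask(p),
    "fdr_bh": bh_fdr_mask(p),
}
for name, mask in masks.items():
    sensitivity = mask[truth].mean()           # fraction of the box detected
    false_pos = int((mask & ~truth).sum())     # detections outside the box
    print(f"{name:12s} sensitivity={sensitivity:.2f} false positives={false_pos}")
```

By construction, every pixel that survives Bonferroni also survives Benjamini-Hochberg, and every pixel that survives Benjamini-Hochberg is also significant without correction, so sensitivity can only go up as we move from FWER to FDR to no correction.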

### Cluster-level Inference

This process has two steps:
<ol>
<li>First we define clusters by an arbitrary threshold $u_{clus}$.
<li>Then we retain any clusters larger than the $\alpha$-level threshold $k_{\alpha}$.
</ol>

This might mean that we have a very small number of contiguous voxels that have very high test statistics, but because the cluster is so small, we will not retain it. We might also have a large group of contiguous voxels that all have test statistics just barely above this threshold, $u_{clus}$. We <i>will</i> retain this cluster.
<img src='cluster.png'>
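
The two steps can be sketched as follows. This is an illustrative toy, assuming `scipy.ndimage.label` with its default face-connectivity to define clusters; the function name `cluster_extent_threshold` and the values of `u_clus` and `k_alpha` are hypothetical. In a real analysis $k_{\alpha}$ would come from a null distribution of cluster sizes, not be picked by hand.

```python
import numpy as np
from scipy import ndimage

def cluster_extent_threshold(stat_map, u_clus, k_alpha):
    """Keep only suprathreshold clusters with more than k_alpha voxels."""
    above = stat_map > u_clus                  # step 1: cluster-defining threshold
    labels, _ = ndimage.label(above)           # contiguous clusters, 0 = background
    sizes = np.bincount(labels.ravel())        # voxels per label
    keep = sizes > k_alpha                     # step 2: extent threshold
    keep[0] = False                            # never keep the background
    return keep[labels]

# Toy statistic map with the two cases described above.
stat = np.zeros((20, 20))
stat[2, 2:4] = 8.0            # tiny cluster with very high statistics (2 voxels)
stat[10:15, 10:15] = 2.5      # large cluster barely above u_clus (25 voxels)

kept = cluster_extent_threshold(stat, u_clus=2.3, k_alpha=10)
print(kept[2, 2], kept[12, 12])   # False True
```

The small, strong cluster is discarded and the large, weak cluster is retained, which is exactly the trade-off described above.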

Clustering has better sensitivity for weak, distributed signals. However, because we are grouping voxels, spatial specificity is worse. Rejecting the null hypothesis for an entire cluster only means that one or more voxels in the cluster are active.

Basically all you can say is that there is activity somewhere in this big blob, but you can't say where the activity is. 

### Threshold-Free Cluster Enhancement (TFCE)

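TFCE combines cluster size and cluster intensity information. For each voxel, we integrate over a range of thresholds the threshold height times the extent of the cluster that contains the voxel at that threshold. Because the score is accumulated across many thresholds, we do not have to commit to a single cluster-defining threshold. The resulting enhanced statistic is then tested to see whether it is above some threshold.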

This can now be implemented using FSL's <i>randomise</i> tool.
<img src='tfce.png'>
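
To make the integral concrete, here is a rough discrete sketch for positive statistics. It is not FSL's implementation. It assumes the usual form of the score, $\mathrm{TFCE}(v) = \int e_v(h)^E \, h^H \, dh$, where $e_v(h)$ is the extent of the cluster containing voxel $v$ at threshold $h$. The exponents $E = 0.5$ and $H = 2$ are the commonly quoted defaults, and the step size `dh = 0.1` is an arbitrary choice for illustration.

```python
import numpy as np
from scipy import ndimage

def tfce(stat_map, dh=0.1, E=0.5, H=2.0):
    """Discrete approximation of the TFCE score for positive statistics."""
    scores = np.zeros(stat_map.shape, dtype=float)
    # Sweep the threshold h upward and accumulate extent^E * h^H * dh.
    for h in np.arange(dh, stat_map.max() + dh, dh):
        labels, _ = ndimage.label(stat_map >= h)
        extent = np.bincount(labels.ravel())   # voxels per cluster at this h
        extent[0] = 0                          # background contributes nothing
        scores += (extent[labels] ** E) * (h ** H) * dh
    return scores

# Same kind of toy map as before: a small strong cluster and a large weak one.
stat = np.zeros((20, 20))
stat[2, 2:4] = 8.0
stat[10:15, 10:15] = 2.5
scores = tfce(stat)
print(f"small strong cluster: {scores[2, 2]:.1f}")
print(f"large weak cluster:   {scores[12, 12]:.1f}")
```

Both clusters receive a nonzero score without any cluster-defining threshold being chosen. Deciding which scores are significant still requires a null distribution, which is typically built by permutation.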

### What are Publications Actually Using?

<img src='pubuse.png'>

(Woo et al. 2014)

### Uncorrected Thresholds

Many published PET and fMRI studies use arbitrary uncorrected thresholds (e.g. p<0.001). This is likely because the available sample sizes are small, so corrected thresholds are so stringent that power becomes very low.

However, this is problematic when interpreting conclusions from individual studies because many activated regions may be false positives. 

Further, null findings are hard to disseminate, so it is difficult to refute the false positives established in the literature.

##### Extent Threshold

Sometimes an arbitrary extent threshold is used where a voxel is only deemed truly active if it belongs to a cluster of k contiguous active voxels (e.g. p<0.001, 10 contiguous voxels). 

Unfortunately, this does not necessarily correct the problem because imaging data are spatially smooth and the chance of getting 10 false positives in an area may be quite high.

##### Example

<img src='extent.png'>
Activation maps with spatially correlated noise (like fMRI) thresholded at 3 different significance levels. Due to the smoothness, the false-positive activation regions are contiguous regions of multiple voxels. 

In other words, if one voxel is significant by chance, its neighbors are also likely to be significant by chance.

**The voxels are not independent.**

This is why it is not appropriate to impose arbitrary height and extent thresholds. The approach is too liberal and does not provide family-wise error rate control.
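
A small simulation illustrates the point. This is a sketch under stated assumptions, not a reproduction of the figure: the volume size (40 x 40 x 40 voxels), the smoothness (a Gaussian kernel with FWHM of 4 voxels), the number of simulations, and the function names are all hypothetical. The data are pure noise, so every surviving cluster is a false positive.

```python
import numpy as np
from scipy import ndimage
from scipy.stats import norm

def surviving_null_clusters(shape, fwhm_vox, p_height, k_extent, rng):
    """Count pure-noise clusters passing an uncorrected height + extent rule."""
    sigma = fwhm_vox / np.sqrt(8 * np.log(2))  # convert FWHM to Gaussian sigma
    noise = rng.standard_normal(shape)
    if sigma > 0:
        noise = ndimage.gaussian_filter(noise, sigma, mode="wrap")
    noise /= noise.std()                       # back to unit variance
    labels, _ = ndimage.label(noise > norm.isf(p_height))
    sizes = np.bincount(labels.ravel())[1:]    # drop the background label
    return int((sizes >= k_extent).sum())

def empirical_fwer(fwhm_vox, n_sims=20, seed=1):
    """Fraction of null volumes with at least one surviving cluster."""
    rng = np.random.default_rng(seed)
    hits = [
        surviving_null_clusters((40, 40, 40), fwhm_vox, 0.001, 10, rng) > 0
        for _ in range(n_sims)
    ]
    return float(np.mean(hits))

fwer_independent = empirical_fwer(fwhm_vox=0.0)   # no smoothing
fwer_smooth = empirical_fwer(fwhm_vox=4.0)        # spatially correlated noise
print(f"independent noise: {fwer_independent:.2f}")
print(f"smooth noise:      {fwer_smooth:.2f}")
```

With independent voxels, ten contiguous false positives at p<0.001 are essentially impossible, so the extent rule looks safe. Once the noise is smooth, null volumes regularly contain at least one cluster that passes the same rule.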

### Cluster-Extent Based Correction (including TFCE)

<img src='clusterthreshold.png'>
In the literature, people tend to use the threshold that is the default of whatever software package they are using.

So what is the impact of setting these thresholds so arbitrarily?

<img src='clusterbrain.png'>
Here we set a relatively liberal primary threshold, and we get two large blobs. All we really know is that at least one voxel somewhere in those blobs is truly significant. This severely limits the spatial specificity that we use fMRI for in the first place.

The problem illustrated above is a common one. If we set a liberal primary p-value threshold (0.01), the mean cluster size increases, and most findings will be larger than useful anatomical areas. Most of the voxels in these clusters are actually not active, which amounts to a really high (40-70%) false discovery rate.
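
The growth of clusters under a liberal primary threshold is easy to see even in pure noise. The sketch below is an illustration only: the volume size and the FWHM of 4 voxels are hypothetical, and it does not attempt to reproduce the 40-70% figure. It thresholds one smooth null volume at 0.01 and at 0.001 and compares the resulting cluster sizes.

```python
import numpy as np
from scipy import ndimage
from scipy.stats import norm

def cluster_sizes(z_map, p_primary):
    """Sizes (in voxels) of clusters above a one-sided primary threshold."""
    labels, _ = ndimage.label(z_map > norm.isf(p_primary))
    return np.bincount(labels.ravel())[1:]     # drop the background label

rng = np.random.default_rng(2)
sigma = 4.0 / np.sqrt(8 * np.log(2))           # hypothetical FWHM of 4 voxels
field = ndimage.gaussian_filter(
    rng.standard_normal((48, 48, 48)), sigma, mode="wrap"
)
field /= field.std()                           # unit-variance smooth null field

sizes_by_p = {p: cluster_sizes(field, p) for p in (0.01, 0.001)}
for p, sizes in sizes_by_p.items():
    mean_size = sizes.mean() if sizes.size else 0.0
    max_size = sizes.max() if sizes.size else 0
    print(f"p < {p}: {sizes.size} clusters, "
          f"mean size {mean_size:.1f}, max size {max_size}")
```

Every cluster found at 0.001 sits inside a cluster found at 0.01, so the liberal threshold can only make blobs larger, and a significant blob then says less about where the activity actually is.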

Setting the primary threshold at p<0.001 or stricter is good practice. With only a liberal primary threshold, the FWER is not controlled.
<img src='thresholding.png'>

It makes sense to set a threshold that does not span giant anatomical regions.

### Summary

<ul>
<li>Multiple comparison methods
    <ul>
    <li>Uncorrected
    <li>FWE
    <li>FDR
    <li>Cluster extent-based (FWE,FDR)
    </ul>
</ul>
<ul>
<li>Pitfalls
    <ul>
    <li>Uncorrected thresholds
    <li>Cluster extent-based thresholds
    </ul>
</ul>