# The Controversy of NHST
...

## Problems with NHST

### The Null Hypothesis is Implausible

### Dichotomisation of Evidence


### The Illusion of Obtaining Improbability
Another problem with NHST is that the logic almost universally used to draw a conclusion from a $p$-value is flawed. To see this, let us first examine a valid application of syllogistic reasoning. Consider the following example, adapted from [Cohen (1994)](https://www.sjsu.edu/faculty/gerstman/misc/Cohen1994.pdf):

1. If a person is a Martian then they are not a Member of Parliament.
2. This person is a Member of Parliament.
3. Therefore, this person is not a Martian.

Putting aside any personal paranoia about aliens invading our political system, this is a perfectly valid application of a *modus tollens* argument: the conclusion follows logically and irrefutably from the premises. However, a valid argument can still reach a *false* conclusion if any of its premises are false.

1. If a person is British then they are not a Member of Parliament.
2. This person is a Member of Parliament.
3. Therefore, this person is not British.

This argument is just as valid, but its conclusion is false because the first premise is wrong: *some* British people are Members of Parliament. We can make the premise more accurate by making it probabilistic.

1. If a person is British then it is unlikely that they are a Member of Parliament.
2. This person is a Member of Parliament.
3. Therefore, this person is unlikely to be British.

Unfortunately, the logic has now failed. Both premises are true (the vast majority of British people are not Members of Parliament, and this person is one), yet the conclusion is false, because nearly every Member of Parliament is British. The problem is that probabilistic premises do not behave like logical ones. Probability quantifies *belief* or *uncertainty*, rather than absolute truth, so it does not carry truth from premises to conclusion in the way formal logic does. A premise stating that something is "unlikely" always leaves room for exceptions, and an argument built on it can easily reach a false conclusion, as shown above.
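
Putting rough numbers on the argument makes the failure explicit. Suppose, purely for illustration, that there are $60$ million British people and $650$ Members of Parliament, all of whom are British (hypothetical round figures, not official statistics):

$$
P(\text{MP} \mid \text{British}) = \frac{650}{60\,000\,000} \approx 0.00001,
\qquad
P(\text{British} \mid \text{MP}) = \frac{650}{650} = 1.
$$

The first premise, a statement about $P(\text{MP} \mid \text{British})$, is true. The conclusion, however, is a claim about $P(\text{British} \mid \text{MP})$, a different quantity that the premise says nothing about.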

The error in assuming that someone is not British because they are a Member of Parliament should hopefully be clear. What is perhaps less obvious is that the same error is made in the following argument:

1. If $H_{0}$ is true, then it is unlikely we would have generated this result (i.e. the $p$-value will be small).
2. This result has been generated (i.e. we have calculated a small $p$-value).
3. Therefore, $H_{0}$ is unlikely to be true.

This has exactly the same logical form as the previous argument, and yet it *sounds* like a plausible line of reasoning. Nevertheless, like the previous argument, it is logically invalid. And yet this is the conclusion implicitly reached whenever a significant finding in the experimental psychology literature is taken as evidence against $H_{0}$. This is what [Cohen (1994)](https://www.sjsu.edu/faculty/gerstman/misc/Cohen1994.pdf) calls "the illusion of obtaining improbability".

Moving from step 2 to step 3 does not follow by deduction; it requires an *inductive leap*. This leap lies at the heart of NHST: we reject the possibility that we have observed something rare under $H_{0}$ and instead conclude that $H_{0}$ is false, which cannot be deduced logically. So although the mathematics of calculating a $p$-value is purely deductive, drawing conclusions about $H_{0}$ from a $p$-value is inductive. In other words, this form of inference:

- Is not logically valid,
- Is not guaranteed to be true, and
- Relies on assumptions about the world that go beyond the data and the mathematics.

Another way of thinking about this is that a probabilistic conclusion has been reached about $H_{0}$, yet we know that a $p$-value is *not* a probability statement about $H_{0}$. Instead, a $p$-value is a probability statement about the *data* (strictly, about data at least as extreme as those observed), calculated under the assumption that $H_{0}$ is true. In other words, $p = P(\mathcal{D}|H_{0}) \neq P(H_{0}|\mathcal{D})$. As such, the $p$-value alone tells us nothing about how likely or unlikely $H_{0}$ is, given the current data. Although $P(\mathcal{D}|H_{0})$ and $P(H_{0}|\mathcal{D})$ sound very similar, they are not the same quantity (as discussed further in the drop-down box below). If we have a significant $p$-value, we can conclude that our data would be unlikely if the null hypothesis were true. Nothing else. Because this calculation presupposes that the null is true, it *cannot* tell us anything about the probability of the null itself. So is the $p$-value actually telling us anything useful?
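
To make the distinction concrete, the sketch below simulates many hypothetical studies and estimates both quantities. It is a minimal illustration under loudly stated assumptions, not a model of any real literature: the proportion of studies in which $H_{0}$ is true (`prop_null_true = 0.9`), the true effect when $H_{0}$ is false (`effect = 0.5` standard deviations), the sample size (`n = 20`) and the use of a one-sample $z$-test with a known standard deviation are all hypothetical choices.

```python
import math
import random
from statistics import NormalDist, fmean

def z_test_p(sample, sigma=1.0):
    """Two-sided p-value for H0: mu = 0, assuming the population SD (sigma) is known."""
    z = fmean(sample) / (sigma / math.sqrt(len(sample)))
    return 2 * NormalDist().cdf(-abs(z))

def simulate_studies(n_studies=10_000, prop_null_true=0.9, effect=0.5,
                     n=20, alpha=0.05, seed=1):
    """Simulate hypothetical studies; return (P(sig | H0), P(H0 | sig))."""
    rng = random.Random(seed)
    null_true = significant = null_true_and_sig = 0
    for _ in range(n_studies):
        h0_true = rng.random() < prop_null_true   # is H0 true in this study?
        mu = 0.0 if h0_true else effect           # true population mean
        sample = [rng.gauss(mu, 1.0) for _ in range(n)]
        sig = z_test_p(sample) < alpha
        null_true += h0_true
        significant += sig
        null_true_and_sig += h0_true and sig
    # P(sig | H0): condition on studies where H0 is true (as the p-value does)
    # P(H0 | sig): condition on studies that produced a significant result
    return null_true_and_sig / null_true, null_true_and_sig / significant

p_sig_given_h0, p_h0_given_sig = simulate_studies()
print(f"P(significant | H0 true) = {p_sig_given_h0:.3f}")
print(f"P(H0 true | significant) = {p_h0_given_sig:.3f}")
```

Under these assumptions, $P(\text{significant} \mid H_{0})$ should land close to $\alpha = 0.05$, while $P(H_{0} \mid \text{significant})$ comes out many times larger. The second quantity depends on how often the null is true in the first place and on the power of the test, neither of which the $p$-value knows about.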


```{epigraph}
What's wrong with NHST? Well, among many other things, it does not tell us what we want to know, and we
so much want to know what we want to know that, out of desperation, we nevertheless believe that it does! 

-- Jacob Cohen
```

```{admonition} Why $P(\mathcal{D}|H_{0}) \neq P(H_{0}|\mathcal{D})$
:class: tip, dropdown
It can help to study an example to understand why $P(\mathcal{D}|H_{0})$ and $P(H_{0}|\mathcal{D})$ are *not* the same quantity. This example is taken from [Cohen (1994)](https://www.sjsu.edu/faculty/gerstman/misc/Cohen1994.pdf) and concerns the results of a new test for Schizophrenia. Based on testing 1,000 individuals randomly selected from the whole population, we have:

| Result                 | Normal    | Schiz  | Total  |
| :--------------------- | --------- | ------ | -----: |
| Negative test (Normal) | 949       | 1      | 950    |
| Positive test (Schiz)  | 30        | 20     | 50     |
| Total                  | 979       | 21     | 1000   |


To put this into the framework of NHST, let us then assume:

- $H_{0}$ = An individual is "normal"
- $H_{1}$ = An individual has Schizophrenia
- $\mathcal{D}$ = The test result is positive for Schizophrenia

From here, let us see what the $p$-value tells us. Remembering that $p = P(\mathcal{D}|H_{0})$, we are conditioning on $H_{0}$ being true. For the table above, that means looking only at the *Normal* column. Based on this column alone, what is the probability that the result is positive? Here, $30$ of the $979$ "normal" individuals tested positive, which gives a $p$-value of $30/979 \approx 0.031$. Using the conventional threshold of $\alpha = 0.05$, we would call this a *significant* result. In other words, if the individual is "normal", the chance of getting a positive test result is small. By the usual convention, we would therefore *reject the null hypothesis* that the individual is "normal".

Now let us see what $P(H_{0}|\mathcal{D})$ tells us. This time, we are conditioning on the data we have obtained. Because these data indicate a *positive* test result, we look only at the *Positive test* row of the table. Here, we want to know: out of all individuals who received a positive test result, how likely is it that they are "normal"? In this instance, $30$ of the $50$ individuals who received a positive test are "normal", which gives a probability of $30/50 = 0.60$. So, far from the null hypothesis being unlikely, it is actually more likely than not that the null hypothesis is *true*, given the data we have.

In this example, the large difference comes from the low base rate of Schizophrenia (only $21$ of the $1000$ individuals tested). If we pluck a random individual off the street, it is unlikely that they have Schizophrenia. Even if they test positive, they are still more likely to be "normal" ($30$ of $50$) than to have Schizophrenia ($20$ of $50$). However, if we know ahead of time that we have only selected individuals *without* Schizophrenia, the chance of the test coming back positive is very low. This is the difference between conditioning on the data and conditioning on the null. It illustrates why the two probability statements are not interchangeable, and why a $p$-value cannot tell you anything about the probability of the null: the calculation of the $p$-value *presupposes that the null is true*.
```
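
The arithmetic in the drop-down box above can be checked directly from the table counts. The sketch below is a minimal Python illustration (the dictionary layout and variable names are hypothetical choices, not anything from Cohen's paper). It computes both conditional probabilities and confirms that they are linked by Bayes' theorem, $P(H_{0}|\mathcal{D}) = P(\mathcal{D}|H_{0})\,P(H_{0}) / P(\mathcal{D})$, which makes explicit that getting from one to the other requires the base rates $P(H_{0})$ and $P(\mathcal{D})$, neither of which a $p$-value contains.

```python
from fractions import Fraction

# Cell counts from Cohen's (1994) table: (test result, true status) -> count
counts = {
    ("negative", "normal"): 949,
    ("negative", "schiz"): 1,
    ("positive", "normal"): 30,
    ("positive", "schiz"): 20,
}
total = sum(counts.values())                                                  # 1000
n_normal = counts[("negative", "normal")] + counts[("positive", "normal")]    # 979
n_positive = counts[("positive", "normal")] + counts[("positive", "schiz")]   # 50

# P(D | H0): restrict to the Normal column, then ask how many tested positive
p_d_given_h0 = Fraction(counts[("positive", "normal")], n_normal)
# P(H0 | D): restrict to the Positive test row, then ask how many are normal
p_h0_given_d = Fraction(counts[("positive", "normal")], n_positive)

# Bayes' theorem links the two via the base rates P(H0) and P(D)
p_h0 = Fraction(n_normal, total)
p_d = Fraction(n_positive, total)
assert p_d_given_h0 * p_h0 / p_d == p_h0_given_d

print(f"P(D | H0) = {float(p_d_given_h0):.3f}")  # 0.031
print(f"P(H0 | D) = {float(p_h0_given_d):.3f}")  # 0.600
```

Exact fractions are used so that the Bayes' theorem check holds without any floating-point rounding.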

### Erroneous Scientific Reasoning
- Taking rejection of the null as evidence for some other preferred hypothesis.
- Statistical versus practical significance.
- Evidence of causality.
- The difference between significant and non-significant is not itself significant.
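
The last point can be sketched with two hypothetical studies (the estimates and standard errors below are invented for illustration, and a normal approximation is assumed throughout). Study A estimates an effect of 25 with a standard error of 10, and study B estimates 10 with a standard error of 10. A is significant and B is not, yet a direct test of the difference between them is nowhere near significant.

```python
import math
from statistics import NormalDist

def two_sided_p(estimate, se):
    """Two-sided p-value for H0: true value = 0, using a normal approximation."""
    z = estimate / se
    return 2 * NormalDist().cdf(-abs(z))

# Hypothetical study results: (estimate, standard error)
est_a, se_a = 25.0, 10.0
est_b, se_b = 10.0, 10.0

p_a = two_sided_p(est_a, se_a)
p_b = two_sided_p(est_b, se_b)

# Difference between two independent estimates: the SEs combine in quadrature
diff = est_a - est_b
se_diff = math.sqrt(se_a**2 + se_b**2)
p_diff = two_sided_p(diff, se_diff)

print(f"Study A:   p = {p_a:.3f}")     # 0.012
print(f"Study B:   p = {p_b:.3f}")     # 0.317
print(f"A minus B: p = {p_diff:.3f}")  # 0.289
```

Declaring that A "found an effect" and B "did not" therefore implies a difference between the studies that the data do not support.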

## The ASA Statement on $p$-values

In 2016, the American Statistical Association (ASA) published a formal statement on $p$-values, built around six key principles:

1. P-values can indicate how incompatible the data are with a specified statistical model.
2. P-values do not measure the probability that the studied hypothesis is true, or the probability that the data were produced by random chance alone.
3. Scientific conclusions and business or policy decisions should not be based only on whether a p-value passes a specific threshold.
4. Proper inference requires full reporting and transparency.
5. A p-value, or statistical significance, does not measure the size of an effect or the importance of a result.
6. By itself, a p-value does not provide a good measure of evidence regarding a model or hypothesis.
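
Principle 5 is easy to demonstrate. In the hypothetical sketch below (Python standard library only; the effect sizes, sample sizes and the known-SD one-sample $z$-test are illustrative assumptions), a trivially small effect measured on a very large sample yields a far smaller $p$-value than a large effect measured on a small sample.

```python
import math
from statistics import NormalDist

def z_test_p(mean_diff, sd, n):
    """Two-sided one-sample z-test p-value, assuming the population SD is known."""
    z = mean_diff / (sd / math.sqrt(n))
    return 2 * NormalDist().cdf(-abs(z))

# Hypothetical scenario 1: a trivially small effect measured on a huge sample
p_tiny_effect = z_test_p(mean_diff=0.02, sd=1.0, n=100_000)

# Hypothetical scenario 2: a large effect measured on a tiny sample
p_large_effect = z_test_p(mean_diff=0.5, sd=1.0, n=10)

print(f"effect = 0.02 SD, n = 100000: p = {p_tiny_effect:.2g}")
print(f"effect = 0.50 SD, n = 10:     p = {p_large_effect:.3f}")
```

In this sketch the $p$-value mostly tracks the sample size: the negligible effect is "highly significant" while the effect 25 times larger is not significant at all, which is why a $p$-value cannot stand in for the size or importance of an effect.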

Taken together, these principles highlight why basing much of our scientific knowledge on $p$-values alone is deeply flawed and misleading. Yet this is precisely what the field of experimental psychology has done for decades.

## Should $p$-values be Abandoned?
THIS COULD BE THE BASIS FOR THE SYNCHRONOUS SESSION - MAYBE REMOVE THE WHOLE ALTERNATIVE SECTION AND DISCUSS THEN?