# p-value
In null-hypothesis significance testing, the p-value is the probability of obtaining test results at least as extreme as the result actually observed, under the assumption that the null hypothesis is correct.

# Graphical Summary

![Fig](./graphical_summary/slides/Slide29.png)

# Key Formula

The p-value represents the probability of observing a test statistic at least as extreme as what was actually observed, assuming the null hypothesis is true.

$$
p\text{-value} = P(\text{Test statistic is as extreme or more extreme than observed} \mid H_0 \text{ is true})
$$
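
As a concrete illustration of the formula, here is a minimal sketch for a made-up coin-tossing experiment: 9 heads in 10 tosses, with $H_0$ stating that the coin is fair. The data, the helper name `binomial_right_tail_p`, and the choice of "number of heads" as the test statistic are all assumptions made for this example; "at least as extreme" is taken to mean "at least as many heads as observed".

```python
from math import comb

def binomial_right_tail_p(k_obs: int, n: int, p0: float = 0.5) -> float:
    """Right-tailed p-value P(X >= k_obs) for X ~ Binomial(n, p0) under H0."""
    return sum(
        comb(n, k) * p0**k * (1 - p0) ** (n - k)
        for k in range(k_obs, n + 1)
    )

# Hypothetical data: 9 heads in 10 tosses; H0 says the coin is fair (p0 = 0.5).
p_value = binomial_right_tail_p(9, 10)
print(p_value)  # 11/1024, about 0.0107
```

Only the outcomes "9 heads" and "10 heads" are at least as extreme as the observation, so the p-value is $(10 + 1)/2^{10}$.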

# Technical Details

## Mathematical Definition

For a test statistic $T$ and observed value $t$:

$$
\begin{align}
p\text{-value} &= P(T \geq t \mid H_0) \quad \text{[right-tailed test]} \\
p\text{-value} &= P(T \leq t \mid H_0) \quad \text{[left-tailed test]} \\
p\text{-value} &= P(|T| \geq |t| \mid H_0) = 2 \times P(T \geq |t| \mid H_0) \quad \text{[two-tailed test, null distribution symmetric about zero]}
\end{align}
$$
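
The three definitions above can be sketched in code for one specific case: a test statistic that follows a standard normal distribution under $H_0$ (a z-test). That distributional assumption, the function name `z_test_p_values`, and the observed value `1.96` are illustrative choices, not part of the definitions themselves.

```python
from statistics import NormalDist

def z_test_p_values(t: float) -> dict:
    """p-values for an observed statistic t, assuming T ~ N(0, 1) under H0."""
    null = NormalDist(mu=0.0, sigma=1.0)
    right = 1.0 - null.cdf(t)                    # P(T >= t | H0)
    left = null.cdf(t)                           # P(T <= t | H0)
    two_sided = 2.0 * (1.0 - null.cdf(abs(t)))   # both tails beyond |t|
    return {"right": right, "left": left, "two_sided": two_sided}

# Hypothetical observed statistic.
p = z_test_p_values(1.96)
print(p)
```

For a continuous statistic the right- and left-tailed p-values sum to one, and for a positive observed value the two-tailed p-value is twice the right-tailed one.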

### Key Properties
1. **Monotonic Relationship**: Smaller p-values correspond to more extreme test statistics
2. **Significance Level**: We reject $H_0$ if $p\text{-value} \leq \alpha$, where $\alpha$ is the significance level chosen before seeing the data

### Critical Limitation: Asymmetric Logic
**Important**: The null hypothesis $H_0$ can never be "accepted" or "proven true" - it can only be:
- **Rejected** (when $p\text{-value} \leq \alpha$): Sufficient evidence against $H_0$ at significance level $\alpha$
- **Not rejected** (when $p\text{-value} > \alpha$): Insufficient evidence to reject $H_0$

"Not rejected" is not "accepted" or "true". Absence of evidence is not evidence of absence.
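
A small numerical sketch shows why "not rejected" should not be read as "true". It uses a two-sided z-test of $H_0$: mean $= 0$ with a known standard deviation; the observed mean of 0.3, the standard deviation of 1, the two sample sizes, and the helper name `two_sided_p` are all hypothetical values chosen for illustration.

```python
from math import sqrt
from statistics import NormalDist

def two_sided_p(mean_obs: float, sd: float, n: int) -> float:
    """Two-sided z-test p-value for H0: true mean = 0, with known sd."""
    z = mean_obs / (sd / sqrt(n))
    return 2.0 * (1.0 - NormalDist().cdf(abs(z)))

# Same observed mean, two hypothetical sample sizes.
p_small = two_sided_p(0.3, 1.0, 10)    # about 0.34: H0 not rejected at alpha = 0.05
p_large = two_sided_p(0.3, 1.0, 100)   # about 0.003: H0 rejected at alpha = 0.05
print(p_small, p_large)
```

The observed mean is identical in both cases; only the amount of data differs. The large p-value with 10 observations reflects limited information, not support for $H_0$.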

## Connection to Bayesian Inference


- **Important**: P-values are almost always reported as part of **summary statistics**, but they are not inference in themselves. They summarize how compatible the observed data are with $H_0$, but they do not directly provide probabilistic statements about hypotheses.
- **What would we do in Bayesian hypothesis testing to answer the same question**:
    - In Bayesian inference with a posterior distribution such as $N(\mu, \sigma^2)$ for the parameter of interest, we can check: does the 95% Highest Posterior Density (HPD) region cover zero?
    - Alternatively, Bayes' rule directly gives the posterior probability that $H_0$ is true given the data $\text{D}$:

    $$
    P(H_0 | \text{D}) = \frac{P(\text{D} | H_0) \cdot P(H_0)}{P(\text{D})}
    $$
- **False Discovery Rate (FDR)**: In Bayesian mixture models, we can compute the FDR from the data using the posterior probabilities $P(H_0 \mid \text{D})$. This provides a direct probabilistic interpretation that p-values cannot.
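
The Bayes' rule calculation of $P(H_0 \mid \text{D})$ can be sketched for a simple two-component mixture. Everything specific here is an assumption made for illustration: the data are summarized by a single z-score, $H_0$ gives $z \sim N(0, 1)$, the alternative gives $z \sim N(0, 1 + \tau^2)$, and the prior null proportion `pi0` and the spread `tau` are treated as known. In practice these quantities would typically have to be estimated.

```python
from statistics import NormalDist

def posterior_null_prob(z: float, pi0: float = 0.9, tau: float = 2.0) -> float:
    """P(H0 | z) under a two-component mixture (all settings are assumptions).

    H0: z ~ N(0, 1)            with prior probability pi0
    H1: z ~ N(0, 1 + tau**2)   with prior probability 1 - pi0
    """
    like_h0 = NormalDist(0.0, 1.0).pdf(z)                     # P(D | H0)
    like_h1 = NormalDist(0.0, (1.0 + tau**2) ** 0.5).pdf(z)   # P(D | H1)
    evidence = pi0 * like_h0 + (1.0 - pi0) * like_h1          # P(D)
    return pi0 * like_h0 / evidence                           # Bayes' rule

# Hypothetical z-scores: larger |z| gives a smaller posterior probability of H0.
for z in (0.5, 2.0, 4.0):
    print(z, round(posterior_null_prob(z), 4))
```

Unlike a p-value, the returned number is a statement about the hypothesis itself, but it depends on the assumed prior and alternative model.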

# Related Topics

- [OLS](https://gaow.github.io/statgen-prerequisites/ordinary_least_squares.html)
- [summary statistics](https://gaow.github.io/statgen-prerequisites/summary_statistics.html)
- [likelihood](https://gaow.github.io/statgen-prerequisites/likelihood.html)
- [Bayes factor](https://gaow.github.io/statgen-prerequisites/Bayes_factor.html)

# Supplementary

- [Connection between Bayes factor and p-value](https://stephens999.github.io/fiveMinuteStats/BF_and_pvalue.html)
- [Example of difficulty of calibrating p values](https://stephens999.github.io/fiveMinuteStats/pvalue_difficult_calibrate_example.html)