# Daily Blog #68 - P-Values, Hypothesis Testing, and Misconceptions
### July 7, 2025 


### What a P-Value Means:

> A p-value is the **probability of obtaining a result at least as extreme as the observed one**, *assuming the null hypothesis is true*.

In short, a p-value is **P(data at least this extreme | H₀)**, not **P(H₀ | data)**.

The second one is what most people *think* it is, but that is a Bayesian posterior, not something a frequentist test gives you.
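To make the definition concrete, here is a minimal sketch that computes a p-value by hand for a coin-flip experiment. The scenario (60 heads in 100 flips) and the helper name `binomial_p_value` are made up for illustration; the null hypothesis is that the coin is fair.

```python
from math import comb

def binomial_p_value(heads: int, flips: int) -> float:
    """Two-sided exact p-value under H0: the coin is fair (P(heads) = 0.5)."""
    # Under H0, P(k heads) = C(n, k) / 2^n for every k from 0 to n.
    probs = [comb(flips, k) / 2**flips for k in range(flips + 1)]
    observed = probs[heads]
    # "At least as extreme" here means: outcomes no more likely than the observed one.
    # The tiny tolerance guards against floating-point ties.
    return sum(p for p in probs if p <= observed * (1 + 1e-9))

p = binomial_p_value(60, 100)
print(f"P(result at least as extreme as 60/100 heads | fair coin) = {p:.4f}")
```

This comes out to roughly 0.057. Note what the number is *not*: it says nothing directly about how likely it is that the coin is fair.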


### Basic Structure of a Hypothesis Test:
| Step                                 | Description                                                                   |
| ------------------------------------ | ----------------------------------------------------------------------------- |
| **1. Null Hypothesis (H₀)**          | No effect or no difference (e.g., “There is no change in GPA after tutoring”) |
| **2. Alternative Hypothesis (H₁)**   | There *is* an effect (e.g., “Tutoring increases GPA”)                         |
| **3. Choose α (Significance Level)** | Common: 0.05                                                                  |
| **4. Run test → get p-value**        | t-test, ANOVA, chi-squared, etc.                                              |
| **5. Compare p-value to α**          | If p < α → reject H₀. Otherwise, fail to reject.                              |
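The five steps in the table can be sketched in code using the tutoring example. The GPA numbers below are invented for illustration, and a paired t-test (`scipy.stats.ttest_rel`) is one reasonable choice here because the same students are measured twice; it is an assumption, not the only valid test.

```python
from scipy.stats import ttest_rel

# Hypothetical GPAs for the same eight students, before and after tutoring.
gpa_before = [2.8, 3.1, 2.5, 3.4, 2.9, 3.0, 2.7, 3.2]
gpa_after = [3.0, 3.3, 2.6, 3.5, 3.2, 3.1, 2.9, 3.3]

# Steps 1-2: H0 = no change in mean GPA; H1 = tutoring increases mean GPA.
# Step 3: choose the significance level before looking at the data.
alpha = 0.05

# Step 4: run the test. alternative="greater" matches the one-sided H1
# (mean of after minus before is greater than zero).
result = ttest_rel(gpa_after, gpa_before, alternative="greater")

# Step 5: compare the p-value to alpha.
decision = "reject H0" if result.pvalue < alpha else "fail to reject H0"
print(f"t = {result.statistic:.3f}, p = {result.pvalue:.5f} -> {decision}")
```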


### What You Need to Remember:

* **p < 0.05** = statistically significant at α = 0.05 → evidence *against* the null
* **p ≥ 0.05** ≠ proof that H₀ is true → it just means you don’t have strong enough evidence to reject it
* A p-value is **not** the probability that your hypothesis (null or alternative) is true
* **Small p ≠ big effect**: it just means data this extreme would be unlikely if the null were true
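The last point is easy to see with a bit of arithmetic. The sketch below holds a tiny standardized effect fixed (d = 0.02, a made-up value) and only changes the sample size. It assumes a two-sample z-test with known, equal variances, and the helper `two_sample_z_p_value` is a name introduced just for this illustration.

```python
from math import erfc, sqrt

def two_sample_z_p_value(d: float, n_per_group: int) -> float:
    """Two-sided p-value for a two-sample z-test with standardized effect d."""
    # With equal group sizes and known unit variance, z = d * sqrt(n / 2).
    z = d * sqrt(n_per_group / 2)
    # Two-sided tail probability of a standard normal: 2 * (1 - Phi(|z|)).
    return erfc(abs(z) / sqrt(2))

tiny_effect = 0.02  # the same very small effect in every row
p_values = {n: two_sample_z_p_value(tiny_effect, n) for n in (50, 5_000, 500_000)}

for n, p in p_values.items():
    print(f"n per group = {n:>7}: p = {p:.3g}")
```

The effect never changes, but the p-value goes from clearly non-significant to astronomically small as n grows.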


### Effect Size vs. P-Value:

| Metric                           | What It Tells You                             |
| -------------------------------- | --------------------------------------------- |
| **P-value**                      | Is there *statistical* evidence of an effect? |
| **Effect size** (e.g. Cohen’s d) | How *large* is that effect?                   |
| **Confidence Interval**          | How *precise* is the estimate of the effect?  |

Use **all three**, not just the p-value.
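As a rough sketch of reporting all three together, the snippet below computes the p-value, Cohen's d, and a 95% confidence interval for the difference in means. The two small score samples are illustrative, and the pooled standard deviation assumes both groups share the same variance.

```python
from math import sqrt
from statistics import mean, variance
from scipy.stats import t, ttest_ind

group1 = [85, 90, 88, 75, 95]
group2 = [70, 65, 60, 72, 68]
n1, n2 = len(group1), len(group2)
df = n1 + n2 - 2

# P-value: is there statistical evidence of a difference?
p_value = ttest_ind(group1, group2).pvalue

# Effect size: Cohen's d = mean difference / pooled standard deviation.
diff = mean(group1) - mean(group2)
pooled_sd = sqrt(((n1 - 1) * variance(group1) + (n2 - 1) * variance(group2)) / df)
cohens_d = diff / pooled_sd

# 95% confidence interval for the mean difference (equal-variance t interval).
se = pooled_sd * sqrt(1 / n1 + 1 / n2)
t_crit = t.ppf(0.975, df)
ci_low, ci_high = diff - t_crit * se, diff + t_crit * se

print(f"p = {p_value:.4f}, d = {cohens_d:.2f}, 95% CI = ({ci_low:.2f}, {ci_high:.2f})")
```

With only five scores per group the interval is wide, which is exactly the kind of information the p-value alone hides.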


### Common Mistakes to Avoid:

* Running 20 tests and reporting only the one with p < 0.05 = **p-hacking**
* Confusing correlation with causation
* Ignoring context: with a massive sample, a tiny p-value can come from an effect too small to matter in practice
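To see why the first mistake is so dangerous, here is a small sketch of how often "20 tests, report the winner" produces a false positive when there is no real effect anywhere. It assumes the 20 tests are independent and that each p-value is uniform on [0, 1] under a true null, which holds for continuous test statistics.

```python
import random

ALPHA = 0.05
N_TESTS = 20

# Analytic answer: chance that at least one of 20 true-null tests has p < 0.05.
analytic = 1 - (1 - ALPHA) ** N_TESTS

def any_false_positive(rng: random.Random) -> bool:
    # Draw 20 p-values under the null and check whether any looks "significant".
    return any(rng.random() < ALPHA for _ in range(N_TESTS))

rng = random.Random(0)  # fixed seed so the run is repeatable
n_sims = 20_000
simulated = sum(any_false_positive(rng) for _ in range(n_sims)) / n_sims

print(f"analytic: {analytic:.3f}, simulated: {simulated:.3f}")
```

Both numbers land near 0.64, so a "significant" result picked out of 20 tries is more likely than not to show up by chance alone.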


### Test Types

| Use Case                                 | Test                               |
| ---------------------------------------- | ---------------------------------- |
| Compare 2 means (normal data)            | **t-test**                         |
| Compare 3+ means (normal data)           | **ANOVA**                          |
| Compare proportions / categorical counts | **Chi-squared test**               |
| Correlation significance                 | **Pearson’s r test**               |
| Non-normal data (2 groups, 3+ groups)    | **Mann-Whitney U, Kruskal-Wallis** |
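For the last row, a minimal sketch of the two-group nonparametric option, using `scipy.stats.mannwhitneyu` on two small illustrative score samples. Treat it as one possible choice when normality is doubtful, not a rule.

```python
from scipy.stats import mannwhitneyu

group1 = [85, 90, 88, 75, 95]
group2 = [70, 65, 60, 72, 68]

# Rank-based test: does one group tend to have larger values than the other?
# It makes no normality assumption about the underlying scores.
result = mannwhitneyu(group1, group2, alternative="two-sided")
print(f"U = {result.statistic:.1f}, p-value = {result.pvalue:.4f}")
```

For three or more groups, `scipy.stats.kruskal` plays the same role that ANOVA plays for normal data.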


### Example in Python

```python
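from scipy.stats import ttest_ind

# Two independent samples of scores
group1 = [85, 90, 88, 75, 95]
group2 = [70, 65, 60, 72, 68]

alpha = 0.05  # significance level, chosen before running the test

# Two-sided Student's t-test. The default equal_var=True assumes equal
# variances; pass equal_var=False for Welch's t-test.
stat, p = ttest_ind(group1, group2)
print(f"t-statistic: {stat:.3f}, p-value: {p:.3f}")

if p < alpha:
    print("Reject null hypothesis: there's a significant difference.")
else:
    print("Fail to reject null: no significant difference detected.")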
```

