# Model selection, and p-values

Fall 2022: Peter Ralph

https://uodsci.github.io/dsci345

In [1]:
import matplotlib
import matplotlib.pyplot as plt
matplotlib.rcParams['figure.figsize'] = (15, 8)
import numpy as np
import pandas as pd
from dsci345 import pretty

rng = np.random.default_rng(seed=123)

$$\renewcommand{\P}{\mathbb{P}} \newcommand{\E}{\mathbb{E}} \newcommand{\var}{\text{var}} \newcommand{\sd}{\text{sd}}$$
This is here so we can use `\P` and `\E` and `\var` and `\sd` in LaTeX below.

Outline:

1. We can fit models now (in two ways) but... what about "false positives",
    i.e., estimates of an effect when there isn't one?
2. Just fitting models willy-nilly leads to *bad* things:
    for instance, which studies are more likely to get spuriously large effects:
    small or large sample sizes?
3. One solution: also report the strength of support *against*
    a model of "no effect" (i.e., a p-value).
4. The t-test.

# False positives

## The danger of statistics

We now know how to fit models (in two ways!). Let's do experiments now!

Let's say we all go out and test our own favorite method for avoiding COVID:
vitamin C, vaccination, lots of coffee, lucky charms, chloroquine...
Everyone comes back with an estimate of
the relative risk of getting COVID using their treatment.
*How useful is this?*

Some people have much bigger studies than others.
Whose methods will tend to have the largest estimates:
those tested in large studies, or in small ones?

We need to also report a **measure of uncertainty**
and/or the **strength of statistical support**
for our results.
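To see why sample size matters, here is a small simulation sketch. It assumes, purely for illustration, that none of the treatments does anything, that everyone has a 10% chance of getting sick, and that each study puts $n$ people in a treatment group and $n$ in a control group. Relative risk is undefined when a control group has no cases, so this sketch uses the difference in the proportion who got sick instead. The function name `largest_spurious_effect` is made up here.

```python
import numpy as np

def largest_spurious_effect(n, num_studies=1000, base_rate=0.1, seed=123):
    """Largest |estimated effect| across many studies of treatments that do nothing.

    Assumption: each study has n people per arm, and both arms get sick
    at the same base_rate, so the true effect is zero in every study.
    """
    rng = np.random.default_rng(seed)
    treated = rng.binomial(n=n, p=base_rate, size=num_studies) / n
    control = rng.binomial(n=n, p=base_rate, size=num_studies) / n
    # estimated effect = difference in proportion sick (the truth is 0)
    return np.max(np.abs(treated - control))

for n in [10, 100, 1000]:
    print(n, largest_spurious_effect(n))
```

The small studies produce by far the largest "effects," even though every true effect is zero: their estimates are just noisier.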

## My skeptical friend

*Me:* I have an amazing new treatment!
I had ten random people soak their feet in ice water three times a day for a month,
and none of them got sick!

*You:* Okay, sure, but they might not have got sick anyhow?

*Me:* No, I got that covered, I had ten other random people who didn't do this,
and three of them got sick.

*You:* Hm, okay, but isn't it kinda likely that you'd get such a big difference
even if the ice water doesn't do anything? Just by random chance?

*Me:* Gee, I dunno, how do we find out?

*You:* Well, this isn't perfect, but

In [4]:
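# null model: both groups get sick at the pooled rate, 3 of 20
ice = rng.binomial(n=10, p=3/20, size=10000)
not_ice = rng.binomial(n=10, p=3/20, size=10000)
# fraction of simulated experiments with a difference of 3 or more
# (by symmetry, same probability as not_ice - ice >= 3)
np.mean(ice - not_ice >= 3)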

0.0559
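Under the "ice water does nothing" model, both groups are just Binomial(10, 3/20). So the probability the simulation estimates can also be computed exactly, by summing over every possible pair of outcomes. Here is a sketch using only the standard library. The function names are made up here.

```python
from math import comb

def binom_pmf(k, n, p):
    """P(X = k) for X ~ Binomial(n, p)."""
    return comb(n, k) * p**k * (1 - p)**(n - k)

def prob_diff_at_least(d, n=10, p=3/20):
    """Exact P(X - Y >= d) for independent X, Y ~ Binomial(n, p)."""
    return sum(
        binom_pmf(x, n, p) * binom_pmf(y, n, p)
        for x in range(n + 1)
        for y in range(n + 1)
        if x - y >= d
    )

print(prob_diff_at_least(3))
```

This comes out to about 0.056, in line with the simulated value above.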

## Fingers

I think that writing affects the ligaments in your fingers,
so that the index finger on people's writing hands tends to be a different length
than the index finger on the other hand.

Let's collect some data!

In [5]:
# num_longer = 
# num_shorter =

It's maybe more natural to suppose there *isn't* a difference.
Okay, let's see how good my evidence is against this idea:
what's the probability that we'd see such a big difference
between those numbers even if we all just flipped coins to decide instead?

In [None]:
# num_heads =
# num_tails =

*Exercise:* find the "probability that we'd see such a big difference between those numbers even if we all just flipped coins", by simulation.
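One possible sketch of this simulation is below, written as a function so it can be reused once the class counts are in. The name `coin_flip_pvalue` is made up, and the counts in the example call (14 and 6) are hypothetical placeholders, not real data. It uses the absolute difference, because a difference in either direction would be just as surprising.

```python
import numpy as np

def coin_flip_pvalue(num_longer, num_shorter, num_sims=100_000, seed=123):
    """Probability that fair coin flips give a Heads/Tails difference
    at least as big as the observed longer/shorter difference."""
    n = num_longer + num_shorter
    observed = abs(num_longer - num_shorter)
    rng = np.random.default_rng(seed)
    # each simulated "class" of n people flips fair coins
    num_heads = rng.binomial(n=n, p=0.5, size=num_sims)
    num_tails = n - num_heads
    return np.mean(np.abs(num_heads - num_tails) >= observed)

coin_flip_pvalue(num_longer=14, num_shorter=6)  # hypothetical counts
```

With those hypothetical counts the result comes out near 0.115. The exact two-sided binomial probability is about 0.1153.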

# The p-value

The $p$-value is

> the probability of seeing a result
> at least as surprising as what we observed in the data,
> if the null hypothesis is true.

The parts of this are:

- *the probability ... if the null hypothesis is true*:
    we need a concrete model we can compute probabilities with

- *a result*: a statistic summarizing how strongly our data suggest that model is *not* right

- *at least as surprising*: usually, the statistic is chosen so that larger values are more surprising
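Put together, a simulation-based $p$-value is just the fraction of statistics simulated under the null model that are at least as large as the observed one. Here is a minimal sketch, assuming larger values of the statistic are more surprising. The helper name `simulated_pvalue` is made up here.

```python
import numpy as np

def simulated_pvalue(observed, null_stats):
    """Fraction of null-model statistics at least as large as the observed one.

    Assumes larger values of the statistic are more surprising.
    """
    null_stats = np.asarray(null_stats)
    return np.mean(null_stats >= observed)
```

For instance, the ice-water calculation earlier is `simulated_pvalue(3, ice - not_ice)`.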

In the finger exercise, we found the probability that
if we all flipped fair coins,
the difference
between the number of people with Heads and with Tails
was at least as big as <FILL ME>,
the difference between the number of people with longer writing-hand fingers
and those with shorter writing-hand fingers.
    
*Why's this a $p$-value?*

# The $t$-test

It turns out that, thanks to the Central Limit Theorem,
if 

- $X_1, \ldots, X_n$ are a bunch of independent samples from some distribution with mean $\mu$,
- $\bar X$ is the sample mean, and
- $S$ is the sample standard deviation,

then
$$  T = \frac{\bar X - \mu}{S / \sqrt{n}}  $$
has, approximately, [Student's t distribution](https://en.wikipedia.org/wiki/Student%27s_t-distribution)
with $n-1$ degrees of freedom (exactly, if the $X_i$ are Normally distributed).
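Here is a sketch of computing $T$ by hand with NumPy and then getting a two-sided $p$-value from the $t$ distribution with $n-1$ degrees of freedom. It assumes SciPy is available and uses `scipy.stats.ttest_1samp` only as a cross-check. The data are simulated (made up, with true mean 0.3) just to have something to test against, and the function name `one_sample_t` is hypothetical.

```python
import numpy as np
from scipy import stats

def one_sample_t(x, mu):
    """t statistic and two-sided p-value for H0: the mean is mu."""
    x = np.asarray(x, dtype=float)
    n = len(x)
    xbar = x.mean()
    s = x.std(ddof=1)  # sample standard deviation
    T = (xbar - mu) / (s / np.sqrt(n))
    # two-sided: P(|T'| >= |T|) for T' ~ t with n - 1 degrees of freedom
    p = 2 * stats.t.sf(abs(T), df=n - 1)
    return T, p

rng = np.random.default_rng(seed=123)
x = rng.normal(loc=0.3, scale=1.0, size=15)  # simulated data, true mean 0.3
T, p = one_sample_t(x, mu=0.0)
check = stats.ttest_1samp(x, 0.0)  # SciPy's version, for comparison
print(T, p)
```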

*Exercise:* interpret this. What are its units?
If it's small, what does that mean? What if it's big?