# T-Distribution and Comparing Two Means

## T-Distribution

* When $\sigma$ is unknown (almost always), use the t-distribution to account for the extra uncertainty that comes from estimating the standard error with the sample standard deviation $s$
* Bell-shaped, but with thicker tails than the normal distribution
    * Observations are more likely to fall beyond 2 SDs from the mean, which makes intervals wider and tests more conservative
    * The extra-thick tails help offset the effect of a less reliable estimate of the standard error of the sampling distribution
* Always centered at 0 (like the standard normal)
* One parameter, the **degrees of freedom (df)**, which determines the thickness of the tails (the normal distribution has two parameters: mean and SD)
* As the degrees of freedom increase, the shape of the t-distribution approaches the standard normal distribution
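To see the last two bullets numerically, the sketch below computes the two-tailed probability $P(|T| > 2)$ for several arbitrarily chosen degrees of freedom and compares it with the standard normal value. It uses only base R's `pt()` and `pnorm()`; the variable names are illustrative.

```r
# Degrees of freedom to compare (arbitrary illustrative choices)
dfs <- c(2, 5, 10, 30, 100, 1000)

# Two-tailed probability of falling more than 2 units from 0
tail_t <- 2 * pt(2, df = dfs, lower.tail = FALSE)
tail_z <- 2 * pnorm(2, lower.tail = FALSE)

# The t tail probability is always larger than the normal one
# and shrinks toward it as df grows
data.frame(df = dfs, p_t = round(tail_t, 4), p_normal = round(tail_z, 4))
```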

### T-Score

$$
T = \frac{\text{obs} - \text{null}}{SE}
$$
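The T-score measures how many standard errors the observed value lies from the null value. A minimal sketch as a hypothetical helper `t_score_test()` (not a base R function) that also returns the two-sided p-value, assuming the caller supplies the degrees of freedom:

```r
# Hypothetical helper: T-score and two-sided p-value from summary numbers
t_score_test <- function(obs, null, se, df) {
  t_val <- (obs - null) / se                              # T = (obs - null) / SE
  p_val <- 2 * pt(abs(t_val), df = df, lower.tail = FALSE) # both tails
  c(t = t_val, p_value = p_val)
}

# Usage with the biscuit example's numbers: xbar = 52.1, s = 45.1, n = 22, null = 30
t_score_test(obs = 52.1, null = 30, se = 45.1 / sqrt(22), df = 21)
```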

```r
# P(|Z| > 2)
pnorm(2, lower.tail = FALSE) * 2

# P(|t_df=50| > 2)
pt(2, df = 50, lower.tail = FALSE) * 2

# P(|t_df=10| > 2)
pt(2, df = 10, lower.tail = FALSE) * 2
```

## Inference for a Mean

### Confidence Interval

$$
\begin{aligned}
&\bar{x} \pm t^{\star}_{df}\, SE_{\bar{x}} \\
&\bar{x} \pm t^{\star}_{df} \frac{s}{\sqrt{n}} \\
&\bar{x} \pm t^{\star}_{n-1} \frac{s}{\sqrt{n}}
\end{aligned}
$$
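The interval can be wrapped in a small function that works from summary statistics. This is a sketch: `mean_ci()` is a hypothetical name, and it computes the positive critical value $t^{\star}_{n-1}$ with `qt()` instead of reading it from a table.

```r
# Hypothetical helper: t-based confidence interval for one mean
mean_ci <- function(xbar, s, n, conf = 0.95) {
  se <- s / sqrt(n)                              # SE = s / sqrt(n)
  t_star <- qt((1 + conf) / 2, df = n - 1)       # positive critical value
  xbar + c(lower = -1, upper = 1) * t_star * se  # xbar -/+ t* * SE
}

mean_ci(xbar = 52.1, s = 45.1, n = 22)
```

For the biscuit example ($t^{\star}_{21} \approx 2.08$, $SE \approx 9.62$) this gives roughly (32.1, 72.1) g.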

### Degrees of Freedom for the T-Statistic for Inference on a Single Mean

$$
df = n - 1
$$
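When raw observations are available rather than summary statistics, base R's `t.test()` applies the same formulas. The sketch below uses simulated data (made-up values from `rnorm()`, not the snack study) to check that the reported df is $n - 1$ and that the T-score matches the hand calculation.

```r
set.seed(1)
x <- rnorm(22, mean = 50, sd = 40)   # simulated sample, n = 22 (illustrative only)

res <- t.test(x, mu = 30)            # one-sample, two-sided test of H0: mu = 30

# Hand calculation with df = n - 1
n <- length(x)
t_manual <- (mean(x) - 30) / (sd(x) / sqrt(n))

unname(res$parameter)                # df reported by t.test: n - 1 = 21
c(t_test = unname(res$statistic), manual = t_manual)
```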

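```r
# Critical t-score (t*) for a 95% confidence interval with df = 21.
# Use the upper-tail quantile so that t* is positive (about 2.08);
# qt((1 - 0.95)/2, df = 21) would return the negative value -2.08.
qt((1 + 0.95) / 2, df = 21)
```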

**Example**

Suppose the suggested serving of these biscuits is 30 grams. Do these data provide convincing evidence that the amount of snacks consumed by distracted eaters after lunch is different from the suggested serving size?

* x̄ = 52.1
* s = 45.1
* n = 22
* $t^{\star}_{21} = 2.08$ (critical value for a 95% confidence interval)

```r
# 95% confidence interval: xbar -/+ t* * s / sqrt(n)
se <- 45.1 / sqrt(22)
52.1 - 2.08 * se   # lower bound
52.1 + 2.08 * se   # upper bound
```

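```r
# H0: mu = 30
# HA: mu != 30

# T-score (named t_score to avoid reusing the name of base R's t() function)
(t_score <- (52.1 - 30) / (45.1 / sqrt(22)))

# Two-sided p-value; abs() keeps this correct if the T-score is negative
2 * pt(abs(t_score), df = 21, lower.tail = FALSE)
```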

## Inference for Comparing Two Independent Means

### Confidence Interval

$$
\begin{aligned}
&(\bar{x}_1 - \bar{x}_2) \pm t^{\star}_{df}\, SE_{(\bar{x}_1 - \bar{x}_2)} \\
&(\bar{x}_1 - \bar{x}_2) \pm t^{\star}_{df} \sqrt{\frac{s^2_1}{n_1} + \frac{s^2_2}{n_2}} \\
&(\bar{x}_1 - \bar{x}_2) \pm t^{\star}_{\min(n_1 - 1,\, n_2 - 1)} \sqrt{\frac{s^2_1}{n_1} + \frac{s^2_2}{n_2}}
\end{aligned}
$$
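A sketch of the two-sample interval from summary statistics, using the conservative $df = \min(n_1 - 1, n_2 - 1)$ from these notes; `diff_means_ci()` is a hypothetical name, not a library function.

```r
# Hypothetical helper: CI for mu1 - mu2 from summary statistics
diff_means_ci <- function(xbar1, s1, n1, xbar2, s2, n2, conf = 0.95) {
  se <- sqrt(s1^2 / n1 + s2^2 / n2)          # SE of the difference
  dof <- min(n1 - 1, n2 - 1)                 # conservative degrees of freedom
  t_star <- qt((1 + conf) / 2, df = dof)     # positive critical value
  (xbar1 - xbar2) + c(lower = -1, upper = 1) * t_star * se
}

# Solitaire vs. no distraction, using the example's summary statistics
diff_means_ci(52.1, 45.1, 22, 27.1, 26.4, 22)
```

This reproduces the interval of about 1.83 g to 48.17 g computed by hand in the example.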

### Standard Error of the Difference between Two Independent Means

$$
SE_{(\bar{x}_1 - \bar{x}_2)} = \sqrt{\frac{s^2_1}{n_1} + \frac{s^2_2}{n_2}}
$$
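The variances add because the two samples are independent, with $s_1$ and $s_2$ standing in for the unknown population SDs. A quick Monte Carlo check, assuming normal populations whose SDs equal the example's sample SDs (45.1 and 26.4, with $n_1 = n_2 = 22$), compares the spread of simulated differences in sample means with the formula:

```r
set.seed(42)
n1 <- 22; n2 <- 22
sd1 <- 45.1; sd2 <- 26.4   # assumed population SDs (taken from the example)

# Simulate many independent pairs of samples and record xbar1 - xbar2
diffs <- replicate(20000, mean(rnorm(n1, sd = sd1)) - mean(rnorm(n2, sd = sd2)))

se_formula <- sqrt(sd1^2 / n1 + sd2^2 / n2)
c(simulated = sd(diffs), formula = se_formula)   # the two should be close
```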

### Degrees of Freedom for the T-Statistic for Inference on a Difference of Two Means

$$
df = \min(n_1 - 1, n_2 - 1)
$$
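The $\min(n_1 - 1, n_2 - 1)$ rule is a conservative shortcut for hand calculations. Base R's `t.test()` by default (`var.equal = FALSE`) uses the Welch approximation instead, whose df always falls between $\min(n_1 - 1, n_2 - 1)$ and $n_1 + n_2 - 2$; a smaller df gives a larger $t^{\star}$ and therefore a wider interval. The sketch below checks this on simulated data (made-up values, with arbitrarily chosen unequal sample sizes).

```r
set.seed(7)
x1 <- rnorm(15, mean = 50, sd = 45)   # simulated group 1 (illustrative only)
x2 <- rnorm(30, mean = 30, sd = 25)   # simulated group 2 (illustrative only)

welch <- t.test(x1, x2)               # Welch two-sample t-test (the default)
df_welch <- unname(welch$parameter)
df_min   <- min(length(x1) - 1, length(x2) - 1)

c(df_min = df_min, df_welch = df_welch)
# Critical values: the conservative df gives the larger t*
c(t_star_min = qt(0.975, df_min), t_star_welch = qt(0.975, df_welch))
```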

**Example**

* Solitaire
    * x̄ = 52.1
    * s = 45.1
    * n = 22
* No Distraction
    * x̄ = 27.1
    * s = 26.4
    * n = 22


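```r
# Confidence interval for mu1 - mu2 (Solitaire minus No Distraction)

# Conservative degrees of freedom: min(n1 - 1, n2 - 1)
(df_cons <- min(22 - 1, 22 - 1))
# Positive critical value t*_21 (about 2.08)
(t_star <- qt((1 + 0.95) / 2, df = df_cons))

# Standard error of the difference in means
(se <- sqrt(45.1^2/22 + 26.4^2/22))

(52.1 - 27.1) - t_star * se   # lower bound
(52.1 - 27.1) + t_star * se   # upper bound

# We are 95% confident that those who eat with distractions
# consume between 1.83 g and 48.17 g more snacks, on average,
# than those who eat without distractions.
```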

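```r
# H0: mu1 - mu2 = 0
# HA: mu1 - mu2 != 0

# T-score (se is the standard error computed in the previous code block)
(t_score <- ((52.1 - 27.1) - 0) / se)

# Two-sided p-value with the conservative df = 21
2 * pt(abs(t_score), df = 21, lower.tail = FALSE)
```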