# Confidence intervals for the difference in means

We want confidence intervals for $\mu_1 - \mu_2$ in three cases:

- the populations $X_1, X_2$ are independent with a common variance $\sigma^2$
- the populations $X_1, X_2$ are independent with variances $\sigma_1^2 \neq \sigma_2^2$
- the populations $X_1, X_2$ are dependent and normally distributed

## Theorem
***

<div class="alert alert-block alert-info">
Let $X_1, X_2, \dots, X_n \sim \mathcal{N}\left(\mu_X, \sigma^2\right)$ and $Y_1, Y_2, \dots, Y_m \sim \mathcal{N}\left(\mu_Y, \sigma^2\right)$ be independent random samples. Then a $(1-\alpha)$ confidence interval for $\mu_X - \mu_Y$ is:

<center> $\overline{X} - \overline{Y} \pm \left(t_{\alpha/2,n+m-2}\right) S_p \sqrt{ \frac{1}{n} + \frac{1}{m}} $, </center> where

<center> $S^2_p = \frac{(n-1)S^2_X + (m-1)S^2_Y}{n+m-2}$ </center>
is an unbiased estimator of the common variance $\sigma^2$.
</div>

## Comments
- The measurements are independent
- Both samples are normally distributed
- Both populations have the same variance
- There is no restriction on the sample sizes $n$, $m$
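
The interval from the theorem can also be computed directly from its formula. The following is a minimal sketch, assuming NumPy and SciPy are available; the function name `pooled_t_interval` and the small input samples are hypothetical and chosen only for illustration.

```python
import numpy as np
from scipy import stats as st


def pooled_t_interval(x, y, alpha=0.05):
    """Two-sided (1 - alpha) CI for mu_X - mu_Y, assuming equal variances."""
    x, y = np.asarray(x, dtype=float), np.asarray(y, dtype=float)
    n, m = len(x), len(y)
    dof = n + m - 2
    # Pooled variance S_p^2: weighted average of the two sample variances.
    s2_p = ((n - 1) * np.var(x, ddof=1) + (m - 1) * np.var(y, ddof=1)) / dof
    # Standard error of the difference of the sample means.
    se = np.sqrt(s2_p) * np.sqrt(1.0 / n + 1.0 / m)
    # Upper alpha/2 quantile of Student's t with n + m - 2 degrees of freedom.
    t_crit = st.t.ppf(1.0 - alpha / 2.0, dof)
    diff = x.mean() - y.mean()
    return diff - t_crit * se, diff + t_crit * se


# Hypothetical toy samples, not taken from any real data set.
lo, hi = pooled_t_interval([1, 2, 3, 4, 5], [2, 4, 6, 8, 10])
print(lo, hi)
```

Note that the half-width uses the standard error $S_p \sqrt{1/n + 1/m}$, not $S_p$ alone.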


## Example
***

In [69]:
from scipy import stats as st
import numpy as np
from statsmodels.stats.weightstats import _zconfint_generic, _tconfint_generic

np.random.seed(0)
X = st.norm.rvs(size=100,loc=10,scale=1)
Y = st.norm.rvs(size=150, loc=5, scale=1)

n, m = len(X), len(Y)
X_mean, Y_mean = np.mean(X), np.mean(Y)
S_X, S_Y = np.var(X, ddof=1), np.var(Y, ddof=1)  # sample variances S_X^2, S_Y^2
S_P = ((n-1) * S_X + (m-1) * S_Y) / (n + m - 2)  # pooled variance S_p^2
SE = np.sqrt(S_P) * np.sqrt(1.0/n + 1.0/m)  # standard error of X_mean - Y_mean


_tconfint_generic(X_mean - Y_mean,
                  SE, n + m - 2, 0.05, 'two-sided')

(4.7925, 5.3009)  # rounded to four decimals

# Confidence interval for mean 
## Example
***

# Hypothesis testing

Estimation asks: "What is the value of the parameter θ?"

Hypothesis testing asks: "Is the value of the parameter θ such and such?"
***
 - Estimation: estimate μ, the mean body temperature of adults
 - Testing: is μ, the mean body temperature of adults, equal to 37 degrees Celsius?
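
The two questions can be put side by side on one sample. The sketch below assumes SciPy is available and uses simulated temperatures (hypothetical data, not real measurements): it builds a 95% confidence interval for μ and then tests $H_0: \mu = 37$ with a one-sample t-test.

```python
import numpy as np
from scipy import stats as st

# Hypothetical data: 50 simulated body temperatures in degrees Celsius.
rng = np.random.default_rng(0)
temps = rng.normal(loc=36.8, scale=0.4, size=50)

n = len(temps)
mean = temps.mean()
se = temps.std(ddof=1) / np.sqrt(n)  # standard error of the sample mean

# Estimation: a 95% confidence interval for mu.
t_crit = st.t.ppf(0.975, n - 1)
ci = (mean - t_crit * se, mean + t_crit * se)

# Testing: H0: mu = 37 against the two-sided alternative H1: mu != 37.
t_stat, p_value = st.ttest_1samp(temps, popmean=37.0)

print("95% CI for mu:", ci)
print("t =", t_stat, "p =", p_value)
```

For this two-sided test the two answers agree: $H_0$ is rejected at the 5% level exactly when 37 falls outside the 95% interval.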

# Hypothesis testing framework

Compare the experimental data with what the hypothesis predicts.

$X^n = (X_1, \dots, X_n), X \sim P$
- $X \in \{0, 1\}$
- $X$ = accuracy of prediction

What do the data tell us? E.g., do they show that the model predicts better than random?


- Null hypothesis $H_0: P \in \omega$
- Alternative $H_1: P \notin \omega$
- Test statistic: $T(X^n) \sim F(x)$ under $H_0$
- $F(x)$ is the null distribution of the statistic
- The pair $F(x), T(X^n)$ defines a statistical test of $H_0$ against $H_1$
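
To make the null distribution concrete, here is a small simulation for the $X \in \{0, 1\}$ case. It is only a sketch under stated assumptions: $H_0$ is taken to mean that each prediction is correct with probability 1/2, the statistic $T$ is the number of correct predictions, and the sample size and number of simulations are hypothetical. Under these assumptions $T \sim \mathrm{Binomial}(n, 1/2)$.

```python
import numpy as np

rng = np.random.default_rng(0)
n = 100          # hypothetical number of test examples
n_sim = 20_000   # number of experiments simulated under H0

# Under H0 each prediction is correct (1) or wrong (0) with probability 1/2.
# Each row is one simulated sample X^n; T is the number of correct predictions.
X = rng.integers(0, 2, size=(n_sim, n))
T = X.sum(axis=1)

# Compare the empirical null distribution with the Binomial(n, 1/2) moments.
print("simulated mean:", T.mean(), "theoretical:", n / 2)
print("simulated variance:", T.var(), "theoretical:", n / 4)
```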

# P-value
What is the probability of getting a value at least as extreme as the observed $t$ if $H_0$ is true?
<center>$p = P(T \geq t \mid H_0)$</center>

$p$ is the probability of getting the observed value of the statistic (or a more extreme one) if $H_0$ is true.

We compare it with the significance level $\alpha$: $H_0$ is rejected in favor of $H_1$ if $p < \alpha$.

What we need:
- Null hypothesis and alternative
- Test statistic
- Null distribution
- p-value
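
The four ingredients can be put together in a few lines. The sketch below assumes the "better than random" setting with $H_0$: each prediction is correct with probability 1/2, statistic $T$ = number of correct predictions, and null distribution $\mathrm{Binomial}(n, 1/2)$. The observed counts are hypothetical, and `binomial_p_value` is an illustrative helper, not a library function.

```python
from math import comb


def binomial_p_value(t, n):
    """P(T >= t | H0) for T ~ Binomial(n, 1/2), computed exactly."""
    return sum(comb(n, k) for k in range(t, n + 1)) / 2 ** n


n, t = 100, 60   # hypothetical: 60 correct predictions out of 100
alpha = 0.05

p = binomial_p_value(t, n)
reject = p < alpha
print(f"p-value = {p:.4f}, reject H0: {reject}")
```

Here the p-value is roughly 0.028, so $H_0$ is rejected at $\alpha = 0.05$ but would not be at $\alpha = 0.01$.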

# Type I and II errors

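- $H_0$ is correct but rejected: Type I error
- $H_0$ is incorrect but not rejected: Type II error

They are not symmetric! The Type I error is treated as the critical one, so its probability is bounded by the significance level:
<center> $P(\text{reject } H_0 \mid H_0 \text{ is correct}) = P(p < \alpha \mid H_0) \leq \alpha$ </center>

The Type II error is tied to the power of the test, $P(\text{reject } H_0 \mid H_1) = 1 - P(\text{Type II error})$. Among all correct tests, i.e. those that keep the Type I error probability at most $\alpha$, we choose the one with maximum power.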
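
Both error probabilities can be computed exactly for a simple test. The sketch below is an illustration under assumptions: the test rejects $H_0$ (accuracy 1/2) when the number of correct predictions $T$ out of $n$ is at least a critical value $c$, and the power is evaluated at a hypothetical true accuracy of 0.6. `tail_prob` is an illustrative helper.

```python
from math import comb


def tail_prob(c, n, q):
    """P(T >= c) for T ~ Binomial(n, q)."""
    return sum(comb(n, k) * q**k * (1 - q) ** (n - k) for k in range(c, n + 1))


n, alpha = 100, 0.05

# Smallest critical value c with P(T >= c | H0) <= alpha, where H0: q = 1/2.
c = next(c for c in range(n + 2) if tail_prob(c, n, 0.5) <= alpha)

type_1 = tail_prob(c, n, 0.5)   # P(reject H0 | H0 is correct)
power = tail_prob(c, n, 0.6)    # P(reject H0 | true accuracy is 0.6)
type_2 = 1 - power              # P(do not reject H0 | true accuracy is 0.6)

print(f"c = {c}, Type I = {type_1:.4f}, power = {power:.4f}, Type II = {type_2:.4f}")
```

The Type I error probability is held at or below $\alpha$ by construction, while the Type II error probability depends on how far the true accuracy is from 1/2 and on the sample size.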