# Central Limit Theorem (CLT)

The sum (or mean) of many independent, identically distributed (iid) random variables with finite variance is approximately normally distributed. The normal distribution has a bell shape with the probability density function:

$$f(x; \mu, \sigma^2) = \frac{1}{\sigma\sqrt{2\pi}}e^{-\frac{1}{2}\left(\frac{x-\mu}{\sigma}\right)^2}$$
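A small simulation can illustrate the theorem. The sketch below is only an illustration under assumed settings: the helper name `sample_means`, the Uniform(0, 1) source distribution, and the sample sizes are arbitrary choices, not part of the notes above. It checks that sample means cluster around the true mean with a spread close to $\sigma/\sqrt{n}$.

```python
import random
import statistics


def sample_means(draw, n, trials, seed=0):
    """Return `trials` sample means, each computed from `n` iid draws."""
    rng = random.Random(seed)
    return [statistics.fmean(draw(rng) for _ in range(n)) for _ in range(trials)]


# Assumed source distribution: Uniform(0, 1), with mean 1/2 and variance 1/12.
means = sample_means(lambda rng: rng.random(), n=50, trials=2000)

mu_hat = statistics.fmean(means)   # should be close to 0.5
sd_hat = statistics.stdev(means)   # should be close to sqrt(1/12 / 50)

# Fraction of sample means within one standard deviation of their centre;
# for a normal distribution this fraction is about 0.68.
within_one_sd = sum(abs(m - mu_hat) <= sd_hat for m in means) / len(means)
print(mu_hat, sd_hat, within_one_sd)
```

Swapping the lambda for another distribution with finite variance (for example `rng.expovariate(1.0)`) should give a similarly bell-shaped set of means.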

### For Bernoulli trials
- mean: $p$, estimated by $\hat{p} = \frac{\text{number of successes}}{n}$
- variance: $\sigma^2 = p(1-p)$

### The number of successes in $n$ Bernoulli trials follows a binomial distribution

$$ P(X = r) = \binom{n}{r} p^r (1-p)^{n-r} $$
- mean: $np$
- variance: $\sigma^2 = np(1-p)$
- confidence interval for the proportion: $$\hat{p} \pm Z_{\alpha/2}\sqrt{\frac{\hat{p}(1-\hat{p})}{n}},$$

where $\hat{p}$ is the observed proportion of successes. The interval relies on the CLT: for large $n$, the distribution of $\hat{p}$ is approximated by a normal distribution.
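As a rough sketch of the interval above, the following uses only the Python standard library. The function name `proportion_ci` and the example counts (120 successes in 400 trials) are made up for illustration.

```python
from math import sqrt
from statistics import NormalDist


def proportion_ci(successes, n, alpha=0.05):
    """Normal-approximation confidence interval for a binomial proportion."""
    p_hat = successes / n
    # Two-sided critical value: 1.96 for alpha = 0.05.
    z = NormalDist().inv_cdf(1 - alpha / 2)
    half_width = z * sqrt(p_hat * (1 - p_hat) / n)
    return p_hat - half_width, p_hat + half_width


# Hypothetical data: 120 successes out of 400 trials.
low, high = proportion_ci(successes=120, n=400)
print(low, high)  # roughly 0.255 and 0.345
```

The approximation is usually considered reasonable only when both $n\hat{p}$ and $n(1-\hat{p})$ are not too small.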

# Difference in proportions

<img src="../images/diff_in_props.png">

### with specified power and significance

**Z-score:** The number of standard deviations a sample measurement $x$ lies from the population mean $\mu$:

$$ Z = \frac{x-\mu}{\sigma}$$

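The sample size $n$ needed to estimate a mean to within a margin of error $E$ is

$$n = \left(\frac{Z_{\alpha/2}\sigma}{E}\right)^2$$

where $\sigma$ is the standard deviation, defined for a discrete distribution by $$\sigma_{\text{discrete}} = \sqrt{\sum_{i=1}^N{p_i (x_i-\mu)^2}}$$

and for a continuous distribution by $$\sigma_{\text{continuous}} = \sqrt{\int_x{ (x-\mu)^2 p(x)dx}}$$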
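The sample-size formula above can be sketched in a few lines. The function name `sample_size` and the example inputs ($\sigma = 0.5$, $E = 0.05$) are assumptions chosen for illustration; the result is rounded up because a sample size must be a whole number.

```python
from math import ceil
from statistics import NormalDist


def sample_size(sigma, margin, alpha=0.05):
    """Smallest n with Z_{alpha/2} * sigma / sqrt(n) <= margin."""
    z = NormalDist().inv_cdf(1 - alpha / 2)
    return ceil((z * sigma / margin) ** 2)


# Hypothetical inputs: sigma = 0.5 (the worst case for a proportion), E = 0.05.
n = sample_size(sigma=0.5, margin=0.05)
print(n)  # 385
```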

The notation $Z_{\alpha/2}$ denotes the number of standard deviations from the mean, in either direction, that bound the central $1-\alpha$ of the distribution for a given Type I error level $\alpha$:

$$x = \left(\mu \pm Z_{\alpha/2}\sigma \right)$$


For example, a two-tailed test at $\alpha = 10\%$ has $Z_{\alpha/2} = 1.645$.

**Significance:** If the null hypothesis is true, how likely are we to see such an extreme effect?

$$\alpha = 5\%:$$

$$Z_{\alpha/2} = 1.96$$
$$Z_{\alpha} = 1.645$$
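These critical values can be recovered from the inverse CDF of the standard normal distribution. A minimal check, using `statistics.NormalDist` from the Python standard library (the variable names are illustrative):

```python
from statistics import NormalDist

alpha = 0.05
std_normal = NormalDist()  # mean 0, standard deviation 1

# Two-tailed: alpha/2 in each tail, so take the (1 - alpha/2) quantile.
z_two_tailed = std_normal.inv_cdf(1 - alpha / 2)
# One-tailed: all of alpha in the upper tail.
z_one_tailed = std_normal.inv_cdf(1 - alpha)

print(round(z_two_tailed, 3), round(z_one_tailed, 3))  # 1.96 1.645
```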

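Rejection region, bounded by the **critical value**: reject the null hypothesis when the observed difference $x$ satisfies
$$x \geq \mu + Z_{\alpha}\sqrt{\frac{\sigma^2}{n_1} + \frac{\sigma^2}{n_2}}$$

where $\mu$ is the difference expected under the null hypothesis.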

**Power:** If the alternative hypothesis is true, how likely are we to reject the null hypothesis?

Given another population, what % of the population's distribution is in the rejection region?

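$$Z = \frac{\text{criticalValue} - \text{alternativeDiff}}{\sqrt{\frac{\sigma^2}{n_1} + \frac{\sigma^2}{n_2}}}$$

The power, $1-\beta$, is the area of the alternative distribution above the critical value, i.e. the probability that a standard normal variable exceeds $Z$.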
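The significance and power calculations can be combined into one short sketch. It assumes a one-sided test, a common standard deviation $\sigma$ in both groups, and a normal approximation; the function name `power_one_sided` and the example numbers are hypothetical.

```python
from math import sqrt
from statistics import NormalDist


def power_one_sided(alt_diff, sigma, n1, n2, alpha=0.05, null_diff=0.0):
    """Power of a one-sided z-test for a difference between two groups."""
    std_normal = NormalDist()
    # Standard error of the difference between the two sample means.
    se = sqrt(sigma**2 / n1 + sigma**2 / n2)
    # Smallest observed difference that rejects the null hypothesis.
    critical_value = null_diff + std_normal.inv_cdf(1 - alpha) * se
    # Position of the critical value on the alternative distribution.
    z = (critical_value - alt_diff) / se
    # Power is the share of the alternative distribution above the critical value.
    return 1 - std_normal.cdf(z)


# Hypothetical setup: true difference 0.1, sigma 0.5, 400 observations per group.
power = power_one_sided(alt_diff=0.1, sigma=0.5, n1=400, n2=400)
print(round(power, 2))  # about 0.88
```

When the alternative difference equals the null difference, the function returns $\alpha$, which is a handy sanity check.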



# Multiple comparisons

When a single data source is used to test $k$ hypotheses, the probability of at least one false positive result grows with $k$**\***.

For $k$ independent comparisons, the Family-Wise Error Rate (FWER) is:

$$ \alpha_{FWER} = 1-(1-\alpha_{\text{per comparison}})^k$$



\* **Note:** Inflation of the Type I error is avoided if the hypotheses are perfectly dependent (in that case you are looking for the same thing $k$ different ways).
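To get a feel for how quickly the FWER grows, here is a minimal sketch of the formula above. It assumes independent comparisons; the function name `family_wise_error_rate` and the chosen values of $k$ are illustrative.

```python
def family_wise_error_rate(alpha_per_comparison, k):
    """Probability of at least one false positive across k independent tests."""
    return 1 - (1 - alpha_per_comparison) ** k


for k in (1, 5, 20):
    print(k, round(family_wise_error_rate(0.05, k), 3))
# 1 0.05
# 5 0.226
# 20 0.642
```

With 20 comparisons at a per-comparison level of 5%, a false positive somewhere in the family is more likely than not.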

