#### Importing
```python
%matplotlib inline
import matplotlib.pyplot as plt
import pandas as pd
import numpy as np
import seaborn as sns
import statsmodels.api as sm
```

### Constructing Confidence Intervals

$$====================================================================================$$

#### Population proportion

$$Best\ Estimate \pm Margin\ of\ Error$$

<br>

<center>The resulting interval is (<b>lower confidence bound (LCB)</b>, <b>upper confidence bound (UCB)</b>)</center>

<br>
<br>


$$Margin\ of\ Error = Z * Standard\ Error = Z * \sqrt{\frac{Sample\ Proportion * (1 - Sample\ Proportion)}{Number\ Of\ Observations}} = Z * \sqrt{\frac{\hat{p} * (1 - \hat{p})}{n}}$$

<br>

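$$Margin\ of\ Error\ (Conservative\ Approach) = Z * \frac{1}{2\sqrt{n}}$$

This uses the fact that $p(1 - p) \le \frac{1}{4}$ (with equality at $p = 0.5$), so the conservative margin of error is never smaller than the sample-based one.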
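A quick numeric comparison of the sample-based and conservative margins of error, assuming the same values as the worked example in this section ($\hat{p} = 0.85$, $n = 659$) and $z = 1.96$ for 95% confidence:

```python
import numpy as np

z_star = 1.96  # z-multiplier for 95% confidence
p_hat = 0.85   # sample proportion (assumed, as in the worked example)
n = 659        # sample size

# Margin of error based on the observed sample proportion
moe = z_star * np.sqrt(p_hat * (1 - p_hat) / n)

# Conservative margin of error: replaces p(1 - p) by its maximum, 1/4
moe_conservative = z_star / (2 * np.sqrt(n))

print(round(moe, 4), round(moe_conservative, 4))
```

The conservative version is wider here because 0.85 is far from 0.5, where $p(1-p)$ peaks.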

<br>

$$Difference\ in\ Proportions\ Confidence\ Interval = \hat{p}_{\ 1}-\hat{p}_{\ 2} \pm Z * \sqrt{\frac{\hat{p}_{\ 1}(1-\hat{p}_{\ 1})}{n_{\ 1}} + \frac{\hat{p}_{\ 2}(1-\hat{p}_{\ 2})}{n_{\ 2}}}$$
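As a sketch of the difference-in-proportions interval, the snippet below plugs in the SMQ020x by RIAGENDRx counts from the crosstab in these notes (proportion answering "Yes", females minus males) and assumes $z = 1.96$ for 95% confidence:

```python
import numpy as np

# Counts from the SMQ020x-by-RIAGENDRx crosstab in these notes
yes_f, n_f = 906, 906 + 2066    # females: "Yes" answers, total
yes_m, n_m = 1413, 1413 + 1340  # males: "Yes" answers, total

p_f = yes_f / n_f
p_m = yes_m / n_m
z_star = 1.96  # 95% confidence

# Unpooled standard error of the difference (one term per group)
se_diff = np.sqrt(p_f * (1 - p_f) / n_f + p_m * (1 - p_m) / n_m)
lcb = (p_f - p_m) - z_star * se_diff
ucb = (p_f - p_m) + z_star * se_diff
print(lcb, ucb)
```

Because the whole interval lies below 0, these counts suggest a lower "Yes" proportion among females than among males.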


#### Different z-multipliers depending on confidence
<img src="different_z_multipliers.png" width="700px">
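The multipliers in the table can also be computed directly. A minimal sketch using SciPy's standard normal quantile function `norm.ppf`, leaving $\alpha/2$ in each tail:

```python
from scipy.stats import norm

# Two-sided z-multiplier: put alpha / 2 in each tail of the standard normal
z_stars = {conf: norm.ppf(1 - (1 - conf) / 2) for conf in (0.90, 0.95, 0.99)}

for conf, z_star in z_stars.items():
    print(f"{conf:.0%} confidence: z* = {z_star:.3f}")
```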

###### Calculate population proportion interval with python
```python
# count = number of successes (n * p), nobs = n; defaults: alpha=0.05, method="normal"
sm.stats.proportion_confint(n * p, n)
```

##### Example of calculating population proportion interval
```python
z_star = 1.96  # z-multiplier for 95% confidence
p = .85        # sample proportion
n = 659        # sample size
# se - standard error of the sample proportion
se = np.sqrt((p * (1 - p)) / n)
# lcb - lower confidence bound
# ucb - upper confidence bound
lcb = p - z_star * se
ucb = p + z_star * se
(lcb, ucb)
```

$$====================================================================================$$

#### Mean

Here the *Best Estimate* is the **observed mean (or proportion)** from the sample, and the *Margin of Error* is the **multiplier (t for means, z for proportions) times the standard error**.

<br>
<br>

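$$Confidence\ Interval\ for\ Mean = \bar{x}\ \pm\ t * \frac{Standard\ Deviation}{\sqrt{Number\ Of\ Observations}} = \bar{x}\ \pm\ t * \frac{s}{\sqrt{n}}$$

where $\bar{x}$ is the sample mean and $t$ is the t-multiplier with $n - 1$ degrees of freedom.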
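A minimal sketch of the t-interval for a mean, using a small made-up sample (the values in `x` are hypothetical) and SciPy's `t.ppf` for the multiplier instead of a printed table:

```python
import numpy as np
from scipy.stats import t

# Hypothetical sample (illustration only)
x = np.array([81.0, 79.5, 85.2, 90.1, 76.4, 88.0, 83.3, 79.9])
n = len(x)
x_bar = x.mean()
s = x.std(ddof=1)  # sample standard deviation (n - 1 in the denominator)

t_star = t.ppf(0.975, df=n - 1)  # 95% confidence, n - 1 degrees of freedom
moe = t_star * s / np.sqrt(n)
lower, upper = x_bar - moe, x_bar + moe
print(lower, upper)
```

Note `ddof=1`: NumPy's default `ddof=0` would divide by $n$ and slightly understate $s$.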

<br>

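$$Difference\ in\ Means\ for\ Paired\ Data = \bar{x}_{\ d}\ \pm\ t * \frac{s_{\ d}}{\sqrt{n}}$$

where $\bar{x}_{\ d}$ and $s_{\ d}$ are the mean and standard deviation of the $n$ within-pair differences.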
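For paired data the interval is just the one-sample t-interval applied to the differences. A sketch with hypothetical before/after measurements on the same six subjects:

```python
import numpy as np
from scipy.stats import t

# Hypothetical paired measurements on the same 6 subjects
before = np.array([120.0, 132.0, 118.0, 141.0, 125.0, 130.0])
after = np.array([115.0, 128.0, 119.0, 133.0, 121.0, 126.0])

d = before - after  # within-subject differences
n = len(d)
d_bar = d.mean()
s_d = d.std(ddof=1)

t_star = t.ppf(0.975, df=n - 1)  # 95% confidence
lower = d_bar - t_star * s_d / np.sqrt(n)
upper = d_bar + t_star * s_d / np.sqrt(n)
print(lower, upper)
```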

**Difference in Means for Independent Groups:**

$$Unpooled\ Confidence\ Interval = \bar{x}_{\ 1}-\bar{x}_{\ 2}\ \pm\ t * \sqrt{\frac{s_{\ 1}^2}{n_{\ 1}} + \frac{s_{\ 2}^2}{n_{\ 2}}}$$

<br>

$$Pooled\ Confidence\ Interval = \bar{x}_{\ 1}-\bar{x}_{\ 2}\ \pm\ t * \sqrt{\frac{(n_{\ 1} - 1)s_{\ 1}^2 + (n_{\ 2} - 1)s_{\ 2}^2}{n_{\ 1}+n_{\ 2}-2}}\ \sqrt{\frac{1}{n_{\ 1}} + \frac{1}{n_{\ 2}}}$$
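A sketch of both intervals on two hypothetical independent samples. The formulas above do not state the degrees of freedom; this sketch assumes $n_1 + n_2 - 2$ for the pooled case and the Welch-Satterthwaite approximation for the unpooled case:

```python
import numpy as np
from scipy.stats import t

# Hypothetical independent samples (illustration only)
x1 = np.array([27.1, 30.4, 25.8, 29.9, 31.2, 28.4, 26.7])
x2 = np.array([24.3, 26.8, 23.9, 27.5, 25.1, 22.8])
n1, n2 = len(x1), len(x2)
s1, s2 = x1.std(ddof=1), x2.std(ddof=1)
diff = x1.mean() - x2.mean()

# Unpooled: separate variance per group, Welch-Satterthwaite df (assumption)
v1, v2 = s1**2 / n1, s2**2 / n2
se_unpooled = np.sqrt(v1 + v2)
df_unpooled = (v1 + v2) ** 2 / (v1**2 / (n1 - 1) + v2**2 / (n2 - 1))

# Pooled: assumes equal population variances, df = n1 + n2 - 2
sp = np.sqrt(((n1 - 1) * s1**2 + (n2 - 1) * s2**2) / (n1 + n2 - 2))
se_pooled = sp * np.sqrt(1 / n1 + 1 / n2)
df_pooled = n1 + n2 - 2

ci_unpooled = diff + np.array([-1, 1]) * t.ppf(0.975, df_unpooled) * se_unpooled
ci_pooled = diff + np.array([-1, 1]) * t.ppf(0.975, df_pooled) * se_pooled
print(ci_unpooled, ci_pooled)
```

When the two sample standard deviations are similar the two intervals are close; the pooled version is only justified if equal variances are plausible.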

##### Calculate mean interval with python
```python
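# df: DataFrame with a numeric "CWDistance" column; t-based 95% interval
sm.stats.DescrStatsW(df["CWDistance"].dropna()).tconfint_mean()
# zconfint_mean() returns the normal-approximation (z) interval instead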
```

$$====================================================================================$$

##### To find t* multiplier depending on df and alpha
<img src="find_t_in_table.jpg" width="700px">
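Instead of reading $t^*$ from the table, it can be computed with SciPy's `t.ppf`. The helper name `t_multiplier` below is hypothetical:

```python
from scipy.stats import t


def t_multiplier(confidence, df):
    """Two-sided t* for a confidence level and degrees of freedom."""
    alpha = 1 - confidence
    return t.ppf(1 - alpha / 2, df)


print(t_multiplier(0.95, 10))  # compare with the df = 10 row of a t-table
```

As the degrees of freedom grow, $t^*$ approaches the z-multiplier (1.96 for 95%).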

$$====================================================================================$$

To replace the numeric codes with readable labels and turn the non-answer codes (7 = refused, 9 = don't know) into missing values:
```python
da["SMQ020x"] = da.SMQ020.replace({1: "Yes", 2: "No", 7: np.nan, 9: np.nan})
da["RIAGENDRx"] = da.RIAGENDR.replace({1: "Male", 2: "Female"})
```
To drop the rows that are missing either value:
```python
dx = da[["SMQ020x", "RIAGENDRx"]].dropna()
```
To cross-tabulate two categorical variables in one table:
```python
pd.crosstab(dx.SMQ020x, dx.RIAGENDRx)
```
Result:

| SMQ020x / RIAGENDRx | Female | Male |
|---------|--------|------|
| No | 2066 | 1340 |
| Yes | 906 | 1413 |

To aggregate values by group (here, the proportion answering "Yes" and the group size):
```python
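# Recode Yes/No to 1/0 so the group mean is the proportion answering "Yes"
dx["SMQ020x"] = dx.SMQ020x.map({"Yes": 1, "No": 0})
dz = dx.groupby("RIAGENDRx").agg({"SMQ020x": ["mean", "size"]})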
```
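To see the recoding, dropping, cross-tabulation and grouping steps end to end, here is a sketch on a tiny made-up DataFrame that uses the same NHANES-style codes (the rows are hypothetical, not real NHANES data). It builds a separate 0/1 `smoker` column with `assign` so that the group mean is the proportion answering "Yes":

```python
import numpy as np
import pandas as pd

# Tiny made-up data using the NHANES-style codes from these notes
da = pd.DataFrame({
    "SMQ020": [1, 2, 2, 7, 1, 9, 2, 1],
    "RIAGENDR": [1, 1, 2, 2, 2, 1, 2, 1],
})
da["SMQ020x"] = da.SMQ020.replace({1: "Yes", 2: "No", 7: np.nan, 9: np.nan})
da["RIAGENDRx"] = da.RIAGENDR.replace({1: "Male", 2: "Female"})

# Drop rows whose answer was refused / don't know (now NaN)
dx = da[["SMQ020x", "RIAGENDRx"]].dropna()
print(pd.crosstab(dx.SMQ020x, dx.RIAGENDRx))

# Proportion answering "Yes" and group size, per gender
dz = (
    dx.assign(smoker=(dx.SMQ020x == "Yes").astype(int))
    .groupby("RIAGENDRx")["smoker"]
    .agg(["mean", "size"])
)
print(dz)
```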
To get summary statistics (count, mean, std, min, quartiles, max) for the numeric columns:
```python
df.describe()
```

## Hypothesis Testing

The general form of a test statistic is:

$$\frac{Best\ Estimate - Hypothesized\ Estimate}{Standard\ Error\ of\ Estimate}$$ 

We will use the examples from our lectures and Python functions to streamline our tests.

### One Population Proportion

$$Z = \frac{\hat{p}-p_{\ 0}}{\sqrt{\frac{p_{\ 0} (1 - p_{\ 0})}{n}}}$$

### Difference in Population Proportions

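$$Z = \frac{(\hat{p}_{\ 1} - \hat{p}_{\ 2}) - 0}{\sqrt{\hat{p} (1 - \hat{p}) (\frac{1}{n_{\ 1}} + \frac{1}{n_{\ 2}})}}$$

where the hypothesized difference under $H_0$ is 0 and $\hat{p} = \frac{x_{\ 1} + x_{\ 2}}{n_{\ 1} + n_{\ 2}}$ is the pooled proportion ($x_{\ 1}$, $x_{\ 2}$ are the success counts in each group).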


### One Population Mean
$$t = \frac{\hat{μ}-μ_{\ 0}}{\frac{s}{\sqrt{n}}}$$
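A sketch of the one-sample t-test on a hypothetical sample ($H_0: \mu = 80$), computed from the formula and compared with SciPy's `ttest_1samp`:

```python
import numpy as np
from scipy.stats import t, ttest_1samp

# Hypothetical sample; H0: mu = 80
x = np.array([81.0, 79.5, 85.2, 90.1, 76.4, 88.0, 83.3, 79.9])
mu0 = 80
n = len(x)

t_stat = (x.mean() - mu0) / (x.std(ddof=1) / np.sqrt(n))
p_value = 2 * t.sf(abs(t_stat), df=n - 1)  # two-sided

res = ttest_1samp(x, popmean=mu0)
print(t_stat, p_value, res.statistic, res.pvalue)
```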


### Difference in Population Means (Independent)

$$t = \frac{(\hat{μ_{\ 1}} - \hat{μ_{\ 2}}) - Hypothesized\ Difference}{\sqrt{\frac{s_{\ 1}^2}{n_{\ 1}} + \frac{s_{\ 2}^2}{n_{\ 2}}}}$$
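This unpooled statistic is Welch's t. A sketch on two hypothetical samples with a hypothesized difference of 0, assuming Welch-Satterthwaite degrees of freedom and checking against SciPy's `ttest_ind(..., equal_var=False)`:

```python
import numpy as np
from scipy.stats import t, ttest_ind

# Hypothetical independent samples (illustration only)
x1 = np.array([27.1, 30.4, 25.8, 29.9, 31.2, 28.4, 26.7])
x2 = np.array([24.3, 26.8, 23.9, 27.5, 25.1, 22.8])

v1 = x1.var(ddof=1) / len(x1)
v2 = x2.var(ddof=1) / len(x2)
t_stat = (x1.mean() - x2.mean() - 0) / np.sqrt(v1 + v2)

# Welch-Satterthwaite degrees of freedom
df = (v1 + v2) ** 2 / (v1**2 / (len(x1) - 1) + v2**2 / (len(x2) - 1))
p_value = 2 * t.sf(abs(t_stat), df)

res = ttest_ind(x1, x2, equal_var=False)
print(t_stat, p_value, res.statistic, res.pvalue)
```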


### Automating z and t tests

#### One Population Proportion
To get the z test statistic and its p-value:
```python
# prop_var=pnull uses p0 in the standard error, matching the formula above
sm.stats.proportions_ztest(phat * n, n, value=pnull, prop_var=pnull)
```
#### Difference in Population Proportions
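To get the z test statistic and its p-value:
```python
import numpy as np
import scipy.stats.distributions as dist

# p1, p2: sample proportions; se: pooled standard error from the formula above
test_stat = (p1 - p2) / se
pvalue = 2 * dist.norm.cdf(-np.abs(test_stat))  # two-sided p-value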
```


#### One Population Mean
To get the test statistic and its p-value (statsmodels' `ztest` uses the normal distribution rather than t, a close approximation for large samples):
```python
sm.stats.ztest(df["CWDistance"], value=80, alternative="larger")
```

#### Difference in Population Means (Independent)
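To get the test statistic and its p-value (again a z-test based on the normal distribution):
```python
# females, male: DataFrames of female and male respondents; ztest pools the variances by default (usevar="pooled")
sm.stats.ztest(females["BMXBMI"].dropna(), male["BMXBMI"].dropna())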
```

##### To find p-value in z-tests
<img src="fin_ p-value _n_z-tests.png" width="700px">
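The same tail areas can be computed with SciPy instead of a table. A sketch for a hypothetical test statistic $z = 1.8$, covering the two-sided and both one-sided alternatives:

```python
from scipy.stats import norm

z = 1.8  # hypothetical test statistic

p_two_sided = 2 * norm.sf(abs(z))  # H1: parameter != null value
p_larger = norm.sf(z)              # H1: parameter > null value
p_smaller = norm.cdf(z)            # H1: parameter < null value
print(p_two_sided, p_larger, p_smaller)
```

`norm.sf(z)` is the upper-tail area $1 - \Phi(z)$, computed more accurately than `1 - norm.cdf(z)` for large $z$.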