# EGCI 305: Chapter 6 (Hypothesis Testing on One Sample)

Outline
> 1. [Packages](#ch6_packages)

> 2. [Example: glycerol concentration (manual calculation)](#ch6_ex_glycerol_manual)

> 3. [RHS probability](#ch6_rhs)
>    - [Example (1): T distribution](#ch6_ex_rhs1)
>    - [Example (2): Z distribution](#ch6_ex_rhs2)

> 4. [Z-test function (statsmodels)](#ch6_ztest_func)
>    - [Example: dynamic cone penetrometer](#ch6_ex_dynamic)

> 5. [T-test function (scipy)](#ch6_ttest_func)
>    - [Example: glycerol concentration](#ch6_ex_glycerol)

<a name="ch6_packages"></a>

## Packages
> - **numpy** -- array creation and manipulation
> - **matplotlib** -- plotting (backend for seaborn)
> - **seaborn** -- high-level statistical visualization
> - **math** -- basic calculations such as sqrt (if not using sympy)
> - **scipy.stats** -- probability distributions and statistical tests (including the t-test)
> - **statsmodels.stats.weightstats** -- hypothesis tests (the z-test)

In [None]:
import numpy as np
import matplotlib.pyplot as plt
import seaborn as sns

print("Numpy version =", np.version.version)
print("Seaborn version =", sns.__version__)

import math
import scipy
print("Scipy version =", scipy.__version__)

from scipy import stats
from scipy.stats import norm            # Normal distribution
from scipy.stats import t               # T distribution
from scipy.stats import chi2            # Chi-squared distribution

from statsmodels.stats.weightstats import ztest           # Z-test
from scipy.stats import ttest_1samp                       # T-test

<a name="ch6_ex_glycerol_manual"></a>

### Example : glycerol concentration (manual calculation)

In [None]:
A = np.array( [2.67, 4.62, 4.14, 3.81, 3.83] )

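n  = A.size             # sample size
mu = A.mean()           # sample mean (point estimate of the population mean)
s  = A.std(ddof=1)      # sample standard deviation (ddof=1 divides by n - 1)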
print("Sample size = %d, mean = %.3f, sd = %.3f \n" % (n, mu, s))

### Check Q-Q plot (values vs. ideal normal line)
fig = plt.figure( figsize = (3,2) )
stats.probplot(A, dist = 'norm', plot = plt)
plt.show()

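### Location and scale of the sampling distribution of the sample mean
loc   = mu                      # center: the sample mean
scale = s / math.sqrt(n)        # standard error of the mean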
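
The cell above stops at the standard error. A minimal sketch of the remaining manual steps follows, assuming the null value $\mu_0$ = 4 and the two-sided alternative from the glycerol t-test example later in this chapter. The names `mu0`, `xbar`, `se`, `t_stat`, and `p_value` are introduced here for illustration only.

```python
import math
import numpy as np
from scipy.stats import t

A = np.array([2.67, 4.62, 4.14, 3.81, 3.83])
mu0 = 4.0                                   # assumed null value (H0: mu = 4)

n    = A.size
xbar = A.mean()                             # sample mean
se   = A.std(ddof=1) / math.sqrt(n)         # standard error of the mean

t_stat  = (xbar - mu0) / se                 # test statistic
df      = n - 1                             # degrees of freedom
p_value = 2 * t.sf(abs(t_stat), df)         # two-sided p-value: both tails

print("Calculated T = %.2f" % t_stat)
print("df           = %d"   % df)
print("P-value      = %.3f" % p_value)
```

These values should agree with what `ttest_1samp(A, 4, alternative = 'two-sided')` reports for the same data.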

<a name="ch6_rhs"></a>
## RHS Probability, i.e. P(X > x)
>- Use the cumulative distribution function and take the complement --> **1 - cdf**
>- Or use the survival function, which returns the right-tail probability directly --> **sf**
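
The two approaches agree for moderate values, but they can differ far out in the tail, where the cdf rounds to exactly 1 in floating point. A small sketch of that effect, using z = 10 as an arbitrary illustrative value (not taken from the chapter's examples):

```python
from scipy.stats import norm

z = 10.0                        # illustrative far-tail value (assumption)

by_cdf = 1 - norm.cdf(z)        # cdf rounds to 1.0, so the difference collapses to 0
by_sf  = norm.sf(z)             # survival function keeps the tiny tail probability

print("Prob by cdf =", by_cdf)
print("Prob by sf  =", by_sf)
```

For the p-values in this chapter either form works; `sf` is the safer habit when the test statistic is extreme.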

<a name="ch6_ex_rhs1"></a>
### Example (1) : T distribution
>- t = 1.6, df = 8
>- P(T > t) = ?

In [None]:
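by_cdf = 1 - t.cdf(1.6, 8)      # P(T > t) as the complement of the cdf
by_sf  = t.sf(1.6, 8)           # same probability from the survival function
neg_t  = t.cdf(-1.6, 8)         # P(T < -t); equal to P(T > t) by symmetry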

print("Prob by cdf        = %.3f" % by_cdf)
print("Prob by sf         = %.3f" % by_sf)
print("Prob of negative t = %.3f" % neg_t)

<a name="ch6_ex_rhs2"></a>
### Example (2) : Z distribution
>- z = 2.16
>- P(Z > 2.16) = ?

In [None]:
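by_cdf = 1 - norm.cdf(2.16)     # P(Z > z) as the complement of the cdf
by_sf  = norm.sf(2.16)          # same probability from the survival function
neg_z  = norm.cdf(-2.16)        # P(Z < -z); equal to P(Z > z) by symmetry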

print("Prob by cdf        = %.3f" % by_cdf)
print("Prob by sf         = %.3f" % by_sf)
print("Prob of negative z = %.3f" % neg_z)

<a name="ch6_ztest_func"></a>

## Z-test Function (statsmodels)
- **[Manual: statsmodels.stats.weightstats.ztest](https://www.statsmodels.org/dev/generated/statsmodels.stats.weightstats.ztest.html)**
- This function runs the whole testing procedure (test statistic and p-value) **if raw data are available**

<a name="ch6_ex_dynamic"></a>

### Example : dynamic cone penetrometer
- Hypothesis
    >- H<sub>0</sub> : $\mu$ = 30
    >- H<sub>1</sub> : $\mu$ < 30

In [None]:
A = np.array( [14.1, 14.5, 15.5, 16.0, 16.0, 16.7, 16.9, 17.1, 17.5, 17.8,
               17.8, 18.1, 18.2, 18.3, 18.3, 19.0, 19.2, 19.4, 20.0, 20.0,
               20.8, 20.8, 21.0, 21.5, 23.5, 27.5, 27.5, 28.0, 28.3, 30.0,
               30.0, 31.6, 31.7, 31.7, 32.5, 33.5, 33.9, 35.0, 35.0, 35.0,
               36.7, 40.0, 40.0, 41.3, 41.7, 47.5, 50.0, 51.0, 51.8, 54.4,
               55.0, 57.0] )

n  = A.size
mu = A.mean()
s  = A.std(ddof=1)
print("Sample size = %d, mean = %.2f, sd = %.4f \n" % (n, mu, s))

In [None]:
### Result is returned as a tuple -- use [] to access each value

result = ztest(A, value = 30, alternative = 'smaller')
print("Calculated Z = %.2f" % result[0])
print("P-value      = %.4f" % result[1])

In [None]:
### What if we run a t-test instead of a z-test?

### Result is returned as an object -- use .member to access each value

result = ttest_1samp(A, 30, alternative = 'less')
print("Calculated T = %.2f" % result.statistic)
print("df           = %d"   % result.df)
print("P-value      = %.4f" % result.pvalue)
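
To see why the two tests give nearly the same answer here, the statistic can be computed by hand. This is a sketch under the assumption that both functions use the sample standard deviation (ddof=1) for the standard error, which matches their default behavior as far as I know. The statistic is then identical and only the reference distribution differs: standard normal for the z-test, t with n - 1 degrees of freedom for the t-test. The names `mu0`, `se`, `stat`, `p_z`, and `p_t` are illustrative.

```python
import numpy as np
from scipy.stats import norm, t

A = np.array([14.1, 14.5, 15.5, 16.0, 16.0, 16.7, 16.9, 17.1, 17.5, 17.8,
              17.8, 18.1, 18.2, 18.3, 18.3, 19.0, 19.2, 19.4, 20.0, 20.0,
              20.8, 20.8, 21.0, 21.5, 23.5, 27.5, 27.5, 28.0, 28.3, 30.0,
              30.0, 31.6, 31.7, 31.7, 32.5, 33.5, 33.9, 35.0, 35.0, 35.0,
              36.7, 40.0, 40.0, 41.3, 41.7, 47.5, 50.0, 51.0, 51.8, 54.4,
              55.0, 57.0])
mu0 = 30.0                                  # null value from H0

n    = A.size
se   = A.std(ddof=1) / np.sqrt(n)           # standard error of the mean
stat = (A.mean() - mu0) / se                # same formula for z and t

p_z = norm.cdf(stat)                        # left-tail p-value, normal reference
p_t = t.cdf(stat, n - 1)                    # left-tail p-value, t reference

print("Statistic      = %.2f" % stat)
print("P-value (Z)    = %.4f" % p_z)
print("P-value (T)    = %.4f" % p_t)
```

With 51 degrees of freedom the t distribution is already close to the standard normal, so the two p-values differ only slightly.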

<a name="ch6_ttest_func"></a>

## T-test Function (scipy)
- **[Manual: scipy.stats.ttest_1samp](https://docs.scipy.org/doc/scipy/reference/generated/scipy.stats.ttest_1samp.html)**
- This function runs the whole testing procedure (test statistic, degrees of freedom, and p-value) **if raw data are available**

<a name="ch6_ex_glycerol"></a>

### Example : glycerol concentration
- Hypothesis
    >- H<sub>0</sub> : $\mu$ = 4
    >- H<sub>1</sub> : $\mu$ $\ne$ 4

In [None]:
A = np.array( [2.67, 4.62, 4.14, 3.81, 3.83] )

n  = A.size
mu = A.mean()
s  = A.std(ddof=1)
#print("Sample size = %d, mean = %.3f, sd = %.3f \n" % (n, mu, s))

### Result is returned as an object -- use .member to access each value

result = ttest_1samp(A, 4, alternative = 'two-sided')
print("Calculated T = %.2f" % result.statistic)
print("df           = %d"   % result.df)
print("P-value      = %.3f" % result.pvalue)
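
The last step of the procedure is the decision. A minimal sketch, assuming a significance level of $\alpha$ = 0.05 (the chapter does not state one) and using a hypothetical helper `decide` that is not part of scipy:

```python
import numpy as np
from scipy.stats import ttest_1samp

def decide(pvalue, alpha=0.05):
    """Compare a p-value with the significance level alpha (assumed 0.05)."""
    return "reject H0" if pvalue < alpha else "fail to reject H0"

A = np.array([2.67, 4.62, 4.14, 3.81, 3.83])

result = ttest_1samp(A, 4, alternative='two-sided')
conclusion = decide(result.pvalue)

print("P-value    = %.3f" % result.pvalue)
print("Conclusion =", conclusion)
```

For these data the p-value is well above 0.05, so the sample gives no strong evidence that the mean glycerol concentration differs from 4.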