# Lesson 3: Probability and Statistics (Part 2)

This lesson introduces hypothesis testing, including one-sample tests, two-sample tests, and one-way ANOVA. In this demo, we implement these tests with **SciPy** and **NumPy**.

## Contents
1. Descriptive statistics
2. T-test:
    + for the mean of **ONE** group of scores
    + for the means of **TWO** independent samples of scores
    + for the means of **TWO** independent samples from descriptive statistics
    + on **TWO RELATED** samples of scores, a and b
3. One-way ANOVA

**References:** [Statistical functions (scipy.stats)](https://docs.scipy.org/doc/scipy/reference/stats.html)

In [16]:
# DEPENDENCIES
from scipy import stats
import numpy as np
# Seed the random number generator for reproducible results
np.random.seed(2020)

## 1. Descriptive statistics

In [21]:
data = [1,2,18,3,4,6,1]

### Central tendency
**Mean**

In [22]:
np.mean(data)

5.0

**Median**

In [23]:
np.median(data)

3.0

**Mode**

In [24]:
stats.mode(data)

ModeResult(mode=array([1]), count=array([2]))
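The return format of `stats.mode` depends on the installed SciPy version, so the printed result may not match the one above exactly. As a version-independent cross-check, the mode can be sketched with `np.unique`. The names `values`, `counts`, `mode_value`, and `mode_count` are illustrative and not part of the original notebook.

```python
import numpy as np

data = [1, 2, 18, 3, 4, 6, 1]

# Count how often each distinct value occurs (values are returned sorted)
values, counts = np.unique(data, return_counts=True)

# argmax returns the first maximum, so ties resolve to the smallest value
mode_value = values[np.argmax(counts)]
mode_count = counts.max()
print(mode_value, mode_count)
```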

### Dispersion
**Variance**

In [25]:
np.var(data)

30.857142857142858

**Standard deviation (std)**

In [28]:
np.std(data)

5.5549205986353085
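By default, `np.var` and `np.std` divide by `n` (the population formulas, `ddof=0`). The t-tests later in this lesson rely on the sample versions, which divide by `n - 1`. A minimal sketch of the difference, using the same `data` as above (the variable names are illustrative):

```python
import numpy as np

data = [1, 2, 18, 3, 4, 6, 1]

pop_var = np.var(data)             # divides by n (ddof=0, the default)
sample_var = np.var(data, ddof=1)  # divides by n - 1 (unbiased estimator)
sample_std = np.std(data, ddof=1)  # square root of the sample variance

print(pop_var, sample_var, sample_std)
```

For small samples such as this one, the two conventions give noticeably different values, so it is worth checking which one a function expects.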

## 2. T-test
### ONE SAMPLE
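#### ttest_1samp(a, popmean\[, axis, nan_policy\])
Calculate the T-test for the mean of one group of scores. The null hypothesis is that the population mean equals `popmean`.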

In [8]:
# Draw 50 random values for each of 2 columns from a normal distribution (mean 5, std 10)
rvs = stats.norm.rvs(loc=5, scale=10, size=(50,2))

In [9]:
stats.ttest_1samp(rvs, 5)

Ttest_1sampResult(statistic=array([ 0.42784599, -0.38669562]), pvalue=array([0.67063786, 0.70065584]))

In [10]:
stats.ttest_1samp(rvs, 0)

Ttest_1sampResult(statistic=array([3.89488562, 3.58961895]), pvalue=array([0.00029767, 0.00076436]))
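In the outputs above, testing against the true mean 5 gives large p-values (about 0.67 and 0.70), so the null hypothesis is not rejected, while testing against 0 gives p-values below 0.001. To see where these numbers come from, the statistic can be computed by hand as t = (sample mean - popmean) / (s / sqrt(n)), with a two-sided p-value from the t distribution with n - 1 degrees of freedom. The sketch below assumes a freshly drawn one-dimensional sample (via `np.random.default_rng`, which the original notebook does not use), so its numbers will differ from those above.

```python
import numpy as np
from scipy import stats

# Assumption: an illustrative sample, not the rvs array from the notebook
rng = np.random.default_rng(2020)
sample = rng.normal(loc=5, scale=10, size=50)
popmean = 5

n = sample.size
# Standard error of the mean, using the sample standard deviation (ddof=1)
se = sample.std(ddof=1) / np.sqrt(n)
t_manual = (sample.mean() - popmean) / se
# Two-sided p-value from the t distribution with n - 1 degrees of freedom
p_manual = 2 * stats.t.sf(abs(t_manual), df=n - 1)

result = stats.ttest_1samp(sample, popmean)
print(t_manual, p_manual)
print(result.statistic, result.pvalue)
```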

### TWO SAMPLES
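#### ttest_ind(a, b\[, axis, equal_var, nan_policy\])
Calculate the T-test for the means of two independent samples of scores. By default (`equal_var=True`) this is the standard independent two-sample t-test, which assumes equal population variances; with `equal_var=False` it is Welch's t-test, which does not make that assumption.<br>
[Independent two-sample t-test](https://en.wikipedia.org/wiki/Student%27s_t-test#Independent_two-sample_t-test)<br>
[Welch's t-test](https://en.wikipedia.org/wiki/Welch%27s_t-test)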

In [29]:
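# rvs1 and rvs2: same mean, same variance, same size
rvs1 = stats.norm.rvs(loc=5, scale=10, size=500)
rvs2 = stats.norm.rvs(loc=5, scale=10, size=500)
# rvs3: same mean, larger variance (scale=20)
rvs3 = stats.norm.rvs(loc=5, scale=20, size=500)
# rvs4: same mean, larger variance, and a smaller sample (size=100)
rvs4 = stats.norm.rvs(loc=5, scale=20, size=100)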

**rvs1 vs rvs2**

In [13]:
stats.ttest_ind(rvs1, rvs2)

Ttest_indResult(statistic=0.03960583196234099, pvalue=0.9684152991146652)

In [31]:
stats.ttest_ind(rvs1, rvs2, equal_var=False)

Ttest_indResult(statistic=0.039605831962341, pvalue=0.9684153293632791)

**rvs1 vs rvs3**

In [32]:
stats.ttest_ind(rvs1, rvs3)

Ttest_indResult(statistic=-1.18048956106737, pvalue=0.238086835061267)

**rvs1 vs rvs4**

In [33]:
stats.ttest_ind(rvs1, rvs4)

Ttest_indResult(statistic=0.6281510880435406, pvalue=0.5301447834002211)

In [34]:
stats.ttest_ind(rvs1, rvs4, equal_var=False)

Ttest_indResult(statistic=0.40956504918938147, pvalue=0.6829289677333327)
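With equal sample sizes the pooled and Welch statistics coincide (as for rvs1 vs rvs2 above) and only the degrees of freedom differ; with unequal sizes and unequal variances (rvs1 vs rvs4) the statistics themselves diverge. The following sketch computes both by hand and compares them with `stats.ttest_ind`. The samples `a` and `b` are drawn here with `np.random.default_rng` for illustration, so the numbers will not match the outputs above.

```python
import numpy as np
from scipy import stats

# Assumption: illustrative samples with unequal variances and sizes
rng = np.random.default_rng(2020)
a = rng.normal(loc=5, scale=10, size=500)
b = rng.normal(loc=5, scale=20, size=100)

n1, n2 = a.size, b.size
v1, v2 = a.var(ddof=1), b.var(ddof=1)  # sample variances
diff = a.mean() - b.mean()

# Student's t-test: a single pooled variance estimate
pooled_var = ((n1 - 1) * v1 + (n2 - 1) * v2) / (n1 + n2 - 2)
t_student = diff / np.sqrt(pooled_var * (1 / n1 + 1 / n2))

# Welch's t-test: separate variances and Welch-Satterthwaite degrees of freedom
t_welch = diff / np.sqrt(v1 / n1 + v2 / n2)
df_welch = (v1 / n1 + v2 / n2) ** 2 / (
    (v1 / n1) ** 2 / (n1 - 1) + (v2 / n2) ** 2 / (n2 - 1)
)
p_welch = 2 * stats.t.sf(abs(t_welch), df_welch)

student = stats.ttest_ind(a, b)
welch = stats.ttest_ind(a, b, equal_var=False)
print(t_student, student.statistic)
print(t_welch, p_welch, welch.statistic, welch.pvalue)
```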

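#### ttest_ind_from_stats(mean1, std1, nobs1, mean2, std2, nobs2\[, equal_var\])
T-test for the means of two independent samples, computed from descriptive statistics instead of raw data. `std1` and `std2` are the sample standard deviations (`ddof=1`).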

In [35]:
stats.ttest_ind_from_stats(mean1=15.0, std1=np.sqrt(87.5), nobs1=13,
                          mean2=12.0, std2=np.sqrt(39.0), nobs2=11)

Ttest_indResult(statistic=0.9051358093310269, pvalue=0.3751996797581487)
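When the raw data are available, `ttest_ind_from_stats` should agree with `ttest_ind` as long as the standard deviations are computed with `ddof=1`. A small sketch of that check follows; the samples `a` and `b` are made up for illustration and are not the data behind the summary statistics above.

```python
import numpy as np
from scipy import stats

# Assumption: illustrative raw samples with sizes 13 and 11
rng = np.random.default_rng(2020)
a = rng.normal(loc=15, scale=9, size=13)
b = rng.normal(loc=12, scale=6, size=11)

from_raw = stats.ttest_ind(a, b)

# Summarize each sample, using the sample standard deviation (ddof=1)
from_summary = stats.ttest_ind_from_stats(
    mean1=a.mean(), std1=a.std(ddof=1), nobs1=a.size,
    mean2=b.mean(), std2=b.std(ddof=1), nobs2=b.size,
)

print(from_raw.statistic, from_raw.pvalue)
print(from_summary.statistic, from_summary.pvalue)
```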

### TWO RELATED SAMPLES
#### ttest_rel(a, b\[, axis, nan_policy\])
Calculate the T-test on two related samples of scores, a and b.

In [38]:
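# rvs1: baseline sample
rvs1 = stats.norm.rvs(loc=5, scale=10, size=500)
# rvs2: rvs1 plus small noise, so each value is paired with one in rvs1
rvs2 = (rvs1 + stats.norm.rvs(scale=0.2, size=500))
# rvs3: drawn separately, with a different mean (8 instead of 5)
rvs3 = (stats.norm.rvs(loc=8, scale=10, size=500) +
        stats.norm.rvs(scale=0.2, size=500))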

In [37]:
stats.ttest_rel(rvs1, rvs2)

Ttest_relResult(statistic=1.3765549504684436, pvalue=0.1692672186590141)

In [39]:
stats.ttest_rel(rvs1, rvs3)

Ttest_relResult(statistic=-3.8442479575546384, pvalue=0.00013655494536249875)
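A paired t-test is equivalent to a one-sample t-test on the element-wise differences with a hypothesized mean of 0. The sketch below checks this on illustrative data; `before` and `after` are hypothetical names, and the samples are drawn with `np.random.default_rng` rather than taken from the cells above.

```python
import numpy as np
from scipy import stats

# Assumption: illustrative paired measurements
rng = np.random.default_rng(2020)
before = rng.normal(loc=5, scale=10, size=500)
after = before + rng.normal(loc=0, scale=0.2, size=500)

paired = stats.ttest_rel(before, after)
# Same test expressed on the differences
one_sample = stats.ttest_1samp(before - after, 0)

print(paired.statistic, paired.pvalue)
print(one_sample.statistic, one_sample.pvalue)
```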

## 3. One-way ANOVA
**f_oneway**

Perform a one-way ANOVA. The null hypothesis is that all groups have the same population mean.

In [43]:
# Heights of people from the North, Middle, and South areas
north = [1.62, 1.54, 1.7, 1.56, 1.63, 1.67, 1.72]
middle = [1.57, 1.65, 1.54, 1.58, 1.62, 1.73]
south = [1.72, 1.68, 1.69, 1.65, 1.6, 1.59, 1.65, 1.66, 1.54]

In [45]:
stats.f_oneway(north, middle, south)

F_onewayResult(statistic=0.33832972789960714, pvalue=0.7171684906720908)
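The p-value of about 0.72 is large, so these data give no evidence that mean height differs between the three areas. The F statistic is the ratio of the between-group mean square to the within-group mean square. The following sketch recomputes it by hand from the same three lists and compares it with `stats.f_oneway`; the intermediate names (`ss_between`, `ss_within`, and so on) are illustrative.

```python
import numpy as np
from scipy import stats

north = [1.62, 1.54, 1.7, 1.56, 1.63, 1.67, 1.72]
middle = [1.57, 1.65, 1.54, 1.58, 1.62, 1.73]
south = [1.72, 1.68, 1.69, 1.65, 1.6, 1.59, 1.65, 1.66, 1.54]
groups = [np.asarray(g) for g in (north, middle, south)]

k = len(groups)                         # number of groups
n_total = sum(g.size for g in groups)   # total number of observations
grand_mean = np.concatenate(groups).mean()

# Variation of the group means around the grand mean
ss_between = sum(g.size * (g.mean() - grand_mean) ** 2 for g in groups)
# Variation of the observations around their own group mean
ss_within = sum(((g - g.mean()) ** 2).sum() for g in groups)

f_manual = (ss_between / (k - 1)) / (ss_within / (n_total - k))
# Upper tail of the F distribution with (k - 1, n_total - k) degrees of freedom
p_manual = stats.f.sf(f_manual, k - 1, n_total - k)

result = stats.f_oneway(north, middle, south)
print(f_manual, p_manual)
print(result.statistic, result.pvalue)
```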