# Tech #6 Statistical Tests

- This file illustrates how to use Python to perform simple statistical tests.

### Import pandas and scipy packages ###
- SciPy (pronounced “Sigh Pie”) is a Python package for mathematics, science, and engineering.
- SciPy contains modules for optimization, linear algebra, integration, interpolation, statistical functions, and other tasks common in science and engineering.
- In this example, we use the sub-package, *stats*, to get statistical functions.

In [None]:
import pandas as pd
import scipy.stats as st

## Example: Grades from two classes ##
- Each class has 20 students

In [None]:
df = pd.DataFrame({'class1': [70, 73, 75, 80, 64, 56, 61, 99, 55, 60, 71, 72, 74, 81, 63, 53, 74, 95, 46, 65],
                   'class2': [65, 76, 75, 82, 61, 58, 66, 70, 45, 77, 62, 78, 72, 76, 60, 28, 67, 63, 49, 78]})

## One-sample *t*-test

```Python
st.ttest_1samp(data, value)
```
- This test is used to examine whether the mean of a sample is **significantly (statistically) different from a certain value**.
- The returned result includes both the *t*-statistic and the *p*-value.
- The output of interest is the *p*-value. A smaller *p*-value means that a difference this large would be less likely to arise from random sampling alone.
- In this file, we use a 10% significance level: when the *p*-value is smaller than 0.1, we conclude that the sample mean is **significantly** different from the given value. (Stricter levels such as 5% or 1% are also common.)
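
To make the two outputs concrete, the sketch below recomputes the *t*-statistic and the two-tailed *p*-value by hand and compares them with `st.ttest_1samp`. It reuses the Class 1 grades from the example above; the reference value 60 and the names `t_manual` and `p_manual` are chosen only for illustration.

```python
import math
import scipy.stats as st

# Class 1 grades copied from the example above.
class1 = [70, 73, 75, 80, 64, 56, 61, 99, 55, 60,
          71, 72, 74, 81, 63, 53, 74, 95, 46, 65]
value = 60  # reference value, chosen for illustration

n = len(class1)
mean = sum(class1) / n
# Sample standard deviation (divides by n - 1).
sd = math.sqrt(sum((x - mean) ** 2 for x in class1) / (n - 1))
# t-statistic: distance between the sample mean and the value, in standard errors.
t_manual = (mean - value) / (sd / math.sqrt(n))
# Two-tailed p-value from the t distribution with n - 1 degrees of freedom.
p_manual = 2 * st.t.sf(abs(t_manual), df=n - 1)

t_scipy, p_scipy = st.ttest_1samp(class1, value)
print("manual: t = %.3f  p = %.3f" % (t_manual, p_manual))
print("scipy:  t = %.3f  p = %.3f" % (t_scipy, p_scipy))
```

The two printed lines should agree, which shows that the *p*-value is simply the tail probability of the *t*-statistic.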

In [None]:
df['class1']

In [None]:
df['class1'].mean()

In [None]:
df['class1'].plot.hist(range=(0,100))

**The example below examines whether the average grade of Class 1 is significantly larger than 60.**

**Step 1:** Calculate the difference between the sample mean and the given value.

In [None]:
df['class1'].mean() - 60

**Step 2:** Examine whether the difference between the sample mean and the given value is statistically significant.

In [None]:
st.ttest_1samp(df['class1'], 60)

**Interpretation:** Because (1) the average grade of Class 1 is larger than 60 and (2) the *p*-value of the statistical test on the difference is smaller than 10%, a difference this large would occur by chance less than 10% of the time if the true mean were 60. We can conclude that the average grade of Class 1 is significantly larger than 60.

**Note:** You can present these two statistics with fewer decimal places using the following code.

In [None]:
t, p = st.ttest_1samp(df['class1'], 60)
print("ttest_1samp_stats: t = %.3f  p = %.3f" % (t, p))
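
As an alternative to unpacking, the result object can also be read through its `statistic` and `pvalue` attributes. The sketch below is a minimal illustration that reuses the Class 1 grades; the name `result` is arbitrary.

```python
import scipy.stats as st

# Class 1 grades copied from the example above.
class1 = [70, 73, 75, 80, 64, 56, 61, 99, 55, 60,
          71, 72, 74, 81, 63, 53, 74, 95, 46, 65]

result = st.ttest_1samp(class1, 60)
# The result unpacks like a tuple and also exposes named attributes.
print(f"t = {result.statistic:.3f}  p = {result.pvalue:.3f}")
```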

---
## Two-sample *t*-test

```Python
st.ttest_ind(data1, data2)
```
- This test is used to examine whether the mean of one sample is significantly different from the mean of the other sample.
- The returned result includes both the *t*-statistic and the *p*-value.
- The output of interest is the *p*-value. A smaller *p*-value suggests greater statistical significance.
- In this file, we use a 10% significance level: when the *p*-value is smaller than 0.1, we conclude that the two sample means are significantly different from each other.
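
By default, `st.ttest_ind` assumes that the two samples have equal variances and pools them. The sketch below recomputes that pooled *t*-statistic by hand and also runs Welch's variant (`equal_var=False`), which drops the equal-variance assumption. It reuses the grades from the example above; the variable names are chosen only for illustration.

```python
import math
import scipy.stats as st

# Grades copied from the example above.
class1 = [70, 73, 75, 80, 64, 56, 61, 99, 55, 60,
          71, 72, 74, 81, 63, 53, 74, 95, 46, 65]
class2 = [65, 76, 75, 82, 61, 58, 66, 70, 45, 77,
          62, 78, 72, 76, 60, 28, 67, 63, 49, 78]

n1, n2 = len(class1), len(class2)
m1, m2 = sum(class1) / n1, sum(class2) / n2
# Sample variances (divide by n - 1).
v1 = sum((x - m1) ** 2 for x in class1) / (n1 - 1)
v2 = sum((x - m2) ** 2 for x in class2) / (n2 - 1)
# Pooled variance, weighted by each sample's degrees of freedom.
pooled_var = ((n1 - 1) * v1 + (n2 - 1) * v2) / (n1 + n2 - 2)
t_manual = (m1 - m2) / math.sqrt(pooled_var * (1 / n1 + 1 / n2))
# Two-tailed p-value with n1 + n2 - 2 degrees of freedom.
p_manual = 2 * st.t.sf(abs(t_manual), df=n1 + n2 - 2)

t_pooled, p_pooled = st.ttest_ind(class1, class2)
t_welch, p_welch = st.ttest_ind(class1, class2, equal_var=False)
print("manual: t = %.3f  p = %.3f" % (t_manual, p_manual))
print("pooled: t = %.3f  p = %.3f" % (t_pooled, p_pooled))
print("welch:  t = %.3f  p = %.3f" % (t_welch, p_welch))
```

Welch's variant is often the safer choice when the two samples may differ in spread or size.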

In [None]:
df['class1'].mean()

In [None]:
df['class2'].mean()

In [None]:
df['class1'].plot.hist(range=(0,100))

In [None]:
df['class2'].plot.hist(range=(0,100))

In [None]:
df[['class1','class2']].plot.hist(range=(0,100), alpha=0.3)
# alpha=0.3 sets the opacity to 30% so that overlapping bars stay visible.

**The example below examines whether the average grade of Class 1 is significantly larger than the average grade of Class 2.**

**Step 1:** Calculate the difference between the two sample means.

In [None]:
df['class1'].mean() - df['class2'].mean()

**Step 2:** Examine whether the difference between the two sample means is statistically significant.

In [None]:
st.ttest_ind(df['class1'], df['class2'])

**Interpretation:** Even though the average grade of Class 1 is larger than that of Class 2, the *p*-value of the statistical test on the difference is 0.35, which is greater than 10%. So we cannot conclude that the average grade of Class 1 is significantly larger than that of Class 2.

*Additional Note: Both tests presented in this file are two-tailed tests, which are generally stricter than one-tailed tests. If you are interested in the difference between one- and two-tailed tests, refer to your statistics book or online resources.*
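
For readers who want to try a one-tailed test, the sketch below passes `alternative='greater'` to both functions. This assumes SciPy 1.6 or later, where the `alternative` parameter is available; the variable names are chosen only for illustration. When the observed difference lies in the tested direction, the one-tailed *p*-value is half of the two-tailed one.

```python
import scipy.stats as st

# Grades copied from the example above.
class1 = [70, 73, 75, 80, 64, 56, 61, 99, 55, 60,
          71, 72, 74, 81, 63, 53, 74, 95, 46, 65]
class2 = [65, 76, 75, 82, 61, 58, 66, 70, 45, 77,
          62, 78, 72, 76, 60, 28, 67, 63, 49, 78]

# One-sample: is the mean of Class 1 greater than 60?
_, p_two = st.ttest_1samp(class1, 60)
_, p_one = st.ttest_1samp(class1, 60, alternative='greater')
print("one-sample: two-tailed p = %.3f  one-tailed p = %.3f" % (p_two, p_one))

# Two-sample: is the mean of Class 1 greater than the mean of Class 2?
_, p_ind_two = st.ttest_ind(class1, class2)
_, p_ind_one = st.ttest_ind(class1, class2, alternative='greater')
print("two-sample: two-tailed p = %.3f  one-tailed p = %.3f" % (p_ind_two, p_ind_one))
```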