# scipy.stats
### Overview

*scipy.stats* is the statistics module of the SciPy library for Python. It contains a large number of probability distributions and statistical functions [1].

For the sake of continuity, I will use the iris dataset to demonstrate some algorithms of the scipy.stats library, specifically t-tests and ANOVA (analysis of variance).

## T-Tests

T-tests are used to "quantify the difference of arithmetic means between two samples of data" [11].
For example, given two samples of petal length, we might ask whether they were drawn from the same population of a species or from two different populations of that species. A t-test compares the two sample means and reports how likely a difference at least that large would be if the population means were equal [Ibid.].
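To make the idea concrete, here is a minimal sketch that computes the equal-variance (Student's) t statistic by hand and compares it with `scipy.stats.ttest_ind`. The two samples are synthetic stand-ins generated with NumPy, not iris measurements, and the helper name `pooled_t_statistic` is made up for this illustration.

```python
import numpy as np
import scipy.stats as ss

# Synthetic stand-in data (NOT the iris measurements): two samples
# drawn from normal distributions with different true means.
rng = np.random.default_rng(0)
sample_a = rng.normal(loc=1.5, scale=0.2, size=50)
sample_b = rng.normal(loc=4.3, scale=0.5, size=50)


def pooled_t_statistic(x, y):
    """Student's t statistic for two independent samples, assuming equal variances."""
    n_x, n_y = len(x), len(y)
    # Pooled variance: weighted average of the two unbiased sample variances.
    pooled_var = (
        (n_x - 1) * np.var(x, ddof=1) + (n_y - 1) * np.var(y, ddof=1)
    ) / (n_x + n_y - 2)
    # Difference of means divided by its estimated standard error.
    return (np.mean(x) - np.mean(y)) / np.sqrt(pooled_var * (1 / n_x + 1 / n_y))


t_manual = pooled_t_statistic(sample_a, sample_b)
result = ss.ttest_ind(sample_a, sample_b)  # equal_var=True is the default

print("manual t:", t_manual)
print("scipy t: ", result.statistic)
print("p-value: ", result.pvalue)
```

A small p-value suggests the difference in sample means would be unlikely if both samples shared the same population mean.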

### Packages

In [None]:
# Efficient numerical arrays.
import numpy as np

# Data frames.
import pandas as pd

# Alternative statistics package.
import statsmodels.stats.weightstats as stat

# Main statistics package.
import scipy.stats as ss

# Plotting.
import matplotlib.pyplot as plt

# Fancier plotting.
import seaborn as sns

# Better sized plots.
plt.rcParams['figure.figsize'] = (12, 8)

# Nicer colours and styles for plots.
plt.style.use("ggplot")

## T-Test with Iris Dataset

In [None]:
# Load the iris data set from a local CSV file (exported via tableconvert [4]).
iris_data = pd.read_csv('tableconvert_csv_xin5ac.csv')

In [None]:
# Split the data frame into one subset per species.
s = iris_data[iris_data['species'] == 'setosa']
r = iris_data[iris_data['species'] == 'versicolor']
a = iris_data[iris_data['species'] == 'virginica']
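Before running any tests it can help to look at the size, mean and spread of each group. The sketch below does this with `groupby` on a tiny hand-made stand-in frame (the name `demo` and its six rows are hypothetical). With the real data, the same call would be made on the frame loaded from the CSV.

```python
import pandas as pd

# Hypothetical stand-in rows, only to keep the example self-contained.
demo = pd.DataFrame({
    "species": ["setosa", "setosa", "versicolor", "versicolor", "virginica", "virginica"],
    "petal_length": [1.4, 1.5, 4.5, 4.7, 5.8, 6.0],
})

# Sample size, mean and standard deviation of petal length per species.
summary = demo.groupby("species")["petal_length"].agg(["count", "mean", "std"])
print(summary)
```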

In [None]:
# Pairwise independent t-tests between species, one block per feature.
# scipy.stats was imported as ss in the Packages cell.

# Petal length.
print(ss.ttest_ind(s['petal_length'], r['petal_length']))
print(ss.ttest_ind(s['petal_length'], a['petal_length']))
print(ss.ttest_ind(r['petal_length'], a['petal_length']))

# Petal width.
print(ss.ttest_ind(s['petal_width'], r['petal_width']))
print(ss.ttest_ind(s['petal_width'], a['petal_width']))
print(ss.ttest_ind(r['petal_width'], a['petal_width']))

# Sepal length.
print(ss.ttest_ind(s['sepal_length'], r['sepal_length']))
print(ss.ttest_ind(s['sepal_length'], a['sepal_length']))
print(ss.ttest_ind(r['sepal_length'], a['sepal_length']))

# Sepal width.
print(ss.ttest_ind(s['sepal_width'], r['sepal_width']))
print(ss.ttest_ind(s['sepal_width'], a['sepal_width']))
print(ss.ttest_ind(r['sepal_width'], a['sepal_width']))

## References
1. “Statistical functions (scipy.stats) - SciPy v1.7.1 Manual,” 2021. [Online]. Available: https://docs.scipy.org/doc/scipy/reference/stats.html
2. https://en.wikipedia.org/wiki/Iris_flower_data_set
3. https://github.com/RitRa/Project2018-iris/blob/master/Project%2B2018%2B-%2BFishers%2BIris%2Bdata%2Bset%2Banalysis.ipynb
4. https://tableconvert.com/?output=csv&data=https://gist.github.com/netj/8836201/raw/6f9306ad21398ea43cba4f7d537619d0e07d5ae3/iris.csv
5. https://realpython.com/python-csv/
6. https://stackoverflow.com/questions/1526607/extracting-data-from-a-csv-file-in-python
7. pandas.read_csv: https://pandas.pydata.org/pandas-docs/stable/reference/api/pandas.read_csv.html
8. https://github.com/ianmcloughlin/jupyter-teaching-notebooks/blob/main/ttest.ipynb
9. https://github.com/ianmcloughlin/jupyter-teaching-notebooks/blob/main/anova.ipynb
10. https://www.qualtrics.com/uk/experience-management/research/anova/
11. https://docs.scipy.org/doc/scipy/reference/generated/scipy.stats.ttest_ind.html