<img src="https://nlaongtup.github.io/post/scipy-lammps/featured.png" alt="image info" />

# Scipy-Stats
***

* Author: John Paul Lee
* Github: JPLee01
* Email: G00387906@gmit.ie
* Created: 04-11-2021, Last update: XX-01-2022
* Machine Learning and Statistics: Investigation into the Scikit-learn and Scipy-Stats Python libraries.
***
* This Jupyter Notebook has been created to investigate the Scipy-Stats Python library by offering an overview, a demonstration, and plots and visualisations of the library.

**Lecturer:** Dr. Ian McLoughlin

The Project instructions can be found [here](https://github.com/JPLee01/Machine_Learning_and_Statistics/blob/main/Instructions.pdf)
***
As part of the project this notebook will deal with three main tasks:

1. Offer an overview of the Scikit-learn and Scipy-Stats Python libraries.
2. Demonstrate three Scikit-learn algorithms and a Scipy-Stats hypothesis test using ANOVA.
3. Create plots and visualisations as necessary.

## Preliminaries

Before dealing with each section, we first need to import a number of libraries. The Scipy-Stats library is imported so that the library can be explained and demonstrated. The NumPy library is imported to allow data sets to be synthesised. The Pandas library is imported to allow the data sets to be analysed. Finally, the Matplotlib and Seaborn libraries are imported to allow visualisations to be created in the notebook.

In [1]:
#Import stats from the SciPy library to allow for statistical analysis
import scipy
from scipy import stats

#Import Pandas for management and analysis of the data sets
import pandas as pd

#Import NumPy for synthesising the data sets
import numpy as np

#Import matplotlib.pyplot and seaborn for visualisation of the data
import matplotlib.pyplot as plt
import seaborn as sns

As plots will be displayed in this Jupyter Notebook, the inline magic command is used so that they are rendered inline within the notebook.<sup>[1]</sup>

In [2]:
#Inline Magic command implemented to ensure that the Plots are rendered inline
%matplotlib inline

To ensure that the Seaborn plots are displayed uniformly throughout the Jupyter Notebook, the style and palette functions will be set.

The style function will be set to darkgrid. The darkened background with its built-in grid lines stands out against the white background of the Jupyter Notebook, which makes values easier to read off the plots.<sup>[2]</sup>

The palette function will be set to bright, as this allows for clear distinction between multiple outputs within one plot.<sup>[3]</sup>

Finally, to ensure uniformity throughout the notebook, the plot size will be set using Matplotlib's rcParams settings.<sup>[4]</sup>

In [3]:
#Set the Seaborn display options to ensure uniformity throughout the Jupyter Notebook
#Darkgrid style selected to make values easier to read off the plots
sns.set_style("darkgrid")
#Bright colour palette selected to allow for clear distinction of multiple outputs within one plot
sns.set_palette("bright")

In [4]:
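#Set the Matplotlib plot style
#Note: this runs after the Seaborn calls above, so any settings the two share (e.g. the colour cycle) now follow ggplot
plt.style.use("ggplot")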

# Increase the size of the output plots
plt.rcParams["figure.figsize"] = (12,8)
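
As a quick check that the size setting above has taken effect, the value can be read back from a newly created figure. This is only a minimal sketch: the variable names are illustrative and the figure is closed again without being displayed.

```python
import matplotlib.pyplot as plt

# Apply the same default figure size as in the cell above (width, height in inches)
plt.rcParams["figure.figsize"] = (12, 8)

# A figure created without an explicit figsize picks up the rcParams default
fig, ax = plt.subplots()
width, height = fig.get_size_inches()
print(f"Default figure size: {width} x {height} inches")

# Close the figure so that this check does not leave an empty plot behind
plt.close(fig)
```

Any later plot that passes its own `figsize` argument will override this default for that figure only.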

## 1. Scipy-Stats
Scipy.stats is a sub-package of the SciPy (Scientific Python) library that focuses on statistical functions.<sup>[5]</sup> SciPy is an open-source Python library, built on the NumPy package, that is used for scientific and technical computing.<sup>[6]</sup> It is a collection of mathematical algorithms and convenience functions that allows users to manipulate and visualise data.<sup>[7]</sup> The functions within the scipy.stats module include probability distributions, summary and frequency statistics, correlation functions and statistical tests, masked statistics, kernel density estimation and quasi-Monte Carlo functionality.<sup>[8]</sup> The module specialises in random variables and probability distributions,<sup>[9]</sup> and can be applied in a number of areas, including weather forecasting, insurance and politics.<sup>[10]</sup> At present the scipy.stats module implements more than 80 continuous distributions and 10 discrete distributions.<sup>[11]</sup>
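
To make the overview above more concrete, the following is a minimal sketch of how a continuous and a discrete distribution might be used. The choice of the normal and binomial distributions, their parameters and the variable names are assumptions made purely for illustration.

```python
from scipy import stats

# Continuous example: a "frozen" normal distribution with mean 0 and standard deviation 1
standard_normal = stats.norm(loc=0, scale=1)
density_at_zero = standard_normal.pdf(0)   # probability density at x = 0
prob_below_zero = standard_normal.cdf(0)   # P(X <= 0)

# Discrete example: the number of heads in 10 flips of a fair coin
coin_flips = stats.binom(n=10, p=0.5)
prob_five_heads = coin_flips.pmf(5)        # P(X = 5)

# Draw a reproducible random sample and compute its summary statistics
sample = standard_normal.rvs(size=1000, random_state=42)
summary = stats.describe(sample)

print(f"Normal pdf at 0:      {density_at_zero:.4f}")
print(f"Normal cdf at 0:      {prob_below_zero:.4f}")
print(f"Binomial pmf at 5:    {prob_five_heads:.4f}")
print(f"Sample size and mean: {summary.nobs}, {summary.mean:.4f}")
```

Freezing a distribution (calling `stats.norm(...)` once with its parameters) avoids repeating `loc` and `scale` in every later call, while passing `random_state` makes the sample repeatable between runs.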

A complete listing of the scipy.stats functions can be obtained using the function ```scipy.info(stats)```. It should be noted, however, that this function is due to be removed in SciPy version 2.0.0, with the ```numpy.info``` function to be used in its place.<sup>[12]</sup>
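
The listing can be produced with `numpy.info`, and the distribution counts quoted earlier can be checked by inspecting the module itself. The sketch below assumes that every distribution object is exposed as a module-level instance of `stats.rv_continuous` or `stats.rv_discrete`. The exact counts depend on the installed SciPy version, so none are stated here.

```python
import numpy as np
from scipy import stats

# Print the scipy.stats module documentation, which lists its functions
np.info(stats)

# Collect the names of all distribution objects defined in scipy.stats
continuous = sorted(
    name for name, obj in vars(stats).items()
    if isinstance(obj, stats.rv_continuous)
)
discrete = sorted(
    name for name, obj in vars(stats).items()
    if isinstance(obj, stats.rv_discrete)
)

print(f"Continuous distributions found: {len(continuous)}")
print(f"Discrete distributions found:   {len(discrete)}")
```

The lists `continuous` and `discrete` hold the distribution names, so they can also be printed to see exactly which distributions the installed version provides.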

## References
***

<a name="myfootnote1">1</a>: Stack Overflow - Purpose of “%matplotlib inline”, https://stackoverflow.com/questions/43027980/purpose-of-matplotlib-inline/43028034

<a name="myfootnote2">2</a>: The Python Graph Gallery - 104 Seaborn Themes, https://python-graph-gallery.com/104-seaborn-themes/

<a name="myfootnote3">3</a>: Seaborn - Choosing color palettes, https://seaborn.pydata.org/tutorial/color_palettes.html

<a name="myfootnote4">4</a>: Matplotlib - Customizing Matplotlib with style sheets and rcParams, https://matplotlib.org/stable/tutorials/introductory/customizing.html

<a name="myfootnote5">5</a>: Data Flair - SciPy Stats – Statistical Functions in SciPy, https://data-flair.training/blogs/scipy-statistical-functions/

<a name="myfootnote6">6</a>: Hussain Mujtaba - SciPy Tutorial for Beginners | Overview of SciPy library, https://www.mygreatlearning.com/blog/scipy-tutorial/

<a name="myfootnote7">7</a>: Steve Campbell - SciPy in Python Tutorial: What is | Library & Functions Examples, https://www.guru99.com/scipy-tutorial.html

<a name="myfootnote8">8</a>: SciPy - User Guide Version 1.7.1 - Statistical functions (scipy.stats), https://docs.scipy.org/doc/scipy/reference/stats.html

<a name="myfootnote9">9</a>: Javatpoint - SciPy Stats, https://www.javatpoint.com/scipy-stats

<a name="myfootnote10">10</a>: Studious Guy - 8 Real Life Examples Of Probability, https://studiousguy.com/8-real-life-examples-of-probability/

<a name="myfootnote11">11</a>: Astro Stats - The package scipy.stats, https://www.great-esf.eu/AstroStats13-Python/numpy/scipystats.html

<a name="myfootnote12">12</a>: Cornellius Yudha Wijaya - 3 Top Python Packages to Learn Statistic for Data Scientist, https://towardsdatascience.com/3-top-python-packages-to-learn-statistic-for-data-scientist-d753b76e6099

## Bibliography
***

Within the course of this assessment the following sources were also used for research purposes:
* Manish Pathak - Probability Distributions in Python Tutorial, https://www.datacamp.com/community/tutorials/probability-distributions-python
* Stanford University - Python for Probability, https://web.stanford.edu/class/archive/cs/cs109/cs109.1192/handouts/pythonForProbability.html
* Tutorials Point - SciPy - Stats, https://www.tutorialspoint.com/scipy/scipy_stats.htm