# Scipy-stats library
[Official documentation](https://docs.scipy.org/doc/scipy/reference/stats.html)

***
 

```python
# Efficient numerical arrays
import numpy as np

# Plotting
import matplotlib.pyplot as plt
```

# Part 1 - Overview of the SciPy-stats library


# SciPy library


SciPy stands for Scientific Python. It is a free, open-source Python library for scientific and technical computing. It is built on NumPy and provides a wide range of high-level commands for manipulating and visualizing data. SciPy is organised into roughly 16 subpackages (the exact count varies by version) covering optimization, linear algebra, integration, interpolation, special functions, FFT, signal and image processing, and other tasks common in science and engineering.

The SciPy library is distributed under the BSD license, and its development is sponsored and supported by an open community of developers. [1](#section)

## SciPy-stats


`scipy.stats` is a subpackage of the SciPy library. It contains a large number of probability distributions, summary and frequency statistics, correlation functions and statistical tests, masked statistics, kernel density estimation, quasi-Monte Carlo functionality, and more.
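To give a feel for these categories, here is a minimal sketch that touches three of them: summary statistics, a probability distribution, and a correlation function. The data are randomly generated for illustration only, and the variable names are made up.

```python
import numpy as np
from scipy import stats

# Illustrative data only: 200 draws from a normal distribution (mean 10, sd 2)
rng = np.random.default_rng(seed=0)
sample = rng.normal(loc=10, scale=2, size=200)

# Summary statistics in a single call
summary = stats.describe(sample)
print(f"n={summary.nobs}, mean={summary.mean:.2f}, variance={summary.variance:.2f}")

# Probability distribution: P(X <= 12) for a normal with mean 10 and sd 2
p_below_12 = stats.norm.cdf(12, loc=10, scale=2)
print(f"P(X <= 12) = {p_below_12:.4f}")

# Correlation function: Pearson correlation between the sample and a noisy copy
noisy_copy = sample + rng.normal(scale=1.0, size=sample.size)
r, p_value = stats.pearsonr(sample, noisy_copy)
print(f"Pearson r={r:.2f}, p={p_value:.3g}")
```

`stats.describe`, `stats.norm.cdf` and `stats.pearsonr` are all part of `scipy.stats`; everything else in the block is scaffolding for the example.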

## ANOVA test

### What is ANOVA?

ANOVA stands for Analysis of Variance. It’s a statistical test that was developed by Ronald Fisher in 1918 and has been in use ever since. ANOVA tells you if there are any statistical differences between the means of three or more independent groups.

One-way ANOVA is the most basic form. Related variations and follow-up procedures that can be used in different situations include:

- Two-way ANOVA
- Factorial ANOVA
- Welch’s F-test ANOVA
- Ranked ANOVA
- Games-Howell pairwise test
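Of the items in this list, a rank-based one-way test is available directly in SciPy as the Kruskal-Wallis H-test (`scipy.stats.kruskal`). The sketch below assumes that this is what "Ranked ANOVA" refers to, and runs it next to the ordinary one-way ANOVA (`scipy.stats.f_oneway`) on made-up, skewed data.

```python
import numpy as np
from scipy import stats

# Illustrative data only: three skewed groups drawn from exponential distributions
rng = np.random.default_rng(seed=1)
group_a = rng.exponential(scale=1.0, size=30)
group_b = rng.exponential(scale=1.0, size=30)
group_c = rng.exponential(scale=3.0, size=30)

# Classic one-way ANOVA (compares means, assumes normality within groups)
f_stat, f_p = stats.f_oneway(group_a, group_b, group_c)

# Kruskal-Wallis H-test (works on ranks, no normality assumption)
h_stat, h_p = stats.kruskal(group_a, group_b, group_c)

print(f"One-way ANOVA:  F={f_stat:.3f}, p={f_p:.4f}")
print(f"Kruskal-Wallis: H={h_stat:.3f}, p={h_p:.4f}")
```

The two tests answer slightly different questions, so their p-values need not agree. The rank-based version is usually preferred when the data are clearly non-normal.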

### How does ANOVA work?

Like the t-test, ANOVA helps you find out whether the differences between groups of data are statistically significant. It works by analysing the levels of variance within the groups through samples taken from each of them.

If there is a lot of variance (spread of data away from the mean) within a data group, then it is more likely that the mean of a sample drawn from that group will differ from the true group mean purely by chance.

As well as looking at variance within the data groups, ANOVA takes into account sample size (the larger the sample, the less chance there will be of picking outliers for the sample by chance) and the differences between sample means (if the means of the samples are far apart, it’s more likely that the means of the whole group will be too).
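The effect of within-group variance and sample size on sample means can be checked with a small simulation. This is a sketch under simple assumptions (normally distributed populations with mean 0); the helper `spread_of_sample_means` is a made-up name for this example.

```python
import numpy as np

rng = np.random.default_rng(seed=2)

def spread_of_sample_means(population_sd, sample_size, n_repeats=5000):
    """Standard deviation of the means of many samples drawn from N(0, population_sd)."""
    samples = rng.normal(loc=0.0, scale=population_sd, size=(n_repeats, sample_size))
    return samples.mean(axis=1).std(ddof=1)

# Compare a low-variance and a high-variance population at two sample sizes
for sd in (1.0, 5.0):
    for n in (5, 50):
        spread = spread_of_sample_means(sd, n)
        print(f"population sd={sd}, sample size={n}: sd of sample means = {spread:.3f}")
```

The spread of the sample means grows with the population's standard deviation and shrinks as the sample size increases (in theory it equals the population sd divided by the square root of the sample size). That is why ANOVA weighs both quantities when judging whether a difference between means is real.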

All these elements are combined into an F value, which is then converted into a p-value: the probability of seeing an F value at least this large if the group means were in fact all equal. A small p-value indicates that the differences between your groups are statistically significant.
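To make the F value concrete, the sketch below computes it by hand for a one-way design and checks the result against `scipy.stats.f_oneway`. The three groups are made-up numbers chosen for easy arithmetic, not real measurements.

```python
import numpy as np
from scipy import stats

# Illustrative data only: three groups of five observations each
groups = [
    np.array([23.0, 25.0, 21.0, 24.0, 22.0]),
    np.array([30.0, 28.0, 31.0, 29.0, 32.0]),
    np.array([26.0, 27.0, 25.0, 28.0, 24.0]),
]
k = len(groups)                          # number of groups
n_total = sum(len(g) for g in groups)    # total number of observations
grand_mean = np.concatenate(groups).mean()

# Variation of the group means around the grand mean (between groups)
ss_between = sum(len(g) * (g.mean() - grand_mean) ** 2 for g in groups)
# Variation of the observations around their own group mean (within groups)
ss_within = sum(((g - g.mean()) ** 2).sum() for g in groups)

# Mean squares: each sum of squares divided by its degrees of freedom
ms_between = ss_between / (k - 1)
ms_within = ss_within / (n_total - k)

# F is the ratio of between-group to within-group variance
f_manual = ms_between / ms_within
# p-value: upper tail of the F distribution with (k - 1, n_total - k) degrees of freedom
p_manual = stats.f.sf(f_manual, k - 1, n_total - k)

f_scipy, p_scipy = stats.f_oneway(*groups)
print(f"manual: F={f_manual:.4f}, p={p_manual:.6f}")
print(f"scipy:  F={f_scipy:.4f}, p={p_scipy:.6f}")
```

For these numbers the between-group mean square is about 61.67 and the within-group mean square is 2.5, giving F of about 24.67. The manual and SciPy results should agree to floating-point precision.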

A one-way ANOVA tests the effect of a single independent variable (a factor that influences other things) on one dependent variable. Two-way ANOVA does the same thing with two independent variables, while factorial ANOVA is the general term for designs with two or more independent variables.

### How can ANOVA help?

The one-way ANOVA can help you know whether or not there are significant differences between the means of the groups defined by your independent variable.

Why is that useful?

Because when you understand how each group’s mean is different from the others, you can begin to understand which independent variables have a connection to your dependent variable (such as landing page clicks) and begin to learn what is driving that behaviour.

You could also flip things around and ask whether a single independent variable (such as temperature) affects several dependent variables (such as purchase rates of suncream, attendance at outdoor venues, and likelihood to hold a cook-out) and, if so, which ones. A one-way ANOVA handles one dependent variable at a time, so this means running a separate test for each outcome.

### Examples of using ANOVA

<b>Do age, sex, or income have an effect on how much someone spends in your store per month?</b>

To answer this question, a factorial ANOVA can be used, since you have three independent variables and one dependent variable. You’ll need to collect data for different age groups (such as 0-20, 21-40, 41-70, 71+), different income brackets, and all relevant sexes. A three-way ANOVA can then simultaneously assess the effect of these variables on your dependent variable (spending) and determine whether they make a difference.

### Understanding ANOVA assumptions

Like other types of statistical tests, ANOVA compares the means of different groups and shows you if there are any statistical differences between the means. ANOVA is classified as an omnibus test statistic. This means that it can’t tell you which specific groups were statistically significantly different from each other, only that at least two of the groups were.

It’s important to remember that the main ANOVA research question is whether the sample means are from different populations. Assuming the observations are independent of one another, there are two assumptions upon which ANOVA rests:

1) Whatever the technique of data collection, the observations within each sampled population are normally distributed.

2) The sampled populations share a common variance, σ².
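Both assumptions can be checked before running the test. The sketch below uses the Shapiro-Wilk test (`scipy.stats.shapiro`) for normality within each group and Levene's test (`scipy.stats.levene`) for equal variances. The choice of these two tests is an assumption of this example, not something prescribed by ANOVA itself, and the data are simulated.

```python
import numpy as np
from scipy import stats

# Illustrative data only: three normal groups with different means and equal spread
rng = np.random.default_rng(seed=3)
groups = [rng.normal(loc=mu, scale=2.0, size=40) for mu in (10.0, 11.0, 12.0)]

# Assumption 1: normality within each group (null hypothesis: data are normal)
normality_p_values = []
for i, g in enumerate(groups, start=1):
    w_stat, p_norm = stats.shapiro(g)
    normality_p_values.append(p_norm)
    print(f"group {i}: Shapiro-Wilk W={w_stat:.3f}, p={p_norm:.3f}")

# Assumption 2: common variance across groups (null hypothesis: variances are equal)
lev_stat, p_var = stats.levene(*groups)
print(f"Levene statistic={lev_stat:.3f}, p={p_var:.3f}")
```

In both tests a small p-value is evidence against the assumption. If either check fails, a rank-based or Welch-type alternative may be a better fit than the classic one-way ANOVA.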

### Limitations and other considerations

While ANOVA will help you to analyse the difference in means between three or more groups, it won’t tell you which specific groups were different from each other. If your test returns a significant F-statistic (the value you get when you run an ANOVA test), you may need to run a post hoc test (like the Least Significant Difference test) to tell you exactly which groups had a difference in means. [2](#section)
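As a rough illustration of a follow-up analysis, the sketch below runs pairwise t-tests with a Bonferroni correction. This is a swapped-in technique rather than the Least Significant Difference test named above, and the group labels and values are made up. Recent SciPy versions (1.8 and later) also provide `scipy.stats.tukey_hsd` as a dedicated post hoc test.

```python
from itertools import combinations

import numpy as np
from scipy import stats

# Illustrative data only: three labelled groups of five observations each
groups = {
    "A": np.array([23.0, 25.0, 21.0, 24.0, 22.0]),
    "B": np.array([30.0, 28.0, 31.0, 29.0, 32.0]),
    "C": np.array([26.0, 27.0, 25.0, 28.0, 24.0]),
}
alpha = 0.05

# Every unordered pair of groups: (A, B), (A, C), (B, C)
pairs = list(combinations(groups, 2))

adjusted = {}
for name_1, name_2 in pairs:
    t_stat, p_raw = stats.ttest_ind(groups[name_1], groups[name_2])
    # Bonferroni: multiply each p-value by the number of comparisons, capped at 1
    p_adj = min(1.0, p_raw * len(pairs))
    adjusted[(name_1, name_2)] = p_adj
    verdict = "different" if p_adj < alpha else "not clearly different"
    print(f"{name_1} vs {name_2}: t={t_stat:.2f}, adjusted p={p_adj:.4f} ({verdict})")
```

The correction keeps the overall false-positive rate near `alpha` across all comparisons, at the cost of some statistical power.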

## References

[1. "Scipy" Wikipedia](https://en.wikipedia.org/wiki/SciPy "Press to check the reference source")

[2. "What is ANOVA (Analysis Of Variance) and what can I use it for?"](https://www.qualtrics.com/uk/experience-management/research/anova/ "Press to check the reference source")