# <div style="padding:20px;color:white;margin:0;font-size:35px;font-family:Georgia;text-align:left;display:fill;border-radius:5px;background-color:#254E58;overflow:hidden"><b> DAY-57: SciPy Statistical Significance Tests</b></div>

### **What Is a Statistical Significance Test?**
* In statistics, a result is statistically significant when it is unlikely to have been produced by random chance alone, which suggests there is a real cause behind it.

* SciPy provides us with a module called scipy.stats, which has functions for performing statistical significance tests.

**Here are some techniques and keywords that are important when performing such tests:**

**Hypothesis in Statistics**
* A hypothesis is an assumption about a parameter of a population.

**Null Hypothesis**
* It assumes that the observation is not statistically significant, i.e. that it could have arisen by chance.

**Alternate Hypothesis**
* It assumes that the observations are due to some reason other than chance.

* It is the alternative to the null hypothesis.

**Example:**

**For an assessment of a student we would take:**

* "student is worse than average" - as a null hypothesis, and:

* "student is better than average" - as an alternate hypothesis.

**One-tailed test**
* When our hypothesis tests for one side of the value only, it is called a "one-tailed test".

* For the null hypothesis "the mean is equal to k", we can have the alternate hypothesis:

* "the mean is less than k", or:

* "the mean is greater than k"

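**Two-tailed test**
* When our hypothesis tests for both sides of the value, it is called a "two-tailed test". For the null hypothesis "the mean is equal to k", the alternate hypothesis is "the mean is not equal to k".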
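
The one-tailed and two-tailed alternatives described above can be sketched with a one-sample t-test. This is a minimal illustration, not part of the original notebook: the seed, the sample, and the value of `k` are assumptions chosen for the example, and the `alternative` parameter of `ttest_1samp` requires a reasonably recent SciPy (1.6 or later).

```python
import numpy as np
from scipy.stats import ttest_1samp

rng = np.random.default_rng(0)   # fixed seed, assumed only for reproducibility
k = 0.0                          # hypothesised mean (illustrative value)
sample = rng.normal(loc=0.5, scale=1.0, size=100)

# The null hypothesis in all three cases is "the mean is equal to k".
# Only the alternate hypothesis changes.
p_less = ttest_1samp(sample, k, alternative='less').pvalue          # mean < k
p_greater = ttest_1samp(sample, k, alternative='greater').pvalue    # mean > k
p_two_sided = ttest_1samp(sample, k, alternative='two-sided').pvalue  # mean != k

print(p_less, p_greater, p_two_sided)
```

For a t-test the two one-tailed p-values add up to 1, and the two-tailed p-value is twice the smaller of them, so the choice of alternative directly changes how strong the evidence looks.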

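**T-Test**
* A t-test checks whether the means of two samples differ significantly. `ttest_ind()` takes two independent samples and returns the t-statistic and the p-value.

In [1]: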
import numpy as np
from scipy.stats import ttest_ind

v1 = np.random.normal(size=100)
v2 = np.random.normal(size=100)

res = ttest_ind(v1, v2)

print(res)

TtestResult(statistic=0.3373896731753233, pvalue=0.7361805616749528, df=198.0)


In [2]:
...
res = ttest_ind(v1, v2).pvalue

print(res)

0.7361805616749528
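
A p-value is usually read against a significance threshold. The sketch below assumes the conventional threshold of 0.05 (often called alpha), which is not stated in the cells above; `is_significant` is a hypothetical helper written for this example, not a SciPy function, and the shifted sample `b` is made up to show a case where the null hypothesis is rejected.

```python
import numpy as np
from scipy.stats import ttest_ind

def is_significant(pvalue, alpha=0.05):
    """Hypothetical helper: True when the p-value is below the threshold."""
    return pvalue < alpha

rng = np.random.default_rng(1)        # fixed seed, assumed for reproducibility
a = rng.normal(loc=0.0, size=100)
b = rng.normal(loc=1.0, size=100)     # same spread, mean shifted by 1

p = ttest_ind(a, b).pvalue
print(p, is_significant(p))
```

By the same rule, the p-value of about 0.736 printed above is far larger than 0.05, so that test gives no evidence that `v1` and `v2` have different means.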


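**KS-Test**
* The KS (Kolmogorov-Smirnov) test is used to check whether given values follow a distribution. `kstest()` takes the values to be tested and the reference distribution, given here by name as `'norm'`.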

In [3]:
import numpy as np
from scipy.stats import kstest

v = np.random.normal(size=100)

res = kstest(v, 'norm')

print(res)

KstestResult(statistic=0.08043255473445021, pvalue=0.5113170451555196, statistic_location=0.6115053018546265, statistic_sign=1)
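
For contrast, here is a hedged sketch of the KS test on data that does not follow the reference distribution. The uniform sample and the seed are assumptions made for this illustration; `'uniform'` refers to SciPy's standard uniform distribution on the interval from 0 to 1.

```python
import numpy as np
from scipy.stats import kstest

rng = np.random.default_rng(2)   # fixed seed, assumed for reproducibility
u = rng.uniform(size=200)        # values in [0, 1), not normally distributed

res_norm = kstest(u, 'norm')     # compare against the standard normal
res_unif = kstest(u, 'uniform')  # compare against the distribution it came from

print(res_norm.statistic, res_norm.pvalue)
print(res_unif.statistic, res_unif.pvalue)
```

The statistic is the largest gap between the empirical and the reference cumulative distribution. Against `'norm'` that gap is large and the p-value is tiny, so the hypothesis that the values are standard normal is rejected.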


### **Statistical Description of Data**
* In order to see a summary of values in an array, we can use the describe() function.

**It returns the following description:**

1. number of observations (`nobs`)

2. minimum and maximum values (`minmax`)

3. mean

4. variance

5. skewness

6. kurtosis

**Show statistical description of the values in an array:**

In [4]:
import numpy as np
from scipy.stats import describe

v = np.random.normal(size=100)
res = describe(v)

print(res)

DescribeResult(nobs=100, minmax=(-2.8041320994556895, 2.645938174103387), mean=0.0850426688123569, variance=1.1454754552488895, skewness=0.20337191922330827, kurtosis=-0.015942526664888046)
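
The fields of the result can also be read individually by name. The following is a small sketch under the assumption of a fixed seed; it also checks that, with its default settings, `describe()` reports the sample variance computed with `ddof=1`.

```python
import numpy as np
from scipy.stats import describe

rng = np.random.default_rng(3)   # fixed seed, assumed for reproducibility
v = rng.normal(size=100)
res = describe(v)

# Each entry of the result is available as an attribute.
print(res.nobs, res.minmax, res.mean, res.variance)

# The variance matches NumPy's when NumPy is told to use ddof=1.
print(np.isclose(res.variance, np.var(v, ddof=1)))
```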


**Find skewness and kurtosis of values in an array:**

In [5]:
import numpy as np
from scipy.stats import skew, kurtosis

v = np.random.normal(size=100)

print(skew(v))
print(kurtosis(v))

0.2424943993725415
-0.044529862247124186
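
Values close to zero, as above, are what we expect for normally distributed data, because `kurtosis()` returns excess kurtosis (Fisher's definition) by default. The sketch below uses an exponential sample, chosen here only as an example of skewed data, and the `fisher=False` option to get Pearson's definition instead.

```python
import numpy as np
from scipy.stats import skew, kurtosis

rng = np.random.default_rng(4)     # fixed seed, assumed for reproducibility
x = rng.exponential(size=1000)     # right-skewed sample

excess = kurtosis(x)                 # Fisher: a normal distribution gives 0
pearson = kurtosis(x, fisher=False)  # Pearson: a normal distribution gives 3

print(skew(x))           # positive for right-skewed data
print(excess, pearson)   # the two definitions differ by exactly 3
```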
