## scipy.stats Package
- The `scipy.stats` package contains a large number of probability distributions and statistical functions.
- A fairly complete listing of these functions can be obtained with `help(scipy.stats)`:
```python
# Import the subpackage explicitly; older SciPy versions do not load it on `import scipy`
import scipy.stats
help(scipy.stats)
```

### Normal Continuous Random Variable
- A random variable $X$ that can take any value in a continuous range (an interval of the real line) is a continuous random variable.
- The `norm` object of `scipy.stats` inherits a collection of generic methods from the continuous random variable class, `rv_continuous`.
- Methods such as `norm.cdf`, `norm.ppf`, and `norm.rvs` compute properties of the distribution or draw samples from it. They take extra keyword arguments:
    - `loc` specifies the mean
    - `scale` specifies the standard deviation
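As a quick check of what `loc` and `scale` do, the sketch below compares `norm.cdf` called with these keywords against the standard normal CDF evaluated at the standardized value $z = (x - \mu)/\sigma$. The values of `mu`, `sigma`, and `x` are arbitrary choices for illustration.

```python
import numpy as np
from scipy.stats import norm

mu, sigma = 1.0, 5.0                 # arbitrary mean and standard deviation
x = np.array([-2.0, 0.0, 1.0, 6.0])  # arbitrary evaluation points

# CDF of N(mu, sigma^2), evaluated directly via loc/scale
direct = norm.cdf(x, loc=mu, scale=sigma)

# Same values from the standard normal N(0, 1) after standardizing x
z = (x - mu) / sigma
standardized = norm.cdf(z)

print(np.allclose(direct, standardized))  # True
# loc and scale are exactly the mean and standard deviation of the normal
print(norm.mean(loc=mu, scale=sigma), norm.std(loc=mu, scale=sigma))
```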

#### scipy.stats.norm.cdf function (To compute the CDF at a number of points)

```python
import scipy.stats
help(scipy.stats.norm.cdf)
```

```
Help on method cdf in module scipy.stats._distn_infrastructure:

cdf(x, *args, **kwds) method of scipy.stats._continuous_distns.norm_gen instance
    Cumulative distribution function of the given RV.
    
    Parameters
    ----------
    x : array_like
        quantiles
    arg1, arg2, arg3,... : array_like
        The shape parameter(s) for the distribution (see docstring of the
        instance object for more information)
    loc : array_like, optional
        location parameter (default=0)
    scale : array_like, optional
        scale parameter (default=1)
    
    Returns
    -------
    cdf : ndarray
        Cumulative distribution function evaluated at `x`
```

```python
from scipy.stats import norm
import numpy as np
data = [1, -1.0, 0, 1, 3, 4, -2, 6]
# CDF of a normal distribution with mean 1 and standard deviation 5
datacdf = norm.cdf(data, loc=1, scale=5)
print(datacdf)
```

```
[0.5        0.34457826 0.42074029 0.5        0.65542174 0.72574688
 0.27425312 0.84134475]
```


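#### scipy.stats.norm.ppf function (To compute quantiles, such as the median, of a distribution)
- Here ppf stands for Percent Point Function (PPF), which is the inverse of the CDF: given a probability $p$, it returns the value $x$ such that $P(X \le x) = p$. The median is therefore `ppf(0.5)`.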

```python
from scipy.stats import norm
import numpy as np
data = [0.5, 0.2, 0.01, 0.1, 0.3]
# Quantiles of the standard normal distribution for each probability
print(norm.ppf(data))
```

```
[ 0.         -0.84162123 -2.32634787 -1.28155157 -0.52440051]
```
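Because `ppf` inverts `cdf`, a round trip through both should recover the original probabilities. The sketch below checks this for a normal distribution with an arbitrarily chosen `loc=1`, `scale=5`, and confirms that the 0.5 quantile equals the mean for a normal distribution, using the generic `median` method inherited from `rv_continuous`.

```python
import numpy as np
from scipy.stats import norm

p = np.array([0.01, 0.1, 0.5, 0.9])  # arbitrary probabilities

# ppf inverts cdf: cdf(ppf(p)) recovers p
q = norm.ppf(p, loc=1, scale=5)
print(np.allclose(norm.cdf(q, loc=1, scale=5), p))  # True

# The median is the 0.5 quantile; for a normal it coincides with the mean (loc)
print(norm.ppf(0.5, loc=1, scale=5))
print(norm.median(loc=1, scale=5))
```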


#### scipy.stats.norm.rvs (To generate a sequence of random variates)
- The function takes the `size` argument, which sets the number (or shape) of samples drawn.

```python
from scipy.stats import norm, describe
# Draw 15 samples from the standard normal distribution
r = norm.rvs(size=15)
print(describe(r))
```

```
DescribeResult(nobs=15, minmax=(-1.1712033749823623, 2.0520775523852195), mean=0.31610978068839957, variance=0.8095486610169318, skewness=0.047865217457338606, kurtosis=-0.819901110907912)
```
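The output above changes on every run because no seed is fixed. A minimal sketch, assuming reproducibility is wanted: passing an integer to the `random_state` keyword of `rvs` makes the draws repeatable, and with a large sample the sample mean and standard deviation land close to `loc` and `scale`. The seed `42`, the sample size, and `loc=5`, `scale=10` are arbitrary choices.

```python
import numpy as np
from scipy.stats import norm, describe

# The same seed passed via random_state gives identical draws
a = norm.rvs(loc=5, scale=10, size=10_000, random_state=42)
b = norm.rvs(loc=5, scale=10, size=10_000, random_state=42)
print(np.array_equal(a, b))  # True

# With many samples, the sample mean and std approach loc and scale
res = describe(a)
print(res.mean, np.sqrt(res.variance))
```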

### Uniform Distribution
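- A uniform distribution is available as the `scipy.stats.uniform` object.
- For `uniform`, the keyword `loc` specifies the lower bound and `scale` specifies the width, so the distribution is uniform on $[\text{loc}, \text{loc} + \text{scale}]$. In the example below this interval is $[1, 5]$.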

```python
from scipy.stats import uniform
import numpy as np
data = np.arange(6)
# CDF of the uniform distribution on [1, 5]
ucdfdata = uniform.cdf(data, loc=1, scale=4)
print(ucdfdata)
```

```
[0.   0.   0.25 0.5  0.75 1.  ]
```
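To make the roles of `loc` and `scale` concrete, the sketch below queries the same uniform distribution on $[1, 5]$ for its mean (the midpoint, $\text{loc} + \text{scale}/2$), its variance ($\text{scale}^2/12$), and the quantiles at 0, 0.5, and 1, which should be the lower bound, the midpoint, and the upper bound.

```python
import numpy as np
from scipy.stats import uniform

loc, scale = 1, 4   # support is [loc, loc + scale] = [1, 5]

print(uniform.mean(loc=loc, scale=scale))  # midpoint: loc + scale / 2
print(uniform.var(loc=loc, scale=scale))   # scale**2 / 12
# Quantiles at 0, 0.5, 1: lower bound, midpoint, upper bound
print(uniform.ppf([0.0, 0.5, 1.0], loc=loc, scale=scale))
```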


### Discrete Distribution (Binomial Distribution)
- The `binom` object inherits the collection of generic methods available in the `rv_discrete` class, which are needed to work with discrete distributions.

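```python
from scipy.stats import binom
import numpy as np
data = np.arange(6)
# n=2 trials with success probability p=1: X is always 2,
# so the CDF jumps from 0 to 1 at k=2
bincdf = binom.cdf(data, n=2, p=1)
print(bincdf)
```

```
[0. 0. 1. 1. 1. 1.]
```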
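The example above uses $p = 1$, which makes the outcome certain. A non-degenerate case is easier to interpret; the sketch below assumes two fair coin flips ($n = 2$, $p = 0.5$, chosen for illustration) and shows that the CDF of a discrete distribution is the running sum of its probability mass function (`pmf`).

```python
import numpy as np
from scipy.stats import binom

n, p = 2, 0.5           # two fair coin flips (illustrative choice)
k = np.arange(n + 1)    # possible numbers of successes: 0, 1, 2

pmf = binom.pmf(k, n, p)   # P(X = k); mathematically 0.25, 0.5, 0.25
cdf = binom.cdf(k, n, p)   # P(X <= k); mathematically 0.25, 0.75, 1.0
print(pmf)
print(cdf)

# The CDF of a discrete distribution is the cumulative sum of its PMF
print(np.allclose(np.cumsum(pmf), cdf))  # True
```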


### Descriptive Statistics
|Function|Description|
|--------|:-----------|
|`describe()`|Computes several descriptive statistics of the passed array|
|`gmean()`|Computes the geometric mean along the specified axis|
|`hmean()`|Computes the harmonic mean along the specified axis|
|`kurtosis()`|Computes the kurtosis (tailedness of the distribution)|
|`mode()`|Returns the modal (most common) value|
|`skew()`|Computes the skewness of the data|
|`f_oneway()`|Performs a one-way ANOVA|
|`iqr()`|Computes the interquartile range of the data along the specified axis|
|`zscore()`|Computes the $z$-score of each value in the sample, relative to the sample mean and standard deviation|

```python
from scipy import stats
import numpy as np
a = np.array([1, 2, 3, 4, 5, 6, 7, 8, 9])
print(stats.describe(a))
print("gmean = ", stats.gmean(a))
print("hmean = ", stats.hmean(a))
print("zscore = ", stats.zscore(a))
```

```
DescribeResult(nobs=9, minmax=(1, 9), mean=5.0, variance=7.5, skewness=0.0, kurtosis=-1.2300000000000002)
gmean =  4.147166274396913
hmean =  3.181371861411138
zscore =  [-1.54919334 -1.161895   -0.77459667 -0.38729833  0.          0.38729833
  0.77459667  1.161895    1.54919334]
```
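The sketch below recomputes a few entries from the table by hand on the same data, as a way of showing what each function calculates: the geometric mean as the exponential of the mean log, the harmonic mean as $n$ over the sum of reciprocals, and the $z$-score using the population standard deviation (`ddof=0`, the `zscore` default). It also exercises `iqr` and `f_oneway`, which the cell above does not; the three ANOVA groups are made-up data chosen so the $F$ statistic is easy to check ($MS_\text{between} = 27$, $MS_\text{within} = 1$).

```python
import numpy as np
from scipy import stats

a = np.arange(1, 10)   # same data as above: 1, 2, ..., 9

# Geometric mean: exp of the mean of the logs
print(np.isclose(stats.gmean(a), np.exp(np.mean(np.log(a)))))       # True
# Harmonic mean: n divided by the sum of reciprocals
print(np.isclose(stats.hmean(a), len(a) / np.sum(1.0 / a)))         # True
# z-score: deviation from the mean over the population std (ddof=0)
print(np.allclose(stats.zscore(a), (a - a.mean()) / a.std(ddof=0)))  # True
# Interquartile range: 75th percentile minus 25th percentile (7 - 3)
print(stats.iqr(a))

# One-way ANOVA on three small, made-up groups
f_stat, p_val = stats.f_oneway([1, 2, 3], [4, 5, 6], [7, 8, 9])
# F = MS_between / MS_within = 27 / 1
print(f_stat, p_val)
```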


### T-test
- `stats.ttest_1samp` computes the t-test for the mean of ONE group of scores.
- `stats.ttest_ind` computes the t-test for the means of two independent samples of scores.

```python
# Comparing mean of one sample
from scipy import stats
# 50 x 2 array: the test is applied to each column separately
rvs = stats.norm.rvs(loc=5, scale=10, size=(50, 2))
print(stats.ttest_1samp(rvs, 5.0))
```

```
Ttest_1sampResult(statistic=array([1.09940764, 0.38901711]), pvalue=array([0.27696267, 0.69894893]))
```
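To show what `ttest_1samp` computes, the sketch below rebuilds the statistic $t = (\bar{x} - \mu_0)/(s/\sqrt{n})$, where $s$ is the sample standard deviation (`ddof=1`), and the two-sided p-value from a $t$ distribution with $n - 1$ degrees of freedom. The seeded NumPy generator and the sample parameters are assumptions chosen only to make the example reproducible.

```python
import numpy as np
from scipy import stats

rng = np.random.default_rng(0)            # seeded for reproducibility
x = rng.normal(loc=5, scale=10, size=50)  # illustrative sample
mu0 = 5.0                                 # hypothesized population mean

res = stats.ttest_1samp(x, mu0)

# t = (sample mean - mu0) / (s / sqrt(n)), with s the sample std (ddof=1)
n = len(x)
t_manual = (x.mean() - mu0) / (x.std(ddof=1) / np.sqrt(n))
# Two-sided p-value from the t distribution with n - 1 degrees of freedom
p_manual = 2 * stats.t.sf(abs(t_manual), df=n - 1)

print(np.isclose(res.statistic, t_manual), np.isclose(res.pvalue, p_manual))  # True True
```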


```python
# Comparing means of two samples
from scipy import stats
rvs1 = stats.norm.rvs(loc=5, scale=10, size=500)
rvs2 = stats.norm.rvs(loc=5, scale=10, size=500)
print(stats.ttest_ind(rvs1, rvs2))
```

```
Ttest_indResult(statistic=0.8410148006091283, pvalue=0.40054109864195864)
```
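By default `ttest_ind` assumes equal population variances and uses a pooled variance estimate; passing `equal_var=False` switches to Welch's t-test. The sketch below recomputes both statistics by hand to show the difference. The seeded generator and the unequal sample sizes (500 and 300) are illustrative assumptions.

```python
import numpy as np
from scipy import stats

rng = np.random.default_rng(1)                # seeded for reproducibility
x1 = rng.normal(loc=5, scale=10, size=500)
x2 = rng.normal(loc=5, scale=10, size=300)
n1, n2 = len(x1), len(x2)
v1, v2 = x1.var(ddof=1), x2.var(ddof=1)       # sample variances

# Default (equal_var=True): Student's t-test with pooled variance
sp2 = ((n1 - 1) * v1 + (n2 - 1) * v2) / (n1 + n2 - 2)
t_pooled = (x1.mean() - x2.mean()) / np.sqrt(sp2 * (1 / n1 + 1 / n2))
print(np.isclose(stats.ttest_ind(x1, x2).statistic, t_pooled))  # True

# equal_var=False: Welch's t-test, which does not assume equal variances
t_welch = (x1.mean() - x2.mean()) / np.sqrt(v1 / n1 + v2 / n2)
print(np.isclose(stats.ttest_ind(x1, x2, equal_var=False).statistic, t_welch))  # True
```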
