<h1> <u> STATISTICS - CHAPTER-1 - DEMO-2 </u> </h1>
<h2> <u> Measures of Dispersion </u> </h2>

This notebook illustrates how to compute the following measures:
1. Range
2. Inter Quartile Range
3. Standard Deviation
4. Variance
5. Coefficient of Variation

<h6> Importing Libraries </h6>

In [1]:
import pandas as pd
import scipy.stats as st
import numpy as np

<h6> Reading the Dataframe (CSV file) </h6>

In [2]:
df = pd.read_csv("cars.csv")
print(df.dtypes)

model     object
mpg      float64
cyl        int64
disp     float64
hp         int64
drat     float64
wt       float64
qsec     float64
vs         int64
am         int64
gear       int64
carb       int64
dtype: object


<h3> 1. Range </h3>

$$ Range  = x_{max} - x_{min}$$

Range can be calculated using the _Numpy_ Library (the `np.ptp` function, short for "peak to peak")
<br>
- Input: Multidimensional Data/Single Dimensional Data
- Output: Numpy N-dimensional Array/ Numpy Float or Integer Value

In [3]:
df1 = df.drop(["model"], axis=1)
r = np.ptp(df1.values, axis=0)
print(r)

# Printing Data-Types
print("\n")
print("np.ptp returns an object of type:", type(r))

[ 23.5     4.    400.9   283.      2.17    3.911   8.4     1.      1.
   2.      7.   ]


np.ptp returns an object of type: <class 'numpy.ndarray'>
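
As a quick check of the range formula, the sketch below compares `np.ptp` with an explicit `max - min` computation. The small array is made-up toy data, not taken from `cars.csv`.

```python
import numpy as np

# Hypothetical toy data: rows are observations, columns are variables.
data = np.array([
    [21.0, 6, 160.0],
    [22.8, 4, 108.0],
    [18.7, 8, 360.0],
    [14.3, 8, 360.0],
])

# Range by definition: column-wise maximum minus column-wise minimum.
manual_range = data.max(axis=0) - data.min(axis=0)

# np.ptp ("peak to peak") computes the same quantity in one call.
ptp_range = np.ptp(data, axis=0)

print(manual_range)
print(ptp_range)
```

Both lines should print the same three values, one per column.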


<h3> 2. Inter Quartile Range </h3>

$$ IQR = Q_3 - Q_1 $$

where $x_k$ denotes the $k$-th smallest observation and $n$ is the number of observations.

If $n$ is odd:

$Q_3 = x_{\frac{3}{4}(n+1)}$

$Q_1 = x_{\frac{1}{4}(n+1)}$
<hr>
If $n$ is even:

$Q_3 = \frac{x_{\frac{3}{4}n} +  x_{\frac{3}{4}n+1}}{2}$

$Q_1 = \frac{x_{\frac{1}{4}n} +  x_{\frac{1}{4}n+1}}{2}$
<br>
<br>
Inter Quartile Range can be calculated using the _Scipy_ Library
<br>
- Input: Multidimensional Data/Single Dimensional Data
- Output: Numpy N-dimensional Array/ Numpy Float or Integer Value

In [4]:
iqr1 = st.iqr(df1[["mpg", "cyl", "disp", "hp", "drat", "wt", "qsec", "vs", "am", "gear", "carb"]], axis=0)
iqr2 = st.iqr(df1["mpg"], axis=0)

# Printing Values
print(iqr1)
print("\n")
print(iqr2)


# Printing Data Types
print("\n")
print("\n")
print("Scipy Library Returns an object of type:", type(iqr1))
print("\n")
print("Scipy Library Returns an object of type:", type(iqr2))

[  7.375     4.      205.175    83.5       0.84      1.02875   2.0075
   1.        1.        1.        2.     ]


7.375




Scipy Library Returns an object of type: <class 'numpy.ndarray'>


Scipy Library Returns an object of type: <class 'numpy.float64'>
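
Quartile conventions differ between textbooks and libraries, so the formulas above need not reproduce the library output exactly. By default, `st.iqr` interpolates linearly between neighbouring data points, as `np.percentile` does. The sketch below uses a made-up sample with $n = 8$ (chosen so that $n/4$ is a whole number) to compare the even-$n$ formula with linear interpolation.

```python
import numpy as np

# Hypothetical toy sample with n = 8 (even, and divisible by 4).
x = np.sort(np.array([2, 4, 4, 5, 7, 9, 10, 12], dtype=float))
n = len(x)

# Textbook rule for even n, using 1-based positions as in the formulas above.
k1 = n // 4        # position n/4
k3 = 3 * n // 4    # position 3n/4
q1_textbook = (x[k1 - 1] + x[k1]) / 2   # (x_{n/4} + x_{n/4+1}) / 2
q3_textbook = (x[k3 - 1] + x[k3]) / 2   # (x_{3n/4} + x_{3n/4+1}) / 2
iqr_textbook = q3_textbook - q1_textbook

# Linear interpolation between order statistics (NumPy's default method).
q1_interp, q3_interp = np.percentile(x, [25, 75])
iqr_interp = q3_interp - q1_interp

print(iqr_textbook)  # 5.5
print(iqr_interp)    # 5.25
```

The two results are close but not identical, which is why hand-computed quartiles can differ slightly from the values printed by the library.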


<h3> 3. Standard Deviation </h3>

$$ \sigma = \sqrt{\frac{1}{N}\sum_{i=1}^{N}\left({x_i - \mu}\right)^2}$$
<br>
$$ S = \sqrt{\frac{1}{n-1}\sum_{i=1}^{n}\left({x_i - \bar{x}}\right)^2}$$

where:
<br>
$\sigma = \mbox{Population Standard Deviation}$
<br>
$S = \mbox{Sample Standard Deviation}$
<br>
$N = \mbox{Population Size}$
<br>
$n = \mbox{Sample Size}$
<br>
$\mu = \mbox{Population Mean}$
<br>
$\bar{x} = \mbox{Sample Mean}$


Standard deviation can be calculated using the _Numpy_ Library

- Input: Multidimensional Data/Single Dimensional data
- Output: Numpy N-dimensional Array/ Numpy Float Value (a pandas Series when the input is a pandas DataFrame)
<br>
<br>

<div class="alert alert-block alert-info">
    
<b> Note </b>: In order to calculate the <b><i>Sample Standard Deviation</i></b>, we will use the <b><i>ddof</i></b> option of the <b><i>std</i></b> function. Setting <i><b>ddof = 1</b></i> ensures that the formula for the sample standard deviation is used.
    
</div>

In [5]:
sd_population = np.std(df1[["mpg", "cyl", "disp", "hp", "drat", "wt", "qsec", "vs", "am", "gear", "carb"]], axis=0)
sd_sample = np.std(df1[["mpg", "cyl", "disp", "hp", "drat", "wt", "qsec", "vs", "am", "gear", "carb"]], axis=0, ddof=1)

# Printing Values
print(sd_population)
print("\n")
print(sd_sample)

# Printing Data Types
print("\n")
print("\n")
print("np.std returns an object of type:", type(sd_population))
print("\n")
print("np.std returns an object of type:", type(sd_sample))

mpg       5.932030
cyl       1.757795
disp    121.986781
hp       67.483071
drat      0.526258
wt        0.963048
qsec      1.758801
vs        0.496078
am        0.491132
gear      0.726184
carb      1.589762
dtype: float64


mpg       6.026948
cyl       1.785922
disp    123.938694
hp       68.562868
drat      0.534679
wt        0.978457
qsec      1.786943
vs        0.504016
am        0.498991
gear      0.737804
carb      1.615200
dtype: float64




np.std returns an object of type: <class 'pandas.core.series.Series'>


np.std returns an object of type: <class 'pandas.core.series.Series'>
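
To see what `ddof` changes, the sketch below evaluates both standard deviation formulas by hand and compares them with `np.std`. The eight values are a made-up toy sample, chosen so that the population standard deviation is exactly 2.

```python
import numpy as np

# Hypothetical toy sample: mean is 5, squared deviations sum to 32.
x = np.array([2, 4, 4, 4, 5, 5, 7, 9], dtype=float)
n = len(x)
squared_deviations = (x - x.mean()) ** 2

# Population formula divides by N; sample formula divides by n - 1.
sd_population_manual = np.sqrt(squared_deviations.sum() / n)
sd_sample_manual = np.sqrt(squared_deviations.sum() / (n - 1))

print(sd_population_manual, np.std(x))          # ddof defaults to 0
print(sd_sample_manual, np.std(x, ddof=1))      # ddof=1 gives the sample value
```

Each printed pair should agree. Note that `np.std` defaults to `ddof=0`, whereas pandas' own `Series.std()` and `DataFrame.std()` methods default to `ddof=1`.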


<h3> 4. Variance </h3>

$$ S^2 = \frac{1}{n-1}\sum_{i=1}^{n}\left({x_i - \bar{x}}\right)^2$$
$$ \sigma^2 = \frac{1}{N}\sum_{i=1}^{N}\left({x_i - \mu}\right)^2$$

where:
<br>
$S^2 = \mbox{Sample Variance}$
<br>
$\sigma^2 = \mbox{Population Variance}$
<br>
$n = \mbox{Sample Size}$
<br>
$N = \mbox{Population Size}$
<br>
$\bar{x} = \mbox{Sample Mean}$
<br>
$\mu = \mbox{Population Mean}$



Variance can be calculated using the _Numpy_ Library

- Input: Multidimensional Data/Single Dimensional data
- Output: Numpy N-dimensional Array/ Numpy Float or Integer Value

<div class="alert alert-block alert-info">
    <b> Note </b>: In order to calculate the <b><i>Sample Variance</i></b>, we will use the <b><i>ddof</i></b> option of the <b><i>var</i></b> function. Setting <i><b>ddof = 1</b></i> ensures that the formula for the <b><i>Sample Variance</i></b> is used.
    
</div>
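
A minimal sketch of the `ddof` option of `np.var`, using a made-up toy sample rather than the `cars.csv` data. It also checks that the variance is the square of the corresponding standard deviation.

```python
import numpy as np

# Hypothetical toy sample: mean is 5, squared deviations sum to 32.
x = np.array([2, 4, 4, 4, 5, 5, 7, 9], dtype=float)

var_population = np.var(x)        # divides by N (ddof defaults to 0)
var_sample = np.var(x, ddof=1)    # divides by n - 1

print(var_population)
print(var_sample)

# The variance is the square of the matching standard deviation.
print(np.isclose(var_sample, np.std(x, ddof=1) ** 2))
```

For this sample the population variance is 32/8 = 4 and the sample variance is 32/7, so the sample value is slightly larger.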