## Estimates of variability

Variability, also called dispersion, is another key feature of a data set: it tells us whether the values are tightly clustered or spread out. The most common metrics for measuring it are listed below:



1. **Deviation** - The difference between an observed value and the estimate of location (for example, the mean or the median).
2. **Variance** - The sum of squared deviations from the mean divided by n - 1, where n is the number of data values.
3. **Standard deviation** - The square root of the variance.
4. **Mean absolute deviation** - The mean of the absolute values of the deviations from the mean.
5. **Median absolute deviation from the median** - The median of the absolute values of the deviations from the median.
6. **Range** - The difference between the largest and the smallest value in a data set.
7. **Order statistics** - Metrics based on the data values sorted from smallest to largest.
8. **Percentile** - The value such that P percent of the values take on this value or less and (100 - P) percent take on this value or more.
9. **Interquartile range** - The difference between the 75th percentile and the 25th percentile.
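
To make the definitions above concrete, here is a minimal sketch that computes each metric by hand using only the Python standard library. The sample `values` and the helper `percentile` are hypothetical and exist only for illustration. The helper uses linear interpolation between order statistics, which is one of several percentile conventions (and the default in pandas' `quantile`).

```python
import math
import statistics

# Hypothetical sample, chosen only to keep the arithmetic easy to follow.
values = [1, 4, 4, 7, 9]
n = len(values)

mean = sum(values) / n
median = statistics.median(values)

# Deviations from the estimate of location (here, the mean).
deviations = [x - mean for x in values]

# Variance: sum of squared deviations from the mean divided by n - 1.
variance = sum(d ** 2 for d in deviations) / (n - 1)

# Standard deviation: square root of the variance.
std_dev = math.sqrt(variance)

# Mean absolute deviation: mean of the absolute deviations from the mean.
mean_abs_dev = sum(abs(d) for d in deviations) / n

# Median absolute deviation from the median.
median_abs_dev = statistics.median([abs(x - median) for x in values])

# Order statistics: the data sorted from smallest to largest.
ordered = sorted(values)

# Range: largest value minus smallest value.
data_range = ordered[-1] - ordered[0]


def percentile(sorted_values, p):
    """Hypothetical helper: percentile p (0 to 100) by linear interpolation."""
    position = (len(sorted_values) - 1) * p / 100
    lower = int(position)
    upper = min(lower + 1, len(sorted_values) - 1)
    fraction = position - lower
    return sorted_values[lower] + fraction * (sorted_values[upper] - sorted_values[lower])


# Interquartile range: 75th percentile minus 25th percentile.
iqr = percentile(ordered, 75) - percentile(ordered, 25)

print("variance:", variance)
print("standard deviation:", std_dev)
print("mean absolute deviation:", mean_abs_dev)
print("median absolute deviation:", median_abs_dev)
print("range:", data_range)
print("IQR:", iqr)
```

Writing the formulas out this way is slower than calling a library, but it shows exactly which estimate of location and which divisor each metric uses.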



### Coding the variability estimates

In [5]:
import pandas as pd
import numpy as np
from statsmodels import robust



In [6]:
data = pd.read_csv("https://github.com/gedeck/practical-statistics-for-data-scientists/blob/master/data/state.csv?raw=true")
data.head()

Unnamed: 0,State,Population,Murder.Rate,Abbreviation
0,Alabama,4779736,5.7,AL
1,Alaska,710231,5.6,AK
2,Arizona,6392017,4.7,AZ
3,Arkansas,2915918,5.6,AR
4,California,37253956,4.4,CA


In [7]:
# Standard deviation of the Population column (pandas uses the sample version, dividing by n - 1)
print("The standard deviation for Population column is:", data['Population'].std())
# Variance of the Population column (sample version, dividing by n - 1)
print("The variance for Population column is:", data['Population'].var())
# Interquartile range (IQR): 75th percentile minus 25th percentile
print("The IQR for Population column is:", data['Population'].quantile(0.75) - data['Population'].quantile(0.25))
# Median absolute deviation (MAD); statsmodels rescales it by a normal-consistency constant by default
print("The MAD for Population column is:", robust.scale.mad(data['Population']))

The standard deviation for Population column is: 6848235.347401142
The variance for Population column is: 46898327373394.445
The IQR for Population column is: 4847308.0
The MAD for Population column is: 3849876.1459979336
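
Note that the MAD printed above is not the raw median of the absolute deviations: by default, `robust.scale.mad` divides that median by the 0.75 quantile of the standard normal distribution (about 0.6745), so that the result is comparable to a standard deviation for normally distributed data. The following is a minimal sketch of that scaling, using only the standard library and a hypothetical five-value sample rather than the state data:

```python
import statistics

# Hypothetical sample, used only to illustrate the scaling step.
values = [1, 4, 4, 7, 9]

median = statistics.median(values)

# Raw MAD: median of the absolute deviations from the median.
raw_mad = statistics.median([abs(x - median) for x in values])

# 0.75 quantile of the standard normal distribution (about 0.6745).
normal_q75 = statistics.NormalDist().inv_cdf(0.75)

# Scaled MAD: raw MAD divided by the normal-consistency constant.
scaled_mad = raw_mad / normal_q75

print("raw MAD:", raw_mad)
print("scaled MAD:", scaled_mad)
```

If the unscaled value is wanted from statsmodels, passing `c=1` to `robust.scale.mad` should return it, since `c` is the normalization constant the function divides by.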


In [10]:
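# Range of the Population column: largest value minus smallest value
# (named population_range to avoid shadowing Python's built-in range)
population_range = data['Population'].max() - data['Population'].min()
print(population_range)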

36690330
