#Estimates of location

###Variables with measured or count data might have thousands of distinct values. A basic step in exploring your data is getting a “typical value” for each feature (variable): an estimate of where most of the data is located (i.e., its central tendency, such as the middle value or the most frequently occurring value).


###The estimates we can calculate, along with two related terms (robust and outlier), are listed below:



*   Mean - The sum of all values divided by the number of values.
*   Weighted mean - The sum of each value multiplied by its weight, divided by the sum of the weights.
*   Median - The value such that one-half of the sorted data lies above it and one-half lies below it.
*   Percentile - The value such that P percent of the data lies below it.
*   Weighted median - The value such that, in the sorted data, one-half of the sum of the weights lies above it and one-half lies below it.
*   Trimmed mean - The average of all values after dropping a fixed number of extreme values from each end of the sorted data.
*   Robust - Not sensitive to extreme values.
*   Outlier - A data value that is very different from most of the data.
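
To make the definitions above concrete, here is a minimal sketch on a made-up five-value list containing one outlier. The helper names `trimmed_mean` and `weighted_mean` and the toy numbers are assumptions for illustration only; they are not part of the dataset or the libraries used later in this notebook.

```python
from statistics import mean, median


def trimmed_mean(values, k):
    """Mean after dropping the k smallest and k largest values (illustrative helper)."""
    ordered = sorted(values)
    if 2 * k >= len(ordered):
        raise ValueError("k is too large: no values would remain")
    return mean(ordered[k:len(ordered) - k])


def weighted_mean(values, weights):
    """Sum of value * weight, divided by the sum of the weights (illustrative helper)."""
    return sum(v * w for v, w in zip(values, weights)) / sum(weights)


# Toy data (hypothetical): four small values and one outlier.
values = [1, 2, 3, 4, 100]

print("mean:", mean(values))                     # pulled upward by the outlier
print("median:", median(values))                 # robust: the middle value
print("trimmed mean:", trimmed_mean(values, 1))  # drops 1 and 100 first
print("weighted mean:", weighted_mean([1, 2, 3], [1, 1, 2]))
```

The mean is dragged toward the outlier, while the median and the trimmed mean stay near the bulk of the data, which is what "robust" means in the list above.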



In [1]:
!pip install wquantiles

Collecting wquantiles
  Downloading wquantiles-0.6-py3-none-any.whl (3.3 kB)
Installing collected packages: wquantiles
Successfully installed wquantiles-0.6


In [2]:
import pandas as pd
import numpy as np
import seaborn as sns
import math as m
from scipy.stats import trim_mean
from statsmodels import robust
import wquantiles
import matplotlib.pyplot as plt




In [3]:
#load the dataset
data = pd.read_csv("https://github.com/gedeck/practical-statistics-for-data-scientists/blob/master/data/state.csv?raw=true")
data.head()

Unnamed: 0,State,Population,Murder.Rate,Abbreviation
0,Alabama,4779736,5.7,AL
1,Alaska,710231,5.6,AK
2,Arizona,6392017,4.7,AZ
3,Arkansas,2915918,5.6,AR
4,California,37253956,4.4,CA


In [4]:
#calculate the mean of the population
print("The mean for Population column is:", data['Population'].mean())
#calculate the median of the population
print("The median for Population column is:", data['Population'].median())
#calculate the trimmed mean of the population after dropping the top and bottom 10% of the sorted values
print("The trimmed mean for Population column is:", trim_mean(data['Population'], 0.1))
#calculate the trimmed median the same way; symmetric trimming leaves the median unchanged
k = int(0.1 * len(data))
print("The trimmed median for Population column is:", data['Population'].sort_values().iloc[k:-k].median())
#calculate the mean of the Murder.Rate
print("The mean for Murder.Rate column is:", data['Murder.Rate'].mean())

The mean for Population column is: 6162876.3
The median for Population column is: 4436369.5
The trimmed mean for Population column is: 4783697.125
The trimmed median for Population column is: 4436369.5
The mean for Murder.Rate column is: 4.066


In [5]:
#calculate the mean murder rate, weighting each state by its population
print(np.average(data['Murder.Rate'], weights=data['Population']))
#calculate the median murder rate, weighting each state by its population
print(wquantiles.median(data['Murder.Rate'], weights=data['Population']))

4.445833981123393
4.4
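
The weighted median can also be worked out by hand, which helps show what the definition means. The sketch below is a simplified version under one stated assumption: it returns the first sorted value whose cumulative weight reaches half of the total weight. The function name `weighted_median` and the toy rates and populations are hypothetical, and `wquantiles` may use a different rule (for example, interpolation), so results are not guaranteed to match it.

```python
def weighted_median(values, weights):
    """First sorted value whose cumulative weight reaches half the total weight.

    A simplified, illustrative definition; library implementations may interpolate.
    """
    if len(values) != len(weights) or len(values) == 0:
        raise ValueError("values and weights must be non-empty and the same length")
    half = sum(weights) / 2
    cumulative = 0
    # Walk through the values in ascending order, accumulating their weights.
    for value, weight in sorted(zip(values, weights)):
        cumulative += weight
        if cumulative >= half:
            return value


# Toy data (hypothetical): three rates with very unequal population weights.
rates = [2.0, 4.0, 6.0]
populations = [1, 1, 5]

print(weighted_median(rates, populations))  # the heavily weighted value dominates
print(weighted_median(rates, [1, 1, 1]))    # equal weights give the ordinary median
```

With equal weights the result is the ordinary median; with unequal weights, the value carrying most of the weight wins, which is why the weighted figures for Murder.Rate differ from the unweighted ones.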
