# Estimates of Location 

### Example: Location Estimates of Population and Murder Rates

In [2]:
from statsmodels import robust

In [3]:
import pandas as pd
state = pd.read_csv("Dataset/state.csv")
state.head(8)

Unnamed: 0,State,Population,Murder.Rate,Abbreviation
0,Alabama,4779736,5.7,AL
1,Alaska,710231,5.6,AK
2,Arizona,6392017,4.7,AZ
3,Arkansas,2915918,5.6,AR
4,California,37253956,4.4,CA
5,Colorado,5029196,2.8,CO
6,Connecticut,3574097,2.4,CT
7,Delaware,897934,5.8,DE


Compute the mean, trimmed mean, and median for Population. For `mean` and `median` we can use the _pandas_ methods of the data frame. The trimmed mean requires the `trim_mean` function in _scipy.stats_.

In [4]:
Mean_Population = state['Population'].mean()
print("Mean Population:", Mean_Population)

Mean Population: 6162876.3


In [5]:
state['Population'].shape

(50,)

In [6]:
from scipy.stats import trim_mean
Trimmed_Mean_Population = trim_mean(state['Population'], proportiontocut=0.1)
print("Trimmed Mean Population:", Trimmed_Mean_Population)

# proportiontocut=0.1 removes 10% of the observations from each end of the sorted data.
# The data set has n = 50 states (see the shape above), so the number dropped from each end is
# p = n * proportiontocut = 50 * 0.1 = 5.
# The trimmed mean is therefore the average of the remaining 40 states.

Trimmed Mean Population: 4783697.125
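To make the trimming step concrete, here is a minimal pure-Python sketch of a trimmed mean. The helper name `trimmed_mean` is hypothetical, and the sample values are only the eight populations shown in the `state.head(8)` output above, not the full data set, so the numbers differ from the notebook's results. It assumes the same convention as `trim_mean`, dropping `int(n * proportiontocut)` observations from each end.

```python
def trimmed_mean(values, proportiontocut):
    """Mean of the values left after dropping a fraction from each end."""
    ordered = sorted(values)
    n = len(ordered)
    k = int(n * proportiontocut)  # observations dropped from each end
    if 2 * k >= n:
        raise ValueError("proportiontocut removes every observation")
    kept = ordered[k:n - k]
    return sum(kept) / len(kept)


# Populations of the first eight states listed above (illustration only).
populations = [4779736, 710231, 6392017, 2915918,
               37253956, 5029196, 3574097, 897934]

plain_mean = sum(populations) / len(populations)
trimmed = trimmed_mean(populations, 0.125)  # drops one state from each end

print("Mean:", plain_mean)
print("Trimmed mean:", trimmed)
```

In this small sample the trim removes California and Alaska, and the average drops sharply once the largest state is excluded. The same effect appears, less dramatically, in the full 50-state data.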


In [7]:
Median_Population = state['Population'].median()
print("Median Population:", Median_Population)

Median Population: 4436369.5


In [8]:
print("Mean Population:", Mean_Population)
print("Trimmed Mean Population:", Trimmed_Mean_Population)
print("Median Population:", Median_Population)

Mean Population: 6162876.3
Trimmed Mean Population: 4783697.125
Median Population: 4436369.5


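The mean is larger than the trimmed mean, which is larger than the median. The trimmed mean is lower than the mean because it excludes the five largest and five smallest states (`proportiontocut=0.1` drops 10% from each end), which reduces the influence of the most populous states. The median is lower still because it depends only on the middle of the sorted data.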

If we want to compute the average murder rate for the country, we need to use a weighted mean or median to account for the different populations of the states.

In [9]:
import wquantiles
import numpy as np

# Weighted median
MurderRate_Country_WeightMedian = wquantiles.median(state['Murder.Rate'], weights=state['Population'])

# Weighted mean: states have different populations,
# so larger states should contribute more to the national average.
MurderRate_Country_Mean = np.average(state['Murder.Rate'], weights=state['Population'])


print(MurderRate_Country_WeightMedian)
print(MurderRate_Country_Mean)
# In this case, the weighted mean and the weighted median are about the same.

4.4
4.445833981123393


The weighted mean and the weighted median are nearly equal here, which suggests that the population-weighted distribution of state murder rates is roughly symmetric rather than heavily skewed.
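The weighting itself can be sketched in plain Python. The helpers `weighted_mean` and `weighted_median` below are hypothetical names, and the weighted median uses a simplified definition (the smallest value at which the cumulative weight reaches half of the total). `wquantiles` may interpolate between values, so its results can differ from this sketch. The data are only the first eight states from the table above.

```python
def weighted_mean(values, weights):
    """Sum of value * weight divided by the total weight."""
    return sum(v * w for v, w in zip(values, weights)) / sum(weights)


def weighted_median(values, weights):
    """Smallest value at which the cumulative weight reaches half the total."""
    half = sum(weights) / 2
    cumulative = 0
    for value, weight in sorted(zip(values, weights)):
        cumulative += weight
        if cumulative >= half:
            return value
    raise ValueError("values and weights must be non-empty")


# First eight states from the table above (illustration only).
murder_rate = [5.7, 5.6, 4.7, 5.6, 4.4, 2.8, 2.4, 5.8]
population = [4779736, 710231, 6392017, 2915918,
              37253956, 5029196, 3574097, 897934]

print("Weighted mean:", weighted_mean(murder_rate, population))
print("Weighted median:", weighted_median(murder_rate, population))
```

In this eight-state sample California holds more than half of the total weight, so the weighted median lands on its murder rate.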

# Estimates of Variability

### Example: Variability Estimates of State Population

In [10]:
state.head(8)

Unnamed: 0,State,Population,Murder.Rate,Abbreviation
0,Alabama,4779736,5.7,AL
1,Alaska,710231,5.6,AK
2,Arizona,6392017,4.7,AZ
3,Arkansas,2915918,5.6,AR
4,California,37253956,4.4,CA
5,Colorado,5029196,2.8,CO
6,Connecticut,3574097,2.4,CT
7,Delaware,897934,5.8,DE


In [11]:
print(f"Standard Deviation of Population: {state['Population'].std()}")

Standard Deviation of Population: 6848235.347401142
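By default, the pandas `std()` method computes the sample standard deviation (`ddof=1`, dividing by n - 1 rather than n). A minimal sketch of that calculation follows; the helper `std_dev` is a hypothetical name and the numbers are made up for illustration, not taken from the state data.

```python
import math


def std_dev(values, ddof=1):
    """Standard deviation: ddof=1 is the sample version, ddof=0 the population version."""
    n = len(values)
    mean = sum(values) / n
    squared_deviations = sum((x - mean) ** 2 for x in values)
    return math.sqrt(squared_deviations / (n - ddof))


data = [2, 4, 4, 4, 5, 5, 7, 9]  # made-up values for illustration

print("Sample standard deviation (ddof=1):", std_dev(data))
print("Population standard deviation (ddof=0):", std_dev(data, ddof=0))
```

The sample version is always slightly larger than the population version, and the gap shrinks as n grows.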


The interquartile range (IQR) is the difference between the 75th and 25th percentiles (the 0.75 and 0.25 quantiles).

In [12]:
print(f"IQR : {state['Population'].quantile(0.75) - state['Population'].quantile(0.25)}")

IQR : 4847308.0
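As a rough cross-check without pandas, the standard library's `statistics.quantiles` can compute the quartiles. The sketch assumes that `method="inclusive"` applies the same linear interpolation that pandas uses by default, which is worth verifying on your own data. It uses only the eight populations shown in the table above, so the result differs from the full-data IQR.

```python
import statistics

# Populations of the first eight states listed above (illustration only).
populations = [4779736, 710231, 6392017, 2915918,
               37253956, 5029196, 3574097, 897934]

# n=4 returns the three cut points that split the data into quartiles.
q1, q2, q3 = statistics.quantiles(populations, n=4, method="inclusive")
iqr = q3 - q1

print("Q1:", q1)
print("Q3:", q3)
print("IQR:", iqr)
```

Because the IQR ignores everything outside the middle half of the data, California's very large population has no direct effect on it.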


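The median absolute deviation from the median (MAD) can be calculated with the `robust.scale.mad` function in *statsmodels*. By default it divides the raw MAD by a constant of about 0.6745, so that the result is comparable to the standard deviation when the data are normally distributed. Method 2 reproduces this scaling by hand.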

In [14]:
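# Method 1: statsmodels, which applies the normal scaling constant by default
print(f"Method 1 of Median Absolute Deviation (MAD) : {robust.scale.mad(state['Population'])}")

# Method 2: manual calculation, dividing the raw MAD by the same constant
print(f"Method 2 of Median Absolute Deviation (MAD) : {abs(state['Population']-state['Population'].median()).median() / 0.6744897501960817}")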

Method 1 of Median Absolute Deviation (MAD) : 3849876.1459979336
Method 2 of Median Absolute Deviation (MAD) : 3849876.1459979336
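The constant 0.6744897501960817 used in Method 2 is the 0.75 quantile of the standard normal distribution. The sketch below derives it with the standard library and separates the raw MAD from the scaled one. The helper names `raw_mad` and `scaled_mad` are hypothetical, and the data are only the eight populations from the table above.

```python
from statistics import NormalDist, median

# 0.75 quantile of the standard normal distribution, about 0.6745.
NORMAL_CONSISTENCY = NormalDist().inv_cdf(0.75)


def raw_mad(values):
    """Median of the absolute deviations from the median."""
    center = median(values)
    return median([abs(x - center) for x in values])


def scaled_mad(values):
    """Raw MAD divided by the normal consistency constant."""
    return raw_mad(values) / NORMAL_CONSISTENCY


# Populations of the first eight states listed above (illustration only).
populations = [4779736, 710231, 6392017, 2915918,
               37253956, 5029196, 3574097, 897934]

print("Scaling constant:", NORMAL_CONSISTENCY)
print("Raw MAD:", raw_mad(populations))
print("Scaled MAD:", scaled_mad(populations))
```

Dividing by a constant smaller than 1 makes the scaled MAD larger than the raw MAD, which puts it on the same scale as the standard deviation for roughly normal data.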


# Exploring the Data Distribution 

## Percentiles and Boxplots 