# Example: Location Estimates of Population and Murder Rates

The data set contains the population and murder rate (in murders per 100,000 people per year) for each US state (2010
Census).

In [12]:
import pandas as pd
import numpy as np
from scipy.stats import trim_mean
# !pip install wquantiles
import wquantiles


In [2]:
# Load the data
state = pd.read_csv('data/state.csv')

In [5]:
state.head()

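|   | State | Population | Murder.Rate | Abbreviation |
|---|-------|------------|-------------|--------------|
| 0 | Alabama | 4779736 | 5.7 | AL |
| 1 | Alaska | 710231 | 5.6 | AK |
| 2 | Arizona | 6392017 | 4.7 | AZ |
| 3 | Arkansas | 2915918 | 5.6 | AR |
| 4 | California | 37253956 | 4.4 | CA |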


In [8]:
state.info()

<class 'pandas.core.frame.DataFrame'>
RangeIndex: 50 entries, 0 to 49
Data columns (total 4 columns):
 #   Column        Non-Null Count  Dtype  
---  ------        --------------  -----  
 0   State         50 non-null     object 
 1   Population    50 non-null     int64  
 2   Murder.Rate   50 non-null     float64
 3   Abbreviation  50 non-null     object 
dtypes: float64(1), int64(1), object(2)
memory usage: 1.7+ KB


## <u>Population</u>

#### (a) To compute the mean and median in Python, we can use the pandas methods of the DataFrame

In [3]:
# compute the mean
state['Population'].mean()

6162876.3

In [4]:
# compute the median
state['Population'].median()

4436369.5

#### (b) The trimmed mean requires the trim_mean function in scipy.stats

In [7]:
# compute trimmed mean
trim_mean(state['Population'], 0.1)

4783697.125

**<u>Note</u>**:<br>
    mean > trimmed mean > median<br>
    The trimmed mean excludes the five largest and five smallest states (a trim proportion of 0.1 drops 10% of the 50 states from each end), so it is pulled up less by the most populous states than the mean is.
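To make the trimming step concrete, here is a minimal sketch in plain Python of what a trimmed mean does: sort, drop the same number of values from each end, and average the rest. The `values` list and the helper `manual_trim_mean` are hypothetical and only for illustration; they are not the state data and not SciPy's implementation.

```python
from statistics import mean, median

# Hypothetical values (not the state data): two large values pull the mean up
values = [1, 2, 3, 4, 5, 6, 7, 8, 50, 100]

def manual_trim_mean(data, proportiontocut):
    """Sort, drop the same number of values from each end, average the rest."""
    ordered = sorted(data)
    k = int(proportiontocut * len(ordered))  # values dropped from each end
    return mean(ordered[k:len(ordered) - k])

plain_mean = mean(values)                 # uses all ten values
trimmed = manual_trim_mean(values, 0.1)   # drops the smallest and the largest value
middle = median(values)

print(plain_mean, trimmed, middle)
```

With these made-up numbers the ordering is the same as for the state populations: mean > trimmed mean > median.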

## <u>Murder rate</u>

To compute the average murder rate for the country, we need to use a weighted mean or weighted median to account for the different populations of the states.

#### (a) The weighted mean is available in numpy

In [10]:
# compute weighted mean
np.average(state['Murder.Rate'], weights=state['Population'])

4.445833981123393
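A weighted mean is the sum of each value times its weight, divided by the sum of the weights. The sketch below spells this out in plain Python on three hypothetical regions (made-up rates and populations, not the state data); the helper name `weighted_mean` is also just for illustration.

```python
# Hypothetical regions (not the state data): rate per 100,000 and population
rates = [2.0, 4.0, 10.0]
populations = [8_000_000, 1_000_000, 1_000_000]

def weighted_mean(x, w):
    """Sum of value * weight, divided by the sum of the weights."""
    return sum(xi * wi for xi, wi in zip(x, w)) / sum(w)

unweighted = sum(rates) / len(rates)          # treats every region equally
weighted = weighted_mean(rates, populations)  # large regions count for more

print(unweighted, weighted)
```

Here the weighted mean sits close to the rate of the most populous region, which is the behavior we want when estimating a rate for the whole country.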

#### (b) For the weighted median, we use the specialized package wquantiles

In [13]:
# compute the weighted median
wquantiles.median(state['Murder.Rate'], weights=state['Population'])

4.4

**<u>Note</u>**:<br>
    In this case, the weighted mean and the weighted median are about the same.
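One common way to define a weighted median is: sort the values, accumulate their weights, and take the first value at which the running weight reaches half of the total. The sketch below implements that definition in plain Python on hypothetical data. It is an assumption-laden illustration, not the wquantiles implementation, which may interpolate between values and so may return slightly different numbers.

```python
# Hypothetical regions (not the state data): rate per 100,000 and population
rates = [2.0, 4.0, 10.0]
populations = [8_000_000, 1_000_000, 1_000_000]

def simple_weighted_median(x, w):
    """First sorted value whose cumulative weight reaches half the total weight."""
    half = sum(w) / 2
    running = 0
    for value, weight in sorted(zip(x, w)):
        running += weight
        if running >= half:
            return value

# With equal weights this falls back to the ordinary middle value
equal_weights = simple_weighted_median(rates, [1, 1, 1])
by_population = simple_weighted_median(rates, populations)

print(equal_weights, by_population)
```

With equal weights the result is the middle rate; with population weights it moves to the rate of the region that holds most of the people.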

---
---
<h1><center>Key Ideas</center></h1>

*  The basic metric for location is the mean, but it can be sensitive to extreme values (outliers).
*  Other metrics (median, trimmed mean) are less sensitive to outliers and unusual distributions and hence are more robust.
---
---
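As a small illustration of the key ideas above, the following sketch replaces one value in a hypothetical list with an extreme value and compares how much the mean and the median move. The numbers are made up for this example.

```python
from statistics import mean, median

# Hypothetical data: the second list differs only in its last value
clean = [10, 11, 12, 13, 14]
with_outlier = [10, 11, 12, 13, 1000]

mean_shift = mean(with_outlier) - mean(clean)        # how far the mean moves
median_shift = median(with_outlier) - median(clean)  # how far the median moves

print(mean_shift, median_shift)
```

A single extreme value moves the mean a long way while the median stays where it was, which is what "robust" means here.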