## Application: 2000/2010 Political Campaign Contributions by Race

Using ethnicolr, we answer three basic questions:
<ol>
<li>What proportion of contributions (by count) were made by blacks, whites, Hispanics, and Asians?</li>
<li>What proportion of unique contributors were blacks, whites, Hispanics, and Asians?</li>
<li>What proportion of the total amount donated was given by blacks, whites, Hispanics, and Asians?</li>
</ol>

In [1]:
import pandas as pd

In [2]:
df = pd.read_csv('/opt/names/fec_contrib/contribDB_2000.csv', nrows=100)
df.columns

Index([u'cycle', u'transaction_id', u'transaction_type', u'amount', u'date',
       u'bonica_cid', u'contributor_name', u'contributor_lname',
       u'contributor_fname', u'contributor_mname', u'contributor_suffix',
       u'contributor_title', u'contributor_ffname', u'contributor_type',
       u'contributor_gender', u'contributor_address', u'contributor_city',
       u'contributor_state', u'contributor_zipcode', u'contributor_occupation',
       u'contributor_employer', u'contributor_category',
       u'contributor_category_order', u'is_corp', u'organization_name',
       u'parent_organization_name', u'recipient_name', u'bonica_rid',
       u'recipient_party', u'recipient_type', u'recipient_state',
       u'recipient_category', u'recipient_category_order',
       u'recipient_district', u'seat', u'election_type',
       u'contributor_cfscore', u'candidate_cfscore', u'latitude', u'longitude',
       u'gis_confidence', u'contributor_district_90s',
       u'contributor_district_00s', u'co

In [3]:
from ethnicolr import pred_fl_reg_name, pred_fl_reg_ln

Using TensorFlow backend.


**Load and Subset on Individual Contributors**

In [4]:
df = pd.read_csv('/opt/names/fec_contrib/contribDB_2000.csv', usecols=['amount', 'contributor_type', 'contributor_lname', 'contributor_fname', 'contributor_name'])
sdf = df[df.contributor_type=='I'].copy()
sdf.fillna('', inplace=True)
rdf2000 = pred_fl_reg_name(sdf, 'contributor_lname', 'contributor_fname')
rdf2000['year'] = 2000

df = pd.read_csv('/opt/names/fec_contrib/contribDB_2010.csv.zip', usecols=['amount', 'contributor_type', 'contributor_lname', 'contributor_fname', 'contributor_name'])
sdf = df[df.contributor_type=='I'].copy()
sdf.fillna('', inplace=True)
rdf2010 = pred_fl_reg_name(sdf, 'contributor_lname', 'contributor_fname')
rdf2010['year'] = 2010

rdf = pd.concat([rdf2000, rdf2010])
rdf.head(20)



| | amount | contributor_name | contributor_lname | contributor_fname | contributor_type | race | asian | hispanic | nh_black | nh_white | year |
|---|---|---|---|---|---|---|---|---|---|---|---|
| 102912 | 180.0 | JOHNSON, KENYIE PAUL | johnson | kenyie | I | nh_black | 0.000481 | 0.002378 | 0.894507 | 0.102633 | 2000 |
| 103990 | 743.0 | KIRSCH, STEVEN T | kirsch | steven | I | nh_white | 0.001272 | 0.005285 | 0.001623 | 0.99182 | 2000 |
| 105298 | 180.0 | MCCOY, TIMOTHY D | mccoy | timothy | I | nh_white | 0.001921 | 0.005301 | 0.147964 | 0.844815 | 2000 |
| 105344 | 188.0 | WILLIAMS, VICTOR K | williams | victor | I | nh_white | 0.004435 | 0.026017 | 0.449958 | 0.51959 | 2000 |
| 105577 | 211.0 | ELDER, CHESTER H | elder | chester | I | nh_white | 0.003519 | 0.011062 | 0.074804 | 0.910614 | 2000 |
| 105659 | 13000.0 | MACARTHUR, GREG | macarthur | greg | I | nh_white | 0.006452 | 0.017204 | 0.05499 | 0.921354 | 2000 |
| 105665 | 13972.0 | ABELE, CHRIS | abele | chris | I | nh_white | 0.004977 | 0.019195 | 0.092955 | 0.882874 | 2000 |
| 105829 | 15000.0 | PRICE, SOL | price | sol | I | nh_white | 0.034847 | 0.024663 | 0.114083 | 0.826407 | 2000 |
| 106029 | 13600.0 | KIRSCH, STEVEN T | kirsch | steven | I | nh_white | 0.001272 | 0.005285 | 0.001623 | 0.99182 | 2000 |
| 106150 | 22146.0 | KIRSCH, STEVEN T | kirsch | steven | I | nh_white | 0.001272 | 0.005285 | 0.001623 | 0.99182 | 2000 |
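
The two load-and-predict passes above differ only in the file path and the year. A minimal sketch of factoring them into one function follows. The names `load_individuals`, `NAME_COLS`, and `fake_predictor` are hypothetical, and the stand-in predictor exists only so the sketch runs without ethnicolr; in the notebook one would pass `pred_fl_reg_name` instead, which is called above with the same three positional arguments.

```python
import io

import pandas as pd

COLS = ["amount", "contributor_type", "contributor_lname",
        "contributor_fname", "contributor_name"]
NAME_COLS = ["contributor_lname", "contributor_fname", "contributor_name"]


def load_individuals(source, year, predictor):
    """Read one cycle, keep individual contributors, attach predictions."""
    df = pd.read_csv(source, usecols=COLS)
    sdf = df[df.contributor_type == "I"].copy()
    # Fill missing names with empty strings so the predictor sees text only.
    sdf[NAME_COLS] = sdf[NAME_COLS].fillna("")
    out = predictor(sdf, "contributor_lname", "contributor_fname")
    out["year"] = year
    return out


def fake_predictor(df, lname_col, fname_col):
    """Hypothetical stand-in for pred_fl_reg_name; adds a dummy label."""
    out = df.copy()
    out["race"] = "unknown"
    return out


# Tiny in-memory CSV with one individual ("I") and one non-individual row.
csv_text = (
    "amount,contributor_type,contributor_lname,contributor_fname,contributor_name\n"
    '180.0,I,johnson,kenyie,"JOHNSON, KENYIE PAUL"\n'
    "500.0,C,,,ACME PAC\n"
)
demo = load_individuals(io.StringIO(csv_text), 2000, fake_predictor)
print(demo)
```

With a helper like this, the combined frame would be `pd.concat` over one call per cycle, which keeps the column list and the filtering rule in a single place.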


### What proportion of contributions were by blacks, whites, Hispanics, and Asians?

In [5]:
adf = rdf.groupby(['year', 'race']).agg({'contributor_lname': 'count'})
adf.unstack().apply(lambda r: r / r.sum(), axis=1).style.format("{:.2%}")

| year | asian | hispanic | nh_black | nh_white |
|---|---|---|---|---|
| 2000 | 1.25% | 1.89% | 3.00% | 93.86% |
| 2010 | 1.61% | 2.77% | 2.93% | 92.68% |
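
To make the row-wise normalization above concrete, here is a small sketch on made-up data (the `toy` frame is an assumption, not FEC data). It counts rows per year and label, divides each year's counts by that year's total, and checks that `pd.crosstab` with `normalize="index"` gives the same shares.

```python
import pandas as pd

# Toy stand-in for rdf: one row per contribution with a predicted label.
toy = pd.DataFrame({
    "year": [2000, 2000, 2000, 2000, 2010, 2010],
    "race": ["nh_white", "nh_white", "nh_white", "nh_black",
             "nh_white", "asian"],
})

# Count rows per (year, race), then divide each year's row by its total.
counts = toy.groupby(["year", "race"]).size().unstack(fill_value=0)
shares = counts.div(counts.sum(axis=1), axis=0)

# pd.crosstab with normalize="index" computes the same row-wise shares.
shares_ct = pd.crosstab(toy["year"], toy["race"], normalize="index")

print(shares.round(2))
```

Either form works; the crosstab version is shorter, while the groupby version extends more naturally to sums over another column.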


### What proportion of the donors were blacks, whites, Hispanics, and Asians?

In [14]:
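# Keep the first row for each contributor name. Deduplication runs across both years,
# so a name that appears in 2000 and 2010 is kept only under 2000.
udf = rdf.drop_duplicates(subset=['contributor_name']).copy()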
gdf = udf.groupby(['year', 'race']).agg({'contributor_name': 'count'})
gdf.unstack().apply(lambda r: r / r.sum(), axis=1).style.format("{:.2%}")

| year | asian | hispanic | nh_black | nh_white |
|---|---|---|---|---|
| 2000 | 1.59% | 2.31% | 3.46% | 92.63% |
| 2010 | 2.40% | 3.44% | 3.93% | 90.23% |
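
The cell above deduplicates on `contributor_name` alone, across the combined 2000 and 2010 frame. The sketch below, on made-up names, shows how that differs from deduplicating within each year. It is an illustration of the two options, not a claim about which one the analysis intends.

```python
import pandas as pd

# Toy data: the same donor gives twice in 2000 and once in 2010.
toy = pd.DataFrame({
    "year": [2000, 2000, 2010, 2010],
    "contributor_name": ["DOE, JANE", "DOE, JANE", "DOE, JANE", "ROE, RICHARD"],
    "race": ["nh_white", "nh_white", "nh_white", "hispanic"],
})

# Name only: the first occurrence across both years is kept.
by_name = toy.drop_duplicates(subset=["contributor_name"])

# Year and name: one row per donor within each year.
by_year_name = toy.drop_duplicates(subset=["year", "contributor_name"])

donors_by_name = by_name.groupby("year").size()
donors_by_year_name = by_year_name.groupby("year").size()

print(donors_by_name)
print(donors_by_year_name)
```

With name-only deduplication, a repeat donor is missing from the later year's denominator, which can shift that year's shares.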


### What proportion of the total donation was given by blacks, whites, Hispanics, and Asians?

In [7]:
bdf = rdf.groupby(['year', 'race']).agg({'amount': 'sum'})
bdf.unstack().apply(lambda r: r / r.sum(), axis=1).style.format("{:.2%}")

| year | asian | hispanic | nh_black | nh_white |
|---|---|---|---|---|
| 2000 | 1.38% | 1.71% | 2.54% | 94.37% |
| 2010 | 1.89% | 2.00% | 2.14% | 93.97% |


### What if we use the predicted race probabilities rather than the labels?

#### What proportion of contributions were by blacks, whites, Hispanics, and Asians?

In [12]:
rdf['white_count'] = rdf.nh_white
rdf['black_count'] = rdf.nh_black
rdf['asian_count'] = rdf.asian
rdf['hispanic_count'] = rdf.hispanic
gdf = rdf.groupby(['year']).agg({'white_count': 'sum', 'black_count': 'sum', 'asian_count': 'sum', 'hispanic_count': 'sum'})
gdf.apply(lambda r: r / r.sum(), axis=1).style.format("{:.2%}")

| year | black_count | hispanic_count | white_count | asian_count |
|---|---|---|---|---|
| 2000 | 9.40% | 3.38% | 85.34% | 1.89% |
| 2010 | 8.99% | 4.11% | 84.74% | 2.16% |
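
The gap between the label-based and probability-based shares can be seen on four of the rows printed earlier. The sketch below assumes the `race` label is the highest-probability category, which is consistent with those rows. Hard labels give each row a full count for one category, while summed probabilities spread each row's mass across all four.

```python
import pandas as pd

# Probabilities copied from four rows of the rdf.head() output above.
probs = pd.DataFrame(
    {
        "asian":    [0.000481, 0.001272, 0.001921, 0.004435],
        "hispanic": [0.002378, 0.005285, 0.005301, 0.026017],
        "nh_black": [0.894507, 0.001623, 0.147964, 0.449958],
        "nh_white": [0.102633, 0.991820, 0.844815, 0.519590],
    },
    index=["johnson", "kirsch", "mccoy", "williams"],
)

# Hard labels: each row counts fully toward its most probable category.
labels = probs.idxmax(axis=1)
label_share = labels.value_counts(normalize=True)

# Soft counts: each row contributes its probability to every category.
soft = probs.sum()
soft_share = soft / soft.sum()

print(label_share)
print(soft_share.round(4))
```

A name like Williams is labeled `nh_white` but carries nearly half its probability on `nh_black`, so the soft count for `nh_black` ends up higher than the label count. The same mechanism, applied over many rows, is one plausible reason the probability-based black share is larger than the label-based one.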


#### What proportion of the donors were blacks, whites, Hispanics, and Asians?

In [15]:
udf['white_count'] = udf.nh_white
udf['black_count'] = udf.nh_black
udf['asian_count'] = udf.asian
udf['hispanic_count'] = udf.hispanic
gdf = udf.groupby(['year']).agg({'white_count': 'sum', 'black_count': 'sum', 'asian_count': 'sum', 'hispanic_count': 'sum'})
gdf.apply(lambda r: r / r.sum(), axis=1).style.format("{:.2%}")

| year | black_count | hispanic_count | white_count | asian_count |
|---|---|---|---|---|
| 2000 | 9.58% | 3.90% | 84.23% | 2.29% |
| 2010 | 9.45% | 5.04% | 82.45% | 3.05% |


#### What proportion of the total donation was given by blacks, whites, Hispanics, and Asians?

In [8]:
rdf['white_amount'] = rdf.amount * rdf.nh_white
rdf['black_amount'] = rdf.amount * rdf.nh_black
rdf['api_amount'] = rdf.amount * rdf.asian
rdf['hispanic_amount'] = rdf.amount * rdf.hispanic
# Note: 10e6 equals 1e7, so the sums below are scaled to tens of millions.
gdf = rdf.groupby(['year']).agg({'white_amount': 'sum', 'black_amount': 'sum', 'api_amount': 'sum', 'hispanic_amount': 'sum'}) / 10e6
gdf.style.format("{:0.2f}")

| year | api_amount | white_amount | black_amount | hispanic_amount |
|---|---|---|---|---|
| 2000 | 3.6 | 154.43 | 16.07 | 5.82 |
| 2010 | 10.32 | 390.74 | 35.77 | 14.96 |


In [9]:
gdf.apply(lambda r: r / r.sum(), axis=1).style.format("{:.2%}")

| year | api_amount | white_amount | black_amount | hispanic_amount |
|---|---|---|---|---|
| 2000 | 2.00% | 85.84% | 8.93% | 3.23% |
| 2010 | 2.28% | 86.49% | 7.92% | 3.31% |
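
As a worked check of the probability-weighted amounts, the sketch below applies the same idea to two rows from the `rdf.head()` output: each contribution is split across the four categories in proportion to its predicted probabilities, then summed by year. The `race_cols` list and the `toy` frame are illustrative names, and the vectorized `mul` is an alternative to creating one column per category.

```python
import pandas as pd

# Two contributions taken from the rdf.head() output above.
toy = pd.DataFrame({
    "year": [2000, 2000],
    "amount": [180.0, 188.0],
    "asian": [0.000481, 0.004435],
    "hispanic": [0.002378, 0.026017],
    "nh_black": [0.894507, 0.449958],
    "nh_white": [0.102633, 0.519590],
})
race_cols = ["asian", "hispanic", "nh_black", "nh_white"]

# Split each amount across categories by multiplying row-wise.
weighted = toy[race_cols].mul(toy["amount"], axis=0)

# Sum the weighted amounts per year, then convert to shares of the total.
totals = weighted.groupby(toy["year"]).sum()
shares = totals.div(totals.sum(axis=1), axis=0)

print(totals.round(2))
print(shares.round(4))
```

Because each row's probabilities sum to roughly one, the weighted totals add back up to roughly the raw total of the amounts, so the shares are a reallocation of the same money rather than a new estimate of its size.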
