# 2012 Federal Election Commission Database

The US Federal Election Commission (FEC) publishes data on contributions to political campaigns. The 2012 file is large (about one million rows) and includes each contributor's name, occupation, employer, address, and contribution amount.

In [117]:
# Start with imports
import pandas as pd 
import numpy as np 
import os
from csv import reader 

In [119]:
# Load data
#fec = pd.read_csv('P00000001-ALL.csv')
fec = pd.read_parquet('fec.parquet')

In [120]:
print(os.path.getsize('P00000001-ALL.csv'))

134


Summarize the columns, non-null counts, and dtypes:

In [122]:
fec.info()

<class 'pandas.core.frame.DataFrame'>
RangeIndex: 1001731 entries, 0 to 1001730
Data columns (total 16 columns):
 #   Column             Non-Null Count    Dtype  
---  ------             --------------    -----  
 0   cmte_id            1001731 non-null  object 
 1   cand_id            1001731 non-null  object 
 2   cand_nm            1001731 non-null  object 
 3   contbr_nm          1001731 non-null  object 
 4   contbr_city        1001712 non-null  object 
 5   contbr_st          1001727 non-null  object 
 6   contbr_zip         1001620 non-null  object 
 7   contbr_employer    988002 non-null   object 
 8   contbr_occupation  993301 non-null   object 
 9   contb_receipt_amt  1001731 non-null  float64
 10  contb_receipt_dt   1001731 non-null  object 
 11  receipt_desc       14166 non-null    object 
 12  memo_cd            92482 non-null    object 
 13  memo_text          97770 non-null    object 
 14  form_tp            1001731 non-null  object 
 15  file_num           1001731 non-n

Sample the DataFrame: 

In [127]:
fec.iloc[123456]

cmte_id                             C00431445
cand_id                             P80003338
cand_nm                         Obama, Barack
contbr_nm                         ELLMAN, IRA
contbr_city                             TEMPE
contbr_st                                  AZ
contbr_zip                          852816719
contbr_employer      ARIZONA STATE UNIVERSITY
contbr_occupation                   PROFESSOR
contb_receipt_amt                        50.0
contb_receipt_dt                    01-DEC-11
receipt_desc                             None
memo_cd                                  None
memo_text                                None
form_tp                                 SA17A
file_num                               772372
Name: 123456, dtype: object

List the unique candidates; these names are used next to add a party-affiliation column:

In [132]:
unique_cands = fec.cand_nm.unique()

In [135]:
unique_cands

array(['Bachmann, Michelle', 'Romney, Mitt', 'Obama, Barack',
       "Roemer, Charles E. 'Buddy' III", 'Pawlenty, Timothy',
       'Johnson, Gary Earl', 'Paul, Ron', 'Santorum, Rick',
       'Cain, Herman', 'Gingrich, Newt', 'McCotter, Thaddeus G',
       'Huntsman, Jon', 'Perry, Rick'], dtype=object)

Make a dict of party affiliations, with each candidate's name as the key and the party as the value:

In [139]:
parties = {
    'Bachmann, Michelle': 'Republican',
    'Cain, Herman': 'Republican',
    'Gingrich, Newt': 'Republican',
    'Huntsman, Jon': 'Republican',
    'Johnson, Gary Earl': 'Republican',
    'McCotter, Thaddeus G': 'Republican',
    'Obama, Barack': 'Democrat',
    'Paul, Ron': 'Republican',
    'Pawlenty, Timothy': 'Republican',
    'Perry, Rick': 'Republican',
    "Roemer, Charles E. 'Buddy' III": 'Republican',
    'Romney, Mitt': 'Republican',
    'Santorum, Rick': 'Republican'
}

With this dict, the `map` method on a Series can compute an array of political parties from the candidate names. For example, take this slice of `cand_nm`:

In [143]:
fec.cand_nm[123456:123461]

123456    Obama, Barack
123457    Obama, Barack
123458    Obama, Barack
123459    Obama, Barack
123460    Obama, Barack
Name: cand_nm, dtype: object
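
As a minimal sketch of what `map` does with a dict, the snippet below uses a small hand-made Series (the three values are illustrative and not rows from the dataset, and `'Unknown, Candidate'` is a made-up name). Names missing from the dict come back as NaN, which makes unmapped candidates easy to spot.

```python
import pandas as pd

# Hypothetical miniature of the cand_nm column; not real rows from the file
cand_nm = pd.Series(['Obama, Barack', 'Romney, Mitt', 'Unknown, Candidate'])

# Subset of the parties dict, repeated here so the snippet is self-contained
parties = {'Obama, Barack': 'Democrat', 'Romney, Mitt': 'Republican'}

# Series.map looks each value up in the dict; missing keys become NaN
party = cand_nm.map(parties)
print(party)
```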

Add it as a column:  

In [147]:
fec['party'] = fec.cand_nm.map(parties)

In [150]:
fec['party'].value_counts()

party
Democrat      593746
Republican    407985
Name: count, dtype: int64

Up to this point the data includes both contributions and refunds (negative contribution amounts), and the refunds should be removed. First, count how many contribution amounts are positive:

In [154]:
(fec.contb_receipt_amt > 0).value_counts()

contb_receipt_amt
True     991475
False     10256
Name: count, dtype: int64

Restrict analysis to positive contributions:

In [173]:
# .copy() gives an independent DataFrame so later column assignments are unambiguous
fec = fec[fec.contb_receipt_amt > 0].copy()
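
The filter above is ordinary boolean indexing. A minimal sketch on a toy DataFrame (the amounts are invented for illustration) shows how the mask drops refunds and zero amounts; taking a `.copy()` of the result is a suggestion here, not something the original analysis requires.

```python
import pandas as pd

# Hypothetical contribution amounts; negative values stand in for refunds
df = pd.DataFrame({'contb_receipt_amt': [50.0, -25.0, 100.0, 0.0]})

# Boolean mask: True where the amount is strictly positive
mask = df['contb_receipt_amt'] > 0

# Keep only the rows where the mask is True; .copy() detaches the result from df
positive = df[mask].copy()
print(len(positive), 'of', len(df), 'rows kept')
```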

Restrict analysis to the two main candidates, Barack Obama and Mitt Romney:

In [178]:
fec_mrbo = fec[fec.cand_nm.isin(['Obama, Barack', 'Romney, Mitt'])]
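
For illustration, `isin` builds a boolean mask that is True wherever the value appears in the given list. The sketch below applies it to a toy DataFrame whose rows are invented, not drawn from the FEC file.

```python
import pandas as pd

# Hypothetical rows for illustration only
df = pd.DataFrame({
    'cand_nm': ['Obama, Barack', 'Romney, Mitt', 'Paul, Ron', 'Obama, Barack'],
    'contb_receipt_amt': [50.0, 250.0, 100.0, 75.0],
})

# Candidates to keep
keep = ['Obama, Barack', 'Romney, Mitt']

# isin returns True for rows whose cand_nm is in the list
subset = df[df['cand_nm'].isin(keep)]
print(subset['cand_nm'].value_counts())
```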

Gather statistics by occupation and employer. Start by viewing the number of donations for each occupation:

In [182]:
fec.contbr_occupation.value_counts()[:30]

contbr_occupation
RETIRED                                   233990
INFORMATION REQUESTED                      35107
ATTORNEY                                   34286
HOMEMAKER                                  29931
PHYSICIAN                                  23432
INFORMATION REQUESTED PER BEST EFFORTS     21138
ENGINEER                                   14334
TEACHER                                    13990
CONSULTANT                                 13273
PROFESSOR                                  12555
NOT EMPLOYED                                9828
SALES                                       8333
LAWYER                                      8283
MANAGER                                     8024
PRESIDENT                                   7758
STUDENT                                     7071
OWNER                                       6343
EXECUTIVE                                   5506
SELF-EMPLOYED                               5472
WRITER                                      5128
SO

Several entries refer to the same basic occupation, so clean up the data by mapping the variants onto a single label. Using `dict.get()` lets occupations without a mapping pass through unchanged. The same approach applies to employers; the occupation mapping is:

In [None]:
occ_mapping = {
    'INFORMATION REQUESTED PER BEST EFFORTS' : 'NOT PROVIDED',
    'INFORMATION REQUESTED' : 'NOT PROVIDED',
    'INFORMATION REQUESTED (BEST EFFORTS)' : 'NOT PROVIDED',
    'C.E.O.': 'CEO'
}

In [None]:
# If no mapping is provided, return x unchanged
f = lambda x: occ_mapping.get(x, x)
fec['contbr_occupation'] = fec.contbr_occupation.map(f)
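
The text mentions cleaning employers as well, but only the occupation mapping is shown. The following is a sketch of how the same `dict.get()` fallback could be applied to employer values. The `emp_mapping` entries and the `normalize` helper are assumptions made for illustration, and the sample Series is hand-made rather than taken from `fec.contbr_employer`.

```python
import pandas as pd

# Hypothetical employer values; the real column is fec.contbr_employer
employers = pd.Series(['INFORMATION REQUESTED', 'SELF', 'ARIZONA STATE UNIVERSITY'])

# Assumed mapping for illustration; these entries are not from the source
emp_mapping = {
    'INFORMATION REQUESTED PER BEST EFFORTS': 'NOT PROVIDED',
    'INFORMATION REQUESTED': 'NOT PROVIDED',
    'SELF': 'SELF-EMPLOYED',
    'SELF EMPLOYED': 'SELF-EMPLOYED',
}

def normalize(value):
    """Return the mapped label, or the value itself when no mapping exists."""
    return emp_mapping.get(value, value)

# Values with a mapping are replaced; all others pass through unchanged
cleaned = employers.map(normalize)
print(cleaned.tolist())
```

Passing the key itself as the default to `get` is what keeps unmapped values intact; mapping with the dict directly would turn them into NaN.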