<h1>Table of Contents<span class="tocSkip"></span></h1>
<div class="toc"><ul class="toc-item"><li><span><a href="#Exploratory-Data-Analysis" data-toc-modified-id="Exploratory-Data-Analysis-1"><span class="toc-item-num">1&nbsp;&nbsp;</span>Exploratory Data Analysis</a></span><ul class="toc-item"><li><span><a href="#Questions-to-Answer:" data-toc-modified-id="Questions-to-Answer:-1.1"><span class="toc-item-num">1.1&nbsp;&nbsp;</span>Questions to Answer:</a></span></li><li><span><a href="#Summary-of-Results" data-toc-modified-id="Summary-of-Results-1.2"><span class="toc-item-num">1.2&nbsp;&nbsp;</span>Summary of Results</a></span></li><li><span><a href="#Data-Import-and-Cleaning" data-toc-modified-id="Data-Import-and-Cleaning-1.3"><span class="toc-item-num">1.3&nbsp;&nbsp;</span>Data Import and Cleaning</a></span></li><li><span><a href="#Campaign-Donations" data-toc-modified-id="Campaign-Donations-1.4"><span class="toc-item-num">1.4&nbsp;&nbsp;</span>Campaign Donations</a></span></li><li><span><a href="#Campaign-Spending" data-toc-modified-id="Campaign-Spending-1.5"><span class="toc-item-num">1.5&nbsp;&nbsp;</span>Campaign Spending</a></span></li><li><span><a href="#Conclusions" data-toc-modified-id="Conclusions-1.6"><span class="toc-item-num">1.6&nbsp;&nbsp;</span>Conclusions</a></span></li></ul></li></ul></div>

# Exploratory Data Analysis
This notebook looks at several CSV files that were either scraped from election-related websites or provided as-is by government websites.

## Questions to Answer:
1. Which candidates for ANC office received reported campaign donations?
2. Which candidates for ANC office had reported campaign expenditures?

## Summary of Results

Number of ANC candidates who received reported campaign donations: <b>11 out of 1068 possible</b> (and many of the 11 are probably related to other elections)

Number of ANC candidates who reported campaign expenditures: <b>13 out of 1068 possible</b> (and many of these expenditures are probably related to other elections)


## Data Import and Cleaning

In [37]:
import pandas as pd
import numpy as np
import qgrid

In [38]:
# Vote counts from 2012 to 2018, with each election as a single line
df_history = pd.read_csv('../cleaned_data/election_history_r.csv')

# Vote counts from 2012 to 2018, showing each candidate's results
df_history2 = pd.read_csv('../cleaned_data/anc_electoral_history_2012_2018.csv')

# List of all individual campaign donations, from 2002 - 2016
df_donations = pd.read_csv('../raw_data/campaignfinancialcontributions.csv')

# List of all reported campaign expenditures
df_spending = pd.read_csv('../raw_data/campaignfinancialexpenditures.csv')

# Current ANC members
df_ancs = pd.read_csv('../raw_data/dc_ancs.csv')

# election_history_R merged with current_anc_membership
df_commisioners = pd.read_csv('../cleaned_data/2018_elections_commissioners.csv')

In [39]:
def strip_and_upper_strings(df):
    """Upper-case and strip whitespace in every string (object) column of df."""
    for col in df.columns:
        if df[col].dtype == 'O':
            df[col] = df[col].str.upper().str.strip()
    return df

In [40]:
df_history = strip_and_upper_strings(df_history)
df_history2 = strip_and_upper_strings(df_history2)
df_donations = strip_and_upper_strings(df_donations)
df_ancs = strip_and_upper_strings(df_ancs)
df_spending = strip_and_upper_strings(df_spending)

df_donations.columns = [col.lower() for col in df_donations.columns]
df_spending.columns = [col.lower() for col in df_spending.columns]

In [41]:
df_history2.head()

Unnamed: 0.1,Unnamed: 0,ELECTION_DATE,SMD,CANDIDATE,VOTES,WARD
0,0,2012-11-06,1A01,LISA KRALOVIC,374,1
1,1,2012-11-06,1A01,WRITE-IN,24,1
2,2,2012-11-06,1A02,ALEXANDER GALLO,295,1
3,3,2012-11-06,1A02,VICKEY A. WRIGHT-SMITH,432,1
4,4,2012-11-06,1A02,WRITE-IN,11,1


In [42]:
df_donations.head()

Unnamed: 0,objectid,committeename,candidatename,electionyear,contributorname,address,contributortype,contributiontype,employer,employeraddress,amount,dateofreceipt,fulladdress,gis_last_mod_dttm
0,1001,COMMITTEE TO ELECT DARRYL L.C. MOCH,DARRYL L.C. MOCH,2010,THELMA CURRY,"624 W 124TH ST, LOS ANGELES, CA 90044",INDIVIDUAL,CHECK,NONE,"WASHINGTON, DC",200.0,2010/08/11 00:00:00+00,,2016/12/05 06:15:31+00
1,1002,COMMITTEE TO ELECT DARRYL L.C. MOCH,DARRYL L.C. MOCH,2010,WANDA JACKSON,"13610 VALLEY DR., ROCKVILLE, MD 20850",INDIVIDUAL,CHECK,REQUESTED,"WASHINGTON, DC",100.0,2010/08/11 00:00:00+00,,2016/12/05 06:15:31+00
2,1003,COMMITTEE TO ELECT DARRYL L.C. MOCH,DARRYL L.C. MOCH,2010,LOUIS WOLF,"4107 ELLICOTT ST. NW, WASHINGTON, DC 20016",INDIVIDUAL,CHECK,NONE,"WASHINGTON, DC",100.0,2010/08/19 00:00:00+00,4107 ELLICOTT STREET NW,2016/12/05 06:15:31+00
3,1004,COMMITTEE TO ELECT DARRYL L.C. MOCH,DARRYL L.C. MOCH,2010,QUENTINE WHITE,"PO BOX 76688, WASHINGTON, DC 20013",INDIVIDUAL,CHECK,,,20.0,2010/08/21 00:00:00+00,,2016/12/05 06:15:31+00
4,1005,COMMITTEE TO ELECT DARRYL L.C. MOCH,DARRYL L.C. MOCH,2010,LESLIE RUFFIN,"618 QUACKENBOS ST. NW, WASHINGTON, DC 20011",INDIVIDUAL,CHECK,SELF,"WASHINGTON, DC",400.0,2010/08/23 00:00:00+00,618 QUACKENBOS STREET NW,2016/12/05 06:15:31+00


## Campaign Donations

In [43]:
df_donation_totals = (df_donations[['candidatename','electionyear','amount']]
                      .groupby(['candidatename','electionyear'])
                      .sum())
df_donation_totals.columns = ['total_donation_amount']

In [44]:
df_donor_counts = df_donations[['candidatename','electionyear','contributorname']]
df_donor_counts = df_donor_counts.groupby(['candidatename','electionyear']).nunique()
df_donor_counts = df_donor_counts[['contributorname']]
df_donor_counts.columns = ['num_donors']

In [45]:
df_combined = df_donation_totals.join(df_donor_counts,on=['candidatename','electionyear'],how='left')
# Note: this is the average amount per unique donor, not per individual donation
df_combined['avg_donation'] = df_combined['total_donation_amount'] / df_combined['num_donors']
df_combined.head(10)

candidatename,electionyear,total_donation_amount,num_donors,avg_donation
A. SCOTT BOLDEN,2006,33750.0,88,383.522727
A.J COOPER,2012,9377.09,50,187.5418
ACQUNETTA ANDERSON,2014,1575.65,3,525.216667
ADAM CLAMPITT,2008,39796.45,164,242.66128
ADAM EIDINGER,2002,275.0,9,30.555556
ADRIAN FENTY,2002,80383.16,138,582.486667
ADRIAN FENTY,2004,33625.0,212,158.608491
ADRIAN FENTY,2006,845459.02,214,3950.743084
ADRIAN FENTY,2010,3719113.54,4185,888.67707
ALAN PAGE,2011,1898.18,17,111.657647


In [46]:
df_history2['electionyear'] = df_history2['ELECTION_DATE'].str[:4].astype(int)
df_history2.columns = [col.lower() for col in df_history2.columns]
df_history2 = df_history2.drop('election_date',axis=1)
df_history2 = df_history2.rename(columns={'candidate':'candidatename'})
df_history2 = df_history2[~df_history2['candidatename'].isin(['WRITE-IN','OVER VOTES','UNDER VOTES'])]
df_history2 = df_history2.set_index(['candidatename','electionyear'])

In [47]:
df_combined2 = df_combined.join(df_history2)
df_combined2.dropna(subset=['smd'])

candidatename,electionyear,total_donation_amount,num_donors,avg_donation,unnamed: 0,smd,votes,ward
ACQUNETTA ANDERSON,2014,1575.65,3,525.216667,1185.0,4A01,838.0,4.0
E. GAIL ANDERSON HOLNESS,2012,965.0,15,64.333333,51.0,1B11,404.0,1.0
FRANK WILDS,2012,24445.0,85,287.588235,334.0,5A01,627.0,5.0
JACQUELINE MANNING,2014,80.0,1,80.0,1399.0,5C04,688.0,5.0
JUDI JONES,2012,1425.0,9,158.333333,296.0,4B07,651.0,4.0
KATHY HENDERSON,2012,551.8,4,137.95,399.0,5D05,393.0,5.0
KATHY HENDERSON,2014,4937.93,44,112.225682,1437.0,5D05,330.0,5.0
NATALIE WILLIAMS,2012,4758.0,14,339.857143,628.0,8A07,806.0,8.0


As the table above shows, only 8 of the 1600 candidate and election-year combinations match. Looking back at the data, however, the election year listed in the donations dataset and the election year listed in the historical vote-count dataset often don't line up. So I'll check name-only matches between the two datasets to see what that looks like.
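
One way to see how much the year requirement costs is to compare a join on name and year against a join on name alone. The sketch below is only an illustration: the two small frames are made up (the names, years, and amounts are hypothetical, not taken from the datasets), and it uses `pd.merge` with `indicator=True`, whereas the notebook itself uses index-based `join`.

```python
import pandas as pd

# Toy stand-ins for the donation totals and the ANC vote history (hypothetical rows).
donations = pd.DataFrame({
    "candidatename": ["PAT DOE", "SAM ROE", "LEE POE"],
    "electionyear": [2012, 2015, 2010],
    "total_donation_amount": [500.0, 1200.0, 75.0],
})
history = pd.DataFrame({
    "candidatename": ["PAT DOE", "SAM ROE", "KIM LOW"],
    "electionyear": [2012, 2014, 2012],
    "smd": ["1A01", "4B07", "8E02"],
})

# Strict match: candidate name and election year must both agree.
strict = donations.merge(history, on=["candidatename", "electionyear"],
                         how="left", indicator=True)
n_strict = int((strict["_merge"] == "both").sum())

# Loose match: candidate name only; the two year columns are kept side by side.
loose = donations.merge(history, on="candidatename", how="left",
                        suffixes=("_l", "_r"), indicator=True)
n_loose = int((loose["_merge"] == "both").sum())

print("matched on name + year:", n_strict)
print("matched on name only:  ", n_loose)
```

In this toy data, SAM ROE matches only in the name-only join because the donation year (2015) and the election year (2014) differ by one, which is the kind of mismatch described above.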

In [48]:
df_history2 = df_history2.reset_index(1)
df_combined = df_combined.reset_index(1)

In [49]:
df_combined3 = df_combined.join(df_history2,lsuffix='_l',rsuffix='_r')
df_combined3 = df_combined3.dropna(subset=['smd'])

In [50]:
df_combined3

candidatename,electionyear_l,total_donation_amount,num_donors,avg_donation,electionyear_r,unnamed: 0,smd,votes,ward
ACQUNETTA ANDERSON,2014,1575.65,3,525.216667,2014.0,1185.0,4A01,838.0,4.0
ANTHONY MUHAMMAD,2015,1301.86,12,108.488333,2012.0,672.0,8E02,686.0,8.0
ANTHONY MUHAMMAD,2015,1301.86,12,108.488333,2014.0,1955.0,8E02,344.0,8.0
ANTHONY MUHAMMAD,2015,1301.86,12,108.488333,2018.0,3242.0,8E02,155.0,8.0
BILL QUIRK,2011,270.00,2,135.000000,2012.0,332.0,4D06,698.0,4.0
BILL QUIRK,2011,270.00,2,135.000000,2014.0,1327.0,4D06,968.0,4.0
BRIAN HART,2014,31563.73,156,202.331603,2012.0,58.0,1C01,420.0,1.0
DARLENE GLYMPH,2010,364.99,5,72.998000,2012.0,398.0,5D05,233.0,5.0
DAVID GARBER,2016,4023.67,20,201.183500,2012.0,511.0,6D07,808.0,6.0
DOROTHY DOUGLAS,2007,1212.46,10,121.246000,2014.0,1745.0,7D03,170.0,7.0


In [51]:
print('Number of unique entries with any match to the ANC dataset: ',df_combined3.index.nunique())

Number of unique entries with any match to the ANC dataset:  27


As the table and the cell output above show, only 27 unique candidate names appear in the joined dataframes when matching on name alone.

In [52]:
df_history2['electionyear'].value_counts()

2014    413
2018    398
2012    385
Name: electionyear, dtype: int64

In [53]:
df_donations['electionyear'].value_counts().sort_index()

2002     6247
2004     3123
2006     6928
2007     2604
2008     4774
2010    10556
2011      601
2012     5463
2013     1517
2014    13510
2015     6593
2016     3583
Name: electionyear, dtype: int64

In [54]:
# Year gap between the donation record and the ANC election; rows with a gap of more than 1 are dropped below
df_combined3['year_diff'] = df_combined3['electionyear_l'] - df_combined3['electionyear_r']

In [55]:
df_combined3.head()

candidatename,electionyear_l,total_donation_amount,num_donors,avg_donation,electionyear_r,unnamed: 0,smd,votes,ward,year_diff
ACQUNETTA ANDERSON,2014,1575.65,3,525.216667,2014.0,1185.0,4A01,838.0,4.0,0.0
ANTHONY MUHAMMAD,2015,1301.86,12,108.488333,2012.0,672.0,8E02,686.0,8.0,3.0
ANTHONY MUHAMMAD,2015,1301.86,12,108.488333,2014.0,1955.0,8E02,344.0,8.0,1.0
ANTHONY MUHAMMAD,2015,1301.86,12,108.488333,2018.0,3242.0,8E02,155.0,8.0,-3.0
BILL QUIRK,2011,270.0,2,135.0,2012.0,332.0,4D06,698.0,4.0,-1.0


In [56]:
# between() includes both endpoints by default; the boolean inclusive=True form is rejected by newer pandas
df_combined3 = df_combined3[df_combined3['year_diff'].between(-1,1)]

In [57]:
df_combined3

candidatename,electionyear_l,total_donation_amount,num_donors,avg_donation,electionyear_r,unnamed: 0,smd,votes,ward,year_diff
ACQUNETTA ANDERSON,2014,1575.65,3,525.216667,2014.0,1185.0,4A01,838.0,4.0,0.0
ANTHONY MUHAMMAD,2015,1301.86,12,108.488333,2014.0,1955.0,8E02,344.0,8.0,1.0
BILL QUIRK,2011,270.0,2,135.0,2012.0,332.0,4D06,698.0,4.0,-1.0
DOUGLASS SLOAN,2015,1490.0,11,135.454545,2014.0,1253.0,4B09,738.0,4.0,1.0
E. GAIL ANDERSON HOLNESS,2012,965.0,15,64.333333,2012.0,51.0,1B11,404.0,1.0,0.0
FRANK WILDS,2012,24445.0,85,287.588235,2012.0,334.0,5A01,627.0,5.0,0.0
JACQUELINE MANNING,2014,80.0,1,80.0,2014.0,1399.0,5C04,688.0,5.0,0.0
JUDI JONES,2012,1425.0,9,158.333333,2012.0,296.0,4B07,651.0,4.0,0.0
JUDI JONES,2015,1050.0,8,131.25,2014.0,1245.0,4B07,1480.0,4.0,1.0
KATHY HENDERSON,2012,551.8,4,137.95,2012.0,399.0,5D05,393.0,5.0,0.0


In [58]:
print('Number of unique entries with any match to the ANC dataset: ',df_combined3.index.nunique())

Number of unique entries with any match to the ANC dataset:  11


As the table and the cell output above show, only 11 unique candidate names received reported campaign donations once the match is restricted to election years that are at most 1 year apart. A few notable outliers received a large number and total amount of campaign donations; each should be checked individually to see whether that candidate was in fact running for a different office that year.
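
A possible starting point for that individual check is to list the committees behind each matched candidate's donations. The sketch below uses made-up rows; only the column names (`candidatename`, `committeename`, `electionyear`, `amount`) come from the donations file. The idea that a committee name containing "ANC" hints at an ANC race is an assumption, not something the data guarantees, so the flag is a prompt for manual review rather than a classification.

```python
import pandas as pd

# Hypothetical donation rows; only the column names come from the donations file.
donations = pd.DataFrame({
    "candidatename": ["PAT DOE", "PAT DOE", "SAM ROE"],
    "committeename": ["PAT DOE FOR ANC 1A01", "PAT DOE FOR ANC 1A01",
                      "SAM ROE FOR COUNCIL"],
    "electionyear": [2012, 2012, 2014],
    "amount": [50.0, 25.0, 1000.0],
})
matched_names = ["PAT DOE", "SAM ROE"]  # stand-in for df_combined3.index.unique()

# Total donations per candidate and committee, for manual review.
review = (donations[donations["candidatename"].isin(matched_names)]
          .groupby(["candidatename", "committeename"], as_index=False)["amount"]
          .sum())

# Assumption: a committee name containing the word "ANC" suggests an ANC race.
review["mentions_anc"] = review["committeename"].str.contains(r"\bANC\b", regex=True)
print(review)
```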

## Campaign Spending

In [59]:
df_spending['transactiondate'] = pd.to_datetime(df_spending['transactiondate'])
df_spending['transaction_year'] = df_spending['transactiondate'].dt.year

In [60]:
df_spending_totals = (df_spending[['candidatename','transaction_year','amount']]
                      .groupby(['candidatename','transaction_year'])
                      .sum())
df_spending_totals.columns = ['total_spending_amount']
df_spending_totals = df_spending_totals.reset_index(1)

In [61]:
df_spending_totals.sort_values('total_spending_amount',ascending=False).head()

candidatename,transaction_year,total_spending_amount
ADRIAN FENTY,2010,4507969.29
MURIEL BOWSER,2014,3425439.97
ADRIAN FENTY,2006,3038600.06
VINCENT GRAY,2010,2789645.49
LINDA CROPP,2006,2419780.17


In [62]:
df_combined_spending = df_spending_totals.join(df_history2)
df_combined_spending = df_combined_spending.dropna(subset=['smd'])

In [63]:
df_combined_spending.info()

<class 'pandas.core.frame.DataFrame'>
Index: 107 entries, ACQUNETTA ANDERSON to WILLIAM BOSTON
Data columns (total 7 columns):
transaction_year         107 non-null int64
total_spending_amount    107 non-null float64
electionyear             107 non-null float64
unnamed: 0               107 non-null float64
smd                      107 non-null object
votes                    107 non-null float64
ward                     107 non-null float64
dtypes: float64(5), int64(1), object(1)
memory usage: 6.7+ KB


In [64]:
df_combined_spending['year_diff'] = df_combined_spending['transaction_year'] - df_combined_spending['electionyear']
df_combined_spending = df_combined_spending[df_combined_spending['year_diff'].between(-1,0)]

In [65]:
df_combined_spending.head(10)

candidatename,transaction_year,total_spending_amount,electionyear,unnamed: 0,smd,votes,ward,year_diff
ACQUNETTA ANDERSON,2013,173.57,2014.0,1185.0,4A01,838.0,4.0,-1.0
ACQUNETTA ANDERSON,2014,5220.87,2014.0,1185.0,4A01,838.0,4.0,0.0
ANTHONY MUHAMMAD,2011,5383.05,2012.0,672.0,8E02,686.0,8.0,-1.0
ANTHONY MUHAMMAD,2014,0.0,2014.0,1955.0,8E02,344.0,8.0,0.0
BILL QUIRK,2011,4481.22,2012.0,332.0,4D06,698.0,4.0,-1.0
BILL QUIRK,2013,187.93,2014.0,1327.0,4D06,968.0,4.0,-1.0
DOTTI LOVE WADE,2011,1060.0,2012.0,25.0,1A11,429.0,1.0,-1.0
DOUGLASS SLOAN,2014,543.88,2014.0,1253.0,4B09,738.0,4.0,0.0
E. GAIL ANDERSON HOLNESS,2011,108.05,2012.0,51.0,1B11,404.0,1.0,-1.0
E. GAIL ANDERSON HOLNESS,2012,6496.16,2012.0,51.0,1B11,404.0,1.0,0.0


In [66]:
df_combined_spending.sort_values('total_spending_amount',ascending=False).head()

candidatename,transaction_year,total_spending_amount,electionyear,unnamed: 0,smd,votes,ward,year_diff
FRANK WILDS,2012,49443.13,2012.0,334.0,5A01,627.0,5.0,0.0
SHELLY GARDNER,2012,12153.58,2012.0,388.0,5C07,165.0,5.0,0.0
NATALIE WILLIAMS,2012,8557.07,2012.0,628.0,8A07,806.0,8.0,0.0
E. GAIL ANDERSON HOLNESS,2012,6496.16,2012.0,51.0,1B11,404.0,1.0,0.0
ANTHONY MUHAMMAD,2011,5383.05,2012.0,672.0,8E02,686.0,8.0,-1.0


In [67]:
print('Number of unique entries with matches within one year in the spending dataset:',df_combined_spending.index.nunique())

Number of unique entries with matches within one year in the spending dataset: 13


As the two tables and the cell output above show, only 13 unique candidatename entries have a corresponding entry in the ANC elections data within the one-year window. Some candidates stand out, however, for particularly large expenditures in the year of or the year before an ANC election.
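
The donations and spending sections both follow the same pattern: join on candidate name, compute a year gap, and keep rows inside a window. A minimal sketch of that pattern as one reusable function follows. `match_within_years` and its parameters are hypothetical names introduced here, the rows are made up, and the function assumes the money frame's year column is not itself called `electionyear` (the donations frame would need that column renamed first).

```python
import pandas as pd

def match_within_years(df_money, df_elections, year_col, min_diff, max_diff):
    """Join on candidatename and keep rows whose year gap is in [min_diff, max_diff].

    Hypothetical helper: the gap is df_money[year_col] - df_elections['electionyear'].
    """
    merged = df_money.merge(df_elections, on="candidatename", how="inner")
    merged["year_diff"] = merged[year_col] - merged["electionyear"]
    # Series.between includes both endpoints by default.
    return merged[merged["year_diff"].between(min_diff, max_diff)]

# Toy spending totals and election history (hypothetical rows).
spending = pd.DataFrame({
    "candidatename": ["PAT DOE", "PAT DOE", "PAT DOE", "SAM ROE"],
    "transaction_year": [2011, 2012, 2016, 2014],
    "total_spending_amount": [100.0, 200.0, 50.0, 10.0],
})
elections = pd.DataFrame({
    "candidatename": ["PAT DOE", "SAM ROE"],
    "electionyear": [2012, 2018],
    "smd": ["1A01", "4B07"],
})

# Spending in the election year or the year before, as in the cells above.
result = match_within_years(spending, elections, "transaction_year", -1, 0)
print(result)
print("unique candidates:", result["candidatename"].nunique())
```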

## Conclusions

ANC campaigns have very few reported donations or reported campaign expenditures. Several notable outliers exist, however, and are worth examining individually.

Number of ANC candidates who received reported campaign donations: <b>11 out of 1068 possible</b> (and many of the 11 are probably related to non-ANC elections)

Number of ANC candidates who reported campaign expenditures: <b>13 out of 1068 possible</b> (and many of these expenditures are probably related to non-ANC elections)
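
For scale, the two counts can be expressed as shares of the 1068 possible candidates. The snippet below only restates the figures reported above as percentages; the variable names are illustrative.

```python
total_candidates = 1068  # possible ANC candidates, from the summary above
with_donations = 11      # candidates with reported donations
with_spending = 13       # candidates with reported expenditures

donation_share = with_donations / total_candidates
spending_share = with_spending / total_candidates

print(f"Reported donations:    {donation_share:.2%}")
print(f"Reported expenditures: {spending_share:.2%}")
```

Both shares come out at roughly 1%.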