<h2>Analytics Practicum I - 1st Assignment 2021</h2>
<h3>Exploring the FEC 2016 Elections Data</h3>

---
> Georgia Vlassi p2822001<br />
> Business Anlytics <br />
> Athens University of Economics and Business <br/>

---

The main scope of this assignment is to investigate data provided from Federal Election Commission for the American  elections of 2016.Through different questions, we will investigate and analyze the amounts raised and spent regarding the two main presidential candidates H.Clinton and D.Trump. 






---
## Questions

<h3>1. Identify the top 5 Political Action Committees (PACs), or rather, super-PACs, that supported each of the two presidential candidates, giving the amount of money raised and spent by each one of them.</h3>


---

Before analyzing our data, you should import the following packages.


In [175]:
import pandas as pd
import numpy as np
import seaborn as sns
import matplotlib.pyplot as plt
import matplotlib
import mpl_toolkits
from mpl_toolkits.mplot3d import Axes3D

from collections import defaultdict
from matplotlib.colors import rgb2hex, Normalize
from matplotlib.patches import Polygon
from matplotlib.colorbar import ColorbarBase

pd.options.display.float_format = '{:,.2f}'.format

%matplotlib inline
matplotlib.style.use('ggplot')
pd.set_option("display.max_columns", None)



 * The first dataset, which will be used is the contributions from commitees to cadidates and independent expenditure for 2016. You can download both header file and data from https://www.fec.gov/files/bulk-downloads/data_dictionaries/pas2_header_file.csv and https://www.fec.gov/files/bulk-downloads/2016/pas216.zip respectively.
 * The file contains each contribution or independent expenditure made by a PAC, party committee, candidate committee, or other federal committee to a candidate during the two-year election cycle.



In [2]:
pas_headers = pd.read_csv('https://www.fec.gov/files/bulk-downloads/data_dictionaries/pas2_header_file.csv')
pas_headers = pas_headers.columns.tolist()
pas_headers

['CMTE_ID',
 'AMNDT_IND',
 'RPT_TP',
 'TRANSACTION_PGI',
 'IMAGE_NUM',
 'TRANSACTION_TP',
 'ENTITY_TP',
 'NAME',
 'CITY',
 'STATE',
 'ZIP_CODE',
 'EMPLOYER',
 'OCCUPATION',
 'TRANSACTION_DT',
 'TRANSACTION_AMT',
 'OTHER_ID',
 'CAND_ID',
 'TRAN_ID',
 'FILE_NUM',
 'MEMO_CD',
 'MEMO_TEXT',
 'SUB_ID']

* Before we import the data, we should convert the field TRANSACTION_AMT to float and all the other fields to strings.

In [3]:
data_types = {header: np.str for header in pas_headers}
data_types['TRANSACTION_AMT'] = np.float

* Full data of file contributions from commitees can be read

In [4]:
pas_contrib = pd.read_csv('https://www.fec.gov/files/bulk-downloads/2016/pas216.zip', 
                  sep="|", 
                  index_col=False,
                  low_memory=False,
                  names=pas_headers,
                  dtype=data_types)
pas_contrib['TRANSACTION_DT']= pd.to_datetime(pas_contrib['TRANSACTION_DT'], format='%m%d%Y')
print(pas_contrib.shape)
pas_contrib.head(5)

(516394, 22)


Unnamed: 0,CMTE_ID,AMNDT_IND,RPT_TP,TRANSACTION_PGI,IMAGE_NUM,TRANSACTION_TP,ENTITY_TP,NAME,CITY,STATE,ZIP_CODE,EMPLOYER,OCCUPATION,TRANSACTION_DT,TRANSACTION_AMT,OTHER_ID,CAND_ID,TRAN_ID,FILE_NUM,MEMO_CD,MEMO_TEXT,SUB_ID
0,C00548198,N,M3,G2016,15950887602,24K,CCM,BLAINE FOR CONGRESS,JEFFERSON CITY,MO,65102,,,2015-02-10,2500.0,C00458679,H8MO09153,6783511,998835,,,4032020151240895091
1,C00235739,N,M3,P2014,15950887798,24K,CCM,DAN NEWHOUSE FOR CONGRESS,YAKIMA,WA,98909,,,2015-02-05,2500.0,C00559393,H4WA04104,B544444,998836,,,4032020151240895118
2,C00235739,N,M3,P2016,15950887799,24K,CCM,CATHY MCMORRIS RODGERS FOR CONGRESS,SPOKANE,WA,99210,,,2015-02-18,1000.0,C00390476,H4WA05077,B544838,998836,,,4032020151240895120
3,C00235739,N,M3,G2018,15950887799,24K,CCM,MANCHIN FOR WEST VIRGINIA,WASHINGTON,DC,20002,,,2015-02-05,1000.0,C00486563,S0WV00090,B544441,998836,,,4032020151240895121
4,C00235739,N,M3,P2016,15950887790,24K,CCM,PETE AGUILAR FOR CONGRESS,WASHINGTON,DC,20003,,,2015-02-18,1500.0,C00510461,H2CA31125,B544833,998836,,,4032020151240895094


Regarding the documentation https://www.fec.gov/campaign-finance-data/transaction-type-code-descriptions/, we should take into account:
   * The transaction type 24A, which represents an independent expenditure opposing election of candidate.
   * The transaction type 24E, which represents an independent expenditure advocating the election of candidate.
    
We should check if these types exist in our data and then filter:   


In [5]:
pas_contrib['TRANSACTION_TP'].unique()

array(['24K', '24Z', '24E', '24A', '24R', '24C', '24F', '24N'],
      dtype=object)

In [6]:
pas_contrib_up = pas_contrib[pas_contrib['TRANSACTION_TP'].isin(['24E', '24A'])]
print(pas_contrib_up.shape)
pas_contrib_up.head(5)

(239515, 22)


Unnamed: 0,CMTE_ID,AMNDT_IND,RPT_TP,TRANSACTION_PGI,IMAGE_NUM,TRANSACTION_TP,ENTITY_TP,NAME,CITY,STATE,ZIP_CODE,EMPLOYER,OCCUPATION,TRANSACTION_DT,TRANSACTION_AMT,OTHER_ID,CAND_ID,TRAN_ID,FILE_NUM,MEMO_CD,MEMO_TEXT,SUB_ID
802,C00365296,N,Q1,P2015,15970350002,24E,ORG,TERRA STRATEGIES LLC,DES MOINES,IA,50309,,,2015-03-24,12404.0,H6MO05171,H6MO05171,SE.22416,1003479,,,4041620151241906737
2398,C00520684,N,MY,G2014,201507309000454990,24A,ORG,ODNEY ADVERTISING AGENCY,BISMARCK,ND,58502,,,2015-01-08,2195.0,H4ND00046,H4ND00046,SE.4381,1018960,,,4073020151248026313
3620,C00570739,N,MY,P2016,201507319000476060,24E,PAC,MAKE DC LISTEN,ATHENS,GA,30605,,,2015-04-20,61.0,P60006111,P60006111,EC0B8BFA8422A4D8BABD,1019398,,,4073120151248081657
3621,C00570739,N,MY,P2016,201507319000476060,24E,ORG,FACEBOOK,MENLO PARK,CA,940251452,,,2015-04-21,325.0,P60006111,P60006111,E04A7B32ADF0840C5889,1019398,,,4073120151248081659
3622,C00570739,N,MY,P2016,201507319000476061,24E,ORG,ENVISION MARKETING,LYNCHBURG,VA,245024202,,,2015-04-17,25532.0,P60006111,P60006111,E2840F8EF99584536A12,1019398,,,4073120151248081661


In [7]:
pas_contrib['TRANSACTION_TP'].unique()

array(['24K', '24Z', '24E', '24A', '24R', '24C', '24F', '24N'],
      dtype=object)

* The second dataset, which will be used is the candidate master for 2016. Υou can download both header file and data https://www.fec.gov/files/bulk-downloads/data_dictionaries/cn_header_file.csv and https://www.fec.gov/files/bulk-downloads/2016/cn16.zip

* The candidate master file contains one record for each candidate who has either registered with the Federal Election Commission. We want to get the candidate name of each candidate id.


In [8]:
cn_headers =  pd.read_csv('https://www.fec.gov/files/bulk-downloads/data_dictionaries/cn_header_file.csv')
cn_headers = cn_headers.columns.tolist()
cn_headers

['CAND_ID',
 'CAND_NAME',
 'CAND_PTY_AFFILIATION',
 'CAND_ELECTION_YR',
 'CAND_OFFICE_ST',
 'CAND_OFFICE',
 'CAND_OFFICE_DISTRICT',
 'CAND_ICI',
 'CAND_STATUS',
 'CAND_PCC',
 'CAND_ST1',
 'CAND_ST2',
 'CAND_CITY',
 'CAND_ST',
 'CAND_ZIP']

*  Full data of file candidates can be read

In [9]:
candidates = pd.read_csv('https://www.fec.gov/files/bulk-downloads/2016/cn16.zip', 
                  sep="|", 
                  index_col=False,
                  low_memory=False,
                  names=cn_headers)
print(candidates.shape)
candidates.head(5)

(7396, 15)


Unnamed: 0,CAND_ID,CAND_NAME,CAND_PTY_AFFILIATION,CAND_ELECTION_YR,CAND_OFFICE_ST,CAND_OFFICE,CAND_OFFICE_DISTRICT,CAND_ICI,CAND_STATUS,CAND_PCC,CAND_ST1,CAND_ST2,CAND_CITY,CAND_ST,CAND_ZIP
0,H0AK00097,"COX, JOHN R.",REP,2014,AK,H,0.0,C,N,C00525261,P.O. BOX 1092,,ANCHOR POINT,AK,99556.0
1,H0AL02087,"ROBY, MARTHA",REP,2016,AL,H,2.0,I,C,C00462143,PO BOX 195,,MONTGOMERY,AL,36101.0
2,H0AL02095,"JOHN, ROBERT E JR",IND,2016,AL,H,2.0,C,N,,1465 W OVERBROOK RD,,MILLBROOK,AL,36054.0
3,H0AL05049,"CRAMER, ROBERT E ""BUD"" JR",DEM,2008,AL,H,5.0,,P,C00239038,PO BOX 2621,,HUNTSVILLE,AL,35804.0
4,H0AL05163,"BROOKS, MO",REP,2016,AL,H,5.0,I,C,C00464149,7610 FOXFIRE DRIVE,,HUNTSVILLE,AL,35802.0


* From file candidates we will only keep the following:
    * The candidates running for president in the 2016 elections.
    * The value 'P' for the column CAND_OFFICE means that the candidate runs for president.
    * The value 'C' for the column CAND_STATUS means 'statutory candidate'
    * Fields 'CAND_ID', 'CAND_NAME', 'CAND_PTY_AFFILIATION', 'CAND_ELECTION_YR', 'CAND_OFFICE', 'CAND_STATUS'


In [10]:
candidates = candidates.loc[(candidates['CAND_ELECTION_YR'] == 2016) & (candidates['CAND_OFFICE'] == 'P') & (candidates['CAND_STATUS'] == 'C')]
candidates = candidates[['CAND_ID', 'CAND_NAME', 'CAND_PTY_AFFILIATION', 'CAND_ELECTION_YR', 'CAND_OFFICE', 'CAND_STATUS']]
print(candidates.shape)
candidates.sample(5)

(72, 6)


Unnamed: 0,CAND_ID,CAND_NAME,CAND_PTY_AFFILIATION,CAND_ELECTION_YR,CAND_OFFICE,CAND_STATUS
5298,P60012234,"JOHNSON, JOHN FITZGERALD MR.",IND,2016,P,C
4985,P60008398,"JINDAL, BOBBY",REP,2016,P,C
5640,P60016383,"KITTINGTON, VALMA",REP,2016,P,C
4693,P60005279,"BOWERS, KERRY DALE",REP,2016,P,C
4763,P60006046,"WALKER, SCOTT",REP,2016,P,C


In [11]:
candidates['CAND_NAME'].unique()

array(['CLINTON, HILLARY RODHAM / TIMOTHY MICHAEL KAINE',
       'SCHRINER, JOSEPH CHARLES', 'BROWN, HARLEY D',
       'BICKELMEYER, MICHAEL', 'JOHNSON, GARY / WILLIAM "BILL" WELD',
       'SANTORUM, RICHARD J.', 'HILL, CHRISTOPHER V',
       'PERRY, JAMES R (RICK)', 'STEIN, JILL', 'WELLS, ROBERT CARR JR',
       'WHITE, JEROME S', 'KREML, WILLIAM P', 'PAUL, RAND',
       'KASICH, JOHN R', 'MOOREHEAD, MONICA GAIL',
       'ADESHINA, YINKA ABOSEDE', 'JEROBOAN, KINS',
       'STEINBERG, MICHAEL ALAN', 'LOWER, BARTHOLOMEW JAMES MR.',
       'BOWERS, KERRY DALE', 'SHERMAN, JEFFERSON WOODSON',
       'CHRISTENSEN, DALE H', 'DUCKWALD, WANDA GAYLE',
       'CARSON, BENJAMIN S SR MD', 'EVERSON, MARK', 'WALKER, SCOTT',
       'CRUZ, RAFAEL EDWARD "TED"', 'LYNCH, DENNIS MICHAEL',
       'SANDERS, BERNARD', 'FIORINA, CARLY', 'KELSO, LLOYD THOMAS',
       'SCROGGIE, JEREMY', 'WILSON, WILLIE', 'PATAKI, GEORGE E',
       "O'MALLEY, MARTIN JOSEPH", 'GRAHAM, LINDSEY O',
       'WINSLOW, BRAD MR.', 'MA

Clear values of H.Clinton and D.Trump

In [12]:
candidates.loc[candidates['CAND_NAME'] == 'TRUMP, DONALD J. / MICHAEL R. PENCE ', 'CAND_NAME'] = 'TRUMP, DONALD J.'
candidates.loc[candidates['CAND_NAME'] == 'CLINTON, HILLARY RODHAM / TIMOTHY MICHAEL KAINE', 'CAND_NAME'] = 'CLINTON, HILLARY RODHAM'

In [13]:
candidates = candidates.loc[candidates['CAND_NAME'].isin(['TRUMP, DONALD J.', 'CLINTON, HILLARY RODHAM'])]
print(candidates.shape)
candidates.head()

(2, 6)


Unnamed: 0,CAND_ID,CAND_NAME,CAND_PTY_AFFILIATION,CAND_ELECTION_YR,CAND_OFFICE,CAND_STATUS
4367,P00003392,"CLINTON, HILLARY RODHAM",DEM,2016,P,C
6298,P80001571,"TRUMP, DONALD J.",REP,2016,P,C


* We will merge the 'pas_contrib' file with 'candidates' file

In [14]:
contrib_candidates = pd.merge(pas_contrib_up, candidates[['CAND_ID', 'CAND_NAME','CAND_PTY_AFFILIATION',]], on='CAND_ID', how='inner')
print(contrib_candidates.shape)
contrib_candidates.head(5)

(109630, 24)


Unnamed: 0,CMTE_ID,AMNDT_IND,RPT_TP,TRANSACTION_PGI,IMAGE_NUM,TRANSACTION_TP,ENTITY_TP,NAME,CITY,STATE,ZIP_CODE,EMPLOYER,OCCUPATION,TRANSACTION_DT,TRANSACTION_AMT,OTHER_ID,CAND_ID,TRAN_ID,FILE_NUM,MEMO_CD,MEMO_TEXT,SUB_ID,CAND_NAME,CAND_PTY_AFFILIATION
0,C90015454,N,Q3,P2016,201508119000813972,24E,IND,"MARLOWE, MARK ANTHONY",IOWA CITY,IA,52244,,,2015-07-04,2250.0,P80001571,P80001571,F57.000001,1021308,,,4081220151248549144,"TRUMP, DONALD J.",REP
1,C00004036,N,YE,P2016,201601309004951347,24A,ORG,FACEBOOK ADVERTISING,CHICAGO,IL,60693,,,2015-12-15,3698.0,P80001571,P80001571,D362280,1046003,X,,4021020161262352096,"TRUMP, DONALD J.",REP
2,C00004036,N,YE,P2016,201601309004951349,24A,ORG,RISING TIDE INTERACTIVE LLC,WASHINGTON,DC,20015,,,2015-12-15,1240.0,P80001571,P80001571,D362284,1046003,X,,4021020161262352100,"TRUMP, DONALD J.",REP
3,C00004036,N,YE,P2016,201601309004951350,24A,ORG,GOOGLE,MOUNTAIN VIEW,CA,94043,,,2015-12-15,3698.0,P80001571,P80001571,D362290,1046003,X,,4021020161262352102,"TRUMP, DONALD J.",REP
4,C00004036,N,YE,P2016,201601309004951352,24A,ORG,GOOGLE,MOUNTAIN VIEW,CA,94043,,,2015-12-22,48.0,P80001571,P80001571,D362890,1046003,,,4021020161262352106,"TRUMP, DONALD J.",REP


* The third dataset, which will be used is the PAC summary for 2016. Υou can download the data from https://www.fec.gov/files/bulk-downloads/2016/webk16.zip

* This file gives overall receipts and disbursements for each PAC and party committee registered with the commission, along with a breakdown of overall receipts by source and totals for contributions to other committees, independent expenditures made and other information.

In [15]:
pac_headers = ['CMTE_ID', 'CMTE_NM', 'CMTE_TP', 'CMTE_DSGN', 'CMTE_FILING_FREQ', 'TTL_RECEIPTS', 'TRANS_FROM_AFF', 'INDV_CONTRIB',
           'OTHER_POL_CMTE_CONTRIB', 'CAND_CONTRIB', 'CAND_LOANS', 'TTL_LOANS_RECEIVED', 'TTL_DISB', 'TRANF_TO_AFF', 
           'INDV_REFUNDS', 'OTHER_POL_CMTE_REFUNDS', 'CAND_LOAN_REPAY', 'LOAN_REPAY', 'COH_BOP', 'COH_COP', 'DEBTS_OWED_BY',
           'NONFED_TRANS_RECEIVED', 'CONTRIB_TO_OTHER_CMTE', 'IND_EXP', 'PTY_COORD_EXP', 'NONFED_SHARE_EXP', 'CVG_END_DT']
pac_headers

['CMTE_ID',
 'CMTE_NM',
 'CMTE_TP',
 'CMTE_DSGN',
 'CMTE_FILING_FREQ',
 'TTL_RECEIPTS',
 'TRANS_FROM_AFF',
 'INDV_CONTRIB',
 'OTHER_POL_CMTE_CONTRIB',
 'CAND_CONTRIB',
 'CAND_LOANS',
 'TTL_LOANS_RECEIVED',
 'TTL_DISB',
 'TRANF_TO_AFF',
 'INDV_REFUNDS',
 'OTHER_POL_CMTE_REFUNDS',
 'CAND_LOAN_REPAY',
 'LOAN_REPAY',
 'COH_BOP',
 'COH_COP',
 'DEBTS_OWED_BY',
 'NONFED_TRANS_RECEIVED',
 'CONTRIB_TO_OTHER_CMTE',
 'IND_EXP',
 'PTY_COORD_EXP',
 'NONFED_SHARE_EXP',
 'CVG_END_DT']

In [16]:
pac_sum = pd.read_csv('https://www.fec.gov/files/bulk-downloads/2016/webk16.zip', sep="|", index_col=False, names= pac_headers)
print(pac_sum.shape)
pac_sum.head(5)

(12048, 27)


Unnamed: 0,CMTE_ID,CMTE_NM,CMTE_TP,CMTE_DSGN,CMTE_FILING_FREQ,TTL_RECEIPTS,TRANS_FROM_AFF,INDV_CONTRIB,OTHER_POL_CMTE_CONTRIB,CAND_CONTRIB,CAND_LOANS,TTL_LOANS_RECEIVED,TTL_DISB,TRANF_TO_AFF,INDV_REFUNDS,OTHER_POL_CMTE_REFUNDS,CAND_LOAN_REPAY,LOAN_REPAY,COH_BOP,COH_COP,DEBTS_OWED_BY,NONFED_TRANS_RECEIVED,CONTRIB_TO_OTHER_CMTE,IND_EXP,PTY_COORD_EXP,NONFED_SHARE_EXP,CVG_END_DT
0,C00000059,HALLMARK CARDS PAC,Q,U,M,123198.92,0.0,123198.92,0.0,0.0,0.0,0.0,88500.0,0.0,0.0,0.0,0.0,0.0,104794.36,139493.28,0.0,0.0,88500.0,0.0,0.0,0.0,12/31/2016
1,C00000422,AMERICAN MEDICAL ASSOCIATION POLITICAL ACTION ...,Q,B,M,2114478.16,0.0,2099958.16,0.0,0.0,0.0,0.0,2047839.79,1790.0,12621.71,0.0,0.0,0.0,552464.38,619102.75,0.0,0.0,1853000.0,141616.35,0.0,0.0,12/31/2016
2,C00000489,D R I V E POLITICAL FUND CHAPTER 886,N,U,Q,41455.17,41453.0,0.0,0.0,0.0,0.0,0.0,39672.85,0.0,0.0,0.0,0.0,0.0,192.0,1974.0,0.0,0.0,0.0,0.0,0.0,0.0,12/31/2016
3,C00000547,KANSAS MEDICAL SOCIETY POLITICAL ACTION COMMITTEE,Q,U,Q,19065.0,0.0,19065.0,0.0,0.0,0.0,0.0,17592.5,2592.5,0.0,0.0,0.0,0.0,4681.26,6153.76,0.0,0.0,15000.0,0.0,0.0,0.0,12/31/2016
4,C00000638,INDIANA STATE MEDICAL ASSOCIATION POLITICAL AC...,Q,U,Q,143170.0,0.0,142570.0,0.0,0.0,0.0,0.0,85918.48,13222.5,0.0,0.0,0.0,0.0,35530.91,92782.43,0.0,0.0,3000.0,0.0,0.0,0.0,12/31/2016


* We will merge the 'contrib_candidates' file with 'pac_sum'

In [17]:
final = pd.merge(pac_sum, contrib_candidates, on='CMTE_ID', how='inner')
print(final.shape)
final.head(5)

(109616, 50)


Unnamed: 0,CMTE_ID,CMTE_NM,CMTE_TP,CMTE_DSGN,CMTE_FILING_FREQ,TTL_RECEIPTS,TRANS_FROM_AFF,INDV_CONTRIB,OTHER_POL_CMTE_CONTRIB,CAND_CONTRIB,CAND_LOANS,TTL_LOANS_RECEIVED,TTL_DISB,TRANF_TO_AFF,INDV_REFUNDS,OTHER_POL_CMTE_REFUNDS,CAND_LOAN_REPAY,LOAN_REPAY,COH_BOP,COH_COP,DEBTS_OWED_BY,NONFED_TRANS_RECEIVED,CONTRIB_TO_OTHER_CMTE,IND_EXP,PTY_COORD_EXP,NONFED_SHARE_EXP,CVG_END_DT,AMNDT_IND,RPT_TP,TRANSACTION_PGI,IMAGE_NUM,TRANSACTION_TP,ENTITY_TP,NAME,CITY,STATE,ZIP_CODE,EMPLOYER,OCCUPATION,TRANSACTION_DT,TRANSACTION_AMT,OTHER_ID,CAND_ID,TRAN_ID,FILE_NUM,MEMO_CD,MEMO_TEXT,SUB_ID,CAND_NAME,CAND_PTY_AFFILIATION
0,C00000935,DCCC,Y,U,M,220891388.49,6619252.47,147346246.71,38631410.54,0.0,0.0,0.0,216358583.53,26206817.84,2732586.69,14300.0,0.0,13000000.0,2149438.76,6682243.72,14000000.0,0.0,403810.19,80378630.35,3612999.17,0.0,12/31/2016,A,M10,G2016,201706169056600064,24A,ORG,"MOORE CAMPAIGNS, LLC",WASHINGTON,DC,20010,,,2016-09-28,2168.0,P80001571,P80001571,SE-950758,1166132,,DATE OF DISSEMINATION: 09/29/16,4061620171409944958,"TRUMP, DONALD J.",REP
1,C00000935,DCCC,Y,U,M,220891388.49,6619252.47,147346246.71,38631410.54,0.0,0.0,0.0,216358583.53,26206817.84,2732586.69,14300.0,0.0,13000000.0,2149438.76,6682243.72,14000000.0,0.0,403810.19,80378630.35,3612999.17,0.0,12/31/2016,A,M10,G2016,201706169056600065,24A,ORG,"MOORE CAMPAIGNS, LLC",WASHINGTON,DC,20010,,,2016-09-28,836.0,P80001571,P80001571,SE-950759,1166132,,DATE OF DISSEMINATION: 09/29/16,4061620171409944960,"TRUMP, DONALD J.",REP
2,C00000935,DCCC,Y,U,M,220891388.49,6619252.47,147346246.71,38631410.54,0.0,0.0,0.0,216358583.53,26206817.84,2732586.69,14300.0,0.0,13000000.0,2149438.76,6682243.72,14000000.0,0.0,403810.19,80378630.35,3612999.17,0.0,12/31/2016,A,M10,G2016,201706169056600075,24A,ORG,GREAT AMERICAN MEDIA,WASHINGTON,DC,20007,,,2016-09-29,77062.0,P80001571,P80001571,SE-951093,1166132,,DATE OF DISSEMINATION: 10/04/16,4061620171409945000,"TRUMP, DONALD J.",REP
3,C00000935,DCCC,Y,U,M,220891388.49,6619252.47,147346246.71,38631410.54,0.0,0.0,0.0,216358583.53,26206817.84,2732586.69,14300.0,0.0,13000000.0,2149438.76,6682243.72,14000000.0,0.0,403810.19,80378630.35,3612999.17,0.0,12/31/2016,A,M10,G2016,201706169056600084,24A,ORG,GREAT AMERICAN MEDIA,WASHINGTON,DC,20007,,,2016-09-29,108354.0,P80001571,P80001571,SE-950803,1166132,,DATE OF DISSEMINATION: 10/04/16,4061620171409945035,"TRUMP, DONALD J.",REP
4,C00000935,DCCC,Y,U,M,220891388.49,6619252.47,147346246.71,38631410.54,0.0,0.0,0.0,216358583.53,26206817.84,2732586.69,14300.0,0.0,13000000.0,2149438.76,6682243.72,14000000.0,0.0,403810.19,80378630.35,3612999.17,0.0,12/31/2016,A,M10,G2016,201706169056600086,24A,ORG,"RALSTON LAPP MEDIA, LLC",WASHINGTON,DC,20007,,,2016-09-30,6220.0,P80001571,P80001571,SE-950901,1166132,,DATE OF DISSEMINATION: 10/01/16,4061620171409945045,"TRUMP, DONALD J.",REP


*** Check the below ***

In [18]:
final['CAND_NAME'].unique()

array(['TRUMP, DONALD J.', 'CLINTON, HILLARY RODHAM'], dtype=object)

In [19]:
final['TRANSACTION_TP'].unique()

array(['24A', '24E'], dtype=object)

For better perfomance some columns must be excluded from our final dataset. Regarding the 1st question the following columns will make up the final dataset: 

In [20]:
final = final[['CMTE_ID','CMTE_NM','CMTE_TP','TTL_RECEIPTS','OTHER_POL_CMTE_CONTRIB','CAND_CONTRIB','IND_EXP','CAND_ID','CAND_NAME','CAND_PTY_AFFILIATION','ENTITY_TP','NAME','SUB_ID','TRANSACTION_AMT','TRANSACTION_DT','TRANSACTION_PGI','TRANSACTION_TP']]
print(final.shape)
final.sample(5)

(109616, 17)


Unnamed: 0,CMTE_ID,CMTE_NM,CMTE_TP,TTL_RECEIPTS,OTHER_POL_CMTE_CONTRIB,CAND_CONTRIB,IND_EXP,CAND_ID,CAND_NAME,CAND_PTY_AFFILIATION,ENTITY_TP,NAME,SUB_ID,TRANSACTION_AMT,TRANSACTION_DT,TRANSACTION_PGI,TRANSACTION_TP
52908,C90011156,WORKING AMERICA,I,11373645.0,0.0,0.0,0.0,P00003392,"CLINTON, HILLARY RODHAM",DEM,IND,"RUIZ, LUZMEILYN",4101320161340955502,47.0,2016-09-27,G2016,24E
66333,C90011156,WORKING AMERICA,I,11373645.0,0.0,0.0,0.0,P00003392,"CLINTON, HILLARY RODHAM",DEM,IND,"GOIDICH, ALYSSA",4101320161340950735,17.0,2016-09-23,G2016,24E
66803,C90011156,WORKING AMERICA,I,11373645.0,0.0,0.0,0.0,P00003392,"CLINTON, HILLARY RODHAM",DEM,ORG,SUPERIOR BP,4101320161340927727,5.0,2016-09-01,G2016,24E
61259,C90011156,WORKING AMERICA,I,11373645.0,0.0,0.0,0.0,P00003392,"CLINTON, HILLARY RODHAM",DEM,ORG,BUDGET-PITTSBURGH,4101320161340929836,10.0,2016-09-06,G2016,24E
32656,C90011156,WORKING AMERICA,I,11373645.0,0.0,0.0,0.0,P80001571,"TRUMP, DONALD J.",REP,ORG,CANDLEWOOD,4020920171370075482,9.0,2016-08-29,G2016,24A


* Our dataset includes both contributions and refunds.

In [21]:
(final['TRANSACTION_AMT'] > 0).value_counts()

True     109258
False       358
Name: TRANSACTION_AMT, dtype: int64

* We will keep only positive contributions.

In [23]:
final = final[final['TRANSACTION_AMT'] > 0]

* As our main scope is to find the top 5 super PACs, we will filter our data with CMTE_TP = 'O'.  The 'O' indicates the Independent expenditure-only (Super PACs) as shown in documentation https://www.fec.gov/campaign-finance-data/committee-type-code-descriptions/

In [24]:
final_o = final.loc[final['CMTE_TP'].isin(['O'])]
print(final_o.shape)
final_o.sample(5)

(6245, 17)


Unnamed: 0,CMTE_ID,CMTE_NM,CMTE_TP,TTL_RECEIPTS,OTHER_POL_CMTE_CONTRIB,CAND_CONTRIB,IND_EXP,CAND_ID,CAND_NAME,CAND_PTY_AFFILIATION,ENTITY_TP,NAME,SUB_ID,TRANSACTION_AMT,TRANSACTION_DT,TRANSACTION_PGI,TRANSACTION_TP
8093,C00587022,COURAGEOUS CONSERVATIVES PAC,O,424500.21,2500.0,0.0,324961.94,P80001571,"TRUMP, DONALD J.",REP,ORG,MOUNTAINTOP MEDIA,4052020161292980432,250.0,2016-04-29,P2016,24A
108377,C00489799,PLANNED PARENTHOOD VOTES,O,22266898.24,3712020.0,0.0,12628454.91,P80001571,"TRUMP, DONALD J.",REP,ORG,MOXIE MEDIA INC.,4031720171380310471,5000.0,2016-08-24,G2016,24A
107631,C00495861,PRIORITIES USA ACTION,O,192065767.58,30083034.37,0.0,133408056.41,P80001571,"TRUMP, DONALD J.",REP,ORG,"RALSTON LAPP MEDIA, LLC",4111520161347283193,32121.0,2016-06-19,G2016,24A
108861,C00487470,CLUB FOR GROWTH ACTION,O,19936801.97,4661.11,0.0,19182422.19,P80001571,"TRUMP, DONALD J.",REP,ORG,CLUB FOR GROWTH,4112020151257319032,39.0,2015-10-22,P2016,24A
2815,C00497420,CITIZENS UNITED SUPER PAC LLC,O,1142857.09,0.0,0.0,955345.69,P00003392,"CLINTON, HILLARY RODHAM",DEM,ORG,HSP DIRECT LLC,4080420171442569331,5984.0,2016-06-13,G2016,24A


<h3>Hillary Clinton </h3>

In [25]:
adv_clinton = final.loc[final['CAND_NAME'].isin(['CLINTON, HILLARY RODHAM'])].copy()
print(adv_clinton.shape)
adv_clinton.head(5)

(64144, 17)


Unnamed: 0,CMTE_ID,CMTE_NM,CMTE_TP,TTL_RECEIPTS,OTHER_POL_CMTE_CONTRIB,CAND_CONTRIB,IND_EXP,CAND_ID,CAND_NAME,CAND_PTY_AFFILIATION,ENTITY_TP,NAME,SUB_ID,TRANSACTION_AMT,TRANSACTION_DT,TRANSACTION_PGI,TRANSACTION_TP
183,C00002089,COMMUNICATIONS WORKERS OF AMERICA-COPE POLITIC...,Q,7930523.68,0.0,0.0,150.0,P00003392,"CLINTON, HILLARY RODHAM",DEM,ORG,OFFBEAT PRESS,4020620171369713210,150.0,2016-09-26,G2016,24E
184,C00002766,UNITED FOOD AND COMMERCIAL WORKERS INTERNATION...,Q,12747221.51,0.0,0.0,180480.13,P00003392,"CLINTON, HILLARY RODHAM",DEM,,JOE TRIPPI AND ASSOCIATES INC.,4082320161313244264,746.0,2016-07-26,P2016,24E
185,C00002766,UNITED FOOD AND COMMERCIAL WORKERS INTERNATION...,Q,12747221.51,0.0,0.0,180480.13,P00003392,"CLINTON, HILLARY RODHAM",DEM,,JOE TRIPPI AND ASSOCIATES INC.,4082320161313244267,1702.0,2016-07-28,P2016,24E
186,C00002766,UNITED FOOD AND COMMERCIAL WORKERS INTERNATION...,Q,12747221.51,0.0,0.0,180480.13,P00003392,"CLINTON, HILLARY RODHAM",DEM,,JOE TRIPPI AND ASSOCIATES INC.,4082320161313244269,6041.0,2016-07-28,P2016,24E
187,C00002766,UNITED FOOD AND COMMERCIAL WORKERS INTERNATION...,Q,12747221.51,0.0,0.0,180480.13,P00003392,"CLINTON, HILLARY RODHAM",DEM,,JOE TRIPPI AND ASSOCIATES INC.,4082320161313244271,124686.0,2016-07-28,P2016,24E


* Find commitees advocating H.Clinton

In [26]:
adv_clinton = adv_clinton.loc[adv_clinton['TRANSACTION_TP'].isin(['24Ε'])]
adv_clinton = adv_clinton[['CAND_NAME', 'CMTE_NM', 'CMTE_TP','TRANSACTION_TP', 'TRANSACTION_AMT', 'TTL_RECEIPTS', 'IND_EXP', 'NAME']]
print(adv_clinton.shape)
adv_clinton.head(5)

(0, 8)


Unnamed: 0,CAND_NAME,CMTE_NM,CMTE_TP,TRANSACTION_TP,TRANSACTION_AMT,TTL_RECEIPTS,IND_EXP,NAME


In [27]:
opp_clinton = final.loc[final['CAND_NAME'].isin(['TRUMP, DONALD J.'])].copy()
print(opp_clinton.shape)
opp_clinton.head(5)

(45114, 17)


Unnamed: 0,CMTE_ID,CMTE_NM,CMTE_TP,TTL_RECEIPTS,OTHER_POL_CMTE_CONTRIB,CAND_CONTRIB,IND_EXP,CAND_ID,CAND_NAME,CAND_PTY_AFFILIATION,ENTITY_TP,NAME,SUB_ID,TRANSACTION_AMT,TRANSACTION_DT,TRANSACTION_PGI,TRANSACTION_TP
0,C00000935,DCCC,Y,220891388.49,38631410.54,0.0,80378630.35,P80001571,"TRUMP, DONALD J.",REP,ORG,"MOORE CAMPAIGNS, LLC",4061620171409944958,2168.0,2016-09-28,G2016,24A
1,C00000935,DCCC,Y,220891388.49,38631410.54,0.0,80378630.35,P80001571,"TRUMP, DONALD J.",REP,ORG,"MOORE CAMPAIGNS, LLC",4061620171409944960,836.0,2016-09-28,G2016,24A
2,C00000935,DCCC,Y,220891388.49,38631410.54,0.0,80378630.35,P80001571,"TRUMP, DONALD J.",REP,ORG,GREAT AMERICAN MEDIA,4061620171409945000,77062.0,2016-09-29,G2016,24A
3,C00000935,DCCC,Y,220891388.49,38631410.54,0.0,80378630.35,P80001571,"TRUMP, DONALD J.",REP,ORG,GREAT AMERICAN MEDIA,4061620171409945035,108354.0,2016-09-29,G2016,24A
4,C00000935,DCCC,Y,220891388.49,38631410.54,0.0,80378630.35,P80001571,"TRUMP, DONALD J.",REP,ORG,"RALSTON LAPP MEDIA, LLC",4061620171409945045,6220.0,2016-09-30,G2016,24A


* Find committes oppposing H.Clinton

In [28]:
opp_clinton = opp_clinton.loc[opp_clinton['TRANSACTION_TP'].isin(['24A'])]
opp_clinton = opp_clinton[['CAND_NAME', 'CMTE_NM', 'CMTE_TP','TRANSACTION_TP', 'TRANSACTION_AMT', 'TTL_RECEIPTS', 'IND_EXP', 'NAME']]
print(opp_clinton.shape)
opp_clinton.head(5)

(43823, 8)


Unnamed: 0,CAND_NAME,CMTE_NM,CMTE_TP,TRANSACTION_TP,TRANSACTION_AMT,TTL_RECEIPTS,IND_EXP,NAME
0,"TRUMP, DONALD J.",DCCC,Y,24A,2168.0,220891388.49,80378630.35,"MOORE CAMPAIGNS, LLC"
1,"TRUMP, DONALD J.",DCCC,Y,24A,836.0,220891388.49,80378630.35,"MOORE CAMPAIGNS, LLC"
2,"TRUMP, DONALD J.",DCCC,Y,24A,77062.0,220891388.49,80378630.35,GREAT AMERICAN MEDIA
3,"TRUMP, DONALD J.",DCCC,Y,24A,108354.0,220891388.49,80378630.35,GREAT AMERICAN MEDIA
4,"TRUMP, DONALD J.",DCCC,Y,24A,6220.0,220891388.49,80378630.35,"RALSTON LAPP MEDIA, LLC"


* Merge these two datasets for H.Clinton

In [29]:
super_clinton = pd.concat([adv_clinton, opp_clinton])
print(super_clinton.shape)
super_clinton.head()

(43823, 8)


Unnamed: 0,CAND_NAME,CMTE_NM,CMTE_TP,TRANSACTION_TP,TRANSACTION_AMT,TTL_RECEIPTS,IND_EXP,NAME
0,"TRUMP, DONALD J.",DCCC,Y,24A,2168.0,220891388.49,80378630.35,"MOORE CAMPAIGNS, LLC"
1,"TRUMP, DONALD J.",DCCC,Y,24A,836.0,220891388.49,80378630.35,"MOORE CAMPAIGNS, LLC"
2,"TRUMP, DONALD J.",DCCC,Y,24A,77062.0,220891388.49,80378630.35,GREAT AMERICAN MEDIA
3,"TRUMP, DONALD J.",DCCC,Y,24A,108354.0,220891388.49,80378630.35,GREAT AMERICAN MEDIA
4,"TRUMP, DONALD J.",DCCC,Y,24A,6220.0,220891388.49,80378630.35,"RALSTON LAPP MEDIA, LLC"


* Keep seperately the super PACs for H.Clinton

In [30]:
super_clinton_o = super_clinton.loc[super_clinton['CMTE_TP'].isin(['O'])].copy()
print(super_clinton_o.shape)
super_clinton_o.head()

(2371, 8)


Unnamed: 0,CAND_NAME,CMTE_NM,CMTE_TP,TRANSACTION_TP,TRANSACTION_AMT,TTL_RECEIPTS,IND_EXP,NAME
2044,"TRUMP, DONALD J.",WOMEN VOTE!,O,24A,12333.0,36685866.33,33167398.37,"PRECISION NETWORK, LLC"
2045,"TRUMP, DONALD J.",WOMEN VOTE!,O,24A,93219.0,36685866.33,33167398.37,"PRECISION NETWORK, LLC"
2046,"TRUMP, DONALD J.",WOMEN VOTE!,O,24A,673224.0,36685866.33,33167398.37,"PRECISION NETWORK, LLC"
2047,"TRUMP, DONALD J.",WOMEN VOTE!,O,24A,450000.0,36685866.33,33167398.37,"PRECISION NETWORK, LLC"
2048,"TRUMP, DONALD J.",WOMEN VOTE!,O,24A,558000.0,36685866.33,33167398.37,WATERFRONT STRATEGIES


<h3>Donald Trump </h3>

In [31]:
adv_trump = final.loc[final['CAND_NAME'].isin(['TRUMP, DONALD J.'])].copy()
print(adv_trump.shape)
adv_trump.head(5)

(45114, 17)


Unnamed: 0,CMTE_ID,CMTE_NM,CMTE_TP,TTL_RECEIPTS,OTHER_POL_CMTE_CONTRIB,CAND_CONTRIB,IND_EXP,CAND_ID,CAND_NAME,CAND_PTY_AFFILIATION,ENTITY_TP,NAME,SUB_ID,TRANSACTION_AMT,TRANSACTION_DT,TRANSACTION_PGI,TRANSACTION_TP
0,C00000935,DCCC,Y,220891388.49,38631410.54,0.0,80378630.35,P80001571,"TRUMP, DONALD J.",REP,ORG,"MOORE CAMPAIGNS, LLC",4061620171409944958,2168.0,2016-09-28,G2016,24A
1,C00000935,DCCC,Y,220891388.49,38631410.54,0.0,80378630.35,P80001571,"TRUMP, DONALD J.",REP,ORG,"MOORE CAMPAIGNS, LLC",4061620171409944960,836.0,2016-09-28,G2016,24A
2,C00000935,DCCC,Y,220891388.49,38631410.54,0.0,80378630.35,P80001571,"TRUMP, DONALD J.",REP,ORG,GREAT AMERICAN MEDIA,4061620171409945000,77062.0,2016-09-29,G2016,24A
3,C00000935,DCCC,Y,220891388.49,38631410.54,0.0,80378630.35,P80001571,"TRUMP, DONALD J.",REP,ORG,GREAT AMERICAN MEDIA,4061620171409945035,108354.0,2016-09-29,G2016,24A
4,C00000935,DCCC,Y,220891388.49,38631410.54,0.0,80378630.35,P80001571,"TRUMP, DONALD J.",REP,ORG,"RALSTON LAPP MEDIA, LLC",4061620171409945045,6220.0,2016-09-30,G2016,24A


* Find committes advocating for D.Trump

In [32]:
adv_trump = adv_trump.loc[adv_trump['TRANSACTION_TP'].isin(['24E'])]
adv_trump = adv_trump[['CAND_NAME', 'CMTE_NM', 'CMTE_TP','TRANSACTION_TP', 'TRANSACTION_AMT', 'TTL_RECEIPTS', 'IND_EXP', 'NAME']]
print(adv_trump.shape)
adv_trump.sample()

(1291, 8)


Unnamed: 0,CAND_NAME,CMTE_NM,CMTE_TP,TRANSACTION_TP,TRANSACTION_AMT,TTL_RECEIPTS,IND_EXP,NAME
1141,"TRUMP, DONALD J.",NATIONAL RIFLE ASSOCIATION OF AMERICA POLITICA...,Q,24E,145.0,21591111.4,19241228.01,PROLIST INC.


In [33]:
opp_trump = final.loc[final['CAND_NAME'].isin(['CLINTON, HILLARY RODHAM'])].copy()
print(opp_trump.shape)
opp_trump.sample(5)

(64144, 17)


Unnamed: 0,CMTE_ID,CMTE_NM,CMTE_TP,TTL_RECEIPTS,OTHER_POL_CMTE_CONTRIB,CAND_CONTRIB,IND_EXP,CAND_ID,CAND_NAME,CAND_PTY_AFFILIATION,ENTITY_TP,NAME,SUB_ID,TRANSACTION_AMT,TRANSACTION_DT,TRANSACTION_PGI,TRANSACTION_TP
66396,C90011156,WORKING AMERICA,I,11373645.0,0.0,0.0,0.0,P00003392,"CLINTON, HILLARY RODHAM",DEM,IND,"YOUNG, JOHN",4101320161340904932,34.0,2016-07-26,G2016,24E
69354,C90011156,WORKING AMERICA,I,11373645.0,0.0,0.0,0.0,P00003392,"CLINTON, HILLARY RODHAM",DEM,ORG,BUDGET-CLEVELAND,4020920171370153095,12.0,2016-10-27,G2016,24E
74901,C90011156,WORKING AMERICA,I,11373645.0,0.0,0.0,0.0,P00003392,"CLINTON, HILLARY RODHAM",DEM,ORG,BUDGET-CLEVELAND,4020920171370072608,13.0,2016-08-25,G2016,24E
4349,C00544767,STOP HILLARY PAC,V,6843714.67,0.0,0.0,3462965.48,P00003392,"CLINTON, HILLARY RODHAM",DEM,ORG,CAMPAIGN SOLUTIONS,4022220191643428287,14375.0,2016-10-13,G2016,24A
66644,C90011156,WORKING AMERICA,I,11373645.0,0.0,0.0,0.0,P00003392,"CLINTON, HILLARY RODHAM",DEM,IND,"WHITTIER, SHAYOLONDA",4101320161340951581,3.0,2016-09-23,G2016,24E


* Find commitees opposing D.Trump

In [34]:
opp_trump = opp_trump.loc[opp_trump['TRANSACTION_TP'].isin(['24A'])]
opp_trump = opp_trump[['CAND_NAME', 'CMTE_NM', 'CMTE_TP','TRANSACTION_TP', 'TRANSACTION_AMT', 'TTL_RECEIPTS', 'IND_EXP', 'NAME']]
print(opp_trump.shape)
opp_trump.sample()

(5393, 8)


Unnamed: 0,CAND_NAME,CMTE_NM,CMTE_TP,TRANSACTION_TP,TRANSACTION_AMT,TTL_RECEIPTS,IND_EXP,NAME
5919,"CLINTON, HILLARY RODHAM",VIGOP (VIRGIN ISLANDS REPUBLICAN PARTY),Q,24A,24.0,3493514.62,737140.1,DIRECT SUPPORT SERVICES INC


* Merge these two datasets for D.Trump


In [35]:
super_trump = pd.concat([adv_trump, opp_trump])
print(super_trump.shape)
super_trump.head()

(6684, 8)


Unnamed: 0,CAND_NAME,CMTE_NM,CMTE_TP,TRANSACTION_TP,TRANSACTION_AMT,TTL_RECEIPTS,IND_EXP,NAME
344,"TRUMP, DONALD J.",SEIU COPE (SERVICE EMPLOYEES INTERNATIONAL UNI...,Q,24E,4580.0,50264684.02,8550288.58,THE PIVOT GROUP
1068,"TRUMP, DONALD J.",NATIONAL RIFLE ASSOCIATION OF AMERICA POLITICA...,Q,24E,113.0,21591111.4,19241228.01,MAHONING COUNTY AGRICULTURAL SOCIETY
1069,"TRUMP, DONALD J.",NATIONAL RIFLE ASSOCIATION OF AMERICA POLITICA...,Q,24E,7.0,21591111.4,19241228.01,"WISCONSIN FIREARM OWNERS, RANGES, CLUBS AND ED..."
1070,"TRUMP, DONALD J.",NATIONAL RIFLE ASSOCIATION OF AMERICA POLITICA...,Q,24E,205.0,21591111.4,19241228.01,KITTITAS VALLEY EVENT CENTER
1071,"TRUMP, DONALD J.",NATIONAL RIFLE ASSOCIATION OF AMERICA POLITICA...,Q,24E,21.0,21591111.4,19241228.01,"DOWNTOWN EAU CLAIRE, INC."


* Keep seperately the super PACs for D.Trump

In [36]:
super_trump_o = super_trump.loc[super_trump['CMTE_TP'].isin(['O'])].copy()
print(super_trump_o.shape)
super_trump_o.head()

(2000, 8)


Unnamed: 0,CAND_NAME,CMTE_NM,CMTE_TP,TRANSACTION_TP,TRANSACTION_AMT,TTL_RECEIPTS,IND_EXP,NAME
2259,"TRUMP, DONALD J.",TEXAS PATRIOTS PAC,O,24E,86.0,396682.92,87288.87,CONROE COURIER
2349,"TRUMP, DONALD J.",CITIZENS UNITED SUPER PAC LLC,O,24E,20000.0,1142857.09,955345.69,INFOCISION
2350,"TRUMP, DONALD J.",CITIZENS UNITED SUPER PAC LLC,O,24E,10000.0,1142857.09,955345.69,MDI IMAGING & MAILING
2351,"TRUMP, DONALD J.",CITIZENS UNITED SUPER PAC LLC,O,24E,17550.0,1142857.09,955345.69,HSP DIRECT LLC
2352,"TRUMP, DONALD J.",CITIZENS UNITED SUPER PAC LLC,O,24E,773.0,1142857.09,955345.69,STMP


* We will calculate the total amounts for PACs and super-PACs for H.Clinton

In [37]:
super_clinton_party = super_clinton.groupby(['CMTE_NM']).sum().sort_values(by='IND_EXP', ascending=False)
super_clinton_party.head()

Unnamed: 0_level_0,TRANSACTION_AMT,TTL_RECEIPTS,IND_EXP
CMTE_NM,Unnamed: 1_level_1,Unnamed: 2_level_1,Unnamed: 3_level_1
PRIORITIES USA ACTION,126062193.0,32267048953.44,22412553476.88
DCCC,6693341.0,40423124093.67,14709289354.05
CLUB FOR GROWTH ACTION,7054203.0,4884516482.65,4699693436.55
OUR PRINCIPLES PAC,16353117.0,3900105125.25,3757059347.85
HOUSE MAJORITY PAC,2761357.0,4190405342.25,3560282061.0


<h4>The TOP 5 PACs and super-PACs for Hillary Clinton</h4>

In [38]:
super_clinton_party_apply = super_clinton.groupby(['CMTE_NM']).apply(lambda x:x.sort_values(by='TRANSACTION_AMT', 
                             ascending=False).iloc[0])[['TRANSACTION_AMT', 'TTL_RECEIPTS', 'IND_EXP', 'CMTE_TP',]]
super_clinton_party = super_clinton_party_apply.sort_values(by='IND_EXP', ascending=False)
super_clinton_party.head()

Unnamed: 0_level_0,TRANSACTION_AMT,TTL_RECEIPTS,IND_EXP,CMTE_TP
CMTE_NM,Unnamed: 1_level_1,Unnamed: 2_level_1,Unnamed: 3_level_1,Unnamed: 4_level_1
PRIORITIES USA ACTION,10774356.0,192065767.58,133408056.41,O
RIGHT TO RISE USA,19411.0,121695224.05,86817478.31,O
DCCC,467549.0,220891388.49,80378630.35,Y
SENATE MAJORITY PAC,313650.0,92821080.67,75413534.87,O
DSCC,240829.0,179800228.74,60421908.0,Y


* We will calculate the total amounts for PACs and super-PACs for D.Trump

In [39]:
super_trump_party = super_trump.groupby(['CMTE_NM']).sum().sort_values(by='IND_EXP', ascending=False)
super_trump_party.head()

Unnamed: 0_level_0,TRANSACTION_AMT,TTL_RECEIPTS,IND_EXP
CMTE_NM,Unnamed: 1_level_1,Unnamed: 2_level_1,Unnamed: 3_level_1
GREAT AMERICA PAC,23885544.0,11158298772.52,9183640334.99
FUTURE45,24219101.0,5224272688.36,5071184750.83
NATIONAL RIFLE ASSOCIATION OF AMERICA POLITICAL VICTORY FUND,9315576.0,5095502290.4,4540929810.36
TEA PARTY MAJORITY FUND,3602443.0,3687443562.0,2008823245.0
REBUILDING AMERICA NOW,21199098.0,1440607533.34,1208873407.85


<h4>The TOP 5 PACs and super-PACs for Donald Trump</h4>

In [40]:
super_trump_party_apply = super_trump.groupby(['CMTE_NM']).apply(lambda x:x.sort_values(by='TRANSACTION_AMT', 
                             ascending=False).iloc[0])[['TRANSACTION_AMT', 'TTL_RECEIPTS', 'IND_EXP', 'CMTE_TP',]]
super_trump_party = super_trump_party_apply.sort_values(by='IND_EXP', ascending=False)
super_trump_party.head()

Unnamed: 0_level_0,TRANSACTION_AMT,TTL_RECEIPTS,IND_EXP,CMTE_TP
CMTE_NM,Unnamed: 1_level_1,Unnamed: 2_level_1,Unnamed: 3_level_1,Unnamed: 4_level_1
"FREEDOM PARTNERS ACTION FUND, INC.",13400.0,28201407.05,29728850.26,O
FUTURE45,7000000.0,24996520.04,24264041.87,O
GREAT AMERICA PAC,3000000.0,28684572.68,23608329.91,W
REBUILDING AMERICA NOW,1746350.0,23616516.94,19817596.85,O
NATIONAL RIFLE ASSOCIATION OF AMERICA POLITICAL VICTORY FUND,2235223.0,21591111.4,19241228.01,Q


In order to be more precisely to the 1st question, we should show the TOP 5 super PACS for each candidate

<h4>The TOP 5 super PACs for Hillary Clinton</h4>

In [41]:
super_clinton_party_o = super_clinton_o.groupby(['CMTE_NM']).apply(lambda x:x.sort_values(by='TRANSACTION_AMT', 
                             ascending=False).iloc[0])[['TRANSACTION_AMT', 'TTL_RECEIPTS', 'IND_EXP', 'CMTE_TP',]]
superPAC_clinton_party = super_clinton_party_o.sort_values(by='IND_EXP', ascending=False)
superPAC_clinton_party.head()

Unnamed: 0_level_0,TRANSACTION_AMT,TTL_RECEIPTS,IND_EXP,CMTE_TP
CMTE_NM,Unnamed: 1_level_1,Unnamed: 2_level_1,Unnamed: 3_level_1,Unnamed: 4_level_1
PRIORITIES USA ACTION,10774356.0,192065767.58,133408056.41,O
RIGHT TO RISE USA,19411.0,121695224.05,86817478.31,O
SENATE MAJORITY PAC,313650.0,92821080.67,75413534.87,O
CONSERVATIVE SOLUTIONS PAC,1510406.0,60564219.16,55443629.89,O
HOUSE MAJORITY PAC,539240.0,55872071.23,47470427.48,O


<h4>The TOP 5 super PACs for Donald Trump</h4>

In [42]:
super_trump_party_o = super_trump_o.groupby(['CMTE_NM']).apply(lambda x:x.sort_values(by='TRANSACTION_AMT', 
                             ascending=False).iloc[0])[['TRANSACTION_AMT', 'TTL_RECEIPTS', 'IND_EXP', 'CMTE_TP',]]
superPAC_trump_party = super_trump_party_o.sort_values(by='IND_EXP', ascending=False)
superPAC_trump_party.head()

Unnamed: 0_level_0,TRANSACTION_AMT,TTL_RECEIPTS,IND_EXP,CMTE_TP
CMTE_NM,Unnamed: 1_level_1,Unnamed: 2_level_1,Unnamed: 3_level_1,Unnamed: 4_level_1
"FREEDOM PARTNERS ACTION FUND, INC.",13400.0,28201407.05,29728850.26,O
FUTURE45,7000000.0,24996520.04,24264041.87,O
REBUILDING AMERICA NOW,1746350.0,23616516.94,19817596.85,O
CLUB FOR GROWTH ACTION,52.0,19936801.97,19182422.19,O
UNITED WE CAN,4392.0,24206986.36,13734221.79,O


---
<h3>2. Identify the top 10 individual donors (i.e., persons) for each of the two presidential candidates and the amount they spent. In order to do that, you should know that donations are not always to a fundraising committee that can be directly linked to a candidate, but they can be due to other entities such as PACs.</h3>


---


* The first dataset, which will be used is the contributions from individuals for 2016. You can download both header file and data from https://www.fec.gov/files/bulk-downloads/data_dictionaries/indiv_header_file.csv and https://www.fec.gov/files/bulk-downloads/2016/indiv16.zip respectively.

* The file contains contributor names, occupation and employer, address, and contribution amount.


In [43]:
indiv_headers = pd.read_csv('https://www.fec.gov/files/bulk-downloads/data_dictionaries/indiv_header_file.csv')
indiv_headers = indiv_headers.columns.tolist()
indiv_headers

['CMTE_ID',
 'AMNDT_IND',
 'RPT_TP',
 'TRANSACTION_PGI',
 'IMAGE_NUM',
 'TRANSACTION_TP',
 'ENTITY_TP',
 'NAME',
 'CITY',
 'STATE',
 'ZIP_CODE',
 'EMPLOYER',
 'OCCUPATION',
 'TRANSACTION_DT',
 'TRANSACTION_AMT',
 'OTHER_ID',
 'TRAN_ID',
 'FILE_NUM',
 'MEMO_CD',
 'MEMO_TEXT',
 'SUB_ID']

* Before we import the data, we should convert the field TRANSACTION_AMT to float and all the other fields to strings.

In [44]:
indiv_data_types = {header: np.str for header in indiv_headers}
indiv_data_types['TRANSACTION_AMT'] = np.float

* Full data of the file individuals can be read

In [45]:
individuals = pd.read_csv('C:/Users/tzina/Downloads/indiv16/itcont.txt', 
                  sep="|", 
                  index_col=False,
                  names=indiv_headers,
                  dtype=indiv_data_types,
                  parse_dates=['TRANSACTION_DT'])
print(individuals.shape)
individuals.head(5)

(20498303, 21)


Unnamed: 0,CMTE_ID,AMNDT_IND,RPT_TP,TRANSACTION_PGI,IMAGE_NUM,TRANSACTION_TP,ENTITY_TP,NAME,CITY,STATE,ZIP_CODE,EMPLOYER,OCCUPATION,TRANSACTION_DT,TRANSACTION_AMT,OTHER_ID,TRAN_ID,FILE_NUM,MEMO_CD,MEMO_TEXT,SUB_ID
0,C00088591,N,M3,P,15970306895,15,IND,"BURCH, MARY K.",FALLS CHURCH,VA,220424511,NORTHROP GRUMMAN,VP PROGRAM MANAGEMENT,2132015,500.0,,2A8EE0688413416FA735,998834,,,4032020151240885624
1,C00088591,N,M3,P,15970306960,15,IND,"KOUNTZ, DONALD E.",FALLS CHURCH,VA,220424511,NORTHROP GRUMMAN,DIR PROGRAMS,2132015,200.0,,20150211113220-479,998834,,,4032020151240885819
2,C00088591,N,M3,P,15970306960,15,IND,"KOUNTZ, DONALD E.",FALLS CHURCH,VA,220424511,NORTHROP GRUMMAN,DIR PROGRAMS,2272015,200.0,,20150225112333-476,998834,,,4032020151240885820
3,C00088591,N,M3,P,15970306915,15,IND,"DOSHI, NIMISH M.",FALLS CHURCH,VA,220424511,NORTHROP GRUMMAN,VP AND CFO,2132015,200.0,,20150309_2943,998834,,,4032020151240885683
4,C00088591,N,M3,P,15970306915,15,IND,"DOSHI, NIMISH M.",FALLS CHURCH,VA,220424511,NORTHROP GRUMMAN,VP AND CFO,2272015,200.0,,20150224153748-2525,998834,,,4032020151240885684


* Convert TRANSACTION_DT to datetime
* Remove TRANSACTION_DT = '1015-10-05' as we cannot parse the field

In [46]:
individuals = individuals[individuals['TRANSACTION_DT'] != '1015-10-05']

In [47]:
individuals['TRANSACTION_DT']= pd.to_datetime(individuals['TRANSACTION_DT'], format='%m%d%Y', errors = 'coerce')

* We will keep only the positive contributions.

In [48]:
individuals = individuals[individuals['TRANSACTION_AMT'] > 0]

* For committe master you can download both header file and data from https://www.fec.gov/data/browse-data/files/bulk-downloads/data_dictionaries/cm_header_file.csv and https://www.fec.gov/files/bulk-downloads/2016/cm16.zip respectively.


In [49]:
cm_headers =  pd.read_csv('https://www.fec.gov/data/browse-data/files/bulk-downloads/data_dictionaries/cm_header_file.csv')
cm_headers = cm_headers.columns.tolist()
cm_headers

['CMTE_ID',
 'CMTE_NM',
 'TRES_NM',
 'CMTE_ST1',
 'CMTE_ST2',
 'CMTE_CITY',
 'CMTE_ST',
 'CMTE_ZIP',
 'CMTE_DSGN',
 'CMTE_TP',
 'CMTE_PTY_AFFILIATION',
 'CMTE_FILING_FREQ',
 'ORG_TP',
 'CONNECTED_ORG_NM',
 'CAND_ID']

* We are ready to read the data.

In [50]:
committes = pd.read_csv('https://www.fec.gov/files/bulk-downloads/2016/cm16.zip', 
                  sep="|", 
                  index_col=False,
                  names=cm_headers)
print(committes.shape)
committes.head(5)

(17651, 15)


Unnamed: 0,CMTE_ID,CMTE_NM,TRES_NM,CMTE_ST1,CMTE_ST2,CMTE_CITY,CMTE_ST,CMTE_ZIP,CMTE_DSGN,CMTE_TP,CMTE_PTY_AFFILIATION,CMTE_FILING_FREQ,ORG_TP,CONNECTED_ORG_NM,CAND_ID
0,C00000059,HALLMARK CARDS PAC,ERIN BROWER,2501 MCGEE,MD#288,KANSAS CITY,MO,64108,U,Q,UNK,M,C,,
1,C00000422,AMERICAN MEDICAL ASSOCIATION POLITICAL ACTION ...,"WALKER, KEVIN","25 MASSACHUSETTS AVE, NW",SUITE 600,WASHINGTON,DC,20001,B,Q,,M,M,AMERICAN MEDICAL ASSOCIATION,
2,C00000489,D R I V E POLITICAL FUND CHAPTER 886,TOM RITTER,3528 W RENO,,OKLAHOMA CITY,OK,73107,U,N,,Q,L,TEAMSTERS LOCAL UNION 886,
3,C00000547,KANSAS MEDICAL SOCIETY POLITICAL ACTION COMMITTEE,"C. RICHARD BONEBRAKE, M.D.",623 SW 10TH AVE,,TOPEKA,KS,66612,U,Q,UNK,Q,T,,
4,C00000638,INDIANA STATE MEDICAL ASSOCIATION POLITICAL AC...,"VIDYA KORA, M.D.","322 CANAL WALK, CANAL LEVEL",,INDIANAPOLIS,IN,46202,U,Q,,Q,M,,


* Merge individuals with committes

In [52]:
fec = pd.merge(individuals, 
               committes[['CMTE_ID', 'CMTE_NM', 'CAND_ID']], 
               on='CMTE_ID', 
               how='inner')
print(fec.shape)
fec.head(5)

(20114561, 23)


Unnamed: 0,CMTE_ID,AMNDT_IND,RPT_TP,TRANSACTION_PGI,IMAGE_NUM,TRANSACTION_TP,ENTITY_TP,NAME,CITY,STATE,ZIP_CODE,EMPLOYER,OCCUPATION,TRANSACTION_DT,TRANSACTION_AMT,OTHER_ID,TRAN_ID,FILE_NUM,MEMO_CD,MEMO_TEXT,SUB_ID,CMTE_NM,CAND_ID
0,C00088591,N,M3,P,15970306895,15,IND,"BURCH, MARY K.",FALLS CHURCH,VA,220424511,NORTHROP GRUMMAN,VP PROGRAM MANAGEMENT,2015-02-13,500.0,,2A8EE0688413416FA735,998834,,,4032020151240885624,EMPLOYEES OF NORTHROP GRUMMAN CORPORATION PAC,
1,C00088591,N,M3,P,15970306960,15,IND,"KOUNTZ, DONALD E.",FALLS CHURCH,VA,220424511,NORTHROP GRUMMAN,DIR PROGRAMS,2015-02-13,200.0,,20150211113220-479,998834,,,4032020151240885819,EMPLOYEES OF NORTHROP GRUMMAN CORPORATION PAC,
2,C00088591,N,M3,P,15970306960,15,IND,"KOUNTZ, DONALD E.",FALLS CHURCH,VA,220424511,NORTHROP GRUMMAN,DIR PROGRAMS,2015-02-27,200.0,,20150225112333-476,998834,,,4032020151240885820,EMPLOYEES OF NORTHROP GRUMMAN CORPORATION PAC,
3,C00088591,N,M3,P,15970306915,15,IND,"DOSHI, NIMISH M.",FALLS CHURCH,VA,220424511,NORTHROP GRUMMAN,VP AND CFO,2015-02-13,200.0,,20150309_2943,998834,,,4032020151240885683,EMPLOYEES OF NORTHROP GRUMMAN CORPORATION PAC,
4,C00088591,N,M3,P,15970306915,15,IND,"DOSHI, NIMISH M.",FALLS CHURCH,VA,220424511,NORTHROP GRUMMAN,VP AND CFO,2015-02-27,200.0,,20150224153748-2525,998834,,,4032020151240885684,EMPLOYEES OF NORTHROP GRUMMAN CORPORATION PAC,


* We should merge the fec dataset with candidates from previous question

In [53]:
fec = pd.merge(fec, 
               candidates[['CAND_ID', 'CAND_NAME']], 
               on='CAND_ID',
               how='inner')
print(fec.shape)
fec.head(5)

(2662041, 24)


Unnamed: 0,CMTE_ID,AMNDT_IND,RPT_TP,TRANSACTION_PGI,IMAGE_NUM,TRANSACTION_TP,ENTITY_TP,NAME,CITY,STATE,ZIP_CODE,EMPLOYER,OCCUPATION,TRANSACTION_DT,TRANSACTION_AMT,OTHER_ID,TRAN_ID,FILE_NUM,MEMO_CD,MEMO_TEXT,SUB_ID,CMTE_NM,CAND_ID,CAND_NAME
0,C00575795,A,Q2,P2016,201509039001600101,15,IND,"CLARY, CAROLE",DESTIN,FL,325405205,"CPR OF DESTIN, INC.",VP/DIRECTOR OF OPERATIONS,2015-06-15,50.0,,C240090,1024052,,,4090920151249600958,HILLARY FOR AMERICA,P00003392,"CLINTON, HILLARY RODHAM"
1,C00575795,A,Q2,P2016,201509039001600101,15,IND,"FEW, MELANIE",DALLAS,TX,752011123,ACE CASH EXPRESS,MARKETING,2015-04-12,5.0,,C17450,1024052,,,4090920151249600959,HILLARY FOR AMERICA,P00003392,"CLINTON, HILLARY RODHAM"
2,C00575795,A,Q2,P2016,201509039001600101,15,IND,"SPURLOCK, MICHAEL",NASHVILLE,TN,372036615,,NOT EMPLOYED,2015-06-13,50.0,,C234270,1024052,,,4090920151249600960,HILLARY FOR AMERICA,P00003392,"CLINTON, HILLARY RODHAM"
3,C00575795,A,Q2,P2016,201509039001600102,15,IND,"ELIAS, JOHN",WASHINGTON,DC,200053743,U.S. DEPARTMENT OF JUSTICE,ATTORNEY,2015-06-11,100.0,,C223180,1024052,,,4090920151249600961,HILLARY FOR AMERICA,P00003392,"CLINTON, HILLARY RODHAM"
4,C00575795,A,Q2,P2016,201509039001600102,15,IND,"SHIFTON, CONSTANCE",CHERRY HILL,NJ,80031314,SELF-EMPLOYED,EDUCATIONAL ADVISOR,2015-06-12,50.0,,C229360,1024052,,,4090920151249600962,HILLARY FOR AMERICA,P00003392,"CLINTON, HILLARY RODHAM"


* Check the unique candidates

In [54]:
fec['CAND_NAME'].unique()

array(['CLINTON, HILLARY RODHAM', 'TRUMP, DONALD J.'], dtype=object)

In [55]:
fec['ENTITY_TP'].unique()

array(['IND', 'CAN', 'ORG', 'PAC', 'COM'], dtype=object)

* From ENTITY_TP keep only the indivinduals

In [56]:
fec = fec.loc[fec['ENTITY_TP'].isin(['IND'])]

* Create the dataframe for H.Clinton

In [57]:
f_clinton = fec.loc[fec['CAND_NAME'] == 'CLINTON, HILLARY RODHAM']
f_clinton = f_clinton[['CMTE_ID', 'TRANSACTION_TP', 'NAME', 'ENTITY_TP','TRANSACTION_AMT', 'CAND_NAME']]
print(f_clinton.shape)
f_clinton.head(5)

(2514936, 6)


Unnamed: 0,CMTE_ID,TRANSACTION_TP,NAME,ENTITY_TP,TRANSACTION_AMT,CAND_NAME
0,C00575795,15,"CLARY, CAROLE",IND,50.0,"CLINTON, HILLARY RODHAM"
1,C00575795,15,"FEW, MELANIE",IND,5.0,"CLINTON, HILLARY RODHAM"
2,C00575795,15,"SPURLOCK, MICHAEL",IND,50.0,"CLINTON, HILLARY RODHAM"
3,C00575795,15,"ELIAS, JOHN",IND,100.0,"CLINTON, HILLARY RODHAM"
4,C00575795,15,"SHIFTON, CONSTANCE",IND,50.0,"CLINTON, HILLARY RODHAM"


Rename field CAND_NAME

In [58]:
occ_mapping = {
    'TRUMP, DONALD J.': 'CLINTON, HILLARY RODHAM'
}

f = lambda x: occ_mapping.get(x, x) 
super_clinton.loc[:, 'CAND_NAME'] = super_clinton.loc[:, 'CAND_NAME'].map(f)

* Create the dataframe for D.Trump

In [59]:
f_trump = fec.loc[fec['CAND_NAME'] == 'TRUMP, DONALD J.']
f_trump = f_trump[['CMTE_ID', 'TRANSACTION_TP', 'NAME', 'ENTITY_TP','TRANSACTION_AMT', 'CAND_NAME']]
print(f_trump.shape)
f_trump.head(5)

(146846, 6)


Unnamed: 0,CMTE_ID,TRANSACTION_TP,NAME,ENTITY_TP,TRANSACTION_AMT,CAND_NAME
2515042,C00580100,15,"BONNETT, ELLIOTT",IND,60.0,"TRUMP, DONALD J."
2515043,C00580100,15,"FISHER, SANDI",IND,100.0,"TRUMP, DONALD J."
2515044,C00580100,15,"FLAHART, MICHAEL",IND,50.0,"TRUMP, DONALD J."
2515045,C00580100,15,"FLANAGAN, DAVID",IND,162.0,"TRUMP, DONALD J."
2515046,C00580100,15,"LANDES, PATRICIA",IND,10.0,"TRUMP, DONALD J."


Rename field CAND_NAME

In [60]:
occ_mapping = {
    'CLINTON, HILLARY RODHAM': 'TRUMP, DONALD J.'
}

f = lambda x: occ_mapping.get(x, x) 
super_trump.loc[:, 'CAND_NAME'] = super_trump.loc[:, 'CAND_NAME'].map(f)

* Merge the new dataset of H.Clinton with the dataset created in question 1 (including the transaction amounts of PACs and superPACs)

In [61]:
total_clinton = pd.concat([super_clinton, f_clinton])
print(total_clinton.shape)
total_clinton.sample(5)

(2558759, 10)


Unnamed: 0,CAND_NAME,CMTE_NM,CMTE_TP,TRANSACTION_TP,TRANSACTION_AMT,TTL_RECEIPTS,IND_EXP,NAME,CMTE_ID,ENTITY_TP
969812,"CLINTON, HILLARY RODHAM",,,15,50.0,,,"KAPLAN, HOWARD",C00575795,IND
125773,"CLINTON, HILLARY RODHAM",,,15,25.0,,,"SOLOMON, ELIOT",C00575795,IND
1628359,"CLINTON, HILLARY RODHAM",,,15,38.0,,,"SPRIGGS, LUCY",C00575795,IND
2243373,"CLINTON, HILLARY RODHAM",,,15,38.0,,,"MCCORMICK, PATRICIA",C00575795,IND
1626003,"CLINTON, HILLARY RODHAM",,,15,100.0,,,"MCCULLOUGH, CATHERINE",C00575795,IND


* Merge the new dataset of D.Trump with the dataset created in question 1 (including the transaction amounts of PACs and superPACs)

In [62]:
total_trump = pd.concat([super_trump, f_trump])
print(total_trump.shape)
total_trump.sample(5)

(153530, 10)


Unnamed: 0,CAND_NAME,CMTE_NM,CMTE_TP,TRANSACTION_TP,TRANSACTION_AMT,TTL_RECEIPTS,IND_EXP,NAME,CMTE_ID,ENTITY_TP
2539937,"TRUMP, DONALD J.",,,15,250.0,,,"BOYD, ANTHONY",C00580100,IND
2550618,"TRUMP, DONALD J.",,,15,58.0,,,"POWERS, ALLISON",C00580100,IND
2575606,"TRUMP, DONALD J.",,,15,146.0,,,"FLERCHINGER, JOHN",C00580100,IND
2640335,"TRUMP, DONALD J.",,,15,35.0,,,"BARDON, DAVID",C00580100,IND
2572874,"TRUMP, DONALD J.",,,15,250.0,,,"PAUL, CASTANEDO",C00580100,IND


<h4>The TOP 10 donors of Hillary Clinton </h4>

In [63]:
donors_clinton = total_clinton.groupby(['NAME']).sum().sort_values(by='TRANSACTION_AMT', ascending=False)
donors_clinton.head(10)

Unnamed: 0_level_0,TRANSACTION_AMT,TTL_RECEIPTS,IND_EXP
NAME,Unnamed: 1_level_1,Unnamed: 2_level_1,Unnamed: 3_level_1
TARGETED PLATFORM MEDIA LLC,99211787.0,6598946725.53,4592783612.14
PRECISION NETWORK LLC,18551753.0,6340310089.57,4404039072.65
BLUEWEST MEDIA,11075368.0,211722039.04,209019623.2
WATERFRONT STRATEGIES,7819657.0,3805367191.84,2879275710.91
ADELSTEIN & ASSOCIATES,6807764.0,73057250.0,69454192.5
FUSE,6189832.0,4705913835.32,3262541491.8
GCW MEDIA SERVICES,5591365.0,285373545.75,274906781.55
"TARGET ENTERPRISES, LLC",5260498.0,1938055013.12,1774196156.48
"RED SEA, LLC",4557817.0,159494415.76,153459377.52
DDC ADVOCACY,4417075.0,2016639723.3,1942674589.62


<h4>The TOP 10 donors Donald Trump</h4>

In [64]:
trump_donors = total_trump.groupby(['NAME']).sum().sort_values(by='TRANSACTION_AMT', ascending=False)
trump_donors.head(10)

Unnamed: 0_level_0,TRANSACTION_AMT,TTL_RECEIPTS,IND_EXP
NAME,Unnamed: 1_level_1,Unnamed: 2_level_1,Unnamed: 3_level_1
MULTI MEDIA SERVICES CORPORATION,18980619.0,614029440.44,515257518.1
DEL CIELO MEDIA,17825000.0,174975640.28,169848293.09
STARBOARD STRATEGIC,13842906.0,0.0,0.0
"STARBOARD STRATEGIC, INC.",8936541.0,561368896.4,500271928.26
RAPID RESPONSE TELEVISION LLC,5868879.0,1948262654.92,1508339175.86
INFOCISION MANAGEMENT CORP,5202469.0,3772561325.79,2071131597.46
TANGIBLE MEDIA INC,5000000.0,57369145.36,47216659.82
TARGET ENTERPRISES LLC,4423479.0,250598928.14,59691232.37
FRONTLINE STRATEGIES & MEDIA,4316381.0,286845726.8,236083299.1
NONBOX,4058683.0,117371754.11,94512753.27


---
<h3>3. Investigate the chronological evolution of the contributions made to and the expenditures made by the campaigns</h3>

---

* For operating expenditures and disbursements you can download both header file and data from https://www.fec.gov/files/bulk-downloads/data_dictionaries/oppexp_header_file.csv and https://www.fec.gov/files/bulk-downloads/2016/oppexp16.zip respectively.

In [109]:
opper_headers = pd.read_csv('https://www.fec.gov/files/bulk-downloads/data_dictionaries/oppexp_header_file.csv')
opper_headers = opper_headers.columns.tolist()
opper_headers

['CMTE_ID',
 'AMNDT_IND',
 'RPT_YR',
 'RPT_TP',
 'IMAGE_NUM',
 'LINE_NUM',
 'FORM_TP_CD',
 'SCHED_TP_CD',
 'NAME',
 'CITY',
 'STATE',
 'ZIP_CODE',
 'TRANSACTION_DT',
 'TRANSACTION_AMT',
 'TRANSACTION_PGI',
 'PURPOSE',
 'CATEGORY',
 'CATEGORY_DESC',
 'MEMO_CD',
 'MEMO_TEXT',
 'ENTITY_TP',
 'SUB_ID',
 'FILE_NUM',
 'TRAN_ID',
 'BACK_REF_TRAN_ID']

* Convert type of TRANSACTION_AMT to float

In [110]:
opper_data_types = {header:np.str for header in opper_headers}
opper_data_types['TRANSACTION_AMT'] = np.float

* The full data of opperating_expenditures can be read

In [116]:
opper_exp = pd.read_csv('https://www.fec.gov/files/bulk-downloads/2016/oppexp16.zip', 
                        sep="|", 
                        index_col=False,
                        names=opper_headers,
                        dtype=opper_data_types)
print(opper_exp.shape)
opper_exp.head(5)

(1749832, 25)


Unnamed: 0,CMTE_ID,AMNDT_IND,RPT_YR,RPT_TP,IMAGE_NUM,LINE_NUM,FORM_TP_CD,SCHED_TP_CD,NAME,CITY,STATE,ZIP_CODE,TRANSACTION_DT,TRANSACTION_AMT,TRANSACTION_PGI,PURPOSE,CATEGORY,CATEGORY_DESC,MEMO_CD,MEMO_TEXT,ENTITY_TP,SUB_ID,FILE_NUM,TRAN_ID,BACK_REF_TRAN_ID
0,C00415182,N,2015,Q1,15951142498,17,F3,SB,CHASE CARDMEMBER SERVICE,WILMINGTON,DE,198865153,02/21/2015,3301.24,P2016,CREDIT CARD PAYMENT,1,Administrative/Salary/Overhead Expenses,,,ORG,4041520151241882404,1002978,VN81E9TS8X8,
1,C00415182,N,2015,Q1,15951142495,17,F3,SB,GOOGLE INC.,SAN FRANCISCO,CA,941390001,01/21/2015,56.42,P2016,INTERNET SERVICE,1,Administrative/Salary/Overhead Expenses,X,*,ORG,4041520151241882396,1002978,VN81E9TQB00,VN81E9TQAP1
2,C00415182,N,2015,Q1,15951142495,17,F3,SB,LINKEDIN CORPORATION,MOUNTAIN VIEW,CA,94043,01/21/2015,49.95,P2016,INTERNET SERVICE,1,Administrative/Salary/Overhead Expenses,X,*,ORG,4041520151241882397,1002978,VN81E9TQB76,VN81E9TQAP1
3,C00415182,N,2015,Q1,15951142496,17,F3,SB,NATIONBUILDER,LOS ANGELES,CA,900131155,01/21/2015,99.0,P2016,INTERNET SERVICE,1,Administrative/Salary/Overhead Expenses,X,*,ORG,4041520151241882398,1002978,VN81E9TS9X1,VN81E9TQAP1
4,C00415182,N,2015,Q1,15951142496,17,F3,SB,"NGP VAN, INC.",WASHINGTON,DC,20005,01/21/2015,2000.0,P2016,INTERNET SERVICE,1,Administrative/Salary/Overhead Expenses,X,*,ORG,4041520151241882399,1002978,VN81E9TQAX7,VN81E9TQAP1


* We will keep only positive contributions.

In [117]:
opper_exp = opper_exp.loc[opper_exp['TRANSACTION_AMT'] > 0]

* Merge the above dataset with committes, so as to have the name of each committe

In [118]:
opper_exp = pd.merge(opper_exp, 
                     committes[['CMTE_ID', 'CMTE_NM', 'CAND_ID']], 
                     on='CMTE_ID', 
                     how='inner')
print(opper_exp.shape)
opper_exp.sample(5)

(1732174, 27)


Unnamed: 0,CMTE_ID,AMNDT_IND,RPT_YR,RPT_TP,IMAGE_NUM,LINE_NUM,FORM_TP_CD,SCHED_TP_CD,NAME,CITY,STATE,ZIP_CODE,TRANSACTION_DT,TRANSACTION_AMT,TRANSACTION_PGI,PURPOSE,CATEGORY,CATEGORY_DESC,MEMO_CD,MEMO_TEXT,ENTITY_TP,SUB_ID,FILE_NUM,TRAN_ID,BACK_REF_TRAN_ID,CMTE_NM,CAND_ID
33841,C00498121,N,2015,Q2,201507159000158893,17,F3,SB,AT&T,ATLANTA,GA,303485414,06/05/2015,150.0,,TELEPHONE SERVICE,,,X,,ORG,4071620151247283014,1015061,SB17.10849.1,SB17.10849,ROGER WILLIAMS FOR U S CONGRESS COMMITTEE,H2TX33040
602861,C00549477,A,2015,YE,201603239011948444,17,F3,SB,IMPACT MAILING SERVICE,CHARLOTTE,NC,28273,11/17/2015,3733.62,,JFC DIRECT MARKETING,1.0,Administrative/Salary/Overhead Expenses,X,,ORG,4032320161277043904,1057102,SB17.5056.0,SB17.5056,THE PITTENGER VICTORY FUND,H2NC09134
1388235,C00431601,N,2015,YE,201601299004770712,21B,F3X,SB,"TRILOGY INTERACTIVE, LLC",MOUNTAIN VIEW,CA,94040,08/03/2015,302.0,,INTERNET HOSTING,,,,,ORG,4020220161261402807,1044862,D468404,,OCEANS PAC,
1092098,C00575795,A,2016,M7,201609189030921736,23,F3P,SB,EXPEDIA,BELLEVUE,WA,980045703,05/18/2016,168.37,P2016,TRAVEL,,,X,,ORG,4092020161317302394,1099613,D234089,D215250,HILLARY FOR AMERICA,P00003392
643137,C00540500,N,2016,Q1,201604150200105945,17,F3,SB,BUFFER,SAN FRANCISCO,CA,941071872,01/28/2016,510.0,P,ONLINE ADVERTISEMENTS,,,X,MEMO ITEM,,2042920161284602520,1066842,SB0427165945944,,CORY BOOKER FOR SENATE,S4NJ00185


In [119]:
opper_exp = opper_exp[['CMTE_ID','NAME','TRANSACTION_DT','TRANSACTION_AMT','PURPOSE','CMTE_NM','STATE','CAND_ID']]
print(opper_exp.shape)
opper_exp.head(5)

(1732174, 8)


Unnamed: 0,CMTE_ID,NAME,TRANSACTION_DT,TRANSACTION_AMT,PURPOSE,CMTE_NM,STATE,CAND_ID
0,C00415182,CHASE CARDMEMBER SERVICE,02/21/2015,3301.24,CREDIT CARD PAYMENT,FRIENDS OF JOHN SARBANES,DE,H6MD03292
1,C00415182,GOOGLE INC.,01/21/2015,56.42,INTERNET SERVICE,FRIENDS OF JOHN SARBANES,CA,H6MD03292
2,C00415182,LINKEDIN CORPORATION,01/21/2015,49.95,INTERNET SERVICE,FRIENDS OF JOHN SARBANES,CA,H6MD03292
3,C00415182,NATIONBUILDER,01/21/2015,99.0,INTERNET SERVICE,FRIENDS OF JOHN SARBANES,CA,H6MD03292
4,C00415182,"NGP VAN, INC.",01/21/2015,2000.0,INTERNET SERVICE,FRIENDS OF JOHN SARBANES,DC,H6MD03292


* Merge the above merged dataset with candidates H.Clinton and D.Trump

In [120]:
opper_exp = pd.merge(opper_exp, 
                     candidates[['CAND_ID', 'CAND_NAME']], 
                     on='CAND_ID', 
                     how='inner')
print(opper_exp.shape)
opper_exp.sample(5)

(171866, 9)


Unnamed: 0,CMTE_ID,NAME,TRANSACTION_DT,TRANSACTION_AMT,PURPOSE,CMTE_NM,STATE,CAND_ID,CAND_NAME
127717,C00575795,TZELL TRAVEL GROUP,01/29/2016,10.0,TRAVEL,HILLARY FOR AMERICA,NY,P00003392,"CLINTON, HILLARY RODHAM"
146552,C00575795,AMERICAN EXPRESS,03/03/2016,30.0,TRAVEL,HILLARY FOR AMERICA,NJ,P00003392,"CLINTON, HILLARY RODHAM"
73901,C00575795,FEDEX,11/10/2016,220.57,EVENT SUPPLIES,HILLARY FOR AMERICA,TN,P00003392,"CLINTON, HILLARY RODHAM"
163002,C00575795,"KULES, AMANDA",06/09/2016,500.0,TRAVEL AND SUBSISTENCE,HILLARY FOR AMERICA,DC,P00003392,"CLINTON, HILLARY RODHAM"
125119,C00575795,AMERICAN EXPRESS,06/05/2016,9.0,TRAVEL,HILLARY FOR AMERICA,NJ,P00003392,"CLINTON, HILLARY RODHAM"


In [121]:
opper_exp['CAND_NAME'].unique()

array(['TRUMP, DONALD J.', 'CLINTON, HILLARY RODHAM'], dtype=object)

* Convert TRANSACTION_DT field to datatime, so as to combine with the below

In [122]:
opper_exp['TRANSACTION_DT']= pd.to_datetime(opper_exp['TRANSACTION_DT'])

* Although the combine dataset with contributions from committes and candidates H.Clinton and D.Trump already exist, it has only TRANSACTIONS_TPs 24A and 24E. We will reproduce this dataset from the beginning

* Check for refunds and keep only the positive contributions

* Revert the type of TRANSACTION_DT to datetime

In [123]:
pas_contrib_new = pd.merge(pas_contrib, candidates[['CAND_ID', 'CAND_NAME']], on='CAND_ID', how='inner')
pas_contrib_new['TRANSACTION_DT'] = pd.to_datetime(pas_contrib_new['TRANSACTION_DT'])
print(pas_contrib_new.shape)
pas_contrib_new.sample(5)

(111409, 23)


Unnamed: 0,CMTE_ID,AMNDT_IND,RPT_TP,TRANSACTION_PGI,IMAGE_NUM,TRANSACTION_TP,ENTITY_TP,NAME,CITY,STATE,ZIP_CODE,EMPLOYER,OCCUPATION,TRANSACTION_DT,TRANSACTION_AMT,OTHER_ID,CAND_ID,TRAN_ID,FILE_NUM,MEMO_CD,MEMO_TEXT,SUB_ID,CAND_NAME
49441,C90011156,N,YE,G2016,201701319042181626,24E,IND,"POND, NATHANIEL",PORTLAND,OR,972363342,,,2016-11-05,49.0,P00003392,P00003392,VN7CZA7EPM0,1144686,,,4020920171370173618,"CLINTON, HILLARY RODHAM"
33868,C90011156,N,YE,G2016,201701319042153164,24E,ORG,BUDGET-GREENSBORO,GREENSBORO,NC,274074619,,,2016-09-10,7.0,P00003392,P00003392,VN7CZA37ZN2,1144686,,,4020920171370088233,"CLINTON, HILLARY RODHAM"
106525,C90011156,N,YE,G2016,201701319042167084,24A,IND,"FREDMAN, JOSHUA",SEATTLE,WA,981051745,,,2016-10-13,4.0,P80001571,P80001571,VN7CZA5AQX4,1144686,,,4020920171370129991,"TRUMP, DONALD J."
48405,C90011156,N,YE,G2016,201701319042177928,24E,IND,"WALKER, AMBER",PITTSBURGH,PA,152182527,,,2016-10-31,21.0,P00003392,P00003392,VN7CZA6XB95,1144686,,,4020920171370162524,"CLINTON, HILLARY RODHAM"
44003,C90011156,N,YE,G2016,201701319042142134,24E,IND,"REPPERT, TARAH",PHILADELPHIA,PA,191191508,,,2016-07-18,89.0,P00003392,P00003392,VN7CZA1RQ34,1144686,,,4020920171370055143,"CLINTON, HILLARY RODHAM"


In [124]:
(pas_contrib_new['TRANSACTION_AMT']>0).value_counts()

True     111024
False       385
Name: TRANSACTION_AMT, dtype: int64

In [125]:
pas_contrib_new = pas_contrib_new.loc[pas_contrib_new['TRANSACTION_AMT'] > 0]

In [126]:
pas_contrib_new['TRANSACTION_TP'].unique()

array(['24K', '24A', '24E', '24Z', '24F', '24C', '24N'], dtype=object)

Let's have a brief summary for the above TRANSACTION_TPs from https://www.fec.gov/campaign-finance-data/transaction-type-code-descriptions/

* 24K: Contribution made to nonaffiliated committee
* 24A: Independent expenditure opposing election of candidate
* 24E: Independent expenditure advocating election of candidate
* 24Z: In-kind contribution made to registered filer
* 24F: Communication cost for candidate (only for Form 7 filer)
* 24C: Coordinated party expenditure
* 24N: Communication cost against candidate (only for Form 7 filer)

The types 24Z and 24C will be executed as they are irrelevant with money.

In [127]:
pas_contrib_new = pas_contrib_new[pas_contrib_new['TRANSACTION_TP'].isin(['24K', '24A', '24E', '24F', '24N'])]
print(pas_contrib_new.shape)
pas_contrib_new.sample(5)

(110795, 23)


Unnamed: 0,CMTE_ID,AMNDT_IND,RPT_TP,TRANSACTION_PGI,IMAGE_NUM,TRANSACTION_TP,ENTITY_TP,NAME,CITY,STATE,ZIP_CODE,EMPLOYER,OCCUPATION,TRANSACTION_DT,TRANSACTION_AMT,OTHER_ID,CAND_ID,TRAN_ID,FILE_NUM,MEMO_CD,MEMO_TEXT,SUB_ID,CAND_NAME
87836,C90011156,N,YE,G2016,201701319042174616,24A,IND,"TOTTEN, ALI",REYNOLDSBURG,OH,430684253,,,2016-10-26,34.0,P80001571,P80001571,VN7CZA6EFR4,1144686,,,4020920171370152587,"TRUMP, DONALD J."
36277,C90011156,N,YE,G2016,201701319042149509,24E,IND,"MCCAIN, HOPE",CLEVELAND,OH,441102525,,,2016-08-30,34.0,P00003392,P00003392,VN7CZA2RXY1,1144686,,,4020920171370077268,"CLINTON, HILLARY RODHAM"
30325,C90011156,N,YE,G2016,201701319042160264,24E,IND,"HAMILTON, MAURICE",DURHAM,NC,277034448,,,2016-09-28,17.0,P00003392,P00003392,VN7CZA45X00,1144686,,,4020920171370109533,"CLINTON, HILLARY RODHAM"
38611,C90011156,N,YE,G2016,201701319042142447,24E,IND,"GARRETT, DEANNA",FORT THOMAS,KY,410752520,,,2016-07-21,27.0,P00003392,P00003392,VN7CZA1TXQ0,1144686,,,4020920171370056082,"CLINTON, HILLARY RODHAM"
93787,C90011156,N,YE,G2016,201701319042146191,24A,IND,"ADAMS, CHRISTIAN",CINCINNATI,OH,452202216,,,2016-08-17,34.0,P80001571,P80001571,VN7CZA2CFD3,1144686,,,4020920171370067314,"TRUMP, DONALD J."


In [128]:
pas_contrib_new['CAND_NAME'].unique()

array(['CLINTON, HILLARY RODHAM', 'TRUMP, DONALD J.'], dtype=object)

We will remove some columns from our dataset

In [129]:
pas_contrib_new = pas_contrib_new[['CAND_ID','CAND_NAME','CMTE_ID','TRANSACTION_TP','ENTITY_TP','TRANSACTION_DT','TRANSACTION_AMT']]
print(pas_contrib_new.shape)
pas_contrib_new.head(5)

(110795, 7)


Unnamed: 0,CAND_ID,CAND_NAME,CMTE_ID,TRANSACTION_TP,ENTITY_TP,TRANSACTION_DT,TRANSACTION_AMT
0,P00003392,"CLINTON, HILLARY RODHAM",C00553966,24K,COM,2015-06-11,2700.0
1,P00003392,"CLINTON, HILLARY RODHAM",C00574228,24K,CCM,2015-04-16,500.0
2,P00003392,"CLINTON, HILLARY RODHAM",C00574228,24K,CCM,2015-05-06,2200.0
3,P00003392,"CLINTON, HILLARY RODHAM",C00540955,24K,CCM,2015-06-22,5000.0
4,P00003392,"CLINTON, HILLARY RODHAM",C00342048,24K,CCM,2015-05-05,5000.0


A new combined data frame will be created, including commites inside the pas_contrib_new

In [130]:
pas_contrib_new = pd.merge(pas_contrib_new, 
                           committes[['CMTE_ID', 'CMTE_NM']], 
                           on='CMTE_ID', 
                           how='inner')
print(pas_contrib_new.shape)
pas_contrib_new.head(5)

(110795, 8)


Unnamed: 0,CAND_ID,CAND_NAME,CMTE_ID,TRANSACTION_TP,ENTITY_TP,TRANSACTION_DT,TRANSACTION_AMT,CMTE_NM
0,P00003392,"CLINTON, HILLARY RODHAM",C00553966,24K,COM,2015-06-11,2700.0,"YELP, INC. POLITICAL ACTION COMMITTEE"
1,P00003392,"CLINTON, HILLARY RODHAM",C00574228,24K,CCM,2015-04-16,500.0,CINCINNATUS PAC
2,P00003392,"CLINTON, HILLARY RODHAM",C00574228,24K,CCM,2015-05-06,2200.0,CINCINNATUS PAC
3,P00003392,"CLINTON, HILLARY RODHAM",C00574228,24K,CCM,2016-03-10,2300.0,CINCINNATUS PAC
4,P00003392,"CLINTON, HILLARY RODHAM",C00540955,24K,CCM,2015-06-22,5000.0,FEARLESS PAC


In [131]:
individuals['TRANSACTION_TP'].unique()

array(['15', '15E', '22Y', '11', '15C', '24T', '24I', '10', '31', '20Y',
       '32', '30', '31T', '21Y', '32T', '19', '30T', '41T', '40Y'],
      dtype=object)

In [132]:
individuals_new = individuals[['CMTE_ID', 'TRANSACTION_TP', 'ENTITY_TP', 'TRANSACTION_DT', 'TRANSACTION_AMT']]

In [133]:
individuals_new = pd.merge(individuals_new, 
                           committes[['CMTE_ID', 'CMTE_NM', 'CAND_ID']], 
                           on='CMTE_ID', 
                           how='inner')
print(individuals_new.shape)
individuals_new.sample(5)

(20114561, 7)


Unnamed: 0,CMTE_ID,TRANSACTION_TP,ENTITY_TP,TRANSACTION_DT,TRANSACTION_AMT,CMTE_NM,CAND_ID
13692152,C00578401,15,IND,2016-06-28,500.0,FRIENDS OF ANNA THRONE-HOLST,H6NY01126
3420262,C00401224,24T,IND,2016-04-15,35.0,ACTBLUE,
2904390,C00401224,24T,IND,2015-02-28,50.0,ACTBLUE,
516742,C00574624,15,IND,2015-12-30,200.0,CRUZ FOR PRESIDENT,P60006111
13549497,C00108035,15,IND,2016-10-14,30.0,MCKESSON CORPORATION EMPLOYEES POLITICAL FUND,


* Remove some TRANSACTION_TPs irrelevant to the contributions regarding https://www.fec.gov/campaign-finance-data/transaction-type-code-descriptions/

In [134]:
individuals_tr = individuals_new.loc[individuals_new['TRANSACTION_TP'].isin(['15', '15E', '11', '15C', '24T', '24I', '10', '32', '30', '32T',
                                                       '19', '30T'])].copy()
print(individuals_tr.shape)
individuals_tr.head(5)

(19878533, 7)


Unnamed: 0,CMTE_ID,TRANSACTION_TP,ENTITY_TP,TRANSACTION_DT,TRANSACTION_AMT,CMTE_NM,CAND_ID
0,C00088591,15,IND,2015-02-13,500.0,EMPLOYEES OF NORTHROP GRUMMAN CORPORATION PAC,
1,C00088591,15,IND,2015-02-13,200.0,EMPLOYEES OF NORTHROP GRUMMAN CORPORATION PAC,
2,C00088591,15,IND,2015-02-27,200.0,EMPLOYEES OF NORTHROP GRUMMAN CORPORATION PAC,
3,C00088591,15,IND,2015-02-13,200.0,EMPLOYEES OF NORTHROP GRUMMAN CORPORATION PAC,
4,C00088591,15,IND,2015-02-27,200.0,EMPLOYEES OF NORTHROP GRUMMAN CORPORATION PAC,


* Keep this file with candidates, to eliminate our analysis to H.Clinton and D.Trump

In [135]:
individuals_tr = pd.merge(individuals_tr, 
                          candidates[['CAND_ID', 'CAND_NAME']], 
                          on='CAND_ID', 
                          how='inner')
print(individuals_tr.shape)
individuals_tr.sample(5)

(2629716, 8)


Unnamed: 0,CMTE_ID,TRANSACTION_TP,ENTITY_TP,TRANSACTION_DT,TRANSACTION_AMT,CMTE_NM,CAND_ID,CAND_NAME
44818,C00575795,15,IND,2015-09-21,500.0,HILLARY FOR AMERICA,P00003392,"CLINTON, HILLARY RODHAM"
256635,C00575795,15,IND,2016-10-28,1.0,HILLARY FOR AMERICA,P00003392,"CLINTON, HILLARY RODHAM"
1646995,C00575795,15,IND,2016-08-26,25.0,HILLARY FOR AMERICA,P00003392,"CLINTON, HILLARY RODHAM"
27517,C00575795,15,IND,2015-06-25,2700.0,HILLARY FOR AMERICA,P00003392,"CLINTON, HILLARY RODHAM"
291169,C00575795,15,IND,2016-10-21,25.0,HILLARY FOR AMERICA,P00003392,"CLINTON, HILLARY RODHAM"


In [136]:
individuals_tr['CAND_NAME'].unique()

array(['CLINTON, HILLARY RODHAM', 'TRUMP, DONALD J.'], dtype=object)

After calculating all the above, we have to split them in different dataset regarding the candidate

Hillary Clinton's datasets

<h4>Group Clinton's pas contributions per month </h4>

In [137]:
pas_contrib_clinton = pas_contrib_new.loc[pas_contrib_new['CAND_NAME'].isin(['CLINTON, HILLARY RODHAM'])].copy()
print(pas_contrib_clinton.shape)
pas_contrib_clinton.sample(5)

(65326, 8)


Unnamed: 0,CAND_ID,CAND_NAME,CMTE_ID,TRANSACTION_TP,ENTITY_TP,TRANSACTION_DT,TRANSACTION_AMT,CMTE_NM
11278,P00003392,"CLINTON, HILLARY RODHAM",C90011156,24E,IND,2016-09-17,61.0,WORKING AMERICA
38431,P00003392,"CLINTON, HILLARY RODHAM",C90011156,24E,IND,2016-07-22,34.0,WORKING AMERICA
59445,P00003392,"CLINTON, HILLARY RODHAM",C90011156,24E,IND,2016-10-05,34.0,WORKING AMERICA
47681,P00003392,"CLINTON, HILLARY RODHAM",C90011156,24E,IND,2016-11-01,72.0,WORKING AMERICA
23425,P00003392,"CLINTON, HILLARY RODHAM",C90011156,24E,IND,2016-09-29,17.0,WORKING AMERICA


In [138]:
pas_contrib_clinton['TRANSACTION_DT'] = pd.to_datetime(pas_contrib_clinton['TRANSACTION_DT'], format='%m%d%Y')
pas_contrib_clinton_group = pas_contrib_clinton.groupby([(pas_contrib_clinton['TRANSACTION_DT'].dt.to_period('M'))]).sum()
pas_contrib_clinton_group

Unnamed: 0_level_0,TRANSACTION_AMT
TRANSACTION_DT,Unnamed: 1_level_1
2015-01,661.0
2015-02,250.0
2015-03,42800.0
2015-04,172085.0
2015-05,333944.0
2015-06,1174967.0
2015-07,492504.0
2015-08,518764.0
2015-09,262914.0
2015-10,1124508.0


* Remove the dates that are not related to period 2015-2016

In [139]:
pas_contrib_clinton_group[:-3]
pas_contrib_clinton_group

Unnamed: 0_level_0,TRANSACTION_AMT
TRANSACTION_DT,Unnamed: 1_level_1
2015-01,661.0
2015-02,250.0
2015-03,42800.0
2015-04,172085.0
2015-05,333944.0
2015-06,1174967.0
2015-07,492504.0
2015-08,518764.0
2015-09,262914.0
2015-10,1124508.0


<h4>Group Clinton's indivinduals contributions per month </h4>

In [140]:
individuals_tr_clinton = individuals_tr.loc[individuals_tr['CAND_NAME'].isin(['CLINTON, HILLARY RODHAM'])].copy()
print(individuals_tr_clinton.shape)
individuals_tr_clinton.sample(5)

(2483565, 8)


Unnamed: 0,CMTE_ID,TRANSACTION_TP,ENTITY_TP,TRANSACTION_DT,TRANSACTION_AMT,CMTE_NM,CAND_ID,CAND_NAME
463320,C00575795,15,IND,2016-10-22,10.0,HILLARY FOR AMERICA,P00003392,"CLINTON, HILLARY RODHAM"
1296340,C00575795,15,IND,2016-10-08,37.0,HILLARY FOR AMERICA,P00003392,"CLINTON, HILLARY RODHAM"
2199339,C00575795,15,IND,2016-04-19,25.0,HILLARY FOR AMERICA,P00003392,"CLINTON, HILLARY RODHAM"
9689,C00575795,15,IND,2015-06-23,121.0,HILLARY FOR AMERICA,P00003392,"CLINTON, HILLARY RODHAM"
1831696,C00575795,15,IND,2016-06-28,25.0,HILLARY FOR AMERICA,P00003392,"CLINTON, HILLARY RODHAM"


In [141]:
individuals_tr_clinton['TRANSACTION_DT'] = pd.to_datetime(individuals_tr_clinton['TRANSACTION_DT'], format='%m%d%Y')
individuals_tr_clinton_group = individuals_tr_clinton.groupby([(individuals_tr_clinton['TRANSACTION_DT'].dt.to_period('M'))]).sum()
individuals_tr_clinton_group

Unnamed: 0_level_0,TRANSACTION_AMT
TRANSACTION_DT,Unnamed: 1_level_1
2015-04,13358043.0
2015-05,11946341.0
2015-06,13831957.0
2015-07,6992381.0
2015-08,5812265.0
2015-09,9696825.0
2015-10,7766163.0
2015-11,8747813.0
2015-12,8306308.0
2016-01,9094081.0


<h4>Group Clinton's opperating expenditures per month </h4>

In [142]:
opper_exp_clinton = opper_exp.loc[opper_exp['CAND_NAME'].isin(['CLINTON, HILLARY RODHAM'])].copy()
print(opper_exp_clinton.shape)
opper_exp_clinton.sample(5)

(141664, 9)


Unnamed: 0,CMTE_ID,NAME,TRANSACTION_DT,TRANSACTION_AMT,PURPOSE,CMTE_NM,STATE,CAND_ID,CAND_NAME
33474,C00575795,"BLANDIN, NEISHA",2015-05-15,1726.58,PAYROLL,HILLARY FOR AMERICA,NY,P00003392,"CLINTON, HILLARY RODHAM"
60606,C00575795,"PAUL, MATTHEW",2016-08-15,3815.91,PAYROLL,HILLARY FOR AMERICA,NY,P00003392,"CLINTON, HILLARY RODHAM"
126292,C00575795,UNITED AIRLINES,2016-06-09,470.25,TRAVEL,HILLARY FOR AMERICA,IL,P00003392,"CLINTON, HILLARY RODHAM"
39842,C00575795,COACH USA,2015-06-30,34.25,TRAVEL,HILLARY FOR AMERICA,NY,P00003392,"CLINTON, HILLARY RODHAM"
42171,C00575795,"SOWAH, HELENA-JASMINE",2015-09-15,1441.03,PAYROLL,HILLARY FOR AMERICA,NY,P00003392,"CLINTON, HILLARY RODHAM"


In [143]:
opper_exp_clinton['TRANSACTION_DT'] = pd.to_datetime(opper_exp_clinton['TRANSACTION_DT'], format='%m%d%Y')
opper_exp_clinton_group = opper_exp_clinton.groupby([(opper_exp_clinton['TRANSACTION_DT'].dt.to_period('M'))]).sum()
opper_exp_clinton_group

Unnamed: 0_level_0,TRANSACTION_AMT
TRANSACTION_DT,Unnamed: 1_level_1
2015-01,41059.4
2015-02,38707.07
2015-03,93299.7
2015-04,3228321.12
2015-05,7261633.18
2015-06,9158569.12
2015-07,9466236.63
2015-08,9270133.15
2015-09,7789361.79
2015-10,13231268.04


  * Donald Trump's datasets

<h4>Group Trump's pas contributions per month </h4>

In [144]:
pas_contrib_trump = pas_contrib_new.loc[pas_contrib_new['CAND_NAME'].isin(['TRUMP, DONALD J.'])].copy()
print(pas_contrib_trump.shape)
pas_contrib_trump.sample(5)

(45469, 8)


Unnamed: 0,CAND_ID,CAND_NAME,CMTE_ID,TRANSACTION_TP,ENTITY_TP,TRANSACTION_DT,TRANSACTION_AMT,CMTE_NM
91411,P80001571,"TRUMP, DONALD J.",C90011156,24A,IND,2016-10-25,22.0,WORKING AMERICA
93581,P80001571,"TRUMP, DONALD J.",C90011156,24A,IND,2016-11-01,17.0,WORKING AMERICA
5074,P80001571,"TRUMP, DONALD J.",C00508440,24A,ORG,2016-10-30,8932.0,HUMAN RIGHTS CAMPAIGN EQUALITY VOTES
103920,P80001571,"TRUMP, DONALD J.",C90011156,24A,ORG,2016-10-05,11.0,WORKING AMERICA
110033,P80001571,"TRUMP, DONALD J.",C00521013,24A,IND,2016-04-29,225.0,FLORIDA FREEDOM PAC


In [145]:
pas_contrib_trump['TRANSACTION_DT'] = pd.to_datetime(pas_contrib_trump['TRANSACTION_DT'], format='%m%d%Y')
pas_contrib_trump_group = pas_contrib_trump.groupby([(pas_contrib_trump['TRANSACTION_DT'].dt.to_period('M'))]).sum()
pas_contrib_trump_group

Unnamed: 0_level_0,TRANSACTION_AMT
TRANSACTION_DT,Unnamed: 1_level_1
2015-06,272.0
2015-07,27300.0
2015-08,38830.0
2015-09,1153912.0
2015-10,192606.0
2015-11,148296.0
2015-12,161409.0
2016-01,3671641.0
2016-02,8494964.0
2016-03,19025410.0


<h4>Group Trump's indivinduals contributions per month </h4>

In [146]:
individuals_tr_trump = individuals_tr.loc[individuals_tr['CAND_NAME'].isin(['TRUMP, DONALD J.'])].copy()
print(individuals_tr_trump.shape)
individuals_tr_trump.sample(5)

(146151, 8)


Unnamed: 0,CMTE_ID,TRANSACTION_TP,ENTITY_TP,TRANSACTION_DT,TRANSACTION_AMT,CMTE_NM,CAND_ID,CAND_NAME
2509252,C00580100,15,IND,2016-06-22,250.0,"DONALD J. TRUMP FOR PRESIDENT, INC.",P80001571,"TRUMP, DONALD J."
2581100,C00580100,15,IND,2016-10-27,500.0,"DONALD J. TRUMP FOR PRESIDENT, INC.",P80001571,"TRUMP, DONALD J."
2492468,C00580100,15,IND,2016-06-22,250.0,"DONALD J. TRUMP FOR PRESIDENT, INC.",P80001571,"TRUMP, DONALD J."
2562393,C00580100,15,IND,2016-08-14,250.0,"DONALD J. TRUMP FOR PRESIDENT, INC.",P80001571,"TRUMP, DONALD J."
2604727,C00580100,15,IND,2016-07-28,200.0,"DONALD J. TRUMP FOR PRESIDENT, INC.",P80001571,"TRUMP, DONALD J."


In [147]:
individuals_tr_trump['TRANSACTION_DT'] = pd.to_datetime(individuals_tr_trump['TRANSACTION_DT'], format='%m%d%Y')
individuals_tr_trump_group = individuals_tr_trump.groupby([(individuals_tr_trump['TRANSACTION_DT'].dt.to_period('M'))]).sum()
individuals_tr_trump_group

Unnamed: 0_level_0,TRANSACTION_AMT
TRANSACTION_DT,Unnamed: 1_level_1
2015-06,65224.0
2015-07,264742.0
2015-08,630093.0
2015-09,259451.0
2015-10,234268.0
2015-11,160560.0
2015-12,284979.0
2016-01,268397.0
2016-02,492956.0
2016-03,735254.0


<h4>Group Trump's opperating expenditures per month </h4>

In [148]:
opper_exp_trump = opper_exp.loc[opper_exp['CAND_NAME'].isin(['TRUMP, DONALD J.'])].copy()
print(opper_exp_trump.shape)
opper_exp_trump.sample(5)

(30202, 9)


Unnamed: 0,CMTE_ID,NAME,TRANSACTION_DT,TRANSACTION_AMT,PURPOSE,CMTE_NM,STATE,CAND_ID,CAND_NAME
18640,C00580100,HOTELS.COM,2016-08-28,1274.4,TRAVEL: LODGING [AMEX: SB23.4584],"DONALD J. TRUMP FOR PRESIDENT, INC.",TX,P80001571,"TRUMP, DONALD J."
3008,C00580100,UBER,2015-07-31,51.98,TRAVEL: GROUND TRANSPORTATION [AMEX: SB23.6877],"DONALD J. TRUMP FOR PRESIDENT, INC.",CA,P80001571,"TRUMP, DONALD J."
1969,C00580100,UBER,2015-10-02,71.15,TRAVEL: GROUND TRANSPORTATION [AMEX: SB23.248391],"DONALD J. TRUMP FOR PRESIDENT, INC.",CA,P80001571,"TRUMP, DONALD J."
3575,C00580100,USPS,2015-09-03,21.66,DELIVERY SERVICES [MCENTEE: SB23.6733],"DONALD J. TRUMP FOR PRESIDENT, INC.",DC,P80001571,"TRUMP, DONALD J."
27668,C00580100,"PLAYFORTH, TAYLOR",2016-11-21,2666.38,TRAVEL EXPENSE REIMBURSEMENT: ITEMIZATION BELO...,"DONALD J. TRUMP FOR PRESIDENT, INC.",NY,P80001571,"TRUMP, DONALD J."


In [149]:
opper_exp_trump['TRANSACTION_DT'] = pd.to_datetime(opper_exp_trump['TRANSACTION_DT'], format='%m%d%Y')
opper_exp_trump_group = opper_exp_trump.groupby([(opper_exp_trump['TRANSACTION_DT'].dt.to_period('M'))]).sum()
opper_exp_trump_group

Unnamed: 0_level_0,TRANSACTION_AMT
TRANSACTION_DT,Unnamed: 1_level_1
2015-04,197738.32
2015-05,358715.09
2015-06,870798.78
2015-07,763324.82
2015-08,1850246.51
2015-09,1845830.99
2015-10,1691236.99
2015-11,2620480.95
2015-12,3076835.53
2016-01,12105750.13


---
<h3>4. Identify the biggest recipients of campaign expenditures</h3>

---

Make sure before implementations, that file opperating expenditures is filtered for candidates H.Clinton and D.Trump

In [150]:
opper_exp['CAND_NAME'].unique()

array(['TRUMP, DONALD J.', 'CLINTON, HILLARY RODHAM'], dtype=object)

Regarding the documentation https://www.fec.gov/campaign-finance-data/operating-expenditures-file-description/, we will group the data by candidate's name and name of Contributor/Lender/Transfer Name

</h4> The biggest recipients of campaign expenditures for Hillary Clinton</h4>

In [155]:
recipients_clinton = opper_exp[opper_exp['CAND_NAME'] == 'CLINTON, HILLARY RODHAM'].groupby(['CAND_NAME', 'NAME']).sum().sort_values(by='TRANSACTION_AMT', ascending=False)
recipients_clinton.head(10)

Unnamed: 0_level_0,Unnamed: 1_level_0,TRANSACTION_AMT
CAND_NAME,NAME,Unnamed: 2_level_1
"CLINTON, HILLARY RODHAM",GMMB,308463389.36
"CLINTON, HILLARY RODHAM",BULLY PULPIT INTERACTIVE LLC,33306462.95
"CLINTON, HILLARY RODHAM",ADP,27902805.45
"CLINTON, HILLARY RODHAM",AMERICAN EXPRESS,23977194.86
"CLINTON, HILLARY RODHAM",MARKHAM PRODUCTIONS,16025851.54
"CLINTON, HILLARY RODHAM",EXECUTIVE FLITEWAYS,15866289.31
"CLINTON, HILLARY RODHAM","AIR PARTNERS, INC.",11156326.38
"CLINTON, HILLARY RODHAM",STRIPE,10199840.66
"CLINTON, HILLARY RODHAM",AETNA,5883908.5
"CLINTON, HILLARY RODHAM","MISSION CONTROL, INC.",5837599.86


</h4> The biggest recipients of campaign expenditures for Donald Trump</h4>

In [156]:
recipients_trump = opper_exp[opper_exp['CAND_NAME'] == 'TRUMP, DONALD J.'].groupby(['CAND_NAME', 'NAME']).sum().sort_values(by='TRANSACTION_AMT', ascending=False)
recipients_trump.head(10)

Unnamed: 0_level_0,Unnamed: 1_level_0,TRANSACTION_AMT
CAND_NAME,NAME,Unnamed: 2_level_1
"TRUMP, DONALD J.",GILES-PARSCALE,87838378.1
"TRUMP, DONALD J.",AMERICAN MEDIA & ADVOCACY GROUP,74176379.3
"TRUMP, DONALD J.","RICK REED MEDIA, INC.",24359873.08
"TRUMP, DONALD J.",AMERICAN EXPRESS,19393367.83
"TRUMP, DONALD J.","ACE SPECIALTIES, LLC",15885323.63
"TRUMP, DONALD J.","PRIVATE JET SERVICES, LLC",9953216.06
"TRUMP, DONALD J.",JAMESTOWN ASSOCIATES,8836175.01
"TRUMP, DONALD J.","TAG AIR, INC.",8741464.25
"TRUMP, DONALD J.",AIR CHARTER TEAM,8321403.26
"TRUMP, DONALD J.","CAMBRIDGE ANALYTICA, LLC",5912500.0


---
<h3>5. Examine the geographical distribution, at the state level, of campaign expenditures. For each state, calculate the expenditures per voter. This will require that you find a source with the number of registered voters per state. Examine the situation for swing states.</h3>

---

* We will attempt to make a map of the US showing the donations by candidate per state.
* From file opperating expenditures group the data by candidate's name and state


In [157]:
states_group = opper_exp.groupby(['CAND_NAME', 'STATE'])
states_totals = states_group.sum().unstack(0).fillna(0)
states_totals.head(5)

Unnamed: 0_level_0,TRANSACTION_AMT,TRANSACTION_AMT
CAND_NAME,"CLINTON, HILLARY RODHAM","TRUMP, DONALD J."
STATE,Unnamed: 1_level_2,Unnamed: 2_level_2
AK,27981.93,20551.37
AL,2591329.75,480833.44
AR,16375895.98,170702.77
AZ,231822.41,3390953.43
BC,4432.96,0.0


* The data will be normalized, so that we get the percentages.

In [158]:
states_norm = states_totals.div(states_totals.sum(axis=1), axis=0)
states_norm.head(5)

Unnamed: 0_level_0,TRANSACTION_AMT,TRANSACTION_AMT
CAND_NAME,"CLINTON, HILLARY RODHAM","TRUMP, DONALD J."
STATE,Unnamed: 1_level_2,Unnamed: 2_level_2
AK,0.58,0.42
AL,0.84,0.16
AR,0.99,0.01
AZ,0.06,0.94
BC,1.0,0.0


* We will use a dictionary with US states from https://gist.github.com/mGalarnyk/27c99c5f55133a2ceb42b8c9a450d794

In [174]:
states = {
    'Alaska': 'AK',
    'Alabama': 'AL',
    'Arkansas': 'AR',
    'American Samoa': 'AS',
    'Arizona': 'AZ',
    'California': 'CA',
    'Colorado': 'CO',
    'Connecticut': 'CT',
    'District of Columbia': 'DC',
    'Delaware': 'DE',
    'Florida': 'FL',
    'Georgia': 'GA',
    'Guam': 'GU',
    'Hawaii': 'HI',
    'Iowa': 'IA',
    'Idaho': 'ID',
    'Illinois': 'IL',
    'Indiana': 'IN',
    'Kansas': 'KS',
    'Kentucky': 'KY',
    'Louisiana': 'LA',
    'Massachusetts': 'MA',
    'Maryland': 'MD',
    'Maine': 'ME',
    'Michigan': 'MI',
    'Minnesota': 'MN',
    'Missouri': 'MO',
    'Northern Mariana Islands': 'MP',
    'Mississippi': 'MS',
    'Montana': 'MT',
    'National': 'NA',
    'North Carolina': 'NC',
    'North Dakota': 'ND',
    'Nebraska': 'NE',
    'New Hampshire': 'NH',
    'New Jersey': 'NJ',
    'New Mexico': 'NM',
    'Nevada': 'NV',
    'New York': 'NY',
    'Ohio': 'OH',
    'Oklahoma': 'OK',
    'Oregon': 'OR',
    'Pennsylvania': 'PA',
    'Puerto Rico': 'PR',
    'Rhode Island': 'RI',
    'South Carolina': 'SC',
    'South Dakota': 'SD',
    'Tennessee': 'TN',
    'Texas': 'TX',
    'Utah': 'UT',
    'Virginia': 'VA',
    'Virgin Islands': 'VI',
    'Vermont': 'VT',
    'Washington': 'WA',
    'Wisconsin': 'WI',
    'West Virginia': 'WV',
    'Wyoming': 'WY',
    'Virgin Islands': 'VI'
}

Get the voters from the https://www.fec.gov/documents/1890/federalelections2016.xlsx

In [212]:
election_results = pd.read_excel(
    'https://www.fec.gov/documents/1890/federalelections2016.xlsx',
    sheet_name='Table 2. Electoral &  Pop Vote',
    skiprows=2,
    skipfooter=7,
    header=[0, 1])

In [213]:
print(election_results.shape)
election_results.head()

(51, 7)


Unnamed: 0_level_0,STATE,ELECTORAL VOTE,ELECTORAL VOTE,POPULAR VOTE,POPULAR VOTE,POPULAR VOTE,POPULAR VOTE
Unnamed: 0_level_1,Unnamed: 0_level_1,Trump (R),Clinton (D),Trump (R),Clinton (D),All Others,Total Vote
0,AL,9.0,,1318255,729547,75570,2123372
1,AK,3.0,,163387,116454,38767,318608
2,AZ,11.0,,1252401,1161167,159597,2573165
3,AR,6.0,,684872,380494,65310,1130676
4,CA,,55.0,4483814,8753792,943998,14181604


In [214]:
second_col_level = sorted(set(election_results.columns.get_level_values(level=1).tolist()))
print(second_col_level)
second_col_level[-1] = 'STATE'
print(second_col_level)
election_results.columns.set_levels(
    second_col_level,
    inplace=True,
    level=1)

['All Others', 'Clinton (D)', 'Total Vote', 'Trump (R)', 'Unnamed: 0_level_1']
['All Others', 'Clinton (D)', 'Total Vote', 'Trump (R)', 'STATE']


In [215]:
election_results.head()

Unnamed: 0_level_0,STATE,ELECTORAL VOTE,ELECTORAL VOTE,POPULAR VOTE,POPULAR VOTE,POPULAR VOTE,POPULAR VOTE
Unnamed: 0_level_1,STATE,Trump (R),Clinton (D),Trump (R),Clinton (D),All Others,Total Vote
0,AL,9.0,,1318255,729547,75570,2123372
1,AK,3.0,,163387,116454,38767,318608
2,AZ,11.0,,1252401,1161167,159597,2573165
3,AR,6.0,,684872,380494,65310,1130676
4,CA,,55.0,4483814,8753792,943998,14181604


In [216]:
election_results['Winning Party'] = np.where(election_results[('ELECTORAL VOTE', 'Trump (R)')].isna(),'blue','red')
election_results.head()

Unnamed: 0_level_0,STATE,ELECTORAL VOTE,ELECTORAL VOTE,POPULAR VOTE,POPULAR VOTE,POPULAR VOTE,POPULAR VOTE,Winning Party
Unnamed: 0_level_1,STATE,Trump (R),Clinton (D),Trump (R),Clinton (D),All Others,Total Vote,Unnamed: 8_level_1
0,AL,9.0,,1318255,729547,75570,2123372,red
1,AK,3.0,,163387,116454,38767,318608,red
2,AZ,11.0,,1252401,1161167,159597,2573165,red
3,AR,6.0,,684872,380494,65310,1130676,red
4,CA,,55.0,4483814,8753792,943998,14181604,blue


We should adjust the column 'Total Vote' per candidate at the dataframe opperating expenditures.