# Analysis of the 2016 Campaign Contribution Behavior

There were two seminal news events in the last stages of the 2016 US Presidential Election. The first was the release of the Access Hollywood tape, which showed Donald Trump making incredibly lewd comments about women. The second was the letter that FBI Director James Comey sent to Congress less than two weeks before the election, which explained that the investigation into Hillary Clinton's emails had been reopened. The investigation was reopened because of content found on Anthony Weiner's laptop during a different investigation. About a week later, Comey followed up to say that the review had turned up nothing new on Clinton, who had been under investigation for the use of a private e-mail server during her time as Secretary of State.

These news events were covered extensively by virtually every media outlet. In this analysis of small donation behavior, I am going to explore whether the events changed the behavior of Clinton and Trump supporters with regard to their campaign contributions. The analysis focuses on the ten battleground states where the election outcome was closest:

Arizona
Florida
Maine
Minnesota
Michigan
North Carolina
New Hampshire
Nevada
Pennsylvania
Wisconsin

(https://www.usnews.com/news/the-run-2016/articles/2016-11-14/the-10-closest-states-in-the-2016-election) 

The data used is the official data provided by the Federal Election Commission.

## Import and Cleaning of Data

First, all the data is imported and formatted to make it easier to work with. All donations after November 8 are disregarded. The data is then reduced to a two-week window around each event: the week before and the week after (the Access Hollywood tape was released on Oct. 7, the Comey letter on Oct. 28).


In [None]:
import datetime

import pandas as pd


def clean_date(df):
    # Parse the receipt dates, then keep only positive contributions
    # received up to and including election day (Nov. 8, 2016).
    df["contb_receipt_dt"] = pd.to_datetime(df["contb_receipt_dt"])
    df = df.loc[df["contb_receipt_dt"] < "2016-11-09"]
    df = df.loc[df["contb_receipt_amt"] > 0]
    return df


def before_and_after(df, date, window):
    # Keep contributions within `window` on either side of `date`, newest first.
    date = pd.Timestamp(date)
    df = df.loc[df["contb_receipt_dt"] >= date - window]
    df = df.loc[df["contb_receipt_dt"] <= date + window]
    return df.sort_values(["contb_receipt_dt"], ascending=False)


states = ["AZ", "FL", "ME", "MI", "MN", "NC", "NH", "NV", "PA", "WI"]

for state in states:
    # Raw string, so the backslashes in the Windows path are not read as escape sequences.
    filename = r"C:\Users\malte\OneDrive\Coding\Exercises\Intro to inferential statistics\Final Project\Trump Raw Data\P80001571-" + state + ".csv"
    df = pd.read_csv(filename, low_memory=False)
    df = clean_date(df)
    window_df = before_and_after(df, datetime.date(2016, 10, 7), pd.Timedelta(days=7))
    # The context manager saves and closes the workbook on exit.
    with pd.ExcelWriter(state + "-formatted.xlsx", engine="xlsxwriter") as writer:
        window_df.to_excel(writer, sheet_name="Sheet1")
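
To see what the date window keeps, here is a minimal sketch on a made-up five-row table. The helper name `window_around` and the toy amounts are assumptions for illustration only; the column names are the FEC ones used above. Because both ends of the filter are inclusive, a seven-day window on either side of the event spans fifteen calendar days.

```python
import pandas as pd


def window_around(df, event_date, days=7):
    # Hypothetical helper: keep rows within `days` on either side of the event.
    event = pd.Timestamp(event_date)
    span = pd.Timedelta(days=days)
    mask = (df["contb_receipt_dt"] >= event - span) & (df["contb_receipt_dt"] <= event + span)
    return df.loc[mask].sort_values("contb_receipt_dt", ascending=False)


# Made-up contributions around the Oct. 7 release date
toy = pd.DataFrame({
    "contb_receipt_dt": pd.to_datetime(
        ["2016-09-29", "2016-09-30", "2016-10-07", "2016-10-14", "2016-10-15"]),
    "contb_receipt_amt": [10.0, 20.0, 30.0, 40.0, 50.0],
})

kept = window_around(toy, "2016-10-07")
print(kept)
```

In this toy case the rows dated Sept. 29 and Oct. 15 fall outside the window and are dropped, while Sept. 30 and Oct. 14 sit exactly on the boundaries and are kept.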

# Analysis

## Top Contributors

For the first part, the top five contributor occupations, ranked by total amount donated, will be analyzed. Contributors have to identify themselves by occupation when they make a donation.

Top Trump Supporters

In [1]:
import pandas as pd

states = ["AZ", "FL", "ME", "MI", "MN", "NC", "NH", "NV", "PA", "WI"]


def top_contributors(df):
    # Total contribution amount per self-reported occupation
    contributors = pd.DataFrame(df.groupby("contbr_occupation")["contb_receipt_amt"].sum())
    print("Top 5 Contributors in " + str(df["contbr_st"].iloc[0]))
    print("")
    # Five occupations with the largest totals
    print(contributors.sort_values(["contb_receipt_amt"], ascending=False).iloc[:5])
    print("")


for state in states:
    filename = "C:/Users/malte/OneDrive/Coding/Exercises/Intro to inferential statistics/Final Project/Trump Raw/P80001571-" + state + ".csv"
    df = pd.read_csv(filename, low_memory=False)
    top_contributors(df)



Top 5 Contributors in AZ

                       contb_receipt_amt
contbr_occupation                       
RETIRED                       1241838.66
INFORMATION REQUESTED          381379.36
CEO                             77675.38
SELF-EMPLOYED                   76342.32
PHYSICIAN                       50566.80

Top 5 Contributors in FL

                       contb_receipt_amt
contbr_occupation                       
RETIRED                       4898422.33
INFORMATION REQUESTED         1752951.93
CEO                            299010.35
PHYSICIAN                      243284.49
BUSINESS OWNER                 230225.11

Top 5 Contributors in ME

                       contb_receipt_amt
contbr_occupation                       
RETIRED                         87278.34
INFORMATION REQUESTED           40087.37
SELF-EMPLOYED                   12122.67
OWNER                            6749.45
BUSINESS OWNER                   5462.40

Top 5 Contributors in MI

                       contb_rec

Top Clinton Supporters 

In [2]:
states = ["AZ", "FL", "ME", "MI", "MN", "NC", "NH", "NV", "PA", "WI"]

# top_contributors() is the function defined in the previous cell.
for state in states:
    filename = "C:/Users/malte/OneDrive/Coding/Exercises/Intro to inferential statistics/Final Project/Clinton Raw/P00003392-" + state + ".csv"
    df = pd.read_csv(filename, low_memory=False)
    top_contributors(df)



Top 5 Contributors in AZ

                       contb_receipt_amt
contbr_occupation                       
RETIRED                       1268217.76
ATTORNEY                       236954.40
INFORMATION REQUESTED          214587.69
PHYSICIAN                      205846.47
CONSULTANT                     105827.57

Top 5 Contributors in FL

                       contb_receipt_amt
contbr_occupation                       
RETIRED                       5131147.54
ATTORNEY                      2319643.63
INFORMATION REQUESTED         1039447.73
HOMEMAKER                      648414.07
PHYSICIAN                      567061.03

Top 5 Contributors in ME

                       contb_receipt_amt
contbr_occupation                       
RETIRED                        448219.58
ATTORNEY                       104623.27
INFORMATION REQUESTED           90871.08
CONSULTANT                      57386.62
LAWYER                          49164.20

Top 5 Contributors in MI

                       contb_rec

Both Clinton and Trump received their largest share of campaign donations from retirees. Trump seems to have drawn more donations from the business sector whereas Clinton attracted more donations from physicians and lawyers. 

# Method of Comparison

To compare the effect of the two events on donor behavior, I will use a chi-square statistic. This will determine whether the difference in donations is statistically significant. Our level of statistical significance is alpha=0.05 for a two-tailed test. That means the change we observe will only be deemed significant if the probability of it arising from random sampling alone is less than 5%.

With 9 degrees of freedom (ten states minus one) and 0.025 in the upper tail, our chi-square critical value is **19.023**. Because a chi-square statistic is never negative, our statistical test needs to produce a value higher than this critical value in order to be regarded as significant.
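
As a rough check on the 19.023 figure, the sketch below evaluates the upper-tail probability of a chi-square distribution with 9 degrees of freedom (ten states minus one) at that value. The helper `chi2_upper_tail` is a hypothetical name introduced here and is not part of the original analysis. It uses the closed-form expression that holds for odd degrees of freedom, so only the standard library is needed. Under those assumptions the tail probability comes out at about 0.025, which is half of alpha=0.05.

```python
import math


def chi2_upper_tail(x, dof):
    """P(X > x) for a chi-square variable with an odd number of degrees of freedom.

    Hypothetical helper for illustration; the closed form below is only valid for odd dof.
    """
    if dof < 1 or dof % 2 == 0:
        raise ValueError("this closed form only covers odd degrees of freedom")
    series, term = 0.0, 1.0
    for k in range((dof - 1) // 2):
        if k > 0:
            term *= x / (2 * k + 1)  # builds x^k / (1*3*5*...*(2k+1))
        series += term
    return (math.erfc(math.sqrt(x / 2.0))
            + math.sqrt(2.0 * x / math.pi) * math.exp(-x / 2.0) * series)


critical_value = 19.023  # value quoted in the text
dof = 9                  # ten states minus one
tail = chi2_upper_tail(critical_value, dof)
print(round(tail, 4))

# With SciPy installed, scipy.stats.chi2.ppf(0.975, 9) returns the critical value directly.
```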

# Impact of the Access Hollywood Tape


In order to analyze the impact of the Access Hollywood tape on Trump donors, we can compare the mean amount of donations between the week prior and the week after the tape was leaked.

In [8]:
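import pandas as pd
import datetime

def before_and_after(df):
    # Split the two-week window at the release date of the tape (Oct. 7).
    # As written, rows dated on or after Oct. 7 go to df_before and earlier rows
    # go to df_after, so the "before" column printed below holds the on-or-after means.
    event = pd.Timestamp(datetime.date(2016, 10, 7))
    df_before = df.loc[df["contb_receipt_dt"] >= event]
    df_after = df.loc[df["contb_receipt_dt"] < event]
    return df_before["contb_receipt_amt"].mean(), df_after["contb_receipt_amt"].mean()

states = ["AZ", "FL", "ME", "MI", "MN", "NC", "NH", "NV", "PA", "WI"]

# One row per state, filled in below with the two mean contribution amounts
df_chi = pd.DataFrame(index=states, columns=["before", "after"])

for state in states:
    filename = "C:/Users/malte/OneDrive/Coding/Exercises/Intro to inferential statistics/Final Project/Trump Formatted/" + state + "-formatted.xlsx"
    df = pd.read_excel(filename)
    before, after = before_and_after(df)
    # .loc assignment replaces DataFrame.set_value, which newer pandas versions no longer provide
    df_chi.loc[state, "before"] = before
    df_chi.loc[state, "after"] = after

print(df_chi)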

     before    after
AZ  321.325  280.069
FL   357.49  393.551
ME  319.502  285.543
MI  338.477  543.811
MN  322.686  282.707
NC  283.412  297.383
NH  279.078  335.794
NV  320.641  387.157
PA   268.71   417.39
WI  335.184   441.75


Now we will compute our chi-square value. 

In [10]:
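se = df_chi["before"]  # treated as the expected values
s2 = df_chi["after"]   # treated as the observed values
dof = len(se) - 1      # degrees of freedom: ten states minus one

# Per-state term: (observed - expected)^2 / expected
new_s = ((s2 - se) ** 2) / se

chi_squared = new_s.sum()

print(chi_squared)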


284.220686669


Our chi-square value of **284.22** is much higher than our critical value of **19.023**. We can conclude that the Access Hollywood tape made a real difference in donation behavior for Donald Trump: in seven of these ten critical states, the mean contribution in the "after" column is higher than in the "before" column.
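
The arithmetic can be retraced from the printed table alone. The sketch below copies the rounded state means shown above into plain Python lists (so the total matches only to the precision of the printout), applies the same sum of (after - before)^2 / before, and lists each state's share of the statistic. The names `before`, `after`, and `contributions` are illustrative and follow the column labels as printed.

```python
states = ["AZ", "FL", "ME", "MI", "MN", "NC", "NH", "NV", "PA", "WI"]

# Rounded means copied from the printed Trump table (labels as printed)
before = [321.325, 357.49, 319.502, 338.477, 322.686,
          283.412, 279.078, 320.641, 268.71, 335.184]
after = [280.069, 393.551, 285.543, 543.811, 282.707,
         297.383, 335.794, 387.157, 417.39, 441.75]

# Per-state term of the statistic: (after - before)^2 / before
contributions = [(a - b) ** 2 / b for b, a in zip(before, after)]
chi_squared = sum(contributions)

# Largest contributions first
for state, term in sorted(zip(states, contributions), key=lambda pair: pair[1], reverse=True):
    print("%s %7.2f" % (state, term))

print("total %.2f" % chi_squared)
print("exceeds 19.023: %s" % (chi_squared > 19.023))
```

Broken down this way, Michigan and Pennsylvania account for most of the total, so the result is driven largely by those two states.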


# Impact of the Comey Letter

We will now do the same for the Clinton campaign. First we will compute the mean amounts before and after the letter was released.

In [13]:
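def before_and_after(df):
    # Split the two-week window at the release date of the letter (Oct. 28).
    # As written, rows dated on or after Oct. 28 go to df_before and earlier rows
    # go to df_after, so the "before" column printed below holds the on-or-after means.
    event = pd.Timestamp(datetime.date(2016, 10, 28))
    df_before = df.loc[df["contb_receipt_dt"] >= event]
    df_after = df.loc[df["contb_receipt_dt"] < event]
    return df_before["contb_receipt_amt"].mean(), df_after["contb_receipt_amt"].mean()

states = ["AZ", "FL", "ME", "MI", "MN", "NC", "NH", "NV", "PA", "WI"]

# One row per state, filled in below with the two mean contribution amounts
df_chi = pd.DataFrame(index=states, columns=["before", "after"])

for state in states:
    filename = "C:/Users/malte/OneDrive/Coding/Exercises/Intro to inferential statistics/Final Project/Clinton Formatted/" + state + "-formatted.xlsx"
    df = pd.read_excel(filename)
    before, after = before_and_after(df)
    # .loc assignment replaces DataFrame.set_value, which newer pandas versions no longer provide
    df_chi.loc[state, "before"] = before
    df_chi.loc[state, "after"] = after

print(df_chi)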

     before    after
AZ  58.1687  44.1232
FL  63.1427  63.6861
ME  61.5934   49.975
MI   63.206  63.3837
MN  62.0407  60.8012
NC  66.6862  61.6529
NH  65.1342  61.8674
NV  58.6889  45.4493
PA  66.7969  59.5103
WI  63.8634  53.4364


Now we will compute our chi-square value.

In [14]:
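se = df_chi["before"]  # treated as the expected values
s2 = df_chi["after"]   # treated as the observed values
dof = len(se) - 1      # degrees of freedom: ten states minus one

# Per-state term: (observed - expected)^2 / expected
new_s = ((s2 - se) ** 2) / se

chi_squared = new_s.sum()

print(chi_squared)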

11.6407169835


This time, our chi-square value of **11.64** falls below the critical value of **19.023** and is not significant. Donor behavior did not change to a statistically relevant degree after the letter was released.

## Conclusion

Though this analysis is very limited in scope, it does reveal one of the reasons for the outcome of the 2016 US Presidential election: Hillary Clinton lacked base support.

Not only did Trump's individual donors give more on average than Clinton's (observed in the difference between the two campaigns' donation means), but more importantly, his donors responded directly to their candidate when he needed it. To be clear: instead of backing away from their candidate after they had proof that he sexually harassed women, they supported him even more. Though morally reprehensible, it does show that the Trump campaign managed to accumulate a more passionate base of support than the Clinton campaign did, which is one of the main factors that eventually cost her the electoral college win.