# Out-of-State Contributions: Candidate Analysis

How much out-of-state money have candidates raised so far in the 2018 election cycle, in absolute and proportional terms, and how does that compare with the same point in the 2014 and 2010 cycles?

In [1]:
from functools import reduce
import numpy as np
import pandas as pd

pd.set_option("display.max_columns", 100)
pd.set_option("display.max_rows", 500)
pd.options.display.float_format = "{:,.2f}".format # Format floats

Import contributions data.

In [2]:
contributions = pd.read_csv("data/contributions.csv")
contributions.info()


<class 'pandas.core.frame.DataFrame'>
RangeIndex: 6947770 entries, 0 to 6947769
Data columns (total 23 columns):
candidate                 object
candidate_id              int64
year                      int64
state                     object
party                     object
election_status           object
contributor               object
amount                    float64
date                      object
contributor_street        object
contributor_city          object
contributor_state         object
contributor_zip           float64
in_out_state              object
no_veto                   object
office                    object
last_day                  object
redistricting_role        object
independent_commission    object
single_house_district     object
standardized_office       object
standardized_status       object
two_year_term             object
dtypes: float64(2), int64(2), object(19)
memory usage: 1.2+ GB
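`contributor_zip` was parsed as `float64`, which drops leading zeros from ZIP codes and appends `.0`. If ZIP codes are needed later, one option is to read that column as text. The sketch below assumes the CSV stores ZIP codes as plain digit strings; the sample rows are made up.

```python
import io

import pandas as pd

# Made-up sample rows standing in for data/contributions.csv
csv_text = """candidate,contributor_zip,amount
CANDIDATE A,02108,100.0
CANDIDATE B,,50.0
"""

# Default parsing: the blank ZIP forces float64, and the leading zero is lost
default = pd.read_csv(io.StringIO(csv_text))

# Reading ZIPs as text keeps them exactly as written; low_memory=False makes
# pandas infer column types from the whole file instead of chunk by chunk
as_text = pd.read_csv(
    io.StringIO(csv_text),
    dtype={"contributor_zip": str},
    low_memory=False,
)

print(default["contributor_zip"].tolist())  # [2108.0, nan]
print(as_text["contributor_zip"].tolist())  # ['02108', nan]
```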


Convert the contribution date (`date`) and latest-month (`last_day`) columns to the datetime data type. With `errors="coerce"`, any value that can't be parsed becomes `NaT`.

In [3]:
contributions["date"] = pd.to_datetime(contributions["date"], errors="coerce")
contributions["last_day"] = pd.to_datetime(contributions["last_day"], errors="coerce")
contributions.info()

<class 'pandas.core.frame.DataFrame'>
RangeIndex: 6947770 entries, 0 to 6947769
Data columns (total 23 columns):
candidate                 object
candidate_id              int64
year                      int64
state                     object
party                     object
election_status           object
contributor               object
amount                    float64
date                      datetime64[ns]
contributor_street        object
contributor_city          object
contributor_state         object
contributor_zip           float64
in_out_state              object
no_veto                   object
office                    object
last_day                  datetime64[ns]
redistricting_role        object
independent_commission    object
single_house_district     object
standardized_office       object
standardized_status       object
two_year_term             object
dtypes: datetime64[ns](2), float64(2), int64(2), object(17)
memory usage: 1.2+ GB
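Because `errors="coerce"` silently turns unparseable values into `NaT`, it can be worth counting how many dates were lost. A small sketch on made-up values, separating values that were missing to begin with from values that failed to parse:

```python
import pandas as pd

# Made-up raw date strings, including one that cannot be parsed
raw = pd.Series(["2018-03-15", "2017-12-01", "not a date", None])

parsed = pd.to_datetime(raw, errors="coerce")

# Present in the raw data but NaT after parsing = failed to parse
failed = parsed.isna() & raw.notna()
print(int(failed.sum()))  # 1
```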


## Calculate out-of-state contributions by candidate

Filter the data by election cycle.

In [4]:
contributions_18 = contributions[contributions["year"] == 2018]
contributions_14 = contributions[contributions["year"] == 2014]
contributions_10 = contributions[contributions["year"] == 2010]

Group by candidate, state, year, office, election status, redistricting role (2018 cycle only) and in-state/out-of-state status, then sum the contributions.

In [5]:
contributions_by_candidate_18 = contributions_18.groupby(["candidate", "state", "year", "standardized_office", "standardized_status", "redistricting_role", "in_out_state"])["amount"].sum().reset_index()
contributions_by_candidate_18.rename(columns={"amount": "amount_18"}, inplace=True)
contributions_by_candidate_14 = contributions_14.groupby(["candidate", "state", "year", "standardized_office", "standardized_status", "in_out_state"])["amount"].sum().reset_index()
contributions_by_candidate_14.rename(columns={"amount": "amount_14"}, inplace=True)
contributions_by_candidate_10 = contributions_10.groupby(["candidate", "state", "year", "standardized_office", "standardized_status", "in_out_state"])["amount"].sum().reset_index()
contributions_by_candidate_10.rename(columns={"amount": "amount_10"}, inplace=True)

Pivot the dataframes to aggregate each candidate's data in a single row.

In [6]:
contributions_by_candidate_18 = pd.pivot_table(contributions_by_candidate_18, index=["candidate", "state", "year", "standardized_office", "standardized_status", "redistricting_role"], columns=["in_out_state"]).reset_index()
contributions_by_candidate_14 = pd.pivot_table(contributions_by_candidate_14, index=["candidate", "state", "year", "standardized_office", "standardized_status"], columns=["in_out_state"]).reset_index()
contributions_by_candidate_10 = pd.pivot_table(contributions_by_candidate_10, index=["candidate", "state", "year", "standardized_office", "standardized_status"], columns=["in_out_state"]).reset_index()

Some candidates have no contributions in one or more categories (in-state, out-of-state or unknown), which the pivot leaves as `NaN`. Set those values to zero so the totals and proportions calculated below are correct.

In [7]:
contributions_by_candidate_18.fillna(0, inplace=True)
contributions_by_candidate_14.fillna(0, inplace=True)
contributions_by_candidate_10.fillna(0, inplace=True)
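The group, pivot and fill steps above can be seen end to end on a toy dataset. This sketch uses made-up amounts for two hypothetical candidates. It passes `values="amount"` so the result has single-level columns; the notebook omits `values`, which is why its pivots produce the multi-index columns flattened in the next step.

```python
import pandas as pd

# Made-up contributions for two hypothetical candidates
rows = pd.DataFrame({
    "candidate": ["A", "A", "A", "B"],
    "in_out_state": ["in-state", "out-of-state", "in-state", "in-state"],
    "amount": [100.0, 50.0, 25.0, 10.0],
})

# Step 1: one row per candidate and category with the summed amount
summed = rows.groupby(["candidate", "in_out_state"])["amount"].sum().reset_index()

# Step 2: spread the categories into columns; B has no out-of-state row, so NaN
wide = pd.pivot_table(summed, index="candidate", columns="in_out_state", values="amount")

# Step 3: treat a missing category as zero dollars
# (pivot_table(..., fill_value=0) would combine steps 2 and 3)
wide = wide.fillna(0)
print(wide)
```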

Flatten the resulting dataframes' multi-index columns.

In [8]:
contributions_by_candidate_18.columns = ["_".join(column).replace("-","_").strip("_") for column in contributions_by_candidate_18.columns.values]
contributions_by_candidate_18.rename(columns={"standardized_office": "standardized_office_18"}, inplace=True)
contributions_by_candidate_14.columns = ["_".join(column).replace("-","_").strip("_") for column in contributions_by_candidate_14.columns.values]
contributions_by_candidate_14.rename(columns={"standardized_office": "standardized_office_14"}, inplace=True)
contributions_by_candidate_10.columns = ["_".join(column).replace("-","_").strip("_") for column in contributions_by_candidate_10.columns.values]
contributions_by_candidate_10.rename(columns={"standardized_office": "standardized_office_10"}, inplace=True)
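To make the flattening rule concrete, here is a sketch on hypothetical column labels shaped like the ones `pivot_table` produces. Index columns come back as `(name, "")`, and amount columns come back as `(amount_18, category)`:

```python
import pandas as pd

# Hypothetical column labels shaped like pivot_table output
columns = pd.MultiIndex.from_tuples([
    ("candidate", ""),
    ("amount_18", "in-state"),
    ("amount_18", "out-of-state"),
    ("amount_18", "unknown"),
])
df = pd.DataFrame([["A", 1.0, 2.0, 0.0]], columns=columns)

# Join each (outer, inner) pair with "_", swap hyphens for underscores,
# and trim the trailing "_" left by index columns whose inner label is ""
df.columns = ["_".join(col).replace("-", "_").strip("_") for col in df.columns.values]
print(list(df.columns))
# ['candidate', 'amount_18_in_state', 'amount_18_out_of_state', 'amount_18_unknown']
```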

Calculate the proportion of in-state, out-of-state and unknown contributions.

In [9]:
for df, yr in [(contributions_by_candidate_18, "18"), (contributions_by_candidate_14, "14"), (contributions_by_candidate_10, "10")]:
    # Each candidate's total across the three categories for this cycle
    total = df[f"amount_{yr}_in_state"] + df[f"amount_{yr}_out_of_state"] + df[f"amount_{yr}_unknown"]
    # Share of the total for each category, e.g. pct_18_out_of_state
    for category in ["in_state", "out_of_state", "unknown"]:
        df[f"pct_{yr}_{category}"] = df[f"amount_{yr}_{category}"] / total
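Each proportion is a category's amount divided by the candidate's total across all three categories. The sketch below does the same calculation in one step with `DataFrame.div`, using made-up numbers. It also shows an edge case: a candidate whose three amounts sum to zero gets `NaN` shares (0 / 0) rather than an error.

```python
import pandas as pd

# Made-up 2018 amounts for two hypothetical candidates
df = pd.DataFrame({
    "amount_18_in_state": [80.0, 0.0],
    "amount_18_out_of_state": [15.0, 0.0],
    "amount_18_unknown": [5.0, 0.0],
})
amount_cols = ["amount_18_in_state", "amount_18_out_of_state", "amount_18_unknown"]

# Divide every amount column by its row total (axis=0 aligns on the row index)
total = df[amount_cols].sum(axis=1)
shares = df[amount_cols].div(total, axis=0)
shares.columns = [c.replace("amount_", "pct_") for c in amount_cols]

print(shares)
# Row 0: 0.80, 0.15, 0.05; row 1: NaN in all three columns (0 / 0)
```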

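Join the 2018, 2014 and 2010 per-candidate data with an outer merge on candidate and state, so candidates who ran in only some of the cycles are kept, then compute each cycle's total contributions.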

In [10]:
list_of_contributions_by_candidate = [contributions_by_candidate_18, contributions_by_candidate_14, contributions_by_candidate_10]
contributions_by_candidate = reduce(lambda left, right: pd.merge(left, right, on=["candidate", "state"], how="outer"), list_of_contributions_by_candidate)
# Overlapping 2018/2014 columns get _x/_y suffixes; the 2010 "year" and "standardized_status" keep their plain names
contributions_by_candidate.drop(["year_x", "year_y", "year"], axis=1, inplace=True)
contributions_by_candidate.rename(columns={"standardized_status_x": "standardized_status_18", "standardized_status_y": "standardized_status_14", "standardized_status": "standardized_status_10"}, inplace=True)
contributions_by_candidate["amount_18_total"] = contributions_by_candidate["amount_18_in_state"] + contributions_by_candidate["amount_18_out_of_state"] + contributions_by_candidate["amount_18_unknown"]
contributions_by_candidate["amount_14_total"] = contributions_by_candidate["amount_14_in_state"] + contributions_by_candidate["amount_14_out_of_state"] + contributions_by_candidate["amount_14_unknown"]
contributions_by_candidate["amount_10_total"] = contributions_by_candidate["amount_10_in_state"] + contributions_by_candidate["amount_10_out_of_state"] + contributions_by_candidate["amount_10_unknown"]
contributions_by_candidate.head()

| | candidate | state | standardized_office_18 | standardized_status_18 | redistricting_role | amount_18_in_state | amount_18_out_of_state | amount_18_unknown | pct_18_in_state | pct_18_out_of_state | pct_18_unknown | standardized_office_14 | standardized_status_14 | amount_14_in_state | amount_14_out_of_state | amount_14_unknown | pct_14_in_state | pct_14_out_of_state | pct_14_unknown | standardized_office_10 | standardized_status_10 | amount_10_in_state | amount_10_out_of_state | amount_10_unknown | pct_10_in_state | pct_10_out_of_state | pct_10_unknown | amount_18_total | amount_14_total | amount_10_total |
|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|
| 0 | ABBATE, PETER | NY | STATE HOUSE/ASSEMBLY/SENATE | ADVANCED TO GENERAL | N | 183575.0 | 18350.0 | 31925.0 | 0.79 | 0.08 | 0.14 | | | | | | | | | | | | | | | | | 233850.0 | | |
| 1 | ABBOTT, DAVID H | IN | STATE HOUSE/ASSEMBLY/SENATE | ADVANCED TO GENERAL | Y | 26065.0 | 0.0 | 0.0 | 1.0 | 0.0 | 0.0 | | | | | | | | | | | | | | | | | 26065.0 | | |
| 2 | ABBOTT, GREG | TX | GOVERNOR/LIEUTENANT GOVERNOR | ADVANCED TO GENERAL | Y | 61189628.95 | 4590344.58 | 1020.0 | 0.93 | 0.07 | 0.0 | GOVERNOR/LIEUTENANT GOVERNOR | ADVANCED TO GENERAL | 23579018.32 | 1526043.75 | 4791017.12 | 0.79 | 0.05 | 0.16 | | | | | | | | | 65780993.53 | 29896079.19 | |
| 3 | ABDUL-RAHIM, ANEES | MD | STATE HOUSE/ASSEMBLY/SENATE | DID NOT ADVANCE | Y | 3165.0 | 0.0 | 0.0 | 1.0 | 0.0 | 0.0 | | | | | | | | | | | | | | | | | 3165.0 | | |
| 4 | ABERCROMBIE, CATHERINE F | CT | STATE HOUSE/ASSEMBLY/SENATE | ADVANCED TO GENERAL | N | 2485.0 | 0.0 | 0.0 | 1.0 | 0.0 | 0.0 | STATE HOUSE/ASSEMBLY/SENATE | ADVANCED TO GENERAL | 5130.06 | 7.0 | 20.0 | 0.99 | 0.0 | 0.0 | | | | | | | | | 2485.0 | 5157.06 | |
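The outer merge relies on pandas' default `_x`/`_y` suffixes. Pandas adds these only to overlapping column names, so the 2010 columns keep their plain names and have to be handled separately. One alternative is to tag every non-key column with its cycle before merging, so no suffixes are generated at all. This is a style choice, not the notebook's method, and the frames below are made up:

```python
from functools import reduce

import pandas as pd

# Made-up per-cycle frames sharing the key columns and a "standardized_status" column
frames = {
    "18": pd.DataFrame({"candidate": ["A", "B"], "state": ["NY", "TX"],
                        "standardized_status": ["ADVANCED TO GENERAL", "DID NOT ADVANCE"]}),
    "14": pd.DataFrame({"candidate": ["B"], "state": ["TX"],
                        "standardized_status": ["ADVANCED TO GENERAL"]}),
    "10": pd.DataFrame({"candidate": ["C"], "state": ["IN"],
                        "standardized_status": ["DID NOT ADVANCE"]}),
}

keys = ["candidate", "state"]
# Append the cycle to every non-key column so names never collide during the merges
tagged = [
    df.rename(columns={c: f"{c}_{year}" for c in df.columns if c not in keys})
    for year, df in frames.items()
]

merged = reduce(lambda left, right: pd.merge(left, right, on=keys, how="outer"), tagged)
print(merged.columns.tolist())
# ['candidate', 'state', 'standardized_status_18', 'standardized_status_14', 'standardized_status_10']
```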


## Calculate the average contributions to 2018 state legislative candidates and the average proportion of out-of-state contributions to them

Filter the data to just candidates running for state house or senate in 2018.

In [11]:
legislative_candidates = contributions_by_candidate[contributions_by_candidate["standardized_office_18"] == "STATE HOUSE/ASSEMBLY/SENATE"]
legislative_candidates.info()

<class 'pandas.core.frame.DataFrame'>
Int64Index: 8614 entries, 0 to 8947
Data columns (total 30 columns):
candidate                 8614 non-null object
state                     8614 non-null object
standardized_office_18    8614 non-null object
standardized_status_18    8614 non-null object
redistricting_role        8614 non-null object
amount_18_in_state        8614 non-null float64
amount_18_out_of_state    8614 non-null float64
amount_18_unknown         8614 non-null float64
pct_18_in_state           8614 non-null float64
pct_18_out_of_state       8614 non-null float64
pct_18_unknown            8614 non-null float64
standardized_office_14    2834 non-null object
standardized_status_14    2834 non-null object
amount_14_in_state        2834 non-null float64
amount_14_out_of_state    2834 non-null float64
amount_14_unknown         2834 non-null float64
pct_14_in_state           2834 non-null float64
pct_14_out_of_state       2834 non-null float64
pct_14_unknown            2834 non-n

Calculate the mean total contributions.

In [12]:
legislative_candidates["amount_18_total"].mean()

72349.1902333409

Calculate the mean out-of-state contributions.

In [13]:
legislative_candidates["amount_18_out_of_state"].mean()

8233.98652426283

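Calculate the out-of-state share of the money raised by these candidates. Both columns have the same number of rows, so this ratio of means equals total out-of-state dollars divided by total dollars. It is therefore a dollar-weighted share, not the average of each candidate's `pct_18_out_of_state`.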

In [14]:
legislative_candidates["amount_18_out_of_state"].mean() / legislative_candidates["amount_18_total"].mean()

0.1138089659014364
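The ratio of means weights each candidate by how much money they raised, so a few large fundraisers can dominate it. Averaging the per-candidate `pct_18_out_of_state` column instead would give every candidate equal weight. A sketch with made-up numbers showing how far the two measures can diverge:

```python
import pandas as pd

# Made-up totals for three hypothetical legislative candidates
df = pd.DataFrame({
    "amount_18_out_of_state": [50.0, 0.0, 900.0],
    "amount_18_total": [100.0, 100.0, 10000.0],
})

# Dollar-weighted share: ratio of the means equals ratio of the sums (same row count)
weighted = df["amount_18_out_of_state"].mean() / df["amount_18_total"].mean()

# Unweighted share: average of each candidate's own proportion
unweighted = (df["amount_18_out_of_state"] / df["amount_18_total"]).mean()

print(round(weighted, 4), round(unweighted, 4))  # 0.0931 0.1967
```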

## Calculate 2018 out-of-state contributions for New Mexico's Jimmie Hall and Melanie Stansbury.

Filter the data to return contributions to Jimmie Hall and Melanie Stansbury in the 2018 cycle.

In [15]:
hall_contributions = contributions[(contributions["candidate_id"] == 240408) & (contributions["year"] == 2018)]
stansbury_contributions = contributions[(contributions["candidate_id"] == 240407) & (contributions["year"] == 2018)]

In [16]:
hall_contributions.groupby("in_out_state")["amount"].sum()

in_out_state
in-state       18,965.00
out-of-state   15,250.00
Name: amount, dtype: float64

In [17]:
stansbury_contributions.groupby("in_out_state")["amount"].sum()

in_out_state
in-state       52,858.85
out-of-state   32,776.76
unknown           185.00
Name: amount, dtype: float64
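To express these sums as out-of-state shares, divide by each candidate's total, including unknown. A worked sketch using the figures printed above:

```python
import pandas as pd

# Sums copied from the two outputs above
hall = pd.Series({"in-state": 18965.00, "out-of-state": 15250.00})
stansbury = pd.Series({"in-state": 52858.85, "out-of-state": 32776.76, "unknown": 185.00})

# Out-of-state dollars as a share of each candidate's total
for name, sums in [("Hall", hall), ("Stansbury", stansbury)]:
    share = sums["out-of-state"] / sums.sum()
    print(name, round(share, 3))
# Hall 0.446
# Stansbury 0.382
```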

## Calculate the top contributors to the 2018 campaign of Wisconsin's Scott Walker.

And how many out-of-state individuals maxed out at the $20,000 level?

In [18]:
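# Restrict to the 2018 cycle, as the heading and the other candidate lookups do
walker_contributions = contributions[(contributions["candidate_id"] == 224207) & (contributions["year"] == 2018)]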
walker_contributors = walker_contributions.groupby(["contributor", "in_out_state"])["amount"].sum().reset_index().sort_values("amount", ascending=False)
# Individuals are listed as "LAST, FIRST", so a literal ", " in the name keeps people and drops organizations (a heuristic)
walker_contributors[(walker_contributors["contributor"].str.contains(", ", regex=False)) & (walker_contributors["amount"] >= 20000) & (walker_contributors["in_out_state"] == "out-of-state")].reset_index(drop=True)

| | contributor | in_out_state | amount |
|---|---|---|---|
| 0 | ROBERTS, TIMOTHY J | out-of-state | 25000.0 |
| 1 | GROSS, DIETRICH M | out-of-state | 22500.0 |
| 2 | MOROUN, LINDSAY S | out-of-state | 20000.0 |
| 3 | HERZOG, STANLEY M (STAN) | out-of-state | 20000.0 |
| 4 | MOROUN, MATTHEW T | out-of-state | 20000.0 |
| 5 | MURRAY, ROBERT EUGENE (BOB) | out-of-state | 20000.0 |
| 6 | DARROW, SUSAN J (SUE) | out-of-state | 20000.0 |
| 7 | CRAFT, JOE W | out-of-state | 20000.0 |
| 8 | HERTOG, ROGER | out-of-state | 20000.0 |
| 9 | SMITH, THOMAS WILLIAM | out-of-state | 20000.0 |
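To answer "how many", count the rows the filter returns; the table above lists 10 out-of-state individuals. A sketch of the same filter on made-up contributor names, with `len` giving the count and `regex=False` matching the comma-space literally:

```python
import pandas as pd

# Made-up contributor totals; the organization has no ", " in its name
contributors = pd.DataFrame({
    "contributor": ["DOE, JANE", "ACME PAC", "ROE, RICHARD", "POE, ANN"],
    "in_out_state": ["out-of-state", "out-of-state", "in-state", "out-of-state"],
    "amount": [20000.0, 50000.0, 20000.0, 19999.0],
})

maxed_out = contributors[
    contributors["contributor"].str.contains(", ", regex=False)  # individuals only (heuristic)
    & (contributors["amount"] >= 20000)
    & (contributors["in_out_state"] == "out-of-state")
]
print(len(maxed_out))  # 1
```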


## Calculate out-of-state contributions for Iowa's Cathy Glasson and Fred Hubbell through the primary election

Filter the data to return contributions to Fred Hubbell and Cathy Glasson through the June 5, 2018 primary.

In [19]:
hubbell_primary_contributions = contributions[(contributions["candidate_id"] == 264335) & (contributions["date"] <= "2018-06-05")]
glasson_primary_contributions = contributions[(contributions["candidate_id"] == 223996) & (contributions["date"] <= "2018-06-05")]
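Comparing a datetime column with `"2018-06-05"` compares it with midnight at the start of that day. If any `date` values carry a time of day, contributions made later on primary day would be excluded. The `date` values aren't shown here, so treat this as a precaution rather than a known issue. A sketch with made-up timestamps comparing the two bounds:

```python
import pandas as pd

# Made-up contribution timestamps around the June 5, 2018 primary
dates = pd.Series(pd.to_datetime([
    "2018-06-04 09:00", "2018-06-05 00:00", "2018-06-05 14:30", "2018-06-06 10:00",
]))

primary = pd.Timestamp("2018-06-05")

# Bound at midnight: the 14:30 contribution on primary day is excluded
at_midnight = dates[dates <= primary]

# Bound at the start of the next day: all of primary day is included
whole_day = dates[dates < primary + pd.Timedelta(days=1)]

print(len(at_midnight), len(whole_day))  # 2 3
```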

In [20]:
hubbell_primary_contributions.groupby("in_out_state")["amount"].sum()

in_out_state
in-state       5,072,593.32
out-of-state   1,019,312.25
unknown            8,415.00
Name: amount, dtype: float64

In [21]:
hubbell_primary_contributions["amount"].sum()

6100320.5700000003

In [22]:
glasson_primary_contributions.groupby("in_out_state")["amount"].sum()

in_out_state
in-state         109,689.90
out-of-state   2,064,815.92
Name: amount, dtype: float64

In [23]:
glasson_primary_contributions["amount"].sum()

2174505.8200000003
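Each candidate's out-of-state share through the primary is out-of-state dollars divided by total dollars. A worked sketch using the figures printed above:

```python
# Sums copied from the outputs above
hubbell_out, hubbell_total = 1_019_312.25, 6_100_320.57
glasson_out, glasson_total = 2_064_815.92, 2_174_505.82

print(f"Hubbell: {hubbell_out / hubbell_total:.1%}")  # Hubbell: 16.7%
print(f"Glasson: {glasson_out / glasson_total:.1%}")  # Glasson: 95.0%
```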

## Export the data to Excel.

In [24]:
contributions_by_candidate.to_excel("data/contributions_by_candidate.xlsx", index=False)