# Out-of-State-Contributions: National Analysis

In [15]:
from functools import reduce
import numpy as np
import pandas as pd

%load_ext jupyternotify

pd.set_option("display.max_columns", 50)
pd.set_option("display.max_rows", 50)
pd.options.display.float_format = "{:.2f}".format # Suppress scientific notation

The jupyternotify extension is already loaded. To reload it, use:
  %reload_ext jupyternotify


Load the [National Institute on Money in Politics](https://www.followthemoney.org/) API key from a local file.

In [16]:
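with open("nimp_api_key.txt", "r") as key_file:
    nimp_key = key_file.readline().strip() # Strip the trailing newline so it does not end up in the request URL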

## Question 1: How much out-of-state money has been raised so far in the 2018 election cycle, in absolute and proportional terms, and how does that compare with the 2014 and 2010 cycles?

### Import contribution-level data on donations to gubernatorial, attorney general, secretary of state, state supreme court, state senate and state house candidates in 2018, 2014 and 2010.

Our first task is to determine a data cut-off point for prior election cycles so we can make accurate comparisons across cycles.

Download and save each cycle's contributions data.

In [3]:
%%notify
%%time
contributions_18 = pd.read_csv("https://www.followthemoney.org/aaengine/aafetch.php?dt=1&y=2018&c-exi=1&c-r-oc=Z10,Z70&c-r-ot=G,S,H,J&gro=c-t-id,d-id&APIKey="+nimp_key+"&mode=csv")
contributions_18.to_csv("data/contributions_18.csv", index=False)
contributions_14 = pd.read_csv("https://www.followthemoney.org/aaengine/aafetch.php?dt=1&y=2014&c-exi=1&c-r-oc=Z10,Z70&c-r-ot=G,S,H,J&gro=c-t-id,d-id&APIKey="+nimp_key+"&mode=csv")
contributions_14.to_csv("data/contributions_14.csv", index=False)
contributions_10 = pd.read_csv("https://www.followthemoney.org/aaengine/aafetch.php?dt=1&y=2010&c-exi=1&c-r-oc=Z10,Z70&c-r-ot=G,S,H,J&gro=c-t-id,d-id&APIKey="+nimp_key+"&mode=csv")
contributions_10.to_csv("data/contributions_10.csv", index=False)

CPU times: user 6 µs, sys: 1e+03 ns, total: 7 µs
Wall time: 13.6 µs
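
The three requests above differ only in the election year, so the URL can be assembled by a small helper. The following is a minimal sketch rather than the notebook's original code: `BASE_URL`, `QUERY`, `build_url` and `targets` are hypothetical names, and the query parameters are copied verbatim from the calls above. The download loop is left commented out because it needs a valid API key.

```python
# Hypothetical helper names; the query parameters are copied from the requests above
BASE_URL = "https://www.followthemoney.org/aaengine/aafetch.php"
QUERY = "dt=1&y={year}&c-exi=1&c-r-oc=Z10,Z70&c-r-ot=G,S,H,J&gro=c-t-id,d-id&APIKey={key}&mode=csv"


def build_url(year, api_key):
    """Return the contributions request URL for one election year."""
    return BASE_URL + "?" + QUERY.format(year=year, key=api_key)


# Map each cycle to the local CSV path used in this notebook
targets = {year: "data/contributions_{}.csv".format(str(year)[-2:]) for year in (2018, 2014, 2010)}

# for year, path in targets.items():
#     pd.read_csv(build_url(year, nimp_key)).to_csv(path, index=False)
```

Building the URL in one place means a change to the query parameters applies to every cycle at once.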



Concatenate the data.

In [17]:
%%bash
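# Write the header with > so that re-running the cell starts a fresh file instead of appending to an old one
head -1 "data/contributions_18.csv" > "data/contributions.csv"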
sed '1d' "data/contributions_18.csv" >> "data/contributions.csv"
sed '1d' "data/contributions_14.csv" >> "data/contributions.csv"
sed '1d' "data/contributions_10.csv" >> "data/contributions.csv"
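
The same concatenation could also be done in pandas instead of the shell. This is a minimal sketch, assuming the three per-cycle files share identical columns; the tiny frames below are made-up stand-ins for the real CSVs, and the names `c18`, `c14`, `c10` and `combined` are illustrative.

```python
import pandas as pd

# Made-up stand-ins for the three per-cycle files (the real data has 79 columns)
c18 = pd.DataFrame({"Election_Year": [2018], "Amount": [100.0]})
c14 = pd.DataFrame({"Election_Year": [2014], "Amount": [250.0]})
c10 = pd.DataFrame({"Election_Year": [2010], "Amount": [75.0]})

# Stack the rows; ignore_index=True rebuilds a clean 0..n-1 index
combined = pd.concat([c18, c14, c10], ignore_index=True)
print(combined)
```

The shell version avoids holding all three cycles in memory at once, which may matter for files of this size.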

Import the data.

In [None]:
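# on_bad_lines="warn" (pandas >= 1.3) replaces the removed error_bad_lines=False: malformed rows are skipped with a warning
contributions = pd.read_csv("data/contributions.csv", on_bad_lines="warn")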

b'Skipping line 1143428: expected 79 fields, saw 80\n'


Convert the contribution amount column to numeric (float) data type and convert the contribution date column to datetime data type.

In [6]:
contributions["Amount"] = pd.to_numeric(contributions["Amount"], errors="coerce")
contributions["Date"] = pd.to_datetime(contributions["Date"], errors="coerce")
contributions.info()

<class 'pandas.core.frame.DataFrame'>
RangeIndex: 2180762 entries, 0 to 2180761
Data columns (total 79 columns):
request                        object
Candidate:token                object
Candidate:id                   int64
Candidate                      object
Candidate_Entity:token         object
Candidate_Entity:id            object
Candidate_Entity               object
Election_Status:token          object
Election_Status:id             object
Election_Status                object
Status_of_Candidate:token      object
Status_of_Candidate:id         int64
Status_of_Candidate            object
Specific_Party:token           object
Specific_Party:id              int64
Specific_Party                 object
General_Party:token            object
General_Party:id               int64
General_Party                  object
Election_Jurisdiction:token    object
Election_Jurisdiction:id       object
Election_Jurisdiction          object
Election_Year:token            object
Election_Year:id 
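
With `errors="coerce"`, values that cannot be parsed become `NaN` (numbers) or `NaT` (dates) instead of raising an exception. The following small illustration uses made-up values, not rows from the contributions data, to show how the unparseable entries can be counted afterwards.

```python
import pandas as pd

# Made-up raw values: one bad amount and one bad date
raw = pd.DataFrame({
    "Amount": ["100.50", "n/a", "25"],
    "Date": ["2018-03-01", "not a date", "2017-12-15"],
})

raw["Amount"] = pd.to_numeric(raw["Amount"], errors="coerce")
raw["Date"] = pd.to_datetime(raw["Date"], errors="coerce")

# Count how many values were turned into missing markers
bad_amounts = int(raw["Amount"].isna().sum())
bad_dates = int(raw["Date"].isna().sum())
print(bad_amounts, bad_dates)
```

Checking these counts is a quick way to see how much data the coercion silently discards.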

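Filter out unitemized donations, since it is impossible to determine where those contributions originated. Then keep only the state, election year, contribution amount, contribution date and in-vs.-out-of-state columns, and rename them to the shorter names used in the rest of the notebook.

In [None]:
# Election_Year is kept because it is needed below to isolate the 2018 cycle
contributions = contributions[contributions["Contributor"] != "UNITEMIZED DONATIONS"]
contributions = contributions[["Election_Jurisdiction", "Election_Year", "Amount", "Date", "In-State"]]
contributions = contributions.rename(columns={"Election_Jurisdiction": "state", "Amount": "amount", "Date": "date", "In-State": "in_out_state"})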

Rename the categories in the in-vs.-out-of-state column.

In [None]:
# 0 = out-of-state, 1 = in-state, 2 = unknown
contributions["in_out_state"] = contributions["in_out_state"].replace({0: "out-of-state", 1: "in-state", 2: "unknown"})
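
As a sanity check on this kind of recoding, it can help to confirm that every code was mapped. This is a small sketch on made-up codes; it assumes the column holds integers, and codes stored as strings such as `"0"` would not match the integer keys and would pass through unchanged.

```python
import pandas as pd

# Made-up codes: 0 = out-of-state, 1 = in-state, 2 = unknown
codes = pd.Series([0, 1, 2, 1, 0], name="in_out_state")
mapping = {0: "out-of-state", 1: "in-state", 2: "unknown"}

labels = codes.replace(mapping)

# Any value outside the mapping's labels would indicate an unmapped code
unmapped = labels[~labels.isin(list(mapping.values()))]
counts = labels.value_counts()
print(counts)
```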

Filter the data to just 2018 cycle contributions. Then extract the month and year from the contribution date column.

In [None]:
contributions_18 = contributions[contributions["Election_Year"] == "2018"].copy() # .copy() avoids a SettingWithCopyWarning on the next line
contributions_18["month"] = contributions_18["date"].dt.to_period("M")

Group the contributions by state and month.

In [None]:
grouped_by_month = contributions_18.groupby(["state", "month"])["amount"].sum().reset_index()
grouped_by_month.info()
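
To make the month extraction and grouping concrete, here is a minimal sketch on a made-up four-row frame. The column names match the ones used above; the values are invented for illustration.

```python
import pandas as pd

# Made-up contributions: three in Alaska across two months, one in Alabama
toy = pd.DataFrame({
    "state": ["AK", "AK", "AK", "AL"],
    "date": pd.to_datetime(["2018-07-03", "2018-07-19", "2018-06-30", "2018-07-01"]),
    "amount": [100.0, 50.0, 20.0, 10.0],
})

# to_period("M") collapses each date to its year-month
toy["month"] = toy["date"].dt.to_period("M")

# One row per state and month, with the summed contribution amount
monthly = toy.groupby(["state", "month"])["amount"].sum().reset_index()
print(monthly)
```

The two July contributions in Alaska collapse into a single row, which is the behaviour the per-state cut-off logic relies on.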

Because we eventually want to use each state's month column as the cut-off date for contributions, we need to append a day to each year-month value and then convert the column to a datetime data type.

In [10]:
grouped_by_month["month"] = grouped_by_month["month"].astype(str) + "-28" # No month has fewer than 28 days
grouped_by_month["month"] = pd.to_datetime(grouped_by_month["month"], errors="coerce")
grouped_by_month.info()

<class 'pandas.core.frame.DataFrame'>
RangeIndex: 1460 entries, 0 to 1459
Data columns (total 3 columns):
state     1460 non-null object
month     1460 non-null datetime64[ns]
amount    1460 non-null float64
dtypes: datetime64[ns](1), float64(1), object(1)
memory usage: 34.3+ KB
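
The conversion above can be checked on a couple of made-up months, including February, the shortest one. The sketch below also shows an alternative, `Series.dt.to_timestamp`, which converts a period column directly to the first day of each month; that alternative is offered for comparison and is not what this notebook uses.

```python
import pandas as pd

# Made-up year-month periods, including February
months = pd.Series(pd.PeriodIndex(["2018-02", "2017-12"], freq="M"))

# Approach used in this notebook: append "-28" and parse as a date
as_dates = pd.to_datetime(months.astype(str) + "-28", errors="coerce")

# Alternative for comparison: first day of each month via the period accessor
month_start = months.dt.to_timestamp()

print(as_dates.tolist())
print(month_start.tolist())
```

Pinning to the 28th keeps the date inside every month, so the parse never fails for February.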


Some of the contribution dates are clearly wrong: they fall in the future and, unless we've got some time-travelling campaign donors, must be data entry errors. To eliminate this noise, we will filter out months after August 2018.

In [12]:
grouped_by_month = grouped_by_month[grouped_by_month["month"] <= "2018-08-28"]
grouped_by_month.info()

<class 'pandas.core.frame.DataFrame'>
Int64Index: 1173 entries, 3 to 1459
Data columns (total 3 columns):
state     1173 non-null object
month     1173 non-null datetime64[ns]
amount    1173 non-null float64
dtypes: datetime64[ns](1), float64(1), object(1)
memory usage: 36.7+ KB
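
Before discarding rows it can be useful to look at what the cut-off removes. The following is a minimal sketch on made-up rows; `cutoff`, `kept` and `dropped` are illustrative names, and the comparison uses an explicit `pd.Timestamp` instead of a string.

```python
import pandas as pd

# Made-up monthly totals, one of them dated in the future relative to the cut-off
toy = pd.DataFrame({
    "state": ["FL", "FL", "AZ"],
    "month": pd.to_datetime(["2018-08-28", "2019-01-28", "2017-12-28"]),
    "amount": [10.0, 5.0, 7.0],
})

cutoff = pd.Timestamp("2018-08-28")

kept = toy[toy["month"] <= cutoff]
dropped = toy[toy["month"] > cutoff]

# Report how many rows and how much money the filter removes
print(len(dropped), dropped["amount"].sum())
```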


Return the most recent month of contributions for each state.

In [13]:
latest_month = grouped_by_month.groupby("state")["month"].max().reset_index()
latest_month.rename(columns={"month": "latest_month"}, inplace=True)
latest_month

Unnamed: 0,state,latest_month
0,AK,2018-07-28
1,AL,2018-07-28
2,AR,2018-03-28
3,AZ,2017-12-28
4,CA,2018-07-28
5,CO,2018-06-28
6,CT,2018-03-28
7,FL,2018-08-28
8,GA,2018-07-28
9,HI,2017-12-28
