# Out-of-State Contributions: National Analysis

How much out-of-state money have candidates nationwide raised so far in the 2018 election cycle, in absolute and proportional terms, and how does that compare with the same point in the 2014 and 2010 cycles?

In [49]:
from functools import reduce
import numpy as np
import pandas as pd

%load_ext jupyternotify

pd.set_option("display.max_columns", 100)
pd.set_option("display.max_rows", 500)
pd.options.display.float_format = "{:,.2f}".format # Format floats


Import contributions data.

In [50]:
%%notify
contributions = pd.read_csv("data/contributions.csv")
contributions.info()

<class 'pandas.core.frame.DataFrame'>
RangeIndex: 6051953 entries, 0 to 6051952
Data columns (total 18 columns):
candidate                 object
year                      int64
state                     object
party                     object
election_status           object
contributor               object
amount                    float64
date                      object
in_out_state              object
no_veto                   object
office                    object
latest_month              object
redistricting_role        object
independent_commission    object
single_house_district     object
standardized_office       object
standardized_status       object
two_year_term             object
dtypes: float64(1), int64(1), object(16)
memory usage: 831.1+ MB


Convert the contribution date and latest month columns to the datetime data type, and the redistricting role column to the string (object) data type.

In [51]:
contributions["date"] = pd.to_datetime(contributions["date"], errors="coerce")
contributions["latest_month"] = pd.to_datetime(contributions["latest_month"], errors="coerce")
contributions["redistricting_role"] = contributions["redistricting_role"].astype(object)
contributions.info()

<class 'pandas.core.frame.DataFrame'>
RangeIndex: 6051953 entries, 0 to 6051952
Data columns (total 18 columns):
candidate                 object
year                      int64
state                     object
party                     object
election_status           object
contributor               object
amount                    float64
date                      datetime64[ns]
in_out_state              object
no_veto                   object
office                    object
latest_month              datetime64[ns]
redistricting_role        object
independent_commission    object
single_house_district     object
standardized_office       object
standardized_status       object
two_year_term             object
dtypes: datetime64[ns](2), float64(1), int64(1), object(14)
memory usage: 831.1+ MB
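
With `errors="coerce"`, values that cannot be parsed become `NaT` instead of raising an exception, so it can be worth counting them after the conversion. A minimal sketch on made-up data (the `sample` frame is hypothetical and not part of the contributions file):

```python
import pandas as pd

# Hypothetical sample; the date formats in the real file may differ.
sample = pd.DataFrame({"date": ["2018-03-01", "not a date", None]})

sample["date"] = pd.to_datetime(sample["date"], errors="coerce")

# Count the values that could not be parsed (they are now NaT).
unparsed = sample["date"].isna().sum()
print(unparsed)
```

Running the same count on `contributions["date"]` would show how many records lost their date in the conversion.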


## Calculate out-of-state contributions by party and year

Group by year, party and in-state vs. out-of-state status, then sum the contribution amounts.

In [52]:
contributions_by_party = contributions.groupby(["year", "party", "in_out_state"])["amount"].sum().reset_index()
contributions_by_party

Unnamed: 0,year,party,in_out_state,amount
0,2010,Democratic,in-state,454991330.51
1,2010,Democratic,out-of-state,56810108.85
2,2010,Democratic,unknown,1312244.72
3,2010,Nonpartisan,in-state,942676.0
4,2010,Nonpartisan,out-of-state,136753.7
5,2010,Nonpartisan,unknown,-19688.72
6,2010,Republican,in-state,631986359.03
7,2010,Republican,out-of-state,48159238.63
8,2010,Republican,unknown,627736.24
9,2010,Third-Party,in-state,5257658.51


Pivot dataframe to aggregate each party's data in a single row.

In [53]:
contributions_by_party = pd.pivot_table(contributions_by_party, index=["party"], columns=["year", "in_out_state"]).reset_index()
contributions_by_party

Unnamed: 0_level_0,party,amount,amount,amount,amount,amount,amount,amount,amount,amount
year,Unnamed: 1_level_1,2010,2010,2010,2014,2014,2014,2018,2018,2018
in_out_state,Unnamed: 1_level_2,in-state,out-of-state,unknown,in-state,out-of-state,unknown,in-state,out-of-state,unknown
0,Democratic,454991330.51,56810108.85,1312244.72,380863502.49,65669233.08,5969356.57,593342876.83,87513130.1,8089220.32
1,Nonpartisan,942676.0,136753.7,-19688.72,1672746.69,156743.05,4068.0,1376332.38,310506.3,-3156.23
2,Republican,631986359.03,48159238.63,627736.24,445191983.93,72850170.2,10986718.52,700498617.2,66420967.5,8863713.0
3,Third-Party,5257658.51,607163.33,1531169.31,9019459.94,533565.54,37435.4,3810315.64,340839.0,114731.23
4,Unknown,,,,,,,28042.01,7944.09,
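
`pivot_table` sorts the column labels, so the year level comes out in ascending order (2010, 2014, 2018), as the header rows above show. One way to avoid hand-typing flat names in that order is to derive them from the MultiIndex itself. A minimal sketch on toy data, where the `flat_name` helper and the sample values are assumptions for illustration:

```python
import pandas as pd

# Toy stand-in for the grouped contributions (values are made up).
grouped = pd.DataFrame({
    "year": [2010, 2010, 2018, 2018],
    "party": ["A", "A", "A", "A"],
    "in_out_state": ["in-state", "out-of-state", "in-state", "out-of-state"],
    "amount": [10.0, 2.0, 30.0, 5.0],
})

pivoted = pd.pivot_table(
    grouped, index=["party"], columns=["year", "in_out_state"]
).reset_index()


def flat_name(col):
    """Turn ('amount', 2010, 'in-state') into '10_in_state'; keep 'party' as is."""
    top, year, category = col
    if top != "amount":
        return top
    return f"{str(year)[-2:]}_{category.replace('-', '_')}"


pivoted.columns = [flat_name(col) for col in pivoted.columns]
print(list(pivoted.columns))
```

Because each name is built from the column's own labels, the year in the name cannot drift out of step with the data underneath it.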


Some parties have no contributions in certain categories or cycles, which leaves missing values in the pivoted table. Let's set those values to zero so the totals we calculate from them are not missing too.

In [54]:
contributions_by_party.fillna(0, inplace=True)
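
The reason for filling the gaps is that arithmetic involving `NaN` returns `NaN`, so a single empty category would blank out a row's total. A small sketch with made-up numbers:

```python
import numpy as np
import pandas as pd

# Made-up amounts; the second row has no "unknown" contributions.
df = pd.DataFrame({
    "in_state": [100.0, 50.0],
    "out_of_state": [20.0, 10.0],
    "unknown": [5.0, np.nan],
})

# Without filling, the second total is NaN.
total_before = df["in_state"] + df["out_of_state"] + df["unknown"]

# After filling with zero, both totals are defined.
filled = df.fillna(0)
total_after = filled["in_state"] + filled["out_of_state"] + filled["unknown"]

print(total_before.tolist())
print(total_after.tolist())
```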

Flatten the resulting dataframe's multi-index columns.

In [55]:
# pivot_table sorts the year level in ascending order, so the columns arrive as 2010, 2014, 2018.
contributions_by_party.columns = ["party", "10_in_state", "10_out_of_state", "10_unknown",
                                  "14_in_state", "14_out_of_state", "14_unknown",
                                  "18_in_state", "18_out_of_state", "18_unknown"]
contributions_by_party

Unnamed: 0,party,10_in_state,10_out_of_state,10_unknown,14_in_state,14_out_of_state,14_unknown,18_in_state,18_out_of_state,18_unknown
0,Democratic,454991330.51,56810108.85,1312244.72,380863502.49,65669233.08,5969356.57,593342876.83,87513130.1,8089220.32
1,Nonpartisan,942676.0,136753.7,-19688.72,1672746.69,156743.05,4068.0,1376332.38,310506.3,-3156.23
2,Republican,631986359.03,48159238.63,627736.24,445191983.93,72850170.2,10986718.52,700498617.2,66420967.5,8863713.0
3,Third-Party,5257658.51,607163.33,1531169.31,9019459.94,533565.54,37435.4,3810315.64,340839.0,114731.23
4,Unknown,0.0,0.0,0.0,0.0,0.0,0.0,28042.01,7944.09,0.0


Calculate the total contributions by cycle.

In [56]:
contributions_by_party["18_total"] = contributions_by_party["18_in_state"] + contributions_by_party["18_out_of_state"] + contributions_by_party["18_unknown"]
contributions_by_party["14_total"] = contributions_by_party["14_in_state"] + contributions_by_party["14_out_of_state"] + contributions_by_party["14_unknown"]
contributions_by_party["10_total"] = contributions_by_party["10_in_state"] + contributions_by_party["10_out_of_state"] + contributions_by_party["10_unknown"]
# Reorder the columns, most recent cycle first; .copy() avoids a SettingWithCopyWarning when columns are added below.
contributions_by_party = contributions_by_party[["party", "18_in_state", "18_out_of_state", "18_unknown", "18_total", "14_in_state", "14_out_of_state", "14_unknown", "14_total", "10_in_state", "10_out_of_state", "10_unknown", "10_total"]].copy()

Calculate the proportion of in-state, out-of-state and unknown contributions by cycle.

In [57]:
# Divide each category by the total for the same cycle.
for year in ["18", "14", "10"]:
    for category in ["in_state", "out_of_state", "unknown"]:
        contributions_by_party[f"pct_{year}_{category}"] = contributions_by_party[f"{year}_{category}"] / contributions_by_party[f"{year}_total"]
contributions_by_party

Unnamed: 0,party,18_in_state,18_out_of_state,18_unknown,18_total,14_in_state,14_out_of_state,14_unknown,14_total,10_in_state,10_out_of_state,10_unknown,10_total,pct_18_in_state,pct_18_out_of_state,pct_18_unknown,pct_14_in_state,pct_14_out_of_state,pct_14_unknown,pct_10_in_state,pct_10_out_of_state,pct_10_unknown
0,Democratic,593342876.83,87513130.1,8089220.32,688945227.25,380863502.49,65669233.08,5969356.57,452502092.14,454991330.51,56810108.85,1312244.72,513113684.08,0.86,0.13,0.01,0.84,0.15,0.01,0.89,0.11,0.0
1,Nonpartisan,1376332.38,310506.3,-3156.23,1683682.45,1672746.69,156743.05,4068.0,1833557.74,942676.0,136753.7,-19688.72,1059740.98,0.82,0.18,-0.0,0.91,0.09,0.0,0.89,0.13,-0.02
2,Republican,700498617.2,66420967.5,8863713.0,775783297.7,445191983.93,72850170.2,10986718.52,529028872.65,631986359.03,48159238.63,627736.24,680773333.9,0.9,0.09,0.01,0.84,0.14,0.02,0.93,0.07,0.0
3,Third-Party,3810315.64,340839.0,114731.23,4265885.87,9019459.94,533565.54,37435.4,9590460.88,5257658.51,607163.33,1531169.31,7395991.15,0.89,0.08,0.03,0.94,0.06,0.0,0.71,0.08,0.21
4,Unknown,28042.01,7944.09,0.0,35986.1,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.78,0.22,0.0,,,,,,


## Calculate 2018 out-of-state contributions by redistricting role

Filter the contributions data to the 2018 cycle.

In [58]:
contributions_18 = contributions[contributions["year"] == 2018]
contributions_18.info()

<class 'pandas.core.frame.DataFrame'>
Int64Index: 2275012 entries, 0 to 2275011
Data columns (total 18 columns):
candidate                 object
year                      int64
state                     object
party                     object
election_status           object
contributor               object
amount                    float64
date                      datetime64[ns]
in_out_state              object
no_veto                   object
office                    object
latest_month              datetime64[ns]
redistricting_role        object
independent_commission    object
single_house_district     object
standardized_office       object
standardized_status       object
two_year_term             object
dtypes: datetime64[ns](2), float64(1), int64(1), object(14)
memory usage: 329.8+ MB


Group the 2018 contributions by redistricting role and in-state vs. out-of-state status, then sum the contribution amounts.

In [59]:
contributions_by_redistricting = contributions_18.groupby(["redistricting_role", "in_out_state"])["amount"].sum().reset_index()
contributions_by_redistricting

Unnamed: 0,redistricting_role,in_out_state,amount
0,N,in-state,222189963.25
1,N,out-of-state,27213022.74
2,N,unknown,4360094.74
3,Y,in-state,1076866220.81
4,Y,out-of-state,127380364.25
5,Y,unknown,12704413.58


Pivot dataframe to aggregate each role's data in a single row.

In [62]:
contributions_by_redistricting = pd.pivot_table(contributions_by_redistricting, index=["redistricting_role"], columns=["in_out_state"]).reset_index()
contributions_by_redistricting

Unnamed: 0_level_0,redistricting_role,amount,amount,amount
in_out_state,Unnamed: 1_level_1,in-state,out-of-state,unknown
0,N,222189963.25,27213022.74,4360094.74
1,Y,1076866220.81,127380364.25,12704413.58


Flatten the resulting dataframe's multi-index columns.

In [63]:
contributions_by_redistricting.columns = ["redistricting_role", "in_state", "out_of_state", "unknown"]
contributions_by_redistricting

Unnamed: 0,redistricting_role,in_state,out_of_state,unknown
0,N,222189963.25,27213022.74,4360094.74
1,Y,1076866220.81,127380364.25,12704413.58


Calculate the total contributions by redistricting role.

In [64]:
contributions_by_redistricting["total"] = contributions_by_redistricting["in_state"] + contributions_by_redistricting["out_of_state"] + contributions_by_redistricting["unknown"]

Calculate the proportion of in-state, out-of-state and unknown contributions by redistricting role.

In [65]:
contributions_by_redistricting["pct_in_state"] = contributions_by_redistricting["in_state"] / contributions_by_redistricting["total"]
contributions_by_redistricting["pct_out_of_state"] = contributions_by_redistricting["out_of_state"] / contributions_by_redistricting["total"]
contributions_by_redistricting["pct_unknown"] = contributions_by_redistricting["unknown"] / contributions_by_redistricting["total"]
contributions_by_redistricting

Unnamed: 0,redistricting_role,in_state,out_of_state,unknown,total,pct_in_state,pct_out_of_state,pct_unknown
0,N,222189963.25,27213022.74,4360094.74,253763080.73,0.88,0.11,0.02
1,Y,1076866220.81,127380364.25,12704413.58,1216950998.64,0.88,0.1,0.01
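
The three proportion columns can also be computed in one step by dividing every category column by the row total with `DataFrame.div`. The sketch below re-enters the amounts from the table above by hand and only illustrates that alternative; the `by_role` and `shares` names are assumptions:

```python
import pandas as pd

# Amounts copied from the table above.
by_role = pd.DataFrame({
    "redistricting_role": ["N", "Y"],
    "in_state": [222189963.25, 1076866220.81],
    "out_of_state": [27213022.74, 127380364.25],
    "unknown": [4360094.74, 12704413.58],
})

categories = ["in_state", "out_of_state", "unknown"]
by_role["total"] = by_role[categories].sum(axis=1)

# Divide each category column by the row total (axis=0 aligns on the row index).
shares = by_role[categories].div(by_role["total"], axis=0).add_prefix("pct_")
by_role = pd.concat([by_role, shares], axis=1)

print(by_role.round(2))
```

Since each row of `shares` is divided by its own total, the three proportions in a row always sum to 1, which makes for a quick sanity check.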


## Calculate contributions by candidate status and year

Group by year, candidate status and in-state vs. out-of-state status, then sum the contribution amounts.

In [41]:
contributions_by_status = contributions.groupby(["year", "standardized_status", "in_out_state"])["amount"].sum().reset_index()
contributions_by_status

Unnamed: 0,year,standardized_status,in_out_state,amount
0,2010,ADVANCED TO GENERAL,in-state,854672580.19
1,2010,ADVANCED TO GENERAL,out-of-state,89236961.16
2,2010,ADVANCED TO GENERAL,unknown,2732449.53
3,2010,DID NOT ADVANCE,in-state,238505443.86
4,2010,DID NOT ADVANCE,out-of-state,16476303.35
5,2010,DID NOT ADVANCE,unknown,719012.02
6,2014,ADVANCED TO GENERAL,in-state,710162108.85
7,2014,ADVANCED TO GENERAL,out-of-state,120696904.53
8,2014,ADVANCED TO GENERAL,unknown,11732958.7
9,2014,DID NOT ADVANCE,in-state,126585584.2


Pivot dataframe to aggregate each candidate status' data in a single row.

In [42]:
contributions_by_status = pd.pivot_table(contributions_by_status, index=["standardized_status"], columns=["year", "in_out_state"]).reset_index()
contributions_by_status

Unnamed: 0_level_0,standardized_status,amount,amount,amount,amount,amount,amount,amount,amount,amount
year,Unnamed: 1_level_1,2010,2010,2010,2014,2014,2014,2018,2018,2018
in_out_state,Unnamed: 1_level_2,in-state,out-of-state,unknown,in-state,out-of-state,unknown,in-state,out-of-state,unknown
0,ADVANCED TO GENERAL,854672580.19,89236961.16,2732449.53,710162108.85,120696904.53,11732958.7,929935372.71,114551238.31,11040180.78
1,DID NOT ADVANCE,238505443.86,16476303.35,719012.02,126585584.2,18512807.34,5264619.79,369120811.35,40042148.68,6024327.54


Flatten the resulting dataframe's multi-index columns.

In [44]:
# pivot_table sorts the year level in ascending order, so the columns arrive as 2010, 2014, 2018.
contributions_by_status.columns = ["standardized_status", "10_in_state", "10_out_of_state", "10_unknown",
                                   "14_in_state", "14_out_of_state", "14_unknown",
                                   "18_in_state", "18_out_of_state", "18_unknown"]
contributions_by_status

Unnamed: 0,standardized_status,10_in_state,10_out_of_state,10_unknown,14_in_state,14_out_of_state,14_unknown,18_in_state,18_out_of_state,18_unknown
0,ADVANCED TO GENERAL,854672580.19,89236961.16,2732449.53,710162108.85,120696904.53,11732958.7,929935372.71,114551238.31,11040180.78
1,DID NOT ADVANCE,238505443.86,16476303.35,719012.02,126585584.2,18512807.34,5264619.79,369120811.35,40042148.68,6024327.54


Calculate the total contributions by cycle.

In [46]:
contributions_by_status["18_total"] = contributions_by_status["18_in_state"] + contributions_by_status["18_out_of_state"] + contributions_by_status["18_unknown"]
contributions_by_status["14_total"] = contributions_by_status["14_in_state"] + contributions_by_status["14_out_of_state"] + contributions_by_status["14_unknown"]
contributions_by_status["10_total"] = contributions_by_status["10_in_state"] + contributions_by_status["10_out_of_state"] + contributions_by_status["10_unknown"]
# Reorder the columns, most recent cycle first; .copy() avoids a SettingWithCopyWarning when columns are added below.
contributions_by_status = contributions_by_status[["standardized_status", "18_in_state", "18_out_of_state", "18_unknown", "18_total", "14_in_state", "14_out_of_state", "14_unknown", "14_total", "10_in_state", "10_out_of_state", "10_unknown", "10_total"]].copy()

Calculate the proportion of in-state, out-of-state and unknown contributions by cycle.

In [47]:
# Divide each category by the total for the same cycle.
for year in ["18", "14", "10"]:
    for category in ["in_state", "out_of_state", "unknown"]:
        contributions_by_status[f"pct_{year}_{category}"] = contributions_by_status[f"{year}_{category}"] / contributions_by_status[f"{year}_total"]
contributions_by_status

Unnamed: 0,standardized_status,18_in_state,18_out_of_state,18_unknown,18_total,14_in_state,14_out_of_state,14_unknown,14_total,10_in_state,10_out_of_state,10_unknown,10_total,pct_18_in_state,pct_18_out_of_state,pct_18_unknown,pct_14_in_state,pct_14_out_of_state,pct_14_unknown,pct_10_in_state,pct_10_out_of_state,pct_10_unknown
0,ADVANCED TO GENERAL,929935372.71,114551238.31,11040180.78,1055526791.8,710162108.85,120696904.53,11732958.7,842591972.08,854672580.19,89236961.16,2732449.53,946641990.88,0.88,0.11,0.01,0.84,0.14,0.01,0.9,0.09,0.0
1,DID NOT ADVANCE,369120811.35,40042148.68,6024327.54,415187287.57,126585584.2,18512807.34,5264619.79,150363011.33,238505443.86,16476303.35,719012.02,255700759.23,0.89,0.1,0.01,0.84,0.12,0.04,0.93,0.06,0.0


## Export the data

In [66]:
# The context manager saves and closes the workbook on exit (writer.save() was removed in pandas 2.0).
with pd.ExcelWriter("data/national_analysis.xlsx") as writer:
    contributions_by_party.to_excel(writer, sheet_name="contributions_by_party", index=False)
    contributions_by_redistricting.to_excel(writer, sheet_name="contributions_by_redistricting", index=False)
    contributions_by_status.to_excel(writer, sheet_name="contributions_by_status", index=False)