# Out-of-State Contributions: National Analysis

How much out-of-state money have candidates nationwide raised so far in the 2018 election cycle, in absolute and proportional terms, and how does that compare with the same point in the 2014 and 2010 cycles?

In [1]:
import numpy as np
import pandas as pd

%load_ext jupyternotify

pd.set_option("display.max_columns", 100)
pd.set_option("display.max_rows", 500)
pd.options.display.float_format = "{:,.2f}".format # Format floats

Import contributions data.

In [2]:
%%notify
contributions = pd.read_csv("data/contributions.csv")
contributions.info()

<class 'pandas.core.frame.DataFrame'>
RangeIndex: 6867306 entries, 0 to 6867305
Data columns (total 23 columns):
candidate                 object
candidate_id              int64
year                      int64
state                     object
party                     object
election_status           object
contributor               object
amount                    float64
date                      object
contributor_street        object
contributor_city          object
contributor_state         object
contributor_zip           float64
in_out_state              object
no_veto                   object
office                    object
latest_month              object
redistricting_role        object
independent_commission    object
single_house_district     object
standardized_office       object
standardized_status       object
two_year_term             object
dtypes: float64(2), int64(2), object(19)
memory usage: 1.2+ GB


Downcast the candidate ID and year columns to smaller integer types, parse the contribution date and latest month columns as datetimes, and cast the contributor ZIP column to object.

In [3]:
contributions["candidate_id"] = pd.to_numeric(contributions["candidate_id"], errors="coerce", downcast="integer")
contributions["year"] = pd.to_numeric(contributions["year"], errors="coerce", downcast="integer")
contributions["date"] = pd.to_datetime(contributions["date"], errors="coerce")
contributions["latest_month"] = pd.to_datetime(contributions["latest_month"], errors="coerce")
contributions["contributor_zip"] = contributions["contributor_zip"].astype(object)
contributions.info()

<class 'pandas.core.frame.DataFrame'>
RangeIndex: 6867306 entries, 0 to 6867305
Data columns (total 23 columns):
candidate                 object
candidate_id              int32
year                      int16
state                     object
party                     object
election_status           object
contributor               object
amount                    float64
date                      datetime64[ns]
contributor_street        object
contributor_city          object
contributor_state         object
contributor_zip           object
in_out_state              object
no_veto                   object
office                    object
latest_month              datetime64[ns]
redistricting_role        object
independent_commission    object
single_house_district     object
standardized_office       object
standardized_status       object
two_year_term             object
dtypes: datetime64[ns](2), float64(1), int16(1), int32(1), object(18)
memory usage: 1.1+ GB
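
The same conversions could instead be requested when the file is read. The sketch below is only an illustration: it uses a tiny in-memory CSV with invented rows in place of `data/contributions.csv`, and it assumes the ID and year columns have no missing values. Reading the ZIP column as a string also keeps leading zeros, which a float column cannot hold.

```python
import io

import pandas as pd

# Invented sample rows standing in for data/contributions.csv (assumption).
sample_csv = io.StringIO(
    "candidate_id,year,amount,date,contributor_zip\n"
    "101,2018,250.00,2017-06-30,02139\n"
    "102,2014,-50.00,2013-11-05,\n"
)

sample = pd.read_csv(
    sample_csv,
    # Integer dtypes only work here if the columns contain no missing values.
    dtype={"candidate_id": "int32", "year": "int16", "contributor_zip": str},
    parse_dates=["date"],
)
print(sample.dtypes)
print(sample["contributor_zip"].iloc[0])  # the leading zero survives
```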


Import full cycle contributions data.

In [4]:
%%notify
contributions_full_cycles = pd.read_csv("data/contributions_full_cycles.csv")
contributions_full_cycles.info()

<class 'pandas.core.frame.DataFrame'>
RangeIndex: 9106128 entries, 0 to 9106127
Data columns (total 17 columns):
candidate_id           int64
candidate              object
election_status        object
party                  object
state                  object
year                   int64
office                 object
contributor            object
amount                 float64
date                   object
contributor_street     object
contributor_city       object
contributor_state      object
contributor_zip        float64
in_out_state           object
standardized_office    object
standardized_status    object
dtypes: float64(2), int64(2), object(13)
memory usage: 1.2+ GB


Downcast the candidate ID and year columns to smaller integer types, parse the contribution date column as a datetime, and cast the contributor ZIP column to object.

In [5]:
contributions_full_cycles["candidate_id"] = pd.to_numeric(contributions_full_cycles["candidate_id"], errors="coerce", downcast="integer")
contributions_full_cycles["year"] = pd.to_numeric(contributions_full_cycles["year"], errors="coerce", downcast="integer")
contributions_full_cycles["date"] = pd.to_datetime(contributions_full_cycles["date"], errors="coerce")
contributions_full_cycles["contributor_zip"] = contributions_full_cycles["contributor_zip"].astype(object)
contributions_full_cycles.info()

<class 'pandas.core.frame.DataFrame'>
RangeIndex: 9106128 entries, 0 to 9106127
Data columns (total 17 columns):
candidate_id           int32
candidate              object
election_status        object
party                  object
state                  object
year                   int16
office                 object
contributor            object
amount                 float64
date                   datetime64[ns]
contributor_street     object
contributor_city       object
contributor_state      object
contributor_zip        object
in_out_state           object
standardized_office    object
standardized_status    object
dtypes: datetime64[ns](1), float64(1), int16(1), int32(1), object(13)
memory usage: 1.1+ GB


## Calculate out-of-state contributions by party and year

Group by year, party and in-state vs. out-of-state status, then sum the contribution amounts.

In [6]:
contributions_by_party = contributions.groupby(["year", "party", "in_out_state"])["amount"].sum().reset_index()
contributions_by_party.head()

Unnamed: 0,year,party,in_out_state,amount
0,2010,Democratic,in-state,511599401.37
1,2010,Democratic,out-of-state,63773103.1
2,2010,Democratic,unknown,1670811.78
3,2010,Nonpartisan,in-state,994909.24
4,2010,Nonpartisan,out-of-state,142553.7
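
Because the grouped figure is a plain sum, a group's total goes negative whenever its negative amounts outweigh its positive ones. The toy frame below uses invented rows to show the mechanics; treating the negative amount as a refund or correction is an assumption about what such entries mean in the real data.

```python
import pandas as pd

# Invented rows; the negative amount stands in for a refund or correction.
toy = pd.DataFrame({
    "year": [2010, 2010, 2010, 2010],
    "party": ["Nonpartisan"] * 4,
    "in_out_state": ["in-state", "in-state", "unknown", "unknown"],
    "amount": [500.0, 250.0, 100.0, -300.0],
})

# Same grouping as the cell above: one summed row per year, party and status.
net = toy.groupby(["year", "party", "in_out_state"])["amount"].sum().reset_index()
print(net)
```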


Pivot dataframe to aggregate each party's data in a single row.

In [7]:
contributions_by_party = pd.pivot_table(contributions_by_party, index=["party"], columns=["year", "in_out_state"]).reset_index()
contributions_by_party

Unnamed: 0_level_0,party,amount,amount,amount,amount,amount,amount,amount,amount,amount
year,Unnamed: 1_level_1,2010,2010,2010,2014,2014,2014,2018,2018,2018
in_out_state,Unnamed: 1_level_2,in-state,out-of-state,unknown,in-state,out-of-state,unknown,in-state,out-of-state,unknown
0,Democratic,511599401.37,63773103.1,1670811.78,438568693.79,74661507.08,6296267.27,694938481.84,100883207.06,10117328.46
1,Nonpartisan,994909.24,142553.7,-19438.72,1781565.01,178003.05,4068.0,1642246.38,336165.87,-2307.58
2,Republican,701750841.97,57352634.7,875940.46,498905779.94,78945614.02,11198512.65,773921813.64,71364530.01,9472896.23
3,Third-Party,6863698.57,852177.89,1538978.63,12669293.15,627085.94,40286.04,3309432.13,339879.37,486249.28
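
`pd.pivot_table` aggregates with the mean by default. That is harmless here because the grouped frame holds exactly one row per party, year and status, but the intent can be spelled out. The sketch below, on invented numbers, passes `aggfunc="sum"` and `fill_value=0` so that combinations with no rows come out as zero rather than missing.

```python
import pandas as pd

# Invented grouped data; party "B" has no 2014 out-of-state or 2018 rows.
grouped = pd.DataFrame({
    "year": [2014, 2014, 2018, 2014],
    "party": ["A", "A", "A", "B"],
    "in_out_state": ["in-state", "out-of-state", "in-state", "in-state"],
    "amount": [100.0, 20.0, 150.0, 70.0],
})

wide = pd.pivot_table(
    grouped,
    values="amount",
    index="party",
    columns=["year", "in_out_state"],
    aggfunc="sum",   # explicit, instead of the default mean
    fill_value=0,    # empty combinations become 0
)
print(wide)
```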


Some records have no contributions for certain categories. Let's set those values equal to zero to be sure any calculations we run on them are correct.

In [8]:
contributions_by_party.fillna(0, inplace=True)

Flatten the resulting dataframe's multi-index columns.

In [9]:
contributions_by_party.columns = ["party", "10_in_state", "10_out_of_state", "10_unknown",
                                  "14_in_state", "14_out_of_state", "14_unknown",
                                  "18_in_state", "18_out_of_state", "18_unknown"
                                  ]
contributions_by_party

Unnamed: 0,party,10_in_state,10_out_of_state,10_unknown,14_in_state,14_out_of_state,14_unknown,18_in_state,18_out_of_state,18_unknown
0,Democratic,511599401.37,63773103.1,1670811.78,438568693.79,74661507.08,6296267.27,694938481.84,100883207.06,10117328.46
1,Nonpartisan,994909.24,142553.7,-19438.72,1781565.01,178003.05,4068.0,1642246.38,336165.87,-2307.58
2,Republican,701750841.97,57352634.7,875940.46,498905779.94,78945614.02,11198512.65,773921813.64,71364530.01,9472896.23
3,Third-Party,6863698.57,852177.89,1538978.63,12669293.15,627085.94,40286.04,3309432.13,339879.37,486249.28
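
A hand-typed list of names only works if it matches the pivoted column order exactly. As a hedged alternative, the names can be derived from the column index itself. The sketch below uses an invented frame and flattens before calling `reset_index`, so every column is a `(year, status)` pair.

```python
import pandas as pd

# Invented data for illustration.
toy = pd.DataFrame({
    "party": ["A", "A", "B", "B"],
    "year": [2014, 2018, 2014, 2018],
    "in_out_state": ["in-state", "out-of-state", "in-state", "in-state"],
    "amount": [100.0, 25.0, 40.0, 60.0],
})

wide = toy.pivot_table(
    values="amount",
    index="party",
    columns=["year", "in_out_state"],
    aggfunc="sum",
    fill_value=0,
)

# Build names such as "14_in_state" from each (year, status) pair.
wide.columns = [
    f"{str(year)[-2:]}_{status.replace('-', '_')}" for year, status in wide.columns
]
wide = wide.reset_index()
print(list(wide.columns))
```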


Calculate the total contributions by cycle.

In [10]:
contributions_by_party["18_total"] = contributions_by_party["18_in_state"] + contributions_by_party["18_out_of_state"] + contributions_by_party["18_unknown"]
contributions_by_party["14_total"] = contributions_by_party["14_in_state"] + contributions_by_party["14_out_of_state"] + contributions_by_party["14_unknown"]
contributions_by_party["10_total"] = contributions_by_party["10_in_state"] + contributions_by_party["10_out_of_state"] + contributions_by_party["10_unknown"]
# Reorder the columns; .copy() makes later column assignments safe from SettingWithCopyWarning.
contributions_by_party = contributions_by_party[["party", "18_in_state", "18_out_of_state", "18_unknown", "18_total", "14_in_state", "14_out_of_state", "14_unknown", "14_total", "10_in_state", "10_out_of_state", "10_unknown", "10_total"]].copy()

Calculate the proportion of in-state, out-of-state and unknown contributions by cycle.

In [11]:
contributions_by_party["pct_18_in_state"] = contributions_by_party["18_in_state"] / (contributions_by_party["18_in_state"] + contributions_by_party["18_out_of_state"] + contributions_by_party["18_unknown"])
contributions_by_party["pct_18_out_of_state"] = contributions_by_party["18_out_of_state"] / (contributions_by_party["18_in_state"] + contributions_by_party["18_out_of_state"] + contributions_by_party["18_unknown"])
contributions_by_party["pct_18_unknown"] = contributions_by_party["18_unknown"] / (contributions_by_party["18_in_state"] + contributions_by_party["18_out_of_state"] + contributions_by_party["18_unknown"])
contributions_by_party["pct_14_in_state"] = contributions_by_party["14_in_state"] / (contributions_by_party["14_in_state"] + contributions_by_party["14_out_of_state"] + contributions_by_party["14_unknown"])
contributions_by_party["pct_14_out_of_state"] = contributions_by_party["14_out_of_state"] / (contributions_by_party["14_in_state"] + contributions_by_party["14_out_of_state"] + contributions_by_party["14_unknown"])
contributions_by_party["pct_14_unknown"] = contributions_by_party["14_unknown"] / (contributions_by_party["14_in_state"] + contributions_by_party["14_out_of_state"] + contributions_by_party["14_unknown"])
contributions_by_party["pct_10_in_state"] = contributions_by_party["10_in_state"] / (contributions_by_party["10_in_state"] + contributions_by_party["10_out_of_state"] + contributions_by_party["10_unknown"])
contributions_by_party["pct_10_out_of_state"] = contributions_by_party["10_out_of_state"] / (contributions_by_party["10_in_state"] + contributions_by_party["10_out_of_state"] + contributions_by_party["10_unknown"])
contributions_by_party["pct_10_unknown"] = contributions_by_party["10_unknown"] / (contributions_by_party["10_in_state"] + contributions_by_party["10_out_of_state"] + contributions_by_party["10_unknown"])
contributions_by_party

Unnamed: 0,party,18_in_state,18_out_of_state,18_unknown,18_total,14_in_state,14_out_of_state,14_unknown,14_total,10_in_state,10_out_of_state,10_unknown,10_total,pct_18_in_state,pct_18_out_of_state,pct_18_unknown,pct_14_in_state,pct_14_out_of_state,pct_14_unknown,pct_10_in_state,pct_10_out_of_state,pct_10_unknown
0,Democratic,694938481.84,100883207.06,10117328.46,805939017.36,438568693.79,74661507.08,6296267.27,519526468.14,511599401.37,63773103.1,1670811.78,577043316.25,0.86,0.13,0.01,0.84,0.14,0.01,0.89,0.11,0.0
1,Nonpartisan,1642246.38,336165.87,-2307.58,1976104.67,1781565.01,178003.05,4068.0,1963636.06,994909.24,142553.7,-19438.72,1118024.22,0.83,0.17,-0.0,0.91,0.09,0.0,0.89,0.13,-0.02
2,Republican,773921813.64,71364530.01,9472896.23,854759239.88,498905779.94,78945614.02,11198512.65,589049906.61,701750841.97,57352634.7,875940.46,759979417.13,0.91,0.08,0.01,0.85,0.13,0.02,0.92,0.08,0.0
3,Third-Party,3309432.13,339879.37,486249.28,4135560.78,12669293.15,627085.94,40286.04,13336665.13,6863698.57,852177.89,1538978.63,9254855.09,0.8,0.08,0.12,0.95,0.05,0.0,0.74,0.09,0.17
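
The totals and shares follow one pattern per cycle, so they can be generated in a loop. The helper below is a minimal sketch, not the notebook's own method: `add_totals_and_shares` is a hypothetical name, and it assumes the flattened column names used above. It is checked against the Democratic 2018 figures from the table above.

```python
import pandas as pd

def add_totals_and_shares(df, cycles=("18", "14", "10"),
                          categories=("in_state", "out_of_state", "unknown")):
    """Return a copy of df with <cycle>_total and pct_<cycle>_<category> columns."""
    out = df.copy()  # work on a copy so the caller's frame is untouched
    for cycle in cycles:
        parts = [f"{cycle}_{category}" for category in categories]
        out[f"{cycle}_total"] = out[parts].sum(axis=1)
        for category in categories:
            out[f"pct_{cycle}_{category}"] = (
                out[f"{cycle}_{category}"] / out[f"{cycle}_total"]
            )
    return out

# Democratic 2018 figures copied from the table above.
example = pd.DataFrame({
    "party": ["Democratic"],
    "18_in_state": [694938481.84],
    "18_out_of_state": [100883207.06],
    "18_unknown": [10117328.46],
})
result = add_totals_and_shares(example, cycles=("18",))
print(result[["pct_18_in_state", "pct_18_out_of_state", "pct_18_unknown"]].round(2))
```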


## Calculate out-of-state contributions by party and year for the complete cycle

Group by year, party and in-state vs. out-of-state status, then sum the contribution amounts.

In [12]:
contributions_by_party_full_cycles = contributions_full_cycles.groupby(["year", "party", "in_out_state"])["amount"].sum().reset_index()
contributions_by_party_full_cycles.head()

Unnamed: 0,year,party,in_out_state,amount
0,2010,Democratic,in-state,782601351.14
1,2010,Democratic,out-of-state,98978507.93
2,2010,Democratic,unknown,5873945.04
3,2010,Nonpartisan,in-state,1393839.63
4,2010,Nonpartisan,out-of-state,197886.63


Pivot dataframe to aggregate each party's data in a single row.

In [13]:
contributions_by_party_full_cycles = pd.pivot_table(contributions_by_party_full_cycles, index=["party"], columns=["year", "in_out_state"]).reset_index()
contributions_by_party_full_cycles

Unnamed: 0_level_0,party,amount,amount,amount,amount,amount,amount,amount,amount,amount
year,Unnamed: 1_level_1,2010,2010,2010,2014,2014,2014,2018,2018,2018
in_out_state,Unnamed: 1_level_2,in-state,out-of-state,unknown,in-state,out-of-state,unknown,in-state,out-of-state,unknown
0,Democratic,782601351.14,98978507.93,5873945.04,688634127.81,116003523.41,9128383.19,707990241.87,104940627.95,10618189.37
1,Nonpartisan,1393839.63,197886.63,-16406.72,3559696.98,464716.75,10268.0,1642246.38,336165.87,-2307.58
2,Republican,1050695215.16,98930400.21,3278640.99,865413734.07,124561145.9,14348298.86,795176405.03,75738193.34,11253754.55
3,Third-Party,16751364.84,1554587.31,5214417.52,19055899.25,1099466.87,83266.73,3344964.24,342789.37,486816.42


Some records have no contributions for certain categories. Let's set those values equal to zero to be sure any calculations we run on them are correct.

In [14]:
contributions_by_party_full_cycles.fillna(0, inplace=True)

Flatten the resulting dataframe's multi-index columns.

In [15]:
contributions_by_party_full_cycles.columns = ["party", "10_in_state", "10_out_of_state", "10_unknown",
                                  "14_in_state", "14_out_of_state", "14_unknown",
                                  "18_in_state", "18_out_of_state", "18_unknown"
                                  ]
contributions_by_party_full_cycles

Unnamed: 0,party,10_in_state,10_out_of_state,10_unknown,14_in_state,14_out_of_state,14_unknown,18_in_state,18_out_of_state,18_unknown
0,Democratic,782601351.14,98978507.93,5873945.04,688634127.81,116003523.41,9128383.19,707990241.87,104940627.95,10618189.37
1,Nonpartisan,1393839.63,197886.63,-16406.72,3559696.98,464716.75,10268.0,1642246.38,336165.87,-2307.58
2,Republican,1050695215.16,98930400.21,3278640.99,865413734.07,124561145.9,14348298.86,795176405.03,75738193.34,11253754.55
3,Third-Party,16751364.84,1554587.31,5214417.52,19055899.25,1099466.87,83266.73,3344964.24,342789.37,486816.42


Calculate the total contributions by cycle.

In [16]:
contributions_by_party_full_cycles["18_total"] = contributions_by_party_full_cycles["18_in_state"] + contributions_by_party_full_cycles["18_out_of_state"] + contributions_by_party_full_cycles["18_unknown"]
contributions_by_party_full_cycles["14_total"] = contributions_by_party_full_cycles["14_in_state"] + contributions_by_party_full_cycles["14_out_of_state"] + contributions_by_party_full_cycles["14_unknown"]
contributions_by_party_full_cycles["10_total"] = contributions_by_party_full_cycles["10_in_state"] + contributions_by_party_full_cycles["10_out_of_state"] + contributions_by_party_full_cycles["10_unknown"]
# Reorder the columns; .copy() makes later column assignments safe from SettingWithCopyWarning.
contributions_by_party_full_cycles = contributions_by_party_full_cycles[["party", "18_in_state", "18_out_of_state", "18_unknown", "18_total", "14_in_state", "14_out_of_state", "14_unknown", "14_total", "10_in_state", "10_out_of_state", "10_unknown", "10_total"]].copy()

Calculate the proportion of in-state, out-of-state and unknown contributions by cycle.

In [17]:
contributions_by_party_full_cycles["pct_18_in_state"] = contributions_by_party_full_cycles["18_in_state"] / (contributions_by_party_full_cycles["18_in_state"] + contributions_by_party_full_cycles["18_out_of_state"] + contributions_by_party_full_cycles["18_unknown"])
contributions_by_party_full_cycles["pct_18_out_of_state"] = contributions_by_party_full_cycles["18_out_of_state"] / (contributions_by_party_full_cycles["18_in_state"] + contributions_by_party_full_cycles["18_out_of_state"] + contributions_by_party_full_cycles["18_unknown"])
contributions_by_party_full_cycles["pct_18_unknown"] = contributions_by_party_full_cycles["18_unknown"] / (contributions_by_party_full_cycles["18_in_state"] + contributions_by_party_full_cycles["18_out_of_state"] + contributions_by_party_full_cycles["18_unknown"])
contributions_by_party_full_cycles["pct_14_in_state"] = contributions_by_party_full_cycles["14_in_state"] / (contributions_by_party_full_cycles["14_in_state"] + contributions_by_party_full_cycles["14_out_of_state"] + contributions_by_party_full_cycles["14_unknown"])
contributions_by_party_full_cycles["pct_14_out_of_state"] = contributions_by_party_full_cycles["14_out_of_state"] / (contributions_by_party_full_cycles["14_in_state"] + contributions_by_party_full_cycles["14_out_of_state"] + contributions_by_party_full_cycles["14_unknown"])
contributions_by_party_full_cycles["pct_14_unknown"] = contributions_by_party_full_cycles["14_unknown"] / (contributions_by_party_full_cycles["14_in_state"] + contributions_by_party_full_cycles["14_out_of_state"] + contributions_by_party_full_cycles["14_unknown"])
contributions_by_party_full_cycles["pct_10_in_state"] = contributions_by_party_full_cycles["10_in_state"] / (contributions_by_party_full_cycles["10_in_state"] + contributions_by_party_full_cycles["10_out_of_state"] + contributions_by_party_full_cycles["10_unknown"])
contributions_by_party_full_cycles["pct_10_out_of_state"] = contributions_by_party_full_cycles["10_out_of_state"] / (contributions_by_party_full_cycles["10_in_state"] + contributions_by_party_full_cycles["10_out_of_state"] + contributions_by_party_full_cycles["10_unknown"])
contributions_by_party_full_cycles["pct_10_unknown"] = contributions_by_party_full_cycles["10_unknown"] / (contributions_by_party_full_cycles["10_in_state"] + contributions_by_party_full_cycles["10_out_of_state"] + contributions_by_party_full_cycles["10_unknown"])
contributions_by_party_full_cycles

Unnamed: 0,party,18_in_state,18_out_of_state,18_unknown,18_total,14_in_state,14_out_of_state,14_unknown,14_total,10_in_state,10_out_of_state,10_unknown,10_total,pct_18_in_state,pct_18_out_of_state,pct_18_unknown,pct_14_in_state,pct_14_out_of_state,pct_14_unknown,pct_10_in_state,pct_10_out_of_state,pct_10_unknown
0,Democratic,707990241.87,104940627.95,10618189.37,823549059.19,688634127.81,116003523.41,9128383.19,813766034.41,782601351.14,98978507.93,5873945.04,887453804.11,0.86,0.13,0.01,0.85,0.14,0.01,0.88,0.11,0.01
1,Nonpartisan,1642246.38,336165.87,-2307.58,1976104.67,3559696.98,464716.75,10268.0,4034681.73,1393839.63,197886.63,-16406.72,1575319.54,0.83,0.17,-0.0,0.88,0.12,0.0,0.88,0.13,-0.01
2,Republican,795176405.03,75738193.34,11253754.55,882168352.92,865413734.07,124561145.9,14348298.86,1004323178.83,1050695215.16,98930400.21,3278640.99,1152904256.36,0.9,0.09,0.01,0.86,0.12,0.01,0.91,0.09,0.0
3,Third-Party,3344964.24,342789.37,486816.42,4174570.03,19055899.25,1099466.87,83266.73,20238632.85,16751364.84,1554587.31,5214417.52,23520369.67,0.8,0.08,0.12,0.94,0.05,0.0,0.71,0.07,0.22
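
One way to relate the two party tables is to divide each to-date total by the matching full-cycle total, which gives the share of a cycle's eventual money that had arrived by this point. The sketch below copies the Democratic and Republican 2014 and 2010 totals from the tables above. Leaving 2018 out rests on the assumption that the cycle is still incomplete in both files.

```python
import pandas as pd

# Totals to date, copied from the first party table.
to_date = pd.DataFrame({
    "party": ["Democratic", "Republican"],
    "14_total": [519526468.14, 589049906.61],
    "10_total": [577043316.25, 759979417.13],
}).set_index("party")

# Full-cycle totals, copied from the second party table.
full_cycle = pd.DataFrame({
    "party": ["Democratic", "Republican"],
    "14_total": [813766034.41, 1004323178.83],
    "10_total": [887453804.11, 1152904256.36],
}).set_index("party")

# Element-wise division aligns on party (rows) and cycle (columns).
share_raised_by_now = to_date / full_cycle
print(share_raised_by_now.round(2))
```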


## Calculate out-of-state contributions by office and year

Group by year, standardized office and in-vs.-out-of-state contribution status and sum contributions.

In [18]:
contributions_by_office = contributions.groupby(["year", "standardized_office", "in_out_state"])["amount"].sum().reset_index()
contributions_by_office.head()

Unnamed: 0,year,standardized_office,in_out_state,amount
0,2010,GOVERNOR/LIEUTENANT GOVERNOR,in-state,723096242.03
1,2010,GOVERNOR/LIEUTENANT GOVERNOR,out-of-state,74493761.73
2,2010,GOVERNOR/LIEUTENANT GOVERNOR,unknown,2145307.41
3,2010,STATE HOUSE/ASSEMBLY/SENATE,in-state,498112609.12
4,2010,STATE HOUSE/ASSEMBLY/SENATE,out-of-state,47626707.66


Pivot the dataframe to aggregate each office's data in a single row.

In [19]:
contributions_by_office = pd.pivot_table(contributions_by_office, index=["standardized_office"], columns=["year", "in_out_state"]).reset_index()
contributions_by_office

Unnamed: 0_level_0,standardized_office,amount,amount,amount,amount,amount,amount,amount,amount,amount
year,Unnamed: 1_level_1,2010,2010,2010,2014,2014,2014,2018,2018,2018
in_out_state,Unnamed: 1_level_2,in-state,out-of-state,unknown,in-state,out-of-state,unknown,in-state,out-of-state,unknown
0,GOVERNOR/LIEUTENANT GOVERNOR,723096242.03,74493761.73,2145307.41,430226582.69,96121512.79,10271392.27,937211096.27,102323339.14,7613593.59
1,STATE HOUSE/ASSEMBLY/SENATE,498112609.12,47626707.66,1920984.74,521698749.2,58290697.3,7267741.69,536600877.72,70600443.17,12460572.8


Flatten the resulting dataframe's multi-index columns.

In [20]:
contributions_by_office.columns = ["standardized_office", "10_in_state", "10_out_of_state", "10_unknown",
                                  "14_in_state", "14_out_of_state", "14_unknown",
                                  "18_in_state", "18_out_of_state", "18_unknown"
                                  ]
contributions_by_office

Unnamed: 0,standardized_office,10_in_state,10_out_of_state,10_unknown,14_in_state,14_out_of_state,14_unknown,18_in_state,18_out_of_state,18_unknown
0,GOVERNOR/LIEUTENANT GOVERNOR,723096242.03,74493761.73,2145307.41,430226582.69,96121512.79,10271392.27,937211096.27,102323339.14,7613593.59
1,STATE HOUSE/ASSEMBLY/SENATE,498112609.12,47626707.66,1920984.74,521698749.2,58290697.3,7267741.69,536600877.72,70600443.17,12460572.8


Calculate the total contributions by cycle.

In [21]:
contributions_by_office["18_total"] = contributions_by_office["18_in_state"] + contributions_by_office["18_out_of_state"] + contributions_by_office["18_unknown"]
contributions_by_office["14_total"] = contributions_by_office["14_in_state"] + contributions_by_office["14_out_of_state"] + contributions_by_office["14_unknown"]
contributions_by_office["10_total"] = contributions_by_office["10_in_state"] + contributions_by_office["10_out_of_state"] + contributions_by_office["10_unknown"]
# Reorder the columns; .copy() makes later column assignments safe from SettingWithCopyWarning.
contributions_by_office = contributions_by_office[["standardized_office", "18_in_state", "18_out_of_state", "18_unknown", "18_total", "14_in_state", "14_out_of_state", "14_unknown", "14_total", "10_in_state", "10_out_of_state", "10_unknown", "10_total"]].copy()
contributions_by_office

Unnamed: 0,standardized_office,18_in_state,18_out_of_state,18_unknown,18_total,14_in_state,14_out_of_state,14_unknown,14_total,10_in_state,10_out_of_state,10_unknown,10_total
0,GOVERNOR/LIEUTENANT GOVERNOR,937211096.27,102323339.14,7613593.59,1047148029.0,430226582.69,96121512.79,10271392.27,536619487.75,723096242.03,74493761.73,2145307.41,799735311.17
1,STATE HOUSE/ASSEMBLY/SENATE,536600877.72,70600443.17,12460572.8,619661893.69,521698749.2,58290697.3,7267741.69,587257188.19,498112609.12,47626707.66,1920984.74,547660301.52


Calculate the proportion of in-state, out-of-state and unknown contributions.

In [22]:
contributions_by_office["pct_18_in_state"] = contributions_by_office["18_in_state"] / (contributions_by_office["18_in_state"] + contributions_by_office["18_out_of_state"] + contributions_by_office["18_unknown"])
contributions_by_office["pct_18_out_of_state"] = contributions_by_office["18_out_of_state"] / (contributions_by_office["18_in_state"] + contributions_by_office["18_out_of_state"] + contributions_by_office["18_unknown"])
contributions_by_office["pct_18_unknown"] = contributions_by_office["18_unknown"] / (contributions_by_office["18_in_state"] + contributions_by_office["18_out_of_state"] + contributions_by_office["18_unknown"])
contributions_by_office["pct_14_in_state"] = contributions_by_office["14_in_state"] / (contributions_by_office["14_in_state"] + contributions_by_office["14_out_of_state"] + contributions_by_office["14_unknown"])
contributions_by_office["pct_14_out_of_state"] = contributions_by_office["14_out_of_state"] / (contributions_by_office["14_in_state"] + contributions_by_office["14_out_of_state"] + contributions_by_office["14_unknown"])
contributions_by_office["pct_14_unknown"] = contributions_by_office["14_unknown"] / (contributions_by_office["14_in_state"] + contributions_by_office["14_out_of_state"] + contributions_by_office["14_unknown"])
contributions_by_office["pct_10_in_state"] = contributions_by_office["10_in_state"] / (contributions_by_office["10_in_state"] + contributions_by_office["10_out_of_state"] + contributions_by_office["10_unknown"])
contributions_by_office["pct_10_out_of_state"] = contributions_by_office["10_out_of_state"] / (contributions_by_office["10_in_state"] + contributions_by_office["10_out_of_state"] + contributions_by_office["10_unknown"])
contributions_by_office["pct_10_unknown"] = contributions_by_office["10_unknown"] / (contributions_by_office["10_in_state"] + contributions_by_office["10_out_of_state"] + contributions_by_office["10_unknown"])
contributions_by_office

Unnamed: 0,standardized_office,18_in_state,18_out_of_state,18_unknown,18_total,14_in_state,14_out_of_state,14_unknown,14_total,10_in_state,10_out_of_state,10_unknown,10_total,pct_18_in_state,pct_18_out_of_state,pct_18_unknown,pct_14_in_state,pct_14_out_of_state,pct_14_unknown,pct_10_in_state,pct_10_out_of_state,pct_10_unknown
0,GOVERNOR/LIEUTENANT GOVERNOR,937211096.27,102323339.14,7613593.59,1047148029.0,430226582.69,96121512.79,10271392.27,536619487.75,723096242.03,74493761.73,2145307.41,799735311.17,0.9,0.1,0.01,0.8,0.18,0.02,0.9,0.09,0.0
1,STATE HOUSE/ASSEMBLY/SENATE,536600877.72,70600443.17,12460572.8,619661893.69,521698749.2,58290697.3,7267741.69,587257188.19,498112609.12,47626707.66,1920984.74,547660301.52,0.87,0.11,0.02,0.89,0.1,0.01,0.91,0.09,0.0
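
The party and office sections repeat the same group, pivot, flatten, total and share steps with a different grouping column. As a rough sketch, those steps could be wrapped in one function. `summarize_by` is a hypothetical helper, it assumes the `year`, `in_out_state` and `amount` column names used in this notebook, and the toy rows below are invented.

```python
import pandas as pd

def summarize_by(df, group_col):
    """Sum amounts per group, year and status, then add per-year totals and shares."""
    wide = df.pivot_table(
        values="amount",
        index=group_col,
        columns=["year", "in_out_state"],
        aggfunc="sum",
        fill_value=0,
    )
    years = sorted({year for year, _ in wide.columns}, reverse=True)
    # Flatten (year, status) pairs into names such as "18_out_of_state".
    wide.columns = [
        f"{str(year)[-2:]}_{status.replace('-', '_')}" for year, status in wide.columns
    ]
    for year in years:
        prefix = str(year)[-2:]
        parts = [col for col in wide.columns if col.startswith(f"{prefix}_")]
        wide[f"{prefix}_total"] = wide[parts].sum(axis=1)
        for col in parts:
            wide[f"pct_{col}"] = wide[col] / wide[f"{prefix}_total"]
    return wide.reset_index()

# Invented rows for illustration.
toy = pd.DataFrame({
    "standardized_office": ["GOVERNOR", "GOVERNOR", "LEGISLATURE", "LEGISLATURE"],
    "year": [2018, 2018, 2018, 2018],
    "in_out_state": ["in-state", "out-of-state", "in-state", "out-of-state"],
    "amount": [900.0, 100.0, 300.0, 100.0],
})
summary = summarize_by(toy, "standardized_office")
print(summary)
```

The same call with `"party"` or `"standardized_status"` as `group_col` would cover the other breakdowns, provided those columns exist in the input frame.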


## Calculate 2018 out-of-state contributions by redistricting role

Filter the contributions data to the 2018 cycle.

In [23]:
contributions_18 = contributions[contributions["year"] == 2018]
contributions_18.info()

<class 'pandas.core.frame.DataFrame'>
Int64Index: 2619767 entries, 0 to 2619766
Data columns (total 23 columns):
candidate                 object
candidate_id              int32
year                      int16
state                     object
party                     object
election_status           object
contributor               object
amount                    float64
date                      datetime64[ns]
contributor_street        object
contributor_city          object
contributor_state         object
contributor_zip           object
in_out_state              object
no_veto                   object
office                    object
latest_month              datetime64[ns]
redistricting_role        object
independent_commission    object
single_house_district     object
standardized_office       object
standardized_status       object
two_year_term             object
dtypes: datetime64[ns](2), float64(1), int16(1), int32(1), object(18)
memory usage: 454.7+ MB


Group by redistricting role and in-vs.-out-of-state contribution status and sum contributions.

In [24]:
# Use the 2018-only frame created above so the totals cover only the 2018 cycle.
contributions_by_redistricting = contributions_18.groupby(["redistricting_role", "in_out_state"])["amount"].sum().reset_index()
contributions_by_redistricting

Unnamed: 0,redistricting_role,in_out_state,amount
0,N,in-state,234469649.22
1,N,out-of-state,28338244.6
2,N,unknown,4360639.26
3,Y,in-state,1239342324.77
4,Y,out-of-state,144585537.71
5,Y,unknown,15713527.13


Pivot dataframe to aggregate each role's data in a single row.

In [25]:
contributions_by_redistricting = pd.pivot_table(contributions_by_redistricting, index=["redistricting_role"], columns=["in_out_state"]).reset_index()
contributions_by_redistricting

Unnamed: 0_level_0,redistricting_role,amount,amount,amount
in_out_state,Unnamed: 1_level_1,in-state,out-of-state,unknown
0,N,234469649.22,28338244.6,4360639.26
1,Y,1239342324.77,144585537.71,15713527.13


Flatten the resulting dataframe's multi-index columns.

In [26]:
contributions_by_redistricting.columns = ["redistricting_role", "in_state", "out_of_state", "unknown"]
contributions_by_redistricting

Unnamed: 0,redistricting_role,in_state,out_of_state,unknown
0,N,234469649.22,28338244.6,4360639.26
1,Y,1239342324.77,144585537.71,15713527.13


Calculate the total contributions by redistricting role.

In [27]:
contributions_by_redistricting["total"] = contributions_by_redistricting["in_state"] + contributions_by_redistricting["out_of_state"] + contributions_by_redistricting["unknown"]

Calculate the proportion of in-state, out-of-state and unknown contributions by redistricting role.

In [28]:
contributions_by_redistricting["pct_in_state"] = contributions_by_redistricting["in_state"] / contributions_by_redistricting["total"]
contributions_by_redistricting["pct_out_of_state"] = contributions_by_redistricting["out_of_state"] / contributions_by_redistricting["total"]
contributions_by_redistricting["pct_unknown"] = contributions_by_redistricting["unknown"] / contributions_by_redistricting["total"]
contributions_by_redistricting

Unnamed: 0,redistricting_role,in_state,out_of_state,unknown,total,pct_in_state,pct_out_of_state,pct_unknown
0,N,234469649.22,28338244.6,4360639.26,267168533.08,0.88,0.11,0.02
1,Y,1239342324.77,144585537.71,15713527.13,1399641389.61,0.89,0.1,0.01
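
Because every share uses the same denominator, the three divisions can be written once against the row total. A minimal sketch using the amounts from the table above; the names `by_role`, `amount_cols` and `shares` are illustrative, not part of the notebook:

```python
import pandas as pd

# Amounts copied from the output table above.
by_role = pd.DataFrame({
    "redistricting_role": ["N", "Y"],
    "in_state": [234469649.22, 1239342324.77],
    "out_of_state": [28338244.60, 144585537.71],
    "unknown": [4360639.26, 15713527.13],
})

amount_cols = ["in_state", "out_of_state", "unknown"]
by_role["total"] = by_role[amount_cols].sum(axis=1)

# Divide each amount column by the row total (axis=0 aligns on the row index),
# then prefix the resulting columns with "pct_".
shares = by_role[amount_cols].div(by_role["total"], axis=0).add_prefix("pct_")
by_role = pd.concat([by_role, shares], axis=1)

print(by_role[["redistricting_role", "pct_in_state", "pct_out_of_state", "pct_unknown"]].round(2))
```

A quick sanity check on any such table is that the three shares in each row add up to 1.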


## Calculate contributions by candidate status and year

Group contributions by year, candidate status and in-state vs. out-of-state status, then sum the amounts.

In [29]:
contributions_by_status = contributions.groupby(["year", "standardized_status", "in_out_state"])["amount"].sum().reset_index()
contributions_by_status

Unnamed: 0,year,standardized_status,in_out_state,amount
0,2010,ADVANCED TO GENERAL,in-state,973295056.5
1,2010,ADVANCED TO GENERAL,out-of-state,104553554.45
2,2010,ADVANCED TO GENERAL,unknown,3334419.94
3,2010,DID NOT ADVANCE,in-state,247913794.65
4,2010,DID NOT ADVANCE,out-of-state,17566914.94
5,2010,DID NOT ADVANCE,unknown,731872.21
6,2014,ADVANCED TO GENERAL,in-state,820509652.4
7,2014,ADVANCED TO GENERAL,out-of-state,135234615.93
8,2014,ADVANCED TO GENERAL,unknown,12254832.61
9,2014,DID NOT ADVANCE,in-state,131415679.49


Pivot dataframe to aggregate each candidate status's data in a single row, with one column per year and contribution type.

In [30]:
contributions_by_status = pd.pivot_table(contributions_by_status, index=["standardized_status"], columns=["year", "in_out_state"]).reset_index()
contributions_by_status

Unnamed: 0_level_0,standardized_status,amount,amount,amount,amount,amount,amount,amount,amount,amount
year,Unnamed: 1_level_1,2010,2010,2010,2014,2014,2014,2018,2018,2018
in_out_state,Unnamed: 1_level_2,in-state,out-of-state,unknown,in-state,out-of-state,unknown,in-state,out-of-state,unknown
0,ADVANCED TO GENERAL,973295056.5,104553554.45,3334419.94,820509652.4,135234615.93,12254832.61,1049219119.87,135163430.33,14934582.7
1,DID NOT ADVANCE,247913794.65,17566914.94,731872.21,131415679.49,19177594.16,5284301.35,424592854.12,37760351.98,5139583.69


Flatten the resulting dataframe's multi-index columns.

In [31]:
contributions_by_status.columns = ["standardized_status", "10_in_state", "10_out_of_state", "10_unknown",
                                   "14_in_state", "14_out_of_state", "14_unknown",
                                   "18_in_state", "18_out_of_state", "18_unknown"
                                  ]
contributions_by_status

Unnamed: 0,standardized_status,10_in_state,10_out_of_state,10_unknown,14_in_state,14_out_of_state,14_unknown,18_in_state,18_out_of_state,18_unknown
0,ADVANCED TO GENERAL,973295056.5,104553554.45,3334419.94,820509652.4,135234615.93,12254832.61,1049219119.87,135163430.33,14934582.7
1,DID NOT ADVANCE,247913794.65,17566914.94,731872.21,131415679.49,19177594.16,5284301.35,424592854.12,37760351.98,5139583.69
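
Typing the flattened names by hand works for nine columns, but the same names can be derived from the multi-index itself. A minimal sketch on an invented miniature of the grouped data; the `flatten` helper is hypothetical and assumes the three column levels (`amount`, year, contribution type) shown in the pivot output above:

```python
import pandas as pd

# Hypothetical miniature of the grouped data; the amounts are invented.
toy = pd.DataFrame({
    "year": [2010, 2010, 2014, 2014],
    "standardized_status": ["ADVANCED TO GENERAL"] * 4,
    "in_out_state": ["in-state", "out-of-state", "in-state", "out-of-state"],
    "amount": [90.0, 10.0, 80.0, 20.0],
})
wide = pd.pivot_table(
    toy, index=["standardized_status"], columns=["year", "in_out_state"]
).reset_index()

def flatten(col):
    """Turn ('amount', 2010, 'in-state') into '10_in_state'."""
    top, year, kind = col
    if top != "amount":  # columns added by reset_index have empty lower levels
        return top
    return f"{int(year) % 100:02d}_{kind.replace('-', '_')}"

wide.columns = [flatten(col) for col in wide.columns]
print(wide.columns.tolist())
```

Deriving the names avoids silently mislabeling a column if the pivot ever returns the years or contribution types in a different order.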


Calculate the total contributions by cycle, then reorder the columns so each cycle's total follows its components.

In [32]:
contributions_by_status["18_total"] = contributions_by_status["18_in_state"] + contributions_by_status["18_out_of_state"] + contributions_by_status["18_unknown"]
contributions_by_status["14_total"] = contributions_by_status["14_in_state"] + contributions_by_status["14_out_of_state"] + contributions_by_status["14_unknown"]
contributions_by_status["10_total"] = contributions_by_status["10_in_state"] + contributions_by_status["10_out_of_state"] + contributions_by_status["10_unknown"]
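# Reorder the columns; .copy() makes the result an independent frame, so assigning new columns later does not trigger SettingWithCopyWarning.
contributions_by_status = contributions_by_status[["standardized_status", "18_in_state", "18_out_of_state", "18_unknown", "18_total", "14_in_state", "14_out_of_state", "14_unknown", "14_total", "10_in_state", "10_out_of_state", "10_unknown", "10_total"]].copy()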

Calculate the proportion of in-state, out-of-state and unknown contributions by cycle.

In [33]:
contributions_by_status["pct_18_in_state"] = contributions_by_status["18_in_state"] / contributions_by_status["18_total"]
contributions_by_status["pct_18_out_of_state"] = contributions_by_status["18_out_of_state"] / contributions_by_status["18_total"]
contributions_by_status["pct_18_unknown"] = contributions_by_status["18_unknown"] / contributions_by_status["18_total"]
contributions_by_status["pct_14_in_state"] = contributions_by_status["14_in_state"] / contributions_by_status["14_total"]
contributions_by_status["pct_14_out_of_state"] = contributions_by_status["14_out_of_state"] / contributions_by_status["14_total"]
contributions_by_status["pct_14_unknown"] = contributions_by_status["14_unknown"] / contributions_by_status["14_total"]
contributions_by_status["pct_10_in_state"] = contributions_by_status["10_in_state"] / contributions_by_status["10_total"]
contributions_by_status["pct_10_out_of_state"] = contributions_by_status["10_out_of_state"] / contributions_by_status["10_total"]
contributions_by_status["pct_10_unknown"] = contributions_by_status["10_unknown"] / contributions_by_status["10_total"]
contributions_by_status

Unnamed: 0,standardized_status,18_in_state,18_out_of_state,18_unknown,18_total,14_in_state,14_out_of_state,14_unknown,14_total,10_in_state,10_out_of_state,10_unknown,10_total,pct_18_in_state,pct_18_out_of_state,pct_18_unknown,pct_14_in_state,pct_14_out_of_state,pct_14_unknown,pct_10_in_state,pct_10_out_of_state,pct_10_unknown
0,ADVANCED TO GENERAL,1049219119.87,135163430.33,14934582.7,1199317132.9,820509652.4,135234615.93,12254832.61,967999100.94,973295056.5,104553554.45,3334419.94,1081183030.89,0.87,0.11,0.01,0.85,0.14,0.01,0.9,0.1,0.0
1,DID NOT ADVANCE,424592854.12,37760351.98,5139583.69,467492789.79,131415679.49,19177594.16,5284301.35,155877575.0,247913794.65,17566914.94,731872.21,266212581.8,0.91,0.08,0.01,0.84,0.12,0.03,0.93,0.07,0.0
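
The per-cycle totals and shares follow one pattern, so they can be generated in a loop. A minimal sketch, assuming only the 2018 and 2014 figures for candidates who advanced, copied from the table above; the frame name `status` is illustrative. Taking an explicit `.copy()` after selecting columns is one common way to keep pandas from emitting `SettingWithCopyWarning` when new columns are assigned afterwards:

```python
import pandas as pd

# Figures for "ADVANCED TO GENERAL" copied from the output above (2018 and 2014 only).
status = pd.DataFrame({
    "standardized_status": ["ADVANCED TO GENERAL"],
    "18_in_state": [1049219119.87],
    "18_out_of_state": [135163430.33],
    "18_unknown": [14934582.70],
    "14_in_state": [820509652.40],
    "14_out_of_state": [135234615.93],
    "14_unknown": [12254832.61],
})

# Selecting columns returns a new object; .copy() makes it explicitly independent.
status = status[list(status.columns)].copy()

for cycle in ["18", "14"]:
    parts = [f"{cycle}_{kind}" for kind in ("in_state", "out_of_state", "unknown")]
    status[f"{cycle}_total"] = status[parts].sum(axis=1)
    for part in parts:
        # e.g. pct_18_in_state = 18_in_state / 18_total
        status[f"pct_{part}"] = status[part] / status[f"{cycle}_total"]

print(status.filter(like="pct_").round(2))
```

Adding a cycle then means extending the list passed to the loop rather than writing four more assignment lines.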


## Export the data

In [34]:
%%notify
# The context manager saves and closes the workbook on exit; writer.save() is no longer available in recent pandas releases.
with pd.ExcelWriter("data/national_analysis.xlsx") as writer:
    contributions_by_party_full_cycles.to_excel(writer, sheet_name="contributions_party_full_cycles", index=False)
    contributions_by_party.to_excel(writer, sheet_name="contributions_by_party", index=False)
    contributions_by_office.to_excel(writer, sheet_name="contributions_by_office", index=False)
    contributions_by_redistricting.to_excel(writer, sheet_name="contributions_by_redistricting", index=False)
    contributions_by_status.to_excel(writer, sheet_name="contributions_by_status", index=False)
