# Out-of-State-Contributions: Data Importation and Preparation

In [1]:
import numpy as np
import pandas as pd
import us
from pandas.tseries.offsets import MonthEnd

%load_ext jupyternotify

pd.set_option("display.max_columns", 100)
pd.set_option("display.max_rows", 500)
pd.options.display.float_format = "{:,.2f}".format # Format floats

## Import and format the data

Import and format contribution-level data from the [National Institute on Money in Politics](https://www.followthemoney.org/) for gubernatorial, state senate and state house candidates in 2018, 2014 and 2010.

Download and save each cycle's contributions data and concatenate the data into a single file.

In [2]:
#!sh process_contributions.sh

Import the contributions data.

In [3]:
%%notify
contributions = pd.read_csv("data/raw/contributions.csv", usecols=["Candidate:id", "Candidate", "Election_Status", "General_Party", "Election_Jurisdiction", "Election_Year", "Office_Sought", "Contributor", "Amount", "Date", "Street", "City", "State", "Zip", "In-State"], error_bad_lines=False)
contributions.columns = ["candidate_id", "candidate", "election_status", "party", "state", "year", "office", "contributor", "amount", "date", "contributor_street", "contributor_city", "contributor_state", "contributor_zip", "in_out_state"]
contributions.info()

<class 'pandas.core.frame.DataFrame'>
RangeIndex: 9264367 entries, 0 to 9264366
Data columns (total 15 columns):
candidate_id          int64
candidate             object
election_status       object
party                 object
state                 object
year                  int64
office                object
contributor           object
amount                object
date                  object
contributor_street    object
contributor_city      object
contributor_state     object
contributor_zip       float64
in_out_state          float64
dtypes: float64(2), int64(2), object(11)
memory usage: 1.0+ GB


Convert the amount column to float data type, the contribution date column to datetime data type and the contributor zip column to object data type.

In [4]:
contributions["amount"] = pd.to_numeric(contributions["amount"], errors="coerce")
contributions["date"] = pd.to_datetime(contributions["date"], errors="coerce")
contributions["contributor_zip"] = contributions["contributor_zip"].astype(object)
contributions.info()

<class 'pandas.core.frame.DataFrame'>
RangeIndex: 9264367 entries, 0 to 9264366
Data columns (total 15 columns):
candidate_id          int64
candidate             object
election_status       object
party                 object
state                 object
year                  int64
office                object
contributor           object
amount                float64
date                  datetime64[ns]
contributor_street    object
contributor_city      object
contributor_state     object
contributor_zip       object
in_out_state          float64
dtypes: datetime64[ns](1), float64(2), int64(2), object(10)
memory usage: 1.0+ GB
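To make the effect of `errors="coerce"` concrete, here is a minimal sketch that runs the same conversions on a few made-up values (the `sample` frame is hypothetical and not drawn from the contributions file). Unparseable amounts and dates become `NaN` and `NaT` instead of raising. The last step shows one possible way to rebuild ZIP codes that lost a leading zero when they were read as floats; padding to five digits is an assumption about the data, not something this notebook does.

```python
import pandas as pd

# Hypothetical sample values, chosen only to illustrate the conversions.
sample = pd.DataFrame({
    "amount": ["50.00", "1000", "not a number"],
    "date": ["2018-05-29", "2018-13-45", None],
    "contributor_zip": [6443.0, 80002.0, None],
})

# Unparseable entries are coerced to NaN / NaT rather than raising an error.
sample["amount"] = pd.to_numeric(sample["amount"], errors="coerce")
sample["date"] = pd.to_datetime(sample["date"], errors="coerce")

# Optional: rebuild five-digit ZIP strings from the float values
# (assumption: all values are five-digit US ZIP codes).
sample["zip_str"] = sample["contributor_zip"].map(
    lambda z: f"{int(z):05d}" if pd.notna(z) else None
)
print(sample)
```

Counting the missing values before and after a coercion step like this is a quick way to see how many entries could not be parsed.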


Filter out unitemized contributions, since we cannot determine where they came from.

In [5]:
contributions = contributions[contributions["contributor"] != "UNITEMIZED DONATIONS"]
contributions.info()

<class 'pandas.core.frame.DataFrame'>
Int64Index: 9117716 entries, 0 to 9264366
Data columns (total 15 columns):
candidate_id          int64
candidate             object
election_status       object
party                 object
state                 object
year                  int64
office                object
contributor           object
amount                float64
date                  datetime64[ns]
contributor_street    object
contributor_city      object
contributor_state     object
contributor_zip       object
in_out_state          float64
dtypes: datetime64[ns](1), float64(2), int64(2), object(10)
memory usage: 1.1+ GB


Filter out contributions to candidates who raised less than $1,000.

In [6]:
contributions_by_candidate = contributions.groupby("candidate_id")["amount"].sum().reset_index()
contributions_by_candidate = contributions_by_candidate[contributions_by_candidate["amount"] >= 1000]
contributions = contributions.merge(contributions_by_candidate, on="candidate_id", how="inner")
contributions.drop("amount_y", axis=1, inplace=True)
contributions.rename(columns={"amount_x": "amount"}, inplace=True)
contributions.info()

<class 'pandas.core.frame.DataFrame'>
Int64Index: 9106128 entries, 0 to 9106127
Data columns (total 15 columns):
candidate_id          int64
candidate             object
election_status       object
party                 object
state                 object
year                  int64
office                object
contributor           object
amount                float64
date                  datetime64[ns]
contributor_street    object
contributor_city      object
contributor_state     object
contributor_zip       object
in_out_state          float64
dtypes: datetime64[ns](1), float64(2), int64(2), object(10)
memory usage: 1.1+ GB
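The same per-candidate threshold can also be applied without the merge, drop and rename steps. The sketch below is an alternative under stated assumptions rather than the notebook's method: it uses `groupby(...).transform("sum")` on a hypothetical toy frame to broadcast each candidate's total back onto that candidate's rows, then filters on it.

```python
import pandas as pd

# Hypothetical toy data: candidate 1 raised $1,200 in total, candidate 2 only $300.
toy = pd.DataFrame({
    "candidate_id": [1, 1, 2, 2],
    "amount": [700.0, 500.0, 100.0, 200.0],
})

# Broadcast each candidate's total back onto that candidate's rows.
candidate_total = toy.groupby("candidate_id")["amount"].transform("sum")

# Keep only contributions to candidates whose total is at least $1,000.
filtered = toy[candidate_total >= 1000]
print(filtered)
```

Unlike the merge, this keeps the original row index, so reset it afterwards if a fresh index is wanted.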


Rename the categories in the in-vs.-out-of-state column.

In [7]:
# 0 = out-of-state, 1 = in-state, 2 = unknown
contributions["in_out_state"] = contributions["in_out_state"].replace({0: "out-of-state", 1: "in-state", 2: "unknown"})
contributions.groupby("in_out_state").size()

in_out_state
in-state        7650352
out-of-state    1326454
unknown          129320
dtype: int64

Create a standardized office column.

In [8]:
%%notify
contributions["standardized_office"] = np.where(contributions["office"].str.contains("governor", case=False), "GOVERNOR/LIEUTENANT GOVERNOR",
                                       np.where(contributions["office"].str.contains("senate", case=False), "STATE HOUSE/ASSEMBLY/SENATE",
                                       np.where(contributions["office"].str.contains("house", case=False), "STATE HOUSE/ASSEMBLY/SENATE",
                                       np.where(contributions["office"].str.contains("assembly", case=False), "STATE HOUSE/ASSEMBLY/SENATE", ""))))
contributions.groupby("standardized_office").size()

standardized_office
GOVERNOR/LIEUTENANT GOVERNOR    3891030
STATE HOUSE/ASSEMBLY/SENATE     5215098
dtype: int64
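Nested `np.where` calls become hard to read as categories are added. As an illustrative alternative, the sketch below expresses the same first-match-wins logic with `np.select`. The office labels are hypothetical examples in the style of the raw data, and "MAYOR" is included only to show the fallback value.

```python
import numpy as np
import pandas as pd

# Hypothetical office labels in the style of the raw data.
offices = pd.Series(["GOVERNOR", "LIEUTENANT GOVERNOR", "SENATE DISTRICT 012",
                     "HOUSE DISTRICT 059", "ASSEMBLY DISTRICT 01", "MAYOR"])

# Conditions are checked in order; na=False treats missing offices as non-matches.
conditions = [
    offices.str.contains("governor", case=False, na=False),
    offices.str.contains("senate|house|assembly", case=False, na=False),
]
labels = ["GOVERNOR/LIEUTENANT GOVERNOR", "STATE HOUSE/ASSEMBLY/SENATE"]

# The first matching condition wins; anything unmatched gets the empty string.
standardized = pd.Series(np.select(conditions, labels, default=""), index=offices.index)
print(standardized.value_counts())
```

Checking how many rows end up with the empty label is a cheap way to confirm that every office was classified.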

Create a standardized election status column.

In [9]:
%%notify
advanced_to_general = ["Deceased-General", "Disqualified-General", "Default Winner-General",
                       "Default Winner-Primary","Lost-General", "Lost-General Runoff", "Lost-Retention",
                       "Pending-General", "Tied-General", "Withdrew-General", "Won-General",
                       "Won-General Runoff", "Won-Primary", "Won-Primary Runoff", "Won-Top Two Primary"]
did_not_advance = ["Disqualified-Primary", "Lost-Convention", "Lost-Primary", "Lost-Primary Runoff",
                              "Lost-Top Two Primary", "Pending-Primary", "Pending-Primary Runoff",
                   "Tied-Primary", "Withdrew-Primary", "Withdrew-Primary Runoff"]
contributions["standardized_status"] = np.where(contributions["election_status"].isin(advanced_to_general),
                                                "ADVANCED TO GENERAL",
                                       np.where(contributions["election_status"].isin(did_not_advance),
                                                "DID NOT ADVANCE", ""))
contributions.groupby("standardized_status").size()

standardized_status
ADVANCED TO GENERAL    7784439
DID NOT ADVANCE        1321689
dtype: int64

Create a table of full cycle contributions prior to applying the contribution date filter.

In [10]:
contributions_full_cycles = contributions
contributions_full_cycles.info()

<class 'pandas.core.frame.DataFrame'>
Int64Index: 9106128 entries, 0 to 9106127
Data columns (total 17 columns):
candidate_id           int64
candidate              object
election_status        object
party                  object
state                  object
year                   int64
office                 object
contributor            object
amount                 float64
date                   datetime64[ns]
contributor_street     object
contributor_city       object
contributor_state      object
contributor_zip        object
in_out_state           object
standardized_office    object
standardized_status    object
dtypes: datetime64[ns](1), float64(1), int64(2), object(13)
memory usage: 1.2+ GB
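Note that plain assignment binds a second name to the same DataFrame rather than copying it. The snapshot survives here because the later steps rebind `contributions` to new objects (the results of `merge` and boolean filtering) instead of changing it in place. The sketch below, on a hypothetical toy frame, shows the difference between an alias and an explicit `.copy()`.

```python
import pandas as pd

original = pd.DataFrame({"amount": [10.0, 20.0, 30.0]})

alias = original            # second name for the same object
snapshot = original.copy()  # independent copy of the data

# An in-place change is visible through the alias but not through the copy.
original.drop(index=0, inplace=True)
print(len(alias), len(snapshot))

# Rebinding the name to a filtered result leaves the alias untouched.
original = original[original["amount"] > 25]
print(len(original), len(alias))
```

For a table of this size an explicit copy roughly doubles memory use, so relying on rebinding is a reasonable trade-off as long as no in-place operations are applied to `contributions` afterwards.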


## Calculate a cut-off point for prior election cycles

Our next task is to determine a data cut-off date for prior election cycles so we can make accurate comparisons across cycles.

Extract the month and year from the contribution date column for 2018 election cycle data.

In [27]:
# Take an explicit copy so that adding a column does not write to a slice of contributions
contributions_18 = contributions[contributions["year"] == 2018].copy()
contributions_18["month"] = contributions_18["date"].dt.to_period("M")
contributions_18.info()

<class 'pandas.core.frame.DataFrame'>
Int64Index: 2697099 entries, 0 to 2697098
Data columns (total 18 columns):
candidate_id           int64
candidate              object
election_status        object
party                  object
state                  object
year                   int64
office                 object
contributor            object
amount                 float64
date                   datetime64[ns]
contributor_street     object
contributor_city       object
contributor_state      object
contributor_zip        object
in_out_state           object
standardized_office    object
standardized_status    object
month                  object
dtypes: datetime64[ns](1), float64(1), int64(2), object(14)
memory usage: 391.0+ MB


In [28]:
%%notify
grouped_by_month = contributions_18.groupby(["state", "month"])["amount"].sum().reset_index()
# Drop the helper column without modifying a slice in place
contributions_18 = contributions_18.drop(columns="month")
grouped_by_month.head(1)


Unnamed: 0,state,month,amount
0,AK,2013-08,50.0


Because we want to use the last day of each state's latest month as the cut-off date for contributions, we append a day to the year-month value, convert the column to datetime data type and then roll each date forward to the end of its month.

In [30]:
grouped_by_month["month"] = grouped_by_month["month"].astype(str) + "-28" # No month has fewer than 28 days
grouped_by_month["month"] = pd.to_datetime(grouped_by_month["month"], errors="coerce")
# MonthEnd(0) rolls forward to the month end and leaves a date already on it unchanged
# (MonthEnd(1) would push 28 February of a non-leap year to 31 March)
grouped_by_month["last_day"] = grouped_by_month["month"] + MonthEnd(0)
grouped_by_month.info()

<class 'pandas.core.frame.DataFrame'>
RangeIndex: 1644 entries, 0 to 1643
Data columns (total 4 columns):
state       1644 non-null object
month       1644 non-null datetime64[ns]
amount      1644 non-null float64
last_day    1644 non-null datetime64[ns]
dtypes: datetime64[ns](2), float64(1), object(1)
memory usage: 51.5+ KB


In [32]:
grouped_by_month.head()

Unnamed: 0,state,month,amount,last_day
0,AK,2013-08-28,50.0,2013-08-31
1,AK,2017-04-28,223.93,2017-04-30
2,AK,2017-05-28,1177.04,2017-05-31
3,AK,2017-06-28,19250.0,2017-06-30
4,AK,2017-07-28,14255.21,2017-07-31
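One caveat when combining a day-28 date with `MonthEnd`: `MonthEnd(1)` moves a date that already sits on a month end to the end of the following month, while `MonthEnd(0)` leaves it in place. That matters for 28 February in non-leap years. The sketch below compares the two on a few hand-picked dates (hypothetical inputs, not taken from the grouped table).

```python
import pandas as pd
from pandas.tseries.offsets import MonthEnd

# 28 August, 28 February in a non-leap year, 28 February in a leap year.
dates = pd.Series(pd.to_datetime(["2013-08-28", "2018-02-28", "2016-02-28"]))

rolled_0 = dates + MonthEnd(0)  # roll forward only if not already on a month end
rolled_1 = dates + MonthEnd(1)  # always move to the next month end after the date

print(pd.DataFrame({"date": dates, "MonthEnd(0)": rolled_0, "MonthEnd(1)": rolled_1}))
```

Only the 2018-02-28 row differs: `MonthEnd(0)` keeps it at 2018-02-28, whereas `MonthEnd(1)` yields 2018-03-31.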


We know some of the contribution dates are wrong because they fall in the future and, unless we've got some time-traveling campaign donors, those are data entry errors. To eliminate this noise, we will filter out months after October 2018.

In [33]:
grouped_by_month = grouped_by_month[grouped_by_month["month"] <= "2018-10-28"]

Return the most recent month of contributions for each state.

In [34]:
latest_month = grouped_by_month.groupby("state")["last_day"].max().reset_index()
#latest_month.rename(columns={"month": "latest_month"}, inplace=True)
latest_month

Unnamed: 0,state,last_day
0,AK,2018-08-31
1,AL,2018-07-31
2,AR,2018-03-31
3,AZ,2017-12-31
4,CA,2018-09-30
5,CO,2018-09-30
6,CT,2018-08-31
7,DE,2018-09-30
8,FL,2018-09-30
9,GA,2018-07-31


Filter out the states whose most recent month of contributions falls in 2017.

In [35]:
latest_month = latest_month[latest_month["last_day"] >= "2018-01-01"].reset_index(drop=True)
latest_month

Unnamed: 0,state,last_day
0,AK,2018-08-31
1,AL,2018-07-31
2,AR,2018-03-31
3,CA,2018-09-30
4,CO,2018-09-30
5,CT,2018-08-31
6,DE,2018-09-30
7,FL,2018-09-30
8,GA,2018-07-31
9,HI,2018-08-31


## Apply the cut-off date to the contributions data

Join the table of the 2018 cycle's latest contribution months with the contribution-level data.

In [36]:
contributions = contributions.merge(latest_month, on="state")

Shift each cut-off date back to the equivalent date in the row's own election cycle: four years for 2014 contributions and eight years for 2010 contributions.

In [42]:
# Shift the 2018 cut-off back by whole calendar years. Run this cell only once:
# it overwrites last_day, so a second run would shift the dates again.
contributions["last_day"] = contributions["last_day"].mask(contributions["year"] == 2014,
                                           contributions["last_day"] - pd.DateOffset(years=4))
contributions["last_day"] = contributions["last_day"].mask(contributions["year"] == 2010,
                                           contributions["last_day"] - pd.DateOffset(years=8))
# Remove any time values from the cut-off column
contributions["last_day"] = pd.DatetimeIndex(contributions["last_day"]).normalize()
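Because the shift above overwrites the column it reads from, running the cell twice moves the dates twice. A minimal sketch of a re-runnable variant follows, assuming the 2018 cut-off is kept in its own column. The `cutoff_2018` name and the toy rows are hypothetical and do not appear in the notebook.

```python
import pandas as pd

# Hypothetical rows: one per cycle, each carrying the state's 2018 cut-off date.
toy = pd.DataFrame({
    "year": [2018, 2014, 2010],
    "cutoff_2018": pd.to_datetime(["2018-08-31"] * 3),
})

# Always derive last_day from cutoff_2018, which is never overwritten,
# so re-running this block gives the same result every time.
toy["last_day"] = toy["cutoff_2018"]
for cycle in (2014, 2010):
    is_cycle = toy["year"] == cycle
    toy.loc[is_cycle, "last_day"] = (
        toy.loc[is_cycle, "cutoff_2018"] - pd.DateOffset(years=2018 - cycle)
    )
print(toy)
```

`pd.DateOffset(years=...)` shifts by calendar years, so a month-end date in 2018 maps to the same calendar date in 2014 and 2010 with no time component left over.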

Filter the data to eliminate contributions after the 2018 cycle's latest contribution month in each state.

In [43]:
contributions.info()

<class 'pandas.core.frame.DataFrame'>
Int64Index: 8567799 entries, 0 to 8567798
Data columns (total 18 columns):
candidate_id           int64
candidate              object
election_status        object
party                  object
state                  object
year                   int64
office                 object
contributor            object
amount                 float64
date                   datetime64[ns]
contributor_street     object
contributor_city       object
contributor_state      object
contributor_zip        object
in_out_state           object
standardized_office    object
standardized_status    object
last_day               datetime64[ns]
dtypes: datetime64[ns](2), float64(1), int64(2), object(13)
memory usage: 1.2+ GB


In [44]:
contributions = contributions[contributions["date"] <= contributions["last_day"]]
contributions.info()

<class 'pandas.core.frame.DataFrame'>
Int64Index: 2633309 entries, 0 to 8530605
Data columns (total 18 columns):
candidate_id           int64
candidate              object
election_status        object
party                  object
state                  object
year                   int64
office                 object
contributor            object
amount                 float64
date                   datetime64[ns]
contributor_street     object
contributor_city       object
contributor_state      object
contributor_zip        object
in_out_state           object
standardized_office    object
standardized_status    object
last_day               datetime64[ns]
dtypes: datetime64[ns](2), float64(1), int64(2), object(13)
memory usage: 381.7+ MB
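One side effect of this comparison is worth keeping in mind: rows whose date could not be parsed earlier (`NaT`) compare as `False` and are dropped along with the late contributions. The sketch below shows this on a hypothetical three-row frame.

```python
import pandas as pd

# Hypothetical rows: on time, after the cut-off, and a missing date.
toy = pd.DataFrame({
    "date": pd.to_datetime(["2018-08-01", "2018-09-15", None]),
    "last_day": pd.to_datetime(["2018-08-31"] * 3),
})

# Any comparison involving NaT evaluates to False.
keep = toy["date"] <= toy["last_day"]
print(keep.tolist())

# Counting missing dates separately shows how many rows are lost for that reason.
print(toy["date"].isna().sum())
```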


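Inspect the 2014 cycle's rows that survive the filter. The `last_day` values shown fall in 2010 rather than 2014, which is consistent with the year-shift cell above having been executed more than once in this session.

In [48]: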
contributions[contributions["year"] == 2014]

Unnamed: 0,candidate_id,candidate,election_status,party,state,year,office,contributor,amount,date,contributor_street,contributor_city,contributor_state,contributor_zip,in_out_state,standardized_office,standardized_status,last_day
333383,181060,"WILSON JR, BRUCE H",Lost-General,Republican,CT,2014,SENATE DISTRICT 012,"MILLMAN, KEVIN I",50.0,2010-05-29,CT,MADISON,CT,6443.0,in-state,STATE HOUSE/ASSEMBLY/SENATE,ADVANCED TO GENERAL,2010-08-31
1866484,171381,"BROWN, J PAUL",Won-General,Republican,CO,2014,HOUSE DISTRICT 059,"BROWN, BONNIE",25.0,2010-08-16,CO,ARVADA,CO,80002.0,in-state,STATE HOUSE/ASSEMBLY/SENATE,ADVANCED TO GENERAL,2010-09-30
1866520,171381,"BROWN, J PAUL",Won-General,Republican,CO,2014,HOUSE DISTRICT 059,"BROWN, BONNIE",-25.0,2010-08-16,CO,ARVADA,CO,80002.0,in-state,STATE HOUSE/ASSEMBLY/SENATE,ADVANCED TO GENERAL,2010-09-30
1891378,171243,"NICHOLSON, NORMA JEANNE",Lost-General,Democratic,CO,2014,SENATE DISTRICT 016,"JOHNSON, BARB",117.0,2010-03-07,CO,EVERGREEN,CO,80439.0,in-state,STATE HOUSE/ASSEMBLY/SENATE,ADVANCED TO GENERAL,2010-09-30
1892764,171243,"NICHOLSON, NORMA JEANNE",Lost-General,Democratic,CO,2014,SENATE DISTRICT 016,"JOHNSON, BARB",-117.0,2010-02-17,CO,EVERGREEN,CO,80439.0,in-state,STATE HOUSE/ASSEMBLY/SENATE,ADVANCED TO GENERAL,2010-09-30
2193109,172955,"HENNESSEY, TIMOTHY F",Won-General,Republican,PA,2014,HOUSE DISTRICT 026,"DRAKE, MARGARET & DANIEL",100.0,2003-03-19,PA,WEST CHESTER,PA,19380.0,in-state,STATE HOUSE/ASSEMBLY/SENATE,ADVANCED TO GENERAL,2010-10-31
2222582,172795,"OBRIEN, MICHAEL H",Won-General,Democratic,PA,2014,HOUSE DISTRICT 175,"TUCKER, DIANNE DENBO",500.0,2003-09-17,PA,PHILA,PA,19106.0,in-state,STATE HOUSE/ASSEMBLY/SENATE,ADVANCED TO GENERAL,2010-10-31
2395496,169766,"HOLT, ANDREW H (ANDY)",Won-General,Republican,TN,2014,HOUSE DISTRICT 076,JACKSON CLINIC,2000.0,2004-09-24,TN,JACKSON,TN,38305.0,in-state,STATE HOUSE/ASSEMBLY/SENATE,ADVANCED TO GENERAL,2010-07-31
2402004,169675,"ROACH, DENNIS (COACH)",Lost-Primary,Republican,TN,2014,HOUSE DISTRICT 035,TENNESSEE STATE EMPLOYEES ASSOCIATION,1000.0,2001-07-08,TN,NASHVILLE,TN,37206.0,in-state,STATE HOUSE/ASSEMBLY/SENATE,DID NOT ADVANCE,2010-07-31
2406947,169733,"SARGENT JR, CHARLES M",Won-General,Republican,TN,2014,HOUSE DISTRICT 061,TENNESSEE ASSOCIATION OF REALTORS,500.0,2001-10-15,TN,NASHVILLE,TN,37212.0,in-state,STATE HOUSE/ASSEMBLY/SENATE,ADVANCED TO GENERAL,2010-07-31


In [None]:
contributions_old = pd.read_csv("data/contributions.csv")

## Add redistricting rules to the 2018 election cycle's data

Our next task is to incorporate each state's redistricting rules in our analysis. This will allow us to determine whether a particular office's role in that state's redistricting process has an effect on the proportion of out-of-state contributions flowing to its race.

In [None]:
redistricting = pd.read_csv("data/raw/redistricting_rules.csv")
redistricting.info()

We need to join the contribution-level data with the table of state redistricting rules. In order to do so, we will add a state abbreviation column to the redistricting rules.

In [None]:
states = pd.DataFrame(list(us.states.mapping("name", "abbr").items()), columns=["state", "abbreviation"])
states.info()

Join the table of state redistricting rules and state abbreviations.

In [None]:
redistricting = redistricting.merge(states, on="state")
redistricting
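An inner join silently drops any rule row whose state name does not match the abbreviation table exactly. As a hedged sketch of how such mismatches could be surfaced, the example below uses a left join with `indicator=True`. Both tables are hypothetical stand-ins: the rule values are placeholders rather than real redistricting rules, and one state name is deliberately misspelled.

```python
import pandas as pd

# Hypothetical rules table; the Y/N values are placeholders, not real rules.
rules = pd.DataFrame({
    "state": ["Alaska", "Alabama", "Washingtn"],  # deliberate typo in the last name
    "independent_commission": ["N", "N", "Y"],
})

# Hand-written stand-in for the name-to-abbreviation table built above.
states = pd.DataFrame({
    "state": ["Alaska", "Alabama", "Washington"],
    "abbreviation": ["AK", "AL", "WA"],
})

# A left join keeps every rule row; _merge marks rows with no abbreviation match.
checked = rules.merge(states, on="state", how="left", indicator=True)
unmatched = checked.loc[checked["_merge"] == "left_only", "state"].tolist()
print(unmatched)
```

An empty `unmatched` list means the inner join loses no states.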

Join the table of 2018 contribution-level data with the redistricting rules.

In [None]:
contributions_18 = contributions[contributions["year"] == 2018]
contributions_18 = contributions_18.merge(redistricting, left_on="state", right_on="abbreviation")
contributions_18.drop(["state_y", "abbreviation"], axis=1, inplace=True)
contributions_18.rename(columns={"state_x": "state"}, inplace=True)
contributions_18.info()

<class 'pandas.core.frame.DataFrame'>
Int64Index: 2619767 entries, 0 to 2619766
Data columns (total 22 columns):
candidate_id              int64
candidate                 object
election_status           object
party                     object
state                     object
year                      int64
office                    object
contributor               object
amount                    float64
date                      datetime64[ns]
contributor_street        object
contributor_city          object
contributor_state         object
contributor_zip           object
in_out_state              object
standardized_office       object
standardized_status       object
latest_month              datetime64[ns]
independent_commission    object
single_house_district     object
no_veto                   object
two_year_term             object
dtypes: datetime64[ns](2), float64(1), int64(2), object(17)
memory usage: 459.7+ MB


Filter contributions to those in races where the office plays a role in redistricting.

In [None]:
redistricting_contributions_18 = contributions_18[((contributions_18["standardized_office"] == "GOVERNOR/LIEUTENANT GOVERNOR") &
                                                   (contributions_18["single_house_district"] == "N") &
                                                   (contributions_18["independent_commission"] == "N") &
                                                   (contributions_18["no_veto"] == "N")) |
                                               ((contributions_18["standardized_office"] == "STATE HOUSE/ASSEMBLY/SENATE") &
                                                   (contributions_18["single_house_district"] == "N") &
                                                   (contributions_18["independent_commission"] == "N") &
                                                   (contributions_18["two_year_term"] == "N"))
                                              ]
# assign() returns a new DataFrame, so nothing is written to a slice of contributions_18
redistricting_contributions_18 = redistricting_contributions_18.assign(redistricting_role="Y")
redistricting_contributions_18.info()


<class 'pandas.core.frame.DataFrame'>
Int64Index: 2232102 entries, 0 to 2600513
Data columns (total 23 columns):
candidate_id              int64
candidate                 object
election_status           object
party                     object
state                     object
year                      int64
office                    object
contributor               object
amount                    float64
date                      datetime64[ns]
contributor_street        object
contributor_city          object
contributor_state         object
contributor_zip           object
in_out_state              object
standardized_office       object
standardized_status       object
latest_month              datetime64[ns]
independent_commission    object
single_house_district     object
no_veto                   object
two_year_term             object
redistricting_role        object
dtypes: datetime64[ns](2), float64(1), int64(2), object(18)
memory usage: 408.7+ MB


Filter contributions to those in races where the office does not play a role in redistricting.

In [None]:
non_redistricting_contributions_18 = contributions_18[((contributions_18["standardized_office"] == "GOVERNOR/LIEUTENANT GOVERNOR") &
                                                       ((contributions_18["single_house_district"] == "Y") |
                                                        (contributions_18["independent_commission"] == "Y") |
                                                        (contributions_18["no_veto"] == "Y"))) |
                                                      ((contributions_18["standardized_office"] == "STATE HOUSE/ASSEMBLY/SENATE") &
                                                       ((contributions_18["single_house_district"] == "Y") |
                                                        (contributions_18["independent_commission"] == "Y") |
                                                        (contributions_18["two_year_term"] == "Y")))
                                                     ]
# assign() returns a new DataFrame, so nothing is written to a slice of contributions_18
non_redistricting_contributions_18 = non_redistricting_contributions_18.assign(redistricting_role="N")
non_redistricting_contributions_18.info()


<class 'pandas.core.frame.DataFrame'>
Int64Index: 387665 entries, 52277 to 2619766
Data columns (total 23 columns):
candidate_id              387665 non-null int64
candidate                 387665 non-null object
election_status           387665 non-null object
party                     387665 non-null object
state                     387665 non-null object
year                      387665 non-null int64
office                    387665 non-null object
contributor               387665 non-null object
amount                    387665 non-null float64
date                      387665 non-null datetime64[ns]
contributor_street        387665 non-null object
contributor_city          386675 non-null object
contributor_state         386888 non-null object
contributor_zip           386142 non-null object
in_out_state              387665 non-null object
standardized_office       387665 non-null object
standardized_status       387665 non-null object
latest_month              387665 non-null da

Confirm the filtering worked.

In [None]:
redistricting_contributions_18[(redistricting_contributions_18["standardized_office"] == "GOVERNOR/LIEUTENANT GOVERNOR") & ((redistricting_contributions_18["single_house_district"] == "Y") | (redistricting_contributions_18["independent_commission"] == "Y") | (redistricting_contributions_18["no_veto"] == "Y"))]

Unnamed: 0,candidate_id,candidate,election_status,party,state,year,office,contributor,amount,date,contributor_street,contributor_city,contributor_state,contributor_zip,in_out_state,standardized_office,standardized_status,latest_month,independent_commission,single_house_district,no_veto,two_year_term,redistricting_role


In [None]:
non_redistricting_contributions_18[(non_redistricting_contributions_18["standardized_office"] == "GOVERNOR/LIEUTENANT GOVERNOR") & (non_redistricting_contributions_18["single_house_district"] == "N") & (non_redistricting_contributions_18["independent_commission"] == "N") & (non_redistricting_contributions_18["no_veto"] == "N")]

Unnamed: 0,candidate_id,candidate,election_status,party,state,year,office,contributor,amount,date,contributor_street,contributor_city,contributor_state,contributor_zip,in_out_state,standardized_office,standardized_status,latest_month,independent_commission,single_house_district,no_veto,two_year_term,redistricting_role


In [None]:
redistricting_contributions_18[(redistricting_contributions_18["standardized_office"] == "STATE HOUSE/ASSEMBLY/SENATE") & ((redistricting_contributions_18["single_house_district"] == "Y") | (redistricting_contributions_18["independent_commission"] == "Y") | (redistricting_contributions_18["two_year_term"] == "Y"))]

Unnamed: 0,candidate_id,candidate,election_status,party,state,year,office,contributor,amount,date,contributor_street,contributor_city,contributor_state,contributor_zip,in_out_state,standardized_office,standardized_status,latest_month,independent_commission,single_house_district,no_veto,two_year_term,redistricting_role


In [None]:
non_redistricting_contributions_18[(non_redistricting_contributions_18["standardized_office"] == "STATE HOUSE/ASSEMBLY/SENATE") & (non_redistricting_contributions_18["single_house_district"] == "N") & (non_redistricting_contributions_18["independent_commission"] == "N") & (non_redistricting_contributions_18["two_year_term"] == "N")]

Unnamed: 0,candidate_id,candidate,election_status,party,state,year,office,contributor,amount,date,contributor_street,contributor_city,contributor_state,contributor_zip,in_out_state,standardized_office,standardized_status,latest_month,independent_commission,single_house_district,no_veto,two_year_term,redistricting_role


Concatenate the redistricting and non-redistricting contributions data.

In [None]:
contributions_18 = pd.concat([redistricting_contributions_18, non_redistricting_contributions_18])
contributions_18.info()

<class 'pandas.core.frame.DataFrame'>
Int64Index: 2619767 entries, 0 to 2619766
Data columns (total 23 columns):
candidate_id              int64
candidate                 object
election_status           object
party                     object
state                     object
year                      int64
office                    object
contributor               object
amount                    float64
date                      datetime64[ns]
contributor_street        object
contributor_city          object
contributor_state         object
contributor_zip           object
in_out_state              object
standardized_office       object
standardized_status       object
latest_month              datetime64[ns]
independent_commission    object
single_house_district     object
no_veto                   object
two_year_term             object
redistricting_role        object
dtypes: datetime64[ns](2), float64(1), int64(2), object(18)
memory usage: 479.7+ MB
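The two filters above are complements of each other, so the role flag could also be computed in a single pass instead of splitting and concatenating. The sketch below shows one way to do that on a hypothetical four-row frame. One difference to be aware of: a row with an unrecognized office, or a rule value other than "Y" or "N", would be labeled "N" here, whereas the split-and-concatenate approach would drop it.

```python
import numpy as np
import pandas as pd

GOV = "GOVERNOR/LIEUTENANT GOVERNOR"
LEG = "STATE HOUSE/ASSEMBLY/SENATE"

# Hypothetical rows covering a few rule combinations.
toy = pd.DataFrame({
    "standardized_office":    [GOV, GOV, LEG, LEG],
    "single_house_district":  ["N", "N", "N", "Y"],
    "independent_commission": ["N", "N", "N", "N"],
    "no_veto":                ["N", "Y", "N", "N"],
    "two_year_term":          ["N", "N", "N", "N"],
})

# Conditions shared by both office types.
shared = (toy["single_house_district"] == "N") & (toy["independent_commission"] == "N")

# Office-specific condition: veto power for governors, term length for legislators.
has_role = shared & (
    ((toy["standardized_office"] == GOV) & (toy["no_veto"] == "N"))
    | ((toy["standardized_office"] == LEG) & (toy["two_year_term"] == "N"))
)

toy["redistricting_role"] = np.where(has_role, "Y", "N")
print(toy[["standardized_office", "redistricting_role"]])
```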


## Export the data

Add a redistricting role column to the 2014 and 2010 contributions data.

In [None]:
# Copy each subset so the new column is not written to a slice of contributions
contributions_14 = contributions[contributions["year"] == 2014].copy()
contributions_10 = contributions[contributions["year"] == 2010].copy()
contributions_14["redistricting_role"] = ""
contributions_10["redistricting_role"] = ""


Concatenate the 2010, 2014 and 2018 contributions data.

In [None]:
# sort=False keeps the existing column order; columns are re-ordered explicitly in the next cell
contributions = pd.concat([contributions_18, contributions_14, contributions_10], sort=False).reset_index(drop=True)


Re-order the contributions table's columns.

In [None]:
contributions = contributions[["candidate", "candidate_id", "year", "state", "party", "election_status", "contributor",
                               "amount", "date", "contributor_street", "contributor_city", "contributor_state", "contributor_zip",
                               "in_out_state", "no_veto", "office", "last_day", "redistricting_role", "independent_commission",
                               "single_house_district", "standardized_office", "standardized_status", "two_year_term"]]
contributions.info()

Export the contribution-level data for the 2010, 2014 and 2018 election cycles with filters applied and redistricting rules added.

In [None]:
%%notify
contributions.to_csv("data/contributions.csv", index=False)

Export the contribution-level data for the 2010, 2014 and 2018 election cycles without the contribution date filter applied or the redistricting rules added.

In [None]:
%%notify
contributions_full_cycles.to_csv("data/contributions_full_cycles.csv", index=False)