# Collecting Election information
Elections have their own attributes that can affect voter sentiment, motivation and turnout. This election-level information is gathered here and saved to its own CSV file.

In [3]:
import pandas as pd

The following attributes were collected for each of the 8 elections held during our timeframe of interest. Most of the data came from Wikipedia:

| output column(s) | Description of data | Type |
|:---|:---|:---:|
| 'elections' | Unique ID for each election held between 2012 and 2018, including this year's target election for the voting prediction. | str |
| 'dates' | Actual date of the election. | dt |
| 'cycle' | Is this a Congressional-only election year or a Presidential election year. | cat |
| 'etype' | Is this a 'Primary' or 'General' election. | cat |
| 'president' | What is the party of the president in power at the time of the election. | cat |
| 'us_senate_maj' | How big is the controlling margin in the US Senate. Positive numbers indicate a REP maj, negative a DEM one. | Num |
| 'us_repre_maj' | How big is the controlling margin in the US House of Representatives. Positive for REP maj, negative for DEM. | Num |
| 'ca_governor' | What party did the Governor of CA belong to at the time of the election. | cat |
| 'ca_lt_govnor' | What party did the Lieutenant Governor of CA belong to at the time of the election. | cat |
| 'ca_senate_maj' | How big is the controlling margin in the CA Senate. Positive for REP maj, negative for DEM. | Num |
| 'ca_assembly_maj' | How big is the controlling margin in the CA Assembly. Positive for REP maj, negative for DEM. | Num |

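The sign convention used by the four majority columns can be sketched as a simple seat difference. This is a minimal illustration that assumes the margin is Republican seats minus Democratic seats. The function name `signed_margin` is hypothetical, and how independents or vacancies were counted is not stated here, so the seat counts below are illustrative only.

```python
def signed_margin(rep_seats: int, dem_seats: int) -> int:
    """Return a chamber margin on the single REP-positive scale.

    Assumption: the margin is the plain seat difference.
    Positive means a REP majority, negative a DEM majority.
    """
    return rep_seats - dem_seats


# Illustrative seat counts only, not taken from the sources
print(signed_margin(52, 48))  # 4, a REP majority of 4 seats
print(signed_margin(14, 26))  # -12, a DEM majority of 12 seats
```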


In [4]:
data = {
    'elections':['E8_110618','E7_060518','E6_110816','E5_060716','E4_110414','E3_060314','E2_110612','E1_060512'],
    'dates':['110618','060518','110816','060716','110414','060314','110612','060512'],
    'cycle':['Cong','Cong','Pres', 'Pres','Cong','Cong','Pres','Pres'],
    'etype':['General','Primary','General','Primary','General','Primary','General','Primary'],
    'president':['REP','REP','DEM','DEM','DEM','DEM','DEM','DEM'],

    # US Congress majorities, calculated on the eve of each election. Gathered from:
    # https://en.wikipedia.org/wiki/112th_United_States_Congress
    # https://en.wikipedia.org/wiki/113th_United_States_Congress
    # https://en.wikipedia.org/wiki/114th_United_States_Congress
    # https://en.wikipedia.org/wiki/115th_United_States_Congress
    'us_senate_maj':[4,4,10,10,-8,-8,-3,-3],
    'us_repre_maj':[42,42,60,58,34,34,50,52],

    # CA government:
    # https://en.wikipedia.org/wiki/List_of_Governors_of_California
    'ca_governor':['DEM','DEM','DEM','DEM','DEM','DEM','DEM','DEM'],
    'ca_lt_govnor':['DEM','DEM','DEM','DEM','DEM','DEM','DEM','DEM'],

    # CA information:
    # https://en.wikipedia.org/wiki/California_State_Legislature,_2011%E2%80%9312_session
    # https://en.wikipedia.org/wiki/California_State_Legislature,_2013%E2%80%9314_session
    # https://en.wikipedia.org/wiki/California_State_Legislature,_2015%E2%80%9316_session
    # https://en.wikipedia.org/wiki/California_State_Legislature,_2017%E2%80%9318_session
    'ca_senate_maj':[-13,-13,-13,-13,-13,-13,-11,-11],
    'ca_assembly_maj':[-28,-28,-24,-24,-31,-31,-25,-25]
}

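Because the dictionary is typed in by hand, one possible safeguard is to confirm that every column has the same number of entries before building the DataFrame. The helper below is a sketch; `check_column_lengths` is a hypothetical name, not part of pandas.

```python
def check_column_lengths(columns: dict) -> int:
    """Return the shared column length, or raise if the lengths differ."""
    lengths = {name: len(values) for name, values in columns.items()}
    if len(set(lengths.values())) > 1:
        # Report every column so the short or long one is easy to spot
        raise ValueError(f"Column lengths differ: {lengths}")
    return next(iter(lengths.values()), 0)


# Two-row sample in the same shape as the data dictionary
sample = {'elections': ['E2_110612', 'E1_060512'], 'cycle': ['Pres', 'Pres']}
print(check_column_lengths(sample))  # 2
```

`pd.DataFrame` also raises a `ValueError` when the lists differ in length, so this check mainly gives a message that names the columns.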

In [5]:
ed = pd.DataFrame(data)
ed.set_index('elections', inplace=True)
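# Date strings are MMDDYY, so give the format explicitly instead of letting pandas guess
ed.dates = pd.to_datetime(ed.dates, format='%m%d%y')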
# Convert the remaining string (object) columns to the category dtype
obj_cols = ed.select_dtypes(['object']).columns
ed[obj_cols] = ed[obj_cols].astype('category')

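The strings in `dates` follow a month-day-year (MMDDYY) layout, matching the suffix of each election ID. A six-digit string like this is ambiguous, and a parser left to guess may read it in a different field order, so it is worth checking the parsed values. The sketch below parses three of the date strings with an explicit format:

```python
import pandas as pd

# Three of the election date strings, in MMDDYY layout
raw = pd.Series(['110618', '060518', '110816'])

# %m = month, %d = day, %y = two-digit year
parsed = pd.to_datetime(raw, format='%m%d%y')

print(parsed.dt.strftime('%Y-%m-%d').tolist())
# ['2018-11-06', '2018-06-05', '2016-11-08']
```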
In [6]:
ed.info()

<class 'pandas.core.frame.DataFrame'>
Index: 8 entries, E8_110618 to E1_060512
Data columns (total 10 columns):
dates              8 non-null datetime64[ns]
cycle              8 non-null category
etype              8 non-null category
president          8 non-null category
us_senate_maj      8 non-null int64
us_repre_maj       8 non-null int64
ca_governor        8 non-null category
ca_lt_govnor       8 non-null category
ca_senate_maj      8 non-null int64
ca_assembly_maj    8 non-null int64
dtypes: category(5), datetime64[ns](1), int64(4)
memory usage: 888.0+ bytes


In [7]:
date = pd.Timestamp("today").strftime("%Y%m%d")
ed.to_csv('data_clean/{}_election_data.csv'.format(date))

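A CSV file stores only text, so the datetime and category dtypes set above are not preserved on disk. When the file is loaded again they can be restored at read time. The sketch below uses a small in-memory stand-in for the saved file (an assumption made to keep the example self-contained) with a subset of the columns:

```python
import io

import pandas as pd

# Stand-in for the saved file: two rows and a few of the columns
csv_text = (
    "elections,dates,cycle,us_senate_maj\n"
    "E8_110618,2018-11-06,Cong,4\n"
    "E7_060518,2018-06-05,Cong,4\n"
)

restored = pd.read_csv(
    io.StringIO(csv_text),
    index_col='elections',        # restore the election ID as the index
    parse_dates=['dates'],        # restore the datetime dtype
    dtype={'cycle': 'category'},  # restore a categorical column
)

print(restored.dtypes)
```

For the real file, the `io.StringIO` object would be replaced by the path written by `to_csv`, and the `dtype` mapping extended to the other categorical columns.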
### Change Management
20180621
- Moved to a single scale for the majority columns: positive numbers indicate a Republican majority, negative numbers a Democratic majority.