# Analysis 

Now that I've validated and cleaned up my dataset, I can analyze it. I'll quickly run everything I did in the [preparation and validation notebook](https://github.com/anushadatar/split-ticket/blob/master/Prepare-Dataset.ipynb) and then go on to actually explore split-ticket voting.

# Setup

In [5]:
from pandas.api.types import is_numeric_dtype
import numpy as np
import pandas as pd

def sanitize_series(series):
    """
    Replaces missing-data codes (zero or negative values) in a pandas
    Series with NaN and returns the cleaned Series.
    """
    return series.map(lambda n: n if (n > 0) else np.nan)

def sanitize_df(df):
    """
    Replaces missing-data codes (negative numeric values) in a pandas
    DataFrame with NaN and returns the cleaned DataFrame.
    """
    return df.applymap(lambda n: np.nan if (is_numeric_dtype(type(n)) and n < 0) else n)

# Load the dataset.
data_path = "C:/Users/anush/OneDrive/Documents/anes_data/anes_timeseries_cdf_dta/anes_timeseries_cdf.dta"
df = pd.read_stata(data_path)

# Clean up missing data codes.
df = sanitize_df(df)

# Resample with replacement, using the VCF0009z column as sampling weights.
n = len(df)
weights = df['VCF0009z']
sample = df.sample(n, replace=True, weights=weights)
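
As a quick sanity check, here is a minimal sketch of how the cleaning and weighted resampling steps behave on a toy frame. The column names (`vote`, `label`, `weight`) and their values are made up for illustration and are not ANES variables. The helper mirrors the logic of `sanitize_df` above but applies it column by column, which is my own substitution so that the sketch does not depend on `applymap`.

```python
import numpy as np
import pandas as pd
from pandas.api.types import is_numeric_dtype

def sanitize_toy_df(df):
    """Replace negative numeric cells (assumed missing-data codes) with NaN."""
    def clean(value):
        is_number = is_numeric_dtype(type(value))
        return np.nan if (is_number and value < 0) else value
    # Map each column separately; string cells pass through unchanged.
    return df.apply(lambda column: column.map(clean))

# Hypothetical toy data: -9 and -8 stand in for missing-data codes.
toy = pd.DataFrame({
    "vote": [1, -9, 2, -8],
    "label": ["a", "b", "c", "d"],
    "weight": [1.0, 1.0, 0.5, 2.0],
})

clean_toy = sanitize_toy_df(toy)

# Weighted resampling with replacement; random_state is fixed only so the
# sketch is repeatable.
resampled = clean_toy.sample(len(clean_toy),
                             replace=True,
                             weights=clean_toy["weight"],
                             random_state=0)
print(clean_toy)
print(resampled)
```

Rows with a larger weight are more likely to be drawn, and because the draw is with replacement the same respondent can appear more than once.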


In [6]:
df

|   | Version | VCF0004 | VCF0006 | VCF0006a | VCF0009x | VCF0010x | VCF0011x | VCF0009y | VCF0010y | VCF0011y | ... | VCF9272 | VCF9273 | VCF9274 | VCF9275 | VCF9277 | VCF9278 | VCF9279 | VCF9280 | VCF9281 | VCF9282 |
|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|
| 0 | ANES_CDF_VERSION:2019-Sep-10 | 1948.0 | 1001.0 | 19481001.0 | 1.0 | 1.0 | 1.0 | 1.000 | 1.000 | 1.000 | ... |  |  |  |  |  |  |  |  |  |  |
| 1 | ANES_CDF_VERSION:2019-Sep-10 | 1948.0 | 1002.0 | 19481002.0 | 1.0 | 1.0 | 1.0 | 1.000 | 1.000 | 1.000 | ... |  |  |  |  |  |  |  |  |  |  |
| 2 | ANES_CDF_VERSION:2019-Sep-10 | 1948.0 | 1003.0 | 19481003.0 | 1.0 | 1.0 | 1.0 | 1.000 | 1.000 | 1.000 | ... |  |  |  |  |  |  |  |  |  |  |
| 3 | ANES_CDF_VERSION:2019-Sep-10 | 1948.0 | 1004.0 | 19481004.0 | 1.0 | 1.0 | 1.0 | 1.000 | 1.000 | 1.000 | ... |  |  |  |  |  |  |  |  |  |  |
| 4 | ANES_CDF_VERSION:2019-Sep-10 | 1948.0 | 1005.0 | 19481005.0 | 1.0 | 1.0 | 1.0 | 1.000 | 1.000 | 1.000 | ... |  |  |  |  |  |  |  |  |  |  |
| 5 | ANES_CDF_VERSION:2019-Sep-10 | 1948.0 | 1006.0 | 19481006.0 | 1.0 | 1.0 | 1.0 | 1.000 | 1.000 | 1.000 | ... |  |  |  |  |  |  |  |  |  |  |
| 6 | ANES_CDF_VERSION:2019-Sep-10 | 1948.0 | 1007.0 | 19481007.0 | 1.0 | 1.0 | 1.0 | 1.000 | 1.000 | 1.000 | ... |  |  |  |  |  |  |  |  |  |  |
| 7 | ANES_CDF_VERSION:2019-Sep-10 | 1948.0 | 1008.0 | 19481008.0 | 1.0 | 1.0 | 1.0 | 1.000 | 1.000 | 1.000 | ... |  |  |  |  |  |  |  |  |  |  |
| 8 | ANES_CDF_VERSION:2019-Sep-10 | 1948.0 | 1009.0 | 19481009.0 | 1.0 | 1.0 | 1.0 | 1.000 | 1.000 | 1.000 | ... |  |  |  |  |  |  |  |  |  |  |
| 9 | ANES_CDF_VERSION:2019-Sep-10 | 1948.0 | 1010.0 | 19481010.0 | 1.0 | 1.0 | 1.0 | 1.000 | 1.000 | 1.000 | ... |  |  |  |  |  |  |  |  |  |  |


Now that I have a sanitized, resampled dataset, I can address the first question it can answer: how has the prevalence of split-ticket voting changed over time?

# Split Ticket Metrics
The dataset provides a metric for split-ticketedness, but it only compares presidential and House votes. This is limiting for two reasons: it ignores Senate votes, and because it requires a presidential vote, it covers only presidential election years, which halves the number of election years available.

In a general election year, the following outcomes are possible:

Note: D stands for Democrat, R for Republican, and I for Independent. There is no data for independent House or Senate candidates.

| President | Senate | House |
|-----------|--------|-------|
| R         | R      | R     |
| R         | R      | D     |
| R         | D      | R     |
| R         | D      | D     |
| D         | R      | R     |
| D         | R      | D     |
| D         | D      | R     |
| D         | D      | D     |
| I         | R      | R     |
| I         | R      | D     |
| I         | D      | R     |
| I         | D      | D     |

In a midterm election year, the following outcomes are possible:

Note: D stands for Democrat and R for Republican. There is no data for independent House or Senate candidates.

| Senate | House |
|--------|-------|
| R      | R     |
| R      | D     |
| D      | R     |
| D      | D     |
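
The outcome lists above can be enumerated mechanically, which is a handy way to check that no combination is missing or repeated. This is a minimal sketch; the constant names are my own, and it assumes (as noted above) that only the presidential vote can go to an independent.

```python
from itertools import product

# Assumed choice sets: independents appear only in the presidential race.
PRESIDENT_CHOICES = ("R", "D", "I")
CONGRESS_CHOICES = ("R", "D")

# General election year: President x Senate x House.
general_outcomes = list(product(PRESIDENT_CHOICES,
                                CONGRESS_CHOICES,
                                CONGRESS_CHOICES))

# Midterm year: Senate x House.
midterm_outcomes = list(product(CONGRESS_CHOICES, CONGRESS_CHOICES))

for president, senate, house in general_outcomes:
    print(president, senate, house)

for senate, house in midterm_outcomes:
    print(senate, house)
```

This yields 3 × 2 × 2 = 12 distinct general-election outcomes and 2 × 2 = 4 midterm outcomes.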


## Metric Specification

The metric I propose to use is a weighted sum of the split-ticketedness of each vote. 

To measure that, I will assign each set of votes a score between 0 and 1. In doing so, I will ignore votes for independent presidential candidates and score those ballots on their Senate and House votes alone. Independent candidates often align with the voters of a single party, but there is no straightforward way to determine which independent candidate a survey participant voted for or how that choice aligns with the rest of their ballot.

The updated metric works as follows, first for general election years and then for midterm years:

| President | Senate | House | Score |
|-----------|--------|-------|-------|
| R         | R      | R     | 1     |
| R         | R      | D     | .66   |
| R         | D      | R     | .66   |
| R         | D      | D     | .66   |
| D         | R      | R     | .66   |
| D         | R      | D     | .66   |
| D         | D      | R     | .66   |
| D         | D      | D     | 1     |
| I         | R      | R     | 1     |
| I         | R      | D     | .5    |
| I         | D      | R     | .5    |
| I         | D      | D     | 1     |


| Senate | House | Score |
|--------|-------|-------|
| R      | R     | 1     |
| R      | D     | .5    |
| D      | R     | .5    |
| D      | D     | 1     |
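
To make the scoring concrete, here is a minimal sketch of one way to compute it. It assumes the rule implied by the midterm table: the score is the share of a voter's major-party votes that went to the party they supported most, so a straight ticket scores 1 and an even split scores .5. Independent or missing votes are simply skipped. The function name, the column names (`year`, `president`, `senate`, `house`), and the toy ballots are all hypothetical and are not ANES variables.

```python
import pandas as pd

MAJOR_PARTIES = ("R", "D")

def ticket_score(senate, house, president=None):
    """Share of major-party votes cast for the voter's most-supported party.

    Assumed rule: votes that are not "R" or "D" (independent, missing)
    are ignored. Returns NaN when no major-party vote remains.
    """
    votes = [v for v in (president, senate, house) if v in MAJOR_PARTIES]
    if not votes:
        return float("nan")
    top_count = max(votes.count(party) for party in MAJOR_PARTIES)
    return top_count / len(votes)

# Hypothetical ballots: two from a general election, two from a midterm.
ballots = pd.DataFrame({
    "year": [1952, 1952, 1954, 1954],
    "president": ["R", "I", None, None],
    "senate": ["R", "D", "R", "D"],
    "house": ["D", "D", "R", "R"],
})

ballots["score"] = [
    ticket_score(senate, house, president)
    for president, senate, house in zip(ballots["president"],
                                        ballots["senate"],
                                        ballots["house"])
]

# Average score per election year, a rough proxy for how common
# straight-ticket voting was in that year.
mean_by_year = ballots.groupby("year")["score"].mean()
print(mean_by_year)
```

Averaging by year is only one possible summary. A lower yearly mean would indicate more split-ticket voting under this scoring rule.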