# Election Data Formatting Notebook

This notebook contains code to format the raw election results data. As of February 11, 2021, not all congressional districts had reported their 2020 election results.

Source: **[Daily Kos Elections' presidential results by congressional district for 2020, 2016, and 2012](https://www.dailykos.com/stories/2020/11/19/1163009/-Daily-Kos-Elections-presidential-results-by-congressional-district-for-2020-2016-and-2012)** *by David Nir*

In [1]:
import pandas as pd
import numpy as np

## Read in the data

In [2]:
raw_election_data = pd.read_csv('raw_data_2020/Daily Kos Elections 2012, 2016 & 2020 presidential election results for congressional districts used in 2020 elections.csv', header = 1)
raw_election_data.head()

Unnamed: 0,CD,Incumbent,Party,Biden,Trump,Clinton,Trump.1,Obama,Romney
0,AK-AL,Don Young,(R),43.0,53.1,37.6,52.8,41.2,55.3
1,AL-01,Jerry Carl,(R),,,34.1,63.5,37.4,61.8
2,AL-02,Barry Moore,(R),,,33.0,64.9,36.4,62.9
3,AL-03,Mike Rogers,(R),,,32.3,65.3,36.8,62.3
4,AL-04,Robert Aderholt,(R),,,17.4,80.4,24.0,74.8


The CSV file has two header rows. The first contains the election year, so `header=1` tells pandas to take the column names from the second row.
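To make the header handling concrete, here is a minimal sketch on an in-memory sample. The sample text is a hypothetical reconstruction of the file layout (a year row followed by a candidate row), not a copy of the actual Daily Kos file; only the AK-AL values are taken from the output above.

```python
import io
import pandas as pd

# Hypothetical two-header-row sample: row 0 holds the election year,
# row 1 holds the candidate names (note that "Trump" appears twice).
sample_csv = io.StringIO(
    ",,,2020,,2016,,2012,\n"
    "CD,Incumbent,Party,Biden,Trump,Clinton,Trump,Obama,Romney\n"
    "AK-AL,Don Young,(R),43.0,53.1,37.6,52.8,41.2,55.3\n"
)

# header=1 skips the year row and reads column names from the second row.
sample = pd.read_csv(sample_csv, header=1)
sample_columns = list(sample.columns)
print(sample_columns)
```

pandas renames the repeated `Trump` column to `Trump.1`, which is why that name shows up in the output above.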

## Rename the columns to match the other results
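The major-party candidates were Obama and Romney in 2012, Clinton and Trump in 2016, and Biden and Trump in 2020. Because Trump appears twice, pandas labels the second occurrence `Trump.1`, so each column is renamed to include its election year.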

In [3]:
raw_election_data.rename(columns={'Trump': 'Trump_2020',
                                  'Biden': 'Biden_2020',
                                  'Trump.1': 'Trump_2016',
                                  'Clinton': 'Clinton_2016',
                                  'Obama': 'Obama_2012',
                                  'Romney': 'Romney_2012'
                                 }, inplace=True)

In [4]:
raw_election_data.head()

Unnamed: 0,CD,Incumbent,Party,Biden_2020,Trump_2020,Clinton_2016,Trump_2016,Obama_2012,Romney_2012
0,AK-AL,Don Young,(R),43.0,53.1,37.6,52.8,41.2,55.3
1,AL-01,Jerry Carl,(R),,,34.1,63.5,37.4,61.8
2,AL-02,Barry Moore,(R),,,33.0,64.9,36.4,62.9
3,AL-03,Mike Rogers,(R),,,32.3,65.3,36.8,62.3
4,AL-04,Robert Aderholt,(R),,,17.4,80.4,24.0,74.8
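By default, `rename` silently ignores mapping keys that do not match a column, so a typo in the mapping would go unnoticed. The sketch below, run on a toy one-row frame rather than the real data, shows one optional way to guard against that with `errors='raise'`. The names `toy` and `year_suffixes` are illustrative and not part of the notebook.

```python
import pandas as pd

# Toy frame with the column names pandas produced; values are the AK-AL row.
toy = pd.DataFrame({
    'CD': ['AK-AL'],
    'Biden': [43.0], 'Trump': [53.1],
    'Clinton': [37.6], 'Trump.1': [52.8],
    'Obama': [41.2], 'Romney': [55.3],
})

year_suffixes = {
    'Biden': 'Biden_2020', 'Trump': 'Trump_2020',
    'Clinton': 'Clinton_2016', 'Trump.1': 'Trump_2016',
    'Obama': 'Obama_2012', 'Romney': 'Romney_2012',
}

# errors='raise' raises a KeyError if any key in the mapping is not a column
# (the default, errors='ignore', skips such keys without warning).
renamed = toy.rename(columns=year_suffixes, errors='raise')

try:
    toy.rename(columns={'Not_A_Column': 'X'}, errors='raise')
    missing_column_detected = False
except KeyError:
    missing_column_detected = True

print(list(renamed.columns), missing_column_detected)
```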


### Reorganize the data so that there is one year's result per line

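The function below reshapes the original DataFrame into one row per district and election year, pairing each year's results (`Target_*`) with those of the preceding presidential election (`Previous_*`).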

In [5]:
def format_annual_election_results_data(raw_data):
    """Return one row per district and election year (2016 and 2020).

    Each row holds that year's results (Target_*) and the results of the
    preceding presidential election (Previous_*).
    """
    # 2020 rows: Biden vs. Trump, with 2016 as the previous election
    raw_data_2020 = raw_data[['CD', 'Biden_2020', 'Trump_2020', 'Trump_2016', 'Clinton_2016']].copy()
    raw_data_2020.rename(columns={'Biden_2020': 'Target_Presidential_D',
                                  'Trump_2020': 'Target_Presidential_R',
                                  'Trump_2016': 'Previous_Presidential_R',
                                  'Clinton_2016': 'Previous_Presidential_D'
                                 }, inplace=True)
    raw_data_2020['Year'] = 2020

    # 2016 rows: Clinton vs. Trump, with 2012 as the previous election
    raw_data_2016 = raw_data[['CD', 'Trump_2016', 'Clinton_2016', 'Obama_2012', 'Romney_2012']].copy()
    raw_data_2016.rename(columns={'Trump_2016': 'Target_Presidential_R',
                                  'Clinton_2016': 'Target_Presidential_D',
                                  'Obama_2012': 'Previous_Presidential_D',
                                  'Romney_2012': 'Previous_Presidential_R'
                                 }, inplace=True)
    raw_data_2016['Year'] = 2016

    # Stack both years and put the columns in a consistent order
    results = pd.concat([raw_data_2016, raw_data_2020])[['CD',
                                                         'Year',
                                                         'Target_Presidential_R',
                                                         'Target_Presidential_D',
                                                         'Previous_Presidential_D',
                                                         'Previous_Presidential_R'
                                                        ]]
    return results

In [6]:
election_data = format_annual_election_results_data(raw_election_data)

In [7]:
election_data.head()

Unnamed: 0,CD,Year,Target_Presidential_R,Target_Presidential_D,Previous_Presidential_D,Previous_Presidential_R
0,AK-AL,2016,52.8,37.6,41.2,55.3
1,AL-01,2016,63.5,34.1,37.4,61.8
2,AL-02,2016,64.9,33.0,36.4,62.9
3,AL-03,2016,65.3,32.3,36.8,62.3
4,AL-04,2016,80.4,17.4,24.0,74.8
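The `head()` output only shows 2016 rows, so it may help to see the reshape on a two-district toy input. The sketch below is a compact, loop-based variant of the same idea, not the notebook's function; `ELECTIONS`, `reshape_sketch`, `wide`, and `long_form` are hypothetical names. The input values are the AK-AL and AL-01 rows shown earlier, where AL-01 has no 2020 results yet.

```python
import numpy as np
import pandas as pd

# Toy wide input: AK-AL has 2020 results, AL-01 does not (NaN).
wide = pd.DataFrame({
    'CD': ['AK-AL', 'AL-01'],
    'Biden_2020': [43.0, np.nan], 'Trump_2020': [53.1, np.nan],
    'Clinton_2016': [37.6, 34.1], 'Trump_2016': [52.8, 63.5],
    'Obama_2012': [41.2, 37.4], 'Romney_2012': [55.3, 61.8],
})

# (year, target D, target R, previous D, previous R)
ELECTIONS = [
    (2016, 'Clinton_2016', 'Trump_2016', 'Obama_2012', 'Romney_2012'),
    (2020, 'Biden_2020', 'Trump_2020', 'Clinton_2016', 'Trump_2016'),
]

def reshape_sketch(raw):
    """Build one row per district and year from the wide frame."""
    frames = []
    for year, tgt_d, tgt_r, prev_d, prev_r in ELECTIONS:
        frame = raw[['CD', tgt_r, tgt_d, prev_d, prev_r]].copy()
        frame.columns = ['CD', 'Target_Presidential_R', 'Target_Presidential_D',
                         'Previous_Presidential_D', 'Previous_Presidential_R']
        frame.insert(1, 'Year', year)  # place Year right after CD
        frames.append(frame)
    return pd.concat(frames, ignore_index=True)

long_form = reshape_sketch(wide)

# Districts whose 2020 results are still missing
unreported_2020 = long_form[(long_form['Year'] == 2020)
                            & long_form['Target_Presidential_D'].isna()]
print(long_form)
print(unreported_2020['CD'].tolist())
```

Rows for districts that have not yet reported 2020 results are kept with missing target values, so downstream code needs to decide whether to drop or retain them.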


In [8]:
election_data.to_csv('raw_data_2020/election_results.csv', index = False)
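
`pd.concat` keeps each input frame's original row labels, so the stacked result has repeated index values; writing with `index=False` leaves them out of the file. A minimal round-trip sketch on a toy frame, using a temporary directory and a hypothetical file name rather than the notebook's output path:

```python
import os
import tempfile
import pandas as pd

# Two stacked frames keep their original row labels, so the index repeats.
stacked = pd.concat([
    pd.DataFrame({'CD': ['AK-AL'], 'Year': [2016]}),
    pd.DataFrame({'CD': ['AK-AL'], 'Year': [2020]}),
])
index_labels = stacked.index.tolist()

with tempfile.TemporaryDirectory() as tmp_dir:
    # Hypothetical file name, used only for this check
    path = os.path.join(tmp_dir, 'election_results_check.csv')
    stacked.to_csv(path, index=False)
    reloaded = pd.read_csv(path)

# The reloaded frame has only the data columns and a fresh 0..n-1 index.
reloaded_columns = list(reloaded.columns)
print(index_labels, reloaded_columns, reloaded.index.tolist())
```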