# Recreating the FiveThirtyEight 2016 Election Forecast

![Title](https://raw.githubusercontent.com/ahoaglandnu/election/master/538_prob.png "Title")

# Summary of Findings

It appears that FiveThirtyEight produced its election forecast probability with a simulation, NOT directly from its own published state-by-state probabilities.

By running simulations that add uncertainty to the probabilities of the swing states and the toss-up states, we can approximately replicate FiveThirtyEight's presidential election forecast.

Through this process, we did not find anything that would have predicted the specific state polling errors in Wisconsin, Michigan, and Pennsylvania.

### Examining the State Probabilities

```python
import numpy as np
import pandas as pd
import random
```

### We start with a DataFrame of the win probabilities for each state and district.

State probabilities can be found at: [FiveThirtyEight](https://projects.fivethirtyeight.com/2016-election-forecast/#plus)

```python
df = pd.read_csv('fivethirtyeight2016.csv')
```

```python
df.head()
```

| Unnamed: 0 | state | ec | clinton_538 | trump_538 |
|---|---|---|---|---|
| 0 | D.C. | 3 | 99.99 | 0.01 |
| 1 | California | 55 | 99.99 | 0.01 |
| 2 | Maryland | 10 | 99.99 | 0.01 |
| 3 | Hawaii | 4 | 99.0 | 1.0 |
| 4 | Vermont | 3 | 98.0 | 2.0 |


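```python
df.tail()
```

| Unnamed: 0 | state | ec | clinton_538 | trump_538 |
|---|---|---|---|---|
| 51 | Idaho | 4 | 1.0 | 99.0 |
| 52 | Kentucky | 8 | 0.01 | 99.99 |
| 53 | Oklahoma | 7 | 0.01 | 99.99 |
| 54 | Wyoming | 3 | 1.0 | 99.0 |
| 55 | Nebraska (CD 3)\* | 1 | 1.0 | 99.0 |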


### Electoral College totals from every state where each candidate's probability is 50% or higher

```python
print('Trump Most Likely Electoral College Results:', np.sum(df['ec'][df['trump_538'] >= 50]))
```

```
Trump Most Likely Electoral College Results: 215
```


```python
print('Clinton Most Likely Electoral College Results:', np.sum(df['ec'][df['clinton_538'] >= 50]))
```

```
Clinton Most Likely Electoral College Results: 323
```


### But the forecasted outcome was Trump: 236, Clinton: 302

![EC Map](https://raw.githubusercontent.com/ahoaglandnu/election/master/538_ec.png "EC Map")

### The posted state probabilities are NOT the model probabilities

# A Simulation for the 538 Election Forecast Probabilities and Electoral Votes

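```python
def electoral_college(ec, cand, state, sims=10, polling_error=False):
    """Simulate `sims` elections for one candidate.

    Returns (number of wins, electoral votes per simulation, states won per simulation).
    """
    cand_wins = 0
    cand_ec_total = []
    cand_states = []

    for i in range(sims):
        cand_ec = 0
        cand_state = []
        # National polling error: a Student-t draw (11 degrees of freedom), truncated toward zero
        poll_error = int(np.random.standard_t(11))
        # x: win probability (%), y: state name, z: electoral votes
        for x, y, z in zip(cand, state, ec):
            if polling_error:
                x = x + poll_error
            # Each state gets its own uniform draw on [0, 100), so apart from the
            # shared national error the states are simulated independently
            sim_election = np.random.uniform() * 100
            if x > sim_election:
                cand_ec += z
                cand_state.append(y)
        cand_ec_total.append(cand_ec)
        cand_states.append(cand_state)
        if cand_ec > 269:  # 270 electoral votes are needed to win
            cand_wins += 1
    return cand_wins, cand_ec_total, cand_states
```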
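The loop above draws one uniform number per state per simulated election. As a sketch (not the notebook's own method), the same independent-states simulation can be vectorized with NumPy's `Generator` API, which also makes a run reproducible through a seed. The function name `simulate_independent` and its `seed` parameter are illustrative assumptions, and the national polling error is left out for brevity.

```python
import numpy as np


def simulate_independent(ec, probs, sims=20_000, seed=0):
    """Hypothetical vectorized sketch of the independent-states simulation.

    ec    -- electoral votes per state or district
    probs -- one candidate's win probability per state, in percent
    Returns (win probability in percent, array of electoral vote totals).
    """
    rng = np.random.default_rng(seed)  # seeded so results are reproducible
    ec = np.asarray(ec, dtype=int)
    probs = np.asarray(probs, dtype=float)
    # One uniform draw on [0, 100) for every (simulation, state) pair
    draws = rng.uniform(0, 100, size=(sims, ec.size))
    # The candidate carries a state when its probability exceeds the draw
    carried = probs > draws
    totals = carried.astype(int) @ ec  # electoral votes in each simulated election
    win_prob = np.mean(totals >= 270) * 100  # same threshold as cand_ec > 269
    return win_prob, totals


# Two hypothetical "states": a certain win worth 300 votes and a certain loss worth 238
win_prob, totals = simulate_independent([300, 238], [100.0, 0.0], sims=1_000)
print(win_prob, totals.mean())  # 100.0 300.0
```

With the real data, `simulate_independent(df.ec.values, df['trump_538'].values)` would play the role of `electoral_college(...)` without the polling error, returning the win probability directly instead of a win count.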

### 20,000 simulated elections using the state and district probabilities

```python
print("Recreate 538 Probabilities and Electoral College Results")
print()
sims = 20000
ec = list(df.ec.values)
states = list(df.state.values)
cand_1 = list(df['trump_538'].values)
cand_1_wins, cand_1_ec_totals, cand_1_states = electoral_college(ec, cand_1, states, sims=sims)
print('Trump Average EC:', np.average(cand_1_ec_totals))
print('Trump Win Prob:', (cand_1_wins/sims)*100)
print()
cand_2 = list(df['clinton_538'].values)
cand_2_wins, cand_2_ec_totals, cand_2_states = electoral_college(ec, cand_2, states, sims=sims)
print('Clinton Average EC:', np.average(cand_2_ec_totals))
print('Clinton Win Prob:', (cand_2_wins/sims)*100)
```

```
Recreate 538 Probabilities and Electoral College Results

Trump Average EC: 235.09815
Trump Win Prob: 10.225

Clinton Average EC: 302.29805
Clinton Win Prob: 88.47500000000001
```
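The average electoral vote totals can also be checked without simulation. By linearity of expectation, the average over many simulated elections converges to the probability-weighted sum of each state's electoral votes. The earlier totals, by contrast, count every state entirely for its favorite. For the run above, which uses the published probabilities with no added error, the weighted sum is exactly what the simulated averages estimate. The sketch below compares the two summaries on a hypothetical four-row table. The helper names are assumptions, but the column names match `df`, so the same functions could be applied to it.

```python
import pandas as pd

# Hypothetical four-row table with the same columns as the notebook's df
toy = pd.DataFrame({
    'state': ['A', 'B', 'C', 'D'],
    'ec': [10, 20, 5, 3],
    'clinton_538': [90.0, 60.0, 30.0, 20.0],
})
toy['trump_538'] = 100 - toy['clinton_538']


def most_likely_ec(frame, col):
    # Every state where the candidate is at 50% or higher counts in full
    return int(frame.loc[frame[col] >= 50, 'ec'].sum())


def expected_ec(frame, col):
    # Each state's electoral votes weighted by the win probability (given in percent)
    return float((frame['ec'] * frame[col] / 100).sum())


print(most_likely_ec(toy, 'clinton_538'), most_likely_ec(toy, 'trump_538'))  # 30 8
print(round(expected_ec(toy, 'clinton_538'), 2), round(expected_ec(toy, 'trump_538'), 2))  # 23.1 14.9
```

Unlike the all-or-nothing count, the two expected totals always add up to the total number of electoral votes whenever each state's two probabilities sum to 100%.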


# The electoral votes match, but why is Clinton's win probability so high?

FiveThirtyEight's methodology guide mentions that the simulation introduces a random national polling error.

[FiveThirtyEight Methodology Explanation](https://fivethirtyeight.com/features/a-users-guide-to-fivethirtyeights-2016-general-election-forecast/)

### Adding the national polling error

```python
print("538 Probabilities and Electoral College Results with National Polling Error")
print()
sims = 20000
ec = list(df.ec.values)
states = list(df.state.values)
cand_1 = list(df['trump_538'].values)
cand_1_wins, cand_1_ec_totals, cand_1_states = electoral_college(ec, cand_1, states, sims=sims, polling_error=True)
print('Trump Average EC:', np.average(cand_1_ec_totals))
print('Trump Win Prob:', (cand_1_wins/sims)*100)
print()
cand_2 = list(df['clinton_538'].values)
cand_2_wins, cand_2_ec_totals, cand_2_states = electoral_college(ec, cand_2, states, sims=sims, polling_error=True)
print('Clinton Average EC:', np.average(cand_2_ec_totals))
print('Clinton Win Prob:', (cand_2_wins/sims)*100)
```

```
538 Probabilities and Electoral College Results with National Polling Error

Trump Average EC: 235.35695
Trump Win Prob: 10.63

Clinton Average EC: 302.05975
Clinton Win Prob: 87.985
```


### Adding a national polling error made no meaningful difference, so next we add a separate error for the states FiveThirtyEight identified as swing states.
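One plausible reason for the small effect is the size of the error itself. `int()` truncates toward zero, so any Student-t draw between -1 and 1 becomes an error of 0 percentage points. The sketch below estimates how often that happens with the 11 degrees of freedom used in the first version. The helper name and the use of NumPy's `Generator` (instead of the legacy `np.random.standard_t`) are assumptions made for illustration.

```python
import numpy as np


def truncated_t_errors(df_t, n=100_000, seed=0):
    # Mimic int(np.random.standard_t(df_t)): draw Student-t values, then truncate toward zero
    rng = np.random.default_rng(seed)
    return np.trunc(rng.standard_t(df_t, size=n)).astype(int)


errors = truncated_t_errors(11)
share_zero = np.mean(errors == 0)
share_small = np.mean(np.abs(errors) <= 1)
print(f"error of 0 points: {share_zero:.1%}, error of at most 1 point: {share_small:.1%}")
```

If roughly two thirds of the draws shift every probability by nothing and nearly all the rest shift it by a single point, the national error as implemented likely adds very little correlation between states.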

### Create a swing state polling error

```python
def electoral_college(ec, cand, state, sims=10, polling_error=False):
    cand_wins = 0
    cand_ec_total = []
    cand_states = []
    # States FiveThirtyEight identified as swing states
    swing_states = ['Colorado', 'Florida', 'Iowa', 'Michigan', 'Nevada',
                    'New Hampshire', 'North Carolina', 'Ohio', 'Pennsylvania',
                    'Virginia', 'Wisconsin', 'Minnesota', 'Arizona', 'New Mexico']

    for i in range(sims):
        cand_ec = 0
        cand_state = []
        poll_error = int(np.random.standard_t(10))  # national error, shared by all states
        swing_error = int(np.random.standard_t(6))  # extra error, shared by all swing states
        # Drawn once per simulated election: every state is compared with the same
        # number, so state outcomes are perfectly correlated within a simulation
        sim_election = np.random.uniform() * 100
        for x, y, z in zip(cand, state, ec):
            if y in swing_states:
                x = x + swing_error
            if polling_error:
                x = x + poll_error
            if x > sim_election:
                cand_ec += z
                cand_state.append(y)
        cand_ec_total.append(cand_ec)
        cand_states.append(cand_state)
        if cand_ec > 269:  # 270 electoral votes are needed to win
            cand_wins += 1
    return cand_wins, cand_ec_total, cand_states
```
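Besides the swing-state error, this version also moves `sim_election` outside the state loop, so every state in a simulated election is compared with the same random number. The sketch below isolates that change on a hypothetical map of nine 60-vote states, each at 30%. The function `simulate` and the map are illustrative assumptions, not part of the original analysis. With independent draws, the candidate wins only when five or more of the nine states break its way, which the binomial distribution puts at roughly 10%. With a shared draw, it wins all nine states or none, so it wins about 30% of the time. The average electoral vote count is the same in both cases.

```python
import numpy as np


def simulate(ec, probs, sims=100_000, shared_draw=False, seed=0):
    """Hypothetical sketch returning (average electoral votes, win probability in %)."""
    rng = np.random.default_rng(seed)
    ec = np.asarray(ec, dtype=int)
    probs = np.asarray(probs, dtype=float)
    if shared_draw:
        # One number per simulated election, broadcast across every state
        draws = rng.uniform(0, 100, size=(sims, 1))
    else:
        # A fresh number for every state in every simulated election
        draws = rng.uniform(0, 100, size=(sims, ec.size))
    totals = (probs > draws).astype(int) @ ec
    return totals.mean(), np.mean(totals >= 270) * 100


ec = [60] * 9       # 540 electoral votes; 5 states (300 votes) are needed to win
probs = [30.0] * 9  # the candidate is a 30% underdog in every state

for shared in (False, True):
    avg, win = simulate(ec, probs, shared_draw=shared)
    print(f"shared_draw={shared}: average EV {avg:.1f}, win probability {win:.1f}%")
```

This may explain why the average electoral vote totals stay near 235 and 302 in every version while the win probabilities move substantially.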

```python
print("538 Probabilities and Electoral College Results with Swing States and National Polling Errors")
print()
sims = 20000
ec = list(df.ec.values)
states = list(df.state.values)
cand_1 = list(df['trump_538'].values)
cand_1_wins, cand_1_ec_totals, cand_1_states = electoral_college(ec, cand_1, states, sims=sims, polling_error=True)
print('Trump Average EC:', np.average(cand_1_ec_totals))
print('Trump Win Prob:', (cand_1_wins/sims)*100)
print()
cand_2 = list(df['clinton_538'].values)
cand_2_wins, cand_2_ec_totals, cand_2_states = electoral_college(ec, cand_2, states, sims=sims, polling_error=True)
print('Clinton Average EC:', np.average(cand_2_ec_totals))
print('Clinton Win Prob:', (cand_2_wins/sims)*100)
```

```
538 Probabilities and Electoral College Results with Swing States and National Polling Errors

Trump Average EC: 235.30035
Trump Win Prob: 30.154999999999998

Clinton Average EC: 302.02295
Clinton Win Prob: 69.685
```


### We can replicate the 538 Electoral College results, but the win probabilities are still off.

### For the states with a probability near 50%, we can add a rule that moves them together, in keeping with 538's "correlated states" methodology.

```python
def electoral_college(ec, cand, state, sims=10, polling_error=False):
    cand_wins = 0
    cand_ec_total = []
    cand_states = []

    swing_states = ['Colorado', 'Florida', 'Iowa', 'Michigan', 'Nevada',
                    'New Hampshire', 'North Carolina', 'Ohio', 'Pennsylvania',
                    'Virginia', 'Wisconsin', 'Minnesota', 'Arizona', 'New Mexico']
    # Toss-up states whose probabilities were close to 50%
    close_states = ['Florida', 'Nevada', 'North Carolina']

    for i in range(sims):
        cand_ec = 0
        cand_state = []
        correl_error = np.random.uniform()          # picks a shared scenario for the close states
        poll_error = int(np.random.standard_t(10))  # national error, shared by all states
        swing_error = int(np.random.standard_t(5))  # extra error, shared by all swing states
        # One draw per simulated election, shared by every state
        sim_election = np.random.uniform() * 100
        for x, y, z in zip(cand, state, ec):
            # In about 20% of simulations all close states jump to 85%;
            # in about 15% they all drop to 0%; otherwise they keep their probability
            if correl_error <= .2 and y in close_states:
                x = 85
            if correl_error >= .85 and y in close_states:
                x = 0
            if y in swing_states:
                x = x + swing_error
            if polling_error:
                x = x + poll_error
            if x > sim_election:
                cand_ec += z
                cand_state.append(y)
        cand_ec_total.append(cand_ec)
        cand_states.append(cand_state)
        if cand_ec > 269:  # 270 electoral votes are needed to win
            cand_wins += 1
    return cand_wins, cand_ec_total, cand_states
```
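Setting the additive errors aside, the close-state rule replaces a state's probability with 85 in about 20% of simulated elections (`correl_error <= .2`), with 0 in about 15% (`correl_error >= .85`), and leaves it unchanged otherwise. Because the same rule runs separately for each candidate, it pulls the average probability of every close state toward a fixed point of about 48.6% for both of them. The hypothetical helper below does only that weighted-average arithmetic.

```python
def close_state_mean(base, high=85.0, low=0.0, p_high=0.20, p_low=0.15):
    """Hypothetical helper: average probability (%) of a close state under the override rule.

    base   -- the state's published probability for the candidate, in percent
    p_high -- share of simulations with correl_error <= .2 (probability set to `high`)
    p_low  -- share of simulations with correl_error >= .85 (probability set to `low`)
    Additive national and swing-state errors are ignored.
    """
    return p_high * high + p_low * low + (1 - p_high - p_low) * base


# Fixed point: the base probability the rule leaves unchanged on average (low is 0 here)
fixed_point = 0.20 * 85.0 / (0.20 + 0.15)
print(round(fixed_point, 2))                                               # 48.57
print(round(close_state_mean(60.0), 2), round(close_state_mean(40.0), 2))  # 56.0 43.0
```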

### For this final run, we increase the number of simulated elections from 20,000 to 100,000
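More simulations shrink the Monte Carlo noise in an estimated win probability. A standard way to gauge that noise is the binomial standard error, `sqrt(p * (1 - p) / n)`. The hypothetical helper below reports it in percentage points. For a win probability near 10%, it is about 0.21 points with 20,000 simulations and under 0.1 points with 100,000.

```python
import math


def mc_standard_error(win_prob_pct, sims):
    """Binomial standard error of a simulated win probability, in percentage points."""
    p = win_prob_pct / 100
    return math.sqrt(p * (1 - p) / sims) * 100


for sims in (20_000, 100_000):
    print(sims, round(mc_standard_error(10.0, sims), 3))
# 20000 0.212
# 100000 0.095
```

By this yardstick, the Trump win probabilities of 10.225 and 10.63 from the two 20,000-run simulations differ by about 0.4 points. That gap is within what sampling noise alone could plausibly produce.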

```python
print("Recreate 538 Probabilities and Electoral College Results")
print()
sims = 100000
ec = list(df.ec.values)
states = list(df.state.values)
cand_1 = list(df['trump_538'].values)
cand_1_wins, cand_1_ec_totals, cand_1_states = electoral_college(ec, cand_1, states, sims=sims, polling_error=True)
print('Trump Average EC:', np.average(cand_1_ec_totals))
print('Trump Win Prob:', (cand_1_wins/sims)*100)
print()
cand_2 = list(df['clinton_538'].values)
cand_2_wins, cand_2_ec_totals, cand_2_states = electoral_college(ec, cand_2, states, sims=sims, polling_error=True)
print('Clinton Average EC:', np.average(cand_2_ec_totals))
print('Clinton Win Prob:', (cand_2_wins/sims)*100)
```

```
Recreate 538 Probabilities and Electoral College Results

Trump Average EC: 236.11207
Trump Win Prob: 28.343

Clinton Average EC: 300.85173
Clinton Win Prob: 72.45899999999999
```


### We have nearly replicated the probabilities and Electoral College results

![Title](https://raw.githubusercontent.com/ahoaglandnu/election/master/538_prob.png "Title")
![EC Map](https://raw.githubusercontent.com/ahoaglandnu/election/master/538_ec.png "EC Map")

# Conclusions

It appears that FiveThirtyEight produced its election forecast probability with a simulation, NOT directly from its own published state-by-state probabilities.

By running simulations that add uncertainty to the probabilities of the swing states and the toss-up states, we can approximately replicate FiveThirtyEight's presidential election forecast.

Through this process, we did not find anything that would have predicted the specific state polling errors in Wisconsin, Michigan, and Pennsylvania.