# US PRESIDENTIAL ELECTIONS

## 1. Introduction
In this project we retrieve US population data from the US Census Bureau through its API and merge it with a dataset on presidential election results from MIT Election Data. We also attempt a fixed effects estimation, but have not yet succeeded in incorporating its results into the visualizations. Instead, we visualize the election results in several ways to show how different visualizations can cast the same results in a different light.

#### 1.1 Importing and setting magics

In [1]:
# Importing packages
import pandas as pd
import numpy as np
import matplotlib.pyplot as plt
import plotly.express as px
import requests
import warnings
from numpy import linalg as la
import plotly.graph_objects as go
import ipywidgets as widgets
from IPython.display import display, clear_output
import LinearModelsWeek2_post as lm # This py-file is from the course Advanced Microeconometrics and is not our own code!

# autoreload modules when code is run
%load_ext autoreload
%autoreload 2

## 2. Read, clean and merge data

#### 2.1 US Census Data 
We import population data from the US Census Bureau's American Community Survey (ACS) through the Census API.
1. We sign up to get access to the data and store the API key in a variable.
2. We make a list of the years we want population data for and define the variables we want to import.
3. For each variable, we loop over the years, request the data for each year and combine the years into a single dataframe.
4. We merge the dataframes for the individual variables, so that we have one dataset with all the population categories and years we need.
5. We relabel the 2009 data as 2008 so that it can be matched with the 2008 election, and remove Puerto Rico.
6. We change the state names to abbreviations, which makes the later visualization easier.

In [2]:
# Census code
census_api_key = '9028fd3fd9edc961350156ded10028cc74c92576' # This is our personal API key; it should also work on your computer

# list of years for Census API
year_list = ['2009', '2012','2016','2020']

# The variables of interest
variables = {
    'total_population': 'B01001_001E',
    'white_population': 'B01001H_001E',
    'black_population': 'B01001B_001E',
    'hispanic_population': 'B01001I_001E'
}

# Making a loop in order to get datasets for the variables
dfs = {}
for var_name, var_code in variables.items():
    df_list = []
    for year in year_list:
        url = f'https://api.census.gov/data/{year}/acs/acs5?get=NAME,{var_code}&for=state:*&key={census_api_key}'
        response = requests.get(url)
        response.raise_for_status()  # stop early if the request failed
        rows = response.json()  # the first row holds the column names
        _df = pd.DataFrame(rows[1:], columns=rows[0])
        _df['year'] = year
        df_list.append(_df)
    df = pd.concat(df_list)
    df[var_name] = df[var_code].astype(int)
    df.drop(columns=[var_code], inplace=True)
    dfs[var_name] = df

# merging the population datasets into one big dataset
merge_df = dfs['total_population']
for var_name in ['white_population', 'black_population', 'hispanic_population']:
    merge_df = pd.merge(merge_df, dfs[var_name], how='left', on=['state', 'year', 'NAME'])
merge_df['NAME'] = merge_df['NAME'].str.upper()
merge_df['year'] = merge_df['year'].replace(['2009'], '2008')
merge_df['year'] = merge_df['year'].astype(int)
print('Num of rows:', len(merge_df))

# Removing Puerto Rico as they don't have voting rights in the presidential election
merge_df = merge_df[( merge_df["NAME"] != 'PUERTO RICO')] 

# creating a dictionary to change the full state names to state codes
state_codes = {'ALABAMA': 'AL', 'ALASKA': 'AK', 'ARIZONA': 'AZ', 'ARKANSAS': 'AR', 'CALIFORNIA': 'CA',
               'COLORADO': 'CO', 'CONNECTICUT': 'CT', 'DELAWARE': 'DE', 'DISTRICT OF COLUMBIA': 'DC', 'FLORIDA': 'FL', 'GEORGIA': 'GA',
               'HAWAII': 'HI', 'IDAHO': 'ID', 'ILLINOIS': 'IL', 'INDIANA': 'IN', 'IOWA': 'IA', 'KANSAS': 'KS',
               'KENTUCKY': 'KY', 'LOUISIANA': 'LA', 'MAINE': 'ME', 'MARYLAND': 'MD', 'MASSACHUSETTS': 'MA',
               'MICHIGAN': 'MI', 'MINNESOTA': 'MN', 'MISSISSIPPI': 'MS', 'MISSOURI': 'MO', 'MONTANA': 'MT',
               'NEBRASKA': 'NE', 'NEVADA': 'NV', 'NEW HAMPSHIRE': 'NH', 'NEW JERSEY': 'NJ', 'NEW MEXICO': 'NM',
               'NEW YORK': 'NY', 'NORTH CAROLINA': 'NC', 'NORTH DAKOTA': 'ND', 'OHIO': 'OH', 'OKLAHOMA': 'OK',
               'OREGON': 'OR', 'PENNSYLVANIA': 'PA', 'RHODE ISLAND': 'RI', 'SOUTH CAROLINA': 'SC',
               'SOUTH DAKOTA': 'SD', 'TENNESSEE': 'TN', 'TEXAS': 'TX', 'UTAH': 'UT', 'VERMONT': 'VT',
               'VIRGINIA': 'VA', 'WASHINGTON': 'WA', 'WEST VIRGINIA': 'WV', 'WISCONSIN': 'WI', 'WYOMING': 'WY'}

# replace full state names with state codes
merge_df['NAME'] = merge_df['NAME'].replace(state_codes)
merge_df.head()


Num of rows: 208


Unnamed: 0,NAME,state,year,total_population,white_population,black_population,hispanic_population
0,AL,1,2008,4633360,3174011,1209938,130220
1,AK,2,2008,683142,448329,25161,39661
2,AZ,4,2008,6324865,3700053,227282,1881878
3,AR,5,2008,2838143,2149766,439355,153630
4,CA,6,2008,36308527,15446196,2249404,13102161
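A quick check can confirm that every name was actually converted. `Series.replace` leaves values that are missing from the dictionary untouched, so a name that fails to match would pass through silently. The following is a minimal sketch on toy data; `toy_codes` and `find_unmapped` are hypothetical names used only for illustration.

```python
import pandas as pd

# Small illustrative mapping; the notebook's full dictionary is called state_codes.
toy_codes = {"ALABAMA": "AL", "ALASKA": "AK", "DISTRICT OF COLUMBIA": "DC"}

def find_unmapped(names: pd.Series, mapping: dict) -> list:
    """Return the unique names that have no entry in the mapping."""
    return sorted(set(names) - set(mapping))

names = pd.Series(["ALABAMA", "ALASKA", "DISTRICT OF COLUMBIA", "GUAM"])
missing = find_unmapped(names, toy_codes)
print(missing)  # ['GUAM']
```

Running such a check before the `replace` call would list any territory or misspelled state that still needs handling.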


#### 2.2 MIT Election Data
The second dataset is from MIT Election Data and contains the number of votes for each party in every state for the presidential elections from 1976 to 2020.
1. We downloaded the csv file from the webpage and import the dataset.
2. We remove the variables we do not need and rename some of the others, both to make them more intuitive and to match the population dataset. We also remove the parties 'OTHER' and 'LIBERTARIAN', as the winner-takes-all system makes them insignificant.
3. The population dataset only has data from 2005 onwards, so we remove the years we do not need.
4. We remove Puerto Rico, as it does not have voting rights in the presidential elections.
5. We then convert the state names to state codes with the dictionary of state codes, so that we can visualize the results later.
6. Lastly, we create a pivot table, going from long to wide, so that each state has one observation per year instead of two. In this process we also remove the party column and add the party to the name of the candidatevotes columns, so we know which party received the votes.
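The long-to-wide step can be illustrated on toy data. This is a sketch with made-up state codes and vote counts, not the notebook's exact code. It passes `aggfunc="sum"` as an assumption about what is wanted when a state, year and party appear more than once, since the default of `pivot_table` is to average such rows.

```python
import pandas as pd

# Toy long-format data: one row per state, year and party (made-up vote counts).
long_df = pd.DataFrame({
    "NAME": ["AA", "AA", "BB", "BB"],
    "year": [2020, 2020, 2020, 2020],
    "party": ["democrat", "republican", "democrat", "republican"],
    "candidatevotes": [120, 80, 50, 150],
})

# aggfunc="sum" adds up duplicate (state, year, party) rows instead of averaging them.
wide_df = (
    long_df.pivot_table(index=["NAME", "year"], columns="party",
                        values="candidatevotes", aggfunc="sum")
    .rename(columns={"democrat": "candidatevotes_dem",
                     "republican": "candidatevotes_rep"})
    .reset_index()
)
wide_df.columns.name = None  # drop the leftover "party" label on the column index
print(wide_df)
```

Each state and year now occupies a single row with one vote column per party.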

In [3]:
# Loading the data 
pres = pd.read_csv('1976-2020-president.csv')

# renaming the columns 
pres.rename(columns={'state':'NAME'}, inplace=True)


# deleting the years we do not need, as the American Community Survey data only starts in 2005
pres = pres[( pres["year"] >= 2005 )] 

#deleting variables we do not need
del pres['notes']
del pres['state_cen']
del pres['office']
del pres['writein']
del pres['version']
del pres['state_fips']
del pres['party_detailed']
del pres['candidate']
del pres['state_ic']
del pres['state_po']

# keeping only Democratic and Republican votes, justified by the winner-takes-all system
pres = pres[( pres["party_simplified"] != 'OTHER')] 
pres = pres[( pres["party_simplified"] != 'LIBERTARIAN')] 

pres.rename(columns={'party_simplified':'party'}, inplace=True)
pres.reset_index(drop = True, inplace = True)

# Removing Puerto Rico
pres = pres[( pres["NAME"] != 'PUERTO RICO')] 
pres = pres.drop(index = 248)

# the state_codes dictionary defined in section 2.1 maps full state names to state codes and is reused here

# replace full state names with state codes
pres['NAME'] = pres['NAME'].replace(state_codes)

# convert the party names to lowercase
pres['party'] = pres['party'].str.lower()

# pivot from long to wide: state and year as index, one candidatevotes column per party
pivot = pres.pivot_table(index=['NAME', 'year'], columns='party', values='candidatevotes')

# rename columns
pivot = pivot.rename(columns={'democrat': 'candidatevotes_dem', 'republican': 'candidatevotes_rep'})

# reset index
pres = pivot.reset_index()



#### 2.3 Merge
We merge the two cleaned datasets into one dataset. 

##### 2.3.1 Presidential and population data

In [4]:
# Merging president and population data sets
data = pd.merge(merge_df, pres, how='left', on=[ 'year', 'NAME' ])
data_sorted = data.sort_values(by='NAME')
np.array(data_sorted.head())

array([['AK', '02', 2008, 683142, 448329, 25161, 39661, 123594.0,
        193841.0],
       ['AK', '02', 2012, 711139, 454689, 24219, 40371, 122640.0,
        164676.0],
       ['AK', '02', 2020, 736990, 439979, 23894, 53059, 153778.0,
        189951.0],
       ['AK', '02', 2016, 736855, 456575, 24443, 49031, 116454.0,
        163387.0],
       ['AL', '01', 2008, 4633360, 3174011, 1209938, 130220, 813479.0,
        1266546.0]], dtype=object)
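A left merge keeps every population row even when no election row matches, so missing matches only show up as NaN values. As a hedged sketch on toy data (the frames and numbers below are made up), the `validate` and `indicator` arguments of `pd.merge` can make such problems visible:

```python
import pandas as pd

# Toy stand-ins for the population and election tables (made-up numbers).
population = pd.DataFrame({"NAME": ["AA", "BB", "CC"], "year": [2020, 2020, 2020],
                           "total_population": [1000, 2000, 3000]})
votes = pd.DataFrame({"NAME": ["AA", "BB"], "year": [2020, 2020],
                      "candidatevotes_dem": [300, 400]})

# validate raises if either side has duplicate keys; indicator adds a "_merge" column.
merged = pd.merge(population, votes, how="left", on=["year", "NAME"],
                  validate="one_to_one", indicator=True)

# Rows found only in the left table have no election result attached.
unmatched = merged.loc[merged["_merge"] == "left_only", "NAME"].tolist()
print(unmatched)  # ['CC']
```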

##### 2.3.2 Electoral college 
We also want to add the number of electoral college votes to the dataset. 
1. First, we create a dictionary with the number of electoral votes in every state.
2. Then we turn it into a dataframe.
3. Lastly, we merge the electoral college dataframe with the dataset containing population and election results to get our final dataset.
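When a table like this is typed in by hand, parallel lists can drift out of alignment without raising any error. One way to guard against that, sketched here under assumptions, is to keep each state code together with its vote count and to check the total for every year. The three-state data and the helper `build_rows` are hypothetical; for the real table the expected total would be 538.

```python
# Hypothetical three-state example; the vote counts are illustrative, not real.
toy_votes = {
    "2016": [("AA", 5), ("BB", 3), ("CC", 4)],
    "2020": [("AA", 6), ("BB", 3), ("CC", 3)],
}

def build_rows(votes_by_year, expected_total):
    """Flatten {year: [(code, votes), ...]} into rows, checking each year's total."""
    rows = []
    for year, pairs in votes_by_year.items():
        total = sum(v for _, v in pairs)
        if total != expected_total:
            raise ValueError(f"{year}: votes sum to {total}, expected {expected_total}")
        rows.extend({"NAME": code, "year": int(year), "electoralvotes": v}
                    for code, v in pairs)
    return rows

rows = build_rows(toy_votes, expected_total=12)
print(len(rows), rows[0])
```

The resulting list of dictionaries can be passed directly to `pd.DataFrame`, which gives the same long format as melting a wide table.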

In [5]:
# create dictionary of electoral votes by state and year
# all lists follow the same state order (DC comes right after Delaware)
# 2008 uses the apportionment from the 2000 census, 2012-2020 the one from the 2010 census
electoral_votes = {
    'State': ['Alabama', 'Alaska', 'Arizona', 'Arkansas', 'California', 'Colorado', 'Connecticut', 'Delaware', 'District of Columbia', 'Florida', 'Georgia', 'Hawaii', 'Idaho', 'Illinois', 'Indiana', 'Iowa', 'Kansas', 'Kentucky', 'Louisiana', 'Maine', 'Maryland', 'Massachusetts', 'Michigan', 'Minnesota', 'Mississippi', 'Missouri', 'Montana', 'Nebraska', 'Nevada', 'New Hampshire', 'New Jersey', 'New Mexico', 'New York', 'North Carolina', 'North Dakota', 'Ohio', 'Oklahoma', 'Oregon', 'Pennsylvania', 'Rhode Island', 'South Carolina', 'South Dakota', 'Tennessee', 'Texas', 'Utah', 'Vermont', 'Virginia', 'Washington', 'West Virginia', 'Wisconsin', 'Wyoming'],
    'NAME': ['AL', 'AK', 'AZ', 'AR', 'CA', 'CO', 'CT', 'DE', 'DC', 'FL', 'GA', 'HI', 'ID', 'IL', 'IN', 'IA', 'KS', 'KY', 'LA', 'ME', 'MD', 'MA', 'MI', 'MN', 'MS', 'MO', 'MT', 'NE', 'NV', 'NH', 'NJ', 'NM', 'NY', 'NC', 'ND', 'OH', 'OK', 'OR', 'PA', 'RI', 'SC', 'SD', 'TN', 'TX', 'UT', 'VT', 'VA', 'WA', 'WV', 'WI', 'WY'],
    '2008': [9, 3, 10, 6, 55, 9, 7, 3, 3, 27, 15, 4, 4, 21, 11, 7, 6, 8, 9, 4, 10, 12, 17, 10, 6, 11, 3, 5, 5, 4, 15, 5, 31, 15, 3, 20, 7, 7, 21, 4, 8, 3, 11, 34, 5, 3, 13, 11, 5, 10, 3],
    '2012': [9, 3, 11, 6, 55, 9, 7, 3, 3, 29, 16, 4, 4, 20, 11, 6, 6, 8, 8, 4, 10, 11, 16, 10, 6, 10, 3, 5, 6, 4, 14, 5, 29, 15, 3, 18, 7, 7, 20, 4, 9, 3, 11, 38, 6, 3, 13, 12, 5, 10, 3],
    '2016': [9, 3, 11, 6, 55, 9, 7, 3, 3, 29, 16, 4, 4, 20, 11, 6, 6, 8, 8, 4, 10, 11, 16, 10, 6, 10, 3, 5, 6, 4, 14, 5, 29, 15, 3, 18, 7, 7, 20, 4, 9, 3, 11, 38, 6, 3, 13, 12, 5, 10, 3],
    '2020': [9, 3, 11, 6, 55, 9, 7, 3, 3, 29, 16, 4, 4, 20, 11, 6, 6, 8, 8, 4, 10, 11, 16, 10, 6, 10, 3, 5, 6, 4, 14, 5, 29, 15, 3, 18, 7, 7, 20, 4, 9, 3, 11, 38, 6, 3, 13, 12, 5, 10, 3]
}

# create DataFrame from dictionary
ECV = pd.DataFrame(electoral_votes)
ECV = pd.melt(ECV, id_vars='NAME', value_vars=['2008', '2012', '2016', '2020'])
ECV = ECV.rename(columns={'variable': 'year'})
ECV = ECV.rename(columns={'value': 'electoralvotes'})
ECV["year"] = ECV["year"].astype(np.int64)

# Merging data with electoral votes
data = pd.merge(data, ECV, how='left', on=[ 'year', 'NAME' ])
data_sorted = data.sort_values(by='NAME')
np.array(data_sorted.head())

array([['AK', '02', 2008, 683142, 448329, 25161, 39661, 123594.0,
        193841.0, 3],
       ['AK', '02', 2012, 711139, 454689, 24219, 40371, 122640.0,
        164676.0, 3],
       ['AK', '02', 2020, 736990, 439979, 23894, 53059, 153778.0,
        189951.0, 3],
       ['AK', '02', 2016, 736855, 456575, 24443, 49031, 116454.0,
        163387.0, 3],
       ['AL', '01', 2008, 4633360, 3174011, 1209938, 130220, 813479.0,
        1266546.0, 9]], dtype=object)

We also add a new dummy variable to our dataset. It takes the value 0 if the Democrats received the most votes in a state, and 1 if the Republicans did.

In [6]:
# Make winner dummy variable (0 = Democrat, 1 = Republican)
data['winner'] = np.where(data['candidatevotes_dem'] > data['candidatevotes_rep'], 0, 1)
data

Unnamed: 0,NAME,state,year,total_population,white_population,black_population,hispanic_population,candidatevotes_dem,candidatevotes_rep,electoralvotes,winner
0,AL,01,2008,4633360,3174011,1209938,130220,813479.0,1266546.0,9,1
1,AK,02,2008,683142,448329,25161,39661,123594.0,193841.0,3,1
2,AZ,04,2008,6324865,3700053,227282,1881878,1034707.0,1230111.0,10,1
3,AR,05,2008,2838143,2149766,439355,153630,422310.0,638017.0,6,1
4,CA,06,2008,36308527,15446196,2249404,13102161,8274473.0,5011781.0,55,0
...,...,...,...,...,...,...,...,...,...,...,...
199,NV,32,2020,3030281,1460159,282722,875798,703486.0,669890.0,6,0
200,DE,10,2020,967679,595236,212795,91350,296268.0,200603.0,3,0
201,KY,21,2020,4461952,3751738,361230,167949,772474.0,1326646.0,8,1
202,SD,46,2020,879336,715328,18836,36088,150471.0,261043.0,3,1


## 3. Fixed Effect Estimation

Originally, we planned to run a regression estimating the elasticity of each party's vote share with respect to the size of different racial groups in the US. To check the robustness of the estimates, we could then combine the population data with the elasticities to create a hypothetical election result and compare it with the actual electoral votes in the presidential election. If the estimates were reliable, it would be possible to add a slider to the map shown later that changes the size of the population groups. For instance, the reader could increase or decrease the black population by one percent and see what would happen to the electoral college. This is of economic interest because it gives insight into how election results might change with changing demographics. Although the results are significant, we did not have time to incorporate them into our maps. We may be able to do so for the exam hand-in.
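The hypothetical election result described above could be sketched as follows. This is only an illustration under strong assumptions: a single constant elasticity `beta`, made-up vote shares and electoral votes, and a state counted as Democratic whenever the predicted two-party share exceeds one half. The function name `counterfactual_dem_votes` is hypothetical.

```python
import numpy as np

def counterfactual_dem_votes(share_dem, electoral_votes, beta, growth):
    """Electoral votes won by the Democrats after scaling one group's population.

    With a constant elasticity beta, multiplying the group's population by
    (1 + growth) multiplies the predicted vote share by (1 + growth) ** beta.
    """
    share_dem = np.asarray(share_dem, dtype=float)
    electoral_votes = np.asarray(electoral_votes)
    new_share = np.clip(share_dem * (1.0 + growth) ** beta, 0.0, 1.0)
    # A state goes Democratic when the two-party share exceeds one half.
    return int(electoral_votes[new_share > 0.5].sum())

# Made-up two-party Democratic shares and electoral votes for three states.
shares = [0.499, 0.52, 0.40]
votes = [10, 5, 7]
baseline = counterfactual_dem_votes(shares, votes, beta=0.05, growth=0.0)
shifted = counterfactual_dem_votes(shares, votes, beta=0.05, growth=0.10)
print(baseline, shifted)  # 5 15
```

In this toy case a ten percent increase in the group tips the closest state, which is the kind of effect a slider on the map would expose.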

#### 3.1 The Model
$$
\begin{aligned}
\log(D_{s,e}) &= \sum_r \beta^r \log(P^r_{s,e}) + \alpha_e + d_s + \varepsilon_{s,e} \\
\log(R_{s,e}) &= \sum_r \beta^r \log(P^r_{s,e}) + \alpha_e + d_s + \varepsilon_{s,e}
\end{aligned}
$$

We run one regression for the Democratic and one for the Republican presidential candidate, where $D_{s,e}$ and $R_{s,e}$ are the parties' shares of the two-party vote in state $s$ in election year $e$, and $P^r_{s,e}$ is the population of racial group $r$ in that state and year. The two equations are estimated separately, so the coefficients, fixed effects and error terms differ between them even though we use the same symbols. $\varepsilon_{s,e}$ is a random error with zero mean, $\alpha_e$ captures the election year fixed effect (FE) and $d_s$ captures the state-specific FE. We are interested in the heterogeneous responses by racial group, indexed by $r$. Specifically, $\beta^r$ captures the elasticity of the party's vote share with respect to the size of racial group $r$.
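One common way to estimate a model with state and year fixed effects is to include a dummy for every state and every year but one. The sketch below does this with plain NumPy on simulated data; it is not the course code in `LinearModelsWeek2_post`, and all numbers (`n_states`, `true_beta`, the noise level) are assumptions chosen for illustration.

```python
import numpy as np

rng = np.random.default_rng(0)
n_states, n_years, true_beta = 20, 4, 0.3

# Simulated panel: every state is observed in every election year.
state = np.repeat(np.arange(n_states), n_years)
year = np.tile(np.arange(n_years), n_states)
state_effect = rng.normal(size=n_states)[state]
year_effect = rng.normal(size=n_years)[year]
# The regressor is correlated with the state effect, so pooled OLS would be biased.
log_pop = state_effect + rng.normal(size=state.size)
log_share = (true_beta * log_pop + state_effect + year_effect
             + 0.01 * rng.normal(size=state.size))

# One dummy per state and per year, dropping the first year to avoid collinearity.
state_dummies = (state[:, None] == np.arange(n_states)).astype(float)
year_dummies = (year[:, None] == np.arange(1, n_years)).astype(float)
X = np.column_stack([log_pop, state_dummies, year_dummies])

# Least squares on the regressor plus dummies gives the fixed effects estimate.
coef, *_ = np.linalg.lstsq(X, log_share, rcond=None)
beta_hat = coef[0]
print(round(beta_hat, 2))
```

The first coefficient is the slope on the regressor; the remaining coefficients are the estimated state and year effects.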

#### 3.2 Analysis of results
The results are presented in the tables below. Although they address an interesting topic, they are very likely biased. Note that the code below runs a pooled OLS in which the state code and the year enter as numeric regressors rather than as sets of dummies, so the fixed effects in the model above are not actually estimated. The risk of omitted variable bias is also large, so the estimates would benefit from including more variables. To address this, we could for instance use an instrumental variable, but that is outside the scope of this assignment. We therefore do not use the estimates in the rest of this assignment. However, as described above, it could be interesting to see how well the estimated relationship between the size of each racial group and each party's vote share can predict an election result, and how an increase or decrease in a group's share of the population would change the electoral college. This is interesting because, in the case of the US, growing racial minorities will likely change the electoral college in the years to come.

In [7]:
# Changing the dataframe into an array for the regression
reg_data = np.array(data)
id_array = np.array(reg_data[:, 0])
# Count how many observations we have. This returns a tuple with the unique state codes,
# and the number of times each state is observed.
unique_id = np.unique(id_array, return_counts=True)
N = unique_id[0].size
T = int(unique_id[1].mean())
# np.float was deprecated in NumPy 1.20 and later removed; the built-in float gives the same dtype
white = np.log(np.array(reg_data[:, 4], dtype=float))
black = np.log(np.array(reg_data[:, 5], dtype=float))
hispanic = np.log(np.array(reg_data[:, 6], dtype=float))
state = np.array(reg_data[:,1], dtype=float)
year = np.array(reg_data[:,2], dtype=float)

# Defining variables: log of each party's share of the two-party vote
y_dem = np.log(np.array((reg_data[:,7]/(reg_data[:,7] + reg_data[:,8])), dtype=float)).reshape(-1, 1)
y_rep = np.log(np.array((reg_data[:,8]/(reg_data[:,8] + reg_data[:,7])), dtype=float)).reshape(-1, 1)
x = np.array([np.ones((y_dem.shape[0])), state, year, white, black, hispanic]).T

# Defining labels
label_y_dem = 'Log D'
label_y_rep = 'Log R'
label_x = ['Constant', 'State', 'Year','White', 'Black', 'Hispanic']

# Running pooled OLS for the Democrats
ols_result_dem = lm.estimate(y_dem, x,N=N,T=T)
lm.print_table(
    (label_y_dem, label_x), ols_result_dem, title="Pooled OLS for the Democrats", floatfmt='.4f'
)

# Running pooled OLS for the Republicans
ols_result_rep = lm.estimate(y_rep, x,N=N,T=T)
lm.print_table(
    (label_y_rep, label_x), ols_result_rep, title="Pooled OLS for the Republicans", floatfmt='.4f'
)

Pooled OLS for the Democrats
Dependent variable: Log D

             Beta      Se    t-values
--------  -------  ------  ----------
Constant  16.3761  7.2343      2.2637
State     -0.0003  0.0011     -0.3072
Year      -0.0082  0.0036     -2.2751
White     -0.1376  0.0303     -4.5345
Black      0.0480  0.0144      3.3236
Hispanic   0.0606  0.0169      3.5829
R² = 0.167
σ² = 0.051
Pooled OLS for the Republicans
Dependent variable: Log R

              Beta       Se    t-values
--------  --------  -------  ----------
Constant  -20.1226  10.1154     -1.9893
State      -0.0002   0.0015     -0.1023
Year        0.0083   0.0050      1.6518
White       0.3529   0.0424      8.3176
Black      -0.1188   0.0202     -5.8902
Hispanic   -0.0803   0.0237     -3.3959
R² = 0.290
σ² = 0.100
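As a worked example of how a log-log coefficient reads as an elasticity, take the Black coefficient of 0.0480 from the Democratic table. Holding the other regressors fixed, a one percent larger black population $P$ is associated with a Democratic share $D$ that changes by the factor

$$
\frac{D'}{D} = \left(\frac{P'}{P}\right)^{\beta} = 1.01^{0.0480} = e^{0.0480 \cdot \ln 1.01} \approx e^{0.000478} \approx 1.00048
$$

so a share of 0.500 would move to roughly 0.5002. This is only an association in the pooled data and should not be read as a causal effect.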



## 4. Visualization
Different ways to visualize can have a "HUUUGE" impact on how the reader interprets the election results. In this section we display different visualizations of the election results from 2008 to 2020. 

For the states that allocate electoral votes by district, we have chosen to include only the biggest district and thus force them into the winner-takes-all system at the state level. We did this for simplicity. As a result, there could be minor errors in the number of electoral college votes given to a candidate in Maine and Nebraska (the only states that use this system).

#### 4.1 State map for the elections 2008-2020
1. First we define the colors for the states to take, red for republican and blue for democrat. 
2. Then we define the output area of the map. 
3. We create a function that will update the map for each year as we want to add a dropdown menu. 
4. Then we code the figure, a choropleth map, and add the text to be displayed when navigating the map and more. 
5. Then we create a drop down menu in order for the reader to choose which year to be displayed. 

In [8]:
# Define the colors for each value
colors = ["#0000ff", "#ff0000"]

# Define the output area for the map
out = widgets.Output()
display(out)

# Define the function to update the map
def update_map(year):
    # Get the data for the selected year
    group = data[data['year'] == year]
    
    state_codes = group['NAME']
    values = [1 if rep_votes > dem_votes else 0 for rep_votes, dem_votes in zip(group['candidatevotes_rep'], group['candidatevotes_dem'])]

    fig = go.Figure(go.Choropleth(
    locationmode="USA-states",
    locations=state_codes,
    z=values,
    text=[f"{state}<br>" +
          f"Democrat votes: {dem_votes:,}<br>" +
          f"Republican votes: {rep_votes:,}<br>" +
          f"Electoral College Votes: {elec_votes:,}<br>" +
          f"Population: {population:,}<br>" +
          f"Winner: {'Republican' if rep_votes > dem_votes else 'Democrat'}"
          for state, dem_votes, rep_votes, population, elec_votes in zip(group['NAME'], group['candidatevotes_dem'], group['candidatevotes_rep'], group['total_population'], group['electoralvotes'])],
    hovertemplate="%{text}<extra></extra>",
    colorscale=[[i / (len(colors) - 1), c] for i, c in enumerate(colors)],
    showscale=False
    ))
    
    fig.add_trace(go.Scattergeo(
        locationmode="USA-states",
        lon=[-96.8],
        lat=[32.8],
        mode="lines",
        line=dict(width=1, color="black")
    ))

    
    fig.update_geos(
        visible=False, resolution=110, scope="usa",
        showcountries=True, countrycolor="Black",
        showsubunits=True, subunitcolor="Black"
    )
    
    #Adding the title 
    fig.update_layout(title=f"Electoral College Results {year}")
    
    with out:
        clear_output(wait=True)
        display(fig)  # Display the figure

# Create the dropdown menu
year_options = [2008, 2012, 2016, 2020]
dropdown = widgets.Dropdown(options=year_options, description='Select a year:')

# Define the callback function for the dropdown menu
def dropdown_callback(change):
    year = change.new
    update_map(year)

dropdown.observe(dropdown_callback, names='value')

# Display the initial map
update_map(year_options[0])

# Display the dropdown menu
display(dropdown)

Output()

Dropdown(description='Select a year:', options=(2008, 2012, 2016, 2020), value=2008)

In this map the election results for 2008, 2012, 2016 and 2020 are visualized. You can choose which year to display from the drop-down menu. If you hold the mouse pointer over a state, it displays the state abbreviation, the number of votes for each party, the number of electoral college votes, the population of the state in the given year and which party won the state.

This visualization of the presidential election is quite standard, and it offers an intuitive way of interpreting the results because each state is colored in the color of the winning party. The downside is that it can look as if the Republicans should have won in some years because so many states are colored red. However, many of these states have few electoral college votes because their populations are small.
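The gap between the number of states won and the number of electoral votes won can be made explicit with a small tally. The sketch below uses made-up data for a single year with the same column names as our dataset (`winner`, `electoralvotes`); the numbers are illustrative only.

```python
import pandas as pd

# Made-up results for one election year: winner is 1 for Republican, 0 for Democrat.
toy = pd.DataFrame({
    "NAME": ["AA", "BB", "CC", "DD"],
    "winner": [1, 1, 1, 0],
    "electoralvotes": [3, 3, 4, 20],
})

# Count the states and add up the electoral votes won by each party.
summary = toy.groupby("winner").agg(states_won=("NAME", "count"),
                                    electoral_votes=("electoralvotes", "sum"))
summary.index = summary.index.map({0: "Democrat", 1: "Republican"})
print(summary)
```

Here the Republicans win three of four states, and most of the map area, but only 10 of the 30 electoral votes.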

#### 4.2 Continuous color state map
Here we visualize the share of Republican votes to show that some states are swing states. The picture is not as purely red and blue as the previous map suggested.

In [9]:
# Make a new variable for the share of republican votes
data['share_rep'] = data['candidatevotes_rep']/(data['candidatevotes_rep'] + data['candidatevotes_dem'])


We repeat the same steps as before, but now we visualize the new variable **share_rep** and use a continuous colorscale instead of a discrete one.

In [10]:
# Define the output area for the map
out = widgets.Output()
display(out)

# Define the function to update the map
def update_map(year):
    # Get the data for the selected year
    group = data[data['year'] == year]

    state_codes = group['NAME']
    values = group['share_rep'].tolist()

    fig = go.Figure(go.Choropleth(
        locationmode="USA-states",
        locations=state_codes,
        z=values,
        text=[f"{state}<br>" +
              f"Democrat votes: {dem_votes:,}<br>" +
              f"Republican votes: {rep_votes:,}<br>" +
              f"Share republican: {share_rep:.1%}<br>" +
              f"Electoral College Votes: {elec_votes:,}<br>" +
              f"Population: {population:,}<br>" +
              f"Winner: {'Republican' if rep_votes > dem_votes else 'Democrat'}"
              for state, dem_votes, rep_votes, share_rep, population, elec_votes in zip(group['NAME'], group['candidatevotes_dem'], group['candidatevotes_rep'], group['share_rep'], group['total_population'], group['electoralvotes'])],
        hovertemplate="%{text}<extra></extra>",
        colorscale=[
            [0, '#0000ff'],
            [0.5, '#0000ff'],
            [0.55, '#A020F0'],
            [1, '#ff0000'],
        ],
        showscale=True
    ))

    fig.add_trace(go.Scattergeo(
        locationmode="USA-states",
        lon=[-96.8],
        lat=[32.8],
        mode="lines",
        line=dict(width=1, color="black")
    ))

    fig.update_geos(
        visible=False, resolution=110, scope="usa",
        showcountries=True, countrycolor="Black",
        showsubunits=True, subunitcolor="Black"
    )

    # Adding the title
    fig.update_layout(title=f"Electoral College Results {year}")

    with out:
        clear_output(wait=True)
        display(fig)  # Display the figure

# Create the dropdown menu
year_options = [2008, 2012, 2016, 2020]
dropdown = widgets.Dropdown(options=year_options, description='Select a year:')

# Define the callback function for the dropdown menu
def dropdown_callback(change):
    year = change.new
    update_map(year)

dropdown.observe(dropdown_callback, names='value')

# Display the initial map
update_map(year_options[0])

# Display the dropdown menu
display(dropdown)



This visualization shows the Republican share of the votes in each state. If the share is low (lower than 0.4) the color is more blue, if the share is high the color is red, and if the share is close to 0.5 the color is more purple.
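One detail worth keeping in mind when reading the colors: Plotly places the colorscale stops on the normalised range of `z`, and unless `zmin` and `zmax` are set, that range is taken from the data. The sketch below shows the arithmetic with a hypothetical range of shares; `colorscale_position` is an illustrative helper, not a Plotly function.

```python
def colorscale_position(value, zmin, zmax):
    """Position of a z value on the 0-1 colorscale axis after normalisation."""
    return (value - zmin) / (zmax - zmin)

# Hypothetical range of Republican shares in one election year.
lowest_share, highest_share = 0.05, 0.70

# With the range taken from the data, a 50% share does not sit at position 0.5.
auto_position = colorscale_position(0.50, lowest_share, highest_share)
# Pinning the range to [0, 1] makes positions equal to shares.
pinned_position = colorscale_position(0.50, 0.0, 1.0)
print(round(auto_position, 3), pinned_position)  # 0.692 0.5
```

Passing `zmin=0` and `zmax=1` to `go.Choropleth` would make the stops correspond directly to vote shares, at the cost of a different color balance than the map produced here.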

The purple states indicate swing states. These states do not vote the same way in every election and are therefore often referred to as battleground states, since this is where the actual battle for the presidency takes place. In practice it is the swing states that decide the election, which makes votes there more "valuable": a Republican vote in California does little to change the final result, whereas in a swing state every vote can change the outcome of the whole election.

Visualizing the results like this shows that some states are more in doubt than others. It also gives a better representation of the popular vote, as you see not only the color of the winning party but also how many votes went to the party that lost the state.
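Swing states could also be flagged directly in the data rather than read off the map. The sketch below uses made-up shares and two assumed criteria: a result within five percentage points of 0.5, and a change of winner between elections. The threshold `margin` is an assumption, not a standard definition.

```python
import pandas as pd

# Made-up Republican vote shares for three states over two elections.
toy = pd.DataFrame({
    "NAME": ["AA", "AA", "BB", "BB", "CC", "CC"],
    "year": [2016, 2020, 2016, 2020, 2016, 2020],
    "share_rep": [0.51, 0.49, 0.65, 0.66, 0.47, 0.40],
})

margin = 0.05  # assumed threshold for a "close" result
toy["close"] = (toy["share_rep"] - 0.5).abs() < margin
toy["rep_win"] = toy["share_rep"] > 0.5

# A state is flagged if it was ever close or if its winner changed between elections.
per_state = toy.groupby("NAME").agg(ever_close=("close", "any"),
                                    flipped=("rep_win", "nunique"))
per_state["flipped"] = per_state["flipped"] > 1
print(per_state)
```

In this toy data AA is both close and flipped, CC was close once without flipping, and BB is neither.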

## 5. Conclusion
As we have shown in this data project, the way the data is visualized can affect the reader's perception of the election results.

Other visualizations we could implement for the exam version:
1. A map where each state is scaled in size relative to its total population.
2. A hexagon map where each state gets one hexagon per electoral vote, so that California with 55 electoral votes gets 55 hexagons.
3. A map including the regression results as described earlier.