## Python and R

This setup allows you to use *Python* and *R* in the same notebook.

To set up a similar notebook, see quickstart instructions here:

https://github.com/dmil/jupyter-quickstart

Some thoughts on why I like this setup and how I use it are at the [end](notebook.ipynb#Thoughts) of this notebook.

In [1]:
%load_ext rpy2.ipython
%load_ext autoreload
%autoreload 2

%matplotlib inline  
from matplotlib import rcParams
rcParams['figure.figsize'] = (16, 100)

import warnings
from rpy2.rinterface import RRuntimeWarning
warnings.filterwarnings("ignore")  # Ignore all warnings
# warnings.filterwarnings("ignore", category=RRuntimeWarning)  # Alternative: ignore only R runtime warnings

import pandas as pd
import numpy as np
import matplotlib.pyplot as plt
from IPython.display import display, HTML

This is a Python notebook, but below is an R cell. The `%%R` at the top of the cell indicates that the code in this cell will be R code.

In [2]:
%%R

# My commonly used R imports

require('tidyverse')

── Attaching core tidyverse packages ──────────────────────── tidyverse 2.0.0 ──
✔ dplyr     1.1.4     ✔ readr     2.1.5
✔ forcats   1.0.0     ✔ stringr   1.5.1
✔ ggplot2   3.5.1     ✔ tibble    3.2.1
✔ lubridate 1.9.4     ✔ tidyr     1.3.1
✔ purrr     1.0.2     
── Conflicts ────────────────────────────────────────── tidyverse_conflicts() ──
✖ dplyr::filter() masks stats::filter()
✖ dplyr::lag()    masks stats::lag()
ℹ Use the conflicted package (<http://conflicted.r-lib.org/>) to force all conflicts to become errors


Loading required package: tidyverse


In [3]:
%%R

install.packages(c(
  'tidyverse',  # includes ggplot2, dplyr, tidyr, etc.
  'sf',         # for spatial data handling
  'tigris'      # for county shapefiles
))




The downloaded binary packages are in
	/var/folders/mb/f5zh4qyd6sbf171lrlnmff040000gn/T//Rtmpuik4Ik/downloaded_packages


trying URL 'https://mirror.las.iastate.edu/CRAN/bin/macosx/big-sur-arm64/contrib/4.4/tidyverse_2.0.0.tgz'
Content type 'application/x-gzip' length 428901 bytes (418 KB)
downloaded 418 KB

trying URL 'https://mirror.las.iastate.edu/CRAN/bin/macosx/big-sur-arm64/contrib/4.4/sf_1.0-19.tgz'
Content type 'application/x-gzip' length 88837949 bytes (84.7 MB)
downloaded 84.7 MB

trying URL 'https://mirror.las.iastate.edu/CRAN/bin/macosx/big-sur-arm64/contrib/4.4/tigris_2.1.tgz'
Content type 'application/x-gzip' length 367350 bytes (358 KB)
downloaded 358 KB

In doTryCatch(return(expr), name, parentenv, handler) :
  unable to load shared object '/Library/Frameworks/R.framework/Resources/modules//R_X11.so':
  dlopen(/Library/Frameworks/R.framework/Resources/modules//R_X11.so, 0x0006): Library not loaded: /opt/X11/lib/libSM.6.dylib
  Referenced from: <34C5A480-1AC4-30DF-83C9-30A913FC042E> /Library/Frameworks/R.framework/Versions/4.4-arm64/Resources/modules/R_X11.so
  Reason: tried: '/opt/X11/lib/

## Read Data

In [15]:
import us 

def read_election_results_csv(filepath):
    # Read the year and type of election from the filepath, e.g. data/presidential/2016_0_0_2.csv
    file_year = filepath.split('/')[2].split('_')[0]
    election_type = filepath.split('/')[1].split('_')[0]

    # Load the election results
    # skip the second row because the column names are in the CSV twice
    results = pd.read_csv(filepath, skiprows=[1])


    # combine Geographic Name and Geographic Subtype into one column called geography
    results['geography'] = results['Geographic Name'] + ' ' + results['Geographic Subtype']
    results = results.drop(columns=['Geographic Name', 'Geographic Subtype'])

    # Add year column
    results['year'] = file_year

    # Add election type column
    results['type'] = election_type

    # Zero pad the fips column to 5 digits 
    results['FIPS'] = results['FIPS'].apply(lambda x: str(x).zfill(5))

    # get the first two digits of the fips code and extract the state name
    results['state'] = results['FIPS'].apply(lambda x: str(us.states.lookup(str(x)[:2])))

    # Reorder columns to FIPS, state, geography, year, type and then the remaining columns
    cols = ['FIPS', 'state', 'geography', 'year', 'type'] + [c for c in results.columns if c not in ['FIPS', 'state','geography', 'year', 'type']]
    results = results[cols]

    return results

def squash_other_parties(results):
    election_type = results['type'].iloc[0]

    # Combine election results for minor candidates into "other"
    if election_type == 'senate':
        first_n = 9 # Take the first n columns and sum everything else into other
    elif election_type == 'presidential':
        first_n = 8
    else:
        raise ValueError('Unknown election type')
    
    results = results.iloc[:, :first_n].join(results.iloc[:, first_n:].sum(axis=1).rename('other'))
    return results
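
The two helpers above rely on a fixed path layout and on column position. Here is a minimal sketch of both ideas on toy data. The function name `parse_election_path`, the toy column names, and the vote counts are all hypothetical and are not part of the original notebook. Using `pathlib` instead of `split('/')` is a swapped-in alternative that should behave the same for paths like `data/presidential/2016_0_0_2.csv`.

```python
from pathlib import PurePosixPath

import pandas as pd


def parse_election_path(filepath):
    """Hypothetical helper: pull (year, election_type) out of a path
    shaped like data/<type>/<year>_..._.csv."""
    path = PurePosixPath(filepath)
    year = path.name.split('_')[0]                  # file name starts with the year
    election_type = path.parent.name.split('_')[0]  # parent folder names the election type
    return year, election_type


year, election_type = parse_election_path('data/presidential/2016_0_0_2.csv')

# Toy frame (made-up numbers): keep the first 4 columns, sum the rest into 'other'
toy = pd.DataFrame({
    'FIPS': ['00001', '00002'],
    'Total Vote': [100, 200],
    'Party A': [60, 90],
    'Party B': [30, 100],
    'Party C': [7, 6],
    'Write-ins': [3, 4],
})
first_n = 4
squashed = toy.iloc[:, :first_n].join(
    toy.iloc[:, first_n:].sum(axis=1).rename('other')
)
```

Because the split is positional, `first_n` has to match the column order produced by `read_election_results_csv`. If the input files gain or lose a leading column, the cut point shifts with them.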
    

### Read Presidential Election Data

In [16]:
# Load 2024 Presidential Election Data
print("Presidential Election Results for 2024 by county")
presidential_results_2024 = read_election_results_csv('data/presidential/2024_0_0_2.csv')
presidential_results_2024 = squash_other_parties(presidential_results_2024)
display(presidential_results_2024)

print("Presidential Election Results for 2020 by county")
presidential_results_2020 = read_election_results_csv('data/presidential/2020_0_0_2.csv')
presidential_results_2020 = squash_other_parties(presidential_results_2020)
display(presidential_results_2020)

print("Presidential Election Results for 2016 by county")
presidential_results_2016 = read_election_results_csv('data/presidential/2016_0_0_2.csv')
presidential_results_2016 = squash_other_parties(presidential_results_2016)
display(presidential_results_2016)

Presidential Election Results for 2024 by county


Unnamed: 0,FIPS,state,geography,year,type,Total Vote,Kamala D. Harris,Donald J. Trump,other
0,01001,Alabama,Autauga County,2024,presidential,28281,7439,20484,358
1,01003,Alabama,Baldwin County,2024,presidential,122249,24934,95798,1517
2,01005,Alabama,Barbour County,2024,presidential,9855,4158,5606,91
3,01007,Alabama,Bibb County,2024,presidential,9257,1619,7572,66
4,01009,Alabama,Blount County,2024,presidential,28163,2576,25354,233
...,...,...,...,...,...,...,...,...,...
3159,56037,Wyoming,Sweetwater County,2024,presidential,16698,3731,12541,426
3160,56039,Wyoming,Teton County,2024,presidential,13286,8748,4134,404
3161,56041,Wyoming,Uinta County,2024,presidential,9089,1561,7282,246
3162,56043,Wyoming,Washakie County,2024,presidential,3877,656,3125,96


Presidential Election Results for 2020 by county


Unnamed: 0,FIPS,state,geography,year,type,Total Vote,Joseph R. Biden Jr.,Donald J. Trump,other
0,01001,Alabama,Autauga County,2020,presidential,27770,7503,19838,429
1,01003,Alabama,Baldwin County,2020,presidential,109679,24578,83544,1557
2,01005,Alabama,Barbour County,2020,presidential,10518,4816,5622,80
3,01007,Alabama,Bibb County,2020,presidential,9595,1986,7525,84
4,01009,Alabama,Blount County,2020,presidential,27588,2640,24711,237
...,...,...,...,...,...,...,...,...,...
3150,56037,Wyoming,Sweetwater County,2020,presidential,16603,3823,12229,551
3151,56039,Wyoming,Teton County,2020,presidential,14677,9848,4341,488
3152,56041,Wyoming,Uinta County,2020,presidential,9402,1591,7496,315
3153,56043,Wyoming,Washakie County,2020,presidential,4012,651,3245,116


Presidential Election Results for 2016 by county


Unnamed: 0,FIPS,state,geography,year,type,Total Vote,Hillary Clinton,Donald J. Trump,other
0,01001,Alabama,Autauga County,2016,presidential,24973,5936,18172,865
1,01003,Alabama,Baldwin County,2016,presidential,95215,18458,72883,3874
2,01005,Alabama,Barbour County,2016,presidential,10469,4871,5454,144
3,01007,Alabama,Bibb County,2016,presidential,8819,1874,6738,207
4,01009,Alabama,Blount County,2016,presidential,25588,2156,22859,573
...,...,...,...,...,...,...,...,...,...
3147,56037,Wyoming,Sweetwater County,2016,presidential,17130,3231,12154,1745
3148,56039,Wyoming,Teton County,2016,presidential,12627,7314,3921,1392
3149,56041,Wyoming,Uinta County,2016,presidential,8470,1202,6154,1114
3150,56043,Wyoming,Washakie County,2016,presidential,3814,532,2911,371


### Read 2024 senate election data

In [17]:
senate_results_2024 = read_election_results_csv('data/senate/2024_3_0_2.csv')
senate_results_2024

Unnamed: 0,FIPS,state,geography,year,type,Class,Total Vote,Democratic,Republican,Independent,...,Peace and Justice,Natural Law,Write-ins,State1,State2,State3,State4,State5,State6,State7
0,04001,Arizona,Apache County,2024,senate,1,31936,19901,11283,0,...,0,0,73,1,3,0,0,0,0,0
1,04003,Arizona,Cochise County,2024,senate,1,58225,23347,33184,0,...,0,0,109,3,1,5,0,1,1,0
2,04005,Arizona,Coconino County,2024,senate,1,69563,42924,24825,0,...,0,0,78,1,4,3,0,1,0,0
3,04007,Arizona,Gila County,2024,senate,1,27372,9330,17433,0,...,0,0,44,0,0,1,0,0,0,0
4,04009,Arizona,Graham County,2024,senate,1,15007,4235,10385,0,...,0,0,14,0,0,0,0,0,0,0
...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...
1976,56037,Wyoming,Sweetwater County,2024,senate,1,16351,3550,12724,0,...,0,0,77,0,0,0,0,0,0,0
1977,56039,Wyoming,Teton County,2024,senate,1,13059,7885,5083,0,...,0,0,91,0,0,0,0,0,0,0
1978,56041,Wyoming,Uinta County,2024,senate,1,8841,1478,7310,0,...,0,0,53,0,0,0,0,0,0,0
1979,56043,Wyoming,Washakie County,2024,senate,1,3821,588,3207,0,...,0,0,26,0,0,0,0,0,0,0


### ⚠️ METHODOLOGICAL CHOICE ALERT

For Maine and Vermont, swap the Democratic and Independent columns.

Bernie Sanders and Angus King are Independents, but they caucus with the Democrats.


In [18]:
mask = senate_results_2024['state'].isin(['Maine', 'Vermont'])
senate_results_2024.loc[mask, ['Democratic', 'Independent']] = senate_results_2024.loc[mask, ['Independent', 'Democratic']].values
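
The `.values` at the end of the swap matters. As a small sketch on a toy frame (the vote counts below are made up), this compares the swap with and without it. When the right-hand side is a DataFrame, pandas aligns on column labels before assigning, so the reordered columns are matched back to their original names and nothing changes.

```python
import pandas as pd

# Toy data with hypothetical vote counts
toy = pd.DataFrame({
    'state': ['Maine', 'Arizona', 'Vermont'],
    'Democratic': [10, 500, 0],
    'Independent': [300, 0, 400],
})
mask = toy['state'].isin(['Maine', 'Vermont'])
cols = ['Democratic', 'Independent']

# With .values: labels are dropped, so assignment is positional and the swap happens
swapped = toy.copy()
swapped.loc[mask, cols] = swapped.loc[mask, cols[::-1]].values

# Without .values: pandas realigns by column label, so the frame is left unchanged
aligned = toy.copy()
aligned.loc[mask, cols] = aligned.loc[mask, cols[::-1]]
```

After this runs, `swapped` has the Maine and Vermont columns exchanged and Arizona untouched, while `aligned` still matches the original toy frame.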

In [19]:
senate_results_2024 = squash_other_parties(senate_results_2024)
senate_results_2024

Unnamed: 0,FIPS,state,geography,year,type,Class,Total Vote,Democratic,Republican,other
0,04001,Arizona,Apache County,2024,senate,1,31936,19901,11283,752
1,04003,Arizona,Cochise County,2024,senate,1,58225,23347,33184,1694
2,04005,Arizona,Coconino County,2024,senate,1,69563,42924,24825,1814
3,04007,Arizona,Gila County,2024,senate,1,27372,9330,17433,609
4,04009,Arizona,Graham County,2024,senate,1,15007,4235,10385,387
...,...,...,...,...,...,...,...,...,...,...
1976,56037,Wyoming,Sweetwater County,2024,senate,1,16351,3550,12724,77
1977,56039,Wyoming,Teton County,2024,senate,1,13059,7885,5083,91
1978,56041,Wyoming,Uinta County,2024,senate,1,8841,1478,7310,53
1979,56043,Wyoming,Washakie County,2024,senate,1,3821,588,3207,26


## Clean & Combine Data 🧹 

In [20]:
# Replace the names of Democratic and Republican candidates with Democratic and Republican
presidential_results_2024 = presidential_results_2024.rename(columns={'Kamala D. Harris': 'Democratic', 'Donald J. Trump': 'Republican'})
presidential_results_2020 = presidential_results_2020.rename(columns={'Joseph R. Biden Jr.': 'Democratic', 'Donald J. Trump': 'Republican'})
presidential_results_2016 = presidential_results_2016.rename(columns={'Hillary Clinton': 'Democratic', 'Donald J. Trump': 'Republican'})

# Combine the three years of presidential election data into one dataframe
presidential_results = pd.concat([presidential_results_2024, presidential_results_2020, presidential_results_2016], ignore_index=True)
presidential_results


Unnamed: 0,FIPS,state,geography,year,type,Total Vote,Democratic,Republican,other
0,01001,Alabama,Autauga County,2024,presidential,28281,7439,20484,358
1,01003,Alabama,Baldwin County,2024,presidential,122249,24934,95798,1517
2,01005,Alabama,Barbour County,2024,presidential,9855,4158,5606,91
3,01007,Alabama,Bibb County,2024,presidential,9257,1619,7572,66
4,01009,Alabama,Blount County,2024,presidential,28163,2576,25354,233
...,...,...,...,...,...,...,...,...,...
9466,56037,Wyoming,Sweetwater County,2016,presidential,17130,3231,12154,1745
9467,56039,Wyoming,Teton County,2016,presidential,12627,7314,3921,1392
9468,56041,Wyoming,Uinta County,2016,presidential,8470,1202,6154,1114
9469,56043,Wyoming,Washakie County,2016,presidential,3814,532,2911,371


In [21]:
# combine into one big dataframe with the senate results
election_results = pd.concat([presidential_results, senate_results_2024], ignore_index=True)
election_results

Unnamed: 0,FIPS,state,geography,year,type,Total Vote,Democratic,Republican,other,Class
0,01001,Alabama,Autauga County,2024,presidential,28281,7439,20484,358,
1,01003,Alabama,Baldwin County,2024,presidential,122249,24934,95798,1517,
2,01005,Alabama,Barbour County,2024,presidential,9855,4158,5606,91,
3,01007,Alabama,Bibb County,2024,presidential,9257,1619,7572,66,
4,01009,Alabama,Blount County,2024,presidential,28163,2576,25354,233,
...,...,...,...,...,...,...,...,...,...,...
11447,56037,Wyoming,Sweetwater County,2024,senate,16351,3550,12724,77,1.0
11448,56039,Wyoming,Teton County,2024,senate,13059,7885,5083,91,1.0
11449,56041,Wyoming,Uinta County,2024,senate,8841,1478,7310,53,1.0
11450,56043,Wyoming,Washakie County,2024,senate,3821,588,3207,26,1.0


In [22]:
# set column headers to one-word lowercase
election_results.columns = [c.split(' ')[0] for c in election_results.columns] # split on space and take the first part
election_results.columns = [c.lower() for c in election_results.columns] # lowercase

# reorder the columns
reordered_columns = ['fips', 'state', 'geography', 'year', 'type', 'class', 'democratic', 'republican', 'other', 'total']
election_results = election_results[reordered_columns]

# rename geography to county
election_results = election_results.rename(columns={'geography': 'county'})

# display the df
election_results

Unnamed: 0,fips,state,county,year,type,class,democratic,republican,other,total
0,01001,Alabama,Autauga County,2024,presidential,,7439,20484,358,28281
1,01003,Alabama,Baldwin County,2024,presidential,,24934,95798,1517,122249
2,01005,Alabama,Barbour County,2024,presidential,,4158,5606,91,9855
3,01007,Alabama,Bibb County,2024,presidential,,1619,7572,66,9257
4,01009,Alabama,Blount County,2024,presidential,,2576,25354,233,28163
...,...,...,...,...,...,...,...,...,...,...
11447,56037,Wyoming,Sweetwater County,2024,senate,1.0,3550,12724,77,16351
11448,56039,Wyoming,Teton County,2024,senate,1.0,7885,5083,91,13059
11449,56041,Wyoming,Uinta County,2024,senate,1.0,1478,7310,53,8841
11450,56043,Wyoming,Washakie County,2024,senate,1.0,588,3207,26,3821


In [24]:
# check that the total is not null
assert election_results['total'].notnull().all()

# check that democratic + republican + other = total for all rows
assert (
    election_results['total'] == (
        election_results['democratic'].fillna(0)
        + election_results['republican'].fillna(0)
        + election_results['other'].fillna(0)
    )
).all()

# note: assert means throw an error if the expression provided is false
# so if this cell runs with no errors, then the conditions are met
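
An `assert` only says that something failed, not where. When a check like this does fail, one option is to pull out the offending rows first. The sketch below does that on a toy frame with made-up numbers. The column names mirror the ones used above, and the second row is deliberately off by one.

```python
import pandas as pd

# Toy data (hypothetical): the second row's parts sum to 35, not 36
toy = pd.DataFrame({
    'fips': ['00001', '00002'],
    'democratic': [40, 10],
    'republican': [50, 20],
    'other': [10, 5],
    'total': [100, 36],
})

# Sum the party columns row by row, treating missing values as zero
parts = toy[['democratic', 'republican', 'other']].fillna(0).sum(axis=1)

# Rows where the parts do not add up to the reported total
mismatched = toy[toy['total'] != parts]
```

Displaying `mismatched` before asserting that it is empty makes it easier to see which counties need a closer look.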

## Perform Calculations

A few calculations to make this data a bit easier to use.

In [25]:
election_results['democratic_pct'] = election_results['democratic'] / election_results['total'] * 100
election_results['republican_pct'] = election_results['republican'] / election_results['total'] * 100
election_results['other_pct'] = election_results['other'] / election_results['total'] * 100

election_results

Unnamed: 0,fips,state,county,year,type,class,democratic,republican,other,total,democratic_pct,republican_pct,other_pct
0,01001,Alabama,Autauga County,2024,presidential,,7439,20484,358,28281,26.303879,72.430254,1.265868
1,01003,Alabama,Baldwin County,2024,presidential,,24934,95798,1517,122249,20.396077,78.363013,1.240910
2,01005,Alabama,Barbour County,2024,presidential,,4158,5606,91,9855,42.191781,56.884830,0.923389
3,01007,Alabama,Bibb County,2024,presidential,,1619,7572,66,9257,17.489467,81.797559,0.712974
4,01009,Alabama,Blount County,2024,presidential,,2576,25354,233,28163,9.146753,90.025921,0.827327
...,...,...,...,...,...,...,...,...,...,...,...,...,...
11447,56037,Wyoming,Sweetwater County,2024,senate,1.0,3550,12724,77,16351,21.711210,77.817870,0.470919
11448,56039,Wyoming,Teton County,2024,senate,1.0,7885,5083,91,13059,60.379815,38.923348,0.696837
11449,56041,Wyoming,Uinta County,2024,senate,1.0,1478,7310,53,8841,16.717566,82.682954,0.599480
11450,56043,Wyoming,Washakie County,2024,senate,1.0,588,3207,26,3821,15.388642,83.930908,0.680450
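
As a quick worked check of the percentage arithmetic, the snippet below recomputes the shares for the first row of the table above (Autauga County, 2024 presidential) in plain Python. The variable names are illustrative only.

```python
# Vote counts copied from the first row of the output above
democratic, republican, other, total = 7439, 20484, 358, 28281

# Same formula as the notebook cell: part / total * 100
democratic_pct = democratic / total * 100
republican_pct = republican / total * 100
other_pct = other / total * 100

# The three shares should add up to 100 when the parts sum to the total
share_sum = democratic_pct + republican_pct + other_pct
```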


## Save to CSV File

In [26]:
election_results.to_csv('election_results.csv', index=False)
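
One thing to watch when reading this file back: the zero-padded FIPS codes are stored as text, and `pd.read_csv` will infer them as integers unless told otherwise. The sketch below shows the round trip on a two-row toy frame, using an in-memory buffer in place of `election_results.csv`.

```python
import io

import pandas as pd

# Two-row stand-in for the saved results
toy = pd.DataFrame({'fips': ['01001', '56043'], 'total': [28281, 3821]})

# Write to an in-memory buffer instead of a file on disk
buffer = io.StringIO()
toy.to_csv(buffer, index=False)

# Default read: fips is inferred as an integer and the leading zero is lost
buffer.seek(0)
naive = pd.read_csv(buffer)

# Reading fips as a string keeps the five-digit codes intact
buffer.seek(0)
preserved = pd.read_csv(buffer, dtype={'fips': str})
```

Passing `dtype={'fips': str}` on read avoids having to zero-pad the column again downstream.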