# COLL 80   Data Science Module on Voting and Registration

 
---

![](DSHeaderMillsLarge.png)


### Professors Susan Wang and Almudena Konrad

In this module we explore data on voting and registration, focusing on presidential elections.  We begin with historical data on voter turnout and the events that affected voter turnout.  We then turn to registration and voter turnout data for the past twenty years, or the last five presidential elections, and compare them at the national, state, and regional levels.  We conclude by exploring data associated with voter characteristics, such as gender, age, race, and education in relation to voter turnout, and reasons for not voting and for not registering.

### Topics Covered
- Jupyter Notebooks
- Voting 
- Registration Rate
- Voter Turnout
- Voter Characteristics


### Table of Contents

[1. Introduction to Jupyter Notebooks](#introJupyterNotebooks)<br>

[2. Voter Turnout in the US Presidential Elections — Historical Perspective](#voterTurnout)<br>

[3. Voter Registration and Turnout in the US Presidential Elections 2000-2016](#voterRegTurnout)<br>

&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp; [3.1 National Voting Data](#nationalData)

&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp; [3.2 Local Voting Data](#localData)

&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp; &nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp; [3.2.1 Graph of Registration and Turnout Rates of a Selected Region (Widget A)](#widgetA)

&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp; &nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp; [3.2.2 Comparing Registration or Turnout Rates of Three Selected Regions (Widget B)](#widgetB)

[4. Voter Characteristics in Relation to Turnout in the US Presidential Elections](#voterCharacteristics)<br>

&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp; [4.1 Gender](#gender)

&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp; [4.2 Age](#age)

&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp; [4.3 Education](#education)

&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp; [4.4 Race](#race)

&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp; [4.5 Reasons for Not Voting](#notVoting)

&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp; [4.6 Reasons for Not Registering](#notRegistering)

[5. Data Sources and References](#sources)<br>

---

## 1.  Introduction to Jupyter Notebooks <a id = "introJupyterNotebooks"></a>

In this module we are using a document called a **Jupyter Notebook** which is an *interactive* document containing text, code, visualizations, and more.  In this section, we introduce some of the building blocks of a notebook and how to work with it.

Now click anywhere near this paragraph.  You should see a rectangular box around it.  This rectangle is called a **cell**. A notebook is composed of two kinds of cells: markdown and code. A **markdown cell**, such as this one, contains text. A **code cell**, such as the next one, contains commands in Python, a programming language that we will be using for the remainder of this module. A code cell has `In [ ]:` to the left of the cell (that is how you distinguish a code cell).  You can ***run*** a code cell (that is, execute its commands) by first selecting it (clicking it once) and then doing one of the following:
   - press <code>Ctrl+Enter</code> (run the cell and stay at the same cell) or
   - press <code>Shift+Enter</code> (run the cell and go to the next cell) or
   - click the **Run** button in the toolbar at the top of the screen (run the cell and go to the next cell)
   
If a code cell is running, you will see an asterisk `*` appear in the square brackets to the left of the cell. Once the cell has finished running, a number (which will increment) will replace the asterisk and any output from the code will appear under the cell. The output of a code cell may include computations, tables, graphs, or other visualizations.

In [None]:
# this is a code cell; run this cell
print('Welcome to Data Science Voting Module!')

You'll notice that many code cells contain lines of greenish-blue italic text that start with a `#`. These are comments. Comments often contain helpful information about what the code does or what you are supposed to do in the cell. The leading # tells the computer to ignore them.
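
As a small illustration (a hypothetical cell, not part of the original module), the code below mixes comments with real code. Only the parts that are not comments are executed; the variable name `eligible_age` is made up for this example.

```python
# This whole line is a comment, so Python skips it.
eligible_age = 18  # a comment can also follow code on the same line

# The next line is real code: it prints a message using the value above.
print('You must be at least', eligible_age, 'to vote.')
```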

#### Editing

To edit a code cell, click where the edit should take place and start typing.  In this module, we will not be writing code, only running code cells.

You can edit a Markdown cell by clicking it twice. Text in Markdown cells is written in Markdown, a formatting syntax for plain text, so you may see some funky symbols when you edit a text cell. Once you've made your changes, you can exit text-editing mode by running the cell. Now try editing the following Markdown cell:


My favorite animal is ...

#### Saving and Loading

Your notebook can record all of your text and code edits, as well as any tables and graphs you generate or calculations you make. You can save the notebook in its current state by pressing Control-S, clicking the floppy disk icon in the toolbar at the top left of the page, or going to the File menu and selecting "Save and Checkpoint".

The next time you open the notebook, it will look the same as when you last saved it. **Note**: after loading a notebook you will see all the outputs (tables, graphs, computation, widgets, etc) from your last session, but you won't be able to use any libraries you imported, variables you assigned, functions you defined, or widgets you created. To get back to your previous state, you will need to *re-run* the cells where you imported the libraries, defined the variables and functions. The easiest way is to select the cell where you left off work, then go to the Cell menu at the top of the screen and select "Run All Above".

---
## 2. Voter Turnout in the US Presidential Elections — Historical Perspective <a id='voterTurnout'></a>


Voting plays a fundamental role in a democracy. We vote to elect our president, governor, senators, representatives, mayor, and many other public officials, and we also vote on measures and policies that affect our lives.

Who is **eligible to vote**?  In other words, who has the **right to vote**?

In 2020 people who satisfy the following criteria are eligible to vote:
 - citizen of the United States
 - 18 or over in age
 - meet state requirements (varies) — each state sets its own voting criteria and regulations

Have the above eligibility criteria been the same throughout US history?  

Answer: No!  

The fight for the right to vote has been a long battle.

Let's look at the percentage of the US population who voted in presidential elections throughout history (see Figure 1). Midterm elections have lower turnout than presidential elections [1]. <br>

![](U.S._Vote_for_President_as_Population_Share.png)<br>

<center><b><span style="font-family:Ariel; font-size:1.5em;">Figure 1.  Proportion of the US Population Who Voted for President [2]</span></b></center>
 

<div class="alert alert-info">
<b> Activity/Discussion:</b> Looking at the above graph on the historical turnout rate for presidential elections, what events do you think contributed to the rise and fall of voter participation?  Consider who had the right to vote, and when.

[Use polleverywhere to write down your thoughts ]
    </div>    

<details><summary>Click Here to Expand After Discussion</summary>
<p>

Key events that changed the eligibility criteria for voting and contributed to the rise and fall of voter participation in presidential elections from 1784-2016:
- in the early years after the country's founding, only white male property owners aged twenty-one or older could vote
- in the mid-1820s, white men twenty-one or older could vote in most states (note the surge in 1828)
- in 1870 (15th Amendment), Black men twenty-one or older gained the right to vote (note the surge in 1872 and 1876)
- from 1890 to 1910, Black voters and some poor white voters were disenfranchised through poll taxes, literacy tests, violence, etc. (note the dip in 1896)
- in 1920 (19th Amendment), women twenty-one or older gained the right to vote (note the surge from 1924 on)
- in 1947, Native Americans could vote
- in 1952, Asian Americans could vote
- in 1964 (24th Amendment), poll taxes were prohibited in federal elections
- in 1971 (26th Amendment), all citizens eighteen or older could vote


</p>
</details>

---

## 3.  Voter Registration and Turnout in the US Presidential Elections 2000-2016 <a id='voterRegTurnout'></a>

Each state is responsible for setting its own election regulations and administering its elections, as specified by [Article I, Section 4 of the US Constitution](https://www.archives.gov/founding-docs/constitution-transcript#toc-section-4-) (the Constitution does not specify any role for the federal government in terms of administering elections). Most states require voter registration prior to voting (the only exception is North Dakota).  States have different voter registration requirements and deadlines.  For example, [California's registration requirements](https://www.sos.ca.gov/elections/voting-resources/voting-california/who-can-vote-california/voting-rights-californians/) include being a US citizen and a resident of California; being 18 or older on election day; not currently being in state or federal prison or on parole for the conviction of a felony; and not currently being found mentally incompetent to vote by a court. The registration deadline for the November 3, 2020 election is October 19, 2020.

Since forty-nine states require registration prior to voting, we pay attention to the *number of registered voters* (**registered**), in addition to the number of eligible voters (**eligible**) and the number of people who voted (**voted**).  Based on these three numbers, we compute the following important measures that we use in this module:
- **registration rate** which is the proportion of registered voters out of those who are eligible—**registered/eligible**
- **turnout rate based on registration** which is the proportion of people who voted out of those who registered—**voted/registered**
- **turnout rate based on eligibility** which is the proportion of people who voted out of those who are eligible—**voted/eligible**
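
The three measures above can be sketched in a few lines of Python. This is a minimal illustration rather than code from the module: the function name `voting_rates` and the counts passed to it are hypothetical, chosen only to make the arithmetic easy to follow.

```python
def voting_rates(eligible, registered, voted):
    """Return the three rates as percentages, rounded to one decimal place."""
    registration_rate = 100 * registered / eligible      # registered/eligible
    turnout_of_registered = 100 * voted / registered     # voted/registered
    turnout_of_eligible = 100 * voted / eligible         # voted/eligible
    return (round(registration_rate, 1),
            round(turnout_of_registered, 1),
            round(turnout_of_eligible, 1))

# Hypothetical counts (in thousands), not real election data.
rates = voting_rates(eligible=200_000, registered=140_000, voted=120_000)
print(rates)
```

Note that the turnout rate based on eligibility is always the product of the other two rates, since voted/eligible = (registered/eligible) × (voted/registered).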

### 3.1  National Voting Data 1980-2016 <a id='nationalData'></a>

#### Data Sources

The data that we use for the presidential elections are obtained from the [US Census Bureau](https://www.census.gov/topics/public-sector/voting.html).  The US Census Bureau conducts the Voting and Registration Supplement to the Current Population Survey (CPS) the week immediately following a national election.  The Voting and Registration Supplement asks questions about voting to about 56,000 households across the US. The questions include gender, age, race, education, reasons for not voting, reasons for not registering and more.  The survey responses are recorded and extrapolated to arrive at *estimates* for the voting data for the entire US population. 
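
To see what extrapolating from a survey can look like, here is a deliberately simplified sketch. It assumes that each respondent carries a weight, meaning the number of people in the population that the respondent stands for. The responses, the weights, and the `estimate_total` helper are all hypothetical, and the Census Bureau's actual weighting procedure is more involved than this.

```python
# Each tuple is (reported_voting, weight). The weight is the assumed number of
# people in the population that one survey respondent represents.
responses = [
    (True, 2500),
    (False, 3000),
    (True, 2000),
    (True, 2500),
]

def estimate_total(responses):
    """Estimate the number of voters and the turnout rate from weighted responses."""
    voted = sum(weight for did_vote, weight in responses if did_vote)
    population = sum(weight for _, weight in responses)
    return voted, 100 * voted / population

estimated_voters, estimated_rate = estimate_total(responses)
print(estimated_voters, 'estimated voters,', round(estimated_rate, 1), '% turnout')
```

Because the totals are built from a sample rather than a full count, they are estimates and carry some sampling error.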

#### Loading and Examining Data

First we run the following cell.

In [None]:
!pip install matplotlib==3.2  

We start by importing some necessary libraries: run the following code cell and ignore any warning messages.

In [None]:
# Import all necessary libraries. Run this cell and ignore any pink warnings.
# You don't need to understand this code.

from datascience import *
import pandas as pd
import numpy as np
%matplotlib inline
import matplotlib.pyplot as plt
import seaborn as sns
plt.style.use('fivethirtyeight')
import warnings
warnings.simplefilter(action="ignore", category=FutureWarning)
warnings.filterwarnings('ignore')
import ipywidgets as widgets
from ipywidgets import interact, interactive, fixed, interact_manual
from IPython.display import Image
from IPython.display import display
import graphfile

The data set obtained from the US Census Bureau for the past ten presidential elections has been cleaned and saved in a csv file. Next we read in the csv file and save it in a table. 

In [None]:
# cleaned data from "Characteristics of Voters in the Presidential Election of 2016"
# https://www.census.gov/library/publications/2018/demo/p20-582.pdf
# Prior to 1978, the Current Population Survey did not ask about citizenship status which is needed
# to calculate the citizen voting-age population
USHistorical = Table.read_table('US1980-2016.csv')
USHistorical.set_format(np.arange(2,8), NumberFormatter)

Below we graph the number of people (in thousands) who were eligible, registered, and voted based on the above table.

In [None]:
# graph of population in the US who are eligible to vote, who registered to vote, and who voted
USHistorical.select('Year', 'Eligible (in K)', 'Registered (in K)', 'Voted (in K)').plot('Year', width = 10)
plt.ylabel("Population in Thousands")
plt.title("US Population Eligible to Vote, Registered to Vote, and Voted in Presidential Elections");

Next we graph the registration rate, turnout rate based on registration, and turnout rate based on eligibility (rates are expressed in percentages).

In [None]:
graphfile.line_graph(USHistorical.column(0), USHistorical.column(5), '% Registered/Eligible', USHistorical.column(6),
          '% Voted/Registered', USHistorical.column(7), '% Voted/Eligible', 
          'Registration Rate and Turnout Rates in Presidential Elections')

Below is a bar graph (instead of a line graph) of the same data.

In [None]:
graphfile.bar_graph(USHistorical.column(0), USHistorical.column(5), '% Registered/Eligible', USHistorical.column(6),
          '% Voted/Registered', USHistorical.column(7), '% Voted/Eligible', 
          'Registration Rate and Turnout Rates in Presidential Elections')

### 3.2  Local Voting Data 2000-2016 <a id='localData'></a>

For local data (Alameda County, other Bay Area counties, and California) on voter registration and turnout, we turn to the [California Secretary of State](https://www.sos.ca.gov/elections/) elections website, which contains voting data for the past twenty years (note: unlike US Census Bureau voting data, state voting data are exact numbers, not estimates).  Next we read in the data and save it in a table.

In [None]:
# data obtained and compiled from https://www.sos.ca.gov/elections/ for each presidential election
voting = Table.read_table('Voting.csv')
voting.set_format(np.arange(1,73), NumberFormatter)

**Note:** The table, with its many columns, contains voting data from all nine Bay Area counties, in addition to voting data from California and the US (in thousands).
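
Judging from the widget code later in this section, each rate column is labeled with a region (including a trailing space) followed by a measure. The sketch below builds such labels with plain string concatenation. The `column_name` helper is hypothetical, and the exact column names in `Voting.csv` are an assumption based on that widget code.

```python
regions = ["US ", "CA ", "Bay Area ", "Alameda "]  # note the trailing space
measures = ["% Registered/Eligible", "% Voted/Registered", "% Voted/Eligible"]

def column_name(region, measure):
    """Join a region and a measure the same way the widgets build column labels."""
    return region + measure

# Print the three rate labels for each region.
for region in regions:
    print([column_name(region, measure) for measure in measures])
```

The trailing space in each region name matters: without it, the label would read `Alameda% Voted/Eligible` and would not match a column such as `Alameda % Voted/Eligible`.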

#### 3.2.1 Graph of Registration and Turnout Rates of a Selected Region (Widget A) <a id='widgetA'></a>

The following ***widget*** displays a graph — line, bar, horizontal bar — showing registration and turnout rates based on a selected region from the dropdown menu.

In [None]:
# make a selected graph showing registration and turnout rate based on a selected region
def vote_region(graph, region):
    new_table = voting.select("Year", region  + "% Registered/Eligible", region + "% Voted/Registered", 
                    region + "% Voted/Eligible").where("Year", are.above(1996))
    if graph == 'line graph':
        return (graphfile.line_graph(new_table.column(0), new_table.column(1), "% Registered/Eligible", new_table.column(2), 
        "% Voted/Registered", new_table.column(3), "% Voted/Eligible", region + " -- Registration Rate and Voter Turnout Rates"))
    elif graph == "bar graph":
        return (graphfile.bar_graph(new_table.column(0), new_table.column(1), "% Registered/Eligible", new_table.column(2), 
        "% Voted/Registered", new_table.column(3), "% Voted/Eligible", region + " -- Registration Rate and Voter Turnout Rates"))
    elif graph == "horizontal bar graph":
        return (new_table.barh("Year", width = 10), plt.xlabel("Percent"), 
                plt.title(region + " -- Registration Rate and Voter Turnout Rates"))
              
# create buttons for type of graph
buttons = widgets.ToggleButtons(options=["line graph", "bar graph", "horizontal bar graph"], value="line graph")

# create dropdowns to select regions
region_dropdown = widgets.Dropdown(options=["US ", "CA ", "Bay Area ", "Alameda ", "Contra Costa ", "Marin ", 
                    "Napa ", "San Francisco ", "San Mateo ", "Santa Clara ", "Solano ", "Sonoma "], value="Alameda ")

# create the widget to view plot for registration and turnout based on type of plot
display(widgets.interactive(vote_region, graph=buttons, region=region_dropdown))

<div class="alert alert-info">
<b> Question:</b> After playing with the widget, what are some of your observations? <br>
    Observations could include patterns and anomalies that you see, usefulness of different graphs, ...
    </div>    

*Write down your observations here in this cell.*

#### 3.2.2 Comparing Registration or Turnout Rates of Three Selected Regions (Widget B) <a id='widgetB'></a>

The following ***widget*** graphs either registration or voter turnout rates of three selected regions to enable comparison.

In [None]:
# make a plot showing/comparing selected regions based on feature and type of graph
def vote_compare(graph, feature, regionA, regionB, regionC):
    new_table = voting.select("Year", regionA  + feature, regionB + feature, regionC + feature).where("Year", are.above(1996))
    if graph == 'line graph':
        return (graphfile.line_graph(new_table.column(0), new_table.column(1), regionA + feature, new_table.column(2), 
                    regionB + feature, new_table.column(3), regionC + feature, "Comparison of Three Regions"))  
    elif graph == "bar graph":
        return (graphfile.bar_graph(new_table.column(0), new_table.column(1), regionA + feature, new_table.column(2), 
                    regionB + feature, new_table.column(3), regionC + feature, "Comparison of Three Regions"))       

# create buttons for type of graph
graph_buttons = widgets.ToggleButtons(options=["line graph", "bar graph"], value="line graph")
# create buttons for voting parameters
parameter_buttons = widgets.ToggleButtons(options=["% Registered/Eligible", "% Voted/Registered", "% Voted/Eligible"], 
                                          value="% Registered/Eligible")

# create 3 dropdowns to select regions
regA_dropdown = widgets.Dropdown(options=["US ", "CA ", "Bay Area ", "Alameda ", "Contra Costa ", "Marin ", 
             "Napa ", "San Francisco ", "San Mateo ", "Santa Clara ", "Solano ", "Sonoma "], value="US ")
regB_dropdown = widgets.Dropdown(options=["US ", "CA ", "Bay Area ", "Alameda ", "Contra Costa ", "Marin ", 
            "Napa ", "San Francisco ", "San Mateo ", "Santa Clara ", "Solano ", "Sonoma "], value="CA ")
regC_dropdown = widgets.Dropdown(options=["US ", "CA ", "Bay Area ", "Alameda ", "Contra Costa ", "Marin ", 
            "Napa ", "San Francisco ", "San Mateo ", "Santa Clara ", "Solano ", "Sonoma "], value="Bay Area ")

# create the widget to view plots for different regions based on voting parameter
display(widgets.interactive(vote_compare, graph=graph_buttons, feature=parameter_buttons, 
                            regionA=regA_dropdown, regionB=regB_dropdown, regionC=regC_dropdown))

<div class="alert alert-info">
<b> Question:</b> What are local registration and turnout trends for presidential elections?  Are there differences between national, state-level, and local trends?  Discuss and note down your observations.  (One county that stands out is Marin, which has a high rate of registration and voter turnout.)<br>
    What questions would you like to ask and explore based on your observations?
    </div>    

*Write down your observations.  Write down the questions you want to ask and explore here in this cell.*

---
## 4.  Voter Characteristics in Relation to Turnout in the US Presidential Elections <a id='voterCharacteristics'></a>

We now explore some voter characteristics — gender, age, education, and race — in relation to voter turnout in the presidential elections.  We also look at reasons people provided for not voting and for not registering to vote.

### 4.1 Gender <a id='gender'></a>

In 2016, women made up 52% [3] of the eligible voting-age population and voted at a higher rate (63.3%) than men (59.3%), resulting in almost 10 million more votes cast by women than men.
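
The arithmetic behind the gap of almost 10 million votes can be sketched as follows. The shares and turnout rates come from the text above, but the total eligible population is an assumption: a round figure of 224 million is used here only for illustration and does not come from this notebook.

```python
# Figures quoted in the text above (2016 presidential election).
women_share = 0.52      # women as a share of the eligible population
women_turnout = 0.633   # women's turnout rate (voted/eligible)
men_turnout = 0.593     # men's turnout rate (voted/eligible)

# ASSUMPTION: a round figure for the total eligible population, for illustration only.
eligible_total = 224_000_000

women_votes = women_share * eligible_total * women_turnout
men_votes = (1 - women_share) * eligible_total * men_turnout
vote_gap = women_votes - men_votes

print(f"Women: {women_votes / 1e6:.1f} million votes")
print(f"Men:   {men_votes / 1e6:.1f} million votes")
print(f"Gap:   {vote_gap / 1e6:.1f} million votes")
```

Both effects push in the same direction: women are the larger group and also turn out at a higher rate.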

Figure 2 below is a graph of the voter turnout rate (voted/eligible) by gender for the past ten presidential elections [4]. Note that since 1980, women have consistently voted at a higher rate than men, whereas before 1980 women voted at a lower rate than men.

![](GenderTurnout1.jpg)<br>


<center><b><span style="font-family:Ariel; font-size:1.5em;">Figure 2.  Voter Turnout Rate (Voted/Eligible) Based on Gender [4]</span></b></center><br>

In terms of the *number of votes cast*, Figure 3 shows the voter turnout by gender in the past fourteen presidential elections.  Because women make up a larger share of the eligible voting-age population, and have also had a higher turnout rate since 1980, the number of women who reported voting has exceeded that of men since 1964, with almost 10 million more votes cast by women than men in 2008, 2012, and 2016.

![](GenderTurnout2.jpg)<br>

<center><b><span style="font-family:Ariel; font-size:1.5em;">Figure 3.  Voter Turnout (Voted) Based on Gender [4]</span></b></center><br>

### 4.2  Age <a id='age'></a>

Age is another voter characteristic that differentiates voting rates, with older populations consistently turning out at higher rates than younger ones (see Figure 4).

![](TurnoutByAgeLarge.png)<br>

<center><b><span style="font-family:Ariel; font-size:1.5em;">Figure 4.  Voter Turnout Rate (Voted/Eligible) Based on Age [5]</span></b></center><br>


### 4.3  Education  <a id='education'></a>

Now we turn to education. The data indicate that populations with higher educational attainment vote at higher rates than those with lower attainment (see Figure 5).

In [None]:
# data from https://www2.census.gov/programs-surveys/cps/tables/p20/580/table05_1.xlsx
education2016 = Table.read_table('US 2016 Education Gender.csv')
graphfile.bar_graph2(education2016.column(0), education2016.column(1), '% Men Voter Turnout', education2016.column(2), 
                    '% Women Voter Turnout', "Voter Turnout Rate Based on Education and Gender")

<center><b><span style="font-family:Ariel; font-size:1.5em;">Figure 5.  Voter Turnout Rate (Voted/Eligible) Based on Education and Gender, 2016 Presidential Election</span></b></center><br>


### 4.4 Race  <a id='race'></a>

The final voter characteristic we look at is race. We see that Asians and Hispanics consistently turn out at lower rates than whites and blacks (see Figure 6).

![](TurnoutByRaceLarge.png)<br>

<center><b><span style="font-family:Ariel; font-size:1.5em;">Figure 6.  Voter Turnout Rate (Voted/Eligible) Based on Race [5]</span></b></center><br>


### 4.5 Reasons for Not Voting  <a id='notVoting'></a>

In 2016, the most common reason registered voters gave for not voting was dislike of the candidates or campaign issues (24.9%), followed by lack of interest in the election (15.3%), being too busy or having a conflicting schedule (14.3%), and illness or disability (11.6%) [3].  Figure 7 shows the top five reasons registered voters gave for not voting in the past four elections.

![](ReasonsNotVoting.PNG)<br>

<center><b><span style="font-family:Ariel; font-size:1.5em;">Figure 7.  Top Five Reasons Registered Voters Provided for Not Voting [3]</span></b></center><br>

### 4.6  Reasons for Not Registering  <a id='notRegistering'></a>

Among US citizens 18 years or older who were not registered (estimated to be 32.6 million in 2016), the most common reason for not registering was lack of interest in the election or in politics (41.7%), followed by not meeting the registration deadline (12.0%), not being eligible to vote (7.4%), and permanent illness or disability (4.9%) [3].  Figure 8 shows the top five reasons citizens of the voting-age population provided for not registering.

![](ReasonsNotRegistering.PNG)<br>

<center><b><span style="font-family:Ariel; font-size:1.5em;">Figure 8.  Top Five Reasons People Provided for Not Registering [3]</span></b></center><br>
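
To get a feel for the scale of these percentages, the sketch below converts them into approximate numbers of people. The 32.6 million total and the percentages come from the text above; the variable names and shortened reason labels are made up for this illustration.

```python
not_registered = 32.6  # millions of citizens 18 or older, from the text above

# Percentage of non-registered citizens giving each reason, from the text above.
reasons = {
    "Not interested in the election or politics": 41.7,
    "Did not meet the registration deadline": 12.0,
    "Not eligible to vote": 7.4,
    "Permanent illness or disability": 4.9,
}

# Convert each percentage into an approximate number of people (in millions).
people_by_reason = {reason: not_registered * pct / 100 for reason, pct in reasons.items()}

for reason, millions in people_by_reason.items():
    print(f"{reason}: about {millions:.1f} million")
```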

<div class="alert alert-info">
<b> Question:</b> What are your observations regarding the voter characteristics in Section 4?
    </div>    

**Congratulations on finishing this notebook!**

To save this notebook as a PDF, go to **"File"** in the upper-left corner > **"Download as"** > **"PDF via Chrome"**.

**Note:** If the "PDF via Chrome" option is not available, or "PDF via LaTeX" fails with an error such as "500 Internal Server Error:  nbconvert failed: xelatex not found on PATH", download the notebook as HTML instead. Instructions for installing xelatex are at https://nbconvert.readthedocs.io/en/latest/install.html#installing-tex.

![](navigate.png)

---
## 5. Data Sources and References <a id='sources'></a>

[1] File, Thom, "Who Votes? Congressional Elections and the American Electorate: 1978-2014," Population Characteristics, p20-577, U.S. Census Bureau, Washington, DC, 2015  https://www.census.gov/content/dam/Census/newsroom/c-span/2015/cspan_voting.pdf  [Accessed: September 1, 2020].

[2]  CircleAdrian, CC BY-SA 3.0, https://commons.wikimedia.org/w/index.php?curid=53252605 [Accessed: September 1, 2020].

[3] File, Thom, "Characteristics of Voters in the Presidential Election of 2016," Population Characteristics, p20-582, U.S. Census Bureau, Washington, DC, 2018 https://www.census.gov/content/dam/Census/library/publications/2018/demo/P20-582.pdf [Accessed: September 1, 2020].

[4] "Gender Differences in Voter Turnout," Center for American Women and Politics, Eagleton Institute of Politics, Rutgers University; https://cawp.rutgers.edu/sites/default/files/resources/genderdiff.pdf [Accessed: September 1, 2020].

[5] File, Thom, "Voting in America: A Look at the 2016 Presidential Election," US Census Bureau, Washington, DC, 2018  https://www.census.gov/newsroom/blogs/random-samplings/2017/05/voting_in_america.html [Accessed: September 1, 2020].


<br>

Notebook developed by: Susan Wang



---

<div class="alert alert-warning">

**Tip:** To make sure that all of the interactive widgets appear, click "Kernel" in the top bar, select "Restart & Run All", and then confirm "Restart and Run All Cells".
</div>