# Week 1 Lab: Using Python to Simulate Hypothetical Elections

Welcome to our class! We are excited to provide an introduction to how technology and data can be used for political analysis. 

Each week, you will complete a lab like this one to reinforce technical concepts and see how they apply to political topics. Hands-on practice is the best way to grasp technical concepts, so completing labs is important for your conceptual knowledge.

This class will introduce you to Python. This coding language is one of the most popular languages for learning to program and will help lay the foundations for future programming in classes such as Data 8 or CS61A.

Labs are important and make up 5% of the class grade. They also count towards the "effort" category. If you have any questions about anything covered in the labs or lecture, please feel free to reach out to us! Additionally, because data science and technical classes are collaborative in nature, we encourage you to collaborate with your classmates. However, there are limits: do not share code or directly provide answers to others.

### Today's Lab—Using Python to Simulate Hypothetical Elections

In today's lab, you will:

1. Practice writing and evaluating expressions in Python. 
2. Understand what functions are and how they can be used to make code more efficient.
3. Learn what different parts of code do.

This will be in the context of this week's political topic: elections.

Note: In some portions of the lab, there will be ellipses (...) that you will replace with code. Please don't hesitate to ask one of the facilitators if you have any questions.

#### A Note on Documentation

In [None]:
# This is a single-line comment. To create one, use the "#" symbol.

"""
This is a multi-line comment. Comments can describe what the code you are writing does, what errors
(or bugs) you are running into, and so on. To create a single-line comment, use the "#" symbol. Multi-line
comments are wrapped in three double-quote characters and are used primarily for documentation. The code
below imports data science packages that will help with the analysis conducted in this lab. Please try to
add a comment to any code that you write so that we can understand what the function or algorithm is
supposed to do.
"""

An important principle in Computer Science is to not reinvent the wheel. To the best of your ability, don't repeat work that has already been done, either by you or by other developers. That does not mean copying other people's code directly. One way to make use of other people's code is to install and use packages known as *libraries*. To use these libraries, we must first **import** them, which is what you see below.

In [None]:
from datascience import * # this is the library developed and used by Data 8 
import numpy as np
import random

Note that the asterisk (\*) in the first line means to import everything in that library.
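To see how the different import styles change the way you refer to a name, here is a minimal sketch. It uses Python's built-in `math` module purely for illustration (it is not otherwise part of this lab), and the variable names are our own.

```python
import math as m            # alias style, like "import numpy as np"
from math import sqrt, pi   # import specific names; "*" would import every public name instead

alias_result = m.sqrt(16)   # with an alias, we need the "m." prefix
direct_result = sqrt(16)    # imported directly, so no prefix is needed

print(alias_result, direct_result)
```

Both calls run the same function; only the way we reach it differs.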

## 1. The Electoral College

"The Electoral College consists of 538 electors. A majority of 270 electoral votes is required to elect the President. Electoral votes are allocated among the states based on the Census. Every state is allocated a number of votes equal to the number of senators and representatives in its U.S. Congressional delegation—two votes for its senators in the U.S. Senate plus a number of votes equal to the number of its members in the U. S. House of Representatives." 

-National Archives and Records Administration
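As a quick check of the numbers in the quote, the arithmetic can be sketched as follows. The 435 House seats and the 3 electors for the District of Columbia are not stated in the quote; we supply them here as assumptions so that the total works out.

```python
senators = 100          # 2 per state x 50 states
representatives = 435   # assumption: total number of seats in the U.S. House
dc_electors = 3         # assumption: the District of Columbia's allocation

total_electors = senators + representatives + dc_electors
majority = total_electors // 2 + 1   # smallest whole number greater than half

print(total_electors, majority)
```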

One state constantly in Electoral College news is Florida. Following the 2000 election, Florida made headlines not only for its election processes, but also because the Electoral College determined the presidency with a result that diverged from the popular vote.

The next questions use the Electoral College allocation that applied to the 2016 presidential election, which is based on the 2010 census. The data can be found [here](https://www.archives.gov/federal-register/electoral-college/allocation.html).

In [None]:
# This code is creating two arrays and then joining them together as a table. Don't worry about tables for right
# now, we'll learn more about them in the next couple of weeks. Basically, two arrays have been joined together 
# to create a table to better visualize the electoral college. 

allocated_reps_states = np.array(["Alabama", "Alaska", "Arizona", "Arkansas", "California", "Colorado", 
                            "Connecticut", "Delaware", "District of Columbia", "Florida", "Georgia", 
                            "Hawaii", "Idaho", "Illinois", "Indiana", "Iowa", "Kansas", "Kentucky", 
                            "Louisiana", "Maine", "Maryland", "Massachusetts", "Michigan", "Minnesota",
                            "Mississippi", "Missouri", "Montana", "Nebraska", "Nevada", "New Hampshire", 
                            "New Jersey", "New Mexico", "New York", "North Carolina", "North Dakota",
                            "Ohio", "Oklahoma", "Oregon", "Pennsylvania", "Rhode Island", "South Carolina",
                            "South Dakota", "Tennessee", "Texas", "Utah", "Vermont", "Virginia", "Washington", 
                            "West Virginia", "Wisconsin", "Wyoming"])

allocated_state_numbers = np.array([9, 3, 11, 6, 55, 9, 
                                    7, 3, 3, 29, 16,
                                    4, 4, 20, 11, 6, 6, 8,
                                    8, 4, 10, 11, 16, 10, 
                                    6, 10, 3, 5, 6, 4,
                                    14, 5, 29, 15, 3,
                                    18, 7, 7, 20, 4, 9, 
                                    3, 11, 38, 6, 3, 13, 12, 
                                    5, 10, 3])

# Note: each number in allocated_state_numbers corresponds to a value in allocated_reps_states at the same
# position

allocated_states_2016 = Table().with_columns("States", allocated_reps_states,
                                             "Numbers of Electoral College Representatives", 
                                             allocated_state_numbers)
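The idea of two arrays that line up by position can be tried out on a much smaller scale. The sketch below uses shortened plain Python lists with names of our own choosing (`sample_states` and `sample_votes`), not the full arrays from the cell above.

```python
# A shortened, illustrative version of the two arrays (plain lists here)
sample_states = ["Alabama", "Alaska", "Arizona"]
sample_votes = [9, 3, 11]

# zip pairs up the items that sit at the same position in each list
for state, votes in zip(sample_states, sample_votes):
    print(state, "has", votes, "electoral votes")

# Position 2 in one list matches position 2 in the other
third_pair = (sample_states[2], sample_votes[2])
```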

In [None]:
allocated_states_2016.show()

Using two functions called $max()$ and $min()$, we can find the highest and lowest numbers of Electoral College representatives, and from those the states that hold them.
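If you have not used these two functions before, here is a small sketch of how they behave on a short list. The values are illustrative and are not the full table.

```python
sample_votes = [9, 3, 11, 6]   # illustrative values only

largest = max(sample_votes)    # the biggest value in the list
smallest = min(sample_votes)   # the smallest value in the list

print(largest, smallest)
```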

#### 1.1 What is the highest number of electors among all the states? Which state(s) is/are this?

In [None]:
max_amount = ...

In [None]:
state_with_most = allocated_states_2016.where("Numbers of Electoral College Representatives", 
                                              are.equal_to(max_amount))

YOUR ANSWER HERE: 

In [None]:
assert max_amount == max(allocated_state_numbers)
assert state_with_most.column("States").item(0) == 'California'

#### 1.2 What is the lowest number of electors among all the states? Which state(s) is/are this?

In [None]:
min_amount = ...

In [None]:
state_with_least = allocated_states_2016.where("Numbers of Electoral College Representatives", 
                                               are.equal_to(min_amount))

YOUR ANSWER HERE: 

In [None]:
assert min_amount == min(allocated_state_numbers)
np.testing.assert_array_equal(state_with_least['States'], make_array('Alaska', 'Delaware', 
                                                                     'District of Columbia', 
                                                                     'Montana', 'North Dakota',
                                                                     'South Dakota', 'Vermont',
                                                                     'Wyoming'))

#### 1.3 Given that you need 270 (of 538) Electoral College votes to win a presidential election, determine one possible set of states that a candidate could win in order to get to at least 270 votes. 

In [None]:
def to_win_presidential(states, num_reps, goal=270):
    total = ...
    states_to_win = np.zeros(shape=0)
    for ...: 
        ...
    return states_to_win, total
            
states, total = to_win_presidential(allocated_reps_states, allocated_state_numbers)

YOUR ANSWER HERE: 

In [None]:
np.testing.assert_array_equal(states, make_array("Alabama", "Alaska", "Arizona", "Arkansas", 
                                                 "California", "Colorado", "Connecticut", "Delaware", 
                                                 "District of Columbia", "Florida", "Georgia", 
                                                 "Hawaii", "Idaho", "Illinois", "Indiana", "Iowa", 
                                                 "Kansas", "Kentucky", "Louisiana", "Maine", "Maryland", 
                                                 "Massachusetts", "Michigan", "Minnesota", "Mississippi"))
assert total == 275

## 2. The Presidential Hunger Games

Assume that we are in an apocalyptic society where we can only "reap" from a pool of past presidents. We've created a fairly trivial election voting system for you below. 

In [None]:
list_of_presidents = make_array("George Washington", "Thomas Jefferson", "Abraham Lincoln", 
                                "Theodore Roosevelt", "Dwight D. Eisenhower", "John F. Kennedy", 
                                "Ronald Reagan", "Bill Clinton", "George W. Bush", "Barack Obama")

def presidential_hunger_games(presidents):
    # randrange excludes the upper bound, so every index drawn is a valid position in the array
    x, y = random.randrange(len(presidents)), random.randrange(len(presidents))
    z = (presidents[x], presidents[y])
    print("The candidates are " + presidents[x], "and " + presidents[y] + ". " + 
                 "The winner is " + random.choice(z) + ".")

presidential_hunger_games(list_of_presidents)

#### 2.1 Describe what is going on in the election voting system that we have provided above. In a few sentences, tell us how it works.

YOUR ANSWER HERE: 

#### 2.2 What are some problems with this type of system? Provide examples to describe any problems that exist.

YOUR ANSWER HERE: 

## 3. Casting a Deciding Vote

Let's get started. We'll run through some examples of the basic syntax that you'll be using frequently throughout this course. All of this will be geared toward answering a question about how many times a voter (you) will be the deciding vote. 

More specifically, this is the scenario we are trying to figure out: In your small town, you are one of 101 voters choosing among 3 candidates for mayor. Everyone besides you will vote randomly for 1 of the 3 candidates. If it helps, think of yourself as the 101st voter. The candidate with the most votes wins (a tie means no winner). What are the odds that you cast the deciding vote that propels your preferred candidate to victory? To estimate this, run through the election many times (e.g. 10,000). Each run-through should determine whether you would be the deciding vote. Once the entire simulation has completed, compute the percentage of times you are the deciding vote. 

One of the toughest parts of solving problems is knowing where to start. Since this may be the first problem you've ever worked on, we'll get you started. First, let's define three variables to represent each candidate's vote total. 

In [None]:
# DELETE THE ELLIPSIS AND FILL IN WITH APPROPRIATE VALUE 
candidate1 = ... 
candidate2 = ...
candidate3 = ...

Now, we have three variables ($candidate1$, $candidate2$, and $candidate3$), each representing the total number of votes that candidate has received. Next, we should figure out the voting process. Each of the voters in the town, except you, votes randomly. One of the libraries that we imported in the beginning was **random**, which will help us generate random numbers as we need them. 

In [None]:
# In the next two lines of code, we generate 5 random numbers (each one is 1, 2, or 3) and print each of them
for _ in range(5): 
    print(random.randint(1, 3))
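One detail worth knowing is that `random.randint(a, b)` can return either endpoint, so `random.randint(1, 3)` produces 1, 2, or 3. The sketch below tallies many draws to show this. The dictionary name `counts` and the choice of 3,000 draws are ours, and your exact tallies will differ from run to run.

```python
import random

counts = {1: 0, 2: 0, 3: 0}   # one tally per possible result
for _ in range(3000):
    draw = random.randint(1, 3)        # both 1 and 3 are possible results
    counts[draw] = counts[draw] + 1    # add one to the tally for this result

print(counts)
```

Each of the three results should show up roughly a third of the time.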

Let's do a quick review of two important syntax items in Python before moving on. 

**For loop**: Loops, in general, are fantastic if we want to repeat a set of instructions a whole bunch of times. For loops, specifically, are best used when we know how many times to repeat. While loops, the other type of loop, can also be used to repeat a specific number of times but are more commonly used with a condition. 

**If-Elif-Else**: If-elif-else statements allow us to do conditional work. If something is true, we do something specific. If it's false, we follow a different instruction. 
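Here is a small sketch of all three pieces of syntax side by side. The numbers and variable names are made up for illustration (for example, no real election has every state worth 55 votes).

```python
# For loop: repeat a fixed number of times
total = 0
for number in range(1, 6):      # number takes the values 1, 2, 3, 4, 5
    total = total + number

# While loop: keep repeating for as long as a condition is true
votes_needed = 270
votes_so_far = 0
states_won = 0
while votes_so_far < votes_needed:
    votes_so_far = votes_so_far + 55   # hypothetical: every state is worth 55 votes
    states_won = states_won + 1

# If-elif-else: run exactly one branch, depending on which condition is true
if total > 15:
    label = "more than 15"
elif total == 15:
    label = "exactly 15"
else:
    label = "less than 15"

print(total, states_won, label)
```

Notice that the for loop knows in advance that it will run 5 times, while the while loop simply keeps going until the vote total reaches the goal.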

In [None]:
# building off the above cell, let's create the voting process

for _ in range(100): # this allows us to mimic 100 people doing some similar action
    vote = ... # remember, voters choose candidates they vote for randomly
    if vote == ...: 
        # HERE WE SHOULD CHANGE THE VOTE TOTAL FOR THE CORRECT CANDIDATE
    elif vote == ...: 
        # HERE WE SHOULD CHANGE THE VOTE TOTAL FOR THE CORRECT CANDIDATE
    elif vote == ...: # NOTE: it's totally fine to just write "else:" here
        # HERE WE SHOULD CHANGE THE VOTE TOTAL FOR THE CORRECT CANDIDATE

Great! Now, it's your turn to vote. It doesn't actually matter much who you vote for. The more important idea is to recognize the situations in which you would be the deciding vote. You could be the deciding vote when two candidates are tied and those tied candidates each received more votes than the third candidate. Your vote can then break the tie, although it only does so if you vote for one of the tied candidates. By contrast, let's say Candidate 1 receives 25 votes, Candidate 2 receives 25 votes, and Candidate 3 receives 50 votes. You are not a deciding vote because no matter which candidate you vote for, Candidate 3 will still win. To keep the code from getting overly complicated, we'll reorganize some of the code you've written above. 

In [None]:
votecount = [candidate1, candidate2, candidate3]
if votecount.count(max(votecount)) == 1: # if the max number of votes shows up only once
    print("You are not the deciding vote!")
elif votecount.count(max(votecount)) == 2: # if the max number of votes shows up twice
    print("You are the deciding vote!")
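To check this logic against the example from the text, here is a minimal sketch. The function name `could_be_deciding` and the second vote count are hypothetical and chosen only for illustration.

```python
def could_be_deciding(votecount):
    """Return True when exactly two candidates share the highest vote total."""
    return votecount.count(max(votecount)) == 2

example_from_text = [25, 25, 50]   # Candidate 3 leads outright
tied_at_top = [40, 40, 20]         # hypothetical: Candidates 1 and 2 tie for first

print(could_be_deciding(example_from_text))
print(could_be_deciding(tied_at_top))
```

In the first case the tie is for second place, so your vote cannot change the winner. In the second case the tie is for first place, so your vote can.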

For one election, we've determined if you are or aren't the deciding vote. Now, we want to compute how many times we are the deciding vote by creating a simulation of 10,000 elections. We'll build on a lot of the code that you've already written to demonstrate how we can accomplish this. 

In [None]:
def simulation(sim_num=10000): # this is how we create a function, which helps to make sure we don't repeat code
    deciding_vote = 0 # this variable will be a tracker of the number of times you are the deciding vote
    for _ in range(sim_num): 
        votecount = [0, 0, 0] # each item in the list represents a candidate
        for ...: 
            vote = ... # this should be almost exactly the same as it was above
            votecount[vote] = votecount[vote] + 1 # this is slightly different from how you did it originally
        cand1, cand2, cand3 = votecount[0], votecount[1], votecount[2]
        if ...: 
            ...
    ...
    return ...

In [None]:
election_result = simulation()
election_result

#### 3.1 Running through the simulation, what is the proportion of times where you cast the deciding vote?

YOUR ANSWER HERE: 

#### 3.2 What conditions need to be true for you to be the deciding vote?

YOUR ANSWER HERE: 

#### 3.3 How does this number change as you run the simulation multiple times? What is the factor accounting for the change?

YOUR ANSWER HERE: 

Congratulations, you've reached the end of this lab! While this lab is graded on effort, we still want to make sure that all of you receive a grade for this assignment. To submit, go to datahub.berkeley.edu and find your file. Click the checkbox next to the file. If it is green, press "Shutdown". If it isn't lit up, press "Download". After you download it, please rename the file to follow this format, "[YOUR NAME] WEEK 1 LAB.ipynb", and submit it to the correct bCourses assignment page. 