# Data 80A/180A Data Science for Everyone

# Homework 10: Bootstrap, Confidence Intervals, and A/B Test

### 68 Points

## Due Friday, November 12 by 11:59PM

**Reading**: 
* [Chap 12 Comparing Two Samples](https://inferentialthinking.com/chapters/12/Comparing_Two_Samples.html)
* [Chap 13 Estimation](https://www.inferentialthinking.com/chapters/13/Estimation)

Please complete this notebook by filling in the cells provided. Before you begin, execute the following cell to load the provided tests. Each time you start your server, you will need to execute this cell again to load the tests.

For all problems that require written explanations, you **must** provide your answer in the designated space. Moreover, throughout this homework and all future ones, please do not reassign variables anywhere in the notebook! For example, if you use `max_temperature` in your answer to one question, do not reassign it later on.

**Note: This homework has hidden tests on it. That means even though tests may say 100% passed, it doesn't mean your final grade will be 100%. We will be running more tests for correctness once everyone turns in the homework.**

In [None]:
# Don't change this cell; just run it. 

import numpy as np
from datascience import *

# These lines do some fancy plotting magic.
import matplotlib
%matplotlib inline
import matplotlib.pyplot as plt
plt.style.use('fivethirtyeight')
import warnings
warnings.simplefilter('ignore', FutureWarning)

import otter
grader = otter.Notebook()

## 1. Thai Restaurants in Oakland

Steph and Adam are trying to see what the best Thai restaurant in Oakland is. They survey 1,500 Oakland residents selected uniformly at random, and ask each resident what Thai restaurant is the best. (*Note: This data is fabricated for the purposes of this homework.*) The choices of Thai restaurants are Lucky House, Imm Thai, Thai Temple, and Thai Basil. After compiling the results, Steph and Adam release the following percentages from their sample:

|Thai Restaurant  | Percentage|
|:------------:|:------------:|
|Lucky House | 8% |
|Imm Thai | 53% |
|Thai Temple | 25% |
|Thai Basil | 14% |

These percentages represent a uniform random sample of the population of Oakland residents. We will attempt to estimate the corresponding *parameters*, or the percentage of the votes that each restaurant will receive from the population (i.e. all Oakland residents). We will use confidence intervals to compute a range of values that reflects the uncertainty of our estimates.
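The idea of a single bootstrap resample can be sketched in plain Python. This is only an illustration, not the `datascience` code the questions below ask for: the list `sample_votes` is a hypothetical reconstruction of 1,500 votes from the percentages in the table above, and the helper name `one_bootstrap_percentage` is made up for this sketch.

```python
import random

def one_bootstrap_percentage(sample, target, rng):
    """Resample `sample` with replacement (same size) and return
    the percentage of the resample equal to `target`."""
    resample = rng.choices(sample, k=len(sample))
    return 100 * resample.count(target) / len(resample)

# Hypothetical sample: 8%, 53%, 25%, and 14% of 1,500 votes.
sample_votes = (["Lucky House"] * 120 + ["Imm Thai"] * 795
                + ["Thai Temple"] * 375 + ["Thai Basil"] * 210)

rng = random.Random(0)  # seeded so the sketch is repeatable
estimates = [one_bootstrap_percentage(sample_votes, "Imm Thai", rng)
             for _ in range(200)]
```

Each entry of `estimates` is one plausible value of the sample percentage. The spread of these values is what a bootstrap confidence interval summarizes.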

The table `votes` contains the results of Steph and Adam's survey.

In [None]:
# Just run this cell
votes = Table.read_table('votes.csv')
votes

**Question 1.1. (3 pts)** Complete the function `one_resampled_percentage` below. It should return Imm Thai's **percentage** of votes after taking the original table (`tbl`) and performing one bootstrap sample of it. Reminder that a percentage is between 0 and 100. 

*Note:* `tbl` will always be in the same format as `votes`.

In [None]:
def one_resampled_percentage(tbl):
    ...

one_resampled_percentage(votes)

In [None]:
grader.check("q1_1")

**Question 1.2. (3 pts)** Complete the `percentages_in_resamples` function such that it simulates and returns an array of 2500 bootstrapped estimates of the percentage of voters who will vote for Imm Thai. You should use the `one_resampled_percentage` function you wrote above. 

 **Important Note:** There are no public tests for this question, so the autograder cell below will always return 100% passed. 


In [None]:
def percentages_in_resamples():
    percentage_imm = make_array()
    ...
   

In [None]:
grader.check("q1_2")

In the following cell, we run the function you just defined, `percentages_in_resamples`, and create a histogram of the calculated statistic for the 2500 bootstrap estimates of the percentage of voters who voted for Imm Thai. 

*Note:* This might take a few seconds to run.

In [None]:
resampled_percentages = percentages_in_resamples()
Table().with_column('Estimated Percentage', resampled_percentages).hist("Estimated Percentage")

**Question 1.3. (2 pts)** Using the array `resampled_percentages`, find the values at the two edges of the middle 95% of the bootstrapped percentage estimates. (Compute the lower and upper ends of the interval, named `imm_lower_bound` and `imm_upper_bound`, respectively.)

*Hint:* If you are stuck on this question, try looking over [Chapter 13](https://inferentialthinking.com/chapters/13/Estimation.html) of the textbook.
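As a rough illustration of what "the middle 95%" means, here is a plain-Python sketch using the textbook's definition of a percentile (the smallest value that is at least as large as p% of the values). The helper names and the toy data are assumptions made for this example. They are not part of the assignment or of the `datascience` library.

```python
import math

def percentile_rank(p, values):
    """Smallest value that is at least as large as p% of `values`."""
    ordered = sorted(values)
    k = max(math.ceil(p * len(ordered) / 100) - 1, 0)
    return ordered[k]

def middle_interval(values, level=95):
    """Endpoints of the middle `level`% of `values`."""
    tail = (100 - level) / 2
    return percentile_rank(tail, values), percentile_rank(100 - tail, values)

# Hypothetical "estimates": the integers 1 through 200.
toy = list(range(1, 201))
lower, upper = middle_interval(toy, 95)
```

With 200 values, 2.5% falls in each tail, so the interval excludes the 4 smallest values and the 5 largest.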

In [None]:
imm_lower_bound = ...
imm_upper_bound = ...
print(f"Bootstrapped 95% confidence interval for the percentage of Imm Thai voters in the population: [{imm_lower_bound:.2f}, {imm_upper_bound:.2f}]")

In [None]:
grader.check("q1_3")

**Question 1.4. (3 pts)** The survey results seem to indicate that Imm Thai is beating all the other Thai restaurants combined among voters. We would like to use confidence intervals to determine a range of likely values for Imm Thai's true lead over all the other restaurants combined. The calculation for Imm Thai's lead over Lucky House, Thai Temple, and Thai Basil combined is:

$$\text{Imm Thai's \% of the vote} - (100\% - \text{Imm Thai's \% of the vote})$$

For example, if Imm Thai's % of vote is 54, then its lead over all others combined is: $$54 - (100-54) = 54 - 46 = 8$$
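The arithmetic above can be checked with a small helper. The function name `lead_over_rest` is hypothetical and is used only for this illustration.

```python
def lead_over_rest(percentage):
    """Lead (in percentage points) of one option over all others combined."""
    return percentage - (100 - percentage)

example_lead = lead_over_rest(54)   # the worked example above
tied_lead = lead_over_rest(50)      # exactly half the vote
trailing_lead = lead_over_rest(45)  # less than half the vote
```

Note that the lead is zero at exactly 50% and negative below it.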

Define the function `one_resampled_difference` that returns the value of Imm Thai's percentage lead over Lucky House, Thai Temple, and Thai Basil combined from one bootstrap sample of `tbl`. 

*Hint:* Imm Thai's lead can be negative. **Be sure to use percentages, not proportions, for this question!**

In [None]:
def one_resampled_difference(tbl):
    bootstrap = ...
    imm_percentage = ...
    ...
    
    

In [None]:
grader.check("q1_4")

In [None]:
# check
one_resampled_difference(votes)

**Question 1.5. (3 pts)** Write a function called `leads_in_resamples` that finds 2500 bootstrapped estimates (the result of calling `one_resampled_difference`) of Imm Thai's lead over Lucky House, Thai Temple, and Thai Basil combined. Plot a histogram of the resulting samples. 

*Hint:* If you see an error involving “NoneType”, consider what components a function needs to have. 

In [None]:
def leads_in_resamples():
    ...


sampled_leads = leads_in_resamples()
Table().with_column('Estimated Lead', sampled_leads).hist("Estimated Lead")

**Question 1.6. (2 pts)** Use the simulated data in `sampled_leads` from Question 1.5 to compute an approximate 95% confidence interval for Imm Thai's true lead over Lucky House, Thai Temple, and Thai Basil combined. 

In [None]:
diff_lower_bound = ...
diff_upper_bound = ...
print("Bootstrapped 95% confidence interval for Imm Thai's true lead over Lucky House, Thai Temple, and Thai Basil combined: [{:f}%, {:f}%]".format(diff_lower_bound, diff_upper_bound))

In [None]:
grader.check("q1_6")

## 2. Interpreting Confidence Intervals 

The staff computed the following 95% confidence interval for the percentage of Imm Thai voters: 

$$[50.53, 55.53]$$

(Your answer may have been a bit different due to randomness; that doesn't mean it was wrong!)

**Question 2.1. (5 pts)** The staff also created 70%, 90%, and 99% confidence intervals from the same sample, but we forgot to label which confidence interval represented which percentages! First, match each confidence level (70%, 90%, 99%) with its corresponding interval in the cell below (e.g. __ % CI: [52.1, 54] $\rightarrow$ replace the blank with one of the three confidence levels). **Then**, explain your thought process and how you came up with your answers. 

The intervals are below:

* [50.03, 55.94]
* [52.1, 54]
* [50.97, 54.99]


_Type your answer here, replacing this text._

**Question 2.2. (2 pts)** Suppose we produced 6,000 new samples (each one a uniform random sample of 1,500 voters/residents) from the population and created a 95% confidence interval from each one. Roughly how many of those 6,000 intervals do you expect will actually contain the true percentage of the population? 

Assign your answer to `true_percentage_intervals`.

In [None]:
true_percentage_intervals = ...

In [None]:
grader.check("q2_2")

Recall the second bootstrap confidence interval you created, which estimated Imm Thai's lead over Lucky House, Thai Temple, and Thai Basil combined. Among voters in the sample, Imm Thai's lead was 6%. In our computation the 95% confidence interval for the true lead (in the population of all voters) was:

$$[1.2, 11.2]$$

Suppose we are interested in testing a simple yes-or-no question:

> "Is the percentage of votes for Imm Thai equal to the percentage of votes for Lucky House, Thai Temple, and Thai Basil combined?"

Our null hypothesis is that the percentages are equal, or equivalently, that Imm Thai's lead is exactly 0. Our alternative hypothesis is that Imm Thai's lead is not equal to 0.  In the questions below, don't compute any confidence interval yourself - use only our computed 95% confidence interval.
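The general link between a confidence interval and a two-sided test can be sketched as follows. This is a minimal sketch that assumes the P-value cutoff matches the confidence level (for example, a 5% cutoff with a 95% interval). The function name and the intervals are hypothetical and are not the staff's numbers.

```python
def decision_from_interval(lower, upper, null_value):
    """Decide a two-sided test from a confidence interval whose level
    matches the cutoff (e.g. a 95% interval with a 5% cutoff)."""
    if lower <= null_value <= upper:
        return "fail to reject"
    return "reject"

# Hypothetical intervals for some parameter, tested against a null value of 0.
first = decision_from_interval(2.0, 9.0, 0)
second = decision_from_interval(-1.5, 4.0, 0)
```

The sketch says nothing about cutoffs that do not match the interval's confidence level. That case needs separate reasoning.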

**Question 2.3. (2 pts)** Say we use a 5% p-value cutoff. Do we reject the null, fail to reject the null, or are we unable to tell using the staff's confidence interval?

Assign `restaurants_equal` to the number corresponding to the correct answer.

1. Reject the null / Data is consistent with the alternative hypothesis
2. Fail to reject the null / Data is consistent with the null hypothesis
3. Unable to tell using our computed confidence interval

*Hint:* Consider the relationship between the p-value cutoff and confidence. If you're confused, take a look at [Chap 13.4](https://inferentialthinking.com/chapters/13/4/Using_Confidence_Intervals.html) of the textbook.

In [None]:
restaurants_equal = ...

In [None]:
grader.check("q2_3")

**Question 2.4. (2 pts)** What if, instead, we use a P-value cutoff of 1%? Do we reject the null, fail to reject the null, or are we unable to tell using the staff's confidence interval? 

Assign `cutoff_one_percent` to the number corresponding to the correct answer.

1. Reject the null / Data is consistent with the alternative hypothesis
2. Fail to reject the null / Data is consistent with the null hypothesis
3. Unable to tell using our computed confidence interval

In [None]:
cutoff_one_percent = ...

In [None]:
grader.check("q2_4")

**Question 2.5. (2 pts)** What if we use a p-value cutoff of 10%? Do we reject, fail to reject, or are we unable to tell using our confidence interval? 

Assign `cutoff_ten_percent` to the number corresponding to the correct answer.

1. Reject the null / Data is consistent with the alternative hypothesis
2. Fail to reject the null / Data is consistent with the null hypothesis
3. Unable to tell using our computed confidence interval


In [None]:
cutoff_ten_percent = ...

In [None]:
grader.check("q2_5")

##  Crime and Penalty

## 3.  Murder Rates

Punishment for crime has many [philosophical justifications](https://plato.stanford.edu/entries/legal-punishment/).  An important one is that fear of punishment may *deter* people from committing crimes.

In the United States, some jurisdictions execute people who are convicted of particularly serious crimes, such as murder.  This punishment is called the *death penalty* or *capital punishment*.  The death penalty is controversial, and deterrence has been one focal point of the debate.  There are other reasons to support or oppose the death penalty, but in this homework we'll focus on deterrence.

The key question about deterrence is:

> Does instituting a death penalty for murder actually reduce the number of murders?

You might have a strong intuition in one direction, but the evidence turns out to be surprisingly complex.  Different sides have variously argued that the death penalty has no deterrent effect and that each execution prevents 8 murders, all using statistical arguments!  We'll try to come to our own conclusion.

#### The data

The main data source for this homework is a [paper](http://cjlf.org/deathpenalty/DezRubShepDeterFinal.pdf) by three researchers, Dezhbakhsh, Rubin, and Shepherd.  The dataset contains rates of various violent crimes for every year 1960-2003 (44 years) in every US state.  The researchers compiled the data from the FBI's Uniform Crime Reports.

Since crimes are committed by people, not states, we need to account for the number of people in each state when we're looking at state-level data.  Murder rates are calculated as follows:

$$\text{murder rate for state X in year Y} = \frac{\text{number of murders in state X in year Y}}{\text{population in state X in year Y}} \times 100{,}000$$

(Murder is rare, so we multiply by 100,000 just to avoid dealing with tiny numbers.)
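As a quick sanity check of the formula, here is a small sketch with made-up numbers. The function name `murder_rate` and the counts are hypothetical and do not come from the dataset.

```python
def murder_rate(murders, population):
    """Murders per 100,000 residents."""
    return murders / population * 100_000

# Hypothetical state-year: 50 murders among 1,000,000 residents.
rate = murder_rate(50, 1_000_000)
```

Here `rate` is about 5 murders per 100,000 residents, which is easier to read than the raw proportion 0.00005.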

In [None]:
murder_rates = Table.read_table('crime_rates.csv').select('State', 'Year', 'Population', 'Murder Rate')
murder_rates.set_format("Population", NumberFormatter)

Murder rates vary over time, and different states exhibit different trends. The rates in some states change dramatically from year to year, while others are quite stable. Let's first take a look at murder rates for Alaska.

In [None]:
ak = murder_rates.where('State', 'Alaska').drop('State', 'Population').relabeled(1, 'Murder rate in Alaska')
ak

Next, let's look at the murder rates for Minnesota.

In [None]:
mn = murder_rates.where('State', 'Minnesota').drop('State', 'Population').relabeled(1, 'Murder rate in Minnesota')
mn

Let's plot these two rates. Before plotting, we need to create a table.

**Question 3.1. (2 pts)**  Create the table `ak_mn` with two columns of murder rates, in addition to a column of years. This table will have the following structure:

| Year | Murder rate in Alaska | Murder rate in Minnesota |
|------|-----------------------|--------------------------|
| 1960 | 10.2                  | 1.2                      |
| 1961 | 11.5                  | 1                        |
| 1962 | 4.5                   | 0.9                      |

<center>... (41 rows omitted)</center>

In [None]:
# Fill in this line to make a table like the one pictured above.
ak_mn = ...
ak_mn

In [None]:
grader.check("q3_1")

**Question 3.2. (2 pts)** Using the table `ak_mn`, draw a line plot with years on the horizontal axis and murder rates on the vertical axis. Include two lines: one for Alaska murder rates and one for Minnesota murder rates. 

*Hint:* Table 1 displays the expected output.

In [None]:
# Draw your line plot here
 ...


Table 1:
<img src="Q3.2_table.PNG">    

Now what about the murder rates of other states? Say, for example, California and New York? Run the cell below to plot the murder rates of different pairs of states.

In [None]:
# Compare the murder rates of any two states by choosing them from the dropdown menus

from ipywidgets import interact, interactive, fixed, interact_manual
import ipywidgets as widgets

def state(state1, state2):
    state1_table = murder_rates.where('State', state1).drop('State', 'Population').relabeled(1, 'Murder rate in {}'.format(state1))
    state2_table = murder_rates.where('State', state2).drop('State', 'Population').relabeled(1, 'Murder rate in {}'.format(state2))
    s1_s2 = state1_table.join('Year', state2_table)
    s1_s2.plot('Year')
    plt.show()

states_array = murder_rates.group('State').column('State')

_ = interact(state,
             state1=widgets.Dropdown(options=list(states_array),value='California'),
             state2=widgets.Dropdown(options=list(states_array),value='New York')
            )

## 4. The Death Penalty

Some US states have the death penalty, and others don't, and laws have changed over time. In addition to changes in murder rates, we will also consider whether the death penalty was in force in each state and each year.

Using this information, we would like to investigate how the presence of the death penalty affects the murder rate of a state.

**Question 4.1. (3 pts)** We want to know whether the death penalty *causes* a change in the murder rate.  Why is it not sufficient to compare murder rates in places and times when the death penalty was in force with places and times when it wasn't?

*Write your answer here, replacing this text.*

### A Natural Experiment

In order to attempt to investigate the causal relationship between the death penalty and murder rates, we're going to take advantage of a *natural experiment*.  A natural experiment happens when something other than experimental design applies a treatment to one group and not to another (control) group, and we have some hope that the treatment and control groups don't have any other systematic differences.

Our natural experiment is this: in 1972, a Supreme Court decision called *Furman v. Georgia* banned the death penalty throughout the US.  Suddenly, many states went from having the death penalty to not having the death penalty.

As a first step, let's see how murder rates changed before and after the court decision.  We'll define the test as follows:

> **Population:** All the states that had the death penalty before the 1972 abolition.  (There is no control group for the states that already lacked the death penalty in 1972, so we must omit them.)  This includes all US states **except** Alaska, Hawaii, Maine, Michigan, Wisconsin, and Minnesota.

> **Treatment group:** The states in that population, in 1973 (the year after 1972).

> **Control group:** The states in that population, in 1971 (the year before 1972).

> **Null hypothesis:** Murder rates in 1971 and 1973 come from the same distribution.

> **Alternative hypothesis:** Murder rates were higher in 1973 than they were in 1971.

Our alternative hypothesis is related to our suspicion that murder rates increase when the death penalty is eliminated.  

**Question 4.2. (3 pts)** Should we use an A/B test to test these hypotheses? If yes, what is our "A" group and what is our "B" group?

*Write your answer here, replacing this text.*

The `death_penalty` table below describes whether each state allowed the death penalty in 1971.

In [None]:
non_death_penalty_states = make_array('Alaska', 'Hawaii', 'Maine', 'Michigan', 'Wisconsin', 'Minnesota')

def had_death_penalty_in_1971(state):
    """Returns True if the argument is the name of a state that had the death penalty in 1971."""
    # The implementation of this function uses a bit of syntax
    # we haven't seen before.  Just trust that it behaves as its
    # documentation claims.
    return state not in non_death_penalty_states

states = murder_rates.group('State').select('State')
death_penalty = states.with_column('Death Penalty', states.apply(had_death_penalty_in_1971, 0))
death_penalty

**Question 4.3. (3 pts)** Use the `death_penalty` and `murder_rates` tables to find murder rates in **1971** for states with the **death penalty** (that is, all states with Death Penalty set to True) before the abolition. Create a new table `preban_rates` that contains the same information as `murder_rates`, along with a column `Death Penalty` that contains booleans (`True` or `False`) describing if states had the death penalty in 1971.

*Hint:* Table 2 displays the expected output.

In [None]:
# States that had death penalty in 1971
preban_rates =  murder_rates....(...).where(...).where(...)
preban_rates

In [None]:
grader.check("q4_3")

Table 2:
<img src="Q4.3_table.PNG">    

Next, we create a table `postban_rates` that contains the same information as `preban_rates`, but for 1973 instead of 1971. `postban_rates` contains only the states found in `preban_rates`, so make sure your `preban_rates` table is correct.

In [None]:
# preban_rates table in 1973
states_with_penalty = preban_rates.column("State")
postban_rates = murder_rates.where("Year", 1973).where("State", are.contained_in(states_with_penalty))
postban_rates = postban_rates.with_column("Death Penalty", False)
postban_rates = postban_rates.sort("State")
postban_rates

In the next cell, we combine the two tables `preban_rates` and `postban_rates` to create a table `change_in_death_rates` that contains each state's population, murder rate, and whether or not that state had the death penalty for both 1971 and 1973. 

In [None]:
# combine the two tables preban_rates and postban_rates
preban_rates_copy = preban_rates.copy()
change_in_death_rates = preban_rates_copy.append(postban_rates)
change_in_death_rates

Run the cell below to view the distribution of murder rates during the pre-ban and post-ban time periods.

In [None]:
change_in_death_rates.hist('Murder Rate', group = 'Death Penalty')

**Question 4.4. (2 pts)** Create a table `rate_means` that contains the average murder rates for the states that had the death penalty and the states that didn't have the death penalty. It should have two columns: one indicating if the penalty was in place, and one that contains the average murder rate for each group.

*Hint:* Table 3 displays the expected output.

In [None]:
rate_means = ...
rate_means

In [None]:
grader.check("q4_4")

Table 3:
<img src="Q4.4_table.PNG">    

**Question 4.5. (3 pts)** We want to figure out if there is a difference between the distribution of murder rates in 1971 and 1973. Specifically, we want to test if murder rates were higher in 1973 than they were in 1971. 

What should the test statistic be? How does it help us differentiate whether the data supports the null and alternative? 

If you are in lab, confirm your answer with a lab instructor before moving on.


*Write your answer here, replacing this text.*

**Question 4.6. (2 pts)** Set `observed_difference` to the observed test statistic using the `rate_means` table.


In [None]:
observed_difference = ...
observed_difference

In [None]:
grader.check("q4_6")

**Question 4.7. (4 pts)** Given a table like `change_in_death_rates`, a group column `labels_col`, and a value column `values_col`, write a function that calculates the appropriate test statistic.

In [None]:
def find_test_stat(table, labels_col, values_col):
    ...
    
    
find_test_stat(change_in_death_rates, "Death Penalty", "Murder Rate")

In [None]:
grader.check("q4_7")

When we run a simulation for A/B testing, we resample by shuffling the labels of the original sample. If the null hypothesis is true and the murder rate distributions are the same, we expect that the difference in mean murder rates will not change much when the "Death Penalty" labels are shuffled.
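The shuffling step can be sketched in plain Python on a tiny made-up dataset. Everything here is an assumption for illustration: the labels `"A"` and `"B"`, the values, the helper names, and the choice of "group B mean minus group A mean" as the statistic. It is not the `datascience` code the questions below ask for.

```python
import random
from statistics import mean

def difference_of_means(labels, values):
    """Mean of the values labeled "B" minus mean of the values labeled "A"."""
    group_a = [v for lab, v in zip(labels, values) if lab == "A"]
    group_b = [v for lab, v in zip(labels, values) if lab == "B"]
    return mean(group_b) - mean(group_a)

def one_shuffled_statistic(labels, values, rng):
    """Shuffle a copy of the labels, keep the values fixed,
    and recompute the statistic."""
    shuffled = list(labels)
    rng.shuffle(shuffled)
    return difference_of_means(shuffled, values)

# Hypothetical data: five values per group.
labels = ["A"] * 5 + ["B"] * 5
values = [4.1, 5.0, 6.2, 3.8, 7.5, 4.9, 5.6, 6.8, 4.4, 8.1]

observed = difference_of_means(labels, values)
rng = random.Random(1)  # seeded so the sketch is repeatable
simulated = [one_shuffled_statistic(labels, values, rng) for _ in range(1000)]
```

Because shuffling breaks any real link between label and value, the entries of `simulated` show how large the statistic tends to be by chance alone.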

**Question 4.8. (6 pts)** Write a function `simulate_and_test_statistic` to compute one trial of our A/B test. Your function should run a simulation and return a test statistic.

Note: The test here is fairly lenient. If you have an issue with the following questions, take a look at your answer to 4.7. Specifically, make sure that you are taking the directionality of our alternative hypothesis into account.

In [None]:
def simulate_and_test_statistic(table, labels_col, values_col):
    ...

simulate_and_test_statistic(change_in_death_rates, "Death Penalty", "Murder Rate")

In [None]:
grader.check("q4_8")

**Question 4.9. (3 pts)** Simulate 5000 trials of our A/B test and store the test statistics in an array called `differences`.

In [None]:
# This cell might take some time to run
differences = make_array()

...

differences

In [None]:
grader.check("q4_9")

Run the cell below to view a histogram of your simulated test statistics plotted with your observed test statistic.

In [None]:
Table().with_column('Difference Between Group Means', differences).hist()
plt.scatter(observed_difference, 0, color='red', s=60, zorder=2);

**Question 4.10. (2 pts)** Find the p-value for your test and assign it to `empirical_P`.
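In general, an empirical p-value is the proportion of simulated statistics that are at least as extreme as the observed one, in the direction of the alternative. The sketch below uses made-up numbers and assumes that larger values of the statistic favor the alternative. The function name is hypothetical.

```python
def empirical_p_value(simulated, observed):
    """Proportion of simulated statistics at least as large as `observed`.
    Assumes larger values favor the alternative hypothesis."""
    at_least_as_large = sum(1 for s in simulated if s >= observed)
    return at_least_as_large / len(simulated)

# Hypothetical simulated statistics and observed value.
simulated_stats = [-0.4, -0.1, 0.0, 0.2, 0.3, 0.5, 0.7, 0.9, 1.1, 1.4]
p = empirical_p_value(simulated_stats, 0.9)
```

Here three of the ten simulated values are at least 0.9, so `p` is 0.3.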

In [None]:
empirical_P = ...
empirical_P

In [None]:
grader.check("q4_10")

**Question 4.11. (3 pts)** Using a 5% P-value cutoff, draw a conclusion about the null and alternative hypotheses. Describe your findings using simple, non-technical language. What does your analysis tell you about murder rates after the death penalty was suspended? What can you claim about causation from your statistical analysis?


*Write your answer here, replacing this text.*

**You've completed Homework 10!**

Please save your notebook, download a PDF version of the notebook, and submit it to Canvas.