# Homework 5: Hypothesis Testing

Reading: Chapter [10](https://www.cs.cornell.edu/courses/cs1380/2018sp/textbook//chapters/10/testing-hypotheses.html).

Interesting and relevant XKCD comics: [here](https://xkcd.com/1478/) and [here](https://xkcd.com/882/).

Please complete this notebook by filling in the cells provided. 

### <font color="red">Before you begin, execute the following cell to load the provided tests. Each time you start your server, you will need to execute this cell again to load the tests.</font>

In [None]:
# Don't change this cell; just run it. 

import numpy as np
from datascience import *

%matplotlib inline
import matplotlib.pyplot as plots
plots.style.use('fivethirtyeight')

from test import *

## 1. Catching Cheaters at the Casino


Suppose you are a casino owner, and your casino runs a very simple game of chance using coins.  The dealer flips a (*hopefully fair*) coin.  The customer wins \$7 from the casino if it comes up heads and loses \$10 if it comes up tails.

**Question 1.** Assuming no one is cheating and the coin is fair (i.e. a 50% chance of heads and a 50% chance of tails), if a customer plays twice, what is the chance they make money?

*Hint*: Think about the number of different outcomes that result in the customer winning.
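
The counting idea in the hint can be sketched in code. This is only an illustration under stated assumptions, not the expected solution: the helper `chance_of_profit` and the example game (3 plays, win \$5, lose \$3) are hypothetical and deliberately differ from the casino game above. The sketch lists every equally likely sequence of heads and tails and counts the ones with a positive net total.

```python
from itertools import product
from fractions import Fraction

def chance_of_profit(num_plays, win_amount, loss_amount):
    """Chance of a positive net total when each play is a fair coin flip.

    Hypothetical helper for illustration; it is not part of the provided tests.
    """
    outcomes = list(product("HT", repeat=num_plays))  # all equally likely sequences
    profitable = 0
    for outcome in outcomes:
        heads = outcome.count("H")
        tails = num_plays - heads
        if heads * win_amount - tails * loss_amount > 0:
            profitable += 1
    return Fraction(profitable, len(outcomes))

# A made-up game: 3 plays, win $5 on heads, lose $3 on tails.
print(chance_of_profit(3, 5, 3))
```

Enumerating outcomes like this only works because every sequence of fair flips is equally likely; with an unfair coin each sequence would need its own weight.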

In [None]:
p_winning_after_two_flips = ...


In [None]:
check1_1(p_winning_after_two_flips)

In [None]:
###
### AUTOGRADER TEST - DO NOT REMOVE
###


A certain customer plays the game 20 times and wins 14 of the bets.  You suspect that the customer is cheating!  That is, you think that their chance of winning is higher than the normal chance of winning.

You decide to test your hunch using the outcomes of the 20 games you observed.

#### Question 2
Define the null hypothesis and alternative hypothesis for this investigation.

**Null hypothesis:** ...

**Alternative hypothesis:** ...

In [None]:
# DO NOT DELETE THIS CELL


**Question 3.** Given the outcome of 20 games, which of the following test statistics would be a reasonable choice for this hypothesis test?

1. Whether there is at least one win.
1. Whether there is at least one loss.
1. The number of wins.
1. The number of wins minus the number of losses.
1. The total variation distance between the probability distribution of a fair coin and the observed distribution of heads and tails.
1. The total amount of money that the customer won.

Assign `reasonable_test_statistics` to a **list** of numbers corresponding to these test statistics.

*Hint*: Think about which statistic gives evidence that the alternative hypothesis you wrote down is correct.
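
As a reminder of how one of the listed statistics is computed, here is a minimal sketch of the total variation distance between two distributions over the same categories. The function name and the example proportions are assumptions made for illustration; they are not taken from the 20 observed games.

```python
import numpy as np

def total_variation_distance(dist_a, dist_b):
    """Half the sum of absolute differences between two distributions."""
    dist_a = np.asarray(dist_a, dtype=float)
    dist_b = np.asarray(dist_b, dtype=float)
    return np.sum(np.abs(dist_a - dist_b)) / 2

fair_coin = [0.5, 0.5]       # [heads, tails]
mostly_heads = [0.8, 0.2]    # hypothetical observed proportions
mostly_tails = [0.2, 0.8]    # hypothetical observed proportions

print(total_variation_distance(fair_coin, mostly_heads))
print(total_variation_distance(fair_coin, mostly_tails))
```

The two calls compare a fair coin against two different hypothetical observations; comparing their outputs shows how this statistic treats the direction of a deviation.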

In [None]:
reasonable_test_statistics = ...


In [None]:
check1_3(reasonable_test_statistics)

In [None]:
###
### AUTOGRADER TEST - DO NOT REMOVE
###


**Question 4.** Write a function called `simulate`.  It should take no arguments.  It should return the number of wins in 20 games simulated under the assumption that the result of each game is sampled from a fair coin (*the null hypothesis*).

*Hint*: Use the `.sample()` function.

In [None]:
def simulate():
    return ...


simulate()

In [None]:
check1_4(simulate)

In [None]:
###
### AUTOGRADER TEST - DO NOT REMOVE
###


**Question 5.** Using a 10,000 trial simulation, generate a histogram of the empirical distribution of the number of wins in 20 games.

In [None]:
test_statistics_under_null = ... # store the results of your simulations into this variable
... # plot a histogram using test_statistics_under_null



**Question 6.** Compute an empirical P-value for this test.
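
For reference, an empirical P-value is a proportion of simulated statistics. The sketch below assumes that large values of the statistic favor the alternative and uses the "at least as large as the observed value" convention; the helper name and the dice example are hypothetical and unrelated to the casino data.

```python
import numpy as np

def empirical_p_value(simulated_statistics, observed_statistic):
    """Proportion of simulated statistics at least as large as the observed one."""
    simulated_statistics = np.asarray(simulated_statistics)
    at_least_as_large = np.count_nonzero(simulated_statistics >= observed_statistic)
    return at_least_as_large / len(simulated_statistics)

# Hypothetical example: number of sixes in 30 rolls of a fair die,
# simulated 5,000 times under the null hypothesis.
rng = np.random.default_rng(0)
simulated_sixes = rng.binomial(n=30, p=1/6, size=5000)

# Suppose 9 sixes were observed (a made-up value).
print(empirical_p_value(simulated_sixes, 9))
```

Whether the comparison should be `>=`, `<=`, or two-sided depends on the alternative hypothesis, so the direction used here is a choice for this example only.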

In [None]:
p_value = ...


In [None]:
check1_6(p_value)

In [None]:
###
### AUTOGRADER TEST - DO NOT REMOVE
###


#### Question 7
You want to be very sure that the customer is cheating before accusing them.  Suppose you use a P-value cutoff of 1%, according to the arbitrary conventions of hypothesis testing.  What do you conclude?

1. The result of the test is not significant. We fail to reject the null hypothesis.
2. The result of the test is significant. We reject the null hypothesis.

Assign `conclusion` to an **int** representing your conclusion.

In [None]:
conclusion = ...


In [None]:
check1_7(conclusion)

In [None]:
###
### AUTOGRADER TEST - DO NOT REMOVE
###


#### Question 8
Is `p_value` the probability that the customer cheated, or the probability that the customer did not cheat, or neither?

1. the probability that the customer cheated
2. the probability that the customer did not cheat
3. neither

Assign `p_value_probability` to an **int** representing your choice.

In [None]:
p_value_probability = ...


In [None]:
check1_8(p_value_probability)

In [None]:
###
### AUTOGRADER TEST - DO NOT REMOVE
###


#### Question 9
Is 1% (the P-value cutoff) the probability that the customer cheated, or the probability that the customer did not cheat, or neither?

1. the probability that the customer cheated
2. the probability that the customer didn't cheat
3. neither

Assign `p_value_cutoff_probability` to an **int** representing your choice.

In [None]:
p_value_cutoff_probability = ...


In [None]:
check1_9(p_value_cutoff_probability)

In [None]:
###
### AUTOGRADER TEST - DO NOT REMOVE
###


**Question 10**

Suppose you run this test for 500 different customers after observing each customer play 20 games.  When you reject the null hypothesis for a customer, you accuse that customer of cheating.  If no customers were actually cheating, how many would you expect to accuse, on average (if any)?  Assume a 1% P-value cutoff. Explain your answer.

*Write your answer here, replacing this text.*

In [None]:
# DO NOT DELETE THIS CELL


## 2. Testing Dice


Students in a Data Science class want to figure out whether a six-sided die is fair or not. On a fair die, each face of the die appears with chance 1/6 on each roll, regardless of the results of other rolls.  Otherwise, a die is called unfair.  We can describe a die by the probability of landing on each face.  For example, this table describes a die that is unfairly weighted toward 1:

|Face|Probability|
|--|--|
|1|.5|
|2|.1|
|3|.1|
|4|.1|
|5|.1|
|6|.1|



#### Question 1
Define a null hypothesis and an alternative hypothesis for this question.

**Null hypothesis:** ...

**Alternative hypothesis:** ...

In [None]:
# DO NOT DELETE THIS CELL


We decide to test the die by rolling it 5 times. The proportions of the 6 faces in these 5 rolls are stored in a table with 6 rows.  For example, here is the table we'd make if the die rolls ended up being 1, 2, 3, 3, and 5:

|Face|Proportion|
|--|--|
|1|.2|
|2|.2|
|3|.4|
|4|.0|
|5|.2|
|6|.0|

The function `mystery_test_statistic`, defined below, takes a single table like this as its argument and returns a number (which we will use as a test statistic).
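
A proportions table like the example above can also be produced by simulation. The sketch below uses NumPy's multinomial sampler rather than the `proportions_from_distribution` function that the notebook relies on later, so treat it as an illustrative stand-in; the variable names and the seed are assumptions.

```python
import numpy as np

rng = np.random.default_rng(1)
faces = np.arange(1, 7)
fair_probabilities = np.full(6, 1/6)
num_rolls = 5

# How many times each face appeared in 5 rolls of a fair die.
counts = rng.multinomial(num_rolls, fair_probabilities)
proportions = counts / num_rolls

for face, proportion in zip(faces, proportions):
    print(face, proportion)
```

With only 5 rolls, every proportion is a multiple of 0.2, so the observed distribution can look far from uniform even when the die is fair.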

In [None]:
# Note: We've intentionally used obscurantist function and
# variable names to avoid giving away answers.  It's rarely
# a good idea to use names like "x" in your code.  It makes
# it difficult for other people to read your code, like us!

def mystery_test_statistic(sample):
    x = sum(sample.column("Face")*sample.column("Proportion"))
    y = np.mean(np.arange(1, 6+1, 1))
    return abs(x - y)

#### Question 2
Describe in English what the test statistic is.  Is it equivalent to the total variation distance between the observed face distribution and the fair die distribution?

*Write your answer here, replacing this text.*

In [None]:
# DO NOT DELETE THIS CELL


The function `simulate_observations_and_test` takes as its argument a table describing the probability distribution of a die.  It simulates one set of 5 rolls of that die, then tests the null hypothesis about that die using our test statistic function above.  It returns `False` if it __*rejects*__ the null hypothesis about the die, and `True` otherwise.

Run the cell multiple times and notice the result of the function.

In [None]:
# The probability distribution table for a fair die:
fair_die = Table().with_columns(
        "Face", np.arange(1, 6+1),
        "Probability", [1/6, 1/6, 1/6, 1/6, 1/6, 1/6])

def simulate_observations_and_test(actual_die):
    """Simulates die rolls from actual_die and tests the hypothesis that the die is fair.
    
    Returns True if that hypothesis is not rejected, and False otherwise."""
    sample_size = 5
    p_value_cutoff = .3
    num_simulations = 250
    
    observation_set = proportions_from_distribution(actual_die, "Probability", sample_size)
    actual_statistic = mystery_test_statistic(observation_set.relabeled("Random Sample", "Proportion"))
    
    simulated_statistics = make_array()
    for _ in np.arange(num_simulations):
        one_observation_set_under_null = proportions_from_distribution(fair_die, "Probability", sample_size)
        simulated_statistic = mystery_test_statistic(one_observation_set_under_null.relabeled("Random Sample", "Proportion"))
        simulated_statistics = np.append(simulated_statistics, simulated_statistic)
    p_value = np.count_nonzero(actual_statistic < simulated_statistics) / num_simulations
    
    return p_value >= p_value_cutoff

# Calling the function to simulate a test of a fair die:
simulate_observations_and_test(fair_die)

#### Question 3
By examining `simulate_observations_and_test`, compute the probability that it returns `False` when its argument is `fair_die` (which is defined above the function).  You can call the function a few times to see what it does, but **don't** perform a simulation to compute this probability.  Use your knowledge of hypothesis tests and examine the code carefully.

In [None]:
probability_of_false = ...


In [None]:
check2_3(probability_of_false)

In [None]:
###
### AUTOGRADER TEST - DO NOT REMOVE
###


#### Question 4

From the perspective of someone who wants to know the truth about the die, is it good or bad for the function to return `False` when its argument is `fair_die`?

*Write your answer here, replacing this text.*

In [None]:
# DO NOT DELETE THIS CELL


This problem is known as a Type I error (see Wikipedia [here](https://en.wikipedia.org/wiki/Type_I_and_type_II_errors)).  Generally, when designing a hypothesis test, we choose a small p-value cutoff, such as 1%, to ensure that the probability of this happening is low.
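
The link between the cutoff and the chance of a Type I error can be checked by simulation. The sketch below is a separate, hypothetical setup, not the dice test above: the statistic is the mean of 25 standard normal draws, the cutoff is 5%, and the null hypothesis is true in every repetition. All names, sizes, and the seed are assumptions chosen for illustration.

```python
import numpy as np

rng = np.random.default_rng(42)
sample_size = 25
cutoff = 0.05

# Null distribution of the statistic (a sample mean), built once by simulation.
null_stats = np.sort(rng.standard_normal((5000, sample_size)).mean(axis=1))

# Many independent experiments in which the null hypothesis is actually true.
observed_stats = rng.standard_normal((2000, sample_size)).mean(axis=1)

# Empirical P-value: proportion of null statistics >= each observed statistic.
num_at_least = len(null_stats) - np.searchsorted(null_stats, observed_stats, side="left")
p_values = num_at_least / len(null_stats)

# Fraction of experiments that wrongly reject the true null hypothesis.
type_one_error_rate = np.mean(p_values < cutoff)
print(type_one_error_rate)
```

Because this statistic is continuous, the rejection rate should land near the cutoff. With a statistic that takes only a few distinct values, the rate can differ from the cutoff.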

#### Question 5
Verify your answer to question 3 by simulation, computing an approximate probability that `simulate_observations_and_test` returns `False`.  

*Hint:* You can use the `simulate_observations_and_test` function defined earlier.


<font color="red">**Note:** This simulation shouldn't take very long to run; only about 15 seconds on Vocareum.  Please be patient.</font>

In [None]:
num_test_simulations = 300

# For reference, the staff solution involved 6 lines of code before this.
...
approximate_probability_of_false = ...
approximate_probability_of_false

In [None]:
check2_5(approximate_probability_of_false)

In [None]:
###
### AUTOGRADER TEST - DO NOT REMOVE
###


## 3. Landing a Spacecraft


(Note: This problem describes something that's close to [a real story with a very exciting video](http://www.space.com/29119-spacex-reusable-rocket-landing-crash-video.html), but the details have been changed somewhat poorly.)

SpaceY, a company that builds and tests spacecraft, is testing a new reusable launch system.  Most spacecraft use a "first stage" rocket that propels a smaller payload craft away from Earth, then falls back to the ground and crashes.  SpaceY's new system is designed to land safely at a landing pad at a certain location, ready for later reuse.  If it doesn't land in the right location, it crashes, and the (very expensive) vehicle is destroyed.

SpaceY has tested this system over 1000 times.  Ordinarily, the vehicle doesn't land exactly on the landing pad.  For example, a gust of wind might move it by a few meters just before it lands.  It's reasonable to think of these small errors as random.  That is, the landing locations are drawn from some distribution over locations on the surface of Earth, centered around the landing pad.

Run the next cell to see a plot of those locations.

In [None]:
ordinary_landing_spots = Table.read_table("ordinary_landing_spots.csv")
ordinary_landing_spots.scatter("x", label="Landing locations")
plots.scatter(0, 0, c="w", s=1000, marker="*", label="Landing pad")
plots.legend(scatterpoints=1, bbox_to_anchor=(1.6, .5));

During one test, the vehicle lands far away from the landing pad and crashes.  SpaceY investigators suspect there was a problem unique to this landing, a problem that wasn't part of the ordinary pattern of variation in landing locations.  They think a software error in the guidance system caused the craft to incorrectly attempt to land at a spot other than the landing pad.  The guidance system engineers think there was nothing out of the ordinary in this landing, and that there was no special problem with the guidance system.

Run the cell below to see a plot of the 1100 ordinary landings and the crash.

In [None]:
landing_spot = make_array(80.59, 30.91)
ordinary_landing_spots.scatter("x", label="Other landings")
plots.scatter(0, 0, c="w", s=1000, marker="*", label="Landing pad")
plots.scatter(landing_spot.item(0), landing_spot.item(1), marker="*", c="r", s=1000, label="Crash site")
plots.legend(scatterpoints=1, bbox_to_anchor=(1.6, .5));

#### Question 1
Suppose we'd like to use hypothesis testing to shed light on this question.  We've written down an alternative hypothesis below.  What is a reasonable null hypothesis?

**Null hypothesis:** ...

**Alternative hypothesis:** This landing was special; its location was a draw from some other distribution, not the distribution from which the other 1100 landing locations were drawn.

In [None]:
# DO NOT DELETE THIS CELL


#### Question 2
Write a function called `landing_test_statistic`.  It should take two arguments: an "x" location and a "y" location (both numbers).  It should return the distance between those coordinates and the origin (0,0).  

*Hint:* use the [Pythagorean theorem](https://en.wikipedia.org/wiki/Pythagorean_theorem).

In [None]:
def landing_test_statistic(x_coordinate, y_coordinate):
    return ...
    


In [None]:
check3_2(landing_test_statistic)

In [None]:
###
### AUTOGRADER TEST - DO NOT REMOVE
###


#### Question 3
The cell below computes a P-value using your test statistic.  Describe the test procedure in words.  Is there a simulation involved?  If so, what is being simulated? If not, why not?

In [None]:
test_stat = landing_test_statistic(
    landing_spot.item(0),
    landing_spot.item(1))

null_stats = make_array()
for i in np.arange(ordinary_landing_spots.num_rows):
    null_stat = landing_test_statistic(
        ordinary_landing_spots.column('x').item(i),
        ordinary_landing_spots.column('y').item(i))
    null_stats = np.append(null_stats, null_stat)

p_value = np.count_nonzero(null_stats > test_stat) / len(null_stats)
p_value

*Write your answer here, replacing this text.*

In [None]:
# DO NOT DELETE THIS CELL


#### Question 4
Does the small p-value indicate that there is a large difference between the ordinary landing locations and the location of the crash site?

1. Yes
2. No

Assign `p_value_difference` to an **int** representing your choice.

In [None]:
p_value_difference = ...


In [None]:
check3_4(p_value_difference)

In [None]:
###
### AUTOGRADER TEST - DO NOT REMOVE
###


## 4. Submission


To submit your assignment, click the red Submit button above. You may submit as many times as you wish before the deadline. Only your final submission will be graded. No late work will be accepted, so please make sure you submit something before the deadline!

Before you submit, it would be wise to click on the menu item Kernel -> Restart & Run All. That will re-run all your cells from scratch. Take a second look to make sure all your answers are passing the checks. Doing this will help catch any errors in your homework that result from running cells in a strange order.