# Homework 6: Hypothesis Testing and Permutation Testing

## Due Tuesday, November 26th at 11:59PM

Welcome to Homework 6, the last homework of the quarter! This homework covers hypothesis testing ([CIT 11](https://inferentialthinking.com/chapters/11/Testing_Hypotheses.html)) and permutation testing ([CIT 12](https://inferentialthinking.com/chapters/12/Comparing_Two_Samples.html)).

### Instructions

You are given six slip days throughout the quarter to extend deadlines. See the syllabus for more details. With the exception of using slip days, late work will not be accepted unless you have made special arrangements with your instructor.

**Important**: For homeworks, the `otter` tests don't usually tell you that your answer is correct. More often, they help catch careless mistakes. It's up to you to ensure that your answer is correct. If you're not sure, ask someone (not for the answer, but for some guidance about your approach). These are great questions for office hours (see the schedule on the [Calendar](https://dsc10.com/calendar)) or Ed. Directly sharing answers is not okay, but discussing problems with the course staff or with other students is encouraged.

In [None]:
# Please don't change this cell, but do make sure to run it
import babypandas as bpd
import numpy as np

import matplotlib.pyplot as plt
plt.style.use('ggplot')

import otter
grader = otter.Notebook()

## 1. Was it by Random Chansey? 🎲

<img src='images/chansey.png' width='250'>

You recently decided to buy the video game *Pokémon Yellow* from someone on eBay. The seller tells you that they've modified the game so that the probabilities of encountering certain Pokémon in certain locations have been altered. However, the seller doesn't tell you which specific locations have had their probability models changed and what they've been changed to.

As you are playing *Pokémon Yellow*, you arrive at the Safari Zone, one of the most iconic locations in the game. You're curious as to your chances of encountering your favorite Pokémon, Chansey, in this location. You go onto [Bulbapedia](https://bulbapedia.bulbagarden.net/wiki/Kanto_Safari_Zone#Area_1) to find the probability model for this location, and you discover that for each Pokémon encounter in the Safari Zone, there is a 4% chance of encountering Chansey. 

After a few hours of gameplay in the Safari Zone, you have encountered Chansey **48 times out of 821 total Pokémon encounters**, which is almost 6% of the time! You start to suspect that the Safari Zone may have been one of the locations in which the previous owner of the game changed the probability model.

To test this, you decide to run a hypothesis test with the following hypotheses:

- **Null Hypothesis**: In your copy of *Pokémon Yellow*, the probability of encountering Chansey at each Pokémon encounter in the Safari Zone is 4%. 

- **Alternative Hypothesis**: In your copy of *Pokémon Yellow*, the probability of encountering Chansey at each Pokémon encounter in the Safari Zone is greater than 4%.

**Question 1.1.** Complete the implementation of the function `one_simulation`, which has no arguments. It should randomly generate 821 Pokémon encounters in the Safari Zone and return the **proportion** of encountered Pokémon that were Chansey. 

***Hint:*** Use `np.random.multinomial`. You don't need a `for`-loop.
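
As a reminder of how `np.random.multinomial` behaves, here is a minimal sketch on an unrelated, made-up scenario: 200 flips of a coin that lands heads 30% of the time. The names `num_flips`, `flip_probs`, `counts`, and `heads_proportion` are illustrative only and are not part of this assignment.

```python
import numpy as np

# Hypothetical scenario: a biased coin, not the Safari Zone model.
num_flips = 200
flip_probs = [0.3, 0.7]  # [P(heads), P(tails)]; the probabilities must sum to 1

# One call returns an array of counts, one per category, that adds up to num_flips.
counts = np.random.multinomial(num_flips, flip_probs)

# Convert the count for the first category (heads) into a proportion.
heads_proportion = counts[0] / num_flips
```

Because a single call simulates all 200 flips at once, no loop is needed to produce one simulated proportion.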

In [None]:
def one_simulation():
    ...
    
one_simulation()

In [None]:
grader.check("q1_1")

**Question 1.2.** The test statistic for our hypothesis test will be the difference between the proportion of Chansey encounters in a given sample of 821 Safari Zone encounters and the expected proportion of Chansey encounters, i.e.

$$\text{test statistic} = \text{proportion of Chansey encounters in sample} - 0.04$$


Let's conduct 10,000 simulations. Create an array named `proportion_diffs` containing 10,000 simulated values of the test statistic described above. Use the `one_simulation` function you wrote in Question 1.1 to do this.

In [None]:
proportion_diffs = ...

# Visualize with a histogram. Don't change anything below.
bpd.DataFrame().assign(proportion_differences=proportion_diffs).plot(kind='hist', bins=20, density=True, ec='w', figsize=(10, 5));
plt.axvline(x=(48 / 821 - 0.04), color='black', linewidth=4, label='observed statistic')
plt.legend();

In [None]:
grader.check("q1_2")

**Question 1.3.** Calculate the p-value for this hypothesis test, and assign the result to `safari_zone_p`.

***Hint:*** Do large values of our test statistic favor the alternative hypothesis, or do small values of our test statistic favor the alternative hypothesis?
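
To illustrate why the direction matters, here is a small sketch with made-up numbers. The array `toy_stats` and the value `toy_observed` are hypothetical and unrelated to the Chansey data. The sketch computes the proportion of simulated statistics that are at least as extreme as the observed one, in each of the two possible directions.

```python
import numpy as np

# Hypothetical simulated test statistics and observed statistic, for illustration only.
toy_stats = np.array([-0.02, 0.00, 0.01, 0.03, 0.05, -0.01, 0.02, 0.04])
toy_observed = 0.03

# If LARGE values favor the alternative: count statistics >= the observed one.
p_if_large_favors_alt = np.count_nonzero(toy_stats >= toy_observed) / len(toy_stats)

# If SMALL values favor the alternative: count statistics <= the observed one.
p_if_small_favors_alt = np.count_nonzero(toy_stats <= toy_observed) / len(toy_stats)
```

The two directions give different answers on the same data, so the comparison operator has to match the alternative hypothesis.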

In [None]:
safari_zone_p = ...
safari_zone_p

In [None]:
grader.check("q1_3")

**Question 1.4.** Using the standard p-value cutoff of 0.05, what can we conclude from our hypothesis test? Assign either 1, 2, 3, or 4 to the variable `safari_zone_conclusion`, corresponding to the best conclusion.
   
   1. We reject the null hypothesis. There is not enough evidence to say that the observed data is inconsistent with the model.
   1. We reject the null hypothesis. The observed data is inconsistent with the model.
   1. We accept the null hypothesis. The observed data is consistent with the model.
   1. We fail to reject the null hypothesis. There is not enough evidence to say that the observed data is inconsistent with the model.

In [None]:
safari_zone_conclusion = ...

In [None]:
grader.check("q1_4")

**Question 1.5.** In this question, we chose as our test statistic the proportion of Chansey encounters in the Safari Zone minus 0.04. But this is not the only statistic we could have chosen; there are many that could have worked here. 

From the options below, choose the test statistic that would **not** have worked for this hypothesis test, and assign 1, 2, 3, or 4 to the variable `bad_choice`.

1. The number of Chansey encounters out of 821 encounters in the Safari Zone.
1. The proportion of Chansey encounters in the Safari Zone.
1. 0.04 minus the proportion of Chansey encounters in the Safari Zone.
1. The absolute difference between 0.04 and the proportion of Chansey encounters in the Safari Zone.

***Hint:*** Our goal is to find a test statistic that will help us determine whether we encounter Chansey **more** often than expected.

In [None]:
bad_choice = ...
bad_choice

In [None]:
grader.check("q1_5")

## 2. Let's Roll 🍣🍥🥢

As some of you may know, [The Bistro](https://hdh-web.ucsd.edu/dining/apps/diningservices/Restaurants/Venue_V3?locId=27&subLoc=00&locDetID=13&dayNum=0) is a popular specialty dining hall on campus located in Seventh College, which serves many types of sushi rolls. Our DSC 10 tutor, Daniel, is a big fan of The Bistro and spends a lot of time there. He proposes the following probability distribution for how frequently each type of sushi roll is ordered, based on his own observations. Note that the sum of the estimated probabilities is 1.

| Type | Daniel's Estimated Probability|
| --- | --- |
| Cucumber Avocado Roll | $0.08$ |
| Seared Tuna Roll | $0.09$ |
| Spicy Tuna Roll | $0.08$ |
| Horizon Roll | $0.11$ |
| The OC Roll | $0.15$ |
| Rainbow Roll | $0.12$ |
| Sun God Roll| $0.16$ |
| Dragon Roll|$0.09$|
| Crunchy Roll| $0.12$|

We'll store this **proposed** distribution in an array, in the order shown above.

In [None]:
# Just run this cell, do not change it!
proposed_dist = np.array([0.08, 0.09, 0.08, 0.11, 0.15, 0.12, 0.16, 0.09, 0.12])
proposed_dist

To assess the validity of Daniel's model, you collect data directly from The Bistro. You learn that their last 1,000 sushi orders were as follows:
- 85 `'Cucumber Avocado Roll'`
- 83 `'Seared Tuna Roll'`
- 90 `'Spicy Tuna Roll'`
- 104 `'Horizon Roll'`
- 162 `'The OC Roll'`
- 112 `'Rainbow Roll'`
- 145 `'Sun God Roll'`
- 115 `'Dragon Roll'`
- 104 `'Crunchy Roll'`

You then calculate the **observed** distribution using the data you collected and store it in an array as well (in the same order as before):

In [None]:
# Just run this cell, do not change it!
observed_dist = np.array([85, 83, 90, 104, 162, 112, 145, 115, 104]) / 1000
observed_dist

While `observed_dist` is not identical to `proposed_dist`, it's still possible that Daniel's model is plausible, and that the differences are simply due to random chance. Let's run a hypothesis test to investigate further, using the following hypotheses: 

- **Null Hypothesis**: Sushi orders at The Bistro are randomly drawn from the distribution `proposed_dist`.

- **Alternative Hypothesis**: Sushi orders at The Bistro are _not_ drawn randomly from the distribution `proposed_dist`.

Note that this hypothesis test involves nine proportions, one for each type of sushi.

**Question 2.1.**  Which of the following is **not** a reasonable choice of test statistic for this hypothesis test? Assign 1, 2, or 3 to the variable `unreasonable_test_statistic`. 
1. The sum of the absolute differences between the proposed distribution (Daniel's expected proportion of types) and the observed distribution (actual proportion of types).
1. The absolute difference between the sum of the proposed distribution (Daniel's expected proportion of types) and the sum of the observed distribution (actual proportion of types).
1. Among all nine sushi types, the largest absolute difference between Daniel's expected proportion and the actual proportion of sushi of that type.

In [None]:
unreasonable_test_statistic = ...

In [None]:
grader.check("q2_1")

**Question 2.2.** We'll use the TVD, i.e. **total variation distance**, as our test statistic. Below, complete the implementation of the function `total_variation_distance`, which takes as input two distributions (stored as arrays) and returns the total variation distance between those distributions.

Then, use the function `total_variation_distance` to determine the TVD between the type distribution proposed by Daniel and the observed distribution of types. Assign this TVD to `observed_tvd`.
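
As a refresher, the TVD between two categorical distributions is half the sum of the absolute differences between corresponding proportions (see CIT 11). The sketch below works through the arithmetic on two made-up three-category distributions. The names `toy_first`, `toy_second`, `abs_diffs`, and `toy_tvd` are hypothetical and have nothing to do with the sushi data.

```python
import numpy as np

# Two hypothetical distributions over the same three categories; each sums to 1.
toy_first = np.array([0.5, 0.3, 0.2])
toy_second = np.array([0.4, 0.4, 0.2])

# Element-wise absolute differences between corresponding proportions.
abs_diffs = np.abs(toy_first - toy_second)

# Half the sum of the absolute differences.
toy_tvd = abs_diffs.sum() / 2
```

Here the first two categories each differ by about 0.1 and the third does not differ, so the TVD works out to about 0.1.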

In [None]:
def total_variation_distance(first_distrib, second_distrib):
    '''Computes the total variation distance between two distributions.'''
    ...

observed_tvd = ...
observed_tvd

In [None]:
grader.check("q2_2")

**Question 2.3.** Now, we'll calculate 3,000 simulated TVDs to see what a typical TVD between the proposed distribution and a simulated distribution would look like if Daniel's model were accurate. Since our real-life data includes 1,000 sushi orders, in each trial of the simulation, we'll:
- draw 1,000 sushi orders at random from Daniel's proposed distribution, then
- calculate the TVD between **Daniel's proposed type distribution** and the **type distribution from the simulated sample**. 

Store these 3,000 simulated TVDs in an array called `simulated_tvds`.

In [None]:
simulated_tvds = ...

# Visualize the distribution of TVDs with a histogram
bpd.DataFrame().assign(simulated_tvds=simulated_tvds).plot(kind='hist', density=True, ec='w', figsize=(10, 5));
plt.axvline(x=observed_tvd, color='black', linewidth=4, label='observed TVD')
plt.legend();

In [None]:
grader.check("q2_3")

**Question 2.4.** Now, determine the p-value for our test by finding the proportion of times in our simulation that we saw a TVD greater than or equal to our observed TVD. Assign your result to `sushi_p`.

In [None]:
sushi_p = ...
sushi_p

In [None]:
grader.check("q2_4")

**Question 2.5.** Using the p-value cutoff of 0.01, what can we conclude from our hypothesis test? Assign either 1, 2, 3, or 4 to the variable `sushi_conclusion`, corresponding to the best conclusion.
   
   1. We accept the null hypothesis. The observed data is consistent with the model.
   1. We reject the null hypothesis. There is not enough evidence to say if the observed data is consistent with the model.
   1. We reject the null hypothesis. The observed data is inconsistent with the model.
   1. We fail to reject the null hypothesis. There is not enough evidence to say that the observed data is inconsistent with the model.

In [None]:
sushi_conclusion = ...
sushi_conclusion

In [None]:
grader.check("q2_5")

## 3. Chocolate 🍫😋
<img src='images/chocolate_bars.png' width='1000'>

Chocolate is a well-loved treat that many enjoy, but some people take their chocolate very seriously. [The Manhattan Chocolate Society](https://flavorsofcacao.com/mcs_index.html) is an invitation-only society founded to taste and review chocolate bars from around the world. The [Flavors of Cacao database](https://flavorsofcacao.com/index.html) was born from tastings done by this exclusive society, and it contains reviews of almost 2,700 different dark chocolate bars. Which dark chocolate bars do these connoisseurs consider to be the best? Let's find out!

Run the next cell to load in the data.

In [None]:
choco = bpd.read_csv('data/chocolate.csv')
choco

We will primarily be working with the `'Characteristics'` and `'Rating'` columns. The `'Rating'` column contains a score from 1 to 5. According to Flavors of Cacao, each rating can be interpreted as follows:

| Rating | Meaning |
| ------ | ------- |
| 4.0 - 5.0  | Outstanding |
| 3.5 - 3.9  | Highly Recommended |
| 3.0 - 3.49 | Recommended |
| 2.0 - 2.9  | Disappointing |
| 1.0 - 1.9  | Unpleasant |

Ratings are determined by a combination of factors including flavor, texture, and "aftermelt", or the lingering experience after the chocolate has melted in your mouth.

The `'Characteristics'` column contains the *most memorable characteristics* of each chocolate bar. Each bar may have several memorable characteristics, separated by commas. For example, the chocolate bar at the last index of the DataFrame was memorable for its woody flavor and butterscotch notes.

Compared to other types of chocolate, dark chocolate tends to be less sweet. However, quite a few of the chocolate bars in the DataFrame above were memorable for being sweet. How do sweet dark chocolate bars get rated relative to non-sweet dark chocolate bars? In this section, we will explore whether the ratings for sweet chocolate bars come from the same distribution as non-sweet chocolate bars. 

**Question 3.1.** Complete the implementation of the function `label_sweet`, which takes in a string of characteristics associated with a single row of `choco` and returns one of two strings. If `'sweet'` is among these characteristics, then the function should return `'Sweet'`, otherwise it should return `'Not Sweet'`.

Once you've done that, use your function to help you create a new DataFrame named `labeled` that has all the same columns as `choco`, in the same order, with an additional column named `'Sweetness'` that contains whether the chocolate bar is characterized as sweet. The `'Sweetness'` column should contain only two distinct values: `'Sweet'` and `'Not Sweet'`.

***Note:*** Some chocolate bars may have characteristics where `'sweet'` is contained within a word, such as `'bittersweet'`. For this question, we only want to identify bars where a characteristic is `'sweet'` itself. For example, `label_sweet('nutty, bittersweet, chalky')` should evaluate to `'Not Sweet'`.
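
The difference between a substring match and an exact match can be seen in the following sketch, which uses a made-up string of tags rather than the chocolate data. It assumes the items are separated by a comma followed by a single space, as in the example above; check that this holds for the strings in `choco` before relying on it.

```python
# Hypothetical tag string, unrelated to the chocolate data.
tags = 'coconut, nutty, smooth'

# Substring check on the whole string: True, because 'nut' appears inside 'coconut'.
substring_match = 'nut' in tags

# Split into individual tags first (assumes a ', ' separator) ...
tag_list = tags.split(', ')

# ... then check membership in the list: False, because no tag is exactly 'nut'.
exact_match = 'nut' in tag_list
```

Using `in` on a string looks for substrings, while using `in` on a list looks for whole elements.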

In [None]:
def label_sweet(characteristics): 
    ...
    
labeled = ...
labeled

In [None]:
grader.check("q3_1")

**Question 3.2.** Assign `chocolate` to a DataFrame with only two columns, `'Sweetness'` and `'Rating'`, since these are the only relevant columns in `labeled` to answer the question we've proposed.

In [None]:
chocolate = ...
chocolate

In [None]:
grader.check("q3_2")

**Question 3.3.** Using the DataFrame `chocolate`, calculate the difference between the **mean** `'Rating'` of sweet chocolate bars and non-sweet chocolate bars. Assign your answer to `observed_difference`.

$$\text{observed difference} = \text{mean rating of sweet chocolate bars} - \text{mean rating of non-sweet chocolate bars}$$

In [None]:
observed_difference = ...
observed_difference

In [None]:
grader.check("q3_3")

**Question 3.4.** What does the number you obtained for `observed_difference` mean? Assign `interpretation` to 1, 2, 3, 4, 5 or 6 corresponding to the best explanation below.

1. In our sample, the mean rating for sweet chocolate bars is higher than the mean rating for non-sweet chocolate bars by about 16 percent.
1. In our sample, the mean rating for sweet chocolate bars is higher than the mean rating for non-sweet chocolate bars by about 0.16 percent.
1. In our sample, the mean rating for sweet chocolate bars is higher than the mean rating for non-sweet chocolate bars by about 0.16 rating points.
1. In our sample, the mean rating for sweet chocolate bars is lower than the mean rating for non-sweet chocolate bars by about 16 percent.
1. In our sample, the mean rating for sweet chocolate bars is lower than the mean rating for non-sweet chocolate bars by about 0.16 percent.
1. In our sample, the mean rating for sweet chocolate bars is lower than the mean rating for non-sweet chocolate bars by about 0.16 rating points.


In [None]:
interpretation = ...

In [None]:
grader.check("q3_4")

**Question 3.5.** Now we want to conduct a **permutation test** to see if sweet chocolate bars actually have a lower rating on average than non-sweet chocolate bars, or whether this was just observed in our sample by random chance.

- **Null Hypothesis**: The ratings of sweet chocolate bars and non-sweet chocolate bars come from the same distribution.  
- **Alternative Hypothesis**: The ratings of sweet chocolate bars are lower on average than the ratings of non-sweet chocolate bars.

Run a permutation test to see if the `observed_difference` you calculated in Question 3.3 is actually a statistically significant difference. Simulate 1000 values of the test statistic by shuffling the `'Sweetness'` column of `chocolate` and calculating the difference in mean rating between the two groups determined by the shuffling (again, in the order sweet minus non-sweet). Store your 1000 differences in the `differences` array. 

***Hint:*** It's a good idea to simulate one value of the test statistic before putting everything in a for-loop.
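
For intuition about what a single shuffle does, here is a minimal sketch on toy arrays. The names `toy_labels`, `toy_values`, `shuffled_labels`, and `one_diff` are hypothetical. In the notebook you would work with the `chocolate` DataFrame instead of plain arrays, so treat this as an illustration of the idea rather than a template.

```python
import numpy as np

# Hypothetical toy data: a group label and a numeric measurement for six items.
toy_labels = np.array(['A', 'A', 'A', 'B', 'B', 'B'])
toy_values = np.array([3.0, 3.5, 4.0, 2.5, 3.0, 3.25])

# Shuffle the labels while the values stay in place.
# This breaks any association between label and value.
shuffled_labels = np.random.permutation(toy_labels)

# One simulated test statistic: difference in group means (A minus B).
one_diff = (toy_values[shuffled_labels == 'A'].mean()
            - toy_values[shuffled_labels == 'B'].mean())
```

Shuffling keeps the group sizes the same (three `'A'` labels and three `'B'` labels here); only which values each label is paired with changes.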

In [None]:
differences = ...

# Just display the first ten differences.
differences[:10]

In [None]:
grader.check("q3_5")

**Question 3.6.** Compute a p-value for this hypothesis test and assign your answer to `chocolate_p`. To decide whether to use `<=` or `>=` in the calculation of the p-value, think about whether larger values or smaller values of our test statistic favor the alternative hypothesis.

In [None]:
chocolate_p = ...
chocolate_p

In [None]:
grader.check("q3_6")

**Question 3.7.** Assign the variable `chocolate_conclusion` to a **list** of all the true statements below.

1. We accept the null hypothesis at the 0.01 significance level.
1. We reject the null hypothesis at the 0.01 significance level.
1. We fail to reject the null hypothesis at the 0.01 significance level.
1. We accept the null hypothesis at the 0.05 significance level.
1. We reject the null hypothesis at the 0.05 significance level.
1. We fail to reject the null hypothesis at the 0.05 significance level.

Then, interpret your results by setting `sweeter_is_worse` to `True` or `False`, based on the outcome of your permutation test. `True` means that sweet chocolate bars actually do have lower ratings than non-sweet bars, and `False` means they do not.

In [None]:
chocolate_conclusion = ...
sweeter_is_worse = ...

In [None]:
grader.check("q3_7")

**Question 3.8.** Suppose in this question you had shuffled the `'Rating'` column instead and kept the `'Sweetness'` column in the same order. Assign `shuffled_rating` to either 1, 2, 3, or 4, corresponding to the true statement below.


1. The new p-value from shuffling `'Rating'` would be $1 - p$, where $p$ is the old p-value from shuffling `'Sweetness'` (i.e. your answer to Question 3.6).
1. We would need to change our null hypothesis in order to shuffle the `'Rating'` column. 
1. There would be no difference in the conclusion of the test if we had shuffled the `'Rating'` column instead.
1. The `'Rating'` column cannot be shuffled because it contains numbers.

In [None]:
shuffled_rating = ...

In [None]:
grader.check("q3_8")

**Question 3.9.** Which of the following choices best describes the purpose of shuffling one of the columns in our dataset in a permutation test? Assign `why_shuffle` to either 1, 2, 3, or 4, corresponding to the true statement below.

1. Shuffling mitigates noise in our data by generating new permutations of the data.
1. Shuffling is a special case of bootstrapping and allows us to produce interval estimates.
1. Shuffling allows us to generate new data under the null hypothesis, which we can use in testing our hypothesis.
1. Shuffling allows us to generate new data under the alternative hypothesis, which helps us identify when the data come from different distributions.

In [None]:
why_shuffle = ...

In [None]:
grader.check("q3_9")

Feel free to explore the chocolate data some more to see if other characteristics are linked with higher or lower ratings! 

## 4. New York Times Mini Crossword 🧩🕐
<img src='images/nyt_mini_crossword.png' width='500'>

[The New York Times Mini Crossword](https://www.nytimes.com/crosswords/game/mini) is a smaller and quicker version of the traditional crossword puzzle. It features straightforward clues and is designed to be completed in a few minutes. After completing the puzzle, players have the option to send the time it took to complete the puzzle to their friends, to try to compete for the lowest time.

Ciro and Athu have been playing the New York Times Mini Crossword for a couple of months, often sending each other the time it takes for them to complete each puzzle. Today, Ciro's time was much faster than Athu's, so he bragged that he is better than Athu at the game. Athu vehemently disagrees and thinks that they are equally skilled.

Since the two of them have learned about hypothesis testing in DSC 10, they decided to look at their history of times to determine if they were equally skilled or if one of them was better than the other.

Let's look at all the data that they collected. Each entry in the `'Time'` column represents the amount of time it took in seconds for one person to complete the New York Times Mini Crossword on a single day. There are 25 times for Ciro and 25 times for Athu.

In [None]:
mini_cw = bpd.read_csv('data/mini-crossword.csv')
mini_cw 

**Question 4.1.** Now let's address the question: how does the average time for Ciro to complete the crossword compare to Athu's average time? Create a DataFrame called `ciro` with only the rows of `mini_cw` that correspond to Ciro, and set `ciro_mean` to Ciro's mean time to complete the crossword. Similarly, create a DataFrame `athu` for Athu and compute `athu_mean`. Finally, set `observed_diff_mean` to the difference in mean times to complete the crossword in our sample, computed as follows.

$$\text{difference} = \text{mean time to complete the crossword for Ciro} - \text{mean time to complete the crossword for Athu}$$


In [None]:
ciro = ...
athu = ...
ciro_mean = ...
athu_mean = ...
observed_diff_mean = ...
observed_diff_mean

In [None]:
grader.check("q4_1")

If you answered Question 4.1 correctly, you should have noticed a difference in the average times between Ciro and Athu. But we only have a small sample of their performance on the Mini Crossword, so it's possible that this difference is merely a result of the specific samples we happened to collect. Let's do a **hypothesis test** to find out if there's actually a difference in their abilities. We'll state our hypotheses as follows:

- **Null Hypothesis**: The average time it takes to complete the New York Times Mini Crossword is the same for both Ciro and Athu. In other words, their difference in average time is equal to 0 seconds.

- **Alternative Hypothesis**: The average time it takes to complete the New York Times Mini Crossword is not the same for both Ciro and Athu. In other words, the difference in average time between the two of them is not equal to 0 seconds.


Since we are able to frame our hypothesis test as a question of whether a certain population parameter – the difference in average times – is equal to a specific value, we can **test our hypotheses by constructing a confidence interval** for this parameter. For a refresher on this method, refer to [CIT 13.4](https://inferentialthinking.com/chapters/13/4/Using_Confidence_Intervals.html) or the human body temperature example from [Lecture 22](https://dsc10.com/resources/lectures/lec22/lec22.html).

***Note:*** We are **not** conducting a permutation test here, although that would also be a valid approach to test these hypotheses.

**Question 4.2.** Compute 1000 **bootstrapped estimates** for the difference in average times between Ciro and Athu. As in Question 4.1, calculate the difference as Ciro minus Athu. Store your 1000 estimates in the `difference_means` array.

You should generate your Ciro resamples by sampling from `ciro`, and your Athu resamples by sampling from `athu`. Do not use the combined dataset `mini_cw` for this task, otherwise you might not wind up with 25 of each!
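
The idea of resampling each group separately can be sketched with plain NumPy arrays. The data and names below (`toy_group_one`, `toy_group_two`, `one_boot_diff`) are made up for illustration. In the notebook, you would resample the rows of the `ciro` and `athu` DataFrames instead.

```python
import numpy as np

# Hypothetical samples for two groups (not the crossword data).
toy_group_one = np.array([12.0, 15.0, 11.0, 14.0, 13.0])
toy_group_two = np.array([16.0, 14.0, 18.0, 15.0, 17.0])

# Resample each group on its own, with replacement, keeping each group's original size.
resample_one = np.random.choice(toy_group_one, size=len(toy_group_one), replace=True)
resample_two = np.random.choice(toy_group_two, size=len(toy_group_two), replace=True)

# One bootstrapped estimate of the difference in means (group one minus group two).
one_boot_diff = resample_one.mean() - resample_two.mean()
```

Because each group is resampled separately, every resample has exactly as many observations per group as the original sample did.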

In [None]:
difference_means = ...

# Just display the first ten differences.
difference_means[:10]

In [None]:
grader.check("q4_2")

Let's visualize your estimates:

In [None]:
bpd.DataFrame().assign(BootstrappedDifferenceMeans = difference_means).plot(kind = 'hist', density=True, ec='w', bins=20, figsize=(10, 5));

**Question 4.3.** Compute a 95% confidence interval for the difference in mean times (as before, in the order Ciro minus Athu). Assign the left and right endpoints of this confidence interval to `left_endpoint` and `right_endpoint` respectively. 
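
As a reminder of the percentile method, the sketch below takes the middle 95% of a made-up array of estimates using `np.percentile`. The array `toy_estimates` is a hypothetical stand-in (simply the integers 1 through 100), not a real bootstrap distribution.

```python
import numpy as np

# Hypothetical array standing in for a collection of bootstrapped estimates.
toy_estimates = np.arange(1, 101)

# The middle 95% runs from the 2.5th percentile to the 97.5th percentile.
toy_left = np.percentile(toy_estimates, 2.5)
toy_right = np.percentile(toy_estimates, 97.5)
```

Leaving out 2.5% of the estimates on each side is what makes this a 95% interval.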

In [None]:
left_endpoint = ...
right_endpoint = ...

print('Bootstrapped 95% confidence interval for the mean difference in time to complete the crossword for Ciro and Athu:\n [{:f}, {:f}]'.format(left_endpoint, right_endpoint))

In [None]:
grader.check("q4_3")

**Question 4.4.** Based on the confidence interval you've created, would you reject the null hypothesis at the 0.05 significance level? Set `reject_null` to `True` if you would reject the null hypothesis, and `False` if you would not.

In [None]:
reject_null = ...

In [None]:
grader.check("q4_4")

**Question 4.5.** Consider what would happen if Ciro and Athu collected their times in minutes instead of seconds. Would your confidence interval have the same endpoints either way? Set `same_endpoints` to `True` or `False`. Would your hypothesis test still come to the same conclusion either way? Set `same_conclusion` to `True` or `False`.

In [None]:
same_endpoints = ...
same_conclusion = ...

In [None]:
grader.check("q4_5")

## Finish Line: Almost there, but make sure to follow the steps below to submit! 🏁

**_Citations:_** Did you use any generative artificial intelligence tools to assist you on this assignment? If so, please state, for each tool you used, the name of the tool (e.g., ChatGPT) and the problem(s) in this assignment where you used the tool for help.

<hr style="color:Maroon;background-color:Maroon;border:0 none; height: 3px;">

Please cite tools here.

<hr style="color:Maroon;background-color:Maroon;border:0 none; height: 3px;">

Congratulations! You are done with Homework 6 – the final homework of the quarter! 🎉

To submit your assignment:

1. Select `Kernel -> Restart & Run All` to ensure that you have executed all cells, including the test cells.
1. Read through the notebook to make sure everything is fine and all tests passed.
1. Run the cell below to run all tests, and make sure that they all pass.
1. Download your notebook using `File -> Download as -> Notebook (.ipynb)`, then upload your notebook to Gradescope.
1. Stick around while the Gradescope autograder grades your work. Make sure you see that all tests have passed on Gradescope.
1. Check that you have a confirmation email from Gradescope and save it as proof of your submission.

In [None]:
grader.check_all()