In [None]:
# Initialize Otter
import otter
grader = otter.Notebook("lab06.ipynb")

<table style="width: 100%;">
<tr style="background-color: transparent;">
<td width="100px"><img src="https://cs104williams.github.io/assets/cs104-logo.png" width="90px" style="text-align: center"/></td>
<td>
  <p style="margin-bottom: 0px; text-align: left; font-size: 18pt;"><strong>CSCI 104: Data Science and Computing for All</strong><br>
                Williams College<br>
                Fall 2023</p>
</td>
</tr>
</table>


# Lab 6: Probability, Simulation, Models, Cleaning

<hr style="margin: 0px; border: 3px solid #500082;"/>

<h2>Instructions</h2>

- Before you begin, execute the cell at the TOP of the notebook to load the provided tests, as well as the following cell to set up the notebook by importing some helpful libraries. Each time you start your server, you will need to execute these cells again.
- Be sure to consult your [Python Reference](https://cs104williams.github.io/assets/python-library-ref.html)!
- Complete this notebook by filling in the cells provided. For problems asking you to write explanations, you **must** provide your answer in the designated space. 
- Please be sure to not re-assign variables throughout the notebook.  For example, if you use `max_temperature` in your answer to one question, do not reassign it later on. Otherwise, you will fail tests that you thought you were passing previously.
- This lab has hidden tests. Even if the tests here report 100% passed, your final grade may not be 100%. We will run more tests for correctness once everyone turns in the lab.
- To use one or more late days on this lab, please fill out our [late day form](https://forms.gle/4sD16h3hN1xRqQM27) **before** the due date.

<hr/>
<h2>Setup</h2>


In [None]:
# Run this cell to set up the notebook.
# These lines import the numpy, datascience, and cs104 libraries.

import numpy as np
from datascience import *
from cs104 import *
%matplotlib inline

<hr style="margin-bottom: 0px; padding:0; border: 2px solid #500082;"/>


## 1. Roulette (55 pts)



<font color=#B1008E>
    
#### Learning objectives
- Simulate and evaluate the outcomes of various models involving chance.
- Use simulation and samples to estimate a probability.
</font>

A Nevada roulette wheel has 38 pockets and a small ball that rests on the wheel. When the wheel is spun, the ball comes to rest in one of the 38 pockets. That pocket is declared the winner. 

The pockets are labeled 0, 00, 1, 2, 3, 4, ... , 36. Pockets 0 and 00 are green, and the other pockets are alternately red and black. The table `wheel` below is a representation of a Nevada roulette wheel. 

*Note*: The entries in *both* columns are strings.

<img src="roulette_wheel.jpeg" width="330px">

In [None]:
wheel = Table.read_table('roulette_wheel.csv', dtype=str)
wheel

### Betting on Red ###
If you bet on *red*, you are betting that the winning pocket will be red. This bet *pays 1 to 1*. That means if you place a one-dollar bet on red, then:

- If the winning pocket is red, you gain 1 dollar. That is, you get your original dollar back, plus one more dollar.
- If the winning pocket is not red, you lose your dollar. In other words, you gain -1 dollars.

Let's see if you can make money by betting on red at roulette.

#### Part 1.1 (5 pts)


 Define a function `dollar_bet_on_red` that takes the name of a color and returns your gain in dollars if that color had won and you had placed a one-dollar bet on red. 
 
 
*Hints:*
- Remember that the gain can be negative.
- Make sure your function returns an integer. 
- You can assume that the only colors that will be passed as arguments are `"red"`, `"black"`, and `"green"`. Your function doesn't have to check that.

In [None]:
def dollar_bet_on_red(color):
    ...


In [None]:
grader.check("p1.1")

#### Part 1.2 (5 pts)


Add a column labeled `Winnings for Red` to the table `wheel` and assign this new table to `wheel_w_red`. For each pocket (each row in `wheel`), the column `Winnings for Red` should contain your gain in dollars if that pocket won and you had bet one dollar on red.

*Hints:*
- Your code should use the function `dollar_bet_on_red`.
- A simple solution will use one of these table methods: `pivot`, `group`, or `apply`. 

In [None]:
red_winnings = ...
wheel_w_red = ...
wheel_w_red

In [None]:
grader.check("p1.2")

### Simulating 10 bets on Red
Roulette wheels are set up so that each time they are spun, the winning pocket is equally likely to be any of the 38 pockets regardless of the results of all other spins. Let's see what would happen if we decided to bet one dollar on red each round.
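
Independent, equally likely spins correspond to sampling pockets *with replacement*. The following is a minimal plain-Python sketch of that idea. It does not use the `wheel` table or the course libraries, and the names `pockets` and `spin_wheel` are illustrative only.

```python
import random

# The 38 pocket labels as strings: '0', '00', then '1' through '36'.
pockets = ['0', '00'] + [str(n) for n in range(1, 37)]

def spin_wheel(num_spins, rng=random):
    """Return the winning pockets of num_spins independent spins."""
    # random.choices samples WITH replacement, so the same pocket can
    # win more than once and each spin ignores the earlier results.
    return rng.choices(pockets, k=num_spins)

ten_spins = spin_wheel(10)
print(ten_spins)
```

Running the sketch several times gives a different list each time, and repeated pockets are possible.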

#### Part 1.3 (5 pts)


Create a table `ten_bets` by sampling the table `wheel_w_red` to simulate 10 spins of the roulette wheel. 

This new table, `ten_bets`, should have the same three column labels as `wheel_w_red`. Once you've created that table, set `sum_bets` to your net gain in all 10 bets, assuming that you bet one dollar on red each time.

*Hint:* It may be helpful to print out `ten_bets` after you create it.

In [None]:
ten_bets = ...
sum_bets = ...(ten_bets.column('Winnings for Red'))

sum_bets

In [None]:
grader.check("p1.3")

#### Part 1.4 (5 pts)


 Let's see what would happen if you made more bets. Define a function `net_gain_red` that takes as an argument the number of bets and returns the net gain in that number of one-dollar bets on red. 

*Hint:* You should use your `wheel_w_red` table within your function definition.

In [None]:
def net_gain_red(n):
    ...

# Run this cell a few times to observe what happens
net_gain_red(10)

In [None]:
grader.check("p1.4")

#### Part 1.5 (5 pts)


Complete the cell below to simulate the net gain in 200 one-dollar bets on red, repeating the process 10,000 times. We have given you a function to compute the net gain for 200 bets that will be useful in your call to `simulate`.


After the cell is run, `all_gains_red` should be an array with 10,000 entries, each of which is the net gain in 200 one-dollar bets on red. 

*Note:* Running 10,000 trials might take 60-90 seconds.
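
Conceptually, repeating a simulation means calling a zero-argument function many times and collecting the results. The sketch below shows that pattern in plain Python. The names `repeat_trials` and `coin_flip_total` are hypothetical, and this is not the implementation of the course's `simulate` function.

```python
import random

def repeat_trials(make_one_outcome, num_trials):
    """Call make_one_outcome() num_trials times and collect the results."""
    return [make_one_outcome() for _ in range(num_trials)]

def coin_flip_total():
    """One outcome: net gain from 20 fair one-dollar coin-flip bets."""
    return sum(random.choice([1, -1]) for _ in range(20))

outcomes = repeat_trials(coin_flip_total, 1000)
print(len(outcomes), min(outcomes), max(outcomes))
```

The function passed in takes no arguments, which is why a wrapper such as `net_gain_red_200` is useful when the underlying function needs one.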

In [None]:
def net_gain_red_200():
    """Make one outcome for betting on red 200 times"""
    return net_gain_red(200)

all_gains_red = simulate(..., ...)


In [None]:
grader.check("p1.5")

Run the cell below to visualize the results of your simulation. We have plotted a yellow vertical line at x=0 for visual convenience.

In [None]:
gains = Table().with_columns('Net Gain on Red for 200 bets', all_gains_red)
plot = gains.hist(bins = np.arange(-80, 41, 4))
plot.line(0, color='yellow')

<!-- BEGIN QUESTION -->

#### Part 1.6 (5 pts)


 Using the histogram above, decide whether the following statement is true or false:

>If you make 200 one-dollar bets on red, your chance of losing money is more than 50%.

Explain your answer in one or two sentences.


<hr style="margin:0; border: 1px solid #FFBE0A;"/><font color='#FFBE0A'>Written Answer:</font>

_Type your answer here, replacing this text._


<hr style="margin:0; border: 1px solid #FFBE0A;"/>

<!-- END QUESTION -->

### Betting on a Split
If betting on red doesn't seem like a good idea, a gambler might want to try a different bet. A bet on a *split* is a bet on two consecutive numbers such as 5 and 6. This bet pays 17 to 1. That means if you place a one-dollar bet on the split 5 and 6, then:

- If the winning pocket is either 5 or 6, your gain is 17 dollars.
- If any other pocket wins, you lose your dollar, so your gain is -1 dollars.

#### Part 1.7 (5 pts)


Define a function `dollar_bet_on_split` that takes as an argument the pocket number that the ball lands on and returns your gain in dollars if that pocket won and you had bet one dollar on the 5-6 split.

*Hint:* Remember that the pockets are represented as strings, such as `'5'` and `'6'`.

In [None]:
def dollar_bet_on_split(pocket):
    ...


In [None]:
grader.check("p1.7")

#### Part 1.8 (5 pts)


Add a column labeled `Winnings for Split` to the `wheel` table and assign this to the variable `wheel_w_split`. For each pocket (each row in the `wheel` table), the column `Winnings for Split` should contain your gain in dollars if that pocket won and you had bet one dollar on the 5-6 split. 

In [None]:
split_winnings = ...
wheel_w_split = ...

wheel_w_split

In [None]:
grader.check("p1.8")

#### Part 1.9 (5 pts)


 Simulate the net gain in 200 one-dollar bets on the 5-6 split, repeating the process 10,000 times and saving your gains in the array `all_gains_split`.  

*Hints:* 
- Create a helper function `net_gains_split_200()` that makes one outcome for 200 one-dollar bets on the split.
- Your code from Parts 1.4 and 1.5 may be helpful here.

In [None]:
def net_gains_split_200():
    """Make one outcome for betting on the split 200 times"""
    ...

all_gains_split = simulate(..., ...)


In [None]:
grader.check("p1.9")

<!-- BEGIN QUESTION -->

#### Part 1.10 (5 pts)


Run this code to create a histogram of `all_gains_split`:

In [None]:
# Do not change the two lines below
# We have plotted a yellow vertical line at x=0 for visual convenience.
gains = gains.with_columns('Net Gain on Split for 200 bets', all_gains_split)
plot = gains.hist(bins = np.arange(-200, 150, 20))
plot.line(0, color='yellow')

Look carefully at the histograms above and state whether each of the following statements is true or false.  No need to justify the answers.  Just assign `True` or `False` for each of the four statements.

- **Statement 1**: If you bet one dollar 200 times on a split, your chance of losing money is more than 50%.
- **Statement 2**: If you bet one dollar 200 times, your chance of making more than 50 dollars is greater if you bet on a split each time than if you bet on red each time.
- **Statement 3**: If you bet one dollar 200 times, your chance of losing more than 50 dollars is greater if you bet on a split each time than if you bet on red each time.
- **Statement 4**: The empirical distribution for `Net Gain on Red for 200 bets` has a greater variance than that for `Net Gain on Split for 200 bets`. 

In [None]:
statement_1 = ...
statement_2 = ...
statement_3 = ...
statement_4 = ...

In [None]:
grader.check("p1.10")

<!-- END QUESTION -->

If this exercise has put you off playing roulette, it has done its job. If you are still curious about other bets, [here](https://en.wikipedia.org/wiki/Roulette#Bet_odds_table) they all are, and [here](https://en.wikipedia.org/wiki/Roulette#House_edge) is the bad news. The house – that is, the casino – always has an edge over the gambler.
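
The size of that edge can be checked with a short expected-value calculation. This sketch assumes the wheel described earlier (18 red, 18 black, and 2 green pockets) and the payouts given for the two bets. The helper name `expected_gain` is illustrative.

```python
from fractions import Fraction

def expected_gain(winning_pockets, payout, total_pockets=38):
    """Expected gain in dollars for a one-dollar bet."""
    p_win = Fraction(winning_pockets, total_pockets)
    # Win `payout` dollars with probability p_win, lose 1 dollar otherwise.
    return p_win * payout - (1 - p_win)

red_edge = expected_gain(winning_pockets=18, payout=1)
split_edge = expected_gain(winning_pockets=2, payout=17)
print(red_edge, split_edge)
```

Under these assumptions both bets lose 1/19 of a dollar (about 5.3 cents) per dollar bet on average.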

### Simulation of Chance

#### Part 1.11 (5 pts)


Estimate the probability that the color green wins at least once in the first 10 spins via a simulation.  Specifically, repeatedly simulate 10 spins and determine what percent of those trials yield spins containing at least one green.  You should use our `simulate` library function and 10,000 trials.

*Hints:*
- Should you be sampling with or without replacement? 
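
A simulated estimate can be sanity-checked against an exact value. Assuming independent spins with 2 green pockets out of 38, the chance of seeing no green in 10 spins is (36/38)^10, so the chance of at least one green is one minus that. A minimal sketch of the arithmetic:

```python
# Exact probability of at least one green in 10 independent spins,
# assuming 2 of the 38 equally likely pockets are green.
p_not_green = 36 / 38
p_at_least_one_green = 1 - p_not_green ** 10
print(round(p_at_least_one_green, 3))  # roughly 0.418
```

An estimate from 10,000 trials should land close to this value, though it will rarely match it exactly.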

In [None]:
def green_in_ten_spins():
    """
    Make one outcome of spinning 10 times.
    Return True if green occurs at least once, 
    and False if green does not occur in 10 spins.
    """
    ...
    

green_wins = simulate(green_in_ten_spins, 10000)
prob_green_wins = ...
print("Estimated probability: ", prob_green_wins)

In [None]:
grader.check("p1.11")

<hr style="margin-bottom: 0px; padding:0; border: 2px solid #500082;"/>


## 2. Earthquakes (25 pts)



<font color=#B1008E>
    
##### Learning objectives
- Evaluate the quality of different sampling strategies.
- Use simulation and samples to estimate a population statistic.
</font>

The next cell loads a table containing information about **every earthquake with a magnitude above 5** in 2021.

Find out more about this data [here](https://earthquake.usgs.gov/earthquakes/search/).

In [None]:
earthquakes = Table().read_table('earthquakes_2021.csv').select(make_array('time', 'mag', 'place'))
earthquakes

If we were studying all human-detectable 2021 earthquakes and had access to the above data, we'd be in good shape. However, if the USGS didn't publish the full data, we could still learn something about earthquakes from just a smaller sample. If we gathered our sample correctly, we could use that sample to get an idea about the distribution of magnitudes (above 5, of course) throughout the year!

In the following lines of code, we take two different samples from the earthquake table, and calculate the mean of the magnitudes of these earthquakes.

In [None]:
sample1 = earthquakes.sort('mag', descending = True).take(np.arange(100))
sample1_magnitude_mean = np.mean(sample1.column('mag'))
sample2 = earthquakes.take(np.arange(100))
sample2_magnitude_mean = np.mean(sample2.column('mag'))
print('sample1 magnitude mean=', sample1_magnitude_mean)
print('sample2 magnitude mean=', sample2_magnitude_mean)

<!-- BEGIN QUESTION -->

#### Part 2.1 (5 pts)


Are these samples representative of the population of earthquakes in the original table (that is, should we expect each sample mean to be close to the population mean)?

*Hint:* Consider the ordering of the `earthquakes` table and investigate the code that we're using to create Sample 1 and Sample 2.  

<hr style="margin:0; border: 1px solid #FFBE0A;"/><font color='#FFBE0A'>Written Answer:</font>

_Type your answer here, replacing this text._


<hr style="margin:0; border: 1px solid #FFBE0A;"/>

<!-- END QUESTION -->

#### Part 2.2 (5 pts)


Complete the function `sample_earthquake_mags` below to produce a sample of earthquake magnitudes that is representative of the population (this should be an array of just the magnitudes).  

The sample size is dictated by the `sample_size` parameter.  Use your function to create a sample of size 200, and then take the mean of the magnitudes of the earthquakes in this sample. Assign these to `representative_sample` and `representative_mean` respectively.

In [None]:
def sample_earthquake_mags(sample_size):
    ...

representative_sample = ...
representative_mean = ...
representative_mean

In [None]:
grader.check("p2.2")

#### Part 2.3 (5 pts)


 Suppose we want to figure out what the greatest magnitude earthquake was in 2021, but we only have our representative sample of 200. Let’s see if trying to find the greatest magnitude in the population from a random sample of 200 is a good way to estimate the greatest magnitude.

Create 5,000 random samples of 200 earthquake magnitudes from the `earthquakes` table and compute the maximum of each sample. This requires only one line if you utilize the `simulate_sample_statistic` function.
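
The idea of repeatedly sampling and recording a statistic can be sketched in plain Python. The function `sample_statistics` and the synthetic `population` below are hypothetical stand-ins, not the course's `simulate_sample_statistic` or the earthquake data, and the sketch assumes sampling without replacement.

```python
import random

def sample_statistics(population, sample_size, statistic, num_trials, seed=0):
    """Draw num_trials random samples and return the statistic of each."""
    rng = random.Random(seed)
    results = []
    for _ in range(num_trials):
        # random.sample draws without replacement from the population.
        sample = rng.sample(population, sample_size)
        results.append(statistic(sample))
    return results

# Synthetic stand-in population: 1,000 made-up values between 5 and 8.
rng = random.Random(1)
population = [round(rng.uniform(5, 8), 1) for _ in range(1000)]

sample_maxima = sample_statistics(population, 200, max, 50)
print(max(population), min(sample_maxima), max(sample_maxima))
```

Passing the statistic as a function (here the built-in `max`) keeps the sampling loop reusable for means, medians, or any other summary.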

In [None]:
maximums = ...

In [None]:
grader.check("p2.3")

In [None]:
# Don't change this line 
# Plots the histogram of your maximums
Table().with_column('Largest magnitude in sample', maximums).hist('Largest magnitude in sample', bins=np.arange(5.5,9,0.25)) 

#### Part 2.4 (5 pts)


 Now find the magnitude of the actual strongest earthquake in 2021 (not the maximum of a sample). This will help us determine whether a random sample of size 200 is likely to help you determine the largest magnitude earthquake in the population.

In [None]:
strongest_earthquake_magnitude = ...
strongest_earthquake_magnitude

In [None]:
grader.check("p2.4")

<!-- BEGIN QUESTION -->

#### Part 2.5 (5 pts)


Explain whether you believe you can accurately use a sample size of 200 to determine the maximum. What is one problem with using the maximum as your estimator? Use the histogram  to help answer.  We've repeated it below, along with a red point for the strongest earthquake in the whole data set.

In [None]:
largest_in_sample = Table().with_column('Largest magnitude in sample', maximums)
plot = largest_in_sample.hist('Largest magnitude in sample', bins=np.arange(5.5,9,0.25)) 
plot.dot(strongest_earthquake_magnitude)

<hr style="margin:0; border: 1px solid #FFBE0A;"/><font color='#FFBE0A'>Written Answer:</font>

_Type your answer here, replacing this text._


<hr style="margin:0; border: 1px solid #FFBE0A;"/>

<!-- END QUESTION -->

<hr style="margin-bottom: 0px; padding:0; border: 2px solid #500082;"/>


## 3. Uploading and Cleaning Data (15 pts)



<font color=#B1008E>
    
##### Learning objectives
- Learn how to load a .csv file from your local machine into Jupyter 
- Learn how to import a new library and use its data cleaning operations
</font>

So far in this course, we have given you "clean" data, that is, data that is already in a format that you can use immediately. We've also loaded it into these notebooks for you.  However, when you apply what you've learned in this class to real datasets for projects of your own or in the real world, you will need to load data and be able to do basic checks and fixes to your datasets. While you can do this manually in Excel or another editor, it is far less error prone (and more repeatable!) to do this to your data in your Python code.

#### Part 3.1 (5 pts)


1. Download [finch_beaks_1975_dirty.csv](https://www.cs.williams.edu/~cs104/_static/finch_beaks_1975_dirty.csv) to your local machine by clicking on the link in this sentence.  
1. Open it with a text editor and inspect its contents.  You will find a CSV file containing a slightly modified version of our finch data from 1975.
1. Upload this file to Jupyter.
   - **Step 1:**  The top left corner of your Jupyter window should look something like the following:

       ![](file-upload.png)
       
       If you do not see `/.../labs/lab06/` next to the small folder icon, you will need to navigate to your `lab06` folder on the server.  To do that, click on that small folder icon, then select `cs104-public`, then `labs`, then `lab06`.
   - **Step 2:** Once you have verified you are in the right folder, click the upload button (highlighted in red in the image).  Select `finch_beaks_1975_dirty.csv` from your hard drive and upload it.  It should appear in the list of files, and if you click on it, you should see its contents in tabular form inside the Jupyter window.


In [None]:
grader.check("p3.1")

#### Part 3.2 (5 pts)


Load the `finch_beaks_1975_dirty.csv` file into a `Table` using `read_table()` as usual.  You may notice some bad values in the table.  More on this below!

In [None]:
finches = ...
finches.show(10)

In [None]:
grader.check("p3.2")

#### Data cleaning

If you inspect the first 10 rows of our `finches` table above, you'll notice some of the values are invalid: they may be `nan` to represent missing data, or entries with types other than the expected type, such as 12.5 in the `band` column, which should contain only `int`s.

To tidy up and clean this table, we will remove any rows with invalid data and ensure all remaining values are of the correct type.  

(There are other strategies for dealing with missing data (e.g. [imputation](https://en.wikipedia.org/wiki/Imputation_(statistics))) but we'll stick to this simple strategy of filtering out missing/bad data here.)

Let's look at the data in the `'Beak length, mm'` column to illustrate some of the steps we often must take to clean up a data set before we can use it.  That column should contain `float` numbers.

In [None]:
beak_lengths = finches.column('Beak length, mm')
beak_lengths

There are three problems with the data in this column:
1. **`'nan'` values.**  When `Table()` loads in a .csv file, it records missing data as `'nan'` which has type `str`.  We should remove rows with `'nan'`.  
2. **Non-numeric values.**  There are other values that do not represent numbers, such as the string `'missing'`.  Again, we'd like to remove rows containing anything that isn't a valid number.
3. **Wrong types.**  Because such bad values for beak lengths were encountered while loading the CSV, all the entries in the column have been left as *strings*, with quotes around them, e.g., `'9.4'`.  We won't be able to perform arithmetic on any beak length without first converting it to `float`.

The first two problems could be addressed using `where`, as we have done in the past, but ensuring that we have the type of value we are expecting can be a bit tricky.  To that end, you will find two additional Table methods handy:

* `table.take_clean(column, type)`: Takes a column label, and the type of data expected in that column (`int`, `float`, `str`, or `bool`).  Returns a new table where any rows with missing or bad values in that column have been removed.
* `table.take_messy(column, type)`: Takes a column label, and the type of data expected in that column (`int`, `float`, `str`, or `bool`).  Returns a new table with rows from `table` with missing or bad values in that column.
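
To make the notion of a "bad value" concrete, here is a plain-Python sketch of one way to partition raw strings by whether they convert to an expected type. The helper `split_clean_messy` is hypothetical and is not how `take_clean` or `take_messy` are implemented. It only illustrates the idea, including the assumption that `'nan'` should count as bad.

```python
import math

def split_clean_messy(values, expected_type):
    """Partition raw string values by whether they convert to expected_type."""
    clean, messy = [], []
    for raw in values:
        try:
            converted = expected_type(raw)
        except (TypeError, ValueError):
            messy.append(raw)
            continue
        # float('nan') converts without error, so reject it explicitly.
        if isinstance(converted, float) and math.isnan(converted):
            messy.append(raw)
        else:
            clean.append(converted)
    return clean, messy

clean, messy = split_clean_messy(['9.4', 'nan', 'missing', '10.25'], float)
print(clean)   # [9.4, 10.25]
print(messy)   # ['nan', 'missing']
```

Note that the clean values come back converted to `float`, which addresses the wrong-type problem at the same time as the filtering.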


We revisit our beak lengths, this time cleaning that column to remove bad data:

In [None]:
tidy_beaks = finches.take_clean('Beak length, mm', float)
tidy_beaks

You can see from the output above that our cleaning algorithm found 5 bad rows, corresponding to the 'nan' and 'missing' values.  And if you inspect that beak lengths column as an array, you will find that it contains `float` values as we want:

In [None]:
tidy_beaks.column('Beak length, mm')

We can also use our library to see what rows were removed because they had beak lengths that could not be converted to valid values of type `float`:

In [None]:
finches.take_messy('Beak length, mm', float)

Looking at the bad rows can give you confidence that you are indeed rejecting only bad entries, and that you are not throwing away too much of your data.

Another task you must do while cleaning data is fixing any small inconsistencies in the names used for a categorical variable.  You may have noticed, for example, that the bird with band 307 was listed as having the species "for" rather than "fortis".

In [None]:
tidy_beaks.where("band", are.equal_to("307"))

If we know that those are indeed the same species, we can make them consistent with another `Table` method, [replace](https://cs104williams.github.io/assets/python-library-ref.html#replace).  Here's an example of how we use it.  Notice how the resulting table has the full "fortis" species name for bird 307 now.

In [None]:
tidy_beaks.replace('species', 'for', 'fortis')

#### Part 3.3 (5 pts)


Use our library functions to clean up our finch data by cleaning each column in turn.  You should use our library routines to ensure that the `tidy_finches` table contains rows for which: 
- The `band` column is always an `int`
- The `species` is always a string
- And the beak length and depth columns are always `float`

Also, replace any species value "for" with "fortis".  You will likely want to add several additional lines of code.

In [None]:
tidy_finches = finches.take_clean('Beak length, mm', float, ...)
...
...

# Do not modify. This is helpful to see how many rows were removed 
rows_removed = finches.num_rows - tidy_finches.num_rows
print('\nRemoved a total of', rows_removed, 'rows.\n')
tidy_finches.show(10)

In [None]:
grader.check("p3.3")

<hr style="margin-bottom: 0px; padding:0; border: 2px solid #500082;"/>


## 4. Find New Data to Explore (25 pts)



<font color=#B1008E>
    
##### Learning objectives
- Practice looking for new sources of data
- Apply all steps of our "data science pipeline" -- finding, uploading, cleaning, manipulating, and visualizing -- to a dataset of your choice.
</font>

For this question, we'd like you to find a data set of your own, from any public source, that you'd like to explore.  The web is full of many excellent troves of data -- look around, but if you have trouble finding something appropriate you can try one of the sites below.  For best results, we suggest you find a csv file suitable for uploading and using without a lot of work.  (Google sheets and Excel documents can be exported to csv if you have something in one of those formats.)

For your data set, please write code to load it into a Table.  Then, write a few lines of code to examine some aspect of the data, generate a histogram of one variable or another plot to visualize the relationship between two variables, and make one concrete statement about what the data shows.

This question is intentionally open ended, and you may take it in any direction you like.  We divide your work into the following four steps to make it more manageable:

1. Upload your csv data file and read it into a table.
2. Clean each column of your table using the table operations like `take_clean` and `replace` that we practiced in the previous question.  Even if your table is in perfect shape, please still practice this step, as you never know whether the next version of the data will still be clean.
3. Explore the data by writing some code to transform your table (if necessary) and create a visualization (either a histogram or another plot).
4. Tell us what you can conclude from your work, and how the data and visualization supports your conclusions.
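
The first three steps can be sketched end to end with only the Python standard library. Everything in this example is hypothetical, including the inline CSV text, the column names, and the helper `is_float`. In the lab itself you would use `Table.read_table`, `take_clean`, and the plotting methods instead.

```python
import csv
import io
import statistics

# Hypothetical CSV text standing in for an uploaded file.
raw_csv = """city,high_temp_f
Springfield,71.5
Shelbyville,missing
Ogdenville,68.0
"""

# Step 1: read the rows.
rows = list(csv.DictReader(io.StringIO(raw_csv)))

# Step 2: keep only rows whose temperature parses as a float.
def is_float(text):
    try:
        float(text)
        return True
    except ValueError:
        return False

clean_rows = [row for row in rows if is_float(row['high_temp_f'])]

# Step 3: compute a simple summary to explore the cleaned column.
temps = [float(row['high_temp_f']) for row in clean_rows]
print(len(rows), len(clean_rows), statistics.mean(temps))
```

Comparing the row counts before and after cleaning is a quick check that you are not discarding too much of the data.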

If you are having trouble getting started, here are a few csv's and places to look for data:
* [Google's Dataset Search](https://datasetsearch.research.google.com/)
* [Five-Thirty Eight](https://data.fivethirtyeight.com/)
* [Data Is Plural](https://docs.google.com/spreadsheets/d/1wZhPLMCHKJvwOkP4juclhjFgqIY8fQFMemwKL2c64vk/edit#gid=0)
* [Our World in Data](https://ourworldindata.org/)


*Note:* We give you one code cell for your Python code below.  You may add as many additional cells as you like.  Just click the `+` sign in this tab's toolbar to insert a new cell.  Then select "Markdown" from the toolbar's popup menu if you want the new cell to be for text, or "Code" if you want it to be for Python code.  Markdown cells can include basic formatting.  Click on any of our formatted text cells to see how to create lists, bold text, etc., or have a look [here](https://www.markdownguide.org/basic-syntax).

<!-- BEGIN QUESTION -->

#### Part 4.1 (5 pts)


Tell us the source of your data, e.g., a specific URL or any other information that would help us find it on our own.

<hr style="margin:0; border: 1px solid #FFBE0A;"/><font color='#FFBE0A'>Written Answer:</font>

_Type your answer here, replacing this text._


<hr style="margin:0; border: 1px solid #FFBE0A;"/>

<!-- END QUESTION -->

<!-- BEGIN QUESTION -->

#### Part 4.2 (15 pts)


In the cell below, write the code to tackle steps 1-3 described above.

In [None]:
# 1. Read data into table and show the first few lines.

...

# 2. Clean the data and show the first few lines again

...

# 3. Explore the data and present at least one visualization.

...

<!-- END QUESTION -->

<!-- BEGIN QUESTION -->

#### Part 4.3 (5 pts)


In the cell below, state a quantitative property you discovered for the data set you examined in Part 4.2.  Write a couple of sentences stating your conclusion and how your work in Part 4.2 supports that conclusion.

<hr style="margin:0; border: 1px solid #FFBE0A;"/><font color='#FFBE0A'>Written Answer:</font>

_Type your answer here, replacing this text._


<hr style="margin:0; border: 1px solid #FFBE0A;"/>

<!-- END QUESTION -->

<hr class="m-0" style="border: 3px solid #500082;"/>

# You're Done!
Follow these steps to submit your work:
* Run the tests and verify that they pass as you expect. 
* Choose **Save Notebook** from the **File** menu.
* **Run the final cell** and click the link below to download the zip file. 

Once you have downloaded that file, go to [Gradescope](https://www.gradescope.com/) and submit the zip file to 
the corresponding assignment. For Lab N, the assignment will be called "Lab N Autograder".

Once you have submitted, your Gradescope assignment should show you passing all the tests you passed in your assignment notebook.


## Submission

Make sure you have run all cells in your notebook in order before running the cell below, so that all images/graphs appear in the output. The cell below will generate a zip file for you to submit. **Please save before exporting!**

In [None]:
# Save your notebook first, then run this cell to export your submission.
grader.export(run_tests=True)