# Homework 2: DataFrames, Data Visualization, and Functions

## Due Saturday, October 18th at 11:59PM

Welcome to Homework 2! This week, we will cover DataFrame manipulations, making visualizations, and defining functions. You can find additional help on these topics in [BPD 6, 9-12](https://notes.dsc10.com/01-getting_started/functions-defining.html) in the `babypandas` notes and [CIT 7-7.3](https://inferentialthinking.com/chapters/07/Visualization.html) in the textbook.

### Instructions

Remember to start early and submit often. You are given six slip days throughout the quarter to extend deadlines. See the syllabus for more details. With the exception of using slip days, late work will not be accepted unless you have made special arrangements with your instructor.

**Important**: For homeworks, the `otter` tests don't usually tell you that your answer is correct. More often, they help catch careless mistakes. It's up to you to ensure that your answer is correct. If you're not sure, ask someone (not for the answer, but for some guidance about your approach). These are great questions for office hours (see the schedule on the [Calendar](https://dsc10.com/calendar)) or Campuswire. Directly sharing answers is not okay, but discussing problems with the course staff or with other students is encouraged. 

**Please do not use for-loops for any questions in this homework.** If you don't know what a for-loop is, don't worry – we haven't covered them yet. But if you do know what they are and are wondering why it's not OK to use them, it is because loops in Python are slow, and looping over arrays and DataFrames should usually be avoided.

<font color=red>**🚨 If you create a data visualization that is too cluttered to read or takes more than a few seconds to generate, this is a sign you are doing something wrong. Do not submit code like this, or the Gradescope autograder may fail to grade your entire assignment. It's better to leave a question blank than to submit code that will cause the autograder to fail on your full assignment.**</font>

In [1]:
# Please don't change this cell, but do make sure to run it
import babypandas as bpd
import numpy as np

import matplotlib.pyplot as plt
plt.style.use('seaborn-colorblind')
plt.rcParams['figure.figsize'] = (10, 5)

import otter
grader = otter.Notebook()

## 1. Gotta Catch 'Em All! ✅

<center><img src="./images/pokemon.png" width=400/></center>

Pokémon is an immensely popular video game and animation franchise that originated in Japan in 1996. Pokémon, short for Pocket Monsters, come in a variety of types.  In this problem, we will investigate how attack and defense statistics vary among these types.

The file named `pokedex.csv` in the `data/` directory ([source](https://www.kaggle.com/datasets/cristobalmitchell/pokedex)) has a row for each Pokémon, and the following columns.

|Column|Description|
|------|-----------|
|`'pokedex_number'`|The Pokémon's identification number in an encyclopedia of all Pokémon.|
|`'name'`|The name of the Pokémon.|
|`'type'`|The categorical type of the Pokémon, for example, "normal", "fire", "water". Each Pokémon is limited to one primary type for simplicity.|
|`'hp'`| Hit points. Indicates how much damage a Pokémon can tolerate.|
|`'attack'`|The Pokémon's power for physical moves.|
|`'defense'`|The Pokémon's ability to prevent damage from attacks.|
|`'sp_attack'`|Special attack. The Pokémon's power for special offensive moves.|
|`'sp_defense'`|Special defense. The Pokémon's ability to prevent damage from special attacks.|
|`'is_legendary'`|Indicates whether the Pokémon is legendary. Legendary Pokémon are rare and powerful. 1 means legendary, 0 means not.|
|`'generation'`|A group of Pokémon that are compatible for Pokémon games. Ranges from 1 to 8.|

First, we read the data in as a DataFrame.

In [2]:
pokedex = bpd.read_csv('data/pokedex.csv')
pokedex

Let's explore particular columns to get to know the data a little better. The `.describe()` method gives us some useful information about a column. Try it out on the `'name'` column.

In [3]:
pokedex.get('name').describe()

We learn that this column has 898 values, all of which are unique, and as a result the most frequent name appears only once.

If we use this same method on the `'type'` column, we'll see that although there are 898 entries, only 18 of them are unique. There are many Pokémon with the same `'type'`. The most common `'type'` is `'water'`; there are 123 such Pokémon. 

In [4]:
pokedex.get('type').describe()
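
To make the four numbers in a `.describe()` summary of a text column concrete, here is a minimal sketch in plain Python that computes the same kind of summary (count, number of unique values, most frequent value, and its frequency) for a small made-up list of types. The list `types` and the helper `describe_strings` are hypothetical and are not part of `babypandas`; the real `.describe()` output may be labeled or formatted differently.

```python
from collections import Counter

def describe_strings(values):
    """Summarize a list of strings the way a text column is typically summarized."""
    counts = Counter(values)                  # maps each distinct value to how often it appears
    top, freq = counts.most_common(1)[0]      # the most frequent value and its frequency
    return {'count': len(values), 'unique': len(counts), 'top': top, 'freq': freq}

# Made-up data, not taken from pokedex.csv.
types = ['water', 'fire', 'water', 'grass', 'water', 'fire']
summary = describe_strings(types)
print(summary)
```

When `unique` equals `count`, every value appears exactly once; when `unique` is much smaller than `count`, many rows share the same value.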

**Question 1.1.** Which would be a better choice of index for this dataset, `'name'` or `'type'`? Set the index of `pokedex` to whichever of these two attributes makes more sense.

In [5]:
pokedex = ...
pokedex

In [None]:
grader.check("q1_1")

**Question 1.2.** Assign `weakest_attack` and `weakest_defense` to the names of the weakest Pokémon in terms of attack and defense respectively.

Similarly, assign `strongest_attack` and `strongest_defense` to the names of the strongest Pokémon in terms of attack and defense respectively.

In the case of a tie, choose any one of the equally weakest or equally strongest Pokémon.

In [8]:
weakest_attack = ...
print('Weakest attack:', weakest_attack)

strongest_attack = ...
print('Strongest attack:', strongest_attack)

weakest_defense = ...
print('Weakest defense:', weakest_defense)

strongest_defense = ...
print('Strongest defense:', strongest_defense)

In [None]:
grader.check("q1_2")

**Question 1.3.** Typically at the beginning of a game, the Pokémon trainer (the player) has to make a choice between `'water'`, `'grass'`, and `'fire'` Pokémon. Make a DataFrame named `water_grass_fire` containing only Pokémon of these `'type'`s. All columns of `pokedex` should be included.

In [15]:
water_grass_fire = ...
water_grass_fire

In [None]:
grader.check("q1_3")

**Question 1.4.** Some Pokémon are considered *legendary*, which means they are especially rare or powerful. The column `'is_legendary'` records this information in our data set. 

Create a DataFrame named `legendary_pokemon`, indexed by `'type'` and having one column, called `'num_legendary'`, that contains the number of legendary Pokémon of each `'type'`. Only include `'type'`s that have at least one legendary Pokémon.

***Hint:*** You will need to [drop and rename columns](https://dsc10.com/resources/lectures/lec06/lec06.html#Adjusting-columns-with-.assign,-.drop,-and-.get). Instead of using `.drop`, you may want to use `.get` [with a list](https://dsc10.com/resources/lectures/lec06/lec06.html#Two-ways-to-.get) containing the name of a single column.

In [20]:
legendary_pokemon = ...
legendary_pokemon

In [None]:
grader.check("q1_4")

**Question 1.5.** Notice that the `legendary_pokemon` DataFrame has fewer than 18 rows, though the original data had 18 unique `'type'`s; this means that there are certain `'type'`s that don't have any legendary Pokémon. Determine which `'type'`s don't have any legendary Pokémon, and assign `non_legendary` to an array of these `'type'`s. 

***Hint:*** You will want to group the Pokémon by `'type'`. What aggregation method could you use to identify when there are no legendary Pokémon of a given type?

In [26]:
non_legendary = ...
non_legendary

In [None]:
grader.check("q1_5")

**Question 1.6.** Suppose that as a Pokémon trainer, you want to assemble a strong team of Pokémon of various `'type'`s. Create a DataFrame called `mean_stats`, indexed by `'type'`, that contains the average statistics for Pokémon of each type. `mean_stats` should have five columns: `'hp'`, `'attack'`, `'defense'`, `'sp_attack'`, and `'sp_defense'`.

In [30]:
mean_stats = ...
mean_stats

In [None]:
grader.check("q1_6")

**Question 1.7.** A strong Pokémon is one that has high values for `'hp'`, `'attack'`, `'defense'`, `'sp_attack'`, and `'sp_defense'`. Suppose that you develop a formula to summarize all of these stats into a single number called strength. The strength of a Pokémon is a weighted average of these five stats, where each stat is weighted as follows:

- `hp`: 30%
- `attack`: 20%
- `defense`: 20%
- `sp_attack`: 15%
- `sp_defense`: 15%
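
As a worked example of the weighted average, consider a made-up Pokémon (not one from `pokedex.csv`) with the stats below. The dictionaries `weights` and `example_stats` are names chosen for this illustration only; a function that takes a `'pokedex_number'` would need to look up the real stats in the DataFrame instead.

```python
# Weights from the list above, written as proportions that add up to 1.
weights = {'hp': 0.30, 'attack': 0.20, 'defense': 0.20, 'sp_attack': 0.15, 'sp_defense': 0.15}

# Hypothetical stats for an imaginary Pokémon.
example_stats = {'hp': 50, 'attack': 60, 'defense': 40, 'sp_attack': 80, 'sp_defense': 70}

# 0.30*50 + 0.20*60 + 0.20*40 + 0.15*80 + 0.15*70 = 15 + 12 + 8 + 12 + 10.5
strength = (
    weights['hp'] * example_stats['hp']
    + weights['attack'] * example_stats['attack']
    + weights['defense'] * example_stats['defense']
    + weights['sp_attack'] * example_stats['sp_attack']
    + weights['sp_defense'] * example_stats['sp_defense']
)
print(round(strength, 2))
```

Because the weights add up to 1, the result always lies between the smallest and largest of the five stats.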

Define a function called `calculate_strength` that takes as input the `'pokedex_number'` of a Pokémon and returns its strength, as defined above.

In [35]:
def calculate_strength(number):
    ...

In [None]:
grader.check("q1_7")

**Question 1.8.** Create a DataFrame called `with_strength` that contains all the columns of `pokedex` plus one more called `'strength'`, containing the strength of each Pokémon as defined in the previous question. Order the rows in descending order of `'strength'`.

In [41]:
with_strength = ...
with_strength

In [None]:
grader.check("q1_8")

**Question 1.9.** Considering that Pokémon `generation`s were developed in order, we might wonder how Pokémon have evolved over time. Draw a line plot that shows the trend, across generations, of the median strength of Pokémon from each generation. This kind of plot might help you answer the question "*Are later-generation Pokémon stronger than earlier-generation Pokémon?*"

***Hint:*** You'll have to do some DataFrame manipulation before you can create the line plot.

<!-- BEGIN QUESTION -->

<!--
BEGIN QUESTION
name: q1_9
manual: true
-->

In [45]:
# Create your line plot here.
...

<!-- END QUESTION -->



## 2. As Seen on TV 📺

In this section, we'll work with a dataset from [Kaggle](https://www.kaggle.com/datasets/devanshiipatel/imdb-tv-shows/)  containing information about different TV shows, originally obtained from the Internet Movie Database (IMDb). In the cell below, we load the dataset in as a DataFrame named `tv_shows`. Take some time to look at the data in `tv_shows` to see what information is recorded.

In [46]:
# Run this cell to load the dataset.
tv_shows = bpd.read_csv('data/tv_shows.csv')
tv_shows

**Question 2.1.** If you look at the `'Years'` column in the DataFrame, you'll notice that most of the year ranges are separated by en dashes (`–`). Note that if a show was still on air at the time of data collection, its value in the `'Years'` column will end with an en dash. For example, in the first row of the DataFrame, the value in the `'Years'` column is `'2019–'`, meaning that the show has been airing since 2019. The presence of en dashes indicates that the `'Years'` column contains strings, not ints, since Python never displays ints separated by en dashes.

Complete the implementation of the function `extract_start_year_as_int`, which takes as input a string `v` representing a year range, like the values listed in the DataFrame above, and outputs the start year of `v` as an int. For example, passing in the string `'2016–2018'` to the function should return the int `2016`.

Then, use your function to add a column called `'StartYear'` to the `tv_shows` DataFrame that contains the start year of each show as an integer. Make sure to "save" your changes in the `tv_shows` DataFrame!


**_Hints:_** 
- En dashes (`–`) are **not** the same as hyphens (`-`)! The easiest way to get an en dash is to **copy it from here**: `–`.
- The string method [`.split()`](https://docs.python.org/3/library/stdtypes.html#str.split) will be helpful.
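
To see why the first hint matters, here is a small sketch using made-up page-range strings (not values from `tv_shows`). The variable names are chosen for illustration only. Splitting on a hyphen leaves the string in one piece, because the separator inside the string is an en dash.

```python
# The en dash is Unicode character U+2013; it is not the same character as a hyphen.
en_dash = '\u2013'
hyphen = '-'

page_range = '12' + en_dash + '15'   # a made-up closed range
open_range = '7' + en_dash           # a made-up range with no end

print(en_dash == hyphen)             # the two characters are different
print(page_range.split(hyphen))      # no hyphen in the string, so it stays in one piece
print(page_range.split(en_dash))     # splits into two strings
print(open_range.split(en_dash))     # the second piece is an empty string
```

Note that the pieces returned by `.split` are strings, not ints.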


In [47]:
def extract_start_year_as_int(v):
    ...

In [48]:
tv_shows = ...
tv_shows

In [None]:
grader.check("q2_1")

**Question 2.2.** You've been starting to feel that TV shows are just not as good as they used to be.  Create an appropriate plot that shows the relationship between the `'StartYear'` and the `'Rating'` of each show in `tv_shows`.

In [51]:
# Create your plot here.
...

Unfortunately, this plot is not very informative. Instead, make a plot that shows how the **median** `'Rating'` of all shows from a given `'StartYear'` has changed over time. 

In [52]:
# Create your plot here.
...

Now, use the second plot you made to determine whether TV shows are as good as they used to be. Choose the most accurate statement below and assign an integer from 1 to 3 to the variable `q2_2`.

1. TV shows have been getting worse since 2000.
2. TV shows were getting worse until 2005, and since then, they have improved.
3. TV shows have been getting worse since 2005.

In [53]:
q2_2 = ...
q2_2

In [None]:
grader.check("q2_2")

**Question 2.3.** Assign `most_common_genres` to a DataFrame that contains the five most common genre combinations of TV shows in our dataset, in descending order of popularity. The DataFrame should be indexed by `'Genres'` and have only one column, `'Count'`, which is the number of TV shows of that genre combination.

**_Note:_** For this question, each TV show is associated with one set of genres, which determines its genre combination. For example, `'Action, Adventure, Drama'` is one genre combination. `'Drama'`, by itself, is another genre combination.

In [56]:
most_common_genres = ...
most_common_genres

In [None]:
grader.check("q2_3")

**Question 2.4.** Using the `most_common_genres` DataFrame you created in Question 2.3, create a horizontal bar chart that shows the distribution of TV shows into these five genre combinations. Make sure the bars are sorted such that the most common genre appears as the top-most bar in the bar chart.

<!-- BEGIN QUESTION -->

<!--
BEGIN QUESTION
name: q2_4
manual: true
-->

In [61]:
# Create your bar chart here.
...

<!-- END QUESTION -->



**Question 2.5.** Assign the variable `third_highest` to the genre combination with the third highest average overall `'Rating'` (*not* the third highest frequency), among all genre combinations in `tv_shows`.

Do not manually type out your answer. Use `babypandas` methods to produce the answer.

In [62]:
third_highest = ...
third_highest

In [None]:
grader.check("q2_5")

**Question 2.6.** Create a histogram showing the distribution of `'Rating'` among shows in the `tv_shows` DataFrame.

Remember to set `density=True` since we always use density histograms and `ec='w'` to make the separation of the bars more clear. You don't have to set the `bins` argument.
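
As a reminder of what `density=True` means, the sketch below computes density-histogram bar heights by hand for three hypothetical bins of width 2 and made-up counts. Each bar's height is the proportion of values in the bin divided by the bin width, so the bar areas add up to 1. All names and numbers here are chosen for illustration only and do not come from `tv_shows`.

```python
# Made-up example: 10 values fall into three bins, each of width 2.
total = 10
bin_width = 2
count_a, count_b, count_c = 4, 3, 3   # number of values in each bin

# Height of a bar = (proportion of values in the bin) / (width of the bin).
height_a = count_a / total / bin_width
height_b = count_b / total / bin_width
height_c = count_c / total / bin_width

# Area of a bar = height * width = proportion of values in that bin.
area_a = height_a * bin_width
total_area = (height_a + height_b + height_c) * bin_width

print(height_a, height_b, height_c)
print(area_a, total_area)
```

Reading a density histogram therefore means reading areas: here the first bar has area 0.4, so about 40% of the values fall in the first bin.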

<!-- BEGIN QUESTION -->

<!--
BEGIN QUESTION
name: q2_6
manual: true
-->

In [65]:
# Create your histogram here.
...

<!-- END QUESTION -->

**Question 2.7.** Using the `'StartYear'` column, identify the year between 2001 and 2009 (inclusive) that has the lowest total number of `'Votes'` for TV shows released that year. Assign this year to the variable `lowest_01_09`.

Please make sure to use `babypandas` methods to find your answer; you should not type in the year manually.

<!--
BEGIN QUESTION
name: q2_7
-->

In [66]:
lowest_01_09 = ...
lowest_01_09

In [None]:
grader.check("q2_7")

## 3. The Best Invention of the 20th Century 🍜

Instant ramen was first invented by Momofuku Ando in 1958 to cure hunger during wartime. It started with a single variety made for that purpose, but the instant ramen industry has expanded over the years, and now there are over 100 different kinds of instant ramen. At the turn of the millennium, 2000 Japanese citizens even [ranked](https://abcnews.go.com/International/story?id=81946&page=1) instant ramen as the best invention of the 20th century! Click [here](https://www.cupnoodles-museum.jp/en/osaka_ikeda/) to learn more about the history of this quintessential college meal.

<img src="./images/noodles-lowres-8607.png" width=350/>

We have a [dataset of instant ramen ratings from Kaggle](https://www.kaggle.com/datasets/residentmario/ramen-ratings?resource=download). First, we'll read in the data from a CSV. There is no good index, so we will leave it unset.

In [69]:
ramen_data = bpd.read_csv('data/ramen-rating.csv')
ramen_data

Notice that the `'Country'` column contains a country code. We want to convert these country codes into actual country names that everyone can understand.

We'll use a Python [dictionary](https://www.tutorialspoint.com/python/python_dictionary.htm) to help us with this conversion. A dictionary is a simple way to map a unique key to a value. For example, the dictionary below maps course codes to course names.

In [70]:
dsc_courses = {
    # key: value
    'DSC 10': 'Principles of Data Science',
    'DSC 20': 'Programming and Basic Data Structures for Data Science',
    'DSC 30': 'Data Structures and Algorithms for Data Science',
    'DSC 40A': 'Theoretical Foundations of Data Science I',
    'DSC 40B': 'Theoretical Foundations of Data Science II',
    'DSC 80': 'The Practice and Application of Data Science'
}

We can access the value corresponding to each key using bracket notation.

In [71]:
dsc30_name = dsc_courses['DSC 30']
dsc30_name

Here, `'DSC 30'` is the key and `'Data Structures and Algorithms for Data Science'` is the value.
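
Here is a short sketch of how lookups behave, using a made-up dictionary of airport codes (the name `airport_names` and its contents are for illustration only). Bracket notation only works for keys that are actually in the dictionary; asking for a missing key raises a `KeyError`, so it can help to check membership with `in` when experimenting.

```python
# A made-up dictionary mapping airport codes (keys) to city names (values).
airport_names = {
    'SAN': 'San Diego',
    'LAX': 'Los Angeles',
    'SFO': 'San Francisco'
}

print(airport_names['SAN'])     # look up the value for the key 'SAN'
print('LAX' in airport_names)   # True: 'LAX' is a key
print('JFK' in airport_names)   # False: 'JFK' is not a key, so airport_names['JFK'] would raise a KeyError
print(len(airport_names))       # the number of key-value pairs
```

Keys are matched exactly, so `'san'` and `'SAN'` are different keys.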

Let's use a dictionary to help us with our country code to country name conversion. Below is a dictionary containing country codes as keys and country names as values for each of the countries in our ramen dataset.

In [72]:
# Run this cell, DO NOT change it.
country_codes = {
    'AU':'Australia',
    'BD':'Bangladesh', 
    'BR':'Brazil', 
    'KH':'Cambodia' , 
    'CA':'Canada', 
    'CN':'China',
    'CO':'Colombia', 
    'DXB':'Dubai' , 
    'EE':'Estonia' , 
    'FIJI':'Fiji', 
    'FI':'Finland' , 
    'DE':'Germany',
    'GHAN':'Ghana' , 
    'NL':'Holland', 
    'HK':'Hong Kong', 
    'HU':'Hungary', 
    'IN':'India', 
    'ID':'Indonesia',
    'JP':'Japan', 
    'MY':'Malaysia', 
    'MX':'Mexico', 
    'MM':'Myanmar', 
    'NP':'Nepal', 
    'AN':'Netherlands',
    'NG':'Nigeria', 
    'PK':'Pakistan', 
    'PH':'Philippines', 
    'PL':'Poland', 
    'SWK':'Sarawak',
    'SG':'Singapore', 
    'KOR':'South Korea', 
    'SE':'Sweden', 
    'TW':'Taiwan', 
    'TH':'Thailand', 
    'UK' :'United Kingdom' ,
    'USA':'United States', 
    'VN':'Vietnam' 
    }

**Question 3.1.** Using the dictionary `country_codes`, define a function named `code_to_country` that takes as input a country code and returns the corresponding country's name. This should only take one line of code.

_*Hints*_: 
- If you're stuck, take a look at the DSC 30 example above.
- Once you've implemented `code_to_country`, you should verify that it works as intended by trying a few examples yourself. The provided tests will **not** do this for you.

In [73]:
def code_to_country(code):
    ...

In [None]:
grader.check("q3_1")

**Question 3.2.** Use your `code_to_country` function and the `.apply` method to convert all of the country codes in the `'Country'` column of `ramen_data` into country names. Do this without creating an additional column or reordering the existing columns. Assign the resulting DataFrame to the variable name `ramen`.

***Hint:*** `.assign` can be used to replace values in a column, if the column name used corresponds to a column that already exists in the DataFrame.

In [78]:
ramen = ...
ramen

In [None]:
grader.check("q3_2")

🚨 **Important**: For the rest of the questions in this section, use the DataFrame `ramen` instead of `ramen_data`.

**Question 3.3.** 
Define a function named `word_count` that returns the number of words in a ramen's `'Variety'`. It should take as input a string from the `'Variety'` column and return the number of words in that string. We'll consider a piece of text to be a word if and only if it is separated from adjacent words by a space. 
For example:
- `word_count('Cup Noodles Chicken Vegetable')` should return 4.
- `word_count('Tonkotsu-Shoyu Rich Pork Flavor Ramen')` should return 5. Notice that `'Tonkotsu-Shoyu'` counts as one word.

_*Hint*_: The string method [`.split`](https://docs.python.org/3/library/stdtypes.html#str.split) will be helpful.

In [84]:
def word_count(variety):
    ...
    
# Test cases for your own reference. Feel free to test out more!
print(word_count('Cup Noodles Chicken Vegetable'))  # Should print 4
print(word_count('Tonkotsu-Shoyu Rich Pork Flavor Ramen')) # Should print 5

In [None]:
grader.check("q3_3")

**Question 3.4.** Create a DataFrame called `with_word_count` with the columns `'Brand'`, `'Country'`, `'Stars'`, `'Style'`, and `'Variety'`, in that order from left to right, followed by a new column `'Word_Count'` that has the word count for each variety. Sort the DataFrame in descending order of `'Word_Count'`.

_*Note*_: The `'Country'` column should have full country names, not codes.

In [89]:
with_word_count = ...
with_word_count

In [None]:
grader.check("q3_4")

**Question 3.5.** Among the ramen from `'Japan'`, how many words does the longest ramen `'Variety'` have? Assign this number to `most_ramen_words`. How many words does the shortest Japanese ramen `'Variety'` have? Assign this number to `fewest_ramen_words`. What is the absolute difference between these values? Assign this number to `range_ramen_words`.

In [95]:
most_ramen_words = ...
fewest_ramen_words = ...
range_ramen_words = ...

print('Most ramen words:', most_ramen_words)
print('Fewest ramen words:', fewest_ramen_words)
print('Range of ramen words:', range_ramen_words)

In [None]:
grader.check("q3_5")

**Question 3.6.** Create a function named `mean_stars` that takes as input the name of a ramen brand and returns the average `'Stars'` for all ramen belonging to that brand.

In [100]:
def mean_stars(brand):
    ...

In [None]:
grader.check("q3_6")

**Question 3.7.** Create a horizontal bar chart that displays the mean word count for all ramen brands that have **more than 10 varieties**. Sort the bars so the brands whose varieties have the most words on average appear at the very top, and those with the fewest words on average appear at the bottom.

_*Hints*_: 
- If you use `.groupby` more than once on the same DataFrame, the order of rows will be the same, even with different aggregation methods, as long as the column you group by is the same.
- To get the bar chart to display nicely, try adjusting the optional `figsize` argument.

<!-- BEGIN QUESTION -->

<!--
BEGIN QUESTION
name: q3_7
manual: true
-->

In [105]:
# Create your plot here.
...

<!-- END QUESTION -->



**Question 3.8.** Define a function named `point_total` that takes in a full country name and returns a point total for that country's ramen, according to the following scheme:
- 1 point for every variety of ramen that has at least 1 star and less than 3 stars,
- 2 points for every variety with at least 3 stars and less than 4 stars, and
- 3 points for every variety with at least 4 stars (and at most 5 stars, which is the maximum possible).

|Points Received | Stars (Condition)| 
| --- | --- | 
|1| $[1,3)$|
|2| $[3, 4)$ | 
|3| $[4,5]$ |
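
As a worked example of the scheme, suppose a made-up country (not one from the dataset) has the counts of varieties shown below. The variable names are hypothetical, and treating varieties below 1 star as worth 0 points is an assumption based on the table listing no points for them.

```python
# Hypothetical counts of ramen varieties in each star range for an imaginary country.
num_below_1 = 1     # stars in [0, 1): assumed to earn no points
num_1_to_3 = 2      # stars in [1, 3): 1 point each
num_3_to_4 = 3      # stars in [3, 4): 2 points each
num_4_to_5 = 4      # stars in [4, 5]: 3 points each

# 1*2 + 2*3 + 3*4 = 2 + 6 + 12
total_points = 1 * num_1_to_3 + 2 * num_3_to_4 + 3 * num_4_to_5
print(total_points)

# Boundary check: a rating of exactly 3 belongs to [3, 4), not to [1, 3).
stars = 3.0
in_one_point_range = (stars >= 1) and (stars < 3)
in_two_point_range = (stars >= 3) and (stars < 4)
print(in_one_point_range, in_two_point_range)
```

A country with no varieties in one of the ranges simply contributes a count of 0 for that range.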

_*Hint*_: Make sure that your function works for countries that don't have varieties of ramen at every possible number of stars. If you aren't able to accomplish this using grouping, try another strategy! Remember, don't use a for-loop. There is a better solution using DataFrame manipulations.

In [106]:
def point_total(country):
    ...

In [None]:
grader.check("q3_8")

**Question 3.9.** Which of the five countries listed below has the **highest** point total, using the points system from Question 3.8? Assign the variable `country` to either 1, 2, 3, 4, or 5 corresponding to your choice.

1. `'Canada'`
1. `'China'`
1. `'Singapore'`
1. `'United States'`
1. `'Vietnam'`



In [112]:
country = ...

In [None]:
grader.check("q3_9")

## 4. Final Stretch 🧘‍♀️

Suppose we have a DataFrame named `data` with two numerical columns, `'x'` and `'y'`. Consider the following scatter plot, which was generated by calling `data.plot(kind='scatter', x='x', y='y')`:

<img src='images/q3_scatter.png' width=400/>

Now consider these two histograms:

<center>
    <table><tr>
        <td><center><b>Histogram A</b><br> <img src='images/q3_hist_one.png' width=400></center> </td>
        <td><center><b>Histogram B</b><br> <img src='images/q3_hist_two.png' width=400></center> </td>
    </tr></table>
</center>

**Question 4.1.** Which of the following lines of code generated Histogram A? Assign either `1`, `2`, `3`, or `4` to `which_code`.

 1. `data.plot(kind='hist', density=False, y='x')`
 2. `data.plot(kind='hist', density=False, y='y')` 
 3. `data.plot(kind='hist', density=True, y='x')`
 4. `data.plot(kind='hist', density=True, y='y')`

In [115]:
which_code = ...

In [None]:
grader.check("q4_1")

**Question 4.2.** Suppose we run this block of code:

```py
new_data = bpd.DataFrame().assign(
    x = data.get('x') / 5,
    y = data.get('y')
)
```
    
We then run 

`new_data.plot(kind='hist', density=True, y='x')`.

How will this histogram look compared to the histogram created by 

`data.plot(kind='hist', density=True, y='x')`, 

assuming both histograms are drawn on the same axes? Assign `histogram_difference` to either 1, 2, 3, or 4, corresponding to your choice.

1. The `new_data` histogram will be wider and taller than the `data` histogram.
2. The `new_data` histogram will be wider and shorter than the `data` histogram.
3. The `new_data` histogram will be narrower and taller than the `data` histogram.
4. The `new_data` histogram will be narrower and shorter than the `data` histogram.

_*Hint*_: Look at the end of [Lecture 7](https://dsc10.com/resources/lectures/lec07/lec07.html#Plotting-overlaid-histograms) for an example of two histograms drawn on the same axes.

In [118]:
histogram_difference = ...

In [None]:
grader.check("q4_2")

**Question 4.3.** Below, we show Histogram B again.

<img src="./images/q3_hist_two.png" width=400/>

What **percent** of values in Histogram B are between -4 (inclusive) and -2 (exclusive)? While we cannot answer this question exactly since we do not know where the bins start and end, we can still approximate the answer. Assign the variable `percent_between` to a number 1 through 5, corresponding to the closest answer.

1. 10% 
2. 13% 
3. 27%
4. 35%
5. 48%

In [121]:
percent_between = ...

In [None]:
grader.check("q4_3")

## Finish Line: Almost there, but make sure to follow the steps below to submit! 🏁

**_Citations:_** Did you use any generative artificial intelligence tools to assist you on this assignment? If so, please state, for each tool you used, the name of the tool (e.g., ChatGPT) and the problem(s) in this assignment where you used the tool for help.

<hr style='color:Maroon;background-color:Maroon;border:0 none; height: 3px;'>

Please cite tools here.

<hr style='color:Maroon;background-color:Maroon;border:0 none; height: 3px;'>

To submit your assignment:

1. Select `Kernel -> Restart & Run All` to ensure that you have executed all cells, including the test cells. 
1. Read through the notebook to make sure all cells ran and all tests passed.
1. Run the cell below to run all tests, and make sure that they all pass.
1. Download your notebook using `File -> Download as -> Notebook (.ipynb)`, then upload your notebook to Gradescope.
1. Stick around while the Gradescope autograder grades your work. Make sure you see that all tests have passed on Gradescope.
1. Check that you have a confirmation email from Gradescope and save it as proof of your submission. 

With homeworks, unlike with labs, the grade you see on Gradescope is **not your final score**. We will run correctness tests after the assignment's due date has passed.

In [124]:
grader.check_all()