In [None]:
# Initialize Otter
import otter
grader = otter.Notebook("hw03.ipynb")

<img src="https://github.com/data-6-berkeley/materials-fa24/blob/main/hw/hw03/data6.png?raw=true" width="150px" align="right">

# Homework 3 – Advanced Table Methods

## Data 6 Visualizations Module

In this homework assignment, you will exercise your newfound table manipulation skills.

This homework is due on **(TWO WEEKS AFTER RELEASE)**. You must submit the assignment to Gradescope. Submission instructions can be found at the bottom of this notebook. See the [syllabus](http://data6.org/su24/syllabus/#late-policy-and-extensions) for our late submission policy.

**Note:** Unlike the previous two homework assignments, most questions in this assignment will depend on all previous work. As such, it's in your best interest to work through the questions sequentially.

In [None]:
# Run this cell.
from datascience import *
import numpy as np
import matplotlib.pyplot as plt
plt.style.use("ggplot")
%matplotlib inline

import warnings
warnings.simplefilter('ignore')

<hr style="border: 5px solid #003262;" />
<hr style="border: 1px solid #fdb515;" />

# Part 1: UC Berkeley Admissions

<br></br>
<hr style="border: 1px solid #fdb515;" />

## Understanding the data

In this part of the homework, we will ask and answer questions about UC Berkeley's undergraduate admissions numbers for the class that entered in Fall 2020. The data we'll work with in this question comes from [this public webpage](https://www.universityofcalifornia.edu/infocenter/admissions-source-school).

Run the cell below to load in our data as a table.

In [None]:
schools = Table.read_table('enrollment.csv')
schools

Each row corresponds to a high school. For each high school, we have the following information:
- `'Name'`: The name of the high school. Note, this is not unique – for instance, the top three rows of our table correspond to three different high schools all with the name `'ABRAHAM LINCOLN HIGH SCHOOL'`; one is in Los Angeles, one is in San Francisco, and one is in San Jose.
- `'City'`: The city in which the high school is. Note, only schools within the US have a valid `'City'` listed; international schools have a city of `'nan'`. (`'nan'` means "missing value".) 
- `'Region'`: The county in which the high school is if the high school is in California, or the state in which the high school is if the high school is elsewhere in the US (see `'ADLAI E STEVENSON HIGH SCHOOL'` above). Again, if the high school is not within the US, `'Region'` is `'nan'`.
- `'Applied'`: The number of students who applied to UC Berkeley from that high school for admission in Fall 2020.
- `'Admitted'`: The number of students who were admitted to UC Berkeley from that high school for admission in Fall 2020.
- `'Enrolled'`: The number of students who actually chose to attend UC Berkeley from that high school starting in Fall 2020.

**Note:** It's a good idea to have the [Python Reference Sheet](https://data6.org/su24/reference/) open while working on the assignment in the event you have any questions.

You can also easily see the documentation for a function by either:
- typing the name of the function on a new line, followed by a `?`, and running the cell
- typing the name of the function anywhere in a code cell and hitting `Shift + Tab` on your keyboard

Try it out below! Just add a `?` to the end of the line.

In [None]:
Table.where

<br></br>
<hr style="border: 1px solid #fdb515;" />

# Question 1 – Key Numbers

<br></br>

---
## Question 1a – How many students were admitted?

Suppose we're interested in determining the number of students who *applied* to UC Berkeley. We can calculate that number by finding the sum of the `'Applied'` column in our dataset like so:

In [None]:
# This is just an example
sum(schools.column('Applied'))

**Task**: Below, assign the variable `num_admitted` to an integer corresponding to the number of students who were *admitted* to UC Berkeley in our dataset.

_Hint_: Do something similar to the example above.


In [None]:
num_admitted = ...
num_admitted

In [None]:
grader.check("q1a")

---
## Question 1b – What was the overall acceptance rate?

Below, assign the variable `overall_acceptance_rate` to a float corresponding to the proportion of students who applied to UC Berkeley that were admitted.

_Hint_: Use `num_admitted` along with the example that came right before it.


In [None]:
overall_acceptance_rate = ...
overall_acceptance_rate

In [None]:
grader.check("q1b")

<!-- BEGIN QUESTION -->

---
## Question 1c – Wait... what?

In **Question 1a**, you computed the number of students that UC Berkeley admitted for enrollment in Fall 2020. Scroll back up to Question 1a to look at that number, and then come back to this question.

Strangely, this [news.berkeley.edu](https://news.berkeley.edu/2020/07/16/uc-berkeleys-push-for-more-diversity-shows-in-its-newly-admitted-class/) article from July 2020 states

> Overall, UC Berkeley admitted 14,668 students as freshmen in 2019 and 15,435 for fall 2020. The admit rate remains the same as last year, at 15%.

The number that you computed in **Question 1a** is much smaller than the 15,435 figure that this article provides. But both are official University of California sources. What's going on here?

In the cell below, write a short answer to the question "**Why is the number of admitted students in our dataset less than the true number of admitted students?**" To find your answer, go to the [UC site where we got this data from](https://www.universityofcalifornia.edu/infocenter/admissions-source-school) and look for the fine print under the table. You'll find that only schools with a certain number of applicants and admitted students are represented; **your answer must mention those specific thresholds as well as why you think they may have excluded schools who don't meet the thresholds from the dataset.**


_Type your answer here, replacing this text._

<!-- END QUESTION -->

<br></br>
<hr style="border: 1px solid #fdb515;" />

# Question 2 – Which Schools?

Now it's time to answer questions of the form "Which schools \_\_\_\_\_"? In order to proceed, you'll need to make sure you're familiar with selecting/dropping, table sorting, and element-wise array operations.

<br></br>

---
## Question 2a – Removing columns

In this section, we're not going to worry about the city where each school is – we'll look at cities in the next section. It'll be helpful to keep around the `'Region'` column just so that we can see at a glance if a school is in-state, domestic, or international. We also need it to tell apart the three `'ABRAHAM LINCOLN HIGH SCHOOL'`s!

**Task**: Assign `schools_stats` to a new table that contains all of the columns in `schools` except for `'City'`.


In [None]:
schools_stats = ...
schools_stats

In [None]:
grader.check("q2a")

---
## Question 2b – Which school sent the most students?

The value in the `'Enrolled'` column for each high school is the number of students they sent to UC Berkeley.

Below, assign `feeders` to a table with the same columns as `schools_stats`, but with **only the 14 high schools that sent the most students to UC Berkeley**, sorted in descending order by number of enrolled students. The first five rows of your table should look like this:

| Name                         | Region        |   Applied |   Admitted |   Enrolled |
|-----------------------------:|--------------:|----------:|-----------:|-----------:|
| LOWELL HIGH SCHOOL           | San Francisco |       435 |        106 |         64 |
| IRVINGTON HIGH SCHOOL        | Alameda       |       248 |         63 |         47 |
| DOUGHERTY VALLEY HIGH SCHOOL | Contra Costa  |       430 |         78 |         39 |
| CANYON CREST ACADEMY         | San Diego     |       269 |         66 |         38 |
| PORTOLA HIGH SCHOOL          | Orange        |       175 |         57 |         30 |

_Hint_: Use the `sort` and `take` table methods.


In [None]:
feeders = ...
feeders.show()

In [None]:
grader.check("q2b")

---
## Question 2c – What was the acceptance rate of each school?

Right now we have the number of students who applied, were admitted, and actually enrolled from each school. We don't have the acceptance rate of students at each school, but we can easily figure that out using some array operations!

Below, assign `schools_stats_acc` to a table with the same five columns as `schools_stats` plus an additional sixth column. This sixth column should have the label `'Acceptance Rate'`, and its values should be the acceptance rates of each school, each as a decimal between 0 (no students were admitted) and 1 (all students were admitted).

There are several steps involved:
- First, create an array containing the acceptance rates for each school. This should be done in one line; remember that each column in a table is an array, and that if you divide two arrays, the division is performed element-wise (as we saw in Week 1).
- Then, use the `with_columns` method to add an `'Acceptance Rate'` column to `schools_stats`, using the array you just created. Store your result in the table `schools_stats_acc`. The `schools_stats` table should not change!
- **Note**: Unlike in the previous question, you aren't supposed to sort or take the top 14.
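
As a minimal sketch of element-wise division, here is a toy example that uses plain NumPy arrays with made-up numbers. The names `applied_toy`, `admitted_toy`, and `rates_toy` are hypothetical and are not part of the homework data.

```python
import numpy as np

# Made-up counts for three hypothetical schools (not from enrollment.csv)
applied_toy = np.array([10, 40, 25])
admitted_toy = np.array([5, 10, 5])

# Dividing two arrays of the same length divides them element by element
rates_toy = admitted_toy / applied_toy
print(rates_toy)
```

Each entry of `rates_toy` is the corresponding entry of `admitted_toy` divided by the matching entry of `applied_toy`.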

The first few rows of your table should look like this:

| Name                        | Region        |   Applied |   Admitted |   Enrolled |   Acceptance Rate |
|----------------------------:|--------------:|----------:|-----------:|-----------:|------------------:|
| ABRAHAM LINCOLN HIGH SCHOOL | Los Angeles   |        17 |          6 |          3 |          0.352941 |
| ABRAHAM LINCOLN HIGH SCHOOL | San Francisco |       106 |         21 |         14 |          0.198113 |
| ABRAHAM LINCOLN HIGH SCHOOL | Santa Clara   |        48 |         10 |          4 |          0.208333 |
| ACADEMY OF THE CANYONS      | Los Angeles   |        45 |         15 |          6 |          0.333333 |
| ACADEMY-SAN FRAN @ MCATEER  | San Francisco |        19 |          8 |          5 |          0.421053 |



In [None]:
acceptance_rates = ...
schools_stats_acc = ...
schools_stats_acc

In [None]:
grader.check("q2c")

---
## Question 2d – Which schools had the lowest and highest acceptance rates?

Now that we have a table, `schools_stats_acc`, containing the acceptance rate of each school, it's natural to ask which schools had the highest and lowest acceptance rates.

Your job below is to define two **arrays**:
- `top_5_acc`, which contains the **names** of the five schools with the highest acceptance rates, such that the first element of `top_5_acc` has the absolute highest acceptance rate, the second element has the second highest acceptance rate, and so on.
- `bottom_5_acc`, which contains the **names** of the five schools with the lowest acceptance rates, such that the first element of `bottom_5_acc` has the absolute lowest acceptance rate, the second element has the second lowest acceptance rate, and so on.

At some point, you'll need to sort `schools_stats_acc` by acceptance rate. However, how you choose to do that is up to you – you could elect to sort it in both descending and ascending order, or you could just sort it once and be creative with how you use `.take` (which you will need to use regardless).
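
To illustrate the "sort once" idea, here is a small sketch on made-up data using plain NumPy rather than the `Table` methods you will use in your answer. The arrays `names_toy` and `rates_toy` are hypothetical, and the example picks two items from each end instead of five.

```python
import numpy as np

# Made-up labels and rates (not from the homework data)
names_toy = np.array(["A", "B", "C", "D", "E", "F"])
rates_toy = np.array([0.30, 0.10, 0.50, 0.20, 0.60, 0.40])

# Indices that would sort the rates from lowest to highest
order = np.argsort(rates_toy)
sorted_names = names_toy[order]

# The front of the sorted array holds the lowest rates, lowest first
lowest_two = sorted_names[:2]

# Reversing the sorted array puts the highest rates first
highest_two = sorted_names[::-1][:2]

print(lowest_two)
print(highest_two)
```

A single ascending sort is enough here: the lowest values sit at the start, and the highest values can be read off the end in reverse order.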


In [None]:
...

top_5_acc = ...
bottom_5_acc = ...

# Don't change anything below this comment, it's just for visualization
print('Top 5 acceptance rates:')
for school in top_5_acc:
    print(school)

print('----------\nBottom 5 acceptance rates:')
for school in bottom_5_acc:
    print(school)

In [None]:
grader.check("q2d")

<br></br>
<hr style="border: 1px solid #fdb515;" />

# Question 3 – Location

In the last question, we did not use the `'City'` column from `schools`. In this question, we'll bring that information back in. Here, we're going to heavily rely on the `.where` method and the various `are` predicates, so you may want to open the [`.are` documentation](http://data8.org/datascience/predicates.html?highlight=#datascience.predicates.are).

In this question you will use `schools_acc`, with the five original columns in `schools` plus `'Acceptance Rate'` from `schools_stats_acc`.

In [None]:
# Just run this cell
schools_acc = schools.with_columns('Acceptance Rate', schools_stats_acc.column('Acceptance Rate'))
schools_acc

---
## Question 3a – How many schools were in Los Angeles county?

Los Angeles is both the name of a city and a county, and counties correspond to regions in our dataset (at least for California high schools).

Below, assign `num_schools_lac` to the **number** of schools in our dataset that are from Los Angeles county.

_Hint: This involves using `.where` and `.num_rows`._


In [None]:
num_schools_lac = ...
num_schools_lac

In [None]:
grader.check("q3a")

---
## Question 3b – How many students actually enrolled from schools in Los Angeles county?

Below, assign `num_students_lac` to the number of students who enrolled at UC Berkeley from high schools in Los Angeles county. This involves using `.where`.

*Note*: While our solution is only one line, yours doesn't have to be.


In [None]:
num_students_lac = ...
num_students_lac

In [None]:
grader.check("q3b")

---
## Question 3c – Which schools in Los Angeles county sent the most students?

Below, assign `top_lac_schools` to a **table** with the same columns as `schools_acc`, but with **only the 10 high schools in Los Angeles county who sent the most students to UC Berkeley**, sorted in descending order. The first five rows of your table should look like this:

| Name                          | City             | Region      |   Applied |   Admitted |   Enrolled |   Acceptance Rate |
|------------------------------:|-----------------:|------------:|----------:|-----------:|-----------:|------------------:|
| PALISADES CHARTER HIGH SCHOOL | Pacific Palisade | Los Angeles |       221 |         46 |         26 |          0.208145 |
| ARCADIA HIGH SCHOOL           | Arcadia          | Los Angeles |       249 |         55 |         21 |          0.220884 |
| DIAMOND BAR HIGH SCHOOL       | Diamond Bar      | Los Angeles |       264 |         39 |         19 |          0.147727 |
| GRETCHEN WHITNEY HIGH SCHOOL  | Cerritos         | Los Angeles |        86 |         21 |         17 |          0.244186 |
| SANTA MONICA HIGH SCHOOL      | Santa Monica     | Los Angeles |       195 |         38 |         15 |          0.194872 |

*Note*: A high school *sends* a student to Berkeley when that student **enrolls** in the university.

In [None]:
top_lac_schools = ...
top_lac_schools

In [None]:
grader.check("q3c")

---
## Question 3d – Which schools in Alameda county sent more than 20 students?

Below, assign `big_alameda` to a table containing all of the columns of `schools_acc`, but only the rows corresponding to schools in Alameda county that sent more than 20 students to Berkeley. Don't sort.

_Hint_: You can use `.where` multiple times if there are multiple conditions you want to be true; that's what you'll need to do here.


In [None]:
big_alameda = ...
big_alameda

In [None]:
grader.check("q3d")

---
## Question 3e – How many students applied from schools in the Bay Area? 

<img src='https://upload.wikimedia.org/wikipedia/commons/b/bc/Bayarea_map.png' width=400>

The Bay Area consists of the nine counties `'San Francisco'`, `'San Mateo'`, `'Santa Clara'`, `'Alameda'`, `'Contra Costa'`, `'Solano'`, `'Napa'`, `'Sonoma'`, and `'Marin'`.

Below, you have two tasks.
1. Assign `bay_schools` to a table with the same columns as `schools_acc`, but only with rows corresponding to schools in the Bay Area. You should do this by first creating an array of the names of the nine Bay Area counties, and then use `.where` with `are.contained_in` to filter just the relevant rows from `schools_acc`. Don't sort.
2. Assign `bay_acc_rate` to the overall acceptance rate of students from the Bay Area. **This requires a new calculation; you can't just look at the `'Acceptance Rate'` column in your table.**

_Hint_: How did we calculate the overall acceptance rate in Question 1?
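
To see why an overall acceptance rate is not the same as an average of per-school rates, consider this sketch with made-up numbers for two hypothetical schools. None of these names or values come from the homework data.

```python
# Made-up numbers for two hypothetical schools
applied_toy = [100, 10]
admitted_toy = [10, 5]

# Per-school rates, then their unweighted average
per_school = [adm / app for adm, app in zip(admitted_toy, applied_toy)]
average_of_rates = sum(per_school) / len(per_school)

# Overall rate: total admitted divided by total applied
overall_rate = sum(admitted_toy) / sum(applied_toy)

print(average_of_rates)
print(overall_rate)
```

The unweighted average treats the small school and the large school equally, while the overall rate weights each school by how many students applied, so the two numbers differ.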


In [None]:
bay_counties = ...
bay_schools = ...
bay_acc_rate = ...

# Don't change anything below this comment, it's just for visualization
display(bay_schools)
print(f"Bay Area Acceptance Rate: {bay_acc_rate}")

In [None]:
grader.check("q3e")

---
## Question 3f – Which large cities are most successful in admissions? 

Below, you have two tasks.
1. Assign `schools_by_city` to a table with a row for each city in the table. It should have three columns: the City, the Applied Average (the average number of applicants), and the Acceptance Rate Average (the unweighted average acceptance rate in that city). You should do this by first selecting the relevant columns, and then using `.group`.
2. Now that you have the `schools_by_city` table, assign `high_acceptance_city` to the top five cities by acceptance rate that had at least 10 applicants on average.


In [None]:
schools_by_city = ...
high_acceptance_city = ...
high_acceptance_city

In [None]:
grader.check("q3f")

## Part 4: Real-World Risks within Data

Now we are going to look at a very interesting dataset: UC Berkeley's 1973 admissions data! Throughout this part, we will be working with the `UCBerkeley1973_Admission.csv` file.

In [None]:
cal_data  = Table.read_table("UCBerkeley1973_Admission.csv")
cal_data

<!-- BEGIN QUESTION -->

### Question 4.1 (Discussion)
Looking at `cal_data` at first glance, what do you observe? Try exploring the dataset using what you have learned so far in class! Reflect in 2-3 sentences on what you have noticed about the data. Feel free to add or delete cells beyond our given cell.

_Type your answer here, replacing this text._

<!-- END QUESTION -->



In [None]:
# You can use this cell to explore the `cal_data` table

For this case study, we'll be doing some exploration of the rates of admission between males and females at UC Berkeley in 1973. To start off, below we calculate the raw acceptance rates for males and females:

In [None]:
total_f = sum(cal_data.column("Gender") == "F")
total_m = sum(cal_data.column("Gender") == "M")
accepted_f = cal_data.where("Admission", "Accepted").where("Gender", "F").num_rows
accepted_m =  cal_data.where("Admission", "Accepted").where("Gender", "M").num_rows
acceptance_rate_f = accepted_f / total_f * 100
acceptance_rate_m = accepted_m / total_m * 100
print("1973's Berkeley admission rate seems to be: female:", acceptance_rate_f, "and male:", acceptance_rate_m)

Keep these rates in mind as we begin exploring some visualization and table methods to look into an effect called **"Simpson's Paradox"**. 

---
### Simpson's Paradox
According to [Wikipedia](https://en.wikipedia.org/wiki/Simpson%27s_paradox), Simpson's Paradox "is a phenomenon in probability and statistics in which a trend appears in several groups of data but disappears or reverses when the groups are combined."

At first glance at the UC Berkeley admissions dataset, it seems that male applicants have a higher chance of being accepted. However, when we look more closely by major (department), a different story emerges: female applicants were equally or more likely to be accepted when we look at each department. In fact, a [study](https://homepage.stat.uiowa.edu/~mbognar/1030/Bickel-Berkeley.pdf) found that female applicants disproportionately applied to highly competitive majors with low acceptance rates (e.g., English), while men more often applied to less competitive majors with higher acceptance rates. *More female applicants applying to competitive programs skewed the combined data, creating the appearance of bias against female applicants, when in many cases, women were either equally likely or more likely to be admitted compared to men in the same department.*
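
To make the reversal concrete, here is a small sketch with entirely made-up numbers for two hypothetical departments. These are NOT the 1973 Berkeley figures; `toy`, `rate`, and `overall` are names invented for this illustration.

```python
# Made-up (applied, admitted) counts; these are NOT the 1973 Berkeley numbers
toy = {
    "Dept X": {"F": (20, 13), "M": (80, 48)},
    "Dept Y": {"F": (80, 20), "M": (20, 4)},
}

def rate(applied, admitted):
    """Acceptance rate as a proportion between 0 and 1."""
    return admitted / applied

# Within each department, compare the two acceptance rates
for dept, by_gender in toy.items():
    print(dept, "F:", rate(*by_gender["F"]), "M:", rate(*by_gender["M"]))

def overall(gender):
    """Acceptance rate for one gender after combining both departments."""
    applied = sum(toy[d][gender][0] for d in toy)
    admitted = sum(toy[d][gender][1] for d in toy)
    return admitted / applied

overall_f = overall("F")
overall_m = overall("M")
print("Overall F:", overall_f, "Overall M:", overall_m)
```

In this toy data the female rate is higher in both departments, yet the combined female rate is lower, because most of the hypothetical female applicants applied to the department with the lower acceptance rate.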

Now let's see this phenomenon in practice! 

We have manipulated `cal_data` to get the rejected and accepted counts for both female and male applicants in our 1973 admissions data.

### Question 4.2
Using the `.pivot()` method, assign the table `admission_f_m` to one that contains the count of individuals rejected and accepted for both females and males.

*Hint:* Using the `cal_data` table, we want the `"Admission"` column to be our columns, and the `"Gender"` column to be our rows.
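
As a rough picture of what a table of counts by two categories contains, the sketch below tallies made-up (gender, decision) pairs with Python's `collections.Counter`. It does not use `.pivot()`, and `rows_toy` is hypothetical data rather than anything from `cal_data`.

```python
from collections import Counter

# Made-up (gender, decision) pairs; not taken from cal_data
rows_toy = [
    ("F", "Accepted"),
    ("F", "Rejected"),
    ("M", "Accepted"),
    ("M", "Accepted"),
    ("F", "Rejected"),
]

# Count how many times each (gender, decision) combination appears
counts = Counter(rows_toy)

# One row per gender, one column per decision
for gender in ("F", "M"):
    print(gender, counts[(gender, "Accepted")], counts[(gender, "Rejected")])
```

Each printed row corresponds to one gender and each number to one decision, which mirrors the rows-and-columns layout described in the hint.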

In [None]:
admission_f_m = ...
admission_f_m

In [None]:
grader.check("q4_2")

### Question 4.3
We want to manually add a new column called `"Acceptance Rate"` with values of acceptance rates for females and males to the `admission_f_m` table. Below, fill in the code to calculate the female and male acceptance rates, and from there, we supply the code that creates a new column with these rates. Acceptance rate should be calculated as accepted count divided by total count!

*Hint:* You may find it helpful to use the `.column()` and `.item()` methods.

In [None]:
f_acceptance_rate = ...
m_acceptance_rate = ...
acceptance_rates = make_array(f_acceptance_rate, m_acceptance_rate)
admission_f_m = admission_f_m.with_column("Acceptance Rate",  acceptance_rates)
admission_f_m

In [None]:
grader.check("q4_3")

<!-- BEGIN QUESTION -->

### Question 4.4
Using the adjusted `admission_f_m`, create a bar graph comparing the acceptance rate between female and male applicants. If you are stuck, consider taking a look at Lab 3 again!

In [None]:
...

<!-- END QUESTION -->

### Question 4.5
Which of the following most accurately describes the takeaway from the visualization created by `admission_f_m` above? You should answer the question by assigning `q4_5` to `make_array(...)` where `...` is the choice of your answer (e.g., `make_array(3)`).

1. Female applicants have a higher chance of being accepted.
2. Male applicants have a higher chance of being accepted.
3. Female and male applicants have the same chance of being accepted.

In [None]:
q4_5 = ...

In [None]:
grader.check("q4_5")

We see that by simply using `"Gender"` and `"Admission"` to calculate overall acceptance rates for graduate admissions, the male acceptance rate seems to be higher than the female acceptance rate. However, besides `"Gender"` and `"Admission"`, there is a third column in this table: `"Major"`. Let's investigate how major comes into play in the admission rates of these two recorded genders, that is, how "grouping" by major, as described in Simpson's Paradox, affects applicants' acceptance rates by gender.

Below, we create two different tables using more complex functionalities of `.group()` and `.pivot()`. Take a look at the information we have gained using the `"Major"` feature below:

In [None]:
admission_major = cal_data.pivot('Gender', 'Major', collect = lambda x: sum(x == 'Accepted') / len(x), values = 'Admission')
admission_major = admission_major.relabeled(['F', 'M'], ['F Acceptance Rate', 'M Acceptance Rate'])
admission_major

In [None]:
num_applicants = cal_data.group(['Major', 'Gender']).pivot('Gender', 'Major', collect = np.sum, values = 'count')
num_applicants = num_applicants.relabeled(['F', 'M'], ['F Application Count', 'M Application Count'])
num_applicants

#### BREAK FOR FIGURE 1 TEST
This is test code to see if we could create Figure 1 from [this paper](https://homepage.stat.uiowa.edu/~mbognar/1030/Bickel-Berkeley.pdf) using the data we had access to.

In [None]:
percent_admit_test = cal_data.pivot('Admission', 'Major')
percent_admit_array = percent_admit_test.column('Accepted') / (percent_admit_test.column('Accepted') + percent_admit_test.column('Rejected'))
percent_admit_test = percent_admit_test.with_column('Percent Admitted', percent_admit_array)
percent_admit_test

In [None]:
percent_female_array = num_applicants.column('F Application Count') / (num_applicants.column('F Application Count') + num_applicants.column('M Application Count'))
num_applicants_test = num_applicants.with_column('Percent Female Applicants', percent_female_array)
num_applicants_test

In [None]:
fig1_test = percent_admit_test.join('Major', num_applicants_test)
fig1_test = fig1_test.with_column('Number Applicants', fig1_test.column('Accepted') + fig1_test.column('Rejected'))
fig1_test

In [None]:
fig1_test.scatter('Percent Female Applicants', 'Percent Admitted', sizes = 'Number Applicants')

#### BREAK DONE

<!-- BEGIN QUESTION -->

### Question 4.6
Using `admission_major` and `num_applicants`, create two separate overlaid bar charts. The first one will plot the number of applicants of males and females across the six majors, and the second will plot their acceptance rates in the majors.

In [None]:
...
...

<!-- END QUESTION -->

### Question 4.7
Which of the following choices correctly describe *the visualization created above* with `admission_major` and `num_applicants`? You should answer the question by assigning `q4_7` to `make_array(...)` where `...` is your choice of answers (e.g., `make_array(3, 4)`). You may select multiple choices.

1. Among the 6 different majors (not including `"Other"`), female applicants have a slightly higher chance of being accepted than male applicants in several majors.
2. Among the 6 different majors (not including `"Other"`), male applicants have a slightly higher chance of being accepted than female applicants in several majors.
3. Overall, female applicants have a higher chance of getting into UC Berkeley.
4. Overall, male applicants have a higher chance of getting into UC Berkeley.
5. There seem to be **more female applicants** to a major when the **female acceptance rate is lower** than the male acceptance rate.
6. With the visualization above, we can claim there is structural inequity against female applicants.

In [None]:
q4_7 = ...

In [None]:
grader.check("q4_7")

---
### Thinking about Implications 🤔

<!-- BEGIN QUESTION -->

### Question 4.8 (Discussion)
Consider the takeaways that you had after looking at the different visualizations you created in **Questions 4.4 and 4.6**. Based on what we've discussed regarding the various sociological paradigms, how do you think these conclusions best align with the paradigms?

*Hint:* The conclusions might not line up perfectly with any one of the paradigms! Just feel free to reflect on which ones they might line up best or worst with. 

_Type your answer here, replacing this text._

<!-- END QUESTION -->

<!-- BEGIN QUESTION -->

### Question 4.9 (Discussion) 
A media company wants to make a report on UC Berkeley admissions. What would happen if they reported using ONLY the visualization you created in **Question 4.4**? How could this report potentially impact the society we see today?

_Type your answer here, replacing this text._

<!-- END QUESTION -->

<!-- BEGIN QUESTION -->

### Question 4.10 (Final Reflection)

_Type your answer here, replacing this text._

<!-- END QUESTION -->

# Done!

Congrats! You've finished another Data 6 homework assignment! To submit your work, follow the steps outlined on the [submissions](https://data6.org/su24/submissions/) page.

This homework is out of **40 points**. The point breakdown for this assignment is given in the table below:

| **Category** | Points |
| --- | --- |
| Autograder | 33 |
| Written | 7 |
| **Total** | 40 |

---

## Pets of Data 6
Sunkist is living it up! Good job on HW3!

<img src="https://github.com/data-6-berkeley/materials-su24/blob/main/hw/hw03/sunkist.jpg?raw=true" width="40%" alt="Orange cat laying down"/>

## Submission

Below, you will see two cells. Running the first cell will automatically generate a PDF of all questions that need to be manually graded, and running the second cell will automatically generate a zip with your autograded answers. You are responsible for submitting both the coding portion (the zip) and the written portion (the PDF) to their respective Gradescope portals. **Please save before exporting!**

> **Important: You must correctly assign the pages of your PDF after you submit to the correct Gradescope assignment. If your pages are not correctly assigned and/or not in the correct PDF format by the deadline, we reserve the right to award no points for your written work.**

If there are issues with automatically generating the PDF in the first cell, you can try downloading the notebook as a PDF by clicking on `File -> Save and Export Notebook As... -> PDF`. If that doesn't work either, you can manually take screenshots of your answers to the manually graded questions and submit those. Either way, **you are responsible for ensuring your submission follows our requirements; we will NOT be granting regrade requests for submissions that don't follow instructions.**

In [None]:
from otter.export import export_notebook
from os import path
from IPython.display import display, HTML
name = 'hw03'
export_notebook(f"{name}.ipynb", filtering=True, pagebreaks=True)
if(path.exists(f'{name}.pdf')):
    display(HTML(f"Download your PDF <a href='{name}.pdf' download>here</a>."))
else:
    print("\n Pdf generation failed, please try the other methods described above")

## Submission

Make sure you have run all cells in your notebook in order before running the cell below, so that all images/graphs appear in the output. The cell below will generate a zip file for you to submit. **Please save before exporting!**

In [None]:
# Save your notebook first, then run this cell to export your submission.
grader.export(pdf=False, run_tests=True)