In [None]:
# Initialize Otter
import otter
grader = otter.Notebook("HW03.ipynb")


# Homework 3 - Analyzing CULPA

In this assignment, you will scrape and analyze course reviews from [CULPA](http://www.culpa.info/) and
explore whether the words used in reviews differ between female and male faculty on this site.

This assignment is inspired by the following studies:

- Ben Schmidt's Teaching Reviews [tool](http://benschmidt.org/profGender/) described in [Inside Higher Ed](https://www.insidehighered.com/news/2015/02/09/new-analysis-rate-my-professors-finds-patterns-words-used-describe-men-and-women)
- Giacomo Gioli's Bachelor's Thesis at Zurich University of Applied Sciences titled [Analysing Student Comments on RateMyProfessors.com Using NLP Techniques](https://digitalcollection.zhaw.ch/bitstream/11475/21754/3/gioligia_thesis.pdf)
- [Gender bias in student evaluations](https://www.cambridge.org/core/journals/ps-political-science-and-politics/article/gender-bias-in-student-evaluations/1224BE475C0AE75A2C2D8553210C4E27) in *PS: Political Science & Politics* (2018)
- [NCSU study](http://news.ncsu.edu/2014/12/macnell-gender-2014/) covered in [Slate](https://slate.com/human-interest/2014/12/gender-bias-in-student-evaluations-professors-of-online-courses-who-present-as-male-get-better-marks.html)


**Deadline:** Please submit this assignment to Gradescope by Thursday, June 3rd at 11:59pm EST.

**What to submit:** Submit this completed notebook, a PDF version of your notebook, the Excel file you upload to Wordify, and the two Excel files Wordify will email you.


Let's start by importing the following libraries:

In [2]:
import pandas as pd
import numpy as np

from bs4 import BeautifulSoup
import requests

from datetime import datetime
import dateutil

from tqdm import tqdm
import re
import os

import nltk

## 1. Exploring RateMyProfessor (4 points)

[Ben Schmidt](http://benschmidt.org/), a Professor at NYU, developed this [interactive chart](http://benschmidt.org/profGender/) where you can explore the words used to describe male and female teachers in about 14 million reviews from RateMyProfessor.com.


<!-- BEGIN QUESTION -->

<!--
BEGIN QUESTION
name: q1.1
manual: True
points: 2
-->

**Question 1.1 (4 points):** Spend a few minutes playing with the interactive chart. Besides "his children" and "her children", which words do you find are used more for male faculty, and which for female faculty?

_Type your answer here, replacing this text._

<!-- END QUESTION -->



## 2. Web Scraping CULPA (30 points)

We will begin this assignment by scraping CULPA.


<!-- BEGIN QUESTION -->

<!--
BEGIN QUESTION
name: q2.1
manual: True
points: 2
-->


**Question 2.1 (2 points):** Use the `requests` library to get the html of the CULPA homepage and then use BeautifulSoup to store and parse the webpage. Store the created BeautifulSoup object in the variable named `soup`.

*Hint 1:* Demo 12 provides an example of how to use `requests` 

*Hint 2:* Remember from lecture that we can parse an html page by creating a new BeautifulSoup object and initializing it with the html as a string.
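
As a reminder of what Hint 2 describes, here is a minimal sketch that parses a small hand-written html string rather than a downloaded page. The string and the names `html`, `demo_soup`, `page_title`, and `intro_text` are made up for illustration; they are not part of the assignment.

```python
from bs4 import BeautifulSoup

# A tiny hand-written page standing in for the text of a downloaded response.
html = (
    "<html><head><title>Demo page</title></head>"
    "<body><p class='intro'>Hello</p></body></html>"
)

# "html.parser" is Python's built-in parser, so no extra install is needed.
demo_soup = BeautifulSoup(html, "html.parser")

# Navigate to a tag by name, or search for a tag by its class attribute.
page_title = demo_soup.title.string
intro_text = demo_soup.find("p", attrs={"class": "intro"}).get_text()
print(page_title, intro_text)
```

The same constructor call should work on the text of a real response, assuming the request succeeded.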

In [None]:
HOME_PATH = "http://www.culpa.info"
response = ...
html = ...
soup = ...
response, type(soup) 

<!-- END QUESTION -->



### 2.1 Departments (8 points)

We are going to build a crawler to traverse pages on CULPA by departments.

<!-- BEGIN QUESTION -->

<!--
BEGIN QUESTION
name: q2.2
manual: True
points: 1
-->

On the left side of the homepage of CULPA, there is a list of departments. 

**Question 2.2 (1 point):** What is the class name of the div that contains the list? Assign the class name of the div to the variable called `dept_list_div_name`.

*Hint:* Either use the inspect element tool in your browser that we used in the Webscraping lecture or look at the html in the soup object from the previous question.

In [None]:
dept_list_div_name = ...
dept_list_div_name

<!-- END QUESTION -->



The next cell will find all tags in the html that have the class name you specified in the previous answer and store the tags in a list called `dept_list_tag`.

In [None]:
dept_list_tag = soup.find_all(attrs={"class": dept_list_div_name})
len(dept_list_tag)

<!-- BEGIN QUESTION -->

<!--
BEGIN QUESTION
name: q2.3
manual: True
points: 2
-->

Looking at `dept_list_tag`, you should notice that there is one `ul` tag and many `li` tags.

**Question 2.3 (2 points):** In the next cell, briefly describe what these html tags represent.


_Type your answer here, replacing this text._

<!-- END QUESTION -->

<!--
BEGIN QUESTION
name: q2.4
manual: False
points: 
    - 1
    - 1
    - 1
-->


**Question 2.4 (3 points):** Use BeautifulSoup to parse `dept_list_tag` and extract each department's name and path to the department's webpage on CULPA.  Store the results in a list of tuples and save the list to a variable named `department_tups`.
The first element in each tuple should be the department name and the second element in each tuple should be the path for the department's page on CULPA specified by the `href`. 

For example, the first item in the list should be `('af-am studies', '/departments/3')`

*Note:* Right now, we only want departments, not courses (everything listed under *Core* on the left side of the page is a class, not a department).




In [None]:
department_tups = []

...
len(department_tups)

In [None]:
grader.check("q2.4")

<!-- BEGIN QUESTION -->

<!--
BEGIN QUESTION
name: q2.5
manual: True
points: 2
-->


**Question 2.5 (2 points):** Download each department page from `department_tups` and store them in the folder `data/html/departments`. Make sure that each html page stored in that folder ends with ".html". The last line in the code will print out how many files are stored in `data/html/departments`.

*Note:* You likely need to make a new directory called `html/` under `data/`

*My code took about 34 seconds to parse and store the pages for 63 departments* 
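
One possible way to handle the directory and the ".html" suffix is sketched below. The helper `save_page` is hypothetical (not part of the assignment), and the demo writes into a temporary directory instead of `data/` so that it has no side effects.

```python
import os
import tempfile

def save_page(html_text, folder, name):
    """Write html_text to folder/name.html, creating the folder if needed."""
    os.makedirs(folder, exist_ok=True)  # no error if the folder already exists
    file_path = os.path.join(folder, name + ".html")
    with open(file_path, "w", encoding="utf-8") as f:
        f.write(html_text)
    return file_path

# Demonstrate in a throwaway directory so nothing is written into data/.
with tempfile.TemporaryDirectory() as tmp:
    target = os.path.join(tmp, "html", "departments")
    saved = save_page("<html></html>", target, "3")
    saved_name = os.path.basename(saved)
    n_files = len(os.listdir(target))

print(saved_name, n_files)
```

Because `os.makedirs` creates intermediate folders too, a single call covers both `html/` and `html/departments/`.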

In [None]:
for tup in tqdm(department_tups):
    ...

!ls data/html/departments | wc

<!-- END QUESTION -->

<!-- BEGIN QUESTION -->

<!--
BEGIN QUESTION
name: q2.6
manual: True
points: 1
-->

### 2.2 Courses (20 points)
Each department page has a list of courses and we will use that for the next part of scraping CULPA.


**Question 2.6 (1 point):** Looking at the html of a department's webpage, what is the name of the class of the html div that contains the list of courses? Assign the answer as a string to the variable `courses_list_class_name`.

*Hint:* You might want to use "Inspect Element" in the same way that we did in lecture.

In [None]:
courses_list_class_name = ...
courses_list_class_name

<!-- END QUESTION -->

<!--
BEGIN QUESTION
name: q2.7
manual: False
points: 
    - 1
    - 1
    - 1
    - 1
    - 1
-->


**Question 2.7 (5 points):** Loop through the stored html pages to extract the following information for each course offered by the department:

1. `Department Name` - The name based on the department `box` on the department page on CULPA. 
    - For example, the name for http://www.culpa.info/departments/3 is `African American Studies`
2. `Department Id` - the number associated with the department on CULPA. 
    - For example, the department id for `African American Studies` is 3.
3. `Course Name` - the text specified by the `course_name` span
4. `Course Listing` - the text specified by the `course_number` span
5. `Course Id` - the number associated with the course on CULPA (the number at the end of the course link)
6. `Course Path` - the path to the course. For example, the link for the `Caribbean Cultures and Societies` course would be `courses/7188`
    
Store these columns in a dataframe called `course_df`.
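
Both the department id and the course id sit at the end of a path, so a small regular expression is one way to pull them out. This is only a sketch; the helper name `trailing_id` is an assumption made for illustration.

```python
import re

def trailing_id(path):
    """Return the integer at the end of a path such as 'courses/7188', or None."""
    match = re.search(r"(\d+)/?$", path)  # digits at the end, optional trailing slash
    return int(match.group(1)) if match else None

example_id = trailing_id("/courses/7188")
print(example_id)
```

Returning `None` when there is no trailing number makes it easy to spot links that do not follow the expected pattern.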

In [None]:
DEPART_HTML_PATH = "data/html/departments/"

...
print(course_df.shape)
course_df.head(5)

In [None]:
grader.check("q2.7")

<!-- BEGIN QUESTION -->

<!--
BEGIN QUESTION
name: q2.8
manual: True
points: 6
-->


**Question 2.8 (6 points):** It is often a good idea to get a sense of the distribution of your data. In the next cell, using a horizontal bar plot, plot the number of courses in each department. Only include the 20 departments with the most courses.

*Hint 1:* Don't forget to use good conventions in your figure (like adding a title, naming the axis, etc).

*Hint 2:* You might want to use a dataframe function that *returns a Series containing counts of unique rows in the DataFrame*.
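
To show the kind of counting Hint 2 refers to, here is a small sketch on a made-up table. The data and the variable names `toy` and `top_two` are illustrative only, and the plotting step is left out.

```python
import pandas as pd

# Made-up data: one row per course, labeled with its department.
toy = pd.DataFrame(
    {"Department Name": ["History", "Physics", "History", "Music", "History", "Physics"]}
)

# value_counts sorts from most to least frequent; head keeps the top entries.
top_two = toy["Department Name"].value_counts().head(2)
print(top_two)
```

The result is a Series indexed by department name, which can be passed straight to a plotting method.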

In [None]:
...

<!-- END QUESTION -->



#### 2.2.1 Duplicate Courses (7 points)

<!-- BEGIN QUESTION -->

<!--
BEGIN QUESTION
name: q2.9
manual: True
points: 1
-->

It is very likely `course_df` contains duplicate courses.

**Question 2.9 (1 point):** If this is the case, explain why this might have happened and give an example. If not, explain what steps you took earlier to prevent this and justify why these specific steps were taken.

_Type your answer here, replacing this text._

<!-- END QUESTION -->

<!--
BEGIN QUESTION
name: q2.10
manual: False
points: 
    - 1
    - 1
-->


**Question 2.10 (2 points):** In the next cell, remove duplicate courses and store the resulting dataframe in the variable `course_df`
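
For reference, the sketch below shows how pandas drops repeated rows on a made-up table. Which columns define a "duplicate" is a judgment call; using `Course Id` alone is an assumption made here for illustration, not a statement about the intended solution.

```python
import pandas as pd

# Made-up table in which the same course id appears twice.
toy_courses = pd.DataFrame({
    "Course Id": [10, 11, 10],
    "Course Name": ["Intro to Film", "Calculus I", "Intro to Film"],
    "Department Name": ["Film", "Mathematics", "English"],
})

# Treat rows as duplicates when they share a Course Id; keep the first occurrence.
deduped = toy_courses.drop_duplicates(subset=["Course Id"]).reset_index(drop=True)
print(deduped)
```

Without `subset`, `drop_duplicates` only removes rows that match in every column, so the third row above would survive.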

In [None]:
course_df = ...

In [None]:
grader.check("q2.10")

<!-- BEGIN QUESTION -->

<!--
BEGIN QUESTION
name: q2.11
manual: True
points: 1
-->


**Question 2.11 (4 points):** In the next cell, plot the number of courses in each department after duplicate courses were removed. Below the figure, make a new markdown cell and describe any differences you might notice. If there are no differences, make sure to note that. 


In [None]:
...

<!-- END QUESTION -->

<!-- BEGIN QUESTION -->

#### 2.2.2 Saving Courses (1 point)

<!--
BEGIN QUESTION
name: q2.12
manual: True
points: 1
-->


**Question 2.12 (1 point):** Now that we have extracted information about courses offered by each department, save the dataframe in a csv file called `courses.csv`. 
Make sure to save the table in the directory called `data/tables`. Lastly, do not store the index of the table when you make the new csv file.
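
The effect of leaving out the index can be checked on a made-up table, as in this sketch. It writes to a temporary directory rather than `data/tables`, and the names `toy`, `header`, and `round_trip` are illustrative.

```python
import os
import tempfile
import pandas as pd

toy = pd.DataFrame({"Course Id": [10, 11], "Course Name": ["Intro to Film", "Calculus I"]})

with tempfile.TemporaryDirectory() as tmp:
    csv_path = os.path.join(tmp, "toy.csv")
    toy.to_csv(csv_path, index=False)  # index=False leaves the row index out of the file
    with open(csv_path, encoding="utf-8") as f:
        header = f.readline().strip()
    round_trip = pd.read_csv(csv_path)

print(header)
```

With the index stored, reading the file back would produce an extra unnamed column.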

In [None]:
...

<!-- END QUESTION -->



Saving the dataframe at this point allows us to take a break and not have to re-run the previous code when we come back to continue working. When working on a project, it is good practice to save data from intermediate steps.

In [None]:
course_df = pd.read_csv("data/tables/courses.csv")
course_df.head(10)

<!-- BEGIN QUESTION -->

### 2.3 Download Course Reviews (9 points)

<!--
BEGIN QUESTION
name: q2.13
manual: True
points: 2
-->


For this part we will use the list of courses stored in `course_df`.

**Question 2.13 (2 points):** Download each course page stored in `course_df['Course Path']` and store them in the folder `data/html/courses`. Make sure that each html page stored in that folder ends with ".html".

*My code took about 13 minutes to parse and store the pages for 2564 courses* 

*Note:* While the code is running, it is a good idea to look at `data/html/courses/` to make sure the courses were stored there. 

You might want to answer Question 2.13.1 while the next cell is running.

In [None]:
...

!ls data/html/courses | wc

<!-- END QUESTION -->

<!-- BEGIN QUESTION -->

<!--
BEGIN QUESTION
name: q2.13.1
manual: True
points: 2
-->

In this assignment we are downloading all the webpages and then parsing them, rather than extracting just the data we want when crawling the website.

**Question 2.13.1 (2 points):** In the next cell, briefly describe what you think are the pros and cons of these two approaches.

_Type your answer here, replacing this text._

<!-- END QUESTION -->

### 2.4 Extracting course reviews (13 points)

<!--
BEGIN QUESTION
name: q2.14
manual: False
points: 
    - 1
    - 1
    - 1
    - 1
    - 1
-->


Now that we have stored the html of every course from CULPA, we can extract the course reviews.

**Question 2.14 (5 points):** Loop through the stored html pages in `data/html/courses` and parse the downloaded webpages using BeautifulSoup. You'll note that each course webpage has a list of reviews. For each review on each course webpage, extract the following information:

1. `Prof` - the name of the professor teaching the course
1. `Date` - the date the review was written
1. `Course Id` -  the number associated with the course on CULPA 
1. `Course Name`
1. `Course Listing` - (the `course number` in the html)
1. `Review` - the text of the review


Store these columns in a dataframe called `review_df`. Make sure to only include reviews that have not been tagged as "old".

*Hint:* What is the class name of the div that contains a review? In each of those divs, what is the class name that contains the actual content of the review, and what is the class name that contains the metadata of the review (like date, course number, professor's name, etc.)?

In [None]:
COURSE_HTML_PATH = "data/html/courses/"

...
review_df.shape

In [None]:
grader.check("q2.14")

<!-- BEGIN QUESTION -->

#### 2.4.1 Duplicate Reviews (8 points)

<!--
BEGIN QUESTION
name: q2.15
manual: True
points: 2
-->


It is very likely `review_df` contains duplicate reviews.

**Question 2.15 (2 points):** If this is the case, explain why this might have happened and give an example. If not, explain what steps were taken earlier to prevent this and justify why these specific steps were taken.


_Type your answer here, replacing this text._

<!-- END QUESTION -->

<!-- BEGIN QUESTION -->

<!--
BEGIN QUESTION
name: q2.16
manual: True
points: 3
-->
It's a good idea to explore different aspects of your collected dataset.

**Question 2.16 (3 points):** Plot the number of reviews each faculty member received. Only show the 15 faculty that received the most reviews.

In [None]:
review_df['Prof'].value_counts().head(15).plot(kind='barh') ## SOLUTION

<!-- END QUESTION -->

<!-- BEGIN QUESTION -->

<!--
BEGIN QUESTION
name: q2.17
manual: True
points: 1
-->

Let's now drop duplicate reviews.

**Question 2.17 (1 point):** In the next cell, drop duplicate reviews

In [None]:
print(review_df.shape)
review_df = ...
review_df.shape

<!-- END QUESTION -->

<!-- BEGIN QUESTION -->

<!--
BEGIN QUESTION
name: q2.18
manual: True
points: 2
-->


**Question 2.18 (2 points):** In the next cell, plot the number of reviews each faculty member received after duplicate reviews were removed. Below the figure, make a new markdown cell and describe any differences you might notice. If there are no differences, make sure to note that.


In [None]:
review_df['Prof'].value_counts().head(15).plot(kind='barh') ## SOLUTION

<!-- END QUESTION -->



#### 2.4.2 Saving Reviews

The next cell will now save the dataframe of reviews.

In [None]:
review_df.to_csv("data/tables/reviews.csv", index=False)

## 3. Processing reviews (23 points)

### 3.1 Cleaning Text (5 points)

Now that we have the reviews, we will start cleaning the text for each review.
However, let's first update the list of stopwords that we will be using.

In [4]:
from nltk.corpus import stopwords

keep_words = set(['he', 'him', 'his', 'himself', 'she', "she's", 'her', 'hers', 'herself'])
remove_stop_words = list(set(stopwords.words('english')).difference(keep_words))
" ".join(remove_stop_words)

<!-- BEGIN QUESTION -->

<!--
BEGIN QUESTION
name: q3.1
manual: True
points: 1
-->

**Question 3.1 (1 point):** Briefly explain what the previous code cell does.

_Type your answer here, replacing this text._

<!-- END QUESTION -->

<!-- BEGIN QUESTION -->

<!--
BEGIN QUESTION
name: q3.2
manual: True
points: 2
-->

**Question 3.2 (2 points):** Write a function called `clean_text` that processes each review. Make sure the function:

- lowercases the review
- tokenizes the review
- removes stop words and punctuation 

*Note:* Use the updated list of stopwords from the previous question
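
The three steps listed above can be sketched with the standard library alone. This is a rough illustration under stated assumptions, not the expected solution: it swaps in a regular-expression tokenizer instead of an NLTK tokenizer, uses a tiny made-up stop-word set (`toy_stop_words`) instead of the updated NLTK list, and returns a single string, which is one possible choice of output format.

```python
import re

# Hypothetical, deliberately tiny stop-word list; the assignment's list comes from NLTK.
toy_stop_words = {"the", "is", "a", "and", "was"}

def simple_clean(text, stop_words):
    """Lowercase, split into word tokens, and drop stop words.

    The pattern only matches letters, digits, and apostrophes, so most
    punctuation never becomes a token in the first place.
    """
    tokens = re.findall(r"[a-z0-9']+", text.lower())
    return " ".join(tok for tok in tokens if tok not in stop_words)

cleaned = simple_clean("The lectures were clear, and she's a great teacher!", toy_stop_words)
print(cleaned)
```

Note that "she's" survives as a single token here, which matters later when gendered pronouns are counted.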

In [5]:
def clean_text(input_string):
    ...
    
    
clean_text(review_df['Review'].iloc[0]), review_df['Review'].iloc[0]

<!-- END QUESTION -->

<!-- BEGIN QUESTION -->

<!--
BEGIN QUESTION
name: q3.3
manual: True
points: 1
-->

**Question 3.3 (2 points):** In the next cell, apply the function to the dataframe and store the cleaned reviews in a new column called `Cleaned Review`

In [49]:
...

<!-- END QUESTION -->



### 3.2 Inferring Professor Gender (18 points)

We are going to explore if there is a difference in the words used for reviews for female and male faculty.
CULPA does not contain metadata for the gender of a professor so we will need to infer this from the text. 

For this section, we will infer the professor's gender by counting the number of gendered pronouns used in the review. This is a common approach.
For example, [Serina Chang](https://serinachang5.github.io/) (a Columbia undergraduate at the time) and [Kathleen McKeown](http://www.cs.columbia.edu/~kathy/) (Columbia CS professor) write in their 2019 paper titled [*Automatically Inferring Gender Associations from Language*](https://www.aclweb.org/anthology/D19-1579.pdf):
>We labeled each review with the gender of the professor
whom it was about, which we determined by comparing the count of male versus female pronouns
over all reviews for that professor. This method
was again effective, because the reviews are expressly written about a certain professor, so the
pronouns typically resolve to that professor.
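
The pooled counting described in the quote (comparing pronoun counts over all reviews for one professor) might look roughly like the sketch below. The pronoun sets, the function name `label_from_reviews`, and the tie-breaking rule are assumptions made for illustration; they are not taken from the paper, and they differ from the per-review labeling used later in this assignment.

```python
from collections import Counter

# Illustrative pronoun sets; which pronouns to include is a design choice.
toy_male = {"he", "him", "his"}
toy_female = {"she", "her", "hers"}

def label_from_reviews(reviews):
    """Return 'M', 'F', or '-' by comparing pronoun counts pooled over all reviews."""
    counts = Counter()
    for review in reviews:
        # Whitespace splitting assumes punctuation was already removed.
        counts.update(review.lower().split())
    n_male = sum(counts[w] for w in toy_male)
    n_female = sum(counts[w] for w in toy_female)
    if n_male > n_female:
        return "M"
    if n_female > n_male:
        return "F"
    return "-"  # tie, or no gendered pronouns at all

print(label_from_reviews(["she explains well", "her exams are fair", "he said so"]))
```

Pooling over all of a professor's reviews makes a single stray pronoun (for example, one referring to a teaching assistant) less likely to flip the label.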

<!--
BEGIN QUESTION
name: q3.4
manual: False
points: 2
-->

**Question 3.4 (2 points):** Add at least two more gendered pronouns to the sets `male_pronouns` and `female_pronouns`.

*Hint:* If any of the pronouns are in the set of stopwords, you might need to remove the pronoun from the set of stopwords above.

In [68]:
male_pronouns = set(["he"])
female_pronouns = set(["she"])

...
...

male_pronouns, female_pronouns

In [69]:
## TEST 
assert len(male_pronouns) >= 3

In [70]:
## TEST 
assert len(female_pronouns) >= 3

<!--
BEGIN QUESTION
name: q3.5
manual: False
points: 2
-->

**Question 3.5 (2 points):** Implement the function `infer_gender` that determines the gender of a professor based on the content of the review. The function should return `"M"`, `"F"`, or `"-"` if the professor is male, female, or of unknown gender, respectively.

In [71]:
def infer_gender(input_string):
    
    ...
    
infer_gender(review_df['Cleaned Review'].iloc[0]), review_df['Cleaned Review'].iloc[0]    

<!--
BEGIN QUESTION
name: q3.6
manual: False
points: 2
-->

**Question 3.6 (2 points):** Apply `infer_gender` to the cleaned reviews and save the inferred gender in a new column called `Prof_Gender`.

In [72]:
...
review_df['Prof_Gender'].value_counts()

#### 3.2.1 Validating Inferred Genders (11 points)

It is important to validate the method we just applied to determine the gender of a professor based on the text of a review.

<!--
BEGIN QUESTION
name: q3.7
manual: False
points: 4
-->


**Question 3.7 (4 points):** In the next cell, write code to determine whether any professors have more than one of the 3 possible labels. Store the list of professors that were assigned more than one of the 3 possible labels in the variable named `multiple_gender_labels`.

*Note:* **Do not** write a for loop to traverse the dataframe. Doing so will result in 0 points for this question.

*Hint:* Use the dataframe operation that *involves some combination of splitting the object, applying a function, and combining the results*.
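
As a generic illustration of the split-apply-combine pattern on made-up data (the column names `group` and `label` are placeholders, not columns of `review_df`):

```python
import pandas as pd

toy = pd.DataFrame({
    "group": ["a", "a", "b", "b", "b"],
    "label": ["x", "y", "x", "x", "x"],
})

# Split by "group", apply nunique to each group's "label", combine into one Series.
distinct_per_group = toy.groupby("group")["label"].nunique()

# Boolean indexing on the combined result, with no explicit for loop.
mixed_groups = distinct_per_group[distinct_per_group > 1].index.tolist()
print(mixed_groups)
```

Here only group "a" carries more than one distinct label, so it is the only entry in `mixed_groups`.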

In [73]:
unique_gender_series = ...
multiple_gender_labels = ...
len(multiple_gender_labels)

<!-- BEGIN QUESTION -->

<!--
BEGIN QUESTION
name: q3.8
manual: True
points: 1
-->

It makes sense for a professor to have a "-" label and an "M" or "F" label. However, if our `infer_gender()` function works as intended, a professor most likely would not be labeled as both `F` and `M`.

**Question 3.8 (1 point):** Write a function called `check_group_gender` that, given a group (dataframe), determines whether there exists at least one row with an "M" label for `Prof_Gender` and one with an "F" label. The function should return `False` if the group has both `F` and `M` labels in the `Prof_Gender` column, and `True` if it does not.

*Hint 1:* You can assume the dataframe has the same columns as `review_df`.

*Hint 2:* Your function should only use one of the columns.

In [74]:
def check_group_gender(group):
    ...

<!-- END QUESTION -->

<!-- BEGIN QUESTION -->

<!--
BEGIN QUESTION
name: q3.9
manual: True
points: 4
-->


**Question 3.9 (4 points):** Group `review_df` by `Prof` and apply the function `check_group_gender` to each group to determine if there are professors that have been tagged with both "M" and "F".

In [75]:
multiple_gender_series = ...
dup_gender_profs = ...
dup_gender_profs_df = ...
dup_gender_profs_df.sort_values("Prof").head(5)

<!-- END QUESTION -->

<!--
BEGIN QUESTION
name: q3.10
manual: False
points: 2
-->

You'll notice that there are indeed some professors that have both gendered labels. 

**Question 3.10 (2 points):** Looking at some reviews for any one of these Professors, provide a rationale why they had both gendered labels. 

_Type your answer here, replacing this text._

The next cell will tell us the total number of reviews and the number of reviews with a Professor who is tagged with both "M" and "F" based on their reviews.

In [76]:
review_df.shape[0], dup_gender_profs_df.shape[0], dup_gender_profs_df.shape[0]/review_df.shape[0]

#### 3.2.2 Cleaning Inferred Gender (1 point)

We can come up with methods to automatically fix these incorrect labels. For example, if the overwhelming majority of a professor's reviews were tagged "F" and just one review was tagged "M", we could simply convert that one review to "F".

However, remember the 80-20 effort-reward rule. Since these are a small percentage of our reviews (in my solutions it was just 2.2% of all reviews), let's go ahead and just remove reviews for these faculty members. 


<!--
BEGIN QUESTION
name: q3.11
manual: False
points: 1
-->

**Question 3.11 (1 point):** Remove the rows where the Professor is listed in `dup_gender_profs`. Store the resulting dataframe in `gender_df`

In [77]:
gender_df = ...
gender_df.head(5)

In [78]:
## TEST
assert len(gender_df.groupby('Prof').apply(check_group_gender).value_counts().keys()) == 1

We have just removed a small percentage of reviews from professors that had conflicting gender labels. Running the next cell will count how many professors have each combination of gender labels:

In [91]:
def genders(row):
    return " ".join(set(row['Prof_Gender']))

gender_vc = gender_df.groupby('Prof').apply(genders).value_counts()
gender_vc

You will likely have some professors in your dataset that were tagged with a gender label and a "-" label. Running the next cell will determine the percentage of professors who have a gender label and a "-" label.

In [81]:
(gender_vc['- M'] + gender_vc['- F']) / gender_vc.sum()

Since this number is not that small (about 16% in my solution), we should go ahead and convert the `-` labels to `M` or `F` accordingly.

In [82]:
def which_gender(group):
    if 'M' in set(group['Prof_Gender']):
        return 'M'
    elif 'F' in set(group['Prof_Gender']):
        return 'F'
    else:
        return '-'

We will now create a dictionary that maps the name of the professor to a single inferred gender label

In [83]:
name2gender = gender_df.groupby('Prof').apply(which_gender).to_dict()

The next cell will now use that dictionary to update the inferred gender column in our dataframe

In [88]:
gender_df = gender_df.assign(Prof_Gender=gender_df['Prof'].map(lambda x: name2gender[x] if name2gender[x] != '-' else np.nan))
gender_df['Prof_Gender'].value_counts()


Finally, we will remove any reviews where we could not infer the gender of the professor.

In [89]:
gender_df = gender_df.dropna(subset=['Prof_Gender']) 
gender_df.shape

## 4. Discovering Biased Terms (12 points)

In this course we discussed pointwise mutual information (PMI), a metric used to discover associations between random variables. This [blog post on Towards Data Science](https://towardsdatascience.com/harvesting-the-power-of-mutual-information-to-find-bias-in-your-nlp-dataset-c172c0dddebe) provides a nice detailed discussion on how we can use PMI (and its variants) to identify biased or discriminative terms.
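
For reference, the PMI between a word $w$ and a label $c$ is $\log_2 \frac{p(w, c)}{p(w)\,p(c)}$. The sketch below computes it from raw counts; the function name, its parameters, and the toy counts are made up for illustration and are not results from CULPA.

```python
import math

def pmi(n_xy, n_x, n_y, n_total):
    """PMI in bits from raw counts: log2( p(x, y) / (p(x) * p(y)) )."""
    p_xy = n_xy / n_total
    p_x = n_x / n_total
    p_y = n_y / n_total
    return math.log2(p_xy / (p_x * p_y))

# Toy counts: 1000 reviews, 400 labeled "F", and a word that appears in
# 50 reviews, 40 of which are labeled "F".
score = pmi(n_xy=40, n_x=50, n_y=400, n_total=1000)
print(round(score, 3))
```

A positive score means the word and the label co-occur more often than they would if they were independent; a score near zero means no association.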

In the original assignment, we would have now implemented PMI and applied it to our data. Instead, we will use [Wordify](https://wordify.unibocconi.it/index) *to identify words that discriminate categories in* the dataset you just collected.



<!-- BEGIN QUESTION -->

<!--
BEGIN QUESTION
name: q4.1
manual: True
points: 2
-->

**Question 4.1 (2 points)** Use pandas to create an Excel file with two columns named `text` and `label`. 
For each row, the `text` should be the cleaned review and the `label` should be the corresponding `Prof_Gender`.

*Hint:* You might need to install openpyxl using pip


In [100]:
...

<!-- END QUESTION -->



Now, upload the Excel sheet to Wordify twice. The first time, use the default threshold of 0.3. The second time, choose another threshold.

After some time, you should receive an email from Wordify with results. The email will have an attached file that *contains two sheets: one with the positive indicators for each label and one with the negative indicators (note that if you have only two labels, the positive indicators of one label are the negative ones of the other, and vice versa).*


<!-- BEGIN QUESTION -->

<!--
BEGIN QUESTION
name: q4.2
manual: True
points: 10
-->

**Question 4.2 (10 points):**  Once you receive results from Wordify, look through the results.
In the next cell, write a paragraph answering the following questions:

- Are there many biased terms in the dataset?
- Are the biased terms you found in question 1.1 also biased terms on CULPA?
- Were there many differences in discriminative terms between the two thresholds? If so, what were they?
- Are the course reviews on CULPA biased like the course reviews on Rate My Professor or similar sites? If not, why do you think that might be the case?

_Type your answer here, replacing this text._

<!-- END QUESTION -->



## 5. Feedback (1 point)

<!--
BEGIN QUESTION
name: q5.1
manual: false
points: 1   
-->

**Question:** Roughly how many hours did you spend on this assignment? Assign the total number of hours to the variable `time_spent`.

In [101]:
time_spent = ...
time_spent

In [None]:
grader.check("q5.1")

<!-- BEGIN QUESTION -->

<!--
BEGIN QUESTION
name: q5.2
manual: True
points: 1   
-->

**Optional:** Provide any comments or feedback below

_Type your answer here, replacing this text._

<!-- END QUESTION -->



**Submission:** Submit this completed notebook, a PDF version of your notebook, the Excel file you upload to Wordify, and the two Excel files Wordify will email you.

---

To double-check your work, the cell below will rerun all of the autograder tests.

In [None]:
grader.check_all()

## Submission

Make sure you have run all cells in your notebook in order before running the cell below, so that all images/graphs appear in the output. The cell below will generate a zip file for you to submit. **Please save before exporting!**

In [None]:
# Save your notebook first, then run this cell to export your submission.
grader.export()