<a href="https://colab.research.google.com/github/afeld/python-public-policy/blob/main/hw_4.ipynb" target="_parent"><img src="https://colab.research.google.com/assets/colab-badge.svg" alt="Open In Colab"/></a>

# **HOMEWORK 4**

# Coding

This assignment looks at how rates of various 311 complaints changed during the COVID-19 pandemic.

## Step 0: Setup

For this homework, instead of the data being provided, you will export it directly from the NYC Open Data Portal, as if you were working on your own project.

1. Download the data.
  1. Visit the [311 data](https://data.cityofnewyork.us/Social-Services/311-Service-Requests-from-2010-to-Present/erm2-nwe9/data) page.
  1. From that page, filter the data to `Created Date`s between `01/01/2020 12:00:00 AM` and `03/31/2020 11:59:59 PM`.
  1. It should say "Showing 311 Service Requests 1-100 out of 469,594" near the bottom of the screen.
    - It's ok if the total is slightly different.
  1. Click `Export`.
  1. Click `CSV`. It will start downloading a file.
  1. Rename the file `311_covid.csv`.
1. Upload the data.
  1. Open your NYU [Google Drive](https://drive.google.com/).
  1. [Upload the CSV](https://support.google.com/drive/answer/2424368) to My Drive or a folder under it.
1. Read the data.
  1. Here in Colab, click the folder icon on the left.
  1. Click `Mount Drive`.
  1. Click `Connect to Google Drive`.
  1. Grant permissions.
  1. Under `Files`, you should now see a `drive` folder.
  1. Expand the folder to find your file.
  1. Right-click the file.
  1. Click `Copy path`.

## Step 1: Load data

Read the data into a DataFrame.

In [None]:
import pandas as pd

df = pd.read_csv('YOUR PATH HERE', low_memory=False)
df

## Step 2: Convert dates

Copy code from [Lecture 4](https://colab.research.google.com/github/afeld/python-public-policy/blob/main/lecture_4.ipynb#scrollTo=CHZ-Pqj0bS9w) to convert the `Created Date` and `Closed Date` to `datetime`s. Print out the types of the columns using [`.info()`](https://pandas.pydata.org/pandas-docs/stable/reference/api/pandas.DataFrame.info.html).
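
For orientation, here is a minimal sketch of the conversion on a toy DataFrame rather than the real export. It assumes the timestamps use the same `MM/DD/YYYY hh:mm:ss AM/PM` layout shown in the Step 0 filter. The `toy` name and its values are made up for illustration, and the Lecture 4 code may differ.

```python
import pandas as pd

# Hypothetical stand-in for the 311 export; the values are invented.
toy = pd.DataFrame({
    "Created Date": ["01/01/2020 12:05:00 AM", "03/15/2020 01:30:00 PM"],
    "Closed Date": ["01/02/2020 09:00:00 AM", None],  # None = request still open
})

# Assumed timestamp layout: month/day/year with a 12-hour clock and AM/PM.
fmt = "%m/%d/%Y %I:%M:%S %p"
for col in ["Created Date", "Closed Date"]:
    toy[col] = pd.to_datetime(toy[col], format=fmt)

# Both columns should now be reported as datetime64.
toy.info()
```

Missing values become `NaT` (pandas' "not a time" marker) instead of raising an error.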

In [None]:
# your code here

## Step 3: Time/date components

Add `date` and `month_name` columns from the `Created Date`. (See the [`.dt` example from Lecture 4](https://colab.research.google.com/github/afeld/python-public-policy/blob/main/lecture_4.ipynb#scrollTo=bhh2c3ZCnDYB).)
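
As a rough illustration of the `.dt` accessor, the sketch below derives both columns on a two-row toy DataFrame. The `toy` data is hypothetical; only the column names come from this assignment.

```python
import pandas as pd

# Hypothetical data; "Created Date" is already a datetime column here.
toy = pd.DataFrame({
    "Created Date": pd.to_datetime(["2020-01-01 00:05:00", "2020-03-15 13:30:00"]),
})

# .dt exposes datetime components for every row at once.
toy["date"] = toy["Created Date"].dt.date              # calendar day, time dropped
toy["month_name"] = toy["Created Date"].dt.month_name()  # e.g. "January"

print(toy)
```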

In [None]:
# your code here

## Step 4: Date counts

Create a DataFrame called `date_counts` that has the count of complaints per `Complaint Type` per day, then display it. It should end up looking like:

. | date | Complaint Type | count
--- | --- | --- | ---
0 | 2020-01-01 | APPLIANCE | 6
1 | 2020-01-01 | Abandoned Vehicle | 63
… | … | … | …
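
One possible way to get a table of this shape is to group by both columns and count the rows in each group. The sketch below does that on invented data; `toy` and `toy_counts` are hypothetical names, and other approaches shown in lecture are equally valid.

```python
import pandas as pd

# Hypothetical complaints: one row per complaint.
toy = pd.DataFrame({
    "date": ["2020-01-01", "2020-01-01", "2020-01-01", "2020-01-02"],
    "Complaint Type": ["Noise", "Noise", "Rodent", "Noise"],
})

toy_counts = (
    toy.groupby(["date", "Complaint Type"])
    .size()                      # number of rows in each (date, type) group
    .reset_index(name="count")   # turn the group labels back into columns
)

print(toy_counts)
```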

In [None]:
# your code here

## Step 5: Plotting over time

Create a line chart of the count of complaints over time, one line per `Complaint Type`. It should look something like this (but with a legend):

![Line chart showing complaints per day by type](https://github.com/afeld/python-public-policy/raw/main/img/complaints_per_day.png)

In [None]:
# your code here

To zoom in, do one or more of the following:

- Click some of the Complaint Types to filter them out
- Draw a rectangle to zoom in

When done, click the home icon above the chart to `Reset axes`.

## Step 6: Month counts

Create a DataFrame called `month_counts` that has the count of complaints per `Complaint Type` per month, then display it. It should end up looking like:

. | month_name | Complaint Type | count
--- | --- | --- | ---
0 | February | APPLIANCE | 557
1 | February | Abandoned Vehicle | 3371
… | … | … | …

In [None]:
# your code here

## Step 7: Pivot

Use the provided code to [pivot the DataFrame](https://pandas.pydata.org/pandas-docs/stable/user_guide/reshaping.html). This will make the `month_name`s the columns, with one `Complaint Type` per row, and the `count`s as the values. (No changes needed.)

In [None]:
complaints_by_month = month_counts.pivot(index='Complaint Type', columns='month_name', values='count')

# put the columns in order
complaints_by_month = complaints_by_month[['January', 'February', 'March']]

complaints_by_month

## Step 8: Percent change

Use `complaints_by_month` to calculate the percent change from February to March for each `Complaint Type`. Save it as the `pct_change` column. Remember that you can do arithmetic between columns, just like you would between variables or numbers. The result should look something like this:

Complaint Type | January | February | March | pct_change
--- | --- | --- | --- | ---
APPLIANCE | 696.0 | 557.0 | 806.0 | 0.447038
Abandoned Vehicle | 3760.0 | 3371.0 | 2468.0 | -0.267873
… | … | … | … | …
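
To make the arithmetic concrete, the sketch below rebuilds the two example rows from the table above in a hypothetical `toy` DataFrame and computes the change as (March - February) / February. Treat it as an illustration of column arithmetic, not the required solution.

```python
import pandas as pd

# The two rows shown in the table above, rebuilt by hand for illustration.
toy = pd.DataFrame(
    {
        "January": [696.0, 3760.0],
        "February": [557.0, 3371.0],
        "March": [806.0, 2468.0],
    },
    index=pd.Index(["APPLIANCE", "Abandoned Vehicle"], name="Complaint Type"),
)

# Column arithmetic is applied row by row.
toy["pct_change"] = (toy["March"] - toy["February"]) / toy["February"]

print(toy)
```

A value of `0.447038` means a 44.7% increase; a negative value means a decrease.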

In [None]:
# your code here

## Step 9: Filter

Use the provided code to filter to `Complaint Type`s that were common in February and have changed by more than 90%.

In [None]:
top_changed = complaints_by_month[(complaints_by_month['February'] > 300) & (complaints_by_month['pct_change'].abs() > 0.9)]
top_changed

## Step 10: Top changed

Filter `date_counts` to only the `top_changed` `Complaint Type`s. Save the result as `top_changed_by_day`.

Things that will be useful:

- [`.isin()`](https://pandas.pydata.org/pandas-docs/stable/reference/api/pandas.DataFrame.isin.html) - see [Lecture 1](https://colab.research.google.com/github/afeld/python-public-policy/blob/main/lecture_1.ipynb#scrollTo=PII26jb0g8Eg)
- [`.index`](https://pandas.pydata.org/pandas-docs/stable/reference/api/pandas.DataFrame.index.html) since the `Complaint Type`s are the index (row labels) rather than a proper "column"
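
The two pieces above can be combined roughly as follows. This is a sketch on invented data: `toy_counts` and `toy_top` are hypothetical stand-ins for `date_counts` and `top_changed`, and it uses the Series form of `.isin()` on a single column.

```python
import pandas as pd

# Hypothetical per-day counts (Complaint Type is a regular column).
toy_counts = pd.DataFrame({
    "date": ["2020-03-01", "2020-03-01", "2020-03-02"],
    "Complaint Type": ["Noise", "Rodent", "Noise"],
    "count": [5, 2, 8],
})

# Hypothetical filtered table (Complaint Type is the index).
toy_top = pd.DataFrame(
    {"February": [400.0]},
    index=pd.Index(["Noise"], name="Complaint Type"),
)

# True for rows whose Complaint Type appears among toy_top's row labels.
mask = toy_counts["Complaint Type"].isin(toy_top.index)
toy_filtered = toy_counts[mask]

print(toy_filtered)
```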

In [None]:
# your code here

## Step 11: Plotting changed complaints

Make a plot similar to the one in Step 5, but with only the top changed complaints (`top_changed_by_day`).

In [None]:
# your code here

## Question 0

***Did the change in any of the `Complaint Type`s in Step 10/11 surprise you? Why or why not? (Speak to at least one specifically.)***

YOUR RESPONSE HERE

Then, give these a read:

- [NY Daily News article](https://www.nydailynews.com/coronavirus/ny-coronavirus-price-gouging-new-york-city-20200429-z5zs4ygfxbcmrpgzfrnlbxsnea-story.html)
- [Press release from Department of Consumer and Worker Protection](https://www1.nyc.gov/site/dca/media/pr031720-DCWP-Emergency-Rule-Price-Gouging-Illegal.page)

Overall caveat for this assignment: [**correlation does not imply causation**](https://www.khanacademy.org/math/probability/scatterplots-a1/creating-interpreting-scatterplots/v/correlation-and-causality).

## Question 1

***Did you work with anyone else on this assignment?***

YOUR RESPONSE HERE

# Tutorial

**Read [spaCy 101](https://spacy.io/usage/spacy-101#whats-spacy) through `Linguistic annotations`** (stop at `Pipelines`). It's ok if it doesn't all make sense—this is mostly to introduce you to some terminology for the next topic: natural language processing!