In [None]:
# Initialize Otter
import otter
grader = otter.Notebook("lab01.ipynb")

In [None]:
import pandas as pd
from datetime import date

# Lab 1: Using Pandas

*This lab is heavily inspired by [Data 198 Module 2](https://github.com/ds-modules/DATA-198-SP21/blob/main/module_2/module2.ipynb) and [Data 88E Lab 9](https://github.com/data-88e/fa22-dev/blob/main/lab/lab09/lab09.ipynb).*

The `pandas` module is a powerful library for manipulating and analyzing data. In this lab, you will use what you learned in lecture, as well as the `pandas` documentation, to manipulate and analyze some example datasets.

**Learning Objectives**  

By the end of this lab, you should be able to use basic `pandas` methods to:
- Read a `.csv` file into a `pandas` dataframe
- Select certain columns from a dataframe
- Filter a dataframe
- Use the `pandas` documentation to complete a certain data manipulation task (e.g. a join)

## Section 1: Price Data Over Time

We begin by importing an example dataset that tells us the price of some good over time.

**Question 1.1:** Import the data from the `prices.csv` file into a dataframe named `prices_raw`.

In [None]:
prices_raw = ...
prices_raw

In [None]:
grader.check("q1_1")

First, we want to add a `date` column to this table that gives the date of each entry as a [Python date object](https://docs.python.org/3/library/datetime.html#datetime.date). Let's define a function that, given a row, returns the date of that row based on its `year` and `month` (using the first day of the month):

In [None]:
def date_of_row(row):
    return date(int(row['year']), int(row['month']), 1)

As an example, let's see what this function returns when applied to the first row:

In [None]:
date_of_row(prices_raw.iloc[0])

This looks good. So how do we apply it to every row in the table? Let's look at [the `pandas` documentation for the `apply` function](https://pandas.pydata.org/docs/reference/api/pandas.DataFrame.apply.html).
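
As a minimal sketch of how `apply` behaves, the toy dataframe below runs one function along each axis. The column names `a` and `b` and the helper `total` are made up for illustration and are not part of the lab data.

```python
import pandas as pd

# Toy data; these names are hypothetical and do not come from prices.csv.
toy = pd.DataFrame({"a": [1, 2, 3], "b": [10, 20, 30]})

def total(values):
    # `values` is a Series: one column or one row, depending on `axis`.
    return values.sum()

per_column = toy.apply(total, axis=0)  # the function receives each column
per_row = toy.apply(total, axis=1)     # the function receives each row

print(per_column)
print(per_row)
```

Comparing the two printed results against the documentation may help when answering the questions below.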

<!-- BEGIN QUESTION -->

**Question 1.2:** What is the `func` argument? What should we pass for the `func` argument in this case?

_Type your answer here, replacing this text._

<!-- END QUESTION -->

<!-- BEGIN QUESTION -->

**Question 1.3:** What is the `axis` argument? What should we pass for the `axis` argument in this case?

_Type your answer here, replacing this text._

<!-- END QUESTION -->

**Question 1.4:** Now, construct a `date_column` using the `apply` function. 

In [None]:
date_column = ...
date_column

In [None]:
grader.check("q1_4")

**Question 1.5:** Now we want to add the dates as a new column to `prices_raw`. Construct `prices_raw_with_date`, a dataframe containing all of the columns of `prices_raw`, as well as a `date` column with values from `date_column`.

In [None]:
prices_raw_with_date = prices_raw.copy() # make a copy of prices_raw
prices_raw_with_date[...] = ...
prices_raw_with_date

In [None]:
grader.check("q1_5")

Note that another way to construct a new table with this new column is to use the [`assign` function](https://pandas.pydata.org/docs/reference/api/pandas.DataFrame.assign.html). The documentation might be a little confusing to read; when `**kwargs` is a parameter to a function, such as `assign`, it means that we can set its keywords and values like so:
```python
prices_raw.assign(keyword0=value0, keyword1=value1, keyword2=value2, ...)
```
The documentation says that, for the `assign` function, "the [new] column names are keywords," so in this case, the keywords are the new column names, and the value corresponding to a keyword is the data for that new column.
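
Here is a small sketch of `assign` on a toy dataframe. The column names `x`, `doubled`, and `label` are hypothetical and chosen only for illustration.

```python
import pandas as pd

# Toy data; not related to the lab's CSV files.
toy = pd.DataFrame({"x": [1, 2, 3]})

# Each keyword becomes a new column name; each value becomes that column's data.
with_new = toy.assign(doubled=toy["x"] * 2, label=["a", "b", "c"])

print(with_new)
print(toy)  # `assign` returns a new dataframe, so `toy` is unchanged
```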

**Question 1.6:** Create a new table, `prices`, that contains only the `date` and `price` columns, in that order, from the `prices_raw_with_date` table.

In [None]:
prices = ...
prices

In [None]:
grader.check("q1_6")

**Question 1.7:** Now, we can filter the dataframe like we did in lecture. Select all rows where the price is between 105 and 115, exclusive, and put them in a new dataframe, `filtered_prices`.
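
As a reminder of the general pattern, here is a sketch of filtering with a boolean mask on a toy dataframe. The `score` column and the bounds 5 and 12 are made up for illustration.

```python
import pandas as pd

# Toy data; the column name and bounds are hypothetical.
toy = pd.DataFrame({"score": [3, 7, 10, 12, 5]})

# Combine two conditions with `&`; each condition needs its own parentheses.
mask = (toy["score"] > 5) & (toy["score"] < 12)
selected = toy[mask]

print(selected)
```

Note that the selected rows keep their original index labels.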

In [None]:
filtered_prices = ...
filtered_prices.head()

In [None]:
grader.check("q1_7")

**Question 1.8:** Make a new dataframe that contains just the first 5 rows of `filtered_prices` and assign it to `filtered_prices_subset`. 

Hint: Be careful if you choose to use `.loc` because the index does not start from 0. `.iloc` may be a better choice. 
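
To see why this matters, the sketch below compares the two on a toy dataframe whose index has gaps after filtering. The `value` column and the threshold are hypothetical.

```python
import pandas as pd

# Toy data; filtering leaves index labels 1, 3, 5.
toy = pd.DataFrame({"value": [5, 50, 7, 70, 9, 90]})
large = toy[toy["value"] > 10]

by_position = large.iloc[:2]  # first two rows by position
by_label = large.loc[:2]      # rows whose index label is at most 2

print(by_position)
print(by_label)
```

Here `.iloc[:2]` returns two rows, while `.loc[:2]` returns only the row labeled 1, because `.loc` slices by label rather than by position.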

In [None]:
filtered_prices_subset = ...
filtered_prices_subset

In [None]:
grader.check("q1_8")

Now, we can do some data analysis with the `filtered_prices` dataframe. We will discuss more advanced plotting later in the course. For now, we can make a simple plot using [the `plot` function](https://pandas.pydata.org/docs/reference/api/pandas.DataFrame.plot.html):

In [None]:
filtered_prices.plot();

**Question 1.9:** The x-axis doesn't tell us much here. By reindexing the dataframe on the `date` column (that is, making `date` the index), we can make the x-axis clearer. Plot the data after reindexing `filtered_prices` on the `date` column.

In [None]:
reindexed_filtered_prices = ...
reindexed_filtered_prices.plot();

In [None]:
grader.check("q1_9")

## Section 2: Constructing a Phillips Curve

In this example, we'll construct a Phillips curve, showing the relationship between unemployment and inflation. To start, we'll import `unemployment.csv` and `core_inflation.csv` into two dataframes, `unemployment` and `inflation`, respectively. 

The datasets are from [Federal Reserve Economic Data (FRED)](https://fred.stlouisfed.org/), a classic and very accessible data source for economics. Later in the course, we will learn how to get data from FRED using an API. For now, let's import the data manually.

In [None]:
unemployment = pd.read_csv('unemployment.csv') # UNRATE
inflation = pd.read_csv('core_inflation.csv') # CPILFESL_PC1

In [None]:
unemployment

In [None]:
inflation

In order to combine these two datasets, we are going to perform an *inner join*. This will construct a new table that only contains rows where the value of `DATE` is the same in both tables. By doing so, we will be able to plot a graph where a point represents the inflation rate and unemployment rate *at the same point in time*. In `pandas`, to join two dataframes, we use the [`merge` function](https://pandas.pydata.org/docs/reference/api/pandas.DataFrame.merge.html). Take a look at the documentation, and use it to answer the following questions.
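
To illustrate what an inner join keeps and drops, here is a sketch on two toy tables. The table names `left` and `right` and the columns `key`, `left_value`, and `right_value` are made up and are not the lab's data.

```python
import pandas as pd

# Toy tables; only keys "b" and "c" appear in both.
left = pd.DataFrame({"key": ["a", "b", "c"], "left_value": [1, 2, 3]})
right = pd.DataFrame({"key": ["b", "c", "d"], "right_value": [20, 30, 40]})

# An inner join keeps only the rows whose key appears in both tables.
joined = left.merge(right, how="inner", on="key")

print(joined)
```

The rows with keys `a` and `d` are dropped because each appears in only one of the two tables.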

<!-- BEGIN QUESTION -->

**Question 2.1:** What should we pass for the `right` argument in this case?

_Type your answer here, replacing this text._

<!-- END QUESTION -->

<!-- BEGIN QUESTION -->

**Question 2.2:** What is the `how` argument? What should we pass for the `how` argument in this case?

_Type your answer here, replacing this text._

<!-- END QUESTION -->

<!-- BEGIN QUESTION -->

**Question 2.3:** Take a look at the `on`, `left_on`, and `right_on` arguments. Note that for both tables, the name of the date column, `DATE`, is the same. Which of the `on`, `left_on`, and `right_on` arguments do we need to pass in this case, and what values should they take on?

_Type your answer here, replacing this text._

<!-- END QUESTION -->

**Question 2.4:** Based on the answers to the previous questions, construct `phillips_curve_df`, a dataframe with `DATE`, `CPILFESL_PC1`, and `UNRATE` columns by merging the `unemployment` and `inflation` dataframes.

In [None]:
phillips_curve_df = ...
phillips_curve_df

In [None]:
grader.check("q2_4")

And now we can graph our empirical Phillips curve!

In [None]:
phillips_curve_df.plot.scatter('UNRATE', 'CPILFESL_PC1');

<!-- BEGIN QUESTION -->

**Question 2.5:** How well does this empirical Phillips curve match your expectations?

Note: If you are not familiar with the Phillips curve, feel free to skim through [this chapter](https://data-88e.github.io/textbook/content/09-macro/phillips_curve.html) from Data 88E.

_Type your answer here, replacing this text._

<!-- END QUESTION -->

Congratulations! You're done with Econ 148 Lab 1!

## Feedback

**Question 3:** Please fill out this short [feedback form](https://forms.gle/jCG7VvhptpzTfTFG6) to let us know your thoughts on this lab! We really appreciate your opinions and feedback! At the end of the Google form, you should see a codeword. Assign the codeword to the variable `codeword` below.

In [None]:
codeword = ...

In [None]:
grader.check("q3")

## Submission

Make sure you have run all cells in your notebook in order before running the cell below, so that all images/graphs appear in the output. The cell below will generate a zip file for you to submit. **Please save before exporting!**

In [None]:
# Save your notebook first, then run this cell to export your submission.
grader.export(run_tests=True)