In [None]:
# Initialize Otter
import otter
grader = otter.Notebook("lab01.ipynb")

In [None]:
import pandas as pd
import numpy as np
from datetime import date
import matplotlib.pyplot as plt
%matplotlib inline

# Lab 1: Using Pandas

*This lab is heavily inspired by [Data 198 Module 2](https://github.com/ds-modules/DATA-198-SP21/blob/main/module_2/module2.ipynb) and [Data 88E Lab 9](https://github.com/data-88e/fa22-dev/blob/main/lab/lab09/lab09.ipynb).*

The `pandas` module is a powerful library for manipulating and analyzing data. In this lab, you will use what you learned in lecture, as well as the `pandas` documentation, to manipulate and analyze some example datasets.

### Learning Objectives

By the end of this lab, you should be able to work with datasets using basic `pandas` methods. In particular, you should be able to:
- Read a `.csv` file into a `pandas` dataframe
- Select certain columns from a dataframe
- Filter a dataframe
- Use the `pandas` documentation to complete a certain data manipulation task (e.g. a join)

----

## Price Data over Time

We begin by importing an example dataset that tells us the price of some good over time.

**Question 1.1:** Import the data from the `prices.csv` file into a DataFrame named `prices_raw`.
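As a minimal sketch of `pd.read_csv`, the cell below parses a small CSV held in memory with `io.StringIO`. The column names mirror the `year`, `month`, and `price` columns used later in this lab, but the values are made up and are not the contents of `prices.csv`. `read_csv` accepts a file path as well as a file-like object like this one.

```python
import io
import pandas as pd

# Hypothetical CSV text; io.StringIO lets read_csv treat the string like a file
csv_text = """year,month,price
2004,1,100.5
2004,2,101.0
2004,3,99.8
"""

toy = pd.read_csv(io.StringIO(csv_text))
print(toy.shape)  # (3, 3)
print(toy.dtypes)
```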

In [None]:
prices_raw = ...
prices_raw

In [None]:
grader.check("q1_1")

First, we want to add a `date` column to this table that contains the date of each entry as a [Python date object](https://docs.python.org/3/library/datetime.html#datetime.date). Using Python date objects lets us manipulate dates and times conveniently (for example, they make sorting by date very easy). Below, we've defined a function that, given a row, returns the date of that row (the first day of its month) based on its `year` and `month`:

In [None]:
def date_of_row(row):
    return date(int(row['year']), int(row['month']), 1)
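To illustrate why `date` objects are convenient, here is a small sketch using only the standard library. `date` objects compare chronologically, so they sort correctly, and subtracting two of them gives a `timedelta`. The dates below are arbitrary examples.

```python
from datetime import date

# Arbitrary example dates, deliberately out of order
dates = [date(2005, 3, 1), date(2004, 8, 1), date(2004, 12, 1)]

# date objects compare chronologically, so sorted() orders them by time
ordered = sorted(dates)
print(ordered)

# Subtracting two dates yields a timedelta; .days gives the gap in days
gap = date(2004, 9, 1) - date(2004, 8, 1)
print(gap.days)  # 31
```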

As an example, let's see what this function returns when applied to the first row:

In [None]:
date_of_row(prices_raw.iloc[0])

This looks like what we're looking for. So, how do we apply it to every row in the table? 

Let's look into the `apply` method of DataFrames. Read its [documentation](https://pandas.pydata.org/docs/reference/api/pandas.DataFrame.apply.html).
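To see what `apply` does without touching `prices_raw`, here is a toy sketch (the DataFrame `toy` and the helper `row_total` are hypothetical). The function passed to `apply` receives one Series at a time; whether each Series is a column or a row depends on `axis`. When it receives a row, it can look up values by column name, the same way `date_of_row` does.

```python
import pandas as pd

toy = pd.DataFrame({'a': [1, 2, 3], 'b': [10, 20, 30]})

# Hypothetical helper: receives one row (a Series indexed by column name)
def row_total(row):
    return row['a'] + row['b']

# axis=0 (the default): the function is called once per column
col_sums = toy.apply(lambda col: col.sum(), axis=0)
print(col_sums.tolist())  # [6, 60]

# axis=1: the function is called once per row, giving one value per row
row_sums = toy.apply(row_total, axis=1)
print(row_sums.tolist())  # [11, 22, 33]
```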

<!-- BEGIN QUESTION -->

**Question 1.2:** What is the `func` argument in the documentation? What should we pass for the `func` argument in this case?

_Type your answer here, replacing this text._

<!-- END QUESTION -->

<!-- BEGIN QUESTION -->

**Question 1.3:** What is the `axis` argument? What should we pass for the `axis` argument in this case?

_Type your answer here, replacing this text._

<!-- END QUESTION -->

**Question 1.4:** Now, construct a `date_column` using the `apply` function. 

In [None]:
date_column = ...
date_column

In [None]:
grader.check("q1_4")

**Question 1.5:** Now we want to add the dates as a new column. Construct `prices_raw_with_date`, a copy of the DataFrame `prices_raw`, and add a `date` column to it containing the values from `date_column`.
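A toy sketch of why the copy matters: assigning to a new column name adds that column to the DataFrame in place, so doing it on a `.copy()` leaves the original untouched. The DataFrame and values here are hypothetical.

```python
import pandas as pd

original = pd.DataFrame({'price': [1.0, 2.0, 3.0]})
new_values = pd.Series(['x', 'y', 'z'])

with_extra = original.copy()       # independent copy of the data
with_extra['label'] = new_values   # adds a column to the copy only

print(list(with_extra.columns))  # ['price', 'label']
print(list(original.columns))    # ['price']
```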

In [None]:
prices_raw_with_date = prices_raw.copy() # make a copy of prices_raw
...
prices_raw_with_date

In [None]:
grader.check("q1_5")

**Question 1.6:** Create a new table, `prices`, that contains only the `date` and `price` columns from the `prices_raw_with_date` table.
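Selecting several columns uses a list of names inside the brackets and returns a DataFrame, while a single name returns a Series. A toy sketch with hypothetical values:

```python
import pandas as pd

toy = pd.DataFrame({'year': [2004, 2004], 'month': [8, 9], 'price': [105.04, 105.20]})

subset = toy[['month', 'price']]  # list of names -> DataFrame with those columns
single = toy['price']             # single name -> Series

print(type(subset).__name__, list(subset.columns))  # DataFrame ['month', 'price']
print(type(single).__name__)                        # Series
```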

In [None]:
prices = ...
prices

In [None]:
grader.check("q1_6")

**Question 1.7:** Now, we can filter the `prices` dataframe as we did in lecture. Select the rows where the price is strictly between 105 and 115 (both endpoints excluded) and put them in a new dataframe, `filtered_prices`.
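A sketch of boolean filtering on toy data: each comparison yields a boolean Series, and `&` combines them elementwise. The parentheses are required because `&` binds more tightly than `<` and `>` in Python. The values are hypothetical; 105 and 115 themselves are excluded because both comparisons are strict.

```python
import pandas as pd

toy = pd.DataFrame({'price': [100.0, 105.0, 107.5, 115.0, 112.0]})

# Each comparison produces a boolean Series; & keeps rows where both are True
mask = (toy['price'] > 105) & (toy['price'] < 115)
in_range = toy[mask]

print(in_range['price'].tolist())  # [107.5, 112.0]
print(in_range.index.tolist())     # [2, 4]  (original index labels are kept)
```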

In [None]:
filtered_prices = ...
filtered_prices.head()

In [None]:
grader.check("q1_7")

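**Question 1.8:** How many rows and columns are in `filtered_prices_subset`? Assign to `dims` a tuple containing the dimensions of `filtered_prices_subset` (for example, `(1, 3)` is a tuple of length 2). Since `filtered_prices_subset` is created in Question 1.9, complete that question first and then return here.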
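`DataFrame.shape` is an attribute (no parentheses) holding a tuple of the form `(number of rows, number of columns)`. A toy sketch:

```python
import pandas as pd

toy = pd.DataFrame({'a': [1, 2, 3, 4], 'b': [5, 6, 7, 8], 'c': [9, 10, 11, 12]})

print(toy.shape)        # (4, 3)
print(type(toy.shape))  # <class 'tuple'>
```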

In [None]:
dims = ...
dims

In [None]:
grader.check("q1_8")

**Question 1.9:** Make a new dataframe that contains just the first 5 rows of `filtered_prices` and assign it to `filtered_prices_subset`. 

Hint: Should you use `.loc` or `.iloc` here?
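A toy sketch of the difference that matters after filtering: the surviving rows keep their original index labels, so `.iloc` (position-based) and `.loc` (label-based, with an inclusive end) can select different rows. The data are hypothetical.

```python
import pandas as pd

toy = pd.DataFrame({'x': [10, 20, 30, 40, 50, 60]})

# After filtering, the surviving rows keep their original labels: 1, 3, 5
filtered = toy[toy['x'] % 20 == 0]

by_position = filtered.iloc[:2]  # first two rows by position -> labels [1, 3]
by_label = filtered.loc[:2]      # labels up to and including 2 -> only label 1

print(by_position.index.tolist())  # [1, 3]
print(by_label.index.tolist())     # [1]
```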

In [None]:
filtered_prices_subset = ...
filtered_prices_subset

In [None]:
grader.check("q1_9")

Now, let's visualize how the price changes over time. We're using [Matplotlib](https://matplotlib.org/) to produce this plot; you will learn more about this later in the course.

In [None]:
plt.plot(filtered_prices['price'])
plt.xlabel("Time")
plt.ylabel("Price")
plt.title("Price changes over time");

**Question 1.10:** The x-axis doesn't tell us much here. By making the `date` column the index of the dataframe, we can make the x-axis clearer. Use `.set_index` to change the index of `filtered_prices` to the `date` column, assign the result to `reindexed_filtered_prices`, and then replot the graph.

P.S. We get a meaningful x-axis here because the `date` column contains Python `date` objects, which Matplotlib can place on a time axis.

P.P.S. There are other ways to plot the graph with `date` on the x-axis, but here we specifically ask you to reindex.
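A toy sketch of `set_index` on a column of `date` objects. The three prices are taken from the expected-output table in Question 1.11; everything else is hypothetical. Note that `set_index` returns a new DataFrame, so the result must be assigned to keep it.

```python
from datetime import date
import pandas as pd

toy = pd.DataFrame({
    'date': [date(2004, 8, 1), date(2004, 9, 1), date(2004, 10, 1)],
    'price': [105.04, 105.20, 105.63],
})

# set_index returns a new DataFrame; toy itself still has a 'date' column
reindexed = toy.set_index('date')

print(reindexed.index.name)     # date
print(list(reindexed.columns))  # ['price']
# Rows can now be looked up by their date
print(reindexed.loc[date(2004, 9, 1), 'price'])
```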

In [None]:
reindexed_filtered_prices = ...
plt.plot(reindexed_filtered_prices['price']); # This line replots the graph

In [None]:
grader.check("q1_10")

While looking at raw (nominal) prices can be helpful, it is often more informative to adjust for inflation and consider the real price of the good over time. Suppose (hypothetically, of course) you know that during the period covered by `reindexed_filtered_prices`, the price level rose by 0.1% each month, compounding from month to month.

**Question 1.11:** Create a dataframe `real_cost` which has the date as the index and 3 columns. The `price` column should be the same as in `reindexed_filtered_prices`, `deflator` should represent the [GDP deflator](https://www.khanacademy.org/economics-finance-domain/ap-macroeconomics/economic-iondicators-and-the-business-cycle/real-vs-nominal-gdp/v/gdp-deflator) (with the deflator for the first month being 100, the deflator for the second month being 100.1, etc.), and `real price` should be the inflation-adjusted price. You can see what the first 5 rows of the resulting DataFrame should look like by running the cell below.

*Hint:* Consider using a list comprehension combined with a NumPy method for the `deflator` column.


In [None]:
first_5_rows = {
    'date': ['2004-08-01', '2004-09-01', '2004-10-01', '2004-11-01', '2004-12-01'],
    'price': [105.04, 105.20, 105.63, 105.81, 105.93],
    'deflator': [100.0000, 100.1000, 100.2001, 100.3003, 100.4006],
    'real price': [105.0400, 105.0949, 105.4191, 105.4932, 105.5073]
}

pd.DataFrame(first_5_rows).set_index('date')
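One way to read the setup, assuming the 0.1% monthly inflation compounds (which matches the deflator values 100, 100.1, 100.2001 in `first_5_rows`): for month $t = 0, 1, 2, \dots$,

$$\text{deflator}_t = 100 \times 1.001^{t}, \qquad \text{real price}_t = \frac{\text{price}_t}{\text{deflator}_t} \times 100.$$

For example, in the second month $105.20 / 100.1 \times 100 \approx 105.0949$, which matches `first_5_rows`. The toy cell below applies these formulas to three hypothetical, constant nominal prices (not the lab data) to show that a flat nominal price corresponds to a falling real price. It uses `np.arange` rather than a list comprehension; either approach works.

```python
import numpy as np
import pandas as pd

# Hypothetical nominal prices that stay flat for three months
toy = pd.DataFrame({'price': [200.0, 200.0, 200.0]})

# Deflator: 100 in the first month, growing 0.1% per month (compounded)
toy['deflator'] = 100 * np.power(1.001, np.arange(len(toy)))

# Real price expresses each nominal price in first-month terms
toy['real price'] = toy['price'] / toy['deflator'] * 100

print(toy.round(4))
```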

In [None]:
real_cost = ...
...
...
real_cost

In [None]:
grader.check("q1_11")

<!-- BEGIN QUESTION -->

**Question 1.12:** How do the real cost and nominal cost change over time?

_Type your answer here, replacing this text._

<!-- END QUESTION -->

Let us now plot the real cost over time.

In [None]:
plt.plot(real_cost['real price']);

Going back to the dataframe `prices`, which dates had the highest prices?

**Question 1.13:** Sort the dataframe `prices` by the `price` column in descending order. Note that we expect you to store the sorted dataframe back in `prices` rather than assigning it to a new variable. You should be able to do this in one line of code.
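A toy sketch of `sort_values` with hypothetical data. It returns a new, sorted DataFrame, so reassigning to the same name keeps the result, and each row's index label moves with it.

```python
import pandas as pd

toy = pd.DataFrame({'item': ['a', 'b', 'c'], 'price': [3.0, 9.0, 5.0]})

# sort_values returns a sorted copy; reassigning keeps the sorted version
toy = toy.sort_values('price', ascending=False)

print(toy['price'].tolist())  # [9.0, 5.0, 3.0]
print(toy.index.tolist())     # [1, 2, 0]  (index labels move with their rows)
```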

In [None]:
prices = ...

In [None]:
grader.check("q1_13")

**Question 1.14:** Just for fun, let us rename the columns of `prices` to have correct capitalization. `date` should become `Date` and `price` should become `Price`.
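A toy sketch of `rename` with hypothetical data: the `columns` argument takes a dictionary mapping old names to new names, and the method returns a new DataFrame.

```python
import pandas as pd

toy = pd.DataFrame({'date': ['2004-08-01'], 'price': [105.04]})

# Map each old column name to its new name; the result is a new DataFrame
toy = toy.rename(columns={'date': 'Date', 'price': 'Price'})

print(list(toy.columns))  # ['Date', 'Price']
```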

In [None]:
...

In [None]:
grader.check("q1_14")

## Pre-Semester Survey

**Question 2:** Please fill out the [pre-semester survey](https://docs.google.com/forms/d/e/1FAIpQLSeLp78xHmUOro5Rkxg9glTIQwaxMiucS0_agMxBbyoegcYxKw/viewform?usp=sharing) to help us get to know you better! As with the feedback form, you should see a codeword at the end of the Google Form. Assign the codeword to the variable `codeword_survey` below.

**Note:** This question is worth 3 points; we strongly recommend that you do not skip it.

In [None]:
codeword_survey = ...

In [None]:
grader.check("q2")

**Congratulations**, you are finished with Lab 1 of Econ 148!

## Submission

Make sure you have run all cells in your notebook in order before running the cell below, so that all images/graphs appear in the output. The cell below will generate a zip file for you to submit. **Please save before exporting!**

In [None]:
# Save your notebook first, then run this cell to export your submission.
grader.export(run_tests=True)