In [None]:
# Initialize Otter
import otter
grader = otter.Notebook()

<table style="width: 100%;" id="nb-header>">
        <tr style="background-color: transparent;"><td>
            <img src="https://d8a-88.github.io/assets/images/blue_text.png" width="250px" style="margin-left: 0;" />
        </td><td>
            <p style="text-align: right; font-size: 10pt;"><strong>Economic Models</strong>, Spring 2020<br>
                Dr. Eric Van Dusen<br>
            Notebook by Andrei Caprau<br>
            Based on "Does the Stock Market Overreact?" by De Bondt and Thaler, 1985</p></td></tr>
    </table>

# Project 4: Does the Stock Market Overreact?

Welcome to Project 4! In this project we'll attempt to repeat the procedure De Bondt and Thaler use to show that people tend to "overreact" to sudden and dramatic news events, and that these overreactions are evident in stock prices. We may deviate from their procedure in certain spots, and we'll examine how this affects the conclusions we come to. Throughout this project, not only will you be developing your data science skills and performing economics research similar to De Bondt and Thaler's, but you will also be prompted to think about the idea of reproducibility in economics and data science. Have fun!

In [1]:
import datetime as dt
import numpy as np
import pandas as pd
from datascience import *

## Part 1: Reading in the Data

First we will need to read in the relevant data, which has been obtained beforehand from Global Financial Data (GFD). Provided to you is data on every stock tracked by GFD that is or has ever been traded on the New York Stock Exchange. Specifically, we have **closing** data, which is simply the last price the stock traded at on a particular day. The granularity of this data is monthly, meaning for each stock we have the closing price on the **last trading day of each month**. Additionally, we may also have the closing price on the **first** trading day of each month, although for some older stocks this beginning-of-the-month data is unavailable, so we will be using the end-of-month data.

Already our methodology possibly differs from that of De Bondt and Thaler. There are two main chances for differences so far. First, De Bondt and Thaler used data from CRSP, whereas we used GFD data. If the two sources have differing data, naturally our results may differ. Additionally, we will use end-of-month closing price for monthly stock data, whereas De Bondt and Thaler may have measured monthly stock prices in a different way. If there is some trend that stock prices follow on specific days of the month, our results might reflect this trend. For example, it might be the case that more people than usual trade stocks on the last day of the month, in which case our end-of-month data might not be representative of how the stock did that month overall.

<!-- BEGIN QUESTION -->

**Question 1.1:** Suppose that we are concerned our results might be biased due to some phenomenon that occurs at the end of the month. What could we do to alleviate this concern (in other words, how could we test to see if our results seem genuine or if they are affected by our choice of day)?

<!--
BEGIN QUESTION
name: q1_1
manual: true
-->

_Type your answer here, replacing this text._

<!-- END QUESTION -->



Now let's read in the data. Below we load in an array called `nyse_stocks` which has the tickers (nicknames) of all the stocks in our data. We will use this to help us read in the data.

In [2]:
# Do not edit this cell
nyse_stocks = np.load("nyse_stocks.npy", allow_pickle=True)
print("First 5 stocks in the array: ", nyse_stocks[:5])
print("Number of stocks in the array: ", len(nyse_stocks))

In a folder called `data`, we have one `.csv` file for each stock. We will load each file into a table and store each table in a **dictionary**. If you have not encountered dictionaries before, they are exactly what they sound like. A paper dictionary has a mapping between words and definitions. If you know a word, you can look it up and see its definition. A Python dictionary is almost exactly the same; it has *keys* and *values*. A key is like the word and a value is like the definition it maps to. A dictionary has the following syntax:

```python
>>> # Set key to value in the dictionary
>>> dictionary[key] = value
>>>
>>> # Retrieve the value corresponding to key
>>> dictionary[key]
value
```
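
As a small illustration of the syntax above, here is a sketch that builds and queries a dictionary. The tickers and prices are hypothetical and are not taken from the project data.

```python
# Hypothetical tickers and prices, used only to illustrate dictionary syntax.
prices = {}
prices["AAA"] = 10.5
prices["BBB"] = 42.0

# Retrieve the value stored under a key.
aaa_price = prices["AAA"]

# Check whether a key exists before looking it up.
has_ccc = "CCC" in prices

# list(...) takes a snapshot of the keys, so removing entries inside the loop is safe.
for ticker in list(prices.keys()):
    if prices[ticker] > 20:
        prices.pop(ticker)

print(aaa_price, has_ccc, list(prices.keys()))
```

Looping over `list(prices.keys())` rather than `prices.keys()` matters whenever entries are removed during the loop, because Python raises a `RuntimeError` if a dictionary changes size while it is being iterated over.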

In [3]:
# Do not edit this cell
# This cell may take a minute or so to complete

# Here we create a new, empty dictionary called nyse
nyse = {}

for stock in nyse_stocks:
    # Here we read in a table and assign it to the stock name in nyse
    current_stock = Table.read_table("data/{}.csv".format(stock))
    if current_stock.num_rows < 5:
        continue
    if 0 in current_stock.column("close"):
        continue
    current_stock = current_stock.with_column(
        "date", 
        current_stock.apply(lambda row: dt.date.fromisoformat(row), "date")
    )
    
    nyse[stock] = current_stock
print("Number of keys in nyse: ", len(nyse.keys()))

Let's take a look at the data for IBM.

In [4]:
nyse["IBM"]

First notice in the column called `close` how low the price of one share of IBM was over 100 years ago! It actually wasn't so cheap at the time; the data we have has been automatically adjusted for events called *stock splits*. You don't need to know this to successfully complete the project, and if it doesn't make sense feel free to move on to the next paragraph, but sometimes companies split their stock such that each share becomes multiple shares. Doing so naturally divides the price of each share by the split ratio, since the overall value of the company hasn't changed; the only thing that changed is what proportion of the company a single share represents. We don't want to use these arbitrary prices, because we really want to measure the change in the value of a company. So we adjust the prices to reflect any splits that have happened.
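
To see why the adjustment matters, here is a minimal sketch with made-up numbers. It assumes a single 2-for-1 split and divides the pre-split price by the split ratio, which is one common way to adjust; it is not necessarily how GFD prepares its data.

```python
# Hypothetical numbers: a share trades at $100 and then splits 2-for-1.
price_before_split = 100.0
split_ratio = 2
price_after_split = price_before_split / split_ratio

# Using raw prices, the split looks like a 50% loss even though the company's value is unchanged.
naive_return = price_after_split / price_before_split - 1

# Dividing the pre-split price by the split ratio puts both prices on the same per-share basis.
adjusted_price_before = price_before_split / split_ratio
adjusted_return = price_after_split / adjusted_price_before - 1

print(naive_return, adjusted_return)
```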

Additionally, note the date format in the column called `date`. This isn't just a string that looks like a date; it is a special Python object from a package called `datetime`, imported as `dt`. If you aren't familiar with what an object is, don't worry! All you need to know are the following things:

* `datetime` objects can be compared! For example, you and I know that 2020/01/01 comes before 2020/01/02, but how would you make this comparison in code? Fortunately, you don't have to; `datetime` does it for you.
```python
>>> # All this does is make a datetime object corresponding to 2020/01/01
>>> date1 = dt.date(2020, 1, 1)
>>> # All this does is make a datetime object corresponding to 2020/01/02
>>> date2 = dt.date(2020, 1, 2)
>>> date1 < date2
True
>>> date1 > date2
False
```

* To extract the year of a `datetime` object, use `.year`.
```python
>>> date1.year
2020
```

* To extract the month of a `datetime` object, use `.month`.
```python
>>> date1.month
1
```

* To extract the day of a `datetime` object, use `.day`.
```python
>>> date1.day
1
```

Easy!
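
The sketch below shows two more `datetime` operations that the provided cells in this notebook rely on: `fromisoformat`, which parses text into a date, and `strftime`, which formats a date as text. The specific dates are arbitrary examples.

```python
import datetime as dt

# Parse an ISO-formatted string ("YYYY-MM-DD") into a date object.
date_from_text = dt.date.fromisoformat("2020-01-31")

# Format a date as a "YYYY-MM" label.
month_label = date_from_text.strftime("%Y-%m")

# Because dates can be compared, built-ins like min and sorted work on them directly.
dates = [dt.date(2020, 3, 31), dt.date(2020, 1, 31), dt.date(2020, 2, 28)]
earliest = min(dates)

print(date_from_text, month_label, earliest)
```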

Our data has days for both the end of each month and the beginning of each month. We only want data for the end of each month.

**Question 1.2:** Complete the function `shift_column`, which takes in a table containing stock data for a company, extracts the column defined by the `column` argument, and shifts all values either up by one or down by one depending on the `direction` argument. For the last element (or the first element if we're shifting down), since there was nothing to shift from, assign this element to `None`. Make sure the first element doesn't appear anymore if we're shifting up (last element if we're shifting down). So for example, if the original column is `array([1, 2, 3])` and we want to shift up, the new shifted column will be  `array([2, 3, None])`. If we want to shift down, the new column will be `array([None, 1, 2])`.

*Hint*: You can access the last item in an array with `.item(-1)`.
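
If `np.roll` is new to you, the following sketch shows how it behaves on a small, made-up array. It is only a demonstration of the NumPy call, not a solution to the question.

```python
import numpy as np

values = np.array([10, 20, 30, 40])

# A positive shift moves every element toward the end; the last element wraps around to the front.
rolled_forward = np.roll(values, 1)

# A negative shift moves every element toward the front; the first element wraps around to the end.
rolled_backward = np.roll(values, -1)

# np.roll returns a new array and leaves the original untouched.
print(values, rolled_forward, rolled_backward)

# An array with dtype=object can hold None alongside numbers.
with_gap = np.array([10, 20, 30, 40], dtype=object)
with_gap[0] = None
print(with_gap)
```

Notice that `np.roll` wraps values around rather than discarding them, which is why the element that should disappear needs to be dealt with separately.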

<!--
BEGIN QUESTION
name: q1_2
-->

In [5]:
def shift_column(table, column, direction):
    """
    Shift the column in table in a direction by one, and return the shifted column as an array.
    
    Parameters
    ----------
    table: a datascience table
    column: str
        a string of a column that appears in table
    direction: str
        either the string "up" or "down"
    
    Returns
    -------
    array of shifted column
    """
    
    assert direction in ["up", "down"], "Invalid direction, must be either 'up' or 'down'."
    assert column in table.labels, "Invalid column."
    
    array = table.column(...).copy()
    
    if direction == "up":
        # .itemset sets the item at the index of the first argument to the value of the second argument.
        array.itemset(..., None)
        
        # Look at the comment below to understand what roll_direction is used for.
        roll_direction = -1

    else:
        array.itemset(..., None)
        roll_direction = ...
    
    # np.roll shifts the array forward by the number in its second argument (backward if negative).
    shifted_array = np.roll(array, ...)
    
    return shifted_array

In [None]:
grader.check("q1_2")

**Question 1.3:** Complete the function `get_day`, which takes in a `datetime` object and returns its day. If the function receives `None` as an argument, it returns -1.

<!--
BEGIN QUESTION
name: q1_3
-->

In [12]:
def get_day(date_arg):
    """
    Get the day attribute of a datetime object. If date_arg is None, return -1.
    
    Parameters
    ----------
    date_arg: datetime
    
    Returns
    -------
    int
        day attribute of datetime object
    """
    
    if date_arg is None:
        return ...
    else:
        return ...

In [None]:
grader.check("q1_3")

**Question 1.4:** Complete the function `get_month`, which takes in a `datetime` object and returns its month. If the function receives `None` as an argument, it returns -1.

<!--
BEGIN QUESTION
name: q1_4
-->

In [15]:
def get_month(date_arg):
    """
    Get the month attribute of a datetime object. If date_arg is None, return -1.
    
    Parameters
    ----------
    date_arg: datetime
    
    Returns
    -------
    int
        month attribute of datetime object
    """
    
    if date_arg is None:
        return ...
    else:
        return ...

In [None]:
grader.check("q1_4")

**Question 1.5:** Complete the function `filter_end_of_month`, which takes in a table, and returns a copy of the table but only with rows corresponding to the end of each month. Think of how we can tell whether a date is the last day of the month in the dataset, given that the data is sorted on date (it is guaranteed that the date of a current row is always before the date of the next row in our data). The final table that the function returns must have the same columns as it had when the function was called.

*Hint*: Use `shift_column`, `get_day`, and `get_month`. Also note that you can filter a table using a predicate on another column with the syntax `tbl.where("col1", are.equal_to, "col2")`.
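
To build intuition, here is a plain-Python sketch of the underlying idea on a short, made-up list of sorted dates: an entry is the last one of its month when the entry after it falls in a different month. It uses ordinary lists rather than tables, so it is not the implementation the question asks for.

```python
import datetime as dt

# Hypothetical sorted dates: first and last trading days for a few months.
dates = [
    dt.date(2020, 1, 2), dt.date(2020, 1, 31),
    dt.date(2020, 2, 3), dt.date(2020, 2, 28),
    dt.date(2020, 3, 2),
]

# Pair every date with the one that follows it and keep those where the month changes.
last_of_month = []
for current, following in zip(dates, dates[1:]):
    if current.month != following.month or current.year != following.year:
        last_of_month.append(current)

print(last_of_month)
```

In this sketch the final date has no successor, so it is never selected; think about how the `None` placeholder from `shift_column` and the -1 returned by `get_day` and `get_month` handle that case in the table version.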

<!-- `array1 > array2` performs an element-wise greater-than comparison between the two arrays and returns an array of booleans. Similarly, `array1 != array2` performs element-wise not-equal comparison, `array1 & array2` performs element-wise `and` comparison between booleans, and `array1 | array2` performs element-wise `or` comparison between booleans. -->

<!--
BEGIN QUESTION
name: q1_5
-->

In [18]:
def filter_end_of_month(table):
    """
    Filter table so that only rows corresponding to the end of a month remain.
    
    Parameters
    ----------
    table: a datascience table
    
    Returns
    -------
    table
        Filtered table. Remember that the filtered table should have the same number of columns as the input.
    """
    
    table = table.copy()
    table = table.with_column("shifted_date", ...)
    table = table.with_columns(
        "date_attribute", ..., 
        "shifted_attribute", ...
    )
    table = table.where(..., ..., ...)
    table = table.drop([...])
    return table

In [None]:
grader.check("q1_5")

Excellent. We can now filter our data so that we only consider days that are at the end of each month. This way, when we calculate each stock's monthly return we have a standard way of defining return for every stock. The following code cell applies the `filter_end_of_month` function to each table in our `nyse` dictionary. It might take a minute or so to complete.

*Warning*: The following code cell irreversibly alters our existing data in memory. If your function is incorrect and you ran this code cell, you may have to re-load the data into memory.

In [33]:
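# Iterate over a snapshot of the keys, since entries may be removed from nyse inside the loop.
for stock in list(nyse.keys()):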
    filtered_data = filter_end_of_month(nyse[stock])
    if filtered_data.num_rows < 2:
        nyse.pop(stock, None)
    else:
        filtered_data = filtered_data.with_column(
            "date_string", 
            np.array([date.strftime("%Y-%m") for date in filtered_data.column("date")])
        )
        nyse[stock] = filtered_data
print("Number of keys in nyse: ", len(nyse.keys()))

## Part 2: Calculating Returns

There are several ways to calculate returns for stocks; recall from lecture even De Bondt and Thaler tried a few different methods in their paper. For this project, to keep things simple we will assume return is defined as shown:

$$
\text{return} = \dfrac{\text{new price}}{\text{old price}} - 1
$$

To find the total return over a period of several months, we would do

$$
\left(\prod_{\text{month}=0}^{n-1}\dfrac{\text{price}_{\text{month}+1}}{\text{price}_{\text{month}}}\right) - 1 = \dfrac{\text{price}_{n}}{\text{price}_{0}} - 1 = \dfrac{\text{new price}}{\text{old price}} - 1
$$

Notice that if the price didn't change from one period to the next, the ratio of new price to old price will be 1, so the return will be 0.
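
The sketch below works through the return formula on a short, made-up price series and checks that compounding the monthly returns gives the same total as comparing the last price with the first. The prices are hypothetical.

```python
import numpy as np

# Hypothetical end-of-month closing prices for four consecutive months.
prices = np.array([100.0, 110.0, 99.0, 99.0])

# Monthly return: this month's price divided by last month's price, minus one.
monthly_returns = prices[1:] / prices[:-1] - 1

# Total return, computed by compounding the monthly price ratios...
total_return_compounded = np.prod(monthly_returns + 1) - 1

# ...and computed directly from the first and last prices.
total_return_direct = prices[-1] / prices[0] - 1

print(monthly_returns, total_return_compounded, total_return_direct)
```

The last month's price is unchanged, so its return is 0, and the two totals agree because the intermediate prices cancel in the product.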

**Question 2.1:** Complete the function `calculate_return`, which finds the return of any given month in a table using that month's price, the previous month's price, and the formula above. It applies this to each row, then assigns these values to a new column, and returns (no pun intended) the new table. Notice that the first row will not be able to give a return, since we don't have any previous value to reference. Your function should thus also delete the first row of the table. It is recommended that you carefully check to see that your function outputs the desired values as the public tests are not rigorous.

*Hint*: Use a function you've defined earlier.

<!--
BEGIN QUESTION
name: q2_1
-->

In [34]:
def calculate_return(table):
    """
    Calculate return, assign to new column, return table.
    
    Parameters
    ----------
    table: a datascience table
    
    Returns
    -------
    table
        Remember that the return table should have the same number of columns as 
        the input plus one, and one fewer row than the input. The columns will be 
        'date', 'close', and 'return'.
    """
    
    table = table.copy()
    
    # Creates a new column that is the close column but shifted down by one.
    table = table.with_column("shifted_close", ...)
    
    # Delete the first row, cannot calculate return for it.
    table = table....
    
    # Calculate return for each row as this row / previous row quantity minus one.
    table = table.with_column("return", (... / ...) - ...)
    
    # Drop the column we created in this function.
    table = table....
    
    return table


In [None]:
grader.check("q2_1")

As before, let's apply this function to every stock we have.

*Warning*: The following code cell irreversibly alters our existing data in memory. If your function is incorrect and you ran this code cell, you may have to re-load the data into memory. Additionally, since your function deletes a row every time it runs, you should only run this cell once. If you run it twice in the same notebook instance, you will likely have to re-load the data into memory.

In [41]:
for stock in nyse.keys():
    nyse[stock] = calculate_return(nyse[stock])

Calculating return is not enough, however. To see why, imagine something catastrophic happens to the economy, for instance a pandemic. Think to yourself what would happen to the stock price of any arbitrary company, and what would happen to the stock prices of all public companies on average. If something happens to the stock price of a company, does that necessarily reflect something fundamentally different that the company has done to warrant a change in valuation?

<!-- BEGIN QUESTION -->

**Question 2.2:** In two or fewer sentences explain why the returns we have calculated so far are not sufficient for understanding how well a company does, and why we therefore need to additionally measure the return of some benchmark, like the overall market.

<!--
BEGIN QUESTION
name: q2_2
manual: true
-->

_Type your answer here, replacing this text._

<!-- END QUESTION -->



We'll make another simplifying assumption and say that the return of the market is simply the average return of every stock we have data for, with each stock receiving equal weight. In reality this is likely a poor metric for market return, as it doesn't make much sense for small, volatile stocks to have the same contribution to average market return as large, well-established companies. It is, however, straightforward to understand and implement.
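
To make the trade-off concrete, here is a sketch comparing an equal-weighted average with a value-weighted one. The returns and company sizes are invented for illustration, and value weighting is shown only as a point of comparison; this project uses the equal-weighted version.

```python
import numpy as np

# Hypothetical monthly returns: one small, volatile stock and two large, stable ones.
returns = np.array([0.50, 0.02, 0.01])

# Hypothetical company sizes in arbitrary units, used only for the weighted comparison.
market_caps = np.array([1.0, 100.0, 100.0])

# Equal weighting: every stock counts the same, so the small stock dominates the average.
equal_weighted = returns.mean()

# Value weighting: each stock counts in proportion to its size.
value_weighted = np.average(returns, weights=market_caps)

print(equal_weighted, value_weighted)
```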

We now create an array that contains all of the dates in our desired range by just taking the date array of one company that we know is old enough to have data for every desired date. We will use this to help us create a new "stock" which captures the average market return of all stocks. We do need to be careful in our procedure, however, because as time goes on some stocks come and go from our data, and we need to include all NYSE companies that existed in a given month.

**Question 2.3:** From looking at IBM's data, we can be sure that the company's date range contains all of the dates required to reproduce the paper's procedure. Let's use this data to make an array called `date_range` that covers every month between 1930/01/01 and 1981/01/01, with one entry per month. Note that the provided code fills `date_range` from the `date_string` column, so its entries are `"YYYY-MM"` strings rather than `datetime` objects.

<!--
BEGIN QUESTION
name: q2_3
-->

In [42]:
filtered_dates = nyse["IBM"]....
date_range = filtered_dates.column("date_string")

In [None]:
grader.check("q2_3")

Now that we have an array of dates that we want to look at, let's create a "stock" that captures the average return of the market.

**Question 2.4:** Complete `market_return`, which takes in a dictionary of tables and a date string, and computes the market return by taking the average of all returns for all stocks that have data for that given date, and returns the market return.

<!--
BEGIN QUESTION
name: q2_4
-->

In [48]:
def market_return(dictionary, date):
    """
    Calculate returns for a date by finding the average of all returns for stocks present at that date.
    
    Parameters
    ----------
    dictionary: dict
        nyse data
    date: str
        a date string from date_range, formatted as "YYYY-MM"
    
    Returns
    -------
    the market return on the specified date
    """
    
    total_return = 0
    n = 0
    for stock in dictionary.keys():
        stock_data = dictionary[...]
        this_stock_return = stock_data.where("date_string", ...).column("return")
        
        if len(this_stock_return) > 0:
            total_return += this_stock_return.item(0)
            n += 1

    if n == 0:
        # We should never need to print this. If this does get printed there may be a mistake somewhere. 
        # Do you know why?
        print("No data when trying to calculate market return for ", date)
        return None

    avg_return = ... / ...

    return avg_return

In [None]:
grader.check("q2_4")

Now let's run your `market_return` for the first date in `date_range`:

In [50]:
market_return(nyse, date_range.item(0))

You should have noticed that it took a few seconds to run `market_return`. Considering how much data we have for this project, it would take about 20 minutes for you to run `market_return` on every date in `date_range`. Luckily, we've already done the heavy lifting for you! The cell below will load our market return data into an array called `nyse_returns`.

<!-- Now let's run it on all of `nyse`! As mentioned in the problem, it will take a bit to run. Enjoy a cup of tea or a 20-minute nap in the meantime, or perhaps skip ahead and take a look at the next problem. See you there. -->

In [51]:
# nyse_returns = market_return(nyse, date_range)
nyse_returns = np.load("nyse_returns.npy")
print("First 5 returns in the array:", nyse_returns[:5])
print("Number of returns in the array:", len(nyse_returns))

## Part 3: Forming Portfolios

Now comes the final part. We're going to finally form portfolios based on how stocks perform relative to the market and see how these portfolios do over time.

**Question 3.1:** Complete `test_complete_data`, which takes in a table, a start date, and an end date, and outputs the filtered table if the table contains complete monthly data for the range of dates, inclusive, and `None` otherwise. For simplicity, we can just assume that a table has complete data in the range if the number of rows it has in that range makes sense.

<!--
BEGIN QUESTION
name: q3_1
-->

In [52]:
def test_complete_data(table, start, end):
    """
    Check if table has complete data in the specified date range.
    
    Parameters
    ----------
    table: table
    start: datetime
        start of date range
    end: datetime
        end of date range
    
    Returns
    -------
    table
        Return the original table where the dates fall within the range if the original table has 
        complete data, otherwise return None.
    """
    
    # Isolate the part of the table where date is in the given range, inclusive.
    table = table....
    table = table....
    
    # Measure how many years the range spans.
    year_diff = ... - ...
    
    # Measure how many months ignoring years.
    month_diff = ... - ...
    
    # Now we calculate how many months are in the range by multiplying the number of years by 12.
    total_diff = ... * 12 + ... + ...
    
    # There should be one row per month, and so if the table has complete data there will be as many
    # rows as months in the date range, inclusive.
    if table.num_rows == ...:
        return table
    else:
        return None

In [None]:
grader.check("q3_1")

**Question 3.2:** Complete `rank_stocks`, which takes in a dictionary, a start index, an end index, an array containing a range of dates, and an array containing market returns that correspond to the range of dates, and then outputs a sorted table containing stocks and their excess returns over the period, with best-performing stocks first. There are no hidden tests for this question.
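
The provided code in `rank_stocks` compounds monthly excess returns by subtracting the market return from the stock return, adding one, and taking the product. The sketch below applies that same calculation to made-up numbers so you can see what the resulting value means.

```python
import numpy as np

# Hypothetical monthly returns for one stock and for the market over the same three months.
stock_returns = np.array([0.05, -0.02, 0.03])
market_returns = np.array([0.01, 0.01, 0.01])

# Monthly excess growth factor: 1 means the stock exactly matched the market that month.
monthly_excess_growth = stock_returns - market_returns + 1

# Compounding the factors gives the cumulative figure used for ranking.
cumulative_excess = monthly_excess_growth.prod()

print(monthly_excess_growth, cumulative_excess)
```

Because one is added before taking the product, a cumulative value above 1 indicates the stock outperformed the market over the period, and a value below 1 indicates it underperformed.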

<!--
BEGIN QUESTION
name: q3_2
-->

In [55]:
def rank_stocks(dictionary, start, end, dates, market_returns):
    """
    Create a table of stock rankings based on excess returns, with best stocks at the head of the table.
    
    Parameters
    ----------
    dictionary: dict
        nyse data
    start: int
        index of the desired start date in dates
    end: int
        index of the desired end date in dates
    dates: array
        date_range computed earlier
    market_returns: array
        nyse_returns computed earlier
    
    Returns
    -------
    table
        Sorted table of stocks and returns over date range. Table columns should be 'stocks' and 
        'excess_returns'.
    """
    
    # Convert the date strings received to datetime objects for easy date selection.
    start_date = dt.date.fromisoformat(dates.item(start) + "-01")
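    # Find the last day of the end month without assuming it has 31 days: step from the first of
    # the month into the next month, snap to that month's first day, then go back one day.
    end_month_start = dt.date.fromisoformat(dates.item(end) + "-01")
    end_date = (end_month_start + dt.timedelta(days=31)).replace(day=1) - dt.timedelta(days=1)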
    
    # We only need the market returns in the given date range. Fortunately, we can simply index into the 
    # market return array since we computed it using the same master date array, so the indices for the 
    # date array will match the indices for the market returns array.
    relevant_market_returns = market_returns.take(np.arange(start, end + 1))
    
    stocks = make_array()
    excess_returns = make_array()
    for stock in dictionary.keys():
        stock_data = dictionary[...]
        test_table = test_complete_data(..., ..., ...)
        
        if test_table is not None:
            cumulative_excess_return = test_table.column("return") - relevant_market_returns + 1
            excess_returns = np.append(excess_returns, cumulative_excess_return.prod())
            stocks = np.append(stocks, stock)
    
    return Table().with_columns("stocks", ..., 
                                "excess_returns", ...).sort("excess_returns", descending=...)

In [None]:
grader.check("q3_2")

**Question 3.3:** Complete `track_portfolio`, which tracks the progress of the best and worst performers from a table of ranks. There are no hidden tests for this question, and the public tests only check that `track_portfolio` works correctly in simple test-cases so students are not penalized for cascading errors. Notice that this function returns two things. If you have not seen this before, this is something nice we can make functions do. To save both things that such a function outputs into variables, we would simply write `thing1, thing2 = function(arguments)`.
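
Here is a tiny sketch of a function that returns two things and of how to capture them. The function `min_and_max` is a hypothetical example and is not part of this project.

```python
def min_and_max(values):
    """Return the smallest and largest value in a list."""
    return min(values), max(values)

# Unpack the two returned values into separate variables.
lowest, highest = min_and_max([3, 1, 4, 1, 5])

# Without unpacking, the two values arrive together as a single tuple.
both = min_and_max([3, 1, 4, 1, 5])

print(lowest, highest, both)
```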

*Hint*: What function have we defined and used above that takes a group of stocks and finds their average return?

<!--
BEGIN QUESTION
name: q3_3
-->

In [58]:
def track_portfolio(ranks, top_n, start, end, dates, market_returns, dictionary):
    """
    Track the performance of the winning and losing portfolios.
    
    Parameters
    ----------
    ranks: table
    top_n: int
        top n companies to select in portfolio
    start: int
    end: int
    dates: array
    market_returns: array
    dictionary: dict
        nyse
    
    Returns
    -------
    float, float
        cumulative average returns of portfolios
    """

    winning_indices = np.arange(0, top_n)
    losing_indices = np.arange(ranks.num_rows - top_n, ranks.num_rows)
    winning_port = ranks.take(...).column("stocks")
    losing_port = ranks.take(...).column("stocks")
    winning_dictionary = {}
    losing_dictionary = {}
    for stock in ...:
        winning_dictionary[stock] = dictionary[stock]
    for stock in ...:
        losing_dictionary[stock] = dictionary[stock]
    current_dates = dates.take(np.arange(start, end + 1))
    relevant_market_returns = market_returns.take(np.arange(start, end + 1))
    winning_market_returns = make_array()
    losing_market_returns = make_array()
    for date in current_dates:
        winning_market_returns = np.append(winning_market_returns, ...)
        losing_market_returns = np.append(losing_market_returns, ...)
    winning_returns = winning_market_returns - relevant_market_returns + 1
    losing_returns = losing_market_returns - relevant_market_returns + 1
    return winning_returns.prod(), losing_returns.prod()

In [None]:
grader.check("q3_3")

**Question 3.4:** Let's wrap it up. Complete `main_function`, which specifies blocks of dates, and for each block it forms winning and losing portfolios, finds the cumulative excess returns of these portfolios for each date block, and averages all of these together. There are no hidden tests for this question.
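
The provided code in `main_function` uses `np.split` to cut the date indices into equal blocks and then pairs each block with the one after it. The sketch below shows that pattern on a small, made-up range of 12 indices with blocks of 4; the names `formation` and `tracking` are labels chosen here for illustration.

```python
import numpy as np

n_dates = 12
chunk_size = 4

# Split the indices 0..11 into equal-sized consecutive blocks.
blocks = np.split(np.arange(0, n_dates), n_dates // chunk_size)

# Pair each block with the next one: rank stocks on the first, track portfolios on the second.
pairs = []
for i in range(len(blocks) - 1):
    formation = blocks[i]
    tracking = blocks[i + 1]
    pairs.append((int(formation[0]), int(formation[-1]), int(tracking[0]), int(tracking[-1])))

print(pairs)
```

Note that `np.split` raises a `ValueError` when the array cannot be divided into equal parts, so this pattern relies on the number of dates being a multiple of the block size; `np.array_split` is the NumPy variant that tolerates uneven blocks.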

<!--
BEGIN QUESTION
name: q3_4
-->

In [60]:
def main_function(dictionary, dates, market_returns, top_n, chunk_size=36):
    """
    Compute two arrays, each containing cumulative excess returns of winning and losing portfolios
    respectively for all dates.
    
    Parameters
    ----------
    dictionary: dict
        nyse
    dates: array
    market_returns: array
    top_n: int
        top n companies to select in portfolio
    
    Returns
    -------
    array, array
    """
    winning_average_cumulative_excess_return = make_array()
    losing_average_cumulative_excess_return = make_array()
    date_index_ranges = np.split(np.arange(0, len(dates)), int(len(dates) / chunk_size))
    for i in range(len(date_index_ranges)-1):
        date_index_array = date_index_ranges[i]
        date_index_array_next = date_index_ranges[i+1]
        ranking = ...(dictionary, date_index_array...., date_index_array...., dates, market_returns)
        w, l = ...(..., top_n, date_index_array_next...., date_index_array_next...., 
                   dates, market_returns, dictionary)
        winning_average_cumulative_excess_return = np.append(winning_average_cumulative_excess_return, ...)
        losing_average_cumulative_excess_return = np.append(losing_average_cumulative_excess_return, ...)
        print("Finished block ", i)
    return winning_average_cumulative_excess_return, losing_average_cumulative_excess_return

In [None]:
grader.check("q3_4")

Now call your main function! Its arguments should be `nyse`, `date_range`, `nyse_returns`, `35`. You can also explore values of `top_n` other than 35. The function may take a couple minutes to run.

In [63]:
w_array, l_array = ...

In [64]:
w_array

In [65]:
l_array

<!-- BEGIN QUESTION -->

**Question 3.5:** What do you notice about the array of returns of winner portfolios and loser portfolios? Does this seem to match what De Bondt and Thaler argue? Please limit your response to one or two sentences.

<!--
BEGIN QUESTION
name: q3_5
manual: true
-->

_Type your answer here, replacing this text._

<!-- END QUESTION -->

<!-- BEGIN QUESTION -->

**Question 3.6:** However close or far away your numbers look from what you expected, they are not an exact match to the numbers from the paper. Why is this, and what does this say about the idea of reproducibility? Please limit your response to two or three sentences.

<!--
BEGIN QUESTION
name: q3_6
manual: true
-->

_Type your answer here, replacing this text._

<!-- END QUESTION -->



## Conclusion

Congratulations, you've finished Project 4! Hopefully you've enjoyed doing the kind of work that foreshadows what some research in economics might look like. Additionally, we hope the theme of reproducibility we targeted for this week/project was insightful.

---

To double-check your work, the cell below will rerun all of the autograder tests.

In [None]:
grader.check_all()

## Submission

Make sure you have run all cells in your notebook in order before running the cell below, so that all images/graphs appear in the output. The cell below will generate a zip file for you to submit. **Please save before exporting!**

In [None]:
# Save your notebook first, then run this cell to export your submission.
grader.export("proj04.ipynb")