In [None]:
# Initialize Otter
import otter
grader = otter.Notebook("proj01.ipynb")

In [None]:
import numpy as np
import pandas as pd
import re
import matplotlib.pyplot as plt
import geopandas as gpd
import ipywidgets as widgets
from ipywidgets import interact, interactive, fixed, interact_manual
%matplotlib inline

# Project 1: Regional GDP
### Regional Heterogeneity: Varying Economic Performance in US Counties
In the first project of Econ 148, we will examine county-level economic performance as measured by real Gross Domestic Product (GDP) over the past two decades. GDP is often analyzed at the country level; however, regional heterogeneity is also a crucial source of variation in growth and business cycle analysis. Therefore, in this project we will use a county-level real GDP dataset from the Bureau of Economic Analysis (BEA) to try to find out the regional differences in economic performance, especially during recessions.

You will use the data cleaning and data manipulation skills you have learned so far in this course to wrangle this rich, but rather complex, real-world dataset. 

#### Data sources: 

The main dataset we will use in this notebook is ["CAGDP9: Real GDP in Chained Dollars by County and MSA"](https://www.bea.gov/data/gdp/gdp-county-metro-and-other-areas) from the Bureau of Economic Analysis (BEA), accessed in January 2023. It provides a comprehensive measure of the gross domestic product of counties, metropolitan statistical areas, and some other local areas in the United States from 2001 to the present. *We use a subset of the full dataset (about 50%) that includes some of the industries available in the original dataset.*

We will also use the ["United States Counties Database"](https://simplemaps.com/data/us-counties) from Simplemaps.com, accessed Jun 2022. Specifically, we will use the geographic data of U.S. counties (i.e., latitude and longitude) to create the visualizations in the last section. 

### Learning Objectives: 
- Importing and exporting dataframes
- Metadata of a dataframe
- Recognizing and handling missing values and NaNs
- String methods and type conversions
- Grouping and aggregating
- Calculating changes and percentage changes
- Joining and merging two dataframes
- A demo of using Jupyter widgets

**A Note on Grading:**  
In Project 1, the autograded questions will have hidden tests, and the text-based free response questions will be graded on correctness. 

---
## Part 1: Importing dataset

Text files are stored using different character encodings (codecs). In most cases, the default codec (utf-8) can decode a dataset. If we run into decoding issues (especially with datasets containing symbols or non-English text), we can manually specify another codec (e.g. ascii, latin-1). A complete list of codecs for Python 3.7 and newer can be found [here](https://docs.python.org/3.7/library/codecs.html#standard-encodings). 

As a side note, if we want to export the dataframe when we are done, we will also want to make sure that we are using the correct codecs.

For example, *some of you may not be able to import the real GDP dataset by BEA* with the default codec (utf-8). You may get an error message like the one below.
```python
>>> rgdp = pd.read_csv("data/sample_CAGDP9__ALL_AREAS_2001_2021.csv")
UnicodeDecodeError: 'utf-8' codec can't decode byte 0xf1 in position 137852: invalid continuation byte
```
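
The sketch below illustrates why such an error can occur, using a made-up byte string rather than the actual file. The byte `0xf1` stands for `ñ` in latin-1 but is not valid on its own in utf-8. The name "Doña Ana" is only an assumed example of text that could trigger the error; we have not checked which entry in the dataset is responsible.

```python
# Hypothetical example text, encoded the way a latin-1 file would store it
raw = "Doña Ana".encode("latin-1")  # the ñ becomes the single byte 0xf1

# Decoding with utf-8 fails because 0xf1 is not followed by continuation bytes
try:
    raw.decode("utf-8")
    utf8_ok = True
except UnicodeDecodeError:
    utf8_ok = False

# Decoding with the codec that was used for encoding recovers the text
decoded = raw.decode("latin-1")
print(utf8_ok, decoded)
```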

**Question 1.1:** Import the dataset `data/sample_CAGDP9__ALL_AREAS_2001_2021.csv` using an alternative codec `latin-1`.

*Hint:* Look up the [documentation](https://pandas.pydata.org/docs/reference/api/pandas.read_csv.html) for `pd.read_csv` and see if there is any argument related to codec or encoding. 

Note: It's totally fine if you see a warning after you successfully import the dataset. This has to do with the content of this dataset.

In [None]:
rgdp_raw = ...
rgdp_raw

In [None]:
grader.check("q1_1")

**Question 1.2:** We notice that the last four rows in `rgdp_raw` are just some footnotes, so we will drop them. To do so, you can either select the top 47670 rows for the data that we want, or you can drop the bottom 4 rows with [`pandas.DataFrame.drop`](https://pandas.pydata.org/docs/reference/api/pandas.DataFrame.drop.html). Assign the modified dataframe to `rgdp`. 

In [None]:
rgdp = ...
rgdp

In [None]:
grader.check("q1_2")

---
## Part 2: Learn about the dataset

Like we did in Lab 3, one of the first things that we will do with our dataset is to learn about its structure: how many rows and columns are there in the dataset? What values does each column store? What is the data type for each column (int, string, etc.)? For categorical variables, what are unique values? For numerical variables, what is the mean, median, min, and max? 

In this section, we will use the built-in functions in Pandas to quickly answer the questions above. 
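
As a rough illustration of the kind of tools meant here, the sketch below inspects a small made-up dataframe (the names `toy`, `county`, `industry`, and `gdp` are hypothetical and not part of the BEA dataset).

```python
import pandas as pd

# Hypothetical toy data, not the BEA dataset
toy = pd.DataFrame({
    "county": ["A", "A", "B", "C"],
    "industry": ["Total", "Farms", "Total", "Total"],
    "gdp": [10.0, 2.0, 7.5, 3.0],
})

n_rows, n_cols = toy.shape            # (number of rows, number of columns)
col_types = toy.dtypes                # data type of each column
n_counties = toy["county"].nunique()  # number of distinct values in a column
gdp_summary = toy["gdp"].describe()   # count, mean, std, min, quartiles, max

print(n_rows, n_cols, n_counties)
print(col_types)
print(gdp_summary["50%"])             # the 50% quantile is the median
```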

**Question 2.1:** How many rows and columns are there in this dataframe `rgdp`? Assign the number of rows to `N_rows` and the number of columns to `N_cols`. 

Hint: The first section of lab 3 can be a good reference. 

In [None]:
N_rows = ...
N_cols = ...
N_rows, N_cols

In [None]:
grader.check("q2_1")

**Question 2.2:** How many unique GeoFIPS codes are there in this dataframe `rgdp`? Assign the number of unique counties to `N_unique_geofips`. 

In [None]:
N_unique_geofips = ...
N_unique_geofips

In [None]:
grader.check("q2_2")

<!-- BEGIN QUESTION -->

**Question 2.3:** What do the values in the "Description" column represent? Are the categories in the "Description" column mutually exclusive, or are they potentially subsets of each other? Give an example to illustrate your point. 

_Type your answer here, replacing this text._

<!-- END QUESTION -->

<!-- BEGIN QUESTION -->

**Question 2.4:** What are the data types of columns `GeoFIPS`, `GeoName`, `Unit`, and `2021`? Are they integers, floats, strings, objects, or mixed types? Do you find any data types in these columns problematic? Why?

*Hint:* Look into `df.dtypes`. 

_Type your answer here, replacing this text._

<!-- END QUESTION -->

---
## Part 3: Missing Values and NaNs

The difference between the data found in many tutorials and real-world data is that real-world data is rarely clean and homogeneous. In particular, many interesting datasets will have some amount of missing data. To complicate matters, different data sources may report missing data in different ways.

In our dataset, there are two types of 'missing values': (D) and (NA). Let's see what they look like.

In [None]:
rgdp[rgdp["2001"] == "(D)"][:5]

In [None]:
rgdp[rgdp["2001"] == "(NA)"][:5]
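
Because these markers are stored as ordinary strings, Pandas does not treat them as missing on its own. The sketch below, on a small made-up dataframe (`toy` and `markers` are hypothetical names), shows one possible way to count how often such markers appear.

```python
import pandas as pd

# Hypothetical toy data that mimics the two marker strings
toy = pd.DataFrame({
    "GeoName": ["A", "B", "C"],
    "2001": ["100", "(D)", "250"],
    "2002": ["110", "120", "(NA)"],
})

markers = ["(D)", "(NA)"]
is_marker = toy.isin(markers)         # True where a cell holds one of the markers
per_column = is_marker.sum()          # number of flagged cells in each column
rows_flagged = is_marker.any(axis=1)  # True for rows with at least one marker

print(toy.isna().sum().sum())         # 0: the markers are not NaN to Pandas
print(per_column)
print(rows_flagged.tolist())
```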

<!-- BEGIN QUESTION -->

**Question 3.1:** Look up the [footnote](https://www.econ148.org/sp23/resources/assets/supp_materials/proj01/CAGDP9__Footnotes.html) of this dataset. What does each of these two types of missing values represent? What do you think is a good way to handle each of them? This is an open-ended question. 

_Type your answer here, replacing this text._

<!-- END QUESTION -->

**Question 3.2:** For the sake of simplicity, simply drop all rows that contain missing values (either (D) or (NA)) for this project. 

*Note:* This is not good practice; do not do this in the real world.


In [None]:
rgdp_no_nans = rgdp.copy()
...
rgdp_no_nans.head()

In [None]:
grader.check("q3_2")

---
## Part 4: Type conversions and Regex

In part 2, we noticed that the dataframe we are working with does not have the desired data types for many columns. For example, the real GDP data has some entries that are kept as strings. To convert these entries to the desired data types, the most common way is to use [`pandas.DataFrame.astype(type)`](https://pandas.pydata.org/docs/reference/api/pandas.DataFrame.astype.html). 

For example, we can convert the column "2001" from strings to integers using `astype("int64")`.

In [None]:
rgdp_no_nans["2001"]

In [None]:
rgdp_no_nans["2001"].astype("int64")

It is good practice to perform your data cleaning on a copy of the original dataset, so you can always reference the original dataset if necessary. We create this copy below.

In [None]:
rgdp_clean = rgdp_no_nans.copy()

**Question 4.1:** Write a for-loop that converts all values in the `rgdp_clean` columns ranging from `2001` to `2021`  to `int64`. 

*Hint:* Be careful when accessing the column labels: they are strings.

In [None]:
for year in ...:
    rgdp_clean[str(year)] = ...

In [None]:
grader.check("q4_1")

In [None]:
# check to see if the data types are correct now
rgdp_clean.dtypes

Now we can see that the `dtype` of each of the columns `2001` through `2021` is `int64`.  

We can also convert data to `float`, `str`, etc. by calling `astype` on the entire dataframe or on a specific series. Pandas also provides a [`pandas.to_numeric()`](https://pandas.pydata.org/docs/reference/api/pandas.to_numeric.html) function that makes it easier to convert values of different types to numeric ones. 
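
To illustrate the difference, here is a small sketch on a made-up series (the name `s` and its values are hypothetical). It assumes we want unparseable entries to become NaN, which is what `errors="coerce"` does; whether that is appropriate depends on the analysis.

```python
import pandas as pd

# Hypothetical series: two clean digit strings and one marker string
s = pd.Series(["100", "250", "(D)"])

# astype works when every entry is a clean digit string
as_int = s.iloc[:2].astype("int64")

# to_numeric with errors="coerce" turns entries it cannot parse into NaN,
# so the result is a float column rather than an integer column
coerced = pd.to_numeric(s, errors="coerce")

print(as_int.dtype, coerced.dtype)
print(coerced.isna().tolist())
```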

In [None]:
# convert LineCode to int64
rgdp_clean["LineCode"] = rgdp_clean["LineCode"].astype("int64")

### Regular Expressions and Strings

But sometimes the data entries require some manipulation before they can be converted to the desired data types. For example, entries in the `GeoFIPS` column in our dataset have the following form. 

In [None]:
rgdp_no_nans["GeoFIPS"]

Note that the quotation marks are part of the data, so a simple conversion like `astype(int)` will fail.

In [None]:
# this will produce an error
# rgdp_no_nans["GeoFIPS"].astype("int64")

To extract relevant information, we will use regular expressions. A regular expression is a sequence of characters that specifies a search pattern in text. Usually such patterns are used by string-searching algorithms for "find" or "find and replace" operations on strings, or for input validation.

For example, we can extract student ID among a bunch of other texts. 

In [None]:
some_text = "Name: Oski, Age: 999, SID: 12345678"
re.findall(r"SID: (\d*)", some_text) 

Or we can replace some text we want. 

In [None]:
some_other_text = "Stanford is the No.1 university in California. "
re.sub("Stanford", "UC Berkeley", some_other_text)

Note that a common way to get rid of texts in a specified pattern is to use [`re.sub`](https://docs.python.org/3/library/re.html#re.sub) and replace the pattern with the empty strings. For example: 

In [None]:
some_messy_text = "UoskiC Beroskikeleoskiy oskiis oskitheoski No.1oski univoskiersioskity ioskin Calioskifooskirnia."
re.sub("oski", "", some_messy_text) # substitute with the empty string

Regular expressions are a deep topic, and using them well takes practice. A well-known website for testing whether your regular expression works is [regex101](https://regex101.com/). Regular expressions are a very helpful skill for data cleaning. But for now, we will just use them to get rid of the quotation marks in the entries of the `GeoFIPS` column. 

**Question 4.2:** Remove all instances of quotation marks from `too_many_quotation_marks` using regular expressions. 

In [None]:
too_many_quotation_marks = 'U"C B"er"kel"ey i"s t"he" No".1 u"niv"ersity i"n C"ali"fo"rn"i"a.'
no_quotation_marks = ...
no_quotation_marks

In [None]:
grader.check("q4_2")

To apply regex and many other string methods to a column of a dataframe, we can use the `pandas.Series.str` accessor and call a string function on it. In our case, we will use [`pandas.Series.str.replace`](https://pandas.pydata.org/docs/reference/api/pandas.Series.str.replace.html), which replaces each occurrence of a pattern/regex in the Series/Index. 
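
As a sketch of how this works on a different problem, the example below strips dollar signs and commas from a made-up series of price strings (`prices` is a hypothetical name). Passing `regex=True` explicitly avoids relying on the default, which has differed between pandas versions.

```python
import pandas as pd

# Hypothetical price strings, unrelated to the GDP dataset
prices = pd.Series(["$1,200", "$35", "$4,000,000"])

# The character class [$,] matches either a dollar sign or a comma
cleaned = prices.str.replace(r"[$,]", "", regex=True)
as_int = cleaned.astype("int64")

print(as_int.tolist())
```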


**Question 4.3:** Write code below that first deletes all the quotation marks from the values in the `GeoFIPS` column of `rgdp_clean` with [`pandas.Series.str.replace`](https://pandas.pydata.org/docs/reference/api/pandas.Series.str.replace.html) and then converts the resulting strings into integers using `astype('int64')`.

> **Note:** While we are converting `GeoFIPS` to integers here for convenience and to practice data manipulation techniques, it is important to note that `GeoFIPS` codes are typically handled as **strings** in real-world applications. This is because many FIPS codes contain leading zeros (e.g., `"01"` for Alabama), which are essential for accurate identification. Converting them to integers would remove these leading zeros, potentially causing issues when merging datasets or performing lookups.
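
The effect described in the note can be seen in a short sketch with a few made-up five-digit codes (the series `fips` is hypothetical). It assumes every code should be padded back to five digits, which holds for county-level FIPS codes.

```python
import pandas as pd

# Hypothetical five-digit codes stored as strings
fips = pd.Series(["01001", "06037", "36061"])

# Converting to integers silently drops the leading zeros
as_int = fips.astype("int64")

# Converting back to strings and left-padding with zeros restores them
restored = as_int.astype(str).str.zfill(5)

print(as_int.tolist())
print(restored.tolist())
```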

In [None]:
rgdp_clean["GeoFIPS"] = ...
rgdp_clean

In [None]:
grader.check("q4_3")

In [None]:
rgdp_clean.dtypes

Now everything should be the correct data type.

---
## Part 5: Pivot tables and melt

You should be familiar with pivot tables from Data 8; feel free to review them [here](https://www.data8.org/interactive_table_functions/) if you like. Implementing pivot tables in pandas is fairly similar to implementing them in the datascience package; see the documentation [here](https://pandas.pydata.org/docs/reference/api/pandas.pivot_table.html). Looking at our dataset, it seems like it's already pivoted. So, we would like to introduce you to `.melt()`, which is essentially the inverse of a pivot table.

Many economic datasets are in 'spreadsheet' formats, which have groups of columns representing the same type of information. For example, in our real GDP dataframe, columns like `2001`, `2002` simply give the real GDP values in the given year. To make our lives easier when working with the data later, we can convert this pivoted dataframe to a more traditional dataframe, where all GDP values are in just one column. We can use [`pandas.melt`](https://pandas.pydata.org/docs/reference/api/pandas.melt.html) to do this (in other languages like Stata, this is called converting from a wide to a long format).
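
A minimal sketch of the wide-to-long conversion on a made-up dataframe follows (the names `wide`, `long`, `county`, and `gdp` are hypothetical). The `var_name` and `value_name` arguments are optional; they only name the two new columns.

```python
import pandas as pd

# Hypothetical wide data: one row per county, one column per year
wide = pd.DataFrame({
    "county": ["A", "B"],
    "2001": [10, 20],
    "2002": [11, 19],
})

# Unpivot the year columns into (year, gdp) pairs, one row per county-year
long = pd.melt(wide,
               id_vars=["county"],            # columns to keep as they are
               value_vars=["2001", "2002"],   # columns to stack into rows
               var_name="year",
               value_name="gdp")

print(long)
```

Note that the rows come out grouped by year first (all counties for 2001, then all counties for 2002), and that the `year` values are still strings because they were column labels.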

**Question 5.1:** Convert the dataframe using [`pandas.melt`](https://pandas.pydata.org/docs/reference/api/pandas.melt.html) so that it contains seven columns: `GeoFIPS`, `GeoName`, `Region`, `LineCode`, `Description`, `year`, and `value`. The `value` column should contain the real GDP value for the given region, industry, and year on that row. 

The first 5 rows of what your resulting dataframe should look like have been provided below in `samp_df`. There are no hidden tests for this question.

In [None]:
sample_data = {
    'GeoFIPS': [0, 0, 0, 0, 0],
    'GeoName': ['United States', 'United States', 'United States', 'United States', 'United States'],
    'Region': [" "," ", " "," ", " "],
    'LineCode': [1.0, 2.0, 3.0, 6.0, 10.0],
    'Description': ['All industry total', 'Private industries', 'Agriculture, forestry, fishing and hunting',
                    'Mining, quarrying, and oil and gas extraction', 'Utilities'],
    'year': [2001, 2001, 2001, 2001, 2001],
    'value': [13263417000, 11452473000, 154754000, 272249000, 214832000]
}

samp_df = pd.DataFrame(sample_data)
samp_df

In [None]:
rgdp_melted = pd.melt(rgdp_clean, 
                      id_vars=[...], 
                      # id_vars should be the five columns you want to keep the same from rgdp_clean 
                      # (aka the columns you don't want to unpivot)
                      value_vars=[...]
                      # value_vars should be the columns you want to combine, or unpivot
                     ).rename(columns={"variable": "year"}) 
rgdp_melted

In [None]:
grader.check("q5_1")

Now, all the real GDP values are in just one column. 

**Question 5.2:** One issue remains: the values in the `year` column are strings. Convert the column to `int64`.

In [None]:
...
rgdp_melted["year"]

In [None]:
grader.check("q5_2")

As we are only concerned with the *county-level* GDP data, we will filter `rgdp_melted` for only the relevant data. Some entries in the dataframe represent state or national aggregates, and these entries have a GeoFIPS code ending in 000. For example, the code 00000 represents the entire US; 01000 represents the state of Alabama; 01001 represents Autauga, a county in Alabama. 
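
Once the codes are integers, "ends in 000" is the same as "is divisible by 1000". The sketch below checks this on a few made-up integer codes in plain Python (the list `codes` is hypothetical).

```python
# Hypothetical integer codes: the nation, two states, and two counties
codes = [0, 1000, 1001, 6000, 6037]

# A code ends in 000 exactly when the remainder after dividing by 1000 is zero
is_aggregate = [code % 1000 == 0 for code in codes]

print(is_aggregate)
```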

**Question 5.3:** Filter the `rgdp_melted` dataframe for rows that contain real GDP data for **only counties (not aggregates)**. Assign the filtered dataframe to `rgdp_county`. 

*Hint:* You can get the remainder of a division using the modulo operator % in Python. 

In [None]:
...
rgdp_county["Region"] = rgdp_county["Region"].astype(int) # we can finally do this!
rgdp_county

In [None]:
grader.check("q5_3")

**Question 5.4:** Now filter the `rgdp_county` dataframe for rows that contain **all industry total** real GDP data for only counties (not aggregates). Assign the filtered dataframe to `rgdp_county_allindustry`, then drop the column `Description`. So in the end `rgdp_county_allindustry` dataframe should have six columns: `GeoFIPS`, `GeoName`, `Region`, `LineCode`, `year`, and `value`. 

In [None]:
rgdp_county_allindustry = ...
rgdp_county_allindustry

In [None]:
grader.check("q5_4")

Now we have our dataframe consisting of county-level real GDP data of all industries total. 

---
## Part 6: Groupby

Groupby's are useful for aggregating data across certain categories. When we use a groupby, we essentially split the Pandas dataframe into smaller subframes (one subframe for each group) and perform aggregation functions on each subframe, outputting one dataframe with the result of the aggregation function on all subframes. Pandas offers several built-in aggregation functions, but we can also choose to define our own if we wish. 
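
The split-aggregate-combine idea can be sketched on a small made-up dataframe (the name `toy` and its values are hypothetical). The example shows one built-in aggregation and one user-defined aggregation.

```python
import pandas as pd

# Hypothetical toy data: two observations per year
toy = pd.DataFrame({
    "year": [2001, 2001, 2002, 2002],
    "value": [10.0, 30.0, 20.0, 50.0],
})

# Built-in aggregation; as_index=False keeps "year" as a regular column
yearly_mean = toy.groupby("year", as_index=False)["value"].mean()

# User-defined aggregation: the spread (max minus min) within each year
yearly_range = toy.groupby("year")["value"].agg(lambda s: s.max() - s.min())

print(yearly_mean)
print(yearly_range)
```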

**Question 6.1:** Find the annual average GDP for all industries across all US counties. Assign the result to `rgdp_county_allindustry_mean`. This dataframe should only have two columns: `year` and `value`. 

*Hint:* `pandas.DataFrame.groupby` may be helpful. 

In [None]:
rgdp_county_allindustry_mean = ...
rgdp_county_allindustry_mean.head()

In [None]:
grader.check("q6_1")

In [None]:
plt.figure(figsize=(8, 6))
plt.plot(rgdp_county_allindustry_mean["year"], rgdp_county_allindustry_mean["value"])
plt.xticks(np.arange(2001, 2022, 3))
plt.xlabel("Year")
plt.ylabel("Thousands of chained 2012 dollars")
plt.title("Mean Real GDP for All Industries across all U.S. Counties (2001-2021)");

**Question 6.2:** Repeat the same process as above, but for median GDP instead.

In [None]:
rgdp_county_allindustry_median = ...
rgdp_county_allindustry_median.head()

In [None]:
grader.check("q6_2")

In [None]:
plt.figure(figsize=(8, 6))
plt.plot(rgdp_county_allindustry_median["year"], rgdp_county_allindustry_median["value"])
plt.xticks(np.arange(2001, 2022, 3))
plt.xlabel("Year")
plt.ylabel("Thousands of chained 2012 dollars")
plt.title("Median Real GDP for All Industries in U.S. Counties (2001-2021)");

**Question 6.3:** Compare and contrast the annual mean and median real GDP for all US counties. What do they have in common? What differences do they have? Why do you think this is the case? 

---
## Part 7: Changes and percent changes

Analyzing raw changes and percent changes in economic data is central to much economic research. Pandas provides convenient methods for computing raw changes and percent changes between the rows of a dataframe. 

In this part, we will use [`pd.DataFrame.diff`](https://pandas.pydata.org/docs/reference/api/pandas.DataFrame.diff.html) and [`pd.DataFrame.pct_change`](https://pandas.pydata.org/docs/reference/api/pandas.DataFrame.pct_change.html) methods to see how median county-level real GDP has changed in each region of the U.S. during the past 20 years. We use the same regions as defined by the Bureau of Economic Analysis, as shown below. 

<img src="assets/BEA_regions_iowa_state.jpg" width="600">
<br>
<center>U.S. Bureau of Economic Analysis Regions Reference Map</center>
<center>Source: <a href="https://www.icip.iastate.edu/maps/refmaps/bea">Iowa State University</a></center>
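
To see what the two methods return, here is a small sketch on a made-up series of yearly values (the name `gdp` and its numbers are hypothetical). It uses the `Series` versions of `diff` and `pct_change`, which behave like the dataframe versions on a single column.

```python
import pandas as pd

# Hypothetical yearly values, indexed by year
gdp = pd.Series([100.0, 110.0, 99.0], index=[2001, 2002, 2003])

raw_change = gdp.diff()                   # value minus the previous row's value
pct_change = gdp.pct_change()             # (value / previous value) - 1, as a fraction
as_percent = (pct_change * 100).round(1)  # the same changes expressed in percent

print(raw_change.tolist())
print(as_percent.tolist())
```

The first row has no previous row to compare against, so both methods return NaN there.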

In [None]:
bea_regions = {1: "New England", 
               2: "Mideast", 
               3: "Great Lakes", 
               4: "Plains", 
               5: "Southeast", 
               6: "Southwest", 
               7: "Rocky Mountains", 
               8: "Far West"}

**Question 7.1:** Compute the percent changes across the years of median real GDP in the Far West region (coded as 8 in the dataset). The result should have two columns `year` and `value` (containing the percent change), and be stored in `rgdp_far_west_pct_chg`. For example, in `rgdp_far_west_pct_chg`, a row with year 2002 should have a value on the same row that corresponds to the percent change in real GDP from 2001 to 2002. `rgdp_far_west_pct_chg` should not have any rows with NaN values. 

*Hint:* Consider using [`pd.DataFrame.pct_change`](https://pandas.pydata.org/docs/reference/api/pandas.DataFrame.pct_change.html).

In [None]:
rgdp_far_west = rgdp_county_allindustry[...] # select relevant rows
rgdp_far_west_groupby = ... # aggregate data
...
rgdp_far_west_pct_chg["value"] = ... # find the percent changes
rgdp_far_west_pct_chg = ... # drop NaN values
rgdp_far_west_pct_chg

In [None]:
grader.check("q7_1")

Now, the numbers in the `value` column represent the percentage changes. However, they are expressed as fractions rather than percents (i.e. 0.01 represents 1%, not 0.01%). To express them in percents, we need to multiply by 100.

Now we will make a plot for all regions. 

<!-- BEGIN QUESTION -->

**Question 7.2:** Now, we want to find the percent changes of median real GDP in each region with `pd.DataFrame.pct_change`. So, write the code that computes the percent changes of median real GDP in each region. 

*Hint:* start by copying your code from question 7.1.

In [None]:
plt.figure(figsize=(12, 9))
for region in np.sort(rgdp_county_allindustry["Region"].unique()):
    ... # Use as many lines as you like
    rgdp_region_pct_chg = ...
    plt.plot(rgdp_region_pct_chg["year"], 
             rgdp_region_pct_chg["value"] * 100, # as percentages
             label=bea_regions[region]
            )
    
plt.xticks(np.arange(2001, 2022, 3))
plt.xlabel("Year")
plt.ylabel("Percent Change")
plt.title("Percentage Changes of Median Real GDP for All Industries across Counties in Different Regions")
plt.legend(title="Region", loc=(1.03, 0.58));

<!-- END QUESTION -->

---
## Part 8: Merge

In this section, we will combine our real GDP dataframe with another dataframe that contains the geographical data of each county, so that we can make some beautiful and informative visualizations in the next section. 

First, we will import the new dataset. 

In [None]:
county_geo = pd.read_csv("data/uscounties_geo.csv")
county_geo

**Question 8.1:** Select only `county_fips`, `lat`, `lng`, `population` columns in the `county_geo` dataframe. 

In [None]:
county_geo = ...
county_geo

In [None]:
grader.check("q8_1")

Learn more about GeoFIPS [here](https://en.wikipedia.org/wiki/FIPS_county_code). Note that the `county_fips` column in the new dataset represents the same information as the `GeoFIPS` column in our real GDP dataframe. So, these columns can serve as [foreign keys](https://www.cockroachlabs.com/blog/what-is-a-foreign-key/#:~:text=Foreign%20keys%20link%20data%20in,cross%2Dreferencing%20the%20two%20tables) for each other.
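
A minimal sketch of joining two tables whose key columns have different names is shown below, using made-up data (the dataframes `gdp` and `geo` and their values are hypothetical). It assumes an inner join, which keeps only the keys present in both tables.

```python
import pandas as pd

# Hypothetical tables that share a key under different column names
gdp = pd.DataFrame({"GeoFIPS": [1001, 1003, 6037], "value": [5, 8, 700]})
geo = pd.DataFrame({"county_fips": [1001, 6037, 9999], "lat": [32.5, 34.3, 0.0]})

# left_on / right_on name the key column in each table
merged = gdp.merge(geo, left_on="GeoFIPS", right_on="county_fips", how="inner")

print(merged)
```

Codes that appear in only one table (1003 and 9999 here) are dropped by the inner join, and both key columns are kept in the result.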

**Question 8.2:** Merge `rgdp_county_allindustry` with `county_geo` on GeoFIPS. No need to drop the duplicate GeoFIPS column. 

In [None]:
rgdp_county_allindustry_geo = ...
rgdp_county_allindustry_geo

In [None]:
grader.check("q8_2")

---
## Part 9: Visualize the regional GDP!

Now that we have cleaned up our dataset, we will compute the percent change of GDP for each county. Then it's time to use this data to show how economic performance varies across different regions! 

In [None]:
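# sort by county and year so that consecutive rows belong to the same county
rgdp_county_allindustry_geo_chg = rgdp_county_allindustry_geo.sort_values(["GeoFIPS", "year"]).copy()
# percent change from the previous year, computed separately within each county
rgdp_county_allindustry_geo_chg["value"] = rgdp_county_allindustry_geo_chg.groupby("GeoFIPS")["value"].pct_change()
# 2001 has no previous year, so its percent change is NaN
rgdp_county_allindustry_geo_chg = rgdp_county_allindustry_geo_chg[rgdp_county_allindustry_geo_chg["year"] != 2001]
rgdp_county_allindustry_geo_chg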

To eliminate some outliers (some counties have extreme changes between years), we will only work with GDP changes between the 10th and 90th percentiles. The following function uses a library called [GeoPandas](https://geopandas.org/en/stable/index.html) to plot the changes for a given year and a given industry over a map of the US. 

In [None]:
def plot_counties(data, year, industry="All Industries"):
    MIN = np.nanpercentile(data["value"], 10)
    MAX = np.nanpercentile(data["value"], 90)

    rgdp_county_year = data[data["year"] == year]
    rgdp_county_year = rgdp_county_year[(rgdp_county_year["value"] < MAX) & (rgdp_county_year["value"] > MIN)]

    geometry = gpd.points_from_xy(rgdp_county_year["lng"], rgdp_county_year["lat"])
    
    gdf = gpd.GeoDataFrame(rgdp_county_year, geometry=geometry)

    fig, ax = plt.subplots(figsize=(16, 9))
    world = gpd.read_file('assets/110m_cultural/ne_110m_admin_0_countries_lakes.shp')
    world.plot(ax=ax, color='lightgrey')  # The above 2 lines are optional, they add a world map as the background
    
    plot = gdf.plot(ax=ax, marker='o', markersize=20, alpha=0.8, column='value', 
                    legend=True, cmap="coolwarm", 
                    legend_kwds={"label": "Percent Change", "shrink": 0.5, "orientation": "horizontal", "pad": 0.1})
    
    plt.title(f"Percentage Changes of Real GDP for {industry} in U.S. Counties ({year})", size=20)
    plt.xlabel("Longitude", size=16)
    plt.ylabel("Latitude", size=16)
    plt.xlim([-170, -50])
    plt.ylim([10, 70])
    plt.tight_layout()
    plt.show()
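
The percentile filter at the top of `plot_counties` can be tried in isolation. The sketch below applies the same idea to a small made-up array (the name `changes` and its values are hypothetical).

```python
import numpy as np

# Hypothetical percent changes, with one missing value and two extreme values
changes = np.array([-0.5, -0.02, 0.0, 0.01, 0.03, 0.04, np.nan, 2.0])

# nanpercentile ignores NaN entries when computing the cutoffs
low = np.nanpercentile(changes, 10)
high = np.nanpercentile(changes, 90)

# Keep only values strictly between the cutoffs; comparisons with NaN are False
kept = changes[(changes > low) & (changes < high)]

print(low, high)
print(kept)
```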

In [None]:
plot_counties(rgdp_county_allindustry_geo_chg, 2002)

In [None]:
plot_counties(rgdp_county_allindustry_geo_chg, 2008)

In [None]:
plot_counties(rgdp_county_allindustry_geo_chg, 2020)

<!-- BEGIN QUESTION -->

**Question 9.1:** Comment on the results above. Is economic performance similar or different across regions? Do you find it surprising?

_Type your answer here, replacing this text._

<!-- END QUESTION -->

<!-- BEGIN QUESTION -->

**Question 9.2:** Let's look at the plot for 2008 that shows the regional economic performance in the midst of the Great Recession. The causes of the Great Recession include a combination of vulnerabilities that developed in the financial system, along with a series of triggering events that began with the bursting of the United States housing bubble in 2005–2012. As a sidenote, many empirical works suggest that housing crises usually accompany high levels of mortgage delinquencies (people default on their mortgage). 

Look at the [county-level change in mortgage delinquency figure](https://www.federalreserve.gov/images/bernanke20080505fig3.jpg) that was used in the speech [_Mortgage Delinquencies and Foreclosures_](https://www.federalreserve.gov/newsevents/speech/bernanke20080505a.htm), given by Ben Bernanke (the chairman of the Federal Reserve at that time) at the Columbia Business School's 32nd Annual Dinner in May 2008. What is the association between this mortgage delinquency graph and the regional real GDP graph we have above? How can this result potentially inform us about the causes of the Great Recession?

_Type your answer here, replacing this text._

<!-- END QUESTION -->

We can also make some widgets! You will get more practice with this in lab 6.

In [None]:
def plot_counties_widgets(industry, year):
    rgdp_county_industry = rgdp_county[rgdp_county["Description"] == industry].drop("Description", axis=1)
    rgdp_county_industry_geo = rgdp_county_industry.merge(county_geo, left_on="GeoFIPS", right_on="county_fips")
    # sort by county and year so that consecutive rows belong to the same county
    rgdp_county_industry_geo_chg = rgdp_county_industry_geo.sort_values(["GeoFIPS", "year"]).copy()
    
    # percent change from the previous year, computed separately within each county
    rgdp_county_industry_geo_chg["value"] = rgdp_county_industry_geo_chg.groupby("GeoFIPS")["value"].pct_change()
    # 2001 has no previous year, so its percent change is NaN
    rgdp_county_industry_geo_chg = rgdp_county_industry_geo_chg[rgdp_county_industry_geo_chg["year"] != 2001]
    
    plot_counties(rgdp_county_industry_geo_chg, year, re.sub("  ", "", industry))

In [None]:
i = widgets.Dropdown(options=rgdp_county["Description"].unique(),
                     value="All industry total", 
                     description="Industry", 
                     layout={'width': 'max-content'})

t = widgets.IntSlider(min=2002, max=2020, step=1, 
                      description="Year", 
                      layout={'width': '300px'})

interact(plot_counties_widgets, industry=i, year=t);

**Congratulations!** You're done with Econ 148 Project 1!

## Submission

Make sure you have run all cells in your notebook in order before running the cell below, so that all images/graphs appear in the output. The cell below will generate a zip file for you to submit. **Please save before exporting!**

In [None]:
# Save your notebook first, then run this cell to export your submission.
grader.export(run_tests=True)