# Problem set: `pandas` practice

This problem set gives you an opportunity to practice some of what you learned in the introductory `pandas` lectures.

In [None]:
import pandas as pd

%matplotlib inline

**Data**

We load the unemployment data that we were using in class.

In [None]:
## Load up the data -- this will take a couple seconds
url = "https://datascience.quantecon.org/assets/data/state_unemployment.csv"
unemp_raw = pd.read_csv(url, parse_dates=["Date"])

unemp_all = (
    unemp_raw
    .reset_index()
    .pivot_table(index="Date", columns="state", values="UnemploymentRate")
)

laborforce_all = (
    unemp_raw
    .reset_index()
    .pivot_table(index="Date", columns="state", values="LaborForce")
)

states = [
    "Arizona", "California", "Florida", "Illinois",
    "Michigan", "New York", "Texas"
]

unemp = unemp_all[states]
print(unemp.head())

laborforce = laborforce_all[states]
print(laborforce.head())

### Problem 1:

Imagine that we want to determine whether unemployment was high (> 6.5),
medium (4.5 < x <= 6.5), or low (<= 4.5) for each state and each month.

1. Write a Python function that takes a single number as an input and outputs a single string noting if that number is high, medium, or low.  
2. Pass your function to the `map` method of `unemp` (named `applymap` in pandas versions before 2.1) and save the result in a new DataFrame called `unemp_bins`.
3. Now use another transform on `unemp_bins` to count how many times each state had each of the three classifications.
  - Hint 1: Will this value counting function be a Series or scalar transform?
  - Hint 2: Google or ask an LLM about "counting unique values in pandas"
4. Build a horizontal bar chart that shows the number of occurrences of each unemployment level, with one bar per state and classification. This should give you 21 possible bars.
5. Repeat step 3, but count how many states had each classification in each month. Which month had the most states with high unemployment? What about medium and low?
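
As a rough illustration of the mechanics in steps 1 to 3 and 5, here is a minimal sketch on a small made-up DataFrame. The names `toy`, `classify`, `toy_bins`, `counts_by_state`, and `counts_by_month` are hypothetical and the numbers are invented, so treat it as one possible approach rather than the intended solution.

```python
import pandas as pd

# Hypothetical toy data: two made-up "states" over four months
toy = pd.DataFrame(
    {"A": [3.0, 5.0, 7.0, 8.0], "B": [4.5, 6.5, 6.6, 4.0]},
    index=pd.date_range("2000-01-01", periods=4, freq="MS"),
)


def classify(rate):
    """Scalar transform: label one unemployment rate as high, medium, or low."""
    if rate > 6.5:
        return "high"
    elif rate > 4.5:
        return "medium"
    return "low"


# DataFrame.map exists in pandas >= 2.1; older versions call it applymap
elementwise = toy.map if hasattr(toy, "map") else toy.applymap
toy_bins = elementwise(classify)

# Series transform applied to each column: counts per state
counts_by_state = toy_bins.apply(pd.Series.value_counts).fillna(0).astype(int)

# Same transform applied to each row: counts per month
counts_by_month = (
    toy_bins.apply(pd.Series.value_counts, axis=1).fillna(0).astype(int)
)

print(counts_by_state)
print(counts_by_month)
```

Counting per column versus per row only differs in the `axis` argument to `apply`. From a table like `counts_by_state`, something like `counts_by_state.plot.barh()` is one way to get a horizontal bar chart, and `idxmax` on a column of `counts_by_month` returns the first month with the largest count.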

### Problem 2:

The "Great Recession" is typically defined as lasting from December 2007 to June 2009. This problem focuses on how different states were impacted during this specific window.

1. Create a new DataFrame named `recession_unemp` that contains only the rows from `unemp` between December 2007 and June 2009.
2. For each state, calculate the total *increase* in the unemployment rate by subtracting the rate in December 2007 from the rate in June 2009.
3. Identify the state that experienced the largest increase in its unemployment rate during this period. How does this compare to the state with the smallest increase?
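
The date-window selection and the end-minus-start comparison can be sketched as follows on invented data. The DataFrame `toy` and the names `window` and `increase` are hypothetical; the sketch assumes a sorted `DatetimeIndex`, as `unemp` has after the pivot.

```python
import pandas as pd

# Hypothetical toy data with a monthly-style DatetimeIndex
toy = pd.DataFrame(
    {"A": [4.0, 4.5, 6.0, 9.0, 8.5], "B": [5.0, 5.0, 5.5, 7.0, 7.5]},
    index=pd.to_datetime(
        ["2007-11-01", "2007-12-01", "2008-06-01", "2009-06-01", "2009-07-01"]
    ),
)

# Partial-string slicing on a DatetimeIndex includes both endpoint months
window = toy.loc["2007-12":"2009-06"]

# Last row of the window minus the first row, computed column by column
increase = window.iloc[-1] - window.iloc[0]

print(increase)
print("largest increase:", increase.idxmax())
print("smallest increase:", increase.idxmin())
```

Subtracting two rows gives a Series indexed by the column labels, so `idxmax` and `idxmin` return column names directly.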

### Problem 3: Calculating the Number of Unemployed (Series Operations)

We have created a second DataFrame called `laborforce` that contains the total labor force (the `LaborForce` column of the raw data) for each state.

1. Try multiplying `laborforce` and `unemp`. What do you think happens? How would you interpret these numbers?
2. Calculate the "National Total" of unemployed people for each month by summing across all states. Plot this national total over time.
3. What was the national labor force at the beginning of the recession (December 2007)? What was it at the end of the recession (June 2009)? Why do you think it changed in this way?
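
To see how arithmetic between two DataFrames behaves, the sketch below multiplies two small made-up tables. The names `toy_rate`, `toy_lf`, `toy_unemployed`, and `toy_total` are hypothetical. It also assumes the rate is stored in percent, which is why the product is divided by 100; check whether that holds for the real data.

```python
import pandas as pd

idx = pd.to_datetime(["2008-01-01", "2008-02-01"])

# Hypothetical unemployment rates, assumed to be in percent
toy_rate = pd.DataFrame({"A": [5.0, 10.0], "B": [4.0, 8.0]}, index=idx)

# Hypothetical labor force counts; columns deliberately in a different order
toy_lf = pd.DataFrame({"B": [2000.0, 2000.0], "A": [1000.0, 1000.0]}, index=idx)

# pandas aligns on both row and column labels before multiplying,
# so column order does not matter
toy_unemployed = toy_lf * toy_rate / 100

# Sum across columns (axis=1) to get one total per month
toy_total = toy_unemployed.sum(axis=1)

print(toy_unemployed)
print(toy_total)
```

Because alignment is by label rather than position, each state's labor force is matched with that same state's rate even though the columns are in a different order.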

### Problem 4: Relative Performance (Vectorized Logic and Counts)

In this problem, you will compare individual state performance against a "national average" to see which states consistently stayed below the mean.

1. Calculate the **unweighted mean** unemployment rate across our sample states for every month in your `unemp` DataFrame.
2. Create a boolean DataFrame that is `True` if a state's unemployment rate was *lower* than the unweighted mean for that month and `False` otherwise.
3. For the year 2015, determine which state spent the most months "outperforming" the national average (i.e., having a rate lower than the mean).
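
The row-wise mean, the comparison against it, and the count of `True` values within one year can be sketched on invented numbers as below. The names `toy`, `monthly_mean`, `below_mean`, and `months_below_2015` are hypothetical, and this is only one of several ways to structure the comparison.

```python
import pandas as pd

# Hypothetical toy data: three made-up "states", one month in 2014, three in 2015
toy = pd.DataFrame(
    {
        "A": [9.0, 4.0, 4.0, 5.0],
        "B": [3.0, 6.5, 5.0, 6.5],
        "C": [3.0, 7.5, 9.0, 6.5],
    },
    index=pd.to_datetime(["2014-12-01", "2015-01-01", "2015-02-01", "2015-03-01"]),
)

# Unweighted mean across columns: one value per month
monthly_mean = toy.mean(axis=1)

# Compare every column with the monthly mean; axis=0 aligns the Series on the rows
below_mean = toy.lt(monthly_mean, axis=0)

# Restrict to 2015 with partial-string indexing; summing booleans counts the Trues
months_below_2015 = below_mean.loc["2015"].sum()

print(months_below_2015)
print("most months below the mean:", months_below_2015.idxmax())
```

The `axis=0` argument matters here: a plain `toy < monthly_mean` would try to match the Series index against the column labels instead of the dates.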