# Week 1 — Python + Plots + Getting Ready for Your Economic Project

This notebook starts from **zero Python**. You will learn the minimum syntax you need, practice making plots, and then switch to your real dataset for project setup + EDA.


## How this notebook works

Each time we introduce a new idea, you will see:

1. **What it is** (plain English)
2. **The syntax** (what to type)
3. **A tiny example** (run it)
4. **Your turn** (a short TODO)

## Success Tips

- Every TODO code cell: add **comments** explaining what the code does.
- Every plot: write **2–4 sentences** answering: *What do you notice? What might cause it?*


In [None]:
import pandas as pd
import numpy as np
import matplotlib.pyplot as plt

plt.rcParams["figure.figsize"] = (10, 4)

# If you see errors about missing packages in Colab:
# !pip -q install pandas numpy matplotlib

## 1) Variables (names for values)

### Syntax
```python
x = 5
name = "GDP"
```
- `=` means **assign** (store a value).
- `type(x)` tells you what kind of value it is.

### Example

In [None]:
x = 5
name = "GDP"
print(x, type(x))
print(name, type(name))

# Notice:
# - int is an integer (whole number)
# - str is a string (text)

### Your turn (Required)
Create a variable called `state` with a string value (example: `"MI"`). Create a variable called `start_year` with an integer value.

Write a `print(...)` line that prints both.


In [None]:
# TODO: create variables
# state = ...
# start_year = ...

# TODO: print them
# print(...)

## 2) Strings and f-strings (easy text formatting)

### Syntax
```python
msg = f"State: {state}, Start: {start_year}"
```
- Put an `f` before the quotes.
- Anything inside `{}` is replaced by a variable value.

### Example

In [None]:
state_demo = "CA"
start_demo = 1995
msg = f"State: {state_demo}, Start year: {start_demo}"
print(msg)
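f-strings can also control how numbers are displayed. This helps later, when you print statistics such as correlations. The sketch below uses two standard format specifiers: `:.2f` (two decimal places) and `:,.1f` (one decimal place plus a thousands separator). The variable names and values are made up for illustration.

```python
gdp_demo = 21433.2254   # made-up value for illustration
corr_demo = 0.87654     # made-up correlation

# ":,.1f" -> thousands separator + 1 decimal; ":.2f" -> 2 decimals
line1 = f"GDP (billions): {gdp_demo:,.1f}"
line2 = f"Correlation: {corr_demo:.2f}"
print(line1)  # GDP (billions): 21,433.2
print(line2)  # Correlation: 0.88
```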

### Your turn (Required)
Make an f-string that uses your `state` and `start_year` variables and prints a sentence.


In [None]:
# TODO: write an f-string sentence using state and start_year
# sentence = f"..."
# print(sentence)

## 3) Lists (ordered collections)

### Syntax
```python
items = ["GDP", "CPI", "UNRATE"]
items[0]  # first item (index 0)
```
- Lists use square brackets `[]`.
- Indexing starts at **0**.

### Example

In [None]:
series = ["GDP", "CPIAUCSL", "UNRATE"]
print(series)
print("First series id:", series[0])
print("Last series id:", series[-1])  # -1 means last

### Your turn (Required)
Create a list called `my_series` with 3 items (strings). Print the second item.


In [None]:
# TODO: create my_series list with 3 strings
# my_series = [...]

# TODO: print the second item (index 1)
# print(...)

## 4) Dictionaries (named fields: key → value)

### Syntax
```python
info = {"state": "MI", "target": "UNRATE"}
info["target"]
```
- Dictionaries use curly braces `{}`.
- You look up values using square brackets with a key.

### Example

In [None]:
info = {"state": "MI", "target": "UNRATE", "start": 2000}
print(info)
print("Target is:", info["target"])

### Your turn (Required)
Make a dictionary called `project` with keys: `state`, `target`, `start_year`.
Print the value for `target`.


In [None]:
# TODO: create the dictionary
# project = {...}

# TODO: print project["target"]
# print(...)

## 5) If statements (make decisions)

### Syntax
```python
if x > 0:
    print("positive")
else:
    print("not positive")
```
- The colon `:` starts a block.
- **Indentation** (spaces) shows which lines belong inside.

### Example

In [None]:
rate = 4.2
if rate > 5:
    print("High unemployment")
else:
    print("Not high unemployment")
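When there are more than two cases, `elif` ("else if") adds extra conditions. Python checks them in order, runs the first branch that is true, and skips the rest. Here is a minimal sketch. The 4% and 6% cutoffs are arbitrary and were chosen only for illustration.

```python
def describe_rate(rate):
    """Label an unemployment rate using made-up cutoffs."""
    if rate > 6:
        return "High unemployment"
    elif rate > 4:      # only checked if rate > 6 was False
        return "Moderate unemployment"
    else:               # runs when every condition above was False
        return "Low unemployment"

print(describe_rate(3.5))  # Low unemployment
print(describe_rate(4.2))  # Moderate unemployment
print(describe_rate(7.0))  # High unemployment
```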

## 6) For loops (repeat over items)

### Syntax
```python
for item in my_list:
    print(item)
```
- Runs once per item.

### Example

In [None]:
for s in ["GDP", "CPI", "UNRATE"]:
    print("Series id:", s)
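Loops also work over dictionaries. `.items()` gives you each key and its value together. This is handy later, when the notebook stores one result per series (for example, the `freqs` dictionary of inferred frequencies). A small sketch with illustrative labels:

```python
units = {"UNRATE": "percent", "CPIAUCSL": "index", "FEDFUNDS": "percent"}  # illustrative labels

count = 0
for series_id, unit in units.items():   # each pass gives one (key, value) pair
    print(f"{series_id} is measured as: {unit}")
    count = count + 1                   # keep track of how many pairs we looped over

print("Pairs seen:", count)
```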

### Your turn (Required)
Loop over your `my_series` list and print each item with a label.


In [None]:
# TODO: loop over my_series
# for ... in ...:
#     print(...)
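The FRED helpers later in this notebook use a compact loop called a *list comprehension*: `[expression for item in a_list if condition]`. It builds a new list in a single line. The two versions below produce the same result. The series IDs are only examples.

```python
ids = ["UNRATE", "CPIAUCSL", "FEDFUNDS"]
target = "CPIAUCSL"

# Long form: start with an empty list and add matching items in a loop
predictors_loop = []
for s in ids:
    if s != target:
        predictors_loop.append(s)

# List comprehension: the same logic in one line
predictors_comp = [s for s in ids if s != target]

print(predictors_loop)  # ['UNRATE', 'FEDFUNDS']
print(predictors_comp)  # ['UNRATE', 'FEDFUNDS']
```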

## 7) Functions (reusable code blocks)

Why functions matter: in the final project, you will repeat the same steps (clean, plot, summarize). Functions prevent copy/paste.

### Syntax
```python
def add_one(x):
    return x + 1
```
- `def` starts a function.
- Indentation matters.
- `return` gives back an output.

### Example

In [None]:
def percent_change(new, old):
    """Return percent change from old to new."""
    return 100 * (new - old) / old

print(percent_change(110, 100))  # expect 10.0
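Parameters can have *default values*, which callers are allowed to leave out. The FRED helpers later in the notebook rely on this (for example, `start="1990-01-01"`). The minimal sketch below uses a hypothetical `rounded_percent_change` function:

```python
def rounded_percent_change(new, old, decimals=1):
    """Percent change from old to new, rounded; `decimals` defaults to 1."""
    pct = 100 * (new - old) / old
    return round(pct, decimals)

print(rounded_percent_change(107, 99))              # uses the default: 8.1
print(rounded_percent_change(107, 99, decimals=3))  # overrides it: 8.081
```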

### Your turn (Required)
Write a function `safe_ratio(a, b)` that returns `a/b` **unless** `b` is 0. If `b` is 0, return `np.nan`.

Test it with two examples.


In [None]:
# TODO: write safe_ratio(a, b)
# def safe_ratio(a, b):
#     ...

# TODO: test it (two prints)
# print(...)

## 8) DataFrames (tables) with pandas

A **DataFrame** is like an Excel table.

### Syntax you will use constantly
```python
df.head()
df["column"]
df[["col1", "col2"]]
```

### Example (make a tiny fake dataset)

In [None]:
# Create a small table with years and a made-up value
sample = pd.DataFrame({
    "year": [2018, 2019, 2020, 2021, 2022],
    "value": [100, 105, 98, 110, 115],
    "category": ["A", "A", "A", "A", "A"],
})

print(sample.head())
print("Columns:", list(sample.columns))
print("Just the value column:")
print(sample["value"])

### Your turn (Required)
1) Print the first 3 rows using `head(3)`.
2) Print only the `year` and `value` columns.


In [None]:
# TODO: sample.head(3)
# TODO: sample[["year", "value"]]

## 9) Plotting with matplotlib (you will reuse these in your final report)

We will practice several common plot types. After each plot, you will write a short interpretation.

### General syntax pattern
```python
plt.figure()
plt.plot(x, y)
plt.title("...")
plt.xlabel("...")
plt.ylabel("...")
plt.show()
```
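Some later cells use matplotlib's object-oriented style instead, including the figure-saving template at the end of the notebook. In that style, `plt.subplots()` returns a figure `fig` and an axes `ax`. It does the same job as the `plt.` pattern above, but the methods have slightly different names: `ax.set_title` instead of `plt.title`, and so on. A minimal sketch using the toy numbers from `sample`:

```python
import matplotlib.pyplot as plt

years = [2018, 2019, 2020, 2021, 2022]
values = [100, 105, 98, 110, 115]   # same toy numbers as the `sample` table

fig, ax = plt.subplots(figsize=(10, 4))   # one figure containing one set of axes
ax.plot(years, values, marker="o")        # draw on this specific axes
ax.set_title("Toy trend (object-oriented style)")
ax.set_xlabel("Year")
ax.set_ylabel("Value")
fig.tight_layout()                         # tidy spacing around the labels
plt.show()
```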

### A) Line plot (trend over time)
Line plots are great for time series.

In [None]:
# Line plot: year vs value
plt.figure()
plt.plot(sample["year"], sample["value"], marker="o")
plt.title("Sample trend over time")
plt.xlabel("Year")
plt.ylabel("Value")
plt.grid(True)
plt.show()

### B) Bar plot (compare categories)
Bar plots compare a small number of categories.

### Syntax
```python
plt.bar(categories, heights)
```

In [None]:
# Make a slightly richer example with 2 categories
sample2 = pd.DataFrame({
    "category": ["A", "B", "A", "B", "A", "B"],
    "value": [10, 12, 9, 15, 11, 14]
})
means = sample2.groupby("category")["value"].mean()

plt.figure()
plt.bar(means.index, means.values)
plt.title("Average value by category")
plt.xlabel("Category")
plt.ylabel("Average value")
plt.show()

### C) Scatter plot (relationship between two variables)
Scatter plots are used to see whether two numeric variables move together.

### Syntax
```python
plt.scatter(x, y)
```

In [None]:
# Create x and y with some relationship
np.random.seed(0)
x = np.arange(30)
y = 2*x + np.random.normal(0, 10, size=len(x))

plt.figure()
plt.scatter(x, y)
plt.title("Scatter example: x vs y")
plt.xlabel("x")
plt.ylabel("y")
plt.show()

### D) Histogram (distribution of one variable)
Histograms show the shape of values: where most observations fall.

### Syntax
```python
plt.hist(values, bins=...)
```

In [None]:
values = np.random.normal(loc=0, scale=1, size=500)

plt.figure()
plt.hist(values, bins=25)
plt.title("Histogram example")
plt.xlabel("Value")
plt.ylabel("Count")
plt.show()

### Your turn: Make your own plots (Required)
Using the toy data above, create:
1) A line plot of `sample` (year vs value) but change the title to something meaningful.
2) A histogram of `sample2["value"]` with a different number of bins.

Add comments explaining each line.


In [None]:
# TODO: create 2 plots as described above
# (1) line plot
# (2) histogram


## 10) Saving figures (so you can use them in your final report)

### Syntax
```python
import os
os.makedirs("figures", exist_ok=True)
plt.savefig("figures/my_plot.png", dpi=150, bbox_inches="tight")
```

### Example

In [None]:
import os
os.makedirs("figures", exist_ok=True)

plt.figure()
plt.plot(sample["year"], sample["value"], marker="o")
plt.title("Sample trend (saved)")
plt.xlabel("Year")
plt.ylabel("Value")
plt.grid(True)
plt.savefig("figures/week1_example_trend.png", dpi=150, bbox_inches="tight")
plt.show()

print("Saved: figures/week1_example_trend.png")

# Week 1: Project Setup + Exploratory Data Analysis (Condensed)

You just learned the absolute basics of Python and plotting.  
Now we will apply those skills to your **final project dataset**.

## What you will produce today (final-project artifacts)
- A clean loaded dataset (CSV or similar) with a brief data dictionary
- At least 3 saved figures in `figures/`
- A short written insights section you can paste into your final report


# Week 1 — Universal FRED Starter (Monthly or Quarterly)

**Goal:** Choose your own FRED series, align them by time, and create a clean dataset ready for analysis.

Works for **monthly or quarterly** data (including mixes).

> **Expanded version** (generated 2026-01-05). Added extra coding + commenting + writing tasks.


## Setup

In [None]:
# If needed (first time only):
# !pip -q install pandas_datareader scikit-learn

import pandas as pd
import numpy as np
import matplotlib.pyplot as plt

plt.rcParams["figure.figsize"] = (11, 4)  # default size for every new figure
plt.rcParams["axes.grid"] = True          # turn on grid lines in every plot by default
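`pandas_datareader` may fail to install or import in your environment, because it does not always keep up with the newest Python/pandas versions. If that happens, one alternative is FRED's public CSV download link, `https://fred.stlouisfed.org/graph/fredgraph.csv?id=SERIES_ID`.

The sketch below is a fallback that is not part of the original starter, and it rests on three assumptions:
- the CSV has the date in its first column and the values in its second;
- missing values may appear as `"."`;
- the names `parse_fred_csv` and `fetch_fred_csv` are hypothetical.

Parsing is kept separate from downloading, so you can check it on a tiny made-up CSV without internet access.

```python
import io
import pandas as pd

FRED_CSV_URL = "https://fred.stlouisfed.org/graph/fredgraph.csv?id={series_id}"

def parse_fred_csv(source, series_id, start="1990-01-01"):
    """Read a FRED-style CSV (date column, value column) into a one-column DataFrame."""
    df = pd.read_csv(source, index_col=0, parse_dates=True, na_values=".")
    df.columns = [series_id]                                        # name the value column after the series
    df[series_id] = pd.to_numeric(df[series_id], errors="coerce")   # any leftover text becomes NaN
    df = df.sort_index()                                            # slicing by date needs a sorted index
    return df.loc[pd.Timestamp(start):]

def fetch_fred_csv(series_id, start="1990-01-01"):
    """Download one series from FRED's CSV link (needs internet access)."""
    return parse_fred_csv(FRED_CSV_URL.format(series_id=series_id), series_id, start=start)

# Offline check with a tiny made-up CSV (not real FRED data)
demo_csv = io.StringIO("observation_date,DEMO\n1989-12-01,1.0\n1990-01-01,2.0\n1990-02-01,.\n")
demo = parse_fred_csv(demo_csv, "DEMO")
print(demo)
```

To use it, you could replace the body of `fetch_fred_series` in the helper cell below with `return fetch_fred_csv(series_id, start=start)`. Note that this sketch ignores the `end` argument.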

## FRED helpers (do not edit)

In [None]:
# Explanation:
# - Goal: Run and understand the steps below
# - What you should check after running:
#   1) The output has the expected shape/columns
#   2) The values look reasonable (no obvious NaNs or impossible values)
#   3) Any figures have clear titles/labels and are saved to disk when required
#
# How to read this code:
# - Imports / configuration come first
# - Then we compute intermediate variables (feature engineering)
# - Then we summarize / visualize
# - Finally, we write a short interpretation in Markdown below the figure/table

from pandas_datareader import data as pdr

def fetch_fred_series(series_id: str, start="1990-01-01", end=None) -> pd.DataFrame:
    """Fetch one FRED series as a DataFrame with a datetime index."""
    if end is None:
        end = pd.Timestamp.today().strftime("%Y-%m-%d")
    s = pdr.DataReader(series_id, "fred", start, end)
    s.columns = [series_id]
    s.index = pd.to_datetime(s.index)  # Ensure date/time column is parsed correctly
    return s

def fetch_many(series_ids, start="1990-01-01"):
    dfs = [fetch_fred_series(s, start=start) for s in series_ids]
    return pd.concat(dfs, axis=1).sort_index()

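def infer_freq(index: pd.DatetimeIndex) -> str:
    """Map pandas' inferred frequency code to "M" (monthly), "Q" (quarterly), or "U" (unknown)."""
    f = pd.infer_freq(index)
    if f is None:
        return "U"
    # Look only at the part before any anchor, e.g. "QS-OCT" -> "QS", "W-MON" -> "W",
    # so weekly codes such as "W-MON" are not mistaken for monthly.
    base = f.upper().split("-")[0]
    if "Q" in base:
        return "Q"
    if "M" in base:
        return "M"
    return "U"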

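def to_period_end(df: pd.DataFrame, target: str) -> pd.DataFrame:
    """Resample to month-end ("M") or quarter-end ("Q"), keeping the last observation in each period."""
    # pandas >= 2.2 spells these "ME"/"QE"; older versions only accept "M"/"Q".
    aliases = {"M": ("ME", "M"), "Q": ("QE", "Q")}
    if target not in aliases:
        raise ValueError("target must be 'M' or 'Q'")
    new_alias, old_alias = aliases[target]
    try:
        return df.resample(new_alias).last()
    except ValueError:
        return df.resample(old_alias).last()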

def add_common_features(df: pd.DataFrame) -> pd.DataFrame:
    out = df.copy()
    for c in out.columns:
        out[f"{c}_lag1"] = out[c].shift(1)
        out[f"{c}_diff1"] = out[c].diff(1)
        out[f"{c}_pct1"] = out[c].pct_change(1) * 100
        out[f"{c}_roll3"] = out[c].rolling(3).mean()
    return out
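The example below shows what `add_common_features` creates for each column. It uses a made-up 5-value series, not FRED data. Each new column comes from a standard pandas method:
- `shift(1)`: the previous value;
- `diff(1)`: the change from the previous value;
- `pct_change(1) * 100`: the percent change;
- `rolling(3).mean()`: the average of the current value and the two before it.

The first rows are `NaN` because there is no earlier data to use.

```python
import pandas as pd

toy_feat = pd.DataFrame({"X": [100.0, 110.0, 99.0, 121.0, 132.0]},
                        index=pd.date_range("2020-01-01", periods=5, freq="MS"))

feat = toy_feat.copy()
feat["X_lag1"] = feat["X"].shift(1)              # value one period earlier
feat["X_diff1"] = feat["X"].diff(1)              # X minus the previous X
feat["X_pct1"] = feat["X"].pct_change(1) * 100   # percent change vs the previous X
feat["X_roll3"] = feat["X"].rolling(3).mean()    # mean of the last 3 values (incl. current)
print(feat.round(2))
```

Calling `add_common_features(toy_feat)` should produce these same four columns.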

## Choose your dataset (edit series IDs + target)

In [None]:
# STUDENT CHOICE (EDIT HERE)
# Choose 3–6 FRED series IDs relevant to your question.
# Search on https://fred.stlouisfed.org and copy the series ID.

series_ids = [
    "UNRATE",
    "CPIAUCSL",
    "FEDFUNDS"
]

# Choose your target variable (must be one of the series_ids)
target_id = "CPIAUCSL"

start_date = "1990-01-01"

## Fetch + infer frequency

In [None]:
df_raw = fetch_many(series_ids, start=start_date)

# Infer each column's native frequency
freqs = {c: infer_freq(df_raw[c].dropna().index) for c in df_raw.columns}  # dropna() keeps only the dates where that series was observed
freqs
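`infer_freq` relies on pandas' `pd.infer_freq`, which returns a frequency code: for example `"MS"` (month start) or a quarterly code beginning with `"QS"`. It returns `None` when the dates are irregular. The notebook's `infer_freq` helper then turns such codes into `"M"`, `"Q"`, or `"U"` (unknown).

Below is a small offline check on synthetic date ranges. They are dated on the first day of each period, which is how FRED typically dates monthly and quarterly series.

```python
import pandas as pd

monthly = pd.date_range("2000-01-01", periods=12, freq="MS")   # first day of each month
quarterly = pd.date_range("2000-01-01", periods=8, freq="QS")  # first day of each quarter
irregular = pd.DatetimeIndex(["2000-01-01", "2000-01-05", "2000-03-17"])

monthly_code = pd.infer_freq(monthly)
quarterly_code = pd.infer_freq(quarterly)
irregular_code = pd.infer_freq(irregular)   # irregular spacing -> None

print("monthly:", monthly_code)
print("quarterly:", quarterly_code)
print("irregular:", irregular_code)
```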

## Standardize to common frequency

In [None]:
# Rule: if any series is quarterly, use quarterly for everything (safe when mixing).
use_freq = "Q" if any(v == "Q" for v in freqs.values()) else "M"
print("Using frequency:", use_freq)

df = to_period_end(df_raw, use_freq)

# drop rows where target is missing
df = df.dropna(subset=[target_id])  # Handle missing values

# Missing-value strategies:
df_complete = df.dropna()   # simplest: keep only rows where every series is observed
df_ffill = df.ffill()       # common for time series: carry the last observed value forward

df_use = df_complete   # or df_ffill
df_use.head()
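The two missing-value strategies above behave differently, as this tiny made-up table shows (`A` and `B` are placeholder column names):
- `dropna()` removes every row that has any `NaN`.
- `ffill()` copies the last observed value forward. Gaps get filled, but a leading `NaN` stays, because there is nothing earlier to copy.

Forward fill is convenient, but it assumes the last value still holds, which may not be true across long gaps.

```python
import numpy as np
import pandas as pd

toy_gaps = pd.DataFrame(
    {"A": [1.0, 2.0, np.nan, 4.0], "B": [np.nan, 10.0, 11.0, np.nan]},
    index=pd.date_range("2020-01-01", periods=4, freq="MS"),
)

missing = toy_gaps.isna().sum()   # count of NaNs per column
complete = toy_gaps.dropna()      # keeps only rows with no missing values
filled = toy_gaps.ffill()         # fills each gap with the last observed value in that column

print(missing)
print(complete)
print(filled)
```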

## EDA: missingness + summary stats

In [None]:
def missing_report(d):
    return d.isna().sum().sort_values(ascending=False)  # Handle missing values

print(missing_report(df_use))   # number of missing values per column
df_use.describe().T             # summary statistics, one row per column

# TODO: Add 2–3 bullets interpreting missingness and ranges (write in markdown below)

## Plot trends over time

### Extra Visualization Practice (Required)
Create **two additional plots** that tell a *different story* than the plain trend plot.

Pick **two** from the list (or propose your own):
1. Percent change (growth rate) over time  
2. Rolling mean (e.g., 6- or 12-month smoothing)  
3. Compare two series after standardizing (z-score)  
4. Highlight recession periods (if you know how; optional)  

**Deliverable:** each plot must have:
- a title that explains what we’re seeing
- axis labels
- 2–3 bullet points interpreting the plot


In [None]:

# TODO (STUDENTS):
# Plot option 1: Percent change (growth rate)
# - Choose ONE numeric column and create a new column with percent change.
# - Plot it over time.
# - Add 2-3 bullet points in a markdown cell below explaining what changed when.

# Example:
# df_use["COL_pct_change"] = df_use["COL"].pct_change() * 100

In [None]:

# TODO (STUDENTS):
# Plot option 2: Rolling mean
# - Choose ONE numeric column.
# - Compute a rolling mean (window=6 or 12 depending on frequency).
# - Plot raw series + rolling mean on the same figure.
# - Add comments explaining each line of code.

### Interpretation Check (Required)
In 4–6 sentences, answer:
- What is the overall trend?
- Are there obvious regime changes (periods where the behavior changes)?
- Which variable seems most promising as a predictor, and why?

Make sure your answer covers (1) what you observe, (2) why it might be happening, and (3) how it affects your modeling choices.

In [None]:
df_use[series_ids].plot(subplots=True, layout=(len(series_ids), 1), sharex=True)  # one panel per series, shared date axis
plt.tight_layout()  # keep panels and labels from overlapping
plt.show()          # display the figure

# TODO: Write 2 observations about trends/spikes in markdown below

## Relationship check (levels)

### Challenge: Correlation vs Causation (Required)
Do a quick relationship check in **two ways**:
1) Correlation in levels  
2) Correlation after percent-change (or first-difference)

**Deliverable:**
- Print both correlations.
- Write 3–5 sentences: Why might these differ? Which is more appropriate for economic time series?


In [None]:

# TODO (STUDENTS):
# Compute correlation between target and ONE predictor in:
# (1) levels
# (2) changes (pct_change or diff)
#
# Print both correlations clearly.

# corr_levels = ...
# corr_changes = ...

# print("Levels correlation:", corr_levels)
# print("Changes correlation:", corr_changes)
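Why might the two correlations disagree? The sketch below builds two made-up series that share an upward trend, but whose period-to-period noise is independent by construction. In levels they look highly correlated; their first differences show little correlation. The trend slope, noise size, and seed are arbitrary choices for illustration.

```python
import numpy as np
import pandas as pd

rng = np.random.default_rng(42)   # fixed seed so the results are reproducible
t = np.arange(200)

# Both series trend upward, but their noise terms are generated independently
a = pd.Series(0.5 * t + rng.normal(0, 3, size=t.size))
b = pd.Series(0.5 * t + rng.normal(0, 3, size=t.size))

corr_levels_demo = a.corr(b)                  # correlation of the raw levels
corr_changes_demo = a.diff().corr(b.diff())   # correlation of period-to-period changes

print(f"Levels correlation:  {corr_levels_demo:.2f}")
print(f"Changes correlation: {corr_changes_demo:.2f}")
```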

In [None]:
predictor_id = [s for s in series_ids if s != target_id][0]

plt.figure()                                                      # start a new figure
plt.scatter(df_use[predictor_id], df_use[target_id], alpha=0.5)  # one dot per period; alpha makes overlapping dots visible
plt.title(f"{predictor_id} vs {target_id} (levels)")              # title names both series
plt.xlabel(predictor_id)                                          # predictor on the x-axis
plt.ylabel(target_id)                                             # target on the y-axis
plt.show()                                                        # display the figure

# TODO: Try at least 2 predictors by changing predictor_id

## Draft 2–3 project questions

Template: **Can X help predict Y (and why might that make sense)?**

### Writing Task: Make Your Questions Testable (Required)
Rewrite your 2–3 questions so that each one:
- names a specific target variable
- names at least one predictor variable
- says what metric you’ll use (correlation, regression coefficient, forecast error, etc.)

**Example format:**  
> "Does X help predict Y (measured by ___) during ___ years?"


**Your questions:**

1.
2.
3.

## Checkpoint

You have `df_use` aligned at monthly or quarterly frequency, plus EDA, plots, and questions.

## Final Project Artifacts (Save these for your report)

By the end of the project, you should have:
- At least 2 polished figures that show *trends* and *relationships*
- A small table of your **top correlations** with the target
- A short written interpretation of what the plots suggest

In this section you will generate and save figures you can reuse in your final write-up.
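Here is one way to build the **top correlations** table:
1. compute `DataFrame.corr()`;
2. keep the target's column;
3. drop the target itself;
4. sort by absolute value, so strong negative relationships rank alongside strong positive ones.

The sketch runs on made-up data. The column names `TARGET`, `P1`, `P2`, and `P3` are placeholders for your own series IDs.

```python
import numpy as np
import pandas as pd

rng = np.random.default_rng(0)
n = 120
p1, p2, p3 = rng.normal(size=n), rng.normal(size=n), rng.normal(size=n)
toy_corr = pd.DataFrame({
    "TARGET": 2 * p1 - 1.5 * p2 + rng.normal(size=n),  # built to depend on P1 and P2 only
    "P1": p1, "P2": p2, "P3": p3,
})

target_col = "TARGET"
top_corr = (
    toy_corr.corr()[target_col]              # correlation of every column with the target
    .drop(target_col)                        # remove the target's correlation with itself (1.0)
    .sort_values(key=abs, ascending=False)   # strongest relationships first, sign kept
    .to_frame("corr_with_target")
)
print(top_corr.round(2))
```

In your notebook, swap `toy_corr` for `df_use` and `target_col` for `target_id`. Because correlations in levels can be misleading for trending series, a second table built on `df_use.diff()` or `df_use.pct_change()` may be worth comparing.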


In [None]:
# TODO (STUDENTS):
# 1) Create a folder called 'figures' (if it doesn't exist).
# 2) Choose:
#    - one "trend" figure (a time series plot)
#    - one "relationship" figure (scatter with regression line OR correlation heatmap)
# 3) Save both figures as PNG files into the figures/ folder.
#
# Requirements:
# - Use descriptive filenames (e.g., 'trend_unemployment.png', 'corr_heatmap.png')
# - Add titles, axis labels, and a short comment explaining what the figure shows

import os
import matplotlib.pyplot as plt

os.makedirs("figures", exist_ok=True)  # Create output folder if it does not exist

# Example placeholder (replace with your chosen variables/figures):
# fig, ax = plt.subplots(figsize=(10,4))
# df_use["YOUR_COLUMN"].plot(ax=ax)
# ax.set_title("Trend of YOUR_COLUMN over time")
# ax.set_xlabel("Date")
# ax.set_ylabel("Value")
# fig.tight_layout()
# fig.savefig("figures/trend_YOUR_COLUMN.png", dpi=200)

### Written Insights (Required)

Write 5–8 bullet points answering:
1. Which variable trends most strongly over time? What might explain it?
2. Which pair of variables looks most related? Is that relationship stable over time?
3. What missingness or outliers could bias modeling?
4. What is one feature engineering idea you want to try next week?
