# Week 7 — Final Project Website Prep (40–60 minutes)
Works for any FRED monthly/quarterly dataset.

## Today’s goals
1) Produce your **3 required visuals**
2) Produce model metrics + coefficient table
3) Draft website text using a template


## Quick Python Basics Recap (for this week)

This notebook assumes **you are still new to Python**. Below is the minimum syntax you need today.

### DataFrames (pandas)
```python
import pandas as pd
df = pd.read_csv("your_file.csv")
df.head()            # shows first rows
df.columns           # column names
df["col_name"]       # select one column (a Series)
df[["a","b"]]        # select multiple columns (a DataFrame)
df.isna().sum()      # missing values per column
```

### Making plots (matplotlib)
```python
import matplotlib.pyplot as plt

plt.figure()
plt.plot(df["x"], df["y"])   # line plot
plt.title("Title")
plt.xlabel("x label")
plt.ylabel("y label")
plt.tight_layout()
plt.savefig("figures/example.png", dpi=150)
plt.show()
```
Key idea: **You build a plot step-by-step**, then save it with `savefig`.

### Writing comments
- Use `#` for a comment on one line.
- In this course, you must explain what your code does and what you learned from each plot.

### Strings and f-strings (for readable printing)
```python
value = 3.14
print(f"The value is {value}")
```
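
When you print model metrics later in this notebook, a format specifier inside the braces controls rounding. A small sketch (the metric values below are made up purely for illustration):

```python
# Hypothetical metric values, used only to illustrate formatting
test_mse = 0.123456
test_r2 = 0.4567

line_mse = f"Test MSE: {test_mse:.3f}"   # round to 3 decimal places
line_r2 = f"Test R2: {test_r2:.1%}"      # show as a percentage with 1 decimal
print(line_mse)
print(line_r2)
```

`:.3f` means "fixed-point with 3 decimals"; `:.1%` multiplies by 100 and appends a percent sign.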


## Final Project Artifacts (you must create these)

By the end of this notebook, you must have:
1. At least **2 saved figures** in the `figures/` folder (PNG files).
2. A short **Insights** write-up answering: What changed? What matters? What would you model next?

If you cannot find `figures/`, create it using:
```python
import os
os.makedirs("figures", exist_ok=True)
```


## Syntax Toolbox for Week 7 (Website + Reporting)

This week is about turning your work into a clean final deliverable.

### 1) File paths and folders
```python
import os
os.makedirs("figures", exist_ok=True)
```

### 2) Saving figures
```python
plt.savefig("figures/my_figure.png", dpi=150, bbox_inches="tight")
```

### 3) Writing short captions (what your figure shows)
A good caption answers:
- What is plotted (variables + units)
- Time range
- One key takeaway
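
The three ingredients above can be assembled with an f-string. This is only a sketch: `make_caption` is a hypothetical helper (not part of pandas), and the toy series stands in for your real data.

```python
import pandas as pd

def make_caption(series: pd.Series, label: str, units: str, takeaway: str) -> str:
    """Compose a caption: what is plotted (with units), the time range, one takeaway."""
    start_year = series.index.min().year
    end_year = series.index.max().year
    return f"{label} ({units}) from {start_year} to {end_year}. {takeaway}"

# Toy monthly series standing in for real data (values are placeholders)
idx = pd.date_range("2000-01-01", "2024-12-01", freq="MS")
toy = pd.Series(range(len(idx)), index=idx)

caption = make_caption(
    toy, "Unemployment rate", "percent",
    "Replace this sentence with your key takeaway.",
)
print(caption)
```

Reading the time range from the index keeps the caption in sync with the data if you change the start date.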

Example caption:
"Unemployment rate (percent) from 2000–2024. Rates spike during recessions, then gradually decline."

### 4) Checklist for a project-ready folder
You should end with:
- `figures/` containing PNGs
- A short written insights section for each figure
- A list of data sources and any assumptions
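
One way to check the first item on this checklist is to list the PNG files from Python. A minimal sketch, assuming your figures live in a folder named `figures`; `list_saved_figures` is a hypothetical helper name:

```python
from pathlib import Path

def list_saved_figures(folder="figures"):
    """Return the sorted names of PNG files in `folder` (empty list if the folder is missing)."""
    path = Path(folder)
    if not path.is_dir():
        return []
    return sorted(p.name for p in path.glob("*.png"))

saved = list_saved_figures("figures")
print(f"Found {len(saved)} PNG file(s):", saved)
```

If the list is empty, re-run the cells that call `savefig` and confirm the folder name matches.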


> **Expanded version** (generated 2026-01-05). Added extra coding + commenting + writing tasks.


## 1) Final question (5 min)
> Can X help predict/explain Y over time, and what evidence supports that?

**Instructor example:** Can interest rates and unemployment changes help predict inflation changes?

## 2) Rebuild dataset + features (15 min)

### Reproducibility Checklist (Required)
Add a short checklist and complete it:
- [ ] dataset source link recorded
- [ ] code runs top-to-bottom without errors
- [ ] random seed set where needed
- [ ] plots have titles + axis labels
- [ ] figures saved to files (if using a website)
- [ ] one-paragraph limitations section written

Paste your filled checklist below.


### Write Your Methods Section (Required)
In **6–10 sentences**, describe:
- data source (what it is, time range)
- cleaning steps
- feature engineering (at least 2 features)
- modeling approach + evaluation (train/test split, metric)

Write it so a classmate could reproduce your work.


In [None]:
# If needed (first time only):
# !pip -q install pandas_datareader scikit-learn

import pandas as pd
import numpy as np
import matplotlib.pyplot as plt

plt.rcParams["figure.figsize"] = (11, 4)  # default figure size (width, height) in inches
plt.rcParams["axes.grid"] = True          # show grid lines on every plot

In [None]:
# Explanation:
# - Goal: Run and understand the steps below
# - What you should check after running:
#   1) The output has the expected shape/columns
#   2) The values look reasonable (no obvious NaNs or impossible values)
#   3) Any figures have clear titles/labels and are saved to disk when required
#
# How to read this code:
# - Imports / configuration come first
# - Then we compute intermediate variables (feature engineering)
# - Then we summarize / visualize
# - Finally, we write a short interpretation in Markdown below the figure/table

from sklearn.linear_model import LinearRegression         # ordinary least squares regression
from sklearn.metrics import mean_squared_error, r2_score  # evaluation metrics

In [None]:
from pandas_datareader import data as pdr

def fetch_fred_series(series_id: str, start="1990-01-01", end=None) -> pd.DataFrame:
    """Fetch one FRED series as a DataFrame with a datetime index."""
    if end is None:
        end = pd.Timestamp.today().strftime("%Y-%m-%d")
    s = pdr.DataReader(series_id, "fred", start, end)
    s.columns = [series_id]
    s.index = pd.to_datetime(s.index)  # ensure the index is a DatetimeIndex
    return s

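def fetch_many(series_ids, start="1990-01-01"):
    """Fetch several FRED series and join them column-wise on their dates."""
    dfs = [fetch_fred_series(s, start=start) for s in series_ids]
    return pd.concat(dfs, axis=1).sort_index()

def infer_freq(index: pd.DatetimeIndex) -> str:
    """Classify a date index as "Q" (quarterly), "M" (monthly), or "U" (unknown/other)."""
    f = pd.infer_freq(index)
    if f is None:
        return "U"
    f = f.upper()
    # Substring checks on the pandas frequency string, e.g. "QS-OCT" -> "Q", "MS" -> "M"
    if "Q" in f:
        return "Q"
    if "M" in f:
        return "M"
    return "U"

def to_period_end(df: pd.DataFrame, target: str) -> pd.DataFrame:
    """Resample to month-end ("M") or quarter-end ("Q"), keeping the last observation in each period."""
    # Note: pandas 2.2+ deprecates the "M"/"Q" resample aliases in favor of "ME"/"QE".
    # If you see a FutureWarning here, switch to the newer aliases.
    if target == "M":
        return df.resample("M").last()
    if target == "Q":
        return df.resample("Q").last()
    raise ValueError("target must be 'M' or 'Q'")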

def add_common_features(df: pd.DataFrame) -> pd.DataFrame:
    out = df.copy()
    for c in out.columns:
        out[f"{c}_lag1"] = out[c].shift(1)
        out[f"{c}_diff1"] = out[c].diff(1)
        out[f"{c}_pct1"] = out[c].pct_change(1) * 100
        out[f"{c}_roll3"] = out[c].rolling(3).mean()
    return out
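
To see what each engineered column means, it helps to apply the same four operations to a tiny series you can check by hand. This is a sketch on made-up numbers; the column name `X` is a placeholder, not a FRED series.

```python
import pandas as pd

# Toy series (made-up values) so each feature can be verified by hand
idx = pd.date_range("2020-01-01", periods=5, freq="MS")
toy = pd.DataFrame({"X": [100.0, 110.0, 121.0, 121.0, 133.1]}, index=idx)

feat = toy.copy()
feat["X_lag1"] = feat["X"].shift(1)             # previous period's value
feat["X_diff1"] = feat["X"].diff(1)             # change from the previous period
feat["X_pct1"] = feat["X"].pct_change(1) * 100  # percent change from the previous period
feat["X_roll3"] = feat["X"].rolling(3).mean()   # mean of the current and previous 2 values
print(feat.round(2))
```

The first row has no previous value, so the lag, difference, and percent change are `NaN`; the 3-period rolling mean is `NaN` for the first two rows. That is why the notebook calls `.dropna()` after building features.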

In [None]:
# ===========================
# STUDENT CHOICE (EDIT HERE)
# ===========================
# Choose 3–6 FRED series IDs relevant to your question.
# Search on https://fred.stlouisfed.org and copy the series ID.

series_ids = [
    "UNRATE",
    "CPIAUCSL",
    "FEDFUNDS"
]

# Choose your target variable (must be one of the series_ids)
target_id = "CPIAUCSL"

start_date = "1990-01-01"

In [None]:
df_raw = fetch_many(series_ids, start=start_date)

# Infer each series' frequency
freqs = {c: infer_freq(df_raw[c].dropna().index) for c in df_raw.columns}  # Handle missing values
freqs

In [None]:
# Rule: if any series is quarterly, use quarterly for everything (safe when mixing).
use_freq = "Q" if any(v == "Q" for v in freqs.values()) else "M"
print("Using frequency:", use_freq)

df = to_period_end(df_raw, use_freq)

# Drop rows where target is missing (we can’t model without target)
df = df.dropna(subset=[target_id])  # Handle missing values

# Missing-value strategies:
df_complete = df.dropna()                 # simplest: keep only complete rows
df_ffill = df.ffill()                     # common: forward-fill predictors (carry the last known value forward)

#  Choose ONE:
df_use = df_complete   # or df_ffill

df_use.head()
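
The difference between the two strategies is easiest to see on a tiny table. A sketch with made-up values (the column names `target` and `predictor` are placeholders):

```python
import numpy as np
import pandas as pd

idx = pd.date_range("2021-01-01", periods=4, freq="MS")
toy = pd.DataFrame(
    {"target": [1.0, 2.0, 3.0, 4.0], "predictor": [10.0, np.nan, np.nan, 40.0]},
    index=idx,
)

complete = toy.dropna()   # keeps only rows with no missing values
filled = toy.ffill()      # carries the last known value forward

print("Rows kept by dropna:", len(complete))
print(filled)
```

Dropping rows shrinks the sample; forward-filling keeps every row but repeats stale values, so mention whichever choice you make in your limitations section.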

In [None]:
df_feat = add_common_features(df_use[series_ids]).dropna()  # drop the first rows, where lags/rolling means are undefined
df_feat.head()

## 3) Choose target form (5 min)

In [None]:
# TODO: choose your target definition
y = df_feat[f"{target_id}_pct1"].rename("y")  # try level/diff if needed
y.head()

## 4) Visual 1 — Trend plot (8 min)

In [None]:
plt.figure()
plt.plot(y.index, y.values)  # target (in its chosen form) over time
plt.title(f"Target trend: {target_id} (chosen form)")
plt.xlabel("Date")
plt.ylabel("Target")  # replace with the variable name + units
plt.show()

**Instructor note:** require plain-language description.
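
Because the website needs image files, a trend plot like Visual 1 can be written to `figures/` using the figure object. The sketch below uses synthetic data and a hypothetical filename; swap in your own series and labels.

```python
import os
import numpy as np
import pandas as pd
import matplotlib.pyplot as plt

os.makedirs("figures", exist_ok=True)  # create the output folder if needed

# Synthetic stand-in for the target series (made-up values)
idx = pd.date_range("2015-01-01", periods=36, freq="MS")
y_demo = pd.Series(np.linspace(0.1, 0.4, len(idx)), index=idx, name="y")

fig, ax = plt.subplots(figsize=(11, 4))
ax.plot(y_demo.index, y_demo.values)
ax.set_title("Target trend (demo data)")
ax.set_xlabel("Date")
ax.set_ylabel("Monthly % change")
fig.tight_layout()

out_path = "figures/visual1_trend_demo.png"   # hypothetical filename
fig.savefig(out_path, dpi=150, bbox_inches="tight")
plt.close(fig)  # release the figure once the file is written
print("Saved:", out_path, os.path.exists(out_path))
```

In a notebook, call `plt.show()` before `plt.close(fig)` if you also want the figure displayed in the cell output.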

## 5) Visual 2 — Relationship scatter (8–10 min)

In [None]:
predictor_id = [s for s in series_ids if s != target_id][0]  # first non-target series; change this!

plt.figure()
plt.scatter(df_feat[predictor_id], y, alpha=0.5)  # one point per period
plt.title(f"Relationship: {predictor_id} vs target")
plt.xlabel(predictor_id)
plt.ylabel("Target")
plt.show()

**Instructor note:** weak/no pattern is valid.
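
To back up what the scatter shows, you can report a Pearson correlation alongside it. A sketch on synthetic data with a fixed seed (all variable names here are illustrative):

```python
import numpy as np
import pandas as pd

rng = np.random.default_rng(0)  # fixed seed so the demo is reproducible

# Synthetic data: y_related depends on x, y_unrelated does not
x = pd.Series(rng.normal(size=200))
y_related = 2.0 * x + pd.Series(rng.normal(scale=0.5, size=200))
y_unrelated = pd.Series(rng.normal(size=200))

r_related = x.corr(y_related)       # Pearson correlation, between -1 and 1
r_unrelated = x.corr(y_unrelated)

print(f"Related pair:   r = {r_related:.2f}")
print(f"Unrelated pair: r = {r_unrelated:.2f}")
```

A value near 0 matches a scatter with no visible pattern. Correlation only measures linear association, so describe it as "associated with", not "causes".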

## 6) Visual 3 — Model results (15 min)

### Make a Results Table (Required)
Create a small table summarizing your key results:
- model type
- features used (short list)
- train metric
- test metric

**Deliverable:** a pandas DataFrame named `results_table`.


In [None]:

# TODO (STUDENTS):
# Build results_table as a DataFrame and display it.
# results_table = pd.DataFrame([...])
# display(results_table)

In [None]:
# SOLUTION (INSTRUCTOR): Example implementation scaffold
import numpy as np
import matplotlib.pyplot as plt

num_cols = df_use.select_dtypes(include=[np.number]).columns.tolist()
print("Numeric columns:", num_cols[:10])

# Choose columns safely
col = num_cols[0]
fig, ax = plt.subplots(figsize=(10,4))  # Create a plot for interpretation / reporting
df_use[col].plot(ax=ax)  # Create a plot for interpretation / reporting
ax.set_title(f"Example plot for {col}")
ax.set_xlabel("Date")
ax.set_ylabel("Value")
fig.tight_layout()
plt.show()  # Create a plot for interpretation / reporting

### Write a Clear Takeaway (Required)
Write a 3–5 sentence "headline" conclusion:
- what you found
- how strong the evidence is
- what you would do next with more time/data


In [None]:
predictors = [s for s in series_ids if s != target_id]
cols = []
for s in predictors:
    cols += [f"{s}_lag1", f"{s}_pct1", f"{s}_roll3"]  # engineered features for each predictor
X = df_feat[cols].copy()

# Time-ordered split: first 80% of rows to train, last 20% to test (no shuffling)
cut = int(len(df_feat) * 0.8)
X_train, X_test = X.iloc[:cut], X.iloc[cut:]
y_train, y_test = y.iloc[:cut], y.iloc[cut:]

model = LinearRegression().fit(X_train, y_train)  # fit on the training period only
pred_test = model.predict(X_test)

print("Test MSE:", mean_squared_error(y_test, pred_test))
print("Test R2 :", r2_score(y_test, pred_test))

coef_tbl = pd.DataFrame({"feature": X.columns, "coef": model.coef_})
coef_tbl["abs_coef"] = coef_tbl["coef"].abs()
coef_tbl.sort_values("abs_coef", ascending=False).head(10)

**Instructor note:** require association language, 1 limitation, and references to visuals.
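
The required results table needs both train and test metrics. One possible layout is sketched below on synthetic data; the `_demo` names and the feature names `a_lag1` and `b_pct1` are placeholders chosen so this cell does not overwrite your real `model` or `results_table`.

```python
import numpy as np
import pandas as pd
from sklearn.linear_model import LinearRegression
from sklearn.metrics import mean_squared_error, r2_score

rng = np.random.default_rng(42)  # fixed seed for a reproducible demo

# Synthetic stand-ins for X and y (made-up data, rows assumed to be in time order)
n = 120
X_demo = pd.DataFrame({"a_lag1": rng.normal(size=n), "b_pct1": rng.normal(size=n)})
y_demo = 0.8 * X_demo["a_lag1"] - 0.3 * X_demo["b_pct1"] + rng.normal(scale=0.5, size=n)

# Time-ordered split: first 80% for training, last 20% for testing
cut_demo = int(n * 0.8)
X_tr, X_te = X_demo.iloc[:cut_demo], X_demo.iloc[cut_demo:]
y_tr, y_te = y_demo.iloc[:cut_demo], y_demo.iloc[cut_demo:]

model_demo = LinearRegression().fit(X_tr, y_tr)
pred_tr = model_demo.predict(X_tr)
pred_te = model_demo.predict(X_te)

# One row per model; add more rows if you compare several models
results_table_demo = pd.DataFrame([{
    "model": "LinearRegression",
    "features": ", ".join(X_demo.columns),
    "train_mse": mean_squared_error(y_tr, pred_tr),
    "test_mse": mean_squared_error(y_te, pred_te),
    "train_r2": r2_score(y_tr, pred_tr),
    "test_r2": r2_score(y_te, pred_te),
}])
print(results_table_demo.round(3))
```

Comparing the train and test columns side by side is a quick overfitting check: a test metric much worse than the train metric is worth mentioning in your limitations.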

## 7) Draft website text (15–20 min)

### Overview
### Data (series IDs, target, frequency)
### EDA (Visual 1)
### Feature engineering
### Modeling (time split + regression)
### Results (Visual 2 & 3)
### Limitations
### Conclusion (2–4 bullets)

### Website Writing Prompts (Required)
Add these sections to your website draft (each 3–6 sentences):
1. **Motivation** (why this question matters)
2. **Data** (what the dataset is, where it came from)
3. **Approach** (features + model)
4. **Results** (what worked / what didn’t)
5. **Limitations** (at least 2)
6. **Next steps** (at least 2)

Tip: Write for a general audience—no jargon unless you define it.


**Instructor note:** each section must reference at least one visual.

## Optional: save plots as images in Colab

### Required if you are building a website
Save at least **two** plots as PNG files with clear filenames.

In Colab, you can check they saved by running `!ls`.


In [None]:

# TODO (STUDENTS):
# Save two figures as PNG files.
# Example:
# fig = ...
# fig.savefig("trend_plot.png", dpi=200, bbox_inches="tight")

In [None]:
# SOLUTION (INSTRUCTOR): Example implementation scaffold
import numpy as np
import matplotlib.pyplot as plt

num_cols = df_use.select_dtypes(include=[np.number]).columns.tolist()
print("Numeric columns:", num_cols[:10])

# Choose columns safely
col = num_cols[0]
fig, ax = plt.subplots(figsize=(10,4))  # Create a plot for interpretation / reporting
df_use[col].plot(ax=ax)  # Create a plot for interpretation / reporting
ax.set_title(f"Example plot for {col}")
ax.set_xlabel("Date")
ax.set_ylabel("Value")
fig.tight_layout()
plt.show()  # Create a plot for interpretation / reporting

In [None]:
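# Note: plt.savefig saves the *current* figure. In a new cell that is usually an
# empty figure, so either call it in the same cell that draws the plot (before
# plt.show()) or save through the figure object:
# fig.savefig('plot1.png', dpi=200, bbox_inches='tight')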

## End-of-class checkpoint
3 visuals + metrics + narrative draft

## Final Project Artifacts (Save these for your report)

By the end of the project, you should have:
- At least 2 polished figures that show *trends* and *relationships*
- A small table of your **top correlations** with the target
- A short written interpretation of what the plots suggest

In this section you will generate and save figures you can reuse in your final write-up.


In [None]:
# SOLUTION (INSTRUCTOR):
# Save a trend figure and a relationship figure using available numeric columns.

import os
import numpy as np
import matplotlib.pyplot as plt

os.makedirs("figures", exist_ok=True)  # Create output folder if it does not exist

num_cols = df_use.select_dtypes(include=[np.number]).columns.tolist()
if len(num_cols) == 0:
    raise ValueError("No numeric columns found in df_use. Check data loading/cleaning steps.")

# Trend figure: first numeric column
col_trend = num_cols[0]
fig, ax = plt.subplots(figsize=(10,4))  # Create a plot for interpretation / reporting
df_use[col_trend].plot(ax=ax)  # Create a plot for interpretation / reporting
ax.set_title(f"Trend of {col_trend} over time")
ax.set_xlabel("Date")
ax.set_ylabel("Value")
fig.tight_layout()
fig.savefig(f"figures/trend_{col_trend}.png", dpi=200)  # Save an artifact you can reuse in the final project

# Relationship figure: correlation heatmap (first 8 numeric columns)
use_cols = num_cols[:8]
corr = df_use[use_cols].corr()  # pairwise Pearson correlations

fig, ax = plt.subplots(figsize=(7,6))
im = ax.imshow(corr.values, vmin=-1, vmax=1, cmap="coolwarm")  # fixed -1..1 scale so colors are comparable
ax.set_xticks(range(len(use_cols)))
ax.set_yticks(range(len(use_cols)))
ax.set_xticklabels(use_cols, rotation=45, ha="right")
ax.set_yticklabels(use_cols)
ax.set_title("Correlation heatmap (subset of variables)")
fig.colorbar(im, ax=ax, fraction=0.046, pad=0.04)
fig.tight_layout()
fig.savefig("figures/corr_heatmap.png", dpi=200)  # Save an artifact you can reuse in the final project

corr.round(3)
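
The artifact list above also asks for a small table of top correlations with the target. One way to build it is to take the target's column of the correlation matrix and sort by absolute value. The sketch uses synthetic data; `"TARGET"` plays the role of `target_id`, and the other column names are made up.

```python
import numpy as np
import pandas as pd

rng = np.random.default_rng(1)  # fixed seed for a reproducible demo

# Synthetic stand-in for df_use (made-up data)
n = 100
base = rng.normal(size=n)
demo = pd.DataFrame({
    "TARGET": base + rng.normal(scale=0.2, size=n),
    "STRONG": base + rng.normal(scale=0.3, size=n),
    "WEAK": 0.2 * base + rng.normal(size=n),
    "NEGATIVE": -base + rng.normal(scale=0.5, size=n),
})

corr_with_target = (
    demo.corr()["TARGET"]
    .drop("TARGET")  # a variable always correlates 1.0 with itself
    .sort_values(key=lambda s: s.abs(), ascending=False)  # strongest first, regardless of sign
)
top_corr = corr_with_target.to_frame("corr_with_target")
print(top_corr.round(3))
```

Sorting by absolute value keeps strong negative relationships near the top instead of burying them at the bottom.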

### Written Insights (Required)

Write 5–8 bullet points answering:
1. Which variable trends most strongly over time? What might explain it?
2. Which pair of variables looks most related? Is that relationship stable over time?
3. What missingness or outliers could bias modeling?
4. What is one feature engineering idea you want to try next week?


#### Sample instructor bullets (example)

- The first numeric series shows a clear upward trend; this suggests non-stationarity and motivates using percent change or differencing.
- The correlation heatmap indicates that several variables move together, which suggests multicollinearity; regularization may help.
- Missingness is concentrated in a small set of columns; imputation strategy should be justified and tested.
- Outliers appear around major economic events; consider robust scaling or winsorization and document the rationale.
- Next week: create lag features and rolling aggregates to capture delayed effects.
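
The winsorization idea in the outlier bullet can be sketched with quantiles and `clip`. This is one simple variant under stated assumptions: `winsorize` is a hypothetical helper written here (not a pandas function), and the cutoffs are arbitrary choices you would need to justify.

```python
import pandas as pd

def winsorize(series: pd.Series, lower_q: float = 0.05, upper_q: float = 0.95) -> pd.Series:
    """Clip values below/above the given quantiles to those quantile values."""
    lo, hi = series.quantile(lower_q), series.quantile(upper_q)
    return series.clip(lower=lo, upper=hi)

# Made-up values with one extreme outlier
s = pd.Series([1.0, 2.0, 3.0, 4.0, 5.0, 6.0, 7.0, 8.0, 9.0, 100.0])
s_w = winsorize(s, 0.1, 0.9)
print(s_w.tolist())
```

Unlike dropping outliers, this keeps every row, which matters for time series where gaps would break lag features. For a fair test metric, compute the quantile cutoffs on the training period only.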
