# DX702 — Coding Homework Week 3

**Student:** _<your name>_  
**Date:** _<MM/DD/YY>_

This notebook is organized to help you complete Week 3 tasks:
1) Load a time series dataset that contains an *event date* and test for a **discontinuity** using linear regression (Regression Discontinuity Design).
2) Produce visuals (time plot, moving averages, and fitted lines).
3) Implement **normalization** (min–max) and **standardization** (z-scores) helpers.
4) Answer short-response prompts in the provided *Answer* cells.

> Replace any placeholder paths and event date below with what your instructor provided (e.g., from the course GitHub repo).

In [None]:
# === Setup ===
import numpy as np
import pandas as pd
import matplotlib.pyplot as plt

# Helper: nice display options
pd.set_option('display.max_colwidth', 120)
pd.set_option('display.precision', 4)

print('✅ Libraries imported.')

## 1) Load Data

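Update `DATA_PATH`, `DATE_COL`, `TARGET_COL`, and `EVENT_DATE` to match the dataset from your Week 3 repo.  
The cell parses the date column as datetime and sorts by it. The regression below uses the row position as the time index, so it assumes evenly spaced observations (e.g., one row per day).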

In [None]:
# --- EDIT THESE ---
DATA_PATH  = 'path/to/your_week3_timeseries.csv'  # e.g., '../data/week3_series.csv'
DATE_COL   = 'date'       # column with timestamps/dates
TARGET_COL = 'y'          # the numeric series you will analyze
EVENT_DATE = '2020-06-15' # <-- replace with the actual event date given in the assignment

# Load
df = pd.read_csv(DATA_PATH)
# Parse date and sort
df[DATE_COL] = pd.to_datetime(df[DATE_COL])
df = df.sort_values(DATE_COL).reset_index(drop=True)

print('✅ Data loaded. Shape:', df.shape)
display(df.head(10))
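
Because the time index built later is just the row position, duplicate dates or missing days would distort the trend. A minimal sketch of a pre-flight check on a small made-up frame; the helper name `check_date_index` and the demo columns are illustrative assumptions, not part of the assignment:

```python
import pandas as pd

def check_date_index(frame, date_col):
    """Summarize duplicate dates and irregular gaps in a date column."""
    dates = frame[date_col].sort_values()
    n_dupes = int(dates.duplicated().sum())
    # Differences between consecutive timestamps; the first one is NaT, so drop it
    steps = dates.diff().dropna()
    modal_step = steps.mode().iloc[0]          # most common spacing
    n_irregular = int((steps != modal_step).sum())
    return {"duplicates": n_dupes, "modal_step": modal_step, "irregular_steps": n_irregular}

# Made-up data: one duplicated date (Jan 3) and one gap (Jan 3 -> Jan 6)
demo = pd.DataFrame({
    "date": pd.to_datetime(["2020-01-01", "2020-01-02", "2020-01-03",
                            "2020-01-03", "2020-01-06", "2020-01-07"]),
    "y": [1.0, 2.0, 2.5, 2.6, 3.0, 3.2],
})
report = check_date_index(demo, "date")
print(report)
```

On your own data, call it as `check_date_index(df, DATE_COL)`; nonzero counts suggest deduplicating or reindexing before fitting.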

## 2) Quick Visualization
A simple line plot to see the series and visually locate the event date.

In [None]:
fig, ax = plt.subplots(figsize=(10, 4))
ax.plot(df[DATE_COL], df[TARGET_COL], linewidth=1)
ax.axvline(pd.to_datetime(EVENT_DATE), linestyle='--')
ax.set_title('Time Series with Event Date')
ax.set_xlabel('Date'); ax.set_ylabel(TARGET_COL)
plt.show()

# Optional: simple moving average over the last 7 observations (7 days if the data is daily)
df['SMA_7'] = df[TARGET_COL].rolling(7, min_periods=1).mean()
fig, ax = plt.subplots(figsize=(10, 4))
ax.plot(df[DATE_COL], df[TARGET_COL], alpha=0.4, label='raw')
ax.plot(df[DATE_COL], df['SMA_7'], linewidth=2, label='SMA 7')
ax.axvline(pd.to_datetime(EVENT_DATE), linestyle='--')
ax.legend(); ax.set_title('Raw vs 7-Day SMA'); ax.set_xlabel('Date')
plt.show()
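
A trailing window like the one above only looks backward, so a sharp jump appears in the smoothed line as a gradual ramp over the following observations. The sketch below illustrates this on a made-up step series and compares it with a centered window (`center=True`); the series and variable names are illustrative assumptions.

```python
import numpy as np
import pandas as pd

# Made-up step series: 0 before the "event" at index 10, 1 from the event onward
step = pd.Series(np.r_[np.zeros(10), np.ones(10)])

trailing = step.rolling(7, min_periods=1).mean()               # window ends at the current point
centered = step.rolling(7, min_periods=1, center=True).mean()  # window straddles the current point

comparison = pd.DataFrame({"raw": step, "trailing": trailing, "centered": centered})
print(comparison.iloc[7:17].round(3))
```

Either way the smoothed line blurs the discontinuity, so it is better suited to eyeballing the trend than to estimating the jump.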

## 3) Regression Discontinuity (Event) Test with Linear Regression

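We test whether there is a **level jump** at the event date while controlling for a linear time trend.  
We create three regressors:
- `t`: a running time index, centered so that `t = 0` at the first observation on or after the event date
- `post`: indicator (1 if date ≥ event date, else 0)
- `t_post`: interaction (`t * post`) to allow a slope change after the event

Because `t` is zero at the event, the coefficient on `post` is the *instant level change* at the event. (If `t` instead started at 0 on the first observation, the coefficient on `post` would measure the gap between the two fitted lines extrapolated back to the start of the sample, not at the event.)
The coefficient on `t_post` is the *change in slope* after the event.

In [None]:
# Build design matrix
df = df.copy()
event_ts = pd.to_datetime(EVENT_DATE)
df['post'] = (df[DATE_COL] >= event_ts).astype(int)
# Center the time index at the event: t = 0 at the first post-event row
n_pre = int((df['post'] == 0).sum())
df['t'] = np.arange(len(df)) - n_pre
df['t_post'] = df['t'] * df['post']

df[['t','post','t_post']].head()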

In [None]:
# Fit OLS using numpy (to avoid extra dependencies)
# Model: y = b0 + b1*t + b2*post + b3*t_post
X = np.column_stack([
    np.ones(len(df)),      # intercept
    df['t'].values,
    df['post'].values,
    df['t_post'].values
])
y = df[TARGET_COL].values

# Closed-form OLS: beta = (X'X)^(-1) X'y
XtX = X.T @ X
XtX_inv = np.linalg.pinv(XtX)
beta = XtX_inv @ (X.T @ y)

coef_names = ['Intercept','t (trend)','post (level jump)','t_post (slope change)']
for name, b in zip(coef_names, beta):
    print(f'{name:22s}: {b: .4f}')

# Fitted values
y_hat = X @ beta
df['y_hat'] = y_hat

# Plot fit pre/post
fig, ax = plt.subplots(figsize=(10,4))
ax.plot(df[DATE_COL], df[TARGET_COL], alpha=0.5, label='actual')
ax.plot(df[DATE_COL], df['y_hat'], linewidth=2, label='fitted OLS')
ax.axvline(event_ts, linestyle='--', label='event')
ax.set_title('OLS with Event (discontinuity)')
ax.legend(); ax.set_xlabel('Date')
plt.show()
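
One way to sanity-check the specification is to fit it to simulated data where the jump is known. The sketch below is an illustration under stated assumptions: the series is synthetic (true jump 5.0, true slope change 0.2), the time index is centered at the event so the coefficient on `post` is the jump at the event itself, and `np.linalg.lstsq` is used in place of the explicit pseudo-inverse. All `_sim` names are hypothetical.

```python
import numpy as np

rng = np.random.default_rng(0)
n_sim, event_idx = 200, 120                 # made-up length and event position
t_sim = np.arange(n_sim) - event_idx        # time index centered at the event
post_sim = (t_sim >= 0).astype(float)

# Synthetic truth: intercept 10, slope 0.5, jump 5.0, slope change 0.2, unit-variance noise
y_sim = (10 + 0.5 * t_sim + 5.0 * post_sim + 0.2 * t_sim * post_sim
         + rng.normal(0, 1.0, n_sim))

X_sim = np.column_stack([np.ones(n_sim), t_sim, post_sim, t_sim * post_sim])
beta_sim, *_ = np.linalg.lstsq(X_sim, y_sim, rcond=None)

for name, b in zip(["Intercept", "t", "post", "t_post"], beta_sim):
    print(f"{name:10s}: {b: .3f}")
```

If the printed `post` and `t_post` estimates land near the simulated values, the design matrix is doing what the model description says.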

### (Optional) Standard Errors and t-stats (Classical OLS)

The cell below computes classical (homoskedastic, non-robust) OLS standard errors so you can inspect **t-stats**. It is not a sandwich estimator and does not account for heteroskedasticity or serial correlation.  
For rigorous inference, you may use `statsmodels` with HAC/Newey-West or cluster options if allowed.

In [None]:
# Simple (non-robust) OLS standard errors
resid = y - y_hat
n, k = X.shape
sigma2 = (resid @ resid) / (n - k)
var_beta = sigma2 * XtX_inv
se = np.sqrt(np.diag(var_beta))
t_stats = beta / se

print('\nCoefficients, SE, t-stats (non-robust)')
for name, b, s, t in zip(coef_names, beta, se, t_stats):
    print(f'{name:22s}: beta={b: .4f},  SE={s: .4f},  t={t: .2f}')
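
For comparison, a heteroskedasticity-robust (HC1, White-type) sandwich estimator can be written in a few lines of NumPy. This is a sketch on synthetic data, not a substitute for HAC/Newey-West: it does not correct for serial correlation, which is common in time series. The function name `hc1_se` and the simulated data are illustrative assumptions.

```python
import numpy as np

def hc1_se(X, y):
    """Return OLS coefficients with classical and HC1 (White) standard errors."""
    n, k = X.shape
    XtX_inv = np.linalg.pinv(X.T @ X)
    beta = XtX_inv @ (X.T @ y)
    resid = y - X @ beta
    # Classical: sigma^2 * (X'X)^-1
    se_classical = np.sqrt(np.diag(XtX_inv) * (resid @ resid) / (n - k))
    # Sandwich: (X'X)^-1 [X' diag(e^2) X] (X'X)^-1, scaled by n/(n-k) for HC1
    meat = (X * resid[:, None] ** 2).T @ X
    cov_hc1 = XtX_inv @ meat @ XtX_inv * n / (n - k)
    return beta, se_classical, np.sqrt(np.diag(cov_hc1))

rng = np.random.default_rng(1)
x_het = np.linspace(0, 10, 300)
# Synthetic noise whose spread grows with x (heteroskedastic by construction)
y_het = 2.0 + 3.0 * x_het + rng.normal(0, 0.5 + 0.5 * x_het)
X_het = np.column_stack([np.ones_like(x_het), x_het])

beta_het, se_c, se_r = hc1_se(X_het, y_het)
print("beta         :", np.round(beta_het, 3))
print("classical SE :", np.round(se_c, 4))
print("HC1 SE       :", np.round(se_r, 4))
```

On the notebook's own data the call would be `hc1_se(X, y)`, with t-stats obtained by dividing the coefficients by the HC1 standard errors.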

### Sensitivity: Local Bandwidth Around Event

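Restricting the fit to a window around the event keeps observations far from the event from driving the trend lines. Adjust `WINDOW_DAYS` and check whether the estimated jump is stable across bandwidths.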

In [None]:
WINDOW_DAYS = 60  # try 30, 60, 90, etc.
mask = (
    (df[DATE_COL] >= event_ts - pd.Timedelta(days=WINDOW_DAYS))
    & (df[DATE_COL] <= event_ts + pd.Timedelta(days=WINDOW_DAYS))
)
dfl = df.loc[mask].copy()
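dfl['post_local'] = (dfl[DATE_COL] >= event_ts).astype(int)
# Center the local time index at the event so the post_local coefficient is the jump at the event
dfl['t_local'] = np.arange(len(dfl)) - int((dfl['post_local'] == 0).sum())
dfl['t_post_local'] = dfl['t_local'] * dfl['post_local']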

Xl = np.column_stack([
    np.ones(len(dfl)),
    dfl['t_local'].values,
    dfl['post_local'].values,
    dfl['t_post_local'].values
])
yl = dfl[TARGET_COL].values

beta_l = np.linalg.pinv(Xl.T @ Xl) @ (Xl.T @ yl)
yh_l = Xl @ beta_l

fig, ax = plt.subplots(figsize=(10,4))
ax.plot(dfl[DATE_COL], dfl[TARGET_COL], alpha=0.5, label='actual')
ax.plot(dfl[DATE_COL], yh_l, linewidth=2, label='fitted (local window)')
ax.axvline(event_ts, linestyle='--', label='event')
ax.set_title(f'Local Window ±{WINDOW_DAYS} days around event')
ax.legend(); ax.set_xlabel('Date')
plt.show()

for name, b in zip(coef_names, beta_l):
    print(f'{name:22s}: {b: .4f}')
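
To see how stable the jump estimate is, the same local fit can be repeated over several bandwidths. A minimal sketch on synthetic daily data with a simulated jump of 5.0; the helper `local_jump`, the simulated frame `sim`, and its column names are illustrative assumptions. The time index is centered at the event inside the helper so that the `post` coefficient is the jump at the event.

```python
import numpy as np
import pandas as pd

def local_jump(frame, date_col, target_col, event_ts, window_days):
    """Fit y = b0 + b1*t + b2*post + b3*t*post within +/- window_days; return b2."""
    half = pd.Timedelta(days=window_days)
    sub = frame.loc[frame[date_col].between(event_ts - half, event_ts + half)]
    post = (sub[date_col] >= event_ts).to_numpy().astype(float)
    # Center t so that t = 0 at the first post-event observation
    t = np.arange(len(sub)) - int((post == 0).sum())
    X = np.column_stack([np.ones(len(sub)), t, post, t * post])
    beta, *_ = np.linalg.lstsq(X, sub[target_col].to_numpy(), rcond=None)
    return beta[2]

rng = np.random.default_rng(2)
dates = pd.date_range("2020-01-01", periods=366, freq="D")
event = pd.Timestamp("2020-06-15")
is_post = (dates >= event).astype(float)
# Synthetic series: mild upward trend, jump of 5.0 at the event, unit-variance noise
sim = pd.DataFrame({
    "date": dates,
    "y": 20 + 0.05 * np.arange(366) + 5.0 * is_post + rng.normal(0, 1, 366),
})

jumps = {w: local_jump(sim, "date", "y", event, w) for w in (30, 60, 90)}
for w, j in jumps.items():
    print(f"window ±{w:3d} days: estimated jump = {j:.2f}")
```

Estimates that swing widely as the window changes suggest the jump is sensitive to how the trend is modeled rather than being a robust feature of the data.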

## 4) Helper Functions — Normalization & Standardization
Use these to answer quiz items like “normalize [1,2,4]” or “standardize [1,2,4]”.
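
For reference, the two transforms and the arithmetic for $[1, 2, 4]$, written out by hand. The z-score line assumes the population standard deviation (divide by $n$); with the sample version (divide by $n-1$) the scale changes as shown in the last line.

$$
x'_i = \frac{x_i - \min(x)}{\max(x) - \min(x)}
\;\Rightarrow\;
\left[\tfrac{1-1}{4-1},\ \tfrac{2-1}{4-1},\ \tfrac{4-1}{4-1}\right] = \left[0,\ \tfrac{1}{3},\ 1\right]
$$

$$
z_i = \frac{x_i - \mu}{\sigma}, \qquad
\mu = \tfrac{7}{3}, \qquad
\sigma_{\text{pop}} = \sqrt{\tfrac{1}{3}\left[\left(-\tfrac{4}{3}\right)^2 + \left(-\tfrac{1}{3}\right)^2 + \left(\tfrac{5}{3}\right)^2\right]} = \sqrt{\tfrac{14}{9}} \approx 1.247
\;\Rightarrow\; z \approx [-1.07,\ -0.27,\ 1.34]
$$

$$
\sigma_{\text{sample}} = \sqrt{\tfrac{1}{2}\cdot\tfrac{14}{3}} = \sqrt{\tfrac{7}{3}} \approx 1.528
\;\Rightarrow\; z \approx [-0.87,\ -0.22,\ 1.09]
$$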

In [None]:
def minmax_normalize(a):
    a = np.asarray(a, dtype=float)
    mn, mx = np.min(a), np.max(a)
    if mx == mn:
        return np.zeros_like(a)
    return (a - mn) / (mx - mn)

def zscore_standardize(a, ddof=0):
    """Return z-scores. By default uses population std (ddof=0).
    If your course expects sample std, set ddof=1."""
    a = np.asarray(a, dtype=float)
    mu = a.mean()
    std = a.std(ddof=ddof)
    if std == 0:
        return np.zeros_like(a)
    return (a - mu) / std

# Demo on [1,2,4]
arr = np.array([1,2,4])
print('Normalize [1,2,4]:', np.round(minmax_normalize(arr), 3))
print('Standardize [1,2,4] (population):', np.round(zscore_standardize(arr, ddof=0), 2))
print('Standardize [1,2,4] (sample):     ', np.round(zscore_standardize(arr, ddof=1), 2))

## 5) Short Answers (fill these in)

**Q1.** In your OLS RDD, what is the estimated level jump at the event (coefficient on `post`)?  
**Answer:** _<fill in>_

**Q2.** Does the slope appear to change after the event (coefficient on `t_post`)? Interpret briefly.  
**Answer:** _<fill in>_

**Q3.** Normalize the vector [1, 2, 4] using min–max.  
**Answer:** _[0.0, 0.333..., 1.0]_

**Q4.** Standardize [1, 2, 4] using z-scores. Indicate whether you used population or sample std.  
**Answer:** _Population std → roughly [-1.07, -0.27, 1.34]_

**Q5.** Include a figure that visually marks the event date and the fitted regression lines (already produced above). Paste it in your submission where requested.

---

### What to Submit
- Export this notebook as required by your course (e.g., .ipynb and/or PDF).
- Make sure you replaced the placeholder paths and `EVENT_DATE`, and that your figures render correctly.
- Keep narrative answers concise and evidence-based.

> Tip: If your grader expects `statsmodels` or scikit-learn, you can add those imports and replicate the OLS with robust SEs. The core design matrix and interpretation remain the same.