In [37]:
import numpy as np
import pandas as pd
from sklearn.linear_model import LinearRegression

df = pd.read_csv("homework_3.1.csv")

In [38]:
results = {}
for col in ["value1", "value2", "value3"]:
    df["event"] = (df["time"] >= 50).astype(int)
    df["interaction"] = df["time"] * df["event"]
    y = df[col]


Line 1: Dictionary to hold the results for each series.

Line 2: Loops over the three series; only the indented lines below it repeat.

Line 3: event: 0 before time 50, 1 from time 50 onward.

Line 4: interaction: 0 before the event, equal to time from the event onward (used for the slope test).

Line 5: y: the current series (value1, value2, or value3). After the loop finishes, y holds value3.
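Because only the indented statements repeat, any regression fitted in a later cell sees just the final column. A minimal sketch of one way to keep both fits inside the loop so every series gets its own coefficients. The synthetic series are stand-ins (the real `homework_3.1.csv` is not reproduced here), the helper `fit_coefs` is hypothetical, and `np.linalg.lstsq` is used in place of `LinearRegression`:

```python
import numpy as np

def fit_coefs(columns, y):
    """OLS with an intercept; returns the coefficients with the intercept dropped."""
    X = np.column_stack([np.ones(len(y))] + list(columns))
    beta, *_ = np.linalg.lstsq(X, y, rcond=None)
    return beta[1:]

# Synthetic stand-in data (assumed layout: a time column plus three series).
rng = np.random.default_rng(0)
time = np.arange(100, dtype=float)
event = (time >= 50).astype(float)
interaction = time * event

def noise():
    return rng.normal(0.0, 0.1, time.size)

series = {
    "value1": 0.1 * time + noise(),                                # no break
    "value2": 0.1 * time + 2.0 * event + noise(),                  # level jump of 2
    "value3": 0.1 * time + 0.05 * (time - 50) * event + noise(),   # slope change of 0.05
}

results = {}
for col, y in series.items():
    # Both fits stay inside the loop, so each column is analysed separately.
    jump_coef = fit_coefs([time, event], y)[1]
    slope_coef = fit_coefs([time, event, interaction], y)[2]
    results[col] = {"jump_coef": jump_coef, "slope_coef": slope_coef}

for col, row in results.items():
    print(col, round(row["jump_coef"], 3), round(row["slope_coef"], 3))
```

With the homework data, the same loop body would use `df[col]` for `y` and the `df` columns for `time`, `event`, and `interaction`.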

For questions 1 and 2: 

Given a dataset with time series data containing an event, use a linear regression to test whether there was a discontinuity in the data at the event. Consider the possibility, first, of a discontinuity only in the value of the variable but not the derivative. Then consider that there may be a discontinuity in the first derivative (the slope).  
Use the file homework_3.1.csv. 
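
In symbols (notation introduced here for illustration), let $t$ be the time and let $D_t = 1$ when $t \ge 50$ and $D_t = 0$ otherwise. The two regressions the questions describe can then be written as

$$
y_t = \beta_0 + \beta_1 t + \beta_2 D_t + \varepsilon_t
$$

$$
y_t = \beta_0 + \beta_1 t + \beta_2 D_t + \beta_3\, t D_t + \varepsilon_t
$$

In the first equation the slope $\beta_1$ is shared by both sides of the event and $\beta_2$ is the shift in level. In the second, the slope is $\beta_1$ before the event and $\beta_1 + \beta_3$ after it, so $\beta_3$ is the change in the first derivative.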

## **Model 1: test for a jump in the level (value discontinuity)**

In [39]:
X1 = df[["time", "event"]]
model1 = LinearRegression().fit(X1, y)
jump_coef = model1.coef_[1]



Uses time and event as predictors, so the fitted line has the same slope before and after the event.

jump_coef is the coefficient on event: the estimated shift in level at the event.
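
To see what `jump_coef` picks up, here is a small sketch on noise-free synthetic data with a known level shift of 3 at time 50. The data are an assumption for illustration, not the homework file, and `np.linalg.lstsq` stands in for `LinearRegression`:

```python
import numpy as np

time = np.arange(100, dtype=float)
event = (time >= 50).astype(float)
y = 2.0 + 0.5 * time + 3.0 * event  # same slope on both sides, level shifts by 3

# Columns: intercept, time, event (LinearRegression would add the intercept itself).
X = np.column_stack([np.ones_like(time), time, event])
beta, *_ = np.linalg.lstsq(X, y, rcond=None)
intercept, time_coef, jump_coef = beta

print("time_coef:", round(time_coef, 6), "jump_coef:", round(jump_coef, 6))
```

The fit recovers the slope of 0.5 and the shift of 3 up to floating-point error, which is the sense in which the coefficient on `event` measures the jump.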

## **Model 2: test for a change in slope (derivative discontinuity)**

In [40]:
X2 = df[["time", "event", "interaction"]]
model2 = LinearRegression().fit(X2, y)
slope_coef = model2.coef_[2]


Adds interaction so the slope is allowed to differ after the event.

slope_coef is the coefficient on interaction: the estimated change in slope after the event.
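
One subtlety is worth a sketch. Once `interaction = time * event` is in the model, the coefficient on `event` is the gap between the two fitted lines extrapolated back to time 0, not the gap at the event. The gap at time 50 is `event_coef + 50 * slope_coef`. The synthetic series below (an assumption, not the homework data) is continuous at 50 but changes slope from 0.2 to 0.5, and `np.linalg.lstsq` stands in for `LinearRegression`:

```python
import numpy as np

time = np.arange(100, dtype=float)
event = (time >= 50).astype(float)
interaction = time * event

# Continuous at t = 50; slope 0.2 before the event and 0.5 after it.
y = 1.0 + 0.2 * time + 0.3 * (time - 50.0) * event

X = np.column_stack([np.ones_like(time), time, event, interaction])
beta, *_ = np.linalg.lstsq(X, y, rcond=None)
_, time_coef, event_coef, slope_coef = beta

# Gap between the "after" and "before" lines evaluated at the event itself.
jump_at_event = event_coef + 50.0 * slope_coef

print("slope_coef:", round(slope_coef, 6))
print("event_coef:", round(event_coef, 6))
print("jump at t=50:", round(jump_at_event, 6))
```

Here the `event` coefficient is about -15 even though the series has no jump at all. Centering the interaction as `(time - 50) * event` leaves `slope_coef` unchanged and makes the `event` coefficient read directly as the jump at the event.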

In [41]:
results[col] = {"jump_coef": jump_coef, "slope_coef": slope_coef}
print(pd.DataFrame(results).T)


        jump_coef  slope_coef
value3   1.767254    0.050695


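Stores the two coefficients under the current column name.

Converts the dictionary to a table and prints it. Only value3 appears because the two models were fitted after the loop had finished, when y and col both referred to value3.

Largest jump_coef in absolute value → strongest discontinuity in value (Q1), once all three series have been fitted.

Largest slope_coef in absolute value → strongest discontinuity in slope (Q2).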
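
Coefficient size alone does not say whether a break is distinguishable from noise. A rough sketch of the usual OLS t-statistics, computed by hand with NumPy on synthetic data. The data, the noise level, and the helper name `ols_t_stats` are assumptions for illustration; scikit-learn's `LinearRegression` does not report standard errors itself:

```python
import numpy as np

def ols_t_stats(X, y):
    """Return (coefficients, t-statistics) for OLS; X must include an intercept column."""
    n, k = X.shape
    beta, *_ = np.linalg.lstsq(X, y, rcond=None)
    resid = y - X @ beta
    sigma2 = resid @ resid / (n - k)        # unbiased estimate of the residual variance
    cov = sigma2 * np.linalg.inv(X.T @ X)   # classical OLS covariance of the coefficients
    return beta, beta / np.sqrt(np.diag(cov))

rng = np.random.default_rng(0)
time = np.arange(100, dtype=float)
event = (time >= 50).astype(float)
y = 0.1 * time + 5.0 * event + rng.normal(0.0, 1.0, time.size)  # true jump of 5

X = np.column_stack([np.ones_like(time), time, event])
beta, t_stats = ols_t_stats(X, y)
print("jump_coef:", round(beta[2], 3), "t:", round(t_stats[2], 2))
```

A t-statistic well above 2 in absolute value is the conventional rough signal that a coefficient differs from zero. That reading relies on the classical OLS assumptions of independent, constant-variance errors, which time series often violate.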

For questions 3 to 5:  

Given a dataset with treatment and control data having “before” and “after” parts, apply a differences-in-differences regression.  
Use homework_3.2.a.csv and homework_3.2.b.csv. 
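
In symbols (notation introduced here for illustration), let $G_i$ be the group indicator and $T_i$ the time indicator for observation $i$. The differences-in-differences regression is

$$
y_i = \beta_0 + \beta_1 G_i + \beta_2 T_i + \beta_3\, G_i T_i + \varepsilon_i
$$

Writing $\bar{y}_{g,t}$ for the mean outcome in group $g$ and period $t$, the fitted interaction coefficient equals

$$
\hat{\beta}_3 = (\bar{y}_{1,1} - \bar{y}_{1,0}) - (\bar{y}_{0,1} - \bar{y}_{0,0})
$$

that is, the change in the treatment group minus the change in the control group.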

File A (homework_3.2.a.csv)
Columns:

group1 → 0 = control, 1 = treatment

time1 → 0 = before, 1 = after

outcome1 → measured outcome


File B (homework_3.2.b.csv)
Columns:

group2 → 0 = control, 1 = treatment

time2 → 0 = before, 1 = after

outcome2 → measured outcome

## **Questions 3 to 5: Differences-in-Differences Code**

In [42]:
a = pd.read_csv("homework_3.2.a.csv")
b = pd.read_csv("homework_3.2.b.csv")


In [43]:
def run_did(df, group_col, time_col, outcome_col):
    df["interaction"] = df[group_col] * df[time_col]
    X = df[[group_col, time_col, "interaction"]]
    y = df[outcome_col]
    model = LinearRegression().fit(X, y)
    return model.coef_[2]
did_a = run_did(a.copy(), "group1", "time1", "outcome1")
did_b = run_did(b.copy(), "group2", "time2", "outcome2")
print("Differences-in-Differences estimate for A:", did_a)
print("Differences-in-Differences estimate for B:", did_b)

Differences-in-Differences estimate for A: 0.685846968992887
Differences-in-Differences estimate for B: 1.3498589246939958


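The line numbers below count the pandas and scikit-learn imports from the first cell as lines 1 and 2, the two reads in cell In [42] as lines 3 and 4, and cell In [43] as lines 5 to 14.

Line 1: Imports pandas to load and manipulate CSV data.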

Line 2: Imports LinearRegression (scikit-learn) to fit the DiD regression (OLS with an intercept by default).

Line 3: Reads homework_3.2.a.csv into DataFrame a.

Line 4: Reads homework_3.2.b.csv into DataFrame b.

Line 5: Starts a helper function run_did that will compute the DiD estimate for a given DataFrame and column names.

Line 6: Creates the interaction term (treatment * post), which equals 1 only for treated units after the intervention; its coefficient is the DiD estimate.

Line 7: Builds the feature matrix X with three regressors: treatment dummy (group_col), post-period dummy (time_col), and the interaction. (Intercept is handled automatically by LinearRegression.)

Line 8: Sets the target vector y to the specified outcome column.

Line 9: Fits the linear model y ~ group + time + group*time and stores the fitted model.

Line 10: Returns the interaction coefficient (index 2), which is the Differences-in-Differences estimate.

Line 11: Runs DiD on dataset A (using .copy() so adding the interaction column doesn’t mutate the original a) with its column names.

Line 12: Runs DiD on dataset B similarly.

Line 13: Prints the DiD estimate for A so you can read it off for the quiz.

Line 14: Prints the DiD estimate for B.
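
As a cross-check on what the interaction coefficient means, the sketch below compares the regression estimate with the difference of the four cell means. The data are synthetic with an assumed true effect of 1.5 (not the homework files), the names `group`, `period`, and `cell_mean` are hypothetical, and `np.linalg.lstsq` stands in for `LinearRegression`:

```python
import numpy as np

rng = np.random.default_rng(1)
n = 400
group = rng.integers(0, 2, n).astype(float)   # 0 = control, 1 = treatment
period = rng.integers(0, 2, n).astype(float)  # 0 = before, 1 = after

# Assumed true effect of 1.5 for treated units in the "after" period.
outcome = (2.0 + 0.5 * group + 1.0 * period
           + 1.5 * group * period + rng.normal(0.0, 1.0, n))

# Regression route: intercept, group, period, interaction.
X = np.column_stack([np.ones(n), group, period, group * period])
beta, *_ = np.linalg.lstsq(X, outcome, rcond=None)
did_regression = beta[3]

# Cell-means route: (treated after - treated before) - (control after - control before).
def cell_mean(g, t):
    return outcome[(group == g) & (period == t)].mean()

did_means = (cell_mean(1, 1) - cell_mean(1, 0)) - (cell_mean(0, 1) - cell_mean(0, 0))

print("regression:", round(did_regression, 4), "cell means:", round(did_means, 4))
```

The two numbers agree up to floating-point error because the model with both dummies and their interaction fits each of the four cell means exactly.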