# Predictive Modeling vs. Causal Inference in One Notebook

This notebook demonstrates how to:
- Use **scikit-learn** for predictive regression
- Use **DoWhy** for causal effect estimation

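We simulate data in which `study_hours` affects `score`, and `attendance` confounds that relationship by influencing both. By construction, the true effect of one additional study hour on `score` is 5.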

In [None]:
import pandas as pd
import numpy as np
import matplotlib.pyplot as plt

from sklearn.linear_model import LinearRegression
from sklearn.model_selection import train_test_split

# Install DoWhy if necessary
# !pip install dowhy
from dowhy import CausalModel

# Set seed for reproducibility
np.random.seed(0)

In [None]:
# Generate synthetic data
n = 500
attendance = np.random.normal(0, 1, n)
study_hours = 2 * attendance + np.random.normal(0, 1, n)
score = 5 * study_hours + 3 * attendance + np.random.normal(0, 2, n)

df = pd.DataFrame({'study_hours': study_hours, 'attendance': attendance, 'score': score})
df.head()

## 1. Predictive Modeling with scikit-learn

In [None]:
# Prepare data
X = df[['study_hours', 'attendance']]
y = df['score']

X_train, X_test, y_train, y_test = train_test_split(X, y, test_size=0.2, random_state=0)

# Fit linear regression
model = LinearRegression()
model.fit(X_train, y_train)

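# Coefficients follow the column order of X: [study_hours, attendance]
print("Coefficients:", model.coef_)
print("Intercept:", model.intercept_)

# Evaluate on the held-out split (R^2 on unseen data)
print("Test R^2:", model.score(X_test, y_test))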

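The coefficients describe how the predicted `score` changes per unit change in each predictor, holding the other fixed. On their own they are associational: fitting a regression does not establish that a coefficient is a causal effect. Here the `study_hours` coefficient should land near 5 only because `attendance` is included as a predictor; reading it causally requires assumptions about the data-generating process, which the next section states explicitly.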
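
To make the distinction concrete, the sketch below fits the same regression with and without `attendance`. It is an illustration under stated assumptions rather than part of the original notebook: it regenerates data from the same equations with its own random generator and a larger sample, uses plain NumPy least squares so it runs standalone, and the helper name `ols_coefficients` is made up for this example. Since Cov(`attendance`, `study_hours`) / Var(`study_hours`) = 2 / 5 in this simulation, the coefficient without `attendance` should come out near 5 + 3 × 0.4 = 6.2 rather than the true 5.

```python
import numpy as np

# Same data-generating equations as the notebook; the generator and the larger
# sample size are choices made for this sketch so the estimates are stable.
rng = np.random.default_rng(0)
n = 5000
attendance = rng.normal(0, 1, n)
study_hours = 2 * attendance + rng.normal(0, 1, n)
score = 5 * study_hours + 3 * attendance + rng.normal(0, 2, n)


def ols_coefficients(columns, y):
    """Least-squares coefficients of y on the given columns plus an intercept."""
    design = np.column_stack([np.ones(len(y))] + list(columns))
    coef, *_ = np.linalg.lstsq(design, y, rcond=None)
    return coef[1:]  # drop the intercept


# Omitting the confounder lets study_hours absorb part of attendance's effect.
naive_slope = ols_coefficients([study_hours], score)[0]

# Including the confounder recovers a coefficient close to the true effect.
adjusted_slope = ols_coefficients([study_hours, attendance], score)[0]

print(f"study_hours coefficient without attendance: {naive_slope:.2f}")
print(f"study_hours coefficient with attendance:    {adjusted_slope:.2f}")
```

Both fits can predict `score` reasonably well, so predictive accuracy alone does not reveal which coefficient deserves a causal reading.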

## 2. Causal Inference with DoWhy

In [None]:
# Define causal model
causal_model = CausalModel(
    data=df,
    treatment='study_hours',
    outcome='score',
    common_causes=['attendance']
)

# Visualize causal graph (optional)
# causal_model.view_model()

# Identify the causal effect
identified_estimand = causal_model.identify_effect()
print(identified_estimand)

# Estimate causal effect using linear regression (backdoor adjustment)
causal_estimate = causal_model.estimate_effect(
    identified_estimand,
    method_name='backdoor.linear_regression'
)
print('Causal Estimate:', causal_estimate.value)

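The printed value is the estimated **causal** effect of `study_hours` on `score`, adjusted for `attendance`. Because the simulation sets the true effect to 5, the estimate should be close to 5. The causal interpretation rests on the assumption encoded in the model: `attendance` is the only common cause of the treatment and the outcome.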
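
For a linear model, backdoor adjustment amounts to removing the confounder's contribution from both treatment and outcome, then relating what is left. The sketch below does this by hand with NumPy (the partialling-out, or Frisch-Waugh-Lovell, approach) and checks that it matches the treatment coefficient from a joint regression. It is a rough illustration of the idea, not DoWhy's implementation; the data are regenerated from the notebook's equations with a separate generator and larger sample, and the helper `residualize` is a hypothetical name.

```python
import numpy as np

# Data from the notebook's equations; generator and sample size are
# assumptions of this sketch.
rng = np.random.default_rng(1)
n = 5000
attendance = rng.normal(0, 1, n)
study_hours = 2 * attendance + rng.normal(0, 1, n)
score = 5 * study_hours + 3 * attendance + rng.normal(0, 2, n)


def residualize(y, x):
    """Residuals of y after a least-squares fit on x plus an intercept."""
    design = np.column_stack([np.ones(len(x)), x])
    coef, *_ = np.linalg.lstsq(design, y, rcond=None)
    return y - design @ coef


# Step 1: strip out the part of treatment and outcome explained by the confounder.
hours_resid = residualize(study_hours, attendance)
score_resid = residualize(score, attendance)

# Step 2: slope of the outcome residual on the treatment residual.
effect_partialled = (hours_resid @ score_resid) / (hours_resid @ hours_resid)

# Cross-check: coefficient on study_hours in a regression that includes attendance.
joint_design = np.column_stack([np.ones(n), study_hours, attendance])
effect_joint = np.linalg.lstsq(joint_design, score, rcond=None)[0][1]

print(f"Partialled-out effect:        {effect_partialled:.3f}")
print(f"Joint-regression coefficient: {effect_joint:.3f}")
```

The two numbers agree up to floating-point error, which is why the adjusted regression in Section 1 and the backdoor estimate give matching values in this linear setting. What DoWhy adds is the explicit statement of the causal assumptions that justify the adjustment.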

## Summary

| Approach            | Tool         | Purpose                                             |
|---------------------|--------------|-----------------------------------------------------|
| Predictive Modeling | scikit-learn | Fit a regression to predict outcomes                |
| Causal Inference    | DoWhy        | Estimate a treatment effect via backdoor adjustment |
