# 🔬 Causal Inference & Endogeneity Analysis

## 🎯 Learning Objectives
This template explores **causal inference** and **endogeneity issues** in marketing analytics. You'll learn to:

- Identify and address endogeneity problems in observational data
- Apply causal inference methods (propensity score matching, instrumental variables)
- Test for heterogeneous treatment effects across customer segments
- Build causal models that go beyond correlation

## 📚 Key Concepts Covered
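- **Endogeneity**: When treatment is correlated with unobserved factors that also drive the outcome, for example because assignment is not random
- **Propensity Score Matching**: Pairing treated and untreated customers with similar estimated treatment probabilities, so the groups are balanced on observed covariates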
- **Instrumental Variables**: Using exogenous variation to identify causal effects
- **Heterogeneous Treatment Effects**: Different effects for different customer segments
- **Causal Forests**: A tree-based machine learning method for estimating heterogeneous treatment effects
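
To make the endogeneity concept concrete, here is a minimal sketch on simulated data, not on this template's datasets. It assumes a single observed confounder (a hypothetical `engagement` score) that drives both treatment and spend, and it uses inverse propensity weighting, a close relative of propensity score matching, rather than matching itself. All variable names and the effect size of 2.0 are illustrative assumptions.

```python
import numpy as np
from sklearn.linear_model import LogisticRegression

rng = np.random.default_rng(0)
n = 20_000
true_effect = 2.0  # assumed ground-truth treatment effect for the simulation

# Hypothetical confounder: engaged customers are more likely to be treated
# AND spend more regardless of treatment.
engagement = rng.normal(size=n)
p_treat = 1 / (1 + np.exp(-engagement))
treated = rng.binomial(1, p_treat)
spend = true_effect * treated + 3.0 * engagement + rng.normal(size=n)

# Naive comparison of group means: biased because treatment is not random.
naive = spend[treated == 1].mean() - spend[treated == 0].mean()

# Estimate propensity scores from the observed confounder.
X = engagement.reshape(-1, 1)
ps = LogisticRegression().fit(X, treated).predict_proba(X)[:, 1]
ps = np.clip(ps, 0.01, 0.99)  # guard against extreme weights

# Normalized inverse propensity weighting estimate of the average effect.
w1 = treated / ps
w0 = (1 - treated) / (1 - ps)
ipw = np.sum(w1 * spend) / np.sum(w1) - np.sum(w0 * spend) / np.sum(w0)

print(f"True effect:    {true_effect:.2f}")
print(f"Naive estimate: {naive:.2f}")
print(f"IPW estimate:   {ipw:.2f}")
```

The naive difference in means overstates the effect because it also absorbs the engagement gap between the groups, while the weighted estimate should land close to the simulated effect. This only works here because the confounder is observed; with unobserved confounding, an instrumental variable or a similar strategy would be needed.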

---


## 1. Setup & Data Preparation


In [None]:
import pandas as pd
import numpy as np
import matplotlib.pyplot as plt
import seaborn as sns
from pathlib import Path
import warnings
warnings.filterwarnings('ignore')

# Modeling libraries used for the causal inference steps
from sklearn.linear_model import LogisticRegression
from sklearn.ensemble import RandomForestRegressor
from sklearn.preprocessing import StandardScaler
from sklearn.model_selection import train_test_split
from sklearn.metrics import mean_squared_error, r2_score
import statsmodels.api as sm
from statsmodels.stats.outliers_influence import variance_inflation_factor

# Load data
DATA = Path("../data")
transactions = pd.read_csv(DATA/"transactions.csv", parse_dates=['date'])
users = pd.read_csv(DATA/"users.csv")
products = pd.read_csv(DATA/"products.csv")
rfm = pd.read_csv(DATA/"user_rfm.csv")

print("✅ Data loaded successfully!")
print(f"📊 Transactions: {len(transactions):,} records")
print(f"👥 Users: {len(users):,} customers")
