# CausalPilot: A Comprehensive Tutorial 🚀

Welcome to the **CausalPilot** tutorial! This notebook will guide you through the framework's core features, including the new **Natural Language Interface**.

## What We'll Cover
1. **Setup**: Importing CausalPilot and its supporting libraries.
2. **Data Loading**: Using built-in datasets (IHDP).
3. **Natural Language Modeling**: Defining a causal model using plain English.
4. **Estimation**: Using DoubleML and Causal Forests.
5. **Visualization**: Plotting causal graphs and effects.


In [None]:
# Import libraries
import numpy as np
import pandas as pd
import matplotlib.pyplot as plt
import causalpilot as cp
from causalpilot.core import CausalModel
from causalpilot.datasets import load_ihdp

# Set the random seed for reproducibility and choose a plot style
np.random.seed(42)
plt.style.use('seaborn-v0_8-whitegrid')  # this style name requires matplotlib >= 3.6

## 1. Load Data

We'll use the **IHDP** (Infant Health and Development Program) dataset, a standard benchmark for causal inference. It is used to estimate the effect of specialist home visits on children's cognitive test scores.

In [None]:
ihdp_data = load_ihdp()
print(f"Dataset Shape: {ihdp_data.shape}")
ihdp_data.head()
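
Before modeling, it helps to see why a benchmark like this needs causal methods at all. The sketch below uses synthetic data rather than IHDP: the variable names, the data-generating process, and the true effect of 4.0 are all assumptions made for illustration. It compares a raw difference in means against a regression that adjusts for the confounder.

```python
import numpy as np

# Synthetic, confounded data (NOT the IHDP data): x drives both treatment and outcome.
rng = np.random.default_rng(1)
n = 5000
x = rng.normal(size=n)
t = rng.binomial(1, 1.0 / (1.0 + np.exp(-2.0 * x)))  # units with high x are treated more often
y = 4.0 * t + 3.0 * x + rng.normal(size=n)            # assumed true effect: 4.0

# Naive estimate: raw difference in mean outcomes between the two groups.
naive_diff = y[t == 1].mean() - y[t == 0].mean()

# Adjusted estimate: coefficient on t in a linear regression of y on [1, t, x].
design = np.column_stack([np.ones(n), t, x])
coef, *_ = np.linalg.lstsq(design, y, rcond=None)
adjusted = coef[1]

print(f"Naive difference in means: {naive_diff:.2f}")
print(f"Regression-adjusted effect: {adjusted:.2f}")
```

The naive comparison absorbs the effect of `x` and overshoots the assumed true effect, while the adjusted estimate lands close to it. The same kind of check can be run on any frame that has treatment, outcome, and covariate columns.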

## 2. Define Causal Model (Natural Language)

Instead of manually defining edges, we can describe the problem in plain English. CausalPilot parses the description (currently with a mock LLM) and builds the causal graph from it.

In [None]:
# Initialize model using Natural Language
model = CausalModel.from_natural_language(
    data=ihdp_data,
    query="I want to estimate the effect of 'treatment' on 'outcome', controlling for all other covariates."
)

# Visualize the generated graph
from causalpilot.visualization import plot_causal_graph
plot_causal_graph(model.graph, title="Generated Causal Graph")
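
To make the idea concrete, here is a minimal sketch of one plausible graph for a query like the one above: every remaining column is treated as a confounder that points at both the treatment and the outcome. The helper `build_backdoor_graph` and the covariate names are hypothetical. This is not CausalPilot's parser, and the graph it actually produces may differ.

```python
def build_backdoor_graph(columns, treatment="treatment", outcome="outcome"):
    """Return directed edges where every other column confounds treatment -> outcome."""
    if treatment not in columns or outcome not in columns:
        raise ValueError("treatment and outcome must both be present in columns")
    covariates = [c for c in columns if c not in (treatment, outcome)]
    edges = [(treatment, outcome)]
    for covariate in covariates:
        edges.append((covariate, treatment))  # covariate influences who gets treated
        edges.append((covariate, outcome))    # covariate also influences the outcome
    return edges

# Hypothetical column names, used only for illustration.
columns = ["x1", "x2", "x3", "treatment", "outcome"]
edges = build_backdoor_graph(columns)
for source, target in edges:
    print(f"{source} -> {target}")
```

With `k` covariates this yields `2k + 1` edges, so comparing the edge list against the plotted graph is a quick way to confirm that the parsed model matches what you meant.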

## 3. Estimate Causal Effect

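Now that we have our model, we can estimate the **Average Treatment Effect (ATE)**. We'll use **DoubleML** (double machine learning), which stays reliable with many confounders because it models the treatment and the outcome separately before estimating the effect.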

In [None]:
# Estimate using DoubleML
result = model.estimate_effect(method='doubleml')

print(f"Estimated ATE (DoubleML): {result['ate']:.3f}")
print(f"95% Confidence Interval: {result['ci']}")
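
For intuition about what a double machine learning estimator does, the sketch below implements the partialling-out idea with cross-fitting on synthetic data. It is a simplified illustration and not CausalPilot's implementation: it uses plain linear regressions as nuisance models where a real run would typically use flexible learners, and the function names, the data, and the true effect of 4.0 are assumptions.

```python
import numpy as np

def out_of_fold_residuals(X, target, n_folds=5, seed=0):
    """Residuals of target after a linear fit on X, predicted on held-out folds."""
    n = len(target)
    design = np.column_stack([np.ones(n), X])
    folds = np.array_split(np.random.default_rng(seed).permutation(n), n_folds)
    residuals = np.empty(n)
    for held_out in folds:
        train = np.ones(n, dtype=bool)
        train[held_out] = False
        coef, *_ = np.linalg.lstsq(design[train], target[train], rcond=None)
        residuals[held_out] = target[held_out] - design[held_out] @ coef
    return residuals

def dml_ate(X, t, y, n_folds=5, seed=0):
    """Partialling-out estimate of the treatment effect with a 95% interval."""
    t_res = out_of_fold_residuals(X, t, n_folds, seed)  # treatment not explained by X
    y_res = out_of_fold_residuals(X, y, n_folds, seed)  # outcome not explained by X
    theta = (t_res @ y_res) / (t_res @ t_res)           # residual-on-residual regression
    score = t_res * (y_res - theta * t_res)
    se = np.sqrt(np.mean(score ** 2) / len(y)) / np.mean(t_res ** 2)
    return theta, (theta - 1.96 * se, theta + 1.96 * se)

# Synthetic data (NOT the IHDP data) with an assumed true effect of 4.0.
rng = np.random.default_rng(42)
n = 2000
X = rng.normal(size=(n, 5))
propensity = 1.0 / (1.0 + np.exp(-X[:, 0]))
t = rng.binomial(1, propensity).astype(float)
y = 4.0 * t + X @ np.array([1.0, -0.5, 0.3, 0.0, 0.2]) + rng.normal(size=n)

ate, ci = dml_ate(X, t, y)
print(f"Sketch ATE: {ate:.3f}, 95% CI: ({ci[0]:.3f}, {ci[1]:.3f})")
```

Fitting the nuisance models on one set of folds and computing residuals on the held-out fold is what keeps overfitting in those models from leaking into the effect estimate.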

## 4. Heterogeneous Effects (Causal Forest)

Does the treatment work better for some children than others? We can use a **Causal Forest** to find out.

In [None]:
from causalpilot.inference import CausalForest

# Split the data into covariates, treatment, and outcome
X = ihdp_data.drop(columns=['treatment', 'outcome'])
t = ihdp_data['treatment']
y = ihdp_data['outcome']

# Initialize and fit Causal Forest
cf = CausalForest(n_estimators=100, random_state=42)
cf.fit(X, t, y)

# Predict individual effects (CATE)
cate = cf.predict(X)

# Plot distribution of effects
plt.figure(figsize=(10, 6))
plt.hist(cate, bins=30, alpha=0.7, color='teal')
plt.title('Distribution of Individual Treatment Effects')
plt.xlabel('Treatment Effect')
plt.ylabel('Frequency')
plt.axvline(cate.mean(), color='red', linestyle='--', label=f'Mean: {cate.mean():.2f}')
plt.legend()
plt.show()
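
To see what a conditional effect estimate looks like when the ground truth is known, the sketch below swaps the causal forest for a much simpler technique, a T-learner with linear models, and runs it on synthetic data. The helper names, the data, and the assumed effect `1 + 2 * x0` are illustrative only. It does not reproduce how `CausalForest` works internally.

```python
import numpy as np

def fit_linear(X, y):
    """Least-squares coefficients for y on [1, X]."""
    design = np.column_stack([np.ones(len(y)), X])
    coef, *_ = np.linalg.lstsq(design, y, rcond=None)
    return coef

def predict_linear(coef, X):
    return np.column_stack([np.ones(len(X)), X]) @ coef

def t_learner_cate(X, t, y):
    """T-learner: fit one outcome model per arm and take the difference in predictions."""
    treated = t == 1
    coef_treated = fit_linear(X[treated], y[treated])
    coef_control = fit_linear(X[~treated], y[~treated])
    return predict_linear(coef_treated, X) - predict_linear(coef_control, X)

# Synthetic randomized data (NOT the IHDP data): the effect grows with the first covariate.
rng = np.random.default_rng(0)
n = 2000
X = rng.normal(size=(n, 3))
t = rng.binomial(1, 0.5, size=n)
true_cate = 1.0 + 2.0 * X[:, 0]
y = X[:, 1] + true_cate * t + rng.normal(scale=0.5, size=n)

cate_hat = t_learner_cate(X, t, y)
correlation = np.corrcoef(cate_hat, true_cate)[0, 1]
print(f"Mean estimated effect: {cate_hat.mean():.2f}")
print(f"Correlation with true effect: {correlation:.3f}")
```

A wide histogram of estimated effects is only meaningful if the spread tracks real variation, so checking an estimator against known effects like this is a useful habit before interpreting results on real data.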