# 04: Causal Validation

This notebook uses DoWhy to estimate a causal effect in the Silver-layer data and to check, through refutation tests, whether that estimate holds up against the assumed causal relationships.

In [None]:
import sys
# Make the project's src/ directory importable from this notebook
sys.path.insert(0, '../src')

from pyspark.sql import SparkSession
from faircare.silver.causalanalysis import CausalAnalyzer
import yaml

In [None]:
spark = SparkSession.builder.appName("FAIR-CARE-Causal").getOrCreate()

with open('../configs/default.yaml', 'r') as f:
    config = yaml.safe_load(f)

dataset_config = config['datasets']['compas']

## Load Silver Data

In [None]:
silver_df = spark.read.format("delta").load(dataset_config['silver_path'])
print(f"Silver records: {silver_df.count()}")

## Causal Analysis

We test the causal relationship: **Race → Recidivism**

According to fairness principles, race should NOT have a direct causal effect on recidivism.

In [None]:
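# Shallow copy of the dataset config; nested values are still shared with dataset_config
causal_config = dataset_config.copy()
analyzer = CausalAnalyzer(causal_config)
# Run the causal analysis on the Silver data and collect the results as a dict
causal_report = analyzer.analyze(silver_df)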

print("\nCausal Analysis Report:")
for key, value in causal_report.items():
    print(f"  {key}: {value}")
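
The internals of `CausalAnalyzer` are not shown in this notebook, so the following is only a sketch of the idea behind one common kind of refutation test, the placebo treatment check. It is hand-rolled in plain Python on synthetic data and is not DoWhy's implementation. The function names, the difference-in-means estimator, and the way the p-value is computed are all assumptions made for illustration, and DoWhy's own refuters may compute their p-values differently.

```python
import random
import statistics


def difference_in_means(treatment, outcome):
    """Naive effect estimate: mean outcome of treated minus mean outcome of untreated."""
    treated = [y for t, y in zip(treatment, outcome) if t == 1]
    control = [y for t, y in zip(treatment, outcome) if t == 0]
    return statistics.mean(treated) - statistics.mean(control)


def placebo_refutation(treatment, outcome, num_simulations=500, seed=0):
    """Re-estimate the effect after replacing the treatment with a shuffled placebo.

    Returns (original_estimate, mean_placebo_estimate, p_value). The p-value
    (hypothetical definition for this sketch) asks whether zero is a plausible
    value for the placebo estimates: two-sided, based on how the simulated
    placebo estimates fall around zero.
    """
    rng = random.Random(seed)
    original = difference_in_means(treatment, outcome)

    placebo_estimates = []
    for _ in range(num_simulations):
        shuffled = list(treatment)
        rng.shuffle(shuffled)  # breaks any real link between treatment and outcome
        placebo_estimates.append(difference_in_means(shuffled, outcome))

    at_or_below = sum(e <= 0 for e in placebo_estimates) / num_simulations
    at_or_above = sum(e >= 0 for e in placebo_estimates) / num_simulations
    p_value = min(1.0, 2 * min(at_or_below, at_or_above))
    return original, statistics.mean(placebo_estimates), p_value


# Synthetic data only: a binary treatment with a true effect of 2.0 plus noise.
data_rng = random.Random(42)
treatment = [i % 2 for i in range(400)]
outcome = [2.0 * t + data_rng.gauss(0, 1) for t in treatment]

original, placebo_mean, p_value = placebo_refutation(treatment, outcome)
print(f"original estimate: {original:.3f}")
print(f"mean placebo estimate: {placebo_mean:.3f}")
print(f"placebo p-value: {p_value:.3f}")
```

If the estimator behaves sensibly, the placebo estimates should scatter around zero while the original estimate stays near the true effect. A placebo effect far from zero would suggest the estimate is picking up something other than the treatment.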

## Interpretation

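- **Causal Estimate**: The estimated size of the effect of race on recidivism. Under the fairness assumption above, this should be close to zero.
- **Refutation p-value**: If > 0.05, the refutation test found no significant evidence against the estimate. This is a failure to refute, not proof that the causal model is correct.
- **Causal Validity**: PASS if none of the refutation tests rejects the estimate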
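
The pass/fail rule in the list above can be sketched as a small helper. The actual keys and structure of `causal_report` are not shown in this notebook, so the function name `refutation_verdict`, the dictionary layout, and the example p-values below are hypothetical and purely illustrative, not results from this analysis.

```python
def refutation_verdict(p_values, alpha=0.05):
    """Return ('PASS' or 'FAIL', failed_tests) for a dict of refutation p-values.

    A test counts as failed when its p-value is at or below alpha, meaning the
    refuter found significant evidence against the estimate.
    """
    if not p_values:
        raise ValueError("at least one refutation p-value is required")
    failed = {name: p for name, p in p_values.items() if p <= alpha}
    verdict = "PASS" if not failed else "FAIL"
    return verdict, failed


# Hypothetical p-values, made up for illustration only.
example_p_values = {"placebo_treatment": 0.41, "random_common_cause": 0.03}
verdict, failed = refutation_verdict(example_p_values)
print(verdict, failed)
```

Returning the failed tests alongside the verdict makes it easier to see which assumption a FAIL points at, rather than only that something went wrong.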

## Summary

Causal validation complete:
- ✅ Causal model defined
- ✅ Effect estimated
- ✅ Refutation tests performed

**Next**: Proceed to notebook 05 for Gold layer fairness.

In [None]:
spark.stop()