# Venturalitica Tutorial: Training with Governance

This notebook demonstrates how to integrate fairness and performance checks into your ML workflow using the **Venturalitica SDK**.

### Objectives:
1. **Pre-training Audit**: Detect data bias before training.
2. **Training**: Train a basic model while tracking duration and emissions.
3. **Post-training Audit**: Verify model fairness and performance on test data.

In [None]:
from ucimlrepo import fetch_ucirepo
from sklearn.model_selection import train_test_split
from sklearn.ensemble import RandomForestClassifier
import venturalitica as vl
import pandas as pd
import yaml

# Report the installed version; fall back to "unknown" rather than guessing a number.
print("Venturalitica version:", getattr(vl, "__version__", "unknown"))

## 1. Load Data
We use the **UCI German Credit** dataset, a classic benchmark for credit scoring fairness.

In [None]:
dataset = fetch_ucirepo(id=144)
df = dataset.data.features.copy()
df['class'] = dataset.data.targets

train_df, test_df = train_test_split(df, test_size=0.2, random_state=42)
print(f"Loaded {len(df)} samples.")
df.head()

## 2. Defining Policies (OSCAL)
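We'll define a simple policy with four controls: **Class Imbalance** in the target, **Disparate Impact** across gender and across age, and a minimum model **accuracy**. Each control names a metric, a threshold, and a comparison operator.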
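
To make the disparate impact thresholds concrete, here is a minimal sketch of the usual definition: the favorable-outcome rate of the unprivileged group divided by that of the privileged group. The function name, arguments, and toy data are hypothetical, and the SDK's own computation may differ in detail.

```python
def disparate_impact(outcomes, groups, favorable, unprivileged, privileged):
    """Ratio of favorable-outcome rates: unprivileged group over privileged group."""
    def rate(group):
        member_outcomes = [o for o, g in zip(outcomes, groups) if g == group]
        if not member_outcomes:
            raise ValueError(f"no samples for group {group!r}")
        return sum(o == favorable for o in member_outcomes) / len(member_outcomes)

    privileged_rate = rate(privileged)
    if privileged_rate == 0:
        raise ValueError("privileged group has no favorable outcomes")
    return rate(unprivileged) / privileged_rate


# Toy data (hypothetical, not taken from the German Credit dataset).
outcomes = [1, 1, 1, 2, 1, 2, 2, 1]  # 1 = favorable, 2 = unfavorable
groups = ["a", "a", "a", "a", "b", "b", "b", "b"]

di = disparate_impact(outcomes, groups, favorable=1, unprivileged="b", privileged="a")
print(f"Disparate impact: {di:.2f}")
```

Here group `a` has a favorable rate of 3/4 and group `b` of 2/4, so the ratio is about 0.67, which would fail a `DI > 0.8` control and pass a `DI > 0.5` one.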

In [None]:
policy_yaml = """
assessment-plan:
  metadata:
    title: "Credit Governance Policy"
  reviewed-controls:
    control-implementations:
      - description: "Bias and Performance rules"
        implemented-requirements:
          - control-id: class-balance
            description: "Minority class representation"
            props:
              - name: metric_key
                value: class_imbalance
              - name: threshold
                value: "0.2"
              - name: operator
                value: gt
          - control-id: gender-disparate
            description: "Gender fairness (DI > 0.8)"
            props:
              - name: metric_key
                value: disparate_impact
              - name: threshold
                value: "0.8"
              - name: operator
                value: gt
              - name: "input:dimension"
                value: gender
          - control-id: age-disparate
            description: "Age fairness (DI > 0.5)"
            props:
              - name: metric_key
                value: disparate_impact
              - name: threshold
                value: "0.5"
              - name: operator
                value: gt
              - name: "input:dimension"
                value: age
          - control-id: accuracy-score
            description: "Model utility (Accuracy > 70%)"
            props:
              - name: metric_key
                value: accuracy_score
              - name: threshold
                value: "0.7"
              - name: operator
                value: gt
"""

with open("policy.oscal.yaml", "w") as f:
    f.write(policy_yaml)
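
As a rough illustration of how a control's `props` could be turned into a pass/fail decision, the sketch below compares a metric value against the `threshold` using the named `operator`. This is not the SDK's implementation: `control_passes` and the `OPERATORS` table are hypothetical, and only `gt` actually appears in the policy above.

```python
import operator

# Hypothetical mapping from operator names to comparisons.
# Only "gt" is used in the policy; "lt" is included as an assumed counterpart.
OPERATORS = {"gt": operator.gt, "lt": operator.lt}


def control_passes(props, metric_value):
    """Return True if metric_value satisfies the control described by props."""
    settings = {p["name"]: p["value"] for p in props}
    compare = OPERATORS[settings["operator"]]
    # Thresholds are stored as strings in the policy, so convert before comparing.
    return compare(metric_value, float(settings["threshold"]))


# Same props as the gender-disparate control in the policy.
gender_props = [
    {"name": "metric_key", "value": "disparate_impact"},
    {"name": "threshold", "value": "0.8"},
    {"name": "operator", "value": "gt"},
    {"name": "input:dimension", "value": "gender"},
]

print(control_passes(gender_props, 0.85))
print(control_passes(gender_props, 0.75))
```

Note that `gt` is a strict comparison, so a metric exactly equal to the threshold would not pass under this reading.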

## 3. Pre-Training Audit
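We run the audit on the **training set** to check the data for imbalance and group disparities before any model is trained. The `gender` and `age` keyword arguments map the dimensions named in the policy to dataset columns: `Attribute9` (personal status and sex) and `Attribute13` (age in years).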

In [None]:
vl.enforce(
    data=train_df,
    target="class",
    gender="Attribute9",
    age="Attribute13",
    policy="policy.oscal.yaml"
)
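
The policy does not spell out how `class_imbalance` is computed. One plausible reading, given the description "Minority class representation", is the share of samples that belong to the least frequent class. The sketch below assumes that reading; `minority_share` is a hypothetical helper, not an SDK function.

```python
from collections import Counter


def minority_share(labels):
    """Fraction of samples in the least frequent class (0.0 if fewer than two classes)."""
    counts = Counter(labels)
    if len(counts) < 2:
        return 0.0
    return min(counts.values()) / len(labels)


# Toy labels (hypothetical): 7 samples of class 1, 3 samples of class 2.
labels = [1] * 7 + [2] * 3
share = minority_share(labels)
print(f"Minority share: {share:.2f}")
```

Under this assumption, a threshold of `0.2` with operator `gt` would require the minority class to make up more than 20% of the samples.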

## 4. Train with Monitoring
We'll use `vl.monitor` to track the training metadata and emissions.

In [None]:
# Pre-processing
df_encoded = pd.get_dummies(df.drop(columns=['class']))
X_train, X_test, y_train, y_test = train_test_split(
    df_encoded, 
    df['class'].values.ravel(), 
    test_size=0.2, 
    random_state=42
)

model = RandomForestClassifier(n_estimators=100, random_state=42)

with vl.monitor(name="RandomForest-Credit"):
    model.fit(X_train, y_train)
    
predictions = model.predict(X_test)
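
To give a feel for what a monitoring context manager does, here is a minimal standard-library sketch that records the wall-clock duration of a block. It is only an illustration: `timed_run` is a hypothetical name, it does not measure emissions, and it says nothing about how `vl.monitor` works internally.

```python
import time
from contextlib import contextmanager


@contextmanager
def timed_run(name, records):
    """Append the name and duration of the wrapped block to records."""
    start = time.perf_counter()
    try:
        yield
    finally:
        # Record the duration even if the wrapped block raises.
        records.append({"name": name, "duration_s": time.perf_counter() - start})


records = []
with timed_run("toy-training", records):
    total = sum(i * i for i in range(100_000))  # stand-in for model.fit(...)

print(records[0]["name"], f"{records[0]['duration_s']:.4f}s")
```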

## 5. Post-Training Audit
Finally, we check whether the trained model's predictions on the test set satisfy the same fairness and performance requirements.

In [None]:
# X_test.index holds index labels, so select the matching rows with .loc rather than .iloc.
audit_df = df.loc[X_test.index].copy()
audit_df['prediction'] = predictions

vl.enforce(
    data=audit_df,
    target="class",
    prediction="prediction",
    gender="Attribute9",
    age="Attribute13",
    policy="policy.oscal.yaml"
)
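
Comparing the same fairness ratio on the true labels and on the predictions is one simple way to see whether a model amplifies a disparity that is already in the data. The sketch below uses hypothetical toy values and helper names; it is not taken from the SDK or from the results of this notebook.

```python
def favorable_rate(values, groups, group, favorable=1):
    """Share of favorable values among the members of one group."""
    selected = [v for v, g in zip(values, groups) if g == group]
    return sum(v == favorable for v in selected) / len(selected)


def impact_ratio(values, groups, unprivileged, privileged):
    """Favorable rate of the unprivileged group divided by that of the privileged group."""
    return favorable_rate(values, groups, unprivileged) / favorable_rate(values, groups, privileged)


# Toy data (hypothetical): 1 = favorable outcome, 2 = unfavorable outcome.
groups = ["a"] * 4 + ["b"] * 4
labels = [1, 1, 1, 2, 1, 1, 2, 2]
predictions = [1, 1, 1, 1, 1, 2, 2, 2]

label_ratio = impact_ratio(labels, groups, unprivileged="b", privileged="a")
prediction_ratio = impact_ratio(predictions, groups, unprivileged="b", privileged="a")
amplified = prediction_ratio < label_ratio

print(f"labels: {label_ratio:.2f}, predictions: {prediction_ratio:.2f}, amplified: {amplified}")
```

In this toy case the ratio drops from about 0.67 on the labels to 0.25 on the predictions, which is the pattern the term bias amplification refers to.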

## Conclusion
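- **Data Audit**: Detected class imbalance and age bias in the training data.
- **Model Training**: Monitored training duration and emissions with `vl.monitor`.
- **Model Audit**: Checked the test-set predictions against the same policy; fairness may have degraded after training (bias amplification).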