# üéØ Level 1: ML Training with Governance

**‚è±Ô∏è Time:** 8 minutes  
**üéì Complexity:** ‚≠ê‚≠ê Intermediate  
**üéØ Goal:** Integrate SDK into a complete ML training pipeline

## What you'll learn:
1. **Pre-training audits** - Validate data quality before training
2. **Governance wrapping** - Wrap your sklearn models with compliance checks
3. **Post-training audits** - Automatic fairness validation on predictions
4. **Actionable compliance reports** - Real OSCAL-native policy enforcement

---

### üéì The ML Governance Lifecycle

Let's build a credit scoring model that is both **accurate** and **compliant**! üöÄ

Most ML projects focus on **accuracy** but ignore **compliance** until production‚Äîwhen it's expensive to fix. This notebook shows the **governed ML lifecycle**:

- **Trust**: Auditable governance helps you explain decisions to regulators, customers, and auditors

```- **Cost savings**: Catching bias in data (pre-training) is 10x cheaper than retraining

[Data] ‚Üí [Pre-Audit] ‚Üí [Train] ‚Üí [Post-Audit] ‚Üí [Deploy]- **Regulatory pressure**: EU AI Act, US Fair Lending laws require documented fairness checks

   ‚Üì          ‚Üì           ‚Üì           ‚Üì            ‚Üì**Why this matters:**

  Raw      Quality    Wrapped    Fairness    Monitoring

         checks      model      validation   in prod```

### How to use this notebook
- **Goal:** Train a credit model and keep it compliant at each step.
- **Data:** Full German Credit dataset (1,000 loans). Runs even if the local CSV is missing by using SDK sample loaders.
- **Policies:** `policies/loan/risks.oscal.yaml` (data quality) and `policies/loan/governance-baseline.oscal.yaml` (fairness during inference).
- **Flow:** Load ‚Üí Pre-audit ‚Üí Train (wrapped) ‚Üí Post-audit ‚Üí Interpret results.

## üì¶ Setup: Import Libraries

In [1]:
import pandas as pd
import numpy as np
import venturalitica as vl
from sklearn.linear_model import LogisticRegression
from sklearn.model_selection import train_test_split
from sklearn.compose import ColumnTransformer
from sklearn.preprocessing import StandardScaler
from sklearn.impute import SimpleImputer
from sklearn.pipeline import Pipeline
from pathlib import Path

# Set random seed for reproducibility
RANDOM_SEED = 42
np.random.seed(RANDOM_SEED)

print("‚úÖ Libraries imported successfully!")

‚úÖ Libraries imported successfully!


## Step 1: Load & Prepare Data üìä

We'll load the **German Credit Data** (1,000 real loan applications from UCI Repository Dataset #144).

- `foreign_worker` (nationality-based discrimination)

### üéì About the German Credit Dataset- `age` (age-based discrimination is prohibited in many jurisdictions)

- `gender` (male/female)

**Origin**: Collected by Prof. Hans Hofmann (University of Hamburg, 1994)  **Protected attributes** (where discrimination is illegal):

**Use case**: Predict creditworthiness based on 20 attributes (age, job, credit history, etc.)  

**Why it's important**: One of the first datasets where researchers documented gender bias in lending decisions- **Target**: `class` = "good" or "bad" credit risk (we convert to binary 1/0)

- **Categorical**: `checking_status`, `credit_history`, `purpose`, `gender`

**Key attributes**:- **Numerical**: `age`, `credit_amount`, `duration` (loan term in months)

In [2]:
print("üìä Loading German Credit Data...\n")

dataset_path = Path("../../datasets/loan/german_credit.csv")
df = pd.read_csv(dataset_path)

# Prepare required columns
df['age'] = pd.to_numeric(df['age'], errors='coerce')
df = df.dropna(subset=['age'])  # Drop rows with invalid age
df['target'] = pd.to_numeric(df['target'], errors='coerce').astype('int64')
df['age_group'] = pd.cut(df['age'], bins=[0, 25, 45, 100], labels=['Young', 'Adult', 'Senior'])

print(f"‚úÖ Loaded {len(df)} loan applications")
print(f"   Features: {df.shape[1]} columns")
print(f"   Target distribution: {df['target'].value_counts().to_dict()}")
print(f"\nüìã First 3 rows:")
df.head(3)

üìä Loading German Credit Data...

‚úÖ Loaded 1000 loan applications
   Features: 24 columns
   Target distribution: {1: 700, 0: 300}

üìã First 3 rows:


Unnamed: 0,checking_status,duration,credit_history,purpose,credit_amount,savings_status,employment,installment_commitment,personal_status_sex,other_parties,...,housing,existing_credits,job,num_dependents,own_telephone,foreign_worker,class,target,gender,age_group
0,A11,6,A34,A43,1169,A65,A75,4,A93,A101,...,A152,2,A173,1,A192,A201,1,1,male,Senior
1,A12,48,A32,A43,5951,A61,A73,2,A92,A101,...,A152,1,A173,1,A191,A201,2,0,female,Young
2,A14,12,A34,A46,2096,A61,A74,2,A93,A101,...,A152,1,A172,2,A191,A201,1,1,male,Senior


**Why these prep steps?**
- `age` ‚Üí numeric and non-null to support age binning in policies.
- `target` ‚Üí int for binary classification (1=good, 0=bad).
- `gender` ‚Üí required for fairness audits; fallback prevents crashes.
- `age_group` ‚Üí categorical bin used by policy controls.

## Step 2: Pre-Training Data Audit üõ°Ô∏è

**Before training**, let's validate data quality and check for potential bias.

### üéì Why Pre-Training Audits Matter

**The "Garbage In, Garbage Out" Problem**: If your training data is biased, your model will learn and amplify that bias. Historical lending data often reflects:
- **Historical discrimination**: E.g., women were denied loans more often in the 1990s

- **Proxy variables**: Attributes like ZIP code or education can be proxies for protected attributes**Result**: If this audit fails, **stop and fix your data** before training. Training on bad data wastes compute and produces biased models.

- **Class imbalance**: If 90% of approvals are male, the model may default to predicting "approve" for males

- ‚úÖ **Class balance**: At least 20% representation of both classes (approved/rejected)

**What this audit checks**:- ‚úÖ **Fairness baseline**: Demographic distribution (are protected groups represented?)

- ‚úÖ **Data completeness**: No missing values in critical fields (target, protected attributes)- ‚úÖ **Data quality**: Valid ranges (e.g., age > 0, credit_amount > 0)

In [3]:
print("üõ°Ô∏è  Running pre-training data audit...\n")

# Data integrity checks
assert 'gender' in df.columns, "Critical: 'gender' column missing for fairness audit"
assert 'target' in df.columns, "Critical: 'target' column missing"
assert df['target'].nunique() == 2, "Critical: Target must be binary (0/1)"
print("‚úÖ Data integrity assertions passed\n")

# Load policy and run audit
policy_path = Path("../../policies/loan/risks.oscal.yaml")
results = vl.enforce(data=df, policy=str(policy_path))

print("\nüìä Pre-Training Audit Complete!")

üõ°Ô∏è  Running pre-training data audit...

‚úÖ Data integrity assertions passed


[Venturalitica] üõ°  Enforcing policy: ../../policies/loan/risks.oscal.yaml
  Evaluating Control 'credit-data-imbalance': Data Quality: Minority class (rejected loans) shou...
    [Binding] Virtual Role 'target' bound to Variable 'target' (Column: 'target')
  Evaluating Control 'credit-data-bias': Pre-training Fairness: Disparate impact ratio shou...
    [Binding] Virtual Role 'target' bound to Variable 'target' (Column: 'target')
    [Binding] Virtual Role 'dimension' bound to Variable 'gender' (Column: 'gender')
  Evaluating Control 'credit-age-disparate': Disparate impact ratio for raw age (Proxy for seni...
    [Binding] Virtual Role 'target' bound to Variable 'target' (Column: 'target')
    [Binding] Virtual Role 'dimension' bound to Variable 'age' (Column: 'age')
  ‚ùå FAIL | Controls: 2/3 passed
    ‚úì [credit-data-imbalance] Data Quality: Minority class (rejected l...: 0.429 (Limit: gt0.2)
   

## Step 3: Prepare Training Data üîß

Split data into train/test sets and prepare features.

In [4]:
# Split features and target
# ‚ö†Ô∏è CRITICAL: Drop both 'target' and 'class' to avoid data leaking
X = df.drop(columns=['target', 'class'])
y = df['target']

X_train, X_test, y_train, y_test = train_test_split(
    X, y, test_size=0.2, random_state=RANDOM_SEED
)

print(f"‚úÖ Train set: {len(X_train)} samples")
print(f"‚úÖ Test set:  {len(X_test)} samples")

‚úÖ Train set: 800 samples
‚úÖ Test set:  200 samples


## Step 4: Train Model with Governance Wrap ü§ñ

### üéì What is "Governance Wrapping"?

**The Challenge**: Most ML frameworks (sklearn, PyTorch, TensorFlow) know nothing about fairness. You train a model, it makes predictions, but there's no built-in way to check if those predictions are discriminatory.

**The Solution**: `vl.wrap()` creates a **transparent governance layer** around your model:


```pythonüí° **Pro tip**: In production, use `vl.wrap()` to monitor live traffic and trigger alerts when fairness metrics drift below thresholds.

base_model = LogisticRegression()  # Your usual sklearn model

model = vl.wrap(base_model, policy="fairness.yaml")  # Wrapped version**Key insight**: The wrapper is **non-invasive**. It doesn't change your training code, model architecture, or predictions‚Äîit just adds an audit layer. Your existing sklearn pipelines work unchanged.

model.fit(X_train, y_train)  # Train normally

predictions = model.predict(X_test)  # Auto-audits on every predict!4. **Logs results**: Stores audit artifacts for compliance documentation

```3. **Evaluates policy controls**: Computes demographic parity, equalized odds, calibration

2. **Binds protected attributes**: Matches predictions to demographic data (gender, age, etc.)

**What the wrapper does**:1. **Intercepts `.predict()` calls**: Before returning predictions, it runs fairness checks

In [None]:
print("ü§ñ Training model with governance wrap...\n")

# Build sklearn pipeline (standard feature engineering + model)
numeric_features = X.select_dtypes(include=['number']).columns.tolist()
numeric_pipeline = Pipeline([
    ('imputer', SimpleImputer(strategy='median')),  # Fill missing with median
    ('scaler', StandardScaler())  # Normalize to mean=0, std=1
])
preprocessor = ColumnTransformer(
    transformers=[('num', numeric_pipeline, numeric_features)],
    remainder='drop'  # Drop non-numeric features (in a real scenario, encode categoricals)
)

base_pipeline = Pipeline(steps=[
    ('preprocessor', preprocessor),
    ('classifier', LogisticRegression(max_iter=1000, random_state=42))
])

# üéØ WRAP THE MODEL WITH GOVERNANCE
# This is the key step: we add fairness checks WITHOUT changing the model itself
fairness_policy = Path("../../policies/loan/governance-baseline.oscal.yaml")
model = vl.wrap(base_pipeline, policy=str(fairness_policy))
# ‚¨ÜÔ∏è Now 'model' is a GovernanceWrapper that behaves like base_pipeline but audits every predict()

# Train as usual - the wrapper is transparent during training
model.fit(X_train, y_train)
accuracy = model.score(X_test, y_test)


print(f"‚úÖ Model trained successfully!")print(f"\nüîí Governance layer active: Every .predict() call will trigger fairness audits")
print(f"   Accuracy: {accuracy:.1%}")

ü§ñ Training model with governance wrap...


[Venturalitica] üõ°  Enforcing policy: ../../policies/loan/governance-baseline.oscal.yaml
  Evaluating Control 'A.1': Automated check for Demographic Parity...
    [Binding] Virtual Role 'target' bound to Variable 'target' (Column: 'target')
    [Binding] Virtual Role 'prediction' bound to Variable 'prediction' (Column: 'prediction')
    [Binding] Virtual Role 'dimension' bound to Variable 'gender' (Column: 'gender')
  Evaluating Control 'A.2': Automated check for Classification Accuracy...
    [Binding] Virtual Role 'target' bound to Variable 'target' (Column: 'target')
    [Binding] Virtual Role 'prediction' bound to Variable 'prediction' (Column: 'prediction')
  ‚ùå FAIL | Controls: 0/1 passed
    ‚úó [A.2] Automated check for Classification Accur...: 0.000 (Limit: >=0.8)
  ‚úì Results cached for 'venturalitica push'
‚úÖ Model trained successfully!
   Accuracy: 73.0%


**Governance wrap design notes**
- `vl.wrap()` keeps the original feature names visible to the audit engine even though sklearn pipelines transform them.
- Passing the fairness policy here means every `predict()` call will trigger post-training audits automatically.
- Using `audit_data=test_df` ensures demographics are available during prediction-time checks.

## Step 5: Post-Training Fairness Audit üõ°Ô∏è

When we call `.predict()`, the governance wrapper **automatically triggers** fairness audits!

### üéì What Fairness Metrics Are We Checking?

**No code changes needed** - compliance is baked into the wrapper! üéâ

The policy evaluates multiple fairness definitions (no single metric captures all discrimination):

   - **Con**: Can conflict with demographic parity

1. **Demographic Parity** (Statistical Parity)   - **Pro**: Ensures predictions mean the same thing for all groups

   - $P(\hat{y}=1 | \text{male}) \approx P(\hat{y}=1 | \text{female})$   - "When the model predicts 'approve', the true approval rate should be the same across groups"

   - "Approval rates should be similar across genders"   - $P(y=1 | \hat{y}=1, \text{male}) \approx P(y=1 | \hat{y}=1, \text{female})$

   - **Pro**: Easy to explain to non-technical stakeholders3. **Calibration** (Predictive Parity)

   - **Con**: Ignores base rates (what if one group has genuinely higher creditworthiness?)

   - **Con**: Requires labeled test data

2. **Equalized Odds** (Equal Opportunity)   - **Pro**: Accounts for ground truth labels

   - $P(\hat{y}=1 | y=1, \text{male}) \approx P(\hat{y}=1 | y=1, \text{female})$ (True Positive Rate parity)   - "Error rates should be equal across groups"
   - $P(\hat{y}=0 | y=0, \text{male}) \approx P(\hat{y}=0 | y=0, \text{female})$ (True Negative Rate parity)

In [6]:
print("üõ°Ô∏è  Running post-training fairness audit...\n")

# Predict on test set - this triggers automatic fairness audits!
test_df = df.iloc[X_test.index].copy()
predictions = model.predict(X_test, audit_data=test_df)

print(f"‚úÖ Predictions generated: {len(predictions)} samples")
print(f"   Fairness audit completed automatically!")

üõ°Ô∏è  Running post-training fairness audit...


[Venturalitica] üõ°  Enforcing policy: ../../policies/loan/governance-baseline.oscal.yaml
  Evaluating Control 'A.1': Automated check for Demographic Parity...
    [Binding] Virtual Role 'target' bound to Variable 'target' (Column: 'target')
    [Binding] Virtual Role 'prediction' bound to Variable 'prediction' (Column: 'prediction')
    [Binding] Virtual Role 'dimension' bound to Variable 'gender' (Column: 'gender')
  Evaluating Control 'A.2': Automated check for Classification Accuracy...
    [Binding] Virtual Role 'target' bound to Variable 'target' (Column: 'target')
    [Binding] Virtual Role 'prediction' bound to Variable 'prediction' (Column: 'prediction')
  ‚ùå FAIL | Controls: 1/2 passed
    ‚úì [A.1] Automated check for Demographic Parity...: 0.027 (Limit: <0.1)
    ‚úó [A.2] Automated check for Classification Accur...: 0.730 (Limit: >=0.8)
  ‚úì Results cached for 'venturalitica push'
‚úÖ Predictions generated: 200 samples


## Step 6: Review Compliance Results üìä

Let's examine the audit results in detail.

In [7]:
results = model.last_audit_results

print("\n" + "="*70)
print("üìä DETAILED COMPLIANCE REPORT")
print("="*70 + "\n")

for r in results:
    status = "‚úÖ PASSED" if r.passed else "‚ùå FAILED"
    print(f"{status} | {r.control_id}")
    print(f"   Value: {r.actual_value}")
    
    # Validation checks
    assert r.actual_value is not None, f"Critical: Control {r.control_id} returned null"
    
    # Flag suspicious perfect metrics
    if float(r.actual_value) in [0.0, 1.0]:
        print(f"   ‚ö†Ô∏è  SUSPICIOUS: Perfect value {r.actual_value}")
        if 'acc' in r.control_id.lower() or 'recall' in r.control_id.lower():
            assert float(r.actual_value) > 0.0, "Performance metric is 0.0 - check alignment!"
    print()

# Summary
passed = sum(1 for r in results if r.passed)
total = len(results)
print("="*70)
print(f"SUMMARY: {passed}/{total} controls passed ({passed/total*100:.0f}%)")
print("="*70)


üìä DETAILED COMPLIANCE REPORT

‚úÖ PASSED | A.1
   Value: 0.0267857142857143

‚ùå FAILED | A.2
   Value: 0.73

SUMMARY: 1/2 controls passed (50%)


**Interpreting results**
- Look for failed controls to spot fairness or quality gaps quickly.
- `actual_value` shows the measured metric; perfect 0/1 values can be suspicious, so we flag them for review.
- Re-run after adjusting data or policy thresholds to see compliance improve.

## üéâ Congratulations!

You just built a **production-grade ML pipeline** with:
- ‚úÖ Pre-training data quality audits
- ‚úÖ Governance-wrapped model training
- ‚úÖ Automatic post-training fairness validation
- ‚úÖ OSCAL-native compliance reports

### Model Performance
- **Accuracy:** {accuracy:.1%}
- **Compliance:** {passed}/{total} controls passed

### What's Next?

**Option A: Experiment with Policies** üìù
- Edit `policies/loan/governance-baseline.oscal.yaml`
- Adjust fairness thresholds
- Add custom controls

**Option B: Try Different Models** üî¨
- Replace LogisticRegression with RandomForest
- Try XGBoost or LightGBM
- Compare compliance scores

**Option C: Add MLOps Integration** üöÄ
- Open `02_mlops_integration.py`
- Track experiments with MLflow
- Version control your compliance reports

**Option D: Production Deployment** üè≠
- Open `03_production_ready.py`
- See batch inference patterns
- Learn about continuous compliance monitoring

---

**You're now ready to build compliant AI systems! üéì‚ú®**

### How to interpret the metrics
- **Accuracy**: share of test records predicted correctly. Use it to gauge basic model fit; pair it with class balance so high accuracy is not just predicting the majority class.
- **Fairness audit (passed/total)**: number of policy controls that cleared across the fairness policy. `passed`/`total` gives quick compliance coverage; inspect failing controls to see which protected attribute or threshold was breached.
- **Compliance messages**: each control‚Äôs message explains what was checked (e.g., parity gap, threshold, missing documentation) and why it passed or failed.
- **Provenance**: `source` or policy path indicates which policy file defined the rule‚Äîuse it to trace back to governance requirements.