# Venturalitica Tutorial: Training with Governance

This notebook demonstrates how to integrate fairness and performance checks into your ML workflow using **Venturalitica SDK**.

### Objectives:
1. **Pre-training Audit**: Detect data bias before training.
2. **Training with Monitoring**: Track duration and emissions.
3. **Sklearn Integration (`vl.wrap`)**: Automate governance in your standard pipelines.
4. **Post-training Audit**: Verify model fairness and performance on test data.

In [1]:
from ucimlrepo import fetch_ucirepo
from sklearn.model_selection import train_test_split
from sklearn.ensemble import RandomForestClassifier
import venturalitica as vl
import pandas as pd
import yaml

# Report the installed version; fall back to "unknown" rather than a hardcoded number
print("Venturalitica version:", getattr(vl, "__version__", "unknown"))

Venturalitica version: 0.2.4


## 1. Load Data
We use the **UCI German Credit** dataset, a classic benchmark for credit scoring fairness.

In [2]:
dataset = fetch_ucirepo(id=144)
df = dataset.data.features.copy()
df['class'] = dataset.data.targets

train_df, test_df = train_test_split(df, test_size=0.2, random_state=42)
print(f"Loaded {len(df)} samples.")
df.head()

Loaded 1000 samples.


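| | Attribute1 | Attribute2 | Attribute3 | Attribute4 | Attribute5 | Attribute6 | Attribute7 | Attribute8 | Attribute9 | Attribute10 | ... | Attribute12 | Attribute13 | Attribute14 | Attribute15 | Attribute16 | Attribute17 | Attribute18 | Attribute19 | Attribute20 | class |
|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|
| 0 | A11 | 6 | A34 | A43 | 1169 | A65 | A75 | 4 | A93 | A101 | ... | A121 | 67 | A143 | A152 | 2 | A173 | 1 | A192 | A201 | 1 |
| 1 | A12 | 48 | A32 | A43 | 5951 | A61 | A73 | 2 | A92 | A101 | ... | A121 | 22 | A143 | A152 | 1 | A173 | 1 | A191 | A201 | 2 |
| 2 | A14 | 12 | A34 | A46 | 2096 | A61 | A74 | 2 | A93 | A101 | ... | A121 | 49 | A143 | A152 | 1 | A172 | 2 | A191 | A201 | 1 |
| 3 | A11 | 42 | A32 | A42 | 7882 | A61 | A74 | 2 | A93 | A103 | ... | A122 | 45 | A143 | A153 | 1 | A173 | 2 | A191 | A201 | 1 |
| 4 | A11 | 24 | A33 | A40 | 4870 | A61 | A73 | 3 | A93 | A101 | ... | A124 | 53 | A143 | A153 | 2 | A173 | 2 | A191 | A201 | 2 |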
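
The audits in this notebook bind `gender` to `Attribute9` and `age` to `Attribute13`. In the UCI documentation, `Attribute9` encodes personal status and sex together, and `Attribute13` is age in years. The sketch below shows one way to derive a plain sex label from the `Attribute9` codes. The helper name `decode_sex` is hypothetical, and the notebook does not say whether the SDK groups on the raw codes or on a derived label.

```python
# Attribute9 codes as listed in the UCI German Credit documentation
# (personal status and sex are combined in a single field).
PERSONAL_STATUS_SEX = {
    "A91": ("male", "divorced/separated"),
    "A92": ("female", "divorced/separated/married"),
    "A93": ("male", "single"),
    "A94": ("male", "married/widowed"),
    "A95": ("female", "single"),
}


def decode_sex(code: str) -> str:
    """Return 'male' or 'female' for an Attribute9 code (hypothetical helper)."""
    return PERSONAL_STATUS_SEX[code][0]


# Attribute9 values from the first five rows of the dataset preview
sample_codes = ["A93", "A92", "A93", "A93", "A93"]
decoded = [decode_sex(code) for code in sample_codes]
print(decoded)
```

With pandas, the same mapping could be applied with `df["Attribute9"].map(decode_sex)` before passing the derived column to an audit.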


## 2. Defining Policies (OSCAL)
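We'll define a simple policy with four controls: **Class Imbalance**, **Disparate Impact** across Gender and Age, and a minimum **Accuracy**. Each control carries a `metric_key`, a `threshold`, and an `operator`; the `input:dimension` prop names the variable that is bound to the metric's `dimension` role.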

In [3]:
policy_yaml = """
assessment-plan:
  metadata:
    title: "Credit Governance Policy"
  reviewed-controls:
    control-implementations:
      - description: "Bias and Performance rules"
        implemented-requirements:
          - control-id: class-balance
            description: "Minority class representation"
            props:
              - name: metric_key
                value: class_imbalance
              - name: threshold
                value: "0.2"
              - name: operator
                value: gt
          - control-id: gender-disparate
            description: "Gender fairness (DI > 0.8)"
            props:
              - name: metric_key
                value: disparate_impact
              - name: threshold
                value: "0.8"
              - name: operator
                value: gt
              - name: "input:dimension"
                value: gender
          - control-id: age-disparate
            description: "Age fairness (DI > 0.5)"
            props:
              - name: metric_key
                value: disparate_impact
              - name: threshold
                value: "0.5"
              - name: operator
                value: gt
              - name: "input:dimension"
                value: age
          - control-id: accuracy-score
            description: "Model utility (Accuracy > 70%)"
            props:
              - name: metric_key
                value: accuracy_score
              - name: threshold
                value: "0.7"
              - name: operator
                value: gt
"""

with open("policy.oscal.yaml", "w") as f:
    f.write(policy_yaml)
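
To make the `props` structure concrete, here is a minimal sketch of how a threshold rule of this shape could be evaluated. It is not the SDK's implementation: the function names `props_to_dict` and `check` are hypothetical, and only the `gt` operator appears in this notebook, so the other operator keys are assumptions.

```python
import operator

# Comparison functions keyed by the names used in the policy's `operator` prop.
# Only "gt" is used in this notebook; the remaining keys are assumptions.
OPERATORS = {
    "gt": operator.gt,
    "ge": operator.ge,
    "lt": operator.lt,
    "le": operator.le,
}


def props_to_dict(requirement: dict) -> dict:
    """Flatten an OSCAL-style props list into a plain {name: value} mapping."""
    return {prop["name"]: prop["value"] for prop in requirement["props"]}


def check(requirement: dict, actual: float) -> bool:
    """Compare a measured value against the requirement's threshold."""
    props = props_to_dict(requirement)
    compare = OPERATORS[props["operator"]]
    # Thresholds are stored as strings in the policy, so convert before comparing
    return compare(actual, float(props["threshold"]))


# Same shape as the `gender-disparate` entry in the policy above
gender_rule = {
    "control-id": "gender-disparate",
    "props": [
        {"name": "metric_key", "value": "disparate_impact"},
        {"name": "threshold", "value": "0.8"},
        {"name": "operator", "value": "gt"},
        {"name": "input:dimension", "value": "gender"},
    ],
}

print(check(gender_rule, 0.84))  # above the 0.8 threshold
print(check(gender_rule, 0.75))  # below the 0.8 threshold
```

Keeping thresholds as quoted strings matches the policy file, which is why the sketch converts them with `float()` at comparison time.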

## 3. Pre-Training Audit
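We run the audit on the **training set** to check whether the data itself is biased before any model is trained. The `gender` and `age` keyword arguments bind the policy's variables to dataset columns. The accuracy control is skipped at this stage because no predictions exist yet.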

In [4]:
vl.enforce(
    data=train_df,
    target="class",
    gender="Attribute9",
    age="Attribute13",
    policy="policy.oscal.yaml"
)


[Venturalitica v0.2.4] 🛡  Enforcing policy: policy.oscal.yaml
  Evaluating Control 'class-balance': Minority class representation...
  Evaluating Control 'gender-disparate': Gender fairness (DI > 0.8)...
    [Binding] Virtual Role 'dimension' bound to Variable 'gender' (Column: 'Attribute9')
  Evaluating Control 'age-disparate': Age fairness (DI > 0.5)...
    [Binding] Virtual Role 'dimension' bound to Variable 'age' (Column: 'Attribute13')
  Evaluating Control 'accuracy-score': Model utility (Accuracy > 70%)...
    [Skip] Control 'accuracy-score' requires ['prediction'] which are not provided. Skipping.

  CONTROL                DESCRIPTION                            ACTUAL     LIMIT      RESULT
  ──────────────────────────────────────────────────────────────────────────────────────────

[ComplianceResult(control_id='class-balance', description='Minority class representation', metric_key='class_imbalance', threshold=0.2, actual_value=0.43112701252236135, operator='gt', passed=True, severity='low'),
 ComplianceResult(control_id='gender-disparate', description='Gender fairness (DI > 0.8)', metric_key='disparate_impact', threshold=0.8, actual_value=0.8361990950226245, operator='gt', passed=True, severity='low'),
 ComplianceResult(control_id='age-disparate', description='Age fairness (DI > 0.5)', metric_key='disparate_impact', threshold=0.5, actual_value=0.36111111111111105, operator='gt', passed=False, severity='low')]
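
In the results above, the class-balance control (0.431 > 0.2) and the gender control (0.836 > 0.8) pass, while the age control fails (0.361 is below the 0.5 threshold). The notebook does not show how the SDK computes these metrics, so the following is only a sketch of the common textbook definitions: class imbalance as the minority-to-majority count ratio (the reported 0.4311 is consistent with 241/559 on the 800 training rows), and disparate impact as the lowest favorable-outcome rate across groups divided by the highest. The function names and the toy data are illustrative assumptions.

```python
from collections import Counter


def class_imbalance(labels):
    """Ratio of the rarest class count to the most common class count."""
    counts = Counter(labels).values()
    return min(counts) / max(counts)


def disparate_impact(groups, labels, favorable):
    """Lowest favorable-outcome rate across groups divided by the highest."""
    totals, favorable_counts = Counter(), Counter()
    for group, label in zip(groups, labels):
        totals[group] += 1
        favorable_counts[group] += int(label == favorable)
    rates = [favorable_counts[group] / totals[group] for group in totals]
    return min(rates) / max(rates)


# Toy data: in German Credit, class 1 is "good" credit and class 2 is "bad"
labels = [1, 1, 1, 2, 1, 2, 1, 1, 2, 1]
groups = ["m", "m", "m", "m", "m", "f", "f", "f", "f", "f"]

print(round(class_imbalance(labels), 4))
print(round(disparate_impact(groups, labels, favorable=1), 4))
```

For a continuous attribute such as age, the values would first have to be binned into groups; which bins the SDK uses is not stated in this notebook.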

## 4. Automatic Governance with `vl.wrap`

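Instead of calling `enforce()` manually, you can wrap your scikit-learn model so that the audit runs automatically every time you call `.fit()` or `.predict()`. Pass the raw DataFrame as `audit_data` so the protected attributes can be mapped back to their original columns. Check the log after each call: in the run shown here, every control was skipped at fit time because no `target` binding was provided.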

In [5]:
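# One-hot encode the categorical features; reuse the same split parameters
# as in section 1 so the rows line up with train_df and test_df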
df_encoded = pd.get_dummies(df.drop(columns=['class']))
X_train, X_test, y_train, y_test = train_test_split(
    df_encoded, 
    df['class'].values.ravel(), 
    test_size=0.2, 
    random_state=42
)

# Wrap your model
base_model = RandomForestClassifier(n_estimators=100, random_state=42)
governed_model = vl.wrap(base_model, policy="policy.oscal.yaml")

# Audits automated! Just provide the raw data for attribution mapping
governed_model.fit(X_train, y_train, audit_data=train_df, gender="Attribute9", age="Attribute13")

# Predict also triggers a performance + fairness audit
predictions = governed_model.predict(X_test, audit_data=test_df.iloc[:len(X_test)], gender="Attribute9", age="Attribute13")


[Venturalitica v0.2.4] 🛡  Enforcing policy: policy.oscal.yaml
  Evaluating Control 'class-balance': Minority class representation...
    [Skip] Control 'class-balance' requires ['target'] which are not provided. Skipping.
  Evaluating Control 'gender-disparate': Gender fairness (DI > 0.8)...
    [Binding] Virtual Role 'dimension' bound to Variable 'gender' (Column: 'Attribute9')
    [Skip] Control 'gender-disparate' requires ['target'] which are not provided. Skipping.
  Evaluating Control 'age-disparate': Age fairness (DI > 0.5)...
    [Binding] Virtual Role 'dimension' bound to Variable 'age' (Column: 'Attribute13')
    [Skip] Control 'age-disparate' requires ['target'] which are not provided. Skipping.
  Evaluating Control 'accuracy-score': Model utility (Accuracy > 70%)...
    [Skip] Control 'accuracy-score' requires ['target', 'prediction'] which are not provided. Skipping.
  ⚠ No applicable controls found in policy.oscal.yaml

[Venturalitica v0.2.4] 🛡  Enforcing policy: 
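
The wrapping idea can be illustrated with a small delegation pattern: an object forwards `fit` and `predict` to the underlying estimator and calls an audit hook after each step. This is a sketch of the general pattern only, not how `vl.wrap` is implemented. `AuditedModel`, `MajorityClassifier`, and `record` are hypothetical names, and the stand-in estimator exists only so the example runs without scikit-learn.

```python
class AuditedModel:
    """Hypothetical wrapper: delegates to an estimator and calls a hook after each step."""

    def __init__(self, model, audit_hook):
        self.model = model
        self.audit_hook = audit_hook

    def fit(self, X, y, **audit_kwargs):
        self.model.fit(X, y)
        # At fit time the labels are known, but there are no predictions yet
        self.audit_hook("fit", target=y, prediction=None, **audit_kwargs)
        return self

    def predict(self, X, **audit_kwargs):
        predictions = self.model.predict(X)
        # At predict time the hook receives the predictions instead
        self.audit_hook("predict", target=None, prediction=predictions, **audit_kwargs)
        return predictions


class MajorityClassifier:
    """Tiny stand-in estimator that always predicts the most frequent training label."""

    def fit(self, X, y):
        self.label_ = max(set(y), key=list(y).count)
        return self

    def predict(self, X):
        return [self.label_ for _ in X]


events = []


def record(stage, **context):
    """Audit hook that only logs which stage ran and which inputs it received."""
    events.append((stage, sorted(context)))


governed = AuditedModel(MajorityClassifier(), record)
governed.fit([[0], [1], [2]], [1, 1, 2], gender="Attribute9")
print(governed.predict([[3]], gender="Attribute9"))
print(events)
```

A real audit hook would evaluate policy controls instead of appending to a list, and would skip any control whose required inputs are missing, as the log above does.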

## 5. Monitoring Manual Training
If you prefer a manual approach, you can still use `vl.monitor` to track metadata like carbon emissions.

In [6]:
model = RandomForestClassifier(n_estimators=100, random_state=42)

with vl.monitor(name="RandomForest-Credit"):
    model.fit(X_train, y_train)
    
predictions = model.predict(X_test)




[Venturalitica] 🟢 Starting monitor: RandomForest-Credit
[Venturalitica] 🔴 Monitor stopped: RandomForest-Credit
  ⏱  Duration: 5.10s
  🛡 [Security] Fingerprint: 3938946527b7 | Integrity: ✅ Stable
  💻 [Hardware] Peak Memory: 358.10 MB | CPUs: 16
  🌱 [Green AI] Carbon emissions: 0.000000 kgCO₂
  🤝 [Handshake] Policy enforced verifyable audit trail present.
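
For intuition about what a monitoring block of this kind does, the sketch below times a code block and records peak memory using only the standard library. It is a hypothetical stand-in, not the SDK's `vl.monitor`: `simple_monitor` is an invented name, `tracemalloc` measures Python-level allocations rather than whole-process memory, and no carbon estimate or fingerprint is produced.

```python
import time
import tracemalloc
from contextlib import contextmanager


@contextmanager
def simple_monitor(name):
    """Hypothetical stand-in: times a block and records peak Python memory."""
    stats = {"name": name}
    tracemalloc.start()
    start = time.perf_counter()
    try:
        yield stats
    finally:
        # Fill in the measurements even if the monitored block raises
        stats["duration_s"] = time.perf_counter() - start
        _, peak_bytes = tracemalloc.get_traced_memory()
        stats["peak_mb"] = peak_bytes / 1e6
        tracemalloc.stop()


with simple_monitor("demo") as stats:
    squares = [i * i for i in range(100_000)]

print(f"{stats['name']}: {stats['duration_s']:.3f}s, peak {stats['peak_mb']:.2f} MB")
```

The measurements are written in the `finally` clause so that a failed training run still reports how long it ran and how much memory it used.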


## Conclusion
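- **Data Audit**: The training data passed the class-balance and gender controls but failed the age control (disparate impact 0.36 against a threshold of 0.5).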
- **vl.wrap**: Automated compliance within Scikit-Learn workflows.
- **Monitoring**: Tracked training duration, peak memory, and estimated carbon emissions.