# Chapter 66: Model Governance

## Learning Objectives

By the end of this chapter, you will be able to:

- Understand the concept of model governance and its importance in regulated industries
- Identify the key components of a model governance framework: documentation, approval, risk management, and audit
- Create comprehensive model documentation, including model cards and data sheets
- Implement approval workflows for model development and deployment
- Assess and mitigate risks associated with machine learning models
- Establish audit trails to track model changes and decisions
- Navigate regulatory compliance requirements (GDPR, financial regulations) relevant to the NEPSE system
- Incorporate ethics and fairness checks into the model lifecycle
- Adopt best practices for responsible AI governance

---

## Introduction

As the NEPSE prediction system matures and potentially influences real trading decisions, it becomes subject to a range of governance requirements. Model governance is the system of policies, processes, and controls that ensure machine learning models are developed, deployed, and maintained in a responsible, transparent, and compliant manner. It encompasses everything from documentation and approval workflows to risk management and audit trails.

In the financial industry, model governance is not optional: regulators expect it. For example, SR 11-7, the 2011 supervisory guidance on model risk management issued by the US Federal Reserve and the OCC, and similar rules elsewhere require banks to maintain robust model risk management frameworks. Even if the NEPSE system is for educational or personal use, adopting governance practices is a mark of professionalism and prepares you for real‑world deployment.

In this chapter, we will explore the pillars of model governance. We will create model cards for our NEPSE models, establish approval workflows, consider risk and fairness, and set up audit trails. By the end, you will have a blueprint for governing ML models responsibly.

---

## 66.1 Governance Framework

A model governance framework typically includes the following components:

- **Policies and Standards**: High‑level principles and specific requirements that all models must meet.
- **Roles and Responsibilities**: Clear assignment of who is responsible for development, validation, approval, and monitoring.
- **Model Lifecycle Processes**: Defined stages from development to retirement, with gates at each stage.
- **Documentation Requirements**: What must be documented at each stage.
- **Risk Categorisation**: Models are categorised by risk level (e.g., low, medium, high) with corresponding oversight.
- **Validation and Testing**: Independent validation of models before deployment.
- **Monitoring and Reporting**: Ongoing performance monitoring and periodic reviews.
- **Audit and Compliance**: Ensuring adherence to policies and regulatory requirements.

For the NEPSE system, a simplified framework might include:

- **Roles**: Data Scientist (develops), ML Engineer (deploys), Risk Owner (approves), Model Validator (independent review).
- **Stages**: Development → Validation → Approval → Deployment → Monitoring → Retirement.
- **Risk Level**: Initially, treat the system as medium risk (since it could inform trading decisions).
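The lifecycle stages above can be made explicit in code so that a model cannot skip a gate. The sketch below is a minimal, hypothetical illustration: the `Stage` enum, the transition table, and the `advance` function are not part of any library, and deciding whether a gate has been passed is left to the caller.

```python
from enum import Enum


class Stage(Enum):
    DEVELOPMENT = "development"
    VALIDATION = "validation"
    APPROVAL = "approval"
    DEPLOYMENT = "deployment"
    MONITORING = "monitoring"
    RETIREMENT = "retirement"


# Permitted transitions between stages. A model that fails validation or
# approval, or degrades in monitoring, goes back to development.
ALLOWED_TRANSITIONS = {
    Stage.DEVELOPMENT: {Stage.VALIDATION},
    Stage.VALIDATION: {Stage.APPROVAL, Stage.DEVELOPMENT},
    Stage.APPROVAL: {Stage.DEPLOYMENT, Stage.DEVELOPMENT},
    Stage.DEPLOYMENT: {Stage.MONITORING},
    Stage.MONITORING: {Stage.DEVELOPMENT, Stage.RETIREMENT},
    Stage.RETIREMENT: set(),  # terminal stage
}


def advance(current: Stage, target: Stage, gate_passed: bool) -> Stage:
    """Move a model to `target` only if the transition is allowed and its gate was passed."""
    if target not in ALLOWED_TRANSITIONS[current]:
        raise ValueError(f"cannot move from {current.value} to {target.value}")
    # Returning to development is always permitted; forward moves need a passed gate
    if target is not Stage.DEVELOPMENT and not gate_passed:
        raise PermissionError(f"gate for {target.value} not passed")
    return target
```

For example, `advance(Stage.DEVELOPMENT, Stage.VALIDATION, gate_passed=True)` succeeds once documentation is complete, while jumping straight from development to deployment raises `ValueError`. Keeping the transition table as data makes it easy to review and to tighten for higher risk levels.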

---

## 66.2 Model Documentation

Comprehensive documentation is the foundation of governance. It should capture all aspects of a model's development and intended use.

### 66.2.1 What to Document

- **Business Objective**: What problem does the model solve? (e.g., predict next‑day price direction for NEPSE stocks)
- **Data Sources**: Description of data, including provenance, time period, and any preprocessing.
- **Feature Engineering**: List of features and how they are computed.
- **Model Architecture**: Type of model (e.g., XGBoost, LSTM), hyperparameters, training algorithm.
- **Training and Validation**: How data was split, performance metrics (accuracy, precision, recall, etc.).
- **Limitations**: Known weaknesses, such as poor performance during high‑volatility periods.
- **Intended Use**: Who should use the model and for what purpose? Who should not?
- **Ethical Considerations**: Potential biases, fairness checks.
- **Version History**: Changes over time.

### 66.2.2 Example Documentation Outline for NEPSE Model

```
Model Name: NEPSE Daily Direction Predictor v1.2
Date: 2025-03-15
Author: Data Science Team

1. Business Objective
   Predict whether the NEPSE index will close higher than the previous day.

2. Data Sources
   - NEPSE daily OHLCV data from 2015-01-01 to 2024-12-31 (CSV files).
   - Manually curated list of corporate actions (dividends, splits) from annual reports.

3. Feature Engineering
   - Lagged returns (1,2,3,5 days)
   - 20-day simple moving average
   - 14-day RSI
   - Volume ratio (current volume / 20-day average volume)
   - Day-of-week indicator (one-hot encoded)
   - Fiscal quarter indicator (Nepal-specific)

4. Model Architecture
   - Algorithm: XGBoost Classifier
   - Hyperparameters: n_estimators=200, max_depth=6, learning_rate=0.05, subsample=0.8
   - Training objective: binary logistic loss

5. Training and Validation
   - Training period: 2015-01-01 to 2023-12-31
   - Validation period: 2024-01-01 to 2024-06-30
   - Test period: 2024-07-01 to 2024-12-31
   - Performance (test): Accuracy = 0.62, Precision = 0.64, Recall = 0.59, F1 = 0.61

6. Limitations
   - Performance drops during election periods (accuracy ~0.52).
   - Does not incorporate news sentiment or macroeconomic data.
   - Trained only on daily data; not suitable for intraday predictions.

7. Intended Use
   - Generate trading signals for a quantitative strategy with appropriate risk management.
   - Not intended for use as the sole basis for investment decisions.

8. Ethical Considerations
   - Model was tested for performance across different market cap segments; no significant bias found.
   - No personal data used.

9. Version History
   v1.0: Initial release (2024-01-15)
   v1.1: Added fiscal quarter features (2024-06-20)
   v1.2: Retrained with additional data through 2024 (2025-03-15)
```

---

## 66.3 Model Cards

**Model cards** are a structured, transparent format for reporting model information. Introduced by researchers at Google, they provide a standardised way to communicate model details, intended use, and evaluation results. Model cards are particularly useful for sharing models with stakeholders or the public.

### 66.3.1 Structure of a Model Card

A typical model card includes:

- **Model Details**: Name, version, type, date, authors.
- **Intended Use**: Primary use cases, out‑of‑scope uses.
- **Factors**: Demographic or other groups considered in evaluation.
- **Metrics**: Performance measures and how they were computed.
- **Evaluation Data**: Description of datasets used for evaluation.
- **Training Data**: Description of training data.
- **Quantitative Analyses**: Performance breakdowns by segments.
- **Ethical Considerations**: Potential biases, risks.
- **Caveats and Recommendations**: Known limitations, usage advice.

### 66.3.2 Generating a Model Card Programmatically

We can create a model card as a Markdown file or JSON using a template. Here's a Python example that generates a model card from MLflow run data.

```python
import json
from datetime import datetime

from mlflow.tracking import MlflowClient


def generate_model_card(run_id, model_name, model_version):
    """Build a JSON model card from an MLflow run plus manually supplied details."""
    # Fetch the run from the MLflow tracking server
    client = MlflowClient()
    run = client.get_run(run_id)

    # Logged parameters come back as strings, metrics as floats
    params = run.data.params
    metrics = run.data.metrics

    # Build model card
    card = {
        "model_details": {
            "name": model_name,
            "version": model_version,
            "date": datetime.now().isoformat(),
            "type": params.get("model_type", "XGBoost"),
            "authors": ["Data Science Team"],
        },
        "intended_use": {
            "primary_uses": ["Predict next-day direction of NEPSE index"],
            "out_of_scope": ["Intraday trading", "Individual stock prediction"],
        },
        "factors": {
            "relevant_factors": ["Market conditions", "Day of week"],
            "evaluation_groups": ["Bull market period", "Bear market period", "Election period"],
        },
        "metrics": {
            # metrics.get() returns None if a metric was never logged to the run
            "accuracy": metrics.get("test_accuracy"),
            "precision": metrics.get("test_precision"),
            "recall": metrics.get("test_recall"),
            "f1": metrics.get("test_f1"),
        },
        "evaluation_data": {
            "description": "NEPSE daily data from 2024-07-01 to 2024-12-31",
            "size": int(params.get("test_size", 0)),
        },
        "training_data": {
            "description": "NEPSE daily data from 2015-01-01 to 2023-12-31 "
                           "(validation: 2024-01-01 to 2024-06-30)",
            "size": int(params.get("train_size", 0)),
        },
        "quantitative_analyses": {
            # Entered by hand here; ideally compute these from the test-set
            # predictions for each period and log them to the run
            "accuracy_by_period": {
                "bull": 0.65,
                "bear": 0.58,
                "election": 0.52,
            }
        },
        "ethical_considerations": {
            "bias_assessment": "Model performance was consistent across stock sectors.",
            "data_privacy": "No personal data used.",
        },
        "caveats": [
            "Performance degrades during periods of political instability.",
            "Model does not account for news sentiment.",
        ],
    }

    # Save to file
    with open(f"model_card_{model_name}_v{model_version}.json", "w", encoding="utf-8") as f:
        json.dump(card, f, indent=2)

    return card


# Example usage ("abc123" is a placeholder for a real MLflow run ID)
card = generate_model_card("abc123", "NEPSE_Predictor", 5)
```

**Explanation:**  
This function pulls information from an MLflow run and combines it with manually entered details to produce a JSON model card. The card can be stored alongside the model in the registry or shared with stakeholders.
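Since model cards are often read by non-technical stakeholders, it can help to publish a Markdown rendering next to the JSON. The following is a minimal sketch that assumes the card has the same nested layout as the dictionary built by `generate_model_card`; the helper names `model_card_to_markdown` and `_format_value` are hypothetical.

```python
def _format_value(value):
    """Flatten nested lists/dicts into a single readable string."""
    if value is None:
        return "n/a"
    if isinstance(value, list):
        return ", ".join(str(v) for v in value)
    if isinstance(value, dict):
        return ", ".join(f"{k}: {v}" for k, v in value.items())
    return str(value)


def model_card_to_markdown(card):
    """Render a model-card dict (same layout as generate_model_card) as Markdown."""
    details = card["model_details"]
    lines = [f"# Model Card: {details['name']} v{details['version']}", ""]
    for section, content in card.items():
        # "model_details" -> "Model Details"
        lines.append(f"## {section.replace('_', ' ').title()}")
        lines.append("")
        if isinstance(content, dict):
            for key, value in content.items():
                lines.append(f"- **{key.replace('_', ' ')}**: {_format_value(value)}")
        elif isinstance(content, list):
            lines.extend(f"- {item}" for item in content)
        else:
            lines.append(_format_value(content))
        lines.append("")
    return "\n".join(lines)
```

Writing `model_card_to_markdown(card)` to a `.md` file alongside the JSON keeps a single source of truth (the JSON) while giving reviewers a readable view. Unpopulated metrics show up as `n/a`, which makes gaps obvious.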

---

## 66.4 Data Sheets

Similar to model cards, **data sheets** document the datasets used for training and evaluation. They promote transparency about data provenance, collection methods, and potential biases.

### 66.4.1 Content of a Data Sheet

- **Dataset Description**: Name, source, intended use.
- **Collection Method**: How data was gathered (e.g., API, manual entry).
- **Preprocessing**: Cleaning steps, feature engineering.
- **Distribution**: Time period, number of samples, feature types.
- **Known Issues**: Missing data, errors, biases.
- **Recommended Splits**: Training, validation, test.

### 66.4.2 Example Data Sheet for NEPSE Data

```yaml
dataset_name: NEPSE Daily OHLCV
version: 2024.1
source: Nepal Stock Exchange (public CSV files)
collection_method: Downloaded daily from exchange website; scripted download from 2015.
time_period: 2015-01-01 to 2024-12-31
number_of_samples: ~2500 trading days
features:
  - Date: trading date
  - Open: opening price (float)
  - High: daily high (float)
  - Low: daily low (float)
  - Close: closing price (float)
  - Volume: number of shares traded (integer)
  - Prev_Close: previous day's close (float)
preprocessing:
  - Removed rows with zero volume (non‑trading days)
  - Forward‑filled missing prices (rare)
  - Added derived features (returns, moving averages) separately
known_issues:
  - Data from 2020-03 to 2020-05 may have increased volatility due to COVID‑19.
  - Some older data may have inconsistent formatting (corrected in preprocessing).
recommended_splits:
  - Train: 2015-01-01 to 2023-12-31
  - Validation: 2024-01-01 to 2024-06-30
  - Test: 2024-07-01 to 2024-12-31
```

Data sheets can be stored as YAML or JSON alongside the dataset.
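A data sheet is most useful when it is also checked against the data itself. The sketch below assumes the YAML above has been loaded into a dict (for example with PyYAML's `yaml.safe_load`, which turns each `- Date: trading date` entry into a one-key mapping) and that rows arrive as string dicts, as `csv.DictReader` produces, with ISO-formatted dates. The function name and the specific checks are illustrative choices, not a standard.

```python
from datetime import date


def validate_against_datasheet(sheet, rows):
    """Check raw rows (dicts of strings, e.g. from csv.DictReader) against a data sheet.

    Returns a list of human-readable problems; an empty list means no issues were found.
    """
    problems = []
    # Each feature entry is a one-key mapping such as {"Date": "trading date"}
    expected = [name for feature in sheet["features"] for name in feature]
    start_text, end_text = (part.strip() for part in sheet["time_period"].split(" to "))
    start, end = date.fromisoformat(start_text), date.fromisoformat(end_text)

    for i, row in enumerate(rows):
        missing = [col for col in expected if col not in row]
        if missing:
            problems.append(f"row {i}: missing columns {missing}")
            continue
        day = date.fromisoformat(row["Date"])  # assumes YYYY-MM-DD dates
        if not start <= day <= end:
            problems.append(f"row {i}: date {day} outside {start} to {end}")
        # The sheet says zero-volume (non-trading) rows were removed in preprocessing
        if float(row["Volume"]) == 0:
            problems.append(f"row {i}: zero volume")
        if float(row["High"]) < float(row["Low"]):
            problems.append(f"row {i}: High is below Low")
    return problems
```

Running such a check in the data pipeline turns the data sheet from passive documentation into an enforced contract: if the data drifts from what the sheet claims, the pipeline reports it.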

---

## 66.5 Approval Processes

A governance framework requires formal approval at key stages: before deployment, after major changes, and periodically. Approval workflows can be implemented using tools like Jira, ServiceNow, or even Git pull requests with checklists.

### 66.5.1 Example Approval Workflow

1. **Model Development**: Data scientist creates a model and documents it (model card, data sheet).
2. **Model Validation**: An independent validator (or a peer) reviews the model, tests it on holdout data, and checks for compliance. They produce a validation report.
3. **Risk Assessment**: The model owner assesses the risk level and proposes mitigations.
4. **Approval Committee**: A committee (including risk, compliance, and business representatives) reviews the documentation and validation report. They either approve, request changes, or reject.
5. **Deployment**: Upon approval, the model is deployed to production.
6. **Post‑Deployment Review**: After a period (e.g., 3 months), a review is conducted to ensure the model is performing as expected.
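The committee decision in step 4 can be recorded as structured data rather than as an email thread. The sketch below assumes a hypothetical policy: any rejection rejects the request, any request for changes sends it back, and approval requires an "approve" vote from every required role. The `ApprovalRequest` class is illustrative only.

```python
from dataclasses import dataclass, field

VALID_VOTES = {"approve", "request_changes", "reject"}


@dataclass
class ApprovalRequest:
    """One model version awaiting committee sign-off (hypothetical policy)."""
    model_name: str
    version: int
    required_roles: set
    votes: dict = field(default_factory=dict)  # role -> vote

    def record_vote(self, role, vote):
        if role not in self.required_roles:
            raise ValueError(f"{role!r} is not on the committee for this model")
        if vote not in VALID_VOTES:
            raise ValueError(f"unknown vote {vote!r}")
        self.votes[role] = vote  # a later vote from the same role replaces the earlier one

    def outcome(self):
        cast = set(self.votes.values())
        if "reject" in cast:
            return "rejected"
        if "request_changes" in cast:
            return "changes_requested"
        if self.required_roles <= set(self.votes):
            return "approved"  # every required role has voted, and all votes are approvals
        return "pending"
```

For example, `ApprovalRequest("NEPSE_Predictor", 5, {"Risk", "Compliance", "Business"})` stays `"pending"` until all three roles approve. Each call to `record_vote` is also a natural place to write an audit-trail entry (Section 66.7).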

### 66.5.2 Automating Approval with Git and CI/CD

We can encode some approval steps in our CI/CD pipeline. For example, a pull request that adds a new model could trigger automated tests and require a review from a designated approver before merging.

```yaml
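# .github/workflows/model_approval.yml
name: Model Approval

on:
  pull_request:
    paths:
      - 'models/**'
      - 'model_cards/**'

jobs:
  validate:
    runs-on: ubuntu-latest
    steps:
      - uses: actions/checkout@v4
      - uses: actions/setup-python@v5
        with:
          python-version: '3.11'
      - name: Install dependencies
        # Assumes the project's dependencies are listed in requirements.txt
        run: pip install -r requirements.txt
      - name: Run model tests
        run: pytest tests/test_model.py
      - name: Check model card completeness
        run: python scripts/check_model_card.py
      # Human approval is not a workflow step: it is enforced by branch
      # protection (required review from code owners) on the target branch.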
```

**Explanation:**  
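This workflow runs the model tests and a model-card completeness check on any PR that changes models or model cards. The human sign-off is enforced outside the workflow: a `CODEOWNERS` file assigns the `models/` and `model_cards/` paths to an approver team (for example, a line such as `/models/ @your-org/model-approvers`, where the team name is a placeholder), and a branch protection rule on the main branch requires these checks to pass and a review from code owners before merging. Together, they ensure that no model is merged without passing tests and proper review.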
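The workflow calls `scripts/check_model_card.py`, which the chapter does not show. A minimal sketch of what it might contain follows; the required-section list mirrors Section 66.3.1 and is an assumed policy, and the `model_cards/` directory matches the workflow's path filter.

```python
import json
from pathlib import Path

# Sections every card must contain (an assumed policy mirroring Section 66.3.1)
REQUIRED_SECTIONS = [
    "model_details", "intended_use", "factors", "metrics", "evaluation_data",
    "training_data", "quantitative_analyses", "ethical_considerations", "caveats",
]


def find_problems(card):
    """Return a list of completeness problems for one model-card dict."""
    problems = [f"missing section: {name}" for name in REQUIRED_SECTIONS if name not in card]
    # MLflow's metrics.get() yields None when a metric was never logged
    for name, value in card.get("metrics", {}).items():
        if value is None:
            problems.append(f"metric not populated: {name}")
    if "caveats" in card and not card["caveats"]:
        problems.append("caveats must list at least one limitation")
    return problems


def main(card_dir="model_cards"):
    """Check every JSON card in card_dir; return a process exit code (0 = all complete)."""
    exit_code = 0
    for path in sorted(Path(card_dir).glob("*.json")):
        for problem in find_problems(json.loads(path.read_text(encoding="utf-8"))):
            print(f"{path}: {problem}")
            exit_code = 1
    return exit_code
```

To use it as the CI script, end the file with the usual `if __name__ == "__main__": sys.exit(main())` guard (importing `sys`), so that an incomplete card produces a nonzero exit code and fails the check.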

---

## 66.6 Risk Assessment

Model risk is the potential for adverse consequences from decisions based on model outputs. For the NEPSE system, risks include:

- **Financial loss** if the model's predictions lead to bad trades.
- **Reputational damage** if the model is publicly perceived as flawed.
- **Regulatory non‑compliance** if the model is used in a regulated activity without proper governance.

Risk assessment involves:

1. **Identifying risks**: What could go wrong?
2. **Assessing likelihood and impact**: How likely is each risk? How severe?
3. **Mitigating risks**: What controls can reduce likelihood or impact?
4. **Monitoring**: How will we detect if a risk materialises?

### 66.6.1 Example Risk Assessment for NEPSE Model

| Risk | Likelihood | Impact | Mitigation |
|------|------------|--------|------------|
| Model predicts incorrectly during high volatility | Medium | High (financial loss) | Implement stop‑loss; use ensemble models; monitor volatility and disable predictions during extreme periods. |
| Data feed error (missing prices) | Low | Medium | Use data validation checks; fallback to last known price. |
| Model overfits to historical data | Medium | Medium | Use walk‑forward validation; monitor performance decay; retrain regularly. |
| Regulatory scrutiny | Low | High | Maintain full documentation; ensure explainability. |

This risk assessment should be documented and reviewed periodically.
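To prioritise entries like these, many teams convert the qualitative ratings into a score. The sketch below uses a simple likelihood × impact product on a 1 to 3 scale; the numeric mapping and the rating thresholds are illustrative assumptions, not a regulatory standard.

```python
# Ordinal scales; the numeric values and thresholds are illustrative assumptions
LEVELS = {"Low": 1, "Medium": 2, "High": 3}


def risk_rating(likelihood, impact):
    """Return (score, rating) where score = likelihood x impact on a 1-9 scale."""
    score = LEVELS[likelihood] * LEVELS[impact]
    if score >= 6:
        return score, "High"
    if score >= 3:
        return score, "Medium"
    return score, "Low"


# The risks from the table above as (description, likelihood, impact)
risks = [
    ("Model predicts incorrectly during high volatility", "Medium", "High"),
    ("Data feed error (missing prices)", "Low", "Medium"),
    ("Model overfits to historical data", "Medium", "Medium"),
    ("Regulatory scrutiny", "Low", "High"),
]

# Highest-scoring risks first, so review effort goes where it matters most
register = sorted(
    ((name, *risk_rating(likelihood, impact)) for name, likelihood, impact in risks),
    key=lambda entry: entry[1],
    reverse=True,
)
for name, score, rating in register:
    print(f"{rating:<6} {score}  {name}")
```

With this mapping, the high-volatility risk scores 6 and lands at the top of the register, while the data feed error scores 2. Storing the register as data also makes periodic reviews easy to diff.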

---

## 66.7 Audit Trails

An audit trail is a chronological record of all activities related to a model: changes to code, data, parameters, approvals, and deployments. This is essential for accountability and for responding to audits.

### 66.7.1 What to Log

- Model version changes (who, when, why)
- Data version changes
- Hyperparameter changes
- Approval decisions (who approved, when, based on what)
- Deployment events
- Monitoring alerts and responses

### 66.7.2 Implementing Audit Trails

We can leverage existing tools:

- **Git** for code and configuration changes.
- **DVC** for data version changes.
- **MLflow** for experiment and model registry changes.
- **Airflow** for pipeline run logs.
- **Cloud logging** (e.g., AWS CloudTrail) for infrastructure changes.

For a unified audit log, we can write a simple logging function that records events to a database or a file.

```python
import json
from datetime import datetime, timezone


def audit_log(event_type, user, details, path="audit.log"):
    """Append one audit event as a JSON line to an append-only file."""
    log_entry = {
        # Timezone-aware UTC timestamps keep entries comparable across machines
        "timestamp": datetime.now(timezone.utc).isoformat(),
        "event_type": event_type,
        "user": user,
        "details": details,
    }
    # Append to a file (or send to a database)
    with open(path, "a", encoding="utf-8") as f:
        f.write(json.dumps(log_entry) + "\n")


# Example
audit_log("model_approval", "alice", {"model_name": "NEPSE_Predictor", "version": 5, "decision": "approved"})
```

**Explanation:**  
This simple function writes structured logs. In production, you would use a more robust system like ELK stack or a dedicated audit database.
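One property a plain append-only file lacks is tamper evidence: anyone with write access can silently edit an old line. A common technique is hash chaining, where each entry stores a hash of its own content plus the previous entry's hash. The in-memory sketch below illustrates the idea; the function names are hypothetical, and a real system would persist the entries and store the latest hash somewhere independent.

```python
import hashlib
import json
from datetime import datetime, timezone

GENESIS = "0" * 64  # placeholder "previous hash" for the first entry


def _digest(body):
    """SHA-256 of an entry's content, serialised deterministically."""
    return hashlib.sha256(json.dumps(body, sort_keys=True).encode("utf-8")).hexdigest()


def append_chained(log, event_type, user, details):
    """Append an entry whose hash covers its content and the previous entry's hash."""
    entry = {
        "timestamp": datetime.now(timezone.utc).isoformat(),
        "event_type": event_type,
        "user": user,
        "details": details,
        "prev_hash": log[-1]["hash"] if log else GENESIS,
    }
    entry["hash"] = _digest(entry)
    log.append(entry)
    return entry


def verify_chain(log):
    """Detect edited, deleted, or reordered entries.

    Truncation of the newest entries is NOT detected unless the latest hash is kept elsewhere.
    """
    prev_hash = GENESIS
    for entry in log:
        body = {k: v for k, v in entry.items() if k != "hash"}
        if body["prev_hash"] != prev_hash or entry["hash"] != _digest(body):
            return False
        prev_hash = entry["hash"]
    return True
```

Because each hash depends on everything before it, changing one historical approval breaks verification for that entry and all later ones, which is exactly the kind of signal an auditor needs.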

---

## 66.8 Compliance

Depending on the jurisdiction and application, various regulations may apply.

### 66.8.1 GDPR (General Data Protection Regulation)

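If the NEPSE system ever processes personal data (e.g., trader IDs), GDPR applies when the operator is established in the EU or offers services to, or monitors, people in the EU (Article 3). Key requirements:

- **Transparency about automated decisions**: Individuals have rights concerning decisions based solely on automated processing that significantly affect them (Article 22), including meaningful information about the logic involved (Articles 13 to 15). Whether this amounts to a full "right to explanation" is debated, but in practice it requires interpretability.
- **Right to erasure** (Article 17): Individuals can request deletion of their data. Models may need to be retrained without that data.
- **Data protection by design** (Article 25): Privacy must be embedded into the system.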

For the NEPSE system, if we only use market data, GDPR may not apply. However, if we expand to include user‑specific data, we must comply.

### 66.8.2 Financial Regulations

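If the model is used for trading, it may fall under regulations such as **MiFID II** (Europe) or **SEC rules** (US); in Nepal, the securities market, including NEPSE, is regulated by the Securities Board of Nepal (SEBON). These regimes often require: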

- **Algorithm testing**: Proof that the algorithm has been tested and does not create disorderly trading conditions.
- **Record keeping**: All algorithm changes and trading decisions must be logged.
- **Disclosure**: Regulators (and, depending on the regime, clients) must be informed when algorithms are used.

For a personal or educational project, these may not apply, but it's good to be aware.

### 66.8.3 SOC 2

SOC 2 is an attestation framework, defined by the AICPA, under which an independent auditor reports on a service organisation's controls against the Trust Services Criteria: security, availability, processing integrity, confidentiality, and privacy. A clean SOC 2 report demonstrates that appropriate controls are in place. Relevant controls for ML include change management, access control, and monitoring.

---

## 66.9 Ethics and Fairness

Ethical AI goes beyond compliance. It ensures that models do not perpetuate bias, discriminate against groups, or cause harm.

### 66.9.1 Fairness in Financial Models

For the NEPSE system, fairness might involve checking that the model performs equally well for stocks of different sectors, sizes, or liquidity. If it systematically underperforms for small‑cap stocks, that could be a bias.

### 66.9.2 Fairness Metrics

Common metrics include:

- **Demographic parity**: The proportion of positive predictions should be similar across groups.
- **Equal opportunity**: True positive rates should be similar across groups.
- **Predictive parity**: Positive predictive value should be similar.

We can compute these if we have group labels (e.g., sector).

```python
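import numpy as np
import pandas as pd
from sklearn.metrics import confusion_matrix


def fairness_metrics(y_true, y_pred, groups):
    """Per-group positive rate, true positive rate (TPR), and positive predictive value (PPV)."""
    # Work with positional arrays so pandas index alignment cannot mismatch rows
    y_true = np.asarray(y_true)
    y_pred = np.asarray(y_pred)
    groups = pd.Series(groups).reset_index(drop=True)
    results = {}
    for group in groups.unique():
        mask = (groups == group).to_numpy()
        # labels=[0, 1] keeps the matrix 2x2 even if a group contains only one class
        tn, fp, fn, tp = confusion_matrix(y_true[mask], y_pred[mask], labels=[0, 1]).ravel()
        # Undefined ratios are reported as NaN rather than 0
        results[group] = {
            "positive_rate": (tp + fp) / mask.sum(),  # demographic parity
            "TPR": tp / (tp + fn) if (tp + fn) > 0 else float("nan"),  # equal opportunity
            "PPV": tp / (tp + fp) if (tp + fp) > 0 else float("nan"),  # predictive parity
        }
    return pd.DataFrame(results).T


# Example: test_data, y_test and y_pred are assumed to come from the evaluation
# step of earlier chapters and to be row-aligned with each other
groups = test_data["sector"]  # sector labels
fairness_df = fairness_metrics(y_test, y_pred, groups)
print(fairness_df)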
```

**Explanation:**  
We compute the positive prediction rate, true positive rate (TPR), and positive predictive value (PPV) for each sector, matching the three metrics above. If they vary widely, the model may be unfair. This should be documented and, if necessary, mitigated (e.g., by reweighting or collecting more data for underperforming groups).
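"Vary widely" needs a concrete threshold before it can be enforced. One simple approach is to compare the largest and smallest value of each metric across groups and flag gaps above a tolerance. In the sketch below, the 0.1 tolerance is an arbitrary example, and the input format is what `fairness_df.to_dict(orient="index")` produces (group name mapped to a dict of metric values).

```python
import math


def disparity_report(per_group, tolerance=0.1):
    """Return {metric: (max - min gap across groups, flagged)} for each metric.

    per_group maps group name -> {metric name: value}; NaN values are ignored.
    """
    metrics = {m for values in per_group.values() for m in values}
    report = {}
    for metric in sorted(metrics):
        values = [
            v[metric] for v in per_group.values()
            if metric in v and not math.isnan(v[metric])
        ]
        if len(values) < 2:
            continue  # a gap needs at least two groups with defined values
        gap = max(values) - min(values)
        report[metric] = (round(gap, 4), gap > tolerance)
    return report
```

For example, `disparity_report(fairness_df.to_dict(orient="index"))` returns the gap for `positive_rate`, `TPR`, and `PPV`, together with a flag that a CI check or monitoring job could act on. A max-minus-min gap is easy to explain; ratio-based criteria are a common alternative.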

### 66.9.3 Ethical Review

An ethical review should consider:

- **Purpose**: Is the model being used for a beneficial purpose?
- **Transparency**: Are users aware they are interacting with an AI?
- **Accountability**: Who is responsible if the model causes harm?
- **Redress**: Is there a mechanism for users to challenge decisions?

For the NEPSE system, if it is used for personal trading, these questions are less critical. But if deployed in a professional context, they must be addressed.

---

## 66.10 Best Practices

Summarising the key best practices for model governance:

1. **Document everything**: Use model cards and data sheets.
2. **Establish clear roles**: Who develops, validates, approves, deploys.
3. **Implement approval workflows**: Formal sign‑offs for high‑risk models.
4. **Maintain audit trails**: Record all changes and decisions.
5. **Assess risk regularly**: Update risk assessments as the model evolves.
6. **Monitor for fairness and bias**: Check performance across groups.
7. **Ensure compliance**: Understand and adhere to relevant regulations.
8. **Plan for model retirement**: Define criteria for decommissioning.
9. **Educate the team**: Ensure everyone understands governance responsibilities.
10. **Automate where possible**: Use CI/CD pipelines to enforce governance gates.

---

## Chapter Summary

In this chapter, we explored model governance, a critical discipline for responsible AI. We covered:

- The components of a governance framework: policies, roles, processes.
- Comprehensive model documentation and the use of model cards and data sheets.
- Approval workflows and how to implement them with CI/CD.
- Risk assessment to identify and mitigate potential harms.
- Audit trails to track model changes and decisions.
- Compliance with regulations like GDPR and financial rules.
- Ethics and fairness, including metrics to detect bias.
- Best practices for embedding governance into the ML lifecycle.

For the NEPSE prediction system, adopting governance practices ensures that the model is developed and deployed responsibly, with transparency and accountability. Even for a personal project, these habits prepare you for professional environments where governance is mandatory.

In the next chapter, we will discuss **Infrastructure as Code**, which helps manage the infrastructure supporting the NEPSE system in a declarative, version‑controlled way.

---

**End of Chapter 66**