# Module 20 - Transparency and Interpretability

## Module Overview

During this programme, you have learned about a wide range of machine learning methods, their potential applications, and their advantages for solving data-based problems. But you have only scratched the surface of the unintended consequences that can occur when the data, or the model itself, has pre-existing biases baked in. This module covers several strategies for improving the transparency and interpretability of models so that biases can be more easily recognised and taken into consideration when analysing a model’s results.

## Learning outcomes

- LO 1: Define transparency and interpretability.
- LO 2: Recognise bias inherent in data.
- LO 3: Develop a datasheet for a data set.
- LO 4: Recognise bias in the creation of machine learning models.
- LO 5: Identify ways to benchmark models against reporting standards.
- LO 6: Identify trade-offs between explainability and interpretability.
- LO 7: Build an interpretable decision tree.
- LO 8: Refine a codebase for machine learning competitions.

## Module Summary Description

This module explores the critical concepts of transparency and interpretability in machine learning. As models increasingly influence real-world decisions, understanding their inner workings and the data they rely on is essential. The module highlights how machine learning systems can unintentionally reinforce bias, particularly when trained on historical or unbalanced data. Through real-world case studies, such as biased recruitment algorithms and wrongful arrests due to facial recognition errors, learners examine the ethical consequences of opaque model behaviour.

To mitigate these risks, the module introduces tools like Datasheets for Datasets and Model Cards, which improve documentation, transparency, and accountability. Learners also explore methods to detect and reduce bias in data and models, and build interpretable models—such as decision trees—that offer human-understandable reasoning. The module concludes with a comparison between Explainable AI (post hoc interpretation of complex models) and Interpretable AI (models that are inherently transparent), emphasising the trade-offs between model complexity and human comprehension.

Additional Reading: https://excavating.ai/

## Introduction

As we deploy machine learning in real-world settings, we must ask critical questions: 
- Why did the model make the decision it made?
- Is the model aligned with our ethical and societal values?
- Is the model robust, or is it vulnerable to adversarial manipulation?

In addition, it is essential to understand and comply with regulatory standards, particularly the UK General Data Protection Regulation (UK GDPR), which requires fair and transparent processing of personal data and gives individuals rights in relation to automated decision-making, including access to meaningful information about the logic involved.

## Consequences of bad ML predictions

- **Discrimination and bias**: Machine learning models can unintentionally encode and reinforce societal biases, particularly around race, gender, and other protected characteristics.
- **Case study – wrongful arrest**: In 2020, the first publicly known case of a wrongful arrest resulting from a faulty facial recognition match was reported. An innocent man was arrested, highlighting the real-world dangers of biased models in high-stakes scenarios like law enforcement.
- **Case study – Amazon recruitment tool**: Amazon developed a recruitment algorithm that penalised CVs containing the word “women’s” (as in “women’s chess club captain”) and downgraded graduates of all-women’s colleges. This occurred because the system had learned from historical hiring patterns, which were themselves biased.

These examples demonstrate the importance of scrutinising both the training data and model behaviour before deployment, especially in contexts where fairness and ethics are critical.

## Data-related issues and how to address them

**Undersampled populations**: When certain groups are underrepresented in training data (e.g. fewer Black faces than white faces in a facial recognition dataset), models may perform poorly on those groups.

*Solution*: Augment the dataset with more diverse examples or apply data balancing techniques during training.
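
One simple balancing technique is to reweight examples so that each group contributes equally to the training loss. The sketch below is illustrative only: the function name `inverse_frequency_weights` and the group labels are hypothetical, and the formula `total / (n_groups * count)` is one common heuristic rather than the only option.

```python
from collections import Counter

def inverse_frequency_weights(groups):
    """Return one weight per example so every group has the same total weight."""
    counts = Counter(groups)
    n_groups = len(counts)
    total = len(groups)
    # Rare groups get large weights, common groups get small ones.
    return [total / (n_groups * counts[g]) for g in groups]

# Hypothetical training set: 8 examples from group "A", 2 from group "B".
groups = ["A"] * 8 + ["B"] * 2
weights = inverse_frequency_weights(groups)

weight_a = sum(w for w, g in zip(weights, groups) if g == "A")
weight_b = sum(w for w, g in zip(weights, groups) if g == "B")
print(weights[0], weights[-1], weight_a, weight_b)
```

Here each "A" example receives weight 0.625 and each "B" example 2.5, so both groups carry a total weight of 5.0. Reweighting does not add information about the underrepresented group; collecting more diverse data remains the stronger fix where it is feasible.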
  
**Embedded perspectives**: Bias can be introduced through subjective choices during data annotation or collection, such as the assumptions of annotators or framing of questions.

*Solution*: Use diverse annotation teams, clear guidelines, and audit processes to reduce unintentional bias.

**Lack of transparency in dataset provenance**: Without context about how data was gathered, used, and annotated, it's difficult to evaluate potential sources of bias or ethical concerns.

*Solution*: Adopt Datasheets for Datasets. This is a documentation framework that captures important metadata about datasets, including motivation, composition, collection methodology, potential uses, and ethical considerations.

Reference: Gebru et al. (2018), *Datasheets for Datasets*, https://arxiv.org/abs/1803.09010. This influential paper proposes a standardised way to document datasets, promoting transparency, reproducibility, and accountability in machine learning.
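
A datasheet can also be kept alongside the data as a small structured record, which makes it easy to check that no section has been skipped. The sketch below is a minimal illustration, not the template from the paper: the `Datasheet` class, its field names (taken from the sections listed above), and the example dataset are all hypothetical.

```python
from dataclasses import dataclass, fields

@dataclass
class Datasheet:
    """Minimal datasheet skeleton; one free-text field per section."""
    motivation: str = ""
    composition: str = ""
    collection_process: str = ""
    intended_uses: str = ""
    ethical_considerations: str = ""

    def missing_sections(self):
        """Return the names of sections that are still empty."""
        return [f.name for f in fields(self) if not getattr(self, f.name).strip()]

# Hypothetical, partially completed datasheet for an invented dataset.
sheet = Datasheet(
    motivation="Collected to study loan default risk.",
    composition="10,000 anonymised loan applications.",
)
print(sheet.missing_sections())
```

A check like `missing_sections()` could run before a dataset is released, so that incomplete documentation is caught early.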



## Recognising bias in models

When developing or deploying machine learning models, it's crucial to consider the perspectives and responsibilities of various stakeholders involved:
- ML and AI practitioners – Evaluate model performance, monitor fairness metrics, and identify bias during development.
- Model developers – Make design choices about training data, algorithms, and model architecture, all of which influence outcomes.
- Software developers – Integrate model predictions into applications and systems; must understand the model’s limitations.
- Policy makers – Assess how a model may impact different groups in society, especially with regard to fairness, transparency, and compliance.
- Organisations – Decide whether to adopt ML technologies, considering ethical, reputational, and legal implications.
- Affected individuals – Have the right to understand how a model works, particularly if it influences their lives (e.g. hiring, lending, law enforcement).

**Model cards** are a documentation framework designed to improve transparency and accountability in machine learning models. Much like Datasheets for Datasets, model cards aim to provide essential information that helps stakeholders understand when and how a model should (or should not) be used.

They help reduce the risk of inappropriate deployment by clearly stating the model’s capabilities, limitations, and ethical considerations.

A standard model card typically includes: 
- Intended uses – What the model is designed for, and what it is not intended for.
- Relevant factors – Characteristics (e.g., demographic, environmental) that may affect model performance.
- Metrics – The quantitative performance measures used to evaluate the model (e.g., accuracy, precision, or fairness measures).
- Training data – Information about the data used to train the model, including sources and potential biases.
- Evaluation data – Details of datasets used to test the model, and their similarity to real-world use cases.
- Quantitative analysis – Results broken down by different subgroups (e.g., race, gender) to identify potential disparities.
- Ethical considerations – Potential risks, unintended uses, and steps taken to mitigate harm.
- Model details – Technical specifications, such as algorithm type, version, and hardware requirements.

By documenting and sharing this information, model cards empower all stakeholders to make more informed decisions and foster responsible AI development.
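
The "quantitative analysis" entry of a model card relies on disaggregated evaluation: computing the same metric separately for each subgroup. The sketch below shows the idea with accuracy. The function name `accuracy_by_group` and all labels, predictions, and group assignments are hypothetical.

```python
from collections import defaultdict

def accuracy_by_group(y_true, y_pred, groups):
    """Return a dict mapping each group to the accuracy on its examples."""
    correct = defaultdict(int)
    total = defaultdict(int)
    for truth, pred, group in zip(y_true, y_pred, groups):
        total[group] += 1
        correct[group] += int(truth == pred)
    return {g: correct[g] / total[g] for g in total}

# Hypothetical labels, predictions and group membership for 8 examples.
y_true = [1, 0, 1, 1, 0, 1, 0, 0]
y_pred = [1, 0, 1, 1, 0, 0, 1, 0]
groups = ["A", "A", "A", "A", "B", "B", "B", "B"]

per_group = accuracy_by_group(y_true, y_pred, groups)
overall = sum(t == p for t, p in zip(y_true, y_pred)) / len(y_true)
print(per_group, overall)
```

In this invented example the overall accuracy is 0.75, which hides the fact that group "A" is classified perfectly (1.0) while group "B" is at 0.5. Reporting only the aggregate figure would conceal exactly the kind of disparity a model card is meant to surface.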


## Explainable and interpretable AI

As machine learning systems become more complex, it becomes increasingly important to understand why a model makes a particular decision. This is where Explainable AI (XAI) and Interpretable AI come into play.

### Key Differences
**Explainable AI**
- Develops post hoc explanations for complex or "black box" models (e.g., deep neural networks).
- Example: In image classification, an XAI tool might highlight regions of an image that contributed most to the prediction (e.g., saliency maps or heatmaps).
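
A minimal sketch of a post hoc explanation is occlusion-style attribution: replace one input feature at a time with a baseline value and record how much the model output changes. The `black_box` function below is a hypothetical stand-in for an opaque model, and the zero baseline is an assumption; real saliency tools for images apply the same idea to pixels or patches.

```python
def black_box(x):
    """Stand-in for an opaque model: we may call it but not inspect it."""
    return 3.0 * x[0] + 0.5 * x[1] * x[2] - 2.0 * x[3]

def occlusion_attribution(model, x, baseline):
    """Score each feature by the output change when it is set to its baseline."""
    reference = model(x)
    scores = []
    for i in range(len(x)):
        occluded = list(x)
        occluded[i] = baseline[i]  # "remove" feature i
        scores.append(reference - model(occluded))
    return scores

x = [1.0, 2.0, 4.0, 0.5]         # hypothetical input
baseline = [0.0, 0.0, 0.0, 0.0]  # assumed "feature absent" values
scores = occlusion_attribution(black_box, x, baseline)
print(black_box(x), scores)
```

For this input the model outputs 6.0 and the scores are `[3.0, 4.0, 4.0, -1.0]`. The two interacting features are each credited with the full interaction effect, so the scores do not add up to the output. This illustrates that a post hoc explanation is an approximation of the model, not the model itself.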

**Interpretable AI**
- Focuses on building models that are inherently understandable by humans. These models are transparent by design—examples include decision trees, linear models, or rule-based systems.
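
To make "transparent by design" concrete, the sketch below grows a very small decision tree and prints it as readable if/else rules. It is a simplified CART-style illustration under stated assumptions (binary labels, Gini impurity, thresholds taken from observed values, depth capped at 2), not a production implementation. The loan data and feature names are hypothetical.

```python
def gini(labels):
    """Gini impurity of a list of 0/1 labels."""
    if not labels:
        return 0.0
    p = sum(labels) / len(labels)
    return 2.0 * p * (1.0 - p)

def best_split(rows, labels):
    """Return (feature_index, threshold) with lowest weighted Gini, or None."""
    best, best_score = None, gini(labels)
    for j in range(len(rows[0])):
        for t in sorted({r[j] for r in rows}):
            left = [y for r, y in zip(rows, labels) if r[j] <= t]
            right = [y for r, y in zip(rows, labels) if r[j] > t]
            if not left or not right:
                continue
            score = (len(left) * gini(left) + len(right) * gini(right)) / len(rows)
            if score < best_score:  # only accept splits that reduce impurity
                best, best_score = (j, t), score
    return best

def build_tree(rows, labels, depth=0, max_depth=2):
    """Grow a tree as nested dicts; leaves hold the majority class."""
    split = best_split(rows, labels) if depth < max_depth else None
    if split is None:
        return {"predict": int(2 * sum(labels) >= len(labels))}
    j, t = split
    left = [(r, y) for r, y in zip(rows, labels) if r[j] <= t]
    right = [(r, y) for r, y in zip(rows, labels) if r[j] > t]
    return {
        "feature": j,
        "threshold": t,
        "left": build_tree([r for r, _ in left], [y for _, y in left], depth + 1, max_depth),
        "right": build_tree([r for r, _ in right], [y for _, y in right], depth + 1, max_depth),
    }

def to_rules(tree, names, indent=""):
    """Render the tree as nested if/else lines that a person can read."""
    if "predict" in tree:
        return [f"{indent}predict {tree['predict']}"]
    name, t = names[tree["feature"]], tree["threshold"]
    return ([f"{indent}if {name} <= {t}:"] + to_rules(tree["left"], names, indent + "    ")
            + [f"{indent}else:"] + to_rules(tree["right"], names, indent + "    "))

def predict(tree, row):
    """Follow the splits from the root down to a leaf."""
    while "predict" not in tree:
        tree = tree["left"] if row[tree["feature"]] <= tree["threshold"] else tree["right"]
    return tree["predict"]

# Hypothetical loan data: [income in thousands, missed payments] -> approved (1) or not (0).
rows = [[20, 0], [25, 1], [28, 2], [30, 0], [40, 0], [55, 0],
        [60, 1], [70, 1], [80, 0], [85, 4], [90, 0]]
labels = [0, 0, 0, 0, 1, 1, 1, 1, 1, 0, 1]

tree = build_tree(rows, labels)
rules = to_rules(tree, ["income_k", "missed_payments"])
print("\n".join(rules))
```

The printed rules are the model: every prediction can be traced by reading at most two conditions. Keeping `max_depth` small trades some accuracy for a tree that a non-expert can follow.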

### Desirable Qualities for Modern Explanations
To be effective, modern explanation methods should aim for:
- **Soundness and completeness:** The explanation must be faithful to the actual behaviour of the model and must not oversimplify it.
- **Computational tractability:** It must be feasible to generate the explanation without excessive computing resources.
- **Cognitive tractability:** The explanation should be understandable by humans, ideally non-experts, without requiring deep technical knowledge.

### Characteristics of Interpretable AI
Interpretable AI models often aim for simplicity and transparency. Key features include: 
- **Sparsity:** The model uses as few features or rules as possible (aligned with Occam’s Razor) to offer clear, concise reasoning.
- **Monotonicity:** The relationship between an input and the output moves in an expected direction (e.g., increasing income should not decrease a credit score).
- **Modularity:** The ability to break a model into interpretable components that can be understood individually.
- **Linearity:** Using linear relationships where possible makes the model easier to reason about.
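
Some of these properties can be checked directly. The sketch below tests monotonicity empirically by sweeping one feature over a grid while holding the others fixed. The `credit_score` function, its coefficients, and the applicant are hypothetical; the scoring function is deliberately sparse (two features) and linear. A grid check at a single base point is evidence, not proof, of monotonic behaviour.

```python
def is_monotone_increasing(score, base_input, feature, values):
    """Check that score never decreases as `feature` increases over `values`."""
    previous = None
    for v in sorted(values):
        current = score({**base_input, feature: v})  # vary one feature only
        if previous is not None and current < previous:
            return False
        previous = current
    return True

def credit_score(applicant):
    """Hypothetical sparse, linear score: two features, fixed coefficients."""
    return 300 + 2.0 * applicant["income_k"] - 40.0 * applicant["missed_payments"]

applicant = {"income_k": 40, "missed_payments": 1}

income_ok = is_monotone_increasing(credit_score, applicant, "income_k", range(0, 201, 10))
missed_ok = is_monotone_increasing(credit_score, applicant, "missed_payments", range(0, 6))
print(income_ok, missed_ok)
```

The check passes for income and fails for missed payments, as expected, since the score is designed to fall as missed payments rise. For a linear model the same conclusion follows from the sign of each coefficient, which is part of what makes linear models easy to reason about.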
