# AI Equity in Education: An Illustrative Example

## Overview

This notebook walks through practices for building equitable educational AI applications. Using a series of illustrative examples, it covers the development, evaluation, and maintenance of machine learning models that are both effective and fair across demographic groups, with a specific focus on gender equity.

## Purpose

The purpose of this notebook is to:

- Demonstrate the creation of synthetic datasets that simulate educational outcomes, with a particular emphasis on including gender as a key demographic variable. This allows us to explore equity-focused model development and evaluation.
- Evaluate model performance through various metrics and testing methodologies, ensuring that the models we develop do not perpetuate existing biases and are fair across different gender groups.
- Highlight the importance of continuous monitoring and maintenance of machine learning models to prevent the emergence of bias over time, ensuring that models remain equitable and effective in changing educational landscapes.

## AI Equity in Education

AI equity in education refers to the conscientious development and deployment of AI technologies that consider and actively address potential disparities in educational outcomes across different demographics. This notebook emphasizes gender equity, aiming to showcase how careful consideration of demographic factors during the model development process can lead to more inclusive and fair educational technologies.

## Illustrative Example

The example in this notebook uses a synthetic dataset to simulate an educational scenario in which gender is tracked as a key demographic variable. We walk through:

1. The generation of this dataset, ensuring it reflects a balanced representation of gender.
2. The training and evaluation of machine learning models on this dataset, with an emphasis on fairness and equity metrics.
3. Strategies for mitigating biases in these models, including post-hoc analysis and continuous monitoring for fairness.
4. The importance of data retention and deletion strategies to address ethical and privacy issues in educational data handling.

Through this illustrative example, we aim to provide a practical framework for AI practitioners looking to enhance equity in educational applications of machine learning, so that these technologies benefit all students fairly.

**Import Necessary Libraries**

In [1]:
import numpy as np
from sklearn.datasets import make_classification
from sklearn.model_selection import train_test_split
from sklearn.ensemble import RandomForestClassifier
from sklearn.metrics import accuracy_score, precision_score, recall_score, f1_score

**Generate Synthetic Dataset**

In [13]:
# Generate a synthetic dataset simulating student performance
# Feature 0 will hold gender (0 or 1); the other features stand in for academic and non-academic factors
X, y = make_classification(n_samples=1000, n_features=20, n_informative=15, n_redundant=5, n_classes=2, random_state=42, weights=[0.5, 0.5])

# Overwrite feature 0 with a random 0/1 gender flag. Uniform random draws give roughly,
# not exactly, equal representation, and the flag is independent of the label y.
X[:, 0] = np.random.choice([0, 1], size=X.shape[0])

# Split the dataset into training and testing sets
X_train, X_test, y_train, y_test = train_test_split(X, y, test_size=0.2, random_state=42)

**Train a Simple Model**

In [14]:
# Initialize and train the RandomForestClassifier
model = RandomForestClassifier(random_state=42)
model.fit(X_train, y_train)

**Evaluate Model Performance**

In [15]:
# Predictions
y_pred = model.predict(X_test)

# Evaluation metrics
accuracy = accuracy_score(y_test, y_pred)
precision = precision_score(y_test, y_pred)
recall = recall_score(y_test, y_pred)
f1 = f1_score(y_test, y_pred)

print(f'Accuracy: {accuracy:.2f}')
print(f'Precision: {precision:.2f}')
print(f'Recall: {recall:.2f}')
print(f'F1 Score: {f1:.2f}')

Accuracy: 0.90
Precision: 0.89
Recall: 0.90
F1 Score: 0.89
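
Aggregate scores like those above can hide gaps between groups. As a minimal sketch, the same metrics can be broken out by the gender flag. The block below rebuilds a comparable dataset so that it runs on its own; the seeded generator and the names `X_demo`, `clf`, and `metrics_by_group` are assumptions introduced here, not part of the original cells.

```python
import numpy as np
from sklearn.datasets import make_classification
from sklearn.ensemble import RandomForestClassifier
from sklearn.metrics import accuracy_score, precision_score, recall_score
from sklearn.model_selection import train_test_split

# Rebuild a dataset like the one above. The seeded generator is an assumption
# added for reproducibility; the notebook itself uses unseeded np.random.
rng = np.random.default_rng(0)
X_demo, y_demo = make_classification(n_samples=1000, n_features=20, n_informative=15,
                                     n_redundant=5, n_classes=2, random_state=42)
X_demo[:, 0] = rng.integers(0, 2, size=X_demo.shape[0])  # synthetic 0/1 gender flag
Xtr, Xte, ytr, yte = train_test_split(X_demo, y_demo, test_size=0.2, random_state=42)

clf = RandomForestClassifier(random_state=42).fit(Xtr, ytr)
pred = clf.predict(Xte)

def metrics_by_group(y_true, y_pred, group):
    """Return {group_value: {metric_name: value}} for each distinct group value."""
    report = {}
    for g in np.unique(group):
        mask = group == g
        report[int(g)] = {
            "n": int(mask.sum()),
            "accuracy": accuracy_score(y_true[mask], y_pred[mask]),
            "precision": precision_score(y_true[mask], y_pred[mask], zero_division=0),
            "recall": recall_score(y_true[mask], y_pred[mask], zero_division=0),
        }
    return report

group_report = metrics_by_group(yte, pred, Xte[:, 0].astype(int))
for g, m in group_report.items():
    print(f"gender={g}: n={m['n']}, accuracy={m['accuracy']:.2f}, "
          f"precision={m['precision']:.2f}, recall={m['recall']:.2f}")
```

Large differences between the two rows would be a prompt for closer inspection; with only about 100 test rows per group, small differences may be sampling noise.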


**Bias Mitigation: Verify and Adjust for Gender Balance in Synthetic Dataset**

In [16]:
# Verify the balance of the gender feature in the dataset
gender_counts = np.bincount(X[:, 0].astype(int))  # Assuming gender is represented by the first feature
print(f"Gender counts before adjustment: {gender_counts}")

# If the dataset is unbalanced, adjust it to ensure equal representation
if abs(gender_counts[0] - gender_counts[1]) > 0.05 * len(X):
    # Oversample the minority gender by the size of the gap
    difference = abs(gender_counts[0] - gender_counts[1])
    minority_class = 0 if gender_counts[0] < gender_counts[1] else 1
    minority_indices = np.where(X[:, 0] == minority_class)[0]
    # Sample with replacement so the gap can be filled even if it exceeds the minority count
    indices_to_add = np.random.choice(minority_indices, size=difference, replace=True)

    # Add samples from the minority class to achieve balance
    X_balanced = np.vstack((X, X[indices_to_add]))
    y_balanced = np.hstack((y, y[indices_to_add]))

    # Shuffle the balanced dataset
    shuffled_indices = np.random.permutation(len(X_balanced))
    X_balanced = X_balanced[shuffled_indices]
    y_balanced = y_balanced[shuffled_indices]

    print(f"Dataset balanced with {len(X_balanced)} samples.")
else:
    X_balanced = X
    y_balanced = y
    print("Dataset is already balanced.")

# Continue with the balanced dataset. If rows were added, redo the train/test split
# so that X_train and X_test reflect the rebalanced data.
X = X_balanced
y = y_balanced

Gender counts before adjustment: [506 494]
Dataset is already balanced.


**Serialize Model**

In [17]:
import pickle

# Serialize the model
with open('model.pkl', 'wb') as file:
    pickle.dump(model, file)

print("Model serialized and saved as 'model.pkl'")

Model serialized and saved as 'model.pkl'


**A/B Testing Simulation**

In [20]:
# Model A is `model`, trained in an earlier cell.
# Model B uses a different configuration (fewer trees, different seed) for comparison.
model_B = RandomForestClassifier(n_estimators=50, random_state=24)
# X_train and y_train come from the earlier split; the balance check left the data unchanged.
model_B.fit(X_train, y_train)

def ab_test(model_a, model_b, X, y):
    # Score each model on a different half of the data, as in a traffic-split A/B test
    split_index = len(X) // 2
    X_a, X_b = X[:split_index], X[split_index:]
    y_a, y_b = y[:split_index], y[split_index:]

    # Predict with both models
    y_pred_a = model_a.predict(X_a)
    y_pred_b = model_b.predict(X_b)

    # Calculate accuracy for both models
    accuracy_a = accuracy_score(y_a, y_pred_a)
    accuracy_b = accuracy_score(y_b, y_pred_b)

    return accuracy_a, accuracy_b

# Conduct A/B testing using the test set
accuracy_a, accuracy_b = ab_test(model, model_B, X_test, y_test)  # `model` is Model A from previous cells
print(f"Accuracy of Model A: {accuracy_a:.2f}")
print(f"Accuracy of Model B: {accuracy_b:.2f}")

Accuracy of Model A: 0.93
Accuracy of Model B: 0.87
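
Because each model is scored on a different half of the test set, part of the gap between 0.93 and 0.87 may be sampling noise. One hedged way to check is a two-proportion z-test, sketched below with the standard library only. The helper name `two_proportion_z_test` is introduced here for illustration, and the test assumes independent samples and a normal approximation.

```python
import math

def two_proportion_z_test(correct_a, n_a, correct_b, n_b):
    """Two-sided z-test for the difference between two accuracies (proportions)."""
    p_a, p_b = correct_a / n_a, correct_b / n_b
    pooled = (correct_a + correct_b) / (n_a + n_b)          # accuracy under the null hypothesis
    se = math.sqrt(pooled * (1 - pooled) * (1 / n_a + 1 / n_b))
    z = (p_a - p_b) / se
    p_value = math.erfc(abs(z) / math.sqrt(2))               # two-sided normal tail probability
    return z, p_value

# The 200-row test set is split in half, so each arm has 100 rows and the
# reported accuracies correspond to 93 and 87 correct predictions.
z, p_value = two_proportion_z_test(93, 100, 87, 100)
print(f"z = {z:.2f}, p = {p_value:.3f}")  # z = 1.41, p = 0.157
```

At these sample sizes the difference would not be significant at the usual 0.05 level, so the result above is not strong evidence that Model A is better.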


**Potential Impact Assessment of Output**

In [21]:
# Uses `model`, `X_test`, and `y_test` from previous cells

# Invariance Test
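def invariance_test(model, X, y, noise_scale=0.01):
    """
    Adds small Gaussian noise to the inputs and returns the fraction of predictions that change.
    For 0/1 labels the mean absolute difference equals that fraction, so 0.0 means no change.
    The argument `y` is unused.
    """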
    # Add noise
    X_noisy = X + np.random.normal(0, noise_scale, X.shape)
    y_pred = model.predict(X)
    y_pred_noisy = model.predict(X_noisy)
    variance = np.mean(np.abs(y_pred - y_pred_noisy))
    return variance

# Directional Expectation Test
def directional_expectation_test(model, X, y, feature_index, delta):
    """
    Returns the fraction of rows whose prediction moves from 0 to 1 when `delta` is added
    to one feature. The argument `y` is unused.
    """
    X_modified = X.copy()
    X_modified[:, feature_index] += delta  # Shift the chosen feature by delta
    y_pred = model.predict(X)
    y_pred_modified = model.predict(X_modified)
    directional_change = (y_pred_modified > y_pred).mean()  # Share of predictions that rose from 0 to 1
    return directional_change

# Minimum Functionality Test
def minimum_functionality_test(model, X, easy_samples):
    """
    Returns the fraction of 'easy' samples that the model assigns to class 1,
    the label expected for them. The argument `X` is unused.
    """
    y_pred_easy = model.predict(easy_samples)
    accuracy_easy = (y_pred_easy == 1).mean()  # Assumes class 1 is the correct label for every easy sample
    return accuracy_easy

# Perform tests
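variance = invariance_test(model, X_test, y_test)
# Feature 0 is the 0/1 gender flag, so delta=1 moves it to 1 or 2; a result of 0.00 means no prediction rose
directional_change = directional_expectation_test(model, X_test, y_test, feature_index=0, delta=1)
# Placeholder "easy" samples: random draws stand in for cases a domain expert would hand-pick,
# so the resulting score is illustrative only
easy_samples = np.random.normal(0, 1, (10, X_test.shape[1]))
accuracy_easy = minimum_functionality_test(model, X_test, easy_samples)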

print(f"Invariance Test Variance: {variance:.4f}")
print(f"Directional Expectation Change: {directional_change:.2f}")
print(f"Minimum Functionality Test Accuracy: {accuracy_easy:.2f}")

Invariance Test Variance: 0.0000
Directional Expectation Change: 0.00
Minimum Functionality Test Accuracy: 0.60
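
The directional test above shifts the gender flag by a fixed delta. A complementary check, sketched here as an assumption rather than as part of the original notebook, swaps the flag (0 becomes 1 and 1 becomes 0) and reports how often the prediction changes. The helper `gender_flip_rate` and the small demo dataset are hypothetical and exist only to make the block runnable on its own.

```python
import numpy as np
from sklearn.datasets import make_classification
from sklearn.ensemble import RandomForestClassifier

def gender_flip_rate(model, X, gender_index=0):
    """Fraction of rows whose prediction changes when the binary gender flag is swapped."""
    X_flipped = X.copy()
    X_flipped[:, gender_index] = 1 - X_flipped[:, gender_index]  # 0 -> 1, 1 -> 0
    return float(np.mean(model.predict(X) != model.predict(X_flipped)))

# Demo data built the same way as the notebook's dataset, with a seeded gender flag.
rng = np.random.default_rng(0)
X_demo, y_demo = make_classification(n_samples=500, n_features=20, n_informative=15,
                                     n_redundant=5, random_state=42)
X_demo[:, 0] = rng.integers(0, 2, size=len(X_demo))
clf = RandomForestClassifier(n_estimators=50, random_state=42).fit(X_demo, y_demo)

flip_rate = gender_flip_rate(clf, X_demo)
print(f"Predictions changed by swapping gender: {flip_rate:.2%}")
```

A rate near zero suggests the model does not rely directly on the gender column; it does not rule out indirect dependence through correlated features.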


## Maintenance of Machine Learning Models for Equity in Education

### Avoiding Bias Evolution Over Time

Bias in machine learning models can lead to unfair predictions, disproportionately affecting specific groups. For educational models, this could mean gender biases that affect student outcomes. To prevent bias evolution over time:

- **Disparate Impact Analysis**: Examine the impact of the model's decisions on different genders to measure outcome disparities. This helps identify any biases in predictions.
  
- **Fairness Metrics**: Utilize metrics such as Equal Opportunity Difference, Disparate Misclassification Rate, and Treatment Equality to assess the fairness of the model towards different genders.
  
- **Post-hoc Analysis**: Regularly review the model’s decisions to identify and correct biases, ensuring it remains fair and equitable.
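
Two of the measures named above can be sketched in plain NumPy. The block computes a disparate impact ratio (ratio of positive-prediction rates) and an equal opportunity difference (difference in true positive rates). Which gender value is treated as `unprivileged` is an arbitrary labeling chosen for this illustration, and the toy arrays are made up.

```python
import numpy as np

def selection_rate(y_pred, mask):
    """Share of positive predictions among the rows selected by mask."""
    return float(np.mean(y_pred[mask]))

def true_positive_rate(y_true, y_pred, mask):
    """Share of actual positives in the masked rows that were predicted positive."""
    positives = mask & (y_true == 1)
    return float(np.mean(y_pred[positives]))

def disparate_impact_ratio(y_pred, group, unprivileged=0, privileged=1):
    """Positive-prediction rate of the unprivileged group divided by the privileged group's."""
    return selection_rate(y_pred, group == unprivileged) / selection_rate(y_pred, group == privileged)

def equal_opportunity_difference(y_true, y_pred, group, unprivileged=0, privileged=1):
    """True positive rate of the unprivileged group minus that of the privileged group."""
    return (true_positive_rate(y_true, y_pred, group == unprivileged)
            - true_positive_rate(y_true, y_pred, group == privileged))

# Toy data (made up): four students per gender value.
y_true = np.array([1, 1, 0, 0, 1, 1, 0, 0])
y_pred = np.array([1, 0, 0, 0, 1, 1, 1, 0])
group = np.array([0, 0, 0, 0, 1, 1, 1, 1])

di = disparate_impact_ratio(y_pred, group)
eod = equal_opportunity_difference(y_true, y_pred, group)
print(f"Disparate impact ratio: {di:.2f}")         # 0.33
print(f"Equal opportunity difference: {eod:.2f}")  # -0.50
```

A ratio of 1 and a difference of 0 indicate parity on these two measures; values far from those are a signal to investigate further.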

### Data Retention and Deletion Strategy

Implementing a data retention policy is crucial for handling ethical and privacy issues, especially with student data:

- **Classify and Organize Data**: Based on risk level and intended use, ensuring compliance with privacy regulations.
  
- **Data Deletion Policies**: Adhere to regulations requiring request-based deletion and scheduled data purge processes, balancing data utility with privacy rights.
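
As a rough sketch of how a scheduled purge and request-based deletion might be combined, the block below filters an in-memory list of records. The `StudentRecord` fields, the 365-day window, and the function name are all hypothetical; a real policy would follow the applicable regulations and run against the actual data store.

```python
from dataclasses import dataclass
from datetime import date, timedelta

@dataclass(frozen=True)
class StudentRecord:
    student_id: str      # hypothetical identifier
    collected_on: date   # date the record was collected
    score: int

def apply_retention_policy(records, today, retention_days, deletion_requests):
    """Keep records inside the retention window whose owners have not requested deletion."""
    cutoff = today - timedelta(days=retention_days)
    return [
        r for r in records
        if r.collected_on >= cutoff and r.student_id not in deletion_requests
    ]

records = [
    StudentRecord("s1", date(2021, 1, 15), 71),   # older than the window
    StudentRecord("s2", date(2023, 6, 1), 88),
    StudentRecord("s3", date(2023, 9, 10), 64),   # deletion requested
    StudentRecord("s4", date(2023, 11, 20), 93),
]

kept = apply_retention_policy(
    records,
    today=date(2024, 1, 1),
    retention_days=365,
    deletion_requests={"s3"},
)
print([r.student_id for r in kept])  # ['s2', 's4']
```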

### Continuous Fairness Monitoring

Continuous monitoring is essential to maintain the fairness of ML models in educational settings:

- **Quantile Demographic Drift (QDD)**: Monitor fairness over the model lifecycle, using quantile binning to detect prediction disparities among genders. Incorporate tools like FairCanary for real-time bias metrics and mitigation strategies.
  
- **Bias Mitigation Strategies**: When unfairness is detected, employ strategies such as equalized odds post-processing to correct biases without the need for retraining the model.
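
The idea of comparing groups within quantile bins can be illustrated with a simplified sketch. This is not the FairCanary implementation: it sorts each group's scores, splits them into equal-count bins, and reports the per-bin difference in mean score. The function names, the bin count, and the simulated drift are assumptions made for illustration.

```python
import numpy as np

def quantile_bin_means(scores, n_bins):
    """Mean score within each of n_bins equal-count bins of the sorted scores."""
    ordered = np.sort(np.asarray(scores, dtype=float))
    return np.array([chunk.mean() for chunk in np.array_split(ordered, n_bins)])

def quantile_score_gap(scores, group, n_bins=4, group_a=0, group_b=1):
    """Per-bin difference in mean score: group_a minus group_b."""
    scores, group = np.asarray(scores), np.asarray(group)
    return (quantile_bin_means(scores[group == group_a], n_bins)
            - quantile_bin_means(scores[group == group_b], n_bins))

# Simulated model scores for two monitoring windows (made-up data).
rng = np.random.default_rng(0)
group = rng.integers(0, 2, size=2000)
scores_before = rng.beta(2, 2, size=2000)
# Simulated drift: scores for group 1 inflate by 0.1 in the later window.
scores_after = np.clip(scores_before + 0.1 * (group == 1), 0.0, 1.0)

gap_before = quantile_score_gap(scores_before, group)
gap_after = quantile_score_gap(scores_after, group)
print("Per-bin gap before drift:", np.round(gap_before, 3))
print("Per-bin gap after drift: ", np.round(gap_after, 3))
```

Tracking these per-bin gaps across time windows, and alerting when they exceed a chosen tolerance, is one way such a check could be wired into monitoring.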

### Mitigating Biases in Output Use

To ensure the equitable use of model outputs:

- **Model Facts Label**: Provide users with a comprehensive overview of the model, including its data sources, validation results, and usage guidelines, highlighting any limitations or risks.
  
- **Technical Mitigation**: Implement technical solutions like equalized odds post-processing to adjust model outputs, optimizing for fairness across different genders.
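
To make output adjustment concrete, here is a simplified stand-in for equalized odds post-processing. It picks a separate score cutoff per gender so that each group reaches the same target true positive rate. This equalizes only true positive rates (equal opportunity), not false positive rates, and it is not the AIF360 algorithm. The synthetic scores, the 0.8 target, and all function names are assumptions.

```python
import numpy as np

def threshold_for_tpr(scores, y_true, target_tpr):
    """Largest cutoff whose true positive rate is at least target_tpr."""
    pos_scores = np.sort(scores[y_true == 1])[::-1]            # positive-class scores, descending
    k = max(int(np.ceil(target_tpr * len(pos_scores))), 1)     # positives that must be accepted
    return float(pos_scores[k - 1])

def group_thresholds(scores, y_true, group, target_tpr=0.8):
    """One cutoff per group value, each reaching the same target true positive rate."""
    return {int(g): threshold_for_tpr(scores[group == g], y_true[group == g], target_tpr)
            for g in np.unique(group)}

def predict_with_group_thresholds(scores, group, thresholds):
    """Apply each row's group-specific cutoff to its score."""
    cutoffs = np.array([thresholds[int(g)] for g in group])
    return (scores >= cutoffs).astype(int)

def true_positive_rate(y_true, y_pred, mask):
    return float(y_pred[mask & (y_true == 1)].mean())

# Made-up scores in which positives from group 1 tend to score lower.
rng = np.random.default_rng(0)
n = 1000
group = rng.integers(0, 2, size=n)
y_true = rng.integers(0, 2, size=n)
scores = np.clip(0.35 + 0.3 * y_true - 0.1 * group * y_true + rng.normal(0, 0.15, n), 0, 1)

single = (scores >= 0.5).astype(int)                            # one cutoff for everyone
adjusted = predict_with_group_thresholds(scores, group, group_thresholds(scores, y_true, group))

for name, pred in [("single cutoff", single), ("group cutoffs", adjusted)]:
    tpr0 = true_positive_rate(y_true, pred, group == 0)
    tpr1 = true_positive_rate(y_true, pred, group == 1)
    print(f"{name}: TPR gender 0 = {tpr0:.2f}, TPR gender 1 = {tpr1:.2f}")
```

Group-specific cutoffs trade one kind of disparity for another (for example, false positive rates may diverge), so any such adjustment should be reviewed against the full set of fairness metrics in use.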

By adhering to these practices, we can ensure our models support equity in education, providing fair and accurate predictions for all students, regardless of gender. Continuous assessment and adjustment are key to mitigating biases and maintaining the relevance and ethical integrity of our models over time.

## Tools for Ensuring Fairness and Equity in ML Models

Maintaining fairness and equity in machine learning models requires continuous effort and the right set of tools. Here are some GitHub repositories that provide resources for monitoring model performance, analyzing biases, and implementing mitigation strategies:

### Fairness Metrics and Bias Analysis

- **AI Fairness 360 (AIF360)**: An extensible open-source library containing techniques to help detect and mitigate bias in machine learning models. The toolkit provides implementations of various fairness metrics and bias mitigation algorithms.
  - GitHub: [https://github.com/Trusted-AI/AIF360](https://github.com/Trusted-AI/AIF360)
  
- **Fairlearn**: A toolkit that aims to enable developers to assess and improve the fairness of their AI systems. The library includes fairness metrics and algorithms for mitigating unfairness.
  - GitHub: [https://github.com/fairlearn/fairlearn](https://github.com/fairlearn/fairlearn)

### Continuous Monitoring for Fairness

- **Fairness Monitoring**: FairCanary, mentioned above, is not a direct GitHub tool, but the concept can guide continuous monitoring solutions. For practical implementations, consider integrating fairness checks into an existing model monitoring framework or running AIF360 metrics in a continuous integration setup.

### Data Retention and Ethical Handling

- **Deon**: A command-line tool that allows you to easily add an ethics checklist to your data science projects. While not directly handling data retention, it prompts considerations around data privacy and ethical guidelines.
  - GitHub: [https://github.com/drivendataorg/deon](https://github.com/drivendataorg/deon)

### Bias Mitigation Techniques

- **What-If Tool (WIT)**: An interactive visual interface designed for probing and analyzing machine learning models, which can be used for post-hoc analysis and understanding model behavior across different groups.
  - GitHub: [https://github.com/PAIR-code/what-if-tool](https://github.com/PAIR-code/what-if-tool)

- **Equalized Odds Postprocessing in AIF360**: This technique, available within the AIF360 toolkit, solves a linear program to find the probabilities with which to change output labels so that predictions move toward equalized odds across groups, helping mitigate output biases.
  - AIF360 Implementation: [https://aif360.readthedocs.io/en/stable/](https://aif360.readthedocs.io/en/stable/)

These tools and libraries provide practical ways to address bias, ensure fairness, and maintain the integrity and equity of machine learning models over time. Integrating these resources into your ML workflow can help in achieving more equitable outcomes in applications such as education.