# Bias, Safety, and Fairness in Natural Language Generation (NLG): A World-Class Guide

This Jupyter Notebook is designed as a comprehensive resource for researchers, scientists, engineers, and aspiring professionals. Inspired by the precision of Alan Turing, the curiosity of Einstein, and the innovation of Tesla, it covers fundamentals through advanced topics, with code, visualizations, and projects. Use it for self-study, research, or teaching.

**Version:** 1.0 (August 2025)  
**Author:** Grok, built by xAI  
**Goal:** Equip you to become a leading AI scientist by mastering ethical NLG.

## Setup: Import Libraries

In [None]:
!pip install -q transformers datasets matplotlib seaborn torch numpy scikit-learn fairlearn aif360
import numpy as np
import matplotlib.pyplot as plt
import seaborn as sns
from transformers import pipeline, AutoTokenizer, AutoModelForSequenceClassification, AutoModelForMaskedLM
from datasets import load_dataset
import torch
from sklearn.metrics import confusion_matrix
from fairlearn.metrics import demographic_parity_difference
# Note: Some libraries like aif360 may need additional setup in your environment.

## 1. Theory & Tutorials: From Fundamentals to Advanced

### 1.1 Fundamentals of NLG
Natural Language Generation (NLG) is the process of producing human-like text from data or a prompt. Modern systems use models such as transformers (e.g., GPT) that are trained on large text corpora to predict the next token probabilistically.

**Analogy:** Like a baker turning ingredients (data) into bread (text). If ingredients are biased, the bread is flawed.

**Key Concepts:**
- **NLP vs. NLG:** NLP processes language; NLG generates it.
- **Models:** RNNs, Transformers (attention mechanisms).
- **Math Basics:** Probability of next word: P(w_t | w_1...w_{t-1}) using softmax over logits.

Advanced: Sequence-to-sequence models, beam search for generation.
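
To make the next-word probability and beam search concrete, here is a minimal sketch. The `TOY_MODEL` table is a made-up stand-in for a real model's softmax output (it conditions only on the previous token), and `beam_search` is an illustrative helper, not a library function.

```python
import math

# Hypothetical toy model: P(next token | previous token). Values are invented.
TOY_MODEL = {
    "<s>": {"the": 0.6, "a": 0.4},
    "the": {"doctor": 0.5, "nurse": 0.5},
    "a": {"doctor": 0.9, "nurse": 0.1},
    "doctor": {"</s>": 1.0},
    "nurse": {"</s>": 1.0},
}

def beam_search(model, beam_width=2, max_len=4):
    """Return finished (tokens, log_prob) pairs, highest probability first."""
    beams = [(["<s>"], 0.0)]
    finished = []
    for _ in range(max_len):
        # Extend every live beam by every possible next token
        candidates = []
        for tokens, logp in beams:
            for tok, p in model[tokens[-1]].items():
                candidates.append((tokens + [tok], logp + math.log(p)))
        # Keep only the best beam_width candidates
        candidates.sort(key=lambda c: c[1], reverse=True)
        beams = []
        for tokens, logp in candidates[:beam_width]:
            if tokens[-1] == "</s>":
                finished.append((tokens, logp))
            else:
                beams.append((tokens, logp))
        if not beams:
            break
    return sorted(finished, key=lambda c: c[1], reverse=True)

for tokens, logp in beam_search(TOY_MODEL, beam_width=2):
    print(" ".join(tokens), f"p={math.exp(logp):.2f}")
```

With `beam_width=1` this reduces to greedy decoding, which commits to "the" (0.6) and ends with total probability 0.30. A width of 2 keeps "a" alive and finds the sequence with probability 0.36.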

### 1.2 Bias in NLG
Bias: Outputs systematically favor or disfavor certain groups, typically because the training data is skewed.

**Types:**
- Representation: Word embeddings link 'doctor' to 'man'.
- Selection: Overrepresentation (e.g., Western texts).
- Social: Stereotypes (e.g., gender roles).

**Causes:** Biased training data from society.

**Real-World Impact:** Reinforces inequalities (e.g., discriminatory hiring tools).

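**Math:** Bias Score = |P(pos|group1) - P(pos|group2)| / max(P(pos|group1), P(pos|group2)). This is a simple relative gap used for illustration in this notebook: 0 means both groups receive the positive attribute at the same rate, and values near 1 mean one group almost never does.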

In [None]:
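# Example: Simple Bias Calculation
def bias_score(p1, p2):
    """Relative gap between two positive-outcome probabilities (0 = no gap)."""
    largest = max(p1, p2)
    if largest == 0:
        return 0.0  # both probabilities are zero, so there is no gap
    return abs(p1 - p2) / largest

p_male = 0.9    # illustrative P(competent | male doctor)
p_female = 0.7  # illustrative P(competent | female doctor)
print(f"Bias Score: {bias_score(p_male, p_female):.3f}")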

### 1.3 Safety in NLG
Safety: Ensuring outputs avoid harm (toxicity, misinformation).

**Types:**
- Content: No hate speech.
- Operational: Prevent misuse.
- Privacy: No leaks.

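**Techniques:** RLHF (reinforcement learning from human feedback), output filters.

**Math:** Toxicity Score = sum(prob_toxic) / n, where prob_toxic is a classifier's toxicity probability for each of the n generated outputs.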
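
The toxicity score above is just an average, which the following sketch spells out. The probabilities are invented for illustration; in practice they would come from a toxicity classifier. The `toxic_fraction` helper and its 0.5 threshold are assumptions added here, not part of the formula.

```python
def toxicity_score(toxic_probs):
    """Mean toxicity probability over n generated outputs."""
    if not toxic_probs:
        raise ValueError("need at least one output to score")
    return sum(toxic_probs) / len(toxic_probs)

def toxic_fraction(toxic_probs, threshold=0.5):
    """Share of outputs whose toxicity probability reaches the threshold."""
    flagged = sum(1 for p in toxic_probs if p >= threshold)
    return flagged / len(toxic_probs)

# Invented classifier outputs for four generations
example_probs = [0.02, 0.10, 0.85, 0.03]
print(f"Toxicity Score: {toxicity_score(example_probs):.2f}")
print(f"Fraction flagged: {toxic_fraction(example_probs):.2f}")
```

The mean can hide a single very toxic output among many safe ones, so reporting the flagged fraction alongside it is often useful.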

### 1.4 Fairness in NLG
Fairness: Equal treatment across groups.

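**Types:** Individual (similar inputs receive similar outputs), Group (outcome rates match across groups).

**Metrics:** Demographic Parity: P(Y=1|A=0) = P(Y=1|A=1), where Y is the predicted outcome and A is the protected attribute.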

**Link to Bias/Safety:** Bias causes unfairness; unfair outputs can be unsafe.
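
As a rough illustration of demographic parity, the sketch below computes the selection rate P(Y=1|A=a) for each group and reports the largest gap. The predictions and group labels are invented, and `demographic_parity_gap` is a hand-rolled helper. The `demographic_parity_difference` function imported in Setup is intended to report the same kind of gap; check its documentation for the exact arguments.

```python
def selection_rate(y_pred, groups, group):
    """P(Y=1 | A=group) estimated from binary predictions."""
    picked = [y for y, g in zip(y_pred, groups) if g == group]
    return sum(picked) / len(picked)

def demographic_parity_gap(y_pred, groups):
    """Return (largest gap in selection rate, per-group rates)."""
    rates = {g: selection_rate(y_pred, groups, g) for g in set(groups)}
    return max(rates.values()) - min(rates.values()), rates

# Invented data: 1 = positive outcome, A is a binary protected attribute
y_pred = [1, 1, 1, 0, 1, 0, 0, 0]
A = [0, 0, 0, 0, 1, 1, 1, 1]
gap, rates = demographic_parity_gap(y_pred, A)
print(f"Selection rates: {rates}")
print(f"Demographic parity gap: {gap:.2f}")
```

A gap of 0 means demographic parity holds exactly on this sample.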

## 2. Visualizations
Visual aids to understand concepts.

In [None]:
# Visualization: Bias Bar Plot (illustrative values from Section 1.2, not measured results)
groups = ['Male', 'Female']
probs = [0.9, 0.7]
plt.bar(groups, probs)
plt.ylabel('Probability of Positive Attribute')
plt.title('Gender Bias in Profession Descriptions')
plt.show()

In [None]:
# Pie Chart for Safety (illustrative split, not measured results)
labels = ['Safe', 'Unsafe']
sizes = [80, 20]
plt.pie(sizes, labels=labels, autopct='%1.1f%%', colors=['green', 'red'])
plt.title('Toxicity Distribution')
plt.show()

## 3. Practical Code Guides
Step-by-step code examples.

### 3.1 Detecting Bias with HuggingFace
Load a model and test for gender bias.

In [None]:
model_name = "distilbert-base-uncased"
tokenizer = AutoTokenizer.from_pretrained(model_name)
model = AutoModelForMaskedLM.from_pretrained(model_name)
model.eval()  # inference only

prompt = "The doctor is a [MASK]."
inputs = tokenizer(prompt, return_tensors="pt")
with torch.no_grad():  # no gradients needed for prediction
    outputs = model(**inputs).logits
# Position of the [MASK] token in the single input sequence
mask_token_index = (inputs.input_ids == tokenizer.mask_token_id)[0].nonzero(as_tuple=True)[0]

# Top-5 candidate tokens for the masked position
preds = torch.topk(outputs[0, mask_token_index], 5).indices[0]
print(tokenizer.convert_ids_to_tokens(preds.tolist()))  # Inspect for gendered words such as 'man'; results vary by model
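
A top-5 list is hard to compare across prompts. A more direct probe, sketched below, compares the probability of two specific tokens at the masked position. The vocabulary and logits here are made up so the cell runs without downloading a model, and `token_log_ratio` is a hypothetical helper name.

```python
import math

def softmax(logits):
    """Convert a list of logits to probabilities (max-shifted for stability)."""
    m = max(logits)
    exps = [math.exp(v - m) for v in logits]
    total = sum(exps)
    return [e / total for e in exps]

def token_log_ratio(logits, id_a, id_b):
    """log(P(token a) / P(token b)) at one position; 0 means no preference."""
    probs = softmax(logits)
    return math.log(probs[id_a] / probs[id_b])

# Made-up logits over a hypothetical 4-token vocabulary
vocab = ["he", "she", "they", "it"]
toy_logits = [2.0, 1.0, 0.5, 0.0]
gap = token_log_ratio(toy_logits, vocab.index("he"), vocab.index("she"))
print(f"log P(he)/P(she) = {gap:.2f}")
```

To try this with the model above, pass the logits at the mask position as a list and look up the two token ids with `tokenizer.convert_tokens_to_ids`, using a prompt such as "[MASK] is a doctor." A positive value means the first token is preferred.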

### 3.2 Mitigation: Debiasing with Counterfactual Data
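Counterfactual data augmentation pairs each gendered sentence with a copy in which the gendered terms are swapped or neutralized, so the model sees both versions equally often.

In [None]:
# Simple example: a gendered sentence and a neutral counterpart
biased_prompt = "The nurse said she was tired."
debias_prompt = "The nurse said they were tired."  # Neutral
# Use both in fine-tuning (advanced: requires training loop)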
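
One way to build such pairs automatically is a word-swap pass over the corpus. The sketch below is a deliberately small version of that idea: the `SWAPS` list is a hypothetical, incomplete dictionary, and the function names are illustrative.

```python
import re

# Hypothetical, deliberately tiny swap list; real lists are much longer.
SWAPS = {"he": "she", "she": "he", "his": "her", "her": "his",
         "man": "woman", "woman": "man"}

def counterfactual(sentence, swaps=SWAPS):
    """Swap each listed gendered word for its counterpart, keeping capitalization."""
    def repl(match):
        word = match.group(0)
        swapped = swaps[word.lower()]
        return swapped.capitalize() if word[0].isupper() else swapped
    pattern = r"\b(" + "|".join(map(re.escape, swaps)) + r")\b"
    return re.sub(pattern, repl, sentence, flags=re.IGNORECASE)

def augment(corpus):
    """Return the corpus plus a counterfactual copy of every sentence that changed."""
    out = []
    for sentence in corpus:
        out.append(sentence)
        swapped = counterfactual(sentence)
        if swapped != sentence:
            out.append(swapped)
    return out

print(augment(["The nurse said she was tired.", "The sky is blue."]))
```

Plain word swapping has known limits: "her" can correspond to either "his" or "him", and names are not handled, so real pipelines usually add part-of-speech information and manual review.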

## 4. Applications: Real-World Use Cases
- **Chatbots:** Ensure safe responses (e.g., avoid toxicity in customer service).
- **Content Generation:** Fair news summaries without bias.
- **Hiring Tools:** Unbiased resume summaries.

Datasets such as BOLD can be used to evaluate these use cases for bias.

## 5. Research Directions & Rare Insights
- **Rare Insight:** LLMs can exhibit 'implicit' biases even after explicit debiasing.
- **Directions:** Multimodal NLG fairness, continual debiasing.
- **Interdisciplinary:** Link to sociology for social bias.

These points draw on recent survey papers on bias in LLMs.

## 6. Mini & Major Projects

### 6.1 Mini Project: Calculate Bias on BOLD Dataset
Load BOLD, generate continuations, compute bias score.

In [None]:
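# Assumption: the Hub copy of BOLD exposes a single "train" split; adjust if your version differs
dataset = load_dataset("AlexaAI/bold", split="train")
# Each row holds a list of prompts; take the first prompt from each of the first 10 rows
prompts = [row['prompts'][0] for row in dataset.select(range(10))]

generator = pipeline('text-generation', model='gpt2')
generations = [generator(p, max_length=50)[0]['generated_text'] for p in prompts]

# Proxy for bias: score each generation with an off-the-shelf sentiment classifier
sentiment = pipeline('sentiment-analysis')
results = sentiment(generations)  # classify all generations in one call
# Convert each result to P(positive)
scores = [r['score'] if r['label'] == 'POSITIVE' else 1 - r['score'] for r in results]
print(scores)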
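
The cell above stops at per-generation scores. To reach a bias score, the scores still need to be averaged per group and compared, for example with the relative gap from Section 1.2. The sketch below uses invented scores and group labels; with BOLD, the group label would presumably come from a per-row field such as its category, which is an assumption to verify against the dataset.

```python
from collections import defaultdict

def group_means(scores, groups):
    """Average score for each group label."""
    buckets = defaultdict(list)
    for score, group in zip(scores, groups):
        buckets[group].append(score)
    return {g: sum(v) / len(v) for g, v in buckets.items()}

def bias_gap(means):
    """Relative gap between the best- and worst-scoring groups (0 = no gap)."""
    hi, lo = max(means.values()), min(means.values())
    return 0.0 if hi == 0 else (hi - lo) / hi

# Invented P(positive) scores for generations from two groups of prompts
toy_scores = [0.9, 0.7, 0.6, 0.4]
toy_groups = ["group_a", "group_a", "group_b", "group_b"]
means = group_means(toy_scores, toy_groups)
print(means)
print(f"Bias gap: {bias_gap(means):.3f}")
```

Ten prompts are far too few for a stable estimate, so treat any number from the mini project as a smoke test rather than a measurement.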

### 6.2 Major Project: Debias a Model on RealToxicityPrompts
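Fine-tune GPT-2 with an RLHF-like procedure to reduce toxic outputs. The cell below only loads the dataset and leaves the training setup as a placeholder.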

In [None]:
# Advanced: Load dataset
toxicity_dataset = load_dataset("allenai/real-toxicity-prompts")

# Fine-tuning code (simplified; full requires Trainer)
from transformers import Trainer, TrainingArguments
# Define model, tokenizer, dataset prep...
# trainer = Trainer(...)
# trainer.train()
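
RLHF itself needs a reward model and a policy-optimization loop, which is beyond a short cell. A much simpler stand-in (data filtering, not RLHF) is to keep only low-toxicity examples for supervised fine-tuning. The sketch below assumes each record has `prompt` and `continuation` dictionaries with `text` and `toxicity` keys, which should be checked against the loaded dataset; the toy records and the 0.1 threshold are invented for illustration.

```python
def low_toxicity_texts(records, threshold=0.1):
    """Join prompt and continuation for records whose continuation is scored below threshold."""
    texts = []
    for rec in records:
        tox = rec["continuation"].get("toxicity")
        # Skip records with a missing score rather than guessing
        if tox is not None and tox < threshold:
            texts.append(rec["prompt"]["text"] + rec["continuation"]["text"])
    return texts

# Invented records in the assumed shape
toy_records = [
    {"prompt": {"text": "The weather today is"},
     "continuation": {"text": " sunny and warm.", "toxicity": 0.01}},
    {"prompt": {"text": "I think you are"},
     "continuation": {"text": " a complete idiot.", "toxicity": 0.90}},
    {"prompt": {"text": "The meeting will"},
     "continuation": {"text": " start at noon.", "toxicity": None}},
]
print(low_toxicity_texts(toy_records))
```

The resulting strings could then be tokenized and passed to the `Trainer` placeholder above as a plain language-modeling dataset.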

## 7. Exercises
### Exercise 1: Compute Demographic Parity
Given predictions, calculate disparity.

**Solution:**

In [None]:
# Predictions: male positive 80/100, female 60/100
p_male = 0.8
p_female = 0.6
disparity = abs(p_male - p_female)
print(f"Disparity: {disparity:.2f}")  # format to avoid floating-point noise in the output

### Exercise 2: Visualize Word Embeddings Bias
Use GloVe or model embeddings to plot 'man-woman' vector.

In [None]:
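# Load a fill-mask pipeline; a separate name avoids overwriting `model` from Section 3.1
fill_mask = pipeline('fill-mask', model='bert-base-uncased')
# Input embedding matrix of the underlying model, shape (vocab_size, hidden_size)
embeddings = fill_mask.model.get_input_embeddings().weight.detach().numpy()
# Advanced visualization with PCA (code here)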
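
Before plotting, it can help to measure how strongly a word aligns with the 'man-woman' direction. The sketch below does this with cosine similarity rather than PCA, which is a simpler substitute for the exercise. The 3-dimensional vectors are invented; real GloVe or BERT embeddings have hundreds of dimensions.

```python
import math

def dot(a, b):
    return sum(x * y for x, y in zip(a, b))

def norm(a):
    return math.sqrt(dot(a, a))

def gender_projection(word_vec, man_vec, woman_vec):
    """Cosine between a word vector and the (man - woman) direction."""
    direction = [m - w for m, w in zip(man_vec, woman_vec)]
    return dot(word_vec, direction) / (norm(word_vec) * norm(direction))

# Invented toy embeddings
toy = {
    "man": [1.0, 0.0, 1.0],
    "woman": [-1.0, 0.0, 1.0],
    "doctor": [0.5, 1.0, 0.0],
    "nurse": [-0.5, 1.0, 0.0],
    "table": [0.0, 1.0, 0.0],
}
for word in ["doctor", "nurse", "table"]:
    score = gender_projection(toy[word], toy["man"], toy["woman"])
    print(f"{word}: {score:+.3f}")
```

Positive values lean toward 'man', negative toward 'woman', and values near zero are roughly neutral along this one direction. These scores can be used as the x-axis of a scatter plot.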

## 8. Future Directions & Next Steps
- **Next Content:** Explore multimodal (text+image) bias.
- **Paths:** Study papers, contribute to datasets.
- **Long-Term:** Ethical AI governance, quantum NLG for complex fairness.
Build on this notebook: Fork, experiment, publish.

## 9. What’s Missing in Standard Tutorials
- **Interdisciplinary Links:** Psychology of bias, legal implications.
- **Mathematical Derivations:** Full derivation of the attention mechanism in transformers.
- **Historical Context:** Evolution from rule-based NLG to LLMs.
- **Scalability:** Handling petabyte datasets for global fairness.

### Derivation Example: Softmax for Word Prediction
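Softmax: σ(z_i) = e^{z_i} / ∑_j e^{z_j}

Each output is positive and the outputs sum to 1, so the vector of logits z becomes a probability distribution over the vocabulary.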

In [None]:
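def softmax(z):
    # Subtract the max logit for numerical stability; the result is unchanged
    e_z = np.exp(z - np.max(z))
    return e_z / e_z.sum()

z = np.array([2.0, 1.0, 0.1])
print(softmax(z))  # approximately [0.659 0.242 0.099]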
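
The `softmax` function above subtracts the largest logit before exponentiating. The sketch below, written with the standard library only, illustrates why: shifting every logit by the same constant leaves the result unchanged, while the unshifted version overflows for large logits. The function names are illustrative.

```python
import math

def naive_softmax(z):
    """Direct translation of the formula; overflows for large logits."""
    exps = [math.exp(v) for v in z]
    total = sum(exps)
    return [e / total for e in exps]

def stable_softmax(z):
    """Same result, computed after subtracting max(z) from every logit."""
    m = max(z)
    exps = [math.exp(v - m) for v in z]
    total = sum(exps)
    return [e / total for e in exps]

small = [2.0, 1.0, 0.1]
large = [v + 998.0 for v in small]  # same logits shifted by a constant

print(stable_softmax(small))
print(stable_softmax(large))  # matches the line above up to rounding
try:
    naive_softmax(large)
except OverflowError as err:
    print("naive version failed:", err)
```

The shift works because the factor e^{-c} appears in both the numerator and the denominator of the softmax formula and cancels.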