# Responsible AI and Fairness

## Overview
Essential knowledge for enterprise ML and FAANG interviews:
- **Bias Detection**: Identifying unfair model behavior
- **Fairness Metrics**: Demographic parity, equalized odds, calibration
- **Explainability**: SHAP, LIME, attention visualization
- **Compliance**: EU AI Act, GDPR, model documentation

## Why This Matters
- **Legal**: The EU AI Act imposes risk-management, transparency, and human-oversight obligations on high-risk AI systems
- **Reputation**: Biased AI causes PR disasters
- **Business**: Unfair models lose customers and face lawsuits
- **Ethics**: Building AI that serves everyone fairly

## FAANG Interview Focus
- How do you detect bias in ML models?
- Explain the trade-off between different fairness metrics
- How do you make model predictions explainable?
- Design a fairness monitoring system

In [None]:
# Installation
# pip install aif360 fairlearn shap lime scikit-learn pandas numpy

import numpy as np
import pandas as pd
from typing import List, Dict, Any, Tuple
from dataclasses import dataclass
from enum import Enum
from collections import defaultdict

print("Responsible AI & Fairness - FAANG Interview Prep")

## Part 1: Understanding Bias in ML

### Types of Bias
1. **Historical Bias**: Training data reflects past discrimination
2. **Representation Bias**: Underrepresentation of certain groups
3. **Measurement Bias**: Features measured differently across groups
4. **Aggregation Bias**: One model doesn't fit all subgroups
5. **Evaluation Bias**: Test data doesn't represent production
6. **Deployment Bias**: Model behaves differently in production than it did in development
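
Representation bias is the easiest of these to check numerically. The sketch below compares each group's share of a sample with a reference share. The function name `representation_gaps`, the toy sample, and the 50/50 reference population are assumptions made for illustration only.

```python
from collections import Counter

def representation_gaps(samples, reference_shares):
    """Return observed share minus reference share for each group."""
    counts = Counter(samples)
    total = len(samples)
    return {
        group: counts.get(group, 0) / total - expected
        for group, expected in reference_shares.items()
    }

# Hypothetical training sample and hypothetical reference population shares.
train_gender = ["male"] * 60 + ["female"] * 40
gaps = representation_gaps(train_gender, {"male": 0.5, "female": 0.5})
for group, gap in gaps.items():
    print(f"{group}: {gap:+.2f}")
```

A positive gap means the group is over-represented relative to the reference, a negative gap means it is under-represented.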

In [None]:
class BiasType(Enum):
    HISTORICAL = "historical"  # Past discrimination in data
    REPRESENTATION = "representation"  # Underrepresented groups
    MEASUREMENT = "measurement"  # Inconsistent feature collection
    AGGREGATION = "aggregation"  # Model doesn't fit subgroups
    EVALUATION = "evaluation"  # Test data not representative
    DEPLOYMENT = "deployment"  # Different behavior in production

@dataclass
class FairnessReport:
    """Standardized fairness evaluation report."""
    dataset_name: str
    protected_attribute: str
    privileged_group: str
    unprivileged_group: str
    metrics: Dict[str, float]
    violations: List[str]
    recommendations: List[str]

def create_synthetic_biased_data(n_samples: int = 1000) -> pd.DataFrame:
    """
    Create synthetic dataset with intentional bias for demonstration.
    
    Scenario: Loan approval with gender bias.
    """
    np.random.seed(42)
    
    # Demographics
    gender = np.random.choice(['male', 'female'], n_samples, p=[0.6, 0.4])
    age = np.random.normal(35, 10, n_samples).clip(18, 65).astype(int)
    
    # Features
    income = np.random.normal(50000, 20000, n_samples).clip(20000, 150000)
    credit_score = np.random.normal(650, 100, n_samples).clip(300, 850).astype(int)
    
    # Introduce bias: males get higher credit scores on average
    credit_score = np.where(gender == 'male', credit_score + 30, credit_score).clip(300, 850)
    
    # Target: Loan approval (biased toward males)
    base_prob = 0.3 + (credit_score - 500) / 1000 + (income - 30000) / 200000
    # Add gender bias to approval
    approval_prob = np.where(gender == 'male', base_prob + 0.1, base_prob - 0.05)
    approval_prob = approval_prob.clip(0.1, 0.9)
    approved = np.random.binomial(1, approval_prob)
    
    return pd.DataFrame({
        'gender': gender,
        'age': age,
        'income': income,
        'credit_score': credit_score,
        'approved': approved
    })

# Create biased dataset
print("\n=== Synthetic Biased Dataset ===")
df = create_synthetic_biased_data(5000)

print(f"Dataset shape: {df.shape}")
print("\nApproval rates by gender:")
print(df.groupby('gender')['approved'].mean())

## Part 2: Fairness Metrics

### Key Metrics:
1. **Demographic Parity**: Equal positive rates across groups
2. **Equalized Odds**: Equal TPR and FPR across groups
3. **Calibration**: A predicted probability corresponds to the same observed positive rate in every group
4. **Individual Fairness**: Similar individuals treated similarly
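
Individual fairness is the one metric in this list without a group-level formula. One common way to express it is a Lipschitz-style condition: the score gap between two individuals should not exceed a constant times their feature distance. The following is a minimal sketch under that assumption. The function name `lipschitz_violations`, the Euclidean distance, and the toy applicants are all illustrative choices, and features are assumed to be pre-scaled with the protected attribute removed.

```python
from itertools import combinations
from math import dist

def lipschitz_violations(features, scores, lipschitz=1.0):
    """Return index pairs whose score gap exceeds lipschitz * feature distance."""
    violations = []
    for i, j in combinations(range(len(features)), 2):
        distance = dist(features[i], features[j])
        if abs(scores[i] - scores[j]) > lipschitz * distance + 1e-12:
            violations.append((i, j))
    return violations

# Hypothetical applicants: two near-identical profiles and one distant profile.
features = [(0.50, 0.60), (0.52, 0.61), (0.90, 0.20)]
scores = [0.70, 0.40, 0.30]
print(lipschitz_violations(features, scores))  # pairs treated inconsistently
```

Here applicants 0 and 1 are almost identical in feature space but receive very different scores, so that pair is flagged. The choice of distance function is the hard part in practice and is a modelling decision, not something the code can settle.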

In [None]:
class FairnessMetrics:
    """
    Comprehensive fairness metrics calculator.
    
    Based on: IBM AIF360, Microsoft Fairlearn
    """
    
    def __init__(self, y_true: np.ndarray, y_pred: np.ndarray, 
                 protected_attribute: np.ndarray,
                 privileged_value: Any):
        """
        Args:
            y_true: Ground truth labels
            y_pred: Predicted labels (binary)
            protected_attribute: Protected attribute values (e.g., gender)
            privileged_value: Value of the privileged group (e.g., 'male')
        """
        self.y_true = y_true
        self.y_pred = y_pred
        self.protected = protected_attribute
        self.privileged_value = privileged_value
        
        # Masks for groups
        self.privileged_mask = protected_attribute == privileged_value
        self.unprivileged_mask = ~self.privileged_mask
    
    def demographic_parity_difference(self) -> float:
        """
        Demographic Parity: P(Y_hat=1|A=0) = P(Y_hat=1|A=1)
        
        Measures: Are positive predictions equal across groups?
        Ideal: 0 (no difference)
        """
        priv_rate = self.y_pred[self.privileged_mask].mean()
        unpriv_rate = self.y_pred[self.unprivileged_mask].mean()
        return priv_rate - unpriv_rate
    
    def disparate_impact_ratio(self) -> float:
        """
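        Disparate Impact: P(Y_hat=1|A=0) / P(Y_hat=1|A=1),
        where A=0 is the unprivileged group and A=1 the privileged group
        
        Common threshold: 0.8 (the "four-fifths rule" from US EEOC guidelines,
        a rule of thumb rather than a strict legal test)
        Ideal: 1.0 (equal rates)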
        """
        priv_rate = self.y_pred[self.privileged_mask].mean()
        unpriv_rate = self.y_pred[self.unprivileged_mask].mean()
        return unpriv_rate / (priv_rate + 1e-10)
    
    def equalized_odds_difference(self) -> Dict[str, float]:
        """
        Equalized Odds: Equal TPR and FPR across groups.
        
        P(Y_hat=1|Y=1,A=0) = P(Y_hat=1|Y=1,A=1)  # TPR
        P(Y_hat=1|Y=0,A=0) = P(Y_hat=1|Y=0,A=1)  # FPR
        """
        # TPR for privileged
        priv_pos = self.privileged_mask & (self.y_true == 1)
        priv_tpr = self.y_pred[priv_pos].mean() if priv_pos.sum() > 0 else 0
        
        # TPR for unprivileged
        unpriv_pos = self.unprivileged_mask & (self.y_true == 1)
        unpriv_tpr = self.y_pred[unpriv_pos].mean() if unpriv_pos.sum() > 0 else 0
        
        # FPR for privileged
        priv_neg = self.privileged_mask & (self.y_true == 0)
        priv_fpr = self.y_pred[priv_neg].mean() if priv_neg.sum() > 0 else 0
        
        # FPR for unprivileged
        unpriv_neg = self.unprivileged_mask & (self.y_true == 0)
        unpriv_fpr = self.y_pred[unpriv_neg].mean() if unpriv_neg.sum() > 0 else 0
        
        return {
            'tpr_difference': priv_tpr - unpriv_tpr,
            'fpr_difference': priv_fpr - unpriv_fpr,
            'privileged_tpr': priv_tpr,
            'unprivileged_tpr': unpriv_tpr,
            'privileged_fpr': priv_fpr,
            'unprivileged_fpr': unpriv_fpr
        }
    
    def calibration_difference(self, y_prob: np.ndarray, 
                               n_bins: int = 10) -> Dict[str, float]:
        """
        Calibration: Predicted probability should match actual rate.
        
        For each group: P(Y=1|Y_hat=p) = p
        """
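        def calibration_error(mask):
            # Unweighted mean of |predicted - observed| over non-empty bins
            bins = np.linspace(0, 1, n_bins + 1)
            errors = []
            
            for i in range(n_bins):
                # Close the last bin on the right so y_prob == 1.0 is not dropped
                if i == n_bins - 1:
                    in_bin = (y_prob >= bins[i]) & (y_prob <= bins[i + 1])
                else:
                    in_bin = (y_prob >= bins[i]) & (y_prob < bins[i + 1])
                bin_mask = mask & in_bin
                if bin_mask.sum() > 0:
                    pred_mean = y_prob[bin_mask].mean()
                    actual_mean = self.y_true[bin_mask].mean()
                    errors.append(abs(pred_mean - actual_mean))
            
            return np.mean(errors) if errors else 0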
        
        priv_cal = calibration_error(self.privileged_mask)
        unpriv_cal = calibration_error(self.unprivileged_mask)
        
        return {
            'privileged_calibration_error': priv_cal,
            'unprivileged_calibration_error': unpriv_cal,
            'calibration_difference': abs(priv_cal - unpriv_cal)
        }
    
    def get_all_metrics(self, y_prob: np.ndarray = None) -> Dict[str, Any]:
        """Compute all fairness metrics."""
        metrics = {
            'demographic_parity_diff': self.demographic_parity_difference(),
            'disparate_impact_ratio': self.disparate_impact_ratio(),
            'equalized_odds': self.equalized_odds_difference()
        }
        
        if y_prob is not None:
            metrics['calibration'] = self.calibration_difference(y_prob)
        
        return metrics
    
    def is_fair(self, threshold: float = 0.1) -> Tuple[bool, List[str]]:
        """
        Check if model passes fairness criteria.
        
        Common thresholds:
        - Demographic parity diff: |diff| < 0.1
        - Disparate impact: 0.8 < ratio < 1.25
        - Equalized odds: |diff| < 0.1
        """
        violations = []
        
        dp_diff = abs(self.demographic_parity_difference())
        if dp_diff > threshold:
            violations.append(f'Demographic parity: {dp_diff:.3f} > {threshold}')
        
        di_ratio = self.disparate_impact_ratio()
        if di_ratio < 0.8 or di_ratio > 1.25:
            violations.append(f'Disparate impact: {di_ratio:.3f} (should be 0.8-1.25)')
        
        eo = self.equalized_odds_difference()
        if abs(eo['tpr_difference']) > threshold:
            violations.append(f'TPR difference: {eo["tpr_difference"]:.3f} > {threshold}')
        if abs(eo['fpr_difference']) > threshold:
            violations.append(f'FPR difference: {eo["fpr_difference"]:.3f} > {threshold}')
        
        return len(violations) == 0, violations

# Example: Evaluate fairness on biased data
print("\n=== Fairness Metrics Example ===")

# Simulate model predictions (biased model)
y_true = df['approved'].values
# Biased predictions: higher for males
y_prob = np.where(
    df['gender'] == 'male',
    np.random.beta(3, 2, len(df)),  # Higher for males
    np.random.beta(2, 3, len(df))   # Lower for females
)
y_pred = (y_prob > 0.5).astype(int)

# Calculate fairness metrics
fairness = FairnessMetrics(
    y_true=y_true,
    y_pred=y_pred,
    protected_attribute=df['gender'].values,
    privileged_value='male'
)

metrics = fairness.get_all_metrics(y_prob)
print(f"\nDemographic Parity Difference: {metrics['demographic_parity_diff']:.4f}")
print(f"Disparate Impact Ratio: {metrics['disparate_impact_ratio']:.4f}")
print(f"TPR Difference: {metrics['equalized_odds']['tpr_difference']:.4f}")
print(f"FPR Difference: {metrics['equalized_odds']['fpr_difference']:.4f}")

is_fair, violations = fairness.is_fair(threshold=0.1)
print(f"\nModel is fair: {is_fair}")
if violations:
    print("Violations:")
    for v in violations:
        print(f"  - {v}")

## Part 3: Bias Mitigation Techniques

### Mitigation Strategies:
1. **Pre-processing**: Fix the data before training
2. **In-processing**: Modify the training algorithm
3. **Post-processing**: Adjust predictions after training
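
In-processing can be illustrated with a fairness penalty added to the training loss. The sketch below is a dependency-free toy, not a production method: it trains a one-feature logistic regression by gradient descent on log-loss plus `fairness_weight` times the squared gap in mean predicted score between groups. The function names, the penalty form, the toy data, and the hyperparameters are all assumptions chosen for illustration.

```python
from math import exp

def sigmoid(z):
    return 1.0 / (1.0 + exp(-z))

def group_mean(values, indices):
    return sum(values[i] for i in indices) / len(indices)

def train(xs, ys, groups, fairness_weight, lr=0.1, steps=3000):
    """Gradient descent on mean log-loss + fairness_weight * (score gap)^2."""
    w, b = 0.0, 0.0
    n = len(xs)
    priv = [i for i, g in enumerate(groups) if g == 1]
    unpriv = [i for i, g in enumerate(groups) if g == 0]
    for _ in range(steps):
        scores = [sigmoid(w * x + b) for x in xs]
        # Gradient of the mean log-loss.
        grad_w = sum((s - y) * x for s, y, x in zip(scores, ys, xs)) / n
        grad_b = sum(s - y for s, y in zip(scores, ys)) / n
        # Gradient of the squared gap in mean predicted score.
        gap = group_mean(scores, priv) - group_mean(scores, unpriv)
        slope = [s * (1.0 - s) for s in scores]          # sigmoid derivative
        slope_x = [sl * x for sl, x in zip(slope, xs)]
        grad_w += fairness_weight * 2.0 * gap * (group_mean(slope_x, priv) - group_mean(slope_x, unpriv))
        grad_b += fairness_weight * 2.0 * gap * (group_mean(slope, priv) - group_mean(slope, unpriv))
        w -= lr * grad_w
        b -= lr * grad_b
    return w, b

def score_gap(w, b, xs, groups):
    scores = [sigmoid(w * x + b) for x in xs]
    priv = [i for i, g in enumerate(groups) if g == 1]
    unpriv = [i for i, g in enumerate(groups) if g == 0]
    return group_mean(scores, priv) - group_mean(scores, unpriv)

# Hypothetical data: the single feature is strongly correlated with group membership.
xs = [0.5, 1.0, 1.5, 2.0, -2.0, -1.5, -1.0, -0.5]
ys = [0, 1, 1, 1, 0, 0, 1, 0]
groups = [1, 1, 1, 1, 0, 0, 0, 0]

gap_plain = score_gap(*train(xs, ys, groups, fairness_weight=0.0), xs, groups)
gap_fair = score_gap(*train(xs, ys, groups, fairness_weight=5.0), xs, groups)
print(f"score gap without penalty: {gap_plain:.3f}")
print(f"score gap with penalty:    {gap_fair:.3f}")
```

The penalty shrinks the gap in mean scores at the cost of a worse fit to the labels. Choosing `fairness_weight` is the accuracy versus fairness dial that in-processing methods expose.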

In [None]:
class BiasMitigation:
    """Bias mitigation techniques."""
    
    @staticmethod
    def reweighting(y_true: np.ndarray, protected: np.ndarray,
                   privileged_value: Any) -> np.ndarray:
        """
        Pre-processing: Assign weights so that label and group membership
        are statistically independent in the weighted data.
        
        Weight = P(Y) * P(A) / P(Y, A)
        """
        priv_mask = protected == privileged_value
        
        # Compute joint and marginal probabilities
        n = len(y_true)
        weights = np.ones(n)
        
        for y_val in [0, 1]:
            for is_priv in [True, False]:
                mask = (y_true == y_val) & (priv_mask == is_priv)
                if mask.sum() > 0:
                    p_y = (y_true == y_val).mean()
                    p_a = (priv_mask == is_priv).mean()
                    p_ya = mask.mean()
                    
                    # Weight to achieve independence
                    expected = p_y * p_a
                    weights[mask] = expected / (p_ya + 1e-10)
        
        return weights
    
    @staticmethod
    def disparate_impact_remover(X: np.ndarray, protected: np.ndarray,
                                 repair_level: float = 1.0) -> np.ndarray:
        """
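        Pre-processing: Shift each group's feature values so that group medians
        match the overall median. This is a simplified, median-only repair: it
        reduces but does not remove dependence on the protected attribute.
        
        repair_level: 0.0 (no change) to 1.0 (medians fully aligned)
        """
        # Work on a float copy so fractional shifts are not truncated for integer inputs
        X_transformed = X.astype(float)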
        unique_groups = np.unique(protected)
        
        for col in range(X.shape[1]):
            # Compute median for each group
            medians = {}
            for g in unique_groups:
                medians[g] = np.median(X[protected == g, col])
            
            # Overall median
            overall_median = np.median(X[:, col])
            
            # Repair: Move each group's distribution toward overall median
            for g in unique_groups:
                mask = protected == g
                group_vals = X[mask, col]
                
                # Shift toward overall median
                shift = (overall_median - medians[g]) * repair_level
                X_transformed[mask, col] = group_vals + shift
        
        return X_transformed
    
    @staticmethod
    def threshold_optimizer(y_prob: np.ndarray, y_true: np.ndarray,
                           protected: np.ndarray, privileged_value: Any,
                           constraint: str = 'demographic_parity') -> Dict[str, float]:
        """
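        Post-processing: Find per-group thresholds that maximize accuracy
        subject to a fairness constraint.
        
        Grid search over thresholds in [0.3, 0.7) with step 0.05. Only
        'demographic_parity' is enforced (positive-rate gap below 0.05).
        If no threshold pair satisfies the constraint, the 0.5/0.5 defaults
        are returned.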
        """
        priv_mask = protected == privileged_value
        best_thresholds = {'privileged': 0.5, 'unprivileged': 0.5}
        best_accuracy = 0
        
        for priv_thresh in np.arange(0.3, 0.7, 0.05):
            for unpriv_thresh in np.arange(0.3, 0.7, 0.05):
                # Apply group-specific thresholds
                y_pred = np.where(
                    priv_mask,
                    (y_prob > priv_thresh).astype(int),
                    (y_prob > unpriv_thresh).astype(int)
                )
                
                # Check fairness constraint
                priv_rate = y_pred[priv_mask].mean()
                unpriv_rate = y_pred[~priv_mask].mean()
                
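                if constraint == 'demographic_parity':
                    is_fair = abs(priv_rate - unpriv_rate) < 0.05
                else:
                    # Fail loudly instead of silently running unconstrained
                    raise ValueError(f"Unsupported constraint: {constraint!r}")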
                
                if is_fair:
                    accuracy = (y_pred == y_true).mean()
                    if accuracy > best_accuracy:
                        best_accuracy = accuracy
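                        # Store plain rounded floats rather than np.float64 with arange noise
                        best_thresholds = {
                            'privileged': round(float(priv_thresh), 2),
                            'unprivileged': round(float(unpriv_thresh), 2)
                        }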
        
        return best_thresholds

# Example: Apply mitigation
print("\n=== Bias Mitigation Example ===")

# Reweighting
weights = BiasMitigation.reweighting(
    y_true=y_true,
    protected=df['gender'].values,
    privileged_value='male'
)
print(f"Sample weights: mean={weights.mean():.3f}, std={weights.std():.3f}")

# Threshold optimization
thresholds = BiasMitigation.threshold_optimizer(
    y_prob=y_prob,
    y_true=y_true,
    protected=df['gender'].values,
    privileged_value='male'
)
print(f"\nOptimal thresholds: {thresholds}")
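
A useful sanity check for reweighting is that, once each example is weighted by P(Y) * P(A) / P(Y, A), the weighted positive rate is the same in every group and equals the overall base rate. The sketch below verifies this on a toy dataset without NumPy. The helper names and the data are hypothetical.

```python
from collections import Counter

def reweighing_weights(labels, groups):
    """Weight each (group, label) cell by P(A) * P(Y) / P(A, Y)."""
    n = len(labels)
    count_y = Counter(labels)
    count_a = Counter(groups)
    count_ay = Counter(zip(groups, labels))
    return [
        (count_a[a] / n) * (count_y[y] / n) / (count_ay[(a, y)] / n)
        for a, y in zip(groups, labels)
    ]

def weighted_positive_rate(labels, groups, weights, group):
    """Weighted share of positive labels within one group."""
    positive = sum(w for y, a, w in zip(labels, groups, weights) if a == group and y == 1)
    total = sum(w for a, w in zip(groups, weights) if a == group)
    return positive / total

# Hypothetical toy data: 6 of 10 men approved, 2 of 10 women approved.
groups = ["male"] * 10 + ["female"] * 10
labels = [1] * 6 + [0] * 4 + [1] * 2 + [0] * 8
weights = reweighing_weights(labels, groups)

rate_m = weighted_positive_rate(labels, groups, weights, "male")
rate_f = weighted_positive_rate(labels, groups, weights, "female")
print(f"weighted approval rate: male={rate_m:.3f}, female={rate_f:.3f}")
```

Both weighted rates land on the overall approval rate of 0.4 (8 approvals out of 20), which is what a downstream learner that honours `sample_weight` would see.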

## Part 4: Model Explainability

### Explainability Methods:
1. **Global**: Overall model behavior (feature importance)
2. **Local**: Individual prediction explanation (SHAP, LIME)
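
To make the local case concrete, Shapley values can be computed exactly for a tiny model by averaging each feature's marginal contribution over every feature ordering. Sampling-based SHAP approximations estimate this same quantity. The sketch below assumes a single all-zero baseline and a hypothetical three-feature scoring function. `exact_shapley` and `toy_model` are illustrative names, not part of any library.

```python
from itertools import permutations

def exact_shapley(model, instance, baseline):
    """Average each feature's marginal contribution over all feature orderings."""
    n = len(instance)
    totals = [0.0] * n
    orderings = list(permutations(range(n)))
    for order in orderings:
        current = list(baseline)
        prev = model(current)
        for idx in order:
            current[idx] = instance[idx]  # switch feature idx from baseline to instance value
            value = model(current)
            totals[idx] += value - prev
            prev = value
    return [t / len(orderings) for t in totals]

# Hypothetical scoring function with an interaction between features 0 and 1.
def toy_model(x):
    return 2.0 * x[0] + 1.0 * x[1] + 0.5 * x[0] * x[1] - 1.0 * x[2]

instance = [1.0, 2.0, 3.0]
baseline = [0.0, 0.0, 0.0]
phi = exact_shapley(toy_model, instance, baseline)
print(phi)
print(sum(phi), toy_model(instance) - toy_model(baseline))
```

The attributions sum to the difference between the prediction and the baseline prediction (the efficiency property), and the interaction term is split evenly between the two features involved. Exact enumeration costs n! model sweeps, which is why practical tools sample orderings instead.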

In [None]:
class ModelExplainer:
    """
    Model explainability using SHAP-like analysis.
    
    Implements simplified versions of:
    - SHAP (SHapley Additive exPlanations)
    - LIME (Local Interpretable Model-agnostic Explanations)
    """
    
    def __init__(self, model, feature_names: List[str]):
        self.model = model
        self.feature_names = feature_names
    
    def permutation_importance(self, X: np.ndarray, y: np.ndarray,
                              n_repeats: int = 10) -> Dict[str, float]:
        """
        Global: Feature importance via permutation.
        
        Shuffle each feature and measure accuracy drop.
        """
        baseline_score = self._score(X, y)
        importances = {}
        
        for i, name in enumerate(self.feature_names):
            scores = []
            for _ in range(n_repeats):
                X_permuted = X.copy()
                np.random.shuffle(X_permuted[:, i])
                score = self._score(X_permuted, y)
                scores.append(baseline_score - score)
            
            importances[name] = np.mean(scores)
        
        return importances
    
    def shap_values_approx(self, X: np.ndarray, instance: np.ndarray,
                          n_samples: int = 100) -> Dict[str, float]:
        """
        Local: Approximate SHAP values for a single instance.
        
        Uses sampling-based approximation of Shapley values.
        """
        n_features = len(self.feature_names)
        shap_values = np.zeros(n_features)
        
        # Background data (reference)
        background = X[np.random.choice(len(X), min(100, len(X)), replace=False)]
        
        for _ in range(n_samples):
            # Random permutation
            perm = np.random.permutation(n_features)
            
            # Background instance
            bg_idx = np.random.randint(len(background))
            bg = background[bg_idx].copy()
            
            # Add features one by one according to permutation
            current = bg.copy()
            prev_pred = self.model(current.reshape(1, -1))[0]
            
            for feat_idx in perm:
                current[feat_idx] = instance[feat_idx]
                curr_pred = self.model(current.reshape(1, -1))[0]
                
                # Marginal contribution
                shap_values[feat_idx] += (curr_pred - prev_pred)
                prev_pred = curr_pred
        
        # Average
        shap_values /= n_samples
        
        return dict(zip(self.feature_names, shap_values))
    
    def lime_explanation(self, instance: np.ndarray, 
                        n_samples: int = 500) -> Dict[str, float]:
        """
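        Local: LIME-style explanation for a single instance.
        
        Fits a weighted linear model around the instance. The Gaussian
        perturbations use a fixed scale of 0.1, so features are assumed to be
        standardized; raw-scale features such as income will barely move.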
        """
        from sklearn.linear_model import Ridge
        
        # Generate perturbed samples around instance
        n_features = len(instance)
        perturbations = np.random.normal(0, 0.1, (n_samples, n_features))
        samples = instance + perturbations
        
        # Get predictions for perturbed samples
        predictions = self.model(samples)
        
        # Compute weights (closer samples have higher weight)
        distances = np.sqrt(np.sum(perturbations ** 2, axis=1))
        kernel_width = 0.75 * np.sqrt(n_features)
        weights = np.exp(-distances ** 2 / kernel_width ** 2)
        
        # Fit weighted linear model
        model = Ridge(alpha=1.0)
        model.fit(samples, predictions, sample_weight=weights)
        
        return dict(zip(self.feature_names, model.coef_))
    
    def _score(self, X: np.ndarray, y: np.ndarray) -> float:
        """Compute accuracy score."""
        predictions = self.model(X)
        if predictions.ndim > 1:
            predictions = predictions.squeeze()
        return (predictions.round() == y).mean()

# Example: Create a simple model and explain it
print("\n=== Model Explainability Example ===")

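# Simple model function
def simple_model(X):
    """Logistic-like model on roughly centered features."""
    # Center each feature so logits stay where the sigmoid is not saturated;
    # on raw income the logits would be so large that every prediction is ~1.0
    # and all importances would come out as zero
    center = np.array([50000.0, 650.0, 35.0, 0.5])   # income, credit, age, gender
    weights = np.array([0.00003, 0.005, 0.01, 0.5])  # income, credit, age, gender
    logits = (X - center) @ weights
    return 1 / (1 + np.exp(-logits))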

# Prepare data
X = df[['income', 'credit_score', 'age']].values
X = np.column_stack([X, (df['gender'] == 'male').astype(float)])
feature_names = ['income', 'credit_score', 'age', 'is_male']

explainer = ModelExplainer(simple_model, feature_names)

# Permutation importance
importance = explainer.permutation_importance(X, y_true, n_repeats=5)
print("\nPermutation Importance:")
for feat, imp in sorted(importance.items(), key=lambda x: -abs(x[1])):
    print(f"  {feat}: {imp:.4f}")

# SHAP for single instance
instance = X[0]
shap_vals = explainer.shap_values_approx(X, instance, n_samples=50)
print("\nSHAP Values for first instance:")
for feat, val in sorted(shap_vals.items(), key=lambda x: -abs(x[1])):
    print(f"  {feat}: {val:.4f}")

## Part 5: Model Cards and Documentation

### Model Card Template (after Google's Model Cards for Model Reporting)

In [None]:
@dataclass
class ModelCard:
    """
    Model Card: Standardized documentation for ML models.
    
    Based on: Google's Model Cards for Model Reporting
    Useful for: technical documentation of the kind the EU AI Act expects for high-risk systems, responsible AI practices
    """
    
    # Model details
    model_name: str
    model_version: str
    model_type: str  # e.g., "binary classification"
    model_description: str
    model_owner: str
    
    # Intended use
    primary_intended_uses: List[str]
    primary_intended_users: List[str]
    out_of_scope_uses: List[str]
    
    # Training data
    training_data_description: str
    training_data_size: int
    training_data_demographics: Dict[str, Any]
    
    # Evaluation data
    evaluation_data_description: str
    evaluation_data_size: int
    
    # Performance metrics
    overall_metrics: Dict[str, float]
    disaggregated_metrics: Dict[str, Dict[str, float]]  # By subgroup
    
    # Fairness analysis
    fairness_metrics: Dict[str, float]
    fairness_considerations: List[str]
    
    # Limitations and ethical considerations
    limitations: List[str]
    ethical_considerations: List[str]
    
    # Recommendations
    recommendations: List[str]
    
    def to_markdown(self) -> str:
        """Generate markdown documentation."""
        md = f"""# Model Card: {self.model_name}

## Model Details
- **Version**: {self.model_version}
- **Type**: {self.model_type}
- **Owner**: {self.model_owner}
- **Description**: {self.model_description}

## Intended Use
### Primary Uses
{"".join(f"- {use}" + chr(10) for use in self.primary_intended_uses)}

### Out of Scope Uses
{"".join(f"- {use}" + chr(10) for use in self.out_of_scope_uses)}

## Training Data
- **Description**: {self.training_data_description}
- **Size**: {self.training_data_size:,} samples

## Performance
### Overall Metrics
| Metric | Value |
|--------|-------|
{"".join(f"| {k} | {v:.4f} |" + chr(10) for k, v in self.overall_metrics.items())}

## Fairness Analysis
| Metric | Value |
|--------|-------|
{"".join(f"| {k} | {v:.4f} |" + chr(10) for k, v in self.fairness_metrics.items())}

### Considerations
{"".join(f"- {c}" + chr(10) for c in self.fairness_considerations)}

## Limitations
{"".join(f"- {l}" + chr(10) for l in self.limitations)}

## Ethical Considerations
{"".join(f"- {e}" + chr(10) for e in self.ethical_considerations)}

## Recommendations
{"".join(f"- {r}" + chr(10) for r in self.recommendations)}
"""
        return md

# Example model card
print("\n=== Model Card Example ===")

model_card = ModelCard(
    model_name="Loan Approval Classifier",
    model_version="2.1.0",
    model_type="Binary Classification",
    model_description="Predicts loan approval likelihood based on applicant features",
    model_owner="ML Risk Team",
    
    primary_intended_uses=[
        "Pre-screening loan applications",
        "Assisting human reviewers in decision-making"
    ],
    primary_intended_users=["Loan officers", "Risk analysts"],
    out_of_scope_uses=[
        "Fully automated loan decisions without human review",
        "Credit scoring for non-loan products"
    ],
    
    training_data_description="Historical loan applications from 2020-2023",
    training_data_size=500000,
    training_data_demographics={'gender': {'male': 0.55, 'female': 0.45}},
    
    evaluation_data_description="Held-out 2023 Q4 applications",
    evaluation_data_size=50000,
    
    overall_metrics={
        'accuracy': 0.85,
        'precision': 0.82,
        'recall': 0.78,
        'auc_roc': 0.89
    },
    disaggregated_metrics={
        'male': {'accuracy': 0.87, 'precision': 0.85},
        'female': {'accuracy': 0.82, 'precision': 0.78}
    },
    
    fairness_metrics={
        'demographic_parity_diff': 0.08,
        'disparate_impact_ratio': 0.91
    },
    fairness_considerations=[
        "Model shows slight bias toward male applicants",
        "Mitigation: Applied threshold optimization per group"
    ],
    
    limitations=[
        "Performance degrades for applicants under 25",
        "Limited data for self-employed applicants"
    ],
    ethical_considerations=[
        "Model should not be used as sole decision-maker",
        "Regular fairness audits required"
    ],
    
    recommendations=[
        "Combine with human review for edge cases",
        "Monitor for drift monthly",
        "Retrain quarterly with fresh data"
    ]
)

print(model_card.to_markdown()[:1500] + "...")
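
The `to_markdown` method above does not render `disaggregated_metrics`, which is often the most informative part of a model card for fairness review. A minimal sketch of a helper that turns the per-group dictionary into a markdown table follows. The function name `disaggregated_table` is hypothetical, and the values are the ones from the example card.

```python
def disaggregated_table(metrics_by_group):
    """Render {group: {metric: value}} as a markdown table with one row per group."""
    metric_names = sorted({name for vals in metrics_by_group.values() for name in vals})
    header = "| Group | " + " | ".join(metric_names) + " |"
    divider = "|" + "-------|" * (len(metric_names) + 1)
    rows = []
    for group, vals in metrics_by_group.items():
        # Show "n/a" when a metric was not reported for a group.
        cells = [f"{vals[name]:.4f}" if name in vals else "n/a" for name in metric_names]
        rows.append(f"| {group} | " + " | ".join(cells) + " |")
    return "\n".join([header, divider, *rows])

table = disaggregated_table({
    "male": {"accuracy": 0.87, "precision": 0.85},
    "female": {"accuracy": 0.82, "precision": 0.78},
})
print(table)
```

The returned string could be appended under a "Disaggregated Metrics" heading in the generated card.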

## Key Takeaways

### Fairness Checklist:
1. **Identify protected attributes**: gender, race, age, disability
2. **Choose appropriate metrics**: Demographic parity vs equalized odds
3. **Measure baseline bias**: Before any mitigation
4. **Apply mitigation**: Pre/in/post-processing techniques
5. **Document everything**: Model cards, audit trails
6. **Monitor in production**: Continuous fairness monitoring

### Explainability Checklist:
1. **Global explanations**: Feature importance, partial dependence
2. **Local explanations**: SHAP, LIME for individual predictions
3. **Counterfactuals**: "What would change the prediction?"
4. **Human-readable**: Explanations non-technical users understand
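
The counterfactual item can be sketched as a simple search: change one feature in fixed steps until the decision flips. This is a deliberately naive illustration. The scorer `toy_score`, its coefficients, and the helper `single_feature_counterfactual` are hypothetical, and real counterfactual methods also handle multiple features, plausibility, and actionability.

```python
def single_feature_counterfactual(model, instance, feature, step,
                                  max_steps=100, threshold=0.5):
    """Increase one feature in fixed steps until the score reaches the threshold."""
    candidate = dict(instance)
    for n_steps in range(1, max_steps + 1):
        candidate[feature] = instance[feature] + n_steps * step
        if model(candidate) >= threshold:
            return candidate
    return None  # no counterfactual found within the search budget

# Hypothetical linear scorer standing in for a trained model.
def toy_score(applicant):
    return (0.21
            + 0.004 * (applicant["credit_score"] - 600)
            + 0.000004 * (applicant["income"] - 40000))

applicant = {"credit_score": 620, "income": 45000}
counterfactual = single_feature_counterfactual(toy_score, applicant, "credit_score", step=10)
print(f"original score: {toy_score(applicant):.2f}")
print(f"counterfactual: {counterfactual}")
```

The result reads as a human-friendly statement: with income unchanged, this applicant would be approved at a credit score of 670 instead of 620.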

## FAANG Interview Questions

**Q1: How do you detect bias in an ML model?**
- Compute fairness metrics across protected groups
- Check demographic parity, equalized odds, calibration
- Use disparate impact ratio (80% rule)
- Analyze disaggregated performance by subgroup

**Q2: Explain the trade-off between different fairness metrics.**
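- Demographic parity: Equal positive rates; costs accuracy when base rates differ across groups
- Equalized odds: Equal TPR/FPR; a perfect predictor satisfies it, but it relies on trustworthy ground-truth labels
- Calibration: A score corresponds to the same observed rate in every group (for probabilistic models)
- Impossibility results: When base rates differ, an imperfect model cannot be calibrated within groups and have equal error rates across groups at the same time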
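
A short calculation shows why these metrics conflict. Given two groups with identical TPR and FPR (so equalized odds holds) but different base rates, Bayes' rule forces the positive predictive value to differ, so a positive prediction does not mean the same thing in both groups. The numbers below are hypothetical, and PPV is used here as a simple stand-in for calibration of binary predictions.

```python
def ppv(base_rate, tpr, fpr):
    """Positive predictive value implied by a base rate and error rates (Bayes' rule)."""
    true_pos = base_rate * tpr
    false_pos = (1.0 - base_rate) * fpr
    return true_pos / (true_pos + false_pos)

def positive_rate(base_rate, tpr, fpr):
    """Share of the group that receives a positive prediction."""
    return base_rate * tpr + (1.0 - base_rate) * fpr

# Hypothetical groups: same TPR and FPR, different base rates.
tpr, fpr = 0.8, 0.2
ppv_a = ppv(base_rate=0.5, tpr=tpr, fpr=fpr)
ppv_b = ppv(base_rate=0.2, tpr=tpr, fpr=fpr)
rate_a = positive_rate(0.5, tpr, fpr)
rate_b = positive_rate(0.2, tpr, fpr)

print(f"PPV:           group A={ppv_a:.3f}, group B={ppv_b:.3f}")
print(f"positive rate: group A={rate_a:.3f}, group B={rate_b:.3f}")
```

With equalized odds satisfied, PPV comes out at 0.8 for group A and 0.5 for group B, and the positive rates are 0.5 and 0.32, so demographic parity fails as well. Fixing one of these gaps necessarily moves another.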

**Q3: How do you make model predictions explainable?**
- SHAP: Shapley values for feature attribution
- LIME: Local linear approximation
- Attention weights: For transformer models (a heuristic; attention is not always a faithful explanation)
- Decision rules: For tree-based models

**Q4: Design a fairness monitoring system.**
- Track fairness metrics over time
- Alert on threshold violations
- Slice analysis by protected attributes
- A/B test fairness interventions
- Regular audits and documentation
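
The first two bullets of Q4 can be sketched as a sliding-window monitor. The class below is an assumption-laden illustration, not a reference design: `FairnessMonitor`, its parameters, and the simulated traffic are hypothetical, and it tracks only the demographic parity gap.

```python
from collections import deque

class FairnessMonitor:
    """Track the demographic parity gap over a sliding window of recent predictions."""

    def __init__(self, window_size=1000, max_gap=0.1, min_group_size=30):
        self.window = deque(maxlen=window_size)  # oldest predictions fall out automatically
        self.max_gap = max_gap
        self.min_group_size = min_group_size

    def record(self, group, prediction):
        self.window.append((group, int(prediction)))

    def positive_rates(self):
        totals, positives = {}, {}
        for group, pred in self.window:
            totals[group] = totals.get(group, 0) + 1
            positives[group] = positives.get(group, 0) + pred
        # Skip groups with too few samples to give a stable rate.
        return {g: positives[g] / n for g, n in totals.items() if n >= self.min_group_size}

    def check(self):
        rates = self.positive_rates()
        if len(rates) < 2:
            return None  # not enough data to compare groups
        gap = max(rates.values()) - min(rates.values())
        return {"gap": gap, "alert": gap > self.max_gap, "rates": rates}

# Simulated traffic: 50% positive predictions for one group, 25% for the other.
monitor = FairnessMonitor(window_size=200, max_gap=0.1, min_group_size=30)
for i in range(100):
    monitor.record("male", i % 2 == 0)
    monitor.record("female", i % 4 == 0)

status = monitor.check()
print(status)
```

In a real system the `alert` flag would feed an alerting pipeline, and the same pattern would be repeated for TPR and FPR gaps once delayed ground-truth labels arrive.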