# Ethics Simulation

This notebook provides a lightweight workflow to:
1) Generate a synthetic dataset with sensitive attributes
2) Compute basic fairness metrics across groups
3) Compute a naive, keyword-based bias score on text
4) Apply a simple mitigation (reweighting) and re-evaluate
5) Save an audit log for reproducibility

_Note: This is a scaffolding notebook meant for demonstration and extension._

In [None]:
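# Imports & config
import os, json, random, uuid
from datetime import datetime, timezone

import numpy as np
import pandas as pd
from IPython.display import display  # implicit in Jupyter; explicit so the cells also run as a script

# Fix seeds so the synthetic data (and therefore the audit log) is reproducible
np.random.seed(42)
random.seed(42)

OUTPUT_DIR = 'outputs'
os.makedirs(OUTPUT_DIR, exist_ok=True)

RUN_ID = str(uuid.uuid4())
# datetime.utcnow() is deprecated since Python 3.12; build an aware UTC timestamp instead
RUN_TS = datetime.now(timezone.utc).isoformat().replace('+00:00', 'Z')

print('RUN_ID:', RUN_ID)
print('Timestamp (UTC):', RUN_TS)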

## 1) Create / Load Synthetic Dataset
We simulate a small classification dataset with a sensitive attribute `group` and a text field to test a simple bias score.

In [None]:
def synthesize(n=400):
    groups = np.random.choice(['A','B'], size=n, p=[0.5, 0.5])
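    # True label: P(label=1) = 0.55 for every row before any group skew
    y = (np.random.rand(n) > 0.45).astype(int)
    # Introduce group-related outcome skew (for demo): ~10% of group-B rows are forced to 0,
    # so B's expected positive rate drops to about 0.55 * 0.9 = 0.495
    y = np.where((groups=='B') & (np.random.rand(n) < 0.1), 0, y)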

    phrases_pos = [
        'excellent service','great product','helpful and responsive',
        'smooth experience','impressive results','works as expected'
    ]
    phrases_neg = [
        'poor quality','confusing UI','unhelpful support',
        'buggy and slow','not worth it','bad experience'
    ]
    sensitive_terms = ['pregnant','age 62','member of X group','religion mentioned']

    texts = []
    for i in range(n):
        base = random.choice(phrases_pos if y[i]==1 else phrases_neg)
        if random.random() < 0.15:
            base += ' | ' + random.choice(sensitive_terms)
        texts.append(base)
    return pd.DataFrame({'text': texts, 'label': y, 'group': groups})

df = synthesize(400)
df.head()
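Because the generator is simple, the expected positive rates can be worked out by hand: every row is positive with probability 0.55 (`rand > 0.45`), and group `B` rows then have a 10% chance of being forced to 0, so P(label=1 | B) = 0.55 × 0.9 = 0.495. The sketch below just does that arithmetic, as a reference point for the sampled metrics in Section 2; the empirical values on 400 rows will differ from these by sampling noise.

```python
# Expected positive rates implied by synthesize(), derived by hand from its code.
p_pos = 1 - 0.45      # P(rand > 0.45)
p_flip_b = 0.10       # chance that a group-B row is forced to label 0

expected_rate = {
    'A': p_pos,
    'B': p_pos * (1 - p_flip_b),
}
expected_dir = min(expected_rate.values()) / max(expected_rate.values())
expected_diff = max(expected_rate.values()) - min(expected_rate.values())

print(expected_rate)
print('Expected DIR:', round(expected_dir, 3))
print('Expected abs diff:', round(expected_diff, 3))
```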

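## 2) Basic Fairness Metrics
We compute:
- **Positive rate** per group: the share of rows with `label == 1`. Since `label` is the ground-truth outcome here, this is also the group's **prevalence**; the group size `n` is reported alongside.
- **Disparate Impact Ratio (DIR)** = min(positive rate) / max(positive rate) across groups; 1.0 indicates parity.
- **Absolute Rate Difference** = max(positive rate) - min(positive rate) across groups; 0.0 indicates parity.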
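As a hand-checkable illustration of these definitions, take a tiny made-up frame (hypothetical data, not from the simulation) where group A has 2 of 4 positives and group B has 1 of 4. The positive rates are 0.5 and 0.25, so DIR = 0.25 / 0.5 = 0.5 and the absolute difference is 0.25. For orientation, a DIR below 0.8 is often compared against the "four-fifths rule" from US employment-selection guidance, though the appropriate threshold depends on the domain.

```python
import pandas as pd

# Hypothetical toy data: 4 rows per group.
toy = pd.DataFrame({
    'group': ['A'] * 4 + ['B'] * 4,
    'label': [1, 1, 0, 0, 1, 0, 0, 0],
})

rates = toy.groupby('group')['label'].mean()   # positive rate per group
dir_ratio = rates.min() / rates.max()          # disparate impact ratio
abs_diff = rates.max() - rates.min()           # absolute rate difference

print(rates.to_dict())       # {'A': 0.5, 'B': 0.25}
print(dir_ratio, abs_diff)   # 0.5 0.25
```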

In [None]:
def group_rates(frame, y_col='label', g_col='group'):
    stats = frame.groupby(g_col)[y_col].agg(['mean','count'])
    stats = stats.rename(columns={'mean':'positive_rate','count':'n'})
    # DIR and absolute difference
    rates = stats['positive_rate'].values
    if len(rates) >= 2:
        dir_ratio = float(np.min(rates) / max(np.max(rates), 1e-9))
        abs_diff = float(np.max(rates) - np.min(rates))
    else:
        dir_ratio, abs_diff = float('nan'), float('nan')
    return stats.reset_index(), dir_ratio, abs_diff

stats_before, dir_before, diff_before = group_rates(df)
print('Group stats BEFORE:')
display(stats_before)
print('DIR (before):', round(dir_before, 3))
print('Abs rate diff (before):', round(diff_before, 3))

## 3) Text Bias Score (Naive)
A quick heuristic that counts how many flagged terms appear in each text (case-insensitive substring match). **This is for demo only**: replace it with a stronger policy or model-based detector in production.
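One weakness of plain substring matching is that it fires inside unrelated words: `'age 62'` also matches `'page 62'`. A slightly sturdier variant, sketched below as a hypothetical `bias_score_regex` (not part of the notebook), anchors each term at word boundaries and matches case-insensitively. It is still a keyword heuristic with the same limitations noted above.

```python
import re

# Hypothetical term list mirroring the notebook's FLAG_TERMS.
TERMS = ['pregnant', 'age 62', 'member of X group', 'religion']

# One case-insensitive pattern; re.escape guards against regex metacharacters,
# and \b requires each term to start and end on a word boundary.
PATTERN = re.compile(
    r'\b(?:' + '|'.join(re.escape(t) for t in TERMS) + r')\b',
    flags=re.IGNORECASE,
)

def bias_score_regex(text: str) -> int:
    """Count distinct flag terms that occur as whole words/phrases in `text`."""
    found = {m.group(0).lower() for m in PATTERN.finditer(text)}
    return len(found)

print(bias_score_regex('great product | age 62'))   # 1
print(bias_score_regex('see page 62 for details'))  # 0
print(bias_score_regex('Member of X Group'))        # 1
```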

In [None]:
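FLAG_TERMS = ['pregnant','age 62','member of X group','religion']

def bias_score(text):
    """Count how many FLAG_TERMS occur in `text` (case-insensitive substring match)."""
    t = text.lower()
    # Lowercase the terms too; otherwise 'member of X group' can never match the lowercased text
    return sum(1 for w in FLAG_TERMS if w.lower() in t)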

df['bias_score'] = df['text'].apply(bias_score)
df['has_sensitive_flag'] = (df['bias_score'] > 0).astype(int)
df[['text','bias_score','has_sensitive_flag','group','label']].head(10)

### Group-wise Bias Flag Rates

In [None]:
flag_stats = df.groupby('group')['has_sensitive_flag'].mean().reset_index(name='flag_rate')
display(flag_stats)

## 4) Simple Mitigation: Reweighting
We simulate a mitigation by reweighting rows so that each group's weighted positive rate matches the overall positive rate.
This is an illustrative technique; use proper debiasing methods for real workloads (e.g., rebalancing datasets, threshold adjustments, fairness-aware loss).
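One standard way to make reweighting equalize group positive rates is the reweighing scheme of Kamiran & Calders: each (group, label) cell gets the weight w(g, y) = P(g) · P(y) / P(g, y), which makes every group's weighted positive rate equal the overall positive rate. The self-contained sketch below checks this on hypothetical toy data; the clamping used in the notebook's `compute_weights` is deliberately left out here, since clamping can break exact parity.

```python
import pandas as pd

# Hypothetical toy data: group A is 3/4 positive, group B is 1/4 positive.
toy = pd.DataFrame({
    'group': ['A'] * 4 + ['B'] * 4,
    'label': [1, 1, 1, 0, 1, 0, 0, 0],
})

n = len(toy)
p_g = toy['group'].value_counts() / n                # P(group)
p_y = toy['label'].value_counts() / n                # P(label)
p_gy = toy.groupby(['group', 'label']).size() / n    # P(group, label)

# w(g, y) = P(g) * P(y) / P(g, y), looked up per row
toy['w'] = [p_g[g] * p_y[y] / p_gy[(g, y)] for g, y in zip(toy['group'], toy['label'])]

# Weighted positive rate per group: sum(w * y) / sum(w)
toy['wy'] = toy['w'] * toy['label']
sums = toy.groupby('group')[['wy', 'w']].sum()
weighted_rate = sums['wy'] / sums['w']

print(toy.groupby(['group', 'label'])['w'].first())
print(weighted_rate.to_dict())  # both groups come out at the global rate, 0.5
```

Here the under-represented cells (A with label 0, B with label 1) get weight 2 and the over-represented ones get 2/3, which pulls both groups to a 50% weighted positive rate.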

In [None]:
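def compute_weights(frame, g_col='group', y_col='label', clip=(0.5, 2.0)):
    """Reweighing weights per (group, label) cell: w = P(group) * P(label) / P(group, label).
    One weight per group would cancel out of that group's own positive rate, so weights
    must differ by label. Returns {group: {label: weight}} with JSON-friendly keys.
    """
    n = len(frame)
    p_g = frame[g_col].value_counts() / n
    p_y = frame[y_col].value_counts() / n
    p_gy = frame.groupby([g_col, y_col]).size() / n
    w = {}
    for (g, y), p in p_gy.items():
        val = p_g[g] * p_y[y] / p      # p > 0 for every observed cell
        if clip is not None:           # optional clamp for stability; may break exact parity
            val = np.clip(val, *clip)
        w.setdefault(str(g), {})[int(y)] = float(val)
    return w

weights = compute_weights(df)
weights

Apply the weights and recompute the metrics. For simplicity, we compute a **weighted positive rate** per group, sum(w·y) / sum(w), and evaluate the new DIR and absolute difference. Because reweighing equalizes these rates by construction (unless clamping binds), parity here only confirms the weights are correct; the meaningful check is whether a model trained with these sample weights behaves more fairly.

In [None]:
def weighted_rates(frame, weights, y_col='label', g_col='group'):
    tmp = frame.copy()
    # Look up each row's weight by its (group, label) cell
    tmp['w'] = [weights[str(g)][int(y)] for g, y in zip(tmp[g_col], tmp[y_col])]
    tmp['wy'] = tmp['w'] * tmp[y_col]
    # Weighted positive rate per group: sum(w * y) / sum(w)
    sums = tmp.groupby(g_col)[['wy', 'w']].sum()
    grp = (sums['wy'] / sums['w'].clip(lower=1e-9)).reset_index(name='w_positive_rate')
    # Weighted DIR and abs diff
    rates = grp['w_positive_rate'].values
    if len(rates) >= 2:
        dir_ratio = float(np.min(rates) / max(np.max(rates), 1e-9))
        abs_diff = float(np.max(rates) - np.min(rates))
    else:
        dir_ratio, abs_diff = float('nan'), float('nan')
    return grp, dir_ratio, abs_diff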

w_stats, dir_after, diff_after = weighted_rates(df, weights)
print('Weighted group stats AFTER:')
display(w_stats)
print('DIR (after):', round(dir_after, 3))
print('Abs rate diff (after):', round(diff_after, 3))
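When reading the "after" numbers, note that a weight shared by every row of a group cancels out of that group's weighted positive rate: sum(c·y) / sum(c) = mean(y). Only weights that differ between positive and negative rows within a group can move it. A tiny check of that cancellation, using made-up labels:

```python
import numpy as np

y = np.array([1, 0, 1, 1, 0])   # hypothetical labels for one group
rates = {}
for c in (0.5, 1.0, 2.0):       # a single weight shared by every row in the group
    w = np.full(y.shape, c)
    rates[c] = float(np.sum(w * y) / np.sum(w))

print(rates)             # every entry equals y.mean(), i.e. 0.6
print(float(y.mean()))
```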

## 5) Results Summary & Audit Log
We capture inputs, metrics, and parameters for traceability. This supports the transparency and reproducibility goals in the ethics framework.
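The audit log built in the next cell records metrics and parameters but no fingerprint of the data itself. One lightweight way to tie a log to the exact rows it was computed on is a content hash of the DataFrame. The sketch below uses `pandas.util.hash_pandas_object`; the `data_fingerprint` helper and the idea of storing its result in the summary are assumptions for illustration, not part of the notebook.

```python
import hashlib
import pandas as pd

def data_fingerprint(frame: pd.DataFrame) -> str:
    """Hypothetical helper: SHA-256 over pandas' per-row hashes (index included)."""
    row_hashes = pd.util.hash_pandas_object(frame, index=True).to_numpy()
    return hashlib.sha256(row_hashes.tobytes()).hexdigest()

df_a = pd.DataFrame({'text': ['ok', 'bad'], 'label': [1, 0], 'group': ['A', 'B']})
df_b = df_a.copy()
df_b.loc[1, 'label'] = 1   # change a single cell

print(data_fingerprint(df_a) == data_fingerprint(df_a.copy()))  # True
print(data_fingerprint(df_a) == data_fingerprint(df_b))         # False
```

In this notebook it could be recorded with something like `summary['data_sha256'] = data_fingerprint(df)` before the JSON is written; the key name is illustrative.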

In [None]:
summary = {
    'run_id': RUN_ID,
    'timestamp_utc': RUN_TS,
    'n_rows': int(len(df)),
    'metrics_before': {
        'dir': float(dir_before),
        'abs_rate_diff': float(diff_before)
    },
    'metrics_after': {
        'dir': float(dir_after),
        'abs_rate_diff': float(diff_after)
    },
    'group_stats_before': stats_before.to_dict(orient='records'),
    'group_stats_after_weighted': w_stats.to_dict(orient='records'),
    'weights': weights,
    'flag_terms': sorted({t.lower() for t in FLAG_TERMS}),  # sorted: set iteration order varies between runs
    'notes': 'Synthetic demo; use domain-appropriate fairness tooling for production.'
}

audit_path = os.path.join(OUTPUT_DIR, f'ethics_simulation_audit_{RUN_ID}.json')
with open(audit_path, 'w', encoding='utf-8') as f:
    json.dump(summary, f, indent=2)

print('Audit log saved →', audit_path)
summary

## 6) Next Steps
- Replace the naive keyword detector with a robust policy, moderation filter, or model-based detector.
- Add group-aware evaluation for precision/recall/FPR/FNR.
- Explore dataset rebalancing, counterfactual augmentation, or threshold calibration.
- Integrate with a model card in `/docs` describing risks, mitigations, and evaluation.
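For the group-aware evaluation item, a minimal sketch of per-group TPR, FPR, FNR and precision. It assumes a binary prediction column, here called `pred`, which this notebook does not yet produce; the toy values are made up. Comparing TPR and FPR across groups is the basis of the equalized-odds criterion.

```python
import pandas as pd

def group_confusion_rates(frame, y_col='label', p_col='pred', g_col='group'):
    """Per-group TPR, FPR, FNR and precision for binary labels and predictions."""
    nan = float('nan')
    rows = []
    for g, sub in frame.groupby(g_col):
        y, p = sub[y_col], sub[p_col]
        tp = int(((y == 1) & (p == 1)).sum())
        fp = int(((y == 0) & (p == 1)).sum())
        fn = int(((y == 1) & (p == 0)).sum())
        tn = int(((y == 0) & (p == 0)).sum())
        rows.append({
            g_col: g,
            'tpr': tp / (tp + fn) if tp + fn else nan,
            'fpr': fp / (fp + tn) if fp + tn else nan,
            'fnr': fn / (tp + fn) if tp + fn else nan,
            'precision': tp / (tp + fp) if tp + fp else nan,
        })
    return pd.DataFrame(rows)

# Hypothetical labels and predictions.
toy = pd.DataFrame({
    'group': ['A'] * 4 + ['B'] * 4,
    'label': [1, 1, 0, 0, 1, 1, 0, 0],
    'pred':  [1, 0, 0, 0, 1, 1, 1, 0],
})
print(group_confusion_rates(toy))
```

In this toy case group A has TPR 0.5 and FPR 0.0 while group B has TPR 1.0 and FPR 0.5, the kind of gap that the positive-rate metrics above cannot reveal.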