# Quick Start: DeepBridge for Fairness Detection

This notebook demonstrates how to use the **DeepBridge** library for automated fairness detection.

## What is DeepBridge?

DeepBridge is a library for automated fairness analysis in machine learning datasets. It provides:
- **Auto-detection** of sensitive attributes (race, gender, age, etc.)
- **Fairness metrics** computation (demographic parity, equalized odds, etc.)
- **Bias detection** with configurable thresholds
- **EEOC/ECOA compliance** checking

## Installation

```bash
# Install DeepBridge in editable mode from a local checkout (adjust the path to your clone)
pip install -e /home/guhaase/projetos/DeepBridge/deepbridge
```

**Estimated time**: 5-10 minutes

## 1. Import Libraries

In [None]:
# Import DeepBridge
from deepbridge import DBDataset

# Other libraries
import pandas as pd
import numpy as np
from pathlib import Path

print("✓ Libraries imported successfully")

## 2. Create Sample Data

Let's create a simple dataset with intentional bias to demonstrate DeepBridge's capabilities.

In [None]:
# Set random seed for reproducibility
np.random.seed(42)

# Create sample data (500 samples)
n_samples = 500

data = {
    'age': np.random.randint(18, 70, n_samples),
    'income': np.random.randint(20000, 150000, n_samples),
    'education_years': np.random.randint(8, 20, n_samples),
    'gender': np.random.choice(['Male', 'Female'], n_samples),
    'race': np.random.choice(['White', 'Black', 'Asian', 'Hispanic'], n_samples),
}

# Create target with intentional bias on top of a 50% base approval probability:
# - Males: +0.20
# - White applicants: +0.15
# The two effects stack, so White males are approved with probability 0.85.
approved = []
for i in range(n_samples):
    base_prob = 0.5
    
    # Gender bias
    if data['gender'][i] == 'Male':
        base_prob += 0.2
    
    # Race bias
    if data['race'][i] == 'White':
        base_prob += 0.15
    
    # Random approval based on biased probability
    approved.append(1 if np.random.random() < base_prob else 0)

data['approved'] = approved

# Create DataFrame
df = pd.DataFrame(data)

print(f"✓ Created dataset with {len(df)} samples")
print(f"\nDataset shape: {df.shape}")
print(f"\nColumns: {list(df.columns)}")
print("\nFirst 5 rows:")
df.head()

## 3. Create DBDataset

The `DBDataset` class is the main entry point for DeepBridge. It automatically detects sensitive attributes.

In [None]:
# Create DBDataset (auto-detects sensitive attributes)
dataset = DBDataset(
    data=df,
    target_column='approved'
)

print("✓ DBDataset created successfully")
print(f"\nDetected sensitive attributes: {dataset.detected_sensitive_attributes}")
print(f"Target column: {dataset.target_column}")
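
This notebook does not show how DeepBridge detects sensitive attributes. As rough intuition only, name-based detection can be sketched as matching column-name tokens against a keyword list. The function `guess_sensitive_columns` and its keyword set are hypothetical illustrations, not DeepBridge's implementation or API.

```python
# Hypothetical keyword set for illustration; not taken from DeepBridge.
SENSITIVE_KEYWORDS = {"gender", "sex", "race", "ethnicity", "age", "religion"}


def guess_sensitive_columns(columns):
    """Return the column names that contain a sensitive keyword as a whole token."""
    matches = []
    for column in columns:
        # Split on underscores so 'age' matches 'age' but not 'mortgage' or 'page_views'.
        tokens = str(column).lower().split("_")
        if any(token in SENSITIVE_KEYWORDS for token in tokens):
            matches.append(column)
    return matches


sample_columns = ["age", "income", "education_years", "gender", "race", "approved"]
print(guess_sensitive_columns(sample_columns))
```

In this notebook the same helper could be called as `guess_sensitive_columns(df.columns)`. A real detector would likely also inspect column values, which this sketch does not attempt.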

## 4. Analyze Fairness

Run fairness analysis to detect bias in the dataset.

In [None]:
# Run fairness analysis
results = dataset.analyze_fairness()

print("✓ Fairness analysis complete")
print("\n" + "="*60)
print("FAIRNESS ANALYSIS RESULTS")
print("="*60)
print(results)

## 5. Examine Results by Attribute

Let's look at fairness metrics for each sensitive attribute.

In [None]:
# Normalize the results to a dict, whatever type analyze_fairness() returns
if hasattr(results, 'to_dict'):
    results_dict = results.to_dict()
elif isinstance(results, dict):
    results_dict = results
else:
    results_dict = {'summary': str(results)}

# Display results, one section per top-level key
for key, value in results_dict.items():
    print(f"\n{str(key).upper()}:")
    print("-" * 40)
    print(value)

## 6. Check Specific Attributes

Cross-check the analysis by computing approval rates for gender and race directly with pandas.

In [None]:
# Calculate approval rates by gender
print("APPROVAL RATES BY GENDER:")
print("="*40)
gender_stats = df.groupby('gender')['approved'].agg(['mean', 'count'])
gender_stats.columns = ['Approval Rate', 'Count']
print(gender_stats)
print(f"\nDifference: {gender_stats['Approval Rate'].max() - gender_stats['Approval Rate'].min():.3f}")

print("\n" + "="*40)

# Calculate approval rates by race
print("\nAPPROVAL RATES BY RACE:")
print("="*40)
race_stats = df.groupby('race')['approved'].agg(['mean', 'count'])
race_stats.columns = ['Approval Rate', 'Count']
print(race_stats)
print(f"\nDifference: {race_stats['Approval Rate'].max() - race_stats['Approval Rate'].min():.3f}")

## 7. Interpretation

### What do these results mean?

**Demographic Parity**:
- Measures whether different groups receive positive outcomes at similar rates
- Common rule of thumb: a difference in rates greater than 0.1 indicates potential bias (the threshold is configurable)

**Equalized Odds**:
- Measures whether true positive rates and false positive rates are similar across groups
- Needs model predictions in addition to true labels, so it evaluates a classifier's errors rather than the raw approval labels
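
Because equalized odds compares error rates, it can be illustrated with a small table of labels and predictions. The sketch below uses made-up toy data. The column names `group`, `y_true`, and `y_pred` and the helper `rates_by_group` are assumptions for illustration, not DeepBridge output.

```python
import pandas as pd

# Toy labels and predictions, invented for illustration only.
toy = pd.DataFrame({
    "group":  ["A", "A", "A", "A", "B", "B", "B", "B"],
    "y_true": [1, 1, 0, 0, 1, 1, 0, 0],
    "y_pred": [1, 1, 1, 0, 1, 0, 0, 0],
})


def rates_by_group(frame: pd.DataFrame) -> pd.DataFrame:
    """Compute the true positive rate and false positive rate per group."""
    rows = {}
    for name, part in frame.groupby("group"):
        positives = part[part["y_true"] == 1]
        negatives = part[part["y_true"] == 0]
        rows[name] = {
            "tpr": positives["y_pred"].mean(),  # P(pred=1 | true=1)
            "fpr": negatives["y_pred"].mean(),  # P(pred=1 | true=0)
        }
    return pd.DataFrame.from_dict(rows, orient="index")


rates = rates_by_group(toy)
# Equalized odds holds when both gaps are close to zero.
tpr_gap = rates["tpr"].max() - rates["tpr"].min()
fpr_gap = rates["fpr"].max() - rates["fpr"].min()
print(rates)
print(f"TPR gap: {tpr_gap:.2f}, FPR gap: {fpr_gap:.2f}")
```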

**Disparate Impact**:
- Ratio of the lowest group's approval rate to the highest group's approval rate
- EEOC four-fifths (80%) rule: a ratio below 0.8 is treated as evidence of adverse impact
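
Demographic parity and disparate impact can both be computed directly from per-group positive rates. The sketch below uses a small made-up table. The helper name `parity_metrics` is hypothetical, and the 0.1 and 0.8 cutoffs are the ones quoted in the definitions above.

```python
import pandas as pd

# Toy outcomes invented for illustration: 10 applicants per group.
toy = pd.DataFrame({
    "gender":   ["Male"] * 10 + ["Female"] * 10,
    "approved": [1] * 7 + [0] * 3 + [1] * 5 + [0] * 5,
})


def parity_metrics(frame: pd.DataFrame, group_col: str, outcome_col: str) -> dict:
    """Return the demographic parity difference and the disparate impact ratio."""
    rates = frame.groupby(group_col)[outcome_col].mean()
    return {
        "dp_difference": rates.max() - rates.min(),  # gap between best and worst group
        "di_ratio": rates.min() / rates.max(),       # worst group relative to best group
    }


metrics = parity_metrics(toy, "gender", "approved")
print(metrics)
print("Demographic parity flag:", metrics["dp_difference"] > 0.1)
print("80% rule flag:", metrics["di_ratio"] < 0.8)
```

Calling `parity_metrics(df, 'gender', 'approved')` on the notebook's data would reproduce the difference printed in section 6.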

### Expected Results:

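Since we intentionally created bias, the expected marginal rates (averaged over the other attribute) are:
- **Gender**: Males should have ~74% approval vs Females ~54% (difference ~0.20)
- **Race**: White should have ~75% approval vs others ~60% (difference ~0.15)

Both differences exceed the 0.1 demographic parity threshold and should be **DETECTED** by DeepBridge! ⚠️ The expected disparate impact ratio is ~0.73 for gender but ~0.80 for race, so with only 500 samples the race result can land on either side of the 80% rule.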
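
The marginal approval rates implied by the generator in section 2 can be worked out by averaging each bonus over the other attribute. This short check assumes gender is split 50/50 and each of the four race groups has probability 0.25, which is what the uniform `np.random.choice` calls produce in expectation.

```python
# Parameters copied from the data generator in section 2.
BASE, MALE_BONUS, WHITE_BONUS = 0.5, 0.2, 0.15
P_MALE = 0.5    # uniform choice over 2 genders
P_WHITE = 0.25  # uniform choice over 4 race groups

# Marginal rate per gender: average the race bonus over the race distribution.
male_rate = BASE + MALE_BONUS + WHITE_BONUS * P_WHITE
female_rate = BASE + WHITE_BONUS * P_WHITE

# Marginal rate per race: average the gender bonus over the gender distribution.
white_rate = BASE + WHITE_BONUS + MALE_BONUS * P_MALE
other_rate = BASE + MALE_BONUS * P_MALE

print(f"Male {male_rate:.4f} vs Female {female_rate:.4f}: "
      f"difference {male_rate - female_rate:.2f}, ratio {female_rate / male_rate:.3f}")
print(f"White {white_rate:.4f} vs other {other_rate:.4f}: "
      f"difference {white_rate - other_rate:.2f}, ratio {other_rate / white_rate:.3f}")
```

Observed rates in a 500-row sample will deviate from these expectations, especially for the race groups, which hold roughly 125 rows each.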

## 8. Load Real Dataset (Optional)

Try DeepBridge with a real-world dataset.

In [None]:
# Path to case study datasets
data_dir = Path("../../data/case_studies")

# Check available case studies
if data_dir.exists():
    case_studies = [d.name for d in data_dir.iterdir() if d.is_dir()]
    print(f"Available case studies: {case_studies}")
    
    # Example: Load Adult Income dataset if available
    adult_path = data_dir / "adult" / "adult.csv"
    if adult_path.exists():
        print(f"\n✓ Loading: {adult_path}")
        adult_df = pd.read_csv(adult_path)
        print(f"Shape: {adult_df.shape}")
        print(f"Columns: {list(adult_df.columns)}")
        
        # Create DBDataset (assuming 'income' is the target)
        if 'income' in adult_df.columns:
            adult_dataset = DBDataset(data=adult_df, target_column='income')
            print(f"\nDetected attributes: {adult_dataset.detected_sensitive_attributes}")
            
            # Analyze fairness
            adult_results = adult_dataset.analyze_fairness()
            print("\nFairness Analysis:")
            print(adult_results)
    else:
        print(f"⚠️  Adult dataset not found at {adult_path}")
else:
    print(f"⚠️  Case studies directory not found: {data_dir}")
    print("To use real datasets, ensure data/case_studies/ is populated")

## 9. Summary

### What we learned:

1. **Import DeepBridge**: `from deepbridge import DBDataset`
2. **Create dataset**: `DBDataset(data=df, target_column='target')`
3. **Auto-detection**: DeepBridge automatically identifies sensitive attributes
4. **Analyze fairness**: `dataset.analyze_fairness()`
5. **Interpret results**: Check demographic parity, equalized odds, disparate impact

### Next Steps:

- **Notebook 02**: Explore case studies (COMPAS, Adult Income, German Credit, Bank Marketing)
- **Run experiments**: Execute `experiments/scripts/exp*.py` to reproduce paper results
- **Read documentation**: See `docs/quickstart.md` and `docs/installation.md`

### Resources:

- **DeepBridge source**: `/home/guhaase/projetos/DeepBridge/deepbridge`
- **Documentation**: `docs/`
- **Case studies**: `data/case_studies/`
- **Experiments**: `experiments/scripts/`

---

**End of Quick Start**

Questions or issues? Check the [troubleshooting guide](../../docs/troubleshooting.md) or open an issue on GitHub.