# 🛡️ Resilience Testing - Classification

<div style="background-color: #e3f2fd; padding: 15px; border-radius: 5px; border-left: 5px solid #2196F3;">
<b>üìì Information</b><br>
<b>Level:</b> Intermediate/Advanced<br>
<b>Time:</b> 20 minutes<br>
<b>Dataset:</b> Breast Cancer (sklearn)<br>
<b>Prerequisite:</b> Basic understanding of ML models
</div>

## 🎯 Objectives
- ✅ Resilience testing for **classification** problems
- ✅ Generate **interactive HTML report**
- ✅ Export results to **JSON format**
- ✅ Analyze **data drift and concept drift**
- ✅ Evaluate **model stability** under distribution shifts

## 📚 Why Resilience Testing?

### Critical Contexts

#### 🏥 Medicine - Disease Diagnosis
- **Problem**: Model trained on Hospital A data, deployed in Hospital B
- **Challenge**: Different patient demographics, equipment, protocols
- **Impact**: Model performance may degrade silently
- **Solution**: Resilience testing detects distribution shifts early

#### 💳 Finance - Credit Scoring
- **Problem**: Model trained on pre-pandemic data
- **Challenge**: Economic conditions changed dramatically
- **Impact**: Credit risk patterns shifted significantly
- **Solution**: Monitor resilience to detect when retraining is needed

#### 🔒 Security - Fraud Detection
- **Problem**: Fraudsters constantly evolve their tactics
- **Challenge**: New fraud patterns emerge continuously
- **Impact**: Model becomes less effective over time
- **Solution**: Resilience metrics trigger retraining workflows

### Types of Drift

1. **Data Drift** (Covariate Shift)
   - Feature distributions change
   - Example: Age distribution shifts from 20-40 to 40-60

2. **Concept Drift**
   - Relationship between features and target changes
   - Example: What constitutes "fraud" evolves over time

3. **Performance Degradation**
   - Model accuracy decreases on new data
   - A symptom rather than a separate kind of drift: it may be caused by data drift, concept drift, or both
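
Data drift can be quantified by comparing a feature's distribution in a reference sample against a newer sample. The sketch below computes the Population Stability Index (PSI), one commonly used measure. It is an illustration with made-up age samples and a hypothetical helper named `psi`; it is not necessarily the metric DeepBridge computes.

```python
import math
import random

def psi(reference, current, n_bins=10, eps=1e-6):
    """Population Stability Index between two 1-D samples (hypothetical helper)."""
    lo, hi = min(reference), max(reference)
    width = (hi - lo) / n_bins or 1.0  # fall back to 1.0 if the reference is constant

    def bin_fractions(values):
        counts = [0] * n_bins
        for v in values:
            # Values outside the reference range are clamped into the edge bins
            idx = int((v - lo) / width)
            counts[max(0, min(n_bins - 1, idx))] += 1
        # eps avoids log(0) for empty bins
        return [max(c / len(values), eps) for c in counts]

    ref_frac = bin_fractions(reference)
    cur_frac = bin_fractions(current)
    return sum((c - r) * math.log(c / r) for r, c in zip(ref_frac, cur_frac))

rng = random.Random(42)
ages_train = [rng.uniform(20, 40) for _ in range(1000)]    # reference: ages 20-40
ages_same = [rng.uniform(20, 40) for _ in range(1000)]     # same distribution
ages_shifted = [rng.uniform(40, 60) for _ in range(1000)]  # shifted: ages 40-60

psi_same = psi(ages_train, ages_same)
psi_shifted = psi(ages_train, ages_shifted)
print(f"PSI (no shift): {psi_same:.3f}")
print(f"PSI (shifted):  {psi_shifted:.3f}")
```

A frequently quoted rule of thumb reads PSI below 0.1 as stable and above roughly 0.25 as a significant shift, but the cut-offs should be tuned to the application.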

## 1️⃣ Setup - Binary Classification Problem

In [1]:
import pandas as pd
import numpy as np
import matplotlib.pyplot as plt
import seaborn as sns
from sklearn.datasets import load_breast_cancer
from sklearn.ensemble import RandomForestClassifier
from sklearn.model_selection import train_test_split
from sklearn.metrics import accuracy_score, roc_auc_score, classification_report
from deepbridge import DBDataset, Experiment
import os

# Configure visualizations
plt.style.use('seaborn-v0_8-darkgrid')
sns.set_palette("husl")

# Load Breast Cancer dataset
cancer = load_breast_cancer()
df = pd.DataFrame(cancer.data, columns=cancer.feature_names)
df['target'] = cancer.target  # 0 = malignant, 1 = benign

print(f"📊 Dataset: {df.shape}")
print(f"🏥 Target: Cancer diagnosis (0=malignant, 1=benign)")
print(f"\n📈 Class distribution:")
print(df['target'].value_counts())
print(f"\n📊 Class balance:")
print(df['target'].value_counts(normalize=True))


## 2️⃣ Train Classification Model

In [2]:
X = df.drop('target', axis=1)
y = df['target']
X_train, X_test, y_train, y_test = train_test_split(X, y, test_size=0.2, random_state=42, stratify=y)

# Train RandomForest Classifier
model = RandomForestClassifier(n_estimators=100, random_state=42, max_depth=10)
model.fit(X_train, y_train)

# Evaluate
y_pred = model.predict(X_test)
y_proba = model.predict_proba(X_test)
accuracy = accuracy_score(y_test, y_pred)
auc = roc_auc_score(y_test, y_proba[:, 1])

print(f"✅ Model trained!")
print(f"📊 Accuracy: {accuracy*100:.2f}%")
print(f"📊 ROC-AUC: {auc:.3f}")
print(f"\n📋 Classification Report:")
print(classification_report(y_test, y_pred, target_names=['Malignant', 'Benign']))

✅ Model trained!
📊 Accuracy: 95.61%
📊 ROC-AUC: 0.994

📋 Classification Report:
              precision    recall  f1-score   support

   Malignant       0.95      0.93      0.94        42
      Benign       0.96      0.97      0.97        72

    accuracy                           0.96       114
   macro avg       0.96      0.95      0.95       114
weighted avg       0.96      0.96      0.96       114
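
The rounded figures in the report are consistent with the confusion-matrix counts used below. These counts are reconstructed from the printed precision, recall, support, and accuracy; they are an assumption, not output captured from the notebook. The sketch recomputes the metrics from them to show how each number in the report is derived.

```python
# Counts reconstructed from the rounded report (an assumption, not notebook output).
# Rows: true class, columns: predicted class; order: [malignant, benign].
confusion = [[39, 3],
             [2, 70]]

total = sum(sum(row) for row in confusion)
accuracy = (confusion[0][0] + confusion[1][1]) / total

def precision_recall(cls):
    """Precision and recall for one class index of the 2x2 matrix."""
    tp = confusion[cls][cls]
    predicted = sum(row[cls] for row in confusion)  # column sum: predicted as cls
    actual = sum(confusion[cls])                    # row sum: support of cls
    return tp / predicted, tp / actual

for name, cls in [("Malignant", 0), ("Benign", 1)]:
    p, r = precision_recall(cls)
    print(f"{name:<10} precision={p:.2f} recall={r:.2f} support={sum(confusion[cls])}")
print(f"Accuracy: {accuracy * 100:.2f}%")
```

Under these counts, 3 malignant cases are predicted benign and 2 benign cases are predicted malignant. In a diagnostic setting the first kind of error is usually the more costly one, which is why recall on the malignant class deserves attention alongside overall accuracy.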



## 3️⃣ Create Experiment

In [3]:
dataset = DBDataset(
    data=df,
    target_column='target',
    model=model,
    test_size=0.2,
    random_state=42,
    dataset_name='Breast Cancer Classification'
)

exp = Experiment(
    dataset=dataset,
    experiment_type='binary_classification',
    tests=['resilience'],  # Specify resilience test
    random_state=42
)

print("✅ Experiment created!")

✅ Initial model evaluation complete: RandomForestClassifier
✅ Experiment created!


## 4️⃣ Run Resilience Test

<div style="background-color: #fff3e0; padding: 15px; border-radius: 5px; border-left: 5px solid #ff9800;">
<b>ℹ️ Configuration:</b> Using 'full' config for comprehensive resilience analysis with multiple drift detection methods.
</div>

In [4]:
print("🧪 Running resilience test...\n")

# Use run_tests() to store results internally for save_html() and save_json()
experiment_result = exp.run_tests(config_name='full')

print("\n✅ Resilience test completed!")
print(f"\n📊 Result type: {type(experiment_result)}")

# Access the resilience result
resilience_result = experiment_result.get_result('resilience')
print(f"📊 Resilience result type: {type(resilience_result)}")

🧪 Running resilience test...

✅ Resilience Tests Finished!
🎉 Test completed successfully: resilience

✅ Resilience test completed!

📊 Result type: <class 'deepbridge.core.experiment.results.ExperimentResult'>
📊 Resilience result type: <class 'dict'>
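
Because the resilience result is a plain `dict`, it can be explored with ordinary Python before any report is generated. The helper below returns an indented outline of the key structure down to a limited depth. Both `outline` and the `sample_result` stand-in are hypothetical; the keys of the real result may differ.

```python
def outline(obj, depth=0, max_depth=2):
    """Return indented lines describing the nested key structure of a dict."""
    lines = []
    if isinstance(obj, dict) and depth <= max_depth:
        for key, value in obj.items():
            kind = type(value).__name__
            size = f"[{len(value)}]" if isinstance(value, (list, dict)) else ""
            lines.append(f"{'  ' * depth}{key}: {kind}{size}")
            lines.extend(outline(value, depth + 1, max_depth))
    return lines

# Stand-in for resilience_result; the keys here are illustrative only.
sample_result = {
    "primary_model": {
        "resilience_score": 1.0,
        "distribution_shift": {"by_alpha": [0.1, 0.2, 0.3]},
    },
    "config": {"name": "full"},
}

for line in outline(sample_result):
    print(line)
```

In the notebook, the same call would be `outline(resilience_result)`.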


## 5️⃣ Generate Interactive HTML Report

### Export comprehensive interactive report with:
- 📊 **Performance Analysis**: Model stability metrics
- 📈 **Drift Detection**: Data and concept drift analysis
- 🎯 **Feature Impact**: Features most affected by distribution shifts
- 📉 **Resilience Scores**: Overall model robustness metrics

In [5]:
# Create output directory
output_dir = './outputs/resilience_classification'
os.makedirs(output_dir, exist_ok=True)

# Generate interactive HTML report
html_output_path = os.path.join(output_dir, 'resilience_classification_interactive.html')

print("📝 Generating interactive HTML report...\n")

report_path = exp.save_html(
    test_type='resilience',
    file_path=html_output_path,
    model_name='RandomForest Classifier',
    report_type='interactive'
)

print(f"\n✅ Interactive report generated!")
print(f"📂 Location: {report_path}")
print(f"\n💡 Open the HTML file in your browser to explore:")
print(f"   - Performance Overview")
print(f"   - Drift Detection Analysis")
print(f"   - Feature Distribution Shifts")
print(f"   - Interactive Plotly charts")

📝 Generating interactive HTML report...

2025-11-12 22:40:26,973 - deepbridge.reports - INFO - Generating SIMPLE resilience report to: /home/guhaase/projetos/DeepBridge/examples/notebooks/03_validation_tests/outputs/resilience_classification/resilience_classification_interactive.html
2025-11-12 22:40:26,974 - deepbridge.reports - INFO - Report type: interactive
2025-11-12 22:40:26,975 - deepbridge.reports - INFO - TRANSFORMING RESILIENCE DATA FOR REPORT (SIMPLE)
2025-11-12 22:40:26,976 - deepbridge.reports - INFO - Top-level keys in results: ['primary_model', 'alternative_models', 'initial_results', 'initial_model_evaluation', 'config', 'experiment_type', 'model_type']
2025-11-12 22:40:26,977 - deepbridge.reports - INFO - Using flat format: results['primary_model']
2025-11-12 22:40:26,978 - deepbridge.reports - INFO - primary_model keys: ['distribution_shift', 'worst_sample', 'worst_cluster', 'outer_sample', 'hard_sample', 'resilience_score', 'test_scores', 'alphas', 'distance_metri

## 6️⃣ Export Results to JSON

### JSON export includes:
- 🔍 **Experiment Info**: Configuration and metadata
- 📊 **Test Results**: Complete resilience analysis data
- 🎯 **Model Evaluation**: Model metrics and performance
- 📈 **Drift Metrics**: Distribution shift measurements
- 🌟 **Feature Analysis**: Per-feature drift statistics

In [6]:
# Export to JSON in two variants: full, and compact (for AI validation)
json_output_path = os.path.join(output_dir, 'resilience_classification_results.json')
json_compact_path = os.path.join(output_dir, 'resilience_classification_results_COMPACT.json')

print("📝 Exporting results to JSON...\n")

# Full JSON (with all data)
json_path_full = exp._experiment_result.save_json(
    test_type='resilience',
    file_path=json_output_path,
    include_summary=True,
    compact=False  # Full data
)

# Compact JSON (optimized for AI validation)
json_path_compact = exp._experiment_result.save_json(
    test_type='resilience',
    file_path=json_compact_path,
    include_summary=True,
    compact=True  # Remove large arrays, keep only essentials
)

# Compare file sizes (os is already imported in the setup cell)
size_full = os.path.getsize(json_path_full) / 1024  # KB
size_compact = os.path.getsize(json_path_compact) / 1024  # KB
reduction = ((size_full - size_compact) / size_full) * 100

print(f"\n✅ JSON files exported successfully!")
print(f"\n📊 FILE SIZES:")
print(f"   Full JSON:    {size_full:>8.2f} KB")
print(f"   Compact JSON: {size_compact:>8.2f} KB")
print(f"   Reduction:    {reduction:>8.1f}%")

print(f"\n📂 LOCATIONS:")
print(f"   Full:    {json_path_full}")
print(f"   Compact: {json_path_compact}")

print(f"\n📋 COMPACT JSON STRUCTURE (optimized for AI):")
print(f"   └─ experiment_info/")
print(f"      ├─ test_type, experiment_type, generation_time, config")
print(f"   └─ test_results/")
print(f"      └─ primary_model/")
print(f"         ├─ metrics (all model metrics)")
print(f"         ├─ resilience_score")
print(f"         ├─ drift_detected")
print(f"         └─ drift_analysis/")
print(f"            ├─ data_drift_score")
print(f"            ├─ concept_drift_score")
print(f"            └─ per_feature_drift (top affected features)")
print(f"   └─ initial_model_evaluation/ (compact)")
print(f"   └─ summary/  (AI-friendly analysis)")
print(f"      ├─ key_findings")
print(f"      ├─ drift_summary")
print(f"      ├─ performance_stability")
print(f"      └─ recommendations")

print(f"\n💡 USE CASES:")
print(f"   • Full JSON: Deep dive analysis, debugging, research")
print(f"   • Compact JSON: AI validation, automated testing, CI/CD pipelines")

📝 Exporting results to JSON...


✅ JSON files exported successfully!

📊 FILE SIZES:
   Full JSON:      308.84 KB
   Compact JSON:   304.81 KB
   Reduction:         1.3%

📂 LOCATIONS:
   Full:    /home/guhaase/projetos/DeepBridge/examples/notebooks/03_validation_tests/outputs/resilience_classification/resilience_classification_results.json
   Compact: /home/guhaase/projetos/DeepBridge/examples/notebooks/03_validation_tests/outputs/resilience_classification/resilience_classification_results_COMPACT.json

📋 COMPACT JSON STRUCTURE (optimized for AI):
   └─ experiment_info/
      ├─ test_type, experiment_type, generation_time, config
   └─ test_results/
      └─ primary_model/
         ├─ metrics (all model metrics)
         ├─ resilience_score
         ├─ drift_detected
         └─ drift_analysis/
            ├─ data_drift_score
            ├─ concept_drift_score
            └─ per_feature_drift (top affected features)
   └─ ini
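
In the run above the compact file is only about 1.3% smaller than the full one, so it is worth checking what the compact export actually drops. One way a compaction step could work in general is to replace long numeric arrays with summary statistics. The sketch below illustrates that technique under stated assumptions; it is not DeepBridge's implementation, and `compact` and `max_len` are hypothetical names.

```python
import json

def compact(obj, max_len=5):
    """Recursively replace long numeric lists with a small summary dict."""
    if isinstance(obj, dict):
        return {k: compact(v, max_len) for k, v in obj.items()}
    if isinstance(obj, list):
        numeric = all(isinstance(x, (int, float)) and not isinstance(x, bool)
                      for x in obj)
        if numeric and len(obj) > max_len:
            return {"n": len(obj), "min": min(obj), "max": max(obj),
                    "mean": sum(obj) / len(obj)}
        return [compact(x, max_len) for x in obj]
    return obj

# Made-up payload: one scalar and one large array
full = {"resilience_score": 1.0, "residuals": [i / 1000 for i in range(1000)]}
small = compact(full)

size_full = len(json.dumps(full))
size_small = len(json.dumps(small))
reduction = (size_full - size_small) / size_full * 100
print(f"full={size_full} B, compact={size_small} B, reduction={reduction:.1f}%")
```

If a compact export shrinks the file by only a percent or so, the bulk of the payload is probably not stored in arrays that the compaction targets, and inspecting the largest keys of the full JSON is a reasonable next step.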

## 7️⃣ Load and Analyze JSON Results

This cell loads the exported compact JSON and prints its main sections.

In [7]:
import json

print("="*80)
print("📊 ANALYZING COMPACT JSON (optimized for AI validation)")
print("="*80)

# Load COMPACT JSON results
with open(json_compact_path, 'r', encoding='utf-8') as f:
    results_json = json.load(f)

# Experiment Info
exp_info = results_json['experiment_info']
print(f"\n🔬 EXPERIMENT INFO:")
print(f"   Test Type: {exp_info['test_type']}")
print(f"   Experiment Type: {exp_info['experiment_type']}")
print(f"   Generation Time: {exp_info['generation_time']}")

# Summary (AI-friendly)
if 'summary' in results_json:
    summary = results_json['summary']
    
    print(f"\n🎯 KEY FINDINGS:")
    for finding in summary.get('key_findings', []):
        print(f"   • {finding}")

    print(f"\n📈 DRIFT SUMMARY:")
    drift = summary.get('drift_summary', {})
    for key, value in drift.items():
        print(f"   {key}: {value}")

    print(f"\n🔍 PERFORMANCE STABILITY:")
    perf = summary.get('performance_stability', {})
    for key, value in perf.items():
        print(f"   {key}: {value}")

    print(f"\n💡 RECOMMENDATIONS:")
    for rec in summary.get('recommendations', []):
        print(f"   • {rec}")

# Test Results (compact)
test_results = results_json['test_results']
primary = test_results.get('primary_model', {})

print(f"\n🔬 TEST RESULTS (Compact):")
print(f"   Resilience Score: {primary.get('resilience_score', 'N/A')}")
print(f"   Drift Detected: {primary.get('drift_detected', 'N/A')}")

# Drift Analysis
if 'drift_analysis' in primary:
    drift_analysis = primary['drift_analysis']
    print(f"\n📊 DRIFT ANALYSIS:")
    print(f"   Data Drift Score: {drift_analysis.get('data_drift_score', 'N/A')}")
    print(f"   Concept Drift Score: {drift_analysis.get('concept_drift_score', 'N/A')}")

    if 'per_feature_drift' in drift_analysis:
        print(f"\n🌟 TOP FEATURES WITH HIGHEST DRIFT:")
        print(f"   {'Feature':<30} {'Drift Score'}")
        print(f"   {'-'*50}")
        # Sort by score so the ten largest drifts are shown, not the first ten keys
        ranked = sorted(drift_analysis['per_feature_drift'].items(),
                        key=lambda kv: kv[1], reverse=True)
        for feat, score in ranked[:10]:
            print(f"   {feat:<30} {score:.4f}")

print(f"\n{'='*80}")
print(f"💾 COMPACT JSON IS OPTIMIZED FOR:")
print(f"{'='*80}")
print(f"✅ AI/LLM validation (smaller token count)")
print(f"✅ Automated testing pipelines")
print(f"✅ CI/CD integration")
print(f"✅ Quick drift detection")
print(f"✅ Summary-based decision making")

📊 ANALYZING COMPACT JSON (optimized for AI validation)

🔬 EXPERIMENT INFO:
   Test Type: resilience
   Experiment Type: binary_classification
   Generation Time: 2025-11-12 22:40:26

🔬 TEST RESULTS (Compact):
   Resilience Score: 1.0
   Drift Detected: N/A

💾 COMPACT JSON IS OPTIMIZED FOR:
✅ AI/LLM validation (smaller token count)
✅ Automated testing pipelines
✅ CI/CD integration
✅ Quick drift detection
✅ Summary-based decision making
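
In the output above, `Drift Detected` prints `N/A` and neither the summary nor the drift-analysis section appears, which means those keys were absent from the exported file. Before relying on the JSON in an automated pipeline, it can help to report missing keys explicitly instead of silently falling back to defaults. A minimal sketch, in which `missing_keys`, the dotted-path list, and the `sample_json` stand-in are all hypothetical:

```python
def missing_keys(data, dotted_paths):
    """Return the dotted paths that cannot be resolved in a nested dict."""
    missing = []
    for path in dotted_paths:
        node = data
        for part in path.split("."):
            if isinstance(node, dict) and part in node:
                node = node[part]
            else:
                missing.append(path)
                break
    return missing

# Stand-in for the loaded compact JSON, mirroring what the output above shows.
sample_json = {
    "experiment_info": {"test_type": "resilience"},
    "test_results": {"primary_model": {"resilience_score": 1.0}},
}

expected = [
    "experiment_info.test_type",
    "test_results.primary_model.resilience_score",
    "test_results.primary_model.drift_detected",
    "test_results.primary_model.drift_analysis.data_drift_score",
]
print(missing_keys(sample_json, expected))
```

A pipeline could fail fast when this list is non-empty rather than treating an absent drift flag as "no drift".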


## 8️⃣ Resilience Analysis Summary

Extract key insights from resilience testing

In [8]:
print("\n📊 RESILIENCE TESTING SUMMARY\n" + "="*70)

# Access resilience metrics
if 'resilience_score' in primary:
    resilience_score = primary['resilience_score']
    print(f"\n📈 OVERALL RESILIENCE:")
    print(f"   Score: {resilience_score:.3f}")

    # Quality assessment
    print(f"\n🎯 QUALITY ASSESSMENT:")

    if resilience_score >= 0.90:
        print(f"   ✅ Resilience: EXCELLENT (score ≥ 0.90)")
        print(f"      Model is highly stable under distribution shifts")
    elif resilience_score >= 0.75:
        print(f"   🟡 Resilience: GOOD (score ≥ 0.75)")
        print(f"      Model shows acceptable stability")
    elif resilience_score >= 0.60:
        print(f"   ⚠️  Resilience: MODERATE (score ≥ 0.60)")
        print(f"      Monitor model performance closely")
    else:
        print(f"   🔴 Resilience: LOW (score < 0.60)")
        print(f"      Consider retraining or model improvement")

# Drift detection
if 'drift_detected' in primary:
    drift_detected = primary['drift_detected']
    print(f"\n🔍 DRIFT DETECTION:")

    if drift_detected:
        print(f"   ⚠️  Drift DETECTED")
        print(f"      Data distribution has shifted significantly")
        print(f"      Action: Review drift analysis and consider retraining")
    else:
        print(f"   ✅ No significant drift detected")
        print(f"      Model remains valid for current data distribution")

# Drift analysis details
if 'drift_analysis' in primary:
    drift_analysis = primary['drift_analysis']
    
    print(f"\n📊 DRIFT ANALYSIS:")

    data_drift = drift_analysis.get('data_drift_score', 0)
    concept_drift = drift_analysis.get('concept_drift_score', 0)

    print(f"   Data Drift Score: {data_drift:.3f}")
    if data_drift > 0.3:
        print(f"      ⚠️  High data drift - feature distributions changed")
    elif data_drift > 0.1:
        print(f"      🟡 Moderate data drift detected")
    else:
        print(f"      ✅ Low data drift")

    print(f"\n   Concept Drift Score: {concept_drift:.3f}")
    if concept_drift > 0.3:
        print(f"      ⚠️  High concept drift - target relationships changed")
    elif concept_drift > 0.1:
        print(f"      🟡 Moderate concept drift detected")
    else:
        print(f"      ✅ Low concept drift")

# Recommendations
print(f"\n💡 RECOMMENDATIONS:")

if primary.get('drift_detected', False):
    print(f"   • URGENT: Investigate drift sources")
    print(f"   • Consider retraining with recent data")
    print(f"   • Implement drift monitoring in production")

if primary.get('resilience_score', 1.0) < 0.75:
    print(f"   • Model shows sensitivity to distribution shifts")
    print(f"   • Collect more diverse training data")
    print(f"   • Use ensemble methods to improve stability")
    print(f"   • Apply data augmentation techniques")

if primary.get('resilience_score', 1.0) >= 0.90 and not primary.get('drift_detected', False):
    print(f"   ✅ Model is production-ready")
    print(f"   • Establish baseline metrics")
    print(f"   • Set up continuous monitoring")
    print(f"   • Define retraining triggers")


📊 RESILIENCE TESTING SUMMARY

📈 OVERALL RESILIENCE:
   Score: 1.000

🎯 QUALITY ASSESSMENT:
   ✅ Resilience: EXCELLENT (score ≥ 0.90)
      Model is highly stable under distribution shifts

💡 RECOMMENDATIONS:
   ✅ Model is production-ready
   • Establish baseline metrics
   • Set up continuous monitoring
   • Define retraining triggers
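
The thresholds used in the cell above can be collected into a small function so that the same rule is applied everywhere. Note that the cell treats a missing `drift_detected` key as "no drift", which is how this run reaches "production-ready" even though no drift status was reported. The sketch below keeps the same score thresholds but treats a missing flag as unknown; `resilience_tier` and `assess` are hypothetical helpers.

```python
def resilience_tier(score):
    """Map a resilience score to the tiers used in this notebook."""
    if score >= 0.90:
        return "EXCELLENT"
    if score >= 0.75:
        return "GOOD"
    if score >= 0.60:
        return "MODERATE"
    return "LOW"

def assess(primary):
    """Summarise a primary_model dict; a missing drift flag means 'unknown'."""
    score = primary.get("resilience_score")
    drift = primary.get("drift_detected")  # None when the key is absent
    tier = resilience_tier(score) if score is not None else "UNKNOWN"
    if drift is None:
        verdict = "drift status unknown: verify before deployment"
    elif drift:
        verdict = "drift detected: investigate and consider retraining"
    elif tier == "EXCELLENT":
        verdict = "production-ready"
    else:
        verdict = "no drift detected: monitor"
    return tier, verdict

print(assess({"resilience_score": 1.0}))
print(assess({"resilience_score": 1.0, "drift_detected": False}))
```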


## 9️⃣ Practical Decision Examples

How to use resilience metrics in real-world scenarios

In [9]:
print("\n💼 PRACTICAL USE CASES\n" + "="*70)

# Scenario 1: Medical Deployment
print(f"\n🏥 MEDICAL DEPLOYMENT - Hospital Transfer\n" + "-"*70)
resilience_score = 0.82
drift_detected = False
data_drift = 0.15

print(f"   Scenario: Deploying model from Hospital A to Hospital B")
print(f"   Resilience Score: {resilience_score:.2f}")
print(f"   Drift Detected: {drift_detected}")
print(f"   Data Drift: {data_drift:.2f}")

print(f"\n   📋 DECISION PROTOCOL:")
if resilience_score >= 0.80 and not drift_detected:
    print(f"   ✅ APPROVED for deployment")
    print(f"   Reason: High resilience, no significant drift")
    print(f"   Action: Deploy with standard monitoring")
    print(f"   Schedule: Weekly performance reviews")
elif resilience_score >= 0.70:
    print(f"   🟡 CONDITIONAL approval")
    print(f"   Reason: Acceptable resilience but requires caution")
    print(f"   Action: Deploy with enhanced monitoring")
    print(f"   Requirement: Daily performance checks for first month")
else:
    print(f"   ⚠️  REJECTED - Retrain required")
    print(f"   Reason: Insufficient resilience for critical application")
    print(f"   Action: Collect data from Hospital B and retrain")

# Scenario 2: Credit Scoring - Economic Change
print(f"\n\n💰 CREDIT SCORING - Economic Shift\n" + "-"*70)
resilience_score = 0.65
drift_detected = True
concept_drift = 0.35

print(f"   Scenario: Model trained pre-pandemic, evaluated post-pandemic")
print(f"   Resilience Score: {resilience_score:.2f}")
print(f"   Drift Detected: {drift_detected}")
print(f"   Concept Drift: {concept_drift:.2f}")

print(f"\n   📋 DECISION PROTOCOL:")
if drift_detected and concept_drift > 0.3:
    print(f"   🔴 IMMEDIATE ACTION REQUIRED")
    print(f"   Reason: High concept drift - target relationships changed")
    print(f"   Action: STOP using model for automated decisions")
    print(f"   Next Steps:")
    print(f"      1. Manual review for all applications")
    print(f"      2. Collect recent labeled data")
    print(f"      3. Retrain model with updated data")
    print(f"      4. Re-run resilience testing")

# Scenario 3: Fraud Detection - Evolving Tactics
print(f"\n\n🔒 FRAUD DETECTION - Evolving Threats\n" + "-"*70)
resilience_score = 0.88
drift_detected = True
data_drift = 0.22
concept_drift = 0.12

print(f"   Scenario: Quarterly resilience check on fraud detection model")
print(f"   Resilience Score: {resilience_score:.2f}")
print(f"   Drift Detected: {drift_detected}")
print(f"   Data Drift: {data_drift:.2f}")
print(f"   Concept Drift: {concept_drift:.2f}")

print(f"\n   📋 DECISION PROTOCOL:")
if resilience_score >= 0.85 and concept_drift < 0.2:
    print(f"   🟡 CONTINUE with ENHANCED MONITORING")
    print(f"   Reason: High resilience but moderate data drift")
    print(f"   Action: Keep model active with adjustments")
    print(f"   Adjustments:")
    print(f"      • Lower manual review threshold from 70% to 60% (more cases reviewed)")
    print(f"      • Schedule retraining in 1 month")
    print(f"      • Analyze drifted features for new fraud patterns")
    print(f"      • Implement feature importance monitoring")


💼 PRACTICAL USE CASES

🏥 MEDICAL DEPLOYMENT - Hospital Transfer
----------------------------------------------------------------------
   Scenario: Deploying model from Hospital A to Hospital B
   Resilience Score: 0.82
   Drift Detected: False
   Data Drift: 0.15

   📋 DECISION PROTOCOL:
   ✅ APPROVED for deployment
   Reason: High resilience, no significant drift
   Action: Deploy with standard monitoring
   Schedule: Weekly performance reviews


💰 CREDIT SCORING - Economic Shift
----------------------------------------------------------------------
   Scenario: Model trained pre-pandemic, evaluated post-pandemic
   Resilience Score: 0.65
   Drift Detected: True
   Concept Drift: 0.35

   📋 DECISION PROTOCOL:
   🔴 IMMEDIATE ACTION REQUIRED
   Reason: High concept drift - target relationships changed
   Action: STOP using model for automated decisions
   Next Steps:
      1. Manual review for all applications
      2. Collect recent labeled data
      3. Retrain m
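
Each scenario above hard-codes its inputs, and the second and third scenarios have no fallback branch, so other input values would print no decision at all. A sketch of how the hospital-transfer protocol could be written as a reusable function with every case covered follows. The thresholds are the ones from the first scenario, while `Decision` and `hospital_transfer_decision` are hypothetical names.

```python
from dataclasses import dataclass

@dataclass(frozen=True)
class Decision:
    status: str
    action: str

def hospital_transfer_decision(resilience_score, drift_detected):
    """Apply the medical-deployment thresholds from the first scenario."""
    if resilience_score >= 0.80 and not drift_detected:
        return Decision("APPROVED", "Deploy with standard monitoring")
    if resilience_score >= 0.70:
        return Decision("CONDITIONAL", "Deploy with enhanced monitoring")
    return Decision("REJECTED", "Collect data from Hospital B and retrain")

# Made-up inputs that exercise all three branches
for score, drift in [(0.82, False), (0.82, True), (0.65, False)]:
    d = hospital_transfer_decision(score, drift)
    print(f"score={score:.2f} drift={drift}: {d.status} - {d.action}")
```

Returning a value instead of printing keeps the rule testable and lets a monitoring job act on the result.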

## 🔟 Visualize Model Stability

The cell below uses simulated numbers, not the trained model's predictions, to illustrate how performance might vary as drift increases.

In [10]:
# Simulate performance on different data distributions
n_simulations = 20
performance_scores = []
drift_levels = []

np.random.seed(42)

for i in range(n_simulations):
    # Simulate varying drift levels
    drift_level = np.random.uniform(0, 0.5)
    drift_levels.append(drift_level)
    
    # Performance degrades with drift
    base_performance = 0.95
    performance = base_performance - (drift_level * 0.4) + np.random.normal(0, 0.02)
    performance_scores.append(max(0.5, min(1.0, performance)))

# Create visualization
fig, (ax1, ax2) = plt.subplots(1, 2, figsize=(16, 6))

# Plot 1: Performance vs Drift
scatter = ax1.scatter(drift_levels, performance_scores, 
                     c=performance_scores, s=100, 
                     cmap='RdYlGn', alpha=0.6, edgecolors='black')
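order = np.argsort(drift_levels)  # sort by drift so the dashed line runs left to right
ax1.plot(np.array(drift_levels)[order], np.array(performance_scores)[order],
         'b--', alpha=0.3, label='Trend')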
ax1.axhline(y=0.90, color='green', linestyle='--', linewidth=2, label='Good Performance')
ax1.axhline(y=0.75, color='orange', linestyle='--', linewidth=2, label='Acceptable')
ax1.axhline(y=0.60, color='red', linestyle='--', linewidth=2, label='Poor')
ax1.set_xlabel('Drift Level', fontsize=12, fontweight='bold')
ax1.set_ylabel('Model Performance', fontsize=12, fontweight='bold')
ax1.set_title('Performance Degradation vs Data Drift', fontsize=14, fontweight='bold')
ax1.legend(fontsize=10)
ax1.grid(True, alpha=0.3)
ax1.set_ylim(0.5, 1.05)
plt.colorbar(scatter, ax=ax1, label='Performance')

# Plot 2: Performance Distribution
ax2.hist(performance_scores, bins=15, color='skyblue', alpha=0.7, edgecolor='black')
ax2.axvline(x=np.mean(performance_scores), color='red', linestyle='--', 
           linewidth=2, label=f'Mean: {np.mean(performance_scores):.3f}')
ax2.axvline(x=np.median(performance_scores), color='green', linestyle='--', 
           linewidth=2, label=f'Median: {np.median(performance_scores):.3f}')
ax2.set_xlabel('Performance Score', fontsize=12, fontweight='bold')
ax2.set_ylabel('Frequency', fontsize=12, fontweight='bold')
ax2.set_title('Distribution of Performance Scores', fontsize=14, fontweight='bold')
ax2.legend(fontsize=10)
ax2.grid(True, alpha=0.3)

plt.tight_layout()
plt.show()

print(f"\n📊 Stability Statistics:")
print(f"   Mean Performance: {np.mean(performance_scores):.3f}")
print(f"   Std Performance: {np.std(performance_scores):.3f}")
print(f"   Min Performance: {np.min(performance_scores):.3f}")
print(f"   Max Performance: {np.max(performance_scores):.3f}")
print(f"   Performance Range: {np.max(performance_scores) - np.min(performance_scores):.3f}")

stability_score = 1 - np.std(performance_scores)
print(f"\n   Stability Score: {stability_score:.3f}")
if stability_score >= 0.95:
    print(f"   ✅ Highly stable model")
elif stability_score >= 0.90:
    print(f"   🟡 Moderately stable model")
else:
    print(f"   ⚠️  Model shows significant variability")


📊 Stability Statistics:
   Mean Performance: 0.877
   Std Performance: 0.048
   Min Performance: 0.793
   Max Performance: 0.949
   Performance Range: 0.157

   Stability Score: 0.952
   ✅ Highly stable model
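
The simulation builds performance as `0.95 - 0.4 * drift` plus Gaussian noise, so a least-squares line fitted to the points should recover a slope near -0.4. The sketch below repeats the simulation with the standard library's `random` module (its numbers will therefore differ from the NumPy run above) and fits the line by hand; `fit_line` is a hypothetical helper.

```python
import random

def fit_line(xs, ys):
    """Ordinary least-squares fit y = intercept + slope * x."""
    n = len(xs)
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    sxx = sum((x - mean_x) ** 2 for x in xs)
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    slope = sxy / sxx
    return mean_y - slope * mean_x, slope

rng = random.Random(42)
drift_levels, performance_scores = [], []
for _ in range(20):
    drift = rng.uniform(0, 0.5)
    perf = 0.95 - 0.4 * drift + rng.gauss(0, 0.02)
    drift_levels.append(drift)
    performance_scores.append(max(0.5, min(1.0, perf)))  # same clipping as the cell

intercept, slope = fit_line(drift_levels, performance_scores)
print(f"intercept={intercept:.3f}, slope={slope:.3f}")
```

The fitted slope estimates how much performance is lost per unit of drift, which complements the stability score: `1 - std` summarises the spread of the scores but not its cause.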




## 1️⃣1️⃣ Generate Static Report (Optional)

Generate a static HTML report with matplotlib charts (for PDF export)

In [11]:
# Generate static HTML report
static_html_path = os.path.join(output_dir, 'resilience_classification_static.html')

print("📝 Generating static HTML report...\n")

static_report_path = exp.save_html(
    test_type='resilience',
    file_path=static_html_path,
    model_name='RandomForest Classifier',
    report_type='static'  # Uses matplotlib instead of Plotly
)

print(f"\n✅ Static report generated!")
print(f"📂 Location: {static_report_path}")
print(f"\n💡 Static reports can be easily printed or converted to PDF")

📝 Generating static HTML report...

2025-11-12 22:40:27,340 - deepbridge.reports - INFO - Using static renderer for resilience report
2025-11-12 22:40:27,341 - deepbridge.reports - INFO - Generating static resilience report to: /home/guhaase/projetos/DeepBridge/examples/notebooks/03_validation_tests/outputs/resilience_classification/resilience_classification_static.html
2025-11-12 22:40:27,343 - deepbridge.reports - INFO - Found template at: /home/guhaase/projetos/DeepBridge/deepbridge/templates/report_types/resilience/static/index.html
2025-11-12 22:40:27,343 - deepbridge.reports - INFO - Found resilience template: /home/guhaase/projetos/DeepBridge/deepbridge/templates/report_types/resilience/static/index.html
2025-11-12 22:40:27,344 - deepbridge.reports - INFO - Using static template: /home/guhaase/projetos/DeepBridge/deepbridge/templates/report_types/resilience/static/index.html
2025-11-12 22:40:27,346 - deepbridge.reports - INFO - CSS compiled successfully using CSSManager for s

## 🎉 Summary - Files Generated

### 📂 Output Directory Structure:
```
outputs/resilience_classification/
├── resilience_classification_interactive.html  # Interactive report with Plotly
├── resilience_classification_static.html       # Static report with Matplotlib
├── resilience_classification_results.json      # Complete results in JSON
└── resilience_classification_results_COMPACT.json  # Compact JSON for AI
```

### ✅ What You Learned:

1. **Resilience Testing**
   - Detect data drift (covariate shift)
   - Detect concept drift (relationship changes)
   - Assess model stability under distribution shifts
   - Calculate resilience scores

2. **Report Generation**
   - Interactive HTML reports with Plotly
   - Static HTML reports with Matplotlib
   - Complete control over report type

3. **JSON Export**
   - Full experiment metadata
   - Drift detection metrics
   - Per-feature drift analysis
   - Resilience scores and recommendations
   - Easy integration with monitoring systems

4. **Practical Applications**
   - Hospital deployment decision protocols
   - Credit scoring with economic shifts
   - Fraud detection with evolving threats
   - Production monitoring strategies

### 💡 Best Practices:

- ✅ Run resilience tests before production deployment
- ✅ Establish baseline metrics in development
- ✅ Set up continuous drift monitoring
- ✅ Define clear retraining triggers
- ✅ Monitor both data drift and concept drift
- ✅ Export JSON for automated monitoring pipelines
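
A retraining trigger is easier to audit when it is written down as a rule. One possible rule, shown here only as an illustration, fires when the resilience score stays below a threshold for several consecutive checks, which avoids reacting to a single noisy measurement. The function name, the default threshold of 0.75, and the score history are all assumptions.

```python
def should_retrain(scores, threshold=0.75, consecutive=2):
    """True if the last `consecutive` resilience scores are all below `threshold`."""
    if len(scores) < consecutive:
        return False
    return all(s < threshold for s in scores[-consecutive:])

history = [0.91, 0.88, 0.74, 0.71]  # made-up periodic resilience scores

print(should_retrain(history))      # last two checks are both below 0.75
print(should_retrain(history[:3]))  # only the most recent check is below 0.75
```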

### 🚀 Next Steps:

- 📘 `02_complete_robustness.ipynb` - Model robustness testing
- 📘 `03_uncertainty_classification.ipynb` - Uncertainty quantification
- 📘 `../04_fairness/` - Fairness and bias analysis

<div style="background-color: #e8f5e9; padding: 15px; border-radius: 5px; border-left: 5px solid #4caf50;">
<b>üéØ Key Takeaway:</b> Resilience testing is essential for maintaining model reliability in production. Regular monitoring and retraining based on drift metrics prevent silent model degradation and ensure continued accuracy.
</div>