# Comprehensive Error Analysis - Heart Disease Risk Prediction

## Overview
This notebook analyzes the errors of our baseline models in nine steps:

1. **Error Distribution Analysis** - Understanding error patterns across different segments
2. **Bias-Variance Decomposition** - Analyzing model complexity and generalization
3. **Feature-Based Error Analysis** - Identifying features that lead to misclassifications  
4. **Learning Curve Analysis** - Assessing training efficiency and data requirements
5. **Cross-Validation Stability** - Evaluating model consistency across folds
6. **Decision Boundary Analysis** - Understanding model behavior in feature space
7. **Outlier Detection** - Identifying problematic samples
8. **Error Correlation Analysis** - Finding patterns in model disagreements
9. **Clinical Relevance Assessment** - Healthcare-specific error implications
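
Step 5 (cross-validation stability) could be sketched roughly as follows. This is only an illustration, not the pipeline used later in the notebook. It assumes synthetic data from `make_classification` as a stand-in for the heart disease dataset, a plain `LogisticRegression` as the baseline, and recall as the metric (chosen here because missed positive cases tend to be the costlier error in a clinical setting).

```python
import numpy as np
from sklearn.datasets import make_classification
from sklearn.linear_model import LogisticRegression
from sklearn.model_selection import StratifiedKFold, cross_val_score

# Synthetic stand-in for the real data (assumption: binary target, 8 features).
X, y = make_classification(
    n_samples=300, n_features=8, n_informative=4, random_state=0
)

# Stratified folds keep the class balance similar in every split.
cv = StratifiedKFold(n_splits=5, shuffle=True, random_state=0)
model = LogisticRegression(max_iter=1000)

# One recall score per fold; the spread across folds indicates stability.
fold_scores = cross_val_score(model, X, y, cv=cv, scoring="recall")

cv_summary = {
    "mean": float(fold_scores.mean()),
    "std": float(fold_scores.std(ddof=1)),
    "min": float(fold_scores.min()),
    "max": float(fold_scores.max()),
}
print(cv_summary)
```

A large `std` relative to `mean`, or a wide gap between `min` and `max`, would suggest that the model's performance depends heavily on which patients land in the training folds.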

## Goals
- Identify systematic patterns in model errors
- Understand which patient profiles are difficult to classify
- Assess model reliability for clinical deployment
- Guide feature engineering and model improvement strategies
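
To make the second goal concrete, one possible way to find hard-to-classify patient profiles is to compare error rates across segments. The helper below is a hypothetical sketch: the function name `error_rates_by_segment`, the age bands, and the toy labels are all assumptions made for illustration and are not taken from the dataset.

```python
import numpy as np
import pandas as pd


def error_rates_by_segment(y_true, y_pred, segment):
    """Summarize misclassifications per segment (hypothetical helper)."""
    df = pd.DataFrame(
        {
            "y_true": np.asarray(y_true),
            "y_pred": np.asarray(y_pred),
            "segment": np.asarray(segment),
        }
    )
    df["error"] = df["y_true"] != df["y_pred"]
    # False negatives: disease present but predicted absent.
    df["false_negative"] = (df["y_true"] == 1) & (df["y_pred"] == 0)
    # False positives: disease absent but predicted present.
    df["false_positive"] = (df["y_true"] == 0) & (df["y_pred"] == 1)
    return df.groupby("segment").agg(
        n=("error", "size"),
        error_rate=("error", "mean"),
        fn=("false_negative", "sum"),
        fp=("false_positive", "sum"),
    )


# Toy example with made-up labels and age bands.
y_true = [1, 1, 0, 0, 1, 0, 1, 0]
y_pred = [1, 0, 0, 1, 0, 0, 1, 0]
age_band = ["<50", "<50", "<50", "<50", "50+", "50+", "50+", "50+"]

report = error_rates_by_segment(y_true, y_pred, age_band)
print(report)
```

Separating false negatives from false positives matters here because the two error types carry different clinical consequences, so a segment with a modest overall error rate can still be a concern if most of its errors are missed cases.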

In [6]:
# Import required libraries
import numpy as np
import pandas as pd
import matplotlib.pyplot as plt
import seaborn as sns
import joblib
import warnings
# Note: this silences all warnings, including convergence and deprecation warnings
warnings.filterwarnings('ignore')

# Sklearn imports
from sklearn.model_selection import cross_val_score, learning_curve, validation_curve
from sklearn.metrics import (
    accuracy_score, precision_score, recall_score, f1_score,
    roc_auc_score, confusion_matrix, classification_report,
    precision_recall_curve, roc_curve,
)
from sklearn.preprocessing import StandardScaler
from sklearn.linear_model import LogisticRegression
from sklearn.ensemble import RandomForestClassifier
from sklearn.svm import SVC
import xgboost as xgb

# Deep learning imports
import torch
import torch.nn as nn
import torch.nn.functional as F

# Statistical analysis
from scipy import stats
from scipy.stats import chi2_contingency

# Plotting configuration
plt.style.use('default')
sns.set_palette("husl")
plt.rcParams['figure.figsize'] = (12, 8)
plt.rcParams['font.size'] = 12

print("✅ All libraries imported successfully")
print(f"PyTorch version: {torch.__version__}")
# Note: this reports only the Apple MPS backend; it does not check for CUDA
print(f"Device available: {torch.backends.mps.is_available() and torch.backends.mps.is_built()}")
print(f"XGBoost version: {xgb.__version__}")

✅ All libraries imported successfully
PyTorch version: 2.9.1
Device available: True
XGBoost version: 3.1.2
