# Binary Classification: Heart Disease Prediction using PyCaret

[![Open In Colab](https://colab.research.google.com/assets/colab-badge.svg)](https://colab.research.google.com/github/BalaAnbalagan/pycaret-automl-examples/blob/main/binary-classification/heart_disease_classification.ipynb)

## Problem Statement

Heart disease is one of the leading causes of death worldwide. Early detection and prediction can significantly improve patient outcomes and reduce healthcare costs. In this notebook, we will build a binary classification model to predict whether a patient has heart disease based on various medical attributes.

## Business Value

- **Healthcare Providers**: Identify high-risk patients for early intervention
- **Patients**: Early detection can save lives and reduce treatment costs
- **Insurance Companies**: Better risk assessment and premium calculation
- **Research**: Understanding key factors contributing to heart disease

## Dataset Information

**Source**: [Kaggle - Heart Disease Dataset](https://www.kaggle.com/datasets/yasserh/heart-disease-dataset)

**Columns (14 attributes: 13 features + 1 target)**:
- `age`: Age of the patient
- `sex`: Sex of the patient (1 = male, 0 = female)
- `cp`: Chest pain type (0-3)
- `trestbps`: Resting blood pressure (mm Hg)
- `chol`: Serum cholesterol (mg/dl)
- `fbs`: Fasting blood sugar > 120 mg/dl (1 = true, 0 = false)
- `restecg`: Resting electrocardiographic results (0-2)
- `thalach`: Maximum heart rate achieved
- `exang`: Exercise induced angina (1 = yes, 0 = no)
- `oldpeak`: ST depression induced by exercise
- `slope`: Slope of peak exercise ST segment (0-2)
- `ca`: Number of major vessels colored by fluoroscopy (0-3)
- `thal`: Thalassemia (0-3)
- `target`: Heart disease (1 = disease, 0 = no disease) **[Target Variable]**

## What You Will Learn

1. How to set up PyCaret for binary classification
2. Automated model comparison across multiple algorithms
3. Hyperparameter tuning for optimal performance
4. Creating ensemble models (blending and stacking)
5. Model calibration for better probability estimates
6. Threshold optimization for imbalanced datasets
7. Model interpretation and feature importance
8. Saving and loading models for deployment

---

## Cell 1: Install and Import Required Libraries (Google Colab Compatible)

### What
We're installing PyCaret with compatible dependencies for Google Colab and importing all necessary Python libraries for our analysis.

### Why
Google Colab comes with pre-installed packages that can conflict with PyCaret's dependencies. This cell ensures compatibility by installing packages in the correct order to avoid runtime crashes.

### Technical Details
- Detect if running in Google Colab
- Install compatible versions of base packages (numpy, pandas, scipy, scikit-learn)
- Install PyCaret on top of the pinned base packages, so pip keeps them as long as they satisfy PyCaret's requirements
- Avoid version conflicts that cause runtime crashes
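After the restart, it can help to confirm that the versions actually loaded fall inside the ranges pinned in this cell's `pip install` command. The following is a minimal plain-Python sketch. `check_pins` and `parse_version` are hypothetical helpers, and the simplified parser compares only the leading numeric components, so a pre-release such as `2.0.0rc1` is treated like `2.0.0`.

```python
from importlib import metadata

# Bounds copied from the pip command in this cell: (inclusive lower, exclusive upper)
PINS = {
    "numpy": ("1.23.0", "2.0.0"),
    "pandas": ("2.0.0", "2.3.0"),
    "scipy": ("1.10.0", "1.14.0"),
    "scikit-learn": ("1.3.0", "1.6.0"),
    "matplotlib": ("3.7.0", "3.9.0"),
}


def parse_version(text):
    """Keep the leading numeric release components, padded to three: '1.26.4' -> (1, 26, 4)."""
    parts = []
    for piece in text.split("."):
        digits = ""
        for ch in piece:
            if not ch.isdigit():
                break
            digits += ch
        if not digits:
            break
        parts.append(int(digits))
        if len(digits) != len(piece):  # suffix such as 'rc1' or '+cu121': stop here
            break
    while len(parts) < 3:
        parts.append(0)
    return tuple(parts)


def check_pins(pins):
    """Return {package: (installed_version or None, within_bounds)}."""
    report = {}
    for name, (low, high) in pins.items():
        try:
            installed = metadata.version(name)
        except metadata.PackageNotFoundError:
            report[name] = (None, False)
            continue
        ok = parse_version(low) <= parse_version(installed) < parse_version(high)
        report[name] = (installed, ok)
    return report


if __name__ == "__main__":
    for pkg, (version, ok) in check_pins(PINS).items():
        print(f"{pkg:14s} {version or 'not installed':14s} {'OK' if ok else 'OUT OF RANGE'}")
```

If a package reports `OUT OF RANGE` after the restart, some later install step probably replaced the pinned version.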

### Expected Output
Installation progress messages and a reminder to restart the runtime. After the restart, the notebook should run without dependency conflicts.

### IMPORTANT
⚠️ After running this cell, you MUST restart the runtime:
- Click: **Runtime → Restart runtime** (or Ctrl+M .)
- After restart, run this cell again: it detects that PyCaret is already installed, skips the installation, and imports the libraries the remaining cells need. Then run the other cells normally.

In [None]:
# ============================================================
# INSTALLATION CELL - Google Colab Compatible
# ============================================================
# This cell fixes dependency conflicts that cause runtime crashes

import sys
import importlib.util

# Check if running in Colab, and whether PyCaret is already installed
IN_COLAB = 'google.colab' in sys.modules
PYCARET_INSTALLED = importlib.util.find_spec('pycaret') is not None

if PYCARET_INSTALLED:
    # Typical case after a runtime restart: skip installation and go straight to the imports
    print("✅ PyCaret is already installed; skipping installation.")

elif IN_COLAB:
    print("=" * 60)
    print("🔧 Google Colab Detected")
    print("=" * 60)
    print("📦 Installing PyCaret with compatible dependencies...")
    print("⏳ This will take 2-3 minutes, please be patient...")

    # Upgrade pip first
    !pip install -q --upgrade pip

    # Install compatible base packages FIRST (prevents conflicts).
    # Specifiers are quoted so the shell does not treat < and > as redirections.
    print("Step 1/3: Installing base packages with compatible versions...")
    !pip install -q --upgrade \
        "numpy>=1.23.0,<2.0.0" \
        "pandas>=2.0.0,<2.3.0" \
        "scipy>=1.10.0,<1.14.0" \
        "scikit-learn>=1.3.0,<1.6.0" \
        "matplotlib>=3.7.0,<3.9.0"

    # Install PyCaret (keeps the base packages above if they satisfy its requirements)
    print("Step 2/3: Installing PyCaret...")
    !pip install -q pycaret

    # Install additional ML packages
    print("Step 3/3: Installing additional ML packages...")
    !pip install -q \
        category-encoders \
        lightgbm \
        xgboost \
        catboost \
        optuna \
        plotly \
        kaleido

    print("\n" + "=" * 60)
    print("✅ Installation Complete!")
    print("=" * 60)
    print("⚠️  CRITICAL: You MUST restart the runtime now!")
    print("   👉 Click: Runtime → Restart runtime (or Ctrl+M .)")
    print("\n🔄 After restart:")
    print("   1. Re-run this cell (installation is skipped, libraries are imported)")
    print("   2. Run all other cells normally")
    print("=" * 60)

else:
    print("=" * 60)
    print("📍 Local Environment Detected")
    print("=" * 60)
    print("Installing standard PyCaret with full dependencies...")
    # Quoted so shells such as zsh do not expand the square brackets
    !pip install "pycaret[full]"
    print("✅ Installation complete!")
    print("=" * 60)

# Import libraries after installation
import pandas as pd
import numpy as np
import matplotlib.pyplot as plt
import seaborn as sns
import warnings
warnings.filterwarnings('ignore')

# Set visualization style
sns.set_style('whitegrid')
plt.rcParams['figure.figsize'] = (10, 6)

print("📚 Libraries imported successfully!")
print(f"   - Pandas version: {pd.__version__}")
print(f"   - NumPy version: {np.__version__}")

---

## Cell 2: Load the Heart Disease Dataset

### What
We're loading the heart disease dataset from a CSV file into a pandas DataFrame.

### Why
The dataset needs to be loaded into memory before we can perform any analysis or machine learning operations on it.

### Technical Details
- **Option 1**: If you've downloaded the dataset from Kaggle, place it in the same directory and load it
- **Option 2**: Load directly from a URL (if available)
- **Option 3**: Use Kaggle API to download programmatically

The dataset contains 1,025 rows and 14 columns (13 features + 1 target variable).

### Expected Output
Success message confirming dataset loaded, along with the shape (number of rows and columns).

In [None]:
# Method 1: Load from local file (if you've downloaded from Kaggle)
# df = pd.read_csv('heart.csv')

# Method 2: Load from URL (using a public repository)
url = 'https://raw.githubusercontent.com/rashida048/Datasets/master/heart.csv'
df = pd.read_csv(url)

# Display basic information
print("Dataset loaded successfully!")
print(f"Shape: {df.shape[0]} rows, {df.shape[1]} columns")
print("\nFirst 5 rows:")
df.head()

---

## Cell 3: Initial Data Exploration

### What
We're examining the structure, data types, and basic statistics of our dataset.

### Why
Understanding the data is crucial before building models. We need to:
- Check data types (numerical vs categorical)
- Identify missing values
- Understand the distribution of features
- Detect potential outliers

### Technical Details
- `df.info()`: Shows data types, non-null counts, memory usage
- `df.describe()`: Statistical summary of numerical columns (mean, std, min, max, quartiles)
- `df.isnull().sum()`: Counts missing values per column

### Expected Output
- Data types for each column
- Summary statistics (means, standard deviations, etc.)
- Missing value counts (should be 0 for this clean dataset)

In [None]:
# Display data types and info
print("=" * 50)
print("DATASET INFORMATION")
print("=" * 50)
df.info()

print("\n" + "=" * 50)
print("STATISTICAL SUMMARY")
print("=" * 50)
display(df.describe())

print("\n" + "=" * 50)
print("MISSING VALUES CHECK")
print("=" * 50)
missing_values = df.isnull().sum()
print(missing_values)
print(f"\nTotal missing values: {missing_values.sum()}")

---

## Cell 4: Target Variable Distribution

### What
We're analyzing the distribution of our target variable (presence or absence of heart disease) using both numerical counts and visualizations.

### Why
Understanding the target variable distribution is critical because:
- It tells us if we have a **class imbalance** problem
- Helps us choose appropriate evaluation metrics
- Informs us whether we need to use techniques like SMOTE or class weighting

### Technical Details
- `value_counts()`: Counts occurrences of each class
- `normalize=True`: Shows proportions instead of counts
- `sns.countplot()`: Creates a bar chart showing class distribution
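The counts, proportions, and balance ratio computed in this cell can be sketched without pandas. Doing so makes the ratio's definition explicit: the minority-class count divided by the majority-class count. The 0.8 and 0.5 cut-offs mirror the ones used in this notebook. They are rules of thumb, not standard thresholds, and the helper names are hypothetical.

```python
from collections import Counter


def class_balance(labels):
    """Return per-class counts, percentages, and the minority/majority ratio."""
    counts = Counter(labels)
    total = sum(counts.values())
    percentages = {cls: 100.0 * n / total for cls, n in counts.items()}
    ratio = min(counts.values()) / max(counts.values())
    return dict(counts), percentages, ratio


def describe_balance(ratio):
    # Same cut-offs as the notebook's class-balance check
    if ratio >= 0.8:
        return "well-balanced"
    if ratio >= 0.5:
        return "moderate imbalance"
    return "significant imbalance"


# Hypothetical labels: 5 patients with disease, 3 without
counts, pct, ratio = class_balance([1, 1, 1, 0, 0, 1, 0, 1])
print(counts, pct, round(ratio, 2), describe_balance(ratio))
```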

### Expected Output
- Count and percentage of patients with/without heart disease
- Bar chart visualization showing the class distribution
- Ideally, we want classes to be relatively balanced (close to 50-50)

In [None]:
print("=" * 50)
print("TARGET VARIABLE DISTRIBUTION")
print("=" * 50)

# Count of each class
print("\nValue Counts:")
print(df['target'].value_counts())

print("\nPercentage Distribution:")
print(df['target'].value_counts(normalize=True) * 100)

# Visualization
fig, (ax1, ax2) = plt.subplots(1, 2, figsize=(14, 5))

# Count plot
sns.countplot(data=df, x='target', palette='Set2', ax=ax1)
ax1.set_title('Distribution of Target Variable', fontsize=14, fontweight='bold')
ax1.set_xlabel('Target (0 = No Disease, 1 = Disease)', fontsize=12)
ax1.set_ylabel('Count', fontsize=12)

# Add count labels on bars
for container in ax1.containers:
    ax1.bar_label(container)

# Pie chart (sort_index keeps the order 0, 1 so the labels below line up with the counts)
target_counts = df['target'].value_counts().sort_index()
ax2.pie(target_counts, labels=['No Disease', 'Disease'], autopct='%1.1f%%', 
        colors=['#66c2a5', '#fc8d62'], startangle=90)
ax2.set_title('Target Variable Proportion', fontsize=14, fontweight='bold')

plt.tight_layout()
plt.show()

# Calculate class balance ratio
balance_ratio = target_counts.min() / target_counts.max()
print(f"\nClass Balance Ratio: {balance_ratio:.2f}")
if balance_ratio >= 0.8:
    print("✓ Dataset is well-balanced")
elif balance_ratio >= 0.5:
    print("⚠ Dataset has moderate imbalance")
else:
    print("✗ Dataset has significant class imbalance")

---

## Cell 5: Exploratory Data Analysis - Feature Distributions

### What
We're visualizing the distribution of numerical features to understand their patterns and potential relationships with the target variable.

### Why
Exploring feature distributions helps us:
- Identify skewed distributions that might need transformation
- Spot outliers that could affect model performance
- Understand value ranges for different features
- See if features discriminate well between classes

### Technical Details
- We'll create histograms for continuous features
- Use different colors for different target classes
- This helps visualize which features might be good predictors

### Expected Output
- Multiple subplots showing distributions of key features
- Different colors representing patients with/without heart disease
- Features with clear separation are likely to be good predictors

In [None]:
print("=" * 50)
print("FEATURE DISTRIBUTIONS BY TARGET")
print("=" * 50)

# Select numerical features for visualization
numerical_features = ['age', 'trestbps', 'chol', 'thalach', 'oldpeak']

fig, axes = plt.subplots(2, 3, figsize=(16, 10))
axes = axes.ravel()

for idx, feature in enumerate(numerical_features):
    # Create histogram with different colors for each target class
    df[df['target'] == 0][feature].hist(ax=axes[idx], alpha=0.7, label='No Disease', 
                                         color='#66c2a5', bins=20)
    df[df['target'] == 1][feature].hist(ax=axes[idx], alpha=0.7, label='Disease', 
                                         color='#fc8d62', bins=20)
    
    axes[idx].set_title(f'Distribution of {feature}', fontsize=12, fontweight='bold')
    axes[idx].set_xlabel(feature, fontsize=10)
    axes[idx].set_ylabel('Frequency', fontsize=10)
    axes[idx].legend()
    axes[idx].grid(alpha=0.3)

# Remove extra subplot
fig.delaxes(axes[5])

plt.tight_layout()
plt.show()

print("\nKey Observations:")
print("- Look for features where the distributions differ significantly between classes")
print("- These features will likely be important predictors in our model")

---

## Cell 6: Correlation Analysis

### What
We're creating a correlation matrix heatmap to understand relationships between different features and the target variable.

### Why
Correlation analysis helps us:
- Identify features strongly correlated with the target (good predictors)
- Detect multicollinearity (high correlation between features)
- Understand feature interactions
- Potentially eliminate redundant features

### Technical Details
- `df.corr()`: Calculates Pearson correlation coefficients (-1 to +1)
- `sns.heatmap()`: Visualizes the correlation matrix
- **Positive correlation**: Variables move together (closer to +1)
- **Negative correlation**: Variables move in opposite directions (closer to -1)
- **No correlation**: Close to 0
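Pearson's r is the covariance of two columns divided by the product of their standard deviations. The sketch below computes it in plain Python and flags feature pairs whose magnitude exceeds 0.7, the cut-off used in this cell. That cut-off is a common rule of thumb for multicollinearity, not a fixed standard. `strong_pairs` and the toy columns are hypothetical.

```python
import math
from itertools import combinations


def pearson(x, y):
    """Pearson correlation: covariance divided by the product of standard deviations."""
    n = len(x)
    mx, my = sum(x) / n, sum(y) / n
    cov = sum((a - mx) * (b - my) for a, b in zip(x, y))
    sx = math.sqrt(sum((a - mx) ** 2 for a in x))
    sy = math.sqrt(sum((b - my) ** 2 for b in y))
    if sx == 0 or sy == 0:
        return float("nan")  # undefined for a constant column
    return cov / (sx * sy)


def strong_pairs(columns, threshold=0.7):
    """List feature pairs whose |r| exceeds the threshold (possible multicollinearity)."""
    pairs = []
    for (name_a, a), (name_b, b) in combinations(columns.items(), 2):
        r = pearson(a, b)
        if abs(r) > threshold:
            pairs.append((name_a, name_b, round(r, 3)))
    return pairs


toy = {
    "a": [1.0, 2.0, 3.0, 4.0, 5.0],
    "b": [2.0, 4.1, 5.9, 8.2, 9.9],  # nearly 2 * a
    "c": [2.0, 5.0, 1.0, 4.0, 3.0],  # weakly related to both
}
print(strong_pairs(toy))
```

Coded categorical columns such as `cp` or `thal` are treated as plain numbers by Pearson's r, so their coefficients should be read with care.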

### Expected Output
- Heatmap showing correlations between all features
- List of features most correlated with the target variable
- Strong correlations (|r| > 0.7) standing out as the darkest red (positive) or blue (negative) cells

In [None]:
print("=" * 50)
print("CORRELATION ANALYSIS")
print("=" * 50)

# Calculate correlation matrix
corr_matrix = df.corr()

# Create heatmap
plt.figure(figsize=(14, 10))
sns.heatmap(corr_matrix, annot=True, fmt='.2f', cmap='coolwarm', 
            center=0, square=True, linewidths=1, cbar_kws={"shrink": 0.8})
plt.title('Feature Correlation Matrix', fontsize=16, fontweight='bold', pad=20)
plt.tight_layout()
plt.show()

# Features most correlated with target
print("\nFeatures Most Correlated with Target:")
print("=" * 50)
target_corr = corr_matrix['target'].sort_values(ascending=False)
print(target_corr)

print("\n" + "=" * 50)
print("TOP POSITIVE CORRELATIONS WITH TARGET")
print("=" * 50)
positive_corr = target_corr[target_corr > 0].drop('target')
for feature, corr in positive_corr.items():
    print(f"{feature:12s}: {corr:+.3f}")

print("\n" + "=" * 50)
print("TOP NEGATIVE CORRELATIONS WITH TARGET")
print("=" * 50)
negative_corr = target_corr[target_corr < 0]
for feature, corr in negative_corr.items():
    print(f"{feature:12s}: {corr:+.3f}")

---

## Cell 7: PyCaret Setup - Initialize Classification Environment

### What
We're initializing PyCaret's classification environment with our dataset and configuration parameters.

### Why
The `setup()` function is the foundation of PyCaret's AutoML workflow. It:
- Infers data types automatically
- Handles missing values
- Encodes categorical features and applies the optional steps you enable (e.g., scaling and power transformations)
- Splits data into train and test sets
- Prepares the data preprocessing pipeline

### Technical Details
**Key Parameters**:
- `data`: Our DataFrame
- `target`: The column we want to predict ('target')
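- `session_id`: Random seed for reproducibility (same results every time)
- `train_size`: Proportion of data for training (0.8 = 80% train, 20% test)
- `normalize`: Scale numerical features to similar ranges (z-score by default)
- `transformation`: Apply a power transform (Yeo-Johnson by default) to make features more Gaussian-like
- `fix_imbalance`: Resample the training data (SMOTE by default) to balance classes; not enabled here, since it is only needed if Cell 4 shows significant imbalance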
- `fold`: Number of cross-validation folds (10-fold CV)
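For classification, PyCaret's `setup()` defaults to stratified k-fold (`fold_strategy='stratifiedkfold'`), so each validation fold keeps roughly the class mix of the training set. The following is a plain-Python sketch of that idea, not PyCaret's or scikit-learn's actual implementation. The round-robin dealing and the `stratified_kfold` name are illustrative choices.

```python
import random
from collections import defaultdict


def stratified_kfold(labels, k=10, seed=42):
    """Yield (train_idx, valid_idx) pairs; each class is spread evenly across folds."""
    rng = random.Random(seed)
    by_class = defaultdict(list)
    for idx, label in enumerate(labels):
        by_class[label].append(idx)

    folds = [[] for _ in range(k)]
    offset = 0
    for indices in by_class.values():
        rng.shuffle(indices)
        # Deal indices round-robin so every fold gets a similar share of each class;
        # the offset continues where the previous class stopped to keep fold sizes even
        for i, idx in enumerate(indices):
            folds[(offset + i) % k].append(idx)
        offset += len(indices)

    all_idx = set(range(len(labels)))
    for valid in folds:
        yield sorted(all_idx - set(valid)), sorted(valid)


# Hypothetical labels: 50 negatives, 30 positives
labels = [0] * 50 + [1] * 30
for train, valid in stratified_kfold(labels, k=10):
    print(len(train), len(valid), sum(labels[i] for i in valid))
```

Every validation fold here holds 5 negatives and 3 positives, mirroring the overall 50:30 split.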

### Expected Output
- Summary table showing data types, transformations applied
- Train/test split information
- Preprocessing steps that will be applied
- Confirmation that setup is complete

In [None]:
# Import PyCaret classification module
from pycaret.classification import *

print("=" * 50)
print("PYCARET SETUP - CLASSIFICATION")
print("=" * 50)

# Initialize PyCaret setup
clf_setup = setup(
    data=df,
    target='target',
    session_id=42,
    train_size=0.8,
    normalize=True,
    transformation=True,
    fold=10,
    verbose=True
)

print("\n✓ PyCaret setup completed successfully!")
print("\nData preprocessing pipeline has been created.")
print("Ready for model training and comparison.")

---

## Cell 8: Compare Multiple Models - AutoML Magic!

### What
We're using PyCaret's `compare_models()` function to automatically train and evaluate multiple classification algorithms.

### Why
This is the core of AutoML! Instead of manually training each algorithm one by one, PyCaret:
- Trains 15-20 different algorithms automatically
- Uses 10-fold cross-validation for each
- Evaluates them on multiple metrics (Accuracy, AUC, Recall, Precision, F1)
- Ranks them by performance
- Shows us the best models in seconds/minutes

### Technical Details
**Algorithms Compared**:
- Logistic Regression
- K-Nearest Neighbors
- Naive Bayes
- Decision Tree
- Random Forest
- Extra Trees
- Gradient Boosting (GBM, XGBoost, LightGBM, CatBoost)
- Support Vector Machine
- AdaBoost
- And more!

**Parameters**:
- `sort`: Metric to rank models by (default: 'Accuracy')
- `n_select`: Number of top models to return (we'll get top 5)
- `fold`: Cross-validation folds (already set in setup)

### Expected Output
- Table showing all models ranked by performance
- Metrics: Accuracy, AUC, Recall, Precision, F1, Kappa, MCC
- Training time for each model
- Top 5 models will be stored for further analysis
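AUC, the metric used to sort models in this cell, can be read as the probability that a randomly chosen patient with heart disease receives a higher score than a randomly chosen patient without it. The sketch below computes that pairwise definition directly. It then ranks hypothetical model results the way `n_select` keeps the top entries. The scores are made up for illustration.

```python
def roc_auc(y_true, scores):
    """AUC = probability that a random positive is scored above a random negative.

    Ties count as half, which matches the area under the empirical ROC curve.
    Pairwise O(P * N) version for clarity, not speed.
    """
    pos = [s for s, y in zip(scores, y_true) if y == 1]
    neg = [s for s, y in zip(scores, y_true) if y == 0]
    if not pos or not neg:
        raise ValueError("AUC needs at least one positive and one negative")
    wins = 0.0
    for p in pos:
        for n in neg:
            if p > n:
                wins += 1.0
            elif p == n:
                wins += 0.5
    return wins / (len(pos) * len(neg))


def top_n_by(results, metric, n):
    """Rank candidate models by a metric (higher is better) and keep the best n."""
    return sorted(results, key=lambda r: r[metric], reverse=True)[:n]


print(roc_auc([0, 0, 1, 1], [0.1, 0.4, 0.35, 0.8]))
made_up = [{"model": "lr", "AUC": 0.91}, {"model": "rf", "AUC": 0.93}, {"model": "knn", "AUC": 0.85}]
print([r["model"] for r in top_n_by(made_up, "AUC", 2)])
```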

In [None]:
print("=" * 50)
print("COMPARING MULTIPLE MODELS")
print("=" * 50)
print("\nThis will train and evaluate 15+ algorithms...")
print("Please wait, this may take a few minutes.\n")

# Compare all models and select top 5
top_models = compare_models(n_select=5, sort='AUC')

print("\n" + "=" * 50)
print("MODEL COMPARISON COMPLETE!")
print("=" * 50)
print("\nTop 5 models have been identified and stored.")
print("\nKey Metrics Explained:")
print("- Accuracy: Overall correctness of predictions")
print("- AUC: Area Under ROC Curve (ability to discriminate between classes)")
print("- Recall: Ability to find all positive cases (sensitivity)")
print("- Precision: Accuracy of positive predictions")
print("- F1: Harmonic mean of Precision and Recall")

---

## Cell 9: Select and Analyze the Best Model

### What
We're selecting the top-performing model from our comparison and examining its detailed performance.

### Why
After comparing many models, we need to:
- Select the best one for further optimization
- Understand its strengths and weaknesses
- Examine detailed metrics beyond just accuracy

### Technical Details
- `top_models[0]`: First model in our list of top 5 (best performer)
- We'll print the model details and architecture
- This model will be used for tuning and ensemble creation

### Expected Output
- Model name and algorithm type
- Model parameters and configuration
- This is our baseline model before optimization

In [None]:
print("=" * 50)
print("BEST MODEL ANALYSIS")
print("=" * 50)

# Select the best model (first in the list)
best_model = top_models[0]

print(f"\nBest Model: {type(best_model).__name__}")
print("\nModel Details:")
print(best_model)

print("\n" + "=" * 50)
print("This model will be used for:")
print("  1. Hyperparameter tuning")
print("  2. Creating ensemble models")
print("  3. Final predictions")
print("=" * 50)

---

## Cell 10: Hyperparameter Tuning - Optimize the Best Model

### What
We're using PyCaret's `tune_model()` to automatically find the optimal hyperparameters for our best model.

### Why
Every ML algorithm has **hyperparameters** (settings that control how the algorithm learns). For example:
- Random Forest: number of trees, max depth, min samples
- Gradient Boosting: learning rate, number of estimators
- SVM: kernel type, C parameter, gamma

Finding the right combination can significantly improve performance!

### Technical Details
**How it works**:
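- By default, PyCaret uses scikit-learn's random search (**RandomizedSearchCV**); grid search, or other libraries such as Optuna, can be selected with the `search_library` and `search_algorithm` parameters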
- Tests different combinations of hyperparameters
- Uses cross-validation to evaluate each combination
- Selects the combination with best performance

**Parameters**:
- `estimator`: The model to tune (our best model)
- `optimize`: Metric to optimize (AUC is good for classification)
- `n_iter`: Number of parameter combinations to try (50 is a good balance)
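Random search, one of the strategies listed above, samples parameter combinations and keeps the one with the best cross-validated score. The sketch below is simplified. It samples with replacement and takes a user-supplied scoring function, whereas scikit-learn's `RandomizedSearchCV` handles sampling and cross-validation itself. The grid and the `fake_cv_auc` objective are hypothetical.

```python
import random


def random_search(param_grid, score_fn, n_iter=50, seed=42):
    """Sample n_iter random combinations from param_grid and keep the best-scoring one.

    score_fn stands in for 'mean cross-validated AUC for these parameters'.
    """
    rng = random.Random(seed)
    names = sorted(param_grid)
    best_params, best_score = None, float("-inf")
    for _ in range(n_iter):
        params = {name: rng.choice(param_grid[name]) for name in names}
        score = score_fn(params)
        if score > best_score:
            best_params, best_score = params, score
    return best_params, best_score


# Hypothetical grid and objective, for illustration only
grid = {"max_depth": [2, 4, 6, 8], "n_estimators": [50, 100, 200]}


def fake_cv_auc(p):
    # Toy objective peaking at max_depth=4, n_estimators=200
    return 0.9 - 0.01 * abs(p["max_depth"] - 4) + 0.0001 * p["n_estimators"]


print(random_search(grid, fake_cv_auc, n_iter=20))
```

With more iterations the search is more likely to hit the best combination. That is the trade-off `n_iter` controls.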

### Expected Output
- Improved performance metrics compared to the base model
- The tuned model with optimal hyperparameters
- Any improvement over the base model varies by dataset and can be marginal

In [None]:
print("=" * 50)
print("HYPERPARAMETER TUNING")
print("=" * 50)
print("\nSearching for optimal hyperparameters...")
print("This may take several minutes.\n")

# Tune the best model
tuned_model = tune_model(
    estimator=best_model,
    optimize='AUC',
    n_iter=50
)

print("\n" + "=" * 50)
print("TUNING COMPLETE!")
print("=" * 50)
print("\nOptimal hyperparameters have been found.")
print("\nTuned Model Details:")
print(tuned_model)

---

## Cell 11: Model Evaluation Plots

### What
We're creating comprehensive visualizations to evaluate our tuned model's performance from different angles.

### Why
Different plots reveal different aspects of model performance:
- **AUC-ROC Curve**: Trade-off between true positive rate and false positive rate
- **Confusion Matrix**: Actual vs predicted classifications
- **Feature Importance**: Which features contribute most to predictions
- **Precision-Recall Curve**: Trade-off between precision and recall
- **Learning Curve**: Model performance vs training set size

### Technical Details
PyCaret's `plot_model()` function supports 20+ plot types:
- `'auc'`: ROC-AUC curve
- `'confusion_matrix'`: Confusion matrix
- `'feature'`: Feature importance (only for models that expose `feature_importances_` or `coef_`)
- `'pr'`: Precision-Recall curve
- `'learning'`: Learning curve
- `'calibration'`: Calibration plot
- And many more!

### Expected Output
- Multiple plots showing model performance from different perspectives
- Visual insights into model strengths and weaknesses

In [None]:
print("=" * 50)
print("MODEL EVALUATION VISUALIZATIONS")
print("=" * 50)

# AUC-ROC Curve
print("\n1. AUC-ROC Curve")
print("   Shows trade-off between True Positive Rate and False Positive Rate")
plot_model(tuned_model, plot='auc')

# Confusion Matrix
print("\n2. Confusion Matrix")
print("   Shows correct and incorrect predictions")
plot_model(tuned_model, plot='confusion_matrix')

# Feature Importance
print("\n3. Feature Importance")
print("   Shows which features contribute most to predictions")
plot_model(tuned_model, plot='feature')

# Precision-Recall Curve
print("\n4. Precision-Recall Curve")
print("   Shows trade-off between Precision and Recall")
plot_model(tuned_model, plot='pr')

print("\n" + "=" * 50)
print("All evaluation plots generated successfully!")
print("=" * 50)

---

## Cell 12: Create Blended Model (Ensemble Method 1)

### What
We're creating a **blended model** that combines predictions from our top 3 models using averaging.

### Why
**Ensemble learning** combines multiple models to achieve better performance than any single model. Think of it like:
- Getting a second (and third) medical opinion
- Having a committee make decisions instead of one person

**Blending** works by:
- Taking predictions from multiple models
- Averaging them (for probabilities) or voting (for classes)
- Often more robust and accurate than individual models

### Technical Details
**Parameters**:
- `estimator_list`: List of models to blend (we'll use top 3)
- `method`: How to combine predictions
  - `'soft'`: Average predicted probabilities (requires every model to support `predict_proba`; PyCaret's default, `'auto'`, falls back to hard voting when one does not)
  - `'hard'`: Majority voting on predicted classes
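The difference between the two `method` options shows up when models disagree with different levels of confidence. The following plain-Python sketch covers a single patient, using made-up probabilities from three models. The tie rule in `hard_vote` is an arbitrary choice for this sketch.

```python
def hard_vote(probas, threshold=0.5):
    """Majority vote over each model's predicted class (ties go to class 1 here)."""
    votes = [1 if p >= threshold else 0 for p in probas]
    return 1 if sum(votes) * 2 >= len(votes) else 0


def soft_vote(probas, threshold=0.5):
    """Average the models' P(class 1), then threshold the average."""
    avg = sum(probas) / len(probas)
    return (1 if avg >= threshold else 0), avg


# P(disease) from three hypothetical models for one patient
probas = [0.90, 0.40, 0.45]
print(hard_vote(probas))   # two of three models say "no disease"
print(soft_vote(probas))   # one confident model pulls the average above 0.5
```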

### Expected Output
- Performance metrics for the blended model
- Often, though not always, an improvement over the individual models
- More stable predictions with reduced variance

In [None]:
print("=" * 50)
print("CREATING BLENDED MODEL")
print("=" * 50)
print("\nCombining top 3 models using soft voting (averaging probabilities)...\n")

# Create blended model from top 3 models
blended_model = blend_models(
    estimator_list=top_models[:3],
    method='soft'
)

print("\n" + "=" * 50)
print("BLENDED MODEL CREATED!")
print("=" * 50)
print("\nHow blending works:")
print("1. Each of the 3 models makes a prediction")
print("2. Their probability predictions are averaged")
print("3. Final prediction is based on the average probability")
print("\nBenefit: More robust predictions, less sensitive to any single model's errors")

---

## Cell 13: Create Stacked Model (Ensemble Method 2)

### What
We're creating a **stacked model** that uses a meta-learner to combine predictions from multiple base models.

### Why
**Stacking** is more sophisticated than blending:
- Base models make predictions
- A **meta-model** (final estimator) learns how to best combine them
- The meta-model can learn complex patterns in how base models complement each other

**Analogy**: Instead of simple averaging (blending), stacking is like having an expert judge who knows which doctor to trust more for specific types of cases.

### Technical Details
**How Stacking Works**:
1. Train multiple base models on the training data
2. Use cross-validation to generate predictions from base models
3. Train a meta-model using base model predictions as features
4. Final predictions come from the meta-model
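Step 2 is what separates stacking from refitting on the same rows. Each base model's prediction for a row comes from a copy trained without that row, so the meta-model learns from honest predictions. The plain-Python sketch below builds those out-of-fold meta-features. The two toy base models are hypothetical, and folds are assigned by index rather than shuffled or stratified.

```python
def out_of_fold_predictions(X, y, fit, k=5):
    """Predict each row with a model trained on the other folds only.

    fit(X_train, y_train) must return a predict(x) -> float function.
    Rows are assigned to folds by index (i % k) to keep the sketch short.
    """
    preds = [None] * len(X)
    for fold in range(k):
        train = [i for i in range(len(X)) if i % k != fold]
        valid = [i for i in range(len(X)) if i % k == fold]
        predict = fit([X[i] for i in train], [y[i] for i in train])
        for i in valid:
            preds[i] = predict(X[i])
    return preds


def meta_features(X, y, fits, k=5):
    """One column of out-of-fold predictions per base model: the meta-model's inputs."""
    columns = [out_of_fold_predictions(X, y, fit, k) for fit in fits]
    return [list(row) for row in zip(*columns)]


# Two hypothetical base "models" on a single numeric feature
def fit_base_rate(X_train, y_train):
    rate = sum(y_train) / len(y_train)
    return lambda x: rate  # ignores x: predicts the training base rate


def fit_threshold(X_train, y_train):
    pos = [x for x, t in zip(X_train, y_train) if t == 1]
    neg = [x for x, t in zip(X_train, y_train) if t == 0]
    cut = (sum(pos) / len(pos) + sum(neg) / len(neg)) / 2
    return lambda x: 1.0 if x > cut else 0.0  # midpoint between class means


X = [1.0, 2.0, 3.0, 4.0, 6.0, 7.0, 8.0, 9.0]
y = [0, 0, 0, 0, 1, 1, 1, 1]
for row, target in zip(meta_features(X, y, [fit_base_rate, fit_threshold], k=4), y):
    print(row, "->", target)
```

A meta-model (Logistic Regression by default in PyCaret, according to the parameters below) would then be fit on these rows against `y`.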

**Parameters**:
- `estimator_list`: Base models to stack (top 5)
- `meta_model`: Final model that combines predictions (defaults to Logistic Regression)

### Expected Output
- Performance metrics for stacked model
- Can outperform both the individual models and the blend, though this is not guaranteed
- Takes longer to train than blending, because each base model is refit on every fold

In [None]:
print("=" * 50)
print("CREATING STACKED MODEL")
print("=" * 50)
print("\nBuilding a meta-learner to combine top 5 models...")
print("This may take a few minutes.\n")

# Create stacked model from top 5 models
stacked_model = stack_models(
    estimator_list=top_models
)

print("\n" + "=" * 50)
print("STACKED MODEL CREATED!")
print("=" * 50)
print("\nHow stacking works:")
print("1. Base models (5 models) make out-of-fold predictions via cross-validation")
print("2. Meta-model learns from base model predictions")
print("3. Meta-model makes final prediction using all base predictions")
print("\nBenefit: Can achieve better performance than any individual model")

---

## Cell 14: Model Calibration

### What
We're calibrating our tuned model to improve the reliability of its probability predictions.

### Why
Many ML models output probabilities, but these aren't always well-calibrated:
- A model might say "80% probability" but actually be right only 60% of the time
- **Calibration** adjusts probabilities to match real-world frequencies
- Critical for applications where probability matters (medical diagnosis, risk assessment)

**Example**: If the model says 100 patients have 70% chance of disease, we'd expect about 70 of them to actually have it.
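The 100-patient example above is exactly what a calibration plot checks. Predictions are grouped into probability bins, and each bin's average prediction is compared with the fraction of patients who actually have the disease. The plain-Python sketch below uses equal-width bins. The expected calibration error (ECE) summary is a common convention added here for illustration, and the demo data is hypothetical.

```python
def reliability_table(y_true, probas, n_bins=5):
    """Bin predicted P(class 1) and compare the mean prediction with the observed rate per bin."""
    bins = [[] for _ in range(n_bins)]
    for y, p in zip(y_true, probas):
        idx = min(int(p * n_bins), n_bins - 1)  # p == 1.0 goes in the last bin
        bins[idx].append((y, p))
    table = []
    for idx, members in enumerate(bins):
        if not members:
            continue
        mean_pred = sum(p for _, p in members) / len(members)
        observed = sum(y for y, _ in members) / len(members)
        table.append((idx, len(members), mean_pred, observed))
    return table


def expected_calibration_error(table, total):
    """Weighted average gap between predicted and observed frequencies."""
    return sum(n / total * abs(pred - obs) for _, n, pred, obs in table)


# 10 hypothetical patients all scored 0.8, of whom only 6 have the disease
y_demo = [1] * 6 + [0] * 4
table = reliability_table(y_demo, [0.8] * 10)
print(table, expected_calibration_error(table, len(y_demo)))
```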

### Technical Details
**Calibration Methods**:
- `'sigmoid'`: Platt scaling (assumes sigmoid-shaped calibration curve)
- `'isotonic'`: Non-parametric approach (more flexible, but prone to overfitting on small datasets)

The `calibrate_model()` function:
- Applies calibration to probability predictions
- Learns the calibration mapping with cross-validation on the training data (via scikit-learn's `CalibratedClassifierCV`)
- Returns a calibrated version of the model
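Platt scaling (the `'sigmoid'` method) fits a one-feature logistic regression that maps a model's raw score to a probability. The sketch below is a simplified plain-Python version using batch gradient descent on log loss. scikit-learn's implementation, which `calibrate_model()` wraps, uses a different optimizer and Platt's smoothed targets. Treat this only as an illustration of the mapping. The scores and labels are hypothetical.

```python
import math


def fit_platt(scores, y_true, lr=1.0, epochs=2000):
    """Fit a, b so that sigmoid(a * score + b) approximates P(y = 1).

    Plain batch gradient descent on the average log loss.
    """
    a, b = 1.0, 0.0
    n = len(scores)
    for _ in range(epochs):
        grad_a = grad_b = 0.0
        for s, y in zip(scores, y_true):
            p = 1.0 / (1.0 + math.exp(-(a * s + b)))
            grad_a += (p - y) * s
            grad_b += p - y
        a -= lr * grad_a / n
        b -= lr * grad_b / n
    return lambda s: 1.0 / (1.0 + math.exp(-(a * s + b)))


# Hypothetical uncalibrated scores and true labels (overlapping classes)
scores = [0.1, 0.2, 0.3, 0.4, 0.5, 0.6, 0.7, 0.8, 0.9, 0.95]
labels = [0, 0, 1, 0, 0, 1, 0, 1, 1, 1]
calibrate = fit_platt(scores, labels)
print([round(calibrate(s), 2) for s in scores])
```

Because the fitted curve is monotonic, calibration changes the probabilities but not the ranking of patients, which is why AUC typically stays the same.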

### Expected Output
- Calibrated model with more reliable probability predictions
- Performance metrics (might be similar to uncalibrated)
- Better probability estimates even if accuracy stays the same

In [None]:
print("=" * 50)
print("MODEL CALIBRATION")
print("=" * 50)
print("\nCalibrating probability predictions for better reliability...\n")

# Calibrate the tuned model
calibrated_model = calibrate_model(tuned_model)

print("\n" + "=" * 50)
print("CALIBRATION COMPLETE!")
print("=" * 50)
print("\nWhat calibration does:")
print("- Adjusts probability outputs to match real-world frequencies")
print("- Example: among patients the model assigns ~70% probability,")
print("  roughly 70% should actually have heart disease")
print("\nImportant for: Medical diagnosis, risk assessment, decision-making")

---

## Cell 15: Final Model Selection and Evaluation

### What
We're selecting our final model and retraining it on all available data for deployment.

### Why
After trying multiple approaches (tuning, blending, stacking, calibration), we need to:
- Choose the best overall model
- Evaluate it on unseen test data (not used during training)
- Get realistic performance estimates for deployment

### Technical Details
We'll compare:
1. Tuned model (optimized hyperparameters)
2. Blended model (ensemble of top 3)
3. Stacked model (meta-learning ensemble)
4. Calibrated model (better probabilities)

The `finalize_model()` function:
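- Retrains the model on the full dataset, including the hold-out test set
- Prepares it for deployment

Because the test rows become training data, hold-out metrics should be computed with the model *before* it is finalized.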

### Expected Output
- Final model trained on all available data (training + hold-out)
- Ready for making predictions and deployment

In [None]:
print("=" * 50)
print("FINAL MODEL SELECTION")
print("=" * 50)

# For this example, we'll use the stacked model as our final model
# (In practice, you'd compare all models and choose the best)
print("\nSelected Final Model: Stacked Ensemble Model")
print("Reason: chosen for illustration; compare the CV scores above before choosing\n")

# Finalize the model (train on full dataset)
final_model = finalize_model(stacked_model)

print("\n" + "=" * 50)
print("FINAL MODEL READY!")
print("=" * 50)
print("\nModel has been retrained on the entire dataset (training + hold-out).")
print("Ready for predictions and deployment!")

---

## Cell 16: Predictions on Test Set

### What
We're making predictions on the held-out test set that PyCaret set aside during `setup()`.

### Why
The test set represents new, unseen data (similar to real-world deployment):
- Evaluates how well the model generalizes
- Gives realistic performance expectations
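- The test data must not be used for training or model selection; because `finalize_model()` retrains on it, hold-out predictions should come from the model before finalization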

### Technical Details
The `predict_model()` function:
- Takes the model and test data
- Returns predictions with:
  - `prediction_label`: Predicted class (0 or 1)
  - `prediction_score`: Probability of the predicted class (pass `raw_score=True` for per-class probabilities)
- Includes all original features plus predictions
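In PyCaret 3.x, `prediction_score` holds the probability of the *predicted* label rather than always the probability of class 1. Passing `raw_score=True` to `predict_model()` returns one score column per class instead. When only the default columns are available, P(disease) can be recovered for a binary problem as sketched below. The helper name is hypothetical.

```python
def probability_of_disease(label, score):
    """Convert (predicted label, probability of that label) into P(class 1).

    Assumes a binary 0/1 problem where score is the probability of the predicted label.
    """
    if label not in (0, 1):
        raise ValueError("binary labels 0/1 expected")
    return score if label == 1 else 1.0 - score


# Hypothetical (prediction_label, prediction_score) pairs
rows = [(1, 0.92), (0, 0.81), (1, 0.55)]
print([round(probability_of_disease(lbl, s), 2) for lbl, s in rows])
```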

### Expected Output
- DataFrame with original features plus predictions
- Performance metrics on test set
- Comparison with cross-validation results

In [None]:
print("=" * 50)
print("MAKING PREDICTIONS ON TEST SET")
print("=" * 50)

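# Score the hold-out set with the stacked model *before* finalization:
# final_model was retrained on these rows, so its test metrics would be optimistic
predictions = predict_model(stacked_model)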

print("\n" + "=" * 50)
print("PREDICTIONS COMPLETE!")
print("=" * 50)

print("\nPrediction columns:")
print("- prediction_label: Predicted class (0 = No Disease, 1 = Disease)")
print("- prediction_score: Probability of the predicted class (not always P(disease))")

print("\nSample predictions:")
display(predictions[['age', 'sex', 'cp', 'target', 'prediction_label', 'prediction_score']].head(10))

# Calculate accuracy on test set
test_accuracy = (predictions['target'] == predictions['prediction_label']).mean()
print(f"\nTest Set Accuracy: {test_accuracy:.4f} ({test_accuracy*100:.2f}%)")

---

## Cell 17: Detailed Classification Report

### What
We're generating a comprehensive classification report with detailed metrics for each class.

### Why
Different metrics tell us different things:
- **Precision**: Of patients predicted to have disease, how many actually do?
- **Recall (Sensitivity)**: Of patients with disease, how many did we catch?
- **F1-Score**: Harmonic mean of precision and recall
- **Support**: Number of samples in each class

For medical diagnosis:
- High **Recall** is critical (don't miss sick patients)
- High **Precision** reduces false alarms
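Missing a sick patient costs more than a false alarm, so the default 0.5 decision threshold is not always the right operating point. The plain-Python sketch below sweeps candidate thresholds on P(disease) and returns the highest one that still meets a recall target. False positives can only decrease as the threshold rises, so that choice has the fewest false alarms among the qualifying candidates. The target and toy data are hypothetical. In practice, the threshold should be chosen on validation data, not the test set. In PyCaret 3.x, the chosen value could then be passed to `predict_model(..., probability_threshold=...)`.

```python
def recall_precision_at(y_true, probas, threshold):
    """Recall and precision for class 1 when predicting 1 iff P(class 1) >= threshold."""
    tp = sum(1 for y, p in zip(y_true, probas) if y == 1 and p >= threshold)
    fp = sum(1 for y, p in zip(y_true, probas) if y == 0 and p >= threshold)
    fn = sum(1 for y, p in zip(y_true, probas) if y == 1 and p < threshold)
    recall = tp / (tp + fn) if tp + fn else 0.0
    precision = tp / (tp + fp) if tp + fp else 0.0
    return recall, precision


def threshold_for_recall(y_true, probas, min_recall=0.95, candidates=None):
    """Highest candidate threshold whose recall reaches min_recall, or None."""
    candidates = candidates or [i / 100 for i in range(1, 100)]
    for t in sorted(candidates, reverse=True):
        recall, precision = recall_precision_at(y_true, probas, t)
        if recall >= min_recall:
            return t, recall, precision
    return None


# Hypothetical labels and predicted P(disease)
y = [0, 0, 0, 0, 1, 1, 1, 1]
p = [0.10, 0.30, 0.45, 0.60, 0.35, 0.55, 0.70, 0.90]
print(recall_precision_at(y, p, 0.5))
print(threshold_for_recall(y, p, min_recall=1.0))
```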

### Technical Details
We'll use sklearn's `classification_report` to show:
- Per-class metrics (for both 0 and 1)
- Macro average (unweighted mean)
- Weighted average (weighted by support)
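Every number in `classification_report` follows from the four confusion-matrix counts. The plain-Python sketch below shows that arithmetic for the binary case. Recall for class 0 is the specificity, and recall for class 1 is the sensitivity computed in this cell. The counts in the usage line are made up, and the helper name is hypothetical.

```python
def binary_report(tn, fp, fn, tp):
    """Per-class precision/recall/F1 plus macro and support-weighted averages."""
    def prf(correct, predicted, actual):
        precision = correct / predicted if predicted else 0.0
        recall = correct / actual if actual else 0.0
        f1 = 2 * precision * recall / (precision + recall) if precision + recall else 0.0
        return precision, recall, f1

    per_class = {
        0: prf(tn, tn + fn, tn + fp),  # predicted 0 = TN + FN, actually 0 = TN + FP
        1: prf(tp, tp + fp, tp + fn),  # predicted 1 = TP + FP, actually 1 = TP + FN
    }
    support = {0: tn + fp, 1: tp + fn}
    total = support[0] + support[1]
    macro = tuple(sum(per_class[c][i] for c in (0, 1)) / 2 for i in range(3))
    weighted = tuple(sum(per_class[c][i] * support[c] for c in (0, 1)) / total for i in range(3))
    return per_class, support, macro, weighted


per_class, support, macro, weighted = binary_report(tn=40, fp=10, fn=5, tp=45)
print(per_class, support, macro, weighted)
```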

### Expected Output
- Detailed table with precision, recall, F1 for each class
- Overall metrics
- Confusion matrix showing true vs predicted labels

In [None]:
from sklearn.metrics import classification_report, confusion_matrix

print("=" * 50)
print("DETAILED CLASSIFICATION REPORT")
print("=" * 50)

# Generate classification report
print("\nMetrics by Class:")
print(classification_report(predictions['target'], predictions['prediction_label'],
                          target_names=['No Disease (0)', 'Disease (1)']))

# Confusion Matrix
print("\n" + "=" * 50)
print("CONFUSION MATRIX")
print("=" * 50)
cm = confusion_matrix(predictions['target'], predictions['prediction_label'])

plt.figure(figsize=(8, 6))
sns.heatmap(cm, annot=True, fmt='d', cmap='Blues', 
            xticklabels=['No Disease', 'Disease'],
            yticklabels=['No Disease', 'Disease'])
plt.title('Confusion Matrix - Test Set', fontsize=14, fontweight='bold')
plt.ylabel('Actual', fontsize=12)
plt.xlabel('Predicted', fontsize=12)
plt.tight_layout()
plt.show()

print("\nConfusion Matrix Breakdown:")
tn, fp, fn, tp = cm.ravel()
print(f"True Negatives (TN):  {tn} - Correctly predicted NO disease")
print(f"False Positives (FP): {fp} - Incorrectly predicted disease")
print(f"False Negatives (FN): {fn} - Missed disease cases (⚠️ Critical!)")
print(f"True Positives (TP):  {tp} - Correctly predicted disease")

# Calculate additional metrics
sensitivity = tp / (tp + fn)
specificity = tn / (tn + fp)
print(f"\nSensitivity (Recall): {sensitivity:.4f} - Ability to detect disease")
print(f"Specificity: {specificity:.4f} - Ability to identify healthy patients")

---

## Cell 18: Feature Importance Analysis

### What
We're analyzing which features (patient attributes) are most important for predicting heart disease.

### Why
Understanding feature importance helps:
- **Medical Insight**: Which factors most contribute to heart disease?
- **Model Interpretation**: Why does the model make certain predictions?
- **Feature Selection**: Can we simplify the model by removing unimportant features?
- **Data Collection**: Which measurements are most critical to collect?

### Technical Details
Common methods for measuring feature importance (this cell uses permutation importance):
- **Tree-based models**: Use the built-in `feature_importances_` attribute
- **Permutation importance**: Measures how much the model's score drops when a feature's values are randomly shuffled
- **SHAP values**: Quantify how much each feature pushes an individual prediction up or down
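
The idea behind permutation importance fits in a short loop: shuffle one column, re-score, and record the drop. The sketch below assumes a synthetic dataset and a plain scikit-learn random forest, not the notebook's data or `tuned_model`. `manual_permutation_importance` is a hypothetical helper, compared here against `sklearn.inspection.permutation_importance`.

```python
import numpy as np
from sklearn.datasets import make_classification
from sklearn.ensemble import RandomForestClassifier
from sklearn.inspection import permutation_importance
from sklearn.model_selection import train_test_split

# Synthetic stand-in data: with shuffle=False, columns 0-2 are informative
# and columns 3-4 are pure noise
X, y = make_classification(n_samples=400, n_features=5, n_informative=3,
                           n_redundant=0, shuffle=False, random_state=0)
X_train, X_test, y_train, y_test = train_test_split(X, y, random_state=0)
model = RandomForestClassifier(random_state=0).fit(X_train, y_train)


def manual_permutation_importance(model, X, y, n_repeats=10, seed=0):
    """Mean drop in accuracy when each column is shuffled independently."""
    rng = np.random.default_rng(seed)
    baseline = model.score(X, y)
    drops = np.zeros(X.shape[1])
    for j in range(X.shape[1]):
        for _ in range(n_repeats):
            X_perm = X.copy()
            X_perm[:, j] = rng.permutation(X_perm[:, j])  # break feature j's link to y
            drops[j] += baseline - model.score(X_perm, y)
    return drops / n_repeats


manual = manual_permutation_importance(model, X_test, y_test)
library = permutation_importance(model, X_test, y_test, n_repeats=10,
                                 random_state=0).importances_mean
print("manual :", np.round(manual, 3))
print("sklearn:", np.round(library, 3))
```

The two results differ slightly because they draw different random shuffles. Both should rank the informative columns above the noise columns. Scoring on held-out data (as here) shows what the model relies on to generalize.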

### Expected Output
- Bar chart showing relative importance of each feature
- List of features ranked by importance
- Insights into what drives the model's predictions

In [None]:
print("=" * 50)
print("FEATURE IMPORTANCE ANALYSIS")
print("=" * 50)
print("\nAnalyzing which features contribute most to predictions...\n")

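# Permutation importance on the tuned single model (a stacked ensemble's
# importances are harder to attribute to individual features)
try:
    from sklearn.inspection import permutation_importance

    # In PyCaret 3, tune_model returns the bare estimator, which expects data
    # that has already gone through PyCaret's preprocessing, so use the
    # *_transformed datasets. Held-out data shows which features the model
    # relies on to generalize, rather than what it fit during training.
    X = get_config('X_test_transformed')
    y = get_config('y_test_transformed')

    # Mean accuracy drop over 10 random shuffles of each feature
    perm_importance = permutation_importance(tuned_model, X, y,
                                             n_repeats=10, random_state=42)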
    
    # Create dataframe
    feature_importance_df = pd.DataFrame({
        'Feature': X.columns,
        'Importance': perm_importance.importances_mean
    }).sort_values('Importance', ascending=False)
    
    # Plot
    plt.figure(figsize=(10, 8))
    plt.barh(feature_importance_df['Feature'], feature_importance_df['Importance'])
    plt.xlabel('Importance', fontsize=12)
    plt.ylabel('Feature', fontsize=12)
    plt.title('Feature Importance (Permutation)', fontsize=14, fontweight='bold')
    plt.gca().invert_yaxis()
    plt.tight_layout()
    plt.show()
    
    print("\nTop 10 Most Important Features:")
    print("=" * 50)
    for idx, row in feature_importance_df.head(10).iterrows():
        print(f"{row['Feature']:15s}: {row['Importance']:.4f}")
    
except Exception as e:
    print(f"Could not calculate feature importance: {e}")
    print("\nNote: Some ensemble models don't have direct feature importance.")
    print("Using PyCaret's plot instead:")
    plot_model(tuned_model, plot='feature')

---

## Cell 19: Save the Model for Deployment

### What
We're saving our trained model to disk so it can be loaded and used later for predictions.

### Why
Model deployment requires:
- **Persistence**: Train once and reuse the model, instead of retraining (which can be slow) every session
- **Portability**: Use the model in different environments (web app, API, etc.)
- **Versioning**: Keep track of different model versions

### Technical Details
PyCaret's `save_model()` function:
- Saves the entire pipeline (preprocessing + model)
- Serializes the pipeline (via joblib) to a `.pkl` file
- Includes all transformations applied during setup
- Can be loaded with `load_model()`

**What gets saved**:
- Trained model with learned parameters
- Preprocessing steps (normalization, encoding, etc.)
- Feature transformations
- Everything needed to make predictions on new data
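
Saving preprocessing and model as one object can be illustrated with plain scikit-learn and joblib. This is an analogue, not PyCaret's internals. A `Pipeline` is serialized to a `.pkl` file, reloaded, and checked for identical predictions on raw input. The file name `demo_pipeline.pkl` is hypothetical.

```python
import os
import tempfile

import joblib
import numpy as np
from sklearn.datasets import make_classification
from sklearn.linear_model import LogisticRegression
from sklearn.pipeline import Pipeline
from sklearn.preprocessing import StandardScaler

X, y = make_classification(n_samples=200, n_features=6, random_state=42)

# Preprocessing and model live in one object, so both are saved together
pipeline = Pipeline([
    ("scale", StandardScaler()),
    ("clf", LogisticRegression(max_iter=1000)),
]).fit(X, y)

path = os.path.join(tempfile.mkdtemp(), "demo_pipeline.pkl")  # hypothetical file name
joblib.dump(pipeline, path)

restored = joblib.load(path)
# The restored pipeline applies the same scaling before predicting
same = np.allclose(pipeline.predict_proba(X), restored.predict_proba(X))
print("identical predictions after reload:", same)
```

Because the scaler travels with the classifier, new raw data needs no separate preprocessing step after loading.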

### Expected Output
- Confirmation message that model was saved
- File name and location
- Model can now be loaded for deployment

In [None]:
print("=" * 50)
print("SAVING MODEL FOR DEPLOYMENT")
print("=" * 50)

# Save the final model
model_name = 'heart_disease_model'
save_model(final_model, model_name)

print(f"\n✓ Model saved successfully as '{model_name}.pkl'")
print("\nWhat was saved:")
print("- Trained model with all learned parameters")
print("- Complete preprocessing pipeline")
print("- Feature transformations")
print("- Everything needed to make predictions on new data")

print("\n" + "=" * 50)
print("MODEL READY FOR DEPLOYMENT!")
print("=" * 50)
print("\nTo load and use the model later:")
print("```python")
print("from pycaret.classification import load_model, predict_model")
print(f"loaded_model = load_model('{model_name}')")
print("predictions = predict_model(loaded_model, data=new_data)")
print("```")

---

## Cell 20: Demo - Making Predictions on New Patients

### What
We're demonstrating how to use the saved model to make predictions on new patient data.

### Why
This simulates real-world usage:
- New patients come to the hospital
- We collect their medical data
- Use our model to predict heart disease risk
- Help doctors make informed decisions

### Technical Details
We'll create sample patient data, then:
- Load the saved model
- Make predictions
- Interpret the results (class label and probability)
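
The risk tiers used in the following code cell can be written as a small, testable function. The 0.4 and 0.7 cut-offs come from this notebook and are illustrative, not clinically validated. `risk_level` is a hypothetical helper name. Its input is assumed to be the probability of disease, not PyCaret's raw `prediction_score`.

```python
def risk_level(p_disease: float) -> str:
    """Map a probability of disease to the notebook's three risk tiers."""
    if not 0.0 <= p_disease <= 1.0:
        raise ValueError(f"probability out of range: {p_disease}")
    if p_disease > 0.7:
        return "High Risk"
    if p_disease > 0.4:
        return "Moderate Risk"
    return "Low Risk"


for p in (0.15, 0.40, 0.55, 0.92):
    print(f"{p:.2f} -> {risk_level(p)}")
```

The boundaries are strict, so a probability of exactly 0.40 stays "Low Risk". The function can be applied to a pandas column of disease probabilities with `Series.apply`.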

### Expected Output
- Predictions for new patients
- Probability scores for risk assessment
- Example of how the model would be used in production

In [None]:
print("=" * 50)
print("DEMO: PREDICTING FOR NEW PATIENTS")
print("=" * 50)

# Create sample new patient data
new_patients = pd.DataFrame({
    'age': [52, 45, 70, 38],
    'sex': [1, 0, 1, 0],
    'cp': [2, 0, 3, 1],
    'trestbps': [140, 120, 160, 110],
    'chol': [280, 200, 310, 180],
    'fbs': [1, 0, 1, 0],
    'restecg': [0, 0, 2, 0],
    'thalach': [150, 170, 120, 180],
    'exang': [1, 0, 1, 0],
    'oldpeak': [2.5, 0.5, 3.5, 0.0],
    'slope': [2, 1, 2, 1],
    'ca': [2, 0, 3, 0],
    'thal': [3, 2, 3, 2]
})

print("\nNew Patient Data:")
display(new_patients)

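# Load the saved pipeline (as a deployed service would) and make predictions
from pycaret.classification import load_model, predict_model
loaded_model = load_model(model_name)
new_predictions = predict_model(loaded_model, data=new_patients)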

print("\n" + "=" * 50)
print("PREDICTIONS FOR NEW PATIENTS")
print("=" * 50)

# Display results
results = new_predictions[['age', 'sex', 'prediction_label', 'prediction_score']].copy()

# prediction_score is the probability of the *predicted* label, so convert it
# into the probability of disease before assigning risk tiers
results['disease_probability'] = results['prediction_score'].where(
    results['prediction_label'] == 1, 1 - results['prediction_score']
)
results['risk_level'] = results['disease_probability'].apply(
    lambda x: 'High Risk' if x > 0.7 else ('Moderate Risk' if x > 0.4 else 'Low Risk')
)
results['diagnosis'] = results['prediction_label'].map({0: 'No Disease', 1: 'Disease'})

display(results)

print("\nInterpretation:")
print("- prediction_label: 0 = No Disease, 1 = Disease")
print("- prediction_score: Probability of the predicted label (model confidence)")
print("- disease_probability: Probability of having heart disease (0-1)")
print("- risk_level: Risk tier based on disease_probability (>0.7 high, >0.4 moderate)")

print("\n" + "=" * 50)
print("CLINICAL DECISION SUPPORT")
print("=" * 50)
for idx, row in results.iterrows():
    print(f"\nPatient {idx + 1}:")
    print(f"  Age: {int(new_patients.loc[idx, 'age'])} | Sex: {'Male' if new_patients.loc[idx, 'sex'] == 1 else 'Female'}")
    print(f"  Prediction: {row['diagnosis']} (confidence {row['prediction_score']:.1%})")
    print(f"  Probability of disease: {row['disease_probability']:.1%}")
    print(f"  Risk Level: {row['risk_level']}")

    # Recommendations follow the same tiers as risk_level
    if row['risk_level'] == 'High Risk':
        print("  ⚠️  Recommendation: Immediate consultation with cardiologist")
    elif row['risk_level'] == 'Moderate Risk':
        print("  ⚡ Recommendation: Further diagnostic tests recommended")
    else:
        print("  ✓  Recommendation: Regular monitoring, healthy lifestyle")

---

## Conclusions and Key Takeaways

### What We Accomplished

1. **Data Exploration**: Analyzed heart disease dataset with 1,025 patients and 13 features
2. **AutoML Pipeline**: Compared 15+ algorithms automatically using PyCaret
3. **Model Optimization**: Tuned hyperparameters for best performance
4. **Ensemble Methods**: Created blended and stacked models for improved accuracy
5. **Model Calibration**: Improved probability reliability for clinical decisions
6. **Deployment**: Saved model ready for real-world use

### Key Learnings

#### Technical Skills
- How to use PyCaret for rapid model development
- Automated model comparison and selection
- Hyperparameter tuning without manual coding
- Ensemble methods (blending and stacking)
- Model calibration for better probabilities
- Model persistence and deployment
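
Model calibration, listed above, can be checked numerically. Bin the predicted probabilities, compare each bin's mean prediction with the observed positive rate, and summarize squared error with the Brier score. The sketch below uses scikit-learn's `calibration_curve` and `brier_score_loss` on synthetic scores, which are an assumption and not this notebook's model. Within the PyCaret workflow, `calibrate_model` and `plot_model(..., plot='calibration')` serve this purpose.

```python
import numpy as np
from sklearn.calibration import calibration_curve
from sklearn.metrics import brier_score_loss

rng = np.random.default_rng(0)
# Synthetic, perfectly calibrated scores: each label is 1 with probability p
p = rng.uniform(0, 1, size=5000)
y = (rng.uniform(0, 1, size=5000) < p).astype(int)

# An over-confident version pushes probabilities toward 0 and 1
p_overconfident = np.clip(0.5 + 1.8 * (p - 0.5), 0, 1)

for name, scores in [("calibrated", p), ("over-confident", p_overconfident)]:
    # frac_pos: observed positive rate per bin; mean_pred: average score per bin
    frac_pos, mean_pred = calibration_curve(y, scores, n_bins=5)
    gap = np.abs(frac_pos - mean_pred).max()
    print(f"{name:15s} Brier={brier_score_loss(y, scores):.3f}  max bin gap={gap:.3f}")
```

Lower Brier scores and smaller bin gaps indicate probabilities that can be read more literally as risk.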

#### Machine Learning Concepts
- **Classification**: Predicting categorical outcomes (disease vs no disease)
- **Cross-Validation**: Reliable performance estimation
- **Ensemble Learning**: Combining models for better results
- **Evaluation Metrics**: Accuracy, AUC, Precision, Recall, F1
- **Feature Importance**: Understanding what drives predictions
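
PyCaret runs cross-validation inside `compare_models` and `tune_model`, and the same idea can be reproduced in scikit-learn with stratified folds. The sketch below uses synthetic data, which is an assumption. The 10 folds mirror PyCaret's default `fold=10` in `setup`.

```python
import numpy as np
from sklearn.datasets import make_classification
from sklearn.linear_model import LogisticRegression
from sklearn.model_selection import StratifiedKFold, cross_val_score

X, y = make_classification(n_samples=300, n_features=8, random_state=1)

# Stratified folds keep the class ratio roughly constant in every split
cv = StratifiedKFold(n_splits=10, shuffle=True, random_state=1)
scores = cross_val_score(LogisticRegression(max_iter=1000), X, y,
                         cv=cv, scoring="roc_auc")

print(f"AUC per fold: {np.round(scores, 3)}")
print(f"mean={scores.mean():.3f}  std={scores.std():.3f}")
```

Reporting the fold-to-fold standard deviation alongside the mean shows how stable the estimate is, which matters on small medical datasets.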

#### Domain Knowledge
- Key medical factors in heart disease prediction
- Importance of sensitivity (recall) in medical diagnosis
- Trade-offs between false positives and false negatives
- How ML can support clinical decision-making

### Business Value

1. **Healthcare Providers**: 
   - Early identification of high-risk patients
   - Data-driven support for clinical decisions
   - Resource optimization (focus on high-risk cases)

2. **Patients**: 
   - Early detection can save lives
   - Preventive care for moderate-risk individuals
   - Reduced healthcare costs through prevention

3. **Insurance**: 
   - Better risk assessment
   - More accurate premium calculation
   - Fraud detection

### Model Performance Summary

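Our final model achieved the following (exact test-set figures are printed in the evaluation cells above and vary with the data split and random seed):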
- **High accuracy** in predicting heart disease
- **Strong AUC** indicating good discrimination between classes
- **Balanced precision and recall** for reliable predictions
- **Calibrated probabilities** for trustworthy risk scores

### Next Steps for Production Deployment

1. **Model Monitoring**: Track performance on new data over time
2. **A/B Testing**: Compare with existing diagnostic methods
3. **Integration**: Build API for hospital systems
4. **Compliance**: Ensure HIPAA compliance and data privacy
5. **Continuous Learning**: Retrain model with new patient data
6. **Clinical Validation**: Work with cardiologists to validate predictions

### Limitations and Considerations

1. **Not a replacement for doctors**: This is a decision support tool, not a diagnosis
2. **Data quality**: Model is only as good as the training data
3. **Generalization**: Performance may vary across different populations
4. **Feature importance**: Correlation doesn't imply causation
5. **Ethical considerations**: Bias in training data can affect predictions

### Resources for Further Learning

- [PyCaret Documentation](https://pycaret.gitbook.io/docs/)
- [PyCaret Classification Tutorial](https://pycaret.gitbook.io/docs/get-started/tutorials)
- [Scikit-learn Documentation](https://scikit-learn.org/)
- [Machine Learning for Healthcare](https://www.coursera.org/)

---

**Author**: Bala Anbalagan  
**Date**: January 2025  
**Dataset**: [Kaggle - Heart Disease Dataset](https://www.kaggle.com/datasets/yasserh/heart-disease-dataset)  
**License**: MIT  

---

## Thank you for following this tutorial!

If you found this helpful, please:
- Star the repository on GitHub
- Share with others learning ML
- Provide feedback for improvements

**Disclaimer**: This model is for educational purposes only. Always consult qualified healthcare professionals for medical decisions.