# DeepBridge AutoDistiller Demo - Using Pre-calculated Probabilities

This notebook demonstrates how to use the DeepBridge `AutoDistiller` to compress a complex model into a simpler one, using pre-calculated probabilities from the teacher model.

## Overview

Knowledge distillation is a technique where a simpler model (student) learns to mimic the behavior of a more complex model (teacher). This can help create models that are:
- Smaller and faster
- More interpretable
- Easier to deploy

In this demo, the teacher is represented only by its pre-calculated class probabilities. We simulate them below to stand in for the outputs of an already-trained neural network, then distill that knowledge into a simpler model.

## Setup

First, let's import the necessary libraries and set up our environment:

In [None]:
import numpy as np
import pandas as pd
import matplotlib.pyplot as plt
from sklearn.datasets import make_classification
from sklearn.model_selection import train_test_split
from sklearn.metrics import accuracy_score, roc_auc_score

# Import DeepBridge components
from deepbridge.db_data import DBDataset
from deepbridge.auto_distiller import AutoDistiller
from deepbridge.distillation.classification.model_registry import ModelType

# Set random seed for reproducibility
np.random.seed(42)

## Generate Sample Data

In a real scenario, you would load your actual dataset. For this demo, we'll generate a synthetic classification dataset:

In [None]:
# Generate a binary classification dataset
X, y = make_classification(
    n_samples=1000, 
    n_features=20, 
    n_informative=10, 
    n_redundant=5, 
    random_state=42
)

# Convert to DataFrame for better handling
feature_names = [f'feature_{i}' for i in range(X.shape[1])]
data = pd.DataFrame(X, columns=feature_names)
data['target'] = y

# Display the first few rows
data.head()

## Simulate Teacher Model Probabilities

In a real-world scenario, you would have probabilities from a pre-trained complex model (like a neural network). For this demo, we'll simulate them: each positive-class probability starts from a base value tied to the true label and is then perturbed with Gaussian noise:

In [None]:
# Simulate the teacher's positive-class probabilities (in practice these come from the trained model).
# They track the true labels with added noise, so the teacher is confident but imperfect.

# Base probability: 0.7 when y == 1, 0.3 when y == 0
base_probs = y * 0.7 + (1 - y) * 0.3

# Add Gaussian noise, then clip to [0.01, 0.99] so no probability is exactly 0 or 1
noise = np.random.normal(0, 0.1, size=len(y))
probabilities = np.clip(base_probs + noise, 0.01, 0.99)

# Create DataFrame with probabilities
prob_df = pd.DataFrame({
    'prob_class_0': 1 - probabilities,
    'prob_class_1': probabilities
})

# Display the first few rows of probabilities
print("Teacher model probability outputs:")
prob_df.head()
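
Before distilling, it can help to confirm that the simulated teacher behaves like a strong model. The sketch below rebuilds the labels and probabilities with the same recipe and scores them with `accuracy_score` and `roc_auc_score`. The names `y_demo`, `teacher_p1`, `teacher_acc`, and `teacher_auc` are illustrative, and the block uses its own local random generator, so its numbers will differ slightly from the notebook's `probabilities` array.

```python
import numpy as np
from sklearn.datasets import make_classification
from sklearn.metrics import accuracy_score, roc_auc_score

# Recreate the labels with the same settings as the data-generation cell
_, y_demo = make_classification(
    n_samples=1000, n_features=20, n_informative=10, n_redundant=5, random_state=42
)

# Local generator: leaves the notebook's global NumPy seed untouched
rng = np.random.default_rng(42)
noise_demo = rng.normal(0, 0.1, size=len(y_demo))
teacher_p1 = np.clip(y_demo * 0.7 + (1 - y_demo) * 0.3 + noise_demo, 0.01, 0.99)

# Threshold at 0.5 for accuracy; AUC uses the raw probabilities
teacher_acc = accuracy_score(y_demo, (teacher_p1 >= 0.5).astype(int))
teacher_auc = roc_auc_score(y_demo, teacher_p1)
print(f"Simulated teacher accuracy: {teacher_acc:.3f}")
print(f"Simulated teacher ROC AUC:  {teacher_auc:.3f}")
```

Inside the notebook itself, the same two metric calls can be applied directly to `y` and `probabilities`.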

## Create DBDataset with Pre-calculated Probabilities

The `DBDataset` class is DeepBridge's way of organizing data, features, and model predictions. We'll create one using our data and the simulated teacher model probabilities:

In [None]:
# Create an 80/20 train/test split by row position, so data and probabilities stay aligned
train_indices = np.random.choice(len(data), int(0.8 * len(data)), replace=False)
test_indices = np.setdiff1d(np.arange(len(data)), train_indices)

train_data = data.iloc[train_indices].reset_index(drop=True)
test_data = data.iloc[test_indices].reset_index(drop=True)

train_probs = prob_df.iloc[train_indices].reset_index(drop=True)
test_probs = prob_df.iloc[test_indices].reset_index(drop=True)

# Create DBDataset with probabilities
dataset = DBDataset(
    train_data=train_data,
    test_data=test_data,
    target_column='target',
    train_predictions=train_probs,
    test_predictions=test_probs,
    prob_cols=['prob_class_0', 'prob_class_1']
)

# Verify the dataset
print(dataset)
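
The split above works on row positions so that each feature row keeps its matching teacher probabilities. As an alternative sketch, `train_test_split` accepts several arrays at once and splits them with the same shuffle, which gives the same alignment guarantee and also allows stratifying on the target. The `toy_*` frames below are hypothetical stand-ins for `data` and `prob_df`.

```python
import numpy as np
import pandas as pd
from sklearn.model_selection import train_test_split

# Toy stand-ins for `data` and `prob_df` (hypothetical values, same row order)
toy_data = pd.DataFrame({"feature_0": np.arange(10.0), "target": [0, 1] * 5})
toy_probs = pd.DataFrame({"prob_class_1": np.linspace(0.1, 0.9, 10)})
toy_probs["prob_class_0"] = 1 - toy_probs["prob_class_1"]

# One call splits both frames with the same shuffle, stratified on the target
toy_train, toy_test, toy_train_probs, toy_test_probs = train_test_split(
    toy_data, toy_probs, test_size=0.2, stratify=toy_data["target"], random_state=42
)

# Row labels still match, so features and teacher outputs are aligned
print(toy_train.index.equals(toy_train_probs.index))
print(toy_test.index.equals(toy_test_probs.index))
```

If you use this variant, reset the index on all four frames afterwards, as the cell above does, before building the `DBDataset`.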

## Initialize and Run the AutoDistiller

Now we'll use the `AutoDistiller` to find the best simple model to replace our complex one:

In [None]:
# Initialize the AutoDistiller with our dataset
distiller = AutoDistiller(
    dataset=dataset,
    output_dir="distillation_results",
    test_size=0.2,  # For internal validation
    n_trials=10,    # Number of hyperparameter trials
    random_state=42,
    verbose=True
)

# Customize the configuration (optional)
# We'll test multiple model types, temperatures, and alpha values
distiller.customize_config(
    model_types=[
        ModelType.LOGISTIC_REGRESSION,
        ModelType.DECISION_TREE,
        ModelType.GBM
    ],
    temperatures=[0.5, 1.0, 2.0],
    alphas=[0.3, 0.7]
)
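
To give some intuition for the `temperatures` and `alphas` values, here is a minimal sketch of the standard distillation formulation. It is not taken from DeepBridge's internals: `soften` and `blend_targets` are hypothetical helpers, and the convention that alpha weights the teacher's soft targets is an assumption. Temperatures above 1 pull probabilities toward 0.5, and temperatures below 1 sharpen them.

```python
import itertools
import numpy as np

def soften(p, temperature):
    """Rescale positive-class probabilities by a temperature, applied in logit space."""
    p = np.clip(np.asarray(p, dtype=float), 1e-7, 1 - 1e-7)
    logits = np.log(p / (1 - p))
    return 1.0 / (1.0 + np.exp(-logits / temperature))

def blend_targets(hard_labels, soft_probs, alpha):
    """Assumed convention: alpha weights the teacher's soft targets, (1 - alpha) the true labels."""
    hard = np.asarray(hard_labels, dtype=float)
    soft = np.asarray(soft_probs, dtype=float)
    return alpha * soft + (1 - alpha) * hard

teacher_p = np.array([0.9, 0.6, 0.2])
for t in (0.5, 1.0, 2.0):
    print(f"T={t}: {np.round(soften(teacher_p, t), 3)}")

# Size of the grid configured above: 3 model types x 3 temperatures x 2 alphas
grid = list(itertools.product(
    ["LOGISTIC_REGRESSION", "DECISION_TREE", "GBM"], [0.5, 1.0, 2.0], [0.3, 0.7]
))
print(f"Configurations to evaluate: {len(grid)}")  # 18
```

Each of those configurations is then tuned and evaluated by the distiller, so the grid size is the main driver of total run time.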

In [None]:
# Run the distillation process
# This will test all combinations of model types, temperatures, and alphas
results_df = distiller.run(use_probabilities=True, verbose_output=True)

# Display the results
results_df

## Analyze Results

Let's analyze the results to find the best model configuration:

In [None]:
# Find the best model based on accuracy
best_accuracy_config = distiller.find_best_model(metric='test_accuracy')
print("Best model configuration by accuracy:")
for key, value in best_accuracy_config.items():
    if key != 'best_params':
        print(f"  {key}: {value}")

In [None]:
# Find the best model based on KL divergence (which measures how well the student mimics the teacher)
best_kl_config = distiller.find_best_model(metric='test_kl_divergence', minimize=True)
print("Best model configuration by KL divergence (lower is better):")
for key, value in best_kl_config.items():
    if key != 'best_params':
        print(f"  {key}: {value}")
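
For reference, the KL divergence between teacher and student class probabilities can be computed by hand. The sketch below averages KL(teacher || student) over samples. The helper `mean_kl_divergence` is hypothetical, and both the direction and the averaging are assumptions about how a metric such as `test_kl_divergence` might be defined, not a statement of DeepBridge's implementation.

```python
import numpy as np

def mean_kl_divergence(teacher_probs, student_probs, eps=1e-12):
    """Average KL(teacher || student) over rows of per-class probabilities."""
    p = np.clip(np.asarray(teacher_probs, dtype=float), eps, 1.0)
    q = np.clip(np.asarray(student_probs, dtype=float), eps, 1.0)
    return float(np.mean(np.sum(p * np.log(p / q), axis=1)))

# Two samples, columns ordered as [prob_class_0, prob_class_1]
teacher = np.array([[0.2, 0.8], [0.7, 0.3]])
close_student = np.array([[0.25, 0.75], [0.65, 0.35]])
far_student = np.array([[0.6, 0.4], [0.3, 0.7]])

print(f"Close student: {mean_kl_divergence(teacher, close_student):.4f}")
print(f"Far student:   {mean_kl_divergence(teacher, far_student):.4f}")
```

A value of zero means the student reproduces the teacher's probabilities exactly, and larger values mean the two distributions disagree more.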

## Generate a Report

The `AutoDistiller` can also generate a summary of the run:

In [None]:
# Generate a summary report
summary = distiller.generate_summary()
print(summary)

## Save the Best Distilled Model

Finally, let's save the best distilled model for future use:

In [None]:
# Save the best model based on KL divergence
model_path = distiller.save_best_model(
    metric='test_kl_divergence', 
    minimize=True,
    file_path='best_distilled_model.pkl'
)
print(f"Best model saved to: {model_path}")

## Conclusion

We've distilled the knowledge of a complex model (represented by pre-calculated probabilities) into a simpler, more efficient one. How closely the student matches the teacher depends on the data and the chosen configuration, so check the accuracy and KL divergence results above before relying on the distilled model.

Key benefits:
- Smaller model size
- Faster inference time
- Potentially more interpretable (depending on the model type)

This approach is particularly useful when:
- You have a large, complex model that needs to be deployed in resource-constrained environments
- You want to maintain accuracy while improving inference speed
- You need a more interpretable model for regulatory or explainability requirements