# DeepBridge AutoDistiller Demo - Using a Pre-trained Model

This notebook demonstrates how to use the DeepBridge `AutoDistiller` to compress a complex neural network model into a simpler one, starting with a pre-trained model file rather than pre-calculated probabilities.

## Overview

Knowledge distillation is a technique where a simpler model (student) learns to mimic the behavior of a more complex model (teacher). This allows us to create models that are:
- Smaller and faster
- Easier to deploy
- Often more interpretable

In this demo, we'll:
1. Create and train a neural network as our teacher model
2. Save the model to disk
3. Use AutoDistiller to create a simpler model that mimics the neural network

## Setup

First, let's import the necessary libraries:

In [1]:
import numpy as np
import pandas as pd
import matplotlib.pyplot as plt
import os
import joblib
from sklearn.datasets import make_classification
from sklearn.model_selection import train_test_split
from sklearn.preprocessing import StandardScaler
from sklearn.metrics import accuracy_score, roc_auc_score

# Neural network libraries
import tensorflow as tf
from tensorflow.keras.models import Sequential, load_model
from tensorflow.keras.layers import Dense, Dropout
from tensorflow.keras.optimizers import Adam
from tensorflow.keras.callbacks import EarlyStopping

# Import DeepBridge components
from deepbridge.db_data import DBDataset
from deepbridge.auto_distiller import AutoDistiller
from deepbridge.distillation.classification.model_registry import ModelType

# Set random seeds for reproducibility
np.random.seed(42)
tf.random.set_seed(42)

## Generate Sample Data

Let's create a synthetic classification dataset:

In [None]:
# Generate a binary classification dataset
X, y = make_classification(
    n_samples=2000, 
    n_features=20, 
    n_informative=10, 
    n_redundant=5, 
    random_state=42
)

# Split into train/test sets
X_train, X_test, y_train, y_test = train_test_split(X, y, test_size=0.2, random_state=42)

# Scale the features (important for neural networks)
scaler = StandardScaler()
X_train_scaled = scaler.fit_transform(X_train)
X_test_scaled = scaler.transform(X_test)

# Convert to DataFrames
feature_names = [f'feature_{i}' for i in range(X.shape[1])]
train_df = pd.DataFrame(X_train_scaled, columns=feature_names)
train_df['target'] = y_train

test_df = pd.DataFrame(X_test_scaled, columns=feature_names)
test_df['target'] = y_test

# Display the first few rows
print("Training data shape:", train_df.shape)
train_df.head()
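The scaler is fit on the training split only, and the same training statistics are then reused for the test split, so no test information leaks into preprocessing. As a minimal NumPy sketch of what `StandardScaler` computes (the `standardize` helper is hypothetical, written only for illustration), the transform is z = (x - μ_train) / σ_train:

```python
import numpy as np


def standardize(train, test):
    """Scale both splits with statistics computed on the training split only."""
    mu = train.mean(axis=0)
    sigma = train.std(axis=0)  # population std (ddof=0), as StandardScaler uses
    sigma = np.where(sigma == 0, 1.0, sigma)  # leave constant features unscaled
    return (train - mu) / sigma, (test - mu) / sigma


train_z, test_z = standardize(np.array([[1.0, 2.0], [3.0, 2.0]]), np.array([[5.0, 4.0]]))
print(train_z)
print(test_z)
```

Test values can fall outside the range seen in training; that is expected and is exactly why the test split must not be refit.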

## Create and Train a Neural Network Teacher Model

Now we'll create a neural network as our teacher model:

In [None]:
# Create a neural network model
def create_neural_network():
    model = Sequential([
        Dense(64, activation='relu', input_shape=(X_train.shape[1],)),
        Dropout(0.3),
        Dense(32, activation='relu'),
        Dropout(0.2),
        Dense(16, activation='relu'),
        Dense(1, activation='sigmoid')
    ])
    
    model.compile(
        optimizer=Adam(learning_rate=0.001),
        loss='binary_crossentropy',
        metrics=['accuracy']
    )
    
    return model

# Create and train the model
teacher_model = create_neural_network()

# Add early stopping
early_stopping = EarlyStopping(
    monitor='val_loss',
    patience=10,
    restore_best_weights=True
)

# Train the model
history = teacher_model.fit(
    X_train_scaled, y_train,
    epochs=50,
    batch_size=32,
    validation_split=0.2,
    callbacks=[early_stopping],
    verbose=1
)
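The parameter count reported later by `teacher_model.count_params()` follows directly from the architecture: each `Dense` layer has `n_in * n_out` weights plus `n_out` biases, and `Dropout` adds no weights. A small sketch of that arithmetic for 20 input features (`dense_params` is a hypothetical helper):

```python
def dense_params(n_in, n_out):
    # weights (n_in * n_out) plus one bias per output unit
    return n_in * n_out + n_out


# Input features, then the units of each Dense layer; Dropout layers contribute nothing
layer_sizes = [20, 64, 32, 16, 1]
total = sum(dense_params(a, b) for a, b in zip(layer_sizes, layer_sizes[1:]))
print(total)
```

This gives 1344 + 2080 + 528 + 17 = 3969 parameters, which is small for a neural network but still far more than a shallow tree or a 21-coefficient logistic regression.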

## Evaluate the Teacher Model Performance

In [None]:
# Plot training history
plt.figure(figsize=(12, 5))

plt.subplot(1, 2, 1)
plt.plot(history.history['loss'], label='Training Loss')
plt.plot(history.history['val_loss'], label='Validation Loss')
plt.title('Training and Validation Loss')
plt.xlabel('Epoch')
plt.ylabel('Loss')
plt.legend()

plt.subplot(1, 2, 2)
plt.plot(history.history['accuracy'], label='Training Accuracy')
plt.plot(history.history['val_accuracy'], label='Validation Accuracy')
plt.title('Training and Validation Accuracy')
plt.xlabel('Epoch')
plt.ylabel('Accuracy')
plt.legend()

plt.tight_layout()
plt.show()
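The curves may end before epoch 50 because of the `EarlyStopping` callback (`monitor='val_loss'`, `patience=10`, `restore_best_weights=True`). The stopping rule can be sketched in plain Python as below. This is a simplified model of the idea (no `min_delta`, `baseline`, or `start_from_epoch` handling), not Keras's actual implementation, and `early_stopping_epoch` is a hypothetical name:

```python
def early_stopping_epoch(val_losses, patience):
    """Return (stop_epoch, best_epoch), 0-indexed, under a simplified rule:
    stop once val_loss has failed to improve for `patience` consecutive epochs."""
    best_loss = float("inf")
    best_epoch = 0
    wait = 0
    for epoch, loss in enumerate(val_losses):
        if loss < best_loss:
            best_loss, best_epoch, wait = loss, epoch, 0
        else:
            wait += 1
            if wait >= patience:
                return epoch, best_epoch
    # Patience never ran out: training used every epoch
    return len(val_losses) - 1, best_epoch


print(early_stopping_epoch([1.0, 0.8, 0.7, 0.75, 0.72, 0.71], patience=3))
```

With `restore_best_weights=True`, the weights kept are those from `best_epoch`, not from the epoch where training stopped.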

In [None]:
# Evaluate on test set
test_loss, test_acc = teacher_model.evaluate(X_test_scaled, y_test, verbose=0)
print(f"Test Accuracy: {test_acc:.4f}")

# Get predicted probabilities for the positive class as a 1-D array
y_pred_proba = teacher_model.predict(X_test_scaled, verbose=0).ravel()
y_pred = (y_pred_proba > 0.5).astype(int)

# Calculate AUC
auc = roc_auc_score(y_test, y_pred_proba)
print(f"AUC-ROC: {auc:.4f}")
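Unlike accuracy, AUC-ROC does not depend on the 0.5 threshold. It equals the probability that a randomly chosen positive example receives a higher score than a randomly chosen negative one, with ties counted as half. A minimal pairwise sketch of that definition (the `auc_pairwise` helper is hypothetical; it uses O(n_pos × n_neg) memory, which is fine for a 400-row test set):

```python
import numpy as np


def auc_pairwise(y_true, scores):
    y_true = np.asarray(y_true).ravel()
    scores = np.asarray(scores, dtype=float).ravel()  # accepts Keras-style (n, 1) output
    pos = scores[y_true == 1]
    neg = scores[y_true == 0]
    # Compare every positive score with every negative score; ties count as half
    greater = (pos[:, None] > neg[None, :]).sum()
    ties = (pos[:, None] == neg[None, :]).sum()
    return (greater + 0.5 * ties) / (len(pos) * len(neg))


print(auc_pairwise([0, 0, 1, 1], [0.1, 0.4, 0.35, 0.8]))
```

Here 3 of the 4 positive/negative pairs are ranked correctly, so the AUC is 0.75.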

## Save the Teacher Model

Now let's save the trained network to disk in TensorFlow's format. DeepBridge loads the teacher from a pickled file, so we also wrap the network in a small class that exposes the scikit-learn-style `predict` / `predict_proba` interface and save that wrapper with `joblib`:

In [None]:
# Create a directory for models if it doesn't exist
os.makedirs('models', exist_ok=True)

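# Save the TensorFlow model (a SavedModel directory in TF 2.x with Keras 2;
# Keras 3, i.e. TF >= 2.16, instead requires a '.keras' or legacy '.h5' file path)
tf_model_path = 'models/teacher_nn_model'
teacher_model.save(tf_model_path)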
print(f"TensorFlow model saved to {tf_model_path}")

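# Create a scikit-learn compatible wrapper for the neural network.
# Only the model path is pickled; the Keras model is reloaded on unpickling.
class NeuralNetworkWrapper:
    def __init__(self, model_path):
        self.model_path = model_path
        self.model = load_model(model_path)
        self.classes_ = np.array([0, 1])

    def __getstate__(self):
        # Exclude the Keras model object, which may not pickle reliably
        state = self.__dict__.copy()
        state.pop('model', None)
        return state

    def __setstate__(self, state):
        self.__dict__.update(state)
        self.model = load_model(self.model_path)

    def predict(self, X):
        # Threshold the positive-class column to get one label per row
        return (self.predict_proba(X)[:, 1] > 0.5).astype(int)

    def predict_proba(self, X):
        # The sigmoid output has shape (n, 1); flatten it to (n,)
        p = self.model.predict(np.asarray(X, dtype='float32'), verbose=0).ravel()
        # Return an (n, 2) array of [P(class 0), P(class 1)], as sklearn does
        return np.column_stack([1 - p, p])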

# Create and save the wrapper
nn_wrapper = NeuralNetworkWrapper(tf_model_path)
sklearn_model_path = 'models/teacher_sklearn_model.pkl'
joblib.dump(nn_wrapper, sklearn_model_path)
print(f"Scikit-learn compatible model saved to {sklearn_model_path}")
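The scikit-learn `predict_proba` contract is one row per sample and one column per entry of `classes_`, with each row summing to 1. A single sigmoid output only gives P(class 1), so the wrapper stacks `1 - p` and `p`, and labels must then come from the positive-class column alone: thresholding the whole two-column array returns an (n, 2) array instead of one label per row. A standalone sketch of both steps (`to_two_column` and `labels_from_proba` are hypothetical helpers):

```python
import numpy as np


def to_two_column(p_positive):
    # Flatten a Keras-style (n, 1) sigmoid output and build [P(0), P(1)] columns
    p = np.asarray(p_positive, dtype=float).ravel()
    return np.column_stack([1.0 - p, p])


def labels_from_proba(proba, threshold=0.5):
    # Use only the positive-class column so the result is 1-D
    return (proba[:, 1] > threshold).astype(int)


proba = to_two_column(np.array([[0.2], [0.9]]))
print(proba)
print(labels_from_proba(proba))
print((proba > 0.5).shape)  # thresholding both columns keeps shape (n, 2)
```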

## Test the Saved Model

Let's verify that our saved model wrapper works correctly:

In [None]:
# Load the model wrapper
loaded_model = joblib.load(sklearn_model_path)

# Test predictions
test_probs = loaded_model.predict_proba(X_test_scaled)
test_preds = loaded_model.predict(X_test_scaled)

# Check shape and format
print(f"Probability predictions shape: {test_probs.shape}")
print(f"First 5 probability predictions:\n{test_probs[:5]}")
print(f"Binary predictions shape: {test_preds.shape}")
print(f"First 5 predictions: {test_preds[:5]}")

# Verify accuracy
accuracy = accuracy_score(y_test, test_preds)
print(f"Loaded model accuracy: {accuracy:.4f}")

# DEEPBRIDGE #

## Create DBDataset with Model Path

Now we'll create a DBDataset using our saved model path:

In [None]:
# Create DBDataset with model path
dataset = DBDataset(
    train_data=train_df,
    test_data=test_df,
    target_column='target',
    model_path=sklearn_model_path
)

# Verify the dataset
print(dataset)

## Run AutoDistiller with the Model

Now that we have our dataset with the model path configured, we can run the AutoDistiller to find the best student model:

In [None]:
# Initialize the AutoDistiller
distiller = AutoDistiller(
    dataset=dataset,
    output_dir="nn_distillation_results",
    test_size=0.2,  # For internal validation
    n_trials=10,    # Number of hyperparameter trials
    random_state=42,
    verbose=True
)

# Customize the configuration to test different models
distiller.customize_config(
    model_types=[
        ModelType.LOGISTIC_REGRESSION,
        ModelType.DECISION_TREE,
        ModelType.GBM,
        ModelType.XGB
    ],
    temperatures=[0.5, 1.0, 2.0],
    alphas=[0.3, 0.5, 0.7]
)
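The `temperatures` and `alphas` grids correspond to the two knobs of knowledge distillation in the formulation popularized by Hinton et al. (2015): the temperature T softens the teacher's probabilities so the student also learns how confident the teacher is, and alpha balances matching the teacher against fitting the true labels. The NumPy sketch below shows one common binary form of that loss. How DeepBridge applies T and alpha internally (which term alpha weights, whether the T² factor is used) is an assumption here, and `soften`, `binary_kl`, `binary_cross_entropy`, and `distillation_loss` are hypothetical names.

```python
import numpy as np

EPS = 1e-7  # clipping bound to keep logarithms finite


def soften(p, temperature):
    # Map a sigmoid probability back to its logit, divide by T, re-apply the sigmoid
    p = np.clip(np.asarray(p, dtype=float), EPS, 1 - EPS)
    logit = np.log(p / (1 - p))
    return 1.0 / (1.0 + np.exp(-logit / temperature))


def binary_kl(p, q):
    # Element-wise KL(p || q) between two Bernoulli distributions
    p = np.clip(p, EPS, 1 - EPS)
    q = np.clip(q, EPS, 1 - EPS)
    return p * np.log(p / q) + (1 - p) * np.log((1 - p) / (1 - q))


def binary_cross_entropy(y, q):
    q = np.clip(q, EPS, 1 - EPS)
    return -(y * np.log(q) + (1 - y) * np.log(1 - q))


def distillation_loss(y_true, p_teacher, p_student, temperature, alpha):
    # alpha weights the soft (teacher-matching) term; 1 - alpha weights the hard-label term.
    # The T**2 factor keeps the soft term's gradients on a comparable scale across temperatures.
    y_true = np.asarray(y_true, dtype=float)
    soft = binary_kl(soften(p_teacher, temperature), soften(p_student, temperature))
    hard = binary_cross_entropy(y_true, np.asarray(p_student, dtype=float))
    return float(np.mean(alpha * temperature**2 * soft + (1 - alpha) * hard))


print(soften(0.9, 2.0))
print(distillation_loss([1, 0], [0.8, 0.3], [0.6, 0.4], temperature=2.0, alpha=0.5))
```

For example, `soften(0.9, 2.0)` is 0.75 up to floating-point rounding: the logit ln 9 is halved to ln 3, and sigmoid(ln 3) = 3/4. Higher temperatures pull probabilities further toward 0.5.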

In [None]:
# Run the distillation process
# use_probabilities=False tells AutoDistiller to generate the teacher's probabilities
# with the model loaded from model_path, rather than read pre-computed probabilities
results_df = distiller.run(use_probabilities=False)

# Display the results
results_df

## Analyze Results

Now let's analyze the results to find the best student model:

In [None]:
# Find the best model based on accuracy
best_accuracy_config = distiller.find_best_model(metric='test_accuracy')
print("Best model configuration by accuracy:")
for key, value in best_accuracy_config.items():
    if key not in ['best_params']:
        print(f"  {key}: {value}")

# Find the best model based on KL divergence
best_kl_config = distiller.find_best_model(metric='test_kl_divergence', minimize=True)
print("\nBest model configuration by KL divergence (lower is better):")
for key, value in best_kl_config.items():
    if key not in ['best_params']:
        print(f"  {key}: {value}")

# Compare the accuracy of the teacher model with the distilled model
print("\nModel Performance Comparison:")
print(f"  Teacher Neural Network Accuracy: {test_acc:.4f}")
print(f"  Best Distilled Model Accuracy: {best_accuracy_config.get('test_accuracy', 'N/A')}")
print(f"  Teacher Neural Network AUC: {auc:.4f}")
print(f"  Best Distilled Model AUC: {best_accuracy_config.get('test_auc_roc', 'N/A')}")

# Generate a summary report
print("\n----- Summary Report -----")
summary = distiller.generate_summary()
print(summary)

# Create visualizations
distiller.create_visualizations()
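`test_kl_divergence` presumably measures how far the student's predicted class distribution is from the teacher's, which is why lower is better and `minimize=True` is passed. For binary outputs, the row-averaged KL(teacher || student) can be sketched as follows. The direction, averaging, and any temperature scaling DeepBridge uses are assumptions, and `mean_binary_kl` is a hypothetical name:

```python
import numpy as np


def mean_binary_kl(p_teacher, p_student, eps=1e-7):
    # Positive-class probabilities from each model, clipped away from 0 and 1
    p = np.clip(np.asarray(p_teacher, dtype=float).ravel(), eps, 1 - eps)
    q = np.clip(np.asarray(p_student, dtype=float).ravel(), eps, 1 - eps)
    # KL between the Bernoulli distributions [1 - p, p] and [1 - q, q], per row
    kl = p * np.log(p / q) + (1 - p) * np.log((1 - p) / (1 - q))
    return float(kl.mean())


print(mean_binary_kl([0.9, 0.2], [0.9, 0.2]))
print(mean_binary_kl([0.5], [0.25]))
```

A student that reproduces the teacher's probabilities exactly scores 0. Because KL compares full probability distributions rather than thresholded labels, the model that minimizes it is not necessarily the one with the highest accuracy.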

In [None]:
# Save the best distilled model
model_path = distiller.save_best_model(
    metric='test_kl_divergence', 
    minimize=True,
    file_path='models/best_distilled_model.pkl'
)
print(f"\nBest model saved to: {model_path}")

# Compare model sizes (a SavedModel is a directory with subfolders, so walk all of it)
def path_size(path):
    if os.path.isfile(path):
        return os.path.getsize(path)
    return sum(
        os.path.getsize(os.path.join(root, name))
        for root, _, files in os.walk(path)
        for name in files
    )

tf_model_size = path_size(tf_model_path)
distilled_model_size = os.path.getsize(model_path)

print("\nModel Size Comparison:")
print(f"  Neural Network Teacher Model: {tf_model_size / (1024*1024):.2f} MB")
print(f"  Distilled Model: {distilled_model_size / (1024*1024):.2f} MB")
print(f"  Compression Ratio: {tf_model_size / distilled_model_size:.1f}x")

In [None]:
# Check inference speed
import time

n_runs = 100

# Measure neural network inference time (verbose=0 suppresses 100 progress bars)
start_time = time.perf_counter()
for _ in range(n_runs):
    teacher_model.predict(X_test_scaled, verbose=0)
nn_time = time.perf_counter() - start_time

# Load the distilled model
best_distilled_model = joblib.load(model_path)

# Measure distilled model inference time
start_time = time.perf_counter()
for _ in range(n_runs):
    best_distilled_model.predict(X_test_scaled)
distilled_time = time.perf_counter() - start_time

print(f"\nInference Speed Comparison ({n_runs} passes over {len(X_test_scaled)} test rows):")
print(f"  Neural Network Time: {nn_time:.4f} seconds")
print(f"  Distilled Model Time: {distilled_time:.4f} seconds")
print(f"  Speedup: {nn_time / distilled_time:.1f}x")
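A single timed loop mixes one-time costs (graph tracing, cache warm-up) with steady-state cost, and `model.predict` adds per-call overhead that dominates on small batches. A slightly more careful sketch, using only the standard library, warms up first and reports the median per-call time (`median_runtime` is a hypothetical helper):

```python
import statistics
import time


def median_runtime(fn, *args, repeats=20, warmup=2):
    """Median wall-clock seconds per call of fn(*args)."""
    for _ in range(warmup):
        fn(*args)  # warm-up calls absorb one-time costs
    timings = []
    for _ in range(repeats):
        start = time.perf_counter()
        fn(*args)
        timings.append(time.perf_counter() - start)
    return statistics.median(timings)


print(median_runtime(sum, [1, 2, 3], repeats=5))
```

In this notebook it could be called as `median_runtime(best_distilled_model.predict, X_test_scaled)` and, for the teacher, `median_runtime(lambda X: teacher_model(X, training=False), X_test_scaled)`; calling the Keras model directly avoids `predict`'s batching overhead and gives a fairer comparison on a 400-row batch.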

In [None]:
# Conclusion (the saved model was selected by KL divergence, so report its accuracy)
distilled_acc = best_kl_config.get('test_accuracy', float('nan'))
print("\n----- Conclusion -----")
print(f"We distilled a neural network with {teacher_model.count_params():,} parameters")
print(f"into a simpler {best_kl_config['model_type']} model. Compared with the teacher, it is:")
print(f"  - {tf_model_size / distilled_model_size:.1f}x smaller on disk")
print(f"  - {nn_time / distilled_time:.1f}x faster at inference")
print(f"  - Accuracy: {test_acc:.4f} (teacher) vs {distilled_acc:.4f} (distilled)")
print("\nWhen the accuracy gap is small, this shows how knowledge distillation can preserve")
print("performance while reducing resource requirements, making models easier to deploy in production.")

In [None]:
# Additional analysis: Feature importance comparison
if hasattr(best_distilled_model, 'student_model') and hasattr(best_distilled_model.student_model, 'feature_importances_'):
    # Plot feature importance for the distilled model
    plt.figure(figsize=(12, 6))
    
    # Get feature importance
    importances = best_distilled_model.student_model.feature_importances_
    indices = np.argsort(importances)[::-1]
    
    plt.bar(range(X_test.shape[1]), importances[indices], align='center')
    plt.xticks(range(X_test.shape[1]), [feature_names[i] for i in indices], rotation=90)
    plt.title('Feature Importance in Distilled Model')
    plt.xlabel('Features')
    plt.ylabel('Importance')
    plt.tight_layout()
    plt.show()
    
    print("\nThe feature importance visualization shows which features the distilled model relies on most.")
    print("This can provide insights into model interpretability, which is an additional benefit of")
    print("using simpler models through knowledge distillation.")

# Save experiment artifacts
os.makedirs('experiment_results', exist_ok=True)
results_df.to_csv('experiment_results/distillation_results.csv', index=False)
print("\nExperiment results saved to: experiment_results/distillation_results.csv")

# Create label predictions on test data using both models
teacher_test_preds = (teacher_model.predict(X_test_scaled, verbose=0).ravel() > 0.5).astype(int)
distilled_test_preds = np.asarray(best_distilled_model.predict(X_test_scaled)).ravel().astype(int)

# Check agreement between teacher and student labels (both 1-D, so the comparison is element-wise)
agreement = np.mean(teacher_test_preds == distilled_test_preds)
print(f"\nAgreement between teacher and distilled model predictions: {agreement:.4f} ({agreement*100:.1f}%)")

print("\nFinal Remarks:")
print("1. The distilled model's accuracy can be compared directly with the neural network's (see above)")
print("2. A smaller model file eases deployment in constrained environments")
print("3. Faster inference matters for real-time applications")
print("4. The simpler model may offer better interpretability and explainability")
print("\nThese benefits make knowledge distillation a valuable technique for productionizing")
print("complex models while maintaining their performance characteristics.")
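The agreement rate (sometimes called fidelity) is only meaningful when both models produce 1-D label arrays. Keras's `predict` returns probabilities with shape (n, 1), and comparing that directly with an (n,) label array silently broadcasts to an (n, n) matrix. A small sketch showing the pitfall and a shape-safe version (`agreement_rate` is a hypothetical helper; the arrays are toy values):

```python
import numpy as np


def agreement_rate(teacher_proba, student_labels, threshold=0.5):
    # Convert teacher probabilities to 1-D labels before comparing
    teacher_labels = (np.asarray(teacher_proba, dtype=float).ravel() > threshold).astype(int)
    student_labels = np.asarray(student_labels).ravel().astype(int)
    if teacher_labels.shape != student_labels.shape:
        raise ValueError("teacher and student must score the same rows")
    return float(np.mean(teacher_labels == student_labels))


teacher_proba = np.array([[0.9], [0.2], [0.6], [0.4]])  # Keras-style (n, 1) output
student_labels = np.array([1, 0, 0, 0])
print((teacher_proba == student_labels).shape)  # broadcasting, not element-wise
print(agreement_rate(teacher_proba, student_labels))
```

Here the teacher's labels are [1, 0, 1, 0], so the models agree on 3 of 4 rows (0.75), while the naive comparison yields a 4 × 4 boolean matrix whose mean has no useful interpretation.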