## 6. Using the Experiment Class

Now that our `DBDataset` is set up, we can use the `Experiment` class to run knowledge distillation experiments. `Experiment` manages the whole workflow: it splits the data into training and test sets, trains the student model, and evaluates it against the teacher.
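Both experiments below pass `test_size=0.2`. The outputs imply a dataset of 569 rows: the debug logs report 455 training rows, and the classification report in Section 10 has 114 test rows. Assuming `Experiment` splits rows the way scikit-learn's `train_test_split` does (the test count is rounded up and the training set gets the remainder), the arithmetic works out as follows. `split_sizes` is a hypothetical helper for illustration, not DeepBridge code.

```python
import math

def split_sizes(n_rows: int, test_size: float) -> tuple[int, int]:
    """Return (n_train, n_test) for a fractional test_size.

    Hypothetical helper mirroring scikit-learn's convention: the test
    count is rounded up and the training set receives the remainder.
    """
    if not 0.0 < test_size < 1.0:
        raise ValueError("test_size must be a fraction between 0 and 1")
    n_test = math.ceil(test_size * n_rows)
    return n_rows - n_test, n_test

n_train, n_test = split_sizes(569, 0.2)
print(n_train, n_test)  # 455 114
```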

In [8]:
# Create an Experiment instance with our DBDataset
experiment = Experiment(
    dataset=db_dataset,
    experiment_type="binary_classification",
    test_size=0.2,
    random_state=42
)

# Run distillation with a surrogate model
experiment.fit(
    student_model_type=ModelType.LOGISTIC_REGRESSION,
    temperature=1.0,  # Temperature parameter for knowledge distillation
    alpha=0.5,        # Alpha parameter (weight between teacher and true labels)
    use_probabilities=True,  # Use pre-calculated probabilities
    distillation_method="surrogate"  # Use surrogate model distillation
)

# Get the test metrics
test_metrics = experiment.results['test']

# Display performance metrics; missing metrics fall back to NaN, because a
# string default such as 'N/A' cannot be formatted with :.3f
print("Surrogate Model Performance:")
print(f"- Test Accuracy: {test_metrics.get('accuracy', float('nan')):.3f}")
print(f"- Test AUC-ROC: {test_metrics.get('auc_roc', float('nan')):.3f}")
print(f"- KL Divergence: {test_metrics.get('kl_divergence', float('nan')):.3f}")
print(f"- KS Statistic: {test_metrics.get('ks_statistic', float('nan')):.3f} "
      f"(p-value: {test_metrics.get('ks_pvalue', float('nan')):.3f})")
print(f"- R² Score: {test_metrics.get('r2_score', float('nan')):.3f}")

=== Evaluating distillation model on train dataset ===
Using pre-calculated probabilities
Student probabilities shape: (455, 2)
First 3 student probabilities: [[0.98 0.02]
 [0.91 0.09]
 [0.14 0.86]]
Teacher probabilities type: <class 'pandas.core.frame.DataFrame'>
Teacher probabilities shape: (455, 2)
First 3 teacher probabilities: [[0.99 0.01]
 [0.75 0.25]
 [0.1  0.9 ]]
Teacher prob first 5 values: [0.01 0.25 0.9  0.56 0.98]
KS Statistic calculation: 0.03076923076923077, p-value: 0.9704654051871452
R² calculation successful: 0.9774650669825795
Evaluation metrics: {'accuracy': 0.9692307692307692, 'precision': 0.9692982456140351, 'recall': 0.9541284403669725, 'f1_score': 0.9616122840690979, 'auc_roc': 0.9873020305869752, 'auc_pr': 0.9865493718091531, 'log_loss': 0.10256498766755073, 'kl_divergence': 0.019997600911793095, 'ks_statistic': 0.03076923076923077, 'ks_pvalue': 0.9704654051871452, 'r2_score': 0.9774650669825795, 'distillation_method': 'SurrogateModel'}
=== Evaluation complete =
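The evaluation reports a KL divergence and a KS statistic between the teacher's and the student's probabilities. DeepBridge's exact implementation is not shown in this notebook, so the following is a minimal sketch of the standard definitions under stated assumptions: KL is averaged over samples with the teacher as the reference distribution, and KS compares the empirical distributions of the class-1 probabilities. The function names and the `eps` clipping are assumptions. For the KS p-value as well as the statistic, `scipy.stats.ks_2samp` returns both.

```python
import math

def mean_kl_divergence(teacher, student, eps=1e-12):
    """Average per-sample KL(teacher || student) for binary probabilities.

    teacher, student: sequences of P(class 1), one value per sample.
    Probabilities are clipped to [eps, 1 - eps] to avoid log(0).
    """
    if not teacher or len(teacher) != len(student):
        raise ValueError("need two non-empty sequences of equal length")
    total = 0.0
    for p, q in zip(teacher, student):
        p = min(max(p, eps), 1 - eps)
        q = min(max(q, eps), 1 - eps)
        # Sum over both classes: p*log(p/q) + (1-p)*log((1-p)/(1-q))
        total += p * math.log(p / q) + (1 - p) * math.log((1 - p) / (1 - q))
    return total / len(teacher)

def ks_statistic(a, b):
    """Two-sample Kolmogorov-Smirnov statistic: the largest gap between
    the empirical CDFs of samples a and b."""
    if not a or not b:
        raise ValueError("both samples must be non-empty")
    a, b = sorted(a), sorted(b)
    i = j = 0
    d = 0.0
    while i < len(a) and j < len(b):
        x = min(a[i], b[j])
        # Advance past every value <= x in both samples (handles ties)
        while i < len(a) and a[i] <= x:
            i += 1
        while j < len(b) and b[j] <= x:
            j += 1
        d = max(d, abs(i / len(a) - j / len(b)))
    return d

# First three class-1 probabilities from the log above (too few rows to
# reproduce the reported metrics; shown only to exercise the functions)
teacher_p1 = [0.01, 0.25, 0.9]
student_p1 = [0.02, 0.09, 0.86]
print(mean_kl_divergence(teacher_p1, student_p1))
print(ks_statistic(teacher_p1, student_p1))
```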

## 7. Try Knowledge Distillation Method

Now let's try the `knowledge_distillation` method, which trains the student on both the teacher's probabilities (soft labels) and the true labels, with `alpha` weighting the two sources.
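To make "teacher probabilities and true labels" concrete, here is a minimal per-sample loss for a binary student, assuming the common formulation: a weighted sum of the cross-entropy against the teacher's probability and the cross-entropy against the true label. Which term `alpha` multiplies is an assumption here (DeepBridge may use the opposite convention), and temperature is left out (equivalent to `temperature=1.0`, as in the cell below). The function names are hypothetical.

```python
import math

def binary_cross_entropy(target, pred, eps=1e-12):
    """Cross-entropy between a target P(class 1) and a predicted P(class 1)."""
    pred = min(max(pred, eps), 1 - eps)  # clip to avoid log(0)
    return -(target * math.log(pred) + (1 - target) * math.log(1 - pred))

def distillation_loss(y_true, teacher_p1, student_p1, alpha=0.5):
    """Hypothetical per-sample KD loss (temperature omitted, i.e. T = 1).

    Assumed convention: alpha weights the teacher (soft-label) term and
    1 - alpha weights the true-label (hard-label) term.
    """
    soft_loss = binary_cross_entropy(teacher_p1, student_p1)
    hard_loss = binary_cross_entropy(y_true, student_p1)
    return alpha * soft_loss + (1 - alpha) * hard_loss

# Hypothetical sample: true label 1, teacher says 0.9, student says 0.8
print(distillation_loss(y_true=1, teacher_p1=0.9, student_p1=0.8, alpha=0.5))
```

With `alpha=1.0` the student only imitates the teacher; with `alpha=0.0` it reduces to ordinary logistic-regression training on the true labels.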

In [9]:
# Create a new experiment for knowledge distillation
experiment_kd = Experiment(
    dataset=db_dataset,
    experiment_type="binary_classification",
    test_size=0.2,
    random_state=42
)

# Run knowledge distillation
experiment_kd.fit(
    student_model_type=ModelType.LOGISTIC_REGRESSION,
    temperature=1.0,
    alpha=0.5,
    use_probabilities=True,
    distillation_method="knowledge_distillation"  # Use knowledge distillation method
)

# Get the test metrics
test_metrics_kd = experiment_kd.results['test']

# Display performance metrics; missing metrics fall back to NaN, because a
# string default such as 'N/A' cannot be formatted with :.3f
print("Knowledge Distillation Performance:")
print(f"- Test Accuracy: {test_metrics_kd.get('accuracy', float('nan')):.3f}")
print(f"- Test AUC-ROC: {test_metrics_kd.get('auc_roc', float('nan')):.3f}")
print(f"- KL Divergence: {test_metrics_kd.get('kl_divergence', float('nan')):.3f}")
print(f"- KS Statistic: {test_metrics_kd.get('ks_statistic', float('nan')):.3f} "
      f"(p-value: {test_metrics_kd.get('ks_pvalue', float('nan')):.3f})")
print(f"- R² Score: {test_metrics_kd.get('r2_score', float('nan')):.3f}")


=== DEBUG: _get_teacher_soft_labels ===
X shape: (455, 5)
teacher_model: False
teacher_probabilities: True
Using pre-calculated probabilities
teacher_probabilities is DataFrame with shape (455, 2)
teacher_probabilities columns: ['prob_class_0', 'prob_class_1']
Found prob_class_0 and prob_class_1 columns
First 3 probabilities: [[0.99 0.01]
 [0.75 0.25]
 [0.1  0.9 ]]
Probabilities shape: (455, 2)
Applying temperature=1.0 scaling
Soft labels shape: (455, 2)
First 3 soft labels: [[0.99 0.01]
 [0.75 0.25]
 [0.1  0.9 ]]
=== END DEBUG ===

=== EVALUATING DISTILLATION MODEL ===
Student probabilities shape: (455, 2)
First 3 student probabilities: [[0.97806519 0.02193481]
 [0.69868593 0.30131407]
 [0.07183229 0.92816771]]
Teacher soft labels shape: (455, 2)
First 3 teacher soft labels: [[0.99 0.01]
 [0.75 0.25]
 [0.1  0.9 ]]
teacher_prob shape: (455,)
First 5 teacher_prob values: [0.01 0.25 0.9  0.56 0.98]
y_prob stats: min=0.021934813298069293, max=0.9999983614375593, mean=0.34131406923242724
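The debug log reports "Applying temperature=1.0 scaling", and the soft labels come out identical to the input probabilities. That is what the usual temperature-scaling rule, softmax(log(p) / T), does at T = 1. The sketch below assumes DeepBridge applies this rule row by row; `temperature_scale` is a hypothetical name.

```python
import math

def temperature_scale(probs, temperature):
    """Soften one row of class probabilities: softmax(log(p) / T).

    T = 1 returns the probabilities unchanged; T > 1 pushes them toward
    uniform; T < 1 sharpens them.
    """
    if temperature <= 0:
        raise ValueError("temperature must be positive")
    logits = [math.log(max(p, 1e-12)) / temperature for p in probs]
    m = max(logits)  # subtract the max for numerical stability
    exps = [math.exp(z - m) for z in logits]
    total = sum(exps)
    return [e / total for e in exps]

row = [0.75, 0.25]  # second teacher row from the log
print(temperature_scale(row, 1.0))  # unchanged (up to float rounding)
print(temperature_scale(row, 2.0))  # softer: about [0.634, 0.366]
```

Higher temperatures expose more of the teacher's relative confidence between classes, which is why KD setups often use T > 1; at the T = 1 used here the student simply sees the teacher's raw probabilities.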


## 8. Compare Student Models with the Teacher

Now let's compare the student models with the original teacher model, whose test accuracy (`test_accuracy`) was computed earlier in the notebook.

In [10]:
# Compare with original teacher model
print("Original Teacher Model:")
print(f"- Test Accuracy: {test_accuracy:.4f}")
print()

print("Comparison with Distilled Models:")
print("1. Surrogate Model (LogisticRegression)")
print(f"   - Test Accuracy: {test_metrics['accuracy']:.4f} (Δ: {test_metrics['accuracy'] - test_accuracy:+.4f})")
print(f"   - Distribution Similarity (R²): {test_metrics['r2_score']:.4f}")
print()
print("2. Knowledge Distillation (LogisticRegression)")
print(f"   - Test Accuracy: {test_metrics_kd['accuracy']:.4f} (Δ: {test_metrics_kd['accuracy'] - test_accuracy:+.4f})")
print(f"   - Distribution Similarity (R²): {test_metrics_kd['r2_score']:.4f}")
print()
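# Derive the summary from the metrics instead of hard-coding it
# (0.01 is an arbitrary tolerance for "similar" accuracy)
accuracy_gap = abs(test_metrics['accuracy'] - test_metrics_kd['accuracy'])
if accuracy_gap < 0.01 and test_metrics_kd['r2_score'] > test_metrics['r2_score']:
    print("Both distillation methods achieved similar accuracy but Knowledge Distillation produced better distribution matching.")
else:
    print("The two distillation methods differ in accuracy or in distribution matching; see the metrics above.")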

Original Teacher Model:
- Test Accuracy: 0.9440

Comparison with Distilled Models:
1. Surrogate Model (LogisticRegression)
   - Test Accuracy: 0.9474 (Δ: +0.0034)
   - Distribution Similarity (R²): 0.9900

2. Knowledge Distillation (LogisticRegression)
   - Test Accuracy: 0.9474 (Δ: +0.0034)
   - Distribution Similarity (R²): 0.9995

Both distillation methods achieved similar accuracy but Knowledge Distillation produced better distribution matching.
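The comparison labels R² as "Distribution Similarity". The notebook does not show how DeepBridge computes it; one plausible reading, assumed here, is the coefficient of determination with the teacher's class-1 probabilities as targets and the student's as predictions, R² = 1 - SS_res / SS_tot. Values near 1 mean the student's probabilities track the teacher's closely. The library might instead compare sorted (quantile) values, so treat this as a sketch.

```python
def r2_score(targets, preds):
    """Coefficient of determination: 1 - SS_res / SS_tot."""
    if len(targets) != len(preds) or len(targets) < 2:
        raise ValueError("need two equal-length sequences with at least 2 values")
    mean_t = sum(targets) / len(targets)
    ss_res = sum((t - p) ** 2 for t, p in zip(targets, preds))
    ss_tot = sum((t - mean_t) ** 2 for t in targets)
    if ss_tot == 0:
        raise ValueError("targets are constant; R² is undefined")
    return 1.0 - ss_res / ss_tot

teacher_p1 = [0.01, 0.25, 0.9, 0.56, 0.98]  # first 5 teacher values from the log
student_p1 = [0.02, 0.30, 0.93, 0.50, 0.97]  # hypothetical student values
print(round(r2_score(teacher_p1, student_p1), 4))
```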


## 9. Extracting Model Information

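Let's inspect the distilled model: its type, the distillation method that produced it, and a feature-importance ranking based on the magnitude of the logistic regression coefficients. Coefficient magnitudes are only directly comparable across features when the features are on a similar scale (for example, after standardization).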

In [11]:
# Get student model from knowledge distillation experiment
kd_model = experiment_kd.distillation_model

# Print model information
print("Knowledge Distillation Model Information:")
print(f"- Model Type: {kd_model.model.__class__.__name__}")
print(f"- Distillation Method: {kd_model.__class__.__name__}")
print()

# Extract feature importance
if hasattr(kd_model.model, 'coef_'):
    # For linear models like LogisticRegression, use the absolute value of
    # each coefficient as a rough importance score
    coef_magnitudes = abs(kd_model.model.coef_[0])
    feature_importance = dict(zip(db_dataset.features, coef_magnitudes))

    # Sort features by importance, highest first
    sorted_features = sorted(feature_importance.items(), key=lambda x: x[1], reverse=True)

    print("Feature Importance:")
    for feature, score in sorted_features:
        print(f"- {feature}: {score:.3f}")

Knowledge Distillation Model Information:
- Model Type: LogisticRegression
- Distillation Method: KnowledgeDistillation

Feature Importance:
- mean radius: 1.227
- mean area: 0.812
- mean perimeter: 0.623
- mean texture: 0.343
- mean smoothness: 0.151
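The custom example in Section 11 passes raw values such as `mean area` = 900.0, so the coefficients above appear to be in raw feature units, and their magnitudes mix "how much the feature matters" with "what units it is measured in". The sketch below uses a hypothetical two-feature model to show that re-expressing a feature in different units changes its coefficient without changing any prediction. It also shows one common adjustment, assumed here rather than taken from DeepBridge: multiplying each coefficient by its feature's standard deviation (the coefficient of the same model re-expressed on standardized features; refitting with regularization would give somewhat different values).

```python
import statistics

def logit(coefs, intercept, row):
    """Linear score w . x + b of a logistic regression."""
    return sum(w * x for w, x in zip(coefs, row)) + intercept

def std_scaled_importance(coefs, columns):
    """|w_j * std(x_j)|: coefficient magnitude per one-standard-deviation
    change in feature j (columns are lists of values, same order as coefs)."""
    return [abs(w * statistics.stdev(col)) for w, col in zip(coefs, columns)]

# Hypothetical two-feature model: [mean area, mean smoothness]
coefs, intercept = [0.004, 2.0], -1.0
row = [900.0, 0.1]

# Re-express mean area in thousands: the coefficient grows 1000x,
# but the score (and hence the prediction) is unchanged
coefs_k = [coefs[0] * 1000, coefs[1]]
row_k = [row[0] / 1000, row[1]]
print(logit(coefs, intercept, row), logit(coefs_k, intercept, row_k))

# Per-SD importance on hypothetical feature columns
area = [500.0, 900.0, 1300.0]
smoothness = [0.08, 0.10, 0.12]
print(std_scaled_importance(coefs, [area, smoothness]))  # about [1.6, 0.04]
```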


## 10. Making Predictions with the Distilled Model

Now let's use our distilled model to make predictions on the held-out test set and evaluate them with a confusion matrix and a classification report.

In [12]:
from sklearn.metrics import confusion_matrix, classification_report

# Get predictions from the knowledge distillation model
student_predictions = experiment_kd.get_student_predictions(dataset='test')

# Display some example predictions
print("Example predictions from knowledge distillation model:")
print(student_predictions.head())
print()

# Calculate confusion matrix
cm = confusion_matrix(student_predictions['y_true'], student_predictions['y_pred'])
print("Confusion Matrix:")
print(cm)
print()

# Generate classification report
cr = classification_report(student_predictions['y_true'], student_predictions['y_pred'])
print("Classification Report:")
print(cr)

Example predictions from knowledge distillation model:
   y_true  y_pred   prob_0   prob_1
0       0       0  0.96244  0.03756
1       0       0  0.75431  0.24569
2       0       0  0.93902  0.06098
3       1       1  0.05483  0.94517
4       0       0  0.98437  0.01563

Confusion Matrix:
[[67  3]
 [ 3 41]]

Classification Report:
              precision    recall  f1-score   support

           0       0.96      0.96      0.96        70
           1       0.93      0.93      0.93        44

    accuracy                           0.95       114
   macro avg       0.94      0.94      0.94       114
weighted avg       0.95      0.95      0.95       114


## 11. Creating a Custom Example

Let's create a custom example and make a prediction with our distilled model.

In [13]:
import pandas as pd

# Create a custom example with the same feature columns used in training
custom_example = pd.DataFrame({
    'mean radius': [18.5],
    'mean texture': [15.0],
    'mean perimeter': [120.0],
    'mean area': [900.0],
    'mean smoothness': [0.10]
})

print("Custom example features:")
print(custom_example)
print()

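# Predict class probabilities and the hard class label with the distilled model
custom_prob = kd_model.predict_proba(custom_example)
custom_class = kd_model.predict(custom_example)[0]

# Convert the class index to a label. This assumes the labels were encoded
# earlier as 0=benign, 1=malignant; scikit-learn's raw breast cancer target
# uses the opposite encoding (0=malignant, 1=benign)
class_name = "Malignant" if custom_class == 1 else "Benign"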

print(f"Prediction: {class_name} ({custom_class})")
print(f"Probability of malignancy: {custom_prob[0, 1]:.2f}")

Custom example features:
   mean radius  mean texture  mean perimeter  mean area  mean smoothness
0        18.50         15.00          120.00     900.00           0.100

Prediction: Malignant (1)
Probability of malignancy: 0.89
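For a binary scikit-learn `LogisticRegression`, the class-1 probability from `predict_proba` is the logistic sigmoid of the linear score w · x + b, and `predict` returns class 1 when that score is positive (equivalently, when the probability exceeds 0.5). The distilled model's fitted coefficients and intercept are not printed in this notebook, so this sketch works backward from the reported probability of 0.89; the helper names are hypothetical and the label order assumes the notebook's 0=benign, 1=malignant encoding.

```python
import math

def sigmoid(z):
    """Logistic function, written to avoid overflow for large |z|."""
    if z >= 0:
        return 1.0 / (1.0 + math.exp(-z))
    e = math.exp(z)
    return e / (1.0 + e)

def predict_label(score, names=("Benign", "Malignant")):
    """Class 1 when the linear score is positive, i.e. P(class 1) > 0.5."""
    return names[1] if score > 0 else names[0]

# Inverse direction: the linear score that yields the reported probability
p = 0.89
score = math.log(p / (1 - p))  # logit(0.89)
print(round(score, 2), round(sigmoid(score), 2), predict_label(score))  # 2.09 0.89 Malignant
```

Because 0.89 was printed to two decimals, the model's actual score could differ slightly from 2.09; the point is that any positive score maps to "Malignant".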


## 12. Conclusion

In this notebook, we demonstrated the use of DeepBridge's `DBDataset` and `Experiment` classes for model distillation. We:

1. Created a complex "teacher" model using Random Forest
2. Organized our data using the `DBDataset` class
3. Used the `Experiment` class to run distillation experiments with two different methods
4. Compared the performance of our student models with the original teacher model
5. Analyzed feature importance and made predictions with our distilled model

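Our results show that simpler, cheaper-to-run logistic regression students matched the accuracy of the original Random Forest teacher on the test set (0.9474 vs. 0.9440). The +0.0034 gap should not be read as an improvement: it is smaller than a single example out of 114 (1/114 ≈ 0.0088), and 0.9440 does not correspond to a whole number of correct predictions out of 114, which suggests the teacher was scored on a different split. Knowledge distillation matched the teacher's probability distribution more closely than the surrogate approach (R² of 0.9995 vs. 0.9900).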

Model distillation is particularly valuable when models must run in resource-constrained environments or when inference latency is critical. In this notebook, DeepBridge handled the train/test split, student training, and the distribution-level metrics (KL divergence, KS statistic, R²) used to compare the students with the teacher.