# Healthcare ML: Breast Cancer Classification with XGBoost

## Overview

This notebook demonstrates an **end-to-end machine learning workflow** for breast cancer diagnosis using the Wisconsin Diagnostic dataset. You'll train and evaluate a classification model here, then save it so that Part 2 can deploy it to Snowflake's Model Registry for scalable inference.

### Business Context
Early detection of breast cancer significantly improves patient outcomes. This model analyzes cell nucleus characteristics from fine needle aspirate (FNA) images to predict whether a tumor is **malignant** (cancerous) or **benign** (non-cancerous).

### Technical Stack
| Component | Technology | Purpose |
|-----------|------------|---------|
| Algorithm | XGBoost | Gradient boosting classifier |
| Data | sklearn.datasets | 569 samples, 30 features |
| Registry | Snowflake ML | Model versioning & deployment |
| Runtime | Container Runtime | Pre-installed packages |

### Learning Objectives
By the end of this notebook, you will:
1. Perform **exploratory data analysis** with statistical insights
2. Apply **feature scaling** and understand when it matters
3. Compare **multiple algorithms** (Logistic Regression, Random Forest, XGBoost)
4. Implement **cross-validation** for robust model evaluation
5. Analyze **feature importance** and model interpretability

> **Note**: Data loading, training, and evaluation all run in the notebook's Python kernel; the only Snowflake call is the session query tag set in the setup cell. See **Part 2** for model deployment to Snowflake.

## Step 1: Environment Setup

### Import Libraries

| Library | Purpose |
|---------|---------|
| `pandas`, `numpy` | Data manipulation and numerical operations |
| `matplotlib`, `seaborn` | Statistical visualizations |
| `sklearn` | ML utilities, metrics, and baseline models |
| `xgboost` | Gradient boosting implementation |
| `pickle` | Save model artifacts to /tmp for Part 2 |

> **Note**: The modeling code is pure Python. The setup cell does call `get_active_session()` to set a query tag, so it expects an active Snowflake Notebook session.

In [None]:
from snowflake.snowpark.context import get_active_session
import pandas as pd
import numpy as np
import matplotlib.pyplot as plt
import seaborn as sns
import warnings
warnings.filterwarnings('ignore')

from sklearn.datasets import load_breast_cancer
from sklearn.model_selection import train_test_split, cross_val_score, StratifiedKFold
from sklearn.preprocessing import StandardScaler
from sklearn.linear_model import LogisticRegression
from sklearn.ensemble import RandomForestClassifier
from xgboost import XGBClassifier
from sklearn.metrics import (
    accuracy_score, classification_report, confusion_matrix,
    roc_curve, auc, precision_recall_curve, f1_score
)

plt.style.use('dark_background')
sns.set_palette(['#00D4AA', '#FF6B6B', '#4ECDC4', '#45B7D1', '#96CEB4', '#FFEAA7', '#DDA0DD', '#98D8C8'])
plt.rcParams['figure.dpi'] = 100
plt.rcParams['font.size'] = 11
plt.rcParams['axes.facecolor'] = '#1a1a2e'
plt.rcParams['figure.facecolor'] = '#1a1a2e'
plt.rcParams['axes.edgecolor'] = '#4a4a6a'
plt.rcParams['axes.labelcolor'] = '#ffffff'
plt.rcParams['xtick.color'] = '#cccccc'
plt.rcParams['ytick.color'] = '#cccccc'
plt.rcParams['grid.color'] = '#3a3a5a'
plt.rcParams['grid.alpha'] = 0.3
plt.rcParams['text.color'] = '#ffffff'

session = get_active_session()
session.sql("""
    ALTER SESSION SET query_tag = '{"origin":"sf_sit-is","name":"healthcare_ml_classification","version":{"major":1,"minor":0},"attributes":{"is_quickstart":1,"source":"notebook"}}'
""").collect()
print(f"Connected to Snowflake: {session.get_current_account()}")

## Step 2: Data Loading and Exploration

### About the Dataset

The **Breast Cancer Wisconsin (Diagnostic)** dataset contains measurements from **Fine Needle Aspirate (FNA)** images of breast masses. Each sample has 30 features computed from digitized images of cell nuclei.

### Feature Engineering Background

Ten real-valued features are computed for each cell nucleus:

| Feature | Description | Clinical Relevance |
|---------|-------------|-------------------|
| Radius | Mean distance from center to perimeter | Larger cells may indicate abnormality |
| Texture | Standard deviation of gray-scale values | Irregular texture suggests malignancy |
| Perimeter | Cell boundary length | Related to cell size |
| Area | Cell size measurement | Malignant cells often larger |
| Smoothness | Local variation in radius lengths | Irregular shapes are concerning |
| Compactness | Perimeter² / Area - 1.0 | Shape regularity metric |
| Concavity | Severity of concave portions | Indentations in cell boundary |
| Concave Points | Number of concave portions | Count of boundary indentations |
| Symmetry | Symmetry measurement | Asymmetry indicates problems |
| Fractal Dimension | "Coastline approximation" - 1 | Boundary complexity |

For each feature, **three statistics** are computed: mean, standard error (SE), and "worst" (mean of 3 largest values), yielding **30 features** total.

### Target Variable
- **0 = Malignant** (cancerous) - 212 samples (37.3%)
- **1 = Benign** (non-cancerous) - 357 samples (62.7%)
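
The encoding is easy to read backwards, since malignant is `0` rather than `1`. A quick check, sketched here with the same `load_breast_cancer` loader the notebook uses (the variable names are illustrative), confirms the mapping and the class counts:

```python
import numpy as np
from sklearn.datasets import load_breast_cancer

cancer = load_breast_cancer()

# target_names is indexed by label: position 0 -> 'malignant', position 1 -> 'benign'
label_to_name = {label: str(name) for label, name in enumerate(cancer.target_names)}

# np.bincount counts how often each integer label occurs
class_counts = np.bincount(cancer.target)

print(label_to_name)
print(class_counts.tolist())
```

Keeping this mapping in view matters later, because scikit-learn's metric functions treat label `1` as the positive class unless told otherwise.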

In [None]:
cancer = load_breast_cancer()
feature_names = [name.replace(' ', '_').upper() for name in cancer.feature_names]
X = pd.DataFrame(cancer.data, columns=feature_names)
y = pd.Series(cancer.target, name="DIAGNOSIS")

print("=" * 60)
print("DATASET SUMMARY")
print("=" * 60)
print(f"Shape: {X.shape[0]} samples × {X.shape[1]} features")
print(f"\nTarget Classes: {list(cancer.target_names)}")
print(f"\nClass Distribution:")
print(f"  • Malignant (0): {(y == 0).sum():>4} samples ({(y == 0).mean()*100:.1f}%)")
print(f"  • Benign (1):    {(y == 1).sum():>4} samples ({(y == 1).mean()*100:.1f}%)")
print(f"\nClass Imbalance Ratio: {(y == 1).sum() / (y == 0).sum():.2f}:1 (Benign:Malignant)")

print("\n" + "=" * 60)
print("FEATURE STATISTICS (First 5 Features)")
print("=" * 60)
X.iloc[:, :5].describe().round(3)

### Class Distribution Analysis

Understanding class balance is critical for classification tasks:

- **Imbalanced data** can bias models toward the majority class
- **Metrics like accuracy** become misleading with imbalance
- **Stratified sampling** ensures both train/test sets maintain class proportions

Our dataset has a **1.68:1 ratio** (Benign:Malignant) - moderate imbalance that we'll handle with stratified cross-validation.
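
To make the point about accuracy concrete, the sketch below scores a trivial baseline that always predicts the majority class. It uses only the class counts quoted above; the "model" is a deliberately useless stand-in, not something the notebook trains.

```python
# Class counts from the dataset summary above
n_malignant, n_benign = 212, 357

# Labels follow the dataset encoding: 0 = malignant, 1 = benign
y_true = [0] * n_malignant + [1] * n_benign

# A trivial "model" that always predicts the majority class (benign)
y_pred = [1] * len(y_true)

accuracy = sum(t == p for t, p in zip(y_true, y_pred)) / len(y_true)

# Recall on the malignant class: detected malignant cases / all malignant cases
detected = sum(1 for t, p in zip(y_true, y_pred) if t == 0 and p == 0)
malignant_recall = detected / n_malignant

print(f"Accuracy:         {accuracy:.3f}")
print(f"Malignant recall: {malignant_recall:.3f}")
```

The baseline is right about 63% of the time yet detects no cancers at all, which is why the later sections look at recall and not accuracy alone.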

In [None]:
fig, axes = plt.subplots(1, 2, figsize=(12, 4))

colors = ['#ff6b6b', '#4ecdc4']
counts = [sum(y == 0), sum(y == 1)]
axes[0].bar(['Malignant', 'Benign'], counts, color=colors, edgecolor='black', linewidth=1.2)
axes[0].set_ylabel('Count', fontsize=12)
axes[0].set_title('Class Distribution', fontsize=14, fontweight='bold')
for i, v in enumerate(counts):
    axes[0].text(i, v + 5, str(v), ha='center', fontsize=12, fontweight='bold')

axes[1].pie(counts, labels=['Malignant', 'Benign'], colors=colors, autopct='%1.1f%%',
            explode=(0.05, 0), shadow=True, startangle=90,
            textprops={'fontsize': 12, 'fontweight': 'bold'})
axes[1].set_title('Class Proportion', fontsize=14, fontweight='bold')

plt.tight_layout()
plt.show()

### Feature Distributions by Diagnosis

Visualizing feature distributions reveals **class separability**:
- Features with **distinct distributions** between classes (e.g., mean radius) will be strong predictors
- **Overlapping distributions** indicate weaker individual predictive power
- Even weak features may contribute through feature interactions

In [None]:
key_features = ['MEAN_RADIUS', 'MEAN_TEXTURE', 'MEAN_PERIMETER', 'MEAN_AREA',
                'MEAN_SMOOTHNESS', 'MEAN_COMPACTNESS']

fig, axes = plt.subplots(2, 3, figsize=(14, 8))
axes = axes.flatten()

for idx, feature in enumerate(key_features):
    for diagnosis, color, label in [(0, '#FF6B6B', 'Malignant'), (1, '#00D4AA', 'Benign')]:
        axes[idx].hist(X.loc[y == diagnosis, feature], bins=20, alpha=0.7,
                      color=color, label=label, edgecolor='#1a1a2e', linewidth=0.5)
    axes[idx].set_xlabel(feature.replace('MEAN_', '').title(), fontsize=11)
    axes[idx].set_ylabel('Frequency', fontsize=11)
    axes[idx].legend(facecolor='#1a1a2e', edgecolor='#4a4a6a')
    axes[idx].set_title(f'{feature.replace("_", " ").title()}', fontsize=12, fontweight='bold', color='#ffffff')

plt.suptitle('Feature Distributions by Diagnosis', fontsize=16, fontweight='bold', y=1.02, color='#ffffff')
plt.tight_layout()
plt.show()

### Feature Correlation Analysis

**Correlation Interpretation:**
| Value | Meaning |
|-------|---------|
| +0.7 to +1.0 | Strong positive correlation |
| +0.3 to +0.7 | Moderate positive correlation |
| -0.3 to +0.3 | Weak/no correlation |
| -0.7 to -0.3 | Moderate negative correlation |
| -1.0 to -0.7 | Strong negative correlation |

**Key Insight:** Radius, perimeter, and area are highly correlated (geometrically related). This **multicollinearity** doesn't harm tree-based models but affects interpretation of linear models.
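
The geometric link can be illustrated with idealized circles, for which perimeter is linear in radius and area is quadratic. This is a sketch under that simplifying assumption; real nuclei are not perfect circles, and the radii below are randomly generated, not taken from the dataset.

```python
import numpy as np

rng = np.random.default_rng(0)

# Hypothetical radii for idealized circular nuclei (arbitrary units)
radius = rng.uniform(6.0, 28.0, size=500)
perimeter = 2 * np.pi * radius   # linear in radius
area = np.pi * radius ** 2       # quadratic in radius

corr_radius_perimeter = np.corrcoef(radius, perimeter)[0, 1]
corr_radius_area = np.corrcoef(radius, area)[0, 1]

print(f"radius vs perimeter: {corr_radius_perimeter:.3f}")
print(f"radius vs area:      {corr_radius_area:.3f}")
```

Even the quadratic relationship yields a very high Pearson correlation over this range, so near-duplicate information across these three features is expected.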

In [None]:
mean_features = [col for col in X.columns if 'MEAN' in col]
corr_matrix = X[mean_features].corr()

plt.figure(figsize=(10, 8))
mask = np.triu(np.ones_like(corr_matrix, dtype=bool))
sns.heatmap(corr_matrix, mask=mask, annot=True, fmt='.2f', cmap='coolwarm',
            center=0, square=True, linewidths=0.5, linecolor='#1a1a2e',
            cbar_kws={'shrink': 0.8, 'label': 'Correlation'},
            annot_kws={'size': 9, 'color': '#ffffff'})
plt.title('Feature Correlation Matrix (Mean Features)', fontsize=14, fontweight='bold', pad=20, color='#ffffff')
plt.xticks(rotation=45, ha='right', color='#cccccc')
plt.yticks(color='#cccccc')
plt.tight_layout()
plt.show()

## Step 3: Data Preparation

### Train-Test Split Strategy

We'll use an **80-20 split** with **stratification** to maintain class proportions:

| Split | Purpose | Size |
|-------|---------|------|
| Training | Model learning | 80% (455 samples) |
| Test | Final evaluation | 20% (114 samples) |
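
As a quick sanity check on the sizes in the table, the sketch below applies the same `train_test_split` settings to synthetic labels that have the dataset's class counts. The placeholder feature column and the variable names are assumptions made for the illustration.

```python
import numpy as np
from sklearn.model_selection import train_test_split

# Synthetic labels with the dataset's class counts: 212 malignant (0), 357 benign (1)
labels = np.array([0] * 212 + [1] * 357)
features = np.arange(len(labels)).reshape(-1, 1)  # placeholder feature column

feat_train, feat_test, lab_train, lab_test = train_test_split(
    features, labels, test_size=0.2, random_state=42, stratify=labels
)

train_malignant_share = (lab_train == 0).mean()
test_malignant_share = (lab_test == 0).mean()

print(f"train={len(lab_train)}  test={len(lab_test)}")
print(f"malignant share: train={train_malignant_share:.3f}  test={test_malignant_share:.3f}")
```

Both splits keep a malignant share close to the overall 37.3%, which is what `stratify` is for.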

### Feature Scaling

Many algorithms (especially Logistic Regression) perform better with **standardized features**:
- Transforms features to have **mean=0** and **std=1**
- Prevents features with larger scales from dominating
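- Tree-based models such as Random Forest and XGBoost are insensitive to feature scale, so the code below trains them on the unscaled features and uses the scaled features only for Logistic Regression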

> **Important**: Fit the scaler on training data only, then transform both sets. This prevents **data leakage** from test set statistics.
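
What "fit on training data only" means numerically can be shown with plain NumPy. This is a minimal sketch on one randomly generated feature (the values are hypothetical, not drawn from the dataset); it mirrors what `StandardScaler` does with its default settings.

```python
import numpy as np

rng = np.random.default_rng(42)

# Hypothetical single feature on a large scale (for example an area-like measurement)
train = rng.normal(loc=650.0, scale=350.0, size=(455, 1))
test = rng.normal(loc=650.0, scale=350.0, size=(114, 1))

# Statistics come from the training split only
train_mean = train.mean(axis=0)
train_std = train.std(axis=0)

train_scaled = (train - train_mean) / train_std
test_scaled = (test - train_mean) / train_std  # reuse the training statistics

print(train_scaled.mean(), train_scaled.std())  # ~0 and ~1 by construction
print(test_scaled.mean(), test_scaled.std())    # close to, but not exactly, 0 and 1
```

The test split ending up slightly off from mean 0 and std 1 is expected and harmless; recomputing the statistics on the test split is what would leak information.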

In [None]:
X_train, X_test, y_train, y_test = train_test_split(
    X, y, test_size=0.2, random_state=42, stratify=y
)

scaler = StandardScaler()
X_train_scaled = scaler.fit_transform(X_train)
X_test_scaled = scaler.transform(X_test)

X_train_scaled = pd.DataFrame(X_train_scaled, columns=X.columns, index=X_train.index)
X_test_scaled = pd.DataFrame(X_test_scaled, columns=X.columns, index=X_test.index)

print("=" * 60)
print("DATA SPLIT SUMMARY")
print("=" * 60)
print(f"Training set: {X_train.shape[0]} samples")
print(f"Test set:     {X_test.shape[0]} samples")
print(f"\nTraining class distribution:")
print(f"  • Malignant: {(y_train == 0).sum()} ({(y_train == 0).mean()*100:.1f}%)")
print(f"  • Benign:    {(y_train == 1).sum()} ({(y_train == 1).mean()*100:.1f}%)")
print(f"\nFeature Scaling Applied: StandardScaler (mean=0, std=1)")

## Step 4: Model Comparison with Cross-Validation

### Why Compare Multiple Models?

Different algorithms have different strengths:

| Algorithm | Strengths | Best For |
|-----------|-----------|----------|
| **Logistic Regression** | Fast, interpretable, probabilistic | Linearly separable data, baselines |
| **Random Forest** | Handles non-linearity, robust to outliers | Complex relationships, feature importance |
| **XGBoost** | Strong accuracy on tabular data, class weighting via `scale_pos_weight` | Structured data where predictive accuracy is the priority |

### Cross-Validation Strategy

Instead of a single train-test split, we use **5-fold Stratified Cross-Validation**:

1. Split training data into 5 equal folds
2. Train on 4 folds, validate on 1 fold
3. Repeat 5 times, each fold serving as validation once
4. Average the results for robust performance estimate

This reduces variance in our performance estimates and better utilizes our limited data.
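
One subtlety when combining scaling with cross-validation: if the scaler is fitted on the whole training set before the folds are made, each validation fold has already influenced the scaling statistics. A common way to avoid this, sketched below as one option and not as this notebook's method, is to wrap the scaler and the model in a scikit-learn pipeline so the scaler is refitted inside every fold.

```python
from sklearn.datasets import load_breast_cancer
from sklearn.linear_model import LogisticRegression
from sklearn.model_selection import StratifiedKFold, cross_val_score, train_test_split
from sklearn.pipeline import make_pipeline
from sklearn.preprocessing import StandardScaler

X_all, y_all = load_breast_cancer(return_X_y=True)
X_tr, _, y_tr, _ = train_test_split(
    X_all, y_all, test_size=0.2, random_state=42, stratify=y_all
)

# The pipeline refits the scaler on each fold's training portion only
pipeline = make_pipeline(StandardScaler(), LogisticRegression(max_iter=1000))

cv_folds = StratifiedKFold(n_splits=5, shuffle=True, random_state=42)
fold_scores = cross_val_score(pipeline, X_tr, y_tr, cv=cv_folds, scoring="accuracy")

print(f"Mean accuracy: {fold_scores.mean():.4f} (std {fold_scores.std():.4f})")
```

On a dataset this small the difference from pre-scaling is usually minor, but the pipeline form keeps the estimate clean and also carries over directly to deployment.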

In [None]:
models = {
    'Logistic Regression': LogisticRegression(max_iter=1000, random_state=42),
    'Random Forest': RandomForestClassifier(n_estimators=100, max_depth=6, random_state=42),
    # use_label_encoder is omitted: it is deprecated and no longer needed in recent XGBoost releases
    'XGBoost': XGBClassifier(n_estimators=100, max_depth=6, learning_rate=0.1,
                             random_state=42, eval_metric='logloss')
}

cv = StratifiedKFold(n_splits=5, shuffle=True, random_state=42)

print("=" * 70)
print("CROSS-VALIDATION RESULTS (5-Fold Stratified)")
print("=" * 70)

cv_results = {}
for name, model in models.items():
    if name == 'Logistic Regression':
        scores = cross_val_score(model, X_train_scaled, y_train, cv=cv, scoring='accuracy')
    else:
        scores = cross_val_score(model, X_train, y_train, cv=cv, scoring='accuracy')
    cv_results[name] = scores
    print(f"\n{name}:")
    print(f"  Accuracy: {scores.mean():.4f} (+/- {scores.std()*2:.4f})")
    print(f"  Per-fold: {[f'{s:.3f}' for s in scores]}")

fig, ax = plt.subplots(figsize=(10, 5))
positions = range(len(cv_results))
bp = ax.boxplot(cv_results.values(), positions=positions, patch_artist=True, widths=0.6)
colors = ['#45B7D1', '#00D4AA', '#FF6B6B']
for patch, color in zip(bp['boxes'], colors):
    patch.set_facecolor(color)
    patch.set_alpha(0.8)
for element in ['whiskers', 'caps', 'medians']:
    plt.setp(bp[element], color='#ffffff', linewidth=1.5)
plt.setp(bp['fliers'], markerfacecolor='#FFEAA7', markeredgecolor='#FFEAA7', markersize=6)
ax.set_xticklabels(cv_results.keys(), fontsize=11, color='#ffffff')
ax.set_ylabel('Accuracy', fontsize=12, color='#ffffff')
ax.set_title('Model Comparison: 5-Fold Cross-Validation', fontsize=14, fontweight='bold', color='#ffffff')
ax.set_ylim(0.9, 1.0)
ax.axhline(y=max([s.mean() for s in cv_results.values()]), color='#FFEAA7', linestyle='--', alpha=0.7, label='Best Mean')
ax.legend(facecolor='#1a1a2e', edgecolor='#4a4a6a', labelcolor='#ffffff')
plt.tight_layout()
plt.show()

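print("\n" + "=" * 70)
print("SELECTED MODEL: XGBoost")
print("=" * 70)
# XGBoost is the algorithm this notebook targets (see Technical Stack); the choice is fixed here
# and is not derived from the cross-validation ranking above, so check the CV means before reusing it.
best_model = XGBClassifier(n_estimators=100, max_depth=6, learning_rate=0.1,
                           random_state=42, eval_metric='logloss')
best_model.fit(X_train, y_train)
print("XGBoost model trained on full training set.")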

## Step 5: Model Evaluation on Test Set

### Evaluation Metrics for Binary Classification

| Metric | Formula | Clinical Interpretation |
|--------|---------|------------------------|
| **Accuracy** | (TP+TN) / Total | Overall correctness |
| **Precision** | TP / (TP+FP) | When we predict malignant, how often correct? |
| **Recall (Sensitivity)** | TP / (TP+FN) | Of actual cancers, how many detected? |
| **Specificity** | TN / (TN+FP) | Of actual benign, how many correctly identified? |
| **F1-Score** | 2 × (Precision × Recall) / (Precision + Recall) | Harmonic mean of precision/recall |
| **AUC-ROC** | Area under ROC curve | Model's discriminative ability |

### Clinical Priority

In cancer screening, **high recall (sensitivity) is critical**:
- **False Negative** (missed cancer): Patient doesn't get treatment → potentially fatal
- **False Positive** (false alarm): Additional testing → anxiety but manageable

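> We therefore prioritize **high recall** on the malignant class while maintaining acceptable precision. Note that malignant is encoded as `0` in this dataset, while scikit-learn's metric functions treat label `1` (benign) as the positive class by default; pass `pos_label=0` to score the malignant class.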
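
The formulas in the table can be checked by hand from four confusion-matrix counts. The helper below is a small illustrative function (not part of scikit-learn), and the counts are hypothetical numbers chosen to add up to a 114-sample test set with malignant as the positive class; they are not results from this notebook.

```python
def binary_metrics(tp: int, fn: int, fp: int, tn: int) -> dict:
    """Compute the metrics from the table above from confusion-matrix counts."""
    precision = tp / (tp + fp)
    recall = tp / (tp + fn)
    return {
        "accuracy": (tp + tn) / (tp + fn + fp + tn),
        "precision": precision,
        "recall": recall,
        "specificity": tn / (tn + fp),
        "f1": 2 * precision * recall / (precision + recall),
    }

# Hypothetical counts for a 114-sample test set, malignant = positive
metrics = binary_metrics(tp=40, fn=2, fp=1, tn=71)
for name, value in metrics.items():
    print(f"{name:<12}{value:.4f}")
```

With these counts accuracy is about 97%, yet recall is about 95%: two of 42 cancers are missed, which is the number a screening setting cares about most.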

In [None]:
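y_pred = best_model.predict(X_test)
# Column 1 of predict_proba is P(class 1), i.e. P(benign)
y_pred_proba = best_model.predict_proba(X_test)[:, 1]

test_accuracy = accuracy_score(y_test, y_pred)
# f1_score defaults to pos_label=1, so this is the F1 for the benign class;
# malignant-class scores appear in the classification report below
test_f1 = f1_score(y_test, y_pred)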

print("=" * 70)
print("TEST SET EVALUATION RESULTS")
print("=" * 70)
print(f"\nOverall Metrics:")
print(f"  • Accuracy:  {test_accuracy:.4f} ({test_accuracy*100:.1f}%)")
print(f"  • F1-Score:  {test_f1:.4f}")

print(f"\nDetailed Classification Report:")
print("-" * 50)
print(classification_report(y_test, y_pred, target_names=['Malignant', 'Benign'], digits=4))

### Confusion Matrix

The confusion matrix reveals **where** the model makes errors:

|  | Predicted Malignant | Predicted Benign |
|--|---------------------|------------------|
| **Actual Malignant** | True Positive (TP) | False Negative (FN) ⚠️ |
| **Actual Benign** | False Positive (FP) | True Negative (TN) |

**Clinical Impact of Errors:**
- **FN (missed cancer)**: Most dangerous - patient goes untreated
- **FP (false alarm)**: Leads to additional testing, patient anxiety
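
Because malignant is label `0`, the cell order returned by scikit-learn lines up with the table above only when it is read with malignant as the positive class. The toy example below (hand-made labels, not model output) shows the unpacking under that convention.

```python
from sklearn.metrics import confusion_matrix

# Toy labels using the dataset encoding: 0 = malignant, 1 = benign
y_true_toy = [0, 0, 0, 1, 1, 1, 1]
y_pred_toy = [0, 0, 1, 1, 1, 1, 0]

# labels=[0, 1] fixes the order: row 0 = actual malignant, column 0 = predicted malignant
cm_toy = confusion_matrix(y_true_toy, y_pred_toy, labels=[0, 1])

# Reading left to right, top to bottom, with malignant as the clinically positive class
tp, fn, fp, tn = cm_toy.ravel()

print(cm_toy)
print(f"TP={tp} FN={fn} FP={fp} TN={tn}")
```

This differs from the `tn, fp, fn, tp` unpacking often shown for scikit-learn, which assumes label `1` is the positive class.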

In [None]:
cm = confusion_matrix(y_test, y_pred)

plt.figure(figsize=(8, 6))
ax = sns.heatmap(cm, annot=True, fmt='d', cmap='YlGnBu',
            xticklabels=['Malignant', 'Benign'],
            yticklabels=['Malignant', 'Benign'],
            annot_kws={'size': 28, 'fontweight': 'bold'},
            linewidths=3, linecolor='#1a1a2e',
            cbar_kws={'label': 'Count'})
for text in ax.texts:
    text.set_color('#000000')
plt.xlabel('Predicted', fontsize=14, fontweight='bold', color='#ffffff')
plt.ylabel('Actual', fontsize=14, fontweight='bold', color='#ffffff')
plt.title(f'Confusion Matrix\nAccuracy: {test_accuracy:.2%}', fontsize=16, fontweight='bold', color='#ffffff')
plt.xticks(color='#ffffff', fontsize=12)
plt.yticks(color='#ffffff', fontsize=12, rotation=0)
plt.tight_layout()
plt.show()

### ROC Curve Analysis

**ROC (Receiver Operating Characteristic)** plots True Positive Rate vs False Positive Rate at various classification thresholds.

| AUC Score | Interpretation |
|-----------|----------------|
| 0.90 - 1.00 | Excellent |
| 0.80 - 0.90 | Good |
| 0.70 - 0.80 | Fair |
| 0.50 - 0.70 | Poor |

**Why ROC over Accuracy?**
- Less sensitive to class imbalance than accuracy
- Threshold-independent evaluation
- Easy visual comparison of models
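
Threshold independence follows from an equivalent reading of AUC: it is the probability that a randomly chosen positive example receives a higher score than a randomly chosen negative one. The sketch below computes AUC that way on a handful of made-up scores; `pairwise_auc` is an illustrative helper, not a library function.

```python
def pairwise_auc(pos_scores, neg_scores):
    """AUC as the fraction of (positive, negative) pairs ranked correctly; ties count half."""
    wins = 0.0
    for p in pos_scores:
        for n in neg_scores:
            if p > n:
                wins += 1.0
            elif p == n:
                wins += 0.5
    return wins / (len(pos_scores) * len(neg_scores))

# Hypothetical model scores for the positive and negative class
pos = [0.9, 0.8, 0.4]
neg = [0.7, 0.3, 0.2]

auc_value = pairwise_auc(pos, neg)
print(f"AUC = {auc_value:.3f}")
```

Eight of the nine pairs are ordered correctly, so the AUC is 8/9. No classification threshold appears anywhere in the calculation, only the ranking of scores.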

In [None]:
fpr, tpr, _ = roc_curve(y_test, y_pred_proba)
roc_auc = auc(fpr, tpr)

plt.figure(figsize=(8, 6))
plt.plot(fpr, tpr, color='#00D4AA', lw=3, label=f'ROC Curve (AUC = {roc_auc:.3f})')
plt.fill_between(fpr, tpr, alpha=0.3, color='#00D4AA')
plt.plot([0, 1], [0, 1], '--', lw=2, color='#FF6B6B', label='Random Classifier', alpha=0.7)
plt.xlim([0.0, 1.0])
plt.ylim([0.0, 1.05])
plt.xlabel('False Positive Rate', fontsize=14, fontweight='bold', color='#ffffff')
plt.ylabel('True Positive Rate', fontsize=14, fontweight='bold', color='#ffffff')
plt.title('Receiver Operating Characteristic (ROC) Curve', fontsize=16, fontweight='bold', color='#ffffff')
plt.legend(loc='lower right', fontsize=12, facecolor='#1a1a2e', edgecolor='#4a4a6a', labelcolor='#ffffff')
plt.grid(True, alpha=0.2, color='#4a4a6a')
plt.tight_layout()
plt.show()

### Feature Importance Analysis

XGBoost provides **feature importance scores** based on how often each feature is used in tree splits and how much it improves the model.

**Interpreting Feature Importance:**
| Importance Type | Description |
|-----------------|-------------|
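| Gain | Average reduction in the training loss from splits that use the feature |
| Weight | Number of times the feature is used to split across all trees |
| Cover | Average number of samples (weighted by second-order gradient) routed through splits on the feature |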

**Clinical Insights:**
- "Worst" features (largest values) often dominate → extreme cell characteristics are highly predictive
- Perimeter and area features capture cell size → larger cells tend to indicate malignancy
- Concavity features capture shape irregularity → irregular shapes suggest cancer
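
Weight and gain can rank features differently, which is worth keeping in mind when reading the chart. The sketch below illustrates the two definitions on a hypothetical list of split records; it does not read anything from an actual XGBoost model, and the gain values are invented.

```python
from collections import defaultdict

# Hypothetical split records from a small boosted ensemble: (feature used, loss reduction)
splits = [
    ("WORST_RADIUS", 12.0),
    ("WORST_RADIUS", 8.0),
    ("MEAN_TEXTURE", 1.5),
    ("MEAN_TEXTURE", 0.5),
    ("MEAN_TEXTURE", 1.0),
]

weight = defaultdict(int)        # number of splits using the feature
total_gain = defaultdict(float)  # summed loss reduction over those splits

for feature, gain in splits:
    weight[feature] += 1
    total_gain[feature] += gain

# "Gain" importance is the average loss reduction per split
avg_gain = {feature: total_gain[feature] / weight[feature] for feature in weight}

print(dict(weight))
print(avg_gain)
```

Here `MEAN_TEXTURE` wins on weight (three splits against two) while `WORST_RADIUS` wins clearly on gain, so a feature that is used often is not necessarily the one that helps most.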

In [None]:
feature_importance = pd.DataFrame({
    'feature': X.columns,
    'importance': best_model.feature_importances_
}).sort_values('importance', ascending=True).tail(15)

plt.figure(figsize=(10, 8))
colors = plt.cm.viridis(np.linspace(0.3, 0.9, len(feature_importance)))
bars = plt.barh(feature_importance['feature'], feature_importance['importance'],
                color=colors, edgecolor='#1a1a2e', linewidth=0.5)
plt.xlabel('Importance Score (Gain)', fontsize=12, color='#ffffff')
plt.title('Top 15 Most Important Features for Cancer Prediction', fontsize=14, fontweight='bold', color='#ffffff')

for bar, val in zip(bars, feature_importance['importance']):
    plt.text(val + 0.002, bar.get_y() + bar.get_height()/2,
             f'{val:.3f}', va='center', fontsize=9, color='#ffffff')

plt.tight_layout()
plt.show()

top_3 = feature_importance.tail(3)['feature'].tolist()[::-1]
print(f"\nTop 3 Predictive Features: {', '.join(top_3)}")

### Precision-Recall Curve

For **imbalanced datasets** or when **false negatives are costly**, PR curves are often more informative than ROC:

| Metric | Question Answered |
|--------|------------------|
| Precision | Of predicted positives, how many are correct? |
| Recall | Of actual positives, how many were detected? |

**Clinical Context:** High recall is critical (catch all cancers), but precision matters too (avoid unnecessary biopsies).
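
The trade-off is controlled by the decision threshold. The sketch below uses six hypothetical patients with made-up predicted probabilities of malignancy; `precision_recall_at` is an illustrative helper, not a scikit-learn function.

```python
def precision_recall_at(threshold, p_malignant, is_malignant):
    """Precision and recall for the malignant class at a given probability threshold."""
    predicted = [p >= threshold for p in p_malignant]
    tp = sum(pred and actual for pred, actual in zip(predicted, is_malignant))
    fp = sum(pred and not actual for pred, actual in zip(predicted, is_malignant))
    fn = sum((not pred) and actual for pred, actual in zip(predicted, is_malignant))
    return tp / (tp + fp), tp / (tp + fn)

# Hypothetical predicted P(malignant) and true labels for six patients
p_malignant = [0.95, 0.80, 0.60, 0.40, 0.30, 0.10]
is_malignant = [True, True, False, False, True, False]

for threshold in (0.5, 0.25):
    prec, rec = precision_recall_at(threshold, p_malignant, is_malignant)
    print(f"threshold={threshold:.2f}  precision={prec:.3f}  recall={rec:.3f}")
```

Lowering the threshold from 0.5 to 0.25 catches the third cancer (recall rises from 0.667 to 1.0) at the cost of one more false alarm (precision falls from 0.667 to 0.6).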

In [None]:
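# Score the malignant class (label 0) as positive: y_pred_proba is P(benign),
# so 1 - y_pred_proba is P(malignant)
precision, recall, _ = precision_recall_curve(y_test, 1 - y_pred_proba, pos_label=0)
pr_auc = auc(recall, precision)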

plt.figure(figsize=(8, 6))
plt.plot(recall, precision, color='#DDA0DD', lw=3, label=f'PR Curve (AUC = {pr_auc:.3f})')
plt.fill_between(recall, precision, alpha=0.3, color='#DDA0DD')
plt.xlabel('Recall', fontsize=14, fontweight='bold', color='#ffffff')
plt.ylabel('Precision', fontsize=14, fontweight='bold', color='#ffffff')
plt.title('Precision-Recall Curve', fontsize=16, fontweight='bold', color='#ffffff')
plt.legend(loc='lower right', fontsize=12, facecolor='#1a1a2e', edgecolor='#4a4a6a', labelcolor='#ffffff')
plt.grid(True, alpha=0.2, color='#4a4a6a')
plt.tight_layout()
plt.show()

## Step 6: Deploy to Snowflake Model Registry

### What is Snowflake Model Registry?

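The Model Registry is Snowflake's **MLOps solution** for managing the ML lifecycle. The deployment itself happens in Part 2; this section summarizes what the registry offers and prepares the artifacts it needs: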

| Capability | Description |
|------------|-------------|
| **Version Control** | Track model versions (V1, V2, etc.) with full lineage |
| **Metadata Storage** | Store metrics, parameters, and comments |
| **Access Control** | Leverage Snowflake RBAC for model governance |
| **Deployment** | Run inference via SQL or Python at scale |

### Logging Best Practices

1. **Include metrics** - Enables model comparison across versions
2. **Add comments** - Document model purpose and training details
3. **Sample input** - Helps registry infer schema for inference
4. **Task type** - Enables task-specific optimizations

## Save Artifacts for Part 2

We'll save all necessary variables to `/tmp` so Part 2 can load them for Snowflake deployment.
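
For orientation, the save-and-reload round trip looks like the sketch below. Part 2's actual loading code is not shown in this notebook, so this is an assumption about its general shape; the dictionary holds placeholder values, and a temporary directory stands in for `/tmp`.

```python
import os
import pickle
import tempfile

# Placeholder artifacts; the real notebook stores the trained model, splits, and metrics
artifacts_demo = {"test_accuracy": 0.95, "feature_names": ["MEAN_RADIUS", "MEAN_TEXTURE"]}

# A temporary directory stands in for /tmp so the sketch leaves no files behind
with tempfile.TemporaryDirectory() as tmp_dir:
    path = os.path.join(tmp_dir, "breast_cancer_artifacts.pkl")

    with open(path, "wb") as f:
        pickle.dump(artifacts_demo, f)

    # Only unpickle files you created yourself: pickle.load can execute arbitrary code
    with open(path, "rb") as f:
        restored = pickle.load(f)

print(restored == artifacts_demo)
```

When the real artifacts are reloaded, `xgboost` must be importable in that environment, since unpickling the model reconstructs an `XGBClassifier` object.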

In [None]:
import pickle

artifacts = {
    'best_model': best_model,
    'X_train': X_train,
    'X_test': X_test,
    'y_train': y_train,
    'y_test': y_test,
    'test_accuracy': test_accuracy,
    'test_f1': test_f1,
    'roc_auc': roc_auc,
    'pr_auc': pr_auc,
    'cv_results': cv_results,
    'feature_names': X.columns.tolist()
}

with open('/tmp/breast_cancer_artifacts.pkl', 'wb') as f:
    pickle.dump(artifacts, f)

print("=" * 60)
print("✅ ARTIFACTS SAVED TO /tmp/breast_cancer_artifacts.pkl")
print("=" * 60)
print("\nSaved items:")
print(f"  • Trained XGBoost model")
print(f"  • Training data: {X_train.shape[0]} samples")
print(f"  • Test data: {X_test.shape[0]} samples")
print(f"  • Metrics: Accuracy={test_accuracy:.3f}, F1={test_f1:.3f}, ROC AUC={roc_auc:.3f}")
print(f"\n➡️  Continue to Part 2 to deploy to Snowflake Model Registry")