# 🚀 Hybrid Model: Swin Transformer + MobileViT + Gradient Boosting

## Architecture Overview
**Feature Extraction**: Swin Transformer (Base/Small) + MobileViT → **Fusion** → **Classifier**: XGBoost/LightGBM/CatBoost

### Key Features:
- ✅ Dual transformer backbones (Swin + MobileViT)
- ✅ Feature fusion layer
- ✅ Multiple gradient boosting classifiers
- ✅ Optuna optimization with TPE/BOHB samplers
- ✅ GPU acceleration support
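
As a rough illustration of the fusion stage, the sketch below concatenates two per-image feature vectors and passes them through a single dense (linear + ReLU) layer. It uses plain Python lists and toy dimensions so it runs without any installs. The helper name `fuse_features`, the weights, and the vector sizes are illustrative assumptions, not the contents of `hybrid_model.py`.

```python
from typing import List, Sequence


def fuse_features(
    swin_feat: Sequence[float],
    mobilevit_feat: Sequence[float],
    weight: Sequence[Sequence[float]],
    bias: Sequence[float],
) -> List[float]:
    """Concatenate two feature vectors, then apply one dense layer with ReLU.

    Hypothetical helper for illustration only; the real fusion layer lives
    in hybrid_model.py and may differ.
    """
    concat = list(swin_feat) + list(mobilevit_feat)
    if len(weight) != len(bias):
        raise ValueError("weight and bias must have the same number of rows")
    if any(len(row) != len(concat) for row in weight):
        raise ValueError("each weight row must match the concatenated length")

    fused = []
    for row, b in zip(weight, bias):
        pre_activation = sum(w * x for w, x in zip(row, concat)) + b
        fused.append(max(0.0, pre_activation))  # ReLU
    return fused


# Toy example: 3 "Swin" features + 2 "MobileViT" features -> 2 fused features.
swin = [1.0, 2.0, 3.0]
mobilevit = [0.5, -1.0]
W = [
    [1.0, 0.0, 0.0, 0.0, 0.0],  # picks out the first Swin feature
    [0.0, 0.0, 0.0, 0.0, 1.0],  # picks out the last MobileViT feature
]
b = [0.0, 0.0]
fused = fuse_features(swin, mobilevit, W, b)
print(fused)
```

Each fused vector then becomes one row of the tabular feature matrix handed to the gradient boosting classifier.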

---

## 📋 Step 1: Setup and Installation

In [None]:
# Check GPU availability
!nvidia-smi

In [None]:
# Install required packages
!pip install torch torchvision torchaudio --index-url https://download.pytorch.org/whl/cu118
!pip install optuna xgboost lightgbm catboost
!pip install scikit-learn matplotlib seaborn tqdm
!pip install Pillow numpy opencv-python

In [None]:
# Verify installations
import torch
import torchvision
import xgboost as xgb
import lightgbm as lgb
import catboost as cb
import optuna

print(f"PyTorch version: {torch.__version__}")
print(f"CUDA available: {torch.cuda.is_available()}")
print(f"CUDA version: {torch.version.cuda}")
print(f"GPU: {torch.cuda.get_device_name(0) if torch.cuda.is_available() else 'None'}")
print(f"XGBoost version: {xgb.__version__}")
print(f"LightGBM version: {lgb.__version__}")
print(f"CatBoost version: {cb.__version__}")
print(f"Optuna version: {optuna.__version__}")

## 📦 Step 2: Upload Project Files

**Upload the following files to Colab:**
1. `model.py` - MobileViT architecture
2. `dataset.py` - Kvasir dataset loader
3. `hybrid_model.py` - Hybrid Swin + MobileViT model
4. `gradient_boosting_classifier.py` - GB classifier wrapper
5. `train_hybrid_pipeline.py` - Training pipeline
6. `mobilevit_kvasir_v2_best_optuna.pth` - Pre-trained MobileViT weights

**Option 1: Upload from local machine**

In [None]:
from google.colab import files

print("Upload the following files:")
print("1. model.py")
print("2. dataset.py")
print("3. hybrid_model.py")
print("4. gradient_boosting_classifier.py")
print("5. train_hybrid_pipeline.py")
print("6. mobilevit_kvasir_v2_best_optuna.pth")
print("\nClick 'Choose Files' below...")

uploaded = files.upload()

**Option 2: Upload from Google Drive**

In [None]:
# Mount Google Drive
from google.colab import drive
drive.mount('/content/drive')

# Copy files from Drive (adjust path as needed)
# !cp /content/drive/MyDrive/your_folder/*.py .
# !cp /content/drive/MyDrive/your_folder/*.pth .

## 📊 Step 3: Download Kvasir Dataset

In [None]:
# Download and extract Kvasir-V2 dataset
!wget https://datasets.simula.no/downloads/kvasir/kvasir-dataset-v2.zip
!unzip -q kvasir-dataset-v2.zip
!ls -la kvasir-dataset-v2/
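
After extraction it can help to confirm which folder level actually holds the class directories, since a later check looks inside a nested `kvasir-dataset-v2/kvasir-dataset-v2` path. The following is a minimal sketch using only the standard library. The helper name `count_images_per_class` and the list of image suffixes are assumptions for illustration.

```python
from pathlib import Path
from typing import Dict

# Assumed image extensions; adjust if the dataset uses others.
IMAGE_SUFFIXES = {".jpg", ".jpeg", ".png"}


def count_images_per_class(root: str) -> Dict[str, int]:
    """Count image files in each immediate subfolder of `root`."""
    counts: Dict[str, int] = {}
    for class_dir in sorted(p for p in Path(root).iterdir() if p.is_dir()):
        counts[class_dir.name] = sum(
            1
            for f in class_dir.iterdir()
            if f.is_file() and f.suffix.lower() in IMAGE_SUFFIXES
        )
    return counts


# Only runs when the dataset folder is present; adjust the path if needed.
dataset_root = Path("kvasir-dataset-v2")
if dataset_root.is_dir():
    for class_name, n_images in count_images_per_class(str(dataset_root)).items():
        print(f"{class_name}: {n_images} images")
```

If the output shows a single folder with zero images, the class directories are probably one level deeper and the path should be adjusted accordingly.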

## 🔍 Step 4: Verify Files

In [None]:
import os

required_files = [
    'model.py',
    'dataset.py',
    'hybrid_model.py',
    'gradient_boosting_classifier.py',
    'train_hybrid_pipeline.py',
    'mobilevit_kvasir_v2_best_optuna.pth'
]

print("Checking required files:")
for file in required_files:
    exists = os.path.exists(file)
    status = "✅" if exists else "❌"
    print(f"{status} {file}")

print("\nDataset check:")
dataset_exists = os.path.exists('kvasir-dataset-v2/kvasir-dataset-v2')
print(f"{'✅' if dataset_exists else '❌'} Kvasir dataset")

## 🧪 Step 5: Quick Test - Hybrid Model

In [None]:
# Test hybrid model initialization
from hybrid_model import SwinMobileViTHybrid
import torch

device = torch.device('cuda' if torch.cuda.is_available() else 'cpu')
print(f"Using device: {device}")

# Test with Swin-Small
print("\n=== Testing Swin-Small + MobileViT ===")
model = SwinMobileViTHybrid(
    num_classes=8,
    swin_variant='small',
    mobilevit_weights_path='mobilevit_kvasir_v2_best_optuna.pth',
    freeze_backbones=True
).to(device)

# Test forward pass
dummy_input = torch.randn(2, 3, 224, 224).to(device)
features = model.extract_features(dummy_input)

print(f"Input shape: {dummy_input.shape}")
print(f"Extracted features shape: {features.shape}")
print(f"Feature dimension: {model.get_feature_dim()}")

# Count parameters
total_params = sum(p.numel() for p in model.parameters())
trainable_params = sum(p.numel() for p in model.parameters() if p.requires_grad)
print(f"\nTotal parameters: {total_params:,}")
print(f"Trainable parameters: {trainable_params:,}")
print(f"Frozen parameters: {total_params - trainable_params:,}")

print("\n✅ Hybrid model test passed!")

## 🚀 Step 6: Run Full Training Pipeline

### Configuration Options:
- **Swin Variant**: `'small'` (lighter, faster) or `'base'` (more powerful)
- **Optuna Sampler**: `'tpe'` (Tree-structured Parzen Estimator) or `'bohb'` (Bayesian Optimization and HyperBand)
- **Optuna Trials**: Number of hyperparameter optimization trials (default: 50)
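
One way to keep these options in a single place is a small validated config object, sketched below. The keys `swin_variant`, `mobilevit_weights`, and `classifier_types` mirror those read from `training_summary.json` later in this notebook. The class name `PipelineConfig`, the fields `optuna_sampler` and `n_trials`, and the classifier name strings are assumptions, and `train_hybrid_pipeline.py` may organise its settings differently.

```python
from dataclasses import dataclass, field
from typing import List

VALID_SWIN_VARIANTS = ("small", "base")
VALID_SAMPLERS = ("tpe", "bohb")


@dataclass
class PipelineConfig:
    """Hypothetical settings container for the hybrid pipeline."""

    swin_variant: str = "small"
    optuna_sampler: str = "tpe"  # assumed field name
    n_trials: int = 50           # assumed field name; 50 is the stated default
    mobilevit_weights: str = "mobilevit_kvasir_v2_best_optuna.pth"
    classifier_types: List[str] = field(
        default_factory=lambda: ["xgboost", "lightgbm", "catboost"]
    )

    def __post_init__(self) -> None:
        # Fail early on typos rather than partway through feature extraction.
        if self.swin_variant not in VALID_SWIN_VARIANTS:
            raise ValueError(f"swin_variant must be one of {VALID_SWIN_VARIANTS}")
        if self.optuna_sampler not in VALID_SAMPLERS:
            raise ValueError(f"optuna_sampler must be one of {VALID_SAMPLERS}")
        if self.n_trials < 1:
            raise ValueError("n_trials must be a positive integer")


config = PipelineConfig(swin_variant="base", optuna_sampler="bohb", n_trials=20)
print(config)
```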

### Training Steps:
1. Load pre-trained Swin Transformer and MobileViT
2. Extract features from training/validation/test sets
3. Optimize XGBoost, LightGBM, and CatBoost with Optuna
4. Train best models and evaluate
5. Compare results and save best model
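
The final comparison step can be sketched as a small selection function. The key name `best_val_accuracy` matches the field read from the summary file in Step 7. Choosing the winner by validation accuracy (rather than any test metric) is an assumption about the pipeline, and the scores below are placeholders used only to exercise the function, not measured results.

```python
from typing import Dict, Tuple


def select_best_classifier(results: Dict[str, dict]) -> Tuple[str, float]:
    """Return (name, score) for the entry with the highest validation accuracy.

    Assumes every entry carries a 'best_val_accuracy' value.
    """
    if not results:
        raise ValueError("results is empty")
    best_name = max(results, key=lambda name: results[name]["best_val_accuracy"])
    return best_name, results[best_name]["best_val_accuracy"]


# Placeholder scores, deliberately unrealistic.
example_results = {
    "xgboost": {"best_val_accuracy": 0.10},
    "lightgbm": {"best_val_accuracy": 0.30},
    "catboost": {"best_val_accuracy": 0.20},
}
best_name, best_score = select_best_classifier(example_results)
print(f"Best classifier: {best_name} ({best_score:.2f})")
```

Selecting on validation rather than test accuracy keeps the test set untouched until the final report.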

In [None]:
# Run the complete training pipeline
!python train_hybrid_pipeline.py

## 📊 Step 7: View Results

In [None]:
# Load and display training summary
import json

with open('hybrid_results/training_summary.json', 'r') as f:
    summary = json.load(f)

print("="*70)
print("TRAINING SUMMARY")
print("="*70)

print("\nConfiguration:")
for key, value in summary['config'].items():
    if key not in ['classifier_types']:
        print(f"  {key}: {value}")

print("\nResults:")
for classifier, result in summary['results'].items():
    print(f"\n{classifier.upper()}:")
    print(f"  Validation Accuracy: {result['best_val_accuracy']:.4f}")
    print(f"  Test Accuracy: {result['metrics']['accuracy']:.4f}")
    print(f"  Test AUROC: {result['metrics']['auroc']:.4f}")
    print(f"  Test F1-Score: {result['metrics']['f1_score']:.4f}")

print(f"\n{'='*70}")
print(f"BEST CLASSIFIER: {summary['best_classifier'].upper()}")
print(f"{'='*70}")

## 📈 Step 8: Visualize Results

In [None]:
import matplotlib.pyplot as plt
import seaborn as sns

# Extract metrics for visualization
classifiers = list(summary['results'].keys())
accuracies = [summary['results'][c]['metrics']['accuracy'] for c in classifiers]
aurocs = [summary['results'][c]['metrics']['auroc'] for c in classifiers]
f1_scores = [summary['results'][c]['metrics']['f1_score'] for c in classifiers]

# Create comparison plot
fig, axes = plt.subplots(1, 3, figsize=(15, 5))

# One panel per metric; the three panels share the same layout
panels = [('Accuracy', accuracies), ('AUROC', aurocs), ('F1-Score', f1_scores)]
colors = ['#FF6B6B', '#4ECDC4', '#45B7D1']

for ax, (metric_name, values) in zip(axes, panels):
    ax.bar(classifiers, values, color=colors)
    ax.set_title(f'Test {metric_name}', fontsize=14, fontweight='bold')
    ax.set_ylabel(metric_name)
    ax.set_ylim([0, 1])
    # Annotate each bar with its value
    for i, v in enumerate(values):
        ax.text(i, v + 0.02, f'{v:.4f}', ha='center', fontweight='bold')

plt.tight_layout()
plt.savefig('hybrid_results/comparison.png', dpi=300, bbox_inches='tight')
plt.show()

print("✅ Visualization saved to hybrid_results/comparison.png")

## 💾 Step 9: Download Results

In [None]:
# Create a zip file with all results
!zip -r hybrid_results.zip hybrid_results/

# Download the zip file
from google.colab import files
files.download('hybrid_results.zip')

print("✅ Results downloaded!")

## 🔬 Step 10: Test Inference (Optional)

In [None]:
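# Load best model and test on a sample image
# Imports are repeated here so this cell also runs in a fresh session
import json

import matplotlib.pyplot as plt
import numpy as np
import torch
from PIL import Image
from torchvision import transforms

from hybrid_model import SwinMobileViTHybrid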

# Load best classifier
with open('hybrid_results/training_summary.json', 'r') as f:
    summary = json.load(f)

best_classifier_type = summary['best_classifier']
print(f"Loading best classifier: {best_classifier_type.upper()}")

# Load hybrid model
device = torch.device('cuda' if torch.cuda.is_available() else 'cpu')
hybrid_model = SwinMobileViTHybrid(
    num_classes=8,
    swin_variant=summary['config']['swin_variant'],
    mobilevit_weights_path=summary['config']['mobilevit_weights'],
    freeze_backbones=True
).to(device)
hybrid_model.eval()

# Load gradient boosting classifier
from gradient_boosting_classifier import GradientBoostingClassifier
gb_classifier = GradientBoostingClassifier(classifier_type=best_classifier_type, num_classes=8)
gb_classifier.load(f'hybrid_results/{best_classifier_type}_model')

# Define class names
class_names = ['dyed-lifted-polyps', 'dyed-resection-margins', 'esophagitis', 
               'normal-cecum', 'normal-pylorus', 'normal-z-line', 
               'polyps', 'ulcerative-colitis']

# Inference function
def predict_image(image_path):
    # Load and preprocess image
    transform = transforms.Compose([
        transforms.Resize((224, 224)),
        transforms.ToTensor(),
        transforms.Normalize(mean=[0.485, 0.456, 0.406], std=[0.229, 0.224, 0.225])
    ])
    
    image = Image.open(image_path).convert('RGB')
    image_tensor = transform(image).unsqueeze(0).to(device)
    
    # Extract features
    with torch.no_grad():
        features = hybrid_model.extract_features(image_tensor)
        features_np = features.cpu().numpy()
    
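    # Predict (ravel() also covers classifiers such as CatBoost, whose
    # multiclass predict() can return a 2-D array)
    prediction = int(np.asarray(gb_classifier.model.predict(features_np)).ravel()[0])
    probabilities = np.asarray(gb_classifier.model.predict_proba(features_np))[0]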
    
    # Display results
    plt.figure(figsize=(12, 4))
    
    # Show image
    plt.subplot(1, 2, 1)
    plt.imshow(image)
    plt.title(f'Predicted: {class_names[prediction]}', fontsize=12, fontweight='bold')
    plt.axis('off')
    
    # Show probabilities
    plt.subplot(1, 2, 2)
    plt.barh(class_names, probabilities)
    plt.xlabel('Probability')
    plt.title('Class Probabilities', fontsize=12, fontweight='bold')
    plt.tight_layout()
    plt.show()
    
    return class_names[prediction], probabilities

# Test on a sample image from the dataset
import glob
sample_images = glob.glob('kvasir-dataset-v2/kvasir-dataset-v2/polyps/*.jpg')[:1]
if sample_images:
    print(f"Testing on: {sample_images[0]}")
    predicted_class, probs = predict_image(sample_images[0])
    print(f"\nPredicted class: {predicted_class}")
    print(f"Confidence: {probs.max():.4f}")
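
The preprocessing inside `predict_image` applies `ToTensor` (which scales 8-bit pixel values to the range 0 to 1) followed by `Normalize` with the ImageNet channel means and standard deviations. The arithmetic for a single RGB pixel can be sketched in plain Python as below. The helper name `normalize_pixel` is hypothetical and only mirrors what the torchvision transforms do per channel.

```python
from typing import Sequence, Tuple

# Same channel statistics as in the transform above (ImageNet mean/std).
IMAGENET_MEAN = (0.485, 0.456, 0.406)
IMAGENET_STD = (0.229, 0.224, 0.225)


def normalize_pixel(rgb: Sequence[int]) -> Tuple[float, ...]:
    """Scale an 8-bit RGB pixel to [0, 1], then standardise each channel."""
    return tuple(
        (value / 255.0 - mean) / std
        for value, mean, std in zip(rgb, IMAGENET_MEAN, IMAGENET_STD)
    )


# A mid-gray pixel ends up slightly above zero in every channel.
mid_gray = normalize_pixel((128, 128, 128))
print([round(channel, 4) for channel in mid_gray])
```

Inference should use the same statistics as the ones the backbones were trained with; otherwise the extracted features shift and the gradient boosting classifier sees inputs unlike its training data.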

## 🎯 Summary

### What We Built:
1. ✅ **Hybrid Feature Extractor**: Swin Transformer + MobileViT
2. ✅ **Feature Fusion**: Concatenation + Dense layers
3. ✅ **Multiple Classifiers**: XGBoost, LightGBM, CatBoost
4. ✅ **Hyperparameter Optimization**: Optuna with TPE/BOHB
5. ✅ **Complete Pipeline**: Training, evaluation, and inference

### Expected Performance:
- **Accuracy**: 85-95% (depending on configuration)
- **AUROC**: 95-99%
- **Training Time**: ~2-4 hours on Colab GPU

### Next Steps:
- Fine-tune the fusion layer
- Try different Swin variants (Base vs Small)
- Experiment with different feature fusion strategies
- Deploy the best model for production use

---
**Created for Google Colab with GPU support** 🚀