# 🚀 Hyperspectral Plastic Classification Pipeline
## Google Colab Pro+ Edition

This notebook runs the complete pipeline for hyperspectral plastic classification.

**Features:**
- 🎯 6 Model Architectures (CNN, ResNet, Deep, Inception, LSTM, Transformer)
- ⚡ GPU Acceleration (CUDA)
- 📊 11 Classes: Background + 10 plastic types
- 💾 Automatic Results Download

**Requirements:**
- Google Colab Pro+ (for best performance)
- Pipeline code cloned from GitHub (Step 1)
- Data uploaded to Google Drive (Step 2)
- ~20-30 GB free space on Drive

## 📦 Step 1: Clone Code from GitHub

In [None]:
# Clone the repository from GitHub
!git clone https://github.com/PlugNawapong/plastic-type-classification.git
%cd plastic-type-classification

import os
print("✓ Code cloned from GitHub")
print("✓ Working directory:", os.getcwd())

## 📁 Step 2: Mount Google Drive & Setup Data

**Important:** Before running this cell, upload your data to Google Drive:
- `training_dataset/` (PNG files + header.json)
- `Ground_Truth/` (labels.png + labels.json)
- `Inference_dataset1/` (PNG files + header.json)

In [None]:
from google.colab import drive
import os
from pathlib import Path

# Mount Google Drive
drive.mount('/content/drive')

# Define data path (CHANGE THIS if your data is in a different location)
DATA_PATH = '/content/drive/MyDrive/hyperspectral_data'

print(f"Data path: {DATA_PATH}")
print(f"Data path exists: {Path(DATA_PATH).exists()}")

# Remove old links if they exist
!rm -f training_dataset Ground_Truth Inference_dataset1

# Create symbolic links to data folders
!ln -s "{DATA_PATH}/training_dataset" training_dataset
!ln -s "{DATA_PATH}/Ground_Truth" Ground_Truth
!ln -s "{DATA_PATH}/Inference_dataset1" Inference_dataset1

print("\n✓ Google Drive mounted")

# Verify links were created (exists() follows the symlink, so a dangling link reports failure)
for name in ('training_dataset', 'Ground_Truth', 'Inference_dataset1'):
    if Path(name).exists():
        print(f"✓ {name} linked")
    else:
        print(f"✗ {name} link failed - check DATA_PATH")
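
The linking step can also be done in pure Python, which avoids shell quoting problems when `DATA_PATH` contains spaces. The following is a minimal sketch, not part of the repository: `link_data_folder` is a hypothetical helper, and it assumes a Linux runtime such as Colab where creating symlinks is permitted.

```python
import os
from pathlib import Path


def link_data_folder(source: Path, link: Path) -> bool:
    """Point `link` at `source`, replacing a stale symlink.

    Returns True if the link resolves to an existing target.
    """
    # is_symlink() is True even for a dangling link, which exists() would miss
    if link.is_symlink():
        link.unlink()
    elif link.exists():
        raise FileExistsError(f"{link} exists and is not a symlink; refusing to replace it")
    os.symlink(source, link, target_is_directory=True)
    # exists() follows the link, so this is False when `source` is missing
    return link.exists()
```

For example, `link_data_folder(Path(DATA_PATH) / "training_dataset", Path("training_dataset"))` would stand in for the first `ln -s` line above.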

## 🎮 Step 3: Check GPU Availability

In [None]:
import torch

print("PyTorch version:", torch.__version__)
print("CUDA available:", torch.cuda.is_available())

if torch.cuda.is_available():
    print("GPU:", torch.cuda.get_device_name(0))
    print(f"GPU Memory: {torch.cuda.get_device_properties(0).total_memory / 1e9:.1f} GB")
    print("\n✅ GPU is ready!")
else:
    print("\n⚠️ No GPU detected!")
    print("Go to: Runtime > Change runtime type > Hardware accelerator > GPU")

## 📦 Step 4: Install Dependencies

In [None]:
# Install required packages
!pip install -q scipy tqdm matplotlib pillow
print("✓ Dependencies installed")

## 📂 Step 5: Verify Data Structure

In [None]:
from pathlib import Path
import re

# Check required folders and files
required = [
    'training_dataset/header.json',
    'Ground_Truth/labels.png',
    'Ground_Truth/labels.json',
    'Inference_dataset1/header.json',
    'run_pipeline_config.py'
]

print("Checking data structure...")
all_good = True
for item in required:
    exists = Path(item).exists()
    status = "✓" if exists else "✗"
    print(f"{status} {item}")
    if not exists:
        all_good = False

# Count band files with strict pattern matching
# Only count files matching ImagesStack###.png (3 digits)
pattern = re.compile(r'^ImagesStack\d{3}\.png$')

train_files = [f for f in Path('training_dataset').glob('*.png') if pattern.match(f.name)]
infer_files = [f for f in Path('Inference_dataset1').glob('*.png') if pattern.match(f.name)]

train_bands = len(train_files)
infer_bands = len(infer_files)

print(f"\n✓ Training bands: {train_bands}")
print(f"✓ Inference bands: {infer_bands}")

if all_good and train_bands > 0 and infer_bands > 0:
    print("\n✅ All data files present!")
    print("Ready to run the pipeline.")
else:
    print("\n❌ Some files are missing.")
    print("Please check your DATA_PATH in Step 2.")
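
A band count alone does not reveal a gap in the sequence, for instance when one file failed to upload. The sketch below lists band numbers missing between the lowest and highest index found. It assumes bands are numbered consecutively, which the source does not state, and `missing_band_indices` is a hypothetical helper name.

```python
import re
from pathlib import Path

# Same naming rule as the check above, with the 3-digit index captured
BAND_PATTERN = re.compile(r'^ImagesStack(\d{3})\.png$')


def missing_band_indices(folder):
    """Return band numbers absent between the lowest and highest index in `folder`."""
    indices = set()
    for path in Path(folder).glob('*.png'):
        match = BAND_PATTERN.match(path.name)
        if match:
            indices.add(int(match.group(1)))
    if not indices:
        return []
    expected = set(range(min(indices), max(indices) + 1))
    return sorted(expected - indices)
```

Calling `missing_band_indices('training_dataset')` returns an empty list when the sequence has no holes.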

## ⚙️ Step 6: Configure Pipeline Parameters

Edit these parameters to customize your training:

In [None]:
# ==================== CONFIGURATION ====================

# Mode
MODE = "full"  # Options: "full", "normalize", "train", "inference"

# Normalization (skip if already normalized)
SKIP_NORMALIZE = False  # Set to True if data already normalized
LOWER_PERCENTILE = 2
UPPER_PERCENTILE = 98

# Preprocessing
SPECTRAL_BINNING = 2      # 2, 5, 10, or None
SPATIAL_BINNING = None    # 2, 4, 8, or None
WAVELENGTH_RANGE = None   # e.g., (450, 700) or None
DENOISE = False
DENOISE_METHOD = "gaussian"  # "gaussian" or "median"
DENOISE_STRENGTH = 1.0

# Model
MODEL_TYPE = "resnet"  # Options: "cnn", "resnet", "deep", "inception", "lstm", "transformer"
DROPOUT = 0.3

# Training
EPOCHS = 50
LEARNING_RATE = 0.001
BATCH_SIZE = 512  # Large batches suit Colab GPUs; lower this if you hit out-of-memory errors
VAL_RATIO = 0.2

# ======================================================

print("Configuration:")
print(f"  Mode: {MODE}")
print(f"  Model: {MODEL_TYPE}")
print(f"  Epochs: {EPOCHS}")
print(f"  Batch Size: {BATCH_SIZE}")
print(f"  Spectral Binning: {SPECTRAL_BINNING}")
print(f"  Spatial Binning: {SPATIAL_BINNING}")
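
A typo in one of these values would otherwise surface only after the pipeline has started. The sketch below checks the settings up front. The allowed modes and model types are taken from the comments in the cell above; the numeric ranges are assumptions about what the pipeline accepts, and `validate_config` is a hypothetical helper.

```python
VALID_MODES = {"full", "normalize", "train", "inference"}
VALID_MODELS = {"cnn", "resnet", "deep", "inception", "lstm", "transformer"}


def validate_config(mode, model_type, lower_pct, upper_pct, val_ratio, dropout):
    """Return a list of problems; an empty list means the values look sane."""
    problems = []
    if mode not in VALID_MODES:
        problems.append(f"MODE must be one of {sorted(VALID_MODES)}, got {mode!r}")
    if model_type not in VALID_MODELS:
        problems.append(f"MODEL_TYPE must be one of {sorted(VALID_MODELS)}, got {model_type!r}")
    # Assumed range: percentiles lie in [0, 100] with lower < upper
    if not 0 <= lower_pct < upper_pct <= 100:
        problems.append(f"percentiles must satisfy 0 <= lower < upper <= 100, got {lower_pct}, {upper_pct}")
    # Assumed range: a validation split strictly between 0 and 1
    if not 0 < val_ratio < 1:
        problems.append(f"VAL_RATIO must be between 0 and 1, got {val_ratio}")
    # Assumed range: dropout probability in [0, 1)
    if not 0 <= dropout < 1:
        problems.append(f"DROPOUT must be in [0, 1), got {dropout}")
    return problems
```

In the notebook this could be called as `validate_config(MODE, MODEL_TYPE, LOWER_PERCENTILE, UPPER_PERCENTILE, VAL_RATIO, DROPOUT)`, printing any returned messages before moving on to Step 7.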

## 🚀 Step 7: Run Pipeline

In [None]:
# Build command
cmd = f"python run_pipeline_config.py --mode {MODE}"

if SKIP_NORMALIZE:
    cmd += " --skip-normalize"

cmd += f" --lower-percentile {LOWER_PERCENTILE}"
cmd += f" --upper-percentile {UPPER_PERCENTILE}"

if SPECTRAL_BINNING:
    cmd += f" --spectral-binning {SPECTRAL_BINNING}"

if SPATIAL_BINNING:
    cmd += f" --spatial-binning {SPATIAL_BINNING}"

if WAVELENGTH_RANGE:
    cmd += f" --wavelength-range {WAVELENGTH_RANGE[0]} {WAVELENGTH_RANGE[1]}"

if DENOISE:
    cmd += f" --denoise --denoise-method {DENOISE_METHOD} --denoise-strength {DENOISE_STRENGTH}"

cmd += f" --model-type {MODEL_TYPE}"
cmd += f" --dropout {DROPOUT}"
cmd += f" --epochs {EPOCHS}"
cmd += f" --lr {LEARNING_RATE}"
cmd += f" --batch-size {BATCH_SIZE}"
cmd += f" --val-ratio {VAL_RATIO}"

print("Running command:")
print(cmd)
print("\n" + "="*60)

# Run pipeline
!{cmd}
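
String concatenation works here because none of the values contain spaces. A variant that builds an argument list and lets `shlex.join` handle quoting is sketched below. It uses only flags that appear in the cell above and covers a subset of them for brevity; `build_pipeline_args` is a hypothetical helper.

```python
import shlex


def build_pipeline_args(mode, model_type, epochs, batch_size,
                        spectral_binning=None, wavelength_range=None,
                        skip_normalize=False):
    """Assemble the run_pipeline_config.py command as a list of arguments."""
    args = ["python", "run_pipeline_config.py", "--mode", mode]
    if skip_normalize:
        args.append("--skip-normalize")
    if spectral_binning:
        args += ["--spectral-binning", str(spectral_binning)]
    if wavelength_range:
        args += ["--wavelength-range", str(wavelength_range[0]), str(wavelength_range[1])]
    args += ["--model-type", model_type,
             "--epochs", str(epochs),
             "--batch-size", str(batch_size)]
    return args


args = build_pipeline_args("train", "resnet", epochs=50, batch_size=512, spectral_binning=2)
# shlex.join quotes each argument so the string is safe to hand to a shell
print(shlex.join(args))
```

In a notebook cell, `!{shlex.join(args)}` would run the command while still streaming the pipeline's output.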

## 📊 Step 8: View Training Results

In [None]:
import json
import matplotlib.pyplot as plt
from PIL import Image
from pathlib import Path

# Load training history
history_path = Path('output/training/training_history.json')
if history_path.exists():
    with open(history_path, 'r') as f:
        history = json.load(f)
    
    # Plot training curves
    fig, (ax1, ax2) = plt.subplots(1, 2, figsize=(14, 5))
    
    epochs = range(1, len(history['train_losses']) + 1)
    
    # Loss
    ax1.plot(epochs, history['train_losses'], 'b-', label='Train Loss')
    ax1.plot(epochs, history['val_losses'], 'r-', label='Val Loss')
    ax1.set_xlabel('Epoch')
    ax1.set_ylabel('Loss')
    ax1.set_title('Training and Validation Loss')
    ax1.legend()
    ax1.grid(True)
    
    # Accuracy
    ax2.plot(epochs, history['train_accs'], 'b-', label='Train Acc')
    ax2.plot(epochs, history['val_accs'], 'r-', label='Val Acc')
    ax2.set_xlabel('Epoch')
    ax2.set_ylabel('Accuracy (%)')
    ax2.set_title('Training and Validation Accuracy')
    ax2.legend()
    ax2.grid(True)
    
    plt.tight_layout()
    plt.show()
    
    print(f"\nBest Validation Accuracy: {history['best_val_acc']:.2f}% (Epoch {history['best_epoch']})")
else:
    print("Training history not found. Run training first.")
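
The curves can also be summarised numerically. The sketch below recomputes the best epoch from `val_accs` and reports the train/validation accuracy gap at that epoch, which gives a rough indication of overfitting. It assumes the history holds equal-length per-epoch lists and counts epochs from 1, as the plot above does; `summarize_history` is a hypothetical helper and the example values are made up.

```python
def summarize_history(history):
    """Return (best_epoch, best_val_acc, train_minus_val_gap) from per-epoch lists."""
    val_accs = history['val_accs']
    train_accs = history['train_accs']
    if not val_accs or len(val_accs) != len(train_accs):
        raise ValueError("expected non-empty accuracy lists of equal length")
    # Index of the first epoch that reaches the highest validation accuracy
    best_index = max(range(len(val_accs)), key=val_accs.__getitem__)
    gap = train_accs[best_index] - val_accs[best_index]
    return best_index + 1, val_accs[best_index], gap


example = {'train_accs': [70.0, 85.0, 95.0], 'val_accs': [68.0, 82.0, 80.0]}
print(summarize_history(example))  # (2, 82.0, 3.0)
```

Comparing the returned epoch with `history['best_epoch']` is a quick consistency check, bearing in mind that the pipeline may count epochs from 0.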

## 🔬 Step 9: View Inference Results

In [None]:
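# Imports repeated so this cell also runs on its own
import matplotlib.pyplot as plt
from PIL import Image
from pathlib import Path

# Show inference predictions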
pred_path = Path('output/inference/predictions.png')
if pred_path.exists():
    print("Inference Predictions:")
    print("="*60)
    
    img = Image.open(pred_path)
    plt.figure(figsize=(12, 8))
    plt.imshow(img)
    plt.axis('off')
    plt.title('Predicted Plastic Types (11 Classes)')
    plt.tight_layout()
    plt.show()
else:
    print("Predictions not found. Run inference first.")

## 📈 Step 10: View Inference Statistics

In [None]:
# Imports repeated so this cell also runs on its own
import json
from pathlib import Path

# Load inference statistics
stats_path = Path('output/inference/inference_statistics.json')
if stats_path.exists():
    with open(stats_path, 'r') as f:
        stats = json.load(f)
    
    print("Inference Statistics:")
    print("="*60)
    print(f"Total pixels: {stats['total_pixels']:,}\n")
    print("Class Distribution (11 classes total):")
    print("-"*60)
    print(f"{'Class':<15} {'Pixels':>12} {'Percentage':>12} {'Confidence':>12}")
    print("-"*60)
    
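    # JSON object keys are strings; sort numeric IDs as numbers so "10" follows "9"
    def class_sort_key(item):
        key = item[0]
        return (0, int(key), "") if key.isdigit() else (1, 0, key)

    for class_id, info in sorted(stats['class_distribution'].items(), key=class_sort_key):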
        print(f"{info['class_name']:<15} {info['pixel_count']:>12,} {info['percentage']:>11.2f}% {info['mean_confidence']:>11.3f}")
else:
    print("Statistics not found. Run inference first.")
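
Before reading much into the table, it can help to confirm that the numbers are internally consistent. The sketch below checks that per-class pixel counts add up to `total_pixels` and that the percentages sum to roughly 100. It assumes every pixel is assigned to exactly one class and that `percentage` is on a 0 to 100 scale (as the `%` suffix above suggests); `check_statistics` is a hypothetical helper and the example data is made up.

```python
import math


def check_statistics(stats, tolerance=0.1):
    """Return True when per-class counts and percentages agree with the totals."""
    classes = stats['class_distribution'].values()
    pixel_sum = sum(info['pixel_count'] for info in classes)
    percent_sum = sum(info['percentage'] for info in classes)
    counts_ok = pixel_sum == stats['total_pixels']
    # Allow for rounding in the stored percentages
    percents_ok = math.isclose(percent_sum, 100.0, abs_tol=tolerance)
    return counts_ok and percents_ok


example_stats = {
    'total_pixels': 400,
    'class_distribution': {
        '0': {'pixel_count': 300, 'percentage': 75.0},
        '1': {'pixel_count': 100, 'percentage': 25.0},
    },
}
print(check_statistics(example_stats))  # True
```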

## 💾 Step 11: Download Results (Optional)

In [None]:
# Create ZIP of results
!zip -r results.zip output/

# Download to local machine
from google.colab import files
files.download('results.zip')

print("✓ Results downloaded as results.zip")
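
Browser downloads from Colab can fail for large archives, so writing the ZIP to Drive is a useful fallback. The sketch below creates a timestamped archive with `shutil.make_archive`. The helper name `archive_results` and the naming scheme are assumptions, not part of the repository.

```python
import shutil
from datetime import datetime
from pathlib import Path


def archive_results(output_dir="output", dest_dir=".", label="results"):
    """Zip `output_dir` into `dest_dir` under a timestamped name; return the archive path."""
    output_dir = Path(output_dir).resolve()
    if not output_dir.is_dir():
        raise FileNotFoundError(f"{output_dir} not found; run the pipeline first")
    stamp = datetime.now().strftime("%Y%m%d_%H%M%S")
    base = Path(dest_dir).resolve() / f"{label}_{stamp}"
    # make_archive appends ".zip" and returns the path of the file it wrote
    archive = shutil.make_archive(str(base), "zip",
                                  root_dir=str(output_dir.parent),
                                  base_dir=output_dir.name)
    return Path(archive)
```

With Drive mounted as in Step 2, `archive_results(dest_dir=DATA_PATH)` would place the archive next to the data folders.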

## 🧹 Step 12: Cleanup (Optional)

Remove large temporary files to free up space. Skip this cell if you still plan to run the Quick Experiments below, which pass `--skip-normalize` and expect the normalized data to exist.

In [None]:
# Remove normalized data (can be regenerated)
!rm -rf training_dataset_normalized/
!rm -rf Inference_dataset1_normalized/

print("✓ Normalized data removed (can be regenerated)")
print("  Model and results are preserved in output/")

## 🔬 Quick Experiments: Try Different Models

After normalizing once, quickly try different models:

In [None]:
# Try different models quickly (data already normalized)
models = ["cnn", "resnet", "deep", "inception", "lstm", "transformer"]

for model in models:
    print(f"\n{'='*60}")
    print(f"Training with {model.upper()} model")
    print(f"{'='*60}")
    
    cmd = f"python run_pipeline_config.py --mode train --skip-normalize "
    cmd += f"--model-type {model} --epochs 20 --batch-size 512"
    
    !{cmd}
    
    print(f"\n✓ {model.upper()} training complete\n")
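
To compare the runs afterwards, each model's history can be saved under its own name. The sketch below assumes every run writes `output/training/training_history.json` (the file read in Step 8) and that a later run overwrites it; the source confirms neither the overwrite nor the `output/experiments` folder, which is a hypothetical location. `snapshot_history` and `rank_models` are hypothetical helpers.

```python
import json
import shutil
from pathlib import Path

HISTORY_SUFFIX = "_history.json"


def snapshot_history(model, history_path="output/training/training_history.json",
                     dest_dir="output/experiments"):
    """Copy the latest training history to a per-model file and return its path."""
    dest = Path(dest_dir)
    dest.mkdir(parents=True, exist_ok=True)
    target = dest / f"{model}{HISTORY_SUFFIX}"
    shutil.copyfile(history_path, target)
    return target


def rank_models(dest_dir="output/experiments"):
    """Return (model, best_val_acc) pairs sorted from best to worst."""
    results = []
    for path in Path(dest_dir).glob(f"*{HISTORY_SUFFIX}"):
        with open(path) as f:
            history = json.load(f)
        model = path.name[:-len(HISTORY_SUFFIX)]
        results.append((model, history["best_val_acc"]))
    return sorted(results, key=lambda pair: pair[1], reverse=True)
```

Calling `snapshot_history(model)` right after `!{cmd}` inside the loop, then `rank_models()` once the loop finishes, would give a simple leaderboard.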