# Hyperspectral Material Classification - 1D CNN Approach

This notebook trains and evaluates a 1D CNN model for hyperspectral material classification.

**Features:**
- 1D CNN architecture (processes spectral signatures pixel-by-pixel)
- Band-wise percentile normalization (2-98%)
- Gaussian denoising after normalization
- Spectral augmentation for better generalization
- Post-processing of predictions (small-region removal, median filtering, morphological operations)
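
To make "pixel-by-pixel" concrete, here is a minimal sketch of how a hyperspectral cube can be flattened into per-pixel spectra for a 1D CNN and how flat predictions can be folded back into an image. The function names `cube_to_spectra` and `predictions_to_map` are hypothetical and the `(H, W, B)` layout is an assumption; the repository's own loader may organize the data differently.

```python
import numpy as np


def cube_to_spectra(cube: np.ndarray) -> np.ndarray:
    """Flatten an (H, W, B) cube into (H*W, 1, B) per-pixel spectra.

    The singleton axis is the channel dimension a 1D convolution expects.
    """
    if cube.ndim != 3:
        raise ValueError(f"expected (H, W, B), got shape {cube.shape}")
    height, width, bands = cube.shape
    return cube.reshape(height * width, 1, bands)


def predictions_to_map(pred, height: int, width: int) -> np.ndarray:
    """Reshape a flat vector of per-pixel class ids back into an (H, W) map."""
    return np.asarray(pred).reshape(height, width)


# Toy cube: 4 x 5 pixels with 16 spectral bands (illustrative sizes only).
cube = np.random.default_rng(0).random((4, 5, 16))
spectra = cube_to_spectra(cube)
print(spectra.shape)  # (20, 1, 16)
```

Because each pixel is classified independently, spatial context is not used by the model itself, which is one reason the spatial post-processing steps matter.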

**Before running:**
1. Runtime → Change runtime type → GPU (A100/V100 for Pro+, T4 for free tier)
2. Upload your data to Google Drive in folder: `dl-plastics-data`
3. Run cells in order

**Note:** All models are saved to Google Drive automatically!

## 1. Setup Environment

In [None]:
# Check GPU availability
!nvidia-smi

In [None]:
# Clone repository
!git clone https://github.com/PlugNawapong/my-ml-project.git
%cd my-ml-project
!pwd

In [None]:
# Install dependencies
!pip install -q torch torchvision tqdm Pillow numpy matplotlib scikit-learn scipy scikit-image

## 2. Mount Google Drive and Load Data

In [None]:
# Mount Google Drive
from google.colab import drive
drive.mount('/content/drive')

# Create output directories
!mkdir -p /content/drive/MyDrive/dl-plastics-models
!mkdir -p /content/drive/MyDrive/dl-plastics-predictions

print('✓ Google Drive mounted')
print('✓ Output directories created')

In [None]:
# Copy data from Google Drive to Colab workspace
import os

# Path to your data in Google Drive
drive_data_path = '/content/drive/MyDrive/dl-plastics-data'

# Verify data exists
if not os.path.exists(drive_data_path):
    print(f'⚠ ERROR: {drive_data_path} not found!')
    print('Please upload your data to Google Drive first.')
else:
    print(f'✓ Data folder found: {drive_data_path}')
    !ls -la {drive_data_path}

# Copy training data
if os.path.exists(f'{drive_data_path}/data'):
    !cp -r {drive_data_path}/data ./
    print('✓ Training data copied')
else:
    print('⚠ Training data not found')

# Copy inference datasets
if os.path.exists(f'{drive_data_path}/inference_data_set1'):
    !cp -r {drive_data_path}/inference_data_set1 ./
    print('✓ Inference dataset 1 copied')

if os.path.exists(f'{drive_data_path}/inference_data_set2'):
    !cp -r {drive_data_path}/inference_data_set2 ./
    print('✓ Inference dataset 2 copied')

print('\n=== Files in workspace ===')
!ls -la

## 3. Inspect Data (Optional)

In [None]:
# Inspect training data
!python inspect_data.py --data_dir data

# Display generated plots
from IPython.display import Image, display

print('\n=== Band Visualization ===')
display(Image('data_inspection_bands.png'))

print('\n=== Label Visualization ===')
display(Image('data_inspection_labels.png'))

print('\n=== Spectral Signatures ===')
display(Image('data_inspection_spectra_normalized.png'))

## 4. Train 1D CNN Model

**Training configuration:**
- Model: 1D CNN (spectral_cnn_1d)
- Normalization: Band-wise percentile (2-98%) + Gaussian denoising
- Augmentation: Medium spectral augmentation
- Epochs: 100
- Batch size: 4096
- Dropout: 0.6 (high regularization for generalization)

**Training time:** ~10-15 minutes on A100, ~30-40 minutes on T4
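
The normalization step can be sketched roughly as follows. This is an illustration, not the code in `train.py`: the function names are hypothetical, and it assumes the Gaussian denoising runs along the spectral axis with `sigma=0.5`. Whether the repository computes percentiles per image or across the whole training set is also not confirmed here.

```python
import numpy as np


def percentile_normalize(spectra, low=2.0, high=98.0, eps=1e-8):
    """Scale each band to [0, 1] using that band's own low/high percentiles.

    spectra: array of shape (n_pixels, n_bands).
    """
    lo = np.percentile(spectra, low, axis=0, keepdims=True)
    hi = np.percentile(spectra, high, axis=0, keepdims=True)
    scaled = (spectra - lo) / np.maximum(hi - lo, eps)
    # Values outside the 2-98% range are clipped, which limits outlier impact.
    return np.clip(scaled, 0.0, 1.0)


def gaussian_smooth_spectra(spectra, sigma=0.5):
    """Smooth every spectrum along the band axis with a Gaussian kernel."""
    radius = max(1, int(np.ceil(3 * sigma)))
    offsets = np.arange(-radius, radius + 1)
    kernel = np.exp(-0.5 * (offsets / sigma) ** 2)
    kernel /= kernel.sum()
    # Edge padding keeps the number of bands unchanged after convolution.
    padded = np.pad(spectra, ((0, 0), (radius, radius)), mode="edge")
    return np.stack([np.convolve(row, kernel, mode="valid") for row in padded])


rng = np.random.default_rng(0)
raw = rng.random((1000, 32)) * 4095.0  # synthetic stand-in for sensor counts
clean = gaussian_smooth_spectra(percentile_normalize(raw), sigma=0.5)
print(clean.shape)  # (1000, 32)
```

Clipping to the 2-98% range makes the scaling robust to a few saturated or dead pixels, at the cost of discarding detail in the extreme tails.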

In [None]:
# Train 1D CNN model
!python train.py \
    --model spectral_cnn_1d \
    --epochs 100 \
    --batch_size 4096 \
    --max_samples_per_class 20000 \
    --dropout 0.6 \
    --lr 0.001 \
    --spectral_augment medium \
    --norm_method percentile \
    --output_dir /content/drive/MyDrive/dl-plastics-models \
    --num_workers 0

print('\n✓ Training command finished. If the log above shows no errors, the model is saved to Google Drive.')

## 5. Find Trained Model

In [None]:
# Find the latest trained model
import glob
import os
import re

# Search for 1D CNN models only
model_files = glob.glob('/content/drive/MyDrive/dl-plastics-models/spectral_cnn_1d_*/best_model.pth')

print(f'Found {len(model_files)} model(s):')
for m in model_files:
    print(f'  - {m}')

if model_files:
    # Sort by timestamp
    def extract_timestamp(path):
        match = re.search(r'(\d{8}_\d{6})', path)
        if match:
            return match.group(1)
        return '00000000_000000'

    latest_model = sorted(model_files, key=extract_timestamp)[-1]
    print(f'\n✓ Using latest model: {latest_model}')
    print(f'Timestamp: {extract_timestamp(latest_model)}')
    
    model_type = 'spectral_cnn_1d'
else:
    print('\n⚠ No trained model found!')
    print('Please run the training cell first.')

## 6. Run Inference on Dataset 1

In [None]:
# Run inference on dataset 1 (no post-processing needed - usually clean)
if model_files:
    print('Running inference on dataset 1...')
    !python inference.py \
        --checkpoint {latest_model} \
        --model {model_type} \
        --data_dir /content/my-ml-project/inference_data_set1 \
        --norm_method percentile \
        --output_dir /content/drive/MyDrive/dl-plastics-predictions

    print('\n✓ Inference complete for dataset 1!')
else:
    print('⚠ No model available for inference')

## 7. Run Inference on Dataset 2 (with Post-Processing)

**Post-processing configuration:**
- Remove small regions < 50 pixels (noise reduction)
- Median filter for isolated pixel noise
- Morphological operations per class
- Edge detection for material boundaries
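
As an illustration of the first step, small-region removal can be sketched with a plain flood fill. This is not the implementation in `inference.py`: the function name `remove_small_regions`, the use of 4-connectivity, and relabeling to a background class `0` are all assumptions made for the example.

```python
from collections import deque

import numpy as np


def remove_small_regions(label_map, min_region_size=50, fill_value=0):
    """Relabel 4-connected regions smaller than min_region_size to fill_value."""
    labels = np.asarray(label_map)
    height, width = labels.shape
    cleaned = labels.copy()
    visited = np.zeros(labels.shape, dtype=bool)

    for r0 in range(height):
        for c0 in range(width):
            if visited[r0, c0]:
                continue
            cls = labels[r0, c0]
            visited[r0, c0] = True
            region = [(r0, c0)]
            queue = deque(region)
            # Breadth-first flood fill over same-class neighbours.
            while queue:
                r, c = queue.popleft()
                for nr, nc in ((r - 1, c), (r + 1, c), (r, c - 1), (r, c + 1)):
                    inside = 0 <= nr < height and 0 <= nc < width
                    if inside and not visited[nr, nc] and labels[nr, nc] == cls:
                        visited[nr, nc] = True
                        region.append((nr, nc))
                        queue.append((nr, nc))
            if cls != fill_value and len(region) < min_region_size:
                rows, cols = zip(*region)
                cleaned[list(rows), list(cols)] = fill_value
    return cleaned


demo = np.zeros((6, 6), dtype=int)
demo[0:3, 0:3] = 1  # 9-pixel region, large enough to keep
demo[5, 5] = 2      # isolated pixel, treated as noise
cleaned = remove_small_regions(demo, min_region_size=4)
print(cleaned)
```

For full-size images, a library routine such as `scipy.ndimage.label` would be much faster than this pure-Python loop; the sketch only shows the idea behind the `--min_region_size` threshold.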

In [None]:
# Run inference on dataset 2 with industry-standard post-processing
if model_files:
    print('Running inference on dataset 2...')
    !python inference.py \
        --checkpoint {latest_model} \
        --model {model_type} \
        --data_dir /content/my-ml-project/inference_data_set2 \
        --norm_method percentile \
        --post_process \
        --min_region_size 50 \
        --smooth_sigma 0.8 \
        --show_edges \
        --output_dir /content/drive/MyDrive/dl-plastics-predictions
    
    print('\n✓ Inference complete for dataset 2!')
    print('\nGenerated visualizations:')
    print('  - prediction_visualization.png (raw)')
    print('  - prediction_filtered_visualization.png (cleaned)')
    print('  - prediction_enhanced_labeled.png (with edges & legend)')
else:
    print('⚠ No model available for inference')

## 8. Visualize Results

In [None]:
# Display prediction visualizations
from IPython.display import Image, display
import json
import os  # imported here too so this cell runs on its own

# Dataset 1
print('\n' + '='*80)
print('INFERENCE DATA SET 1 RESULTS')
print('='*80)

pred_path_1 = '/content/drive/MyDrive/dl-plastics-predictions/inference_data_set1/prediction_visualization.png'
stats_path_1 = '/content/drive/MyDrive/dl-plastics-predictions/inference_data_set1/statistics.json'

if os.path.exists(pred_path_1):
    display(Image(pred_path_1))
    
    # The statistics file is checked separately so a missing file does not crash the cell
    if os.path.exists(stats_path_1):
        with open(stats_path_1, 'r') as f:
            stats1 = json.load(f)

        print(f"\nMean Confidence: {stats1['mean_confidence']:.4f}")
        print("\nClass Distribution:")
        for class_name, class_stats in stats1['class_distribution'].items():
            if class_stats['percentage'] > 0.01:
                print(f"  {class_name:<15}: {class_stats['percentage']:>6.2f}% (conf: {class_stats['mean_confidence']:.4f})")
    else:
        print('⚠ statistics.json not found for dataset 1.')
else:
    print('⚠ Results not found. Run inference first.')

# Dataset 2
print('\n' + '='*80)
print('INFERENCE DATA SET 2 RESULTS')
print('='*80)

pred_path_2_raw = '/content/drive/MyDrive/dl-plastics-predictions/inference_data_set2/prediction_visualization.png'
pred_path_2_filtered = '/content/drive/MyDrive/dl-plastics-predictions/inference_data_set2/prediction_filtered_visualization.png'
pred_path_2_enhanced = '/content/drive/MyDrive/dl-plastics-predictions/inference_data_set2/prediction_enhanced_labeled.png'
stats_path_2 = '/content/drive/MyDrive/dl-plastics-predictions/inference_data_set2/statistics.json'

if os.path.exists(pred_path_2_raw):
    print('\nRaw Predictions:')
    display(Image(pred_path_2_raw))
    
    if os.path.exists(pred_path_2_filtered):
        print('\nPost-Processed (Cleaned):')
        display(Image(pred_path_2_filtered))
    
    if os.path.exists(pred_path_2_enhanced):
        print('\nEnhanced (with Edges & Legend):')
        display(Image(pred_path_2_enhanced))
    
    # The statistics file is checked separately so a missing file does not crash the cell
    if os.path.exists(stats_path_2):
        with open(stats_path_2, 'r') as f:
            stats2 = json.load(f)

        print(f"\nMean Confidence: {stats2['mean_confidence']:.4f}")
        print("\nClass Distribution:")
        for class_name, class_stats in stats2['class_distribution'].items():
            if class_stats['percentage'] > 0.01:
                print(f"  {class_name:<15}: {class_stats['percentage']:>6.2f}% (conf: {class_stats['mean_confidence']:.4f})")
    else:
        print('⚠ statistics.json not found for dataset 2.')
else:
    print('⚠ Results not found. Run inference first.')

## 9. Download Results (Optional)

Results are already saved to Google Drive, but you can download them to your computer if needed.

In [None]:
# Copy results from Google Drive to local workspace
!cp -r /content/drive/MyDrive/dl-plastics-models ./models-backup
!cp -r /content/drive/MyDrive/dl-plastics-predictions ./predictions-backup

# Zip all results
!zip -r results.zip models-backup/ predictions-backup/

# Download to your computer
from google.colab import files
files.download('results.zip')

print('\n✓ Results downloaded!')
print('  - models-backup/ : Trained models and training history')
print('  - predictions-backup/ : Inference results and visualizations')

## 10. Summary

**Your models and results are saved in Google Drive:**
- Models: `/MyDrive/dl-plastics-models/`
- Predictions: `/MyDrive/dl-plastics-predictions/`

**Key Features Applied:**
1. ✅ Band-wise percentile normalization (2-98%)
2. ✅ Gaussian denoising (sigma=0.5) after normalization
3. ✅ Spectral augmentation (medium)
4. ✅ Industry-standard post-processing:
   - Connected component analysis
   - Median filtering
   - Morphological operations

**Next Steps:**
- Compare raw vs. post-processed results
- Adjust `min_region_size` if needed (smaller = keep more detail, larger = more aggressive noise removal)
- Train longer (increase `--epochs`) if results are not satisfactory
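
One way to compare raw and post-processed results quantitatively is to measure how many pixels changed class. The sketch below assumes both predictions are available as integer label arrays of the same shape; the notebook itself only displays PNG visualizations, so loading such arrays is left open, and `compare_label_maps` is a hypothetical helper.

```python
import numpy as np


def compare_label_maps(raw, cleaned):
    """Return the overall and per-class fraction of pixels changed by post-processing."""
    raw = np.asarray(raw)
    cleaned = np.asarray(cleaned)
    if raw.shape != cleaned.shape:
        raise ValueError("label maps must have the same shape")
    changed = raw != cleaned
    per_class = {}
    for cls in np.unique(raw):
        mask = raw == cls
        # Fraction of pixels originally predicted as `cls` that were relabeled.
        per_class[int(cls)] = float(changed[mask].mean())
    return float(changed.mean()), per_class


raw_map = np.array([[1, 1, 0],
                    [1, 2, 0]])
cleaned_map = np.array([[1, 1, 0],
                        [1, 1, 0]])
overall, per_class = compare_label_maps(raw_map, cleaned_map)
print(f"changed overall: {overall:.3f}")
print(per_class)
```

A class whose pixels are almost entirely relabeled may indicate that `min_region_size` is too aggressive for that material.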

You can access results anytime from Google Drive!