# Wetland Mapping - GPU SVM Training on Google Colab

This notebook trains a **Support Vector Machine (SVM)** with an **RBF Kernel** using NVIDIA GPUs via the **RAPIDS cuML** library. 

**Why this notebook?**
- **Speed**: Trains on 1.5M samples in minutes on a GPU, where a CPU-based kernel SVM can take days.
- **Accuracy**: Uses the true non-linear RBF kernel (no approximations).

## 1. Environment Setup
We need to install RAPIDS cuML. This takes about 4-5 minutes.

In [None]:
# Install RAPIDS cuML (This script checks compatibility and installs the right version)
!git clone https://github.com/rapidsai/rapidsai-csp-utils.git
!python rapidsai-csp-utils/colab/pip-install.py
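
Since cuML needs an NVIDIA GPU, it is worth confirming that the Colab runtime has one attached (Runtime > Change runtime type). The sketch below is one way to check, assuming the `nvidia-smi` tool is on the PATH of GPU runtimes; `gpu_available` is a hypothetical helper name, not part of RAPIDS.

```python
import shutil
import subprocess


def gpu_available() -> bool:
    """Return True if nvidia-smi exists and lists at least one GPU."""
    exe = shutil.which("nvidia-smi")
    if exe is None:
        return False
    try:
        # `nvidia-smi -L` prints one line per GPU, e.g. "GPU 0: ...".
        result = subprocess.run(
            [exe, "-L"], capture_output=True, text=True, timeout=30
        )
    except (OSError, subprocess.SubprocessError):
        return False
    return result.returncode == 0 and "GPU" in result.stdout


HAS_GPU = gpu_available()
print("GPU detected" if HAS_GPU else "No GPU detected: switch the runtime to a GPU type")
```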

## 2. Mount Google Drive
Make sure you have uploaded `wetland_dataset_1.5M_4Training.npz` to your Google Drive.

In [None]:
from google.colab import drive
drive.mount('/content/drive')

# Check where the file is (Update this path if you put it in a folder!)
# Example: '/content/drive/MyDrive/Wetlands/wetland_dataset_1.5M_4Training.npz'
import os
DATA_PATH = '/content/drive/MyDrive/wetland_dataset_1.5M_4Training.npz'

if not os.path.exists(DATA_PATH):
    print(f"WARNING: File not found at {DATA_PATH}. Please verify the path.")
else:
    print(f"File found at {DATA_PATH}")
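
The loading step in the next section expects the archive to contain arrays named `X` and `y`. A quick way to verify that is to list the archive's keys first. The sketch below writes a tiny stand-in archive to a temporary file (the file name, shapes, and values are made up for illustration) and inspects it the same way you could inspect `DATA_PATH`.

```python
import os
import tempfile

import numpy as np

# Tiny stand-in archive; the real dataset is far larger.
demo_path = os.path.join(tempfile.mkdtemp(), "demo_dataset.npz")
np.savez(
    demo_path,
    X=np.zeros((10, 4), dtype=np.float32),
    y=np.zeros(10, dtype=np.int32),
)

# Opening an .npz only reads its index; each array is read when indexed.
with np.load(demo_path) as archive:
    keys = sorted(archive.files)
    shapes = {name: archive[name].shape for name in keys}

print(keys)
print(shapes)
```

On the real file, listing `archive.files` is cheap, while indexing an array reads it from disk in full.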

## 3. Load Data & Preprocessing

In [None]:
import numpy as np
from cuml.preprocessing import StandardScaler  # GPU-accelerated scaler

print("Loading data...")
data = np.load(DATA_PATH)
X_np = data['X'] 
y_np = data['y']
# Note: no per-sample weights are loaded here. The SVC below uses class_weight='balanced'
# instead of a manually built weight dict, which avoids key-type mismatches.
data.close()

print(f"Data Shape: {X_np.shape}")

# Split on the CPU with scikit-learn while the data are still NumPy arrays;
# cuML copies the arrays to the GPU when it scales and fits.
from sklearn.model_selection import train_test_split
X_train, X_test, y_train, y_test = train_test_split(X_np, y_np, test_size=0.2, random_state=42)

# Standard Scaling (Critical for SVM)
scaler = StandardScaler()
X_train_scaled = scaler.fit_transform(X_train)
X_test_scaled = scaler.transform(X_test)

# Required for cuML: features must be float32 and labels int32.
# Labels must also be flattened to a 1-D array.
X_train_scaled = X_train_scaled.astype('float32')
X_test_scaled = X_test_scaled.astype('float32')
y_train = y_train.astype('int32').ravel()
y_test = y_test.astype('int32').ravel()

print(f"Unique classes in y_train: {np.unique(y_train)}")
print("Data loaded, scaled, and cast to float32/int32.")
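
Standard scaling subtracts each feature's training-set mean and divides by its training-set standard deviation. The test set reuses those same statistics, so no information leaks from it into training. The NumPy sketch below shows that arithmetic on made-up numbers; it illustrates the idea and is not cuML's implementation.

```python
import numpy as np

# Two features on very different scales (made-up values).
train = np.array([[1.0, 100.0], [2.0, 200.0], [3.0, 300.0]], dtype=np.float32)
test = np.array([[2.0, 400.0]], dtype=np.float32)

# Statistics come from the training set only.
mean = train.mean(axis=0)
std = train.std(axis=0)  # population standard deviation (ddof=0)

train_scaled = (train - mean) / std
test_scaled = (test - mean) / std  # reuse the training statistics

print(train_scaled.mean(axis=0))  # close to 0 for every feature
print(train_scaled.std(axis=0))   # close to 1 for every feature
print(test_scaled)
```

Without this step, the second feature would dominate the squared distances inside the RBF kernel simply because its values are larger.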

## 4. Train SVM (RBF Kernel) on GPU
This is where the magic happens. `cuml.svm.SVC` uses the GPU.
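
For reference, the RBF kernel scores the similarity of two samples as `K(x, x') = exp(-gamma * ||x - x'||^2)`. With `gamma='scale'`, scikit-learn sets `gamma = 1 / (n_features * X.var())`, and cuML documents the same convention. The NumPy sketch below computes both on toy data; `rbf_kernel_matrix` is a hypothetical helper for illustration, not a cuML function.

```python
import numpy as np


def rbf_kernel_matrix(A, B, gamma):
    """Pairwise RBF kernel between the rows of A and the rows of B."""
    # Squared Euclidean distances via ||a||^2 + ||b||^2 - 2 a.b
    sq_dist = (A**2).sum(axis=1)[:, None] + (B**2).sum(axis=1)[None, :] - 2.0 * A @ B.T
    sq_dist = np.maximum(sq_dist, 0.0)  # guard against tiny negative round-off
    return np.exp(-gamma * sq_dist)


X_demo = np.array([[0.0, 0.0], [1.0, 0.0], [0.0, 2.0]])

# gamma='scale' convention: 1 / (n_features * variance of all entries of X)
gamma_scale = 1.0 / (X_demo.shape[1] * X_demo.var())

K = rbf_kernel_matrix(X_demo, X_demo, gamma_scale)
print(gamma_scale)
print(K.round(3))
```

Each sample has similarity 1 with itself, and the similarity decays toward 0 as samples move apart.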

In [None]:
from cuml.svm import SVC
import time

print("Initializing GPU SVM...")
model = SVC(
    kernel='rbf',
    C=1.0,           # Regularization
    gamma='scale',   # Kernel coefficient
    class_weight='balanced', # Automatically balance weights based on y_train frequency
    cache_size=2000  # Kernel cache size in MiB, held in GPU memory
)

print("Starting training...")
start = time.time()
model.fit(X_train_scaled, y_train) 
end = time.time()

print(f"Training completed in {end - start:.2f} seconds!")
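
`class_weight='balanced'` follows the scikit-learn convention of weighting each class by `n_samples / (n_classes * count)`, so that rare classes are penalized more heavily when misclassified. The NumPy sketch below applies that formula to made-up labels; `balanced_class_weights` is a hypothetical helper, not part of cuML.

```python
import numpy as np


def balanced_class_weights(y):
    """Return {class: n_samples / (n_classes * class_count)}."""
    classes, counts = np.unique(y, return_counts=True)
    weights = y.size / (classes.size * counts)
    return dict(zip(classes.tolist(), weights.tolist()))


# Made-up imbalanced labels: six of class 0, three of class 1, one of class 2.
y_demo = np.array([0, 0, 0, 0, 0, 0, 1, 1, 1, 2], dtype=np.int32)
weights = balanced_class_weights(y_demo)
print(weights)
```

Running the same helper on `y_train` shows how strongly each wetland class is up- or down-weighted during training.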

## 5. Evaluate

In [None]:
from sklearn.metrics import classification_report, accuracy_score, confusion_matrix

print("Predicting...")
# Note: cuML mirrors the input type by default, so NumPy input yields a NumPy array of predictions
y_pred = model.predict(X_test_scaled)

# Calculate metrics (using CPU sklearn for reporting is easier as result size is small)
accuracy = accuracy_score(y_test, y_pred)
print(f"Accuracy: {accuracy:.4f}")

print("\nClassification Report:")
print(classification_report(y_test, y_pred))

print("\nConfusion Matrix:")
print(confusion_matrix(y_test, y_pred))
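
In the confusion matrix printed above, rows are true classes and columns are predicted classes (scikit-learn's convention), so per-class recall and precision can be read directly off the diagonal. A NumPy sketch on a made-up 3-class matrix:

```python
import numpy as np

# Made-up confusion matrix: rows = true class, columns = predicted class.
cm_demo = np.array([
    [50,  5,  0],
    [10, 30,  5],
    [ 0,  5, 20],
])

correct = np.diag(cm_demo)
recall = correct / cm_demo.sum(axis=1)     # share of each true class that was found
precision = correct / cm_demo.sum(axis=0)  # share of each predicted class that was right
overall_accuracy = correct.sum() / cm_demo.sum()

print(recall.round(3))
print(precision.round(3))
print(overall_accuracy)
```

With imbalanced classes, per-class recall is often more informative than overall accuracy, which a dominant class can inflate.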

## 6. Save Model
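We save the trained model and the fitted scaler back to Google Drive so you can download them. New data must be passed through the same scaler before prediction.

In [None]:
import joblib
from datetime import datetime

timestamp = datetime.now().strftime('%Y%m%d_%H%M%S')
model_filename = f'/content/drive/MyDrive/svm_gpu_wetland_model_{timestamp}.pkl'
scaler_filename = f'/content/drive/MyDrive/svm_gpu_wetland_scaler_{timestamp}.pkl'

print(f"Saving model to {model_filename}...")
joblib.dump(model, model_filename)
print(f"Saving scaler to {scaler_filename}...")
joblib.dump(scaler, scaler_filename)  # needed to scale new data exactly as in training
print("Done!")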
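
To reuse a saved file later, load it with `joblib.load` in an environment that has a compatible cuML version installed, since unpickling needs the original classes to be importable. The round trip looks like the sketch below; a plain dictionary stands in for the trained model so that the example runs without a GPU, and the temporary path is illustrative.

```python
import os
import tempfile

import joblib

# Stand-in object; in the notebook this would be the trained `model`.
stand_in_model = {"kernel": "rbf", "C": 1.0}

demo_file = os.path.join(tempfile.mkdtemp(), "demo_model.pkl")
joblib.dump(stand_in_model, demo_file)

# Later, possibly in a new session:
restored = joblib.load(demo_file)
print(restored == stand_in_model)
```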