# Classification with Random Forest and SVM

This notebook demonstrates how to perform land cover classification using Random Forest (RF) and Support Vector Machine (SVM) classifiers with `scikit-learn` in Python. We use a multi-band raster and labeled training data (e.g., a shapefile) for supervised classification in a remote sensing context.

## Prerequisites
- Install required libraries: `rasterio`, `geopandas`, `scikit-learn`, `numpy`, `matplotlib` (listed in `requirements.txt`).
- A multi-band GeoTIFF file (e.g., `sample.tif`) and a shapefile with labeled data (e.g., `labels.shp`). Replace file paths with your own data.

## Learning Objectives
- Extract training data from a raster using labeled vector data.
- Train RF and SVM classifiers for land cover classification.
- Predict and visualize classification results.

In [None]:
# Import required libraries
import rasterio
import geopandas as gpd
import numpy as np
import matplotlib.pyplot as plt
from sklearn.ensemble import RandomForestClassifier
from sklearn.svm import SVC
from sklearn.metrics import accuracy_score

## Step 1: Load Raster and Labeled Data

Load the multi-band raster and shapefile with labeled data (e.g., land cover classes).

In [None]:
# Define file paths
raster_path = 'sample.tif'
shapefile_path = 'labels.shp'

# Load the shapefile
gdf = gpd.read_file(shapefile_path)

# Load the raster
with rasterio.open(raster_path) as src:
    raster_data = src.read()  # Shape: (bands, height, width)
    raster_crs = src.crs
    raster_transform = src.transform

# Reproject shapefile to match raster CRS if needed
if gdf.crs != raster_crs:
    gdf = gdf.to_crs(raster_crs)

# Print basic information
print(f'Raster shape: {raster_data.shape}')
print(f"Number of classes: {gdf['class'].nunique()}")  # Assumes 'class' column in shapefile

## Step 2: Extract Training Data

Extract pixel values from the raster at the locations of labeled geometries.

In [None]:
from rasterio.features import geometry_mask

# Initialize lists for features and labels
X_train = []
y_train = []

# Extract pixel values for each geometry
out_shape = raster_data.shape[1:]  # (height, width)
for idx, row in gdf.iterrows():
    geom = row.geometry
    label = row['class']  # Assumes 'class' column
    # invert=True marks pixels inside the geometry as True
    mask = geometry_mask([geom], transform=raster_transform, out_shape=out_shape, invert=True)
    # Index all bands at once: (bands, n_pixels) -> (n_pixels, bands)
    features = raster_data[:, mask].T
    X_train.extend(features)
    y_train.extend([label] * len(features))

# Convert to arrays
X_train = np.array(X_train)
y_train = np.array(y_train)

# Print training data info
print(f'Training features shape: {X_train.shape}')
print(f'Training labels shape: {y_train.shape}')
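
The extraction above relies on NumPy boolean indexing: a 2D mask applied to the two spatial axes of a `(bands, height, width)` array returns one column per selected pixel. The following is a minimal sketch on a toy array; `toy_raster`, `toy_mask`, and the class id `7` are made up for illustration and are not part of the notebook's data.

```python
import numpy as np

# Toy raster with 3 bands, 4 rows, 5 columns (values are arbitrary).
toy_raster = np.arange(3 * 4 * 5).reshape(3, 4, 5)

# Hypothetical mask: True marks pixels covered by a labeled polygon.
toy_mask = np.zeros((4, 5), dtype=bool)
toy_mask[1:3, 2:4] = True  # a 2 x 2 block of labeled pixels

# Boolean indexing on the spatial axes keeps the band axis,
# giving (bands, n_pixels); transpose to (n_pixels, bands).
toy_features = toy_raster[:, toy_mask].T
toy_labels = np.full(toy_features.shape[0], 7)  # 7 is a made-up class id

print(toy_features.shape)
print(toy_features[0])  # band values of the first labeled pixel
```

Each row of `toy_features` is one pixel and each column is one band, which is the layout `scikit-learn` estimators expect for `X`.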

## Step 3: Train Random Forest Classifier

Train a Random Forest classifier and predict across the entire raster.

In [None]:
# Initialize and train Random Forest
rf = RandomForestClassifier(n_estimators=100, random_state=42)
rf.fit(X_train, y_train)

# Prepare raster data for prediction
height, width = raster_data.shape[1], raster_data.shape[2]
X_raster = raster_data.transpose(1, 2, 0).reshape(-1, raster_data.shape[0])

# Predict
rf_predictions = rf.predict(X_raster)
rf_predictions = rf_predictions.reshape(height, width)

# Visualize RF predictions
plt.figure(figsize=(8, 8))
plt.imshow(rf_predictions, cmap='tab10')
plt.colorbar(label='Class')
plt.title('Random Forest Classification')
plt.xlabel('Column')
plt.ylabel('Row')
plt.show()
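
Step 3 predicts a class for every pixel, including any nodata pixels in the raster. One possible way to skip them is sketched below on synthetic data. It assumes a nodata value of `0` in every band (a hypothetical choice; check `src.nodata` for your raster) and reserves class `0` for unclassified pixels. The names `clf`, `demo_raster`, and `NODATA` are illustrative only.

```python
import numpy as np
from sklearn.ensemble import RandomForestClassifier

rng = np.random.default_rng(0)

# Synthetic stand-ins for X_train / y_train (two well-separated classes).
X_demo = np.vstack([rng.normal(10, 1, (50, 3)), rng.normal(20, 1, (50, 3))])
y_demo = np.array([1] * 50 + [2] * 50)
clf = RandomForestClassifier(n_estimators=20, random_state=42).fit(X_demo, y_demo)

# Synthetic 3-band raster of 4 x 5 pixels; the first row is nodata.
NODATA = 0  # assumed nodata value
demo_raster = rng.normal(10, 1, (3, 4, 5))
demo_raster[:, 0, :] = NODATA

# Flatten to (n_pixels, bands), as in Step 3.
pixels = demo_raster.transpose(1, 2, 0).reshape(-1, 3)

# A pixel is treated as valid unless every band equals NODATA.
valid = ~np.all(pixels == NODATA, axis=1)

# Predict only valid pixels; 0 stays in place for unclassified pixels.
classified = np.zeros(pixels.shape[0], dtype=np.uint8)
classified[valid] = clf.predict(pixels[valid])
classified = classified.reshape(4, 5)
print(classified)
```

Skipping nodata pixels also reduces prediction time when large parts of the scene are empty.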

## Step 4: Train SVM Classifier

Train a Support Vector Machine classifier and predict across the raster.

In [None]:
# Initialize and train SVM
svm = SVC(kernel='rbf', random_state=42)
svm.fit(X_train, y_train)

# Predict
svm_predictions = svm.predict(X_raster)
svm_predictions = svm_predictions.reshape(height, width)

# Visualize SVM predictions
plt.figure(figsize=(8, 8))
plt.imshow(svm_predictions, cmap='tab10')
plt.colorbar(label='Class')
plt.title('SVM Classification')
plt.xlabel('Column')
plt.ylabel('Row')
plt.show()
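
The notebook imports `accuracy_score` but never measures how well either classifier performs. A minimal sketch of a hold-out comparison follows, using synthetic samples in place of `X_train` and `y_train`. Wrapping the SVM in a `StandardScaler` pipeline is an assumption added here, since RBF kernels are sensitive to feature scale; the notebook itself trains on unscaled values.

```python
import numpy as np
from sklearn.ensemble import RandomForestClassifier
from sklearn.metrics import accuracy_score
from sklearn.model_selection import train_test_split
from sklearn.pipeline import make_pipeline
from sklearn.preprocessing import StandardScaler
from sklearn.svm import SVC

rng = np.random.default_rng(0)

# Synthetic stand-ins for X_train / y_train: 3 classes, 4 bands.
X_demo = np.vstack([rng.normal(loc, 1.0, (100, 4)) for loc in (0, 5, 10)])
y_demo = np.repeat([1, 2, 3], 100)

# Hold out 30% of the samples, keeping class proportions similar.
X_fit, X_test, y_fit, y_test = train_test_split(
    X_demo, y_demo, test_size=0.3, stratify=y_demo, random_state=42
)

models = {
    'RF': RandomForestClassifier(n_estimators=100, random_state=42),
    # Standardize features before the RBF SVM (an added assumption).
    'SVM': make_pipeline(StandardScaler(), SVC(kernel='rbf')),
}

scores = {}
for name, model in models.items():
    model.fit(X_fit, y_fit)
    scores[name] = accuracy_score(y_test, model.predict(X_test))
    print(f'{name} hold-out accuracy: {scores[name]:.3f}')
```

With real imagery, pixels drawn from the same polygon are spatially correlated, so a random pixel-level split tends to give optimistic scores. Splitting by polygon is usually a safer choice.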

## Step 5: Save Classification Results

Save the RF and SVM classification rasters to GeoTIFF files.

In [None]:
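# Copy the source profile (closing the file afterwards) and update it for single-band uint8 output
with rasterio.open(raster_path) as src:
    output_profile = src.profile.copy()
# Drop the source nodata value, which may not fit in uint8
output_profile.update(count=1, dtype=rasterio.uint8, nodata=None)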

# Save RF predictions
with rasterio.open('rf_classification.tif', 'w', **output_profile) as dst:
    dst.write(rf_predictions.astype(rasterio.uint8), 1)

# Save SVM predictions
with rasterio.open('svm_classification.tif', 'w', **output_profile) as dst:
    dst.write(svm_predictions.astype(rasterio.uint8), 1)

print('RF classification saved to: rf_classification.tif')
print('SVM classification saved to: svm_classification.tif')
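
The `astype(rasterio.uint8)` cast in Step 5 only works when class labels are integers between 0 and 255. If the `class` column holds text labels, they can be mapped to integer codes before training. The sketch below uses `np.unique` for this; the label names are hypothetical, and starting the codes at 1 (so that 0 can stand for "no data") is a convention chosen here, not something the notebook requires.

```python
import numpy as np

# Hypothetical string labels, as they might appear in a 'class' column.
labels = np.array(['water', 'forest', 'urban', 'forest', 'water'])

# np.unique returns the sorted class names and, with return_inverse=True,
# the position of each label within that sorted list.
class_names, codes = np.unique(labels, return_inverse=True)
codes = (codes + 1).astype(np.uint8)  # start at 1 so 0 can mean "no data"

# Lookup table for decoding the saved raster later.
code_to_name = {i + 1: str(name) for i, name in enumerate(class_names)}

print(codes)
print(code_to_name)
```

Training on `codes` instead of the original strings makes the predictions directly writable as a `uint8` band, and `code_to_name` can be stored alongside the GeoTIFF as a legend.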

## Next Steps

- Replace `sample.tif` and `labels.shp` with your own raster and labeled shapefile.
- Adjust the 'class' column name to match your shapefile.
- Tune classifier parameters (e.g., `n_estimators` for RF, `kernel` for SVM).
- Proceed to the next notebook (`13_kmeans_clustering.ipynb`) for unsupervised clustering.
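
For the parameter tuning mentioned above, one common approach is a cross-validated grid search. The following is a minimal sketch on synthetic data; the grid values are arbitrary examples rather than recommendations, and `X_demo` and `y_demo` stand in for `X_train` and `y_train`.

```python
import numpy as np
from sklearn.ensemble import RandomForestClassifier
from sklearn.model_selection import GridSearchCV

rng = np.random.default_rng(0)

# Synthetic stand-ins for X_train / y_train: 2 classes, 4 bands.
X_demo = np.vstack([rng.normal(loc, 1.0, (60, 4)) for loc in (0, 3)])
y_demo = np.repeat([1, 2], 60)

# Example grid; every combination is scored with 3-fold cross-validation.
param_grid = {'n_estimators': [25, 50], 'max_depth': [None, 5]}
search = GridSearchCV(
    RandomForestClassifier(random_state=42),
    param_grid,
    cv=3,
    scoring='accuracy',
)
search.fit(X_demo, y_demo)

print(f'Best parameters: {search.best_params_}')
print(f'Best cross-validated accuracy: {search.best_score_:.3f}')
```

The same pattern applies to `SVC`, for example with a grid over `C` and `gamma`.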

## Notes
- Ensure the shapefile has a 'class' column. Step 5 casts predictions to `uint8`, so labels must be integers from 0 to 255; encode text labels as integers before training.
- Handle large datasets by sampling training data or using windowed reading.
- See `docs/installation.md` for troubleshooting library installation.
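
For large rasters, predicting on all pixels in a single call can exhaust memory. One option, sketched below on synthetic data, is to predict in fixed-size chunks of rows from the flattened pixel array. `predict_in_chunks` is a hypothetical helper written for this example, not a `scikit-learn` function, and it assumes integer labels that fit in `uint8`.

```python
import numpy as np
from sklearn.ensemble import RandomForestClassifier


def predict_in_chunks(model, pixels, chunk_size=100_000):
    """Predict (n_pixels, bands) rows in fixed-size chunks to cap peak memory."""
    out = np.empty(pixels.shape[0], dtype=np.uint8)  # assumes labels in 0-255
    for start in range(0, pixels.shape[0], chunk_size):
        stop = start + chunk_size
        out[start:stop] = model.predict(pixels[start:stop])
    return out


rng = np.random.default_rng(0)

# Synthetic stand-ins for X_train / y_train and the flattened raster.
X_demo = np.vstack([rng.normal(0, 1, (50, 3)), rng.normal(8, 1, (50, 3))])
y_demo = np.repeat([1, 2], 50)
model = RandomForestClassifier(n_estimators=20, random_state=42).fit(X_demo, y_demo)

pixels = rng.normal(4, 4, (1000, 3))
chunked = predict_in_chunks(model, pixels, chunk_size=128)

# Chunked prediction should match a single full prediction.
print(np.array_equal(chunked, model.predict(pixels)))
```

This only bounds the memory used during prediction. If the raster itself does not fit in memory, it would also need to be read in windows rather than with a single `src.read()`.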