<a href="https://colab.research.google.com/github/ck1972/University-GeoAI/blob/main/Lab_8_Scaling_Geospatial_ML_Workflows_Spatial_Transferability_Gweru_Github.ipynb" target="_parent"><img src="https://colab.research.google.com/assets/colab-badge.svg" alt="Open In Colab"/></a>

# **Lab 8. Scaling Geospatial Machine Learning — Exploring the Spatial Transferability of Random Forest Models for Land Cover Classification**


## Imports and Setup
### Install libraries
First, install any additional libraries that are not installed by default (e.g., rasterio, earthpy).

In [None]:
# Install rasterio and earthpy libraries
!pip install rasterio
!pip install earthpy

### Import libraries
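Import the necessary libraries (NumPy, Matplotlib, rasterio, joblib, and earthpy). scikit-learn is not imported directly, but it must be installed (it is preinstalled in Colab) so that joblib can load the saved random forest model.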

In [None]:
# Import libraries
import earthpy.plot as ep
import numpy as np
import matplotlib.pyplot as plt
import rasterio
import joblib
from matplotlib.colors import from_levels_and_colors
from google.colab import drive

### Mount Google Drive
Next, mount your Google Drive. You will be prompted to authorize access to your Google Drive. Once mounted, you can read/write files in /content/drive/MyDrive.

In [None]:
# Mount Google Drive
drive.mount('/content/drive')



### Define file paths and metadata
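Define the paths to the files in your own Google Drive directory structure. In this tutorial, we use:
- A multiband Sentinel-2 image for Gweru (Gw_S2_2024.tif).
- An ALOS PALSAR ScanSAR HV polarization image for Gweru (Gw_Palsar_HV_2024.tif).
- The random forest model trained on the Bulawayo data in Lab 5b (best_rf_model.pkl).

The classified map is written to Gw_LandCover_RF_2024.tif. We also define the band names, class values, class names, and display colors.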

In [None]:
# Define file paths
s2_path = '/content/drive/MyDrive/Gweru_Dataset_2024/Gw_S2_2024.tif'
palsar_path = '/content/drive/MyDrive/Gweru_Dataset_2024/Gw_Palsar_HV_2024.tif'
model_path = '/content/drive/MyDrive/Bulawayo_Dataset_2024/best_rf_model.pkl'
output_path = '/content/drive/MyDrive/Gweru_Dataset_2024/Gw_LandCover_RF_2024.tif'

# Define metadata
Bands = ['B2', 'B3', 'B4', 'B5', 'B6', 'B7', 'B8', 'B11', 'B12', 'HV']
Classes = [0, 1, 2, 3, 4, 5]
Names = ["Bare area", "Built-up", "Cropland", "Grassland", "Woodland", "Water"]
Palette = ['#D3D3D3', '#FF0000', '#FFD700', '#ADFF2F', '#006400', '#0000FF']

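A wrong Drive path only surfaces as an error several cells later, so it can help to check the inputs up front. The sketch below is a minimal example under that idea: `missing_files` is a hypothetical helper (not part of any library), and the metadata lists are copied from the cell above to confirm that every class has exactly one name and one color.

```python
import os

# Hypothetical helper: report which input files are missing before any heavy processing.
def missing_files(paths):
    """Return the subset of `paths` that do not exist on disk."""
    return [p for p in paths if not os.path.exists(p)]

# Metadata copied from the cell above; each class needs one name and one color.
Classes = [0, 1, 2, 3, 4, 5]
Names = ["Bare area", "Built-up", "Cropland", "Grassland", "Woodland", "Water"]
Palette = ['#D3D3D3', '#FF0000', '#FFD700', '#ADFF2F', '#006400', '#0000FF']
assert len(Classes) == len(Names) == len(Palette), "Class metadata lists differ in length"

# Usage with the paths defined above (uncomment in the notebook):
# missing = missing_files([s2_path, palsar_path, model_path, os.path.dirname(output_path)])
# if missing:
#     raise FileNotFoundError(f"Check your Drive paths: {missing}")
```

Checking the output folder as well avoids running the whole prediction only to fail when writing the GeoTIFF.
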
## Load the trained random forest model
Next, we load the random forest model that was trained on the Bulawayo data and saved in Lab 5b. The saved package contains the fitted model and the list of input features it expects.

In [None]:
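# Load the trained model package (a dictionary saved with joblib in Lab 5b)
model_package = joblib.load(model_path)
rf = model_package['model']           # fitted random forest classifier
features = model_package['features']  # feature names used during training
print("Model features:", features)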

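The feature stack built later must follow exactly the band order the model was trained on. The sketch below shows one way to check this, assuming (as the loading cell implies) that the saved package is a dictionary with `'model'` and `'features'` keys; a small toy model stands in for the Lab 5b file so the example runs on its own. Note also that joblib pickles are tied to the scikit-learn version: loading a model saved with a different version is not officially supported and may warn or fail.

```python
import os
import tempfile

import joblib
import numpy as np
from sklearn.ensemble import RandomForestClassifier

# Feature order used in this lab (Sentinel-2 bands followed by PALSAR HV).
Bands = ['B2', 'B3', 'B4', 'B5', 'B6', 'B7', 'B8', 'B11', 'B12', 'HV']

# Toy stand-in for the Lab 5b package: same dict layout, random training data.
rng = np.random.default_rng(0)
X_toy = rng.random((60, len(Bands)))
y_toy = rng.integers(0, 6, size=60)
toy_rf = RandomForestClassifier(n_estimators=10, random_state=0).fit(X_toy, y_toy)

toy_path = os.path.join(tempfile.mkdtemp(), 'toy_rf_model.pkl')
joblib.dump({'model': toy_rf, 'features': Bands}, toy_path)

# Load it back the same way the lab does and check the feature order before predicting.
package = joblib.load(toy_path)
rf_loaded, features_loaded = package['model'], package['features']
if list(features_loaded) != Bands:
    raise ValueError(f"Feature order mismatch: model expects {features_loaded}")
if rf_loaded.n_features_in_ != len(Bands):
    raise ValueError("Model was trained on a different number of features")
print("Feature order OK:", features_loaded)
```

In the notebook itself, the same two checks can be run on `features` and `rf` from the cell above.
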
## Load and display Sentinel-2 and PALSAR HV images for Gweru
We use rasterio to read the Sentinel-2 bands and the ALOS PALSAR ScanSAR HV polarization band for the new study area, Gweru.

In [None]:
# Load Sentinel-2 bands (assumes 9 bands in the correct order: B2 to B12)
with rasterio.open(s2_path) as s2_src:
    s2_bands = s2_src.read(list(range(1, 10)))
    profile = s2_src.profile  # Save for writing output

# Load PALSAR HV band
with rasterio.open(palsar_path) as palsar_src:
    palsar_hv = palsar_src.read(1)

# Prepare a Sentinel-2 false-color composite and the PALSAR HV band for display
# Band indices follow the order B2, B3, B4, B5, B6, B7, B8, B11, B12
red = s2_bands[8, :, :]   # B12 (SWIR2)
green = s2_bands[6, :, :] # B8 (NIR)
blue = s2_bands[2, :, :]  # B4 (Red)

# Stretch to 0-1 for display (assumes reflectance values scaled to 0-1)
rgb = np.stack([red, green, blue], axis=-1)
rgb_min, rgb_max = 0, 0.4
rgb_display = np.clip((rgb - rgb_min) / (rgb_max - rgb_min), 0, 1)

# Stretch PALSAR HV to 0-1 for display
hv_min, hv_max = 0, 1
hv_display = np.clip((palsar_hv - hv_min) / (hv_max - hv_min), 0, 1)

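Stacking the two sensors pixel by pixel is only valid if both rasters share one grid: same width, height, geotransform, and CRS. Equal array shapes alone are not enough, since a shifted grid would silently pair the wrong pixels. The sketch below is a minimal check; `same_grid` is a hypothetical helper that compares the grid-defining keys of two rasterio-style profiles (rasterio's `profile` behaves like a dictionary).

```python
# Hypothetical helper: compare the keys that define a raster's pixel grid.
def same_grid(profile_a, profile_b):
    """Return {key: bool} telling whether each grid-defining key matches."""
    keys = ('width', 'height', 'transform', 'crs')
    return {k: profile_a.get(k) == profile_b.get(k) for k in keys}

# Usage in the notebook (capture the PALSAR profile inside its `with` block):
# with rasterio.open(palsar_path) as palsar_src:
#     palsar_profile = palsar_src.profile
# checks = same_grid(profile, palsar_profile)
# if not all(checks.values()):
#     raise ValueError(f"Rasters are not aligned: {checks}")
```

If the check fails, the PALSAR image would need to be resampled onto the Sentinel-2 grid (or re-exported at the same extent and resolution) before prediction.
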
### Display images
Display the Sentinel-2 false-color composite and the PALSAR ScanSAR HV polarization image side by side.

In [None]:
# Plot
fig, axs = plt.subplots(1, 2, figsize=(14, 7))
axs[0].imshow(rgb_display)
axs[0].set_title('Sentinel-2 false color (B12, B8, B4)')
axs[0].axis('off')

axs[1].imshow(hv_display, cmap='gray')
axs[1].set_title('PALSAR HV Backscatter')
axs[1].axis('off')

plt.tight_layout()
plt.show()

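The fixed display ranges above (0 to 0.4 for reflectance, 0 to 1 for HV) assume a particular scaling of the exported data. If the images look washed out or almost black, a per-band percentile stretch is a common alternative. The sketch below is one such stretch; `percentile_stretch` and its 2nd/98th percentile defaults are illustrative choices, not part of the original lab.

```python
import numpy as np

def percentile_stretch(band, low=2, high=98):
    """Rescale a band to 0-1 using its own low/high percentiles, ignoring NaNs."""
    lo, hi = np.nanpercentile(band, [low, high])
    if hi <= lo:  # constant band: avoid division by zero
        return np.zeros_like(band, dtype=float)
    return np.clip((band - lo) / (hi - lo), 0, 1)

# Usage with the arrays from the loading cell (uncomment in the notebook):
# rgb_display = np.dstack([percentile_stretch(b) for b in (red, green, blue)])
# hv_display = percentile_stretch(palsar_hv)
```

Stretching each band independently improves contrast but changes the color balance between bands, so it is suited to visual inspection only; the model still receives the unstretched values.
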
## Apply the model to classify land cover in the study area
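Next, we stack the input features in the same order used for training (B2 to B12, then HV), flatten the image into a table with one row per pixel, and apply the random forest model trained on Bulawayo to predict land cover classes for the new study area (Gweru).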


In [None]:
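# Stack and predict
# Both rasters must share the same grid (height and width)
assert s2_bands.shape[1:] == palsar_hv.shape, "Sentinel-2 and PALSAR HV grids differ"

# Stack S2 and PALSAR HV in the training order: B2 ... B12, HV
bands_data = np.concatenate([s2_bands, palsar_hv[np.newaxis, :, :]], axis=0)  # (10, H, W)
bands_data = np.transpose(bands_data, (1, 2, 0))  # (H, W, 10)
flat_pixels = bands_data.reshape(-1, bands_data.shape[-1])  # (N_pixels, 10), one row per pixel
assert flat_pixels.shape[1] == len(features), "Feature count does not match the trained model"

# Predict one class per pixel and reshape back to the image grid
predictions = rf.predict(flat_pixels)
predicted_image = predictions.reshape(bands_data.shape[:2])  # (H, W)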

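If the exported rasters contain NaN (or other non-finite values) for pixels outside the study area, which is an assumption about how the data were exported, those pixels should not be classified: older scikit-learn versions reject NaN inputs with an error, and versions that accept them may still assign such pixels a land cover class. A minimal sketch, using a hypothetical helper `predict_valid` and a toy model so it runs on its own:

```python
import numpy as np
from sklearn.ensemble import RandomForestClassifier

def predict_valid(model, flat_pixels, nodata_class=255):
    """Predict only rows without NaN/inf; fill the other rows with `nodata_class`."""
    valid = np.all(np.isfinite(flat_pixels), axis=1)
    out = np.full(flat_pixels.shape[0], nodata_class, dtype=np.uint8)
    if valid.any():
        out[valid] = model.predict(flat_pixels[valid])
    return out

# Toy demonstration: 10 features, classes 0-5, one pixel with a missing value.
rng = np.random.default_rng(1)
X_toy = rng.random((40, 10))
y_toy = np.arange(40) % 6
toy_model = RandomForestClassifier(n_estimators=5, random_state=0).fit(X_toy, y_toy)
pixels = rng.random((4, 10))
pixels[2, 0] = np.nan
demo = predict_valid(toy_model, pixels)
print(demo)  # row 2 is filled with 255
```

In the notebook, `predictions = predict_valid(rf, flat_pixels)` would replace the direct `rf.predict` call. The value 255 lies outside the class range 0 to 5, so it can also serve as the no-data value of the exported GeoTIFF; before plotting, mask it (for example with `np.ma.masked_equal(predicted_image, 255)`), otherwise the colormap draws it in the color of the highest class.
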


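Predicting every pixel in one call builds large intermediate arrays, which can exhaust memory for bigger scenes. One way to scale the same step is to predict in blocks of rows and concatenate the results. The sketch below assumes nothing beyond a fitted scikit-learn classifier; `predict_in_chunks` and the default `chunk_size` are illustrative choices, and a toy model stands in for `rf`.

```python
import numpy as np
from sklearn.ensemble import RandomForestClassifier

def predict_in_chunks(model, flat_pixels, chunk_size=500_000):
    """Apply model.predict to consecutive blocks of rows and concatenate the results."""
    parts = []
    for start in range(0, flat_pixels.shape[0], chunk_size):
        parts.append(model.predict(flat_pixels[start:start + chunk_size]))
    return np.concatenate(parts)

# Toy demonstration: chunked prediction matches a single full prediction.
rng = np.random.default_rng(2)
X_toy = rng.random((50, 10))
y_toy = np.arange(50) % 6
toy_model = RandomForestClassifier(n_estimators=5, random_state=0).fit(X_toy, y_toy)
pixels = rng.random((1_001, 10))
chunked = predict_in_chunks(toy_model, pixels, chunk_size=300)
```

In the notebook this would read `predictions = predict_in_chunks(rf, flat_pixels)`. Because each pixel is classified independently, splitting the rows does not change the result; only peak memory use drops. Setting `rf.set_params(n_jobs=-1)` additionally lets the forest use all CPU cores when predicting.
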
## Display and export the predicted land cover map
Finally, we display the land cover map and then export it to Google Drive as a GeoTIFF using rasterio.

In [None]:
# Display Predicted Map
levels = Classes + [max(Classes) + 1]
cmap, norm = from_levels_and_colors(levels, Palette)

plt.figure(figsize=(10, 8))
im = plt.imshow(predicted_image, cmap=cmap, norm=norm)
plt.title("Predicted Land Cover Map - Gweru")
cbar = plt.colorbar(im, shrink=0.7)
tick_positions = [i + 0.5 for i in Classes]
cbar.set_ticks(tick_positions)
cbar.set_ticklabels(Names)
plt.axis('off')
plt.show()

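# Save to GeoTIFF as a single uint8 band; reset nodata, because a float nodata
# value (e.g., NaN) inherited from the Sentinel-2 profile is invalid for uint8
profile.update(dtype=rasterio.uint8, count=1, nodata=255)  # 255 is not a class value
with rasterio.open(output_path, 'w', **profile) as dst:
    dst.write(predicted_image.astype(rasterio.uint8), 1)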

print("Land cover map saved to:", output_path)
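
Because the model was trained on Bulawayo and applied to Gweru, a quick sanity check on the output is how the pixels are distributed across classes: a class that is missing or unexpectedly dominant can hint at poor transferability. Without reference data for Gweru this is not an accuracy assessment. The sketch below uses a hypothetical helper `class_summary`, and it assumes that 255 marks no-data pixels to be excluded.

```python
import numpy as np

def class_summary(predicted_image, classes, names, nodata=255):
    """Return {name: (pixel_count, percent_of_valid_pixels)} for each class."""
    valid = predicted_image[predicted_image != nodata]
    total = valid.size
    summary = {}
    for value, name in zip(classes, names):
        count = int(np.count_nonzero(valid == value))
        summary[name] = (count, 100.0 * count / total if total else 0.0)
    return summary

Classes = [0, 1, 2, 3, 4, 5]
Names = ["Bare area", "Built-up", "Cropland", "Grassland", "Woodland", "Water"]

# Toy map: five valid pixels and one no-data pixel.
toy_map = np.array([[0, 0, 1], [2, 5, 255]], dtype=np.uint8)
toy_summary = class_summary(toy_map, Classes, Names)
for name, (count, pct) in toy_summary.items():
    print(f"{name:10s} {count:6d} px  {pct:5.1f}%")
```

In the notebook, call `class_summary(predicted_image, Classes, Names)`. Multiplying the pixel counts by the pixel area (derived from `profile['transform']`) converts them to areas.
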