<a href="https://colab.research.google.com/github/ck1972/Hands-On-GeoAI1/blob/main/GeoAI_Lab_2b_Preparing_Data_for_Geospatial_Machine_Learning_S2_Palsar.ipynb" target="_parent"><img src="https://colab.research.google.com/assets/colab-badge.svg" alt="Open In Colab"/></a>

# **Preparing Training Data for Land Cover Mapping: Optical and SAR data**
## Introduction
The aim of this tutorial is to prepare high-quality training data for land cover mapping by integrating Sentinel-2 optical imagery with ALOS-PALSAR radar data. Combining these two complementary datasets makes it easier to distinguish between land cover types, especially in areas where radar backscatter provides valuable structural information.

### Check tutorial for preparing training data (polygons)
- Watch the YouTube video tutorial: https://www.youtube.com/watch?v=k--M1a-V_x4


## Initialize and authenticate Earth Engine
To get started with Google Earth Engine (GEE), you need to initialize and authenticate the Earth Engine API. Follow these steps.


First, import the Earth Engine API by importing the ee module into your Python environment. This module allows you to interact with the Earth Engine platform.


In [1]:
# Import the API
import ee

# Import the geemap library
import geemap

Next, authenticate and initialize the Earth Engine API. Both steps are required before you can use Earth Engine functionality. The first time you run ee.Authenticate(), you may be prompted to log in with your Google account and grant Earth Engine access. ee.Initialize() then starts the session for the Earth Engine project you specify.

In [2]:
# Trigger the authentication flow.
ee.Authenticate()

# Initialize the library.
ee.Initialize(project='ee-kamusoko-test') # Change to your EE project

## Import study area boundary
First, import the study area boundary.

In [None]:
# Load the boundary
boundary = ee.FeatureCollection('users/kamas72_ML_Zim_Cities/Bulawayo_Crop_Boundary')

## Import training data
Next, we will import land cover training data (polygons), which was created in QGIS.

In [None]:
# Load training datasets
training_data = ee.FeatureCollection('projects/ee-kamusoko-test/assets/Bul_TrainingData_2025a')

# Get the histogram of classes (key = class value, value = count)
histogram = training_data.aggregate_histogram('Cl_Id').getInfo()

# Define a label map for clarity
label_map = {
    '0': "Bare areas",
    '1': "Built-up",
    '2': "Cropland",
    '3': "Grass / open areas",
    '4': "Woodlands",
    '5': "Water"
}

print("Number of training polygons per land cover class (Cl_Id):")
for cl_id in sorted(histogram.keys(), key=int):
    label = label_map.get(cl_id, f"Class {cl_id}")
    print(f"{label} (Cl_Id={cl_id}): {histogram[cl_id]}")
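The polygon counts printed above are easier to compare as shares of the total. The sketch below is a minimal, plain-Python illustration that works on a dictionary shaped like the one returned by `aggregate_histogram(...).getInfo()`. The helper name `class_shares` and the example counts are hypothetical and are not the real Bulawayo numbers.

```python
def class_shares(histogram, label_map):
    """Return {label: (count, share)} from a {class_id: count} dictionary."""
    total = sum(histogram.values())
    shares = {}
    for cl_id in sorted(histogram, key=int):
        label = label_map.get(cl_id, f"Class {cl_id}")
        count = histogram[cl_id]
        shares[label] = (count, count / total)
    return shares

# Hypothetical counts, for illustration only
example_histogram = {'0': 20, '1': 40, '2': 30, '3': 50, '4': 50, '5': 10}
example_labels = {'0': "Bare areas", '5': "Water"}

shares = class_shares(example_histogram, example_labels)
for label, (count, share) in shares.items():
    print(f"{label}: {count} polygons ({share:.1%})")
```

Classes with a very small share may need more training polygons before sampling.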

## Create Sentinel-2 composite
The Sentinel-2 mission offers wide-swath, high-resolution, multispectral imaging with a global 5-day revisit frequency. The Sentinel-2 Multispectral Instrument (MSI) has 13 spectral bands: four at 10 m, six at 20 m, and three at 60 m spatial resolution. For more detailed information about the Sentinel-2 mission, please visit https://sentinel.esa.int/web/sentinel/missions/sentinel-2.
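As a quick reference, the band-to-resolution split described above can be written as a lookup table. This is only an illustrative sketch. The dictionary name is hypothetical, and band B10 belongs to the instrument but is not distributed in the surface reflectance collection. The coarsest resolution among the bands used in this tutorial is 20 m, which matches the sampling scale used later.

```python
# Native spatial resolution (metres) of the 13 Sentinel-2 MSI bands
S2_BAND_RESOLUTION_M = {
    'B1': 60, 'B2': 10, 'B3': 10, 'B4': 10, 'B5': 20, 'B6': 20, 'B7': 20,
    'B8': 10, 'B8A': 20, 'B9': 60, 'B10': 60, 'B11': 20, 'B12': 20,
}

# Bands used in this tutorial
selected = ['B2', 'B3', 'B4', 'B5', 'B6', 'B7', 'B8', 'B11', 'B12']

# Coarsest native resolution among the selected bands
coarsest = max(S2_BAND_RESOLUTION_M[band] for band in selected)
print(coarsest)  # 20
```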


In [None]:
# Sentinel-2 SR data (Harmonized)
s2 = ee.ImageCollection('COPERNICUS/S2_SR_HARMONIZED')

# Cloud masking function using SCL band
def mask_s2clouds(image):
    scl = image.select('SCL')
    mask = scl.neq(8).And(scl.neq(9)).And(scl.neq(10)).And(scl.neq(11))
    return image.updateMask(mask).divide(10000)

# Bands to include in the classification
bands = ['B2', 'B3', 'B4', 'B5', 'B6', 'B7', 'B8', 'B11', 'B12']

# Filter and preprocess Sentinel-2 data
S2 = (s2.filterBounds(boundary)
      .filterDate('2025-03-01', '2025-06-30')
      .filter(ee.Filter.lt('CLOUDY_PIXEL_PERCENTAGE', 10))
      .map(mask_s2clouds)
      .select(bands))

# Create a median composite
composite = S2.median().clip(boundary)
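To make the masking and compositing logic concrete, here is a plain-Python sketch of what happens to a single pixel: observations whose SCL class is 8 or 9 (cloud, medium or high probability), 10 (thin cirrus), or 11 (snow or ice) are dropped, the rest are scaled to reflectance, and the median is taken. The function name and the time series are hypothetical. Earth Engine performs the equivalent operation per pixel and per band on the server.

```python
from statistics import median

# SCL classes removed by mask_s2clouds: clouds (8, 9), thin cirrus (10), snow/ice (11)
MASKED_SCL_CLASSES = {8, 9, 10, 11}

def masked_median(reflectance_dn, scl_values):
    """Median reflectance of unmasked observations; None if all are masked."""
    clear = [dn / 10000 for dn, scl in zip(reflectance_dn, scl_values)
             if scl not in MASKED_SCL_CLASSES]
    return median(clear) if clear else None

# Hypothetical B4 time series for one pixel (scaled integers) with matching SCL classes
b4_dn = [850, 4200, 900, 950, 5100]
scl = [4, 9, 4, 5, 8]

print(masked_median(b4_dn, scl))  # 0.09
```

Cloud shadows (SCL class 3) are not in the masked set above, so shadowed observations still contribute to the median.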

## Create ALOS PALSAR-2 ScanSAR composite
ALOS PALSAR-2 (Advanced Land Observing Satellite, Phased Array type L-band Synthetic Aperture Radar) is a Japanese L-band radar sensor operated by the Japan Aerospace Exploration Agency (JAXA). The ScanSAR mode allows wide-area coverage (up to 350 km swath) by scanning multiple sub-swaths, making it useful for regional-scale monitoring. It operates in L-band, which penetrates vegetation and soil, making it well suited to forest mapping, flood detection, and land deformation studies. The 25 m PALSAR-2 ScanSAR product in the Google Earth Engine catalog is normalized backscatter data from the PALSAR-2 broad-area observation mode with an observation width of 350 km. The SAR imagery was orthorectified and slope corrected using the ALOS World 3D - 30 m (AW3D30) Digital Surface Model. Polarization data are stored as 16-bit digital numbers (DN).

We will normalize the PALSAR backscatter values (HH and HV) to the range [0, 1] using .unitScale(min, max) in Earth Engine. For ALOS PALSAR backscatter in DN (digital number) format, typical values range from 0 to 8000.

In [None]:
# Load ALOS PALSAR-2 ScanSAR image collection and filter by boundary and date
collection = (
    ee.ImageCollection('JAXA/ALOS/PALSAR-2/Level2_2/ScanSAR')
    .filterBounds(boundary)
    .filterDate('2025-03-01', '2025-06-30')
)

# Compute median composites for HH and HV
Palsar_median_HH = collection.select('HH').median().clip(boundary).unitScale(0, 8000)
Palsar_median_HV = collection.select('HV').median().clip(boundary).unitScale(0, 8000)
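The arithmetic behind the normalization is a linear rescale, sketched below in plain Python. Values outside the [min, max] range are not clamped by `unitScale`, so a DN above 8000 maps to a value above 1. The second function shows the DN-to-decibel conversion given in the dataset's catalog description, γ₀ = 10·log₁₀(DN²) − 83.0 dB, in case you prefer to work in dB rather than scaled DN. Both function names are hypothetical.

```python
import math

def unit_scale(value, low, high):
    """Linear rescale so that [low, high] maps to [0, 1], without clamping."""
    return (value - low) / (high - low)

def dn_to_gamma0_db(dn):
    """Convert a PALSAR-2 digital number to gamma-nought in decibels."""
    return 10 * math.log10(dn ** 2) - 83.0

print(unit_scale(4000, 0, 8000))   # 0.5
print(unit_scale(10000, 0, 8000))  # 1.25 (outside [0, 1] because nothing is clamped)
print(dn_to_gamma0_db(1000))       # approximately -23 dB
```

If out-of-range values are a concern, clamping the image to [0, 8000] before scaling is one option.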

In [7]:
# Load Sentinel-1 Image Collection and filter by bounds, metadata, and date (Not available)
collection_S1 = (
    ee.ImageCollection('COPERNICUS/S1_GRD')
    .filterBounds(boundary)
    .filter(ee.Filter.eq('instrumentMode', 'IW'))
    .filterDate('2025-03-01', '2025-06-30')
)

# Filter for dual polarization (VV and VH)
collection_S1_VV_VH = (
    collection_S1
    .filter(ee.Filter.listContains('transmitterReceiverPolarisation', 'VV'))
    .filter(ee.Filter.listContains('transmitterReceiverPolarisation', 'VH'))
)

# Filter for ascending orbit
collection_S1_ASC = collection_S1_VV_VH.filter(ee.Filter.eq('orbitProperties_pass', 'ASCENDING'))

# Compute median composites for VV and VH
S1_median_ASC_VV = collection_S1_ASC.select('VV').median().clip(boundary)
S1_median_ASC_VH = collection_S1_ASC.select('VH').median().clip(boundary)

## Display training samples and satellite imagery
Next, display the land cover training samples on the satellite imagery (Sentinel-2 and ALOS PALSAR).

In [None]:
# Initialize the map (named Map to avoid shadowing Python's built-in map function)
Map = geemap.Map()
Map.centerObject(training_data, 12)

# Add Sentinel-2 composite
Map.addLayer(composite, {'bands': ['B11', 'B8', 'B3'], 'min': 0, 'max': 0.3}, 'Sentinel-2 Composite')

# Add PALSAR ScanSAR layers to the map
Map.addLayer(Palsar_median_HH, {'min': 0, 'max': 1}, 'PALSAR Median HH')
Map.addLayer(Palsar_median_HV, {'min': 0, 'max': 1}, 'PALSAR Median HV')

# Add VV and VH layers to map
#Map.addLayer(S1_median_ASC_VV, {'min': -15, 'max': 5}, 'S-1 Median Asc VV')
#Map.addLayer(S1_median_ASC_VH, {'min': -25, 'max': 5}, 'S-1 Median Asc VH')

# Add training data as a layer
Map.addLayer(training_data, {'color': 'red'}, 'Training Data')

# Display the map with layer control
Map.addLayerControl()
Map

## Prepare training data
In this step, we prepare the dataset for training and testing machine learning models. We start by combining the Sentinel-2 composite (bands B2 to B12) with the PALSAR HV band and clipping the result to the study area boundary. These bands are the input features. Next, we rasterize the vector training data using the Cl_Id property to create a raster layer of class labels and add it as a new band (class) to the input features. Finally, we use stratified sampling to extract the feature values and class labels. Note that numPoints sets the number of points sampled per class (up to 10,000 here), so classes are sampled equally where enough labelled pixels exist, rather than in proportion to their area.

In [None]:
# Combine Sentinel-2 composite with PALSAR HV band
combined = composite.addBands(Palsar_median_HV.rename('HV'))

# Use ee.List for band selection and keep only the listed bands
bands = ee.List(['B2', 'B3', 'B4', 'B5', 'B6', 'B7', 'B8', 'B11', 'B12', 'HV'])
input_features = combined.select(bands).clip(boundary)
print('input features: ', input_features.getInfo())

# Rasterise training data
training_rasterized = training_data.reduceToImage(
    properties=['Cl_Id'],
    reducer=ee.Reducer.first()
).toInt().remap([0, 1, 2, 3, 4, 5], [0, 1, 2, 3, 4, 5]) # Bare areas, Built-up, Cropland, Grass/ open areas, Woodlands, Water

# Add a class band to features
input_features = input_features.addBands(training_rasterized.toInt().rename('class'))

# Sample the reflectance and HV backscatter values, plus the class label, for each training pixel
training_dataset = input_features.stratifiedSample(
    numPoints=10000,
    classBand="class",
    region=boundary,
    scale=20
)
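The per-class behaviour of stratified sampling can be illustrated with a small plain-Python sketch: pixels are grouped by class, and up to a fixed number are drawn at random from each group. This is not Earth Engine's implementation. The function name, the seed, and the example pixels are hypothetical.

```python
import random
from collections import defaultdict

def stratified_sample(pixels, num_points, seed=0):
    """Draw up to num_points items per class from (class_id, features) pairs."""
    by_class = defaultdict(list)
    for class_id, features in pixels:
        by_class[class_id].append(features)

    rng = random.Random(seed)
    sample = {}
    for class_id, members in by_class.items():
        # A class with fewer members than num_points contributes all of them
        k = min(num_points, len(members))
        sample[class_id] = rng.sample(members, k)
    return sample

# Hypothetical labelled pixels: 500 woodland (class 4) and 30 water (class 5)
pixels = [(4, {'B8': 0.30})] * 500 + [(5, {'B8': 0.02})] * 30

sample = stratified_sample(pixels, num_points=100)
print({class_id: len(items) for class_id, items in sample.items()})  # {4: 100, 5: 30}
```

Small classes such as water can therefore end up with fewer samples than the requested number, which is worth checking in the exported table.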

## Export the training points
We export the sampled 'training_dataset' feature collection (as a CSV file), the Sentinel-2 composite, and the PALSAR ScanSAR HV image to your Google Drive. After configuring each export, the task is started with its start() method.

In [None]:
# Export training samples as CSV
task_table = ee.batch.Export.table.toDrive(
    collection=training_dataset,
    description='Bul_TA_S2_Pal_2025',
    folder='Bulawayo_Dataset_2025',
    fileFormat='CSV'
)

# Start the export task
task_table.start()

# Export the Sentinel-2 composite
task_composite = ee.batch.Export.image.toDrive(
    image=composite.float(),
    description='Bul_S2_2025',
    folder='Bulawayo_Dataset_2025',
    scale=10,
    region=boundary.geometry(),
    maxPixels=1e13
)
task_composite.start()

# Export the PALSAR HV median image
task_palsar = ee.batch.Export.image.toDrive(
    image=Palsar_median_HV.float(),
    description='Bul_Palsar_HV_2025',
    folder='Bulawayo_Dataset_2025',
    scale=10,
    region=boundary.geometry(),
    maxPixels=1e13
)
task_palsar.start()
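Export tasks run in the background, so it can help to poll them until they finish. The sketch below assumes only that a task object exposes a `status()` method returning a dictionary with a `'state'` key, as Earth Engine batch tasks do. The helper `wait_for_task` and the stand-in class `FakeTask` are hypothetical. The stand-in is there so the example runs without an Earth Engine session.

```python
import time

# States after which an export task will not change any further
TERMINAL_STATES = {'COMPLETED', 'FAILED', 'CANCELLED'}

def wait_for_task(task, poll_seconds=30, max_polls=120, sleep=time.sleep):
    """Poll task.status() until a terminal state or max_polls; return the last state."""
    state = None
    for _ in range(max_polls):
        state = task.status()['state']
        if state in TERMINAL_STATES:
            break
        sleep(poll_seconds)
    return state

class FakeTask:
    """Stand-in for an export task that steps through a fixed list of states."""
    def __init__(self, states):
        self._states = iter(states)

    def status(self):
        return {'state': next(self._states)}

demo_task = FakeTask(['READY', 'RUNNING', 'COMPLETED'])
final_state = wait_for_task(demo_task, poll_seconds=0)
print(final_state)  # COMPLETED
```

In the notebook, the same helper could be called on a real task, for example `wait_for_task(task_table)`.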