<a href="https://colab.research.google.com/github/ck1972/Python-Geospatial_Model1/blob/main/Lab_4a_Preparing_Data_for_Geospatial_Machine_Learning_S2_cleaned.ipynb" target="_parent"><img src="https://colab.research.google.com/assets/colab-badge.svg" alt="Open In Colab"/></a>

# **Preparing Training Data for Land Cover Mapping**
## Introduction
Preparing training data for supervised land cover classification using satellite imagery such as Landsat and Sentinel-2 is crucial for accurate mapping, yet it presents several challenges.

### Challenges in preparing training samples for supervised land cover classification
- High-quality and consistent training data
- Class imbalance
- Mixed pixels
- Temporal variations
- Spectral confusion
- Co-registration and geometric accuracy
- Labor-intensive manual collection
- Limited transferability

### Tutorial on preparing training data (polygons)
- Watch the YouTube video tutorial: https://www.youtube.com/watch?v=k--M1a-V_x4


## Initialize and authenticate Earth Engine
To get started with Google Earth Engine (GEE), you need to initialize and authenticate the Earth Engine API. Follow these steps.


First, import the ee module, which lets you interact with the Earth Engine platform from Python, together with the geemap library, which is used later to display interactive maps.


In [None]:
# Import the API
import ee

# Import the geemap library
import geemap

Next, authenticate and initialize the Earth Engine API. Both steps are required before you can use any Earth Engine functionality. When you run ee.Authenticate() for the first time, you might be prompted to authenticate your session: a web browser window opens where you log in with your Google account and grant Earth Engine access. ee.Initialize() then starts the session under your own Earth Engine project.

In [None]:
# Trigger the authentication flow.
ee.Authenticate()

# Initialize the library.
ee.Initialize(project='ee-kamusoko-test') # Change to your EE project

## Import study area boundary
First, import the study area boundary.

In [None]:
# Load the boundary
boundary = ee.FeatureCollection('users/kamas72_ML_Zim_Cities/Bulawayo_Crop_Boundary')

## Import training data
Next, we will import the land cover training data (polygons), which were created in QGIS.

In [None]:
# Load training datasets
training_data = ee.FeatureCollection('users/kamas72_ML_Zim_Cities/Updated_TA_2020_Bul_May_21_GEE')

# Get the histogram of classes (key = class value, value = count)
histogram = training_data.aggregate_histogram('Cl_Id').getInfo()

# Define a label map for clarity
label_map = {
    '0': "Bare areas",
    '1': "Built-up",
    '2': "Cropland",
    '3': "Grass / open areas",
    '4': "Woodlands",
    '5': "Water"
}

print("Number of training polygons per land cover class (Cl_Id):")
for cl_id in sorted(histogram.keys(), key=int):
    label = label_map.get(cl_id, f"Class {cl_id}")
    print(f"{label} (Cl_Id={cl_id}): {histogram[cl_id]}")

Number of training polygons per land cover class (Cl_Id):
Bare areas (Cl_Id=0): 154
Built-up (Cl_Id=1): 806
Cropland (Cl_Id=2): 169
Grass / open areas (Cl_Id=3): 495
Woodlands (Cl_Id=4): 335
Water (Cl_Id=5): 19
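
The counts above already hint at the class imbalance listed in the introduction. As a minimal sketch, the snippet below recomputes each class's share and a simple largest-to-smallest ratio from the printed numbers. The counts are copied by hand from the output, and they describe polygons, not pixels, so the imbalance in the sampled pixels may differ.

```python
# Polygon counts per class, copied from the histogram output above
polygon_counts = {
    "Bare areas": 154,
    "Built-up": 806,
    "Cropland": 169,
    "Grass / open areas": 495,
    "Woodlands": 335,
    "Water": 19,
}

total = sum(polygon_counts.values())

# Share of all training polygons that falls in each class
shares = {name: count / total for name, count in polygon_counts.items()}

# Ratio between the largest and smallest class (a simple imbalance indicator)
largest = max(polygon_counts, key=polygon_counts.get)
smallest = min(polygon_counts, key=polygon_counts.get)
imbalance_ratio = polygon_counts[largest] / polygon_counts[smallest]

for name, share in shares.items():
    print(f"{name}: {share:.1%}")
print(f"Largest/smallest ({largest}/{smallest}): {imbalance_ratio:.1f}x")
```

A ratio this large is one reason to sample a fixed number of pixels per class later on instead of sampling in proportion to the mapped area.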


## Create Sentinel-2 composite
The Sentinel-2 mission offers a wide-swath, high-resolution, multispectral imaging capability with a global 5-day revisit frequency. The Sentinel-2 Multispectral Instrument (MSI) has 13 spectral bands, providing a comprehensive view of the Earth's surface: four bands at 10 meters, six at 20 meters, and three at 60 meters spatial resolution. For more detailed information about the Sentinel-2 mission, please visit https://sentinel.esa.int/web/sentinel/missions/sentinel-2.
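
The nine bands used in this lab mix 10 m and 20 m resolutions. The short sketch below records the native resolution of each and finds the coarsest one. Reading the result as the reason for sampling at scale=20 later in the notebook is an assumption; the notebook itself does not state why that value was chosen.

```python
# Native spatial resolution (metres) of the Sentinel-2 MSI bands used in this lab
band_resolution_m = {
    "B2": 10, "B3": 10, "B4": 10, "B8": 10,  # blue, green, red, near infrared
    "B5": 20, "B6": 20, "B7": 20,            # red-edge bands
    "B11": 20, "B12": 20,                    # shortwave infrared bands
}

# The coarsest native resolution among the selected bands
coarsest = max(band_resolution_m.values())

# Bands that are natively available at 10 m
ten_m_bands = sorted(b for b, r in band_resolution_m.items() if r == 10)

print(f"Coarsest native resolution: {coarsest} m")
print(f"10 m bands: {ten_m_bands}")
```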


In [None]:
# Sentinel-2 SR data (Harmonized)
s2 = ee.ImageCollection('COPERNICUS/S2_SR_HARMONIZED')

# Cloud masking function using the Scene Classification (SCL) band
def mask_s2clouds(image):
    scl = image.select('SCL')
    # Drop SCL classes 8-9 (cloud, medium/high probability), 10 (thin cirrus) and 11 (snow or ice)
    mask = scl.neq(8).And(scl.neq(9)).And(scl.neq(10)).And(scl.neq(11))
    # Apply the mask and rescale the stored integers to surface reflectance
    return image.updateMask(mask).divide(10000)

# Bands to include in the classification
bands = ['B2', 'B3', 'B4', 'B5', 'B6', 'B7', 'B8', 'B11', 'B12']

# Filter and preprocess Sentinel-2 data (the filterDate end date is exclusive)
S2 = (s2.filterBounds(boundary)
      .filterDate('2024-03-01', '2024-06-30')
      .filter(ee.Filter.lt('CLOUDY_PIXEL_PERCENTAGE', 10))
      .map(mask_s2clouds)
      .select(bands))

# Create a median composite
composite = S2.median().clip(boundary)
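
To see what the masking and the median composite do to a single pixel, here is a plain-Python sketch that needs no Earth Engine session. The function name composite_pixel and the sample values are hypothetical; the masked SCL classes and the division by 10000 mirror mask_s2clouds above.

```python
from statistics import median

# SCL classes excluded by mask_s2clouds:
# 8 and 9 = cloud (medium/high probability), 10 = thin cirrus, 11 = snow or ice
MASKED_SCL = {8, 9, 10, 11}

def composite_pixel(observations):
    """Median of the unmasked, rescaled values for one pixel and one band.

    observations: list of (raw_value, scl_class) tuples, one per image date.
    Returns None when every observation is masked.
    """
    valid = [raw / 10000 for raw, scl in observations if scl not in MASKED_SCL]
    return median(valid) if valid else None

# Hypothetical time series for one pixel: (raw B4 value, SCL class)
series = [(800, 4), (5200, 9), (900, 5), (1000, 4), (4100, 8)]
print(composite_pixel(series))  # 0.09
```

The two bright cloud observations are dropped before the median is taken, so they cannot pull the composite value upward. Note that SCL class 3 (cloud shadow) is not in the masked set, so shadowed observations still enter the median.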

## Display training samples and Sentinel-2
Next, display the land cover training samples on the Sentinel-2 composite.

In [None]:
# Initialize the map (named Map to avoid shadowing Python's built-in map function)
Map = geemap.Map()
Map.centerObject(training_data, 12)

# Add Sentinel-2 composite
Map.addLayer(composite, {'bands': ['B11', 'B8', 'B3'], 'min': 0, 'max': 0.3}, 'Sentinel-2 Composite')

# Add training data as a layer
Map.addLayer(training_data, {'color': 'red'}, 'Training Data')

# Display the map with layer control
Map.addLayerControl()
Map


## Prepare training data
In this step, we prepare the dataset for training and testing machine learning models by combining the satellite imagery with the training labels. We start by taking the Sentinel-2 bands (B2 to B12) of the composite, clipped to the boundary region, as the input features. Next, we rasterize the vector training data using the Cl_Id property to create a raster layer of class labels and add it as a new band (class) to the input features. Finally, we use stratified sampling to extract reflectance values and class labels. The numPoints argument sets the number of pixels sampled per class (10,000 here), so small classes such as Water are not swamped by large ones such as Built-up.

In [None]:
# Select the input bands from the composite (passed as an ee.List)
bands = ee.List(['B2', 'B3', 'B4', 'B5', 'B6', 'B7', 'B8', 'B11', 'B12'])
input_features = composite.select(bands).clip(boundary)
print('input features: ', input_features.getInfo())

# Rasterise training data
training_rasterized = training_data.reduceToImage(
    properties=['Cl_Id'],
    reducer=ee.Reducer.first()
).toInt().remap(
    # Identity remap: keeps class IDs 0-5 and masks any other value
    # 0 Bare areas, 1 Built-up, 2 Cropland, 3 Grass / open areas, 4 Woodlands, 5 Water
    [0, 1, 2, 3, 4, 5], [0, 1, 2, 3, 4, 5]
)

# Add a class band to features
input_features = input_features.addBands(training_rasterized.toInt().rename('class'))

# Sample the reflectance values and class label for each training pixel
training_dataset = input_features.stratifiedSample(
    numPoints=10000,
    classBand="class",
    region=boundary,
    scale=20
)

input features:  {'type': 'Image', 'bands': [{'id': 'B2', 'data_type': {'type': 'PixelType', 'precision': 'float', 'min': 0, 'max': 6.553500175476074}, 'crs': 'EPSG:4326', 'crs_transform': [1, 0, 0, 0, 1, 0]}, {'id': 'B3', 'data_type': {'type': 'PixelType', 'precision': 'float', 'min': 0, 'max': 6.553500175476074}, 'crs': 'EPSG:4326', 'crs_transform': [1, 0, 0, 0, 1, 0]}, {'id': 'B4', 'data_type': {'type': 'PixelType', 'precision': 'float', 'min': 0, 'max': 6.553500175476074}, 'crs': 'EPSG:4326', 'crs_transform': [1, 0, 0, 0, 1, 0]}, {'id': 'B5', 'data_type': {'type': 'PixelType', 'precision': 'float', 'min': 0, 'max': 6.553500175476074}, 'crs': 'EPSG:4326', 'crs_transform': [1, 0, 0, 0, 1, 0]}, {'id': 'B6', 'data_type': {'type': 'PixelType', 'precision': 'float', 'min': 0, 'max': 6.553500175476074}, 'crs': 'EPSG:4326', 'crs_transform': [1, 0, 0, 0, 1, 0]}, {'id': 'B7', 'data_type': {'type': 'PixelType', 'precision': 'float', 'min': 0, 'max': 6.553500175476074}, 'crs': 'EPSG:4326', 'cr
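
The per-class behaviour of stratified sampling can be illustrated without Earth Engine. The sketch below is a plain-Python stand-in, not the Earth Engine implementation: the function stratified_sample and the toy pixels are hypothetical, and it assumes that numPoints acts as a per-class cap, so a class with fewer available pixels contributes all of them.

```python
import random
from collections import defaultdict

def stratified_sample(pixels, num_points, seed=0):
    """Draw up to num_points pixels per class.

    pixels: list of (class_id, feature_values) tuples.
    Classes with fewer than num_points pixels contribute all they have.
    """
    rng = random.Random(seed)

    # Group the labelled pixels by class
    by_class = defaultdict(list)
    for class_id, features in pixels:
        by_class[class_id].append((class_id, features))

    # Sample each class independently, capped at num_points
    sample = []
    for class_id in sorted(by_class):
        members = by_class[class_id]
        k = min(num_points, len(members))
        sample.extend(rng.sample(members, k))
    return sample

# Hypothetical labelled pixels: 50 built-up (1), 20 woodland (4), 3 water (5)
pixels = [(1, [0.1])] * 50 + [(4, [0.05])] * 20 + [(5, [0.02])] * 3
sample = stratified_sample(pixels, num_points=10)

counts = defaultdict(int)
for class_id, _ in sample:
    counts[class_id] += 1
print(dict(counts))  # {1: 10, 4: 10, 5: 3}
```

With a cap of 10, the two large classes end up equally represented while the rare class keeps every pixel it has, which is the opposite of sampling in proportion to class area.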

## Export the training samples and Sentinel-2 imagery
We export the sampled training dataset (training_dataset) as a CSV table and the Sentinel-2 composite as an image to your Google Drive. After configuring each export, the task is started by calling its start() method.

In [None]:
# Export training samples as CSV
task_table = ee.batch.Export.table.toDrive(
    collection=training_dataset,
    description='Bul_TrainingData_2024',
    folder='Bulawayo_Dataset_2024',
    fileFormat='CSV'
)

# Start the export task
task_table.start()

# Export the Sentinel-2 composite
task_composite = ee.batch.Export.image.toDrive(
    image=composite.float(),
    description='Bul_S2_2024',
    folder='Bulawayo_Dataset_2024',
    scale=10,
    region=boundary.geometry(),
    maxPixels=1e13
)
task_composite.start()
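
Export tasks run asynchronously on Earth Engine's servers, so start() returns before any file reaches Google Drive. One way to follow progress is to poll the task's status() method, as in the sketch below. The helper wait_for_task, its parameters, and the polling interval are hypothetical choices for illustration; it only assumes that status() returns a dictionary with a 'state' key, as ee.batch.Task does.

```python
import time

def wait_for_task(task, poll_seconds=30, max_polls=120):
    """Poll a task until it leaves the queued/running states.

    task: any object with a status() method returning a dict with a 'state'
    key, as ee.batch.Task does. Returns the last status dict that was read.
    """
    status = task.status()
    for _ in range(max_polls):
        if status.get('state') not in ('UNSUBMITTED', 'READY', 'RUNNING'):
            break
        time.sleep(poll_seconds)
        status = task.status()
    return status

# Usage with the tasks created above (requires an authenticated session):
# print(wait_for_task(task_table))
# print(wait_for_task(task_composite))
```

The max_polls limit keeps the loop from running forever if a task stays queued; with the defaults shown, the helper gives up after roughly an hour and returns whatever state it last saw.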