# Classification pipeline using GEE classifiers

Full glacier-snow-cover-mapping classification pipeline for Sentinel-2 TOA, Sentinel-2 SR, and Landsat 8/9 images. 

__Requirements:__
1. Google Earth Engine (GEE) account: used to query imagery and the DEM (if no DEM is provided). Sign up for a free account [here](https://earthengine.google.com/new_signup/). 

2. Google Drive folder: Create a folder where output snow cover statistics will be saved. Enter the name of this folder as the `out_folder` variable below. If you don't create the folder ahead of time, duplicates of the same folder will be created for each output file! 

## Define image search settings and paths

In [1]:
import os
import ee
import geemap
import sys
import numpy as np

# -----Define Google Drive folder for outputs
# Note: Make sure this folder already exists and is the only folder in your "My Drive" with that name. 
out_folder = 'glacier_snow_cover_exports'

# -----Import pipeline utilities
# Assumes pipeline_utils.py is in the same folder as this notebook
script_path = os.getcwd()
sys.path.append(script_path)
import pipeline_utils as utils

# -----Define image search settings
# Date and month ranges (inclusive)
date_start = '2013-06-01'
date_end = '2025-06-25'
month_start = 6
month_end = 10
# Minimum percent of the AOI that an image must cover (0–100). Images below this threshold are removed after mosaicking by day.
min_aoi_coverage = 70
# Whether to mask clouds using the respective cloud mask via the geedim package
mask_clouds = True



## Authenticate and/or Initialize Google Earth Engine (GEE)

Replace the project ID with your GEE project. Default = `ee-{GEE-username}`

In [2]:
# project_id = "snow-cover-mapping-463217"
project_id = "ee-raineyaberle"

try:
    ee.Initialize(project=project_id)
except Exception:
    # Initialization failed (e.g., no stored credentials): authenticate, then retry
    ee.Authenticate()
    ee.Initialize(project=project_id)

## Select the Area of Interest (AOI) from the GLIMS dataset

This cell plots the GLIMS dataset on a map. To find a glacier, click the wrench in the toolbox at the upper right of the map, then use the "Inspector" to click on a polygon and view its properties. Right-click the "glac_id" property to highlight it, then copy it. Replace the `glac_id` variable below with your selected site. 

In [3]:
# Create a Map
Map = geemap.Map()

# Load the GLIMS dataset, add to the map
glims = ee.FeatureCollection('GLIMS/20230607')
Map.addLayer(glims, {'color': 'blue', 'opacity':  0.5}, 'GLIMS/20230607')

# Select your study site from the GLIMS dataset
# glac_id = 'G211100E60420N'
# aoi = glims.filter(ee.Filter.eq('glac_id', glac_id))

# Test with the largest AK glacier (Malaspina)
aoi_ak = (glims.filter(ee.Filter.eq('gtng_o1reg', 1))) # filter to AK
aoi_ak_largest = aoi_ak.filter(ee.Filter.eq('area', aoi_ak.aggregate_max('area'))) # Get the largest AK glacier
aoi = aoi_ak_largest
glac_id = aoi.aggregate_array('glac_id').getInfo()[0] # use GLIMS ID for output file names
print("Glacier ID used for output files:", glac_id)
                                 
# Merge all geometries to use as the AOI 
aoi = aoi.union().geometry()

# Add AOI to the map
Map.addLayer(aoi, {'color': 'orange', 'opacity': 0.8}, 'AOI')
Map.centerObject(aoi)

# Display the map
Map

Glacier ID used for output files: G219787E60289N



## Load the Digital Elevation Model (DEM)

Default: use the ArcticDEM Mosaic where it covers more than 90 % of the AOI. Otherwise, use the NASADEM. For sites that use the ArcticDEM Mosaic, elevations are reprojected to the EGM96 geoid to match the vertical datum of NASADEM. 

In [4]:
# Query GEE for DEM
dem = utils.query_gee_for_dem(aoi)

# Add DEM to map
# grab min and max elevations for color limits
minMax = dem.reduceRegion(reducer=ee.Reducer.minMax(),
                          geometry=aoi, 
                          scale=30,
                          maxPixels=1e9,
                          bestEffort=True)
elev_min = minMax.get('elevation_min')
elev_max = minMax.get('elevation_max')
print(f'Elevation range = {int(elev_min.getInfo())} to {int(elev_max.getInfo())} m')
# colors based on the "terrain" palette from matplotlib
palette = ['#333399', '#0d7fe5', '#00be90','#55dd77','#c6f48e','#e3db8a','#aa926b','#8e6e67','#c6b6b3','#ffffff']
Map.addLayer(dem, {'palette': palette, 'min': elev_min, 'max': elev_max}, 'DEM')


Querying GEE for DEM
ArcticDEM coverage = 100 %
Using ArcticDEM Mosaic
Elevation range = -1 to 5952 m
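
The DEM selection rule described above can be sketched in plain Python. This is only an illustration of the threshold logic, not the code inside `utils.query_gee_for_dem`: the function name `choose_dem_sketch` and its parameters are hypothetical, and the coverage value is assumed to be a percentage like the one printed in the output.

```python
def choose_dem_sketch(arcticdem_coverage_percent: float,
                      threshold_percent: float = 90.0) -> str:
    """Return the DEM name to use for a given ArcticDEM coverage of the AOI.

    Hypothetical helper: mirrors the rule stated in the text
    (ArcticDEM Mosaic if coverage > 90 %, otherwise NASADEM).
    """
    if not 0.0 <= arcticdem_coverage_percent <= 100.0:
        raise ValueError("coverage must be a percentage between 0 and 100")
    if arcticdem_coverage_percent > threshold_percent:
        return "ArcticDEM Mosaic"
    return "NASADEM"


# The site above reported 100 % ArcticDEM coverage
print(choose_dem_sketch(100.0))
```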


## Run the classification pipeline for each image dataset

### Sentinel-2 Top of Atmosphere (TOA): 2016 onwards

Note: Although Sentinel-2 launched in 2015, Sentinel-2 Surface Reflectance (below) is only available on GEE starting in ~2019. Sentinel-2 TOA therefore extends temporal coverage to the years before SR imagery is available. 

In [14]:
dataset = "Sentinel-2_TOA"

# Calculate the required image spatial resolution (scale) to stay within the GEE user memory limit.
scale_required = utils.determine_required_image_scale(aoi, dataset)

# Split the date range into separate annual date ranges to reduce the compute load per request.
date_ranges_list = utils.split_date_range_by_year(dataset, date_start, date_end, month_start, month_end)

# Run the workflow for each date range separately. 
for date_range in date_ranges_list:
    print('\n', date_range)

    # Query GEE for imagery
    image_collection = utils.query_gee_for_imagery(dataset, aoi, scale_required, date_range[0], date_range[1], month_start, month_end, 
                                                   min_aoi_coverage, mask_clouds)

    # Classify image collection
    classified_collection = utils.classify_image_collection(image_collection, dataset)

    # Calculate snow cover statistics, export to Google Drive
    task = utils.calculate_snow_cover_statistics(classified_collection, dem, aoi, scale=scale_required, out_folder=out_folder,
                                                 file_name_prefix=f"{glac_id}_{dataset}_snow_cover_stats_{date_range[0]}_{date_range[1]}")


Estimated image size over AOI for Sentinel-2_TOA: 795.32 MB
Image size exceeds GEE's 10 MB limit. Using scale = 90 m

 ('2016-06-01', '2016-10-31')
Querying GEE for Sentinel-2_TOA image collection
Classifying image collection
Calculating snow cover statistics
Exporting snow cover statistics to glacier_snow_cover_exports Google Drive folder with file name: G219787E60289N_Sentinel-2_TOA_snow_cover_stats_2016-06-01_2016-10-31
To monitor tasks, go to your GEE Task Manager: https://code.earthengine.google.com/tasks

 ('2017-06-01', '2017-10-31')
Querying GEE for Sentinel-2_TOA image collection
Classifying image collection
Calculating snow cover statistics
Exporting snow cover statistics to glacier_snow_cover_exports Google Drive folder with file name: G219787E60289N_Sentinel-2_TOA_snow_cover_stats_2017-06-01_2017-10-31
To monitor tasks, go to your GEE Task Manager: https://code.earthengine.google.com/tasks

 ('2018-06-01', '2018-10-31')
Querying GEE for Sentinel-2_TOA image collection
Class
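
The annual ranges printed above come from `utils.split_date_range_by_year`, whose implementation is not shown in this notebook. The following is a minimal sketch of how such a split could work, under several assumptions: the function name `split_date_range_by_year_sketch` is hypothetical, the `DATASET_START_YEAR` table is inferred from the section headings rather than taken from the pipeline, the last year is clipped to `date_end`, and `month_start <= month_end`.

```python
import calendar
from datetime import date

# Assumed first year processed for each dataset (taken from the section headings)
DATASET_START_YEAR = {"Sentinel-2_TOA": 2016, "Sentinel-2_SR": 2019, "Landsat": 2013}


def split_date_range_by_year_sketch(dataset, date_start, date_end, month_start, month_end):
    """Return one (start, end) ISO date tuple per year, limited to the month window."""
    # Do not start before the dataset's assumed first year
    start = max(date.fromisoformat(date_start), date(DATASET_START_YEAR[dataset], 1, 1))
    end = date.fromisoformat(date_end)

    ranges = []
    for year in range(start.year, end.year + 1):
        # Seasonal window for this year: first day of month_start to last day of month_end
        season_start = date(year, month_start, 1)
        season_end = date(year, month_end, calendar.monthrange(year, month_end)[1])
        # Clip the window to the overall requested range
        range_start = max(season_start, start)
        range_end = min(season_end, end)
        if range_start <= range_end:
            ranges.append((range_start.isoformat(), range_end.isoformat()))
    return ranges


for date_range in split_date_range_by_year_sketch("Sentinel-2_TOA", "2013-06-01", "2025-06-25", 6, 10)[:3]:
    print(date_range)
```

With the settings from the first cell, the first ranges match those printed above. Whether the pipeline clips the final year to `date_end` in the same way is not confirmed here.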

### Sentinel-2 Surface Reflectance (SR): 2019 onwards

In [24]:
dataset = "Sentinel-2_SR"

# Calculate the required image spatial resolution (scale) to stay within the GEE user memory limit.
scale_required = utils.determine_required_image_scale(aoi, dataset)

# Split the date range into separate annual date ranges to reduce the compute load per request.
date_ranges_list = utils.split_date_range_by_year(dataset, date_start, date_end, month_start, month_end)

# Run the workflow for each date range separately. 
for date_range in date_ranges_list:
    print('\n', date_range)

    # Query GEE for imagery
    image_collection = utils.query_gee_for_imagery(dataset, aoi, scale_required, date_range[0], date_range[1], month_start, month_end, 
                                                   min_aoi_coverage, mask_clouds)

    # Classify image collection
    classified_collection = utils.classify_image_collection(image_collection, dataset)

    # Calculate snow cover statistics, export to Google Drive
    task = utils.calculate_snow_cover_statistics(classified_collection, dem, aoi, scale=scale_required, out_folder=out_folder,
                                                 file_name_prefix=f"{glac_id}_{dataset}_snow_cover_stats_{date_range[0]}_{date_range[1]}")


Estimated image size over AOI for Sentinel-2_SR: 795.32 MB
Image size exceeds GEE's 10 MB limit. Using scale = 90 m

 ('2019-06-01', '2019-10-31')
Querying GEE for Sentinel-2_SR image collection
Classifying image collection
Calculating snow cover statistics
Exporting snow cover statistics to glacier_snow_cover_exports Google Drive folder with file name: G219787E60289N_Sentinel-2_SR_snow_cover_stats_2019-06-01_2019-10-31
To monitor tasks, go to your GEE Task Manager: https://code.earthengine.google.com/tasks

 ('2020-06-01', '2020-10-31')
Querying GEE for Sentinel-2_SR image collection
Classifying image collection
Calculating snow cover statistics
Exporting snow cover statistics to glacier_snow_cover_exports Google Drive folder with file name: G219787E60289N_Sentinel-2_SR_snow_cover_stats_2020-06-01_2020-10-31
To monitor tasks, go to your GEE Task Manager: https://code.earthengine.google.com/tasks

 ('2021-06-01', '2021-10-31')
Querying GEE for Sentinel-2_SR image collection
Classifying

### Landsat 8/9 SR: 2013 onwards

In [25]:
dataset = "Landsat"

# Calculate the required image spatial resolution (scale) to stay within the GEE user memory limit.
scale_required = utils.determine_required_image_scale(aoi, dataset)

# Split the date range into separate annual date ranges to reduce the compute load per request.
date_ranges_list = utils.split_date_range_by_year(dataset, date_start, date_end, month_start, month_end)

# Run the workflow for each date range separately. 
for date_range in date_ranges_list:
    print('\n', date_range)

    # Query GEE for imagery
    image_collection = utils.query_gee_for_imagery(dataset, aoi, scale_required, date_range[0], date_range[1], month_start, month_end, 
                                                   min_aoi_coverage, mask_clouds)

    # Classify image collection
    classified_collection = utils.classify_image_collection(image_collection, dataset)

    # Calculate snow cover statistics, export to Google Drive
    task = utils.calculate_snow_cover_statistics(classified_collection, dem, aoi, scale=scale_required, out_folder=out_folder,
                                                 file_name_prefix=f"{glac_id}_{dataset}_snow_cover_stats_{date_range[0]}_{date_range[1]}")


Estimated image size over AOI for Landsat: 51.55 MB
Image size exceeds GEE's 10 MB limit. Using scale = 70 m

 ('2013-06-01', '2013-10-31')
Querying GEE for Landsat image collection
Classifying image collection
Calculating snow cover statistics
Exporting snow cover statistics to glacier_snow_cover_exports Google Drive folder with file name: G219787E60289N_Landsat_snow_cover_stats_2013-06-01_2013-10-31
To monitor tasks, go to your GEE Task Manager: https://code.earthengine.google.com/tasks

 ('2014-06-01', '2014-10-31')
Querying GEE for Landsat image collection
Classifying image collection
Calculating snow cover statistics
Exporting snow cover statistics to glacier_snow_cover_exports Google Drive folder with file name: G219787E60289N_Landsat_snow_cover_stats_2014-06-01_2014-10-31
To monitor tasks, go to your GEE Task Manager: https://code.earthengine.google.com/tasks

 ('2015-06-01', '2015-10-31')
Querying GEE for Landsat image collection
Classifying image collection
Calculating snow co
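
The scale values printed by `utils.determine_required_image_scale` (90 m for Sentinel-2, 70 m for Landsat) are consistent with a simple rule: image size grows with the inverse square of the pixel size, so the scale is coarsened by the square root of the size ratio and rounded up. The sketch below illustrates that rule only; it may not be how the pipeline computes the scale. The function name, the 10 m rounding step, and the native resolutions (10 m for Sentinel-2, 30 m for Landsat) are assumptions, while the 10 MB limit is taken from the printed output.

```python
import math


def required_scale_sketch(estimated_size_mb: float,
                          native_scale_m: float,
                          limit_mb: float = 10.0,
                          step_m: int = 10) -> float:
    """Estimate the pixel size (m) needed to keep an image under limit_mb.

    Hypothetical helper: the number of pixels scales with 1 / scale**2,
    so the scale must grow by sqrt(size / limit).
    """
    if estimated_size_mb <= limit_mb:
        return native_scale_m  # already small enough at native resolution
    scale = native_scale_m * math.sqrt(estimated_size_mb / limit_mb)
    # Round up to the next multiple of step_m so the result stays under the limit
    return math.ceil(scale / step_m) * step_m


print(required_scale_sketch(795.32, native_scale_m=10))  # Sentinel-2 size from the output above
print(required_scale_sketch(51.55, native_scale_m=30))   # Landsat size from the output above
```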

## Test for checking statuses of all current tasks/operations

RA 2025-06-25:

GEE automatically starts some jobs and leaves others pending. It may manage jobs submitted around the same time on its own. Will test again later by running several sites at the same time.

If errors occur with too many near-simultaneous job submissions, one option is to check the number of current "RUNNING" and "PENDING" tasks every minute or two, set a limit on the number of tasks at a given time, and only start the next task when the current count is below that limit.

Otherwise, the number of simultaneous job submissions might not be an issue. 🤞

In [28]:
# Get statuses of all current tasks/operations
task_statuses = [x['metadata']['state'] for x in ee.data.listOperations()]
print('Current statuses for all tasks:')
print(task_statuses)

# Count number of "RUNNING" tasks
running_count = task_statuses.count("RUNNING")
print('Number of current RUNNING tasks: ', running_count)

# Count number of "PENDING" tasks
pending_count = task_statuses.count("PENDING")
print('Number of current PENDING tasks: ', pending_count)

Current statuses for all tasks:
['PENDING', 'PENDING', 'PENDING', 'PENDING', 'PENDING', 'PENDING', 'PENDING', 'PENDING', 'PENDING', 'PENDING', 'PENDING', 'PENDING', 'PENDING', 'PENDING', 'PENDING', 'PENDING', 'PENDING', 'PENDING', 'PENDING', 'PENDING', 'PENDING', 'PENDING', 'PENDING', 'PENDING', 'PENDING', 'PENDING', 'PENDING', 'RUNNING', 'RUNNING', 'RUNNING', 'FAILED', 'FAILED', 'FAILED', 'FAILED', 'FAILED', 'FAILED', 'FAILED', 'FAILED', 'FAILED', 'FAILED']
Number of current RUNNING tasks:  3
Number of current PENDING tasks:  27
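
The throttling idea from the notes above (poll the task states, and only submit the next job once the number of active tasks is below a limit) could look roughly like this. It is an untested sketch, not part of the pipeline: `count_active`, `wait_for_capacity`, `max_active`, and `poll_seconds` are hypothetical names, and the limit of 20 is an arbitrary placeholder. The state lookup is passed in as a function so the logic can be tried without a GEE connection.

```python
import time
from typing import Callable, Iterable

ACTIVE_STATES = {"RUNNING", "PENDING"}


def count_active(states: Iterable[str]) -> int:
    """Count tasks that are currently RUNNING or PENDING."""
    return sum(1 for state in states if state in ACTIVE_STATES)


def wait_for_capacity(get_states: Callable[[], Iterable[str]],
                      max_active: int = 20,
                      poll_seconds: float = 90.0,
                      sleep: Callable[[float], None] = time.sleep) -> int:
    """Block until fewer than max_active tasks are active; return the number of polls."""
    polls = 1
    while count_active(get_states()) >= max_active:
        sleep(poll_seconds)  # wait before checking the task list again
        polls += 1
    return polls


# In this notebook, the states could be fetched the same way as in the cell above:
#   get_states = lambda: [x['metadata']['state'] for x in ee.data.listOperations()]
# and wait_for_capacity(get_states) called at the top of each date-range loop iteration.
```

Calling `wait_for_capacity` before each submission keeps the queue bounded without tracking individual task IDs. Whether GEE actually needs this kind of limit is still the open question noted above.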
