# Classification pipeline for a single glacier

Full glacier-snow-cover-mapping classification pipeline for Sentinel-2 TOA, Sentinel-2 SR, and Landsat 8/9 images. 

## Requirements:
1. Google Earth Engine (GEE) account: used to query imagery and the DEM (if no DEM is provided). Sign up for a free account [here](https://earthengine.google.com/new_signup/). 

2. Google Drive folder: Create a folder where output snow cover statistics will be saved. Enter the name of this folder as the `out_folder` variable below. If you don't create the folder ahead of time, duplicates of the same folder will be created for each output file!

## Notes on GEE job submissions

GEE enforces [user quotas](https://developers.google.com/earth-engine/guides/usage) on memory usage (default = 10 MB per query) and the number of concurrent requests (default = 40). The workflow below mitigates the risk of exceeding these limits using the following strategies:

1. __Limit single query size:__ The spatial resolution of imagery is adjusted to ensure individual images remain under 10 MB in memory. This typically affects only the largest glaciers. For example, the largest glacier in Alaska by area (Malaspina Glacier, ~3,500 km$^2$) requires resampling images to a coarser resolution to meet the memory constraint:
    - Sentinel-2 imagery (12 bands used) is resampled to 280 m resolution.
    - Landsat 8/9 (7 bands used) is resampled to 220 m resolution.

2. __Split tasks into smaller date ranges:__ While GEE handles queued tasks efficiently, it's our job to manage the size of the individual jobs (i.e., image collections). To reduce task size, the date range is divided before running the workflow and submitting export tasks:
    - For glaciers $<$ 1,000 km$^2$, the date range is split into annual intervals as needed.
    - For glaciers $\geq$ 1,000 km$^2$, the date range is split into monthly intervals, since the resulting images are much larger.
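
The two strategies above can be sketched in plain Python. This is only an illustration, not the implementation in `pipeline_utils.py`: the function names `required_scale_m` and `split_date_range`, the 8 bytes per pixel value, and the use of the glacier area (rather than, say, its bounding box) are all assumptions, so the resulting scales will not necessarily match the 280 m and 220 m values quoted above.

```python
import calendar
import math
from datetime import date


def required_scale_m(area_km2, n_bands, native_scale_m=10,
                     bytes_per_value=8, limit_bytes=10e6):
    """Coarsest-needed pixel size (m) so one image stays under limit_bytes.

    Hypothetical helper: assumes memory = pixels * bands * bytes_per_value.
    """
    area_m2 = area_km2 * 1e6
    # Solve (area / scale^2) * n_bands * bytes_per_value <= limit for scale
    scale = math.sqrt(area_m2 * n_bands * bytes_per_value / limit_bytes)
    # Never go finer than the sensor's native resolution
    return max(native_scale_m, math.ceil(scale))


def split_date_range(start, end, area_km2, area_threshold_km2=1000):
    """Split [start, end] (ISO strings, inclusive) into smaller intervals.

    Hypothetical helper: annual intervals below the area threshold,
    monthly intervals at or above it.
    """
    start_d, end_d = date.fromisoformat(start), date.fromisoformat(end)
    monthly = area_km2 >= area_threshold_km2
    year = start_d.year
    month = start_d.month if monthly else 1
    intervals = []
    while date(year, month, 1) <= end_d:
        first = date(year, month, 1)
        if monthly:
            last = date(year, month, calendar.monthrange(year, month)[1])
        else:
            last = date(year, 12, 31)
        # Clip each interval to the requested range
        intervals.append((max(first, start_d).isoformat(),
                          min(last, end_d).isoformat()))
        if monthly and month < 12:
            month += 1
        else:
            year, month = year + 1, 1
    return intervals


# Example: a ~3,500 km^2 glacier with 12 bands, split monthly
scale = required_scale_m(3500, n_bands=12)
ranges = split_date_range('2022-09-03', '2022-11-10', area_km2=3500)
```

Each `(start, end)` pair could then be passed to the pipeline as its own `date_start` and `date_end`, so that every export task covers a smaller image collection.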


## Define image search settings and paths

In [1]:
import os
import ee
import geemap
import sys
import numpy as np

# -----Define Google Drive folder for outputs
# Note: Make sure this folder already exists and is the only folder in your "My Drive" with that name. 
out_folder = 'glacier_snow_cover_exports'

# -----Import pipeline utilities
# Assumes pipeline_utils.py is in the same folder as this notebook
script_path = os.getcwd()
sys.path.append(script_path)
import pipeline_utils as utils

# -----Define image search settings
# Date and month ranges (inclusive)
date_start = '2013-06-01'
date_end = '2025-06-25'
month_start = 6
month_end = 10
# Minimum fill portion of the AOI (0–100), used to remove images after mosaicking by day. 
min_aoi_coverage = 70
# Whether to mask clouds using the respective cloud mask via the geedim package
mask_clouds = True



## Authenticate and/or Initialize Google Earth Engine (GEE)

Replace the project ID with your GEE project. Default = `ee-{GEE-username}`

In [2]:
# project_id = "snow-cover-mapping-463217"
project_id = "ee-raineyaberle"

try:
    ee.Initialize(project=project_id)
except Exception:
    # Initialization failed (e.g., no stored credentials): authenticate, then retry
    ee.Authenticate()
    ee.Initialize(project=project_id)

## Select the Area of Interest (AOI) from the GLIMS dataset

This cell can plot the GLIMS dataset on a map (uncomment the map lines at the end of the cell to display it). To find a glacier, click the wrench icon in the toolbox at the upper right of the map, then use the "Inspector" to click on a polygon and view its properties. Right-click the "glac_id" property to highlight it, then copy it. Replace the `glac_id` variable below with your selected site.

In [3]:
# Load the GLIMS dataset, add to the map
glims = ee.FeatureCollection('GLIMS/20230607')

# Select your study site from the GLIMS dataset
# glac_id = 'G211100E60420N'
# aoi = glims.filter(ee.Filter.eq('glac_id', glac_id))

# Test with the largest AK glacier (Malaspina)
glims_ak = (glims.filter(ee.Filter.eq('gtng_o1reg', 1))) # filter to AK
glims_ak_largest = glims_ak.filter(ee.Filter.eq('area', glims_ak.aggregate_max('area'))) # Get the largest AK glacier
aoi = glims_ak_largest
glac_id = aoi.aggregate_array('glac_id').getInfo()[0] # use GLIMS ID for output file names
print("Glacier ID used for output file names:", glac_id)
                                 
# Merge all geometries to use as the AOI 
aoi = aoi.union().geometry()

# Create a Map
# Map = geemap.Map()
# Map.addLayer(glims, {'color': 'blue', 'opacity':  0.5}, 'GLIMS/20230607')
# Map.addLayer(aoi, {'color': 'orange', 'opacity': 0.8}, 'AOI')
# Map.centerObject(aoi)
# Map

Glacier ID used for output file names: G219787E60289N


## Load the Digital Elevation Model (DEM)

Default: use the ArcticDEM Mosaic where it covers more than 90 % of the AOI. Otherwise, use NASADEM. For sites that use the ArcticDEM Mosaic, elevations are re-referenced to the EGM96 geoid to match the vertical datum of NASADEM.

In [4]:
# Query GEE for DEM
dem = utils.query_gee_for_dem(aoi)

# # Add DEM to map
# # grab min and max elevations for color limits
# minMax = dem.reduceRegion(reducer=ee.Reducer.minMax(),
#                           geometry=aoi, 
#                           scale=30,
#                           maxPixels=1e9,
#                           bestEffort=True)
# elev_min = minMax.get('elevation_min')
# elev_max = minMax.get('elevation_max')
# print(f'Elevation range = {int(elev_min.getInfo())} to {int(elev_max.getInfo())} m')
# # colors based on the "terrain" palette from matplotlib
# palette = ['#333399', '#0d7fe5', '#00be90','#55dd77','#c6f48e','#e3db8a','#aa926b','#8e6e67','#c6b6b3','#ffffff']
# Map.addLayer(dem, {'palette': palette, 'min': elev_min, 'max': elev_max}, 'DEM')


Querying GEE for DEM
ArcticDEM coverage = 100 %
Using ArcticDEM Mosaic


## Run the classification pipeline for each dataset

In [8]:
### RA 2025-06-26: TESTING THE IMAGE SCALES FOR MALASPINA (function not currently calculating correctly)
## Scales tested for a 1-month range:
# - 1 km2: FAILED
# - 5 km2: FAILED
# - 10 km2: FAILED
# - 100 km2: FAILED

# Try a one week range
# - 100 km2: FAILED
# - 1000 km2: SUCCESS (Woof)
dataset = "Sentinel-2_TOA"
date_start = "2022-09-03"
date_end = "2022-09-10"

# Calculate the required image spatial resolution (scale) to stay within the GEE user memory limit.
# scale_required = utils.determine_required_image_scale(aoi, dataset)
scale_required = 1000e3

image_collection = utils.query_gee_for_imagery(dataset, aoi, scale_required, date_start, date_end, month_start, month_end, 
                                                min_aoi_coverage, mask_clouds)

classified_collection = utils.classify_image_collection(image_collection, dataset)

task = utils.calculate_snow_cover_statistics(classified_collection, dem, aoi, scale=scale_required, out_folder=out_folder,
                                             file_name_prefix=f"{glac_id}_{dataset}_snow_cover_stats_{date_start}_{date_end}_{int(scale_required/3)}km2")


Querying GEE for Sentinel-2_TOA image collection
Classifying image collection
Calculating snow cover statistics
Exporting snow cover statistics to glacier_snow_cover_exports Google Drive folder with file name: G219787E60289N_Sentinel-2_TOA_snow_cover_stats_2022-09-03_2022-09-10_333333km2
To monitor tasks, see your Google Cloud Console or GEE Task Manager: https://code.earthengine.google.com/tasks


### Sentinel-2 Top of Atmosphere (TOA): 2016 onwards

In [None]:
dataset = "Sentinel-2_TOA"
utils.run_classification_pipeline(aoi, dem, dataset, date_start, date_end, month_start, month_end, 
                                  min_aoi_coverage, mask_clouds, out_folder, glac_id)

### Sentinel-2 Surface Reflectance (SR): 2019 onwards

In [None]:
dataset = "Sentinel-2_SR"
utils.run_classification_pipeline(aoi, dem, dataset, date_start, date_end, month_start, month_end, 
                                  min_aoi_coverage, mask_clouds, out_folder, glac_id)

### Landsat 8/9 SR: 2013 onwards

In [None]:
dataset = "Landsat"
utils.run_classification_pipeline(aoi, dem, dataset, date_start, date_end, month_start, month_end, 
                                  min_aoi_coverage, mask_clouds, out_folder, glac_id)

## Test for checking statuses of all current tasks/operations

RA 2025-06-25:

GEE automatically starts some jobs and leaves others pending, so I THINK simultaneous job submissions are managed automatically. 🤞

In [None]:
# Get statuses of all current tasks/operations
task_statuses = [x['metadata']['state'] for x in ee.data.listOperations()]
print('Current statuses for all tasks:')
print(task_statuses, '\n')

statuses = ['RUNNING', 'PENDING', 'SUCCEEDED', 'FAILED']

# Count how many tasks are in each state
for status in statuses:
    count = task_statuses.count(status)
    print(f"Number of {status} tasks = {count}")
