# GlaSEE pipeline

Full classification pipeline for Sentinel-2 TOA, Sentinel-2 SR, and Landsat 8/9 images. 

## Requirements:
1. Google Earth Engine (GEE) account: used to query imagery and the DEM (if no DEM is provided). Sign up for a free account [here](https://earthengine.google.com/new_signup/). 

2. Google Drive folder: Create a folder where output snow cover statistics will be saved. Enter the name of this folder as the `out_folder` variable below. If you don't create the folder ahead of time, duplicates of the same folder will be created for each output file!

## Define image search settings and paths

In [1]:
import os
import ee
import geemap
import sys

# -----Define Google Drive folder for exports
# NOTE: Make sure this folder already exists and is the only folder with that name in "My Drive". 
out_folder = 'glacier_snow_cover_exports'

# -----Import pipeline utilities
# Assumes glasee_pipeline_utils.py is in the same folder as this notebook
script_path = os.getcwd()
sys.path.append(script_path)
import glasee_pipeline_utils as utils

# -----Define image search settings
# Date and month ranges (inclusive)
date_start = '2014-04-01' 
date_end = '2025-10-31' 
month_start = 4 # April = 4
month_end = 10 # Oct = 10 (inclusive; for a single month, set month_end equal to month_start)
# Minimum percentage of the AOI (0–100) that each daily mosaic must cover; days with less coverage are removed after mosaicking
min_aoi_coverage = 70
# Whether to mask clouds using each dataset's cloud mask (via the geedim package)
mask_clouds = True
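The sketch below shows in plain Python how the date and month settings combine. It assumes the pipeline keeps only dates that fall inside both the overall `date_start`–`date_end` range and the `month_start`–`month_end` window of each year. `seasonal_windows` is a hypothetical helper for illustration and is not part of `glasee_pipeline_utils`.

```python
import calendar
from datetime import date

def seasonal_windows(date_start, date_end, month_start, month_end):
    """Return one inclusive (start, end) window per year: the part of
    [date_start, date_end] that falls within months month_start..month_end.
    Assumes month_start <= month_end (no season that wraps over New Year)."""
    start = date.fromisoformat(date_start)
    end = date.fromisoformat(date_end)
    windows = []
    for year in range(start.year, end.year + 1):
        # Season bounds for this year; monthrange()[1] is the last day of month_end
        season_start = date(year, month_start, 1)
        season_end = date(year, month_end, calendar.monthrange(year, month_end)[1])
        # Intersect the season with the overall date range
        lo = max(start, season_start)
        hi = min(end, season_end)
        if lo <= hi:
            windows.append((lo.isoformat(), hi.isoformat()))
    return windows

windows = seasonal_windows('2014-04-01', '2025-10-31', 4, 10)
print(len(windows), windows[0], windows[-1])
# 12 ('2014-04-01', '2014-10-31') ('2025-04-01', '2025-10-31')
```

Setting `month_end` equal to `month_start` gives one window per year that covers just that month.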

## Authenticate and/or Initialize Google Earth Engine (GEE)

Replace `project_id` below with your GEE project ID. The default project ID has the form `ee-[GEE-username]`.

In [2]:
project_id = "ee-ellynenderlin"

try:
    ee.Initialize(project=project_id)
except Exception:
    # Initialization fails if no credentials are stored yet, so authenticate and retry
    ee.Authenticate()
    ee.Initialize(project=project_id)

## Run the pipeline for a single glacier

### Select the Area of Interest (AOI) from the Randolph Glacier Inventory (RGI) dataset

This cell plots the RGI dataset on a map. To find a glacier, click the wrench icon in the toolbar at the upper right of the map, then use the "Inspector" to click on a polygon and view its properties. Right-click the "rgi_id" property to highlight it, then copy it. Replace the `rgi_id` variable below with the ID of your selected glacier.

In [None]:
# Load the RGI v7 dataset
rgi = ee.FeatureCollection("projects/ee-raineyaberle/assets/glacier-snow-cover-mapping/RGI2000-v7-G")

# Select a glacier by its RGI v7 ID
# Other examples: RGI2000-v7.0-G-01-11350 (Wolverine), RGI2000-v7.0-G-01-05299 (Gulkana),
#                 RGI2000-v7.0-G-01-05740 (Kennicott), 17516 (Casement)
rgi_id = 'RGI2000-v7.0-G-01-21534'

# Grab the geometry
aoi = rgi.filter(ee.Filter.eq('rgi_id', rgi_id))
aoi = aoi.geometry()
aoi_area = aoi.area().getInfo() # save area [m^2] for splitting date ranges later
print(f"Glacier area = {int(aoi_area/1e6)} km2")

# Create a Map
Map = geemap.Map()
Map.addLayer(rgi, {'color': 'blue', 'opacity':  0.5}, 'RGI v7')
Map.addLayer(aoi, {'color': 'orange', 'opacity': 0.8}, 'AOI')
Map.centerObject(aoi,zoom=9)
Map

### Load the Digital Elevation Model (DEM)

By default, the pipeline uses the ArcticDEM Mosaic where it covers more than 90% of the AOI. Otherwise, it uses NASADEM. For sites that use the ArcticDEM Mosaic, elevations are converted to the EGM96 geoid to match the vertical datum of NASADEM.
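The DEM selection rule and the datum shift can be written out as a small, self-contained sketch. It assumes ArcticDEM elevations are heights above the WGS84 ellipsoid and that the EGM96 geoid undulation N is known at each point. In GEE, N would come from a geoid grid, which is not shown here. The function names and the example values are hypothetical, and `utils.query_gee_for_dem` may implement the conversion differently.

```python
def choose_dem(arcticdem_coverage_pct, threshold_pct=90.0):
    """Pick the DEM source using the rule stated above: the ArcticDEM Mosaic if it
    covers more than threshold_pct of the AOI, otherwise NASADEM."""
    return 'ArcticDEM Mosaic' if arcticdem_coverage_pct > threshold_pct else 'NASADEM'

def ellipsoid_to_geoid(h_ellipsoid_m, geoid_undulation_m):
    """Orthometric height H = h - N, where h is the height above the WGS84 ellipsoid
    and N is the geoid height above the ellipsoid at the same point."""
    return h_ellipsoid_m - geoid_undulation_m

print(choose_dem(100))                   # ArcticDEM Mosaic
print(choose_dem(85))                    # NASADEM
print(ellipsoid_to_geoid(1500.0, 12.5))  # 1487.5 (hypothetical values)
```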

In [None]:
# Query GEE for DEM
dem = utils.query_gee_for_dem(aoi)

# Add DEM to map
# Grab min and max elevations for color limits (fetched with a single getInfo() call)
minMax = dem.reduceRegion(reducer=ee.Reducer.minMax(),
                          geometry=aoi,
                          scale=30,
                          maxPixels=1e9,
                          bestEffort=True).getInfo()
elev_min = minMax['elevation_min']
elev_max = minMax['elevation_max']
print(f'Elevation range = {int(elev_min)} to {int(elev_max)} m')
# colors based on the "terrain" palette from matplotlib
palette = ['#333399', '#0d7fe5', '#00be90','#55dd77','#c6f48e','#e3db8a','#aa926b','#8e6e67','#c6b6b3','#ffffff']
Map.addLayer(dem, {'palette': palette, 'min': elev_min, 'max': elev_max}, 'DEM')

Option A) Run the model for all platforms:

In [None]:
for dataset in ['Sentinel-2_TOA', 'Sentinel-2_SR', 'Landsat']:
    utils.run_classification_pipeline(aoi, aoi_area, dem, dataset, date_start, date_end, month_start, month_end,
                                      min_aoi_coverage, mask_clouds, out_folder, rgi_id, scale=None, verbose=False)
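`run_classification_pipeline` receives `aoi_area`. Its log, printed in the multi-glacier run further down, reports that AOIs of 150 km2 or more have their date range split by day. The plain-Python sketch below builds daily windows that are end-exclusive, like `ee.ImageCollection.filterDate`. `split_date_range` is hypothetical. The behavior for smaller AOIs (one range for the whole period) is an assumption. The pipeline's actual windowing may differ, so this sketch is not expected to reproduce the exact "Number of date ranges" counts in that log.

```python
from datetime import date, timedelta

# Threshold reported in the pipeline log: "AOI >= 150 km2"
DAILY_SPLIT_KM2 = 150

def split_date_range(aoi_area_m2, date_start, date_end, month_start, month_end):
    """Hypothetical sketch: for large AOIs, return one (day, next_day) pair per day in
    the season, end-exclusive like ee.ImageCollection.filterDate. Smaller AOIs get the
    full range as a single pair (an assumption; the month filter would be applied separately)."""
    if aoi_area_m2 / 1e6 < DAILY_SPLIT_KM2:
        return [(date_start, date_end)]
    day = date.fromisoformat(date_start)
    last = date.fromisoformat(date_end)
    ranges = []
    while day <= last:
        if month_start <= day.month <= month_end:
            ranges.append((day.isoformat(), (day + timedelta(days=1)).isoformat()))
        day += timedelta(days=1)
    return ranges

ranges = split_date_range(445e6, '2024-04-01', '2024-10-31', 4, 10)
print(len(ranges), ranges[0], ranges[-1])
# 214 ('2024-04-01', '2024-04-02') ('2024-10-31', '2024-11-01')
```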

Option B) Run the model for a single platform over the specified date range:

In [7]:
# New date range (useful if your kernel abruptly shut down and you need to pick up where it stopped)
date_start = '2022-06-01' # default: 2014-04-01
date_end = '2025-10-31' # default: 2025-10-31
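To pick up where a run stopped, you can recover the latest exported date from the file naming convention printed in the pipeline log (`<rgi_id>_<dataset>_snow_cover_stats_DATE-START_DATE-END.csv`). The sketch below assumes you already have a list of exported file names, for example copied from the Drive folder. It also assumes DATE-START and DATE-END are written as YYYY-MM-DD, which the log does not confirm. `latest_export_end` and the example file names are hypothetical.

```python
import re
from datetime import date

# Export names follow <rgi_id>_<dataset>_snow_cover_stats_<DATE-START>_<DATE-END>.csv;
# assumption: both dates are written as YYYY-MM-DD
STATS_NAME = re.compile(
    r'^(?P<prefix>.+)_snow_cover_stats_(?P<start>\d{4}-\d{2}-\d{2})_(?P<end>\d{4}-\d{2}-\d{2})\.csv$')

def latest_export_end(filenames, rgi_id, dataset):
    """Return the latest DATE-END among exports for this glacier and dataset, or None."""
    prefix = f'{rgi_id}_{dataset}'
    ends = [date.fromisoformat(m.group('end'))
            for m in map(STATS_NAME.match, filenames)
            if m and m.group('prefix') == prefix]
    return max(ends).isoformat() if ends else None

# Hypothetical file names for illustration
files = [
    'RGI2000-v7.0-G-01-25753_Sentinel-2_TOA_snow_cover_stats_2022-05-30_2022-05-31.csv',
    'RGI2000-v7.0-G-01-25753_Sentinel-2_TOA_snow_cover_stats_2022-05-31_2022-06-01.csv',
    'RGI2000-v7.0-G-01-25753_Landsat_snow_cover_stats_2023-07-01_2023-07-02.csv',
]
print(latest_export_end(files, 'RGI2000-v7.0-G-01-25753', 'Sentinel-2_TOA'))  # 2022-06-01
```

Restarting from the latest DATE-END, rather than the day after it, errs toward re-exporting one day instead of skipping one.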

Run Sentinel-2 Top of Atmosphere (TOA): available 2016 onwards

In [None]:
dataset = "Sentinel-2_TOA"
utils.run_classification_pipeline(aoi, aoi_area, dem, dataset, date_start, date_end, month_start, month_end, 
                                  min_aoi_coverage, mask_clouds, out_folder, rgi_id, scale=None, verbose=False)

Run Sentinel-2 Surface Reflectance (SR): available 2019 onwards

In [None]:
dataset = "Sentinel-2_SR"
utils.run_classification_pipeline(aoi, aoi_area, dem, dataset, date_start, date_end, month_start, month_end, 
                                  min_aoi_coverage, mask_clouds, out_folder, rgi_id, scale=None, verbose=False)

Run Landsat 8/9 SR: available 2013 onwards

In [None]:
dataset = "Landsat"
utils.run_classification_pipeline(aoi, aoi_area, dem, dataset, date_start, date_end, month_start, month_end, 
                                  min_aoi_coverage, mask_clouds, out_folder, rgi_id, scale=None, verbose=False)
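Each dataset starts in a different year, so a `date_start` earlier than that year cannot return imagery from that dataset for the earlier seasons. The sketch below clamps the start date per dataset, using only the availability years stated in the headings above. `FIRST_YEAR` and `effective_start` are hypothetical names, and `run_classification_pipeline` may already handle this internally.

```python
from datetime import date

# First year of availability for each dataset, as stated in the headings above
FIRST_YEAR = {'Sentinel-2_TOA': 2016, 'Sentinel-2_SR': 2019, 'Landsat': 2013}

def effective_start(dataset, date_start):
    """Return the later of the requested start date and January 1 of the dataset's first year."""
    requested = date.fromisoformat(date_start)
    return max(requested, date(FIRST_YEAR[dataset], 1, 1)).isoformat()

for ds in FIRST_YEAR:
    print(ds, effective_start(ds, '2014-04-01'))
# Sentinel-2_TOA 2016-01-01
# Sentinel-2_SR 2019-01-01
# Landsat 2014-04-01
```

The month filter (`month_start`, `month_end`) still applies on top of the clamped start date.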

## Run the pipeline for multiple glaciers

First, create a list of glacier IDs for analysis. 

Below is an example selection of glaciers, where the full RGI v. 7 collection is filtered by RGI O1 region ("o1region") and area ("area_km2"). For the full list of properties available for filtering, see the [RGI v. 7 documentation](https://www.glims.org/rgi_user_guide/products/glacier_product.html#full-list-of-attributes) or run the following command: 

`rgi.first().propertyNames().getInfo()`

In [3]:
# OPTION 1) Load the RGI v. 7.0 dataset and filter by size
rgi = ee.FeatureCollection("projects/ee-raineyaberle/assets/glacier-snow-cover-mapping/RGI2000-v7-G")

# Apply filters
rgi_filt = (rgi
            .filter(ee.Filter.eq('o1region', '01')) # Alaska
            .filter(ee.Filter.gte('area_km2', 200)) # area >= 200 km2
            .filter(ee.Filter.lte('area_km2', 1000)) # area <= 1000 km2
            )
# Alternative: filter by O2 subregion instead (e.g., '01-05' = Wrangell-St. Elias)
# rgi_filt = (rgi
#             .filter(ee.Filter.eq('o2region', '01-02'))
#             .filter(ee.Filter.gte('area_km2', 10)) # area >= 10 km2
#             .filter(ee.Filter.lte('area_km2', 200)) # area <= 200 km2
#             )

# Get the list of RGI IDs
id_list = rgi_filt.aggregate_array('rgi_id')
id_list = id_list.getInfo()
print('Number of glaciers selected:', len(id_list))
# print(id_list)

Number of glaciers selected: 62


In [None]:
# OPTION 2) List-out RGI IDs of interest
rgi = ee.FeatureCollection("projects/ee-raineyaberle/assets/glacier-snow-cover-mapping/RGI2000-v7-G")

id_list = ['RGI2000-v7.0-G-01-17977','RGI2000-v7.0-G-01-17992','RGI2000-v7.0-G-01-18097','RGI2000-v7.0-G-01-18306',
           'RGI2000-v7.0-G-01-18458','RGI2000-v7.0-G-01-21229','RGI2000-v7.0-G-01-21280','RGI2000-v7.0-G-01-23000','RGI2000-v7.0-G-01-23183',
           'RGI2000-v7.0-G-01-23206','RGI2000-v7.0-G-01-24876','RGI2000-v7.0-G-01-26047','RGI2000-v7.0-G-01-26477','RGI2000-v7.0-G-01-26478',
           'RGI2000-v7.0-G-01-26746','RGI2000-v7.0-G-01-26781','RGI2000-v7.0-G-01-26785','RGI2000-v7.0-G-01-26798','RGI2000-v7.0-G-01-26800',
           'RGI2000-v7.0-G-01-26806','RGI2000-v7.0-G-01-26821','RGI2000-v7.0-G-01-26886','RGI2000-v7.0-G-01-26951','RGI2000-v7.0-G-01-26963',
           'RGI2000-v7.0-G-01-26976','RGI2000-v7.0-G-01-26998','RGI2000-v7.0-G-01-27010','RGI2000-v7.0-G-01-27011','RGI2000-v7.0-G-01-27071',
           'RGI2000-v7.0-G-01-27133','RGI2000-v7.0-G-01-27162','RGI2000-v7.0-G-01-27166','RGI2000-v7.0-G-01-27174','RGI2000-v7.0-G-01-27247',
           'RGI2000-v7.0-G-01-27252','RGI2000-v7.0-G-01-27315','RGI2000-v7.0-G-01-27329','RGI2000-v7.0-G-01-27335','RGI2000-v7.0-G-01-27355',
           'RGI2000-v7.0-G-01-27372','RGI2000-v7.0-G-01-27374','RGI2000-v7.0-G-02-10373','RGI2000-v7.0-G-02-10554','RGI2000-v7.0-G-02-10587',
           'RGI2000-v7.0-G-02-10599','RGI2000-v7.0-G-02-10724','RGI2000-v7.0-G-02-10765','RGI2000-v7.0-G-02-10781','RGI2000-v7.0-G-02-10785',
           'RGI2000-v7.0-G-02-10852','RGI2000-v7.0-G-02-10862','RGI2000-v7.0-G-02-10975','RGI2000-v7.0-G-02-10999','RGI2000-v7.0-G-02-11122',
           'RGI2000-v7.0-G-02-11493','RGI2000-v7.0-G-02-11907','RGI2000-v7.0-G-02-11915','RGI2000-v7.0-G-02-12653','RGI2000-v7.0-G-02-12921',
           'RGI2000-v7.0-G-02-12977','RGI2000-v7.0-G-02-12982','RGI2000-v7.0-G-02-13026','RGI2000-v7.0-G-02-13197','RGI2000-v7.0-G-02-13215',
           'RGI2000-v7.0-G-02-13428','RGI2000-v7.0-G-02-13445','RGI2000-v7.0-G-02-13518','RGI2000-v7.0-G-02-13542','RGI2000-v7.0-G-02-13562',
           'RGI2000-v7.0-G-02-15375']
print('Number of glaciers selected:', len(id_list))
print(id_list)
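Each glacier can take hours to process (see the log below), so it can be worth checking a hand-typed list before launching the loop. The sketch below flags malformed or duplicated IDs and counts glaciers per O1 region. The regex is inferred from the IDs used in this notebook rather than taken from the RGI specification, and `check_ids` is a hypothetical helper. In the notebook you would call it as `check_ids(id_list)`.

```python
import re
from collections import Counter

# Pattern inferred from the IDs in this notebook: RGI2000-v7.0-G-<O1 region>-<number>
RGI7_ID = re.compile(r'^RGI2000-v7\.0-G-(\d{2})-(\d+)$')

def check_ids(ids):
    """Return (malformed IDs, duplicated IDs, Counter of unique well-formed IDs per O1 region)."""
    malformed = [i for i in ids if RGI7_ID.match(i) is None]
    duplicates = sorted(i for i, n in Counter(ids).items() if n > 1)
    regions = Counter(RGI7_ID.match(i).group(1) for i in set(ids) if RGI7_ID.match(i))
    return malformed, duplicates, regions

example = ['RGI2000-v7.0-G-01-17977', 'RGI2000-v7.0-G-02-10373',
           'RGI2000-v7.0-G-01-17977', '17516']
malformed, duplicates, regions = check_ids(example)
print(malformed)                       # ['17516']
print(duplicates)                      # ['RGI2000-v7.0-G-01-17977']
print(dict(sorted(regions.items())))   # {'01': 1, '02': 1}
```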

In [4]:
# Iterate over RGI IDs
for i, rgi_id in enumerate(id_list): # use id_list[0:1] instead to run only the first glacier for testing
    print("\nGlacier #", i, '\n---------')
    print("Glacier ID used for output file names:", rgi_id)

    # grab glacier area of interest
    aoi = rgi.filter(ee.Filter.eq('rgi_id', rgi_id))
    aoi = aoi.geometry()
    aoi_area = aoi.area().getInfo() # save area [m^2] for splitting date ranges later
    print(f"Glacier area = {int(aoi_area/1e6)} km2")

    # Query GEE for DEM
    dem = utils.query_gee_for_dem(aoi) 

    # Run pipeline for each dataset
    for dataset in ['Sentinel-2_TOA', 'Sentinel-2_SR', 'Landsat']:
        print(f'\nRunning {dataset}...')
        utils.run_classification_pipeline(aoi, aoi_area, dem, dataset, date_start, date_end, month_start, month_end, 
                                          min_aoi_coverage, mask_clouds, out_folder, rgi_id, scale=None, verbose=False)

    


Glacier # 1 
---------
Glacier ID used for output file names: RGI2000-v7.0-G-01-25753
Glacier area = 445 km2
Querying GEE for DEM
ArcticDEM coverage = 100 %
Using ArcticDEM Mosaic

Running Sentinel-2_TOA...
AOI >= 150 km2 — splitting date range by day.
Number of date ranges = 2139
Exporting snow cover statistics to glacier_snow_cover_exports Google Drive folder with file naming convention: RGI2000-v7.0-G-01-25753_Sentinel-2_TOA_snow_cover_stats_DATE-START_DATE-END.csv
To monitor export tasks, see your Google Cloud Console or GEE Task Manager: https://code.earthengine.google.com/tasks
Iterating over date ranges...


  0%|                                                               | 0/2139 [00:00<?, ?it/s]

checking queue length


 23%|████████████▎                                        | 499/2139 [03:14<08:16,  3.30it/s]

checking queue length


 47%|████████████████████████▋                            | 998/2139 [06:16<05:46,  3.30it/s]

checking queue length


 70%|████████████████████████████████████▍               | 1497/2139 [09:19<03:22,  3.17it/s]

checking queue length


 93%|████████████████████████████████████████████████▌   | 1996/2139 [12:25<00:41,  3.41it/s]

checking queue length


100%|████████████████████████████████████████████████████| 2139/2139 [13:38<00:00,  2.61it/s]



Running Sentinel-2_SR...
AOI >= 150 km2 — splitting date range by day.
Number of date ranges = 1497
Exporting snow cover statistics to glacier_snow_cover_exports Google Drive folder with file naming convention: RGI2000-v7.0-G-01-25753_Sentinel-2_SR_snow_cover_stats_DATE-START_DATE-END.csv
To monitor export tasks, see your Google Cloud Console or GEE Task Manager: https://code.earthengine.google.com/tasks
Iterating over date ranges...


  0%|                                                               | 0/1497 [00:00<?, ?it/s]

checking queue length


 33%|█████████████████▋                                   | 499/1497 [03:08<06:02,  2.75it/s]

checking queue length


 67%|███████████████████████████████████▎                 | 998/1497 [23:01<02:25,  3.43it/s]

checking queue length


100%|██████████████████████████████████████████████████| 1497/1497 [1:04:57<00:00,  2.60s/it]



Running Landsat...
AOI >= 150 km2 — splitting date range by day.
Number of date ranges = 2567
Exporting snow cover statistics to glacier_snow_cover_exports Google Drive folder with file naming convention: RGI2000-v7.0-G-01-25753_Landsat_snow_cover_stats_DATE-START_DATE-END.csv
To monitor export tasks, see your Google Cloud Console or GEE Task Manager: https://code.earthengine.google.com/tasks
Iterating over date ranges...


  0%|                                                               | 0/2567 [00:00<?, ?it/s]

checking queue length


 19%|█████████▉                                         | 499/2567 [1:09:46<10:24,  3.31it/s]

checking queue length


 39%|███████████████████▊                               | 998/2567 [2:14:20<06:43,  3.89it/s]

checking queue length


 58%|█████████████████████████████▏                    | 1497/2567 [3:07:54<05:24,  3.29it/s]

checking queue length


 78%|██████████████████████████████████████▉           | 1996/2567 [3:50:27<03:09,  3.01it/s]

checking queue length


 97%|████████████████████████████████████████████████▌ | 2495/2567 [4:32:49<00:21,  3.28it/s]

checking queue length


100%|██████████████████████████████████████████████████| 2567/2567 [5:01:29<00:00,  7.05s/it]



Glacier # 2 
---------
Glacier ID used for output file names: RGI2000-v7.0-G-01-20851
Glacier area = 521 km2
Querying GEE for DEM


KeyboardInterrupt: 
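As written, the batch loop stops at the first uncaught error, and a manual interrupt (as in the output above) ends the whole batch. The sketch below collects per-glacier failures and moves on to the next glacier. `run_all` and `run_one` are hypothetical: `run_one` stands in for the loop body (AOI lookup, DEM query, and the per-dataset `run_classification_pipeline` calls).

```python
def run_all(ids, run_one):
    """Call run_one(rgi_id) for each ID and collect failures instead of stopping.
    KeyboardInterrupt derives from BaseException, not Exception, so a manual
    interrupt still stops the loop."""
    failed = {}
    for i, rgi_id in enumerate(ids):
        try:
            run_one(rgi_id)
        except Exception as exc:
            failed[rgi_id] = repr(exc)
            print(f'Glacier #{i} ({rgi_id}) failed: {exc!r}')
    return failed

# Stand-in for the real per-glacier work
def fake_run(rgi_id):
    if rgi_id.endswith('2'):
        raise ValueError('no imagery')

failed = run_all(['G-1', 'G-2', 'G-3'], fake_run)
print(failed)  # {'G-2': "ValueError('no imagery')"}
```

The returned dictionary lists the glaciers to re-run after the batch finishes.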

In [None]:
# Optional: Check job queue statuses
tasks = ee.batch.Task.list()
running_count = len([x for x in tasks if x.state=='RUNNING'])
ready_count = len([x for x in tasks if x.state=='READY'])
completed_count = len([x for x in tasks if x.state=='COMPLETED'])
failed_count = len([x for x in tasks if x.state=='FAILED'])

print('Total number of tasks in queue =', len(tasks))
print('RUNNING tasks =', running_count)
print('READY tasks =', ready_count)
print('COMPLETED tasks =', completed_count)
print('FAILED tasks =', failed_count)
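The log above shows the pipeline periodically "checking queue length" while it submits exports. The sketch below shows one way to build that kind of throttle: it pauses submissions until the number of READY plus RUNNING tasks drops below a limit you choose. `wait_for_queue` and `max_active` are hypothetical, and this notebook does not state GEE's actual queue limits. In the notebook, `get_states` could be `lambda: [t.state for t in ee.batch.Task.list()]`, mirroring the cell above.

```python
import time
from collections import Counter

def wait_for_queue(get_states, max_active, poll_seconds=60, sleep=time.sleep):
    """Block until fewer than max_active tasks are READY or RUNNING, then return the state counts.
    get_states is any callable that returns the current list of task states."""
    while True:
        counts = Counter(get_states())
        active = counts['READY'] + counts['RUNNING']
        if active < max_active:
            return counts
        sleep(poll_seconds)

# Offline demo: three queue snapshots, and a no-op sleep so the demo returns immediately
snapshots = iter([
    ['READY'] * 5 + ['RUNNING'] * 2,
    ['READY'] * 3 + ['RUNNING'] * 2 + ['COMPLETED'] * 2,
    ['RUNNING'] * 2 + ['COMPLETED'] * 5,
])
counts = wait_for_queue(lambda: next(snapshots), max_active=4, sleep=lambda s: None)
print(dict(counts))  # {'RUNNING': 2, 'COMPLETED': 5}
```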