# Download and process GLIMS glaciers

GLIMS is a multi-temporal, worldwide glacier dataset. This notebook programmatically downloads the most recent GLIMS file from NASA Earthdata and filters it to the USA for use in Hydrofabric.

GLIMS glaciers are stored on NASA's Earthdata server, so you will need an Earthdata login to download them. The files are hosted at:

https://daacdata.apps.nsidc.org/pub/DATASETS/nsidc0272_GLIMS_v1/ 
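
As a rough illustration of what "most recent file" means here, the sketch below picks the newest `north` zip from a list of URLs by the eight-digit date in the file name. It assumes file names follow the pattern of `NSIDC-0272_glims_db_north_20230607_v01.0.zip`; the function name `latest_north_zip` and every URL in the listing except the first are hypothetical and used only for illustration.

```python
import re
from datetime import datetime


def latest_north_zip(urls):
    """Return the URL of the newest 'north' zip, based on the YYYYMMDD stamp in its name."""
    dated = []
    for url in urls:
        url = url.strip()  # listings read from a text file carry trailing newlines
        name = url.rsplit("/", 1)[-1]
        match = re.search(r"_(\d{8})_", name)
        # keep only the north zip archives (this skips the .md5 checksum files)
        if match and "north" in name and name.endswith(".zip"):
            dated.append((datetime.strptime(match.group(1), "%Y%m%d"), url))
    if not dated:
        raise ValueError("no dated 'north' zip found in the listing")
    return max(dated)[1]


base = "https://daacdata.apps.nsidc.org/pub/DATASETS/nsidc0272_GLIMS_v1/"
listing = [
    base + "NSIDC-0272_glims_db_north_20230607_v01.0.zip\n",
    base + "NSIDC-0272_glims_db_north_20230607_v01.0.zip.md5\n",  # hypothetical name
    base + "NSIDC-0272_glims_db_south_20230607_v01.0.zip\n",  # hypothetical name
    base + "NSIDC-0272_glims_db_north_20220101_v01.0.zip\n",  # hypothetical name
]
print(latest_north_zip(listing))
```

Raising an error when nothing matches makes an empty or failed listing obvious instead of failing later with an index error.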




In [None]:
import os
import re
import shlex
import subprocess
from datetime import datetime
from pathlib import Path
from zipfile import ZipFile

import boto3
import geopandas as gpd
from dateutil import parser
from dotenv import load_dotenv

In [None]:
load_dotenv(dotenv_path=Path("..") / ".env")

In [None]:
data_path = Path("../data/glims")
usa_file = data_path / "tl_2024_us_state.gpkg"  # You can download this from s3://edfs-data/boundaries
out_parquet_file = data_path / "glims_us_20250624.parquet"
out_usa_buffer = data_path / "temp_usa_buffer.gpkg"

## Download

Follow the instructions below to store your NASA Earthdata login and password in `~/.netrc`:

`echo 'machine urs.earthdata.nasa.gov login <login> password <password>' >> ~/.netrc`

https://nsidc.org/data/user-resources/help-center/programmatic-access-guide-data-daacdataapps
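
Before running the download, it can help to confirm that the `.netrc` entry is readable. The following is a minimal sketch using Python's standard `netrc` module; the helper name `earthdata_login` is hypothetical, and the demo writes placeholder credentials to a throwaway file rather than touching your real `~/.netrc`.

```python
import netrc
import tempfile
from pathlib import Path

EARTHDATA_HOST = "urs.earthdata.nasa.gov"


def earthdata_login(netrc_path):
    """Return the login stored for Earthdata in the given netrc file, or None if absent."""
    auth = netrc.netrc(str(netrc_path)).authenticators(EARTHDATA_HOST)
    # authenticators() returns a (login, account, password) tuple, or None for unknown hosts
    return auth[0] if auth else None


# Demonstrate against a temporary file with placeholder credentials
with tempfile.TemporaryDirectory() as tmp:
    demo = Path(tmp) / "netrc"
    demo.write_text(f"machine {EARTHDATA_HOST} login demo_user password demo_pass\n")
    found = earthdata_login(demo)

print(found)
```

To check your own setup, call `earthdata_login(Path.home() / ".netrc")`; a return value of `None` means no entry for `urs.earthdata.nasa.gov` was found.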

Sample wget:
```
wget --load-cookies ~/.urs_cookies --save-cookies ~/.urs_cookies --keep-session-cookies --no-check-certificate --auth-no-challenge=on -r --reject "index.html*" -np -e robots=off https://daacdata.apps.nsidc.org/pub/DATASETS/nsidc0272_GLIMS_v1/NSIDC-0272_glims_db_north_20230607_v01.0.zip
```
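
When the same command is launched from Python without a shell, `~` is not expanded by the shell, so the cookie path is best built explicitly. The sketch below assembles the argument list for `subprocess.run`; the helper name `build_wget_command` is hypothetical, and the flags are copied from the sample command above.

```python
import shlex
from pathlib import Path


def build_wget_command(url, cookies=None):
    """Build the wget argument list with an explicit cookie file path."""
    # default to the same cookie file the sample command uses, with the home directory spelled out
    cookies = str(cookies or Path.home() / ".urs_cookies")
    return [
        "wget",
        "--load-cookies", cookies,
        "--save-cookies", cookies,
        "--keep-session-cookies",
        "--no-check-certificate",
        "--auth-no-challenge=on",
        "-r",
        "--reject", "index.html*",
        "-np",
        "-e", "robots=off",
        url,
    ]


url = "https://daacdata.apps.nsidc.org/pub/DATASETS/nsidc0272_GLIMS_v1/NSIDC-0272_glims_db_north_20230607_v01.0.zip"
cmd = build_wget_command(url)
print(shlex.join(cmd))  # shell-quoted form, handy for logging or copy-paste
```

Passing `cmd` as a list to `subprocess.run(cmd)` avoids quoting problems if the home directory contains spaces.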

In [None]:
# Use wget's spider mode to get a list of available files on the server.
# If you're having trouble, first download an arbitrary file with the sample wget above to load cookies.
!wget --spider -r --no-parent --load-cookies ~/.urs_cookies --save-cookies ~/.urs_cookies --keep-session-cookies --no-check-certificate --auth-no-challenge=on https://daacdata.apps.nsidc.org/pub/DATASETS/nsidc0272_GLIMS_v1/ 2>&1 | grep -o 'https://[^ ]*' > output.txt

In [None]:
# open list of files retrieved
with open("output.txt") as f:
    file_list = f.readlines()

# get the most recent date
date_list = []
dt = datetime(1900, 1, 1)  # starting date - old

for _i, f in enumerate(file_list):
    # find date
    dt_txt = re.findall(r"\d{4}\d{2}\d{2}", f)
    if dt_txt:
        this_dt = parser.parse(dt_txt[0])
        date_list.append([this_dt, f])
        # if date is greater, save it as greatest
        if this_dt > dt:
            dt = this_dt

# get files that have the latest date where date_list is [[date, file], [date, file]]
# we want the north .zip (no .md5)
to_download = []
for i, f in enumerate(date_list):
    if dt == f[0]:
        if ("north" in f[1]) and (".md5" not in f[1]):
            to_download.append(date_list[i][1])

final_download = to_download[0].strip("\n")  # newline was incorporated in txt
print(final_download)

In [None]:
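# run the download
# no shell is involved here, so "~" is not expanded for us; build the cookie path explicitly
cookies = shlex.quote(str(Path.home() / ".urs_cookies"))
subprocess.run(
    shlex.split(
        f"wget --load-cookies {cookies} --save-cookies {cookies} --keep-session-cookies --no-check-certificate --auth-no-challenge=on -r --reject 'index.html*' -np -e robots=off -nd {final_download}"
    )
)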

In [None]:
# unzip the download - it downloads to the current directory
with ZipFile(final_download.split("/")[-1]) as z:
    z.extractall(data_path)

In [None]:
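# the extracted directory name includes a download number (ex. glims_download_35992), so search for it
# (data_path may also hold other files, such as the USA boundary file, so match on the name)
dirs = sorted(d for d in data_path.glob("glims_download_*") if d.is_dir())
glims_poly = dirs[0] / "glims_polygons.shp"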

## Process GLIMS

GLIMS polygons contain invalid geometries that need to be repaired before further processing.

To reduce file size and speed up later operations with Hydrofabric, we keep only the glaciers that fall within a buffered outline of the USA.

To build the buffer, download the TIGER/Line state file from the test account S3 bucket:

`s3://edfs-data/boundaries/tl_2024_us_state.gpkg`

Store it in the working data folder for local use.
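
To make the geometry repair step concrete, here is a small sketch on a toy polygon rather than real GLIMS data. It uses `shapely` (a dependency of `geopandas`) and a hand-made self-intersecting "bowtie" ring as a stand-in for an invalid glacier outline; the coordinates are invented for illustration.

```python
from shapely.geometry import Polygon
from shapely.validation import make_valid

# A ring whose edges cross each other, which makes the polygon invalid
bowtie = Polygon([(0, 0), (1, 1), (1, 0), (0, 1)])

# make_valid splits the crossing ring into valid pieces without discarding area
repaired = make_valid(bowtie)

print(bowtie.is_valid, repaired.is_valid, repaired.geom_type)
```

Note that the repaired geometry can have a different type than the input (for example, a single polygon may become a multi-part geometry), which is worth keeping in mind for later overlay operations.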

In [None]:
gdf = gpd.read_file(glims_poly)

In [None]:
# repair geometry
gdf["geometry"] = gdf["geometry"].make_valid()

# extract existing glaciers
gdf_exists = gdf.loc[gdf["glac_stat"] == "exists", :].copy()
gdf_exists.to_file(data_path / "temp_glims_exists_geom_repair.gpkg")

In [None]:
# read US states, exclude unneeded territories, dissolve to one polygon,
# then buffer by 200 km (EPSG:5070 units are meters) to include Canadian watersheds
gdf_usa = gpd.read_file(usa_file)
gdf_usa = gdf_usa.loc[~gdf_usa["STUSPS"].isin(["MP", "AS", "GU"]), :]
gdf_usa_dissolve = gdf_usa.dissolve()
gdf_usa_dissolve = gdf_usa_dissolve.to_crs(5070)
gdf_usa_dissolve = gdf_usa_dissolve.buffer(200000)
gdf_usa_dissolve = gpd.GeoDataFrame(geometry=gdf_usa_dissolve, data={"temp": ["buffer"]})
gdf_usa_dissolve.to_file(out_usa_buffer)

In [None]:
# intersect glaciers with the buffered USA polygon
# this can take a few minutes
gdf_usa_dissolve = gdf_usa_dissolve.to_crs(gdf_exists.crs)
gdf_int = gdf_exists.overlay(gdf_usa_dissolve, how="intersection")
gdf_int.to_parquet(out_parquet_file)

### Upload to s3

In [None]:
# upload the parquet and zip file to s3 for storage
s3_client = boto3.client(
    "s3",
    aws_access_key_id=os.environ["AWS_ACCESS_KEY_ID"],
    aws_secret_access_key=os.environ["AWS_SECRET_ACCESS_KEY"],
    aws_session_token=os.environ["AWS_SESSION_TOKEN"],
)
# pass the path as a string; some boto3 versions reject Path objects
s3_client.upload_file(str(out_parquet_file), "edfs-data", "glaciers/glims_20250624.parquet")

zip_file = final_download.split("/")[-1]
s3_client.upload_file(zip_file, "edfs-data", f"glaciers/{zip_file}")