![image](https://raw.githubusercontent.com/GSFC-618/618-tutorials/refs/heads/main/images/618_tutorials_banner_logos.png "618 banner")

# TO DO
- [ ] description of the data (resolutions, sampling, etc.)
- [ ] Download GLiHT HSI and lidar
- [ ] visualize both datasets
- [ ] extract footprint info
- [ ] calculate metrics (scene/band stats, NDVI)
- [ ] analysis
    - [ ] descriptions (what can be calculated, dimension reduction, classification, veg health/surface indices)
    - [ ] visualize pixel-wise relationships (e.g., canopy height ~ NDVI)
    - [ ] 
- [ ] rm data


Optionals:
- [ ] add demo POIs (25m circles)
    - [ ] extract metrics and relate to
- [ ] add Landsat or Sentinel data to relate to medium-res EO data


# WIP - GLiHT sandbox 
---
**Summary**
This tutorial guides you through working with lidar and hyperspectral data from the GLiHT (Goddard's LiDAR, Hyperspectral, and Thermal) airborne imaging dataset, including common geospatial considerations, how to co-locate the dataset with Earth observation imagery, and how to analyze the datasets.


**Learning Objectives**
1. access and download GLiHT lidar and hyperspectral data,
2. understand the dataset features and structure,
3. explore the datasets through analyses.

**Requirements**
- TBD

#### Contact Info

**Author:** Colin Quinn (Github: [CQuinn8](https://github.com/CQuinn8))  
**Last Update:** 2025-11-10 (Created: 2025-11-10)  
**Website:** <https://github.com/GSFC-618/618-tutorials>


## Environment Setup 
**Environment:** `conda env create -f global-airborne-core-20251107.yaml`

In [1]:
# general env libs
import os
import sys
from io import BytesIO
import tarfile
import gzip
import shutil

# libs that aid downloading https data
import requests
from pathlib import Path
from tqdm import tqdm

# geospatial libraries
import rasterio
import rasterio.plot

In [2]:
# Provide any custom user arguments such as whether to save outputs (csvs, plots, processed datasets)
save_outputs=True
output_directory="~"

## Authentication 
None needed. As of writing this notebook, datasets are downloaded from the open-access [GLiHT archive](https://glihtdata.gsfc.nasa.gov).

## Import data


### Derived lidar products
**Canopy height model (CHM):** Lidar-derived maximum canopy height (m AGL) and canopy rugosity (i.e., standard deviation of heights within an area equivalent to a 1/24 ac USFS-FIA subplot). Available as Google Earth overlay (KML) and raster data product (GeoTIFF) at a nominal 1 m spatial resolution.
- https://glihtdata.gsfc.nasa.gov/files/G-LiHT/Rocky_Point_Jun2016/lidar/geotiff/Rocky_Point_Jun2016_CHM.tif.gz
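To make the two CHM-derived quantities concrete, the sketch below computes a maximum height and a rugosity value for one hypothetical window of 1 m height samples. The heights are made-up numbers, and using the population standard deviation is an assumption; the GLiHT processing may use a different estimator.

```python
import statistics

# Hypothetical canopy heights (m AGL) for the 1 m cells inside one window
heights_m = [12.0, 14.0, 16.0, 18.0, 20.0]

max_height_m = max(heights_m)
# Rugosity as the standard deviation of heights (population form assumed)
rugosity_m = statistics.pstdev(heights_m)

print(f"max height: {max_height_m:.1f} m, rugosity: {rugosity_m:.2f} m")
```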

**Digital Terrain Model (DTM):** Lidar-derived bare earth elevation (m, EGM96 geoid), aspect and slope. Available as Google Earth overlay (KML) and raster data product (GeoTIFF) at a nominal 1 m spatial resolution.
- https://glihtdata.gsfc.nasa.gov/files/G-LiHT/Rocky_Point_Jun2016/lidar/geotiff/Rocky_Point_Jun2016_DTM.tif.gz

**[Metrics](https://glihtdata.gsfc.nasa.gov/metrics_readme.pdf):** Common lidar height and density metrics and return statistics (e.g., mean pulse density, returns per pulse). Available as raster data product (GeoTIFF) at a nominal 13 m spatial resolution (area equivalent to a 1/24 ac USFS-FIA subplot). A detailed list of available metrics and related publications is available in the Metrics Readme.
- https://glihtdata.gsfc.nasa.gov/files/G-LiHT/Rocky_Point_Jun2016/lidar/geotiff/metrics/ 
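The nominal 13 m cell size can be checked against the stated subplot area. As a quick sanity check, assuming the international acre of 4046.8564224 m², the side of a square with an area of 1/24 ac works out as follows:

```python
import math

ACRE_M2 = 4046.8564224  # international acre in square metres

subplot_area_m2 = ACRE_M2 / 24            # area of a 1/24 ac subplot
cell_side_m = math.sqrt(subplot_area_m2)  # side of a square of equal area

print(f"area: {subplot_area_m2:.1f} m^2, square side: {cell_side_m:.2f} m")
```

The result is just under 13 m, which is consistent with the nominal resolution quoted for the metrics rasters.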

### Hyperspectral imagery (HSI; mosaicked reflectances)
VNIR Image spectrometer data (420 to 920 nm, 4.5 nm sampling interval) and data products are available as orthorectified raster files (ENVI) at a nominal 1 m spatial resolution. At-sensor reflectance data is computed as the ratio between observed upwelling radiance and downwelling hemispheric irradiance, and corrected for differences in cross-track illumination and BRDF using an empirically derived multiplier. At a nominal flying height of 335 m AGL, the at-sensor reflectance is a close approximation of surface reflectance. Available for individual swaths, and mosaicked for mapped areas using observations closest to nadir.
- https://glihtdata.gsfc.nasa.gov/files/G-LiHT/Rocky_Point_Jun2016/hyperspec/Rocky_Point_Jun2016_mosaicked_refl_VIs.tar.gz
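The spectral range and sampling interval above are enough to sketch how band indices could be looked up for an index such as NDVI. This is only an illustration: it assumes band centres evenly spaced from 420 nm in 4.5 nm steps (the true centres should be read from the ENVI header), the 660 nm and 860 nm targets are illustrative choices rather than values from the GLiHT documentation, and the reflectances are made up.

```python
# Assumed band centres: evenly spaced from 420 nm in 4.5 nm steps up to 920 nm.
# The true centres should be read from the ENVI header of the mosaic.
START_NM, STOP_NM, STEP_NM = 420.0, 920.0, 4.5
n_bands = int((STOP_NM - START_NM) // STEP_NM) + 1
band_centers_nm = [START_NM + i * STEP_NM for i in range(n_bands)]


def nearest_band(target_nm):
    """Return the 0-based index of the band centre closest to target_nm."""
    return min(range(n_bands), key=lambda i: abs(band_centers_nm[i] - target_nm))


def ndvi(nir, red):
    """Normalized difference vegetation index from two reflectance values."""
    return (nir - red) / (nir + red)


red_idx = nearest_band(660.0)  # illustrative red wavelength
nir_idx = nearest_band(860.0)  # illustrative NIR wavelength
print(red_idx, band_centers_nm[red_idx], nir_idx, band_centers_nm[nir_idx])

# Made-up reflectances for a vegetated pixel
print(round(ndvi(nir=0.45, red=0.05), 2))
```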

### Metadata
- https://glihtdata.gsfc.nasa.gov/files/G-LiHT/Rocky_Point_Jun2016/metadata/Rocky_Point_Jun2016_metadata.pdf



### LAS files (not used here)
Individual lidar return data, including 3D coordinates; classified ground returns ("Classification" field); AGL heights ("Point Source ID Text" field, using z scale factor and offsets); and lidar apparent reflectance ("Intensity" field; see accompanying metadata for 2 byte dB range). Available in ASPRS LAS 1.1 format.
- https://glihtdata.gsfc.nasa.gov/files/G-LiHT/Rocky_Point_Jun2016/lidar/las/Rocky_Point_Jun2016_c0r0.las.gz 
- https://glihtdata.gsfc.nasa.gov/files/G-LiHT/Rocky_Point_Jun2016/lidar/las/Rocky_Point_Jun2016_c1r0.las.gz 
- https://glihtdata.gsfc.nasa.gov/files/G-LiHT/Rocky_Point_Jun2016/lidar/las/Rocky_Point_Jun2016_c0r1.las.gz 
- https://glihtdata.gsfc.nasa.gov/files/G-LiHT/Rocky_Point_Jun2016/lidar/las/Rocky_Point_Jun2016_c1r1.las.gz
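The scale factor and offset mentioned above follow the general LAS convention, in which stored integer values are converted to real-world values as `value * scale + offset`. A minimal sketch with made-up header values and records (not taken from the Rocky Point files):

```python
def las_to_real(raw_value, scale, offset):
    """Apply a LAS header scale factor and offset to a stored integer."""
    return raw_value * scale + offset


# Hypothetical header values and raw integer records, for illustration only
z_scale, z_offset = 0.01, 0.0
raw_z = [1523, 2010, 875]

z_m = [las_to_real(v, z_scale, z_offset) for v in raw_z]
print([round(z, 2) for z in z_m])
```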



In [79]:
# Create a directory to store GLiHT data
download_dir = Path("./data/")
download_dir.mkdir(exist_ok = True, parents=True)

In [89]:
# URLs for the Rocky Point CHM, DTM, HSI mosaic, and metadata PDF, plus the Brookhaven HSI mosaic
urls = [
    'https://glihtdata.gsfc.nasa.gov/files/G-LiHT/Rocky_Point_Jun2016/lidar/geotiff/Rocky_Point_Jun2016_CHM.tif.gz',
    'https://glihtdata.gsfc.nasa.gov/files/G-LiHT/Rocky_Point_Jun2016/lidar/geotiff/Rocky_Point_Jun2016_DTM.tif.gz',
    'https://glihtdata.gsfc.nasa.gov/files/G-LiHT/Rocky_Point_Jun2016/hyperspec/Rocky_Point_Jun2016_mosaicked_refl_VIs.tar.gz',
    'https://glihtdata.gsfc.nasa.gov/files/G-LiHT/Rocky_Point_Jun2016/metadata/Rocky_Point_Jun2016_metadata.pdf',
    'https://glihtdata.gsfc.nasa.gov/files/G-LiHT/Brookhaven_Jun2016/hyperspec/Brookhaven_Jun2016_mosaicked_refl_VIs.tar.gz'
]

In [None]:
for url in urls:
    out_file = download_dir / Path(url).name
    if out_file.exists():
        print(f"File already downloaded/present: {out_file}")
        print("---" * 20)
        continue

    # write to a temporary ".part" file so an interrupted download
    # is not mistaken for a complete file on the next run
    tmp_file = out_file.with_name(out_file.name + ".part")
    try:
        with requests.get(url, stream=True, timeout=60) as r:
            r.raise_for_status()
            total = int(r.headers.get("content-length", 0))
            with open(tmp_file, "wb") as f_out, tqdm(
                total=total,
                unit="B",
                unit_divisor=1024,
                unit_scale=True,
                desc=out_file.name,
            ) as bar:
                # working on EFS: 10 MB chunks reduce the number of I/O operations
                for chunk in r.iter_content(chunk_size=10 * 1024 * 1024):
                    f_out.write(chunk)
                    bar.update(len(chunk))
        tmp_file.rename(out_file)
        print("Download complete")
    except requests.exceptions.RequestException as e:
        print(f"FAILED download: {url} : {e}")
    print("---" * 20)

File already downloaded/present: data/Rocky_Point_Jun2016_CHM.tif.gz
------------------------------------------------------------
File already downloaded/present: data/Rocky_Point_Jun2016_DTM.tif.gz
------------------------------------------------------------
File already downloaded/present: data/Rocky_Point_Jun2016_mosaicked_refl_VIs.tar.gz
------------------------------------------------------------
File already downloaded/present: data/Rocky_Point_Jun2016_metadata.pdf
------------------------------------------------------------


Brookhaven_Jun2016_mosaicked_refl_VIs.tar.gz:  45%|████▍     | 1.58G/3.53G [04:13<08:35, 4.06MB/s]

In [88]:
gliht_archive_files = list(download_dir.glob("*.gz"))

for archive_f in gliht_archive_files:
    print(archive_f)
    if tarfile.is_tarfile(archive_f):
        # .tar.gz archive (HSI mosaic): extract all members
        with tarfile.open(archive_f, 'r:gz') as tar:
            tar.extractall(path=download_dir)
    elif archive_f.suffix == ".gz":
        # single gzip-compressed file (e.g., *.tif.gz): drop the ".gz" suffix
        output_file = archive_f.with_suffix("")
        with gzip.open(archive_f, 'rb') as f_in:
            with open(output_file, 'wb') as f_out:
                shutil.copyfileobj(f_in, f_out)


data/Rocky_Point_Jun2016_mosaicked_refl_VIs.tar.gz


  tar.extractall(path=download_dir)


data/Rocky_Point_Jun2016_DTM.tif.gz
data/Rocky_Point_Jun2016_CHM.tif.gz


In [None]:
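# CHM and DTM are GeoTIFFs extracted into download_dir
gliht_files = list(download_dir.glob("*.tif"))

chm_file = [f for f in gliht_files if "chm" in f.name.lower()]
dtm_file = [f for f in gliht_files if "dtm" in f.name.lower()]
# NOTE: the HSI mosaic is distributed as ENVI rasters, so this list stays
# empty until the file pattern is extended beyond "*.tif"
hsi_file = [f for f in gliht_files if "vis" in f.name.lower()]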

In [None]:
# chm_file is a list of matching paths; open the first match
chm = rasterio.open(chm_file[0])
print(chm.shape)

In [None]:
chm.meta

In [None]:
rasterio.plot.show(chm)

In [None]:
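# Load data
import xarray as xr  # not imported in the setup cell; assumes xarray is in the environment


def load_nc_data(source: str):
    """Load dataset from netcdf file."""
    nc = xr.open_dataset(source)
    return nc

# Example usage (placeholder path; replace with a real NetCDF file)
data = load_nc_data("data/input.nc")

# Visualize data structure
data.info()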

## Analyses
Organize your notebook using headers to delineate meaningful sections.

In [None]:
# extending the data to insights on the environment, modeling, etc

## Save or Export results

In [None]:
# save any outputs you may generate, close all datasets if not already