<img width=50 src="https://carbonplan-assets.s3.amazonaws.com/monogram/dark-small.png" style="margin-left:0px;margin-top:20px"/>

# Sample notebook to explore lidar dataset, lidar processing code, and lidar derived biomass

Authors: Cindy Chiao, Oriana Chegwidden and Joe Hamman

To run this notebook locally and produce the loveliest-possible figures, you'll want to have the
CarbonPlan styles installed.


In [None]:
import fsspec 
import xarray as xr
import pandas as pd
from carbonplan_trace.v0.data import cat, cat_file

import carbonplan_trace.v1.utils as utils
from carbonplan_trace.v1.glas_preprocess import preprocess
from carbonplan_trace.v1.glas_height_metrics import plot_shot
from carbonplan_trace.v1.glas_allometric_eq import apply_allometric_equation

from carbonplan_styles.colors import colors
from carbonplan_styles.mpl import set_theme
import cartopy.crs as ccrs
import matplotlib.pyplot as plt

theme = "dark"
set_theme(style=f"carbonplan_{theme}")
c = colors(theme)

# Datasets available and associated code 

Three ICESat GLAS-derived LiDAR datasets are available on our server at varying stages of processing. The raw data was available for download from the National Snow and Ice Data Center (NSIDC) in HDF5 format, including the Level-1A altimetry data (GLAH01), version 33, and the Level-2 Global Land Surface Altimetry Data (GLAH14), version 34.

## 1. Extracted LiDAR data

After download, the relevant data in HDF5 format were extracted to Zarr format using the [extract_GLAH01_data](https://github.com/carbonplan/trace/blob/3797376ef85bdc492b40811d71d5e9ec7ed75fbc/carbonplan_trace/v1/glas_extract.py#L42) and [extract_GLAH14_data](https://github.com/carbonplan/trace/blob/3797376ef85bdc492b40811d71d5e9ec7ed75fbc/carbonplan_trace/v1/glas_extract.py#L121) functions. The extracted data is available to the public.

In [None]:
# note, this step reads all LiDAR available and takes a few mins 
data01 = utils.open_glah01_data()
data14 = utils.open_glah14_data()

In [None]:
# subset data to a bounding box of interest 
min_lat = 40 
max_lat = 42
min_lon = -124
max_lon = -122

sub14 = utils.subset_data_for_bounding_box(data14, min_lat, max_lat, min_lon, max_lon)
sub01 = data01.where(data01.record_index.isin(sub14.record_index), drop=True)
combined = sub14.merge(sub01, join="inner")

The LiDAR data is uniquely indexed by `record_index` and `shot_number`. The available variables extracted are shown below. 

In [None]:
combined
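
The subset-and-merge logic and the two-level index can be illustrated without the real datasets. The following is a plain-Python sketch, not the xarray implementation: the toy values, the `in_box` helper, and the inclusive bounds are all assumptions made for illustration.

```python
# Toy stand-ins for the two products; all values are made up for illustration.
# GLAH14-like shots carry location and are keyed by (record_index, shot_number).
shots = {
    (1, 1): {"lat": 41.0, "lon": -123.0},
    (1, 2): {"lat": 41.1, "lon": -123.1},
    (2, 1): {"lat": 35.0, "lon": -100.0},  # outside the bounding box
}
# GLAH01-like waveforms, keyed the same way.
waveforms = {
    (1, 1): [0.0, 0.2, 0.9],
    (2, 1): [0.1, 0.3, 0.4],
}


def in_box(shot, min_lat, max_lat, min_lon, max_lon):
    """Hypothetical helper: True if the shot lies inside the box (bounds inclusive)."""
    return min_lat <= shot["lat"] <= max_lat and min_lon <= shot["lon"] <= max_lon


# Keep only the shots inside the bounding box of interest.
sub_shots = {k: v for k, v in shots.items() if in_box(v, 40, 42, -124, -122)}

# Inner join: keep keys present in both products and merge their fields.
combined = {
    k: {**sub_shots[k], "rec_wf": waveforms[k]}
    for k in sub_shots.keys() & waveforms.keys()
}
```

Only the shot `(1, 1)` survives here: `(1, 2)` has no waveform and `(2, 1)` falls outside the box, which mirrors what an inner join on the shared index does.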

## 2. Pre-processed LiDAR data 

After extraction, the LiDAR data is preprocessed to calculate several derived variables, generate smoothed waveforms, and filter out the records that do not meet our QA criteria. Preprocessing can be done by calling the [preprocess](https://github.com/carbonplan/trace/blob/3797376ef85bdc492b40811d71d5e9ec7ed75fbc/carbonplan_trace/v1/glas_preprocess.py#L230) function on the combined data. The preprocessed data is also available to the public. 

In [None]:
# doing preprocess on the combined data, this takes a few mins to run 

# preprocessed = preprocess(combined, min_lat, max_lat, min_lon, max_lon)
# preprocessed
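
To give a sense of what waveform smoothing involves, here is a minimal sketch using a Gaussian kernel in plain Python. The kernel type, the `sigma` value, the edge handling, and the function names are assumptions for illustration; see the linked `preprocess` function for what the library actually does.

```python
import math


def gaussian_kernel(sigma, radius):
    """Return normalized Gaussian weights for offsets -radius..radius."""
    weights = [math.exp(-0.5 * (i / sigma) ** 2) for i in range(-radius, radius + 1)]
    total = sum(weights)
    return [w / total for w in weights]


def smooth(waveform, sigma=2.0):
    """Smooth a 1-D waveform with a Gaussian kernel, repeating edge values as padding."""
    radius = int(3 * sigma)
    kernel = gaussian_kernel(sigma, radius)
    n = len(waveform)
    smoothed = []
    for i in range(n):
        acc = 0.0
        for k, w in enumerate(kernel):
            j = min(max(i + k - radius, 0), n - 1)  # clamp index at the edges
            acc += w * waveform[j]
        smoothed.append(acc)
    return smoothed


# A single spike in an otherwise flat toy waveform is spread over neighboring bins.
raw = [0.0] * 20
raw[10] = 1.0
smoothed = smooth(raw)
```

Smoothing lowers the peak and spreads its energy across adjacent bins while leaving the total unchanged for a spike far from the edges.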

In [None]:
# reading preprocessed data from s3 

lat_tag = '50N'
lon_tag = '130W'

mapper = fsspec.get_mapper(f"s3://carbonplan-climatetrace/v1/preprocessed_lidar/{lat_tag}_{lon_tag}.zarr")
preprocessed = (
    xr.open_zarr(mapper)
    .stack(unique_index=("record_index", "shot_number"))
    .dropna(dim="unique_index", subset=["lat"])
)
# drop shots whose waveform contains null values, which are stored as the maximum value of the datatype
preprocessed = preprocessed.where(
    (preprocessed.rec_wf < 1e35).all(dim="rec_bin"), drop=True
)
preprocessed
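
The `1e35` cutoff works because a fill value equal to the datatype maximum is far larger than any physical waveform value. The sketch below shows the same idea in plain Python, assuming the waveforms are stored as 32-bit floats; the toy waveforms and the `is_valid` helper are hypothetical.

```python
import struct

# Largest finite float32, decoded from its IEEE 754 bit pattern (0x7F7FFFFF).
FLOAT32_MAX = struct.unpack(">f", bytes.fromhex("7f7fffff"))[0]
THRESHOLD = 1e35  # same cutoff as in the cell above

# Hypothetical waveforms keyed by (record_index, shot_number).
waveforms = {
    (1, 1): [0.01, 0.40, 0.85, 0.20],
    (1, 2): [0.02, FLOAT32_MAX, 0.30, 0.10],  # one bin holds the fill value
}


def is_valid(waveform, threshold=THRESHOLD):
    """Hypothetical helper: True only if every bin is below the threshold."""
    return all(value < threshold for value in waveform)


# Keep a shot only when all of its bins pass the check.
valid = {key: wf for key, wf in waveforms.items() if is_valid(wf)}
```

A shot is dropped as soon as any single bin holds the fill value, which matches the `.all(dim="rec_bin")` condition used above.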

In [None]:
# plotting an example lidar waveform 
plot_shot(preprocessed.isel(unique_index=0))

## 3. LiDAR derived biomass 

Once the LiDAR data is preprocessed, we can apply the appropriate allometric equation to the LiDAR data to obtain estimated biomass values. This can be accomplished by calling the [apply_allometric_equation](https://github.com/carbonplan/trace/blob/3797376ef85bdc492b40811d71d5e9ec7ed75fbc/carbonplan_trace/v1/glas_allometric_eq.py#L944) function, which includes a few steps: 
1. add ancillary data, which includes the ecoregion, land cover type, tree cover percentage, and whether the area has burned,
2. assign an appropriate allometric equation to each LiDAR record based on the ancillary data above,
3. calculate the LiDAR-derived height metrics required by the assigned allometric equation,
4. calculate the biomass value by applying the allometric equation,
5. post-process the biomass values, where biomass for certain records is set to 0 or filtered out based on the time of year. 

The derived biomass values, ancillary data, and a few height metrics are also available to the public. 
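
The five steps above can be sketched as a small pipeline. This is only an outline of the control flow: every function name, rule, and coefficient below is a hypothetical placeholder, not taken from `apply_allometric_equation` or from any published allometric equation.

```python
# Every name, rule, and coefficient below is a hypothetical placeholder.
TOY_EQUATIONS = {
    "toy_linear": lambda height: 10.0 * height,
    "toy_power": lambda height: 2.0 * height ** 1.5,
}


def add_ancillary(record, lookup):
    """Step 1: attach ancillary attributes (land cover, burned flag, ...)."""
    return {**record, **lookup[record["key"]]}


def assign_equation(record):
    """Step 2: pick an equation from the ancillary data (made-up rule)."""
    return "toy_power" if record["land_cover"] == "forest" else "toy_linear"


def height_metric(record, equation_name):
    """Step 3: return the height metric the equation needs (one toy metric here)."""
    return record["canopy_height_m"]


def post_process(record, biomass):
    """Step 5: zero out or filter records (made-up rules)."""
    if record["burned"]:
        return 0.0
    if record["month"] in {12, 1, 2}:
        return None  # filtered out based on time of year
    return biomass


def estimate_biomass(record, lookup):
    record = add_ancillary(record, lookup)
    name = assign_equation(record)
    height = height_metric(record, name)
    biomass = TOY_EQUATIONS[name](height)  # Step 4: apply the equation
    return post_process(record, biomass)


lookup = {(1, 1): {"land_cover": "forest", "burned": False}}
shot = {"key": (1, 1), "canopy_height_m": 4.0, "month": 7}
result = estimate_biomass(shot, lookup)
```

Keeping each step in its own function makes it easy to swap in a different equation assignment or post-processing rule without touching the rest of the pipeline.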

In [None]:
# applying allometric equations on preprocessed lidar data, this also takes a few mins to run 

sub = preprocessed.isel(unique_index=slice(0, 100))

with_biomass = apply_allometric_equation(
    sub, min_lat, max_lat, min_lon, max_lon
)

In [None]:
# reading biomass data from s3 
mapper = fsspec.get_mapper(f"s3://carbonplan-climatetrace/v1/biomass/{lat_tag}_{lon_tag}.zarr")
biomass = (
    xr.open_zarr(mapper)
    .stack(unique_index=("record_index", "shot_number"))
    .dropna(dim="unique_index", subset=["lat"])
)
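
The `lat_tag` and `lon_tag` values select which tile to read. The sketch below shows one way to derive them from a point, assuming 10-degree tiles named after their north-west corner. That convention is inferred from the `50N`/`130W` tags and the bounding box used in this notebook, not confirmed by the source, and `tile_tags` is a hypothetical helper that ignores any zero-padding the real file names may use.

```python
import math


def tile_tags(lat, lon, tile_size=10):
    """Hypothetical helper: tags for the tile containing (lat, lon).

    Assumes tiles are `tile_size` degrees wide and named by their
    north-west (upper-left) corner.
    """
    top = (math.floor(lat / tile_size) + 1) * tile_size  # northern edge of the tile
    left = math.floor(lon / tile_size) * tile_size  # western edge of the tile
    lat_tag = f"{abs(top)}{'N' if top >= 0 else 'S'}"
    lon_tag = f"{abs(left)}{'E' if left >= 0 else 'W'}"
    return lat_tag, lon_tag


# Center of the bounding box used earlier in this notebook.
lat_tag_example, lon_tag_example = tile_tags(41.0, -123.0)
```

Under these assumptions, the center of the bounding box used earlier maps to the same `50N` and `130W` tags that appear in the S3 paths.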

In [None]:
# visualize biomass data 
plt.figure(figsize=(10, 6))
s = biomass.where(biomass.biomass > 0.0, drop=True)
p = s.plot.scatter(
    x="lon", y="lat", hue="biomass", 
    robust=True,
)