# GEDI Data Access 

Authors: Harshini Girish (UAH), Sheyenne Kirkland (UAH), Alex Mandel (Development Seed), Henry Rodman (Development Seed), Zac Deziel (Development Seed)

Date: April 15, 2025

Description: In this notebook, users will learn how to search for GEDI data using `maap-py`, download it, and then open it using `rhdf5`.

## Run This Notebook

To access and run this tutorial within MAAP's Algorithm Development Environment (ADE), please refer to the ["Getting started with the MAAP"](https://docs.maap-project.org/en/latest/getting_started/getting_started.html) section of our documentation.

Disclaimer: it is highly recommended to run this tutorial within MAAP's ADE, which already includes packages specific to MAAP, such as `maap-py`. Running the tutorial outside of the MAAP ADE may lead to errors. Users should work within an "R/Python" workspace.

## Additional Resources
- [rhdf5](https://www.bioconductor.org/packages/release/bioc/html/rhdf5.html)
  - The `rhdf5` package page, with installation instructions, documentation, and more.
 
- [NASA's Operational CMR (MAAP Docs)](https://docs.maap-project.org/en/latest/technical_tutorials/search/catalog.html#nasa-s-operational-cmr) 
  - A section in the MAAP Docs offering an overview of resources to search and access NASA's CMR.

- [GEDI02_A v2 Dataset Landing Page](https://lpdaac.usgs.gov/products/gedi02_av002/)
  - Learn more about NASA's GEDI L2A dataset. The search and download steps in this notebook use the GEDI L4A collection (`GEDI_L4A_AGB_Density_V2_1_2056`).


## Install and Load Required Libraries
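Let’s load the packages necessary for this tutorial. The `paws` package, used later for S3 access, is called with the `paws::` prefix, so it needs to be installed but does not need to be loaded here.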

In [64]:
library("rhdf5") # to read HDF5 files 
library("reticulate") # to call the Python-based maap-py library from R

Let's also invoke the `MAAP` constructor. This will allow us to use the Python-based `maap-py` library from R, which will be used to get credentials and conduct a NASA CMR search.

In [65]:
maap_py <- import("maap.maap")
maap <- maap_py$MAAP()

## Collection and Granule Search

Using `maap-py`, we can conduct a collection and granule search for data within NASA's CMR. For this example, we'll use data available within the GEDI L4A collection. For more information on CMR searching in R, see ["Searching for Data in NASA's CMR in R"](https://docs.maap-project.org/en/develop/technical_tutorials/working_with_r/cmr_search_in_r.html).

In [66]:
# search for a GEDI collection
gedi_collections <- maap$searchCollection(
    short_name = "GEDI_L4A_AGB_Density_V2_1_2056",
    version = "2.1",
    cmr_host = "cmr.earthdata.nasa.gov",
    cloud_hosted = "true"
)

# get collection ID for granule search
collection_concept_id <- gedi_collections[[1]][["concept-id"]]
cat("Collection Concept ID:", collection_concept_id, "\n")

# search for the first 10 granules in the collection
gedi_granules <- maap$searchGranule(
    collection_concept_id = collection_concept_id,
    limit = as.integer(10),
    cmr_host = "cmr.earthdata.nasa.gov"
)

granule_names <- sapply(gedi_granules, function(names) names[["Granule"]][["GranuleUR"]])
cat("Granules:\n")
print(granule_names)

Collection Concept ID: C2237824918-ORNL_CLOUD 
Granules:
 [1] "GEDI_L4A_AGB_Density_V2_1.GEDI04_A_2019107224731_O01958_01_T02638_02_002_02_V002.h5"
 [2] "GEDI_L4A_AGB_Density_V2_1.GEDI04_A_2019107224731_O01958_02_T02638_02_002_02_V002.h5"
 [3] "GEDI_L4A_AGB_Density_V2_1.GEDI04_A_2019107224731_O01958_03_T02638_02_002_02_V002.h5"
 [4] "GEDI_L4A_AGB_Density_V2_1.GEDI04_A_2019107224731_O01958_04_T02638_02_002_02_V002.h5"
 [5] "GEDI_L4A_AGB_Density_V2_1.GEDI04_A_2019108002012_O01959_01_T03909_02_002_02_V002.h5"
 [6] "GEDI_L4A_AGB_Density_V2_1.GEDI04_A_2019108002012_O01959_02_T03909_02_002_02_V002.h5"
 [7] "GEDI_L4A_AGB_Density_V2_1.GEDI04_A_2019108002012_O01959_03_T03909_02_002_02_V002.h5"
 [8] "GEDI_L4A_AGB_Density_V2_1.GEDI04_A_2019108002012_O01959_04_T03909_02_002_02_V002.h5"
 [9] "GEDI_L4A_AGB_Density_V2_1.GEDI04_A_2019108015253_O01960_01_T03910_02_002_02_V002.h5"
[10] "GEDI_L4A_AGB_Density_V2_1.GEDI04_A_2019108015253_O01960_02_T03910_02_002_02_V002.h5"


Let's get the S3 URL from the first granule from our granule search.

In [67]:
s3_link <- gedi_granules[[1]]["Granule"]["OnlineAccessURLs"][[1]][1]["URL"]
print(s3_link)

[1] "s3://ornl-cumulus-prod-protected/gedi/GEDI_L4A_AGB_Density_V2_1/data/GEDI04_A_2019107224731_O01958_01_T02638_02_002_02_V002.h5"


## Get Credentials

Since we will be downloading the GEDI data from a protected S3 bucket, we will need temporary S3 credentials from NASA's ORNL DAAC.

In [68]:
credentials <- maap$aws$earthdata_s3_credentials(
    "https://data.ornldaac.earthdata.nasa.gov/s3credentials"
)

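# build an S3 client from the temporary credentials
# ([[ ]] extracts each value as a single string rather than a one-element list)
s3 <- paws::s3(
    credentials = list(
        creds = list(
            access_key_id = credentials[["accessKeyId"]],
            secret_access_key = credentials[["secretAccessKey"]],
            session_token = credentials[["sessionToken"]]
        )
    ),
    region = "us-west-2"
)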

## Download File

Before downloading, let's do some preparation. First, we'll create a directory to download our file to. Then, from our S3 link, we can get the bucket, key, and filename.

In [69]:
# create directory
download_dir <- file.path(getwd(), "data")
dir.create(download_dir, showWarnings = FALSE, recursive = TRUE)

In [70]:
# get bucket from file path
s3_parts <- strsplit(sub("s3://","", s3_link), "/", fixed = TRUE)[[1]] # drop the s3 prefix
bucket <- s3_parts[1] # grab the 1st item which is the bucket name

# create file name for download
filename <- tail(s3_parts, n=1) # grab the last part of the path
download_file <- file.path(download_dir, filename)

# get key from file path
key <- paste(tail(s3_parts, n=-1), collapse='/') # grab everything in the path, except the 1st item
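
The splitting steps above can also be wrapped in a small reusable helper. The following is a minimal base-R sketch: `parse_s3_url` is a hypothetical name introduced here for illustration (it is not part of `maap-py` or `paws`), and it assumes the input has the form `s3://bucket/key`.

```r
# Hypothetical helper (not part of maap-py or paws): split an S3 URL
# of the form "s3://bucket/path/to/file" into its components.
parse_s3_url <- function(url) {
  stopifnot(is.character(url), length(url) == 1, startsWith(url, "s3://"))
  # drop the "s3://" prefix, then split the remainder on "/"
  parts <- strsplit(sub("^s3://", "", url), "/", fixed = TRUE)[[1]]
  list(
    bucket   = parts[1],                          # first element is the bucket
    key      = paste(parts[-1], collapse = "/"),  # everything after the bucket
    filename = parts[length(parts)]               # last element is the file name
  )
}

# Example using the S3 link printed earlier in this notebook
parsed <- parse_s3_url(
  "s3://ornl-cumulus-prod-protected/gedi/GEDI_L4A_AGB_Density_V2_1/data/GEDI04_A_2019107224731_O01958_01_T02638_02_002_02_V002.h5"
)
print(parsed$bucket)
print(parsed$key)
print(parsed$filename)
```

Returning a named list keeps the three pieces together, so they can be passed on as `parsed$bucket` and `parsed$key` when downloading.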

Now we can download our file.

In [71]:
s3$download_file(Bucket = bucket, Key = key, Filename = download_file)

## Access Data

Now that we have our downloaded data, we can use `rhdf5` to open our file for exploration.

In [84]:
gedi_data <- h5ls(download_file)
head(gedi_data)

| | group | name | otype | dclass | dim |
|---|---|---|---|---|---|
| | `<chr>` | `<chr>` | `<chr>` | `<chr>` | `<chr>` |
| 0 | / | ANCILLARY | H5I_GROUP | | |
| 1 | /ANCILLARY | model_data | H5I_DATASET | COMPOUND | 35 |
| 2 | /ANCILLARY | pft_lut | H5I_DATASET | COMPOUND | 7 |
| 3 | /ANCILLARY | region_lut | H5I_DATASET | COMPOUND | 7 |
| 4 | / | BEAM0000 | H5I_GROUP | | |
| 5 | /BEAM0000 | agbd | H5I_DATASET | FLOAT | 48675 |


We can extract the different beams associated with GEDI L4A.

In [85]:
beams <- paste0("/", gedi_data[grep("^BEAM", gedi_data$name),]$name)
beams
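
The name-based match above works because the beam groups are the only entries whose names start with `BEAM`. As a slightly stricter variant, the listing can also be filtered on the `group` and `otype` columns. The sketch below uses a small hand-written data frame (`toy_listing`, a made-up stand-in for the `h5ls` output) so that it runs without the downloaded file.

```r
# Toy stand-in for the data frame returned by h5ls(); rows are illustrative only
toy_listing <- data.frame(
  group = c("/", "/ANCILLARY", "/", "/BEAM0000", "/", "/BEAM0001"),
  name  = c("ANCILLARY", "model_data", "BEAM0000", "agbd", "BEAM0001", "agbd"),
  otype = c("H5I_GROUP", "H5I_DATASET", "H5I_GROUP",
            "H5I_DATASET", "H5I_GROUP", "H5I_DATASET"),
  stringsAsFactors = FALSE
)

# Keep only top-level groups whose name starts with "BEAM"
is_beam <- toy_listing$group == "/" &
  toy_listing$otype == "H5I_GROUP" &
  grepl("^BEAM", toy_listing$name)

toy_beams <- paste0("/", toy_listing$name[is_beam])
print(toy_beams)
```

The same three conditions could be applied to `gedi_data` in place of `toy_listing`.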

Now that we have a list of beams, we can see what data is held within each beam. Let's create a dataframe with all variables associated with `/BEAM0001` and their dimensions (how many rows of data are available within each variable).

In [86]:
beam_variables <- gedi_data[gedi_data$group == beams[2],]

cat("Available variables for /BEAM0001 and their dimensions:\n")
print(beam_variables[, c("name", "dim")])

Available variables for /BEAM0001 and their dimensions:
                    name       dim
192                 agbd     47789
193        agbd_pi_lower     47789
194        agbd_pi_upper     47789
195      agbd_prediction          
309              agbd_se     47789
310               agbd_t     47789
311            agbd_t_se     47789
312   algorithm_run_flag     47789
313                 beam     47789
314              channel     47789
315         degrade_flag     47789
316           delta_time     47789
317      elev_lowestmode     47789
318          geolocation          
349      l2_quality_flag     47789
350      l4_quality_flag     47789
351      land_cover_data          
363       lat_lowestmode     47789
364       lon_lowestmode     47789
365          master_frac     47789
366           master_int     47789
367      predict_stratum     47789
368 predictor_limit_flag     47789
369  response_limit_flag     47789
370   selected_algorithm     47789
371        selected_mode     47789
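
Some entries in the listing above have an empty `dim`. The gaps in the row numbers (for example, 195 to 309) suggest these are nested groups rather than datasets. Because `h5ls` reports `dim` as text, one way to separate the two kinds of entries is to convert the column to integers. The sketch below uses a few rows copied from the output above into a toy data frame (`toy_vars` is a made-up name); it assumes one-dimensional datasets, since a multi-dimensional `dim` string would also convert to `NA`.

```r
# A few rows copied from the listing above into a toy data frame
toy_vars <- data.frame(
  name = c("agbd", "agbd_prediction", "geolocation", "lat_lowestmode"),
  dim  = c("47789", "", "", "47789"),
  stringsAsFactors = FALSE
)

# dim is stored as text; empty strings become NA when converted
n_values <- suppressWarnings(as.integer(toy_vars$dim))

datasets  <- toy_vars$name[!is.na(n_values)]  # entries with a length
subgroups <- toy_vars$name[is.na(n_values)]   # entries without a length

print(datasets)
print(subgroups)
```

In practice, filtering on the `otype` column of the `h5ls` output is a more direct way to tell groups from datasets; the conversion here mainly helps when the dimensions are needed as numbers.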

Let's read some of the data associated with specific variables, and load them into a dataframe.

In [88]:
# set variables
lats <- h5read(download_file, "/BEAM0001/lat_lowestmode")
lons <- h5read(download_file, "/BEAM0001/lon_lowestmode")
elev <- h5read(download_file, "/BEAM0001/elev_lowestmode")
shot_num <- h5read(download_file, "/BEAM0001/shot_number", bit64conversion='bit64')
agbd <- h5read(download_file, "/BEAM0001/agbd")

# create dataframe
gedi_df <- data.frame(latitude = lats, longitude = lons, elevation = elev, shot_number = shot_num, agbd = agbd)
head(gedi_df[!(gedi_df$agbd %in% -9999), ]) # drop rows with the -9999 fill value, then show the first few rows

| | latitude | longitude | elevation | shot_number | agbd |
|---|---|---|---|---|---|
| | `<dbl>` | `<dbl>` | `<dbl>` | `<int64>` | `<dbl>` |
| 36569 | -4.637412 | 103.8779 | 3288.7 | 19580100100036569 | 398.62744 |
| 36580 | -4.6328 | 103.8812 | 3391.723 | 19580100100036580 | 565.04077 |
| 36581 | -4.632382 | 103.8815 | 3412.304 | 19580100100036581 | 378.42584 |
| 36585 | -4.630685 | 103.8827 | 3344.158 | 19580100100036585 | 265.46426 |
| 36586 | -4.630273 | 103.883 | 3393.292 | 19580100100036586 | 323.67648 |
| 36588 | -4.62943 | 103.8836 | 3388.073 | 19580100100036588 | 36.59831 |
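
The filter above drops rows where `agbd` equals the `-9999` fill value. An alternative is to recode the fill value as `NA`, so that R's missing-value tools apply to it. The following is a minimal sketch on a made-up data frame (`toy_df` and its shot indices are illustrative, not taken from the file).

```r
# Made-up example data: -9999 marks shots without an agbd estimate
toy_df <- data.frame(
  shot = 1:5,
  agbd = c(-9999, 398.62744, -9999, 565.04077, 36.59831)
)

fill_value <- -9999  # fill value filtered out in the example above

# Recode the fill value as NA, then keep only rows with a valid estimate
toy_df$agbd[toy_df$agbd == fill_value] <- NA
valid <- toy_df[!is.na(toy_df$agbd), ]

print(nrow(valid))
print(mean(valid$agbd))
```

With the fill value stored as `NA`, summaries such as `mean(toy_df$agbd, na.rm = TRUE)` skip the missing shots without a separate filtering step.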
