# GEDI Data Access 

Authors: Harshini Girish (UAH), Sheyenne Kirkland (UAH), Alex Mandel (DevSeed), Henry Rodman (DevSeed), Zac Deziel (DevSeed)

Date: April 4, 2025

Description: In this notebook, users will learn how to search for GEDI data using `maap-py`, download it, and then open it using `rhdf5`.

## Run This Notebook

To access and run this tutorial within MAAP's Algorithm Development Environment (ADE), please refer to the ["Getting started with the MAAP"](https://docs.maap-project.org/en/latest/getting_started/getting_started.html) section of our documentation.

Disclaimer: We highly recommend running this tutorial within MAAP's ADE, which already includes MAAP-specific packages such as `maap-py`. Running the tutorial outside of the MAAP ADE may lead to errors. Users should work within an "R/Python" workspace.

## Additional Resources
- [rhdf5](https://www.bioconductor.org/packages/release/bioc/html/rhdf5.html)
  - The `rhdf5` package page, with installation instructions, documentation, and more.
 
- [NASA's Operational CMR (MAAP Docs)](https://docs.maap-project.org/en/latest/technical_tutorials/search/catalog.html#nasa-s-operational-cmr) 
  - A section in the MAAP Docs offering an overview of resources to search and access NASA's CMR.

- [GEDI02_A v2 Dataset Landing Page](https://lpdaac.usgs.gov/products/gedi02_av002/)
  - Learn more about NASA's GEDI L2A dataset, which is accessed in this notebook.


## Install and Load Required Libraries
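Let's load the packages necessary for this tutorial. The `paws` package is also required for the S3 download later on; it is called with the `paws::` prefix, so it only needs to be installed, not attached.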

In [16]:
library("rhdf5") # to read HDF5 files
library("reticulate") # to call the Python-based maap-py library from R

Let's also invoke the `MAAP` constructor. This allows us to use the Python-based `maap-py` library from R, which we will use to get credentials and run a NASA CMR search.

In [17]:
maap_py <- import("maap.maap")
maap <- maap_py$MAAP()

## Collection and Granule Search

Using `maap-py`, we can conduct a collection and granule search for data within NASA's CMR. For this example, we'll use data available within the GEDI L2A collection. For more information on CMR searching in R, see ["Searching for Data in NASA's CMR in R"](https://docs.maap-project.org/en/develop/technical_tutorials/working_with_r/cmr_search_in_r.html). 

In [18]:
# search for a GEDI collection
gedi_collections <- maap$searchCollection(
    short_name = "GEDI02_A",
    version = "002",
    cmr_host = "cmr.earthdata.nasa.gov",
    cloud_hosted = "true"
)

# get collection ID for granule search
collection_concept_id <- gedi_collections[[1]][["concept-id"]]
cat("Collection Concept ID:", collection_concept_id, "\n")

# search for the first 10 granules
gedi_granules <- maap$searchGranule(
    collection_concept_id = collection_concept_id,
    limit = 10L,
    cmr_host = "cmr.earthdata.nasa.gov"
)

# extract each granule's name (GranuleUR)
granule_names <- sapply(gedi_granules, function(granule) granule[["Granule"]][["GranuleUR"]])
cat("Granules:\n")
print(granule_names)

Collection Concept ID: C2142771958-LPCLOUD 
Granules:
 [1] "GEDI02_A_2019108002012_O01959_01_T03909_02_003_01_V002"
 [2] "GEDI02_A_2019108002012_O01959_03_T03909_02_003_01_V002"
 [3] "GEDI02_A_2019108002012_O01959_04_T03909_02_003_01_V002"
 [4] "GEDI02_A_2019108002012_O01959_02_T03909_02_003_01_V002"
 [5] "GEDI02_A_2019108015253_O01960_01_T03910_02_003_01_V002"
 [6] "GEDI02_A_2019108015253_O01960_03_T03910_02_003_01_V002"
 [7] "GEDI02_A_2019108015253_O01960_02_T03910_02_003_01_V002"
 [8] "GEDI02_A_2019108015253_O01960_04_T03910_02_003_01_V002"
 [9] "GEDI02_A_2019108032535_O01961_01_T03911_02_003_01_V002"
[10] "GEDI02_A_2019108032535_O01961_03_T03911_02_003_01_V002"
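
The granule names printed above encode when and where each granule was acquired. The following is a minimal sketch of how those fields could be pulled apart in base R. It assumes the GEDI naming pattern `PRODUCT_YYYYDDDHHMMSS_O<orbit>_<sub-orbit granule>_T<track>_...`; the helper name `parse_gedi_granule` and the field labels are hypothetical, so check the product user guide before relying on them.

```r
# Hypothetical helper: split a GEDI L2A granule name into a few of its fields.
# Assumes the pattern PRODUCT_YYYYDDDHHMMSS_O<orbit>_<granule>_T<track>_...
parse_gedi_granule <- function(granule_ur) {
  parts <- strsplit(granule_ur, "_", fixed = TRUE)[[1]]
  # parts: "GEDI02", "A", "2019108002012", "O01959", "01", "T03909", ...
  list(
    product = paste(parts[1], parts[2], sep = "_"),
    # %j is the day of year, so 2019108 is the 108th day of 2019
    acquired = as.POSIXct(parts[3], format = "%Y%j%H%M%S", tz = "UTC"),
    orbit = as.integer(sub("^O", "", parts[4])),
    sub_orbit_granule = as.integer(parts[5]),
    track = as.integer(sub("^T", "", parts[6]))
  )
}

info <- parse_gedi_granule("GEDI02_A_2019108002012_O01959_01_T03909_02_003_01_V002")
format(info$acquired, "%Y-%m-%d %H:%M:%S")
```

Applied with `lapply(granule_names, parse_gedi_granule)`, this could help pick granules by date or orbit before downloading anything.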


Let's get the S3 URL of the first granule returned by our search.

In [19]:
s3_link <- gedi_granules[[1]]["Granule"]["OnlineAccessURLs"][[1]][1]["URL"]
print(s3_link)

[1] "s3://lp-prod-protected/GEDI02_A.002/GEDI02_A_2019108002012_O01959_01_T03909_02_003_01_V002/GEDI02_A_2019108002012_O01959_01_T03909_02_003_01_V002.h5"


## Get Credentials

Since we will be downloading the GEDI data, we need temporary S3 credentials from NASA's LP DAAC, which hosts this collection. We then pass those credentials to a `paws` S3 client.

In [20]:
credentials <- maap$aws$earthdata_s3_credentials(
    "https://data.lpdaac.earthdatacloud.nasa.gov/s3credentials"
)

s3 <- paws::s3(
    credentials = list(
        creds = list(
            access_key_id = credentials["accessKeyId"],
            secret_access_key = credentials["secretAccessKey"],
            session_token = credentials["sessionToken"]
        )
    ),
    region = "us-west-2"
)

## Download File

Before downloading, let's do some preparation. First, we'll create a directory to download our file to. Then, from our S3 link, we can get the bucket, the key, and a filename.

In [21]:
# create directory
download_dir <- file.path(getwd(), "data")
dir.create(download_dir, showWarnings = FALSE, recursive = TRUE)

In [22]:
# get bucket from file path
s3_parts <- strsplit(sub("s3://","", s3_link), "/", fixed = TRUE)[[1]] # drop the s3 prefix
bucket <- s3_parts[1] # grab the 1st item which is the bucket name

# create file name for download
filename <- tail(s3_parts, n=1) # grab the last part of the path
download_file <- file.path(download_dir, filename)

# get key from file path
key <- paste(tail(s3_parts, n=-1), collapse='/') # grab everything in the path, except the 1st item
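
The same splitting logic can be wrapped in a small function, which is handy when downloading more than one granule. This is a minimal sketch in base R; `parse_s3_uri` is a hypothetical helper name and the example URI is made up for illustration.

```r
# Hypothetical helper: split an S3 URI into bucket, key, and file name.
parse_s3_uri <- function(uri) {
  stopifnot(is.character(uri), length(uri) == 1, startsWith(uri, "s3://"))
  path <- sub("^s3://", "", uri)                   # drop the scheme
  parts <- strsplit(path, "/", fixed = TRUE)[[1]]  # bucket, then key segments
  list(
    bucket = parts[1],
    key = paste(parts[-1], collapse = "/"),        # everything after the bucket
    filename = basename(path)                      # last path segment
  )
}

parsed <- parse_s3_uri("s3://example-bucket/some/prefix/file.h5")
str(parsed)
```

The `stopifnot()` guard fails early if the input is not a single `s3://` string, which is easier to debug than an error from the S3 client later on.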

Now we can download our file.

In [23]:
s3$download_file(Bucket = bucket, Key = key, Filename = download_file)
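
When re-running the notebook, it can be useful to skip the download if the file is already on disk. The sketch below shows one way to do that; `download_if_missing` is a hypothetical wrapper, and `fake_download` is a stand-in so the example runs without S3 access.

```r
# Hypothetical wrapper: call `download_fun` only when `path` is missing or empty.
download_if_missing <- function(path, download_fun) {
  if (file.exists(path) && file.size(path) > 0) {
    message("Using existing file: ", path)
    return(invisible(FALSE))
  }
  download_fun(path)
  invisible(TRUE)
}

# Stand-in downloader for illustration. In this notebook it could instead be:
# function(p) s3$download_file(Bucket = bucket, Key = key, Filename = p)
fake_download <- function(p) writeLines("placeholder", p)

tmp <- file.path(tempdir(), "example_granule.h5")
unlink(tmp)                                         # start from a clean state
first <- download_if_missing(tmp, fake_download)    # file is created
second <- download_if_missing(tmp, fake_download)   # download is skipped
```

Note that this check only looks at existence and size, so a partially downloaded file would still be treated as complete.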

## Access Data

Now that we have our downloaded data, we can use `rhdf5` to open our file for exploration.

In [24]:
gedi_data <- h5ls(download_file)
gedi_data

| | group | name | otype | dclass | dim |
|---|---|---|---|---|---|
| | `<chr>` | `<chr>` | `<chr>` | `<chr>` | `<chr>` |
| 0 | / | BEAM0000 | H5I_GROUP | | |
| 1 | /BEAM0000 | ancillary | H5I_GROUP | | |
| 2 | /BEAM0000/ancillary | l2a_alg_count | H5I_DATASET | INTEGER | 1 |
| 3 | /BEAM0000 | beam | H5I_DATASET | INTEGER | 7279 |
| 4 | /BEAM0000 | channel | H5I_DATASET | INTEGER | 7279 |
| 5 | /BEAM0000 | degrade_flag | H5I_DATASET | INTEGER | 7279 |
| 6 | /BEAM0000 | delta_time | H5I_DATASET | FLOAT | 7279 |
| 7 | /BEAM0000 | digital_elevation_model | H5I_DATASET | FLOAT | 7279 |
| 8 | /BEAM0000 | digital_elevation_model_srtm | H5I_DATASET | FLOAT | 7279 |
| 9 | /BEAM0000 | elev_highestreturn | H5I_DATASET | FLOAT | 7279 |


We can extract the names of the beam groups contained in this GEDI L2A file.

In [25]:
beams <- paste0("/", gedi_data[grep("^BEAM", gedi_data$name),]$name)
beams

Now that we have a list of beams, we can see what data is held within each beam. Let's create a dataframe with all variables associated with `/BEAM0001` (the second entry in `beams`) and their dimensions (the number of values available in each variable).

In [26]:
beam_variables <- gedi_data[gedi_data$group == beams[2],]

cat("Available variables for /BEAM0001 and their dimensions:\n")
print(beam_variables[, c("name", "dim")])

Available variables for /BEAM0001 and their dimensions:
                             name        dim
561                     ancillary           
563                          beam       6383
564                       channel       6383
565                  degrade_flag       6383
566                    delta_time       6383
567       digital_elevation_model       6383
568  digital_elevation_model_srtm       6383
569            elev_highestreturn       6383
570               elev_lowestmode       6383
571           elevation_bias_flag       6383
572          elevation_bin0_error       6383
573                  energy_total       6383
574                   geolocation           
681               land_cover_data           
696             lat_highestreturn       6383
697                lat_lowestmode       6383
698           latitude_bin0_error       6383
699             lon_highestreturn       6383
700                lon_lowestmode       6383
701          longitude_bin0_error       6383
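
Entries with an empty `dim`, such as `ancillary` and `geolocation`, are sub-groups rather than variables. The sketch below shows one way to keep only the datasets by filtering on the `otype` column. It uses a small mock data frame modeled on the `h5ls` output above so that it runs on its own; in the notebook, `listing` would be `gedi_data`.

```r
# Mock of a few rows of h5ls() output, modeled on the listings in this notebook.
listing <- data.frame(
  group = c("/", "/BEAM0001", "/BEAM0001", "/BEAM0001", "/BEAM0001"),
  name = c("BEAM0001", "ancillary", "beam", "geolocation", "lat_lowestmode"),
  otype = c("H5I_GROUP", "H5I_GROUP", "H5I_DATASET", "H5I_GROUP", "H5I_DATASET"),
  dim = c("", "", "6383", "", "6383"),
  stringsAsFactors = FALSE
)

# Keep only datasets directly under /BEAM0001 and convert dim to an integer.
is_dataset <- listing$group == "/BEAM0001" & listing$otype == "H5I_DATASET"
datasets <- listing[is_dataset, ]
datasets$n <- as.integer(datasets$dim)
datasets[, c("name", "n")]
```

The `as.integer()` conversion assumes one-dimensional datasets. Multi-dimensional datasets report `dim` as a string with several sizes, which would need extra parsing.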

Let's read some of the data associated with specific variables, and load them into a dataframe.

In [27]:
# read first 20 values for latitudes and longitudes
lats <- h5read(download_file, paste(beams[2], "lat_lowestmode", sep = "/"))[1:20]
lons <- h5read(download_file, paste(beams[2], "lon_lowestmode", sep = "/"))[1:20]

# create dataframe
lats_lons <- data.frame(lats = lats, lons = lons)
lats_lons

| lats | lons |
|---|---|
| `<dbl>` | `<dbl>` |
| -2.58617 | 81.74209 |
| -2.585748 | 81.74239 |
| -2.585327 | 81.74269 |
| -2.584905 | 81.74298 |
| -2.584483 | 81.74328 |
| -2.584061 | 81.74358 |
| -2.583639 | 81.74388 |
| -2.583217 | 81.74417 |
| -2.582795 | 81.74447 |
| -2.582373 | 81.74477 |
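
The cell above reads each full variable and then subsets it in R. As an alternative, `h5read` accepts an `index` argument that restricts the read to the requested elements, which may help with large variables. The sketch below demonstrates this on a small throwaway HDF5 file so that it runs on its own; the file contents are made up and only mimic the `/BEAM0001` layout.

```r
library(rhdf5)

# Build a small throwaway HDF5 file that mimics the /BEAM0001 layout.
tmp_h5 <- tempfile(fileext = ".h5")
h5createFile(tmp_h5)
h5createGroup(tmp_h5, "BEAM0001")
h5write(seq(-2.5, by = 0.001, length.out = 100), tmp_h5, "BEAM0001/lat_lowestmode")

# Read only elements 1 to 20; `index` takes one list entry per dataset dimension.
first_20 <- h5read(tmp_h5, "/BEAM0001/lat_lowestmode", index = list(1:20))
length(first_20)

# Clean up open handles and the temporary file.
h5closeAll()
unlink(tmp_h5)
```

In the notebook, the equivalent call would use `download_file` and `paste(beams[2], "lat_lowestmode", sep = "/")` in place of the temporary file and hard-coded path.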
