# GEDI Data Access 

Authors: Harshini Girish (UAH), Sheyenne Kirkland (UAH), Alex Mandel (DevSeed), Henry Rodman (DevSeed), Zac Deziel (DevSeed)

Date: March 27, 2025

Description: In this notebook, users will learn how to search for GEDI data using `maap-py`, download it, and then open it using `rhdf5`.

## Run This Notebook

To access and run this tutorial within MAAP's Algorithm Development Environment (ADE), please refer to the ["Getting started with the MAAP"](https://docs.maap-project.org/en/latest/getting_started/getting_started.html) section of our documentation.

Disclaimer: it is highly recommended to run a tutorial within MAAP's ADE, which already includes packages specific to MAAP, such as maap-py. Running the tutorial outside of the MAAP ADE may lead to errors. Users should work within an "R/Python" workspace.

## Additional Resources
- [rhdf5](https://www.bioconductor.org/packages/release/bioc/html/rhdf5.html)
  - The `rhdf5` package page, with installation instructions, documentation, and more.
 
- [NASA's Operational CMR (MAAP Docs)](https://docs.maap-project.org/en/latest/technical_tutorials/search/catalog.html#nasa-s-operational-cmr) 
  - A section in the MAAP Docs offering an overview of resources to search and access NASA's CMR.

- [GEDI02_A v2 Dataset Landing Page](https://lpdaac.usgs.gov/products/gedi02_av002/)
  - Learn more about NASA's GEDI L2A dataset, which is accessed in this notebook.


## Install and Load Required Libraries
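Let's load the packages needed for this tutorial. If `rhdf5`, `reticulate`, or `paws` (used later for the S3 download) is missing from your workspace, install it first.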

In [1]:
library("rhdf5") # to read HDF5 files
library("reticulate") # to call the Python maap-py library from R

Next, let's import `maap-py` through `reticulate` and invoke the `MAAP` constructor. The resulting `maap` object lets us call the Python-based `maap-py` library from R, which we will use to get credentials and conduct a NASA CMR search.

In [2]:
maap_py <- import("maap.maap")
maap <- maap_py$MAAP()

## Get Credentials

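Since we will be downloading GEDI data, we need temporary AWS S3 credentials from NASA's Land Processes Distributed Active Archive Center (LP DAAC). We then use them to create an S3 client with the `paws` package.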

In [3]:
credentials <- maap$aws$earthdata_s3_credentials(
    "https://data.lpdaac.earthdatacloud.nasa.gov/s3credentials"
)

# build an S3 client from the temporary credentials
s3 <- paws::s3(
    credentials = list(
        creds = list(
            access_key_id = credentials[["accessKeyId"]],
            secret_access_key = credentials[["secretAccessKey"]],
            session_token = credentials[["sessionToken"]]
        )
    ),
    region = "us-west-2"
)
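
The S3 client above is built from three fields of `credentials`: `accessKeyId`, `secretAccessKey`, and `sessionToken`. Because these credentials are temporary, it can help to confirm the fields are present and non-empty before making requests. This is a minimal sketch: `check_s3_credentials()` is a hypothetical helper (not part of `maap-py` or `paws`), and it assumes `credentials` behaves like a named R list, which is what `reticulate` normally produces from a Python dictionary.

```r
# Hypothetical helper (not part of maap-py or paws). Assumes `creds` is a
# named R list holding the three fields used to build the paws S3 client.
check_s3_credentials <- function(creds) {
  required <- c("accessKeyId", "secretAccessKey", "sessionToken")
  missing <- setdiff(required, names(creds))
  if (length(missing) > 0) {
    stop("Missing credential field(s): ", paste(missing, collapse = ", "))
  }
  # A field counts as blank if it is NULL, NA, or an empty string
  is_blank <- function(v) {
    length(v) == 0 || is.na(v[[1]]) || !nzchar(as.character(v[[1]]))
  }
  empty <- required[vapply(required, function(k) is_blank(creds[[k]]), logical(1))]
  if (length(empty) > 0) {
    stop("Empty credential field(s): ", paste(empty, collapse = ", "))
  }
  invisible(TRUE)
}

# Example with placeholder values (not real credentials)
example_creds <- list(
  accessKeyId = "EXAMPLEKEY",
  secretAccessKey = "EXAMPLESECRET",
  sessionToken = "EXAMPLETOKEN"
)
check_s3_credentials(example_creds)
```

In the notebook, calling `check_s3_credentials(credentials)` right after requesting credentials would surface a malformed response before the download step.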

## Collection and Granule Search

Using `maap-py`, we can conduct a collection and granule search for data within NASA's CMR. For this example, we'll use data available within the GEDI L2A collection. For more information on CMR searching in R, see ["Searching for Data in NASA's CMR in R"](https://docs.maap-project.org/en/develop/technical_tutorials/working_with_r/cmr_search_in_r.html). 

In [4]:
# search for a GEDI collection
gedi_collections <- maap$searchCollection(
    short_name = "GEDI02_A",
    version = "002",
    cmr_host = "cmr.earthdata.nasa.gov",
    cloud_hosted = "true"
)

# get collection ID for granule search
collection_id <- gedi_collections[[1]]["concept-id"]

# search for the first 10 granules in the collection
gedi_granules <- maap$searchGranule(
    concept_id = collection_id,
    limit = as.integer(10),
    cmr_host = "cmr.earthdata.nasa.gov"
)

granule_names <- sapply(gedi_granules, function(granule) granule["Granule"]["GranuleUR"])
cat("Granules:\n")
print(granule_names)

Granules:
 [1] "GEDI02_A_2019108002012_O01959_01_T03909_02_003_01_V002"
 [2] "GEDI02_A_2019108002012_O01959_03_T03909_02_003_01_V002"
 [3] "GEDI02_A_2019108002012_O01959_04_T03909_02_003_01_V002"
 [4] "GEDI02_A_2019108002012_O01959_02_T03909_02_003_01_V002"
 [5] "GEDI02_A_2019108015253_O01960_01_T03910_02_003_01_V002"
 [6] "GEDI02_A_2019108015253_O01960_03_T03910_02_003_01_V002"
 [7] "GEDI02_A_2019108015253_O01960_02_T03910_02_003_01_V002"
 [8] "GEDI02_A_2019108015253_O01960_04_T03910_02_003_01_V002"
 [9] "GEDI02_A_2019108032535_O01961_01_T03911_02_003_01_V002"
[10] "GEDI02_A_2019108032535_O01961_03_T03911_02_003_01_V002"
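
The granule names themselves carry useful information. Judging from the names printed above, and assuming the usual GEDI naming layout (product, an acquisition timestamp written as year, day of year, and time `YYYYDDDHHMMSS`, an orbit `O…`, a sub-orbit granule number, a track `T…`, then processing and version fields ending in `V002`), a hypothetical `parse_gedi_granule()` helper could split a name into those parts. This is a sketch only; verify the field layout against the LP DAAC product documentation before relying on it.

```r
# Hypothetical helper: split a GEDI L2A granule name into fields.
# Assumed layout: GEDI02_A_<YYYYDDDHHMMSS>_O<orbit>_<sub-orbit>_T<track>_..._V<version>
parse_gedi_granule <- function(granule_name) {
  parts <- strsplit(granule_name, "_", fixed = TRUE)[[1]]
  stamp <- parts[3]  # e.g. "2019108002012": year, day of year, hh, mm, ss
  year <- as.integer(substr(stamp, 1, 4))
  day_of_year <- as.integer(substr(stamp, 5, 7))
  # Convert day of year to a calendar date
  date <- as.Date(sprintf("%d-01-01", year)) + (day_of_year - 1)
  hms <- paste(substr(stamp, 8, 9), substr(stamp, 10, 11), substr(stamp, 12, 13), sep = ":")
  list(
    product = paste(parts[1], parts[2], sep = "_"),
    date = date,
    time = hms,
    orbit = sub("^O", "", parts[4]),
    sub_orbit_granule = parts[5],
    track = sub("^T", "", parts[6]),
    version = parts[length(parts)]
  )
}

info <- parse_gedi_granule("GEDI02_A_2019108002012_O01959_01_T03909_02_003_01_V002")
str(info)
```

Under this layout, day 108 of 2019 is April 18, 2019, and the ten results above span orbits O01959 to O01961 with sub-orbit granules 01 to 04. `lapply(granule_names, parse_gedi_granule)` should apply the same parsing to every search result.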


Let's get the S3 URL of the first granule in our search results.

In [5]:
s3_link <- gedi_granules[[1]]["Granule"]["OnlineAccessURLs"][[1]][1]["URL"]
print(s3_link)

[1] "s3://lp-prod-protected/GEDI02_A.002/GEDI02_A_2019108002012_O01959_01_T03909_02_003_01_V002/GEDI02_A_2019108002012_O01959_01_T03909_02_003_01_V002.h5"


## Download File

Before downloading, let's do some prep work. First, we'll create a directory to download our file into. Then, from our S3 link, we'll extract the bucket, key, and filename.

In [6]:
# create a directory for the download
dir_name <- "./data"
if (!dir.exists(dir_name)) {
  dir.create(dir_name)
}

In [7]:
# get bucket from file path
s3_parts <- strsplit(sub("s3://","", s3_link), "/", fixed = TRUE)[[1]] # drop the s3 prefix
bucket <- s3_parts[1] # grab the 1st item which is the bucket name
bucket

# create the local file path for the download
filename <- tail(s3_parts, n = 1) # grab the last part of the path
file <- file.path(dir_name, filename) # local path inside the data directory


# get key from file path
key <- paste(tail(s3_parts, n=-1), collapse='/') # grab everything in the path, except the 1st item
key
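
The bucket, key, and filename steps above can be wrapped in one reusable function. This is a minimal sketch: `parse_s3_url()` is a hypothetical helper, and it assumes the link is a plain character string of the form `s3://<bucket>/<key>`, like the URL printed earlier.

```r
# Hypothetical helper: split an s3://<bucket>/<key> URL into its parts.
# Assumes a plain character string, as printed for s3_link above.
parse_s3_url <- function(s3_url) {
  if (!startsWith(s3_url, "s3://")) stop("Not an s3:// URL: ", s3_url)
  parts <- strsplit(sub("^s3://", "", s3_url), "/", fixed = TRUE)[[1]]
  list(
    bucket = parts[1],                       # first path element
    key = paste(parts[-1], collapse = "/"),  # everything after the bucket
    filename = parts[length(parts)]          # last path element
  )
}

url <- "s3://lp-prod-protected/GEDI02_A.002/GEDI02_A_2019108002012_O01959_01_T03909_02_003_01_V002/GEDI02_A_2019108002012_O01959_01_T03909_02_003_01_V002.h5"
url_parts <- parse_s3_url(url)
local_path <- file.path("./data", url_parts$filename)
```

Using `file.path()` to build `local_path` joins the directory and filename with a single `/`, whereas `paste()` would insert a space between them by default.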

Now we can download our file.

In [8]:
gedi_file <- s3$download_file(Bucket = bucket, Key = key, Filename = file)
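
Re-running the cell above repeats the download even when the file is already on disk. A minimal sketch of a hypothetical `download_if_missing()` wrapper skips the request in that case; it takes the download step as a function, so the `s3$download_file()` call from above can be passed in (the commented usage line is an assumption about how you might wire it up). An offline stand-in downloader is used here so the example runs without credentials.

```r
# Hypothetical helper: call `download(path)` only when `path` does not exist yet.
download_if_missing <- function(path, download) {
  if (file.exists(path)) {
    message("Using existing file: ", path)
    return(invisible(path))
  }
  download(path)
  if (!file.exists(path)) stop("Download did not create ", path)
  invisible(path)
}

# Possible use with the paws client above (assumes `s3`, `bucket`, `key`, `file` exist):
# download_if_missing(file, function(p) s3$download_file(Bucket = bucket, Key = key, Filename = p))

# Offline demonstration with a stand-in downloader that writes a small placeholder file
demo_path <- file.path(tempdir(), "demo_granule.h5")
unlink(demo_path)
calls <- 0
fake_download <- function(p) {
  calls <<- calls + 1
  writeLines("placeholder", p)
}
download_if_missing(demo_path, fake_download)  # file absent: downloads
download_if_missing(demo_path, fake_download)  # file present: skips
```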

## Access Data

Now that we have downloaded the data, we can use the `rhdf5` function `h5ls()` to list the groups and datasets in our file.

In [9]:
gedi_data <- h5ls(file)
gedi_data

| | group | name | otype | dclass | dim |
|---|---|---|---|---|---|
| 0 | / | BEAM0000 | H5I_GROUP | | |
| 1 | /BEAM0000 | ancillary | H5I_GROUP | | |
| 2 | /BEAM0000/ancillary | l2a_alg_count | H5I_DATASET | INTEGER | 1 |
| 3 | /BEAM0000 | beam | H5I_DATASET | INTEGER | 7279 |
| 4 | /BEAM0000 | channel | H5I_DATASET | INTEGER | 7279 |
| 5 | /BEAM0000 | degrade_flag | H5I_DATASET | INTEGER | 7279 |
| 6 | /BEAM0000 | delta_time | H5I_DATASET | FLOAT | 7279 |
| 7 | /BEAM0000 | digital_elevation_model | H5I_DATASET | FLOAT | 7279 |
| 8 | /BEAM0000 | digital_elevation_model_srtm | H5I_DATASET | FLOAT | 7279 |
| 9 | /BEAM0000 | elev_highestreturn | H5I_DATASET | FLOAT | 7279 |
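
`h5ls()` stores each object's parent `group` and its `name` in separate columns, while `h5read()` expects the full path, such as `/BEAM0000/lat_lowestmode`. A minimal sketch of a hypothetical `h5_object_paths()` helper joins the two columns; it runs on a small stand-in data frame copied from rows of the listing above, so it does not need the downloaded file.

```r
# Hypothetical helper: build full HDF5 paths from an h5ls()-style data frame.
h5_object_paths <- function(listing) {
  # Objects in the root group "/" need a single leading slash, not "//"
  ifelse(listing$group == "/",
         paste0("/", listing$name),
         paste0(listing$group, "/", listing$name))
}

# Stand-in copied from a few rows of the h5ls() output above
listing_demo <- data.frame(
  group = c("/", "/BEAM0000", "/BEAM0000/ancillary", "/BEAM0000"),
  name  = c("BEAM0000", "ancillary", "l2a_alg_count", "delta_time"),
  otype = c("H5I_GROUP", "H5I_GROUP", "H5I_DATASET", "H5I_DATASET"),
  stringsAsFactors = FALSE
)
listing_demo$path <- h5_object_paths(listing_demo)

# Keep only datasets, the objects that hold arrays of values
listing_demo$path[listing_demo$otype == "H5I_DATASET"]
```

Applied to `gedi_data`, the same helper should give paths that can be passed directly to `h5read(file, ...)`.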


We can extract the names of the beams in this GEDI L2A file. As the listing above shows, each beam is stored as a group directly under the root group `/`.

In [10]:
for (beam in gedi_data$name) {
  if (grepl("BEAM", beam)) {
    print(beam)
  }
}

[1] "BEAM0000"
[1] "BEAM0001"
[1] "BEAM0010"
[1] "BEAM0011"
[1] "BEAM0101"
[1] "BEAM0110"
[1] "BEAM1000"
[1] "BEAM1011"
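
The loop above tests every row of `gedi_data`. An equivalent vectorized sketch restricts the search to the root group, where the beam groups live, and labels each beam. The labels assume the GEDI instrument layout in which BEAM0101, BEAM0110, BEAM1000, and BEAM1011 are full-power beams and the remaining four are coverage beams; verify this mapping against the GEDI documentation. A small stand-in data frame replaces `gedi_data` so the example runs on its own.

```r
# Stand-in for gedi_data with only the columns used here (rows modeled on the output above)
root_demo <- data.frame(
  group = c("/", "/BEAM0000", "/BEAM0000", "/"),
  name  = c("BEAM0000", "ancillary", "beam", "BEAM0101"),
  stringsAsFactors = FALSE
)

# Beam groups sit directly under the root group and are named BEAM plus four binary digits
is_beam <- root_demo$group == "/" & grepl("^BEAM[01]{4}$", root_demo$name)
beams <- root_demo$name[is_beam]

# Assumed GEDI beam layout: these four are full-power beams, the rest are coverage beams
full_power <- c("BEAM0101", "BEAM0110", "BEAM1000", "BEAM1011")
beam_type <- ifelse(beams %in% full_power, "full power", "coverage")
data.frame(beam = beams, type = beam_type)
```

With the real listing, the same filter would be `gedi_data$name[gedi_data$group == "/" & grepl("^BEAM", gedi_data$name)]`.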


Now that we have a list of beams, we can look at the objects stored within one of them, `BEAM0000`.

In [11]:
# print first 10 rows of dataframe
gedi_data[gedi_data$group == "/BEAM0000",][1:10,]

| | group | name | otype | dclass | dim |
|---|---|---|---|---|---|
| 1 | /BEAM0000 | ancillary | H5I_GROUP | | |
| 3 | /BEAM0000 | beam | H5I_DATASET | INTEGER | 7279 |
| 4 | /BEAM0000 | channel | H5I_DATASET | INTEGER | 7279 |
| 5 | /BEAM0000 | degrade_flag | H5I_DATASET | INTEGER | 7279 |
| 6 | /BEAM0000 | delta_time | H5I_DATASET | FLOAT | 7279 |
| 7 | /BEAM0000 | digital_elevation_model | H5I_DATASET | FLOAT | 7279 |
| 8 | /BEAM0000 | digital_elevation_model_srtm | H5I_DATASET | FLOAT | 7279 |
| 9 | /BEAM0000 | elev_highestreturn | H5I_DATASET | FLOAT | 7279 |
| 10 | /BEAM0000 | elev_lowestmode | H5I_DATASET | FLOAT | 7279 |
| 11 | /BEAM0000 | elevation_bias_flag | H5I_DATASET | INTEGER | 7279 |
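
The `dim` column from `h5ls()` is stored as text, so comparing dataset lengths takes a conversion step. In this listing, the datasets in `BEAM0000` share a length of 7279, which suggests one value per laser shot recorded by this beam in the granule. A minimal sketch of a hypothetical `beam_datasets()` helper lists the datasets directly inside a beam group with a numeric length; it uses a stand-in data frame copied from rows of the table above.

```r
# Hypothetical helper: datasets directly inside one beam group, with `dim`
# (stored as text by h5ls()) converted to a number
beam_datasets <- function(listing, beam) {
  rows <- listing[listing$group == paste0("/", beam) &
                    listing$otype == "H5I_DATASET", ]
  # Entries that are not a single number become NA
  rows$length <- suppressWarnings(as.numeric(rows$dim))
  rows[, c("name", "dclass", "length")]
}

# Stand-in copied from a few rows of the BEAM0000 listing above
beam_demo <- data.frame(
  group  = c("/BEAM0000", "/BEAM0000", "/BEAM0000", "/BEAM0000"),
  name   = c("ancillary", "beam", "delta_time", "elev_lowestmode"),
  otype  = c("H5I_GROUP", "H5I_DATASET", "H5I_DATASET", "H5I_DATASET"),
  dclass = c("", "INTEGER", "FLOAT", "FLOAT"),
  dim    = c("", "7279", "7279", "7279"),
  stringsAsFactors = FALSE
)

ds <- beam_datasets(beam_demo, "BEAM0000")
# Datasets whose length matches the shot count of this granule's beam
ds$name[which(ds$length == 7279)]
```

Calling `beam_datasets(gedi_data, "BEAM0000")` should give the same kind of summary for the full file.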


Let's read the values of a specific dataset from the file, `lat_lowestmode` in the `BEAM0000` group, by passing its full path to `h5read()`.

In [12]:
# read first 20 values
h5read(file, "/BEAM0000/lat_lowestmode")[1:20]
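
Subsetting with `[1:20]` after `h5read()` reads the whole dataset into memory first. `h5read()` also accepts an `index` argument, a list with one vector of indices per dimension, which requests only that subset from the file. The sketch below demonstrates this on a small synthetic HDF5 file written with `rhdf5`; the values are made up and only the dataset path mirrors the GEDI file. For the downloaded granule, the equivalent call would be `h5read(file, "/BEAM0000/lat_lowestmode", index = list(1:20))`.

```r
library("rhdf5")

# Build a small synthetic HDF5 file that mimics the GEDI path layout (values are made up)
tmp <- tempfile(fileext = ".h5")
h5createFile(tmp)
h5createGroup(tmp, "BEAM0000")
fake_lat <- seq(-10, 10, length.out = 100)
h5write(fake_lat, tmp, "BEAM0000/lat_lowestmode")

# Read only the first 20 elements instead of the whole dataset
first20 <- h5read(tmp, "/BEAM0000/lat_lowestmode", index = list(1:20))
h5closeAll()

as.numeric(first20)
```

For large per-shot datasets, reading a subset this way avoids loading values you are about to discard.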