In [1]:
import datetime as dt 
import pandas as pd
import geopandas as gpd
import earthaccess
from shapely.geometry import MultiPolygon, Polygon, box
from shapely.ops import orient



## 1. Searching with a bounding box

The Digital Object Identifier (DOI) of the GEDI L4A dataset is needed to search for the files (or granules). For this tutorial, let's use a bounding box of the conterminous USA, which extends south to north from 24.5 N to 49.5 N latitude and west to east from 125.0 W to 66.5 W longitude. We will search for all the files for April 2021.

In [2]:
# USA bounding box
bound = (-125.0,  24.5, -66.5,  49.5)  

# time bound
start_date = dt.datetime(2021, 4, 1)  # specify your own start date
end_date = dt.datetime(2021, 4, 30)   # specify your own end date
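
The bounding box above follows the (west, south, east, north) order in decimal degrees. A minimal sketch of a sanity check that could be run before searching; `check_bbox` is a hypothetical helper, not part of `earthaccess`:

```python
def check_bbox(bound):
    """Validate a (west, south, east, north) tuple in decimal degrees."""
    west, south, east, north = bound
    if not (-180 <= west <= 180 and -180 <= east <= 180):
        raise ValueError("longitudes must be within [-180, 180]")
    if not (-90 <= south <= 90 and -90 <= north <= 90):
        raise ValueError("latitudes must be within [-90, 90]")
    # west > east is not rejected: such a box may cross the antimeridian
    if south >= north:
        raise ValueError("south latitude must be smaller than north latitude")
    return west, south, east, north

print(check_bbox((-125.0, 24.5, -66.5, 49.5)))
```

Swapping latitude and longitude is an easy mistake to make with plain tuples, and a check like this fails early instead of returning an unexpected set of granules.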

The bounding box and time bounds can be used to search for GEDI L4A files using the `earthaccess` module. We will use a [pandas dataframe](https://pandas.pydata.org/) to store the name, size, and bounding geometry of each file.

In [3]:
granules = earthaccess.search_data(
    count=-1, # needed to retrieve all granules
    bounding_box = bound,
    temporal=(start_date, end_date), # time bound
    doi='10.3334/ORNLDAAC/2056' # GEDI L4A DOI 
)
print(f"Total granules found: {len(granules)}")

Total granules found: 211
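
Note that `dt.datetime(2021, 4, 30)` is midnight at the start of 30 April, so a search bounded by it may leave out granules acquired later that day. A minimal sketch of building a range that covers the whole month with the standard library; `month_bounds` is a hypothetical helper, and whether the difference matters depends on how the search service treats the end time:

```python
import calendar
import datetime as dt

def month_bounds(year, month):
    """Return (start, end) datetimes covering a whole calendar month."""
    last_day = calendar.monthrange(year, month)[1]  # number of days in the month
    start = dt.datetime(year, month, 1)
    end = dt.datetime(year, month, last_day, 23, 59, 59)
    return start, end

start, end = month_bounds(2021, 4)
print(start, end)
```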


Let’s print the details of the first file.

In [4]:
granules[0]

In [11]:
# import os
# import requests

# out_dir = "GEDI_data"
# os.makedirs(out_dir, exist_ok=True)

# for g in granules:
#     url  = g.data_links()[0]  # first data URL of the granule
#     name = os.path.basename(url)
#     dest = os.path.join(out_dir, name)

#     print(f"Downloading {name}…")
#     r = requests.get(url, stream=True)
#     r.raise_for_status()
#     with open(dest, "wb") as f:
#         for chunk in r.iter_content(1024*1024):
#             f.write(chunk)



As we see above, the granules are hosted in the NASA Earthdata Cloud.

The `granules` object contains metadata about the granules, including the bounding geometry, publication dates, data providers, etc. Now, let’s convert this JSON-formatted granule metadata to a `geopandas` dataframe, which will let us plot the granule geometries.

In [5]:
def convert_umm_geometry(gpoly):
    """converts UMM geometry to multipolygons"""
    multipolygons = []
    for gl in gpoly:
        ltln = gl["Boundary"]["Points"]
        points = [(p["Longitude"], p["Latitude"]) for p in ltln]
        multipolygons.append(Polygon(points))
    return MultiPolygon(multipolygons)

def convert_list_gdf(datag):
    """converts List[] to geopandas dataframe"""
    # create pandas dataframe from json
    df = pd.json_normalize([vars(granule)['render_dict'] for granule in datag])
    # keep only last string of the column names
    df.columns=df.columns.str.split('.').str[-1]
    # convert polygons to multipolygonal geometry
    df["geometry"] = df["GPolygons"].apply(convert_umm_geometry)
    # return geopandas dataframe
    return gpd.GeoDataFrame(df, geometry="geometry", crs="EPSG:4326")

# only keep three columns
gdf = convert_list_gdf(granules)[['GranuleUR', 'size', 'geometry']]
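
`convert_umm_geometry` relies on the nested layout of the UMM `GPolygons` field: a list of polygons, each with a `Boundary` that holds a list of `Points` with `Longitude` and `Latitude` keys. A minimal sketch on a made-up record (the coordinates are illustrative, not taken from a real granule) shows the extraction step without `shapely`; `umm_rings` is a hypothetical helper:

```python
# Hypothetical GPolygons value with one four-corner polygon (closed ring)
gpolygons = [
    {"Boundary": {"Points": [
        {"Longitude": -100.0, "Latitude": 30.0},
        {"Longitude": -90.0, "Latitude": 30.0},
        {"Longitude": -90.0, "Latitude": 40.0},
        {"Longitude": -100.0, "Latitude": 40.0},
        {"Longitude": -100.0, "Latitude": 30.0},
    ]}}
]

def umm_rings(gpoly):
    """Return one list of (lon, lat) tuples per polygon in a GPolygons list."""
    return [
        [(p["Longitude"], p["Latitude"]) for p in g["Boundary"]["Points"]]
        for g in gpoly
    ]

rings = umm_rings(gpolygons)
print(len(rings), rings[0][0])
```

The tuples are built in (longitude, latitude) order because `shapely` treats the first coordinate as x and the second as y.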

Now, we have stored the granule names (`GranuleUR`), sizes, and bounding geometries in the geopandas dataframe `gdf`. The first few rows of the table look like the following.

In [6]:
gdf.head()

|   | GranuleUR | size | geometry |
|---|---|---|---|
| 0 | GEDI_L4A_AGB_Density_V2_1.GEDI04_A_20210902318... | 151.789185 | MULTIPOLYGON (((-148.42497 36.22493, -144.8829... |
| 1 | GEDI_L4A_AGB_Density_V2_1.GEDI04_A_20210902318... | 171.734807 | MULTIPOLYGON (((-96.69016 51.75853, -90.71111 ... |
| 2 | GEDI_L4A_AGB_Density_V2_1.GEDI04_A_20210910051... | 146.99061 | MULTIPOLYGON (((-120.3181 51.75826, -114.33578... |
| 3 | GEDI_L4A_AGB_Density_V2_1.GEDI04_A_20210910224... | 368.690987 | MULTIPOLYGON (((-143.85097 51.75735, -137.8719... |
| 4 | GEDI_L4A_AGB_Density_V2_1.GEDI04_A_20210910357... | 289.095427 | MULTIPOLYGON (((-152.10918 50.69925, -146.5450... |
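
The `size` column can be used to estimate the download volume before fetching anything. A minimal sketch using the five sizes shown above, assuming they are reported in megabytes (with the full dataframe, `gdf['size'].sum()` would give the total for all granules):

```python
# Sizes of the first five granules, copied from the table above (assumed MB)
sizes_mb = [151.789185, 171.734807, 146.99061, 368.690987, 289.095427]

total_mb = sum(sizes_mb)
print(f"{total_mb:.1f} MB (~{total_mb / 1024:.2f} GB)")
```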


We can now plot the bounding geometries of the granules (shown with green lines in the figure below). The bounding box (of the USA) is plotted in red.

In [7]:
# plotting granule geometry
m = gdf.explore(color='green',  fillcolor='green')

# plotting bounding box of USA
b = list(bound)
gdf_bound = gpd.GeoDataFrame(index=[0], crs='epsg:4326', 
                             geometry=[box(b[0], b[1], b[2], b[3])])
gdf_bound.explore(m = m, color='red', fill=False)

## 2. Searching for a polygonal area of interest

If an area of interest is already defined as a polygon, the polygon file in `geojson`, `shapefile` or `kml` formats can be used to find overlapping GEDI L4A files.

For this tutorial, we will use a boundary polygon stored in a GeoJSON file called `Field_Boundary.geojson` to search for the overlapping GEDI files (shown as a red polygon in the figure below).

In [8]:
poly = gpd.read_file("Field_Boundary.geojson").geometry
poly.explore(color='red',  fill=False)
# poly.explore(color='green',  fill=False)

In this example, we will use the `earthaccess` Python module to search for all the GEDI L4A granules overlapping the above polygon.

In [9]:
# My code
# 1) read the area of interest
gdf = gpd.read_file("california.geojson")   # or path to your file

# reproject to WGS84 lon/lat (EPSG:4326) if the file uses a different CRS
if gdf.crs is not None and gdf.crs.to_epsg() != 4326:
    gdf = gdf.to_crs("EPSG:4326")

# some files contain multiple features; combine them into one geometry
# (GeoPandas >= 1.0; on older versions use gdf.unary_union)
geom = gdf.union_all()

# 2) if MultiPolygon, choose the largest polygon
if isinstance(geom, MultiPolygon):
    # area is in squared degrees, which is usually enough to rank the parts
    mainland = max(geom.geoms, key=lambda p: p.area)
elif isinstance(geom, Polygon):
    mainland = geom
else:
    raise ValueError("Unexpected geometry type: " + str(type(geom)))

# 3) (optional) save mainland to a new GeoJSON file
g_out = gpd.GeoDataFrame(index=[0], crs="EPSG:4326", geometry=[mainland])
g_out.to_file("california_mainland.geojson", driver="GeoJSON")



In [20]:
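# My code
# A MultiPolygon has no `.exterior`; use the exterior of its largest part instead
part = max(poly.geoms, key=lambda p: p.area) if isinstance(poly, MultiPolygon) else poly
coords = list(part.exterior.coords)
print(coords[0], coords[-1])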

In [19]:
# My code
g = gpd.read_file("california.geojson")   # or whatever your file is
# merge parts (GeoPandas >= 1.0; on older versions use g.geometry.unary_union)
poly = g.geometry.union_all()

# keep the largest polygon if the union yields several parts;
# a MultiPolygon is not iterable in Shapely 2, so iterate over .geoms
if isinstance(poly, MultiPolygon):
    poly = max(poly.geoms, key=lambda p: p.area)

# ensure valid geometry (fix self-intersections)
if not poly.is_valid:
    poly = poly.buffer(0)

# simplify (optional)
poly = poly.simplify(0.01)   # adjust tolerance if this collapses edges

# CMR expects the ring in counter-clockwise order
poly = orient(poly, sign=1.0)

# get exterior coords and ensure ring is closed
coords = list(poly.exterior.coords)     # list of (lon, lat) pairs
if coords[0] != coords[-1]:
    coords.append(coords[0])

# earthaccess expects list of (lon, lat) tuples
polygon = [(float(x), float(y)) for (x, y) in coords]

# now call search_data
granules = earthaccess.search_data(
    count=-1,
    doi="10.3334/ORNLDAAC/2056",
    polygon=polygon,
    temporal=(start_date, end_date),
)
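
The tolerance passed to `simplify` is in the units of the geometry, here degrees. As a rough guide, one degree of latitude spans about 111 km, while a degree of longitude shrinks with the cosine of the latitude. A minimal sketch of the conversion, assuming a spherical Earth with a radius of 6371 km (`tolerance_km` is a hypothetical helper):

```python
import math

EARTH_RADIUS_KM = 6371.0  # mean radius; spherical approximation

def tolerance_km(tol_deg, lat_deg):
    """Approximate ground distance (north-south, east-west) of a tolerance in degrees."""
    km_per_deg = math.pi * EARTH_RADIUS_KM / 180.0  # about 111 km per degree
    north_south = tol_deg * km_per_deg
    east_west = north_south * math.cos(math.radians(lat_deg))
    return north_south, east_west

ns, ew = tolerance_km(0.01, 37.0)
print(f"0.01 deg at 37 N: {ns:.2f} km north-south, {ew:.2f} km east-west")
```

For a small area of interest, a tolerance on the order of a kilometre can remove a noticeable part of the outline, so it may be worth comparing the simplified polygon with the original before searching.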

In [10]:
# bounding lon, lat as a list of tuples
poly = poly.apply(orient, args=(1,))
# simplifying the polygon to bypass the coordinates 
# limit of the CMR with a tolerance of .01 degrees
xy = poly.simplify(0.01).get_coordinates()

granules = earthaccess.search_data(
    count=-1, # needed to retrieve all granules
    doi="10.3334/ORNLDAAC/2056", # GEDI L4A DOI 
    polygon=list(zip(xy.x, xy.y))
)
print(f"Total granules found: {len(granules)}")

Total granules found: 29
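
The `orient` call with sign `1` puts the exterior ring in counter-clockwise order, which is the point order the CMR polygon search expects. A minimal sketch of the underlying check, using the shoelace formula on a made-up ring (`signed_area` and `ensure_ccw` are hypothetical helpers; `shapely.ops.orient` does this for real geometries):

```python
def signed_area(ring):
    """Shoelace formula for a closed ring: positive if counter-clockwise."""
    return 0.5 * sum(
        x1 * y2 - x2 * y1
        for (x1, y1), (x2, y2) in zip(ring, ring[1:])
    )

def ensure_ccw(ring):
    """Return a closed ring with counter-clockwise vertex order."""
    if ring[0] != ring[-1]:
        ring = ring + [ring[0]]  # close the ring
    return ring if signed_area(ring) > 0 else ring[::-1]

# Illustrative clockwise square in (lon, lat) order
clockwise = [(-120.0, 35.0), (-120.0, 36.0), (-119.0, 36.0), (-119.0, 35.0)]
ccw = ensure_ccw(clockwise)
print(ccw)
```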


Now, let’s convert this granule metadata from JSON to a geopandas dataframe, as before, so that we can plot the granule geometries.

In [11]:
# only keep three columns
gdf = convert_list_gdf(granules)[['GranuleUR', 'size', 'geometry']]

We have stored the granule bounding geometries into the geopandas dataframe `gdf`. The first few rows of the `gdf` dataframe look like the following.

In [12]:
gdf.head()

|   | GranuleUR | size | geometry |
|---|---|---|---|
| 0 | GEDI_L4A_AGB_Density_V2_1.GEDI04_A_20191072247... | 289.630746 | MULTIPOLYGON (((-151.16322 50.32592, -145.6598... |
| 1 | GEDI_L4A_AGB_Density_V2_1.GEDI04_A_20191671651... | 283.851782 | MULTIPOLYGON (((-159.82286 0.09906, -157.57457... |
| 2 | GEDI_L4A_AGB_Density_V2_1.GEDI04_A_20192560545... | 297.366525 | MULTIPOLYGON (((-159.97528 0.17433, -157.74376... |
| 3 | GEDI_L4A_AGB_Density_V2_1.GEDI04_A_20192862347... | 307.557347 | MULTIPOLYGON (((-168.82906 51.77548, -162.8391... |
| 4 | GEDI_L4A_AGB_Density_V2_1.GEDI04_A_20201691534... | 75.301803 | MULTIPOLYGON (((-139.75212 25.53906, -136.9344... |


We can now plot the bounding geometries of the granules (shown with green lines in the figure below) using geopandas. The area of interest is plotted in red.

In [13]:
# plotting granule geometry
m = gdf.explore(color='green',  fillcolor='green')
poly.explore(m=m, color='red',  fill=False)

## 3. Downloading the files

We recommend using `earthaccess` to download GEDI data granules from NASA Earthdata. You will first need to authenticate with your [Earthdata Login (EDL)](https://urs.earthdata.nasa.gov/) credentials using the `earthaccess` Python library as follows:

In [14]:
import earthaccess

# This will pop up a prompt in your terminal / notebook for your Earthdata username & password,
# then save them into ~/.netrc so future calls can go non-interactive.
auth = earthaccess.login(
    strategy="interactive",
    persist=True   # <-- write your creds to ~/.netrc
)

if not auth.authenticated:
    raise RuntimeError("Login failed; check your Earthdata credentials")


In [15]:
# works if the EDL login has already been persisted to a netrc file
auth = earthaccess.login(strategy="netrc") 
if not auth.authenticated:
    # ask for EDL credentials and persist them in a .netrc file
    auth = earthaccess.login(strategy="interactive", persist=True)
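
With `persist=True`, the credentials are written to a netrc file under the host `urs.earthdata.nasa.gov`. A minimal sketch of checking for such an entry with the standard-library `netrc` module; `has_edl_entry` is a hypothetical helper, and the default path `~/.netrc` is an assumption that may differ on Windows:

```python
import netrc
from pathlib import Path

EDL_HOST = "urs.earthdata.nasa.gov"

def has_edl_entry(path=None):
    """Return True if the netrc file at `path` has an entry for Earthdata Login."""
    path = Path(path) if path is not None else Path.home() / ".netrc"
    if not path.exists():
        return False
    try:
        return netrc.netrc(str(path)).authenticators(EDL_HOST) is not None
    except netrc.NetrcParseError:
        return False

print(has_edl_entry())
```

A check like this only confirms that an entry exists; it does not verify that the stored username and password are still valid.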

The following will download all the granules found above (29 in total) into the `Field_Boundary` directory. To download only the first two files, uncomment the first line below instead.

In [16]:
# downloaded_files = earthaccess.download(granules[:2], local_path="test_download")
# downloaded_files = earthaccess.download(granules[:20], local_path="california")
# download all files 
downloaded_files = earthaccess.download(granules, local_path="Field_Boundary")

QUEUEING TASKS | : 100%|███████████████████████████████| 29/29 [00:00<00:00, 4418.75it/s]
PROCESSING TASKS | : 100%|███████████████████████████████| 29/29 [13:23<00:00, 27.72s/it]
COLLECTING RESULTS | : 100%|█████████████████████████████████████| 29/29 [00:00<?, ?it/s]


In [40]:
downloaded_files

['california\\GEDI04_A_2019107224731_O01958_03_T02638_02_002_02_V002.h5',
 'california\\GEDI04_A_2019108154705_O01969_02_T03766_02_002_02_V002.h5',
 'california\\GEDI04_A_2019110154024_O02000_02_T04761_02_002_02_V002.h5',
 'california\\GEDI04_A_2019110215109_O02004_03_T03189_02_002_02_V002.h5',
 'california\\GEDI04_A_2019111145043_O02015_02_T00201_02_002_02_V002.h5',
 'california\\GEDI04_A_2019111210128_O02019_03_T04474_02_002_02_V002.h5',
 'california\\GEDI04_A_2019112140102_O02030_02_T01486_02_002_02_V002.h5',
 'california\\GEDI04_A_2019113131121_O02045_02_T01195_02_002_02_V002.h5',
 'california\\GEDI04_A_2019114135421_O02061_02_T00905_02_002_02_V002.h5',
 'california\\GEDI04_A_2019114200506_O02065_03_T05178_02_002_02_V002.h5',
 'california\\GEDI04_A_2019115130439_O02076_02_T02190_02_002_02_V002.h5',
 'california\\GEDI04_A_2019116121457_O02091_02_T00476_02_002_02_V002.h5',
 'california\\GEDI04_A_2019117125756_O02107_02_T03032_02_002_02_V002.h5',
 'california\\GEDI04_A_2019117190841_O
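
The granule file names appear to encode the acquisition start time as year, day of year, and time of day, followed by the orbit number. A minimal sketch of pulling these out of a name from the list above; the field layout is an assumption based on the names shown, and `parse_gedi_name` is a hypothetical helper:

```python
import re
from datetime import datetime

# GEDI04_A_<YYYYDDDHHMMSS>_O<orbit>_..., as seen in the downloaded file names
NAME_RE = re.compile(r"GEDI04_A_(\d{13})_O(\d{5})_")

def parse_gedi_name(name):
    """Return (start time, orbit number) parsed from a GEDI L4A file name."""
    match = NAME_RE.search(name)
    if match is None:
        raise ValueError(f"unrecognised GEDI L4A file name: {name}")
    start = datetime.strptime(match.group(1), "%Y%j%H%M%S")
    return start, int(match.group(2))

start, orbit = parse_gedi_name(
    "GEDI04_A_2019107224731_O01958_03_T02638_02_002_02_V002.h5"
)
print(start.isoformat(), orbit)
```

Parsing the names this way makes it possible to filter or sort the downloaded files by date without opening them.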

## References
```{bibliography}
:style: plain
:filter: docname in docnames
```