## Front page blurb

Measuring the impact of deforestation is one of the overarching goals of the [Global Ecosystem Dynamics Investigation](https://gedi.umd.edu/mission/mission-overview/) (GEDI) project. Hitching a ride on the International Space Station, GEDI's [LIDAR system](https://gedi.umd.edu/mission/technology/) collects high-quality laser ranging observations of the 3D structure of the Earth's forests. These observations can be used to estimate forest height and vegetation density, and they provide detail on the Earth's carbon cycle. In this blog post, we will explore the GEDI dataset by leveraging IPFS with the `ipfs-stac` Python library.


# Introduction

A common thread with geospatial data is its visual nature. But visualization isn’t limited to imagery: any dataset with a geospatial component can be transformed into a visual format, making complex information easier to grasp. The data we'll be exploring today is stored in the [Hierarchical Data Format](https://en.wikipedia.org/wiki/Hierarchical_Data_Format) (HDF5), a scalable and flexible format for storing a variety of data types and structures, including multidimensional arrays, tables, and metadata.

To give a sense of how much data GEDI collected: each laser on the lidar system fired 242 times per second, illuminating a 25-meter-diameter footprint on the surface. Around 16 billion laser pulses were collected per year, and an estimated 10 billion cloud-free observations were produced during the 24-month mission. In total, the derived data products are ~300 TB in size.

```
"Existing pan-tropical biomass maps use laser data acquired nearly 15 years ago and were based on less than 5 million laser observations in total. GEDI collects 6 million laser observations every day. So over the tropics, we've already collected about two orders of magnitude more data than what was 'state-of-the-art' before."

```

 -- Ralph Dubayah, [GEDI Principal Investigator](https://www.nasa.gov/centers-and-facilities/goddard/nasa-forest-structure-mission-releases-first-data/) and professor of geographical sciences at the University of Maryland (UMD).


# Retrieving GEDI data from IPFS

We'll be exploring the [GEDI L4A](https://cmr.earthdata.nasa.gov/search/concepts/C2237824918-ORNL_CLOUD.html) collection, a data product that has been processed into footprint-level estimates of aboveground biomass density. Let's dive in and see how we can retrieve this data from IPFS using Python!

### Prerequisites

1. Install the [IPFS desktop app](https://docs.ipfs.tech/install/ipfs-desktop/#install-instructions) or the [Kubo CLI client](https://docs.ipfs.tech/install/command-line/). Either one will allow you to start up a local IPFS node on your machine.


2. Install Python version `3.10.x` or higher, which is needed to install our dependencies such as [ipfs-stac](https://pypi.org/project/ipfs-stac/) and [Jupyter Notebook](https://jupyter.org/install#jupyter-notebook).

To get started, save the [libraries](ADD LINK) we'll be using as a text file named `requirements.txt`. I also recommend creating a virtual environment and installing the dependencies into it by running the following commands in the terminal.

```bash
python -m venv .venv
source .venv/bin/activate
pip install -r requirements.txt
```
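
After the install finishes, it can help to confirm that the environment really contains the packages imported later in this post. The following is a minimal sketch using only the standard library. `missing_modules` is a hypothetical helper name, and the module list is an assumption based on the import cell further down rather than on the contents of `requirements.txt`.

```python
from importlib.util import find_spec

# Import names used later in this post (assumed to match requirements.txt).
REQUIRED_MODULES = ["h5py", "geopandas", "shapely", "ipfs_stac", "folium"]


def missing_modules(names):
    """Return the top-level module names that cannot be found."""
    return [name for name in names if find_spec(name) is None]


if __name__ == "__main__":
    missing = missing_modules(REQUIRED_MODULES)
    if missing:
        print("Missing packages:", ", ".join(missing))
    else:
        print("All required packages are importable.")
```

Run it with the virtual environment activated so that it inspects the same interpreter the notebook will use.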

Next, start a Jupyter Notebook session with the following command.

```bash
jupyter notebook
```

Then create a new notebook by clicking the `New` button and selecting `Python 3`.

### Importing the libraries


```python
import json

import folium
import geopandas as gpd
import h5py
from ipfs_stac import client
from shapely.geometry import MultiPolygon, Polygon
```
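
Of these imports, `h5py` is the one that reads HDF5 files. As a quick orientation, the sketch below writes a tiny HDF5 file and then lists what it contains, showing how groups, datasets, and attributes fit together. The group and dataset names (`BEAM0000`, `agbd`, `shot_coords`) are illustrative assumptions, the values are synthetic, and the helper functions are hypothetical.

```python
import os
import tempfile

import h5py
import numpy as np


def write_example_file(path):
    """Write a tiny HDF5 file with one group, two datasets, and attributes."""
    with h5py.File(path, "w") as f:
        f.attrs["description"] = "Synthetic example, not real GEDI data"
        beam = f.create_group("BEAM0000")  # illustrative group name
        agbd = beam.create_dataset("agbd", data=np.array([112.5, 87.1, 203.4]))
        agbd.attrs["units"] = "Mg/ha"  # metadata attached to the dataset
        beam.create_dataset(
            "shot_coords",
            data=np.array([[42.53, -72.19], [42.54, -72.18], [42.55, -72.17]]),
        )


def list_datasets(path):
    """Return {dataset path: shape} for every dataset in the file."""
    found = {}

    def visitor(name, obj):
        # visititems calls this for every group and dataset in the file.
        if isinstance(obj, h5py.Dataset):
            found[name] = obj.shape

    with h5py.File(path, "r") as f:
        f.visititems(visitor)
    return found


example_path = os.path.join(tempfile.mkdtemp(), "example.h5")
write_example_file(example_path)
print(list_datasets(example_path))
```

The same `visititems` walk is a convenient first step when opening an unfamiliar HDF5 file, since it reveals the hierarchy without loading any array data.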


### Creating our client object

The `ipfs-stac` library is how we'll be interacting with the [Easier STAC API](https://stac.easierdata.info) and communicating with the IPFS network via the [Kubo RPC API](https://docs.ipfs.tech/reference/kubo/rpc/). Below are the properties that we can pass in, along with their default values, to establish this connection.

A feature that's been added to the `ipfs-stac` library is the ability to start the daemon if it's not already running. We perform this check as the `client` object is initialized.
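
To illustrate what such a check could look like, here is a minimal sketch that asks the Kubo RPC API for its version and treats a failed connection as "daemon not running". This is not the `ipfs-stac` implementation. `kubo_version` is a hypothetical helper, and the address assumes the API is listening on the default port `5001`.

```python
import json
import urllib.error
import urllib.request


def kubo_version(api_base="http://127.0.0.1:5001", timeout=2.0):
    """Return the Kubo version string, or None if the API is unreachable."""
    # The Kubo RPC API only accepts POST requests.
    request = urllib.request.Request(f"{api_base}/api/v0/version", method="POST")
    try:
        with urllib.request.urlopen(request, timeout=timeout) as response:
            return json.load(response).get("Version")
    except (urllib.error.URLError, OSError, ValueError):
        # Connection refused, timeout, or an unexpected (non-JSON) reply.
        return None


version = kubo_version()
print(f"Kubo daemon version: {version}" if version else "No Kubo daemon reachable")
```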

Let's see what collections are available to us via our client object.


```python
easier_client = client.Web3(
    local_gateway="127.0.0.1",
    gateway_port="8081",
    api_port="5001",
    stac_endpoint="https://stac.easierdata.info",
)

easier_client.collections
```


Just like the [Landsat 9 imagery](https://easierdata.org/notebooks/ndvi_stac_ipfs#How-did-we-set-up-the-STAC-API?), we've also prepared a sample set of GEDI data files, commonly referred to as **"granules"**.

Let's see how many granules are in the collection with the ID `GEDI_L4A_AGB_Density_V2_1_2056.v2.1`.


```python
collection_id = "GEDI_L4A_AGB_Density_V2_1_2056.v2.1"
items = easier_client.searchSTAC(collections=[collection_id])
print(f"The {collection_id} collection has {len(items)} items")
```


### Displaying the results

Let's grab a GeoJSON file from IPFS and display it on a map.

The file `harvard.json` is available at the following CID: `bafkreib5tmwa7qb2qnm2zqgklsnesjlt4w7uxwbqvbqz7se54t7kxceuu4`
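
The string starting with `bafk` is a content identifier (CID). As a rough illustration of what it encodes, the sketch below decodes a base32 CIDv1 with the standard library. `decode_cid` is a hypothetical helper, and it assumes every header field fits in a single byte. That holds for this CID, but it makes the function a teaching aid rather than a general-purpose parser.

```python
import base64


def decode_cid(cid):
    """Split a base32 CIDv1 string into version, codec, hash code, and digest."""
    if not cid.startswith("b"):
        raise ValueError("expected a base32 CID (multibase prefix 'b')")
    body = cid[1:].upper()
    body += "=" * (-len(body) % 8)  # restore the padding that CIDs omit
    raw = base64.b32decode(body)
    version, codec, hash_code, digest_len = raw[0], raw[1], raw[2], raw[3]
    digest = raw[4:]
    if len(digest) != digest_len:
        raise ValueError("digest length does not match the CID header")
    return {"version": version, "codec": codec, "hash_code": hash_code, "digest": digest}


info = decode_cid("bafkreib5tmwa7qb2qnm2zqgklsnesjlt4w7uxwbqvbqz7se54t7kxceuu4")
print(info["version"], hex(info["codec"]), hex(info["hash_code"]), len(info["digest"]))
# 1 0x55 0x12 32
```

Codec `0x55` is the multicodec code for raw bytes and `0x12` is SHA-256. For a file small enough to be stored as a single raw block, the digest should therefore equal the SHA-256 hash of the file's bytes, which is what lets any node verify the content it retrieves.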

```python
geojson_cid = "bafkreib5tmwa7qb2qnm2zqgklsnesjlt4w7uxwbqvbqz7se54t7kxceuu4"
geojson_result = easier_client.getFromCID(geojson_cid)

# Parse the response as JSON so that geopandas can read it
geojson = json.loads(geojson_result)
geojson_layer = gpd.GeoDataFrame.from_features(geojson["features"], crs="epsg:4326")

# Create the map and fit it to the bounds of the GeoJSON layer.
# total_bounds is (minx, miny, maxx, maxy); folium expects [lat, lon] pairs.
m = folium.Map(scrollWheelZoom=False)
bounds = geojson_layer.total_bounds
m.fit_bounds([[bounds[1], bounds[0]], [bounds[3], bounds[2]]])

# Add the GeoJSON retrieved from IPFS as the first layer
harvard_forests = folium.GeoJson(
    data=geojson,
    name="Geojson from IPFS",
    style_function=lambda feature: {
        "fillColor": "blue",
        "color": "blue",
        "weight": 2,
        "fillOpacity": 0.5,
    },
)

harvard_forests.add_to(m)

m
```
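
The `fit_bounds` call reorders the values because `total_bounds` returns `(minx, miny, maxx, maxy)`, with longitude first, while folium expects `[[south, west], [north, east]]`, with latitude first. The same conversion can be sketched without geopandas. `iter_positions`, `geojson_bounds`, and `to_folium_bounds` are hypothetical helpers, the polygon is made up for illustration, and geometries without a `coordinates` member (such as a `GeometryCollection`) are not handled.

```python
def iter_positions(coords):
    """Yield (lon, lat) pairs from arbitrarily nested GeoJSON coordinates."""
    if coords and isinstance(coords[0], (int, float)):
        yield coords[0], coords[1]
    else:
        for part in coords:
            yield from iter_positions(part)


def geojson_bounds(feature_collection):
    """Return (minx, miny, maxx, maxy) over all features, like total_bounds."""
    lons, lats = [], []
    for feature in feature_collection["features"]:
        for lon, lat in iter_positions(feature["geometry"]["coordinates"]):
            lons.append(lon)
            lats.append(lat)
    return min(lons), min(lats), max(lons), max(lats)


def to_folium_bounds(bounds):
    """Convert (minx, miny, maxx, maxy) to [[south, west], [north, east]]."""
    minx, miny, maxx, maxy = bounds
    return [[miny, minx], [maxy, maxx]]


example = {
    "type": "FeatureCollection",
    "features": [{
        "type": "Feature",
        "properties": {},
        "geometry": {
            "type": "Polygon",
            "coordinates": [[[-72.3, 42.4], [-72.1, 42.4], [-72.1, 42.6],
                             [-72.3, 42.6], [-72.3, 42.4]]],
        },
    }],
}
print(to_folium_bounds(geojson_bounds(example)))
# [[42.4, -72.3], [42.6, -72.1]]
```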


Next, we'll query the granules that intersect the GeoJSON geometry that was retrieved from IPFS.


```python
# Query the STAC server for granules that intersect the first GeoJSON feature
granules = easier_client.searchSTAC(
    intersects=geojson["features"][0]["geometry"], collections=[collection_id]
)

print(f"Found {len(granules)} granules that intersect the geojson")

# Wrap the matching items in a FeatureCollection so folium can draw them
granules_dict = [granule.to_dict() for granule in granules]
granules_geojson = {"type": "FeatureCollection", "features": granules_dict}

# Add as a layer to the map
granules_layer = folium.GeoJson(
    data=granules_geojson,
    name="Granules",
    style_function=lambda feature: {
        "fillColor": "red",  # Fill color
        "color": "black",  # Border color
        "weight": 5,  # Border width
        "opacity": 0.1,  # Border opacity
        "fillOpacity": 0.01,  # Fill opacity
    },
)

granules_layer.add_to(m)

folium.LayerControl(collapsed=False).add_to(m)

m
```
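
In STAC API terms, an `intersects` query like the one above corresponds to an item search. As a sketch of the equivalent raw request, assuming the endpoint implements the standard `POST /search` route, the request body can be built and sent with the standard library alone. `build_search_body` and `post_search` are hypothetical helpers that are not part of `ipfs-stac`, and the polygon is made up for illustration.

```python
import json
import urllib.request


def build_search_body(collection_id, geometry, limit=100):
    """Build the JSON body for a STAC API item search."""
    return {"collections": [collection_id], "intersects": geometry, "limit": limit}


def post_search(stac_endpoint, body, timeout=30):
    """POST the search body and return the response parsed as a dict."""
    request = urllib.request.Request(
        f"{stac_endpoint.rstrip('/')}/search",
        data=json.dumps(body).encode("utf-8"),
        headers={"Content-Type": "application/json"},
        method="POST",
    )
    with urllib.request.urlopen(request, timeout=timeout) as response:
        return json.load(response)


# Made-up polygon used only to show the shape of the request body.
area = {
    "type": "Polygon",
    "coordinates": [[[-72.3, 42.4], [-72.1, 42.4], [-72.1, 42.6],
                     [-72.3, 42.6], [-72.3, 42.4]]],
}
body = build_search_body("GEDI_L4A_AGB_Density_V2_1_2056.v2.1", area)
print(json.dumps(body, indent=2))
```

Calling `post_search("https://stac.easierdata.info", body)` should then return a GeoJSON `FeatureCollection` whose `features` are the matching items, although large result sets may be split across pages.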
