## Run this notebook

You can launch this notebook in the US GHG Center JupyterHub by clicking the link below.

[Launch in the US GHG Center JupyterHub (requires access)](https://hub.ghg.center/hub/user-redirect/git-pull?repo=https%3A%2F%2Fgithub.com%2FUS-GHG-Center%2Fghgc-docs&urlpath=lab%2Ftree%2Fghgc-docs%2Fuser_data_notebooks%2Fcasagfed-carbonflux-monthgrid-v3_User_Notebook.ipynb&branch=main)
   

## Approach

1. Identify available dates and temporal frequency of observations for a given collection using the GHGC API `/stac` endpoint. The collection processed in this notebook is the Land-Atmosphere Carbon Flux data product.
2. Pass the STAC item into the raster API `/collections/{collection_id}/items/{item_id}/tilejson.json` endpoint.
3. Using `folium.plugins.DualMap`, visualize two tiles (side-by-side), allowing time point comparison.
4. After the visualization, perform zonal statistics for a given polygon.
   

## About the Data
#### CASA-GFED3

This dataset presents a variety of carbon flux parameters derived from the Carnegie-Ames-Stanford-Approach – Global Fire Emissions Database version 3 (CASA-GFED3) model. The model’s input data includes air temperature, precipitation, incident solar radiation, a soil classification map, and a number of satellite-derived products. All model calculations are driven by analyzed meteorological data from NASA’s Modern-Era Retrospective analysis for Research and Applications, Version 2 (MERRA-2). The resulting product provides monthly, global data at 0.5 degree resolution from January 2003 through December 2017. It includes the following carbon flux variables, expressed in kilograms of carbon per square meter per month (kg Carbon/m²/mon): net primary production (NPP), net ecosystem exchange (NEE), heterotrophic respiration (Rh), wildfire emissions (FIRE), and fuel wood burning emissions (FUEL). This product and earlier versions of MERRA-driven CASA-GFED carbon fluxes have been used in a number of atmospheric CO₂ transport studies, and through the support of NASA’s Carbon Monitoring System (CMS), it helps characterize, quantify, understand and predict the evolution of global carbon sources and sinks.

## Terminology
Navigating data via the GHGC API, you will encounter terminology that is different from browsing in a typical filesystem. We'll define some terms here which are used throughout this notebook.
- `catalog`:    All datasets available at the `/stac` endpoint
- `collection`: A specific dataset, e.g. CarbonTracker-CH₄ Isotopic Methane Inverse Fluxes
- `item`:       One granule in the dataset, e.g. one monthly file of methane inverse fluxes
- `asset`:      A variable available within the granule, e.g. microbial, fossil, or pyrogenic methane fluxes
- `STAC API`:   **S**patio**T**emporal **A**sset **C**atalog API - Endpoint for fetching metadata about available datasets
- `Raster API`: Endpoint for fetching data itself, for imagery and statistics

# Installing the Required Libraries
Required libraries are pre-installed on the GHG Center Hub. If you need to run this notebook elsewhere, please install them with this line in a code cell:

%pip install requests folium rasterstats pystac_client pandas matplotlib

In [1]:
# Import the following libraries
# For fetching from the Raster API
import requests
# For making maps
import folium
import folium.plugins
from folium import Map, TileLayer
# For talking to the STAC API
from pystac_client import Client
# For working with data
import pandas as pd
# For making time series
import matplotlib.pyplot as plt
# For formatting date/time data
import datetime
# Custom functions for working with GHGC data via the API
import ghgc_utils

# Query the STAC API
Now, you must fetch the dataset from the [**STAC API**](https://earth.gov/ghgcenter/api/stac/) by defining its associated STAC API collection ID as a variable. 
The collection ID, also known as the **collection name**, for the CASA-GFED Land-Atmosphere Carbon Flux dataset is [**casagfed-carbonflux-monthgrid-v3**](https://earth.gov/ghgcenter/api/stac/collections/casagfed-carbonflux-monthgrid-v3).*

**You can find the collection name of any dataset on the GHGC data portal by navigating to the dataset landing page within the data catalog. The collection name is the last portion of the dataset landing page's URL, and is also listed in the pop-up box after clicking "ACCESS DATA."*

In [2]:
# Provide STAC and RASTER API endpoints
STAC_API_URL = "https://earth.gov/ghgcenter/api/stac"
RASTER_API_URL = "https://earth.gov/ghgcenter/api/raster"

# Name of the collection for CASA-GFED Land-Atmosphere Carbon Flux monthly emissions.
collection_name = "casagfed-carbonflux-monthgrid-v3"

In [3]:
catalog = Client.open(STAC_API_URL)
collection = catalog.get_collection(collection_name)
collection

In [4]:
items = list(collection.get_items())  # Convert the iterator to a list
print(f"Found {len(items)} items")

Found 180 items


In [5]:
search = catalog.search(
    collections=collection_name,
    datetime=['2010-01-01T00:00:00Z','2010-12-31T00:00:00Z']
)
# Take a look at the items we found
for item in search.item_collection():
    print(item)
   

<Item id=casagfed-carbonflux-monthgrid-v3-201012>
<Item id=casagfed-carbonflux-monthgrid-v3-201011>
<Item id=casagfed-carbonflux-monthgrid-v3-201010>
<Item id=casagfed-carbonflux-monthgrid-v3-201009>
<Item id=casagfed-carbonflux-monthgrid-v3-201008>
<Item id=casagfed-carbonflux-monthgrid-v3-201007>
<Item id=casagfed-carbonflux-monthgrid-v3-201006>
<Item id=casagfed-carbonflux-monthgrid-v3-201005>
<Item id=casagfed-carbonflux-monthgrid-v3-201004>
<Item id=casagfed-carbonflux-monthgrid-v3-201003>
<Item id=casagfed-carbonflux-monthgrid-v3-201002>
<Item id=casagfed-carbonflux-monthgrid-v3-201001>


Examining the contents of our `collection` under the `temporal` variable, we see that the data is available from January 2003 to December 2017. By looking at `dashboard:time_density`, we observe that these observations are available at a monthly frequency.
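As a quick consistency check, the temporal extent and the monthly frequency together imply how many items the collection should hold. The sketch below uses a plain dictionary as a stand-in for the collection's STAC JSON; the two timestamps are assumptions chosen to mirror the January 2003 to December 2017 range described above, and `parse_utc` is a hypothetical helper.

```python
from datetime import datetime

# Stand-in for the collection's STAC JSON. The timestamps are assumptions
# that mirror the January 2003 to December 2017 range described above.
collection_json = {
    "extent": {
        "temporal": {
            "interval": [["2003-01-01T00:00:00Z", "2017-12-31T00:00:00Z"]]
        }
    }
}

def parse_utc(timestamp):
    """Parse an ISO 8601 timestamp that uses the 'Z' suffix for UTC."""
    return datetime.fromisoformat(timestamp.replace("Z", "+00:00"))

start_text, end_text = collection_json["extent"]["temporal"]["interval"][0]
start, end = parse_utc(start_text), parse_utc(end_text)

# Count the calendar months covered, including both the first and last month
expected_items = (end.year - start.year) * 12 + (end.month - start.month) + 1
print(expected_items)
```

Under these assumptions the count is 180, which agrees with the number of items found earlier.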

In [6]:
# Examine the first item in the collection
items[0]

In [7]:
# Restructure the items into a dictionary where keys are derived from the datetime items; we can then query more easily by date/time, e.g. "2003-07"
items_dict = {item.properties["start_datetime"][:7]: item for item in collection.get_items()}
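To see why slicing `start_datetime` yields keys such as "2010-01", here is a minimal sketch that uses plain dictionaries in place of `pystac` items. The IDs follow the pattern printed earlier; the exact timestamp format is an assumption.

```python
# Stand-in items: plain dicts instead of pystac Item objects.
# The timestamp format shown here is an assumption.
mock_items = [
    {"id": "casagfed-carbonflux-monthgrid-v3-201002",
     "properties": {"start_datetime": "2010-02-01T00:00:00+00:00"}},
    {"id": "casagfed-carbonflux-monthgrid-v3-201001",
     "properties": {"start_datetime": "2010-01-01T00:00:00+00:00"}},
]

# The first seven characters of an ISO 8601 timestamp are "YYYY-MM"
mock_items_dict = {
    item["properties"]["start_datetime"][:7]: item for item in mock_items
}

print(sorted(mock_items_dict))
print(mock_items_dict["2010-01"]["id"])
```

Because dictionary keys are unique, this lookup relies on there being exactly one item per month; with a finer time step, later items would silently overwrite earlier ones that share a key.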

In [8]:
#  Before we go further, let's pick which asset to focus on for the remainder of the notebook.
# 'npp' = net primary production
asset_name = "npp"

# Creating Maps using Folium
We will explore differences in net primary production (NPP) from the Land-Atmosphere Carbon Flux dataset between two different dates/times. We'll then visualize the outputs on a map using `folium`.

## Fetch Imagery Using the Raster API
Here we get information from the Raster API which we will add to our map in the next section.

In [9]:
# Specify which two date/times you would like to visualize, using the format of items_dict.keys()
dates = ["2010-01","2003-01"]

Below, we use some statistics of the raster data to set upper and lower limits for our color bar. These are saved as the `rescale_values`, and will be passed to the Raster API in the following step(s).

In [10]:
# Extract collection name and item ID for the first date
first_date = items_dict[dates[0]]
collection_id = first_date.collection_id
item_id = first_date.id
# Select relevant asset (NPP)
# (named 'asset' to avoid shadowing Python's built-in 'object')
asset = first_date.assets[asset_name]
raster_bands = asset.extra_fields.get("raster:bands", [{}])
# Print the raster bands' information
raster_bands 

[{'scale': 1.0,
  'offset': 0.0,
  'sampling': 'area',
  'data_type': 'float32',
  'histogram': {'max': 0.23026999831199646,
   'min': 0.0,
   'count': 11.0,
   'buckets': [244259.0,
    3221.0,
    2076.0,
    2056.0,
    3230.0,
    3103.0,
    996.0,
    195.0,
    53.0,
    11.0]},
  'statistics': {'mean': 0.00534185953438282,
   'stddev': 0.022541318088769913,
   'maximum': 0.23026999831199646,
   'minimum': 0.0,
   'valid_percent': 0.0003858024691358025}}]

In [11]:
# Use the band's minimum and maximum to set the color bar range.
rescale_values = {
    # Alternative upper limit that is less sensitive to outliers:
    #"max": raster_bands[0]['statistics']['mean'] + 4*raster_bands[0]['statistics']['stddev'],
    "max": raster_bands[0]['statistics']['maximum'],
    "min": raster_bands[0]['statistics']['minimum'],
}

print(rescale_values)

{'max': 0.23026999831199646, 'min': 0.0}
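The commented-out alternative in the cell above sets the upper limit from the mean and standard deviation rather than the maximum. The sketch below works through that arithmetic with the statistics printed earlier. The helper name `rescale_from_stats` is hypothetical, and capping the result at the data maximum is an added assumption, not something the notebook does.

```python
# Statistics copied from the `raster_bands` output printed above
stats = {
    "mean": 0.00534185953438282,
    "stddev": 0.022541318088769913,
    "maximum": 0.23026999831199646,
    "minimum": 0.0,
}

def rescale_from_stats(stats, n_sigma=4):
    """Return color limits of mean + n_sigma * stddev, capped at the data maximum."""
    upper = stats["mean"] + n_sigma * stats["stddev"]
    return {"min": stats["minimum"], "max": min(upper, stats["maximum"])}

alt_rescale_values = rescale_from_stats(stats)
print(alt_rescale_values)
```

With these numbers the upper limit comes out near 0.0955, well below the maximum of about 0.230. A tighter range like this gives more color contrast at typical values, at the cost of saturating the highest ones.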


Now, you will pass the `item id`, `collection name`, `asset name`, and the `rescale values` to the Raster API endpoint, along with a colormap. This step is done twice, once for each date/time you will visualize. It tells the Raster API which collection, item, and asset you want to view, and which colormap and colorbar range to use for visualization. The API returns a JSON with information about the requested image. Each image will be referred to as a tile.
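The request described above is an ordinary URL with a query string. As a minimal sketch, the hypothetical helper `build_tilejson_url` below assembles that URL with the standard library so the parameters are percent-encoded consistently; it is an illustration, not part of `ghgc_utils` or the Raster API. The item ID follows the pattern printed earlier, and the rescale limits are the ones computed above.

```python
from urllib.parse import urlencode

def build_tilejson_url(raster_api_url, collection_id, item_id, asset,
                       colormap, vmin, vmax):
    """Assemble a tilejson.json request URL with an encoded query string."""
    query = urlencode({
        "assets": asset,
        "color_formula": "gamma r 1.05",   # each space is encoded as "+"
        "colormap_name": colormap,
        "rescale": f"{vmin},{vmax}",       # the comma is encoded as "%2C"
    })
    return (f"{raster_api_url}/collections/{collection_id}"
            f"/items/{item_id}/tilejson.json?{query}")

url = build_tilejson_url(
    "https://earth.gov/ghgcenter/api/raster",
    "casagfed-carbonflux-monthgrid-v3",
    "casagfed-carbonflux-monthgrid-v3-201001",
    asset="npp", colormap="viridis", vmin=0.0, vmax=0.23026999831199646,
)
print(url)
```

The resulting string can be passed to `requests.get(url).json()`. Passing the same dictionary through the `params` argument of `requests.get` is another way to get equivalent encoding.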

In [12]:
# Choose a color for displaying the data
# For more information on Colormaps in Matplotlib, please visit https://matplotlib.org/stable/users/explain/colors/colormaps.html
color_map = "viridis"

In [13]:
# Make a GET request to retrieve information for your first date/time
date_1_tile = requests.get(
    f"{RASTER_API_URL}/collections/{collection_id}/items/{item_id}/tilejson.json?"
    f"&assets={asset_name}"
    f"&color_formula=gamma+r+1.05&colormap_name={color_map}"
    f"&rescale={rescale_values['min']},{rescale_values['max']}"
).json()

# Print the properties of the retrieved granule to the console
date_1_tile

{'tilejson': '2.2.0',
 'version': '1.0.0',
 'scheme': 'xyz',
 'tiles': ['https://earth.gov/ghgcenter/api/raster/collections/casagfed-carbonflux-monthgrid-v3/items/casagfed-carbonflux-monthgrid-v3-201001/tiles/WebMercatorQuad/{z}/{x}/{y}@1x?assets=npp&color_formula=gamma+r+1.05&colormap_name=viridis&rescale=0.0%2C0.23026999831199646'],
 'minzoom': 0,
 'maxzoom': 24,
 'bounds': [-180.0, -90.0, 180.0, 90.0],
 'center': [0.0, 0.0, 0]}

In [14]:
# Repeat the above for your second date/time
# Note that we do not calculate new rescale_values for this tile, because we want tiles 1 and 2 to have the same colorbar range for best visual comparison.
second_date = items_dict[dates[1]]
# Extract collection name and item ID
collection_id = second_date.collection_id
item_id = second_date.id


date_2_tile = requests.get(
    f"{RASTER_API_URL}/collections/{collection_id}/items/{item_id}/tilejson.json?"
    f"&assets={asset_name}"
    f"&color_formula=gamma+r+1.05&colormap_name={color_map}"
    f"&rescale={rescale_values['min']},{rescale_values['max']}"
).json()

date_2_tile

{'tilejson': '2.2.0',
 'version': '1.0.0',
 'scheme': 'xyz',
 'tiles': ['https://earth.gov/ghgcenter/api/raster/collections/casagfed-carbonflux-monthgrid-v3/items/casagfed-carbonflux-monthgrid-v3-200301/tiles/WebMercatorQuad/{z}/{x}/{y}@1x?assets=npp&color_formula=gamma+r+1.05&colormap_name=viridis&rescale=0.0%2C0.23026999831199646'],
 'minzoom': 0,
 'maxzoom': 24,
 'bounds': [-180.0, -90.0, 180.0, 90.0],
 'center': [0.0, 0.0, 0]}

## Generate Map
First, we'll define the Area of Interest (AOI) as a GeoJSON. This will be visualized as a polygon outline on the map.

In [15]:
# The AOI is currently set to the Amazon rainforest in South America
aoi = {
    "type": "Feature",
    "properties": {},
    "geometry": {
        "coordinates": [
            [
                # [longitude, latitude]
                [-74.0, -3.0],   # Southwest Bounding Coordinate
                [-74.0, 5.0],    # Northwest Bounding Coordinate
                [-60.0, 5.0],    # Northeast Bounding Coordinate
                [-60.0, -3.0],   # Southeast Bounding Coordinate
                [-74.0, -3.0]    # Closing the polygon at the Southwest Bounding Coordinate
            ]
        ],
        "type": "Polygon",
    },
}
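GeoJSON coordinates are ordered `[longitude, latitude]`, while `folium` locations are `(latitude, longitude)`, which is easy to mix up. The sketch below, using a hypothetical helper `ring_bounds`, checks that the ring is closed and derives its bounding box and center, using the same coordinates as the AOI above.

```python
def ring_bounds(ring):
    """Return (west, south, east, north) for a closed GeoJSON linear ring."""
    if ring[0] != ring[-1]:
        raise ValueError("A GeoJSON polygon ring must end where it starts")
    lons = [lon for lon, lat in ring]
    lats = [lat for lon, lat in ring]
    return min(lons), min(lats), max(lons), max(lats)

# Same ring as the AOI above, in [longitude, latitude] order
ring = [[-74.0, -3.0], [-74.0, 5.0], [-60.0, 5.0], [-60.0, -3.0], [-74.0, -3.0]]

west, south, east, north = ring_bounds(ring)

# folium expects (latitude, longitude), the reverse of GeoJSON order
center_lat_lon = ((south + north) / 2, (west + east) / 2)
print((west, south, east, north), center_lat_lon)
```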


In [16]:
# Initialize the map, specifying the center of the map and the starting zoom level.
# 'folium.plugins' allows mapping side-by-side via 'DualMap'
# Map is centered on the position specified by "location=(lat,lon)"
map_ = folium.plugins.DualMap(location=(0, -66), zoom_start=5)

# Define the first map layer using the tile fetched for the first date
# The TileLayer class helps in manipulating and displaying raster layers on a map
map_layer_1 = TileLayer(
    tiles=date_1_tile["tiles"][0],
    attr="GHG",
    opacity=0.8,
    name=f"{dates[0]} NPP",
    overlay= True,
    legendEnabled = True
)
# Add the first layer to the Dual Map
# This will appear on the left side, specified by 'm1'
map_layer_1.add_to(map_.m1)

# Define the second map layer using the tile fetched for the second date
map_layer_2 = TileLayer(
    tiles=date_2_tile["tiles"][0],
    attr="GHG",
    opacity=0.8,
    name=f"{dates[1]} NPP",
    overlay= True,
    legendEnabled = True
)
# Add the second layer to the Dual Map
# This will appear on the right side, specified by 'm2'
map_layer_2.add_to(map_.m2)


# Add data markers to both sides of map
folium.Marker((0, -66), tooltip="Amazon Rainforest").add_to(map_)
# Add AOI to both sides of map
folium.GeoJson(aoi, name="Amazon Rainforest, South America",
        style_function=lambda feature: {
        "fillColor": "none",
    }).add_to(map_)
# Add controls to turn different elements on/off, for both sides of map
folium.LayerControl(collapsed=False).add_to(map_)


# Add a colorbar
# For this, use one of our custom 'ghgc_utils' functions to create an HTML colorbar representation.
legend_html = ghgc_utils.generate_html_colorbar(color_map, rescale_values, label='NPP (kg Carbon/m2/month)')
# Now add colorbar to the map
map_.get_root().html.add_child(folium.Element(legend_html))

# Visualizing the map
map_

# Calculating Zonal Statistics

To perform zonal statistics, first we need to create a polygon. In this case we are focusing on an Area of Interest (AOI) in the Amazon Rainforest, South America.

In [17]:
# Give the AOI a name to be used in your time series plot later on.
aoi_name = 'Amazon Rainforest'
# The AOI is defined as a GeoJSON
aoi = {
    "type": "Feature",
    "properties": {},
    "geometry": {
        "coordinates": [
            [
                # [longitude, latitude]
                [-74.0, -3.0],   # Southwest Bounding Coordinate
                [-74.0, 5.0],    # Northwest Bounding Coordinate
                [-60.0, 5.0],    # Northeast Bounding Coordinate
                [-60.0, -3.0],   # Southeast Bounding Coordinate
                [-74.0, -3.0]    # Closing the polygon at the Southwest Bounding Coordinate
            ]
        ],
        "type": "Polygon",
    },
}


In [None]:
# Use one of the custom 'ghgc_utils' functions to generate statistics over your AOI using the Raster API
# This step may take a minute.
df = ghgc_utils.generate_stats(items,aoi,url=RASTER_API_URL,asset=asset_name)
# Print the first 5 lines of our statistics
df.head(5)

Generating stats...
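`ghgc_utils.generate_stats` is a project-specific helper whose internals are not shown in this notebook. As a conceptual sketch only, the code below illustrates what "zonal statistics" means: summarizing the grid cells that fall inside a zone. The cell values are made up, the zone is simplified to the AOI's bounding box, and `zonal_stats` is a hypothetical function, not the helper's actual implementation.

```python
from statistics import mean

# Synthetic 0.5-degree cells as (lon_center, lat_center, value); values are made up
cells = [
    (-73.75, -2.75, 0.10),
    (-70.25, 1.25, 0.14),
    (-60.25, 4.75, 0.12),
    (-59.75, 4.75, 0.30),   # just east of the AOI, excluded
    (-74.25, 0.25, 0.02),   # just west of the AOI, excluded
]

def zonal_stats(cells, west, south, east, north):
    """Summarize values of cells whose centers fall inside a bounding box."""
    inside = [value for lon, lat, value in cells
              if west <= lon <= east and south <= lat <= north]
    return {"count": len(inside), "min": min(inside),
            "max": max(inside), "mean": mean(inside)}

zone_summary = zonal_stats(cells, west=-74.0, south=-3.0, east=-60.0, north=5.0)
print(zone_summary)
```

Only the three cells whose centers lie within the box contribute, so the high value just outside the eastern edge does not affect the maximum.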


## Visualizing the Data as a Time Series
We can now explore the NPP time series for the Amazon Rainforest, South America area. We can plot the data set using the code below:

In [None]:
# Plot data
fig = plt.figure(figsize=(10,5))  # Set the size of the figure

# Change "which_stat" if you would rather look at something like mean, median, or standard deviation.
which_stat = 'max'

plt.plot(
    df["date"][:],
    df[which_stat],
    color="purple",
    linestyle="-",
    linewidth=1.5,
)

# Add x labels at desired positions (for example, every 6 months)
plt.xticks(
    df["date"][::6],  
    rotation=45,  # Rotate labels to avoid overlap
    ha="right"
)

# Labels and title
plt.xlabel("Month")
plt.ylabel("kg Carbon/m2/month")
plt.title(f"{which_stat.capitalize()} Monthly {asset_name.upper()} for {aoi_name} (2003-2017)")

# Add data citation
plt.text(
    df["date"][:].min(),                         # X-coordinate of the text (first datetime value)
    df[which_stat].min(),                  # Y-coordinate of the text (minimum value of the chosen statistic)

    # Text to be displayed
    f"Source: {collection.title}",                   
    fontsize=9,                             # Font size
    horizontalalignment="left",              # Horizontal alignment
    verticalalignment="bottom",              # Vertical alignment
    color="blue",                            # Text color
)

plt.show()

In [None]:
# Fetch the third granule in the collection and set the color scheme and rescale values. 
n = 2
october_tile = requests.get(
    f"{RASTER_API_URL}/collections/{collection_name}/items/{items[n].id}/tilejson.json?"
    f"&assets={asset_name}"
    f"&color_formula=gamma+r+1.05&colormap_name={color_map}"
    f"&rescale={rescale_values['min']},{rescale_values['max']}",
).json()
october_tile

In [None]:
# Map the NPP level for the Congo area for the chosen tile
aoi_map_bbox = Map(
    tiles="OpenStreetMap",
    location=[
        1, # latitude
        17, # longitude
    ],
    zoom_start=5,
)

map_layer = TileLayer(
    tiles=october_tile["tiles"][0],
    attr="GHG",
    opacity=0.7,
    name=f"{items[n].properties['start_datetime'][0:7]} {asset_name}",
    overlay=True,
    legendEnabled=True,
)

map_layer.add_to(aoi_map_bbox)

# Display data marker (title) on the map
folium.Marker((1,17), tooltip="Congo Basin").add_to(aoi_map_bbox)
folium.LayerControl(collapsed=False).add_to(aoi_map_bbox)

# Add a colorbar
# For this, use one of the custom 'ghgc_utils' functions to create an HTML colorbar representation.
legend_html = ghgc_utils.generate_html_colorbar(color_map, rescale_values, label='NPP (kg Carbon/m2/month)')

# Add colorbar to the map
aoi_map_bbox.get_root().html.add_child(folium.Element(legend_html))

aoi_map_bbox

## Summary

In this notebook we have successfully completed the following steps for the STAC collection for CASA-GFED Land-Atmosphere Carbon Flux data:
1. Install and import the necessary libraries
2. Fetch the collection from STAC collections using the appropriate endpoints
3. Count the number of existing granules within the collection
4. Map and compare the Net Primary Production (NPP) levels over the Amazon Rainforest, South America area for two different dates
5. Create a table that displays the minimum, maximum, and sum of the NPP values for a specified region
6. Generate a time-series graph of the NPP values for a specified region

If you have any questions regarding this user notebook, please contact us using the [feedback form](https://docs.google.com/forms/d/e/1FAIpQLSeVWCrnca08Gt_qoWYjTo6gnj1BEGL4NCUC9VEiQnXA02gzVQ/viewform). 