<img src="https://raw.githubusercontent.com/Harmonize-Brazil/code-gallery/main/img/INPE_logo.png" align="left" style="height: 105px" height="105"/>
<!-- https://www.gov.br/mcti/pt-br/composicao/rede-mcti/instituto-nacional-de-pesquisas-espaciais -->
<img src="https://earth.bsc.es/harmonize/lib/exe/fetch.php?h=250&crop=0&tok=cfb750&media=wiki:logo.png" align="right" style="height: 90px" height="90"/>

<h1 style="color:#336699; text-align: center">Health and Climate data</h1>
<h3 style="color:#336699; text-align: center">Access using the BDC-STAC service</h3>
<hr style="border:2px solid #0077b9;">
<!-- <hr style="border:2px solid #274ad4;"> -->
<br/>

<div style="text-align: center;font-size: 90%;">
    <!-- <a href="https://colab.research.google.com/github/Harmonize-Brazil/code-gallery/blob/main/jupyter/events/2025-Infodengue-Harmonize_INPE/" target = "_blank"> <img src="https://colab.research.google.com/assets/colab-badge.svg" alt="Colab"> </a> -->
    <a href="https://nbviewer.jupyter.org/github/Harmonize-Brazil/code-gallery/blob/main/jupyter/events/2025-Infodengue-Harmonize_INPE/health_climate_using_STAC.ipynb"><img src="https://raw.githubusercontent.com/jupyter/design/master/logos/Badges/nbviewer_badge.svg" ></a> <!--align="center"-->
    <br/><br/>
    <!-- Yuri -->
    Yuri Domaradzki Moreira Nunes <sup><a href="https://orcid.org/0009-0007-2829-4345" target="_blank" rel="noopener noreferrer"><img src="https://orcid.filecamp.com/static/thumbs/folders/qLJ1tuei4m6ugC3g.png" width="16" alt="ORCID iD" style="vertical-align: text-bottom;"/></a></sup>,
    <!-- Adeline  -->
    Adeline Marinho Maciel <sup><a href="https://orcid.org/0000-0002-1467-6488" target="_blank" rel="noopener noreferrer"><img src="https://orcid.filecamp.com/static/thumbs/folders/qLJ1tuei4m6ugC3g.png" width="16" alt="ORCID iD" style="vertical-align: text-bottom;"/></a></sup>
    <br/><br/>
    Earth Observation and Geoinformatics Division, National Institute for Space Research (INPE)
    <br/>
    Avenida dos Astronautas, 1758, Jardim da Granja, São José dos Campos, SP 12227-010, Brazil
    <br/><br/>
    Contact: <a href="mailto:yuridomaradzki@gmail.com">yuridomaradzki@gmail.com</a>;
    <a href="mailto:adelinemaciel22@gmail.com">adelinemaciel22@gmail.com</a>
    <br/><br/>
    Last Update: June 9, 2025
</div>    
<br/><br/>
<div style="width: 60%; margin: auto">
    <div style="text-align: center; border-style: solid; border-color: #0077b9; border-width: 1px; padding: 10px;">
        <b>Abstract.</b> This Jupyter Notebook gives an overview of how to use the BDC-STAC service implementation to discover and access the health and climate data products from the Earth Observation Data Cubes tuned for Health Response Systems (EODCtHRS), a <a href="https://harmonize-tools.org" target="_blank">HARMONIZE project</a> component. This notebook was adapted from <a href="https://github.com/brazil-data-cube/code-gallery/blob/master/jupyter/Python/stac/stac-introduction.ipynb" target="_blank">Introduction to the SpatioTemporal Asset Catalog (STAC)</a>, available in the GitHub code gallery of the <a href="https://data.inpe.br/bdc/web" target="_blank">Brazil Data Cube (BDC)</a> project.
    </div>
</div>
<br><br>

# Setting up a virtual environment for Jupyter Notebook
<hr style="border:1px solid #0077b9;">

To run the examples outside Google Colab using `pip`, use the following commands:

In [None]:

# # 1- Create a virtual environment named 'venv'
!python3 -m venv venv

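# # 2- Activate the virtual environment
# # Note: each `!` command runs in its own subshell, so the activation does not
# # persist across lines of a notebook cell; prefer running steps 1-5 in a terminal.
# # For Linux/macOS:
!source venv/bin/activate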

# # For Windows (PowerShell):
# #!.\venv\Scripts\Activate.ps1

# # 3- Upgrade pip
!pip3 install --upgrade pip

# # 4- Install required libraries
!pip3 install ipykernel jupyter

# # 5- Add the virtual environment to Jupyter kernels
!python -m ipykernel install --user --name=venv --display-name "Python (venv)"
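
After selecting the new kernel, it can help to confirm that the notebook really runs inside the virtual environment. The check below is a minimal sketch that relies only on the standard library; the helper name `in_virtualenv` is ours, not part of any package used in this notebook.

```python
import sys

def in_virtualenv() -> bool:
    """Return True when the interpreter runs inside a venv-style environment."""
    # Inside a venv, sys.prefix points at the environment, while
    # sys.base_prefix still points at the base Python installation.
    return sys.prefix != sys.base_prefix

print("Interpreter:", sys.executable)
print("Inside a virtual environment:", in_virtualenv())
```

If the second line prints `False`, the notebook is most likely still using the system interpreter rather than the `Python (venv)` kernel.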


# STAC Client API
<hr style="border:1px solid #0077b9;">

To run the examples in this Jupyter Notebook, you need the [pystac-client](https://pystac-client.readthedocs.io/en/latest/) library. The cell below installs it (and scikit-learn) from PyPI with `pip` if it is missing:

In [None]:
# We already have these libraries installed using the Geospatial kernel from BDC-Lab!

# Check and install scikit-learn and pystac-client if necessary
try:
    import sklearn
except ImportError:
    !pip3 install scikit-learn

try:
    import pystac_client
except ImportError:
    !pip3 install pystac-client

# Import the modules to use their functions or check versions
import sklearn
import pystac_client

# Display versions of the imported libraries
print('scikit-learn:',sklearn.__version__)
print('pystac_client:',pystac_client.__version__)

Installing additional libraries for processing and visualization:

In [None]:
# We already have these libraries installed using the Geospatial kernel from BDC-Lab!
import importlib

for module in ['geopandas','shapely','matplotlib','tqdm','folium']:
    try:
        importlib.import_module(module)
    except ImportError:
        !pip3 install {module}

# Import the modules to use their functions or check versions
import geopandas
import shapely
import matplotlib
import tqdm
import folium

# Display versions of the imported libraries
print('geopandas:',geopandas.__version__)
print('shapely:',shapely.__version__)
print('matplotlib:',matplotlib.__version__)
print('tqdm:',tqdm.__version__)
print('folium:',folium.__version__)
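
As an alternative to importing every module just to read its version, the installed versions can be queried from the package metadata. This is a sketch using the standard-library `importlib.metadata`; the helper name `installed_versions` is ours, and the list uses PyPI distribution names, which can differ from import names (`scikit-learn` versus `sklearn`).

```python
from importlib import metadata

def installed_versions(distributions):
    """Map each distribution name to its installed version, or None if absent."""
    versions = {}
    for name in distributions:
        try:
            versions[name] = metadata.version(name)
        except metadata.PackageNotFoundError:
            versions[name] = None
    return versions

# Distribution names as published on PyPI, not import names.
required = ["pystac-client", "scikit-learn", "geopandas", "shapely",
            "matplotlib", "tqdm", "folium"]

for name, version in installed_versions(required).items():
    print(f"{name}: {version or 'not installed'}")
```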

To access the functionalities of the client API, import the `pystac_client` package, as follows:

In [None]:
import pystac_client

Then, create a `Client` object connected to the HARMONIZE instance of the BDC-STAC service:

In [None]:
service = pystac_client.Client.open('https://brazildatacube.dpi.inpe.br/harmonize/dev/stac/v1/')
print(service)

# Listing Health and Climate Data Available
<hr style="border:1px solid #0077b9;">

Using the `Client` object, it is possible to list all health and climate data collections available from the service. Here, however, we use one set of keywords (`dengue`, `zika`, `chagas` and `chikungunya`) to retrieve only collections related to health data, and another set (`temp`, `prec` and `humidity`) to retrieve only climate data:

In [None]:
# Get all collections
collections = service.get_collections()

# Sort collections by their ID (alphabetically)
collections_sorted = sorted(collections, key=lambda c: c.id)

# Filter relevant collections
health_collections = [
    collection for collection in collections_sorted
    # Keep collections whose ID contains one of the health keywords
    if any(keyword in collection.id for keyword in ["dengue", "zika", "chagas", "chikungunya"])
]

# Print each filtered collection
for collection in health_collections:
    print(collection)

# Print the total number
print(f"\nTotal health collections: {len(health_collections)}")

In [None]:
# Filter climate collections
climate_collections = [
    collection for collection in collections_sorted
    # Keep collections whose ID contains one of the climate keywords
    if any(keyword in collection.id for keyword in ["temp", "prec", "humidity"])
]

# Print each filtered collection
for collection in climate_collections:
    print(collection)

# Print the total number
print(f"\nTotal climate collections: {len(climate_collections)}")
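
The keyword filter used above is plain substring matching on collection IDs. The sketch below reproduces that logic on a small list of strings so its behavior can be checked without contacting the service. The helper name `filter_ids` is ours, and the third ID is hypothetical, added only to show that short keywords such as `temp` can match unintended IDs.

```python
def filter_ids(ids, keywords):
    """Return, sorted, the IDs that contain at least one keyword as a substring."""
    return sorted(i for i in ids if any(k in i for k in keywords))

sample_ids = [
    "humidity_percent_ne_mun_epiweek-1",                   # used in this notebook
    "dengue_alert_level_northeast_mun_week_infodengue-1",  # used in this notebook
    "landcover_template-1",                                # hypothetical ID
]

print(filter_ids(sample_ids, ["dengue", "zika", "chagas", "chikungunya"]))
# Substring matching is broad: 'temp' also matches 'template'.
print(filter_ids(sample_ids, ["temp", "prec", "humidity"]))
```

If stricter matching is needed, one option is to compare keywords against the underscore-separated parts of the ID instead of the whole string.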

Each collection has a textual description, available through its `description` attribute:

In [None]:
# Get health data collection:
health_collection_1 = service.get_collection('dengue_alert_level_northeast_mun_week_infodengue-1')
health_collection_1.description

In [None]:
# Get climate data collection:
climate_collection_1 = service.get_collection('humidity_percent_ne_mun_epiweek-1')
climate_collection_1.description

# Retrieving the Metadata of a Collection
<hr style="border:1px solid #0077b9;">

The `Collection` object returned by the `get_collection` method holds the metadata of a health or climate data collection identified by its ID. In this example, we retrieve the `Dengue alert level northeast mun week infodengue` data collection, `dengue_alert_level_northeast_mun_week_infodengue-1`:

In [None]:
health_collection = service.get_collection('dengue_alert_level_northeast_mun_week_infodengue-1')
health_collection

In the same way, we retrieve the metadata of the `Relative humidity percent northeast mun epiweek` data collection, `humidity_percent_ne_mun_epiweek-1`:

In [None]:
climate_collection = service.get_collection('humidity_percent_ne_mun_epiweek-1')
climate_collection
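
The collection metadata follows the STAC Collection layout, so the most useful fields can be pulled out of its dictionary form. The sketch below assumes that layout (`extent.spatial.bbox` and `extent.temporal.interval`); the helper name `summarize_collection` and the example dictionary are ours and purely illustrative, not data from the service.

```python
def summarize_collection(collection_dict):
    """Pick the main descriptive fields from a STAC Collection dictionary."""
    extent = collection_dict.get("extent", {})
    return {
        "id": collection_dict.get("id"),
        "title": collection_dict.get("title"),
        "bbox": extent.get("spatial", {}).get("bbox"),
        "interval": extent.get("temporal", {}).get("interval"),
    }

# Hypothetical, hand-written collection used only to exercise the function.
example = {
    "id": "example-collection-1",
    "title": "Example collection",
    "extent": {
        "spatial": {"bbox": [[-48.8, -18.4, -34.8, -1.0]]},
        "temporal": {"interval": [["2019-01-01T00:00:00Z", None]]},
    },
}

print(summarize_collection(example))
```

With the objects created in this notebook, a call such as `summarize_collection(health_collection.to_dict())` should give the same kind of summary for the real collection.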

<img src="https://raw.githubusercontent.com/brazil-data-cube/code-gallery/master/img/stac/stac-item.png?raw=true" align="right" width="300"/>

# Retrieving Items
<hr style="border:1px solid #0077b9;">

Use the `Client.search(**kwargs)` method to retrieve the items of one or more collections:

In [None]:
items_search_health = service.search(collections=
                                     ['dengue_alert_level_northeast_mun_week_infodengue-1'])

The `.search(**kwargs)` method returns an `ItemSearch` object, which has handy methods to inspect the matched results. For example, to check the number of items matched, use `.matched()`:

In [None]:
items_search_health.matched()

In [None]:
items_search_climate = service.search(collections=
                                      ['humidity_percent_ne_mun_epiweek-1'])

In [None]:
items_search_climate.matched()

A single search can also span two collections:

In [None]:
# Search both collections at once
items_search_all = service.search(collections=['dengue_alert_level_northeast_mun_week_infodengue-1','humidity_percent_ne_mun_epiweek-1'])

items_search_all.matched()

To iterate over the matched result, use `.items()` to traverse the list of items:

In [None]:
for item in items_search_health.items():
    print(item)
    break #remove break to view all items

In [None]:
for item in items_search_climate.items():
    print(item)
    break #remove break to view all items
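
Instead of a `break`, the number of items inspected can be capped with `itertools.islice`, which stops pulling from the iterator after `n` elements. The sketch below uses a stand-in generator so it runs without a STAC service; `first_n` and `fake_items` are illustrative names of ours.

```python
from itertools import islice

def first_n(iterable, n):
    """Return at most n elements from any iterable without consuming the rest."""
    return list(islice(iterable, n))

def fake_items():
    """Stand-in for an item iterator; yields placeholder strings."""
    for week in range(1, 100):
        yield f"item-week-{week:02d}"

print(first_n(fake_items(), 3))
```

Applied to this notebook, `first_n(items_search_health.items(), 3)` would return the first three matched items.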

The `search(**kwargs)` method also supports filtering by a bounding box (`bbox`) or by a date and time range (`datetime`). Other options are available, such as a spatial intersection with a GeoJSON geometry (`intersects`). Please see the documentation available at https://api.stacspec.org/v1.0.0/item-search.

In [None]:
item_search_dengue = service.search(#bbox=bbox,
                             datetime='2019-01-01/2023-12-31',
                             collections=['dengue_alert_level_northeast_mun_week_infodengue-1'])
item_search_dengue.matched()

In [None]:
item_search_humidity = service.search(#bbox=bbox,
                             datetime='2019-01-01/2023-12-31',
                             collections=['humidity_percent_ne_mun_epiweek-1'])
item_search_humidity.matched()
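
The cells above leave `bbox` commented out. The sketch below shows one way to assemble the search arguments, including an optional bounding box, before passing them to `service.search(**params)`. The helper name `build_search_params` is ours, and the coordinates are illustrative values in northeast Brazil chosen for this example, not taken from the collections.

```python
def build_search_params(collection_id, start, end, bbox=None):
    """Assemble keyword arguments for an item search.

    bbox, if given, is [min_lon, min_lat, max_lon, max_lat] in WGS 84.
    """
    params = {
        "collections": [collection_id],
        "datetime": f"{start}/{end}",  # same 'start/end' form used above
    }
    if bbox is not None:
        if len(bbox) != 4:
            raise ValueError("bbox must have exactly four values")
        _, min_lat, _, max_lat = bbox
        if min_lat > max_lat:
            raise ValueError("bbox latitudes must be ordered south to north")
        params["bbox"] = list(bbox)
    return params

# Illustrative bounding box (assumption, not from the source data).
params = build_search_params(
    "dengue_alert_level_northeast_mun_week_infodengue-1",
    "2019-01-01", "2023-12-31",
    bbox=[-41.5, -9.5, -34.8, -7.2],
)
print(params)
```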

<img src="https://raw.githubusercontent.com/brazil-data-cube/code-gallery/master/img/stac/stac-asset.png?raw=true" align="right" width="300"/>

# Assets
<hr style="border:1px solid #0077b9;">

The assets of an item hold the links to its data files, thumbnails or specific metadata files, and can be accessed through the item's `assets` property.

First, we materialize the results of each search (`item_search_dengue` and `item_search_humidity`) as an item collection:

In [None]:
items_dengue = item_search_dengue.item_collection()
items_dengue

In [None]:
items_humidity = item_search_humidity.item_collection()
items_humidity

Take the first item of each collection as a reference:

In [None]:
item_dengue = items_dengue[0]
item_dengue

In [None]:
item_humidity = items_humidity[0]
item_humidity
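
In the dictionary form of a STAC Item, `assets` maps each asset key to an object with at least an `href`. The sketch below lists those links; the helper name `asset_links` and the example item with placeholder URLs are ours and only illustrate the layout.

```python
def asset_links(item_dict):
    """Map each asset key of a STAC Item dictionary to its href."""
    return {key: asset["href"]
            for key, asset in item_dict.get("assets", {}).items()}

# Hypothetical item with placeholder URLs, used only to exercise the function.
example_item = {
    "id": "example-item",
    "assets": {
        "geojson": {"href": "https://example.com/data/example-item.geojson",
                    "type": "application/geo+json"},
        "thumbnail": {"href": "https://example.com/data/example-item.png"},
    },
}

for key, href in asset_links(example_item).items():
    print(f"{key}: {href}")
```

For a real item from this notebook, `asset_links(item_dengue.to_dict())` should list the keys that are actually available, such as the `geojson` key used in the next section.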

# Using Folium to Visualize Health and Climate Data
<hr style="border:1px solid #0077b9;">

For each item, retrieve the link to its GeoJSON asset:

In [None]:
import geopandas as gpd

def extract_geojson_links(item_collection, asset_key='geojson'):
    """
    Extracts GeoJSON links from a STAC ItemCollection.

    Parameters:
        item_collection: pystac.ItemCollection
            The collection of STAC items.
        asset_key: str
            The key of the asset to extract (default is 'geojson').

    Returns:
        list: A list of GeoJSON URLs.
    """
    geojson_links = []

    for item in item_collection.items:
        if asset_key in item.assets:
            link = item.assets[asset_key].href
            geojson_links.append(link)
            #print(link)
        else:
            print(f"Asset key '{asset_key}' not found in item {item.id}")

    return geojson_links

In [None]:
geojson_links_dengue = extract_geojson_links(items_dengue)
geojson_links_humidity = extract_geojson_links(items_humidity)

print(f"Dengue First Link: {geojson_links_dengue[0]}\nHumidity First Link: {geojson_links_humidity[0]}")

Open the first GeoJSON file of each collection as a GeoDataFrame:

In [None]:
# Open as dataframe
gdf_dengue = gpd.read_file(geojson_links_dengue[0])
gdf_dengue

In [None]:
gdf_humidity = gpd.read_file(geojson_links_humidity[0])
gdf_humidity

### Plot health data

In [None]:
dst_crs = "EPSG:4326"

gdf_dengue = gdf_dengue.to_crs(dst_crs)

In [None]:
# Copy with some columns
gdf_dengue_simple = gdf_dengue[['municipio_geocodigo', 'value', 'geometry']].copy()

gdf_dengue_simple['value'] = gdf_dengue_simple['value'].astype(int)

gdf_dengue_simple

In [None]:
import folium

# Create a map
m = folium.Map(location=[gdf_dengue_simple.geometry.centroid.y.mean(),
                         gdf_dengue_simple.geometry.centroid.x.mean()], zoom_start=8)

# Add GeoJSON
folium.GeoJson(gdf_dengue_simple).add_to(m)
m

In [None]:
import folium
from branca.element import Template, MacroElement

# Define fixed colors for dengue transmission levels
color_dict = {
    1: 'green',    # Low transmission
    2: 'yellow',   # Attention
    3: 'orange',   # Sustained transmission
    4: 'red'       # High transmission
}

# Define style function
def style_function(feature):
    value = feature['properties']['value']
    return {
        'fillColor': color_dict.get(value, 'gray'),
        'color': 'black',
        'weight': 1,
        'fillOpacity': 0.7
    }

# Initialize map centered on centroid of GeoDataFrame
m = folium.Map(
    location=[gdf_dengue_simple.geometry.centroid.y.mean(),
              gdf_dengue_simple.geometry.centroid.x.mean()],
    zoom_start=8
)

# Add GeoJSON with style
folium.GeoJson(
    gdf_dengue_simple,
    style_function=style_function,
    tooltip=folium.GeoJsonTooltip(fields=['municipio_geocodigo', 'value'])
).add_to(m)

# Define HTML legend template
legend_html = """
{% macro html(this, kwargs) %}
<div style="
    position: fixed;
    bottom: 50px; left: 50px; width: 180px; height: 130px;
    background-color: white;
    z-index:9999;
    font-size:14px;
    border:2px solid grey;
    padding: 10px;
    ">
    <b>Dengue Alert Levels</b><br>
    <i style="background:green; width:10px; height:10px; float:left; margin-right:5px;"></i>Low transmission<br>
    <i style="background:yellow; width:10px; height:10px; float:left; margin-right:5px;"></i>Attention<br>
    <i style="background:orange; width:10px; height:10px; float:left; margin-right:5px;"></i>Sustained transmission<br>
    <i style="background:red; width:10px; height:10px; float:left; margin-right:5px;"></i>High transmission
</div>
{% endmacro %}
"""

# Create MacroElement and add to map
legend = MacroElement()
legend._template = Template(legend_html)
m.get_root().add_child(legend)

# Display map
m
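
The color rule inside `style_function` can be tested on its own, which is useful because property values read from GeoJSON may arrive as floats or strings rather than integers. The sketch below is a defensive variant under that assumption; `alert_color` is a name of ours, and the level-to-color mapping repeats the one defined above.

```python
ALERT_COLORS = {1: "green", 2: "yellow", 3: "orange", 4: "red"}

def alert_color(value, default="gray"):
    """Return the fill color for an alert level, tolerating floats and strings."""
    try:
        level = int(float(value))
    except (TypeError, ValueError, OverflowError):
        # None, non-numeric text, NaN and infinity all fall back to the default.
        return default
    return ALERT_COLORS.get(level, default)

for raw in [1, "2.0", 3.0, 4, None, 7]:
    print(repr(raw), "->", alert_color(raw))
```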


### Plot climate data

In [None]:
gdf_humidity = gdf_humidity.to_crs(dst_crs)

gdf_humidity['value'] = gdf_humidity['value'].astype(float)

In [None]:
import folium
import branca.colormap as cm

# Define the humidity color ramp, from dry (light gray) to humid (dark blue)
colors = [
    '#eeeeee',  # ~0-10% - light gray
    '#fcae91',  # ~10-20% - light pink
    '#fb6a4a',  # ~20-30% - orange
    '#fcbf49',  # ~30-40% - yellow
    '#fddc6c',  # ~40-50% - light yellow
    '#c7e9b4',  # ~50-60% - very light green
    '#7fcdbb',  # ~60-70% - aqua green
    '#41b6c4',  # ~70-80% - light blue
    '#2c7fb8',  # ~80-90% - medium blue
    '#253494'   # ~90-100% - dark blue
]

# Create the LinearColormap with the defined colors
colormap = cm.LinearColormap(
    colors=colors,
    vmin=0, vmax=100  # Min and max values for humidity percentage
)

# Define style function for folium GeoJson
def style_function(feature):
    value = feature['properties']['value']  # Get the humidity value
    return {
        'fillColor': colormap(value),  # Use colormap to get color based on value
        'color': 'black',              # Outline color
        'weight': 1,                   # Outline weight
        'fillOpacity': 0.7             # Transparency
    }

# Create a folium map centered on the mean centroid of the GeoDataFrame
m = folium.Map([gdf_humidity.geometry.centroid.y.mean(),
            gdf_humidity.geometry.centroid.x.mean()],
              zoom_start=8)

# Add the GeoDataFrame as a styled GeoJSON layer with tooltips
folium.GeoJson(
    gdf_humidity,
    style_function=style_function,
    tooltip=folium.GeoJsonTooltip(fields=['cod_mun', 'value'])
).add_to(m)

# Add the colormap legend to the map
colormap.caption = 'Humidity (%)'
colormap.add_to(m)

# Display map
m
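
The color comments above describe 10% bands, whereas a linear colormap interpolates between the listed colors. If discrete bands are preferred, each value can be assigned to one of the ten colors directly. The sketch below is an alternative to the colormap used above, not the method of this notebook; `humidity_color` is a name of ours.

```python
HUMIDITY_COLORS = [
    "#eeeeee", "#fcae91", "#fb6a4a", "#fcbf49", "#fddc6c",
    "#c7e9b4", "#7fcdbb", "#41b6c4", "#2c7fb8", "#253494",
]

def humidity_color(value, default="#808080"):
    """Return the color of the 10% band that contains a humidity percentage."""
    if value is None or value != value:  # value != value is True only for NaN
        return default
    clamped = min(max(float(value), 0.0), 100.0)
    # 0-10 -> band 0, ..., 90-100 -> band 9 (100 itself stays in the last band).
    index = min(int(clamped // 10), len(HUMIDITY_COLORS) - 1)
    return HUMIDITY_COLORS[index]

for sample in [0, 55, 99.9, 100]:
    print(sample, "->", humidity_color(sample))
```

Such a function could replace `colormap(value)` inside `style_function` when a stepped legend is easier to read than a continuous one.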


# Download Health and Climate Data Files
<hr style="border:1px solid #0077b9;">

The file related to an asset can be retrieved with the `download` helper function defined below. It accepts either a `pystac.Asset` or a URL, saves the file into the given directory, and skips files that already exist unless `overwrite=True`:

In [None]:
# Function adapted from script drone_using_stac.ipynb
import os
from urllib.parse import urlparse

import requests
from pystac import Asset
from tqdm import tqdm

def download(asset_or_url, directory: str = None, chunk_size: int = 1024 * 16, overwrite=False, **request_options) -> str:
    """Smart download from STAC Asset or URL with progress bar."""
    if directory is None:
        directory = '.'  # current directory; os.makedirs('') would raise an error

    # Detect if input is Asset or URL
    if isinstance(asset_or_url, Asset):
        url = asset_or_url.href
    elif isinstance(asset_or_url, str):
        url = asset_or_url
    else:
        raise TypeError("Input must be a pystac.Asset or str (URL).")

    # Extract filename
    filename = os.path.basename(urlparse(url).path)
    output_file = os.path.join(directory, filename)

    # Avoid re-downloading
    if os.path.exists(output_file) and not overwrite:
        print(f"File already exists: {output_file}")
        return output_file

    # Ensure directory exists
    os.makedirs(directory, exist_ok=True)

    # Start download
    response = requests.get(url, stream=True, **request_options)
    response.raise_for_status()

    total_bytes = int(response.headers.get('content-length', 0))

    with open(output_file, 'wb') as fout:
        with tqdm(total=total_bytes, unit='B', unit_scale=True, desc=filename) as pbar:
            for chunk in response.iter_content(chunk_size=chunk_size):
                if chunk:
                    fout.write(chunk)
                    pbar.update(len(chunk))

    return output_file
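
The output file name in `download` comes from the path component of the URL, so query strings are ignored. The short sketch below isolates that step with placeholder URLs; `filename_from_url` is a name of ours.

```python
import os
from urllib.parse import urlparse

def filename_from_url(url):
    """Return the last path segment of a URL, ignoring query and fragment."""
    return os.path.basename(urlparse(url).path)

print(filename_from_url("https://example.com/data/week_01.geojson?token=abc"))
# A URL ending in '/' has no file name, so the result is an empty string.
print(repr(filename_from_url("https://example.com/data/")))
```

An empty result means there is no file name to write to, so URLs of that form would need a fallback name before being passed to `download`.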

To download the files of several items, iterate over the items and download the chosen asset of each one. The cell below stops after the first 10 files and saves them into a folder named `test_data`:

In [None]:
asset_key='geojson'
num_assets=0

for item in items_dengue.items:
    if asset_key in item.assets and num_assets < 10:
        link = item.assets[asset_key].href
        download(link, 'test_data')
        num_assets +=1

# References
<hr style="border:1px solid #0077b9;">

- [Spatio Temporal Asset Catalog Specification](https://stacspec.org/)


- [Python Client Library for STAC Service](https://pystac-client.readthedocs.io/en/latest/)

# See also the following Jupyter Notebooks
<hr style="border:1px solid #0077b9;">

* [Introduction to Earth Observation Data Cubes tuned for Health Response (EDPU) STAC functions in Python](https://github.com/Harmonize-Brazil/code-gallery/blob/main/jupyter/Python/edpu/publish_collection.ipynb)
* [Earth Observation Data Cubes tuned for Health Response Health Indicator PRocessing (EHIPR) user manual](https://github.com/Harmonize-Brazil/code-gallery/blob/main/jupyter/Python/ehipr/spatializing_lis_indicator.ipynb)