<img src="https://raw.githubusercontent.com/Harmonize-Brazil/code-gallery/main/img/INPE_logo.png" align="left" style="height: 105px" height="105"/>
<!-- https://www.gov.br/mcti/pt-br/composicao/rede-mcti/instituto-nacional-de-pesquisas-espaciais -->
<img src="https://earth.bsc.es/harmonize/lib/exe/fetch.php?h=250&crop=0&tok=cfb750&media=wiki:logo.png" align="right" style="height: 90px" height="90"/>

<h1 style="color:#336699; text-align: center">Module ehipr (health data)</h1>
<h3 style="color:#336699; text-align: center"><b>E</b>ODCtHRS <b>H</b>ealth <b>I</b>ndicator <b>PR</b>ocessing Package</h3>
<hr style="border:2px solid #0077b9;">

<div style="text-align: center; font-size: 90%;">
    <!-- <a href="https://colab.research.google.com/github/Harmonize-Brazil/code-gallery/blob/main/jupyter/events/2025-Infodengue-Harmonize_INPE/" target = "_blank"> <img src="https://colab.research.google.com/assets/colab-badge.svg" alt="Colab"> </a> -->
    <a href="https://nbviewer.jupyter.org/github/Harmonize-Brazil/code-gallery/blob/main/jupyter/events/2025-Previous-Harmonize-Training/health_dengue_confirmed_cases_indicator.ipynb"><img src="https://raw.githubusercontent.com/jupyter/design/master/logos/Badges/nbviewer_badge.svg" ></a> <!--align="center"-->
    <br/><br/>
    Yuri Domaradzki Moreira Nunes <sup><a href="https://orcid.org/0009-0007-2829-4345" target="_blank" rel="noopener noreferrer"><img src="https://orcid.filecamp.com/static/thumbs/folders/qLJ1tuei4m6ugC3g.png" width="16" alt="ORCID iD" style="vertical-align: text-bottom;"/></a></sup>
    <br/><br/>
    Earth Observation and Geoinformatics Division, National Institute for Space Research (INPE)
    <br/>
    Avenida dos Astronautas, 1758, Jardim da Granja, São José dos Campos, SP 12227-010, Brazil
    <br/><br/>
    Contact:
    <a href="mailto:yuri.nunes@inpe.br">yuridomaradzki@gmail.com</a>
    <br/><br/>
    Last Update: November 6, 2025
    <br/><br/>
    <div style="width: 60%; margin: auto">
        <div style="text-align: center; border-style: solid; border-color: #0077b9; border-width: 1px; padding: 10px;">
            This Jupyter Notebook uses the ehipr module of the cube4health package and gives an overview of how to generate health indicators from a data source. It demonstrates how to process confirmed dengue cases from the Health Information Laboratory (LIS/ICICT).
        </div>
    </div>
</div>
<br/><br/>



### <span style="color:#336699" id="install"> 1. Introduction </span>
<hr style="border:1px solid #0077b9;">

<p style='text-align: justify; font-size: 16px;'>
This Jupyter Notebook uses the ehipr module of the cube4health package and gives an overview of how to generate health indicators from a data source. It demonstrates how to process confirmed dengue cases from the Health Information Laboratory (LIS/ICICT) and how these health data can be integrated into the Earth Observation Data Cubes tuned for Health Response (EODCtHRS) platform to support environmental and epidemiological analyses in Brazil.

The <a href="https://github.com/Harmonize-Brazil/ehipr" target="_blank"><b>E</b>ODCtHRS <b>H</b>ealth <b>I</b>ndicator <b>PR</b>ocessing (ehipr)</a> module provides a set of functions for reading CSV and Parquet files and for applying spatial aggregations (e.g., by municipality, health region, or state) and temporal aggregations (e.g., by epidemiological week, month, or year).

As output, ehipr generates files in vector formats, including GeoJSON and Shapefile. The input data include, from LIS, the dengue mortality rate (per 100,000 inhabitants) and the incidence rates of classic dengue, zika, chikungunya, and Chagas disease. From Mosqlimate/Infodengue, they include notified dengue cases, estimated incidence rates for dengue, chikungunya, and zika, current outbreak risk levels for these diseases in a given region, and laboratory-confirmed cases of dengue, chikungunya, and zika.

<a href="#ehipr_workflow">Figure 1</a> presents the data processing workflow.

<figure style="align: center; font-size: 12px;">
  <img id="ehipr_workflow" src="https://raw.githubusercontent.com/Harmonize-Brazil/code-gallery/main/img/Health/ehipr_workflow.jpg" />
  <figcaption style='text-align: center;'><b>Figure 1</b> - Health data processing workflow in the ehipr package.</figcaption>
</figure>

</p>




<p style='text-align: justify; font-size: 16px;'>
The third collection of health data is based on a dataset provided by the <a href="https://www.icict.fiocruz.br/laboratorio/laboratorio-de-informacao-em-saude-lis" target="_blank">Health Information Laboratory (LIS/ICICT)</a>, which includes the dengue mortality rate (per 100,000 inhabitants) and the incidence rates of classic dengue, zika, chikungunya, and Chagas disease for the period from January 2000 to December 2023.

In addition to these data, the collection also incorporates indicators such as notified dengue cases, estimated incidence rates for dengue, chikungunya, and zika, current outbreak risk levels for these diseases in specific regions, and laboratory-confirmed cases of dengue, chikungunya, and zika for the period January 2010 to August 2025, produced by <a href="https://mosqlimate.org/pt/" target="_blank">Mosqlimate/Infodengue</a>.

The list was defined by the team of Fiocruz and Mosqlimate/Infodengue experts involved in the project and serves as a reference for the set of indicators detailed below in <a href="#health_indicators_table">Table 1</a>.
</p>

<table id="health_indicators_table" align="center">
    <caption style="text-align: center"><b>Table 1</b> - List of health indicators</caption>
    <thead>
    <tr style="background-color: #4e4d4dff; border-radius: 1em/5em;  font-size: 16px;">
        <th>Indicator description (unit)</th>
        <th>Spatial coverage / Aggregation</th>
        <th>Period / Frequency</th>
        <th>Data source</th>
    </tr>
    </thead>
    <tbody>
        <tr>
            <td style="text-align: center; font-size: 16px;">
                <b>Mortality rate of dengue</b>, calculated as the number of reported cases divided by the population, multiplied by 100,000.
            </td>
            <td style="text-align: center; font-size: 16px;">
                Northeast and North hotspots <br> Aggregated by municipality
            </td>
            <td style="text-align: center; font-size: 16px;">
                January/2000 - August/2023 <br> monthly
            </td>
            <td style="text-align: center; font-size: 16px;">
                <a href="https://www.icict.fiocruz.br/laboratorio/laboratorio-de-informacao-em-saude-lis" target="_blank">Health Information Laboratory (LIS/ICICT)</a>
            </td>
        </tr>
        <tr style="background-color: #D8D8D8; color: #000;">
            <td style="text-align: center; font-size: 16px;">
                <b>Incidence rate</b> of dengue, zika, chikungunya, and Chagas disease recorded each epidemiological week.
            </td>
            <td style="text-align: center; font-size: 16px;">
                Northeast and North hotspots <br> Aggregated by municipality
            </td>
            <td style="text-align: center; font-size: 16px;">
                January/2000 - August/2023 <br> monthly
            </td>
            <td style="text-align: center; font-size: 16px;">
                <a href="https://www.icict.fiocruz.br/laboratorio/laboratorio-de-informacao-em-saude-lis" target="_blank">Health Information Laboratory (LIS/ICICT)</a>
            </td>
        </tr>
        <tr>
            <td style="text-align: center; font-size: 16px;">
                <b>Notified cases</b> of dengue, zika, chikungunya, and Chagas disease recorded each epidemiological week.
            </td>
            <td style="text-align: center; font-size: 16px;">
                Northeast and North hotspots <br> Aggregated by municipality
            </td>
            <td style="text-align: center; font-size: 16px;">
                January/2010 - August/2025 <br> by epidemiological week
            </td>
            <td style="text-align: center; font-size: 16px;">
                <a href="https://mosqlimate.org/pt/" target="_blank">Mosqlimate/Infodengue</a>
            </td>
        </tr>
        <tr style="background-color: #D8D8D8; color: #000;">
            <td style="text-align: center; font-size: 16px;">
                <b>Incidence rate</b> of dengue, zika, and chikungunya recorded each epidemiological week.
            </td>
            <td style="text-align: center; font-size: 16px;">
                Northeast and North hotspots <br> Aggregated by municipality
            </td>
            <td style="text-align: center; font-size: 16px;">
                January/2010 - August/2025 <br> by epidemiological week
            </td>
            <td style="text-align: center; font-size: 16px;">
                <a href="https://mosqlimate.org/pt/" target="_blank">Mosqlimate/Infodengue</a>
            </td>
        </tr>
        <tr>
            <td style="text-align: center; font-size: 16px;">
                <b>Current outbreak risk level</b> of dengue, zika, and chikungunya, a classification that indicates the current risk of outbreaks of these diseases in a given region.
            </td>
            <td style="text-align: center; font-size: 16px;">
                Northeast and North hotspots <br> Aggregated by municipality
            </td>
            <td style="text-align: center; font-size: 16px;">
                January/2010 - August/2025 <br> by epidemiological week
            </td>
            <td style="text-align: center; font-size: 16px;">
                <a href="https://mosqlimate.org/pt/" target="_blank">Mosqlimate/Infodengue</a>
            </td>
        </tr>
    </tbody>
</table>
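
<p style='text-align: justify; font-size: 16px;'>
The rate indicators listed above are expressed per 100,000 inhabitants. The arithmetic can be sketched as follows. The function name <code>rate_per_100k</code> and the numbers are illustrative only; they are not part of ehipr or of the LIS data.
</p>

```python
def rate_per_100k(events: int, population: int) -> float:
    """Return the number of events per 100,000 inhabitants."""
    if population <= 0:
        raise ValueError("population must be positive")
    # Multiply before dividing to limit floating-point rounding.
    return events * 100_000 / population


# Hypothetical municipality: 5 events in a population of 250,000.
example_rate = rate_per_100k(5, 250_000)
print(example_rate)
```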


## <span style="color:#336699" id="install"> Example of using ehipr to generate a health indicator </span>
<hr style="border:1px solid #0077b9;">

<p style='text-align: justify; font-size: 16px;'>
    Below is a guide on how to use the package:
</p>

<ol>
    <li> <a href="#install" style="font-size: 16px;">Installing the cube4health Python package</a>
    <li> <a href="#download" style="font-size: 16px;">Usage</a>
    <ol style='list-style-type: none;'>
        <li> <a href="#images" style="font-size: 16px;"><span style="color:#fff">2.1.</span> Spatializing the indicator</a>
        <!-- <li> <a href="#roi" style="font-size: 16px;"><span style="color:#fff">2.2.</span> Create a region of interesting (roi)</a> -->
    </ol>
    <!-- <li> <a href="#execute" style="font-size: 16px;">Generate temperature indicator by month</a> -->
    <li> <a href="#results" style="font-size: 16px;">Plot results</a>
    <!-- <li> <a href="#functions" style="font-size: 16px;">Others functions in eclimpr module</a> -->
    <li> <a href="#references" style="font-size: 16px;">Bibliographical references</a>
</ol>

### <span style="color:#336699" id="load">1. Load cube4health Python Package </span>
<hr style="border:1px solid #0077b9;">

To run the examples in this Jupyter Notebook, follow the installation instructions provided in the <a href="https://github.com/Harmonize-Brazil/code-gallery/blob/main/jupyter/events/2025-Harmonize-Annual-Meeting/cube4health_introduction.ipynb" target="_blank">cube4health_introduction.ipynb</a> notebook. The cube4health package includes the ehipr module. Use the following command to check your installation:

In [None]:
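# Check that cube4health (which includes the ehipr module) is installed
try:
    import cube4health
except ImportError:
    print("cube4health must be installed!")
else:
    # Only report the version when the import succeeded
    print("cube4health:", cube4health.__version__)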

cube4health: 0.2.1.dev0+g80389d824.d20251105


In [None]:
%cd ~

In [None]:
%cd /home/jovyan/code-gallery/jupyter/events/2025-Harmonize-Annual-Meeting/

### <span style="color:#336699" id="execute"> 2. Generate health indicator </span>
<hr style="border:1px solid #0077b9;">


<p style='text-align: justify; font-size: 16px;'>
To use the ehipr package, the user must provide a set of parameters that describe the characteristics of the input data. The first is the input file format: the package can read <b>parquet</b> and <b>csv</b> files.

The input data must contain columns for the spatial index, date, indicator name, spatial aggregation, temporal aggregation, and indicator value. An example of the table's structure is shown in Table 2.
</p>

<table id="spatialize_data_parameters" align="center" width="100%">
    <caption style="text-align: center"><b>Table 2</b> - Structure of the health data table (csv or parquet format).</caption>
    <tr style="background-color: #848484; border-radius: 1em/5em;  font-size: 16px;">
        <th style="text-align: center;" width="15%"> spatial index (geocode)
        <th style="text-align: center;" width="20%"> date
        <th style="text-align: center;" width="15%"> indicator name
        <th style="text-align: center;" width="15%"> spatial aggregation
        <th style="text-align: center;" width="15%"> temporal aggregation
        <th style="text-align: center;" width="15%"> value
    <tr>
        <td> <p style="text-align: center; font-size: 16px;"> 150010 </p>
        <td> <p style="text-align: center; font-size: 16px;"> 2010-01-01 00:00:00 </p>
        <td> <p style="text-align: center; font-size: 16px;"> indi_0019 </p>
        <td> <p style="text-align: center; font-size: 16px;"> municipality </p>
        <td> <p style="text-align: center; font-size: 16px;"> year </p>
        <td> <p style="text-align: center; font-size: 16px;"> 90.67 </p>
    </tr>
    <tr style="background-color: #D8D8D8; color: #000;">
        <td> <p style="text-align: center; font-size: 16px;"> 150080 </p>
        <td> <p style="text-align: center; font-size: 16px;"> 2010-01-01 00:00:00 </p>
        <td> <p style="text-align: center; font-size: 16px;"> indi_0019 </p>
        <td> <p style="text-align: center; font-size: 16px;"> municipality </p>
        <td> <p style="text-align: center; font-size: 16px;"> year </p>
        <td> <p style="text-align: center; font-size: 16px;"> 44.9 </p>
    </tr>
    <tr>
        <td> <p style="text-align: center; font-size: 16px;"> 150010 </p>
        <td> <p style="text-align: center; font-size: 16px;"> 2010-01-01 00:00:00 </p>
        <td> <p style="text-align: center; font-size: 16px;"> indi_0019 </p>
        <td> <p style="text-align: center; font-size: 16px;"> municipality </p>
        <td> <p style="text-align: center; font-size: 16px;"> month </p>
        <td> <p style="text-align: center; font-size: 16px;"> 20.00 </p>
    </tr>
    <tr style="background-color: #D8D8D8; color: #000;">
        <td> <p style="text-align: center; font-size: 16px;"> 150080 </p>
        <td> <p style="text-align: center; font-size: 16px;"> 2010-01-01 00:00:00 </p>
        <td> <p style="text-align: center; font-size: 16px;"> indi_0019 </p>
        <td> <p style="text-align: center; font-size: 16px;"> municipality </p>
        <td> <p style="text-align: center; font-size: 16px;"> month </p>
        <td> <p style="text-align: center; font-size: 16px;"> 4.00 </p>
    </tr>
    <tr>
        <td> <p style="text-align: center; font-size: 16px;"> 150010 </p>
        <td> <p style="text-align: center; font-size: 16px;"> 2010-01-01 00:00:00 </p>
        <td> <p style="text-align: center; font-size: 16px;"> indi_0019 </p>
        <td> <p style="text-align: center; font-size: 16px;"> municipality </p>
        <td> <p style="text-align: center; font-size: 16px;"> week </p>
        <td> <p style="text-align: center; font-size: 16px;"> 5.00 </p>
    </tr>
    <tr style="background-color: #D8D8D8; color: #000;">
        <td> <p style="text-align: center; font-size: 16px;"> 150080 </p>
        <td> <p style="text-align: center; font-size: 16px;"> 2010-01-01 00:00:00 </p>
        <td> <p style="text-align: center; font-size: 16px;"> indi_0019 </p>
        <td> <p style="text-align: center; font-size: 16px;"> municipality </p>
        <td> <p style="text-align: center; font-size: 16px;"> week </p>
        <td> <p style="text-align: center; font-size: 16px;"> 1.00 </p>
    </tr>
</table>
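
<p style="text-align: justify; font-size: 16px;">
Before handing a file to the package, it can help to confirm that its header contains all six fields of Table 2. The sketch below does this for a CSV file using only the standard library. The header names (<code>cod</code>, <code>date</code>, <code>name</code>, <code>agg</code>, <code>agg_time</code>, <code>value</code>) and the helper <code>missing_fields</code> are assumptions made for illustration; real files may name their columns differently.
</p>

```python
import csv
import io

# Header names assumed for illustration; adapt them to your own file.
REQUIRED_FIELDS = ["cod", "date", "name", "agg", "agg_time", "value"]

# Two rows copied from Table 2, written as CSV text.
sample_csv = """cod,date,name,agg,agg_time,value
150010,2010-01-01 00:00:00,indi_0019,municipality,year,90.67
150080,2010-01-01 00:00:00,indi_0019,municipality,year,44.9
"""


def missing_fields(csv_text, required=REQUIRED_FIELDS):
    """Return the required fields that are absent from the CSV header."""
    reader = csv.DictReader(io.StringIO(csv_text))
    header = reader.fieldnames or []
    return [field for field in required if field not in header]


# An empty list means every required field is present.
print(missing_fields(sample_csv))
```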

<p style="font-size: 16px;">
With the input data in the structure described above, the user must provide a shapefile with the grid of geometries to be joined to the data. This file must have three columns: spatial index, territory name, and uf. These are, respectively, the spatial index (geocode) present in the health data, the name of the territory associated with that spatial index, and the state abbreviation. An example of the table's structure is shown in Table 3.
</p>

<table id="spatialize_data_parameters" align="center" width="100%">
    <caption style="text-align: center"><b>Table 3</b> - Structure of the shapefile with the grid of geometries.</caption>
    <tr style="background-color: #848484; border-radius: 1em/5em;  font-size: 16px;">
        <th style="text-align: center;" width="15%"> spatial index
        <th style="text-align: center;" width="70%"> territory name
        <th style="text-align: center;" width="15%"> uf
    </tr>
    <tr>
        <td> <p style="text-align: center; font-size: 16px;"> 1500107 </p>
        <td> <p style="text-align: center; font-size: 16px;"> Abaetetuba </p>
        <td> <p style="text-align: center; font-size: 16px;"> PA </p>
    </tr>
    <tr style="background-color: #D8D8D8; color: #000;">
        <td> <p style="text-align: center; font-size: 16px;"> 1500800 </p>
        <td> <p style="text-align: center; font-size: 16px;"> Ananindeua </p>
        <td> <p style="text-align: center; font-size: 16px;"> PA </p>
    </tr>
</table>
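
<p style="text-align: justify; font-size: 16px;">
The sample geocodes in Table 2 have six digits, while those in Table 3 have seven. One way to compare the two, assuming the six-digit form is the seven-digit IBGE code without its final check digit, is sketched below. The helper name <code>to_six_digit_geocode</code> is hypothetical, and whether ehipr performs such a conversion internally is not stated in this notebook.
</p>

```python
def to_six_digit_geocode(code) -> str:
    """Normalize a municipality geocode to its six-digit form.

    Assumption: a seven-digit code is the six-digit code followed by
    one check digit, so dropping the last digit is enough.
    """
    text = str(code).strip()
    if not text.isdigit() or len(text) not in (6, 7):
        raise ValueError(f"unexpected geocode: {code!r}")
    return text[:6]


# Codes from Table 3 reduced to the form used in Table 2.
for grid_code in (1500107, 1500800):
    print(grid_code, "->", to_six_digit_geocode(grid_code))
```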

#### <span style="color:#336699"> Import Packages </span>
<hr style="border:1px solid #0077b9;">
<p style=" font-size: 16px;">
Let's load the <code>ehipr</code> and <code>lis</code> modules from <code>cube4health.ehipr</code>:
</p>

In [None]:
from cube4health.ehipr import ehipr, lis
from cube4health.ehipr.ehipr import get_indicators_id

<p style="text-align: justify; font-size: 16px;">
In this example, we will use the Confirmed Cases data from LIS. We will perform spatial aggregation by municipality and temporal aggregation by month for the regions of interest (ROI) of the HARMONIZE Project in Brazil, here the Northeast and North regions. To do this, it is necessary to specify each parameter and execute the <code>spatialize_data</code> function.
</p>

<p style="text-align: justify; font-size: 16px;">
Before running <code>spatialize_data</code>, define the parameters that control the data processing workflow. These parameters determine the input directory, the indicators to be processed, the region-of-interest shapefiles, and the spatial and temporal aggregations. <a href="#parameters_indicator">Table 4</a> summarizes the parameters of the function, including their descriptions and whether they are required.
</p>

<table id="spatialize_data_parameters" align="center" width="100%">
    <caption style="text-align: center"><b>Table 4</b> - Parameters of the <b>spatialize_data</b> function of the ehipr module.</caption>
    <tr style="background-color: #848484; border-radius: 1em/5em;  font-size: 16px;">
        <th style="text-align: center;" width="15%"> Parameter
        <th style="text-align: center;" width="75%"> Description
        <th style="text-align: center;" width="10%"> Required
    </tr>
    <tr>
        <td> <p style="text-align: center; font-size: 16px;"> indicators </p>
        <td> <p style="text-align: center; font-size: 16px;"> This is a list of strings containing the identifiers of the indicators. These identifiers are represented in the <b>Id</b> field in <a href="#health_indicators_table">Table 1</a>. </p>
        <td> <p style="text-align: center; font-size: 16px;"> Yes </p>
    </tr>
    <tr style="background-color: #D8D8D8; color: #000;">
        <td> <p style="text-align: center; font-size: 16px;"> input_path </p>
        <td> <p style="text-align: center; font-size: 16px;"> This is a string that indicates where the data is located. If the data is not available on your computer, this parameter indicates the path where the downloaded data will be stored. </p>
        <td> <p style="text-align: center; font-size: 16px;"> Yes </p>
    </tr>
    <tr>
        <td> <p style="text-align: center; font-size: 16px;"> data_columns </p>
        <td> <p style="text-align: center; font-size: 16px;"> This is a dictionary that maps the columns of the tabular data to the keys the package requires: cod, date, name, spt_agg, temp_agg, and value. </p>
        <td> <p style="text-align: center; font-size: 16px;"> Yes </p>
    </tr>
    <tr style="background-color: #D8D8D8; color: #000;">
        <td> <p style="text-align: center; font-size: 16px;"> provider </p>
        <td> <p style="text-align: center; font-size: 16px;"> This is a string that identifies the data provider (e.g., lis). </p>
        <td> <p style="text-align: center; font-size: 16px;"> Yes </p>
    </tr>
    <tr>
        <td> <p style="text-align: center; font-size: 16px;"> grid </p>
        <td> <p style="text-align: center; font-size: 16px;"> This is a tuple with a string that indicates where the grid is located and a dictionary with the column names. If omitted, the 2022 standard grid for Brazil is used.</p>
        <td> <p style="text-align: center; font-size: 16px;"> No </p>
    </tr>
    <tr style="background-color: #D8D8D8; color: #000;">
        <td> <p style="text-align: center; font-size: 16px;"> file_crops_geom </p>
        <td> <p style="text-align: center; font-size: 16px;"> This is a list of tuples, each with a string indicating where the crop grid is located and a dictionary with the column names. If omitted, no crop is applied to the original grid.</p>
        <td> <p style="text-align: center; font-size: 16px;"> No </p>
    </tr>
    <tr>
        <td> <p style="text-align: center; font-size: 16px;"> spatial_agg </p>
        <td> <p style="text-align: center; font-size: 16px;"> This is a list of strings containing the spatial aggregations you want to apply to the data. By default, its value is the spatial aggregation of the data.</p>
        <td> <p style="text-align: center; font-size: 16px;"> No </p>
    </tr>
    <tr style="background-color: #D8D8D8; color: #000;">
        <td> <p style="text-align: center; font-size: 16px;"> temp_agg </p>
        <td> <p style="text-align: center; font-size: 16px;"> This is a list of strings containing the temporal aggregations you want to apply to the data. By default, its value is the temporal aggregation of the data.</p>
        <td> <p style="text-align: center; font-size: 16px;"> No </p>
    </tr>
</table>

<p style="text-align:justify; font-size: 16px;">
First, we define the parameters related to the health data:
</p>
<ul style="text-align:justify; font-size: 16px;">
<li> indicators;
<li> provider;
<li> spatial_agg;
<li> temp_agg (stored below in the variable <code>temporal_agg</code>);
<li> data_columns; and
<li> github_settings.
</ul>

In [2]:
indicators = ['indi_0015']
spatial_agg = ['mun_res']
temporal_agg = ['month']
provider = 'lis'

The <b>data_columns</b> parameter is used to map the columns of the tabular data. The script recognizes and requires the following values as dictionary keys in order to read the data:

<ul style="text-align:justify; font-size: 16px;">

<li> <b>cod</b>: the geocode field;
<li> <b>date</b>: the date field;
<li> <b>name</b>: the indicator name field;
<li> <b>spt_agg</b>: the spatial aggregation field;
<li> <b>temp_agg</b>: the temporal aggregation field;
<li> <b>value</b>: the indicator value field.

</ul>

In [3]:
data_columns = {
    'cod': 'cod', 
    'date': 'date', 
    'name': 'name',
    'spt_agg': 'agg',
    'temp_agg': 'agg_time', 
    'value': 'value'
}
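
<p style="text-align:justify; font-size: 16px;">
Since all six keys are required, a quick check before calling the package can catch a misspelled or missing key early. This is a minimal sketch: <code>check_data_columns</code> is a hypothetical helper written for this notebook, not a function of ehipr.
</p>

```python
# Keys the text above lists as required in data_columns.
REQUIRED_KEYS = {"cod", "date", "name", "spt_agg", "temp_agg", "value"}


def check_data_columns(mapping: dict) -> None:
    """Raise KeyError if any required key is missing from the mapping."""
    missing = sorted(REQUIRED_KEYS - mapping.keys())
    if missing:
        raise KeyError(f"data_columns is missing keys: {missing}")


# Same mapping as in the cell above.
example_columns = {
    "cod": "cod",
    "date": "date",
    "name": "name",
    "spt_agg": "agg",
    "temp_agg": "agg_time",
    "value": "value",
}
check_data_columns(example_columns)  # passes silently when all keys exist
```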

<p style="text-align:justify; font-size: 16px;">
The <code>input_path</code> parameter specifies the folder where your data must already be available in either Parquet or CSV format. For most data sources, the script will read the files directly from this location.
</p>

<p style="text-align:justify; font-size: 16px;">
For Infodengue indicators, this requirement does not apply: the package includes a built-in function that directly queries the Infodengue database, so no local files are needed.
</p>

In [4]:
input_path = '/home/yuri/Docker-Compose/geoserver/data/health_indicators'

<p style="text-align:justify; font-size: 16px;">
After defining the data parameters, we define the geometry parameters. Because we want the data to be clipped to the area of interest, we have to tell the script which area that is. To do this, we define a variable that holds a list of tuples, each indicating where a crop shapefile is stored together with a dictionary that maps the column names in that file. This dictionary must contain three keys:
</p>
<ul style="text-align:justify; font-size: 16px;">
<li> <b>name</b>: the column with the names of the municipalities.
<li> <b>cod</b>: the column with the 7-digit code that identifies Brazilian municipalities.
<li> <b>uf</b>: the column with the abbreviation of the Brazilian state (federative unit) associated with the health data.
</ul>

In [5]:
shp_info = {
    'name': 'NM_MUN',
    'cod': 'CD_MUN',
    'uf': 'SIGLA'
}

<p style="text-align:justify; font-size: 16px;">
With the dictionary mapping the names of the municipal grid columns, we can define the tuple that points to the municipal grid used to add the spatial component to the health data.
</p>

In [6]:
shp_grid = (
    'cube4health/src/cube4health/ehipr/shp_malhas/default_grid/municipality/BR_municipality_2022.shp',
    shp_info
)

<p style="text-align:justify; font-size: 16px;">
In addition, using the same column-mapping dictionary, we can define the list of tuples used to crop the grid and obtain data only for the regions of interest.
</p>

In [7]:
crop_geoms = [
    ('cube4health/src/cube4health/ehipr/shp_malhas/northeast/northeast.shp', shp_info),
    ('cube4health/src/cube4health/ehipr/shp_malhas/north/north.shp', shp_info)
]
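
<p style="text-align:justify; font-size: 16px;">
The shapefile paths above are relative, so they only resolve when the notebook runs from the expected working directory. A small check such as the one below can confirm that each file is found before starting a long run. The helper <code>missing_shapefiles</code> is hypothetical and is not part of ehipr.
</p>

```python
from pathlib import Path


def missing_shapefiles(geoms):
    """Return the paths from a list of (path, columns) tuples that are not existing files."""
    return [path for path, _columns in geoms if not Path(path).is_file()]


# Same structure as crop_geoms above, repeated so the cell is self-contained.
shp_columns = {"name": "NM_MUN", "cod": "CD_MUN", "uf": "SIGLA"}
geoms_to_check = [
    ("cube4health/src/cube4health/ehipr/shp_malhas/northeast/northeast.shp", shp_columns),
    ("cube4health/src/cube4health/ehipr/shp_malhas/north/north.shp", shp_columns),
]

not_found = missing_shapefiles(geoms_to_check)
if not_found:
    print("Shapefiles not found from", Path.cwd(), ":", not_found)
else:
    print("All crop shapefiles found.")
```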

<p style="text-align:justify; font-size: 16px;">
Finally, we can run the <b>spatialize_data</b> function to add the geometry to the data.
</p>

In [8]:
layers = ehipr.spatialize_data(indicators=indicators, 
                               input_path=input_path,
                               spatial_agg=spatial_agg, 
                               temp_agg=temporal_agg, 
                               file_crops_geom=crop_geoms,
                               data_columns=data_columns,
                               provider=provider)


SEPARATING indi_0015 BY SPATIAL AND TEMPORAL AGGREGATIONS...
 - Combination: mun_res - month
... Done


ORGANIZING THE DATASETS...: 100%|██████████| 1/1 [00:07<00:00,  7.22s/it]



SPATIALIZING DATA...
Cropping data by shapefile northeast...


Processing dataframes...:   0%|          | 0/1 [00:00<?, ?it/s]

/home/yuri/treinamento_2025/cube4health/src/cube4health/ehipr/shp_malhas/default_grid/municipality

SHAPEFILE OF THE MUNICIPALITY

PROGRESS BAR OF /home/yuri/treinamento_2025/cube4health/src/cube4health/ehipr/shp_malhas/default_grid/municipality/temp/BR_Municipios_2022.zip


######################################################################## 100.0%



/home/yuri/treinamento_2025/cube4health/src/cube4health/ehipr/shp_malhas/default_grid/municipality/BR_municipality_2022.shp





100%|██████████| 20/20 [00:00<00:00, 199.46it/s]
Creating the items files for each date: 100%|██████████| 288/288 [00:51<00:00,  5.64it/s]
Processing dataframes...: 100%|██████████| 1/1 [01:06<00:00, 66.81s/it]


Cropping data by shapefile north...


Processing dataframes...:   0%|          | 0/1 [00:00<?, ?it/s]



100%|██████████| 21/21 [00:00<00:00, 222.15it/s]
Creating the items files for each date: 100%|██████████| 288/288 [00:36<00:00,  7.90it/s]
Processing dataframes...: 100%|██████████| 1/1 [00:41<00:00, 41.91s/it]


Among the improvements introduced in this version, the standardization of attribute names for geometric features in Shapefile and GeoJSON files (for both health and climate data) stands out. This initiative was motivated by the feedback provided during the training held in June, which highlighted the lack of uniformity in attribute naming across datasets.

The new naming convention adopts clear and intuitive terms in English, facilitating data integration and analysis across different systems and research teams. Table 5 summarizes the standardized attribute names now applied to all vector data layers related to health and climate indicators.

<table id="vector_attributes_standard" align="center" width="100%">
    <caption style="text-align:center"><b>Table 5</b> - Standardized attribute names for vector data (Shapefile and GeoJSON)</caption>
    <tr style="background-color: #4e4d4dff; border-radius: 1em/5em;  font-size: 16px; color: white;">
        <th style="text-align: center;" width="20%">Attribute</th>
        <th style="text-align: center;" width="80%">Description</th>
    </tr>
    <tr>
        <td style="text-align: center; font-size: 16px">cod_mun</td>
        <td style="text-align: center; font-size: 16px">Municipality code according to IBGE (7-digit code, where the first two digits represent the state code).</td>
    </tr>
    <tr style="background-color: #D8D8D8; color: #000">
        <td style="text-align: center; font-size: 16px">name_mun</td>
        <td style="text-align: center; font-size: 16px">Full name of the municipality.</td>
    </tr>
    <tr>
        <td style="text-align: center; font-size: 16px">uf_mun</td>
        <td style="text-align: center; font-size: 16px">Abbreviation of the federal unit (state) to which the municipality belongs.</td>
    </tr>
    <tr style="background-color: #D8D8D8; color: #000">
        <td style="text-align: center; font-size: 16px">data_source</td>
        <td style="text-align: center; font-size: 16px">Name of the data provider or source (e.g., LIS, Infodengue).</td>
    </tr>
    <tr>
        <td style="text-align: center; font-size: 16px">name_indicator</td>
        <td style="text-align: center; font-size: 16px">Name of the indicator represented in the dataset (e.g., dengue, zika, chikungunya, and Chagas disease indicators).</td>
    </tr>
    <tr style="background-color: #D8D8D8; color: #000">
        <td style="text-align: center; font-size: 16px">epiweek_number / month_number</td>
        <td style="text-align: center; font-size: 16px">Epidemiological week number (for weekly aggregation) or month number (for monthly aggregation).</td>
    </tr>
    <tr>
        <td style="text-align: center; font-size: 16px">epiweek_start_date / month_start_date</td>
        <td style="text-align: center; font-size: 16px">Start date of the epidemiological week or first day of the month corresponding to the data record.</td>
    </tr>
    <tr style="background-color: #D8D8D8; color: #000">
        <td style="text-align: center; font-size: 16px">time_agg</td>
        <td style="text-align: center; font-size: 16px">Temporal aggregation type (e.g., epidemiological week or month).</td>
    </tr>
    <tr>
        <td style="text-align: center; font-size: 16px">spatial_agg</td>
        <td style="text-align: center; font-size: 16px">Spatial aggregation unit (e.g., municipality).</td>
    </tr>
    <tr style="background-color: #D8D8D8; color: #000">
        <td style="text-align: center; font-size: 16px">value</td>
        <td style="text-align: center; font-size: 16px">Indicator value.</td>
    </tr>
    <tr>
        <td style="text-align: center; font-size: 16px">geom</td>
        <td style="text-align: center; font-size: 16px">Geometry column used in spatial databases (e.g., PostGIS) or geospatial libraries (e.g., GeoPandas).</td>
    </tr>
</table>

The same attributes are used for both monthly and epidemiological week indicators; only the week or month fields differ.
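
The two-digit state prefix described for the municipality code in Table 5 can serve as a quick consistency check against `uf_mun`. The sketch below is illustrative only: the helper names `state_prefix` and `uf_matches_code` are hypothetical and not part of ehipr, and the lookup table covers only the seven states of the North region. Because the prefix is read from the left, the check works for both 6-digit and 7-digit forms of the code.

```python
# IBGE state codes for the North region (two-digit prefix -> abbreviation).
# This partial lookup is an assumption made for illustration; extend it
# to the other regions as needed.
NORTH_REGION_UF = {
    "11": "RO", "12": "AC", "13": "AM", "14": "RR",
    "15": "PA", "16": "AP", "17": "TO",
}


def state_prefix(code_mun):
    """Return the two-digit state prefix of a municipality code."""
    return str(code_mun)[:2]


def uf_matches_code(code_mun, uf_mun):
    """Check that the state abbreviation agrees with the code prefix."""
    return NORTH_REGION_UF.get(state_prefix(code_mun)) == uf_mun


print(state_prefix(150380))
print(uf_matches_code(150380, "PA"))
```

A record whose `uf_mun` does not match the prefix of its municipality code is worth inspecting before it is published.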

Note: In the Shapefile (.shp) format, attribute names are limited to 10 characters by the dBase (.dbf) file structure. Longer field names are therefore truncated (for example, data_source becomes data_sourc, and name_indicator becomes name_indic), as the output of the cell below shows. This keeps the files readable by GIS software such as QGIS and GeoServer, and importable into PostGIS.

In [4]:
import geopandas as gpd

# Load shapefile
gdf = gpd.read_file("/home/yuri/Docker-Compose/geoserver/data/health_indicators/lis/dengue_incidence_rate/month/municipality/items/2023-12-01/north/shapefile/dengue_incidence_NO_mun_month_20231201_20231231.shp")

# Print first 4 rows
gdf.head(4)

| | code_mun | name_mun | uf_mun | data_sourc | name_indic | month_numb | month_star | time_agg | spatial_ag | value | geometry |
|---|---|---|---|---|---|---|---|---|---|---|---|
| 0 | 150380 | Jacundá | PA | lis | dengue_incidence_rate | 12 | 2023-12-01 00:00:00 | month | mun_res | 2.53 | POLYGON ((-49.01793 -4.38605, -49.01366 -4.394... |
| 1 | 150178 | Breu Branco | PA | lis | dengue_incidence_rate | 12 | 2023-12-01 00:00:00 | month | mun_res | 0.0 | POLYGON ((-49.23882 -4.07631, -49.23941 -4.077... |
| 2 | 150210 | Cametá | PA | lis | dengue_incidence_rate | 12 | 2023-12-01 00:00:00 | month | mun_res | 9.81 | POLYGON ((-49.26145 -2.02594, -49.26058 -2.026... |
| 3 | 150330 | Igarapé-Miri | PA | lis | dengue_incidence_rate | 12 | 2023-12-01 00:00:00 | month | mun_res | 1.46 | POLYGON ((-48.91996 -1.94696, -48.91969 -1.947... |
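
The truncated column names in the output above (`data_sourc`, `name_indic`, `month_numb`, `month_star`, `spatial_ag`) can be mapped back to the standardized names of Table 5 after loading a Shapefile. The following is a minimal sketch, assuming the short names are the first 10 characters of the full names, which matches the month columns shown here; the short forms for the epidemiological week fields are an assumption. The helpers `build_rename_map` and `restore_names` are hypothetical and not part of ehipr.

```python
# Standardized names from Table 5 that are longer than 10 characters.
LONG_NAMES = [
    "data_source", "name_indicator", "spatial_agg",
    "month_number", "month_start_date",
    "epiweek_number", "epiweek_start_date",
]


def build_rename_map(names=LONG_NAMES, limit=10):
    """Map each truncated name to its full name.

    Assumes truncation keeps the first `limit` characters.
    """
    mapping = {}
    for name in names:
        short = name[:limit]
        if short == name:
            continue  # name already fits, nothing to restore
        if short in mapping:
            raise ValueError(f"ambiguous truncated name: {short}")
        mapping[short] = name
    return mapping


def restore_names(columns, mapping):
    """Return the column list with truncated names replaced by full names."""
    return [mapping.get(col, col) for col in columns]


shapefile_columns = [
    "code_mun", "name_mun", "uf_mun", "data_sourc", "name_indic",
    "month_numb", "month_star", "time_agg", "spatial_ag", "value", "geometry",
]
print(restore_names(shapefile_columns, build_rename_map()))
```

With GeoPandas, the same mapping can be passed to `gdf.rename(columns=mapping)` to rename the columns of the loaded GeoDataFrame.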


### <span id="results" style="color:#336699">3. Plot results</span>
<hr style="border:1px solid #0077b9;">

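The indicator values can be displayed on an interactive map. For GeoJSON files, the cell below uses folium to draw a choropleth of the `value` attribute, grouped into five equal-width classes: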

In [None]:
import folium
import glob
import os
import numpy as np
from branca.colormap import StepColormap
import json

# Root path whose subfolders contain the GeoJSON files
path_geojson_root = "/home/yuri/Docker-Compose/geoserver/data/health_indicators/lis/dengue_incidence_rate/month/municipality/items/2023-12-01/north/"

# Take the first GeoJSON file found (searching subfolders recursively)
files = sorted(glob.glob(os.path.join(path_geojson_root, "**", "*.geojson"), recursive=True))
if not files:
    raise FileNotFoundError(f"No GeoJSON files under: {path_geojson_root}")

geojson = files[0]

# Initialize the map
m = folium.Map(location=[-2.242, -49.497], zoom_start=7)

# Read the GeoJSON
with open(geojson, "r") as f:
    data = json.load(f)

# Extract the values of the "value" property, skipping missing ones
values = [float(feat["properties"].get("value")) for feat in data["features"] if feat["properties"].get("value") is not None]
vmin, vmax = np.nanmin(values), np.nanmax(values)

intervals = np.linspace(vmin, vmax, 6)
colors = ["#d9d9d9", "#f1b066", "#e56d46", "#c5444d", "#922042"]

cmap = StepColormap(
    colors=colors,
    index=intervals.tolist(),
    vmin=vmin,
    vmax=vmax,
    caption="Dengue incidence rate"
)

def style_fn(feature):
    val = feature["properties"].get("value")
    if val is None:
        color = "#cccccc"
    else:
        val = float(val)
        if val < intervals[1]:
            color = colors[0]
        elif val < intervals[2]:
            color = colors[1]
        elif val < intervals[3]:
            color = colors[2]
        elif val < intervals[4]:
            color = colors[3]
        else:
            color = colors[4]

    return {
        "fillColor": color,
        "color": color,
        "weight": 0.5,
        "fillOpacity": 0.7,
    }

folium.GeoJson(
    data,
    style_function=style_fn,
    tooltip=folium.GeoJsonTooltip(
        fields=["name_mun", "value"],
        aliases=["Municipality:", "Value:"],
        localize=True
    )
).add_to(m)

cmap.add_to(m)
m
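
The class assignment performed by `style_fn` above is an equal-interval classification: the range between the minimum and maximum values is split into five bins of equal width, and each value takes the color of its bin. The sketch below reproduces that logic with the standard library only, so it can be checked without folium. The function names `equal_interval_breaks` and `color_for` are hypothetical, and the sample values are the four `value` entries from the Shapefile output shown earlier.

```python
from bisect import bisect_right

COLORS = ["#d9d9d9", "#f1b066", "#e56d46", "#c5444d", "#922042"]


def equal_interval_breaks(vmin, vmax, n_classes):
    """Return the inner class boundaries (n_classes - 1 values)."""
    step = (vmax - vmin) / n_classes
    return [vmin + i * step for i in range(1, n_classes)]


def color_for(value, inner_breaks, colors=COLORS, missing="#cccccc"):
    """Pick the color of the class containing `value`.

    A value equal to a boundary falls into the upper class, matching
    the chain of `val < intervals[i]` tests in style_fn.
    """
    if value is None:
        return missing
    return colors[bisect_right(inner_breaks, float(value))]


values = [2.53, 0.0, 9.81, 1.46]
breaks = equal_interval_breaks(min(values), max(values), len(COLORS))
for v in values:
    print(v, color_for(v, breaks))
```

Equal intervals are simple to read, but a few very high values can push most municipalities into the lowest class; quantile breaks are a common alternative in that case.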


### <span id="references" style="color:#336699">4. Bibliographical references</span>
<hr style="border:1px solid #0077b9;">


<p id="ref_saldanha_2023" style='text-align: justify;'>Saldanha R, Xavier D, Pascoal V, Barros H, Gracie R, Magalhães M, Barcellos C (2023). bilis: An R package to calculate health indicators. <a href="https://rfsaldanha.github.io/bilis/">https://rfsaldanha.github.io/bilis/</a>, <a href="https://github.com/rfsaldanha/bilis/">https://github.com/rfsaldanha/bilis/</a>.</p>