---
title: Geospatial Data Workflows
---


Table of contents:

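- [Why Geospatial Data Matters for Climate and Health](#why-geospatial-data-matters-for-climate-and-health)
- [Geospatial Data Types](#geospatial-data-types)
- [Spatial Resolution and Scale](#spatial-resolution-and-scale)
- [Combining Climate Data with DHIS2 Organisation Units](#combining-climate-data-with-dhis2-organisation-units)
- [How This Fits with Climate Tools](#how-this-fits-with-climate-tools)
- [Combining and Layering GIS Data](#combining-and-layering-gis-data)
- [Coordinates and Projections](#coordinates-and-projections)
- [Ways of Working with GIS Data](#ways-of-working-with-gis-data)
- [List of Resources](#list-of-resources)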

------------------
## Why Geospatial Data Matters for Climate and Health

Climate, weather, and environmental data are inherently geospatial. They describe how conditions vary across space and time, rather than being tied to individual people or records.

In health information systems like DHIS2, data is usually organized by administrative units and time periods. Climate data, on the other hand, is often provided as gridded datasets covering large geographic areas at regular spatial and temporal resolutions.

Bringing these two worlds together requires more than simple data import. It involves combining geospatial datasets, aligning them to common spatial units, and transforming them into formats that can be analyzed and visualized within DHIS2.

This page will take you through the theory around a geospatial data workflow - covering important concepts such as data types, spatial resolution, and aggregation - to make it easier to work with climate and environmental data in DHIS2, and to follow the guides and workflows provided by DHIS2 Climate Tools.

------------------

## Geospatial Data Types

Most geographic data can be represented in two main formats:

1. Vector data represents discrete features using points, lines, or polygons. Examples include administrative boundaries, roads, or health facility locations.
2. Raster data represents continuous data on a grid of cells or pixels. Examples include rainfall, temperature, or elevation data. Each pixel has a value that represents the measurement at that location.

### 1. Vector data

When working with data in DHIS2, we're generally used to working with simple tabular data. 

However, DHIS2 also contains some geospatial data in the form of **vector data**. 

Vector data are geospatial objects with clearly defined geometry boundaries, either as points, lines, or polygons.

In the context of DHIS2, examples of vector data include:

- the polygons of DHIS2 organisation units
- the point locations of DHIS2 health facilities

![DHIS2 Polygons](./images/dhis2-polygons.png)

Organisation units are just like any other table in DHIS2, but in addition to storing attributes like `name` or `id`, they also have an extra column that holds the `geometry` information:

![Table data with geometry column](./images/table-with-geoms.png)

#### File formats

Vector data typically comes in two forms:

- **Shapefiles**: widely used for many years. A shapefile is actually a collection of files (`.shp`, `.dbf`, `.prj`, and more), often bundled in a zip file.
- **GeoJSON files**: typically indicated by the `.geojson` extension. Although not very storage-efficient, they are very popular and easy to work with, especially over the internet.

There are also many other, more modern file formats. Although these are generally smaller and more efficient, you are less likely to come across them in practice.
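
To make the idea of a table with a geometry column concrete, here is a minimal sketch that parses a small GeoJSON document using only Python's standard library. The organisation unit ids, names, and coordinates are invented for illustration and do not come from a real DHIS2 instance.

```python
import json

# A tiny GeoJSON FeatureCollection with two made-up organisation units.
# GeoJSON stores coordinates in (longitude, latitude) order.
geojson_text = """
{
  "type": "FeatureCollection",
  "features": [
    {
      "type": "Feature",
      "properties": {"id": "ou_district_a", "name": "District A"},
      "geometry": {
        "type": "Polygon",
        "coordinates": [[[10.0, 5.0], [11.0, 5.0], [11.0, 6.0], [10.0, 6.0], [10.0, 5.0]]]
      }
    },
    {
      "type": "Feature",
      "properties": {"id": "ou_facility_b", "name": "Facility B"},
      "geometry": {"type": "Point", "coordinates": [10.4, 5.7]}
    }
  ]
}
"""

collection = json.loads(geojson_text)

# Flatten each feature into a table row: attributes plus a geometry column.
rows = [
    {**feature["properties"], "geometry": feature["geometry"]}
    for feature in collection["features"]
]

for row in rows:
    print(row["id"], row["name"], row["geometry"]["type"])
```

In practice you would more likely load such a file with a library like geopandas (for example `geopandas.read_file`), which produces the same kind of table with a geometry column.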

### 2. Raster data

If vector data represents distinct geometry objects, **raster data** represents continuous data on a regular grid of pixels. This is what we mean by gridded climate data, showing for example how precipitation or temperature is distributed over a surface.

In the image below, you see a screenshot from the DHIS2 Maps App showing a precipitation data layer:

![Screenshot of Precipitation Layer in DHIS2 Maps App](images/dhis2-raster.png)

Raster data is really just a grid or matrix of data values, similar to how an image contains pixel values. The main difference is that raster data also has additional **georeferencing information** that defines the coordinates of the grid -- this is what makes it geospatial. Among other things, this defines the coordinate of the upper left corner of the pixel grid, and the **spatial resolution** of each raster pixel. 
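
The sketch below illustrates how georeferencing information turns a plain grid into geospatial data. It assumes a north-up raster described only by its upper-left corner and a single pixel size; the grid values, corner coordinates, and function names are made up for this example.

```python
# A 3 x 4 grid of made-up values (rows run north to south, columns west to east).
grid = [
    [1.0, 2.0, 3.0, 4.0],
    [5.0, 6.0, 7.0, 8.0],
    [9.0, 10.0, 11.0, 12.0],
]

# Hypothetical georeferencing: upper-left corner and pixel size in degrees.
upper_left_lon = 30.0
upper_left_lat = 10.0
resolution = 0.5


def pixel_center(row, col):
    """Return the (lon, lat) of the center of the pixel at (row, col)."""
    lon = upper_left_lon + (col + 0.5) * resolution
    lat = upper_left_lat - (row + 0.5) * resolution
    return lon, lat


def value_at(lon, lat):
    """Return the grid value of the pixel containing (lon, lat)."""
    col = int((lon - upper_left_lon) // resolution)
    row = int((upper_left_lat - lat) // resolution)
    if not (0 <= row < len(grid) and 0 <= col < len(grid[0])):
        raise ValueError("coordinate falls outside the raster")
    return grid[row][col]


print(pixel_center(0, 0))   # center of the upper-left pixel
print(value_at(31.2, 9.1))  # value of the pixel covering this location
```

Without the corner coordinate and the resolution, the same grid would just be a matrix of numbers with no location on the earth.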

#### File formats

- **GeoTIFF**: Raster data commonly comes as GeoTIFF, an image format (`.tif` extension) with added metadata that stores the georeferencing information. A GeoTIFF typically contains only a single raster layer, such as a single population or precipitation layer.

- **NetCDF**: Another common raster file format, especially for climate data, is NetCDF (`.nc` extension). It is designed for multidimensional data and can store multiple data variables and stacks of time-series rasters in a single file.

A large variety of other file formats also exist, but these are the ones you are most likely to come across. 
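
As a rough mental model of how a NetCDF file organizes multiple variables and time steps, the sketch below uses plain Python dictionaries and lists. This is a conceptual layout only, not the NetCDF file format itself, and all names and values are invented.

```python
# Conceptual layout only: each variable is a stack of 2D grids, one per time step.
dataset = {
    "coords": {
        "time": ["2024-01", "2024-02"],
        "lat": [9.75, 9.25],
        "lon": [30.25, 30.75, 31.25],
    },
    "variables": {
        "precipitation": [  # indexed as [time][lat][lon]
            [[10.0, 12.0, 8.0], [7.0, 9.0, 11.0]],
            [[20.0, 18.0, 25.0], [15.0, 17.0, 19.0]],
        ],
        "temperature": [
            [[26.1, 26.4, 27.0], [25.8, 26.0, 26.6]],
            [[27.3, 27.5, 28.1], [26.9, 27.2, 27.8]],
        ],
    },
}


def select(variable, time_label):
    """Return the 2D grid for one variable at one time step."""
    t = dataset["coords"]["time"].index(time_label)
    return dataset["variables"][variable][t]


february_rain = select("precipitation", "2024-02")
print(february_rain)
```

Libraries such as xarray expose real NetCDF files in a similar labelled way (for example via `xarray.open_dataset`), so that you can select a variable and a time step by name rather than by position.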

--------------------------------

## Spatial Resolution and Scale

When working with raster data, an important concept is spatial resolution. Spatial resolution describes the size of each pixel in a raster dataset, usually expressed in degrees or meters.

For example:

- A raster with a 0.1° resolution has relatively large pixels and shows coarse spatial detail.
- A raster with a 0.01° resolution has smaller pixels and captures finer spatial variation.

Higher-resolution data can show more local detail, but it also results in larger datasets and higher computational cost.
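
To give a feel for the cost of higher resolution, the following sketch counts the pixels needed to cover the same area at two resolutions. The bounding box size is an arbitrary example.

```python
def pixel_count(width_deg, height_deg, resolution_deg):
    """Number of pixels needed to cover a bounding box at a given resolution."""
    cols = round(width_deg / resolution_deg)
    rows = round(height_deg / resolution_deg)
    return rows * cols


# Hypothetical bounding box of 4 degrees by 3 degrees.
coarse = pixel_count(4.0, 3.0, 0.1)
fine = pixel_count(4.0, 3.0, 0.01)

print(coarse)          # pixels at 0.1 degree resolution
print(fine)            # pixels at 0.01 degree resolution
print(fine // coarse)  # how many times more pixels the finer grid needs
```

Because resolution applies in both directions, making pixels 10 times smaller multiplies the number of pixels by 100, and this is repeated for every time step in the dataset.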

In climate and health workflows, spatial resolution matters because climate data is often provided on a grid that does not match the boundaries used in DHIS2. To make climate data usable in DHIS2, raster values typically need to be aggregated to administrative units, such as districts or regions.

Understanding spatial resolution helps explain why aggregation is a necessary step in most Climate Tools workflows.

---------------------------------------
## Combining Climate Data with DHIS2 Organisation Units

The real power of GIS comes from the ability to combine multiple datasets based on their geographic location.

In climate and health use cases, this often means combining:

- Raster climate data, such as rainfall or temperature
- Vector data, such as administrative boundaries or facility locations

By overlaying these datasets, you can calculate summary statistics, such as average rainfall per district or total precipitation within a catchment area. These operations rely on spatial relationships like overlap and containment, rather than matching rows in a table.

Most Climate Tools workflows follow this general pattern:

- Load climate data as rasters
- Load administrative boundaries as vectors
- Aggregate raster values to the vector geometries
- Produce tabular results that can be imported into DHIS2

This process is a core building block for integrating climate and environmental data into DHIS2.
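
As an illustration of the last step, the sketch below turns aggregated values into a list of data values in the shape commonly used for DHIS2 data value imports (`dataElement`, `orgUnit`, `period`, `value`). All identifiers are placeholders, and the exact import format for your DHIS2 version should be checked against the DHIS2 documentation.

```python
import json

# Hypothetical aggregated results: (organisation unit id, period, mean rainfall).
# The ids below are placeholders, not real DHIS2 identifiers.
aggregated = [
    ("ORG_UNIT_ID_1", "202401", 112.4),
    ("ORG_UNIT_ID_2", "202401", 87.9),
]

DATA_ELEMENT_ID = "DATA_ELEMENT_ID"  # placeholder for a rainfall data element

payload = {
    "dataValues": [
        {
            "dataElement": DATA_ELEMENT_ID,
            "orgUnit": org_unit,
            "period": period,
            "value": str(round(value, 1)),
        }
        for org_unit, period, value in aggregated
    ]
}

print(json.dumps(payload, indent=2))
```

The key point is that the output is an ordinary table of organisation unit, period, and value: the geometry is no longer needed once the raster has been aggregated.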

DHIS2 Climate Tools focuses on supporting these harmonization steps in a reproducible and transparent way. Rather than hiding complexity, the toolkit exposes the key steps needed to transform external geospatial data into DHIS2-ready datasets.

Understanding this harmonization process helps explain both the structure of the guides and the design of the example workflows provided in Climate Tools.

-----------------------------------
## How This Fits with Climate Tools

While GIS software like QGIS or the DHIS2 Maps App is well suited for visualization and exploratory analysis, Climate Tools focuses on programmatic workflows.

Using Python-based tools makes it possible to:

- Repeat analyses consistently
- Process large volumes of climate data
- Automate routine tasks
- Integrate results directly into DHIS2

The concepts introduced on this page provide the foundation needed to follow the tutorials and workflows in the rest of the documentation, without requiring prior GIS expertise.

----------------------------------------

## Combining and Layering GIS Data

One of the central ideas in GIS is that you can overlay and combine different datasets based on their location. This allows you to explore relationships between data that exist in the same geographic space.

For example:

- You might have rainfall data as a raster and administrative boundaries as a vector layer. By overlaying them, you can calculate the average rainfall per district.
- You could combine health facility locations (points) with flood-prone areas (polygons) to see which facilities are at risk.

These operations rely on spatial relationships, such as:

- Intersection – where features overlap
- Containment – where one feature is completely inside another
- Proximity / distance – how close features are to each other

In Python, these operations can be performed programmatically using libraries like geopandas for vectors and xarray or rasterio for rasters. Climate Tools builds on these libraries to provide workflows that let you combine datasets efficiently and reproducibly, turning raw climate or environmental data into actionable information for DHIS2.
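
As one illustration of a spatial relationship, the sketch below measures proximity using the haversine formula for great-circle distance. It uses only the standard library, treats the earth as a sphere, and the facility names and coordinates are invented.

```python
import math

EARTH_RADIUS_KM = 6371.0  # mean earth radius, a common approximation


def haversine_km(lon1, lat1, lon2, lat2):
    """Great-circle distance in kilometers between two lon/lat points."""
    phi1, phi2 = math.radians(lat1), math.radians(lat2)
    dphi = phi2 - phi1
    dlambda = math.radians(lon2 - lon1)
    a = (
        math.sin(dphi / 2) ** 2
        + math.cos(phi1) * math.cos(phi2) * math.sin(dlambda / 2) ** 2
    )
    return 2 * EARTH_RADIUS_KM * math.asin(math.sqrt(a))


# Made-up facility locations as (lon, lat) and a made-up flood observation point.
facilities = {
    "Facility A": (32.50, 0.30),
    "Facility B": (32.90, 0.35),
    "Facility C": (33.40, 0.10),
}
flood_point = (32.85, 0.40)

distances = {
    name: haversine_km(lon, lat, *flood_point)
    for name, (lon, lat) in facilities.items()
}
nearest = min(distances, key=distances.get)
print(nearest, round(distances[nearest], 1), "km")
```

A GIS library would normally handle this for you, but the example shows that the comparison is based purely on location, not on any shared id between the two datasets.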

### Zonal statistics: Aggregating raster to vector data

How does this impact us when trying to integrate climate data into DHIS2? Most of the time climate data is gridded raster data, and we want to map that onto DHIS2 organisation units which are vector geometries. 

So the question becomes: how do we combine raster and vector data? This is what we refer to as harmonizing the data. In GIS this is typically called zonal statistics, and the process goes something like this:

- Make sure that the vector data and raster data are in the same coordinate system so they can be compared.
- Figure out which pixels land inside each vector geometry, such as each organisation unit.
- Then those pixels are aggregated based on some statistic like sum or mean.
- The aggregated value for each vector geometry is attached to the original table containing the organisation units.

![Zonal statistics](./images/zonal-stats.png)
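
The steps above can be sketched in plain Python. This is a simplified illustration, not how Climate Tools implements it: it assumes the raster and polygons already share a coordinate system, assigns each pixel to a polygon based on its center point, and uses a basic ray-casting test for containment. All data values are made up.

```python
def point_in_polygon(x, y, ring):
    """Ray-casting test: is (x, y) inside the polygon ring of (x, y) vertices?"""
    inside = False
    for (x1, y1), (x2, y2) in zip(ring, ring[1:] + ring[:1]):
        if (y1 > y) != (y2 > y):
            x_cross = x1 + (y - y1) * (x2 - x1) / (y2 - y1)
            if x < x_cross:
                inside = not inside
    return inside


# Made-up 4 x 4 rainfall raster; upper-left corner at (0, 4), 1-degree pixels.
raster = [
    [10, 10, 20, 20],
    [10, 10, 20, 20],
    [30, 30, 40, 40],
    [30, 30, 40, 40],
]
upper_left_x, upper_left_y, resolution = 0.0, 4.0, 1.0

# Made-up organisation units as polygon rings of (x, y) vertices.
org_units = [
    {"name": "North-west", "ring": [(0, 2), (2, 2), (2, 4), (0, 4)]},
    {"name": "South", "ring": [(0, 0), (4, 0), (4, 2), (0, 2)]},
]

for unit in org_units:
    values = []
    for row, line in enumerate(raster):
        for col, value in enumerate(line):
            # Use the pixel center to decide which unit the pixel belongs to.
            cx = upper_left_x + (col + 0.5) * resolution
            cy = upper_left_y - (row + 0.5) * resolution
            if point_in_polygon(cx, cy, unit["ring"]):
                values.append(value)
    # Attach the aggregated statistic back onto the organisation unit row.
    unit["mean_rainfall"] = sum(values) / len(values) if values else None

for unit in org_units:
    print(unit["name"], unit["mean_rainfall"])
```

Swapping the mean for a sum, minimum, or maximum only changes the aggregation line. Which statistic is appropriate depends on the variable, for example a mean for temperature and often a sum for population counts.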

---------------

## Coordinates and Projections

The location of vector and raster data is based on the coordinate reference system (CRS) of the data. This information is typically embedded as metadata in the file itself or in auxiliary files.

### Latitude-longitude coordinates

Often, coordinates are given as latitude-longitude coordinates, which divide the earth into decimal degrees: longitude ranges from -180 to 180 (west to east) and latitude from -90 to 90 (south to north):

![Dividing earth into latitude-longitudes](./images/lat-long.png)

When data is stored in latitude-longitude coordinates, a useful rule of thumb is that 0.1 decimal degrees corresponds to roughly 11 km at the equator. For longitude, this distance shrinks towards the poles, while for latitude it stays nearly constant. This means that a raster dataset with 0.01 degree resolution has pixels of roughly 1 km near the equator. The [Wikipedia entry on Decimal Degrees](https://en.wikipedia.org/wiki/Decimal_degrees) has a useful overview table of decimal distances in metric units.
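
The rule of thumb can be checked with a short calculation. The sketch below assumes a spherical earth and about 111.32 km per degree at the equator; the function name is invented for this example.

```python
import math

KM_PER_DEGREE = 111.32  # approximate length of one degree at the equator


def degrees_to_km(resolution_deg, latitude_deg=0.0):
    """Approximate (east-west, north-south) pixel size in km at a latitude."""
    north_south = resolution_deg * KM_PER_DEGREE
    east_west = north_south * math.cos(math.radians(latitude_deg))
    return east_west, north_south


for lat in (0, 30, 60):
    ew, ns = degrees_to_km(0.1, lat)
    print(f"0.1 degree at latitude {lat}: {ew:.1f} km east-west, {ns:.1f} km north-south")
```

The east-west size shrinks with the cosine of the latitude, so at 60 degrees north or south a pixel is only about half as wide as at the equator.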

You can typically tell that data is defined as latitude-longitude when you observe:

- coordinate values that fit within the range of valid decimal degrees.
- the CRS name is given as "WGS 84" (sometimes written "WGS84" or "WGS 1984").
- the CRS EPSG Code is given as 4326. 
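
The first of these checks can be expressed as a simple heuristic. The function below is a hypothetical helper, and the coordinates are made up. It can only rule latitude-longitude out, so the CRS metadata remains the authoritative source.

```python
def looks_like_lat_lon(coords):
    """Heuristic: True if every (x, y) pair fits the valid lon/lat ranges."""
    return all(-180 <= x <= 180 and -90 <= y <= 90 for x, y in coords)


# Made-up coordinates: the first set is in degrees, the second in meters.
degree_coords = [(32.58, 0.35), (36.82, -1.29)]
meter_coords = [(3626000.0, 39000.0), (4098000.0, -143600.0)]

print(looks_like_lat_lon(degree_coords))
print(looks_like_lat_lon(meter_coords))
```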

### Projected x-y coordinates

Other times, your data is defined in projected coordinates, which use mathematical equations to project the earth's curved three-dimensional surface onto a flat two-dimensional x-y plane. Different projections are used to better represent particular regions of the world or to suit different use cases. This is why maps come in so many different shapes and looks:

![Comparisons of world map projections](./images/projections.png)

You can typically tell it's x-y projected data by the very large values of the coordinates, e.g. millions. 
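
As one concrete example of such a projection, the sketch below applies the spherical Web Mercator equations (EPSG:3857), a projection widely used by web maps. It is chosen here only for illustration, and the input location is made up.

```python
import math

EARTH_RADIUS_M = 6378137.0  # sphere radius used by Web Mercator (EPSG:3857)


def lon_lat_to_web_mercator(lon, lat):
    """Project lon/lat in degrees to Web Mercator x/y in meters."""
    x = EARTH_RADIUS_M * math.radians(lon)
    y = EARTH_RADIUS_M * math.log(math.tan(math.pi / 4 + math.radians(lat) / 2))
    return x, y


# Made-up location in degrees.
x, y = lon_lat_to_web_mercator(32.58, 0.35)
print(round(x), round(y))
```

The resulting x value is in the millions of meters, which is the kind of magnitude that signals projected coordinates. For real work, a library such as pyproj handles the many available projections and their parameters.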

---------------------------------

## Ways of Working with GIS Data

Geospatial data can be worked with using different kinds of tools, depending on the task.

### Desktop software

Traditionally, GIS has been limited to desktop software. The two most widely used general-purpose GIS software packages today are:

- [ArcGIS Pro](https://www.esri.com/en-us/arcgis/products/arcgis-pro/overview) is a desktop GIS software by Esri, the biggest commercial geospatial company today. It's focused on user-friendliness, but its steep price point often limits access.
- [QGIS](https://qgis.org/) is a free and open-source GIS, used in many parts of the world and sectors. It's focused on being freely available and feature-rich. 

### Web-based GIS

With the growth of performant online applications, JavaScript mapping technologies such as [OpenLayers](https://openlayers.org/), [Leaflet](https://leafletjs.com/), and [Mapbox](https://www.mapbox.com/) have enabled fully fledged GIS applications in the cloud.

For DHIS2 users, there is the [Maps App](https://docs.dhis2.org/en/use/user-guides/dhis-core-version-239/analysing-data/maps.html). It comes packaged with multiple climate and environmental data sources that can be layered on top of each other and compared with DHIS2 organisation units and aggregated data elements. Although the Maps App computes climate data statistics for DHIS2 organisation units, this is primarily for visualization and exploration purposes; general GIS operations and computations are not possible.

### Programming Tools

Most programming languages also have a range of geospatial libraries for programmatically working with geospatial data. Python in particular has become very popular for working with geospatial data. This allows for greater automation than traditional GIS software. 

This is where Climate Tools fits in. 

--------------

## List of Resources

This guide has provided only a basic introduction to the most fundamental concepts you need to get started with GIS. For further details and more advanced topics, check out our [list of resources](../resources.md) which includes various training materials and books on the subject. 