Paper Link: https://doi.org/10.3390/rs18030466
TLDR: NOAH is a GenAI dataset that covers 8,742,469 km2 of non-overlapping land areas in Canada at 815 distinct locations under 5 modalities at 30 m spatial resolution, where each modality covers 40,000 km2.
Earth observation and Remote Sensing (RS) data are widely used in various applications, including natural disaster modeling and prediction.
Currently, there are two main types of satellites used in RS: geostationary and polar orbiting.
However, the coverage of geostationary satellites is limited to a smaller region.
Additionally, images from the polar orbiting satellites are discontinuous, which limits their effectiveness for real-time disaster modeling, especially in rapidly evolving situations like wildfires.
To address these limitations, we introduce Now Observation Assemble Horizon (NOAH), a multi-modal, sensor fusion dataset that combines Ground-Based Sensors (GBS) of weather stations with topography, vegetation (land cover, biomass, and crown cover), and fuel types.
NOAH is collated using publicly available Canadian data from Environment and Climate Change Canada (ECCC), Spatialized Canadian National Forest Inventory (SCANFI) and Landsat 8, which are well-maintained, documented, and reliable.
Models trained on NOAH can produce real-time data for disaster modeling in remote locations, complementing field instruments, and can also be used in Generative Artificial Intelligence (GenAI) applications.
The baseline model uses a UNet backbone with Feature-wise Linear Modulation (FiLM) injection of GBS data.
NOAH contains about 234K+ images of 100 MB+ each, totaling 20 TB+ of data. A mini version of NOAH is therefore available on Hugging Face. Access to the full dataset can be provided upon request; note that this requires shipping a physical hard drive. The code for the research is available on GitHub.
This repository contains the following code:
- Collation of NOAH and NOAH mini datasets
- Code for data preprocessing
- Code for data splitting
- Code for data visualization
- Code for data modelling with UNet + FiLM
The dataset covers 8,742,469 km2 of non-overlapping land areas in Canada at 815 distinct locations.
Each sample covers 40,000 km2.
The spatial resolution is 30 m.
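As a rough sanity check on these figures, the sketch below estimates the raster size of a single sample. It assumes each 40,000 km2 sample is a square footprint (an assumption, not stated above), which gives a 200 km side; the variable names are illustrative only.

```python
import math

TILE_AREA_KM2 = 40_000   # area of one sample, from the text above
RESOLUTION_M = 30        # spatial resolution, from the text above

# Assumption: the sample footprint is square.
side_km = math.sqrt(TILE_AREA_KM2)

# Number of 30 m pixels needed to span one side, rounded up.
pixels_per_side = math.ceil(side_km * 1000 / RESOLUTION_M)

# Pixels in a single-band raster covering the whole sample.
pixels_per_band = pixels_per_side ** 2

print(f"side: {side_km:.0f} km")
print(f"raster: {pixels_per_side} x {pixels_per_side} pixels")
print(f"pixels per band: {pixels_per_band:,}")
```

Under this assumption a single band holds roughly 44 million pixels, which helps explain why individual images are large.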
A figure showing the coverage is given below.

| Name | Provider | Link |
|---|---|---|
| Topography | SCANFI | Source |
| Land Cover of Canada | NRCan | Source |
| Biomass | SCANFI | Source |
| Crown Cover | SCANFI | Source |
| Fuel Types | NRCan | Source |
| Landsat 8 | NASA & USGS | Source |
| Weather Station | ECCC | Source |
A sample of the data with different modalities is shown in the figure below.

A sample of the smaller version of NOAH has been uploaded to Hugging Face.
Hugging Face: NOAH mini
To benchmark the results, a UNet + FiLM approach was used to account for the multi-modal dataset. The architecture is given below.
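To illustrate the idea behind FiLM, here is a minimal pure-Python sketch of the general technique, not the repository's implementation. A conditioning vector (standing in for weather-station readings) is mapped to one scale `gamma` and one shift `beta` per channel, which then modulate a feature map. All names, shapes, and parameter values are hypothetical.

```python
from typing import List

Vector = List[float]
FeatureMap = List[List[List[float]]]  # indexed as [channel][row][col]


def linear(weights: List[Vector], bias: Vector, x: Vector) -> Vector:
    """Dense layer: returns W @ x + b, with W given as a list of rows."""
    return [sum(w * v for w, v in zip(row, x)) + b for row, b in zip(weights, bias)]


def film(features: FeatureMap, gamma: Vector, beta: Vector) -> FeatureMap:
    """Scale and shift each channel c: out[c] = gamma[c] * features[c] + beta[c]."""
    return [
        [[g * value + b for value in row] for row in channel]
        for channel, g, b in zip(features, gamma, beta)
    ]


# Hypothetical conditioning vector, e.g. two normalised station readings.
station = [0.5, -1.0]

# Hypothetical parameters mapping 2 station values to 2 channels.
# In a trained model these would be learned.
w_gamma, b_gamma = [[1.0, 0.0], [0.0, 1.0]], [1.0, 1.0]
w_beta, b_beta = [[0.0, 2.0], [2.0, 0.0]], [0.0, 0.0]

gamma = linear(w_gamma, b_gamma, station)  # one scale per channel
beta = linear(w_beta, b_beta, station)     # one shift per channel

# A 2-channel, 2x2 feature map standing in for a UNet activation.
features = [
    [[1.0, 2.0], [3.0, 4.0]],
    [[1.0, 1.0], [1.0, 1.0]],
]
modulated = film(features, gamma, beta)
print(modulated)
```

In a UNet, a modulation of this kind would typically be applied to the activations of one or more convolutional blocks, so that the non-image sensor data can influence the image features without being rasterised.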
The following dependencies are needed to run the code:
- Python 3.10.12
- Jupyter Notebooks
- Docker
- Docker Compose
Ideally, use the Docker deployment to run the code, as all the dependencies are preinstalled.
The code is available as notebooks.
- Ensure port `8899` is open on your host machine.
- Ensure you have the `.env` values configured according to your system requirements.
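One way to check the port before starting the containers is sketched below. It interprets "open" as "not already in use by another process" and only probes the loopback interface, which is an approximation of what Docker will bind; the helper name `port_is_free` is hypothetical.

```python
import socket


def port_is_free(port: int, host: str = "127.0.0.1") -> bool:
    """Return True if a new socket can be bound to host:port right now."""
    with socket.socket(socket.AF_INET, socket.SOCK_STREAM) as probe:
        try:
            probe.bind((host, port))
        except OSError:
            # Binding failed, most likely because the port is in use.
            return False
    return True


if __name__ == "__main__":
    print("8899 is free" if port_is_free(8899) else "8899 is already in use")
```

If the port is reported as in use, stop the process holding it or change the published port in your deployment configuration.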
```bash
git clone https://github.com/Forest-Fire-Research/noah.git
cd noah
docker compose up -d
```
Open the notebook at localhost:8899 to run the code.
You need a running Jupyter Notebook environment and Python installed.
```bash
git clone https://github.com/Forest-Fire-Research/noah.git
cd noah
pip install -r requirements.txt
```
Once the requirements are installed, you can run the code in the notebooks.
```bibtex
@article{noah2026RemoteSensing,
  title={NOAH: A Multi-Modal and Sensor Fusion Dataset for Generative Artificial Intelligence in Remote Sensing},
  author={Abdul Mutakabbir and Chung-Horng Lung and Marzia Zaman and Darshana Upadhyay and Koreen Millard and Thambirajah Ravichandran and Richard Purcell},
  journal={Remote Sensing},
  year={2026},
  doi={10.3390/rs18030466}
}
```
This research is part of ongoing collaborative work between Carleton University, University of Waterloo, and Dalhousie University, with industry partners Cistel Technology and Hegyi Geomatics International Inc. Additional support was received from Research Computing Services at Carleton University.

