Code for the accepted paper at the KDD AI4Science Track:
Knowledge-Guided Learning for Global Carbon Flux Prediction: Integrating High-Level Remote Sensing with Bottom-Up Physical Modeling
DERE is a knowledge-guided learning framework for global carbon flux prediction. It integrates process-based model simulations, high-level remote sensing observations, and in-situ flux measurements to predict carbon flux variables, including GPP, RECO, and NEE.
Process-based ecosystem models provide important physical knowledge for carbon flux prediction, but they often have limited flexibility to directly incorporate increasingly available observations. In particular, high-level remote sensing observations, such as aggregated plant functional type information, cannot be directly matched with bottom-up sub-processes in ecosystem models.
DERE addresses this challenge through a decomposition-and-resembling design. It first decomposes simulation outputs into physically meaningful intermediate components, and then resembles them with high-level remote sensing labels and sparse in-situ observations. The framework further uses probabilistic label expansion and uncertainty-aware finetuning to improve the use of sparse carbon flux measurements.
- Knowledge-guided decomposition-and-resembling framework
- Integration of process-based simulations, remote sensing labels, and in-situ observations
- High-level remote sensing supervision for bottom-up sub-processes
- CSDI-based probabilistic label expansion for sparse in-situ flux observations
- Uncertainty-aware fine-tuning with imputed observations
- Baseline and KGML implementations across multiple time-series backbones
- Comparison models including Transformer, Informer, FEDformer, iTransformer, TimeXer, and SimpleTM
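To make the idea of uncertainty-aware fine-tuning with imputed labels concrete, here is a minimal NumPy sketch. It is not the loss used in the paper: the function name `uncertainty_weighted_mse`, the array shapes, and the `1 / (1 + variance)` weighting rule are all assumptions chosen for illustration. It only shows one common way to down-weight imputed labels according to the spread of probabilistic imputation samples.

```python
import numpy as np


def uncertainty_weighted_mse(pred, samples, observed_mask):
    """Illustrative loss (hypothetical, not the paper's formulation).

    pred:          (T,) model predictions for one flux variable
    samples:       (S, T) probabilistic imputation samples of the label
    observed_mask: (T,) bool, True where a real in-situ measurement exists
    """
    label = samples.mean(axis=0)   # point estimate of the label
    var = samples.var(axis=0)      # spread across samples = uncertainty

    # Assumed weighting rule: observed points keep full weight,
    # imputed points are down-weighted as their variance grows.
    weights = np.where(observed_mask, 1.0, 1.0 / (1.0 + var))

    sq_err = (pred - label) ** 2
    return float(np.sum(weights * sq_err) / np.sum(weights))


if __name__ == "__main__":
    pred = np.zeros(3)
    samples = np.array([[1.0, 0.0, 0.0],
                        [1.0, 2.0, 0.0]])
    mask = np.array([True, False, True])
    print(uncertainty_weighted_mse(pred, samples, mask))
```

In this toy case the middle time step is imputed with variance 1, so it contributes with weight 0.5 instead of 1.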
DERE/
├── README.md
├── .gitignore
├── DERE-main/ # Main DERE pipeline and Transformer/Informer experiments
│ ├── 01_Baseline_Informer.py
│ ├── 01_Baseline_Transformer.py
│ ├── 02_KGML_Informer.py
│ ├── 02_KGML_Transformer.py
│ ├── Step01_DERE_Train_3PureModels_CompetitionModel.py
│ ├── Step02_DERE_Finetune_CompetitionModel.py
│ ├── Step03_DERE_Train_PFTModel.py
│ ├── Step04_DERE_Finetune_with_InSitu.py
│ ├── Step05_DERE_InSitu_imputation_CSDI-main/
│ ├── Step06_DERE_Finetune_with_InSitu_imputation.py
│ ├── data/
│ ├── exp/
│ ├── models/
│ └── utils/
├── FEDformer_iTransformer_TimeXer/ # FEDformer, iTransformer, and TimeXer experiments
└── SimpleTM/ # SimpleTM experiments
The main code for DERE is located in DERE-main/. This folder contains the proposed DERE pipeline, Transformer/Informer baselines, KGML variants, model definitions, experiment scripts, and utility functions.
The folders FEDformer_iTransformer_TimeXer/ and SimpleTM/ contain additional comparison models used in the paper.
The proposed DERE framework is implemented as a sequential pipeline:
- Train pure component models and the competition model: DERE-main/Step01_DERE_Train_3PureModels_CompetitionModel.py
- Finetune the competition model: DERE-main/Step02_DERE_Finetune_CompetitionModel.py
- Train the PFT model using high-level remote sensing labels: DERE-main/Step03_DERE_Train_PFTModel.py
- Finetune with in-situ carbon flux observations: DERE-main/Step04_DERE_Finetune_with_InSitu.py
- Perform CSDI-based in-situ label imputation: DERE-main/Step05_DERE_InSitu_imputation_CSDI-main/
- Finetune with imputed in-situ observations: DERE-main/Step06_DERE_Finetune_with_InSitu_imputation.py
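The steps above can be chained with a small driver script. The sketch below is not part of the repository. It assumes each `.py` step can be launched without arguments from inside DERE-main/, which may not hold if a script expects command-line options or edited paths. The entry point inside the Step05 folder is not documented here, so that step is only flagged for manual handling. The helper names `build_commands` and `run_pipeline` are hypothetical.

```python
import subprocess
import sys

# Step names come from the repository layout; the order follows the pipeline.
PIPELINE = [
    "Step01_DERE_Train_3PureModels_CompetitionModel.py",
    "Step02_DERE_Finetune_CompetitionModel.py",
    "Step03_DERE_Train_PFTModel.py",
    "Step04_DERE_Finetune_with_InSitu.py",
    "Step05_DERE_InSitu_imputation_CSDI-main/",  # directory, entry point unknown
    "Step06_DERE_Finetune_with_InSitu_imputation.py",
]


def build_commands(python=sys.executable):
    """Return (step, command) pairs; command is None for manual steps."""
    commands = []
    for step in PIPELINE:
        if step.endswith(".py"):
            commands.append((step, [python, step]))
        else:
            commands.append((step, None))
    return commands


def run_pipeline(root="DERE-main", dry_run=True):
    """Print each step in order; execute it only when dry_run is False."""
    for step, cmd in build_commands():
        if cmd is None:
            print(f"[manual] {step}: run the CSDI imputation in this folder")
            continue
        print(f"[run] {' '.join(cmd)}")
        if not dry_run:
            # Assumption: scripts resolve relative paths from DERE-main/.
            subprocess.run(cmd, cwd=root, check=True)


if __name__ == "__main__":
    run_pipeline(dry_run=True)
```

Keeping `dry_run=True` prints the planned order without starting any training, which is a cheap way to check the sequence before committing compute.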
The repository includes the baseline and KGML models used in the paper.
| Model | Location |
|---|---|
| Transformer | DERE-main/ |
| Informer | DERE-main/ |
| FEDformer | FEDformer_iTransformer_TimeXer/ |
| iTransformer | FEDformer_iTransformer_TimeXer/ |
| TimeXer | FEDformer_iTransformer_TimeXer/ |
| SimpleTM | SimpleTM/ |
Example scripts include:
DERE-main/01_Baseline_Transformer.py
DERE-main/01_Baseline_Informer.py
DERE-main/02_KGML_Transformer.py
DERE-main/02_KGML_Informer.py
FEDformer_iTransformer_TimeXer/01_Baseline_FEDformer.py
FEDformer_iTransformer_TimeXer/01_Baseline_iTransformer.py
FEDformer_iTransformer_TimeXer/01_Baseline_TimeXer.py
FEDformer_iTransformer_TimeXer/02_KGML_FEDformer.py
FEDformer_iTransformer_TimeXer/02_KGML_iTransformer.py
FEDformer_iTransformer_TimeXer/02_KGML_TimeXer.py
SimpleTM/01_Baseline_SimpleTM.py
SimpleTM/02_KGML_SimpleTM.py
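The script names follow a regular pattern, `01_Baseline_<Model>.py` and `02_KGML_<Model>.py`, inside the folder listed for each model. As a convenience, the sketch below rebuilds a script path from a model name and a variant. The helper `script_path` is hypothetical and simply encodes the naming pattern visible in the list above.

```python
from pathlib import PurePosixPath

# Folder for each model, as given in the model table.
MODEL_DIRS = {
    "Transformer": "DERE-main",
    "Informer": "DERE-main",
    "FEDformer": "FEDformer_iTransformer_TimeXer",
    "iTransformer": "FEDformer_iTransformer_TimeXer",
    "TimeXer": "FEDformer_iTransformer_TimeXer",
    "SimpleTM": "SimpleTM",
}

# Numeric prefix used for each variant in the script names.
VARIANT_PREFIX = {"Baseline": "01", "KGML": "02"}


def script_path(model, variant):
    """Return the repository-relative path of a baseline or KGML script."""
    if model not in MODEL_DIRS:
        raise KeyError(f"unknown model: {model}")
    if variant not in VARIANT_PREFIX:
        raise KeyError(f"unknown variant: {variant}")
    name = f"{VARIANT_PREFIX[variant]}_{variant}_{model}.py"
    return (PurePosixPath(MODEL_DIRS[model]) / name).as_posix()


if __name__ == "__main__":
    for model in MODEL_DIRS:
        for variant in VARIANT_PREFIX:
            print(script_path(model, variant))
```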
The experiments use publicly available data sources, including in-situ carbon flux observations, process-based model simulations, and remote-sensing-derived plant functional type labels.
The in-situ flux observations are based on public carbon flux benchmark datasets such as CarbonSense. The process-based model data include input conditions and simulation outputs from CarbonGlobe. The high-level remote sensing labels are derived from ESA CCI plant functional type products.
Due to data size and data-sharing considerations, processed datasets are not included in this repository. Please refer to the paper for detailed data sources, preprocessing procedures, and citations. Before running the experiments, please update the default data paths in the scripts, such as Dataset/DeepED_dataset, according to your local data location.
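Since the datasets must be supplied locally, it can help to confirm the data directory exists before launching a long run. The sketch below is an optional pre-flight check, not something the repository scripts do themselves. The environment variable `DERE_DATA_DIR` and the function `resolve_data_dir` are hypothetical names introduced here, and the scripts still need their default paths edited as described above.

```python
import os
from pathlib import Path

# Default path mentioned in this README; adjust to your local layout.
DEFAULT_DATA_DIR = "Dataset/DeepED_dataset"


def resolve_data_dir(override=None):
    """Pick a data directory and fail early if it does not exist.

    Priority: explicit argument, then the (hypothetical) DERE_DATA_DIR
    environment variable, then the default path.
    """
    candidate = override or os.environ.get("DERE_DATA_DIR") or DEFAULT_DATA_DIR
    path = Path(candidate).expanduser()
    if not path.is_dir():
        raise FileNotFoundError(
            f"Data directory not found: {path}. "
            "Update the data path in the scripts to match your local copy."
        )
    return path


if __name__ == "__main__":
    try:
        print(f"Using data directory: {resolve_data_dir()}")
    except FileNotFoundError as err:
        print(err)
```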
If you find this repository useful, please cite:
@inproceedings{xu2026knowledge,
author = {Shuo Xu and Zhihao Wang and Ruohan Li and Ruichen Wang and Lei Ma and George C. Hurtt and Xiaowei Jia and Yiqun Xie},
title = {Knowledge-Guided Learning for Global Carbon Flux Prediction: Integrating High-Level Remote Sensing with Bottom-Up Physical Modeling},
booktitle = {Proceedings of the 32nd ACM SIGKDD Conference on Knowledge Discovery and Data Mining V.2},
year = {2026},
address = {Jeju Island, Republic of Korea},
publisher = {ACM},
doi = {10.1145/3770855.3818927}
}
For questions or feedback, feel free to reach out:
- Shuo Xu — shuoxu98@umd.edu
- Yiqun Xie — xie@umd.edu