FWDNNet: Cross-Heterogeneous Encoder Fusion via Feature-Level TensorDot Operations for Land-Cover Mapping
✨ Official PyTorch Implementation ✨
🎉 31-December-2025: Accepted at IEEE Transactions on Geoscience and Remote Sensing (IEEE TGRS) 🎉
Key Features • Getting Started • Datasets • Results • Citation • Contact
+ 🎉 31-December-2025: FWDNNet accepted for publication in IEEE TGRS!
+ 📝 October-November 2025: Manuscript submitted to IEEE TGRS
+ 📂 October 2023: sKwanda_V1,2 datasets publicly released
+ 💻 March 2023: Code and datasets developed

| Lead Authors |
|---|
| Boaz Mwubahimana¹ · Graduate Student Member, IEEE |

| Co-Authors |
|---|
| Swalpa Kumar Roy³ · Senior Member, IEEE |
1. State Key Laboratory of Information Engineering in Surveying, Mapping and Remote Sensing (LIESMARS), Wuhan University, China
2. Xinjiang Astronomical Observatory, Chinese Academy of Sciences, China
3. Department of Computer Science and Engineering, Tezpur University, India
4. Nicholas School of the Environment, Duke University, USA
5. Center for Geographic Information Systems and Remote Sensing (CGIS), University of Rwanda
6. Department of Geosciences and Natural Resource Management, University of Copenhagen, Denmark
7. College of Geography and Remote Sensing, Hohai University, China
8. AIMS Research and Innovation Centre & African Centre of Excellence in Data Science, University of Rwanda
9. Rwanda Environment Management Authority (REMA), Rwanda
10. WaterAid Rwanda, Kigali, Rwanda
11. Water for People Rwanda, Kigali, Rwanda
12. College of Engineering, Carnegie Mellon University, Rwanda
13. Department of Environmental Sciences, Emory University, USA
📧 Corresponding Authors:
- Yan Jianguo (jgyan@whu.edu.cn)
- Dingruibo Miao (miaodrb@whu.edu.cn)
We present FWDNNet, a novel encoder-decoder architecture that integrates heterogeneous deep learning backbones through innovative TensorDot fusion modules for high-resolution land cover mapping. Unlike traditional fusion approaches that rely on simple concatenation or averaging, FWDNNet preserves tensor structures while enabling adaptive, probabilistic feature weighting across five specialized backbone encoders.
- TensorDot Fusion: High-order multilinear transformations that capture complex inter-architectural dependencies
- Probabilistic Attention: Variational inference-based adaptive backbone weighting
- Heterogeneous Integration: Seamless fusion of CNNs (ResNet34, InceptionV3, VGG16, EfficientNet-B3) and Transformers (Swin-T)
| State-of-the-Art Accuracy | Superior Segmentation | Inference Efficiency | Resource Efficient |
|---|---|---|---|
| +2.2% over best baseline | +1.7% mIoU improvement | 58.2 ms per image | 12.85 GB GPU memory |
| Metric | FWDNNet | Best Baseline | Improvement |
|---|---|---|---|
| 🎯 Overall Accuracy | 95.3% | 93.1% | +2.2% ⬆️ |
| 📊 mean IoU (mIoU) | 91.8% | 90.1% | +1.7% ⬆️ |
| ⚡ Inference Time | 58.2 ms | 73.8 ms | -21.1% ⬇️ |
| 💾 Memory Usage | 12.85 GB | 95.74 GB | -86.6% ⬇️ |
| 🔢 Parameters | 35.0M | 41.0M | -14.6% ⬇️ |
| 🔄 Transfer Score | 97.1% | 92.3% | +4.8% ⬆️ |
```mermaid
graph TB
    A[Input Image<br/>H×W×C] --> B1[ResNet34]
    A --> B2[InceptionV3]
    A --> B3[VGG16]
    A --> B4[EfficientNet-B3]
    A --> B5[Swin Transformer]
    B1 --> C[TensorDot<br/>Fusion Module]
    B2 --> C
    B3 --> C
    B4 --> C
    B5 --> C
    C --> D[Probabilistic<br/>Attention]
    D --> E[Tucker<br/>Decomposer]
    E --> F[Unified<br/>Decoder]
    F --> G[Segmentation<br/>Output]
    style C fill:#ff9999
    style D fill:#99ccff
    style E fill:#99ff99
    style F fill:#ffcc99
```
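For orientation, the sketch below mirrors the dataflow in the diagram as a minimal PyTorch module. All sub-module classes (`fusion`, `attention`, `decomposer`, `decoder`) are hypothetical stand-ins, not the repository's actual components or signatures.

```python
import torch.nn as nn

class FWDNNetSketch(nn.Module):
    """Dataflow sketch of the diagram above; illustrative only."""

    def __init__(self, encoders, fusion, attention, decomposer, decoder):
        super().__init__()
        self.encoders = nn.ModuleList(encoders)  # five heterogeneous backbones
        self.fusion = fusion                     # TensorDot fusion module
        self.attention = attention               # probabilistic attention
        self.decomposer = decomposer             # Tucker decomposer
        self.decoder = decoder                   # unified decoder

    def forward(self, x):
        feats = [enc(x) for enc in self.encoders]  # parallel feature extraction
        fused = self.fusion(feats)                 # high-order feature fusion
        weighted = self.attention(fused)           # adaptive re-weighting
        compact = self.decomposer(weighted)        # low-rank compression
        return self.decoder(compact)               # segmentation logits
```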
1️⃣ Heterogeneous Encoders (Click to expand)
Five specialized backbone networks for parallel feature extraction:
| Encoder | Purpose | Key Feature |
|---|---|---|
| 🔴 ResNet34 | Residual Learning | Deep feature extraction |
| 🟠 InceptionV3 | Multi-scale | Multiple receptive fields |
| 🔵 VGG16 | Hierarchical | Layer-wise features |
| 🟢 EfficientNet-B3 | Efficiency | Compound scaling |
| 🟣 Swin Transformer | Global Context | Shifted window attention |
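As one way to realize these five backbones, the sketch below instantiates them as multi-scale feature extractors with `timm`. This is an assumption for illustration: the repository may construct its encoders differently, and with recent `timm` releases Swin needs its input size declared when it differs from 224.

```python
import torch
import timm  # assumption: a recent timm release with features_only support

BACKBONES = ['resnet34', 'inception_v3', 'vgg16', 'efficientnet_b3']

encoders = [timm.create_model(name, pretrained=True, features_only=True)
            for name in BACKBONES]
# Swin Transformer: declare the 512×512 input size explicitly.
encoders.append(timm.create_model('swin_tiny_patch4_window7_224',
                                  pretrained=True, features_only=True,
                                  img_size=512))

x = torch.randn(1, 3, 512, 512)
features = [enc(x) for enc in encoders]  # each: a list of pyramid feature maps
```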
2️⃣ TensorDot Fusion Module (Click to expand)
Mathematical Formulation:
$$\mathcal{T}_{\text{fused}} = \mathcal{G} \times_1 \mathcal{T}_1 \times_2 \mathcal{T}_2 \cdots \times_M \mathcal{T}_M$$
- Preserves tensor structure
- Captures high-order interactions
- Learnable core tensor $\mathcal{G}$ (a minimal PyTorch sketch follows below)
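A minimal sketch of the contraction above, assuming each backbone contributes a pooled feature vector and fixing $M = 3$ for brevity; the class name, dimensions, and `einsum` realization are illustrative, not the repository's API.

```python
import torch
import torch.nn as nn

class TensorDotFusionSketch(nn.Module):
    """Tucker-style fusion for M = 3 backbones:
    T_fused = G x1 f1 x2 f2 x3 f3, with a learnable core tensor G."""

    def __init__(self, d1, d2, d3, d_out):
        super().__init__()
        # learnable core tensor G: one mode per backbone plus an output mode
        self.core = nn.Parameter(0.02 * torch.randn(d1, d2, d3, d_out))

    def forward(self, f1, f2, f3):
        # f1: (B, d1), f2: (B, d2), f3: (B, d3); contract every mode of G
        return torch.einsum('bi,bj,bk,ijko->bo', f1, f2, f3, self.core)

fusion = TensorDotFusionSketch(64, 64, 64, 256)
out = fusion(torch.randn(2, 64), torch.randn(2, 64), torch.randn(2, 64))
print(out.shape)  # torch.Size([2, 256])
```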
3️⃣ Probabilistic Attention (Click to expand)
Variational Inference Weighting:
$$\mathbf{w}_{\text{att}} = \mathrm{softmax}\big(f_{\theta}(\mathcal{T}_1, \mathcal{T}_2, \ldots, \mathcal{T}_M)\big)$$
- Adaptive backbone selection
- Scene-dependent weighting
- Reduces feature redundancy (see the sketch below)
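The sketch below is a deterministic simplification of this weighting: a small MLP $f_\theta$ scores each backbone's pooled features and a temperature softmax yields the weights. The paper's variational machinery is reduced here to its mean behavior, and all names are illustrative.

```python
import torch
import torch.nn as nn
import torch.nn.functional as F

class ProbabilisticAttentionSketch(nn.Module):
    """Softmax backbone weighting: w_att = softmax(f_theta(...) / tau)."""

    def __init__(self, channels, temperature=0.1):
        super().__init__()
        self.tau = temperature
        self.score = nn.Sequential(          # f_theta: (B, C) -> (B, 1)
            nn.Linear(channels, channels // 4),
            nn.ReLU(inplace=True),
            nn.Linear(channels // 4, 1),
        )

    def forward(self, feats):
        # feats: list of M channel-aligned maps, each of shape (B, C, H, W)
        pooled = [f.mean(dim=(2, 3)) for f in feats]             # M x (B, C)
        logits = torch.cat([self.score(p) for p in pooled], 1)   # (B, M)
        w = F.softmax(logits / self.tau, dim=1)                  # (B, M)
        stacked = torch.stack(feats, dim=1)                      # (B, M, C, H, W)
        return (w[:, :, None, None, None] * stacked).sum(dim=1)  # (B, C, H, W)
```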
4️⃣ Multi-Objective Loss (Click to expand)
Comprehensive Loss Function:
$$\mathcal{L}_{\text{total}} = \mathcal{L}_{\text{focal}} + \lambda_1 \mathcal{L}_{\text{consist}} + \lambda_2 \mathcal{L}_{\text{uncert}} + \lambda_3 \mathcal{L}_{\text{div}} + \lambda_4 \mathcal{L}_{\text{sparse}} + \lambda_5 \mathcal{L}_{\text{bound}}$$
- Focal loss for class imbalance
- Consistency regularization
- Uncertainty estimation
- Diversity promotion
- Boundary preservation (a combined-loss sketch follows below)
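A sketch of how such a weighted sum could be assembled, assuming the individual regularizers are computed elsewhere; the focal term uses the standard multi-class formulation (γ = 2 is a common default, not necessarily the paper's setting).

```python
import torch
import torch.nn.functional as F

def focal_loss(logits, target, gamma=2.0):
    """Standard multi-class focal loss: mean of (1 - p_t)^gamma * CE."""
    ce = F.cross_entropy(logits, target, reduction='none')
    pt = torch.exp(-ce)  # probability the model assigns to the true class
    return ((1.0 - pt) ** gamma * ce).mean()

def total_loss(terms, lambdas):
    """Weighted sum mirroring L_total; `terms` holds the precomputed losses
    and `lambdas` the weights for consist/uncert/div/sparse/bound."""
    return terms['focal'] + sum(lambdas[k] * terms[k] for k in lambdas)
```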
| Urban Landscapes | Agricultural Lands | Great Plains |
|---|---|---|
| Dubai | Nyagatare | Oklahoma |
```
data/
├── Dubai/
│   ├── train/
│   │   ├── images/   # 512×512 RGB patches
│   │   └── labels/   # Ground truth masks
│   ├── val/
│   │   ├── images/
│   │   └── labels/
│   └── test/
│       ├── images/
│       └── labels/
├── Nyagatare/
│   └── [same structure]
└── Oklahoma/
    └── [same structure]
```
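A minimal `Dataset` sketch matching this layout; the class and argument names are illustrative, and the repository's own loader (and its augmentation pipeline) may differ.

```python
from pathlib import Path

from PIL import Image
from torch.utils.data import Dataset

class LandCoverPatches(Dataset):
    """Reads paired image/label patches from data/<region>/<split>/."""

    def __init__(self, root='data', region='Dubai', split='train',
                 transform=None):
        base = Path(root) / region / split
        self.images = sorted((base / 'images').iterdir())
        self.labels = sorted((base / 'labels').iterdir())
        assert len(self.images) == len(self.labels)
        self.transform = transform

    def __len__(self):
        return len(self.images)

    def __getitem__(self, idx):
        image = Image.open(self.images[idx]).convert('RGB')
        label = Image.open(self.labels[idx])
        if self.transform is not None:
            image, label = self.transform(image, label)
        return image, label
```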
| Core Dependencies | Additional Packages |
|---|---|
```bash
# 1️⃣ Clone the repository
git clone https://github.com/YourUsername/FWDNNet.git
cd FWDNNet

# 2️⃣ Create conda environment
conda create -n fwdnnet python=3.7 -y
conda activate fwdnnet

# 3️⃣ Install PyTorch (CUDA 10.1)
conda install pytorch==1.7.1 torchvision==0.8.2 cudatoolkit=10.1 -c pytorch

# 4️⃣ Install other dependencies
pip install -r requirements.txt

# 5️⃣ Verify installation
python -c "import torch; print(f'PyTorch: {torch.__version__}, CUDA: {torch.cuda.is_available()}')"
```

```bash
# Download datasets and models
bash scripts/download_data.sh
# Prepare dataset
python utils/prepare_dataset.py --dataset all
# Run quick test
python test_installation.py
```

```bash
# Train on Dubai dataset (default config)
python train.py --dataset Dubai
# Train with custom config
python train.py --config configs/fwdnnet_dubai.yaml
# Multi-GPU training
python -m torch.distributed.launch --nproc_per_node=4 train.py --dataset Dubai
```

📋 Configuration File Example (Click to expand)
```yaml
# configs/fwdnnet_dubai.yaml
model:
  name: FWDNNet
  encoders:
    - resnet34
    - inception_v3
    - vgg16
    - efficientnet_b3
    - swin_t
  fusion:
    type: tensordot
    tucker_rank: [64, 64, 64]
  attention:
    type: probabilistic
    temperature: 0.1

training:
  batch_size: 16
  epochs: 200
  learning_rate: 1e-3
  optimizer:
    name: AdamW
    betas: [0.9, 0.999]
    weight_decay: 0.01
  scheduler:
    name: ExponentialLR
    gamma: 0.95
    step: 10
  early_stopping:
    patience: 20
    monitor: val_miou

loss:
  focal_weight: 1.0
  consistency_weight: 0.5
  uncertainty_weight: 0.3
  diversity_weight: 0.2
  boundary_weight: 0.4

data:
  input_size: [512, 512]
  num_classes: 4
  augmentation:
    flip: 0.5
    rotate: 45
    elastic: true
    gaussian_noise: 0.02
```

```bash
# With Weights & Biases
python train.py --dataset Dubai --use_wandb
# With TensorBoard
python train.py --dataset Dubai --use_tensorboard
tensorboard --logdir=runs/
```

```bash
# Resume from checkpoint
python train.py --dataset Dubai --resume checkpoints/fwdnnet_epoch_50.pth
# Resume with different learning rate
python train.py --resume checkpoints/fwdnnet_epoch_50.pth --lr 1e-4
```

```bash
# Evaluate on test set
python test.py --dataset Dubai --checkpoint checkpoints/fwdnnet_best.pth
# Evaluate with visualization
python test.py --dataset Dubai --checkpoint checkpoints/fwdnnet_best.pth --visualize
# Cross-domain evaluation
python test.py \
--source_dataset Dubai \
--target_dataset Nyagatare \
--checkpoint checkpoints/fwdnnet_dubai.pth
```

```bash
# Single image inference
python inference.py \
--input path/to/image.tif \
--checkpoint checkpoints/fwdnnet_best.pth \
--output results/prediction.png
# Batch inference
python inference.py \
--input_dir path/to/images/ \
--checkpoint checkpoints/fwdnnet_best.pth \
--output_dir results/
# Large-scale inference (tiled processing)
python inference_large.py \
--input large_image.tif \
--checkpoint checkpoints/fwdnnet_best.pth \
--tile_size 512 \
--overlap 50 \
--output result_mosaic.tif
```
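Conceptually, the tiled pass slides an overlapping window over the raster, accumulates class scores, and averages where tiles overlap. The NumPy sketch below illustrates the idea; it is not `inference_large.py` itself, and it assumes the raster is at least one tile in each dimension.

```python
import numpy as np

def predict_tiled(model_fn, image, tile=512, overlap=50):
    """model_fn maps a (tile, tile, C) patch to (tile, tile, K) class scores."""
    H, W = image.shape[:2]
    step = tile - overlap
    scores, counts = None, np.zeros((H, W, 1), dtype=np.float32)
    for y in range(0, max(H - overlap, 1), step):
        for x in range(0, max(W - overlap, 1), step):
            y0, x0 = min(y, H - tile), min(x, W - tile)  # clamp border tiles
            s = model_fn(image[y0:y0 + tile, x0:x0 + tile])
            if scores is None:
                scores = np.zeros((H, W, s.shape[-1]), dtype=np.float32)
            scores[y0:y0 + tile, x0:x0 + tile] += s
            counts[y0:y0 + tile, x0:x0 + tile] += 1.0
    return (scores / counts).argmax(-1)  # per-pixel class map
```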
| Model | Accuracy (%) | mIoU (%) | F1-Score | Inference (ms) | Params (M) | Memory (GB) |
|---|---|---|---|---|---|---|
| ResNet-34 | 93.5 | 89.0 | 0.800 | 45.2 | 24.0 | 93.30 |
| InceptionV3 | 80.1 | 84.0 | 0.832 | 52.7 | 30.0 | 114.19 |
| VGG-16 | 82.0 | 88.0 | 0.729 | 38.4 | 24.0 | 90.61 |
| EfficientNet-B3 | 91.8 | 87.3 | 0.825 | 28.6 | 12.0 | 45.67 |
| Swin-T | 89.2 | 86.1 | 0.847 | 67.3 | 28.0 | 78.32 |
| SegFormer-B2 | 92.4 | 88.7 | 0.859 | 41.2 | 25.0 | 62.48 |
| HRNet-W32 | 94.1 | 90.1 | 0.862 | 73.8 | 41.0 | 95.74 |
| FWDNNet | 95.3 | 91.8 | 0.876 | 58.2 | 35.0 | 12.85 |
| Dubai | Nyagatare | Oklahoma |
|---|---|---|

(per-dataset qualitative result figures)
| Source → Target | Source mIoU | Target mIoU | Transfer Score |
|---|---|---|---|
| Dubai → Nyagatare | 92.4% | 90.1% | 97.5% |
| Dubai → Oklahoma | 92.4% | 89.3% | 96.6% |
| Nyagatare → Oklahoma | 91.1% | 89.8% | 98.6% |
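The transfer scores above are consistent with the simple ratio of target to source performance:

$$\text{Transfer Score} = \frac{\text{mIoU}_{\text{target}}}{\text{mIoU}_{\text{source}}} \times 100\%, \qquad \text{e.g. } \frac{90.1}{92.4} \approx 97.5\%.$$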
| Configuration | mIoU (%) | Ξ mIoU |
|---|---|---|
| Single Encoder (ResNet34) | 89.0 | - |
| Multi-Encoder (Avg) | 90.1 | +1.1% |
| + TensorDot Fusion | 91.3 | +2.3% |
| + Probabilistic Attention | 91.8 | +2.8% |
| Full FWDNNet | 91.8 | +2.8% |
| Metric | FWDNNet | HRNet-W32 | Improvement |
|---|---|---|---|
| ⏱️ Training Time | 6.2 h | 13.4 h | -53.7% ⬇️ |
| 💾 Memory Usage | 12.85 GB | 95.74 GB | -86.6% ⬇️ |
| 🔢 Parameters | 35.0M | 41.0M | -14.6% ⬇️ |
| 🔥 FLOPs | 45.2G | 52.1G | -13.2% ⬇️ |
| 🚀 Throughput | 17.2 img/s | 13.5 img/s | +27.4% ⬆️ |
(qualitative comparison figures)

(regional-scale inference results)
This work was supported by:
- National Natural Science Foundation of China (Grant Nos. 42241116 and 42071332)
- National Key R&D Program of China (Grant Nos. 2022YFF0503202 and 2022YFB3903605)
- Macau Science and Technology Development Fund (SKL-LPS(MUST)-2021-2023)
- Xinjiang Heaven Lake Talent Program (2022)
We acknowledge support from:
- State Key Laboratory of Information Engineering in Surveying, Mapping and Remote Sensing (LIESMARS), Wuhan University
- National Land Authority of Rwanda
- Mohammed Bin Rashid Space Centre (MBRSC) for satellite imagery
- U.S. Department of Agriculture's National Agriculture Imagery Program (NAIP) for aerial imagery
If you find this work useful in your research, please consider citing:
```bibtex
@ARTICLE{11343844,
author={Mwubahimana, Boaz and Jianguo, Yan and Miao, Dingruibo and Roy, Swalpa Kumar and Li, Zhuohong and Ma, Le and Kagoyire, Clarisse and Guo, Haonan and Mugabowindekwe, Maurice and Nyandwi, Elias and Nzayisenga, Isaac and Athanase, Hafashimana and Maridadi, Eugene and Nsengiyumva, Jean Baptiste and Byukusenge, Elie and Dukundane, Remy and Rwanyiziri, Gaspard and Huang, Xiao},
journal={IEEE Transactions on Geoscience and Remote Sensing},
title={FWDNNet: Cross-Heterogeneous Encoder Fusion via Feature-Level TensorDot Operations for Land-Cover Mapping},
year={2026},
volume={},
number={},
pages={1-1},
keywords={Remote sensing;Transformers;Computer architecture;Feature extraction;Semantic segmentation;Computational modeling;Computational efficiency;Semantics;Land surface;Faces;Convolutional neural networks (CNNs);deep learning;CNN-to-token conversion;TensorDot fusion;remote sensing (RS) segmentation},
doi={10.1109/TGRS.2026.3652451}}
```

Our previous works on land cover mapping:

```bibtex
@ARTICLE{11124258,
author={Mwubahimana, Boaz and Jianguo, Yan and Miao, Dingruibo and Li, Zhuohong and Guo, Haonan and Ma, Le and Mugabowindekwe, Maurice and Roy, Swalpa Kumar and Huang, Xiao and Nyandwi, Elias and Joseph, Tuyishimire and Habineza, Eric and Mwizerwa, Fidele and Athanase, Hafashimana and Rwanyiziri, Gaspard},
journal={IEEE Transactions on Geoscience and Remote Sensing},
title={C2FNet: Cross-Probabilistic Weak Supervision Learning for High-Resolution Land Cover Enhancement},
year={2025},
volume={63},
number={},
pages={1-30},
keywords={Spatial resolution;Remote sensing;Land surface;Weak supervision;Training;Feature extraction;Annotations;Image resolution;Noise measurement;Earth;Coarse-to-fine networks (C2FNets);cross-resolution learning;deep neural networks;Earth observation;land cover mapping;probabilistic supervision;remote sensing;weakly supervised learning (WSL)},
doi={10.1109/TGRS.2025.3598681}}
@article{Mwubahimana19052025,
author = {Boaz Mwubahimana and Yan Jianguo and Maurice Mugabowindekwe and Xiao Huang and Elias Nyandwi and Joseph Tuyishimire and Eric Habineza and Fidele Mwizerwa and Dingruibo Miao},
title = {Vision transformer-based feature harmonization network for fine-resolution land cover mapping},
journal = {International Journal of Remote Sensing},
volume = {46},
number = {10},
pages = {3736--3769},
year = {2025},
publisher = {Taylor \& Francis},
doi = {10.1080/01431161.2025.2491816}
}
```

- 🔬 VHF-ParaNet: Vision Transformers Feature Harmonization [Code] [Paper]
- 🌍 GLC10 Dataset: Global Land Cover at 10m resolution [Link]
- 🛰️ Google Earth Engine: Satellite imagery access [Link]
- 🗺️ ESRI Land Cover: Global land cover products [Link]
For questions, collaborations, or issues:
📧 Corresponding Authors:
- Yan Jianguo: jgyan@whu.edu.cn
- Dingruibo Miao: miaodrb@whu.edu.cn
📧 Lead Author:
- Boaz Mwubahimana: aiboaz1896@gmail.com | m.boaz@whu.edu.cn
🐛 Issues & Contributions:
- Open an Issue
- Submit a Pull Request
Copyright (c) 2025 Wuhan University, State Key Laboratory of LIESMARS
This code and datasets are released for NON-COMMERCIAL and RESEARCH purposes only.
For commercial applications, please contact the corresponding authors:
- Yan Jianguo (jgyan@whu.edu.cn)
- Dingruibo Miao (miaodrb@whu.edu.cn)
Licensed under the MIT License for research purposes.
IEEE Style:

B. Mwubahimana et al., "FWDNNet: Cross-Heterogeneous Encoder Fusion via Feature-Level TensorDot Operations for Land-Cover Mapping," IEEE Transactions on Geoscience and Remote Sensing, 2026, doi: 10.1109/TGRS.2026.3652451.

APA Style:

Mwubahimana, B., Yan, J., Miao, D., Roy, S. K., Li, Z., Ma, L., ... & Huang, X. (2026). FWDNNet: Cross-Heterogeneous Encoder Fusion via Feature-Level TensorDot Operations for Land-Cover Mapping. IEEE Transactions on Geoscience and Remote Sensing. https://doi.org/10.1109/TGRS.2026.3652451



