From 2a383a070395ac14eda24af72eb4524729fc5ae2 Mon Sep 17 00:00:00 2001
From: "d.pascualhe"
Neurocomputing, 2024
David Pascual-Hernández¹, Sergio Paniego¹, Roberto Calvo-Palomino¹
Expert Systems with Applications, 2026
David Pascual-Hernández¹, Sergio Paniego¹, Roberto Calvo-Palomino¹, Inmaculada Mora-Jiménez¹, Jose Maria Cañas-Plaza¹
DOI: 10.1016/j.eswa.2026.132656
## Abstract

Intelligent autonomous driving in off-road environments is an emerging field with great potential to impact areas such as agriculture, forestry, and rescue operations. Perception in these scenarios presents unique challenges because of the diversity of elements and weather conditions and the inherent ambiguity in class definitions. Consequently, off-road visual semantic segmentation datasets remain underdeveloped. They are roughly ten times smaller than their urban counterparts, which hinders dependable performance assessment and can compromise the safety of autonomous systems. To address these challenges, we present a comprehensive cross-dataset evaluation of visual semantic segmentation models for autonomous off-road navigation. We propose a unified ontology that harmonizes class definitions across relevant datasets, enabling their combination for both training and testing. This approach ensures fair model comparisons and a reliable assessment of generalization to unseen domains. We further benchmark models on the original datasets, analyze the impact of different ontology harmonization criteria and conversion strategies, and evaluate the trade-off between segmentation performance and computational cost. Results show that Transformer-based architectures achieve the most consistent segmentation performance across datasets. Although these architectures are often computationally demanding, some variants maintain real-time inference (≈12 ms) with top-tier accuracy. The unified ontology simplifies the segmentation task, yielding more reliable models and about 40% faster training convergence. Cross-dataset training further improves generalization, raising mean IoU (mIoU) by up to 20% on RUGD and 13% on WildScenes compared with training on RELLIS-3D alone. Overall, this study provides valuable insights for developing robust perception modules for off-road autonomous vehicles.
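The abstract reports gains in mean IoU (mIoU). The sketch below shows how this metric is commonly computed: accumulate a confusion matrix over labelled pixels, then average the per-class IoU = TP / (TP + FP + FN). It is a minimal pure-Python illustration with several assumptions. The ignore label is taken to be 255. The average runs only over classes that appear in the ground truth or the prediction. The helpers `confusion_matrix` and `mean_iou` are hypothetical names. The paper's exact evaluation protocol may differ.

```python
from typing import List, Sequence

IGNORE_INDEX = 255  # assumption: label value used for unlabelled pixels


def confusion_matrix(gt: Sequence[int], pred: Sequence[int], num_classes: int,
                     ignore_index: int = IGNORE_INDEX) -> List[List[int]]:
    """Count pixels per (ground-truth class, predicted class) pair.

    Rows index the ground truth, columns the prediction. Pixels whose
    ground-truth label equals `ignore_index` are skipped.
    """
    cm = [[0] * num_classes for _ in range(num_classes)]
    for g, p in zip(gt, pred):
        if g == ignore_index:
            continue
        cm[g][p] += 1
    return cm


def mean_iou(cm: List[List[int]]) -> float:
    """Average IoU_c = TP / (TP + FP + FN) over classes seen in GT or prediction."""
    n = len(cm)
    ious = []
    for c in range(n):
        tp = cm[c][c]
        fn = sum(cm[c]) - tp                       # class-c pixels predicted as another class
        fp = sum(cm[r][c] for r in range(n)) - tp  # other pixels predicted as class c
        denom = tp + fp + fn
        if denom > 0:  # skip classes absent from both GT and prediction
            ious.append(tp / denom)
    return sum(ious) / len(ious) if ious else 0.0


# Toy example with flattened 6-pixel masks and 3 classes.
cm = confusion_matrix([0, 0, 1, 1, 2, 2], [0, 1, 1, 1, 2, 0], num_classes=3)
print(mean_iou(cm))  # 0.5 = (1/3 + 2/3 + 1/2) / 3
```

Accumulating a single confusion matrix over a whole test set before computing IoU weights every pixel equally. Averaging per-image mIoU values instead would give a different number.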
Examples of the proposed ontology conversion, which enables cross-dataset evaluation.
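The caption above refers to converting each dataset's labels into a unified ontology. One common way to implement such a conversion is a per-dataset lookup table from source label IDs to unified IDs. The sketch below follows that approach. Every class name and ID is hypothetical, as are the helpers `build_lut` and `convert_mask`. None of them is the paper's actual ontology or any dataset's real label map, and the paper's conversion strategies may differ.

```python
from typing import Dict, List

# All names and IDs below are hypothetical, chosen only to illustrate the idea;
# they are not the paper's unified ontology or any dataset's label map.
UNIFIED: Dict[str, int] = {"void": 0, "traversable": 1, "vegetation": 2, "obstacle": 3, "sky": 4}

# Source dataset: label ID -> source class name.
SOURCE_CLASSES: Dict[int, str] = {
    0: "void", 1: "dirt", 2: "grass", 3: "tree",
    4: "bush", 5: "rock", 6: "sky", 7: "asphalt",
}

# Harmonization criterion: source class name -> unified class name.
# Mapping "grass" to "traversable" rather than "vegetation" is one possible choice.
SOURCE_TO_UNIFIED: Dict[str, str] = {
    "void": "void", "dirt": "traversable", "grass": "traversable",
    "asphalt": "traversable", "tree": "vegetation", "bush": "vegetation",
    "rock": "obstacle", "sky": "sky",
}


def build_lut(source_classes: Dict[int, str], source_to_unified: Dict[str, str],
              unified: Dict[str, int], default: str = "void") -> List[int]:
    """Return a lookup table with lut[source_id] == unified_id.

    Unused source IDs and classes missing from the mapping fall back to `default`.
    """
    lut = [unified[default]] * (max(source_classes) + 1)
    for src_id, name in source_classes.items():
        lut[src_id] = unified[source_to_unified.get(name, default)]
    return lut


def convert_mask(mask: List[List[int]], lut: List[int]) -> List[List[int]]:
    """Remap every pixel of a 2-D label mask through the lookup table."""
    return [[lut[v] for v in row] for row in mask]


lut = build_lut(SOURCE_CLASSES, SOURCE_TO_UNIFIED, UNIFIED)
print(convert_mask([[1, 2], [3, 6]], lut))  # [[1, 1], [2, 4]]
```

Because the conversion is a pure table lookup, trying a different harmonization criterion only requires a different `SOURCE_TO_UNIFIED` dictionary. With integer NumPy label masks, the same remapping is usually written as `np.asarray(lut)[mask]`.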
Overview of our cross-dataset training and evaluation pipeline.
mIoU vs. average inference time per image. Models were trained on the combined RELLIS-3D and GOOSE training sets and evaluated on the complete RUGD (a) and WildScenes (b) datasets. Bubble size represents each model's number of parameters, labels give model names, and bold labels highlight Pareto-optimal models.
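The plot marks Pareto-optimal models in the mIoU vs. inference-time plane. This sketch assumes optimality is judged on those two axes only, with higher mIoU and lower time both better; bubble size (parameter count) is treated as extra information. Under that assumption, a model is on the Pareto front when no other model is at least as good on both axes and strictly better on one. The `Model` record and helper names are hypothetical. The toy numbers are made up for illustration and are not results from the paper.

```python
from typing import List, NamedTuple


class Model(NamedTuple):
    name: str
    miou: float     # higher is better
    time_ms: float  # average inference time per image; lower is better


def dominates(a: Model, b: Model) -> bool:
    """True if `a` is no worse than `b` on both axes and strictly better on at least one."""
    no_worse = a.miou >= b.miou and a.time_ms <= b.time_ms
    strictly_better = a.miou > b.miou or a.time_ms < b.time_ms
    return no_worse and strictly_better


def pareto_front(models: List[Model]) -> List[Model]:
    """Return the non-dominated models, ordered by inference time."""
    # A model never dominates itself, so no self-exclusion check is needed.
    front = [m for m in models if not any(dominates(o, m) for o in models)]
    return sorted(front, key=lambda m: m.time_ms)


# Toy values for illustration only (not results from the paper).
toy = [Model("A", 0.60, 10.0), Model("B", 0.65, 25.0),
       Model("C", 0.58, 30.0), Model("D", 0.70, 80.0)]
print([m.name for m in pareto_front(toy)])  # ['A', 'B', 'D']
```

Model C drops out because A is both faster and more accurate. The quadratic scan is adequate for a few dozen models. For larger sets, sorting by inference time and sweeping while tracking the best mIoU so far gives O(n log n).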