This repository presents RFOD, a novel Random Forest-based framework for Outlier Detection in mixed-type tabular data.
Rather than modeling a global joint distribution, RFOD reframes anomaly detection as a feature-wise conditional reconstruction problem, training dedicated random forests for each feature conditioned on the others. This design robustly handles heterogeneous data types while preserving the semantic integrity of categorical features. To further enable precise and interpretable detection, RFOD combines Adjusted Gower's Distance (AGD) for cell-level scoring, which adapts to skewed numerical data and accounts for categorical confidence, with Uncertainty-Weighted Averaging (UWA) to aggregate cell-level scores into robust row-level anomaly scores.
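The scoring and aggregation steps described above can be illustrated with a minimal sketch. It is not the authors' implementation: this README does not give the exact AGD or UWA formulas, so the sketch substitutes a plain Gower-style distance for cell-level scores and a generic weighted mean for row-level aggregation. The function names, the per-feature weights, and the sample values are all hypothetical, and the per-feature predictions are assumed to come from already trained forests.

```python
from typing import Dict, Sequence


def cell_score_numeric(actual: float, predicted: float, value_range: float) -> float:
    """Plain Gower-style distance (assumption, not AGD): absolute
    reconstruction error scaled by the feature's range, capped at 1."""
    if value_range == 0:
        return 0.0
    return min(abs(actual - predicted) / value_range, 1.0)


def cell_score_categorical(actual: str, class_probs: Dict[str, float]) -> float:
    """One minus the predicted probability of the observed category
    (assumption: probabilities come from a per-feature classifier)."""
    return 1.0 - class_probs.get(actual, 0.0)


def row_score(cell_scores: Sequence[float], weights: Sequence[float]) -> float:
    """Generic weighted mean of cell scores (assumption, not UWA).
    Falls back to an unweighted mean if all weights are zero."""
    total = sum(weights)
    if total == 0:
        return sum(cell_scores) / len(cell_scores)
    return sum(w * s for w, s in zip(weights, cell_scores)) / total


# Hypothetical row with one numerical and one categorical feature.
numeric_cell = cell_score_numeric(actual=10.0, predicted=4.0, value_range=12.0)
categorical_cell = cell_score_categorical("red", {"red": 0.9, "blue": 0.1})
score = row_score([numeric_cell, categorical_cell], weights=[1.0, 1.0])
```

In this sketch a higher score means the cell or row is harder to reconstruct from the other features, which is the sense in which it would be flagged as anomalous.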
The algorithms presented here are protected by patent and are provided as is. The source code is distributed as a password-protected ZIP file; please contact the authors for the password. Use of this code is restricted to academic research. Industry users who wish to test or evaluate the code should contact the authors to obtain a license.
If you use the source code in your paper, please cite this work with the following BibTeX entry:
@inproceedings{ang2026rfod,
  title={RFOD: Random Forest-based Outlier Detection for Tabular Data},
  author={Ang, Yihao and Yao, Peicheng and Bao, Yifan and Feng, Yushuo and Huang, Qiang and Tung, Anthony KH and Huang, Zhiyong},
  booktitle={2026 IEEE 42nd International Conference on Data Engineering (ICDE)},
  year={2026}
}