Autoencoding Random Forests ('RFAE') provide a method for autoencoding data with random forests ('RF'): the data are projected to a latent feature space of chosen (usually lower) dimensionality, and the latent representations are then decoded back into the input space. The encoding stage is useful for feature engineering and data visualisation tasks, much as principal component analysis ('PCA') is used, while the decoding stage is useful for compression and denoising tasks. At its core, 'RFAE' is a post-processing pipeline on a trained random forest model, so it can accept any trained RF of ranger object type: 'RF', 'URF' or 'ARF'. Because of this, it inherits RFs' robust performance and their capacity to seamlessly handle mixed-type tabular data.
The package can be installed by running:
devtools::install_github("bips-hb/RFAE")
You can also clone the repository and run:
devtools::build("RFAE")
Using Fisher's iris dataset, we train an RF and pass it through the autoencoding pipeline:
# Set seed
set.seed(1)
# Split training and test
trn <- sample(1:nrow(iris), 100)
tst <- setdiff(1:nrow(iris), trn)
# Train RF
rf <- ranger::ranger(Species ~ ., data = iris[trn, ], num.trees=50)
Fit the encoder on the training data, then project the test data to create new embeddings:
# Fit encoder object
emap <- encode(rf, iris[trn, ], k=2)
# Embed new test samples
emb <- predict(emap, rf, iris[tst, ])
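Since the encoder was fit with k=2, the embeddings can be inspected visually. A minimal sketch with base R graphics, assuming `emb` can be treated as a two-column numeric matrix with one row per test sample (the exact return type may differ; see the package vignette):

```r
# Plot the 2-D embeddings, coloured by species
# (assumes emb has two numeric columns, one row per test sample)
plot(emb[, 1], emb[, 2],
     col = as.integer(iris$Species[tst]),
     pch = 19, xlab = "Dim 1", ylab = "Dim 2",
     main = "RFAE embeddings of iris test samples")
legend("topright", legend = levels(iris$Species),
       col = seq_along(levels(iris$Species)), pch = 19)
```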
Decode test samples back to the input space:
# Decode samples
out <- decode_knn(rf, emap, emb, k=5)$x_hat
Measure the reconstruction error between decoded and actual samples:
error <- reconstruction_error(out, iris[tst, ])
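As a quick sanity check alongside the package's own metric, the reconstructions can also be compared with the originals feature by feature, assuming `out` is a data frame with the same columns as `iris`:

```r
# Per-feature mean absolute error on the numeric columns
# (a simple hand-rolled check; reconstruction_error() is the
# package's own metric and may be defined differently)
num_cols <- sapply(iris, is.numeric)
colMeans(abs(out[, num_cols] - iris[tst, num_cols]))
```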
For more detailed examples, refer to the package vignette.
The Python version of RFAE is under development; a preliminary implementation is available at RFAE_py.
- Vu, B. D., Kapar, J., Wright, M., & Watson, D. S. (2025). Autoencoding Random Forests. arXiv preprint arXiv:2505.21441. NeurIPS version coming soon!