LDATree is an R modeling package for fitting classification trees. If you are unfamiliar with classification trees, here is a tutorial about the traditional CART and its R implementation rpart.
Compared to other similar trees, LDATree sets itself apart in the following ways:
- It applies the idea of LDA (Linear Discriminant Analysis) when selecting variables, finding splits, and fitting models in terminal nodes.
- It addresses certain limitations of the R implementation of LDA (MASS::lda), such as handling missing values, dealing with more features than samples, and constant values within groups.
- By re-implementing LDA using the Generalized Singular Value Decomposition (GSVD), LDATree offers quick response, particularly with large datasets.
- The package also includes several visualization tools to provide deeper insights into the data.
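The constant-within-groups limitation mentioned above can be reproduced with MASS::lda alone. The sketch below uses a made-up toy data frame (the names toy, x1, and x2 are illustrative and not part of LDATree) in which x2 takes a single value within each class. MASS::lda stops with an error in this situation; the error is caught here so the script keeps running.

```r
library(MASS)

set.seed(1)
# Hypothetical toy data: x2 is constant within each class of y.
toy <- data.frame(
  y  = factor(rep(c("a", "b"), each = 10)),
  x1 = rnorm(20),
  x2 = rep(c(1, 2), each = 10)
)

# MASS::lda refuses to fit when a predictor has zero within-group variance,
# so capture the error message instead of letting the script abort.
res <- tryCatch(
  lda(y ~ x1 + x2, data = toy),
  error = function(e) conditionMessage(e)
)
print(res)
```

According to the list above, LDATree is designed to cope with such data rather than fail; how it does so internally is not shown here.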
install.packages("LDATree")
The CRAN version is outdated (released 08/2023). As of 06/2024, please install the current version from GitHub with the commands below; an updated CRAN release is coming soon.
library(devtools)
install_github('Moran79/LDATree')
To build an LDATree:
library(LDATree)
set.seed(443)
mpg <- as.data.frame(ggplot2::mpg)
datX <- mpg[, -5] # All predictors without Y
response <- mpg[, 5] # we try to predict "cyl" (number of cylinders)
fit <- Treee(datX = datX, response = response, verbose = FALSE)
To plot the LDATree:
# View the overall tree.
plot(fit)
# Three types of individual plots
# 1. Scatter plot on first two LD scores
plot(fit, datX = datX, response = response, node = 1)
# 2. Density plot on the first LD score
plot(fit, datX = datX, response = response, node = 3)
# 3. A message
plot(fit, datX = datX, response = response, node = 2)
#> [1] "Every observation in this node is predicted to be 4"
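For readers who have not met LD scores before, the sketch below computes them with MASS::lda on the built-in iris data. It only illustrates what the axes of the scatter and density plots represent. It uses MASS rather than LDATree's own GSVD-based fitting, so the scores inside an LDATree node may be computed differently.

```r
library(MASS)

# Fit a plain LDA on iris (3 classes, so at most 2 discriminant directions).
lda_fit <- lda(Species ~ ., data = iris)

# Project the observations onto the discriminant directions: the LD scores.
scores <- predict(lda_fit)$x
head(scores)

# Scatter plot on the first two LD scores, colored by class.
plot(scores[, 1], scores[, 2], col = as.integer(iris$Species),
     xlab = "LD1", ylab = "LD2")
```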
To make predictions:
# Prediction only.
predictions <- predict(fit, datX)
head(predictions)
#> [1] "4" "4" "4" "4" "6" "6"
# A more informative prediction
predictions <- predict(fit, datX, type = "all")
head(predictions)
#>   response node 4 5 6 8
#> 1        4   14 1 0 0 0
#> 2        4    6 1 0 0 0
#> 3        4    6 1 0 0 0
#> 4        4    6 1 0 0 0
#> 5        6   18 0 0 1 0
#> 6        6   15 0 0 1 0
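A common next step is to compare the predicted classes with the observed ones. The helper below, summarize_predictions, is a hypothetical convenience function written in base R for this illustration; it is not part of LDATree. It is shown on small made-up vectors so that it runs without a fitted tree.

```r
# Hypothetical helper (not part of LDATree): confusion table and accuracy.
summarize_predictions <- function(predicted, actual) {
  predicted <- as.character(predicted)
  actual <- as.character(actual)
  stopifnot(length(predicted) == length(actual))
  list(
    confusion = table(predicted = predicted, actual = actual),
    accuracy  = mean(predicted == actual)
  )
}

# Made-up example vectors, for illustration only.
pred  <- c("4", "4", "6", "8", "6")
truth <- c("4", "4", "6", "8", "8")
out <- summarize_predictions(pred, truth)
out$confusion
out$accuracy
```

With the objects from the example above, the call would be summarize_predictions(predict(fit, datX), response). Because that scores the tree on its own training data, the resulting accuracy is likely to be optimistic.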
If you encounter a clear bug, please file an issue with a minimal reproducible example on GitHub.