
Multi-class Classification #16

Open · JackStat opened this issue Apr 10, 2015 · 9 comments

@JackStat (Contributor)

Is there any work on adding multi-class classification capabilities? Maybe we could start something with gbm.

JackStat changed the title from "Multiclass problems (enhancement)" to "Multi-class Classification" on Apr 10, 2015
@ecpolley (Owner)

I haven't been working on adding multi-class classification to the existing code. In practice, I often split the multi-class problem into a collection of binary classification problems. Say you have 3 classes (A, B, and C): you could fit binary classifiers for A vs. B or C, B vs. A or C, and C vs. A or B, then combine the results and classify based on the highest probability estimate. The probability estimates are not proper because they are not constrained to sum to 1, but the approach does allow flexibility in the choice of classifier for the different categories. Here is a quick example:

## multi-class classification
library(SuperLearner)
set.seed(843)
N <- 100

# outcome
Y <- sample(c("A", "B", "C"), size = N, replace = TRUE, prob = c(.1, .5, .4))

# variables
X1 <- rnorm(n = N, mean = (as.numeric(Y == "A") + .5*(as.numeric(Y == "C"))), sd = 1)
X2 <- rnorm(n = N, mean = (as.numeric(Y == "B")), sd = 1)
X3 <- rnorm(n = N, mean = (-1*as.numeric(Y == "B" | Y == "C")), sd = 1)
X4 <- rnorm(n = N, mean = X2, sd = 1)
X5 <- rnorm(n = N, mean = (X1*as.numeric(Y == "A") + as.numeric(Y == "A" | Y == "C")), sd = 1)

DAT <- data.frame(X1, X2, X3, X4, X5)


# test Data
# outcome
M <- 10000
Y_test <- sample(c("A", "B", "C"), size = M, replace = TRUE, prob = c(.1, .5, .4))

# variables
X1_test <- rnorm(n = M, mean = (as.numeric(Y_test == "A") + .5*(as.numeric(Y_test == "C"))), sd = 1)
X2_test <- rnorm(n = M, mean = (as.numeric(Y_test == "B")), sd = 1)
X3_test <- rnorm(n = M, mean = (-1*as.numeric(Y_test == "B" | Y_test == "C")), sd = 1)
X4_test <- rnorm(n = M, mean = X2_test, sd = 1)
X5_test <- rnorm(n = M, mean = (X1_test*as.numeric(Y_test == "A") + as.numeric(Y_test == "A" | Y_test == "C")), sd = 1)

DAT_test <- data.frame(X1 = X1_test, X2 = X2_test, X3 = X3_test, X4 = X4_test, X5 = X5_test)

# figure
# library(GGally)
# DAT2 <- data.frame(Y, DAT)
# ggpairs(DAT2, color = "Y")

# create the 3 binary variables
Y_A <- as.numeric(Y == "A")
Y_B <- as.numeric(Y == "B")
Y_C <- as.numeric(Y == "C")

# simple library, should include more classifiers
SL.library <- c("SL.gbm", "SL.glmnet", "SL.glm", "SL.knn", "SL.gam", "SL.mean")

# least squares loss function
fit_A <- SuperLearner(Y = Y_A, X = DAT, newX = DAT_test, SL.library = SL.library,
                      verbose = FALSE, method = "method.NNLS", family = binomial(),
                      cvControl = list(stratifyCV = TRUE))
fit_B <- SuperLearner(Y = Y_B, X = DAT, newX = DAT_test, SL.library = SL.library,
                      verbose = FALSE, method = "method.NNLS", family = binomial(),
                      cvControl = list(stratifyCV = TRUE))
fit_C <- SuperLearner(Y = Y_C, X = DAT, newX = DAT_test, SL.library = SL.library,
                      verbose = FALSE, method = "method.NNLS", family = binomial(),
                      cvControl = list(stratifyCV = TRUE))

SL_pred <- data.frame(pred_A = fit_A$SL.predict[, 1], pred_B = fit_B$SL.predict[, 1], pred_C = fit_C$SL.predict[, 1])
Classify <- apply(SL_pred, 1, function(xx) c("A", "B", "C")[unname(which.max(xx))])
table(Classify, Y_test)
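
One note on the caveat above: renormalizing each row of SL_pred yields proper probability vectors without changing which class is picked, since dividing a row by its sum leaves the argmax unchanged. A minimal follow-on to the example:

# renormalize the one-vs-rest estimates so each row sums to 1
SL_prob <- SL_pred / rowSums(SL_pred)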

@ledell (Contributor) commented Apr 13, 2015

Multi-class classification is something I have thought about adding. A reasonable way to implement this is using multiple response linear regression (MLR). Details in this paper: https://www.cs.cmu.edu/afs/cs/project/jair/pub/volume10/ting99a.pdf
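
For concreteness, the MLR combiner in that paper regresses each class indicator on the stacked cross-validated class-probability predictions from all learners, with non-negative coefficients. A minimal sketch, assuming hypothetical inputs cv_probs (n x p matrix of cross-validated probabilities from all learners) and new_probs (same columns, new data); this is not existing SuperLearner API:

library(nnls)

# one non-negative least-squares regression per class, then row renormalization
mlr_stack <- function(cv_probs, Y, new_probs) {
  Y_ind <- model.matrix(~ Y - 1)  # n x K class-indicator matrix
  coefs <- apply(Y_ind, 2, function(yk) nnls(cv_probs, yk)$x)
  P <- new_probs %*% coefs        # m x K combined scores
  P / rowSums(P)                  # renormalize rows to probability vectors
}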

@JackStat (Contributor, Author)

You should be able to optimize the weights of the different models given the multi-class log-loss function, right?

@ecpolley (Owner)

Yes, if each base learner in the library outputs a vector of predicted probabilities for the classes, you could define a convex combination of the predicted probabilities by minimizing the V-fold cross-validated multi-class log loss estimate. Can you suggest some examples of base learners that return probability vectors?
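
A minimal sketch of that idea, assuming a hypothetical list prob_list of n x K cross-validated probability matrices (one per learner) and a factor Y; the simplex constraint is handled with a softmax reparameterization so optim can run unconstrained:

# convex combination of class-probability matrices minimizing multi-class log loss
combine_mlogloss <- function(prob_list, Y) {
  Y_ind <- model.matrix(~ Y - 1)            # n x K class-indicator matrix
  L <- length(prob_list)
  obj <- function(beta) {
    w <- exp(c(0, beta)); w <- w / sum(w)   # weights on the simplex
    P <- Reduce(`+`, Map(`*`, prob_list, w))
    P <- pmin(pmax(P, 1e-12), 1)            # guard against log(0)
    -mean(rowSums(Y_ind * log(P)))          # mean multi-class log loss
  }
  opt <- optim(rep(0, L - 1), obj, method = "BFGS")
  w <- exp(c(0, opt$par))
  w / sum(w)                                # estimated convex weights
}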

@JackStat (Contributor, Author)

Sorry for the long delay. Here are a couple. randomForest is probably easiest.



library(xgboost)

# multi:softprob returns a probability for every class; iris has 3 classes
param <- list("objective" = "multi:softprob",
              "eval_metric" = "mlogloss",
              "num_class" = 3)

# xgboost expects 0-based integer class labels
y <- as.numeric(iris[, "Species"]) - 1

x <- as.matrix(iris[, 1:4])

bstG <- xgboost(params = param, data = x, label = y, nrounds = 100)

# predictions come back as one long vector; reshape to an n x 3 matrix
# of class probabilities (one column per class, in factor-level order)
xgG <- predict(bstG, x)
xgG <- t(matrix(xgG, nrow = 3, ncol = length(xgG) / 3))


####################


library(randomForest)

# randomForest returns class probabilities directly
rr <- randomForest(Species ~ ., iris)
predict(rr, type = 'prob')  # out-of-bag class-probability matrix (n x 3)
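
As a hypothetical link back to the combiner sketched earlier in the thread (in a real analysis both matrices would need to come from the same cross-validation folds, rather than the in-sample/out-of-bag mix shown here):

prob_list <- list(xgb = xgG, rf = predict(rr, type = 'prob'))
w <- combine_mlogloss(prob_list, iris$Species)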

@ck37 (Contributor) commented Apr 4, 2016

Polymars was also designed specifically for multi-class classification (http://projecteuclid.org/euclid.aos/1031594728, part 6 on "polyclass").

@ck37 (Contributor) commented Oct 7, 2016

@ae-tate commented Nov 26, 2020

Was this ever implemented? I keep running into errors when trying it out with SL.glmnet. The links ck37 posted are unfortunately down.

@mrubinst757
Same question; it would be great if this were an option.
