support for gamma and poisson regression #36

Closed
spedygiorgio opened this issue Mar 14, 2017 · 10 comments

Comments

@spedygiorgio

Hi @bgreenwell, thank you so much for your awesome package... really fantastic.

Nevertheless, we have noticed that it only supports Gaussian regression and classification. Would it be possible to also implement gamma and Poisson regression? I am using xgboost 0.6.x.
Thanks in advance for your attention,
Giorgio

@bgreenwell
Owner

bgreenwell commented Mar 14, 2017

Hello @spedygiorgio,

Certainly! It may be a few days before I can get around to it, but for now you can use the pred.fun argument in the call to partial. The pred.fun argument is explained in more detail in an upcoming article for The R Journal (https://github.com/bgreenwell/pdp-paper/blob/master/RJwrapper.pdf); simply put, it lets you compute partial dependence functions in a wider range of circumstances (survival models, non-Gaussian models like GLMs with different link functions, etc.). Here is a quick example using Poisson deviance:

# Setup
library(ggplot2)
library(pdp)
library(xgboost)
data(mtcars)

# Model the number of carburetors (carb, column 11) with a Poisson objective
set.seed(101)
bst <- xgboost(data = as.matrix(mtcars[, -11]), label = mtcars[, 11],
               objective = "count:poisson", nrounds = 50)

# PDP prediction function for XGBoost with Poisson deviance
pfun <- function(object, newdata) {
  mean(exp(predict(object, newdata = as.matrix(newdata))))
}

# One variable
bst %>%
  partial(pred.var = "mpg", pred.fun = pfun, train = mtcars[, -11]) %>% 
  autoplot() +
  ylab("Number of carburetors") +
  theme_light()

# Two variables
pdp.mpg.hp <- partial(bst, pred.var = c("mpg", "hp"), pred.fun = pfun, 
                      chull = TRUE, train = mtcars[, -11])
autoplot(pdp.mpg.hp, contour = TRUE, legend.title = "Number of\ncarburetors")

In the meantime, please let me know if you need anything else. Thanks for using the package and thanks for offering useful feedback about how it can be improved.

Best,

Brandon

@bgreenwell
Owner

Note: I used exp in my definition of pfun so that predictions would be on the response scale rather than the link scale. It's not necessary, just my preference.

@bgreenwell
Owner

bgreenwell commented Mar 14, 2017

I'll probably remove the restriction to the Gaussian case for all supported models and add an inv.link argument to partial; the default could be the identity. So I imagine something like the following:

partial(poisson.bst, pred.var = "mpg", pred.fun = pfun, train = mtcars[, -11], inv.link = exp)
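Until an inv.link argument exists, its behavior can be approximated today through the existing pred.fun hook. The make_pfun factory below is hypothetical (not part of pdp), just a sketch of the idea:

```r
# Hypothetical factory (not part of pdp): builds a pred.fun that applies a
# user-supplied inverse-link function to the model's predictions before
# averaging them for the PDP
make_pfun <- function(inv.link = identity) {
  function(object, newdata) {
    mean(inv.link(predict(object, newdata = as.matrix(newdata))))
  }
}
```

With that in place, `make_pfun(exp)` reproduces the Poisson `pfun` from the earlier example.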

@spedygiorgio
Author

spedygiorgio commented Mar 31, 2017

Hi @bgreenwell, thank you for your feedback and sorry for my late answer; I believed I had already thanked you.

I would like to offer a few more suggestions. Still on xgboost models: for some problems it is common to build data with weights (e.g., when modeling claim severity) and a base margin (e.g., an offset for Poisson rate regression). For example, I transform a data.frame into xgboost matrices using the following function:

prepare_db_xgboost <- function(df, x_vars, y_var, offset_var, weight_var, na_code) {
  # Force df to a data frame
  df <- as.data.frame(df)
  previous_na_action <- options("na.action")
  options(na.action = "na.pass")

  supplementaryVars <- character()
  if (!missing(offset_var)) supplementaryVars <- c(supplementaryVars, offset_var)
  if (!missing(weight_var)) supplementaryVars <- c(supplementaryVars, weight_var)

  vars2Keep <- c(x_vars, y_var, supplementaryVars)
  df <- df[, vars2Keep]

  # Sparse model matrix (sparse.model.matrix comes from the Matrix package)
  sparse_all <- sparse.model.matrix(object = ~ . - 1, data = df)
  options(na.action = previous_na_action$na.action)

  # Predictor columns only
  predictors_cols <- setdiff(colnames(sparse_all), c(y_var, supplementaryVars))

  # Create the xgboost matrix, allowing for NA
  if (!missing(na_code)) {
    if (missing(weight_var)) {
      db_xgb_out <- xgb.DMatrix(data = sparse_all[, predictors_cols],
                                label = sparse_all[, y_var], missing = na_code)
    } else {
      db_xgb_out <- xgb.DMatrix(data = sparse_all[, predictors_cols],
                                label = sparse_all[, y_var],
                                weight = sparse_all[, weight_var],
                                missing = na_code)
    }
  } else {
    if (missing(weight_var)) {
      db_xgb_out <- xgb.DMatrix(data = sparse_all[, predictors_cols],
                                label = sparse_all[, y_var])
    } else {
      db_xgb_out <- xgb.DMatrix(data = sparse_all[, predictors_cols],
                                label = sparse_all[, y_var],
                                weight = sparse_all[, weight_var])
    }
  }

  # Add the offset (base margin), if supplied
  if (!missing(offset_var)) {
    setinfo(db_xgb_out, "base_margin", sparse_all[, offset_var])
  }

  return(db_xgb_out)
}
train.xgb <- prepare_db_xgboost(df = train, x_vars = predictors,
                                y_var = "premiotariffa", na_code = -1000000)
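As a minimal, self-contained illustration of the two fields in question (simulated data; assumes only that xgboost is installed):

```r
library(xgboost)

# Simulated design matrix, Poisson-style label, per-row weights, and an
# offset supplied as a base margin on the link (log) scale
set.seed(42)
X <- matrix(rnorm(50), nrow = 10)
d <- xgb.DMatrix(data = X, label = rpois(10, lambda = 2), weight = runif(10))
setinfo(d, "base_margin", log(rep(2, 10)))  # e.g. log(exposure)
```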

Does the pdp package allow for base_margin and weight?

@bgreenwell
Owner

The partial function internally calls predict.xgb.Booster for "xgb.Booster" objects, so offsets and weights should be accounted for. I'll look into it further this weekend to be sure.

@spedygiorgio
Author

spedygiorgio commented Mar 31, 2017 via email

@bgreenwell
Owner

On second thought, partial creates predictions for new data points, so I guess the answer is no, but it should still be possible using the pred.fun argument I mentioned previously. Let me see if I can throw together a small working example.

@bgreenwell
Owner

bgreenwell commented Apr 4, 2017

Hi @spedygiorgio,

So it looks like the sample weights only contribute to the loss function while building an XGBoost model, so partial should take that into account. However, I am still trying to figure out how to incorporate offsets. partial makes predictions over a grid, so an offset would need to be supplied for each grid point, and I'm not sure how realistic that is. I think, however, that the best option for incorporating offsets is to just compute PDPs over the original training data. More to come on how this might be accomplished...
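One hedged sketch of that idea: fold a fixed offset into pred.fun by predicting on the margin (link) scale, adding a representative log-exposure, and transforming back. make_pfun_offset is a hypothetical helper; outputmargin = TRUE is the predict.xgb.Booster argument that returns untransformed (link-scale) scores:

```r
# Hypothetical sketch: a pred.fun that folds a constant offset into
# link-scale predictions before mapping back to the response scale
make_pfun_offset <- function(log_offset) {
  function(object, newdata) {
    margin <- predict(object, newdata = as.matrix(newdata),
                      outputmargin = TRUE)
    mean(exp(margin + log_offset))
  }
}

# e.g., build a pred.fun from a typical log-exposure in the training data:
# pfun <- make_pfun_offset(mean(train$log_exposure))
```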

@spedygiorgio
Author

spedygiorgio commented Apr 4, 2017 via email

@bgreenwell
Owner

Closing this issue. Opened another regarding handling offsets (#38).
