
Add xgboost support #35

Merged · 1 commit merged into ecpolley:master on Jun 16, 2016

Conversation

@ck37 (Contributor) commented on Jun 10, 2016

Hi,

This adds support for XGBoost learners. XGBoost is one of the most popular algorithms on Kaggle and is an implementation of gradient boosted machines (GBM).

It's important to test different hyperparameters for XGBoost/GBM, so I included the create.SL.xgboost function in this code. That function also supports defining the generated wrapper functions in an environment other than .GlobalEnv, which makes it cleaner to create many configurations without polluting the global environment.
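
Roughly, the idea is something like the following sketch (the grid values, function names, and exact arguments here are illustrative, not the literal code in this PR):

```r
# Minimal sketch (not the PR code) of a generator that builds one
# SL.xgboost-style wrapper per hyperparameter configuration and assigns
# the wrappers into a chosen environment.
library(SuperLearner)

make_xgboost_wrapper <- function(ntrees, max_depth, shrinkage) {
  force(ntrees); force(max_depth); force(shrinkage)
  function(...) SL.xgboost(..., ntrees = ntrees, max_depth = max_depth,
                           shrinkage = shrinkage)
}

create_xgboost_grid <- function(tune = list(ntrees = c(100, 500),
                                            max_depth = c(2, 4),
                                            shrinkage = c(0.1, 0.01)),
                                env = .GlobalEnv) {
  grid <- expand.grid(tune)
  fn_names <- paste0("SL.xgboost.", seq_len(nrow(grid)))
  for (i in seq_len(nrow(grid))) {
    assign(fn_names[i],
           make_xgboost_wrapper(grid$ntrees[i], grid$max_depth[i],
                                grid$shrinkage[i]),
           envir = env)
  }
  fn_names
}

# Keep the generated wrappers out of .GlobalEnv and tell SuperLearner()
# where to find them via its `env` argument.
learners <- new.env()
lib <- create_xgboost_grid(env = learners)
# sl <- SuperLearner(Y, X, SL.library = lib, family = binomial(), env = learners)
```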

Happy to make any other tweaks that are preferred.

Thanks,
Chris

@ecpolley (Owner)

Thanks Chris, this algorithm had been on my todo list for a new wrapper.

In most cases, I've been trying to have new wrappers added to the SuperLearnerExtra directory. The goal is to keep the primary SuperLearner package easier to maintain, with fewer dependencies, and to have a public space (SuperLearnerExtra) where people might be more willing to contribute wrappers they put together for specific projects.

@ledell (Contributor) commented on Jun 14, 2016

Eric, have you thought about putting SuperLearnerExtra on CRAN and adding it to the Suggests section? Unfortunately, I don't think many SuperLearner users are aware it exists. Personally, I think it would be great to fold all the libraries in SuperLearnerExtra into the main SuperLearner package. I used to be a fan of being more modular with R packages, but lately I'm trending the other way because I think people don't realize that there are cool "extras" that are not included.

@ledell (Contributor) commented on Jun 14, 2016

Chris, nice work on adding the xgboost wrapper. The one thing I'd modify is the family$family == "multinomial" branch, since SuperLearner only supports regression and binary classification.
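
For reference, I mean the family check a wrapper uses to pick the xgboost objective; schematically it looks something like this (a sketch, not the exact diff):

```r
# Sketch of the usual family handling inside an xgboost wrapper; the
# objective names are xgboost's documented ones, but this is not the PR code.
if (family$family == "gaussian") {
  objective <- "reg:linear"        # squared-error regression
                                   # (newer xgboost calls this "reg:squarederror")
} else if (family$family == "binomial") {
  objective <- "binary:logistic"   # predicted probabilities for 0/1 outcomes
} else {
  stop("SL.xgboost: only gaussian and binomial families are supported.")
}
```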

@ck37 (Contributor, Author) commented on Jun 14, 2016

Yeah, I think that moving this or other key wrappers to another repository is much worse for the user. It's just more work & complexity for them, and given how simple the wrappers are, it doesn't seem to save much maintenance effort. I will be happy to work on any bugs in the wrappers I submit in any case (I have ~4 more years left in grad school, so I'm not going anywhere). I can also expand the automated checks more if preferred.

I agree with Erin that SuperLearnerExtra is not well known among users, and it doesn't seem like it needs to be that way. It seems like a reasonable home for models that are no longer on CRAN, like DSA or BART, but it makes SuperLearner hard to use when mainstream wrappers live there.

And since SuperLearnerExtra is not even a package, users would have to clone the repo manually (if they have git) or re-download a zip file to get regular updates; that is also why SuperLearnerExtra can't be added to the "Suggests" line. There is also no way to run automated tests as in the SuperLearner package, which is important for maintaining the quality of the wrappers over time, or to maintain software versioning for reproducibility of research.

Erin, thanks for checking out the code. The reason I included the multinomial snippet is to provide a starting point for eventually supporting multi-class classification per #16. It can hang out there until/if that feature is implemented, but I'm happy to remove that part if preferred.
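
For concreteness, the placeholder would eventually just map that family onto xgboost's multi-class objective, hypothetically along these lines (the objective name and num_class parameter are real xgboost options; the surrounding SuperLearner support doesn't exist yet):

```r
# Hypothetical branch for a future multi-class family; "multi:softprob" and
# num_class are real xgboost parameters, but the wrapper support is a sketch.
if (family$family == "multinomial") {
  objective <- "multi:softprob"       # returns per-class probabilities
  num_class <- length(unique(Y))      # assumes Y is a vector of class labels
}
```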

Thanks,
Chris

@ecpolley (Owner)

I agree SuperLearnerExtra hasn't been well utilized, and it might be worth re-thinking the idea. I've tried to highlight it every time I teach the package. I'll clarify that SuperLearnerExtra isn't intended to be a stand-alone package, but more a repository of helpful code pieces (maybe SuperLearnerGist would be a better name?). I was hoping a lower bar of entry, versus formal package requirements, would make people more likely to share SuperLearner code from their projects.

The idea of "key wrappers" is very context specific. My idea for the primary SuperLearner package is to set up the framework for the cross-validation and the ensemble component/loss functions (although this is flexible now with the method.* functions), but to make it as easy as possible to bring in your own algorithms that are suitable for the specific research question. My view is that the included wrappers are a guide to help the user create their own wrappers for a project. I realize this can make it difficult for a new user, probably compounded by my poor documentation of the functions.
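
To make that concrete, a project-specific wrapper is just a function with the standard arguments plus a matching predict method; a bare-bones template (SL.mymodel and the lm() fit below are placeholders) looks roughly like this:

```r
# Bare-bones template for a user-written wrapper; SL.mymodel and the lm()
# call are placeholders, and the family argument is ignored here (i.e. this
# sketch assumes a gaussian outcome).
SL.mymodel <- function(Y, X, newX, family, obsWeights, ...) {
  fit <- lm(Y ~ ., data = X, weights = obsWeights)  # swap in your own model
  pred <- predict(fit, newdata = newX)
  out <- list(pred = pred, fit = list(object = fit))
  class(out$fit) <- "SL.mymodel"                    # enables predict dispatch
  out
}

# Companion predict method used by predict.SuperLearner() on new data.
predict.SL.mymodel <- function(object, newdata, ...) {
  predict(object$object, newdata = newdata)
}
```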

I'll add these wrappers (how can I say no to a ~4 year commitment from Chris!) but thought I would clarify my view and ask for comments/suggestions.

Thanks!
Eric

@ecpolley merged commit e164242 into ecpolley:master on Jun 16, 2016
@ck37 (Contributor, Author) commented on Jun 16, 2016

Sounds very fair, thanks Eric.
