
Add xgboost support #35

Merged · 1 commit merged into ecpolley:master on Jun 16, 2016

Conversation

@ck37 (Contributor) commented on Jun 10, 2016

Hi,

This adds support for XGBoost learners. XGBoost is one of the most popular algorithms on Kaggle and is an implementation of gradient boosted machines (GBM).

It's important to test different hyperparameters for XGBoost/GBM, so I included the create.SL.xgboost function in this code. That function also supports defining the generated wrapper functions in an environment other than .GlobalEnv, which makes it cleaner to create many configurations without polluting the global environment.
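
Roughly, the idea is something like the following sketch (the grid values, function names, and exact arguments here are illustrative, not the literal code in this PR):

```r
# Minimal sketch (not the PR code) of a generator that builds one
# SL.xgboost-style wrapper per hyperparameter configuration and assigns
# the wrappers into a chosen environment.
library(SuperLearner)

make_xgboost_wrapper <- function(ntrees, max_depth, shrinkage) {
  force(ntrees); force(max_depth); force(shrinkage)
  function(...) SL.xgboost(..., ntrees = ntrees, max_depth = max_depth,
                           shrinkage = shrinkage)
}

create_xgboost_grid <- function(tune = list(ntrees = c(100, 500),
                                            max_depth = c(2, 4),
                                            shrinkage = c(0.1, 0.01)),
                                env = .GlobalEnv) {
  grid <- expand.grid(tune)
  fn_names <- paste0("SL.xgboost.", seq_len(nrow(grid)))
  for (i in seq_len(nrow(grid))) {
    assign(fn_names[i],
           make_xgboost_wrapper(grid$ntrees[i], grid$max_depth[i],
                                grid$shrinkage[i]),
           envir = env)
  }
  fn_names
}

# Keep the generated wrappers out of .GlobalEnv and tell SuperLearner()
# where to find them via its `env` argument.
learners <- new.env()
lib <- create_xgboost_grid(env = learners)
# sl <- SuperLearner(Y, X, SL.library = lib, family = binomial(), env = learners)
```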

Happy to make any other tweaks that are preferred.

Thanks,
Chris

@ecpolley (Owner)

Thanks Chris, this algorithm had been on my todo list for a new wrapper.

In most cases, I've been trying to have new wrappers added to the SuperLearnerExtra directory. The goal is to keep the primary SuperLearner package easier to maintain, with fewer dependencies, and to have a public space (SuperLearnerExtra) where people might be more willing to contribute wrappers they put together for specific projects.

@ledell (Contributor) commented on Jun 14, 2016

Eric, have you thought about putting SuperLearnerExtra on CRAN and adding it to the Suggests section? Unfortunately, I don't think many SuperLearner users are aware it exists. Personally, I think it would be great to fold all the libraries in SuperLearnerExtra into the main SuperLearner package. I used to be a fan of being more modular with R packages, but lately I'm trending the other way because I think people don't realize that there are cool "extras" that are not included.

@ledell (Contributor) commented on Jun 14, 2016

Chris, nice work on adding the xgboost wrapper. The one thing I'd modify is the family$family == "multinomial" branch, since SuperLearner only supports regression and binary classification.
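
For reference, I mean the family check a wrapper uses to pick the xgboost objective; schematically it looks something like this (a sketch, not the exact diff):

```r
# Sketch of the usual family handling inside an xgboost wrapper; the
# objective names are xgboost's documented ones, but this is not the PR code.
if (family$family == "gaussian") {
  objective <- "reg:linear"        # squared-error regression
                                   # (newer xgboost calls this "reg:squarederror")
} else if (family$family == "binomial") {
  objective <- "binary:logistic"   # predicted probabilities for 0/1 outcomes
} else {
  stop("SL.xgboost: only gaussian and binomial families are supported.")
}
```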

@ck37 (Contributor, Author) commented on Jun 14, 2016

Yeah, I think that moving this or other key wrappers to another repository is much worse for the user. It's just more work & complexity for them, and given how simple the wrappers are, it doesn't seem to save much maintenance effort. I will be happy to work on any bugs in the wrappers I submit in any case (I have ~4 more years left in grad school, so I'm not going anywhere). I can also expand the automated checks more if preferred.

I agree with Erin that SuperLearnerExtra is not well known among users, and it doesn't seem like it needs to be that way. It seems like a reasonable home for models that are no longer on CRAN, like DSA or BART, but it makes SuperLearner hard to use when mainstream wrappers live there.

And since SuperLearnerExtra is not even a package, users would have to clone the repo manually (if they have git) or re-download a zip file to get regular updates; that is also why SuperLearnerExtra can't be added to the "Suggests" line. There is also no way to run automated tests as in the SuperLearner package, which is important for maintaining the quality of the wrappers over time, or to maintain software versioning for reproducibility of research.

Erin, thanks for checking out the code. The reason I included the multinomial snippet is to provide a starting point for eventually supporting multi-class classification per #16. It can hang out there until/if that feature is implemented, but I'm happy to remove that part if preferred.
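
For concreteness, the placeholder would eventually just map that family onto xgboost's multi-class objective, hypothetically along these lines (the objective name and num_class parameter are real xgboost options; the surrounding SuperLearner support doesn't exist yet):

```r
# Hypothetical branch for a future multi-class family; "multi:softprob" and
# num_class are real xgboost parameters, but the wrapper support is a sketch.
if (family$family == "multinomial") {
  objective <- "multi:softprob"       # returns per-class probabilities
  num_class <- length(unique(Y))      # assumes Y is a vector of class labels
}
```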

Thanks,
Chris

@ecpolley (Owner)

I agree SuperLearnerExtra hasn't been well utilized, and it might be worth re-thinking the idea. I've tried to highlight it every time I teach the package. I'll clarify that SuperLearnerExtra isn't intended to be a stand-alone package, but more a repository of helpful code pieces (maybe SuperLearnerGist would be a better name?). I was hoping a lower bar of entry, versus formal package requirements, would make people more likely to share SuperLearner code from their projects.

The idea of "key wrappers" is very context specific. My idea for the primary SuperLearner package is to set up the framework for the cross-validation and the ensemble component/loss functions (although this is flexible now with the method.* functions), but to make it as easy as possible to bring in your own algorithms that are suitable for the specific research question. My view is that the included wrappers are a guide to help the user create their own wrappers for a project. I realize this can make it difficult for a new user, probably compounded by my poor documentation of the functions.
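
To make that concrete, a project-specific wrapper is just a function with the standard arguments plus a matching predict method; a bare-bones template (SL.mymodel and the lm() fit below are placeholders) looks roughly like this:

```r
# Bare-bones template for a user-written wrapper; SL.mymodel and the lm()
# call are placeholders, and the family argument is ignored here (i.e. this
# sketch assumes a gaussian outcome).
SL.mymodel <- function(Y, X, newX, family, obsWeights, ...) {
  fit <- lm(Y ~ ., data = X, weights = obsWeights)  # swap in your own model
  pred <- predict(fit, newdata = newX)
  out <- list(pred = pred, fit = list(object = fit))
  class(out$fit) <- "SL.mymodel"                    # enables predict dispatch
  out
}

# Companion predict method used by predict.SuperLearner() on new data.
predict.SL.mymodel <- function(object, newdata, ...) {
  predict(object$object, newdata = newdata)
}
```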

I'll add these wrappers (how can I say no to a ~4 year commitment from Chris!) but thought I would clarify my view and ask for comments/suggestions.

Thanks!
Eric

@ecpolley merged commit e164242 into ecpolley:master on Jun 16, 2016
@ck37 (Contributor, Author) commented on Jun 16, 2016

Sounds very fair, thanks Eric.
