support DART - new regularization = dropout trees during learning #809

Closed
gugatr0n1c opened this Issue Feb 7, 2016 · 24 comments

gugatr0n1c commented Feb 7, 2016

There is a nice article about dropout from neural nets, applied to gradient boosting:

http://arxiv.org/pdf/1505.01866.pdf

The idea is to drop out some trees during the learning process and rescale the weights of the remaining trees to compensate. It can help accuracy.

Thanks for considering it.
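For readers who don't want to open the paper, here is a toy sketch of the idea (hypothetical code, not xgboost's implementation): each boosting round mutes a random subset of the existing trees before fitting the next one, then applies the paper's normalization so the ensemble's overall scale is preserved. The "trees" are reduced to constant predictors to keep the sketch short.

```python
import random

def dart_boost(y, n_rounds=50, rate_drop=0.3, seed=0):
    """Toy DART loop: each 'tree' is a constant predictor fit to the
    residual mean. A sketch of the paper's algorithm, not xgboost code."""
    rng = random.Random(seed)
    trees = []  # each tree is [weight, value]; prediction = sum(w * v)

    def predict(active):
        return sum(w * v for w, v in active)

    for _ in range(n_rounds):
        # Drop step: mute each existing tree with probability rate_drop.
        dropped = {i for i in range(len(trees)) if rng.random() < rate_drop}
        kept = [t for i, t in enumerate(trees) if i not in dropped]
        k = len(dropped)

        # Fit the new "tree" to the residuals of the muted ensemble
        # (the least-squares constant fit is just the residual mean).
        base = predict(kept)
        new_value = sum(yi - base for yi in y) / len(y)

        # Normalization from the paper: dropped trees shrink by k/(k+1)
        # and the new tree gets weight 1/(k+1).
        for i in dropped:
            trees[i][0] *= k / (k + 1)
        trees.append([1.0 / (k + 1), new_value])

    return predict(trees)

# The ensemble's constant prediction converges to the mean of y (2.5 here).
print(dart_boost([1.0, 2.0, 3.0, 4.0]))
```

Without the rescaling, the dropped trees would come back at full weight and the new tree would overshoot; the k/(k+1) and 1/(k+1) factors keep each round's total contribution on the same scale.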

Member

tqchen commented Feb 10, 2016

This should not be too hard to add, either by adding a new gradient booster or by extending the current gbtree.

Contributor

Far0n commented Mar 8, 2016

I would suggest:

  • extending gbtree
  • a new parameter, 'dart_dropout', 'dart_p', or something in that direction (float, range [0, 1], default: 0)

Thoughts?

Contributor

marugari commented Apr 17, 2016

I've created a prototype (not tested yet):
https://github.com/marugari/xgboost/tree/prototype_dart

Member

tqchen commented Apr 17, 2016

@marugari It would be great if you could run some benchmarks and check the performance; then we can look into bringing it back into the main repo.

Contributor

marugari commented Apr 20, 2016

Member

tqchen commented Apr 20, 2016

This sounds interesting. @marugari Can we also refactor the code a bit?

Since I guess most parts of DART can reuse the code of GBTree, and only the prediction function needs to be replaced, let us consider using inheritance or a common base class, so the code is cleaner and easier to maintain.

Then I am happy to review it and bring it into the xgboost main repo.

Contributor

marugari commented Apr 20, 2016

@tqchen Sure.
I also think Dart should inherit from GBTree, but I could not fix the duplicate symbols. 😢

Member

tqchen commented Apr 20, 2016

If there is not too much code, we can put both classes in gbm.cc, which should solve the problem.

Contributor

marugari commented Apr 20, 2016

@tqchen If that is agreeable, I can refactor the code.
The following methods are defined in Dart:

  • Configure
  • Load
  • Save
  • DoBoost
  • Predict
  • CommitModel
  • Pred
  • DropTrees
  • NormalizeTrees
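As an aside for readers following the thread, the two DART-specific steps at the end of that list are small. A sketch of what they might compute, following the paper's normalization (hypothetical helpers with made-up signatures, not the actual xgboost methods):

```python
import random

def drop_trees(n_trees, rate_drop, rng):
    """Pick indices of existing trees to mute this boosting round."""
    return [i for i in range(n_trees) if rng.random() < rate_drop]

def normalize_trees(weights, dropped):
    """With k trees dropped, scale each dropped tree's weight by k/(k+1)
    and give the newly fitted tree weight 1/(k+1), per the paper."""
    k = len(dropped)
    for i in dropped:
        weights[i] *= k / (k + 1)
    weights.append(1.0 / (k + 1))  # weight of the new tree
    return weights

# Dropping trees 0 and 2 out of three unit-weight trees:
print(normalize_trees([1.0, 1.0, 1.0], [0, 2]))
```

The version that was eventually merged also folds the learning rate into these factors and offers more than one normalization mode, so treat the constants above as the paper's baseline, not the final parameterization.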
Member

tqchen commented Apr 20, 2016

Sounds good.

Contributor

marugari commented Apr 23, 2016

How's this?
marugari@d4e3a6b

Contributor

Far0n commented Apr 23, 2016

Great work, @marugari. Thank you for bringing DART in.

Member

tqchen commented Apr 23, 2016

@marugari This looks good. Can you:

  • rebase your commits so the history of the old dart.cc is removed (another, simpler way might be to start from a clean fork)
  • add a few comments on what DART is and how the algorithm works
  • open a pull request, and I will do a more detailed code review in the PR
  • you can likely reuse a few functions, like SaveModel, or change the parent function to isolate the common parts and further reduce the code
  • to prevent the codebase from getting too large, let us avoid ipynb files in the main repo; use markdown and Python scripts instead. You can always put ipynb files in https://github.com/dmlc/web-data
  • please write a markdown introduction in the docs on how DART can be used

Thanks for the great effort.

Contributor

marugari commented Jun 12, 2016

It has been merged. 949d1e3

The tutorial is under construction.
https://github.com/marugari/xgboost/blob/prototype_dart/doc/tutorials/dart.md

Member

tqchen commented Jun 12, 2016

great:)

Member

tqchen commented Jun 24, 2016

@marugari any updates on the English version of the guest blog post?

Member

tqchen commented Jun 25, 2016

Looks nice.

Contributor

marugari commented Jun 27, 2016

Can I use MathJax in dmlc.github.io?

Member

tqchen commented Jun 27, 2016

It seems not; maybe use images for the formulas, to be safe.

Member

tqchen commented Jul 4, 2016

Link to the post: http://dmlc.ml/xgboost/2016/07/02/support-dropout-on-xgboost.html
Thanks for the great effort!

One last thing: let us PR a copy of the tutorial to https://github.com/dmlc/xgboost/tree/master/doc/tutorials so it can benefit users who look into the documentation for recipes.

Contributor

marugari commented Jul 5, 2016

I'm fixing the formulas.

edmondja commented Jul 6, 2016

```python
param = {'booster': 'dart',
         'max_depth': 5, 'learning_rate': 0.1,
         'objective': 'multi:softmax', 'silent': True,
         'sample_type': 'uniform',
         'normalize_type': 'tree',
         'rate_drop': 0.1,
         'skip_drop': 0.5}
num_round = 50
bst = xgb.train(param, xg_train, num_round)
```

```
Traceback (most recent call last):
  File "", line 15, in
    bst = xgb.train(param, xg_train, num_round)
  File "//anaconda/lib/python2.7/site-packages/xgboost/training.py", line 121, in train
    bst.update(dtrain, i, obj)
  File "//anaconda/lib/python2.7/site-packages/xgboost/core.py", line 694, in update
    _check_call(_LIB.XGBoosterUpdateOneIter(self.handle, iteration, dtrain.handle))
  File "//anaconda/lib/python2.7/site-packages/xgboost/core.py", line 97, in _check_call
    raise XGBoostError(_LIB.XGBGetLastError())
XGBoostError: unknown booster type: dart
```

Isn't it implemented in XGBoost? I don't get it.

Contributor

marugari commented Jul 7, 2016

The latest released version does not support DART.

Please refer to gbtree.cc (or gbtree-inl.hpp).
