whenever possible, split classifiers, don't combine #42

Closed
ClimbsRocks opened this issue Oct 8, 2015 · 1 comment
@ClimbsRocks (Owner)

For example, with random forests we could use criterion='gini' or criterion='entropy'. When we had these combined and let grid search pick which of the two was best, it doubled the size of the space grid search had to work through, and gave us a classifier that placed 250th on the Kaggle Give Credit competition.

When I broke those out into two separate classifiers (holding all the other hyperparameters the same, but running two separate grid searches, one for entropy and one for gini), the training time was roughly equivalent (we cut the search space in half for each classifier, but doubled the number of classifiers), and we got much better results. A rough sketch of the split is below.
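A minimal sketch of what the split looks like with scikit-learn's current API; the grid values and the X_train / y_train names are illustrative, not the ones this project actually uses:

```python
from sklearn.ensemble import RandomForestClassifier
from sklearn.model_selection import GridSearchCV

# Shared hyperparameter grid, identical for both searches (illustrative values).
shared_params = {
    'n_estimators': [100, 300],
    'max_features': ['sqrt', 'log2'],
}

# Combined: one search over 2x the space, with criterion as just another parameter.
combined_search = GridSearchCV(
    RandomForestClassifier(),
    {'criterion': ['gini', 'entropy'], **shared_params},
)

# Split: two separate classifiers, each searching half the space.
gini_search = GridSearchCV(RandomForestClassifier(criterion='gini'), shared_params)
entropy_search = GridSearchCV(RandomForestClassifier(criterion='entropy'), shared_params)

# gini_search.fit(X_train, y_train)
# entropy_search.fit(X_train, y_train)
# Each fitted search is then its own classifier, and both can feed the ensembler.
```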

It turns out that gini generalizes very well here, despite not scoring as well as entropy in grid search. Gini by itself placed 133rd, entropy continued to score around 238th (a slight improvement, even, it seems), and the ensembler placed in between (164th).

So we close to doubled our placing simply by breaking out classifiers, and I would not expect this to carry any kind of time penalty.

@ClimbsRocks (Owner, Author)

This split isn't needed with RandomizedSearchCV, which does not face the same drawbacks mentioned above.
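For reference, a sketch of why the combined form is less of a problem there, assuming scikit-learn's RandomizedSearchCV (parameter values are illustrative): the number of fitted candidates is capped by n_iter, so adding criterion to the distributions doesn't double the work.

```python
from sklearn.ensemble import RandomForestClassifier
from sklearn.model_selection import RandomizedSearchCV

param_distributions = {
    'criterion': ['gini', 'entropy'],   # stays in one combined search
    'n_estimators': [100, 300, 500],
    'max_features': ['sqrt', 'log2'],
}

# Only n_iter candidates are sampled, regardless of how large the grid is.
search = RandomizedSearchCV(
    RandomForestClassifier(),
    param_distributions,
    n_iter=10,
    random_state=42,
)
# search.fit(X_train, y_train)
```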
