whenever possible, split classifiers, don't combine #42

Closed
ClimbsRocks opened this issue Oct 8, 2015 · 1 comment
@ClimbsRocks (Owner)

For example, with random forests we could use criterion='gini' or criterion='entropy'. When we had these combined and let grid search pick which of the two was best, it doubled the size of the space grid search had to work through, and gave us a classifier that placed 250th on the Kaggle Give Credit competition.

When I broke those out into two separate classifiers (holding all the other hyperparameters the same, but running two separate grid searches, one for entropy and one for gini), the training time was roughly equivalent (we cut the search space in half for each classifier, but doubled the number of classifiers), and we got much better results. A rough sketch of the split is below.
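A minimal sketch of what the split looks like with scikit-learn's current API; the grid values and the X_train / y_train names are illustrative, not the ones this project actually uses:

```python
from sklearn.ensemble import RandomForestClassifier
from sklearn.model_selection import GridSearchCV

# Shared hyperparameter grid, identical for both searches (illustrative values).
shared_params = {
    'n_estimators': [100, 300],
    'max_features': ['sqrt', 'log2'],
}

# Combined: one search over 2x the space, with criterion as just another parameter.
combined_search = GridSearchCV(
    RandomForestClassifier(),
    {'criterion': ['gini', 'entropy'], **shared_params},
)

# Split: two separate classifiers, each searching half the space.
gini_search = GridSearchCV(RandomForestClassifier(criterion='gini'), shared_params)
entropy_search = GridSearchCV(RandomForestClassifier(criterion='entropy'), shared_params)

# gini_search.fit(X_train, y_train)
# entropy_search.fit(X_train, y_train)
# Each fitted search is then its own classifier, and both can feed the ensembler.
```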

It turns out that gini generalizes very well here, despite not scoring as well as entropy in grid search. Gini by itself placed 133rd, entropy continued to score around 238th (a slight improvement, even, it seems), and the ensembler placed in between (164th).

So we close to doubled our placing simply by breaking out classifiers, and I would not expect this to carry any kind of time penalty.

@ClimbsRocks (Owner, Author)

This split isn't needed with RandomizedSearchCV, which does not face the same drawbacks mentioned above.
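For reference, a sketch of why the combined form is less of a problem there, assuming scikit-learn's RandomizedSearchCV (parameter values are illustrative): the number of fitted candidates is capped by n_iter, so adding criterion to the distributions doesn't double the work.

```python
from sklearn.ensemble import RandomForestClassifier
from sklearn.model_selection import RandomizedSearchCV

param_distributions = {
    'criterion': ['gini', 'entropy'],   # stays in one combined search
    'n_estimators': [100, 300, 500],
    'max_features': ['sqrt', 'log2'],
}

# Only n_iter candidates are sampled, regardless of how large the grid is.
search = RandomizedSearchCV(
    RandomForestClassifier(),
    param_distributions,
    n_iter=10,
    random_state=42,
)
# search.fit(X_train, y_train)
```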
