Using a test statistic to measure decision tree split quality gives a useful criterion for halting splits, and it can yield improved model quality. I propose adding the chi-squared test statistic as a new impurity option (in addition to "gini" and "entropy") for classification decision trees and ensembles.
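To make the idea concrete, here is a minimal sketch of how a chi-squared statistic could be computed for one candidate split. It is illustrative only and is not the code in this PR: the function name `chi_squared_statistic` and its list-of-counts inputs are assumptions, and it uses the standard Pearson statistic on the 2 x k table of (left/right child) by (class label).

```python
def chi_squared_statistic(left_counts, right_counts):
    """Pearson chi-squared statistic for a binary split.

    left_counts[i] and right_counts[i] are the number of training rows
    of class i that fall into the left and right child (hypothetical
    input format, chosen for illustration).
    """
    if len(left_counts) != len(right_counts):
        raise ValueError("both children must report the same classes")

    left_total = sum(left_counts)
    right_total = sum(right_counts)
    total = left_total + right_total

    # A split that sends every row to one side carries no information.
    if left_total == 0 or right_total == 0:
        return 0.0

    statistic = 0.0
    for left, right in zip(left_counts, right_counts):
        class_total = left + right
        if class_total == 0:
            continue  # class absent from this node; contributes nothing
        # Expected counts if class label were independent of the split.
        expected_left = left_total * class_total / total
        expected_right = right_total * class_total / total
        statistic += (left - expected_left) ** 2 / expected_left
        statistic += (right - expected_right) ** 2 / expected_right
    return statistic


if __name__ == "__main__":
    print(chi_squared_statistic([10, 0], [0, 10]))  # perfectly separating split
    print(chi_squared_statistic([5, 5], [5, 5]))    # uninformative split
```

A larger statistic means the split separates the classes more strongly than chance would explain. One way to use it as a halting rule is to stop splitting when the statistic falls below a critical value for the chosen significance level (for two classes, one degree of freedom, the 0.05 critical value is about 3.841).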
I added unit testing to verify that the chi-squared "impurity" measure functions as expected when used for decision tree training.
This is a re-submission of #13438 to fix the target branch.
Test build #59740 has finished for PR 13440 at commit 04c1316.
Test build #59745 has finished for PR 13440 at commit 1136518.
Test build #59751 has finished for PR 13440 at commit 6d38cfd.
Test build #64309 has finished for PR 13440 at commit 6d38cfd.
Is this something you're still working on? If so, it would be good to merge in the latest master. We can also check with @jkbradley to see if he has some review bandwidth.
@holdenk yes, I'll rebase it this week.
Test build #66679 has finished for PR 13440 at commit b199ae3.
Implement a Chi-Squared test statistic option for measuring split quality when training decision trees
Test build #66756 has finished for PR 13440 at commit 83f5e83.
test this please
Test build #66766 has finished for PR 13440 at commit 83f5e83.
@holdenk @jkbradley looks like it's clean again
@erikerlandson Are you still working on this PR? Thanks! Miao
I am still interested in this, but I don't have any sense of whether upstream is interested. Does upstream have any intention of accepting it?
Merge branch 'master' into chisquared_split_quality
Test build #73006 has started for PR 13440 at commit 61cbf7c.
i stopped the build as i need to restart jenkins... i'll retrigger this when we're back up and running.
@erikerlandson I am just helping clear the stale PRs. :) I have no idea whether they intend to accept it.
Test build #73008 has finished for PR 13440 at commit 61cbf7c.
@thunterdb Can you take a look? Thanks!