
@yang1young

In Spark MLlib, decision trees support Gini impurity, entropy, and variance as impurity measures. The entropy measure is used to compute information gain, the splitting criterion that J. Ross Quinlan introduced in the ID3 algorithm. This can be improved by implementing the C4.5 algorithm, which evaluates candidate splits by information gain ratio instead of information gain. With C4.5, the decision tree model can achieve higher predictive accuracy in many cases.
https://issues.apache.org/jira/browse/SPARK-8078
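To show how the criteria mentioned above differ, here is a minimal, framework-independent sketch in plain Python. It is not Spark MLlib's API. All function names are hypothetical, and the code only illustrates the textbook formulas. The gain ratio divides the information gain by the split information, which is the entropy of the partition sizes. This penalizes splits that scatter the data into many small subsets.

```python
from collections import Counter
from math import log2


def entropy(labels):
    """Shannon entropy (in bits) of a list of class labels."""
    n = len(labels)
    return -sum((c / n) * log2(c / n) for c in Counter(labels).values())


def gini(labels):
    """Gini impurity: 1 minus the sum of squared class proportions."""
    n = len(labels)
    return 1.0 - sum((c / n) ** 2 for c in Counter(labels).values())


def variance(values):
    """Variance impurity for regression targets (population variance)."""
    n = len(values)
    mean = sum(values) / n
    return sum((v - mean) ** 2 for v in values) / n


def info_gain(parent, children):
    """ID3 criterion: parent entropy minus the size-weighted child entropy."""
    n = len(parent)
    weighted = sum(len(ch) / n * entropy(ch) for ch in children if ch)
    return entropy(parent) - weighted


def split_info(parent, children):
    """Split information: entropy of the child partition sizes."""
    n = len(parent)
    return -sum((len(ch) / n) * log2(len(ch) / n) for ch in children if ch)


def gain_ratio(parent, children):
    """C4.5 criterion: information gain normalized by split information."""
    si = split_info(parent, children)
    # A "split" with a single non-empty child carries no information.
    return info_gain(parent, children) / si if si > 0 else 0.0


# Usage: compare a clean binary split with a split into eight singletons.
parent = ["yes"] * 4 + ["no"] * 4
binary = [["yes"] * 4, ["no"] * 4]
eight_way = [[label] for label in parent]
print(info_gain(parent, binary), gain_ratio(parent, binary))
print(info_gain(parent, eight_way), gain_ratio(parent, eight_way))
```

Both splits have an information gain of 1 bit. However, the gain ratio is 1.0 for the binary split and 1/3 for the eight-way split. This is the bias toward many-valued splits that C4.5's criterion is meant to correct.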

@AmplabJenkins

Can one of the admins verify this patch?

Member

None of these lines match the project's code style. Please read https://cwiki.apache.org/confluence/display/SPARK/Contributing+to+Spark

@sryza
Contributor

sryza commented Jun 3, 2015

Mind giving this a more descriptive title that includes [MLLIB]?

@srowen
Member

srowen commented Jun 4, 2015

OK, if you're closing this JIRA, do you mind closing this PR?

yang1young closed this on Jun 4, 2015

4 participants