[SPARK-4736][mllib] [random forest] functions returning the category with weights #3583

dikejiang · 2014-12-03T12:28:52Z

This PR adds two functions, predictByVotingWithWeight(features: Vector) and predictWithWeight(features: Vector), and modifies predictByVoting(features: Vector).

There are at least two reasons for this change:

1) In our practice, we want to find the top N samples from one category. However, in version 1.3.0, predict returns only the predicted category, without weights.

2) In our practice, the numbers of positive and negative samples are also very unbalanced: there are far fewer positive samples than negative ones. Going by votes alone, very few samples are predicted as positive. If the weights are also returned, users can choose a suitable threshold to adjust the results and improve performance.
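
To illustrate the two use cases above, here is a minimal sketch in plain Scala, not code from this PR. It assumes each sample already carries a weight for the positive class (for example, the fraction of trees voting positive). The names Scored, topN, and predictWithThreshold are hypothetical and exist only for this example.

```scala
object WeightedRankingSketch {
  // Hypothetical record: a sample id paired with the weight
  // (e.g. vote fraction) the forest assigned to the positive class.
  final case class Scored(id: String, positiveWeight: Double)

  // Use case 1: rank samples by positive-class weight and keep the top N.
  def topN(scored: Seq[Scored], n: Int): Seq[Scored] =
    scored.sortBy(-_.positiveWeight).take(n)

  // Use case 2: with unbalanced classes, a user-chosen threshold
  // replaces the implicit majority-vote cutoff.
  def predictWithThreshold(s: Scored, threshold: Double): Int =
    if (s.positiveWeight >= threshold) 1 else 0
}
```

With only a predicted label, neither operation is possible; both need the weight that the label was derived from.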

mengxr · 2014-12-03T14:31:50Z

@dikejiang Do you mind creating a JIRA and adding the JIRA number to the PR title? Thanks!

dikejiang · 2014-12-04T04:42:22Z

@mengxr done!

jkbradley · 2014-12-09T20:09:19Z

@dikejiang Thanks for the PR! I'm wondering if you'd be interested in a more general API. In the new experimental ML package, I have a PR [https://www.github.com//pull/3637] which introduces a few prediction methods, one of which is:

def predictRaw(features: Vector): Vector // for each label, predict a confidence

What do you think of using this instead of only predicting the top label's weight?
Eventually, confidence predictions could be improved by incorporating each tree's confidence in its prediction (rather than having each tree simply vote for a single label, as is done now). (But that could be a later PR.)
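
As a rough illustration of the idea (a sketch under stated assumptions, not the implementation in either PR), a vote-based predictRaw could return one value per label rather than only the winning label's weight. The sketch avoids Spark dependencies: a tree is modeled as a hypothetical function from a feature array to a label, features are a plain Array[Double] instead of an MLlib Vector, and all trees count equally.

```scala
object PredictRawSketch {
  // Hypothetical stand-in for a trained tree: maps features to a class label.
  type Tree = Array[Double] => Int

  // For each label, the fraction of trees that voted for it.
  def predictRaw(trees: Seq[Tree], numClasses: Int,
                 features: Array[Double]): Array[Double] = {
    require(trees.nonEmpty, "need at least one tree")
    val votes = Array.fill(numClasses)(0.0)
    trees.foreach { tree => votes(tree(features)) += 1.0 }
    votes.map(_ / trees.size)
  }

  // The plain prediction is recoverable from the raw vector:
  // it is the label with the largest vote fraction.
  def predict(trees: Seq[Tree], numClasses: Int,
              features: Array[Double]): Int = {
    val raw = predictRaw(trees, numClasses, features)
    raw.indexOf(raw.max)
  }
}
```

Because the top label and its weight can both be read off the returned vector, this shape covers what predictWithWeight would return and more.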

dikejiang · 2014-12-11T00:44:30Z

@jkbradley Of course I am interested in the more general API if it can provide confidence predictions, because in practice we usually need such confidence values to produce a top-N ranking. In addition, I quite agree with you that confidence predictions could be improved by incorporating each tree's confidence.

jkbradley · 2014-12-11T03:20:26Z

@dikejiang Great, thanks!

dikejiang · 2014-12-12T05:58:04Z

@mengxr OK to go?

jkbradley · 2014-12-15T19:21:31Z

@dikejiang Apologies--I think I was not clear. I was recommending that you change this PR to implement predictRaw(), rather than predictWithWeight(). Does that sound reasonable? Since predictRaw gives more info than predictWithWeight, it seems best to only include predictRaw. Thanks!

change function name predictWithWeight->predictRaw predictByVotingWithWeight->predictRawByVoting

jkbradley · 2015-03-06T17:56:17Z

@dikejiang Do you still plan to update this PR to return a Vector of probabilities? I'm planning a major reorganization of trees & ensembles APIs here: [https://issues.apache.org/jira/browse/SPARK-6113]
I don't want it to mess up your PR; we could either finish up this PR soon, or we could wait until the API update (which should help by making the proper API clearer).

AmplabJenkins · 2015-04-27T18:23:19Z

Can one of the admins verify this patch?

jkbradley · 2015-04-27T19:00:07Z

@dikejiang This work is now being done here: [https://issues.apache.org/jira/browse/SPARK-3727]
Can you please close this PR?

If you still want to work on this task, please coordinate on the JIRA I linked. Thanks!

dikejiang changed the title ~~[mllib] [random forest] functions returning the category with weights~~ [SPARK-4736][mllib] [random forest] functions returning the category with weights Dec 4, 2014

Update treeEnsembleModels.scala

956b1b6

asfgit closed this in 555213e Apr 28, 2015
