
[SPARK-6025] [MLlib] Add helper method to efficiently compute error in GBT's #4819

Closed
wants to merge 1 commit

Conversation

MechCoder
Contributor

While computing the error, with and without validation, for every iteration, the prediction of the previous trees was not being cached, which led to recomputation.
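To make the inefficiency concrete, here is a minimal standalone sketch (plain Python, not the Spark/MLlib API; all names are illustrative) contrasting the uncached approach, which re-evaluates trees 0..i for every iteration i, with a version that caches the cumulative prediction:

```python
# `trees` is a list of per-tree predict functions, `weights` their
# learning-rate-scaled weights; mean squared error is assumed as the metric.

def errors_naive(trees, weights, xs, ys):
    # Re-sums the predictions of trees 0..i for every i:
    # O(n_trees^2) tree evaluations per data point.
    errs = []
    for i in range(len(trees)):
        preds = [sum(w * t(x) for t, w in zip(trees[:i + 1], weights[:i + 1]))
                 for x in xs]
        errs.append(sum((p - y) ** 2 for p, y in zip(preds, ys)) / len(ys))
    return errs

def errors_cached(trees, weights, xs, ys):
    # Caches the cumulative prediction: each tree is evaluated
    # exactly once per data point.
    errs = []
    preds = [0.0] * len(xs)
    for t, w in zip(trees, weights):
        preds = [p + w * t(x) for p, x in zip(preds, xs)]
        errs.append(sum((p - y) ** 2 for p, y in zip(preds, ys)) / len(ys))
    return errs
```

Both functions return the same per-iteration error array; the cached version just folds each new tree into the running prediction instead of starting from scratch.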

@SparkQA

SparkQA commented Feb 27, 2015

Test build #28089 has started for PR 4819 at commit fa215cc.

  • This patch merges cleanly.

@MechCoder
Contributor Author

@jkbradley Is this similar to what you had in mind?

@SparkQA

SparkQA commented Feb 27, 2015

Test build #28089 has finished for PR 4819 at commit fa215cc.

  • This patch fails Spark unit tests.
  • This patch merges cleanly.
  • This patch adds no public classes.

@AmplabJenkins

Test FAILed.
Refer to this link for build results (access rights to CI server needed):
https://amplab.cs.berkeley.edu/jenkins//job/SparkPullRequestBuilder/28089/
Test FAILed.

@MechCoder MechCoder changed the title [SPARK-6025] Add helper method to efficiently compute error in GBT's [SPARK-6025] [MLlib] Add helper method to efficiently compute error in GBT's Feb 28, 2015
@SparkQA

SparkQA commented Feb 28, 2015

Test build #28110 has started for PR 4819 at commit 7d4ed48.

  • This patch merges cleanly.

@SparkQA

SparkQA commented Feb 28, 2015

Test build #28110 has finished for PR 4819 at commit 7d4ed48.

  • This patch passes all tests.
  • This patch merges cleanly.
  • This patch adds no public classes.

@AmplabJenkins

Test PASSed.
Refer to this link for build results (access rights to CI server needed):
https://amplab.cs.berkeley.edu/jenkins//job/SparkPullRequestBuilder/28110/
Test PASSed.

@MechCoder
Contributor Author

@jkbradley I am assuming that this is what you intended. It works, but I'm not sure about the present design, which differs from the one you posted on the JIRA:

def evaluateEachIteration(data: RDD[LabeledPoint], evaluator): Array[Double]

I don't understand how this would work if the existing residual is not passed in. Could you also say what the Array[Double] is supposed to be? And does the evaluator mean the loss metric (as done in this PR)?

@MechCoder
Contributor Author

Also, the present code is unoptimized, since there are two passes across the data RDD: one to update the residual and the other to calculate the error. But that can be taken care of after we discuss the design.
I just want to make sure that I'm on the right track.

@jkbradley
Member

@MechCoder I had intended to use this internally and to expose a public method. (The "evaluateEachIteration" method was the public one, but feel free to think of a better name.) Yes, the evaluator was the loss metric, which should probably be an optional parameter (defaulting to the training metric).

  • https://issues.apache.org/jira/browse/SPARK-6025: This is the JIRA for the public method.
  • https://issues.apache.org/jira/browse/SPARK-5972: This is the JIRA for the internal optimization.

I'm OK with combining the two JIRAs in one PR since they are closely related. For the internal optimization, the "residual" to store is not really the residual but rather the cumulative prediction of the ensemble; that in turn can be used to compute both the gradient and the error. (Note it will be important to use the cached residual for computing the gradient, not just the objective.) That may require adding some internal API to ensembles to permit prediction from a pre-computed sum of trees' predictions.
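A minimal sketch of the design described above, in plain Python rather than Spark (all names are hypothetical, and squared error is assumed as the loss): one boosting step folds the new tree into the cached cumulative predictions, and both the pseudo-residuals/gradients (needed to fit the next tree) and the evaluation error are derived from that same cached sum.

```python
def boosting_step(cached_preds, new_tree, weight, xs, ys):
    # Fold one new tree into the cached cumulative predictions.
    preds = [p + weight * new_tree(x) for p, x in zip(cached_preds, xs)]
    # Both quantities come from the same cached sum, so no tree is
    # re-evaluated:
    gradients = [2.0 * (p - y) for p, y in zip(preds, ys)]  # d/dp (p - y)^2
    error = sum((p - y) ** 2 for p, y in zip(preds, ys)) / len(ys)
    return preds, gradients, error
```

The returned `preds` become the `cached_preds` of the next iteration, which is the caching this PR and SPARK-5972 are after.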

@MechCoder
Contributor Author

Ouch. I just realised what you meant.. Scratch my previous couple of comments. :/

@MechCoder
Contributor Author

@jkbradley Just one quick clarification, please.

When you say evaluateEachIteration should return an Array of Doubles, do you mean that each element of the returned array corresponds to the error per iteration (i.e. per tree)? In that case, how does the name errorPerIteration sound?

@MechCoder MechCoder closed this Mar 4, 2015
@MechCoder MechCoder deleted the spark-6025 branch March 4, 2015 04:25
@jkbradley
Member

@MechCoder No problem; sorry I didn't make the JIRAs clearer! Calling it errorPerIteration sounds OK unless we allow users to pass in evaluators, in which case the evaluator might be something new like Accuracy which isn't an "error" metric. I'd still vote for evaluateEachIteration in case we allow this later on.

@MechCoder
Contributor Author

@jkbradley Yes, but each element of the returned array corresponds to the error/loss at each iteration, right?

@jkbradley
Member

That's correct: element i should have the error/loss for the ensemble containing trees {0, 1, ..., i}.
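A toy check of that indexing convention (plain Python, illustrative names only): element i of the returned array is the error of the partial ensemble built from trees 0..i.

```python
trees = [lambda x: 1.0, lambda x: 0.5]  # two constant "trees"
weights = [1.0, 1.0]
xs, ys = [0.0, 0.0], [1.5, 1.5]

errs = []
pred = 0.0  # cumulative prediction (same for every point here)
for t, w in zip(trees, weights):
    pred += w * t(xs[0])
    errs.append((pred - ys[0]) ** 2)
# errs[0] is the error of ensemble {0}; errs[1] of ensemble {0, 1}
```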
