[SPARK-7568][ML] ml.LogisticRegression doesn't output the right prediction #6109

dbtsai · 2015-05-13T05:25:47Z

The difference arises because Spark 1.3 did not fit the intercept. Here, we change the input String so that instance 6 is classified as 1.0 without any ambiguity.
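As a rough illustration of why fitting an intercept can change a prediction, the sketch below computes binary logistic probabilities with and without an intercept term. The weights, the feature vector, and the intercept value are hypothetical and chosen only for illustration; they are not the fitted model from this PR, and this is not Spark's implementation.

```python
import math


def sigmoid(z):
    """Logistic function mapping a margin to a probability."""
    return 1.0 / (1.0 + math.exp(-z))


def predict_proba(weights, features, intercept=0.0):
    """Return [P(label=0), P(label=1)] for a binary logistic model."""
    margin = intercept + sum(w * x for w, x in zip(weights, features))
    p1 = sigmoid(margin)
    return [1.0 - p1, p1]


# Hypothetical weights (assumption, not taken from the PR).
weights = [2.0, -1.5]
# An instance whose features are all zero: without an intercept the
# margin is 0, so the model cannot prefer either class.
features = [0.0, 0.0]

no_intercept = predict_proba(weights, features)
with_intercept = predict_proba(weights, features, intercept=-1.0)

print(no_intercept)    # both classes equally likely
print(with_intercept)  # the intercept shifts mass toward class 0
```

Without an intercept, an instance with a zero margin sits exactly on the decision boundary; a fitted intercept moves that boundary, which is one way two versions of a model can disagree on the same input.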

With lambda = 0.001 in the current LOR implementation, the predictions are:

(4, spark i j k) --> prob=[0.1596407738787411,0.8403592261212589], prediction=1.0
(5, l m n) --> prob=[0.8378325685476612,0.16216743145233883], prediction=0.0
(6, spark hadoop spark) --> prob=[0.0692663313297627,0.9307336686702373], prediction=1.0
(7, apache hadoop) --> prob=[0.9821575333444208,0.01784246665557917], prediction=0.0

and the predictions on the training data are:

(0, a b c d e spark) --> prob=[0.0021342419881406746,0.9978657580118594], prediction=1.0
(1, b d) --> prob=[0.9959176174854043,0.004082382514595685], prediction=0.0
(2, spark f g h) --> prob=[0.0014541569986711233,0.9985458430013289], prediction=1.0
(3, hadoop mapreduce) --> prob=[0.9982978367343561,0.0017021632656438518], prediction=0.0
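The reported rows can be checked for internal consistency with a short script. The sketch below copies the probabilities and predictions from the two listings above and confirms that each prediction matches the class whose probability exceeds 0.5. The 0.5 threshold is an assumption about the decision rule, and the helper name `predict` is hypothetical.

```python
# (id, text, [P(label=0), P(label=1)], reported prediction), copied from above.
rows = [
    (4, "spark i j k", [0.1596407738787411, 0.8403592261212589], 1.0),
    (5, "l m n", [0.8378325685476612, 0.16216743145233883], 0.0),
    (6, "spark hadoop spark", [0.0692663313297627, 0.9307336686702373], 1.0),
    (7, "apache hadoop", [0.9821575333444208, 0.01784246665557917], 0.0),
    (0, "a b c d e spark", [0.0021342419881406746, 0.9978657580118594], 1.0),
    (1, "b d", [0.9959176174854043, 0.004082382514595685], 0.0),
    (2, "spark f g h", [0.0014541569986711233, 0.9985458430013289], 1.0),
    (3, "hadoop mapreduce", [0.9982978367343561, 0.0017021632656438518], 0.0),
]


def predict(prob, threshold=0.5):
    """Assumed decision rule: label 1.0 when P(label=1) exceeds the threshold."""
    return 1.0 if prob[1] > threshold else 0.0


# Collect the ids of any rows whose reported prediction disagrees with the rule.
mismatches = [row_id for row_id, _, prob, reported in rows
              if predict(prob) != reported]

print("mismatched ids:", mismatches)
```

Instance 6 has a class-1 probability of about 0.93, well clear of the 0.5 boundary, which is consistent with the description's claim that it is now classified as 1.0 without ambiguity.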

AmplabJenkins · 2015-05-13T05:27:10Z

Merged build triggered.

AmplabJenkins · 2015-05-13T05:27:18Z

Merged build started.

SparkQA · 2015-05-13T05:29:10Z

Test build #32584 has started for PR 6109 at commit 8f40ccd.

SparkQA · 2015-05-13T07:16:32Z

Test build #32584 has finished for PR 6109 at commit 8f40ccd.

This patch passes all tests.
This patch merges cleanly.
This patch adds no public classes.

AmplabJenkins · 2015-05-13T07:16:37Z

Merged build finished. Test PASSed.

AmplabJenkins · 2015-05-13T07:16:37Z

Test PASSed.
Refer to this link for build results (access rights to CI server needed):
https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/32584/

AmplabJenkins · 2015-05-13T07:27:10Z

Merged build triggered.

AmplabJenkins · 2015-05-13T07:27:19Z

Merged build started.

SparkQA · 2015-05-13T07:28:00Z

Test build #32596 has started for PR 6109 at commit ac63ce4.

SparkQA · 2015-05-13T09:09:29Z

Test build #32596 has finished for PR 6109 at commit ac63ce4.

This patch passes all tests.
This patch merges cleanly.
This patch adds no public classes.

AmplabJenkins · 2015-05-13T09:09:34Z

Merged build finished. Test PASSed.

AmplabJenkins · 2015-05-13T09:09:34Z

Test PASSed.
Refer to this link for build results (access rights to CI server needed):
https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/32596/

…iction

The difference arises because Spark 1.3 did not fit the intercept. Here, we change the input `String` so that instance 6 is classified as `1.0` without any ambiguity.

With lambda = 0.001 in the current LOR implementation, the predictions are:

```
(4, spark i j k) --> prob=[0.1596407738787411,0.8403592261212589], prediction=1.0
(5, l m n) --> prob=[0.8378325685476612,0.16216743145233883], prediction=0.0
(6, spark hadoop spark) --> prob=[0.0692663313297627,0.9307336686702373], prediction=1.0
(7, apache hadoop) --> prob=[0.9821575333444208,0.01784246665557917], prediction=0.0
```

and the predictions on the training data are:

```
(0, a b c d e spark) --> prob=[0.0021342419881406746,0.9978657580118594], prediction=1.0
(1, b d) --> prob=[0.9959176174854043,0.004082382514595685], prediction=0.0
(2, spark f g h) --> prob=[0.0014541569986711233,0.9985458430013289], prediction=1.0
(3, hadoop mapreduce) --> prob=[0.9982978367343561,0.0017021632656438518], prediction=0.0
```

Author: DB Tsai <dbt@netflix.com>

Closes #6109 from dbtsai/lor-example and squashes the following commits:

ac63ce4 [DB Tsai] first commit

(cherry picked from commit c1080b6)
Signed-off-by: Xiangrui Meng <meng@databricks.com>

mengxr · 2015-05-14T08:26:35Z

LGTM. Merged into master and branch-1.4. Thanks!

first commit

ac63ce4

dbtsai force-pushed the lor-example branch from 8f40ccd to ac63ce4 Compare May 13, 2015 07:22

asfgit closed this in c1080b6 May 14, 2015

dbtsai deleted the lor-example branch June 20, 2015 23:24
