[SPARK-15892][ML] Incorrectly merged AFTAggregator with zero total count #13619

HyukjinKwon · 2016-06-11T16:55:10Z

What changes were proposed in this pull request?

Currently, AFTAggregator is not being merged correctly. For example, if there is any single empty partition in the data, this creates an AFTAggregator with zero total count which causes the exception below:

IllegalArgumentException: u'requirement failed: The number of instances should be greater than 0.0, but got 0.'

Please see AFTSurvivalRegression.scala#L573-L575 as well.
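
To illustrate how a zero-count aggregator can poison a merge, here is a minimal sketch in plain Python. It is a toy model, not Spark's `AFTAggregator`: the class `ToyAggregator`, both merge methods and the guard conditions are hypothetical, and the "guard on the wrong side" pattern is an assumption about the kind of comparison mistake involved, not a copy of the Scala code.

```python
from functools import reduce


class ToyAggregator:
    """Toy stand-in for a mergeable aggregator (not Spark's AFTAggregator)."""

    def __init__(self):
        self.count = 0
        self.loss_sum = 0.0

    def add(self, loss):
        self.count += 1
        self.loss_sum += loss
        return self

    def merge_checking_self(self, other):
        # Assumed bug pattern: the guard looks at this aggregator's own count,
        # so an empty left-hand side silently drops the other side's data.
        if self.count != 0:
            self.count += other.count
            self.loss_sum += other.loss_sum
        return self

    def merge_checking_other(self, other):
        # Guard on the incoming aggregator instead: merging an empty
        # aggregator is a no-op, and no data is discarded.
        if other.count != 0:
            self.count += other.count
            self.loss_sum += other.loss_sum
        return self

    def mean_loss(self):
        if self.count <= 0:
            raise ValueError(
                "The number of instances should be greater than 0, "
                "but got %d." % self.count)
        return self.loss_sum / self.count


def aggregate(partitions, merge):
    """Build one aggregator per partition, then merge them left to right."""
    per_partition = []
    for part in partitions:
        agg = ToyAggregator()
        for loss in part:
            agg.add(loss)
        per_partition.append(agg)
    return reduce(merge, per_partition)


partitions = [[], [1.0, 2.0], [3.0]]  # the first partition is empty
fixed = aggregate(partitions, ToyAggregator.merge_checking_other)
buggy = aggregate(partitions, ToyAggregator.merge_checking_self)
print(fixed.count, buggy.count)
```

In this toy setup the merge that checks its own count ends with a total count of 0, so `mean_loss()` raises, while the merge that checks the incoming count keeps all three instances.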

Just to be clear, the Python example aft_survival_regression.py appears to use 5 rows. So, if there are more than 5 partitions, it throws the exception above, because some of the partitions are empty, which results in an incorrectly merged AFTAggregator.

Executing bin/spark-submit examples/src/main/python/ml/aft_survival_regression.py on a machine with more than 5 CPUs fails because, with the default configuration, it creates tasks with some empty partitions (AFAIK, it sets the parallelism level to the number of CPU cores).
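
To see why a 5-row dataset ends up with empty partitions once the parallelism exceeds 5, the slicing can be simulated in plain Python. This is a rough sketch: it assumes rows are split into contiguous, near-equal slices by integer index arithmetic, and `slice_rows` and `count_empty` are hypothetical helpers for illustration, not Spark APIs. Real partitioning depends on the data source and configuration.

```python
def slice_rows(rows, num_slices):
    """Split rows into num_slices contiguous slices (hypothetical helper).

    Assumption: slice i covers indices [i * n // k, (i + 1) * n // k),
    so some slices are empty whenever k > n.
    """
    n = len(rows)
    return [rows[i * n // num_slices:(i + 1) * n // num_slices]
            for i in range(num_slices)]


def count_empty(rows, num_slices):
    """Number of slices that received no rows."""
    return sum(1 for part in slice_rows(rows, num_slices) if not part)


rows = ["r1", "r2", "r3", "r4", "r5"]  # stand-in for the 5 example rows
for cores in (4, 5, 8):
    print(cores, "partitions ->", count_empty(rows, cores), "empty")
```

Under this assumption, 4 or 5 slices leave no slice empty, while 8 slices leave 3 of them without any rows.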

How was this patch tested?

A unit test in AFTSurvivalRegressionSuite.scala, plus a manual run of bin/spark-submit examples/src/main/python/ml/aft_survival_regression.py.

HyukjinKwon · 2016-06-11T16:56:43Z

cc @jkbradley Could you please take a look?

HyukjinKwon · 2016-06-11T17:35:52Z

FYI, the example passed in branch-1.6 because that branch does not check whether the count is 0. This check was introduced in 101663f.

SparkQA · 2016-06-11T17:45:58Z

Test build #60345 has finished for PR 13619 at commit fb16b71.

This patch passes all tests.
This patch merges cleanly.
This patch adds no public classes.

jkbradley · 2016-06-11T18:56:41Z

@HyukjinKwon Thanks! The fix looks correct. Could you please add a tiny unit test which fails before your fix & works afterwards?

HyukjinKwon · 2016-06-12T04:34:55Z

@jkbradley Sure!

SparkQA · 2016-06-12T05:19:42Z

Test build #60353 has finished for PR 13619 at commit 4447d0a.

This patch passes all tests.
This patch merges cleanly.
This patch adds no public classes.

SparkQA · 2016-06-12T07:03:22Z

Test build #60357 has finished for PR 13619 at commit c86ede8.

This patch passes all tests.
This patch merges cleanly.
This patch adds no public classes.

SparkQA · 2016-06-12T07:39:52Z

Test build #60359 has finished for PR 13619 at commit 2c8adbf.

This patch passes all tests.
This patch merges cleanly.
This patch adds no public classes.

jkbradley · 2016-06-12T21:26:10Z

LGTM
Merging with master and branch-2.0
Thank you!

## What changes were proposed in this pull request?

Currently, `AFTAggregator` is not being merged correctly. For example, if there is any single empty partition in the data, this creates an `AFTAggregator` with zero total count which causes the exception below:

```
IllegalArgumentException: u'requirement failed: The number of instances should be greater than 0.0, but got 0.'
```

Please see [AFTSurvivalRegression.scala#L573-L575](https://github.com/apache/spark/blob/6ecedf39b44c9acd58cdddf1a31cf11e8e24428c/mllib/src/main/scala/org/apache/spark/ml/regression/AFTSurvivalRegression.scala#L573-L575) as well.

Just to be clear, the Python example `aft_survival_regression.py` appears to use 5 rows. So, if there are more than 5 partitions, it throws the exception above, because some of the partitions are empty, which results in an incorrectly merged `AFTAggregator`.

Executing `bin/spark-submit examples/src/main/python/ml/aft_survival_regression.py` on a machine with more than 5 CPUs fails because, with the default configuration, it creates tasks with some empty partitions (AFAIK, it sets the parallelism level to the number of CPU cores).

## How was this patch tested?

A unit test in `AFTSurvivalRegressionSuite.scala`, plus a manual run of `bin/spark-submit examples/src/main/python/ml/aft_survival_regression.py`.

Author: hyukjinkwon <gurwls223@gmail.com>
Author: Hyukjin Kwon <gurwls223@gmail.com>

Closes #13619 from HyukjinKwon/SPARK-15892.

(cherry picked from commit e355460)
Signed-off-by: Joseph K. Bradley <joseph@databricks.com>


zzcclp · 2016-06-13T01:47:40Z

mllib/src/test/scala/org/apache/spark/ml/regression/AFTSurvivalRegressionSuite.scala

+    // the parallelism is bigger than that. Because the issue was about `AFTAggregator`s
+    // being merged incorrectly when it has an empty partition, running the codes below
+    // should not throw an exception.
+    val dataset = spark.createDataFrame(


@HyukjinKwon value spark is not found here, it should be sqlContext, right?

I compiled it in branch-1.6.

With branch-2.0 it is OK. I think this PR should not be merged into branch-1.6 directly.

Oh, it seems this is merged into branch-1.6 too. Yes, it should be sqlContext for branch-1.6.

HyukjinKwon · 2016-06-13T02:15:03Z

@zzcclp I made a PR against branch-1.6 here, #13630. Thank you for pointing this out quickly.

zzcclp · 2016-06-13T02:50:45Z

@HyukjinKwon OK, but this PR should be reverted first.
@jkbradley could you revert this PR for branch-1.6?

HyukjinKwon · 2016-06-13T02:55:09Z

Yes, it seems it is still failing. I can change the PR to revert this if @jkbradley is busy for now. Otherwise, I will close mine.

zzcclp · 2016-06-14T02:35:05Z

@HyukjinKwon, could you revert this PR for branch-1.6?

HyukjinKwon · 2016-06-14T02:36:37Z

@zzcclp I just did, here: #13630

zzcclp · 2016-06-14T02:37:37Z

@HyukjinKwon, just saw it, thanks.

jkbradley · 2016-06-14T21:07:44Z

My apologies for the slow response. I'll revert it now. Thank you for pinging!

jkbradley · 2016-06-14T21:09:13Z

Reverted commit to branch-1.6 in commit 2f3e327

… 1.6

## What changes were proposed in this pull request?

This PR backports #13619.

The original test added in branch-2.0 failed in branch-1.6. This seems to be because the behaviour was changed in 101663f. The failure occurred while calculating Euler's number, which ends up as infinity regardless of this path. So, I brought the dataset from `AFTSurvivalRegressionExample` to make sure this is working and then wrote the test.

I ran the test before and after creating empty partitions. `model.scale` becomes `1.0` with empty partitions and `1.547` without them. After this patch, it is always `1.547`.

## How was this patch tested?

Unit test in `AFTSurvivalRegressionSuite`.

Author: hyukjinkwon <gurwls223@gmail.com>

Closes #13725 from HyukjinKwon/SPARK-15892-1-6.
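
The before/after comparison described in this backport amounts to checking that the result does not depend on the presence of empty partitions. Below is a minimal sketch of that kind of check in plain Python, using a toy `(count, total)` summary instead of the actual model. All names here (`merge`, `summarize`, `with_empty_partitions`) are hypothetical and only meant to show the shape of the test.

```python
import random


def merge(a, b):
    """Merge two (count, total) summaries; an empty side contributes nothing."""
    return (a[0] + b[0], a[1] + b[1])


def summarize(partitions):
    """Fold per-partition (count, total) summaries into one."""
    result = (0, 0.0)
    for part in partitions:
        result = merge(result, (len(part), float(sum(part))))
    return result


def with_empty_partitions(partitions, extra, seed=0):
    """Return a copy of partitions with `extra` empty ones at random positions."""
    rng = random.Random(seed)
    padded = [list(p) for p in partitions]
    for _ in range(extra):
        padded.insert(rng.randint(0, len(padded)), [])
    return padded


base = [[1.0, 2.0], [3.0], [4.0, 5.0]]
padded = with_empty_partitions(base, extra=4)

# A correct merge gives the same summary with and without empty partitions.
assert summarize(padded) == summarize(base)
```

A merge that drops data when one side is empty would break the final assertion for at least some placements of the empty partitions, which is the kind of discrepancy the two `model.scale` values point to.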


Fix incorrect comparison

fb16b71

Fetch upstream

9f7bec3

Add a test

4447d0a

HyukjinKwon added 2 commits June 12, 2016 15:14

Fix typos

c86ede8

Do not make 1000 AFTInput but just 2

2c8adbf

asfgit closed this in e355460 Jun 12, 2016

zzcclp reviewed Jun 13, 2016

HyukjinKwon mentioned this pull request Jun 13, 2016

Revert "Incorrectly merged AFTAggregator with zero total count" for branch-1.6 #13630

Closed

HyukjinKwon mentioned this pull request Jun 17, 2016

[SPARK-15892][ML] Backport correctly merging AFTAggregators to branch 1.6 #13725

Closed

HyukjinKwon deleted the SPARK-15892 branch January 2, 2018 03:42
