[SPARK-31603][ML]AFT uses common functions in RDDLossFunction #28404

zhengruifeng · 2020-04-29T10:45:33Z

What changes were proposed in this pull request?

Make AFT reuse the common functions in ml.optim rather than maintaining its own implementation.

Why are the changes needed?

The optimization logic in AFT is very similar to that of other algorithms based on RDDLossFunction,
so AFT should reuse the same common functions.

Does this PR introduce any user-facing change?

No

How was this patch tested?

Existing test suites.

SparkQA · 2020-04-29T12:11:03Z

Test build #122056 has finished for PR 28404 at commit 7fe08b8.

This patch passes all tests.
This patch merges cleanly.
This patch adds no public classes.

SparkQA · 2020-04-29T12:29:13Z

Test build #122057 has finished for PR 28404 at commit f5d3bb1.

This patch passes all tests.
This patch merges cleanly.
This patch adds no public classes.

srowen

Seems OK pending tests, if there is no change to behavior.

mllib/src/main/scala/org/apache/spark/ml/optim/aggregator/AFTAggregator.scala

srowen · 2020-05-01T14:27:56Z

+  @transient private lazy val parameters = bcCoefficients.value.toArray
+  // the regression coefficients to the covariates
+  @transient private lazy val coefficients = parameters.slice(2, dim)
+  @transient private lazy val intercept = parameters(1)


Do these all need to be lazy, or even members? I know that's how it was already. But for example coefficients is just used once, as is intercept. It seems easy enough to just refer to bcCoefficients.value(1) or whatever and get rid of most or all of these, unless I'm missing how it really needs to be computed once.

zhengruifeng · 2020-05-02

This aggregator was mainly copied from the original one.
For intercept and sigma, I am neutral on keeping them as members or just referring to bcCoefficients.value(1). I don't feel strongly about it.
I guess this AFT was following other implementations like LiR/LoR, in which a lot of transient and lazy members are used.

srowen · 2020-05-02

The lazy val overhead is non-trivial but probably won't matter here. OK, if that's how the other code works, that's fine, but I wouldn't mind cleaning this up one day. It can be simpler.
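To make the two options in this thread concrete, here is a minimal sketch, not the actual AFTAggregator code. `FakeBroadcast` is a hypothetical stand-in for a Spark broadcast variable, and `LazyMembers`, `DirectAccess`, and `interceptPlus` are illustrative names only. It assumes the layout visible in the diff, where index 1 of the parameter array holds the intercept.

```scala
// Hypothetical stand-in for a broadcast variable; this is NOT Spark's Broadcast class.
final case class FakeBroadcast(value: Array[Double])

// Style 1: cache pieces of the parameter vector in @transient lazy vals,
// as in the diff under review. Each read of a lazy val goes through an
// initialization check before the cached value is returned.
class LazyMembers(bc: FakeBroadcast) extends Serializable {
  @transient private lazy val parameters = bc.value
  @transient private lazy val intercept = parameters(1)

  def interceptPlus(x: Double): Double = intercept + x
}

// Style 2: read straight from the broadcast value at the single use site,
// which removes the extra members altogether.
class DirectAccess(bc: FakeBroadcast) extends Serializable {
  def interceptPlus(x: Double): Double = bc.value(1) + x
}
```

Both classes return the same result for the same input; the second simply has fewer fields to reason about when a value is only read once.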

zhengruifeng · 2020-05-03T08:35:38Z

I changed the order of the three parts in bcCoefficients, so that no transient lazy variables are needed.
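A rough sketch of why reordering can remove the need for cached slices. The old layout follows the diff (intercept at index 1, coefficients from index 2). The new layout shown here, with coefficients first, then the intercept, then the remaining term, is an assumption made for illustration and is not confirmed by this thread. `LayoutSketch`, `marginOld`, and `marginNew` are hypothetical names.

```scala
object LayoutSketch {
  // Old layout from the diff: index 1 = intercept, indices 2 until dim = coefficients.
  // Index 0 is assumed to hold the remaining (scale-related) term.
  def marginOld(params: Array[Double], features: Array[Double]): Double = {
    val coefficients = params.slice(2, params.length) // allocates a copy
    var sum = params(1)
    var i = 0
    while (i < features.length) {
      sum += coefficients(i) * features(i)
      i += 1
    }
    sum
  }

  // Assumed new layout: coefficients first, then intercept, then the remaining term.
  // Feature index i lines up with parameter index i, so no slice is needed.
  def marginNew(params: Array[Double], features: Array[Double]): Double = {
    val n = features.length
    var sum = params(n) // intercept sits right after the coefficients
    var i = 0
    while (i < n) {
      sum += params(i) * features(i)
      i += 1
    }
    sum
  }
}
```

With the coefficients at the front, the aggregator can index the broadcast array directly inside its loop, which is one way the transient lazy members could become unnecessary.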

SparkQA · 2020-05-03T09:41:25Z

Test build #122226 has finished for PR 28404 at commit b9c663f.

This patch passes all tests.
This patch merges cleanly.
This patch adds no public classes.

srowen · 2020-05-05T13:35:29Z

Merged to master

zhengruifeng · 2020-05-06T02:00:11Z

Thanks @srowen for reviewing!

init

7fe08b8

zhengruifeng added the ML label Apr 29, 2020

nit

f5d3bb1

srowen reviewed Apr 29, 2020

srowen reviewed May 1, 2020

change order of parts in bcCoef

b9c663f

srowen closed this in 701deac May 5, 2020

zhengruifeng deleted the mv_aft_optim branch May 6, 2020 02:00
