Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

[SPARK-31603][ML]AFT uses common functions in RDDLossFunction #28404

Closed
wants to merge 3 commits into from

Conversation

zhengruifeng
Copy link
Contributor

What changes were proposed in this pull request?

1, make AFT reuse common functions in ml.optim, rather than making its own impl.

Why are the changes needed?

The logic in optimizing AFT is quite similar to other algorithms like other algs based on RDDLossFunction,
We should reuse the common functions.

Does this PR introduce any user-facing change?

No

How was this patch tested?

existing testsuites

@SparkQA
Copy link

SparkQA commented Apr 29, 2020

Test build #122056 has finished for PR 28404 at commit 7fe08b8.

  • This patch passes all tests.
  • This patch merges cleanly.
  • This patch adds no public classes.

@SparkQA
Copy link

SparkQA commented Apr 29, 2020

Test build #122057 has finished for PR 28404 at commit f5d3bb1.

  • This patch passes all tests.
  • This patch merges cleanly.
  • This patch adds no public classes.

Copy link
Member

@srowen srowen left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Seems OK pending tests, if there is no change to behavior.

@transient private lazy val parameters = bcCoefficients.value.toArray
// the regression coefficients to the covariates
@transient private lazy val coefficients = parameters.slice(2, dim)
@transient private lazy val intercept = parameters(1)
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Do these all need to be lazy, or even members? I know that's how it was already. But for example coefficients is just used once, as is intercept. It seems easy enough to just refer to bcCoefficients.value(1) or whatever and get rid of most or all of these, unless I'm missing how it really needs to be computed once.

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

This aggregator was mainly copied from the original one.
For intercept and sigma, I am neutral on keeping them members or just refter to bcCoefficients.value(1). I don't feel strong about it.
I guess this AFT was following others impl's like LiR/LoR, in which a lot of transient and lazy members were used.

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

The lazy val overhead is non-trivial but probably won't matter here. Hm, OK if it's how other code works, that's OK, but I wouldn't mind cleaning this up one day. It can be simpler.

@zhengruifeng
Copy link
Contributor Author

I changed the order of the three parts in bcCoefficients, now that no transient lazy variables is needed

@SparkQA
Copy link

SparkQA commented May 3, 2020

Test build #122226 has finished for PR 28404 at commit b9c663f.

  • This patch passes all tests.
  • This patch merges cleanly.
  • This patch adds no public classes.

@srowen
Copy link
Member

srowen commented May 5, 2020

Merged to master

@srowen srowen closed this in 701deac May 5, 2020
@zhengruifeng
Copy link
Contributor Author

Thanks @srowen for reviewing!

@zhengruifeng zhengruifeng deleted the mv_aft_optim branch May 6, 2020 02:00
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
Projects
None yet
3 participants