New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
[SPARK-31603][ML]AFT uses common functions in RDDLossFunction #28404
Conversation
Test build #122056 has finished for PR 28404 at commit
|
Test build #122057 has finished for PR 28404 at commit
|
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Seems OK pending tests, if there is no change to behavior.
mllib/src/main/scala/org/apache/spark/ml/optim/aggregator/AFTAggregator.scala
Show resolved
Hide resolved
@transient private lazy val parameters = bcCoefficients.value.toArray | ||
// the regression coefficients to the covariates | ||
@transient private lazy val coefficients = parameters.slice(2, dim) | ||
@transient private lazy val intercept = parameters(1) |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Do these all need to be lazy, or even members? I know that's how it was already. But for example coefficients is just used once, as is intercept. It seems easy enough to just refer to bcCoefficients.value(1) or whatever and get rid of most or all of these, unless I'm missing how it really needs to be computed once.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
This aggregator was mainly copied from the original one.
For intercept
and sigma
, I am neutral on keeping them members or just refter to bcCoefficients.value(1)
. I don't feel strong about it.
I guess this AFT was following others impl's like LiR/LoR, in which a lot of transient and lazy members were used.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
The lazy val overhead is non-trivial but probably won't matter here. Hm, OK if it's how other code works, that's OK, but I wouldn't mind cleaning this up one day. It can be simpler.
I changed the order of the three parts in |
Test build #122226 has finished for PR 28404 at commit
|
Merged to master |
Thanks @srowen for reviewing! |
What changes were proposed in this pull request?
1, make AFT reuse common functions in
ml.optim
, rather than making its own impl.Why are the changes needed?
The logic in optimizing AFT is quite similar to other algorithms like other algs based on
RDDLossFunction
,We should reuse the common functions.
Does this PR introduce any user-facing change?
No
How was this patch tested?
existing testsuites