-
Notifications
You must be signed in to change notification settings - Fork 28.2k
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
[SPARK-25793][ML]call SaveLoadV2_0.load for classNameV2_0 #22790
Conversation
Test build #97756 has finished for PR 22790 at commit
|
@@ -126,7 +126,7 @@ object BisectingKMeansModel extends Loader[BisectingKMeansModel] { | |||
val model = SaveLoadV1_0.load(sc, path) | |||
model | |||
case (SaveLoadV2_0.thisClassName, SaveLoadV2_0.thisFormatVersion) => | |||
val model = SaveLoadV1_0.load(sc, path) | |||
val model = SaveLoadV2_0.load(sc, path) |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
cc @mgaido91
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
ah, nice catch!
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
This is not a regression, but it looks like a correctness or data loss issue at Spark 2.4.0 new feature.
Do you know why https://issues.apache.org/jira/browse/SPARK-25793 is created as a Minor
and Documentation
issue?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
cc @cloud-fan
@@ -126,7 +126,7 @@ object BisectingKMeansModel extends Loader[BisectingKMeansModel] { | |||
val model = SaveLoadV1_0.load(sc, path) | |||
model | |||
case (SaveLoadV2_0.thisClassName, SaveLoadV2_0.thisFormatVersion) => | |||
val model = SaveLoadV1_0.load(sc, path) | |||
val model = SaveLoadV2_0.load(sc, path) |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Is there no test to verify calling correct load method?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I think we can improve the write/load model tests in order to include also a different distance measure from the default one. In this way we should catch this error. Thanks.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Do we have ever use SaveLoadV2_0
to save model for now? Looks BisectingKMeansModel.save
simply calls SaveLoadV1_0.save
to save models.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Test build #97794 has finished for PR 22790 at commit
|
Test build #97824 has finished for PR 22790 at commit
|
Retest this please. |
Test build #97882 has finished for PR 22790 at commit
|
cc @mengxr @WeichenXu123 how serious is it? shall we treat it as a blocker? |
Test build #97904 has finished for PR 22790 at commit
|
retest this please |
This shouldn't block 2.4.0 release. Based on the code, it doesn't introduce regression to existing features (just using V1 format and ignore trainingCost and distanceMeasure). Correctness issue occurs only when someone uses a non-default distanceMeasure and then save/load. Could someone help confirm? If current vote passes, we can list it as an known issue in the release notes and fix it in 2.4.1. If other blockers show up, we fix it before RC5. Btw, this PR needs a regression test in order to merge. |
Test build #97928 has finished for PR 22790 at commit
|
I added a regression test in
But the bug doesn't really affect the above test. With the bug, even though mllib
|
Test build #97942 has finished for PR 22790 at commit
|
Agree with @mengxr about this not being a blocker. Please notice this is a problem only for the |
Is this ready to go? We are going to have another RC, and would be good to include it. |
LGTM. |
@@ -109,7 +109,7 @@ class BisectingKMeansModel private[clustering] ( | |||
|
|||
@Since("2.0.0") | |||
override def save(sc: SparkContext, path: String): Unit = { | |||
BisectingKMeansModel.SaveLoadV1_0.save(sc, this, path) | |||
BisectingKMeansModel.SaveLoadV2_0.save(sc, this, path) | |||
} | |||
|
|||
override protected def formatVersion: String = "1.0" |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Is this formatVersion
needed to change too?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I'd say yes, but actually this is never used. I think we can actually remove this from the trait.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Oh, good catch! need change.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I changed the formatVersion
to 2.0. There are quite a few files that implement trait Saveable
and have formatVersion
. I don't feel comfortable to change other files for this PR. Maybe I will open a separate jira to remove formatVersion
from trait Saveable
?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I agree on that, I already have a patch for removing it (moreover this PR can target 2.4, while removal should be done only on master I think). I am submitting it. Thanks.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I created #22830 for that, thanks.
LGTM |
Test build #98036 has finished for PR 22790 at commit
|
thanks, merging to master/2.4! |
## What changes were proposed in this pull request? The following code in BisectingKMeansModel.load calls the wrong version of load. ``` case (SaveLoadV2_0.thisClassName, SaveLoadV2_0.thisFormatVersion) => val model = SaveLoadV1_0.load(sc, path) ``` Closes #22790 from huaxingao/spark-25793. Authored-by: Huaxin Gao <huaxing@us.ibm.com> Signed-off-by: Wenchen Fan <wenchen@databricks.com> (cherry picked from commit dc9b320) Signed-off-by: Wenchen Fan <wenchen@databricks.com>
## What changes were proposed in this pull request? The following code in BisectingKMeansModel.load calls the wrong version of load. ``` case (SaveLoadV2_0.thisClassName, SaveLoadV2_0.thisFormatVersion) => val model = SaveLoadV1_0.load(sc, path) ``` Closes apache#22790 from huaxingao/spark-25793. Authored-by: Huaxin Gao <huaxing@us.ibm.com> Signed-off-by: Wenchen Fan <wenchen@databricks.com>
What changes were proposed in this pull request?
The following code in BisectingKMeansModel.load calls the wrong version of load.