[SPARK-14298] [ML] [MLlib] LDA should support disable checkpoint#12089
[SPARK-14298] [ML] [MLlib] LDA should support disable checkpoint#12089yanboliang wants to merge 2 commits intoapache:masterfrom
Conversation
|
Test build #54636 has finished for PR 12089 at commit
|
|
Thanks for catching this. I'd prefer we make the change in PeriodicCheckpointer. That way, LDA with EM will still call PeriodicCheckpointer, which can handle persistence even if checkpointing is turned off. |
|
@jkbradley Agree, I will update the PR soon. |
|
Test build #55213 has finished for PR 12089 at commit
|
|
Test build #55214 has finished for PR 12089 at commit
|
|
Jenkins, test this please. |
|
Test build #55310 has finished for PR 12089 at commit
|
|
@jkbradley It's ready for review now. Thanks! |
|
Thanks for updating it! Since this was a bug from before, let's add a unit test. It should be easy to write a unit test now that we keep the last checkpoint by default. However, that test will be hard to backport to earlier Spark versions. Let's do this:
|
|
@jkbradley Sent #12286 for unit test, please help to review. Thanks! |
## What changes were proposed in this pull request? This is follow up for #12089, add unit test for EM LDA which test disable checkpointing when set ```checkpointInterval = -1```. ## How was this patch tested? unit test. cc jkbradley Author: Yanbo Liang <ybliang8@gmail.com> Closes #12286 from yanboliang/spark-14298-followup.
|
Backported to 1.6 and 1.5 as well now |
## What changes were proposed in this pull request? In the doc of [```checkpointInterval```](https://github.com/apache/spark/blob/master/mllib/src/main/scala/org/apache/spark/ml/param/shared/sharedParams.scala#L241), we told users that they can disable checkpoint by setting ```checkpointInterval = -1```. But we did not handle this situation for LDA actually, we should fix this bug. ## How was this patch tested? Existing tests. cc jkbradley Author: Yanbo Liang <ybliang8@gmail.com> Closes #12089 from yanboliang/spark-14298. (cherry picked from commit 56af8e8) Signed-off-by: Joseph K. Bradley <joseph@databricks.com>
## What changes were proposed in this pull request? In the doc of [```checkpointInterval```](https://github.com/apache/spark/blob/master/mllib/src/main/scala/org/apache/spark/ml/param/shared/sharedParams.scala#L241), we told users that they can disable checkpoint by setting ```checkpointInterval = -1```. But we did not handle this situation for LDA actually, we should fix this bug. ## How was this patch tested? Existing tests. cc jkbradley Author: Yanbo Liang <ybliang8@gmail.com> Closes #12089 from yanboliang/spark-14298. (cherry picked from commit 56af8e8) Signed-off-by: Joseph K. Bradley <joseph@databricks.com>
## What changes were proposed in this pull request? In the doc of [```checkpointInterval```](https://github.com/apache/spark/blob/master/mllib/src/main/scala/org/apache/spark/ml/param/shared/sharedParams.scala#L241), we told users that they can disable checkpoint by setting ```checkpointInterval = -1```. But we did not handle this situation for LDA actually, we should fix this bug. ## How was this patch tested? Existing tests. cc jkbradley Author: Yanbo Liang <ybliang8@gmail.com> Closes apache#12089 from yanboliang/spark-14298. (cherry picked from commit 56af8e8) Signed-off-by: Joseph K. Bradley <joseph@databricks.com> (cherry picked from commit 05dbc28)
What changes were proposed in this pull request?
In the doc of
checkpointInterval, we told users that they can disable checkpoint by settingcheckpointInterval = -1. But we did not handle this situation for LDA actually, we should fix this bug.How was this patch tested?
Existing tests.
cc @jkbradley