[SPARK-6684] [mllib] [ml] Add checkpointing to GBTs #7804

Closed
jkbradley wants to merge 3 commits


jkbradley
Member

Add checkpointing to GradientBoostedTrees, GBTClassifier, GBTRegressor

CC: @mengxr

@@ -144,6 +144,7 @@ final class EMLDAOptimizer extends LDAOptimizer {
    this.checkpointInterval = lda.getCheckpointInterval
    this.graphCheckpointer = new PeriodicGraphCheckpointer[TopicCounts, TokenCount](
      checkpointInterval, graph.vertices.sparkContext)
    this.graphCheckpointer.update(this.graph)
Member Author

This should have been done in the previous PR.

@@ -269,6 +269,8 @@ object GradientBoostedTrees extends Logging {
    logInfo("Internal timing for DecisionTree:")
    logInfo(s"$timer")

    predErrorCheckpointer.deleteAllCheckpoints()
Contributor

Not in this PR, but what if we want to keep the last RDD checkpointed in the queue?

Member Author

It should not be a problem for recoverability (unless the driver crashes), but that sounds useful for model stats. To do!
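
For readers following this thread, here is a minimal, self-contained sketch of the trade-off being discussed. It does not use Spark and is not the code in this PR: `FakeDataset`, `SketchCheckpointer`, and `deleteAllCheckpointsButLast` are hypothetical names invented for illustration, and the real checkpointer's rules for when old checkpoints may be removed are simplified away. It only shows the difference between clearing the whole checkpoint queue after training and leaving the most recent checkpoint in place.

```scala
import scala.collection.mutable

// Hypothetical stand-in for an RDD: an id plus a flag for "checkpoint files exist".
final class FakeDataset(val id: Int) {
  var checkpointed: Boolean = false
}

// Simplified periodic checkpointer (an assumption-laden sketch, not Spark's class):
// every `interval`-th update checkpoints the new dataset and drops older checkpoints.
final class SketchCheckpointer(interval: Int) {
  require(interval > 0, "interval must be positive")

  private val checkpointQueue = mutable.Queue.empty[FakeDataset]
  private var updateCount = 0

  def update(data: FakeDataset): Unit = {
    updateCount += 1
    if (updateCount % interval == 0) {
      data.checkpointed = true
      checkpointQueue.enqueue(data)
      // While training continues, only the newest checkpoint is needed for recovery.
      while (checkpointQueue.size > 1) removeOldest()
    }
  }

  // Behaviour called at the end of training in the hunk above: remove everything.
  def deleteAllCheckpoints(): Unit =
    while (checkpointQueue.nonEmpty) removeOldest()

  // Hypothetical variant for the question raised above: keep the last checkpoint.
  def deleteAllCheckpointsButLast(): Unit =
    while (checkpointQueue.size > 1) removeOldest()

  // Ids of the datasets whose checkpoints are still held.
  def liveCheckpoints: List[Int] = checkpointQueue.map(_.id).toList

  private def removeOldest(): Unit = {
    val old = checkpointQueue.dequeue()
    old.checkpointed = false
  }
}

object CheckpointSketch {
  def main(args: Array[String]): Unit = {
    val cp = new SketchCheckpointer(interval = 2)
    (1 to 5).foreach(i => cp.update(new FakeDataset(i)))
    println(s"after training: ${cp.liveCheckpoints}")
    cp.deleteAllCheckpointsButLast()
    println(s"after deleteAllCheckpointsButLast: ${cp.liveCheckpoints}")
    cp.deleteAllCheckpoints()
    println(s"after deleteAllCheckpoints: ${cp.liveCheckpoints}")
  }
}
```

In this sketch, keeping the last checkpoint leaves one dataset recoverable from storage after training, which is the property that would matter for computing model stats later; deleting everything leaves none.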

@mengxr
Contributor

mengxr commented Jul 30, 2015

LGTM except one minor comment.

@SparkQA

SparkQA commented Jul 30, 2015

Test build #39097 has finished for PR 7804 at commit 9cc3a04.

  • This patch passes all tests.
  • This patch merges cleanly.
  • This patch adds no public classes.

@SparkQA

SparkQA commented Jul 30, 2015

Test build #39101 has finished for PR 7804 at commit b3e160c.

  • This patch passes all tests.
  • This patch merges cleanly.
  • This patch adds the following public classes (experimental):
    • case class QRDecomposition[UType, VType](Q: UType, R: VType)
    • case class UnixTimestamp(timeExp: Expression, format: Expression)
    • case class FromUnixTime(sec: Expression, format: Expression)
    • abstract class ArrayData extends SpecializedGetters with Serializable
    • class GenericArrayData(array: Array[Any]) extends ArrayData

@SparkQA

SparkQA commented Jul 30, 2015

Test build #39111 has finished for PR 7804 at commit 3fbd7ba.

  • This patch passes all tests.
  • This patch merges cleanly.
  • This patch adds no public classes.

@jkbradley
Member Author

Merging with master. Thanks for the review!

@asfgit asfgit closed this in be7be6d Jul 30, 2015