[SPARK-18019][ML] Add instrumentation to GBTs #15574

Closed
wants to merge 2 commits into from
Changes from 1 commit
@@ -137,9 +137,17 @@ class GBTClassifier @Since("1.4.0") (
 }
 val numFeatures = oldDataset.first().features.size
 val boostingStrategy = super.getOldBoostingStrategy(categoricalFeatures, OldAlgo.Classification)
+
+val instr = Instrumentation.create(this, oldDataset)
+instr.logParams(params: _*)
+instr.logNumFeatures(numFeatures)
+instr.logNumClasses(2)
+
 val (baseLearners, learnerWeights) = GradientBoostedTrees.run(oldDataset, boostingStrategy,
 $(seed))
-new GBTClassificationModel(uid, baseLearners, learnerWeights, numFeatures)
+val m = new GBTClassificationModel(uid, baseLearners, learnerWeights, numFeatures)
+instr.logSuccess(m)
+m
 }
 
 @Since("1.4.1")
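The classifier hunk replaces the bare `new GBTClassificationModel(...)` return value with a bind, log, return sequence, so that success is recorded before the model leaves `train`. A minimal Scala sketch of that pattern follows. `SketchInstrumentation`, `SketchModel`, and `TrainSketch` are hypothetical stand-ins: Spark's `Instrumentation` helper is internal, and its real API and log format may differ from what is shown here.

```scala
import scala.collection.mutable.ArrayBuffer

// Hypothetical stand-in for Spark's internal Instrumentation helper.
// It collects log lines in memory instead of writing to a real logger.
final class SketchInstrumentation(prefix: String) {
  val lines: ArrayBuffer[String] = ArrayBuffer.empty

  // Imitates the `prefix: {"key":value}` shape of the log line quoted
  // in the review thread; this is an assumption, not Spark's format.
  private def log(key: String, value: String): Unit =
    lines += s"""$prefix: {"$key":$value}"""

  def logNumFeatures(n: Int): Unit = log("numFeatures", n.toString)
  def logNumClasses(n: Int): Unit = log("numClasses", n.toString)

  // `model` is accepted only to mirror the call shape in the diff.
  def logSuccess(model: AnyRef): Unit = lines += s"$prefix: training finished"
}

// Hypothetical model type standing in for GBTClassificationModel.
final case class SketchModel(numFeatures: Int)

object TrainSketch {
  // Same shape as the diff: build the model, log success, then return it,
  // instead of returning the `new ...Model(...)` expression directly.
  def train(numFeatures: Int, instr: SketchInstrumentation): SketchModel = {
    instr.logNumFeatures(numFeatures)
    instr.logNumClasses(2) // GBTClassifier here is binary-only
    val m = SketchModel(numFeatures)
    instr.logSuccess(m)
    m
  }
}
```

Binding the model to `m` before returning it is what lets `logSuccess` run last without changing the method's return type.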
@@ -123,9 +123,17 @@ class GBTRegressor @Since("1.4.0") (@Since("1.4.0") override val uid: String)
 val oldDataset: RDD[LabeledPoint] = extractLabeledPoints(dataset)
 val numFeatures = oldDataset.first().features.size
 val boostingStrategy = super.getOldBoostingStrategy(categoricalFeatures, OldAlgo.Regression)
+
+val instr = Instrumentation.create(this, oldDataset)
+instr.logParams(params: _*)
+instr.logNumFeatures(numFeatures)
+instr.logNumClasses(0)

@viirya (Member), Oct 25, 2016

nit: Do we need to log the number of classes for regression?

@jkbradley (Member), Oct 25, 2016

I agree that it's odd to log that.

Contributor Author

Yeah, I agree. I was following the instrumentation for decision trees and random forests, which log numClasses = 0 when used for regression:

16/10/24 20:21:58 INFO Instrumentation: RandomForestRegressor-rfr_162dc2c01631-1744025389-3: {"numClasses":0}

I will go ahead and remove it from both dt/rf and gbt unless you feel strongly otherwise.

Contributor Author

Actually, I'm going to leave RF the way it is, since it has been logging numClasses as zero for regression ever since it was created.

Member

Fair enough. I don't think we've promised anyone that these schemas are stable APIs (and we have no way to test that right now), but that would be good to do in the future.

+
 val (baseLearners, learnerWeights) = GradientBoostedTrees.run(oldDataset, boostingStrategy,
 $(seed))
-new GBTRegressionModel(uid, baseLearners, learnerWeights, numFeatures)
+val m = new GBTRegressionModel(uid, baseLearners, learnerWeights, numFeatures)
+instr.logSuccess(m)
+m
 }
 
 @Since("1.4.0")
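The review thread weighs two options for regression: keep logging `numClasses` as 0, as `RandomForestRegressor` already does, or omit the field. The sketch below contrasts them under stated assumptions: the `Task` trait and the `NumClassesSketch` object are hypothetical, and the JSON fragments only imitate the `{"numClasses":0}` shape from the quoted log line.

```scala
// Hypothetical task marker; the diff distinguishes the two cases with
// OldAlgo.Classification and OldAlgo.Regression.
sealed trait Task
case object Classification extends Task
case object Regression extends Task

object NumClassesSketch {
  // GBTClassifier in this diff is binary-only, hence 2 classes.
  private val binaryClasses = 2

  // Option A, as in the diff: always emit the field, using 0 for
  // regression (matching what RandomForestRegressor already logs).
  def alwaysLog(task: Task): Seq[String] = {
    val n = task match {
      case Classification => binaryClasses
      case Regression     => 0
    }
    Seq(s"""{"numClasses":$n}""")
  }

  // Option B, as suggested in review: skip the field for regression.
  def classificationOnly(task: Task): Seq[String] = task match {
    case Classification => Seq(s"""{"numClasses":$binaryClasses}""")
    case Regression     => Seq.empty
  }
}
```

Option A keeps the same keys across every tree estimator's logs, which matters only if something parses them. Since, as noted in the thread, these schemas are not promised to be stable, Option B mainly avoids logging a value that carries no information for regression.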