New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
[SPARK-18019][ML] Add instrumentation to GBTs #15574
Conversation
Test build #67302 has finished for PR 15574 at commit
|
cc @jkbradley |
I'll take a look now |
val instr = Instrumentation.create(this, oldDataset) | ||
instr.logParams(params: _*) | ||
instr.logNumFeatures(numFeatures) | ||
instr.logNumClasses(0) |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
nit: Do we need to log number of classes for regression?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I agree that it's odd to log that.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Yeah, I agree. I was following the instrumentation for decision trees and random forest, which logs numClasses = 0
when used for regression.
16/10/24 20:21:58 INFO Instrumentation: RandomForestRegressor-rfr_162dc2c01631-1744025389-3: {"numClasses":0}
I will go ahead and remove it from both dt/rf and gbt unless you feel strongly otherwise.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Actually, I'm going to leave RF the way it is since it has been logging numClasses as zero for regression since it was created.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Fair enough. I don't think we've promised anyone that these schema are stable APIs (and we have no way to test it right now). But that'd be good to do in the future
Other than that, this looks good to me. |
LGTM except for one minor comment. |
Thanks @viirya and @jkbradley for review. I removed logging numClasses for |
Test build #67522 has finished for PR 15574 at commit
|
LGTM |
## What changes were proposed in this pull request? Add instrumentation for logging in ML GBT, part of umbrella ticket [SPARK-14567](https://issues.apache.org/jira/browse/SPARK-14567) ## How was this patch tested? Tested locally: ```` 16/10/20 10:24:51 INFO Instrumentation: GBTRegressor-gbtr_2b460d3e2e93-1207021668-45: training: numPartitions=1 storageLevel=StorageLevel(1 replicas) 16/10/20 10:24:51 INFO Instrumentation: GBTRegressor-gbtr_2b460d3e2e93-1207021668-45: {"maxIter":1} 16/10/20 10:24:51 INFO Instrumentation: GBTRegressor-gbtr_2b460d3e2e93-1207021668-45: {"numFeatures":2} 16/10/20 10:24:51 INFO Instrumentation: GBTRegressor-gbtr_2b460d3e2e93-1207021668-45: {"numClasses":0} ... 16/10/20 15:54:21 INFO Instrumentation: GBTRegressor-gbtr_065fad465377-1922077832-22: training finished ```` Author: sethah <seth.hendrickson16@gmail.com> Closes apache#15574 from sethah/gbt_instr.
## What changes were proposed in this pull request? Add instrumentation for logging in ML GBT, part of umbrella ticket [SPARK-14567](https://issues.apache.org/jira/browse/SPARK-14567) ## How was this patch tested? Tested locally: ```` 16/10/20 10:24:51 INFO Instrumentation: GBTRegressor-gbtr_2b460d3e2e93-1207021668-45: training: numPartitions=1 storageLevel=StorageLevel(1 replicas) 16/10/20 10:24:51 INFO Instrumentation: GBTRegressor-gbtr_2b460d3e2e93-1207021668-45: {"maxIter":1} 16/10/20 10:24:51 INFO Instrumentation: GBTRegressor-gbtr_2b460d3e2e93-1207021668-45: {"numFeatures":2} 16/10/20 10:24:51 INFO Instrumentation: GBTRegressor-gbtr_2b460d3e2e93-1207021668-45: {"numClasses":0} ... 16/10/20 15:54:21 INFO Instrumentation: GBTRegressor-gbtr_065fad465377-1922077832-22: training finished ```` Author: sethah <seth.hendrickson16@gmail.com> Closes apache#15574 from sethah/gbt_instr.
What changes were proposed in this pull request?
Add instrumentation for logging in ML GBT, part of umbrella ticket SPARK-14567
How was this patch tested?
Tested locally: