Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

[SPARK-18019][ML] Add instrumentation to GBTs #15574

Closed
wants to merge 2 commits into from

Conversation

sethah
Copy link
Contributor

@sethah sethah commented Oct 20, 2016

What changes were proposed in this pull request?

Add instrumentation for logging in ML GBT, part of umbrella ticket SPARK-14567

How was this patch tested?

Tested locally:

16/10/20 10:24:51 INFO Instrumentation: GBTRegressor-gbtr_2b460d3e2e93-1207021668-45: training: numPartitions=1 storageLevel=StorageLevel(1 replicas)
16/10/20 10:24:51 INFO Instrumentation: GBTRegressor-gbtr_2b460d3e2e93-1207021668-45: {"maxIter":1}
16/10/20 10:24:51 INFO Instrumentation: GBTRegressor-gbtr_2b460d3e2e93-1207021668-45: {"numFeatures":2}
16/10/20 10:24:51 INFO Instrumentation: GBTRegressor-gbtr_2b460d3e2e93-1207021668-45: {"numClasses":0}
...
16/10/20 15:54:21 INFO Instrumentation: GBTRegressor-gbtr_065fad465377-1922077832-22: training finished

@SparkQA
Copy link

SparkQA commented Oct 21, 2016

Test build #67302 has finished for PR 15574 at commit 2daba5f.

  • This patch passes all tests.
  • This patch merges cleanly.
  • This patch adds no public classes.

@sethah
Copy link
Contributor Author

sethah commented Oct 24, 2016

cc @jkbradley

@jkbradley
Copy link
Member

I'll take a look now

val instr = Instrumentation.create(this, oldDataset)
instr.logParams(params: _*)
instr.logNumFeatures(numFeatures)
instr.logNumClasses(0)
Copy link
Member

@viirya viirya Oct 25, 2016

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

nit: Do we need to log number of classes for regression?

Copy link
Member

@jkbradley jkbradley Oct 25, 2016

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I agree that it's odd to log that.

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Yeah, I agree. I was following the instrumentation for decision trees and random forest, which logs numClasses = 0 when used for regression.

16/10/24 20:21:58 INFO Instrumentation: RandomForestRegressor-rfr_162dc2c01631-1744025389-3: {"numClasses":0}

I will go ahead and remove it from both dt/rf and gbt unless you feel strongly otherwise.

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Actually, I'm going to leave RF the way it is since it has been logging numClasses as zero for regression since it was created.

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Fair enough. I don't think we've promised anyone that these schema are stable APIs (and we have no way to test it right now). But that'd be good to do in the future

@jkbradley
Copy link
Member

Other than that, this looks good to me.

@viirya
Copy link
Member

viirya commented Oct 25, 2016

LGTM except for one minor comment.

@sethah
Copy link
Contributor Author

sethah commented Oct 25, 2016

Thanks @viirya and @jkbradley for review. I removed logging numClasses for GBTRegressor.

@SparkQA
Copy link

SparkQA commented Oct 25, 2016

Test build #67522 has finished for PR 15574 at commit b2ad1c8.

  • This patch passes all tests.
  • This patch merges cleanly.
  • This patch adds no public classes.

@jkbradley
Copy link
Member

LGTM
Merging with master
Thanks!

@asfgit asfgit closed this in 2c7394a Oct 25, 2016
robert3005 pushed a commit to palantir/spark that referenced this pull request Nov 1, 2016
## What changes were proposed in this pull request?

Add instrumentation for logging in ML GBT, part of umbrella ticket [SPARK-14567](https://issues.apache.org/jira/browse/SPARK-14567)

## How was this patch tested?

Tested locally:

````
16/10/20 10:24:51 INFO Instrumentation: GBTRegressor-gbtr_2b460d3e2e93-1207021668-45: training: numPartitions=1 storageLevel=StorageLevel(1 replicas)
16/10/20 10:24:51 INFO Instrumentation: GBTRegressor-gbtr_2b460d3e2e93-1207021668-45: {"maxIter":1}
16/10/20 10:24:51 INFO Instrumentation: GBTRegressor-gbtr_2b460d3e2e93-1207021668-45: {"numFeatures":2}
16/10/20 10:24:51 INFO Instrumentation: GBTRegressor-gbtr_2b460d3e2e93-1207021668-45: {"numClasses":0}
...
16/10/20 15:54:21 INFO Instrumentation: GBTRegressor-gbtr_065fad465377-1922077832-22: training finished
````

Author: sethah <seth.hendrickson16@gmail.com>

Closes apache#15574 from sethah/gbt_instr.
uzadude pushed a commit to uzadude/spark that referenced this pull request Jan 27, 2017
## What changes were proposed in this pull request?

Add instrumentation for logging in ML GBT, part of umbrella ticket [SPARK-14567](https://issues.apache.org/jira/browse/SPARK-14567)

## How was this patch tested?

Tested locally:

````
16/10/20 10:24:51 INFO Instrumentation: GBTRegressor-gbtr_2b460d3e2e93-1207021668-45: training: numPartitions=1 storageLevel=StorageLevel(1 replicas)
16/10/20 10:24:51 INFO Instrumentation: GBTRegressor-gbtr_2b460d3e2e93-1207021668-45: {"maxIter":1}
16/10/20 10:24:51 INFO Instrumentation: GBTRegressor-gbtr_2b460d3e2e93-1207021668-45: {"numFeatures":2}
16/10/20 10:24:51 INFO Instrumentation: GBTRegressor-gbtr_2b460d3e2e93-1207021668-45: {"numClasses":0}
...
16/10/20 15:54:21 INFO Instrumentation: GBTRegressor-gbtr_065fad465377-1922077832-22: training finished
````

Author: sethah <seth.hendrickson16@gmail.com>

Closes apache#15574 from sethah/gbt_instr.
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
4 participants