-
Notifications
You must be signed in to change notification settings - Fork 29k
[SPARK-24132][ML] Instrumentation improvement for classification #21204
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
Conversation
|
ok to test |
|
Test build #4163 has finished for PR 21204 at commit
|
| * actual numClasses exceeds maxNumClasses | ||
| */ | ||
| protected def getNumClasses(dataset: Dataset[_], maxNumClasses: Int = 100): Int = { | ||
| protected def getNumClasses(dataset: Dataset[_], |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Since we don't have Instrumentation readily available, I recommend using the old logging here. (That will also avoid this API-breaking change which is causing the MiMA failure.)
| val categoricalFeatures: Map[Int, Int] = | ||
| MetadataUtils.getCategoricalFeatures(dataset.schema($(featuresCol))) | ||
| val numClasses: Int = getNumClasses(dataset) | ||
| val numClasses: Int = getNumClasses(dataset, instr = OptionalInstrumentation.create(instr)) |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
In cases like this, you can use instr.logNumClasses() instead of relying on logging within the getNumClasses() method.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Also see what else you can log easily (e.g., numFeatures)
|
Test build #90070 has finished for PR 21204 at commit
|
|
Test build #90076 has finished for PR 21204 at commit
|
|
Retest this please. |
|
Test build #90148 has finished for PR 21204 at commit
|
|
Test build #90223 has finished for PR 21204 at commit
|
|
LGTM. Merged into master. Thanks! |
What changes were proposed in this pull request?
Add OptionalInstrumentation as argument for getNumClasses in ml.classification.Classifier
Change the function call for getNumClasses in train() in ml.classification.DecisionTreeClassifier, ml.classification.RandomForestClassifier, and ml.classification.NaiveBayes
Modify the instrumentation creation in ml.classification.LinearSVC
Change the log call in ml.classification.OneVsRest and ml.classification.LinearSVC
How was this patch tested?
Manual.
Please review http://spark.apache.org/contributing.html before opening a pull request.