@lu-wang-dl
Contributor

What changes were proposed in this pull request?

  • Add OptionalInstrumentation as argument for getNumClasses in ml.classification.Classifier

  • Change the function call for getNumClasses in train() in ml.classification.DecisionTreeClassifier, ml.classification.RandomForestClassifier, and ml.classification.NaiveBayes

  • Modify the instrumentation creation in ml.classification.LinearSVC

  • Change the log call in ml.classification.OneVsRest and ml.classification.LinearSVC

How was this patch tested?

Manual.

Please review http://spark.apache.org/contributing.html before opening a pull request.

@felixcheung
Member

ok to test

@SparkQA

SparkQA commented May 1, 2018

Test build #4163 has finished for PR 21204 at commit 7b75ed6.

  • This patch fails MiMa tests.
  • This patch merges cleanly.
  • This patch adds no public classes.

* actual numClasses exceeds maxNumClasses
*/
- protected def getNumClasses(dataset: Dataset[_], maxNumClasses: Int = 100): Int = {
+ protected def getNumClasses(dataset: Dataset[_],
Member

Since we don't have Instrumentation readily available, I recommend using the old logging here. (That will also avoid this API-breaking change, which is causing the MiMa failure.)
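To illustrate why an extra argument counts as API-breaking, here is a minimal sketch. It assumes hypothetical stand-in classes (`ClassifierV1`, `ClassifierV2`) that take a `Seq[Double]` of labels rather than a `Dataset`, and a plain `String => Unit` logger in place of the instrumentation type. Even with a default value, the added parameter changes the compiled method's arity, so code compiled against the two-argument version can no longer link against it.

```scala
// Hypothetical stand-ins for illustration only; not Spark's Classifier.
object MimaSketch {

  // "Before": compiles to one JVM method taking (labels, maxNumClasses).
  class ClassifierV1 {
    def getNumClasses(labels: Seq[Double], maxNumClasses: Int = 100): Int = {
      require(labels.nonEmpty, "labels must be non-empty")
      val n = labels.max.toInt + 1
      require(n <= maxNumClasses, s"numClasses=$n exceeds maxNumClasses=$maxNumClasses")
      n
    }
  }

  // "After": a defaulted third parameter still replaces the two-argument
  // JVM method with a three-argument one.
  class ClassifierV2 {
    def getNumClasses(
        labels: Seq[Double],
        maxNumClasses: Int = 100,
        log: String => Unit = _ => ()): Int = {
      require(labels.nonEmpty, "labels must be non-empty")
      val n = labels.max.toInt + 1
      require(n <= maxNumClasses, s"numClasses=$n exceeds maxNumClasses=$maxNumClasses")
      log(s"numClasses=$n")
      n
    }
  }

  // Parameter counts of the compiled methods named exactly `getNumClasses`
  // (the synthetic default-value getters have different names).
  def arities(cls: Class[_]): Seq[Int] =
    cls.getDeclaredMethods.toSeq
      .filter(_.getName == "getNumClasses")
      .map(_.getParameterCount)

  def main(args: Array[String]): Unit = {
    println(arities(classOf[ClassifierV1]))
    println(arities(classOf[ClassifierV2]))
  }
}
```

Source-level callers that rely on the defaults would still compile against `ClassifierV2`; it is previously compiled callers that break, which is the kind of change a binary-compatibility check reports.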

val categoricalFeatures: Map[Int, Int] =
MetadataUtils.getCategoricalFeatures(dataset.schema($(featuresCol)))
- val numClasses: Int = getNumClasses(dataset)
+ val numClasses: Int = getNumClasses(dataset, instr = OptionalInstrumentation.create(instr))
Member

In cases like this, you can use instr.logNumClasses() instead of relying on logging within the getNumClasses() method.

Member

Also see what else you can log easily (e.g., numFeatures)
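The pattern suggested in these two comments might look roughly like the sketch below: the helper keeps its original signature, and the caller logs the result through its own instrumentation object. `SketchInstrumentation`, `Example`, and the simplified `train` are hypothetical stand-ins, not Spark's classes; only the method names `logNumClasses` and `numFeatures` logging come from the comments above, and the `Long` parameter types are an assumption.

```scala
// Hypothetical stand-ins for illustration only; not Spark's API.
object InstrSketch {

  // Stand-in for an instrumentation object; it records messages in memory
  // so the logging flow can be inspected.
  class SketchInstrumentation {
    private val buf = scala.collection.mutable.ArrayBuffer.empty[String]
    def logNumClasses(num: Long): Unit = buf += s"numClasses=$num"
    def logNumFeatures(num: Long): Unit = buf += s"numFeatures=$num"
    def messages: Seq[String] = buf.toList
  }

  final case class Example(label: Double, features: Array[Double])

  // The helper keeps its original two-parameter shape: no instrumentation argument.
  def getNumClasses(data: Seq[Example], maxNumClasses: Int = 100): Int = {
    require(data.nonEmpty, "data must be non-empty")
    val n = data.map(_.label).max.toInt + 1
    require(n <= maxNumClasses, s"numClasses=$n exceeds maxNumClasses=$maxNumClasses")
    n
  }

  // The caller already holds an instrumentation object, so it logs the
  // values itself instead of passing the object into the helper.
  def train(data: Seq[Example], instr: SketchInstrumentation): Int = {
    val numClasses = getNumClasses(data)
    val numFeatures = data.head.features.length
    instr.logNumClasses(numClasses)
    instr.logNumFeatures(numFeatures)
    numClasses
  }

  def main(args: Array[String]): Unit = {
    val instr = new SketchInstrumentation
    val data = Seq(Example(0.0, Array(1.0, 2.0)), Example(2.0, Array(3.0, 4.0)))
    train(data, instr)
    instr.messages.foreach(println)
  }
}
```

Keeping the logging in the caller leaves the helper's signature untouched, so this approach avoids the compatibility concern raised in the earlier review comment.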

@SparkQA

SparkQA commented May 2, 2018

Test build #90070 has finished for PR 21204 at commit 893c93c.

  • This patch fails SparkR unit tests.
  • This patch merges cleanly.
  • This patch adds no public classes.

@lu-wang-dl lu-wang-dl changed the title from "[SPARK-24132][ML]Expand instrumentation for classification" to "[SPARK-24132][ML] Instrumentation improvement for classification" May 2, 2018
@SparkQA

SparkQA commented May 2, 2018

Test build #90076 has finished for PR 21204 at commit e41869e.

  • This patch fails SparkR unit tests.
  • This patch merges cleanly.
  • This patch adds no public classes.

@lu-wang-dl
Contributor Author

lu-wang-dl commented May 3, 2018

Retest this please.

@SparkQA

SparkQA commented May 3, 2018

Test build #90148 has finished for PR 21204 at commit e41869e.

  • This patch fails SparkR unit tests.
  • This patch merges cleanly.
  • This patch adds no public classes.

@SparkQA

SparkQA commented May 4, 2018

Test build #90223 has finished for PR 21204 at commit e41869e.

  • This patch passes all tests.
  • This patch merges cleanly.
  • This patch adds no public classes.

@mengxr
Contributor

mengxr commented May 9, 2018

LGTM. Merged into master. Thanks!

@asfgit asfgit closed this in 7e73502 May 9, 2018
@lu-wang-dl lu-wang-dl deleted the SPARK-23686 branch May 16, 2018 20:22
5 participants