Skip to content

[SPARK-17505][MLLIB]Add setBins for BinaryClassificationMetrics#15058

Closed
mpjlu wants to merge 5 commits intoapache:masterfrom
mpjlu:addSetNumBins
Closed

[SPARK-17505][MLLIB]Add setBins for BinaryClassificationMetrics#15058
mpjlu wants to merge 5 commits intoapache:masterfrom
mpjlu:addSetNumBins

Conversation

@mpjlu
Copy link
Copy Markdown

@mpjlu mpjlu commented Sep 12, 2016

What changes were proposed in this pull request?

Add a setBins method for BinaryClassificationMetrics.
BinaryClassificationMetrics is a class in mllib/Evaluation. numBins is a key attribute of it. If numBins greater than 0, then the curves (ROC curve, PR curve) computed internally will be down-sampled to this many "bins".
It is useful to let the user set the numBins.

How was this patch tested?

ut cases were changed to test this PR.

(If this patch involves UI changes, please attach a screenshot; otherwise, remove this)

@mpjlu mpjlu changed the title [MLLIB]Add setBins for BinaryClassificationMetrics [SPARK-17505][MLLIB]Add setBins for BinaryClassificationMetrics Sep 12, 2016
class BinaryClassificationMetrics @Since("1.3.0") (
@Since("1.3.0") val scoreAndLabels: RDD[(Double, Double)],
@Since("1.3.0") val numBins: Int) extends Logging {
@Since("1.3.0") var numBins: Int) extends Logging {
Copy link
Copy Markdown
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

You can't do it this way, because it won't cause the count/confusion RDDs to recompute. You can already set numBins right here right? what's the value in a setter?

Copy link
Copy Markdown
Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

It can cause the count/confusion RDD to recompute.
yes, you can set numBins when construct BinaryClassificationMetrics. But when you want to change the numBins, if there is no setBins function, you have to construct another BinaryClassificationMetrics.

Copy link
Copy Markdown
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Yes, why not create another metrics object? it isn't any harder or more expensive to compute.

Copy link
Copy Markdown
Author

@mpjlu mpjlu Sep 12, 2016

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I add this method is because, in the code, there is "// TODO: Allow the user to vary the number of bins using a setBins method in BinaryClassificationMetrics. " but I find there is no setBins method. So I add it. If not plan to add this method, maybe we can remove this comments.

Copy link
Copy Markdown
Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Yes, we can create another metrics. But why not add a setBins method. I think add a setBins method is reasonable.

Copy link
Copy Markdown
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

It looks like a typo in the comment, and intends to refer to the calling class. See mpjlu@c5c6ade The caller is setting a default to 100, not BinaryClassificationMetrics.

Why not add a method? because of the complexity it introduces -- exactly what your PR has to add. I would not make this change.

Copy link
Copy Markdown
Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Thanks. I agree with you, the method to add setBins is complexity. I just can not find simple method to add setBins.

Copy link
Copy Markdown
Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

hi @srowen , can we use a simple way to add the setBins method. Seems add a setBins method is useful.
Just as the testSuite in this PR shows. If the user wants to test different numBins.
Create one object, just call setBins, he can easily get different results.
How about change a way to implement setBins?

Copy link
Copy Markdown
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I don't see value compared to the complexity this introduces. It makes the object mutable, and doesn't make it any easier to get different binnings than it is now.

Copy link
Copy Markdown
Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Ok, got it, thanks.

@SparkQA
Copy link
Copy Markdown

SparkQA commented Sep 12, 2016

Test build #65255 has finished for PR 15058 at commit d06b2b1.

  • This patch passes all tests.
  • This patch merges cleanly.
  • This patch adds no public classes.

@srowen
Copy link
Copy Markdown
Member

srowen commented Sep 15, 2016

I feel strongly enough about it that I'd request we close this PR

@mpjlu
Copy link
Copy Markdown
Author

mpjlu commented Sep 15, 2016

sure, I close it now

@mpjlu mpjlu closed this Sep 15, 2016
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Labels

None yet

Projects

None yet

Development

Successfully merging this pull request may close these issues.

3 participants