[SPARK-17505][MLLIB]Add setBins for BinaryClassificationMetrics#15058
[SPARK-17505][MLLIB]Add setBins for BinaryClassificationMetrics#15058mpjlu wants to merge 5 commits intoapache:masterfrom
Conversation
| class BinaryClassificationMetrics @Since("1.3.0") ( | ||
| @Since("1.3.0") val scoreAndLabels: RDD[(Double, Double)], | ||
| @Since("1.3.0") val numBins: Int) extends Logging { | ||
| @Since("1.3.0") var numBins: Int) extends Logging { |
There was a problem hiding this comment.
You can't do it this way, because it won't cause the count/confusion RDDs to recompute. You can already set numBins right here right? what's the value in a setter?
There was a problem hiding this comment.
It can cause the count/confusion RDD to recompute.
yes, you can set numBins when construct BinaryClassificationMetrics. But when you want to change the numBins, if there is no setBins function, you have to construct another BinaryClassificationMetrics.
There was a problem hiding this comment.
Yes, why not create another metrics object? it isn't any harder or more expensive to compute.
There was a problem hiding this comment.
I add this method is because, in the code, there is "// TODO: Allow the user to vary the number of bins using a setBins method in BinaryClassificationMetrics. " but I find there is no setBins method. So I add it. If not plan to add this method, maybe we can remove this comments.
There was a problem hiding this comment.
Yes, we can create another metrics. But why not add a setBins method. I think add a setBins method is reasonable.
There was a problem hiding this comment.
It looks like a typo in the comment, and intends to refer to the calling class. See mpjlu@c5c6ade The caller is setting a default to 100, not BinaryClassificationMetrics.
Why not add a method? because of the complexity it introduces -- exactly what your PR has to add. I would not make this change.
There was a problem hiding this comment.
Thanks. I agree with you, the method to add setBins is complexity. I just can not find simple method to add setBins.
There was a problem hiding this comment.
hi @srowen , can we use a simple way to add the setBins method. Seems add a setBins method is useful.
Just as the testSuite in this PR shows. If the user wants to test different numBins.
Create one object, just call setBins, he can easily get different results.
How about change a way to implement setBins?
There was a problem hiding this comment.
I don't see value compared to the complexity this introduces. It makes the object mutable, and doesn't make it any easier to get different binnings than it is now.
|
Test build #65255 has finished for PR 15058 at commit
|
|
I feel strongly enough about it that I'd request we close this PR |
|
sure, I close it now |
What changes were proposed in this pull request?
Add a setBins method for BinaryClassificationMetrics.
BinaryClassificationMetrics is a class in mllib/Evaluation. numBins is a key attribute of it. If numBins greater than 0, then the curves (ROC curve, PR curve) computed internally will be down-sampled to this many "bins".
It is useful to let the user set the numBins.
How was this patch tested?
ut cases were changed to test this PR.
(If this patch involves UI changes, please attach a screenshot; otherwise, remove this)