[SPARK-14154] [MLlib] Simplify the implementation for Kolmogorov–Smirnov test#11954
hhbyyh wants to merge 2 commits into apache:master
Test build #54161 has finished for PR 11954 at commit
```diff
- val ksStat = searchOneSampleStatistic(localData, n) // result: global extreme
+ val ksStat = data.sortBy(x => x).zipWithIndex().map { case (v, i) =>
+   val f = cdf(v)
+   math.max(f - i.toDouble / n, (i + 1).toDouble / n - f)
```
You don't need toDouble if n is already a Double. The first term you compute here seems to have the opposite sign from what was there before. Am I missing something, or is that change unintentional? EDIT: Oh, it's because the original implementation took the abs later. Yes, dl should be less than the cdf, so this makes it positive. Eyeballing it, I think this is indeed an equivalent but much simpler computation.
Thanks Sean. I didn't notice n is already a Double. Will change that.
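A minimal local sketch of the simplified computation discussed above, assuming a plain Scala `Seq[Double]` in place of the RDD and a continuous hypothesized CDF. The names `KsStatisticSketch`, `ksStatistic`, and `ksStatisticAbs` are hypothetical. The abs-based variant is only an illustration of the equivalence Sean describes, not the original Spark implementation. `Seq.sorted` and `Seq.zipWithIndex` stand in for the RDD's `sortBy` and `zipWithIndex` in the diff.

```scala
// Local (non-Spark) sketch of the one-sample KS statistic.
// Assumptions: `cdf` is continuous and `data` is non-empty; all names here
// are hypothetical and chosen only for illustration.
object KsStatisticSketch {

  // Simplified form: sort, pair each value with its 0-based rank i, and compare
  // cdf(v) with the empirical CDF just below v (i / n) and at v ((i + 1) / n).
  def ksStatistic(data: Seq[Double], cdf: Double => Double): Double = {
    val n = data.size.toDouble // n is a Double, so i / n needs no toDouble
    data.sorted.zipWithIndex.map { case (v, i) =>
      val f = cdf(v)
      math.max(f - i / n, (i + 1) / n - f)
    }.max
  }

  // Abs-based variant for comparison: largest absolute gap between cdf(v)
  // and the two empirical CDF levels on either side of each sample point.
  def ksStatisticAbs(data: Seq[Double], cdf: Double => Double): Double = {
    val n = data.size.toDouble
    data.sorted.zipWithIndex.map { case (v, i) =>
      val f = cdf(v)
      math.max(math.abs(i / n - f), math.abs((i + 1) / n - f))
    }.max
  }
}
```

For example, with the uniform CDF on [0, 1] and the sample `Seq(0.9, 0.1, 0.7, 0.4)`, both functions return approximately 0.2: the largest gap is at 0.7, where the CDF is 0.7 and the empirical CDF just below it is 0.5. The two forms agree point by point because `f - i / n` and `(i + 1) / n - f` sum to `1 / n`, so their maximum is always positive and no outer `abs` is needed.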
Test build #54284 has finished for PR 11954 at commit
Test build #54297 has finished for PR 11954 at commit
Merged to master.
What changes were proposed in this pull request?
jira: https://issues.apache.org/jira/browse/SPARK-14154
I was reading the code for KolmogorovSmirnovTest and found that it could be simplified considerably by computing the statistic directly from its original definition.
Sending this PR for discussion.
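For reference, here is the definition the simplified code follows, written as a sketch under the assumption of a continuous hypothesized CDF $F$. The sample is sorted as $x_{(0)} \le \dots \le x_{(n-1)}$, and $i$ is the 0-based rank produced by `zipWithIndex`. The empirical CDF $F_n$ equals $i/n$ just below $x_{(i)}$ and $(i+1)/n$ at $x_{(i)}$, so the supremum only needs to be checked at the sample points:

```latex
D_n = \sup_x \left| F_n(x) - F(x) \right|
    = \max_{0 \le i < n} \max\left( F(x_{(i)}) - \frac{i}{n},\; \frac{i+1}{n} - F(x_{(i)}) \right)

\left( F(x_{(i)}) - \frac{i}{n} \right) + \left( \frac{i+1}{n} - F(x_{(i)}) \right) = \frac{1}{n} > 0
```

The second line shows why no absolute value is needed. The two candidate terms sum to $1/n$, so at least one of them is at least $1/(2n)$, and the inner maximum is always positive.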
How was this patch tested?
Unit tests.