Skip to content

[SPARK-14154] [MLlib] Simplify the implementation for Kolmogorov–Smirnov test#11954

Closed
hhbyyh wants to merge 2 commits intoapache:masterfrom
hhbyyh:ksoptimize
Closed

[SPARK-14154] [MLlib] Simplify the implementation for Kolmogorov–Smirnov test#11954
hhbyyh wants to merge 2 commits intoapache:masterfrom
hhbyyh:ksoptimize

Conversation

@hhbyyh
Copy link
Contributor

@hhbyyh hhbyyh commented Mar 25, 2016

What changes were proposed in this pull request?

jira: https://issues.apache.org/jira/browse/SPARK-14154

I just read the code for KolmogorovSmirnovTest and find it could be much simplified following the original definition.

Send a PR for discussion

How was this patch tested?

unit test

@SparkQA
Copy link

SparkQA commented Mar 25, 2016

Test build #54161 has finished for PR 11954 at commit c7dcb97.

  • This patch passes all tests.
  • This patch merges cleanly.
  • This patch adds no public classes.

val ksStat = searchOneSampleStatistic(localData, n) // result: global extreme
val ksStat = data.sortBy(x => x).zipWithIndex().map { case (v, i) =>
val f = cdf(v)
math.max(f - i.toDouble / n, (i + 1).toDouble / n - f)
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

You don't need toDouble if n is already a Double. It looks like the first element you compute here has an opposite sign to what was there before. Am I missing something or is that change unintentional? EDIT: oh, it's because the original impl took the abs later. Yes dl should be less than the cdf so this makes it positive. Eyeballing it, I think this is indeed an equivalent but much simpler computation.

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Thanks Sean. I didn't notice n is already a Double. Will change that.

@SparkQA
Copy link

SparkQA commented Mar 27, 2016

Test build #54284 has finished for PR 11954 at commit fa94568.

  • This patch passes all tests.
  • This patch merges cleanly.
  • This patch adds no public classes.

@SparkQA
Copy link

SparkQA commented Mar 28, 2016

Test build #54297 has finished for PR 11954 at commit fa94568.

  • This patch passes all tests.
  • This patch merges cleanly.
  • This patch adds no public classes.

@srowen
Copy link
Member

srowen commented Mar 29, 2016

Merged to master

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Labels

None yet

Projects

None yet

Development

Successfully merging this pull request may close these issues.

3 participants