[SPARK-31138][ML][FOLLOWUP] ANOVA optimization #27979

Closed

@zhengruifeng zhengruifeng commented Mar 22, 2020

What changes were proposed in this pull request?

1. Remove the unused variable numFeatures.
2. Remove the separate computation of numSamples and numClasses, since both can be inferred directly from counts: OpenHashMap[Double, Long].
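
A minimal sketch of the second point follows. It assumes a plain Scala `Map[Double, Long]` as a stand-in for Spark's `OpenHashMap[Double, Long]` so that it runs without Spark on the classpath; the helper name `samplesAndClasses` is hypothetical and not part of the patch.

```scala
// Hypothetical helper: derive numSamples and numClasses from per-label counts.
// A plain Map stands in for counts: OpenHashMap[Double, Long] (assumption).
def samplesAndClasses(counts: Map[Double, Long]): (Long, Int) = {
  // The total number of samples is the sum of the per-label counts.
  val numSamples = counts.iterator.map(_._2).sum
  // The number of classes is the number of distinct labels observed.
  val numClasses = counts.size
  (numSamples, numClasses)
}

// Example: three labels with 3, 2 and 5 samples respectively.
val exampleCounts: Map[Double, Long] = Map(0.0 -> 3L, 1.0 -> 2L, 2.0 -> 5L)
val (exampleSamples, exampleClasses) = samplesAndClasses(exampleCounts)
```

Because the per-label counts are already collected as part of the aggregation, no extra pass over the data is needed to obtain these two values.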

Why are the changes needed?

Remove an unnecessary job that computes numSamples and numClasses.

Does this PR introduce any user-facing change?

No

How was this patch tested?

Existing test suites.

nit
val ssTot = sumOfSq - sqSum / numSamples
).map { case (col, (sum, sumOfSq, sums, counts)) =>
val numSamples = counts.iterator.map(_._2).sum
val numClasses = counts.size

numSamples and numClasses are obtained directly from counts here.
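
To show where these two values are used, here is a hedged sketch of a one-way ANOVA F-value computed from aggregates shaped like `(sum, sumOfSq, sums, counts)`. Only the `ssTot` line and the two derived values appear in the diff above; the remaining steps are the textbook one-way ANOVA decomposition and are not taken from the patch. The function name `anovaFValue`, the use of plain `Map`s, and the reading of `sqSum` as the square of the total sum are all assumptions.

```scala
// Hypothetical helper: one-way ANOVA F-value for a single feature.
def anovaFValue(
    sum: Double,                // sum of all feature values
    sumOfSq: Double,            // sum of squared feature values
    sums: Map[Double, Double],  // per-label sum of feature values
    counts: Map[Double, Long]   // per-label sample count
): Double = {
  // Derived from counts, as in the diff above.
  val numSamples = counts.iterator.map(_._2).sum
  val numClasses = counts.size

  val sqSum = sum * sum // assumption: square of the total sum
  // Total sum of squares (the line shown in the diff).
  val ssTot = sumOfSq - sqSum / numSamples
  // Between-group sum of squares (textbook formula, not from the patch).
  val ssBetween =
    sums.iterator.map { case (label, s) => s * s / counts(label) }.sum - sqSum / numSamples
  // Within-group sum of squares is the remainder.
  val ssWithin = ssTot - ssBetween

  val dfBetween = numClasses - 1
  val dfWithin = numSamples - numClasses
  (ssBetween / dfBetween) / (ssWithin / dfWithin)
}

// Example: label 0.0 has values 1, 2, 3 and label 1.0 has values 4, 5, 6.
val exampleF = anovaFValue(
  sum = 21.0,
  sumOfSq = 91.0,
  sums = Map(0.0 -> 6.0, 1.0 -> 15.0),
  counts = Map(0.0 -> 3L, 1.0 -> 3L)
)
```

For the example data, the between-group sum of squares is 13.5 and the within-group sum of squares is 4.0 over 4 degrees of freedom, which gives an F-value of 13.5.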

SparkQA commented Mar 22, 2020

Test build #120165 has finished for PR 27979 at commit 7048f8d.

  • This patch passes all tests.
  • This patch merges cleanly.
  • This patch adds no public classes.

@srowen srowen left a comment

Looks reasonable to me

@huaxingao

LGTM. Thanks @zhengruifeng

@zhengruifeng

Merged to master, thanks @srowen @huaxingao

@zhengruifeng zhengruifeng deleted the anova_followup branch March 23, 2020 03:18
sjincho pushed a commit to sjincho/spark that referenced this pull request Apr 15, 2020
Closes apache#27979 from zhengruifeng/anova_followup.

Authored-by: zhengruifeng <ruifengz@foxmail.com>
Signed-off-by: zhengruifeng <ruifengz@foxmail.com>