Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

[SPARK-18111] [SQL] Wrong ApproximatePercentile answer when multiple records have the minimum value #15641

Closed
wants to merge 4 commits into from

Conversation

wzhfy
Copy link
Contributor

@wzhfy wzhfy commented Oct 26, 2016

What changes were proposed in this pull request?

When multiple records have the minimum value, the answer of ApproximatePercentile is wrong.

How was this patch tested?

add a test case

@wzhfy wzhfy changed the title [SPARK-18111] [SQL] add the min element when compressing samples in QuantileSummaries [SPARK-18111] [SQL] Wrong ApproximatePercentile answer when multiple records have the minimum value Oct 26, 2016
@wzhfy
Copy link
Contributor Author

wzhfy commented Oct 26, 2016

/cc @thunterdb @cloud-fan

@SparkQA
Copy link

SparkQA commented Oct 26, 2016

Test build #67566 has finished for PR 15641 at commit 3ad5920.

  • This patch fails Spark unit tests.
  • This patch merges cleanly.
  • This patch adds no public classes.

@SparkQA
Copy link

SparkQA commented Oct 26, 2016

Test build #67574 has finished for PR 15641 at commit de1fdd5.

  • This patch passes all tests.
  • This patch merges cleanly.
  • This patch adds no public classes.

Copy link
Member

@srowen srowen left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I'd like to get @thunterdb 's OK but that looks good to me.

@SparkQA
Copy link

SparkQA commented Oct 27, 2016

Test build #67634 has finished for PR 15641 at commit 0a5489c.

  • This patch fails Spark unit tests.
  • This patch merges cleanly.
  • This patch adds no public classes.

@wzhfy
Copy link
Contributor Author

wzhfy commented Oct 27, 2016

retest this please

@SparkQA
Copy link

SparkQA commented Oct 27, 2016

Test build #67639 has finished for PR 15641 at commit 0a5489c.

  • This patch fails Spark unit tests.
  • This patch merges cleanly.
  • This patch adds no public classes.

@wzhfy
Copy link
Contributor Author

wzhfy commented Oct 27, 2016

Seems many test builds failed in org.apache.spark.sql.streaming.StreamingQuerySuite.StreamExecution metadata garbage collection. Do you know why? @srowen

@wzhfy
Copy link
Contributor Author

wzhfy commented Oct 27, 2016

retest this please

@SparkQA
Copy link

SparkQA commented Oct 27, 2016

Test build #67652 has finished for PR 15641 at commit 0a5489c.

  • This patch passes all tests.
  • This patch merges cleanly.
  • This patch adds no public classes.

@srowen
Copy link
Member

srowen commented Oct 31, 2016

@thunterdb It looks reasonable to my understanding, having looked at this code with you before, so if you don't object I'll merge this tomorrow.

@srowen
Copy link
Member

srowen commented Nov 1, 2016

Merged to master

@asfgit asfgit closed this in cb80edc Nov 1, 2016
@srowen
Copy link
Member

srowen commented Nov 1, 2016

It has a merge conflict in 2.0, but if you'd like to get it into 2.0 and could open a similar PR for testing, I can merge that.

@wzhfy
Copy link
Contributor Author

wzhfy commented Nov 1, 2016

@srowen Seems branch-2.0 doesn't have related files, ApproximatePercentile.scala is not merged in 2.0?

@srowen
Copy link
Member

srowen commented Nov 1, 2016

Yeah it was in sql/core/src/main/scala/org/apache/spark/sql/execution/stat/StatFunctions.scala before in 2.0. I'm not sure if it has the same problem or not.

@wzhfy
Copy link
Contributor Author

wzhfy commented Nov 1, 2016

Got it, thanks! I think it also has the problem, let me open a PR for it.

uzadude pushed a commit to uzadude/spark that referenced this pull request Jan 27, 2017
…ecords have the minimum value

## What changes were proposed in this pull request?

When multiple records have the minimum value, the answer of ApproximatePercentile is wrong.
## How was this patch tested?

add a test case

Author: wangzhenhua <wangzhenhua@huawei.com>

Closes apache#15641 from wzhfy/percentile.
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

Successfully merging this pull request may close these issues.

3 participants