
[SPARK-21644][SQL] LocalLimit.maxRows is defined incorrectly #18851

Closed
wants to merge 1 commit

Conversation

@rxin (Contributor) commented Aug 5, 2017

What changes were proposed in this pull request?

The definition of maxRows in the LocalLimit operator was simply wrong. This patch introduces a new maxRowsPerPartition method and uses it in pruning. The patch also adds more documentation on why we need a local limit vs. a global limit.

Note that this has never previously caused a bug because of the way the code is structured, but future uses of maxRows could lead to bugs.
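The distinction the patch draws can be sketched with a few self-contained case classes. These are hypothetical stand-ins for illustration, not the actual Catalyst LogicalPlan operators:

```scala
// Simplified sketch of maxRows vs. maxRowsPerPartition (hypothetical
// classes, not the real Catalyst operators).
sealed trait Plan {
  // Upper bound on the total number of rows the operator may return,
  // across all partitions, if one can be derived.
  def maxRows: Option[Long] = None
  // Upper bound on the number of rows returned per partition.
  def maxRowsPerPartition: Option[Long] = None
}

// A source whose row count is unknown.
case class Relation(name: String) extends Plan

// GlobalLimit caps the total output across all partitions, so it can
// safely report maxRows = limit.
case class GlobalLimit(limit: Long, child: Plan) extends Plan {
  override def maxRows: Option[Long] = Some(limit)
}

// LocalLimit caps each partition independently. With N partitions the
// total output can be up to N * limit, so claiming maxRows = limit
// (the incorrect definition this PR fixes) would be wrong; the per-
// partition bound is the only bound it can assert on its own.
case class LocalLimit(limit: Long, child: Plan) extends Plan {
  override def maxRowsPerPartition: Option[Long] = Some(limit)
  // The total is only bounded by whatever the child can guarantee.
  override def maxRows: Option[Long] = child.maxRows
}
```

For example, `LocalLimit(10, Relation("t"))` can promise at most 10 rows per partition, but nothing about the total, while `GlobalLimit(10, Relation("t"))` promises at most 10 rows overall.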

How was this patch tested?

Should be covered by existing test cases.

@rxin (Contributor, Author) commented Aug 5, 2017

cc @JoshRosen

@SparkQA commented Aug 5, 2017

Test build #80275 has finished for PR 18851 at commit 9911176.

  • This patch fails Spark unit tests.
  • This patch merges cleanly.
  • This patch adds no public classes.

* In most cases when we want to push down limit, it is often better to only push some partition
* local limit. Consider the following:
*
* GlobalLimit(Union(A, B)
A Member left a review comment on this snippet:

Missing ')' at the end: GlobalLimit(Union(A, B) -> GlobalLimit(Union(A, B)).
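The pushdown pattern the quoted comment describes can be sketched as a standalone rewrite rule. The case classes and the rule below are illustrative assumptions, not the actual Catalyst operators or optimizer rule:

```scala
// Hypothetical sketch of pushing a limit below a Union:
// GlobalLimit(Union(A, B)) is rewritten to
// GlobalLimit(Union(LocalLimit(A), LocalLimit(B))).
// Each branch is trimmed per partition, and the GlobalLimit kept on
// top still enforces the exact overall row count.
sealed trait Plan
case class Relation(name: String) extends Plan
case class Union(children: Seq[Plan]) extends Plan
case class LocalLimit(n: Long, child: Plan) extends Plan
case class GlobalLimit(n: Long, child: Plan) extends Plan

def pushLimitThroughUnion(plan: Plan): Plan = plan match {
  // Only a *local* limit is pushed into each branch: a branch may
  // still emit up to n rows per partition, so the enclosing
  // GlobalLimit cannot be dropped.
  case GlobalLimit(n, Union(children)) =>
    GlobalLimit(n, Union(children.map(LocalLimit(n, _))))
  case other => other
}
```

Pushing a full GlobalLimit into each branch would be redundant work at best; dropping the outer limit instead would be incorrect, since the union of the trimmed branches can still exceed n rows.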

@JoshRosen (Contributor) left a comment:

LGTM

@rxin (Contributor, Author) commented Aug 7, 2017

Looks like the strip global limit is used by at least some test cases.

@gatorsmile (Member) commented:
retest this please

@SparkQA commented Sep 28, 2017

Test build #82285 has finished for PR 18851 at commit 9911176.

  • This patch fails Spark unit tests.
  • This patch merges cleanly.
  • This patch adds no public classes.
