[SPARK-19851] Add support for EVERY and ANY (SOME) aggregates #22047

Open: wants to merge 11 commits into base: master

Contributor

dilipbiswal commented Aug 8, 2018

What changes were proposed in this pull request?

This PR is a rebased version of the original work (link) by @ptkool.

Please give credit to @ptkool for this work.

Description from original PR:
This pull request implements the EVERY and ANY aggregates.

How was this patch tested?

Testing was performed using unit tests, integration tests, and manual tests.

Contributor

dilipbiswal commented Aug 8, 2018

@gatorsmile I tried to implement the rewrites suggested in the original PR. It does not seem very straightforward to me. The basic issue is that we are unable to replace the aggregate expression with a scalar expression over aggregates. We only support a limited number of true aggregate expressions under a window.

For example, we are unable to rewrite

select key, value, some(value) over(partition by key order by value) from src group by key, value

to

select key, value, coalesce(max(c1) == true, false) over(partition by key order by value) from src group by key, value

I tried a similar framework to replace aggregate expressions, like ReplaceExpressions. Please let me know what you think.
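As a rough illustration of the rewrite being discussed, the Python sketch below models `coalesce(max(c1) == true, false)` over a plain list of nullable booleans. It is not Spark code: all helper names are hypothetical, and the `every_via_min` variant is an assumption made by analogy that does not appear in this thread. It also ignores the window frame entirely, which is the part that makes the real rewrite difficult.

```python
from typing import Iterable, Optional

# SQL boolean modelled as True, False, or None (standing in for NULL).
Bool3 = Optional[bool]


def sql_max(values: Iterable[Bool3]) -> Bool3:
    """max() as a SQL aggregate: NULLs are skipped, empty input gives NULL."""
    non_null = [v for v in values if v is not None]
    return max(non_null) if non_null else None


def sql_min(values: Iterable[Bool3]) -> Bool3:
    """min() as a SQL aggregate: NULLs are skipped, empty input gives NULL."""
    non_null = [v for v in values if v is not None]
    return min(non_null) if non_null else None


def equals_true(value: Bool3) -> Bool3:
    """`value == true` with SQL semantics: comparing NULL gives NULL."""
    return None if value is None else value is True


def coalesce(*args: Bool3) -> Bool3:
    """Return the first non-NULL argument, or NULL if there is none."""
    for arg in args:
        if arg is not None:
            return arg
    return None


def some_via_max(values: Iterable[Bool3]) -> Bool3:
    # Mirrors the expression in the comment: coalesce(max(c1) == true, false)
    return coalesce(equals_true(sql_max(values)), False)


def every_via_min(values: Iterable[Bool3]) -> Bool3:
    # Assumption by analogy (not from the thread): coalesce(min(c1) == true, false)
    return coalesce(equals_true(sql_min(values)), False)
```

In this model `some_via_max` is true as soon as one non-NULL input is true, because `True` sorts above `False` under `max`. The scalar wrapper around the aggregate (`== true` plus `coalesce`) is exactly the piece that cannot simply be placed under `over(...)`.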

SparkQA commented Aug 8, 2018

Test build #94456 has finished for PR 22047 at commit 9503d9e.

  • This patch fails to generate documentation.
  • This patch merges cleanly.
  • This patch adds no public classes.

SparkQA commented Aug 9, 2018

Test build #94459 has finished for PR 22047 at commit 6288a05.

  • This patch passes all tests.
  • This patch merges cleanly.
  • This patch adds no public classes.

Member

HyukjinKwon commented Aug 10, 2018

> Please give credit to @ptkool for this work.

FWIW, we can now credit multiple people per 51bee7a :-)

usage = "_FUNC_(expr) - Returns the character length of string data or number of bytes of " +
"binary data. The length of string data includes the trailing spaces. The length of binary " +
"data includes binary zeros.",
usage = """

HyukjinKwon Aug 10, 2018

Member

Looks unrelated

dilipbiswal Aug 10, 2018

Contributor

@HyukjinKwon Not sure why, but when I ran build/sbt doc, I got an error here. That's the reason I had to fix it.

SparkQA commented Aug 10, 2018

Test build #94586 has finished for PR 22047 at commit af4d901.

  • This patch fails Python style tests.
  • This patch merges cleanly.
  • This patch adds no public classes.

SparkQA commented Aug 11, 2018

Test build #94588 has finished for PR 22047 at commit 6593cf4.

  • This patch fails SparkR unit tests.
  • This patch merges cleanly.
  • This patch adds no public classes.

Contributor

dilipbiswal commented Aug 11, 2018

retest this please

SparkQA commented Aug 11, 2018

Test build #94602 has finished for PR 22047 at commit 6593cf4.

  • This patch passes all tests.
  • This patch merges cleanly.
  • This patch adds no public classes.