Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

[SPARK-7659] [SQL] Sort by attributes that are not present in the select clause when there is windowexpression analyze error #6169

Closed
wants to merge 3 commits into from

Conversation

scwf
Copy link
Contributor

@scwf scwf commented May 15, 2015

Jira: https://issues.apache.org/jira/browse/SPARK-7659

If there are WindowExpression in select clause, sorting by attributes that are not present in the child analyze failure. such as

select month,
sum(product) over (partition by month)
from windowData order by area

/cc @yhuai

@SparkQA
Copy link

SparkQA commented May 15, 2015

Test build #32768 has finished for PR 6169 at commit 9d95648.

  • This patch fails Spark unit tests.
  • This patch merges cleanly.
  • This patch adds no public classes.

@chenghao-intel
Copy link
Contributor

I've tested the following query in Hive

SELECT sum(key) OVER (partition by value) FROM src ORDER BY value;

It also failed in semantic analysis, I am not sure why you think we should support this? As the order by expression is not part of the aggregate expressions.

Sorry if misunderstood something.

@yhuai
Copy link
Contributor

yhuai commented May 15, 2015

@chenghao-intel We are little bit different on handling order by columns. For Hive, order by columns need to appear in the select clause. We do not have this restriction. In the analyzer, we will find the missing columns and add it back to the project list.

@scwf
Copy link
Contributor Author

scwf commented May 15, 2015

@chenghao-intel in the old version we support sort by attribute which not in child output, that is what ResolveSortReferences does, so we support

select key form src order by value

and

select sum(value) from src group by key order by key

this two sql is valid in many sql dialect.

after windowfunction is in, this rule is invalid when there are windowfunction in select, this pr is try to fix this. but now this resolution is not right, @yhuai any idea for this?

@chenghao-intel
Copy link
Contributor

Thank you for explanation, I meant the area is neither in groupBy expression nor aggregate expression list, why should we support that?

@chenghao-intel
Copy link
Contributor

Ok, sorry, I understand eventually.

@chenghao-intel
Copy link
Contributor

We should keep adding the missed attributes into the Window, right? And that's also should be in a transitive manner down to the lowest Window.

@yhuai
Copy link
Contributor

yhuai commented May 15, 2015

I feel it is hard to fix it. Since we can end up having a chain of Window operators. Probably the best chance is to add those missing columns to the project list before we extract window expressions and create Window operators. However, I think it requires lots of changes. I guess for now, it will be good to provide an nice error message and ask the user to add missing order by expressions back to the select clause (if users really do not want the order by expressions, put window functions in a sub query and do not project these expressions in the outer query block). In order to generate the nice error message, I think we can just detect the pattern of sort->project->window and see if we have missing columns in the sort operator.

@scwf
Copy link
Contributor Author

scwf commented May 15, 2015

Yes, actually this PR is try to add the missing columns before we extract window expressions and add window node. i will have a try on this way, if we can not handle it easily i will give a error message.

@SparkQA
Copy link

SparkQA commented May 15, 2015

Test build #32786 has finished for PR 6169 at commit a8054c3.

  • This patch fails Scala style tests.
  • This patch merges cleanly.
  • This patch adds no public classes.

@SparkQA
Copy link

SparkQA commented May 15, 2015

Test build #32790 has finished for PR 6169 at commit f559226.

  • This patch fails Spark unit tests.
  • This patch merges cleanly.
  • This patch adds no public classes.

@scwf
Copy link
Contributor Author

scwf commented May 15, 2015

retest this please

@SparkQA
Copy link

SparkQA commented May 15, 2015

Test build #32803 has finished for PR 6169 at commit f559226.

  • This patch passes all tests.
  • This patch merges cleanly.
  • This patch adds no public classes.

@scwf
Copy link
Contributor Author

scwf commented May 17, 2015

the basic idea here is to ignore window expressions in ResolveSortReferences

@scwf
Copy link
Contributor Author

scwf commented Jul 1, 2015

close this now, will reopen if find a more reasonable fix

@scwf scwf closed this Jul 1, 2015
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

Successfully merging this pull request may close these issues.

4 participants