Skip to content

Commit

Permalink
[SPARK-28645][SQL] ParseException is thrown when the window is redefined
Browse files Browse the repository at this point in the history
### What changes were proposed in this pull request?
Currently in Spark one could redefine a window. For instance:

`select count(*) OVER w FROM tenk1 WINDOW w AS (ORDER BY unique1), w AS (ORDER BY unique1);`
The window `w` is defined two times. In PgSQL, on the other hand, a thrown will happen:

`ERROR:  window "w" is already defined`

### Why are the changes needed?
The current implement gives the following window definitions a higher priority. But it wasn't Spark's intention and users can't know from any document of Spark.
This PR fixes the bug.

### Does this PR introduce _any_ user-facing change?
Yes.
There is an example query output with/without this fix.
```
SELECT
    employee_name,
    salary,
    first_value(employee_name) OVER w highest_salary,
    nth_value(employee_name, 2) OVER w second_highest_salary
FROM
    basic_pays
WINDOW
    w AS (ORDER BY salary DESC ROWS BETWEEN UNBOUNDED PRECEDING AND 1 FOLLOWING),
    w AS (ORDER BY salary DESC ROWS BETWEEN UNBOUNDED PRECEDING AND 2 FOLLOWING)
ORDER BY salary DESC
```
The output before this fix:
```
Larry Bott	11798	Larry Bott	Gerard Bondur
Gerard Bondur	11472	Larry Bott	Gerard Bondur
Pamela Castillo	11303	Larry Bott	Gerard Bondur
Barry Jones	10586	Larry Bott	Gerard Bondur
George Vanauf	10563	Larry Bott	Gerard Bondur
Loui Bondur	10449	Larry Bott	Gerard Bondur
Mary Patterson	9998	Larry Bott	Gerard Bondur
Steve Patterson	9441	Larry Bott	Gerard Bondur
Julie Firrelli	9181	Larry Bott	Gerard Bondur
Jeff Firrelli	8992	Larry Bott	Gerard Bondur
William Patterson	8870	Larry Bott	Gerard Bondur
Diane Murphy	8435	Larry Bott	Gerard Bondur
Leslie Jennings	8113	Larry Bott	Gerard Bondur
Gerard Hernandez	6949	Larry Bott	Gerard Bondur
Foon Yue Tseng	6660	Larry Bott	Gerard Bondur
Anthony Bow	6627	Larry Bott	Gerard Bondur
Leslie Thompson	5186	Larry Bott	Gerard Bondur
```
The output after this fix:
```
struct<>
-- !query output
org.apache.spark.sql.catalyst.parser.ParseException

The definition of window 'w' is repetitive(line 8, pos 0)
```

### How was this patch tested?
Jenkins test.

Closes #30512 from beliefer/SPARK-28645.

Lead-authored-by: gengjiaan <gengjiaan@360.cn>
Co-authored-by: beliefer <beliefer@163.com>
Signed-off-by: Wenchen Fan <wenchen@databricks.com>
  • Loading branch information
2 people authored and cloud-fan committed Nov 27, 2020
1 parent 2c41d9d commit e432550
Show file tree
Hide file tree
Showing 3 changed files with 57 additions and 5 deletions.
Expand Up @@ -815,10 +815,16 @@ class AstBuilder extends SqlBaseBaseVisitor[AnyRef] with SQLConfHelper with Logg
ctx: WindowClauseContext,
query: LogicalPlan): LogicalPlan = withOrigin(ctx) {
// Collect all window specifications defined in the WINDOW clause.
val baseWindowMap = ctx.namedWindow.asScala.map {
val baseWindowTuples = ctx.namedWindow.asScala.map {
wCtx =>
(wCtx.name.getText, typedVisit[WindowSpec](wCtx.windowSpec))
}.toMap
}
baseWindowTuples.groupBy(_._1).foreach { kv =>
if (kv._2.size > 1) {
throw new ParseException(s"The definition of window '${kv._1}' is repetitive", ctx)
}
}
val baseWindowMap = baseWindowTuples.toMap

// Handle cases like
// window w1 as (partition by p_mfgr order by p_name
Expand Down
14 changes: 13 additions & 1 deletion sql/core/src/test/resources/sql-tests/inputs/window.sql
Expand Up @@ -250,4 +250,16 @@ WINDOW w AS (
ORDER BY salary DESC
RANGE BETWEEN UNBOUNDED PRECEDING AND UNBOUNDED FOLLOWING
)
ORDER BY department;
ORDER BY department;

SELECT
employee_name,
salary,
first_value(employee_name) OVER w highest_salary,
nth_value(employee_name, 2) OVER w second_highest_salary
FROM
basic_pays
WINDOW
w AS (ORDER BY salary DESC ROWS BETWEEN UNBOUNDED PRECEDING AND 1 FOLLOWING),
w AS (ORDER BY salary DESC ROWS BETWEEN UNBOUNDED PRECEDING AND 2 FOLLOWING)
ORDER BY salary DESC;
38 changes: 36 additions & 2 deletions sql/core/src/test/resources/sql-tests/results/window.sql.out
@@ -1,5 +1,5 @@
-- Automatically generated by SQLQueryTestSuite
-- Number of queries: 35
-- Number of queries: 36


-- !query
Expand Down Expand Up @@ -739,4 +739,38 @@ Gerard Hernandez SCM 6949 Larry Bott Pamela Castillo
George Vanauf Sales 10563 George Vanauf Steve Patterson
Steve Patterson Sales 9441 George Vanauf Steve Patterson
Julie Firrelli Sales 9181 George Vanauf Steve Patterson
Foon Yue Tseng Sales 6660 George Vanauf Steve Patterson
Foon Yue Tseng Sales 6660 George Vanauf Steve Patterson


-- !query
SELECT
employee_name,
salary,
first_value(employee_name) OVER w highest_salary,
nth_value(employee_name, 2) OVER w second_highest_salary
FROM
basic_pays
WINDOW
w AS (ORDER BY salary DESC ROWS BETWEEN UNBOUNDED PRECEDING AND 1 FOLLOWING),
w AS (ORDER BY salary DESC ROWS BETWEEN UNBOUNDED PRECEDING AND 2 FOLLOWING)
ORDER BY salary DESC
-- !query schema
struct<>
-- !query output
org.apache.spark.sql.catalyst.parser.ParseException

The definition of window 'w' is repetitive(line 8, pos 0)

== SQL ==
SELECT
employee_name,
salary,
first_value(employee_name) OVER w highest_salary,
nth_value(employee_name, 2) OVER w second_highest_salary
FROM
basic_pays
WINDOW
^^^
w AS (ORDER BY salary DESC ROWS BETWEEN UNBOUNDED PRECEDING AND 1 FOLLOWING),
w AS (ORDER BY salary DESC ROWS BETWEEN UNBOUNDED PRECEDING AND 2 FOLLOWING)
ORDER BY salary DESC

0 comments on commit e432550

Please sign in to comment.