Commit
This commit does not belong to any branch on this repository, and may belong to a fork outside of the repository.
[SPARK-40002][SQL] Don't push down limit through window using ntile
### What changes were proposed in this pull request? Change `LimitPushDownThroughWindow` so that it no longer supports pushing down a limit through a window using ntile. ### Why are the changes needed? In an unpartitioned window, the ntile function is currently applied to the result of the limit. This behavior produces results that conflict with Spark 3.1.3, Hive 2.3.9 and Prestodb 0.268 #### Example Assume this data: ``` create table t1 stored as parquet as select * from range(101); ``` Also assume this query: ``` select id, ntile(10) over (order by id) as nt from t1 limit 10; ``` With Spark 3.2.2, Spark 3.3.0, and master, the limit is applied before the ntile function: ``` +---+---+ |id |nt | +---+---+ |0 |1 | |1 |2 | |2 |3 | |3 |4 | |4 |5 | |5 |6 | |6 |7 | |7 |8 | |8 |9 | |9 |10 | +---+---+ ``` With Spark 3.1.3, and Hive 2.3.9, and Prestodb 0.268, the limit is applied _after_ ntile. Spark 3.1.3: ``` +---+---+ |id |nt | +---+---+ |0 |1 | |1 |1 | |2 |1 | |3 |1 | |4 |1 | |5 |1 | |6 |1 | |7 |1 | |8 |1 | |9 |1 | +---+---+ ``` Hive 2.3.9: ``` +-----+-----+ | id | nt | +-----+-----+ | 0 | 1 | | 1 | 1 | | 2 | 1 | | 3 | 1 | | 4 | 1 | | 5 | 1 | | 6 | 1 | | 7 | 1 | | 8 | 1 | | 9 | 1 | +-----+-----+ 10 rows selected (1.72 seconds) ``` Prestodb 0.268: ``` id | nt ----+---- 0 | 1 1 | 1 2 | 1 3 | 1 4 | 1 5 | 1 6 | 1 7 | 1 8 | 1 9 | 1 (10 rows) ``` ### Does this PR introduce _any_ user-facing change? No. ### How was this patch tested? Two new unit tests. Closes #37443 from bersprockets/pushdown_ntile. Authored-by: Bruce Robbins <bersprockets@gmail.com> Signed-off-by: Hyukjin Kwon <gurwls223@apache.org> (cherry picked from commit c9156e5) Signed-off-by: Hyukjin Kwon <gurwls223@apache.org>
- Loading branch information