Test Apache ORC 1.8.6-SNAPSHOT #4

dongjoon-hyun · 2023-11-05T06:25:43Z

No description provided.

### What changes were proposed in this pull request? In the `Window` node, both `partitionSpec` and `orderSpec` must be orderable, but the current type check only verifies `orderSpec` is orderable. This can cause an error in later optimizing phases. Given a query: ``` with t as (select id, map(id, id) as m from range(0, 10)) select rank() over (partition by m order by id) from t ``` Before the PR, it fails with an `INTERNAL_ERROR`: ``` org.apache.spark.SparkException: [INTERNAL_ERROR] grouping/join/window partition keys cannot be map type. SQLSTATE: XX000 at org.apache.spark.SparkException$.internalError(SparkException.scala:92) at org.apache.spark.SparkException$.internalError(SparkException.scala:96) at org.apache.spark.sql.catalyst.optimizer.NormalizeFloatingNumbers$.needNormalize(NormalizeFloatingNumbers.scala:103) at org.apache.spark.sql.catalyst.optimizer.NormalizeFloatingNumbers$.org$apache$spark$sql$catalyst$optimizer$NormalizeFloatingNumbers$$needNormalize(NormalizeFloatingNumbers.scala:94) ... ``` After the PR, it fails with a `EXPRESSION_TYPE_IS_NOT_ORDERABLE`, which is expected: ``` org.apache.spark.sql.catalyst.ExtendedAnalysisException: [EXPRESSION_TYPE_IS_NOT_ORDERABLE] Column expression "m" cannot be sorted because its type "MAP<BIGINT, BIGINT>" is not orderable. SQLSTATE: 42822; line 2 pos 53; Project [RANK() OVER (PARTITION BY m ORDER BY id ASC NULLS FIRST ROWS BETWEEN UNBOUNDED PRECEDING AND CURRENT ROW)#4] +- Project [id#1L, m#0, RANK() OVER (PARTITION BY m ORDER BY id ASC NULLS FIRST ROWS BETWEEN UNBOUNDED PRECEDING AND CURRENT ROW)#4, RANK() OVER (PARTITION BY m ORDER BY id ASC NULLS FIRST ROWS BETWEEN UNBOUNDED PRECEDING AND CURRENT ROW)#4] +- Window [rank(id#1L) windowspecdefinition(m#0, id#1L ASC NULLS FIRST, specifiedwindowframe(RowFrame, unboundedpreceding$(), currentrow$())) AS RANK() OVER (PARTITION BY m ORDER BY id ASC NULLS FIRST ROWS BETWEEN UNBOUNDED PRECEDING AND CURRENT ROW)#4], [m#0], [id#1L ASC NULLS FIRST] +- Project [id#1L, m#0] +- SubqueryAlias t +- SubqueryAlias t +- Project [id#1L, map(id#1L, id#1L) AS m#0] +- Range (0, 10, step=1, splits=None) at org.apache.spark.sql.catalyst.analysis.package$AnalysisErrorAt.failAnalysis(package.scala:52) ... ``` ### How was this patch tested? Unit test. Closes apache#45730 from chenhao-db/SPARK-47572. Authored-by: Chenhao Li <chenhao.li@databricks.com> Signed-off-by: Wenchen Fan <wenchen@databricks.com>

Test Apache ORC 1.8.6-SNAPSHOT

cbe466c

github-actions bot added the BUILD label Nov 5, 2023

dongjoon-hyun mentioned this pull request Nov 5, 2023

Release Apache ORC 1.8.6 apache/orc#1650

Closed

16 tasks

dongjoon-hyun closed this Nov 10, 2023

dongjoon-hyun mentioned this pull request Apr 10, 2024

Release Apache ORC 1.8.7 apache/orc#1887

Closed

15 tasks

Provide feedback

Saved searches

Use saved searches to filter your results more quickly

Test Apache ORC 1.8.6-SNAPSHOT #4

Test Apache ORC 1.8.6-SNAPSHOT #4

dongjoon-hyun commented Nov 5, 2023

Test Apache ORC 1.8.6-SNAPSHOT #4

Test Apache ORC 1.8.6-SNAPSHOT #4

Conversation

dongjoon-hyun commented Nov 5, 2023