Test Apache ORC 1.9.1-SNAPSHOT #1

dongjoon-hyun · 2023-08-11T03:11:52Z

No description provided.

### What changes were proposed in this pull request?

This PR uses SMALLINT (as TINYINT ranges [0, 255]) instead of BYTE to fix the ByteType mapping for MsSQLServer JDBC.

```java
[info]   com.microsoft.sqlserver.jdbc.SQLServerException: Column, parameter, or variable #1: Cannot find data type BYTE.
[info]   at com.microsoft.sqlserver.jdbc.SQLServerException.makeFromDatabaseError(SQLServerException.java:265)
[info]   at com.microsoft.sqlserver.jdbc.SQLServerStatement.getNextResult(SQLServerStatement.java:1662)
[info]   at com.microsoft.sqlserver.jdbc.SQLServerStatement.doExecuteStatement(SQLServerStatement.java:898)
[info]   at com.microsoft.sqlserver.jdbc.SQLServerStatement$StmtExecCmd.doExecute(SQLServerStatement.java:793)
[info]   at com.microsoft.sqlserver.jdbc.TDSCommand.execute(IOBuffer.java:7417)
[info]   at com.microsoft.sqlserver.jdbc.SQLServerConnection.executeCommand(SQLServerConnection.java:3488)
[info]   at com.microsoft.sqlserver.jdbc.SQLServerStatement.executeCommand(SQLServerStatement.java:262)
[info]   at com.microsoft.sqlserver.jdbc.SQLServerStatement.executeStatement(SQLServerStatement.java:237)
[info]   at com.microsoft.sqlserver.jdbc.SQLServerStatement.executeUpdate(SQLServerStatement.java:733)
[info]   at org.apache.spark.sql.jdbc.JdbcDialect.createTable(JdbcDialects.scala:267)
```

### Why are the changes needed?

bugfix

### Does this PR introduce _any_ user-facing change?

no

### How was this patch tested?

new tests

### Was this patch authored or co-authored using generative AI tooling?

no

Closes apache#46164 from yaooqinn/SPARK-47938.

Lead-authored-by: Kent Yao <yao@apache.org>
Co-authored-by: Dongjoon Hyun <dongjoon@apache.org>
Signed-off-by: Dongjoon Hyun <dhyun@apple.com>
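The range argument in the commit message above (TINYINT covers only [0, 255]) can be sketched as a small check. This is an illustration only, not the code from the Spark dialect: the names `SQLSERVER_RANGES`, `covers`, and `column_type_for_byte` are hypothetical, and the sketch assumes a signed 8-bit ByteType with values from -128 to 127.

```python
# Hypothetical illustration of the range reasoning; not Spark's dialect code.

# Assumed value range of a signed 8-bit ByteType.
BYTE_MIN, BYTE_MAX = -128, 127

# Value ranges of the two SQL Server integer types under consideration.
SQLSERVER_RANGES = {
    "TINYINT": (0, 255),          # unsigned, so negative bytes do not fit
    "SMALLINT": (-32768, 32767),  # signed 16-bit, covers every byte value
}


def covers(sql_type: str) -> bool:
    """Return True if sql_type can hold every value in [BYTE_MIN, BYTE_MAX]."""
    low, high = SQLSERVER_RANGES[sql_type]
    return low <= BYTE_MIN and BYTE_MAX <= high


def column_type_for_byte() -> str:
    """Pick the narrowest candidate type that covers the byte range."""
    for candidate in ("TINYINT", "SMALLINT"):
        if covers(candidate):
            return candidate
    raise ValueError("no candidate type covers the byte range")


if __name__ == "__main__":
    print(column_type_for_byte())
```

Under these assumptions, TINYINT is rejected because it cannot store negative values, which leaves SMALLINT as the narrowest safe choice.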

…rtition data results should return user-facing error

### What changes were proposed in this pull request?

Create an example parquet table with partitions and insert data in Spark:

```
create table t(col1 string, col2 string, col3 string) using parquet location 'some/path/parquet-test' partitioned by (col1, col2);
insert into t (col1, col2, col3) values ('a', 'b', 'c');
```

Go into the `parquet-test` path in the filesystem and try to copy parquet data file from path `col1=a/col2=b` directory into `col1=a`. After that, try to create new table based on parquet data in Spark:

```
create table broken_table using parquet location 'some/path/parquet-test';
```

This query errors with internal error. Stack trace excerpts:

```
org.apache.spark.SparkException: [INTERNAL_ERROR] Eagerly executed command failed. You hit a bug in Spark or the Spark plugins you use. Please, report this bug to the corresponding communities or vendors, and provide the full stack trace. SQLSTATE: XX000
...
Caused by: java.lang.AssertionError: assertion failed: Conflicting partition column names detected:
        Partition column name list #0: col1
        Partition column name list #1: col1, col2

For partitioned table directories, data files should only live in leaf directories.
And directories at the same level should have the same partition column name.
Please check the following directories for unexpected files or inconsistent partition column names:
        file:some/path/parquet-test/col1=a
        file:some/path/parquet-test/col1=a/col2=b
  at scala.Predef$.assert(Predef.scala:279)
  at org.apache.spark.sql.execution.datasources.PartitioningUtils$.resolvePartitions(PartitioningUtils.scala:391)
...
```

Fix this by changing internal error to user-facing error.

### Why are the changes needed?

Replace internal error with user-facing one for valid sequence of Spark SQL operations.

### Does this PR introduce _any_ user-facing change?

Yes, it presents the user with regular error instead of internal error.

### How was this patch tested?

Added checks to `ParquetPartitionDiscoverySuite` which simulate the described scenario by manually breaking parquet table in the filesystem.

### Was this patch authored or co-authored using generative AI tooling?

No.

Closes apache#47668 from nikolamand-db/SPARK-49163.

Authored-by: Nikola Mandic <nikola.mandic@databricks.com>
Signed-off-by: Wenchen Fan <wenchen@databricks.com>
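The failure mode described above, data files sitting at two different partition depths, can be sketched outside Spark. This is a minimal sketch and not the logic in `PartitioningUtils`: the exception class `ConflictingPartitionColumns` and the functions `partition_columns` and `check_partition_layout` are hypothetical names, and the sketch assumes partition directories always have the form `name=value`.

```python
from pathlib import PurePosixPath


class ConflictingPartitionColumns(ValueError):
    """Hypothetical user-facing error, raised instead of a bare assertion."""


def partition_columns(base: str, file_dir: str) -> tuple:
    """Extract partition column names from a data file's directory.

    Assumes every partition directory under base looks like name=value.
    """
    relative = PurePosixPath(file_dir).relative_to(base)
    return tuple(part.split("=", 1)[0] for part in relative.parts if "=" in part)


def check_partition_layout(base: str, data_file_dirs: list) -> tuple:
    """Return the single partition column list, or raise if several exist."""
    distinct = sorted({partition_columns(base, d) for d in data_file_dirs})
    if len(distinct) > 1:
        listing = "; ".join(", ".join(cols) for cols in distinct)
        raise ConflictingPartitionColumns(
            f"Conflicting partition column names detected: {listing}"
        )
    return distinct[0] if distinct else ()


if __name__ == "__main__":
    base = "some/path/parquet-test"
    try:
        # One file in the leaf directory, one copied up into col1=a.
        check_partition_layout(base, [f"{base}/col1=a/col2=b", f"{base}/col1=a"])
    except ConflictingPartitionColumns as err:
        print(err)
```

Raising a dedicated exception type rather than using an assertion is what lets a caller report the problem as a regular error about the table layout, which is the behavior change this commit describes.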

dongjoon-hyun closed this Aug 11, 2023

dongjoon-hyun reopened this Aug 11, 2023

github-actions bot added the BUILD label Aug 11, 2023

dongjoon-hyun force-pushed the SPARK_ORC_1.9.1_SNAPSHOT branch from 6c98ee3 to 449e38e Compare August 11, 2023 03:13

dongjoon-hyun mentioned this pull request Aug 11, 2023

Release Apache ORC 1.9.1 apache/orc#1578

Closed

15 tasks

dongjoon-hyun force-pushed the SPARK_ORC_1.9.1_SNAPSHOT branch 2 times, most recently from c15c9db to 0dac68a Compare August 11, 2023 03:30

dongjoon-hyun changed the title ~~Upgrade ORC to 1.9.0-SNAPSHOT~~ Upgrade ORC to 1.9.1-SNAPSHOT Aug 11, 2023

dongjoon-hyun changed the title ~~Upgrade ORC to 1.9.1-SNAPSHOT~~ Test Apache ORC 1.9.1-SNAPSHOT Aug 11, 2023

Upgrade ORC to 1.9.0-SNAPSHOT

dbac681

dongjoon-hyun force-pushed the SPARK_ORC_1.9.1_SNAPSHOT branch from 0dac68a to dbac681 Compare August 11, 2023 06:11

dongjoon-hyun closed this Aug 17, 2023

dongjoon-hyun mentioned this pull request Nov 5, 2023

Release Apache ORC 1.9.2 apache/orc#1647

Closed

16 tasks
