[SPARK-48177][BUILD] Upgrade Apache Parquet to 1.14.0
#46447
base: master
Yay, finally.
Please run the following and attach the updated dependency file, @Fokko.
dev/test-dependencies.sh --replace-manifest
cc @cloud-fan, @HyukjinKwon, @mridulm, @sunchao, @yaooqinn, @LuciferYang, @steveloughran, @viirya, @huaxin, @parthchandra, too.
Oh, it seems that there are many unit test failures.
[info] *** 189 TESTS FAILED ***
[error] Failed: Total 1526, Failed 189, Errors 0, Passed 1337, Ignored 597
[error] Failed tests:
[error] org.apache.spark.sql.hive.execution.SQLQuerySuite
[error] org.apache.spark.sql.hive.execution.HiveResolutionSuite
[error] org.apache.spark.sql.hive.execution.HiveDDLSuite
[error] org.apache.spark.sql.hive.execution.HiveQuerySuite
[error] org.apache.spark.sql.hive.execution.SQLQuerySuiteAE
[error] org.apache.spark.sql.hive.execution.HiveSQLViewSuite
[error] org.apache.spark.sql.hive.execution.HashUDAQuerySuite
[error] org.apache.spark.sql.hive.execution.PruneHiveTablePartitionsSuite
[error] org.apache.spark.sql.hive.execution.HiveUDAFSuite
[error] org.apache.spark.sql.hive.execution.HiveSerDeReadWriteSuite
[error] org.apache.spark.sql.hive.execution.HiveTableScanSuite
[error] org.apache.spark.sql.hive.execution.HashAggregationQueryWithControlledFallbackSuite
[error] org.apache.spark.sql.hive.execution.HiveCommandSuite
[error] org.apache.spark.sql.hive.execution.HashUDAQueryWithControlledFallbackSuite
[error] org.apache.spark.sql.hive.HiveExternalCatalogVersionsSuite
[error] org.apache.spark.sql.hive.execution.HiveUDFSuite
[error] org.apache.spark.sql.hive.HiveSparkSubmitSuite
[error] org.apache.spark.sql.hive.execution.HashAggregationQuerySuite
[error] (hive / Test / test) sbt.TestsFailedException: Tests unsuccessful
[error] Total time: 1448 s (24:08), completed May 7, 2024, 9:07:49 PM
For example,
- SPARK-6851: Self-joined converted parquet tables *** FAILED *** (4 seconds, 473 milliseconds)
[info] java.util.concurrent.ExecutionException: org.apache.spark.SparkException:
[FAILED_READ_FILE.NO_HINT] Encountered error while reading file
file:///home/runner/work/spark/spark/target/tmp/warehouse-75fc0262-e914-40da-98bf-ad2460270fb5/orders/state=CA/month=20151/part-00000-d46019ae-951c-4974-96da-2b38ade7b49e.c000.snappy.parquet. SQLSTATE: KD001
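As a quick sanity check on the sbt summary above, the counters can be parsed and compared. This is only a sketch: the helper name `parse_sbt_summary` is hypothetical and not part of Spark's tooling, and the summary line is copied verbatim from the log in this comment.

```python
import re

# Summary line copied verbatim from the sbt output in this comment.
SUMMARY = "[error] Failed: Total 1526, Failed 189, Errors 0, Passed 1337, Ignored 597"


def parse_sbt_summary(line: str) -> dict[str, int]:
    """Extract the named counters from an sbt test summary line.

    Hypothetical helper for illustration only. The leading "Failed:" is
    skipped because it is followed by a colon rather than a number.
    """
    pairs = re.findall(r"(Total|Failed|Errors|Passed|Ignored) (\d+)", line)
    return {name.lower(): int(value) for name, value in pairs}


counts = parse_sbt_summary(SUMMARY)

# In this log, Total equals Failed + Errors + Passed; Ignored is reported separately.
assert counts["total"] == counts["failed"] + counts["errors"] + counts["passed"]

failure_rate = counts["failed"] / counts["total"]
print(f"{counts['failed']} of {counts['total']} executed tests failed ({failure_rate:.1%})")
```

In this run the counters are consistent with each other, and every suite listed under "Failed tests" belongs to the `org.apache.spark.sql.hive` package.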
Oh, it seems that wrong FYI, this PR is supposed to have two files:
Thanks for pointing that out, @dongjoon-hyun. I've fixed it right away 👍
I have to look into the tests 👀
I think the
Thanks for digging into this, @rshkv. Let's follow up on the Parquet side.
What changes were proposed in this pull request?
Why are the changes needed?
Fixes quite a few bugs on the Parquet side: https://github.com/apache/parquet-mr/blob/master/CHANGES.md#version-1140
Does this PR introduce any user-facing change?
No
How was this patch tested?
Using the existing unit tests
Was this patch authored or co-authored using generative AI tooling?
No