[WIP][SPARK-41367][SQL] Enable V2 file tables in read paths in session catalog #38885

peter-toth · 2022-12-02T13:27:34Z

What changes were proposed in this pull request?

Currently the config spark.sql.sources.useV1SourceList has no effect on file tables in the session catalog: the V1 path is always used. This PR enables V2 file tables (those whose source is not listed in spark.sql.sources.useV1SourceList) in read paths via the session catalog and fixes a few issues where V2 behaves differently from V1.

Why are the changes needed?

It would be good if we could use the already available V2 file source implementations with the session catalog. We ran into a few problems with V2 optimization paths that we want to fix in the future, but currently Spark doesn't have built-in catalog support for any of the V2 file table implementations. As a first step, this PR enables V2, controlled by spark.sql.sources.useV1SourceList, for select query plans only. All commands and InsertIntoStatement continue to use the V1 implementations.
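To make the intended selection rule concrete, here is a minimal sketch in plain Python. It is not Spark code: the function names `uses_v1` and `read_path` are hypothetical, and it assumes the config value is a comma-separated list of source names matched case-insensitively.

```python
def uses_v1(provider: str, use_v1_source_list: str) -> bool:
    """Return True if `provider` appears in the comma-separated V1 list.

    Assumption: names are compared case-insensitively after trimming.
    """
    v1_sources = {
        name.strip().lower()
        for name in use_v1_source_list.split(",")
        if name.strip()
    }
    return provider.strip().lower() in v1_sources


def read_path(provider: str, use_v1_source_list: str) -> str:
    """Pick the read path for a session catalog file table (illustrative only)."""
    return "V1" if uses_v1(provider, use_v1_source_list) else "V2"


# Hypothetical config value: only avro, csv and json are pinned to V1.
conf_value = "avro,csv,json"
print(read_path("CSV", conf_value))      # V1
print(read_path("parquet", conf_value))  # V2
```

Under this PR the rule would apply to select query plans only; commands and InsertIntoStatement would stay on V1 regardless of the list.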

The PR also contains some test changes:

SQLQuerySuite is split into V1 and V2 versions.
V2 versions of OrcPartitionDiscoverySuite and ParquetPartitionDiscoverySuite are modified to behave like the V1 versions do. Basically, the order of output columns changed in the edge case where partitioning and data columns overlap.
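The column-order edge case can be illustrated with a small plain-Python sketch. The two orderings below are assumptions about how the V1 and the previous V2 schema merges behave; the PR text itself only states that the order differs. The function names are hypothetical.

```python
def v1_column_order(data_cols, partition_cols):
    """Assumed V1 behavior: data columns keep their position (an overlapping
    column stays where the data schema put it), and partition columns that
    are not in the data schema are appended at the end."""
    data = set(data_cols)
    return list(data_cols) + [c for c in partition_cols if c not in data]


def old_v2_column_order(data_cols, partition_cols):
    """Assumed previous V2 behavior: overlapping columns are removed from the
    data columns and all partition columns are appended at the end."""
    parts = set(partition_cols)
    return [c for c in data_cols if c not in parts] + list(partition_cols)


# Hypothetical table: the files contain column "p", which is also the partition column.
data_cols = ["a", "p", "b"]
partition_cols = ["p"]
print(v1_column_order(data_cols, partition_cols))      # ['a', 'p', 'b']
print(old_v2_column_order(data_cols, partition_cols))  # ['a', 'b', 'p']
```

When the two schemas do not overlap, both functions return the same order, which is why only the overlapping case is affected.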

Does this PR introduce any user-facing change?

Yes: the order of output columns changes when partitioning and data columns overlap.

How was this patch tested?

Existing and new UTs.

peter-toth · 2022-12-02T13:35:08Z

cc @cloud-fan

github-actions · 2023-06-24T00:24:04Z

We're closing this PR because it hasn't been updated in a while. This isn't a judgement on the merit of the PR in any way. It's just a way of keeping the PR queue manageable.
If you'd like to revive this PR, please reopen it and ask a committer to remove the Stale tag!

github-actions bot added AVRO SQL STRUCTURED STREAMING labels Dec 2, 2022

peter-toth mentioned this pull request Dec 2, 2022

[WIP][SPARK-41124][SQL][TEST] Add DSv2 PlanStabilitySuites #38640

Closed

peter-toth force-pushed the SPARK-enable-dsv2-file-source-read-in-session-catalog branch from 38bfc2a to b531c04 Compare December 12, 2022 20:46

peter-toth changed the title ~~[SPARK-41367][SQL] Enable V2 file tables in read paths in session catalog~~ [WIP][SPARK-41367][SQL] Enable V2 file tables in read paths in session catalog Dec 13, 2022

peter-toth force-pushed the SPARK-enable-dsv2-file-source-read-in-session-catalog branch 8 times, most recently from ac301e3 to d9154d6 Compare December 19, 2022 14:27

peter-toth force-pushed the SPARK-enable-dsv2-file-source-read-in-session-catalog branch 6 times, most recently from fc1d46b to 6f8b0e7 Compare December 23, 2022 12:59

peter-toth force-pushed the SPARK-enable-dsv2-file-source-read-in-session-catalog branch 4 times, most recently from c65c7ac to 1bc52a8 Compare January 5, 2023 13:10

peter-toth force-pushed the SPARK-enable-dsv2-file-source-read-in-session-catalog branch 4 times, most recently from 5649882 to 02e973f Compare January 12, 2023 13:08

peter-toth force-pushed the SPARK-enable-dsv2-file-source-read-in-session-catalog branch from 02e973f to e80b737 Compare March 6, 2023 17:06

peter-toth force-pushed the SPARK-enable-dsv2-file-source-read-in-session-catalog branch from e80b737 to 7744619 Compare March 10, 2023 17:23

github-actions bot added the CONNECT label Mar 10, 2023

peter-toth force-pushed the SPARK-enable-dsv2-file-source-read-in-session-catalog branch 2 times, most recently from 7e1dce5 to db5cde8 Compare March 14, 2023 15:14

peter-toth added 2 commits March 15, 2023 19:10

Enable V2 file tables in read paths in session catalog

8f676a3

quick test to enable all v2 file sources

7bcf665

peter-toth force-pushed the SPARK-enable-dsv2-file-source-read-in-session-catalog branch from db5cde8 to 7bcf665 Compare March 15, 2023 18:11

github-actions bot added CORE PYTHON labels Mar 15, 2023

github-actions bot added the Stale label Jun 24, 2023

github-actions bot closed this Jun 25, 2023
