[WIP][SPARK-41367][SQL] Enable V2 file tables in read paths in session catalog #38885

@peter-toth peter-toth commented Dec 2, 2022

What changes were proposed in this pull request?

Currently the config spark.sql.sources.useV1SourceList has no effect on V2 file tables in the session catalog; the V1 path is always used. This PR enables V2 file tables (if they are not in spark.sql.sources.useV1SourceList) in read paths via the session catalog and fixes a few issues where V2 behaves differently from V1.
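
To illustrate the intended selection rule, here is a minimal Python sketch of the decision described above. It is a model only, not Spark's implementation: the helper names `parse_v1_source_list` and `use_v2_file_table`, the `is_read_path` flag, and the example list value `"csv,json"` are all hypothetical and chosen for illustration.

```python
from typing import Set


def parse_v1_source_list(conf_value: str) -> Set[str]:
    """Split a comma-separated source list into lowercase format names."""
    return {name.strip().lower() for name in conf_value.split(",") if name.strip()}


def use_v2_file_table(file_format: str, use_v1_source_list: str, is_read_path: bool) -> bool:
    """Hypothetical helper: decide whether a file table would take the V2 path.

    Assumption: V2 applies only on read paths, and only when the format
    is not named in the useV1SourceList value.
    """
    if not is_read_path:
        # Commands and inserts stay on V1 in this model.
        return False
    return file_format.strip().lower() not in parse_v1_source_list(use_v1_source_list)


# Illustrative config value, not a Spark default.
v1_list = "csv,json"
print(use_v2_file_table("parquet", v1_list, is_read_path=True))   # True
print(use_v2_file_table("csv", v1_list, is_read_path=True))       # False
print(use_v2_file_table("parquet", v1_list, is_read_path=False))  # False
```

In this model a format falls back to V1 either because it is listed in the config or because the plan is not a read.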

Why are the changes needed?

It would be good if we could use the already available V2 file source implementations with the session catalog. We ran into a few problems with V2 optimization paths that we want to fix in the future. But currently Spark doesn't have built-in catalog support for any of the V2 file table implementations. As a first step, this PR enables V2, controlled by spark.sql.sources.useV1SourceList, for select query plans only. All commands and InsertIntoStatement continue to use the V1 implementations.

The PR also contains some test changes:

  • SQLQuerySuite is split into V1 and V2 versions.
  • V2 versions of OrcPartitionDiscoverySuite and ParquetPartitionDiscoverySuite are modified to behave like the V1 versions do. Basically the order of output columns changed in the edge case when partitioning and data columns overlap.
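
The description does not spell out the two column orderings, so the following Python sketch is an assumption about what the overlap edge case looks like, not a statement of Spark's code. It contrasts two plausible rules: keeping an overlapping column at its position among the data columns (assumed here to match V1), and moving every partition column to the end (assumed here to match V2 before this change). The function names and the example columns are hypothetical.

```python
from typing import List


def order_keep_data_position(data_cols: List[str], partition_cols: List[str]) -> List[str]:
    """Keep data columns in place, then append partition columns not already present.

    Assumption: this models the V1-style output order.
    """
    data_set = set(data_cols)
    return list(data_cols) + [c for c in partition_cols if c not in data_set]


def order_partitions_last(data_cols: List[str], partition_cols: List[str]) -> List[str]:
    """Drop overlapping columns from the data columns, then append all partition columns.

    Assumption: this models the V2-style output order before this change.
    """
    partition_set = set(partition_cols)
    return [c for c in data_cols if c not in partition_set] + list(partition_cols)


# Hypothetical table where column "p" is both a data column and a partition column.
data = ["id", "p", "value"]
partitions = ["p"]
print(order_keep_data_position(data, partitions))  # ['id', 'p', 'value']
print(order_partitions_last(data, partitions))     # ['id', 'value', 'p']
```

When the two column sets do not overlap, both rules give the same result, which is why the difference only shows up in this edge case.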

Does this PR introduce any user-facing change?

Yes: the order of output columns changes when partitioning and data columns overlap.

How was this patch tested?

Existing and new UTs.

@peter-toth
Contributor Author

cc @cloud-fan

@peter-toth peter-toth force-pushed the SPARK-enable-dsv2-file-source-read-in-session-catalog branch from 38bfc2a to b531c04 Compare December 12, 2022 20:46
@peter-toth peter-toth changed the title [SPARK-41367][SQL] Enable V2 file tables in read paths in session catalog [WIP][SPARK-41367][SQL] Enable V2 file tables in read paths in session catalog Dec 13, 2022
@peter-toth peter-toth force-pushed the SPARK-enable-dsv2-file-source-read-in-session-catalog branch 8 times, most recently from ac301e3 to d9154d6 Compare December 19, 2022 14:27
@peter-toth peter-toth force-pushed the SPARK-enable-dsv2-file-source-read-in-session-catalog branch 6 times, most recently from fc1d46b to 6f8b0e7 Compare December 23, 2022 12:59
@peter-toth peter-toth force-pushed the SPARK-enable-dsv2-file-source-read-in-session-catalog branch 4 times, most recently from c65c7ac to 1bc52a8 Compare January 5, 2023 13:10
@peter-toth peter-toth force-pushed the SPARK-enable-dsv2-file-source-read-in-session-catalog branch 4 times, most recently from 5649882 to 02e973f Compare January 12, 2023 13:08
@peter-toth peter-toth force-pushed the SPARK-enable-dsv2-file-source-read-in-session-catalog branch from 02e973f to e80b737 Compare March 6, 2023 17:06
@peter-toth peter-toth force-pushed the SPARK-enable-dsv2-file-source-read-in-session-catalog branch from e80b737 to 7744619 Compare March 10, 2023 17:23
@peter-toth peter-toth force-pushed the SPARK-enable-dsv2-file-source-read-in-session-catalog branch 2 times, most recently from 7e1dce5 to db5cde8 Compare March 14, 2023 15:14
@peter-toth peter-toth force-pushed the SPARK-enable-dsv2-file-source-read-in-session-catalog branch from db5cde8 to 7bcf665 Compare March 15, 2023 18:11
@github-actions

We're closing this PR because it hasn't been updated in a while. This isn't a judgement on the merit of the PR in any way. It's just a way of keeping the PR queue manageable.
If you'd like to revive this PR, please reopen it and ask a committer to remove the Stale tag!

@github-actions github-actions bot added the Stale label Jun 24, 2023
@github-actions github-actions bot closed this Jun 25, 2023