Skip to content

Conversation

@vladimirg-db
Copy link
Contributor

@vladimirg-db vladimirg-db commented Feb 27, 2025

What changes were proposed in this pull request?

Add maxRows field to CTERelationRef.

Why are the changes needed?

The Analyzer validates scalar subqueries by checking if it outputs just one row or not: https://github.com/vladimirg-db/spark/blob/master/sql/catalyst/src/main/scala/org/apache/spark/sql/catalyst/analysis/ValidateSubqueryExpression.scala#L144.

This works in fixed-point Analyzer, because CTEs are inlined before CheckAnalysis. However, in single-pass Analyzer, we validate the subquery right after its validation, so CTERelationRef must output correct maxRows as well, based on the related CTERelationDef's maxRows.

Does this PR introduce any user-facing change?

There should be no changes to existing Catalyst behavior.
Added a flag to mitigate the potential edge-cases: spark.sql.cteRelationDefMaxRows.enabled.

How was this patch tested?

Existing tests.

Was this patch authored or co-authored using generative AI tooling?

No.

@github-actions github-actions bot added the SQL label Feb 27, 2025
@vladimirg-db vladimirg-db force-pushed the vladimir-golubev_data/add-max-rows-field-to-cte-relation-ref branch from 3b1bb8c to c77e7fe Compare February 27, 2025 11:26
@vladimirg-db vladimirg-db changed the title [SPARK-51337][SQL] Add maxRows to CTERelationRef [SPARK-51337][SQL] Add maxRows to CTERelationDef and CTERelationRef Feb 27, 2025
@vladimirg-db vladimirg-db force-pushed the vladimir-golubev_data/add-max-rows-field-to-cte-relation-ref branch from c77e7fe to 463c1a9 Compare February 27, 2025 14:14
@cloud-fan
Copy link
Contributor

thanks, merging to master!

@cloud-fan cloud-fan closed this in 88addf4 Feb 28, 2025
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Labels

Projects

None yet

Development

Successfully merging this pull request may close these issues.

2 participants