[SPARK-56714] [SQL] Remove `__metadata_col` metadata from aggregated access only columns #55667
mihailoale-db wants to merge 1 commit into
cloud-fan left a comment:
Nice cleanup; the prior coupling looked accidental. I traced the only setter call site (`NameScope.updateHiddenOutputProperties`) and the only predicate call site (`NameScope.getHiddenOutputCandidates`). Neither depends on `aggregatedAccessOnly` attributes also being metadata columns, so the decoupling is sound and existing behavior is preserved.
Two small, optional follow-ups:
- The doc on `AGGREGATED_ACCESS_ONLY` (lines 179–185) still says "If set, this metadata column can only be accessed under `AggregateExpression`." After this PR the marker is not restricted to metadata columns. Could you tweak the wording (e.g. "this attribute" / "this column") so the comment matches the new contract?
- There is a symmetric unit test for the analogous metadata key: `AnalysisSuite.scala:1761` "SPARK-43293: `__qualified_access_only` should be ignored in normal columns". An equivalent targeted test for `__aggregated_access_only` (e.g. that a non-metadata attribute marked via `markAsAggregatedAccessOnly()` does not satisfy `isMetadataCol`) would lock the new contract in directly, on top of the existing end-to-end SQL coverage.
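The suggested targeted test can be sketched against a toy model. The snippet below does not use Spark's `Attribute` or `MetadataBuilder`; `AccessMarkers`, `Attr`, `AggregatedAccessOnlyContractCheck`, and the helpers are hypothetical stand-ins that only borrow the names from the discussion (`isMetadataCol`, `markAsAggregatedAccessOnly`, and the `__metadata_col` / `__aggregated_access_only` keys). Storing each key as a `Boolean` map entry, and having the marker preserve any existing metadata, are simplifying assumptions.

```scala
// Toy model only: NOT Spark's Attribute/Metadata API. Key names follow the
// PR discussion; storing them as Boolean map entries is an assumption.
object AccessMarkers {
  val MetadataColKey = "__metadata_col"
  val AggregatedAccessOnlyKey = "__aggregated_access_only"

  final case class Attr(name: String, metadata: Map[String, Boolean] = Map.empty)

  def isMetadataCol(a: Attr): Boolean =
    a.metadata.getOrElse(MetadataColKey, false)

  def isAggregatedAccessOnly(a: Attr): Boolean =
    a.metadata.getOrElse(AggregatedAccessOnlyKey, false)

  // Contract after the change, as described in the review: the marker is added
  // on top of the existing metadata and does not turn the attribute into a
  // metadata column.
  def markAsAggregatedAccessOnly(a: Attr): Attr =
    a.copy(metadata = a.metadata + (AggregatedAccessOnlyKey -> true))
}

// Hypothetical test-style checks, shaped like the SPARK-43293 test for
// __qualified_access_only.
object AggregatedAccessOnlyContractCheck {
  import AccessMarkers._

  def run(): Unit = {
    // A normal (non-metadata) column marked as aggregated-access-only stays normal.
    val normal = markAsAggregatedAccessOnly(Attr("a"))
    assert(isAggregatedAccessOnly(normal))
    assert(!isMetadataCol(normal))

    // A column that already was a metadata column keeps that flag in this model.
    val meta = markAsAggregatedAccessOnly(Attr("m", Map(MetadataColKey -> true)))
    assert(isAggregatedAccessOnly(meta))
    assert(isMetadataCol(meta))
  }
}
```

Calling `AggregatedAccessOnlyContractCheck.run()` completes without throwing. A real test would use Spark's own helpers and attribute types rather than this model.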
Addressed your comments. PTAL again.
The proto failure is unrelated. Thanks, merging to master/4.x/4.2!
…ccess only columns

Closes #55667 from mihailoale-db/removemetadataforqualified.

Authored-by: Mihailo Aleksic <mihailo.aleksic@databricks.com>
Signed-off-by: Wenchen Fan <wenchen@databricks.com>
What changes were proposed in this pull request?
In this PR I propose to remove `__metadata_col` metadata from aggregated access only columns. That metadata is not semantically needed, and it blocks metadata column resolution work in the single-pass resolver.
Why are the changes needed?
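To make the aggregated access only metadata semantically sound: a column marked as aggregated access only should no longer also be treated as a metadata column.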
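One way to picture why the extra `__metadata_col` entry gets in the way is a lookup that collects metadata-column candidates by that flag alone. The sketch below is a toy model under that assumption: `MarkingBeforeAfter`, `Attr`, `markBefore`, `markAfter`, and `metadataCandidates` are hypothetical names, `markBefore` is an illustrative reconstruction of the behavior this PR removes, and the single-pass resolver's actual lookup is not shown in this PR.

```scala
// Toy model contrasting the old and new marking of aggregated access only
// columns. markBefore is a hypothetical reconstruction of the removed behavior
// (the column also carried __metadata_col); neither helper is Spark code.
object MarkingBeforeAfter {
  val MetadataColKey = "__metadata_col"
  val AggregatedAccessOnlyKey = "__aggregated_access_only"

  final case class Attr(name: String, metadata: Map[String, Boolean] = Map.empty)

  def isMetadataCol(a: Attr): Boolean =
    a.metadata.getOrElse(MetadataColKey, false)

  // Before: the aggregated access only marker came bundled with __metadata_col.
  def markBefore(a: Attr): Attr =
    a.copy(metadata = a.metadata ++ Map(MetadataColKey -> true, AggregatedAccessOnlyKey -> true))

  // After: only the aggregated access only marker is set.
  def markAfter(a: Attr): Attr =
    a.copy(metadata = a.metadata + (AggregatedAccessOnlyKey -> true))

  // Hypothetical metadata-column lookup: every attribute flagged as a metadata
  // column is treated as a candidate, in input order.
  def metadataCandidates(attrs: Seq[Attr]): Seq[String] =
    attrs.filter(a => isMetadataCol(a)).map(_.name)
}
```

With a genuine metadata column `Attr("m", Map(MetadataColKey -> true))` and an ordinary column `Attr("x")`, `metadataCandidates` returns `Seq("m", "x")` when `x` is marked with `markBefore`, but only `Seq("m")` when it is marked with `markAfter`. In this model, the old marking makes an ordinary column indistinguishable from a real metadata column for any flag-based lookup.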
Does this PR introduce any user-facing change?
No.
How was this patch tested?
Existing tests.
Was this patch authored or co-authored using generative AI tooling?
No.