Skip to content

[SPARK-56714] [SQL] Remove __metadata_col metadata from aggregated access only columns#55667

Closed
mihailoale-db wants to merge 1 commit into
apache:masterfrom
mihailoale-db:removemetadataforqualified
Closed

[SPARK-56714] [SQL] Remove __metadata_col metadata from aggregated access only columns#55667
mihailoale-db wants to merge 1 commit into
apache:masterfrom
mihailoale-db:removemetadataforqualified

Conversation

@mihailoale-db
Copy link
Copy Markdown
Contributor

What changes were proposed in this pull request?

In this PR I propose to remove __metadata_col metadata from aggregated access only columns. It's not semantically needed and it blocks metadata column resolution work in single-pass resolver.

Why are the changes needed?

To make aggregated access only metadata semantically sound.

Does this PR introduce any user-facing change?

No.

How was this patch tested?

Existing tests.

Was this patch authored or co-authored using generative AI tooling?

No.

@mihailoale-db mihailoale-db changed the title Remove __metadata_col metadata from aggregated access only columns [SPARK-56714] [SQL] Remove __metadata_col metadata from aggregated access only columns May 4, 2026
@mihailoale-db mihailoale-db force-pushed the removemetadataforqualified branch from a357193 to 52747c3 Compare May 4, 2026 14:51
Copy link
Copy Markdown
Contributor

@cloud-fan cloud-fan left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Nice cleanup — the prior coupling looked accidental. I traced the only setter call site (NameScope.updateHiddenOutputProperties) and the only predicate call site (NameScope.getHiddenOutputCandidates); neither depends on aggregatedAccessOnly attributes also being metadata columns, so decoupling is sound and existing behavior is preserved.

Two small, optional follow-ups:

  1. The doc on AGGREGATED_ACCESS_ONLY (lines 179–185) still says "If set, this metadata column can only be accessed under AggregateExpression." After this PR the marker is not restricted to metadata columns — could you tweak the wording (e.g. "this attribute" / "this column") so the comment matches the new contract?

  2. There is a symmetric unit test for the analogous metadata key — AnalysisSuite.scala:1761 "SPARK-43293: __qualified_access_only should be ignored in normal columns". An equivalent targeted test for __aggregated_access_only (e.g. that a non-metadata attribute marked via markAsAggregatedAccessOnly() does not satisfy isMetadataCol) would lock the new contract in directly, on top of the existing end-to-end SQL coverage.

@mihailoale-db mihailoale-db force-pushed the removemetadataforqualified branch from 52747c3 to 2ce41ca Compare May 5, 2026 10:18
@mihailoale-db mihailoale-db force-pushed the removemetadataforqualified branch from 2ce41ca to 2c07360 Compare May 5, 2026 12:14
@mihailoale-db
Copy link
Copy Markdown
Contributor Author

Nice cleanup — the prior coupling looked accidental. I traced the only setter call site (NameScope.updateHiddenOutputProperties) and the only predicate call site (NameScope.getHiddenOutputCandidates); neither depends on aggregatedAccessOnly attributes also being metadata columns, so decoupling is sound and existing behavior is preserved.

Two small, optional follow-ups:

  1. The doc on AGGREGATED_ACCESS_ONLY (lines 179–185) still says "If set, this metadata column can only be accessed under AggregateExpression." After this PR the marker is not restricted to metadata columns — could you tweak the wording (e.g. "this attribute" / "this column") so the comment matches the new contract?
  2. There is a symmetric unit test for the analogous metadata key — AnalysisSuite.scala:1761 "SPARK-43293: __qualified_access_only should be ignored in normal columns". An equivalent targeted test for __aggregated_access_only (e.g. that a non-metadata attribute marked via markAsAggregatedAccessOnly() does not satisfy isMetadataCol) would lock the new contract in directly, on top of the existing end-to-end SQL coverage.

Addressed your comments. PTAL again.

@cloud-fan
Copy link
Copy Markdown
Contributor

the proto failure is unrelated, thanks, merging to master/4.x/4.2!

@cloud-fan cloud-fan closed this in 3a6bc83 May 5, 2026
cloud-fan pushed a commit that referenced this pull request May 5, 2026
…ccess only columns

### What changes were proposed in this pull request?
In this PR I propose to remove `__metadata_col` metadata from aggregated access only columns. It's not semantically needed and it blocks metadata column resolution work in single-pass resolver.

### Why are the changes needed?
To make aggregated access only metadata semantically sound.

### Does this PR introduce _any_ user-facing change?
No.

### How was this patch tested?
Existing tests.

### Was this patch authored or co-authored using generative AI tooling?
No.

Closes #55667 from mihailoale-db/removemetadataforqualified.

Authored-by: Mihailo Aleksic <mihailo.aleksic@databricks.com>
Signed-off-by: Wenchen Fan <wenchen@databricks.com>
cloud-fan pushed a commit that referenced this pull request May 5, 2026
…ccess only columns

### What changes were proposed in this pull request?
In this PR I propose to remove `__metadata_col` metadata from aggregated access only columns. It's not semantically needed and it blocks metadata column resolution work in single-pass resolver.

### Why are the changes needed?
To make aggregated access only metadata semantically sound.

### Does this PR introduce _any_ user-facing change?
No.

### How was this patch tested?
Existing tests.

### Was this patch authored or co-authored using generative AI tooling?
No.

Closes #55667 from mihailoale-db/removemetadataforqualified.

Authored-by: Mihailo Aleksic <mihailo.aleksic@databricks.com>
Signed-off-by: Wenchen Fan <wenchen@databricks.com>
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Labels

None yet

Projects

None yet

Development

Successfully merging this pull request may close these issues.

3 participants