[SPARK-56920][SQL][FOLLOWUP] Add CreateMetricView logical plan and pre-parse inputColumns#56010

Closed
cloud-fan wants to merge 2 commits into apache:master from cloud-fan:SPARK-54119-followup
@cloud-fan commented on May 20, 2026

What changes were proposed in this pull request?

Two refactors on top of SPARK-56920 that make the metric-view plan shape more amenable to downstream extension and simpler for resolvers.

1. Introduce CreateMetricView logical plan as the parser's return type.

  • Previously CreateMetricViewCommand doubled as both the parser output and the V1 runnable command. The V2 strategy pattern-matched on it for non-session catalogs, while the V1 path executed via .run().
  • Now the parser returns CreateMetricView (a UnaryCommand); for the session catalog ResolveSessionCatalog rewrites it to CreateMetricViewCommand (V1 runnable); for non-session v2 catalogs DataSourceV2Strategy continues to dispatch to CreateV2MetricViewExec.
  • This gives the parser a single, v1/v2-agnostic logical shape and frees CreateMetricViewCommand to be V1-execution-only.
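
The dispatch described above can be sketched with a toy model. This is a minimal illustration, not Spark's actual code: the class names mirror the ones in this PR, but every field, the `MetricViewDispatch` object, and both helper methods are hypothetical simplifications. The only detail taken from Spark itself is `spark_catalog` as the session catalog name.

```scala
// Toy model only: names mirror the PR, but fields and logic are hypothetical.
sealed trait Plan

// Parser output: a single logical plan that is agnostic of v1/v2 catalogs.
final case class CreateMetricView(catalog: String, name: String, yaml: String) extends Plan

// V1 runnable command: produced only for the session catalog.
final case class CreateMetricViewCommand(name: String, yaml: String) extends Plan {
  def run(): String = s"v1: created $name"
}

// Stand-in for the physical node used with non-session v2 catalogs.
final case class CreateV2MetricViewExec(catalog: String, name: String, yaml: String) {
  def run(): String = s"v2: created $catalog.$name"
}

object MetricViewDispatch {
  val SessionCatalog = "spark_catalog"

  // Stand-in for the ResolveSessionCatalog rewrite: only session-catalog
  // plans are turned into the V1 runnable command.
  def resolveSessionCatalog(plan: Plan): Plan = plan match {
    case CreateMetricView(SessionCatalog, name, yaml) => CreateMetricViewCommand(name, yaml)
    case other => other
  }

  // Stand-in for the planning step: anything still in logical form after the
  // rewrite belongs to a v2 catalog and goes to the V2 exec node.
  def execute(plan: Plan): String = resolveSessionCatalog(plan) match {
    case cmd: CreateMetricViewCommand => cmd.run()
    case CreateMetricView(catalog, name, yaml) =>
      CreateV2MetricViewExec(catalog, name, yaml).run()
  }
}
```

In this sketch both paths start from the same `CreateMetricView` value, so a rule that wants to intercept metric-view creation has one case class to match on regardless of the target catalog.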

2. Pre-parse YAML expressions into inputColumns on MetricViewPlaceholder.

  • MetricViewPlaceholder.desc: MetricView is replaced with inputColumns: Seq[InputColumn]. MetricViewPlanner.parseYAML now populates parsed Expression and column Metadata for each dimension/measure column. ResolveMetricView reads pre-parsed expressions directly instead of re-parsing from desc.select.
  • MetricViewPlanner.planWrite returns the descriptor alongside the placeholder (used only for property emission at CREATE time), so callers that need it don't have to recover it from the placeholder.
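
A rough sketch of the pre-parsing idea follows, again as a toy model under stated assumptions. `ColumnDef`, `ToyPlanner`, `ToyResolver`, the tiny expression types, and the fields of `InputColumn` are all hypothetical and far simpler than Spark's `Expression` and `Metadata`. The point is only that parsing happens once in the planner, and the resolver consumes already-parsed expressions without holding a parser.

```scala
// Toy model only: InputColumn's fields here are hypothetical, not Spark's definition.

// A dimension/measure column as it might be read from the YAML descriptor.
final case class ColumnDef(name: String, expr: String, kind: String)

// Minimal expression tree standing in for Catalyst expressions.
sealed trait Expr
final case class Ref(col: String) extends Expr
final case class Call(fn: String, arg: Expr) extends Expr

// Pre-parsed column carried on the placeholder.
final case class InputColumn(name: String, expr: Expr, metadata: Map[String, String])
final case class MetricViewPlaceholder(inputColumns: Seq[InputColumn])

object ToyPlanner {
  private val CallPattern = """(\w+)\((\w+)\)""".r

  // Tiny stand-in for a SQL expression parser: handles `col` and `fn(col)` only.
  def parseExpr(s: String): Expr = s.trim match {
    case CallPattern(fn, arg) => Call(fn, Ref(arg))
    case col => Ref(col)
  }

  // Parse every expression and build per-column metadata once, at planning time.
  def plan(cols: Seq[ColumnDef]): MetricViewPlaceholder =
    MetricViewPlaceholder(
      cols.map(c => InputColumn(c.name, parseExpr(c.expr), Map("kind" -> c.kind))))
}

object ToyResolver {
  // The resolver needs no parser: it reads the pre-parsed expressions directly.
  def measures(p: MetricViewPlaceholder): Seq[Expr] =
    p.inputColumns.collect {
      case InputColumn(_, e, m) if m.get("kind").contains("measure") => e
    }
}
```

For example, planning the columns `region` (a dimension) and `sum(amount)` (a measure) and then calling `ToyResolver.measures` on the result yields the single parsed expression `Call("sum", Ref("amount"))`, with no string parsing in the resolver.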

Why are the changes needed?

  • Splitting the parser-output logical plan from the runnable command is a widely adopted pattern in Spark: CreateView → CreateViewCommand, CreateTable → V1/V2 runnable commands, DropTable → DropTableCommand/V2 drop, etc. Aligning metric views with this pattern lets future extensions (e.g., schema modes, temp/materialized variants) add fields to the logical plan without changing the runnable's shape, and gives downstream rules a single match target to dispatch from.
  • Carrying pre-parsed inputColumns on the placeholder gives a stable, analyzer-friendly representation and decouples the resolver from the YAML serde. The resolver no longer needs a ParserInterface field for re-parsing expressions, and the per-column metadata conversion happens once at planning time.

Does this PR introduce any user-facing change?

No. Internal refactor only.

How was this patch tested?

Existing test suites pass locally:

  • MetricViewV2CatalogSuite (31/31)
  • SimpleMetricViewSuite (19/19)
  • MetricViewFactorySuite (16/16)

Was this patch authored or co-authored using generative AI tooling?

Co-authored using Claude Code.

@cloud-fan changed the title from "[SPARK-54119][SQL][FOLLOWUP] Add CreateMetricView logical plan and pre-parse inputColumns" to "[SPARK-56920][SQL][FOLLOWUP] Add CreateMetricView logical plan and pre-parse inputColumns" on May 20, 2026
@cloud-fan force-pushed the SPARK-54119-followup branch from 2314f36 to 8d4dfc4 on May 21, 2026 02:15
@cloud-fan force-pushed the SPARK-54119-followup branch from 8d4dfc4 to 2371d30 on May 21, 2026 07:02
@cloud-fan (Contributor, Author) commented:

Thanks for the review, merging to master/4.x/4.2!

@cloud-fan closed this in 37a442c on May 21, 2026
cloud-fan added a commit that referenced this pull request May 21, 2026
…e-parse inputColumns

Closes #56010 from cloud-fan/SPARK-54119-followup.

Authored-by: Wenchen Fan <wenchen@databricks.com>
Signed-off-by: Wenchen Fan <wenchen@databricks.com>
(cherry picked from commit 37a442c)
Signed-off-by: Wenchen Fan <wenchen@databricks.com>