[SPARK-54119] Support METRIC_VIEW creation on V2 catalogs#55487
[SPARK-54119] Support METRIC_VIEW creation on V2 catalogs#55487chenwang-databricks wants to merge 9 commits intoapache:masterfrom
Conversation
a48673d to
6f2ac0b
Compare
245dc93 to
9190bf0
Compare
12905c0 to
8f62efc
Compare
cloud-fan
left a comment
There was a problem hiding this comment.
Thanks for extending metric views to v2 catalogs. The shape of the change is right — route through ViewCatalog.createView with PROP_TABLE_TYPE = METRIC_VIEW and a typed viewDependencies, and promote METRIC_VIEW to a first-class CatalogTableType. A few significant concerns:
1. v2 path forks instead of reusing V2ViewPreparation and changes behavior on IF NOT EXISTS / OR REPLACE. CreateMetricViewCommand.createMetricViewInV2Catalog is open-coded inside a v1 RunnableCommand and duplicates most of V2ViewPreparation.buildViewInfo, but with materially different behavior: allowExisting and replace are silently dropped on the v2 path (regression vs. v1, which honored ignoreIfExists), there's no viewExists short-circuit, no createOrReplaceView, no cross-type collision decoding (so a table sitting at the ident leaks ViewAlreadyExistsException instead of unsupportedCreateOrReplaceViewOnTableError), and the CharVarcharUtils / SchemaUtils.checkColumnNameDuplication / checkIndeterminateCollationInSchema / owner-stamping work that V2ViewPreparation does is missing. Refactoring this into a CreateV2MetricViewExec (or extending V2ViewPreparation with a viewDependencies / extraProperties hook) plus a DataSourceV2Strategy rule would close all of these gaps in one move.
2. Audit other tableType == VIEW gates for METRIC_VIEW. Promoting metric views to a distinct CatalogTableType opens up silent regressions wherever existing code branches on VIEW. Two I'm fairly sure about: CatalogTable.toJsonLinkedHashMap (interface.scala:670) gates View Text / View Original Text / View Schema Mode / View Catalog and Namespace / SQL Path on tableType == VIEW, so DESCRIBE TABLE EXTENDED on a metric view will lose all of those rows — visible regression vs. the pre-PR viewWithMetrics storage. And HiveExternalCatalog.{createTable, alterTable, restoreTableMetadata} (lines 277, 297, 598, 838) only special-case VIEW, so a Hive-metastore-backed session-catalog metric view will go through the wrong branches on persist and read-back. Tests don't cover Hive-backed metric views, so this would only surface in production. Worth grepping tableType == CatalogTableType.VIEW and tableType == VIEW repo-wide and deciding which sites also need to accept METRIC_VIEW.
3. TableDependency.tableFullName is a single dot-joined string but the Javadoc claims a fixed three-part shape. This breaks down two ways: (a) MetricViewHelper.collectTableDependencies uses unquotedString for V1 sources, which yields a 2-part db.tbl whenever TableIdentifier.catalog is unset, and mkString(".") for V2 sources, which yields N-part names for catalogs with multi-level namespaces (e.g., Iceberg cat.db1.db2.tbl is 4 parts) — so the producer doesn't honor the contract; (b) any consumer parsing the string back via split/join loses arity and is ambiguous against quoted identifiers containing a literal .. Since this is brand-new public API, it's the right time to make the storage structural — e.g., TableDependency(String[] nameParts) — rather than committing to a string the producer can't always honor.
4. MetricViewPlanner.parseYAML raises SparkException.internalError for malformed user YAML. Replacing invalidLiteralForWindowDurationError is right, but internalError is the wrong category — a typo in the user's YAML body shouldn't surface as "Spark internal error, please contact support." Should be a QueryCompilationErrors / AnalysisException with a real error class.
Minor stuff (typed DTO sealing/mutability, the ViewInfo constructor's metric-view special case, missing tests for OR REPLACE / IF NOT EXISTS / V1-source dependency / multi-level-namespace V2 source / read-back through loadRelation) is in the inline comments below.
| Seq.empty | ||
| } | ||
|
|
||
| private def createMetricViewInV2Catalog( |
There was a problem hiding this comment.
This new method open-codes a v2 view-CREATE path inside a v1 RunnableCommand. The plain-view v2 path is already factored: V2ViewPreparation.buildViewInfo() (CreateV2ViewExec.scala:62-110) does almost exactly what this method's lines 109-138 do — alias plan, schema, properties, currentCatalog/Namespace, sqlConfigs, schemaMode, queryColumnNames, comment — and CreateV2ViewExec adds the IF NOT EXISTS / OR REPLACE / cross-type-collision handling.
Would you consider refactoring this into either (a) a CreateV2MetricViewExec extends V2ViewPreparation exec invoked by a DataSourceV2Strategy rule that matches CreateMetricViewCommand on a non-session catalog, or (b) extending V2ViewPreparation with a viewDependencies / extraProperties hook that CreateV2ViewExec ignores and a metric-view subclass populates? Either way the metric-view path picks up the same flag handling and validation as plain v2 views without copy-paste.
| .withViewDependencies(deps) | ||
| comment.foreach(builder.withComment) | ||
|
|
||
| viewCatalog.createView(ident, builder.build()) |
There was a problem hiding this comment.
allowExisting and replace are silently ignored here:
CREATE VIEW IF NOT EXISTS … WITH METRICSon a v2 catalog will throwViewAlreadyExistsExceptioninstead of being a no-op — regression vs. v1 (createMetricViewInSessionCatalogpassesignoreIfExists = allowExisting).CREATE OR REPLACE VIEW … WITH METRICSwill likewise throw instead of replacing.- If a table sits at the ident (mixed
RelationCatalog), theViewAlreadyExistsExceptionfromRelationCatalog.createViewwill leak as a generic "already exists" instead of the dedicatedunsupportedCreateOrReplaceViewOnTableErrorthatCreateV2ViewExec's catch block (CreateV2ViewExec.scala:153-169) decodes viatableExists.
CreateV2ViewExec.run shows the full pattern: viewExists short-circuit on IF NOT EXISTS, then createOrReplaceView vs createView, with the catch block handling the table-collision case. None of the three paths is covered by the new test suite today — see comment there.
| // goes through `ViewHelper.prepareTable` with `isMetricView = true`). | ||
| val aliasedSchema = ViewHelper | ||
| .aliasPlan(sparkSession, analyzed, userSpecifiedColumns, retainMetadata = true) | ||
| .schema |
There was a problem hiding this comment.
Compared with V2ViewPreparation.buildViewInfo for plain v2 views, the v2 metric-view path is missing:
CharVarcharUtils.getRawSchema(...)on the aliased schema before passing it to the builder.SchemaUtils.checkColumnNameDuplication(...).SchemaUtils.checkIndeterminateCollationInSchema(query.schema).withOwner(CurrentUserContext.getCurrentUser)— v1 (prepareTable→CatalogTable) and plain v2 (CreateV2ViewExec) both stamp the owner, but here the capturedViewInfohas noPROP_OWNER.
All four are exactly what V2ViewPreparation.buildViewInfo already does — the cleanest fix is to share that code rather than re-add these one by one.
| val EXTERNAL = new CatalogTableType("EXTERNAL") | ||
| val MANAGED = new CatalogTableType("MANAGED") | ||
| val VIEW = new CatalogTableType("VIEW") | ||
| val METRIC_VIEW = new CatalogTableType("METRIC_VIEW") |
There was a problem hiding this comment.
Adding METRIC_VIEW as a separate CatalogTableType means every existing tableType == CatalogTableType.VIEW site needs to be audited for whether it should also accept METRIC_VIEW. Two visible regressions I noticed in unchanged code:
(a) CatalogTable.toJsonLinkedHashMap (line 670 of this file). The block emitting "View Text", "View Original Text", "View Schema Mode", "View Catalog and Namespace", and "SQL Path" is gated on tableType == CatalogTableType.VIEW. With this PR's change to tableType = METRIC_VIEW, all of those rows disappear from DESCRIBE TABLE EXTENDED output — a user-visible regression vs. the pre-PR VIEW + viewWithMetrics=true storage. Likely fix: change to if (tableType == VIEW || tableType == METRIC_VIEW).
(b) HiveExternalCatalog. Three sites only handle VIEW:
createTable(lines 277, 297) — empty-schema fallback for Hive-incompatible types.alterTable(line 598) — VIEW-specific alter path.restoreTableMetadata(line 838) — read-back falls intorestoreHiveSerdeTableinstead of restoring schema from view properties.
Tests use the in-memory catalog so the Hive path isn't exercised, but a Hive-metastore-backed session-catalog metric view (the realistic deployment) won't round-trip correctly.
Worth a repo-wide grep for tableType == CatalogTableType.VIEW / tableType == VIEW to enumerate the rest before merging.
| * The dependent table is identified by its fully-qualified three-part name | ||
| * in the form {@code catalog_name.schema_name.table_name}. | ||
| * | ||
| * @param tableFullName fully-qualified three-part table name |
There was a problem hiding this comment.
The "three-part name in the form catalog_name.schema_name.table_name" claim doesn't hold for catalogs with multi-level namespaces (e.g. Iceberg cat.db1.db2.tbl is 4 parts), and the producer in MetricViewHelper.collectTableDependencies doesn't always emit three parts either: V1 sources use TableIdentifier.unquotedString which yields 2 parts when the catalog component is unset, and V2 sources dot-join the namespace which yields N parts.
Since the dot-flattened string is also ambiguous against quoted identifiers containing a literal ., the cleanest fix is to make this DTO structural — e.g. record TableDependency(String[] nameParts) (and the same for FunctionDependency). Consumers can join the parts themselves if they need a flat string, but they can't recover arity if the join happens at storage time.
| String tableType = properties().get(TableCatalog.PROP_TABLE_TYPE); | ||
| if (!TableSummary.METRIC_VIEW_TABLE_TYPE.equals(tableType)) { | ||
| properties().put(TableCatalog.PROP_TABLE_TYPE, TableSummary.VIEW_TABLE_TYPE); | ||
| } |
There was a problem hiding this comment.
Embedding the metric-view-specific value in the generic ViewInfo constructor is awkward — it makes ViewInfo aware of one specific PROP_TABLE_TYPE value that lives in the metric-view code. BaseBuilder.withTableType already exists (TableInfo.java:147); the metric-view path could just call it after withProperties(...) and let this constructor remain the unconditional default:
private ViewInfo(Builder builder) {
super(builder);
...
properties().putIfAbsent(TableCatalog.PROP_TABLE_TYPE, TableSummary.VIEW_TABLE_TYPE);
}and in metricViewCommands.scala drop the viewProperties.put(...) for PROP_TABLE_TYPE and add .withTableType(TableSummary.METRIC_VIEW_TABLE_TYPE) after .withProperties(viewProperties). That way ViewInfo doesn't reference METRIC_VIEW_TABLE_TYPE at all, and adding future view kinds doesn't require touching this class.
| * @since 4.2.0 | ||
| */ | ||
| @Evolving | ||
| public interface Dependency { |
There was a problem hiding this comment.
The Javadoc claim "A dependency is one of: TableDependency or FunctionDependency" is only structurally enforced if the interface is sealed. As a plain interface on Java 17, third-party connectors can implement it and the "is one of" claim is no longer true.
| public interface Dependency { | |
| @Evolving | |
| public sealed interface Dependency permits TableDependency, FunctionDependency { |
(Or alternatively, drop the "is one of" line if you want to leave the interface open for future extension.)
| public record DependencyList(Dependency[] dependencies) { | ||
|
|
||
| public DependencyList { | ||
| Objects.requireNonNull(dependencies, "dependencies must not be null"); |
There was a problem hiding this comment.
Records auto-generate accessors that return field references, so DependencyList.dependencies() hands callers the live array — a downstream consumer can mutate the lineage list of any ViewInfo via this accessor.
Either defensively copy in the canonical constructor and the accessor, or expose List<Dependency> instead:
public DependencyList {
Objects.requireNonNull(dependencies, "dependencies must not be null");
dependencies = dependencies.clone();
}
public Dependency[] dependencies() {
return dependencies.clone();
}Same consideration applies to existing ViewInfo.queryColumnNames() / currentNamespace(), but those are pre-existing — at least worth doing it on the new types.
| * minimal [[RelationCatalog]] so the same instance can also host the source table referenced by | ||
| * the metric view's YAML. | ||
| */ | ||
| class MetricViewV2CatalogSuite extends QueryTest with SharedSparkSession { |
There was a problem hiding this comment.
Test gaps worth filling — especially the first three, which would have caught the silent-flag-ignore findings on the create path:
CREATE OR REPLACE VIEW … WITH METRICSon the v2 catalog (currently throwsViewAlreadyExistsExceptioninstead of replacing).CREATE VIEW IF NOT EXISTS … WITH METRICSon the v2 catalog (regression vs. v1; currently throws instead of no-op).CREATE VIEW … WITH METRICSagainst a v2 catalog where a table sits at the target ident — should produce the dedicated cross-type-collision error, not a genericViewAlreadyExistsException.- A V1 source dependency: a v2 metric view referencing a session-catalog table (
HiveTableRelation/LogicalRelation). Surfaces whattableFullName()actually contains for that case. - A multi-part-namespace V2 source: a metric view whose source identifier is
cat.ns1.ns2.tbl. Surfaces whatmkString(".")produces fortableFullName(). - Read-back:
SELECT * FROM <v2 metric view>(or at leastEXPLAIN/DESCRIBE EXTENDED) so theloadRelation→MetadataOnlyTable→V1Table.toCatalogTable(ViewInfo)round-trip is exercised end-to-end. The capture-only fixture pins what flows INTOcreateViewbut not what comes back OUT.
| * representation on other Spark platforms so consumers of the catalog see a | ||
| * consistent property layout. | ||
| */ | ||
| def getProperties: Map[String, String] = { |
There was a problem hiding this comment.
Worth noting in the Scaladoc that metric_view.from.sql is truncated to Constants.MAXIMUM_PROPERTY_SIZE and is therefore a descriptive value (for catalog browsers / lineage tooling), not a round-trippable representation of the source — a consumer relying on it to reconstruct the view body would silently get a truncated query for any view whose source SQL exceeds the limit.
…eedback Address all 4 substantive concerns and 8 inline comments from cloud-fan's review on PR apache#55487. V2 path refactor (concern 1, inline 87/111/142): - New CreateV2MetricViewExec extends V2ViewPreparation (mirrors CreateV2ViewExec): viewExists short-circuit on IF NOT EXISTS, OR REPLACE via createOrReplaceView, and cross-type collision decoding via ViewAlreadyExistsException -> tableExists -> EXPECT_VIEW_NOT_TABLE. - DataSourceV2Strategy rule routes CreateMetricViewCommand on a non-session catalog to the new exec; CreateMetricViewCommand keeps only the v1 session-catalog path in run(). - V2ViewPreparation gains optional viewDependencies / tableType hooks the metric-view subclass populates; plain views leave them None. - Picks up CharVarcharUtils.getRawSchema, SchemaUtils.checkColumn- NameDuplication, SchemaUtils.checkIndeterminateCollationInSchema, and PROP_OWNER stamping for free via the shared trait. Structural TableDependency / FunctionDependency (concern 3, inline 30/169): - Replace single-string `tableFullName` with `String[] nameParts`; arity is preserved per source, unambiguous against quoted identifiers containing literal `.`. Apply same shape to FunctionDependency. - Dependency.table / .function factories take varargs. - MetricViewHelper.collectTableDependencies returns Seq[Seq[String]]; each match arm emits structural parts (V1 sources via TableIdentifier.nameParts; V2 sources via catalog +: namespace :+ name). Sealed Dependency + defensive DependencyList (inline 30 / 40): - `sealed interface Dependency permits TableDependency, FunctionDependency` enforces the documented "is one of" structurally. - DependencyList canonical constructor and accessor defensively clone the array so consumers can't mutate stored ViewInfo dependency state. ViewInfo constructor cleanup (inline 68): - Drop the metric-view-specific PROP_TABLE_TYPE branch in the generic constructor in favor of `properties().putIfAbsent(...)`. Callers that want a more specific kind (e.g. METRIC_VIEW) call BaseBuilder.withTableType(...) before build() -- exercised by CreateV2MetricViewExec via the new V2ViewPreparation tableType hook. VIEW-gate audit (concern 2, inline 1088): - CatalogTable.toJsonLinkedHashMap (interface.scala:670) accepts METRIC_VIEW so DESCRIBE TABLE EXTENDED still emits "View Text" / "View Original Text" / "View Schema Mode" / "View Catalog and Namespace" / "SQL Path" rows for metric views. - HiveExternalCatalog: 4 sites (createTable empty-schema fallback x2, alterTable VIEW path, restoreTableMetadata read-back) accept METRIC_VIEW so a Hive-metastore-backed session-catalog metric view round-trips. - Repo-wide: SessionCatalog.isView, InMemoryCatalog.listViews, RelationResolution.createDataSourceV1Scan (streaming-on-view rejection), Analyzer.lookupTableOrView (ResolvedPersistentView wrapping), rules.saveDataIntoView, DataStreamWriter.writeToV1Table, DescribeRelationJsonCommand.describePartitionInfoJson, AnalyzeColumnCommand, AnalyzePartitionCommand, CommandUtils.analyzeTable, tables.CreateTableLike provider lookup, tables.AlterTableAddColumnsCommand, tables.describeDetailedPartitionInfo, tables.ShowCreateTableCommand (3 sites) -- all extended to accept METRIC_VIEW. Real error class for malformed YAML (concern 4, inline 70): - New INVALID_METRIC_VIEW_YAML error condition (sqlState 42K0L) + QueryCompilationErrors.invalidMetricViewYamlError. Replaces SparkException.internalError so a typo in the user's YAML body surfaces as a user-correctable AnalysisException. Tests (inline 39): - CREATE OR REPLACE VIEW WITH METRICS replaces the existing view. - CREATE VIEW IF NOT EXISTS WITH METRICS is a no-op when the view exists. - CREATE VIEW WITH METRICS over a v2 table at the ident throws EXPECT_VIEW_NOT_TABLE.NO_ALTERNATIVE. - CREATE VIEW IF NOT EXISTS WITH METRICS is a no-op when a v2 table sits at the ident. - V1 session-catalog source dependency: nameParts arity 2-3. - Multi-level V2 namespace source dependency: nameParts arity N+2. - DESCRIBE TABLE EXTENDED on a v2 metric view -- read-back through loadRelation -> MetadataOnlyTable -> V1Table.toCatalogTable(ViewInfo). - Existing dep tests updated to assert nameParts instead of tableFullName. Misc (inline 178): - Scaladoc on MetricView.getProperties calls out that metric_view.from.sql / metric_view.where are truncated to Constants.MAXIMUM_PROPERTY_SIZE and are descriptive (not round-trippable) values; consumers should re-read the YAML body for full SQL.
…e#55487 CI fix: - Restore the missing `"MISSING_CATALOG_ABILITY.VIEWS") {` line in `MetricViewV2CatalogSuite.scala` whose absence broke the test file's parse (compileIncremental "four errors found" -- ')' expected, unmatched '}', unclosed multi-line string, '}' expected at EOF). Self-review feedback: - Shorten `viewDependencies` and `tableType` Scaladocs on `V2ViewPreparation` to one sentence each; drop the plain-view / metric-view distinctions. - Consolidate the duplicated `run()` body from `CreateV2ViewExec` and `CreateV2MetricViewExec` into a new `V2CreateViewPreparation` trait (extends `V2ViewPreparation`) that owns the shared CREATE-side flow: `viewExists` short-circuit, `OR REPLACE` via `createOrReplaceView`, and `ViewAlreadyExistsException` decoding for cross-type collisions. Both subclasses now extend `V2CreateViewPreparation` and inherit `run()` unchanged. The intermediate trait keeps `V2ViewPreparation` (also used by `AlterV2ViewExec`) free of CREATE-only fields. - Drop the defensive `case _: ResolvedIdentifier =>` branch in `CreateMetricViewCommand.run` -- the catch-all internal-error case is sufficient now that `DataSourceV2Strategy` reliably intercepts the non-session path. - `SHOW CREATE TABLE` on a metric view now throws the dedicated `UNSUPPORTED_SHOW_CREATE_TABLE.ON_METRIC_VIEW` (sqlState 0A000) instead of falling through to the VIEW branch. Reverts the three METRIC_VIEW additions in `tables.scala`'s VIEW gates from the previous round. Test reorg + new coverage in `MetricViewV2CatalogSuite`: - Reorganized into 5 sections: (1) Create-related, (2) Dependency extraction, (3) SELECT cases, (4) DESCRIBE cases, (5) DROP / SHOW. - Trimmed the `OR REPLACE` test's `!== firstYaml` assertion; replaced with positive assertions on the captured ViewInfo's queryText, metric_view.where, and viewDependencies (parts shape). - New `SELECT from a v2 metric view returns aggregated rows` -- exercises the `loadRelation` -> `MetadataOnlyTable(ViewInfo)` -> `V1Table.toCatalogTable(ViewInfo)` -> `ResolveMetricView` round-trip and asserts on the aggregated output rows. - New `DESCRIBE TABLE on a v2 metric view returns the aliased columns` (non-EXTENDED variant alongside the existing EXTENDED test). - New `SHOW TABLES does not list v2 metric views` (RelationCatalog contract: tables only). - New `SHOW VIEWS lists v2 metric views`.
… TABLE coverage Add the test cases requested on PR apache#55487: SELECT read-back (section 3) -- 4 new tests modeled on `MetricViewSuite` patterns: - Existing test fixed to use `measure(count_sum)` (the v1 suite shows the required syntax for measure columns) and switched to `checkAnswer` against an equivalent raw aggregation over the source. - `SELECT measure(...) with a WHERE clause on a dimension` -- exercises a filter at the query layer. - `SELECT against a v2 metric view honors the view's pre-defined where clause` -- creates the metric view with `where = Some("count > 1")`, asserts only the matching rows surface. - `SELECT from a v2 metric view supports multiple measures with different aggregations` -- adds price_sum / price_max alongside count_sum. - `SELECT from a v2 metric view supports ORDER BY and LIMIT on measures`. DROP TABLE on a v2 metric view (section 5) -- 2 new tests: - `DROP TABLE on a v2 metric view throws EXPECT_TABLE_NOT_VIEW` -- `DropTableExec`'s `tableExists` probe returns false (RelationCatalog passive filtering), the `viewExists` fallback returns true, so the exec emits `EXPECT_TABLE_NOT_VIEW.NO_ALTERNATIVE`. Also asserts the metric view is still present in the catalog after the failed DROP. - `DROP TABLE IF EXISTS on a v2 metric view also throws EXPECT_TABLE_NOT_VIEW` -- IF EXISTS does not silence the wrong-type error (v1 parity). SHOW CREATE TABLE on a v2 metric view (section 5) -- 1 new test: - `SHOW CREATE TABLE on a v2 metric view is unsupported` -- routes through `DataSourceV2Strategy`'s `ResolvedPersistentView` case and throws `UNSUPPORTED_FEATURE.TABLE_OPERATION` with operation "SHOW CREATE TABLE".
Add support for CREATE VIEW ... WITH METRICS and the corresponding V2 catalog round-trip on top of the ViewCatalog / RelationCatalog APIs introduced by SPARK-52729. The V2 metric-view create path now builds a ViewInfo and calls ViewCatalog.createView. Metric views are distinguished from plain views with PROP_TABLE_TYPE=METRIC_VIEW, which ViewInfo now preserves when callers intentionally set it. This avoids mutating ViewInfo properties after construction while keeping plain views defaulted to VIEW. ViewInfo gains a typed viewDependencies field so catalogs can persist structured table/function dependencies without flattening nested lineage into string properties. The metric-view planner collects direct table dependencies from the analyzed source plan and passes them through this field. Also add first-class CatalogTableType.METRIC_VIEW / TableSummary.METRIC_VIEW_TABLE_TYPE support, V2-to-V1 conversion for MetricView ViewInfo rows, drop-command parity for metric views, metric_view.* descriptive properties, and focused V1/V2 tests for dependency extraction, ViewInfo payloads, DROP VIEW routing, and user-specified column metadata preservation.
…eedback Address all 4 substantive concerns and 8 inline comments from cloud-fan's review on PR apache#55487. V2 path refactor (concern 1, inline 87/111/142): - New CreateV2MetricViewExec extends V2ViewPreparation (mirrors CreateV2ViewExec): viewExists short-circuit on IF NOT EXISTS, OR REPLACE via createOrReplaceView, and cross-type collision decoding via ViewAlreadyExistsException -> tableExists -> EXPECT_VIEW_NOT_TABLE. - DataSourceV2Strategy rule routes CreateMetricViewCommand on a non-session catalog to the new exec; CreateMetricViewCommand keeps only the v1 session-catalog path in run(). - V2ViewPreparation gains optional viewDependencies / tableType hooks the metric-view subclass populates; plain views leave them None. - Picks up CharVarcharUtils.getRawSchema, SchemaUtils.checkColumn- NameDuplication, SchemaUtils.checkIndeterminateCollationInSchema, and PROP_OWNER stamping for free via the shared trait. Structural TableDependency / FunctionDependency (concern 3, inline 30/169): - Replace single-string `tableFullName` with `String[] nameParts`; arity is preserved per source, unambiguous against quoted identifiers containing literal `.`. Apply same shape to FunctionDependency. - Dependency.table / .function factories take varargs. - MetricViewHelper.collectTableDependencies returns Seq[Seq[String]]; each match arm emits structural parts (V1 sources via TableIdentifier.nameParts; V2 sources via catalog +: namespace :+ name). Sealed Dependency + defensive DependencyList (inline 30 / 40): - `sealed interface Dependency permits TableDependency, FunctionDependency` enforces the documented "is one of" structurally. - DependencyList canonical constructor and accessor defensively clone the array so consumers can't mutate stored ViewInfo dependency state. ViewInfo constructor cleanup (inline 68): - Drop the metric-view-specific PROP_TABLE_TYPE branch in the generic constructor in favor of `properties().putIfAbsent(...)`. Callers that want a more specific kind (e.g. METRIC_VIEW) call BaseBuilder.withTableType(...) before build() -- exercised by CreateV2MetricViewExec via the new V2ViewPreparation tableType hook. VIEW-gate audit (concern 2, inline 1088): - CatalogTable.toJsonLinkedHashMap (interface.scala:670) accepts METRIC_VIEW so DESCRIBE TABLE EXTENDED still emits "View Text" / "View Original Text" / "View Schema Mode" / "View Catalog and Namespace" / "SQL Path" rows for metric views. - HiveExternalCatalog: 4 sites (createTable empty-schema fallback x2, alterTable VIEW path, restoreTableMetadata read-back) accept METRIC_VIEW so a Hive-metastore-backed session-catalog metric view round-trips. - Repo-wide: SessionCatalog.isView, InMemoryCatalog.listViews, RelationResolution.createDataSourceV1Scan (streaming-on-view rejection), Analyzer.lookupTableOrView (ResolvedPersistentView wrapping), rules.saveDataIntoView, DataStreamWriter.writeToV1Table, DescribeRelationJsonCommand.describePartitionInfoJson, AnalyzeColumnCommand, AnalyzePartitionCommand, CommandUtils.analyzeTable, tables.CreateTableLike provider lookup, tables.AlterTableAddColumnsCommand, tables.describeDetailedPartitionInfo, tables.ShowCreateTableCommand (3 sites) -- all extended to accept METRIC_VIEW. Real error class for malformed YAML (concern 4, inline 70): - New INVALID_METRIC_VIEW_YAML error condition (sqlState 42K0L) + QueryCompilationErrors.invalidMetricViewYamlError. Replaces SparkException.internalError so a typo in the user's YAML body surfaces as a user-correctable AnalysisException. Tests (inline 39): - CREATE OR REPLACE VIEW WITH METRICS replaces the existing view. - CREATE VIEW IF NOT EXISTS WITH METRICS is a no-op when the view exists. - CREATE VIEW WITH METRICS over a v2 table at the ident throws EXPECT_VIEW_NOT_TABLE.NO_ALTERNATIVE. - CREATE VIEW IF NOT EXISTS WITH METRICS is a no-op when a v2 table sits at the ident. - V1 session-catalog source dependency: nameParts arity 2-3. - Multi-level V2 namespace source dependency: nameParts arity N+2. - DESCRIBE TABLE EXTENDED on a v2 metric view -- read-back through loadRelation -> MetadataOnlyTable -> V1Table.toCatalogTable(ViewInfo). - Existing dep tests updated to assert nameParts instead of tableFullName. Misc (inline 178): - Scaladoc on MetricView.getProperties calls out that metric_view.from.sql / metric_view.where are truncated to Constants.MAXIMUM_PROPERTY_SIZE and are descriptive (not round-trippable) values; consumers should re-read the YAML body for full SQL.
…e#55487 CI fix: - Restore the missing `"MISSING_CATALOG_ABILITY.VIEWS") {` line in `MetricViewV2CatalogSuite.scala` whose absence broke the test file's parse (compileIncremental "four errors found" -- ')' expected, unmatched '}', unclosed multi-line string, '}' expected at EOF). Self-review feedback: - Shorten `viewDependencies` and `tableType` Scaladocs on `V2ViewPreparation` to one sentence each; drop the plain-view / metric-view distinctions. - Consolidate the duplicated `run()` body from `CreateV2ViewExec` and `CreateV2MetricViewExec` into a new `V2CreateViewPreparation` trait (extends `V2ViewPreparation`) that owns the shared CREATE-side flow: `viewExists` short-circuit, `OR REPLACE` via `createOrReplaceView`, and `ViewAlreadyExistsException` decoding for cross-type collisions. Both subclasses now extend `V2CreateViewPreparation` and inherit `run()` unchanged. The intermediate trait keeps `V2ViewPreparation` (also used by `AlterV2ViewExec`) free of CREATE-only fields. - Drop the defensive `case _: ResolvedIdentifier =>` branch in `CreateMetricViewCommand.run` -- the catch-all internal-error case is sufficient now that `DataSourceV2Strategy` reliably intercepts the non-session path. - `SHOW CREATE TABLE` on a metric view now throws the dedicated `UNSUPPORTED_SHOW_CREATE_TABLE.ON_METRIC_VIEW` (sqlState 0A000) instead of falling through to the VIEW branch. Reverts the three METRIC_VIEW additions in `tables.scala`'s VIEW gates from the previous round. Test reorg + new coverage in `MetricViewV2CatalogSuite`: - Reorganized into 5 sections: (1) Create-related, (2) Dependency extraction, (3) SELECT cases, (4) DESCRIBE cases, (5) DROP / SHOW. - Trimmed the `OR REPLACE` test's `!== firstYaml` assertion; replaced with positive assertions on the captured ViewInfo's queryText, metric_view.where, and viewDependencies (parts shape). - New `SELECT from a v2 metric view returns aggregated rows` -- exercises the `loadRelation` -> `MetadataOnlyTable(ViewInfo)` -> `V1Table.toCatalogTable(ViewInfo)` -> `ResolveMetricView` round-trip and asserts on the aggregated output rows. - New `DESCRIBE TABLE on a v2 metric view returns the aliased columns` (non-EXTENDED variant alongside the existing EXTENDED test). - New `SHOW TABLES does not list v2 metric views` (RelationCatalog contract: tables only). - New `SHOW VIEWS lists v2 metric views`.
… TABLE coverage Add the test cases requested on PR apache#55487: SELECT read-back (section 3) -- 4 new tests modeled on `MetricViewSuite` patterns: - Existing test fixed to use `measure(count_sum)` (the v1 suite shows the required syntax for measure columns) and switched to `checkAnswer` against an equivalent raw aggregation over the source. - `SELECT measure(...) with a WHERE clause on a dimension` -- exercises a filter at the query layer. - `SELECT against a v2 metric view honors the view's pre-defined where clause` -- creates the metric view with `where = Some("count > 1")`, asserts only the matching rows surface. - `SELECT from a v2 metric view supports multiple measures with different aggregations` -- adds price_sum / price_max alongside count_sum. - `SELECT from a v2 metric view supports ORDER BY and LIMIT on measures`. DROP TABLE on a v2 metric view (section 5) -- 2 new tests: - `DROP TABLE on a v2 metric view throws EXPECT_TABLE_NOT_VIEW` -- `DropTableExec`'s `tableExists` probe returns false (RelationCatalog passive filtering), the `viewExists` fallback returns true, so the exec emits `EXPECT_TABLE_NOT_VIEW.NO_ALTERNATIVE`. Also asserts the metric view is still present in the catalog after the failed DROP. - `DROP TABLE IF EXISTS on a v2 metric view also throws EXPECT_TABLE_NOT_VIEW` -- IF EXISTS does not silence the wrong-type error (v1 parity). SHOW CREATE TABLE on a v2 metric view (section 5) -- 1 new test: - `SHOW CREATE TABLE on a v2 metric view is unsupported` -- routes through `DataSourceV2Strategy`'s `ResolvedPersistentView` case and throws `UNSUPPORTED_FEATURE.TABLE_OPERATION` with operation "SHOW CREATE TABLE".
…ursion, scalastyle Address four CI failures introduced by the v2 metric-view changes on this PR: 1. `SparkOperation.tableTypeString` (hive-thriftserver) did not know how to map the new `CatalogTableType.METRIC_VIEW` to a Hive JDBC table type, causing `SparkMetadataOperationSuite` `GET_TABLE_TYPES` to fail with "Unknown table type is found: CatalogTableType(METRIC_VIEW)". Map `METRIC_VIEW` to `"VIEW"` on the JDBC wire (Hive only knows `TABLE` / `VIEW`). 2. `HiveClientImpl.toHiveTableType` (hive) had the same gap, breaking 19 `HiveMetricViewSuite` cases that go through HMS persistence on the V1 session-catalog path. Map `METRIC_VIEW` to `HiveTableType.VIRTUAL_VIEW`. 3. `MetricViewRecordingCatalog.loadRelation` in `MetricViewV2CatalogSuite` recursed infinitely. `class MetricViewRecordingCatalog extends InMemoryTableCatalog with RelationCatalog` linearizes `RelationCatalog` ahead of `InMemoryTableCatalog`, so `super.loadTable` resolved to `RelationCatalog.loadTable`, whose default delegates back to `loadRelation`. Qualify the super call to `super[InMemoryTableCatalog]` to dispatch directly to the table-catalog implementation. 4. Scalastyle import-order violation in `DataSourceV2Strategy.scala`: `metricview.serde.MetricViewFactory` was placed before `execution.datasources.*`. Reorder to satisfy the Scalastyle import-order rule. Signed-off-by: chen.wang <chen.wang@databricks.com>
b7d086f to
269056a
Compare
…ble / loadTableOrView Adapt `MetricViewV2CatalogSuite` to the renames in SPARK-56655, which closed out the still-`Evolving` v2 view API surface introduced by SPARK-52729: - `RelationCatalog` -> `TableViewCatalog` - `MetadataOnlyTable` -> `MetadataTable` - `loadRelation` -> `loadTableOrView` - `ViewCatalog` gained a new abstract `renameView(oldIdent, newIdent)` Updates: - Imports: drop `MetadataOnlyTable`, `RelationCatalog`; add `MetadataTable`, `TableViewCatalog`. - `MetricViewRecordingCatalog`: mix in `TableViewCatalog` instead of `RelationCatalog`; rename the override of `loadRelation` -> `loadTableOrView`, and `MetadataOnlyTable` -> `MetadataTable`. The qualified `super[InMemoryTableCatalog].loadTable(ident)` recursion-bypass continues to work (the default `loadTable` derived from `TableViewCatalog` still delegates to `loadTableOrView`, hence the bypass is still required). - Add a `renameView(oldIdent, newIdent)` implementation: moves the `views` map entry, the `MetricViewRecordingCatalog.capturedViews` entry, and rejects via `ViewAlreadyExistsException` / `NoSuchViewException` per the `TableViewCatalog` contract. - Update doc comments and a few test names that mentioned the old API. Signed-off-by: chen.wang <chen.wang@databricks.com>
…-trip
Five fixes to recover the metric-view test suites after the latest rebase
on apache/spark master:
1. `MetricViewV2CatalogSuite.withTestCatalogTables`: a few negative tests
pre-create a regular table at `mv` to exercise cross-type collisions,
but the previous cleanup only did `DROP VIEW IF EXISTS mv`, leaving the
table behind to poison every subsequent test in the suite (cleanup
raised `WRONG_COMMAND_FOR_OBJECT_TYPE` because mv was a TABLE, not a
VIEW). Sweep `mv` as both view and table in cleanup, wrapped in `Try`
so the symmetric "wrong command for type" errors don't escape.
2. Test assertions on `info.queryText() === yaml`: the captured queryText
includes the `\n` between the SQL `$$` markers and the YAML body, but
`MetricViewFactory.toYAML(...)` doesn't emit surrounding newlines. Trim
both sides before the equality check (3 sites).
3. `MISSING_CATALOG_ABILITY` assertion: SPARK-56655 added the `.VIEWS`
subclass; the bare error condition no longer surfaces directly. Update
the assertion to expect `MISSING_CATALOG_ABILITY.VIEWS`.
4. `MetricViewRecordingCatalog.{createView, replaceView}`: the
`TableViewCatalog` active-rejection contract requires `createView` to
throw `ViewAlreadyExistsException` (and `replaceView` to throw
`NoSuchViewException`) when a *table* sits at the ident, not just when
a view does. The previous override only checked `views`, so the
pre-table-then-create-view test never raised the cross-type collision
`CreateV2MetricViewExec` is supposed to decode into
`EXPECT_VIEW_NOT_TABLE.NO_ALTERNATIVE`. Add a tables-only existence
probe (bypassing `TableViewCatalog.tableExists`'s default, which would
recurse through our own `loadTableOrView`) and reject up front.
5. `HiveExternalCatalog` HMS round-trip for metric views: `HiveTableType`
has no `METRIC_VIEW` variant -- both regular views and metric views
serialize as `VIRTUAL_VIEW`, and `HiveClientImpl.getTableOption` always
maps `VIRTUAL_VIEW` back to `CatalogTableType.VIEW`, dropping the
metric-view distinction on read. Result: `HiveMetricViewSuite` (which
reuses the same v1 `MetricViewSuite` against
`HiveExternalCatalog`) saw the YAML body parsed as SQL, raising
`PARSE_SYNTAX_ERROR` on every test.
Persist a `view.subType = METRIC_VIEW` property on write
(`tableMetaToTableProps`) and lift `tableType` back to `METRIC_VIEW`
on read (`restoreTableMetadata`) when the marker is present. Property
name follows the existing `view.*` convention so it round-trips
through HMS like other view metadata.
Also factored the `metric_view.*` column metadata preservation across
user-specified renames into a hookable `retainColumnMetadata: Boolean`
on `V2ViewPreparation` (defaults to `false`); `CreateV2MetricViewExec`
overrides it to `true`. Mirrors what
`ViewHelper.prepareTable(isMetricView = true)` already does on the v1
path, fixing the `key not found: metric_view.type` failure on the v2
"user-specified column names with comments preserve metric_view.*
metadata" test.
Signed-off-by: chen.wang <chen.wang@databricks.com>
…tructField Add three small public APIs that let v2 catalog connectors (such as the Unity Catalog Spark connector built on TableViewCatalog) round-trip a Spark StructField through external storage without reaching for private[sql] helpers or the singleton-StructType wrap workaround: - StructField.json / StructField.prettyJson: public counterparts of DataType.json / DataType.prettyJson, exposing the existing private[sql] jsonValue. - StructField.fromJson(String): companion-object parser that mirrors DataType.fromJson and is the inverse of StructField.json. - Column.fromStructField(StructField): static factory in the catalog Column interface that maps a Spark StructField (with metadata) into a connector Column (with metadataInJSON), symmetric to TableInfo.schema() which already goes the other way via CatalogV2Util.v2ColumnsToStructType. Widens DataType.parseStructField from private to private[sql] so the new StructField.fromJson companion can call it directly. Tests: - DataTypeSuite covers StructField round-trip via .json / .fromJson with metric_view-style metadata + comment, and an empty-metadata field. - CatalogV2UtilSuite covers Column.fromStructField for both the metadata-bearing and empty-metadata cases. Signed-off-by: chen.wang <chen.wang@databricks.com>
cloud-fan
left a comment
There was a problem hiding this comment.
Round 2 review.
Status: 13 prior findings addressed (4 main concerns + 9 inline), 0 remaining, 4 new (3 newly introduced in commits since the prior review at 8f62efc6, 1 late catch). The shared V2ViewPreparation / V2CreateViewPreparation refactor reads cleanly, the typed Dependency / DependencyList API replaces the dot-joined-string form, the INVALID_METRIC_VIEW_YAML error is in place, and the tableType == VIEW audit covers most sites. Four follow-ups inline.
| alternative = "DROP VIEW" | ||
| ) | ||
| case o if o != CatalogTableType.VIEW && isView => | ||
| case o if o != CatalogTableType.VIEW && o != CatalogTableType.METRIC_VIEW && isView => |
There was a problem hiding this comment.
verifyAlterTableType at line 1086 of this file was missed in the audit. With tableType == METRIC_VIEW:
ALTER TABLE <metric_view> RENAME TO ...falls through both arms (line 1092case CatalogTableType.VIEW if !isViewand line 1096case o if o != CatalogTableType.VIEW && isView) and silently renames the metric view as if it were a regular table — regression vs. the pre-PRviewWithMetricsstorage where the same call threwcannotAlterViewWithAlterTableError.ALTER VIEW <metric_view> RENAME TO ...matchescase o if o != CatalogTableType.VIEW && isViewand throwscannotAlterTableWithAlterViewError("cannot alter table with alter view") — wrong category for a view-kind.
Same fix pattern as the DROP audit on this hunk: widen both cases at line 1092 / 1096 to also accept METRIC_VIEW.
| * | ||
| * @since 4.2.0 | ||
| */ | ||
| def json: String = compact(render(jsonValue)) |
There was a problem hiding this comment.
The most recent commit (d55143f5b6184d2638e7fde4dae1fc29040b9509) adds three new public APIs (StructField.json, StructField.fromJson, Column.fromStructField) and widens DataType.parseStructField from private to private[sql]. None of them are called by code in this PR — verified by grep, the only call sites are the tests added in the same commit. Per the commit message they exist to support a downstream Unity Catalog connector.
Public API additions to org.apache.spark.sql.types and org.apache.spark.sql.connector.catalog warrant a dedicated PR with its own JIRA ticket so they can be reviewed independently of metric-view support. Could you split this commit out?
| String comment = field.getComment().isDefined() ? field.getComment().get() : null; | ||
| String metadataJson = field.metadata().equals(Metadata.empty()) | ||
| ? null | ||
| : field.metadata().json(); |
There was a problem hiding this comment.
This diverges from the canonical inverse CatalogV2Util.structFieldToV2Column (CatalogV2Util.scala:762), which strips the "comment" key from f.metadata().json() before stamping it as metadataInJSON — because the comment is exposed separately via Column.comment().
Here, when field was built with withComment(...), the "comment" key sits inside field.metadata() and gets written into metadataInJSON as well, while also being lifted into comment. Net effect: the comment is duplicated. Round-trip via v2ColumnToStructField happens to recover, but any downstream consumer reading metadataInJSON will see an unexpected "comment" entry.
Mirror structFieldToV2Column and strip "comment" before serializing — using a MetadataBuilder to drop the key, then null out the result if it becomes empty. That also matches the existing function's behavior of returning null metadataInJSON for fields whose only metadata was the comment.
| // a SQL string, so it can't ride along as a regular `query` `LogicalPlan` field on the | ||
| // logical command the way `CreateView` does. | ||
| val analyzed = MetricViewHelper.analyzeMetricViewText( | ||
| session, ident.asTableIdentifier, originalText) |
There was a problem hiding this comment.
ident.asTableIdentifier throws requiresSinglePartNamespaceError whenever ident has a multi-level namespace. Plain v2 views (CreateV2ViewExec, this file:324) carry Identifier directly and support multi-level namespace targets, but v2 metric views fail here because MetricViewHelper.analyzeMetricViewText requires a TableIdentifier.
The dependency-extraction multi-level test only exercises the source (testcat.ns_a.ns_b.events_deep); the metric view's own ident is always single-level (testcat.ns.mv) — a CREATE at cat.db1.db2.mv would fail here. analyzeMetricViewText only uses the name to (a) build a synthetic CatalogTable for analysis context and (b) feed name.nameParts into verifyTemporaryObjectsNotExists, both of which could take Seq[String] directly.
Either lift the helper to multi-part nameParts or call out the single-level-namespace constraint as a known limitation of the v2 metric-view path.
…pache#55487 Address the four new findings from cloud-fan's round 2 review of PR apache#55487 plus the corresponding CI fixes. Combines the revert of the StructField/Column public-API commit (per cloud-fan's split request) with the three remaining metric-view fixes into one commit. Cloud-fan finding #1 (`verifyAlterTableType` audit miss): * `ddl.scala` `verifyAlterTableType` (lines 1086-1103) was missed in the `tableType == VIEW` -> `VIEW || METRIC_VIEW` audit. Without this fix, `ALTER TABLE <metric_view> RENAME TO ...` silently succeeds (renames as if a regular table), and `ALTER VIEW <metric_view> RENAME TO ...` throws `cannotAlterTableWithAlterViewError` (wrong category for a view- kind). Widen both arms to also accept `METRIC_VIEW`. Cloud-fan finding apache#2/apache#3 (StructField commit split-out): * Revert d55143f ("Public StructField.json / .fromJson + Column.fromStructField"). None of the three new public APIs are called by code in this PR -- they exist to support the downstream Unity Catalog connector. Per cloud-fan's request, split them out into a separate PR with its own JIRA ticket so they can be reviewed independently. Reverting the commit also removes the `Column.fromStructField` divergence from `structFieldToV2Column` (which would have duplicated the comment field in `metadataInJSON`); the split-out PR fixes that inline. The revert also fixes two CI failures introduced by the same commit: - `mimaReportBinaryIssues` losing `StructField.tupled` / `StructField.curried` / `AbstractFunction4` parent on the `StructField` companion object (sql-api 4.0.0 binary surface). - `Javaunidoc` failing on the `{@link StructField.json}` / `{@link StructField.fromJson}` references that javadoc rejects (member separator must be `#`, not `.`). Cloud-fan finding apache#4 (multi-level namespace lift): * `MetricViewHelper.analyzeMetricViewText` previously required a `TableIdentifier`, which can only carry up to 3 name parts. v2 metric views with multi-level-namespace targets (e.g. `cat.db1.db2.mv`) failed at `ident.asTableIdentifier` with `requiresSinglePartNamespaceError`. Lift the helper to take `nameParts: Seq[String]` directly. The synthetic `CatalogTable` used as analysis context still needs a TableIdentifier (it's a v1 type); we collapse the intermediate namespace components into the synthetic `database` slot via dot-join. Since the synthetic identifier is never used to resolve the view body itself, this collapse is observationally invisible to the analyzed plan, and `verifyTemporaryObjectsNotExists` still receives the full `nameParts` so error rendering preserves the multi-part form. Update both call sites: - `CreateMetricViewCommand.createMetricViewInSessionCatalog` (v1) passes `name.nameParts`. - `DataSourceV2Strategy` (v2) passes `(catalog.name() +: ident.namespace() :+ ident.name())`. Add a new test in `MetricViewV2CatalogSuite` that creates a metric view at `testcat.ns_a.ns_b.mv_deep` and asserts the create succeeds + the captured `ViewInfo` carries `PROP_TABLE_TYPE = METRIC_VIEW`. Test realignment to SPARK-56655's new contract: * SPARK-56655 (master) added several v2 view DDL behaviors that changed the error categories surfaced for cross-type collisions and added view-aware listing on `TableViewCatalog`. Realign 5 existing test assertions in `MetricViewV2CatalogSuite`: - "CREATE VIEW ... WITH METRICS over a v2 table at the ident": `EXPECT_VIEW_NOT_TABLE.NO_ALTERNATIVE` -> `TABLE_OR_VIEW_ ALREADY_EXISTS` (master's analyzer-time pre-check fires before the exec-time decoding in `CreateV2MetricViewExec.run`). - "DROP TABLE on a v2 metric view" + "DROP TABLE IF EXISTS on a v2 metric view": `EXPECT_TABLE_NOT_VIEW.NO_ALTERNATIVE` -> `WRONG_COMMAND_FOR_OBJECT_TYPE` ("Use DROP VIEW instead", raised by master's new `DropTableExec.run`). - "SHOW TABLES does not list v2 metric views" -> "SHOW TABLES on a v2 TableViewCatalog lists both tables and metric views": SPARK-56655 routes SHOW TABLES on a `TableViewCatalog` through `listRelationSummaries` so views appear alongside tables in the output (matches v1 SHOW TABLES on a session catalog). - DESCRIBE TABLE EXTENDED `View Text` row: trim before comparing to the pre-substitution YAML body (the row carries the raw text between the SQL `$$ ... $$` markers, including the surrounding `\n`). * Add an explicit `SHOW CREATE TABLE` rejection guard for v2 metric views in `DataSourceV2Strategy`. Master's new `ShowCreateV2ViewExec` would emit a plain `CREATE VIEW <ident> AS <yaml>`, which is not round-trippable as metric-view DDL (the right form is `CREATE VIEW <ident> WITH METRICS LANGUAGE YAML AS $$ <yaml> $$`). Reject up front with `unsupportedTableOperationError("SHOW CREATE TABLE")` rather than emit lossy DDL; the existing test already expects this exact behavior. Signed-off-by: chen.wang <chen.wang@databricks.com>
What changes were proposed in this pull request?
This PR extends metric-view support to DS v2 catalogs by routing
CREATE VIEW ... WITH METRICSthrough theViewCatalog/RelationCatalogAPIs introduced by SPARK-52729. Third-party v2 catalogs that implementViewCatalogcan now host metric views with the same metadata fidelity as session-catalog metric views.1. V2 metric-view CREATE path -- shared with
CreateV2ViewExec. A newCreateV2MetricViewExecandCreateV2ViewExecboth extend a newV2CreateViewPreparationtrait (which itself extendsV2ViewPreparation). The trait owns the shared CREATE-siderun():viewExistsshort-circuit onIF NOT EXISTS,createOrReplaceViewforOR REPLACE, and cross-type collision decoding (ViewAlreadyExistsException->tableExists->EXPECT_VIEW_NOT_TABLE.NO_ALTERNATIVE). The metric-view subclass only supplies the metric-view-specific bits (no collation, schema-modeUNSUPPORTED, typedviewDependencies,PROP_TABLE_TYPE = METRIC_VIEW) via two optional hooks onV2ViewPreparation.DataSourceV2StrategyinterceptsCreateMetricViewCommandon a non-session catalog and routes to the new exec; the v1 session-catalog path stays inCreateMetricViewCommand.run.2. First-class
METRIC_VIEWtable type.CatalogTableType.METRIC_VIEWis added alongsideEXTERNAL/MANAGED/VIEW.TableSummary.METRIC_VIEW_TABLE_TYPE = "METRIC_VIEW"constant for the V2 surface.view.viewWithMetricsproperty hack is removed;CatalogTable.isMetricViewcheckstableType == METRIC_VIEWdirectly.V1Table.summarizeTableTypeandV1Table.toCatalogTable(catalog, ident, ViewInfo)translate between the V2 property form and the V1 enum.3. Repo-wide
tableType == VIEWaudit. Promoting metric views to a distinctCatalogTableTypeopens silent regressions wherever existing code branches onVIEW. 14 sites updated to acceptMETRIC_VIEWalongsideVIEW:CatalogTable.toJsonLinkedHashMap(DESCRIBE EXTENDED rows),HiveExternalCatalog.{createTable, alterTable, restoreTableMetadata}(4 sites for Hive-metastore-backed metric views).SessionCatalog.isView,InMemoryCatalog.listViews,RelationResolution(streaming-on-view rejection),Analyzer.lookupTableOrView(ResolvedPersistentViewwrapping),rules.scala(saveDataIntoView),DataStreamWriter(streaming write),DescribeRelationJsonCommand,AnalyzeColumnCommand,AnalyzePartitionCommand,CommandUtils.analyzeTable, and 3 sites intables.scala(CREATE TABLE LIKEprovider lookup,ALTER TABLE ADD COLUMNS,DESCRIBE PARTITION).SHOW CREATE TABLEon a metric view (no round-trippableCREATE VIEW ... WITH METRICSform yet) throws the dedicatedUNSUPPORTED_SHOW_CREATE_TABLE.ON_METRIC_VIEWerror rather than falling through to the VIEW branch and emitting a misleadingCREATE VIEW ...that loses the metric-view kind.4. Drop-command parity.
DropTableCommand(v1 path) treats bothVIEWandMETRIC_VIEWas views:DROP TABLErejects either withwrongCommandForObjectTypeError, andDROP VIEWaccepts either.V2SessionCatalog.dropTableInternalextends the existing "view rejected fromDROP TABLE" guard to coverMETRIC_VIEW.DropTableExec's existingviewExistsfallback already throwsEXPECT_TABLE_NOT_VIEW.NO_ALTERNATIVEwhen a view sits at aDROP TABLEtarget -- this works unchanged for metric views sinceRelationCatalog's defaultviewExistsderives fromloadRelationand recognizesMetadataOnlyTable + ViewInfo.ResolveSessionCatalog'sDropViewrouting comment is clarified: v2 metric views fall through toDataSourceV2StrategyandViewCatalog.dropView.5. Typed view dependencies (
ViewInfo.viewDependencies).org.apache.spark.sql.connector.catalog:Dependency(sealed interface withDependency.table(String...)/Dependency.function(String...)factories),TableDependency,FunctionDependency,DependencyList(Dependency[]).TableDependencyandFunctionDependencycarry the dependency identifier as structural multi-part name parts (record TableDependency(String[] nameParts)), not a single dot-flattened string. Arity is preserved per source so multi-level-namespace V2 catalogs (e.g. Icebergcat.db1.db2.tbl-> 4 parts) and V1 sources without a catalog component (db.tbl-> 2 parts) round-trip without ambiguity against quoted identifiers containing literal..DependencyList's canonical constructor and accessor defensivelyclone()the array so consumers can't mutate storedViewInfodependency state.ViewInfogains aviewDependenciesfield and aViewInfo.Builder.withViewDependencies(...)setter.MetricViewHelper.collectTableDependencieswalks the analyzed plan and emits structuralSeq[Seq[String]]parts (V1 sources viaTableIdentifier.nameParts; V2 sources via(catalog.name +: namespace) :+ name).6.
metric_view.*descriptor properties.MetricView.getPropertiesproduces canonical descriptive properties (metric_view.from.type,metric_view.from.name/metric_view.from.sql,metric_view.where) that the create path merges into the view's properties bag, so catalog browsers and tooling can inspect the source/filter without re-parsing the YAML body. Long values are truncated toConstants.MAXIMUM_PROPERTY_SIZE; the Scaladoc ongetPropertiescalls out thatmetric_view.from.sqlis therefore a descriptive value, not a round-trippable representation -- consumers should re-read the YAML body for the full SQL.7.
ViewInfoconstructor cleanup. The metric-view-specificPROP_TABLE_TYPE = METRIC_VIEWspecial case is dropped from the genericViewInfoconstructor in favor ofproperties().putIfAbsent(...). Callers that want a more specific kind (e.g.METRIC_VIEW) callBaseBuilder.withTableType(...)beforebuild()-- exercised byCreateV2MetricViewExecvia the newV2ViewPreparation.tableTypehook.8.
ViewHelper.aliasPlan(retainMetadata). The user-specified-column-with-comment branch inaliasPlanpreviously dropped existing column metadata. A newretainMetadata: Boolean = falseparameter merges the analyzed attribute's metadata into the new comment metadata.ViewHelper.prepareTablepassesretainMetadata = isMetricView, preserving the per-columnmetric_view.type/metric_view.exprkeys that the analyzer attaches to dimensions and measures even when the user renames columns and adds comments.9. Error classes.
INVALID_METRIC_VIEW_YAML(sqlState 42K0L).MetricViewPlanner.parseYAML's catch blocks now route throughQueryCompilationErrors.invalidMetricViewYamlErrorinstead ofSparkException.internalError, so a typo in the user's YAML body surfaces as a user-correctableAnalysisExceptionrather than "please contact support".UNSUPPORTED_SHOW_CREATE_TABLE.ON_METRIC_VIEW(sqlState 0A000) for the v1 session-catalog path.10. Misc.
MetricViewCanonical.parseSourceaccepts multipart identifiers (parseMultipartIdentifier) so 3-partcatalog.schema.tablesource references work asAssetSource.Why are the changes needed?
Before this PR, metric-view DDL only worked against the session catalog: the create path called
SessionCatalog.createTabledirectly, and there was no way for a third-party v2 catalog (Unity Catalog, Hive Metastore catalog, custom REST catalogs, etc.) to own a metric view's lifecycle. SPARK-52729 (#51419) shippedViewCatalogandRelationCatalogas the public v2 surface for catalog-managed views; metric views are a kind of view and naturally belong on this surface.Once metric views can live on a v2 catalog, two more constraints surface:
ViewCatalog.loadViewneeds to know it's a metric view, not a plain SQL view, so it can render the right UI / planner output. Encoding this inPROP_TABLE_TYPE = METRIC_VIEWkeeps the distinction wire-compatible and letsV1Table.toCatalogTablereconstructCatalogTableType.METRIC_VIEWon the read path.DependencyListofTableDependency/FunctionDependencywith structuralString[] namePartslets catalogs persist the lineage as a first-class field with full fidelity.The remaining changes (drop-command parity,
aliasPlanmetadata retention,metric_view.*properties,parseMultipartIdentifier,tableType == VIEWaudit) are mechanical follow-ups that fall out of supporting metric views as a realCatalogTableTypeand as a v2 catalog citizen -- without them, basic operations likeDROP VIEW,DESCRIBE TABLE EXTENDED, orCREATE VIEW (a COMMENT 'c') WITH METRICS ...would silently degrade.Does this PR introduce any user-facing change?
Yes, both for end users and for catalog plugin developers:
End users:
CREATE VIEW <ident> WITH METRICS ...now works against any v2 catalog that implementsViewCatalog. Previously it was rejected withMISSING_CATALOG_ABILITY.VIEWSfor non-session catalogs.IF NOT EXISTSandOR REPLACEare honored on the v2 path (regression vs. v1 fixed).SELECT region, measure(count_sum) FROM <mv> ..., dropped withDROP VIEW, listed viaSHOW VIEWS, and described withDESCRIBE TABLE/DESCRIBE TABLE EXTENDED.DROP TABLEon a metric view throwsEXPECT_TABLE_NOT_VIEW.SHOW TABLESdoes not list metric views (perRelationCatalogcontract).SHOW CREATE TABLEon a metric view throws an explicit unsupported-feature error (no round-trippable form yet).CatalogTableType.METRIC_VIEWinstead ofCatalogTableType.VIEW + view.viewWithMetrics=true. Observable inDESCRIBE TABLE EXTENDED'sTyperow and thetableTypecolumn of thetablessystem table. SQL behavior is unchanged.DROP TABLE/DROP VIEWmismatch now mentionMETRIC_VIEWalongsideVIEW.INVALID_METRIC_VIEW_YAML(user-correctable) instead of "Spark internal error, please contact support".Catalog plugin developers:
org.apache.spark.sql.connector.catalog: sealed interfaceDependencywithpermits TableDependency, FunctionDependency, both records carryingString[] nameParts.DependencyList(Dependency[])with defensive copy. All@Evolving,@since 4.2.0.ViewInfogains a typedviewDependencies()accessor andViewInfo.Builder.withViewDependencies(...)setter.TableSummary.METRIC_VIEW_TABLE_TYPE = "METRIC_VIEW"constant.CatalogTableType.METRIC_VIEWenum value (v1 surface).V2ViewPreparation(private toorg.apache.spark.sql.execution.datasources.v2) gains optionalviewDependencies/tableTypehooks.How was this patch tested?
MetricViewV2CatalogSuite-- 29 tests across 5 sections, all against an in-memoryViewCatalogtest fixture (MetricViewRecordingCatalog extends InMemoryTableCatalog with RelationCatalog):Section 1 -- CREATE-related (10 tests):
METRIC_VIEWtable type and view text viaViewInfo.metric_view.*descriptor properties + view context (currentCatalog/currentNamespace) + captured SQL configs.SQLSourceand comment.metric_view.type/metric_view.exprin column metadata.metric_view.*metadata (pins thealiasPlan(retainMetadata = true)fix).CREATE OR REPLACE VIEW ... WITH METRICSreplaces an existing v2 metric view (asserts on the replacement's distinguishing fields: queryText,metric_view.where, dependencies).CREATE VIEW IF NOT EXISTS ... WITH METRICSis a no-op when the view exists (catalog never sees the secondcreateViewcall).CREATE VIEW ... WITH METRICSover a v2 table at the ident throwsEXPECT_VIEW_NOT_TABLE.NO_ALTERNATIVE.CREATE VIEW IF NOT EXISTS ... WITH METRICSis a no-op when a v2 table sits at the ident (v1 parity).CREATE VIEW ... WITH METRICSon a non-ViewCatalogcatalog fails withMISSING_CATALOG_ABILITY.VIEWS.Section 2 -- Dependency extraction (5 tests):
JOINcaptures both tables as 3-partnameParts.nameParts.testcat.ns_a.ns_b.events_deep) emits 4-partnameParts.Section 3 -- SELECT cases (5 tests, modeled on
MetricViewSuitepatterns):SELECT measure(count_sum) FROM <mv> GROUP BY regionreturns aggregated rows (exercises the fullloadRelation->MetadataOnlyTable(ViewInfo)->V1Table.toCatalogTable(ViewInfo)->ResolveMetricViewround-trip).SELECT measure(...) WHERE region = ...-- query-layer filter on top of the view.whereclause is applied (where = Some("count > 1")filters at view-resolution time).ORDER BY measure(...) DESC LIMIT 1over the metric view.Section 4 -- DESCRIBE cases (2 tests):
DESCRIBE TABLE EXTENDEDround-trips throughloadRelationand emits theView Text/Typerows (gated through theCatalogTable.toJsonLinkedHashMapVIEW gate, which now accepts METRIC_VIEW).DESCRIBE TABLE(non-EXTENDED) returns the aliased columns.Section 5 -- DROP / SHOW cases (7 tests):
DROP VIEWsucceeds on a v2 metric view.DROP VIEW IF EXISTSon a non-existent v2 metric view is a no-op.DROP TABLEon a v2 metric view throwsEXPECT_TABLE_NOT_VIEW.NO_ALTERNATIVE(and asserts the metric view is not deleted).DROP TABLE IF EXISTSon a v2 metric view also throws (IF EXISTSdoesn't silence the wrong-type error, v1 parity).SHOW CREATE TABLEon a v2 metric view throwsUNSUPPORTED_FEATURE.TABLE_OPERATIONwith operation "SHOW CREATE TABLE".SHOW TABLESdoes not list v2 metric views (asserts sourceeventsis included,mvis excluded).SHOW VIEWSlists v2 metric views.Existing session-catalog metric-view tests (
MetricViewSuite,SimpleMetricViewSuite) and v1 path tests pass unchanged.DDLSuiteandHiveDDLSuitehad theirtableTypesenumerations updated to includeMETRIC_VIEWin two assertion lists.Was this patch authored or co-authored using generative AI tooling?
Generated-by: Cursor (Claude Sonnet 4.7)