Add DocumentFactory existence check (documentExists / document_exists) #5089

@aglinxinyuan

Feature Summary

DocumentFactory (in both common/workflow-core Scala and amber Python) exposes createDocument / openDocument (and the snake_case Python equivalents), but no way to ask whether a document already exists at a given vfs:// URI without trying to open it.

Today, code that wants a create-only-if-absent flow has to call openDocument inside a try/catch and inspect the failure. This conflates "the table doesn't exist" with "the catalog rejected the request for some other reason," and it pays the cost of loading full table metadata just to answer a yes/no question.
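
To make the problem concrete, here is a minimal Python sketch of that workaround. Everything in it is a stand-in: `open_document` is a hypothetical stub rather than the real amber function, the `vfs:///x` URI is a placeholder, and the two exception types are assumptions chosen only to show that a broad `except` cannot tell the failure modes apart.

```python
from typing import Optional


class TableMissing(Exception):
    """Hypothetical error for a table that is absent from the catalog."""


class CatalogUnavailable(Exception):
    """Hypothetical error for any other catalog failure (auth, network, ...)."""


def open_document(uri: str, failure: Optional[Exception] = None) -> dict:
    # Stub standing in for DocumentFactory.open_document: it raises the
    # injected failure, or returns a placeholder document otherwise.
    if failure is not None:
        raise failure
    return {"uri": uri}


def exists_via_open(uri: str, failure: Optional[Exception] = None) -> bool:
    # The workaround: treat any failure of open_document as "absent".
    try:
        open_document(uri, failure)
        return True
    except Exception:
        return False


# Both failure modes collapse into the same answer.
missing = exists_via_open("vfs:///x", TableMissing())
unavailable = exists_via_open("vfs:///x", CatalogUnavailable())
```

Because `missing` and `unavailable` are indistinguishable, a create-if-absent caller would go on to attempt a create even when the catalog was merely unreachable.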

Proposed Solution or Design

Add a new helper in both languages that performs a focused existence probe via the Iceberg catalog's native tableExists API:

  • Scala: DocumentFactory.documentExists(uri: URI): Boolean
  • Python: DocumentFactory.document_exists(uri: str) -> bool

Behavior:

  • For vfs:// URIs: resolve the VFSResourceType to its Iceberg namespace, then call Catalog.tableExists(TableIdentifier.of(namespace, storageKey)) (Scala) / catalog.table_exists(f"{namespace}.{storage_key}") (Python). Unexpected catalog errors propagate rather than being swallowed.
  • For unsupported schemes: throw UnsupportedOperationException / raise NotImplementedError.
  • For unsupported VFSResourceType values: throw IllegalArgumentException / raise ValueError.

While we're touching this, extract a private resolveNamespace / _resolve_namespace helper so createDocument, openDocument, and the new existence check share one resource-type → namespace mapping in each language, instead of three copies that can drift.

Affected Area

  • Storage / Metadata
