Feature Summary
DocumentFactory (in both common/workflow-core Scala and amber Python) exposes createDocument / openDocument (and the snake_case Python equivalents), but no way to ask whether a document already exists at a given vfs:// URI without trying to open it.
Today, code that wants a create-only-if-absent flow has to call openDocument inside a try/catch and inspect the failure — which conflates "the table doesn't exist" with "the catalog rejected the request for some other reason," and pays the cost of loading full table metadata just to answer a boolean.
Proposed Solution or Design
Add a new helper in both languages that performs a focused existence probe via the iceberg catalog's native tableExists API:
- Scala: DocumentFactory.documentExists(uri: URI): Boolean
- Python: DocumentFactory.document_exists(uri: str) -> bool
Behavior:
- For vfs:// URIs: resolve the VFSResourceType to its iceberg namespace, then call Catalog.tableExists(TableIdentifier.of(namespace, storageKey)) (Scala) / catalog.table_exists(f"{namespace}.{storage_key}") (Python). Unexpected catalog errors propagate rather than being swallowed.
- For unsupported schemes: throw UnsupportedOperationException / raise NotImplementedError.
- For unsupported VFSResourceType values: throw IllegalArgumentException / raise ValueError.
While we're touching this, extract a private resolveNamespace / _resolve_namespace helper so createDocument, openDocument, and the new existence check share one resource-type → namespace mapping in each language, instead of three copies that can drift.
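A minimal Python sketch of the behavior described above, to make the three branches and the shared helper concrete. Several pieces are assumptions made only so the example is self-contained: the VFSResourceType members, the namespace names, the simplified vfs:///<resource_type>/<storage_key> layout handled by the hypothetical _decode_uri, and passing the catalog in as a parameter (the proposed method takes only the URI). The only catalog call it relies on is table_exists.

```python
from enum import Enum
from typing import Protocol, Tuple
from urllib.parse import urlparse


class VFSResourceType(Enum):
    # Hypothetical members for illustration; the real enum may differ.
    RESULT = "result"
    STATE = "state"  # intentionally left unmapped below to show the ValueError path


class Catalog(Protocol):
    # Only the one method this sketch needs from the iceberg catalog.
    def table_exists(self, identifier: str) -> bool: ...


# Hypothetical resource-type -> namespace mapping (names are placeholders).
_NAMESPACES = {VFSResourceType.RESULT: "result_namespace"}


def _decode_uri(uri: str) -> Tuple[VFSResourceType, str]:
    """Hypothetical decoder for a simplified layout:
    vfs:///<resource_type>/<storage_key>."""
    parts = urlparse(uri).path.strip("/").split("/")
    if len(parts) != 2:
        raise ValueError(f"Malformed URI: {uri}")
    return VFSResourceType(parts[0]), parts[1]


def _resolve_namespace(resource_type: VFSResourceType) -> str:
    """Single place where resource types map to namespaces, so the
    create, open, and exists paths cannot drift apart."""
    try:
        return _NAMESPACES[resource_type]
    except KeyError:
        raise ValueError(
            f"Resource type {resource_type} is not supported"
        ) from None


def document_exists(catalog: Catalog, uri: str) -> bool:
    scheme = urlparse(uri).scheme
    if scheme != "vfs":
        raise NotImplementedError(f"Unsupported URI scheme: {scheme}")
    resource_type, storage_key = _decode_uri(uri)
    namespace = _resolve_namespace(resource_type)
    # No try/except here: unexpected catalog errors propagate to the caller.
    return catalog.table_exists(f"{namespace}.{storage_key}")
```

With a helper like this, a create-only-if-absent caller can branch on the boolean result and leave genuine catalog failures to surface as exceptions, instead of inferring absence from a failed open.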
Affected Area