feat(storage): add DocumentFactory.documentExists #5085
Enables "create only if absent" flows to test existence without catching exceptions from openDocument. Splits off from #4206.
Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
Pull request overview
Adds a small storage helper to let callers check whether a VFS-backed document already exists at a URI, avoiding the current “probe by calling openDocument and catching exceptions” pattern. This fits into workflow-core’s storage abstractions around VFS/Iceberg-backed documents.
Changes:
- Added `DocumentFactory.documentExists(uri: URI): Boolean` for the `vfs` scheme.
- Implemented existence probing via Iceberg namespace resolution + `IcebergUtil.loadTableMetadata`.
- Explicitly rejects unsupported URI schemes with `UnsupportedOperationException`.
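The "create only if absent" flow that this helper enables can be sketched roughly as follows. This is an illustration rather than the PR's code: `InMemoryDocumentFactory` and `create_if_absent` are hypothetical stand-ins, the URI is made up, and a real caller would go through `DocumentFactory` instead.

```python
class InMemoryDocumentFactory:
    """Hypothetical stand-in for DocumentFactory, backed by a dict."""

    def __init__(self):
        self._documents = {}

    def document_exists(self, uri: str) -> bool:
        # Pure existence probe: no exception is used to signal "absent".
        return uri in self._documents

    def create_document(self, uri: str, schema: dict) -> dict:
        if uri in self._documents:
            raise FileExistsError(f"document already exists: {uri}")
        self._documents[uri] = {"schema": schema, "rows": []}
        return self._documents[uri]

    def open_document(self, uri: str) -> dict:
        return self._documents[uri]  # raises KeyError if absent


def create_if_absent(factory, uri: str, schema: dict) -> dict:
    """Return the document at `uri`, creating it first if it does not exist."""
    if factory.document_exists(uri):
        return factory.open_document(uri)
    return factory.create_document(uri, schema)


factory = InMemoryDocumentFactory()
uri = "vfs:///example/result"  # made-up URI, for illustration only
first = create_if_absent(factory, uri, {"id": "int"})
second = create_if_absent(factory, uri, {"id": "int"})  # reuses the document
```

A check followed by a create is not atomic, so concurrent writers would still need the create call itself to fail cleanly.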
Codecov Report
❌ Patch coverage is
Additional details and impacted files
@@ Coverage Diff @@
## main #5085 +/- ##
============================================
+ Coverage 43.12% 43.14% +0.01%
+ Complexity 2208 2207 -1
============================================
Files 1045 1045
Lines 40249 40260 +11
Branches 4252 4250 -2
============================================
+ Hits 17358 17369 +11
Misses 21817 21817
Partials 1074 1074
Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
- Extract resolveNamespace helper so createDocument, openDocument, and documentExists share one resourceType → namespace mapping.
- Use catalog.tableExists via IcebergUtil instead of loadTableMetadata so transient catalog errors surface instead of becoming false negatives.
- Tweak the unsupported-scheme message to mention "checking document existence" rather than "checking the document".
- Add IcebergDocumentSpec cases for existing/fresh URIs and unsupported schemes.
Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
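The false-negative concern behind the switch to `tableExists` can be illustrated with a small Python sketch. Everything here is a hypothetical stand-in: `FlakyCatalog` simulates an unreachable catalog, and `load_table_metadata` only imitates the "catch everything, return nothing" behavior that the PR description attributes to `loadTableMetadata`.

```python
class FlakyCatalog:
    """Hypothetical catalog whose backend is temporarily unreachable."""

    def load_table(self, identifier: str):
        raise ConnectionError("catalog unreachable")

    def table_exists(self, identifier: str) -> bool:
        raise ConnectionError("catalog unreachable")


def load_table_metadata(catalog, identifier: str):
    """Imitates a wrapper that swallows every exception and returns None."""
    try:
        return catalog.load_table(identifier)
    except Exception:
        return None


def exists_via_load(catalog, identifier: str) -> bool:
    # A transient failure is indistinguishable from "not found" here.
    return load_table_metadata(catalog, identifier) is not None


def exists_via_table_exists(catalog, identifier: str) -> bool:
    # No try/except: unexpected errors propagate to the caller.
    return catalog.table_exists(identifier)


catalog = FlakyCatalog()
false_negative = exists_via_load(catalog, "some_namespace.some_key")

try:
    exists_via_table_exists(catalog, "some_namespace.some_key")
    error_surfaced = False
except ConnectionError:
    error_surfaced = True
```

With the first probe, a caller gating a create call would wrongly conclude the document is absent. With the second, the outage is visible and can be retried or reported.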
Drops the single-use IcebergUtil.tableExists wrapper and calls catalog.tableExists directly via TableIdentifier.
Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
Mirrors the Scala documentExists helper:
- Extract _resolve_namespace so create_document, open_document, and document_exists share one resource_type -> namespace mapping.
- document_exists calls catalog.table_exists directly so transient catalog errors surface instead of becoming false negatives.
- Add unit tests covering true/false catalog responses, unsupported resource type, and unsupported URI scheme.
Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
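A rough sketch of the shape this commit describes, with a mocked catalog in the style of the unit tests it mentions. Several parts are assumptions made for illustration: `ResourceType` stands in for the project's resource-type enum, the namespace names are invented, and the resource type and storage key are passed in explicitly instead of being decoded from the URI as the real helper would do.

```python
from enum import Enum
from unittest.mock import Mock
from urllib.parse import urlparse


class ResourceType(Enum):  # hypothetical stand-in for the real enum
    RESULT = "result"
    STATE = "state"
    CONSOLE_MESSAGES = "console_messages"


# Invented namespace names; the real values come from project configuration.
_NAMESPACES = {ResourceType.RESULT: "result_ns", ResourceType.STATE: "state_ns"}


def _resolve_namespace(resource_type: ResourceType) -> str:
    """Single resource-type -> namespace mapping shared by all entry points."""
    try:
        return _NAMESPACES[resource_type]
    except KeyError:
        raise ValueError(f"unsupported resource type: {resource_type}") from None


def document_exists(catalog, uri: str, resource_type: ResourceType, storage_key: str) -> bool:
    if urlparse(uri).scheme != "vfs":
        raise NotImplementedError(
            f"unsupported URI scheme for checking document existence: {uri}"
        )
    namespace = _resolve_namespace(resource_type)
    # No try/except here, so transient catalog errors reach the caller.
    return catalog.table_exists(f"{namespace}.{storage_key}")


catalog = Mock()
catalog.table_exists.return_value = True
found = document_exists(catalog, "vfs:///wf/1", ResourceType.RESULT, "key1")
```

Keeping the mapping in one helper means a newly supported resource type only has to be added in one place for create, open, and exists to agree.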
Co-authored-by: Copilot Autofix powered by AI <175728472+Copilot@users.noreply.github.com> Signed-off-by: Xinyuan Lin <xinyual3@uci.edu>
Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
Also, could we make Codecov 100%?
…ng asymmetry
Address review feedback on #5085:
- Document `@throws` cases on `DocumentFactory.documentExists` (Scala) and add a `Raises:` block on `DocumentFactory.document_exists` (Python), so callers gating a `createDocument` call know what to catch.
- Note in Python `_resolve_namespace` that only RESULT/STATE are mapped because CONSOLE_MESSAGES and RUNTIME_STATISTICS are written exclusively from the Scala runtime; the asymmetry vs. the Scala helper is intentional.
- Split the combined documentExists assert in `IcebergDocumentSpec` into two `it should` blocks ("true for a created URI" / "false for a never-created URI") so a regression names the failing scenario.
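As an illustration of why the documented exceptions matter to callers, here is a minimal sketch of a `Raises:` block and a caller that catches exactly what it lists. The function body is a toy: `_EXISTING`, `_SUPPORTED_TYPES`, `can_create`, and the URI layout (resource type as the second-to-last path segment) are assumptions for the example, not the project's actual decoding logic.

```python
from urllib.parse import urlparse

_EXISTING = {"vfs:///example/result/a"}  # toy stand-in for the catalog
_SUPPORTED_TYPES = {"result", "state"}  # toy version of the RESULT/STATE mapping


def document_exists(uri: str) -> bool:
    """Check whether a document already exists at ``uri``.

    Raises:
        NotImplementedError: If the URI scheme is not supported.
        ValueError: If the resource type encoded in the URI is not supported.
    """
    parsed = urlparse(uri)
    if parsed.scheme != "vfs":
        raise NotImplementedError(f"unsupported URI scheme: {parsed.scheme}")
    # Toy convention: the second-to-last path segment names the resource type.
    resource_type = parsed.path.strip("/").split("/")[-2]
    if resource_type not in _SUPPORTED_TYPES:
        raise ValueError(f"unsupported resource type: {resource_type}")
    return uri in _EXISTING


def can_create(uri: str) -> bool:
    """Gate a create call, catching only the documented exceptions."""
    try:
        return not document_exists(uri)
    except (NotImplementedError, ValueError):
        return False  # unsupported URI: do not attempt to create
```

Because the caller names only the documented exception types, any other failure (for example a catalog outage) still propagates instead of being silently treated as "cannot create".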
… of resolveNamespace
The original `IcebergDocumentSpec` only exercised the RESULT and STATE branches of `DocumentFactory.resolveNamespace`. Codecov flagged CONSOLE_MESSAGES and RUNTIME_STATISTICS as missing patch coverage. Add `documentExists` cases for both URI types so all four mapped resource-type branches are now exercised.
The defensive `case _ =>` branch in `resolveNamespace` remains unreachable from any well-formed VFS URI (`VFSURIFactory.decodeURI` already validates resource types before reaching `resolveNamespace`); exercising it would require either reflection-based test plumbing or restructuring the lookup. Keeping it as defensive code for future enum additions.
…reflection
After covering CONSOLE_MESSAGES and RUNTIME_STATISTICS, the match in `resolveNamespace` was still flagged as partial by jacoco because the defensive `case _ =>` (unreachable from any well-formed VFS URI) was never taken. Reflect into the private method and pass `null` to exercise the wildcard branch, asserting it throws IllegalArgumentException as documented. Keeps the defensive code intact while bringing branch coverage on `resolveNamespace` to 100%.
What changes were proposed in this PR?
Adds a `documentExists`-style helper to `DocumentFactory` in both the Scala and Python code paths, so callers can check whether an iceberg-backed document already exists at a `vfs://` URI without catching exceptions from `openDocument` / `open_document`.
- Scala: `DocumentFactory.documentExists(uri: URI): Boolean`. Resolves the `VFSResourceType` to its iceberg namespace, then probes the catalog via `IcebergCatalogInstance.getInstance().tableExists(TableIdentifier.of(namespace, storageKey))`. Throws `UnsupportedOperationException` for non-`vfs` URI schemes; `IllegalArgumentException` for unsupported resource types.
- Python: `DocumentFactory.document_exists(uri: str) -> bool`. Same shape: probes via `catalog.table_exists(f"{namespace}.{storage_key}")`; raises `NotImplementedError` / `ValueError` symmetrically.
- Extracts `resolveNamespace` (Scala) and `_resolve_namespace` (Python) so `createDocument`, `openDocument`, and the new helper share one resource-type → namespace mapping in each language.
- Uses `Catalog.tableExists` rather than `loadTableMetadata`: `loadTableMetadata` catches every exception and returns `None`, so a transient catalog error would have surfaced as a false-negative "doesn't exist" answer. `Catalog.tableExists` only returns `false` on actual not-found, and lets unexpected errors propagate.
- Switching `open_document` from a hard-coded `"vfs"` literal to `VFSURIFactory.VFS_FILE_URI_SCHEME` aligns the three methods on the same scheme constant.
Any related issues, documentation, discussions?
Closes: #5089
How was this PR tested?
- `sbt "WorkflowCore/Test/compile"`: clean.
- `sbt "WorkflowCore/testOnly *IcebergDocumentSpec"`: 14/14 pass, including two new cases asserting `documentExists` returns true after `createDocument`, false on a fresh URI, and throws `UnsupportedOperationException` for an unsupported scheme.
- `sbt "WorkflowCore/testOnly *IcebergUtilSpec"`: 13/13 pass (refactor did not touch `IcebergUtil`).
- `pytest amber/src/test/python/core/storage/test_document_factory.py`: 11/11 pass, including four new cases covering `document_exists` returning true/false based on `catalog.table_exists`, raising `ValueError` on an unsupported resource type, and raising `NotImplementedError` on an unsupported scheme.
- `ruff check` clean on `document_factory.py` and `test_document_factory.py`.
Was this PR authored or co-authored using generative AI tooling?
Co-authored with Claude Opus 4.7 in compliance with ASF.