feat(catalog): add PurgeTable as optional interface for physical file deletion#1104
Conversation
laskoviymishka
left a comment
There was a problem hiding this comment.
Shape looks right overall: PurgeableTable as an optional interface, CLI opt-in via type assertion, REST using purgeRequested=true, Hadoop delegating to RemoveAll. Splitting metadata drop from storage cleanup is the right factoring, and the shared WalkDir / BulkRemovableIO helper seems like a clean storage-agnostic base.
I’d hold merge for one safety issue though: today --purge can look successful while leaving files behind.
Three pieces combine badly:
PurgeTableFilesswallows file-deletion errors and returnsnil- catalog wrappers fall back to metadata-only drop on any non-
ErrNoSuchTableLoadTablefailure - the location walk only covers
Metadata().Location(), so files underwrite.data.path/write.metadata.pathcan be missed
Net effect: user runs --purge, sees success, and assumes storage is clean. That’s worse than the current metadata-only behavior, because at least today the limitation is explicit.
The tests don’t catch this yet: the SQL test only checks metadata.json is gone, CreateTable writes no data files, and the Glue/Hive mocks don’t exercise a filesystem path.
Before merge, I’d want:
- propagate errors from
PurgeTableFiles - remove the
LoadTablefallback-to-DropTable, or return something likeErrPurgeIncomplete - extend the SQL test to write real data files and assert the tree is gone
- either make the Glue/Hive tests exercise an in-memory FS, or drop them as purge-path coverage
- document what
PurgeableTableactually guarantees, including thewrite.data.path/write.metadata.pathlimitation for location-walk implementations
One smaller note: drop-then-purge means a purge failure leaves no easy retry path because the table no longer loads. PyIceberg does the same, so I’m not pushing to invert it, but it’s worth calling out in godoc.
Once those are covered, happy to take another pass.
|
|
||
| // PurgeTableFiles physically deletes all files under the table's warehouse location using WalkDir. | ||
| // It is best-effort — if interrupted, orphaned files may remain. | ||
| func PurgeTableFiles(ctx context.Context, tbl *table.Table) (err error) { |
There was a problem hiding this comment.
Walks Metadata().Location(), but tables can write outside that root via write.data.path / write.metadata.path (table/properties.go:27-28). PyIceberg/Java walk manifest entries instead, which catches that case.
At minimum document the limitation on PurgeableTable. Better, union in manifest-referenced paths that fall outside Location(). wdyt?
|
|
||
| return nil | ||
| }) | ||
| } |
There was a problem hiding this comment.
Every deletion error is silently dropped: the WalkDir err, the DeleteFiles err, every individual Remove. Function declares (err error) but always returns nil, so callers can't tell a no-op purge from success.
I'd collect with errors.Join and return. BulkRemovableIO also returns the successfully-deleted list, worth surfacing when len(deleted) < len(files).
| if err != nil { | ||
| if errors.Is(err, catalog.ErrNoSuchTable) { | ||
| return err | ||
| } |
There was a problem hiding this comment.
This (same in hive.go, sql.go) silently downgrades to metadata-only on any non-ErrNoSuchTable LoadTable failure: network blip, S3 403, corrupt metadata JSON. User asked for --purge, catalog entry vanishes, files stay, no signal.
I'd return the error. If the user wants metadata-only on load failure they can re-run drop table without the flag. At minimum a typed ErrPurgeIncomplete the CLI surfaces, not a silent fallback.
| s.ErrorIs(err, catalog.ErrNoSuchTable) | ||
|
|
||
| // The physical file should be gone | ||
| s.NoFileExists(metaFile) |
There was a problem hiding this comment.
Only checks the single metadata.json is gone. CreateTable against an empty table writes no data files, so the walk + bulk-delete path is never exercised. The test would still pass if PurgeTableFiles only removed metadata.json and silently failed on everything else.
I'd extend with an AppendTable commit so there are real data + manifest files, then assert the directory is empty.
| require.False(t, isConcurrentModificationException(errors.New("network timeout"))) | ||
| require.False(t, isConcurrentModificationException(nil)) | ||
| } | ||
|
|
There was a problem hiding this comment.
No FS plumbed in, so PurgeTableFiles never gets exercised. The test only asserts LoadTable + DropTable get called, which DropTable tests already cover. Same shape in hive_test.go:1207.
Either drop these or thread an in-memory FS so the purge path is real. As written they'd still pass if we deleted PurgeTableFiles entirely.
| CommitTransaction(ctx context.Context, commits []table.TableCommit) error | ||
| } | ||
|
|
||
| // PurgeableTable is an optional interface that catalogs can implement |
There was a problem hiding this comment.
Godoc could carry the contract callers actually need: purge is best-effort, file-deletion errors may or may not surface, and REST delegates server-side while SQL/Glue/Hive/Hadoop are client-side and may miss files outside the location root. Today a downstream implementor has no signal about what 'physical table deletion' guarantees.
|
All five concerns are valid and addressed in the updated patch.
Holding the push on this last point until we make a call on Option A vs B:
Let me know your preference on Option A or B, and I will push the updated patch. |
|
I think it's fine to touch table package, so lean to opt A |
|
Just pushed the update using Option A. getReferencedFiles is now exported as GetReferencedFiles on the Table struct, keeping the same logic with zero duplication. All review points are fully addressed. |
laskoviymishka
left a comment
There was a problem hiding this comment.
Thanks for the quick turnaround, all landed in shape.
Almost there. Two things before merge:
The big one is gc.enabled. Iceberg's spec gates data-file deletion on this table property - Java's CatalogUtil.dropTableData and PyIceberg both refuse to purge data files when it's false, because the same physical files may be shared with snapshotted clones or branched tables. As-is this PR will delete those files anyway and silently corrupt the other reader. The fix is small (read the property at the top of PurgeTableFiles), but it has to be there before this lands.
- The other is the union-dedup. normalizeURI only strips file://, so the LocalFS walk (bare OS paths) and the metadata side (file:/// URIs) end up as distinct keys in the set. Today the double-delete is masked by ErrNoSuchFile being swallowed, but the dedup intent is broken and cloud FSs will surface spurious errors as soon as their WalkDir scheme form differs from the metadata. There's already a normalizeFilePath helper in table/orphan_cleanup.go, i'd reuse that.
Plus a few smaller things in inline: the godoc/contract mismatch on "best-effort" vs. propagated errors (pick one), Hadoop's PurgeTable silently ignoring external write.data.path / write.metadata.path, the stray commented-out regex, and the unused purgeErr field on the test mock.
Fix the gc.enabled gate and the dedup, address the contract-vs-godoc choice, and this is good to land. Happy to take another pass once those are in.
|
|
||
| // PurgeTableFiles physically deletes all files under the table's warehouse location | ||
| // and any referenced files written outside the location root (e.g., via write.data.path | ||
| // or write.metadata.path properties). |
There was a problem hiding this comment.
We're deleting data files unconditionally here, but Iceberg's spec gates this on the gc.enabled table property — Java's CatalogUtil.dropTableData and PyIceberg both refuse to purge data when gc.enabled=false, because the same physical files may be referenced by snapshotted clones or branched tables. On a gc.enabled=false table this PR will silently corrupt the other reader. I'd read tbl.Metadata().Properties().Get("gc.enabled", "true") at the top of PurgeTableFiles and short-circuit (or at least skip the data-file portion) when it's false. wdyt?
| cat, | ||
| ), | ||
| }, nil | ||
| } |
There was a problem hiding this comment.
This only strips file:// — so the LocalFS walk callback (which yields bare OS paths like /tmp/foo) and the metadata-side URIs (file:///tmp/foo) land in the set as two distinct keys. Today it's masked because the second Remove hits ErrNoSuchFile and we swallow it, but the union-dedup intent is broken, and for any cloud FS where WalkDir emits a different scheme form than the metadata we'll get spurious BulkRemovableIO errors. There's already a normalizeFilePath helper in table/orphan_cleanup.go that does full URL normalization across schemes — I'd reuse that instead of rolling our own here.
| } | ||
|
|
||
| // PurgeTableFiles physically deletes all files under the table's warehouse location | ||
| // and any referenced files written outside the location root (e.g., via write.data.path |
There was a problem hiding this comment.
The godoc calls this "best-effort" but errors.Join propagates every per-file failure and the catalogs wrap it as "dropped table but failed to purge files". That's not best-effort — that's fail-loud, with no retry path because the catalog row is already gone. Either match Java/PyIceberg (log per-file failures, return nil so the catalog drop stays the source of truth) or keep fail-loud and return a structured PurgeResult{CatalogDropped bool, FileErrors []error} so the caller can tell that the catalog state is durable. Right now the doc and the code disagree, which is the worst of both.
| return os.RemoveAll(tablePath) | ||
| } | ||
|
|
||
| func (c *Catalog) PurgeTable(ctx context.Context, identifier table.Identifier) error { |
There was a problem hiding this comment.
This delegates to os.RemoveAll(tablePath), which means any data written to write.data.path or write.metadata.path outside the table root just survives the purge. The interface godoc explicitly claims "files written outside the root… are deleted", so Hadoop is silently violating its own contract. I'd at minimum load the table and run the same union-walk + bulk-delete path the other catalogs use, or call out the caveat in the Hadoop-specific godoc.
| // files, statistics and partition-statistics paths (Puffin, etc.), and all paths reachable | ||
| // from current snapshots (manifest lists, manifests, data files). | ||
| func (t Table) getReferencedFiles(fs iceio.IO) (map[string]bool, error) { | ||
| func (t Table) GetReferencedFiles(fs iceio.IO) (map[string]bool, error) { |
There was a problem hiding this comment.
This is a public method on Table whose only caller is catalog/internal, takes an iceio.IO it'll panic-deref if the table has snapshots and fs is nil, and the returned map[string]bool has no documented contract about path normalization. I'd consider either folding this into a Table.PurgeFiles(ctx) that owns its own FS, or pushing it onto Metadata and documenting the nil-fs case. Exporting it as-is locks in an awkward surface for one internal caller.
Also worth a note: this walk doesn't include deletion-vector blob paths. With #1100 just merged, V3 tables with DVs will leak .dv files on stores where ListableIO isn't available.
| // ([\w-]{36}) -> UUID (36 characters, including hyphens) | ||
| // (?:\.\w+)? -> optional codec name | ||
| // \.metadata\.json -> file extension | ||
| // var tableMetadataFileNameRegex = regexp.MustCompile(`^(\d+)-([\w-]{36})(?:\.\w+)?\.metadata\.json`) |
There was a problem hiding this comment.
Stray commented-out tableMetadataFileNameRegex line that the diff left behind. All three of us stopped on it. Worth a quick clean.
| // File-deletion errors are propagated to the caller, but because the | ||
| // catalog entry is already removed at that point there is no automatic | ||
| // retry path. | ||
| type PurgeableTable interface { |
There was a problem hiding this comment.
The catalogs that intentionally satisfy this interface would benefit from a var _ catalog.PurgeableTable = (*Catalog)(nil) assertion (REST especially, since it's not exercised in the same test fixture). Cheap compile-time guard against someone renaming the method out from under the interface.
| mockCatalogForDrop | ||
| purgeCalled bool | ||
| purgeIdent table.Identifier | ||
| purgeErr error |
There was a problem hiding this comment.
purgeErr is declared on the mock but never used in a test case — I'd add a TestRunDropTablePurgeError that returns an error here and asserts the wrapping ("dropped table but failed to purge files") so the contract can't silently regress.
| s.Require().NoError(json.Unmarshal(metaBytes, &metaMap)) | ||
| metaMap["statistics"] = []any{ | ||
| map[string]any{ | ||
| "snapshot-id": int64(1), |
There was a problem hiding this comment.
Hard-coding "snapshot-id": int64(1) here is fragile — if it ever drifts from tbl.CurrentSnapshot().SnapshotID, the test will still pass, but for the wrong reason (the external file gets removed by the walk rather than by the referenced-files path, which is what this test is supposed to exercise). I'd read the snapshot id off the loaded table instead.
| err = purger.PurgeTable(ctx, ident) | ||
| } else { | ||
| output.Error(fmt.Errorf("catalog %s does not support purge", cat.CatalogType())) | ||
| osExit(1) |
There was a problem hiding this comment.
The purge branch uses osExit(1) (the swappable test hook), but the namespace branch right next to it still calls bare os.Exit(1). Functionally fine today, but it's the kind of thing that makes future test additions land on a confusing failure mode. I'd make both use osExit for consistency.
|
Thanks for the detailed review! I've pushed a new commit addressing all points:
|
laskoviymishka
left a comment
There was a problem hiding this comment.
Looks good overall. The catalog wiring and gc.enabled=false handling are much closer now.
One potential bug before merge: path normalization. The orphan-file path seems to use two different normalizers on each side of the comparison, so local file://... paths can be mismatched and valid files may be treated as orphans. I left the details inline, but I’d unify both sides on one url.Parse-based normalizer.
A few smaller things, mostly follow-ups / cleanup:
gc.enabled=falseshould probably only skip data files; Java still deletes manifests / metadata.- purge should include deleted data files too;
discardDeleted=trueis right for orphan detection, but not for table purge. - the purge-error test seems to cover a contract the current catalogs don’t actually expose.
- prefer
slog.WarnContextoverlog.Printfso callers can route logs.
Everything else I asked for looks addressed. Fix the normalization issue, and I’m happy to take another pass.
| // | ||
| // normalizeFilePath normalizes a file path for comparison, handling schemes, authorities, and separators. | ||
| // It also aligns file:// URIs and bare local file system paths so they normalize to the same format. | ||
| func normalizeFilePath(path string) string { |
There was a problem hiding this comment.
This silently eats the authority on RFC 8089 URIs like file://localhost/foo/bar.parquet, so localhost gets folded into the path. The downstream effect is that getReferencedFiles stores keys via this stripper while identifyOrphanFiles looks them up via normalizeFilePathWithConfig (which does not strip file:), so on a LocalFS where WalkDir returns file:///foo/x.parquet and the referenced set holds /foo/x.parquet, the file is classified as an orphan and deleted.
I'd parse with url.Parse for the file: scheme and take parsed.Path directly, then point both call sites at the same function. file:///foo and file://localhost/foo both land at /foo and we don't have to reason about trim ordering.
… deletion Fixes apache#1092 Add PurgeableTable as an optional interface so catalogs can support physical deletion of table files when dropping a table. Per catalog: - SQL: load table metadata, drop catalog entry, walk and delete all files under the table location via WalkDir + BulkRemovableIO fallback. - Glue: same pattern; falls back to metadata-only drop if table files are inaccessible (e.g. S3 permissions). - Hive: same pattern; falls back to metadata-only drop on load failure. - Hadoop: delegates to DropTable which already removes the directory tree. - REST: existing PurgeTable (purgeRequested=true) satisfies the interface automatically — no changes needed. Wires --purge flag into the CLI drop table command via type assertion against PurgeableTable; exits with an error if the catalog does not support client-side purge. No breaking changes to the Catalog.DropTable signature. Client-side purge is best-effort — if interrupted between catalog drop and file deletion, orphaned files may remain. For reliable cleanup, prefer expire-snapshots + remove-orphan-files. Signed-off-by: Mithil Girish <t.r.mithil@gmail.com>
- Export GetReferencedFiles in the table package so catalogs can
resolve all manifest-referenced storage paths without duplication.
- Update PurgeTableFiles to union the base Location() walk with all
active metadata, manifests, and snapshot entries. Files written to
custom write.data.path / write.metadata.path locations or external
statistics paths are now safely deleted.
- Collect all WalkDir, DeleteFiles, and Remove errors via errors.Join
and return them. os.IsNotExist is safely ignored.
- Remove the silent fallback to metadata-only drops in SQL, Glue, and
Hive catalogs. If loading or purging fails for non-ErrNoSuchTable
reasons, the error is returned immediately.
- Overhaul TestPurgeTable in the SQL catalog to write real Parquet
data via Arrow and inject an external stats.puffin file outside the
table Location() to verify complete physical cleanup.
- Remove non-functional Glue and Hive mock purge tests that lacked
filesystem backends.
- Expand PurgeableTable godoc to document the best-effort model,
type-assertion pattern, drop-then-purge retry limitation, and
client-side vs REST distinction.
Signed-off-by: Mithil Girish <t.r.mithil@gmail.com>
…tion
- Respect 'gc.enabled' table property to prevent shared file corruption
- Implement robust path deduplication using 'normalizeFilePath'
- Update error handling to log warnings instead of failing DropTable
- Update Hadoop catalog to trigger PurgeFiles before deleting root
- Add robust context cancellation (ctx.Err()) and concurrency safeguards
Signed-off-by: Mithil Girish <t.r.mithil@gmail.com>
…emantics - Fix path normalization using url.Parse for file: scheme so file:///foo and file://localhost/foo both normalize to /foo - Handle Windows RFC 8089 URIs (file:///C:/...) correctly - Scope gc.enabled=false to data files only; manifests and metadata are still deleted matching Java/PyIceberg behavior - Use discardDeleted=false in PurgeFiles so deleted data files are included in deletion set - Fail hard if getReferencedFiles errors to prevent partial purge - Replace log.Printf with slog.WarnContext for structured logging - Fix purge-error test to assert interface contract not impl detail Signed-off-by: Mithil Girish <t.r.mithil@gmail.com>
|
Pushed the final fixes — unified |
Fixes #1092
Summary
Adds
PurgeableTableas an optional interface so catalogs can supportphysical deletion of table files when dropping a table.
Today,
DropTableonly removes the catalog metadata entry — physicalfiles (Parquet, manifests, metadata JSON) remain on storage. This PR
closes that gap for client-side catalogs without making any breaking
changes to the
Catalog.DropTablesignature.Changes
Core
catalog/catalog.go— addPurgeableTableoptional interfacecatalog/internal/utils.go— addPurgeTableFilesshared helperusing
WalkDir+BulkRemovableIOfallback for storage-agnostic deletionPer-catalog
files under the table location via
WalkDir+BulkRemovableIOfallbackare inaccessible (e.g. S3 permissions)
DropTablewhich already removes the directory treePurgeTable(purgeRequested=true) satisfies theinterface automatically — no changes needed
CLI
cmd/iceberg/main.go— wire--purgeflag intodrop tablecommandvia type assertion against
PurgeableTable; exits with an error if thecatalog does not support client-side purge
Notes
Catalog.DropTablesignatureand file deletion, orphaned files may remain. For reliable cleanup,
prefer
expire-snapshots+remove-orphan-filesTesting
catalog/sql/sql_test.go— verifies physical file deletion on local storagecatalog/glue/glue_test.go— mock test for purge invocation and fallbackcatalog/hive/hive_test.go— mock test for purge invocation and fallbackcmd/iceberg/drop_test.go— CLI flag parsing and purge dispatch tests