Skip to content

feat(io): add ListableIO interface to replace reflect-based directory walking (#913)#917

Merged
zeroshade merged 5 commits into
apache:mainfrom
tanmayrauth:add-listable-io-interface
Apr 22, 2026
Merged

feat(io): add ListableIO interface to replace reflect-based directory walking (#913)#917
zeroshade merged 5 commits into
apache:mainfrom
tanmayrauth:add-listable-io-interface

Conversation

@tanmayrauth
Copy link
Copy Markdown
Contributor

@tanmayrauth tanmayrauth commented Apr 17, 2026

Why: Orphan cleanup's walkDirectory used reflect to reach into blob storage internals (extracting the Bucket field by name). This is fragile — it breaks if the struct changes, bypasses Go's type system, and panics on unexpected IO types.

This PR introduces a ListableIO interface with a WalkDir method so directory walking works through a proper type assertion. Both LocalFS and blobFileIO implement it. Orphan cleanup prefers ListableIO when available, falling back to the original reflect-based implementation for IO types that don't implement it yet. Includes Azure URI userinfo preservation and tests for local, cloud, and Azure paths.

Fixes: #913

@tanmayrauth tanmayrauth requested a review from zeroshade as a code owner April 17, 2026 05:49
@tanmayrauth
Copy link
Copy Markdown
Contributor Author

@laskoviymishka Please help for the review.

Copy link
Copy Markdown
Contributor

@laskoviymishka laskoviymishka left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Direction's right, code is smaller, reflect is gone.

One real blocker - Azure URI fix, plus need the #916 coordination.

I think this should be merged after #916, after rebase, but before that - Azure URI problem need to be addressed

Comment thread io/gocloud/blob.go
Comment thread table/orphan_cleanup.go Outdated
func walkDirectory(fsys iceio.IO, root string, fn func(path string, info stdfs.FileInfo) error) error {
listable, ok := fsys.(iceio.ListableIO)
if !ok {
return fmt.Errorf("IO implementation %T does not support directory listing (does not implement ListableIO)", fsys)
Copy link
Copy Markdown
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I think it worth to inlcude root in error.

Copy link
Copy Markdown
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Agreed, done.

@tanmayrauth
Copy link
Copy Markdown
Contributor Author

@laskoviymishka Yes, the old reflect-based code had the same bug, but since we're introducing a proper interface this is the right time to fix it. Prefix reconstruction now preserves parsed.User when present. Added TestBlobFileIOWalkDirAzureURI to cover the abfs://container@account.dfs.core.windows.net/... case.

tanmayrauth added a commit to tanmayrauth/iceberg-go that referenced this pull request Apr 17, 2026
- Rename RemoveAll to DeleteFiles with ctx and ([]string, error) return
- Pin missing-file and error-shape semantics in interface doc
- Remove BulkRemovableIO caller wiring (deferred to apache#917)
- Restore walkDirectory fallback for non-ListableIO cloud backends
@tanmayrauth
Copy link
Copy Markdown
Contributor Author

@laskoviymishka The BulkRemovableIO interface and all its caller wiring were originally part of #916, I've moved it to this PR.

Comment thread io/gocloud/blob.go Outdated
prefix += parsed.Host + "/"

return fs.WalkDir(bfs.Bucket, walkPath, func(path string, d fs.DirEntry, err error) error {
return fn(prefix+path, d, err)
Copy link
Copy Markdown
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

instead of prefix+path should we use url.Join or an equivalent?

Comment thread io/gocloud/blob.go Outdated
Comment on lines +196 to +210
parsed, err := url.Parse(root)
if err != nil {
return fmt.Errorf("invalid URL %s: %w", root, err)
}

walkPath := strings.TrimPrefix(parsed.Path, "/")
if walkPath == "" {
walkPath = "."
}

prefix := parsed.Scheme + "://"
if parsed.User != nil {
prefix += parsed.User.String() + "@"
}
prefix += parsed.Host + "/"
Copy link
Copy Markdown
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Isn't this pretty much just equivalent to something like:

	walkPath := strings.TrimPrefix(parsed.Path, "/")
	if walkPath == "" {
		walkPath = "."
	}

    parsed.Path = ""
    return fs.WalkDir(bfs.Bucket, walkPath, func(path string, d fs.DirEntry, err error) error {
		return fn(parsed.JoinPath(path).String(), d, err)
    })

Or am I missing something?

Copy link
Copy Markdown
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Yes, it a much cleaner way.

Comment thread io/local.go Outdated
Comment on lines +54 to +56
if cleanRoot == "" {
cleanRoot = "."
}
Copy link
Copy Markdown
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

is this actually needed?

Copy link
Copy Markdown
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

It was a defensive guard for the edge case root == "file://" (no path), where TrimPrefix yields "" and filepath.WalkDir("") returns an error. But a bare file:// with no path isn't a valid table location, so removed.

Comment thread table/orphan_cleanup.go Outdated
Comment on lines +311 to +314
listable, ok := fsys.(iceio.ListableIO)
if !ok {
return fmt.Errorf("IO implementation %T does not support directory listing for %s (does not implement ListableIO)", fsys, root)
}
Copy link
Copy Markdown
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

can we fallback to the original impl? Does that make sense to do so? Or is there some other way to implement a fallback?

… walking (apache#913)

Add ListableIO optional interface with WalkDir method, implement it on
LocalFS and blobFileIO, and replace the reflect hack in orphan cleanup
with a type assertion.
Address review feedback:
- Preserve parsed.User (container@) in blob WalkDir prefix reconstruction
- Include root path in ListableIO type assertion error message
- Add test for Azure ADLS URI handling
Add BulkRemovableIO with DeleteFiles(ctx, paths) ([]string, error) and
wire bulk delete into deleteFiles() and PostCommit() with ctx
propagation, error fallthrough, and empty-paths guard.

Items deferred from apache#916 per review feedback.
- Use url.JoinPath instead of manual prefix construction in blob WalkDir
- Remove unnecessary empty root guard in LocalFS.WalkDir
- Add reflect-based fallback in walkDirectory for non-ListableIO impls
@tanmayrauth tanmayrauth force-pushed the add-listable-io-interface branch from cca8f1e to dee05d0 Compare April 17, 2026 21:07
…e#916

Remove BulkRemovableIO interface, DeleteFiles implementation, and all
bulk delete caller wiring from this PR. Keep walkDirectory fallback to
original reflect-based implementation for IO types that don't implement
ListableIO yet.
@tanmayrauth
Copy link
Copy Markdown
Contributor Author

@zeroshade Addressed all review feedback:

  • Removed BulkRemovableIO interface and all bulk delete wiring — deferred to feat(io): add BulkRemovableIO interface(#914) #916
  • walkDirectory prefers ListableIO when available, falls back to the original reflect-based implementation for IO types that don't implement it yet — no behavioral change for existing users
  • blobFileIO.WalkDir uses url.JoinPath and preserves Azure URI userinfo
  • Removed unnecessary empty cleanRoot guard from LocalFS.WalkDir

This PR is now self-contained: ListableIO interface + implementations on LocalFS and blobFileIO + fallback to original code. No dependency on #916.

Copy link
Copy Markdown
Contributor

@laskoviymishka laskoviymishka left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

lgtm!

Copy link
Copy Markdown
Member

@zeroshade zeroshade left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

LGTM Thanks!

@zeroshade zeroshade merged commit eb37c1a into apache:main Apr 22, 2026
14 checks passed
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Labels

None yet

Projects

None yet

Development

Successfully merging this pull request may close these issues.

Add ListableIO interface to replace reflect-based directory walking

3 participants