feat(io): add ListableIO interface to replace reflect-based directory walking (#913) #917
|
@laskoviymishka Please help with the review. |
| func walkDirectory(fsys iceio.IO, root string, fn func(path string, info stdfs.FileInfo) error) error { | ||
| listable, ok := fsys.(iceio.ListableIO) | ||
| if !ok { | ||
| return fmt.Errorf("IO implementation %T does not support directory listing (does not implement ListableIO)", fsys) |
I think it's worth including root in the error.
|
@laskoviymishka Yes, the old reflect-based code had the same bug, but since we're introducing a proper interface this is the right time to fix it. Prefix reconstruction now preserves parsed.User when present. Added TestBlobFileIOWalkDirAzureURI to cover the abfs://container@account.dfs.core.windows.net/... case. |
- Rename RemoveAll to DeleteFiles with ctx and ([]string, error) return
- Pin missing-file and error-shape semantics in interface doc
- Remove BulkRemovableIO caller wiring (deferred to apache#917)
- Restore walkDirectory fallback for non-ListableIO cloud backends
|
@laskoviymishka The BulkRemovableIO interface and all its caller wiring were originally part of #916; I've moved them to this PR. |
| prefix += parsed.Host + "/" | ||
| return fs.WalkDir(bfs.Bucket, walkPath, func(path string, d fs.DirEntry, err error) error { | ||
| return fn(prefix+path, d, err) |
Instead of prefix+path, should we use url.JoinPath or an equivalent?
| parsed, err := url.Parse(root) | ||
| if err != nil { | ||
| return fmt.Errorf("invalid URL %s: %w", root, err) | ||
| } | ||
| walkPath := strings.TrimPrefix(parsed.Path, "/") | ||
| if walkPath == "" { | ||
| walkPath = "." | ||
| } | ||
| prefix := parsed.Scheme + "://" | ||
| if parsed.User != nil { | ||
| prefix += parsed.User.String() + "@" | ||
| } | ||
| prefix += parsed.Host + "/" |
Isn't this pretty much just equivalent to something like:
walkPath := strings.TrimPrefix(parsed.Path, "/")
if walkPath == "" {
walkPath = "."
}
parsed.Path = ""
return fs.WalkDir(bfs.Bucket, walkPath, func(path string, d fs.DirEntry, err error) error {
return fn(parsed.JoinPath(path).String(), d, err)
})
Or am I missing something?
Yes, it's a much cleaner way.
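For reference, a minimal standalone sketch of the JoinPath approach suggested above, using only net/url. The helper name rebuildURI and the sample ADLS URI are illustrative assumptions, not code from this PR. The point is that clearing Path and calling JoinPath keeps the scheme, userinfo (container@), and host without manual string concatenation.

```go
package main

import (
	"fmt"
	"net/url"
)

// rebuildURI is a hypothetical helper (not from the PR). It re-attaches a
// bucket-relative path, as fs.WalkDir would report it, to the scheme,
// userinfo, and host of the root URI.
func rebuildURI(root, rel string) (string, error) {
	parsed, err := url.Parse(root)
	if err != nil {
		return "", fmt.Errorf("invalid URL %s: %w", root, err)
	}
	// Drop the root path: rel is already relative to the bucket root.
	parsed.Path = ""
	parsed.RawPath = ""
	return parsed.JoinPath(rel).String(), nil
}

func main() {
	got, err := rebuildURI(
		"abfs://container@account.dfs.core.windows.net/warehouse",
		"warehouse/data/f.parquet",
	)
	if err != nil {
		panic(err)
	}
	fmt.Println(got)
}
```

Note that JoinPath also cleans ./ and ../ elements from the joined path, which plain prefix+path concatenation does not do.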
| if cleanRoot == "" { | ||
| cleanRoot = "." | ||
| } |
It was a defensive guard for the edge case root == "file://" (no path), where TrimPrefix yields "" and filepath.WalkDir("") returns an error. But a bare file:// with no path isn't a valid table location, so I removed it.
| listable, ok := fsys.(iceio.ListableIO) | ||
| if !ok { | ||
| return fmt.Errorf("IO implementation %T does not support directory listing for %s (does not implement ListableIO)", fsys, root) | ||
| } |
Can we fall back to the original implementation? Does that make sense here? Or is there some other way to implement a fallback?
… walking (apache#913) Add ListableIO optional interface with WalkDir method, implement it on LocalFS and blobFileIO, and replace the reflect hack in orphan cleanup with a type assertion.
Address review feedback:
- Preserve parsed.User (container@) in blob WalkDir prefix reconstruction
- Include root path in ListableIO type assertion error message
- Add test for Azure ADLS URI handling
Add BulkRemovableIO with DeleteFiles(ctx, paths) ([]string, error) and wire bulk delete into deleteFiles() and PostCommit() with ctx propagation, error fallthrough, and empty-paths guard. Items deferred from apache#916 per review feedback.
- Use url.JoinPath instead of manual prefix construction in blob WalkDir
- Remove unnecessary empty root guard in LocalFS.WalkDir
- Add reflect-based fallback in walkDirectory for non-ListableIO impls
cca8f1e to dee05d0
…e#916 Remove BulkRemovableIO interface, DeleteFiles implementation, and all bulk delete caller wiring from this PR. Keep walkDirectory fallback to original reflect-based implementation for IO types that don't implement ListableIO yet.
|
@zeroshade Addressed all review feedback:
This PR is now self-contained: ListableIO interface + implementations on LocalFS and blobFileIO + fallback to original code. No dependency on #916. |
Why: Orphan cleanup's walkDirectory used reflect to reach into blob storage internals (extracting the Bucket field by name). This is fragile — it breaks if the struct changes, bypasses Go's type system, and panics on unexpected IO types.
This PR introduces a ListableIO interface with a WalkDir method so directory walking works through a proper type assertion. Both LocalFS and blobFileIO implement it. Orphan cleanup prefers ListableIO when available, falling back to the original reflect-based implementation for IO types that don't implement it yet. Includes Azure URI userinfo preservation and tests for local, cloud, and Azure paths.
Fixes: #913