fix(gocloud): support cross-bucket IO for write.metadata.path by cassio-paesleme · Pull Request #1078 · apache/iceberg-go

cassio-paesleme · 2026-05-13T21:33:21Z

Summary

The blob FileIO implementation assumes all file operations target a single S3 bucket (the warehouse bucket opened at init time). When Iceberg's write.metadata.path table property points to a different bucket (e.g. a dedicated versioned metadata bucket), defaultKeyExtractor strips the wrong bucket prefix and the file lands in the warehouse bucket under a mangled key. Readers following the absolute S3 URI in metadata.json get a 404.

Concrete failure mode: Setting write.metadata.path = s3://metadata-bucket/db/table/ causes manifest lists (snap-*.avro) and manifest files (*-m*.avro) to be written to s3://warehouse-bucket/metadata-bucket/db/table/snap-*.avro instead of s3://metadata-bucket/db/table/snap-*.avro. The metadata.json records the correct URI, but the bytes are in the wrong place.

Changes

Add resolveBucket() to blobFileIO which parses the full S3 URI and routes to the correct bucket
Primary bucket (warehouse) uses the fast path with no map lookup
Secondary buckets are opened lazily via a BucketOpener callback and cached with sync.RWMutex
Update Open, NewWriter, WriteFile, Remove, and DeleteFiles to use resolveBucket
Wire S3 scheme registration to pass a BucketOpener that reuses the same AWS config
Backward compatible: callers without a BucketOpener get the same legacy behavior

Test plan

TestMultiBucketWriteAndRead - writes to two memblob buckets via different S3 URIs, verifies files land in the correct bucket and can be read back
TestMultiBucketDelete - verifies Remove and DeleteFiles route to the correct bucket
TestMultiBucketOpenerCaching - verifies the opener is called once per bucket name
TestMultiBucketFallbackWithoutOpener - verifies backward compatibility when no opener is set
All existing tests pass unchanged

🤖 Generated with Claude Code

The blob FileIO implementation assumes all file operations target a single S3 bucket (the warehouse bucket). When Iceberg's write.metadata.path table property points to a different bucket, the defaultKeyExtractor strips the wrong prefix and the file lands in the warehouse bucket under a mangled key. Readers following the absolute URI in metadata.json get a 404 because the file is not where they expect it. Add multi-bucket support to blobFileIO: - Add resolveBucket() which parses the full S3 URI, routes to the primary bucket for same-bucket paths, and lazily opens secondary buckets via a BucketOpener callback for cross-bucket paths. - Update Open, NewWriter, WriteFile, Remove, and DeleteFiles to use resolveBucket instead of always using the embedded primary bucket. - Wire S3 scheme registration to use createMultiBucketBlobFS with a BucketOpener that reuses the same AWS config for secondary buckets. - Secondary bucket handles are cached and safe for concurrent use. - Backward compatible: callers that don't set BucketOpener get the same legacy behavior (key extraction falls through to primary bucket). Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>

Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>

laskoviymishka

Thanks for picking this up — write.metadata.path on a separate bucket is a real production gap. The routing shape with resolveBucket + per-bucket cache + BucketOpener callback also looks like the right direction.

I’d hold before merging though.

Main concern: the read path still doesn’t look fixed. blobOpenFile.ReadAt still goes to the primary bucket, so any cross-bucket Parquet / Avro / Puffin read will fail on the first range request. The new TestMultiBucketWriteAndRead doesn’t catch this because io.ReadAll uses Read, not ReadAt.

A few other things I’d sort out before merge:

WalkDir still hardcodes the primary bucket, so orphan-file enumeration over write.metadata.path is still wrong.
Cached secondary buckets are never closed, so long-running processes can leak HTTP transports.
The fix is S3-only. GCS and Azure still call createBlobFS, so they keep the original mangled-key bug. I’d either extend the fix to those backends or call out the S3-only scope clearly and file a follow-up.

One smaller thing I wouldn’t block on: TestMultiBucketFallbackWithoutOpener currently asserts the broken legacy behavior as a contract. Worth either expecting an error, or renaming the test to make that intent explicit.

Once those are sorted, happy to take another pass.

laskoviymishka · 2026-05-14T18:36:23Z

+	name := filepath.Base(key)

-	r, err := bfs.NewReader(bfs.ctx, key, nil)
+	r, err := bucket.NewReader(bfs.ctx, key, nil)


I think there's a subtle issue — Open now routes through the resolved bucket here, but blobOpenFile.ReadAt (line 49) still calls f.b.NewRangeReader(...), which dispatches to the embedded primary bucket. So any cross-bucket file opened here fails on the first ReadAt (404, or worse if a same-named key exists in the primary), and that's the path Parquet column readers and the puffin/DV reader actually take. TestMultiBucketWriteAndRead passes only because io.ReadAll uses Read.

Could we thread bucket into blobOpenFile and have ReadAt use f.bucket.NewRangeReader(...)? A ReadAt-specific test would help lock it in. wdyt?

laskoviymishka · 2026-05-14T18:36:23Z

+	bucketOpener  BucketOpener
+
+	mu      sync.RWMutex
+	buckets map[string]*blob.Bucket


These secondary buckets are never closed — blobFileIO only inherits Close() from the embedded primary. Each cached *blob.Bucket holds its own HTTP transport / connection pool, so a long-running process that touches many distinct metadata buckets will leak transports indefinitely.

I'd add a Close() on blobFileIO that walks bfs.buckets under the lock and closes each before closing the primary. wdyt?

laskoviymishka · 2026-05-14T18:36:23Z

+	}
 }

 func (bfs *blobFileIO) WalkDir(root string, fn fs.WalkDirFunc) error {


WalkDir is the one IO entry point that doesn't go through resolveBucket — it still hardcodes fs.WalkDir(bfs.Bucket, ...). So if root names a secondary bucket (e.g., orphan-file enumeration on write.metadata.path), this silently walks the primary warehouse bucket and returns the wrong listing.

Could we route this through resolveBucket too? Probably worth a small test covering the cross-bucket root.

laskoviymishka · 2026-05-14T18:36:23Z

+			return createS3Bucket(ctx, u, props)
+		}
+
+		return createMultiBucketBlobFS(ctx, bucket, parsed.Host, opener), nil


The fix lands for S3 here, but registerGCSScheme (line 64) and registerAzureSchemes (line 76) still call createBlobFS, so a GCS or ADLS user with write.metadata.path on a separate bucket still hits the mangled-key bug we're fixing. PyIceberg and Java handle cross-bucket GCS transparently, so tables written by them become unreadable from Go in that layout.

Could we either add BucketOpener impls for GCS/Azure here, or call out the S3-only scope in the PR description with a follow-up tracked? wdyt?

cassio-paesleme and others added 3 commits May 13, 2026 16:32

style: add blank lines before returns to satisfy nlreturn

92d2d3c

Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>

style: fix remaining nlreturn violations in tests and register

62f10e1

Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>

cassio-paesleme marked this pull request as ready for review May 14, 2026 14:23

cassio-paesleme requested a review from zeroshade as a code owner May 14, 2026 14:23

laskoviymishka requested changes May 14, 2026

View reviewed changes

Provide feedback

Saved searches

Use saved searches to filter your results more quickly

fix(gocloud): support cross-bucket IO for write.metadata.path#1078

fix(gocloud): support cross-bucket IO for write.metadata.path#1078
cassio-paesleme wants to merge 3 commits into
apache:mainfrom
cassio-paesleme:fix/multi-bucket-blob-io

cassio-paesleme commented May 13, 2026

Uh oh!

laskoviymishka left a comment

Uh oh!

laskoviymishka May 14, 2026

Uh oh!

laskoviymishka May 14, 2026

Uh oh!

laskoviymishka May 14, 2026

Uh oh!

laskoviymishka May 14, 2026

Uh oh!

Reviewers

Assignees

Labels

Projects

Milestone

Development

Uh oh!

2 participants

Conversation

cassio-paesleme commented May 13, 2026

Summary

Changes

Test plan

Uh oh!

laskoviymishka left a comment

Choose a reason for hiding this comment

Uh oh!

laskoviymishka May 14, 2026

Choose a reason for hiding this comment

Uh oh!

laskoviymishka May 14, 2026

Choose a reason for hiding this comment

Uh oh!

laskoviymishka May 14, 2026

Choose a reason for hiding this comment

Uh oh!

laskoviymishka May 14, 2026

Choose a reason for hiding this comment

Uh oh!

Reviewers

Assignees

Labels

Projects

Milestone

Development

Uh oh!

2 participants