Skip to content

fix(gocloud): support cross-bucket IO for write.metadata.path#1078

Open
cassio-paesleme wants to merge 3 commits into
apache:mainfrom
cassio-paesleme:fix/multi-bucket-blob-io
Open

fix(gocloud): support cross-bucket IO for write.metadata.path#1078
cassio-paesleme wants to merge 3 commits into
apache:mainfrom
cassio-paesleme:fix/multi-bucket-blob-io

Conversation

@cassio-paesleme
Copy link
Copy Markdown
Contributor

Summary

The blob FileIO implementation assumes all file operations target a single S3 bucket (the warehouse bucket opened at init time). When Iceberg's write.metadata.path table property points to a different bucket (e.g. a dedicated versioned metadata bucket), defaultKeyExtractor strips the wrong bucket prefix and the file lands in the warehouse bucket under a mangled key. Readers following the absolute S3 URI in metadata.json get a 404.

Concrete failure mode: Setting write.metadata.path = s3://metadata-bucket/db/table/ causes manifest lists (snap-*.avro) and manifest files (*-m*.avro) to be written to s3://warehouse-bucket/metadata-bucket/db/table/snap-*.avro instead of s3://metadata-bucket/db/table/snap-*.avro. The metadata.json records the correct URI, but the bytes are in the wrong place.

Changes

  • Add resolveBucket() to blobFileIO which parses the full S3 URI and routes to the correct bucket
  • Primary bucket (warehouse) uses the fast path with no map lookup
  • Secondary buckets are opened lazily via a BucketOpener callback and cached with sync.RWMutex
  • Update Open, NewWriter, WriteFile, Remove, and DeleteFiles to use resolveBucket
  • Wire S3 scheme registration to pass a BucketOpener that reuses the same AWS config
  • Backward compatible: callers without a BucketOpener get the same legacy behavior

Test plan

  • TestMultiBucketWriteAndRead - writes to two memblob buckets via different S3 URIs, verifies files land in the correct bucket and can be read back
  • TestMultiBucketDelete - verifies Remove and DeleteFiles route to the correct bucket
  • TestMultiBucketOpenerCaching - verifies the opener is called once per bucket name
  • TestMultiBucketFallbackWithoutOpener - verifies backward compatibility when no opener is set
  • All existing tests pass unchanged

🤖 Generated with Claude Code

cassio-paesleme and others added 3 commits May 13, 2026 16:32
The blob FileIO implementation assumes all file operations target a
single S3 bucket (the warehouse bucket). When Iceberg's
write.metadata.path table property points to a different bucket,
the defaultKeyExtractor strips the wrong prefix and the file lands
in the warehouse bucket under a mangled key. Readers following the
absolute URI in metadata.json get a 404 because the file is not
where they expect it.

Add multi-bucket support to blobFileIO:

- Add resolveBucket() which parses the full S3 URI, routes to the
  primary bucket for same-bucket paths, and lazily opens secondary
  buckets via a BucketOpener callback for cross-bucket paths.
- Update Open, NewWriter, WriteFile, Remove, and DeleteFiles to use
  resolveBucket instead of always using the embedded primary bucket.
- Wire S3 scheme registration to use createMultiBucketBlobFS with a
  BucketOpener that reuses the same AWS config for secondary buckets.
- Secondary bucket handles are cached and safe for concurrent use.
- Backward compatible: callers that don't set BucketOpener get the
  same legacy behavior (key extraction falls through to primary bucket).

Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
@cassio-paesleme cassio-paesleme marked this pull request as ready for review May 14, 2026 14:23
Copy link
Copy Markdown
Contributor

@laskoviymishka laskoviymishka left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Thanks for picking this up — write.metadata.path on a separate bucket is a real production gap. The routing shape with resolveBucket + per-bucket cache + BucketOpener callback also looks like the right direction.

I’d hold before merging though.

Main concern: the read path still doesn’t look fixed. blobOpenFile.ReadAt still goes to the primary bucket, so any cross-bucket Parquet / Avro / Puffin read will fail on the first range request. The new TestMultiBucketWriteAndRead doesn’t catch this because io.ReadAll uses Read, not ReadAt.

A few other things I’d sort out before merge:

  • WalkDir still hardcodes the primary bucket, so orphan-file enumeration over write.metadata.path is still wrong.
  • Cached secondary buckets are never closed, so long-running processes can leak HTTP transports.
  • The fix is S3-only. GCS and Azure still call createBlobFS, so they keep the original mangled-key bug. I’d either extend the fix to those backends or call out the S3-only scope clearly and file a follow-up.

One smaller thing I wouldn’t block on: TestMultiBucketFallbackWithoutOpener currently asserts the broken legacy behavior as a contract. Worth either expecting an error, or renaming the test to make that intent explicit.

Once those are sorted, happy to take another pass.

Comment thread io/gocloud/blob.go
name := filepath.Base(key)

r, err := bfs.NewReader(bfs.ctx, key, nil)
r, err := bucket.NewReader(bfs.ctx, key, nil)
Copy link
Copy Markdown
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I think there's a subtle issue — Open now routes through the resolved bucket here, but blobOpenFile.ReadAt (line 49) still calls f.b.NewRangeReader(...), which dispatches to the embedded primary bucket. So any cross-bucket file opened here fails on the first ReadAt (404, or worse if a same-named key exists in the primary), and that's the path Parquet column readers and the puffin/DV reader actually take. TestMultiBucketWriteAndRead passes only because io.ReadAll uses Read.

Could we thread bucket into blobOpenFile and have ReadAt use f.bucket.NewRangeReader(...)? A ReadAt-specific test would help lock it in. wdyt?

Comment thread io/gocloud/blob.go
bucketOpener BucketOpener

mu sync.RWMutex
buckets map[string]*blob.Bucket
Copy link
Copy Markdown
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

These secondary buckets are never closed — blobFileIO only inherits Close() from the embedded primary. Each cached *blob.Bucket holds its own HTTP transport / connection pool, so a long-running process that touches many distinct metadata buckets will leak transports indefinitely.

I'd add a Close() on blobFileIO that walks bfs.buckets under the lock and closes each before closing the primary. wdyt?

Comment thread io/gocloud/blob.go
}
}

func (bfs *blobFileIO) WalkDir(root string, fn fs.WalkDirFunc) error {
Copy link
Copy Markdown
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

WalkDir is the one IO entry point that doesn't go through resolveBucket — it still hardcodes fs.WalkDir(bfs.Bucket, ...). So if root names a secondary bucket (e.g., orphan-file enumeration on write.metadata.path), this silently walks the primary warehouse bucket and returns the wrong listing.

Could we route this through resolveBucket too? Probably worth a small test covering the cross-bucket root.

Comment thread io/gocloud/register.go
return createS3Bucket(ctx, u, props)
}

return createMultiBucketBlobFS(ctx, bucket, parsed.Host, opener), nil
Copy link
Copy Markdown
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

The fix lands for S3 here, but registerGCSScheme (line 64) and registerAzureSchemes (line 76) still call createBlobFS, so a GCS or ADLS user with write.metadata.path on a separate bucket still hits the mangled-key bug we're fixing. PyIceberg and Java handle cross-bucket GCS transparently, so tables written by them become unreadable from Go in that layout.

Could we either add BucketOpener impls for GCS/Azure here, or call out the S3-only scope in the PR description with a follow-up tracked? wdyt?

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Labels

None yet

Projects

None yet

Development

Successfully merging this pull request may close these issues.

2 participants