fix(storage): default S3 addressing_style to virtual for custom endpoints#950
fix(storage): default S3 addressing_style to virtual for custom endpoints#950
Conversation
…ints Boto3's 'auto' default picks path-style addressing whenever endpoint_url is set, which breaks S3-compatible providers that require virtual-hosted style. On CAIOS (CoreWeave Object Storage), R2-strict, and Wasabi-strict, this caused HeadBucket and GetBucketLocation to fail with 400, blocking osmo credential set, dataset/data uploads, and the upload preflight check. In _get_boto_config, set addressing_style='virtual' whenever endpoint_url is provided. AWS S3 (no endpoint_url) is unaffected. The TOS branch is folded into the same shape (still hardcoded virtual). An OSMO_S3_ADDRESSING_STYLE env var is provided as an escape hatch for legacy buckets whose names are not DNS-compliant under virtual-hosted addressing. Verified against a real CAIOS bucket via aws s3api head-bucket: path-style returned 400, virtual-hosted returned 200 with BucketRegion populated. Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com> Signed-off-by: Hans Arnholm <harnholm@nvidia.com>
|
Note Reviews pausedIt looks like this branch is under active development. To avoid overwhelming you with review comments due to an influx of new commits, CodeRabbit has automatically paused this review. You can configure this behavior by changing the Use the following commands to manage reviews:
Use the checkboxes below for quick actions:
No actionable comments were generated in the recent review. 🎉 ℹ️ Recent review info⚙️ Run configurationConfiguration used: Path: .coderabbit.yaml Review profile: CHILL Plan: Enterprise Run ID: 📒 Files selected for processing (4)
🚧 Files skipped from review as they are similar to previous changes (4)
📝 WalkthroughWalkthroughAdds S3 addressing-style handling end-to-end: new credential field and env var ( ChangesS3 addressing-style propagation and endpoint-aware client config
Sequence Diagram(s)sequenceDiagram
participant Env as Env (OSMO_S3_ADDRESSING_STYLE / AWS_S3_FORCE_PATH_STYLE)
participant Cred as Credential (addressing_style)
participant Resolver as Resolver (_get_s3_addressing_style)
participant ConfigBuilder as _get_boto_config
participant Client as create_client
participant Boto as boto3
participant S3 as S3 Endpoint
Env->>Resolver: provide env values
Cred->>Resolver: provide credential.addressing_style
Resolver-->>ConfigBuilder: resolved addressing_style (or None)
ConfigBuilder-->>Client: build client config (may include s3.addressing_style)
Client->>Boto: boto3.client(endpoint_url, config)
Boto->>S3: HTTP request (uses addressing style + endpoint)
S3-->>Boto: response
Boto-->>Client: client returned
Estimated code review effort🎯 4 (Complex) | ⏱️ ~45 minutes Poem
🚥 Pre-merge checks | ✅ 4 | ❌ 1❌ Failed checks (1 warning)
✅ Passed checks (4 passed)
✏️ Tip: You can configure your own custom pre-merge checks in the settings. ✨ Finishing Touches📝 Generate docstrings
🧪 Generate unit tests (beta)
Warning There were issues while running some tools. Please review the errors and either fix the tool's configuration or disable the tool if it's a critical failure. 🔧 golangci-lint (2.12.1)level=error msg="[linters_context] typechecking error: pattern ./...: directory prefix . does not contain main module or its selected dependencies" Tip 💬 Introducing Slack Agent: The best way for teams to turn conversations into code.Slack Agent is built on CodeRabbit's deep understanding of your code, so your team can collaborate across the entire SDLC without losing context.
Built for teams:
One agent for your entire SDLC. Right inside Slack. Comment |
… override_url
Two follow-ups to the addressing-style change:
1. Honor AWS_S3_FORCE_PATH_STYLE (Codex P1). The quick-start chart and any
localstack/MinIO-style deployment relies on path-style addressing because
K8s service DNS doesn't do wildcard SRV. Boto3 doesn't honor that env
var natively, so the previous default ('virtual' for any custom endpoint)
would have broken those deployments. Resolution order in
_get_s3_addressing_style is now: OSMO_S3_ADDRESSING_STYLE override,
AWS_S3_FORCE_PATH_STYLE, then virtual-for-custom default.
2. Route S3Backend.region()'s GetBucketLocation client through the
credential's override_url (Codex P2). When data_cred.region is unset and
override_url is configured, the precheck previously created a client
pointed at AWS S3 and called get_bucket_location for a bucket that lives
on the custom backend, returning NoSuchBucket. Now the client targets the
correct endpoint.
Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
Signed-off-by: Hans Arnholm <harnholm@nvidia.com>
Codecov Report❌ Patch coverage is Additional details and impacted files@@ Coverage Diff @@
## main #950 +/- ##
==========================================
- Coverage 44.51% 42.83% -1.68%
==========================================
Files 211 217 +6
Lines 27056 28419 +1363
Branches 4235 4247 +12
==========================================
+ Hits 12044 12174 +130
- Misses 14374 15608 +1234
+ Partials 638 637 -1
Flags with carried forward coverage won't be shown. Click here to find out more.
🚀 New features to boost your workflow:
|
… runtime
Adds a per-credential addressing_style field and routes override_url all the
way to the Go runtime mount path, so workflows that mount S3-compatible
buckets (CAIOS, Cloudflare R2 strict, Wasabi strict) succeed end-to-end
rather than only at the CLI/service layer.
Two latent bugs in the runtime path that this also fixes:
1. The Go DataCredential struct had no OverrideUrl yaml tag, so the
override_url already being written by to_decrypted_dict was silently
dropped during yaml.Unmarshal in osmo-ctrl. mount-s3 was then targeting
AWS S3 even when the credential pointed at a custom endpoint.
2. mount-s3 was invoked with --force-path-style for every non-TOS scheme.
Custom-endpoint S3-compatible providers that require virtual-hosted
style (CAIOS) had no way to opt out.
Resolution order in both Python (_get_s3_addressing_style) and Go
(shouldForcePathStyle):
1. Credential addressing_style (per-endpoint override).
2. OSMO_S3_ADDRESSING_STYLE env var (operator override).
3. AWS_S3_FORCE_PATH_STYLE env var (chart/localstack/MinIO convention).
4. Heuristic default: virtual for S3 with override_url, path for plain
AWS S3 (preserved), path for Swift/GS (preserved).
The addressing_style field is plumbed:
CLI _save_config → ~/.config/osmo/config.yaml
→ POST /api/credentials/<n> → UserDataCredential.to_db_row (hstore,
additive — no DB migration needed)
→ workflow runtime YAML via to_decrypted_dict + create_config_dict
→ Go DataCredential struct + buildMountCommandArgs
→ mount-s3 --force-path-style flag (or its absence)
Python _get_s3_addressing_style is now scheme-gated: it only applies the
virtual-for-custom-endpoint default to s3:// backends, leaving Swift/GS
to boto3's resolver.
Tests added across both languages cover credential override, env var
resolution order, scheme-gating, TOS/Swift/AWS regression guards, and
local-config round-trip of addressing_style.
Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
Signed-off-by: Hans Arnholm <harnholm@nvidia.com>
There was a problem hiding this comment.
Actionable comments posted: 2
🤖 Prompt for all review comments with AI agents
Verify each finding against current code. Fix only still-valid issues, skip the
rest with a brief reason, keep changes minimal, and validate.
Inline comments:
In `@src/lib/data/storage/backends/s3.py`:
- Around line 121-129: The override returned from OSMO_S3_ADDRESSING_STYLE
should be normalized and validated before passing to botocore: in the block that
reads override (when scheme == common.StorageBackendType.S3.value and override
is set), trim whitespace and lowercase the string (e.g., override =
override.strip().lower()), accept only the supported values (e.g., 'virtual' and
'path') and return the normalized value; if the normalized value is not
supported, raise or surface an explicit error instead of returning the raw
string. Ensure this logic is applied where addressing_style and override are
handled and keep the existing AWS_S3_FORCE_PATH_STYLE fallback unchanged.
In `@src/runtime/pkg/data/data.go`:
- Around line 60-65: The code currently only recognizes "path" and "virtual"
(s3AddressingStylePath, s3AddressingStyleVirtual) while Python advertises
"auto", so add explicit handling: introduce a constant s3AddressingStyleAuto =
"auto" and update the switch/validation logic (the block that reads
osmoS3AddressingStyle / AWS_S3_FORCE_PATH_STYLE and the code around the switch
at lines handling 85-107) to reject "auto" with a clear error (or normalize it
explicitly) before accepting the credential; specifically, detect when
DataCredential.AddressingStyle (or the env osmoS3AddressingStyle) equals "auto"
and return a validation error explaining that "auto" is not supported by the
runtime, or alternatively implement the explicit normalizing branch for
s3AddressingStyleAuto so runtime and Python behave identically.
🪄 Autofix (Beta)
Fix all unresolved CodeRabbit comments on this PR:
- Push a commit to this branch (recommended)
- Create a new PR with the fixes
ℹ️ Review info
⚙️ Run configuration
Configuration used: Path: .coderabbit.yaml
Review profile: CHILL
Plan: Enterprise
Run ID: 79bc5508-cb52-47a4-a15d-7594bae61660
📒 Files selected for processing (11)
src/cli/credential.pysrc/cli/tests/test_credential.pysrc/lib/data/storage/backends/s3.pysrc/lib/data/storage/backends/tests/test_backends.pysrc/lib/data/storage/credentials/credentials.pysrc/runtime/pkg/data/BUILDsrc/runtime/pkg/data/data.gosrc/runtime/pkg/data/data_test.gosrc/runtime/pkg/data/input_output.gosrc/service/core/workflow/objects.pysrc/utils/connectors/postgres.py
✅ Files skipped from review due to trivial changes (2)
- src/runtime/pkg/data/input_output.go
- src/cli/credential.py
🚧 Files skipped from review as they are similar to previous changes (1)
- src/lib/data/storage/backends/tests/test_backends.py
…style Pylint W0621: the local 'override = os.getenv(OSMO_S3_ADDRESSING_STYLE)' shadowed the 'override' decorator imported from typing_extensions at the module level. Rename the local to 'osmo_override' to make the lint clean and to better describe what the value is. Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com> Signed-off-by: Hans Arnholm <harnholm@nvidia.com>
The env-var override was previously returned raw to botocore. A typo like
'virtua' or stray whitespace would silently flow into Config(s3=...) and
either be ignored or fail in an opaque way at client-creation time.
Trim + lowercase the value and validate against the supported set
('virtual', 'path', 'auto'); raise ValueError naming the offending raw
value on miss so the failure surfaces immediately at the source.
The credential-level addressing_style parameter is left as-is — the
credential schema already declares Literal['virtual', 'path', 'auto'] | None
and pydantic enforces it case-sensitively at construction. The
AWS_S3_FORCE_PATH_STYLE fallback is also unchanged; it self-normalizes.
Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
Signed-off-by: Hans Arnholm <harnholm@nvidia.com>
The credential schema (Literal['virtual','path','auto']) accepts 'auto' for parity with boto3, but mount-s3 has no auto-detect knob and the runtime silently fell through 'auto' to the env-var/OverrideUrl heuristic. Make that intent explicit: add s3AddressingStyleAuto and an empty switch case with a comment that documents *why* we do not mirror boto3's 'auto' (boto3's 'auto' picks path-style for custom endpoints, which is the very behavior that breaks CAIOS). Behavior is unchanged. The case is documentation-only — a future reader seeing 'auto' in the schema can immediately see how the runtime treats it without having to derive it from the fall-through. Add a regression test that pins this: a credential with override_url and addressing_style='auto' must NOT emit --force-path-style. Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com> Signed-off-by: Hans Arnholm <harnholm@nvidia.com>
cc091ff to
f0ded4b
Compare
Boto3's 'auto' default picks path-style addressing whenever endpoint_url is set, which breaks S3-compatible providers that require virtual-hosted style. On CAIOS (CoreWeave Object Storage), R2-strict, and Wasabi-strict, this caused HeadBucket and GetBucketLocation to fail with 400, blocking osmo credential set, dataset/data uploads, and the upload preflight check.
In _get_boto_config, set addressing_style='virtual' whenever endpoint_url is provided. AWS S3 (no endpoint_url) is unaffected. The TOS branch is folded into the same shape (still hardcoded virtual). An OSMO_S3_ADDRESSING_STYLE env var is provided as an escape hatch for legacy buckets whose names are not DNS-compliant under virtual-hosted addressing.
Verified against a real CAIOS bucket via aws s3api head-bucket: path-style returned 400, virtual-hosted returned 200 with BucketRegion populated.
Description
Issue #940
Checklist
Summary by CodeRabbit