Skip to content

fix(storage): default S3 addressing_style to virtual for custom endpoints#950

Merged
cypres merged 6 commits intomainfrom
fix/s3-addressing-style-virtual
May 5, 2026
Merged

fix(storage): default S3 addressing_style to virtual for custom endpoints#950
cypres merged 6 commits intomainfrom
fix/s3-addressing-style-virtual

Conversation

@cypres
Copy link
Copy Markdown
Member

@cypres cypres commented May 4, 2026

Boto3's 'auto' default picks path-style addressing whenever endpoint_url is set, which breaks S3-compatible providers that require virtual-hosted style. On CAIOS (CoreWeave Object Storage), R2-strict, and Wasabi-strict, this caused HeadBucket and GetBucketLocation to fail with 400, blocking osmo credential set, dataset/data uploads, and the upload preflight check.

In _get_boto_config, set addressing_style='virtual' whenever endpoint_url is provided. AWS S3 (no endpoint_url) is unaffected. The TOS branch is folded into the same shape (still hardcoded virtual). An OSMO_S3_ADDRESSING_STYLE env var is provided as an escape hatch for legacy buckets whose names are not DNS-compliant under virtual-hosted addressing.

Verified against a real CAIOS bucket via aws s3api head-bucket: path-style returned 400, virtual-hosted returned 200 with BucketRegion populated.

Description

Issue #940

Checklist

  • I am familiar with the Contributing Guidelines.
  • New or existing tests cover these changes.
  • The documentation is up to date with these changes.

Summary by CodeRabbit

  • New Features
    • Global and per-credential S3 addressing-style (virtual/path/auto) with env var support and CLI/config persistence; mount tooling and runtime honor addressing-style when building S3 mount commands.
  • Bug Fixes
    • Region lookup now targets a credential’s configured endpoint, fixing lookups for non-AWS S3-compatible backends.
  • Tests
    • Added comprehensive tests for addressing-style precedence, validation, config persistence, client config behavior, and mount command argument generation.

…ints

Boto3's 'auto' default picks path-style addressing whenever endpoint_url is
set, which breaks S3-compatible providers that require virtual-hosted style.
On CAIOS (CoreWeave Object Storage), R2-strict, and Wasabi-strict, this
caused HeadBucket and GetBucketLocation to fail with 400, blocking
osmo credential set, dataset/data uploads, and the upload preflight check.

In _get_boto_config, set addressing_style='virtual' whenever endpoint_url is
provided. AWS S3 (no endpoint_url) is unaffected. The TOS branch is folded
into the same shape (still hardcoded virtual). An OSMO_S3_ADDRESSING_STYLE
env var is provided as an escape hatch for legacy buckets whose names are
not DNS-compliant under virtual-hosted addressing.

Verified against a real CAIOS bucket via aws s3api head-bucket: path-style
returned 400, virtual-hosted returned 200 with BucketRegion populated.

Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
Signed-off-by: Hans Arnholm <harnholm@nvidia.com>
@cypres cypres requested a review from a team as a code owner May 4, 2026 22:32
@coderabbitai
Copy link
Copy Markdown

coderabbitai Bot commented May 4, 2026

Note

Reviews paused

It looks like this branch is under active development. To avoid overwhelming you with review comments due to an influx of new commits, CodeRabbit has automatically paused this review. You can configure this behavior by changing the reviews.auto_review.auto_pause_after_reviewed_commits setting.

Use the following commands to manage reviews:

  • @coderabbitai resume to resume automatic reviews.
  • @coderabbitai review to trigger a single review.

Use the checkboxes below for quick actions:

  • ▶️ Resume reviews
  • 🔍 Trigger review

No actionable comments were generated in the recent review. 🎉

ℹ️ Recent review info
⚙️ Run configuration

Configuration used: Path: .coderabbit.yaml

Review profile: CHILL

Plan: Enterprise

Run ID: b833e258-c7f1-4ee6-83ff-946f12e74473

📥 Commits

Reviewing files that changed from the base of the PR and between 28431da and f0ded4b.

📒 Files selected for processing (4)
  • src/lib/data/storage/backends/s3.py
  • src/lib/data/storage/backends/tests/test_backends.py
  • src/runtime/pkg/data/data.go
  • src/runtime/pkg/data/data_test.go
🚧 Files skipped from review as they are similar to previous changes (4)
  • src/lib/data/storage/backends/s3.py
  • src/lib/data/storage/backends/tests/test_backends.py
  • src/runtime/pkg/data/data_test.go
  • src/runtime/pkg/data/data.go

📝 Walkthrough

Walkthrough

Adds S3 addressing-style handling end-to-end: new credential field and env var (OSMO_S3_ADDRESSING_STYLE), resolver and normalization logic, boto3 config wiring, region precheck targeting credential endpoints, CLI/config persistence, runtime mount-arg changes, and tests in Python and Go.

Changes

S3 addressing-style propagation and endpoint-aware client config

Layer / File(s) Summary
Data shape / config
src/lib/data/storage/credentials/credentials.py, src/runtime/pkg/data/input_output.go, src/service/core/workflow/objects.py
Adds optional addressing_style field to data credential types and YAML/DB payloads; passes it into StaticDataCredential construction.
Env / Constants
src/lib/data/storage/backends/s3.py, src/runtime/pkg/data/data.go
Introduces OSMO_S3_ADDRESSING_STYLE and related addressing-style constants and env parsing helpers.
Env value normalization & resolver
src/lib/data/storage/backends/s3.py
Adds _normalize_env_addressing_style() and _get_s3_addressing_style(scheme, endpoint_url, addressing_style) implementing precedence: credential → OSMO env → AWS_S3_FORCE_PATH_STYLE → endpoint-aware fallback ('virtual' when custom endpoint, None to defer to boto3 when no endpoint).
Boto config construction
src/lib/data/storage/backends/s3.py
Updates _get_boto_config to accept endpoint_url and addressing_style; forces virtual for TOS and only injects s3.addressing_style when resolver returns non-None.
Client wiring / region precheck
src/lib/data/storage/backends/s3.py, src/lib/data/storage/backends/backends.py
create_client now passes endpoint_url and credential addressing_style into _get_boto_config; S3Backend.region() invokes client creation with endpoint_url=data_cred.override_url so bucket-location checks target the credential endpoint.
Credential loading & persistence / CLI
src/lib/data/storage/credentials/credentials.py, src/cli/credential.py, src/cli/tests/test_credential.py
Reads/writes addressing_style from/to config entries; includes it in decrypted credential dicts and CLI save behavior; test added to verify persistence.
Runtime mount command behavior
src/runtime/pkg/data/data.go, src/runtime/pkg/data/input_output.go
Adds helpers to compute --force-path-style for mount-s3 based on backend scheme, credential AddressingStyle, OSMO_S3_ADDRESSING_STYLE, and AWS_S3_FORCE_PATH_STYLE; replaces inline MountURL arg building with buildMountCommandArgs.
Runtime tests & BUILD
src/runtime/pkg/data/data_test.go, src/runtime/pkg/data/BUILD
Adds Go tests covering mount args and addressing-style rules across backends; adds go_test target.
Backend tests
src/lib/data/storage/backends/tests/test_backends.py
Adds unit tests for _get_s3_addressing_style, _get_boto_config behavior, S3 region precheck routing, and credential dict export.
Connector wiring
src/utils/connectors/postgres.py
Includes addressing_style when resolving bucket credentials into credentials.StaticDataCredential.

Sequence Diagram(s)

sequenceDiagram
    participant Env as Env (OSMO_S3_ADDRESSING_STYLE / AWS_S3_FORCE_PATH_STYLE)
    participant Cred as Credential (addressing_style)
    participant Resolver as Resolver (_get_s3_addressing_style)
    participant ConfigBuilder as _get_boto_config
    participant Client as create_client
    participant Boto as boto3
    participant S3 as S3 Endpoint

    Env->>Resolver: provide env values
    Cred->>Resolver: provide credential.addressing_style
    Resolver-->>ConfigBuilder: resolved addressing_style (or None)
    ConfigBuilder-->>Client: build client config (may include s3.addressing_style)
    Client->>Boto: boto3.client(endpoint_url, config)
    Boto->>S3: HTTP request (uses addressing style + endpoint)
    S3-->>Boto: response
    Boto-->>Client: client returned
Loading

Estimated code review effort

🎯 4 (Complex) | ⏱️ ~45 minutes

Poem

🐰
I trimmed the env and hopped through code,
Path or virtual where endpoints goad,
Clients now heed the credential's cue,
Mounts mount true and tests prove too,
A little hop — addressing styles in view.

🚥 Pre-merge checks | ✅ 4 | ❌ 1

❌ Failed checks (1 warning)

Check name Status Explanation Resolution
Docstring Coverage ⚠️ Warning Docstring coverage is 57.41% which is insufficient. The required threshold is 80.00%. Write docstrings for the functions missing them to satisfy the coverage threshold.
✅ Passed checks (4 passed)
Check name Status Explanation
Description Check ✅ Passed Check skipped - CodeRabbit’s high-level summary is enabled.
Title check ✅ Passed The title accurately summarizes the main change: defaulting S3 addressing_style to virtual for custom endpoints, which is the core fix addressing the boto3 'auto' default behavior issue.
Linked Issues check ✅ Passed Check skipped because no linked issues were found for this pull request.
Out of Scope Changes check ✅ Passed Check skipped because no linked issues were found for this pull request.

✏️ Tip: You can configure your own custom pre-merge checks in the settings.

✨ Finishing Touches
📝 Generate docstrings
  • Create stacked PR
  • Commit on current branch
🧪 Generate unit tests (beta)
  • Create PR with unit tests
  • Commit unit tests in branch fix/s3-addressing-style-virtual

Warning

There were issues while running some tools. Please review the errors and either fix the tool's configuration or disable the tool if it's a critical failure.

🔧 golangci-lint (2.12.1)

level=error msg="[linters_context] typechecking error: pattern ./...: directory prefix . does not contain main module or its selected dependencies"

Tip

💬 Introducing Slack Agent: The best way for teams to turn conversations into code.

Slack Agent is built on CodeRabbit's deep understanding of your code, so your team can collaborate across the entire SDLC without losing context.

  • Generate code and open pull requests
  • Plan features and break down work
  • Investigate incidents and troubleshoot customer tickets together
  • Automate recurring tasks and respond to alerts with triggers
  • Summarize progress and report instantly

Built for teams:

  • Shared memory across your entire org—no repeating context
  • Per-thread sandboxes to safely plan and execute work
  • Governance built-in—scoped access, auditability, and budget controls

One agent for your entire SDLC. Right inside Slack.

👉 Get started


Comment @coderabbitai help to get the list of available commands and usage tips.

… override_url

Two follow-ups to the addressing-style change:

1. Honor AWS_S3_FORCE_PATH_STYLE (Codex P1). The quick-start chart and any
   localstack/MinIO-style deployment relies on path-style addressing because
   K8s service DNS doesn't do wildcard SRV. Boto3 doesn't honor that env
   var natively, so the previous default ('virtual' for any custom endpoint)
   would have broken those deployments. Resolution order in
   _get_s3_addressing_style is now: OSMO_S3_ADDRESSING_STYLE override,
   AWS_S3_FORCE_PATH_STYLE, then virtual-for-custom default.

2. Route S3Backend.region()'s GetBucketLocation client through the
   credential's override_url (Codex P2). When data_cred.region is unset and
   override_url is configured, the precheck previously created a client
   pointed at AWS S3 and called get_bucket_location for a bucket that lives
   on the custom backend, returning NoSuchBucket. Now the client targets the
   correct endpoint.

Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
Signed-off-by: Hans Arnholm <harnholm@nvidia.com>
@codecov
Copy link
Copy Markdown

codecov Bot commented May 4, 2026

Codecov Report

❌ Patch coverage is 82.47423% with 17 lines in your changes missing coverage. Please review.
✅ Project coverage is 42.83%. Comparing base (aedb412) to head (f0ded4b).

Files with missing lines Patch % Lines
src/runtime/pkg/data/data.go 76.36% 13 Missing ⚠️
src/lib/data/storage/credentials/credentials.py 66.66% 1 Missing and 1 partial ⚠️
src/service/core/workflow/objects.py 0.00% 2 Missing ⚠️
Additional details and impacted files
@@            Coverage Diff             @@
##             main     #950      +/-   ##
==========================================
- Coverage   44.51%   42.83%   -1.68%     
==========================================
  Files         211      217       +6     
  Lines       27056    28419    +1363     
  Branches     4235     4247      +12     
==========================================
+ Hits        12044    12174     +130     
- Misses      14374    15608    +1234     
+ Partials      638      637       -1     
Flag Coverage Δ
backend 43.60% <82.47%> (-1.89%) ⬇️

Flags with carried forward coverage won't be shown. Click here to find out more.

Files with missing lines Coverage Δ
src/cli/credential.py 100.00% <100.00%> (ø)
src/lib/data/storage/backends/backends.py 57.06% <ø> (+2.05%) ⬆️
src/lib/data/storage/backends/s3.py 29.30% <100.00%> (+5.20%) ⬆️
src/runtime/pkg/data/input_output.go 0.00% <ø> (ø)
src/utils/connectors/postgres.py 71.48% <ø> (ø)
src/lib/data/storage/credentials/credentials.py 74.60% <66.66%> (-1.26%) ⬇️
src/service/core/workflow/objects.py 56.81% <0.00%> (-0.21%) ⬇️
src/runtime/pkg/data/data.go 8.57% <76.36%> (ø)

... and 4 files with indirect coverage changes

🚀 New features to boost your workflow:
  • ❄️ Test Analytics: Detect flaky tests, report on failures, and find test suite problems.
  • 📦 JS Bundle Analysis: Save yourself from yourself by tracking and limiting bundle sizes in JS merges.

… runtime

Adds a per-credential addressing_style field and routes override_url all the
way to the Go runtime mount path, so workflows that mount S3-compatible
buckets (CAIOS, Cloudflare R2 strict, Wasabi strict) succeed end-to-end
rather than only at the CLI/service layer.

Two latent bugs in the runtime path that this also fixes:

1. The Go DataCredential struct had no OverrideUrl yaml tag, so the
   override_url already being written by to_decrypted_dict was silently
   dropped during yaml.Unmarshal in osmo-ctrl. mount-s3 was then targeting
   AWS S3 even when the credential pointed at a custom endpoint.

2. mount-s3 was invoked with --force-path-style for every non-TOS scheme.
   Custom-endpoint S3-compatible providers that require virtual-hosted
   style (CAIOS) had no way to opt out.

Resolution order in both Python (_get_s3_addressing_style) and Go
(shouldForcePathStyle):

  1. Credential addressing_style (per-endpoint override).
  2. OSMO_S3_ADDRESSING_STYLE env var (operator override).
  3. AWS_S3_FORCE_PATH_STYLE env var (chart/localstack/MinIO convention).
  4. Heuristic default: virtual for S3 with override_url, path for plain
     AWS S3 (preserved), path for Swift/GS (preserved).

The addressing_style field is plumbed:

  CLI _save_config → ~/.config/osmo/config.yaml
  → POST /api/credentials/<n> → UserDataCredential.to_db_row (hstore,
    additive — no DB migration needed)
  → workflow runtime YAML via to_decrypted_dict + create_config_dict
  → Go DataCredential struct + buildMountCommandArgs
  → mount-s3 --force-path-style flag (or its absence)

Python _get_s3_addressing_style is now scheme-gated: it only applies the
virtual-for-custom-endpoint default to s3:// backends, leaving Swift/GS
to boto3's resolver.

Tests added across both languages cover credential override, env var
resolution order, scheme-gating, TOS/Swift/AWS regression guards, and
local-config round-trip of addressing_style.

Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
Signed-off-by: Hans Arnholm <harnholm@nvidia.com>
Copy link
Copy Markdown

@coderabbitai coderabbitai Bot left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Actionable comments posted: 2

🤖 Prompt for all review comments with AI agents
Verify each finding against current code. Fix only still-valid issues, skip the
rest with a brief reason, keep changes minimal, and validate.

Inline comments:
In `@src/lib/data/storage/backends/s3.py`:
- Around line 121-129: The override returned from OSMO_S3_ADDRESSING_STYLE
should be normalized and validated before passing to botocore: in the block that
reads override (when scheme == common.StorageBackendType.S3.value and override
is set), trim whitespace and lowercase the string (e.g., override =
override.strip().lower()), accept only the supported values (e.g., 'virtual' and
'path') and return the normalized value; if the normalized value is not
supported, raise or surface an explicit error instead of returning the raw
string. Ensure this logic is applied where addressing_style and override are
handled and keep the existing AWS_S3_FORCE_PATH_STYLE fallback unchanged.

In `@src/runtime/pkg/data/data.go`:
- Around line 60-65: The code currently only recognizes "path" and "virtual"
(s3AddressingStylePath, s3AddressingStyleVirtual) while Python advertises
"auto", so add explicit handling: introduce a constant s3AddressingStyleAuto =
"auto" and update the switch/validation logic (the block that reads
osmoS3AddressingStyle / AWS_S3_FORCE_PATH_STYLE and the code around the switch
at lines handling 85-107) to reject "auto" with a clear error (or normalize it
explicitly) before accepting the credential; specifically, detect when
DataCredential.AddressingStyle (or the env osmoS3AddressingStyle) equals "auto"
and return a validation error explaining that "auto" is not supported by the
runtime, or alternatively implement the explicit normalizing branch for
s3AddressingStyleAuto so runtime and Python behave identically.
🪄 Autofix (Beta)

Fix all unresolved CodeRabbit comments on this PR:

  • Push a commit to this branch (recommended)
  • Create a new PR with the fixes

ℹ️ Review info
⚙️ Run configuration

Configuration used: Path: .coderabbit.yaml

Review profile: CHILL

Plan: Enterprise

Run ID: 79bc5508-cb52-47a4-a15d-7594bae61660

📥 Commits

Reviewing files that changed from the base of the PR and between 147d8c3 and 1fd8af4.

📒 Files selected for processing (11)
  • src/cli/credential.py
  • src/cli/tests/test_credential.py
  • src/lib/data/storage/backends/s3.py
  • src/lib/data/storage/backends/tests/test_backends.py
  • src/lib/data/storage/credentials/credentials.py
  • src/runtime/pkg/data/BUILD
  • src/runtime/pkg/data/data.go
  • src/runtime/pkg/data/data_test.go
  • src/runtime/pkg/data/input_output.go
  • src/service/core/workflow/objects.py
  • src/utils/connectors/postgres.py
✅ Files skipped from review due to trivial changes (2)
  • src/runtime/pkg/data/input_output.go
  • src/cli/credential.py
🚧 Files skipped from review as they are similar to previous changes (1)
  • src/lib/data/storage/backends/tests/test_backends.py

Comment thread src/lib/data/storage/backends/s3.py
Comment thread src/runtime/pkg/data/data.go
…style

Pylint W0621: the local 'override = os.getenv(OSMO_S3_ADDRESSING_STYLE)'
shadowed the 'override' decorator imported from typing_extensions at the
module level. Rename the local to 'osmo_override' to make the lint clean
and to better describe what the value is.

Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
Signed-off-by: Hans Arnholm <harnholm@nvidia.com>
@cypres cypres temporarily deployed to internal-ci May 4, 2026 23:20 — with GitHub Actions Inactive
cypres and others added 2 commits May 4, 2026 16:24
The env-var override was previously returned raw to botocore. A typo like
'virtua' or stray whitespace would silently flow into Config(s3=...) and
either be ignored or fail in an opaque way at client-creation time.

Trim + lowercase the value and validate against the supported set
('virtual', 'path', 'auto'); raise ValueError naming the offending raw
value on miss so the failure surfaces immediately at the source.

The credential-level addressing_style parameter is left as-is — the
credential schema already declares Literal['virtual', 'path', 'auto'] | None
and pydantic enforces it case-sensitively at construction. The
AWS_S3_FORCE_PATH_STYLE fallback is also unchanged; it self-normalizes.

Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
Signed-off-by: Hans Arnholm <harnholm@nvidia.com>
The credential schema (Literal['virtual','path','auto']) accepts 'auto' for
parity with boto3, but mount-s3 has no auto-detect knob and the runtime
silently fell through 'auto' to the env-var/OverrideUrl heuristic. Make
that intent explicit: add s3AddressingStyleAuto and an empty switch case
with a comment that documents *why* we do not mirror boto3's 'auto'
(boto3's 'auto' picks path-style for custom endpoints, which is the very
behavior that breaks CAIOS).

Behavior is unchanged. The case is documentation-only — a future reader
seeing 'auto' in the schema can immediately see how the runtime treats it
without having to derive it from the fall-through.

Add a regression test that pins this: a credential with override_url and
addressing_style='auto' must NOT emit --force-path-style.

Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
Signed-off-by: Hans Arnholm <harnholm@nvidia.com>
@cypres cypres force-pushed the fix/s3-addressing-style-virtual branch from cc091ff to f0ded4b Compare May 4, 2026 23:30
@cypres cypres temporarily deployed to internal-ci May 4, 2026 23:30 — with GitHub Actions Inactive
@cypres cypres merged commit 4c75379 into main May 5, 2026
12 checks passed
@cypres cypres deleted the fix/s3-addressing-style-virtual branch May 5, 2026 18:02
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Labels

None yet

Projects

None yet

Development

Successfully merging this pull request may close these issues.

3 participants