
[ENHANCEMENT] Improve Amazon Bedrock support#125

Draft
allquixotic wants to merge 3 commits into Zoo-Code-Org:main from allquixotic:zoo-port/bedrock-robust-support

@allquixotic

Related GitHub Issue

Closes: #124

Description

This PR improves Amazon Bedrock support across provider metadata, invocation target resolution, settings UI, and request handling.

Key changes:

  • Adds Bedrock control-plane discovery for foundation models and inference profiles.
  • Expands discovered targets with explicit 1M-context variants where applicable.
  • Improves inference-profile, custom ARN, region, and cross-region/global target handling.
  • Updates Bedrock Claude model metadata, including 200k/1M context and max output token support.
  • Adds a Bedrock max-output-token probe flow for models where AWS does not expose the cap directly.
  • Handles Bedrock strict structured output fallback and caches unsupported structured-output responses.
  • Handles Bedrock thinking payload differences, including adaptive thinking models.
  • Adds focused tests for Bedrock metadata, discovery, max-token probing, structured output, reasoning, and JSON schema behavior.
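The review excerpt further down attributes two behaviours to parseBaseModelId: stripping cross-region prefixes such as `us.` and removing the synthetic `:1m` suffix. The sketch below shows roughly how that parsing could pair with the 1M-variant expansion. It is not the PR's code. The function names, the prefix list, and the shape of the eligibility input are assumptions for illustration.

```typescript
// Hypothetical sketch: names, prefix list, and eligibility input are assumptions,
// not the PR's implementation.
const ONE_M_SUFFIX = ":1m"

// Assumed cross-region / global inference-profile prefixes.
const REGION_PREFIXES = ["us.", "eu.", "apac.", "global."]

// Strip the synthetic ":1m" suffix, then one leading region prefix if present.
function parseBaseModelId(id: string): string {
	let base = id.endsWith(ONE_M_SUFFIX) ? id.slice(0, -ONE_M_SUFFIX.length) : id
	const prefix = REGION_PREFIXES.find((p) => base.startsWith(p))
	if (prefix !== undefined) base = base.slice(prefix.length)
	return base
}

// Return the targets with a ":1m" entry added right after each eligible one.
// Ids that already carry the suffix are never suffixed again, and duplicates are dropped,
// so running the expansion twice yields the same list.
function expandWith1MVariants(targetIds: string[], eligibleBaseIds: ReadonlySet<string>): string[] {
	const seen = new Set<string>()
	const out: string[] = []
	const add = (id: string) => {
		if (!seen.has(id)) {
			seen.add(id)
			out.push(id)
		}
	}
	for (const id of targetIds) {
		add(id)
		if (!id.endsWith(ONE_M_SUFFIX) && eligibleBaseIds.has(parseBaseModelId(id))) {
			add(id + ONE_M_SUFFIX)
		}
	}
	return out
}
```

Because the expansion is idempotent, re-running it on an already expanded list cannot add a second 1M variant.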

Reviewer focus:

  • Bedrock target resolution and whether the selected model/profile is the same target used for probing and runtime calls.
  • 200k versus 1M context behavior. This PR should not regress eligible models to a 128k context window.
  • The strict structured-output fallback path and whether the unsupported-cache duration is acceptable.
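To help reviewers weigh the fallback path, here is a minimal sketch of its general shape. The request first goes out in strict mode. If the target rejects it as unsupported, that result is remembered per target for a fixed TTL, and the request is resent without strict mode. The class and function names, the error classifier, and the injectable clock are hypothetical. The 30-day default only mirrors the cache duration mentioned later in the review.

```typescript
// Hypothetical sketch: names, error classification, and clock injection are assumptions.
const THIRTY_DAYS_MS = 30 * 24 * 60 * 60 * 1000

// Remembers, per target, that strict structured output was rejected, until the TTL lapses.
class UnsupportedStrictOutputCache {
	private readonly expiries = new Map<string, number>()
	private readonly ttlMs: number
	private readonly now: () => number

	constructor(ttlMs: number = THIRTY_DAYS_MS, now: () => number = Date.now) {
		this.ttlMs = ttlMs
		this.now = now
	}

	markUnsupported(targetId: string): void {
		this.expiries.set(targetId, this.now() + this.ttlMs)
	}

	isUnsupported(targetId: string): boolean {
		const expiry = this.expiries.get(targetId)
		if (expiry === undefined) return false
		if (this.now() >= expiry) {
			// Expired: forget the entry so the next request tries strict mode again.
			this.expiries.delete(targetId)
			return false
		}
		return true
	}
}

// Send in strict mode unless the target is cached as unsupported. On an error the caller
// classifies as "strict unsupported", cache that and resend once without strict mode.
async function sendWithStrictFallback<T>(
	cache: UnsupportedStrictOutputCache,
	targetId: string,
	send: (strict: boolean) => Promise<T>,
	isStrictUnsupported: (err: unknown) => boolean,
): Promise<T> {
	if (cache.isUnsupported(targetId)) return send(false)
	try {
		return await send(true)
	} catch (err) {
		if (!isStrictUnsupported(err)) throw err
		cache.markUnsupported(targetId)
		return send(false)
	}
}
```

A long TTL saves one failed request per call. The cost is that a target which gains strict support stays on the non-strict path until its entry expires.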

This PR was prepared with agentic AI assistance during the CRC/Zoo porting work, then reviewed and shaped for contribution to Zoo Code. Before submission, the implementation was also used in production business settings for several weeks.

This is opened as a draft because issue #124 still needs Zoo maintainer approval/assignment under the contribution process.

Test Procedure

The implementation has been exercised in production business use for several weeks.

Reviewers can reproduce the focused local validation with:

pnpm lint
pnpm check-types
pnpm --filter @roo-code/types test -- src/__tests__/bedrock.spec.ts
pnpm --filter zoo-code test -- api/providers/__tests__/bedrock-max-tokens-probe.spec.ts api/providers/__tests__/bedrock-reasoning.spec.ts api/providers/__tests__/bedrock-structured-output.spec.ts shared/__tests__/bedrock-structured-output-cache.spec.ts utils/__tests__/json-schema.spec.ts

Manual verification recommended:

  • Configure Bedrock with an AWS profile and region.
  • Confirm discovered foundation models and inference profiles appear in settings.
  • Select a Claude 4.x profile that supports 1M context and confirm the selected context window remains 1M.
  • Use the max-output-token probe button and confirm the detected cap updates the model settings.
  • Send a tool-using request through Bedrock and confirm normal tool calling still works.

Pre-Submission Checklist

  • Issue Linked: This PR is linked to GitHub Issue [ENHANCEMENT] Improve Amazon Bedrock model discovery, context handling, and output limits #124. It is pending maintainer approval and assignment before it can be marked ready for review.
  • Scope: My changes are focused on the linked issue (one major feature/fix per PR).
  • Self-Review: I have performed a thorough self-review of my code.
  • Testing: New and/or updated tests have been added to cover my changes.
  • Documentation Impact: I have considered if my changes require documentation updates (see "Documentation Updates" section below).
  • Contribution Guidelines: I have read and agree to the Contributor Guidelines.

Screenshots / Videos

TODO: Attach screenshots of the Bedrock settings model/profile selector and max-output-token probe UI if maintainers want UI evidence before review.

Documentation Updates

  • No documentation updates are required.
  • Yes, documentation updates are required. Bedrock user documentation should mention model/profile discovery, custom ARN/profile selection, 1M context selection, and max-output-token probing.

Additional Notes

Related but not duplicate Zoo work:

Those are adjacent but do not cover the provider discovery/context/output-token behavior in this PR.

Get in Touch

Discord: coorbin

@coderabbitai
Contributor

coderabbitai Bot commented May 15, 2026

Important

Review skipped

Draft detected.

Please check the settings in the CodeRabbit UI or the .coderabbit.yaml file in this repository. To trigger a single review, invoke the @coderabbitai review command.

@taltas self-assigned this on May 16, 2026
Contributor

@edelauna left a comment

Thanks for the thorough write-up and the production testing context. This is solid work overall.

One structural ask: this PR is large enough (3,929 additions / 318 deletions across 37 files) and crosses enough boundaries that it is hard to review the model catalog/type changes, runtime request-shaping changes, and settings/control-plane UI changes with confidence in one pass. The max-output-token probe issue is a good example of the risk: UI-discovered state from BedrockMaxTokensProbeButton.tsx / useBedrockMaxTokensProbe.ts flows into provider settings as awsModelMaxOutputTokens, and then affects model resolution/request construction through resolveBedrockModelInfo and AwsBedrockHandler.

It would help to split this into three sequential PRs:

PR 1 — Types & catalog (packages/types/ only)
Keep the model catalog corrections, Bedrock model metadata changes, promptCacheTtl, the Opus 4.7 entry, and shared Bedrock resolution/types/helpers in packages/types/src/providers/bedrock.ts and related type files/tests. This is mostly data, types, and pure resolution logic, so it should be low risk and easy to land first.

PR 2 — Runtime improvements, depending on PR 1
Keep the provider/runtime behavior together: adaptive thinking payload handling in src/api/providers/bedrock.ts, structured-output fallback and 30-day cache, stripBedrockStrictIncompatibleConstraints, JSON schema updates, and promptCacheTtl plumbing through the cache strategy. This keeps review focused on request construction, retry/cache behavior, and Bedrock compatibility, without also reviewing React/settings state.
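As a rough illustration of what a stripper for strict-incompatible constraints involves, the sketch below walks a JSON Schema and drops a set of keywords while leaving property names untouched. The keyword list is an assumption chosen for illustration. It is not the set that stripBedrockStrictIncompatibleConstraints removes or that Bedrock documents, so check the provider's structured-output documentation before relying on any specific list.

```typescript
// Hypothetical sketch; the keyword list is an illustrative assumption.
type JsonSchema = { [key: string]: unknown }

const ASSUMED_UNSUPPORTED_KEYWORDS = new Set([
	"minimum",
	"maximum",
	"exclusiveMinimum",
	"exclusiveMaximum",
	"multipleOf",
	"minLength",
	"maxLength",
])

// Keywords whose values map names to sub-schemas, and keywords holding arrays of sub-schemas.
const SCHEMA_MAP_KEYWORDS = ["properties", "$defs", "definitions"]
const SCHEMA_ARRAY_KEYWORDS = ["anyOf", "allOf", "oneOf"]

function isObject(v: unknown): v is JsonSchema {
	return typeof v === "object" && v !== null && !Array.isArray(v)
}

// Returns a new schema; the input is not mutated.
function stripStrictIncompatible(schema: JsonSchema): JsonSchema {
	const out: JsonSchema = {}
	for (const [key, value] of Object.entries(schema)) {
		if (ASSUMED_UNSUPPORTED_KEYWORDS.has(key)) continue
		if (SCHEMA_MAP_KEYWORDS.includes(key) && isObject(value)) {
			// Keys here are property/definition names, not keywords: keep them all, recurse into values.
			const mapped: JsonSchema = {}
			for (const [name, sub] of Object.entries(value)) {
				mapped[name] = isObject(sub) ? stripStrictIncompatible(sub) : sub
			}
			out[key] = mapped
		} else if (SCHEMA_ARRAY_KEYWORDS.includes(key) && Array.isArray(value)) {
			out[key] = value.map((sub) => (isObject(sub) ? stripStrictIncompatible(sub) : sub))
		} else if (key === "items" && isObject(value)) {
			out[key] = stripStrictIncompatible(value)
		} else {
			out[key] = value
		}
	}
	return out
}
```

Treating `properties` values as a name-to-schema map matters: a property that happens to be called `minimum` must survive even though the `minimum` keyword is stripped.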

PR 3 — Control-plane discovery + settings UI, depending on PR 1
Keep the discovery/probe/UI work together: src/api/providers/bedrock-discovery.ts, the @aws-sdk/client-bedrock dependency and lockfile changes, webviewMessageHandler message cases, Bedrock.tsx, BedrockMaxTokensProbeButton.tsx, useBedrockDiscovery, useBedrockMaxTokensProbe, and the related settings controls/i18n. This can then be reviewed specifically for discovery behavior, saved settings semantics, and UI state isolation.

I think that split would make the series much easier to review and reduce the chance of missing cross-layer regressions.

Comment thread on packages/types/src/providers/bedrock.ts (outdated)
supportsImages: true,
supportsPromptCache: true,
},
"claude-4-opus": {
Contributor

The "claude-4" entry earlier in the map matches first, so any unknown ARN containing claude-4-opus resolves to maxTokens: 8192 instead of 4096 here. Could the more-specific patterns be ordered before the generic ones?
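One way to make the match independent of entry order is to try longer patterns first, as sketched below. The table is hypothetical and holds only the two values quoted in this comment. Simply reordering the entries, as suggested, gives the same result for this pair.

```typescript
// Hypothetical pattern table holding only the two values quoted in the review.
type PartialInfo = { maxTokens: number }

const FALLBACK_PATTERNS: Record<string, PartialInfo> = {
	"claude-4": { maxTokens: 8192 },
	"claude-4-opus": { maxTokens: 4096 },
}

// Try longer (more specific) patterns first so "claude-4-opus" wins over "claude-4",
// regardless of the order the entries were written in.
const PATTERNS_BY_SPECIFICITY = Object.keys(FALLBACK_PATTERNS).sort((a, b) => b.length - a.length)

function matchUnknownArn(arn: string): PartialInfo | undefined {
	const lower = arn.toLowerCase()
	const pattern = PATTERNS_BY_SPECIFICITY.find((p) => lower.includes(p))
	return pattern === undefined ? undefined : FALLBACK_PATTERNS[pattern]
}
```

Longest-first is only a heuristic. It picks the more specific pattern when one pattern contains another, as here, but unrelated patterns that can both match still need explicit ordering or tests.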

Comment thread on src/api/providers/bedrock.ts (outdated)
// parseBaseModelId strips cross-region inference prefixes (e.g. `us.`, `eu.`) and the
// synthetic `:1m` dropdown suffix.
const baseModelId = this.parseBaseModelId(modelConfig.id)
const requiresAdaptiveThinking = BEDROCK_ADAPTIVE_THINKING_MODEL_IDS.includes(baseModelId as any)
Contributor

The rest of the file uses as (typeof BEDROCK_GLOBAL_INFERENCE_MODEL_IDS)[number] for these array-includes checks — would it be worth following the same pattern here instead of as any?
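For reference, the two usual alternatives to `as any` look like this. The constant and its contents are placeholders, because the real BEDROCK_ADAPTIVE_THINKING_MODEL_IDS values are not shown in the excerpt.

```typescript
// Placeholder constant; the real list's contents are not shown in the excerpt.
const ADAPTIVE_THINKING_MODEL_IDS = ["example.model-a", "example.model-b"] as const
type AdaptiveThinkingModelId = (typeof ADAPTIVE_THINKING_MODEL_IDS)[number]

// The pattern the reviewer refers to: assert to the tuple's element type instead of `any`.
function requiresAdaptiveThinking(baseModelId: string): boolean {
	return ADAPTIVE_THINKING_MODEL_IDS.includes(baseModelId as AdaptiveThinkingModelId)
}

// Alternative: a type guard, which also narrows the string for later use.
function isAdaptiveThinkingModelId(id: string): id is AdaptiveThinkingModelId {
	return (ADAPTIVE_THINKING_MODEL_IDS as readonly string[]).includes(id)
}
```

The type-guard form widens the array to `readonly string[]` once, inside the guard, and lets callers keep a narrowed ID afterwards.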

Comment thread on src/api/providers/bedrock-discovery.ts (outdated)

const client = new BedrockClient(toBedrockClientConfig(options))

const [foundationModelsResponse, inferenceProfiles] = await Promise.all([
Contributor

If AWS is slow or unreachable this Promise.all hangs indefinitely and blocks the settings panel. Is there a timeout or cancellation path planned here?
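One common pattern is to race the discovery calls against a timer and abort them when the timer fires. The helper below is a hypothetical sketch, not the timeout the PR later added. It only cancels the underlying requests if they honor the AbortSignal.

```typescript
// Hypothetical helper; names are assumptions.
class DiscoveryTimeoutError extends Error {
	constructor(ms: number) {
		super(`Bedrock discovery timed out after ${ms} ms`)
		this.name = "DiscoveryTimeoutError"
	}
}

// Run `run` with an AbortSignal; reject with DiscoveryTimeoutError if it has not settled within `ms`.
async function withTimeout<T>(ms: number, run: (signal: AbortSignal) => Promise<T>): Promise<T> {
	const controller = new AbortController()
	let timer: ReturnType<typeof setTimeout> | undefined
	const timeout = new Promise<never>((_, reject) => {
		timer = setTimeout(() => {
			// Abort in-flight calls (if they honor the signal) and fail fast.
			controller.abort()
			reject(new DiscoveryTimeoutError(ms))
		}, ms)
	})
	try {
		return await Promise.race([run(controller.signal), timeout])
	} finally {
		// Stop the timer once either side has settled.
		clearTimeout(timer)
	}
}
```

With the AWS SDK for JavaScript v3, the signal would typically be passed per call, for example `client.send(new ListFoundationModelsCommand({}), { abortSignal: signal })`, assuming the PR's discovery requests go through `client.send`.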

}
try {
const result = await probe(apiConfiguration, targetModelId)
setApiConfigurationField("awsModelMaxOutputTokens", result.maxOutputTokens)
Contributor

awsModelMaxOutputTokens is written as a bare number with no reference to which target was probed. If the user probes Opus 4.7 (→ 128K) and then switches to Haiku 4.5 (real cap 64K), the stale override is applied unconditionally and the next request fails. Should this be keyed by target id, or cleared when the selected target changes?
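A minimal sketch of the keyed variant stores the probed cap together with the target it was measured against, and falls back to the catalog value whenever the selected target differs. The interface, the function name, and the target IDs are hypothetical. The 128K and 64K figures come from the example above.

```typescript
// Hypothetical shape; the PR's actual settings field may differ.
interface MaxOutputOverride {
	targetId: string
	maxOutputTokens: number
}

// Apply the probed cap only when it was measured for the target being called.
function effectiveMaxOutputTokens(
	selectedTargetId: string,
	catalogMaxTokens: number,
	override: MaxOutputOverride | undefined,
): number {
	if (override !== undefined && override.targetId === selectedTargetId) return override.maxOutputTokens
	return catalogMaxTokens
}
```

Keying the override this way means switching targets makes a stale value inert rather than harmful, without requiring the UI to remember to clear it.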

@allquixotic
Author

Thanks for these reviews! I attempted to address them with a new commit. This needs some testing.

Contributor

@edelauna left a comment

Thanks for addressing the inline feedback — the pattern ordering fix, typed assertions, discovery timeout, and scoped max-output override all look good.

I do want to reiterate my earlier ask to split this into three sequential PRs. Having done a deeper pass through the full diff, the cross-layer concerns reinforce why it's needed:

Findings from a detailed review underscore the split:

  • The highest-risk issues I'm seeing (credentials in React Query cache keys, probe cost/abort gaps, useEffect re-render cascades, missing staleTime on discovery) are all concentrated in the UI/discovery layer. Reviewing these alongside 900+ lines of model catalog changes dilutes focus on the parts that need the most scrutiny.
  • The runtime layer (structured-output fallback, adaptive thinking, compile-wait retry loop) has its own review surface — retry bounds, cache TTL semantics, double 1M-context application in getModel() — that deserves isolated attention.
  • The types/catalog layer is low risk and could land immediately, unblocking the other two.
  • There's also minor scope drift (multi-point-strategy.ts guard changes) that would naturally separate out.
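On the first point, one way to keep credentials out of React Query cache keys is to build the key only from non-secret settings. The field names below are assumptions modeled loosely on typical Bedrock settings, not the PR's actual schema.

```typescript
// Hypothetical settings shape; field names are assumptions, not the PR's schema.
interface BedrockDiscoverySettings {
	awsRegion?: string
	awsUseProfile?: boolean
	awsProfile?: string
	awsAccessKey?: string
	awsSecretKey?: string
	awsSessionToken?: string
}

// Build the query key from non-secret fields only: region, auth mode, and profile name.
// Access keys, secret keys, and session tokens are deliberately left out.
function discoveryQueryKey(s: BedrockDiscoverySettings): readonly (string | null)[] {
	const mode = s.awsUseProfile ? "profile" : "static-credentials"
	return ["bedrock-discovery", s.awsRegion ?? null, mode, s.awsUseProfile ? (s.awsProfile ?? null) : null]
}
```

Because secrets are excluded, two different key pairs in the same region share a cache entry, so the cache should be invalidated explicitly when credentials change. The key would then be paired with a staleTime, for example `useQuery({ queryKey: discoveryQueryKey(settings), queryFn, staleTime: 5 * 60_000 })` in TanStack Query v5. The five-minute value is arbitrary.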

My original proposed split still holds:

  1. PR 1 — Types & catalog (packages/types/): model metadata, resolution helpers, expandBedrockTargetsWith1MVariants, tests. Low risk, land first.
  2. PR 2 — Runtime (depends on PR 1): adaptive thinking, structured-output fallback + 30-day cache,
    stripBedrockStrictIncompatibleConstraints, JSON schema normalization, promptCacheTtl plumbing.
  3. PR 3 — Discovery + settings UI (depends on PR 1): @aws-sdk/client-bedrock, bedrock-discovery.ts, probe flow, React hooks/components, webview message handler, i18n.

This doesn't change the total amount of work — it just makes each piece reviewable in isolation and reduces the chance of cross-layer regressions slipping through.

@allquixotic
Author

Thanks for the review! I’ll split it up, it’s no problem for me :)

Labels

None yet

Projects

None yet

Development

Successfully merging this pull request may close these issues.

[ENHANCEMENT] Improve Amazon Bedrock model discovery, context handling, and output limits

3 participants