feat: add manual failover management #444
📝 Walkthrough
Adds dashboard-managed failover rules with a new runtime enablement flag, and adds provider-attempt capture, persistence, hydration, and rendering across gateway, server, audit, and dashboard flows.
Changes
- Failover Rules Management
- Provider Attempt Tracking and Audit
Estimated code review effort: 🎯 5 (Critical) | ⏱️ ~150 minutes
🚥 Pre-merge checks | ✅ 4 | ❌ 1
❌ Failed checks (1 warning)
✅ Passed checks (4 passed)
Actionable comments posted: 14
Caution
Some comments are outside the diff and can’t be posted inline due to platform limitations.
⚠️ Outside diff range comments (1)
internal/fallback/resolver_test.go (1)
55-83: 📐 Maintainability & Code Quality | 🟡 Minor | ⚡ Quick win

Assert that manual targets never reappear in `SuggestFallbacks()`.

This test now only checks the first two suggestions. If `SuggestFallbacks()` starts appending the already-configured manual target (`"azure/gpt-4o"`) later in the slice, the test still passes even though `internal/fallback/resolver.go:171-189` is explicitly deduping manual selectors before auto suggestions are generated. Add a negative assertion over the full result set. As per coding guidelines, `**/*_test.go`: Add or update tests for behavior changes.

🤖 Prompt for AI Agents
Verify each finding against current code. Fix only still-valid issues, skip the rest with a brief reason, keep changes minimal, and validate. In `@internal/fallback/resolver_test.go` around lines 55 - 83, The fallback ranking test for Resolver currently only validates the first two suggestions, so it can miss regressions where the manual target is appended later in the slice. Update TestResolverSuggestFallbacksReturnsRankingCandidates to assert across the full result from SuggestFallbacks that the manual selector from Resolver.NewResolver config (the "azure/gpt-4o" entry) never appears, while still preserving the existing ordering checks for the top ranked auto suggestions.

Source: Coding guidelines
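The negative assertion requested above can be sketched as a scan over the whole slice rather than an index check. This is only an illustration: `suggestFallbacks` is a hypothetical stand-in for `Resolver.SuggestFallbacks()`, whose real signature is not shown in this review, and the `provider-*` selectors are invented for the example.

```go
package main

import "fmt"

// containsSelector reports whether selector appears anywhere in suggestions.
func containsSelector(suggestions []string, selector string) bool {
	for _, s := range suggestions {
		if s == selector {
			return true
		}
	}
	return false
}

// suggestFallbacks is a hypothetical stand-in for Resolver.SuggestFallbacks():
// it keeps the ranked order and drops selectors already configured manually.
func suggestFallbacks(ranked, manual []string) []string {
	out := make([]string, 0, len(ranked))
	for _, candidate := range ranked {
		if !containsSelector(manual, candidate) {
			out = append(out, candidate)
		}
	}
	return out
}

func main() {
	// Invented ranking; only "azure/gpt-4o" comes from the review comment.
	ranked := []string{"provider-a/model-x", "provider-b/model-y", "azure/gpt-4o"}
	manual := []string{"azure/gpt-4o"}

	got := suggestFallbacks(ranked, manual)

	// Negative assertion over the full result, not just the first two entries.
	if containsSelector(got, "azure/gpt-4o") {
		panic("manual target reappeared in suggestions")
	}
	fmt.Println(got)
}
```

In the real test the same loop would sit next to the existing ordering checks, so both the ranking and the dedupe behavior stay covered.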
🤖 Prompt for all review comments with AI agents
Verify each finding against current code. Fix only still-valid issues, skip the
rest with a brief reason, keep changes minimal, and validate.
Inline comments:
In `@config/fallback.go`:
- Around line 252-263: The disabled-model JSON parsing currently accepts the
literal null because json.Unmarshal into the slice in the cfg.DisabledModelsJSON
branch succeeds with a nil result; update this parsing path to explicitly reject
null before or after unmarshaling. In the fallback config logic that builds
disabled, keep allowing only a JSON array or boolean object shape, and if the
expanded value is null return an error instead of treating it as valid input.
In `@docs/features/failover.mdx`:
- Around line 13-15: The overview text in the failover documentation has an
awkward, broken sentence that makes the runtime migration guidance unclear.
Update the affected prose in the failover overview to rephrase the runtime
failover guidance into a complete, unambiguous sentence, keeping the meaning
that runtime failover relies on manual rules only. Locate the text near the
introductory failover description and adjust the surrounding sentence flow for
clarity.
In `@docs/openapi.json`:
- Around line 6775-6782: The failover request schemas currently define source as
optional even though the delete and upsert handlers require it; update both
admin.deleteFailoverRuleRequest and admin.upsertFailoverRuleRequest in the
OpenAPI document to mark source as required. Make sure the required field list
matches the server contract so generated clients treat source as mandatory for
both request shapes.
- Around line 952-1003: The failover upsert response contract is missing the
successful 204 No Content path. Update the responses for the PUT /admin/failover
operation in docs/openapi.json to include a 204 entry alongside the existing
200/400/401/502/503 responses, matching the behavior in
internal/admin/handler_failover.go and its refreshed view handling.
In `@internal/admin/dashboard/static/js/modules/failover.js`:
- Around line 83-89: The failover edit flow in failover.js captures
failoverFormOriginalSource but never uses it when saving, so changing source
creates a new persisted rule instead of updating the existing one. Fix the save
path in the form handling logic by either making source immutable in edit mode
or by using failoverFormOriginalSource to delete/replace the old rule before
upserting the edited one; update the relevant edit/save methods around
failoverFormOriginalSource and failoverForm so the original key is handled
correctly.
- Around line 82-88: The failover UI is still using rule.managed to determine
whether a rule is config-managed, but this PR’s contract uses managed_source
instead. Update the failover state setup in the failover module to derive the
managed flag from managed_source, and make sure the template logic for the
edit/delete affordances also checks managed_source consistently so managed rules
are treated as non-editable dashboard entries.
In `@internal/admin/dashboard/templates/layout.html`:
- Around line 162-170: The failover target picker is using a datalist with a
textarea, but the `list` attribute is not supported on `<textarea>`, so the
autocomplete will never work. Update the `failover-target-options` /
`failover-targets` UI in `layout.html` to use a supported control such as an
`<input>` with `list`, or replace it with a custom autocomplete component while
keeping the `failoverForm.targets` binding and `failoverFormManaged` disabled
state intact.
In `@internal/admin/dashboard/templates/page-failover.html`:
- Around line 21-24: The failover filter input is bound to shared state but the
tables still render the unfiltered arrays, so typing has no effect and leaks
into the models page filter. Update the failover template to use a
failover-specific filtered collection in the table loops, and avoid reusing the
shared modelFilter binding by introducing a dedicated failover filter state and
derived list in the surrounding Alpine data/template logic. Ensure both table
sections that currently iterate the raw collections are switched to the new
filtered source.
In `@internal/admin/dashboard/templates/sidebar.html`:
- Around line 43-46: The sidebar Failover nav link is still shown and clickable
even when the feature is unavailable. Update the sidebar template’s Failover
entry to use the same availability check as the other failover entry points by
gating the anchor on failoverAvailable, so it is hidden when the API reports
failover is unavailable.
In `@internal/admin/handler_failover.go`:
- Around line 169-172: The `/admin/failover/generate` endpoint is documented as
returning `failover.View`, but it only fabricates draft data and should not use
the persisted-rule shape. Update the handler/Swagger annotations in
`admin/handler_failover.go` to return a dedicated draft DTO for this endpoint,
and adjust the generation path in the failover handler so it serializes only the
draft fields instead of `created_at`/`updated_at` or other stored-view-only
values.
In `@internal/failover/factory.go`:
- Around line 90-98: The failover factory currently reuses the model-cache
refresh ticker for propagation, which can leave peer nodes stale for too long.
Update the Result construction in the factory logic to use a dedicated, short
failover refresh interval or an explicit invalidation mechanism instead of
cfg.Cache.Model.RefreshInterval and the time.Hour fallback. Keep the change
localized around the service.StartBackgroundRefresh call so failover updates
propagate independently from the model cache.
In `@internal/failover/service_test.go`:
- Around line 60-89: The current test only covers the happy-path merge in
TestServiceConfigRulesOverrideDashboardRules; add regression tests for the new
write/error behavior too. Extend coverage around
Service/Refresh/UpsertRule/ResetDashboardRules to verify config-managed rules
cannot be overwritten by dashboard writes, config rules survive
ResetDashboardRules, and storage failures from refresh/upsert are surfaced as
errors instead of being swallowed. Use the existing symbols NewService, Refresh,
Rules, Get, and the managed-source constants to keep the tests aligned with the
failover contract.
In `@internal/failover/service.go`:
- Around line 183-185: The pre-read in `service.go` is swallowing unexpected
`Store.Get` errors and still allowing the write to continue, which can hide
storage failures and overwrite metadata. Update the `existing, err :=
s.store.Get(ctx, normalized.Source)` check in the rule write path to distinguish
`ErrNotFound` from other errors: keep the current `CreatedAt` preservation only
when a record exists, but immediately return any non-`ErrNotFound` error before
proceeding. Ensure the fix is applied in the same update flow that uses
`normalized.CreatedAt` and `s.store.Get`.
In `@internal/failover/store_mongodb.go`:
- Around line 19-32: NewMongoDBStore creates indexes with context.Background(),
so index creation ignores the caller’s cancellation and shutdown state. Update
the MongoDB initialization path to accept and use the caller context in
NewMongoDBStore, and pass that ctx through to coll.Indexes().CreateMany instead
of creating a new timeout context. Keep the existing error handling and index
definitions intact while aligning the MongoDB flow with the PostgreSQL init
path.
---
Outside diff comments:
In `@internal/fallback/resolver_test.go`:
- Around line 55-83: The fallback ranking test for Resolver currently only
validates the first two suggestions, so it can miss regressions where the manual
target is appended later in the slice. Update
TestResolverSuggestFallbacksReturnsRankingCandidates to assert across the full
result from SuggestFallbacks that the manual selector from Resolver.NewResolver
config (the "azure/gpt-4o" entry) never appears, while still preserving the
existing ordering checks for the top ranked auto suggestions.
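The `config/fallback.go` finding in the inline comments above rests on a Go detail: `json.Unmarshal` of the literal `null` into a slice succeeds and leaves the slice nil. One possible shape for a stricter parser is sketched below; `parseDisabledModels` is a hypothetical helper, and treating an empty string as "not configured" is an assumption, not something the review states.

```go
package main

import (
	"bytes"
	"encoding/json"
	"fmt"
	"sort"
)

// parseDisabledModels accepts a JSON array of selectors or an object mapping
// selector to bool. Anything else, including the literal null, is an error.
func parseDisabledModels(raw string) ([]string, error) {
	trimmed := bytes.TrimSpace([]byte(raw))
	if len(trimmed) == 0 {
		return nil, nil // assumption: empty input means "not configured"
	}
	switch trimmed[0] {
	case '[':
		var list []string
		if err := json.Unmarshal(trimmed, &list); err != nil {
			return nil, fmt.Errorf("parse disabled models array: %w", err)
		}
		return list, nil
	case '{':
		var flags map[string]bool
		if err := json.Unmarshal(trimmed, &flags); err != nil {
			return nil, fmt.Errorf("parse disabled models object: %w", err)
		}
		list := make([]string, 0, len(flags))
		for model, disabled := range flags {
			if disabled {
				list = append(list, model)
			}
		}
		sort.Strings(list) // map iteration order is random; keep output stable
		return list, nil
	default:
		return nil, fmt.Errorf("disabled models must be a JSON array or object, got %q", trimmed)
	}
}

func main() {
	for _, input := range []string{`["claude-sonnet-4"]`, `{"a":true,"b":false}`, `null`} {
		list, err := parseDisabledModels(input)
		fmt.Println(input, list, err)
	}
}
```

Checking the first non-space byte before unmarshaling keeps the rejection explicit instead of relying on what `json.Unmarshal` happens to do with `null`.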
ℹ️ Review info
⚙️ Run configuration
Configuration used: Organization UI
Review profile: ASSERTIVE
Plan: Pro
Run ID: 849c231b-e921-4bda-ac96-a07c5211f4eb
📒 Files selected for processing (36)
- .env.template
- config/config.example.yaml
- config/config.go
- config/config_test.go
- config/fallback.go
- docs/features/failover.mdx
- docs/openapi.json
- internal/admin/dashboard/static/js/dashboard.js
- internal/admin/dashboard/static/js/modules/failover.js
- internal/admin/dashboard/static/js/modules/workflows-layout.test.cjs
- internal/admin/dashboard/static/js/modules/workflows.js
- internal/admin/dashboard/static/js/modules/workflows.test.cjs
- internal/admin/dashboard/templates/index.html
- internal/admin/dashboard/templates/layout.html
- internal/admin/dashboard/templates/model-table-body.html
- internal/admin/dashboard/templates/page-failover.html
- internal/admin/dashboard/templates/page-settings.html
- internal/admin/dashboard/templates/sidebar.html
- internal/admin/handler.go
- internal/admin/handler_failover.go
- internal/admin/handler_test.go
- internal/admin/routes.go
- internal/admin/routes_test.go
- internal/app/app.go
- internal/app/app_test.go
- internal/failover/factory.go
- internal/failover/service.go
- internal/failover/service_test.go
- internal/failover/store.go
- internal/failover/store_mongodb.go
- internal/failover/store_postgresql.go
- internal/failover/store_sqlite.go
- internal/failover/types.go
- internal/fallback/resolver.go
- internal/fallback/resolver_test.go
- internal/server/http_test.go
```json
"responses": {
  "200": {
    "description": "OK",
    "content": {
      "application/json": {
        "schema": {
          "$ref": "#/components/schemas/failover.View"
        }
      }
    }
  },
  "400": {
    "description": "Bad Request",
    "content": {
      "application/json": {
        "schema": {
          "$ref": "#/components/schemas/core.GatewayError"
        }
      }
    }
  },
  "401": {
    "description": "Unauthorized",
    "content": {
      "application/json": {
        "schema": {
          "$ref": "#/components/schemas/core.GatewayError"
        }
      }
    }
  },
  "502": {
    "description": "Bad Gateway",
    "content": {
      "application/json": {
        "schema": {
          "$ref": "#/components/schemas/core.GatewayError"
        }
      }
    }
  },
  "503": {
    "description": "Service Unavailable",
    "content": {
      "application/json": {
        "schema": {
          "$ref": "#/components/schemas/core.GatewayError"
        }
      }
    }
  }
},
```
🗄️ Data Integrity & Integration | 🟠 Major | ⚡ Quick win
Document the 204 No Content upsert path.
internal/admin/handler_failover.go can return 204 after a successful PUT /admin/failover when the refreshed view is unavailable, but this operation only advertises 200. That makes the published API contract narrower than the handler.
Suggested OpenAPI fix
```diff
 "responses": {
   "200": {
     "description": "OK",
     "content": {
       "application/json": {
         "schema": {
           "$ref": "#/components/schemas/failover.View"
         }
       }
     }
   },
+  "204": {
+    "description": "No Content"
+  },
   "400": {
     "description": "Bad Request",
     "content": {
       "application/json": {
         "schema": {
```
📝 Committable suggestion
‼️ IMPORTANT
Carefully review the code before committing. Ensure that it accurately replaces the highlighted code, contains no missing lines, and has no issues with indentation. Thoroughly test & benchmark the code to ensure it meets the requirements.
```json
"responses": {
  "200": {
    "description": "OK",
    "content": {
      "application/json": {
        "schema": {
          "$ref": "#/components/schemas/failover.View"
        }
      }
    }
  },
  "204": {
    "description": "No Content"
  },
  "400": {
    "description": "Bad Request",
    "content": {
      "application/json": {
        "schema": {
          "$ref": "#/components/schemas/core.GatewayError"
        }
      }
    }
  },
  "401": {
    "description": "Unauthorized",
    "content": {
      "application/json": {
        "schema": {
          "$ref": "#/components/schemas/core.GatewayError"
        }
      }
    }
  },
  "502": {
    "description": "Bad Gateway",
    "content": {
      "application/json": {
        "schema": {
          "$ref": "#/components/schemas/core.GatewayError"
        }
      }
    }
  },
  "503": {
    "description": "Service Unavailable",
    "content": {
      "application/json": {
        "schema": {
          "$ref": "#/components/schemas/core.GatewayError"
        }
      }
    }
  }
},
```
🤖 Prompt for AI Agents
Verify each finding against current code. Fix only still-valid issues, skip the
rest with a brief reason, keep changes minimal, and validate.
In `@docs/openapi.json` around lines 952 - 1003, The failover upsert response
contract is missing the successful 204 No Content path. Update the responses for
the PUT /admin/failover operation in docs/openapi.json to include a 204 entry
alongside the existing 200/400/401/502/503 responses, matching the behavior in
internal/admin/handler_failover.go and its refreshed view handling.
```json
"admin.deleteFailoverRuleRequest": {
  "type": "object",
  "properties": {
    "source": {
      "type": "string"
    }
  }
},
```
🗄️ Data Integrity & Integration | 🟠 Major | ⚡ Quick win
Mark source as required in both failover request schemas.
Both the delete and upsert handlers reject missing or blank source, but these schemas leave it optional. That weakens generated clients and misstates the server contract.
Suggested OpenAPI fix
```diff
 "admin.deleteFailoverRuleRequest": {
   "type": "object",
+  "required": [
+    "source"
+  ],
   "properties": {
     "source": {
       "type": "string"
     }
   }
 },
@@
 "admin.upsertFailoverRuleRequest": {
   "type": "object",
+  "required": [
+    "source"
+  ],
   "properties": {
     "description": {
       "type": "string"
     },
```
Also applies to: 6957-6976
🤖 Prompt for AI Agents
Verify each finding against current code. Fix only still-valid issues, skip the
rest with a brief reason, keep changes minimal, and validate.
In `@docs/openapi.json` around lines 6775 - 6782, The failover request schemas
currently define source as optional even though the delete and upsert handlers
require it; update both admin.deleteFailoverRuleRequest and
admin.upsertFailoverRuleRequest in the OpenAPI document to mark source as
required. Make sure the required field list matches the server contract so
generated clients treat source as mandatory for both request shapes.
```js
this.failoverFormManaged = Boolean(rule.managed);
this.failoverFormOriginalSource = rule.source || '';
this.failoverForm = {
    source: rule.source || '',
    targets: (Array.isArray(rule.targets) ? rule.targets : []).join('\n'),
    description: rule.description || '',
    enabled: rule.enabled !== false
```
🎯 Functional Correctness | 🟠 Major | ⚡ Quick win
Use managed_source consistently for managed-rule state.
This module still keys off rule.managed, but the view contract in this PR uses managed_source. As written, config-managed rules will be labeled as normal dashboard rules here, and the same mismatch leaks into the template’s edit/delete affordances. Gate the UI off managed_source instead.
Also applies to: 297-300
🤖 Prompt for AI Agents
Verify each finding against current code. Fix only still-valid issues, skip the
rest with a brief reason, keep changes minimal, and validate.
In `@internal/admin/dashboard/static/js/modules/failover.js` around lines 82 - 88,
The failover UI is still using rule.managed to determine whether a rule is
config-managed, but this PR’s contract uses managed_source instead. Update the
failover state setup in the failover module to derive the managed flag from
managed_source, and make sure the template logic for the edit/delete affordances
also checks managed_source consistently so managed rules are treated as
non-editable dashboard entries.
```go
// @Success 200 {array} failover.View
// @Failure 401 {object} core.GatewayError
// @Failure 503 {object} core.GatewayError
// @Router /admin/failover/generate [post]
```
🗄️ Data Integrity & Integration | 🟠 Major | 🏗️ Heavy lift
Don't return draft suggestions as failover.View.
failover.View is the persisted-rule schema, but this endpoint fabricates only a subset of those fields. The generated payload will serialize zero created_at/updated_at values and a stored-view shape for objects that have not been saved yet. Please switch /admin/failover/generate to a dedicated typed draft DTO instead of reusing failover.View.
Suggested direction
```diff
+type generateFailoverRuleResponse struct {
+	Source      string   `json:"source"`
+	Targets     []string `json:"targets"`
+	Description string   `json:"description,omitempty"`
+	Enabled     bool     `json:"enabled"`
+}
+
 func (h *Handler) GenerateFailoverRules(c *echo.Context) error {
 	...
-	suggestions := make([]failover.View, 0)
+	suggestions := make([]generateFailoverRuleResponse, 0)
 	...
-		suggestions = append(suggestions, failover.View{
-			Source:        source,
-			Targets:       targets,
-			Enabled:       true,
-			ManagedSource: failover.ManagedSourceDashboard,
-		})
+		suggestions = append(suggestions, generateFailoverRuleResponse{
+			Source:  source,
+			Targets: targets,
+			Enabled: true,
+		})
 	}
 	return c.JSON(http.StatusOK, suggestions)
 }
```
Also applies to: 207-212
🤖 Prompt for AI Agents
Verify each finding against current code. Fix only still-valid issues, skip the
rest with a brief reason, keep changes minimal, and validate.
In `@internal/admin/handler_failover.go` around lines 169 - 172, The
`/admin/failover/generate` endpoint is documented as returning `failover.View`,
but it only fabricates draft data and should not use the persisted-rule shape.
Update the handler/Swagger annotations in `admin/handler_failover.go` to return
a dedicated draft DTO for this endpoint, and adjust the generation path in the
failover handler so it serializes only the draft fields instead of
`created_at`/`updated_at` or other stored-view-only values.
```go
	}

	cfg.DefaultMode = ResolveFallbackDefaultMode(cfg.DefaultMode)
	if !cfg.DefaultMode.Valid() {
		return fmt.Errorf("fallback.default_mode must be one of: auto, manual, off")
	}

	if len(cfg.Overrides) > 0 {
```
`default_mode: off` no longer disables failover: silent behavior change on upgrade
`FEATURE_FALLBACK_MODE=off` / `default_mode: off` was the documented way to globally disable fallback. The old resolver gate was `if mode == FallbackModeOff && len(cfg.Manual) == 0 && len(cfg.Overrides) == 0 { return nil }`, so with manual rules present, the resolver existed but `modeFor()` returned `FallbackModeOff` for every model that lacked an explicit `mode: manual` override, and those models got no fallbacks.
After this PR, `Enabled` defaults to `true` and `DefaultMode` is silently ignored at runtime. Any operator who had `default_mode: off` (or `FEATURE_FALLBACK_MODE=off`) together with a populated `manual_rules_path` (or `FALLBACK_MANUAL_RULES_PATH`) will silently receive fallbacks for all those models on the next deploy.
`loadFallbackConfig` processes the deprecated `DefaultMode` field but never maps `FallbackModeOff` to `Enabled=false`. Either add that migration or update the deprecation notice and migration docs to call out this specific case explicitly.
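The migration this comment asks for could look roughly like the sketch below. The struct and function names are hypothetical, and using a `*bool` to tell "not set" apart from an explicit value is an assumption made for illustration; the PR's real config type may track this differently.

```go
package main

import (
	"fmt"
	"strings"
)

// FallbackConfig is a hypothetical, trimmed-down config shape.
type FallbackConfig struct {
	// Enabled is nil when the operator did not set the new flag explicitly.
	Enabled *bool
	// DefaultMode is the deprecated input ("auto", "manual", or "off").
	DefaultMode string
}

// resolveEnabled maps the deprecated default_mode=off onto Enabled=false,
// but only when the new flag was not set explicitly.
func resolveEnabled(cfg FallbackConfig) bool {
	if cfg.Enabled != nil {
		return *cfg.Enabled // an explicit setting always wins
	}
	if strings.EqualFold(strings.TrimSpace(cfg.DefaultMode), "off") {
		return false // keep the pre-upgrade meaning of "off"
	}
	return true // the default described in the comment above
}

func main() {
	explicitOn := true
	fmt.Println(resolveEnabled(FallbackConfig{DefaultMode: "off"}))
	fmt.Println(resolveEnabled(FallbackConfig{DefaultMode: "manual"}))
	fmt.Println(resolveEnabled(FallbackConfig{Enabled: &explicitOn, DefaultMode: "off"}))
}
```

Letting an explicit `Enabled` override the deprecated field keeps the new flag authoritative while sparing existing `off` deployments a silent behavior change.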
```go
		}
		suggestions = append(suggestions, failover.View{
			Source:        source,
			Targets:       targets,
			Enabled:       true,
			ManagedSource: failover.ManagedSourceDashboard,
		})
	}
	return c.JSON(http.StatusOK, suggestions)
}

func modelSupportsCategory(meta *core.ModelMetadata, category core.ModelCategory) bool {
	if meta == nil || len(meta.Categories) == 0 {
		return true
	}
	for _, candidate := range meta.Categories {
		if candidate == category {
			return true
		}
	}
	return false
}

func (h *Handler) findFailoverView(source string) (failover.View, bool) {
	for _, view := range h.failoverRules.ListViews() {
		if view.Source == source {
			return view, true
		}
	}
	return failover.View{}, false
}

func failoverWriteError(err error) error {
	if errors.Is(err, failover.ErrManaged) {
		return core.NewInvalidRequestError("failover rule is managed by configuration and cannot be changed in the dashboard", err)
```
Actionable comments posted: 3
Caution
Some comments are outside the diff and can’t be posted inline due to platform limitations.
⚠️ Outside diff range comments (3)
internal/admin/dashboard/static/js/modules/failover.js (1)
34-37: 🎯 Functional Correctness | 🟡 Minor | ⚡ Quick win

Clear generated drafts when the failover feature becomes unavailable.

On the 503 and fetch-error paths, this only clears `failoverRules`. `page-failover.html` still renders `failoverGeneratedRules` independently, so stale generated drafts remain visible and reopenable after the backend reports the feature as unavailable. Reset `failoverGeneratedRules` in these early-return paths too.

Also applies to: 49-53

🤖 Prompt for AI Agents
Verify each finding against current code. Fix only still-valid issues, skip the rest with a brief reason, keep changes minimal, and validate. In `@internal/admin/dashboard/static/js/modules/failover.js` around lines 34 - 37, The failover unavailable paths in failover.js only clear failoverRules, leaving stale drafts in failoverGeneratedRules visible in page-failover.html. Update the early-return branches in the fetch response handling (including the 503 and fetch-error paths) to also reset failoverGeneratedRules alongside failoverAvailable and failoverRules. Use the failoverGeneratedRules state in the failover.js module as the target to clear so the UI no longer shows reopenable generated drafts after the feature becomes unavailable.

internal/admin/handler_failover.go (1)
69-83: 🗄️ Data Integrity & Integration | 🟠 Major | ⚡ Quick win

Normalize `fallback_models` before storing them.

`primary_model` is trimmed and validated, but each fallback selector is persisted verbatim. A payload like `[" azure/gpt-4o ", ""]` will survive this PUT and store selectors that will never match later. Trim each entry and reject blank targets before building the rule.

Suggested fix
```diff
 	source := strings.TrimSpace(req.PrimaryModel)
 	if source == "" {
 		return handleError(c, core.NewInvalidRequestError("primary_model is required", nil))
 	}
+	targets := make([]string, 0, len(req.FallbackModels))
+	for _, raw := range req.FallbackModels {
+		target := strings.TrimSpace(raw)
+		if target == "" {
+			return handleError(c, core.NewInvalidRequestError("fallback_models must not contain blank entries", nil))
+		}
+		targets = append(targets, target)
+	}
 	enabled := true
 	if existing, ok := h.failoverRules.Get(source); ok && existing != nil {
 		enabled = existing.Enabled
 	}
@@
 	rule := failover.Rule{
 		Source:  source,
-		Targets: req.FallbackModels,
+		Targets: targets,
 		Enabled: enabled,
 	}
```

🤖 Prompt for AI Agents
Verify each finding against current code. Fix only still-valid issues, skip the rest with a brief reason, keep changes minimal, and validate. In `@internal/admin/handler_failover.go` around lines 69 - 83, The failover rule creation path in handler_failover.go trims and validates primary_model, but stores req.FallbackModels unchanged in failover.Rule.Targets. Normalize each fallback model before constructing the rule by trimming whitespace in the same handler logic, and reject any blank entries with an invalid request error so only valid selectors are persisted.

docs/features/failover.mdx (1)
42-52: 📐 Maintainability & Code Quality | 🟡 Minor | ⚡ Quick win

Document the disable-list controls too.

This page explains manual rule sources, but it never mentions `fallback.disabled_models` / `FAILOVER_DISABLED_MODELS_JSON`, even though both are now part of the exposed config surface in `config/config.example.yaml` and `.env.template`. Without that, operators have no docs path for opting specific selectors out of failover.

Suggested addition
````diff
 For env-only deployments, use inline JSON:
@@
 `fallback.default_mode` and `FEATURE_FALLBACK_MODE` are deprecated compatibility inputs. They are accepted but ignored by runtime failover.
+
+To disable failover for specific primary model selectors, use
+`fallback.disabled_models` in YAML or `FAILOVER_DISABLED_MODELS_JSON` in env-only
+deployments. Example:
+
+```bash
+FAILOVER_DISABLED_MODELS_JSON='["claude-sonnet-4"]'
+```

 The Dashboard can generate suggested mappings from model metadata, but suggestions must be saved as manual mappings before they affect traffic.
````

🤖 Prompt for AI Agents
Verify each finding against current code. Fix only still-valid issues, skip the rest with a brief reason, keep changes minimal, and validate. In `@docs/features/failover.mdx` around lines 42 - 52, The failover docs in the failover page are missing the disable-list configuration, so add coverage for fallback.disabled_models and FAILOVER_DISABLED_MODELS_JSON alongside the existing rule source examples. Update the documentation near the inline JSON / environment variable examples to show how operators can opt specific selectors out of failover, and mention that this is part of the supported config surface just like the manual mappings described in the page.
🤖 Prompt for all review comments with AI agents
Verify each finding against current code. Fix only still-valid issues, skip the
rest with a brief reason, keep changes minimal, and validate.
Inline comments:
In `@internal/failover/store_postgresql.go`:
- Around line 26-29: The PostgreSQL failover store DDL only covers fresh
installs, so upgraded databases with the old source/targets columns will still
break when FailoverStore reads or upserts failover_rules. Update the store
initialization/migration path in FailoverStore or its schema setup to detect the
legacy layout and run a real ALTER TABLE migration that renames source to
primary_model and targets to fallback_models before any queries use the new
names.
In `@internal/failover/store_sqlite.go`:
- Around line 21-24: The failover_rules initializer in store_sqlite.go only
creates the new table shape for fresh databases, so existing SQLite installs
will keep the old source/targets schema and break later queries. Update the
initialization/migration path in the failover store setup to detect and migrate
the existing failover_rules table before any SELECT or upsert logic runs,
backfilling/renaming source and targets into primary_model and fallback_models
so the new column names are always available.
In `@internal/failover/types.go`:
- Line 17: The BSON tag change on Targets in the failover types breaks decoding
of legacy failover_rules documents that still persist targets, so
MongoDBStore.List/Get will return empty Targets and upserts may rewrite them
under fallback_models. Add backward-compatible read handling in the failover
decoding path (or a migration in MongoDBStore) so both targets and
fallback_models are accepted, and only switch the persisted tag after existing
documents can be read reliably.
---
Outside diff comments:
In `@docs/features/failover.mdx`:
- Around line 42-52: The failover docs in the failover page are missing the
disable-list configuration, so add coverage for fallback.disabled_models and
FAILOVER_DISABLED_MODELS_JSON alongside the existing rule source examples.
Update the documentation near the inline JSON / environment variable examples to
show how operators can opt specific selectors out of failover, and mention that
this is part of the supported config surface just like the manual mappings
described in the page.
In `@internal/admin/dashboard/static/js/modules/failover.js`:
- Around line 34-37: The failover unavailable paths in failover.js only clear
failoverRules, leaving stale drafts in failoverGeneratedRules visible in
page-failover.html. Update the early-return branches in the fetch response
handling (including the 503 and fetch-error paths) to also reset
failoverGeneratedRules alongside failoverAvailable and failoverRules. Use the
failoverGeneratedRules state in the failover.js module as the target to clear so
the UI no longer shows reopenable generated drafts after the feature becomes
unavailable.
In `@internal/admin/handler_failover.go`:
- Around line 69-83: The failover rule creation path in handler_failover.go
trims and validates primary_model, but stores req.FallbackModels unchanged in
failover.Rule.Targets. Normalize each fallback model before constructing the
rule by trimming whitespace in the same handler logic, and reject any blank
entries with an invalid request error so only valid selectors are persisted.
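The fallback-model normalization requested for `handler_failover.go` above could look roughly like the sketch below. It is a minimal illustration, not the handler's actual code: `normalizeFallbackModels` and `errBlankFallbackModel` are hypothetical names, and the selector strings are made up.

```go
package main

import (
	"errors"
	"fmt"
	"strings"
)

// errBlankFallbackModel is a hypothetical error value used only in this sketch;
// the real handler would map it to its own invalid-request error.
var errBlankFallbackModel = errors.New("fallback_models must not contain blank entries")

// normalizeFallbackModels trims every selector and rejects the whole list
// if any entry is empty after trimming.
func normalizeFallbackModels(models []string) ([]string, error) {
	out := make([]string, 0, len(models))
	for _, m := range models {
		trimmed := strings.TrimSpace(m)
		if trimmed == "" {
			return nil, errBlankFallbackModel
		}
		out = append(out, trimmed)
	}
	return out, nil
}

func main() {
	// Illustrative selectors only.
	models, err := normalizeFallbackModels([]string{" backup-a ", "backup-b"})
	fmt.Println(models, err)

	_, err = normalizeFallbackModels([]string{"backup-a", "   "})
	fmt.Println(err)
}
```

Rejecting the request on the first blank entry, rather than silently dropping it, keeps the stored rule identical to what the operator submitted apart from whitespace.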
ℹ️ Review info
⚙️ Run configuration
Configuration used: Organization UI
Review profile: ASSERTIVE
Plan: Pro
Run ID: 111eae64-2273-418b-ba43-1ec136d2ce31
📒 Files selected for processing (14)
- .env.template
- config/config.example.yaml
- docs/features/failover.mdx
- docs/openapi.json
- internal/admin/dashboard/static/js/modules/failover.js
- internal/admin/dashboard/templates/layout.html
- internal/admin/dashboard/templates/page-failover.html
- internal/admin/handler_failover.go
- internal/failover/service.go
- internal/failover/store.go
- internal/failover/store_mongodb.go
- internal/failover/store_postgresql.go
- internal/failover/store_sqlite.go
- internal/failover/types.go
// Rule is one manual failover mapping for a primary model selector.
type Rule struct {
    Source  string   `json:"primary_model" bson:"_id"`
    Targets []string `json:"fallback_models" bson:"fallback_models"`
🗄️ Data Integrity & Integration | 🟠 Major | 🏗️ Heavy lift
Keep backward-compatible BSON decoding for existing failover documents.
Changing Targets to bson:"fallback_models" means legacy failover_rules documents that still store targets will decode with empty Targets in MongoDBStore.List/Get, and the next upsert will rewrite them under the new field name. Add a migration or dual-read compatibility path before switching the persisted tag.
🤖 Prompt for AI Agents
Verify each finding against current code. Fix only still-valid issues, skip the
rest with a brief reason, keep changes minimal, and validate.
In `@internal/failover/types.go` at line 17, The BSON tag change on Targets in the
failover types breaks decoding of legacy failover_rules documents that still
persist targets, so MongoDBStore.List/Get will return empty Targets and upserts
may rewrite them under fallback_models. Add backward-compatible read handling in
the failover decoding path (or a migration in MongoDBStore) so both targets and
fallback_models are accepted, and only switch the persisted tag after existing
documents can be read reliably.
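One way to picture the dual-read path described above is a read-side document type that carries both field names and prefers the new one. This is only a sketch under assumptions: `ruleDocument`, `LegacyTargets`, and `targets()` are hypothetical, and the `bson` tags are plain struct tags here that a MongoDB driver decoder would be expected to honor.

```go
package main

import "fmt"

// ruleDocument is a hypothetical read-side shape that accepts both the new
// fallback_models field and the legacy targets field.
type ruleDocument struct {
	Source         string   `bson:"_id"`
	FallbackModels []string `bson:"fallback_models,omitempty"`
	LegacyTargets  []string `bson:"targets,omitempty"`
}

// targets prefers the new field and falls back to the legacy one, so a
// document written before the rename still yields its targets.
func (d ruleDocument) targets() []string {
	if len(d.FallbackModels) > 0 {
		return d.FallbackModels
	}
	return d.LegacyTargets
}

func main() {
	// Simulated decode results; selectors are illustrative.
	legacy := ruleDocument{Source: "primary-a", LegacyTargets: []string{"backup-a"}}
	current := ruleDocument{Source: "primary-b", FallbackModels: []string{"backup-b"}}

	fmt.Println(legacy.targets())
	fmt.Println(current.targets())
}
```

With a shape like this, `List`/`Get` would keep returning targets for old documents, and the next upsert could rewrite them under `fallback_models` without losing data.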
Actionable comments posted: 3
🤖 Prompt for all review comments with AI agents
Verify each finding against current code. Fix only still-valid issues, skip the
rest with a brief reason, keep changes minimal, and validate.
Inline comments:
In `@internal/failover/store_postgresql.go`:
- Around line 40-63: Scope the migration checks in the failover PostgreSQL
migration so they only inspect the active schema. In the migration logic that
renames failover_rules columns in store_postgresql.go, update each
information_schema.columns existence check to include table_schema =
current_schema() alongside table_name and column_name, so the ALTER TABLE
failover_rules statements only depend on columns in the schema being migrated.
In `@internal/failover/store_sqlite_test.go`:
- Around line 51-64: The test in store_sqlite_test.go currently verifies only
Source and Targets, so add assertions for the migrated metadata fields that the
rule maps as well. Update the existing List() result checks in the relevant test
case to validate enabled, managed_source, created_at, and updated_at on the
returned row from store.List, using the same row variable already referenced in
the test so regressions in metadata preservation fail here too.
In `@internal/failover/store_sqlite.go`:
- Around line 104-113: The SQLite migration in the failover store is selecting
the legacy primary key value without normalizing it, so rows can be inserted
with leading/trailing spaces even though lookups trim inputs. Update the
migration query in the failover store logic (the code building the INSERT/SELECT
for failover_rules in store_sqlite.go) so the primary key expression is TRIMmed
before being stored, matching the normalization used by Get and Delete and the
TRIM filter already applied in the source query.
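For the PostgreSQL item above, a schema-scoped existence check plus the rename statements might be sketched as follows. The table and column names come from the review comments; `columnExistsQuery` and `legacyRenameStatements` are hypothetical helpers, and actually executing the SQL (which needs a database driver) is left out.

```go
package main

import "fmt"

// columnExistsQuery checks for a column in the active schema only, so a
// same-named table in another schema cannot trigger the migration.
const columnExistsQuery = `SELECT EXISTS (
	SELECT 1 FROM information_schema.columns
	WHERE table_schema = current_schema()
	  AND table_name = $1
	  AND column_name = $2
)`

// legacyRenameStatements returns the ALTER TABLE statements needed to move a
// legacy failover_rules layout to the new column names. The booleans are the
// results of running columnExistsQuery for "source" and "targets".
func legacyRenameStatements(hasSource, hasTargets bool) []string {
	var stmts []string
	if hasSource {
		stmts = append(stmts, "ALTER TABLE failover_rules RENAME COLUMN source TO primary_model")
	}
	if hasTargets {
		stmts = append(stmts, "ALTER TABLE failover_rules RENAME COLUMN targets TO fallback_models")
	}
	return stmts
}

func main() {
	fmt.Println(columnExistsQuery)
	for _, stmt := range legacyRenameStatements(true, true) {
		fmt.Println(stmt)
	}
}
```

A fresh install would report neither legacy column and produce no statements, so running the check on every startup should be harmless.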
ℹ️ Review info
⚙️ Run configuration
Configuration used: Organization UI
Review profile: ASSERTIVE
Plan: Pro
Run ID: 0dd8c12f-4b04-4ba5-a223-ce262e5afac1
📒 Files selected for processing (6)
- internal/admin/dashboard/templates/model-table-body.html
- internal/admin/dashboard/templates/shuffle-icon.html
- internal/failover/store_mongodb.go
- internal/failover/store_postgresql.go
- internal/failover/store_sqlite.go
- internal/failover/store_sqlite_test.go
♻️ Duplicate comments (2)
internal/admin/dashboard/static/js/modules/failover.js (2)
77-82: 🎯 Functional Correctness | 🟠 Major | ⚡ Quick win
Use `managed_source` for managed-rule UI state.
The API/view contract here exposes `managed_source`, not `managed`, so config-managed mappings are still treated as editable dashboard rules and shown as `On` instead of `Config`. Normalize the managed flag from `managed_source` before `openFailoverEdit()` and `failoverRuleStatus()` consume it.
Also applies to: 363-375
🤖 Prompt for AI Agents
Verify each finding against current code. Fix only still-valid issues, skip the rest with a brief reason, keep changes minimal, and validate. In `@internal/admin/dashboard/static/js/modules/failover.js` around lines 77 - 82, The failover rule UI is reading the wrong field for managed state, so config-managed mappings are treated like editable dashboard rules. Update the failover flow to normalize and use managed_source instead of managed before openFailoverEdit() sets form state and before failoverRuleStatus() renders the status. Make sure the rule object consistently derives failoverFormManaged and the displayed label from managed_source so managed rules show as Config rather than On.
77-85: 🗄️ Data Integrity & Integration | 🟠 Major | ⚡ Quick win
Renaming the primary model in edit mode creates a second rule.
`failoverFormOriginalSource` is captured but never used on save. The admin API upserts by `primary_model`, so editing `source` leaves the old mapping active and persists a new one. Make `source` immutable while editing, or delete/replace the original key before the PUT.
Also applies to: 154-192
🤖 Prompt for AI Agents
Verify each finding against current code. Fix only still-valid issues, skip the rest with a brief reason, keep changes minimal, and validate. In `@internal/admin/dashboard/static/js/modules/failover.js` around lines 77 - 85, The edit flow in openFailoverEdit and the save path in failover.js are allowing primary model renames to create a second rule because failoverFormOriginalSource is captured but never applied. Update the edit/save logic so the original primary_model key is preserved or explicitly deleted/replaced before the PUT, and make the source field immutable while failoverFormMode is edit. Ensure the code paths that build the payload and submit the form use failoverFormOriginalSource consistently when editing.
🤖 Prompt for all review comments with AI agents
Verify each finding against current code. Fix only still-valid issues, skip the
rest with a brief reason, keep changes minimal, and validate.
Duplicate comments:
In `@internal/admin/dashboard/static/js/modules/failover.js`:
- Around line 77-82: The failover rule UI is reading the wrong field for managed
state, so config-managed mappings are treated like editable dashboard rules.
Update the failover flow to normalize and use managed_source instead of managed
before openFailoverEdit() sets form state and before failoverRuleStatus()
renders the status. Make sure the rule object consistently derives
failoverFormManaged and the displayed label from managed_source so managed rules
show as Config rather than On.
- Around line 77-85: The edit flow in openFailoverEdit and the save path in
failover.js are allowing primary model renames to create a second rule because
failoverFormOriginalSource is captured but never applied. Update the edit/save
logic so the original primary_model key is preserved or explicitly
deleted/replaced before the PUT, and make the source field immutable while
failoverFormMode is edit. Ensure the code paths that build the payload and
submit the form use failoverFormOriginalSource consistently when editing.
ℹ️ Review info
⚙️ Run configuration
Configuration used: Organization UI
Review profile: ASSERTIVE
Plan: Pro
Run ID: 998194ca-8d28-4db5-8420-b3e20e6a23fe
📒 Files selected for processing (5)
- internal/admin/dashboard/static/css/dashboard.css
- internal/admin/dashboard/static/js/modules/failover.js
- internal/admin/dashboard/static/js/modules/failover.test.cjs
- internal/admin/dashboard/templates/failover-icon.html
- internal/admin/dashboard/templates/model-table-body.html
…etail as tabs

Capture every external provider call made for a logical request (primary + failover) as a structured attempt, including failed attempts' raw upstream response body and headers, and surface them in the dashboard.

Backend:
- Record a ProviderAttempt per provider call; attach the raw upstream error body (capped 64 KB) and response headers to GatewayError, then to the attempt.
- Persist per-attempt response_body/response_headers, gated by LOG_BODIES / LOG_HEADERS (headers redacted), in audit_log_attempts (SQLite + Postgres, additive ADD COLUMN migrations) and embedded for Mongo; hydrate in readers.
- Include a compact attempt summary in the live-log preview so the failover indicator appears without waiting for the persisted detail.

Dashboard:
- Request / Response(s) are now tabs (default = last valid response), replacing the collapsible panes; each failed attempt gets its own tab with a type pill and status badge. Shared audit-pane is headless; Headers 1/3 / Body 2/3 split.
- Collapsed audit row shows a per-attempt pip track; request/response direction icons; the error message opens the conversation preview; tab-switch animation.
- Rename the Models page heading "Registered Models" -> "Models".

Bump the gateway chat hot-path byte ceiling (14080 -> 14784) for the larger per-request attempt structs (allocation count unchanged).

Co-Authored-By: Claude Opus 4.8 (1M context) <noreply@anthropic.com>
…s they happen

Provider attempts were only published to the live audit preview once the whole request finished, so a failed primary was invisible in real time while failover was still in flight; the failed attempt only appeared after the request was persisted.

Add a context-scoped AttemptObserver that the gateway invokes immediately after recording a failed attempt, which re-publishes the audit entry's attempts live. The observer is installed only when failover targets exist (FallbackSelectors non-empty) and fires only on failures, so non-failover/success requests (the hot path) take on no extra per-request work (allocation count unchanged).

Co-Authored-By: Claude Opus 4.8 (1M context) <noreply@anthropic.com>
…he dashboard

A request that failed over (primary failed, failover succeeded) stored the primary as resolved_model/provider instead of the model actually called: the audit middleware re-applies the workflow's planned (primary) resolution after the handler set the real route, clobbering it. Costs were unaffected (priced off the failover model via usagePricingModel), but the audit route was wrong.

- middleware: don't overwrite resolved_model/provider/provider_name when a runtime failover was already recorded (entry.Data.Failover set).
- dashboard: the audit row shows "requested ⮕ failover-target" using data.failover.target_model (correct even for already-clobbered rows), in addition to the existing alias case.
- audit detail: split responses into per-attempt tabs whenever any attempt failed (not only with >1 attempt), so a failed primary shows its own tab with the error while failover is still in flight; a lone response tab drops the seq/type/status chips.

Co-Authored-By: Claude Opus 4.8 (1M context) <noreply@anthropic.com>
- The request/response panel hangs flush off the tab strip (rounded bottom corners, no top border; achieved by dropping the tablist margin rather than a negative panel margin).
- Tabs get visible outlines with rounded top corners, a transparent bottom edge, and spacing between them; the active tab opens into the panel.

Co-Authored-By: Claude Opus 4.8 (1M context) <noreply@anthropic.com>
Actionable comments posted: 17
🤖 Prompt for all review comments with AI agents
Verify each finding against current code. Fix only still-valid issues, skip the
rest with a brief reason, keep changes minimal, and validate.
Inline comments:
In `@internal/admin/dashboard/static/js/modules/conversation-drawer.js`:
- Around line 47-61: The error preview trigger in handleErrorConversationClick
is mouse-only, so make it keyboard accessible by using a focusable interactive
element or adding the needed accessibility behavior to the existing preview
target. Update the conversation drawer’s error block trigger so it can receive
focus and open on Enter/Space as well as click, while preserving the existing
drag/selection checks and the openConversation call flow.
In `@internal/admin/dashboard/templates/audit-pane.html`:
- Around line 5-9: The error preview in the audit pane is clickable via
handleErrorConversationClick but is not keyboard-accessible. Update the <pre> in
audit-pane.html so the interactive preview can receive focus and be activated
from the keyboard, using canShowConversation(pane.entry) to gate the behavior;
preserve the existing click behavior while adding the appropriate focus/keyboard
handling around pane.errorMessage and handleErrorConversationClick.
In `@internal/admin/dashboard/templates/page-audit-logs.html`:
- Around line 98-126: The audit log tab UI in the tablist/template block is
missing the full accessibility contract, so update the tab buttons and panels
together. In the Alpine loop that renders `auditPanes(entry)`, add explicit
`aria-controls` on each tab button and matching `aria-labelledby` on each
`audit-pane-tabpanel`, using stable IDs derived from `p.id` and the panel/tab
elements. Also add arrow-key handling on the tab buttons so left/right (and
optionally up/down) move focus and activate the next/previous tab by updating
`active` in the same `auditEffectiveTab` flow.
In `@internal/admin/handler_failover_test.go`:
- Around line 1-191: The failover handler tests are missing coverage for the
success/error branches in `UpsertFailoverRule`, `DeleteFailoverRule`, and
`failoverGenerateSource`, so add focused cases to exercise those paths. In
`internal/admin/handler_failover_test.go`, extend the existing
`NewHandler`/`RegisterRoutes` setup with tests that verify upsert returns 200 vs
204 depending on the request/result, delete maps a missing source to the
not-found response, and generate handles invalid JSON bodies. Reuse the existing
`failoverHandlerTestStore`, `newFailoverHandlerTestService`, and route-based
request setup to keep the tests aligned with the handler symbols.
In `@internal/auditlog/auditlog.go`:
- Around line 261-266: The truncation in truncateAttemptErrorMessage is
byte-based and can cut a UTF-8 rune in half, producing invalid text for JSON
storage. Update this helper to truncate on rune boundaries instead of slicing
message directly, keeping the same maxAttemptErrorMessageLength behavior while
ensuring the returned string remains valid UTF-8.
In `@internal/auditlog/middleware.go`:
- Around line 612-615: The direct attempt enrichment path in
EnrichLogEntryWithAttempts currently forwards raw AttemptSnapshot data without
any gating, so update it to either accept a logging Config and apply the same
GateAttemptCapture behavior as the Echo path or otherwise default to stripping
response bodies and headers before storing attempts. Keep the existing
enrichEntryWithAttempts flow, but make sure the wrapper is the only entry point
that can attach attempts to a LogEntry and that opt-in-only fields are removed
unless explicitly enabled.
- Around line 219-237: The audit log enrichment in EnrichEntryWithFailover is
incorrectly using Data.Failover as a proxy for whether the executed route was
recorded, which can suppress ResolvedModel, Provider, and ProviderName. Update
the guard logic in internal/auditlog/middleware.go so each field is only skipped
when its actual executed-route field is already populated, or first hydrate the
missing executed route data from Failover.TargetModel before applying the
workflow-derived values. Keep the existing trimmed workflow fallback behavior,
but base the decision on the concrete route fields rather than
executedRouteRecorded alone.
In `@internal/auditlog/reader_postgresql_test.go`:
- Around line 71-73: The Postgres reader tests are only stubbing the new
audit_log_attempts query as empty, so hydrated attempt results and failure paths
are not covered. Update the test setup around the fakePostgreSQLRows and the
reader behavior in audit_log_attempts so at least one case returns a non-empty
attempt row and another exercises rows.Err() or a query error. Add/adjust
assertions in reader_postgresql_test.go to verify attempt hydration and error
handling through the relevant reader methods.
In `@internal/auditlog/reader_postgresql.go`:
- Around line 252-323: The current loadAttempts helper in PostgreSQLReader
performs one audit_log_attempts query per LogEntry, causing N+1 queries during
GetLogs. Refactor loadAttempts to fetch all attempts for the batch at once using
a single WHERE audit_log_id = ANY(...) query, then group the resulting
AttemptSnapshot rows by audit_log_id and attach them back to each entry via
ensureLogData and normalizeAttemptSnapshots.
- Around line 191-193: In the audit log lookup flow in reader_postgresql.go, the
empty-result path currently treats any false from rows.Next() as “not found,”
which can mask query/cursor errors. Update the logic around rows.Next() in the
relevant reader method to check rows.Err() before returning nil,nil, and only
report a missing audit log entry when there is no rows error; if rows.Err() is
set, propagate that error instead.
In `@internal/auditlog/reader_sqlite.go`:
- Around line 335-417: The loadAttempts method is issuing one audit_log_attempts
query per LogEntry, creating an N+1 pattern when GetLogs hydrates many entries.
Change SQLiteReader.loadAttempts to fetch attempts for all entry IDs in a single
query using an IN (...) filter, then group the scanned AttemptSnapshot values by
audit_log_id and attach them back to each entry via ensureLogData. Keep the
existing scan/normalize behavior, but replace the per-entry loop query in
loadAttempts with batched lookup logic.
- Around line 189-190: The no-row branch in the SQLite reader is too permissive:
in the `rows.Next()` check inside `reader_sqlite.go`, a false result can mean
either no results or an iteration error. Update the read path to inspect
`rows.Err()` before returning `(nil, nil)` so the `read`/query logic
distinguishes an actual “not found” case from a failed iteration, using the
existing `rows` handling in this function.
In `@internal/core/errors.go`:
- Around line 54-56: captureGatewayErrorBody truncates oversized error bodies
without leaving any signal that the data is incomplete. Update the logic in
captureGatewayErrorBody so that when body exceeds maxGatewayErrorBodyBytes, it
also records truncation state or appends a clear truncation marker that
downstream consumers can detect. Keep the existing body limit behavior, but make
sure the returned/recorded error payload explicitly indicates truncation
alongside the body content.
- Around line 39-59: Raw upstream error bodies are being retained without
redaction, which can leak sensitive provider data. Update the error-body capture
path in `captureGatewayErrorBody` and the `CaptureAttemptResponseBody` flow to
scrub or redact sensitive content before persisting it for audit, similar to how
`RedactAttemptResponseHeaders` handles headers. Keep the size cap in place, but
ensure the stored `ResponseBody` is sanitized before it is attached to the
gateway error.
In `@internal/gateway/fallback_test.go`:
- Around line 10-24: The two ShouldAttemptFallback tests duplicate the same
setup/assert pattern, so refactor them into a single table-driven
TestShouldAttemptFallback using subtests. Keep the cases for the
model-availability 404 and the generic endpoint 404, and structure the table so
adding more fallback edge cases from fallback.go is straightforward. Use
core.NewProviderError and ShouldAttemptFallback in the shared loop, and verify
each case with the expected boolean.
In `@internal/gateway/fallback.go`:
- Around line 209-243: The 404 fallback logic in `isFallbackError` is too broad
and can classify unrelated NotFound responses as model fallback. Narrow the
`status == http.StatusNotFound` branch in `internal/gateway/fallback.go` so it
only matches provider-specific model unavailability phrasing, or add explicit
exclusions for generic 404s that merely contain words like “disabled”,
“deprecated”, or “unavailable”. Update the `isFallbackError` matcher and its
tests to cover both the intended `endpoint not found` case and the
false-positive 404 cases.
In `@internal/server/audit_attempts.go`:
- Around line 23-48: Add unit test coverage for the pure mapping in
auditAttemptsFromGateway to prevent drift between gateway.ProviderAttempt and
auditlog.AttemptSnapshot. Create a small table-driven test that covers both the
empty gateway.AttemptsFromContext case and a populated attempt, asserting
field-by-field mapping plus the ResponseBody and ResponseHeaders handling via
auditlog.CaptureAttemptResponseBody and auditlog.RedactAttemptResponseHeaders.
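For the `truncateAttemptErrorMessage` item in the list above, rune-safe truncation under a byte cap can be sketched with the standard library alone. `truncateUTF8` is a hypothetical stand-in for the real helper, and the sketch assumes the input is valid UTF-8.

```go
package main

import (
	"fmt"
	"unicode/utf8"
)

// truncateUTF8 returns s cut to at most maxBytes bytes without splitting a
// multi-byte rune. It assumes s is valid UTF-8.
func truncateUTF8(s string, maxBytes int) string {
	if maxBytes <= 0 {
		return ""
	}
	if len(s) <= maxBytes {
		return s
	}
	cut := maxBytes
	// If the byte at the cut position is a continuation byte, the cut would
	// land inside a rune, so back up to the start of that rune.
	for cut > 0 && !utf8.RuneStart(s[cut]) {
		cut--
	}
	return s[:cut]
}

func main() {
	msg := "héllo" // 'é' occupies two bytes
	fmt.Printf("%q\n", truncateUTF8(msg, 2))
	fmt.Printf("%q\n", truncateUTF8(msg, 3))
}
```

The result may be a few bytes shorter than the cap, which keeps the byte-based limit intact while guaranteeing the stored string still encodes cleanly as JSON.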
ℹ️ Review info
⚙️ Run configuration
Configuration used: Organization UI
Review profile: ASSERTIVE
Plan: Pro
Run ID: a1ad0e27-90b8-471d-971c-7e3e925e9b90
📒 Files selected for processing (49)
- internal/admin/dashboard/static/css/dashboard.css
- internal/admin/dashboard/static/js/dashboard.js
- internal/admin/dashboard/static/js/modules/audit-list.js
- internal/admin/dashboard/static/js/modules/conversation-drawer.js
- internal/admin/dashboard/static/js/modules/dashboard-display.test.cjs
- internal/admin/dashboard/static/js/modules/dashboard-layout.test.cjs
- internal/admin/dashboard/static/js/modules/failover.js
- internal/admin/dashboard/static/js/modules/failover.test.cjs
- internal/admin/dashboard/templates/audit-pane.html
- internal/admin/dashboard/templates/failover-icon.html
- internal/admin/dashboard/templates/layout.html
- internal/admin/dashboard/templates/page-audit-logs.html
- internal/admin/dashboard/templates/page-models.html
- internal/admin/dashboard/templates/page-settings.html
- internal/admin/handler_failover.go
- internal/admin/handler_failover_test.go
- internal/auditlog/attempt_capture.go
- internal/auditlog/attempt_capture_test.go
- internal/auditlog/auditlog.go
- internal/auditlog/middleware.go
- internal/auditlog/middleware_test.go
- internal/auditlog/reader_mongodb.go
- internal/auditlog/reader_postgresql.go
- internal/auditlog/reader_postgresql_test.go
- internal/auditlog/reader_sqlite.go
- internal/auditlog/store_mongodb.go
- internal/auditlog/store_postgresql.go
- internal/auditlog/store_sqlite.go
- internal/auditlog/store_sqlite_test.go
- internal/auditlog/stream_wrapper.go
- internal/core/errors.go
- internal/failover/factory.go
- internal/failover/factory_test.go
- internal/gateway/attempts.go
- internal/gateway/attempts_test.go
- internal/gateway/fallback.go
- internal/gateway/fallback_test.go
- internal/gateway/inference_execute.go
- internal/gateway/inference_prepare.go
- internal/live/broker.go
- internal/live/broker_test.go
- internal/llmclient/client.go
- internal/server/audit_attempts.go
- internal/server/error_support.go
- internal/server/fallback_test.go
- internal/server/internal_chat_completion_executor.go
- internal/server/messages_handler.go
- internal/server/translated_inference_service.go
- tests/perf/hotpath_test.go
    // ResponseBody and ResponseHeaders carry the raw upstream error response so
    // failed provider attempts can be audited. Never serialized to API clients.
    ResponseBody    []byte      `json:"-"`
    ResponseHeaders http.Header `json:"-"`
}

// maxGatewayErrorBodyBytes caps the raw upstream error body retained for audit.
const maxGatewayErrorBodyBytes = 64 * 1024

// captureGatewayErrorBody returns a bounded copy of an upstream error body so
// the original buffer is not retained and large bodies cannot bloat memory.
func captureGatewayErrorBody(body []byte) []byte {
    if len(body) == 0 {
        return nil
    }
    if len(body) > maxGatewayErrorBodyBytes {
        body = body[:maxGatewayErrorBodyBytes]
    }
    out := make([]byte, len(body))
    copy(out, body)
    return out
🔒 Security & Privacy | 🟠 Major | 🏗️ Heavy lift
🧩 Analysis chain
🏁 Script executed:
#!/bin/bash
ast-grep run --pattern 'func CaptureAttemptResponseBody($$$) {
$$$
}' --lang go
rg -n -A 15 'func CaptureAttemptResponseBody' --type=go
rg -n -A 15 'func RedactAttemptResponseHeaders' --type=go
Repository: ENTERPILOT/GoModel
Length of output: 156
🏁 Script executed:
#!/bin/bash
set -euo pipefail
printf '\n== files ==\n'
git ls-files internal/core/errors.go internal/server/error_support.go
printf '\n== outline internal/core/errors.go ==\n'
ast-grep outline internal/core/errors.go --lang go || true
printf '\n== outline internal/server/error_support.go ==\n'
ast-grep outline internal/server/error_support.go --lang go || true
printf '\n== search for GatewayError / AttemptSnapshot / CaptureAttemptResponseBody ==\n'
rg -n -A 20 -B 10 'GatewayError|AttemptSnapshot|CaptureAttemptResponseBody|RedactAttemptResponseHeaders|ResponseBody|ResponseHeaders' internal/core/errors.go internal/server/error_support.go
Repository: ENTERPILOT/GoModel
Length of output: 24881
🏁 Script executed:
#!/bin/bash
set -euo pipefail
sed -n '1,220p' internal/core/errors.go | cat -n
printf '\n---\n'
sed -n '1,220p' internal/server/error_support.go | cat -n
Repository: ENTERPILOT/GoModel
Length of output: 12478
🏁 Script executed:
#!/bin/bash
set -euo pipefail
printf '\n== search auditlog body/header helpers ==\n'
rg -n -A 20 -B 5 'auditlog\.(Capture|Redact).*Response|CaptureAttempt|RedactAttempt|ResponseHeaders|ResponseBody' .
Repository: ENTERPILOT/GoModel
Length of output: 50375
🏁 Script executed:
#!/bin/bash
set -euo pipefail
printf '\n== auditlog attempt helpers ==\n'
rg -n -A 20 -B 5 'func (CaptureAttemptResponseBody|RedactAttemptResponseHeaders|CaptureAttemptResponseHeaders|CaptureAttemptResponseBody)|CaptureAttemptResponseBody\(|RedactAttemptResponseHeaders\(' internal/auditlog internal/server
printf '\n== auditlog entry body handling ==\n'
rg -n -A 15 -B 5 'ResponseBody|ResponseHeaders|Redact|Capture' internal/auditlog
Repository: ENTERPILOT/GoModel
Length of output: 50375
Raw provider error bodies are stored without content redaction. CaptureAttemptResponseBody keeps the upstream body verbatim as JSON/string, and captureGatewayErrorBody only truncates size; unlike RedactAttemptResponseHeaders, there’s no body-level scrubbing before audit persistence. That can leak prompt fragments or other sensitive provider data into logs.
🤖 Prompt for AI Agents
Verify each finding against current code. Fix only still-valid issues, skip the
rest with a brief reason, keep changes minimal, and validate.
In `@internal/core/errors.go` around lines 39 - 59, Raw upstream error bodies are
being retained without redaction, which can leak sensitive provider data. Update
the error-body capture path in `captureGatewayErrorBody` and the
`CaptureAttemptResponseBody` flow to scrub or redact sensitive content before
persisting it for audit, similar to how `RedactAttemptResponseHeaders` handles
headers. Keep the size cap in place, but ensure the stored `ResponseBody` is
sanitized before it is attached to the gateway error.
    if len(body) > maxGatewayErrorBodyBytes {
        body = body[:maxGatewayErrorBodyBytes]
    }
🗄️ Data Integrity & Integration | 🔵 Trivial | ⚡ Quick win
Truncation happens silently.
captureGatewayErrorBody truncates bodies over 64KiB with no indication left for downstream consumers (e.g., a truncated flag or marker), so anyone debugging from the audit trail can't tell a captured body is incomplete.
🛡️ Suggested tweak
 func captureGatewayErrorBody(body []byte) []byte {
     if len(body) == 0 {
         return nil
     }
+    truncated := len(body) > maxGatewayErrorBodyBytes
     if len(body) > maxGatewayErrorBodyBytes {
         body = body[:maxGatewayErrorBodyBytes]
     }
     out := make([]byte, len(body))
     copy(out, body)
+    if truncated {
+        // surface truncation to callers, e.g. via a sentinel suffix or a
+        // separate bool return, so audit consumers know data was cut.
+    }
     return out
 }
🤖 Prompt for AI Agents
Verify each finding against current code. Fix only still-valid issues, skip the
rest with a brief reason, keep changes minimal, and validate.
In `@internal/core/errors.go` around lines 54 - 56, captureGatewayErrorBody
truncates oversized error bodies without leaving any signal that the data is
incomplete. Update the logic in captureGatewayErrorBody so that when body
exceeds maxGatewayErrorBodyBytes, it also records truncation state or appends a
clear truncation marker that downstream consumers can detect. Keep the existing
body limit behavior, but make sure the returned/recorded error payload
explicitly indicates truncation alongside the body content.
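The "separate bool return" option mentioned in the suggestion could be sketched like this. It is one possible shape, not the project's code: `captureBody` and `maxBodyBytes` are hypothetical names that mirror `captureGatewayErrorBody` and `maxGatewayErrorBodyBytes`.

```go
package main

import "fmt"

// maxBodyBytes mirrors the 64 KiB cap described in the review.
const maxBodyBytes = 64 * 1024

// captureBody returns a bounded copy of body and reports whether the
// original was longer than the cap, so callers can record that fact.
func captureBody(body []byte) (out []byte, truncated bool) {
	if len(body) == 0 {
		return nil, false
	}
	if len(body) > maxBodyBytes {
		body = body[:maxBodyBytes]
		truncated = true
	}
	// Copy so the caller's (possibly much larger) buffer is not retained.
	out = make([]byte, len(body))
	copy(out, body)
	return out, truncated
}

func main() {
	big := make([]byte, maxBodyBytes+10)
	out, truncated := captureBody(big)
	fmt.Println(len(out), truncated)

	out, truncated = captureBody([]byte("upstream error"))
	fmt.Println(len(out), truncated)
}
```

A boolean leaves the stored bytes untouched, which may matter if the body is later parsed as JSON; a sentinel suffix would be simpler to surface in the dashboard but changes the payload.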
     message := strings.ToLower(strings.TrimSpace(gatewayErr.Message))
-    if !strings.Contains(message, "model") {
-        return false
+    if strings.Contains(message, "model") {
+        for _, fragment := range []string{
+            "not found",
+            "does not exist",
+            "unsupported",
+            "unavailable",
+            "not available",
+            "deprecated",
+            "retired",
+            "disabled",
+        } {
+            if strings.Contains(message, fragment) {
+                return true
+            }
+        }
     }

-    for _, fragment := range []string{
-        "not found",
-        "does not exist",
-        "unsupported",
-        "unavailable",
-        "not available",
-        "deprecated",
-        "retired",
-        "disabled",
-    } {
-        if strings.Contains(message, fragment) {
-            return true
+    if status == http.StatusNotFound {
+        for _, fragment := range []string{
+            "unsupported",
+            "unavailable",
+            "not available",
+            "deprecated",
+            "retired",
+            "disabled",
+        } {
+            if strings.Contains(message, fragment) {
+                return true
+            }
         }
     }

     return false
 }
🎯 Functional Correctness | 🟡 Minor | ⚡ Quick win
🧩 Analysis chain
🏁 Script executed:
#!/bin/bash
rg -n -B2 -A2 'StatusNotFound' internal/gateway/fallback.go internal/gateway/fallback_test.go
Repository: ENTERPILOT/GoModel
Length of output: 1237
🏁 Script executed:
#!/bin/bash
set -euo pipefail
printf '--- internal/gateway/fallback.go ---\n'
sed -n '200,255p' internal/gateway/fallback.go
printf '\n--- internal/gateway/fallback_test.go ---\n'
sed -n '1,120p' internal/gateway/fallback_test.go
Repository: ENTERPILOT/GoModel
Length of output: 247
🏁 Script executed:
#!/bin/bash
set -euo pipefail
printf '%s\n' '--- internal/gateway/fallback.go ---'
sed -n '200,255p' internal/gateway/fallback.go
printf '%s\n' ''
printf '%s\n' '--- internal/gateway/fallback_test.go ---'
sed -n '1,140p' internal/gateway/fallback_test.go
Repository: ENTERPILOT/GoModel
Length of output: 2057
Tighten the 404 fallback matcher. The new status == http.StatusNotFound branch catches provider-specific “not available” phrasing, but it also falls back on unrelated 404s that contain words like disabled, deprecated, or unavailable. The existing test only covers endpoint not found, so add a narrower gate for the 404 path or cover the false-positive cases explicitly.
🤖 Prompt for AI Agents
Verify each finding against current code. Fix only still-valid issues, skip the
rest with a brief reason, keep changes minimal, and validate.
In `@internal/gateway/fallback.go` around lines 209 - 243, The 404 fallback logic
in `isFallbackError` is too broad and can classify unrelated NotFound responses
as model fallback. Narrow the `status == http.StatusNotFound` branch in
`internal/gateway/fallback.go` so it only matches provider-specific model
unavailability phrasing, or add explicit exclusions for generic 404s that merely
contain words like “disabled”, “deprecated”, or “unavailable”. Update the
`isFallbackError` matcher and its tests to cover both the intended `endpoint not
found` case and the false-positive 404 cases.
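A narrower 404 gate could look like the sketch below. It is not the repository's `isFallbackError`: the function name, the `routeFragments` exclusion list, and the choice to check exclusions first are all assumptions made for illustration. Only the availability fragments are taken from the diff above.

```go
package main

import (
	"fmt"
	"net/http"
	"strings"
)

// availabilityFragments are the phrases used by the reviewed 404 branch.
var availabilityFragments = []string{
	"unsupported", "unavailable", "not available",
	"deprecated", "retired", "disabled",
}

// routeFragments is a hypothetical exclusion list: 404 messages that talk
// about routing rather than model availability should not trigger failover.
var routeFragments = []string{"endpoint", "route"}

// isModelUnavailable404 reports whether a 404 looks like a model-availability
// problem (hypothetical helper, not the gateway's real matcher).
func isModelUnavailable404(status int, message string) bool {
	if status != http.StatusNotFound {
		return false
	}
	msg := strings.ToLower(strings.TrimSpace(message))
	for _, f := range routeFragments {
		if strings.Contains(msg, f) {
			return false
		}
	}
	for _, f := range availabilityFragments {
		if strings.Contains(msg, f) {
			return true
		}
	}
	return false
}

func main() {
	fmt.Println(isModelUnavailable404(http.StatusNotFound, "This model is deprecated"))
	fmt.Println(isModelUnavailable404(http.StatusNotFound, "endpoint disabled"))
}
```

Checking the exclusions before the availability phrases means a message such as "endpoint disabled" is rejected even though it contains a matching keyword, which is the false-positive case the comment describes.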
| func auditAttemptsFromGateway(ctx context.Context) []auditlog.AttemptSnapshot { | ||
| gatewayAttempts := gateway.AttemptsFromContext(ctx) | ||
| if len(gatewayAttempts) == 0 { | ||
| return nil | ||
| } | ||
| attempts := make([]auditlog.AttemptSnapshot, 0, len(gatewayAttempts)) | ||
| for _, attempt := range gatewayAttempts { | ||
| attempts = append(attempts, auditlog.AttemptSnapshot{ | ||
| Seq: attempt.Seq, | ||
| Kind: attempt.Kind, | ||
| ProviderType: attempt.ProviderType, | ||
| ProviderName: attempt.ProviderName, | ||
| Model: attempt.Model, | ||
| StatusCode: attempt.StatusCode, | ||
| Success: attempt.Success, | ||
| ErrorType: attempt.ErrorType, | ||
| ErrorCode: attempt.ErrorCode, | ||
| ErrorMessage: attempt.ErrorMessage, | ||
| StartedAt: attempt.StartedAt, | ||
| DurationNs: attempt.DurationNs, | ||
| ResponseBody: auditlog.CaptureAttemptResponseBody(attempt.ResponseBody), | ||
| ResponseHeaders: auditlog.RedactAttemptResponseHeaders(attempt.ResponseHeaders), | ||
| }) | ||
| } | ||
| return attempts | ||
| } |
📐 Maintainability & Code Quality | 🔵 Trivial | ⚡ Quick win
Add unit test coverage for the gateway → audit attempt mapping.
auditAttemptsFromGateway is a pure mapping function (easy to test in isolation) but no corresponding test file is included in this cohort, and the PR comments note broadly low patch coverage across the failover/attempt-tracking changes. A small table-driven test asserting field-by-field mapping (including body/header capture and the empty-attempts short-circuit) would be low-cost and catch future field-mapping drift between gateway.ProviderAttempt and auditlog.AttemptSnapshot.
🤖 Prompt for AI Agents
Verify each finding against current code. Fix only still-valid issues, skip the
rest with a brief reason, keep changes minimal, and validate.
In `@internal/server/audit_attempts.go` around lines 23 - 48, Add unit test
coverage for the pure mapping in auditAttemptsFromGateway to prevent drift
between gateway.ProviderAttempt and auditlog.AttemptSnapshot. Create a small
table-driven test that covers both the empty gateway.AttemptsFromContext case
and a populated attempt, asserting field-by-field mapping plus the ResponseBody
and ResponseHeaders handling via auditlog.CaptureAttemptResponseBody and
auditlog.RedactAttemptResponseHeaders.
…config

Resolve the still-valid reviewer comments on the manual-failover feature:

- service: Upsert now distinguishes ErrNotFound from other Store.Get errors and propagates the latter, so a transient read fault no longer silently resets created_at via stampUpsert (+ test).
- sqlite migration: copy the legacy primary key TRIM()'d so a padded key ' gpt-4o ' migrates to 'gpt-4o' and stays reachable by Get/Delete (which trim input). Extend the migration test to also assert enabled/managed_source/created_at/updated_at survive and the trimmed key is found.
- postgres migration: scope the information_schema.columns probes with table_schema = current_schema() so a same-named table in another schema can't skip the source->primary_model / targets->fallback_models rename.
- config: reject FAILOVER_DISABLED_MODELS_JSON=null instead of silently accepting a nil slice (parity with the manual-rules path).
- docs: fix the broken failover overview sentence.
- dashboard: remove the now-dead failoverFormOriginalSource state (the form source is read-only in edit, so it was never used).

Left intentionally unaddressed (rationale): the Mongo context.Background() index init is a pattern shared across ~10 stores (a one-off would be inconsistent); default_mode:off disabling is an intentional deprecation; the /generate failover.View draft shape and the OpenAPI 204/required gaps need the generated docs pipeline; GenerateFailoverRules O(N^2) is an admin-only manual endpoint; the resolver precedence is already neutralized by Service.mergeConfig.

The 3 CodeQL "DB query from user-controlled source" flags on store_mongodb.go are false positives (parameterized bson values at a constant _id field).

Co-Authored-By: Claude Opus 4.8 (1M context) <noreply@anthropic.com>
Verified the latest CodeRabbit/Greptile comments on PR #444 and applied the ones that hold up against current code:

- reader_sqlite/reader_postgresql: check rows.Err() on the empty-result path so a query/iteration failure is no longer reported to callers as "log not found".
- middleware enrichEntryWithWorkflow: gate each executed-route field on its own concrete value instead of on the failover snapshot alone. A failover snapshot only proves Failover.TargetModel was set, so the coarse guard could leave resolved_model/provider/provider_name blank when the executed route populated some but not all of them; the workflow's planned value now fills any gap. Behavior is unchanged on the common (non-failover) path.
- truncateAttemptErrorMessage: back the cut off to a rune boundary so a multi-byte rune in non-ASCII provider errors is not split into invalid UTF-8.

Added table-driven coverage for the truncation helper and a middleware test for the partial-route failover case.

Verified and intentionally skipped:

- Greptile's two "second look" areas (effectiveManualRules merge order, broadened 404 fallback trigger): both correct. Service.mergeConfig appends config rows unconditionally with an atomic swap, so dynamic rules already carry config-wins values; the 404 path only fires on availability-keyword messages.
- EnrichLogEntryWithAttempts re-gating: its sole caller already gates with the real logger Config; re-gating with Config{} would strip opt-in bodies/headers.
- Attempt error-body redaction: bodies are opt-in (LOGGING_LOG_BODIES) and stored verbatim system-wide; only credential-bearing headers are redacted.
- N+1 attempt hydration, dashboard a11y, extra coverage, table-driven fallback test, truncation marker: valid but deferred as separate follow-ups.

Co-Authored-By: Claude Opus 4.8 (1M context) <noreply@anthropic.com>
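The rune-boundary fix described for truncateAttemptErrorMessage can be sketched as follows. The helper name `truncateAtRuneBoundary` and its byte-limit parameter are hypothetical; the repository's helper may differ in signature and limit.

```go
package main

import (
	"fmt"
	"unicode/utf8"
)

// truncateAtRuneBoundary cuts s to at most maxBytes bytes, backing the cut
// off until it lands on the first byte of a rune so that no multi-byte rune
// is split. Hypothetical helper, for illustration only.
func truncateAtRuneBoundary(s string, maxBytes int) string {
	if maxBytes <= 0 {
		return ""
	}
	if len(s) <= maxBytes {
		return s
	}
	cut := maxBytes
	// s[cut] is the first byte that would be dropped. If it is a UTF-8
	// continuation byte, the cut falls inside a rune, so step back.
	for cut > 0 && !utf8.RuneStart(s[cut]) {
		cut--
	}
	return s[:cut]
}

func main() {
	msg := "héllo" // 'é' occupies two bytes in UTF-8
	fmt.Printf("%q\n", truncateAtRuneBoundary(msg, 2))
	fmt.Printf("%q\n", truncateAtRuneBoundary(msg, 3))
}
```

With a limit of 2 the cut would land inside 'é', so the helper backs off by one byte and returns a shorter but valid string.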
…ew nits

Worked through the rest of the latest CodeRabbit review batch on PR #444, fixing the items that hold up against current code:

- reader_sqlite/reader_postgresql: batch provider-attempt hydration into a single IN (...) / = ANY($1) query keyed by audit_log_id instead of one query per returned entry. A 100-row dashboard page went from 101 reads to 2.
- gateway: convert the ShouldAttemptFallback tests to a table and document the 404 matcher explicitly: availability/deprecated 404s fall back, while route and endpoint-not-found 404s do not (covers the false-positive cases).
- dashboard a11y: make the clickable error preview keyboard-operable (role/tabindex + Enter/Space) and complete the request/response tab ARIA contract (id/aria-controls/aria-labelledby, roving tabindex, and arrow + Home/End navigation via a new auditTabKeydown helper).
- tests: add coverage for auditTabKeydown, the tab ARIA markup, and the admin failover handler's upsert(200)/delete(204)/delete-missing(404)/generate invalid-body(400) branches.

Verified and intentionally skipped:

- captureGatewayErrorBody truncation marker: a suffix marker would corrupt the JSON bodies the audit pane parses; a structured flag isn't worth the plumbing.
- Raw error-body content redaction: bodies are opt-in (LOGGING_LOG_BODIES) and stored verbatim system-wide; only credential-bearing headers are redacted.
- EnrichLogEntryWithAttempts re-gating: its sole caller already gates with the real logger Config; re-gating with Config{} would strip opt-in bodies/headers.
- auditAttemptsFromGateway unit test: the gateway attempt recorder has no exported writer, so a focused test would require test-only public API; the mapping is exercised end-to-end.

Non-empty Postgres attempt hydration is covered on the SQLite path against a real in-memory DB.

Co-Authored-By: Claude Opus 4.8 (1M context) <noreply@anthropic.com>
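The batched hydration described above replaces one query per entry with a single lookup plus an in-memory grouping step. The following is a rough sketch of that shape for the SQLite-style `IN (...)` form. The table name `audit_log_attempts` and the `audit_log_id` key come from this thread; the `seq` column, the `attemptRow` type, and both function names are assumptions.

```go
package main

import (
	"fmt"
	"strings"
)

// attemptRow is a hypothetical, trimmed-down row shape.
type attemptRow struct {
	AuditLogID string
	Seq        int
}

// buildAttemptsQuery returns one parameterized query covering every log ID,
// so a page of N entries needs a single attempts read instead of N.
func buildAttemptsQuery(ids []string) (string, []any) {
	if len(ids) == 0 {
		return "", nil
	}
	placeholders := make([]string, len(ids))
	args := make([]any, len(ids))
	for i, id := range ids {
		placeholders[i] = "?"
		args[i] = id
	}
	query := "SELECT audit_log_id, seq FROM audit_log_attempts WHERE audit_log_id IN (" +
		strings.Join(placeholders, ", ") + ") ORDER BY audit_log_id, seq"
	return query, args
}

// groupByLogID distributes the fetched rows back onto their parent entries.
func groupByLogID(rows []attemptRow) map[string][]attemptRow {
	grouped := make(map[string][]attemptRow, len(rows))
	for _, r := range rows {
		grouped[r.AuditLogID] = append(grouped[r.AuditLogID], r)
	}
	return grouped
}

func main() {
	query, args := buildAttemptsQuery([]string{"log-1", "log-2"})
	fmt.Println(query, args)

	grouped := groupByLogID([]attemptRow{{"log-1", 1}, {"log-1", 2}, {"log-2", 1}})
	fmt.Println(len(grouped["log-1"]), len(grouped["log-2"]))
}
```

Only placeholders are concatenated into the SQL text; the IDs themselves travel as bound arguments. On PostgreSQL the same idea is usually expressed as `= ANY($1)` with a single array argument.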
Actionable comments posted: 1
🤖 Prompt for all review comments with AI agents
Verify each finding against current code. Fix only still-valid issues, skip the
rest with a brief reason, keep changes minimal, and validate.
Inline comments:
In `@internal/failover/store_postgresql.go`:
- Around line 47-56: Normalize legacy PostgreSQL failover rule data during
migration by trimming whitespace from the renamed primary model values after the
existing source-to-primary_model rename in the failover_rules migration. Update
the migration logic in store_postgresql.go alongside the schema rename steps so
upgraded rows match the trimming behavior already used by Get and stay reachable
after upgrade.
ℹ️ Review info
⚙️ Run configuration
Configuration used: Organization UI
Review profile: ASSERTIVE
Plan: Pro
Run ID: 5b110a03-47da-48e4-b601-1a0b0c60c503
📒 Files selected for processing (21)
- config/fallback.go
- docs/features/failover.mdx
- internal/admin/dashboard/static/js/modules/audit-list.js
- internal/admin/dashboard/static/js/modules/audit-list.test.cjs
- internal/admin/dashboard/static/js/modules/dashboard-layout.test.cjs
- internal/admin/dashboard/static/js/modules/failover.js
- internal/admin/dashboard/templates/audit-pane.html
- internal/admin/dashboard/templates/page-audit-logs.html
- internal/admin/handler_failover_test.go
- internal/auditlog/auditlog.go
- internal/auditlog/auditlog_test.go
- internal/auditlog/middleware.go
- internal/auditlog/middleware_test.go
- internal/auditlog/reader_postgresql.go
- internal/auditlog/reader_sqlite.go
- internal/failover/service.go
- internal/failover/service_test.go
- internal/failover/store_postgresql.go
- internal/failover/store_sqlite.go
- internal/failover/store_sqlite_test.go
- internal/gateway/fallback_test.go
💤 Files with no reviewable changes (1)
- internal/admin/dashboard/static/js/modules/failover.js
Two CI failures on PR #444:

- lint: CI pinned golangci-lint v2.10, whose bundled staticcheck mis-analyzes `if x == nil { t.Fatal() }` guards under the repo's Go 1.26.4 and emits false-positive SA5011 warnings (surfaced once this PR invalidated the cached analysis of internal/core). Bump the pin to v2.12, whose staticcheck handles Go 1.26; the full module lints clean locally on v2.12.
- Integration Tests: the new audit_log_attempts table has a FK to audit_logs (ON DELETE CASCADE), so the PostgreSQL teardown's `DROP TABLE audit_logs` failed with "other objects depend on it". Drop the child table first and add CASCADE to cover future dependents.

The 3 CodeQL go/sql-injection alerts on the new Mongo failover store were dismissed as false positives: the _id values are Go strings marshaled as BSON string values, which cannot become Mongo query operators.

Co-Authored-By: Claude Opus 4.8 (1M context) <noreply@anthropic.com>
Caution
Some comments are outside the diff and can’t be posted inline due to platform limitations.
⚠️ Outside diff range comments (1)
tests/integration/setup_test.go (1)
180-198: 📐 Maintainability & Code Quality | 🔵 Trivial | 💤 Low value
Redundant ordering comment given CASCADE.
Once CASCADE is added to the DROP TABLE statement, Postgres will automatically drop audit_log_attempts when audit_logs is dropped regardless of list order, making the explicit "listed first" comment slightly inaccurate framing (it's defensive but not load-bearing). Not a functional problem, just a minor clarity nit.
🤖 Prompt for AI Agents
Verify each finding against current code. Fix only still-valid issues, skip the rest with a brief reason, keep changes minimal, and validate. In `@tests/integration/setup_test.go` around lines 180 - 198, The reset-table ordering comment in the integration test setup is misleading now that DROP TABLE uses CASCADE, since the explicit “listed first” framing is no longer necessary. Update the comment near the tables list in setup_test.go to reflect that CASCADE handles dependent tables automatically, or remove the ordering rationale entirely, while leaving the table reset loop and table names unchanged.
🤖 Prompt for all review comments with AI agents
Verify each finding against current code. Fix only still-valid issues, skip the
rest with a brief reason, keep changes minimal, and validate.
Outside diff comments:
In `@tests/integration/setup_test.go`:
- Around line 180-198: The reset-table ordering comment in the integration test
setup is misleading now that DROP TABLE uses CASCADE, since the explicit “listed
first” framing is no longer necessary. Update the comment near the tables list
in setup_test.go to reflect that CASCADE handles dependent tables automatically,
or remove the ordering rationale entirely, while leaving the table reset loop
and table names unchanged.
ℹ️ Review info
⚙️ Run configuration
Configuration used: Organization UI
Review profile: ASSERTIVE
Plan: Pro
Run ID: f3b635f1-7c7c-4135-b2de-e0432a40180c
📒 Files selected for processing (2)
- .github/workflows/test.yml
- tests/integration/setup_test.go
…gnment

Two follow-up review comments:

- store_postgresql migration: after renaming source -> primary_model, trim padded values with btrim so legacy padded keys stay reachable by Get/Delete (which trim input), matching the SQLite migration contract.
- audit-list.js auditTabKeydown: drop the unused `let next = idx` initializer (flagged by CodeQL/code-quality); the switch assigns next on every handled key and unhandled keys return early.

Co-Authored-By: Claude Opus 4.8 (1M context) <noreply@anthropic.com>
Actionable comments posted: 1
🤖 Prompt for all review comments with AI agents
Verify each finding against current code. Fix only still-valid issues, skip the
rest with a brief reason, keep changes minimal, and validate.
Inline comments:
In `@internal/failover/store_postgresql.go`:
- Around line 48-50: The trim migration in the PostgreSQL failover setup is tied
to the source rename flow, so it can be skipped for databases that already have
legacy padded primary_model values. In store_postgresql.go, move the UPDATE ...
btrim(...) logic into its own separate IF EXISTS primary_model block after the
rename handling in the failover migration path, so the trim runs independently
and is idempotent. Use the existing migration logic around the failover_rules
update and the primary_model/source rename checks to keep the behavior
consistent.
ℹ️ Review info
⚙️ Run configuration
Configuration used: Organization UI
Review profile: ASSERTIVE
Plan: Pro
Run ID: 184abf1b-ffb7-4495-a921-52818b8c0b01
📒 Files selected for processing (2)
- internal/admin/dashboard/static/js/modules/audit-list.js
- internal/failover/store_postgresql.go
`fallback.overrides` was a deprecated, undocumented per-model map whose only
live effect (mode: off) duplicated `disabled_models`; manual/auto were already
no-ops. Removed the Overrides field, the FallbackModelOverride type, its
load-time normalization/validation, and the disable consumer.
Compatibility: config.yaml is parsed with yaml.v3's lenient Unmarshal, so an
existing `fallback.overrides` block still loads without error — it is simply
ignored. The only behavior change is for configs that used
`overrides: {m: {mode: off}}` to disable failover for a model: that disable is
no longer applied. Migrate those to `disabled_models: [m]` (identical effect).
The field was YAML-only (no env var), so env-based deployments are unaffected.
Tests updated to assert the legacy block now loads and has no effect; the
obsolete overrides duplicate-key test is dropped (the manual-rules duplicate
test still covers that validation path).
Co-Authored-By: Claude Opus 4.8 (1M context) <noreply@anthropic.com>
Actionable comments posted: 1
🤖 Prompt for all review comments with AI agents
Verify each finding against current code. Fix only still-valid issues, skip the
rest with a brief reason, keep changes minimal, and validate.
Inline comments:
In `@config/config_test.go`:
- Around line 793-798: The test function in config_test.go defines a local
string named yaml that shadows the imported yaml package used elsewhere in the
file. Rename the local variable in this test to a more specific, idiomatic name
such as yamlConfig or yamlContent, and update its use in the surrounding test
setup so the yaml package identifier remains available if needed.
ℹ️ Review info
⚙️ Run configuration
Configuration used: Organization UI
Review profile: ASSERTIVE
Plan: Pro
Run ID: d7325c4a-0d30-4519-9a03-a63b0604ebaf
📒 Files selected for processing (2)
- config/config_test.go
- config/fallback.go
💤 Files with no reviewable changes (1)
- config/fallback.go
Address two follow-up review comments:

- store_postgresql migration: move the primary_model btrim into its own `IF EXISTS primary_model` block that runs independently of the source -> primary_model rename. A database already migrated by an earlier, non-trimming version keeps padded keys otherwise; now they are normalized on next startup (and it is a no-op once trimmed).
- config_test: rename the local `yaml` string to `yamlData` so it no longer shadows the imported yaml package in the new legacy-overrides test.

Co-Authored-By: Claude Opus 4.8 (1M context) <noreply@anthropic.com>
ResolveFallbacks runs on every dispatch (the live-attempt observer gates on FallbackSelectors), and it called Service.Rules() and Service.Disabled(), each of which cloned every failover rule via List() and rebuilt a map, a per-request allocation proportional to the rule count whenever the failover Service is wired.

Precompute the rules and disabled lookup maps once per Refresh and publish them atomically as an immutable *ruleSnapshot. Rules()/Disabled() now return the cached maps (read-only) with no per-call cloning or map building; List() reads the same snapshot's rows. The only callers (resolver effectiveManualRules/effectiveDisabled) range and copy, so they never mutate the shared maps. Added a regression test asserting the maps are reused across calls and a new snapshot is published only on Refresh.

Co-Authored-By: Claude Opus 4.8 (1M context) <noreply@anthropic.com>
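The snapshot pattern this commit describes can be sketched with `atomic.Pointer` from the standard library (Go 1.19+). This is a minimal illustration, not the repository's code: the field layout of `ruleSnapshot`, the `Refresh` parameters, and the map value types are assumptions; only the names `Service`, `ruleSnapshot`, `Rules`, `Disabled`, and `Refresh` come from the commit message.

```go
package main

import (
	"fmt"
	"sync/atomic"
)

// ruleSnapshot is built once per Refresh and never mutated afterwards.
// The field shapes here are assumed for illustration.
type ruleSnapshot struct {
	rules    map[string][]string // primary model -> fallback models
	disabled map[string]bool     // models with failover disabled
}

// Service publishes the current snapshot through an atomic pointer.
type Service struct {
	snap atomic.Pointer[ruleSnapshot]
}

// Refresh precomputes the lookup maps and swaps them in atomically.
func (s *Service) Refresh(rows map[string][]string, disabled []string) {
	next := &ruleSnapshot{
		rules:    make(map[string][]string, len(rows)),
		disabled: make(map[string]bool, len(disabled)),
	}
	for model, fallbacks := range rows {
		// Copy the slice so later changes to the input cannot leak in.
		next.rules[model] = append([]string(nil), fallbacks...)
	}
	for _, model := range disabled {
		next.disabled[model] = true
	}
	s.snap.Store(next)
}

// Rules returns the cached map; callers must treat it as read-only.
func (s *Service) Rules() map[string][]string {
	if snap := s.snap.Load(); snap != nil {
		return snap.rules
	}
	return nil
}

// Disabled returns the cached disabled set; callers must treat it as read-only.
func (s *Service) Disabled() map[string]bool {
	if snap := s.snap.Load(); snap != nil {
		return snap.disabled
	}
	return nil
}

func main() {
	var svc Service
	svc.Refresh(map[string][]string{"gpt-4o": {"gpt-4o-mini"}}, []string{"legacy-model"})
	fmt.Println(svc.Rules()["gpt-4o"], svc.Disabled()["legacy-model"])
}
```

Readers on the hot path only perform an atomic load, so the per-request cost no longer depends on the rule count; the trade-off is that callers must never write to the returned maps.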
Summary
Verification
gomodel/internal/admin/dashboard
gomodel/internal/anthropicapi
gomodel/internal/app
gomodel/internal/auditlog
gomodel/internal/authkeys
gomodel/internal/batch
gomodel/internal/batchrewrite
gomodel/internal/budget
gomodel/internal/cache
gomodel/internal/cache/modelcache
gomodel/internal/conversationstore
gomodel/internal/core
gomodel/internal/embedding
gomodel/internal/failover
gomodel/internal/fallback
gomodel/internal/filestore
gomodel/internal/gateway
gomodel/internal/guardrails
gomodel/internal/httpclient
gomodel/internal/live
gomodel/internal/llmclient
gomodel/internal/modeldata
gomodel/internal/modelselectors
gomodel/internal/observability
gomodel/internal/pricingoverrides
gomodel/internal/providers
gomodel/internal/providers/anthropic
gomodel/internal/providers/azure
gomodel/internal/providers/bailian
gomodel/internal/providers/bedrock
gomodel/internal/providers/deepseek
gomodel/internal/providers/gemini
gomodel/internal/providers/googlecommon
gomodel/internal/providers/groq
gomodel/internal/providers/minimax
gomodel/internal/providers/ollama
gomodel/internal/providers/openai
gomodel/internal/providers/opencodego
gomodel/internal/providers/openrouter
gomodel/internal/providers/oracle
gomodel/internal/providers/vertex
gomodel/internal/providers/vllm
gomodel/internal/providers/xai
gomodel/internal/providers/xiaomi
gomodel/internal/providers/zai
gomodel/internal/realtime
gomodel/internal/responsecache
gomodel/internal/responsestore
gomodel/internal/server
gomodel/internal/storage
gomodel/internal/storage/sqlutil
gomodel/internal/streaming
gomodel/internal/usage
gomodel/internal/validation
gomodel/internal/version
gomodel/internal/virtualmodels
gomodel/internal/workflows ./config/...
Summary by CodeRabbit
FAILOVER_ENABLED.fallback.enabled, JSON-based rules, and disabled-model controls).