feat(history): inline Retry button + retry-{any,own} RBAC + linkage + threshold + ops-hint (closes #47) by cristim · Pull Request #168 · LeanerCloud/CUDly

cristim · 2026-04-27T23:18:31Z

Summary

Closes #47.

Adds an inline Retry button to failed Purchase History rows, mirroring the cancel-button infrastructure landed in #145. Three reviewer-confirmed design choices (yes / yes / yes) are implemented:

Linkage — failed→retry chain via retry_execution_id (self-FK on purchase_executions) + retry_attempt_n counter. The History UI renders inline lineage links so users can navigate predecessor↔successor without scrolling.
Threshold soft-block — retry_attempt_n >= 5 rejects with a structured 409; the frontend confirms-with-warning then re-POSTs ?force=true. Prevents an obviously-stuck deployment from accumulating dozens of dead retry rows.
Ops hint — failures whose error matches a known-persistent-misconfig pattern (FROM_EMAIL not configured, SES sandbox, etc.) replace the Retry button entirely with an operator-actionable badge — retrying without changing config is guaranteed to fail again.

Salvages a previously-stalled agent attempt: rebased onto current feat/multicloud-web-frontend tip (which now includes #145 + the created_by_user_id migration 000041). The retry RBAC verbs (retry-own / retry-any) are added alongside the cancel verbs, with DefaultUserPermissions() granting retry-own:purchases to every authenticated user.

Notable details

Slice-aliasing fix. The new execution's Recommendations slice is now a defensive deep copy of the failed row's, so the downstream purchase pipeline (internal/purchase/execution.go) can't mutate the historical row's in-memory representation through the shared backing array. DB rows were already isolated; this closes the in-process aliasing footgun.
clientError grew a details map so the API can surface ops_hint / retry_attempt_n / threshold as structured JSON fields rather than substring-matched message text. The response writer flattens those alongside error in the body.
Already-retried 409 with the descendant ID prevents a stale-cache double-tap from silently overwriting the linkage pointer and orphaning a chain.
Tx ordering in persistRetryExecution guarantees the FK constraint: successor INSERT → predecessor UPSERT → suppression INSERTs in one tx; a crash mid-flow leaves either everything or nothing.
Migration 000042 adds nullable retry_execution_id (self-FK ON DELETE SET NULL) + retry_attempt_n INT NOT NULL DEFAULT 0 + a partial index on the FK. Legacy rows without backfill look like fresh first-retry candidates.

Test plan

go build ./... clean
go test ./internal/api/... ./internal/auth/... ./internal/config/... — 14 new retry tests + bumped permission counts
npx tsc --noEmit clean
npx jest — 1337/1337 frontend tests pass, 11 new retry-button tests
npm run build — webpack production build clean
gocyclo / gofmt / go vet pre-commit hooks pass
3 consecutive clean review passes (Completeness / Correctness / Security / Bugs / Duplication)
Deployed-environment smoke after merge (UI: visible Retry on a failed row; click → confirm → toast + reload + descendant created with retry_attempt_n=1)

Salvage notes

The previous agent stalled mid-3-pass-review with the work uncommitted in the worktree. This PR:

Stashed the inherited changes,
Rebased the branch onto the current integration tip (no conflicts — auto-merge resolved cleanly),
Fixed the previous agent's flagged slice-aliasing concern (real bug — would have corrupted in-memory historical row state),
Fixed a stale comment on persistentFailureHints (case-INsensitive, not case-sensitive),
Refactored retryPurchase cyclomatic complexity from 20 → ≤10 by extracting loadAndValidateRetryRequest, checkRetryRateGates, and persistRetryExecution,
Re-ran the 3-pass review until clean.

Trigger CodeRabbit:

@coderabbitai review

Summary by CodeRabbit

New Features
- Users can now retry failed purchase executions directly from the history view.
- Inline Retry buttons appear only for authorized users on eligible failed rows.
- Force-override capability available after reaching the retry attempt threshold.
- Displays retry lineage and provenance information for traceability.
- Shows operator guidance for known persistent misconfiguration issues instead of retry option.

…hreshold + ops-hint Closes #47. Salvages a previously-stalled agent attempt: the work was implemented but never committed; rebased onto current feat/multicloud-web-frontend tip which now includes PR #145 (cancel button + cancel-{any,own} RBAC + migration 000041_purchase_executions_created_by_user_id). The retry plumbing mirrors the cancel pattern row-for-row: same session-validation gate, same authorize-{any,own} matrix, same legacy-NULL-creator semantics for non-admins. Backend ======= * New POST /api/purchases/retry/{id} (AuthUser). Creates a new pending execution from a *failed* row's stored Recommendations and links the predecessor with retry_execution_id. Three writes happen in one tx (successor INSERT, predecessor UPSERT, suppression INSERT) so a crash mid-flow leaves either everything or nothing. * RBAC: ActionRetryOwn / ActionRetryAny in internal/auth/types.go, retry-own:purchases added to DefaultUserPermissions(). Admins short-circuit, retry-any holders may retry any row, retry-own holders may retry only their own. Test matrix mirrors cancel. * Already-retried guard: a row whose retry_execution_id is already set rejects with 409 + retry_execution_id detail so a stale-cache double-tap can't silently overwrite the lineage pointer. * Persistent-failure block (Q3): handler resolves an ops_hint from a small map of known-persistent failure substrings (FROM_EMAIL not configured, SES sandbox, etc.) and returns 409 with the hint in structured detail fields. The History row also surfaces the same hint inline, replacing the Retry button entirely. * Threshold soft-block (Q2): retry_attempt_n on the new row is predecessor.retry_attempt_n + 1; an attempt that would land at >= 5 is rejected with 409 + retry_attempt_n / threshold details. The frontend confirms-with-warning then re-POSTs with ?force=true to override. * Migration 000042 adds nullable retry_execution_id (self-FK ON DELETE SET NULL) and retry_attempt_n INT NOT NULL DEFAULT 0, plus a partial index on the FK for the rare populated case. store_postgres.go INSERT/UPSERT/SELECT cover the new columns; retry_execution_id is updateable via ON CONFLICT (the retry handler re-saves the predecessor to set the pointer), but created_by_user_id and retry_attempt_n stay INSERT-only so the scheduler's status-only UPSERTs don't rewrite provenance. * clientError grew an optional details map so ops_hint / retry_attempt_n / threshold can reach the frontend as structured JSON fields instead of substring-matched message text. Frontend ======== * History column doubles as the per-row action surface (renderActionCell): Cancel on cancellable pending rows (#46), Retry on retryable failed rows (this change), ops-hint badge on persistent failures, and inline lineage cross-links on retried rows. Status-driven mutually-exclusive decision tree. * canRetryFailedRow gates the button: status=failed AND no ops_hint AND no retry_execution_id (already retried) AND admin OR matching created_by_user_id. UX gate only — backend remains authoritative. * Threshold-reached rows render an "⚠ Retried 5× — click to override" button; the click confirms then POSTs with force=true. * RetryPurchaseResult exported from frontend/src/api so callers can consume execution_id / retry_attempt_n. Slice-aliasing fix ================== The retry handler's new execution borrowed failedExec.Recommendations by slice header. Mutations downstream (internal/purchase/execution.go writes exec.Recommendations[i].{Error,Purchased,PurchaseID}) would have seen through the alias and corrupted the historical failed row's in-memory representation visible to any caller still holding the failedExec pointer. Fixed via defensive copy (append([]RecommendationRecord(nil), failedExec.Recommendations...)) with a comment explaining the in-process aliasing footgun. Tests ===== * 14 new backend table tests covering admin/retry-any/retry-own paths, non-failed rejection, legacy-null-creator non-admin rejection, persistent-failure block + no-match, threshold block + force override, just-under-threshold allow, already-retried 409, missing-session rejection. * 11 new frontend jest tests covering button visibility per role, ops-hint replacement, threshold-button text + force=true wiring, lineage link rendering, retry_attempt_n badge, declined dialog skips API, accepted posts + reloads + toasts, API failure surfaces toast and re-enables button. * Auth permission-count tests bumped 7→8 (default user) and 9→10 (user + 2 groups) to account for the new retry-own grant.

coderabbitai · 2026-04-27T23:18:37Z

Warning

Rate limit exceeded

@cristim has exceeded the limit for the number of commits that can be reviewed per hour. Please wait 2 minutes and 13 seconds before requesting another review.

To keep reviews running without waiting, you can enable usage-based add-on for your organization. This allows additional reviews beyond the hourly cap. Account admins can enable it under billing.

⌛ How to resolve this issue?

After the wait time has elapsed, a review can be triggered using the @coderabbitai review command as a PR comment. Alternatively, push new commits to this PR.

We recommend that you space out your commits to avoid hitting the rate limit.

🚦 How do rate limits work?

CodeRabbit enforces hourly rate limits for each developer per organization.

Our paid plans have higher rate limits than the trial, open-source and free plans. In all cases, we re-allow further reviews after a brief timeout.

Please see our FAQ for further information.

ℹ️ Review info

⚙️ Run configuration

Configuration used: defaults

Review profile: CHILL

Plan: Pro

Run ID: 10197c71-bb4d-445c-b4d3-f191dd32d384

📥 Commits

Reviewing files that changed from the base of the PR and between e686840 and aa3d77d.

📒 Files selected for processing (7)

frontend/src/api/client.ts
frontend/src/api/types.ts
frontend/src/history.ts
internal/api/handler_purchases.go
internal/api/handler_purchases_test.go
internal/api/handler_router.go
internal/config/store_postgres_pgxmock_test.go

📝 Walkthrough

Walkthrough

Introduces a complete "retry failed purchase execution" feature across frontend and backend. Adds UI gating and button interaction in the history table, implements a new authenticated retry API endpoint with authorization, detects persistent operator-fixable errors, creates successor executions via transactional logic, extends database schema with retry linkage metadata, and adds authorization rules and comprehensive test coverage.

Changes

Cohort / File(s)	Summary
Frontend Types & API `frontend/src/types.ts`, `frontend/src/api/purchases.ts`, `frontend/src/api/index.ts`	Added `retry_execution_id`, `retry_attempt_n`, and `ops_hint` fields to `HistoryPurchase`; introduced `retryPurchase` function calling `POST /purchases/retry/{executionId}` endpoint and `RetryPurchaseResult` type to handle retry response payload.
Frontend UI & Interaction `frontend/src/history.ts`	Implements retry button rendering with authorization gating (`canRetryFailedRow`), threshold-based override flow at `RETRY_ATTEMPT_N=5`, action cell routing for Retry/Cancel/ops-hint badge/lineage links, click handling with confirmation dialogs, API calls with force flag, error/success toasts, and list reload.
Frontend Tests `frontend/src/__tests__/history-retry-button.test.ts`	Comprehensive Jest test suite validating retry button visibility, threshold override flow, lineage links, ops-hint badge suppression, permission checks, confirmation dialogs, API integration, error handling, and toast notifications.
Backend API & Error Handling `internal/api/handler_router.go`, `internal/api/handler.go`	Extended `clientError` with structured `details` metadata map and `Details()` method; added `NewClientErrorWithDetails` constructor to include machine-readable fields in error responses without message parsing.
Backend Retry Handler `internal/api/handler_purchases.go`, `internal/api/handler_purchases_test.go`	Implements `retryPurchase` with operator-fixable failure detection (`resolveOpsHint`), authorization via `retry-own`/`retry-any` RBAC, retry-attempt soft-block at threshold, successor execution creation with deep-copied recommendations and incremented `RetryAttemptN`, original row linkage via `retry_execution_id`, transactional writes, and comprehensive test coverage for permissions, states, thresholds, and error cases.
Backend History & Routing `internal/api/handler_history.go`, `internal/api/router.go`	Updated history projection to copy `RetryExecutionID` and `RetryAttemptN` metadata; computes `OpsHint` at read time for failed rows via `resolveOpsHint`; added authenticated `POST /api/purchases/retry/` endpoint with path parameter forwarding.
Authorization & Permissions `internal/auth/types.go`, `internal/auth/types_test.go`, `internal/auth/service_test.go`, `internal/auth/service_group_test.go`	Added `ActionRetryOwn` and `ActionRetryAny` constants; updated `DefaultUserPermissions()` to grant all authenticated users `retry-own:purchases`; adjusted test expectations to account for new permission entry (+1 to counts across test suites).
Database Schema & Migration `internal/database/postgres/migrations/000042_purchase_executions_retry_linkage.up.sql`, `internal/database/postgres/migrations/000042_purchase_executions_retry_linkage.down.sql`	Up migration adds `retry_execution_id` (nullable UUID FK with `ON DELETE SET NULL`) and `retry_attempt_n` (non-null int, default 0) columns to `purchase_executions`; creates partial index on `retry_execution_id`; down migration reverses additions.
Database Store & Types `internal/config/store_postgres.go`, `internal/config/store_postgres_pgxmock_test.go`, `internal/config/types.go`	Extended `PurchaseExecution` with `RetryExecutionID` and `RetryAttemptN` persistence; extended `PurchaseHistoryRecord` with `RetryExecutionID`, `RetryAttemptN`, and `OpsHint` synthesis; updated `SavePurchaseExecutionTx` to INSERT/UPDATE retry columns and all query methods to SELECT/scan new fields; adjusted mock test expectations for column alignment.

Sequence Diagram(s)

sequenceDiagram
    actor User
    participant Frontend as Frontend UI
    participant API as API Handler
    participant DB as Database
    participant Email as Email Service

    User->>Frontend: Click Retry button on failed row
    Frontend->>Frontend: Show confirmation dialog
    User->>Frontend: Accept confirmation
    Frontend->>Frontend: Disable button, start loading
    
    alt Is force override required?
        Frontend->>API: POST /purchases/retry/{id}?force=true
    else Normal retry
        Frontend->>API: POST /purchases/retry/{id}
    end
    
    API->>DB: Lookup original execution
    API->>DB: Validate state (failed, not already retried)
    API->>DB: Resolve ops_hint from error
    
    alt ops_hint exists (persistent error)
        API-->>Frontend: HTTP 409 with ops_hint detail
        Frontend->>Frontend: Show error toast, enable button
    else force=false & threshold exceeded
        API-->>Frontend: HTTP 409 with retry_attempt_n detail
        Frontend->>Frontend: Show error toast, enable button
    else Authorization denied
        API-->>Frontend: HTTP 403
        Frontend->>Frontend: Show error toast
    else Success path
        API->>DB: Start transaction
        API->>DB: Create successor execution (pending, incremented RetryAttemptN)
        API->>DB: Update original with retry_execution_id → successor id
        API->>DB: Insert suppressions for successor
        API->>DB: Commit transaction
        API->>Email: Send approval email for successor
        API-->>Frontend: HTTP 200 with response (new execution id, metadata)
        Frontend->>Frontend: Show success toast
        Frontend->>Frontend: Call loadHistory() to refresh table
        Frontend->>API: GET /history
        API->>DB: Fetch updated executions with retry metadata
        API-->>Frontend: Return rows with retry linkage
        Frontend->>Frontend: Re-render history table with updated rows
    end

Estimated code review effort

🎯 4 (Complex) | ⏱️ ~75 minutes

Poem

🐰 A retry so clever, we've sewn it with care,
From frontend to database, retry's everywhere!
With threshold-check precision and ops-hint so bright,
Failed purchases bounce back—retry done right! ✨
Five times, then force, through the flow they all dance,
One hop closer to victory, purchase's second chance!

🚥 Pre-merge checks | ✅ 4 | ❌ 1

❌ Failed checks (1 warning)

Check name	Status	Explanation	Resolution
Docstring Coverage	⚠️ Warning	Docstring coverage is 28.57% which is insufficient. The required threshold is 80.00%.	Write docstrings for the functions missing them to satisfy the coverage threshold.

✅ Passed checks (4 passed)

Check name	Status	Explanation
Description Check	✅ Passed	Check skipped - CodeRabbit’s high-level summary is enabled.
Title check	✅ Passed	The title accurately and specifically summarizes the main change: adding an inline Retry button with RBAC, linkage, threshold, and ops-hint features to the history feature.
Linked Issues check	✅ Passed	Check skipped because no linked issues were found for this pull request.
Out of Scope Changes check	✅ Passed	Check skipped because no linked issues were found for this pull request.

_{✏️ Tip: You can configure your own custom pre-merge checks in the settings.}

✨ Finishing Touches

🧪 Generate unit tests (beta)

Create PR with unit tests
Commit unit tests in branch feat/history-retry-button

_{Comment @coderabbitai help to get the list of available commands and usage tips.}

cristim · 2026-04-27T23:18:38Z

@coderabbitai review

coderabbitai · 2026-04-27T23:18:42Z

✅ Actions performed

Review triggered.

Note: CodeRabbit is an incremental review system and does not re-review already reviewed commits. This command is applicable only when automatic reviews are paused.

cristim · 2026-04-27T23:20:55Z

@coderabbitai review

coderabbitai · 2026-04-27T23:21:00Z

✅ Actions performed

Review triggered.

Note: CodeRabbit is an incremental review system and does not re-review already reviewed commits. This command is applicable only when automatic reviews are paused.

coderabbitai

Actionable comments posted: 3

🧹 Nitpick comments (2)

internal/config/store_postgres_pgxmock_test.go (1)

368-381: Consider asserting the new retry fields directly.

The mock rows now match the expanded projection, but explicit checks for RetryExecutionID and RetryAttemptN would make these tests catch scan-order regressions in the new columns too.

Also applies to: 399-413

🤖 Prompt for AI Agents

Verify each finding against the current code and only fix it if needed.

In `@internal/config/store_postgres_pgxmock_test.go` around lines 368 - 381, Add
explicit assertions in the test that verify the scanned values for the new retry
fields to catch scan-order regressions: after the row is added via
pgxmock.NewRows(cols).AddRow(...) and the result is scanned into the execution
struct (the code paths that currently assert other fields like Status,
StepNumber, etc.), add checks that execution.RetryExecutionID (or the field name
used in your struct) equals the expected nil/empty value and
execution.RetryAttemptN equals the expected numeric value (0 in this mock); do
this for the two locations mentioned around the cols/AddRow block and the
similar block at lines ~399-413 so the test explicitly validates both
RetryExecutionID and RetryAttemptN.

internal/api/handler_router.go (1)

47-51: Defensively copy details to avoid shared-map mutation leaks.

map[string]any is reference-typed; returning/storing it directly allows external mutation of error payload state. Cloning in constructor/accessor makes this safer and more predictable.

♻️ Suggested hardening

 func (e *clientError) Details() map[string]any { return e.details }
 
+func cloneDetails(in map[string]any) map[string]any {
+	if in == nil {
+		return nil
+	}
+	out := make(map[string]any, len(in))
+	for k, v := range in {
+		out[k] = v
+	}
+	return out
+}
+
-func NewClientErrorWithDetails(code int, message string, details map[string]any) error {
-	return &clientError{message: message, code: code, details: details}
+func NewClientErrorWithDetails(code int, message string, details map[string]any) error {
+	return &clientError{message: message, code: code, details: cloneDetails(details)}
 }

-func (e *clientError) Details() map[string]any { return e.details }
+func (e *clientError) Details() map[string]any { return cloneDetails(e.details) }

Also applies to: 63-65

🤖 Prompt for AI Agents

Verify each finding against the current code and only fix it if needed.

In `@internal/api/handler_router.go` around lines 47 - 51, Return a defensive copy
of the details map instead of the internal reference: in clientError.Details()
clone the e.details map into a new map and return that copy, and likewise make a
copy when assigning/storing details in the clientError constructor/initializer
(any function that constructs or sets clientError.details, e.g., NewClientError
or the clientError literal assignment) to avoid external code mutating the
internal map; perform a shallow copy of keys/values (map[string]any) since
values are interface{}.

🤖 Prompt for all review comments with AI agents

Verify each finding against the current code and only fix it if needed.

Inline comments:
In `@frontend/src/history.ts`:
- Around line 598-602: The catch branch currently only shows err.message which
loses structured retry info; update the catch handling for retryError in
frontend/src/history.ts to cast/parses retryError as any and extract backend
retry fields (ops_hint, retry_attempt_n, threshold) and include them in the
toast (e.g., append ops_hint and attempt/threshold details to the message or
pass them in toast metadata) while still falling back to err.message/'unknown
error'; keep re-enabling the button (btn.disabled = false) and use the same
showToast call to surface the structured retry guidance to the user.

In `@internal/api/handler_purchases.go`:
- Around line 651-663: When creating the retry successor PurchaseExecution (the
newExecution variable of type config.PurchaseExecution), preserve the original
plan-scoped metadata by copying PlanID and StepNumber from the failed execution
(failedExec) into the newExecution struct; update the initializer to include
PlanID: failedExec.PlanID and StepNumber: failedExec.StepNumber so the retried
execution remains associated with the same plan and ramp step.
- Around line 581-593: The guard that returns the retry_execution_id leaks
cross-user metadata because it runs before RBAC; move the call to
authorizeSessionRetry(ctx, session, failedExec) to execute before the
already-retried check (the block that inspects failedExec.RetryExecutionID and
returns NewClientErrorWithDetails with retry_execution_id) so authorization is
enforced prior to revealing RetryExecutionID/descendant info; update the logic
around failedExec.RetryExecutionID, failedExec.ExecutionID, and the
NewClientErrorWithDetails call accordingly to ensure no retry_execution_id is
returned unless authorizeSessionRetry has succeeded.

---

Nitpick comments:
In `@internal/api/handler_router.go`:
- Around line 47-51: Return a defensive copy of the details map instead of the
internal reference: in clientError.Details() clone the e.details map into a new
map and return that copy, and likewise make a copy when assigning/storing
details in the clientError constructor/initializer (any function that constructs
or sets clientError.details, e.g., NewClientError or the clientError literal
assignment) to avoid external code mutating the internal map; perform a shallow
copy of keys/values (map[string]any) since values are interface{}.

In `@internal/config/store_postgres_pgxmock_test.go`:
- Around line 368-381: Add explicit assertions in the test that verify the
scanned values for the new retry fields to catch scan-order regressions: after
the row is added via pgxmock.NewRows(cols).AddRow(...) and the result is scanned
into the execution struct (the code paths that currently assert other fields
like Status, StepNumber, etc.), add checks that execution.RetryExecutionID (or
the field name used in your struct) equals the expected nil/empty value and
execution.RetryAttemptN equals the expected numeric value (0 in this mock); do
this for the two locations mentioned around the cols/AddRow block and the
similar block at lines ~399-413 so the test explicitly validates both
RetryExecutionID and RetryAttemptN.

🪄 Autofix (Beta)

Fix all unresolved CodeRabbit comments on this PR:

Push a commit to this branch (recommended)
Create a new PR with the fixes

ℹ️ Review info

⚙️ Run configuration

Configuration used: defaults

Review profile: CHILL

Plan: Pro

Run ID: 0a6bfe37-5731-4364-9ce9-b79a8e5edf5e

📥 Commits

Reviewing files that changed from the base of the PR and between 5520317 and e686840.

📒 Files selected for processing (20)

frontend/src/__tests__/history-retry-button.test.ts
frontend/src/api/index.ts
frontend/src/api/purchases.ts
frontend/src/history.ts
frontend/src/types.ts
internal/api/handler.go
internal/api/handler_history.go
internal/api/handler_purchases.go
internal/api/handler_purchases_test.go
internal/api/handler_router.go
internal/api/router.go
internal/auth/service_group_test.go
internal/auth/service_test.go
internal/auth/types.go
internal/auth/types_test.go
internal/config/store_postgres.go
internal/config/store_postgres_pgxmock_test.go
internal/config/types.go
internal/database/postgres/migrations/000042_purchase_executions_retry_linkage.down.sql
internal/database/postgres/migrations/000042_purchase_executions_retry_linkage.up.sql

Three actionable findings from CR's first review (PR #168): 1. **Major / security**: authorize before the already-retried guard. The `failedExec.RetryExecutionID` 409 ran ahead of `authorizeSessionRetry`, so any signed-in user who could guess a failed-row UUID would learn it had been retried AND get the descendant execution ID — a cross-user info leak. Reordered the gate sequence so RBAC runs first; the descendant UUID is now surfaced only after the caller has proven they're allowed to act on the row. New regression test `TestHandler_retryPurchase_AlreadyRetried_RBACBeforeLeak` asserts the 403 path doesn't carry retry_execution_id in the details map. 2. **Major / correctness**: preserve PlanID + StepNumber on retry successors. Without this, retrying a failed planned execution dropped the new row out of plan-scoped history and lost its ramp-step attribution. Both fields now propagate from the predecessor; for ad-hoc executions PlanID="" and StepNumber=0 so propagation is a no-op. New regression test `TestHandler_retryPurchase_PreservesPlanMetadata` asserts the propagation. 3. **Minor / UX**: surface structured retry hints in the failure toast. The catch path used to discard `ops_hint` / `retry_attempt_n` / `threshold` from a 4xx body and toast only `err.message`, leaving the user with a generic failure instead of the operator-actionable hint or override guidance the backend already returns. To wire this up: - `ApiError` gained an optional `details: Record<string, unknown>` field (frontend/src/api/types.ts). - `apiRequest` now extracts ALL non-`error` keys from a 4xx body into `error.details` (frontend/src/api/client.ts) so callers can branch on machine-readable hints without parsing message text. - The retry catch in history.ts inspects `err.details.ops_hint` first, then falls back to a "already retried n× (threshold k) — confirm the override prompt to force" message synthesised from `retry_attempt_n` + `threshold`, then to `err.message`. Plus two nits from the same review: 4. **Defensive copy of clientError details map**. The previous shape stored the constructor's input map by reference; a caller mutating the map after construction would reach into the error's payload. `cloneDetails` shallow-copies on both construction and `Details()` read so the error is sealed. 5. **Explicit retry-field assertions in pgxmock GetExecutionByID tests**. Two new assertions (one for the NULL/0 legacy-row case, one for a populated chain-position-2 case) guard against scan- order regressions in the new `retry_execution_id` / `retry_attempt_n` columns. A column reorder upstream would scan into the wrong destination and fail at least one of these. All 1337 frontend tests + ~16 backend retry/auth/config tests pass. gocyclo / gofmt / go vet pre-commit hooks pass.

cristim · 2026-04-28T00:05:13Z

@coderabbitai review

coderabbitai · 2026-04-28T00:05:21Z

✅ Actions performed

Review triggered.

Note: CodeRabbit is an incremental review system and does not re-review already reviewed commits. This command is applicable only when automatic reviews are paused.

cristim · 2026-04-28T14:14:50Z

P2: substantial feature (retry failed purchases with RBAC, linkage, threshold, ops-hint). CI green, mergeable, 1337 frontend tests pass. Good overall but CodeRabbit flagged docstring coverage (28.57% vs 80% threshold) — worth addressing before merge. Backend-side save guard for invalid combos noted as out of scope; worth tracking as a follow-up issue. (triage agent wave2-F)

…205) (#206) * fix(db): idempotently restore retry_execution_id column dropped by migration drift (closes #204, closes #205) Production /api/purchases/* surfaced 500 errors after PR #161 deployed, with the underlying CloudWatch error: ERROR: column "retry_execution_id" does not exist (SQLSTATE 42703) User-facing symptoms (dashboard "Upcoming Scheduled Purchases" panel): * "Failed to load purchase details: Internal server error" (#205) * "Failed to cancel purchase" (#204) Both buttons hit `Handler.GetExecutionByID` whose SELECT references the retry_execution_id and retry_attempt_n columns introduced by PR #168. Root cause: migration version drift. * PR #189 added 000042_recommendations_add_engine_to_key earlier on the same day. Deployed Lambdas applied it as version 42. * PR #168 then added 000042_purchase_executions_retry_linkage, colliding on version 42. * PR #195 resolved the on-disk duplicate by renaming RECOMMENDATIONS to 000043 — the wrong direction. Deployed DBs already had schema_migrations.version=42 marked as the engine content, so on the next migrate.Up() they: - skipped on-disk 000042 (now retry_linkage) entirely (DB version >= 42 already) - applied on-disk 000043 (now engine again) as a no-op via the ADD COLUMN IF NOT EXISTS guards already in that file Net effect: the retry_execution_id / retry_attempt_n columns were never added on prod, but the Go code expects them. Forward-only corrective. The new 000044 replays 000042's schema additions wrapped in IF NOT EXISTS / CREATE INDEX IF NOT EXISTS so it's a no-op on any DB that already has the columns (fresh deploys, manually-fixed envs) and additive on the broken ones. We don't try to repair schema_migrations history — golang-migrate only cares about the high-water version, and we can't tell from outside whether a given environment skipped 42 or applied 43 twice. The migration up + down sql files document the incident and the edge cases inline so future readers don't have to re-derive the history from the git log. Closes #204 Closes #205 * fix(db): repair partially-fixed DBs in 000044, not just no-op (CR pass 1) CR pass 1 on PR #206 flagged that the original 000044 only added columns when missing — a manually-fixed DB with the column but no FK, or with retry_attempt_n nullable, would still be broken after this migration. Rewrite as five idempotent steps that each check pg_catalog / information_schema for the precise piece they own, so every reachable partial state is repaired: 1. Add retry_execution_id column if missing. 2. Add the FK on retry_execution_id -> execution_id if no FK on that column exists. Match by column relationship (pg_constraint join pg_attribute) NOT by constraint name, so a fresh-deploy DB whose 000042 ran cleanly with the auto-generated `purchase_executions_retry_execution_id_fkey` is recognised as already-FK'd and we don't add a second. 3. Add retry_attempt_n column if missing (as nullable, so step 4 can backfill before flipping to NOT NULL). 4. Backfill any NULLs and SET DEFAULT 0 / SET NOT NULL unconditionally — Postgres treats those as no-ops when the constraint is already as requested. 5. Create the partial index if missing. Down stays simple: DROP COLUMN ... CASCADE removes the FK regardless of its name. Comments inline document each reachable partial state and why each step is safe to re-run. Migration is a no-op on healthy DBs, fully repairing on broken ones.

… RBAC (#286) (#299) * feat(api,history): in-dashboard purchase approval + approve-{any,own} RBAC (closes #286) Adds an inline Approve button on pending Purchase History rows + the session-authed backend dispatch that lets admins (and `approve-own` holders for their own pending rows) approve without round-tripping to the SES email. Mirrors the same dual-auth shape as the Cancel button (PR #145) and Retry button (PR #168). Backend ------- * `internal/auth/types.go`: new constants `ActionApproveOwn` / `ActionApproveAny` next to the existing cancel-/retry- block. `DefaultUserPermissions()` adds `approve-own:purchases` so every authenticated user can approve pending executions they themselves created. `approve-any` exists for future operator roles; no default non-admin grant. * `internal/api/handler_purchases.go::approvePurchase` refactored into the same three-mode dispatch as `cancelPurchase`: 1. Session present + RBAC matches → session-authed approve via `approvePurchaseViaSession` regardless of token. 2. token != "" → legacy email-link path, unchanged. 3. token == "" → session-authed dashboard branch. Permission-denied (403) on the session check falls through to the token branch so a logged-in non-admin who is the per-account contact_email recipient can still approve via the email link. * New `approvePurchaseViaSession` mirrors `cancelPurchaseViaSession` minus the suppression-cleanup tx (approve doesn't drop suppressions — those persist until cancel/expiry). Status flip + `ApprovedBy = session.Email` stamp + SavePurchaseExecution. * New `authorizeSessionApprove` mirror of `authorizeSessionCancel`: admin → ok; else `approve-any` → ok; else `approve-own` AND `created_by_user_id == session.UserID` → ok; else 403. No new migration: existing `purchase_executions.approved_by` (TEXT, migration 000035) and `created_by_user_id` (UUID FK, migration 000041) are sufficient — the email field stamps the actor and the UUID drives the own-ness check. Frontend -------- * `frontend/src/api/purchases.ts`: new `approvePurchase(executionId)` caller wrapping `POST /purchases/approve/{id}` with bearer-session auth (no token in URL — backend's session-first dispatch picks the auth path). * `frontend/src/api/index.ts`: re-export the new function from the api barrel. * `frontend/src/history.ts`: - new `canApprovePendingRow` predicate mirroring `canCancelPendingRow` (admin → yes; non-admin → only own pending rows; legacy null-creator rows out of reach via this UI). - `renderActionCell` now renders Approve + Cancel side-by-side for pending rows where the session qualifies for both. Each predicate is checked independently so a custom role with only one verb renders just that button. - new click handler on `.history-approve-btn` mirroring the cancel-button pattern: confirmDialog (non-destructive) → `api.approvePurchase` → reload history + success toast. Failed approve surfaces an error toast + re-enables the button. RI exchange ----------- `approveRIExchange` has the same shape gap (token-only auth) but mutates a state-machine and triggers downstream cloud-side exchange execution — bigger surface than fits in this PR. Filing as a follow-up issue tracking the symmetric work. Tests ----- Backend * `internal/auth/types_test.go` + `service_test.go` + `service_group_test.go`: bumped permission-count assertions (8→9, 10→11) and added `ActionApproveOwn:ResourcePurchases` membership check in `DefaultUserPermissions returns user access`. * `internal/api/handler_purchases_test.go`: added `.Maybe()` HasPermissionAPI mocks (returning false for both verbs) on five existing approve-purchase tests so the new session-first dispatch falls through to the token branch they exercise. * `internal/api/coverage_extras_test.go`: `TestHandler_approvePurchase_EmptyToken` now asserts the empty-token + zero-handler case returns SOME error (recovered from panic via `defer recover`) — the empty-token path is no longer a pre-flight 400, since after #286 it's the dispatch into the session-authed branch. * `internal/api/handler_test.go::TestHandler_HandleRequest_ApprovePurchase`: same `.Maybe()` mocks for the end-to-end approve route test. Frontend * New `frontend/src/__tests__/history-approve-button.test.ts`: 8 tests covering admin/regular/anonymous render branches, non-pending-row absence, side-by-side Approve+Cancel render, declined confirmDialog, accepted confirm + reload + toast, and API-failure error toast + button re-enable. Verification: - `go test ./...` clean (4265 passed, 0 failed, 6 skipped). - `npm test` clean (1469 passed, 0 failed). - `npm run build` + `npx tsc --noEmit` clean. Token-only email-link path is unchanged — backwards-compat mandatory; existing pre-#286 tests on that path continue to pass. * fix(history,api): disable both row-actions during click + tighten empty-token test (CR pass on PR #299) Addresses both actionable items from CodeRabbit's first review on PR #299: 1. **history.ts:603 — disable BOTH Approve and Cancel during a click** (Minor / Quick win): after #286 the two buttons render side-by-side on pending rows, but the click handlers were disabling only the clicked button. A quick double-click could fire conflicting requests on the same row before the reload completes. Extracted a small `sameRowActions(btn)` helper that returns every row-action button (`.history-approve-btn`, `.history-cancel-btn`) in the same `<td>` as the clicked button (falls back to `[btn]` when no parent cell exists — test fixtures may not wrap in a table). Both Approve and Cancel handlers now disable the full set while the API is in flight and re-enable the full set on failure. Successful requests trigger a full history reload that re-renders the row, so the row-action sibling state doesn't matter on the happy path. 2. **coverage_extras_test.go:52 — fail loudly on panic instead of swallowing** (Major / Quick win): the prior `recover()` swallow could let a panic in any new dispatch branch pass the test without ever asserting on `err`, masking real regressions on the empty-token path. The test now (a) `t.Fatalf`s on any panic from `approvePurchase` so the panic path is itself a failure signal, and (b) wires a minimal `MockConfigStore` returning a clean "execution not found" error from `GetExecutionByID` so the dispatch reaches a proper `NewClientError(404)` rather than nil-deref'ing on `h.config`. `require.Error(...)` then asserts the contract: empty token + no session must NOT silently succeed. Tests (regression coverage for #1): * `history-approve-button.test.ts` gains two new cases: - "approve click disables BOTH Approve and Cancel for the row while in flight (CR pass)" — uses an unresolved promise from the api mock to capture the in-flight state and asserts both buttons are disabled until the promise resolves. - "approve API failure re-enables BOTH Approve and Cancel for the row (CR pass)" — asserts both buttons are re-enabled after the API rejection path runs. Verification: - `go test ./...` clean (4265 passed / 0 failed / 6 skipped). - `npm test` clean (1471 passed / 0 failed). - `npx tsc --noEmit` clean.

coderabbitai Bot reviewed Apr 27, 2026

View reviewed changes

Comment thread frontend/src/history.ts

Comment thread internal/api/handler_purchases.go Outdated

Comment thread internal/api/handler_purchases.go

cristim added triaged Item has been triaged priority/p2 Backlog-worthy severity/medium Moderate harm urgency/this-sprint Within the current sprint impact/many Affects most users effort/l Weeks type/feat New capability labels Apr 28, 2026

cristim merged commit 8bd1ec8 into feat/multicloud-web-frontend Apr 29, 2026
3 checks passed

This was referenced Apr 30, 2026

fix(purchases): GET purchase details returns 500 for upcoming scheduled purchases #205

Closed

fix(db): idempotently repair retry-linkage schema drift (closes #204, #205) #206

Merged

cristim deleted the feat/history-retry-button branch April 30, 2026 14:00

Conversation

cristim commented Apr 27, 2026 • edited by coderabbitai Bot Loading Uh oh! There was an error while loading. Please reload this page.

Uh oh!

Summary

Notable details

Test plan

Salvage notes

Summary by CodeRabbit

Uh oh!

coderabbitai Bot commented Apr 27, 2026 • edited Loading Uh oh! There was an error while loading. Please reload this page.

Uh oh!

Rate limit exceeded

Walkthrough

Changes

Sequence Diagram(s)

Estimated code review effort

Poem

❌ Failed checks (1 warning)

Uh oh!

cristim commented Apr 27, 2026

Uh oh!

coderabbitai Bot commented Apr 27, 2026

Uh oh!

cristim commented Apr 27, 2026

Uh oh!

coderabbitai Bot commented Apr 27, 2026

Uh oh!

coderabbitai Bot left a comment

Choose a reason for hiding this comment

Uh oh!

Uh oh!

Uh oh!

Uh oh!

cristim commented Apr 28, 2026

Uh oh!

coderabbitai Bot commented Apr 28, 2026

Uh oh!

cristim commented Apr 28, 2026

Uh oh!

Uh oh!

Reviewers

Assignees

Labels

Projects

Milestone

Development

Uh oh!

1 participant

cristim commented Apr 27, 2026 •

edited by coderabbitai Bot

Loading

coderabbitai Bot commented Apr 27, 2026 •

edited

Loading