Skip to content

feat(history): inline Retry button + retry-{any,own} RBAC + linkage + threshold + ops-hint (closes #47)#168

Merged
cristim merged 2 commits intofeat/multicloud-web-frontendfrom
feat/history-retry-button
Apr 29, 2026
Merged

feat(history): inline Retry button + retry-{any,own} RBAC + linkage + threshold + ops-hint (closes #47)#168
cristim merged 2 commits intofeat/multicloud-web-frontendfrom
feat/history-retry-button

Conversation

@cristim
Copy link
Copy Markdown
Member

@cristim cristim commented Apr 27, 2026

Summary

Closes #47.

Adds an inline Retry button to failed Purchase History rows, mirroring the cancel-button infrastructure landed in #145. Three reviewer-confirmed design choices (yes / yes / yes) are implemented:

  1. Linkage — failed→retry chain via retry_execution_id (self-FK on purchase_executions) + retry_attempt_n counter. The History UI renders inline lineage links so users can navigate predecessor↔successor without scrolling.
  2. Threshold soft-blockretry_attempt_n >= 5 rejects with a structured 409; the frontend confirms-with-warning then re-POSTs ?force=true. Prevents an obviously-stuck deployment from accumulating dozens of dead retry rows.
  3. Ops hint — failures whose error matches a known-persistent-misconfig pattern (FROM_EMAIL not configured, SES sandbox, etc.) replace the Retry button entirely with an operator-actionable badge — retrying without changing config is guaranteed to fail again.

Salvages a previously-stalled agent attempt: rebased onto current feat/multicloud-web-frontend tip (which now includes #145 + the created_by_user_id migration 000041). The retry RBAC verbs (retry-own / retry-any) are added alongside the cancel verbs, with DefaultUserPermissions() granting retry-own:purchases to every authenticated user.

Notable details

  • Slice-aliasing fix. The new execution's Recommendations slice is now a defensive deep copy of the failed row's, so the downstream purchase pipeline (internal/purchase/execution.go) can't mutate the historical row's in-memory representation through the shared backing array. DB rows were already isolated; this closes the in-process aliasing footgun.
  • clientError grew a details map so the API can surface ops_hint / retry_attempt_n / threshold as structured JSON fields rather than substring-matched message text. The response writer flattens those alongside error in the body.
  • Already-retried 409 with the descendant ID prevents a stale-cache double-tap from silently overwriting the linkage pointer and orphaning a chain.
  • Tx ordering in persistRetryExecution guarantees the FK constraint: successor INSERT → predecessor UPSERT → suppression INSERTs in one tx; a crash mid-flow leaves either everything or nothing.
  • Migration 000042 adds nullable retry_execution_id (self-FK ON DELETE SET NULL) + retry_attempt_n INT NOT NULL DEFAULT 0 + a partial index on the FK. Legacy rows without backfill look like fresh first-retry candidates.

Test plan

  • go build ./... clean
  • go test ./internal/api/... ./internal/auth/... ./internal/config/... — 14 new retry tests + bumped permission counts
  • npx tsc --noEmit clean
  • npx jest — 1337/1337 frontend tests pass, 11 new retry-button tests
  • npm run build — webpack production build clean
  • gocyclo / gofmt / go vet pre-commit hooks pass
  • 3 consecutive clean review passes (Completeness / Correctness / Security / Bugs / Duplication)
  • Deployed-environment smoke after merge (UI: visible Retry on a failed row; click → confirm → toast + reload + descendant created with retry_attempt_n=1)

Salvage notes

The previous agent stalled mid-3-pass-review with the work uncommitted in the worktree. This PR:

  1. Stashed the inherited changes,
  2. Rebased the branch onto the current integration tip (no conflicts — auto-merge resolved cleanly),
  3. Fixed the previous agent's flagged slice-aliasing concern (real bug — would have corrupted in-memory historical row state),
  4. Fixed a stale comment on persistentFailureHints (case-INsensitive, not case-sensitive),
  5. Refactored retryPurchase cyclomatic complexity from 20 → ≤10 by extracting loadAndValidateRetryRequest, checkRetryRateGates, and persistRetryExecution,
  6. Re-ran the 3-pass review until clean.

Trigger CodeRabbit:

@coderabbitai review

Summary by CodeRabbit

  • New Features
    • Users can now retry failed purchase executions directly from the history view.
    • Inline Retry buttons appear only for authorized users on eligible failed rows.
    • Force-override capability available after reaching the retry attempt threshold.
    • Displays retry lineage and provenance information for traceability.
    • Shows operator guidance for known persistent misconfiguration issues instead of retry option.

…hreshold + ops-hint

Closes #47.

Salvages a previously-stalled agent attempt: the work was implemented but
never committed; rebased onto current feat/multicloud-web-frontend tip
which now includes PR #145 (cancel button + cancel-{any,own} RBAC +
migration 000041_purchase_executions_created_by_user_id). The retry
plumbing mirrors the cancel pattern row-for-row: same session-validation
gate, same authorize-{any,own} matrix, same legacy-NULL-creator
semantics for non-admins.

Backend
=======
* New POST /api/purchases/retry/{id} (AuthUser). Creates a new
  pending execution from a *failed* row's stored Recommendations and
  links the predecessor with retry_execution_id. Three writes happen
  in one tx (successor INSERT, predecessor UPSERT, suppression INSERT)
  so a crash mid-flow leaves either everything or nothing.
* RBAC: ActionRetryOwn / ActionRetryAny in internal/auth/types.go,
  retry-own:purchases added to DefaultUserPermissions(). Admins
  short-circuit, retry-any holders may retry any row, retry-own
  holders may retry only their own. Test matrix mirrors cancel.
* Already-retried guard: a row whose retry_execution_id is already
  set rejects with 409 + retry_execution_id detail so a stale-cache
  double-tap can't silently overwrite the lineage pointer.
* Persistent-failure block (Q3): handler resolves an ops_hint from a
  small map of known-persistent failure substrings (FROM_EMAIL not
  configured, SES sandbox, etc.) and returns 409 with the hint in
  structured detail fields. The History row also surfaces the same
  hint inline, replacing the Retry button entirely.
* Threshold soft-block (Q2): retry_attempt_n on the new row is
  predecessor.retry_attempt_n + 1; an attempt that would land at >= 5
  is rejected with 409 + retry_attempt_n / threshold details. The
  frontend confirms-with-warning then re-POSTs with ?force=true to
  override.
* Migration 000042 adds nullable retry_execution_id (self-FK ON
  DELETE SET NULL) and retry_attempt_n INT NOT NULL DEFAULT 0,
  plus a partial index on the FK for the rare populated case.
  store_postgres.go INSERT/UPSERT/SELECT cover the new columns;
  retry_execution_id is updateable via ON CONFLICT (the retry
  handler re-saves the predecessor to set the pointer), but
  created_by_user_id and retry_attempt_n stay INSERT-only so the
  scheduler's status-only UPSERTs don't rewrite provenance.
* clientError grew an optional details map so ops_hint /
  retry_attempt_n / threshold can reach the frontend as structured
  JSON fields instead of substring-matched message text.

Frontend
========
* History column doubles as the per-row action surface
  (renderActionCell): Cancel on cancellable pending rows (#46),
  Retry on retryable failed rows (this change), ops-hint badge on
  persistent failures, and inline lineage cross-links on retried
  rows. Status-driven mutually-exclusive decision tree.
* canRetryFailedRow gates the button: status=failed AND no ops_hint
  AND no retry_execution_id (already retried) AND admin OR
  matching created_by_user_id. UX gate only — backend remains
  authoritative.
* Threshold-reached rows render an "⚠ Retried 5× — click to
  override" button; the click confirms then POSTs with force=true.
* RetryPurchaseResult exported from frontend/src/api so callers can
  consume execution_id / retry_attempt_n.

Slice-aliasing fix
==================
The retry handler's new execution borrowed failedExec.Recommendations
by slice header. Mutations downstream
(internal/purchase/execution.go writes
exec.Recommendations[i].{Error,Purchased,PurchaseID}) would have
seen through the alias and corrupted the historical failed row's
in-memory representation visible to any caller still holding the
failedExec pointer. Fixed via defensive copy
(append([]RecommendationRecord(nil), failedExec.Recommendations...))
with a comment explaining the in-process aliasing footgun.

Tests
=====
* 14 new backend table tests covering admin/retry-any/retry-own
  paths, non-failed rejection, legacy-null-creator non-admin
  rejection, persistent-failure block + no-match, threshold
  block + force override, just-under-threshold allow,
  already-retried 409, missing-session rejection.
* 11 new frontend jest tests covering button visibility per role,
  ops-hint replacement, threshold-button text + force=true wiring,
  lineage link rendering, retry_attempt_n badge, declined dialog
  skips API, accepted posts + reloads + toasts, API failure surfaces
  toast and re-enables button.
* Auth permission-count tests bumped 7→8 (default user) and
  9→10 (user + 2 groups) to account for the new retry-own grant.
@coderabbitai
Copy link
Copy Markdown

coderabbitai Bot commented Apr 27, 2026

Warning

Rate limit exceeded

@cristim has exceeded the limit for the number of commits that can be reviewed per hour. Please wait 2 minutes and 13 seconds before requesting another review.

To keep reviews running without waiting, you can enable usage-based add-on for your organization. This allows additional reviews beyond the hourly cap. Account admins can enable it under billing.

⌛ How to resolve this issue?

After the wait time has elapsed, a review can be triggered using the @coderabbitai review command as a PR comment. Alternatively, push new commits to this PR.

We recommend that you space out your commits to avoid hitting the rate limit.

🚦 How do rate limits work?

CodeRabbit enforces hourly rate limits for each developer per organization.

Our paid plans have higher rate limits than the trial, open-source and free plans. In all cases, we re-allow further reviews after a brief timeout.

Please see our FAQ for further information.

ℹ️ Review info
⚙️ Run configuration

Configuration used: defaults

Review profile: CHILL

Plan: Pro

Run ID: 10197c71-bb4d-445c-b4d3-f191dd32d384

📥 Commits

Reviewing files that changed from the base of the PR and between e686840 and aa3d77d.

📒 Files selected for processing (7)
  • frontend/src/api/client.ts
  • frontend/src/api/types.ts
  • frontend/src/history.ts
  • internal/api/handler_purchases.go
  • internal/api/handler_purchases_test.go
  • internal/api/handler_router.go
  • internal/config/store_postgres_pgxmock_test.go
📝 Walkthrough

Walkthrough

Introduces a complete "retry failed purchase execution" feature across frontend and backend. Adds UI gating and button interaction in the history table, implements a new authenticated retry API endpoint with authorization, detects persistent operator-fixable errors, creates successor executions via transactional logic, extends database schema with retry linkage metadata, and adds authorization rules and comprehensive test coverage.

Changes

Cohort / File(s) Summary
Frontend Types & API
frontend/src/types.ts, frontend/src/api/purchases.ts, frontend/src/api/index.ts
Added retry_execution_id, retry_attempt_n, and ops_hint fields to HistoryPurchase; introduced retryPurchase function calling POST /purchases/retry/{executionId} endpoint and RetryPurchaseResult type to handle retry response payload.
Frontend UI & Interaction
frontend/src/history.ts
Implements retry button rendering with authorization gating (canRetryFailedRow), threshold-based override flow at RETRY_ATTEMPT_N=5, action cell routing for Retry/Cancel/ops-hint badge/lineage links, click handling with confirmation dialogs, API calls with force flag, error/success toasts, and list reload.
Frontend Tests
frontend/src/__tests__/history-retry-button.test.ts
Comprehensive Jest test suite validating retry button visibility, threshold override flow, lineage links, ops-hint badge suppression, permission checks, confirmation dialogs, API integration, error handling, and toast notifications.
Backend API & Error Handling
internal/api/handler_router.go, internal/api/handler.go
Extended clientError with structured details metadata map and Details() method; added NewClientErrorWithDetails constructor to include machine-readable fields in error responses without message parsing.
Backend Retry Handler
internal/api/handler_purchases.go, internal/api/handler_purchases_test.go
Implements retryPurchase with operator-fixable failure detection (resolveOpsHint), authorization via retry-own/retry-any RBAC, retry-attempt soft-block at threshold, successor execution creation with deep-copied recommendations and incremented RetryAttemptN, original row linkage via retry_execution_id, transactional writes, and comprehensive test coverage for permissions, states, thresholds, and error cases.
Backend History & Routing
internal/api/handler_history.go, internal/api/router.go
Updated history projection to copy RetryExecutionID and RetryAttemptN metadata; computes OpsHint at read time for failed rows via resolveOpsHint; added authenticated POST /api/purchases/retry/ endpoint with path parameter forwarding.
Authorization & Permissions
internal/auth/types.go, internal/auth/types_test.go, internal/auth/service_test.go, internal/auth/service_group_test.go
Added ActionRetryOwn and ActionRetryAny constants; updated DefaultUserPermissions() to grant all authenticated users retry-own:purchases; adjusted test expectations to account for new permission entry (+1 to counts across test suites).
Database Schema & Migration
internal/database/postgres/migrations/000042_purchase_executions_retry_linkage.up.sql, internal/database/postgres/migrations/000042_purchase_executions_retry_linkage.down.sql
Up migration adds retry_execution_id (nullable UUID FK with ON DELETE SET NULL) and retry_attempt_n (non-null int, default 0) columns to purchase_executions; creates partial index on retry_execution_id; down migration reverses additions.
Database Store & Types
internal/config/store_postgres.go, internal/config/store_postgres_pgxmock_test.go, internal/config/types.go
Extended PurchaseExecution with RetryExecutionID and RetryAttemptN persistence; extended PurchaseHistoryRecord with RetryExecutionID, RetryAttemptN, and OpsHint synthesis; updated SavePurchaseExecutionTx to INSERT/UPDATE retry columns and all query methods to SELECT/scan new fields; adjusted mock test expectations for column alignment.

Sequence Diagram(s)

sequenceDiagram
    actor User
    participant Frontend as Frontend UI
    participant API as API Handler
    participant DB as Database
    participant Email as Email Service

    User->>Frontend: Click Retry button on failed row
    Frontend->>Frontend: Show confirmation dialog
    User->>Frontend: Accept confirmation
    Frontend->>Frontend: Disable button, start loading
    
    alt Is force override required?
        Frontend->>API: POST /purchases/retry/{id}?force=true
    else Normal retry
        Frontend->>API: POST /purchases/retry/{id}
    end
    
    API->>DB: Lookup original execution
    API->>DB: Validate state (failed, not already retried)
    API->>DB: Resolve ops_hint from error
    
    alt ops_hint exists (persistent error)
        API-->>Frontend: HTTP 409 with ops_hint detail
        Frontend->>Frontend: Show error toast, enable button
    else force=false & threshold exceeded
        API-->>Frontend: HTTP 409 with retry_attempt_n detail
        Frontend->>Frontend: Show error toast, enable button
    else Authorization denied
        API-->>Frontend: HTTP 403
        Frontend->>Frontend: Show error toast
    else Success path
        API->>DB: Start transaction
        API->>DB: Create successor execution (pending, incremented RetryAttemptN)
        API->>DB: Update original with retry_execution_id → successor id
        API->>DB: Insert suppressions for successor
        API->>DB: Commit transaction
        API->>Email: Send approval email for successor
        API-->>Frontend: HTTP 200 with response (new execution id, metadata)
        Frontend->>Frontend: Show success toast
        Frontend->>Frontend: Call loadHistory() to refresh table
        Frontend->>API: GET /history
        API->>DB: Fetch updated executions with retry metadata
        API-->>Frontend: Return rows with retry linkage
        Frontend->>Frontend: Re-render history table with updated rows
    end
Loading

Estimated code review effort

🎯 4 (Complex) | ⏱️ ~75 minutes

Poem

🐰 A retry so clever, we've sewn it with care,
From frontend to database, retry's everywhere!
With threshold-check precision and ops-hint so bright,
Failed purchases bounce back—retry done right! ✨
Five times, then force, through the flow they all dance,
One hop closer to victory, purchase's second chance!

🚥 Pre-merge checks | ✅ 4 | ❌ 1

❌ Failed checks (1 warning)

Check name Status Explanation Resolution
Docstring Coverage ⚠️ Warning Docstring coverage is 28.57% which is insufficient. The required threshold is 80.00%. Write docstrings for the functions missing them to satisfy the coverage threshold.
✅ Passed checks (4 passed)
Check name Status Explanation
Description Check ✅ Passed Check skipped - CodeRabbit’s high-level summary is enabled.
Title check ✅ Passed The title accurately and specifically summarizes the main change: adding an inline Retry button with RBAC, linkage, threshold, and ops-hint features to the history feature.
Linked Issues check ✅ Passed Check skipped because no linked issues were found for this pull request.
Out of Scope Changes check ✅ Passed Check skipped because no linked issues were found for this pull request.

✏️ Tip: You can configure your own custom pre-merge checks in the settings.

✨ Finishing Touches
🧪 Generate unit tests (beta)
  • Create PR with unit tests
  • Commit unit tests in branch feat/history-retry-button

Comment @coderabbitai help to get the list of available commands and usage tips.

@cristim
Copy link
Copy Markdown
Member Author

cristim commented Apr 27, 2026

@coderabbitai review

@coderabbitai
Copy link
Copy Markdown

coderabbitai Bot commented Apr 27, 2026

✅ Actions performed

Review triggered.

Note: CodeRabbit is an incremental review system and does not re-review already reviewed commits. This command is applicable only when automatic reviews are paused.

@cristim
Copy link
Copy Markdown
Member Author

cristim commented Apr 27, 2026

@coderabbitai review

@coderabbitai
Copy link
Copy Markdown

coderabbitai Bot commented Apr 27, 2026

✅ Actions performed

Review triggered.

Note: CodeRabbit is an incremental review system and does not re-review already reviewed commits. This command is applicable only when automatic reviews are paused.

Copy link
Copy Markdown

@coderabbitai coderabbitai Bot left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Actionable comments posted: 3

🧹 Nitpick comments (2)
internal/config/store_postgres_pgxmock_test.go (1)

368-381: Consider asserting the new retry fields directly.

The mock rows now match the expanded projection, but explicit checks for RetryExecutionID and RetryAttemptN would make these tests catch scan-order regressions in the new columns too.

Also applies to: 399-413

🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed.

In `@internal/config/store_postgres_pgxmock_test.go` around lines 368 - 381, Add
explicit assertions in the test that verify the scanned values for the new retry
fields to catch scan-order regressions: after the row is added via
pgxmock.NewRows(cols).AddRow(...) and the result is scanned into the execution
struct (the code paths that currently assert other fields like Status,
StepNumber, etc.), add checks that execution.RetryExecutionID (or the field name
used in your struct) equals the expected nil/empty value and
execution.RetryAttemptN equals the expected numeric value (0 in this mock); do
this for the two locations mentioned around the cols/AddRow block and the
similar block at lines ~399-413 so the test explicitly validates both
RetryExecutionID and RetryAttemptN.
internal/api/handler_router.go (1)

47-51: Defensively copy details to avoid shared-map mutation leaks.

map[string]any is reference-typed; returning/storing it directly allows external mutation of error payload state. Cloning in constructor/accessor makes this safer and more predictable.

♻️ Suggested hardening
 func (e *clientError) Details() map[string]any { return e.details }
 
+func cloneDetails(in map[string]any) map[string]any {
+	if in == nil {
+		return nil
+	}
+	out := make(map[string]any, len(in))
+	for k, v := range in {
+		out[k] = v
+	}
+	return out
+}
+
-func NewClientErrorWithDetails(code int, message string, details map[string]any) error {
-	return &clientError{message: message, code: code, details: details}
+func NewClientErrorWithDetails(code int, message string, details map[string]any) error {
+	return &clientError{message: message, code: code, details: cloneDetails(details)}
 }
-func (e *clientError) Details() map[string]any { return e.details }
+func (e *clientError) Details() map[string]any { return cloneDetails(e.details) }

Also applies to: 63-65

🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed.

In `@internal/api/handler_router.go` around lines 47 - 51, Return a defensive copy
of the details map instead of the internal reference: in clientError.Details()
clone the e.details map into a new map and return that copy, and likewise make a
copy when assigning/storing details in the clientError constructor/initializer
(any function that constructs or sets clientError.details, e.g., NewClientError
or the clientError literal assignment) to avoid external code mutating the
internal map; perform a shallow copy of keys/values (map[string]any) since
values are interface{}.
🤖 Prompt for all review comments with AI agents
Verify each finding against the current code and only fix it if needed.

Inline comments:
In `@frontend/src/history.ts`:
- Around line 598-602: The catch branch currently only shows err.message which
loses structured retry info; update the catch handling for retryError in
frontend/src/history.ts to cast/parses retryError as any and extract backend
retry fields (ops_hint, retry_attempt_n, threshold) and include them in the
toast (e.g., append ops_hint and attempt/threshold details to the message or
pass them in toast metadata) while still falling back to err.message/'unknown
error'; keep re-enabling the button (btn.disabled = false) and use the same
showToast call to surface the structured retry guidance to the user.

In `@internal/api/handler_purchases.go`:
- Around line 651-663: When creating the retry successor PurchaseExecution (the
newExecution variable of type config.PurchaseExecution), preserve the original
plan-scoped metadata by copying PlanID and StepNumber from the failed execution
(failedExec) into the newExecution struct; update the initializer to include
PlanID: failedExec.PlanID and StepNumber: failedExec.StepNumber so the retried
execution remains associated with the same plan and ramp step.
- Around line 581-593: The guard that returns the retry_execution_id leaks
cross-user metadata because it runs before RBAC; move the call to
authorizeSessionRetry(ctx, session, failedExec) to execute before the
already-retried check (the block that inspects failedExec.RetryExecutionID and
returns NewClientErrorWithDetails with retry_execution_id) so authorization is
enforced prior to revealing RetryExecutionID/descendant info; update the logic
around failedExec.RetryExecutionID, failedExec.ExecutionID, and the
NewClientErrorWithDetails call accordingly to ensure no retry_execution_id is
returned unless authorizeSessionRetry has succeeded.

---

Nitpick comments:
In `@internal/api/handler_router.go`:
- Around line 47-51: Return a defensive copy of the details map instead of the
internal reference: in clientError.Details() clone the e.details map into a new
map and return that copy, and likewise make a copy when assigning/storing
details in the clientError constructor/initializer (any function that constructs
or sets clientError.details, e.g., NewClientError or the clientError literal
assignment) to avoid external code mutating the internal map; perform a shallow
copy of keys/values (map[string]any) since values are interface{}.

In `@internal/config/store_postgres_pgxmock_test.go`:
- Around line 368-381: Add explicit assertions in the test that verify the
scanned values for the new retry fields to catch scan-order regressions: after
the row is added via pgxmock.NewRows(cols).AddRow(...) and the result is scanned
into the execution struct (the code paths that currently assert other fields
like Status, StepNumber, etc.), add checks that execution.RetryExecutionID (or
the field name used in your struct) equals the expected nil/empty value and
execution.RetryAttemptN equals the expected numeric value (0 in this mock); do
this for the two locations mentioned around the cols/AddRow block and the
similar block at lines ~399-413 so the test explicitly validates both
RetryExecutionID and RetryAttemptN.
🪄 Autofix (Beta)

Fix all unresolved CodeRabbit comments on this PR:

  • Push a commit to this branch (recommended)
  • Create a new PR with the fixes

ℹ️ Review info
⚙️ Run configuration

Configuration used: defaults

Review profile: CHILL

Plan: Pro

Run ID: 0a6bfe37-5731-4364-9ce9-b79a8e5edf5e

📥 Commits

Reviewing files that changed from the base of the PR and between 5520317 and e686840.

📒 Files selected for processing (20)
  • frontend/src/__tests__/history-retry-button.test.ts
  • frontend/src/api/index.ts
  • frontend/src/api/purchases.ts
  • frontend/src/history.ts
  • frontend/src/types.ts
  • internal/api/handler.go
  • internal/api/handler_history.go
  • internal/api/handler_purchases.go
  • internal/api/handler_purchases_test.go
  • internal/api/handler_router.go
  • internal/api/router.go
  • internal/auth/service_group_test.go
  • internal/auth/service_test.go
  • internal/auth/types.go
  • internal/auth/types_test.go
  • internal/config/store_postgres.go
  • internal/config/store_postgres_pgxmock_test.go
  • internal/config/types.go
  • internal/database/postgres/migrations/000042_purchase_executions_retry_linkage.down.sql
  • internal/database/postgres/migrations/000042_purchase_executions_retry_linkage.up.sql

Comment thread frontend/src/history.ts
Comment thread internal/api/handler_purchases.go Outdated
Comment thread internal/api/handler_purchases.go
Three actionable findings from CR's first review (PR #168):

1. **Major / security**: authorize before the already-retried guard.
   The `failedExec.RetryExecutionID` 409 ran ahead of
   `authorizeSessionRetry`, so any signed-in user who could guess a
   failed-row UUID would learn it had been retried AND get the
   descendant execution ID — a cross-user info leak. Reordered the
   gate sequence so RBAC runs first; the descendant UUID is now
   surfaced only after the caller has proven they're allowed to act
   on the row. New regression test
   `TestHandler_retryPurchase_AlreadyRetried_RBACBeforeLeak` asserts
   the 403 path doesn't carry retry_execution_id in the details map.

2. **Major / correctness**: preserve PlanID + StepNumber on retry
   successors. Without this, retrying a failed planned execution
   dropped the new row out of plan-scoped history and lost its
   ramp-step attribution. Both fields now propagate from the
   predecessor; for ad-hoc executions PlanID="" and StepNumber=0 so
   propagation is a no-op. New regression test
   `TestHandler_retryPurchase_PreservesPlanMetadata` asserts the
   propagation.

3. **Minor / UX**: surface structured retry hints in the failure
   toast. The catch path used to discard `ops_hint` /
   `retry_attempt_n` / `threshold` from a 4xx body and toast only
   `err.message`, leaving the user with a generic failure instead of
   the operator-actionable hint or override guidance the backend
   already returns. To wire this up:
   - `ApiError` gained an optional `details: Record<string, unknown>`
     field (frontend/src/api/types.ts).
   - `apiRequest` now extracts ALL non-`error` keys from a 4xx body
     into `error.details` (frontend/src/api/client.ts) so callers can
     branch on machine-readable hints without parsing message text.
   - The retry catch in history.ts inspects `err.details.ops_hint`
     first, then falls back to a "already retried n× (threshold k)
     — confirm the override prompt to force" message synthesised
     from `retry_attempt_n` + `threshold`, then to `err.message`.

Plus two nits from the same review:

4. **Defensive copy of clientError details map**. The previous shape
   stored the constructor's input map by reference; a caller mutating
   the map after construction would reach into the error's payload.
   `cloneDetails` shallow-copies on both construction and `Details()`
   read so the error is sealed.

5. **Explicit retry-field assertions in pgxmock GetExecutionByID
   tests**. Two new assertions (one for the NULL/0 legacy-row case,
   one for a populated chain-position-2 case) guard against scan-
   order regressions in the new `retry_execution_id` /
   `retry_attempt_n` columns. A column reorder upstream would scan
   into the wrong destination and fail at least one of these.

All 1337 frontend tests + ~16 backend retry/auth/config tests pass.
gocyclo / gofmt / go vet pre-commit hooks pass.
@cristim
Copy link
Copy Markdown
Member Author

cristim commented Apr 28, 2026

@coderabbitai review

@coderabbitai
Copy link
Copy Markdown

coderabbitai Bot commented Apr 28, 2026

✅ Actions performed

Review triggered.

Note: CodeRabbit is an incremental review system and does not re-review already reviewed commits. This command is applicable only when automatic reviews are paused.

@cristim cristim added triaged Item has been triaged priority/p2 Backlog-worthy severity/medium Moderate harm urgency/this-sprint Within the current sprint impact/many Affects most users effort/l Weeks type/feat New capability labels Apr 28, 2026
@cristim
Copy link
Copy Markdown
Member Author

cristim commented Apr 28, 2026

P2: substantial feature (retry failed purchases with RBAC, linkage, threshold, ops-hint). CI green, mergeable, 1337 frontend tests pass. Good overall but CodeRabbit flagged docstring coverage (28.57% vs 80% threshold) — worth addressing before merge. Backend-side save guard for invalid combos noted as out of scope; worth tracking as a follow-up issue. (triage agent wave2-F)

@cristim cristim merged commit 8bd1ec8 into feat/multicloud-web-frontend Apr 29, 2026
3 checks passed
cristim added a commit that referenced this pull request Apr 30, 2026
…205) (#206)

* fix(db): idempotently restore retry_execution_id column dropped by migration drift (closes #204, closes #205)

Production /api/purchases/* surfaced 500 errors after PR #161 deployed,
with the underlying CloudWatch error:

    ERROR: column "retry_execution_id" does not exist (SQLSTATE 42703)

User-facing symptoms (dashboard "Upcoming Scheduled Purchases" panel):

  * "Failed to load purchase details: Internal server error"  (#205)
  * "Failed to cancel purchase"                                (#204)

Both buttons hit `Handler.GetExecutionByID` whose SELECT references the
retry_execution_id and retry_attempt_n columns introduced by PR #168.

Root cause: migration version drift.

  * PR #189 added 000042_recommendations_add_engine_to_key earlier on
    the same day. Deployed Lambdas applied it as version 42.
  * PR #168 then added 000042_purchase_executions_retry_linkage,
    colliding on version 42.
  * PR #195 resolved the on-disk duplicate by renaming RECOMMENDATIONS
    to 000043 — the wrong direction. Deployed DBs already had
    schema_migrations.version=42 marked as the engine content, so on
    the next migrate.Up() they:
      - skipped on-disk 000042 (now retry_linkage) entirely (DB
        version >= 42 already)
      - applied on-disk 000043 (now engine again) as a no-op via the
        ADD COLUMN IF NOT EXISTS guards already in that file
    Net effect: the retry_execution_id / retry_attempt_n columns
    were never added on prod, but the Go code expects them.

Forward-only corrective. The new 000044 replays 000042's schema
additions wrapped in IF NOT EXISTS / CREATE INDEX IF NOT EXISTS so it's
a no-op on any DB that already has the columns (fresh deploys,
manually-fixed envs) and additive on the broken ones. We don't try to
repair schema_migrations history — golang-migrate only cares about the
high-water version, and we can't tell from outside whether a given
environment skipped 42 or applied 43 twice.

The migration up + down sql files document the incident and the
edge cases inline so future readers don't have to re-derive the
history from the git log.

Closes #204
Closes #205

* fix(db): repair partially-fixed DBs in 000044, not just no-op (CR pass 1)

CR pass 1 on PR #206 flagged that the original 000044 only added columns
when missing — a manually-fixed DB with the column but no FK, or with
retry_attempt_n nullable, would still be broken after this migration.

Rewrite as five idempotent steps that each check pg_catalog /
information_schema for the precise piece they own, so every reachable
partial state is repaired:

  1. Add retry_execution_id column if missing.
  2. Add the FK on retry_execution_id -> execution_id if no FK on that
     column exists. Match by column relationship (pg_constraint join
     pg_attribute) NOT by constraint name, so a fresh-deploy DB whose
     000042 ran cleanly with the auto-generated
     `purchase_executions_retry_execution_id_fkey` is recognised as
     already-FK'd and we don't add a second.
  3. Add retry_attempt_n column if missing (as nullable, so step 4 can
     backfill before flipping to NOT NULL).
  4. Backfill any NULLs and SET DEFAULT 0 / SET NOT NULL
     unconditionally — Postgres treats those as no-ops when the
     constraint is already as requested.
  5. Create the partial index if missing.

Down stays simple: DROP COLUMN ... CASCADE removes the FK regardless of
its name.

Comments inline document each reachable partial state and why each step
is safe to re-run. Migration is a no-op on healthy DBs, fully repairing
on broken ones.
@cristim cristim deleted the feat/history-retry-button branch April 30, 2026 14:00
cristim added a commit that referenced this pull request May 5, 2026
… RBAC (#286) (#299)

* feat(api,history): in-dashboard purchase approval + approve-{any,own} RBAC (closes #286)

Adds an inline Approve button on pending Purchase History rows + the
session-authed backend dispatch that lets admins (and `approve-own`
holders for their own pending rows) approve without round-tripping
to the SES email. Mirrors the same dual-auth shape as the Cancel
button (PR #145) and Retry button (PR #168).

Backend
-------

* `internal/auth/types.go`: new constants `ActionApproveOwn` /
  `ActionApproveAny` next to the existing cancel-/retry- block.
  `DefaultUserPermissions()` adds `approve-own:purchases` so every
  authenticated user can approve pending executions they themselves
  created. `approve-any` exists for future operator roles; no
  default non-admin grant.

* `internal/api/handler_purchases.go::approvePurchase` refactored
  into the same three-mode dispatch as `cancelPurchase`:
    1. Session present + RBAC matches → session-authed approve via
       `approvePurchaseViaSession` regardless of token.
    2. token != "" → legacy email-link path, unchanged.
    3. token == "" → session-authed dashboard branch.
  Permission-denied (403) on the session check falls through to
  the token branch so a logged-in non-admin who is the per-account
  contact_email recipient can still approve via the email link.

* New `approvePurchaseViaSession` mirrors `cancelPurchaseViaSession`
  minus the suppression-cleanup tx (approve doesn't drop
  suppressions — those persist until cancel/expiry). Status flip
  + `ApprovedBy = session.Email` stamp + SavePurchaseExecution.

* New `authorizeSessionApprove` mirror of `authorizeSessionCancel`:
  admin → ok; else `approve-any` → ok; else `approve-own` AND
  `created_by_user_id == session.UserID` → ok; else 403.

No new migration: existing `purchase_executions.approved_by` (TEXT,
migration 000035) and `created_by_user_id` (UUID FK, migration
000041) are sufficient — the email field stamps the actor and the
UUID drives the own-ness check.

Frontend
--------

* `frontend/src/api/purchases.ts`: new `approvePurchase(executionId)`
  caller wrapping `POST /purchases/approve/{id}` with bearer-session
  auth (no token in URL — backend's session-first dispatch picks
  the auth path).
* `frontend/src/api/index.ts`: re-export the new function from the
  api barrel.
* `frontend/src/history.ts`:
    - new `canApprovePendingRow` predicate mirroring
      `canCancelPendingRow` (admin → yes; non-admin → only own
      pending rows; legacy null-creator rows out of reach via
      this UI).
    - `renderActionCell` now renders Approve + Cancel side-by-side
      for pending rows where the session qualifies for both. Each
      predicate is checked independently so a custom role with
      only one verb renders just that button.
    - new click handler on `.history-approve-btn` mirroring the
      cancel-button pattern: confirmDialog (non-destructive) →
      `api.approvePurchase` → reload history + success toast.
      Failed approve surfaces an error toast + re-enables the
      button.

RI exchange
-----------

`approveRIExchange` has the same shape gap (token-only auth) but
mutates a state-machine and triggers downstream cloud-side
exchange execution — bigger surface than fits in this PR. Filing
as a follow-up issue tracking the symmetric work.

Tests
-----

Backend
* `internal/auth/types_test.go` + `service_test.go` +
  `service_group_test.go`: bumped permission-count assertions
  (8→9, 10→11) and added `ActionApproveOwn:ResourcePurchases`
  membership check in `DefaultUserPermissions returns user access`.
* `internal/api/handler_purchases_test.go`: added `.Maybe()`
  HasPermissionAPI mocks (returning false for both verbs) on
  five existing approve-purchase tests so the new session-first
  dispatch falls through to the token branch they exercise.
* `internal/api/coverage_extras_test.go`:
  `TestHandler_approvePurchase_EmptyToken` now asserts the
  empty-token + zero-handler case returns SOME error (recovered
  from panic via `defer recover`) — the empty-token path is no
  longer a pre-flight 400, since after #286 it's the dispatch
  into the session-authed branch.
* `internal/api/handler_test.go::TestHandler_HandleRequest_ApprovePurchase`:
  same `.Maybe()` mocks for the end-to-end approve route test.

Frontend
* New `frontend/src/__tests__/history-approve-button.test.ts`:
  8 tests covering admin/regular/anonymous render branches,
  non-pending-row absence, side-by-side Approve+Cancel render,
  declined confirmDialog, accepted confirm + reload + toast,
  and API-failure error toast + button re-enable.

Verification:
  - `go test ./...` clean (4265 passed, 0 failed, 6 skipped).
  - `npm test` clean (1469 passed, 0 failed).
  - `npm run build` + `npx tsc --noEmit` clean.

Token-only email-link path is unchanged — backwards-compat
mandatory; existing pre-#286 tests on that path continue to pass.

* fix(history,api): disable both row-actions during click + tighten empty-token test (CR pass on PR #299)

Addresses both actionable items from CodeRabbit's first review on PR
#299:

1. **history.ts:603 — disable BOTH Approve and Cancel during a click**
   (Minor / Quick win): after #286 the two buttons render side-by-side
   on pending rows, but the click handlers were disabling only the
   clicked button. A quick double-click could fire conflicting requests
   on the same row before the reload completes.

   Extracted a small `sameRowActions(btn)` helper that returns every
   row-action button (`.history-approve-btn`, `.history-cancel-btn`)
   in the same `<td>` as the clicked button (falls back to `[btn]`
   when no parent cell exists — test fixtures may not wrap in a
   table). Both Approve and Cancel handlers now disable the full set
   while the API is in flight and re-enable the full set on failure.
   Successful requests trigger a full history reload that re-renders
   the row, so the row-action sibling state doesn't matter on the
   happy path.

2. **coverage_extras_test.go:52 — fail loudly on panic instead of
   swallowing** (Major / Quick win): the prior `recover()` swallow
   could let a panic in any new dispatch branch pass the test
   without ever asserting on `err`, masking real regressions on
   the empty-token path.

   The test now (a) `t.Fatalf`s on any panic from `approvePurchase`
   so the panic path is itself a failure signal, and (b) wires a
   minimal `MockConfigStore` returning a clean
   "execution not found" error from `GetExecutionByID` so the
   dispatch reaches a proper `NewClientError(404)` rather than
   nil-deref'ing on `h.config`. `require.Error(...)` then asserts
   the contract: empty token + no session must NOT silently succeed.

Tests (regression coverage for #1):

* `history-approve-button.test.ts` gains two new cases:
  - "approve click disables BOTH Approve and Cancel for the row
    while in flight (CR pass)" — uses an unresolved promise from
    the api mock to capture the in-flight state and asserts both
    buttons are disabled until the promise resolves.
  - "approve API failure re-enables BOTH Approve and Cancel for
    the row (CR pass)" — asserts both buttons are re-enabled
    after the API rejection path runs.

Verification:
  - `go test ./...` clean (4265 passed / 0 failed / 6 skipped).
  - `npm test` clean (1471 passed / 0 failed).
  - `npx tsc --noEmit` clean.
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Labels

effort/l Weeks impact/many Affects most users priority/p2 Backlog-worthy severity/medium Moderate harm triaged Item has been triaged type/feat New capability urgency/this-sprint Within the current sprint

Projects

None yet

Development

Successfully merging this pull request may close these issues.

1 participant