fix(security): approval-flow hardening — crypto/rand tokens + per-account contact gating#101

Merged
cristim merged 8 commits into feat/multicloud-web-frontend from fix/approval-token-entropy
Apr 29, 2026

Conversation

@cristim
Member

@cristim cristim commented Apr 27, 2026

Summary

Closes 3 HIGH findings from the security review on the approval flow:

  • H1 — Approval tokens generated with uuid.New().String() (122 bits entropy in a known format) → swapped all 4 generation sites for a new pkg/common.GenerateApprovalToken() returning 32 random bytes from crypto/rand, hex-encoded (256 bits unpredictability, uniform output space).
  • H3 — authorizeApprovalAction fell back to the global notification email as a catch-all approver when no recommendation had a per-account contact_email. Anyone holding the global mailbox could approve spend on any account. Closed: resolveApprovalRecipients no longer adds globalNotify to the approver set; authorizeApprovalAction returns 403 with a clear "set the cloud account's contact_email before approving" message when no contact is configured.
  • H2 — RI exchange single-use enforcement was already in place at the SQL layer (TransitionRIExchangeStatus(pending → processing) is atomic; concurrent second approval gets 409). Audited and documented; no code change needed.
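The H1 change can be pictured with a minimal sketch of such a helper. It assumes only what the summary states (32 bytes from crypto/rand, hex-encoded, errors returned rather than panicking); the constant name and error wording are illustrative and may not match pkg/common/tokens.go.

```go
package main

import (
	"crypto/rand"
	"encoding/hex"
	"fmt"
)

// approvalTokenBytes is an assumed constant name: 32 bytes = 256 bits.
const approvalTokenBytes = 32

// GenerateApprovalToken returns a hex-encoded random token.
// Sketch only; the real helper may differ in details.
func GenerateApprovalToken() (string, error) {
	buf := make([]byte, approvalTokenBytes)
	// crypto/rand reads from the operating system's CSPRNG.
	if _, err := rand.Read(buf); err != nil {
		return "", fmt.Errorf("generate approval token: %w", err)
	}
	// Hex encoding doubles the length: 32 bytes become 64 characters.
	return hex.EncodeToString(buf), nil
}

func main() {
	tok, err := GenerateApprovalToken()
	if err != nil {
		fmt.Println("error:", err)
		return
	}
	fmt.Println(len(tok)) // prints 64
}
```

Unlike a UUID v4 string, every character of this output is random, so no bits are spent on fixed version or variant markers.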

Why pkg/common

The four token-generation sites span both Go modules — pkg/exchange/auto.go lives in the pkg/ submodule and cannot import internal/. pkg/common is the shared base used by every provider, so the helper lands there.

Behavior change to flag for operators

Any execution whose recommendations all reference cloud accounts WITHOUT a contact_email will fail approval until contact_email is set on the account. Previously the global notification email could approve on those accounts' behalf — that's the exact loophole this PR closes.
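A rough sketch of the tightened policy follows. The types, the `resolveApprovers` and `authorize` names, the sentinel error, and the case-insensitive email comparison are all assumptions made for illustration; the real logic lives in resolveApprovalRecipients and authorizeApprovalAction and may be structured differently.

```go
package main

import (
	"errors"
	"fmt"
	"strings"
)

// CloudAccount and Recommendation are simplified, hypothetical stand-ins.
type CloudAccount struct {
	ID           string
	ContactEmail string
}

type Recommendation struct {
	CloudAccountID string
}

// ErrNoApprover is a hypothetical sentinel a handler could map to HTTP 403.
var ErrNoApprover = errors.New("set the cloud account's contact_email before approving")

// resolveApprovers collects per-account contact emails only.
// There is deliberately no parameter for a global notification address.
func resolveApprovers(recs []Recommendation, accounts map[string]CloudAccount) []string {
	seen := make(map[string]bool)
	var approvers []string
	for _, rec := range recs {
		acct, ok := accounts[rec.CloudAccountID]
		if !ok {
			continue
		}
		email := strings.ToLower(strings.TrimSpace(acct.ContactEmail))
		if email == "" || seen[email] {
			continue
		}
		seen[email] = true
		approvers = append(approvers, email)
	}
	return approvers
}

// authorize rejects when no contact is configured or the actor is not listed.
func authorize(sessionEmail string, approvers []string) error {
	if len(approvers) == 0 {
		return ErrNoApprover
	}
	actor := strings.ToLower(strings.TrimSpace(sessionEmail))
	for _, a := range approvers {
		if a == actor {
			return nil
		}
	}
	return errors.New("session email is not an approver for this execution")
}

func main() {
	accounts := map[string]CloudAccount{"acct-1": {ID: "acct-1"}} // no contact_email set
	approvers := resolveApprovers([]Recommendation{{CloudAccountID: "acct-1"}}, accounts)
	fmt.Println(authorize("global-notify@example.com", approvers))
}
```

The point of the sketch is the empty-set branch: with no per-account contact there is no fallback, so the request is refused instead of being routed to a shared mailbox.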

Files

  • pkg/common/tokens.go (NEW) — GenerateApprovalToken() + tests.
  • pkg/exchange/auto.go — RI exchange uses the helper; removes uuid import.
  • internal/api/handler_purchases.go — new newPendingExecution + persistExecutionAndSuppressions helpers (gocyclo extraction); tightened resolveApprovalRecipients so global notify is never an approver; authorizeApprovalAction rejects with a clear error when no per-account contact_email exists.
  • internal/api/handler_plans.go — plan execution token uses the helper.
  • internal/purchase/notifications.go — scheduled notification token uses the helper.
  • Test updates: existing approval-flow tests now wire a per-account ContactEmail through MockConfigStore.GetCloudAccountFn (the prior tests relied on the global-fallback this PR removes); 4 new tests cover token uniqueness, the no-contact-email rejection, and the security-hardened recipient policy.

Test plan

  • go test -short -race -count=1 ./internal/api/... ./internal/purchase/... green.
  • (cd pkg && go test -short -race -count=1 ./common/... ./exchange/...) green.
  • gocyclo -over 10 clean.
  • All pre-commit hooks pass.
  • Operator action before merge: query purchase_history/purchase_executions for accounts that have approved purchases via the global mailbox (no per-account contact_email). Notify those owners that they need to set contact_email on the account before the next approval cycle, or future approvals will 403.

Follow-up PRs

PR3 (input cap), PR4 (IAM wildcards), PR5 (supply chain) — independent of this PR.

🤖 Generated with claude-flow

Summary by CodeRabbit

  • Security Improvements

    • Approval tokens now use cryptographically-generated tokens for stronger protection.
  • Validation Changes

    • Approvals/cancellations require per-account contact emails; global notification address is not an approver.
    • Asynchronous approval messages must include and validate an actor email against per-account approvers.
  • Transactional Behavior

    • Creation of planned executions and plan update now occur atomically so partial writes roll back on failure.
  • Error Handling

    • Token-generation and contact-email lookup failures surface and prevent creation/approval actions.
  • Tests

    • Added coverage for token generation, transactional rollback, recipient-resolution, and async gating.

…t contact gating

Addresses 3 HIGH findings from the security review:

H1 — Approval tokens were generated with `uuid.New().String()`. UUID v4 is
122 bits of entropy in a known format with version + variant nibbles fixed,
which is fine for identity but a poor primary credential. Switched all four
token-generation sites to a new `pkg/common.GenerateApprovalToken()` that
returns 32 random bytes from `crypto/rand`, hex-encoded — full 256 bits of
unpredictability and a uniform output space.

  Sites updated:
   - internal/api/handler_purchases.go (web purchase flow)
   - internal/api/handler_plans.go (plan execution creation)
   - internal/purchase/notifications.go (scheduled purchase notifications)
   - pkg/exchange/auto.go (RI exchange automation, in pkg/ submodule)

  The helper lives in pkg/common because pkg/exchange (a separate Go module)
  cannot import internal/. All four sites now propagate the rand.Read error
  rather than panicking; auto.go returns an ExchangeOutcome with its Error
  field set so the surrounding loop continues processing the remaining
  recommendations.

H3 — `authorizeApprovalAction` previously fell back to the global Settings
notification email as a catch-all approver when no recommendation had a
per-account contact_email. Anyone holding the global notify mailbox could
approve spend on any account with the policy. Closed the loophole:
`resolveApprovalRecipients` no longer adds globalNotify to the approver
set — only per-account contact_email values count. authorizeApprovalAction
returns 403 with a clear message ("set the cloud account's contact_email
before approving") when no contact_email is configured.

  Behaviour change to call out for operators: any execution whose
  recommendations all reference accounts WITHOUT a contact_email will fail
  approval until contact_email is set on the account. Existing tenants
  who relied on the global mailbox are affected.

H2 — RI exchange single-use enforcement. Audit confirmed it's already in
place at the SQL layer: `TransitionRIExchangeStatus(pending → processing)`
is atomic, and a concurrent second approval attempt returns 409
("already processed"). The token comparison uses `subtle.ConstantTimeCompare`.
No code change needed; documenting the audit here so future readers don't
think H2 was missed.
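The two properties audited for H2 can be illustrated with an in-memory sketch. The `store`, `exchange`, and `transition` names are hypothetical; the mutex-guarded check stands in for a conditional SQL update of the form `UPDATE ... WHERE status = 'pending'`, which is an assumption about how TransitionRIExchangeStatus achieves atomicity.

```go
package main

import (
	"crypto/subtle"
	"errors"
	"fmt"
	"sync"
)

// ErrAlreadyProcessed is a hypothetical sentinel a handler could map to HTTP 409.
var ErrAlreadyProcessed = errors.New("already processed")

type exchange struct {
	token  string
	status string
}

type store struct {
	mu   sync.Mutex
	rows map[string]*exchange
}

// tokensMatch compares tokens without an early exit on the first differing byte.
// ConstantTimeCompare returns 0 immediately when lengths differ, which is
// acceptable here because valid tokens have a fixed length.
func tokensMatch(got, want string) bool {
	return subtle.ConstantTimeCompare([]byte(got), []byte(want)) == 1
}

// transition moves a row from one status to another only if it is still in
// the expected status, so exactly one concurrent caller can succeed.
func (s *store) transition(id, from, to string) error {
	s.mu.Lock()
	defer s.mu.Unlock()
	row, ok := s.rows[id]
	if !ok || row.status != from {
		return ErrAlreadyProcessed
	}
	row.status = to
	return nil
}

func main() {
	s := &store{rows: map[string]*exchange{"x1": {token: "abc", status: "pending"}}}
	fmt.Println(tokensMatch("abc", s.rows["x1"].token))
	fmt.Println(s.transition("x1", "pending", "processing"))
	fmt.Println(s.transition("x1", "pending", "processing"))
}
```

The second `transition` call fails because the row is no longer pending, which is the single-use behaviour described above.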

Tests: 4 new tests covering token uniqueness/length, the no-contact-email
rejection path, and updates to existing approval-flow tests so they wire
a per-account ContactEmail through MockConfigStore.GetCloudAccountFn (the
prior tests relied on the global-fallback that this PR removes).
@coderabbitai

coderabbitai Bot commented Apr 27, 2026

Note

Reviews paused

It looks like this branch is under active development. To avoid overwhelming you with review comments due to an influx of new commits, CodeRabbit has automatically paused this review. You can configure this behavior by changing the reviews.auto_review.auto_pause_after_reviewed_commits setting.

Use the following commands to manage reviews:

  • @coderabbitai resume to resume automatic reviews.
  • @coderabbitai review to trigger a single review.


No actionable comments were generated in the recent review. 🎉

ℹ️ Recent review info
⚙️ Run configuration

Configuration used: defaults

Review profile: CHILL

Plan: Pro

Run ID: a02cc89e-eeb2-4ee0-8538-da44237bcfc1

📥 Commits

Reviewing files that changed from the base of the PR and between b0d4d7f and 2017748.

📒 Files selected for processing (2)
  • internal/api/handler_plans_test.go
  • internal/scheduler/scheduler_test.go
🚧 Files skipped from review as they are similar to previous changes (2)
  • internal/scheduler/scheduler_test.go
  • internal/api/handler_plans_test.go

📝 Walkthrough

Walkthrough

Approval tokens are now crypto-random via common.GenerateApprovalToken(). Execution creation and plan next-execution updates can run inside shared DB transactions (*Tx variants). SQS approve/cancel messages must include actor_email validated against per-account contact_email. Tests and mocks updated for tx-aware APIs and recipient-resolution behavior.
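The actor check on the async path might look roughly like the sketch below. The JSON keys `execution_id`, `token`, and `actor_email` come from the review text; the function name, error messages, and case-insensitive matching are assumptions, and the token check that the real handlers also perform is left out.

```go
package main

import (
	"encoding/json"
	"errors"
	"fmt"
	"strings"
)

// AsyncMessage mirrors the fields mentioned in the review; the exact struct
// layout in internal/purchase/messages.go is not shown in this thread.
type AsyncMessage struct {
	ExecutionID string `json:"execution_id"`
	Token       string `json:"token"`
	ActorEmail  string `json:"actor_email"`
}

// parseAndCheckActor decodes a message and verifies the actor against the
// per-account approver list before any approve or cancel call is made.
func parseAndCheckActor(body []byte, approvers []string) (AsyncMessage, error) {
	var msg AsyncMessage
	if err := json.Unmarshal(body, &msg); err != nil {
		return AsyncMessage{}, fmt.Errorf("decode message: %w", err)
	}
	actor := strings.ToLower(strings.TrimSpace(msg.ActorEmail))
	if actor == "" {
		// Legacy payloads without actor_email are rejected outright.
		return AsyncMessage{}, errors.New("actor_email is required")
	}
	for _, a := range approvers {
		if strings.ToLower(a) == actor {
			return msg, nil
		}
	}
	return AsyncMessage{}, errors.New("actor_email is not an approver")
}

func main() {
	legacy := []byte(`{"execution_id":"e1","token":"t"}`)
	_, err := parseAndCheckActor(legacy, []string{"owner@example.com"})
	fmt.Println(err)
}
```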

Changes

| Cohort | File(s) | Summary |
|---|---|---|
| Token core & tests | pkg/common/tokens.go, pkg/common/tokens_test.go | Add common.GenerateApprovalToken() producing 32 random bytes (64-char hex) and unit tests validating length, uniqueness, decoding, and non-zero output. |
| Execution creation & plan tx | internal/api/handler_plans.go, internal/api/handler_plans_test.go, internal/config/store_postgres.go, internal/config/store_postgres_pgxmock_test.go, internal/config/interfaces.go, internal/mocks/stores.go | Move execution insert + plan next-execution bump into WithTx transactional flow; add UpdatePurchasePlanTx interface and Postgres TX implementation; tests assert rollback on mid-loop failures and transaction framing. |
| Purchase flows & approvals | internal/api/handler_purchases.go, internal/api/handler_purchases_test.go, internal/api/handler_test.go, internal/purchase/notifications.go, internal/purchase/approvals.go, internal/purchase/messages.go, internal/purchase/coverage_extra_test.go | Switch to GenerateApprovalToken() with error propagation; extract persistence helpers; require per-account contact_email for approver set (global notify excluded from approvers); SQS handlers require and validate actor_email; tests updated for recommendation→cloud-account contact lookup and negative cases. |
| Async message & actor enforcement | internal/purchase/messages.go, internal/purchase/approvals.go, internal/purchase/coverage_extra_test.go | Add ActorEmail to AsyncMessage; verify actor + token before invoking Approve/Cancel; reject legacy payloads missing actor_email; tests added for missing/invalid actor and token short-circuiting. |
| Mocks & test helpers (tx-aware) | internal/api/mocks_test.go, internal/purchase/mocks_test.go, internal/analytics/collector_test.go, internal/scheduler/scheduler_test.go, internal/server/test_helpers_test.go, internal/mocks/stores.go | Introduce UpdatePurchasePlanTx to multiple test mocks with fallback-to-non-tx behavior; adjust test stubs (e.g., GetCloudAccount) to be expectation-aware so tx-based flows compile and test correctly. |
| Exchange & failure recording | pkg/exchange/auto.go | Manual exchanges use GenerateApprovalToken(); token-gen failures log and persist failed ExchangeRecord with explicit mode; saveFailedRecord extended to accept mode and call sites updated. |
| Docstrings & minor edits | internal/purchase/approvals.go, internal/purchase/messages.go | Clarified comments about approval gating and actor attribution; updated Approve/Cancel call sites to pass validated actor_email. |
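The rollback behaviour described for the transactional flow can be sketched without a database. The `memStore` and `tx` types below are hypothetical in-memory stand-ins, not the Postgres implementation; they only show the shape of a WithTx-style callback in which a mid-loop failure discards every staged write.

```go
package main

import (
	"errors"
	"fmt"
)

var errSave = errors.New("simulated save failure")

// memStore is a hypothetical in-memory stand-in for the config store.
type memStore struct {
	executions []string
	nextStep   int
}

// tx holds staged state that becomes visible only on commit.
type tx struct {
	executions []string
	nextStep   int
}

// WithTx runs fn against a staged copy and commits only if fn returns nil.
func (s *memStore) WithTx(fn func(t *tx) error) error {
	t := &tx{
		executions: append([]string(nil), s.executions...),
		nextStep:   s.nextStep,
	}
	if err := fn(t); err != nil {
		return err // rollback: the staged copy is simply dropped
	}
	s.executions, s.nextStep = t.executions, t.nextStep
	return nil
}

// createExecutions saves every execution and bumps the plan step in one unit.
// failOn names an ID whose save should fail, to simulate a mid-loop error.
func createExecutions(s *memStore, ids []string, failOn string) error {
	return s.WithTx(func(t *tx) error {
		for _, id := range ids {
			if id == failOn {
				return fmt.Errorf("save execution %s: %w", id, errSave)
			}
			t.executions = append(t.executions, id)
		}
		t.nextStep++
		return nil
	})
}

func main() {
	s := &memStore{}
	_ = createExecutions(s, []string{"e1", "e2"}, "")
	err := createExecutions(s, []string{"e3", "e4"}, "e4")
	fmt.Println(len(s.executions), s.nextStep, err != nil) // prints 2 1 true
}
```

In the second call "e3" is staged before "e4" fails, yet neither the execution nor the step bump reaches the store.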
sequenceDiagram
    participant Client
    participant Handler as API/SQS Handler
    participant TokenGen as common.GenerateApprovalToken
    participant DB as Postgres (WithTx)
    participant Resolver as Recipient Resolver
    participant CloudAPI as ConfigStore.GetCloudAccount

    Client->>Handler: submit approve/cancel (HTTP session or SQS msg with actor_email)
    Handler->>DB: Get PurchaseExecution (includes Recommendations)
    DB-->>Handler: execution with CloudAccountIDs

    Handler->>TokenGen: GenerateApprovalToken() (when creating execution)
    TokenGen-->>Handler: token / error

    Handler->>Resolver: resolveApprovalRecipients(recommendations)
    Resolver->>CloudAPI: GetCloudAccount(cloudAccountID)
    CloudAPI-->>Resolver: CloudAccount{ContactEmail}

    alt ContactEmail found
        Resolver-->>Handler: approvers includes ContactEmail
        Handler->>DB: WithTx { SavePurchaseExecutionTx / UpdatePurchasePlanTx / ApproveExecution/CancelExecution }
        DB-->>Handler: commit
        Handler-->>Client: success
    else ContactEmail missing or lookup error
        Resolver-->>Handler: approvers empty or error
        Handler-->>Client: reject (403 or propagate error)
    end

Estimated code review effort

🎯 4 (Complex) | ⏱️ ~45 minutes

Possibly related PRs

Suggested labels

urgency/this-sprint, type/bug

Poem

🐰 I nibble bytes of true-random light,
Tokens spun from cryptographic night,
Each account must wear its own contact crest,
No global gate to shortcut the test,
Secure hops, rolled back if one step isn't right.

🚥 Pre-merge checks | ✅ 4 | ❌ 1

❌ Failed checks (1 warning)

| Check name | Status | Explanation | Resolution |
|---|---|---|---|
| Docstring Coverage | ⚠️ Warning | Docstring coverage is 57.89% which is insufficient. The required threshold is 80.00%. | Write docstrings for the functions missing them to satisfy the coverage threshold. |

✅ Passed checks (4 passed)

| Check name | Status | Explanation |
|---|---|---|
| Description Check | ✅ Passed | Check skipped - CodeRabbit’s high-level summary is enabled. |
| Title check | ✅ Passed | The title accurately summarizes the main security-focused changes: replacing UUID tokens with cryptographically secure random tokens and enforcing per-account contact email for approvals. |
| Linked Issues check | ✅ Passed | Check skipped because no linked issues were found for this pull request. |
| Out of Scope Changes check | ✅ Passed | Check skipped because no linked issues were found for this pull request. |

✏️ Tip: You can configure your own custom pre-merge checks in the settings.


Comment @coderabbitai help to get the list of available commands and usage tips.

@cristim
Member Author

cristim commented Apr 27, 2026

@coderabbitai review

@coderabbitai

coderabbitai Bot commented Apr 27, 2026

✅ Actions performed

Review triggered.

Note: CodeRabbit is an incremental review system and does not re-review already reviewed commits. This command is applicable only when automatic reviews are paused.


@coderabbitai coderabbitai Bot left a comment


Caution

Some comments are outside the diff and can’t be posted inline due to platform limitations.

⚠️ Outside diff range comments (1)
internal/api/handler_purchases.go (1)

820-850: ⚠️ Potential issue | 🟠 Major

The SQS approval path remains unprotected by email verification and needs follow-up action.

The authorization gap is confirmed. authorizeApprovalAction correctly enforces session ↔ contact_email matching for the HTTP path, but the asynchronous SQS path (handleApproveMessage / handleCancelMessage) bypasses this entirely. These handlers pass an empty actor directly to ApproveExecution / CancelExecution without ever checking the approver email list. Both methods validate only the token and never verify that the caller matches the approved contacts—a token holder can approve any execution.

The code itself labels this the "legacy token-only path" and includes explicit comments acknowledging the attribution fallback behavior, but no deprecation notice or removal timeline exists. The SQS handlers remain wired into the active message dispatcher.

Either deprecate the SQS flow explicitly or add a comment in messages.go documenting this authorization gap as a known limitation pending future closure.

🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed.

In `@internal/api/handler_purchases.go` around lines 820 - 850, The SQS approval
path lacks email verification: locate the asynchronous handlers
handleApproveMessage and handleCancelMessage and either (A) deprecate the SQS
flow by adding a clear deprecation comment and TODO with removal timeline and
link to the secure HTTP path (authorizeApprovalAction / ApproveExecution /
CancelExecution), or (B) add an explicit comment in messages.go documenting the
known authorization gap (that SQS messages use token-only attribution and do not
verify session ↔ contact_email), explain the risk, and add a TODO referencing a
follow-up to enforce approver email checks in ApproveExecution/CancelExecution;
include the function names (handleApproveMessage, handleCancelMessage,
ApproveExecution, CancelExecution, authorizeApprovalAction) in the comment so
reviewers can find the exact code paths.
🧹 Nitpick comments (2)
pkg/exchange/auto.go (1)

277-291: LGTM, with a minor consistency observation.

Token-gen failure correctly aborts the manual exchange and surfaces as result.Failed via the Error != "" branch in processRecommendation. Note that unlike processAutoExchange (which calls saveFailedRecord on its failure paths), this branch logs but does not persist a failed-status record — so an operator auditing the DB won't see this failure, only the log line. Given how rare crypto/rand.Read failures are in practice, this is fine; flagging only so the asymmetry is intentional.

🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed.

In `@pkg/exchange/auto.go` around lines 277 - 291, In processManualExchange, after
GenerateApprovalToken fails, persist a failed-status record the same way
processAutoExchange does by calling saveFailedRecord with the relevant ctx,
params (RunAutoExchangeParams), rec (ReshapeRecommendation), offeringID, and the
error string instead of only logging; ensure the resulting ExchangeOutcome still
returns the Error message and other fields as currently done so the DB audit
shows the failure consistently with other failure paths.
internal/api/coverage_extras_test.go (1)

41-52: LGTM — test setup correctly aligned with per-account contact-email gating.

The added accountID, Recommendations with CloudAccountID, and GetCloudAccountFn returning a CloudAccount whose ContactEmail matches the session approver match the new authorizeApprovalAction requirement (403 unless per-account contact_email resolves to the approver). The error path under test (ApproveExecution returning "approval failed") is now reached only after authorization succeeds, which is what this test intends to exercise.

One minor cleanup opportunity: with H3 removing globalNotify from resolveApprovalRecipients, the GetGlobalConfig stub returning NotificationEmail: &approver (lines 54–56) may no longer influence authorization here. If it's still referenced by another code path on this branch it's fine to leave; otherwise consider trimming to keep the test minimal.

🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed.

In `@internal/api/coverage_extras_test.go` around lines 41 - 52, The test
currently stubs GetGlobalConfig to return NotificationEmail (lines referenced)
but since authorizeApprovalAction now uses per-account contact_email via
GetCloudAccount (mocked by mockConfig.GetCloudAccountFn) and
resolveApprovalRecipients no longer uses globalNotify, remove or simplify the
GetGlobalConfig stub to keep the test minimal: delete the
mockConfig.GetGlobalConfig assignment or return a zero-value/global config
without NotificationEmail; ensure the rest of the test still sets accountID,
Recommendations (with CloudAccountID) and mockConfig.GetCloudAccountFn so
authorizeApprovalAction and ApproveExecution exercise the intended authorization
and error path.

ℹ️ Review info
⚙️ Run configuration

Configuration used: defaults

Review profile: CHILL

Plan: Pro

Run ID: ec5c4164-ae28-4468-8913-7d247d903bc3

📥 Commits

Reviewing files that changed from the base of the PR and between e9ab8ba and 040de8c.

📒 Files selected for processing (9)
  • internal/api/coverage_extras_test.go
  • internal/api/handler_plans.go
  • internal/api/handler_purchases.go
  • internal/api/handler_purchases_test.go
  • internal/api/handler_test.go
  • internal/purchase/notifications.go
  • pkg/common/tokens.go
  • pkg/common/tokens_test.go
  • pkg/exchange/auto.go

- internal/purchase/messages.go: document the SQS approve/cancel
  authorization gap explicitly. The async path bypasses the per-account
  contact_email gating enforced by authorizeApprovalAction in the HTTP
  handler — record this as a known limitation with a TODO referencing
  the two viable closures (deprecate SQS, or carry session email +
  enforce in ApproveExecution/CancelExecution). No behaviour change.
- pkg/exchange/auto.go: persist a failed-status record when
  GenerateApprovalToken fails in processManualExchange so the DB audit
  trail mirrors the auto-mode failure paths. Parameterise
  saveFailedRecord with a `mode` argument ("auto" or "manual") rather
  than hardcoding "auto", keeping each path's mode accurate. Existing
  auto-mode callsites pass "auto" verbatim.
- internal/api/coverage_extras_test.go: simplify the GetGlobalConfig
  stub in the approve/cancel error-path tests to return an empty
  GlobalConfig — with per-account contact_email gating, the global
  notify mailbox no longer participates in authorization, so the
  NotificationEmail value was dead weight in the test fixture.
@cristim
Member Author

cristim commented Apr 27, 2026

@coderabbitai review

@coderabbitai

coderabbitai Bot commented Apr 27, 2026

✅ Actions performed

Review triggered.

Note: CodeRabbit is an incremental review system and does not re-review already reviewed commits. This command is applicable only when automatic reviews are paused.

@cristim
Member Author

cristim commented Apr 27, 2026

@coderabbitai full review

@coderabbitai

coderabbitai Bot commented Apr 27, 2026

✅ Actions performed

Full review triggered.


@coderabbitai coderabbitai Bot left a comment


🧹 Nitpick comments (2)
pkg/exchange/auto.go (1)

442-460: Consider constraining mode to constants to prevent silent typos.

saveFailedRecord now accepts an arbitrary string; one typo could silently degrade downstream filtering/UI behavior.

🧩 Optional hardening of mode values
+type ExchangeMode string
+
+const (
+	ExchangeModeAuto   ExchangeMode = "auto"
+	ExchangeModeManual ExchangeMode = "manual"
+)
...
-func saveFailedRecord(ctx context.Context, params RunAutoExchangeParams, rec ReshapeRecommendation, offeringID, paymentDueStr, errMsg, mode string) {
+func saveFailedRecord(ctx context.Context, params RunAutoExchangeParams, rec ReshapeRecommendation, offeringID, paymentDueStr, errMsg string, mode ExchangeMode) {
...
-		Mode:               mode,
+		Mode:               string(mode),
🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed.

In `@pkg/exchange/auto.go` around lines 442 - 460, The saveFailedRecord function
accepts an arbitrary mode string which risks silent typos; change it to use a
constrained Mode type and constants (e.g., type Mode string with const ModeAuto
= "auto", ModeManual = "manual") and update saveFailedRecord's signature to
accept Mode (or validate the incoming string against those constants at the top
of saveFailedRecord) and set ExchangeRecord.Mode using the constant/validated
value; also update callers of saveFailedRecord (call sites passing
"auto"/"manual") to use the new Mode constants or ensure they pass validated
values so downstream filtering/UI sees only the expected values.
internal/api/handler_plans.go (1)

259-274: Preserve partial progress on loop errors instead of returning 0.

At Line 261 (and similarly Line 273), returning 0 can mask already persisted executions from earlier iterations. Returning created makes retries and incident handling safer.

♻️ Proposed adjustment
-		if err != nil {
-			return 0, fmt.Errorf("failed to generate approval token: %w", err)
-		}
+		if err != nil {
+			return created, fmt.Errorf("failed to generate approval token after creating %d executions: %w", created, err)
+		}
...
-		if err := h.config.SavePurchaseExecution(ctx, execution); err != nil {
-			return 0, fmt.Errorf("failed to save execution: %w", err)
-		}
+		if err := h.config.SavePurchaseExecution(ctx, execution); err != nil {
+			return created, fmt.Errorf("failed to save execution after creating %d executions: %w", created, err)
+		}
🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed.

In `@internal/api/handler_plans.go` around lines 259 - 274, The error paths inside
the loop currently return 0 (e.g., after GenerateApprovalToken failure and after
SavePurchaseExecution failure) which hides any executions already persisted;
update those return statements to return the current "created" count instead of
0 so partial progress is preserved. Specifically, in the block where
GenerateApprovalToken is called and where h.config.SavePurchaseExecution(ctx,
execution) is called, replace the "return 0, fmt.Errorf(...)" with "return
created, fmt.Errorf(...)" (use the existing created variable in the surrounding
function) so callers and retry logic can detect how many executions were
successfully created; ensure this change is made in the same function that
constructs config.PurchaseExecution and references plan.RampSchedule.CurrentStep
and execution.ExecutionID.

ℹ️ Review info
⚙️ Run configuration

Configuration used: defaults

Review profile: CHILL

Plan: Pro

Run ID: 5578279b-0e3f-4a4d-830e-8c3b47da33b6

📥 Commits

Reviewing files that changed from the base of the PR and between e9ab8ba and 9f23edb.

📒 Files selected for processing (10)
  • internal/api/coverage_extras_test.go
  • internal/api/handler_plans.go
  • internal/api/handler_purchases.go
  • internal/api/handler_purchases_test.go
  • internal/api/handler_test.go
  • internal/purchase/messages.go
  • internal/purchase/notifications.go
  • pkg/common/tokens.go
  • pkg/common/tokens_test.go
  • pkg/exchange/auto.go

cristim added 2 commits April 28, 2026 01:07
…token-entropy

PR #145 (`feat(history): inline Cancel button + cancel-any/own RBAC verbs`)
landed on the base while #101 was in flight. Both touch
`internal/api/handler_purchases.go`'s pending-execution construction in
`executePurchase`:

- #101 hardens the path by extracting `newPendingExecution(req, total,
  savings)` so the approval token comes from `crypto/rand` via
  `common.GenerateApprovalToken` instead of the previous inline
  `uuid.New().String()`.
- #145 adds `CreatedByUserID: resolveCreatorUserID(session)` to every
  pending execution so the cancel-own RBAC path (issue #46) can identify
  the creator on later cancellation.

The two changes are semantically independent — keeping both. Resolution:
keep #101's `newPendingExecution(...)` call, then set
`execution.CreatedByUserID = resolveCreatorUserID(session)` immediately
after construction. The helper stays signature-stable (no session
parameter to plumb) and the cancel-own stamp is in place for every code
path that goes through `executePurchase`.

Other files in the merge auto-resolved cleanly
(`internal/api/coverage_extras_test.go`,
`internal/api/handler_purchases_test.go`,
`internal/api/handler_test.go`).

`go build ./...` and `go test ./...` at the repo root, plus `go test ./...` from `pkg/`, are all clean.
…on plan create

Two CodeRabbit nitpicks from review 4184423401:

1. `pkg/exchange/auto.go` — introduce a typed `ExchangeMode` with
   `ExchangeModeAuto` / `ExchangeModeManual` constants and tighten
   `saveFailedRecord`'s `mode` parameter from `string` to that type.
   The previous round of fixes parameterised `mode` as a raw string,
   which let a caller-side typo silently leak into `ExchangeRecord.Mode`
   and break downstream filtering. `ExchangeRecord.Mode` itself stays
   `string` for serialization stability — this is call-site discipline,
   not a schema change. Updated all five call sites (1 manual, 4 auto)
   to pass the constant.

2. `internal/api/handler_plans.go` — `createPurchaseExecutions` now
   returns the running `created` count alongside an error rather than
   `0`. Callers that retry or audit can see how many executions made it
   to the DB before the loop bailed; the error message also names the
   count for log/incident clarity. The current caller drops the count
   on error, so this is a no-op functionally and a future-proofing fix
   for retry paths.

`go build ./...` and `go test ./...` from the repo root, plus `go test ./...` from `pkg/`, all clean.
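
The typed-mode change in item 1 can be sketched roughly as follows. The constant values `"auto"` and `"manual"` come from the review diff; `newFailedRecord` and the trimmed `ExchangeRecord` are hypothetical stand-ins for `saveFailedRecord` and the real struct:

```go
package main

import "fmt"

// ExchangeMode narrows the set of values callers are expected to pass.
type ExchangeMode string

const (
	ExchangeModeAuto   ExchangeMode = "auto"
	ExchangeModeManual ExchangeMode = "manual"
)

// ExchangeRecord keeps Mode as a plain string so the serialized form
// does not change. Fields are trimmed for illustration.
type ExchangeRecord struct {
	Mode   string
	Status string
}

// newFailedRecord is a hypothetical stand-in for saveFailedRecord: it
// accepts the typed mode and converts at the storage boundary.
func newFailedRecord(mode ExchangeMode, reason string) ExchangeRecord {
	return ExchangeRecord{Mode: string(mode), Status: "failed: " + reason}
}

func main() {
	rec := newFailedRecord(ExchangeModeManual, "quote expired")
	fmt.Println(rec.Mode, rec.Status)
}
```

One caveat worth keeping in mind: Go still converts an untyped string literal to `ExchangeMode` implicitly, so `newFailedRecord("manul", ...)` would compile. The typed parameter rejects `string` variables and nudges callers toward the constants, which is the call-site discipline described above rather than a hard guarantee.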
@cristim
Member Author

cristim commented Apr 27, 2026

@coderabbitai review

@coderabbitai

coderabbitai Bot commented Apr 27, 2026

✅ Actions performed

Review triggered.

Note: CodeRabbit is an incremental review system and does not re-review already reviewed commits. This command is applicable only when automatic reviews are paused.


@coderabbitai coderabbitai Bot left a comment


Actionable comments posted: 2

Caution

Some comments are outside the diff and can’t be posted inline due to platform limitations.

⚠️ Outside diff range comments (1)
internal/api/handler_purchases.go (1)

965-995: ⚠️ Potential issue | 🔴 Critical

The new approver gate is still bypassable through the async token-only path.

This hardens only the HTTP route. internal/purchase/messages.go still forwards approve/cancel SQS messages with just execution_id + token, and internal/purchase/approvals.go accepts those calls after token validation alone. That leaves a live path where a replayed or forwarded async payload can approve/cancel without the session-email check added here.

🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed.

In `@internal/api/handler_purchases.go` around lines 965 - 995, The async SQS
approve/cancel path is still authorisable by token only; update the message
forwarding and processing so they perform the same approver-email check as
authorizeApprovalAction: when creating/forwarding SQS messages in the message
sender (messages.go), include the initiating actor/session email (or reject if
not present), and in the approval processor in approvals.go, after token
validation call resolveApprovalRecipients(...) and verify the provided actor
email matches one of the approvers (same logic as in authorizeApprovalAction,
including actorLower normalization and NewClientError-style rejection); ensure
missing or non-matching actor info causes the async message to be rejected and
logged.
🧹 Nitpick comments (1)
pkg/exchange/auto.go (1)

442-451: Use the new mode constants for success-path record writes too.

You introduced ExchangeMode to avoid string typos; reusing it in success-path record.Mode assignments would complete that safety improvement.

♻️ Proposed small consistency patch
diff --git a/pkg/exchange/auto.go b/pkg/exchange/auto.go
@@
-        Mode:               "manual",
+        Mode:               string(ExchangeModeManual),
@@
-        Mode:               "auto",
+        Mode:               string(ExchangeModeAuto),

Also applies to: 456-470

🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed.

In `@pkg/exchange/auto.go` around lines 442 - 451, Replace raw string assignments
to record.Mode in the success-path record writes with the new ExchangeMode
constants: use ExchangeModeAuto or ExchangeModeManual (convert to string if
record.Mode is a plain string) wherever record.Mode is set (the success-path
writes near the same area as the saveFailedRecord usage); ensure any comparison
or assignment that previously used literal "auto"/"manual" uses ExchangeMode and
the corresponding ExchangeModeAuto/ExchangeModeManual to avoid typos.
🤖 Prompt for all review comments with AI agents
Verify each finding against the current code and only fix it if needed.

Inline comments:
In `@internal/api/handler_plans.go`:
- Around line 259-279: The loop in createPlannedPurchases writes each
PurchaseExecution (constructed in execution and persisted via
h.config.SavePurchaseExecution) one-by-one so partial failures leave persisted
executions and allow retries to duplicate them; change the flow to make the
batch atomic: either wrap all SavePurchaseExecution calls in a single DB
transaction exposed by h.config (e.g., SavePurchaseExecutionTx/BeginTx/Commit)
and only call updatePlanNextExecutionDate after Commit, or if a transaction API
isn't available, make creation idempotent by checking for an existing execution
(by PlanID+ScheduledDate or ApprovalToken) before saving and skip duplicates,
then call updatePlanNextExecutionDate only when all executions are confirmed
saved. Ensure references: createPlannedPurchases, PurchaseExecution,
SavePurchaseExecution, updatePlanNextExecutionDate, and ApprovalToken are
updated accordingly.

In `@internal/api/handler_purchases.go`:
- Around line 858-866: lookupContactEmail currently swallows GetCloudAccount
errors and returns an empty string, which makes transient lookup failures appear
as "no contact_email" in the handler; modify lookupContactEmail to return
(string, error) (or a second boolean/error sentinel) so the caller can
distinguish "no email configured" from "lookup failed", then update the branch
in handler_purchases.go that currently checks if len(contactEmails) == 0 to
instead check the returned error first (if err != nil, propagate/return the
error as a retriable server error) and only fall back to globalNotify when the
call succeeded and truly returned no contact email; update any callers of
lookupContactEmail accordingly (preserve existing logging but surface the error
to the handler rather than converting it to an empty string).


ℹ️ Review info
⚙️ Run configuration

Configuration used: defaults

Review profile: CHILL

Plan: Pro

Run ID: 43aaec54-8c68-4848-85f2-625e61bfc8ac

📥 Commits

Reviewing files that changed from the base of the PR and between 9f23edb and e3d3ca5.

📒 Files selected for processing (6)
  • internal/api/coverage_extras_test.go
  • internal/api/handler_plans.go
  • internal/api/handler_purchases.go
  • internal/api/handler_purchases_test.go
  • internal/api/handler_test.go
  • pkg/exchange/auto.go
🚧 Files skipped from review as they are similar to previous changes (1)
  • internal/api/coverage_extras_test.go

Comment thread internal/api/handler_plans.go Outdated
Comment thread internal/api/handler_purchases.go
@cristim cristim added triaged Item has been triaged priority/p1 Next up; this sprint severity/high Significant harm urgency/now Drop other things impact/all-users Affects every user effort/m Days type/security Security finding labels Apr 28, 2026
@cristim
Member Author

cristim commented Apr 28, 2026

P1 — fixes two HIGH security findings from the approval flow: (H1) tokens were generated with uuid.New() (122-bit, known format) rather than crypto/rand, and (H3) the global notification email acted as a catch-all approver bypassing per-account contact gating. Operator action required before merge: ensure all accounts with active purchase executions have contact_email set, or future approvals will 403 until configured. CI green, MERGEABLE. (triage agent wave2-E)

cristim added 3 commits April 28, 2026 18:35
The HTTP approve/cancel route enforces a per-account contact_email gate
via authorizeApprovalAction, but the SQS message handler accepted any
payload after a valid token alone — letting a replayed or forwarded
async message approve or cancel a purchase without the email check.

Add an actor_email field to AsyncMessage and require it for approve/
cancel. handleApproveMessage and handleCancelMessage now run the new
verifyAsyncApprovalActor helper, which:

  1. Rejects messages without actor_email (legacy / replayed payloads
     have no field — they are NOT silently accepted).
  2. Validates the approval token using constant-time compare.
  3. Resolves the per-account contact_email approver list from the
     execution's recommendations and asserts actor_email matches one
     (case-insensitive, trimmed). Empty approver list is rejected with
     the same policy as authorizeApprovalAction.

Legacy in-flight messages without actor_email are rejected by design
(option (a) of the hardening plan): backfilling tokenless approvals
is exactly the threat model this fix addresses, and any stranded
action can be re-issued via the HTTP route.

The approver-list algorithm duplicates ~15 lines from
internal/api/handler_purchases.go gatherAccountContactEmails because
internal/api and internal/purchase share no transitive dependency; a
shared helper would cost more than keeping the two loops in sync.
Both sides cross-reference each other in comments so future edits
don't drift.

Tests cover the four scenarios called out in CodeRabbit's review:
matching actor → success, non-matching actor → reject, missing actor
field → reject, token mismatch → reject (existing behaviour, no
regression). MockConfigStore.GetCloudAccount in internal/purchase
gains the same isExpected pattern internal/api/mocks_test.go uses so
existing tests that don't care about account lookups stay green.
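
The three checks listed above could look roughly like this. It is a sketch under assumptions: `verifyAsyncActor`, its flat parameter list, and the sentinel errors are hypothetical, and the real `verifyAsyncApprovalActor` loads the execution and resolves approvers from the store rather than taking them as arguments:

```go
package main

import (
	"crypto/subtle"
	"errors"
	"fmt"
	"strings"
)

// Hypothetical sentinel errors, one per rejection reason.
var (
	errMissingActor = errors.New("actor_email is required")
	errBadToken     = errors.New("invalid approval token")
	errNoApprovers  = errors.New("no per-account contact email configured")
	errNotApprover  = errors.New("actor is not an approver for this execution")
)

// verifyAsyncActor mirrors the order described in the commit message:
// actor presence, constant-time token compare, then approver match.
func verifyAsyncActor(actorEmail, presentedToken, storedToken string, approvers []string) error {
	actor := strings.ToLower(strings.TrimSpace(actorEmail))
	if actor == "" {
		return errMissingActor // legacy or replayed payloads land here
	}
	// Reject an empty stored token so two empty strings never match.
	if storedToken == "" ||
		subtle.ConstantTimeCompare([]byte(presentedToken), []byte(storedToken)) != 1 {
		return errBadToken
	}
	if len(approvers) == 0 {
		return errNoApprovers
	}
	for _, approver := range approvers {
		if strings.ToLower(strings.TrimSpace(approver)) == actor {
			return nil
		}
	}
	return errNotApprover
}

func main() {
	approvers := []string{"owner@example.com"}
	fmt.Println(verifyAsyncActor(" Owner@Example.com ", "tok", "tok", approvers))
	fmt.Println(verifyAsyncActor("", "tok", "tok", approvers))
	fmt.Println(verifyAsyncActor("other@example.com", "tok", "tok", approvers))
}
```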
…opagation

Two CodeRabbit findings on PR #101, both touching internal/api purchase
flow.

Finding 2 — partial-failure orphaned rows.

createPlannedPurchases looped SavePurchaseExecution outside any
transaction, then called UpdatePurchasePlan once at the end. A
mid-loop failure (e.g. transient DB error on row 4 of 5) committed
rows 1-3, skipped the plan's next_execution_date bump, and on retry
duplicated rows 1-3. Wrap the loop and the plan update in a single
WithTx so both classes of corruption are impossible — the caller can
safely retry transient errors knowing nothing was committed.

Approach: WithTx (option (a)). The codebase already exposes WithTx +
SavePurchaseExecutionTx and uses both elsewhere (purchase suppressions,
RI exchange flows). Adding the matching UpdatePurchasePlanTx variant
costs less than introducing idempotency keys for one caller, and TX
semantics give the caller a clean retry contract instead of
"may-or-may-not-have-rows-from-the-previous-attempt".

UpdatePurchasePlan now delegates to UpdatePurchasePlanTx via a
single-call WithTx, sharing SQL between the two variants. UpdatedAt
is stamped on the public method before the WithTx call so existing
nil-DB tests that inspect UpdatedAt without exercising the DB still
see it set. The tx variant also stamps if zero, so direct callers
that bundle this with other writes don't have to remember.

A regression test
(TestHandler_createPlannedPurchases_MidLoopFailureRollsBack) drives
SavePurchaseExecutionTx through the mock to fail on the third call and
asserts that neither UpdatePurchasePlan nor UpdatePurchasePlanTx is
reached, which is the rollback semantics CodeRabbit asked for.
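
The all-or-nothing contract can be illustrated with an in-memory stand-in. Everything here (`memStore`, `memTx`, `failOnNth`, the simplified `createPlannedPurchases` signature) is hypothetical; the real code goes through the store's WithTx, SavePurchaseExecutionTx and UpdatePurchasePlanTx:

```go
package main

import (
	"errors"
	"fmt"
)

// memStore is a hypothetical in-memory stand-in for the config store.
type memStore struct {
	executions []string
	nextDate   string
}

// memTx stages writes until the transaction callback returns nil.
type memTx struct {
	executions []string
	nextDate   string
}

// WithTx commits staged writes only when fn succeeds; on error the
// staged writes are dropped, which models a rollback.
func (s *memStore) WithTx(fn func(tx *memTx) error) error {
	tx := &memTx{nextDate: s.nextDate}
	if err := fn(tx); err != nil {
		return err
	}
	s.executions = append(s.executions, tx.executions...)
	s.nextDate = tx.nextDate
	return nil
}

// createPlannedPurchases saves every execution and bumps the plan date
// inside one transaction.
func createPlannedPurchases(s *memStore, ids []string, nextDate string,
	save func(tx *memTx, id string) error) error {
	return s.WithTx(func(tx *memTx) error {
		for _, id := range ids {
			if err := save(tx, id); err != nil {
				return fmt.Errorf("failed to save execution %s: %w", id, err)
			}
		}
		tx.nextDate = nextDate
		return nil
	})
}

// failOnNth returns a save function that fails on its n-th call
// (n = 0 never fails), simulating a transient DB error mid-loop.
func failOnNth(n int) func(tx *memTx, id string) error {
	calls := 0
	return func(tx *memTx, id string) error {
		calls++
		if calls == n {
			return errors.New("transient DB error")
		}
		tx.executions = append(tx.executions, id)
		return nil
	}
}

func main() {
	store := &memStore{nextDate: "2026-05-01"}
	err := createPlannedPurchases(store, []string{"a", "b", "c", "d", "e"},
		"2026-06-01", failOnNth(3))
	fmt.Println(err != nil, len(store.executions), store.nextDate)
}
```

Because nothing is committed on failure, a retry starts from the same state as the first attempt, so no duplicate rows appear.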

Finding 3 — silent error swallow in lookupContactEmail.

lookupContactEmail's old signature returned a bare string, so DB
errors were logged and then collapsed to "". The caller couldn't
distinguish "no contact email configured" (legitimate fall-through to
globalNotify) from a real transient lookup failure that should surface
as retriable. A DB blip could silently change who is allowed to
approve, which is exactly the wrong failure mode for an authorisation
lookup.

New signature: (string, error). Real error → propagated. Account
not found (pgx.ErrNoRows → nil, nil from the postgres store) →
("", nil), same fall-through as before. Account found but empty
ContactEmail → ("", nil), unchanged. The single caller
gatherAccountContactEmails wraps and returns the error;
resolveApprovalRecipients already plumbed err through, so callers up
the chain work unchanged.

A regression test
(TestHandler_resolveApprovalRecipients_LookupErrorPropagates) mocks
GetCloudAccount to return a transient error and asserts the error is
wrapped (errors.Is preserves the chain) and no fallback approver list
is returned.
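
A sketch of the three outcomes under the new signature. The `accountGetter` function type, `demoGetter`, and the account IDs are hypothetical; the real lookup goes through the store's GetCloudAccount, which per the commit message yields nil, nil when the account is not found:

```go
package main

import (
	"errors"
	"fmt"
)

// CloudAccount is trimmed to the one field this sketch needs.
type CloudAccount struct{ ContactEmail string }

// accountGetter is a hypothetical stand-in for GetCloudAccount. It
// returns (nil, nil) when the account does not exist.
type accountGetter func(id string) (*CloudAccount, error)

var errTransient = errors.New("connection reset")

// lookupContactEmail separates "nothing configured" from "lookup failed".
func lookupContactEmail(get accountGetter, id string) (string, error) {
	acct, err := get(id)
	if err != nil {
		return "", fmt.Errorf("lookup cloud account %s: %w", id, err)
	}
	if acct == nil {
		return "", nil // account not found
	}
	return acct.ContactEmail, nil // may be empty: no contact configured
}

// gatherAccountContactEmails stops at the first real error instead of
// treating it as an empty contact.
func gatherAccountContactEmails(get accountGetter, ids []string) ([]string, error) {
	var emails []string
	for _, id := range ids {
		email, err := lookupContactEmail(get, id)
		if err != nil {
			return nil, err
		}
		if email != "" {
			emails = append(emails, email)
		}
	}
	return emails, nil
}

// demoGetter is a hypothetical fixture covering all three outcomes.
func demoGetter(id string) (*CloudAccount, error) {
	switch id {
	case "acct-ok":
		return &CloudAccount{ContactEmail: "owner@example.com"}, nil
	case "acct-down":
		return nil, errTransient
	}
	return nil, nil
}

func main() {
	emails, err := gatherAccountContactEmails(demoGetter, []string{"acct-ok", "acct-missing"})
	fmt.Println(emails, err)
	_, err = gatherAccountContactEmails(demoGetter, []string{"acct-ok", "acct-down"})
	fmt.Println(errors.Is(err, errTransient))
}
```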

Mock surface updates: every config.StoreInterface mock in the
codebase (internal/api, internal/purchase, internal/analytics,
internal/scheduler, internal/server, internal/mocks) gains
UpdatePurchasePlanTx, with the same isExpected fall-through pattern
used for SavePurchaseExecutionTx so tests that don't care about the
Tx variant stay green. The pgxmock-backed UpdatePurchasePlan_NotFound
test now scripts the matching Begin / Rollback frame.
Two success-path record.Mode assignments in pkg/exchange/auto.go still
used the literal strings "manual" and "auto" while the failed-record
path (saveFailedRecord) was already routed through ExchangeMode
constants ExchangeModeManual / ExchangeModeAuto. Replace the literals
with string(ExchangeModeManual) and string(ExchangeModeAuto) so the
constants are the single source of truth for valid values — closes
the residual call-site discipline gap CodeRabbit flagged on PR #101.

The ExchangeRecord.Mode storage type stays string for serialization
stability; this change is purely about preventing typo drift at
write sites.
@cristim
Member Author

cristim commented Apr 28, 2026

All four CodeRabbit findings from the 2026-04-27 review (run id 43aaec54-8c68-4848-85f2-625e61bfc8ac) addressed across three atomic commits.

Finding 1 — CRITICAL: async SQS path bypassed the approver gate

Commit a36e0d557 fix(security): close async-SQS bypass of approver gate (#101 CRITICAL)

  • Added actor_email field to internal/purchase/messages.go AsyncMessage. Required for approve / cancel messages — empty / missing field is rejected before any state mutation.
  • New verifyAsyncApprovalActor helper in internal/purchase/messages.go runs the same approver check authorizeApprovalAction runs on the HTTP path: load execution → constant-time token compare → resolve per-account contact_email approver list → match actor_email (case-insensitive, trimmed). Empty approver list rejected with the same policy as the HTTP path.
  • Legacy in-flight messages — option (a), reject. Decided against backfilling tokenless approvals: that would reopen the exact threat model this fix closes. Stranded actions can be re-issued via the HTTP route gated by authorizeApprovalAction. Documented in the commit body.
  • Approver-list algorithm duplicates ~15 lines from internal/api/handler_purchases.go gatherAccountContactEmails; both sides cross-reference each other in comments. internal/api and internal/purchase share no transitive dependency, and a shared helper in internal/config would cost more than keeping the two loops in sync. Will hoist if a third caller appears.
  • Tests in internal/purchase/coverage_extra_test.go: matching actor → success, non-matching actor → reject, missing actor field → reject, token mismatch → reject, no approvers configured → reject. Existing happy-path tests updated for the new field.

Finding 2 — createPlannedPurchases partial-failure orphaned rows

Commit b49c49564 fix(api): atomic createPlannedPurchases + lookupContactEmail error propagation

  • Approach: WithTx (option (a)), not idempotent-keys. The codebase already exposes WithTx + SavePurchaseExecutionTx (internal/config/interfaces.go) and uses both elsewhere (purchase suppressions, RI exchange flows). Adding the matching UpdatePurchasePlanTx cost less than introducing idempotency keys for one caller, and TX semantics give the caller a clean retry contract.
  • createPlannedPurchases now wraps the per-row execution inserts and the plan's next_execution_date bump in a single WithTx. Mid-loop failure rolls everything back — no orphaned rows, no stale plan pointer, safe to retry on transient errors.
  • UpdatePurchasePlan delegates to UpdatePurchasePlanTx via a single-call WithTx, sharing SQL between the two variants. UpdatedAt is stamped on the public method so existing nil-DB tests that inspect it without exercising the DB still pass.
  • Regression test TestHandler_createPlannedPurchases_MidLoopFailureRollsBack: drives SavePurchaseExecutionTx to fail on row 3 of 5, asserts UpdatePurchasePlan / UpdatePurchasePlanTx is never reached.

Finding 3 — lookupContactEmail swallowed lookup errors

Same commit b49c49564.

  • Signature changed from (string) to (string, error). Real DB error → propagated. Account not found (pgx.ErrNoRows → nil, nil from the postgres store) → ("", nil). Account found but empty ContactEmail → ("", nil). Single caller gatherAccountContactEmails wraps and propagates; resolveApprovalRecipients already plumbed err through, so up-chain callers work unchanged.
  • Regression test TestHandler_resolveApprovalRecipients_LookupErrorPropagates: mocks GetCloudAccount to return a transient error, asserts the error is wrapped (errors.Is preserves the chain) and no fallback approver list is returned.

Finding 4 — nitpick: success-path Mode literals

Commit b0d4d7f98 chore(exchange): use ExchangeMode constants in success-path writes

  • pkg/exchange/auto.go lines 316 and 429 now use string(ExchangeModeManual) and string(ExchangeModeAuto) instead of bare string literals, matching the saveFailedRecord path that already routed through the constants. ExchangeRecord.Mode storage type stays string for serialization stability.

Verification

  • go test ./... from repo root — all packages PASS.
  • go test ./... from pkg/ submodule — all packages PASS.
  • gh pr view 101 --json mergeStateStatus reported CLEAN before the push; pushed as a plain push with no rebase.
  • Pre-commit hooks (gocyclo, AWS secret scanner, gosec, trivy, full Go test suite) passed on every commit.

Mock surface updates

Every config.StoreInterface mock in the codebase gained UpdatePurchasePlanTx (with the same isExpected fall-through pattern used for SavePurchaseExecutionTx): internal/api, internal/purchase, internal/analytics, internal/scheduler, internal/server, internal/mocks. The pgxmock-backed UpdatePurchasePlan_NotFound test now scripts the matching Begin / Rollback frame.

@coderabbitai review

@coderabbitai

coderabbitai Bot commented Apr 28, 2026

@cristim Thank you for the thorough remediation write-up. The three-commit breakdown is clear and well-documented:

  • a36e0d5 closes the CRITICAL async-SQS bypass with verifyAsyncApprovalActor mirroring the HTTP gate, including the deliberate reject-legacy-in-flight-messages decision and its rationale.
  • b49c495 addresses both the orphaned-row risk (atomic WithTx wrapping execution inserts + plan bump) and the swallowed lookupContactEmail errors, with regression tests for both.
  • b0d4d7f tidies up the ExchangeMode literal inconsistency on success paths.

Triggering a review of the new commits now.

✅ Actions performed

Review triggered.



@coderabbitai coderabbitai Bot left a comment


Actionable comments posted: 1

🧹 Nitpick comments (1)
internal/api/handler_plans_test.go (1)

274-342: This test still bypasses the transaction boundary.

The shared MockConfigStore.WithTx fallback just invokes the callback, so this case only checks the inner loop. If the rollback guarantee is part of the regression, consider a tx-aware spy or a pgxmock-backed assertion so the test fails when the handler stops using WithTx.

🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed.

In `@internal/api/handler_plans_test.go` around lines 274 - 342, The test
currently bypasses transaction semantics because MockConfigStore.WithTx simply
invokes the callback; change the test to enforce a transaction boundary by
making the mock WithTx either (a) expect to be called and run the callback in a
tx-aware spy that records/blocks calls to
UpdatePurchasePlan/UpdatePurchasePlanTx and simulates a rollback when
SavePurchaseExecutionTx returns an error, or (b) replace the mock with a
pgxmock-backed store so real BEGIN/ROLLBACK occur; specifically alter
TestHandler_createPlannedPurchases_MidLoopFailureRollsBack to set an expectation
on MockConfigStore.WithTx (or use a tx-capable fake) and ensure
SavePurchaseExecutionTx failures cause the WithTx to abort/rollback and that
UpdatePurchasePlan / UpdatePurchasePlanTx are not invoked by
Handler.createPlannedPurchases.
🤖 Prompt for all review comments with AI agents
Verify each finding against the current code and only fix it if needed.

Inline comments:
In `@internal/scheduler/scheduler_test.go`:
- Around line 91-101: The mock method UpdatePurchasePlanTx discards the tx
parameter when invoking m.Called, causing expectations to not match; change the
call to include the transaction by calling m.Called(ctx, tx, plan) so registered
expectations for UpdatePurchasePlanTx receive the same (ctx, tx, plan) tuple,
leaving the fallback to m.UpdatePurchasePlan(ctx, plan) unchanged.


ℹ️ Review info
⚙️ Run configuration

Configuration used: defaults

Review profile: CHILL

Plan: Pro

Run ID: c2e8ef5d-4886-4b18-aa48-a127c3940de1

📥 Commits

Reviewing files that changed from the base of the PR and between e3d3ca5 and b0d4d7f.

📒 Files selected for processing (17)
  • internal/analytics/collector_test.go
  • internal/api/handler_plans.go
  • internal/api/handler_plans_test.go
  • internal/api/handler_purchases.go
  • internal/api/handler_purchases_test.go
  • internal/api/mocks_test.go
  • internal/config/interfaces.go
  • internal/config/store_postgres.go
  • internal/config/store_postgres_pgxmock_test.go
  • internal/mocks/stores.go
  • internal/purchase/approvals.go
  • internal/purchase/coverage_extra_test.go
  • internal/purchase/messages.go
  • internal/purchase/mocks_test.go
  • internal/scheduler/scheduler_test.go
  • internal/server/test_helpers_test.go
  • pkg/exchange/auto.go
✅ Files skipped from review due to trivial changes (2)
  • internal/server/test_helpers_test.go
  • internal/purchase/approvals.go

Comment thread internal/scheduler/scheduler_test.go
Address CodeRabbit r3 findings:

- internal/scheduler/scheduler_test.go: MockConfigStore.UpdatePurchasePlanTx
  now forwards tx to m.Called(ctx, tx, plan) so registered expectations
  match the same (ctx, tx, plan) tuple as the canonical mock in
  internal/mocks/stores.go. Previously the discarded tx broke any test
  that registered an On("UpdatePurchasePlanTx", ...) expectation.

- internal/api/handler_plans_test.go:
  TestHandler_createPlannedPurchases_MidLoopFailureRollsBack now
  registers an explicit WithTx expectation
  and asserts createPlannedPurchases ran the save loop inside it. The
  default WithTx mock invokes fn(nil) regardless of whether the handler
  actually uses a transaction, so a future refactor dropping the tx
  boundary would have passed silently. The inner loop error is now
  captured via withTxFnErr so the "failed to save execution" wrapping
  is still asserted on, just one level closer to its source.

No production code changed; test-only hardening.
@cristim
Member Author

cristim commented Apr 29, 2026

CR round 4 follow-up

Addressed both findings from the latest CodeRabbit review (commit 2017748):

Actionable: internal/scheduler/scheduler_test.go L91-101 — fixed.
MockConfigStore.UpdatePurchasePlanTx now forwards tx to m.Called(ctx, tx, plan) so registered expectations match the same (ctx, tx, plan) tuple as the canonical mock in internal/mocks/stores.go. Previously the discarded tx parameter would silently break any test that registered an On("UpdatePurchasePlanTx", ...) expectation.

Nitpick: internal/api/handler_plans_test.go L274-342 — fixed.
TestHandler_createPlannedPurchases_MidLoopFailureRollsBack now registers an explicit On("WithTx", ...) expectation and asserts (via require.True(t, withTxCalled, ...)) that createPlannedPurchases ran the save loop inside it. The default MockConfigStore.WithTx fallback invokes fn(nil) regardless of whether the handler actually uses a transaction, so a future refactor that drops the tx boundary would have passed the previous version of this test silently. The inner-loop error wrapping ("failed to save execution") is now captured via withTxFnErr and asserted directly, one level closer to its source.

No production code changed; this is test-only hardening.

Local go test -count=1 ./... passes; pre-commit hooks (gofmt, vet, gosec, trivy, full test suite) green.

@coderabbitai review

@coderabbitai

coderabbitai Bot commented Apr 29, 2026

@cristim: Both fixes look solid — thank you for the detailed write-up.

  • internal/scheduler/scheduler_test.go — forwarding tx into m.Called(ctx, tx, plan) is the correct fix; silently discarding it would have allowed any On("UpdatePurchasePlanTx", mock.Anything, <specific-tx>, plan) expectation to never match, masking test intent.
  • internal/api/handler_plans_test.go — the explicit On("WithTx", ...) + require.True(t, withTxCalled) guard is a meaningful improvement; the previous fallback-only approach would have let a future removal of the tx boundary pass silently, defeating the purpose of the regression test.

Triggering a review of commit 20177482 now.

✅ Actions performed

Review triggered.


@cristim cristim merged commit eb7bf49 into feat/multicloud-web-frontend Apr 29, 2026
3 checks passed
@cristim cristim deleted the fix/approval-token-entropy branch April 30, 2026 14:00
cristim added a commit that referenced this pull request Apr 30, 2026
…n email-link cancel (#216)

* fix(purchases): admin / cancel-* sessions bypass contact_email gate on email-link cancel

The deep-link cancel flow (frontend purchases-deeplink.ts) always POSTs
/api/purchases/cancel/:id with both an X-Authorization session AND the
URL token from the email link. cancelPurchase took the token branch
unconditionally → authorizeApprovalAction → 403 when the execution had
no per-account contact_email available (e.g. AWS ambient-credentials
recs where CloudAccountID is nil, or any execution whose recommendation
accounts simply have an empty contact_email column):

  Failed to cancel purchase: no per-account contact email configured
  for this execution; set the cloud account's contact_email before
  approving

The same admin (or any user with cancel-any:purchases / cancel-own
matching the creator) could already cancel the same execution from the
History page Cancel button — that path goes through cancelPurchaseViaSession
→ authorizeSessionCancel (RBAC matrix) and never touches contact_email.
The deep-link UX was inconsistent.

Fix: pre-check the session in cancelPurchase before falling into the
token branch. When the caller carries a valid session AND
authorizeSessionCancel approves them, take the session-authed path
regardless of whether a token is in the URL. Tokenless / no-session
callers (forwarded email, shared inbox, scripted flow without auth)
still hit the per-account contact_email gate from PR #101.

Approve flow stays strict — the dashboard has no admin approve override
either, so widening it via the email link would change the security
policy. Out of scope for this fix.

Helpers:
- New tryGetSession returns *Session or nil silently. tryResolveActorEmail
  collapses to a one-line wrapper.

Tests:
- New TestHandler_cancelPurchase_DeepLink_AdminBypassesContactEmailGate:
  admin session + token + ambient-credentials execution → 200, status
  flips, CancelledBy stamped, GetGlobalConfig (the token branch's
  signature call) is asserted NOT called.
- New TestHandler_cancelPurchase_DeepLink_CancelOwnBypassesContactEmailGate:
  non-admin with cancel-own + matching creator → 200.
- New TestHandler_cancelPurchase_DeepLink_NonPrivilegedSessionStillHitsContactGate:
  pins the security model — a logged-in user without admin / cancel-* MUST
  still go through authorizeApprovalAction.
- Existing TestHandler_cancelPurchase, TestHandler_cancelPurchase_PurchaseError,
  and TestHandler_HandleRequest_CancelPurchase get HasPermissionAPI mock
  stubs returning false so the new pre-check correctly falls through to the
  token branch (preserving their original assertions).

go test ./... — every package green.

* fix(purchases): propagate transient auth errors instead of silent fallback (CR #216)

CodeRabbit on PR #216 flagged that the new session-authed pre-check in
cancelPurchase silently swallowed every error from authorizeSessionCancel
and treated them all as "fall through to the contact_email gate". That
masks transient auth-service failures (HasPermissionAPI returning a
wrapped error, h.auth nil, network blip) behind a misleading "set the
contact_email" 403 — exactly the user-facing message #216 was meant to
stop conflating with real failures.

Distinguish the two cases via a new local helper isPermissionDenied,
which uses the existing IsClientError + .code accessor to detect a 403
ClientError specifically. authorizeSessionCancel returns 403 ClientError
on legitimate denials (admin role mismatch, missing cancel-* verb,
cancel-own creator mismatch) and a wrapped non-ClientError on transient
backend failures.

  case err == nil:           → session-authed cancel
  case isPermissionDenied(): → fall through to contact_email gate
  default:                   → return err (propagate)

Regression test
TestHandler_cancelPurchase_DeepLink_TransientAuthErrorPropagates simulates
a HasPermissionAPI failure ("auth backend timeout") and asserts:
  * the error surfaces with "permission check failed" wrapped,
  * the contact_email message is NOT in the error,
  * GetGlobalConfig (the token branch's signature call) is NOT reached.

Existing 403-fall-through path covered by
TestHandler_cancelPurchase_DeepLink_NonPrivilegedSessionStillHitsContactGate.

* fix(purchases): isPermissionDenied uses strict type assertion (CR pass 2 #216)

CodeRabbit pass 2 on PR #216 flagged that the helper introduced in pass 1
relied on IsClientError, which goes through errors.As — that unwraps the
error chain. A wrapped 403 ClientError (e.g. fmt.Errorf("permission check
failed: %w", NewClientError(403, ...))) would still be classified as
permission denied, even though the wrapper changes the failure's outer
category and signals a different intent.

Switch to a direct *clientError type assertion (no unwrapping). Only an
exact, un-wrapped 403 ClientError now triggers the fall-through to the
contact_email gate; anything wrapping a 403 propagates as the wrapper's
own failure mode. This preserves the propagate-vs-fall-through split's
original intent against future code that might decide to wrap auth-layer
errors for context.

Regression: TestIsPermissionDenied table-driven tests pin all six cases
including the new "wrapped 403 is NOT denial" invariant.

Labels

- effort/m: Days
- impact/all-users: Affects every user
- priority/p1: Next up; this sprint
- severity/high: Significant harm
- triaged: Item has been triaged
- type/security: Security finding
- urgency/now: Drop other things
