fix(security): approval-flow hardening — crypto/rand tokens + per-account contact gating#101
Conversation
…t contact gating
Addresses 3 HIGH findings from the security review:
H1 — Approval tokens were generated with `uuid.New().String()`. UUID v4 is
122 bits of entropy in a known format with version + variant nibbles fixed,
which is fine for identity but a poor primary credential. Switched all four
token-generation sites to a new `pkg/common.GenerateApprovalToken()` that
returns 32 random bytes from `crypto/rand`, hex-encoded — full 256 bits of
unpredictability and a uniform output space.
Sites updated:
- internal/api/handler_purchases.go (web purchase flow)
- internal/api/handler_plans.go (plan execution creation)
- internal/purchase/notifications.go (scheduled purchase notifications)
- pkg/exchange/auto.go (RI exchange automation, in pkg/ submodule)
The helper lives in pkg/common because pkg/exchange (a separate Go module)
cannot import internal/. All four sites now propagate the rand.Read error
rather than panicking — auto.go returns an ExchangeOutcome with the error
fielded so the surrounding loop continues processing the rest of the recs.
H3 — `authorizeApprovalAction` previously fell back to the global Settings
notification email as a catch-all approver when no recommendation had a
per-account contact_email. Anyone holding the global notify mailbox could
approve spend on any account with the policy. Closed the loophole:
`resolveApprovalRecipients` no longer adds globalNotify to the approver
set — only per-account contact_email values count. authorizeApprovalAction
returns 403 with a clear message ("set the cloud account's contact_email
before approving") when no contact_email is configured.
Behaviour change to call out for operators: any execution whose
recommendations all reference accounts WITHOUT a contact_email will fail
approval until contact_email is set on the account. Existing tenants
who relied on the global mailbox are affected.
H2 — RI exchange single-use enforcement. Audit confirmed it's already in
place at the SQL layer: `TransitionRIExchangeStatus(pending → processing)`
is atomic, and a concurrent second approval attempt returns 409
("already processed"). `subtle.ConstantTimeCompare` on the token compare.
No code change needed; documenting the audit here so future readers don't
think H2 was missed.
Tests: 4 new tests covering token uniqueness/length, the no-contact-email
rejection path, and updates to existing approval-flow tests so they wire
a per-account ContactEmail through MockConfigStore.GetCloudAccountFn (the
prior tests relied on the global-fallback that this PR removes).
|
Note Reviews pausedIt looks like this branch is under active development. To avoid overwhelming you with review comments due to an influx of new commits, CodeRabbit has automatically paused this review. You can configure this behavior by changing the Use the following commands to manage reviews:
Use the checkboxes below for quick actions:
No actionable comments were generated in the recent review. 🎉 ℹ️ Recent review info⚙️ Run configurationConfiguration used: defaults Review profile: CHILL Plan: Pro Run ID: 📒 Files selected for processing (2)
🚧 Files skipped from review as they are similar to previous changes (2)
📝 WalkthroughWalkthroughApproval tokens are now crypto-random via Changes
sequenceDiagram
participant Client
participant Handler as API/SQS Handler
participant TokenGen as common.GenerateApprovalToken
participant DB as Postgres (WithTx)
participant Resolver as Recipient Resolver
participant CloudAPI as ConfigStore.GetCloudAccount
Client->>Handler: submit approve/cancel (HTTP session or SQS msg with actor_email)
Handler->>DB: Get PurchaseExecution (includes Recommendations)
DB-->>Handler: execution with CloudAccountIDs
Handler->>TokenGen: GenerateApprovalToken() (when creating execution)
TokenGen-->>Handler: token / error
Handler->>Resolver: resolveApprovalRecipients(recommendations)
Resolver->>CloudAPI: GetCloudAccount(cloudAccountID)
CloudAPI-->>Resolver: CloudAccount{ContactEmail}
alt ContactEmail found
Resolver-->>Handler: approvers includes ContactEmail
Handler->>DB: WithTx { SavePurchaseExecutionTx / UpdatePurchasePlanTx / ApproveExecution/CancelExecution }
DB-->>Handler: commit
Handler-->>Client: success
else ContactEmail missing or lookup error
Resolver-->>Handler: approvers empty or error
Handler-->>Client: reject (403 or propagate error)
end
Estimated code review effort🎯 4 (Complex) | ⏱️ ~45 minutes Possibly related PRs
Suggested labels
Poem
🚥 Pre-merge checks | ✅ 4 | ❌ 1❌ Failed checks (1 warning)
✅ Passed checks (4 passed)
✏️ Tip: You can configure your own custom pre-merge checks in the settings. ✨ Finishing Touches📝 Generate docstrings
🧪 Generate unit tests (beta)
Review rate limit: 3/5 reviews remaining, refill in 23 minutes and 24 seconds. Comment |
|
@coderabbitai review |
✅ Actions performedReview triggered.
|
There was a problem hiding this comment.
Caution
Some comments are outside the diff and can’t be posted inline due to platform limitations.
⚠️ Outside diff range comments (1)
internal/api/handler_purchases.go (1)
820-850:⚠️ Potential issue | 🟠 MajorThe SQS approval path remains unprotected by email verification and needs follow-up action.
The authorization gap is confirmed.
authorizeApprovalActioncorrectly enforces session ↔ contact_email matching for the HTTP path, but the asynchronous SQS path (handleApproveMessage/handleCancelMessage) bypasses this entirely. These handlers pass an empty actor directly toApproveExecution/CancelExecutionwithout ever checking the approver email list. Both methods validate only the token and never verify that the caller matches the approved contacts—a token holder can approve any execution.The code itself labels this the "legacy token-only path" and includes explicit comments acknowledging the attribution fallback behavior, but no deprecation notice or removal timeline exists. The SQS handlers remain wired into the active message dispatcher.
Either deprecate the SQS flow explicitly or add a comment in
messages.godocumenting this authorization gap as a known limitation pending future closure.🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed. In `@internal/api/handler_purchases.go` around lines 820 - 850, The SQS approval path lacks email verification: locate the asynchronous handlers handleApproveMessage and handleCancelMessage and either (A) deprecate the SQS flow by adding a clear deprecation comment and TODO with removal timeline and link to the secure HTTP path (authorizeApprovalAction / ApproveExecution / CancelExecution), or (B) add an explicit comment in messages.go documenting the known authorization gap (that SQS messages use token-only attribution and do not verify session ↔ contact_email), explain the risk, and add a TODO referencing a follow-up to enforce approver email checks in ApproveExecution/CancelExecution; include the function names (handleApproveMessage, handleCancelMessage, ApproveExecution, CancelExecution, authorizeApprovalAction) in the comment so reviewers can find the exact code paths.
🧹 Nitpick comments (2)
pkg/exchange/auto.go (1)
277-291: LGTM, with a minor consistency observation.Token-gen failure correctly aborts the manual exchange and surfaces as
result.Failedvia theError != ""branch inprocessRecommendation. Note that unlikeprocessAutoExchange(which callssaveFailedRecordon its failure paths), this branch logs but does not persist a failed-status record — so an operator auditing the DB won't see this failure, only the log line. Given how rarecrypto/rand.Readfailures are in practice, this is fine; flagging only so the asymmetry is intentional.🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed. In `@pkg/exchange/auto.go` around lines 277 - 291, In processManualExchange, after GenerateApprovalToken fails, persist a failed-status record the same way processAutoExchange does by calling saveFailedRecord with the relevant ctx, params (RunAutoExchangeParams), rec (ReshapeRecommendation), offeringID, and the error string instead of only logging; ensure the resulting ExchangeOutcome still returns the Error message and other fields as currently done so the DB audit shows the failure consistently with other failure paths.internal/api/coverage_extras_test.go (1)
41-52: LGTM — test setup correctly aligned with per-account contact-email gating.The added
accountID,RecommendationswithCloudAccountID, andGetCloudAccountFnreturning aCloudAccountwhoseContactEmailmatches the session approver match the newauthorizeApprovalActionrequirement (403 unless per-accountcontact_emailresolves to the approver). The error path under test (ApproveExecutionreturning"approval failed") is now reached only after authorization succeeds, which is what this test intends to exercise.One minor cleanup opportunity: with H3 removing
globalNotifyfromresolveApprovalRecipients, theGetGlobalConfigstub returningNotificationEmail: &approver(lines 54–56) may no longer influence authorization here. If it's still referenced by another code path on this branch it's fine to leave; otherwise consider trimming to keep the test minimal.🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed. In `@internal/api/coverage_extras_test.go` around lines 41 - 52, The test currently stubs GetGlobalConfig to return NotificationEmail (lines referenced) but since authorizeApprovalAction now uses per-account contact_email via GetCloudAccount (mocked by mockConfig.GetCloudAccountFn) and resolveApprovalRecipients no longer uses globalNotify, remove or simplify the GetGlobalConfig stub to keep the test minimal: delete the mockConfig.GetGlobalConfig assignment or return a zero-value/global config without NotificationEmail; ensure the rest of the test still sets accountID, Recommendations (with CloudAccountID) and mockConfig.GetCloudAccountFn so authorizeApprovalAction and ApproveExecution exercise the intended authorization and error path.
🤖 Prompt for all review comments with AI agents
Verify each finding against the current code and only fix it if needed.
Outside diff comments:
In `@internal/api/handler_purchases.go`:
- Around line 820-850: The SQS approval path lacks email verification: locate
the asynchronous handlers handleApproveMessage and handleCancelMessage and
either (A) deprecate the SQS flow by adding a clear deprecation comment and TODO
with removal timeline and link to the secure HTTP path (authorizeApprovalAction
/ ApproveExecution / CancelExecution), or (B) add an explicit comment in
messages.go documenting the known authorization gap (that SQS messages use
token-only attribution and do not verify session ↔ contact_email), explain the
risk, and add a TODO referencing a follow-up to enforce approver email checks in
ApproveExecution/CancelExecution; include the function names
(handleApproveMessage, handleCancelMessage, ApproveExecution, CancelExecution,
authorizeApprovalAction) in the comment so reviewers can find the exact code
paths.
---
Nitpick comments:
In `@internal/api/coverage_extras_test.go`:
- Around line 41-52: The test currently stubs GetGlobalConfig to return
NotificationEmail (lines referenced) but since authorizeApprovalAction now uses
per-account contact_email via GetCloudAccount (mocked by
mockConfig.GetCloudAccountFn) and resolveApprovalRecipients no longer uses
globalNotify, remove or simplify the GetGlobalConfig stub to keep the test
minimal: delete the mockConfig.GetGlobalConfig assignment or return a
zero-value/global config without NotificationEmail; ensure the rest of the test
still sets accountID, Recommendations (with CloudAccountID) and
mockConfig.GetCloudAccountFn so authorizeApprovalAction and ApproveExecution
exercise the intended authorization and error path.
In `@pkg/exchange/auto.go`:
- Around line 277-291: In processManualExchange, after GenerateApprovalToken
fails, persist a failed-status record the same way processAutoExchange does by
calling saveFailedRecord with the relevant ctx, params (RunAutoExchangeParams),
rec (ReshapeRecommendation), offeringID, and the error string instead of only
logging; ensure the resulting ExchangeOutcome still returns the Error message
and other fields as currently done so the DB audit shows the failure
consistently with other failure paths.
ℹ️ Review info
⚙️ Run configuration
Configuration used: defaults
Review profile: CHILL
Plan: Pro
Run ID: ec5c4164-ae28-4468-8913-7d247d903bc3
📒 Files selected for processing (9)
internal/api/coverage_extras_test.gointernal/api/handler_plans.gointernal/api/handler_purchases.gointernal/api/handler_purchases_test.gointernal/api/handler_test.gointernal/purchase/notifications.gopkg/common/tokens.gopkg/common/tokens_test.gopkg/exchange/auto.go
- internal/purchase/messages.go: document the SQS approve/cancel
authorization gap explicitly. The async path bypasses the per-account
contact_email gating enforced by authorizeApprovalAction in the HTTP
handler — record this as a known limitation with a TODO referencing
the two viable closures (deprecate SQS, or carry session email +
enforce in ApproveExecution/CancelExecution). No behaviour change.
- pkg/exchange/auto.go: persist a failed-status record when
GenerateApprovalToken fails in processManualExchange so the DB audit
trail mirrors the auto-mode failure paths. Parameterise
saveFailedRecord with a `mode` argument ("auto" or "manual") rather
than hardcoding "auto", keeping each path's mode accurate. Existing
auto-mode callsites pass "auto" verbatim.
- internal/api/coverage_extras_test.go: simplify the GetGlobalConfig
stub in the approve/cancel error-path tests to return an empty
GlobalConfig — with per-account contact_email gating, the global
notify mailbox no longer participates in authorization, so the
NotificationEmail value was dead weight in the test fixture.
|
@coderabbitai review |
✅ Actions performedReview triggered.
|
|
@coderabbitai full review |
✅ Actions performedFull review triggered. |
There was a problem hiding this comment.
🧹 Nitpick comments (2)
pkg/exchange/auto.go (1)
442-460: Consider constrainingmodeto constants to prevent silent typos.
saveFailedRecordnow accepts an arbitrary string; one typo could silently degrade downstream filtering/UI behavior.🧩 Optional hardening of mode values
+type ExchangeMode string + +const ( + ExchangeModeAuto ExchangeMode = "auto" + ExchangeModeManual ExchangeMode = "manual" +) ... -func saveFailedRecord(ctx context.Context, params RunAutoExchangeParams, rec ReshapeRecommendation, offeringID, paymentDueStr, errMsg, mode string) { +func saveFailedRecord(ctx context.Context, params RunAutoExchangeParams, rec ReshapeRecommendation, offeringID, paymentDueStr, errMsg string, mode ExchangeMode) { ... - Mode: mode, + Mode: string(mode),🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed. In `@pkg/exchange/auto.go` around lines 442 - 460, The saveFailedRecord function accepts an arbitrary mode string which risks silent typos; change it to use a constrained Mode type and constants (e.g., type Mode string with const ModeAuto = "auto", ModeManual = "manual") and update saveFailedRecord's signature to accept Mode (or validate the incoming string against those constants at the top of saveFailedRecord) and set ExchangeRecord.Mode using the constant/validated value; also update callers of saveFailedRecord (call sites passing "auto"/"manual") to use the new Mode constants or ensure they pass validated values so downstream filtering/UI sees only the expected values.internal/api/handler_plans.go (1)
259-274: Preserve partial progress on loop errors instead of returning0.At
Line 261(and similarlyLine 273), returning0can mask already persisted executions from earlier iterations. Returningcreatedmakes retries and incident handling safer.♻️ Proposed adjustment
- if err != nil { - return 0, fmt.Errorf("failed to generate approval token: %w", err) - } + if err != nil { + return created, fmt.Errorf("failed to generate approval token after creating %d executions: %w", created, err) + } ... - if err := h.config.SavePurchaseExecution(ctx, execution); err != nil { - return 0, fmt.Errorf("failed to save execution: %w", err) - } + if err := h.config.SavePurchaseExecution(ctx, execution); err != nil { + return created, fmt.Errorf("failed to save execution after creating %d executions: %w", created, err) + }🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed. In `@internal/api/handler_plans.go` around lines 259 - 274, The error paths inside the loop currently return 0 (e.g., after GenerateApprovalToken failure and after SavePurchaseExecution failure) which hides any executions already persisted; update those return statements to return the current "created" count instead of 0 so partial progress is preserved. Specifically, in the block where GenerateApprovalToken is called and where h.config.SavePurchaseExecution(ctx, execution) is called, replace the "return 0, fmt.Errorf(...)" with "return created, fmt.Errorf(...)" (use the existing created variable in the surrounding function) so callers and retry logic can detect how many executions were successfully created; ensure this change is made in the same function that constructs config.PurchaseExecution and references plan.RampSchedule.CurrentStep and execution.ExecutionID.
🤖 Prompt for all review comments with AI agents
Verify each finding against the current code and only fix it if needed.
Nitpick comments:
In `@internal/api/handler_plans.go`:
- Around line 259-274: The error paths inside the loop currently return 0 (e.g.,
after GenerateApprovalToken failure and after SavePurchaseExecution failure)
which hides any executions already persisted; update those return statements to
return the current "created" count instead of 0 so partial progress is
preserved. Specifically, in the block where GenerateApprovalToken is called and
where h.config.SavePurchaseExecution(ctx, execution) is called, replace the
"return 0, fmt.Errorf(...)" with "return created, fmt.Errorf(...)" (use the
existing created variable in the surrounding function) so callers and retry
logic can detect how many executions were successfully created; ensure this
change is made in the same function that constructs config.PurchaseExecution and
references plan.RampSchedule.CurrentStep and execution.ExecutionID.
In `@pkg/exchange/auto.go`:
- Around line 442-460: The saveFailedRecord function accepts an arbitrary mode
string which risks silent typos; change it to use a constrained Mode type and
constants (e.g., type Mode string with const ModeAuto = "auto", ModeManual =
"manual") and update saveFailedRecord's signature to accept Mode (or validate
the incoming string against those constants at the top of saveFailedRecord) and
set ExchangeRecord.Mode using the constant/validated value; also update callers
of saveFailedRecord (call sites passing "auto"/"manual") to use the new Mode
constants or ensure they pass validated values so downstream filtering/UI sees
only the expected values.
ℹ️ Review info
⚙️ Run configuration
Configuration used: defaults
Review profile: CHILL
Plan: Pro
Run ID: 5578279b-0e3f-4a4d-830e-8c3b47da33b6
📒 Files selected for processing (10)
internal/api/coverage_extras_test.gointernal/api/handler_plans.gointernal/api/handler_purchases.gointernal/api/handler_purchases_test.gointernal/api/handler_test.gointernal/purchase/messages.gointernal/purchase/notifications.gopkg/common/tokens.gopkg/common/tokens_test.gopkg/exchange/auto.go
…token-entropy PR #145 (`feat(history): inline Cancel button + cancel-any/own RBAC verbs`) landed on the base while #101 was in flight. Both touch `internal/api/handler_purchases.go`'s pending-execution construction in `executePurchase`: - #101 hardens the path by extracting `newPendingExecution(req, total, savings)` so the approval token comes from `crypto/rand` via `common.GenerateApprovalToken` instead of the previous inline `uuid.New().String()`. - #145 adds `CreatedByUserID: resolveCreatorUserID(session)` to every pending execution so the cancel-own RBAC path (issue #46) can identify the creator on later cancellation. The two changes are semantically independent — keeping both. Resolution: keep #101's `newPendingExecution(...)` call, then set `execution.CreatedByUserID = resolveCreatorUserID(session)` immediately after construction. The helper stays signature-stable (no session parameter to plumb) and the cancel-own stamp is in place for every code path that goes through `executePurchase`. Other files in the merge auto-resolved cleanly (`internal/api/coverage_extras_test.go`, `internal/api/handler_purchases_test.go`, `internal/api/handler_test.go`). `go build ./...`, `go test ./...`, `go test ./...` from `pkg/` all clean.
…on plan create Two CodeRabbit nitpicks from review 4184423401: 1. `pkg/exchange/auto.go` — introduce a typed `ExchangeMode` with `ExchangeModeAuto` / `ExchangeModeManual` constants and tighten `saveFailedRecord`'s `mode` parameter from `string` to that type. The previous round of fixes parameterised `mode` as a raw string, which let a caller-side typo silently leak into `ExchangeRecord.Mode` and break downstream filtering. `ExchangeRecord.Mode` itself stays `string` for serialization stability — this is call-site discipline, not a schema change. Updated all five call sites (1 manual, 4 auto) to pass the constant. 2. `internal/api/handler_plans.go` — `createPurchaseExecutions` now returns the running `created` count alongside an error rather than `0`. Callers that retry or audit can see how many executions made it to the DB before the loop bailed; the error message also names the count for log/incident clarity. The current caller drops the count on error, so this is a no-op functionally and a future-proofing fix for retry paths. `go build ./...`, `go test ./...`, `go test ./...` from `pkg/` all clean.
|
@coderabbitai review |
✅ Actions performedReview triggered.
|
There was a problem hiding this comment.
Actionable comments posted: 2
Caution
Some comments are outside the diff and can’t be posted inline due to platform limitations.
⚠️ Outside diff range comments (1)
internal/api/handler_purchases.go (1)
965-995:⚠️ Potential issue | 🔴 CriticalThe new approver gate is still bypassable through the async token-only path.
This hardens only the HTTP route.
internal/purchase/messages.gostill forwards approve/cancel SQS messages with justexecution_id + token, andinternal/purchase/approvals.goaccepts those calls after token validation alone. That leaves a live path where a replayed or forwarded async payload can approve/cancel without the session-email check added here.🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed. In `@internal/api/handler_purchases.go` around lines 965 - 995, The async SQS approve/cancel path is still authorisable by token only; update the message forwarding and processing so they perform the same approver-email check as authorizeApprovalAction: when creating/forwarding SQS messages in the message sender (messages.go), include the initiating actor/session email (or reject if not present), and in the approval processor in approvals.go, after token validation call resolveApprovalRecipients(...) and verify the provided actor email matches one of the approvers (same logic as in authorizeApprovalAction, including actorLower normalization and NewClientError-style rejection); ensure missing or non-matching actor info causes the async message to be rejected and logged.
🧹 Nitpick comments (1)
pkg/exchange/auto.go (1)
442-451: Use the new mode constants for success-path record writes too.You introduced
ExchangeModeto avoid string typos; reusing it in success-pathrecord.Modeassignments would complete that safety improvement.♻️ Proposed small consistency patch
diff --git a/pkg/exchange/auto.go b/pkg/exchange/auto.go @@ - Mode: "manual", + Mode: string(ExchangeModeManual), @@ - Mode: "auto", + Mode: string(ExchangeModeAuto),Also applies to: 456-470
🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed. In `@pkg/exchange/auto.go` around lines 442 - 451, Replace raw string assignments to record.Mode in the success-path record writes with the new ExchangeMode constants: use ExchangeModeAuto or ExchangeModeManual (convert to string if record.Mode is a plain string) wherever record.Mode is set (the success-path writes near the same area as the saveFailedRecord usage); ensure any comparison or assignment that previously used literal "auto"/"manual" uses ExchangeMode and the corresponding ExchangeModeAuto/ExchangeModeManual to avoid typos.
🤖 Prompt for all review comments with AI agents
Verify each finding against the current code and only fix it if needed.
Inline comments:
In `@internal/api/handler_plans.go`:
- Around line 259-279: The loop in createPlannedPurchases writes each
PurchaseExecution (constructed in execution and persisted via
h.config.SavePurchaseExecution) one-by-one so partial failures leave persisted
executions and allow retries to duplicate them; change the flow to make the
batch atomic: either wrap all SavePurchaseExecution calls in a single DB
transaction exposed by h.config (e.g., SavePurchaseExecutionTx/BeginTx/Commit)
and only call updatePlanNextExecutionDate after Commit, or if a transaction API
isn't available, make creation idempotent by checking for an existing execution
(by PlanID+ScheduledDate or ApprovalToken) before saving and skip duplicates,
then call updatePlanNextExecutionDate only when all executions are confirmed
saved. Ensure references: createPlannedPurchases, PurchaseExecution,
SavePurchaseExecution, updatePlanNextExecutionDate, and ApprovalToken are
updated accordingly.
In `@internal/api/handler_purchases.go`:
- Around line 858-866: lookupContactEmail currently swallows GetCloudAccount
errors and returns an empty string, which makes transient lookup failures appear
as "no contact_email" in the handler; modify lookupContactEmail to return
(string, error) (or a second boolean/error sentinel) so the caller can
distinguish "no email configured" from "lookup failed", then update the branch
in handler_purchases.go that currently checks if len(contactEmails) == 0 to
instead check the returned error first (if err != nil, propagate/return the
error as a retriable server error) and only fall back to globalNotify when the
call succeeded and truly returned no contact email; update any callers of
lookupContactEmail accordingly (preserve existing logging but surface the error
to the handler rather than converting it to an empty string).
---
Outside diff comments:
In `@internal/api/handler_purchases.go`:
- Around line 965-995: The async SQS approve/cancel path is still authorisable
by token only; update the message forwarding and processing so they perform the
same approver-email check as authorizeApprovalAction: when creating/forwarding
SQS messages in the message sender (messages.go), include the initiating
actor/session email (or reject if not present), and in the approval processor in
approvals.go, after token validation call resolveApprovalRecipients(...) and
verify the provided actor email matches one of the approvers (same logic as in
authorizeApprovalAction, including actorLower normalization and
NewClientError-style rejection); ensure missing or non-matching actor info
causes the async message to be rejected and logged.
---
Nitpick comments:
In `@pkg/exchange/auto.go`:
- Around line 442-451: Replace raw string assignments to record.Mode in the
success-path record writes with the new ExchangeMode constants: use
ExchangeModeAuto or ExchangeModeManual (convert to string if record.Mode is a
plain string) wherever record.Mode is set (the success-path writes near the same
area as the saveFailedRecord usage); ensure any comparison or assignment that
previously used literal "auto"/"manual" uses ExchangeMode and the corresponding
ExchangeModeAuto/ExchangeModeManual to avoid typos.
🪄 Autofix (Beta)
Fix all unresolved CodeRabbit comments on this PR:
- Push a commit to this branch (recommended)
- Create a new PR with the fixes
ℹ️ Review info
⚙️ Run configuration
Configuration used: defaults
Review profile: CHILL
Plan: Pro
Run ID: 43aaec54-8c68-4848-85f2-625e61bfc8ac
📒 Files selected for processing (6)
internal/api/coverage_extras_test.gointernal/api/handler_plans.gointernal/api/handler_purchases.gointernal/api/handler_purchases_test.gointernal/api/handler_test.gopkg/exchange/auto.go
🚧 Files skipped from review as they are similar to previous changes (1)
- internal/api/coverage_extras_test.go
|
P1 — fixes two HIGH security findings from the approval flow: (H1) tokens were generated with uuid.New() (122-bit, known format) rather than crypto/rand, and (H3) the global notification email acted as a catch-all approver bypassing per-account contact gating. Operator action required before merge: ensure all accounts with active purchase executions have |
The HTTP approve/cancel route enforces a per-account contact_email gate
via authorizeApprovalAction, but the SQS message handler accepted any
payload after a valid token alone — letting a replayed or forwarded
async message approve or cancel a purchase without the email check.
Add an actor_email field to AsyncMessage and require it for approve/
cancel. handleApproveMessage and handleCancelMessage now run the new
verifyAsyncApprovalActor helper, which:
1. Rejects messages without actor_email (legacy / replayed payloads
have no field — they are NOT silently accepted).
2. Validates the approval token using constant-time compare.
3. Resolves the per-account contact_email approver list from the
execution's recommendations and asserts actor_email matches one
(case-insensitive, trimmed). Empty approver list is rejected with
the same policy as authorizeApprovalAction.
Legacy in-flight messages without actor_email are rejected by design
(option (a) of the hardening plan): backfilling tokenless approvals
is exactly the threat model this fix addresses, and any stranded
action can be re-issued via the HTTP route.
The approver-list algorithm duplicates ~15 lines from
internal/api/handler_purchases.go gatherAccountContactEmails because
internal/api and internal/purchase share no transitive dependency; a
shared helper would cost more than keeping the two loops in sync.
Both sides cross-reference each other in comments so future edits
don't drift.
Tests cover the four scenarios called out in CodeRabbit's review:
matching actor → success, non-matching actor → reject, missing actor
field → reject, token mismatch → reject (existing behaviour, no
regression). MockConfigStore.GetCloudAccount in internal/purchase
gains the same isExpected pattern internal/api/mocks_test.go uses so
existing tests that don't care about account lookups stay green.
…opagation Two CodeRabbit findings on PR #101, both touching internal/api purchase flow. Finding 2 — partial-failure orphaned rows. createPlannedPurchases looped SavePurchaseExecution outside any transaction, then called UpdatePurchasePlan once at the end. A mid-loop failure (e.g. transient DB error on row 4 of 5) committed rows 1-3, skipped the plan's next_execution_date bump, and on retry duplicated rows 1-3. Wrap the loop and the plan update in a single WithTx so both classes of corruption are impossible — the caller can safely retry transient errors knowing nothing was committed. Approach: WithTx (option (a)). The codebase already exposes WithTx + SavePurchaseExecutionTx and uses both elsewhere (purchase suppressions, RI exchange flows). Adding the matching UpdatePurchasePlanTx variant costs less than introducing idempotency keys for one caller, and TX semantics give the caller a clean retry contract instead of "may-or-may-not-have-rows-from-the-previous-attempt". UpdatePurchasePlan now delegates to UpdatePurchasePlanTx via a single-call WithTx, sharing SQL between the two variants. UpdatedAt is stamped on the public method before the WithTx call so existing nil-DB tests that inspect UpdatedAt without exercising the DB still see it set. The tx variant also stamps if zero, so direct callers that bundle this with other writes don't have to remember. A regression test (TestHandler_createPlannedPurchases_MidLoopFail- ureRollsBack) drives SavePurchaseExecutionTx through the mock to fail on the third call and asserts UpdatePurchasePlan / Tx is never reached — the rollback semantics CodeRabbit asked for. Finding 3 — silent error swallow in lookupContactEmail. lookupContactEmail's old signature returned bare string and logged DB errors as "". The caller couldn't distinguish "no contact email configured" (legitimate fall-through to globalNotify) from a real transient lookup failure that should surface as retriable. A DB blip could silently change who is allowed to approve, which is exactly the wrong failure mode for an authorisation lookup. New signature: (string, error). Real error → propagated. Account not found (pgx.ErrNoRows → nil, nil from the postgres store) → ("", nil), same fall-through as before. Account found but empty ContactEmail → ("", nil), unchanged. The single caller gatherAccountContactEmails wraps and returns; resolveApprovalRecipi- ents already plumbed err through, so callers up the chain work unchanged. A regression test (TestHandler_resolveApprovalRecipients_LookupErr- orPropagates) mocks GetCloudAccount to return a transient error and asserts the error is wrapped (errors.Is preserves the chain) and no fallback approver list is returned. Mock surface updates: every config.StoreInterface mock in the codebase (internal/api, internal/purchase, internal/analytics, internal/scheduler, internal/server, internal/mocks) gains UpdatePurchasePlanTx, with the same isExpected fall-through pattern used for SavePurchaseExecutionTx so tests that don't care about the Tx variant stay green. The pgxmock-backed UpdatePurchasePlan_NotFound test now scripts the matching Begin / Rollback frame.
Two success-path record.Mode assignments in pkg/exchange/auto.go still used the literal strings "manual" and "auto" while the failed-record path (saveFailedRecord) was already routed through ExchangeMode constants ExchangeModeManual / ExchangeModeAuto. Replace the literals with string(ExchangeModeManual) and string(ExchangeModeAuto) so the constants are the single source of truth for valid values — closes the residual call-site discipline gap CodeRabbit flagged on PR #101. The ExchangeRecord.Mode storage type stays string for serialization stability; this change is purely about preventing typo drift at write sites.
|
All four CodeRabbit findings from the 2026-04-27 review (run id Finding 1 — CRITICAL: async SQS path bypassed the approver gateCommit
Finding 2 —
|
|
Triggering a review of the new commits now. ✅ Actions performedReview triggered.
|
There was a problem hiding this comment.
Actionable comments posted: 1
🧹 Nitpick comments (1)
internal/api/handler_plans_test.go (1)
274-342: This test still bypasses the transaction boundary.The shared
MockConfigStore.WithTxfallback just invokes the callback, so this case only checks the inner loop. If the rollback guarantee is part of the regression, consider a tx-aware spy or a pgxmock-backed assertion so the test fails when the handler stops usingWithTx.🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed. In `@internal/api/handler_plans_test.go` around lines 274 - 342, The test currently bypasses transaction semantics because MockConfigStore.WithTx simply invokes the callback; change the test to enforce a transaction boundary by making the mock WithTx either (a) expect to be called and run the callback in a tx-aware spy that records/blocks calls to UpdatePurchasePlan/UpdatePurchasePlanTx and simulates a rollback when SavePurchaseExecutionTx returns an error, or (b) replace the mock with a pgxmock-backed store so real BEGIN/ROLLBACK occur; specifically alter TestHandler_createPlannedPurchases_MidLoopFailureRollsBack to set an expectation on MockConfigStore.WithTx (or use a tx-capable fake) and ensure SavePurchaseExecutionTx failures cause the WithTx to abort/rollback and that UpdatePurchasePlan / UpdatePurchasePlanTx are not invoked by Handler.createPlannedPurchases.
🤖 Prompt for all review comments with AI agents
Verify each finding against the current code and only fix it if needed.
Inline comments:
In `@internal/scheduler/scheduler_test.go`:
- Around line 91-101: The mock method UpdatePurchasePlanTx discards the tx
parameter when invoking m.Called, causing expectations to not match; change the
call to include the transaction by calling m.Called(ctx, tx, plan) so registered
expectations for UpdatePurchasePlanTx receive the same (ctx, tx, plan) tuple,
leaving the fallback to m.UpdatePurchasePlan(ctx, plan) unchanged.
---
Nitpick comments:
In `@internal/api/handler_plans_test.go`:
- Around line 274-342: The test currently bypasses transaction semantics because
MockConfigStore.WithTx simply invokes the callback; change the test to enforce a
transaction boundary by making the mock WithTx either (a) expect to be called
and run the callback in a tx-aware spy that records/blocks calls to
UpdatePurchasePlan/UpdatePurchasePlanTx and simulates a rollback when
SavePurchaseExecutionTx returns an error, or (b) replace the mock with a
pgxmock-backed store so real BEGIN/ROLLBACK occur; specifically alter
TestHandler_createPlannedPurchases_MidLoopFailureRollsBack to set an expectation
on MockConfigStore.WithTx (or use a tx-capable fake) and ensure
SavePurchaseExecutionTx failures cause the WithTx to abort/rollback and that
UpdatePurchasePlan / UpdatePurchasePlanTx are not invoked by
Handler.createPlannedPurchases.
🪄 Autofix (Beta)
Fix all unresolved CodeRabbit comments on this PR:
- Push a commit to this branch (recommended)
- Create a new PR with the fixes
ℹ️ Review info
⚙️ Run configuration
Configuration used: defaults
Review profile: CHILL
Plan: Pro
Run ID: c2e8ef5d-4886-4b18-aa48-a127c3940de1
📒 Files selected for processing (17)
internal/analytics/collector_test.gointernal/api/handler_plans.gointernal/api/handler_plans_test.gointernal/api/handler_purchases.gointernal/api/handler_purchases_test.gointernal/api/mocks_test.gointernal/config/interfaces.gointernal/config/store_postgres.gointernal/config/store_postgres_pgxmock_test.gointernal/mocks/stores.gointernal/purchase/approvals.gointernal/purchase/coverage_extra_test.gointernal/purchase/messages.gointernal/purchase/mocks_test.gointernal/scheduler/scheduler_test.gointernal/server/test_helpers_test.gopkg/exchange/auto.go
✅ Files skipped from review due to trivial changes (2)
- internal/server/test_helpers_test.go
- internal/purchase/approvals.go
Address CodeRabbit r3 findings:
- internal/scheduler/scheduler_test.go: MockConfigStore.UpdatePurchasePlanTx
now forwards tx to m.Called(ctx, tx, plan) so registered expectations
match the same (ctx, tx, plan) tuple as the canonical mock in
internal/mocks/stores.go. Previously the discarded tx broke any test
that registered an On("UpdatePurchasePlanTx", ...) expectation.
- internal/api/handler_plans_test.go: TestHandler_createPlannedPurchases_
MidLoopFailureRollsBack now registers an explicit WithTx expectation
and asserts createPlannedPurchases ran the save loop inside it. The
default WithTx mock invokes fn(nil) regardless of whether the handler
actually uses a transaction, so a future refactor dropping the tx
boundary would have passed silently. The inner loop error is now
captured via withTxFnErr so the "failed to save execution" wrapping
is still asserted on, just one level closer to its source.
No production code changed; test-only hardening.
CR round 4 follow-upAddressed both findings from the latest CodeRabbit review (commit 2017748): Actionable: Nitpick: No production code changed; this is test-only hardening. Local @coderabbitai review |
|
Triggering a review of commit ✅ Actions performedReview triggered.
|
…n email-link cancel (#216) * fix(purchases): admin / cancel-* sessions bypass contact_email gate on email-link cancel The deep-link cancel flow (frontend purchases-deeplink.ts) always POSTs /api/purchases/cancel/:id with both an X-Authorization session AND the URL token from the email link. cancelPurchase took the token branch unconditionally → authorizeApprovalAction → 403 when the execution had no per-account contact_email available (e.g. AWS ambient-credentials recs where CloudAccountID is nil, or any execution whose recommendation accounts simply have an empty contact_email column): Failed to cancel purchase: no per-account contact email configured for this execution; set the cloud account's contact_email before approving The same admin (or any user with cancel-any:purchases / cancel-own matching the creator) could already cancel the same execution from the History page Cancel button — that path goes through cancelPurchaseViaSession → authorizeSessionCancel (RBAC matrix) and never touches contact_email. The deep-link UX was inconsistent. Fix: pre-check the session in cancelPurchase before falling into the token branch. When the caller carries a valid session AND authorizeSessionCancel approves them, take the session-authed path regardless of whether a token is in the URL. Tokenless / no-session callers (forwarded email, shared inbox, scripted flow without auth) still hit the per-account contact_email gate from PR #101. Approve flow stays strict — the dashboard has no admin approve override either, so widening it via the email link would change the security policy. Out of scope for this fix. Helpers: - New tryGetSession returns *Session or nil silently. tryResolveActorEmail collapses to a one-line wrapper. Tests: - New TestHandler_cancelPurchase_DeepLink_AdminBypassesContactEmailGate: admin session + token + ambient-credentials execution → 200, status flips, CancelledBy stamped, GetGlobalConfig (the token branch's signature call) is asserted NOT called. - New TestHandler_cancelPurchase_DeepLink_CancelOwnBypassesContactEmailGate: non-admin with cancel-own + matching creator → 200. - New TestHandler_cancelPurchase_DeepLink_NonPrivilegedSessionStillHitsContactGate: pins the security model — a logged-in user without admin / cancel-* MUST still go through authorizeApprovalAction. - Existing TestHandler_cancelPurchase, TestHandler_cancelPurchase_PurchaseError, and TestHandler_HandleRequest_CancelPurchase get HasPermissionAPI mock stubs returning false so the new pre-check correctly falls through to the token branch (preserving their original assertions). go test ./... — every package green. * fix(purchases): propagate transient auth errors instead of silent fallback (CR #216) CodeRabbit on PR #216 flagged that the new session-authed pre-check in cancelPurchase silently swallowed every error from authorizeSessionCancel and treated them all as "fall through to the contact_email gate". That masks transient auth-service failures (HasPermissionAPI returning a wrapped error, h.auth nil, network blip) behind a misleading "set the contact_email" 403 — exactly the user-facing message #216 was meant to stop conflating with real failures. Distinguish the two cases via a new local helper isPermissionDenied, which uses the existing IsClientError + .code accessor to detect a 403 ClientError specifically. authorizeSessionCancel returns 403 ClientError on legitimate denials (admin role mismatch, missing cancel-* verb, cancel-own creator mismatch) and a wrapped non-ClientError on transient backend failures. case err == nil: → session-authed cancel case isPermissionDenied(): → fall through to contact_email gate default: → return err (propagate) Regression test TestHandler_cancelPurchase_DeepLink_TransientAuthErrorPropagates simulates a HasPermissionAPI failure ("auth backend timeout") and asserts: * the error surfaces with "permission check failed" wrapped, * the contact_email message is NOT in the error, * GetGlobalConfig (the token branch's signature call) is NOT reached. Existing 403-fall-through path covered by TestHandler_cancelPurchase_DeepLink_NonPrivilegedSessionStillHitsContactGate. * fix(purchases): isPermissionDenied uses strict type assertion (CR pass 2 #216) CodeRabbit pass 2 on PR #216 flagged that the helper introduced in pass 1 relied on IsClientError, which goes through errors.As — that unwraps the error chain. A wrapped 403 ClientError (e.g. fmt.Errorf("permission check failed: %w", NewClientError(403, ...))) would still be classified as permission denied, even though the wrapper changes the failure's outer category and signals a different intent. Switch to a direct *clientError type assertion (no unwrapping). Only an exact, un-wrapped 403 ClientError now triggers the fall-through to the contact_email gate; anything wrapping a 403 propagates as the wrapper's own failure mode. This preserves the propagate-vs-fall-through split's original intent against future code that might decide to wrap auth-layer errors for context. Regression: TestIsPermissionDenied table-driven tests pin all six cases including the new "wrapped 403 is NOT denial" invariant.
Summary
Closes 3 HIGH findings from the security review on the approval flow:
uuid.New().String()(122 bits entropy in a known format) → swapped all 4 generation sites for a newpkg/common.GenerateApprovalToken()returning 32 random bytes fromcrypto/rand, hex-encoded (256 bits unpredictability, uniform output space).authorizeApprovalActionfell back to the global notification email as a catch-all approver when no recommendation had a per-accountcontact_email. Anyone holding the global mailbox could approve spend on any account. Closed:resolveApprovalRecipientsno longer adds globalNotify to the approver set;authorizeApprovalActionreturns 403 with a clear "set the cloud account's contact_email before approving" message when no contact is configured.TransitionRIExchangeStatus(pending → processing)is atomic; concurrent second approval gets 409). Audited and documented; no code change needed.Why pkg/common
The four token-generation sites span both Go modules —
pkg/exchange/auto.golives in thepkg/submodule and cannot importinternal/.pkg/commonis the shared base used by every provider, so the helper lands there.Behavior change to flag for operators
Any execution whose recommendations all reference cloud accounts WITHOUT a
contact_emailwill fail approval untilcontact_emailis set on the account. Previously the global notification email could approve on those accounts' behalf — that's the exact loophole this PR closes.Files
pkg/common/tokens.go(NEW) —GenerateApprovalToken()+ tests.pkg/exchange/auto.go— RI exchange uses the helper; removesuuidimport.internal/api/handler_purchases.go— newnewPendingExecution+persistExecutionAndSuppressionshelpers (gocyclo extraction); tightenedresolveApprovalRecipientsso global notify is never an approver;authorizeApprovalActionrejects with a clear error when no per-account contact_email exists.internal/api/handler_plans.go— plan execution token uses the helper.internal/purchase/notifications.go— scheduled notification token uses the helper.ContactEmailthroughMockConfigStore.GetCloudAccountFn(the prior tests relied on the global-fallback this PR removes); 4 new tests cover token uniqueness, the no-contact-email rejection, and the security-hardened recipient policy.Test plan
go test -short -race -count=1 ./internal/api/... ./internal/purchase/...green.(cd pkg && go test -short -race -count=1 ./common/... ./exchange/...)green.gocyclo -over 10clean.purchase_history/purchase_executionsfor accounts that have approved purchases via the global mailbox (no per-account contact_email). Notify those owners that they need to setcontact_emailon the account before the next approval cycle, or future approvals will 403.Follow-up PRs
PR3 (input cap), PR4 (IAM wildcards), PR5 (supply chain) — independent of this PR.
🤖 Generated with claude-flow
Summary by CodeRabbit
Security Improvements
Validation Changes
Transactional Behavior
Error Handling
Tests