feat(purchases): make AWS RDS/ElastiCache/MemoryDB/OpenSearch/Redshift idempotent (refs #641) by cristim · Pull Request #652 · LeanerCloud/CUDly

cristim · 2026-05-21T13:08:20Z

Summary

Extends idempotent commitment creation (started in #636/#638 for EC2 RI + Savings Plans) to the remaining AWS commitment services, so a re-drive / scheduler retry cannot double-purchase. Partially addresses #641 (AWS-other slice).

Stacks on #638 (feat/idempotent-commitment-creation) — merge #638 first.

Per-service mechanism + double-buy argument

RDS / ElastiCache / MemoryDB: customer reservation ID derived deterministically from opts.IdempotencyToken; a pre-purchase Describe-by-ID short-circuits a re-drive, and AWS's native *AlreadyExists fault backstops a guard miss (recovers to the existing commitment). Lookup errors fail loud.
OpenSearch: ReservationName derived from the token (unique per account+region); a duplicate is rejected with ResourceAlreadyExistsException.
Redshift: EC2-style tag-guard (DescribeReservedNodes + per-node DescribeTags, tag post-purchase); the irreducible tag-write window is backstopped by fix(purchases): recover executions stranded in 'approved' (closes #632) #635 safe-fail.

Test plan

133 pkg/common + 208 AWS-service tests pass; build + go vet clean
Per-service re-drive regression: same token does not create a second commitment

Partially addresses #641 (Azure slice + GCP Compute tracked separately).

Summary by CodeRabbit

New Features
- Added idempotent reservation purchase support across AWS ElastiCache, MemoryDB, OpenSearch, RDS, and Redshift services, enabling safe retries without risk of duplicate purchases.
- Implemented automatic recovery from server-side duplication errors during reservation purchases.
Tests
- Added comprehensive test coverage for idempotent purchase behavior, including guard logic, error recovery, and lookup validation.

…t idempotent Extends the #636 IdempotencyToken mechanism to the five remaining AWS commitment executors so a re-drive (and #639's auto-re-drive sweep) cannot double-purchase. Stacks on #638. - RDS/ElastiCache/MemoryDB: derive the customer-supplied reservation ID (ReservedDBInstanceId/ReservedCacheNodeId/ReservationId) deterministically from the token instead of time.Now(). Pre-purchase Describe-by-ID guard short-circuits a re-drive; AWS's native *AlreadyExists* fault backstops a guard miss and is recovered to the existing commitment. Double-buy is impossible by two independent mechanisms. - OpenSearch: derive ReservationName from the token; the name is unique per account+region so AWS rejects a duplicate with ResourceAlreadyExistsException (recovered by name). A pre-purchase by-name Describe guard short-circuits. - Redshift: no customer ID, no native dedupe, no tag filter, so an EC2-style tag-guard: DescribeReservedNodes + per-node DescribeTags matches the token tag, with the token written post-purchase via CreateTags. Residual window (tagging support uncertain) documented and backstopped by #635's safe-fail. Lookup errors fail loud (refuse to purchase) to avoid a guard-miss double-buy. Empty token preserves prior non-idempotent behaviour for non-execution callers. Adds common.IdempotentReservationID helper. Per-provider regression tests: same token on re-drive does not create a second commitment (guard short-circuit, AlreadyExists recovery, fail-loud-on-lookup-error). Refs #641.

coderabbitai · 2026-05-21T13:08:27Z

Warning

Rate limit exceeded

@cristim has exceeded the limit for the number of commits that can be reviewed per hour. Please wait 4 minutes and 24 seconds before requesting another review.

You’ve run out of usage credits. Purchase more in the billing tab.

⌛ How to resolve this issue?

After the wait time has elapsed, a review can be triggered using the @coderabbitai review command as a PR comment. Alternatively, push new commits to this PR.

We recommend that you space out your commits to avoid hitting the rate limit.

🚦 How do rate limits work?

CodeRabbit enforces hourly rate limits for each developer per organization.

Our paid plans have higher rate limits than the trial, open-source and free plans. In all cases, we re-allow further reviews after a brief timeout.

Please see our FAQ for further information.

ℹ️ Review info

⚙️ Run configuration

Configuration used: defaults

Review profile: CHILL

Plan: Pro

Run ID: 3c2e46cb-2fca-4d33-aa51-d1f069435692

📥 Commits

Reviewing files that changed from the base of the PR and between 5995f20 and d2d5013.

📒 Files selected for processing (7)

pkg/common/tokens.go
pkg/common/tokens_test.go
providers/aws/services/elasticache/client.go
providers/aws/services/memorydb/client.go
providers/aws/services/opensearch/client.go
providers/aws/services/rds/client.go
providers/aws/services/redshift/client.go

📝 Walkthrough

Walkthrough

This PR implements deterministic, idempotent reservation purchasing across five AWS services (RDS, ElastiCache, MemoryDB, OpenSearch, Redshift) by introducing a common idempotency token utility and applying consistent guard-and-recovery patterns. Each service derives stable reservation identifiers from tokens, checks for duplicates before purchasing, and recovers gracefully from server-side duplication errors.

Changes

Idempotent AWS commitment purchasing

Layer / File(s)	Summary
Idempotent reservation ID foundation `pkg/common/identifiers.go`, `pkg/common/identifiers_test.go`	New exported `IdempotentReservationID(prefix, token)` deterministically derives AWS-safe reservation IDs by truncating tokens to 40 hex characters and concatenating with prefix. Tests validate determinism, distinct tokens, empty-token behavior, and formatting constraints.
RDS idempotent reservation purchasing `providers/aws/services/rds/client.go`, `providers/aws/services/rds/client_test.go`	RDS `PurchaseCommitment` derives `ReservedDBInstanceId` from token, guards against existing active/payment-pending reservations, and recovers from `ReservedDBInstanceAlreadyExistsFault`. Added `deriveReservationID`, `idempotencyGuard`, `recoverAlreadyExists`, and `findReservationByID` helpers with four test scenarios.
ElastiCache idempotent reservation purchasing `providers/aws/services/elasticache/client.go`, `providers/aws/services/elasticache/client_test.go`	ElastiCache `PurchaseCommitment` derives token-based `ReservedCacheNodeId`, performs pre-purchase guard lookup, and recovers from `ReservedCacheNodeAlreadyExistsFault`. Added `findReservationByID`, `idempotencyGuard`, and `recoverAlreadyExists` helpers with test cases covering guard short-circuit, not-found proceed, already-exists recovery, and lookup error paths.
MemoryDB idempotent reservation purchasing `providers/aws/services/memorydb/client.go`, `providers/aws/services/memorydb/client_test.go`	MemoryDB `PurchaseCommitment` derives deterministic `ReservationId` from token, guards with pre-purchase active/payment-pending lookup, and recovers from `ReservedNodeAlreadyExistsFault`. Added helper methods for by-ID lookup, guard short-circuiting, and already-exists recovery with test coverage.
OpenSearch idempotent reservation purchasing `providers/aws/services/opensearch/client.go`, `providers/aws/services/opensearch/client_test.go`	OpenSearch `PurchaseCommitment` derives `ReservationName` deterministically from token, uses `findReservationByName` to page and filter reservations, guards against duplicates, and recovers from `ResourceAlreadyExistsException`. Tests validate guard short-circuiting, not-found proceed, already-exists recovery, and lookup error handling.
Redshift idempotent reservation purchasing via tags `providers/aws/services/redshift/client.go`, `providers/aws/services/redshift/client_test.go`	Redshift implements idempotency via tag-based guard: `RedshiftAPI` interface gains `DescribeTags` method, pre-purchase lookup scans nodes for existing idempotency token tags and short-circuits if found, post-purchase tags with both source and idempotency token, and fails loudly on lookup errors. Tests cover tag guard short-circuiting, non-tagged first-time purchase with tagging, and lookup error refusal.

Estimated code review effort

🎯 4 (Complex) | ⏱️ ~60 minutes

Possibly related issues

LeanerCloud/CUDly#641: PR directly implements the requested IdempotencyToken support for AWS commitment executors across RDS, ElastiCache, MemoryDB, OpenSearch, and Redshift.
LeanerCloud/CUDly#617: Related to RDS ReservedDBInstanceId derivation and idempotent purchase flow updates in providers/aws/services/rds/client.go.

Possibly related PRs

LeanerCloud/CUDly#638: Both PRs implement commitment idempotency by threading IdempotencyToken through PurchaseCommitment options across multiple AWS services to create deterministic, duplicate-resistant purchase flows.

Suggested labels

priority/p2, severity/medium, urgency/this-quarter, type/feat

Poem

🐰 Five AWS services, now idempotent and kind,
Tokens derived, no duplicates left behind,
Guards standing watch, recovery paths wide,
Reservation purchases, safe and allied! ✨

🚥 Pre-merge checks | ✅ 4 | ❌ 1

❌ Failed checks (1 warning)

Check name	Status	Explanation	Resolution
Docstring Coverage	⚠️ Warning	Docstring coverage is 23.81% which is insufficient. The required threshold is 80.00%.	Write docstrings for the functions missing them to satisfy the coverage threshold.

✅ Passed checks (4 passed)

Check name	Status	Explanation
Description Check	✅ Passed	Check skipped - CodeRabbit’s high-level summary is enabled.
Title check	✅ Passed	The title accurately summarizes the main change: adding idempotent purchase support to five AWS services (RDS, ElastiCache, MemoryDB, OpenSearch, Redshift), which is the primary focus of this changeset.
Linked Issues check	✅ Passed	Check skipped because no linked issues were found for this pull request.
Out of Scope Changes check	✅ Passed	Check skipped because no linked issues were found for this pull request.

_{✏️ Tip: You can configure your own custom pre-merge checks in the settings.}

✨ Finishing Touches

🧪 Generate unit tests (beta)

Create PR with unit tests
Commit unit tests in branch fix/641-aws-other-idem

_{Comment @coderabbitai help to get the list of available commands and usage tips.}

cristim · 2026-05-21T13:08:48Z

@coderabbitai review

coderabbitai · 2026-05-21T13:08:54Z

✅ Actions performed

Review triggered.

Note: CodeRabbit is an incremental review system and does not re-review already reviewed commits. This command is applicable only when automatic reviews are paused.

coderabbitai

Actionable comments posted: 3

🤖 Prompt for all review comments with AI agents

Verify each finding against current code. Fix only still-valid issues, skip the
rest with a brief reason, keep changes minimal, and validate.

Inline comments:
In `@providers/aws/services/elasticache/client.go`:
- Line 225: The log currently prints the full idempotency token (variable token)
which can leak caller identifiers; update the log in the ElastiCache reservation
path so it does not emit the raw token — either redact it (e.g., replace with
fixed mask), show only a safe suffix/prefix, or log a hashed version of token,
and keep existingID visible; modify the log.Printf call that includes token and
existingID so it outputs the sanitized token representation instead of the raw
token.

In `@providers/aws/services/memorydb/client.go`:
- Line 219: The log currently prints the raw idempotency token via the
log.Printf call (variables token and existingID); change it to avoid emitting
the full token by omitting or redacting it (e.g., replace token with a fixed
placeholder like "<redacted>" or a masked version showing only the last 4 chars)
and keep existingID if needed; update the existing log.Printf invocation where
token is used so it no longer logs the raw token and ensure any helper you add
(e.g., maskToken) is used by the same call.

In `@providers/aws/services/rds/client.go`:
- Line 221: The log message prints the raw caller-supplied idempotency token
(variable token) which may leak identifiers; change the log in the RDS purchase
flow that calls log.Printf(...) to avoid logging the raw token by either
omitting it or emitting a redacted form (e.g., mask all but last N chars or log
a stable hash of token) and keep existingID as-is; update the log.Printf
invocation that currently references token and existingID so it uses the
redacted/hashed token variable instead of the raw token.

🪄 Autofix (Beta)

Fix all unresolved CodeRabbit comments on this PR:

Push a commit to this branch (recommended)
Create a new PR with the fixes

ℹ️ Review info

⚙️ Run configuration

Configuration used: defaults

Review profile: CHILL

Plan: Pro

Run ID: caf805b0-dbd4-44da-9e0c-1218bdb34578

📥 Commits

Reviewing files that changed from the base of the PR and between 45a92e7 and 5995f20.

📒 Files selected for processing (12)

pkg/common/identifiers.go
pkg/common/identifiers_test.go
providers/aws/services/elasticache/client.go
providers/aws/services/elasticache/client_test.go
providers/aws/services/memorydb/client.go
providers/aws/services/memorydb/client_test.go
providers/aws/services/opensearch/client.go
providers/aws/services/opensearch/client_test.go
providers/aws/services/rds/client.go
providers/aws/services/rds/client_test.go
providers/aws/services/redshift/client.go
providers/aws/services/redshift/client_test.go

CodeRabbit (PR #652) flagged that the "already exists; skipping purchase" log lines emit the full caller-supplied idempotency token, a stable per-execution identifier that should not leak verbatim into persistent logs. Add common.MaskToken (8-char prefix + ellipsis, "(none)" for empty) and use it across all five idempotent AWS executors (RDS, ElastiCache, MemoryDB, OpenSearch, Redshift). The masked prefix still correlates log lines for a single purchase. CR flagged three; OpenSearch and Redshift carried the identical pattern and are fixed here too for consistency. Log-message-only change: the idempotency guard, derived reservation IDs, and AlreadyExists recovery are untouched, so the same-token-no-double-buy invariant is preserved per service. Refs #641.

cristim · 2026-05-22T10:07:39Z

CodeRabbit pass 1 addressed (commit `d2d5013`)

All 3 actionable findings were the same issue: the "already exists; skipping purchase" log lines emitted the raw idempotency token.

Fixed (all addressed, not dismissed):

Added common.MaskToken (pkg/common/tokens.go) - 8-char prefix + ellipsis, (none) for empty - and applied it to the skip-purchase logs in all five idempotent AWS executors: RDS, ElastiCache, MemoryDB, plus OpenSearch and Redshift (the latter two carried the identical pattern but were not individually flagged; fixed for consistency).
The masked prefix still correlates log lines for a single purchase, so we keep operational debuggability without leaking the full per-execution identifier.
Unit tests added for MaskToken (empty / short / long / never-emits-full-token).

Idempotency invariant preserved: this is a log-message-only change. The idempotency guard, derived reservation IDs, and AlreadyExists recovery are untouched, so a re-drive with the same token still cannot create a second commitment, per service.

Out of scope, follow-up filed: the EC2 RI executor (#636, not part of this PR) has the same raw-token log line - tracked in #656.

@coderabbitai review

coderabbitai · 2026-05-22T10:07:57Z

@cristim: Thanks for the thorough write-up! The common.MaskToken approach (8-char prefix + ellipsis) is a clean solution — it preserves log correlatability across a single purchase flow without leaking the full token. Applying it proactively to OpenSearch and Redshift (even though they weren't individually flagged) is the right call for consistency. Good to see unit tests covering the edge cases (empty, short, long, never-emits-full-token) as well.

Triggering a review of the new commit now.

✅ Actions performed

Review triggered.

Note: CodeRabbit is an incremental review system and does not re-review already reviewed commits. This command is applicable only when automatic reviews are paused.

cristim · 2026-05-22T10:55:44Z

@coderabbitai review

coderabbitai · 2026-05-22T10:55:49Z

✅ Actions performed

Review triggered.

Note: CodeRabbit is an incremental review system and does not re-review already reviewed commits. This command is applicable only when automatic reviews are paused.

GCP Compute CUD creation named commitments cud-<unix-second> and ignored opts.IdempotencyToken, so a re-drive of the same execution (issue #639's recovery sweep) would create a SECOND committed-use discount: a financial double-purchase. Thread opts.IdempotencyToken into RegionCommitments.Insert via two deterministic mechanisms so the same token can never buy twice: - RequestId: GCP's native server-side idempotency key on Insert, which the API documents as preventing duplicate commitments. It must be a valid non-zero UUID, so we format the SHA-256 token into a canonical UUID with the existing common.IdempotencyGUID helper (the same derivation PR #653 used for the Azure reservationOrderID). The same token always yields the same RequestId, so the second Insert is a server-side no-op. - Name: also derived from the token (cud-<first-32-hex>, RFC1035-valid) as defense in depth. Commitment names are unique per project+region, so a re-drive that somehow reached Insert collides on the name and GCP rejects it with ALREADY_EXISTS rather than creating a duplicate. An empty token preserves the prior non-idempotent timestamp-based name (the CLI path, which has no owning execution). The token is masked in logs via common.MaskToken (never logged verbatim), matching PR #652. Adds a re-drive regression test mirroring the AWS/Azure ones: the same token on a second PurchaseCommitment yields an identical RequestId and commitment name, plus an empty-token test confirming the CLI path stays non-idempotent. Scope is computeengine only; CloudSQL/Storage/Memorystore were made advisory-only in #649 and are untouched. Closes #654 refs #641

New Manager.ReapStuckExecutions(ctx, reapAfter): one sweep that finds executions stuck in approved/running longer than reapAfter and atomically transitions them to "failed" via the existing TransitionExecutionStatus CAS. Each successful transition also writes a canonical, human-readable error string so the History UI (#621) shows operators why the row was reaped and confirms it's safe to retry. Safety properties: - Local-status-only: never touches provider commitments. If the real executor did manage to create a commitment before dying, the retry hits the idempotency path (#636/#638/#652) which surfaces a duplicate-reservation error cleanly. - CAS-protected: if the real executor wakes up and finishes between the SELECT and the transition, the CAS rejects; logged at INFO, not surfaced as an error. The real executor wins the race. - Per-row error-isolation: a failure on row N never blocks N+1..K. Adds ParseReapAfterFromEnv to read PURCHASE_APPROVED_REAP_AFTER and fall back to a 10m default on missing-or-malformed env (with a WARN log so ops can spot a typo). Never panics — a misconfigured env var must not crash the other scheduled tasks sharing the Lambda. Tests cover: - stale approved row → flipped to failed with canonical message - stale running row → same - younger-than-threshold rows are filtered out by the SELECT (no TransitionExecutionStatus / SavePurchaseExecution calls happen) - terminal-status rows are never passed to the SELECT (regression guard against accidentally widening stuckStatuses) - CAS race lost → logged + counted, not surfaced as sweep error - CAS (nil, nil) defensive path treated as race-lost - 3-stuck-row integration with per-row CAS expectations - SELECT error → propagated as sweep error - per-row save error after successful CAS → Reaped++ AND Errored++ - env-var: unset, valid, invalid, non-default unit (2h30m)

closes #678) (#681) * feat(config): add ListStuckExecutions store method Selects purchase_executions rows whose status is in a caller-supplied set AND whose updated_at is older than a caller-supplied interval. Newest-stuck-first, capped at MaxListLimit per call. Mirrors the existing GetStaleProcessingExchanges pattern for RI exchanges. Used by the reaper sweep (issue #678) to find executions stuck in approved/running because the synchronous executor failed mid-flight (Lambda timeout, OOM, network hang) without flipping the row to a terminal state. Local SELECT only — no provider-side mutation. Adds the method to StoreInterface and to every test-side mock store that satisfies it (purchase, api, analytics, scheduler, server health). Adds pgxmock coverage for the happy path, the empty-statuses short-circuit, and a query-error path. * feat(purchase): add ReapStuckExecutions sweep + per-row CAS to failed New Manager.ReapStuckExecutions(ctx, reapAfter): one sweep that finds executions stuck in approved/running longer than reapAfter and atomically transitions them to "failed" via the existing TransitionExecutionStatus CAS. Each successful transition also writes a canonical, human-readable error string so the History UI (#621) shows operators why the row was reaped and confirms it's safe to retry. Safety properties: - Local-status-only: never touches provider commitments. If the real executor did manage to create a commitment before dying, the retry hits the idempotency path (#636/#638/#652) which surfaces a duplicate-reservation error cleanly. - CAS-protected: if the real executor wakes up and finishes between the SELECT and the transition, the CAS rejects; logged at INFO, not surfaced as an error. The real executor wins the race. - Per-row error-isolation: a failure on row N never blocks N+1..K. Adds ParseReapAfterFromEnv to read PURCHASE_APPROVED_REAP_AFTER and fall back to a 10m default on missing-or-malformed env (with a WARN log so ops can spot a typo). Never panics — a misconfigured env var must not crash the other scheduled tasks sharing the Lambda. Tests cover: - stale approved row → flipped to failed with canonical message - stale running row → same - younger-than-threshold rows are filtered out by the SELECT (no TransitionExecutionStatus / SavePurchaseExecution calls happen) - terminal-status rows are never passed to the SELECT (regression guard against accidentally widening stuckStatuses) - CAS race lost → logged + counted, not surfaced as sweep error - CAS (nil, nil) defensive path treated as race-lost - 3-stuck-row integration with per-row CAS expectations - SELECT error → propagated as sweep error - per-row save error after successful CAS → Reaped++ AND Errored++ - env-var: unset, valid, invalid, non-default unit (2h30m) * feat(purchase): wire periodic reaper invocation (closes #678) Wires the ReapStuckExecutions sweep (added in the previous commit) into the existing scheduled-task pipeline: - New ScheduledTaskType "reap_stuck_purchases" registered alongside the other periodic tasks (cleanup, analytics_refresh, ri_exchange_reshape). Adds a dedicated handleReapStuckPurchases dispatcher that reads the threshold via purchase.ParseReapAfterFromEnv on every invocation so an ops-side env-var tune via PURCHASE_APPROVED_REAP_AFTER takes effect on the next sweep without a redeploy. - Adds ReapStuckExecutions to PurchaseManagerInterface so the handler contract is symmetric with the Manager + the testutil mock can satisfy it. MockPurchaseManager gains ReapStuckExecutionsFunc following the existing per-method-Func mock convention. - New EventBridge rule "${stack_name}-reap-stuck-purchases" on rate(5 minutes), targeting the main Lambda with {"action":"reap_stuck_purchases"}. Cadence is intentionally more frequent than the 10m default threshold so a stuck row is reaped within ~1 threshold-window. The reaper itself is CAS-protected (TransitionExecutionStatus from approved/running → failed) so an over-run is safe — the real executor wins the race. - New terraform vars: enable_reap_stuck_purchases_schedule (bool, default true), reap_stuck_purchases_schedule (string, default "rate(5 minutes)"), purchase_approved_reap_after (string, default "" → use in-code DefaultReapAfter). Empty-string default for the threshold avoids pinning a Lambda env value when ops hasn't taken a position. - ParseScheduledEvent learns the "reap_stuck_purchases" action so the EventBridge → Lambda payload round-trips through the dispatcher. Tests cover the dispatcher success path (asserts the default 10m threshold is passed when the env var is unset), the store-error propagation path, the lock-ID uniqueness guard, and the ParseScheduledEvent action mapping. * fix(purchase/reaper): distinguish CAS race-loss from real DB errors The reaper's per-row error handler bucketed every failure from TransitionExecutionStatus into RaceLost, including real DB outages (connection refused, query timeout, etc.). That masked persistent store failures: a downed DB would surface only as a quiet count inflation on RaceLost, with no errored signal for ops to alert on. Wrap the two legitimate race-loss outcomes from TransitionExecutionStatus in sentinel errors so callers can use errors.Is rather than brittle string matching: - ErrExecutionNotInExpectedStatus: row exists but its status moved out of the allowed set before the CAS (the real executor finished between SELECT and UPDATE — race lost). - ErrNotFound (reused for the "row vanished" case): row was deleted between SELECT and UPDATE — also a race outcome, nothing for the reaper to do. In the reaper, classify errors.Is(err, ErrExecutionNotInExpectedStatus) or errors.Is(err, ErrNotFound) as RaceLost (INFO log); everything else bumps Errored with an ERROR log so the ops signal is visible. Addresses A1 from CodeRabbit round-1 review on #681. Regression tests: - TestReapStuckExecutions_CASRaceLostNoError now wraps the sentinel (was raw errors.New); guards the "race-loss is INFO-only" path. - TestReapStuckExecutions_RowVanishedTreatedAsRaceLost covers the ErrNotFound branch (manual DELETE between SELECT and CAS). - TestReapStuckExecutions_HardDBErrorClassifiedAsErrored asserts a non-sentinel error bumps Errored, not RaceLost — the regression guard for the original A1 finding. The wrapped errors keep the original message substrings ("not found", "cannot transition from ...") so existing call sites that match on substrings (e.g. internal/purchase/manager.go's RecoverStrandedApprovals, internal/purchase/approvals_test.go) are unaffected. * fix(purchase/reaper): reject non-positive reap durations + stabilize env tests ParseReapAfterFromEnv accepted "0s" and negative durations such as "-5m" because time.ParseDuration treats them as syntactically valid. Feeding either into ListStuckExecutions would be catastrophic: - 0s: the cutoff "updated_at < NOW() - 0s" matches every approved/running row regardless of age — the reaper would flip fresh, in-flight executions to failed. - -5m: the cutoff "updated_at < NOW() - (-5m)" == "updated_at < NOW() + 5m" matches rows from 5 minutes in the future, i.e. effectively every row. Same outcome. The store-side guard added in the original PR (ListStuckExecutions rejects olderThan <= 0) prevents the broken SELECT from executing, but a misconfigured env value would still cause every sweep to fail silently with a confusing store error rather than a WARN at the config-parse boundary. Reject non-positive durations at the env-parse layer with a WARN log + fallback to DefaultReapAfter so the misconfig is visible in the Lambda's startup logs. Addresses A2 + A3 (defense-in-depth) + A4 + A5 from CodeRabbit round-1 review on #681. Regression tests (A4): - TestParseReapAfterFromEnv_ZeroFallsBackToDefault — "0s" - TestParseReapAfterFromEnv_NegativeFallsBackToDefault — "-5m" - TestParseReapAfterFromEnv_GarbageFallsBackToDefault — explicit "garbage" case complementing the existing _Invalid* test, named per the CR finding so the regression intent is searchable. handler_test.go (A5): - TestHandleScheduledTask's table-driven loop now calls t.Setenv("PURCHASE_APPROVED_REAP_AFTER", "") inside the per-case sub-Run. The two reap_stuck_purchases cases assert reapAfter == 10*time.Minute (the default); without an explicit env clear, an ambient PURCHASE_APPROVED_REAP_AFTER in CI/dev would silently make them flaky. t.Setenv auto-restores the prior value at cleanup. The store-side olderThan <= 0 guard (A3) was already folded into the earlier rebased commit "feat(config): add ListStuckExecutions store method" so no additional store-side change is needed here — A3 is covered by the existing defense-in-depth, and this commit completes the env-side validation A2 calls for.

cristim added bug Something isn't working triaged Item has been triaged priority/p1 Next up; this sprint severity/high Significant harm urgency/this-sprint Within the current sprint impact/all-users Affects every user effort/l Weeks type/bug Defect labels May 21, 2026

This was referenced May 21, 2026

feat(purchases): make GCP Compute commitment creation idempotent (remaining slice of #641) #654

Closed

feat(purchases): extend IdempotencyToken to all commitment executors (AWS RDS/ElastiCache/MemoryDB/OpenSearch/Redshift, Azure, GCP) — blocks #639 #641

Closed

coderabbitai Bot reviewed May 21, 2026

View reviewed changes

Comment thread providers/aws/services/elasticache/client.go Outdated

Comment thread providers/aws/services/memorydb/client.go Outdated

Comment thread providers/aws/services/rds/client.go Outdated

cristim mentioned this pull request May 22, 2026

fix(purchases): redact idempotency token in EC2 RI re-drive skip log #656

Open

cristim merged commit 97a1cd0 into feat/multicloud-web-frontend May 22, 2026
4 checks passed

cristim mentioned this pull request May 22, 2026

feat(purchases): make GCP Compute commitment creation idempotent #666

Merged

This was referenced May 22, 2026

bug(purchases): reap purchase_executions stuck in approved/running for >10min (mark as failed) #678

Closed

fix(purchase): reap purchase_executions stuck in approved/running >10m (closes #678) #681

Merged

coderabbitai Bot mentioned this pull request May 22, 2026

fix(purchases): tighten AWS SDK retry budget + bump Lambda timeout to 300s (closes #683) #684

Merged

5 tasks

cristim mentioned this pull request May 22, 2026

feat(providers/aws): rich self-describing reservation names matching the Azure #686 format #687

Open

This was referenced May 22, 2026

feat(aws): rich self-describing reservation names matching Azure #686 (closes #687) #689

Merged

feat(purchases): idempotent commitment creation (ClientToken/dedupe) to enable auto-re-drive of stranded approvals #636

Closed

cristim mentioned this pull request May 30, 2026

fix(purchases): mask idempotency token in EC2 RI re-drive log (closes #656) #855

Open

3 tasks

Conversation

cristim commented May 21, 2026 • edited by coderabbitai Bot Loading Uh oh! There was an error while loading. Please reload this page.

Uh oh!

Summary

Per-service mechanism + double-buy argument

Test plan

Summary by CodeRabbit

Uh oh!

coderabbitai Bot commented May 21, 2026 • edited Loading Uh oh! There was an error while loading. Please reload this page.

Uh oh!

Rate limit exceeded

Walkthrough

Changes

Estimated code review effort

Possibly related issues

Possibly related PRs

Suggested labels

Poem

❌ Failed checks (1 warning)

Uh oh!

cristim commented May 21, 2026

Uh oh!

coderabbitai Bot commented May 21, 2026

Uh oh!

coderabbitai Bot left a comment

Choose a reason for hiding this comment

Uh oh!

Uh oh!

Uh oh!

Uh oh!

cristim commented May 22, 2026

CodeRabbit pass 1 addressed (commit d2d5013)

Uh oh!

coderabbitai Bot commented May 22, 2026

Uh oh!

Uh oh!

cristim commented May 22, 2026

Uh oh!

coderabbitai Bot commented May 22, 2026

Uh oh!

Reviewers

Assignees

Labels

Projects

Milestone

Development

Uh oh!

1 participant

cristim commented May 21, 2026 •

edited by coderabbitai Bot

Loading

coderabbitai Bot commented May 21, 2026 •

edited

Loading

CodeRabbit pass 1 addressed (commit `d2d5013`)