fix(scheduler): include term + account + engine in rec ID hash (#187, #188) #189
Closes #187. Closes #188. Both bugs share a single root cause in `internal/scheduler/scheduler.go`: the recommendation ID was a SHA-256 of a 5-tuple `(provider, service, region, resource_type, payment)` — omitting `term`, `account`, and `engine`. Any two recs that differed only in those fields collapsed to the same record ID.

Downstream:

- **Frontend (#187)**: both rows render with the same `data-rec-id` in `recommendations.ts:1067-1069`. The selection toggle at :1639-1660 walks `input[data-rec-id]` and matches by id, so a click on one box flips the underlying selection set; resync then redraws BOTH boxes from `state.selectedRecommendations` — they end up "checked together". This was most visible on Azure, where Advisor frequently emits multiple recs per (provider, service, region, SKU, payment) cell varying only by subscription, but every provider could trigger it whenever 1yr + 3yr recs existed for the same cell.
- **Backend (#188)**: any stage that dedupes by ID via a Map/Set (response build, the in-memory cache, plan materialisation) drops the duplicate. Cost Explorer commonly returns 1yr and 3yr recs for the same cell as separate `Recommendation` objects; the AWS iteration order put the 3yr variant last in the slice, so it overwrote the 1yr one in any ID-keyed map. Hence the user-visible symptom: AWS recs only ever showed `Term = 3 years`.

Fix: the hash key now includes the parsed integer `term`, `rec.Account` (populated by every provider — AWS RI/SP parsers, Azure subscriptionID, GCP projectID), and the extracted `engine` (MySQL vs Postgres RDS recs at the same SKU + payment + term collided otherwise).

Provider verification:

- AWS RI parser: `rec.Account = aws.ToString(details.AccountId)`
- AWS SP parser: `rec.Account = accountID`
- Azure Advisor: `Account: r.subscriptionID`
- GCP services: `Account: c.projectID`

All three providers populate the field — no upstream change needed.
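The key change can be sketched as follows: a minimal sketch with a hypothetical `Rec` struct and illustrative field names, not the real code in `internal/scheduler/scheduler.go`.

```go
package main

import (
	"crypto/sha256"
	"encoding/hex"
	"fmt"
)

// Rec is a hypothetical stand-in for the real common.Recommendation type.
type Rec struct {
	Provider, Service, Region, ResourceType, Payment string
	Account, Engine                                  string // previously omitted from the key
	Term                                             int    // previously omitted from the key
}

// recordID sketches the fixed 8-field key: hashing the full cell means recs
// that differ only in term, account, or engine no longer collapse to one ID.
func recordID(r Rec) string {
	key := fmt.Sprintf("%s|%s|%s|%s|%s|%s|%d|%s",
		r.Provider, r.Account, r.Service, r.Region,
		r.ResourceType, r.Engine, r.Term, r.Payment)
	sum := sha256.Sum256([]byte(key))
	return hex.EncodeToString(sum[:])[:16] // 16-char truncation, as in the PR
}

func main() {
	oneYr := Rec{Provider: "aws", Account: "123456789012", Service: "ec2",
		Region: "us-east-1", ResourceType: "m5.large", Term: 1, Payment: "all-upfront"}
	threeYr := oneYr
	threeYr.Term = 3
	fmt.Println(recordID(oneYr) != recordID(threeYr)) // true: term now distinguishes the IDs
}
```

Before the fix, the key stopped at the 5-tuple, so `oneYr` and `threeYr` above would have produced the same 16-char ID and the 3yr variant would have overwritten the 1yr one in any ID-keyed map.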
Tests:

- `TestScheduler_ConvertRecommendations_HashUniqueness` — 4 subtests covering the 4 collision dimensions: term, account, payment, engine. Each seeds two recs differing in only that field and asserts distinct IDs.
- `TestScheduler_ConvertRecommendations_HashDeterminism` — same input across two calls produces the same ID (the hash is stable, not random/time-dependent — important so frontend selection state survives a refresh).
- Frontend: `toggling one row does NOT flip a sibling row with a distinct ID` — pins the symptom-level fix for #187.
- All existing scheduler + api Go tests pass; all 1357 frontend tests pass (added 1 over the prior 1356).

No FK exists on `recommendations.id`; persisted purchase rows duplicate the rec snapshot inline rather than referencing `recommendations.id`, so changing the hash key doesn't break any historical data — fresh collections produce fresh IDs and the old ones are simply replaced. Suppression rows are keyed by `(account, provider, service, region, resource_type, engine)` (see `RecommendationRecord.SuppressedCount` doc, types.go:227-242), not by rec.id, so they're unaffected.
No actionable comments were generated in the recent review. 🎉

ℹ️ Recent review info
⚙️ Run configuration
Configuration used: defaults
Review profile: CHILL
Plan: Pro
Run ID:
📒 Files selected for processing (2)
🚧 Files skipped from review as they are similar to previous changes (2)
📝 Walkthrough

Replace truncated SHA-256 record IDs with deterministic, delimiter-separated composite IDs that include account, engine, and canonical term; add engine-aware DB upsert and migrations; add scheduler tests for ID uniqueness/determinism and a frontend regression test ensuring independent checkbox toggling.
Sequence Diagram(s)

```mermaid
sequenceDiagram
    participant Scheduler as Scheduler
    participant DB as Postgres DB
    participant Frontend as Browser/UI
    participant State as UI State
    Scheduler->>Scheduler: convertRecommendations(recList)\nbuild composite recordID (account:engine:term:...)
    Scheduler->>DB: INSERT/UPSERT rows (include engine, recordID)\nON CONFLICT using expanded natural key
    DB-->>Scheduler: upsert result
    Frontend->>DB: fetch recommendations (records with recordID)
    Frontend->>State: render rows with `data-rec-id` attributes
    Frontend->>State: user toggles checkbox (data-rec-id=X)
    State->>State: update selectedRecommendations set using recordID
    State-->>Frontend: re-render row checked states (only matching recordIDs)
```
Estimated code review effort: 🎯 4 (Complex) | ⏱️ ~45 minutes
🚥 Pre-merge checks: ✅ Passed checks (5 passed)
@coderabbitai review

✅ Actions performed: Review triggered.
Actionable comments posted: 2
🤖 Prompt for all review comments with AI agents
Verify each finding against the current code and only fix it if needed.
Inline comments:
In `@internal/scheduler/scheduler_test.go`:
- Around line 1362-1372: The engine-only subtest currently compares an EC2 base
to an RDS record so other fields differ; change the setup so both recs share the
same RDS SKU and only Details.Engine differs: create an RDS-version of the base
(e.g., clone base into rdsBase and set Service/ResourceType/SKU to match
enginePostgresTwin), then set recs := []common.Recommendation{rdsBase,
enginePostgresTwin} or apply the mutator to rdsBase so only Details.Engine is
changed; update references to base/recs in the test so the only differing field
between the two recommendations is Details.Engine.
In `@internal/scheduler/scheduler.go`:
- Around line 846-848: The SQL natural key used by UpsertRecommendations must
include engine to match the scheduler's ID/hash; update the upsert/ON CONFLICT
key tuple in UpsertRecommendations (and any related unique index or conflict
target) to add engine alongside account_key, provider, service, region,
resource_type, term, payment_option so rows differing only by engine no longer
overwrite each other; ensure the column name used matches the engine field
referenced in the scheduler fmt.Sprintf key.
🪄 Autofix (Beta)
✅ Autofix completed
📒 Files selected for processing (3)

- frontend/src/__tests__/recommendations.test.ts
- internal/scheduler/scheduler.go
- internal/scheduler/scheduler_test.go
Note: Autofix is a beta feature. Expect some limitations and changes as we gather feedback and continue to improve it.

Fixes Applied Successfully: Fixed 2 file(s) based on 2 unresolved review comments. Files modified:
Commit:
The changes have been pushed to the
Time taken:
Fixed 2 file(s) based on 2 unresolved review comments.

Co-authored-by: CodeRabbit <noreply@coderabbit.ai>
Per review feedback on PR #189: the SHA-256 + 16-char hex truncation on top of the composite key was over-engineered. Simplify the rec ID to the canonical composite key joined by `|`:

    "<provider>|<account>|<service>|<region>|<resource_type>|<engine>|<term>|<payment>"

Same fields, same uniqueness guarantee — but now by construction rather than by hash. Why this is preferable:

1. **Self-documenting in DevTools.** An ID like `aws|123456789012|ec2|us-east-1|m5.large||1|all-upfront` makes any future collision regression visible as two identical IDs, instead of two unrelated-looking 16-char hex strings.
2. **No truncation collision risk.** SHA-256 → 64 bits had a birthday-bound risk that's negligible at this scale, but it's a "negligible" we no longer need to reason about — the ID literally IS the cell.
3. **No opacity guarantee broken.** Nothing in the codebase parses rec.ID; both encodings are equally opaque to downstream consumers (frontend selection Set, plan-target matching, suppression keying).
4. **Slightly longer in the DOM** is the only cost: ~80 chars vs 16. Negligible for typical rec counts and well worth the readability win.

Field-content safety: provider slugs, AWS/Azure/GCP account IDs and subscription UUIDs, AWS region names, instance-type SKUs, and payment enums are all alphanumeric/hyphen by upstream contract — `|` cannot appear inside any component, so no escaping is needed.

Tests rename: `*_HashUniqueness` → `*_IDUniqueness`, `*_HashDeterminism` → `*_IDDeterminism`. Same assertions; the names just no longer claim the encoding is a hash.

Drops the now-unused `crypto/sha256` and `encoding/hex` imports.
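The simplified encoding can be sketched as below, again with a hypothetical `Rec` stand-in for the real common.Recommendation type:

```go
package main

import (
	"fmt"
	"strconv"
	"strings"
)

// Rec is a hypothetical stand-in for the real common.Recommendation type.
type Rec struct {
	Provider, Account, Service, Region, ResourceType, Engine, Payment string
	Term                                                              int
}

// compositeID sketches the post-simplification ID: the canonical 8-field
// cell joined by '|', no hashing. Safe only because every component is
// alphanumeric/hyphen by upstream contract, so '|' needs no escaping.
func compositeID(r Rec) string {
	return strings.Join([]string{
		r.Provider, r.Account, r.Service, r.Region,
		r.ResourceType, r.Engine, strconv.Itoa(r.Term), r.Payment,
	}, "|")
}

func main() {
	r := Rec{Provider: "aws", Account: "123456789012", Service: "ec2",
		Region: "us-east-1", ResourceType: "m5.large", Term: 1, Payment: "all-upfront"}
	fmt.Println(compositeID(r))
	// prints: aws|123456789012|ec2|us-east-1|m5.large||1|all-upfront
	// (empty slot between SKU and term is the blank Engine for non-DB recs)
}
```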
@coderabbitai review

Per review feedback that the SHA-256 was over-engineered: simplified the rec ID to the canonical composite key joined by `|`.
✅ Actions performed: Review triggered.
🧹 Nitpick comments (2)
internal/scheduler/scheduler_test.go (2)
1346-1354: Minor: engine mutator is unused for the engine subtest. The mutator for the engine subtest (lines 1347-1352) builds a MySQL RDS rec, but lines 1373-1375 override `recs` with `rdsBase`/`enginePostgresTwin` anyway. The mutator code is dead for this specific subtest. Consider removing the mutator body and adding a comment, or using the mutator pattern consistently by having it return `enginePostgresTwin`:

Suggested simplification
```diff
 {
 	name: "engine: MySQL vs Postgres at same RDS SKU",
 	mutator: func(r common.Recommendation) common.Recommendation {
-		r.Service = common.ServiceRDS
-		r.ResourceType = "db.m5.large"
-		r.Details = common.DatabaseDetails{Engine: "mysql"}
-		return r
+		// Unused — engine subtest overrides recs below.
+		return r
 	},
 },
```

🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed. In `@internal/scheduler/scheduler_test.go` around lines 1346 - 1354, The engine subtest's mutator function (the anonymous mutator that sets r.Service, r.ResourceType and r.Details) is never used because recs is later overwritten with rdsBase/enginePostgresTwin; either remove the dead mutator body (leave an explanatory comment) or make the mutator return enginePostgresTwin so the mutator pattern is used consistently; update the subtest to reference the mutator (or remove it) and ensure recs remains derived from the mutator instead of being overwritten by rdsBase/enginePostgresTwin.
1408: Nit: assertion message still references "hash". The ID is now a natural composite key, not a hash. The assertion message should reflect this:
```diff
-assert.Equal(t, first[0].ID, second[0].ID, "hash must be deterministic across calls")
+assert.Equal(t, first[0].ID, second[0].ID, "ID must be deterministic across calls")
```

🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed. In `@internal/scheduler/scheduler_test.go` at line 1408, Update the assertion message for assert.Equal that compares first[0].ID and second[0].ID to not mention "hash"; change it to a message that reflects the ID is a natural composite key (e.g., "ID must be deterministic across calls" or "composite ID must be deterministic across calls") so the test message matches the current implementation where ID is a composite key.
🤖 Prompt for all review comments with AI agents
Verify each finding against the current code and only fix it if needed.
Nitpick comments:
In `@internal/scheduler/scheduler_test.go`:
- Around line 1346-1354: The engine subtest's mutator function (the anonymous
mutator that sets r.Service, r.ResourceType and r.Details) is never used because
recs is later overwritten with rdsBase/enginePostgresTwin; either remove the
dead mutator body (leave an explanatory comment) or make the mutator return
enginePostgresTwin so the mutator pattern is used consistently; update the
subtest to reference the mutator (or remove it) and ensure recs remains derived
from the mutator instead of being overwritten by rdsBase/enginePostgresTwin.
- Line 1408: Update the assertion message for assert.Equal that compares
first[0].ID and second[0].ID to not mention "hash"; change it to a message that
reflects the ID is a natural composite key (e.g., "ID must be deterministic
across calls" or "composite ID must be deterministic across calls") so the test
message matches the current implementation where ID is a composite key.
📒 Files selected for processing (3)

- internal/config/store_postgres_recommendations.go
- internal/scheduler/scheduler.go
- internal/scheduler/scheduler_test.go
…ne subtest

Addresses both actionable items from CodeRabbit's review of PR #189:

1. **SQL natural key** — `internal/database/postgres/migrations/000042`: The Go-side ID and per-row INSERT in store_postgres_recommendations.go (the latter via CR's auto-fix) already include `engine`, but the underlying `recommendations_natural_key_idx` UNIQUE INDEX did not — so the UPSERT's `ON CONFLICT (..., engine, ...)` clause referenced a constraint that didn't actually exist with that signature, which would have errored at runtime. New migration adds an `engine TEXT NOT NULL DEFAULT ''` column, drops the 7-tuple index, and re-creates it as the 8-tuple index that matches the Go side. Strategy mirrors migration 000032: ADD COLUMN is metadata-only; pre-flight RAISE NOTICE on duplicate-row groups before the UNIQUE index creation; pre-migration rows land at default `engine = ''` and naturally age out via UpsertRecommendations' eviction-by-collected_at on the next scheduler tick.

2. **Test isolation** — `scheduler_test.go::TestScheduler_ConvertRecommendations_IDUniqueness`: The previous engine subtest mutated three fields at once (Service, ResourceType, Details), which made the "engine-only diff" claim accidental — the test would still pass on a regression that broke one of the other fields. Restructured the table-driven test from `mutator(common.Recommendation) common.Recommendation` to `recs() (common.Recommendation, common.Recommendation)`, which gives each subtest direct control over the (a, b) pair. The engine subtest now constructs an `rdsBase` and an `rdsTwin` that share Service / ResourceType / everything-else and differ ONLY in `Details.Engine`. The special-case `if tc.name == ...` branch is gone.

All scheduler / api / config Go tests pass.
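The restructured table shape can be sketched like this, with illustrative types and values rather than the real test code:

```go
package main

import "fmt"

// Rec is a hypothetical stand-in for common.Recommendation.
type Rec struct {
	Service, ResourceType, Engine string
}

// Each subtest supplies its own (a, b) pair via recs(), so the engine case
// can guarantee the two records differ in Engine and in nothing else.
func main() {
	cases := []struct {
		name string
		recs func() (Rec, Rec)
	}{
		{
			name: "engine: MySQL vs Postgres at same RDS SKU",
			recs: func() (Rec, Rec) {
				rdsBase := Rec{Service: "rds", ResourceType: "db.m5.large", Engine: "mysql"}
				rdsTwin := rdsBase
				rdsTwin.Engine = "postgres" // the ONLY differing field
				return rdsBase, rdsTwin
			},
		},
	}
	for _, tc := range cases {
		a, b := tc.recs()
		// In the real test this is where convertRecommendations would run and
		// the two resulting IDs would be asserted distinct.
		fmt.Println(tc.name, a.Service == b.Service, a.Engine != b.Engine)
	}
}
```

The pair-constructor shape removes the need for any name-based special casing in the loop body: every subtest is handled identically.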
@coderabbitai review

Addressed the 2 actionables from the prior review:
Triggering a review of the updated changes now.

✅ Actions performed: Review triggered.
Actionable comments posted: 1
🧹 Nitpick comments (1)
internal/scheduler/scheduler_test.go (1)
1402: Rename failure text from "hash" to "ID" for consistency. The assertion message still references the old hash approach, which can mislead future debugging.
Suggested wording update
```diff
-assert.Equal(t, first[0].ID, second[0].ID, "hash must be deterministic across calls")
+assert.Equal(t, first[0].ID, second[0].ID, "ID must be deterministic across calls")
```

🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed. In `@internal/scheduler/scheduler_test.go` at line 1402, Update the assertion failure message to reference the ID rather than hash: in the assert.Equal call comparing first[0].ID and second[0].ID (function/statement using assert.Equal), change the message string from "hash must be deterministic across calls" to something like "ID must be deterministic across calls" so the assertion text matches the compared symbols (first[0].ID, second[0].ID).
🤖 Prompt for all review comments with AI agents
Verify each finding against the current code and only fix it if needed.
Inline comments:
In `@internal/database/postgres/migrations/000042_recommendations_add_engine_to_key.down.sql`:
- Around line 9-13: The down migration recreates the 7-column UNIQUE index
(recommendations_natural_key_idx) before removing engine, which will fail if
rows only differ by engine; modify the down migration to first dedupe rows that
are identical across (account_key, provider, service, region, resource_type,
term, payment_option) but differ by engine (e.g., use a DELETE ... USING or WITH
cte AS (SELECT id, ROW_NUMBER() OVER (PARTITION BY account_key, provider,
service, region, resource_type, term, payment_option ORDER BY id) rn FROM
recommendations) DELETE FROM recommendations WHERE id IN (SELECT id FROM cte
WHERE rn > 1)), then DROP COLUMN IF EXISTS engine, and only after deduplication
recreate the UNIQUE index recommendations_natural_key_idx on (account_key,
provider, service, region, resource_type, term, payment_option).
---
Nitpick comments:
In `@internal/scheduler/scheduler_test.go`:
- Line 1402: Update the assertion failure message to reference the ID rather
than hash: in the assert.Equal call comparing first[0].ID and second[0].ID
(function/statement using assert.Equal), change the message string from "hash
must be deterministic across calls" to something like "ID must be deterministic
across calls" so the assertion text matches the compared symbols (first[0].ID,
second[0].ID).
📒 Files selected for processing (3)

- internal/database/postgres/migrations/000042_recommendations_add_engine_to_key.down.sql
- internal/database/postgres/migrations/000042_recommendations_add_engine_to_key.up.sql
- internal/scheduler/scheduler_test.go
…est message
Addresses CodeRabbit's iter3 review (1 actionable + 1 nitpick).
1. **Down migration** (`000042_recommendations_add_engine_to_key.down.sql`):
The previous rollback DROPped and recreated the 7-column UNIQUE
index BEFORE removing engine, which would have failed with a
duplicate-key error on any database where the up-migration had
actually distinguished engine-variant rows (the whole reason
000042 exists).
New rollback order:
1. Pre-flight RAISE NOTICE on engine-divergent groups so the
operator sees the destructive count in the migration log
BEFORE any rows disappear.
2. DELETE all but one row per 7-tuple (account_key, provider,
service, region, resource_type, term, payment_option). The
ROW_NUMBER() over `id` ordering picks a deterministic
winner — this is a rollback, the loser data is thrown away
either way; deterministic ordering keeps the migration
idempotent (re-running picks the same winner).
3. DROP the 8-column unique index.
4. DROP the engine column.
5. Recreate the 7-column unique index.
2. **Test message** (`scheduler_test.go:1402`): renamed the assertion
failure text from "hash must be deterministic across calls" to
"ID must be deterministic across calls" — we no longer compute a
hash; the message now matches the symbols being compared
(first[0].ID, second[0].ID).
@coderabbitai review

Addressed both findings:
Triggering a review of the updated changes now.

✅ Actions performed: Review triggered.
@coderabbitai review

✅ Actions performed: Review triggered.
Builds on the term loop landed in the prior commit. Same root cause (Cost Explorer requires the parameter on each request and returns recs only for that single value), same fix shape (replace the hardcoded default with a small package-level constant slice and iterate). The prior commit covered TermInYears; this one covers PaymentOption.

Before: GetRecommendationsForService → 1 call per service at the hardcoded ("3yr", "partial-upfront") cell. After PR #189's ID-uniqueness fix the absent variants couldn't even surface from the cache; from the user's perspective the "all upfront" / "no upfront" filter values were always empty.

After: GetRecommendationsForService iterates the Cartesian product of

    defaultDiscoveryTerms          = ["1yr", "3yr"]
    defaultDiscoveryPaymentOptions = ["all-upfront", "partial-upfront", "no-upfront"]

→ 6 calls per service. Per-call CE errors are tolerated and skipped (matches GetAllRecommendations's per-service tolerance); only an error where every (term, payment) combo fails is propagated.

Downstream wiring is already correct:

- Parser tags rec.PaymentOption = params.PaymentOption (parser_ri.go:42).
- Scheduler ID encoding includes PaymentOption (scheduler.go:856-858 — PR #189).
- DB natural-key index includes payment_option (migration 000042).
- Frontend column-filter already exposes payment-option as a facet.

So the three payment variants land as distinct DB rows and render as distinct UI rows for free.

Tests:

- TestGetRecommendationsForService: now expects 6 recs (one per {term} × {payment} combo) and asserts the full combo set is present.
- TestGetRecommendationsForService_QueriesBothTerms → TestGetRecommendationsForService_QueriesEveryCombo: regression expanded to assert all 6 (term, payment) pairs were requested against Cost Explorer, so a future change that quietly drops a combo fails the test even if the surviving recs parse correctly.
Cost note: at the actual scheduler cadence (daily 2 AM cron + at most one opportunistic refresh per 6h window when users are active, so ~5 refreshes/day per account), 5 services × 6 combos × 5 refreshes/day ≈ 4,500 Cost Explorer requests/month/account. That's ~3,500 above the 1,000-request free tier, or ~$35/month/account at $0.01/request. Worth the user-visible "every term × payment is selectable" win.
Closes #188. PR #189 made recommendation IDs unique enough to keep 1yr and 3yr variants from colliding downstream — but the AWS client never asked Cost Explorer for 1yr recs in the first place. The discovery flow hardcoded `Term: "3yr"` in providers/aws/recommendations/client.go, and Cost Explorer's GetReservationPurchaseRecommendation requires a `TermInYears` enum on each request and returns recs only for that specific term — there is no "give me both" mode. So even after #189, the user still saw only 3-year-term AWS recs: the upstream call simply never fetched the 1yr variants.

Fix: GetRecommendationsForService now iterates over a small package constant `defaultDiscoveryTerms = ["1yr", "3yr"]` and concatenates the returned recs. Per-term Cost Explorer errors are tolerated and skipped (matching GetAllRecommendations's per-service tolerance) so a single throttle on one term doesn't suppress the other; only an error where *every* term fails is propagated.

Downstream: the parser already tags `rec.Term = params.Term`, the scheduler ID encoding from #189 already includes term, and the DB natural-key index from migration 000042 already includes term — no further changes needed for the two recs to land in distinct DB rows and render as distinct rows in the UI.

Tests:

- TestGetRecommendationsForService: now expects 2 recs (one per term), with terms ["1yr", "3yr"] present.
- TestGetRecommendationsForService_QueriesBothTerms (new): captures the per-call CostExplorer input slice via a new `riCalls` field on the mock and asserts both types.TermInYearsOneYear and types.TermInYearsThreeYears were requested. This pins the regression at the API-call layer so a future change that quietly drops one term fails the test even if the parser tagging happens to produce two recs for other reasons.
- TestGetAllRecommendations / TestGetAllRecommendations_SomeServicesFail: unchanged assertions still hold (5 services × 2 terms = 10 parsed-and-warned attempts; the test only asserts NotEmpty).
Verified `go test ./...` passes from the repo root and from the providers/aws submodule. API-call cost: doubles the Cost Explorer call count for the recommendations refresh (5 services × 2 terms = 10 calls vs 5 previously). The existing rate limiter and 100ms inter-service sleep handle this; refresh runs at the ~15 min cadence already budgeted for.
* fix(aws): fetch both 1yr and 3yr Cost Explorer terms (#188)

* feat(aws): also fan out across all 3 Cost Explorer payment options (#188)

* fix(aws): short-circuit Cost Explorer fan-out on canceled context

  Addresses CodeRabbit's actionable finding on PR #195: the (term, payment) loop in GetRecommendationsForService was treating a canceled / deadline-exceeded ctx as a per-combo failure to be tolerated. If earlier combos succeeded before cancellation, the function would return partial data with a nil error; if all combos ran after cancellation, it would burn 6× the wasted Cost Explorer attempts and aggregate them into "all variants failed", masking the real cancellation reason.

  Fix: after each c.GetRecommendations(ctx, params) error, check ctx.Err() first. If non-nil, return immediately so the caller sees the ctx error verbatim. Per-combo errors (throttle, 5xx) keep the existing skip-and-continue tolerance. Same pattern applied to GetAllRecommendations's per-service block, which had the identical risk: a canceled ctx during one service's fan-out would silently skip to the next service rather than short-circuiting.

  Test: TestGetRecommendationsForService_ContextCancelShortCircuits — pre-cancel a ctx, mock CE to return context.Canceled on every call, assert the function (a) returns context.Canceled verbatim via errors.Is and (b) issues at most one CE call before bailing (rather than marching through all 6 combos).

* fix(aws): short-circuit pre-canceled recommendation fetches

  Check context cancellation before each Cost Explorer recommendation variant request so a pre-canceled context returns context.Canceled without touching the API path.

* fix(db): renumber recommendations engine migration

  Move the recommendations engine-key migration from 000042 to 000043 so it no longer collides with the retry-linkage migration now present on the PR base.

* fix(aws): avoid partial results on cancellation

  Return nil with ctx.Err from GetAllRecommendations cancellation paths so callers cannot accidentally treat partial results as valid.
…205) (#206)

* fix(db): idempotently restore retry_execution_id column dropped by migration drift (closes #204, closes #205)

Production /api/purchases/* surfaced 500 errors after PR #161 deployed, with the underlying CloudWatch error:

ERROR: column "retry_execution_id" does not exist (SQLSTATE 42703)

User-facing symptoms (dashboard "Upcoming Scheduled Purchases" panel):

* "Failed to load purchase details: Internal server error" (#205)
* "Failed to cancel purchase" (#204)

Both buttons hit `Handler.GetExecutionByID`, whose SELECT references the retry_execution_id and retry_attempt_n columns introduced by PR #168.

Root cause: migration version drift.

* PR #189 added 000042_recommendations_add_engine_to_key earlier on the same day. Deployed Lambdas applied it as version 42.
* PR #168 then added 000042_purchase_executions_retry_linkage, colliding on version 42.
* PR #195 resolved the on-disk duplicate by renaming the recommendations migration to 000043 — the wrong direction. Deployed DBs already had schema_migrations.version=42 marked as the engine content, so on the next migrate.Up() they:
  - skipped on-disk 000042 (now retry_linkage) entirely (DB version >= 42 already)
  - applied on-disk 000043 (now engine again) as a no-op via the ADD COLUMN IF NOT EXISTS guards already in that file

Net effect: the retry_execution_id / retry_attempt_n columns were never added on prod, but the Go code expects them.

Forward-only corrective. The new 000044 replays 000042's schema additions wrapped in IF NOT EXISTS / CREATE INDEX IF NOT EXISTS so it's a no-op on any DB that already has the columns (fresh deploys, manually-fixed envs) and additive on the broken ones. We don't try to repair schema_migrations history — golang-migrate only cares about the high-water version, and we can't tell from outside whether a given environment skipped 42 or applied 43 twice.
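A toy model of the high-water versioning (an illustration of the drift mechanics, not golang-migrate's actual code) shows why the rename went the wrong direction: an Up() run applies only on-disk versions strictly above the DB's recorded version, so renumbering the already-applied content upward leaves the never-applied content stranded below the water line.

```go
package main

import "fmt"

// migration is a toy stand-in for an on-disk migration file.
type migration struct {
	version int
	name    string
}

// pendingAfter models high-water versioning: only versions strictly
// above the DB's recorded version are applied on the next Up().
func pendingAfter(dbVersion int, onDisk []migration) []migration {
	var out []migration
	for _, m := range onDisk {
		if m.version > dbVersion {
			out = append(out, m)
		}
	}
	return out
}

func main() {
	// Prod already applied version 42 as the *engine* migration (PR #189).
	dbVersion := 42
	// After PR #195's rename, the on-disk tree looked like this:
	onDisk := []migration{
		{42, "purchase_executions_retry_linkage"}, // skipped: 42 <= dbVersion
		{43, "recommendations_add_engine_to_key"}, // re-applied as a no-op
	}
	for _, m := range pendingAfter(dbVersion, onDisk) {
		fmt.Println("applying", m.version, m.name)
	}
	// retry_linkage never runs on prod, so retry_execution_id stays missing.
}
```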
The migration up + down sql files document the incident and the edge cases inline so future readers don't have to re-derive the history from the git log.

Closes #204
Closes #205

* fix(db): repair partially-fixed DBs in 000044, not just no-op (CR pass 1)

CR pass 1 on PR #206 flagged that the original 000044 only added columns when missing — a manually-fixed DB with the column but no FK, or with retry_attempt_n nullable, would still be broken after this migration.

Rewrite as five idempotent steps that each check pg_catalog / information_schema for the precise piece they own, so every reachable partial state is repaired:

1. Add retry_execution_id column if missing.
2. Add the FK on retry_execution_id -> execution_id if no FK on that column exists. Match by column relationship (pg_constraint join pg_attribute) NOT by constraint name, so a fresh-deploy DB whose 000042 ran cleanly with the auto-generated `purchase_executions_retry_execution_id_fkey` is recognised as already-FK'd and we don't add a second.
3. Add retry_attempt_n column if missing (as nullable, so step 4 can backfill before flipping to NOT NULL).
4. Backfill any NULLs and SET DEFAULT 0 / SET NOT NULL unconditionally — Postgres treats those as no-ops when the constraint is already as requested.
5. Create the partial index if missing.

Down stays simple: DROP COLUMN ... CASCADE removes the FK regardless of its name. Comments inline document each reachable partial state and why each step is safe to re-run. Migration is a no-op on healthy DBs, fully repairing on broken ones.
#224) (#231)

* fix(frontend/recs): enforce one-variant-per-cell radio selection (closes #224)

After PR #195 (issue #188) the recommendations refresh fans out across 2 terms × 3 payments per `(provider, account, service, region, resource_type, engine)` cell — up to 6 alternative rec rows per physical resource. They're alternatives, not additions, but the checkboxes had no mutual-exclusion logic, producing wrong purchase intent in two ways:

1. **Manual checking**: a user could check `1yr/all-upfront` AND `3yr/no-upfront` for the same EC2 m5.large in the same account. Both fed into the purchase plan → double commitment for one resource.
2. **`select-all` was the destructive form**: every visible row got added, so 3 cells × 6 variants = 18 commitments instead of the 3 the user expected when clicking "buy what's recommended". 6× the intended spend per cell.

# What changes

- New `cellKey(rec)` helper returns the `(provider, cloud_account_id, service, region, resource_type, engine)` prefix (same as the scheduler ID encoding from #189, minus the `(term, payment)` suffix). Recs sharing this key are alternatives for the same physical resource.
- Per-row checkbox change handler now enforces radio behaviour: on check, scan the visible-recs list for any sibling in the same cell that's already selected, deselect it FIRST, then add the new rec. Cells are independent — selecting in cell X doesn't touch cell Y's selection.
- `select-all` handler rewritten: clear current selection, then call new `pickBestVariantPerCell(recs)` which groups by cell and picks the variant with the highest **effective monthly savings**: `effective = savings - (upfront_cost / (term * 12))`. Amortizing the upfront over the commitment term means a 3yr/all-upfront with a huge lump-sum doesn't beat a 1yr/no-upfront just on raw `savings`.
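A hedged sketch of the two helpers, written in Go for consistency with the other examples here (the real implementation is TypeScript in `recommendations.ts`; the `Rec` struct and its field names are hypothetical):

```go
package main

import "fmt"

// Hypothetical rec shape for illustration only.
type Rec struct {
	Provider, Account, Service, Region, ResourceType, Engine string
	TermYears                                                int
	Payment                                                  string
	MonthlySavings, UpfrontCost                              float64
}

// cellKey groups variants that are alternatives for one physical resource:
// the scheduler ID tuple minus the (term, payment) suffix.
func cellKey(r Rec) string {
	return fmt.Sprintf("%s|%s|%s|%s|%s|%s",
		r.Provider, r.Account, r.Service, r.Region, r.ResourceType, r.Engine)
}

// effective amortizes the upfront cost over the commitment term in months.
func effective(r Rec) float64 {
	return r.MonthlySavings - r.UpfrontCost/float64(r.TermYears*12)
}

// pickBestVariantPerCell keeps, per cell, the variant with the highest
// effective monthly savings.
func pickBestVariantPerCell(recs []Rec) []Rec {
	best := map[string]Rec{}
	var order []string // preserve first-seen cell order
	for _, r := range recs {
		k := cellKey(r)
		cur, ok := best[k]
		if !ok {
			order = append(order, k)
			best[k] = r
		} else if effective(r) > effective(cur) {
			best[k] = r
		}
	}
	out := make([]Rec, 0, len(order))
	for _, k := range order {
		out = append(out, best[k])
	}
	return out
}

func main() {
	cell := Rec{Provider: "aws", Account: "a", Service: "ec2", Region: "us-east-1", ResourceType: "m5.large"}
	a, b := cell, cell
	a.TermYears, a.MonthlySavings, a.UpfrontCost = 1, 500, 0      // effective 500
	b.TermYears, b.MonthlySavings, b.UpfrontCost = 3, 1200, 36000 // effective 1200 - 1000 = 200
	winner := pickBestVariantPerCell([]Rec{a, b})[0]
	fmt.Println(winner.TermYears) // the 1yr variant wins despite lower raw savings
}
```

Note how the 3yr variant's $1200 raw savings loses once its $36,000 upfront is spread over 36 months, which is exactly the behaviour the select-all tiebreaker wants.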
Sibling issue #223 will replace this tiebreaker with "matches resolved GlobalConfig.DefaultTerm + DefaultPayment" when it lands; until then, amortized savings is the right deterministic default.

# Out of scope (deliberate)

- **Native `<input type="radio" name="cell-X">` markup**. Per the issue: "Designer call — if cell-grouping (sibling issue #226) lands first, radios become visually correct." Stayed with checkboxes-with-radio-behaviour for this PR; the markup switch waits for the cell-grouping visual.
- **Default-select per cell from settings** (sibling #223).

# Tests

4 new tests in `frontend/src/__tests__/recommendations.test.ts` inside the `'Issue #224: one-variant-per-cell radio selection'` describe block:

(a) Manual toggle within a cell — checking variant B with variant A already selected: A is removed, B is added. Sibling A was not also added; B was not also removed.
(b) Cross-cell independence — selecting in cell X must NOT remove cell Y's existing selection.
(c) `select-all` collapses 18 → 3: 3 cells × 6 variants. After click, exactly 3 add calls; clearSelectedRecommendations was called first to drop stale state.
(d) Tiebreaker pin — single cell with 3 variants whose `(savings, upfront_cost, term)` produce known amortized values ($200, $300, $400). The middle variant wins ($400 effective) despite the high-upfront variant having $1200 raw savings, proving the amortization is actually computed.

`tsc --noEmit` clean. `npx jest --testPathPatterns="recommendations"` exit 0. `npm run build` (webpack production) exit 0.
* fix(frontend/recs): address CR feedback on PR #231 — term clamp + full-set sibling eviction

- pickBestVariantPerCell: fix the term=0 clamp from Math.max(1, r.term*12) to Math.max(1, r.term||1)*12 so a malformed zero-term rec is treated as 1yr (12 months), not 1 month; keeps the effective-savings winner correct (CR actionable: the 1-month fallback could skew per-cell winner selection).
- checkbox change handler: iterate state.getRecommendations() (the full loaded set) instead of the filtered `recommendations` when evicting same-cell siblings, so variants hidden by a column filter are also deselected, preserving the one-variant-per-cell contract across filter changes (CR actionable: hidden siblings could remain selected, breaking the invariant).
- test: update issue-#224 tests (a) and (b) to mock state.getRecommendations(); add a new regression test for the hidden-by-filter scenario.

* fix(frontend): address issue 224 review comments
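The clamp difference is easy to see numerically. A sketch with the two JS expressions transliterated to Go (the dollar figures are made up for illustration):

```go
package main

import "fmt"

// buggyMonths transliterates the old clamp, Math.max(1, r.term*12):
// a zero-term rec amortizes over a single month.
func buggyMonths(termYears float64) float64 {
	if m := termYears * 12; m > 1 {
		return m
	}
	return 1
}

// fixedMonths transliterates the fixed clamp, Math.max(1, r.term||1)*12:
// a zero (or sub-1yr) term is clamped to 1 year BEFORE converting to months.
func fixedMonths(termYears float64) float64 {
	t := termYears
	if t == 0 || t < 1 {
		t = 1
	}
	return t * 12
}

func main() {
	// Zero-term rec with $1200 upfront and $500/mo raw savings:
	upfront, savings := 1200.0, 500.0
	fmt.Println(savings - upfront/buggyMonths(0)) // whole upfront charged to 1 month: -700
	fmt.Println(savings - upfront/fixedMonths(0)) // amortized over 12 months: 400
}
```

With the old clamp the zero-term variant's effective savings go sharply negative, so it could never win its cell even when it should have been treated as an ordinary 1yr rec.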
# Summary
Closes #187. Closes #188.
Both bugs share a single root cause in `internal/scheduler/scheduler.go:827`: the rec-record ID was a SHA-256 of a 5-tuple `(provider, service, region, resource_type, payment)` — omitting `term`, `account`, and `engine`. Two recs that differed only in those fields collapsed to the same record ID.

Downstream: both rows render with the same `data-rec-id`. The selection toggle keys on that, so a click on one box flips both. Most visible on Azure where Advisor frequently emits multiple recs per cell varying only by subscription.

# What changed
The hash key now includes the parsed integer `term`, `rec.Account`, and the extracted `engine`.

Provider verification — `rec.Account` is populated by every provider:

- AWS RI parser: `rec.Account = aws.ToString(details.AccountId)`
- AWS SP parser: `rec.Account = accountID`
- Azure Advisor: `Account: r.subscriptionID`
- GCP services: `Account: c.projectID`

# Tests
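A minimal sketch of the property these tests pin down: recs differing only in a previously-omitted field must hash to distinct IDs. The struct, separator, and key layout below are hypothetical; the production encoding in `scheduler.go` may differ.

```go
package main

import (
	"crypto/sha256"
	"encoding/hex"
	"fmt"
	"strings"
)

// Hypothetical record shape for illustration only.
type Rec struct {
	Provider, Service, Region, ResourceType, Payment string
	TermYears                                        int
	Account, Engine                                  string
}

// recID hashes the full 8-field tuple, so recs differing only in term,
// account, or engine no longer collapse to one ID.
func recID(r Rec) string {
	key := strings.Join([]string{
		r.Provider, r.Service, r.Region, r.ResourceType, r.Payment,
		fmt.Sprintf("%d", r.TermYears), // previously omitted
		r.Account,                      // previously omitted
		r.Engine,                       // previously omitted
	}, "|")
	sum := sha256.Sum256([]byte(key))
	return hex.EncodeToString(sum[:])
}

func main() {
	a := Rec{Provider: "aws", Service: "rds", Region: "us-east-1",
		ResourceType: "db.m5.large", Payment: "All Upfront",
		TermYears: 1, Account: "acct-1", Engine: "mysql"}
	b := a
	b.TermYears = 3 // 1yr vs 3yr variants must now get distinct IDs
	fmt.Println(recID(a) != recID(b))
}
```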
- `TestScheduler_ConvertRecommendations_HashUniqueness` — 4 subtests (term, account, payment, engine), each seeding two recs differing in only that field and asserting distinct IDs.
- `TestScheduler_ConvertRecommendations_HashDeterminism` — same input across two calls produces the same ID (so frontend selection state survives a refresh).
- `toggling one row does NOT flip a sibling row with a distinct ID` — symptom-level test for "fix(recommendations): selecting one Azure row checkbox toggles two rows together" (#187).

# Persisted-state safety
No FK on `recommendations.id`. Persisted purchase rows duplicate the rec snapshot inline rather than referencing rec_id by FK. Suppression rows are keyed by `(account, provider, service, region, resource_type, engine)` per the `RecommendationRecord.SuppressedCount` doc, not by rec.id. Fresh collections produce fresh IDs and the old ones are simply replaced — no migration needed.

# Test plan
🤖 Generated with claude-flow