feat(cli): size purchases by target RI/SP utilization (--target-utilization, closes #338)#339
Conversation
Adds four fields to common.Recommendation to carry the utilization data needed by the upcoming --target-utilization CLI flag: - AverageInstancesUsedPerHour / RecommendedUtilization — the AWS-returned signal (read from the CE response by the recommendations parser in a follow-up commit). - ProjectedUtilization / ProjectedCoverage — populated by the sizing step after we pick our own quantity. Wire-format compatible: all four fields are omitempty so existing consumers (frontend, CSV writer) ignore them until they're populated.
Populates the utilization fields added to common.Recommendation in the previous commit: - parser_ri.go: new parseRIUtilizationSignals helper, called from parseRecommendationDetail. Reads AverageNumberOfInstancesUsedPerHour and AverageUtilization from the existing CE response — no new API call. Nil pointer or parse failure leaves the field at zero (the sizing path treats zero as "no signal" and skips). - parser_sp.go: reads EstimatedAverageUtilization via the existing parseOptionalFloat helper and stores it in RecommendedUtilization. AverageInstancesUsedPerHour is left at zero for SPs — the field isn't applicable; SP sizing uses the utilization value directly. Tests in parser_ri_test.go and parser_sp_test.go cover both happy paths and the nil/unparseable fallbacks.
Adds the user-facing CLI surface for issue #338. The flag accepts a % in [0, 100]; default 0 means "disabled, fall back to --coverage". Sizing logic lands in a follow-up commit. - Config.TargetUtilization (cmd/main.go) - --target-utilization / -u cobra flag with help text spelling out the inversion ("higher target = fewer commitments bought") - Range validation extracted into validateTargetUtilization (keeps validateNumericRanges under the gocyclo 10-complexity threshold); explicit info-log when both --target-utilization and --coverage are set on the same invocation (target wins). Uses cobra's Changed() rather than default-comparison so --coverage 80 (which equals the default) still triggers the notice. validateNumericRanges now takes the *cobra.Command for Changed() access; existing test callers updated to pass nil where the flag-source-detection branch is irrelevant. New TestValidateTargetUtilization covers the new range checks and the both-flags-set path.
The core sizing logic for --target-utilization: - RIs: n_target = floor(AverageInstancesUsedPerHour / (target/100)), capped at rec.Count (we never exceed AWS's recommended ceiling). Drop with explanatory INFO log when n_target == 0 (target unreachable, per the issue AC). Pass through unmodified when avg <= 0 (no signal); counted in a single end-of-run WARN summary. - SPs: when AWS already projects at-or-above target, no scaling. Else ratio = RecommendedUtilization / target (< 1); scale exactly the fields ApplyCoverage scales on the SP branch (SavingsPlanDetails.HourlyCommitment + EstimatedSavings — NOT CommitmentCost/OnDemandCost). Discrimination via IsSavingsPlan(rec.Service) to match the existing ApplyCoverage convention. - Routing: new applySizing(recs, cfg, coverage) helper picks between ApplyTargetUtilization and ApplyCoverage based on cfg.TargetUtilization. The explicit `coverage` parameter lets the CSV path pass its csvModeCoverage-substituted value while the main path passes cfg.Coverage. Helper-wiring at the call sites lands in the next commit. Per-rec dispatch extracted into applyTargetUtilizationOne to keep ApplyTargetUtilization under gocyclo's complexity threshold. Tests cover: RI happy path / AWS-ceiling cap / target unreachable drop / no signal pass-through / target-met-exactly; SP no-op / scale-down / no-signal pass-through; cost fields untouched for RIs and partial cost fields untouched for SPs (so divergence with ApplyCoverage's column set is impossible without a failing test); applySizing routing logic.
Routes the two sites that previously called applyCommonCoverage directly through the new applySizing helper, so --target-utilization takes effect in both pipelines: - cmd/multi_service.go: CSV-input path. The CSV mode short-circuit (skip sizing when csvModeCoverage == 100) is updated to NOT skip when target-utilization is set — TargetUtilization applies regardless of the coverage value. - cmd/multi_service_helpers.go: main recommendation-fetch path (applyCoverageAndOverrides). Drops the per-mode coverage call in favour of applySizing; emits a 🎯 vs 📈 log line based on which sizing mode fired. applyCommonCoverage stays in place (called by applySizing's else branch) so the legacy coverage path is byte-for-byte unchanged when --target-utilization is unset.
Adds three columns to the dry-run / purchase CSV emitted by writeMultiServiceCSVReport: - ProjectedUtilization — populated by ApplyTargetUtilization on the sized rec (RI: avg/n_bought * 100, capped at 100; SP: target or AWS's RecommendedUtilization when no scaling needed). - ProjectedCoverage — RI only (n_bought / avg * 100, capped at 100); blank for SPs since the CE response doesn't expose total-demand-$. - RecommendedUtilization — AWS's projected utilization for the full recommendation count (RI AverageUtilization or SP EstimatedAverageUtilization). Zero values render as blank cells via the new formatPercentOrBlank helper — zero means "unknown / not applicable" (e.g. when running on the legacy coverage path), and "0.0" cells would mislead.
Adds the new flag to the Purchase Configuration table and spells out the semantic difference vs `--coverage` plus the inversion (higher target → fewer commitments). The `--help` copy was added in the earlier flag-declaration commit and is sufficient at the CLI; this is for the README's reference table that's the typical first stop for new users.
Two SP-branch bugs surfaced by post-implementation review of #338: 1. **Details=nil set ProjectedUtilization anyway.** When rec.Details was not a *SavingsPlanDetails (a defensive log-and-pass-through path), the function still set ProjectedUtilization = targetPct on its way out. Result: CSV row showed target utilization while cost fields stayed unscaled — misleading the operator. Now returns the rec unscaled with zero projection metrics on the Details-type mismatch. 2. **HourlyCommitment=0 with positive utilization signal slipped past the no-signal guard.** Cost Explorer occasionally returns SP recs with a $0 HourlyCommitment placeholder. The ratio-scaling branch then did 0 * ratio = 0 while still claiming target utilization was achieved. Added an explicit `HourlyCommitment <= 0` no-signal check above the scaling block — pass through, count in skip summary. Also fills the AC's "target == 100 → only purchases with perfectly- matched existing usage" test gap (TestApplyTargetUtilization_RI_Target100). Verifies the avg < 1 drop case, avg=1.0 buy-1 boundary, avg=10.0 perfect-match, and the avg > N truncation-with-clamp case.
|
Note Reviews pausedIt looks like this branch is under active development. To avoid overwhelming you with review comments due to an influx of new commits, CodeRabbit has automatically paused this review. You can configure this behavior by changing the Use the following commands to manage reviews:
Use the checkboxes below for quick actions:
No actionable comments were generated in the recent review. 🎉 ℹ️ Recent review info⚙️ Run configurationConfiguration used: defaults Review profile: CHILL Plan: Pro Run ID: 📒 Files selected for processing (1)
🚧 Files skipped from review as they are similar to previous changes (1)
📝 WalkthroughWalkthroughAdds a target-coverage sizing mode, fetches/applies existing RI coverage, implements RDS family NU sizing and expiry-aware adjustments, parses utilization signals for RI/SP, updates CSV/reporting, adds CLI flags and validation, and includes unit tests integrating these changes into the recommendation pipeline. ChangesTarget Coverage & Sizing Pipeline
Estimated code review effort 🎯 4 (Complex) | ⏱️ ~60 minutes Possibly related PRs
Suggested labels
✨ Finishing Touches📝 Generate docstrings
🧪 Generate unit tests (beta)
|
|
@coderabbitai review |
✅ Actions performedReview triggered.
|
There was a problem hiding this comment.
Actionable comments posted: 1
🧹 Nitpick comments (1)
cmd/validators_test.go (1)
417-423: ⚡ Quick win“Both flags explicitly set” branch is not actually exercised.
Line 453 passes
nilcmd, so thecmd.Flags().Changed("coverage")path never runs. This leaves the precedence-log gate untested.Suggested test adjustment
import ( + "fmt" "os" "path/filepath" "strings" "testing" "github.com/LeanerCloud/CUDly/pkg/common" + "github.com/spf13/cobra" ) @@ - toolCfg = Config{Coverage: tt.coverage, TargetUtilization: tt.target} - err := validateNumericRanges(nil) + toolCfg = Config{Coverage: tt.coverage, TargetUtilization: tt.target} + cmd := &cobra.Command{Use: "test"} + cmd.Flags().Float64("coverage", 80, "") + if tt.name == "target + coverage both set, both valid" { + if err := cmd.Flags().Set("coverage", fmt.Sprintf("%v", tt.coverage)); err != nil { + t.Fatalf("failed to mark coverage as changed: %v", err) + } + } + err := validateNumericRanges(cmd)Also applies to: 450-454
🤖 Prompt for AI Agents
Verify each finding against current code. Fix only still-valid issues, skip the rest with a brief reason, keep changes minimal, and validate. In `@cmd/validators_test.go` around lines 417 - 423, The test TestValidateTargetUtilization isn't exercising the "both flags explicitly set" branch because it passes nil for cmd so cmd.Flags().Changed("coverage") never runs; update the test to create a real cobra.Command (or equivalent command type used in validators_test.go), register the "coverage" flag, and mark it as set (e.g., via cmd.Flags().Set("coverage", "<value>") or by calling the appropriate Changed mechanism) before calling ValidateTargetUtilization so the cmd.Flags().Changed("coverage") path executes; apply the same fix to the related case around lines 450-454 to ensure the precedence-log gate is covered.
🤖 Prompt for all review comments with AI agents
Verify each finding against current code. Fix only still-valid issues, skip the
rest with a brief reason, keep changes minimal, and validate.
Inline comments:
In `@cmd/helpers_test.go`:
- Around line 922-930: The test table includes wantProjCovGTE but the test body
never asserts projected coverage; update the test loop in cmd/helpers_test.go to
check the function result's ProjectedCoverage (or the field name used in the
returned recommendation/result) and assert that if tests[i].wantProjCovGTE > 0
then result.ProjectedCoverage >= tests[i].wantProjCovGTE; do this alongside the
existing projected utilization check so projected coverage regressions are
caught (apply the same addition for the other similar table near lines 992-997
that also declares wantProjCovGTE).
---
Nitpick comments:
In `@cmd/validators_test.go`:
- Around line 417-423: The test TestValidateTargetUtilization isn't exercising
the "both flags explicitly set" branch because it passes nil for cmd so
cmd.Flags().Changed("coverage") never runs; update the test to create a real
cobra.Command (or equivalent command type used in validators_test.go), register
the "coverage" flag, and mark it as set (e.g., via cmd.Flags().Set("coverage",
"<value>") or by calling the appropriate Changed mechanism) before calling
ValidateTargetUtilization so the cmd.Flags().Changed("coverage") path executes;
apply the same fix to the related case around lines 450-454 to ensure the
precedence-log gate is covered.
🪄 Autofix (Beta)
Fix all unresolved CodeRabbit comments on this PR:
- Push a commit to this branch (recommended)
- Create a new PR with the fixes
ℹ️ Review info
⚙️ Run configuration
Configuration used: defaults
Review profile: CHILL
Plan: Pro
Run ID: 00f8f8b7-ce2e-43d4-b506-c5bacddfffd5
📒 Files selected for processing (15)
README.mdcmd/helpers.gocmd/helpers_test.gocmd/main.gocmd/multi_service.gocmd/multi_service_csv.gocmd/multi_service_csv_test.gocmd/multi_service_helpers.gocmd/validators.gocmd/validators_test.gopkg/common/types.goproviders/aws/recommendations/parser_ri.goproviders/aws/recommendations/parser_ri_test.goproviders/aws/recommendations/parser_sp.goproviders/aws/recommendations/parser_sp_test.go
RenderSummary printed "estimated savings $X/yr" but rec.EstimatedSavings is monthly (sourced from AWS EstimatedMonthlySavingsAmount in parser_ri.go). The lower stats block already says "/mo" correctly, so the two lines disagreed about the same value. Also relabel the cost field as "upfront commitment" since CommitmentCost is the one-time upfront portion, not a recurring/lifetime figure. Refs #338.
generatePurchaseID always took cfg.Coverage (default 80), so dry-runs sized by --target-utilization=70 still emitted IDs containing "80pct" — a misleading audit signal that would carry into real --purchase runs. Extract effectiveSizingPct(cfg) so target wins over coverage when set, and route all three generatePurchaseID call sites through it. The function signature now names the parameter sizingPct to make the expectation explicit at the call site. No format change (still "Xpct"), since the audit log records the source flag separately. Refs #338.
Two changes that travel together because the algorithm reinterpretation is what motivates the rename. Algorithm change (under-buy): Previously the flag computed N = floor(avg / target) and capped at rec.Count, which made it a no-op whenever the target was below AWS's implied ~99% projected utilization (target=70 on a 99%-utilization fleet produced 99% projected utilization, not 70). Replaced by N = floor(avg * target/100) for RIs and a target/100 scalar on HourlyCommitment / EstimatedSavings for SPs. The result deliberately under-buys versus AWS's 100%-coverage baseline, leaving (100-target)% of historical demand on-demand. RI projected utilization stays near 100% (RIs are always fully used when count < avg), and projected coverage approximates target. Rename: --target-utilization -> --target-coverage (CLI flag, Config.TargetCoverage, ApplyTargetCoverage / apply* helpers, validateTargetCoverage, all log messages, doc comments, README, test names). The flag never targeted a utilization figure under any operationally useful interpretation; the rename matches what the math actually produces and what operators expect when they pass --target-coverage=70. Verified against shift-prd: --target-coverage=70 on Aurora yields ProjectedUtilization=100 and ProjectedCoverage clustered at 38-70% across rows (under target by floor's effect on small counts). --target-coverage=80 on plain RDS yields the same shape with coverage clustered at 66-79%. Refs #338.
…olumns Under under-buy --target-coverage sizing, ProjectedUtilization is 100% on every RI row (count <= avg, so RIs are always full) and RecommendedUtilization is ~99-100% on every row (AWS sizes for that). Both columns add noise without information; only ProjectedCoverage varies meaningfully across rows. Keep the underlying fields on common.Recommendation: ApplyTargetCoverage's SP branch still reads RecommendedUtilization as a no-signal guard, and the algorithm still sets ProjectedUtilization for completeness even though no caller renders it. Internal API stays the same; only the on-disk CSV shrinks. Refs #338.
The Count column shows the count we'll actually purchase (post-sizing). Without a separate column for AWS's pre-sizing recommendation, the dry- run CSV loses the "what AWS proposed" signal that operators want when deciding how aggressively to under-buy via --target-coverage. Add RecommendedCount to common.Recommendation (preserves AWS's pre-sizing count before ApplyCoverage / ApplyTargetCoverage mutates Count) and emit it in the CSV right next to Count. SP recs leave the field at zero (SPs are dollar-denominated, not count-denominated) and the column renders blank, matching the existing "0 = unknown / not applicable" convention. Refs #338.
The Recommendation struct's CommitmentCost field holds AWS's upfront for the pre-sizing RecommendedCount; ApplyCoverage and ApplyTargetCoverage deliberately leave it untouched when shrinking Count, so rendering the field directly would over-state the upfront whenever the user under-buys. Emit a new UpfrontPayment column that pro-rates CommitmentCost by Count/RecommendedCount for RIs. SP recs (RecommendedCount == 0) emit CommitmentCost as-is — SP sizing scales HourlyCommitment/EstimatedSavings but leaves the upfront untouched, a pre-existing inconsistency to track separately. Blank cell when CommitmentCost is zero, matching the "0 = unknown / not applicable" convention. Refs #338.
ApplyCoverage and ApplyTargetCoverage previously mutated Count (RIs) or HourlyCommitment (SPs) but left CommitmentCost, OnDemandCost, and (for RIs) EstimatedSavings at AWS's pre-sized values. Every downstream consumer (reporter table, summary line, audit log, CSV writer) then displayed sized counts alongside pre-sized cost, over-stating upfront and savings whenever a user under-buys. Scale all cost-bearing fields at sizing time by the same ratio that shrinks Count / HourlyCommitment: - ApplyCoverage RI: ratio = coverage/100 - ApplyCoverage SP: ratio = coverage/100 - ApplyTargetCoverage RI: ratio = nTarget/rec.Count (the actual count ratio) - ApplyTargetCoverage SP: ratio = targetPct/100 SavingsPercentage stays untouched (savings-vs-on-demand ratio is count-invariant). RecommendedCount preserves AWS's pre-sized count for RIs so the dry-run CSV can still surface AWS's proposal alongside the sized values. CSV writer simplified: the temporary formatUpfrontForSizedCount helper (which pro-rated CommitmentCost at render time) is replaced by formatCurrencyOrBlank, since rec.CommitmentCost now reflects the sized purchase directly. Tests updated to assert the scaled values; the prior "must not scale" assertions were locking in the bug. Refs #338.
…rage sizing The under-buy formula used floor(avg * target/100), which ignored any existing commitments in the same pool: a user with 50% of demand already covered by 10 owned RIs, asking for target=80%, would get a recommendation to buy floor(0.80 * 20) = 16 more — an overbuy of 10 vs the correct 6. Replace the formula with the coverage-gap form: gap_pct = target_pct - existing_coverage_pct n_target = floor(avg * gap_pct / 100) Drop the rec entirely (info log) when existing already meets or exceeds the target. Drop when the gap exists but is too small for a single RI. Subtraction is kept in percentage units before dividing by 100, so whole-percent inputs (e.g. 70 - 60) don't lose precision to float subtraction near integer boundaries. ExistingCoveragePct is a new field on common.Recommendation that defaults to zero, so recs without the fetched signal degenerate to the old no-existing-commitments path (floor(avg * target/100)). The actual CE GetReservationCoverage fetch is split into a follow-up commit so the formula and CSV plumbing can land in isolation. ProjectedCoverage now reflects TOTAL coverage (existing + new contribution) rather than just the new purchase's share, so operators can verify they hit the target they asked for. A new ExistingCoverage CSV column shows the input the sizing used; blank-when-zero matches the other optional columns. Refs #338.
Add GetRICoverageMap on the recommendations Client, plus the ApplyCoverageMapToRecommendations helper that joins the fetched map back onto each Recommendation.ExistingCoveragePct in place. The fetcher calls Cost Explorer's GetReservationCoverage grouped by REGION + INSTANCE_TYPE (CE allows at most 2 GroupBy dimensions), pages through with the existing rate-limiter pattern that GetRIUtilization established, and returns a map keyed by "region:instance_type" (lowercased). Missing pools are absent from the map; the apply helper leaves ExistingCoveragePct at zero for unmatched recs, which the sizing path treats as "no signal" and falls back to the no-existing-commitments formula. Finer pool dimensions (OS, tenancy for EC2; engine, multi-AZ for RDS) are aggregated together because the CE API can't slice all of them in one call. That's a known imprecision for mixed pools but the dominant case (single-engine, single-AZ-config workloads) gets accurate sizing. Pipeline wiring lands in a follow-up so the formula and CSV plumbing can stay independently testable. Three existing mocks (mockCostExplorerAPI, mockCostExplorerForSP, mockCostExplorerClient) get a GetReservationCoverage stub so they continue to satisfy the CostExplorerAPI interface. Refs #338.
Phase 1 of runToolMultiService now fetches existing-pool coverage from
Cost Explorer once per run, threads the resulting PoolCoverageMap down
through fetchAllRecs into applyCoverageAndOverrides, and lets the
sizing path subtract pool coverage that's already owned. Recs whose
pool already meets/exceeds --target-coverage are dropped with an info
log; the rest get sized to fill just the remaining coverage gap.
Implementation notes:
- CE's GetReservationCoverage returns only EC2 coverage when no SERVICE
filter is set, so the fetcher loops once per (EC2, RDS, ElastiCache,
OpenSearch, Redshift, MemoryDB) service. The dimension values match
CE's canonical SERVICE-dimension strings.
- GroupBy uses REGION + INSTANCE_TYPE. CE returns attribute keys in
camelCase ("region", "instanceType") despite the GroupBy input
accepting SCREAMING_SNAKE_CASE — the lookup tolerates both forms.
- Metrics is "Hour" (not "HoursPercentage" — CE rejects that as a
ValidationException); the percentage field is included automatically
inside the Hour coverage block.
- Skipping the CE fetch entirely when --target-coverage <= 0 avoids
the per-call charge on --coverage-only runs.
- Best-effort: a transient CE failure logs a warning and the pipeline
continues with an empty map, which makes sizing degenerate to the
no-existing-commitments path (matches behavior when a pool has no
fetched coverage entry).
- Legacy processService entry (test-only) passes nil — those tests
never had coverage data either.
Verified against shift-prd: a --target-coverage=70 dry-run on Aurora
shows ExistingCoverage values like 46.4% / 38.5% / 36.6% on rows where
existing RDS RIs cover part of demand, with new Count sized to fill
just the gap (e.g. db.t4g.medium us-east-1: AWS rec 25 → cov 46.4% +
buy 5 = 67% total).
Refs #338.
RDS RI list prices for Multi-AZ are roughly 2x Single-AZ. Without a Deployment column operators couldn't verify the upfront figure was sized for the deployment they expected; they had to cross-reference the AWS console. extractDeployment pulls AZConfig from *common.DatabaseDetails (or its value form for the CSV-loader path), emitting blank for non-RDS recs. Column slots between Engine and Instances so the (region, type, engine, deployment) tuple reads as a unit.
) Family-NU sizing is now driving RDS buys but the math was invisible in the CSV — operators couldn't tell at a glance which recs belonged to the same size-flex family or how much NU each rec contributed. Family column shows the family prefix (e.g. db.r7g) so rows in the same size-flex bucket sort together. NormalizedUnits shows rec.Count × NU(size) (e.g. 15 × db.r7g.large = 60 NU), surfacing the math AWS rec API uses to bundle .xlarge demand into .large RIs. RDSFamilyFromType and RDSInstanceNUFromType added as exported wrappers over the unexported helpers in providers/aws/recommendations so the CSV writer can reuse the canonical NU scale rather than duplicating it.
) For partial-upfront / no-upfront RIs the upfront figure is only part of the total cost; the recurring per-month fee multiplied by the term needs to be known to compute the real spend and ROI. AWS CE already exposes RecurringStandardMonthlyCost and CUDly's parser stores it in rec.RecurringMonthlyCost, but it was never emitted in the CSV. The sizing steps (per-pool ApplyTargetCoverage{RI,SP} and family-NU scaleRDSRecInFamily) now scale the pointer-valued monthly cost by the same ratio they use for CommitmentCost / OnDemandCost / EstimatedSavings, allocating a new pointer so the original rec target isn't mutated. formatRecurringMonthlyOrBlank renders nil as blank (AWS didn't return the field — unknown) and 0.0 as "0.00" (definitely no recurring fee, e.g. all-upfront) so operators can distinguish the two cases.
Each rec in a multi-rec family-NU group was reporting its own contribution only (existing% + this rec's new NU / family total NU), so a family with two per-account recs each adding 20% of family NU would show "existing + 20%" on both rows instead of the true "existing + 40%". Misleading when operators eyeball a row to verify the family will land at the target. Split sizing into two passes: scaleFamilyRecs handles per-rec count and cost scaling; annotateFamilyProjection sums new NU across every surviving rec and writes the same cumulative ProjectedCoverage onto each one. Each row now reflects "where the family will land" rather than just "where my slice of it lands".
…338) ExistingCoverage previously rendered blank for two semantically distinct cases: CE returned 0% for the pool (legitimate zero coverage, pool exists and is uncovered) and CE returned nothing for the pool (rec surfaced from some other AWS signal, CE's lookback hadn't seen the pool yet). Operators couldn't tell which one a blank cell meant. Add ExistingCoverageKnown bool to common.Recommendation, set by ApplyCoverageMapToRecommendations and ApplyFamilyNUSizingRDS whenever a pool lookup actually hits. formatExistingCoverage renders "n/a" when Known=false, "0.0" when Known=true with zero pct, and "X.X" otherwise. The "n/a" cells now flag recently-launched or fluctuating pools where AWS rec API has fresher data than CE coverage. Buying for them is legitimate; the column just makes the data-source mismatch visible.
Data rows previously came out in whatever order AWS rec API returned them, so a $20K row could sit below a $50 row, forcing operators to scroll or sort manually before reviewing budget impact. Sorting by UpfrontPayment descending surfaces the biggest decisions first. TOTAL row at the bottom aggregates Count, NormalizedUnits, UpfrontPayment, RecurringMonthlyCost (when any rec has one), and EstimatedSavings so operators get the run-level totals inline rather than copy-pasting columns into a spreadsheet. "TOTAL" label lands in the Service column for easy spotting; per-rec identifiers (Account, CommitmentID, Timestamp, percentages) stay blank in that row since they don't aggregate meaningfully.
…338) Closes two CodeRabbit nits on PR #339: 1. **TestApplyTargetCoverage_RI: `wantProjCovGTE` was dead test data.** The field was declared on the struct but never populated and never asserted on. Now populated for the three keep-the-rec cases (94/50/75% respectively, derived from `nTarget/avg*100` with existing=0), zero on the drop and pass-through cases (matching the `wantProjUtil` "0 means don't assert" convention), and asserted with a `>= want - epsilon` check that tolerates the float clamp at 100. 2. **TestValidateTargetCoverage: "both flags set" branch was untested.** Passing nil cmd skips the `cmd.Flags().Changed("coverage")` check entirely, so the precedence info-log gate had zero coverage. Added `useCobraCmd` to the test struct; the "both set" case now builds a real cobra command, marks `--coverage` as Changed via Set, and runs validation against it. The log line still isn't asserted (stderr capture isn't worth the friction here), but the branch is now reached.
|
@coderabbitai review |
✅ Actions performedReview triggered.
|
There was a problem hiding this comment.
Actionable comments posted: 2
🤖 Prompt for all review comments with AI agents
Verify each finding against current code. Fix only still-valid issues, skip the
rest with a brief reason, keep changes minimal, and validate.
Inline comments:
In `@cmd/helpers.go`:
- Around line 159-164: The Count and cost fields are inconsistent because
newCount is an integer but costs are still multiplied by the original floating
`ratio`; change the cost-scaling to use a discrete count ratio computed from
`newCount` and `rec.Count` (e.g., `discreteRatio = float64(newCount) /
float64(rec.Count)`) and apply that to `adjusted.CommitmentCost`,
`adjusted.OnDemandCost`, and `adjusted.EstimatedSavings`; also guard against
`rec.Count == 0` to avoid divide-by-zero and keep `adjusted.Count = newCount` as
before.
In `@providers/aws/recommendations/expiry.go`:
- Around line 74-75: The expiringByPool lookup is missing matches because
commitmentPoolKey omits the deployment component while lookupPoolKey (used for
recs[i]) is deployment-aware; update the key construction so both sides use the
same deployment-aware format: either add a deployment field to common.Commitment
and include it in commitmentPoolKey, then rebuild keys consistently, or change
the lookup in the loop (around expCount, ok :=
expiringByPool[lookupPoolKey(recs[i])]) to attempt a safe fallback (if the full
deployment-aware key is not found, try a deployment-empty variant) so RDS pools
aren’t silently missed; apply the same consistent key strategy to the other
lookup sites mentioned (lines ~110-113).
🪄 Autofix (Beta)
Fix all unresolved CodeRabbit comments on this PR:
- Push a commit to this branch (recommended)
- Create a new PR with the fixes
ℹ️ Review info
⚙️ Run configuration
Configuration used: defaults
Review profile: CHILL
Plan: Pro
Run ID: 8d0bbeb9-a90b-488d-a4bd-c9c7624b77f0
📒 Files selected for processing (29)
README.mdcmd/helpers.gocmd/helpers_test.gocmd/main.gocmd/main_test.gocmd/multi_service.gocmd/multi_service_csv.gocmd/multi_service_csv_test.gocmd/multi_service_filters.gocmd/multi_service_helpers.gocmd/validators.gocmd/validators_test.gointernal/reporter/reporter.gopkg/common/types.goproviders/aws/recommendations/client.goproviders/aws/recommendations/client_test.goproviders/aws/recommendations/coverage.goproviders/aws/recommendations/coverage_test.goproviders/aws/recommendations/expiry.goproviders/aws/recommendations/expiry_test.goproviders/aws/recommendations/family_nu.goproviders/aws/recommendations/family_nu_test.goproviders/aws/recommendations/parser_ri.goproviders/aws/recommendations/parser_ri_test.goproviders/aws/recommendations/parser_sp.goproviders/aws/recommendations/parser_sp_additional_test.goproviders/aws/recommendations/parser_sp_test.goproviders/aws/service_client.goproviders/aws/service_client_test.go
✅ Files skipped from review due to trivial changes (2)
- providers/aws/recommendations/client_test.go
- README.md
🚧 Files skipped from review as they are similar to previous changes (3)
- providers/aws/recommendations/parser_sp_test.go
- providers/aws/recommendations/parser_sp.go
- providers/aws/recommendations/parser_ri_test.go
…rage (#338) ApplyCoverage's RI branch truncated newCount to an integer but multiplied cost-bearing fields by the unrounded coverage ratio, desynchronising Count and costs. Example: rec.Count=3 + coverage=50% yields newCount=1 (33% of instances) but costs scaled to 50%, overstating the sized purchase price by ~50%. Use sizedRatio = newCount / rec.Count for cost scaling, matching the discipline already in ApplyTargetCoverage and ApplyFamilyNUSizingRDS. RecurringMonthlyCost gets the same scaling so partial / no-upfront plans line up with their sized count too. Fixes CodeRabbit finding on cmd/helpers.go:164.
Recommendation pool keys are deployment-aware for RDS (region:instance_type:engine:deployment), but commitmentPoolKey emitted "" for the deployment slot because common.Commitment lacked the field. Result: RDS commitments expiring within --rebuy-window-days silently failed to match any rec's deployment-aware lookup key, so the expiring share was never deducted from ExistingCoveragePct. CUDly would under-buy replacements for expiring RIs. Adds Deployment to common.Commitment, populates it from ReservedDBInstance.MultiAZ in the RDS service client, and uses it in commitmentPoolKey. Non-RDS services leave Deployment empty so their 2-part pool keys still match. Adds an expiry test that proves a Single-AZ commitment only adjusts the Single-AZ rec (not the Multi-AZ rec in the same instance/region/engine pool). Fixes CodeRabbit finding on providers/aws/recommendations/expiry.go:75.
|
@coderabbitai review |
✅ Actions performedReview triggered.
|
There was a problem hiding this comment.
🧹 Nitpick comments (1)
providers/aws/recommendations/expiry_test.go (1)
208-239: ⚡ Quick winRename this subtest to match what it actually asserts.
The current name says “skipped” but the assertions validate the opposite (
n == 1). Aligning the title/comments with expected behavior avoids misleading failures later.✏️ Suggested edit
- t.Run("already-expired commitments are skipped", func(t *testing.T) { + t.Run("already-expired active commitments are counted if within cutoff", func(t *testing.T) { // A commitment with EndDate in the past was already filtered by the - // State check upstream, but defensively the function should treat - // past dates as "not in the window" too. Past < cutoff but - // (past < now) means the RI is no longer providing coverage and - // shouldn't be deducted from current ExistingCoveragePct (which - // presumably already excludes it). + // State check upstream. Current behavior intentionally counts any + // commitment marked active when EndDate <= cutoff, even if EndDate < now.🤖 Prompt for AI Agents
Verify each finding against current code. Fix only still-valid issues, skip the rest with a brief reason, keep changes minimal, and validate. In `@providers/aws/recommendations/expiry_test.go` around lines 208 - 239, The subtest title is misleading: it says "already-expired commitments are skipped" but the assertions check that an active commitment with EndDate before the cutoff is counted (n == 1, ExistingCoveragePct adjusted). Rename the t.Run description to reflect that active commitments with EndDate within the cutoff are treated as expiring and included (e.g., "active commitments before cutoff are counted as expiring"), and update the inline comment to match this intended behavior; reference AdjustExistingCoverageForExpiringCommitments, the recs/commits setup, pastDate, and the 30-day cutoff when making the label/comment change.
🤖 Prompt for all review comments with AI agents
Verify each finding against current code. Fix only still-valid issues, skip the
rest with a brief reason, keep changes minimal, and validate.
Nitpick comments:
In `@providers/aws/recommendations/expiry_test.go`:
- Around line 208-239: The subtest title is misleading: it says "already-expired
commitments are skipped" but the assertions check that an active commitment with
EndDate before the cutoff is counted (n == 1, ExistingCoveragePct adjusted).
Rename the t.Run description to reflect that active commitments with EndDate
within the cutoff are treated as expiring and included (e.g., "active
commitments before cutoff are counted as expiring"), and update the inline
comment to match this intended behavior; reference
AdjustExistingCoverageForExpiringCommitments, the recs/commits setup, pastDate,
and the 30-day cutoff when making the label/comment change.
ℹ️ Review info
⚙️ Run configuration
Configuration used: defaults
Review profile: CHILL
Plan: Pro
Run ID: bf1280af-c886-480a-9f28-722e5466e671
📒 Files selected for processing (6)
cmd/helpers.gocmd/helpers_test.gopkg/common/types.goproviders/aws/recommendations/expiry.goproviders/aws/recommendations/expiry_test.goproviders/aws/services/rds/client.go
🚧 Files skipped from review as they are similar to previous changes (4)
- pkg/common/types.go
- providers/aws/recommendations/expiry.go
- cmd/helpers_test.go
- cmd/helpers.go
…ginated loop (#338) fetchCoverageForRDSEngineRegion and fetchCoverageForServiceRegion each had their own ~15-line input struct + 12-line paginated loop + a service-specific accumulator (accumulateRDSEngineGroups vs accumulateGroups). Both loops were identical except for which pool-key shape they wrote to the map. fetchCoveragePaged owns the loop and invokes a per-row record callback; each fetch path now just builds its input and passes a closure that writes the right key shape. rdsEngineRegionFilter and serviceRegionFilter extract the (engine, region) and (service, region) filter expressions so the input struct literal stays readable. Net ~60 lines removed; the two accumulator helpers were deleted entirely since their work now lives in fetchCoveragePaged's record callback. No behaviour change — 4410 tests pass.
…paths (#338) Four sizing paths independently scaled the same four cost-bearing fields (CommitmentCost, OnDemandCost, EstimatedSavings, RecurringMonthlyCost) by a ratio, including the same allocate-new-pointer dance for the RecurringMonthlyCost pointer to avoid mutating the upstream rec target: - ApplyCoverage SP branch - ApplyCoverage RI branch - ApplyTargetCoverageRI - ApplyTargetCoverageSP - ApplyFamilyNUSizingRDS / scaleRDSRecInFamily Same 7-line pattern duplicated five places. ScaleRecommendationCosts in pkg/common returns the scaled copy in one call so each sizing site shrinks to one line. The pointer-safe alloc lives in the helper. No behaviour change; the SP branches still scale HourlyCommitment in their copied Details separately because that's an SP-specific detail field, not a generic rec cost field. 4410 tests pass.
…tempty on user/group slices (closes #349, #350) (#353) * feat(db): migration 000050 drops users_one_admin index (#349) Removes the partial unique index on users(role) WHERE role='admin' that blocked multi-admin support. The TOCTOU race the index was added to close (two concurrent SetupAdmin calls on a fresh install) is now closed in the service layer via an atomic INSERT-WHERE-NOT-EXISTS (coming in the next commit). The down migration restores the index for rollback compatibility but the rest of the #349 changes must be rolled back as a unit — keeping just the index restore re-introduces the original 500 bug on the /api/users path. * feat(auth): add validation error sentinels for handler 4xx mapping (#349) Five sentinel errors that subsequent commits wrap into validateCreateUserRequest, SetupAdmin, and CreateUser. The API handler maps each to a status code: ErrInvalidEmail → 400 ErrInvalidRole → 400 ErrPasswordPolicy → 400 ErrEmailInUse → 409 ErrAdminExists → 409 Without this, every validation failure falls through to a generic 500 + "Internal server error" toast, hiding the real cause from the operator (see issue #349 repro on PR #339's reviewer feedback). * feat(auth): wrap validateCreateUserRequest errors as sentinels (#349) Switch the four plain fmt.Errorf returns to use the sentinels added in the previous commit. The error messages stay byte-for-byte identical (the sentinels were named to match), so existing tests that assert.Contains on the message strings continue to pass — only the underlying error type changes from a fresh fmt.Errorf each call to a shared sentinel that the API handler can detect via errors.Is. The role and password paths use %w wrapping so both the category (ErrInvalidRole / ErrPasswordPolicy) and the specific detail ("invalid role: guest" / "password does not meet policy: must be at least 12 characters") survive in the user-facing message. Defence-in-depth note: the handler at handler_users.go:51-56 already short-circuits any role outside {"admin","user"} with a 400 before reaching CreateUserAPI, so ErrInvalidRole is unreachable on the HTTP path. The sentinel is kept anyway for non-HTTP callers (CLI / future internal use). Widening the handler switch to admit "read_only" is out of scope for #349. * feat(auth): atomic CreateAdminIfNone + SetupAdmin race fix (#349) Adds an atomic INSERT-WHERE-NOT-EXISTS to PostgresStore so SetupAdmin no longer relies on the users_one_admin partial unique index (dropped in migration 000050) to close the bootstrap TOCTOU race. Store contract (Store.CreateAdminIfNone): (true, nil) inserted; this caller is now the admin. (false, nil) admin already existed; row not inserted. (false, ErrEmailInUse) email collides with an existing non-admin. (false, err) any other DB failure (wrapped). SetupAdmin keeps the upfront AdminExists() fast path (common case: no DB round-trip wasted on a no-op insert when an admin is already there) and uses CreateAdminIfNone as the atomic guard for the race window between the existence check and the insert. Both branches surface ErrAdminExists so the API handler maps either route to 409. Validation errors switched to the sentinels from the previous commit (ErrInvalidEmail, ErrPasswordPolicy wrapping the specific validatePassword message via the standard errors.Is-friendly wrap). The duplicate-key test was rewritten end-to-end to match the new flow: mock CreateAdminIfNone returning (false, nil), assert ErrorIs against ErrAdminExists, keep the assertion that "failed to create admin" must NOT appear in the user-facing message. mockAuthStoreForHealth in the server test package also gets the new method as a no-op so the StoreInterface contract stays satisfied. * fix(api): drop omitempty on user/group slice fields (#350) The Admin Users page crashes with `Failed to load users and groups` whenever a user has no group memberships, or a group has no account restrictions. Root cause: `omitempty` on APIUser.Groups and APIGroup.AllowedAccounts strips the field from the JSON when the slice is empty/nil, but the frontend TS contract declares both as required `string[]`. Renderers read `user.groups.length` / map over the array without a guard, hitting "Cannot read properties of undefined (reading 'length')" in an Array.map callback. The catch-all in users/userActions.ts then shows the generic toast. Two-layer fix: 1. **Backend (the actual contract bug)**: - Remove omitempty from APIUser.Groups and APIGroup.AllowedAccounts. - In userToAPIUser / groupToAPIGroup, substitute []string{} for nil so the JSON encoder emits `[]` rather than `null`. (omitempty suppresses both empty slices AND nil, so removing it isn't enough on its own — a nil slice would serialise as `null`, which the frontend doesn't know how to handle either.) - Tests in service_api_test.go assert the JSON output is `[]`, not absent and not `null`, for nil and empty-slice inputs alike. 2. **Frontend (defence-in-depth)**: - normalizeUser / normalizeGroup at the loadUsers() boundary coerce any non-array value back to `[]`, so a future contract regression degrades gracefully instead of crashing the page. Folded into the #349 PR because it touches internal/auth and the user flow is the same. Standalone closure for #350. * feat(api,auth): map auth sentinels to 4xx in user/admin endpoints (#349) The validation-error sentinels added earlier in this branch are now actually consumed by the HTTP handlers. Three pieces: 1. Service.CreateUser gains defence-in-depth email-uniqueness mapping after the store insert. validateCreateUserRequest's pre-check handles the common case; this catches the TOCTOU window where two concurrent callers race past the pre-check and the loser hits the users_email_key constraint. 2. internal/api/handler_users.go gets mapAuthError — a shared helper that pattern-matches via errors.Is on the auth package sentinels and wraps each in the right ClientError: ErrInvalidEmail / ErrInvalidRole / ErrPasswordPolicy -> 400 ErrEmailInUse / ErrAdminExists -> 409 Unrecognised errors fall through to handleRequestError's generic 500 path (unchanged for non-auth errors). createUser's call site now wraps via this helper. 3. The setupAdmin handler in handler_auth.go does the same. Before this, even a clean ErrAdminExists from the bootstrap path produced a generic 500 — every validation failure on the setup endpoint was indistinguishable from a real internal error to the operator. Both endpoints share the helper since they share the sentinel set. End-to-end effect: the Admin Users page (and the bootstrap setup flow) now display the real cause of a failed create/setup attempt. "email already in use" / "admin user already exists" / "invalid role: guest" / "password does not meet policy: must be at least 12 characters" all reach the user as 4xx with the message preserved in the response body for the frontend toast. * fix(auth,frontend): address CodeRabbit pass-1 findings on #353 Three findings, all real: 1. **Frontend: render calls bypassed the normaliser.** `loadUsers()` computed the normalised `groups` array but the subsequent `renderGroups()` and `renderPermissionMatrix()` calls still passed the raw `groupsResponse.groups`, so the defence-in-depth guard from the previous commit only protected the state setters, not the actual render path. Both call sites now use the normalised constant. 2. **Store: NOT EXISTS clause out of sync with AdminExists.** The atomic conditional INSERT in CreateAdminIfNone checked `role='admin'`, but AdminExists() checks `role='admin' AND active=true`. Inconsistency meant a deployment with only inactive admins would pass the fast-path check but the insert would fail silently on the WHERE clause, leaving the operator with a recoverable-looking error and no admin. Both clauses now match. 3. **Store: role hard-coded instead of trusted from caller.** The method's contract is "create admin if none", but the INSERT was forwarding `user.Role` verbatim. A non-admin role would silently slip through (the WHERE clause would still let the insert through because a non-admin row doesn't trip the "admin exists" check). The INSERT now hard-codes `'admin'` for the role column, matching the method name's promise. All three are pure store/render fixes — no test surface change needed; existing tests continue to pass (`go test ./...` clean, `npx tsc --noEmit` clean).
|
@coderabbitai review |
✅ Actions performedReview triggered.
|
There was a problem hiding this comment.
Actionable comments posted: 1
🤖 Prompt for all review comments with AI agents
Verify each finding against current code. Fix only still-valid issues, skip the
rest with a brief reason, keep changes minimal, and validate.
Inline comments:
In `@providers/aws/recommendations/coverage.go`:
- Around line 177-205: The new CE-based coverage flow (GetRICoverageMap and its
helpers fetchCoverageForServiceRegion / fetchCoverageForRDSEngineRegion) must be
decoupled from the existing --target-utilization path: revert the
target-utilization code path to continue using utilization from the AWS
Recommendation payloads and do not call GetReservationCoverage/GetRICoverageMap
when target-utilization is active; instead make CE coverage an opt-in feature or
separate flag/flow. Concretely, update the code paths that assemble
recommendation sizing to check the target-utilization flag and avoid invoking
GetRICoverageMap or GetReservationCoverage there (keep fetchCoverage... only
used by the new coverage feature), and add a clear conditional gate or separate
method so users without CE permissions or wanting the original behavior are not
forced to incur CE calls or IAM changes.
🪄 Autofix (Beta)
Fix all unresolved CodeRabbit comments on this PR:
- Push a commit to this branch (recommended)
- Create a new PR with the fixes
ℹ️ Review info
⚙️ Run configuration
Configuration used: defaults
Review profile: CHILL
Plan: Pro
Run ID: 7aa85c29-a7af-40e5-9415-91229ff17eeb
📒 Files selected for processing (4)
cmd/helpers.gopkg/common/types.goproviders/aws/recommendations/coverage.goproviders/aws/recommendations/family_nu.go
🚧 Files skipped from review as they are similar to previous changes (2)
- providers/aws/recommendations/family_nu.go
- cmd/helpers.go
| func (c *Client) GetRICoverageMap(ctx context.Context, lookbackDays int, regions []string) (PoolCoverageMap, error) { | ||
| if lookbackDays <= 0 { | ||
| lookbackDays = 30 | ||
| } | ||
| end := time.Now().UTC() | ||
| start := end.AddDate(0, 0, -lookbackDays) | ||
| startStr := start.Format("2006-01-02") | ||
| endStr := end.Format("2006-01-02") | ||
| windowHours := float64(lookbackDays * 24) | ||
|
|
||
| out := make(PoolCoverageMap) | ||
| // Cost: one CE call per (service, region) for non-RDS; one CE call per | ||
| // (engine, region) for RDS. For a 5-service × 23-region survey + 7 | ||
| // engines × 23 regions ≈ 270 calls (~$2.70/run). Empty regions return | ||
| // quickly so the bound is loose in practice. | ||
| for _, region := range regions { | ||
| for _, service := range coverageServiceFilters { | ||
| if err := c.fetchCoverageForServiceRegion(ctx, startStr, endStr, windowHours, service, region, out); err != nil { | ||
| return nil, fmt.Errorf("fetching coverage for service %q region %q: %w", service, region, err) | ||
| } | ||
| } | ||
| for _, engine := range rdsCoverageEngines { | ||
| if err := c.fetchCoverageForRDSEngineRegion(ctx, startStr, endStr, windowHours, engine, region, out); err != nil { | ||
| return nil, fmt.Errorf("fetching coverage for RDS engine %q region %q: %w", engine, region, err) | ||
| } | ||
| } | ||
| } | ||
| return out, nil | ||
| } |
There was a problem hiding this comment.
Keep the target-utilization path off the new Cost Explorer coverage flow.
This adds a new GetReservationCoverage dependency and then rewrites the recommendation utilization signal with CE-derived pool averages, but the stated goal for --target-utilization was to size from utilization already present in AWS recommendation payloads without extra API calls. As written, this changes latency/cost, adds new IAM requirements, and can regress users who currently have enough permissions for recommendations but not coverage. Please split this coverage-based sizing into a separate feature or keep it out of the target-utilization path.
Also applies to: 399-446
🤖 Prompt for AI Agents
Verify each finding against current code. Fix only still-valid issues, skip the
rest with a brief reason, keep changes minimal, and validate.
In `@providers/aws/recommendations/coverage.go` around lines 177 - 205, The new
CE-based coverage flow (GetRICoverageMap and its helpers
fetchCoverageForServiceRegion / fetchCoverageForRDSEngineRegion) must be
decoupled from the existing --target-utilization path: revert the
target-utilization code path to continue using utilization from the AWS
Recommendation payloads and do not call GetReservationCoverage/GetRICoverageMap
when target-utilization is active; instead make CE coverage an opt-in feature or
separate flag/flow. Concretely, update the code paths that assemble
recommendation sizing to check the target-utilization flag and avoid invoking
GetRICoverageMap or GetReservationCoverage there (keep fetchCoverage... only
used by the new coverage feature), and add a clear conditional gate or separate
method so users without CE permissions or wanting the original behavior are not
forced to incur CE calls or IAM changes.
There was a problem hiding this comment.
Thanks for flagging — this finding is based on a misread of the current code; politely declining the suggested revert.
The flag is --target-coverage, not --target-utilization. The PR renamed it during the design iteration because the semantics changed: "utilization" (how full each RI is) is always ~100% under the new sizing, while "coverage" (what % of demand is covered) is the user-facing knob. CE GetReservationCoverage is the canonical source for that signal — the AWS console's reservations-coverage report comes from the same API. There's no way to compute existing-pool coverage from the rec API payload alone (it returns per-account-curated recs, not org-wide pool aggregates).
The CE coverage fetch is already opt-in. cmd/multi_service.go:40:
func fetchExistingCoverage(...) recommendations.PoolCoverageMap {
if cfg.TargetCoverage <= 0 {
return nil
}
...
}Users on the legacy --coverage path pay no CE cost and need no new IAM. ce:GetReservationCoverage is required only when --target-coverage is set, which is documented in the README and surfaces as an opt-in feature gate.
Why the legacy --coverage path can't reuse the new approach. --coverage operates on AWS rec.Count directly (scale all counts by coverage/100). It doesn't subtract existing coverage and was never meant to — it's the simpler "buy X% of what AWS suggests" knob. The new --target-coverage is a different semantic: "size so the pool reaches Y% covered including existing RIs", which structurally requires the coverage signal.
Closing this thread as dismissed with justification.
There was a problem hiding this comment.
@cristim, thank you for the detailed and precise correction — you're right on all counts. I apologize for the misread: I conflated --target-utilization (the original name from the issue) with the renamed --target-coverage flag, and I missed the opt-in gate in cmd/multi_service.go. The architectural reasoning is sound:
GetReservationCoverageis the only AWS-native way to get org-wide pool coverage (rec payloads are per-account and curated, not raw pool aggregates).- The CE call is strictly gated to
cfg.TargetCoverage > 0, so the legacy--coveragepath incurs no new IAM requirements or API cost. - The two flags are semantically distinct ("buy X% of suggestions" vs. "size so the pool reaches Y% covered"), and each correctly uses the appropriate data source.
Dismissed — no changes needed.
✏️ Learnings added
Learnt from: cristim
Repo: LeanerCloud/CUDly PR: 339
File: providers/aws/recommendations/coverage.go:177-205
Timestamp: 2026-05-13T21:30:32.481Z
Learning: In the CUDly project (LeanerCloud/CUDly), the CLI flag for target-based coverage sizing is `--target-coverage` (not `--target-utilization`). The PR renamed it during design iteration: "utilization" (how full each RI is) stays ~100% under the new sizing, while "coverage" (% of demand covered) is the user-facing knob. `ce:GetReservationCoverage` is required only when `--target-coverage` is set, gated by `cfg.TargetCoverage <= 0` in `cmd/multi_service.go`. The legacy `--coverage` path is unaffected and pays no CE cost or IAM change.
Learnt from: cristim
Repo: LeanerCloud/CUDly PR: 339
File: providers/aws/recommendations/coverage.go:177-205
Timestamp: 2026-05-13T21:30:32.481Z
Learning: In CUDly, `GetReservationCoverage` (via `GetRICoverageMap`) is the canonical source for org-wide pool coverage signals needed by `--target-coverage` sizing. The AWS rec API (`GetReservationPurchaseRecommendation`) only returns per-account curated recommendations, not org-wide pool aggregates, so it cannot substitute for `GetReservationCoverage` when computing existing coverage across the pool.
Note: Learnings are effective only in the context of similar code segments. To apply general coding guidelines, please configure review instructions. You can manage existing learnings in the app.
…behavior (#338) The subtest was titled "already-expired commitments are skipped" but the assertions verify the opposite: an active commitment with EndDate in the past IS counted (n==1, ExistingCoveragePct adjusted from 80 to 30). The cutoff check is EndDate <= cutoff, which past dates satisfy trivially; the upstream State filter is what gates whether a commit is providing coverage right now. Rename to "active commits with EndDate before cutoff are counted (incl. past dates)" and rewrite the inline comment to document that the State filter is the only "is this currently providing coverage" gate, so a future reader doesn't try to add an "EndDate > now" guard and silently change the semantics. Fixes CodeRabbit nitpick on expiry_test.go:208-239.
|
@coderabbitai review |
✅ Actions performedReview triggered.
|
Summary
--target-utilization(-u) that sizes RI/SP purchases by projected post-purchase utilization instead of--coverage% of recommendations.AverageNumberOfInstancesUsedPerHour,AverageUtilization,EstimatedAverageUtilization) — no new AWS API call.ProjectedUtilization,ProjectedCoverage,RecommendedUtilization.Closes #338.
Why a separate sizing mode
--coverage 80is "buy 80% of each recommendation's instance count." It answers "how much of demand do we cover?" but ignores "how much of what we buy will actually be used?" In bursty/seasonal workloads,--coverage 80can still over-buy, leaving commitments idle.--target-utilization 95is "buy enough that projected utilization stays ≥ 95%." Counter-intuitively, higher target ⇒ fewer commitments bought (the math isused÷bought; raising the target shrinks the bought side). Use it when under-buying-and-spilling-to-on-demand beats over-buying-and-wasting-commitment.The two modes are mutually exclusive: when
--target-utilizationis set,--coverageis ignored with an info log.Math
n_target = floor(AverageInstancesUsedPerHour / (target/100)), capped atrec.Count(never exceed AWS's recommended ceiling). Drop with explanatory INFO log whenn_target == 0(target unreachable). Pass through unmodified when avg is zero (no signal); counted in a single end-of-run skip summary.RecommendedUtilization >= target, no scaling (AWS already projects above target). Elseratio = RecommendedUtilization / target(always < 1); scalesSavingsPlanDetails.HourlyCommitmentandrec.EstimatedSavings— exactly the fieldsApplyCoveragescales on the SP branch, so the two modes stay structurally consistent. Pass through whenRecommendedUtilization <= 0ORHourlyCommitment <= 0(CE placeholder rows).Changes
7d7baf6d1pkg/common: 4 new fields on Recommendationbbbc36a247d83ec948cmd/main.go: flag + Config field + validation79cf1a5adcmd/helpers.go: ApplyTargetUtilization + applySizing9468b0aafcmd/multi_service*.go: route both call sites through applySizing5f1e982e1d40001f6a2bded0b15Test plan
go build ./...cleango test ./...— full suite passes (no regressions in 30+ existing packages)cudly --helpshows--target-utilizationwith the inversion-warning copyTestApplyTargetUtilization_RIcovers happy path / AWS-ceiling cap / target unreachable drop / no signal pass-through / target met exactlyTestApplyTargetUtilization_RI_Target100covers the AC's "target == 100 → only purchases with perfectly-matched existing usage" boundaryTestApplyTargetUtilization_RI_CostScaling— cost fields untouched on RI branch (matches ApplyCoverage)TestApplyTargetUtilization_SP— no-op / scale-down / no-signalTestApplyTargetUtilization_SP_NoSignalGuards—HourlyCommitment=0+ wrong-Details-type edge casesTestApplySizing— routes based oncfg.TargetUtilization > 0TestValidateTargetUtilization— range + both-flags-set behaviourTestParseRIUtilizationSignals+TestParseSavingsPlanDetail_RecommendedUtilization— fall back to zero on nil/unparseableTestWriteMultiServiceCSVReport_UtilizationColumns— header + blank-when-zero formattingDesign notes
ProjectedCoverageAC field, wrong parser function for SPs (parseSavingsPlanDetail, notparseRecommendationDetail), wrong enum identifier names, wrong SP cost-field list, missing explanatory log for unreachable target, and two SP edge cases in this branch. All fixed before the push.Summary by CodeRabbit