Conversation
|
Note Reviews pausedIt looks like this branch is under active development. To avoid overwhelming you with review comments due to an influx of new commits, CodeRabbit has automatically paused this review. You can configure this behavior by changing the Use the following commands to manage reviews:
Use the checkboxes below for quick actions:
📝 WalkthroughWalkthroughAdds a CommitmentUUID field to reservation types and CRD, threads it through reservation creation/patching and sync logic, refactors commitment-change processing for deterministic iteration and aggregated failure handling with configurable watch timeouts, expands tests into an integration-style suite, and integrates gotestsum into Makefile and docs. Changes
Sequence Diagram(s)sequenceDiagram
rect rgba(200,220,255,0.5)
participant Client as Client (Change API)
participant HTTPAPI as HTTPAPI (with Config)
participant Scheduler as Commitment Processor
participant K8s as Kubernetes API (Reservations)
participant Watcher as Reservation Watcher
end
Client->>HTTPAPI: POST /v1/commitments/change-commitments
HTTPAPI->>Scheduler: enqueue/process commitments (uses Config)
Scheduler->>K8s: List/Create/Patch Reservations (set CommitmentUUID)
K8s-->>Scheduler: return reservation objects
Scheduler->>Watcher: watch reservations until ready (timeout/poll from Config)
alt reservations ready
Watcher-->>Scheduler: ready events
Scheduler->>HTTPAPI: respond success
else failures/timeouts
Watcher-->>Scheduler: failure/timeout
Scheduler->>Scheduler: accumulate failedCommitments (reasons)
Scheduler->>K8s: best-effort rollback (delete/patch)
Scheduler->>HTTPAPI: respond with aggregated rejection reasons
end
Estimated code review effort🎯 4 (Complex) | ⏱️ ~65 minutes Suggested reviewers
Poem
🚥 Pre-merge checks | ✅ 3✅ Passed checks (3 passed)
✏️ Tip: You can configure your own custom pre-merge checks in the settings. ✨ Finishing Touches📝 Generate docstrings
🧪 Generate unit tests (beta)
📝 Coding Plan
Comment |
There was a problem hiding this comment.
Actionable comments posted: 4
Caution
Some comments are outside the diff and can’t be posted inline due to platform limitations.
⚠️ Outside diff range comments (1)
internal/scheduling/reservations/commitments/reservation_manager.go (1)
208-235:⚠️ Potential issue | 🟠 MajorClear stale metadata fields when the desired state unsets them.
This patch path is still one-way: it only updates
AvailabilityZone,StartTime, andEndTimewhen the desired value is non-empty/non-nil. If a commitment drops an AZ restriction or expiry, the old value stays on the Reservation and later syncs keep reconciling against stale metadata. Compare against the full desired state and explicitly clear fields that are no longer set.🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed. In `@internal/scheduling/reservations/commitments/reservation_manager.go` around lines 208 - 235, The current patch only writes non-empty desired fields and leaves old values in reservation.Spec, causing stale AZ/StartTime/EndTime (and commitment) to persist; modify the update block in reservation_manager.go (around the MergeFrom/reservation.Spec mutations) to mirror the full desired state: if state.AvailabilityZone == "" set reservation.Spec.AvailabilityZone = ""; if state.StartTime == nil set reservation.Spec.StartTime = nil; if state.EndTime == nil set reservation.Spec.EndTime = nil; and similarly clear reservation.Spec.CommittedResourceReservation.CommitmentUUID when state.CommitmentUUID == "" so the Reservation is explicitly cleared when the commitment/unset fields are removed.
🤖 Prompt for all review comments with AI agents
Verify each finding against the current code and only fix it if needed.
Inline comments:
In `@internal/scheduling/reservations/commitments/api_change_commitments_test.go`:
- Around line 704-706: processedReserv is keyed only by reservation name so
recreated reservations with the same name are skipped; change the tracking to
include the reservation's unique identifier (e.g., use reservation.UID or a
name+UID composite) instead of just name in the processedReserv map and/or clear
the entry when a reservation delete is processed; update usages in
processReservations(), the reservation delete handler, and any tests that
reference processedReserv to use the new key (e.g., switch map[string]bool to
map[string]bool where the key is uidKey(reservation) or explicitly
delete(processedReserv, oldKey) on delete) so recreated reservations are
processed anew.
- Around line 373-380: The test suite fails to assert RetryAt behavior: update
the "Knowledge not ready - clear rejection with RetryAt" test (the
ExpectedAPIResponse created by newAPIResponse) to set RetryAtPresent=true and
then modify VerifyAPIResponse() to actually check
APIResponseExpectation.RetryAtPresent (and validate presence/absence of RetryAt
accordingly); ensure the same fix is applied to the other unasserted cases that
use APIResponseExpectation (where RetryAtPresent is relevant) so the test fails
if the retry hint is dropped.
In `@internal/scheduling/reservations/commitments/api_change_commitments.go`:
- Around line 249-272: watchReservationsUntilReady can return non-empty errors
but no failedReservations, causing requireRollback=true while failedCommitments
stays empty and resp.RejectionReason is left blank; update the rollback path
that checks the (failedReservations, errors) result from
watchReservationsUntilReady so that if failedCommitments is empty but the
returned errors slice/map is non-empty you populate resp.RejectionReason with a
fallback message that summarizes those watch errors (or list reservationsToWatch
/ still-pending reservation IDs so caller can map to commitment UUIDs); do the
same change for the other similar watch handling block that follows the same
pattern.
- Around line 42-45: The readiness loop using watchTimeout and pollInterval only
removes one reservation per poll because each ready/failed branch removes an
entry from reservationsToWatch and then breaks, causing timeouts for larger
batches; modify the loop that processes reservationsToWatch (the readiness loop
in api_change_commitments.go) to scan and remove all ready or failed
reservations in a single poll iteration instead of breaking after the first
removal — e.g., iterate over reservationsToWatch fully (or build a new slice of
remaining items) and only sleep/retry after processing the entire slice; keep
watchTimeout and pollInterval semantics but eliminate the premature break so
multiple ready items are retired per poll.
---
Outside diff comments:
In `@internal/scheduling/reservations/commitments/reservation_manager.go`:
- Around line 208-235: The current patch only writes non-empty desired fields
and leaves old values in reservation.Spec, causing stale AZ/StartTime/EndTime
(and commitment) to persist; modify the update block in reservation_manager.go
(around the MergeFrom/reservation.Spec mutations) to mirror the full desired
state: if state.AvailabilityZone == "" set reservation.Spec.AvailabilityZone =
""; if state.StartTime == nil set reservation.Spec.StartTime = nil; if
state.EndTime == nil set reservation.Spec.EndTime = nil; and similarly clear
reservation.Spec.CommittedResourceReservation.CommitmentUUID when
state.CommitmentUUID == "" so the Reservation is explicitly cleared when the
commitment/unset fields are removed.
ℹ️ Review info
⚙️ Run configuration
Configuration used: Path: .coderabbit.yaml
Review profile: CHILL
Plan: Pro
Run ID: efb68466-eed7-49c3-869a-10881d9e3d7a
📒 Files selected for processing (7)
Makefileapi/v1alpha1/reservation_types.gohelm/library/cortex/files/crds/cortex.cloud_reservations.yamlinternal/scheduling/reservations/commitments/api_change_commitments.gointernal/scheduling/reservations/commitments/api_change_commitments_test.gointernal/scheduling/reservations/commitments/reservation_manager.gointernal/scheduling/reservations/commitments/state.go
internal/scheduling/reservations/commitments/api_change_commitments_test.go
Outdated
Show resolved
Hide resolved
| availableResources map[string]int64 // host -> available memory MB | ||
| processedReserv map[string]bool // track processed reservations | ||
| } |
There was a problem hiding this comment.
Recreated reservations are skipped by the fake controller.
processedReserv is keyed only by reservation name and never cleared on delete. Rollback and repair paths reuse deterministic slot names, so a recreated reservation is skipped in processReservations() and never receives a fresh Ready/Failed state. That makes the rollback scenarios optimistic.
Suggested fix
func (c *FakeReservationController) OnReservationDeleted(res *v1alpha1.Reservation) {
+ delete(c.env.processedReserv, res.Name)
if c.env.availableResources != nil && res.Status.Host != "" {
memoryQuantity := res.Spec.Resources["memory"]
memoryBytes := memoryQuantity.Value()Also applies to: 717-729, 1016-1022, 1036-1040
🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed.
In `@internal/scheduling/reservations/commitments/api_change_commitments_test.go`
around lines 704 - 706, processedReserv is keyed only by reservation name so
recreated reservations with the same name are skipped; change the tracking to
include the reservation's unique identifier (e.g., use reservation.UID or a
name+UID composite) instead of just name in the processedReserv map and/or clear
the entry when a reservation delete is processed; update usages in
processReservations(), the reservation delete handler, and any tests that
reference processedReserv to use the new key (e.g., switch map[string]bool to
map[string]bool where the key is uidKey(reservation) or explicitly
delete(processedReserv, oldKey) on delete) so recreated reservations are
processed anew.
internal/scheduling/reservations/commitments/api_change_commitments.go
Outdated
Show resolved
Hide resolved
internal/scheduling/reservations/commitments/api_change_commitments.go
Outdated
Show resolved
Hide resolved
There was a problem hiding this comment.
Actionable comments posted: 3
♻️ Duplicate comments (2)
internal/scheduling/reservations/commitments/api_change_commitments_test.go (2)
719-720:⚠️ Potential issue | 🟠 MajorRecreated reservations can still be skipped after delete.
Line 1053 marks by
res.Name, and Line 731-743 does not clearprocessedReservon deletion. Recreated reservations with reused names can be ignored.🔧 Minimal fix
func (c *FakeReservationController) OnReservationDeleted(res *v1alpha1.Reservation) { + delete(c.env.processedReserv, res.Name) if c.env.availableResources != nil && res.Status.Host != "" { memoryQuantity := res.Spec.Resources["memory"] memoryBytes := memoryQuantity.Value()Also applies to: 731-743, 1030-1033, 1053-1053
🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed. In `@internal/scheduling/reservations/commitments/api_change_commitments_test.go` around lines 719 - 720, The test uses processedReserv map keyed by res.Name and never clears entries when a reservation is deleted, so if a new reservation is recreated with the same name it will be treated as already processed; update the cleanup logic to remove the reservation's key from processedReserv when the test deletes a reservation (or when handling deletion events) instead of leaving stale entries—specifically remove processedReserv[res.Name] (or use a unique identifier instead of Name) in the code paths around the delete/recreate sequence referenced by processedReserv and res.Name.
374-381:⚠️ Potential issue | 🟡 Minor
RetryAtPresentis still unasserted in response verification.Line 564 introduces
RetryAtPresent, butVerifyAPIResponse()never checks it, so retry-hint regressions can still pass unnoticed.Also applies to: 561-565, 1141-1166
🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed. In `@internal/scheduling/reservations/commitments/api_change_commitments_test.go` around lines 374 - 381, The test adds RetryAtPresent to ExpectedAPIResponse but VerifyAPIResponse() does not assert it; update VerifyAPIResponse (used by commitments/api_change_commitments_test.go) to compare the actual API response's RetryAtPresent vs the expected ExpectedAPIResponse.RetryAtPresent field (and add appropriate nil/false handling) so tests like the case "Knowledge not ready - clear rejection with RetryAt" actually validate the retry hint; also update other affected test cases (lines around 561-565 and 1141-1166) to ensure their ExpectedAPIResponse includes the correct RetryAtPresent expectation.
🧹 Nitpick comments (1)
Makefile (1)
32-40: Trimtestsumrecipe length to satisfy checkmake.Line 32’s body currently exceeds the configured
maxbodylengthwarning threshold. Consider extracting flags into a variable to keep lint output clean.♻️ Example refactor
+TESTSUM_FLAGS = \ + $(if $(WATCH),--watch) \ + --format $(if $(FORMAT),$(FORMAT),testname) \ + --hide-summary=all + testsum: gotestsum ## Run all tests (clean output for passing, verbose for failing). Options: WATCH=1, RUN=<pattern>, PACKAGE=<pkg>, FORMAT=<fmt> (e.g., standard-verbose for all output) - $(GOTESTSUM) \ - $(if $(WATCH),--watch) \ - --format $(if $(FORMAT),$(FORMAT),testname) \ - --hide-summary=all \ + $(GOTESTSUM) $(TESTSUM_FLAGS) \ -- \ $(if $(VERBOSE),-v) \ $(if $(RUN),-run $(RUN)) \ $(if $(PACKAGE),$(PACKAGE),./...)🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed. In `@Makefile` around lines 32 - 40, The testsum recipe body is too long for checkmake's maxbodylength; shorten it by moving the long flags/arguments into a Makefile variable and using that variable in the testsum target. For example, create a variable like TESTSUM_FLAGS (or TESTSUM_OPTS) that assembles --format $(if $(FORMAT),$(FORMAT),testname) --hide-summary=all -- $(if $(VERBOSE),-v) $(if $(RUN),-run $(RUN)) $(if $(PACKAGE),$(PACKAGE),./...) and the conditional --watch flag, then update the testsum target to call $(GOTESTSUM) $(TESTSUM_FLAGS) so the testsum recipe body is concise while preserving all conditionals and references to GOTESTSUM, WATCH, FORMAT, VERBOSE, RUN and PACKAGE.
🤖 Prompt for all review comments with AI agents
Verify each finding against the current code and only fix it if needed.
Inline comments:
In `@internal/scheduling/reservations/commitments/api_change_commitments_test.go`:
- Around line 731-743: The test helper duplicates returning memory to
env.availableResources in FakeReservationController.OnReservationDeleted and
again in the other deletion path (the block around lines 986-1010), causing
available capacity to be double-credited; fix by consolidating the update so
only one code path adjusts env.availableResources: keep the logic in
FakeReservationController.OnReservationDeleted (which reads
res.Spec.Resources["memory"], converts to MB, and updates
env.availableResources[res.Status.Host]) and remove or guard the duplicate
update in the other function/block (the second deletion handler) so the map is
incremented exactly once per deleted reservation.
- Around line 922-937: The test has a data race between the polling goroutine
that calls env.processReservations() and the interceptor callbacks
(OnReservationDeleted) and processNewReservation which access the same maps
(processedReserv and availableResources); protect all reads/writes to these maps
by adding a sync.RWMutex on the env struct or a dedicated mapLock and acquiring
RLock/RUnlock for readers (e.g., where the test reads processedReserv) and
Lock/Unlock for writers (processNewReservation, OnReservationDeleted, and any
other mutating code paths), or alternatively funnel all mutations through a
single goroutine/channel and stop concurrent direct map access; update
env.processReservations(), processNewReservation(), OnReservationDeleted, and
any test reads of processedReserv/availableResources to use the chosen
synchronization.
- Around line 1066-1073: The helper processNewReservation currently iterates
env.availableResources (a map) to pick selectedHost for the
"First-come-first-serve" logic, which is nondeterministic; change it to select
the first host from a deterministic ordering: build a slice of host names (keys
of env.availableResources), sort it (or use an existing deterministic
env.hostOrder if available), then iterate that ordered slice and pick the first
host whose availableMB >= memoryMB and assign it to selectedHost. Ensure you
update only the selection loop (referencing selectedHost,
env.availableResources, and memoryMB) so tests are deterministic and stable.
---
Duplicate comments:
In `@internal/scheduling/reservations/commitments/api_change_commitments_test.go`:
- Around line 719-720: The test uses processedReserv map keyed by res.Name and
never clears entries when a reservation is deleted, so if a new reservation is
recreated with the same name it will be treated as already processed; update the
cleanup logic to remove the reservation's key from processedReserv when the test
deletes a reservation (or when handling deletion events) instead of leaving
stale entries—specifically remove processedReserv[res.Name] (or use a unique
identifier instead of Name) in the code paths around the delete/recreate
sequence referenced by processedReserv and res.Name.
- Around line 374-381: The test adds RetryAtPresent to ExpectedAPIResponse but
VerifyAPIResponse() does not assert it; update VerifyAPIResponse (used by
commitments/api_change_commitments_test.go) to compare the actual API response's
RetryAtPresent vs the expected ExpectedAPIResponse.RetryAtPresent field (and add
appropriate nil/false handling) so tests like the case "Knowledge not ready -
clear rejection with RetryAt" actually validate the retry hint; also update
other affected test cases (lines around 561-565 and 1141-1166) to ensure their
ExpectedAPIResponse includes the correct RetryAtPresent expectation.
---
Nitpick comments:
In `@Makefile`:
- Around line 32-40: The testsum recipe body is too long for checkmake's
maxbodylength; shorten it by moving the long flags/arguments into a Makefile
variable and using that variable in the testsum target. For example, create a
variable like TESTSUM_FLAGS (or TESTSUM_OPTS) that assembles --format $(if
$(FORMAT),$(FORMAT),testname) --hide-summary=all -- $(if $(VERBOSE),-v) $(if
$(RUN),-run $(RUN)) $(if $(PACKAGE),$(PACKAGE),./...) and the conditional
--watch flag, then update the testsum target to call $(GOTESTSUM)
$(TESTSUM_FLAGS) so the testsum recipe body is concise while preserving all
conditionals and references to GOTESTSUM, WATCH, FORMAT, VERBOSE, RUN and
PACKAGE.
ℹ️ Review info
⚙️ Run configuration
Configuration used: Path: .coderabbit.yaml
Review profile: CHILL
Plan: Pro
Run ID: 345fa0c1-14fe-4426-ab58-8838437495bf
📒 Files selected for processing (3)
Makefiledocs/develop.mdinternal/scheduling/reservations/commitments/api_change_commitments_test.go
| func (c *FakeReservationController) OnReservationDeleted(res *v1alpha1.Reservation) { | ||
| if c.env.availableResources != nil && res.Status.Host != "" { | ||
| memoryQuantity := res.Spec.Resources["memory"] | ||
| memoryBytes := memoryQuantity.Value() | ||
| memoryMB := memoryBytes / (1024 * 1024) | ||
|
|
||
| // Call the handler | ||
| api.HandleChangeCommitments(w, req) | ||
| if _, exists := c.env.availableResources[res.Status.Host]; exists { | ||
| c.env.availableResources[res.Status.Host] += memoryMB | ||
| c.env.T.Logf("↩ Returned %d MB to %s (now %d MB available) before deleting reservation %s", | ||
| memoryMB, res.Status.Host, c.env.availableResources[res.Status.Host], res.Name) | ||
| } | ||
| } | ||
| } |
There was a problem hiding this comment.
Deletion credits host memory in two separate paths.
Line 731-743 and Line 986-1010 both return memory for deleted reservations, which can inflate available capacity and make scheduling outcomes overly optimistic.
Also applies to: 986-1010
🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed.
In `@internal/scheduling/reservations/commitments/api_change_commitments_test.go`
around lines 731 - 743, The test helper duplicates returning memory to
env.availableResources in FakeReservationController.OnReservationDeleted and
again in the other deletion path (the block around lines 986-1010), causing
available capacity to be double-credited; fix by consolidating the update so
only one code path adjusts env.availableResources: keep the logic in
FakeReservationController.OnReservationDeleted (which reads
res.Spec.Resources["memory"], converts to MB, and updates
env.availableResources[res.Status.Host]) and remove or guard the duplicate
update in the other function/block (the second deletion handler) so the map is
incremented exactly once per deleted reservation.
internal/scheduling/reservations/commitments/api_change_commitments_test.go
Show resolved
Hide resolved
internal/scheduling/reservations/commitments/api_change_commitments_test.go
Show resolved
Hide resolved
There was a problem hiding this comment.
Actionable comments posted: 2
♻️ Duplicate comments (3)
internal/scheduling/reservations/commitments/api_change_commitments_test.go (3)
1118-1123:⚠️ Potential issue | 🟠 MajorHost choice is nondeterministic due map iteration.
The “first-come-first-serve” behavior is currently random because map iteration order is random in Go.
🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed. In `@internal/scheduling/reservations/commitments/api_change_commitments_test.go` around lines 1118 - 1123, The host selection loop is nondeterministic because it iterates over the map env.availableResources; change it to iterate in a deterministic order by extracting the map keys into a slice, sorting that slice (e.g., alphabetically or by a stable criterion), then loop over the sorted keys to pick the first host with availableMB >= memoryMB and assign selectedHost. Update the loop around env.availableResources and preserve the same comparison using memoryMB and selectedHost.
756-758:⚠️ Potential issue | 🔴 CriticalShared maps are accessed concurrently without synchronization.
availableResourcesandprocessedReservare read/written from the poller goroutine and interceptor callback paths, which can triggerconcurrent map writesand flaky tests.Also applies to: 973-984
🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed. In `@internal/scheduling/reservations/commitments/api_change_commitments_test.go` around lines 756 - 758, The test uses shared maps availableResources and processedReserv that are accessed concurrently by the poller goroutine and interceptor callbacks causing data races; protect these maps by adding synchronization (e.g., a sync.Mutex or sync.RWMutex field on the struct that owns availableResources/processedReserv or replace them with sync.Map) and ensure every read/write to availableResources and processedReserv (including in the poller goroutine, interceptor callback functions, and any helpers referenced in the test) acquires the appropriate lock or uses the sync.Map APIs consistently to prevent concurrent map writes.
769-777:⚠️ Potential issue | 🟠 MajorDeletion path credits host memory twice.
Memory is returned in
OnReservationDeleted()and again inprocessReservations()deletion handling, inflating capacity and making scheduling outcomes too optimistic.Also applies to: 1041-1059
🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed. In `@internal/scheduling/reservations/commitments/api_change_commitments_test.go` around lines 769 - 777, OnReservationDeleted is returning host memory and processReservations also credits memory on deletion, causing double-counting; remove or guard the credit in OnReservationDeleted so only one path adjusts c.env.availableResources. Specifically, either delete the memory-crediting block in FakeReservationController.OnReservationDeleted (the math using res.Spec.Resources["memory"], Value(), memoryMB and the addition to c.env.availableResources[res.Status.Host]) or add a clear guard (e.g., check a deletion-processed flag or reservation status) so processReservations remains the single place that refunds capacity; ensure references to OnReservationDeleted and processReservations are used to locate and fix the duplicated logic.
🧹 Nitpick comments (3)
internal/scheduling/reservations/commitments/api_change_commitments_test.go (1)
868-871: Usemaps.Copyfor cloningPerHost.This loop is a manual map copy; prefer
maps.Copyfor clarity and consistency with project rules.As per coding guidelines
Use maps.Copy instead of iteratively copying a map.🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed. In `@internal/scheduling/reservations/commitments/api_change_commitments_test.go` around lines 868 - 871, Replace the manual loop that clones resources.PerHost into availableResources with a single maps.Copy call: create availableResources as make(map[string]int64) and call maps.Copy(availableResources, resources.PerHost); also add the appropriate import for the maps package (standard library "maps" or project's approved maps package) if not already imported. This keeps the same semantics but uses maps.Copy for clarity and consistency.internal/scheduling/reservations/commitments/api_change_commitments.go (1)
263-265: Make rollback rejection output deterministic.
failedCommitmentsis a map; iterating it directly makesRejectionReasonorder unstable across runs.♻️ Deterministic message order
- for commitmentUUID, reason := range failedCommitments { + for _, commitmentUUID := range sortedKeys(failedCommitments) { + reason := failedCommitments[commitmentUUID] reasonBuilder.WriteString(fmt.Sprintf("\n- commitment %s: %s", commitmentUUID, reason)) }🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed. In `@internal/scheduling/reservations/commitments/api_change_commitments.go` around lines 263 - 265, The RejectionReason text is non-deterministic because failedCommitments (a map) is iterated directly; to fix, collect the map keys from failedCommitments, sort them (e.g., sort.Strings(keys)), then iterate the sorted keys to append lines to reasonBuilder (instead of ranging the map), and add the necessary import for sort if not present; reference the failedCommitments variable and reasonBuilder.WriteString usage to locate where to change iteration order.internal/scheduling/reservations/commitments/api.go (1)
26-30: Normalize invalid config values inNewAPIWithConfig().Non-positive durations can cause immediate rollbacks or effectively hot polling downstream. Consider clamping to defaults at construction time.
♻️ Proposed hardening
func NewAPIWithConfig(client client.Client, config Config) *HTTPAPI { + defaults := DefaultConfig() + if config.ChangeAPIWatchReservationsTimeout <= 0 { + config.ChangeAPIWatchReservationsTimeout = defaults.ChangeAPIWatchReservationsTimeout + } + if config.ChangeAPIWatchReservationsPollInterval <= 0 { + config.ChangeAPIWatchReservationsPollInterval = defaults.ChangeAPIWatchReservationsPollInterval + } return &HTTPAPI{ client: client, config: config, } }🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed. In `@internal/scheduling/reservations/commitments/api.go` around lines 26 - 30, NewAPIWithConfig currently assigns config as-is which can contain non-positive durations that cause immediate rollbacks or tight polling; update NewAPIWithConfig to normalize/clamp duration fields on the passed Config (e.g., PollInterval, RetryInterval, CommitTimeout or whatever duration fields exist) to sensible defaults before storing on the returned *HTTPAPI. Implement by copying config into a local variable, check each duration field and if <= 0 set it to the appropriate default constant (create DefaultPollInterval/DefaultRetryInterval/DefaultCommitTimeout if they don't exist), then return &HTTPAPI{client: client, config: normalizedConfig}.
🤖 Prompt for all review comments with AI agents
Verify each finding against the current code and only fix it if needed.
Inline comments:
In `@internal/scheduling/reservations/commitments/api_change_commitments.go`:
- Around line 246-248: The loop that populates failedCommitments incorrectly
attributes every entry in failedReservations to "not sufficient capacity";
inspect each reservation failure (from failedReservations produced by
watchReservationsUntilReady) and determine the actual cause—e.g., distinguish
capacity exhaustion vs an error returned from
Get()/watchReservationsUntilReady—and set
failedCommitments[res.Spec.CommittedResourceReservation.CommitmentUUID] to a
message derived from the real failure (propagate the error string or use a
specific "insufficient capacity" marker only when capacity is known), ensuring
you reference failedReservations, failedCommitments,
res.Spec.CommittedResourceReservation.CommitmentUUID and the watcher
(watchReservationsUntilReady/Get) when making the change.
- Around line 350-366: The loop currently only returns when allChecked is true,
but when every reservation is terminal (ConditionFalse) stillWaiting is empty
while allChecked remains false, causing an extra sleep; update the post-loop
termination check in the same block that inspects readyCond/collections
(readyCond, allChecked, stillWaiting, failedReservations, errors) to also return
immediately when len(stillWaiting) == 0 by logging and returning
failedReservations, errors (or set allChecked true) so the function exits
without another poll cycle.
---
Duplicate comments:
In `@internal/scheduling/reservations/commitments/api_change_commitments_test.go`:
- Around line 1118-1123: The host selection loop is nondeterministic because it
iterates over the map env.availableResources; change it to iterate in a
deterministic order by extracting the map keys into a slice, sorting that slice
(e.g., alphabetically or by a stable criterion), then loop over the sorted keys
to pick the first host with availableMB >= memoryMB and assign selectedHost.
Update the loop around env.availableResources and preserve the same comparison
using memoryMB and selectedHost.
- Around line 756-758: The test uses shared maps availableResources and
processedReserv that are accessed concurrently by the poller goroutine and
interceptor callbacks causing data races; protect these maps by adding
synchronization (e.g., a sync.Mutex or sync.RWMutex field on the struct that
owns availableResources/processedReserv or replace them with sync.Map) and
ensure every read/write to availableResources and processedReserv (including in
the poller goroutine, interceptor callback functions, and any helpers referenced
in the test) acquires the appropriate lock or uses the sync.Map APIs
consistently to prevent concurrent map writes.
- Around line 769-777: OnReservationDeleted is returning host memory and
processReservations also credits memory on deletion, causing double-counting;
remove or guard the credit in OnReservationDeleted so only one path adjusts
c.env.availableResources. Specifically, either delete the memory-crediting block
in FakeReservationController.OnReservationDeleted (the math using
res.Spec.Resources["memory"], Value(), memoryMB and the addition to
c.env.availableResources[res.Status.Host]) or add a clear guard (e.g., check a
deletion-processed flag or reservation status) so processReservations remains
the single place that refunds capacity; ensure references to
OnReservationDeleted and processReservations are used to locate and fix the
duplicated logic.
---
Nitpick comments:
In `@internal/scheduling/reservations/commitments/api_change_commitments_test.go`:
- Around line 868-871: Replace the manual loop that clones resources.PerHost
into availableResources with a single maps.Copy call: create availableResources
as make(map[string]int64) and call maps.Copy(availableResources,
resources.PerHost); also add the appropriate import for the maps package
(standard library "maps" or project's approved maps package) if not already
imported. This keeps the same semantics but uses maps.Copy for clarity and
consistency.
In `@internal/scheduling/reservations/commitments/api_change_commitments.go`:
- Around line 263-265: The RejectionReason text is non-deterministic because
failedCommitments (a map) is iterated directly; to fix, collect the map keys
from failedCommitments, sort them (e.g., sort.Strings(keys)), then iterate the
sorted keys to append lines to reasonBuilder (instead of ranging the map), and
add the necessary import for sort if not present; reference the
failedCommitments variable and reasonBuilder.WriteString usage to locate where
to change iteration order.
In `@internal/scheduling/reservations/commitments/api.go`:
- Around line 26-30: NewAPIWithConfig currently assigns config as-is which can
contain non-positive durations that cause immediate rollbacks or tight polling;
update NewAPIWithConfig to normalize/clamp duration fields on the passed Config
(e.g., PollInterval, RetryInterval, CommitTimeout or whatever duration fields
exist) to sensible defaults before storing on the returned *HTTPAPI. Implement
by copying config into a local variable, check each duration field and if <= 0
set it to the appropriate default constant (create
DefaultPollInterval/DefaultRetryInterval/DefaultCommitTimeout if they don't
exist), then return &HTTPAPI{client: client, config: normalizedConfig}.
ℹ️ Review info
⚙️ Run configuration
Configuration used: Path: .coderabbit.yaml
Review profile: CHILL
Plan: Pro
Run ID: 28e5237c-20a5-4a5e-982a-5c13709a373e
📒 Files selected for processing (4)
internal/scheduling/reservations/commitments/api.gointernal/scheduling/reservations/commitments/api_change_commitments.gointernal/scheduling/reservations/commitments/api_change_commitments_test.gointernal/scheduling/reservations/commitments/config.go
There was a problem hiding this comment.
Actionable comments posted: 1
♻️ Duplicate comments (2)
internal/scheduling/reservations/commitments/api_change_commitments_test.go (1)
771-790:⚠️ Potential issue | 🟠 MajorDeletion may double-credit host memory.
OnReservationDeletedreturns memory at lines 776-785, andprocessReservationsalso returns memory for reservations with deletion timestamps at lines 1049-1067. If a reservation has finalizers, both paths execute for the same reservation, causing available capacity to be inflated.🔧 Suggested fix
Track whether memory was already returned, or consolidate to a single path:
func (c *FakeReservationController) OnReservationDeleted(res *v1alpha1.Reservation) { - c.env.mu.Lock() - defer c.env.mu.Unlock() - - // Return memory when Delete() is called directly (before deletion timestamp is set) - if c.env.availableResources != nil && res.Status.Host != "" { - memoryQuantity := res.Spec.Resources["memory"] - memoryBytes := memoryQuantity.Value() - memoryMB := memoryBytes / (1024 * 1024) - - if _, exists := c.env.availableResources[res.Status.Host]; exists { - c.env.availableResources[res.Status.Host] += memoryMB - c.env.T.Logf("↩ Returned %d MB to %s (now %d MB available) via OnReservationDeleted for %s", - memoryMB, res.Status.Host, c.env.availableResources[res.Status.Host], res.Name) - } - } - + c.env.mu.Lock() // Clear tracking so recreated reservations with same name are processed delete(c.env.processedReserv, res.Name) + c.env.mu.Unlock() + // Memory return is handled by processReservations when deletion timestamp is set }This consolidates memory return to
processReservations, which handles reservations after the deletion timestamp is set.🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed. In `@internal/scheduling/reservations/commitments/api_change_commitments_test.go` around lines 771 - 790, OnReservationDeleted currently returns memory to env.availableResources and processReservations also returns memory when a deletion timestamp is present, causing double-crediting for reservations with finalizers; remove the memory-return logic from OnReservationDeleted (the block that reads res.Spec.Resources["memory"] and updates env.availableResources) and instead rely on processReservations to perform the single canonical return; ensure processedReserv and any tracking for recreated reservations remain handled by delete(c.env.processedReserv, res.Name) so only processReservations adjusts env.availableResources when it observes res.DeletionTimestamp/finalizers.internal/scheduling/reservations/commitments/api_change_commitments.go (1)
245-247:⚠️ Potential issue | 🟠 MajorDon't map all watch failures to "not sufficient capacity".
failedReservationscontains reservations whereReady=False, but the failure reason could be something other than capacity exhaustion. The actual failure reason should be extracted from the reservation's condition message.🔧 Suggested approach
for _, res := range failedReservations { - failedCommitments[res.Spec.CommittedResourceReservation.CommitmentUUID] = "not sufficient capacity" + reason := "scheduling failed" + readyCond := meta.FindStatusCondition(res.Status.Conditions, v1alpha1.ReservationConditionReady) + if readyCond != nil && readyCond.Message != "" { + reason = readyCond.Message + } + failedCommitments[res.Spec.CommittedResourceReservation.CommitmentUUID] = reason }🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed. In `@internal/scheduling/reservations/commitments/api_change_commitments.go` around lines 245 - 247, The loop that populates failedCommitments currently assigns the hardcoded string "not sufficient capacity" for every entry; instead, inspect each reservation's Ready condition (e.g., check res.Status.Conditions for Type == "Ready" and Status == "False") and use the condition's Message (or Reason if Message empty) as the failure text when setting failedCommitments[res.Spec.CommittedResourceReservation.CommitmentUUID]; if no Ready condition or no message is present, fall back to a generic message like "unknown failure". Ensure this change is applied in the same loop that iterates failedReservations so the actual failure reason is recorded.
🧹 Nitpick comments (3)
internal/scheduling/reservations/commitments/api_change_commitments_test.go (2)
1588-1613: Unused function.
newHypervisorWithAZis defined but never called. If it's intended for future tests, consider adding a comment explaining its purpose; otherwise, remove it.🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed. In `@internal/scheduling/reservations/commitments/api_change_commitments_test.go` around lines 1588 - 1613, The helper function newHypervisorWithAZ is defined but unused; either remove it or document/use it: if it's unnecessary, delete the newHypervisorWithAZ function to avoid dead code; if it's intended for future tests, add a short comment above newHypervisorWithAZ explaining its purpose and why it remains unused or update existing tests to call newHypervisorWithAZ (instead of newHypervisor) where an availability-zone label is needed (referencing the newHypervisorWithAZ symbol to locate the code).
606-609: Unused type definition.
ReservationVerificationis defined but never used in this file. Consider removing it to reduce dead code.🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed. In `@internal/scheduling/reservations/commitments/api_change_commitments_test.go` around lines 606 - 609, Remove the unused type ReservationVerification from the test file to eliminate dead code: locate the type declaration "type ReservationVerification struct { Host string; Allocations map[string]string }" and delete it (or, if it was intended to be used, replace its removal by actually using it in relevant tests such as those in api_change_commitments_test.go); ensure no other symbols reference ReservationVerification before removing to avoid compile errors.internal/scheduling/reservations/commitments/api_change_commitments.go (1)
172-172: Use camelCase for variable names.Go convention uses camelCase for local variables. Consider renaming to
allReservations.🔧 Suggested fix
- var all_reservations v1alpha1.ReservationList + var allReservations v1alpha1.ReservationListAlso apply to related references.
🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed. In `@internal/scheduling/reservations/commitments/api_change_commitments.go` at line 172, The local variable all_reservations should follow Go camelCase conventions; rename the variable to allReservations and update every usage/reference in the surrounding function(s) (e.g., where all_reservations is declared and later read/written) including any return values or assignments that reference it so compilation and behavior remain unchanged.
🤖 Prompt for all review comments with AI agents
Verify each finding against the current code and only fix it if needed.
Inline comments:
In `@internal/scheduling/reservations/commitments/api_change_commitments_test.go`:
- Around line 1615-1625: The createCommitment function accepts az ...string but
discards it; update it to either persist the AZ into the TestCommitment struct
and propagate it into buildRequestJSON, or remove the az parameter and update
callers (e.g., the "Reservation repair" test) accordingly; specifically, add a
field (e.g., AZ or AvailabilityZone) to TestCommitment, set it in
createCommitment (use az[0] when len(az)>0), and modify buildRequestJSON to
include that field in the generated request, or alternatively delete the az
...string parameter from createCommitment and remove the AZ argument from tests.
---
Duplicate comments:
In `@internal/scheduling/reservations/commitments/api_change_commitments_test.go`:
- Around line 771-790: OnReservationDeleted currently returns memory to
env.availableResources and processReservations also returns memory when a
deletion timestamp is present, causing double-crediting for reservations with
finalizers; remove the memory-return logic from OnReservationDeleted (the block
that reads res.Spec.Resources["memory"] and updates env.availableResources) and
instead rely on processReservations to perform the single canonical return;
ensure processedReserv and any tracking for recreated reservations remain
handled by delete(c.env.processedReserv, res.Name) so only processReservations
adjusts env.availableResources when it observes
res.DeletionTimestamp/finalizers.
In `@internal/scheduling/reservations/commitments/api_change_commitments.go`:
- Around line 245-247: The loop that populates failedCommitments currently
assigns the hardcoded string "not sufficient capacity" for every entry; instead,
inspect each reservation's Ready condition (e.g., check res.Status.Conditions
for Type == "Ready" and Status == "False") and use the condition's Message (or
Reason if Message empty) as the failure text when setting
failedCommitments[res.Spec.CommittedResourceReservation.CommitmentUUID]; if no
Ready condition or no message is present, fall back to a generic message like
"unknown failure". Ensure this change is applied in the same loop that iterates
failedReservations so the actual failure reason is recorded.
---
Nitpick comments:
In `@internal/scheduling/reservations/commitments/api_change_commitments_test.go`:
- Around line 1588-1613: The helper function newHypervisorWithAZ is defined but
unused; either remove it or document/use it: if it's unnecessary, delete the
newHypervisorWithAZ function to avoid dead code; if it's intended for future
tests, add a short comment above newHypervisorWithAZ explaining its purpose and
why it remains unused or update existing tests to call newHypervisorWithAZ
(instead of newHypervisor) where an availability-zone label is needed
(referencing the newHypervisorWithAZ symbol to locate the code).
- Around line 606-609: Remove the unused type ReservationVerification from the
test file to eliminate dead code: locate the type declaration "type
ReservationVerification struct { Host string; Allocations map[string]string }"
and delete it (or, if it was intended to be used, replace its removal by
actually using it in relevant tests such as those in
api_change_commitments_test.go); ensure no other symbols reference
ReservationVerification before removing to avoid compile errors.
In `@internal/scheduling/reservations/commitments/api_change_commitments.go`:
- Line 172: The local variable all_reservations should follow Go camelCase
conventions; rename the variable to allReservations and update every
usage/reference in the surrounding function(s) (e.g., where all_reservations is
declared and later read/written) including any return values or assignments that
reference it so compilation and behavior remain unchanged.
ℹ️ Review info
⚙️ Run configuration
Configuration used: Path: .coderabbit.yaml
Review profile: CHILL
Plan: Pro
Run ID: 6768eae7-ae84-44b5-8ebe-94aa4a217274
📒 Files selected for processing (2)
internal/scheduling/reservations/commitments/api_change_commitments.gointernal/scheduling/reservations/commitments/api_change_commitments_test.go
| // createCommitment creates a TestCommitment for use in test cases. | ||
| // The az parameter is optional - if empty string, no AZ constraint is set. | ||
| func createCommitment(resourceName, projectID, confirmationID, state string, amount uint64, az ...string) TestCommitment { | ||
| return TestCommitment{ | ||
| ResourceName: liquid.ResourceName(resourceName), | ||
| ProjectID: projectID, | ||
| ConfirmationID: confirmationID, | ||
| State: state, | ||
| Amount: amount, | ||
| } | ||
| } |
There was a problem hiding this comment.
The az parameter is accepted but never used.
The variadic az ...string parameter is declared but never used in the function body. Test case "Reservation repair" at line 366 passes "az-a" expecting it to affect the commitment, but this value is silently discarded.
🐛 Suggested fix
If the az parameter is intended for future use with per-commitment AZ constraints, either:
- Store it in
TestCommitmentand use it inbuildRequestJSON:
type TestCommitment struct {
ResourceName liquid.ResourceName
ProjectID string
ConfirmationID string
State string
Amount uint64
+ AZ string // optional AZ constraint
}
func createCommitment(resourceName, projectID, confirmationID, state string, amount uint64, az ...string) TestCommitment {
- return TestCommitment{
+ tc := TestCommitment{
ResourceName: liquid.ResourceName(resourceName),
ProjectID: projectID,
ConfirmationID: confirmationID,
State: state,
Amount: amount,
}
+ if len(az) > 0 {
+ tc.AZ = az[0]
+ }
+ return tc
}- Or remove the unused parameter to avoid confusion:
-func createCommitment(resourceName, projectID, confirmationID, state string, amount uint64, az ...string) TestCommitment {
+func createCommitment(resourceName, projectID, confirmationID, state string, amount uint64) TestCommitment {🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed.
In `@internal/scheduling/reservations/commitments/api_change_commitments_test.go`
around lines 1615 - 1625, The createCommitment function accepts az ...string but
discards it; update it to either persist the AZ into the TestCommitment struct
and propagate it into buildRequestJSON, or remove the az parameter and update
callers (e.g., the "Reservation repair" test) accordingly; specifically, add a
field (e.g., AZ or AvailabilityZone) to TestCommitment, set it in
createCommitment (use az[0] when len(az)>0), and modify buildRequestJSON to
include that field in the generated request, or alternatively delete the az
...string parameter from createCommitment and remove the AZ argument from tests.
There was a problem hiding this comment.
♻️ Duplicate comments (2)
internal/scheduling/reservations/commitments/api_change_commitments_test.go (2)
771-790:⚠️ Potential issue | 🟠 MajorCapacity can be double-credited on reservation deletion.
At Line 775 memory is refunded in
OnReservationDeleted(), and memory is refunded again in deletion handling at Line 1048. A single delete flow can therefore inflateavailableResources.Suggested direction
Choose one canonical refund path. For this harness, keeping refunds in
processReservations()and removing them fromOnReservationDeleted()avoids double accounting.func (c *FakeReservationController) OnReservationDeleted(res *v1alpha1.Reservation) { c.env.mu.Lock() defer c.env.mu.Unlock() - // Return memory when Delete() is called directly (before deletion timestamp is set) - if c.env.availableResources != nil && res.Status.Host != "" { - memoryQuantity := res.Spec.Resources["memory"] - memoryBytes := memoryQuantity.Value() - memoryMB := memoryBytes / (1024 * 1024) - - if _, exists := c.env.availableResources[res.Status.Host]; exists { - c.env.availableResources[res.Status.Host] += memoryMB - c.env.T.Logf("↩ Returned %d MB to %s (now %d MB available) via OnReservationDeleted for %s", - memoryMB, res.Status.Host, c.env.availableResources[res.Status.Host], res.Name) - } - } - // Clear tracking so recreated reservations with same name are processed delete(c.env.processedReserv, res.Name) }🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed. In `@internal/scheduling/reservations/commitments/api_change_commitments_test.go` around lines 771 - 790, The OnReservationDeleted method currently refunds memory to c.env.availableResources, which duplicates the refund already done in processReservations and can double-credit capacity; remove the refund logic inside OnReservationDeleted (the block that reads res.Spec.Resources["memory"], computes memoryMB and adds it to c.env.availableResources) so only processReservations performs refunds, and keep the final delete(c.env.processedReserv, res.Name) behavior intact; alternatively, if you prefer to keep OnReservationDeleted, add a clear guard (e.g., a deletion-handled flag on the reservation or env) that processReservations sets so OnReservationDeleted skips refunding when processReservations already refunded.
1615-1625:⚠️ Potential issue | 🟡 Minor
az ...stringis accepted but ignored increateCommitment().Line 1617 advertises optional AZ handling, but the parameter is never persisted or serialized. This is misleading, especially with callers like Line 366 passing AZ explicitly.
Suggested cleanup
If per-commitment AZ is not supported in request JSON, remove the parameter and update callers:
-func createCommitment(resourceName, projectID, confirmationID, state string, amount uint64, az ...string) TestCommitment { +func createCommitment(resourceName, projectID, confirmationID, state string, amount uint64) TestCommitment { return TestCommitment{ ResourceName: liquid.ResourceName(resourceName), ProjectID: projectID, ConfirmationID: confirmationID, State: state, Amount: amount, } }And update Line 366 call accordingly.
🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed. In `@internal/scheduling/reservations/commitments/api_change_commitments_test.go` around lines 1615 - 1625, The createCommitment function signature advertises an optional az ...string but never stores it; remove the unused az parameter instead of leaving misleading API surface: update the createCommitment signature to drop az ...string, remove the az mention from its doc comment, and update all call sites (tests that call createCommitment with an AZ argument) to stop passing AZ; ensure the TestCommitment construction remains unchanged and run tests to confirm no remaining callers reference the removed parameter.
🤖 Prompt for all review comments with AI agents
Verify each finding against the current code and only fix it if needed.
Duplicate comments:
In `@internal/scheduling/reservations/commitments/api_change_commitments_test.go`:
- Around line 771-790: The OnReservationDeleted method currently refunds memory
to c.env.availableResources, which duplicates the refund already done in
processReservations and can double-credit capacity; remove the refund logic
inside OnReservationDeleted (the block that reads res.Spec.Resources["memory"],
computes memoryMB and adds it to c.env.availableResources) so only
processReservations performs refunds, and keep the final
delete(c.env.processedReserv, res.Name) behavior intact; alternatively, if you
prefer to keep OnReservationDeleted, add a clear guard (e.g., a deletion-handled
flag on the reservation or env) that processReservations sets so
OnReservationDeleted skips refunding when processReservations already refunded.
- Around line 1615-1625: The createCommitment function signature advertises an
optional az ...string but never stores it; remove the unused az parameter
instead of leaving misleading API surface: update the createCommitment signature
to drop az ...string, remove the az mention from its doc comment, and update all
call sites (tests that call createCommitment with an AZ argument) to stop
passing AZ; ensure the TestCommitment construction remains unchanged and run
tests to confirm no remaining callers reference the removed parameter.
ℹ️ Review info
⚙️ Run configuration
Configuration used: Path: .coderabbit.yaml
Review profile: CHILL
Plan: Pro
Run ID: 57044766-a4ce-46b0-a6cc-765fa70d26a1
📒 Files selected for processing (2)
internal/scheduling/reservations/commitments/api.gointernal/scheduling/reservations/commitments/api_change_commitments_test.go
Test Coverage ReportTest Coverage 📊: 69.6% |
Uh oh!
There was an error while loading. Please reload this page.