Implement placement shim e2e tests#715
Conversation
|
Warning Rate limit exceeded
Your organization is not enrolled in usage-based pricing. Contact your admin to enable usage-based pricing to continue reviews beyond the rate limit, or try again in 36 minutes and 26 seconds. ⌛ How to resolve this issue?After the wait time has elapsed, a review can be triggered using the We recommend that you space out your commits to avoid hitting the rate limit. 🚦 How do rate limits work?CodeRabbit enforces hourly rate limits for each developer per organization. Our paid plans have higher rate limits than the trial, open-source and free plans. In all cases, we re-allow further reviews after a brief timeout. Please see our FAQ for further information. ℹ️ Review info⚙️ Run configurationConfiguration used: Path: .coderabbit.yaml Review profile: CHILL Plan: Pro Run ID: 📒 Files selected for processing (12)
📝 WalkthroughWalkthroughThe pull request implements end-to-end tests for the placement shim by replacing placeholder test stubs with full HTTP-based integration tests. Changes include updating the CLI flag format in the entrypoint, adding e2e configuration to Helm values, implementing 12 test functions that exercise placement API endpoints, and enhancing logging with structured formats. Changes
Estimated code review effort🎯 4 (Complex) | ⏱️ ~70 minutes Possibly related PRs
Suggested reviewers
Poem
🚥 Pre-merge checks | ✅ 2 | ❌ 1❌ Failed checks (1 warning)
✅ Passed checks (2 passed)
✏️ Tip: You can configure your own custom pre-merge checks in the settings. ✨ Finishing Touches🧪 Generate unit tests (beta)
Thanks for using CodeRabbit! It's free for OSS, and your support helps us grow. If you like it, consider giving us a shout-out. Comment |
There was a problem hiding this comment.
Actionable comments posted: 20
🧹 Nitpick comments (2)
cmd/shim/main.go (1)
115-121: Consider usingsetupLogfor consistent logging.This block uses the standard library
log.Fatalf(line 118), butctrl.SetLoggerhas already been called (line 113), sosetupLogis available. UsingsetupLog.Error(err, ...)followed byos.Exit(1)would be consistent with the error handling pattern used throughout the rest of this file.♻️ Suggested change for consistent logging
// Custom entrypoint for placement shim e2e tests. if runPlacementShimE2E { if err := placement.RunE2E(ctx); err != nil { - log.Fatalf("E2E tests failed: %v", err) + setupLog.Error(err, "E2E tests failed") + os.Exit(1) } os.Exit(0) }🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed. In `@cmd/shim/main.go` around lines 115 - 121, Replace the use of the standard library logger in the runPlacementShimE2E branch with the controller-runtime logger: call placement.RunE2E(ctx) as before but on error use setupLog.Error(err, "E2E tests failed for placement shim") and then call os.Exit(1) (instead of log.Fatalf). Update the block that checks runPlacementShimE2E to reference setupLog and placement.RunE2E to keep logging consistent with ctrl.SetLogger usage.internal/shim/placement/handle_root_e2e.go (1)
32-36: Consider adding a timeout to prevent indefinite hangs.
http.DefaultClienthas no timeout configured. If the placement shim is unresponsive, this test could hang indefinitely. Other e2e tests in this PR usemakeE2EServiceClientwhich provides a client with configured transport timeouts.♻️ Option 1: Use makeE2EServiceClient for consistency
func e2eTestGetRoot(ctx context.Context) error { log := logf.FromContext(ctx) log.Info("Running root endpoint e2e test") config, err := conf.GetConfig[e2eRootConfig]() if err != nil { log.Error(err, "failed to get e2e config") return err } + sc, err := makeE2EServiceClient(ctx, config) + if err != nil { + log.Error(err, "failed to create placement service client") + return err + } req, err := http.NewRequestWithContext(ctx, - http.MethodGet, config.E2E.SVCURL+"/", http.NoBody) + http.MethodGet, sc.Endpoint+"/", http.NoBody) if err != nil { log.Error(err, "failed to create request for placement shim") return err } - resp, err := http.DefaultClient.Do(req) + resp, err := sc.HTTPClient.Do(req)♻️ Option 2: Add explicit timeout to DefaultClient usage
+ client := &http.Client{Timeout: 30 * time.Second} req, err := http.NewRequestWithContext(ctx, http.MethodGet, config.E2E.SVCURL+"/", http.NoBody) if err != nil { log.Error(err, "failed to create request for placement shim") return err } - resp, err := http.DefaultClient.Do(req) + resp, err := client.Do(req)🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed. In `@internal/shim/placement/handle_root_e2e.go` around lines 32 - 36, The call uses http.DefaultClient.Do(req) with no timeout and can hang; replace the DefaultClient usage with a client that has timeouts (either by calling makeE2EServiceClient(...) as used elsewhere in tests or by constructing an explicit http.Client with a Timeout, then call client.Do(req) instead of http.DefaultClient.Do(req)). Update the code around resp, err := http.DefaultClient.Do(req) to use the new client (preserve existing error logging via log.Error) and ensure any required context/parameters used by makeE2EServiceClient are passed through.
🤖 Prompt for all review comments with AI agents
Verify each finding against the current code and only fix it if needed.
Inline comments:
In `@internal/shim/placement/handle_allocation_candidates_e2e.go`:
- Around line 93-95: The slice expression list.Traits[:5] can panic when
list.Traits has fewer than 5 items; change both occurrences (in
handle_allocation_candidates_e2e.go and handle_traits_e2e.go) to compute a safe
bound like n := 5; if len(list.Traits) < n { n = len(list.Traits) } and then use
list.Traits[:n] (or equivalent min(5, len(list.Traits))) wherever
list.Traits[:5] is used (look for the log call around "Successfully created
custom resource class" and the similar place in handle_traits_e2e.go) so slicing
is always within bounds.
- Around line 69-72: The loop in e2eTestAllocationCandidates currently uses
defer resp.Body.Close() which leaks response bodies until the function returns;
replace the defer with an immediate resp.Body.Close() call after you finish
using resp (for example, after the log.Info statement that references
cleanup.url and resp.StatusCode) or use io.Copy(io.Discard, resp.Body) then
resp.Body.Close() to ensure the body is fully read and closed within each
iteration; update the code around the resp variable and remove the defer to
prevent resource leaks.
In `@internal/shim/placement/handle_allocations_e2e.go`:
- Around line 76-83: The loop in handle_allocations_e2e.go calls
sc.HTTPClient.Do(req) and uses defer resp.Body.Close() (variable resp) inside
the loop which defers closing until function exit and leaks connections; change
this to close the response body immediately each iteration by reading/draining
the body (or simply calling resp.Body.Close()) after checking resp and logging
(ensure any error from reading is handled), i.e., remove the defer inside the
loop and call resp.Body.Close() (or io.ReadAll then Close) before the next
iteration so connections are released promptly.
- Around line 415-427: The defer resp.Body.Close() inside the consumer deletion
loop causes response bodies to remain open until the function exits; change it
to close the body immediately after the HTTP call is handled (replace defer with
an explicit resp.Body.Close() call in each control path). In practice, after
sc.HTTPClient.Do(req) returns and after you check resp and construct any error
(the status code branch and the success branch), call resp.Body.Close() before
returning or continuing the loop so every response body is closed promptly;
update the code around sc.HTTPClient.Do(req), resp, and the consumer deletion
loop accordingly.
In `@internal/shim/placement/handle_reshaper_e2e.go`:
- Around line 77-84: The loop currently defers resp.Body.Close() after calling
sc.HTTPClient.Do(req), which accumulates deferred calls and leaks bodies until
the function returns; update the pre-cleanup loop in handle_reshaper_e2e.go to
close the response body immediately after use instead of deferring it—either
call resp.Body.Close() directly after logging the status or wrap the
request/response handling in a short-lived helper function so a defer can run at
the end of that scope; target the sc.HTTPClient.Do(req) call and replace the
defer resp.Body.Close() with an immediate close (or scoped defer) to ensure each
resp.Body is released before the next iteration.
- Around line 165-177: The defer resp.Body.Close() inside the loop that calls
sc.HTTPClient.Do causes response bodies to remain open until function exit;
replace the defer with an explicit close after handling the response:
immediately read and discard the body (e.g., io.Copy(io.Discard, resp.Body)) and
then call resp.Body.Close() before returning or continuing the loop; update the
code paths around the status-check (where you log using rp.uuid and return err)
to close the body before every return, ensuring no response body leaks from
sc.HTTPClient.Do or on error paths.
- Around line 553-565: The defer resp.Body.Close() inside the cleanup loop
causes response bodies to remain open until function exit; replace the defer
with explicit resp.Body.Close() calls so each response body is closed
immediately after handling the response. Specifically, after calling
sc.HTTPClient.Do(req) and after checking resp.StatusCode, call resp.Body.Close()
before returning on error and after logging success (use resp.Body.Close() in
the error branch and the success branch), ensuring every code path in the loop
closes the body; reference sc.HTTPClient.Do, resp.Body.Close, and cleanup.desc
to locate and update the loop.
In `@internal/shim/placement/handle_resource_provider_aggregates_e2e.go`:
- Around line 225-259: After the PUT that clears aggregates, perform a follow-up
GET to /resource_providers/{uuid}/aggregates for the same testRPUUID (using
sc.HTTPClient and ctx) and unmarshal the response body, then assert that the
returned "aggregates" slice is empty (and that the returned
resource_provider_generation matches/has advanced as expected); if not, log and
return an error. In other words, add a postcondition check after the PUT block
to re-fetch aggregates, parse the response, and fail the test when the
aggregates are not actually empty.
- Around line 64-69: The pre-cleanup currently treats HTTP 409 (StatusConflict)
as acceptable; change the conditional in the pre-cleanup check (the block
inspecting resp.StatusCode) to not accept http.StatusConflict—only treat 404 or
2xx as OK. Update the error message/logging in the same block (where err :=
fmt.Errorf(...) and log.Error(...)) to run when resp.StatusCode ==
http.StatusConflict so the function returns an error on 409, ensuring the
subsequent POST will surface the conflict and the test remains idempotent.
- Around line 72-105: After successfully creating the test resource provider
(after the POST that logs "Successfully created test resource provider for
aggregates test"), add a deferred cleanup that always runs on exit to delete the
created provider using the fixed testRPUUID; perform an HTTP DELETE to
sc.Endpoint+"/resource_providers/"+testRPUUID (using sc.HTTPClient and proper
headers like X-Auth-Token and OpenStack-API-Version) and log any error in the
defer so failed assertions do not leave the fixed UUID behind; ensure the defer
is placed immediately after confirming resp.StatusCode is 2xx so it only runs
when creation succeeded.
- Around line 217-223: The current check only verifies len(aggResp.Aggregates)
== 2; instead replace it with a presence check that both expected aggregate
UUIDs are in aggResp.Aggregates using slices.Contains: build or reference an
expected slice (e.g., expectedAggregates containing the two expected UUID
strings), loop or individually assert slices.Contains(aggResp.Aggregates,
expectedID) for each expected ID, and return an error (logged via log.Error with
"uuid", testRPUUID) if any expected ID is missing, including which IDs were
absent in the error message; keep the successful log.Info to print the verified
aggregates.
In `@internal/shim/placement/handle_resource_provider_allocations_e2e.go`:
- Around line 186-199: The loop uses defer rpResp.Body.Close() after
sc.HTTPClient.Do(rpReq), causing response bodies to remain open until function
exit; replace the defer with an explicit rpResp.Body.Close() called after you
finish reading/validating the response (after the status-code check and any
processing) so each rpResp is closed before the next iteration; locate the HTTP
request block that calls sc.HTTPClient.Do(rpReq) and remove the defer
rpResp.Body.Close(), inserting rpResp.Body.Close() at the end of that loop
iteration (after logging and any response handling for rp.UUID).
In `@internal/shim/placement/handle_resource_provider_inventories_e2e.go`:
- Around line 72-79: The defer resp.Body.Close() inside the pre-cleanup loop in
the function invoking sc.HTTPClient.Do(req) causes response bodies to remain
open until function exit and can leak resources; replace the defer with an
explicit resp.Body.Close() (or io.Copy(io.Discard, resp.Body) then
resp.Body.Close()) immediately after you finish using resp (after logging the
status) within each loop iteration so each response body is closed before the
next request; locate the call to sc.HTTPClient.Do(req), the local resp variable
and the cleanup.url logging to apply this change.
In `@internal/shim/placement/handle_resource_provider_traits_e2e.go`:
- Around line 154-165: The decode of the initial traits response currently only
logs the result; add an explicit assertion that traitsResp.Traits is empty for
the fresh provider and fail the test if not: after
json.NewDecoder(...).Decode(&traitsResp) and before the log.Info call, check
len(traitsResp.Traits) == 0 and return an error (or use the test helper)
including testRPUUID and the unexpected traits; this ensures the GET in
handle_resource_provider_traits_e2e.go enforces the initial-empty requirement
rather than merely logging it.
- Around line 75-130: After successfully creating the custom trait and the test
resource provider, register cleanup immediately so failures later do not leave
test data behind: after the "Successfully created custom trait" log (right after
the PUT /traits success for testTrait) add a cleanup registration that deletes
the trait (using sc.Endpoint + "/traits/"+testTrait and sc.HTTPClient) and
likewise right after the "Successfully created test resource provider for RP
traits test" log (after POST /resource_providers success for
testRPUUID/testRPName) add a cleanup registration that deletes the resource
provider (sc.Endpoint + "/resource_providers/"+testRPUUID and sc.HTTPClient);
ensure the cleanup runs unconditionally (use defer or the test framework's
cleanup helper) so any subsequent return err still triggers deletion.
- Around line 58-71: The pre-cleanup loop around http.NewRequestWithContext /
sc.HTTPClient.Do is treating any non-transport response as success; update the
block after receiving resp to validate resp.StatusCode against an explicit
allowed set (e.g., 200, 204, 404 or whatever statuses are acceptable for your
DELETE cleanup) and return an error for any other codes. Read and include the
response body (or status text) in the error, close resp.Body, and replace the
unconditional log.Info("Pre-cleanup request completed", ...) with a success-only
log when the status is in the allowed set; use the existing identifiers (req,
resp, cleanup.url, cleanup.method, sc.HTTPClient) and return the error so the
caller fails fast on 401/403/5xx etc.
In `@internal/shim/placement/handle_resource_provider_usages_e2e.go`:
- Around line 186-199: Inside the loop in handle_resource_provider_usages_e2e
(where rpResp, err := sc.HTTPClient.Do(rpReq) is called), remove the defer
rpResp.Body.Close() and instead close rpResp.Body explicitly once you're done
with the response (e.g., after checking StatusCode and after any
reading/processing) to avoid holding resources across iterations; ensure you
still call rpResp.Body.Close() in both success and error paths (after creating
fmt.Errorf for bad status) so the Response body is always closed.
In `@internal/shim/placement/handle_traits_e2e.go`:
- Line 95: The loop currently slices list.Traits with [:5] which will panic if
len(list.Traits) < 5; change to compute a safe upper bound (e.g., n :=
len(list.Traits); if n > 5 { n = 5 }) and iterate over list.Traits[:n] so the
loop in handle_traits_e2e.go safely handles lists with fewer than five traits;
update any test comments accordingly.
- Around line 107-122: The code defers traitResp.Body.Close() inside the loop
that calls sc.HTTPClient.Do(traitReq), which accumulates open bodies until the
function returns; remove the defer and instead close the response body
explicitly at the end of each loop iteration (e.g., call traitResp.Body.Close()
after checking/reading the response and before continuing to the next trait).
Update the logic around traitResp (the result of sc.HTTPClient.Do) so the body
is closed in all paths (success and error-returning status checks) to avoid
leaking connections.
In `@internal/shim/placement/handle_usages_e2e.go`:
- Line 99: In e2eTestUsages, remove the defer projResp.Body.Close() inside the
loop and instead close projResp.Body immediately after you finish
reading/processing the response for that iteration (e.g., after reading the body
or unmarshalling), ensuring you still handle nil projResp and any read errors;
this prevents holding connections open across iterations and keeps the loop from
leaking HTTP bodies.
---
Nitpick comments:
In `@cmd/shim/main.go`:
- Around line 115-121: Replace the use of the standard library logger in the
runPlacementShimE2E branch with the controller-runtime logger: call
placement.RunE2E(ctx) as before but on error use setupLog.Error(err, "E2E tests
failed for placement shim") and then call os.Exit(1) (instead of log.Fatalf).
Update the block that checks runPlacementShimE2E to reference setupLog and
placement.RunE2E to keep logging consistent with ctrl.SetLogger usage.
In `@internal/shim/placement/handle_root_e2e.go`:
- Around line 32-36: The call uses http.DefaultClient.Do(req) with no timeout
and can hang; replace the DefaultClient usage with a client that has timeouts
(either by calling makeE2EServiceClient(...) as used elsewhere in tests or by
constructing an explicit http.Client with a Timeout, then call client.Do(req)
instead of http.DefaultClient.Do(req)). Update the code around resp, err :=
http.DefaultClient.Do(req) to use the new client (preserve existing error
logging via log.Error) and ensure any required context/parameters used by
makeE2EServiceClient are passed through.
🪄 Autofix (Beta)
Fix all unresolved CodeRabbit comments on this PR:
- Push a commit to this branch (recommended)
- Create a new PR with the fixes
ℹ️ Review info
⚙️ Run configuration
Configuration used: Path: .coderabbit.yaml
Review profile: CHILL
Plan: Pro
Run ID: c258e977-6f18-4b5a-b1a7-832d282b13a4
📒 Files selected for processing (18)
Tiltfilecmd/shim/main.gohelm/bundles/cortex-placement-shim/values.yamlinternal/shim/placement/handle_allocation_candidates_e2e.gointernal/shim/placement/handle_allocations_e2e.gointernal/shim/placement/handle_reshaper_e2e.gointernal/shim/placement/handle_resource_classes_e2e.gointernal/shim/placement/handle_resource_provider_aggregates_e2e.gointernal/shim/placement/handle_resource_provider_allocations_e2e.gointernal/shim/placement/handle_resource_provider_inventories_e2e.gointernal/shim/placement/handle_resource_provider_traits_e2e.gointernal/shim/placement/handle_resource_provider_usages_e2e.gointernal/shim/placement/handle_resource_providers_e2e.gointernal/shim/placement/handle_root_e2e.gointernal/shim/placement/handle_traits_e2e.gointernal/shim/placement/handle_usages_e2e.gointernal/shim/placement/shim.gointernal/shim/placement/shim_e2e.go
- Fix defer resp.Body.Close() inside loops (resource leak) in all e2e files - Fix list.Traits[:5] slice bounds panic when fewer than 5 traits exist - Aggregates: verify actual UUIDs not just count, verify empty after clear, add deferred cleanup, remove 409 acceptance in pre-cleanup - RP traits: validate pre-cleanup status codes, assert initial traits empty, add deferred cleanup - Root test: use makeE2EServiceClient instead of http.DefaultClient - cmd/shim/main.go: use setupLog instead of log.Fatalf for consistency
Test Coverage ReportTest Coverage 📊: 68.8% |
Implements end-to-end tests for all placement shim API endpoints. Each test
runs against a live placement instance, creates isolated test resources with
deterministic UUIDs, exercises the full CRUD lifecycle, and cleans up after
itself.
Design:
Idempotent: every test does a pre-cleanup pass (ignoring 404s) before
creating fixtures, so re-runs never fail due to leftover state.
Isolated: each test uses unique deterministic UUIDs (e2e10000-...-0001
through ...000a) and CUSTOM_CORTEX_E2E_* resource names to avoid
collisions with real data or other tests.
Infrastructure: shim_e2e.go provides shared config (e2eRootConfig),
OpenStack client setup (makeE2EServiceClient), and the test runner
(RunE2E).