ci: run helm tests with ctlptl registry#211
Conversation
2b46fa1 to
2599915
Compare
|
Note Reviews pausedIt looks like this branch is under active development. To avoid overwhelming you with review comments due to an influx of new commits, CodeRabbit has automatically paused this review. You can configure this behavior by changing the Use the following commands to manage reviews:
Use the checkboxes below for quick actions:
📝 WalkthroughWalkthroughAdds a Estimated code review effort🎯 3 (Moderate) | ⏱️ ~25 minutes 🚥 Pre-merge checks | ✅ 4✅ Passed checks (4 passed)
✏️ Tip: You can configure your own custom pre-merge checks in the settings. ✨ Finishing Touches🧪 Generate unit tests (beta)
Tip 💬 Introducing Slack Agent: The best way for teams to turn conversations into code.Slack Agent is built on CodeRabbit's deep understanding of your code, so your team can collaborate across the entire SDLC without losing context.
Built for teams:
One agent for your entire SDLC. Right inside Slack. Comment |
There was a problem hiding this comment.
Actionable comments posted: 6
🤖 Prompt for all review comments with AI agents
Verify each finding against the current code and only fix it if needed.
Inline comments:
In @.github/workflows/operator-ci.yaml:
- Around line 121-126: Add SHA256 verification to the "Install kind" step:
download the kind binary (currently saved to /tmp/kind), also download the
corresponding kind-linux-amd64.sha256sum for v0.31.0, run sha256sum -c against
the downloaded checksum file and only then run sudo install /tmp/kind
/usr/local/bin/kind; update the workflow's "Install kind" run block to perform
these three actions in sequence. For "Install ctlptl" note that upstream does
not provide checksums so either document that limitation or replace with an
out-of-band pinned verification mechanism if you want parity with kind.
In `@docs/development.md`:
- Around line 11-15: Update the documented ctlptl install command to pin the
same version CI uses instead of `@latest`; replace the current example line `go
install github.com/tilt-dev/ctlptl/cmd/ctlptl@latest` with the explicit version
used in CI (v0.9.3) so the docs read `go install
github.com/tilt-dev/ctlptl/cmd/ctlptl@v0.9.3`, and add a short note to keep this
value in sync with the CI configuration when that CI version is bumped.
In `@k8s-tests/chainsaw/helm/helm-chart-test/assert-scheduled.yaml`:
- Line 52: The assertion currently hard-codes the allowed image values (the
matcher comparing image to 'ghcr.io/nvidia/skyhook/operator:latest' or
'localhost:5005/skyhook-operator:testing'), which rejects valid overrides;
update the matcher to accept the actual override by comparing the pod/container
image against the LOCAL_OPERATOR_IMG value (or allow any tag under the localhost
repo) instead of a single hard-coded testing tag — e.g., change the condition to
check image == LOCAL_OPERATOR_IMG or to match a prefix like image startsWith
'localhost:5005/skyhook-operator:' so overrides such as
'localhost:5005/skyhook-operator:dev' pass; adjust references to the image
variable used in assert-scheduled.yaml accordingly.
In `@k8s-tests/chainsaw/helm/install-helm-chart.sh`:
- Around line 33-50: The script extracts LOCAL_OPERATOR_IMG_DIGEST but then
discards it when building LOCAL_OPERATOR_IMG_ARGS; change the third --set entry
in LOCAL_OPERATOR_IMG_ARGS to pass the parsed digest instead of an empty string
(use --set
"controllerManager.manager.image.digest=${LOCAL_OPERATOR_IMG_DIGEST}"), and
ensure the parsing logic around LOCAL_OPERATOR_IMG_WITHOUT_DIGEST /
LOCAL_OPERATOR_IMG_TAG /LOCAL_OPERATOR_IMG_LAST_PART remains intact so
repo:tag@digest and repo@digest forms resolve correctly into
LOCAL_OPERATOR_IMG_DIGEST and LOCAL_OPERATOR_IMG_REPOSITORY/TAG before building
LOCAL_OPERATOR_IMG_ARGS.
In `@operator/config/local-dev/kind-config.yaml`:
- Around line 17-19: The top-of-file comment claiming "# Retained for review
only." is incorrect because this KinD config is still used as the default for
non-helm test suites (see operator-ci.yaml); update the comment in
local-dev/kind-config.yaml to accurately reflect current usage (e.g., "Default
KinD config used by non-helm test suites / CI") or remove the misleading phrase
so future contributors won't assume it is unused; ensure the updated comment
clearly states it is the three-node (two-worker) KinD config used by the CI
non-helm suites.
In `@operator/Makefile`:
- Around line 292-294: The create-kind-cluster target is not idempotent because
setup-kind-cluster uses `kubectl create secret` for `node-init-secret` which
fails on re-runs; update setup-kind-cluster to make secret creation idempotent
(e.g., replace `kubectl create secret` with an apply pattern such as `kubectl
create secret ... --dry-run=client -o yaml | kubectl apply -f -` or check for
the secret and patch/update it), ensuring any references to `node-init-secret`
are handled safely so create-kind-cluster can be re-run without triggering
AlreadyExists errors.
🪄 Autofix (Beta)
Fix all unresolved CodeRabbit comments on this PR:
- Push a commit to this branch (recommended)
- Create a new PR with the fixes
ℹ️ Review info
⚙️ Run configuration
Configuration used: Path: .coderabbit.yaml
Review profile: ASSERTIVE
Plan: Enterprise
Run ID: 7755fb93-131f-48a4-a31c-de9430d251da
📒 Files selected for processing (8)
.github/workflows/operator-ci.yamldocs/development.mdk8s-tests/chainsaw/helm/helm-chart-test/assert-no-schedule.yamlk8s-tests/chainsaw/helm/helm-chart-test/assert-scheduled.yamlk8s-tests/chainsaw/helm/install-helm-chart.shoperator/Makefileoperator/config/local-dev/ctlptl-config.yamloperator/config/local-dev/kind-config.yaml
Signed-off-by: AnouarMohamed <m.anouar@mundiapolis.ma>
2599915 to
9fa25ef
Compare
lockwobr
left a comment
There was a problem hiding this comment.
Review
Thanks for picking this up — overall direction is right and the script-level handling of LOCAL_OPERATOR_IMG (digest/tag parsing) is clean. A few changes I'd like to see before merge.
Acceptance criteria check (against #209)
| # | Criterion | Status |
|---|---|---|
| 1 | make create-kind-cluster brings up kind + registry at localhost:5005 |
✅ |
| 2 | make helm-tests installs locally-built image with no chart edits / no manual --set |
✅ |
| 3 | helm-tests matrix entry runs in operator-ci.yaml |
✅ (first time helm-tests is in GH CI — nice) |
| 4 | Webhook iteration loop (build → push → rollout restart) | |
| 5 | Teardown leaves no kind cluster or registry | make target — only raw ctlptl delete is documented |
| 6 | Local-dev docs updated | ✅ |
| — | "Single source of truth for the local cluster shape" | ❌ — fine to defer; just need an accurate comment (see #3) |
Required changes
1. Hide docker behind $(DOCKER_CMD) and wrap the iteration loop in make targets. DOCKER_CMD ?= podman (Makefile:54) is the project default, but docs/development.md documents raw docker tag / docker push, which silently breaks for podman users. The fix is to wrap the tag/push in a make target — that also removes the duplication with the helm-tests recipe and stops hardcoding ghcr.io/nvidia/skyhook/operator:testing (which becomes wrong if anyone overrides IMG_REPO):
.PHONY: push-local-image
push-local-image: docker-build ## tag and push the operator image to the local ctlptl registry
$(DOCKER_CMD) tag $(IMG):testing $(LOCAL_OPERATOR_IMG)
$(DOCKER_CMD) push $(LOCAL_OPERATOR_IMG)
.PHONY: rollout-local
rollout-local: push-local-image ## push locally-built operator and restart the deployment
$(KUBECTL) -n $(SKYHOOK_NAMESPACE) rollout restart deploy/skyhook-operator
.PHONY: delete-kind-cluster
delete-kind-cluster: ctlptl ## delete the local ctlptl kind cluster and registry
$(CTLPTL) delete -f config/local-dev/ctlptl-config.yamlhelm-tests then collapses to:
helm-tests: helm chainsaw ctlptl ensure-test-symlinks push-local-image
$(MAKE) run
$(MAKE) uninstall ignore-not-found=true
$(MAKE) kill
LOCAL_OPERATOR_IMG=$(LOCAL_OPERATOR_IMG) $(CHAINSAW) test --test-dir ../k8s-tests/chainsaw/helm $(CHAINSAW_ARGS)docs/development.md becomes:
cd operator
make rollout-local # build + tag + push + restart
make delete-kind-cluster # teardown2. Move ctlptl install into deps.mk. Every other tool we depend on is pinned and installed there into $(LOCALBIN); ctlptl should follow the same pattern instead of an inline curl in the workflow. This also drops the "v0.9.3 does not publish checksum assets" comment as a non-issue, since the pin lives next to every other tool's pin.
## deps.mk
CTLPTL_VERSION ?= v0.9.3
install-deps: ... ctlptl # add to fan-out
CTLPTL ?= $(LOCALBIN)/ctlptl
# ctlptl release naming: darwin -> "mac", amd64 -> "x86_64"
CTLPTL_OS = $(if $(filter darwin,$(OS)),mac,$(OS))
CTLPTL_ARCH = $(if $(filter amd64,$(ARCH)),x86_64,$(ARCH))
CTLPTL_VERSION_NO_V = $(patsubst v%,%,$(CTLPTL_VERSION))
.PHONY: ctlptl
ctlptl: $(LOCALBIN) ## Download ctlptl binary if necessary.
test -s $(LOCALBIN)/ctlptl || curl -sSfL \
https://github.com/tilt-dev/ctlptl/releases/download/$(CTLPTL_VERSION)/ctlptl.$(CTLPTL_VERSION_NO_V).$(CTLPTL_OS).$(CTLPTL_ARCH).tar.gz | \
tar --no-same-owner -zxv -C $(LOCALBIN) ctlptlcreate-kind-cluster and helm-tests add ctlptl as a prerequisite and call $(CTLPTL). The CI workflow drops the Install ctlptl step entirely (kind's separate ad-hoc install can stay — kind isn't a new tool here).
3. Fix the framing on kind-config.yaml. The PR description says it's "retained for review only" and the new comment in the file says "retained for comparing the ctlptl local development cluster shape." Both are wrong — the file is still actively consumed by:
.github/workflows/operator-ci.yaml:116— default fore2eandcli-e2e(config: ${{ matrix.kind-config || '...kind-config.yaml' }}).github/workflows/agent-ci.yaml:299
Issue #209's "single source of truth" goal isn't met as a result, but converging the rest of the matrix onto ctlptl is a bigger surface (touches agent-ci.yaml, e2e, cli-e2e, plus templating for the multi-k8s-version coverage) — fine to leave for a follow-up. Just update the comment to say so honestly. Suggested replacement for the YAML comment:
# kind cluster config used by the e2e, cli-e2e (operator-ci.yaml) and agent-ci.yaml
# matrix entries via helm/kind-action. The local-dev path and the helm-tests CI
# entry use ctlptl-config.yaml in this directory instead. Converging the
# helm/kind-action paths onto ctlptl is tracked as a follow-up to #209.And drop the "retained for review only" line from the PR description.
Should-do
4. Webhook iteration validation. Issue criterion: "a code change in webhook_controller.go is observable in the running operator within a single rebuild cycle." Validation section only mentions make helm-tests. With make rollout-local in place, a one-line "tested webhook iteration: edited webhook_controller.go, ran make rollout-local, observed restart in N seconds" closes this.
Nice-to-have / follow-up (don't need to land here)
5. Unrelated change in setup-kind-cluster. The PR makes the node-init-secret creation idempotent (--dry-run=client | apply). Good change but outside #209's scope and unmentioned in the PR description. Either call it out in the body or split into its own PR.
6. LOCAL_REGISTRY parameterization. LOCAL_REGISTRY ?= localhost:5005 is overridable in the Makefile but the chainsaw assertions hardcode localhost:5005/skyhook-operator — overriding breaks the assert. Probably easiest to drop the ?= and make it a constant.
Spot checks that pass
- Chart key paths (
controllerManager.manager.image.repository/tag/digest) matchchart/values.yaml:85-88. ✓ - Bash
${VAR##*/}/${VAR%:*}repository/tag parsing ininstall-helm-chart.shcorrectly handles digest, tag, and digest-only forms. ✓ - CI condition
matrix.test-suite != 'unit-tests' && matrix.test-suite != 'helm-tests'correctly skipshelm/kind-actionfor the new entry. ✓ - Other helm tests (
helm-node-affinity-test,helm-scale-test) use the sameinstall-helm-chart.sh, so they inherit the override automatically; their assertions don't pin the image, so no other assert files needed updating. ✓
Signed-off-by: AnouarMohamed <m.anouar@mundiapolis.ma>
Signed-off-by: AnouarMohamed <m.anouar@mundiapolis.ma>
There was a problem hiding this comment.
Actionable comments posted: 3
🤖 Prompt for all review comments with AI agents
Verify each finding against current code. Fix only still-valid issues, skip the
rest with a brief reason, keep changes minimal, and validate.
Inline comments:
In `@docs/development.md`:
- Line 30: Update the documentation text so the config path is shown relative to
the documented working directory (after cd operator): replace the path string
"operator/config/local-dev/ctlptl-config.yaml" with
"config/local-dev/ctlptl-config.yaml" in the sentence that describes `make
create-kind-cluster` so readers know the correct relative path to apply.
In `@operator/Makefile`:
- Around line 245-246: The rollout-local Makefile target restarts the operator
but doesn't wait for the rollout to complete, causing races; update the
rollout-local target (the rule labeled "rollout-local") to invoke $(KUBECTL)
rollout status on the same deployment selector
(app.kubernetes.io/name=skyhook-operator,control-plane=controller-manager) in
namespace $(SKYHOOK_NAMESPACE) after the existing rollout restart command, using
--timeout (e.g., 60s) so the target blocks until the deployment is ready.
- Around line 306-307: The Makefile currently unconditionally runs the kubectl
secret creation which fails if $(DOCKER_AUTH_FILE) is missing; update the
setup-kind-cluster (or the Makefile recipe containing the shown command) to
first check for the existence of $(DOCKER_AUTH_FILE) and only run the $(KUBECTL)
create secret generic node-init-secret ... pipeline when the file exists,
otherwise skip the step (optionally echo a message indicating the docker auth
file was not found) so fresh local setups and CI without registry auth do not
hard-fail.
🪄 Autofix (Beta)
Fix all unresolved CodeRabbit comments on this PR:
- Push a commit to this branch (recommended)
- Create a new PR with the fixes
ℹ️ Review info
⚙️ Run configuration
Configuration used: Path: .coderabbit.yaml
Review profile: ASSERTIVE
Plan: Enterprise
Run ID: 771513ea-ccc9-4f50-81ae-bd61f810c934
📒 Files selected for processing (2)
docs/development.mdoperator/Makefile
Signed-off-by: AnouarMohamed <m.anouar@mundiapolis.ma>
|
Updated the PR description and pushed a small follow-up for the latest review notes. CodeRabbit is green on the current head; please take another look when you have time. |
Signed-off-by: AnouarMohamed <m.anouar@mundiapolis.ma>
|
Pushed a follow-up for the webhook CI failure and updated the PR notes. Local |
Signed-off-by: AnouarMohamed <m.anouar@mundiapolis.ma>
Signed-off-by: AnouarMohamed <m.anouar@mundiapolis.ma>
Signed-off-by: AnouarMohamed <m.anouar@mundiapolis.ma>
Signed-off-by: AnouarMohamed <m.anouar@mundiapolis.ma>
Summary
localhost:5005registry.make create-kind-cluster,make helm-tests, and local rollout targets to use the local registry path.helm-teststo the operator CI matrix using the ctlptl-backed cluster.Fixes #209
Notes
operator/config/local-dev/kind-config.yamlis still used by non-Helm CI paths; the local-dev path and thehelm-testsCI entry useoperator/config/local-dev/ctlptl-config.yaml.ctlptlis pinned and installed throughoperator/deps.mk.setup-kind-clusteris idempotent and skipsnode-init-secretcreation when the local Docker auth file is missing.rollout-localbuilds, tags, pushes, restarts the Helm-managed operator deployment, and waits for rollout completion.Validation
bash -n k8s-tests/chainsaw/helm/install-helm-chart.shgit diff --checkmake unit-testsfromoperator/make create-kind-cluster DOCKER_CMD=dockerfromoperator/make setup-kind-cluster DOCKER_CMD=dockermake setup-kind-cluster DOCKER_CMD=docker DOCKER_AUTH_FILE=/tmp/nodewright-missing-docker-auth.jsonkubectl -n skyhook rollout status deployment -l app.kubernetes.io/name=skyhook-operator,control-plane=controller-manager --timeout=60smake helm-tests DOCKER_CMD=dockerfromoperator/operator/internal/controller/webhook_controller.go, ranmake rollout-local DOCKER_CMD=docker, saw the operator deployment restart and complete, and found the marker in the restarted manager pod logs. The temporary marker was removed, thenmake rollout-local DOCKER_CMD=dockerwas run again.