chore: Cherry pick again #254

Merged
lockwobr merged 3 commits into release/v0.16.x from cherrypickagain
May 22, 2026

Conversation

@lockwobr
Collaborator

Description

git cherry-pick 2dfec98 6d6f126 bdfa3e6

ayuskauskas and others added 3 commits May 22, 2026 15:05
…fter-uninstall (#241)

The test left mypkg@1.2.3 freshly installed at chainsaw's automatic
CLEANUP, where the new delete-time uninstall finalizer (from #200) must
run uninstall pods on every node before releasing the CR. That exceeds
chainsaw's default cleanup window, causing "context deadline exceeded".

Add a final uninstall-v1 step that flips uninstall.apply=true on the
downgraded version and waits for nodeState to empty, so the CR has no
installed packages when cleanup deletes it.

Signed-off-by: Alex Yuskauskas <ayuskauskas@nvidia.com>
* fix(test): harden core e2e pool against finalizer-cleanup races and improve diagnosability

Two changes across the six core-pool chainsaw tests:

1. Add `timeouts.cleanup: 120s` to every core-pool test. Chainsaw's
   default cleanup window (~30s) is too tight for the project's CR
   finalizer, which must uncordon nodes, remove labels/annotations,
   and GC package pods before releasing. Same failure mode that hit
   downgrade-after-uninstall (PR #241).

2. Add `catch:` blocks to `depends-on` and `simple-skyhook`, mirroring
   the pattern used by the other four core tests. When these tests
   flake, CI now captures the Node, Skyhook CR, and package Pods for
   diagnosis instead of failing without artifacts.

No assert logic changes — this is a foundation pass to remove a known
flake class and make future flakes diagnosable.
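
The "context deadline exceeded" failure mode behind the first change can be reproduced in miniature. The Go sketch below is illustrative only and is not chainsaw or operator code: `runFinalizer`, `cleanup`, and the constants are hypothetical names, and the durations are scaled down so that 300 ms and 1200 ms stand in for the ~30s default and the new 120s window.

```go
package main

import (
	"context"
	"errors"
	"fmt"
	"time"
)

// Scaled-down, assumed durations (1/100 of the real ones).
const (
	tightWindow   = 300 * time.Millisecond  // stands in for the ~30s default
	roomyWindow   = 1200 * time.Millisecond // stands in for timeouts.cleanup: 120s
	finalizerWork = 600 * time.Millisecond  // assumed time the finalizer needs
)

// runFinalizer simulates finalizer work (uncordon, label removal, pod GC)
// that needs `work` to finish and gives up if ctx expires first.
func runFinalizer(ctx context.Context, work time.Duration) error {
	select {
	case <-time.After(work):
		return nil
	case <-ctx.Done():
		return ctx.Err()
	}
}

// cleanup runs the simulated finalizer inside a cleanup window.
func cleanup(window, work time.Duration) error {
	ctx, cancel := context.WithTimeout(context.Background(), window)
	defer cancel()
	return runFinalizer(ctx, work)
}

// timedOut reports whether err is a deadline expiry.
func timedOut(err error) bool {
	return errors.Is(err, context.DeadlineExceeded)
}

func main() {
	fmt.Println(cleanup(tightWindow, finalizerWork)) // context deadline exceeded
	fmt.Println(cleanup(roomyWindow, finalizerWork)) // <nil>
}
```

The point of the sketch is only that the window must exceed the finalizer's worst-case duration; the real finalizer timings are not stated here.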

Signed-off-by: Alex Yuskauskas <ayuskauskas@nvidia.com>

* fix(operator): close StageInterrupt trap when configUpdates signal decays

A package whose only interrupt is a `configInterrupts` entry (no top-level
`interrupt` block) could get stuck at `StageInterrupt/StateSkipped`
permanently when `Status.ConfigUpdates` cleared or never persisted (e.g.
due to a 409 on the spec patch). The state machine queried the dynamic
`HasInterrupt(config)` signal at four points, and once that signal
decayed to false, the package was untouchable: `ProgressSkipped`
wouldn't promote it, `NextStage` wouldn't advance it past Interrupt,
and `GetComplete`/`IsPackageComplete` wouldn't count it complete.

Decouple progression-past-Interrupt from the dynamic signal:

- `NextStage`: add `StageInterrupt → StagePostInterrupt` to the
  no-interrupt default map. The with-interrupt full-replacement map is
  preserved as-is — PR #200 deliberately omits `StageUninstall →
  StageApply` from it so with-interrupt uninstalls route via
  `StageUninstallInterrupt`; collapsing the maps would silently
  re-enable that transition.
- `ProgressSkipped`: drop the `HasInterrupt` gate. The only writer of
  `StateSkipped` at `StageInterrupt` is `ProcessInterrupt`'s
  budget-contention branch, which already decided the package needed
  an interrupt to schedule. Stage alone is sufficient.
- `GetComplete`, `IsPackageComplete`: treat `StagePostInterrupt` as
  unconditionally terminal. The only way to reach it is via the
  interrupt cycle; gating the terminal check on a signal that can
  decay is redundant with the entry gate and only becomes load-bearing
  when the trap fires.
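
The three changes above can be sketched in Go as follows. This is a simplified illustration, not the operator's code: the `Stage` and `State` types, the lowercase function names, and every map entry other than the transitions named in this message are assumptions.

```go
package main

import "fmt"

// Stage and State are simplified stand-ins for the operator's types.
type Stage string
type State string

const (
	StageUninstall          Stage = "uninstall"
	StageUninstallInterrupt Stage = "uninstall-interrupt"
	StageApply              Stage = "apply"
	StageInterrupt          Stage = "interrupt"
	StagePostInterrupt      Stage = "post-interrupt"

	StateSkipped State = "skipped"
)

// defaultNext is the no-interrupt map. The StageInterrupt entry is the
// addition described above: a package can leave StageInterrupt even after
// the dynamic interrupt signal has decayed to false.
var defaultNext = map[Stage]Stage{
	StageUninstall: StageApply,
	StageInterrupt: StagePostInterrupt,
}

// withInterruptNext fully replaces defaultNext when an interrupt is present.
// It deliberately has no StageUninstall -> StageApply entry. The entries
// shown are assumed for illustration.
var withInterruptNext = map[Stage]Stage{
	StageUninstall: StageUninstallInterrupt,
	StageInterrupt: StagePostInterrupt,
}

// nextStage picks one map or the other; the maps are never merged, so the
// with-interrupt path cannot pick up StageUninstall -> StageApply by accident.
func nextStage(current Stage, hasInterrupt bool) (Stage, bool) {
	table := defaultNext
	if hasInterrupt {
		table = withInterruptNext
	}
	next, ok := table[current]
	return next, ok
}

// shouldProgressSkipped no longer consults the interrupt signal:
// stage plus skipped state is enough.
func shouldProgressSkipped(stage Stage, state State) bool {
	return stage == StageInterrupt && state == StateSkipped
}

// postInterruptIsTerminal treats StagePostInterrupt as terminal
// regardless of the current interrupt signal.
func postInterruptIsTerminal(stage Stage) bool {
	return stage == StagePostInterrupt
}

func main() {
	// Signal decayed (hasInterrupt == false), yet the package still advances.
	fmt.Println(nextStage(StageInterrupt, false)) // post-interrupt true
	// With an interrupt, uninstall routes via the uninstall-interrupt stage.
	fmt.Println(nextStage(StageUninstall, true)) // uninstall-interrupt true
	fmt.Println(shouldProgressSkipped(StageInterrupt, StateSkipped))
	fmt.Println(postInterruptIsTerminal(StagePostInterrupt))
}
```

Keeping two separate maps rather than layering overrides on a shared base is what preserves the omission of `StageUninstall → StageApply` on the with-interrupt path.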

Reproducer in `skyhook_types_test.go` exercises all three sites.

Surfaced by the `chainsaw/config-skyhook` "update while running" step,
which patches the spec mid-flight and concurrently triggers the
`HandleMigrations` and `processSkyhooksPerNode` 409 conflict paths —
those stretch convergence enough that ConfigUpdates can be lost
between the package reaching StageInterrupt and ProgressSkipped firing.

Signed-off-by: Alex Yuskauskas <ayuskauskas@nvidia.com>

* refactor(operator): GetComplete delegates to IsPackageComplete

GetComplete and IsPackageComplete encoded the same terminal-state logic
twice. Have GetComplete iterate the spec packages and call
IsPackageComplete for each, and move the explanatory comment to
IsPackageComplete where the predicate lives.

Behavior preserved: the prior implementation iterated node state and
filtered to spec packages by name+version match; the new one iterates
spec packages and looks them up in node state by unique name (name|version).
Same set of packages either way.
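
A rough Go sketch of the two iteration orders, to show why they select the same packages. All types and names here are hypothetical, and the `Done` field stands in for the real terminal-state predicate, which this message does not spell out.

```go
package main

import (
	"fmt"
	"sort"
)

// Package is a simplified spec entry.
type Package struct{ Name, Version string }

// UniqueName joins name and version as "name|version".
func (p Package) UniqueName() string { return p.Name + "|" + p.Version }

// PackageStatus is a simplified node-state entry. Done is a placeholder
// for the real stage/state check.
type PackageStatus struct {
	Name, Version string
	Done          bool
}

// NodeState is keyed by unique name ("name|version").
type NodeState map[string]PackageStatus

func newNodeState(statuses ...PackageStatus) NodeState {
	ns := NodeState{}
	for _, s := range statuses {
		ns[s.Name+"|"+s.Version] = s
	}
	return ns
}

// isPackageComplete is the single home of the completion predicate.
func isPackageComplete(ns NodeState, pkg Package) bool {
	status, ok := ns[pkg.UniqueName()]
	return ok && status.Done
}

// getComplete iterates the spec and delegates to isPackageComplete.
func getComplete(ns NodeState, spec []Package) []string {
	var out []string
	for _, pkg := range spec {
		if isPackageComplete(ns, pkg) {
			out = append(out, pkg.UniqueName())
		}
	}
	sort.Strings(out)
	return out
}

// getCompleteOld mirrors the prior shape: iterate node state and filter
// to spec packages by name+version match.
func getCompleteOld(ns NodeState, spec []Package) []string {
	var out []string
	for _, status := range ns {
		for _, pkg := range spec {
			if status.Name == pkg.Name && status.Version == pkg.Version && status.Done {
				out = append(out, pkg.UniqueName())
			}
		}
	}
	sort.Strings(out)
	return out
}

func main() {
	ns := newNodeState(
		PackageStatus{Name: "a", Version: "1", Done: true},
		PackageStatus{Name: "b", Version: "2", Done: false},
		PackageStatus{Name: "c", Version: "3", Done: true}, // not in spec
	)
	spec := []Package{{Name: "a", Version: "1"}, {Name: "b", Version: "2"}}
	fmt.Println(getComplete(ns, spec), getCompleteOld(ns, spec))
}
```

Both functions sort their results only so the comparison is order-independent; Go map iteration order is unspecified.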

Signed-off-by: Alex Yuskauskas <ayuskauskas@nvidia.com>

---------

Signed-off-by: Alex Yuskauskas <ayuskauskas@nvidia.com>
@lockwobr lockwobr self-assigned this May 22, 2026
@lockwobr lockwobr changed the title from "Cherrypickagain" to "chore: Cherry pick again" May 22, 2026
@lockwobr lockwobr enabled auto-merge (squash) May 22, 2026 22:08
@lockwobr lockwobr merged commit 507fa5e into release/v0.16.x May 22, 2026
27 of 31 checks passed
@lockwobr lockwobr deleted the cherrypickagain branch May 22, 2026 22:08
@coderabbitai
Contributor

coderabbitai Bot commented May 22, 2026

Caution

Review failed

The pull request is closed.

ℹ️ Recent review info
⚙️ Run configuration

Configuration used: Path: .coderabbit.yaml

Review profile: ASSERTIVE

Plan: Enterprise

Run ID: dfff033e-760b-4143-8517-1223d475656a

📥 Commits

Reviewing files that changed from the base of the PR and between 6bfd4cc and 60714f1.

⛔ Files ignored due to path filters (1)
  • operator/go.sum is excluded by !**/*.sum
📒 Files selected for processing (184)
  • docs/README.md
  • docs/designs/webhook-bootstrap-lease.md
  • k8s-tests/chainsaw/skyhook/config-skyhook/chainsaw-test.yaml
  • k8s-tests/chainsaw/skyhook/depends-on/chainsaw-test.yaml
  • k8s-tests/chainsaw/skyhook/downgrade-after-uninstall/README.md
  • k8s-tests/chainsaw/skyhook/downgrade-after-uninstall/chainsaw-test.yaml
  • k8s-tests/chainsaw/skyhook/downgrade-after-uninstall/update-trigger-uninstall-v1.yaml
  • k8s-tests/chainsaw/skyhook/simple-skyhook/chainsaw-test.yaml
  • k8s-tests/chainsaw/skyhook/simple-update-skyhook/chainsaw-test.yaml
  • k8s-tests/chainsaw/skyhook/strict-order/chainsaw-test.yaml
  • k8s-tests/chainsaw/skyhook/validate-packages/chainsaw-test.yaml
  • operator/Makefile
  • operator/api/v1alpha1/deployment_policy_webhook_test.go
  • operator/api/v1alpha1/skyhook_types.go
  • operator/api/v1alpha1/skyhook_types_test.go
  • operator/cmd/manager/main.go
  • operator/deps.mk
  • operator/go.mod
  • operator/internal/controller/webhook_controller.go
  • operator/vendor/golang.org/x/net/html/iter.go
  • operator/vendor/golang.org/x/net/html/node.go
  • operator/vendor/golang.org/x/net/html/nodetype_string.go
  • operator/vendor/golang.org/x/net/html/parse.go
  • operator/vendor/golang.org/x/net/html/render.go
  • operator/vendor/golang.org/x/net/html/token.go
  • operator/vendor/golang.org/x/net/http2/README.md
  • operator/vendor/golang.org/x/net/http2/client_conn_pool.go
  • operator/vendor/golang.org/x/net/http2/client_priority_go126.go
  • operator/vendor/golang.org/x/net/http2/client_priority_go127.go
  • operator/vendor/golang.org/x/net/http2/clientconn.go
  • operator/vendor/golang.org/x/net/http2/config.go
  • operator/vendor/golang.org/x/net/http2/frame.go
  • operator/vendor/golang.org/x/net/http2/http2.go
  • operator/vendor/golang.org/x/net/http2/server.go
  • operator/vendor/golang.org/x/net/http2/server_common.go
  • operator/vendor/golang.org/x/net/http2/server_wrap.go
  • operator/vendor/golang.org/x/net/http2/transport.go
  • operator/vendor/golang.org/x/net/http2/transport_common.go
  • operator/vendor/golang.org/x/net/http2/transport_wrap.go
  • operator/vendor/golang.org/x/net/http2/writesched.go
  • operator/vendor/golang.org/x/net/http2/writesched_common.go
  • operator/vendor/golang.org/x/net/http2/writesched_priority_rfc7540.go
  • operator/vendor/golang.org/x/net/http2/writesched_priority_rfc9218.go
  • operator/vendor/golang.org/x/net/http2/writesched_random.go
  • operator/vendor/golang.org/x/net/http2/writesched_roundrobin.go
  • operator/vendor/golang.org/x/net/idna/go118.go
  • operator/vendor/golang.org/x/net/idna/idna.go
  • operator/vendor/golang.org/x/net/idna/idna9.0.0.go
  • operator/vendor/golang.org/x/net/idna/pre_go118.go
  • operator/vendor/golang.org/x/net/idna/punycode.go
  • operator/vendor/golang.org/x/net/idna/tables10.0.0.go
  • operator/vendor/golang.org/x/net/idna/tables11.0.0.go
  • operator/vendor/golang.org/x/net/idna/tables12.0.0.go
  • operator/vendor/golang.org/x/net/idna/tables13.0.0.go
  • operator/vendor/golang.org/x/net/idna/tables15.0.0.go
  • operator/vendor/golang.org/x/net/idna/tables17.0.0.go
  • operator/vendor/golang.org/x/net/idna/tables9.0.0.go
  • operator/vendor/golang.org/x/net/idna/trie12.0.0.go
  • operator/vendor/golang.org/x/net/idna/trie13.0.0.go
  • operator/vendor/golang.org/x/net/internal/httpcommon/request.go
  • operator/vendor/golang.org/x/net/internal/httpsfv/httpsfv.go
  • operator/vendor/golang.org/x/sys/plan9/syscall_plan9.go
  • operator/vendor/golang.org/x/sys/unix/affinity_linux.go
  • operator/vendor/golang.org/x/sys/unix/ioctl_signed.go
  • operator/vendor/golang.org/x/sys/unix/ioctl_unsigned.go
  • operator/vendor/golang.org/x/sys/unix/mkall.sh
  • operator/vendor/golang.org/x/sys/unix/mkerrors.sh
  • operator/vendor/golang.org/x/sys/unix/readv_unix.go
  • operator/vendor/golang.org/x/sys/unix/syscall_darwin.go
  • operator/vendor/golang.org/x/sys/unix/syscall_linux.go
  • operator/vendor/golang.org/x/sys/unix/syscall_linux_arm.go
  • operator/vendor/golang.org/x/sys/unix/syscall_linux_arm64.go
  • operator/vendor/golang.org/x/sys/unix/syscall_linux_loong64.go
  • operator/vendor/golang.org/x/sys/unix/syscall_linux_riscv64.go
  • operator/vendor/golang.org/x/sys/unix/syscall_openbsd.go
  • operator/vendor/golang.org/x/sys/unix/syscall_solaris.go
  • operator/vendor/golang.org/x/sys/unix/syscall_unix.go
  • operator/vendor/golang.org/x/sys/unix/zerrors_linux.go
  • operator/vendor/golang.org/x/sys/unix/zerrors_linux_386.go
  • operator/vendor/golang.org/x/sys/unix/zerrors_linux_amd64.go
  • operator/vendor/golang.org/x/sys/unix/zerrors_linux_arm.go
  • operator/vendor/golang.org/x/sys/unix/zerrors_linux_arm64.go
  • operator/vendor/golang.org/x/sys/unix/zerrors_linux_loong64.go
  • operator/vendor/golang.org/x/sys/unix/zerrors_linux_mips.go
  • operator/vendor/golang.org/x/sys/unix/zerrors_linux_mips64.go
  • operator/vendor/golang.org/x/sys/unix/zerrors_linux_mips64le.go
  • operator/vendor/golang.org/x/sys/unix/zerrors_linux_mipsle.go
  • operator/vendor/golang.org/x/sys/unix/zerrors_linux_ppc.go
  • operator/vendor/golang.org/x/sys/unix/zerrors_linux_ppc64.go
  • operator/vendor/golang.org/x/sys/unix/zerrors_linux_ppc64le.go
  • operator/vendor/golang.org/x/sys/unix/zerrors_linux_riscv64.go
  • operator/vendor/golang.org/x/sys/unix/zerrors_linux_s390x.go
  • operator/vendor/golang.org/x/sys/unix/zerrors_linux_sparc64.go
  • operator/vendor/golang.org/x/sys/unix/zsyscall_linux.go
  • operator/vendor/golang.org/x/sys/unix/zsyscall_openbsd_386.go
  • operator/vendor/golang.org/x/sys/unix/zsyscall_openbsd_386.s
  • operator/vendor/golang.org/x/sys/unix/zsyscall_openbsd_amd64.go
  • operator/vendor/golang.org/x/sys/unix/zsyscall_openbsd_amd64.s
  • operator/vendor/golang.org/x/sys/unix/zsyscall_openbsd_arm.go
  • operator/vendor/golang.org/x/sys/unix/zsyscall_openbsd_arm.s
  • operator/vendor/golang.org/x/sys/unix/zsyscall_openbsd_arm64.go
  • operator/vendor/golang.org/x/sys/unix/zsyscall_openbsd_arm64.s
  • operator/vendor/golang.org/x/sys/unix/zsyscall_openbsd_mips64.go
  • operator/vendor/golang.org/x/sys/unix/zsyscall_openbsd_mips64.s
  • operator/vendor/golang.org/x/sys/unix/zsyscall_openbsd_ppc64.go
  • operator/vendor/golang.org/x/sys/unix/zsyscall_openbsd_ppc64.s
  • operator/vendor/golang.org/x/sys/unix/zsyscall_openbsd_riscv64.go
  • operator/vendor/golang.org/x/sys/unix/zsyscall_openbsd_riscv64.s
  • operator/vendor/golang.org/x/sys/unix/zsysnum_linux_386.go
  • operator/vendor/golang.org/x/sys/unix/zsysnum_linux_amd64.go
  • operator/vendor/golang.org/x/sys/unix/zsysnum_linux_arm.go
  • operator/vendor/golang.org/x/sys/unix/zsysnum_linux_arm64.go
  • operator/vendor/golang.org/x/sys/unix/zsysnum_linux_loong64.go
  • operator/vendor/golang.org/x/sys/unix/zsysnum_linux_mips.go
  • operator/vendor/golang.org/x/sys/unix/zsysnum_linux_mips64.go
  • operator/vendor/golang.org/x/sys/unix/zsysnum_linux_mips64le.go
  • operator/vendor/golang.org/x/sys/unix/zsysnum_linux_mipsle.go
  • operator/vendor/golang.org/x/sys/unix/zsysnum_linux_ppc.go
  • operator/vendor/golang.org/x/sys/unix/zsysnum_linux_ppc64.go
  • operator/vendor/golang.org/x/sys/unix/zsysnum_linux_ppc64le.go
  • operator/vendor/golang.org/x/sys/unix/zsysnum_linux_riscv64.go
  • operator/vendor/golang.org/x/sys/unix/zsysnum_linux_s390x.go
  • operator/vendor/golang.org/x/sys/unix/zsysnum_linux_sparc64.go
  • operator/vendor/golang.org/x/sys/unix/ztypes_linux.go
  • operator/vendor/golang.org/x/sys/unix/ztypes_linux_386.go
  • operator/vendor/golang.org/x/sys/unix/ztypes_linux_amd64.go
  • operator/vendor/golang.org/x/sys/unix/ztypes_linux_arm.go
  • operator/vendor/golang.org/x/sys/unix/ztypes_linux_arm64.go
  • operator/vendor/golang.org/x/sys/unix/ztypes_linux_loong64.go
  • operator/vendor/golang.org/x/sys/unix/ztypes_linux_mips.go
  • operator/vendor/golang.org/x/sys/unix/ztypes_linux_mips64.go
  • operator/vendor/golang.org/x/sys/unix/ztypes_linux_mips64le.go
  • operator/vendor/golang.org/x/sys/unix/ztypes_linux_mipsle.go
  • operator/vendor/golang.org/x/sys/unix/ztypes_linux_ppc.go
  • operator/vendor/golang.org/x/sys/unix/ztypes_linux_ppc64.go
  • operator/vendor/golang.org/x/sys/unix/ztypes_linux_ppc64le.go
  • operator/vendor/golang.org/x/sys/unix/ztypes_linux_riscv64.go
  • operator/vendor/golang.org/x/sys/unix/ztypes_linux_s390x.go
  • operator/vendor/golang.org/x/sys/unix/ztypes_linux_sparc64.go
  • operator/vendor/golang.org/x/sys/windows/aliases.go
  • operator/vendor/golang.org/x/sys/windows/dll_windows.go
  • operator/vendor/golang.org/x/sys/windows/security_windows.go
  • operator/vendor/golang.org/x/sys/windows/syscall_windows.go
  • operator/vendor/golang.org/x/sys/windows/types_windows.go
  • operator/vendor/golang.org/x/sys/windows/zsyscall_windows.go
  • operator/vendor/golang.org/x/text/secure/bidirule/bidirule.go
  • operator/vendor/golang.org/x/text/secure/bidirule/bidirule10.0.0.go
  • operator/vendor/golang.org/x/text/secure/bidirule/bidirule9.0.0.go
  • operator/vendor/golang.org/x/text/unicode/bidi/tables10.0.0.go
  • operator/vendor/golang.org/x/text/unicode/bidi/tables11.0.0.go
  • operator/vendor/golang.org/x/text/unicode/bidi/tables12.0.0.go
  • operator/vendor/golang.org/x/text/unicode/bidi/tables13.0.0.go
  • operator/vendor/golang.org/x/text/unicode/bidi/tables15.0.0.go
  • operator/vendor/golang.org/x/text/unicode/bidi/tables17.0.0.go
  • operator/vendor/golang.org/x/text/unicode/bidi/tables9.0.0.go
  • operator/vendor/golang.org/x/text/unicode/norm/forminfo.go
  • operator/vendor/golang.org/x/text/unicode/norm/tables10.0.0.go
  • operator/vendor/golang.org/x/text/unicode/norm/tables11.0.0.go
  • operator/vendor/golang.org/x/text/unicode/norm/tables12.0.0.go
  • operator/vendor/golang.org/x/text/unicode/norm/tables15.0.0.go
  • operator/vendor/golang.org/x/text/unicode/norm/tables17.0.0.go
  • operator/vendor/golang.org/x/text/unicode/norm/tables9.0.0.go
  • operator/vendor/golang.org/x/tools/go/ast/inspector/cursor.go
  • operator/vendor/golang.org/x/tools/go/ast/inspector/inspector.go
  • operator/vendor/golang.org/x/tools/go/ast/inspector/iter.go
  • operator/vendor/golang.org/x/tools/go/packages/golist.go
  • operator/vendor/golang.org/x/tools/go/packages/packages.go
  • operator/vendor/golang.org/x/tools/go/types/objectpath/objectpath.go
  • operator/vendor/golang.org/x/tools/internal/aliases/aliases.go
  • operator/vendor/golang.org/x/tools/internal/aliases/aliases_go122.go
  • operator/vendor/golang.org/x/tools/internal/event/core/event.go
  • operator/vendor/golang.org/x/tools/internal/event/keys/keys.go
  • operator/vendor/golang.org/x/tools/internal/event/label/label.go
  • operator/vendor/golang.org/x/tools/internal/gcimporter/iexport.go
  • operator/vendor/golang.org/x/tools/internal/gcimporter/iimport.go
  • operator/vendor/golang.org/x/tools/internal/gcimporter/ureader.go
  • operator/vendor/golang.org/x/tools/internal/gocommand/version.go
  • operator/vendor/golang.org/x/tools/internal/pkgbits/version.go
  • operator/vendor/golang.org/x/tools/internal/stdlib/deps.go
  • operator/vendor/golang.org/x/tools/internal/typeparams/coretype.go
  • operator/vendor/golang.org/x/tools/internal/typeparams/free.go
  • operator/vendor/golang.org/x/tools/internal/typesinternal/types.go
  • operator/vendor/golang.org/x/tools/internal/versions/features.go
  • operator/vendor/modules.txt

📝 Walkthrough

Adds a design doc and implements a separate webhook-bootstrap manager and lease with concurrent startup; adjusts Skyhook NodeState completion and interrupt transitions, with tests; extends the Chainsaw tests and their cleanup timeouts; updates the Makefile and dependencies (govulncheck, docker default); and vendors substantial golang.org/x/* updates (http2, idna, sys, tools), including Windows and Unix syscall changes.

Estimated code review effort

🎯 5 (Critical) | ⏱️ ~120 minutes

Possibly related PRs

  • NVIDIA/nodewright#241: Adjusts the same downgrade-after-uninstall scenario with a v1 uninstall trigger and cleanup assertions.
  • NVIDIA/nodewright#243: Implements dedicated webhook-bootstrap lease and split manager to resolve webhook deadlock.
  • NVIDIA/nodewright#244: Cherry-picks the webhook bootstrap, Skyhook FSM fixes, and Chainsaw cleanup changes mirrored here.

Suggested reviewers

  • mskalka
  • rice-riley
  • ayuskauskas

3 participants