refactor(linux): only start secure-tls-bootstrap.service via kubelet WantedBy= #8632
cameronmeissner wants to merge 2 commits into
Pull request overview
This PR refactors the Linux CSE secure TLS bootstrapping flow so secure-tls-bootstrap.service is enabled with WantedBy=kubelet.service (and ordered Before=kubelet.service) rather than being explicitly started during CSE. This aligns secure TLS bootstrapping execution with kubelet startup timing, after the CSE’s API server connectivity validation.
Changes:
- Rename `configureAndStartSecureTLSBootstrapping` → `configureAndEnableSecureTLSBootstrapping` and switch behavior from “enable+start” to “enable only”.
- Update the secure TLS bootstrapping systemd drop-in to include an `[Install]` section with `WantedBy=kubelet.service`.
- Adjust error code naming and ShellSpec coverage to reflect the new enable-only behavior.
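The enable-only behavior in the list above can be sketched roughly as follows. This is a minimal illustration, not the PR's code: the function names, the drop-in file name `10-securetlsbootstrap.conf`, and the injectable `systemctl` argument are all assumptions made so the sketch can run without systemd. Whether systemd honors an `[Install]` section placed in a drop-in is worth checking against the `systemd.unit` documentation.

```shell
#!/usr/bin/env bash
# Hypothetical sketch; names and paths are assumptions, not the PR's code.

# Print the unit fragment described in the change list:
# ordered before kubelet and wanted by kubelet.
render_bootstrap_fragment() {
    cat <<'EOF'
[Unit]
Before=kubelet.service

[Install]
WantedBy=kubelet.service
EOF
}

# Write the fragment and enable (but do not start) the unit.
# $1: drop-in directory (assumed location, supplied by the caller)
# $2: systemctl-like command (defaults to systemctl; injectable for testing)
configure_and_enable_bootstrap() {
    local dropin_dir="$1"
    local systemctl_cmd="${2:-systemctl}"

    mkdir -p "${dropin_dir}" || return 1
    render_bootstrap_fragment > "${dropin_dir}/10-securetlsbootstrap.conf" || return 1

    # Enable only: there is no explicit "start" here, so the unit is
    # expected to run when kubelet's startup pulls it in.
    "${systemctl_cmd}" enable secure-tls-bootstrap || return 1
}
```

Passing the `systemctl` command as an argument is only a convenience for exercising the sketch with a mock, similar in spirit to how the ShellSpec tests mock it.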
Reviewed changes
Copilot reviewed 4 out of 4 changed files in this pull request and generated no comments.
| File | Description |
|---|---|
| `parts/linux/cloud-init/artifacts/cse_config.sh` | Renames the secure TLS bootstrapping configurator and changes it to systemctl enable secure-tls-bootstrap (no explicit start), while writing a drop-in with WantedBy=kubelet.service. |
| `parts/linux/cloud-init/artifacts/cse_main.sh` | Updates the nodePrep call site and event name to use configureAndEnableSecureTLSBootstrapping. |
| `parts/linux/cloud-init/artifacts/cse_helpers.sh` | Renames the secure TLS bootstrapping error code constant to reflect “enable” failure semantics. |
| `spec/parts/linux/cloud-init/artifacts/cse_config_spec.sh` | Updates ShellSpec tests/mocks to assert enable-only behavior and the new function name. |
🕵️ AgentBaker E2E Detective: Daily TME run

Failed E2E run:
Failed job:
Failed tests (2 / 401 executed):

Likely cause: Perf-threshold flake under westus capacity/SKU contention (node create 3m21s, pod-ready 1m36s in the same run). PR #8632 only re-orders

Flake vs regression:

Suggested owner: AgentBaker Linux Node SIG (Node Lifecycle), perf-threshold owner; LocalDNS plugin scenario owner for the second test.

Recommended next action: Do not block this PR on this run. Re-run E2E once to confirm flake; if

Posted automatically by Clawpilot (local detective). Analysis only; no pipeline mutation. If this is wrong, ping @sylvainboily.
What this PR does / why we need it:
Only start secure-tls-bootstrap.service via kubelet WantedBy=:
- this ensures that we only start the bootstrap process after we've established outbound connectivity with the cluster's API server
- this should generally increase secure TLS bootstrapping QoS and lower bootstrapping latency
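The ordering argument can be illustrated with a small sketch. Everything here is hypothetical: `node_prep_sketch`, the injected connectivity check, and the exact position of the enable step relative to the check are assumptions for illustration, not the actual CSE flow. The point it shows is that enabling the unit does not run it, so the bootstrap process only begins once kubelet is started after a successful connectivity check.

```shell
#!/usr/bin/env bash
# Hypothetical sketch of the ordering; all names are illustrative assumptions.

# $1: command that validates API server connectivity (returns non-zero on failure)
# $2: systemctl-like command (defaults to systemctl; injectable for testing)
node_prep_sketch() {
    local check_cmd="$1"
    local systemctl_cmd="${2:-systemctl}"

    # Enabling registers the unit as wanted by kubelet; it does not start it.
    "${systemctl_cmd}" enable secure-tls-bootstrap || return 1

    # If the API server is unreachable, stop before kubelet is started,
    # so the bootstrap unit is never pulled in.
    "${check_cmd}" || return 1

    # Starting kubelet is what triggers secure-tls-bootstrap via WantedBy=.
    "${systemctl_cmd}" start kubelet || return 1
}
```

With a failing connectivity check, the sketch returns before `start kubelet`, which mirrors the stated goal of not running the bootstrap process without outbound connectivity.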
Which issue(s) this PR fixes:
Fixes #