feat(deploy): add cgroup v2 CPU controller preflight check #367

Merged
fslongjin merged 1 commit into TencentCloud:master from fslongjin:jinlong-add-cgroup-cpu-preflight on May 27, 2026

@fslongjin
Member

  • Validate cgroup v2 'cpu' controller availability in install.sh and online-install.sh
  • Add troubleshooting entry in deployment docs for Ubuntu/Debian cgroup v2 CPU controller issues

Signed-off-by: jinlong <jinlong@tencent.com>
@cubesandboxbot

cubesandboxbot Bot commented May 27, 2026

Claude finished @fslongjin's task in 42s

Review: feat(deploy): add cgroup v2 CPU controller preflight check

Overview

This PR adds a cgroup v2 cpu controller preflight check to both install.sh and online-install.sh, plus troubleshooting documentation entries. The check detects whether the cpu controller is available in cgroup v2, attempts to enable it if absent, and fails with actionable diagnostics when it can't be enabled.
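
The flow described above can be sketched roughly as follows. This is a minimal illustration, not the PR's actual code: the function names `has_cpu_word` and `cgroup_cpu_preflight_sketch`, the optional `CGROOT`/`FSTYPE` arguments (added so the logic can be exercised against a scratch directory), and the wording of the error output are all assumptions.

```shell
#!/usr/bin/env bash
# Hypothetical sketch of a cgroup v2 cpu-controller preflight.
# Function names, arguments, and messages are illustrative, not taken from the PR.

# Succeeds if FILE is readable and lists "cpu" as a whole word
# (so "cpuset" alone does not count).
has_cpu_word() {
  [ -r "$1" ] || return 1
  grep -qw cpu "$1" 2>/dev/null
}

# Usage: cgroup_cpu_preflight_sketch [CGROOT] [FSTYPE]
# CGROOT defaults to /sys/fs/cgroup; FSTYPE defaults to what stat reports.
cgroup_cpu_preflight_sketch() {
  local cgroot fstype controllers
  cgroot="${1:-/sys/fs/cgroup}"
  fstype="${2:-}"
  if [ -z "${fstype}" ]; then
    fstype="$(stat -fc %T "${cgroot}" 2>/dev/null || true)"
  fi

  # Anything other than cgroup2fs (e.g. tmpfs on cgroup v1) is out of scope here.
  [ "${fstype}" = "cgroup2fs" ] || return 0

  # Already delegated to child cgroups: nothing to do.
  if has_cpu_word "${cgroot}/cgroup.subtree_control"; then
    return 0
  fi

  # Available but not delegated: try to enable it, then re-check.
  if has_cpu_word "${cgroot}/cgroup.controllers"; then
    if { echo "+cpu" >"${cgroot}/cgroup.subtree_control"; } 2>/dev/null \
      && has_cpu_word "${cgroot}/cgroup.subtree_control"; then
      return 0
    fi
  fi

  # Could not enable: report what the kernel exposes and fail.
  controllers="$(cat "${cgroot}/cgroup.controllers" 2>/dev/null || true)"
  echo "cgroup v2 'cpu' controller is not available" >&2
  echo "cgroup.controllers: ${controllers:-<empty>}" >&2
  return 1
}
```

Called with no arguments, the sketch inspects the live `/sys/fs/cgroup`. Passing a scratch directory and an explicit `cgroup2fs` makes it possible to try the branches without root. In that case the "enabled" file simply contains the literal `+cpu` that was written, whereas the kernel would report `cpu`.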

Changes

  • deploy/one-click/install.sh (+44 lines): New check_cgroup_cpu_preflight() function, called before dependency installation
  • deploy/one-click/online-install.sh (+35/-1): Inline cgroup v2 check inside check_early_preflight()
  • docs/guide/troubleshooting/deployment.md (+1 line): Entry for cgroup v2 cpu controller issue
  • docs/zh/guide/troubleshooting/deployment.md (+1 line): Chinese translation of the troubleshooting entry

Positive Observations

  1. Fail-fast placement: The check runs before dependency installation (line 420 of install.sh), consistent with the existing check_hardware_preflight and check_cubelet_fs_preflight pattern. This avoids wasted work if the system isn't viable.

  2. Correct cgroup v1/v2 detection: Uses stat -fc %T /sys/fs/cgroup to distinguish cgroup2fs from tmpfs (v1). Returns silently for v1, which is the right behavior since cubelet handles v1 through a different path.

  3. Graceful degradation: Reads cgroup.controllers and cgroup.subtree_control with || true guards, so the check doesn't crash on unreadable files. The write attempt to cgroup.subtree_control is inside an if condition, which properly handles failure under set -e.

  4. Diagnostic quality: Error messages include the actual cgroup.controllers content (or <empty>), the specific fix commands (systemctl disable --now multipathd...), and a link to the tracking issue. This matches the helpful tone of existing error messages in the codebase.

  5. Consistency between scripts: The logic in online-install.sh correctly mirrors install.sh but uses direct echo statements rather than the log/die helper functions (since online-install.sh is self-contained before the bundle is downloaded).

  6. Naming convention: check_cgroup_cpu_preflight follows the existing check_hardware_preflight / check_cubelet_fs_preflight pattern.
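
The graceful-degradation point (item 3) rests on a shell rule worth spelling out: under `set -e`, a command that fails as the condition of an `if`, or on the left of `||`, does not abort the script. A small sketch of that pattern follows. The helper names `try_enable_controller` and `read_or_empty` and their arguments are hypothetical, not taken from the PR.

```shell
#!/usr/bin/env bash
# Hypothetical helpers illustrating failure handling under set -e.
# Names and arguments are assumptions for illustration only.
set -e

# Usage: try_enable_controller CONTROLLER SUBTREE_CONTROL_FILE
# Returns 0 if the write succeeded, 1 otherwise; never aborts the caller.
try_enable_controller() {
  # The brace group lets 2>/dev/null also hide the shell's own
  # error message when the redirection itself fails.
  if { printf '+%s\n' "$1" >"$2"; } 2>/dev/null; then
    return 0
  fi
  return 1
}

# Usage: read_or_empty FILE
# "|| true" keeps a failed cat from tripping set -e; the caller
# just sees empty output for a missing or unreadable file.
read_or_empty() {
  cat "$1" 2>/dev/null || true
}
```

A caller can then branch on the result, for example `if ! try_enable_controller cpu "$file"; then ...report and fail...; fi`, and decide for itself how to report the failure.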

Observations & Suggestions

1. Style nit: reading the file into a variable before grep (minor)
In install.sh, the subtree check reads the file into a variable, then passes it via <<< to grep:

local subtree=""
if [[ -r "${cgroot}/cgroup.subtree_control" ]]; then
  subtree="$(cat "${cgroot}/cgroup.subtree_control" 2>/dev/null || true)"
fi
if grep -qw cpu <<<"${subtree}"; then
  return 0
fi

A linter or reviewer might suggest running grep -qw cpu "${cgroot}/cgroup.subtree_control" 2>/dev/null directly instead of reading into a variable. (This is not ShellCheck SC2181, which concerns testing $? indirectly; the code above does not do that.) The current approach is fine: it avoids a double read of the same file (one for the check, one for the error message). No change needed, but worth noting.

2. online-install.sh re-check after download
online-install.sh runs check_early_preflight (which includes the cgroup v2 check) before downloading the bundle, then delegates to install.sh, which runs the same check again via check_cgroup_cpu_preflight. The check therefore runs twice in the online path. This is intentional and correct: the early check prevents a large download on an unsupported system, and the downstream check in install.sh is a safety net for any other entry points.

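3. Possible race: cgroup.controllers and cgroup.subtree_control are read separately
The two reads are not atomic, so in principle the cpu controller state could change between them. In practice the set of controllers listed in the root cgroup.controllers is normally fixed after boot, and cgroup.subtree_control changes only when something writes to it, so a change during the short window of the check is unlikely. This is not a practical concern.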

4. The die function's exit code
The install.sh die function exits with code 1, while online-install.sh uses exit 3 for this check. This is consistent with online-install.sh's own exit code convention (hardware/memory/KVM issues use exit 3). The inconsistency between the two scripts is pre-existing and not introduced by this PR.
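
To make the exit-code difference concrete, here is a hedged sketch of the two styles. Only the exit codes (1 and 3) come from the review above. The body of `die` and the name `early_check_fail` are assumptions, since the scripts' actual helpers are not shown here.

```shell
#!/usr/bin/env bash
# Hypothetical sketch: bodies and the name early_check_fail are assumptions;
# only the exit codes 1 and 3 are taken from the review.

# install.sh style: a shared helper that reports and exits with 1.
die() {
  echo "ERROR: $*" >&2
  exit 1
}

# online-install.sh style: plain echo plus an explicit exit 3,
# the code the review says is used for early environment checks.
early_check_fail() {
  echo "ERROR: $*" >&2
  exit 3
}
```

Automation that wraps both entry points would therefore need to treat 1 and 3 alike as "preflight failed" rather than rely on a single code.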

Documentation Review

The troubleshooting table entries (both English and Chinese) are well-written:

  • They correctly attribute the issue to Ubuntu/Debian cloud images not delegating the cpu controller
  • They identify multipathd as the blocker
  • They link to the tracking issue for the full repro and fix
  • The Chinese translation is faithful to the original

Summary

This is a well-crafted PR. The implementation is correct, robust, and consistent with existing patterns. The error messages are actionable and well-documented. I have no blocking concerns — this is ready to merge.

@fslongjin fslongjin merged commit a4ff1b9 into TencentCloud:master May 27, 2026
4 checks passed