feat(guest-image): shrink ext4 image after creation and optimize Dockerfile #347

Merged
fslongjin merged 2 commits into TencentCloud:master from fslongjin:jinlong-feat-shrink-guest-image
May 25, 2026

fslongjin commented May 25, 2026

Summary

This PR shrinks the one-click guest image cube-guest-image-cpu.img by removing packages that are not needed at runtime from the guest rootfs and by shrinking the generated ext4 image after creation.

Motivation

The guest image used to be around 768MB, while the actual rootfs contents were much smaller. That left a large amount of unused space in the released package and on disk.

What changed

  • Simplified deploy/guest-image/Dockerfile to keep only the runtime essentials.
  • Added yum clean all and removed cache/docs/man/locale/temporary files during image build.
  • Added an ext4 shrink step in deploy/one-click/build-vm-assets.sh.
  • Made the post-shrink headroom configurable through ONE_CLICK_GUEST_IMAGE_RESERVED_BYTES and defaulted it to 32MB.
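
As a rough illustration of the configurable headroom described above, the following sketch resolves the value from the environment with a 32 MiB fallback. The helper name resolve_reserved_bytes is hypothetical and not taken from the PR, and reading "32MB" as 32 * 1024 * 1024 bytes is an assumption.

```bash
# Hypothetical helper (not from the PR): resolve the post-shrink headroom.
# Assumes "32MB" means 32 MiB, i.e. 33554432 bytes.
resolve_reserved_bytes() {
  local value="${ONE_CLICK_GUEST_IMAGE_RESERVED_BYTES:-$(( 32 * 1024 * 1024 ))}"
  # Reject empty strings and anything containing a non-digit character.
  case "${value}" in
    ''|*[!0-9]*)
      echo "ONE_CLICK_GUEST_IMAGE_RESERVED_BYTES must be a non-negative integer" >&2
      return 1
      ;;
  esac
  printf '%s\n' "${value}"
}

# The review refers to the script-side variable as SHRINK_RESERVED_BYTES.
SHRINK_RESERVED_BYTES="$(resolve_reserved_bytes)" || exit 1
echo "headroom: ${SHRINK_RESERVED_BYTES} bytes"
```

Validating the value up front keeps a typo in the env file from surfacing later as an arithmetic error inside the shrink step.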

Results

Measured on the built package:

  • cube-guest-image-cpu.img: 245MB
  • ext4 used space: 205MB
  • ext4 free space: 41MB
  • one-click release tarball: 261MB

This reduces the guest image size from roughly 768MB to 245MB.

…erfile

- Add shrink_ext4_image to minimize guest image size with configurable headroom
- Strip runtime-only packages from Dockerfile (util-linux, busybox) and clean
caches
- Expose ONE_CLICK_GUEST_IMAGE_RESERVED_BYTES env variable for headroom tuning

Signed-off-by: jinlong <jinlong@tencent.com>

cubesandboxbot Bot commented May 25, 2026

Claude finished @fslongjin's task in 1m 13s


Code Review: PR #347 — feat(guest-image): shrink ext4 image after creation and optimize Dockerfile

I've reviewed all three changed files. Overall this is a well-structured PR with good error handling and clear intent. Below are my findings.

Summary

| Category | Count |
| --- | --- |
| ⚠️ Potential issues | 2 |
| 💡 Suggestions | 5 |
| ✅ Correct patterns | 4 |

deploy/guest-image/Dockerfile — ✅ Looks good

The Dockerfile changes are solid:

  • Good: --setopt=install_weak_deps=False reduces image size by skipping unnecessary dependencies
  • Good: --setopt=tsflags=nodocs skips documentation, saving significant space
  • Good: Thorough cache/documentation/locale cleanup after yum install
  • Good: Removing gcc, gawk, which, vim, sudo, lsof — these are indeed build-time or convenience tools, not runtime essentials

Nit: The comment on line 7 says busybox provides awk/sed/grep applets replacing gawk/which/etc. — this is slightly misleading. busybox applets are lighter-weight reimplementations, not drop-in replacements. For example, busybox awk lacks some GNU awk extensions. If the guest agent script relies on any GNU awk features, this could cause subtle breakage. Consider explicitly noting this trade-off in the comment.
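
One way to check the trade-off described in this nit is to probe the awk that is actually on the PATH. The sketch below tests for gensub(), which is a GNU awk extension rather than part of POSIX awk. Both helper names are hypothetical, and whether the guest scripts use any such extension is not stated in this thread.

```bash
# Hypothetical probe (not from the PR): does the awk on PATH provide gensub()?
awk_supports_gensub() {
  local out
  # An awk without gensub() fails to parse or run this program,
  # so any failure is treated as "unsupported".
  out="$(awk 'BEGIN { print gensub(/a/, "b", "g", "aaa") }' 2>/dev/null)" || return 1
  [ "${out}" = "bbb" ]
}

# Portable alternative: gsub() is specified by POSIX awk.
portable_replace() {
  awk 'BEGIN { s = "aaa"; gsub(/a/, "b", s); print s }'
}

if awk_supports_gensub; then
  echo "awk supports gensub()"
else
  echo "awk lacks gensub(); prefer sub()/gsub()"
fi
portable_replace
```

Running a probe like this inside the built guest image would show whether a script written against GNU awk needs adjusting.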


deploy/one-click/build-vm-assets.sh — Findings

1. ⚠️ run_as_root loses the calling command's exit code in one path

run_as_root() {
  if [[ "${EUID}" -eq 0 ]]; then
    "$@"
    return $?     # Works: propagates exit code
  fi
  require_cmd sudo
  sudo "$@"       # No return statement — exit code of sudo is lost
}

When EUID != 0, sudo "$@" runs but its exit code is implicitly the function's exit code only because it's the last command. If require_cmd sudo precedes it, the exit code is actually require_cmd's exit (0). This works correctly by accident, but the asymmetry with the EUID == 0 branch is inconsistent.

2. ⚠️ truncate before resize2fs can silently truncate data if resize2fs -M gave wrong minimum size

# Line 263
run_as_root resize2fs -M "${img}" >&2     # (A) shrink FS to minimum
# ...
# Line 277
run_as_root truncate -s "${final_bytes}" "${img}"  # (B) shrink file
# Line 278
run_as_root resize2fs "${img}" "${target_blocks}" >&2  # (C) grow FS to target

Step (B) reduces the file size. If resize2fs -M at step (A) reports too few blocks (e.g., due to a FS bug or unexpected FS state), step (C) could fail or, worse, the e2fsck at line 279 could report corruption. The e2fsck afterwards is a good safety net, but consider reordering to run resize2fs "${img}" "${target_blocks}" before truncate (i.e., resize the FS to the target size while the file is still large, then truncate). This way, the FS resize has room to work.

Or better: use truncate -s to grow the file first (to final_bytes), then run resize2fs "${img}" "${target_blocks}" to extend the FS into the new space.

3. 💡 run_as_root is less flexible than the existing *_with_optional_sudo pattern

The existing pattern (e.g., run_mkfs_ext4_with_optional_sudo) tries the command directly first, then falls back to sudo. The new run_as_root always escalates via sudo if not root. This means:

  • If the user already has permission to run e2fsck/resize2fs/dumpe2fs directly (e.g., the image file is user-owned), the existing pattern would avoid sudo entirely
  • With run_as_root, sudo will prompt for a password if not configured passwordless

Consider whether run_as_root should try without sudo first when the calling user already has the necessary permissions.

4. 💡 dumpe2fs output parsing could be fragile

dumpe2fs_out="$(run_as_root dumpe2fs -h "${img}" 2>/dev/null)"
block_size="$(printf '%s\n' "${dumpe2fs_out}" | awk -F': *' '/^Block size/ {print $2; exit}')"

The awk patterns expect the fields to start at column 0 with the exact casing Block size and Block count. This is stable on current e2fsprogs but could break with future versions or locale settings. Consider either:

  • Setting LC_ALL=C to avoid locale issues
  • Or using more flexible matching (e.g., case-insensitive)
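
The locale suggestion above can be sketched as a small parser that runs awk under LC_ALL=C and matches the field name exactly. The helper name ext4_header_field is hypothetical, and the sample header is canned text with made-up values in the shape that dumpe2fs -h prints, so the sketch runs without a real image.

```bash
# Hypothetical helper (not from the PR): print one field from
# `dumpe2fs -h` style text read on stdin.
ext4_header_field() {
  local name="$1"
  # Exact match on the field name avoids also matching lines such as
  # "Reserved block count".
  LC_ALL=C awk -F': *' -v key="${name}" '$1 == key { print $2; exit }'
}

# Canned sample; the values are made up for illustration.
sample_header='Filesystem volume name:   <none>
Reserved block count:     3136
Block count:              62720
Block size:               4096'

block_size="$(printf '%s\n' "${sample_header}" | ext4_header_field 'Block size')"
block_count="$(printf '%s\n' "${sample_header}" | ext4_header_field 'Block count')"
echo "block_size=${block_size} block_count=${block_count}"

# Against a real image the same parser would read the tool output:
#   LC_ALL=C dumpe2fs -h "${img}" 2>/dev/null | ext4_header_field 'Block size'
```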

5. 💡 Sparse file consideration

truncate creates a sparse file. If the image is later copied without --sparse=always or equivalent, it will inflate to its full apparent size. Consider adding a comment noting this, or explicitly converting to a non-sparse format (like QEMU qcow2) if the intended deployment pipeline doesn't handle sparse files well.
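
To see what "apparent size" versus allocated size means in practice, the sketch below creates a scratch file with GNU truncate and compares the two. It assumes GNU coreutils and a filesystem that supports holes, and it does not touch the guest image.

```bash
# Scratch demonstration (not from the PR): apparent vs allocated size.
tmp_img="$(mktemp)"
truncate -s 8M "${tmp_img}"     # extends the empty file to 8 MiB with a hole

# Apparent size: number of bytes a plain read or copy would see.
apparent_bytes=$(( $(wc -c < "${tmp_img}") ))
# Allocated size: disk space actually in use, in KiB.
allocated_kib="$(du -k "${tmp_img}" | awk '{ print $1 }')"

echo "apparent=${apparent_bytes} bytes, allocated=${allocated_kib} KiB"
rm -f "${tmp_img}"

# Copy tools need to be told to preserve holes, for example
# GNU cp --sparse=always or GNU tar --sparse.
```

On a filesystem with sparse-file support the allocated figure should be far smaller than the apparent one, which is the gap a naive copy would fill in.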

6. 💡 Double headroom reservation

The calculate_guest_image_size_bytes function at line 220-228 already reserves 64MB of headroom over the rootfs data when creating the filesystem. Then shrink_ext4_image adds another 32MB (default SHRINK_RESERVED_BYTES) of reserved space. The net effect is:

  1. Initial FS image: (rootfs + 64MB) aligned to 256MB boundary
  2. After shrink: FS is compacted to min_blocks + 32MB headroom

So total effective free space = 64MB (from initial allocation) + 32MB (shrink headroom) = ~96MB. This seems intentional, but double-check this is the desired total. If the intent is to have exactly 32MB of headroom, consider reducing the 64MB in calculate_guest_image_size_bytes or passing a smaller reserved_bytes to shrink_ext4_image.


deploy/one-click/env.example — ✅ Clean

Good documentation. The env variable name is self-documenting and consistent with existing naming conventions (ONE_CLICK_* prefix).


Positive Patterns

  • e2fsck before and after: Running fsck before shrink (to ensure clean FS) and after (to verify integrity) is excellent practice
  • Graceful fallback for numfmt: The || echo "${final_bytes}" fallback handles systems without GNU coreutils gracefully
  • Human-readable logging: Using numfmt --to=iec for log output improves debuggability
  • Sourcing env file: The SHRINK_RESERVED_BYTES assignment at line 253 is evaluated at script runtime (after env file is loaded), so the env override works correctly

…ndling

- Optimize run_as_root to try without sudo first for better performance
- Add dump_ext4_header to ensure C locale for dumpe2fs parsing
- Add defensive check against shrinking below filesystem minimum size
- Clarify busybox limitations in Dockerfile comments

Signed-off-by: jinlong <jinlong@tencent.com>

cubesandboxbot Bot commented May 25, 2026

Claude finished @fslongjin's task in 1m 41s


Code Review: PR #347 — feat(guest-image): shrink ext4 image after creation and optimize Dockerfile

I've reviewed all three changed files in detail. The second commit (5a7d73f) addresses most of the earlier review feedback well. Below are my findings.


deploy/guest-image/Dockerfile

The Dockerfile changes are clean and well-motivated. Removing build-time tools (gcc, gawk, which, vim, sudo, lsof) is correct for a runtime-only image.

Note: --setopt=install_weak_deps=False and --setopt=tsflags=nodocs are good yum optimizations — no concerns here.


deploy/one-click/build-vm-assets.sh

Finding 1: e2fsck exit code 4+ (uncorrected errors) is silently swallowed

Line 291: run_as_root e2fsck -fy "${img}" >&2 || true

e2fsck uses these exit codes:

| Code | Meaning |
| --- | --- |
| 0 | No errors |
| 1 | Errors corrected |
| 2 | Errors corrected, system should be rebooted |
| 4 | Errors left uncorrected |
| 8 | Operational error |

The || true swallows all non-zero exits, including exit code 4 (uncorrected errors). If e2fsck encounters corruption it cannot fix, execution continues to resize2fs -M on a corrupt FS. The second e2fsck at line 320 offers a safety net, but operating on a corrupt filesystem with resize2fs risks data loss.

Recommendation: Check for exit code >= 4 and abort:

run_as_root e2fsck -fy "${img}" >&2 || {
  rc=$?
  if (( rc >= 4 )); then
    die "e2fsck encountered uncorrectable errors (exit ${rc}) on ${img}"
  fi
}
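
Because the e2fsck man page defines the exit status as the sum of the conditions that occurred, a status such as 5 (1 + 4) is also possible. A bitmask check, sketched below, treats the "errors corrected" bits as acceptable and everything from 4 upward as fatal. The helper name fsck_status_ok is hypothetical and not part of the PR.

```bash
# Hypothetical helper (not from the PR): succeed only when an e2fsck exit
# status contains nothing beyond the "errors corrected" bits (1 and 2).
fsck_status_ok() {
  local rc="$1"
  # Clear the two low bits; anything left means uncorrected errors,
  # an operational error, or another failure condition.
  [ $(( rc & ~3 )) -eq 0 ]
}

for rc in 0 1 2 3 4 5 8 12; do
  if fsck_status_ok "${rc}"; then
    echo "exit ${rc}: continue"
  else
    echo "exit ${rc}: abort"
  fi
done
```

For any valid status this gives the same result as the rc >= 4 comparison above; it only makes the bit-level meaning explicit.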

Finding 2: run_as_root stderr silencing can hide non-permission failures

Line 254: "$@" 2>/dev/null

The first (non-sudo) attempt silences all stderr, not just permission errors. If the command fails for a reason other than EACCES (e.g., invalid arguments, wrong image format), the error is hidden. The sudo fallback then runs and fails with its own error, which may be less informative than the original error message.

This is a minor concern since the commands in this file are well-tested system tools. But if someone debugs a failure here, the hidden stderr could be confusing.

Recommendation: Not strictly necessary, but if you want to improve it, capture stderr from the first attempt and log it if the command fails:

local errfile; errfile="$(mktemp)"
"$@" 2>"${errfile}"
rc=$?
if [[ ${rc} -eq 0 ]]; then
  rm -f "${errfile}"
  return 0
fi
# First attempt failed — show its stderr before retrying with sudo
[[ -s "${errfile}" ]] && log "info: first attempt stderr: $(< "${errfile}")" >&2
rm -f "${errfile}"
require_cmd sudo
sudo "$@"

Finding 3: shrink_ext4_image error message lacks debugging context

Line 298:

die "failed to parse ext4 metadata from ${img}"

If this triggers, the operator has no information about what dumpe2fs actually returned. Including the raw output (or part of it) would aid debugging:

die "failed to parse ext4 metadata from ${img}: block_size=${block_size:-<unset>} min_blocks=${min_blocks:-<unset>}"

Or log the full dumpe2fs output ahead of time:

dumpe2fs_out="$(dump_ext4_header "${img}")" || die "dumpe2fs failed on ${img}"

Finding 4: (Minor) The initial 64MB headroom is wasted when followed by shrink

In calculate_guest_image_size_bytes (line 222-223), 64MB is reserved above the raw file-size estimate before mkfs.ext4 creates the image. This is intended to give mkfs.ext4 -d enough space for metadata overhead.

However, shrink_ext4_image then runs resize2fs -M, which compacts the FS to its absolute minimum — discarding that 64MB. The final image gets 32MB from the new SHRINK_RESERVED_BYTES.

The net effect is correct (32MB headroom in the final image), but the shrink step has to scan a much larger image (256MB/512MB step) to get there. If performance of the shrink step matters, you could reduce calculate_guest_image_size_bytes's reserved_bytes now that the shrink step provides its own headroom. Not a correctness issue, just an efficiency observation.

Adjusted Finding: truncate ordering is correct

The previous review noted a concern about truncating before resize2fs. Looking at the current code:

  1. resize2fs -M shrinks FS to min_blocks (inside a still-large file)
  2. truncate -s final_bytes trims the file to min_bytes + reserved_bytes
  3. resize2fs img target_blocks extends the FS to fill the trimmed file

The min_bytes <= final_bytes assertion on line 309 guarantees no data is truncated. This is correct.
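
The size arithmetic behind those three steps can be sketched as follows. The helper name compute_shrink_target is hypothetical, the input numbers are made up, and rounding the headroom up to whole blocks is an assumption about how the script aligns sizes.

```bash
# Hypothetical helper (not from the PR): given the block count left by
# `resize2fs -M`, the block size and the headroom in bytes, print
# "<target_blocks> <final_bytes>".
compute_shrink_target() {
  local min_blocks="$1" block_size="$2" reserved_bytes="$3"
  local min_bytes reserved_blocks target_blocks final_bytes
  min_bytes=$(( min_blocks * block_size ))
  # Round the headroom up to a whole number of blocks.
  reserved_blocks=$(( (reserved_bytes + block_size - 1) / block_size ))
  target_blocks=$(( min_blocks + reserved_blocks ))
  final_bytes=$(( target_blocks * block_size ))
  # Defensive check mirroring the min_bytes <= final_bytes assertion.
  if [ "${final_bytes}" -lt "${min_bytes}" ]; then
    echo "refusing to truncate below the filesystem minimum" >&2
    return 1
  fi
  printf '%s %s\n' "${target_blocks}" "${final_bytes}"
}

# Made-up example: 50000 blocks of 4096 bytes plus 32 MiB of headroom.
result="$(compute_shrink_target 50000 4096 $(( 32 * 1024 * 1024 )))"
target_blocks="${result% *}"
final_bytes="${result#* }"
echo "target_blocks=${target_blocks} final_bytes=${final_bytes}"
```

With these values the script would then run truncate -s "${final_bytes}" on the image, followed by resize2fs with "${target_blocks}", as in steps 2 and 3.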

Adjusted Finding: run_as_root flexibility

The previous review noted that run_as_root is less flexible than run_mkfs_ext4_with_optional_sudo. Commit 5a7d73f addressed this by making run_as_root try without sudo first — this is now equivalent in behavior to the existing pattern. Good fix.

Adjusted Finding: LC_ALL=C locale handling

Commit 5a7d73f added dump_ext4_header with LC_ALL=C. This fix properly addresses the original concern. Good.


deploy/one-click/env.example

Clean documentation. The variable is well-named (ONE_CLICK_GUEST_IMAGE_RESERVED_BYTES) and consistent with the existing ONE_CLICK_* naming convention.


Summary

| Category | Count |
| --- | --- |
| Potential issues | 2 |
| Suggestions | 2 |
| Previously addressed issues | 3 |
| Correct patterns | 4+ |

Key strengths of this PR:

  • e2fsck before and after the shrink — excellent safety practice
  • Sparse file documentation — clarifies trade-offs for downstream packagers
  • Configurable reserved bytes via env variable — flexible without hardcoding
  • Defensive min_bytes <= final_bytes assertion — prevents silent data truncation
  • Second commit (5a7d73f) responded well to earlier review feedback

The two issues I'd prioritize are: (1) handling e2fsck exit code 4+ with a hard abort rather than silent continuation, and (2) making the metadata parse error message more actionable.

@fslongjin
Member Author

Item 3 (sudo escalation): Fixed. run_as_root now mirrors the *_with_optional_sudo pattern (try first, fall back to sudo). Only stderr is silenced so stdout can still be captured by $(...).

Item 4 (locale-fragile parsing): Fixed. Added a dump_ext4_header wrapper that always runs dumpe2fs -h under LC_ALL=C. Verified under LC_ALL=zh_CN.UTF-8.

Dockerfile nit: Comment updated to call out that busybox applets are reimplementations without every GNU extension, and to note the current startup path doesn't rely on those.

Item 1 (exit code asymmetry): Adjusted for readability: both branches now return $? explicitly. The original behavior was already correct though: a bash function returns the exit code of its last command (sudo "$@"), not require_cmd sudo (which would die before reaching sudo on failure).
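
The point about function exit status can be checked with a minimal stand-alone sketch. The function names below are stand-ins invented for illustration and are not the PR's require_cmd or run_as_root.

```bash
# Stand-ins (hypothetical): one command that succeeds, one that fails with 7.
require_ok() { return 0; }        # plays the role of a successful require_cmd
privileged_cmd() { return 7; }    # plays the role of the sudo'd command failing

wrapper() {
  require_ok
  privileged_cmd    # last command: its status becomes the function's status
}

wrapper_status=0
wrapper || wrapper_status=$?
echo "wrapper exit status: ${wrapper_status}"   # prints 7, not 0
```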

Item 2 (truncate before resize2fs): Added a final_bytes >= min_blocks * block_size assertion as defense. The concern itself doesn't apply to this flow: resize2fs -M has already shrunk the filesystem to min, and truncate then trims the file to min + headroom; the bytes it releases sit above the shrunk FS boundary, so no live FS data is at risk.

Item 5 (sparse): Added an inline comment documenting the sparse output. No behavior change needed — the current tar.gz pipeline produces a 261MB release.

Item 6 (double headroom 96MB): Doesn't hold. The 64MB in calculate_guest_image_size_bytes is initial mkfs working space and is fully discarded by resize2fs -M. Final free space comes only from SHRINK_RESERVED_BYTES (32MB). Measured ~41MB free = 32MB headroom + ~9MB ext4 internal reserved (bitmaps/inode tables/journal).

@fslongjin fslongjin merged commit 75f0c17 into TencentCloud:master May 25, 2026
2 checks passed