refactor: full-codebase simplify pass — reuse + quality#30
Conversation
Tier 1 fixes from 5-agent simplify audit (200+ Go files). Skipped findings that were either already-accepted trade-offs (config.json re-reads on clone — Copilot review accepted), separate-PR scope (generic DoAPIJSON across CH/FC, status event-loop dedup), or change churn beyond their value (info.Hypervisor JSON tag, IndexLayout helper across 4 backends). Reuse: - utils.ReadJSONFile: 5+ os.ReadFile + json.Unmarshal duplicates collapsed (snapshot/envelope.go, hypervisor/snapshot.go, hypervisor/cloudhypervisor/clone.go.parseCHConfig). errors.Is through %w preserves the fs.ErrNotExist signal callers rely on. Quality: - Drop dead VerifyProcess + verifyProcessExe (utils/process.go, process_linux.go, process_other.go). All production callers go through VerifyProcessCmdline with non-empty expectArg; the bare variant was unreachable. Tests of dead code removed. - types.ImageTypeOCI / ImageTypeCloudImg constants replace bare 'oci' / 'cloudimg' string comparisons in cmd/core/helpers.go, cmd/vm/run.go; backend typ consts in images/oci, images/cloudimg reference the public constant. - cloneResumeOpts in hypervisor/cloudhypervisor/clone.go drops 5 fields (cpu, diskQueueSize, noDirectIO, windows, onDemand) that were just transcribed from vmCfg. Pass *types.VMConfig directly; restoreAndResumeClone reads the originals via opts.vmCfg. Modernize and generics audits returned zero actionable findings. Codebase is already idiomatic Go 1.25 (slices/maps/cmp/errors.Join all in active use, range-over-int adopted). Lint: 0 issues both platforms. Tests: full suite passes. Testbed: 25/25 regression tests green.
There was a problem hiding this comment.
Pull request overview
This PR performs a codebase-wide simplification/refactor pass focused on deduplication and consistency, while aiming to keep runtime behavior unchanged.
Changes:
- Added
utils.ReadJSONFileand refactored multiple JSON read+unmarshal call sites to use it. - Removed dead
VerifyProcess/verifyProcessExelogic and related tests, leavingVerifyProcessCmdlineas the single entry point. - Centralized image backend type strings via
types.ImageTypeOCI/types.ImageTypeCloudImgand updated call sites to use the constants. - Simplified Cloud Hypervisor clone resume options by passing
*types.VMConfiginstead of copying multiple fields.
Reviewed changes
Copilot reviewed 13 out of 13 changed files in this pull request and generated no comments.
Show a summary per file
| File | Description |
|---|---|
| utils/process.go | Removes VerifyProcess and simplifies VerifyProcessCmdline behavior. |
| utils/process_test.go | Deletes tests for removed dead APIs; keeps cmdline-verification tests. |
| utils/process_other.go | Removes now-unused non-Linux verifyProcessExe stub. |
| utils/process_linux.go | Removes now-unused Linux verifyProcessExe implementation and related import. |
| utils/atomic.go | Adds ReadJSONFile helper for shared JSON file loading. |
| types/config.go | Introduces exported image type constants used across the codebase. |
| snapshot/envelope.go | Refactors snapshot envelope loading to use utils.ReadJSONFile. |
| hypervisor/snapshot.go | Refactors snapshot metadata loading to use utils.ReadJSONFile. |
| hypervisor/cloudhypervisor/clone.go | Simplifies clone resume options and refactors CH config parsing to use utils.ReadJSONFile. |
| images/oci/oci.go | Replaces hardcoded "oci" type string with types.ImageTypeOCI. |
| images/cloudimg/cloudimg.go | Replaces hardcoded "cloudimg" type string with types.ImageTypeCloudImg. |
| cmd/vm/run.go | Uses types.ImageTypeCloudImg constant in post-clone hint logic. |
| cmd/core/helpers.go | Uses types.ImageTypeOCI constant in OCI digest pinning logic. |
Comments suppressed due to low confidence (1)
utils/process.go:57
- VerifyProcessCmdline no longer special-cases expectArg == "". On Linux, verifyProcessCmdline uses strings.Contains(cmdline, expectArg), so an empty expectArg always passes and the only remaining check is a substring match on binaryName, which is weaker than the previous /proc/{pid}/exe basename check. Consider restoring a strict executable-name verification path for expectArg == "" (e.g., readlink(/proc/pid/exe) or exact argv[0] basename), or explicitly reject empty expectArg and update callers/docs accordingly to avoid misidentifying/killing the wrong process.
// VerifyProcessCmdline checks binary name and that expectArg appears in
// /proc/{pid}/cmdline. This prevents cross-instance misidentification when
// multiple processes of the same binary are running (e.g. multiple VMs).
// On non-Linux platforms, falls back to IsProcessAlive.
func VerifyProcessCmdline(pid int, binaryName, expectArg string) bool {
if pid <= 0 {
return false
}
if match, ok := verifyProcessCmdline(pid, binaryName, expectArg); ok {
return match
}
return IsProcessAlive(pid)
}
💡 Add Copilot custom instructions for smarter, more guided reviews. Learn how to get started.
Copilot review on PR #30 surfaced that VerifyProcessCmdline with expectArg="" silently degrades to a binary-name substring match (strings.Contains(cmdline, "") is always true), risking cross-VM PID confusion. No production caller passes empty today, but the guard is cheap and prevents future misuse. Also trim the godoc to 3 lines and shrink the ImageType const header to 1 line.
The labels claimed root/cocoon credentials for SSH access. Only the 24.04-picoclaw variant actually installed openssh-server; the other four (22.04, 24.04, 24.04-chrome, 24.04-xface) had the labels but no sshd, so any external tool that took them at face value would hit a closed port. Verified zero readers across the cocoonstack: glance, vk-cocoon, and epoch all grep clean for cocoon.ssh / Labels["cocoon.ssh"]. The README claim that the labels exist 'for external tooling' was aspirational and never landed in code. Drop the label everywhere (including picoclaw, where the credentials are real but still unread by tooling) and rewrite the README note to point at the actual control-plane path: cocoon-agent over vsock for kubectl exec/logs. picoclaw's sshd is preserved for its existing serial/network use cases.
CI fail: empty expectArg now returns false (defensive guard added in prior commit), so terminateWithPidfd returns handled=true without killing, and TestTerminateProcess_SIGTERMIgnored_FallsBackToKill hangs waiting for the bash process to exit.
Follow-up to 180fd4c which removed the LABEL from all 5 ubuntu image variants. Verified across the whole tree (vk-cocoon, glance, epoch, cocoon-operator, cocoon-webhook, cocoon-agent, cocoon-net, cocoon-common, cocoon-specs, cocoonv2 itself): no code reads the label, but glance operators consult it as the human-facing source of truth when populating their own glance-credentials Secret. picoclaw is the one variant that actually bakes openssh-server + chpasswd's root:cocoon, so the label is honest documentation there. The four lying variants (22.04, 24.04, chrome, xface) stay stripped.
818dadd to
38648e2
Compare
Earlier attempt at this revert (818dadd) didn't land in the branch history — codex re-flagged it via direct probe (TerminateProcess with empty expectArg still no-ops). TerminateProcess godoc explicitly says 'optional cmdline arg check', so empty expectArg must remain a valid binary-only path. Production callers all pass non-empty sockPath, so the substring-collision risk Copilot raised earlier is unreachable in practice.
Copilot pointed out that strings.Contains(cmdline, binaryName) hits 'binaryName anywhere in argv' rather than argv0 specifically — a process running 'bash -c "cloud-hypervisor --version"' would false-positive. Real risk during PID reuse: a recycled PID whose cmdline happens to mention the VMM binary as a string would be treated as the VMM. Split /proc/<pid>/cmdline on NUL, compare filepath.Base(argv0) to binaryName, then run the optional expectArg substring check on the remainder. ~3 lines, strictly more correct, no API change.
|
@CMGS Unfortunately I hit an unexpected error while processing your comment. I've automatically reported this to GitHub. You can ask me to try again later by mentioning me in a new comment. If you want to contact GitHub about this error, please mention the following identifier so they can better serve you: Sorry for the inconvenience! |
Summary
Audited 200+ Go files via 5 parallel agents (reuse / quality / efficiency / modernize / generics). Applied the Tier 1 findings — high-confidence fixes with bounded scope and zero behavior change. Skipped findings that were already-accepted trade-offs (config.json re-reads on clone, accepted in PR #27 Copilot review), or separate-PR scope (generic
DoAPIJSON[T]across CH/FC, status event-loop dedup, IndexLayout helper across 4 backends), or churn-beyond-value (info.HypervisorJSON tag — touches 5+ construction sites for cosmetic gain).What changed
Reuse (cross-file dedup)
utils.ReadJSONFile(path, v)— 5+os.ReadFile+json.Unmarshalduplicates collapsed. Sites:snapshot/envelope.go::ReadSnapshotEnvelopehypervisor/snapshot.go::LoadSnapshotMetahypervisor/cloudhypervisor/clone.go::parseCHConfigerrors.Is(err, fs.ErrNotExist)still works through%wso callers (e.g.ReadSnapshotEnvelope→ErrEnvelopeMissing) keep their sentinel mapping.Quality
VerifyProcess+verifyProcessExe(utils/process.go,process_linux.go,process_other.go). All production callers go throughVerifyProcessCmdlinewith a non-emptyexpectArg(sockPath); the bare variant was unreachable. Tests of dead code removed.types.ImageTypeOCI/types.ImageTypeCloudImgconstants replace bare"oci"/"cloudimg"string comparisons incmd/core/helpers.go::digestPullRefandcmd/vm/run.go::printPostCloneHints. Backendtypconsts inimages/ociandimages/cloudimgnow reference the public constant.cloneResumeOptscleanup (hypervisor/cloudhypervisor/clone.go): drops 5 fields (cpu,diskQueueSize,noDirectIO,windows,onDemand) that were just transcribed fromvmCfg. Pass*types.VMConfigdirectly;restoreAndResumeClonereads the originals viaopts.vmCfg.*.Modernize / Generics
Both audits returned zero actionable findings. Codebase already uses
slices,maps,cmp.Or,errors.Join, range-over-int idiomatically. Existing generics (Store[T],ResolveRef[T],ForEach[T]) cover the structural-dedup space; remaining candidates are either single-use, semantically distinct per type, or the boxed-anywas an intentional encoder sink.Net diff
Test plan
make lintclean on darwin + linuxgo test ./...passes (incl.utils/process_test.goafter dead-test removal)Skipped findings (separate PRs / accepted trade-offs)
utils.DoAPIJSON[T]for HTTP+JSON pattern across CH/FC) — bigger refactor, separate PR.statusEventLoop/statusEventLoopJSONnear-duplicates) — needs design for shared diff abstraction, separate PR.info.HypervisorJSON tag) — would touch 5+ construction sites for cosmetic gain.IndexLayouthelper across 4 backends) — embedding shapes differ, marginal win.