probe: multi-BRANCH anomaly (thread-level attribution refinement) #140
Merged
Two polish items on snapshot management:

1. `forkd images` table output revamped:
   - new columns: MEMORY (memory.bin size) and CREATED (relative age)
   - dynamic TAG column width
   - sorted most-recent-first instead of alphabetical
   - footer with snapshot count + total bytes
   Before:
     TAG                           SIZE     ROOTFS?
     coding-agent-fork-prewarm-v1  2.4 GiB  yes

   After (most recent first):
     TAG                           SIZE     MEMORY     CREATED  ROOTFS
     python-numpy                  1.8 GiB  512.0 MiB  12h ago  yes
     coding-agent-fork-prewarm-v1  2.4 GiB  512.0 MiB  3d ago   yes

     2 snapshots · 4.2 GiB total
2. New `forkd rmi <TAG>...` subcommand (docker-style):
   - tries DELETE /v1/snapshots/:tag first (clean: daemon removes registry entry + on-disk files atomically)
   - falls back to direct disk removal when the daemon is unreachable or doesn't know the tag (404)
   - reports source per tag: "(daemon)", "(disk)", or "(disk (daemon unreachable))"

   Examples:
     forkd rmi pyagent
     forkd rmi pyagent langgraph python-numpy
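
The `forkd images` layout rules from item 1 (dynamic TAG width, most-recent-first ordering, count and total footer) can be illustrated with a small Python sketch. This is not forkd's implementation: the `Snapshot` record, the fixed widths of the non-TAG columns, and the helpers `human_size`, `relative_age`, and `render` are all hypothetical names chosen for illustration.

```python
from dataclasses import dataclass
from typing import List


@dataclass
class Snapshot:  # hypothetical record, not forkd's real type
    tag: str
    size_bytes: int    # total on-disk size
    memory_bytes: int  # size of memory.bin
    age_seconds: int   # seconds since creation
    has_rootfs: bool


def human_size(n: int) -> str:
    """Format a byte count with binary units and one decimal place."""
    value = float(n)
    for unit in ("B", "KiB", "MiB", "GiB"):
        if value < 1024 or unit == "GiB":
            return f"{value:.1f} {unit}"
        value /= 1024


def relative_age(seconds: int) -> str:
    """Render an age as the largest whole unit, e.g. '3d ago'."""
    if seconds >= 86400:
        return f"{seconds // 86400}d ago"
    if seconds >= 3600:
        return f"{seconds // 3600}h ago"
    if seconds >= 60:
        return f"{seconds // 60}m ago"
    return f"{seconds}s ago"


def render(snapshots: List[Snapshot]) -> str:
    # Smallest age first means most recently created first.
    rows = sorted(snapshots, key=lambda s: s.age_seconds)
    # TAG column is as wide as the longest tag (or the header itself).
    tag_w = max([len("TAG")] + [len(s.tag) for s in rows])
    lines = [f"{'TAG':<{tag_w}}  {'SIZE':<10}  {'MEMORY':<10}  {'CREATED':<8}  ROOTFS"]
    for s in rows:
        lines.append(
            f"{s.tag:<{tag_w}}  {human_size(s.size_bytes):<10}  "
            f"{human_size(s.memory_bytes):<10}  {relative_age(s.age_seconds):<8}  "
            f"{'yes' if s.has_rootfs else 'no'}"
        )
    total = sum(s.size_bytes for s in rows)
    noun = "snapshot" if len(rows) == 1 else "snapshots"
    lines.append(f"{len(rows)} {noun} · {human_size(total)} total")
    return "\n".join(lines)
```

Sorting by age rather than by tag keeps the snapshot you just created at the top of the table, which is the behavior the commit message describes.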
Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
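
The daemon-first, disk-fallback order of `forkd rmi` can be sketched as below. This is a minimal Python illustration, not the actual forkd code: `daemon_delete` and `disk_remove` are hypothetical callables standing in for the `DELETE /v1/snapshots/:tag` request and the on-disk removal. The error handling for other HTTP statuses and for a tag missing on disk is an assumption the commit message does not cover.

```python
from typing import Callable, Optional


def remove_snapshot(
    tag: str,
    daemon_delete: Callable[[str], Optional[int]],
    disk_remove: Callable[[str], bool],
) -> str:
    """Remove one snapshot and return the source label reported to the user.

    daemon_delete(tag) is assumed to return the HTTP status code, or None
    when the daemon cannot be reached. disk_remove(tag) is assumed to return
    True when files for the tag were found and deleted.
    """
    status = daemon_delete(tag)

    # Clean path: the daemon removed the registry entry and the files.
    if status is not None and 200 <= status < 300:
        return "(daemon)"

    # Fallback 1: daemon unreachable, remove directly from disk.
    if status is None:
        if disk_remove(tag):
            return "(disk (daemon unreachable))"
        raise LookupError(f"{tag}: not found on disk and daemon unreachable")

    # Fallback 2: daemon is up but does not know the tag.
    if status == 404:
        if disk_remove(tag):
            return "(disk)"
        raise LookupError(f"{tag}: unknown to daemon and not found on disk")

    # Assumption: any other status is surfaced as an error, not retried on disk.
    raise RuntimeError(f"{tag}: daemon returned HTTP {status}")
```

Passing the two operations in as callables keeps the decision logic testable without a running daemon or a real snapshot directory.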
Follow-up to PR #128. The original probe used `strace -c` on the whole FC process; that can't distinguish user-space CPU from off-CPU blocked-waiting. This pass added two more tools and refined the picture significantly:

- bpftrace profile:hz:199 on the FC pid → ~18 samples in 1.6 s of BRANCH out of ~320 expected, i.e. FC is off-CPU ~94 % of the BRANCH window. Original "user-space CPU" claim was too strong.
- /proc/$pid/task/*/stack polled at 30 ms across all FC threads → top kernel-sleep frames during the slow window:

    ep_poll          x90  (main thread, idle)
    [kvm]            x88  (vCPU thread parked in kvm_vcpu_halt: the pause working as designed)
    vhost_task_fn    x50  (vhost-net, idle)
    futex_wait       x17  ← actual signal: thread blocked on userspace futex
    submit_bio_wait  x3   (snapshot writer waiting on block IO)
    jbd2_log_wait    x2   (ext4 journal commit)

  Full kvm stack: kvm_vcpu_block ← kvm_vcpu_halt ← ... ← __x64_sys_ioctl
  Full futex stack: futex_wait_queue ← futex_wait ← do_futex ← __x64_sys_futex
  (kernel can't tell *which* futex from a static stack.)

Revised picture (3 contributors, not just user CPU):
1. Userspace futex contention: a worker waits on a mutex; lock hold-time may scale with accumulated snapshot count
2. ext4 journal / block IO writeback (~2 % of off-CPU)
3. User-space CPU on the snapshot worker (~70 % of off-CPU time returned empty kernel stack = thread was in user mode; FC's static-pie release build has no frame pointers so bpftrace can't symbolize)

Implications for #118 Phase 2/3:
- Phase 2 (io_uring) addresses #2 only (~2 % of the window)
- Phase 3 (1 s tick) may compound the futex contention; need to identify the lock first
- New candidate work: bpftrace on tracepoint:syscalls:sys_enter_futex with args.uaddr capture to identify the specific futex

Ships:
- bench/pause-window/probe-bpftrace-fc.sh: user-stack sampler
- bench/pause-window/probe-syscall-poll.sh: /proc/syscall poll loop
- bench/pause-window/PROBE-multi-branch-anomaly.md: "Follow-up" section appended with the refined picture and revised #118 scope

Refs #118.

Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
Summary
Follow-up to #128. The original `strace -c` probe couldn't distinguish user-space CPU from off-CPU blocked-waiting. This pass added bpftrace + per-thread `/proc/$pid/task/*/stack` polling and substantially refined the picture.
TL;DR — original "user-space CPU" claim was too strong
bpftrace at 199 Hz on the FC pid during a slow BRANCH: only ~18 samples in 1.6 s out of ~320 expected = FC is off-CPU ~94 % of the BRANCH window.
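
The ~94 % figure follows from simple sampling arithmetic, sketched here in Python. The function name `off_cpu_fraction` is hypothetical, and the calculation assumes the "expected" count corresponds to one thread kept continuously on a CPU for the whole window, which is how the ~320 figure above appears to be derived.

```python
def off_cpu_fraction(samples_seen: int, sample_hz: float, window_s: float) -> float:
    """Estimate the off-CPU share of a window from a timed profiler.

    A profiler firing at sample_hz would record sample_hz * window_s samples
    for a thread that never leaves the CPU. Every missing sample is time the
    process was not running.
    """
    expected = sample_hz * window_s
    return 1.0 - samples_seen / expected


# Numbers from the probe: 199 Hz, 1.6 s slow BRANCH, 18 samples observed.
expected = 199 * 1.6                      # about 318 samples
fraction = off_cpu_fraction(18, 199, 1.6)
print(f"expected ~{expected:.0f} samples, off-CPU ~{fraction:.0%}")
```

With only 18 samples the estimate is coarse, so it is best read as "overwhelmingly off-CPU" rather than as a precise percentage.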
What FC's 5 threads do during a slow BRANCH (kernel-sleep histogram)
```
ep_poll x90 main thread (HTTP idle)
[kvm] x88 vCPU thread parked in kvm_vcpu_halt — the pause working
vhost_task_fn x50 vhost-net (idle)
futex_wait_queue x17 ← actual signal: thread blocked on userspace futex
submit_bio_wait x3 snapshot writer waiting on block IO
jbd2_log_wait x2 ext4 journal commit
```
Full kvm stack: `kvm_vcpu_block ← kvm_vcpu_halt ← ... ← __x64_sys_ioctl`
Full futex stack: `futex_wait_queue ← futex_wait ← do_futex ← __x64_sys_futex` (kernel can't say which futex from a static stack).
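
A histogram like the one above can be produced by repeatedly reading each thread's kernel stack and counting the innermost frame. The Python sketch below shows the idea only; it is not the shipped probe script, and the names `top_frame`, `tally`, and `poll_stacks` are hypothetical. It assumes the usual `/proc/<pid>/task/<tid>/stack` line format (`[<0>] symbol+0xoff/0xsize [module]`) and that the caller has permission to read those files, which normally requires root.

```python
import glob
import time
from collections import Counter
from typing import Iterable, Optional


def top_frame(stack_text: str) -> Optional[str]:
    """Return the innermost kernel symbol of one stack dump, or None if empty.

    An empty dump means the thread had no kernel stack to report at that
    instant, for example because it was running in user mode.
    """
    for line in stack_text.splitlines():
        parts = line.split()
        if len(parts) >= 2:
            # parts[0] is the address token, parts[1] is "symbol+0xoff/0xsize".
            return parts[1].split("+", 1)[0]
    return None


def tally(stack_texts: Iterable[str]) -> Counter:
    """Count innermost frames across many stack dumps."""
    counts: Counter = Counter()
    for text in stack_texts:
        counts[top_frame(text) or "<empty>"] += 1
    return counts


def poll_stacks(pid: int, interval_s: float = 0.03, duration_s: float = 1.6) -> Counter:
    """Poll every thread of pid at a fixed interval and tally the top frames."""
    counts: Counter = Counter()
    deadline = time.monotonic() + duration_s
    while time.monotonic() < deadline:
        for path in glob.glob(f"/proc/{pid}/task/*/stack"):
            try:
                with open(path) as f:
                    text = f.read()
            except OSError:
                continue  # thread exited or file not readable
            counts[top_frame(text) or "<empty>"] += 1
        time.sleep(interval_s)
    return counts
```

Polling gives a coarse picture at 30 ms granularity, so short waits between polls go unseen, but it works against a release build without frame pointers because the kernel unwinds its own stack.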
Revised picture — 3 contributors
1. Userspace futex contention: a worker waits on a mutex; lock hold-time may scale with accumulated snapshot count.
2. ext4 journal / block IO writeback (~2 % of off-CPU).
3. User-space CPU on the snapshot worker (~70 % of off-CPU time returned an empty kernel stack, meaning the thread was in user mode; FC's static-pie release build has no frame pointers, so bpftrace can't symbolize).
Implications for #118 Phase 2/3 (refined)
- Phase 2 (io_uring) addresses contributor 2 only (~2 % of the window).
- Phase 3 (1 s tick) may compound the futex contention; the lock needs to be identified first.
- New candidate work: bpftrace on `tracepoint:syscalls:sys_enter_futex` with `args.uaddr` capture to identify the specific futex.
Files
- `bench/pause-window/probe-bpftrace-fc.sh`: user-stack sampler
- `bench/pause-window/probe-syscall-poll.sh`: /proc/syscall poll loop
- `bench/pause-window/PROBE-multi-branch-anomaly.md`: "Follow-up" section appended with the refined picture and revised #118 scope
Refs #118.
🤖 Generated with Claude Code