feat(inject): Python perf-trampoline injector via ptrace by dpsoft · Pull Request #12 · dpsoft/perf-agent

dpsoft · 2026-04-29T23:07:46Z

Summary

Adds perf-agent --profile --inject-python to activate CPython 3.12+'s perf trampoline (sys.activate_stack_trampoline('perf')) on a running target via ptrace, so users get Python JIT names without restarting the target with python -X perf.
Per-PID mode is strict (any failure exits non-zero); system-wide -a mode is lenient (failures logged + counted, profile continues).
Activation runs at profile start (before BPF attach), deactivation at profile end — so trampoline overhead does not persist past the profiling window.
Late-arriving Python processes during -a runs are picked up via a new lazy mmap-watcher hook.

Architecture

inject/
├── elfsym/        # SHARED — ELF symbol resolution + libpython SONAME parser (any runtime)
├── ptraceop/      # SHARED — ptrace primitives, three-call sequence, amd64 + arm64 frame setup
└── python/        # Python-specific: detector, manager, payload encoding

The split anticipates future runtime injectors (Node.js inspector protocol, JVM perf-map-agent) plugging in as siblings of python/ without touching the shared layer.

Lifecycle

Agent.Start() → python.Manager.ActivateAll(pids) walks /proc, identifies Python 3.12+ via libpython SONAME + ELF symbols, ptrace-attaches each, runs three remote calls (PyGILState_Ensure → PyRun_SimpleString(payload) → PyGILState_Release) inside a single ptrace session, detaches.
BPF profiler attaches; sampling runs.
Agent.Stop() → python.Manager.DeactivateAll(ctx) runs the same dance with the deactivate payload, capped at 5s total to prevent shutdown hangs on uninterruptible-sleep targets.

Activation is idempotent — continuous profiling pattern works without cleanup. perf-map files persist between runs (we don't delete them); blazesym reads them on demand for symbol resolution.

Lazy mmap-watcher hook

For -a mode, new exec events fire python.Manager.ActivateLate(pid). The hook is registered via PIDTracker.SetOnNewExec(fn) only when --inject-python is on; with the flag off, the producer does a single nil check and skips dispatch entirely (zero overhead).

Critical correctness fixes landed during review/testing

runtime.LockOSThread in runSequence: on Linux, only the OS thread that issued PTRACE_ATTACH may issue subsequent ptrace ops on the tracee. Without pinning, Go's scheduler could migrate the goroutine mid-sequence and the next ptrace op would fail with EPERM/ESRCH — or worse, target a different stopped tracee.
Deferred PtraceSetRegs before deferred detach: on error paths after a remote call, the target's PC was 0 (the SIGSEGV sentinel); without restoring registers before detach, the target would resume from PC=0 and immediately segfault. The deferred restore guarantees clean target state on every exit path.
/proc/<pid>/maps truncation fix: replaced 64KB fixed read with bufio.Scanner + 4MB max — Python apps with hundreds of mappings would silently miss the [stack] line.
Orig_rax = -1 in the call frame: when ptrace attaches mid-syscall (e.g. nanosleep), the kernel runs syscall-return processing on PtraceCont and clobbers RAX. Setting orig_rax = -1 disables this. (Standard pyrasite/py-spy/manhole workaround.)
Dropped _PyPerf_Callbacks pre-flight check: distributors strip internal symbols from production libpython, so this 'is the trampoline compiled in?' filter wrongly rejected stock Fedora/Ubuntu Python builds. We now rely on the runtime return value of PyRun_SimpleString.
Dropped preexisting-marker check: was breaking continuous profiling — Run 2+ would see the perf-map file from Run 1 and skip activation, leaving the trampoline OFF for the rest of the run.

Test plan

Unit

inject/elfsym/ — SONAME parsing + ELF symbol resolution against host libc.so.6.
inject/ptraceop/ — register frame setup for amd64 + arm64 (build-tagged), alignment validation, return-value extraction.
inject/python/ — detection ladder (synthetic /proc trees), Manager strict/lenient policy, dedupe under concurrency, bounded shutdown.
unwind/ehmaps/ — lazy new-exec hook (TestNewExec_NoSubscribersZeroDispatch, TestNewExec_HookFires).

Integration (caps-gated, verified locally)

TestInjectPython_ActivatesTrampoline — Python 3.14 launched WITHOUT -X perf, --inject-python activates trampoline, profile contains py::Thread.run, py::cpu_work, etc.; deactivation fires at profile end (perf-map size stops growing).
TestInjectPython_StrictFailsOnNonPython — Go binary target, perf-agent exits non-zero with not_python reason.

Cross-arch

GOOS=linux GOARCH=arm64 go build ./inject/ptraceop/... — clean.

Lint/vet

gofmt -l clean across all touched packages.
go vet clean (with CGO env for blazesym).
Race detector clean: go test -race ./inject/python/... ./unwind/ehmaps/....

Documentation

README.md — new section "Profiling running Python processes" with worked example.
docs/python-profiling.md — deeper dive: how it works, when injection is skipped (table), continuous profiling, container caveats, performance impact, troubleshooting.

Build / run

make generate
CGO_CFLAGS="..." CGO_LDFLAGS="-L $BLAZESYM/target/release -Wl,-rpath,$BLAZESYM/target/release -lblazesym_c" go build .
sudo setcap cap_perfmon,cap_bpf,cap_sys_admin,cap_sys_ptrace,cap_checkpoint_restore+ep ./perf-agent

# Profile a running Python web server for 30 seconds
./perf-agent --profile --pid $(pgrep -f gunicorn) --duration 30s --inject-python

Out of scope (deliberate non-goals for v1)

Python < 3.12 (no equivalent activation primitive).
CPython without --enable-perf-trampoline (skipped at activation time).
PID namespace translation for containers (host-side PIDs only; documented limitation).
Concurrent perf-agent runs against the same target (-X perf-launched targets will get deactivated on profile end — documented tradeoff for users who want -X perf always-on).

CPython 3.12+ ships an opt-in perf-compatible trampoline that emits JIT entries to /tmp/perf-<pid>.map. Today perf-agent users must restart targets with `python -X perf` to get Python frame names — hostile UX for production debugging. This spec proposes a v1 injector that uses ptrace to remotely call sys.activate_stack_trampoline("perf") on running Python 3.12+ processes when the user passes --profile --inject-python, and deactivates on shutdown so trampoline overhead doesn't persist past the profiling window. Decisions reached during brainstorming (full Q&A in spec): - 3.12+ only; older versions fall back to existing DWARF unwinding - Profiling-time flag (--profile --inject-python), not subcommand - Strict per-PID, lenient system-wide - Pure Go via golang.org/x/sys/unix - Both amd64 and arm64 in v1 - SONAME-or-exe detection + debug/elf for symbol resolution - Inject before BPF attach - Three remote calls per activation: PyGILState_Ensure → PyRun_SimpleString(payload) → PyGILState_Release; SIGSEGV-at-0 return sentinel; stack-with-mmap-fallback for payload bytes - Idempotency via /tmp/perf-PID.map preexisting check - 5s bounded shutdown deactivation

Move from flat inject/ (Python-only assumption) to a tree that splits language-agnostic primitives from language-specific logic: inject/ elfsym/ (shared — ELF symbol resolution, any compiled runtime) ptraceop/ (shared — low-level ptrace primitives, any C-ABI runtime) python/ (Python-specific: detector, manager, payload, errors) Future runtimes (Node.js, JVM, Ruby) plug in as siblings of python/ without churn in the shared layer. Some runtimes may not use ptrace at all — e.g., a Node.js injector via the V8 inspector protocol would sit alongside ptraceop/ without using it. The layout supports both. Public API change: import "inject/python"; python.NewManager(...) ptraceop has no dependency on python — it takes primitive SymbolAddrs to avoid an import cycle, which has the side benefit of making ptraceop trivially reusable for non-Python runtimes that need a brief remote-function-call sequence. Test paths and InjectStats accessor renamed to PythonInjectStats to reserve the namespace for future runtimes.

12 tasks from sentinel-error scaffolding through caps-gated integration tests: 1. errors + payload encoding (inject/python) 2. SONAME parsing (inject/elfsym/soname.go) 3. ELF symbol resolver (inject/elfsym/elfsym.go) 4. detection ladder (inject/python/detector.go) 5. ptraceop arch-generic + amd64 (inject/ptraceop) 6. ptraceop arm64 7. Manager + lifecycle + dedupe (inject/python/{python,manager}.go) 8. wire into perfagent.Agent + main.go --inject-python flag 9. mmap-watcher new-exec hook for late arrivals 10. bench harness flag 11. integration tests (caps-gated) 12. docs (README + docs/python-profiling.md) Each task is TDD-shaped with bite-sized steps. Tests exercise stubs for ptrace operations (real ptrace only in integration tests, since unit tests can't fake a target process safely). The plan flags one intentional placeholder: the haveCaps helper in Task 11 must be copied from existing integration_test.go conventions.

Hard requirement made explicit: producer must do zero work when no subscriber is registered. Two patterns documented (callback or Subscribe-on-demand channel). Added TestNewExec_NoSubscribersZeroDispatch to guard the invariant against refactors.

…V on error

…LowAddr

…utdown

Add --inject-python CLI flag that injects sys.activate_stack_trampoline('perf') into CPython 3.12+ targets via ptrace before BPF attach, so early samples carry Python frame names. Strict per-PID mode with --pid; lenient with -a. Validates that --inject-python requires --profile and CAP_SYS_PTRACE.

… profiling) Remove PreexistingMarkerCheck, defaultPreexistingMarkerCheck, SkippedPreexisting, and the preexisting field on trackedTarget. Always activate and always deactivate: the marker check conflated "trampoline active now" with "file left from prior run", causing Run 2+ in continuous profiling to silently skip activation. Remove ErrPreexisting (now unused). Update package doc to document the idempotent activation model and the -X perf tradeoff.

When ptrace attaches to a target mid-syscall (e.g. nanosleep), the kernel records the syscall number in orig_rax. On PtraceCont the kernel runs syscall-return processing, which clobbers RAX with -ERESTART or similar, overwriting our remote-call return value. The pyrasite/py-spy/manhole convention is to set orig_rax = -1 ('not in a syscall') in the call frame to disable that processing. Symptom: PyRun_SimpleString remote call returned 230 deterministically on a Python target attached during time.sleep(). With the fix, all three remote calls (PyGILState_Ensure → PyRun_SimpleString → PyGILState_Release) return their actual values cleanly. Verified end-to-end: integration test now produces a perf-map with py:: entries and a profile with Python qualnames (Thread.run, cpu_work, etc.) from a Python 3.14 target launched without -X perf.

The pre-flight check on _PyPerf_Callbacks was meant as a free 'is the trampoline compiled in?' filter, but it does not survive real-world packaging: distributors (Fedora, Ubuntu, ...) strip internal/private symbols from production libpython builds, so the symbol is absent from .dynsym/.symtab even when --enable-perf-trampoline was set at compile time. Result: detector wrongly skipped Python 3.12+ on standard distro builds with ErrNoPerfTrampoline. We now rely on the runtime return value of PyRun_SimpleString — if the trampoline isn't available, the call returns -1 and the manager surfaces ErrNoPerfTrampoline at activation time. Cost: one extra ptrace round-trip per genuinely-unsupported target during scan. Verified on Fedora's stock python3.14 (libpython3.14.so.1.0 with all internal symbols stripped) — detector now lets it through and activation succeeds.

requirePython312Plus now also runs sys.activate_stack_trampoline + deactivate_stack_trampoline at the command line to confirm the build actually supports the trampoline. Hosts where Python is 3.12+ but the build lacks --enable-perf-trampoline (rare, mostly custom builds) will skip cleanly instead of failing.

errcheck flagged unchecked defer f.Close() in inject/elfsym, inject/ptraceop, inject/python/detector. Wrap with the project's existing `defer func() { _ = f.Close() }()` idiom. staticcheck QF1006 in tracker_newexec_test: lift the break condition into the for header.

Copilot

Pull request overview

This PR adds a Python “perf trampoline” injector to perf-agent, enabling --profile --inject-python to activate/deactivate CPython 3.12+’s sys.activate_stack_trampoline('perf') on already-running processes via ptrace, with a system-wide late-process hook via the mmap watcher.

Changes:

Introduces a new inject/ package tree (elfsym, ptraceop, python) for detection + remote-call injection + lifecycle management.
Wires --inject-python through CLI → perfagent.Agent → dwarfagent/ehmaps to support late execs in -a mode.
Adds unit tests, integration tests, bench plumbing, and documentation for Python injection.

Reviewed changes

Copilot reviewed 31 out of 31 changed files in this pull request and generated 6 comments.

Show a summary per file

File	Description
unwind/ehmaps/tracker_newexec_test.go	Adds tests for lazy new-exec hook dispatch behavior
unwind/ehmaps/tracker.go	Adds `onNewExec` callback + dispatch on group-leader fork events
unwind/dwarfagent/hooks.go	Extends dwarfagent Hooks with `OnNewExec` callback
unwind/dwarfagent/common.go	Registers `OnNewExec` hook onto PIDTracker before Run starts
test/integration_inject_python_test.go	Adds caps-gated integration tests for Python injection behavior
perfagent/options.go	Adds config + option for `InjectPython`
perfagent/agent.go	Constructs python Manager, runs activate/deactivate, wires dwarf hooks, adds PID scan
main.go	Adds `--inject-python` flag and forwards option
inject/python/python.go	Defines Manager/Options/Stats and lifecycle entry points
inject/python/manager.go	Implements activation, skip/summary logic, and ESRCH detection
inject/python/manager_test.go	Unit tests for strict/lenient policy, dedupe, deadline behavior, ESRCH tolerance
inject/python/payload.go	Defines activate/deactivate payload bytes
inject/python/payload_test.go	Tests payload encoding/termination
inject/python/errors.go	Defines sentinel errors for detection/activation outcomes
inject/python/detector.go	/proc-based detector + ELF symbol resolution for libpython/exe
inject/python/detector_test.go	Unit tests for detector behavior with synthetic /proc trees
inject/ptraceop/ptraceop.go	Implements ptrace attach → 3-call sequence → restore/detach + stack discovery
inject/ptraceop/regs_amd64.go	amd64 call-frame setup / return extraction helpers
inject/ptraceop/regs_amd64_test.go	Unit tests for amd64 call-frame setup helpers
inject/ptraceop/regs_arm64.go	arm64 call-frame setup / return extraction helpers
inject/ptraceop/regs_arm64_test.go	Unit tests for arm64 call-frame setup helpers
inject/elfsym/soname.go	libpython SONAME parsing + 3.12+ gating helper
inject/elfsym/soname_test.go	Unit tests for SONAME parsing + version gating
inject/elfsym/elfsym.go	ELF symbol resolver (.dynsym first, .symtab fallback)
inject/elfsym/elfsym_test.go	Unit tests for ELF symbol resolution against host libc
docs/superpowers/specs/2026-04-28-python-perf-injector-design.md	Adds detailed injector design doc
docs/superpowers/plans/2026-04-28-python-perf-injector.md	Adds implementation plan doc
docs/python-profiling.md	Adds user documentation for Python injection
bench/internal/schema/schema.go	Records `inject_python` in bench output schema
bench/cmd/scenario/main.go	Adds `--inject-python` flag to bench harness and plumbs into scenarios
README.md	Adds README section describing profiling running Python processes

💡 Add Copilot custom instructions for smarter, more guided reviews. Learn how to get started.

Copilot · 2026-04-30T00:23:18Z

+// DeactivateAll runs the bounded shutdown deactivation pass. Tolerates ESRCH
+// (process gone). Honors ctx cancellation AND the configured deactivation
+// deadline (5s default).
+func (m *Manager) DeactivateAll(ctx context.Context) {
+	deadline, cancel := context.WithTimeout(ctx, m.opts.DeactivateDeadline)
+	defer cancel()
+
+	snapshot := m.snapshotTracked()
+	for pid, tt := range snapshot {
+		select {
+		case <-deadline.Done():
+			m.log.Warn("python deactivate cancelled (deadline or ctx)",
+				"abandoned", len(snapshot)-int(m.stats.Deactivated.Load()))
+			return
+		default:
+		}
+		addrs := SymbolAddrsForTarget{
+			PyGILEnsure:  tt.target.PyGILEnsureAddr,
+			PyGILRelease: tt.target.PyGILReleaseAddr,
+			PyRunString:  tt.target.PyRunStringAddr,
+		}
+		if err := m.opts.Injector.RemoteDeactivate(pid, addrs); err != nil {
+			if isProcessGone(err) {
+				continue
+			}
+			m.log.Warn("python deactivate failed", "pid", pid, "err", err)
+			m.stats.DeactivateFailed.Add(1)
+			continue


DeactivateAll's timeout only gets checked between iterations; a single RemoteDeactivate call can still block indefinitely (e.g., ptrace attach/wait on an uninterruptible-sleep target), so this doesn't actually cap total shutdown time despite the comment. Consider running each RemoteDeactivate in a separate goroutine and returning once the deadline fires (even if a goroutine remains stuck), or extend the LowLevelInjector API to accept a context/deadline that can be enforced at the low level.

Copilot · 2026-04-30T00:23:19Z

+	if runErr != nil {
+		return fmt.Errorf("PyRun_SimpleString: %w", runErr)
+	}
+	if runResult != 0 {
+		return fmt.Errorf("PyRun_SimpleString returned non-zero: %d (likely activation refused at runtime)", runResult)
+	}


When PyRun_SimpleString returns non-zero, this returns a generic fmt.Errorf. The higher-level python.Manager expects to be able to classify this as ErrNoPerfTrampoline (and expose a structured skip reason / increment SkippedNoTramp), but ptraceop can't currently communicate this in a machine-readable way. Consider returning a typed error (or ptraceop-level sentinel) carrying the remote return code so the python bridge/Manager can translate it into python.ErrNoPerfTrampoline and treat it as a skip rather than a generic activation failure.

Copilot · 2026-04-30T00:23:19Z

+	addrs := SymbolAddrsForTarget{
+		PyGILEnsure:  target.PyGILEnsureAddr,
+		PyGILRelease: target.PyGILReleaseAddr,
+		PyRunString:  target.PyRunStringAddr,
+	}
+	if err := m.opts.Injector.RemoteActivate(pid, addrs); err != nil {
+		m.stats.ActivateFailed.Add(1)
+		m.log.Warn("python inject failed", "pid", pid, "err", err)
+		if m.opts.StrictPerPID {
+			return fmt.Errorf("activate pid=%d: %w", pid, err)
+		}
+		return nil


activateOne increments ActivateFailed for any RemoteActivate error, so failures due to missing perf-trampoline support (PyRun_SimpleString non-zero) won't be counted as SkippedNoTramp / reason=no_perf_trampoline even though ErrNoPerfTrampoline exists and recordSkipReason handles it. Once the low-level injector returns a typed/sentinel error for 'no perf trampoline', consider mapping it to python.ErrNoPerfTrampoline here and counting it as a skip (and, in lenient mode, not as a generic failure) so stats/reasons match the documented behavior.

Copilot · 2026-04-30T00:23:19Z

+	ctx, cancel := context.WithCancel(t.Context())
+	done := make(chan struct{})
+	go func() {
+		tracker.Run(ctx, w)
+		close(done)
+	}()
+
+	// Send a group-leader ForkEvent (pid == tid).
+	w.ch <- MmapEventRecord{Kind: ForkEvent, PID: 99, TID: 99}
+	// Give Run time to process the event.
+	time.Sleep(50 * time.Millisecond)
+	cancel()
+	<-done


TestNewExec_NoSubscribersZeroDispatch doesn't actually prove the ForkEvent was processed: even if Run never consumes from w.ch before the cancel, the test still passes because onNewExec remains nil. To make this a real guardrail, use Run's observer hook to signal when the ForkEvent is observed/handled (e.g., close a channel from an observer when ev.Kind==ForkEvent && ev.PID==99) and wait on that instead of time.Sleep.

scanForLibpython and scanForExeBase picked the first executable mapping as load_base. That assumes the executable PT_LOAD segment is the first one (file offset 0). Modern glibc (Ubuntu 24.04+, default for actions/setup-python's CPython 3.12 build) uses -Wl,-z,separate-code, which produces an ELF with a leading r-- LOAD segment (header + RELRO) at offset 0 and the r-x .text segment at a non-zero offset. Old logic returned start_of_text as load_base, off by the size of the first segment. Every absolute symbol address derived from it was wrong by that delta, so the remote call's RIP jumped into garbage; the target SEGVed mid-execution and we read its pre-SEGV RAX — which still held the kernel's -ERESTARTNOHAND (-514) from when ptrace attached during clock_nanosleep. Symptom: TestInjectPython_ActivatesTrampoline failed deterministically on GHA runners (Ubuntu 24.04, Python 3.12.13) with "PyRun_SimpleString returned non-zero: 18446744073709551102" while passing on Fedora hosts (Python 3.14, executable-first layout). Fix: filter mappings by file offset == 0 to anchor on the first PT_LOAD segment regardless of perms. Both static and dynamic detection paths get the same fix. Also: - ptraceop: validate post-call RIP == sentinel(0); if SIGSEGV fired at a non-zero address, the function crashed mid-execution — surface that with the actual fnAddr instead of returning whatever was in RAX. - python/manager: include libpython, load_base, py_run_string in the "python inject failed" warn — makes future regressions debuggable from a single CI log line. - detector_test: TestDetect_DynamicLinkedPython_SeparateCodeLayout covers the leading-r-- case directly.

Previous fix (filter mappings by file offset==0) gave the right mapping_start, but the detector still treated mapping_start as the load_bias. That works for PIE binaries with first-LOAD p_vaddr==0 (PIE convention) — and exclusively that case happens on Fedora local hosts, hence the false-pass locally. It is wrong for non-PIE ET_EXEC binaries: Ubuntu /usr/bin/python3.12 is a non-PIE executable with first-LOAD p_vaddr=0x400000 and absolute symbol values; adding mapping_start (0x400000) to an already-absolute sym.Value pushes RIP past the binary's last mapping → SIGSEGV. The right formula: load_bias = mapping_start - p_vaddr_of_first_LOAD abs_addr = load_bias + sym.Value For PIE p_vaddr==0 it collapses to mapping_start (current behavior, unchanged). For non-PIE p_vaddr=0x400000 it correctly resolves to 0. CI evidence: previous build's diagnostic showed loadbase=0x400000, fnAddr=0xab27d0, SIGSEGV at fnAddr — proving fnAddr was off the binary's executable text. With load_bias=0 we get fnAddr=sym.Value, which lands inside .text. Same ehmaps/LoadProcessMappings already does this for unwind tables; inject/python now matches the convention.

…as SkippedNoTramp ptraceop.RemoteActivate / RemoteDeactivate previously returned a generic fmt.Errorf when the called function (PyRun_SimpleString) completed the SIGSEGV-sentinel handshake but returned non-zero. The python.Manager had no machine-readable way to distinguish that "Python-level error" case from any other ptrace failure, so every such case incremented ActivateFailed even though commit 384b01a's design intent was to surface ErrNoPerfTrampoline at activation time after dropping the _PyPerf_Callbacks pre-flight. Wiring delivered: ptraceop: return *ErrRemoteCallNonZero{Op, Result} on non-zero return. ptraceop stays language-agnostic — it does not import inject/python and does not interpret the int. perfagent: ptraceopBridge translates *ErrRemoteCallNonZero with Result that sign-extends to int32(-1) into a wrap of python.ErrNoPerfTrampoline. The bridge is the only piece that knows both layers, which keeps the existing import boundary intact. python: activateOne checks errors.Is(err, ErrNoPerfTrampoline) and classifies via recordSkipReason → SkippedNoTramp + reason=no_perf_trampoline, mirroring detect-time skips. Strict mode wraps with the sentinel so callers can errors.Is. Also clarify NewManager's doc — Detector and Injector are not defaulted inside this package (which deliberately doesn't import ptraceop); they are caller-injected by perfagent.Agent. Calling activate/deactivate with either nil panics at first use, intentionally — a silent default would mask wiring mistakes. Rename TestPayloadsAreImmutable → TestActivatePayloadIsConsistent and clarify what it actually asserts (consistency between consecutive calls, not in-place mutation safety; the slice is intentionally shared). Tests: ptraceop: errors.As round-trip + Error() formatting for the typed err. python: SkippedNoTramp counted (not ActivateFailed) on lenient path; strict path returns wrap that errors.Is the sentinel.

Copilot

Pull request overview

Copilot reviewed 32 out of 32 changed files in this pull request and generated 4 comments.

Comments suppressed due to low confidence (1)

bench/cmd/scenario/main.go:163

The new --inject-python bench flag is plumbed through function parameters (e.g., measureOnePID(..., injectPython bool)) but never used to alter profiler construction or hooks. As-is, the flag only changes the emitted JSON field and has no behavioral effect. Either wire it into the benchmarked code path (e.g., by constructing a perfagent.Agent with WithInjectPython / appropriate dwarf hooks), or remove the flag/parameters until it can be supported to avoid misleading results.

// measureOnePID times one NewProfilerWithMode(pid=...) call and the
// per-binary breakdown collected via OnCompile.
func measureOnePID(pid, runN int, unwind string, injectPython bool) schema.Run {
	var (
		mu      sync.Mutex
		entries []schema.Binary
	)
	hooks := &dwarfagent.Hooks{
		OnCompile: func(path, buildID string, ehFrameBytes int, dur time.Duration) {
			mu.Lock()
			defer mu.Unlock()
			entries = append(entries, schema.Binary{
				Path:         path,
				BuildID:      buildID,
				EhFrameBytes: ehFrameBytes,
				CompileMs:    float64(dur.Microseconds()) / 1000.0,
			})
		},
	}
	t0 := time.Now()
	prof, err := dwarfagent.NewProfilerWithMode(pid, false, []uint{0}, nil, 99, hooks, modeFromFlag(unwind))
	totalMs := float64(time.Since(t0).Microseconds()) / 1000.0

💡 Add Copilot custom instructions for smarter, more guided reviews. Learn how to get started.

…sserts Five separate review items folded into one commit (each independent within its file): * perfagent/agent.go: hasCapSysPtrace - Nil-check cap.GetProc() before dereferencing (mirrors the same guard in test/integration_test.go). - Check Permitted ∥ Effective. validate() runs before Start() promotes Permitted → Effective via SetFlag, so Effective-only rejected legitimate runs where the cap was granted via setcap or inherited but not yet promoted. * inject/python/python.go: NewManager - Default opts.Detector = NewDetector("/proc", logger) when nil (zero-cost, both live in this package — no new import). - Panic at construction with a clear message if opts.Injector is nil instead of letting the deeper NPE fire later in activateOne. inject/python deliberately does not import inject/ptraceop so Injector cannot be defaulted; surfacing the wiring mistake at NewManager-time is friendlier than at first activate. * bench/cmd/scenario, bench/internal/schema: drop --inject-python - The flag was plumbed through measureOnePID/measureSystemWide parameters and emitted into the JSON, but no codepath in the bench actually invokes Python injection — measure* constructs dwarfagent.NewProfilerWithMode directly instead of going through perfagent.Agent (which owns the python.Manager wiring). Reporting "with injection" results identical to "without" is misleading; drop the flag, parameter, and schema field. Re-add when the bench is reworked around perfagent.Agent. Comment in main.go records the intent. * test/integration_inject_python_test.go: TestInjectPython_* - TestInjectPython_ActivatesTrampoline: handle os.ReadFile error with t.Fatalf so a transient read failure produces a clear message instead of silently asserting against an empty buffer. - TestInjectPython_StrictFailsOnNonPython: drop the bare "python" substring fallback that would mask unrelated failures (missing CAP_SYS_PTRACE, validate-time flag rejection, etc.) and assert only on the structured "not python" reason tokens. Match case-insensitive because slog/fmt.Errorf chains differ in casing.

The committed architecture.excalidraw was the original sketch from the FP-only era — it still showed only profile/, offcpu/, cpu/, perf.bpf.c, offcpu.bpf.c, and cpu.bpf.c. Since then the codebase has grown: - unwind/dwarfagent/ (DWARF hybrid CPU + off-CPU walker, PRs #1–#11) - unwind/{ehcompile,ehmaps,procmap} (CFI compile, per-PID lifecycle, /proc resolver — PRs #3–#9) - internal/perfevent (per-CPU perf_event_open + AttachRawLink shared helper extracted from profile/dwarfagent — PR #13) - inject/{python,ptraceop,elfsym} (Python perf-trampoline activator via ptrace — PR #12) - internal/{nspid,k8slabels} (namespace-aware --pid + k8s pprof labels — PR #14) - bpf/perf_dwarf.bpf.c, bpf/offcpu_dwarf.bpf.c (DWARF kernel-side) The new file groups them into pre-profile setup (optional, dashed — nspid, k8slabels, inject), profilers (FP, DWARF, PMU), helpers (perfevent, ehcompile, ehmaps, procmap, pprof), and the four BPF programs in the kernel band, with arrows showing data + control flow. Output lands in *-on-cpu.pb.gz / *-off-cpu.pb.gz / PMU stdout-or-file. Layout was generated programmatically; open the file in https://excalidraw.com or the VS Code extension to fine-tune positions and colours. The README's ASCII version of the same diagram is unchanged — both exist intentionally so readers can grok the architecture without opening Excalidraw, and the .excalidraw is the editable source for future iterations.

dpsoft added 25 commits April 28, 2026 21:55

inject/python: sentinel errors and trampoline payload encoding

743243b

inject/elfsym: parse libpython SONAME and version-gate at 3.12+

0bdbf41

inject/elfsym: gofmt -w (post-review formatting fix)

3276f31

inject/elfsym: ELF symbol resolution with dynsym→symtab fallback

70f8932

inject/python: detection ladder (SONAME match + ELF symbol resolve)

0d1c17f

inject/python: wire detector debug logs and document first-match scan

e0ae714

inject/ptraceop: arch-generic remote-call sequence + amd64 frame setup

f8327b7

inject/ptraceop: restore registers via defer to prevent target SIGSEG…

83f7cce

…V on error

inject/ptraceop: pin to OS thread; fix /proc maps truncation in stack…

3677c38

…LowAddr

inject/ptraceop: arm64 frame setup (AAPCS64)

e4b696a

inject/python: Manager with strict/lenient policy, dedupe, bounded sh…

1de64a1

…utdown

perfagent: subscribe inject/python to mmap-watcher new-exec events

516b84b

bench: --inject-python flag + InjectPython field on Document

c4fda44

test: integration tests for --inject-python (caps-gated)

b72050d

docs: --inject-python README section + docs/python-profiling.md

2f2492c

dpsoft requested a review from Copilot April 30, 2026 00:18

dpsoft marked this pull request as ready for review April 30, 2026 00:18

Copilot started reviewing on behalf of dpsoft April 30, 2026 00:19 View session

Copilot AI reviewed Apr 30, 2026

View reviewed changes

dpsoft requested a review from Copilot April 30, 2026 13:49

Copilot started reviewing on behalf of dpsoft April 30, 2026 13:50 View session

Copilot AI reviewed Apr 30, 2026

View reviewed changes

Comment thread inject/python/python.go

Comment thread perfagent/agent.go Outdated

Comment thread test/integration_inject_python_test.go Outdated

Comment thread test/integration_inject_python_test.go Outdated

dpsoft merged commit e657738 into main Apr 30, 2026
10 checks passed

dpsoft deleted the feat/python-perf-injector branch April 30, 2026 14:24

Provide feedback

Saved searches

Use saved searches to filter your results more quickly

feat(inject): Python perf-trampoline injector via ptrace#12

feat(inject): Python perf-trampoline injector via ptrace#12
dpsoft merged 29 commits into
mainfrom
feat/python-perf-injector

dpsoft commented Apr 29, 2026

Uh oh!

Copilot AI left a comment

Uh oh!

Uh oh!

Copilot AI Apr 30, 2026

Uh oh!

Copilot AI Apr 30, 2026

Uh oh!

Copilot AI Apr 30, 2026

Uh oh!

Copilot AI Apr 30, 2026

Uh oh!

Uh oh!

Copilot AI left a comment

Uh oh!

Uh oh!

Uh oh!

Uh oh!

Uh oh!

Uh oh!

Reviewers

Assignees

Labels

Projects

Milestone

Development

Uh oh!

2 participants

Conversation

dpsoft commented Apr 29, 2026

Summary

Architecture

Lifecycle

Lazy mmap-watcher hook

Critical correctness fixes landed during review/testing

Test plan

Unit

Integration (caps-gated, verified locally)

Cross-arch

Lint/vet

Documentation

Build / run

Out of scope (deliberate non-goals for v1)

Uh oh!

Copilot AI left a comment

Choose a reason for hiding this comment

Pull request overview

Reviewed changes

Uh oh!

Uh oh!

Copilot AI Apr 30, 2026

Choose a reason for hiding this comment

Uh oh!

Copilot AI Apr 30, 2026

Choose a reason for hiding this comment

Uh oh!

Copilot AI Apr 30, 2026

Choose a reason for hiding this comment

Uh oh!

Copilot AI Apr 30, 2026

Choose a reason for hiding this comment

Uh oh!

Uh oh!

Copilot AI left a comment

Choose a reason for hiding this comment

Pull request overview

Uh oh!

Uh oh!

Uh oh!

Uh oh!

Uh oh!

Uh oh!

Reviewers

Assignees

Labels

Projects

Milestone

Development

Uh oh!

2 participants