Skip to content

feat(inject): Python perf-trampoline injector via ptrace#12

Merged
dpsoft merged 29 commits into
mainfrom
feat/python-perf-injector
Apr 30, 2026
Merged

feat(inject): Python perf-trampoline injector via ptrace#12
dpsoft merged 29 commits into
mainfrom
feat/python-perf-injector

Conversation

@dpsoft
Copy link
Copy Markdown
Owner

@dpsoft dpsoft commented Apr 29, 2026

Summary

  • Adds perf-agent --profile --inject-python to activate CPython 3.12+'s perf trampoline (sys.activate_stack_trampoline('perf')) on a running target via ptrace, so users get Python JIT names without restarting the target with python -X perf.
  • Per-PID mode is strict (any failure exits non-zero); system-wide -a mode is lenient (failures logged + counted, profile continues).
  • Activation runs at profile start (before BPF attach), deactivation at profile end — so trampoline overhead does not persist past the profiling window.
  • Late-arriving Python processes during -a runs are picked up via a new lazy mmap-watcher hook.

Architecture

inject/
├── elfsym/        # SHARED — ELF symbol resolution + libpython SONAME parser (any runtime)
├── ptraceop/      # SHARED — ptrace primitives, three-call sequence, amd64 + arm64 frame setup
└── python/        # Python-specific: detector, manager, payload encoding

The split anticipates future runtime injectors (Node.js inspector protocol, JVM perf-map-agent) plugging in as siblings of python/ without touching the shared layer.

Lifecycle

  1. Agent.Start()python.Manager.ActivateAll(pids) walks /proc, identifies Python 3.12+ via libpython SONAME + ELF symbols, ptrace-attaches each, runs three remote calls (PyGILState_EnsurePyRun_SimpleString(payload)PyGILState_Release) inside a single ptrace session, detaches.
  2. BPF profiler attaches; sampling runs.
  3. Agent.Stop()python.Manager.DeactivateAll(ctx) runs the same dance with the deactivate payload, capped at 5s total to prevent shutdown hangs on uninterruptible-sleep targets.

Activation is idempotent — continuous profiling pattern works without cleanup. perf-map files persist between runs (we don't delete them); blazesym reads them on demand for symbol resolution.

Lazy mmap-watcher hook

For -a mode, new exec events fire python.Manager.ActivateLate(pid). The hook is registered via PIDTracker.SetOnNewExec(fn) only when --inject-python is on; with the flag off, the producer does a single nil check and skips dispatch entirely (zero overhead).

Critical correctness fixes landed during review/testing

  • runtime.LockOSThread in runSequence: on Linux, only the OS thread that issued PTRACE_ATTACH may issue subsequent ptrace ops on the tracee. Without pinning, Go's scheduler could migrate the goroutine mid-sequence and the next ptrace op would fail with EPERM/ESRCH — or worse, target a different stopped tracee.
  • Deferred PtraceSetRegs before deferred detach: on error paths after a remote call, the target's PC was 0 (the SIGSEGV sentinel); without restoring registers before detach, the target would resume from PC=0 and immediately segfault. The deferred restore guarantees clean target state on every exit path.
  • /proc/<pid>/maps truncation fix: replaced 64KB fixed read with bufio.Scanner + 4MB max — Python apps with hundreds of mappings would silently miss the [stack] line.
  • Orig_rax = -1 in the call frame: when ptrace attaches mid-syscall (e.g. nanosleep), the kernel runs syscall-return processing on PtraceCont and clobbers RAX. Setting orig_rax = -1 disables this. (Standard pyrasite/py-spy/manhole workaround.)
  • Dropped _PyPerf_Callbacks pre-flight check: distributors strip internal symbols from production libpython, so this 'is the trampoline compiled in?' filter wrongly rejected stock Fedora/Ubuntu Python builds. We now rely on the runtime return value of PyRun_SimpleString.
  • Dropped preexisting-marker check: was breaking continuous profiling — Run 2+ would see the perf-map file from Run 1 and skip activation, leaving the trampoline OFF for the rest of the run.

Test plan

Unit

  • inject/elfsym/ — SONAME parsing + ELF symbol resolution against host libc.so.6.
  • inject/ptraceop/ — register frame setup for amd64 + arm64 (build-tagged), alignment validation, return-value extraction.
  • inject/python/ — detection ladder (synthetic /proc trees), Manager strict/lenient policy, dedupe under concurrency, bounded shutdown.
  • unwind/ehmaps/ — lazy new-exec hook (TestNewExec_NoSubscribersZeroDispatch, TestNewExec_HookFires).

Integration (caps-gated, verified locally)

  • TestInjectPython_ActivatesTrampoline — Python 3.14 launched WITHOUT -X perf, --inject-python activates trampoline, profile contains py::Thread.run, py::cpu_work, etc.; deactivation fires at profile end (perf-map size stops growing).
  • TestInjectPython_StrictFailsOnNonPython — Go binary target, perf-agent exits non-zero with not_python reason.

Cross-arch

  • GOOS=linux GOARCH=arm64 go build ./inject/ptraceop/... — clean.

Lint/vet

  • gofmt -l clean across all touched packages.
  • go vet clean (with CGO env for blazesym).
  • Race detector clean: go test -race ./inject/python/... ./unwind/ehmaps/....

Documentation

  • README.md — new section "Profiling running Python processes" with worked example.
  • docs/python-profiling.md — deeper dive: how it works, when injection is skipped (table), continuous profiling, container caveats, performance impact, troubleshooting.

Build / run

make generate
CGO_CFLAGS="..." CGO_LDFLAGS="-L $BLAZESYM/target/release -Wl,-rpath,$BLAZESYM/target/release -lblazesym_c" go build .
sudo setcap cap_perfmon,cap_bpf,cap_sys_admin,cap_sys_ptrace,cap_checkpoint_restore+ep ./perf-agent

# Profile a running Python web server for 30 seconds
./perf-agent --profile --pid $(pgrep -f gunicorn) --duration 30s --inject-python

Out of scope (deliberate non-goals for v1)

  • Python < 3.12 (no equivalent activation primitive).
  • CPython without --enable-perf-trampoline (skipped at activation time).
  • PID namespace translation for containers (host-side PIDs only; documented limitation).
  • Concurrent perf-agent runs against the same target (-X perf-launched targets will get deactivated on profile end — documented tradeoff for users who want -X perf always-on).

dpsoft added 25 commits April 28, 2026 21:55
CPython 3.12+ ships an opt-in perf-compatible trampoline that emits
JIT entries to /tmp/perf-<pid>.map. Today perf-agent users must
restart targets with `python -X perf` to get Python frame names —
hostile UX for production debugging.

This spec proposes a v1 injector that uses ptrace to remotely call
sys.activate_stack_trampoline("perf") on running Python 3.12+
processes when the user passes --profile --inject-python, and
deactivates on shutdown so trampoline overhead doesn't persist
past the profiling window.

Decisions reached during brainstorming (full Q&A in spec):
- 3.12+ only; older versions fall back to existing DWARF unwinding
- Profiling-time flag (--profile --inject-python), not subcommand
- Strict per-PID, lenient system-wide
- Pure Go via golang.org/x/sys/unix
- Both amd64 and arm64 in v1
- SONAME-or-exe detection + debug/elf for symbol resolution
- Inject before BPF attach
- Three remote calls per activation: PyGILState_Ensure →
  PyRun_SimpleString(payload) → PyGILState_Release; SIGSEGV-at-0
  return sentinel; stack-with-mmap-fallback for payload bytes
- Idempotency via /tmp/perf-PID.map preexisting check
- 5s bounded shutdown deactivation
Move from flat inject/ (Python-only assumption) to a tree that splits
language-agnostic primitives from language-specific logic:

  inject/
    elfsym/   (shared — ELF symbol resolution, any compiled runtime)
    ptraceop/ (shared — low-level ptrace primitives, any C-ABI runtime)
    python/   (Python-specific: detector, manager, payload, errors)

Future runtimes (Node.js, JVM, Ruby) plug in as siblings of python/
without churn in the shared layer. Some runtimes may not use ptrace
at all — e.g., a Node.js injector via the V8 inspector protocol would
sit alongside ptraceop/ without using it. The layout supports both.

Public API change: import "inject/python"; python.NewManager(...)
ptraceop has no dependency on python — it takes primitive SymbolAddrs
to avoid an import cycle, which has the side benefit of making
ptraceop trivially reusable for non-Python runtimes that need a
brief remote-function-call sequence.

Test paths and InjectStats accessor renamed to PythonInjectStats
to reserve the namespace for future runtimes.
12 tasks from sentinel-error scaffolding through caps-gated integration
tests:

  1. errors + payload encoding (inject/python)
  2. SONAME parsing (inject/elfsym/soname.go)
  3. ELF symbol resolver (inject/elfsym/elfsym.go)
  4. detection ladder (inject/python/detector.go)
  5. ptraceop arch-generic + amd64 (inject/ptraceop)
  6. ptraceop arm64
  7. Manager + lifecycle + dedupe (inject/python/{python,manager}.go)
  8. wire into perfagent.Agent + main.go --inject-python flag
  9. mmap-watcher new-exec hook for late arrivals
  10. bench harness flag
  11. integration tests (caps-gated)
  12. docs (README + docs/python-profiling.md)

Each task is TDD-shaped with bite-sized steps. Tests exercise stubs
for ptrace operations (real ptrace only in integration tests, since
unit tests can't fake a target process safely). The plan flags one
intentional placeholder: the haveCaps helper in Task 11 must be
copied from existing integration_test.go conventions.
Hard requirement made explicit: producer must do zero work when no
subscriber is registered. Two patterns documented (callback or
Subscribe-on-demand channel). Added TestNewExec_NoSubscribersZeroDispatch
to guard the invariant against refactors.
Add --inject-python CLI flag that injects sys.activate_stack_trampoline('perf')
into CPython 3.12+ targets via ptrace before BPF attach, so early samples carry
Python frame names. Strict per-PID mode with --pid; lenient with -a. Validates
that --inject-python requires --profile and CAP_SYS_PTRACE.
… profiling)

Remove PreexistingMarkerCheck, defaultPreexistingMarkerCheck, SkippedPreexisting,
and the preexisting field on trackedTarget. Always activate and always deactivate:
the marker check conflated "trampoline active now" with "file left from prior run",
causing Run 2+ in continuous profiling to silently skip activation.

Remove ErrPreexisting (now unused). Update package doc to document the idempotent
activation model and the -X perf tradeoff.
When ptrace attaches to a target mid-syscall (e.g. nanosleep), the kernel
records the syscall number in orig_rax. On PtraceCont the kernel runs
syscall-return processing, which clobbers RAX with -ERESTART or similar,
overwriting our remote-call return value. The pyrasite/py-spy/manhole
convention is to set orig_rax = -1 ('not in a syscall') in the call
frame to disable that processing.

Symptom: PyRun_SimpleString remote call returned 230 deterministically
on a Python target attached during time.sleep(). With the fix, all three
remote calls (PyGILState_Ensure → PyRun_SimpleString → PyGILState_Release)
return their actual values cleanly.

Verified end-to-end: integration test now produces a perf-map with
py:: entries and a profile with Python qualnames (Thread.run,
cpu_work, etc.) from a Python 3.14 target launched without -X perf.
The pre-flight check on _PyPerf_Callbacks was meant as a free
'is the trampoline compiled in?' filter, but it does not survive
real-world packaging: distributors (Fedora, Ubuntu, ...) strip
internal/private symbols from production libpython builds, so the
symbol is absent from .dynsym/.symtab even when --enable-perf-trampoline
was set at compile time. Result: detector wrongly skipped Python 3.12+
on standard distro builds with ErrNoPerfTrampoline.

We now rely on the runtime return value of PyRun_SimpleString — if
the trampoline isn't available, the call returns -1 and the manager
surfaces ErrNoPerfTrampoline at activation time. Cost: one extra
ptrace round-trip per genuinely-unsupported target during scan.

Verified on Fedora's stock python3.14 (libpython3.14.so.1.0 with
all internal symbols stripped) — detector now lets it through and
activation succeeds.
requirePython312Plus now also runs sys.activate_stack_trampoline +
deactivate_stack_trampoline at the command line to confirm the build
actually supports the trampoline. Hosts where Python is 3.12+ but
the build lacks --enable-perf-trampoline (rare, mostly custom builds)
will skip cleanly instead of failing.
errcheck flagged unchecked defer f.Close() in inject/elfsym,
inject/ptraceop, inject/python/detector. Wrap with the project's
existing `defer func() { _ = f.Close() }()` idiom.

staticcheck QF1006 in tracker_newexec_test: lift the break condition
into the for header.
@dpsoft dpsoft requested a review from Copilot April 30, 2026 00:18
@dpsoft dpsoft marked this pull request as ready for review April 30, 2026 00:18
Copy link
Copy Markdown

Copilot AI left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Pull request overview

This PR adds a Python “perf trampoline” injector to perf-agent, enabling --profile --inject-python to activate/deactivate CPython 3.12+’s sys.activate_stack_trampoline('perf') on already-running processes via ptrace, with a system-wide late-process hook via the mmap watcher.

Changes:

  • Introduces a new inject/ package tree (elfsym, ptraceop, python) for detection + remote-call injection + lifecycle management.
  • Wires --inject-python through CLI → perfagent.Agentdwarfagent/ehmaps to support late execs in -a mode.
  • Adds unit tests, integration tests, bench plumbing, and documentation for Python injection.

Reviewed changes

Copilot reviewed 31 out of 31 changed files in this pull request and generated 6 comments.

Show a summary per file
File Description
unwind/ehmaps/tracker_newexec_test.go Adds tests for lazy new-exec hook dispatch behavior
unwind/ehmaps/tracker.go Adds onNewExec callback + dispatch on group-leader fork events
unwind/dwarfagent/hooks.go Extends dwarfagent Hooks with OnNewExec callback
unwind/dwarfagent/common.go Registers OnNewExec hook onto PIDTracker before Run starts
test/integration_inject_python_test.go Adds caps-gated integration tests for Python injection behavior
perfagent/options.go Adds config + option for InjectPython
perfagent/agent.go Constructs python Manager, runs activate/deactivate, wires dwarf hooks, adds PID scan
main.go Adds --inject-python flag and forwards option
inject/python/python.go Defines Manager/Options/Stats and lifecycle entry points
inject/python/manager.go Implements activation, skip/summary logic, and ESRCH detection
inject/python/manager_test.go Unit tests for strict/lenient policy, dedupe, deadline behavior, ESRCH tolerance
inject/python/payload.go Defines activate/deactivate payload bytes
inject/python/payload_test.go Tests payload encoding/termination
inject/python/errors.go Defines sentinel errors for detection/activation outcomes
inject/python/detector.go /proc-based detector + ELF symbol resolution for libpython/exe
inject/python/detector_test.go Unit tests for detector behavior with synthetic /proc trees
inject/ptraceop/ptraceop.go Implements ptrace attach → 3-call sequence → restore/detach + stack discovery
inject/ptraceop/regs_amd64.go amd64 call-frame setup / return extraction helpers
inject/ptraceop/regs_amd64_test.go Unit tests for amd64 call-frame setup helpers
inject/ptraceop/regs_arm64.go arm64 call-frame setup / return extraction helpers
inject/ptraceop/regs_arm64_test.go Unit tests for arm64 call-frame setup helpers
inject/elfsym/soname.go libpython SONAME parsing + 3.12+ gating helper
inject/elfsym/soname_test.go Unit tests for SONAME parsing + version gating
inject/elfsym/elfsym.go ELF symbol resolver (.dynsym first, .symtab fallback)
inject/elfsym/elfsym_test.go Unit tests for ELF symbol resolution against host libc
docs/superpowers/specs/2026-04-28-python-perf-injector-design.md Adds detailed injector design doc
docs/superpowers/plans/2026-04-28-python-perf-injector.md Adds implementation plan doc
docs/python-profiling.md Adds user documentation for Python injection
bench/internal/schema/schema.go Records inject_python in bench output schema
bench/cmd/scenario/main.go Adds --inject-python flag to bench harness and plumbs into scenarios
README.md Adds README section describing profiling running Python processes

💡 Add Copilot custom instructions for smarter, more guided reviews. Learn how to get started.

Comment thread inject/python/python.go Outdated
Comment thread inject/python/python.go
Comment on lines +139 to +166
// DeactivateAll runs the bounded shutdown deactivation pass. Tolerates ESRCH
// (process gone). Honors ctx cancellation AND the configured deactivation
// deadline (5s default).
func (m *Manager) DeactivateAll(ctx context.Context) {
deadline, cancel := context.WithTimeout(ctx, m.opts.DeactivateDeadline)
defer cancel()

snapshot := m.snapshotTracked()
for pid, tt := range snapshot {
select {
case <-deadline.Done():
m.log.Warn("python deactivate cancelled (deadline or ctx)",
"abandoned", len(snapshot)-int(m.stats.Deactivated.Load()))
return
default:
}
addrs := SymbolAddrsForTarget{
PyGILEnsure: tt.target.PyGILEnsureAddr,
PyGILRelease: tt.target.PyGILReleaseAddr,
PyRunString: tt.target.PyRunStringAddr,
}
if err := m.opts.Injector.RemoteDeactivate(pid, addrs); err != nil {
if isProcessGone(err) {
continue
}
m.log.Warn("python deactivate failed", "pid", pid, "err", err)
m.stats.DeactivateFailed.Add(1)
continue
Copy link

Copilot AI Apr 30, 2026

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

DeactivateAll's timeout only gets checked between iterations; a single RemoteDeactivate call can still block indefinitely (e.g., ptrace attach/wait on an uninterruptible-sleep target), so this doesn't actually cap total shutdown time despite the comment. Consider running each RemoteDeactivate in a separate goroutine and returning once the deadline fires (even if a goroutine remains stuck), or extend the LowLevelInjector API to accept a context/deadline that can be enforced at the low level.

Copilot uses AI. Check for mistakes.
Comment on lines +151 to +156
if runErr != nil {
return fmt.Errorf("PyRun_SimpleString: %w", runErr)
}
if runResult != 0 {
return fmt.Errorf("PyRun_SimpleString returned non-zero: %d (likely activation refused at runtime)", runResult)
}
Copy link

Copilot AI Apr 30, 2026

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

When PyRun_SimpleString returns non-zero, this returns a generic fmt.Errorf. The higher-level python.Manager expects to be able to classify this as ErrNoPerfTrampoline (and expose a structured skip reason / increment SkippedNoTramp), but ptraceop can't currently communicate this in a machine-readable way. Consider returning a typed error (or ptraceop-level sentinel) carrying the remote return code so the python bridge/Manager can translate it into python.ErrNoPerfTrampoline and treat it as a skip rather than a generic activation failure.

Copilot uses AI. Check for mistakes.
Comment thread inject/python/manager.go
Comment on lines +24 to +35
addrs := SymbolAddrsForTarget{
PyGILEnsure: target.PyGILEnsureAddr,
PyGILRelease: target.PyGILReleaseAddr,
PyRunString: target.PyRunStringAddr,
}
if err := m.opts.Injector.RemoteActivate(pid, addrs); err != nil {
m.stats.ActivateFailed.Add(1)
m.log.Warn("python inject failed", "pid", pid, "err", err)
if m.opts.StrictPerPID {
return fmt.Errorf("activate pid=%d: %w", pid, err)
}
return nil
Copy link

Copilot AI Apr 30, 2026

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

activateOne increments ActivateFailed for any RemoteActivate error, so failures due to missing perf-trampoline support (PyRun_SimpleString non-zero) won't be counted as SkippedNoTramp / reason=no_perf_trampoline even though ErrNoPerfTrampoline exists and recordSkipReason handles it. Once the low-level injector returns a typed/sentinel error for 'no perf trampoline', consider mapping it to python.ErrNoPerfTrampoline here and counting it as a skip (and, in lenient mode, not as a generic failure) so stats/reasons match the documented behavior.

Copilot uses AI. Check for mistakes.
Comment on lines +24 to +36
ctx, cancel := context.WithCancel(t.Context())
done := make(chan struct{})
go func() {
tracker.Run(ctx, w)
close(done)
}()

// Send a group-leader ForkEvent (pid == tid).
w.ch <- MmapEventRecord{Kind: ForkEvent, PID: 99, TID: 99}
// Give Run time to process the event.
time.Sleep(50 * time.Millisecond)
cancel()
<-done
Copy link

Copilot AI Apr 30, 2026

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

TestNewExec_NoSubscribersZeroDispatch doesn't actually prove the ForkEvent was processed: even if Run never consumes from w.ch before the cancel, the test still passes because onNewExec remains nil. To make this a real guardrail, use Run's observer hook to signal when the ForkEvent is observed/handled (e.g., close a channel from an observer when ev.Kind==ForkEvent && ev.PID==99) and wait on that instead of time.Sleep.

Copilot uses AI. Check for mistakes.
Comment thread inject/python/payload_test.go Outdated
scanForLibpython and scanForExeBase picked the first executable mapping
as load_base. That assumes the executable PT_LOAD segment is the first
one (file offset 0). Modern glibc (Ubuntu 24.04+, default for
actions/setup-python's CPython 3.12 build) uses -Wl,-z,separate-code,
which produces an ELF with a leading r-- LOAD segment (header + RELRO)
at offset 0 and the r-x .text segment at a non-zero offset. Old logic
returned start_of_text as load_base, off by the size of the first
segment. Every absolute symbol address derived from it was wrong by
that delta, so the remote call's RIP jumped into garbage; the target
SEGVed mid-execution and we read its pre-SEGV RAX — which still held
the kernel's -ERESTARTNOHAND (-514) from when ptrace attached during
clock_nanosleep.

Symptom: TestInjectPython_ActivatesTrampoline failed deterministically
on GHA runners (Ubuntu 24.04, Python 3.12.13) with
"PyRun_SimpleString returned non-zero: 18446744073709551102" while
passing on Fedora hosts (Python 3.14, executable-first layout).

Fix: filter mappings by file offset == 0 to anchor on the first
PT_LOAD segment regardless of perms. Both static and dynamic detection
paths get the same fix.

Also:
- ptraceop: validate post-call RIP == sentinel(0); if SIGSEGV fired at
  a non-zero address, the function crashed mid-execution — surface that
  with the actual fnAddr instead of returning whatever was in RAX.
- python/manager: include libpython, load_base, py_run_string in the
  "python inject failed" warn — makes future regressions debuggable
  from a single CI log line.
- detector_test: TestDetect_DynamicLinkedPython_SeparateCodeLayout
  covers the leading-r-- case directly.
Previous fix (filter mappings by file offset==0) gave the right
mapping_start, but the detector still treated mapping_start as the
load_bias. That works for PIE binaries with first-LOAD p_vaddr==0
(PIE convention) — and exclusively that case happens on Fedora local
hosts, hence the false-pass locally. It is wrong for non-PIE ET_EXEC
binaries: Ubuntu /usr/bin/python3.12 is a non-PIE executable with
first-LOAD p_vaddr=0x400000 and absolute symbol values; adding
mapping_start (0x400000) to an already-absolute sym.Value pushes RIP
past the binary's last mapping → SIGSEGV.

The right formula:

    load_bias = mapping_start - p_vaddr_of_first_LOAD
    abs_addr  = load_bias + sym.Value

For PIE p_vaddr==0 it collapses to mapping_start (current behavior,
unchanged). For non-PIE p_vaddr=0x400000 it correctly resolves to 0.

CI evidence: previous build's diagnostic showed
loadbase=0x400000, fnAddr=0xab27d0, SIGSEGV at fnAddr — proving fnAddr
was off the binary's executable text. With load_bias=0 we get
fnAddr=sym.Value, which lands inside .text.

Same ehmaps/LoadProcessMappings already does this for unwind tables;
inject/python now matches the convention.
…as SkippedNoTramp

ptraceop.RemoteActivate / RemoteDeactivate previously returned a generic
fmt.Errorf when the called function (PyRun_SimpleString) completed the
SIGSEGV-sentinel handshake but returned non-zero. The python.Manager had
no machine-readable way to distinguish that "Python-level error" case
from any other ptrace failure, so every such case incremented
ActivateFailed even though commit 384b01a's design intent was to
surface ErrNoPerfTrampoline at activation time after dropping the
_PyPerf_Callbacks pre-flight.

Wiring delivered:
  ptraceop:  return *ErrRemoteCallNonZero{Op, Result} on non-zero return.
             ptraceop stays language-agnostic — it does not import
             inject/python and does not interpret the int.
  perfagent: ptraceopBridge translates *ErrRemoteCallNonZero with Result
             that sign-extends to int32(-1) into a wrap of
             python.ErrNoPerfTrampoline. The bridge is the only piece
             that knows both layers, which keeps the existing import
             boundary intact.
  python:    activateOne checks errors.Is(err, ErrNoPerfTrampoline) and
             classifies via recordSkipReason → SkippedNoTramp +
             reason=no_perf_trampoline, mirroring detect-time skips.
             Strict mode wraps with the sentinel so callers can errors.Is.

Also clarify NewManager's doc — Detector and Injector are not defaulted
inside this package (which deliberately doesn't import ptraceop); they
are caller-injected by perfagent.Agent. Calling activate/deactivate
with either nil panics at first use, intentionally — a silent default
would mask wiring mistakes.

Rename TestPayloadsAreImmutable → TestActivatePayloadIsConsistent and
clarify what it actually asserts (consistency between consecutive calls,
not in-place mutation safety; the slice is intentionally shared).

Tests:
  ptraceop:  errors.As round-trip + Error() formatting for the typed err.
  python:    SkippedNoTramp counted (not ActivateFailed) on lenient path;
             strict path returns wrap that errors.Is the sentinel.
Copy link
Copy Markdown

Copilot AI left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Pull request overview

Copilot reviewed 32 out of 32 changed files in this pull request and generated 4 comments.

Comments suppressed due to low confidence (1)

bench/cmd/scenario/main.go:163

  • The new --inject-python bench flag is plumbed through function parameters (e.g., measureOnePID(..., injectPython bool)) but never used to alter profiler construction or hooks. As-is, the flag only changes the emitted JSON field and has no behavioral effect. Either wire it into the benchmarked code path (e.g., by constructing a perfagent.Agent with WithInjectPython / appropriate dwarf hooks), or remove the flag/parameters until it can be supported to avoid misleading results.
// measureOnePID times one NewProfilerWithMode(pid=...) call and the
// per-binary breakdown collected via OnCompile.
func measureOnePID(pid, runN int, unwind string, injectPython bool) schema.Run {
	var (
		mu      sync.Mutex
		entries []schema.Binary
	)
	hooks := &dwarfagent.Hooks{
		OnCompile: func(path, buildID string, ehFrameBytes int, dur time.Duration) {
			mu.Lock()
			defer mu.Unlock()
			entries = append(entries, schema.Binary{
				Path:         path,
				BuildID:      buildID,
				EhFrameBytes: ehFrameBytes,
				CompileMs:    float64(dur.Microseconds()) / 1000.0,
			})
		},
	}
	t0 := time.Now()
	prof, err := dwarfagent.NewProfilerWithMode(pid, false, []uint{0}, nil, 99, hooks, modeFromFlag(unwind))
	totalMs := float64(time.Since(t0).Microseconds()) / 1000.0

💡 Add Copilot custom instructions for smarter, more guided reviews. Learn how to get started.

Comment thread inject/python/python.go
Comment thread perfagent/agent.go Outdated
Comment thread test/integration_inject_python_test.go Outdated
Comment thread test/integration_inject_python_test.go Outdated
…sserts

Five separate review items folded into one commit (each independent
within its file):

* perfagent/agent.go: hasCapSysPtrace
  - Nil-check cap.GetProc() before dereferencing (mirrors the same
    guard in test/integration_test.go).
  - Check Permitted ∥ Effective. validate() runs before Start()
    promotes Permitted → Effective via SetFlag, so Effective-only
    rejected legitimate runs where the cap was granted via setcap or
    inherited but not yet promoted.

* inject/python/python.go: NewManager
  - Default opts.Detector = NewDetector("/proc", logger) when nil
    (zero-cost, both live in this package — no new import).
  - Panic at construction with a clear message if opts.Injector is
    nil instead of letting the deeper NPE fire later in activateOne.
    inject/python deliberately does not import inject/ptraceop so
    Injector cannot be defaulted; surfacing the wiring mistake at
    NewManager-time is friendlier than at first activate.

* bench/cmd/scenario, bench/internal/schema: drop --inject-python
  - The flag was plumbed through measureOnePID/measureSystemWide
    parameters and emitted into the JSON, but no codepath in the bench
    actually invokes Python injection — measure* constructs
    dwarfagent.NewProfilerWithMode directly instead of going through
    perfagent.Agent (which owns the python.Manager wiring). Reporting
    "with injection" results identical to "without" is misleading;
    drop the flag, parameter, and schema field. Re-add when the bench
    is reworked around perfagent.Agent. Comment in main.go records
    the intent.

* test/integration_inject_python_test.go: TestInjectPython_*
  - TestInjectPython_ActivatesTrampoline: handle os.ReadFile error with
    t.Fatalf so a transient read failure produces a clear message
    instead of silently asserting against an empty buffer.
  - TestInjectPython_StrictFailsOnNonPython: drop the bare "python"
    substring fallback that would mask unrelated failures (missing
    CAP_SYS_PTRACE, validate-time flag rejection, etc.) and assert
    only on the structured "not python" reason tokens. Match
    case-insensitive because slog/fmt.Errorf chains differ in casing.
@dpsoft dpsoft merged commit e657738 into main Apr 30, 2026
10 checks passed
@dpsoft dpsoft deleted the feat/python-perf-injector branch April 30, 2026 14:24
dpsoft added a commit that referenced this pull request May 3, 2026
The committed architecture.excalidraw was the original sketch from the
FP-only era — it still showed only profile/, offcpu/, cpu/, perf.bpf.c,
offcpu.bpf.c, and cpu.bpf.c. Since then the codebase has grown:

- unwind/dwarfagent/ (DWARF hybrid CPU + off-CPU walker, PRs #1#11)
- unwind/{ehcompile,ehmaps,procmap} (CFI compile, per-PID lifecycle,
  /proc resolver — PRs #3#9)
- internal/perfevent (per-CPU perf_event_open + AttachRawLink shared
  helper extracted from profile/dwarfagent — PR #13)
- inject/{python,ptraceop,elfsym} (Python perf-trampoline activator
  via ptrace — PR #12)
- internal/{nspid,k8slabels} (namespace-aware --pid + k8s pprof labels
  — PR #14)
- bpf/perf_dwarf.bpf.c, bpf/offcpu_dwarf.bpf.c (DWARF kernel-side)

The new file groups them into pre-profile setup (optional, dashed —
nspid, k8slabels, inject), profilers (FP, DWARF, PMU), helpers
(perfevent, ehcompile, ehmaps, procmap, pprof), and the four BPF
programs in the kernel band, with arrows showing data + control flow.
Output lands in *-on-cpu.pb.gz / *-off-cpu.pb.gz / PMU stdout-or-file.

Layout was generated programmatically; open the file in
https://excalidraw.com or the VS Code extension to fine-tune
positions and colours.

The README's ASCII version of the same diagram is unchanged — both
exist intentionally so readers can grok the architecture without
opening Excalidraw, and the .excalidraw is the editable source for
future iterations.
dpsoft added a commit that referenced this pull request May 3, 2026
The committed architecture.excalidraw was the original sketch from the
FP-only era — it still showed only profile/, offcpu/, cpu/, perf.bpf.c,
offcpu.bpf.c, and cpu.bpf.c. Since then the codebase has grown:

- unwind/dwarfagent/ (DWARF hybrid CPU + off-CPU walker, PRs #1#11)
- unwind/{ehcompile,ehmaps,procmap} (CFI compile, per-PID lifecycle,
  /proc resolver — PRs #3#9)
- internal/perfevent (per-CPU perf_event_open + AttachRawLink shared
  helper extracted from profile/dwarfagent — PR #13)
- inject/{python,ptraceop,elfsym} (Python perf-trampoline activator
  via ptrace — PR #12)
- internal/{nspid,k8slabels} (namespace-aware --pid + k8s pprof labels
  — PR #14)
- bpf/perf_dwarf.bpf.c, bpf/offcpu_dwarf.bpf.c (DWARF kernel-side)

The new file groups them into pre-profile setup (optional, dashed —
nspid, k8slabels, inject), profilers (FP, DWARF, PMU), helpers
(perfevent, ehcompile, ehmaps, procmap, pprof), and the four BPF
programs in the kernel band, with arrows showing data + control flow.
Output lands in *-on-cpu.pb.gz / *-off-cpu.pb.gz / PMU stdout-or-file.

Layout was generated programmatically; open the file in
https://excalidraw.com or the VS Code extension to fine-tune
positions and colours.

The README's ASCII version of the same diagram is unchanged — both
exist intentionally so readers can grok the architecture without
opening Excalidraw, and the .excalidraw is the editable source for
future iterations.
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Labels

None yet

Projects

None yet

Development

Successfully merging this pull request may close these issues.

2 participants