Preserve file, line, module, and inline frames in pprof output#8
Merged
Conversation
- pprof.ProfileSample.Stack becomes []Frame. Locations are keyed by (Name, File, Module, Line); functions by (Name, File, Module). Same function at different source lines now produces distinct locations instead of silently collapsing to the first-seen line. - Add perf-map decoder for Python (py::name:/path.py) and Node (LazyCompile:/Function:/JS: prefixes, including modern Node v16+ JS:<tier><name> format). Populates Function.Filename from the perf-map path rather than leaving it packed into Name. - Profile and offcpu collectors use SymbolizerWithInlinedFns(true) and expand blazesym.Sym.Inlined into leaf-first Frames via blazeSymToFrames. Recovers user code that LLVM inlined into a visible outer frame — verified on Rust release builds (cpu_intensive_work now appears). - Add regression tests for line-dedup and module-dedup, and decoder tests for Python / Node perf-map formats. - docs/dwarf-unwinding-design.md captures follow-up scope for the frame-pointer limitation that remains beyond this change.
There was a problem hiding this comment.
Pull request overview
This PR refactors the internal profiling pipeline to preserve richer stack-frame metadata (file/module/line and inline frames) in generated pprof output, improving accuracy for symbolization and attribution across native + perf-map based runtimes.
Changes:
- Replace
ProfileSample.Stackfrom[]stringto[]Frameand key pprof Locations/Functions by(Name, File, Module, Line)/(Name, File, Module)respectively. - Enable blazesym inline-frame expansion (
SymbolizerWithInlinedFns(true)) and expandblazesym.Sym.Inlinedinto multipleFrames. - Add perf-map frame decoding for Python and Node formats, plus regression/tests for line/module dedup and decoder behavior; add a DWARF unwinding design doc for follow-up work.
Reviewed changes
Copilot reviewed 5 out of 5 changed files in this pull request and generated 4 comments.
Show a summary per file
| File | Description |
|---|---|
profile/profiler.go |
Switch CPU collector stack representation to []pprof.Frame, enable inline symbolization, and expand inline chains into frames. |
offcpu/profiler.go |
Same as CPU collector: adopt []pprof.Frame, enable inline symbolization, expand inline chains. |
pprof/pprof.go |
Introduce Frame, update sample/building logic to preserve file/module/line and decode perf-map frames; update dedup keys. |
pprof/pprof_test.go |
Update existing tests for Frame stacks and add decoder + regression tests for dedup/line/module behavior. |
docs/dwarf-unwinding-design.md |
Document deferred design/options for future DWARF-based unwinding improvements beyond frame-pointer walking. |
💡 Add Copilot custom instructions for smarter, more guided reviews. Learn how to get started.
Comment on lines
+190
to
+192
| func (f Frame) functionKey() frameKey { | ||
| return frameKey{Name: f.Name, File: f.File, Module: f.Module} | ||
| } |
Comment on lines
+53
to
+57
| func blazeSymToFrames(s blazesym.Sym) []pprof.Frame { | ||
| out := make([]pprof.Frame, 0, 1+len(s.Inlined)) | ||
| for i := len(s.Inlined) - 1; i >= 0; i-- { | ||
| in := s.Inlined[i] | ||
| f := pprof.Frame{Name: in.Name, Module: s.Module} |
Comment on lines
+35
to
+53
| // blazeSymToFrames mirrors the converter in profile/profiler.go: expands | ||
| // inlined call chains into separate leaf-first Frames. | ||
| func blazeSymToFrames(s blazesym.Sym) []pprof.Frame { | ||
| out := make([]pprof.Frame, 0, 1+len(s.Inlined)) | ||
| for i := len(s.Inlined) - 1; i >= 0; i-- { | ||
| in := s.Inlined[i] | ||
| f := pprof.Frame{Name: in.Name, Module: s.Module} | ||
| if in.CodeInfo != nil { | ||
| f.File = in.CodeInfo.File | ||
| f.Line = in.CodeInfo.Line | ||
| } | ||
| out = append(out, f) | ||
| } | ||
| outer := pprof.Frame{Name: s.Name, Module: s.Module} | ||
| if s.CodeInfo != nil { | ||
| outer.File = s.CodeInfo.File | ||
| outer.Line = s.CodeInfo.Line | ||
| } | ||
| out = append(out, outer) |
golangci-lint's unused linter flagged the standalone functionKey struct as dead code. The method Frame.functionKey() returns frameKey (the same shape as locationKey for the relevant subset), so the parallel struct type was never referenced. Drop the type, keep the method.
dpsoft
added a commit
that referenced
this pull request
Apr 25, 2026
Conflict in pprof/pprof.go between PR #8's pre-S5 method-based dedup (f.locationKey() / f.functionKey() returning frameKey) and S9's struct-based intern keys + sentinels + looksJIT helper. Resolution: keep S9. The methods are dead code in S9's world — addLocation now constructs locationKey/locationFallbackKey/functionKey structs directly, never calls Frame.locationKey() or Frame.functionKey(). go build, pprof unit tests, and procmap unit tests all pass after the merge.
dpsoft
added a commit
that referenced
this pull request
May 21, 2026
#25) * fix(kernel-stacks): lockdown-safe kernel symbolization + perf.data userspace MMAP2 Three production gaps shipped in v1.2.0 M1 kernel-stacks: 1. Kernel symbolization fails under lockdown=integrity (Secure Boot). blazesym's kernel source unconditionally probes /proc/kcore for DWARF, which is 0400 root:root + CAP_SYS_RAWIO-gated. perf-agent binaries setcap'd with the standard cap_perfmon/cap_bpf set lack that, so blazesym returns BLAZE_ERR_PERMISSION_DENIED for every batch and kernel frames disappear from the pprof. Setting vmlinux="" doesn't help — blazesym still hits kcore. Fix: pure-Go /proc/kallsyms symbolizer (symbolize/kallsyms.go) as the lockdown-safe fallback. On EPERM from blazesym, LocalKernelSymbolizer switches to the kallsyms path for the rest of its lifetime (sticky). Resolution loses inline expansion and file:line attribution but keeps function name + offset + module marker, which is the load-bearing property for flame graphs. PERFAGENT_FORCE_KERNEL_FALLBACK=1 lets CI exercise the fallback on hosts that don't naturally hit EPERM. 2. When every symbolizer fails, kernel frames are dropped entirely instead of being preserved as raw addresses. Caller's pprof loses kernel context. Fix: rawKernelAddrFrames synthesizes Frame{Name:"0x<hex>", Module:"[kernel.kallsyms]", Reason:FailureMissingSymbols} so operators can still decode hot kernel frames via /proc/kallsyms (the existing awk workaround the spec calls out), and the pprof keeps structural shape. 3. perf.data files written by --perf-data-output contain the [kernel.kallsyms]_text MMAP2 but no userspace MMAP2 records. `perf script` / `perf report` show [unknown] for every user-space IP, forcing operators to post-process with the maps.txt + nm hack. Fix: perfdata.Writer.AddUserspaceMmaps walks /proc/<pid>/maps via the existing procmap.Resolver and emits one PERF_RECORD_MMAP2 per executable mapping for the target PID. Wired into agent.go right after AddKernelMmap. System-wide (-a) is out of scope — left as a documented follow-up. Tests: stub-based unit tests for the fallback ladder and raw-address preservation (no kallsyms-readable host required), plus four integration tests including a forced-fallback variant that exercises the kallsyms path on every host. * docs(roadmap): post-prod-hardening improvement track Captures 10 follow-ups surfaced while fixing the v1.2.0 kernel-stacks production gaps: - self-profile lane (would have caught the lockdown bug at PR time) - metrics.Exporter histograms for symbolize hot paths - /metrics + /debug/pprof HTTP endpoint - symbolizer error counter by reason (lockdown_eperm, kallsyms_unreadable, ...) - CI lane that runs kernel-stacks integration with PERFAGENT_FORCE_KERNEL_FALLBACK=1 - go test -bench budgets for symbolize - allocs/sample budget - system-wide userspace MMAP2 (the -a follow-up to this PR) - lazy MMAP2 on first new-PID sample - kthread MMAP / COMM attribution audit Prioritized so the first three would prevent the next ship-and-pray. * feat(kernel-stacks): ship roadmap items #2, #5, #8, #10 in same PR Four follow-ups from docs/post-prod-hardening.md landed together with the bug fixes — small enough to ship in one commit, large enough to matter: #5 CI lockdown lane: new `make test-integration-lockdown` target runs the kernel-stacks integration tests with PERFAGENT_FORCE_KERNEL_FALLBACK=1, wired into the default `make test`. Every CI run now exercises the kallsyms fallback path regardless of the host's lockdown state — would have caught the original v1.2.0 ship-and-pray. #2 metrics counters: symbolize.Counters (atomic) tracks KernelBatches, KernelInputIPs, KernelBatchFailures, KernelFallbackEngaged, KernelRawAddrFrames. Logged at agent shutdown via agent.cleanup. Histograms (p50/p99 batch latency) left as follow-up (requires reservoir sampling). The fallback_engaged counter bumps both on natural EPERM-triggered switch AND on env-var forced fallback — operators see the active mode from the end-of-run log without scraping. #8 system-wide userspace MMAP2: emitCommAndMmapsForAllPIDs walks /proc at writer init and emits PERF_RECORD_MMAP2 for every non-kthread PID's executable mappings (extends Bug 3 single-PID fix to -a mode). TestPerfDataUserspaceMmap2_SystemWide verifies the walk covers multiple distinct PIDs (saw 9068 on this host). Snapshot semantics; lazy emission is roadmap #9. #10 COMM attribution audit: found that perfdata.Writer.AddComm was never called from anywhere in perf-agent — every perf.data emitted had zero COMM records. Not just kthreads — all PIDs showed up nameless in `perf script`. emitCommForPID reads /proc/<pid>/comm and emits PERF_RECORD_COMM for every PID enumerated, including kthreads (kvm-pit, vhost-*, kworker/*). TestPerfDataUserspaceMmap2 asserts the COMM record appears alongside the MMAP2 record for the workload pid. Modernized three for-loops to Go 1.22 `for i := range N` style in the modified files (per Go 1.26 project target). Roadmap doc updated to mark shipped items and reorder remaining priorities — #1 (self-profile lane) is now the highest-leverage follow-up. * feat(bench): self-profile scenario (roadmap #1) — profile the profiler A second perf-agent profiles the first while it captures a CPU workload. Catches: - perf-agent's own CPU overhead (agent_samples / workload_samples at the same sample rate — directly comparable) - kernel-symbol resolution rate (fraction of kernel Locations whose Function.Name isn't a "0x<hex>" raw-address fallback) The kernel-resolution canary is what would have surfaced the v1.2.0 lockdown bug at PR time: a sudden drop from ~100% to 0% on lockdown hosts would have been impossible to miss in CI. Wiring: - new "self" case in bench/cmd/scenario, with --self-duration, --cpu-budget, --resolution-budget flags - schema.SelfMetrics inlined into Run (omitzero, no effect on other scenarios' JSON) - countKernelResolution helper + unit tests for the metric logic - non-zero exit (3) when budgets breached so CI can gate on it - make bench-self target with 1.5 CPU / 0.5 resolution defaults (calibrated to catch regressions, not enforce a tight target on this codebase as-of-today) Verified end-to-end: 2 runs on this host showed agent/workload CPU ratios of 0.38 and 0.85 with 100% kernel resolution — both within budget, scenario exits 0. Tight budgets (--cpu-budget 0.01) exit 3. Skips the in-process caps check for "self" because the scenario is pure orchestration; the perf-agent subprocesses carry their own file caps. bench-self Makefile target deliberately depends only on bench-build (not build) so it doesn't wipe the manually-applied setcap on perf-agent each invocation. Roadmap doc updated to mark #1 shipped; remaining items reordered. * fix(symbolize): userspace fallback chain — discovered via bench-self Two related bugs surfaced by dogfooding the self-profile scenario (roadmap #1) against this PR's own perf-agent binary: A. DebuginfodSymbolizer doesn't fall back to local on NULL. When DEBUGINFOD_URLS points at a server that doesn't have the target binary's build-id (the normal case for locally-built perf-agent), blaze_symbolize_process_abs_addrs returns NULL and the dispatcher gave up — every userspace frame became <unknown>. Fix: keep a LocalSymbolizer next to the cgo state; on NULL, retry through it before erroring. B. Userspace symbolize EPERM when target has file caps. Profiling a setcap'd process (e.g., a second perf-agent) hits "permission denied" from blazesym because /proc/<pid>/exe is restricted by ptrace_scope when the target is privileged. The error propagated as "Failed to symbolize user" and the whole batch was dropped. Fix: on any blazesym error, LocalSymbolizer.SymbolizeProcess now returns rawUserAddrFrames (Name="0x<hex>", Reason=FailureMissingSymbols) — symmetric to the kernel-side rawKernelAddrFrames fix in this PR's first commit. Stack shape and addresses survive into the pprof so operators can recover with addr2line. Full per-mapping ELF resolution (the proper fix that bypasses /proc/<pid>/exe entirely by using procmap-supplied paths) is left as a roadmap follow-up — see docs/post-prod-hardening.md "Findings from self-profile" section. Verified end-to-end: profiling perf-agent with perf-agent on a setcap'd binary went from 100% <unknown> (whole batch dropped) to named raw-hex frames + structural mapping info; addr2line on those hex names produced the hot-function ranking that drives the new roadmap section (cilium/ebpf BTF parse, net.* DNS for debuginfod, SQLite cache init, etc.). No "permission denied" log spam. * perf(symbolize, bpfstack): close the dogfood loop — three iterations Running the self scenario four times in sequence, each iteration exposed the next bottleneck once the previous one was fixed: iter 2 → fix per-IP empty-name rendering frameFromCSym and fromBlazesymSym left Name="" when blazesym resolved the binary but couldn't map a specific IP to a symbol. pprof rendered those as <unknown>. Filling with "0x<hex>" + FailureMissingSymbols matches the kernel-side behavior shipped earlier in this PR. <unknown> went from 76% to ~0%. iter 3 → fix BPF user-stack kernel leak + speed up /proc/kallsyms bpf_get_stackid with BPF_F_USER_STACK can include kernel addresses in the user-stack buffer when the sampled task was in syscall/irq/fault context. That leak made 40% of "user" CPU look like 0xffff...kernel-range hex names in the pprof, with the user code below them invisible. bpfstack.SplitUserKernelIPs partitions ips by the canonical x86_64 kernel threshold and routes the strays to the kernel symbolizer (where they belong). Once the leak was fixed, the new top of the profile was /proc/kallsyms reading (module_get_kallsym, seq_read_iter, vsnprintf — all kernel-side). The kallsyms file is synthesized by the kernel: each small read() forces vsnprintf for every line. The default 4 KiB scanner buffer meant ~700 read syscalls for 3M kallsyms lines. Wrapping the file in a 256 KiB bufio Reader cut that ~64x and dropped kallsyms entirely out of the top 20. iter 4 → no fix; steady-state hot paths surface do_syscall_64 (42% cum), finish_task_switch (21%), ep_poll (10%), ringbuf.readRecord, dwarfagent.aggregateCPUSample, Syscall6 / nanotime1 / futex. All intrinsic costs of being an eBPF-based sampler. Further wins are diminishing returns. Roadmap doc captures the iteration log so future contributors see the dogfood-as-process pattern: ship #1, run it against itself, let it tell you what to fix next. Tests: new internal/bpfstack/split_test.go covers SplitUserKernelIPs edge cases (kernel-at-top leak, no-leak passthrough, all-kernel, boundary value). All existing unit suites still pass. * perf(symbolize): allocation-free /proc/kallsyms parser bench-self iter 5 (A=with --kernel-stacks, B=without) surfaced strings.Fields + strconv.ParseUint as the top user-side allocation source: 3M kallsyms lines × multi-string Fields × ParseUint triggered visible GC pressure (mallocgc / sweepone / tryDeferToSpanScan in the user-side pprof's top 10). Replaced with parseKallsymsLine: byte-level scan over the read buffer, no string allocation per line. Names are copied out once per kept symbol (mandatory — buffer reuses). Module strings go through an intern map (typical kernel has tens of modules but millions of module symbols). BenchmarkParseKallsymsLine-16 72630638 16.54 ns/op 0 B/op 0 allocs/op iter 6 confirmed: user-side allocation hotspots are gone. The remaining kallsyms cost is kernel-side (module_get_kallsym + vsnprintf rendering the synthesized /proc/kallsyms file) and can only be avoided by NOT parsing on every invocation. Roadmap doc adds a "remaining bottleneck" section proposing a disk-cached index keyed by kernel boot_id as the next follow-up. Also lifts strings.Fields-based parsing out of the kallsyms hot path entirely — net deletion of 2 imports (strconv, strings) from kallsyms.go. * perf(symbolize): disk-cached /proc/kallsyms index keyed by boot_id bench-self iter 6 identified the kernel synthesizing /proc/kallsyms (module_get_kallsym + vsnprintf in the profile) as the floor — unfixable from userspace because the file is generated on read. The fix is to avoid re-reading on every agent invocation. Cache key: kernel boot_id from /proc/sys/kernel/random/boot_id. Kernel symbol addresses change only across reboots (KASLR), so boot_id is exactly the right signal — drops the cache when it would be wrong, never falsely retains a stale copy. Format (little-endian): header: magic u32 | version u32 | boot_id [16]byte | n_syms u32 per-symbol: addr u64 | name_len u16 | mod_len u16 | name | module ~50 bytes/symbol, ~11 MiB on disk for this host's filtered kallsyms (~225k T/t/W/w/i entries). atomic write via temp + rename, so concurrent readers never see a partial file. Behavioral flow in newKallsymsSymbolizer: 1. loadCachedKallsyms → return on hit (boot_id matches) 2. parseKallsymsFresh on stale / corrupt / missing 3. best-effort writeKallsymsCache for next invocation iter 7 dogfood (cold vs warm cache run, same workload + flags): metric cold warm delta module_get_kallsym 16% 11% -45% vsnprintf (cum) 13% 5% -67% seq_read_iter (cum)40% n/a out of top 12 total CPU 676ms 565ms -16% Tests cover roundtrip, stale boot_id rejection, missing file, and corrupt-file non-fatal handling. cachePathFn and readBootIDFn are package vars so tests can pin both deterministically. Next opportunity (next commit): the remaining ~11% module_get_kallsym in the warm run is blazesym's FAILED kernel-source attempt reading kallsyms before hitting EPERM on /proc/kcore. A persistent "blazesym EPERM'd on this boot" marker would let perf-agent skip that attempt entirely on subsequent invocations. * perf(symbolize): persistent blazesym-EPERM marker — skip failing attempt Iter 7 dogfood revealed that even with a warm kallsyms disk cache, the FIRST blazesym call per agent invocation still cost ~16% of perf-agent CPU: blazesym opens /proc/kallsyms (the read shows up as module_get_kallsym + vsnprintf in the profile) before discovering /proc/kcore is restricted and returning EPERM. The sticky in-process bit prevents subsequent batches from re-paying, but every new agent process re-attempts blazesym from scratch. Fix: a boot_id-scoped marker file at ~/.cache/perf-agent/blazesym-eperm-<hex_boot_id>. Written on the first observed EPERM. Checked at LocalKernelSymbolizer construction: if present, fallback is pre-engaged and blazesym is never invoked. Marker is automatically invalidated by reboot (boot_id changes), so a kernel upgrade that loosens lockdown is re-detected on next boot. iter 8 dogfood (cold → warm cache + warm marker, same workload): metric cold warm delta module_get_kallsym 16% ZERO out of top 12 vsnprintf (cum) 14% ZERO out of top 12 do_syscall_64 (cum) 52% 0% down (no failing blazesym calls) total perf-agent CPU 505ms 131ms -74% Warm pprof now shows intrinsic eBPF-sampler costs only: finish_task_switch (21%), ep_poll waiting (69% cum), spin_unlock. Diminishing returns from here — every remaining sample is genuine "be a profiler" work. Tests pin XDG_CACHE_HOME via t.Setenv + override readBootIDFn so the marker lands in a t.TempDir(). Covers: roundtrip, boot_id-scoped invisibility (post-reboot equivalence), and filename structurally encodes the boot_id. * perf(perfdata): lazy MMAP2/COMM on first sample per PID (roadmap #9) bench-self iter 9 found system-wide capture burning ~30% of perf-agent CPU in kernel /proc/<pid>/maps rendering (show_map_vma, mangle_path, lock_next_vma, prepend_path, d_path) because emitCommAndMmapsForAllPIDs walked every PID at writer init — ~9000 reads on this host, of which only ~200 produce samples in a typical capture. Fix: perfdata.Writer.OnNewPID callback fires the first time AddSample sees each unique pid (sentinel-filtered: skips 0 and 0xffffffff). agent.go wires it to emitCommForPID + emitUserspaceMmapsForPID for system-wide mode; the eager walk is gone. Single-PID mode is unchanged (always knows its target). iter 11 dogfood (system-wide, --perf-data-output ON): perf.data size: ~3 MB (vs ~40-50 MB the eager walk produced) enrolled PIDs: ~208 lazy (vs ~9000 eager) remaining /proc/maps cost: ~15% cum, bounded by sampled-PID count rather than host-PID count — natural floor from procmap.Resolver's per-batch re-snapshot (spec invariant) and dwarfagent's lazy attach. Bonus: lazy mode also covers PIDs that exec AFTER capture starts. The eager walk could never see those. Tests pin the contract: fires once per unique pid (multiple samples for same pid don't re-trigger); sentinel pids are filtered; nil callback is a no-op (existing --pid-mode callers keep working unchanged). * feat(symbolize): reason-bucketed error counters + bench coverage (roadmap #4, #6) #4 reason-bucketed counters: - KernelLockdownEPERM bumps on each BLAZE_ERR_PERMISSION_DENIED - KernelOtherErr bumps on every other blazesym failure - CountersSnapshot.String() now logs eperm=N and other_err=N Operators can distinguish "canonical lockdown" (high eperm, fallback_engaged=1) from "unexpected blazesym failure" (any other_err > 0 deserves attention) without grepping logs. #6 symbolize benchmarks: - BenchmarkParseKallsymsFresh (cold-path baseline) - BenchmarkLoadCachedKallsyms (warm-path; ~20x faster) - BenchmarkResolveKernelIPs (per-IP cost ~35 ns) - BenchmarkParseKallsymsLine kept for per-line regression - new `make bench-symbolize` target The 20x cache speedup is the headline metric. PRs touching the symbolize hot path can now show numerical evidence rather than vibes. Roadmap doc reflects shipped status; both items move from Pending to Shipped in the triage table. * feat(perfagent): --metrics-listen HTTP endpoint (roadmap #3) --metrics-listen <addr> starts an in-process HTTP server on the given address (e.g. 127.0.0.1:7777) exposing: /metrics Prometheus text format of symbolize.Counters (batches, input_ips, batch_failures, fallback_engaged, raw_addr_frames, lockdown_eperm, other_err) /debug/pprof/ Go runtime self-pprof — live scrape with `go tool pprof http://host:7777/debug/pprof/profile` Default off — no port opened, zero CPU overhead when unused. Lifecycle: server starts after kernelSymbolizer is constructed (so the metrics handler closure can snapshot live counters); shutdown happens in Agent.cleanup AFTER the symbolizer is closed, with a 2s graceful drain. Tests cover: - Prometheus exposition format (HELP + TYPE + value lines per metric, all 7 counters present) - Lifecycle round-trip against a :0 listener (live scrape, Stop returns within grace, post-Stop scrape fails) - Empty-addr no-op (--metrics-listen not set → nil server) - /debug/pprof/ mount works through the metrics mux End-to-end verified: agent run with --metrics-listen serves the canonical Prometheus payload AND a working gzipped pprof profile (1f 8b 08 magic). * feat(symbolize): batch-duration histograms + allocs budget (roadmap #2, #7) #2 batch-duration histograms: - LatencyHist: ring buffer (1024 samples) + min/max/sum, with p50/p99 on Snapshot via slices.Sort. - Wired into SymbolizeKernel: time.Now() on entry, Record on deferred exit. Microseconds — readable across warm-cache (sub-ms) and cold-CGO (millisecond+) paths. - End-of-run log line gains batch_p50_us / batch_p99_us / batch_max_us fields. - /metrics endpoint exposes three new gauges: perf_agent_symbolize_kernel_batch_{p50,p99,max}_microseconds Sliding-window semantics: an early burst of slow batches doesn't permanently bias the percentile — only the most recent 1024 stay in the ring. Tested with the TestLatencyHist_RingOverwrite case. #7 allocs budget: - testing.AllocsPerRun gates on the three hottest paths: parseKallsymsLine (0 alloc/op), kallsymsSymbolizer.Resolve (1 alloc/op = the return slice), LatencyHist.Record (0 alloc/op). - PRs that reintroduce strings.Fields, strconv.ParseUint, or other allocators into these paths will fail the test immediately — not just look worse in benchstat. Catches the regression class dogfood iter 5 surfaced (mallocgc / sweepone in the user-side profile of perf-agent). Roadmap doc: items #2 and #7 marked shipped (#7 partial — the symbolize hot path is gated; sample-processing follow-up is documented). * chore(perfagent): drop dead emitCommAndMmapsForAllPIDs CI lint caught it: the helper was the eager-walk path replaced by the lazy OnNewPID hook in 405611a (roadmap #9). emitCommForPID and emitUserspaceMmapsForPID are now the only callers — both already exported through the lazy callback. Dead. * docs(README): drop GPU-profiling experimental notice GPU profiling lives on its own track (and its own worktree). The README banner was conflating it with the stable CPU / off-CPU / PMU paths; dropping the line keeps the front page honest about what this build delivers.
This file contains hidden or bidirectional Unicode text that may be interpreted or compiled differently than what appears below. To review, open the file in an editor that reveals hidden Unicode characters.
Learn more about bidirectional Unicode characters
Sign up for free
to join this conversation on GitHub.
Already have an account?
Sign in to comment
Add this suggestion to a batch that can be applied as a single commit.This suggestion is invalid because no changes were made to the code.Suggestions cannot be applied while the pull request is closed.Suggestions cannot be applied while viewing a subset of changes.Only one suggestion per line can be applied in a batch.Add this suggestion to a batch that can be applied as a single commit.Applying suggestions on deleted lines is not supported.You must change the existing code in this line in order to create a valid suggestion.Outdated suggestions cannot be applied.This suggestion has been applied or marked resolved.Suggestions cannot be applied from pending reviews.Suggestions cannot be applied on multi-line comments.Suggestions cannot be applied while the pull request is queued to merge.Suggestion cannot be applied right now. Please check back later.
pprof.ProfileSample.Stack becomes []Frame. Locations are keyed by (Name, File, Module, Line); functions by (Name, File, Module). Same function at different source lines now produces distinct locations instead of silently collapsing to the first-seen line.
Add perf-map decoder for Python (py::name:/path.py) and Node (LazyCompile:/Function:/JS: prefixes, including modern Node v16+ JS: format). Populates Function.Filename from the perf-map path rather than leaving it packed into Name.
Profile and offcpu collectors use SymbolizerWithInlinedFns(true) and expand blazesym.Sym.Inlined into leaf-first Frames via blazeSymToFrames. Recovers user code that LLVM inlined into a visible outer frame — verified on Rust release builds (cpu_intensive_work now appears).
Add regression tests for line-dedup and module-dedup, and decoder tests for Python / Node perf-map formats.
docs/dwarf-unwinding-design.md captures follow-up scope for the frame-pointer limitation that remains beyond this change.