
docs(README): trim noisy intro + use-case + architecture prose #16

Merged
dpsoft merged 2 commits into main from chore/readme-iter2 on May 4, 2026

@dpsoft (Owner) commented May 3, 2026

A follow-up iteration on the README, now that #15 has landed. Three passages were doing more work than they earned; this PR trims them.

Changes

1. Intro paragraph

The intro mixed the pitch with implementation detail across two paragraphs. It is now one short paragraph.

```diff
- A single binary that samples on-CPU stack traces, off-CPU blocking time, and hardware PMU
- counters — and emits production-ready pprof. Hybrid FP+DWARF unwinder handles release-built
- C++/Rust binaries that omit frame pointers; built-in symbolization for native code (DWARF +
- ELF), Python (`-X perf` perf-maps), Node.js (`--perf-basic-prof`), and Go.
- Runs entirely local. No backend, no telemetry, no scrape config.
+ Capture on-CPU, off-CPU, and PMU profiles — system-wide or per-PID — and emit pprof. One
+ binary, runs locally, no backend or telemetry.
```

The dropped technical detail (FP+DWARF, Python `-X perf`, Node.js `--perf-basic-prof`) still lives in the use-case section and the Flags table: readers who want to know how it works can still find it, and readers who only want to know what it does aren't ambushed by it.
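For readers curious about the perf-map mechanism behind the Python and Node.js symbolization: with `-X perf` (Python 3.12+) or `--perf-basic-prof` (Node.js), the runtime writes `/tmp/perf-<pid>.map`, a text file with one `START SIZE symbolname` line per JIT-emitted code region, with START and SIZE in hex. The Go sketch below parses that format and resolves a PC to a symbol. It only illustrates the file format and is not perf-agent's symbolizer; `perfMapEntry`, `parsePerfMap`, and `lookup` are hypothetical names, and the sample symbol strings are made up.

```go
package main

import (
	"bufio"
	"fmt"
	"sort"
	"strconv"
	"strings"
)

// perfMapEntry is one line of a perf map: START SIZE symbolname.
type perfMapEntry struct {
	Start, Size uint64
	Symbol      string
}

// parsePerfMap parses perf-map text. START and SIZE are hex (an optional
// "0x" prefix is tolerated); the symbol is the rest of the line and may
// contain spaces. Entries are returned sorted by Start. Overlapping ranges
// (JIT code reuse) are not handled specially in this sketch.
func parsePerfMap(text string) ([]perfMapEntry, error) {
	var entries []perfMapEntry
	sc := bufio.NewScanner(strings.NewReader(text))
	for sc.Scan() {
		line := strings.TrimSpace(sc.Text())
		if line == "" {
			continue
		}
		fields := strings.SplitN(line, " ", 3)
		if len(fields) != 3 {
			return nil, fmt.Errorf("malformed perf-map line: %q", line)
		}
		start, err := strconv.ParseUint(strings.TrimPrefix(fields[0], "0x"), 16, 64)
		if err != nil {
			return nil, err
		}
		size, err := strconv.ParseUint(strings.TrimPrefix(fields[1], "0x"), 16, 64)
		if err != nil {
			return nil, err
		}
		entries = append(entries, perfMapEntry{Start: start, Size: size, Symbol: fields[2]})
	}
	if err := sc.Err(); err != nil {
		return nil, err
	}
	sort.Slice(entries, func(i, j int) bool { return entries[i].Start < entries[j].Start })
	return entries, nil
}

// lookup returns the symbol whose [Start, Start+Size) range covers pc.
func lookup(entries []perfMapEntry, pc uint64) (string, bool) {
	// Index of the first entry starting above pc; the candidate precedes it.
	i := sort.Search(len(entries), func(i int) bool { return entries[i].Start > pc })
	if i == 0 {
		return "", false
	}
	e := entries[i-1]
	if pc < e.Start+e.Size {
		return e.Symbol, true
	}
	return "", false
}

func main() {
	// Illustrative contents; real symbol strings depend on the runtime.
	const sample = "7f0000001000 40 py::handle:/app/server.py\n" +
		"7f0000001040 20 py::parse:/app/server.py\n"
	entries, err := parsePerfMap(sample)
	if err != nil {
		panic(err)
	}
	sym, ok := lookup(entries, 0x7f0000001044)
	fmt.Println(sym, ok) // prints: py::parse:/app/server.py true
}
```

Sorting once and using `sort.Search` keeps each lookup at O(log n), which matters when every sample PC in a JIT-heavy process goes through the map.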

2. `--inject-python` use case

The old text was a four-clause sentence covering trampoline semantics, sidecar deployment, `shareProcessNamespace`, and `kubectl cp`. The deployment details duplicate the "🐳 Sidecar profiling inside Kubernetes pods" use case further down, so the passage is trimmed to the actual value prop.

```diff
- Hot-attach to a running pod or process — no restart. For Python 3.12+, `--inject-python`
- activates the perf trampoline at profile start and deactivates it at exit, so the per-call
- overhead does not persist past the profiling window. Drop in as a sidecar
- (`shareProcessNamespace: true`), capture for 30s, exit. Output is pprof; ship it home with
- `kubectl cp` or pipe it into your store.
+ Hot-attach to a running process — no restart, no preinstalled agent. For Python 3.12+,
+ `--inject-python` enables the perf trampoline only for the capture window, so there's no
+ persistent overhead.
```

3. Architecture prose

Compressed two dense paragraphs (MMAP2 watchers, eager-compile fallbacks, ehcompile→ehmaps installation, per-PID lazy caching) into two sentences. The architecture diagram stays; the prose around it now states which flag selects which walker, and what `procmap.Resolver` buys downstream tools.

```diff
- Two stack-walker paths share a single user-space pipeline:
-   - FP path (`--unwind fp`): cheap, kernel-side stackmap aggregation. Truncates on
-     FP-less code (release C++/Rust without `-fno-omit-frame-pointer`).
-   - DWARF/hybrid path (`--unwind dwarf` or `auto`, the default): pure-FP for FP-safe
-     code, falls through to `.eh_frame`-derived CFI rules for FP-less PCs. Userspace
-     pre-compiles per-binary CFI from `.eh_frame` (`unwind/ehcompile`) and installs it into
-     BPF maps (`unwind/ehmaps`); the BPF walker reads CFI per-frame. MMAP2 events keep CFI
-     fresh as processes `dlopen`/`exec`. Eager-compile failures (Go binaries lack
-     `.eh_frame`) are tolerated — the walker's FP path covers those.
- The `procmap.Resolver` sits between the walkers and pprof. It lazily reads
- `/proc//maps` and ELF `.note.gnu.build-id`, caches per-PID, and gives the pprof
- builder real `Mapping` identity (path, start/limit, file offset, build-id). Each
- `Location` is keyed by `(mapping_id, file_offset)` rather than by symbol name, so two
- PCs that symbolize to the same `(file, line, func)` stay distinguishable — the data
- downstream tools need for sample-based PGO and cross-run diffing.
+ Two stack-walker paths: `--unwind fp` (cheap, kernel-side aggregation; truncates on
+ FP-less code) and `--unwind dwarf` / `auto` (default — FP fast path with
+ `.eh_frame`-derived CFI fallback for release C++/Rust without frame pointers).
+ Sample addresses resolve through `procmap.Resolver` (lazy `/proc//maps` +
+ build-id), so each pprof `Mapping` carries real per-binary identity and each `Location`
+ is keyed by `(mapping_id, file_offset)` — what `go tool pprof -diff_base` and
+ sample-based PGO converters need to round-trip.
```
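To see why the FP path truncates on FP-less code, here is a toy model of a frame-pointer walk, assuming the x86-64 convention where `[fp]` holds the caller's saved frame pointer and `[fp+8]` holds the return address. The `mem` map stands in for stack memory (a real walker reads it from BPF), and `walkFP` is a hypothetical name. When an FP-less function has reused the frame-pointer register as scratch, the chain points at garbage and the walk stops at the sampled PC, which is the gap the `.eh_frame` CFI fallback exists to fill. CFI-based unwinding itself is not modeled here.

```go
package main

import "fmt"

// walkFP follows a frame-pointer chain (x86-64 convention): [fp] holds the
// caller's saved frame pointer and [fp+8] the return address. mem is a toy
// stand-in for the target's stack memory.
func walkFP(mem map[uint64]uint64, pc, fp uint64, maxDepth int) []uint64 {
	stack := []uint64{pc}
	for len(stack) < maxDepth {
		savedFP, okFP := mem[fp]
		ret, okRet := mem[fp+8]
		if !okFP || !okRet || ret == 0 {
			break // unreadable frame or end of chain
		}
		stack = append(stack, ret)
		if savedFP <= fp {
			break // caller frames live at higher addresses; anything else is bogus
		}
		fp = savedFP
	}
	return stack
}

func main() {
	mem := map[uint64]uint64{
		0x7f00: 0x7f40, 0x7f08: 0x401200, // leaf frame: saved FP, return into caller
		0x7f40: 0x7f80, 0x7f48: 0x401100, // caller frame: saved FP, return into its caller
		0x7f80: 0, 0x7f88: 0, // outermost frame: zero return address ends the walk
	}
	// Intact chain: every function kept its frame pointer.
	fmt.Printf("%#x\n", walkFP(mem, 0x401300, 0x7f00, 64)) // prints: [0x401300 0x401200 0x401100]
	// FP-less leaf reused the register as scratch (0x12345): only the PC survives.
	fmt.Printf("%#x\n", walkFP(mem, 0x401300, 0x12345, 64)) // prints: [0x401300]
}
```

The `savedFP <= fp` check is a cheap sanity guard that keeps a corrupted chain from looping; real walkers apply similar checks because they cannot trust the register contents.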

No new files, no link-outs. Readers who want depth open the code.

Diff shape

```
README.md +4 / −9 (439 → 434 lines)
```

Status

Draft. Iterating — happy to keep cutting if you point at more passages that don't earn their keep.

dpsoft added 2 commits May 3, 2026 18:09
…re prose

Three passages were doing more work than they earned:

- **Intro** mixed pitch (CPU/off-CPU/PMU + pprof) with implementation
  detail (FP+DWARF, Python -X perf, Node.js --perf-basic-prof) and
  ran across two paragraphs. Reader had to parse a lot to extract
  "what is this, where does it run." Now one sentence: capture
  these profiles, emit pprof, run locally, no backend.

- **--inject-python use case** had a four-clause sentence covering
  trampoline activation semantics, sidecar k8s deployment shape,
  shareProcessNamespace flag, kubectl cp egress. The deployment
  cookbook bits are duplicated by the dedicated 🐳 Sidecar use case
  below. Trim to the actual value prop: hot-attach without restart,
  trampoline only for the capture window.

- **Architecture prose** under the ASCII diagram explained MMAP2 watchers,
  Go-binaries-lack-.eh_frame fallback, ehcompile→ehmaps installation,
  per-PID lazy procmap caching. All true and load-bearing for a
  contributor; not value-prop for a reader picking up the README.
  Compress to two sentences: which flag selects which walker, and
  what the procmap.Resolver buys you (-diff_base + sample-PGO).
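The `procmap.Resolver` bullet is about turning a raw sample address into something stable. A minimal sketch of the core arithmetic, assuming the standard `/proc/<pid>/maps` line layout (`start-end perms offset dev inode path`): a PC inside a mapping sits at file offset `pc - start + offset`, which does not change when ASLR loads the same binary at a different base. Caching, build-id reads, and perf-agent's actual `procmap` API are omitted; `mapping`, `parseMapsLine`, and `fileOffset` are hypothetical names, and the library path is made up.

```go
package main

import (
	"fmt"
	"strconv"
	"strings"
)

// mapping holds the fields of one /proc/<pid>/maps line that address
// resolution needs.
type mapping struct {
	Start, Limit, Offset uint64
	Path                 string
}

// parseMapsLine parses "start-end perms offset dev inode [path]".
// Runs of spaces inside a path collapse to one; acceptable for a sketch.
func parseMapsLine(line string) (mapping, error) {
	f := strings.Fields(line)
	if len(f) < 5 {
		return mapping{}, fmt.Errorf("short maps line: %q", line)
	}
	lo, hi, found := strings.Cut(f[0], "-")
	if !found {
		return mapping{}, fmt.Errorf("bad address range: %q", f[0])
	}
	start, err := strconv.ParseUint(lo, 16, 64)
	if err != nil {
		return mapping{}, err
	}
	limit, err := strconv.ParseUint(hi, 16, 64)
	if err != nil {
		return mapping{}, err
	}
	off, err := strconv.ParseUint(f[2], 16, 64)
	if err != nil {
		return mapping{}, err
	}
	m := mapping{Start: start, Limit: limit, Offset: off}
	if len(f) > 5 {
		m.Path = strings.Join(f[5:], " ") // anonymous mappings have no path
	}
	return m, nil
}

// fileOffset translates a runtime PC into an offset within the mapped file.
// The result is independent of where ASLR placed the mapping.
func fileOffset(m mapping, pc uint64) (uint64, bool) {
	if pc < m.Start || pc >= m.Limit {
		return 0, false
	}
	return pc - m.Start + m.Offset, true
}

func main() {
	// The same (hypothetical) library loaded at different bases in two runs.
	run1, _ := parseMapsLine("7f1c2a028000-7f1c2a1bd000 r-xp 00028000 08:01 1234 /usr/lib/libfoo.so")
	run2, _ := parseMapsLine("7f88b0828000-7f88b09bd000 r-xp 00028000 08:01 1234 /usr/lib/libfoo.so")
	off1, _ := fileOffset(run1, 0x7f1c2a0301a0)
	off2, _ := fileOffset(run2, 0x7f88b08301a0)
	fmt.Printf("%#x %#x\n", off1, off2) // prints: 0x301a0 0x301a0
}
```

Two different runtime addresses collapse to the same file offset, which is the property `-diff_base` and sample-based PGO converters rely on.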

No new files, no link-outs to a separate architecture doc. Reader
who wants the depth opens the code.

Net: −9, +4. README drops to 434 lines.
…il, library variants

A through F from the iteration list:

- A. Drop Usage > "Profiling running Python processes" and "Profiling
  inside a Kubernetes pod" subsections. Both restated what the 🔥 and
  🐳 use cases already cover; the Python one duplicated detail that
  lives in docs/python-profiling.md. Replace with a one-line pointer.

- B. Trim Output > "pprof fidelity" from a 5-paragraph reference
  enumeration of every Mapping/Location field to two sentences that
  state what the consumer cares about (real per-binary identity,
  file-offset Locations, [kernel]/[jit] sentinels, tags + labels).

- C. Library usage: keep the canonical example, drop the in-memory
  collection and custom-label-enricher sub-blocks. Both were one-
  option-each illustrations that the package docs already cover.

- D. Drop "Real workflows perf-agent is built for. Each maps to one
  or more of the modes documented under [Flags]." — meta-commentary
  the section title already conveys.

- E. Light cuts on use-case prose: drop "Output is pprof, viewable in
  go tool pprof or any flame-graph tool" (already established), drop
  "all inlined frames expanded by blazesym" (impl detail), drop
  "without the perf stat parsing tax" (comparative aside), drop the
  /proc/<N>/status NSpid mechanism explanation in the sidecar
  use-case (impl detail), simplify "two PCs that symbolize to the
  same (file, line, func) stay distinguishable" to "address-stable
  across runs."

- F. Tagline + intro paragraph were saying the same thing. Keep
  the tagline (italic, distinct visually); the intro becomes just
  the deployment line: "One binary, runs locally, no backend or
  telemetry."

Net: -94 / +13. README drops to 353 lines. Every cut is either duplicated
content, reference material that already has a home elsewhere, or
implementation noise that doesn't help a reader pick up the tool.
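Items B and E both lean on file-offset `Location`s. A sketch of what keying by `(mapping_id, file_offset)` means for a pprof builder: two PCs in the same function (even on the same source line) get distinct Location IDs, while repeated samples at the same offset reuse one. pprof's `profile.proto` requires Location IDs to be nonzero, hence the 1-based counter. `locKey`, `locationTable`, and `id` are hypothetical names, not perf-agent's types.

```go
package main

import "fmt"

// locKey identifies a pprof Location by where the instruction lives in the
// binary rather than by the symbol it resolves to.
type locKey struct {
	MappingID  uint64
	FileOffset uint64
}

// locationTable hands out Location IDs; pprof requires them to be nonzero,
// so numbering starts at 1.
type locationTable struct {
	ids map[locKey]uint64
}

func newLocationTable() *locationTable {
	return &locationTable{ids: make(map[locKey]uint64)}
}

// id returns the existing ID for k, or allocates the next one.
func (t *locationTable) id(k locKey) uint64 {
	if id, ok := t.ids[k]; ok {
		return id
	}
	id := uint64(len(t.ids) + 1)
	t.ids[k] = id
	return id
}

func main() {
	t := newLocationTable()
	// Two PCs in the same function (possibly the same source line) in mapping 1.
	a := t.id(locKey{MappingID: 1, FileOffset: 0x301a0})
	b := t.id(locKey{MappingID: 1, FileOffset: 0x301a8})
	// A later sample at the first offset reuses its Location.
	c := t.id(locKey{MappingID: 1, FileOffset: 0x301a0})
	fmt.Println(a, b, c) // prints: 1 2 1
}
```

Keying by symbol instead would merge `a` and `b` into one Location and lose the per-instruction detail that PGO tooling consumes.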
@dpsoft dpsoft marked this pull request as ready for review May 4, 2026 12:22
@dpsoft dpsoft merged commit 5785c64 into main May 4, 2026
16 of 17 checks passed
@dpsoft dpsoft deleted the chore/readme-iter2 branch May 4, 2026 12:23