
schedstat: add comprehensive GC analysis from range events#5

Merged
tbg merged 5 commits into main from gc-analysis
Mar 9, 2026

@tbg
Collaborator

@tbg tbg commented Mar 9, 2026

Summary

  • Capture structured GC range events (EventRangeBegin/Active/End) into a new gc_ranges table, replacing crude pattern-matching on g_transitions reason strings
  • Replace --gc output with comprehensive analysis: cycle summary, STW breakdown, mark assist impact per goroutine, scheduling latency during GC vs normal, per-cycle breakdown (verbose), and sweep summary
  • Change --gc default to true since it now provides useful, low-noise context
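
The pairing of range events into table rows described in the first bullet can be sketched roughly as follows. This is a minimal illustration, not the PR's implementation: the `Event`, `Range`, and `PairRanges` names are hypothetical stand-ins for the trace package's event types and for a gc_ranges row, and the range names in `main` are example strings only. It assumes an Active event means the range was already open when it was first observed, so its true start time is unknown.

```go
package main

import "fmt"

// Kind mirrors the three range event kinds named in the summary.
// These are illustrative stand-ins, not the trace package's own types.
type Kind int

const (
	RangeBegin Kind = iota
	RangeActive
	RangeEnd
)

// Event is a hypothetical, simplified range event.
type Event struct {
	Kind   Kind
	Name   string // what the range measures (example strings in main)
	Scope  string // who owns the range, e.g. a goroutine or the GC itself
	TimeNs int64
}

// Range is one row of a gc_ranges-like table (assumed shape).
type Range struct {
	Name, Scope    string
	StartNs, EndNs int64
	StartKnown     bool // false when the range was first seen via RangeActive
}

type key struct{ name, scope string }

// PairRanges matches each Begin/Active event with the next End event
// that carries the same name and scope.
func PairRanges(events []Event) []Range {
	open := make(map[key]Range)
	var out []Range
	for _, ev := range events {
		k := key{ev.Name, ev.Scope}
		switch ev.Kind {
		case RangeBegin, RangeActive:
			if _, ok := open[k]; ok {
				continue // already tracking this range
			}
			open[k] = Range{
				Name: ev.Name, Scope: ev.Scope,
				StartNs:    ev.TimeNs,
				StartKnown: ev.Kind == RangeBegin,
			}
		case RangeEnd:
			r, ok := open[k]
			if !ok {
				continue // End without a matching start: skip
			}
			r.EndNs = ev.TimeNs
			out = append(out, r)
			delete(open, k)
		}
	}
	return out
}

func main() {
	events := []Event{
		{RangeActive, "GC concurrent mark phase", "gc", 0},
		{RangeBegin, "GC mark assist", "g1222", 100},
		{RangeEnd, "GC mark assist", "g1222", 350},
		{RangeEnd, "GC concurrent mark phase", "gc", 900},
	}
	for _, r := range PairRanges(events) {
		fmt.Printf("%s (%s): %dns, start known: %v\n",
			r.Name, r.Scope, r.EndNs-r.StartNs, r.StartKnown)
	}
}
```

Keying on both name and scope keeps concurrent mark assists on different goroutines from being paired with each other. Ranges still open at the end of the trace are dropped in this sketch.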

Example output (from experiment-upsert1000-gateway.bin):

--- GC Analysis ---
GC cycles: 1, total: 30.31ms, avg: 30.31ms, min: 30.31ms, max: 30.31ms
  STW pauses: 3, total: 270.3µs, max: 140.4µs
    GC sweep termination: 1 pauses, total 140.4µs, max 140.4µs
    GC mark termination: 1 pauses, total 125.7µs, max 125.7µs
  Mark assist: 963 events across 836 goroutines, total: 93.39ms, max single: 1.26ms
    Top affected goroutines:
      G1222     ...RaftTransport...  95 assists, total 11.64ms, max 1.26ms
  Scheduling latency during GC vs normal:
                   count      p50        p99        max
    During GC:     4969       74.6µs     7.51ms     9.44ms
    Non-GC:        1549443    21.3µs     502.6µs    2.40ms
    Ratio:                    3.5x       15.0x      3.9x
  Sweep: 376 events, total: 304.9µs
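
The "During GC" versus "Non-GC" comparison above can be approximated with a sketch like the one below. It is written under assumptions rather than taken from the PR: `Sample`, `Interval`, `Split`, `Percentile`, and `Ratio` are hypothetical names, a latency sample counts as "during GC" when its timestamp falls inside a GC cycle interval, and percentiles use the nearest-rank method. The tool's actual classification and percentile rules may differ.

```go
package main

import (
	"fmt"
	"math"
	"sort"
)

// Interval is a GC cycle's [StartNs, EndNs) span (assumed half-open).
type Interval struct{ StartNs, EndNs int64 }

// Sample is one scheduling-latency observation (hypothetical shape).
type Sample struct{ AtNs, LatencyNs int64 }

// inGC reports whether the timestamp falls inside any GC cycle.
func inGC(at int64, cycles []Interval) bool {
	for _, c := range cycles {
		if at >= c.StartNs && at < c.EndNs {
			return true
		}
	}
	return false
}

// Split partitions latencies into during-GC and non-GC, each sorted ascending.
func Split(samples []Sample, cycles []Interval) (during, normal []int64) {
	for _, s := range samples {
		if inGC(s.AtNs, cycles) {
			during = append(during, s.LatencyNs)
		} else {
			normal = append(normal, s.LatencyNs)
		}
	}
	sort.Slice(during, func(i, j int) bool { return during[i] < during[j] })
	sort.Slice(normal, func(i, j int) bool { return normal[i] < normal[j] })
	return during, normal
}

// Percentile returns the nearest-rank percentile of an ascending slice.
func Percentile(sorted []int64, p float64) int64 {
	if len(sorted) == 0 {
		return 0
	}
	rank := int(math.Ceil(p / 100 * float64(len(sorted))))
	if rank < 1 {
		rank = 1
	}
	return sorted[rank-1]
}

// Ratio divides the during-GC value by the non-GC value.
func Ratio(during, normal int64) float64 {
	if normal == 0 {
		return 0
	}
	return float64(during) / float64(normal)
}

func main() {
	cycles := []Interval{{1000, 2000}}
	samples := []Sample{
		{1100, 80}, {1500, 120}, {100, 20}, {2000, 10}, {2500, 40}, {3000, 30},
	}
	during, normal := Split(samples, cycles)
	for _, p := range []float64{50, 99, 100} {
		d, n := Percentile(during, p), Percentile(normal, p)
		fmt.Printf("p%-3.0f during=%dns normal=%dns ratio=%.1fx\n", p, d, n, Ratio(d, n))
	}
}
```

The Ratio row in the example output is consistent with this per-column division: 74.6µs / 21.3µs ≈ 3.5x at p50, 7.51ms / 502.6µs ≈ 15.0x at p99, and 9.44ms / 2.40ms ≈ 3.9x at max.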

Test plan

  • go build ./... passes at each commit
  • go test ./... passes at each commit
  • Manual verification: gc_ranges populated with real GC data
  • Manual verification: --gc=false suppresses section
  • Manual verification: --verbose shows per-cycle breakdown
  • Golden files regenerated and verified

tbg and others added 4 commits March 9, 2026 16:06
Add a gc_ranges table and gc_cycles view to the schema, and handle
EventRangeBegin/Active/End trace events to populate them. This captures
structured GC data (mark phases, mark assist, sweeps, STW pauses)
instead of relying on crude pattern-matching on g_transitions reason
strings.

Co-Authored-By: roachdev-claude <roachdev-claude-bot@cockroachlabs.com>

Replace the crude pattern-matching GC analysis with structured output
using the gc_ranges table: cycle summary, STW breakdown, mark assist
impact per goroutine, scheduling latency comparison (during GC vs
normal), per-cycle breakdown (verbose), and sweep summary.

Change --gc default to true since it now provides useful context.

Co-Authored-By: roachdev-claude <roachdev-claude-bot@cockroachlabs.com>

Set gc: true in testOpts to match the new default, and regenerate
golden files to include the GC analysis section.

Co-Authored-By: roachdev-claude <roachdev-claude-bot@cockroachlabs.com>

1s trace from a single-node cluster at ~20% CPU showing GC mark assist
impact even under light load (2 GC cycles, 5.5x p99 latency during GC).

Co-Authored-By: roachdev-claude <roachdev-claude-bot@cockroachlabs.com>
@tbg tbg marked this pull request as ready for review March 9, 2026 15:07
Add "@ t=NNms" to the mark assist top-affected-goroutines output,
showing when the longest assist started relative to trace start.

Co-Authored-By: roachdev-claude <roachdev-claude-bot@cockroachlabs.com>
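
One way to produce the per-goroutine mark assist lines, including the "@ t=NNms" offset of the longest assist, is sketched below. The `Assist`, `Stat`, `TopAssists`, and `FormatStat` names are hypothetical, and the exact column layout is an assumption modeled on the example output rather than the tool's real format string.

```go
package main

import (
	"fmt"
	"sort"
	"time"
)

// Assist is one mark assist range on a goroutine (hypothetical shape).
type Assist struct {
	G              int64
	StartNs, DurNs int64
}

// Stat aggregates the assists of a single goroutine.
type Stat struct {
	G          int64
	Count      int
	TotalNs    int64
	MaxNs      int64
	MaxStartNs int64 // start time of the longest assist
}

// TopAssists groups assists by goroutine and returns the n goroutines
// with the largest total assist time, ties broken by goroutine ID.
func TopAssists(assists []Assist, n int) []Stat {
	byG := make(map[int64]*Stat)
	for _, a := range assists {
		s := byG[a.G]
		if s == nil {
			s = &Stat{G: a.G}
			byG[a.G] = s
		}
		s.Count++
		s.TotalNs += a.DurNs
		if a.DurNs > s.MaxNs {
			s.MaxNs = a.DurNs
			s.MaxStartNs = a.StartNs
		}
	}
	out := make([]Stat, 0, len(byG))
	for _, s := range byG {
		out = append(out, *s)
	}
	sort.Slice(out, func(i, j int) bool {
		if out[i].TotalNs != out[j].TotalNs {
			return out[i].TotalNs > out[j].TotalNs
		}
		return out[i].G < out[j].G
	})
	if len(out) > n {
		out = out[:n]
	}
	return out
}

// FormatStat renders one line; the offset is relative to the trace start.
func FormatStat(s Stat, traceStartNs int64) string {
	rel := time.Duration(s.MaxStartNs - traceStartNs)
	return fmt.Sprintf("g%-8d %d assists, total %v, max %v @ t=%dms",
		s.G, s.Count, time.Duration(s.TotalNs), time.Duration(s.MaxNs),
		rel.Milliseconds())
}

func main() {
	const traceStartNs = 5_000_000
	assists := []Assist{
		{G: 7, StartNs: 6_000_000, DurNs: 1_000_000},
		{G: 7, StartNs: 15_000_000, DurNs: 2_000_000},
		{G: 9, StartNs: 8_000_000, DurNs: 500_000},
	}
	for _, s := range TopAssists(assists, 10) {
		fmt.Println(FormatStat(s, traceStartNs))
	}
}
```

Recording the start of the longest assist alongside its duration is what lets the output point at a specific moment in the trace.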
@tbg tbg merged commit c40de56 into main Mar 9, 2026
1 check passed
@tbg tbg deleted the gc-analysis branch March 9, 2026 15:20
tbg added a commit that referenced this pull request Mar 10, 2026
PR #5 introduced G%-8d in the mark assist output, missing the
lowercase convention established in PR #6.

Co-Authored-By: roachdev-claude <roachdev-claude-bot@cockroachlabs.com>