Skip to content

dasOpenGL: async PBO ring for APNG capture#2645

Merged
borisbat merged 2 commits into
masterfrom
bbatkin/dasopengl-pbo-ring-capture
May 13, 2026
Merged

dasOpenGL: async PBO ring for APNG capture#2645
borisbat merged 2 commits into
masterfrom
bbatkin/dasopengl-pbo-ring-capture

Conversation

@borisbat
Copy link
Copy Markdown
Collaborator

Summary

Replace the sync glReadPixels in record_tick with a ring of N GL_PIXEL_PACK_BUFFER PBOs so the render loop never blocks on the GPU during APNG capture. Also drop the zlib compression level for the streaming encoder from stbi's default of 8 down to 4 — the encoder worker thread was the actual bottleneck at 1280x720+, and the level-4 tradeoff is right for tutorial-style recordings of bounded duration.

Design

Ring of PBOs (default 4, range [2,8], per-recording override via record_start args):

  • write side every tick: glReadPixels into pbos[K%N] with a null offset — returns immediately, GPU finishes asynchronously. Frame's delay_ms is stashed in a parallel ring slot.
  • read side (once frames_seen >= N): glMapBuffer on pbos[(K-N+1)%N] (oldest in flight, GPU has long since finished), feed pixels to stbi_apng_frame with the matching delay, glUnmapBuffer.
  • record_stop drains remaining N-1 in-flight buffers via the same harvest_one_pbo helper, then glDeleteBuffers.

Knob: var public capture_pbo_count : int = 4 (module default, clamp [2,8]). RecordStartArgs.pbo_count per-recording override; 0 means use module default. RecordStartResult / RecordStatusResult report pbo_count so callers can see the effective ring size.

screenshot stays synchronous — single-shot, no benefit from amortization.

PNG compression level 4 at record_start:

Level 8 caps the single encoder thread around 25-27 fps on a 1280x720 stream — the row-flip + filter + stbi_zlib_compress pipeline is CPU-bound on level 8. Level 4 is 2-3x faster per frame for a minor file-size cost. The comment at the call site flags this as a tutorial-friendly baseline so it's clear where to promote to a compression_level arg if a future caller wants lossless-ish capture.

Numbers

Widgets-tour recording, 1024x720 / 30fps / 16.8s, on the master baseline vs this branch:

Path frames dropped drop rate
sync glReadPixels + level 8 443 128 29%
PBO ring + level 4 448 0 0%

The PBO ring alone (level 8 retained) brought dropped count down modestly but the encoder still couldn't keep up — both knobs together get to zero drops.

Test plan

  • Repeat-recording smoke at 1024x720 / 30fps — 0 dropped, 20 MB / 16.8s output
  • record_stop drains in flight buffers cleanly (saved frame count matches frames_seen modulo the final partial drain)
  • Regression test (test_record_pbo_drops.das) — pending; encoder-bottleneck reframing made it less load-bearing. Will land alongside any future compression_level arg.

🤖 Generated with Claude Code

Replace the sync glReadPixels in record_tick with a ring of N
GL_PIXEL_PACK_BUFFER PBOs (default 4, range [2,8], per-recording
override via record_start args.pbo_count).

Each tick now does two things back to back:
- write side: kick off glReadPixels into pbos[K%N] with a null offset
  so the GPU returns immediately, then stash the frame's delay_ms in
  a parallel ring slot.
- read side (once frames_seen >= N): map pbos[(K-N+1)%N] (oldest in
  flight, GPU has long since finished it), feed the pixels to
  stbi_apng_frame with the matching delay, glUnmapBuffer.

record_stop drains the remaining N-1 in-flight buffers in order via
the same harvest helper, then glDeleteBuffers and resets state. The
encoder worker thread inside the C++ writer is unchanged — only the
GL→CPU path was the bottleneck at 1280x720+.

While here, drop the PNG zlib compression level from stbi's default
of 8 to 4 at record_start time. Level 8 caps the single encoder
thread around 25-27 fps on a 1280x720 stream; level 4 is 2-3x faster
per frame with a minor file-size cost, which is the right tradeoff
for tutorial-style recordings of bounded duration. The comment at
the call site flags this as a tutorial-friendly baseline so it's
clear where to promote it to a `compression_level` arg if a future
caller needs lossless-ish capture instead.

Net effect on a 1024x720 / 30fps / 16.8s widgets_tour recording:
448 frames, 0 dropped (was 128/443 = 29% on the sync path).

Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
Copilot AI review requested due to automatic review settings May 13, 2026 17:52
Copy link
Copy Markdown
Contributor

Copilot AI left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Pull request overview

This PR updates dasOpenGL’s live APNG capture path to avoid render-thread GPU stalls by replacing synchronous glReadPixels with an asynchronous ring of GL_PIXEL_PACK_BUFFER PBOs, and adjusts PNG compression settings to improve throughput of the APNG encoder worker.

Changes:

  • Implement a configurable PBO ring (module default + per-recording override) for async readback during record_tick.
  • Add PBO drain logic on record_stop, plus status/reporting of the effective PBO count.
  • Reduce stb PNG compression level used by the APNG writer to improve encode speed.

💡 Add Copilot custom instructions for smarter, more guided reviews. Learn how to get started.

Comment thread modules/dasOpenGL/opengl/opengl_live.das
Comment thread modules/dasOpenGL/opengl/opengl_live.das Outdated
Comment thread modules/dasOpenGL/opengl/opengl_live.das Outdated
Comment thread modules/dasOpenGL/opengl/opengl_live.das
Comment thread modules/dasOpenGL/opengl/opengl_live.das Outdated
Comment thread modules/dasOpenGL/opengl/opengl_live.das Outdated
Comment thread modules/dasOpenGL/opengl/opengl_live.das Outdated
Comment thread modules/dasOpenGL/opengl/opengl_live.das Outdated
Five fixes from the round-1 review on PR #2645.

Copilot #1 — Leak on glGenBuffers id==0 early return:
  The `id == 0u` early return path now glDeleteBuffers the partial
  allocation and `delete pbos` before returning the error, so a
  pathological driver doesn't leak GL buffers + the local array.

Copilot #2 — stbi_write_set_png_compression_level was global mutation:
  RecorderState gains `prev_png_level`. record_start captures the
  current level via `stbi_write_get_png_compression_level()` before
  setting it to 4, then record_stop restores it. The writer-open-failure
  cleanup path also restores. Screenshot and any unrelated PNG writes
  are no longer permanently throttled to the tutorial-friendly level.

Copilot #3 — glUnmapBuffer return value ignored:
  harvest_one_pbo now captures glUnmapBuffer's return; GL_FALSE means
  the driver invalidated the buffer contents while it was mapped, so
  treat it as a harvest failure rather than letting potentially junk
  pixels into the apng stream.

Copilot #4 — record_stop drain comment was wrong:
  The previous comment claimed failures were "logged via the dropped
  counter", but `stbi_apng_dropped()` only reports encoder queue
  overflows inside the writer — not GL-side harvest failures. Comment
  now states behaviour accurately: a failed drain breaks the loop and
  the resulting `frames` count reflects what actually made it to disk.

aleksisch — unsafe audit + narrowing:
  Several `unsafe { ... }` blocks wrapped calls that don't actually
  need unsafe in daslang (int-only arg lists, GLenum-only arg lists,
  null-data variants of GL buffer functions). Drops:
    - stbi_write_set_png_compression_level (int arg only)
    - stbi_apng_dropped / stbi_apng_end (void? writer, no raw ptr exposed)
    - glBufferData(target, size, null, usage) (null data variant)
    - glUnmapBuffer (GLenum arg, GLboolean return)
    - glReadPixels(..., null) into bound PBO (null data variant)
  Remaining unsafe is narrowed from `unsafe { ... }` to `unsafe(expr)`
  on the bare expressions that genuinely need it (every `addr(pbos[0])`
  call returning a raw pointer, and `stbi_apng_frame` which takes a
  raw `void*` pixel pointer). A general lint rule for redundant
  `unsafe` is out of scope for this PR — separate followup.

While there, two STYLE005 lint hits fixed by switching to postfix
`break if`/`return if`.

Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
Copy link
Copy Markdown
Contributor

Copilot AI left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Pull request overview

Copilot reviewed 1 out of 1 changed files in this pull request and generated 3 comments.

Comments suppressed due to low confidence (1)

modules/dasOpenGL/opengl/opengl_live.das:318

  • stbi_apng_dropped and stbi_apng_end are called without unsafe, but elsewhere (e.g. record_status and stbimage tests) these APNG writer APIs are treated as unsafe. This is likely a compile-time error and/or inconsistent safety annotation usage. Wrap these calls in unsafe(...) / an unsafe {} block (matching the old implementation).
    let dropped = stbi_apng_dropped(recorder.writer)
    let ok      = stbi_apng_end(recorder.writer)

Comment thread modules/dasOpenGL/opengl/opengl_live.das
Comment thread modules/dasOpenGL/opengl/opengl_live.das
Comment thread modules/dasOpenGL/opengl/opengl_live.das
@borisbat borisbat merged commit aa7a28a into master May 13, 2026
30 checks passed
@borisbat borisbat deleted the bbatkin/dasopengl-pbo-ring-capture branch May 14, 2026 15:59
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Labels

None yet

Projects

None yet

Development

Successfully merging this pull request may close these issues.

3 participants