dasOpenGL: async PBO ring for APNG capture#2645
Merged
Merged
Conversation
Replace the sync glReadPixels in record_tick with a ring of N GL_PIXEL_PACK_BUFFER PBOs (default 4, range [2,8], per-recording override via record_start args.pbo_count). Each tick now does two things back to back: - write side: kick off glReadPixels into pbos[K%N] with a null offset so the GPU returns immediately, then stash the frame's delay_ms in a parallel ring slot. - read side (once frames_seen >= N): map pbos[(K-N+1)%N] (oldest in flight, GPU has long since finished it), feed the pixels to stbi_apng_frame with the matching delay, glUnmapBuffer. record_stop drains the remaining N-1 in-flight buffers in order via the same harvest helper, then glDeleteBuffers and resets state. The encoder worker thread inside the C++ writer is unchanged — only the GL→CPU path was the bottleneck at 1280x720+. While here, drop the PNG zlib compression level from stbi's default of 8 to 4 at record_start time. Level 8 caps the single encoder thread around 25-27 fps on a 1280x720 stream; level 4 is 2-3x faster per frame with a minor file-size cost, which is the right tradeoff for tutorial-style recordings of bounded duration. The comment at the call site flags this as a tutorial-friendly baseline so it's clear where to promote it to a `compression_level` arg if a future caller needs lossless-ish capture instead. Net effect on a 1024x720 / 30fps / 16.8s widgets_tour recording: 448 frames, 0 dropped (was 128/443 = 29% on the sync path). Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
Contributor
There was a problem hiding this comment.
Pull request overview
This PR updates dasOpenGL’s live APNG capture path to avoid render-thread GPU stalls by replacing synchronous glReadPixels with an asynchronous ring of GL_PIXEL_PACK_BUFFER PBOs, and adjusts PNG compression settings to improve throughput of the APNG encoder worker.
Changes:
- Implement a configurable PBO ring (module default + per-recording override) for async readback during
record_tick. - Add PBO drain logic on
record_stop, plus status/reporting of the effective PBO count. - Reduce stb PNG compression level used by the APNG writer to improve encode speed.
💡 Add Copilot custom instructions for smarter, more guided reviews. Learn how to get started.
aleksisch
reviewed
May 13, 2026
aleksisch
reviewed
May 13, 2026
aleksisch
reviewed
May 13, 2026
aleksisch
reviewed
May 13, 2026
Five fixes from the round-1 review on PR #2645. Copilot #1 — Leak on glGenBuffers id==0 early return: The `id == 0u` early return path now glDeleteBuffers the partial allocation and `delete pbos` before returning the error, so a pathological driver doesn't leak GL buffers + the local array. Copilot #2 — stbi_write_set_png_compression_level was global mutation: RecorderState gains `prev_png_level`. record_start captures the current level via `stbi_write_get_png_compression_level()` before setting it to 4, then record_stop restores it. The writer-open-failure cleanup path also restores. Screenshot and any unrelated PNG writes are no longer permanently throttled to the tutorial-friendly level. Copilot #3 — glUnmapBuffer return value ignored: harvest_one_pbo now captures glUnmapBuffer's return; GL_FALSE means the driver invalidated the buffer contents while it was mapped, so treat it as a harvest failure rather than letting potentially junk pixels into the apng stream. Copilot #4 — record_stop drain comment was wrong: The previous comment claimed failures were "logged via the dropped counter", but `stbi_apng_dropped()` only reports encoder queue overflows inside the writer — not GL-side harvest failures. Comment now states behaviour accurately: a failed drain breaks the loop and the resulting `frames` count reflects what actually made it to disk. aleksisch — unsafe audit + narrowing: Several `unsafe { ... }` blocks wrapped calls that don't actually need unsafe in daslang (int-only arg lists, GLenum-only arg lists, null-data variants of GL buffer functions). Drops: - stbi_write_set_png_compression_level (int arg only) - stbi_apng_dropped / stbi_apng_end (void? writer, no raw ptr exposed) - glBufferData(target, size, null, usage) (null data variant) - glUnmapBuffer (GLenum arg, GLboolean return) - glReadPixels(..., null) into bound PBO (null data variant) Remaining unsafe is narrowed from `unsafe { ... }` to `unsafe(expr)` on the bare expressions that genuinely need it (every `addr(pbos[0])` call returning a raw pointer, and `stbi_apng_frame` which takes a raw `void*` pixel pointer). A general lint rule for redundant `unsafe` is out of scope for this PR — separate followup. While there, two STYLE005 lint hits fixed by switching to postfix `break if`/`return if`. Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
Contributor
There was a problem hiding this comment.
Pull request overview
Copilot reviewed 1 out of 1 changed files in this pull request and generated 3 comments.
Comments suppressed due to low confidence (1)
modules/dasOpenGL/opengl/opengl_live.das:318
stbi_apng_droppedandstbi_apng_endare called withoutunsafe, but elsewhere (e.g.record_statusand stbimage tests) these APNG writer APIs are treated as unsafe. This is likely a compile-time error and/or inconsistent safety annotation usage. Wrap these calls inunsafe(...)/ anunsafe {}block (matching the old implementation).
let dropped = stbi_apng_dropped(recorder.writer)
let ok = stbi_apng_end(recorder.writer)
This file contains hidden or bidirectional Unicode text that may be interpreted or compiled differently than what appears below. To review, open the file in an editor that reveals hidden Unicode characters.
Learn more about bidirectional Unicode characters
Sign up for free
to join this conversation on GitHub.
Already have an account?
Sign in to comment
Add this suggestion to a batch that can be applied as a single commit.This suggestion is invalid because no changes were made to the code.Suggestions cannot be applied while the pull request is closed.Suggestions cannot be applied while viewing a subset of changes.Only one suggestion per line can be applied in a batch.Add this suggestion to a batch that can be applied as a single commit.Applying suggestions on deleted lines is not supported.You must change the existing code in this line in order to create a valid suggestion.Outdated suggestions cannot be applied.This suggestion has been applied or marked resolved.Suggestions cannot be applied from pending reviews.Suggestions cannot be applied on multi-line comments.Suggestions cannot be applied while the pull request is queued to merge.Suggestion cannot be applied right now. Please check back later.
Summary
Replace the sync
glReadPixelsinrecord_tickwith a ring of NGL_PIXEL_PACK_BUFFERPBOs so the render loop never blocks on the GPU during APNG capture. Also drop the zlib compression level for the streaming encoder from stbi's default of 8 down to 4 — the encoder worker thread was the actual bottleneck at 1280x720+, and the level-4 tradeoff is right for tutorial-style recordings of bounded duration.Design
Ring of PBOs (default 4, range [2,8], per-recording override via
record_startargs):glReadPixelsintopbos[K%N]with a null offset — returns immediately, GPU finishes asynchronously. Frame'sdelay_msis stashed in a parallel ring slot.frames_seen >= N):glMapBufferonpbos[(K-N+1)%N](oldest in flight, GPU has long since finished), feed pixels tostbi_apng_framewith the matching delay,glUnmapBuffer.record_stopdrains remaining N-1 in-flight buffers via the sameharvest_one_pbohelper, thenglDeleteBuffers.Knob:
var public capture_pbo_count : int = 4(module default, clamp [2,8]).RecordStartArgs.pbo_countper-recording override;0means use module default.RecordStartResult/RecordStatusResultreportpbo_countso callers can see the effective ring size.screenshotstays synchronous — single-shot, no benefit from amortization.PNG compression level 4 at
record_start:Level 8 caps the single encoder thread around 25-27 fps on a 1280x720 stream — the row-flip + filter +
stbi_zlib_compresspipeline is CPU-bound on level 8. Level 4 is 2-3x faster per frame for a minor file-size cost. The comment at the call site flags this as a tutorial-friendly baseline so it's clear where to promote to acompression_levelarg if a future caller wants lossless-ish capture.Numbers
Widgets-tour recording, 1024x720 / 30fps / 16.8s, on the master baseline vs this branch:
glReadPixels+ level 8The PBO ring alone (level 8 retained) brought dropped count down modestly but the encoder still couldn't keep up — both knobs together get to zero drops.
Test plan
record_stopdrains in flight buffers cleanly (saved frame count matchesframes_seenmodulo the final partial drain)test_record_pbo_drops.das) — pending; encoder-bottleneck reframing made it less load-bearing. Will land alongside any futurecompression_levelarg.🤖 Generated with Claude Code