chore: remove all heap allocation from the hot renderer loop by antouhou · Pull Request #38 · antouhou/grafo

antouhou · 2026-02-22T18:20:56Z

Summary by CodeRabbit

Refactor
- Per-frame scratch state and in-place traversal introduced to reuse buffers, trim memory on resize/lifecycle, and reduce allocations; shared tessellated geometry/vertex buffers added to avoid duplicate copies; render & effect flows updated to consume provided scratch stacks/textures.
Chores
- Removed several public per-instance helper setters from an example app, reducing public surface.
Tests
- Added unit tests for buffer reuse, readback padding, traversal scratch behavior, and cache/shared ownership.

coderabbitai · 2026-02-22T18:21:13Z

📝 Walkthrough

Adds per-frame scratch containers (RendererScratch, TraversalScratch) and in-place traversal; wraps tessellated vertex buffers in Arc and adds a TessellatedGeometry enum; reworks effect/texture recycling and effect output handling; introduces a reusable readback buffer and return/reuse APIs for the lyon buffer pool; updates renderer call sites to use and trim scratch state.
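
To make the scratch-reuse idea concrete, here is a minimal sketch of a per-frame scratch container driving an in-place depth-first traversal. `FrameScratch`, `plan_in_place`, and `MAX_STACK_CAPACITY` are hypothetical names chosen for illustration; they are not the PR's actual `RendererScratch`, `TraversalScratch`, or `plan_traversal_in_place` definitions, whose fields and signatures are not shown on this page.

```rust
/// Hypothetical stand-in for a per-frame scratch container.
#[derive(Default)]
struct FrameScratch {
    traversal_stack: Vec<usize>,
    readback_bytes: Vec<u8>,
}

/// Assumed capacity ceiling; the real constants are not shown in this thread.
const MAX_STACK_CAPACITY: usize = 1024;

impl FrameScratch {
    /// Clear contents but keep the allocations for the next frame.
    fn begin_frame(&mut self) {
        self.traversal_stack.clear();
        self.readback_bytes.clear();
    }

    /// Give memory back if a buffer grew past the policy limit.
    fn trim_to_policy(&mut self) {
        if self.traversal_stack.capacity() > MAX_STACK_CAPACITY {
            self.traversal_stack.shrink_to(MAX_STACK_CAPACITY);
        }
    }
}

/// Depth-first walk over a tree stored as child-index lists (node 0 is the root).
/// The visit order is written into `out`; the stack comes from `scratch`,
/// so no new Vec is created per call.
fn plan_in_place(children: &[Vec<usize>], scratch: &mut FrameScratch, out: &mut Vec<usize>) {
    out.clear();
    if children.is_empty() {
        return;
    }
    scratch.traversal_stack.push(0);
    while let Some(node) = scratch.traversal_stack.pop() {
        out.push(node);
        // Push children in reverse so they are popped in declaration order.
        for &child in children[node].iter().rev() {
            scratch.traversal_stack.push(child);
        }
    }
}

fn main() {
    let children = vec![vec![1, 2], vec![3], vec![], vec![]];
    let mut scratch = FrameScratch::default();
    let mut order = Vec::new();
    for _frame in 0..3 {
        scratch.begin_frame();
        plan_in_place(&children, &mut scratch, &mut order);
    }
    scratch.trim_to_policy();
    println!("{:?}", order); // prints [0, 1, 3, 2]
}
```

After the first frame the stack and output vectors have reached their working capacity, so later frames with a tree of the same size reuse that memory instead of allocating.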

Changes

| Cohort / File(s) | Summary |
|---|---|
| Vertex buffer ownership & caching `src/cache.rs`, `src/shape.rs`, `src/util.rs` | Cache and shape tessellation switch to `Arc<VertexBuffers<...>>`; add `TessellatedGeometry` enum and helpers; lyon vertex buffer pool gains return/reuse API, pool limit, and test-only stats. |
| Per-frame scratch containers & policies `src/renderer/types.rs`, `src/renderer/traversal.rs` | Add `RendererScratch` and `TraversalScratch` with lifecycle methods, capacity constants, and trimming helpers; replace plan_traversal return with in-place `plan_traversal_in_place`. |
| Renderer wiring & lifecycle `src/renderer.rs`, `src/renderer/construction.rs` | Add `scratch: RendererScratch` field; initialize it in `Renderer::new`; add `begin_frame_scratch` and `trim_scratch_on_resize_or_policy`. |
| Traversal & rendering flow `src/renderer/rendering.rs`, `src/renderer/traversal.rs` | Move traversal/event/stencil data into scratch, call `plan_traversal_in_place`, reuse and restore scratch between frames, and propagate traversal scratch into render paths. |
| Render passes & effect outputs `src/renderer/passes.rs`, `src/renderer/surface.rs`, `src/renderer/draw_queue.rs` | Change `AppliedEffectOutput` to `primary_work_texture` + optional `secondary_work_texture` and add `push_work_textures_into`; make pass/render functions accept external scratch vectors; call scratch trimming on resize/queue clears. |
| Effect pipeline & texture recycling `src/effect.rs`, `src/renderer/effects.rs` | `OffscreenTexturePool::recycle` now takes `&mut Vec<PooledTexture>`; add cached regex helpers; unify pipeline layout creation; add `overwrite_effect_params` helper for in-place param updates. |
| Readback & tessellation cleanup `src/renderer/readback.rs`, `src/renderer/preparation.rs` | Add `copy_padded_readback_rows` and `map_readback_buffer_into` to reuse `scratch.readback_bytes`; update render readback flows; ensure tessellated buffers can be returned to pool via ownership helpers. |
| Examples / API surface removal `examples/transforms.rs` | Removed a public `impl<'a> App<'a>` block containing multiple setter/mutator helper methods, shrinking the example's public surface. |
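
The first row combines shared ownership with a bounded return-to-pool path. A rough sketch of how those two pieces can fit together follows. `Geometry`, `GeometryPool`, `give_back`, and `POOL_LIMIT` are hypothetical stand-ins; the PR's real types (lyon's `VertexBuffers`, the `TessellatedGeometry` enum, and the pool in `src/util.rs`) are not reproduced here.

```rust
use std::sync::Arc;

/// Hypothetical stand-in for tessellated vertex/index buffers.
#[derive(Default, Debug)]
struct Geometry {
    vertices: Vec<[f32; 2]>,
    indices: Vec<u16>,
}

/// Assumed pool limit; the real value is not shown in this thread.
const POOL_LIMIT: usize = 8;

#[derive(Default)]
struct GeometryPool {
    free: Vec<Geometry>,
}

impl GeometryPool {
    /// Hand out a recycled buffer if one is available, else a fresh empty one.
    fn take(&mut self) -> Geometry {
        self.free.pop().unwrap_or_default()
    }

    /// Recycle the buffers only if this Arc is the last owner and the pool has room.
    /// Returns true when the allocation was kept for reuse.
    fn give_back(&mut self, shared: Arc<Geometry>) -> bool {
        match Arc::try_unwrap(shared) {
            Ok(mut geometry) if self.free.len() < POOL_LIMIT => {
                // Keep the capacity, drop the contents.
                geometry.vertices.clear();
                geometry.indices.clear();
                self.free.push(geometry);
                true
            }
            // Still shared elsewhere, or the pool is full: just drop our handle.
            _ => false,
        }
    }
}

fn main() {
    let mut pool = GeometryPool::default();
    let mut geometry = pool.take();
    geometry.vertices.extend([[0.0, 0.0], [1.0, 0.0], [0.0, 1.0]]);
    geometry.indices.extend([0, 1, 2]);

    // The cache and the draw path share one allocation instead of copying it.
    let cached = Arc::new(geometry);
    let for_draw = Arc::clone(&cached);

    assert!(!pool.give_back(for_draw)); // the cache still owns it
    assert!(pool.give_back(cached)); // last owner: buffers return to the pool

    let reused = pool.take();
    assert!(reused.vertices.is_empty());
    assert!(reused.vertices.capacity() >= 3);
}
```

`Arc::try_unwrap` succeeds only when the strong count is exactly one, which is what makes it safe to clear and reuse the buffers without affecting another holder.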

Sequence Diagram(s)

sequenceDiagram
    rect rgba(120,180,120,0.5)
    participant Client
    end
    rect rgba(80,160,240,0.5)
    participant Renderer
    participant TraversalScratch
    participant EffectPipeline
    participant OffscreenPool
    participant GPU
    end

    Client->>Renderer: render_to_texture_view()
    Renderer->>TraversalScratch: begin() / provide traversal_scratch
    Renderer->>TraversalScratch: plan_traversal_in_place(draw_tree, traversal_scratch)
    TraversalScratch-->>Renderer: traversal data (in-place)
    Renderer->>EffectPipeline: compile/apply effects
    EffectPipeline->>OffscreenPool: acquire/recycle PooledTexture(s)
    OffscreenPool->>GPU: create/reuse textures
    EffectPipeline->>GPU: dispatch pipelines / composite
    GPU-->>Renderer: output textures / readback bytes
    Renderer->>TraversalScratch: store scratch back (end frame)

Estimated code review effort

🎯 4 (Complex) | ⏱️ ~45 minutes

Possibly related PRs

chore: split renderer into separate files #37 — Overlaps renderer subsystem changes including TraversalScratch and render_segments modifications; directly related.
fix: nested effects rendering #36 — Modifies render_segments, render_scene_behind_group, and scratch buffer usage; strongly connected.
feat: custom shaders #35 — Changes effect pipeline and offscreen texture pool behavior that intersect with this PR's effect/texture recycling updates.

Poem

🐰 I nibble bytes and stack them tight,
scratch boxes ready for every frame night.
Arcs hug vertices, shared and bright,
traversal in-place, hopping light.
Reuse, trim, render — a rabbit's delight. 🥕

🚥 Pre-merge checks | ✅ 2 | ❌ 1

❌ Failed checks (1 warning)

| Check name | Status | Explanation | Resolution |
|---|---|---|---|
| Docstring Coverage | ⚠️ Warning | Docstring coverage is 66.20% which is insufficient. The required threshold is 80.00%. | Write docstrings for the functions missing them to satisfy the coverage threshold. |

✅ Passed checks (2 passed)

| Check name | Status | Explanation |
|---|---|---|
| Description Check | ✅ Passed | Check skipped - CodeRabbit’s high-level summary is enabled. |
| Title check | ✅ Passed | The title accurately describes the primary objective of the PR: eliminating heap allocations in the hot renderer loop through scratch buffer reuse and Arc-wrapped shared ownership. |

coderabbitai

Actionable comments posted: 2

🤖 Prompt for all review comments with AI agents

Verify each finding against the current code and only fix it if needed.

Inline comments:
In `@src/renderer/readback.rs`:
- Around line 129-139: The code should guard against empty or insufficient
readback bytes before calling copy_padded_readback_rows: after calling
Self::map_readback_buffer_into(&self.device, output_buffer, &mut readback_bytes)
check whether readback_bytes is empty (or its length is less than height *
padded_bytes_per_row) and if so restore self.scratch.readback_bytes =
readback_bytes and return early (same early‑return pattern used by
render_to_argb32) to avoid passing a too‑small buffer into
copy_padded_readback_rows; reference map_readback_buffer_into,
copy_padded_readback_rows, render_to_argb32 and self.scratch.readback_bytes when
making the change.

In `@src/renderer/types.rs`:
- Around line 165-180: trim_to_policy currently calls trim_vector_if_needed(&mut
self.readback_bytes, MAX_READBACK_BYTES_CAPACITY) but shrink_to cannot reduce
capacity below the current len, so large readbacks won't free memory; update
trim_to_policy to first reduce the length of self.readback_bytes when it exceeds
MAX_READBACK_BYTES_CAPACITY (e.g.,
self.readback_bytes.truncate(MAX_READBACK_BYTES_CAPACITY) or
self.readback_bytes.clear() depending on semantics) and then call
trim_vector_if_needed on readback_bytes so shrink_to can actually reclaim
capacity.
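
The `shrink_to` behaviour behind the second finding can be checked in isolation. The sketch below uses only `std`; `MAX_CAPACITY` and the two helper functions are hypothetical and stand in for `MAX_READBACK_BYTES_CAPACITY` and `trim_vector_if_needed`, whose real definitions are not shown here. It takes the `clear()` variant of the suggestion, which assumes the stale readback contents are not needed after trimming.

```rust
/// Hypothetical cap, standing in for MAX_READBACK_BYTES_CAPACITY.
const MAX_CAPACITY: usize = 1024;

/// Shrink alone: `Vec::shrink_to` never reduces capacity below the current length.
fn trim_shrink_only(buf: &mut Vec<u8>) {
    if buf.capacity() > MAX_CAPACITY {
        buf.shrink_to(MAX_CAPACITY);
    }
}

/// Drop the stale contents first so the shrink can actually reclaim memory.
fn trim_clear_then_shrink(buf: &mut Vec<u8>) {
    if buf.capacity() > MAX_CAPACITY {
        buf.clear();
        buf.shrink_to(MAX_CAPACITY);
    }
}

fn main() {
    const LARGE: usize = 1 << 20;

    // A large readback left its bytes in the buffer: shrinking alone frees nothing.
    let mut kept = vec![0u8; LARGE];
    trim_shrink_only(&mut kept);
    assert!(kept.capacity() >= LARGE);

    // Clearing first lets the capacity drop back toward the policy limit.
    let mut trimmed = vec![0u8; LARGE];
    trim_clear_then_shrink(&mut trimmed);
    assert!(trimmed.capacity() >= MAX_CAPACITY);
    assert!(trimmed.capacity() < LARGE);

    println!("kept: {}, trimmed: {}", kept.capacity(), trimmed.capacity());
}
```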

coderabbitai

🧹 Nitpick comments (1)

src/renderer/readback.rs (1)

129-135: Replace checked_mul().unwrap_or(usize::MAX) with saturating_mul().
The pattern triggers Clippy's manual_saturating_arithmetic lint. Using saturating_mul() is more idiomatic and directly expresses the intent.

🔧 Suggested change

-        let required_readback_len = (height as usize)
-            .checked_mul(padded_bytes_per_row as usize)
-            .unwrap_or(usize::MAX);
+        let required_readback_len =
+            (height as usize).saturating_mul(padded_bytes_per_row as usize);

🤖 Prompt for AI Agents

Verify each finding against the current code and only fix it if needed.

In `@src/renderer/readback.rs` around lines 129 - 135, The multiplication that
computes required_readback_len uses checked_mul().unwrap_or(usize::MAX), which
triggers Clippy's manual_saturating_arithmetic lint; replace that expression
with a direct saturating multiplication on usize (e.g. use (height as
usize).saturating_mul(padded_bytes_per_row as usize)) in the readback length
calculation so required_readback_len is computed without unwraps and the intent
is explicit; update the calculation near where required_readback_len is defined
(alongside readback_bytes and Self::map_readback_buffer_into calls) and remove
the checked_mul()/unwrap_or() pattern.
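
For reference, the two spellings compute the same value for every input, so the change is purely stylistic. A small standalone check, with hypothetical helper names and `usize` parameters assumed for brevity (the original casts from narrower integer types):

```rust
/// The pattern currently in the code: checked multiply with a MAX fallback.
fn required_len_checked(height: usize, padded_bytes_per_row: usize) -> usize {
    height
        .checked_mul(padded_bytes_per_row)
        .unwrap_or(usize::MAX)
}

/// The suggested spelling: clamp at usize::MAX on overflow.
fn required_len_saturating(height: usize, padded_bytes_per_row: usize) -> usize {
    height.saturating_mul(padded_bytes_per_row)
}

fn main() {
    // Ordinary sizes: both give the exact product.
    assert_eq!(required_len_checked(1080, 7680), 1080 * 7680);
    assert_eq!(required_len_saturating(1080, 7680), 1080 * 7680);

    // Overflowing sizes: both clamp to usize::MAX, so a later
    // `readback_bytes.len() >= required_len` check fails safely.
    assert_eq!(required_len_checked(usize::MAX, 2), usize::MAX);
    assert_eq!(required_len_saturating(usize::MAX, 2), usize::MAX);

    println!("both forms agree");
}
```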

coderabbitai

Actionable comments posted: 1

🧹 Nitpick comments (1)

src/renderer/readback.rs (1)
13-15: Fast-path copy_from_slice may panic if data.len() != output_size.

When padded_bytes_per_row == unpadded_bytes_per_row, the function assumes data.len() exactly equals output_size. If the caller passes a buffer with extra trailing bytes, copy_from_slice will panic due to length mismatch.

Current callers validate readback_bytes.len() >= required_readback_len, so the buffer could be larger than needed. Consider using a range slice to be defensive:
🛡️ Defensive fix
     if padded_bytes_per_row == unpadded_bytes_per_row {
-        output.copy_from_slice(data);
+        output.copy_from_slice(&data[..output_size]);
         return;
     }
🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed.

In `@src/renderer/readback.rs` around lines 13 - 15, The fast-path assumes
data.len() == output.len() and calls output.copy_from_slice(data), which can
panic if callers pass a larger readback_bytes buffer (they only guarantee >=
required_readback_len); change the fast-path in the branch where
padded_bytes_per_row == unpadded_bytes_per_row to copy only the needed slice
(use data[..output.len()] or equivalent) into output instead of passing the
whole data, referencing the variables padded_bytes_per_row,
unpadded_bytes_per_row, output.copy_from_slice(data), readback_bytes and
required_readback_len to locate the code to modify.
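
A self-contained sketch of a row-unpadding copy with the defensive fast path is given below. The function name echoes `copy_padded_readback_rows`, but the signature, the `bool` return, and the up-front validation are assumptions made for illustration; the real function's shape is not visible in this thread.

```rust
/// Copy `height` rows of `unpadded` bytes each out of a source whose rows are
/// `padded` bytes apart. Returns false without writing if the sizes are inconsistent.
/// (Hypothetical signature; not the PR's actual helper.)
fn copy_padded_rows(
    data: &[u8],
    output: &mut [u8],
    height: usize,
    unpadded: usize,
    padded: usize,
) -> bool {
    let output_size = height.saturating_mul(unpadded);
    let required = height.saturating_mul(padded);
    if padded < unpadded || output.len() != output_size || data.len() < required {
        return false;
    }
    if output_size == 0 {
        return true; // nothing to copy; also avoids zero-sized chunks below
    }
    if padded == unpadded {
        // Fast path: `data` may carry trailing bytes, so slice to the exact size.
        output.copy_from_slice(&data[..output_size]);
        return true;
    }
    // Slow path: take the leading `unpadded` bytes of every padded source row.
    for (src_row, dst_row) in data
        .chunks_exact(padded)
        .zip(output.chunks_exact_mut(unpadded))
    {
        dst_row.copy_from_slice(&src_row[..unpadded]);
    }
    true
}

fn main() {
    // Padded rows (stride 4, payload 3) with two trailing bytes in the source.
    let padded_src = [1, 2, 3, 0, 4, 5, 6, 0, 9, 9];
    let mut out = [0u8; 6];
    assert!(copy_padded_rows(&padded_src, &mut out, 2, 3, 4));
    assert_eq!(out, [1, 2, 3, 4, 5, 6]);

    // Unpadded rows with extra trailing bytes: the sliced fast path does not panic.
    let tight_src = [1, 2, 3, 4, 5, 6, 7, 8];
    let mut out2 = [0u8; 6];
    assert!(copy_padded_rows(&tight_src, &mut out2, 2, 3, 3));
    assert_eq!(out2, [1, 2, 3, 4, 5, 6]);
}
```

`copy_from_slice` panics whenever the two slices differ in length, which is why the fast path slices `data` to `output_size` rather than passing it whole.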

🤖 Prompt for all review comments with AI agents

Verify each finding against the current code and only fix it if needed.

Inline comments:
In `@src/renderer/readback.rs`:
- Around line 310-316: The current code only guards against empty readback_bytes
and then blindly casts and slices into src_words, which can panic if
map_readback_buffer_into returned partial data; before calling
bytemuck::cast_slice and slicing src_words[..needed_len], validate that
readback_bytes.len() is at least needed_len * 4 (or that
bytemuck::cast_slice(&readback_bytes).len() >= needed_len) and handle the
short-read case (e.g., store the partial data in self.scratch.readback_bytes and
return or surface an error) so
out_pixels.copy_from_slice(&src_words[..needed_len]) is only executed when there
are enough u32 words; reference symbols: readback_bytes,
self.scratch.readback_bytes, src_words, out_pixels, needed_len, and
render_to_buffer for consistency with other checks.

coderabbitai · 2026-02-22T18:55:16Z

+        if readback_bytes.is_empty() {
+            self.scratch.readback_bytes = readback_bytes;
            return;
        }

-        let src_words: &[u32] = bytemuck::cast_slice(&data);
+        let src_words: &[u32] = bytemuck::cast_slice(&readback_bytes);
        out_pixels[..needed_len].copy_from_slice(&src_words[..needed_len]);


⚠️ Potential issue | 🟡 Minor

Missing length validation before casting and slicing.

The check on line 310 only guards against empty data, but render_to_buffer (lines 131-134) additionally validates that the buffer has sufficient length. If map_readback_buffer_into returns partial data, line 316 will panic when slicing src_words[..needed_len].

🛡️ Proposed fix to add length validation

-        if readback_bytes.is_empty() {
+        let required_len = needed_len * std::mem::size_of::<u32>();
+        if readback_bytes.is_empty() || readback_bytes.len() < required_len {
             self.scratch.readback_bytes = readback_bytes;
             return;
         }

📝 Committable suggestion

‼️ IMPORTANT
Carefully review the code before committing. Ensure that it accurately replaces the highlighted code, contains no missing lines, and has no issues with indentation. Thoroughly test & benchmark the code to ensure it meets the requirements.

Suggested change

-        if readback_bytes.is_empty() {
-            self.scratch.readback_bytes = readback_bytes;
-            return;
-        }
-
-        let src_words: &[u32] = bytemuck::cast_slice(&readback_bytes);
-        out_pixels[..needed_len].copy_from_slice(&src_words[..needed_len]);
+        let required_len = needed_len * std::mem::size_of::<u32>();
+        if readback_bytes.is_empty() || readback_bytes.len() < required_len {
+            self.scratch.readback_bytes = readback_bytes;
+            return;
+        }
+
+        let src_words: &[u32] = bytemuck::cast_slice(&readback_bytes);
+        out_pixels[..needed_len].copy_from_slice(&src_words[..needed_len]);
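
The length check can be exercised without a GPU. The sketch below is std-only and deliberately swaps `bytemuck::cast_slice` for per-word `u32::from_ne_bytes` decoding so that it compiles without external crates; `copy_words_checked` is a hypothetical helper, not a function from the PR. It also uses `checked_mul` for the byte count, an extra precaution that the suggestion above does not include.

```rust
/// Copy `needed_len` native-endian u32 words from a byte buffer into `out_pixels`.
/// Returns false and leaves `out_pixels` untouched on a short read.
/// (Hypothetical helper; the PR itself casts with bytemuck.)
fn copy_words_checked(readback_bytes: &[u8], out_pixels: &mut [u32], needed_len: usize) -> bool {
    let Some(required_len) = needed_len.checked_mul(std::mem::size_of::<u32>()) else {
        return false;
    };
    if out_pixels.len() < needed_len || readback_bytes.len() < required_len {
        return false;
    }
    let words = readback_bytes[..required_len].chunks_exact(4);
    for (dst, src) in out_pixels[..needed_len].iter_mut().zip(words) {
        *dst = u32::from_ne_bytes([src[0], src[1], src[2], src[3]]);
    }
    true
}

fn main() {
    let mut bytes = Vec::new();
    bytes.extend_from_slice(&0xFF00_00FFu32.to_ne_bytes());
    bytes.extend_from_slice(&0x0000_FF00u32.to_ne_bytes());

    let mut pixels = [0u32; 2];
    assert!(copy_words_checked(&bytes, &mut pixels, 2));
    assert_eq!(pixels, [0xFF00_00FF, 0x0000_FF00]);

    // A partial readback (7 of 8 bytes) is rejected instead of panicking on the slice.
    let mut untouched = [0u32; 2];
    assert!(!copy_words_checked(&bytes[..7], &mut untouched, 2));
    assert_eq!(untouched, [0, 0]);
}
```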

antouhou added 2 commits February 20, 2026 18:44

fix: remove heap allocations in the render loop

a7dc890

cleanup

1f3aa6e

remove unused functions in the example

5e56d8e

coderabbitai Bot reviewed Feb 22, 2026

Address coderabbit's comments

5e4d22f

coderabbitai Bot reviewed Feb 22, 2026

Update readback.rs

9266ebe

coderabbitai Bot reviewed Feb 22, 2026

antouhou merged commit 671703a into main Feb 22, 2026
5 checks passed

antouhou deleted the feat-optimization branch February 22, 2026 21:52

This was referenced Feb 28, 2026

refactor: optimization: simplified pipeline for rectangles #39

Merged

feat: front-to-back rendering #40

Closed

coderabbitai Bot mentioned this pull request Apr 18, 2026

feat: add an option to disable caching #49

Open

coderabbitai Bot mentioned this pull request May 1, 2026

feat: add config for backdrop capture #58

Merged

antouhou merged 5 commits into main from feat-optimization