Skip to content

chore: remove all heap allocation from the hot renderer loop#38

Merged
antouhou merged 5 commits into
mainfrom
feat-optimization
Feb 22, 2026
Merged

chore: remove all heap allocation from the hot renderer loop#38
antouhou merged 5 commits into
mainfrom
feat-optimization

Conversation

@antouhou
Copy link
Copy Markdown
Owner

@antouhou antouhou commented Feb 22, 2026

Summary by CodeRabbit

  • Refactor
    • Per-frame scratch state and in-place traversal introduced to reuse buffers, trim memory on resize/lifecycle, and reduce allocations; shared tessellated geometry/vertex buffers added to avoid duplicate copies; render & effect flows updated to consume provided scratch stacks/textures.
  • Chores
    • Removed several public per-instance helper setters from an example app, reducing public surface.
  • Tests
    • Added unit tests for buffer reuse, readback padding, traversal scratch behavior, and cache/shared ownership.

@coderabbitai
Copy link
Copy Markdown

coderabbitai Bot commented Feb 22, 2026

📝 Walkthrough

Walkthrough

Adds per-frame scratch containers (RendererScratch, TraversalScratch) and in-place traversal; converts tessellated vertex buffers to Arc with TessellatedGeometry; reworks effect/texture recycling and effect output handling; introduces reusable readback and lyon buffer pool reuse APIs; updates renderer call sites to use and trim scratch state.

Changes

Cohort / File(s) Summary
Vertex buffer ownership & caching
src/cache.rs, src/shape.rs, src/util.rs
Cache and shape tessellation switch to Arc<VertexBuffers<...>>; add TessellatedGeometry enum and helpers; lyon vertex buffer pool gains return/reuse API, pool limit, and test-only stats.
Per-frame scratch containers & policies
src/renderer/types.rs, src/renderer/traversal.rs
Add RendererScratch and TraversalScratch with lifecycle methods, capacity constants, and trimming helpers; replace plan_traversal return with in-place plan_traversal_in_place.
Renderer wiring & lifecycle
src/renderer.rs, src/renderer/construction.rs
Add scratch: RendererScratch field; initialize it in Renderer::new; add begin_frame_scratch and trim_scratch_on_resize_or_policy.
Traversal & rendering flow
src/renderer/rendering.rs, src/renderer/traversal.rs
Move traversal/event/stencil data into scratch, call plan_traversal_in_place, reuse and restore scratch between frames, and propagate traversal scratch into render paths.
Render passes & effect outputs
src/renderer/passes.rs, src/renderer/surface.rs, src/renderer/draw_queue.rs
Change AppliedEffectOutput to primary_work_texture + optional secondary_work_texture and add push_work_textures_into; make pass/render functions accept external scratch vectors; call scratch trimming on resize/queue clears.
Effect pipeline & texture recycling
src/effect.rs, src/renderer/effects.rs
OffscreenTexturePool::recycle now takes &mut Vec<PooledTexture>; add cached regex helpers; unify pipeline layout creation; add overwrite_effect_params helper for in-place param updates.
Readback & tessellation cleanup
src/renderer/readback.rs, src/renderer/preparation.rs
Add copy_padded_readback_rows and map_readback_buffer_into to reuse scratch.readback_bytes; update render readback flows; ensure tessellated buffers can be returned to pool via ownership helpers.
Examples / API surface removal
examples/transforms.rs
Removed a public impl<'a> App<'a> block containing multiple setter/mutator helper methods, shrinking the example's public surface.

Sequence Diagram(s)

sequenceDiagram
    rect rgba(120,180,120,0.5)
    participant Client
    end
    rect rgba(80,160,240,0.5)
    participant Renderer
    participant TraversalScratch
    participant EffectPipeline
    participant OffscreenPool
    participant GPU
    end

    Client->>Renderer: render_to_texture_view()
    Renderer->>TraversalScratch: begin() / provide traversal_scratch
    Renderer->>TraversalScratch: plan_traversal_in_place(draw_tree, traversal_scratch)
    TraversalScratch-->>Renderer: traversal data (in-place)
    Renderer->>EffectPipeline: compile/apply effects
    EffectPipeline->>OffscreenPool: acquire/recycle PooledTexture(s)
    OffscreenPool->>GPU: create/reuse textures
    EffectPipeline->>GPU: dispatch pipelines / composite
    GPU-->>Renderer: output textures / readback bytes
    Renderer->>TraversalScratch: store scratch back (end frame)
Loading

Estimated code review effort

🎯 4 (Complex) | ⏱️ ~45 minutes

Possibly related PRs

Poem

🐰 I nibble bytes and stack them tight,
scratch boxes ready for every frame night.
Arcs hug vertices, shared and bright,
traversal in-place, hopping light.
Reuse, trim, render — a rabbit's delight. 🥕

🚥 Pre-merge checks | ✅ 2 | ❌ 1

❌ Failed checks (1 warning)

Check name Status Explanation Resolution
Docstring Coverage ⚠️ Warning Docstring coverage is 66.20% which is insufficient. The required threshold is 80.00%. Write docstrings for the functions missing them to satisfy the coverage threshold.
✅ Passed checks (2 passed)
Check name Status Explanation
Description Check ✅ Passed Check skipped - CodeRabbit’s high-level summary is enabled.
Title check ✅ Passed The title accurately describes the primary objective of the PR: eliminating heap allocations in the hot renderer loop through scratch buffer reuse and Arc-wrapped shared ownership.

✏️ Tip: You can configure your own custom pre-merge checks in the settings.

✨ Finishing Touches
  • 📝 Generate docstrings (stacked PR)
  • 📝 Generate docstrings (commit on current branch)
🧪 Generate unit tests (beta)
  • Create PR with unit tests
  • Post copyable unit tests in a comment
  • Commit unit tests in branch feat-optimization

Comment @coderabbitai help to get the list of available commands and usage tips.

Copy link
Copy Markdown

@coderabbitai coderabbitai Bot left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Actionable comments posted: 2

🤖 Prompt for all review comments with AI agents
Verify each finding against the current code and only fix it if needed.

Inline comments:
In `@src/renderer/readback.rs`:
- Around line 129-139: The code should guard against empty or insufficient
readback bytes before calling copy_padded_readback_rows: after calling
Self::map_readback_buffer_into(&self.device, output_buffer, &mut readback_bytes)
check whether readback_bytes is empty (or its length is less than height *
padded_bytes_per_row) and if so restore self.scratch.readback_bytes =
readback_bytes and return early (same early‑return pattern used by
render_to_argb32) to avoid passing a too‑small buffer into
copy_padded_readback_rows; reference map_readback_buffer_into,
copy_padded_readback_rows, render_to_argb32 and self.scratch.readback_bytes when
making the change.

In `@src/renderer/types.rs`:
- Around line 165-180: trim_to_policy currently calls trim_vector_if_needed(&mut
self.readback_bytes, MAX_READBACK_BYTES_CAPACITY) but shrink_to cannot reduce
capacity below the current len, so large readbacks won't free memory; update
trim_to_policy to first reduce the length of self.readback_bytes when it exceeds
MAX_READBACK_BYTES_CAPACITY (e.g.,
self.readback_bytes.truncate(MAX_READBACK_BYTES_CAPACITY) or
self.readback_bytes.clear() depending on semantics) and then call
trim_vector_if_needed on readback_bytes so shrink_to can actually reclaim
capacity.

Comment thread src/renderer/readback.rs
Comment thread src/renderer/types.rs
Copy link
Copy Markdown

@coderabbitai coderabbitai Bot left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

🧹 Nitpick comments (1)
src/renderer/readback.rs (1)

129-135: Replace checked_mul().unwrap_or(usize::MAX) with saturating_mul().
The pattern flags the Clippy manual_saturating_arithmetic lint. Using saturating_mul() is more idiomatic and directly expresses the intent.

🔧 Suggested change
-        let required_readback_len = (height as usize)
-            .checked_mul(padded_bytes_per_row as usize)
-            .unwrap_or(usize::MAX);
+        let required_readback_len =
+            (height as usize).saturating_mul(padded_bytes_per_row as usize);
🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed.

In `@src/renderer/readback.rs` around lines 129 - 135, The multiplication that
computes required_readback_len uses checked_mul().unwrap_or(usize::MAX), which
triggers Clippy's manual_saturating_arithmetic lint; replace that expression
with a direct saturating multiplication on usize (e.g. use (height as
usize).saturating_mul(padded_bytes_per_row as usize)) in the readback length
calculation so required_readback_len is computed without unwraps and the intent
is explicit; update the calculation near where required_readback_len is defined
(alongside readback_bytes and Self::map_readback_buffer_into calls) and remove
the checked_mul()/unwrap_or() pattern.
🤖 Prompt for all review comments with AI agents
Verify each finding against the current code and only fix it if needed.

Nitpick comments:
In `@src/renderer/readback.rs`:
- Around line 129-135: The multiplication that computes required_readback_len
uses checked_mul().unwrap_or(usize::MAX), which triggers Clippy's
manual_saturating_arithmetic lint; replace that expression with a direct
saturating multiplication on usize (e.g. use (height as
usize).saturating_mul(padded_bytes_per_row as usize)) in the readback length
calculation so required_readback_len is computed without unwraps and the intent
is explicit; update the calculation near where required_readback_len is defined
(alongside readback_bytes and Self::map_readback_buffer_into calls) and remove
the checked_mul()/unwrap_or() pattern.

Copy link
Copy Markdown

@coderabbitai coderabbitai Bot left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Actionable comments posted: 1

🧹 Nitpick comments (1)
src/renderer/readback.rs (1)

13-15: Fast-path copy_from_slice may panic if data.len() != output_size.

When padded_bytes_per_row == unpadded_bytes_per_row, the function assumes data.len() exactly equals output_size. If the caller passes a buffer with extra trailing bytes, copy_from_slice will panic due to length mismatch.

Current callers validate readback_bytes.len() >= required_readback_len, so the buffer could be larger than needed. Consider using a range slice to be defensive:

🛡️ Defensive fix
     if padded_bytes_per_row == unpadded_bytes_per_row {
-        output.copy_from_slice(data);
+        output.copy_from_slice(&data[..output_size]);
         return;
     }
🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed.

In `@src/renderer/readback.rs` around lines 13 - 15, The fast-path assumes
data.len() == output.len() and calls output.copy_from_slice(data), which can
panic if callers pass a larger readback_bytes buffer (they only guarantee >=
required_readback_len); change the fast-path in the branch where
padded_bytes_per_row == unpadded_bytes_per_row to copy only the needed slice
(use data[..output.len()] or equivalent) into output instead of passing the
whole data, referencing the variables padded_bytes_per_row,
unpadded_bytes_per_row, output.copy_from_slice(data), readback_bytes and
required_readback_len to locate the code to modify.
🤖 Prompt for all review comments with AI agents
Verify each finding against the current code and only fix it if needed.

Inline comments:
In `@src/renderer/readback.rs`:
- Around line 310-316: The current code only guards against empty readback_bytes
and then blindly casts and slices into src_words, which can panic if
map_readback_buffer_into returned partial data; before calling
bytemuck::cast_slice and slicing src_words[..needed_len], validate that
readback_bytes.len() is at least needed_len * 4 (or that
bytemuck::cast_slice(&readback_bytes).len() >= needed_len) and handle the
short-read case (e.g., store the partial data in self.scratch.readback_bytes and
return or surface an error) so
out_pixels.copy_from_slice(&src_words[..needed_len]) is only executed when there
are enough u32 words; reference symbols: readback_bytes,
self.scratch.readback_bytes, src_words, out_pixels, needed_len, and
render_to_buffer for consistency with other checks.

---

Nitpick comments:
In `@src/renderer/readback.rs`:
- Around line 13-15: The fast-path assumes data.len() == output.len() and calls
output.copy_from_slice(data), which can panic if callers pass a larger
readback_bytes buffer (they only guarantee >= required_readback_len); change the
fast-path in the branch where padded_bytes_per_row == unpadded_bytes_per_row to
copy only the needed slice (use data[..output.len()] or equivalent) into output
instead of passing the whole data, referencing the variables
padded_bytes_per_row, unpadded_bytes_per_row, output.copy_from_slice(data),
readback_bytes and required_readback_len to locate the code to modify.

Comment thread src/renderer/readback.rs
Comment on lines +310 to 316
if readback_bytes.is_empty() {
self.scratch.readback_bytes = readback_bytes;
return;
}

let src_words: &[u32] = bytemuck::cast_slice(&data);
let src_words: &[u32] = bytemuck::cast_slice(&readback_bytes);
out_pixels[..needed_len].copy_from_slice(&src_words[..needed_len]);
Copy link
Copy Markdown

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

⚠️ Potential issue | 🟡 Minor

Missing length validation before casting and slicing.

The check on line 310 only guards against empty data, but render_to_buffer (lines 131-134) additionally validates that the buffer has sufficient length. If map_readback_buffer_into returns partial data, line 316 will panic when slicing src_words[..needed_len].

🛡️ Proposed fix to add length validation
-        if readback_bytes.is_empty() {
+        let required_len = needed_len * std::mem::size_of::<u32>();
+        if readback_bytes.is_empty() || readback_bytes.len() < required_len {
             self.scratch.readback_bytes = readback_bytes;
             return;
         }
📝 Committable suggestion

‼️ IMPORTANT
Carefully review the code before committing. Ensure that it accurately replaces the highlighted code, contains no missing lines, and has no issues with indentation. Thoroughly test & benchmark the code to ensure it meets the requirements.

Suggested change
if readback_bytes.is_empty() {
self.scratch.readback_bytes = readback_bytes;
return;
}
let src_words: &[u32] = bytemuck::cast_slice(&data);
let src_words: &[u32] = bytemuck::cast_slice(&readback_bytes);
out_pixels[..needed_len].copy_from_slice(&src_words[..needed_len]);
let required_len = needed_len * std::mem::size_of::<u32>();
if readback_bytes.is_empty() || readback_bytes.len() < required_len {
self.scratch.readback_bytes = readback_bytes;
return;
}
let src_words: &[u32] = bytemuck::cast_slice(&readback_bytes);
out_pixels[..needed_len].copy_from_slice(&src_words[..needed_len]);
🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed.

In `@src/renderer/readback.rs` around lines 310 - 316, The current code only
guards against empty readback_bytes and then blindly casts and slices into
src_words, which can panic if map_readback_buffer_into returned partial data;
before calling bytemuck::cast_slice and slicing src_words[..needed_len],
validate that readback_bytes.len() is at least needed_len * 4 (or that
bytemuck::cast_slice(&readback_bytes).len() >= needed_len) and handle the
short-read case (e.g., store the partial data in self.scratch.readback_bytes and
return or surface an error) so
out_pixels.copy_from_slice(&src_words[..needed_len]) is only executed when there
are enough u32 words; reference symbols: readback_bytes,
self.scratch.readback_bytes, src_words, out_pixels, needed_len, and
render_to_buffer for consistency with other checks.

@antouhou antouhou merged commit 671703a into main Feb 22, 2026
5 checks passed
@antouhou antouhou deleted the feat-optimization branch February 22, 2026 21:52
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Labels

None yet

Projects

None yet

Development

Successfully merging this pull request may close these issues.

1 participant