fix(webgpu): fixes per-frame memory growth #140
Open
thejustinwalsh wants to merge 6 commits into
Continuous WebGPU rendering leaked native heap in proportion to the number of frames rendered: each frame mints single-use wrappers (swapchain texture view, command encoder, render/compute pass, command buffer) whose native Arc handle is dropped only by the V8 GC finalizer, which a tight requestAnimationFrame loop starves. global.gc() does not flush the finalizer tasks; backgrounding the loop drains them and the heap collapses.

Release each transient deterministically at the WebGPU-spec operation that consumes it, with the GC finalizer kept as a backstop:

  render/compute pass -> pass.end()
  command encoder     -> encoder.finish()
  command buffer      -> queue.submit()
  swapchain view      -> the owning GPUCanvasContext.presentSurface()

The native handle lives in ArcHandle (unique_ptr + stateless custom deleter, new ArcHandle.h, with a MutArcHandle variant for non-const C ABI releases). reset() releases the Arc once and nulls the pointer; ~unique_ptr (run by the GC finalizer via ObjectWrapperImpl's virtual destructor) is a no-op when already null, so exactly-once release is a type invariant. Hand-written destructors and manual null-guards are gone; C ABI call sites pass the raw pointer via .get(). The Rust crate is unchanged.

The swapchain view is the only transient tied to a context, so each GPUCanvasContext tracks its own views (swapchainContext_ stamped in getCurrentTexture, registered in GPUTexture.createView) and releases them in presentSurface(). The tracking is per-context, so multiple canvases in one isolate never drain each other's in-flight views.

JS destroy() calls are optional-chained, so they are safe ahead of the native rebuild.

Verified: the C++ compiles and links via the Android NDK toolchain (Gradle assembleRelease).
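The exactly-once release described above can be illustrated with a minimal sketch. This is not the contents of ArcHandle.h: `FakeNativeEncoder`, `fake_encoder_release`, `ArcReleaser`, `EncoderWrapper` and `release_count` are hypothetical names invented for the example, and the alias below only assumes the "unique_ptr + stateless custom deleter" shape the commit message names. The MutArcHandle variant is not modeled.

```cpp
#include <cassert>
#include <memory>

// Hypothetical stand-in for a Rust Arc handed across the C ABI.
struct FakeNativeEncoder {
    int* release_count;  // bumped every time the release function runs
};

// Hypothetical stand-in for a canvas_native_webgpu_*_release function.
inline void fake_encoder_release(const FakeNativeEncoder* ptr) {
    ++*ptr->release_count;
}

// Stateless deleter: the release function is a template argument, so the
// deleter carries no data members of its own.
template <typename T, void (*Release)(const T*)>
struct ArcReleaser {
    void operator()(const T* ptr) const noexcept { Release(ptr); }
};

// Assumed shape of the handle: a unique_ptr over a const native pointer.
template <typename T, void (*Release)(const T*)>
using ArcHandle = std::unique_ptr<const T, ArcReleaser<T, Release>>;

// Hypothetical wrapper for a single-use command encoder.
class EncoderWrapper {
public:
    explicit EncoderWrapper(const FakeNativeEncoder* native) : handle_(native) {}

    // Deterministic release at the operation that consumes the encoder.
    void finish() { handle_.reset(); }

    // What a C ABI call site would pass along.
    const FakeNativeEncoder* raw() const { return handle_.get(); }

private:
    ArcHandle<FakeNativeEncoder, fake_encoder_release> handle_;
};

// Counts native releases over one wrapper lifetime, with or without the
// deterministic finish() call before destruction.
inline int release_count(bool call_finish) {
    int count = 0;
    FakeNativeEncoder native{&count};
    {
        EncoderWrapper wrapper(&native);
        if (call_finish) {
            wrapper.finish();  // releases once and nulls the pointer
            assert(wrapper.raw() == nullptr);
        }
    }  // destructor: releases only if the pointer is still non-null
    return count;
}
```

Either path releases the handle exactly once: `release_count(true)` models the deterministic release followed by a no-op destructor, and `release_count(false)` models the GC-finalizer backstop.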
…ten dtors)

Convert the remaining 21 persistent WebGPU wrappers to hold their Rust Arc handle in ArcHandle/MutArcHandle, the same primitive as the five transients. Each hand-written destructor that called a raw canvas_native_webgpu_*_release is gone; the unique_ptr member releases the handle once on destruction, so the GC path is unchanged in behavior but the lifecycle now lives in the type. Accessors and C ABI call sites use handle.get(); GPUImpl's null-guarded release and GPUCanvasContextImpl's release (which keeps its raf_.reset() ordering) fold into the same model.

Behavior-preserving cleanup: these objects are not minted per frame and do not grow during a soak; this removes duplicated release bookkeeping and makes all 26 WebGPU wrappers consistent.

Verified compiling via the Android NDK toolchain.
presentSurface copies the swapchain texture into the read-back texture every frame (toDataURL support) using a command encoder + command buffer, but never dropped them, so they accumulated in wgpu-core's registry in proportion to the number of frames rendered: a native-heap leak. Drop the command buffer after submit and the encoder after the copy, mirroring gpu_queue.rs::queue_submit.

Confirmed on a Moto G 2025: symbolized heapprofd named present_surface as the top net-retained allocator; the drops remove it (~0.31 -> ~0.18 MB/s under continuous render).
getCurrentTexture() returns a fresh per-frame GPUTexture wrapper holding an Arc clone of the surface texture; its only free path was the non-deterministic GC finalizer, which a tight render loop starves. Track the per-context swapchain textures and drop their native handle at presentSurface() (the texture's point of death) via a JS-exposed __releaseHandle on GPUTextureImpl that decrements the Arc.

__releaseHandle is distinct from destroy(): destroy() frees the GPU texture and must never run on the swapchain texture; __releaseHandle only drops our wrapper's handle. This completes deterministic per-frame release for the swapchain path.
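The per-context tracking can be modeled in a few lines. The PR does this bookkeeping in the JS bindings; the sketch below is a C++ model of the idea only, and `FrameTexture`, `CanvasContext`, `inFlightCount` and `present_only_releases_own_textures` are hypothetical names, not identifiers from the change.

```cpp
#include <cstddef>
#include <memory>
#include <vector>

// Hypothetical per-frame wrapper; `released` models "native handle dropped".
struct FrameTexture {
    bool released = false;

    // Models __releaseHandle: drops the wrapper's handle only, and is safe
    // to call more than once.
    void releaseHandle() { released = true; }
};

// Hypothetical model of one canvas context that tracks only its own
// in-flight swapchain textures.
class CanvasContext {
public:
    std::shared_ptr<FrameTexture> getCurrentTexture() {
        auto texture = std::make_shared<FrameTexture>();
        inFlight_.push_back(texture);  // registered with the owning context
        return texture;
    }

    void presentSurface() {
        // Release every texture this context handed out, then forget them.
        for (auto& texture : inFlight_) {
            texture->releaseHandle();
        }
        inFlight_.clear();
    }

    std::size_t inFlightCount() const { return inFlight_.size(); }

private:
    std::vector<std::shared_ptr<FrameTexture>> inFlight_;
};

// Presenting context `a` must not touch textures owned by context `b`.
inline bool present_only_releases_own_textures() {
    CanvasContext a;
    CanvasContext b;
    auto textureA = a.getCurrentTexture();
    auto textureB = b.getCurrentTexture();

    a.presentSurface();

    return textureA->released && !textureB->released &&
           a.inFlightCount() == 0 && b.inFlightCount() == 1;
}
```

Keeping the list on the context rather than in a shared global is what lets two canvases present at different times without releasing each other's textures.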
getCurrentTexture() registers a wgpu Texture (carrying an auto-created surface clear_view) in wgpu-core's hub each frame. surface present()/discard() only release the acquired-texture ref; the hub registry holds a second ref that must be dropped explicitly. CanvasGPUTexture::drop only did this for the None (app-created) branch. The Some (surface) branch was a no-op with its discard commented out, so the per-frame swapchain texture and its clear image view leaked in the hub every frame. Symbolized heapprofd named surface_get_current_texture (present.rs:220, ash create_image_view) as the top remaining grower.

Drop the surface texture in the Some branch (and discard first if it was acquired but never presented). The discard's error is logged rather than fatally aborted: the original code used handle_error_fatal, which crashes the process from within Drop and is almost certainly why this cleanup was commented out, trading a crash for a leak.

Verified on a Moto G 2025: continuous-render foreground soak goes from ~0.18 MB/s to +0.002 MB/s over 5 minutes (flat; matches the blank-app baseline). This was the last of three native leaks (per-frame wrapper Arcs; read-back encoder/buffer; this).
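The "second ref" problem can be shown with a toy registry. The real fix lives in the Rust crate and in wgpu-core's hub; this C++ sketch models only the bookkeeping, under the assumption that presenting a texture leaves its registry entry in place. `Hub`, `SurfaceTexture`, `dropOnDestroy` and `live_after_frames` are hypothetical names, and the discard-before-drop step is not modeled.

```cpp
#include <cstddef>
#include <cstdint>
#include <unordered_set>

// Hypothetical registry that keeps its own entry for every registered texture.
class Hub {
public:
    std::uint64_t registerTexture() {
        const std::uint64_t id = next_++;
        live_.insert(id);
        return id;
    }

    void dropTexture(std::uint64_t id) { live_.erase(id); }

    std::size_t liveCount() const { return live_.size(); }

private:
    std::uint64_t next_ = 1;
    std::unordered_set<std::uint64_t> live_;
};

// Hypothetical per-frame surface texture. `dropOnDestroy` switches between
// the leaking behavior (false) and the fixed behavior (true).
class SurfaceTexture {
public:
    SurfaceTexture(Hub& hub, bool dropOnDestroy)
        : hub_(hub), id_(hub.registerTexture()), dropOnDestroy_(dropOnDestroy) {}

    ~SurfaceTexture() {
        if (dropOnDestroy_) {
            hub_.dropTexture(id_);  // explicitly drop the registry's entry
        }
    }

    // Models present(): it gives back the acquired texture but deliberately
    // leaves the registry entry alone.
    void present() {}

private:
    Hub& hub_;
    std::uint64_t id_;
    bool dropOnDestroy_;
};

// Renders `frames` frames and reports how many registry entries remain.
inline std::size_t live_after_frames(int frames, bool dropOnDestroy) {
    Hub hub;
    for (int i = 0; i < frames; ++i) {
        SurfaceTexture texture(hub, dropOnDestroy);
        texture.present();
    }  // each texture is destroyed at the end of its frame
    return hub.liveCount();
}
```

Without the explicit drop the registry grows by one entry per frame; with it the count returns to zero after every frame, which is the flat profile the soak test describes.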
Trim the explanatory comment blocks added with the leak fixes down to terse one-liners matching the surrounding code. No behavior change.
Fixes a slow, steady memory leak discovered through an accidental soak test. I left the device running for several hours and came back to a low framerate. I took heap snapshots in incremental waves, picking off the offenders one at a time until heap growth was stable while rendering at 60-120 fps.
Fixes #141.
I did not run an exhaustive search across all rendering implementations to verify memory leaks outside of WebGPU. This fix targets iOS and Android via the WebGPU rendering backend.
leak_before_baseline.mp4
leak_after_fixed.mp4
Copilot Summary
This pull request introduces significant improvements to resource management and memory safety in the WebGPU implementation, especially regarding swapchain textures, command buffers, and the C++/Rust interface on iOS. The main themes are: (1) explicit and timely resource cleanup in the JavaScript/TypeScript bindings, (2) improved handling of swapchain textures and views, and (3) a safer, RAII-based approach to managing Rust Arc handles in the iOS C++ layer.
Resource Management Improvements (JS/TS):
Native Resource Cleanup (Rust/C++/iOS):
… ArcHandle RAII wrapper in C++ for Rust Arc handles exposed via the C ABI, replacing manual release calls in destructors and reducing the risk of double-free or leaks. Updated all relevant iOS WebGPU wrapper classes to use this pattern.
Swapchain Texture Lifecycle (Rust):
Summary of Key Changes:
1. Resource Management and Cleanup (JS/TS):
… GPUCanvasContext, and release them at presentSurface() to prevent leaks and ensure timely return to the swapchain.
2. Rust/C++/iOS Interop Safety:
… ArcHandle.h and refactored all iOS C++ WebGPU wrapper classes to use RAII for Rust Arc handles, eliminating manual release calls and improving safety.
3. Swapchain Texture Handling (Rust):
These changes collectively improve memory safety, performance, and reliability of the WebGPU implementation across platforms.