Skip to content

(fix): Scope-aware runtime escape detection#36

Merged
mgyoo86 merged 5 commits intomasterfrom
fix/runtime_check
Mar 25, 2026
Merged

(fix): Scope-aware runtime escape detection#36
mgyoo86 merged 5 commits intomasterfrom
fix/runtime_check

Conversation

@mgyoo86
Copy link
Member

@mgyoo86 mgyoo86 commented Mar 25, 2026

Problem

Runtime escape detection (RUNTIME_CHECK=1) iterated all pool vectors when
checking for escaping arrays. This caused false-positive PoolRuntimeEscapeError
for arrays acquired in an outer @with_pool scope and used inside a nested
inner scope — a valid and common pattern.

@with_pool pool begin
    v = acquire!(pool, Float64, 100)      # outer scope
    @with_pool pool begin
        w = acquire!(pool, Float64, 50)   # inner scope
        result = compute!(v, w)           # v is safe here, but was flagged ✗
    end
end

Additionally, the direct-rewind @with_pool macro validated before cleaning
up leaked inner scopes, causing _scope_boundary to use an incorrect (leaked)
depth when an inner scope threw without rewind.

Solution

1. _scope_boundary — scope-aware overlap check

New @inline function that uses the existing checkpoint stack to compute the
boundary index: vectors at 1:boundary belong to outer scopes and are skipped.

@inline function _scope_boundary(tp::AbstractTypedPool, depth::Int)
    @inbounds if tp._checkpoint_depths[end] == depth
        return tp._checkpoint_n_active[end]
    end
    return tp.n_active   # no checkpoint → nothing acquired here → all safe
end

Applied to all three backends:

  • CPU: _check_pointer_overlap, _check_bitchunks_overlap
  • CUDA: _check_tp_cuda_overlap
  • Metal: _check_tp_metal_overlap

2. Macro reorder — leaked scope cleanup before validation

In the direct-rewind @with_pool expansion (both block-form and function-form),
leaked inner scope cleanup now runs before _validate_pool_return:

Before: execute → validate → cleanup leaked → rewind
After:  execute → cleanup leaked → validate → rewind

This ensures _scope_boundary always sees the correct _current_depth.

3. Robust poison for custom isbits types

_poison_fill! now handles custom isbits structs via duck-type zero
(0 * first(v)) when zero(T) is not defined, and skips poisoning entirely
for non-isbits reference types (where resize!(v, 0) handles invalidation).

mgyoo86 added 4 commits March 24, 2026 21:28
…outer-scope arrays

_check_pointer_overlap and _check_bitchunks_overlap previously iterated ALL
pool vectors, causing false PoolRuntimeEscapeError when an array acquired in
an outer scope was used/returned from an inner scope. Now uses checkpoint
stack boundary (_scope_boundary) to only check vectors acquired in the
current scope depth.
…s in CI

- Reorder direct-rewind @with_pool expansion: leaked inner scope cleanup
  now runs BEFORE _validate_pool_return, ensuring _scope_boundary uses
  the correct (normalized) depth instead of a leaked inner depth.
  Applies to both block-form and function-form code paths.

- Register test_scope_depth_validation.jl in runtests.jl for both
  Julia 1.11+ and legacy branches so CI exercises the scope-depth tests.

- Add edge case tests: ReshapedArray parent-chain propagation (scenarios
  16a-c), sentinel depth-0 (scenario 17), and leaked inner scope
  regression (scenarios 18a-b for Array and BitArray).

- Update test header comment to past tense for the fixed bug description.
Apply _scope_boundary to _check_tp_cuda_overlap and
_check_tp_metal_overlap so GPU backends only check vectors acquired
in the current scope, matching the CPU fix from 761a3c9.

Previously both GPU overlap functions iterated all tp.vectors,
causing false-positive escape errors for outer-scope GPU arrays
used inside nested @with_pool scopes.

Add scope-depth validation tests for both backends (7 scenarios
each: outer/inner, 3-level nesting, mixed types, lazy path,
container wrapping, leaked scope). Metal tests verified locally.
_poison_fill! now handles three cases:
- Known types (Float, Int, Complex): dispatch to _poison_value (NaN, typemax)
- Custom isbits structs without zero(T): duck-type fallback via 0 * first(v)
- Non-isbits reference types: skip poison entirely (resize! handles invalidation)

Previously, _poison_fill! called _poison_value(T) unconditionally, which
fell through to zero(T) for unknown types — causing MethodError during
rewind for custom structs that don't define zero().

Add test scenarios for custom isbits struct (TestPoint) and mutable
non-isbits struct (TestParticle) validating both scope-aware escape
detection and graceful poison/skip behavior.
Copy link

Copilot AI left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Pull request overview

This PR fixes false-positive runtime escape detection in nested @with_pool scopes by making overlap checks scope-aware (only checking vectors acquired in the current scope), reorders direct-rewind macro cleanup to normalize leaked inner scopes before validation, and improves poisoning behavior for custom element types.

Changes:

  • Add _scope_boundary and apply it to CPU/CUDA/Metal overlap checks so outer-scope arrays aren’t flagged in inner scopes.
  • Reorder direct-rewind @with_pool expansion to clean leaked inner scopes before calling _validate_pool_return.
  • Extend _poison_fill! to handle some custom isbits structs and skip poisoning for non-isbits element types; add comprehensive CPU/CUDA/Metal tests for scope-depth validation.

Reviewed changes

Copilot reviewed 11 out of 11 changed files in this pull request and generated 2 comments.

Show a summary per file
File Description
src/debug.jl Adds _scope_boundary, updates pointer-overlap to be scope-aware, and changes _poison_fill! fallback behavior.
src/macros.jl Reorders leaked-scope cleanup to occur before runtime validation in direct-rewind macro paths.
src/bitarray.jl Makes BitArray chunk overlap checks scope-aware (Julia ≥ 1.11 path).
ext/AdaptiveArrayPoolsCUDAExt/debug.jl Adds scope-aware boundary to CUDA overlap checks by threading current_depth.
ext/AdaptiveArrayPoolsMetalExt/debug.jl Adds scope-aware boundary to Metal overlap checks by threading current_depth.
test/test_scope_depth_validation.jl New CPU test suite covering nested scopes, containers, views/reshape parent chains, leaked scopes, and poison behavior.
test/runtests.jl Includes the new CPU scope-depth validation test file.
test/cuda/test_scope_depth_validation.jl New CUDA test suite mirroring key CPU scope-depth validation scenarios.
test/cuda/runtests.jl Includes the new CUDA scope-depth validation test file.
test/metal/test_scope_depth_validation.jl New Metal test suite mirroring key CPU scope-depth validation scenarios.
test/metal/runtests.jl Includes the new Metal scope-depth validation test file.

💡 Add Copilot custom instructions for smarter, more guided reviews. Learn how to get started.

@codecov
Copy link

codecov bot commented Mar 25, 2026

Codecov Report

❌ Patch coverage is 96.55172% with 1 line in your changes missing coverage. Please review.
✅ Project coverage is 96.61%. Comparing base (d73eac7) to head (3e95f31).
⚠️ Report is 1 commits behind head on master.

Files with missing lines Patch % Lines
src/macros.jl 50.00% 1 Missing ⚠️
Additional details and impacted files

Impacted file tree graph

@@            Coverage Diff             @@
##           master      #36      +/-   ##
==========================================
+ Coverage   96.55%   96.61%   +0.06%     
==========================================
  Files          14       14              
  Lines        2726     2748      +22     
==========================================
+ Hits         2632     2655      +23     
+ Misses         94       93       -1     
Files with missing lines Coverage Δ
src/bitarray.jl 91.11% <100.00%> (+0.30%) ⬆️
src/debug.jl 95.70% <100.00%> (+0.77%) ⬆️
src/legacy/bitarray.jl 94.59% <100.00%> (+0.15%) ⬆️
src/macros.jl 97.98% <50.00%> (ø)
🚀 New features to boost your workflow:
  • ❄️ Test Analytics: Detect flaky tests, report on failures, and find test suite problems.

Port _scope_boundary to src/legacy/bitarray.jl — the Julia <1.11 code
path still iterated all pool.bits.vectors, causing false-positive
PoolRuntimeEscapeError on LTS. Fixes CI failure on julia-lts.

Harden _poison_fill! so neither zero(T) nor 0*first(v) failure can
throw during rewind: both attempts are individually try/catch guarded.
If neither produces a value, poisoning is skipped gracefully.
@mgyoo86 mgyoo86 merged commit 9879ca5 into master Mar 25, 2026
14 checks passed
@mgyoo86 mgyoo86 deleted the fix/runtime_check branch March 25, 2026 06:21
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Labels

None yet

Projects

None yet

Development

Successfully merging this pull request may close these issues.

2 participants