Serialize build_callable under a global lock to prevent segfaults in threaded usage #227

Open
Copilot wants to merge 4 commits into main from copilot/fix-julia-1-10-segfault

Copilot AI commented Jun 5, 2026

On Julia 1.10, concurrent threads can race through `build_callable` during first-time callable IR generation, causing segfaults in threaded Particle Gibbs sampling. The existing `GlobalMCCache` lock only protects `setindex!`, not the full IR derivation and compilation pipeline.

  • Wrap build_callable body in a ReentrantLock so only one thread performs callable IR generation at a time
  • Cache lookup is rechecked after acquiring the lock to avoid redundant compilation
```julia
const build_callable_lock = ReentrantLock()

function build_callable(sig::Type{<:Tuple})
    # argument validation (no lock needed)
    lock(build_callable_lock)
    try
        # existing body unchanged
    finally
        unlock(build_callable_lock)
    end
end
```
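
The check-the-cache-under-the-lock step described in the bullets can be illustrated with a minimal, self-contained sketch. All names here (`demo_cache`, `demo_lock`, `expensive_build`, `get_or_build`) are hypothetical stand-ins and not Libtask internals; the sketch performs the lookup only while holding the lock and does not attempt an unlocked fast path.

```julia
# Hypothetical stand-ins for illustration only; not Libtask's actual cache or lock.
const demo_cache = Dict{Any,Any}()
const demo_lock = ReentrantLock()
const build_count = Threads.Atomic{Int}(0)

# Placeholder for the expensive, non-thread-safe step
# (IR derivation and compilation in the PR).
function expensive_build(key)
    Threads.atomic_add!(build_count, 1)
    sleep(0.01)  # widen the window in which other tasks reach the lock
    return () -> key
end

function get_or_build(key)
    # `lock(f, l)` acquires the lock, runs `f`, and releases the lock even if
    # `f` throws, which matches the lock/try/finally/unlock pattern above.
    return lock(demo_lock) do
        # Look in the cache while holding the lock: another task may have
        # finished the build while this one was waiting.
        cached = get(demo_cache, key, nothing)
        cached === nothing || return cached
        built = expensive_build(key)
        demo_cache[key] = built
        return built
    end
end

# Several tasks request the same key at once; only the first one builds.
tasks = [Threads.@spawn get_or_build(:model) for _ in 1:8]
results = fetch.(tasks)
```

Under these assumptions every task receives the same cached object and `build_count[]` ends at 1, because late arrivals find the entry once they hold the lock.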

Fix #226

Serialise build_callable under a ReentrantLock so that concurrent
threads cannot race during first-time callable IR generation and
cache population, which caused segfaults in threaded Particle Gibbs
sampling on Julia 1.10.
Copilot AI changed the title from "[WIP] Fix Julia 1.10 segfault in threaded PG due to concurrent Libtask callable generation" to "Serialize build_callable under a global lock to prevent segfaults in threaded usage" Jun 5, 2026
Copilot finished work on behalf of yebai June 5, 2026 07:14
Copilot AI requested a review from yebai June 5, 2026 07:14
yebai marked this pull request as ready for review June 5, 2026 07:14
sunxd3 requested review from sunxd3 and removed request for yebai June 5, 2026 08:29
- Move `build_callable_lock` above the docstring so the docstring attaches
  to `build_callable` again (fixes Documenter `docs_block`/`missing_docs`)
- Wrap the `misty_closure` call to stay within blue style's margin at its
  new indentation (fixes JuliaFormatter check)
- Update the SMC test to FlexiChains' `(niters, nchains)` layout: the chain
  count is `size(chain, 2)`, not `size(chain, 3)`

Co-Authored-By: Claude Opus 4.8 (1M context) <noreply@anthropic.com>

github-actions Bot commented Jun 5, 2026

Libtask.jl documentation for PR #227 is available at:
https://TuringLang.github.io/Libtask.jl/previews/PR227/

Comment thread src/copyable_task.jl
# the valid world age to be very strictly just the current age allows the
# compiler to do more inlining and other optimisation.
unoptimised_ir = set_valid_world!(unoptimised_ir, world_age)
lock(build_callable_lock)

yebai (Member) commented Jun 5, 2026

It's difficult to unit-test this, as it is a stochastic crash caused by racing. Intensive A/B testing verifies this is indeed the fix.
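
As a generic illustration of why such a race is checked by repeated A/B runs rather than a single deterministic test, the following sketch compares an unsynchronised shared counter with a locked one. It is a hypothetical toy, unrelated to the actual Libtask test setup, and the names `run_trial` and `count_failures` are invented here.

```julia
# Hypothetical toy harness; none of these names come from Libtask.
# One trial: `ntasks` tasks each increment a shared counter `nincr` times.
function run_trial(use_lock::Bool; ntasks=8, nincr=2_000)
    counter = Ref(0)
    lk = ReentrantLock()
    tasks = map(1:ntasks) do _
        Threads.@spawn for _ in 1:nincr
            if use_lock
                lock(lk) do
                    counter[] += 1
                end
            else
                counter[] += 1  # unsynchronised read-modify-write: may lose updates
            end
        end
    end
    foreach(wait, tasks)
    # The trial passes only if no increment was lost.
    return counter[] == ntasks * nincr
end

# Count how many of `ntrials` runs end with a wrong total.
count_failures(use_lock; ntrials=10) = count(_ -> !run_trial(use_lock), 1:ntrials)

failures_without_lock = count_failures(false)  # varies run to run; can be 0
failures_with_lock = count_failures(true)      # the locked variant never loses updates
```

The unlocked variant fails only some of the time, and possibly never when Julia runs with a single thread, so only the locked variant supports a hard assertion; evidence for the fix comes from comparing failure counts over many runs.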

  chain = sample(StableRNG(468), model, SMC(), 100; progress=false)
  @test size(chain, 1) == 100
- @test size(chain, 3) == 1
+ @test size(chain, 2) == 1

yebai (Member) commented Jun 5, 2026

This is due to a change in FlexiChains and fails on main too.

Development

Successfully merging this pull request may close these issues.

Julia 1.10 segfault in threaded PG due to concurrent Libtask callable generation

2 participants