Still gc crash #53491

Closed
sgaure opened this issue Feb 27, 2024 · 2 comments · Fixed by #53527
Labels
kind:regression Regression in behavior compared to a previous version

@sgaure

sgaure commented Feb 27, 2024

There still seem to be problems in the GC on recent master, despite the fix in #53355.

I'm running

Julia Version 1.12.0-DEV.89
Commit 35cb8a556b* (2024-02-27 06:12 UTC)
Platform Info:
  OS: Linux (x86_64-linux-gnu)
  CPU: 24 × AMD Ryzen Threadripper PRO 5945WX 12-Cores
  WORD_SIZE: 64
  LLVM: libLLVM-16.0.6 (ORCJIT, znver3)
Threads: 1 default, 0 interactive, 1 GC (on 24 virtual cores)
Environment:
  JULIA_EDITOR = emacs -nw

This build was compiled and installed on this machine.

Here is an MWE that frequently crashes. It runs fine on version 1.10.1, installed from snap on Ubuntu 23.10.

using .Threads

function fun(N)
    parts = Iterators.partition(1:N, 1 + N ÷ nthreads())

    tasks = [@spawn begin
                 s = Vector{Float64}(undef, length($part))
                 base = first($part)
                 for i in $part
                     s[i-base+1] = 1/i
                 end
                 return s
             end for part in parts]

    mapreduce(vcat, tasks) do t
        fetch(t)::Vector{Float64}
    end
end

println(sum(fun(42)))

println(sum(fun(10_000_000)))

Without the fun(42) line, the program seems to work.

With the program above saved as crash.jl, a typical crash looks like this:

$ julia -t 12 -e 'include("crash.jl")'
4.326742806648339
GC error (probable corruption)
C error (probable corruption)
Allocations: 618918 (Pool: 618895; Big: 23); GC: 1
Allocations: 618918 (Pool: 618895; Big: 23); GC: 1

!!! ERROR in jl_ -- ABORTING !!!

!!! ERROR in jl_ -- ABORTING !!!
GC error (probable corruption)

[1214416] signal 6 (-6): Aborted

[1214416] si618918 (Pool: 618895; Big: 23); GC: 1
in expression starting at none:0
 Big: 23); in expression starting at none:0
Allocations: 618918 (Pool: 618895; Big: 23); GC: 1
Aborted (core dumped)
@gbaraldi

gbaraldi commented Feb 27, 2024

This triggers an assert for me. Looking further into it.

@KristofferC KristofferC added this to the 1.11 milestone Feb 27, 2024
@JeffBezanson JeffBezanson added the kind:regression Regression in behavior compared to a previous version label Feb 27, 2024
@vtjnash

vtjnash commented Feb 27, 2024

Analyzing this, we found that commit 3f23533 introduced a safepoint into JL_TRY. (A safepoint could occur there before, but only in the rare cases where unlocks was true or defer_signal was true, which typically would not hold in these cases.) This safepoint had been intentionally hidden from the GC analyzer in the definition of JL_TRY, so we did not notice that it caused failures in the checker.
