strange CI error running gc code #53026

Closed
vtjnash opened this issue Jan 23, 2024 · 4 comments
Labels
domain:ci Continuous integration domain:multithreading Base.Threads and related functionality

Comments

vtjnash commented Jan 23, 2024

This seems like it clearly shouldn't happen: the Matrix is thread-local and its size is never changed, so there is little opportunity for the length of the Memory inside it to be corrupted. Yet somehow we see this failure:

gc                                                (1) |        started at 2024-01-21T00:19:42.883
ERROR: LoadError: TaskFailedException
    nested task error: BoundsError: attempt to access 1125×1125 Matrix{Cell} at index [1]
    Stacktrace:
     [1] throw_boundserror(A::Matrix{Cell}, I::Tuple{Int64})
       @ Base ./essentials.jl:14
     [2] setindex!
       @ ./array.jl:971 [inlined]
     [3] fillcells!(mc::Matrix{Cell})
       @ Main /cache/build/tester-amdci5-12/julialang/julia-master/julia-c5e9e7c283/share/julia/test/gc/objarray.jl:18
     [4] work
       @ /cache/build/tester-amdci5-12/julialang/julia-master/julia-c5e9e7c283/share/julia/test/gc/objarray.jl:25 [inlined]
     [5] macro expansion
       @ /cache/build/tester-amdci5-12/julialang/julia-master/julia-c5e9e7c283/share/julia/test/gc/objarray.jl:30 [inlined]
     [6] (::var"#7#threadsfor_fun#2"{var"#7#threadsfor_fun#1#3"{UnitRange{Int64}}})(tid::Int64; onethread::Bool)
       @ Main ./threadingconstructs.jl:214
     [7] #7#threadsfor_fun
       @ ./threadingconstructs.jl:181 [inlined]
     [8] (::Base.Threads.var"#1#2"{var"#7#threadsfor_fun#2"{var"#7#threadsfor_fun#1#3"{UnitRange{Int64}}}, Int64})()
       @ Base.Threads ./threadingconstructs.jl:153
Stacktrace:
 [1] threading_run(fun::var"#7#threadsfor_fun#2"{var"#7#threadsfor_fun#1#3"{UnitRange{Int64}}}, static::Bool)
   @ Base.Threads ./threadingconstructs.jl:171
 [2] macro expansion
   @ ./threadingconstructs.jl:219 [inlined]
 [3] run(maxsize::Int64)
   @ Main /cache/build/tester-amdci5-12/julialang/julia-master/julia-c5e9e7c283/share/julia/test/gc/objarray.jl:29
 [4] top-level scope
   @ /cache/build/tester-amdci5-12/julialang/julia-master/julia-c5e9e7c283/share/julia/test/gc/objarray.jl:35
in expression starting at /cache/build/tester-amdci5-12/julialang/julia-master/julia-c5e9e7c283/share/julia/test/gc/objarray.jl:35
gc                                                (1) |         failed at 2024-01-21T00:20:30.964

https://buildkite.com/julialang/julia-master/builds/32516#018d2921-a653-440d-aefb-55eb83d5595c
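
For readers without the test file at hand, the pattern implied by the stack trace can be sketched roughly as below. This is a hypothetical reconstruction, not the contents of `test/gc/objarray.jl`: only the names `Cell`, `fillcells!` and `work` come from the trace, while `CellA`, `CellB`, `run_sketch`, the function bodies and the matrix sizes are assumptions chosen to keep the example small. The point it illustrates is that each task allocates and fills its own matrix, so a bounds check on index 1 should never fail.

```julia
# Hypothetical reconstruction: names mirror the stack trace, bodies are assumed.
abstract type Cell end

struct CellA <: Cell
    a::Int
end

struct CellB <: Cell
    b::String
end

# Store a freshly allocated object in every slot, which creates GC pressure.
function fillcells!(mc::Matrix{Cell})
    for ind in eachindex(mc)   # linear indices 1:length(mc)
        mc[ind] = isodd(ind) ? CellA(ind) : CellB(string(ind))
    end
    return mc
end

# Each call allocates its own matrix; no other task ever sees it.
function work(n::Int)
    mc = Matrix{Cell}(undef, n, n)
    return fillcells!(mc)
end

# Run `work` on several threads and record how many slots each task filled.
function run_sketch(maxsize::Int)
    filled = zeros(Int, maxsize)
    Threads.@threads for i in 1:maxsize
        mc = work(i * 25)      # assumed size; the failing matrix was 1125×1125
        filled[i] = count(j -> isassigned(mc, j), eachindex(mc))
    end
    return filled
end

run_sketch(4)
```

Since the matrix is never resized or shared, a `BoundsError` at index `[1]` suggests that the length consulted by the bounds check was wrong at that moment, which is why memory corruption is the suspect here.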

vtjnash added the domain:multithreading (Base.Threads and related functionality) and domain:ci (Continuous integration) labels Jan 23, 2024
vtjnash commented Feb 1, 2024

Here are more apparent examples of GC issues with this test:

gc                                                (1) |        started at 2024-02-01T14:28:27.010
ERROR: LoadError: TaskFailedException
    nested task error: StackOverflowError:
    Stacktrace:
     [1] check(node::Main.BinaryTreeMutable.Node)
       @ Main.BinaryTreeMutable /cache/build/tester-amdci5-12/julialang/julia-master/julia-30ccace427/share/julia/test/gc/binarytree.jl:20
     [2] check(node::Main.BinaryTreeMutable.Node) (repeats 32246 times)
       @ Main.BinaryTreeMutable /cache/build/tester-amdci5-12/julialang/julia-master/julia-30ccace427/share/julia/test/gc/binarytree.jl:21
     [3] macro expansion
       @ /cache/build/tester-amdci5-12/julialang/julia-master/julia-30ccace427/share/julia/test/gc/binarytree.jl:35 [inlined]
     [4] (::Main.BinaryTreeMutable.var"#5#threadsfor_fun#2"{Main.BinaryTreeMutable.var"#5#threadsfor_fun#1#3"{Int32, Vector{String}, Int32, StepRange{Int32, Int32}}})(tid::Int32; onethread::Bool)
       @ Main.BinaryTreeMutable ./threadingconstructs.jl:215
     [5] #5#threadsfor_fun
       @ ./threadingconstructs.jl:182 [inlined]
     [6] (::Base.Threads.var"#1#2"{Main.BinaryTreeMutable.var"#5#threadsfor_fun#2"{Main.BinaryTreeMutable.var"#5#threadsfor_fun#1#3"{Int32, Vector{String}, Int32, StepRange{Int32, Int32}}}, Int32})()
       @ Base.Threads ./threadingconstructs.jl:154
Stacktrace:
 [1] threading_run(fun::Main.BinaryTreeMutable.var"#5#threadsfor_fun#2"{Main.BinaryTreeMutable.var"#5#threadsfor_fun#1#3"{Int32, Vector{String}, Int32, StepRange{Int32, Int32}}}, static::Bool)
   @ Base.Threads ./threadingconstructs.jl:172
 [2] macro expansion
   @ ./threadingconstructs.jl:220 [inlined]
 [3] binary_trees(io::Base.DevNull, n::Int32)
   @ Main.BinaryTreeMutable /cache/build/tester-amdci5-12/julialang/julia-master/julia-30ccace427/share/julia/test/gc/binarytree.jl:31
 [4] top-level scope
   @ /cache/build/tester-amdci5-12/julialang/julia-master/julia-30ccace427/share/julia/test/gc/binarytree.jl:53
in expression starting at /cache/build/tester-amdci5-12/julialang/julia-master/julia-30ccace427/share/julia/test/gc/binarytree.jl:53
GC error (probable corruption)
Allocations: 1790315 (Pool: 1790283; Big: 32); GC: 7
!!! ERROR in jl_ -- ABORTING !!!
[8571] signal 6 (-6): Aborted
in expression starting at /cache/build/tester-amdci5-12/julialang/julia-master/julia-30ccace427/share/julia/test/gc/objarray.jl:35
Allocations: 1790315 (Pool: 1790283; Big: 32); GC: 7
gc                                                (1) |         failed at 2024-02-01T14:29:44.974
Error During Test at /cache/build/tester-amdci5-12/julialang/julia-master/julia-30ccace427/share/julia/test/gc.jl:12
  Test threw exception

https://buildkite.com/julialang/julia-master/builds/33000#018d64bc-df85-41f7-bf7d-cfa160941109

d-netto commented Feb 7, 2024

I started bisecting this, but the cycle time is fairly slow because the failure takes a while to reproduce in the GC tests.

I plan to post an update here when I'm done bisecting.

vtjnash commented Feb 7, 2024

If it helps, there is also an rr trace of this in #52757.

d-netto commented Feb 13, 2024

I was unable to reproduce it after running it a few hundred times in a loop. I will look at the rr trace.
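
For anyone who wants to try the same kind of loop, a minimal sketch might look like the following. The helper name `count_failures` is hypothetical, and the way the GC test is invoked in the usage note is an assumption; the CI harness may launch it differently.

```julia
# Run `cmd` repeatedly and count how many runs exit unsuccessfully.
# Output is discarded so that a long loop stays readable.
function count_failures(cmd::Cmd, attempts::Integer)
    failures = 0
    for _ in 1:attempts
        ok = success(pipeline(cmd; stdout=devnull, stderr=devnull))
        ok || (failures += 1)
    end
    return failures
end
```

A call such as ``count_failures(`$(Base.julia_cmd()) --threads=4 test/gc/objarray.jl`, 300)`` would then report how many of 300 runs failed, assuming the script can be run standalone from the source tree with that relative path.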

KristofferC pushed a commit that referenced this issue Feb 26, 2024
This aims to slightly simplify the synchronization by making
`n_threads_marking` the sole memory location of relevance for it. It
also removes the fast path, because being protected by the lock is
quite important so that the observed gc state arrays are valid.

Fixes: #53350
Fixes: #52757
Maybe fixes: #53026
Co-authored-by: Jameson Nash <vtjnash@gmail.com>

(cherry picked from commit a96726b)
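
The idea in that commit message can be illustrated with a loose analogy in Julia. This is not the runtime's code (the real logic lives in Julia's C runtime), and every name except `n_threads_marking` is hypothetical. It also assumes that the "fast path" was a check made without taking the lock, which the commit text does not spell out. The sketch shows a single counter acting as the only synchronization state, with every decision made while holding the lock so that the queues being inspected cannot change underneath the reader.

```julia
# Illustrative analogy only; not the implementation in Julia's GC.
mutable struct MarkState
    lock::ReentrantLock
    n_threads_marking::Int        # the single location consulted for synchronization
    queues::Vector{Vector{Int}}   # stand-in for the "gc state arrays"
end

MarkState(queues) = MarkState(ReentrantLock(), 0, queues)

# The coordinating thread announces that a marking phase has begun.
function start_marking!(s::MarkState)
    lock(s.lock) do
        s.n_threads_marking += 1
    end
    return nothing
end

# A helper thread decides whether to join. There is no unlocked early exit:
# the counter and the queues are only read while the lock is held.
function try_join_marking!(s::MarkState)
    lock(s.lock) do
        s.n_threads_marking == 0 && return false   # no marking in progress
        any(!isempty, s.queues) || return false    # nothing left to do
        s.n_threads_marking += 1
        return true
    end
end

s = MarkState([[1, 2, 3], Int[]])
joined_before = try_join_marking!(s)   # false: marking has not started
start_marking!(s)
joined_after = try_join_marking!(s)    # true: marking is active and work exists
```

The trade-off is that every check now pays for a lock acquisition, in exchange for never observing the shared state while another thread is tearing it down.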
tecosaur pushed a commit to tecosaur/julia that referenced this issue Mar 4, 2024
mkitti pushed a commit to mkitti/julia that referenced this issue Mar 7, 2024