-
-
Notifications
You must be signed in to change notification settings - Fork 5.5k
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
Race Condition: Test.@test_logs
is not thread safe.
#54439
Comments
Test.collect_test_logs()
is not thread safe.Test.@test_logs
is not thread safe.
EDIT: I've edited the original message to the much more compelling example Any Task spawned during a Test, which logs after Here is a contrived example that I managed to get to reproduce the issue: julia> function code_to_test()
# do the work
@info "starting"
start_transaction()
# ...
@info "done"
# Clean up
end_transaction()
end
code_to_test (generic function with 1 method)
julia> start_transaction() = nothing
start_transaction (generic function with 1 method)
julia> function end_transaction()
# spawn a task to do some cleanup in the background, so we can
# return to the client immediately
Threads.@spawn begin
# clean stuff up
@info "Background Cleanup Task: All done cleaning up!"
for _ in 1:1000
@info "now it's clean, goodbye"
end
end
end
end_transaction (generic function with 1 method)
julia> using Test
julia> for _ in 1:10000
let (logs,_) = Test.collect_test_logs() do
@info "hey"
code_to_test()
@warn "hi"
end
for log in logs
@test length(log.message) > 1
end
end
end
ERROR: UndefRefError: access to undefined reference
Stacktrace:
[1] getindex
@ ./essentials.jl:13 [inlined]
[2] iterate(A::Vector{LogRecord}, i::Int64)
@ Base ./array.jl:945
[3] top-level scope
@ REPL[5]:10 This also sometimes segfaults, of course: [66213] signal (11.2): Segmentation fault: 11
in expression starting at REPL[5]:1
signal (11) thread (1) sig_match_fast at /Users/nathandaly/builds/julia-RAI/src/gf.c:2914 [inlined]
signal (11) thread (1) jl_lookup_generic_ at /Users/nathandaly/builds/julia-RAI/src/gf.c:2994 [inlined]
signal (11) thread (1) ijl_apply_generic at /Users/nathandaly/builds/julia-RAI/src/gf.c:3073 We see this showing up in our CI build farm from time to time. The solution is almost certainly to make that vector thread-safe, by introducing a lock around it, and then copying the vector out before returning. |
Fixes JuliaLang#54439. - Lock around concurrent accesses to .logs and .message_limits - Copy the vector out of the logger at the end, to shield against dangling Tasks. Before: ```julia julia> Threads.nthreads() 8 julia> function foo(n) @info "Doing foo with n=$n" @sync for i=1:n Threads.@Spawn @info "Iteration $i" end 42 end foo (generic function with 1 method) julia> for _ in 1:1000 @test_logs (:info,"Doing foo with n=10000") match_mode=:any foo(10_000) end julia+RAI(56155,0x1f5157ac0) malloc: double free for ptr 0x128248000 julia+RAI(56155,0x1f5157ac0) malloc: *** set a breakpoint in malloc_error_break to debug [56155] signal (6): Abort trap: 6 in expression starting at REPL[8]:1 signal (6) thread (1) __pthread_kill at /usr/lib/system/libsystem_kernel.dylib (unknown line) Allocations: 54370881 (Pool: 54363911; Big: 6970); GC: 119 [1] 56154 abort julia -tauto ``` After: ```julia julia> Threads.nthreads() 8 julia> function foo(n) @info "Doing foo with n=$n" @sync for i=1:n Threads.@Spawn @info "Iteration $i" end 42 end foo (generic function with 1 method) julia> for _ in 1:1000 @test_logs (:info,"Doing foo with n=10000") match_mode=:any foo(10_000) end ``` (no crash) :)
Fixes #54439. - Lock around concurrent accesses to .logs and .message_limits - Copy the vector out of the logger at the end, to shield against dangling Tasks. Before: ```julia julia> Threads.nthreads() 8 julia> function foo(n) @info "Doing foo with n=$n" @sync for i=1:n Threads.@Spawn @info "Iteration $i" end 42 end foo (generic function with 1 method) julia> for _ in 1:1000 @test_logs (:info,"Doing foo with n=10000") match_mode=:any foo(10_000) end julia+RAI(56155,0x1f5157ac0) malloc: double free for ptr 0x128248000 julia+RAI(56155,0x1f5157ac0) malloc: *** set a breakpoint in malloc_error_break to debug [56155] signal (6): Abort trap: 6 in expression starting at REPL[8]:1 signal (6) thread (1) __pthread_kill at /usr/lib/system/libsystem_kernel.dylib (unknown line) Allocations: 54370881 (Pool: 54363911; Big: 6970); GC: 119 [1] 56154 abort julia -tauto ``` After: ```julia julia> Threads.nthreads() 8 julia> function foo(n) @info "Doing foo with n=$n" @sync for i=1:n Threads.@Spawn @info "Iteration $i" end 42 end foo (generic function with 1 method) julia> for _ in 1:1000 @test_logs (:info,"Doing foo with n=10000") match_mode=:any foo(10_000) end ``` (no crash) :)
Fixes #54439. - Lock around concurrent accesses to .logs, .message_limits, and .shouldlog_args. - Copy the vector out of the logger at the end, to shield against dangling Tasks. Before: ```julia julia> Threads.nthreads() 8 julia> function foo(n) @info "Doing foo with n=$n" @sync for i=1:n Threads.@Spawn @info "Iteration $i" end 42 end foo (generic function with 1 method) julia> for _ in 1:1000 @test_logs (:info,"Doing foo with n=10000") match_mode=:any foo(10_000) end julia+RAI(56155,0x1f5157ac0) malloc: double free for ptr 0x128248000 julia+RAI(56155,0x1f5157ac0) malloc: *** set a breakpoint in malloc_error_break to debug [56155] signal (6): Abort trap: 6 in expression starting at REPL[8]:1 signal (6) thread (1) __pthread_kill at /usr/lib/system/libsystem_kernel.dylib (unknown line) Allocations: 54370881 (Pool: 54363911; Big: 6970); GC: 119 [1] 56154 abort julia -tauto ``` After: ```julia julia> Threads.nthreads() 8 julia> function foo(n) @info "Doing foo with n=$n" @sync for i=1:n Threads.@Spawn @info "Iteration $i" end 42 end foo (generic function with 1 method) julia> for _ in 1:1000 @test_logs (:info,"Doing foo with n=10000") match_mode=:any foo(10_000) end ``` (no crash) :)
Fixes JuliaLang#54439. - Lock around concurrent accesses to .logs and .message_limits - Copy the vector out of the logger at the end, to shield against dangling Tasks. Before: ```julia julia> Threads.nthreads() 8 julia> function foo(n) @info "Doing foo with n=$n" @sync for i=1:n Threads.@Spawn @info "Iteration $i" end 42 end foo (generic function with 1 method) julia> for _ in 1:1000 @test_logs (:info,"Doing foo with n=10000") match_mode=:any foo(10_000) end julia+RAI(56155,0x1f5157ac0) malloc: double free for ptr 0x128248000 julia+RAI(56155,0x1f5157ac0) malloc: *** set a breakpoint in malloc_error_break to debug [56155] signal (6): Abort trap: 6 in expression starting at REPL[8]:1 signal (6) thread (1) __pthread_kill at /usr/lib/system/libsystem_kernel.dylib (unknown line) Allocations: 54370881 (Pool: 54363911; Big: 6970); GC: 119 [1] 56154 abort julia -tauto ``` After: ```julia julia> Threads.nthreads() 8 julia> function foo(n) @info "Doing foo with n=$n" @sync for i=1:n Threads.@Spawn @info "Iteration $i" end 42 end foo (generic function with 1 method) julia> for _ in 1:1000 @test_logs (:info,"Doing foo with n=10000") match_mode=:any foo(10_000) end ``` (no crash) :)
Fixes #54439. - Lock around concurrent accesses to .logs, .message_limits, and .shouldlog_args. - Copy the vector out of the logger at the end, to shield against dangling Tasks. Before: ```julia julia> Threads.nthreads() 8 julia> function foo(n) @info "Doing foo with n=$n" @sync for i=1:n Threads.@Spawn @info "Iteration $i" end 42 end foo (generic function with 1 method) julia> for _ in 1:1000 @test_logs (:info,"Doing foo with n=10000") match_mode=:any foo(10_000) end julia+RAI(56155,0x1f5157ac0) malloc: double free for ptr 0x128248000 julia+RAI(56155,0x1f5157ac0) malloc: *** set a breakpoint in malloc_error_break to debug [56155] signal (6): Abort trap: 6 in expression starting at REPL[8]:1 signal (6) thread (1) __pthread_kill at /usr/lib/system/libsystem_kernel.dylib (unknown line) Allocations: 54370881 (Pool: 54363911; Big: 6970); GC: 119 [1] 56154 abort julia -tauto ``` After: ```julia julia> Threads.nthreads() 8 julia> function foo(n) @info "Doing foo with n=$n" @sync for i=1:n Threads.@Spawn @info "Iteration $i" end 42 end foo (generic function with 1 method) julia> for _ in 1:1000 @test_logs (:info,"Doing foo with n=10000") match_mode=:any foo(10_000) end ``` (no crash) :) (cherry picked from commit 0437210)
Fixes #54439. - Lock around concurrent accesses to .logs, .message_limits, and .shouldlog_args. - Copy the vector out of the logger at the end, to shield against dangling Tasks. Before: ```julia julia> Threads.nthreads() 8 julia> function foo(n) @info "Doing foo with n=$n" @sync for i=1:n Threads.@Spawn @info "Iteration $i" end 42 end foo (generic function with 1 method) julia> for _ in 1:1000 @test_logs (:info,"Doing foo with n=10000") match_mode=:any foo(10_000) end julia+RAI(56155,0x1f5157ac0) malloc: double free for ptr 0x128248000 julia+RAI(56155,0x1f5157ac0) malloc: *** set a breakpoint in malloc_error_break to debug [56155] signal (6): Abort trap: 6 in expression starting at REPL[8]:1 signal (6) thread (1) __pthread_kill at /usr/lib/system/libsystem_kernel.dylib (unknown line) Allocations: 54370881 (Pool: 54363911; Big: 6970); GC: 119 [1] 56154 abort julia -tauto ``` After: ```julia julia> Threads.nthreads() 8 julia> function foo(n) @info "Doing foo with n=$n" @sync for i=1:n Threads.@Spawn @info "Iteration $i" end 42 end foo (generic function with 1 method) julia> for _ in 1:1000 @test_logs (:info,"Doing foo with n=10000") match_mode=:any foo(10_000) end ``` (no crash) :) (cherry picked from commit 0437210)
Fixes JuliaLang#54439. - Lock around concurrent accesses to .logs and .message_limits - Copy the vector out of the logger at the end, to shield against dangling Tasks. Before: ```julia julia> Threads.nthreads() 8 julia> function foo(n) @info "Doing foo with n=$n" @sync for i=1:n Threads.@Spawn @info "Iteration $i" end 42 end foo (generic function with 1 method) julia> for _ in 1:1000 @test_logs (:info,"Doing foo with n=10000") match_mode=:any foo(10_000) end julia+RAI(56155,0x1f5157ac0) malloc: double free for ptr 0x128248000 julia+RAI(56155,0x1f5157ac0) malloc: *** set a breakpoint in malloc_error_break to debug [56155] signal (6): Abort trap: 6 in expression starting at REPL[8]:1 signal (6) thread (1) __pthread_kill at /usr/lib/system/libsystem_kernel.dylib (unknown line) Allocations: 54370881 (Pool: 54363911; Big: 6970); GC: 119 [1] 56154 abort julia -tauto ``` After: ```julia julia> Threads.nthreads() 8 julia> function foo(n) @info "Doing foo with n=$n" @sync for i=1:n Threads.@Spawn @info "Iteration $i" end 42 end foo (generic function with 1 method) julia> for _ in 1:1000 @test_logs (:info,"Doing foo with n=10000") match_mode=:any foo(10_000) end ``` (no crash) :)
Fixes JuliaLang#54439. - Lock around concurrent accesses to .logs and .message_limits - Copy the vector out of the logger at the end, to shield against dangling Tasks. Before: ```julia julia> Threads.nthreads() 8 julia> function foo(n) @info "Doing foo with n=$n" @sync for i=1:n Threads.@Spawn @info "Iteration $i" end 42 end foo (generic function with 1 method) julia> for _ in 1:1000 @test_logs (:info,"Doing foo with n=10000") match_mode=:any foo(10_000) end julia+RAI(56155,0x1f5157ac0) malloc: double free for ptr 0x128248000 julia+RAI(56155,0x1f5157ac0) malloc: *** set a breakpoint in malloc_error_break to debug [56155] signal (6): Abort trap: 6 in expression starting at REPL[8]:1 signal (6) thread (1) __pthread_kill at /usr/lib/system/libsystem_kernel.dylib (unknown line) Allocations: 54370881 (Pool: 54363911; Big: 6970); GC: 119 [1] 56154 abort julia -tauto ``` After: ```julia julia> Threads.nthreads() 8 julia> function foo(n) @info "Doing foo with n=$n" @sync for i=1:n Threads.@Spawn @info "Iteration $i" end 42 end foo (generic function with 1 method) julia> for _ in 1:1000 @test_logs (:info,"Doing foo with n=10000") match_mode=:any foo(10_000) end ``` (no crash) :)
Fixes JuliaLang#54439. - Lock around concurrent accesses to .logs, .message_limits, and .shouldlog_args. - Copy the vector out of the logger at the end, to shield against dangling Tasks. Before: ```julia julia> Threads.nthreads() 8 julia> function foo(n) @info "Doing foo with n=$n" @sync for i=1:n Threads.@Spawn @info "Iteration $i" end 42 end foo (generic function with 1 method) julia> for _ in 1:1000 @test_logs (:info,"Doing foo with n=10000") match_mode=:any foo(10_000) end julia+RAI(56155,0x1f5157ac0) malloc: double free for ptr 0x128248000 julia+RAI(56155,0x1f5157ac0) malloc: *** set a breakpoint in malloc_error_break to debug [56155] signal (6): Abort trap: 6 in expression starting at REPL[8]:1 signal (6) thread (1) __pthread_kill at /usr/lib/system/libsystem_kernel.dylib (unknown line) Allocations: 54370881 (Pool: 54363911; Big: 6970); GC: 119 [1] 56154 abort julia -tauto ``` After: ```julia julia> Threads.nthreads() 8 julia> function foo(n) @info "Doing foo with n=$n" @sync for i=1:n Threads.@Spawn @info "Iteration $i" end 42 end foo (generic function with 1 method) julia> for _ in 1:1000 @test_logs (:info,"Doing foo with n=10000") match_mode=:any foo(10_000) end ``` (no crash) :)
EDIT: changed to a much simpler example that shows the issue.
The
@test_logs
macro is not thread safe, if the code under test uses multiple threads:This is definitely a bug, and should be fixed.
We see this showing up in our CI build farm from time to time.
The solution is almost certainly to make that vector thread-safe, by introducing a lock around it, and then copying the vector out before returning.
The text was updated successfully, but these errors were encountered: