Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

try changes backtraces on FreeBSD #30233

Closed
ararslan opened this issue Dec 1, 2018 · 18 comments · Fixed by #40975
Closed

try changes backtraces on FreeBSD #30233

ararslan opened this issue Dec 1, 2018 · 18 comments · Fixed by #40975
Labels
domain:error handling Handling of exceptions by Julia or the user system:freebsd Affects only FreeBSD

Comments

@ararslan
Copy link
Member

ararslan commented Dec 1, 2018

The errorshow tests failed locally for me on FreeBSD 12.0, and when I investigated, I found that the code that fails works fine if run outside of a testset. As it turns out, the try inserted by @testset is what's causing the difference in stack traces. The following is adapted from the tests:

julia> begin
           f(c) = g(c + 1)
           g(c) = c > 10000 ? (return backtrace()) : f(c + 1)
           bt = f(1)
           io = IOBuffer()
           Base.show_backtrace(io, bt)
           println(String(take!(io)))
       end

Stacktrace:
 [1] g(::Int64) at ./REPL[1]:3
 [2] f(::Int64) at ./REPL[1]:2
 ... (the last 2 lines are repeated 5000 more times)
 [10003] top-level scope at REPL[1]:4
 [10004] eval(::Module, ::Any) at ./boot.jl:319
 [10005] eval_user_input(::Any, ::REPL.REPLBackend) at /usr/home/alex/Projects/julia/usr/share/julia/stdlib/v1.1/REPL/src/REPL.jl:85
 [10006] macro expansion at /usr/home/alex/Projects/julia/usr/share/julia/stdlib/v1.1/REPL/src/REPL.jl:117 [inlined]

julia> try
           f(c) = g(c + 1)
           g(c) = c > 10000 ? (return backtrace()) : f(c + 1)
           bt = f(1)
           io = IOBuffer()
           Base.show_backtrace(io, bt)
           println(String(take!(io)))
       catch err
           err
       end

Stacktrace:
 [1] (::getfield(Main, Symbol("#g#4")){getfield(Main, Symbol("#f#3"))})(::Int64) at ./REPL[2]:3
 [2] (::getfield(Main, Symbol("#f#3")))(::Int64) at ./REPL[2]:2
 ... (the last 2 lines are repeated 145 more times)
 [293] (::getfield(Main, Symbol("#g#4")){getfield(Main, Symbol("#f#3"))})(::Int64) at ./REPL[2]:3

I'm also seeing this on FreeBSD 11.2, which is what our CI is running. This happens with both Julia 1.0 and master. Julia 0.6 produces the correct number of frames (10006 instead of 293) but without the collapsing that we now have.

Note that this issue seems similar to #29695, but Julia is not running with --compile=min here.

@ararslan ararslan added domain:error handling Handling of exceptions by Julia or the user system:freebsd Affects only FreeBSD labels Dec 1, 2018
@c42f
Copy link
Member

c42f commented Dec 2, 2018

#29695 can only corrupt about one frame per 1000 or so because that's the chunk size used in jl_backtrace_from_here. So I don't think this can be the same thing.

I can't repro this on linux, but here's some things which might help a little:

  • Look at the raw backtrace instead of the pretty printed one.
  • Look at backtrace(true) (ie, including C frames)

@ararslan
Copy link
Member Author

ararslan commented Dec 3, 2018

I can't repro this on linux

Likewise, nor on macOS; I've only seen it on FreeBSD. If it occurred elsewhere the tests would be failing on CI, but oddly enough the tests pass even on FreeBSD CI.

Look at the raw backtrace instead of the pretty printed one.

Without try, I get 10016 elements in the stack trace, including jl_fptr_trampoline, do_call, etc. at the end. With try, I get 441 elements, and all are calls to f or g.

Look at backtrace(true) (ie, including C frames)

? I only see one method for backtrace(), which is no arguments.

@ararslan
Copy link
Member Author

ararslan commented Dec 3, 2018

Ah, I see, it's stacktrace(::Bool). I was doing map(StackTraces.lookup, x) over the arrays from backtrace. Using stacktrace(x, true), I get C functions without try and no C functions with try.

@c42f
Copy link
Member

c42f commented Dec 3, 2018

Sorry, as you found out I meant stacktrace(), not backtrace().

Anyway, is this the difference between the system deciding to run this code in the interpreter vs not? Maybe interpreter stack traces are broken on FreeBSD? There's certainly enough system-specific black magic in interpreter-stacktrace.c that it wouldn't be too surprising.

What happens without the try, but with --compile=min vs --compile=yes?

@c42f
Copy link
Member

c42f commented Dec 3, 2018

Well, StackOverflow() is what happens with --compile=min locally... but presumably the 10000 can be decreased to repro with a much shallower stack.

@iblislin
Copy link
Member

iblislin commented Dec 3, 2018

I guess the root cause is not try statement, it's issue of scope.
let statement can trigger this issue as well.
On my 12.0-PRERELEASE

julia> begin
         f(c) = g(c + 1)
         g(c) = c > 10000 ? (return backtrace()) : f(c + 1)
         bt = f(1)
         io = IOBuffer()
         Base.show_backtrace(io, bt)
         println(String(take!(io)))
       end


Stacktrace:
 [1] g(::Int64) at ./REPL[14]:3
 [2] f(::Int64) at ./REPL[14]:2
 ... (the last 2 lines are repeated 5000 more times)
 [10003] top-level scope at REPL[14]:4
 [10004] eval(::Module, ::Any) at ./boot.jl:319
 [10005] eval_user_input(::Any, ::REPL.REPLBackend) at /usr/home/iblis/git/julia/usr/share/julia/stdlib/v1.1/REPL/src/REPL.jl:85
 [10006] macro expansion at /usr/home/iblis/git/julia/usr/share/julia/stdlib/v1.1/REPL/src/REPL.jl:117 [inlined]

julia> let
         f(c) = g(c + 1)
         g(c) = c > 10000 ? (return backtrace()) : f(c + 1)
         bt = f(1)
         io = IOBuffer()
         Base.show_backtrace(io, bt)
         println(String(take!(io)))
       end

Stacktrace:
 [1] (::getfield(Main, Symbol("#g#10")){getfield(Main, Symbol("#f#9"))})(::Int64) at ./REPL[15]:3
 [2] (::getfield(Main, Symbol("#f#9")))(::Int64) at ./REPL[15]:2
 ... (the last 2 lines are repeated 145 more times)
 [293] (::getfield(Main, Symbol("#g#10")){getfield(Main, Symbol("#f#9"))})(::Int64) at ./REPL[15]:3

@ararslan
Copy link
Member Author

ararslan commented Dec 3, 2018

Interesting note about let.

I've also noticed that I get the same backtrace with begin, let, and try if I define f and g as anonymous functions, i.e.

f = c->g(c + 1)
g = c->c > 10000 ? (return backtrace()) : f(c + 1)

both on 11.2 and 12.0.

@ararslan
Copy link
Member Author

ararslan commented Dec 4, 2018

Running in the interpreter I get the same backtrace for both cases: begin and try/let. Said backtrace is markedly different than when running normally though. EDIT: Except with C functions omitted (stacktrace(x, false)), then they're identical.

@c42f
Copy link
Member

c42f commented Dec 4, 2018

I can repro this with the FreeBSD image FreeBSD-12.0-RC3-amd64.qcow2 running on KVM.

I thought it was a bug in interpreter backtraces, but looks like it might be a bug in walking the native stack instead. Here's a repro in global scope showing the edge of the failure condition:

f = c-> g(c + 1)
g = c-> c > 10000 ? (return backtrace()) : f(c + 1)
bt = f(1)


@show length(f(9833))
display(stacktrace(f(9833), true))

println()
@show length(f(9831))
display(stacktrace(f(9831), true))

which prints

length(f(9833)) = 363
370-element Array{Base.StackTraces.StackFrame,1}:
 (::getfield(Main, Symbol("##33#34")))(::Int64) at test.jl:13            
 (::getfield(Main, Symbol("##33#34")))(::Int64)                          
 (::getfield(Main, Symbol("##31#32")))(::Int64) at test.jl:12            
 (::getfield(Main, Symbol("##31#32")))(::Int64)                          
 (::getfield(Main, Symbol("##33#34")))(::Int64) at test.jl:13            
 (::getfield(Main, Symbol("##33#34")))(::Int64)                          
 (::getfield(Main, Symbol("##31#32")))(::Int64) at test.jl:12            
 (::getfield(Main, Symbol("##31#32")))(::Int64)                          
 (::getfield(Main, Symbol("##33#34")))(::Int64) at test.jl:13            
 ⋮                                                                       
 jl_toplevel_eval_in at toplevel.c:793                                   
 eval(::Module, ::Any) at boot.jl:319                                    
 eval_user_input(::Any, ::REPL.REPLBackend) at REPL.jl:85                
 macro expansion at REPL.jl:117 [inlined]                                
 (::getfield(REPL, Symbol("##26#27")){REPL.REPLBackend})() at task.jl:259
 jl_apply at julia.h:1571 [inlined]                                      
 start_task at task.c:572                                                
 ip:0xffffffffffffffff                                                   

length(f(9831)) = 341
341-element Array{Base.StackTraces.StackFrame,1}:
 (::getfield(Main, Symbol("##33#34")))(::Int64) at test.jl:13
 (::getfield(Main, Symbol("##33#34")))(::Int64)              
 (::getfield(Main, Symbol("##31#32")))(::Int64) at test.jl:12
 (::getfield(Main, Symbol("##31#32")))(::Int64)              
 (::getfield(Main, Symbol("##33#34")))(::Int64) at test.jl:13
 (::getfield(Main, Symbol("##33#34")))(::Int64)              
 (::getfield(Main, Symbol("##31#32")))(::Int64) at test.jl:12
 (::getfield(Main, Symbol("##31#32")))(::Int64)              
 (::getfield(Main, Symbol("##33#34")))(::Int64) at test.jl:13
 ⋮                                                           
 (::getfield(Main, Symbol("##33#34")))(::Int64)              
 (::getfield(Main, Symbol("##31#32")))(::Int64) at test.jl:12
 (::getfield(Main, Symbol("##31#32")))(::Int64)              
 (::getfield(Main, Symbol("##33#34")))(::Int64) at test.jl:13
 (::getfield(Main, Symbol("##33#34")))(::Int64)              
 (::getfield(Main, Symbol("##31#32")))(::Int64) at test.jl:12
 (::getfield(Main, Symbol("##31#32")))(::Int64)              
 (::getfield(Main, Symbol("##33#34")))(::Int64) at test.jl:13

Note the stack trace has been truncated in f(9831), with the bottom of the stack disappearing completely.

@c42f
Copy link
Member

c42f commented Dec 4, 2018

A more minimalistic reproduction, stripping away the mutually recursive functions and any Base stack formatting machinery:

@noinline function raw_bt()
    bt, bt2 = ccall(:jl_backtrace_from_here, Any, (Int32,), false)
    collect(bt)
end

g = c-> c <= 0 ? (return raw_bt()) : g(c - 1)

@show length(g(167))
@show length(g(168))

Which produces

length(g(167)) = 367
length(g(168)) = 339  # Should be longer, but is missing all the final frames!

@c42f
Copy link
Member

c42f commented Dec 4, 2018

By inserting a bunch of printf logging, I've traced this down to a call to unw_step returning 0 too early at

return unw_step(cursor) > 0;

So that would suggest a FreeBSD specific bug in libunwind, I suppose?

@c42f
Copy link
Member

c42f commented Dec 4, 2018

Or, looking into libunwind, a bug in the way that the DWARF info is emitted by the julia JIT? Turning on libunwind UNW_DEBUG by hacking the libunwind source (there's probably a better way) and running with the environment variable UNW_DEBUG_LEVEL=2 reports

julia> @show length(g(167))
[...]
  >_ULx86_64_step: returning 1
 >_ULx86_64_step: (cursor=0x7fffffff7bd8, ip=0x000000000020251e, cfa=0x00007fffffffe9c0)
 >_ULx86_64_dwarf_search_unwind_table: e->fde_offset = 138, segbase = 2013f8, debug_frame_base = 0, fde_addr = 201530
  >_ULx86_64_step: returning 1
 >_ULx86_64_step: (cursor=0x7fffffff7bd8, ip=0x00000000002023fc, cfa=0x00007fffffffeab0)
 >_ULx86_64_dwarf_search_unwind_table: e->fde_offset = 118, segbase = 2013f8, debug_frame_base = 0, fde_addr = 201510
  >_ULx86_64_step: returning 1
 >_ULx86_64_step: (cursor=0x7fffffff7bd8, ip=0x000000000020211b, cfa=0x00007fffffffeae0)
 >_ULx86_64_dwarf_search_unwind_table: e->fde_offset = 68, segbase = 2013f8, debug_frame_base = 0, fde_addr = 201460
  >_ULx86_64_step: returning 1
 >_ULx86_64_step: (cursor=0x7fffffff7bd8, ip=0x000000080022a000, cfa=0x00007fffffffeb30)
  >_ULx86_64_step: NULL %rbp loc, returning 0
length(g(167)) = 357

vs

julia> @show length(g(168))
[...]
 >_ULx86_64_step: (cursor=0x7fffffff7b58, ip=0x000000081ca017e7, cfa=0x00007fffffff94a0)
 >_ULx86_64_step: [RBP=0x7fffffffd3e0] = 0x7fffffffd3f0 (cfa = 0x7fffffff94a0) -> 0x7fffffffd460
 >_ULx86_64_step: Frame Chain [RIP=0xffffd3f8] = 0x1ca0178f
  >_ULx86_64_step: returning 1
 >_ULx86_64_step: (cursor=0x7fffffff7b58, ip=0x000000081ca0178f, cfa=0x00007fffffff94b0)
 >_ULx86_64_step: [RBP=0x7fffffffd3f0] = 0x7fffffffd460 (cfa = 0x7fffffff94b0) -> 0x7fffffffd470
 >_ULx86_64_step: Frame Chain [RIP=0xffffd468] = 0x1ca017e7
  >_ULx86_64_step: returning 1
 >_ULx86_64_step: (cursor=0x7fffffff7b58, ip=0x000000081ca017e7, cfa=0x00007fffffff94c0)
 >_ULx86_64_step: [RBP=0x7fffffffd460] = 0x7fffffffd470 (cfa = 0x7fffffff94c0) -> 0x7fffffffd4e0
 >_ULx86_64_step: Frame Chain [RIP=0xffffd478] = 0x1ca0178f
  >_ULx86_64_step: returning 1
 >_ULx86_64_step: (cursor=0x7fffffff7b58, ip=0x000000081ca0178f, cfa=0x00007fffffff94d0)
 >_ULx86_64_step: [RBP=0x7fffffffd470] = 0x7fffffffd4e0 (cfa = 0x7fffffff94d0) -> 0x7fffffffd4f0
  >_ULx86_64_step: NULL %rbp loc, returning 0
length(g(168)) = 339

The confusing thing is that it's giving up while unwinding a recursive self-call (which actually turns into a couple of mutually recursive calls as we can see in the alternating addresses here), so I would have thought the structure of the native stack would be consistent all the way down.

@c42f
Copy link
Member

c42f commented Dec 4, 2018

Ah, looks like DWARF lookup failed, so libunwind is falling back to a heuristic:

              /* Heuristic to determine incorrect guess.  For RBP to be a
                 valid frame it needs to be above current CFA, but don't
                 let it go more than a little.  Note that we can't deduce
                 anything about new RBP (rbp1) since it may not be a frame
                 pointer in the frame above.  Just check we get the value. */

which of course can get confused. So I guess this is down to lack of — or somehow broken — dwarf info emitted by the JIT on FreeBSD.

@ararslan
Copy link
Member Author

ararslan commented Dec 4, 2018

I just have to say: absolutely amazing work tracking this down!

@c42f
Copy link
Member

c42f commented Dec 5, 2018

libunwind has really great debug logging. Here's part of the output of

UNW_DEBUG_LEVEL=100 ./julia -e 'g(c) = c <= 0 ? (return backtrace()) : g(c - 1); g(100)' 2> unw.log

which shows that the DWARF info is not found on FreeBSD, even for this normal (non-anonymous) function g() above. The difference between anon and normal functions may simply be that the anon functions break the heuristics used by libunwind when it can't find the DWARF info.

 >_ULx86_64_step: (cursor=0x7fffffffcbb8, ip=0x00000008114ff1e5, cfa=0x00007fffffffd0e0)
                >get_rs_cache: acquiring lock
              >_ULx86_64_dwarf_find_proc_info: looking for IP=0x8114ff1e4
               >_ULx86_64_dwarf_callback: checking /usr/home/tcfoster/julia/usr/bin/julia, base=0x0)
               >_ULx86_64_dwarf_callback: checking /usr/home/tcfoster/julia/usr/lib/libjulia.so.1, base=0x800248000)
               > [...]
               >_ULx86_64_dwarf_callback: checking /usr/home/tcfoster/julia/usr/lib/libumfpack.so, base=0x81cc01000)
               >_ULx86_64_dwarf_callback: checking /usr/home/tcfoster/julia/usr/lib/libspqr.so, base=0x814379000)
               >_ULx86_64_dwarf_callback: checking /libexec/ld-elf.so.1, base=0x800203000)
              >_ULx86_64_dwarf_find_proc_info: IP=0x8114ff1e4 not found
                >put_rs_cache: unmasking signals/interrupts and releasing lock
               >_ULx86_64_dwarf_step: returning -10
             >_ULx86_64_step: dwarf_step() failed (ret=-10), trying frame-chain
                >access_mem: mem[00000008114ff1e5] -> 841f0f66c35d
                >access_mem: mem[00000008114ff1ed] -> 48e5894855000000
                >access_mem: mem[00000008114ff1f5] -> b848388b4808468b
                >access_mem: mem[00000008114ff1e0] -> 66c35dd0ffd231f6
            >_ULx86_64_is_signal_frame: b0 0x66c35dd0ffd231f6
                >access_mem: mem[00000008114ff1e5] -> 841f0f66c35d
                >access_mem: mem[00000008114ff1ed] -> 48e5894855000000
              >is_plt_entry: ip=0x8114ff1e5 => 0x0000841f0f66c35d 0x48e5894855000000, ret = 0
                >access_mem: mem[00007fffffffd0d0] -> 7fffffffd0e0
                >access_mem: mem[00007fffffffd0e0] -> 7fffffffd0f0

It doesn't even seem to be required that we define any function in particular. You can get a "dwarf [...] not found" from libunwind on FreeBSD even with

UNW_DEBUG_LEVEL=100 ./julia -e 'backtrace()'

@c42f
Copy link
Member

c42f commented Dec 5, 2018

More info. Comparing linux and FreeBSD for the following command

UNW_DEBUG_LEVEL=100 ./julia -e 'g() = backtrace(); g()' 2> unw.log

and comparing the first frame which is not found on FreeBSD but is found on linux, we have:

Linux:

 >_ULx86_64_step: (cursor=0x7ffd19606110, ip=0x00007f9fb011d1df, cfa=0x00007ffd19606600)
                >get_rs_cache: acquiring lock
               >lookup: e->start_ip_offset = 0
               >_ULx86_64_dwarf_search_unwind_table: ip=0x7f9fb011d1de, start_ip=0x0
 >_ULx86_64_dwarf_search_unwind_table: e->fde_offset = 18, segbase = 7f9fb001d098, debug_frame_base = 0, fde_addr = 7f9fb001d0b0
            >_ULx86_64_dwarf_extract_proc_info_from_fde: FDE @ 0x7f9fb001d0b0
               >_ULx86_64_dwarf_extract_proc_info_from_fde: looking for CIE at address 7f9fb001d098
               >parse_cie: CIE parsed OK, augmentation = "zR", handler=0x0
               >_ULx86_64_dwarf_extract_proc_info_from_fde: FDE covers IP 0x7f9fb011d1c0-0x7f9fb011d1e1, LSDA=0x0
     >_ULx86_64_fetch_frame: fetch frame ip=0x7f9fb011d1df cfa=0x7ffd19606600 format=0
               >run_cfi_program: CFA_def_cfa r7+0x8
               >run_cfi_program: CFA_offset r16 at cfa+0xfffffffffffffff8
               >run_cfi_program: CFA_advance_loc to 0x7f9fb011d1c1
               >run_cfi_program: CFA_def_cfa_offset 0x10
     >_ULx86_64_cache_frame: cache frame ip=0x7f9fb011d1df cfa=0x7ffd19606600 format=0
                >put_rs_cache: unmasking signals/interrupts and releasing lock
     >_ULx86_64_reuse_frame: reuse frame ip=0x7f9fb011d1df cfa=0x7ffd19606600 format=0 addr=0x0 offset=+0
                >access_mem: mem[00007ffd19606608] -> 7f9fc276941c
               >_ULx86_64_dwarf_step: returning 1
  >_ULx86_64_step: returning 1

FreeBSD:

 >_ULx86_64_step: (cursor=0x7fffffffd218, ip=0x00000008114ff1d6, cfa=0x00007fffffffd740)
                >get_rs_cache: acquiring lock
              >_ULx86_64_dwarf_find_proc_info: looking for IP=0x8114ff1d5
               >_ULx86_64_dwarf_callback: checking /usr/home/tcfoster/julia/usr/bin/julia, base=0x0)
               >_ULx86_64_dwarf_callback: checking /usr/home/tcfoster/julia/usr/lib/libjulia.so.1, base=0x800248000)
               >_ULx86_64_dwarf_callback: checking /lib/libelf.so.2, base=0x8007c8000)
               >_ULx86_64_dwarf_callback: checking /lib/libkvm.so.7, base=0x8007e2000)
               >_ULx86_64_dwarf_callback: checking /usr/lib/librt.so.1, base=0x8007f5000)
               >_ULx86_64_dwarf_callback: checking /usr/home/tcfoster/julia/usr/lib/libgcc_s.so.1, base=0x8007fd000)
               >_ULx86_64_dwarf_callback: checking /usr/lib/libc++.so.1, base=0x800a14000)
               >_ULx86_64_dwarf_callback: checking /lib/libcxxrt.so.1, base=0x800ae3000)
               >_ULx86_64_dwarf_callback: checking /lib/libm.so.5, base=0x800b04000)
               >_ULx86_64_dwarf_callback: checking /lib/libc.so.7, base=0x800b36000)
               >_ULx86_64_dwarf_callback: checking /usr/home/tcfoster/julia/usr/lib/libLLVM-6.0.so, base=0x801000000)
               >_ULx86_64_dwarf_callback: checking /lib/libthr.so.3, base=0x800f29000)
               >_ULx86_64_dwarf_callback: checking /usr/lib/libexecinfo.so.1, base=0x800f54000)
               >_ULx86_64_dwarf_callback: checking /usr/home/tcfoster/julia/usr/lib/julia/sys.so, base=0x803e00000)
               >_ULx86_64_dwarf_callback: checking /usr/home/tcfoster/julia/usr/lib/libpcre2-8.so, base=0x81112b000)
               >_ULx86_64_dwarf_callback: checking /usr/home/tcfoster/julia/usr/lib/libgmp.so, base=0x8112ee000)
               >_ULx86_64_dwarf_callback: checking /usr/home/tcfoster/julia/usr/lib/libmpfr.so, base=0x81136e000)
               >_ULx86_64_dwarf_callback: checking /usr/home/tcfoster/julia/usr/lib/libdSFMT.so, base=0x811f2a000)
               >_ULx86_64_dwarf_callback: checking /usr/home/tcfoster/julia/usr/lib/libopenblas64_.so, base=0x812000000)
               >_ULx86_64_dwarf_callback: checking /usr/home/tcfoster/julia/usr/lib/libgfortran.so.4, base=0x813d4b000)
               >_ULx86_64_dwarf_callback: checking /usr/home/tcfoster/julia/usr/lib/libquadmath.so.0, base=0x81411c000)
               >_ULx86_64_dwarf_callback: checking /usr/home/tcfoster/julia/usr/lib/libcholmod.so, base=0x818602000)
               >_ULx86_64_dwarf_callback: checking /usr/home/tcfoster/julia/usr/lib/libcolamd.so, base=0x811fde000)
               >_ULx86_64_dwarf_callback: checking /usr/home/tcfoster/julia/usr/lib/libamd.so, base=0x811fe8000)
               >_ULx86_64_dwarf_callback: checking /usr/home/tcfoster/julia/usr/lib/libcamd.so, base=0x81435b000)
               >_ULx86_64_dwarf_callback: checking /usr/home/tcfoster/julia/usr/lib/libccolamd.so, base=0x814368000)
               >_ULx86_64_dwarf_callback: checking /usr/home/tcfoster/julia/usr/lib/libsuitesparseconfig.so, base=0x811ff4000)
               >_ULx86_64_dwarf_callback: checking /usr/home/tcfoster/julia/usr/lib/libsuitesparse_wrapper.so, base=0x811ff8000)
               >_ULx86_64_dwarf_callback: checking /usr/home/tcfoster/julia/usr/lib/libumfpack.so, base=0x81aa01000)
               >_ULx86_64_dwarf_callback: checking /usr/home/tcfoster/julia/usr/lib/libspqr.so, base=0x814379000)
               >_ULx86_64_dwarf_callback: checking /libexec/ld-elf.so.1, base=0x800203000)
              >_ULx86_64_dwarf_find_proc_info: IP=0x8114ff1d5 not found
                >put_rs_cache: unmasking signals/interrupts and releasing lock
               >_ULx86_64_dwarf_step: returning -10
             >_ULx86_64_step: dwarf_step() failed (ret=-10), trying frame-chain
                >access_mem: mem[00000008114ff1d6] -> c35d10c48348
                >access_mem: mem[00000008114ff1de] -> 0
                >access_mem: mem[00000008114ff1e6] -> 0
                >access_mem: mem[00000008114ff1d1] -> c48348d0ffd231f6
            >_ULx86_64_is_signal_frame: b0 0xc48348d0ffd231f6
                >access_mem: mem[00000008114ff1d6] -> c35d10c48348
                >access_mem: mem[00000008114ff1de] -> 0
              >is_plt_entry: ip=0x8114ff1d6 => 0x0000c35d10c48348 0x0000000000000000, ret = 0
                >access_mem: mem[00007fffffffd730] -> 7fffffffd750
                >access_mem: mem[00007fffffffd750] -> 7fffffffd780
 >_ULx86_64_step: [RBP=0x7fffffffd730] = 0x7fffffffd750 (cfa = 0x7fffffffd740) -> 0x7fffffffd780
                >access_mem: mem[00007fffffffd758] -> 800306e63
 >_ULx86_64_step: Frame Chain [RIP=0xffffd758] = 0x306e63
  >_ULx86_64_step: returning 1

not sure what to make of this yet.

@iblislin
Copy link
Member

iblislin commented Dec 5, 2018

Oh, okay. It cannot find the debug info.
Now, I issued gmake debug and re-run the code. It looks correct now.

I think the reason of CI pass is gmake test always builds debug stuffs.

@c42f
Copy link
Member

c42f commented Dec 5, 2018

It cannot find debug info for specific frames, including JITted ones. But other frames are found just fine, including ones from the sysimg.

gmake debug could well be masking the issue. I suggest turning on libunwind debug logging to be sure of what's going on (eg edit deps/srccache/libunwind-1.1-julia2/include/libunwind_i.h to #define UNW_DEBUG 1 always)

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
domain:error handling Handling of exceptions by Julia or the user system:freebsd Affects only FreeBSD
Projects
None yet
Development

Successfully merging a pull request may close this issue.

3 participants