strings tests segfaulting on arm #12347

Closed
ViralBShah opened this issue Jul 28, 2015 · 13 comments
Labels
system:arm ARMv7 and AArch64

Comments

@ViralBShah
Member

The strings types and basic tests are segfaulting on arm.

               _
   _       _ _(_)_     |  A fresh approach to technical computing
  (_)     | (_) (_)    |  Documentation: http://docs.julialang.org
   _ _   _| |_  __ _   |  Type "help()" for help.
  | | | | | | |/ _` |  |
  | | |_| | | | (_| |  |  Version 0.4.0-dev+6253 (2015-07-26 15:40 UTC)
 _/ |\__'_|_|_|\__'_|  |  Commit a97b24b* (1 day old master)
|__/                   |  arm-linux-gnueabihf
@ViralBShah ViralBShah added the system:arm ARMv7 and AArch64 label Jul 28, 2015
@ViralBShah
Member Author

The string test itself is crashing as well.

@ViralBShah
Member Author

This is the stacktrace.

(gdb) set args runtests.jl string
(gdb) r
Starting program: /home/viral/julia/usr/bin/julia-debug runtests.jl string
[Thread debugging using libthread_db enabled]
Using host libthread_db library "/lib/arm-linux-gnueabihf/libthread_db.so.1".
     * string               
Program received signal SIGSEGV, Segmentation fault.
0xb5f9e140 in __memcpy_neon () at ../ports/sysdeps/arm/armv7/multiarch/memcpy_impl.S:332
332     ../ports/sysdeps/arm/armv7/multiarch/memcpy_impl.S: No such file or directory.
(gdb) bt
#0  0xb5f9e140 in __memcpy_neon () at ../ports/sysdeps/arm/armv7/multiarch/memcpy_impl.S:332
#1  0xb60fb992 in jl_try_substrtod (str=0x0, offset=2, len=3054483733) at builtins.c:804
#2  0x93094088 in julia_parse_20934 () at parse.jl:163
#3  0x93094240 in jlcall_parse_20934 ()
#4  0xb60f8fa4 in jl_apply (f=0x95804620, args=0xbeffd5c8, nargs=2) at julia.h:1263
#5  0xb60fc104 in jl_trampoline (F=0x95804620, args=0xbeffd5c8, nargs=2) at builtins.c:979
#6  0xb60efe38 in jl_apply (f=0x95804620, args=0xbeffd5c8, nargs=2) at julia.h:1263
#7  0xb60f3d8c in jl_apply_generic (F=0x94add540, args=0xbeffd5c8, nargs=2) at gf.c:1675
#8  0x9309608c in julia_anonymous_20933 () at test.jl:90
#9  0xb60f8fa4 in jl_apply (f=0x95804490, args=0x0, nargs=0) at julia.h:1263
#10 0xb60fc104 in jl_trampoline (F=0x95804490, args=0x0, nargs=0) at builtins.c:979
#11 0x932070f4 in julia_do_test_20685 (body=<optimized out>, qex=<optimized out>) at test.jl:50
#12 0xb60efe38 in jl_apply (f=0x95d73140, args=0xbeffd904, nargs=2) at julia.h:1263
#13 0xb60f3ce6 in jl_apply_generic (F=0x94a1aa20, args=0xbeffd904, nargs=2) at gf.c:1651
#14 0xb617f0c0 in jl_apply (f=0x94a1aa20, args=0xbeffd904, nargs=2) at julia.h:1263
#15 0xb617f486 in do_call (f=0x94a1aa20, args=0x9593cf34, nargs=2, eval0=0x0, locals=0x0, nl=0, ngensym=0)
    at interpreter.c:65
#16 0xb617fef6 in eval (e=0x95804440, locals=0x0, nl=0, ngensym=0) at interpreter.c:212
#17 0xb617f1ec in jl_interpret_toplevel_expr (e=0x95804440) at interpreter.c:27
#18 0xb6197596 in jl_toplevel_eval_flex (e=0x95804330, fast=1) at toplevel.c:524
#19 0xb6197898 in jl_parse_eval_all (fname=0x95d32a90 "/home/viral/julia/test/strings/basic.jl", len=39)
    at toplevel.c:574
#20 0xb6197a82 in jl_load (fname=0x95d32a90 "/home/viral/julia/test/strings/basic.jl", len=39) at toplevel.c:614
#21 0xb6197af4 in jl_load_ (str=0x95d72ed0) at toplevel.c:622
#22 0xb4997aa0 in julia_include_6828 (fname=<optimized out>) at boot.jl:254
#23 0xb60efe38 in jl_apply (f=0x93eb0850, args=0xbeffe1dc, nargs=1) at julia.h:1263
#24 0xb60f3ce6 in jl_apply_generic (F=0x93eb0780, args=0xbeffe1dc, nargs=1) at gf.c:1651
#25 0xb4a760f0 in julia_include_from_node1_17053 (path=<optimized out>) at loading.jl:197
#26 0xb60efe38 in jl_apply (f=0x94b94710, args=0xbeffe2bc, nargs=1) at julia.h:1263
#27 0xb60f3ce6 in jl_apply_generic (F=0x94b946a0, args=0xbeffe2bc, nargs=1) at gf.c:1651
#28 0xb617f0c0 in jl_apply (f=0x94b946a0, args=0xbeffe2bc, nargs=1) at julia.h:1263
#29 0xb617f486 in do_call (f=0x94b946a0, args=0x95d328b4, nargs=1, eval0=0x0, locals=0x0, nl=0, ngensym=0)
    at interpreter.c:65
#30 0xb617fef6 in eval (e=0x95d72e20, locals=0x0, nl=0, ngensym=0) at interpreter.c:212
#31 0xb617f1ec in jl_interpret_toplevel_expr (e=0x95d72e20) at interpreter.c:27
#32 0xb6197596 in jl_toplevel_eval_flex (e=0x95d72e00, fast=1) at toplevel.c:524
---Type <return> to continue, or q <return> to quit---
#33 0xb6197898 in jl_parse_eval_all (fname=0x95d32810 "/home/viral/julia/test/string.jl", len=32) at toplevel.c:574
#34 0xb6197a82 in jl_load (fname=0x95d32810 "/home/viral/julia/test/string.jl", len=32) at toplevel.c:614
#35 0xb6197af4 in jl_load_ (str=0x95d72de0) at toplevel.c:622
#36 0xb4997aa0 in julia_include_6828 (fname=<optimized out>) at boot.jl:254
#37 0xb60efe38 in jl_apply (f=0x93eb0850, args=0xbeffeb8c, nargs=1) at julia.h:1263
#38 0xb60f3ce6 in jl_apply_generic (F=0x93eb0780, args=0xbeffeb8c, nargs=1) at gf.c:1651
#39 0xb4ac27d8 in julia_include_from_node1_18574 (path=<optimized out>) at loading.jl:197
#40 0xb60efe38 in jl_apply (f=0x94b94720, args=0xbeffec54, nargs=1) at julia.h:1263
#41 0xb60f3ce6 in jl_apply_generic (F=0x94b946a0, args=0xbeffec54, nargs=1) at gf.c:1651
#42 0x9320f1dc in julia_runtests_20679 () at /home/viral/julia/test/testdefs.jl:196
#43 0x9320f3e8 in jlcall_runtests_20679 ()
#44 0xb60f8fa4 in jl_apply (f=0x95b8e7b0, args=0xbeffed80, nargs=1) at julia.h:1263
#45 0xb60fc104 in jl_trampoline (F=0x95b8e7b0, args=0xbeffed80, nargs=1) at builtins.c:979
#46 0xb60efe38 in jl_apply (f=0x95b8e7b0, args=0xbeffed80, nargs=1) at julia.h:1263
#47 0xb60f3d8c in jl_apply_generic (F=0x95b30320, args=0xbeffed80, nargs=1) at gf.c:1675
#48 0xb60f8fa4 in jl_apply (f=0x95b30320, args=0xbeffed80, nargs=1) at julia.h:1263
#49 0xb60fa79a in jl_f_apply (F=0x0, args=0xbeffee14, nargs=2) at builtins.c:485
#50 0x93215098 in julia_anonymous_20676 () at multi.jl:658
#51 0xb60f8fa4 in jl_apply (f=0x95b8e720, args=0x0, nargs=0) at julia.h:1263
#52 0xb60fc104 in jl_trampoline (F=0x95b8e720, args=0x0, nargs=0) at builtins.c:979
#53 0x9324f208 in julia_run_work_thunk_20640 (thunk=<optimized out>) at multi.jl:619
#54 0x932190a0 in julia_remotecall_fetch_20674 (w=<optimized out>, f=<optimized out>) at multi.jl:692
#55 0xb60f8fa4 in jl_apply (f=0x95b8d050, args=0xbefff180, nargs=3) at julia.h:1263
#56 0xb60fc104 in jl_trampoline (F=0x95b8d050, args=0xbefff180, nargs=3) at builtins.c:979
#57 0xb60efe38 in jl_apply (f=0x95b8d050, args=0xbefff180, nargs=3) at julia.h:1263
#58 0xb60f3d8c in jl_apply_generic (F=0x94a1a140, args=0xbefff180, nargs=3) at gf.c:1675
#59 0xb60f8fa4 in jl_apply (f=0x94a1a140, args=0xbefff180, nargs=3) at julia.h:1263
#60 0xb60fa79a in jl_f_apply (F=0x0, args=0xbefff220, nargs=3) at builtins.c:485
#61 0x9321b0d4 in julia_remotecall_fetch_20673 (id=0, f=<optimized out>) at multi.jl:707
#62 0xb60f8fa4 in jl_apply (f=0x95b50770, args=0xbefff340, nargs=3) at julia.h:1263
#63 0xb60fc104 in jl_trampoline (F=0x95b50770, args=0xbefff340, nargs=3) at builtins.c:979
#64 0xb60efe38 in jl_apply (f=0x95b50770, args=0xbefff340, nargs=3) at julia.h:1263
#65 0xb60f3d8c in jl_apply_generic (F=0x94a1a140, args=0xbefff340, nargs=3) at gf.c:1675
#66 0xb60f8fa4 in jl_apply (f=0x94a1a140, args=0xbefff340, nargs=3) at julia.h:1263
#67 0xb60fa79a in jl_f_apply (F=0x0, args=0xbefff5d4, nargs=3) at builtins.c:485
#68 0x93234298 in julia_anonymous_20656 () at task.jl:1395
---Type <return> to continue, or q <return> to quit---
#69 0xb60f8fa4 in jl_apply (f=0x95806b50, args=0x0, nargs=0) at julia.h:1263
#70 0xb60fc104 in jl_trampoline (F=0x95806b50, args=0x0, nargs=0) at builtins.c:979
#71 0xb6189044 in jl_apply (f=0x95806b50, args=0x0, nargs=0) at julia.h:1263
#72 0xb6189564 in start_task () at task.c:232
#73 0xb61895aa in set_base_ctx (__stk=0xbefff6db "") at task.c:241
#74 0xb6189606 in julia_init (rel=JL_IMAGE_JULIA_HOME) at task.c:256
#75 0x0000a8ce in main (argc=2, argv=0xbefff868) at repl.c:583

@maleadt
Member

maleadt commented Nov 26, 2015

Culprit seems to be the parse of a Float (32 or 64):

               _
   _       _ _(_)_     |  A fresh approach to technical computing
  (_)     | (_) (_)    |  Documentation: http://docs.julialang.org
   _ _   _| |_  __ _   |  Type "?help" for help.
  | | | | | | |/ _` |  |
  | | |_| | | | (_| |  |  Version 0.4.2-pre+15 (2015-11-20 12:12 UTC)
 _/ |\__'_|_|_|\__'_|  |  Commit c4bbb89 (5 days old release-0.4)
|__/                   |  arm-linux-gnueabihf

julia> parse(Float64,"1\n")
signal (11): Segmentation fault
Segmentation fault

Will be trying on 0.5, but seeing how this is a RPi2 it might take a while to compile :-)

@ViralBShah
Member Author

Fails on 0.5 too. Just tried it on scaleway.

@ViralBShah
Member Author

The crash is in here:

s = "1"; 
ccall(:jl_try_substrtod, Nullable{Float64}, (Ptr{UInt8},Csize_t,Csize_t), s, 0, sizeof(s))

@maleadt
Member

maleadt commented Nov 27, 2015

Some investigation:

$ gdb --args ./julia -e 'ccall(:jl_try_substrtod, Nullable{Float64}, (Ptr{UInt8},Csize_t,Csize_t), "42", 0, 2)'
Program received signal SIGSEGV, Segmentation fault.
__memcpy_neon () at ../ports/sysdeps/arm/armv7/multiarch/memcpy_impl.S:786

(gdb) frame 1
#1  0x75f38356 in jl_try_substrtod (str=0x0, offset=2, len=1978893001) at /home/tbesard/julia/src/builtins.c:818

So the malloc fails because of the outrageous len. Seems off...

(gdb) x/10x $sp
0x7effea28: 0x75f382c9  0x00000002  0x00000000  0x54e35670
0x7effea38: 0x7effea40  0x75f8af6d  0x00000002  0x75f382cb
0x7effea48: 0x00000000  0x00000000

(gdb) printf "%s\n", 0x54e35670
42

Seems like the stack pointer is corrupt: it should start at 0x7effea32, not at 0x7effea28. The value that's being interpreted as len, 0x75f382c9, points 1 byte into jl_try_substrtod:

(gdb) info symbol 0x75f382c9
jl_try_substrtod + 1 in section .text of /home/tbesard/julia/usr/lib/libjulia-debug.so

@maleadt
Member

maleadt commented Nov 27, 2015

A shorter repro:

// gcc -ggdb -fPIC -shared test.c -o test.so -Wall

#include <stdio.h>
#include <stdint.h>


void foo(int magic)
{
    printf("magic: %d\n", magic);
}


// from julia.h
typedef struct {
    uint8_t isnull;
    double value;
} jl_nullable_float64_t;

jl_nullable_float64_t bar(int magic)
{
    printf("magic: %d\n", magic);

    jl_nullable_float64_t ret = {(uint8_t)0, 0.0};
    return ret;
}

Call this library using:

lib = Libdl.dlopen("./test.so")
ccall(Libdl.dlsym(lib, :foo), Void,              (Int,), 42)
ccall(Libdl.dlsym(lib, :bar), Nullable{Float64}, (Int,), 42)

This yields the following output:

$ ./julia ../test.jl 
magic: 42
magic: 1376867701

signal (11): Segmentation fault

So probably an issue in how ccall manages the return value.
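To make the return-value angle concrete, here is a minimal sketch of what `bar` roughly looks like once a struct return is lowered to a hidden result pointer. It assumes the struct is returned in memory on this target, and `bar_lowered` and `needs_memory_return` are hypothetical names used only for illustration. They are not part of Julia or of the repro above.

```c
#include <assert.h>
#include <stdint.h>
#include <stdio.h>

/* Same layout as the struct in the repro. */
typedef struct {
    uint8_t isnull;
    double value;
} jl_nullable_float64_t;

/* Hypothetical hand-lowered form of bar(): the caller allocates the result
   and passes its address as an explicit first argument, so `magic` moves
   from the first argument slot to the second. */
void bar_lowered(jl_nullable_float64_t *result, int magic)
{
    assert(result != NULL);
    printf("magic: %d\n", magic);
    result->isnull = 0;
    result->value = 0.0;
}

/* Returns 1 when the struct does not fit in a single 32-bit register. */
int needs_memory_return(void)
{
    return sizeof(jl_nullable_float64_t) > 4;
}
```

If the caller emits the call as if `magic` were still the first argument, the callee reads `magic` from a slot the caller never filled. That would be consistent with the garbage value printed above.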

@ViralBShah
Member Author

Perhaps this is what @vtjnash was referring to about implementing ARM calling conventions, and is also the warning LLVM shows.

@vtjnash
Sponsor Member

vtjnash commented Nov 28, 2015

yes. it's best to fix compile warnings and the ccall test before trying to investigate any other segfaults. fortunately, the ARM ABI is pretty simple, so the fallback ABI is almost correct.
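One possible reading of the earlier gdb frame (`str=0x0, offset=2, len=1978893001` for a call made with `"42", 0, 2`) is that the callee expected a hidden result pointer in r0 while the caller loaded the arguments without one. The sketch below models that shift with plain integers. The types and function names are hypothetical, and the register assignment is an assumption about the ARM ABI in use, not something taken from Julia's code generator.

```c
#include <assert.h>
#include <stdint.h>

/* Hypothetical model of the four core argument registers r0..r3. */
typedef struct {
    uintptr_t r[4];
} arg_regs_t;

/* What a callee sees if it was compiled to take a hidden result
   pointer in r0, followed by (str, offset, len). */
typedef struct {
    uintptr_t result_addr; /* r0 */
    uintptr_t str;         /* r1 */
    uintptr_t offset;      /* r2 */
    uintptr_t len;         /* r3 */
} callee_view_t;

/* Caller unaware of the hidden pointer: (str, offset, len) go into
   r0..r2 and r3 keeps whatever stale value it held. */
arg_regs_t load_without_hidden_pointer(uintptr_t str, uintptr_t offset,
                                       uintptr_t len, uintptr_t stale_r3)
{
    arg_regs_t regs = {{str, offset, len, stale_r3}};
    return regs;
}

/* Callee side: reinterpret the same registers with the hidden pointer first. */
callee_view_t read_with_hidden_pointer(arg_regs_t regs)
{
    callee_view_t view = {regs.r[0], regs.r[1], regs.r[2], regs.r[3]};
    return view;
}
```

Feeding in the values from the gdb session (string pointer 0x54e35670, offset 0, len 2, stale r3 of 0x75f382c9) gives the callee `str=0`, `offset=2` and `len=0x75f382c9`, which matches the frame shown in the investigation.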

@maleadt
Member

maleadt commented Nov 28, 2015

Okay, had a look at it, and with this minimal ABI the strings testsuite succeeds. Still having issues with the ccall tests though (composite types > 4 bytes aren't returned correctly, eg. ccalltest::cgtest fails).
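For reference, the rule behind the "> 4 bytes" remark can be sketched as a small classifier. This is a simplified model of the 32-bit ARM procedure call standard as I understand it, not the ABI code from this work. `ret_type_info_t` and `classify_return` are hypothetical names, and wider scalars, vectors and other corner cases are ignored.

```c
#include <assert.h>
#include <stddef.h>

typedef enum {
    RET_CORE_REGISTERS, /* r0, plus further core registers for wider scalars */
    RET_VFP_REGISTERS,  /* s0../d0.. under the hard-float variant */
    RET_MEMORY          /* caller passes a hidden result address (sret) */
} ret_class_t;

typedef struct {
    size_t size;                /* sizeof the return type in bytes */
    int is_composite;           /* struct/union/array rather than a scalar */
    int is_float_scalar;        /* float or double scalar */
    int homogeneous_fp_members; /* 1..4 if all members share one FP type, else 0 */
} ret_type_info_t;

/* Simplified classification: scalars stay in registers, small or
   homogeneous floating-point composites stay in registers, and every
   other composite larger than 4 bytes is returned in memory. */
ret_class_t classify_return(ret_type_info_t t, int hard_float)
{
    assert(t.size > 0);
    if (!t.is_composite)
        return (hard_float && t.is_float_scalar) ? RET_VFP_REGISTERS
                                                 : RET_CORE_REGISTERS;
    if (hard_float && t.homogeneous_fp_members >= 1 &&
        t.homogeneous_fp_members <= 4)
        return RET_VFP_REGISTERS;
    return t.size > 4 ? RET_MEMORY : RET_CORE_REGISTERS;
}
```

Under this model a `{uint8_t, double}` struct such as `jl_nullable_float64_t` (16 bytes, mixed members) lands in `RET_MEMORY`, while a bare `double` does not.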

@nalimilan
Member

Could this be related to #13752?

@maleadt
Member

maleadt commented Nov 28, 2015

My current ABI implementation wouldn't make any difference, I think, as none of the return types of the ccalls in your issue would trigger an sret. Could be some other ABI issue though.

@maleadt maleadt mentioned this issue Nov 30, 2015
@yuyichao
Contributor

Should be fixed by #14194 now. There could be other issues that make the strings test fail, but that would be a different issue.

5 participants