Major steps toward finishing LJ_GC64 mode #149

Closed
wants to merge 15 commits into
base: v2.1
from

9 participants
@corsix

corsix commented Mar 27, 2016

This PR is intended to make major progress towards #25.

At least for me, the x64/LJ_GC64 trace recorder and JIT compiler can handle assorted small test cases and can run DynASM. However, I have not, for example, gone through and audited that everything in lj_asm_x86.h and lj_emit_x86.h behaves correctly under LJ_GC64, so some broken corners probably remain.

The ARM/ARM64/MIPS/PPC backends are currently broken due to the IR overhaul, and require some work to mend before merging of this PR can be considered, but said mending can be carried out in parallel with more testing of the x64/LJ_GC64 mode. The x86 and x64/!LJ_GC64 backends might also be broken, though they should at least still compile, and hopefully aren't too broken.

Changes are roughly separated out into commits to ease comprehension, though I don't guarantee that all of the intermediate states are meaningful.

@lukego

lukego commented Mar 27, 2016

Cool!

JFYI: after reading this PR, and #25, and the two emails linked from #25, I still have no idea what "LJ_GC64 mode" actually is :).

@corsix

corsix commented Mar 27, 2016

A mode in which the Lua heap is limited to the low 2^47 bytes of address space rather than the low 2^31 (thus removing the need for some linker contortions on OSX, making it easier to have multiple lua_States in a 64-bit process, and allowing more/larger objects in the Lua heap).
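
To make the 47-bit limit concrete, here is a minimal C sketch of a 64-bit tagged value that keeps an address in its low 47 bits and a type tag in the upper 17. The mask is the same `((uint64_t)1 << 47) - 1` expression that shows up in the assertion messages later in this thread. The helper names (`pack_ref`, `unpack_addr`, `unpack_tag`, `heap_limit_bytes`) are hypothetical and are not LuaJIT's API.

```c
#include <assert.h>
#include <stdint.h>

/* Low 47 bits hold the address; the same mask appears in the
** LJ_GC64 assertion text quoted in this thread. */
#define ADDR_BITS 47
#define ADDR_MASK (((uint64_t)1 << ADDR_BITS) - 1)

/* Hypothetical helper: combine a 17-bit tag with a 47-bit address. */
static uint64_t pack_ref(uint64_t addr, uint32_t tag)
{
  assert((addr & ~ADDR_MASK) == 0);  /* Address must fit in 47 bits. */
  assert(tag < (1u << 17));          /* Tag must fit in the upper 17 bits. */
  return addr | ((uint64_t)tag << ADDR_BITS);
}

/* Hypothetical helpers: split a tagged value back into its parts. */
static uint64_t unpack_addr(uint64_t v) { return v & ADDR_MASK; }
static uint32_t unpack_tag(uint64_t v) { return (uint32_t)(v >> ADDR_BITS); }

/* Size of the address range reachable with the given number of bits. */
static uint64_t heap_limit_bytes(unsigned bits) { return (uint64_t)1 << bits; }
```

With these definitions, `heap_limit_bytes(31)` is 2 GiB and `heap_limit_bytes(47)` is 128 TiB, which is the difference between the two modes described above.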

@NukeRusich

NukeRusich commented Mar 27, 2016

Thank you, Corsix!

@MikePall

Member

MikePall commented Mar 28, 2016

I've cherry-picked and committed the tangential parts.

  • The disassembly of BMI2 instructions is suboptimal ('vshlx' etc).
  • I would use: #define IRTSIZE_PGC (LJ_GC64 ? 8 : 4)

@corsix corsix force-pushed the corsix:x64 branch from aae2771 to b8a8079 Mar 28, 2016

@corsix

corsix commented Mar 28, 2016

I've rebased to account for cherry-picking, introduced IRTSIZE_PGC, and also slipped in a few fixes.

I'm still working on putting together a cross-compilation test environment for non-x86/x64, and figuring out what use I can make of the not-yet-tidied-up test suite.

@corsix corsix force-pushed the corsix:x64 branch 5 times, most recently from 383e1bc to ee1fbce Mar 29, 2016

@corsix corsix force-pushed the corsix:x64 branch from ee1fbce to a2a34af Apr 9, 2016

@corsix corsix force-pushed the corsix:x64 branch 4 times, most recently from 97cf347 to f509b07 Apr 16, 2016

@corsix

corsix commented Apr 17, 2016

I think that this PR is now in a reasonable state for all backends.

@CapsAdmin

CapsAdmin commented Apr 17, 2016

I crash when I try to use your version:

0  0x000000000bcbc8fa in TRACE_19 () at ../../../src/lua/modules/syscall/linux/types.lua:430
1  0x000000000040eb32 in lua_pcall (L=L@entry=0x7ffff7fd6378, nargs=nargs@entry=0, nresults=<optimized out>, errfunc=errfunc@entry=2) at lj_api.c:1055
2  0x00000000004042e0 in docall (L=0x7ffff7fd6378, narg=0, clear=0) at luajit.c:121
3  0x00000000004051a6 in handle_script (n=<optimized out>, argv=<optimized out>, L=<optimized out>) at luajit.c:288
4  pmain (L=0x7ffff7fd6378) at luajit.c:537
5  0x00000000004210ca in lj_BC_FUNCC ()
6  0x000000000040ebbd in lua_cpcall (L=L@entry=0x7ffff7fd6378, func=func@entry=0x404900 <pmain>, ud=ud@entry=0x0) at lj_api.c:1079
7  0x0000000000403df6 in main (argc=2, argv=0x7fffffffde88) at luajit.c:565

-joff:

0  0x000000000bcbf93a in TRACE_4 () at /media/caps/ssd_840_120gb/goluwa/src/lua/libraries/filesystem/base_file.lua:31
1  0x000000000040eb32 in lua_pcall (L=L@entry=0x7ffff7fd6378, nargs=nargs@entry=0, nresults=<optimized out>, errfunc=errfunc@entry=2) at lj_api.c:1055
2  0x00000000004042e0 in docall (L=0x7ffff7fd6378, narg=0, clear=1) at luajit.c:121
3  0x00000000004051a6 in handle_script (n=<optimized out>, argv=<optimized out>, L=<optimized out>) at luajit.c:288
4  pmain (L=0x7ffff7fd6378) at luajit.c:537
5  0x00000000004210ca in lj_BC_FUNCC ()
6  0x000000000040ebbd in lua_cpcall (L=L@entry=0x7ffff7fd6378, func=func@entry=0x404900 <pmain>, ud=ud@entry=0x0) at lj_api.c:1079
7  0x0000000000403df6 in main (argc=3, argv=0x7fffffffde68) at luajit.c:565

make XCFLAGS+=-DLUAJIT_ENABLE_GC64 XCFLAGS+=-DLUAJIT_ENABLE_LUA52COMPAT XCFLAGS+=-DLUAJIT_USE_GDBJIT CCDEBUG=-g

They don't look very useful. I can upload dump files but I'm not sure what you would need specifically. I'm not experienced with gdb and debugging C programs in general.

@DemiMarie

DemiMarie commented Apr 17, 2016

@CapsAdmin try recompiling with -DLUA_USE_ASSERT and see if you get an assertion failure.

@corsix

corsix commented Apr 17, 2016

Ah, LUAJIT_ENABLE_LUA52COMPAT isn't something any of my configurations exercise.

@CapsAdmin

CapsAdmin commented Apr 17, 2016

it makes no difference here

@CapsAdmin

CapsAdmin commented Apr 17, 2016

I missed the suggestion about turning asserts on, but lua52compat makes no difference.

0  0x00007ffff7320418 in __GI_raise (sig=sig@entry=6) at ../sysdeps/unix/sysv/linux/raise.c:54
1  0x00007ffff732201a in __GI_abort () at abort.c:89
2  0x00007ffff7318bd7 in __assert_fail_base (fmt=<optimized out>, assertion=assertion@entry=0x47e4e7 "lj_obj_equal(tv, &tvk)", file=file@entry=0x47e404 "lj_record.c", 
   line=line@entry=142, function=function@entry=0x47fb10 <__PRETTY_FUNCTION__.5893> "rec_check_slots") at assert.c:92
3  0x00007ffff7318c82 in __GI___assert_fail (assertion=assertion@entry=0x47e4e7 "lj_obj_equal(tv, &tvk)", file=file@entry=0x47e404 "lj_record.c", line=line@entry=142, 
   function=function@entry=0x47fb10 <__PRETTY_FUNCTION__.5893> "rec_check_slots") at assert.c:101
4  0x000000000043b836 in rec_check_slots (J=0x7ffff7fd6678) at lj_record.c:142
5  lj_record_ins (J=J@entry=0x7ffff7fd6678) at lj_record.c:2032
6  0x000000000041b6f8 in trace_state (L=0x7ffff7fd6378, dummy=<optimized out>, ud=0x7ffff7fd6678) at lj_trace.c:641
7  0x0000000000427903 in lj_vm_cpcall ()
8  0x000000000041c50a in lj_trace_ins (J=0x7ffff7fd6678, pc=pc@entry=0x7ffff7f0b7d0) at lj_trace.c:702
9  0x000000000040a649 in lj_dispatch_call (L=0x7ffff7fd6378, pc=0x7ffff7f0b7d4) at lj_dispatch.c:493
10 0x0000000000429329 in lj_vm_hotcall ()
11 0x0000000000411d54 in lua_pcall (L=L@entry=0x7ffff7fd6378, nargs=nargs@entry=0, nresults=-1, errfunc=errfunc@entry=2) at lj_api.c:1055
12 0x0000000000404500 in docall (L=0x7ffff7fd6378, narg=0, clear=0) at luajit.c:121
13 0x00000000004053f6 in handle_script (n=<optimized out>, argv=<optimized out>, L=<optimized out>) at luajit.c:288
14 pmain (L=0x7ffff7fd6378) at luajit.c:537
15 0x000000000042750d in lj_BC_FUNCC ()
16 0x0000000000411e39 in lua_cpcall (L=L@entry=0x7ffff7fd6378, func=func@entry=0x404b20 <pmain>, ud=ud@entry=0x0) at lj_api.c:1079
17 0x0000000000404016 in main (argc=2, argv=0x7fffffffde88) at luajit.c:565

@corsix corsix force-pushed the corsix:x64 branch from f509b07 to 28e6b99 Apr 18, 2016

@corsix

corsix commented Apr 18, 2016

... and the current test suite doesn't exercise ffi very much. I guess that should be my next focus.

@joeyu

joeyu commented Apr 20, 2016

In terms of enabling the ARM64 jit compiler, my question is: shall it be based on LJ_GC64 (&& LJ_FR2)?

It seems that !LJ_GC64 is the default configuration for x64 in v2.1 unless LUAJIT_ENABLE_GC64 is specified. And I'm not clear on the status of LJ_GC64 (&& LJ_FR2) for x64. For instance, is its performance competitive with the 32-bit (!LJ_GC64) version?

@MikePall

Member

MikePall commented Apr 21, 2016

There is no choice for ARM64. The interpreter requires LJ_GC64, due to iOS restrictions: lowest 4GB cannot be mapped, no workaround.

@NukeRusich

NukeRusich commented Apr 22, 2016

How about on Android?

@corsix corsix force-pushed the corsix:x64 branch from 28e6b99 to 69e1ba7 Apr 24, 2016

@joeyu

joeyu commented Apr 28, 2016

@MikePall Thanks.

@corsix I'd appreciate knowing more about the remaining effort required, e.g. which files and how many lines of code need to be added or modified, in the common part (i.e. IR and other frontend/middle-end components) and the architecture-specific part (i.e. the backend, I suppose) respectively.

It would be great to have a rough estimate in man-months for an experienced LuaJIT developer (e.g. you, as Mike has suggested) and for a newbie (e.g. me :)), if applicable.

I'm investigating a plan of enabling ARM64. And sponsorship is an open option.

Thanks in advance.

@corsix

corsix commented Apr 28, 2016

If you're willing to gamble on this PR (or something close to it) being merged, then the common parts in this PR should be good enough to use to start building the ARM64-specific assembler. Personally, I have no intent of looking at ARM64 until x64 is finished.

I'm afraid that I'm unable to make a good estimate of how much work remains to be done in the common part (in the architecture-specific part, a reasonable first-order estimate is to look at the ARM backend, and assume that ARM64 would be a similar amount of code/complexity).

@corsix corsix force-pushed the corsix:x64 branch from 69e1ba7 to 7213658 May 18, 2016

@corsix

corsix commented May 18, 2016

1st big commit has been reworked and split up into smaller pieces (though I've been short on time the last 10 days, so it isn't as polished or tested as I might otherwise like it to be). (old version is at https://github.com/corsix/LuaJIT/tree/x64v1 for now, should it still be useful in some way)

@MikePall

Member

MikePall commented May 23, 2016

Reviewed and applied with various improvements and fixes. Phew. Thanks!

Remaining issues:

  • Assertion failure in rec_check_slots() for ffi_metatype and ffi_jit_call tests.
  • Crash in gc_sweep() for stack_purge test.

Enhancements:

  • Various places use a 64 bit store plus a 32 bit load-modify-store to tag a value in memory. This is probably suboptimal for the CPU write combiner, but really needs to be benchmarked.
  • Fix ARM32 asm_fload(), so lj_ir_ggfload() can be used on all archs.
  • Now that the IR is immovable, this may allow various optimizations for constant loads on non-x86 archs, too.
  • Benchmark x64/!LJ_GC64 against x64/LJ_GC64 to find regressions and more tuning opportunities.
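
The first enhancement above can be sketched in C as two ways of writing a tagged 64-bit slot. This is only a model of the access pattern, assuming a 17-bit tag in the upper bits; the `Slot` type and both function names are hypothetical, and whether the single-store form is actually faster is exactly what would need benchmarking.

```c
#include <assert.h>
#include <stdint.h>
#include <string.h>

/* Hypothetical slot type: one 64-bit value viewed as two 32-bit halves. */
typedef union { uint64_t u64; uint32_t u32[2]; } Slot;

/* Index of the 32-bit half that holds the upper bits on this host. */
static int hi_half(void)
{
  const uint16_t probe = 1;
  uint8_t first;
  memcpy(&first, &probe, 1);
  return first ? 1 : 0;  /* Little-endian: the upper half is u32[1]. */
}

/* Pattern described above: a 64-bit store followed by a 32-bit
** load-modify-store that ORs the tag into the upper half. */
static void tag_two_step(volatile Slot *slot, uint64_t addr, uint32_t tag)
{
  assert(tag < (1u << 17));
  slot->u64 = addr;
  slot->u32[hi_half()] |= tag << 15;  /* Bit 47 of u64 is bit 15 of the upper half. */
}

/* Alternative: combine address and tag first, then do one 64-bit store. */
static void tag_one_step(volatile Slot *slot, uint64_t addr, uint32_t tag)
{
  assert(tag < (1u << 17));
  slot->u64 = addr | ((uint64_t)tag << 47);
}
```

Both functions leave the same bits in memory; they differ only in how many stores of which width reach the slot.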

Note: the common LJ_GC64 code is now more or less final, so other archs can start to build upon it.

@CapsAdmin

CapsAdmin commented May 23, 2016

I've compiled the latest changes with XCFLAGS+=-DLUAJIT_ENABLE_LUA52COMPAT XCFLAGS+=-DLUAJIT_ENABLE_GC64 XCFLAGS+=-DLUAJIT_USE_GDBJIT XCFLAGS+=-DLUA_USE_ASSERT CCDEBUG=-g

When trying to use this in my project I get this assertion error on startup:

Starting program: /media/caps/ssd_840_120gb/goluwa/data/bin/linux_x64/luajit ../../../src/lua/init.lua
[Thread debugging using libthread_db enabled]
Using host libthread_db library "/lib/x86_64-linux-gnu/libthread_db.so.1".
luajit: lj_record.c:111: rec_check_slots: Assertion `((((GCobj *)(((((tv)-1)->gcr).gcptr64) & (((uint64_t)1 << 47) - 1)))))->gch.gct == ~(~8u)' failed.

Program received signal SIGABRT, Aborted.
0x00007ffff7320418 in __GI_raise (sig=sig@entry=6) at ../sysdeps/unix/sysv/linux/raise.c:54
54      ../sysdeps/unix/sysv/linux/raise.c: No such file or directory.
(gdb) bt
0  0x00007ffff7320418 in __GI_raise (sig=sig@entry=6) at ../sysdeps/unix/sysv/linux/raise.c:54
1  0x00007ffff732201a in __GI_abort () at abort.c:89
2  0x00007ffff7318bd7 in __assert_fail_base (fmt=<optimized out>, assertion=assertion@entry=0x47f178 "((((GCobj *)(((((tv)-1)->gcr).gcptr64) & (((uint64_t)1 << 47) - 1)))))->gch.gct == ~(~8u)", file=file@entry=0x47e974 "lj_record.c", 
    line=line@entry=111, function=function@entry=0x480080 <__PRETTY_FUNCTION__.5890> "rec_check_slots") at assert.c:92
3  0x00007ffff7318c82 in __GI___assert_fail (assertion=assertion@entry=0x47f178 "((((GCobj *)(((((tv)-1)->gcr).gcptr64) & (((uint64_t)1 << 47) - 1)))))->gch.gct == ~(~8u)", file=file@entry=0x47e974 "lj_record.c", line=line@entry=111, 
    function=function@entry=0x480080 <__PRETTY_FUNCTION__.5890> "rec_check_slots") at assert.c:101
4  0x000000000043c78f in rec_check_slots (J=0x7ffff7fd6678) at lj_record.c:111
5  lj_record_ins (J=0x7ffff7fd6678) at lj_record.c:2036
6  0x000000000041b828 in trace_state (L=0x7ffff7fd6378, dummy=<optimized out>, ud=0x7ffff7fd6678) at lj_trace.c:651
7  0x0000000000427b38 in lj_vm_cpcall ()
8  0x000000000041c53a in lj_trace_ins (J=0x7ffff7fd6678, pc=<optimized out>) at lj_trace.c:710
9  0x000000000040a39c in lj_dispatch_ins (L=0x7ffff7fd6378, pc=0x7ffff7f8bc7c) at lj_dispatch.c:424
10 0x00000000004294f8 in lj_vm_inshook ()
11 0x0000000000411dd4 in lua_pcall (L=L@entry=0x7ffff7fd6378, nargs=nargs@entry=0, nresults=-1, errfunc=errfunc@entry=2) at lj_api.c:1055
12 0x0000000000404580 in docall (L=0x7ffff7fd6378, narg=0, clear=0) at luajit.c:121
13 0x0000000000405476 in handle_script (n=<optimized out>, argv=<optimized out>, L=<optimized out>) at luajit.c:288
14 pmain (L=0x7ffff7fd6378) at luajit.c:537
15 0x0000000000427742 in lj_BC_FUNCC ()
16 0x0000000000411eb9 in lua_cpcall (L=L@entry=0x7ffff7fd6378, func=func@entry=0x404ba0 <pmain>, ud=ud@entry=0x0) at lj_api.c:1079
17 0x0000000000404096 in main (argc=2, argv=0x7fffffffddd8) at luajit.c:565

But if I increase hotloop it gets a bit further (which I guess means less JIT compiling):

luajit: lj_record.c:111: rec_check_slots: Assertion `((((GCobj *)(((((tv)-1)->gcr).gcptr64) & (((uint64_t)1 << 47) - 1)))))->gch.gct == ~(~8u)' failed.
0  0x00007ffff7320418 in __GI_raise (sig=sig@entry=6) at ../sysdeps/unix/sysv/linux/raise.c:54
1  0x00007ffff732201a in __GI_abort () at abort.c:89
2  0x00007ffff7318bd7 in __assert_fail_base (fmt=<optimized out>, assertion=assertion@entry=0x47f178 "((((GCobj *)(((((tv)-1)->gcr).gcptr64) & (((uint64_t)1 << 47) - 1)))))->gch.gct == ~(~8u)", file=file@entry=0x47e974 "lj_record.c", 
    line=line@entry=111, function=function@entry=0x480080 <__PRETTY_FUNCTION__.5890> "rec_check_slots") at assert.c:92
3  0x00007ffff7318c82 in __GI___assert_fail (assertion=assertion@entry=0x47f178 "((((GCobj *)(((((tv)-1)->gcr).gcptr64) & (((uint64_t)1 << 47) - 1)))))->gch.gct == ~(~8u)", file=file@entry=0x47e974 "lj_record.c", line=line@entry=111, 
    function=function@entry=0x480080 <__PRETTY_FUNCTION__.5890> "rec_check_slots") at assert.c:101
4  0x000000000043c78f in rec_check_slots (J=0x7ffff7fd6678) at lj_record.c:111
5  lj_record_ins (J=0x7ffff7fd6678) at lj_record.c:2036
6  0x000000000041b828 in trace_state (L=0x7ffff7fd6378, dummy=<optimized out>, ud=0x7ffff7fd6678) at lj_trace.c:651
7  0x0000000000427b38 in lj_vm_cpcall ()
8  0x000000000041c53a in lj_trace_ins (J=0x7ffff7fd6678, pc=<optimized out>) at lj_trace.c:710
9  0x000000000040a39c in lj_dispatch_ins (L=0x7ffff7fd6378, pc=0x7ffff7f8bc7c) at lj_dispatch.c:424
10 0x00000000004294f8 in lj_vm_inshook ()
11 0x0000000000411dd4 in lua_pcall (L=L@entry=0x7ffff7fd6378, nargs=nargs@entry=0, nresults=-1, errfunc=errfunc@entry=2) at lj_api.c:1055
12 0x0000000000404580 in docall (L=0x7ffff7fd6378, narg=0, clear=0) at luajit.c:121
13 0x0000000000405476 in handle_script (n=<optimized out>, argv=<optimized out>, L=<optimized out>) at luajit.c:288
14 pmain (L=0x7ffff7fd6378) at luajit.c:537
15 0x0000000000427742 in lj_BC_FUNCC ()
16 0x0000000000411eb9 in lua_cpcall (L=L@entry=0x7ffff7fd6378, func=func@entry=0x404ba0 <pmain>, ud=ud@entry=0x0) at lj_api.c:1079
17 0x0000000000404096 in main (argc=2, argv=0x7fffffffddd8) at luajit.c:565

This happens if I use -joff :

Starting program: /media/caps/ssd_840_120gb/goluwa/data/bin/linux_x64/luajit -joff ../../../src/lua/init.lua
[Thread debugging using libthread_db enabled]
Using host libthread_db library "/lib/x86_64-linux-gnu/libthread_db.so.1".

Program received signal SIGSEGV, Segmentation fault.
lj_cconv_ct_ct (cts=0x7ffff7fe63d0, d=<optimized out>, s=0x7fffe836e0a0, dp=0x7ffff7032958 "\300\374\376\367\377\377\375\377\377\377\377\377\377\377\377\377%", sp=0x7fffe745e324 <error: Cannot access memory at address 0x7fffe745e324>, 
    flags=<optimized out>) at lj_cconv.c:243
243             else i = *(uint8_t *)sp;
(gdb) bt
0  lj_cconv_ct_ct (cts=0x7ffff7fe63d0, d=<optimized out>, s=0x7fffe836e0a0, dp=0x7ffff7032958 "\300\374\376\367\377\377\375\377\377\377\377\377\377\377\377\377%", sp=0x7fffe745e324 <error: Cannot access memory at address 0x7fffe745e324>, 
    flags=<optimized out>) at lj_cconv.c:243
1  0x0000000000451581 in lj_cconv_tv_ct (cts=0x7ffff7fe63d0, s=<optimized out>, sid=<optimized out>, o=0x7ffff7032958, sp=<optimized out>) at lj_cconv.c:389
2  0x0000000000423fe8 in lj_cf_ffi_meta___index (L=0x7ffff7fd6378) at lib_ffi.c:158
3  0x0000000000427742 in lj_BC_FUNCC ()
4  0x0000000000411dd4 in lua_pcall (L=L@entry=0x7ffff7fd6378, nargs=nargs@entry=0, nresults=-1, errfunc=errfunc@entry=2) at lj_api.c:1055
5  0x0000000000404580 in docall (L=0x7ffff7fd6378, narg=0, clear=0) at luajit.c:121
6  0x0000000000405476 in handle_script (n=<optimized out>, argv=<optimized out>, L=<optimized out>) at luajit.c:288
7  pmain (L=0x7ffff7fd6378) at luajit.c:537
8  0x0000000000427742 in lj_BC_FUNCC ()
9  0x0000000000411eb9 in lua_cpcall (L=L@entry=0x7ffff7fd6378, func=func@entry=0x404ba0 <pmain>, ud=ud@entry=0x0) at lj_api.c:1079
10 0x0000000000404096 in main (argc=3, argv=0x7fffffffddd8) at luajit.c:565

I hope this is useful rather than noisy.

@MikePall

Member

MikePall commented May 23, 2016

@CapsAdmin Any testing feedback is useful at this stage of development. Thanks!

We already have some short test cases for the rec_check_slots() assertion, so I'd wait before isolating another test case. However, the last crash is unexpected:

  • Crash in LJ_GC64 interpreted mode with load from FFI aggregate.

Judging from the backtrace, this looks like an out-of-bounds pointer for a uint8_t struct field or uint8_t[] array. If possible, please try to isolate this to a test case. Make sure the issue doesn't happen with !LJ_GC64 (and -joff), as it could be a plain FFI usage error, too.

@CapsAdmin

CapsAdmin commented May 23, 2016

I managed to isolate the last error. Looking at my code, I didn't see anything that would cause an out-of-bounds error, but when I tried to print the pointer (the pointer was in a table that I dumped) it wouldn't crash anymore, so it seemed like a GC issue where it crashes only when it exits the function.

local ffi = require("ffi")

ffi.cdef("void* malloc(size_t size); void free(void* ptr);")

local arr = ffi.cast("uint8_t *", ffi.gc(ffi.C.malloc(256*256*4), ffi.C.free))
-- local arr = ffi.new("uint8_t[?]", 256*256*4) --          <<< won't crash

for i = 0, 100 do
    arr[i] = math.random(50)
    collectgarbage() --          <<< will crash only when using malloc
end

So adding collectgarbage in the middle of the loop in this test case produces the same error trace as before. Using ffi.new works fine though.

Starting program: /media/caps/ssd_840_120gb/goluwa/data/bin/linux_x64/luajit -joff test.lua

Program received signal SIGSEGV, Segmentation fault.
lj_cconv_ct_ct (cts=0x7ffff7fe0bc0, d=0x7ffff7fe0e30, s=<optimized out>, dp=0x7ffff7f86011 <error: Cannot access memory at address 0x7ffff7f86011>, sp=<optimized out>, flags=2) at lj_cconv.c:197
197           else *(int8_t *)dp = (int8_t)i;
(gdb) bt
0  lj_cconv_ct_ct (cts=0x7ffff7fe0bc0, d=0x7ffff7fe0e30, s=<optimized out>, dp=0x7ffff7f86011 <error: Cannot access memory at address 0x7ffff7f86011>, sp=<optimized out>, flags=2) at lj_cconv.c:197
1  0x0000000000451f35 in lj_cconv_ct_tv (cts=0x7ffff7fe0bc0, d=<optimized out>, dp=0x7ffff7f86011 <error: Cannot access memory at address 0x7ffff7f86011>, o=0x7ffff7fe0950, flags=<optimized out>) at lj_cconv.c:627
2  0x00000000004505f0 in lj_cdata_set (cts=<optimized out>, d=<optimized out>, dp=<optimized out>, o=<optimized out>, qual=<optimized out>) at lj_cdata.c:294
3  0x0000000000423f33 in lj_cf_ffi_meta___newindex (L=0x7ffff7fd6378) at lib_ffi.c:178
4  0x0000000000427742 in lj_BC_FUNCC ()
5  0x0000000000411dd4 in lua_pcall (L=L@entry=0x7ffff7fd6378, nargs=nargs@entry=0, nresults=-1, errfunc=errfunc@entry=2) at lj_api.c:1055
6  0x0000000000404580 in docall (L=0x7ffff7fd6378, narg=0, clear=0) at luajit.c:121
7  0x0000000000405476 in handle_script (n=<optimized out>, argv=<optimized out>, L=<optimized out>) at luajit.c:288
8  pmain (L=0x7ffff7fd6378) at luajit.c:537
9  0x0000000000427742 in lj_BC_FUNCC ()
10 0x0000000000411eb9 in lua_cpcall (L=L@entry=0x7ffff7fd6378, func=func@entry=0x404ba0 <pmain>, ud=ud@entry=0x0) at lj_api.c:1079
11 0x0000000000404096 in main (argc=3, argv=0x7fffffffde18) at luajit.c:565

The reason I use malloc like this is that I allocate large pixel buffers which might go beyond 1 GB.

@MikePall

Member

MikePall commented May 23, 2016

That's a misuse of ffi.gc. You must keep the original pointer object passed to (or returned from) ffi.gc, otherwise it's collected. A cast creates a new pointer object. Since you don't keep the original pointer around, the corresponding free function passed to ffi.gc is called when it's collected.

I.e. you need to do the cast inside and not outside of ffi.gc, for example: local arr = ffi.gc(ffi.cast("uint8_t *", ffi.C.malloc(256*256*4)), ffi.C.free)

@CapsAdmin

CapsAdmin commented May 23, 2016

Thanks! Now it's all working except for the rec_check_slots crashes (which prevent me from testing any further).

I see the mistake, but the code that crashed did work before these changes, so I was just lucky until now. Noting this in case anyone runs into the same problem.

@DemiMarie

DemiMarie commented May 24, 2016

Trying to debug this. Hard part is getting a crash in an unoptimized build.

@DemiMarie

DemiMarie commented May 24, 2016

Looking at a core file from an assertion, I find that depth is equal to 3 when the assertion lua_assert(J->framedepth == depth) fails. J->framedepth is 2.

@sindrom91

sindrom91 commented May 26, 2016

It looks like rec_mm_arith has base and basev off by one (+ LJ_FR2?).

That seems to resolve the assertion in ffi_metatype.lua, but there's another "attempt to call a nil value" issue that occurs in the same file. That one happens at ffi_metatype.lua:205.

@DemiMarie

DemiMarie commented May 27, 2016

That trips an assertion in the test lang/meta/arith_jit.lua for me.

@sindrom91

sindrom91 commented May 27, 2016

Yea, it's definitely not that.

Anyway, I continued debugging and what seems to be happening is that tv, being checked at 134, actually is not a number, but an untagged function, which only looks like a number.

Its value was last set in vm_x64.dasc:4000, which says copy func+tag down, but the tag isn't actually there. It seems to have branched from vmeta_call/vmeta_call_ra, which clear type before conditional branch to BC_CALLT.

Anyway, I moved cleartp from 1108 to 1112 and things seem to be working fine.

(I didn't run all tests, just a few. I'll do that tomorrow.)

@CapsAdmin

CapsAdmin commented May 27, 2016

@sindrom91 I'm still getting the rec_check_slots crashes after trying that with assertions enabled.

In my graphical application I have an event manager whose draw callbacks (invoked via xpcall) randomly stop being called for short periods. Knowing my own code, it seems like there's a Lua error, except it's not showing. If I enable assertions, these silent errors are still there, but then I also get the rec_check_slots crash. I don't have the time to look into it much at the moment.

@CapsAdmin

CapsAdmin commented May 28, 2016

Without LUAJIT_ENABLE_GC64 I also get this sometimes.

Thread 1 "luajit" received signal SIGSEGV, Segmentation fault.
0x0000000000431cce in setgcV (it=<optimized out>, v=<optimized out>, o=<optimized out>, L=<optimized out>) at lj_obj.h:873
873       setgcVraw(o, v, it); tvchecklive(L, o);
(gdb) bt
0  0x0000000000431cce in setgcV (it=<optimized out>, v=<optimized out>, o=<optimized out>, L=<optimized out>) at lj_obj.h:873
1  snap_restoreval (J=0x40000558, T=<optimized out>, ex=<optimized out>, snapno=<optimized out>, rfilt=0, ref=<optimized out>, o=0x405bb250) at lj_snap.c:641
2  0x00000000004338c8 in lj_snap_restore (J=<optimized out>, exptr=0x7fffffffd910) at lj_snap.c:867
3  0x00000000004195fb in trace_exit_cp (L=<optimized out>, dummy=<optimized out>, ud=0x7fffffffd8b0) at lj_trace.c:773
4  0x0000000000425e5d in lj_vm_cpcall ()
5  0x000000000041b340 in lj_trace_exit (J=0x40000558, exptr=0x7fffffffd910) at lj_trace.c:843
6  0x00000000004276ac in lj_vm_exit_handler ()
7  0x0000000000000001 in ?? ()
8  0x0000000000000000 in ?? ()
luajit: lj_obj.h:873: setgcV: Assertion `!((((o)->it) - ((~4u)+1)) > ((~13u) - ((~4u)+1))) || ((~((o)->it) == (((GCobj *)(uintptr_t)((o)->gcr).gcptr32))->gch.gct) && !(((((GCobj *)(uintptr_t)((o)->gcr).gcptr32)))->gch.marked & ((((global_State *)(void *)(uintptr_t)(L->glref).ptr32))->gc.currentwhite ^ (0x01 | 0x02)) & (0x01 | 0x02)))' failed.

Thread 1 "luajit" received signal SIGABRT, Aborted.
0x00007ffff7320418 in __GI_raise (sig=sig@entry=6) at ../sysdeps/unix/sysv/linux/raise.c:54
54      ../sysdeps/unix/sysv/linux/raise.c: No such file or directory.
(gdb) bt
0  0x00007ffff7320418 in __GI_raise (sig=sig@entry=6) at ../sysdeps/unix/sysv/linux/raise.c:54
1  0x00007ffff732201a in __GI_abort () at abort.c:89
2  0x00007ffff7318bd7 in __assert_fail_base (fmt=<optimized out>, 
   assertion=assertion@entry=0x4700c0 "!((((o)->it) - ((~4u)+1)) > ((~13u) - ((~4u)+1))) || ((~((o)->it) == (((GCobj *)(uintptr_t)((o)->gcr).gcptr32))->gch.gct) && !(((((GCobj *)(uintptr_t)((o)->gcr).gcptr32)))->gch.marked & ((((global_Sta"..., file=file@entry=0x46ff78 "lj_obj.h", line=line@entry=873, function=function@entry=0x479011 <__PRETTY_FUNCTION__.3548> "setgcV") at assert.c:92
3  0x00007ffff7318c82 in __GI___assert_fail (
   assertion=assertion@entry=0x4700c0 "!((((o)->it) - ((~4u)+1)) > ((~13u) - ((~4u)+1))) || ((~((o)->it) == (((GCobj *)(uintptr_t)((o)->gcr).gcptr32))->gch.gct) && !(((((GCobj *)(uintptr_t)((o)->gcr).gcptr32)))->gch.marked & ((((global_Sta"..., file=file@entry=0x46ff78 "lj_obj.h", line=line@entry=873, function=function@entry=0x479011 <__PRETTY_FUNCTION__.3548> "setgcV") at assert.c:101
4  0x0000000000431e31 in setgcV (it=<optimized out>, v=<optimized out>, o=<optimized out>, L=<optimized out>) at lj_obj.h:873
5  snap_restoreval (J=<optimized out>, T=<optimized out>, ex=<optimized out>, snapno=<optimized out>, rfilt=<optimized out>, ref=<optimized out>, o=0x40c8a360) at lj_snap.c:641
6  0x00000000004338c8 in lj_snap_restore (J=<optimized out>, exptr=0x7fffffffd8c0) at lj_snap.c:867
7  0x00000000004195fb in trace_exit_cp (L=<optimized out>, dummy=<optimized out>, ud=0x7fffffffd860) at lj_trace.c:773
8  0x0000000000425e5d in lj_vm_cpcall ()
9  0x000000000041b340 in lj_trace_exit (J=0x40000558, exptr=0x7fffffffd8c0) at lj_trace.c:843
10 0x00000000004276ac in lj_vm_exit_handler ()
11 0x0000000000000000 in ?? ()

This was tested with the latest changes 56fe899 and compiled with XCFLAGS+=-DLUAJIT_ENABLE_LUA52COMPAT XCFLAGS+=-DLUAJIT_USE_GDBJIT XCFLAGS+=-DLUA_USE_ASSERT CCDEBUG=-g using default jit options. I hope it's not a user mistake again.

@sindrom91

sindrom91 commented Jun 2, 2016

I looked into another issue that occurs in ffi_metatype.lua (attempt to call a nil value) and it seems that setgcV (called from lj_cdata_setfin) is not working as expected. In this particular case, it's supposed to set the TValue to nil, and it does that by setting the type to 11...1 and the value to 00...0, which is not how nil is defined (lj_obj.h:247).

Edit: More importantly, tvisnil check fails, because it checks against whole TValue (not only upper 17 bits).
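
A minimal C sketch of the mismatch described above, assuming a 17-bit tag above a 47-bit payload, a nil tag of all ones, and a nil value of all 64 bits set. The names `tag_pointer`, `is_nil_whole` and `is_nil_tag_only` are hypothetical stand-ins, not the actual macros in lj_obj.h.

```c
#include <assert.h>
#include <stdint.h>

#define TAG_SHIFT 47
#define TAG_NIL   (~0u)            /* Assumption: nil's type tag is all ones. */
#define NIL_VALUE (~(uint64_t)0)   /* Assumption: nil is all 64 bits set. */

/* Generic "tag an address" store, as a GC-object setter would do it. */
static uint64_t tag_pointer(uint64_t addr, uint32_t tag)
{
  assert((addr >> TAG_SHIFT) == 0);  /* Address must fit below the tag. */
  return addr | ((uint64_t)tag << TAG_SHIFT);
}

/* Whole-value comparison, like the tvisnil check described above. */
static int is_nil_whole(uint64_t v) { return v == NIL_VALUE; }

/* Comparison of the upper 17 bits only, for contrast. */
static int is_nil_tag_only(uint64_t v)
{
  return (v >> TAG_SHIFT) == (NIL_VALUE >> TAG_SHIFT);
}
```

Under these assumptions, `tag_pointer(0, TAG_NIL)` yields 0xFFFF800000000000: the tag bits say nil, but the low 47 bits are zero, so a whole-value comparison against all ones fails while a tag-only comparison would succeed.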

@MikePall

Member

MikePall commented Jun 3, 2016

@sindrom91 Thanks! The rec_check_slots and lj_cdata_setfin issues have been fixed.

I've tracked down the issue with the stack_purge test (but haven't fixed it, yet):

If a recorded function tailcalls at the base level, it should get a TREF_FRAME marker for the base slot. This doesn't happen for LJ_FR2, which can also be seen in -jdump=+s for the second trace: it's missing the | that separates the frame.

A missing TREF_FRAME means gotframe = 0 in asm_baseslot. Then asm_tail_link doesn't call asm_stack_check, which guards against stack overflow. The test triggers such an overflow, which in turn causes out-of-bounds writes.
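
As a rough model of that chain of events (not the actual asm_baseslot or asm_tail_link code), the following C sketch scans a list of snapshot entries for a frame flag and keys the stack check on the result. The flag value and both function names are assumptions made for illustration.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

#define FLAG_FRAME 0x010000u  /* Assumed bit position of the frame flag. */

/* Scan entries from the top down; report whether any carries the frame flag. */
static int scan_gotframe(const uint32_t *entries, size_t n)
{
  assert(entries != NULL || n == 0);
  while (n > 0) {
    if (entries[--n] & FLAG_FRAME)
      return 1;
  }
  return 0;
}

/* Simplified model: the stack check is only emitted when a frame was seen. */
static int wants_stack_check(const uint32_t *entries, size_t n)
{
  return scan_gotframe(entries, n);
}
```

In this model, a snapshot whose base slot never received the frame marker makes `wants_stack_check` return 0, which corresponds to the unguarded stack growth described above.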

I'm not sure at the moment whether it's truly unnecessary to deal with LJ_FR2 slot 1 anywhere else.

If yes, then asm_baseslot and jit/dump.lua should be fixed.

If no, then a fix probably needs to touch lj_record_tailcall, rec_check_slots (for the case s == 1) and snapshot_slots. But that causes other problems and I ran out of time to debug this further.

@CapsAdmin

CapsAdmin commented Jun 3, 2016

I get the rec_check_slots error now only if I use my custom jit options (which may be exaggerated). Disabling that I get this though.

luajit: lj_asm_x86.h:603: asm_gencall: Assertion `(ir)->o == IR_KNUM || (ir)->o == IR_KINT64 || (1 && ((ir)->o == IR_KGC || (ir)->o == IR_KPTR || (ir)->o == IR_KKPTR))' failed.

Two different backtraces with same assertion error:
https://gist.github.com/CapsAdmin/3fd7517db8a219d77c5d8840ddae991d

Only happens with the gc64 flag enabled.

As with most of these reports I can probably make an isolated test case but it's very time consuming so I won't unless I'm asked to.

@MikePall

Member

MikePall commented Jun 5, 2016

I've added a fix for the assertion in asm_gencall.

@CapsAdmin

CapsAdmin commented Jun 5, 2016

With custom jit options I get the setgcV crash and the rec_check_slots crash during initialization. By changing the jit options around (they are changed during initialization programmatically) they disappear and I get other crashes every now and then.

gdb backtraces here:
https://gist.github.com/CapsAdmin/fc38ba10ecd75e1bc3934a386030895d

@MikePall MikePall referenced this pull request Oct 13, 2016

Closed

GC64 bugs #221

@corsix

corsix commented Oct 14, 2016

How about the following for the stack_purge issue?

diff --git a/src/jit/dump.lua b/src/jit/dump.lua
index fbadcce..d90bf4d 100644
--- a/src/jit/dump.lua
+++ b/src/jit/dump.lua
@@ -338,6 +338,8 @@ local function formatk(tr, idx, sn)
   elseif t == 21 then -- int64_t
     s = sub(tostring(k), 1, -3)
     if sub(s, 1, 1) ~= "-" then s = "+"..s end
+  elseif sn == 0x1057fff then -- SNAP(1, SNAP_FRAME | SNAP_NORESTORE, REF_NIL)
+    return "----"
   else
     s = tostring(k) -- For primitives.
   end
diff --git a/src/lj_record.c b/src/lj_record.c
index 48018f4..a858ffa 100644
--- a/src/lj_record.c
+++ b/src/lj_record.c
@@ -105,7 +105,7 @@ static void rec_check_slots(jit_State *J)
    lua_assert(tref_isfunc(tr));
 #if LJ_FR2
       } else if (s == 1) {
-   lua_assert(0);
+   lua_assert((tr & ~TREF_FRAME) == 0);
 #endif
       } else if ((tr & TREF_FRAME)) {
    GCfunc *fn = gco2func(frame_gc(tv));
@@ -747,7 +747,7 @@ void lj_record_tailcall(jit_State *J, BCReg func, ptrdiff_t nargs)
   }
   /* Move func + args down. */
   if (LJ_FR2 && J->baseslot == 2)
-    J->base[func+1] = 0;
+    J->base[func+1] = TREF_FRAME;
   memmove(&J->base[-1-LJ_FR2], &J->base[func], sizeof(TRef)*(J->maxslot+1+LJ_FR2));
   /* Note: the new TREF_FRAME is now at J->base[-1] (even for slot #0). */
   /* Tailcalls can form a loop, so count towards the loop unroll limit. */
diff --git a/src/lj_snap.c b/src/lj_snap.c
index 4825997..56235e2 100644
--- a/src/lj_snap.c
+++ b/src/lj_snap.c
@@ -69,7 +69,11 @@ static MSize snapshot_slots(jit_State *J, SnapEntry *map, BCReg nslots)
     TRef tr = J->slot[s];
     IRRef ref = tref_ref(tr);
 #if LJ_FR2
-    if (s == 1) continue;
+    if (s == 1) {
+      if ((tr & TREF_FRAME))
+   map[n++] = SNAP(1, SNAP_FRAME | SNAP_NORESTORE, REF_NIL);
+      continue;
+    }
     if ((tr & (TREF_FRAME | TREF_CONT)) && !ref) {
       TValue *base = J->L->base - J->baseslot;
       tr = J->slot[s] = (tr & 0xff0000) | lj_ir_k64(J, IR_KNUM, base[s].u64);
@@ -470,7 +474,10 @@ void lj_snap_replay(jit_State *J, GCtrace *T)
       goto setslot;
     bloomset(seen, ref);
     if (irref_isk(ref)) {
-      tr = snap_replay_const(J, ir);
+      if (LJ_FR2 && (sn == SNAP(1, SNAP_FRAME | SNAP_NORESTORE, REF_NIL)))
+   tr = 0;
+      else
+   tr = snap_replay_const(J, ir);
     } else if (!regsp_used(ir->prev)) {
       pass23 = 1;
       lua_assert(s != 0);
@@ -484,7 +491,7 @@ void lj_snap_replay(jit_State *J, GCtrace *T)
     }
   setslot:
     J->slot[s] = tr | (sn&(SNAP_CONT|SNAP_FRAME));  /* Same as TREF_* flags. */
-    J->framedepth += ((sn & (SNAP_CONT|SNAP_FRAME)) && s);
+    J->framedepth += ((sn & (SNAP_CONT|SNAP_FRAME)) && (s != LJ_FR2));
     if ((sn & SNAP_FRAME))
       J->baseslot = s+1;
   }
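
For readers wondering about the magic number 0x1057fff in the dump.lua hunk: the comment there says it equals SNAP(1, SNAP_FRAME | SNAP_NORESTORE, REF_NIL). The C sketch below reproduces that value under an assumed field layout (slot in the top 8 bits, flags in the next 8, reference in the low 16). The constant values and the helper names are assumptions chosen to match the number in the diff, not copied from the LuaJIT headers.

```c
#include <assert.h>
#include <stdint.h>

/* Assumed values, picked so that the result matches the diff's constant. */
#define SNAP_FRAME      0x010000u
#define SNAP_NORESTORE  0x040000u
#define REF_NIL         0x7fffu

/* Hypothetical helper: build an entry from slot, flags and reference. */
static uint32_t make_snap(uint32_t slot, uint32_t flags, uint32_t ref)
{
  assert(slot <= 0xffu);
  assert((flags & ~0xff0000u) == 0);
  assert(ref <= 0xffffu);
  return (slot << 24) + flags + ref;
}

/* Hypothetical helpers: split an entry back into its fields. */
static uint32_t snap_slot(uint32_t sn)  { return sn >> 24; }
static uint32_t snap_flags(uint32_t sn) { return sn & 0xff0000u; }
static uint32_t snap_ref(uint32_t sn)   { return sn & 0xffffu; }
```

With this layout, `make_snap(1, SNAP_FRAME | SNAP_NORESTORE, REF_NIL)` is 0x01000000 + 0x050000 + 0x7fff = 0x1057fff, i.e. slot 1 flagged as a frame that needs no restore and refers to nil.
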
@MikePall

Member

MikePall commented Oct 16, 2016

Applied.

Closing here. Moving possible enhancements to #225 #226 #227.

Please open new issues for any remaining LJ_GC64 bugs.

@MikePall MikePall closed this Oct 16, 2016