Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Fix or ditch trace stitching #13

Closed
MikePall opened this issue Aug 18, 2015 · 5 comments

Comments

Projects
None yet
2 participants
@MikePall
Copy link
Member

commented Aug 18, 2015

Trace stitching is currently broken and disabled.

It was quite useful for some code bases with many C calls that would otherwise cause a NYI. It was not so useful for other code bases where enough effort has been put into converting the code to do C calls with the FFI (e.g. OpenResty).

Previous discussion: http://www.freelists.org/post/luajit/small-script-to-reproduce-bogus-trace-stitch-errors-at-line-0-with-coroutines-in-latest-21,1
(please read the whole thread)

If this is to be fixed and re-enabled, please note the stitching heuristics haven't been successfully tuned at all. The trace length turned out not to be a useful criterion, when evaluated in isolation. The -Ominstitch limit was effectively used as an on/off switch.

@mraleph

This comment has been minimized.

Copy link

commented Aug 19, 2015

I've been thinking about this today because code base I work with hits a lot of "NYI: C function" aborts.

I came up with a way to fix stitching which is while being somewhat hacky - seems to be our best shot.

The core idea is to store boxed stitch link on the stack instead of storing trace number directly on the stack. There is no lightweight boxed GCobj so I just decided to go with GCudata for this:

slot where traceno used to be
v                             v base
[   *    ][  cont  ][  frame ][  ....  ]
    | 
    \--> [ GCudata { int32_t link } ]

link value is either:

  • positive value - linked state - trace number that cont_stitch should proceed to;
  • zero - blacklisted state, cont_stitch should proceed to cont_nop;
  • negative value - need to stitch state - start tracing and then link trace number ~link (note: ~ not - because link can be 0) to the newly created trace;

Every time tracer modifies ->link for some GCtrace or flushes this trace away it checks if this trace has linktype == LJ_TRLINK_STITCH in which case it finds stitch link box by looking at the last slots of the last snapshot and patches the value contained in the box.

This way even if the trace dies the stitch link can live on (with a 0 inside because lj_trace_free will set it to 0) and cont_stitch will function properly.

There is one very dodgy part here: touching stitch box object inside lj_trace_free - because we don't know whether trace & box are dying together (in which case box could very well be already freed by a sweeper) or box continues to live on. The way I "solve" this dependency is by assigning a metatable with a dummy __gc finalizer to the udata representing the stitch box. This way GCtrace object always dies before its stitch box (trace dies during the sweep phase, stitch box dies much later) which makes it safe to look at the stitch box inside lj_trace_free.

This solution has certain memory overhead (a udata with finalizer per stitched trace, plus a metatable containing that finalizer if any trace was stitched) but this overhead seems to be reasonably low and is only paid for stitched traces. Solution has a low implementation complexity and doesn't seem to have any architecture implications because it mostly reuses already existing machinery.

Alternative solution I considered was to write GCtrace itself into the slot where its number used to be written - but that requires massaging things like IR typing (which does not support GCtrace constants - it's "type" slot is taken by IRT_P64) and solving a self-reference problem (have to patch IR constants once actual GCtrace is allocated by trace_save). So this seems to be more complicated compared to the workaround described above.

I have a patch implementing this fix for x86 ready - so, Mike, if you think this approach has a reasonable ratio of cost/benefit/hackiness[*] and you don't see a mistake in my reasoning then I'll port it to all other supported platforms and send you a PR.

Otherwise, it was a fun exercise - and at least I found another bug in process :)

[*] - I personally feel that it's OK approach, though I am not happy with the fact that I have to box a 4 byte number into a box with 24 byte header (on x86).

@MikePall

This comment has been minimized.

Copy link
Member Author

commented Aug 19, 2015

After pondering over this for a while, it becomes clear what the real issue was: let the GC do its work and not try to second-guess it with a link number instead of a proper GC reference on the stack. Thank you for the insight!

I think putting the GCtrace object itself on the stack would be the cleanest solution. Freeing the trace needs to blacklist the link, of course.

The chicken-and-egg problem with code generation could be solved by putting a load from a non-moving location into the IR and filling in the final GCtrace later. Maybe split lj_ir_k64_find() into search and alloc and only use the alloc part. Then it's always the last allocated K64 constant. Since irt_to_itype() is dumb, one could just tag the load as IRT_P64 and asm_stack_restore() would do its job. There are a bunch of asserts that might get in the way, but they'll trigger quickly.

@mraleph

This comment has been minimized.

Copy link

commented Aug 19, 2015

Yep, I completely agree with your assessment that putting GCtrace into that slot is without doubt the cleanest solution [I just was not sure how complicated it would be - so decided to approach it from the side].

The only draw back it has, as far as I see it, is that continuations pointing to these almost dead traces will retain more memory compared to a separate stitch link box approach: whole GCtrace object + potentially all side traces, sibling root traces and linked traces - in the corner case when trace becomes almost dead when GCproto, it was anchored to, died (as opposed to explicit trace flushing - which will just prune strong links between traces and minimize set of retained objects).

I will give it a shot and send you a PR.

Freeing the trace needs to blacklist the link, of course.

Do you mean flushing? Freeing (as in lj_trace_free) won't happen while GCtrace is referenced from the stack. Yep, flushing will mark trace as blacklisted.

@MikePall

This comment has been minimized.

Copy link
Member Author

commented Aug 19, 2015

Memory use ought to be irrelevant for a weird corner case that took 1.5 years to reproduce. ;-)

Yes, flushing.

@MikePall

This comment has been minimized.

Copy link
Member Author

commented Aug 29, 2015

Thanks! The second approach has been applied to my git master (with modifications).

@MikePall MikePall closed this Aug 29, 2015

akopytov pushed a commit to akopytov/LuaJIT that referenced this issue Oct 15, 2016

gut pushed a commit to PPC64/LuaJIT that referenced this issue Aug 30, 2017

Gustavo Serra Scalet
Fix debug information for PPC64
Removed the complicated handling of lj_vm_ffi_call (it was a variable
size frame) and now backtrace works all over (e.g:)

 #0  0x00003fffb7d4875c in __libc_send (fd=32, buf=0x3fffb09a0028, len=8192, flags=0) at ../sysdeps/unix/sysv/linux/send.c:31
 #1  0x00003fffb7bea214 in socket_send (ps=0x3fffb7bc7778, data=0x3fffb09a0028 'A' <repeats 200 times>..., count=8192, sent=0x3fffffffee60, tm=0x3fffb7bc97d8) at usocket.c:205
 #2  0x00003fffb7be4ef8 in sendraw (buf=0x3fffb7bc77a0, data=0x3fffb09a0028 'A' <repeats 200 times>..., count=52428800, sent=0x3fffffffeee8) at buffer.c:176
 #3  0x00003fffb7be4960 in buffer_meth_send (L=0x3fffb7f6d280, buf=0x3fffb7bc77a0) at buffer.c:87
 LuaJIT#4  0x00003fffb7bec3f4 in meth_send (L=0x3fffb7f6d280) at tcp.c:130
 LuaJIT#5  0x0000000010042d44 in lj_BC_FUNCC ()
 LuaJIT#6  0x0000000010043f24 in lj_ff_coroutine_resume ()
 LuaJIT#7  0x000000001001d7d4 in lua_pcall (L=0x3fffb7f60378, nargs=0, nresults=-1, errfunc=2) at lj_api.c:1129
 LuaJIT#8  0x00000000100045e8 in docall (L=0x3fffb7f60378, narg=0, clear=0) at luajit.c:121
 LuaJIT#9  0x00000000100053ec in handle_script (L=0x3fffb7f60378, argx=0x3ffffffffa40) at luajit.c:291
 LuaJIT#10 0x0000000010006600 in pmain (L=0x3fffb7f60378) at luajit.c:551
 LuaJIT#11 0x0000000010042d44 in lj_BC_FUNCC ()
 LuaJIT#12 0x000000001001da40 in lua_cpcall (L=0x3fffb7f60378, func=0x10006334 <pmain>, ud=0x0) at lj_api.c:1153
 LuaJIT#13 0x00000000100067a4 in main (argc=2, argv=0x3ffffffffa38) at luajit.c:580
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
You can’t perform that action at this time.