JIT compile native calls #676

niner · 2017-09-06T18:20:08Z

This is accompanied by jit_nativecall branches in nqp and rakudo.

jnthn · 2017-09-06T18:44:29Z

src/spesh/facts.c

@@ -606,6 +606,11 @@ static void add_bb_facts(MVMThreadContext *tc, MVMSpeshGraph *g, MVMSpeshBB *bb,
            g->facts[result.reg.orig][result.reg.i].flags |= MVM_SPESH_FACT_DECONTED;
            break;
        }
+        case MVM_OP_nativecallinvokejit: {
+            for (i = 0; i < g->num_locals; i++)
+                g->facts[i][g->fact_counts[i] - 1].usages++;


This looks weird and fragile. It seems to be giving every local a bump of 1...why? Usage counts are computed up at line 376 for every op.

The nativecallinvokejit OP is a bit special as it expects the arguments to be in the first locals of the call frame. Spesh didn't know this and removed them as dead weight. Upping the usage count is done to reflect the actual usage.

Ouch, I think what I originally suggested (though it probably wasn't clear why, and so probably sounded odd...) was that nativecallinvokejit used the standard args buffer and then was just a kind of invoke instruction (so marked -a in oplist to terminate an args sequence). Then args would be placed into the args buffer like a normal call, and later we'd pull the values out of the frame->args buffer instead of frame-work (so I suspect it's a relatively small change on that side of things). This way, there'd be no issues with usage counts and so forth, since args in the arg buffer are obviously used by the callee.

Having spent so long of late removing cheats like this in spesh (many driven by the fact that each of them bit back somewhere and cost a bunch of debugging time), I'm quite reluctant to see another one like this go in. Especially as I suspect it's actually not sufficient in the case that a frame with nativecallinvokejit in it were to be inlined. If that happens, then we don't re-run the whole facts process on the inlinee, just do some much cheaper usage count setting, and so on that code path we wouldn't do extra usage bumps like is being done here, and then dead code elimination would eat them. We could of course do similar compensation as we do here, but that only makes me fear where we get caught out next...

jnthn · 2017-09-06T18:45:23Z

src/6model/reprs/NativeCall.c

@@ -38,6 +38,7 @@ static void copy_to(MVMThreadContext *tc, MVMSTable *st, void *src, MVMObject *d
        memcpy(dest_body->arg_types, src_body->arg_types, src_body->num_args * sizeof(MVMint16));
    }
    dest_body->ret_type = src_body->ret_type;
+    dest_body->jitcode = src_body->jitcode;


I think this file also wants a line for jitcode added to gc_free

zhuomingliang · 2017-09-06T19:14:00Z

src/core/oplist

@@ -830,6 +830,7 @@ atomicstore_o       r(obj) r(obj) :invokish
 atomicstore_i       r(obj) r(int64)
 barrierfull
 coveragecontrol     r(int64)
+nativecallinvokejit w(obj) r(obj) r(obj)


Seems that some args are not used

bdw

Pretty good, but please address the cleanup of the jitcode parameter.

bdw · 2017-09-22T05:37:53Z

src/6model/reprs/NativeCall.c

@@ -88,6 +89,8 @@ static void gc_cleanup(MVMThreadContext *tc, MVMSTable *st, void *data) {
        MVM_free(body->arg_types);
    if (body->arg_info)
        MVM_free(body->arg_info);
+    if (body->jitcode)
+        MVM_free(body->jitcode);


This will leak pages allocated by MVM_jit_compiler_assemble, needs a MVM_jit_code_destroy

bdw · 2017-09-22T06:31:16Z

src/jit/x64/emit.dasc

+    case MVM_JIT_REG_DYNIDX:
+        | get_cur_op TMP5;
+        | xor TMP6, TMP6;
+        | mov TMP6w, word [TMP5 + arg.v.reg*2]


Fwiw, the two lines above can, I think, become:
| movzx TMP6, word [TMP5 + arg.v.reg*2];
And, the nicer thing to do (I think) would probably be to use a two-word sized type:
| movzs TMP6, word U16:TMP5[arg.v.reg*2];

I think this is only necessary because you might call this nativecall object from separate positions in the bytecode (with the argument stored in a different location each time). In which case this makes sense, but it would probably be a good idea to add some documentation to explain why this can't be 'hardcoded'.

bdw · 2017-09-22T06:36:37Z

src/jit/x64/emit.dasc

    case MVM_JIT_DATA_LABEL:
        | lea TMP6, [=>(arg.v.lit_i64)];
        break;
+    case MVM_JIT_SAVED_RV:
+        | mov TMP6, qword [rbp-0x28];
+        break;


There are other ways to do this; specifically, I've been using the args array as a temporary buffer in the compilation of unless_o and if_o. Not sure if that is possible in this case, but it would save you the implementation of SAVED_RV.
Alternatively, you can give [rbp-0x28] a symbolic name with
|.define SAVED_RV, [rbp-0x28]

I don't see an advantage of using the args buffer over just putting it on the stack. I'd have to locate the args buffer first, at least for ahead of time compilation causing unnecessary memory churn. I have it the symbolic name. Works like a charm :)

bdw · 2017-09-22T06:38:47Z

src/jit/x64/emit.dasc

+        | mov TMP6, TC->cur_frame;
+        | mov TMP6, FRAME:TMP6->args;
+        | mov TMP6, qword REGISTER:TMP6[arg.v.lit_i64];
+        break;


The frame args and frame local variables are always contiguous (or used to be, maybe @jnthn knows), so if you know the size of the whole frame you can compute the offset directly from args (and save yourself having to read FRAME->args, because you can just use rbx instead).

Yes, it would be safe to do that.

At the time when this code is compiled, the code calling ncinvoke_o does not even exist yet. So while args is always part of WORK, I don't know yet where exactly in WORK it will end up in. Hence the need to read FRAME->args.

Later when the frame is speshed and compiled again, this information is available and I emit MVM_JIT_PARAM_I64 instead of MVM_JIT_ARG_I64. The former can access WORK directly, as the offsets are known.

bdw · 2017-09-22T06:41:02Z

src/jit/x64/emit.dasc

+        | mov TMP6, FRAME:TMP6->args;
+        | mov TMP6, qword REGISTER:TMP6[arg.v.lit_i64];
+        | mov TMP6, aword STOOGE:TMP6->data;
+        break;


Same as above, but more importantly, I think this should have a more descriptive name because it dereferences the data pointer of a MVMObjectStooge, which I wouldn't expect from the name.

Naming is hard :) The enums are named after what is passed. Pointers are passed as CPointer objects which contain the actual pointer in their data. So it's indeed one level of dereferencing, but the result is still a pointer which is passed to the native function. So MVM_JIT_ARG_PTR does indeed result in a Pointer being passed. It just does the unboxing, too.
Maybe a comment like this ^^^ will suffice?

Yep, I'll be perfectly happy with a comment.

jnthn · 2017-09-22T09:12:38Z

src/core/nativecall.c

@@ -372,8 +372,162 @@ static const char *dlerror(void)
 }
 #endif

+void init_c_call_node(MVMThreadContext *tc, MVMSpeshGraph *sg, MVMJitNode *node, void *func_ptr, MVMint16 num_args, MVMJitCallArg *args) {


Does this bunch of code need to be conditionally compiled depending on whether or not we build the JIT?

I can't find any conditional compilation of JIT code? Only the emit part but not the graph itself.

What we do is 'stub' the JIT compiled code. The JIT graph declarations are always compiled, though, so this should be safe. We do not, in fact, have declared a MVM_JIT_IS_COMPILED parameter or something of that sort, so at worst, this is wasteful.

jnthn · 2017-09-22T09:14:52Z

src/jit/graph.c

+            }
+
+            body = MVM_nativecall_get_nc_body(tc, object_facts->value.o);
+            nc_jg = MVM_nativecall_jit_graph_for_caller_code(tc, iter->graph, body, restype, dst);


This may be premature, but just wondering if the code doing this maybe should be somewhere under src/jit/x64/, since it's specific to that. Would be fine enough seeing that as a future cleanup though.

MVM_nativecall_build builds a JIT graph manually and compiles it. The result is attached to the MVMNativeCallBody. For now the new nativeinvoke_* ops (mirroring invoke_*) will enter that code. Ideally we will replace the whole frame containing the nativeinvoke by JIT compiled code. The nativeinvoke_o op just invokes the JITed code which will handle argument fetching on its own so we can avoid the overhead of copying arguments into an nqp::list. We need to unblock GC on the current thread after we return from the native function and before we box the return value. Otherwise if GC is triggered by the boxing function, we will hang as GC is blocked. You can now tell the JIT that the C function you want to call should take an argument from WORK[cur_op + whatever] which is the equivalent of all those GET_REG(cur_op, whatever) in interp.c. Same for return values. When in interp.c you'd write GET_REG(cur_op, 0) = ...; you tell the JIT: node->u.call.rv_mode = MVM_JIT_RV_DYNIDX; box_rv_node->u.call.rv_idx = 0;

While we already generated machine code for performing the actual call to the C function, we perform all deconting and unboxing of arguments in high level code. This is so that spesh can optimize this code to what's actually necessary for a given call site. For JIT compilation of the specialized code, we compile the call to the C function again, but this time instead of tailoring it to getting the return type from the runloop, we use can access information gathered by spesh for determining the register holding the return type. The generated JIT graph is seamlessly included in the wrapper function body's graph. A function call with a single argument thus results in JIT compiled code for: <sp_getarg_o>, <set>, <decont_i>, invokish control guard, <set>, <PHI>, <PHI>, <wval>, <wval>, MVM_gc_mark_thread_blocked, the actual call, MVM_gc_mark_thread_unblocked, call to box the return value, <return_o> Since at this stage the code calling the nativeinvoke_o op cannot change anymore, we can access the WORK registers directly, saving the copying of those to tc->cur_frame->args.

niner · 2017-09-22T15:34:40Z

I've addressed all the excellent suggestions (some by adding explanatory comments).

jnthn

I'm happy with this now. Nice work, thanks!

jnthn reviewed Sep 6, 2017

View reviewed changes

zhuomingliang reviewed Sep 6, 2017

View reviewed changes

niner force-pushed the jit_nativecall branch 5 times, most recently from ecdba29 to 1edff49 Compare September 19, 2017 19:42

bdw reviewed Sep 22, 2017

View reviewed changes

jnthn reviewed Sep 22, 2017

View reviewed changes

niner force-pushed the jit_nativecall branch from 1edff49 to 71aab57 Compare September 22, 2017 13:13

niner added 2 commits September 22, 2017 17:32

niner force-pushed the jit_nativecall branch from 71aab57 to 76ff81c Compare September 22, 2017 15:32

jnthn approved these changes Sep 27, 2017

View reviewed changes

jnthn merged commit b3dd812 into master Sep 27, 2017

jnthn deleted the jit_nativecall branch September 27, 2017 13:00

Provide feedback

Saved searches

Use saved searches to filter your results more quickly

JIT compile native calls #676

JIT compile native calls #676

niner commented Sep 6, 2017

jnthn Sep 6, 2017

niner Sep 6, 2017

jnthn Sep 6, 2017

jnthn Sep 6, 2017

zhuomingliang Sep 6, 2017 •

edited

Loading

bdw left a comment

bdw Sep 22, 2017

bdw Sep 22, 2017

bdw Sep 22, 2017

niner Sep 22, 2017

bdw Sep 22, 2017

jnthn Sep 22, 2017

niner Sep 22, 2017

bdw Sep 22, 2017

niner Sep 22, 2017

bdw Sep 25, 2017

jnthn Sep 22, 2017

niner Sep 22, 2017

bdw Sep 25, 2017

jnthn Sep 22, 2017

niner commented Sep 22, 2017

jnthn left a comment

JIT compile native calls #676

JIT compile native calls #676

Conversation

niner commented Sep 6, 2017

Choose a reason for hiding this comment

Choose a reason for hiding this comment

Choose a reason for hiding this comment

Choose a reason for hiding this comment

zhuomingliang Sep 6, 2017 • edited Loading

Choose a reason for hiding this comment

bdw left a comment

Choose a reason for hiding this comment

Choose a reason for hiding this comment

Choose a reason for hiding this comment

Choose a reason for hiding this comment

Choose a reason for hiding this comment

Choose a reason for hiding this comment

Choose a reason for hiding this comment

Choose a reason for hiding this comment

Choose a reason for hiding this comment

Choose a reason for hiding this comment

Choose a reason for hiding this comment

Choose a reason for hiding this comment

Choose a reason for hiding this comment

Choose a reason for hiding this comment

Choose a reason for hiding this comment

niner commented Sep 22, 2017

jnthn left a comment

Choose a reason for hiding this comment

zhuomingliang Sep 6, 2017 •

edited

Loading