-
Notifications
You must be signed in to change notification settings - Fork 172
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
JIT compile native calls #676
Conversation
src/spesh/facts.c
Outdated
@@ -606,6 +606,11 @@ static void add_bb_facts(MVMThreadContext *tc, MVMSpeshGraph *g, MVMSpeshBB *bb, | |||
g->facts[result.reg.orig][result.reg.i].flags |= MVM_SPESH_FACT_DECONTED; | |||
break; | |||
} | |||
case MVM_OP_nativecallinvokejit: { | |||
for (i = 0; i < g->num_locals; i++) | |||
g->facts[i][g->fact_counts[i] - 1].usages++; |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
This looks weird and fragile. It seems to be giving every local a bump of 1...why? Usage counts are computed up at line 376 for every op.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
The nativecallinvokejit OP is a bit special as it expects the arguments to be in the first locals of the call frame. Spesh didn't know this and removed them as dead weight. Upping the usage count is done to reflect the actual usage.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Ouch, I think what I originally suggested (though it probably wasn't clear why, and so probably sounded odd...) was that nativecallinvokejit
used the standard args buffer and then was just a kind of invoke
instruction (so marked -a
in oplist
to terminate an args sequence). Then args would be placed into the args buffer like a normal call, and later we'd pull the values out of the frame->args
buffer instead of frame-work
(so I suspect it's a relatively small change on that side of things). This way, there'd be no issues with usage counts and so forth, since args in the arg buffer are obviously used by the callee.
Having spent so long of late removing cheats like this in spesh (many driven by the fact that each of them bit back somewhere and cost a bunch of debugging time), I'm quite reluctant to see another one like this go in. Especially as I suspect it's actually not sufficient in the case that a frame with nativecallinvokejit
in it were to be inlined. If that happens, then we don't re-run the whole facts process on the inlinee, just do some much cheaper usage count setting, and so on that code path we wouldn't do extra usage bumps like is being done here, and then dead code elimination would eat them. We could of course do similar compensation as we do here, but that only makes me fear where we get caught out next...
@@ -38,6 +38,7 @@ static void copy_to(MVMThreadContext *tc, MVMSTable *st, void *src, MVMObject *d | |||
memcpy(dest_body->arg_types, src_body->arg_types, src_body->num_args * sizeof(MVMint16)); | |||
} | |||
dest_body->ret_type = src_body->ret_type; | |||
dest_body->jitcode = src_body->jitcode; |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I think this file also wants a line for jitcode
added to gc_free
src/core/oplist
Outdated
@@ -830,6 +830,7 @@ atomicstore_o r(obj) r(obj) :invokish | |||
atomicstore_i r(obj) r(int64) | |||
barrierfull | |||
coveragecontrol r(int64) | |||
nativecallinvokejit w(obj) r(obj) r(obj) |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Seems that some args are not used
ecdba29
to
1edff49
Compare
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Pretty good, but please address the cleanup of the jitcode parameter.
src/6model/reprs/NativeCall.c
Outdated
@@ -88,6 +89,8 @@ static void gc_cleanup(MVMThreadContext *tc, MVMSTable *st, void *data) { | |||
MVM_free(body->arg_types); | |||
if (body->arg_info) | |||
MVM_free(body->arg_info); | |||
if (body->jitcode) | |||
MVM_free(body->jitcode); |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
This will leak pages allocated by MVM_jit_compiler_assemble, needs a MVM_jit_code_destroy
src/jit/x64/emit.dasc
Outdated
case MVM_JIT_REG_DYNIDX: | ||
| get_cur_op TMP5; | ||
| xor TMP6, TMP6; | ||
| mov TMP6w, word [TMP5 + arg.v.reg*2] |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Fwiw, the two lines above can, I think, become:
| movzx TMP6, word [TMP5 + arg.v.reg*2];
And, the nicer thing to do (I think) would probably be to use a two-word sized type:
| movzs TMP6, word U16:TMP5[arg.v.reg*2];
I think this is only necessary because you might call this nativecall object from separate positions in the bytecode (with the argument stored in a different location each time). In which case this makes sense, but it would probably be a good idea to add some documentation to explain why this can't be 'hardcoded'.
src/jit/x64/emit.dasc
Outdated
case MVM_JIT_DATA_LABEL: | ||
| lea TMP6, [=>(arg.v.lit_i64)]; | ||
break; | ||
case MVM_JIT_SAVED_RV: | ||
| mov TMP6, qword [rbp-0x28]; | ||
break; |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
There are other ways to do this; specifically, I've been using the args array as a temporary buffer in the compilation of unless_o and if_o. Not sure if that is possible in this case, but it would save you the implementation of SAVED_RV.
Alternatively, you can give [rbp-0x28]
a symbolic name with
|.define SAVED_RV, [rbp-0x28]
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I don't see an advantage of using the args buffer over just putting it on the stack. I'd have to locate the args buffer first, at least for ahead of time compilation causing unnecessary memory churn. I have it the symbolic name. Works like a charm :)
| mov TMP6, TC->cur_frame; | ||
| mov TMP6, FRAME:TMP6->args; | ||
| mov TMP6, qword REGISTER:TMP6[arg.v.lit_i64]; | ||
break; |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
The frame args and frame local variables are always contiguous (or used to be, maybe @jnthn knows), so if you know the size of the whole frame you can compute the offset directly from args (and save yourself having to read FRAME->args, because you can just use rbx instead).
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Yes, it would be safe to do that.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
At the time when this code is compiled, the code calling ncinvoke_o does not even exist yet. So while args is always part of WORK, I don't know yet where exactly in WORK it will end up in. Hence the need to read FRAME->args.
Later when the frame is speshed and compiled again, this information is available and I emit MVM_JIT_PARAM_I64 instead of MVM_JIT_ARG_I64. The former can access WORK directly, as the offsets are known.
| mov TMP6, FRAME:TMP6->args; | ||
| mov TMP6, qword REGISTER:TMP6[arg.v.lit_i64]; | ||
| mov TMP6, aword STOOGE:TMP6->data; | ||
break; |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Same as above, but more importantly, I think this should have a more descriptive name because it dereferences the data pointer of a MVMObjectStooge, which I wouldn't expect from the name.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Naming is hard :) The enums are named after what is passed. Pointers are passed as CPointer objects which contain the actual pointer in their data. So it's indeed one level of dereferencing, but the result is still a pointer which is passed to the native function. So MVM_JIT_ARG_PTR does indeed result in a Pointer being passed. It just does the unboxing, too.
Maybe a comment like this ^^^ will suffice?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Yep, I'll be perfectly happy with a comment.
@@ -372,8 +372,162 @@ static const char *dlerror(void) | |||
} | |||
#endif | |||
|
|||
void init_c_call_node(MVMThreadContext *tc, MVMSpeshGraph *sg, MVMJitNode *node, void *func_ptr, MVMint16 num_args, MVMJitCallArg *args) { |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Does this bunch of code need to be conditionally compiled depending on whether or not we build the JIT?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I can't find any conditional compilation of JIT code? Only the emit part but not the graph itself.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
What we do is 'stub' the JIT compiled code. The JIT graph declarations are always compiled, though, so this should be safe. We do not, in fact, have declared a MVM_JIT_IS_COMPILED
parameter or something of that sort, so at worst, this is wasteful.
} | ||
|
||
body = MVM_nativecall_get_nc_body(tc, object_facts->value.o); | ||
nc_jg = MVM_nativecall_jit_graph_for_caller_code(tc, iter->graph, body, restype, dst); |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
This may be premature, but just wondering if the code doing this maybe should be somewhere under src/jit/x64/
, since it's specific to that. Would be fine enough seeing that as a future cleanup though.
1edff49
to
71aab57
Compare
MVM_nativecall_build builds a JIT graph manually and compiles it. The result is attached to the MVMNativeCallBody. For now the new nativeinvoke_* ops (mirroring invoke_*) will enter that code. Ideally we will replace the whole frame containing the nativeinvoke by JIT compiled code. The nativeinvoke_o op just invokes the JITed code which will handle argument fetching on its own so we can avoid the overhead of copying arguments into an nqp::list. We need to unblock GC on the current thread after we return from the native function and before we box the return value. Otherwise if GC is triggered by the boxing function, we will hang as GC is blocked. You can now tell the JIT that the C function you want to call should take an argument from WORK[cur_op + whatever] which is the equivalent of all those GET_REG(cur_op, whatever) in interp.c. Same for return values. When in interp.c you'd write GET_REG(cur_op, 0) = ...; you tell the JIT: node->u.call.rv_mode = MVM_JIT_RV_DYNIDX; box_rv_node->u.call.rv_idx = 0;
While we already generated machine code for performing the actual call to the C function, we perform all deconting and unboxing of arguments in high level code. This is so that spesh can optimize this code to what's actually necessary for a given call site. For JIT compilation of the specialized code, we compile the call to the C function again, but this time instead of tailoring it to getting the return type from the runloop, we use can access information gathered by spesh for determining the register holding the return type. The generated JIT graph is seamlessly included in the wrapper function body's graph. A function call with a single argument thus results in JIT compiled code for: <sp_getarg_o>, <set>, <decont_i>, invokish control guard, <set>, <PHI>, <PHI>, <wval>, <wval>, MVM_gc_mark_thread_blocked, the actual call, MVM_gc_mark_thread_unblocked, call to box the return value, <return_o> Since at this stage the code calling the nativeinvoke_o op cannot change anymore, we can access the WORK registers directly, saving the copying of those to tc->cur_frame->args.
71aab57
to
76ff81c
Compare
I've addressed all the excellent suggestions (some by adding explanatory comments). |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I'm happy with this now. Nice work, thanks!
This is accompanied by jit_nativecall branches in nqp and rakudo.