Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

JIT compile native calls #676

Merged
merged 2 commits into from
Sep 27, 2017
Merged

JIT compile native calls #676

merged 2 commits into from
Sep 27, 2017

Conversation

niner
Copy link
Contributor

@niner niner commented Sep 6, 2017

This is accompanied by jit_nativecall branches in nqp and rakudo.

@@ -606,6 +606,11 @@ static void add_bb_facts(MVMThreadContext *tc, MVMSpeshGraph *g, MVMSpeshBB *bb,
g->facts[result.reg.orig][result.reg.i].flags |= MVM_SPESH_FACT_DECONTED;
break;
}
case MVM_OP_nativecallinvokejit: {
for (i = 0; i < g->num_locals; i++)
g->facts[i][g->fact_counts[i] - 1].usages++;
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

This looks weird and fragile. It seems to be giving every local a bump of 1...why? Usage counts are computed up at line 376 for every op.

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

The nativecallinvokejit OP is a bit special as it expects the arguments to be in the first locals of the call frame. Spesh didn't know this and removed them as dead weight. Upping the usage count is done to reflect the actual usage.

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Ouch, I think what I originally suggested (though it probably wasn't clear why, and so probably sounded odd...) was that nativecallinvokejit used the standard args buffer and then was just a kind of invoke instruction (so marked -a in oplist to terminate an args sequence). Then args would be placed into the args buffer like a normal call, and later we'd pull the values out of the frame->args buffer instead of frame-work (so I suspect it's a relatively small change on that side of things). This way, there'd be no issues with usage counts and so forth, since args in the arg buffer are obviously used by the callee.

Having spent so long of late removing cheats like this in spesh (many driven by the fact that each of them bit back somewhere and cost a bunch of debugging time), I'm quite reluctant to see another one like this go in. Especially as I suspect it's actually not sufficient in the case that a frame with nativecallinvokejit in it were to be inlined. If that happens, then we don't re-run the whole facts process on the inlinee, just do some much cheaper usage count setting, and so on that code path we wouldn't do extra usage bumps like is being done here, and then dead code elimination would eat them. We could of course do similar compensation as we do here, but that only makes me fear where we get caught out next...

@@ -38,6 +38,7 @@ static void copy_to(MVMThreadContext *tc, MVMSTable *st, void *src, MVMObject *d
memcpy(dest_body->arg_types, src_body->arg_types, src_body->num_args * sizeof(MVMint16));
}
dest_body->ret_type = src_body->ret_type;
dest_body->jitcode = src_body->jitcode;
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I think this file also wants a line for jitcode added to gc_free

src/core/oplist Outdated
@@ -830,6 +830,7 @@ atomicstore_o r(obj) r(obj) :invokish
atomicstore_i r(obj) r(int64)
barrierfull
coveragecontrol r(int64)
nativecallinvokejit w(obj) r(obj) r(obj)
Copy link
Member

@zhuomingliang zhuomingliang Sep 6, 2017

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Seems that some args are not used

@niner niner force-pushed the jit_nativecall branch 5 times, most recently from ecdba29 to 1edff49 Compare September 19, 2017 19:42
Copy link
Contributor

@bdw bdw left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Pretty good, but please address the cleanup of the jitcode parameter.

@@ -88,6 +89,8 @@ static void gc_cleanup(MVMThreadContext *tc, MVMSTable *st, void *data) {
MVM_free(body->arg_types);
if (body->arg_info)
MVM_free(body->arg_info);
if (body->jitcode)
MVM_free(body->jitcode);
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

This will leak pages allocated by MVM_jit_compiler_assemble, needs a MVM_jit_code_destroy

case MVM_JIT_REG_DYNIDX:
| get_cur_op TMP5;
| xor TMP6, TMP6;
| mov TMP6w, word [TMP5 + arg.v.reg*2]
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Fwiw, the two lines above can, I think, become:
| movzx TMP6, word [TMP5 + arg.v.reg*2];
And, the nicer thing to do (I think) would probably be to use a two-word sized type:
| movzs TMP6, word U16:TMP5[arg.v.reg*2];

I think this is only necessary because you might call this nativecall object from separate positions in the bytecode (with the argument stored in a different location each time). In which case this makes sense, but it would probably be a good idea to add some documentation to explain why this can't be 'hardcoded'.

case MVM_JIT_DATA_LABEL:
| lea TMP6, [=>(arg.v.lit_i64)];
break;
case MVM_JIT_SAVED_RV:
| mov TMP6, qword [rbp-0x28];
break;
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

There are other ways to do this; specifically, I've been using the args array as a temporary buffer in the compilation of unless_o and if_o. Not sure if that is possible in this case, but it would save you the implementation of SAVED_RV.
Alternatively, you can give [rbp-0x28] a symbolic name with
|.define SAVED_RV, [rbp-0x28]

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I don't see an advantage of using the args buffer over just putting it on the stack. I'd have to locate the args buffer first, at least for ahead of time compilation causing unnecessary memory churn. I have it the symbolic name. Works like a charm :)

| mov TMP6, TC->cur_frame;
| mov TMP6, FRAME:TMP6->args;
| mov TMP6, qword REGISTER:TMP6[arg.v.lit_i64];
break;
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

The frame args and frame local variables are always contiguous (or used to be, maybe @jnthn knows), so if you know the size of the whole frame you can compute the offset directly from args (and save yourself having to read FRAME->args, because you can just use rbx instead).

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Yes, it would be safe to do that.

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

At the time when this code is compiled, the code calling ncinvoke_o does not even exist yet. So while args is always part of WORK, I don't know yet where exactly in WORK it will end up in. Hence the need to read FRAME->args.

Later when the frame is speshed and compiled again, this information is available and I emit MVM_JIT_PARAM_I64 instead of MVM_JIT_ARG_I64. The former can access WORK directly, as the offsets are known.

| mov TMP6, FRAME:TMP6->args;
| mov TMP6, qword REGISTER:TMP6[arg.v.lit_i64];
| mov TMP6, aword STOOGE:TMP6->data;
break;
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Same as above, but more importantly, I think this should have a more descriptive name because it dereferences the data pointer of a MVMObjectStooge, which I wouldn't expect from the name.

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Naming is hard :) The enums are named after what is passed. Pointers are passed as CPointer objects which contain the actual pointer in their data. So it's indeed one level of dereferencing, but the result is still a pointer which is passed to the native function. So MVM_JIT_ARG_PTR does indeed result in a Pointer being passed. It just does the unboxing, too.
Maybe a comment like this ^^^ will suffice?

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Yep, I'll be perfectly happy with a comment.

@@ -372,8 +372,162 @@ static const char *dlerror(void)
}
#endif

void init_c_call_node(MVMThreadContext *tc, MVMSpeshGraph *sg, MVMJitNode *node, void *func_ptr, MVMint16 num_args, MVMJitCallArg *args) {
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Does this bunch of code need to be conditionally compiled depending on whether or not we build the JIT?

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I can't find any conditional compilation of JIT code? Only the emit part but not the graph itself.

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

What we do is 'stub' the JIT compiled code. The JIT graph declarations are always compiled, though, so this should be safe. We do not, in fact, have declared a MVM_JIT_IS_COMPILED parameter or something of that sort, so at worst, this is wasteful.

}

body = MVM_nativecall_get_nc_body(tc, object_facts->value.o);
nc_jg = MVM_nativecall_jit_graph_for_caller_code(tc, iter->graph, body, restype, dst);
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

This may be premature, but just wondering if the code doing this maybe should be somewhere under src/jit/x64/, since it's specific to that. Would be fine enough seeing that as a future cleanup though.

MVM_nativecall_build builds a JIT graph manually and compiles it. The
result is attached to the MVMNativeCallBody. For now the new nativeinvoke_*
ops (mirroring invoke_*) will enter that code. Ideally we will replace the
whole frame containing the nativeinvoke by JIT compiled code.

The nativeinvoke_o op just invokes the JITed code which will handle argument
fetching on its own so we can avoid the overhead of copying arguments into an
nqp::list.

We need to unblock GC on the current thread after we return from the
native function and before we box the return value. Otherwise if GC is
triggered by the boxing function, we will hang as GC is blocked.

You can now tell the JIT that the C function you want to call should take an
argument from WORK[cur_op + whatever] which is the equivalent of all those
GET_REG(cur_op, whatever) in interp.c.
Same for return values. When in interp.c you'd write GET_REG(cur_op, 0) = ...;
you tell the JIT: node->u.call.rv_mode = MVM_JIT_RV_DYNIDX;
box_rv_node->u.call.rv_idx = 0;
While we already generated machine code for performing the actual call to the
C function, we perform all deconting and unboxing of arguments in high level
code. This is so that spesh can optimize this code to what's actually
necessary for a given call site.

For JIT compilation of the specialized code, we compile the call to the C
function again, but this time instead of tailoring it to getting the return
type from the runloop, we use can access information gathered by spesh for
determining the register holding the return type. The generated JIT graph is
seamlessly included in the wrapper function body's graph.

A function call with a single argument thus results in JIT compiled code for:
<sp_getarg_o>, <set>, <decont_i>, invokish control guard, <set>, <PHI>, <PHI>,
<wval>, <wval>, MVM_gc_mark_thread_blocked, the actual call,
MVM_gc_mark_thread_unblocked, call to box the return value, <return_o>

Since at this stage the code calling the nativeinvoke_o op cannot change
anymore, we can access the WORK registers directly, saving the copying
of those to tc->cur_frame->args.
@niner
Copy link
Contributor Author

niner commented Sep 22, 2017

I've addressed all the excellent suggestions (some by adding explanatory comments).

Copy link
Member

@jnthn jnthn left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I'm happy with this now. Nice work, thanks!

@jnthn jnthn merged commit b3dd812 into master Sep 27, 2017
@jnthn jnthn deleted the jit_nativecall branch September 27, 2017 13:00
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

Successfully merging this pull request may close these issues.

4 participants