Add a new JIT tier which is tightly coupled to the AST Interpreter #651
Conversation
force-pushed 4e5d195 to 359c6e2
I tried to reduce the amount of GC-allocated memory (to decrease the number of collections) by creating a wrapper around the
I also tried to align the jump targets to 16-byte addresses by emitting multi-byte NOP sequences, but this also didn't help. (Would have surprised me if it had helped; I don't think there is much to gain in this area.)
force-pushed 279bca1 to fa593f6
@@ -790,6 +790,8 @@ void Assembler::cmp(Indirect mem, Register reg) {
}

void Assembler::lea(Indirect mem, Register reg) {
    RELEASE_ASSERT(mem.base != RSP, "We have to generate the SIB byte...");
I think we also need a SIB byte for R12.
The main issues I see currently:
class JitFragment;

// Manages a fixed size memory block which stores JITed code.
Minor request -- I think it'd be nice to start with a little bit more detail here about the overall baseline-JIT strategy and how it relates to the interpreter. Just a quick high-level overview of what the classes in this file are for, how they get used, what our strategy is for going back and forth between the interpreter/jitted code.
I'll read some more of the code in a sec, but overall my feedback is: yeah, there's some stuff that is a little bit quick-and-dirty, but I think that's ok for now. I'd say let's just be good about leaving notes on the things that we think could be better, then work on getting this in and iterating. I'm ok with this area of the code being a little bit rough to start, since it feels like we'll be experimenting with it a lot.
a.push(assembler::RBP);
a.mov(assembler::RSP, assembler::RBP);

static_assert(scratch_size % 16 == 0, "stack alignment code depends on this");
I think we need to make sure that the eh_frame we emit matches the stack size that we ask for -- I think the eh_frame_template hardcodes a certain amount of scratch (48 bytes I think).
That's why I chose to use frame pointers: IMHO, if I emit the EH info for a function with frame pointers, it doesn't contain the number of bytes by which the stack has to be adjusted, just the fact that the stack pointer has to be restored by copying the base pointer.
At least that's what I thought was happening, judging by comparing different EH table outputs.
oh, yeah you're right :)
force-pushed fa593f6 to c964b34
force-pushed c964b34 to 425c08c
I addressed all your immediate comments, put some of the non-JIT changes into a separate request (#664), and rebased.
sorted_symbol_table[source_info->getInternedStrings().get(PASSED_GENERATOR_NAME)] = generator;
// TODO: we will immediately want the liveness info again in the jit, we should pass
// it through.
std::unique_ptr<LivenessAnalysis> liveness = computeLivenessInfo(source_info->cfg);
Could this be getLiveness()? I think the call to computeLivenessInfo() in irgen.cpp could also be changed to access the cached version.
force-pushed 425c08c to f586f49
addressed your comments in the last commit
I think the issue with unique_ptr's and forward-declared types is that the compiler will try to generate a default destructor (for the containing class), which won't work. You can get around this by declaring a destructor (ie `~SourceInfo();`); the fun part is you might not have to define it, since it probably doesn't get called anywhere :P
I think we might need to be careful about reentrancy: what if we are in the baseline JIT, and we hit a function call that ends up triggering the baseline JIT on the same function? I haven't finished reading through the code, but assuming there isn't anything that explicitly handles this, I think there could be issues. For instance, the JitFragmentWriter will save the
I think it might be enough to say "if the current codeblock offset at the end is not the same as when we started, abort the fragment"? But I'm not sure if that's necessary (or enough).
Ah ok, nice :) Could we still have an assert in fragmentFinished/finishCompilation that checks that the fragment writer and the codeblock agree on where the code will end up? Also, I think it's safe to assume we're not getting rid of the GIL any time soon. Probably not until after Windows and Python 3 support :/
I will continue tomorrow with eliminating the shared_ptr and adding the check.
Ok, finally read through it all :) Looks good to me; feel free to rebase/squash, and we should be able to get this merged in tomorrow!
This JIT is tightly coupled to the ASTInterpreter; at every CFGBlock* entry/exit one can switch between interpreting and directly executing the generated code without having to do any translation. Generating the code is pretty fast compared to the LLVM tier, but the generated code is not as fast as code generated by the higher LLVM tiers. But because the JITed code can use runtime ICs, avoids a lot of interpretation overhead, and stores CFGBlock local symbols inside register/stack slots, it's much faster than the interpreter.
force-pushed f586f49 to fc36656
I made the last changes and squashed. There should be no functional changes, besides that I re-enabled the callattr IC because I don't get the protobuf error any more. I don't think the issue is fixed, but I also don't think it's hiding inside the JIT (more likely the runtime ICs/rewriter); if we encounter it again I will try to track it down.
Add a new JIT tier which is tightly coupled to the AST Interpreter
Awesome! Having a baseline JIT will change a lot of things :)
Cool :-)
Current perf numbers (best of 3 runs):

Because of the increased malloc traffic, this PR should benefit a lot from jemalloc...