Add a cache for JITed object code #433
Conversation
Oh :-( I should have run `make check` before deciding that some piece of code is unnecessary.

sre_parse_parse goes from 2s to 0.9s :)

That's too bad about printing the patchpoints, but maybe we could use the existing …

I think we'll probably want to iterate on the cache-management strategy, so I'm not too worried if it's not great in the initial PR. We might want to bound the overall size of the cache (LRU with a max size?) since we might run a ton of code through a single executable.
```cpp
std::error_code error_code;
llvm::raw_fd_ostream IRObjectFile(cache_file.c_str(), error_code, llvm::sys::fs::F_RW);
RELEASE_ASSERT(!error_code, "");
IRObjectFile << Obj.getBuffer();
```
I think we might need to be more careful here -- if the user ctrl-C's while we're writing out the object file, on the next run we'll happily load it and then segfault. I ran into this with the .pyc files as well -- I was hoping that we wouldn't have to worry about it, but it actually happened enough even just during development that we had to add checksums. I'm not sure how easy that is to do here, but I think that'd be ideal; another option would be to write out the file to a temporary name and then rename it to the correct name, which guards against the ctrl-C behavior but not other kinds of corruption.
You are right.
Another issue you probably already noticed (because of the test failures) is that I currently have a design issue in the change I integrated to generate deterministic node names (i.e., the replacement for #%p). I was under the impression that the strings only had to be unique inside the CFG, but that's not true 😂. Do you have an idea how to generate deterministic node names that are unique per module? I'm currently afk but will look into it tomorrow.
On 04.04.2015 at 00:18, "Kevin Modzelewski" notifications@github.com wrote:
Ok, I think I found the source of the errors: when we remap generator expressions / set+dict comprehensions we generate an inner function object, and currently we call nodeName() during the CFG process of the outer scope and insert those names into the inner scope. Previously this was fine, but with your changes we would get clashes when the ids overlapped.

Can you double-check? I think it's ok to just give these specific values hard-coded names, since I don't think there can ever be more than one of them per scope. I sent you a PR that I think fixes it.
Update: the cache-file corruption check is not yet implemented, and everything needs more testing.
The llvm patch seems pretty reasonable to me; I would maybe email llvm-dev + Andrew Trick + Juergen Ributzka for comments?

As a general comment about testing: Michael has some tester improvements that should make it easier to write integration tests, so we could/should use that for this. I think it's ok though if we push this without having 100% confidence in it; let's do some reasonable testing, add a flag to turn it off on demand, and push it :)
```diff
@@ -959,6 +968,9 @@ CompiledFunction* doCompile(SourceInfo* source, ParamNames* param_names, const O
     source->cfg->print();

+    assert(g.cur_module == NULL);
+
+    clearRelocatableSymsMap();
+
     std::string name = getUniqueFunctionName(nameprefix, effort, entry_descriptor);
```
does this end up causing cache misses if the generated function name is different?
Yes, this can generate cache misses because of the static num_functions variable. :-(

I sent the (slightly changed) llvm patch to the llvm-commits mailing list: http://lists.cs.uiuc.edu/pipermail/llvm-commits/Week-of-Mon-20150413/271030.html
Force-pushed from eb488c1 to ad3d0a1.
I updated the pull request but did not squash it yet, in order to make it easier to see what has changed.

Not done:

[1] http://lists.cs.uiuc.edu/pipermail/llvm-commits/Week-of-Mon-20150330/269160.html
looks good to me :)
Force-pushed from fb41bae to 9a3c499.
I squashed it a little bit, and rebased to HEAD.

```diff
+    // Sort the entries by name to make the order deterministic.
+    std::vector<InternedString> referenced_from_nested_sorted(usage->referenced_from_nested.begin(),
+                                                              usage->referenced_from_nested.end());
+    std::sort(referenced_from_nested_sorted.begin(), referenced_from_nested_sorted.end());
     int i = 0;
-    for (auto& p : usage->referenced_from_nested) {
+    for (auto& p : referenced_from_nested_sorted) {
         closure_offsets[p] = i;
         i++;
```

This removes several cache misses.
Fixed the merge conflicts.
Force-pushed from 17c0f60 to 589cf7b.
We still need to generate the IR, but if we can find a cache file created for the exact same IR, we will load it and skip instruction selection etc.
… the jit object cache
When remapping generator expressions / set+dict comprehensions, we create an explicit new function, which will later get run through CFG again. With the new changes, we can't use a nodeName() that was generated in the parent scope, since that will end up clashing when the generated scope generates its own names. I think it's not too bad to fix since in these cases the arguments are only ever used inside of the inner scope, so we just have to use names that aren't used elsewhere.
… for the passed node
Adds the LZ4 compression library and uses it for compressing cached objects. This saves a lot of space (in my test it reduced the required space to about one-tenth) and adds a checksum to the file in order to detect truncated cache files, without reducing speed.
closing + reopening to force a new Travis-CI build (I had broken master when you pushed your most recent commit...).
This is awesome :) django_test.py takes 22s for me on the first run but only 9.4s on the second run!

Cool, thanks for tracking down the remaining cache misses.
This is a simple cache: we still need to generate the IR, but if we can find a cache file created for the exact same IR, we will load it and skip instruction selection etc.
Our benchmark test suite does not benefit much from this patch, because for most tests the JITing time is too short.
The cache works by creating symbols for all embedded pointers which change from one pyston start to the next when using the same executable. (Fixed, executable-specific addresses are emitted directly, in order to reduce the number of symbols which need relocations.)
The textual representation of the module's IR gets crc32-hashed (I'm very open to other ideas :-)), and if we find a file with the same hash in the `pyston_object_cache` directory we will use it. While this approach is very safe, it fails if the IR output does not have the same variable names, line numbers, etc. That's why I changed the variable name assignment to use an incremental index in the name instead of the pointer value as a string.
Even after this patch there are still a few instances of nondeterministic IR output (but by far most cases should be handled); I plan to improve this in a follow-up patch.
On our devel machines we will generate a huge number of cache files with this patch, because a cache file only works for the exact same executable. I plan to generate a hash of the pyston executable and save the cache files in a directory named after that hash, and on startup remove all directories which do not match this hash. Better ideas?
Another issue is that the IR is now more difficult to read, because the patchpoint call destinations all point to a dummy -1 address; but there is an option to disable the cache if one has to debug something.