
Add a cache for JITed object code #433

Merged
merged 6 commits into pyston:master from the objectcache branch on Apr 20, 2015

Conversation

undingen
Contributor

@undingen undingen commented Apr 3, 2015

This is a simple cache: we still need to generate the IR, but if we can find a cache file created for the exact same IR, we load it and skip instruction selection etc.

Our benchmark test suite does not benefit much from this patch because for most tests the JITing time is too short.

  • I see a +2% slowdown on the first run against unmodified pyston (we have to do additional hashing, disk writes, and pointer relocation)
  • on the second run this turns into a -2% speedup
  • django_setup.py improves from 2.3sec to about 1.6sec

The cache works by creating symbols for all embedded pointers which change from one pyston start to the next when using the same executable. (Fixed executable-specific addresses are emitted directly, in order to reduce the number of symbols which need relocations.)
The IR for the module, in its text representation, gets crc32-hashed (I'm very open to other ideas :-)), and if we find a file with the same hash in the pyston_object_cache directory, we use it. While this approach is very safe, it fails if the IR output does not have the same variable names, line numbers, etc.
That's why I changed the variable name assignment to use an incremental index in the name instead of the pointer value as a string.
Even after this patch there are still a few instances of nondeterministic IR output (but by far most cases should be handled); I plan to improve this in a follow-up patch.
On our devel machines this patch will generate a huge number of cache files, because a cache file only works for the exact same executable. I plan to generate a hash of the pyston executable, save the cache files in a directory named after that hash, and on startup remove all directories whose name does not match this hash. Better ideas?
Another issue is that the IR is now more difficult to read, because the patchpoint call destinations all point to a dummy -1 address; but there is an option to disable the cache if one has to debug something.
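As a rough illustration of the lookup described above, a minimal sketch (all names hypothetical; std::hash and std::filesystem stand in for the crc32 hashing and the LLVM path utilities the patch actually uses):

```cpp
#include <filesystem>
#include <fstream>
#include <functional>
#include <optional>
#include <sstream>
#include <string>

namespace fs = std::filesystem;

// Cache directory name taken from the description above.
static const char* kCacheDir = "pyston_object_cache";

// Hash the textual IR; the real patch uses crc32, std::hash stands in here.
std::string hashIR(const std::string& ir_text) {
    std::ostringstream ss;
    ss << std::hex << std::hash<std::string>{}(ir_text);
    return ss.str();
}

// Return the cached object buffer if a file with the same hash exists,
// otherwise nothing (the caller would then run instruction selection).
std::optional<std::string> lookupObjectCache(const std::string& ir_text) {
    fs::path cache_file = fs::path(kCacheDir) / hashIR(ir_text);
    if (!fs::exists(cache_file))
        return std::nullopt;
    std::ifstream in(cache_file, std::ios::binary);
    std::ostringstream buf;
    buf << in.rdbuf();
    return buf.str();
}

// Store a freshly compiled object buffer under the IR hash.
void storeObjectCache(const std::string& ir_text, const std::string& obj) {
    fs::create_directories(kCacheDir);
    std::ofstream out(fs::path(kCacheDir) / hashIR(ir_text), std::ios::binary);
    out << obj;
}
```

The key property this relies on is the deterministic IR text: any run-to-run variation in the printed IR changes the hash and causes a cache miss.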

@undingen
Contributor Author

undingen commented Apr 3, 2015

Oh :-( I should have run make check before deciding that some piece of code was unnecessary...

@kmod
Collaborator

kmod commented Apr 3, 2015

sre_parse_parse goes from 2s to 0.9s :) Though with -I it's still 0.2s... but at least now the next things to optimize are our unoptimized analyses where I think we have a lot of room to do much better (as opposed to the already-optimized LLVM compilation).

That's too bad about printing the patchpoints, but maybe we could use the existing dumpPrettyIR() functionality -- where we copy and modify the entire IR to make it nicer for printing -- and insert the function names there? Though I don't think we have to do that for this PR.

I think we'll probably want to iterate on the cache-management strategy, so I'm not too worried if it's not great in the initial PR. We might want to bound the size of the cache overall (LRU with max size?) since we might run a ton of code through a single executable.

std::error_code error_code;
llvm::raw_fd_ostream IRObjectFile(cache_file.c_str(), error_code, llvm::sys::fs::F_RW);
RELEASE_ASSERT(!error_code, "");
IRObjectFile << Obj.getBuffer();
Collaborator

I think we might need to be more careful here -- if the user ctrl-C's while we're writing out the object file, on the next run we'll happily load it and then segfault. I ran into this with the .pyc files as well -- I was hoping that we wouldn't have to worry about it, but it actually happened enough even just during development that we had to add checksums. I'm not sure how easy that is to do here, but I think that'd be ideal; another option would be to write out the file to a temporary name and then rename it to the correct name, which guards against the ctrl-C behavior but not other kinds of corruption.
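The write-to-a-temporary-name-then-rename option could look roughly like this (a sketch with hypothetical names; rename() is atomic on POSIX within one filesystem, so this guards against the ctrl-C case but, as noted, not against other kinds of corruption):

```cpp
#include <cstdio>   // std::rename, std::remove
#include <fstream>
#include <string>

// Write the data to "<path>.tmp" first, then rename it onto the final name.
// An interrupted run leaves at worst a stale .tmp file behind, never a
// truncated file under the name the cache actually loads from.
bool writeFileAtomically(const std::string& path, const std::string& data) {
    std::string tmp = path + ".tmp";
    {
        std::ofstream out(tmp, std::ios::binary | std::ios::trunc);
        out << data;
        if (!out.good()) {  // open or write failure
            std::remove(tmp.c_str());
            return false;
        }
    }  // stream is flushed and closed here, before the rename below
    if (std::rename(tmp.c_str(), path.c_str()) != 0) {
        std::remove(tmp.c_str());
        return false;
    }
    return true;
}
```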

Contributor Author

You are right.
Another issue you probably already noticed (because of the test failures) is that I currently have a design issue in the change I integrated to generate deterministic node names (aka the replacement for #%p). I was under the impression that the strings only had to be unique inside the CFG, but that's not true 😂. Do you have an idea how to generate per-module deterministic node names which are unique? I'm currently afk but will look into it tomorrow.

@kmod
Collaborator

kmod commented Apr 3, 2015

Ok, I think I found the source of the errors: when we remap generator expressions / set+dict comprehensions we generate an inner function object, and currently we call nodeName() during the CFG process of the outer scope and insert those names into the inner scope. Previously this was fine, but with your changes we get clashes when the ids overlap. Can you double-check? But I think it's ok to just give these specific values hard-coded names, since I don't think there can ever be more than one of them per scope. I sent you a PR that I think fixes it.
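A per-scope counter is one way to get names that are deterministic across runs yet unique within a scope; a minimal sketch (NodeNamer and its interface are hypothetical illustrations, not Pyston's actual nodeName()):

```cpp
#include <string>
#include <unordered_map>

// Deterministic naming: each node gets "#<base>_<index>" where the index is
// an incrementing counter local to the scope, instead of embedding a pointer
// value (which differs from run to run and defeats hashing of the IR text).
class NodeNamer {
    int next_id = 0;
    std::unordered_map<const void*, std::string> names;

public:
    std::string nodeName(const void* node, const std::string& base) {
        auto it = names.find(node);
        if (it != names.end())
            return it->second;  // stable name for a node already seen
        std::string name = "#" + base + "_" + std::to_string(next_id++);
        names.emplace(node, name);
        return name;
    }
};
```

The clash described above corresponds to reusing one NodeNamer across an outer and an inner scope: both counters start at zero, so the inner scope can regenerate a name the outer scope already handed it.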

@undingen
Contributor Author

undingen commented Apr 5, 2015

Update:

  • @kmod I added your pull request; thanks, this fixed the issue with the duplicate node names.
  • I changed the hash to SHA256.
  • I patched llvm to support const GlobalVariables as arguments to a patchpoint without having to copy the address into a register. This fixes the "ran out of registers during register allocation" bug I encountered with the object cache enabled. It works by generating symbol entries inside the stackmap constant table. The llvm patch definitely needs a test + more comments, and I'm not confident that it doesn't introduce a problem.

The cache-file corruption check is not yet implemented, and everything needs more testing.

@kmod
Collaborator

kmod commented Apr 6, 2015

The llvm patch seems pretty reasonable to me; I would maybe email llvm-dev + Andrew Trick + Juergen Ributzka for comments?

As a general comment about testing: Michael has some tester improvements that should make it easier to write integration tests, so we could/should use that for this. I think it's ok though if we push this without having 100% confidence in it; let's do some reasonable testing, add a flag to turn it off on demand, and push it :)

@@ -959,6 +968,9 @@ CompiledFunction* doCompile(SourceInfo* source, ParamNames* param_names, const O
source->cfg->print();

assert(g.cur_module == NULL);

clearRelocatableSymsMap();

std::string name = getUniqueFunctionName(nameprefix, effort, entry_descriptor);
Collaborator

does this end up causing cache misses if the generated function name is different?

Contributor Author

Yes, this can generate cache misses because of the static num_functions variable. :-(

@undingen
Contributor Author

I sent the (slightly changed) llvm patch to the llvm-commits mailing list: http://lists.cs.uiuc.edu/pipermail/llvm-commits/Week-of-Mon-20150413/271030.html
I will update this pull request when I receive feedback on the patch.

@undingen undingen force-pushed the objectcache branch 2 times, most recently from eb488c1 to ad3d0a1 Compare April 15, 2015 20:52
@undingen
Contributor Author

I updated the pull request but did not squash it yet, in order to make it easier to see what has changed.
The changes start from: Update llvm patch e472a8d

  • I updated the LLVM patch to the one I sent to the llvm-commits mailing list. The changes are mostly formatting and adding a test.
  • Added code to clean up the cache directory. The heuristic is very simple: there is a configurable limit on the number of cache files (currently 500); on startup I check whether there are more files than that, and if so remove the oldest ones. All the cache files are now located in ~/.cache/pyston/object_cache/
  • Save the cache files compressed using the very fast LZ4 library. This reduces the cache files to about one-tenth of the uncompressed size and adds a checksum for detecting e.g. truncated cache files, but does not decrease performance.
  • Sadly I can't use the current lz4 ubuntu package (too old), so I had to import the git module and change the make and cmake support (hope I did this right). I also tried zlib, but it noticeably slowed down the cache.
  • In order to generate deterministic CFG node names I made nodeName() not depend on the passed name but instead generate a new unique name. (While this changes the behavior, it looks like nothing depends on getting the same name for two calls to nodeName() with the same node argument.)

Not done:

  • There are still sometimes cache misses because the generated IR is not always deterministic, but I think this can be fixed later.
  • Also there is an llvm patch [1] on the mailing list which adds support for symbolic patchpoint function addresses. That patch would remove part of the changes this pull request makes to the patchpoints. But it has not yet landed in the llvm repo, and I haven't made use of it in this pull request yet. (But we could -- just one more patch to llvm.)

[1] http://lists.cs.uiuc.edu/pipermail/llvm-commits/Week-of-Mon-20150330/269160.html

@kmod
Collaborator

kmod commented Apr 17, 2015

looks good to me :)

@undingen undingen force-pushed the objectcache branch 2 times, most recently from fb41bae to 9a3c499 Compare April 17, 2015 13:49
@undingen
Contributor Author

I squashed it a little bit, and rebased to HEAD.
I also sneaked in another small change :-(, it's the change to src/analysis/scoping_analysis.cpp:

+        // Sort the entries by name to make the order deterministic.
+        std::vector<InternedString> referenced_from_nested_sorted(usage->referenced_from_nested.begin(),
+                                                                  usage->referenced_from_nested.end());
+        std::sort(referenced_from_nested_sorted.begin(), referenced_from_nested_sorted.end());
         int i = 0;
-        for (auto& p : usage->referenced_from_nested) {
+        for (auto& p : referenced_from_nested_sorted) {
             closure_offsets[p] = i;
             i++;

This removes several cache misses.
I think it's ready to get merged.

@undingen
Contributor Author

Fixed the merge conflicts.

@undingen undingen force-pushed the objectcache branch 2 times, most recently from 17c0f60 to 589cf7b Compare April 19, 2015 17:12
undingen and others added 6 commits April 20, 2015 11:25
We still need to generate the IR, but if we can find
a cache file created for the exact same IR we will load it and
skip instruction selection etc...
When remapping generator expressions / set+dict comprehensions,
we create an explicit new function, which will later get run
through CFG again.  With the new changes, we can't use a nodeName()
that was generated in the parent scope, since that will end up clashing
when the generated scope generates its own names.

I think it's not too bad to fix since in these cases the arguments are only
ever used inside of the inner scope, so we just have to use names that aren't
used elsewhere.
Adds the LZ4 compression library and uses it for compressing cached objects.
This saves a lot of space (in my test it reduces the required space to about one-tenth),
and adds a checksum to the file in order to detect truncated cache files,
without reducing speed.
@kmod
Collaborator

kmod commented Apr 20, 2015

closing + reopening to force a new Travis-CI build (I had broken master when you pushed your most recent commit...).

@kmod kmod closed this Apr 20, 2015
@kmod kmod reopened this Apr 20, 2015
kmod added a commit that referenced this pull request Apr 20, 2015
Add a cache for JITed object code
@kmod kmod merged commit 691b8d6 into pyston:master Apr 20, 2015
@kmod
Collaborator

kmod commented Apr 20, 2015

This is awesome :) django_test.py takes 22s for me on the first run but only 9.4s on the second run!

@undingen
Contributor Author

Cool, thanks for tracking down the remaining cache misses.
