Towards a semi-performant recursive interpreter #37

timholy · 2019-01-20T22:25:53Z

This is a pretty big overhaul of this package. I've tried to keep its original functionality, but perhaps a split should be the next step (see #32). The first 5 commits get the tests passing, with the exception of the ui tests which I didn't even look at (CC @staticfloat re #34, #36; @pfitzseb re #33).

The rest is much more ambitious. I worked a lot on performance, since an "easy" way to add robust breakpoints, etc., is to run your code in the interpreter. And of course there's interest in using the interpreter to circumvent compile-time cost. But then performance matters.

The most important changes here are focused on reducing the cost of dynamic dispatch (or its equivalent here, "dynamic lowered code lookup"). Some highlights:

- At the lowest level, all calls are to Builtins and IntrinsicFunctions. I used the tfunc data in Core.Compiler to auto-generate an evaluator that resolves all calls with a fixed number of arguments. A straightforward extension would be to resolve all calls with bounded numbers of arguments (e.g., those that have between 2 and 4 arguments).
- Every other call requires looking up lowered code. To reduce that overhead, this adds "local method tables," one per :call Expr. These are cached by exact type, since isa(x, Int) is fast but isa(x, Integer) is slow. So it uses MethodInstance comparisons rather than Method signature comparisons, even though it might look up the same lowered code. I think a slightly more elegant way to do this would be to add a new type,

struct LocalMethodTable
    call_expr::Expr
    knownmethods::TypeMapEntry
end

to the list of valid types in a CodeInfo. (I did it this way at first, but it breaks things like basic-block computation and SSA-usage analysis, so I resorted to storing this info in separate fields.)
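A minimal sketch of why caching by exact type is cheap, assuming a single-entry cache per call site. `CallSiteCache` and `lookup!` are hypothetical names, not the PR's data structures (which store MethodInstances per :call Expr). The point is that a hit only needs an identity comparison on the concrete signature, whereas matching an abstract Method signature needs a subtype test.

```julia
# Hypothetical single-entry cache attached to one `:call` site.
mutable struct CallSiteCache
    sig::Any      # exact Tuple type seen last time, or `nothing`
    target::Any   # whatever the slow path resolved for that signature
end
CallSiteCache() = CallSiteCache(nothing, nothing)

# Return the cached target if the exact argument types match;
# otherwise run the (slow) `resolve` function and remember its result.
function lookup!(resolve, cache::CallSiteCache, f, args...)
    sig = Tuple{Core.Typeof(f), map(Core.Typeof, args)...}
    if cache.sig === sig           # exact-type hit: identity comparison only
        return cache.target
    end
    target = resolve(sig)          # e.g. a method-table lookup
    cache.sig, cache.target = sig, target
    return target
end

# Usage: count how often the slow path runs.
nresolve = Ref(0)
resolve(sig) = (nresolve[] += 1; sig)
cache = CallSiteCache()
lookup!(resolve, cache, +, 1, 2)      # miss
lookup!(resolve, cache, +, 3, 4)      # hit: same exact types
lookup!(resolve, cache, +, 1.0, 2)    # miss: Float64 differs from Int
```

A single entry keeps the sketch short; a real cache would likely hold several signatures per call site.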

For a simple summation test, I'm getting about 15us per iteration. Compiled code is about 5ns, so this is still dirt-slow. But just getting it to this point was quite a major overhaul.

CC @JeffBezanson, @vtjnash, @StefanKarpinski.

jpsamaroo · 2019-01-20T22:41:44Z

Awesome! What does that summation test look like? The compiled version could possibly be compiling down to a static result (i.e., removing the loop and just returning the result directly), since 5ns is wicked fast. If so, it wouldn't be a good measure of this PR's performance gains.

timholy · 2019-01-20T23:04:29Z

The summation test is here. 5ns/iteration is expected for a GHz CPU (it would be faster still if I had added @inbounds and @simd). So 15us/iteration implies on the order of ten thousand CPU cycles per iteration (15us at 1 GHz is 15,000 cycles), which is pretty slow. It would be faster if we didn't care about stashing the result of computations in JuliaStackFrames, but that's essential for this package's utility as a debugger.

JeffBezanson · 2019-01-21T00:25:15Z

src/generate_builtins.jl

+        end
+        print(io,
+"""
+            $head name == :$fname


I think the best way to do this is to evaluate the function (args[1]) and then compare directly against the builtin objects, f === Core.tuple, f === Core.ifelse, etc. We should also pre-process the code to replace constant GlobalRefs with quoted values to avoid lookups in the common case.
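A sketch of this suggestion, assuming the call target is resolved once and then compared by identity. `quote_if_const` and `maybe_call_builtin` are illustrative names, not code from the PR, and only three builtins are shown.

```julia
# Replace a GlobalRef to a constant binding by the value itself, so the
# interpreter does not need a module lookup every time it runs the statement.
function quote_if_const(x)
    if x isa GlobalRef && isdefined(x.mod, x.name) && isconst(x.mod, x.name)
        return QuoteNode(getfield(x.mod, x.name))
    end
    return x
end

# Identity comparisons against builtin objects (only a few shown).
function maybe_call_builtin(f, args)
    if f === Core.tuple
        return Some{Any}(Core.tuple(args...))
    elseif f === Core.ifelse
        return Some{Any}(Core.ifelse(args[1], args[2], args[3]))
    elseif f === Core.typeof
        return Some{Any}(Core.typeof(args[1]))
    end
    return nothing   # not a builtin we handle: fall back to normal lookup
end

ex = Expr(:call, GlobalRef(Core, :tuple), 1, 2)
fq = quote_if_const(ex.args[1])   # QuoteNode wrapping Core.tuple
maybe_call_builtin(fq.value, (1, 2))
```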

Agreed. I thought of that and then talked myself out of it for fear of introducing the interpreter equivalent of JuliaLang/julia#265. But now I'm not so sure that's a risk, and in any event I currently clear the cache in between @interpret invocations.

KristofferC · 2019-01-21T09:32:52Z

Thanks a lot for this. I pushed a commit to update the Project + Manifest to get CI running (hope you don't mind).

I'll look at the UI tests.

KristofferC · 2019-01-21T13:37:32Z

So the UI tests pass as long as there is a good way to set the fullpath variable to false in the JuliaFrameCode. Previously, this was done by a keyword argument to the JuliaStackFrame constructor. Of course I could hack something in, but @timholy, do you have any preference for how to expose this option, or should we tweak the tests themselves?

timholy · 2019-01-21T20:35:10Z

Manifest.toml

 uuid = "67417a49-6d77-5db2-98c7-c13144130cd2"
 version = "0.1.2+"

+[[DebuggingUtilities]]


Sorry, I added this while I was debugging my own work on this PR (@showln is very useful), but we don't need it to be part of this package long-term. I'll trust it's OK with you if I overwrite this.

vtjnash · 2019-01-21T21:00:22Z

> to auto-generate an evaluator that resolve all calls with fixed number of arguments

That all sounds very confusing. Any reason not to just execute them directly? They are just normal methods, albeit without source code accessible to reflection (and that block overloading).

timholy · 2019-01-21T22:08:34Z

OK, this fixes what I think are the remaining problems. This enhances the performance on my loop tests a teeny bit more (down to 13us per iteration), presumably because of resolving GlobalRefs during optimize!. I also fixed a breakage on master due to the very nice JuliaLang/julia#30641.

The most important new commit is 19c1492, which fixed a serious bug that sometimes caused the wrong lowered code to be looked up for generated functions. Now this seems to behave quite robustly; for example, I can run the subarray tests. They're not fast, but not ~~much~~ insanely slower than in compiled mode (300s compiled vs. 900s interpreted). Most of the time is spent on the one line that evals %new expressions.

@KristofferC, I also added kwargs support back to the JuliaStackFrame constructor, so hopefully you can get the ui tests passing. Thanks for tackling that! ❤️

I think this can be merged. (I can't do that, however.) And I do think we should plan on splitting out the DebuggerFramework functionality very shortly afterwards. There's plenty of reason to be interested in an interpreter independent of its usage for an IDE or REPL debugger. So perhaps before hitting merge we should think a bit about what that will look like (and what the package names should be).

KristofferC · 2019-01-21T22:12:34Z

One thing that I know has been discussed is renaming this package (or whichever package users will ultimately interact with). In Julia 0.6, Gallium filled that role (simply reexporting ASTInterpreter2), but I wonder if we should just come up with something fresh.

timholy · 2019-01-21T22:13:05Z

> That all sounds very confusing. Any reason not to just execute them directly? They are just normal methods, albeit without source code accessible to reflection (and that block overloading).

That is what it does; see https://github.com/timholy/ASTInterpreter2.jl/blob/teh/localmt/src/builtins.jl. Rather than typing those all out, though, I just check the tfunc tables in Core.Compiler to see how many args are supported. The file is generated by code in https://github.com/timholy/ASTInterpreter2.jl/blob/teh/localmt/src/generate_builtins.jl, which is about half the size of the finished product and has the bonus that if any of the builtins or intrinsics ever change, the output should pretty much auto-update.
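A minimal sketch of generating such an evaluator with metaprogramming, assuming a small hand-written arity table in place of the Core.Compiler tfunc tables (whose internal layout varies between Julia versions). `ARITY`, `gen_dispatch`, and `generated_dispatch` are hypothetical names; the generated chain calls each builtin with a fixed number of arguments, so no splatting is needed.

```julia
# Hypothetical arity table: (name of a builtin in Core, number of arguments).
const ARITY = [(:typeof, 1), (:isa, 2), (:ifelse, 3)]

# Build an if/elseif chain, innermost branch first, ending in a fallback.
function gen_dispatch(table)
    body = :(return nothing)                      # fallback: not handled
    for (name, nargs) in reverse(table)
        fobj = getfield(Core, name)
        call = Expr(:call, fobj, (:(args[$i]) for i in 1:nargs)...)
        cond = :(f === $fobj && length(args) == $nargs)
        body = Expr(:if, cond, :(return Some{Any}($call)), body)
    end
    return quote
        function generated_dispatch(f, args)
            $body
        end
    end
end

eval(gen_dispatch(ARITY))
```

Regenerating from the table is what lets the output track changes in the set of builtins without hand edits.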

One thing I didn't do: we might want to consider adding if nargs == cases for arrayref and arrayset for low dimensionality.

timholy · 2019-01-21T22:14:49Z

I've thought about CodeInfoInterpreter.jl or LoweredInterpreter.jl or just Interpreter.jl.

Keno · 2019-01-21T22:15:46Z

Awesome work. I'll let @KristofferC take this forward in detail. My only concern is that ASTInterpreter2 was supposed to be very small and maintainable, so I'm afraid to add too much here that would make it more complicated. On the other hand, perhaps that's offset by more people looking at it. Also, I second @vtjnash's question about why generating the source file is necessary.

timholy · 2019-01-21T22:16:37Z

src/builtins.jl

+        return Some{Any}(Core._apply_pure(getargs(args, frame)...))
+    elseif f === Core._expr
+        return Some{Any}(Core._expr(getargs(args, frame)...))
+    elseif f === Core._typevar


Looks like this isn't present on 1.0. One wonders if we should avoid committing builtins.jl and just generate it during Pkg.build?

Either that, or maybe use @static to check for their existence:

@static isdefined(Core, :_typevar) ? f === Core._typevar : false

Generating it at build time seems likely to be better, though, since otherwise we might miss things if some builtins are removed in the future.
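A sketch of the build-time approach, assuming the generator inspects the running Julia rather than a checked-in list. `present_builtins` is a hypothetical helper; a build script could emit one comparison per returned name, so builtins missing on a given version (such as Core._typevar on 1.0) are never referenced.

```julia
# Names of Core bindings that are (non-intrinsic) builtin functions
# in the Julia version that is currently running.
function present_builtins()
    found = Symbol[]
    for name in names(Core; all = true)
        isdefined(Core, name) || continue
        val = getfield(Core, name)
        if val isa Core.Builtin && !(val isa Core.IntrinsicFunction)
            push!(found, name)
        end
    end
    return sort!(found)
end

builtins = present_builtins()
```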

timholy · 2019-01-21T22:18:42Z

> Also seconded on @vtjnash's question why generating the source file is necessary.

We don't have to do it that way, although Core._typevar points out some potential advantages to autogeneration (if it's sufficiently robust...you could imagine it going either way as far as maintainability goes).

timholy · 2019-01-21T22:44:43Z

> My only concern is that ASTInterpreter2 was supposed to be very small and maintainable, so I'm afraid to add too much here that would make it more complicated. On the other hand, perhaps that's offset by more people looking at that.

I think the biggest reason to do it is that if we can get reasonable performance with the interpreter it becomes feasible to support breakpoints in the short term. (Supporting breakpoints is easy if all the code is running in the interpreter.)

I don't think even a loving author could call the performance of this PR "reasonable," but it is a ~60x improvement over where I started. If I could get another 10x I'd be really happy. But I'm skeptical that we can get there without some additional higher-level analysis, the most important probably being a certain amount of lowered-code inlining to reduce the number of stackframes created. (That seems likely to require a limited form of type inference...) Julia's polymorphism makes it much harder to write a fast interpreter for Julia than for, say, languages where for i = 1:100 is a construct of the language itself (e.g., https://dzone.com/articles/adventures-in-jit-compilation-part-1-an-interptete, where even "simpleinterp" is about 180x faster than this). We have to recurse into every call to iterate.
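To illustrate why an interpreter has to recurse into iterate, here is a hand-desugared summation loop. It follows the iteration protocol that Julia's for loops lower to, though the real lowered code differs in detail (inspect it with Meta.@lower); `sum_desugared` is a hypothetical name.

```julia
# Roughly what `for i = 1:n; s += i; end` turns into after lowering:
# every step is a generic call to `iterate`, which an interpreter must
# look up and, since it is not a builtin, step into.
function sum_desugared(n)
    s = 0
    r = 1:n                       # the iterable is evaluated once
    next = iterate(r)
    while next !== nothing
        i, state = next
        s += i
        next = iterate(r, state)
    end
    return s
end

sum_desugared(100)
```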

vtjnash · 2019-01-21T23:44:12Z

> Most of the time is spent on one line to eval %new expressions.

It'd probably be much faster to ccall that function directly rather than having eval do it. I've already seen that ccall used elsewhere in this PR...
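A minimal sketch of what ccalling directly might look like, assuming the runtime's jl_new_structv(type, args, nargs) entry point. This is an internal C function, not a public API, so its signature could change between Julia versions; `Point` and `new_struct` are illustrative names.

```julia
struct Point
    x::Int
    y::Int
end

# Build a struct from field values the way a lowered `%new` does:
# no constructor is called, the fields are filled in directly.
# Assumption: internal C entry point jl_new_structv(type, args, nargs).
function new_struct(T::DataType, fieldvals::Vector{Any})
    return ccall(:jl_new_structv, Any, (Any, Ptr{Any}, UInt32),
                 T, fieldvals, length(fieldvals))
end

p = new_struct(Point, Any[1, 2])
```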

> Rather than typing those all out

That's good to hear, since that list is large, but f isa Core.IntrinsicFunction && f(args...) should run faster than checking for the values 1 to 100 in separate if statements. If you want to skip _apply, hard-coding by argument count instead should be fast and general (e.g., unroll each call site for 1 to N args, where N is a small integer, about 6).
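A sketch of the unrolled-by-length idea, assuming arguments arrive as a Vector{Any}. `call_unrolled` and `maybe_intrinsic` are hypothetical names, and N = 4 here for brevity. Only the fallback branch splats, so common small arities avoid the generic _apply path.

```julia
# Call `f` with the values in `args`, unrolling small arities by hand.
function call_unrolled(f, args::Vector{Any})
    n = length(args)
    if n == 0
        return f()
    elseif n == 1
        return f(args[1])
    elseif n == 2
        return f(args[1], args[2])
    elseif n == 3
        return f(args[1], args[2], args[3])
    elseif n == 4
        return f(args[1], args[2], args[3], args[4])
    else
        return f(args...)         # general fallback (splatting)
    end
end

# The type check suggested above: only intrinsics take this path here;
# anything else is left to the normal lookup.
maybe_intrinsic(f, args) =
    f isa Core.IntrinsicFunction ? Some{Any}(call_unrolled(f, args)) : nothing
```

Whether this beats an if-chain on the function identity depends on the workload; the measurements that follow compare variants of both.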

timholy · 2019-01-22T00:56:50Z

Doesn't actually work out that way:

julia> using BenchmarkTools, ASTInterpreter2

julia> function runifbuiltin(qf, qx)
           f, x = qf.value, qx.value
           if isa(f, Core.Builtin)
               return Some{Any}(f(x))
           end
           return qx
       end
runifbuiltin (generic function with 1 method)

julia> function runifintrinsic(qf, qx)
           f, x = qf.value, qx.value
           if isa(f, Core.IntrinsicFunction)
               return Some{Any}(f(x))
           end
           return qx
       end
runifintrinsic (generic function with 1 method)

julia> qf = QuoteNode(abs)
:($(QuoteNode(abs)))

julia> qx = QuoteNode(3)
:($(QuoteNode(3)))

julia> @btime runifbuiltin(qf, qx)
  64.096 ns (0 allocations: 0 bytes)
:($(QuoteNode(3)))

julia> @btime runifintrinsic(qf, qx)
  11.302 ns (0 allocations: 0 bytes)
:($(QuoteNode(3)))

julia> ex = Expr(:call, qf, qx)
:(($(QuoteNode(abs)))($(QuoteNode(3))))

julia> @btime ASTInterpreter2.maybe_evaluate_builtin(nothing, ex)
  16.826 ns (0 allocations: 0 bytes)
:(($(QuoteNode(abs)))($(QuoteNode(3))))

julia> qf = QuoteNode(Core.sizeof)
:($(QuoteNode(Core.sizeof)))

julia> @btime runifbuiltin(qf, qx)
  96.828 ns (1 allocation: 16 bytes)
Some(8)

julia> @btime runifintrinsic(qf, qx)
  14.321 ns (0 allocations: 0 bytes)
:($(QuoteNode(3)))

julia> ex = Expr(:call, qf, qx)
:(($(QuoteNode(Core.sizeof)))($(QuoteNode(3))))

julia> @btime ASTInterpreter2.maybe_evaluate_builtin(nothing, ex)
  30.768 ns (1 allocation: 16 bytes)
Some(8)

julia> qf = QuoteNode(Base.neg_int)
:($(QuoteNode(neg_int)))

julia> @btime runifbuiltin(qf, qx)
  113.218 ns (1 allocation: 16 bytes)
Some(-3)

julia> @btime runifintrinsic(qf, qx)
  56.686 ns (1 allocation: 16 bytes)
Some(-3)

julia> ex = Expr(:call, qf, qx)
:(($(QuoteNode(neg_int)))($(QuoteNode(3))))

julia> @btime ASTInterpreter2.maybe_evaluate_builtin(nothing, ex)
  45.757 ns (1 allocation: 16 bytes)
Some(-3)

I've also tried a middle ground, checking isa(f, Core.IntrinsicFunction) and then "dispatching" by number of arguments, and that's slower. I've not found any approach that's faster than the one in this PR.

timholy · 2019-01-22T03:33:40Z

> It'd probably be much faster to ccall that function directly, rather than having eval do it.

Good call! 900s->590s.

KristofferC · 2019-01-22T10:59:50Z

I updated the UI tests (and their dependencies) so tests now pass on 1.1 and nightly. 1.0 fails due to the already mentioned _typevar issue.

KristofferC · 2019-01-22T13:05:27Z

Project.toml


 [targets]
 test = ["Test", "TerminalRegressionTests", "VT100"]
+build = ["InteractiveUtils"]


There is currently no build target.

Recommended approach? Just make it a dependency of the package?

timholy · 2019-01-22T13:13:16Z

OK, I've modified this to generate the file at build time. I expect tests to pass. Before merging, I think it would be best to wait a bit in case anyone wants to give this a detailed review.

There are also a couple of issues to discuss. First, I recognize the excitement to get debugging working in Juno again. But I'd urge a bit of patience so we can tackle a few things that should be done first:

- any package splitting/reorganization/renaming
- integration with Revise's ability to correct line numbers. A debugger is nice for, well, fixing bugs. But if fixing them changes line numbers in methods that don't get recompiled, the next time you try to step into code you'll presumably be at the wrong line of the source file, and users will be really confused.
- a couple of days for me to experiment with using this to interpret code at module scope. I think it should work, but I haven't tried it yet. For Revise, I'll evaluate changing its internal organization to use a stripped-down ASTInterpreter2 (just the interpreter part) to find method signatures more robustly and add "lowering backedges" (see #32, "How do you feel about hammers?").

The point being that I don't think it would be great to reintroduce a stepping-debugger to the Julia world and then break it only a couple of days later.

KristofferC · 2019-01-22T13:22:05Z

Hm, it seems test-only dependencies that use a checked-out branch in the Manifest aren't too happy when they're put in the test section. Will investigate.

timholy · 2019-01-22T13:22:55Z

Also one technical issue to highlight: before merging, perhaps I should experiment with moving the local method tables from the JuliaFrameCode to the JuliaStackFrame (the first being the "reusable component" and the second being the "instance for this particular call"). This may have performance advantages, if I can do it in a way that doesn't throw out the method tables for any recursive calls.

timholy · 2019-01-22T15:59:27Z

Surprisingly, moving the local method tables led to a 15% slowdown. So I say let's leave these as they are. So the only barriers I know of to merging are (1) any review comments, and (2) fixing the Pkg issues. Then after merging we can discuss splitting the package.

timholy · 2019-01-23T17:45:34Z

To ensure that this package can be hacked on by more than a small handful of people, I've added a bunch of docstrings and some Documenter narrative, including a somewhat gentle introduction to Julia's lowered AST representation. I haven't implemented deployment yet since I think it would be better to do that once we're basically ready to release. But you can view the docs locally by navigating to the docs directory and running julia make.jl and then opening build/index.html in a browser.


JeffBezanson reviewed Jan 21, 2019

KristofferC closed this Jan 21, 2019

KristofferC reopened this Jan 21, 2019

Ken-B mentioned this pull request Jan 21, 2019: Debugging support julia-vscode/julia-vscode#125 (Closed)

timholy mentioned this pull request Jan 21, 2019: Fix tfunc tables for fptoui and fptosi JuliaLang/julia#30787 (Merged)

timholy commented Jan 21, 2019

KristofferC reviewed Jan 22, 2019

pfitzseb mentioned this pull request Jan 22, 2019: no unique matching method found for the specified argument types JunoLab/Traceur.jl#23 (Closed)

timholy and others added 16 commits January 27, 2019 14:38

670ed65 Lookup GlobalRefs during optimize!
This also necessitates changing how Builtins are handled, but this is more robust than the previous module/symbol check.

5b3695d Record position prior to entering new frame
This is needed for debugging

925ec77 For generators, index by argtypes when caching lowered code
The generator can in principle return arbitrary expressions depending on the exact type.

a8610c9 Update the project dependencies

2672537 Unwrap toplevel expression from parse_input_line
This is new in JuliaLang/julia#30641.

9663bf7 Pass kwargs through JuliaStackFrame constructor

e1983f8 Speed up evaluation of %new expressions by ccalling

7317756 update ui tests

ab0775a Generate src/builtins.jl at build time

cdac1e9 Add docstrings for key interpreter methods.
This also makes minor code cleanups.

0be3c37 Add Documenter documentation

265ae94 Don't offset the SSAValues

2f2ca9f Fix line numbers in optimize!
At least module-scope code can use Ints for :gotoifnot expressions.

7d45fbe Actually copy the code in copy_codeinfo
Because we run the optimizer we don't want to contaminate the original.

0480577 A few improvements useful for external callers
Separates out the `JuliaFrameCode` constructor, generalizes `moduleof`, adds pc utilities

d03e44e checkout untagged test-deps in travis script
fixup some toml files and .travis

KristofferC merged commit c2e3809 into JuliaDebug:master Jan 27, 2019

This was referenced Jan 27, 2019:
ASTInterpreter2.jl fails to build under Julia 1.0 #27 (Closed)
Fail with parametric type #17 (Closed)
Attempting to print values causes ERROR: access to invalid slot number #36 (Open)
UI tests are failing #39 (Closed)
Debugger fails with function handle #24 (Open)

timholy deleted the teh/localmt branch January 27, 2019 13:55

This was referenced Jan 27, 2019:
@enter fails for simple kwarg function #34 (Closed)
Problem with stepping over @pack macro from Parameters module #18 (Open)

timholy mentioned this pull request Jan 30, 2019: Add ability to erode/dilate using structuring elements JuliaImages/ImageMorphology.jl#11 (Closed)

KristofferC mentioned this pull request Feb 4, 2019: Split into Interpreter + Debugger. #46 (Closed)

mkborregaard mentioned this pull request Feb 8, 2019: Slow initialization JuliaPlots/Plots.jl#917 (Closed)

timholy mentioned this pull request Feb 8, 2019: Making test dependencies available at the REPL JuliaLang/Pkg.jl#1008 (Closed)

timholy mentioned this pull request Dec 29, 2019: disallow setindex on immutable values JuliaLang/julia#34176 (Merged)
