Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Introduce a new core pass called sys_core_alias #1080

Merged
merged 1 commit into from
Jul 7, 2017

Conversation

josevalim
Copy link
Contributor

@josevalim josevalim commented Jun 1, 2016

Introduce a new core pass called sys_core_alias

The goal of this pass is to find values that are built from patterns and generate aliases for those values to remove pressure from the GC. For example, this code:

   example({ok, Val}) ->
       {ok, Val}.

shall become:

   example({ok, Val} = Tuple) ->
       Tuple.

Currently this pass aliases tuple and cons nodes made of literals, variables and other cons. The tuple/cons may appear anywhere in the pattern and it will be aliased if used later on.

Notice a tuple/cons made only of literals is not aliased as it may be part of the literal pool.

This is a WIP and tests are still pending.

@OTP-Maintainer
Copy link

Patch has passed first testings and has been assigned to be reviewed


I am a script, I am not human


@OTP-Maintainer
Copy link

Patch has passed first testings and has been assigned to be reviewed


I am a script, I am not human


@bjorng
Copy link
Contributor

bjorng commented Jun 3, 2016

Have you run the compiler test suite? I get 8 failed test cases.

I think that the code in sys_core_alias could be significantly reduced by using cerl_trees:map fold/3 or cerl_trees:map_fold/4 to do the boring (and error-prone) traversing of the Core Erlang code. I also want to recommend the functions for handling "data" in cerl that simplifies handling of lists, tuples, and literals (see cerl:is_data/1 and the following functions).

Testing: Have a look at how test case modules are "cloned" in lib/compiler/test/Makefile. It is important to run some of the test suites both with and without the new pass.

Note about the time frame: This is too late for OTP 19.0. We could possibly include it in in OTP 19.1 provided that the new pass is turned off by default.

@josevalim
Copy link
Contributor Author

I think that the code in sys_core_alias could be significantly reduced by using cerl_trees:map fold/3 or cerl_trees:map_fold/4 to do the boring (and error-prone) traversing of the Core Erlang code.

That's how I started but I ran into issues both when having to pass data from pre to post (as I had to pass it implicitly via the state) and also when handling guards and patterns, as we don't traverse the first (although maybe we should) and the second is traversed differently. I will give this another go now that a big chunk of the solution is implemented.

I will look into the remaining of your feedback though and into the tests hopefully today. About the time frame, I have no rush to include this. I am fine with waiting until Erlang 20. Thank you!

@josevalim
Copy link
Contributor Author

josevalim commented Jun 3, 2016

@bjorng I took a look at the failing tests and the failures seem to be caused by the old inliner. I assumed that after sys_core_fold we would no longer have shadowing but this seems to be violated by the old inliner. That said, is my expectation that there should be no shadowing after sys_core_fold true? If it is true, then we need to fix upstream, otherwise we need to consider it in the new pass.

Meanwhile I have pushed a new implementation that uses cerl_trees, the trick to solve the problem I mentioned above was to set the patterns to an empty list between pre and post traversal. The code currently commented in the PR is the one that helped me find violations (i.e. variables in patterns being shadowed).

@josevalim josevalim force-pushed the jv-sys-core-replace branch 2 times, most recently from c913436 to e0d4eac Compare June 3, 2016 12:03
@bjorng
Copy link
Contributor

bjorng commented Jun 3, 2016

There can always be shadowing in Core Erlang. An optimization pass is not allowed to assume that a previous optional optimization pass has been run; note that almost all optimization passes can be disabled with an option.

So you will have to deal with shadowing in your pass.

@josevalim
Copy link
Contributor Author

josevalim commented Jun 3, 2016

So you will have to deal with shadowing in your pass.

http://www.nooooooooooooooo.com/

I will have to think a bit more on how to handle shadowing then, thank you!

@psyeugenic
Copy link
Contributor

First, I would like to say that I like this new compiler pass!

A couple of things. In addition to what @bjorng already has said I also would like you to use cerl library as much as possible instead of using records directly. It's cleaner.
The second thing is that the bar for testing is a bit higher for the compiler. Make sure you try to reach 100% code coverage in your pass.

As for 19.1 or 20.0. An argument for 19.1 could be testing. Putting it out there as an optional experimental pass before making it default in 20.0. Possibly.

@okeuday
Copy link
Contributor

okeuday commented Jun 3, 2016

This compiler pass looks to me like it should encourage sloppy Erlang source code, since it will be a reason to not reference Erlang term reuse explicitly with variables, which reduces the visible equivalence (associations the reader sees, i.e., implementation logic) in the source code.

Could we instead have this check create warnings so the source code can be fixed to be clearer?

Either way, it seems best that this check applies to all Erlang term types, despite the fact that it should be an O(n^2) penalty (n terms, of length n) to pay in the compiler. If this must be a compiler pass, then all Erlang source code can assume reuse of Erlang terms will naturally happen, and reuse never needs to be explicit, but with a warning, it should help the developer reuse all Erlang terms and attempt to explain why the source code is doing so.

@psyeugenic
Copy link
Contributor

@okeuday I don't think I agree with your premise. It feels like it boils down to the straw man: "compiler shouldn't do optimizations". For instance, it shouldn't do memoizations on pure calls or subexpr eliminations (which the Erlang compiler doesn't) since it doesn't really do what I tell it to do. Not exactly. Should it do constant folding? We don't do inlining by default since it will mess with tracing .. this is bad.

I don't know. Perhaps I agree with the visualization bit of your argument anyways.
It's great that you question an additional pass! Lob all the grenades you can at it and see if it holds.

@josevalim
Copy link
Contributor Author

Hi @okeuday, thanks for checking the pass!

This compiler pass looks to me like it should encourage sloppy Erlang source code, since it will be a reason to not reference Erlang term reuse explicitly with variables, which reduces the visible equivalence (associations the reader sees, i.e., implementation logic) in the source code.

This seems to be a very personal interpretation of what constitutes sloppy source code. For example, if I write code such as:

{ok, Value} = some_fun(A, B),
print(Value),
{ok, Value}.

this pass would ask me to rewrite it to:

{ok, Value} = Return = some_fun(A, B),
print(Value),
Return.

I don't consider the first version to be sloppy. More importantly, I find the "fixed" version considerably worse than the original. Previously I could look at the last line and see the shape of what is returned. Now I need to chase the Return variable wherever it is defined. Well, that's my personal preference.

Finally, I don't believe it is the role of the Erlang compiler to judge if some code is sloppy or not. If, in any circumstance, I find the version without reusing variables to be more readable, I would like to be able to write it as I prefer without the compiler yelling at me. The job of the compiler is to optimize and it is the developer job to write readable code (although tools like linters can help).

Other than that, I agree we would like to make this compiler as effective as possible as long as it runs reasonably fast. It currently covers both cases that matter: tuples and cons. Maps do not apply because patterns are partial. Suggestions are welcome.

TL;DR - the compiler should optimize no matter what. The definition of what is sloppy and what is not belongs elsewhere.

@okeuday
Copy link
Contributor

okeuday commented Jun 3, 2016

Ok, the compiler shouldn't care about source code style. As you said, tuples and cons should cover all composite data types. Thanks for the explanation.

@josevalim
Copy link
Contributor Author

I have pushed a new version that takes into account variable shadowing. We keep the list of all variables we have seen so far and, as soon as any of them is shadowed, we create a new substitution record. The logic is in the new sub_fold and sub_unfold functions. With those changes, the test suite no longer fails. 🎉

I have also spent some time with the cerl functions and I fail to understand how I could use the cerl data functions. We are not interested in data nodes since they must have at least one variable in them. I could the data functions to check if I need to traverse the node or not but that wouldn't make the code any shorter as I would still need my traversing function. Am I missing something?

@bjorng
Copy link
Contributor

bjorng commented Jun 5, 2016

Here is an attempt to use the data functions. I have not tested it. It combines the last three clauses in pre/2 into a single clause. It is not much shorter but more readable (IMO). I have assumed that the cerl functions are imported (heavy users of cerl import all functions that are used).

pre(Node, Sub0) ->
    case is_data(Node) andalso not is_literal(Node) of
        false ->
            {Node,Sub0};
        true ->
            Kind = data_type(Node),
            Es = data_es(Node),
            case sub_cache_nodes(Kind, Es, Sub0) of
                {Name,Sub1} ->
                    {ann_c_var(get_ann(Node), Name),Sub1};
                error ->
                    {Node,Sub0}
            end
    end.

@bjorng
Copy link
Contributor

bjorng commented Jun 5, 2016

And here is an attempt to handle nodes_to_key/3. Again not tested. It handles tuples, cons cells, literals, and the default case:

nodes_to_key([Node|T], Acc0, AllLiteral0) ->
    case is_data(Node) of
        false ->
            error;
        true ->
            case nodes_to_key(data_es(Nodes), [], AllLiteral0) of
                {Es,AllLiteral1} ->
                    Key = list_to_tuple([data_type(Node)|Es]),
                    nodes_to_key(T, [Key|Acc0], AllLiteral1);
                error ->
                    error
            end
    end;

@josevalim
Copy link
Contributor Author

Thank you, those are good improvements. We can also keep the Key as a cons, so we avoid the list_to_tuple calls. :D I will play with those after I write the tests!

@OTP-Maintainer
Copy link

Patch has passed first testings and has been assigned to be reviewed


I am a script, I am not human


@bjorng
Copy link
Contributor

bjorng commented Sep 22, 2016

Ping?

If you'll do the updates we discussed and rebase the branch on the latest master, we could start testing the branch in our daily builds.

@bjorng bjorng added the waiting waiting for changes/input from author label Sep 22, 2016
@josevalim
Copy link
Contributor Author

Now that #1078 has been merged, I will focus on this PR once more. I will let you know when the fixes above have been pushed.

@fenollp
Copy link
Contributor

fenollp commented Feb 2, 2017

We are not interested in data nodes since they must have at least one variable in them.

How about looking for repeated constants too?

match_this(<<"my long binary">>, _, Bla) ->
    {ok, Blip} = do_something(Bla),
    quux(Blip, <<"my long binary">>).

Or is <<"my long binary">> caught already?

@psyeugenic
Copy link
Contributor

Repeated literals is not a problem.

@josevalim
Copy link
Contributor Author

Thanks for the review @bjorng. I am glad to hear the results have been positive! I have left some questions based on your review.

try to wrap my head around how you handle variable shadowing.

Spoiler: the idea is that we keep a stack of substitutions in the #sub.t field. If there is shadowing in sub_fold, a new sub is created with the previous sub stored in t. If there is no shadowing, we store the current t to be kept as is on sub_unfold.

The downside of this approach is that one variable being shadowed disables the optimization for all other variables up to that point.

ordsets:is_disjoint(Vars1, Vars2).

merge_variables(Vars1, Vars2) ->
ordsets:union(Vars1, Vars2).
Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

@bjorng should I use cerl_sets for the functions above?

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

That could be a good idea. Running eprof shows that ordsets:union/2 does seem to consume a significant part of the run time:

$ erlc '+{eprof,sys_core_alias}' -S core_parse.erl
.
.
.
lists:foldl/3                                           42182     4.15    7123  [      0.17]
sys_core_alias:'-def/1-anonymous-0-'/2                  39306     6.10   10476  [      0.27]
cerl:is_data/1                                          38458     6.27   10762  [      0.28]
cerl_trees:mapfold_list/4                               34582     7.87   13515  [      0.39]
cerl_trees:mapfold/4                                    39306     8.78   15066  [      0.38]
ordsets:union/2                                         70616    10.19   17492  [      0.25]
cerl:type/1                                             62350    10.62   18225  [      0.29]
-----------------------------------------------------  ------  -------  ------  [----------]
Total:                                                 695127  100.00%  171679  [      0.25]

@bjorng
Copy link
Contributor

bjorng commented Jul 6, 2017

Some time ago we decided to prefer spaces over tabs for indenting. Since sys_core_alias is a completely new module, could you make that all indentation is done using spaces?

@josevalim
Copy link
Contributor Author

Some time ago we decided to prefer spaces over tabs for indenting. Since sys_core_alias is a completely new module, could you make that all indentation is done using spaces?

Yes! Yes!

{iff,'to_core',{done,"core"}}]}
{iff,'to_core',{done,"core"}}]},
{pass,sys_core_alias},
{iff,dalias,{listing,"core_alias"}}
Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

@bjorng please verify pass positioning one more time.

@josevalim
Copy link
Contributor Author

All feedback has been addressed and I force pushed to the branch.

@bjorng
Copy link
Contributor

bjorng commented Jul 6, 2017

Thanks! I have read the code again and think I understand what it is doing. Looks good.

I have pushed a branch with three fixup commits. Please have a look:

https://github.com/bjorng/otp/tree/josevalim/compiler/sys-core-alias/PR-1080

The first commit fixes two nit picks.

The second commit adds a test case to ensure that the error clause in nodes_to_alias/5 is covered. I think that I gave you the name of the wrong function. Now every line is covered.

The third commit tweaks compile.erl. I have moved sys_core_alias so that it can be turned off with no_copt (along with sys_core_fold). I have also added no_fold to turn off sys_core_fold and no_alias to turn off sys_core_alias. Those options are very useful when investigating compiler bugs.

I have not addressed the Dialyzer errors pointed out by Travis CI. Please fix them.

@richcarl
Copy link
Contributor

richcarl commented Jul 6, 2017

No time to read the code properly right now, but great - this ticks off one item from my wish list from 1996 or thereabouts!

@bjorng
Copy link
Contributor

bjorng commented Jul 6, 2017

The latest dialyzer failures are my fault. I will soon push a correction.

@bjorng
Copy link
Contributor

bjorng commented Jul 6, 2017

I have pushed a partial correction. The placement of sys_core_alias is corrected so that dialyzer will work at all. There are still complaints from dialyzer.

https://github.com/bjorng/otp/tree/josevalim/compiler/sys-core-alias/PR-1080

@bjorng
Copy link
Contributor

bjorng commented Jul 6, 2017

I have now pushed a correction for the dialyzer warning to the same branch.

The goal of this pass is to find values that are built from
patterns and generate aliases for those values to remove
pressure from the GC. For example, this code:

   example({ok, Val}) ->
       {ok, Val}.

shall become:

   example({ok, Val} = Tuple) ->
       Tuple.

Currently this pass aliases tuple and cons nodes made of literals,
variables and other cons. The tuple/cons may appear anywhere in the
pattern and it will be aliased if used later on.

Notice a tuple/cons made only of literals is not aliased as it may
be part of the literal pool.
@josevalim
Copy link
Contributor Author

@bjorng thanks, I was working on a fix but you were faster than me! I have force pushed once more.

@josevalim
Copy link
Contributor Author

josevalim commented Jul 6, 2017

@richcarl glad to hear! Do you have any other pending optimization in that list that would be interested in sharing? 😁

@bjorng
Copy link
Contributor

bjorng commented Jul 6, 2017

Added to our daily builds. I will merge tomorrow unless any new problems show up.

@bjorng bjorng merged commit d256e32 into erlang:master Jul 7, 2017
@josevalim josevalim deleted the jv-sys-core-replace branch July 7, 2017 10:19
@richcarl
Copy link
Contributor

richcarl commented Jul 7, 2017

@josevalim Well, if you want to try something along the same lines as the pattern/term aliasing, I've always wanted to have what is sometimes called "tail merging". In idiomatic Erlang, you often get a bunch of clauses where many of them do the same thing: "case E of P1 -> foo(X+1, ...); P2-> foo(X+1, ...); P3->bar(X-1,...); P4->bar(X-1,...), ... end, ...Exprs... ." This can be rewritten as "case E of P1 -> c1(X,Y,Z); P2 -> c1(X,Y,Z); P3-> c2(X,Y,Z); P4->c2(X,Y,Z); ... end." where "c1(X,Y,Z) -> foo(X+1,...), cc(X,Y,Z).", "c2(X,Y,Z) -> bar(X-1, ...).", and "cc(X,Y,Z) ->...Exprs... ." (You just have to make sure that you get all variable scopes right, and minimize shuffling.) - Note that it might be nicer to do this on the Beam code level (also to handle compiler-generated code from previous passes), so see the examples just as an illustration. It won't speed up anything (and might slow down things a tiny bit), but it will keep the code small even when you repeat the same thing in different clauses.

@josevalim
Copy link
Contributor Author

josevalim commented Jul 11, 2017

Thanks @richcarl, I will also add it to my list.

@bjorng I have just noticed another optimization we could perform on core. Imagine this code:

-module(foo).
-compile(export_all).
bar() ->
  (fun(X) -> X end)(3).

it compiles to this kernel code:

fdef 'bar'/0() =
  do
    bif (internal 'make_fun'/2)('-bar/0-fun-0-', 1) >> <_@c2>
  then
  enter (_@c2)(3)

The optimization is to compile calls on "make_funs" (or on variables that point to make_funs) directly to their local or remote equivalent. In this case, we would compile to:

fdef 'bar'/0() =
  enter (local '-bar/0-fun-0-'/1)(3)

We could also apply a similar optimization to inline_list_funcs. If the function given as argument is a "make_fun", the code emitted by inline_list_funcs could call directly the function as well as avoid the function guard checks.

I believe such change would belong to sys_core_fold. What do you think?

@richcarl
Copy link
Contributor

@josevalim The core-level inliner should remove such applications trivially.

@richcarl
Copy link
Contributor

@josevalim The Erlang community has (at least previously) been averse to inlining by default. Elixir might be a different case, so you could consider enabling the core inliner by default when you compile for Elixir, with suitably tuned size/effort parameters.

@josevalim
Copy link
Contributor Author

@richcarl the problem with the inliner it messes up stacktraces, so it changes the semantics, no? The optimization above is safe, so afaik there is no reason to always do it.

@richcarl
Copy link
Contributor

I wouldn't call stack traces part of the language semantics, but yes, the behaviour of some programs that inspect stack traces could change. (And I would argue that they deserve what they get.) More importantly, before we had line number information, inlining could make it look like an exception was caused in a different location, which is a problem for general debuggability.

@fenollp
Copy link
Contributor

fenollp commented Jul 11, 2017

I have forked @weinholt's Erlang supercompiler project: https://github.com/fenollp/erlscp
It would inline such calls (and others, lists functions can also be inlined but that is opt-in (and work in progress for other modules)).
The way this project handles pattern matching is not great though (some normal code can fail to compile with erlscp), plus this really shouldn't be a parse_transform but a Core Erlang pass.
Posting this here as I am far from being the most knowledgeable person on this topic and I would love to see people getting this project farther.
The future JIT would probably be doing all these optimizations and more, so maybe not the right approach...

@richcarl
Copy link
Contributor

@fenollp Cool! I had not seen that one. Now I must read @weinholt's master's thesis from 2013. Supercompilation is basically partial evaluation on acid. It can turn your code inside out, but is hard to control, and the code can easily explode. I did work on partial evaluation for Erlang in the late 90:s, but the main practical results were the syntax tools, core Erlang, and the inliner.

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
enhancement team:VM Assigned to OTP team VM testing currently being tested, tag is used by OTP internal CI
Projects
None yet
Development

Successfully merging this pull request may close these issues.

7 participants