Introduce a new core pass called sys_core_alias #1080

josevalim · 2016-06-01T17:33:04Z

Introduce a new core pass called sys_core_alias

The goal of this pass is to find values that are built from patterns and generate aliases for those values to remove pressure from the GC. For example, this code:

   example({ok, Val}) ->
       {ok, Val}.

shall become:

   example({ok, Val} = Tuple) ->
       Tuple.

Currently this pass aliases tuple and cons nodes made of literals, variables and other cons. The tuple/cons may appear anywhere in the pattern and it will be aliased if used later on.

Notice a tuple/cons made only of literals is not aliased as it may be part of the literal pool.

This is a WIP and tests are still pending.

OTP-Maintainer · 2016-06-01T18:56:03Z

Patch has passed first testings and has been assigned to be reviewed

I am a script, I am not human

OTP-Maintainer · 2016-06-02T18:07:11Z

Patch has passed first testings and has been assigned to be reviewed

I am a script, I am not human

bjorng · 2016-06-03T04:05:08Z

Have you run the compiler test suite? I get 8 failed test cases.

I think that the code in sys_core_alias could be significantly reduced by using cerl_trees:map fold/3 or cerl_trees:map_fold/4 to do the boring (and error-prone) traversing of the Core Erlang code. I also want to recommend the functions for handling "data" in cerl that simplifies handling of lists, tuples, and literals (see cerl:is_data/1 and the following functions).

Testing: Have a look at how test case modules are "cloned" in lib/compiler/test/Makefile. It is important to run some of the test suites both with and without the new pass.

Note about the time frame: This is too late for OTP 19.0. We could possibly include it in in OTP 19.1 provided that the new pass is turned off by default.

josevalim · 2016-06-03T07:58:52Z

I think that the code in sys_core_alias could be significantly reduced by using cerl_trees:map fold/3 or cerl_trees:map_fold/4 to do the boring (and error-prone) traversing of the Core Erlang code.

That's how I started but I ran into issues both when having to pass data from pre to post (as I had to pass it implicitly via the state) and also when handling guards and patterns, as we don't traverse the first (although maybe we should) and the second is traversed differently. I will give this another go now that a big chunk of the solution is implemented.

I will look into the remaining of your feedback though and into the tests hopefully today. About the time frame, I have no rush to include this. I am fine with waiting until Erlang 20. Thank you!

josevalim · 2016-06-03T11:27:35Z

@bjorng I took a look at the failing tests and the failures seem to be caused by the old inliner. I assumed that after sys_core_fold we would no longer have shadowing but this seems to be violated by the old inliner. That said, is my expectation that there should be no shadowing after sys_core_fold true? If it is true, then we need to fix upstream, otherwise we need to consider it in the new pass.

Meanwhile I have pushed a new implementation that uses cerl_trees, the trick to solve the problem I mentioned above was to set the patterns to an empty list between pre and post traversal. The code currently commented in the PR is the one that helped me find violations (i.e. variables in patterns being shadowed).

bjorng · 2016-06-03T12:57:03Z

There can always be shadowing in Core Erlang. An optimization pass is not allowed to assume that a previous optional optimization pass has been run; note that almost all optimization passes can be disabled with an option.

So you will have to deal with shadowing in your pass.

josevalim · 2016-06-03T13:12:55Z

So you will have to deal with shadowing in your pass.

http://www.nooooooooooooooo.com/

I will have to think a bit more on how to handle shadowing then, thank you!

psyeugenic · 2016-06-03T13:19:14Z

First, I would like to say that I like this new compiler pass!

A couple of things. In addition to what @bjorng already has said I also would like you to use cerl library as much as possible instead of using records directly. It's cleaner.
The second thing is that the bar for testing is a bit higher for the compiler. Make sure you try to reach 100% code coverage in your pass.

As for 19.1 or 20.0. An argument for 19.1 could be testing. Putting it out there as an optional experimental pass before making it default in 20.0. Possibly.

okeuday · 2016-06-03T14:35:00Z

This compiler pass looks to me like it should encourage sloppy Erlang source code, since it will be a reason to not reference Erlang term reuse explicitly with variables, which reduces the visible equivalence (associations the reader sees, i.e., implementation logic) in the source code.

Could we instead have this check create warnings so the source code can be fixed to be clearer?

Either way, it seems best that this check applies to all Erlang term types, despite the fact that it should be an O(n^2) penalty (n terms, of length n) to pay in the compiler. If this must be a compiler pass, then all Erlang source code can assume reuse of Erlang terms will naturally happen, and reuse never needs to be explicit, but with a warning, it should help the developer reuse all Erlang terms and attempt to explain why the source code is doing so.

psyeugenic · 2016-06-03T14:56:28Z

@okeuday I don't think I agree with your premise. It feels like it boils down to the straw man: "compiler shouldn't do optimizations". For instance, it shouldn't do memoizations on pure calls or subexpr eliminations (which the Erlang compiler doesn't) since it doesn't really do what I tell it to do. Not exactly. Should it do constant folding? We don't do inlining by default since it will mess with tracing .. this is bad.

I don't know. Perhaps I agree with the visualization bit of your argument anyways.
It's great that you question an additional pass! Lob all the grenades you can at it and see if it holds.

josevalim · 2016-06-03T15:15:27Z

Hi @okeuday, thanks for checking the pass!

This compiler pass looks to me like it should encourage sloppy Erlang source code, since it will be a reason to not reference Erlang term reuse explicitly with variables, which reduces the visible equivalence (associations the reader sees, i.e., implementation logic) in the source code.

This seems to be a very personal interpretation of what constitutes sloppy source code. For example, if I write code such as:

{ok, Value} = some_fun(A, B),
print(Value),
{ok, Value}.

this pass would ask me to rewrite it to:

{ok, Value} = Return = some_fun(A, B),
print(Value),
Return.

I don't consider the first version to be sloppy. More importantly, I find the "fixed" version considerably worse than the original. Previously I could look at the last line and see the shape of what is returned. Now I need to chase the Return variable wherever it is defined. Well, that's my personal preference.

Finally, I don't believe it is the role of the Erlang compiler to judge if some code is sloppy or not. If, in any circumstance, I find the version without reusing variables to be more readable, I would like to be able to write it as I prefer without the compiler yelling at me. The job of the compiler is to optimize and it is the developer job to write readable code (although tools like linters can help).

Other than that, I agree we would like to make this compiler as effective as possible as long as it runs reasonably fast. It currently covers both cases that matter: tuples and cons. Maps do not apply because patterns are partial. Suggestions are welcome.

TL;DR - the compiler should optimize no matter what. The definition of what is sloppy and what is not belongs elsewhere.

okeuday · 2016-06-03T16:03:00Z

Ok, the compiler shouldn't care about source code style. As you said, tuples and cons should cover all composite data types. Thanks for the explanation.

josevalim · 2016-06-04T12:22:46Z

I have pushed a new version that takes into account variable shadowing. We keep the list of all variables we have seen so far and, as soon as any of them is shadowed, we create a new substitution record. The logic is in the new sub_fold and sub_unfold functions. With those changes, the test suite no longer fails. 🎉

I have also spent some time with the cerl functions and I fail to understand how I could use the cerl data functions. We are not interested in data nodes since they must have at least one variable in them. I could the data functions to check if I need to traverse the node or not but that wouldn't make the code any shorter as I would still need my traversing function. Am I missing something?

bjorng · 2016-06-05T04:25:16Z

Here is an attempt to use the data functions. I have not tested it. It combines the last three clauses in pre/2 into a single clause. It is not much shorter but more readable (IMO). I have assumed that the cerl functions are imported (heavy users of cerl import all functions that are used).

pre(Node, Sub0) ->
    case is_data(Node) andalso not is_literal(Node) of
        false ->
            {Node,Sub0};
        true ->
            Kind = data_type(Node),
            Es = data_es(Node),
            case sub_cache_nodes(Kind, Es, Sub0) of
                {Name,Sub1} ->
                    {ann_c_var(get_ann(Node), Name),Sub1};
                error ->
                    {Node,Sub0}
            end
    end.

bjorng · 2016-06-05T04:35:15Z

And here is an attempt to handle nodes_to_key/3. Again not tested. It handles tuples, cons cells, literals, and the default case:

nodes_to_key([Node|T], Acc0, AllLiteral0) ->
    case is_data(Node) of
        false ->
            error;
        true ->
            case nodes_to_key(data_es(Nodes), [], AllLiteral0) of
                {Es,AllLiteral1} ->
                    Key = list_to_tuple([data_type(Node)|Es]),
                    nodes_to_key(T, [Key|Acc0], AllLiteral1);
                error ->
                    error
            end
    end;

josevalim · 2016-06-05T09:08:55Z

Thank you, those are good improvements. We can also keep the Key as a cons, so we avoid the list_to_tuple calls. :D I will play with those after I write the tests!

OTP-Maintainer · 2016-06-05T18:21:25Z

Patch has passed first testings and has been assigned to be reviewed

I am a script, I am not human

bjorng · 2016-09-22T13:10:27Z

Ping?

If you'll do the updates we discussed and rebase the branch on the latest master, we could start testing the branch in our daily builds.

josevalim · 2017-02-01T17:20:31Z

Now that #1078 has been merged, I will focus on this PR once more. I will let you know when the fixes above have been pushed.

fenollp · 2017-02-02T09:04:41Z

We are not interested in data nodes since they must have at least one variable in them.

How about looking for repeated constants too?

match_this(<<"my long binary">>, _, Bla) ->
    {ok, Blip} = do_something(Bla),
    quux(Blip, <<"my long binary">>).

Or is <<"my long binary">> caught already?

psyeugenic · 2017-02-02T09:12:29Z

Repeated literals is not a problem.

josevalim · 2017-07-05T14:30:28Z

Thanks for the review @bjorng. I am glad to hear the results have been positive! I have left some questions based on your review.

try to wrap my head around how you handle variable shadowing.

Spoiler: the idea is that we keep a stack of substitutions in the #sub.t field. If there is shadowing in sub_fold, a new sub is created with the previous sub stored in t. If there is no shadowing, we store the current t to be kept as is on sub_unfold.

The downside of this approach is that one variable being shadowed disables the optimization for all other variables up to that point.

josevalim · 2017-07-05T14:39:49Z

lib/compiler/src/sys_core_alias.erl

+    ordsets:is_disjoint(Vars1, Vars2).
+
+merge_variables(Vars1, Vars2) ->
+    ordsets:union(Vars1, Vars2).


@bjorng should I use cerl_sets for the functions above?

That could be a good idea. Running eprof shows that ordsets:union/2 does seem to consume a significant part of the run time:

$ erlc '+{eprof,sys_core_alias}' -S core_parse.erl . . . lists:foldl/3 42182 4.15 7123 [ 0.17] sys_core_alias:'-def/1-anonymous-0-'/2 39306 6.10 10476 [ 0.27] cerl:is_data/1 38458 6.27 10762 [ 0.28] cerl_trees:mapfold_list/4 34582 7.87 13515 [ 0.39] cerl_trees:mapfold/4 39306 8.78 15066 [ 0.38] ordsets:union/2 70616 10.19 17492 [ 0.25] cerl:type/1 62350 10.62 18225 [ 0.29] ----------------------------------------------------- ------ ------- ------ [----------] Total: 695127 100.00% 171679 [ 0.25]

bjorng · 2017-07-06T05:19:35Z

Some time ago we decided to prefer spaces over tabs for indenting. Since sys_core_alias is a completely new module, could you make that all indentation is done using spaces?

josevalim · 2017-07-06T09:19:40Z

Some time ago we decided to prefer spaces over tabs for indenting. Since sys_core_alias is a completely new module, could you make that all indentation is done using spaces?

Yes! Yes!

josevalim · 2017-07-06T10:04:09Z

lib/compiler/src/compile.erl

-       {iff,'to_core',{done,"core"}}]}
+       {iff,'to_core',{done,"core"}}]},
+     {pass,sys_core_alias},
+     {iff,dalias,{listing,"core_alias"}}


@bjorng please verify pass positioning one more time.

josevalim · 2017-07-06T10:04:30Z

All feedback has been addressed and I force pushed to the branch.

bjorng · 2017-07-06T11:21:43Z

Thanks! I have read the code again and think I understand what it is doing. Looks good.

I have pushed a branch with three fixup commits. Please have a look:

https://github.com/bjorng/otp/tree/josevalim/compiler/sys-core-alias/PR-1080

The first commit fixes two nit picks.

The second commit adds a test case to ensure that the error clause in nodes_to_alias/5 is covered. I think that I gave you the name of the wrong function. Now every line is covered.

The third commit tweaks compile.erl. I have moved sys_core_alias so that it can be turned off with no_copt (along with sys_core_fold). I have also added no_fold to turn off sys_core_fold and no_alias to turn off sys_core_alias. Those options are very useful when investigating compiler bugs.

I have not addressed the Dialyzer errors pointed out by Travis CI. Please fix them.

richcarl · 2017-07-06T13:32:27Z

No time to read the code properly right now, but great - this ticks off one item from my wish list from 1996 or thereabouts!

bjorng · 2017-07-06T14:46:32Z

The latest dialyzer failures are my fault. I will soon push a correction.

bjorng · 2017-07-06T14:54:23Z

I have pushed a partial correction. The placement of sys_core_alias is corrected so that dialyzer will work at all. There are still complaints from dialyzer.

https://github.com/bjorng/otp/tree/josevalim/compiler/sys-core-alias/PR-1080

bjorng · 2017-07-06T15:00:38Z

I have now pushed a correction for the dialyzer warning to the same branch.

The goal of this pass is to find values that are built from patterns and generate aliases for those values to remove pressure from the GC. For example, this code: example({ok, Val}) -> {ok, Val}. shall become: example({ok, Val} = Tuple) -> Tuple. Currently this pass aliases tuple and cons nodes made of literals, variables and other cons. The tuple/cons may appear anywhere in the pattern and it will be aliased if used later on. Notice a tuple/cons made only of literals is not aliased as it may be part of the literal pool.

josevalim · 2017-07-06T15:09:02Z

@bjorng thanks, I was working on a fix but you were faster than me! I have force pushed once more.

josevalim · 2017-07-06T15:10:20Z

@richcarl glad to hear! Do you have any other pending optimization in that list that would be interested in sharing? 😁

bjorng · 2017-07-06T15:34:26Z

Added to our daily builds. I will merge tomorrow unless any new problems show up.

richcarl · 2017-07-07T10:25:29Z

@josevalim Well, if you want to try something along the same lines as the pattern/term aliasing, I've always wanted to have what is sometimes called "tail merging". In idiomatic Erlang, you often get a bunch of clauses where many of them do the same thing: "case E of P1 -> foo(X+1, ...); P2-> foo(X+1, ...); P3->bar(X-1,...); P4->bar(X-1,...), ... end, ...Exprs... ." This can be rewritten as "case E of P1 -> c1(X,Y,Z); P2 -> c1(X,Y,Z); P3-> c2(X,Y,Z); P4->c2(X,Y,Z); ... end." where "c1(X,Y,Z) -> foo(X+1,...), cc(X,Y,Z).", "c2(X,Y,Z) -> bar(X-1, ...).", and "cc(X,Y,Z) ->...Exprs... ." (You just have to make sure that you get all variable scopes right, and minimize shuffling.) - Note that it might be nicer to do this on the Beam code level (also to handle compiler-generated code from previous passes), so see the examples just as an illustration. It won't speed up anything (and might slow down things a tiny bit), but it will keep the code small even when you repeat the same thing in different clauses.

josevalim · 2017-07-11T11:26:34Z

Thanks @richcarl, I will also add it to my list.

@bjorng I have just noticed another optimization we could perform on core. Imagine this code:

-module(foo).
-compile(export_all).
bar() ->
  (fun(X) -> X end)(3).

it compiles to this kernel code:

fdef 'bar'/0() =
  do
    bif (internal 'make_fun'/2)('-bar/0-fun-0-', 1) >> <_@c2>
  then
  enter (_@c2)(3)

The optimization is to compile calls on "make_funs" (or on variables that point to make_funs) directly to their local or remote equivalent. In this case, we would compile to:

fdef 'bar'/0() =
  enter (local '-bar/0-fun-0-'/1)(3)

We could also apply a similar optimization to inline_list_funcs. If the function given as argument is a "make_fun", the code emitted by inline_list_funcs could call directly the function as well as avoid the function guard checks.

I believe such change would belong to sys_core_fold. What do you think?

richcarl · 2017-07-11T19:41:53Z

@josevalim The core-level inliner should remove such applications trivially.

richcarl · 2017-07-11T20:08:47Z

@josevalim The Erlang community has (at least previously) been averse to inlining by default. Elixir might be a different case, so you could consider enabling the core inliner by default when you compile for Elixir, with suitably tuned size/effort parameters.

josevalim · 2017-07-11T20:43:19Z

@richcarl the problem with the inliner it messes up stacktraces, so it changes the semantics, no? The optimization above is safe, so afaik there is no reason to always do it.

richcarl · 2017-07-11T20:59:54Z

I wouldn't call stack traces part of the language semantics, but yes, the behaviour of some programs that inspect stack traces could change. (And I would argue that they deserve what they get.) More importantly, before we had line number information, inlining could make it look like an exception was caused in a different location, which is a problem for general debuggability.

fenollp · 2017-07-11T23:18:21Z

I have forked @weinholt's Erlang supercompiler project: https://github.com/fenollp/erlscp
It would inline such calls (and others, lists functions can also be inlined but that is opt-in (and work in progress for other modules)).
The way this project handles pattern matching is not great though (some normal code can fail to compile with erlscp), plus this really shouldn't be a parse_transform but a Core Erlang pass.
Posting this here as I am far from being the most knowledgeable person on this topic and I would love to see people getting this project farther.
The future JIT would probably be doing all these optimizations and more, so maybe not the right approach...

richcarl · 2017-07-12T08:37:53Z

@fenollp Cool! I had not seen that one. Now I must read @weinholt's master's thesis from 2013. Supercompilation is basically partial evaluation on acid. It can turn your code inside out, but is hard to control, and the code can easily explode. I did work on partial evaluation for Erlang in the late 90:s, but the main practical results were the syntax tools, core Erlang, and the inliner.

josevalim force-pushed the jv-sys-core-replace branch from 06d1ae5 to 83b64ec Compare June 1, 2016 18:30

psyeugenic added team:MW enhancement labels Jun 1, 2016

josevalim force-pushed the jv-sys-core-replace branch from 83b64ec to db1a395 Compare June 1, 2016 21:05

psyeugenic assigned bjorng Jun 2, 2016

josevalim force-pushed the jv-sys-core-replace branch 2 times, most recently from c913436 to e0d4eac Compare June 3, 2016 12:03

josevalim force-pushed the jv-sys-core-replace branch from e0d4eac to 074caa8 Compare June 4, 2016 12:13

bjorng added the waiting waiting for changes/input from author label Sep 22, 2016

josevalim commented Jul 5, 2017

View reviewed changes

josevalim force-pushed the jv-sys-core-replace branch from 877ad09 to 89e704c Compare July 6, 2017 10:03

josevalim commented Jul 6, 2017

View reviewed changes

josevalim force-pushed the jv-sys-core-replace branch from 89e704c to 8254a0c Compare July 6, 2017 14:06

josevalim force-pushed the jv-sys-core-replace branch from 8254a0c to d4a27e9 Compare July 6, 2017 15:08

bjorng merged commit d256e32 into erlang:master Jul 7, 2017

josevalim deleted the jv-sys-core-replace branch July 7, 2017 10:19

InoMurko mentioned this pull request Feb 14, 2020

Add documentation to @moduledocs and @docs omgnetwork/elixir-omg#1307

Merged

mheiber mentioned this pull request Jan 12, 2021

Eqwalizer: very initial minimalistic version WhatsApp/erlt#477

Merged

Introduce a new core pass called sys_core_alias #1080

Introduce a new core pass called sys_core_alias #1080

Conversation

josevalim commented Jun 1, 2016 • edited Loading

OTP-Maintainer commented Jun 1, 2016

OTP-Maintainer commented Jun 2, 2016

bjorng commented Jun 3, 2016 • edited Loading

josevalim commented Jun 3, 2016

josevalim commented Jun 3, 2016 • edited Loading

bjorng commented Jun 3, 2016

josevalim commented Jun 3, 2016 • edited Loading

psyeugenic commented Jun 3, 2016

okeuday commented Jun 3, 2016

psyeugenic commented Jun 3, 2016

josevalim commented Jun 3, 2016

okeuday commented Jun 3, 2016

josevalim commented Jun 4, 2016

bjorng commented Jun 5, 2016

bjorng commented Jun 5, 2016 • edited Loading

josevalim commented Jun 5, 2016

OTP-Maintainer commented Jun 5, 2016

bjorng commented Sep 22, 2016

josevalim commented Feb 1, 2017

fenollp commented Feb 2, 2017

psyeugenic commented Feb 2, 2017

josevalim commented Jul 5, 2017

josevalim Jul 5, 2017

Choose a reason for hiding this comment

bjorng Jul 6, 2017

Choose a reason for hiding this comment

bjorng commented Jul 6, 2017

josevalim commented Jul 6, 2017

josevalim Jul 6, 2017

Choose a reason for hiding this comment

josevalim commented Jul 6, 2017

bjorng commented Jul 6, 2017

richcarl commented Jul 6, 2017

bjorng commented Jul 6, 2017

bjorng commented Jul 6, 2017

bjorng commented Jul 6, 2017

josevalim commented Jul 6, 2017

josevalim commented Jul 6, 2017 • edited Loading

bjorng commented Jul 6, 2017

richcarl commented Jul 7, 2017

josevalim commented Jul 11, 2017 • edited Loading

richcarl commented Jul 11, 2017

richcarl commented Jul 11, 2017

josevalim commented Jul 11, 2017

richcarl commented Jul 11, 2017

fenollp commented Jul 11, 2017

richcarl commented Jul 12, 2017

josevalim commented Jun 1, 2016 •

edited

Loading

bjorng commented Jun 3, 2016 •

edited

Loading

josevalim commented Jun 3, 2016 •

edited

Loading

josevalim commented Jun 3, 2016 •

edited

Loading

bjorng commented Jun 5, 2016 •

edited

Loading

josevalim commented Jul 6, 2017 •

edited

Loading

josevalim commented Jul 11, 2017 •

edited

Loading