-
Notifications
You must be signed in to change notification settings - Fork 3k
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
Introduce a new core pass called sys_core_alias #1080
Conversation
06d1ae5
to
83b64ec
Compare
Patch has passed first testings and has been assigned to be reviewed I am a script, I am not human |
83b64ec
to
db1a395
Compare
Patch has passed first testings and has been assigned to be reviewed I am a script, I am not human |
Have you run the compiler test suite? I get 8 failed test cases. I think that the code in Testing: Have a look at how test case modules are "cloned" in Note about the time frame: This is too late for OTP 19.0. We could possibly include it in in OTP 19.1 provided that the new pass is turned off by default. |
That's how I started but I ran into issues both when having to pass data from pre to post (as I had to pass it implicitly via the state) and also when handling guards and patterns, as we don't traverse the first (although maybe we should) and the second is traversed differently. I will give this another go now that a big chunk of the solution is implemented. I will look into the remaining of your feedback though and into the tests hopefully today. About the time frame, I have no rush to include this. I am fine with waiting until Erlang 20. Thank you! |
@bjorng I took a look at the failing tests and the failures seem to be caused by the old inliner. I assumed that after sys_core_fold we would no longer have shadowing but this seems to be violated by the old inliner. That said, is my expectation that there should be no shadowing after sys_core_fold true? If it is true, then we need to fix upstream, otherwise we need to consider it in the new pass. Meanwhile I have pushed a new implementation that uses cerl_trees, the trick to solve the problem I mentioned above was to set the patterns to an empty list between pre and post traversal. The code currently commented in the PR is the one that helped me find violations (i.e. variables in patterns being shadowed). |
c913436
to
e0d4eac
Compare
There can always be shadowing in Core Erlang. An optimization pass is not allowed to assume that a previous optional optimization pass has been run; note that almost all optimization passes can be disabled with an option. So you will have to deal with shadowing in your pass. |
http://www.nooooooooooooooo.com/ I will have to think a bit more on how to handle shadowing then, thank you! |
First, I would like to say that I like this new compiler pass! A couple of things. In addition to what @bjorng already has said I also would like you to use As for 19.1 or 20.0. An argument for 19.1 could be testing. Putting it out there as an optional experimental pass before making it default in 20.0. Possibly. |
This compiler pass looks to me like it should encourage sloppy Erlang source code, since it will be a reason to not reference Erlang term reuse explicitly with variables, which reduces the visible equivalence (associations the reader sees, i.e., implementation logic) in the source code. Could we instead have this check create warnings so the source code can be fixed to be clearer? Either way, it seems best that this check applies to all Erlang term types, despite the fact that it should be an O(n^2) penalty (n terms, of length n) to pay in the compiler. If this must be a compiler pass, then all Erlang source code can assume reuse of Erlang terms will naturally happen, and reuse never needs to be explicit, but with a warning, it should help the developer reuse all Erlang terms and attempt to explain why the source code is doing so. |
@okeuday I don't think I agree with your premise. It feels like it boils down to the straw man: "compiler shouldn't do optimizations". For instance, it shouldn't do memoizations on pure calls or subexpr eliminations (which the Erlang compiler doesn't) since it doesn't really do what I tell it to do. Not exactly. Should it do constant folding? We don't do inlining by default since it will mess with tracing .. this is bad. I don't know. Perhaps I agree with the visualization bit of your argument anyways. |
Hi @okeuday, thanks for checking the pass!
This seems to be a very personal interpretation of what constitutes sloppy source code. For example, if I write code such as: {ok, Value} = some_fun(A, B),
print(Value),
{ok, Value}. this pass would ask me to rewrite it to: {ok, Value} = Return = some_fun(A, B),
print(Value),
Return. I don't consider the first version to be sloppy. More importantly, I find the "fixed" version considerably worse than the original. Previously I could look at the last line and see the shape of what is returned. Now I need to chase the Finally, I don't believe it is the role of the Erlang compiler to judge if some code is sloppy or not. If, in any circumstance, I find the version without reusing variables to be more readable, I would like to be able to write it as I prefer without the compiler yelling at me. The job of the compiler is to optimize and it is the developer job to write readable code (although tools like linters can help). Other than that, I agree we would like to make this compiler as effective as possible as long as it runs reasonably fast. It currently covers both cases that matter: tuples and cons. Maps do not apply because patterns are partial. Suggestions are welcome. TL;DR - the compiler should optimize no matter what. The definition of what is sloppy and what is not belongs elsewhere. |
Ok, the compiler shouldn't care about source code style. As you said, tuples and cons should cover all composite data types. Thanks for the explanation. |
e0d4eac
to
074caa8
Compare
I have pushed a new version that takes into account variable shadowing. We keep the list of all variables we have seen so far and, as soon as any of them is shadowed, we create a new substitution record. The logic is in the new I have also spent some time with the |
Here is an attempt to use the data functions. I have not tested it. It combines the last three clauses in
|
And here is an attempt to handle
|
Thank you, those are good improvements. We can also keep the Key as a cons, so we avoid the list_to_tuple calls. :D I will play with those after I write the tests! |
Patch has passed first testings and has been assigned to be reviewed I am a script, I am not human |
Ping? If you'll do the updates we discussed and rebase the branch on the latest master, we could start testing the branch in our daily builds. |
Now that #1078 has been merged, I will focus on this PR once more. I will let you know when the fixes above have been pushed. |
How about looking for repeated constants too? match_this(<<"my long binary">>, _, Bla) ->
{ok, Blip} = do_something(Bla),
quux(Blip, <<"my long binary">>). Or is |
Repeated literals is not a problem. |
Thanks for the review @bjorng. I am glad to hear the results have been positive! I have left some questions based on your review.
Spoiler: the idea is that we keep a stack of substitutions in the The downside of this approach is that one variable being shadowed disables the optimization for all other variables up to that point. |
lib/compiler/src/sys_core_alias.erl
Outdated
ordsets:is_disjoint(Vars1, Vars2). | ||
|
||
merge_variables(Vars1, Vars2) -> | ||
ordsets:union(Vars1, Vars2). |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
@bjorng should I use cerl_sets
for the functions above?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
That could be a good idea. Running eprof shows that ordsets:union/2
does seem to consume a significant part of the run time:
$ erlc '+{eprof,sys_core_alias}' -S core_parse.erl
.
.
.
lists:foldl/3 42182 4.15 7123 [ 0.17]
sys_core_alias:'-def/1-anonymous-0-'/2 39306 6.10 10476 [ 0.27]
cerl:is_data/1 38458 6.27 10762 [ 0.28]
cerl_trees:mapfold_list/4 34582 7.87 13515 [ 0.39]
cerl_trees:mapfold/4 39306 8.78 15066 [ 0.38]
ordsets:union/2 70616 10.19 17492 [ 0.25]
cerl:type/1 62350 10.62 18225 [ 0.29]
----------------------------------------------------- ------ ------- ------ [----------]
Total: 695127 100.00% 171679 [ 0.25]
Some time ago we decided to prefer spaces over tabs for indenting. Since |
|
877ad09
to
89e704c
Compare
lib/compiler/src/compile.erl
Outdated
{iff,'to_core',{done,"core"}}]} | ||
{iff,'to_core',{done,"core"}}]}, | ||
{pass,sys_core_alias}, | ||
{iff,dalias,{listing,"core_alias"}} |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
@bjorng please verify pass positioning one more time.
All feedback has been addressed and I force pushed to the branch. |
Thanks! I have read the code again and think I understand what it is doing. Looks good. I have pushed a branch with three fixup commits. Please have a look: https://github.com/bjorng/otp/tree/josevalim/compiler/sys-core-alias/PR-1080 The first commit fixes two nit picks. The second commit adds a test case to ensure that the error clause in The third commit tweaks compile.erl. I have moved I have not addressed the Dialyzer errors pointed out by Travis CI. Please fix them. |
No time to read the code properly right now, but great - this ticks off one item from my wish list from 1996 or thereabouts! |
89e704c
to
8254a0c
Compare
The latest dialyzer failures are my fault. I will soon push a correction. |
I have pushed a partial correction. The placement of sys_core_alias is corrected so that dialyzer will work at all. There are still complaints from dialyzer. https://github.com/bjorng/otp/tree/josevalim/compiler/sys-core-alias/PR-1080 |
I have now pushed a correction for the dialyzer warning to the same branch. |
The goal of this pass is to find values that are built from patterns and generate aliases for those values to remove pressure from the GC. For example, this code: example({ok, Val}) -> {ok, Val}. shall become: example({ok, Val} = Tuple) -> Tuple. Currently this pass aliases tuple and cons nodes made of literals, variables and other cons. The tuple/cons may appear anywhere in the pattern and it will be aliased if used later on. Notice a tuple/cons made only of literals is not aliased as it may be part of the literal pool.
8254a0c
to
d4a27e9
Compare
@bjorng thanks, I was working on a fix but you were faster than me! I have force pushed once more. |
@richcarl glad to hear! Do you have any other pending optimization in that list that would be interested in sharing? 😁 |
Added to our daily builds. I will merge tomorrow unless any new problems show up. |
@josevalim Well, if you want to try something along the same lines as the pattern/term aliasing, I've always wanted to have what is sometimes called "tail merging". In idiomatic Erlang, you often get a bunch of clauses where many of them do the same thing: "case E of P1 -> foo(X+1, ...); P2-> foo(X+1, ...); P3->bar(X-1,...); P4->bar(X-1,...), ... end, ...Exprs... ." This can be rewritten as "case E of P1 -> c1(X,Y,Z); P2 -> c1(X,Y,Z); P3-> c2(X,Y,Z); P4->c2(X,Y,Z); ... end." where "c1(X,Y,Z) -> foo(X+1,...), cc(X,Y,Z).", "c2(X,Y,Z) -> bar(X-1, ...).", and "cc(X,Y,Z) ->...Exprs... ." (You just have to make sure that you get all variable scopes right, and minimize shuffling.) - Note that it might be nicer to do this on the Beam code level (also to handle compiler-generated code from previous passes), so see the examples just as an illustration. It won't speed up anything (and might slow down things a tiny bit), but it will keep the code small even when you repeat the same thing in different clauses. |
Thanks @richcarl, I will also add it to my list. @bjorng I have just noticed another optimization we could perform on core. Imagine this code: -module(foo).
-compile(export_all).
bar() ->
(fun(X) -> X end)(3). it compiles to this kernel code:
The optimization is to compile calls on "make_funs" (or on variables that point to make_funs) directly to their local or remote equivalent. In this case, we would compile to:
We could also apply a similar optimization to inline_list_funcs. If the function given as argument is a "make_fun", the code emitted by inline_list_funcs could call directly the function as well as avoid the function guard checks. I believe such change would belong to sys_core_fold. What do you think? |
@josevalim The core-level inliner should remove such applications trivially. |
@josevalim The Erlang community has (at least previously) been averse to inlining by default. Elixir might be a different case, so you could consider enabling the core inliner by default when you compile for Elixir, with suitably tuned size/effort parameters. |
@richcarl the problem with the inliner it messes up stacktraces, so it changes the semantics, no? The optimization above is safe, so afaik there is no reason to always do it. |
I wouldn't call stack traces part of the language semantics, but yes, the behaviour of some programs that inspect stack traces could change. (And I would argue that they deserve what they get.) More importantly, before we had line number information, inlining could make it look like an exception was caused in a different location, which is a problem for general debuggability. |
I have forked @weinholt's Erlang supercompiler project: https://github.com/fenollp/erlscp |
@fenollp Cool! I had not seen that one. Now I must read @weinholt's master's thesis from 2013. Supercompilation is basically partial evaluation on acid. It can turn your code inside out, but is hard to control, and the code can easily explode. I did work on partial evaluation for Erlang in the late 90:s, but the main practical results were the syntax tools, core Erlang, and the inliner. |
Introduce a new core pass called sys_core_alias
The goal of this pass is to find values that are built from patterns and generate aliases for those values to remove pressure from the GC. For example, this code:
shall become:
Currently this pass aliases tuple and cons nodes made of literals, variables and other cons. The tuple/cons may appear anywhere in the pattern and it will be aliased if used later on.
Notice a tuple/cons made only of literals is not aliased as it may be part of the literal pool.
This is a WIP and tests are still pending.