Code generation optimizations #87
Comments
Also note that the Z-code compiler uses a Also from that forum thread:
|
General note: My usual rule is to not change code generation without a good reason. There's some value in being able to rebuild an old game file from old source exactly. There's a lot of value in being able to rebuild an old game file at all. (We really don't want changes that make game files larger -- that might bump a game over a Z-code size limit.) Optimizations like this count as a good reason. (They make game files shorter!) However, we might want an optimization setting, whose values are "better code" or "compile like it's 2010", defaulting to "better code". On the flip side, this switch might make the compiler source pretty murky! We'll have to see what the implementation looks like. |
A branch gets emitted if there's nothing inside:
|
Another example I just noticed:
This generates |
|
I7 makes a whole lot of dead code. Here's another example:
|
What I7 code generated that? |
From Counterfeit Monkey:
I suspect it happens any time the rulebook has a default outcome of success and the rule manually succeeds as well. |
Probably, yeah. Thanks. |
This code:
gets compiled to this:
When an if-clause only holds a jump, it would be smart to perform that jump immediately in the compare+jump instruction. |
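A minimal sketch of that idea, assuming a toy instruction list rather than the compiler's real data structures (the `Instr` type, `OpKind` values and `fold_branch_over_jump()` are all hypothetical names). It looks for "branch past a jump" and replaces the pair with a single inverted branch straight to the jump's target. A real implementation would presumably also have to check that the target is within branch range, which this sketch ignores.

```c
#include <assert.h>
#include <stddef.h>

/* Toy instruction model; illustrative only, not the compiler's own types. */
typedef enum { OP_BRANCH, OP_JUMP, OP_LABEL, OP_OTHER } OpKind;

typedef struct {
    OpKind kind;
    int label;      /* target for OP_BRANCH/OP_JUMP, own number for OP_LABEL */
    int branch_if;  /* OP_BRANCH only: 1 = branch when true, 0 = when false */
} Instr;

/* Rewrites, in place, the pattern
 *     branch(sense) -> L1 ; jump L2 ; L1:
 * into
 *     branch(!sense) -> L2 ; L1:
 * and returns the new instruction count. */
size_t fold_branch_over_jump(Instr *code, size_t n)
{
    size_t out = 0;
    assert(code != NULL || n == 0);
    for (size_t i = 0; i < n; i++) {
        if (i + 2 < n
            && code[i].kind == OP_BRANCH
            && code[i + 1].kind == OP_JUMP
            && code[i + 2].kind == OP_LABEL
            && code[i].label == code[i + 2].label) {
            Instr folded = code[i];
            folded.branch_if = !code[i].branch_if; /* invert the test */
            folded.label = code[i + 1].label;      /* go where the jump went */
            code[out++] = folded;
            i++;  /* skip the jump; the label is copied on the next pass */
            continue;
        }
        code[out++] = code[i];
    }
    return out;
}
```

The label `L1` is kept because other instructions may still refer to it; whether it has become unused is a separate question.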
I took a stab at the Say you have lines like
You'd expect the entire second line to be optimized out. But the We're going to need a way to say "this statement is unreachable to begin with, so all labels within it are phantoms." But what if there's an explicit label inside the For a concrete example: there are two places in I6 lib 6/11 where this kind of optimization kicks in.
TASKS_PROVIDED and SACK_OBJECT may be known at compile time, so we want to skip the entire following block. But with the change I've got, only a few opcodes get skipped before we hit a label. |
Okay, I have a working implementation. This turned out to be a difficult problem! Discussion: https://intfiction.org/t/i6-compiler-code-generation-improvements/54739 I'd like more testing from more people before I open a PR. This change does the following:
It does not try to:
|
You can see a diff of the The majority of the changes are dropping unused labels; this makes no difference to the generated code. In a few places (see lines 128, 223, 572, 623, 634, 812, 833, 874) we drop an unused label after a jump/return and a following instruction, because that instruction can never be reached. On lines 520 and 532, we drop a The two big chunks of code which get eliminated are associated with TASKS_PROVIDED and SACK_OBJECT, as noted above. |
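The effect described here (unused labels vanish, and an instruction after a jump/return stays dead if the only label in front of it is unused) can be illustrated with a small sketch. This is not the code from the PR: the `Ins` type, `strip_unreachable()` and the `label_used` array are assumptions made for the example, and the sketch takes `label_used` as given rather than working out which labels are phantoms.

```c
#include <assert.h>
#include <stddef.h>

/* Toy instruction model; illustrative only. */
typedef enum { I_PLAIN, I_JUMP, I_RETURN, I_LABEL } Kind;

typedef struct {
    Kind kind;
    int label;  /* label number for I_LABEL and I_JUMP, otherwise ignored */
} Ins;

/* Removes instructions that can never execute, in place.
 * label_used[k] is nonzero if some live jump or branch targets label k.
 * An unused label is dropped and does not revive the code after it.
 * Returns the new instruction count. */
size_t strip_unreachable(Ins *code, size_t n, const int *label_used)
{
    size_t out = 0;
    int reachable = 1;
    assert(label_used != NULL);
    for (size_t i = 0; i < n; i++) {
        if (code[i].kind == I_LABEL) {
            if (!label_used[code[i].label])
                continue;   /* unused label: drop it, reachability unchanged */
            reachable = 1;  /* a used label makes the following code live */
        }
        if (!reachable)
            continue;       /* dead instruction after a jump/return */
        code[out++] = code[i];
        if (code[i].kind == I_JUMP || code[i].kind == I_RETURN)
            reachable = 0;  /* nothing falls through past this point */
    }
    return out;
}
```

The hard part discussed in the thread, deciding that a label is only referenced from code that is itself dead, sits entirely inside how `label_used` gets computed.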
PR: #164 . This is largely the same as what I had last month, but I've improved the "statement never reached" warning logic. |
@erkyrath Can this issue be closed, or is there still more to be done? |
This issue turned into a grab-bag. Ideas still open:
Up to you whether those should be split out into separate issues. The first one is easy. The other two are hard and may never get done. |
I don't think there's much to be gained in splitting the issues out, happy to leave this open while there are still further optimizations possible. |
Adding this ... Looking at output generated with
and
Couldn't this be simplified to (at least for Z-code):
respectively, and save 4 bytes per occurrence? |
I think this would boil down to adding a stack-pointer case to all the places in expressc.c that look like
(Pretty sure none of these cases go on to access temp_var1 after that. It's always used as a strict temporary.) |
Z-code could optimize Expressions like (They wouldn't be common in hand-written code, but I7-generated code does all sorts of awkward stuff. Or you might write |
I was trying to protect very old interpreters that lacked the We can just generate I do not plan to change the strict-mode RT__ChPrintC() routine, which does extra error checking. |
Probably not trivial, but... Code like below (in the
The
The compiler knows at compile time if |
That boils down to "constant-fold the |
Revisiting this issue, I have created a branch (https://github.com/heasm66/Inform6/tree/code_generating_optimizations) that changes the code generated for Change patterns like:
-->
and
-->
These were pretty common patterns and quite easy to identify. It saves ~250 bytes on Advent and ~400 bytes on Curses. Example code in
|
Thanks for looking at this. As usual, any change that affects code generation (in a non-opt-in way) is a very large testing burden. (We have to test game file behavior rather than just checking that the game file is identical.) So I'd rather accumulate a bunch of these optimizations and put them in a future I6 release so we can test them all at once. Before we get there, I'd also like to add some game-file-behavior tests in the inform6-testing suite! It currently only has "game file is identical" tests. So that needs some attention. It may be a few weeks before I have a chance to look at it. |
I understand what you are saying... How about introducing an option ($OPTIMIZE_CODE or something) that would allow optimizations to be introduced on a "use at your own risk" basis, so that improvements can be introduced in smaller chunks and over a longer period? Doing it this way would lessen the burden of testing that no old game code gets broken. Newly constructed games that are in development will probably be more extensively tested, and developers would be aware that it is more of an "experimental" option. We still need some testing, of course. I ran through Curses with a walkthrough (without issues) but I don't know how to automate this process. |
We could do that, but it doesn't decrease the testing burden because then we have to test with the option both on and off. We could also maintain an "optimization" branch in the repo for people who want to live on the bleeding edge. |
In the same branch as above I've added suggestion-code for these simplifications:
I've also tested a playthrough of Curses without issues. Current code optimization in branch:
Maintaining an "optimization" branch that the "in" people can run through extensive testing could work. |
Adding this, may not be trivial to change.
Example 1:
Example 2:
It seems that either the The generating code is in
Either The special switch-statements that switch on verbs don't use the
|
I've made progress on your suggestions. The inform6-testing repo now has support for running regtest scripts on the compiled game files. I'm using a complete Advent.inf run as the big test, but I also check Library of Horror, and all the unit tests that print "All passed", and a few other cases. See directory: https://github.com/erkyrath/Inform6-Testing/tree/master/reg I have a branch with your optimizations (edited for code style): https://github.com/erkyrath/Inform6/tree/optimization This passes all the regtest scripts. I don't want to push it in just yet though. There are more cases which could benefit from similar code. For example, I might also want to port some of these fixes to the Glulx side. Feel free to give the |
Sorry for the delay... (Feels like I mostly struggled with cherry-picking your commits into my own fork, guess I'm not fluent yet in git/github parlance.) I have tested the These patterns are fairly common (I guess these are the ones you mean):
There are also a couple of places where a result is stored on the stack, then immediately pulled and tested. These are maybe harder to identify, as they are probably spread out over multiple calls to
|
Yeah, those. This change handles those cases neatly. I couldn't find any other cases subject to this sort of pinhole optimization. |
Whoops, found a bunch more! And yes, such lines turn up in I7-generated code all the time. Even the I6 parser has the line
...and |
Took the latest changes out for a test run. It saved an additional Nice! |
A routine like this:
Compiles to:
The instruction label I have a first code suggestion that does this in the branch |
Nifty. I'll take a look. |
Imported and tidied up: https://github.com/erkyrath/Inform6/compare/8916ebb..ae74890 I note that this doesn't eliminate the destination |
Well if a more thorough DCE ever gets implemented (as requested above #87 (comment)) it would presumably handle that too. |
Yeah, I thought about the possibility that the destination label most likely gets orphaned but couldn't find any easy solution. As I understand it, each label currently has a flag saying whether it's used or not (true/false). Maybe this could be a counter instead that keeps track of how many times it's used. If you remove a reference to a label you decrease the counter, and when it reaches 0 the label is unused. I'm gonna make a stab at a jump immediately following a branch (above) because the fix probably also will be in the |
A suggestion for optimization when a jump immediately follows a branch is in this commit: heasm66@893c531 Some current benchmarks (including this change):
|
There is solid dead-code elimination now. But it happens during parse_routine(), whereas this change is in assemble_routine_end(), which is a later phase! Maybe this whole question needs to be addressed earlier, at parse_routine() time. But that's tricky because when you're originally generating the (forward) jump, you don't know the destination opcode yet. |
I've sifted through the thread and see these remaining issues:
Any others? |
We already catch |
(As suggested in this thread: https://intfiction.org/t/next-steps-for-inform-6-compiler/51008)
For lines like
...the compiler generates an unconditional jump, but then generates the return opcode anyway.
Ideally, the author would use `#ifdef` instead of `if` for this sort of code. However, I7 generates trivial conditions sometimes. E.g.
...generates the stanza:
The compiler successfully collapses `(~~(((true) && (true))))` to `(false)` but still generates the `v = GPR_FAIL;` assignment.
This can almost be addressed with the `execution_never_reaches_here` flag, except that that flag is currently used to generate the "statement can never be reached" warning, which is not exactly the same thing. (See `assembleg_1_branch()`, which turns off that flag to avoid generating a warning on "if (false)..." because the author probably did it on purpose.) (I7 didn't do it on purpose, but I7 suppresses all warnings anyhow.)
I suspect the fix here is for `execution_never_reaches_here` to have three states: false, true, and "true but the author did it on purpose". If non-false, we can skip generating code entirely in assembleg_instruction() / assemblez_instruction().
(Is it safe to just bail out of those functions? What about inline strings in Z-code? Sequence points? Symbols used only in dead code? So many corner cases to check!)
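A rough sketch of what a three-state flag might look like, under the assumption that the states map onto a plain enum. The `Reachability` type, its value names, the `Stats` counters and `assemble_instruction_sketch()` are invented for illustration and are not the compiler's code; counting a warning per instruction is also a simplification.

```c
#include <assert.h>
#include <stddef.h>

/* Three states for the flag, as suggested above. */
typedef enum {
    REACH_LIVE = 0,          /* "false": code here can execute */
    REACH_DEAD = 1,          /* "true": unreachable, worth a warning */
    REACH_DEAD_INTENDED = 2  /* unreachable on purpose, e.g. if (false) */
} Reachability;

/* Code is generated only for live statements. */
int should_emit(Reachability r) { return r == REACH_LIVE; }

/* The warning is issued only for accidental dead code. */
int should_warn(Reachability r) { return r == REACH_DEAD; }

/* Toy stand-in for an instruction assembler: counts what would happen. */
typedef struct {
    int emitted;
    int warnings;
} Stats;

void assemble_instruction_sketch(Reachability r, Stats *s)
{
    assert(s != NULL);
    if (!should_emit(r)) {
        if (should_warn(r))
            s->warnings++;
        return;  /* any non-false state skips code generation entirely */
    }
    s->emitted++;
}
```

Keeping "non-false" as the test for skipping code means both dead states behave identically for code generation and differ only in whether a warning is raised.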
Note that the compiler assumes that every jump label is live code, even if no jump statement goes there. (See `assemble_label_no()`.) We should probably keep this. It allows the author to avoid dead-branch optimization on a case-by-case basis:
(This might be needed for occult code generation techniques like inter-function jumps. Which we don't support, but maybe someone is trying to do it anyway.)