-
Notifications
You must be signed in to change notification settings - Fork 14
Description
The latest release of Mindcode comes with a new Loop Unrolling optimization!
Loop unrolling is an optimization aimed at speeding up program execution, at the price of increased code size. The description of the Mindcode's implementation of this optimization is available here, while here the handling of the tradeoff between the execution speedup and code size increase is explained.
Implementing loop unrolling took quite some time, as the infrastructure supporting this kind of optimization had to be developed first (namely, multiple optimization passes and data flow analysis). The results are worth the work, I'd say :)
The opportunity for loop unrolling might not always arise, as only loops with fixed number of iterations can be unrolled by Mindcode at this moment. (Please note that list iteration loops are, somewhat unexpectedly, not unrolled by current version of Mindcode, despite their number of iterations being always fixed and known to the compiler. Support for list iteration loops unrolling will be added shortly.) Furthermore, using complex expressions in loop condition and loop control variable updates might make it impossible to the optimizer to recognize loops suitable for unrolling. In the right circumstances, though, the speedups can be significant. One of the good opportunities is clearing up memory banks and memory cells. With loop unrolling, the following code:
for i in 0 ... 10
cell1[i] = 0
end
compiles to this:
write 0 cell1 0
write 0 cell1 1
write 0 cell1 2
write 0 cell1 3
write 0 cell1 4
write 0 cell1 5
write 0 cell1 6
write 0 cell1 7
write 0 cell1 8
write 0 cell1 9
reducing the number of instructions executed to a third, as the instructions to increase the i variable, as well as the conditional jump terminating the loop itself, are eliminated.
Some examples of speedups I've noticed in my Mindcode codebase:
- One of my programs contains a loop that clears the entire memory bank, all 512 entries. The time it takes to clear the memory bank using Logic Processor was clearly noticeable. As the program is not very large overall, there's enough space to unroll the entire loop, reducing the initialization time by two thirds.
- I've adapted a beautiful mlogjs Breakout implementation to Mindcode, for compiler benchmarking. There are two nested loops handling bricks drawing/collision testing, and both of these can be unrolled. The speedup is hard to measure exactly, but is discernible just by watching both version of the code running side-by-side.
- Included in the Mindcode test suite are a few small programs which are compiled and executed on an emulated processor. The run statistics, including the number of instruction executed while running the task to completion, are logged into a versioned text file, so that the gradual improvement (or deterioration) of the compiled code can be reviewed. While the test programs certainly aren't representative of typical in-game Mindustry Logic programs, they do demonstrate the evolution of the compiler. The files showcasing loop unrolling benefits are here and here (look at the history of these files).
The Loop Unrolling is quite new and while I sought to test it well, issues with the new code are certainly possible. Any feedback about this feature, especially bug reports, are very welcome.