GCC ARM optimization flag should be -Os, not -O2 for GCC versions later than 4.5.3 #664
Comments
Hi, thanks for looking at gcc toolchain in Tools. You can send a pull request for these changes. If we switch O0 to Og, we should state it somewhere that it requires GCC 4.8. |
I just thought I would give my thoughts on this issue. Take it for what it is worth, which is probably not much :)
TL;DR I think the defaults for 'Release' builds should be -O2 and 'Debug' builds should be -O0.
While the following code snippet is a bit contrived, it does demonstrate issues I have really encountered with the various optimization settings in GCC over the last couple of years.
#include "LPC17xx.h"
volatile int g_LoopDummy;
int main(int argc, char** argv)
{
    LPC_GPIO1->FIODIR |= 1 << 18; // P1.18 connected to LED1
    while(1)
    {
        LPC_GPIO1->FIOPIN ^= 1 << 18; // Toggle P1.18
        for (int i = 0 ; i < 5000000 && !g_LoopDummy ; i++)
        {
        }
    }
    return 0;
}
This is the disassembly of main() when the optimization level was set to -O2, my preferred level.
This is the disassembly of main() when -Os is instead used.
These two examples demonstrate the type of issue I have often encountered with -Os code generation. The addresses of global volatile variables are constant, but -Os treats them as loop variant. In the above example, the address of g_LoopDummy is 0x10000354 in both listings. With -O2, this address is loaded into r1 at address 0x122, which is outside of any of the loops. With -Os, it is loaded into r1 at address 0x136, which is inside the loop, so the load happens on every iteration. In my experience this causes a noticeable slowdown in some real-world driver code which performs such bit twiddling on device registers. I don't know why -Os does this. It just makes the code slower and doesn't result in smaller code.
On the plus side, -Os does typically generate smaller code, and I have used it in situations where I really need to get the smallest possible code. But due to issues such as the above example, I don't use it until I really need to.
This is the disassembly of main() when it is built with -O0. It is quite a bit longer than any of the others.
This is the same code compiled with -Og
The code is indeed smaller than when compiled with -O0. However, I don't know if the debugging experience will be quite what people expect when they create a 'Debug' build. The following shows a sample GDB session with this -Og compiled version.
Here I tried to set a breakpoint on line 25 which should be the line of code which toggles the P1.18 pin. If you look at the address this resolved to, 0x12c, in the disassembly you will see that this address is outside of the loop so it will only be hit once and then never again. This is not what a user would expect when debugging a "Debug" build.
Never hits it again and I end up manually breaking in.
This is a very contrived case for this simple code snippet but let me assure you it happens with real code as well. The argc parameter isn't used by this code (and isn't really set by the code which calls main() either) so its value wasn't maintained. If I try to dump the argc parameter, I just get this warning. There are scenarios where variables like this are optimized out by the time you get to some code which crashes, but if you had access to them, they would give you more information about what scenario led to the issue. Typically I want to have access to as many variables as possible in my 'Debug' builds. |
Hi, I think it would be better to add the optimization option for example,
dinau |
As an "easy" task while browsing through the project python tools I took on @dinau's suggestion from above. Sample implementation for GCC toolchain is here: shirishb/mbed@c3ea4e4 It does not resolve the core issue here, but if the approach is acceptable I can extend it to cover other toolchains and submit a pull request. |
Hi, I totally agree with Adam's suggestion, and the "-Og" option is still quite error-prone on the ARM back end currently, so adding a switch would be more flexible. @adamgreen For your sample code, I think you could report it on https://launchpad.net/gcc-arm-embedded if you haven't already done so. There's a quite active group on launchpad working on the embedded gcc toolchain. |
I completely overlooked dinau's suggestion. I proposed it last year and the feedback was negative, so I just added the option - debug_info which sets them to 0, as we use it now. |
-Os certainly produces slower code than -O2. That's the trade-off. (As to Adam's specific example - the optimisations being turned off by -Os are "high level" early ones that have a tendency to lead to ultimately bigger code. In some specific cases the optimisation may not have actually led to bigger final code but there's no multiple pass system to go back and try again if you find it wasn't a space benefit in the end. In this case it presumably doesn't hoist the constant out because that optimisation can increase size - the extra register required to hold the address increases register pressure which may lead to more loads/stores. In this case it doesn't, because the loop contents are so simple. And it knows that a literal load really isn't that expensive - the trade-off is different to hoisting a real subexpression.) But I know that for everything we work on in the 6LoWPAN area, space not speed is the issue. We've got more processor power than we know what to do with, compared to the speed of 6LoWPAN networks. So I would always choose lower size in the size/speed tradeoff. I think -Os would be a more sensible default than -O2, but given that it is a trade-off, there should indeed be an easy way for users to flip it to -O2 (or even higher). On -O0 versus -Og, I agree with Adam. I did experiment with -Og while looking at the settings for a different build system, and concluded that -Og wasn't debuggable enough. We settled on -O0 for the debug builds. |
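A default with an easy override, as suggested above, could be sketched roughly like this. It is only an illustration under stated assumptions, not the actual mbed build code: `DEFAULT_OPT`, `ALLOWED_OPT` and `optimization_flag` are hypothetical names, and the defaults simply mirror this comment's preference (-Os for release builds, -O0 for debug builds).

```python
# Hypothetical sketch: these names are illustrative and are not part of
# the mbed tools. The defaults follow the preference stated in the comment.
DEFAULT_OPT = {"release": "-Os", "debug": "-O0"}

# GCC optimization levels a user is allowed to select in this sketch.
ALLOWED_OPT = {"-O0", "-O1", "-O2", "-O3", "-Os", "-Og"}


def optimization_flag(profile, override=None):
    """Return the GCC optimization flag for a build profile.

    `override` lets a user flip the default, e.g. to -O2 when speed
    matters more than code size.
    """
    if override is not None:
        if override not in ALLOWED_OPT:
            raise ValueError("unsupported optimization flag: %s" % override)
        return override
    return DEFAULT_OPT[profile]


if __name__ == "__main__":
    print(optimization_flag("release"))         # default for release builds
    print(optimization_flag("release", "-O2"))  # user flips to speed
    print(optimization_flag("debug"))           # default for debug builds
```

Keeping the defaults in one table means the size/speed decision is made in a single place, and users who disagree with it only need to pass an override.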
ARM Internal Ref: IOTMORF-312 |
This should be resolved. The default profile for GCC specifies |
The bug referenced in workspace_tools/toolchains/gcc.py
http://gcc.gnu.org/bugzilla/show_bug.cgi?id=46762
was fixed a long time ago (v4.5.3) in ARM GCC.
Because of this, we should consider changing the optimization from -O2 to -Os.
Also, latest versions of ARM GCC (4.8) have added the -Og option, which is described as "Optimize for debugging experience rather than speed or size"; this results in considerably smaller code than the current -O0 optimization when DEBUG is set.
Perhaps some way could be added to allow those of us who are debugging and using later ARM gcc versions to use -Og instead of -O0?
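One way such a version-gated choice might look is sketched below. It is not taken from gcc.py: `parse_gcc_version` and `debug_optimization_flag` are hypothetical helpers, and the version string is assumed to come from something like the output of `arm-none-eabi-gcc -dumpversion`. Whether -Og is debuggable enough for a 'Debug' build is a separate question raised in the comments.

```python
# Hypothetical sketch: helper names are illustrative, not part of gcc.py.
# The version string is assumed to look like "4.8.3" (for example, the
# output of `arm-none-eabi-gcc -dumpversion`).


def parse_gcc_version(text):
    """Parse a dotted version string such as '4.8.3' into a tuple of ints."""
    return tuple(int(part) for part in text.strip().split("."))


def debug_optimization_flag(gcc_version):
    """Pick the debug-build flag: -Og needs GCC 4.8 or later, else -O0."""
    if parse_gcc_version(gcc_version) >= (4, 8):
        return "-Og"
    return "-O0"


if __name__ == "__main__":
    for version in ("4.7.4", "4.8.3", "4.9.0"):
        print(version, debug_optimization_flag(version))
```

Comparing tuples of integers avoids the pitfalls of comparing version strings directly, where "4.10" would sort before "4.8".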