Disable restrict keyword in GCC/Clang #586
Conversation
You can water down the comment a bit, I think. It's rather unlikely to affect secp, but it's also unlikely to speed anything up, so just not worth it. utACK either way.
utACK
8216b11 to f4f0e08
A very rudimentary benchmark shows it indeed doesn't affect verification speed. I'll do a more thorough benchmark later.
utACK
I've done some more benchmarking, and can't observe a slowdown in verification (if there is one, it's less than 0.2%).
Oops, no. That was with assembly code enabled, which obviously takes the non-aliasing into account but isn't affected by the restrict keyword. When I disable asm, verification goes from 277000 cycles to 282000 cycles on a Threadripper 2950X system, a 1.8% slowdown.
1.8% sounds reasonable enough to me for the added safety.
Meh: 1.8% is a rather large slowdown to take for a "maybe the compiler might screw up in ways that the tests don't find". Assuming the 1.8% number applies on ARM, that change is essentially 1-2 hours of additional sync time on a non-assumevalid initial sync on a small bitcoin node.

Here is my reasoning: There are a significant number of optimizations in the codebase with 1%-3% level improvements which could be excluded on the basis of "maybe they get miscompiled" or "maybe they have a bug". If the tests (including runtime self-tests) aren't adequate to catch all plausible miscompilation, we should improve them. This has the added benefit of also protecting the code against unforeseen risks, something that avoiding long-standing language features does not accomplish.

Or, if 1% vs the level of risk that using a less common compiler annotation implies is really the trade-off we want to take, we should probably remove all the distinct normalization cases (and replace them all with a single constant-time full normalization), the R comparison in projective space optimization, x86_64 assembly (which is now not much faster than what the compiler can produce), effective affine, etc. Each of these has 1-2% performance impact and arguably has a worse risk/performance trade-off than the use of restrict. Similarly, under that logic we should probably remove endomorphism and gmp support from the codebase (zero performance gain in the normal config, non-zero risk of introducing bugs in the normal configuration).

I don't actually think the above is a good idea: I think our principle before has been to include essentially any optimization -- even relatively complex ones -- which would make a measurable difference in a real system, subject to the constraint that we could test it well enough to be reasonably confident that it's safe.
But since there is essentially no competing library for this curve which I'm aware of which is anywhere near as fast, I think a case could be made for simplifying the codebase until it is merely significantly faster than the next less-well-tested alternative. To the extent that safety is our primary goal, we only need to be fast enough to ensure that users who mostly care about speed select this code over a less safe alternative. [also, the audience of embedded devices that frequently adopt obviously less safe code might be swayed by smaller binary sizes...]

But whatever we do, we should try to be intentional and consistent in our strategy so that we get the full benefit of it. I also think that a case could be made for doing this if it were shown that the slowdown didn't happen on ARM (since that's where a performance loss is especially critical), or as a temporary measure while tests were improved.
I don't think that's so clear. This library offers exceptional quality due to
Unfortunately, the first two bullets don't help in case of miscompilation. Tests do help of course, but tests rarely show the absence of bugs. Another concern is that compilers behave differently on different architectures, with different versions, different flags, etc., and it's not clear that people run the tests every time they compile the code. And sure, other code can be miscompiled too. But without a specific bug in mind, there is no reason why a specific implementation (say, the projective space stuff) should be more prone to miscompilation than others, except that it uses more lines of code. In our case, we know that these specific compiler bugs just won't happen if we disable restrict. The scope of the compiler bug is quite narrow, we know what to do to prevent it, and we know the slowdown. If gcc released a new flag
This is not a justification for this change; it is a justification for runtime self-tests. I agree that the concern that the tests aren't getting run where they need to be run is a good one, but the answer to it isn't to make a particular tradeoff one way for one thing, the answer is to fix the underlying problem. Again, I still hold that this particular decision doesn't carry unique risks compared to a dozen other choices, nor do I think a case has been made that it has a worse trade-off.
If you really want, I can go and justify the other examples in light of prior miscompilation bugs specifically, but I think it would be a waste of time. Besides, "except that it uses more lines of code" is a perfectly fine justification on its own. (not more lines precisely, but more total machine code with more total corner cases)
The "new" kind of confuses the issue-- anything new also carries a higher degree of unknown unknowns--, but we certainly compile with optimizations and there is a long and illustrious history of compiler optimizations and miscompilation. The benefit is (presumably!) more than 1.8% but the risk, historically, is that there have been a tremendous number of miscomplations which were optimization linked. GCC's bug tracker is filled with scads of reports of miscompilation both current and in recent history (some even reported by some of us in the past...). But no attention is being given to that, only to this one thing that someone noticed. This seems to me to be a reactionary response ungrounded in a systematic approach to managing the tradeoffs of software quality. |
As an aside, the linked GCC misbehaviour only occurs with -O3 (the same applies to the similar behaviour in LLVM) ... which fits nicely with my above comment about optimization. I'd find an argument to not use O3 in general to be more compelling...
/* As both clang and GCC have known miscompilation bugs related to restrict, disable it for now.
 * TODO: re-evaluate when these bugs are fixed, and enable restrict in known good compilers.
 * See:
 * - https://gcc.gnu.org/bugzilla/show_bug.cgi?id=87609
I think the referenced bugs give the impression that restrict is forbidden for those compilers because of those bugs, while the real reason is "it works one day, it fails the other day". GCC just fixed the bug in master, 8 hours ago. An upstream fix doesn't mean everyone is using the latest compiler: even GCC 8.3, released 2 hours ago, doesn't have the patch.
Running this PR on an aarch64 (ARMv8) based ODROID C2 I get a 2.43% slowdown (452us vs 463us) in
@jonasnick Can you try benchmarking on the same device with restrict, but with -O2 instead of -O3?
All the measurements were done after increasing the
^ If we want to be paranoid, I think Jonas' results would support using -O2 instead of -O3: not only does it address the restrict concern in both GCC and Clang, it also sidesteps a long history of additional bugs at O3.
Scratch that, the default is -O3 apparently. |
At one point earlier I almost responded with "this restrict issue only exists at O3, let's not worry about it" ... then thought "hm. wait a minute."... O3 is also why asm vs non-asm doesn't make that big a difference on x86_64, IIRC.
Fwiw, this is fixed in GCC 7, 8, 9 (>= 7.4.1, >= 8.3.1, >= 9.0) according to https://gcc.gnu.org/bugzilla/show_bug.cgi?id=87609. It's not yet fixed in LLVM but patches have been proposed already.
Now that we've switched to -O2, I had hoped that this issue had been resolved along the way. But apparently the miscompilation of the example program from the GCC bug only disappears with GCC, where it's fixed anyway in the most recent versions. Clang/LLVM does loop unrolling even with -O2, so you additionally need
500000 runs each. I'm not a fan of tweaking flags too much, but given that there are performance differences between gcc and clang anyway, and unrolling is disabled in gcc, I could indeed imagine adding
This fixes #585.