
Disable restrict keyword in GCC/Clang #586

Closed
wants to merge 1 commit into from

Conversation

@sipa (Contributor) commented Jan 18, 2019

This fixes #585.

@TheBlueMatt (Contributor) commented Jan 18, 2019

You can water down the comment a bit, I think. It's rather unlikely to affect secp, but it's also unlikely to speed anything up, so it's just not worth it. utACK either way.

@real-or-random (Contributor)

utACK

@sipa (Contributor, Author) commented Jan 18, 2019

A very rudimentary benchmark shows that it indeed doesn't affect verification speed.

I'll do a more thorough benchmark later.

@jgarzik commented Jan 18, 2019

utACK

@sipa (Contributor, Author) commented Feb 4, 2019

I've done some more benchmarking, and can't observe a slowdown in verification (if there is one, it's less than 0.2%).

@sipa (Contributor, Author) commented Feb 5, 2019

Oops, no. That was with assembly code enabled, which obviously already takes the non-aliasing assumptions into account but isn't affected by the restrict keyword.

When I disable asm, verification goes from 277000 cycles to 282000 cycles on a Threadripper 2950X system, a 1.8% slowdown.

@real-or-random (Contributor)

1.8% sounds reasonable enough to me for the added safety.

@gmaxwell (Contributor)

Meh: 1.8% is a rather large slowdown to take for a "maybe the compiler might screw up in ways that the tests don't find". Assuming the 1.8% number applies on ARM, that change is essentially 1-2 hours of additional sync time on a non-assumevalid initial sync on a small bitcoin node.

Here is my reasoning: There are a significant number of optimizations in the codebase with 1% - 3% level improvements which could be excluded on the basis of "maybe they get miscompiled" or "maybe they have a bug".

If the tests (including run-time self-tests) aren't adequate to catch all plausible miscompilation, we should improve them. This has an added benefit of also protecting the code against unforeseen risks, something that avoiding long-standing language features does not accomplish.

Or, if 1% vs the level of risk that using a less common compiler annotation implies is really the trade-off we want to take, we should probably remove all the distinct normalization cases (and replace them all with a single constant-time full normalization), the R comparison in projective space optimization, x86_64 assembly (which is now not much faster than what the compiler can produce), effective affine, etc. Each of these has a 1-2% performance impact and arguably a worse risk/performance trade-off than the use of restrict. Similarly, under that logic we should probably remove endomorphism and gmp support from the codebase (zero performance gain in the normal config, non-zero risk of introducing bugs in the normal configuration).

I don't actually think the above is a good idea: I think our principle before has been to include essentially any optimization-- even relatively complex ones-- which would make a measurable difference in a real system, subject to the constraint that we could test it well enough to be reasonably confident that it's safe. But since there is essentially no competing library for this curve which I'm aware of that is anywhere near as fast, I think a case could be made for simplifying the codebase until it is merely significantly faster than the next less-well-tested alternative. To the extent that safety is our primary goal, we only need to be fast enough to ensure that users who mostly care about speed select this code over a less safe alternative. [also, the audience of embedded devices that frequently adopt obviously less safe code might be swayed by smaller binary sizes...]

But whatever we do, we should try to be intentional and consistent in our strategy so that we get the full benefit of it.

I also think that a case could be made for doing this if it were shown that the slowdown didn't happen on ARM (since that's where a performance loss is especially critical), or as a temporary measure while tests were improved.

@real-or-random (Contributor)

> Each of which has 1-2% performance impacts and arguably has a worse risk/performance trade-off than the use of restrict.

I don't think that's so clear. This library offers exceptional quality due to

  • careful programming,
  • careful review, and
  • good tests.

Unfortunately, the first two bullets don't help in case of miscompilation. Tests do help, of course, but tests rarely show the absence of bugs. Another concern is that compilers behave differently on different architectures, with different versions, different flags, etc., and it's not clear whether people run the tests every time after they compile the code.

And sure, other code can be miscompiled, too. But without a specific bug in mind, there is no reason why a specific implementation (say, the projective space stuff) should be more prone to miscompilation than another, except that it uses more lines of code.

In our case, we know that these specific compiler bugs just won't happen if we disable restrict. The scope of the compiler bug is quite narrow and we know what to do to prevent it, and we know the slowdown.

If gcc released a new flag -finline-experimental-1.8 with the description "speeds up your code by 1.8 % but we've seen cases of miscompilation", would you enable it in secp if the tests pass? I wouldn't.

@gmaxwell (Contributor) commented Feb 20, 2019

> Another concern is that compilers behave differently on different architectures, with different versions, different flags, etc., and it's not clear if people run the tests every time after they compile the code.

This is not a justification for this change; it is a justification for runtime self-tests. I agree that the concern that the tests aren't getting run where they need to be run is a good one, but the answer isn't to make a particular tradeoff one way for one thing; the answer is to fix the underlying problem.

Again, I still hold that this particular decision doesn't carry unique risks compared to a dozen other choices, nor do I think a case has been made that it has a worse trade-off.

> And sure, other code can be miscompiled, too. But without a specific bug in mind, there is no reason why a specific implementation (say projective space stuff) should be more prone to miscompilation than other,

If you really want, I can go and justify the other examples in light of prior miscompilation bugs specifically, but I think it would be a waste of time. Besides, "except that it uses more lines of code" is a perfectly fine justification on its own. (Not more lines precisely, but more total machine code with more total corner cases.)

> If gcc released a new flag -finline-experimental-1.8 with the description "speeds up your code by 1.8 % but we've seen cases of miscompilation

The "new" kind of confuses the issue-- anything new also carries a higher degree of unknown unknowns--, but we certainly compile with optimizations and there is a long and illustrious history of compiler optimizations and miscompilation. The benefit is (presumably!) more than 1.8% but the risk, historically, is that there have been a tremendous number of miscomplations which were optimization linked.

GCC's bug tracker is filled with scads of reports of miscompilation both current and in recent history (some even reported by some of us in the past...). But no attention is being given to that, only to this one thing that someone noticed. This seems to me to be a reactionary response ungrounded in a systematic approach to managing the tradeoffs of software quality.

@gmaxwell (Contributor) commented Feb 20, 2019

As an aside, the linked GCC misbehaviour only occurs with -O3 (the same applies to the similar behaviour in LLVM) ... which fits nicely with my above comment about optimization. I'd find an argument to not use O3 in general to be more compelling...

/* As both clang and GCC have known miscompilation bugs related to restrict, disable it for now.
* TODO: re-evaluate when these bugs are fixed, and enable restrict in known good compilers.
* See:
* - https://gcc.gnu.org/bugzilla/show_bug.cgi?id=87609

I think the referenced bugs give the impression that restrict is forbidden for those compilers because of those bugs, while the real reason is "it works one day, it fails the other day". GCC fixed the bug on master just 8 hours ago. An upstream fix doesn't mean everyone is using the latest compiler: even GCC 8.3, released 2 hours ago, doesn't have the patch.

@jonasnick (Contributor)

Running this PR on an aarch64 (ARMv8) based ODROID C2 I get a 2.43% slowdown (452us vs 463us) in bench_verify. This is without using ASM, which is not enabled by default on ARM architectures.

@gmaxwell (Contributor) commented Mar 7, 2019

@jonasnick Can you try benchmarking on the same device with restrict but with -O2 instead of -O3?

@jonasnick (Contributor)

O2:  min 450us / avg 450us / max 451us
O3:  min 452us / avg 452us / max 461us

All measurements were done after increasing the ecdsa_verify benchmark "count" from 10 to 100.

@gmaxwell (Contributor) commented Mar 9, 2019

^ If we want to be paranoid, I think Jonas' results would support using -O2 instead of -O3: not only does it address the restrict concern in both GCC and Clang, it also sidesteps a long history of bugs at -O3.

@sipa (Contributor, Author) commented Mar 12, 2019

I don't think we ever (neither in this repo nor in the Bitcoin Core build) use -O3 by default?

Scratch that, the default is -O3 apparently.

@gmaxwell (Contributor)

At one point earlier I almost responded with "this restrict issue only exists at O3, let's not worry about it" ... then thought "hm. wait a minute."... O3 is also why asm vs non-asm doesn't make that big a difference on x86_64, IIRC.

@real-or-random (Contributor)

Fwiw, this is fixed in GCC 7, 8, and 9 (>= 7.4.1, >= 8.3.1, >= 9.0) according to https://gcc.gnu.org/bugzilla/show_bug.cgi?id=87609.

It's not yet fixed in LLVM but patches have been proposed already.

@real-or-random (Contributor)

> As an aside, the linked GCC misbehaviour only occurs with -O3 (the same applies to the similar behaviour in LLVM) ... which fits nicely with my above comment about optimization. I'd find an argument to not use O3 in general to be more compelling...

Now that we switched to O2, I hoped that this issue had been resolved on the way. But apparently, the miscompilation of the example program from the GCC bug only disappears on GCC, where it's fixed anyway in the most recent versions. Clang/LLVM does loop unrolling even with O2, so you additionally need -fno-unroll-loops (https://godbolt.org/z/urkkWD), which seems to have some performance impact at least if x86_64 asm is disabled (gmp does not make a big difference).

./configure --with-bignum=gmp --with-asm=x86_64 CC=clang CFLAGS=-fno-unroll-loops
ecdsa_verify: min 81.3us / avg 83.4us / max 85.6us
ecdsa_verify: min 81.9us / avg 82.9us / max 84.3us

./configure --with-bignum=gmp --with-asm=x86_64 CC=clang CFLAGS= 
ecdsa_verify: min 81.4us / avg 83.7us / max 86.5us
ecdsa_verify: min 81.4us / avg 83.1us / max 87.0us

./configure --with-bignum=no --with-asm=no CC=clang CFLAGS= 
ecdsa_verify: min 97.1us / avg 98.6us / max 100us

./configure --with-bignum=no --with-asm=no CC=clang CFLAGS=-fno-unroll-loops
ecdsa_verify: min 95.9us / avg 101us / max 104us

500000 runs each.

I'm not a fan of tweaking flags too much, but given that there are performance differences between gcc and clang anyway, and unrolling is disabled in gcc, I could indeed imagine adding -fno-unroll-loops if others can confirm my benchmarks.


Successfully merging this pull request may close these issues.

All Major Linux Compilers miscompile restrict
7 participants