Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Clang may not be vectorizing xor_8regs_* in arch/arm/lib/xor-neon.c #503

Closed
nickdesaulniers opened this issue May 31, 2019 · 6 comments
Closed
Labels
[ARCH] arm32 This bug impacts ARCH=arm [BUG] llvm A bug that should be fixed in upstream LLVM [FIXED][LINUX] 5.18 This bug was fixed in Linux 5.18 Reported upstream This bug was filed on LLVM’s issue tracker, Phabricator, or the kernel mailing list.

Comments

@nickdesaulniers
Copy link
Member

via this thread w/ @nathanchance and @arndb
https://groups.google.com/forum/#!topic/clang-built-linux/FOEWAfxhIAc

It seems like -Rpass-missed='.*' is suggesting that some code expected to be vectorized is not being vectorized. We should investigate (current warnings say the transform is possible but not worthwhile).

forked from #496

@nickdesaulniers nickdesaulniers added [BUG] Untriaged Something isn't working question Question asked by issue author. low priority This bug is not critical and not a priority [ARCH] arm32 This bug impacts ARCH=arm labels May 31, 2019
@nathanchance
Copy link
Member

nathanchance commented Jun 1, 2019

LLVM bug that @kbeyls commented on as well: https://llvm.org/pr40976

@nickdesaulniers nickdesaulniers added [BUG] llvm A bug that should be fixed in upstream LLVM and removed [BUG] Untriaged Something isn't working question Question asked by issue author. labels Jun 2, 2019
nathanchance pushed a commit that referenced this issue Apr 16, 2020
In the macsec_changelink(), "struct macsec_tx_sa tx_sc" is used to
store "macsec_secy.tx_sc".
But, the struct type of tx_sc is macsec_tx_sc, not macsec_tx_sa.
So, the macsec_tx_sc should be used instead.

Test commands:
    ip link add dummy0 type dummy
    ip link add macsec0 link dummy0 type macsec
    ip link set macsec0 type macsec encrypt off

Splat looks like:
[61119.963483][ T9335] ==================================================================
[61119.964709][ T9335] BUG: KASAN: slab-out-of-bounds in macsec_changelink.part.34+0xb6/0x200 [macsec]
[61119.965787][ T9335] Read of size 160 at addr ffff888020d69c68 by task ip/9335
[61119.966699][ T9335]
[61119.966979][ T9335] CPU: 0 PID: 9335 Comm: ip Not tainted 5.6.0+ #503
[61119.967791][ T9335] Hardware name: innotek GmbH VirtualBox/VirtualBox, BIOS VirtualBox 12/01/2006
[61119.968914][ T9335] Call Trace:
[61119.969324][ T9335]  dump_stack+0x96/0xdb
[61119.969809][ T9335]  ? macsec_changelink.part.34+0xb6/0x200 [macsec]
[61119.970554][ T9335]  print_address_description.constprop.5+0x1be/0x360
[61119.971294][ T9335]  ? macsec_changelink.part.34+0xb6/0x200 [macsec]
[61119.971973][ T9335]  ? macsec_changelink.part.34+0xb6/0x200 [macsec]
[61119.972703][ T9335]  __kasan_report+0x12a/0x170
[61119.973323][ T9335]  ? macsec_changelink.part.34+0xb6/0x200 [macsec]
[61119.973942][ T9335]  kasan_report+0xe/0x20
[61119.974397][ T9335]  check_memory_region+0x149/0x1a0
[61119.974866][ T9335]  memcpy+0x1f/0x50
[61119.975209][ T9335]  macsec_changelink.part.34+0xb6/0x200 [macsec]
[61119.975825][ T9335]  ? macsec_get_stats64+0x3e0/0x3e0 [macsec]
[61119.976451][ T9335]  ? kernel_text_address+0x111/0x120
[61119.976990][ T9335]  ? pskb_expand_head+0x25f/0xe10
[61119.977503][ T9335]  ? stack_trace_save+0x82/0xb0
[61119.977986][ T9335]  ? memset+0x1f/0x40
[61119.978397][ T9335]  ? __nla_validate_parse+0x98/0x1ab0
[61119.978936][ T9335]  ? macsec_alloc_tfm+0x90/0x90 [macsec]
[61119.979511][ T9335]  ? __kasan_slab_free+0x111/0x150
[61119.980021][ T9335]  ? kfree+0xce/0x2f0
[61119.980700][ T9335]  ? netlink_trim+0x196/0x1f0
[61119.981420][ T9335]  ? nla_memcpy+0x90/0x90
[61119.982036][ T9335]  ? register_lock_class+0x19e0/0x19e0
[61119.982776][ T9335]  ? memcpy+0x34/0x50
[61119.983327][ T9335]  __rtnl_newlink+0x922/0x1270
[ ... ]

Fixes: 3cf3227 ("net: macsec: hardware offloading infrastructure")
Signed-off-by: Taehee Yoo <ap420073@gmail.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
@nickdesaulniers nickdesaulniers added the Reported upstream This bug was filed on LLVM’s issue tracker, Phabricator, or the kernel mailing list. label Nov 7, 2020
@nickdesaulniers
Copy link
Member Author

Some notes from #496 and the upstream discussion. If we use #pragma clang loop vectorize(enable), we can overrule the cost model to provide a loop versioned on non-overlapping arguments that is vectorized. If we instead or in addition mark the parameters as __restrict qualified, then we get vectorized only loops, though we'd need to check that callers weren't passing overlapping parameters.

fengguang pushed a commit to 0day-ci/linux that referenced this issue Nov 13, 2020
Drop warning because kernel now requires GCC >= v4.9 after
commit 6ec4476 ("Raise gcc version requirement to 4.9")
and clarify that -ftree-vectorize now always needs enabling
for GCC by directly testing the presence of CONFIG_CC_IS_GCC.

Another reason to remove the warning is that Clang exposes
itself as GCC < 4.6 so it triggers the warning about GCC
which doesn't make much sense and risks misleading users.

As a side-note remark, -fttree-vectorize is on by default in
Clang, but it currently does not work (see linked issues).

Link: ClangBuiltLinux#496
Link: ClangBuiltLinux#503
Reported-by: Nick Desaulniers <ndesaulniers@google.com>
Reviewed-by: Nick Desaulniers <ndesaulniers@google.com>
Signed-off-by: Nathan Chancellor <natechancellor@gmail.com>
Signed-off-by: Adrian Ratiu <adrian.ratiu@collabora.com>
fengguang pushed a commit to 0day-ci/linux that referenced this issue Jan 18, 2021
Drop warning because kernel now requires GCC >= v4.9 after
commit 6ec4476 ("Raise gcc version requirement to 4.9")
and clarify that -ftree-vectorize now always needs enabling
for GCC by directly testing the presence of CONFIG_CC_IS_GCC.

Another reason to remove the warning is that Clang exposes
itself as GCC < 4.6 so it triggers the warning about GCC
which doesn't make much sense and risks misleading users.

As a side-note remark, -fttree-vectorize is on by default in
Clang, but it currently does not work (see linked issues).

Link: ClangBuiltLinux#496
Link: ClangBuiltLinux#503
Reported-by: Nick Desaulniers <ndesaulniers@google.com>
Reviewed-by: Nick Desaulniers <ndesaulniers@google.com>
Signed-off-by: Nathan Chancellor <natechancellor@gmail.com>
Signed-off-by: Adrian Ratiu <adrian.ratiu@collabora.com>
@nickdesaulniers
Copy link
Member Author

v4 was resent. I'm still not yet convinced there is a bug here.

fengguang pushed a commit to 0day-ci/linux that referenced this issue Jan 21, 2021
Drop warning because kernel now requires GCC >= v4.9 after
commit 6ec4476 ("Raise gcc version requirement to 4.9")
and clarify that -ftree-vectorize now always needs enabling
for GCC by directly testing the presence of CONFIG_CC_IS_GCC.

Another reason to remove the warning is that Clang exposes
itself as GCC < 4.6 so it triggers the warning about GCC
which doesn't make much sense and misleads Clang users by
telling them to update GCC.

Because Clang is now supported by the kernel print a clear
Clang-specific warning.

Link: ClangBuiltLinux#496
Link: ClangBuiltLinux#503
Reported-by: Nick Desaulniers <ndesaulniers@google.com>
Reviewed-by: Nick Desaulniers <ndesaulniers@google.com>
Signed-off-by: Nathan Chancellor <natechancellor@gmail.com>
Signed-off-by: Adrian Ratiu <adrian.ratiu@collabora.com>
@nathanchance
Copy link
Member

@nathanchance nathanchance added [PATCH] Submitted A patch has been submitted for review and removed low priority This bug is not critical and not a priority labels Jan 27, 2022
fengguang pushed a commit to 0day-ci/linux that referenced this issue Feb 5, 2022
The ARM version of the accelerated XOR routines are simply the 8-way C
routines passed through the auto-vectorizer with SIMD codegen enabled.
This used to require GCC version 4.6 at least, but given that 5.1 is now
the baseline, this check is no longer necessary, and actually
misidentifies Clang as GCC < 4.6 as Clang defines the GCC major/minor as
well, but makes no attempt at doing this in a way that conveys feature
parity with a certain version of GCC (which would not be a great idea in
the first place).

So let's drop the version check, and make the auto-vectorize pragma
(which is based on a GCC-specific command line option) GCC-only. Since
Clang performs SIMD auto-vectorization by default at -O2, no pragma is
necessary here.

Tested-by: Nathan Chancellor <nathan@kernel.org>
Signed-off-by: Ard Biesheuvel <ardb@kernel.org>
Reviewed-by: Nick Desaulniers <ndesaulniers@google.com>
Link: ClangBuiltLinux#496
Link: ClangBuiltLinux#503
fengguang pushed a commit to 0day-ci/linux that referenced this issue Feb 5, 2022
The ARM version of the accelerated XOR routines are simply the 8-way C
routines passed through the auto-vectorizer with SIMD codegen enabled.
This used to require GCC version 4.6 at least, but given that 5.1 is now
the baseline, this check is no longer necessary, and actually
misidentifies Clang as GCC < 4.6 as Clang defines the GCC major/minor as
well, but makes no attempt at doing this in a way that conveys feature
parity with a certain version of GCC (which would not be a great idea in
the first place).

So let's drop the version check, and make the auto-vectorize pragma
(which is based on a GCC-specific command line option) GCC-only. Since
Clang performs SIMD auto-vectorization by default at -O2, no pragma is
necessary here.

Tested-by: Nathan Chancellor <nathan@kernel.org>
Signed-off-by: Ard Biesheuvel <ardb@kernel.org>
Reviewed-by: Nick Desaulniers <ndesaulniers@google.com>
Link: ClangBuiltLinux#496
Link: ClangBuiltLinux#503
ammarfaizi2 pushed a commit to ammarfaizi2/linux-block that referenced this issue Feb 11, 2022
The ARM version of the accelerated XOR routines are simply the 8-way C
routines passed through the auto-vectorizer with SIMD codegen enabled.
This used to require GCC version 4.6 at least, but given that 5.1 is now
the baseline, this check is no longer necessary, and actually
misidentifies Clang as GCC < 4.6 as Clang defines the GCC major/minor as
well, but makes no attempt at doing this in a way that conveys feature
parity with a certain version of GCC (which would not be a great idea in
the first place).

So let's drop the version check, and make the auto-vectorize pragma
(which is based on a GCC-specific command line option) GCC-only. Since
Clang performs SIMD auto-vectorization by default at -O2, no pragma is
necessary here.

Tested-by: Nathan Chancellor <nathan@kernel.org>
Signed-off-by: Ard Biesheuvel <ardb@kernel.org>
Reviewed-by: Nick Desaulniers <ndesaulniers@google.com>
Link: ClangBuiltLinux/linux#496
Link: ClangBuiltLinux/linux#503
Signed-off-by: Herbert Xu <herbert@gondor.apana.org.au>
ammarfaizi2 pushed a commit to ammarfaizi2/linux-block that referenced this issue Feb 11, 2022
The ARM version of the accelerated XOR routines are simply the 8-way C
routines passed through the auto-vectorizer with SIMD codegen enabled.
This used to require GCC version 4.6 at least, but given that 5.1 is now
the baseline, this check is no longer necessary, and actually
misidentifies Clang as GCC < 4.6 as Clang defines the GCC major/minor as
well, but makes no attempt at doing this in a way that conveys feature
parity with a certain version of GCC (which would not be a great idea in
the first place).

So let's drop the version check, and make the auto-vectorize pragma
(which is based on a GCC-specific command line option) GCC-only. Since
Clang performs SIMD auto-vectorization by default at -O2, no pragma is
necessary here.

Tested-by: Nathan Chancellor <nathan@kernel.org>
Signed-off-by: Ard Biesheuvel <ardb@kernel.org>
Reviewed-by: Nick Desaulniers <ndesaulniers@google.com>
Link: ClangBuiltLinux/linux#496
Link: ClangBuiltLinux/linux#503
Signed-off-by: Herbert Xu <herbert@gondor.apana.org.au>
@nathanchance
Copy link
Member

nathanchance commented Feb 11, 2022

@nathanchance nathanchance added [PATCH] Accepted A submitted patch has been accepted upstream and removed [PATCH] Submitted A patch has been submitted for review labels Feb 11, 2022
@nathanchance
Copy link
Member

Merged into mainline: https://git.kernel.org/linus/297565aa22cfa80ab0f88c3569693aea0b6afb6d

@nathanchance nathanchance removed the [PATCH] Accepted A submitted patch has been accepted upstream label Mar 23, 2022
@nathanchance nathanchance added the [FIXED][LINUX] 5.18 This bug was fixed in Linux 5.18 label Mar 23, 2022
danascape pushed a commit to danascape/kernel-msm-5.15 that referenced this issue Nov 24, 2022
The ARM version of the accelerated XOR routines are simply the 8-way C
routines passed through the auto-vectorizer with SIMD codegen enabled.
This used to require GCC version 4.6 at least, but given that 5.1 is now
the baseline, this check is no longer necessary, and actually
misidentifies Clang as GCC < 4.6 as Clang defines the GCC major/minor as
well, but makes no attempt at doing this in a way that conveys feature
parity with a certain version of GCC (which would not be a great idea in
the first place).

So let's drop the version check, and make the auto-vectorize pragma
(which is based on a GCC-specific command line option) GCC-only. Since
Clang performs SIMD auto-vectorization by default at -O2, no pragma is
necessary here.

Tested-by: Nathan Chancellor <nathan@kernel.org>
Signed-off-by: Ard Biesheuvel <ardb@kernel.org>
Reviewed-by: Nick Desaulniers <ndesaulniers@google.com>
Link: ClangBuiltLinux/linux#496
Link: ClangBuiltLinux/linux#503
Signed-off-by: Herbert Xu <herbert@gondor.apana.org.au>
(cherry picked from commit a69cb445f7d129abf7c50d48c8a8eca7c8d5df15)
Fixes: 0195659 ("ARM: crypto: add NEON accelerated XOR implementation")
Signed-off-by: Lee Jones <joneslee@google.com>
Change-Id: Ie4e498eaee4ddb79352fde5fb822b03677348754
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
[ARCH] arm32 This bug impacts ARCH=arm [BUG] llvm A bug that should be fixed in upstream LLVM [FIXED][LINUX] 5.18 This bug was fixed in Linux 5.18 Reported upstream This bug was filed on LLVM’s issue tracker, Phabricator, or the kernel mailing list.
Projects
None yet
Development

No branches or pull requests

2 participants