Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

thumb missing .w InstAlias' for ldr and str #1296

Closed
nickdesaulniers opened this issue Feb 8, 2021 · 13 comments
Closed

thumb missing .w InstAlias' for ldr and str #1296

nickdesaulniers opened this issue Feb 8, 2021 · 13 comments
Assignees
Labels
[ARCH] arm32 This bug impacts ARCH=arm [BUG] llvm A bug that should be fixed in upstream LLVM [FIXED][LLVM] 13 This bug was fixed in LLVM 13.x Reported upstream This bug was filed on LLVM’s issue tracker, Phabricator, or the kernel mailing list. [TOOL] integrated-as The issue is relevant to LLVM integrated assembler

Comments

@nickdesaulniers
Copy link
Member

arch/arm/lib/copy_template.S

<instantiation>:1:23: error: too many operands for instruction
9999: ldr.w r3, [r1], #4; .pushsection __ex_table,"a"; .align 3; .long 9999b,20f; .popsection
                      ^
arch/arm/lib/copy_template.S:131:3: note: while in macro instantiation
  ldr1w r1, r3, abort=20f
  ^
<instantiation>:1:17: error: too many operands for instruction
str.w r3, [r0], #4
                ^
arch/arm/lib/copy_template.S:149:3: note: while in macro instantiation
  str1w r0, r3, abort=20f
  ^

(forked from #1270, cc @arndb )

@nickdesaulniers nickdesaulniers added [TOOL] integrated-as The issue is relevant to LLVM integrated assembler [ARCH] arm32 This bug impacts ARCH=arm labels Feb 8, 2021
@nickdesaulniers
Copy link
Member Author

nickdesaulniers commented Feb 9, 2021

https://llvm.org/pr49118

@nickdesaulniers nickdesaulniers added the Reported upstream This bug was filed on LLVM’s issue tracker, Phabricator, or the kernel mailing list. label Feb 9, 2021
@nickdesaulniers
Copy link
Member Author

I suspect this is arm code mixed in or built as -Wa,-mthumb.

@nickdesaulniers
Copy link
Member Author

yeah, that's it.

diff --git a/arch/arm/lib/copy_template.S b/arch/arm/lib/copy_template.S
index 810a805d36dc..da82b45f7a6c 100644
--- a/arch/arm/lib/copy_template.S
+++ b/arch/arm/lib/copy_template.S
@@ -69,6 +69,7 @@
  *     than one 32bit instruction in Thumb-2)
  */
 
+.arm
 
        UNWIND( .fnstart                        )
                enter   r4, lr
diff --git a/arch/arm/lib/memmove.S b/arch/arm/lib/memmove.S
index 6fecc12a1f51..45bf44a2b47f 100644
--- a/arch/arm/lib/memmove.S
+++ b/arch/arm/lib/memmove.S
@@ -12,6 +12,7 @@
 #include <asm/unwind.h>
 
                .text
+               .arm
 
 /*
  * Prototype: void *memmove(void *dest, const void *src, size_t n);

@arndb do we need to limit the .arm directives to only the arm instructions (or push/pop that we're using asm instructions)? It looks like GAS permits mixing ARM and Thumb assembly without directives, while Clang does not.

@arndb
Copy link

arndb commented Feb 9, 2021

I think the .arm and .thumb directives have to be used before a the symbol is defined, either at file scope or by passing the -marm command line parameter. Doing it inside of copy_template.S (which is included inside of functions from the other files) would likely lead to invalid instruction encodings.

If you don't think ias can be fixed to recognize these instructions, we could work around it locally like

--- a/arch/arm/lib/memcpy.S
+++ b/arch/arm/lib/memcpy.S
@@ -15,7 +15,12 @@
 #define STR1W_SHIFT    0
 
        .macro ldr1w ptr reg abort
-       W(ldr) \reg, [\ptr], #4
+#if defined(CONFIG_LLVM_IAS) || !defined(CONFIG_THUMB2_KERNEL)
+       /* clang ias rejects the .w suffix on instructions that are already wide */
+       ldr \reg, [\ptr], #4
+#else
+       ldr.w \reg, [\ptr], #4
+#endif
        .endm
 
        .macro ldr4w ptr reg1 reg2 reg3 reg4 abort
    Arnd

@nickdesaulniers
Copy link
Member Author

@nickdesaulniers nickdesaulniers added [PATCH] Submitted A patch has been submitted for review [BUG] llvm A bug that should be fixed in upstream LLVM labels Feb 12, 2021
@nickdesaulniers
Copy link
Member Author

nickdesaulniers commented Feb 12, 2021

I need to reread the ARM ARM more closely, but if LLVM folks agree that T4 should not have .w suffixes, we might be able to do:

diff --git a/arch/arm/lib/copy_from_user.S b/arch/arm/lib/copy_from_user.S
index f8016e3db65d..197cbb6005aa 100644
--- a/arch/arm/lib/copy_from_user.S
+++ b/arch/arm/lib/copy_from_user.S
@@ -60,7 +60,7 @@
 #define LDR1W_SHIFT    0
 
        .macro ldr1w ptr reg abort
-       USERL(\abort, W(ldr) \reg, [\ptr], #4)
+       USERL(\abort, ldr \reg, [\ptr], #4)
        .endm
 
        .macro ldr4w ptr reg1 reg2 reg3 reg4 abort
@@ -80,7 +80,7 @@
 #define STR1W_SHIFT    0
 
        .macro str1w ptr reg abort
-       W(str) \reg, [\ptr], #4
+       str \reg, [\ptr], #4
        .endm
 
        .macro str8w ptr reg1 reg2 reg3 reg4 reg5 reg6 reg7 reg8 abort
diff --git a/arch/arm/lib/copy_to_user.S b/arch/arm/lib/copy_to_user.S
index ebfe4cb3d912..ba58f84c3884 100644
--- a/arch/arm/lib/copy_to_user.S
+++ b/arch/arm/lib/copy_to_user.S
@@ -34,7 +34,7 @@
 #define LDR1W_SHIFT    0
 
        .macro ldr1w ptr reg abort
-       W(ldr) \reg, [\ptr], #4
+       ldr \reg, [\ptr], #4
        .endm
 
        .macro ldr4w ptr reg1 reg2 reg3 reg4 abort
@@ -77,7 +77,7 @@
 #define STR1W_SHIFT    0

        .macro ldr1w ptr reg abort
-       W(ldr) \reg, [\ptr], #4
+       ldr \reg, [\ptr], #4
        .endm
 
        .macro ldr4w ptr reg1 reg2 reg3 reg4 abort
@@ -31,7 +31,7 @@
        .endm
 
        .macro str1w ptr reg abort
-       W(str) \reg, [\ptr], #4
+       str \reg, [\ptr], #4
        .endm
 
        .macro str8w ptr reg1 reg2 reg3 reg4 reg5 reg6 reg7 reg8 abort
diff --git a/arch/arm/lib/memmove.S b/arch/arm/lib/memmove.S
index 6fecc12a1f51..35c5c06b7588 100644
--- a/arch/arm/lib/memmove.S
+++ b/arch/arm/lib/memmove.S
@@ -84,24 +84,24 @@ WEAK(memmove)
                addne   pc, pc, ip              @ C is always clear here
                b       7f
 6:             W(nop)
-               W(ldr)  r3, [r1, #-4]!
-               W(ldr)  r4, [r1, #-4]!
-               W(ldr)  r5, [r1, #-4]!
-               W(ldr)  r6, [r1, #-4]!
-               W(ldr)  r7, [r1, #-4]!
-               W(ldr)  r8, [r1, #-4]!
-               W(ldr)  lr, [r1, #-4]!
+               ldr     r3, [r1, #-4]!
+               ldr     r4, [r1, #-4]!
+               ldr     r5, [r1, #-4]!
+               ldr     r6, [r1, #-4]!
+               ldr     r7, [r1, #-4]!
+               ldr     r8, [r1, #-4]!
+               ldr     lr, [r1, #-4]!
 
                add     pc, pc, ip
                nop
                W(nop)
-               W(str)  r3, [r0, #-4]!
-               W(str)  r4, [r0, #-4]!
-               W(str)  r5, [r0, #-4]!
-               W(str)  r6, [r0, #-4]!
-               W(str)  r7, [r0, #-4]!
-               W(str)  r8, [r0, #-4]!
-               W(str)  lr, [r0, #-4]!
+               str     r3, [r0, #-4]!
+               str     r4, [r0, #-4]!
+               str     r5, [r0, #-4]!
+               str     r6, [r0, #-4]!
+               str     r7, [r0, #-4]!
+               str     r8, [r0, #-4]!
+               str     lr, [r0, #-4]!
 
        CALGN(  bcs     2b                      )

@arndb
Copy link

arndb commented Feb 13, 2021

I need to reread the ARM ARM more closely, but if LLVM folks agree that T4 should not have .w suffixes, we might be able to do:

diff --git a/arch/arm/lib/copy_from_user.S b/arch/arm/lib/copy_from_user.S
index f8016e3db65d..197cbb6005aa 100644
--- a/arch/arm/lib/copy_from_user.S
+++ b/arch/arm/lib/copy_from_user.S
@@ -60,7 +60,7 @@
 #define LDR1W_SHIFT    0
 
        .macro ldr1w ptr reg abort
-       USERL(\abort, W(ldr) \reg, [\ptr], #4)
+       USERL(\abort, ldr \reg, [\ptr], #4)
        .endm

@ardbiesheuvel said that wouldn't work when I suggested the same patch earlier:

https://lore.kernel.org/lkml/CAMj1kXEx-mUCgX5F6xg8-6jKtpqQ=sRosmo4u-0jhW5zu9A-fw@mail.gmail.com/

@ardbiesheuvel
Copy link

I need to reread the ARM ARM more closely, but if LLVM folks agree that T4 should not have .w suffixes, we might be able to do:

diff --git a/arch/arm/lib/copy_from_user.S b/arch/arm/lib/copy_from_user.S
index f8016e3db65d..197cbb6005aa 100644
--- a/arch/arm/lib/copy_from_user.S
+++ b/arch/arm/lib/copy_from_user.S
@@ -60,7 +60,7 @@
 #define LDR1W_SHIFT    0
 
        .macro ldr1w ptr reg abort
-       USERL(\abort, W(ldr) \reg, [\ptr], #4)
+       USERL(\abort, ldr \reg, [\ptr], #4)
        .endm

@ardbiesheuvel said that wouldn't work when I suggested the same patch earlier:

https://lore.kernel.org/lkml/CAMj1kXEx-mUCgX5F6xg8-6jKtpqQ=sRosmo4u-0jhW5zu9A-fw@mail.gmail.com/

In this particular case, I agree that the .w suffix can be dropped. T4 is a wide encoding and it is the only one that has a post-indexed variant, so there is no ambiguity here.

However, the post-indexed variant of T4 is defined as

LDR{<c>}{<q>} <Rt>, [<Rn>], #{+/-}<imm>

where q represents the optional suffix, and so Clang is violating the architecture here by rejecting ldr.w as the mnemonic. Can we please fix this at a more fundamental level as well?

@nickdesaulniers nickdesaulniers self-assigned this Feb 22, 2021
nickdesaulniers added a commit to llvm/llvm-project that referenced this issue Feb 23, 2021
The Linux kernel when built with CONFIG_THUMB2_KERNEL makes use of these
instructions with immediate operands and wide encodings.

These are the T4 variants of the follow sections from the Arm ARM.
F5.1.72 LDR (immediate)
F5.1.229 STR (immediate)

I wasn't able to represent these simple aliases using t2InstAlias due to
the Constraints on the non-suffixed existing instructions, which results
in some manual parsing logic needing to be added.

F1.2 Standard assembler syntax fields
describes the use of the .w (wide) vs .n (narrow) encoding suffix.

Link: https://bugs.llvm.org/show_bug.cgi?id=49118
Link: ClangBuiltLinux/linux#1296
Reported-by: Stefan Agner <stefan@agner.ch>
Reported-by: Arnd Bergmann <arnd@kernel.org>
Signed-off-by: Nick Desaulniers <ndesaulniers@google.com>

Reviewed By: DavidSpickett

Differential Revision: https://reviews.llvm.org/D96632
@nickdesaulniers
Copy link
Member Author

@nickdesaulniers nickdesaulniers added [FIXED][LLVM] 13 This bug was fixed in LLVM 13.x and removed [PATCH] Submitted A patch has been submitted for review labels Feb 23, 2021
@nickdesaulniers
Copy link
Member Author

nickdesaulniers commented Feb 23, 2021

Can we please fix this at a more fundamental level as well?

I would like to, adding individual aliases is tedious. I'm trying to think through valid logic for this, since not all instructions have wide AND narrow encodings (some have 1 or the other, or both, or multiple encodings).

pseudocode:

if !tryToMatchInstructionAsWritten() && endsWithNORW():
  suffix = stripOffNORW()
  instruction = tryToMatchInstructionWithoutSuffix()
  if instruction is 32b && suffix == 'w':
    return instruction
  if instruction is 16b && suffix == 'n':
    return instruction
  return error

Though I'm not sure that's how the assembler performs its table based parsing, or that I've seen enough of the instruction set to fully grasp all irregularities that might be related.

@ardbiesheuvel
Copy link

ardbiesheuvel commented Feb 23, 2021 via email

@nickdesaulniers
Copy link
Member Author

F1.2 Standard assembler syntax fields:
If neither .W nor .N is specified, the assembler can select either 16-bit or 32-bit encodings. If both
are available, it must select a 16-bit encoding.

So I guess I'd have to adjust my psuedocode.

The ARM ARM is clear about the .w prefix being permitted on
all A32 (i.e., non-Thumb) encodings as well.

Yes, from the same section:

When assembling to the A32 instruction set, the .N qualifier produces an assembler error and the .W
qualifier has no effect.

And I'd bet that's useful for assembler built both as ARM and THUMB2 under different build configs.

@ardbiesheuvel
Copy link

ardbiesheuvel commented Feb 24, 2021 via email

morehouse pushed a commit to morehouse/llvm-project that referenced this issue Mar 4, 2021
The Linux kernel when built with CONFIG_THUMB2_KERNEL makes use of these
instructions with immediate operands and wide encodings.

These are the T4 variants of the follow sections from the Arm ARM.
F5.1.72 LDR (immediate)
F5.1.229 STR (immediate)

I wasn't able to represent these simple aliases using t2InstAlias due to
the Constraints on the non-suffixed existing instructions, which results
in some manual parsing logic needing to be added.

F1.2 Standard assembler syntax fields
describes the use of the .w (wide) vs .n (narrow) encoding suffix.

Link: https://bugs.llvm.org/show_bug.cgi?id=49118
Link: ClangBuiltLinux/linux#1296
Reported-by: Stefan Agner <stefan@agner.ch>
Reported-by: Arnd Bergmann <arnd@kernel.org>
Signed-off-by: Nick Desaulniers <ndesaulniers@google.com>

Reviewed By: DavidSpickett

Differential Revision: https://reviews.llvm.org/D96632
mem-frob pushed a commit to draperlaboratory/hope-llvm-project that referenced this issue Oct 7, 2022
The Linux kernel when built with CONFIG_THUMB2_KERNEL makes use of these
instructions with immediate operands and wide encodings.

These are the T4 variants of the follow sections from the Arm ARM.
F5.1.72 LDR (immediate)
F5.1.229 STR (immediate)

I wasn't able to represent these simple aliases using t2InstAlias due to
the Constraints on the non-suffixed existing instructions, which results
in some manual parsing logic needing to be added.

F1.2 Standard assembler syntax fields
describes the use of the .w (wide) vs .n (narrow) encoding suffix.

Link: https://bugs.llvm.org/show_bug.cgi?id=49118
Link: ClangBuiltLinux/linux#1296
Reported-by: Stefan Agner <stefan@agner.ch>
Reported-by: Arnd Bergmann <arnd@kernel.org>
Signed-off-by: Nick Desaulniers <ndesaulniers@google.com>

Reviewed By: DavidSpickett

Differential Revision: https://reviews.llvm.org/D96632
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
[ARCH] arm32 This bug impacts ARCH=arm [BUG] llvm A bug that should be fixed in upstream LLVM [FIXED][LLVM] 13 This bug was fixed in LLVM 13.x Reported upstream This bug was filed on LLVM’s issue tracker, Phabricator, or the kernel mailing list. [TOOL] integrated-as The issue is relevant to LLVM integrated assembler
Projects
None yet
Development

No branches or pull requests

3 participants