Terribly suboptimal code generated for Linux kernel arch/powerpc/lib/xor_vmx.c #1713

Closed
chleroy opened this issue Sep 21, 2022 · 8 comments
Labels
[ARCH] powerpc: This bug impacts ARCH=powerpc
[BUG] linux: A bug that should be fixed in the mainline kernel.
[FIXED][LINUX] 6.9: This bug was fixed in Linux 6.9

Comments

@chleroy

chleroy commented Sep 21, 2022

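The code generated by GCC looks properly optimised and doesn't use the stack.
With Clang, the generated code uses a huge amount of stack and works byte by byte (lbz/xor/stb) instead of using AltiVec instructions.

I'm only dumping __xor_altivec_2() here, but the 4 functions have the same problem.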

GCC 12.2:

arch/powerpc/lib/xor_vmx.o:     file format elf32-powerpc


Disassembly of section .text:

00000000 <__xor_altivec_2>:
   0:    54 69 d1 be     rlwinm  r9,r3,26,6,31
   4:    38 e4 00 10     addi    r7,r4,16
   8:    7d 29 03 a6     mtctr   r9
   c:    39 04 00 20     addi    r8,r4,32
  10:    39 44 00 30     addi    r10,r4,48
  14:    39 65 00 10     addi    r11,r5,16
  18:    38 65 00 20     addi    r3,r5,32
  1c:    38 c5 00 30     addi    r6,r5,48
  20:    39 20 00 00     li      r9,0
  24:    7c 24 48 ce     lvx     v1,r4,r9
  28:    7d a7 48 ce     lvx     v13,r7,r9
  2c:    7c 05 48 ce     lvx     v0,r5,r9
  30:    11 80 0c c4     vxor    v12,v0,v1
  34:    7c 28 48 ce     lvx     v1,r8,r9
  38:    7c 0a 48 ce     lvx     v0,r10,r9
  3c:    7d 84 49 ce     stvx    v12,r4,r9
  40:    7d 8b 48 ce     lvx     v12,r11,r9
  44:    11 ad 64 c4     vxor    v13,v13,v12
  48:    7d a7 49 ce     stvx    v13,r7,r9
  4c:    7d a3 48 ce     lvx     v13,r3,r9
  50:    10 21 6c c4     vxor    v1,v1,v13
  54:    7c 28 49 ce     stvx    v1,r8,r9
  58:    7c 26 48 ce     lvx     v1,r6,r9
  5c:    10 00 0c c4     vxor    v0,v0,v1
  60:    7c 0a 49 ce     stvx    v0,r10,r9
  64:    39 29 00 40     addi    r9,r9,64
  68:    42 00 ff bc     bdnz    24 <__xor_altivec_2+0x24>
  6c:    4e 80 00 20     blr


CLANG 14:


arch/powerpc/lib/xor_vmx.o:     file format elf32-powerpc


Disassembly of section .text:

00000000 <__xor_altivec_2>:
       0:    94 21 ff 70     stwu    r1,-144(r1)
       4:    91 c1 00 48     stw     r14,72(r1)
       8:    54 66 d1 be     rlwinm  r6,r3,26,6,31
       c:    91 e1 00 4c     stw     r15,76(r1)
      10:    38 64 ff c0     addi    r3,r4,-64
      14:    92 01 00 50     stw     r16,80(r1)
      18:    38 85 ff c0     addi    r4,r5,-64
      1c:    92 21 00 54     stw     r17,84(r1)
      20:    92 41 00 58     stw     r18,88(r1)
      24:    92 61 00 5c     stw     r19,92(r1)
      28:    92 81 00 60     stw     r20,96(r1)
      2c:    92 a1 00 64     stw     r21,100(r1)
      30:    92 c1 00 68     stw     r22,104(r1)
      34:    92 e1 00 6c     stw     r23,108(r1)
      38:    93 01 00 70     stw     r24,112(r1)
      3c:    93 21 00 74     stw     r25,116(r1)
      40:    93 41 00 78     stw     r26,120(r1)
      44:    93 61 00 7c     stw     r27,124(r1)
      48:    93 81 00 80     stw     r28,128(r1)
      4c:    93 a1 00 84     stw     r29,132(r1)
      50:    93 c1 00 88     stw     r30,136(r1)
      54:    93 e1 00 8c     stw     r31,140(r1)
      58:    7c c9 03 a6     mtctr   r6
      5c:    8c e3 00 40     lbzu    r7,64(r3)
      60:    8c c4 00 40     lbzu    r6,64(r4)
      64:    88 a3 00 15     lbz     r5,21(r3)
      68:    89 03 00 0f     lbz     r8,15(r3)
      6c:    7c c6 3a 78     xor     r6,r6,r7
      70:    90 a1 00 38     stw     r5,56(r1)
      74:    88 a3 00 14     lbz     r5,20(r3)
      78:    8b e3 00 0e     lbz     r31,14(r3)
      7c:    90 a1 00 3c     stw     r5,60(r1)
      80:    88 a3 00 13     lbz     r5,19(r3)
      84:    89 c3 00 0d     lbz     r14,13(r3)
      88:    90 a1 00 40     stw     r5,64(r1)
      8c:    88 a3 00 12     lbz     r5,18(r3)
      90:    88 e4 00 0e     lbz     r7,14(r4)
      94:    90 a1 00 44     stw     r5,68(r1)
      98:    88 a4 00 0f     lbz     r5,15(r4)
      9c:    7c e7 fa 78     xor     r7,r7,r31
      a0:    89 e3 00 0b     lbz     r15,11(r3)
      a4:    7c a5 42 78     xor     r5,r5,r8
      a8:    89 04 00 0d     lbz     r8,13(r4)
      ac:    8a 03 00 0a     lbz     r16,10(r3)
      b0:    98 a3 00 0f     stb     r5,15(r3)
      b4:    7d 08 72 78     xor     r8,r8,r14
      b8:    88 a3 00 10     lbz     r5,16(r3)
      bc:    8b e4 00 0b     lbz     r31,11(r4)
      c0:    89 c4 00 0a     lbz     r14,10(r4)
      c4:    8a 23 00 09     lbz     r17,9(r3)
      c8:    7f ef 7a 78     xor     r15,r31,r15
      cc:    8a 43 00 08     lbz     r18,8(r3)
      d0:    7d d0 82 78     xor     r16,r14,r16
      d4:    90 a1 00 30     stw     r5,48(r1)
      d8:    88 a3 00 1c     lbz     r5,28(r3)
      dc:    8b e4 00 09     lbz     r31,9(r4)
      e0:    89 c4 00 08     lbz     r14,8(r4)
      e4:    8a 63 00 07     lbz     r19,7(r3)
      e8:    7f f1 8a 78     xor     r17,r31,r17
      ec:    8a 83 00 06     lbz     r20,6(r3)
      f0:    7d d2 92 78     xor     r18,r14,r18
      f4:    90 a1 00 2c     stw     r5,44(r1)
      f8:    88 a3 00 2e     lbz     r5,46(r3)
      fc:    8b e4 00 07     lbz     r31,7(r4)
     100:    89 c4 00 06     lbz     r14,6(r4)
     104:    8a a3 00 05     lbz     r21,5(r3)
     108:    7f f3 9a 78     xor     r19,r31,r19
     10c:    8a c3 00 04     lbz     r22,4(r3)
     110:    7d d4 a2 78     xor     r20,r14,r20
     114:    90 a1 00 28     stw     r5,40(r1)
     118:    88 a3 00 2d     lbz     r5,45(r3)
     11c:    8b e4 00 05     lbz     r31,5(r4)
     120:    89 c4 00 04     lbz     r14,4(r4)
     124:    8a e3 00 03     lbz     r23,3(r3)
     128:    7f f5 aa 78     xor     r21,r31,r21
     12c:    8b 03 00 02     lbz     r24,2(r3)
     130:    7d d6 b2 78     xor     r22,r14,r22
     134:    90 a1 00 24     stw     r5,36(r1)
     138:    88 a3 00 2b     lbz     r5,43(r3)
     13c:    8b e4 00 03     lbz     r31,3(r4)
     140:    89 c4 00 02     lbz     r14,2(r4)
     144:    8b 23 00 01     lbz     r25,1(r3)
     148:    7f f7 ba 78     xor     r23,r31,r23
     14c:    8b 43 00 0c     lbz     r26,12(r3)
     150:    7d d8 c2 78     xor     r24,r14,r24
     154:    90 a1 00 20     stw     r5,32(r1)
     158:    88 a3 00 2a     lbz     r5,42(r3)
     15c:    8b e4 00 01     lbz     r31,1(r4)
     160:    89 c4 00 0c     lbz     r14,12(r4)
     164:    8b 63 00 1f     lbz     r27,31(r3)
     168:    7f f9 ca 78     xor     r25,r31,r25
     16c:    8b 83 00 1e     lbz     r28,30(r3)
     170:    7d da d2 78     xor     r26,r14,r26
     174:    90 a1 00 1c     stw     r5,28(r1)
     178:    88 a3 00 29     lbz     r5,41(r3)
     17c:    8b e4 00 1f     lbz     r31,31(r4)
     180:    89 c4 00 1e     lbz     r14,30(r4)
     184:    8b a3 00 1d     lbz     r29,29(r3)
     188:    7f fb da 78     xor     r27,r31,r27
     18c:    8b c3 00 1b     lbz     r30,27(r3)
     190:    7d dc e2 78     xor     r28,r14,r28
     194:    90 a1 00 18     stw     r5,24(r1)
     198:    88 a3 00 28     lbz     r5,40(r3)
     19c:    8b e4 00 1d     lbz     r31,29(r4)
     1a0:    89 c4 00 1b     lbz     r14,27(r4)
     1a4:    88 03 00 1a     lbz     r0,26(r3)
     1a8:    7f fd ea 78     xor     r29,r31,r29
     1ac:    89 83 00 19     lbz     r12,25(r3)
     1b0:    7d de f2 78     xor     r30,r14,r30
     1b4:    90 a1 00 14     stw     r5,20(r1)
     1b8:    88 a3 00 27     lbz     r5,39(r3)
     1bc:    8b e4 00 1a     lbz     r31,26(r4)
     1c0:    89 c4 00 19     lbz     r14,25(r4)
     1c4:    89 63 00 18     lbz     r11,24(r3)
     1c8:    7f e0 02 78     xor     r0,r31,r0
     1cc:    89 43 00 17     lbz     r10,23(r3)
     1d0:    7d cc 62 78     xor     r12,r14,r12
     1d4:    90 a1 00 10     stw     r5,16(r1)
     1d8:    88 a3 00 26     lbz     r5,38(r3)
     1dc:    8b e4 00 18     lbz     r31,24(r4)
     1e0:    89 c4 00 17     lbz     r14,23(r4)
     1e4:    89 23 00 16     lbz     r9,22(r3)
     1e8:    7f eb 5a 78     xor     r11,r31,r11
     1ec:    98 c3 00 00     stb     r6,0(r3)
     1f0:    7d ca 52 78     xor     r10,r14,r10
     1f4:    88 c3 00 11     lbz     r6,17(r3)
     1f8:    90 a1 00 0c     stw     r5,12(r1)
     1fc:    8b e4 00 16     lbz     r31,22(r4)
     200:    89 c4 00 15     lbz     r14,21(r4)
     204:    80 a1 00 38     lwz     r5,56(r1)
     208:    7f e9 4a 78     xor     r9,r31,r9
     20c:    90 c1 00 34     stw     r6,52(r1)
     210:    98 e3 00 0e     stb     r7,14(r3)
     214:    7d ce 2a 78     xor     r14,r14,r5
     218:    8b e4 00 14     lbz     r31,20(r4)
     21c:    88 a4 00 13     lbz     r5,19(r4)
     220:    80 c1 00 3c     lwz     r6,60(r1)
     224:    80 e1 00 40     lwz     r7,64(r1)
     228:    7f ff 32 78     xor     r31,r31,r6
     22c:    88 c4 00 12     lbz     r6,18(r4)
     230:    7c a5 3a 78     xor     r5,r5,r7
     234:    80 e1 00 44     lwz     r7,68(r1)
     238:    98 a3 00 13     stb     r5,19(r3)
     23c:    88 a4 00 11     lbz     r5,17(r4)
     240:    7c c6 3a 78     xor     r6,r6,r7
     244:    80 e1 00 34     lwz     r7,52(r1)
     248:    98 c3 00 12     stb     r6,18(r3)
     24c:    88 c4 00 10     lbz     r6,16(r4)
     250:    7c a5 3a 78     xor     r5,r5,r7
     254:    80 e1 00 30     lwz     r7,48(r1)
     258:    98 a3 00 11     stb     r5,17(r3)
     25c:    88 a4 00 1c     lbz     r5,28(r4)
     260:    7c c6 3a 78     xor     r6,r6,r7
     264:    80 e1 00 2c     lwz     r7,44(r1)
     268:    99 03 00 0d     stb     r8,13(r3)
     26c:    89 03 00 2f     lbz     r8,47(r3)
     270:    7c a5 3a 78     xor     r5,r5,r7
     274:    98 c3 00 10     stb     r6,16(r3)
     278:    88 c4 00 2f     lbz     r6,47(r4)
     27c:    98 a3 00 1c     stb     r5,28(r3)
     280:    88 a4 00 2e     lbz     r5,46(r4)
     284:    7c c6 42 78     xor     r6,r6,r8
     288:    80 e1 00 28     lwz     r7,40(r1)
     28c:    98 c3 00 2f     stb     r6,47(r3)
     290:    88 c4 00 2d     lbz     r6,45(r4)
     294:    7c a5 3a 78     xor     r5,r5,r7
     298:    80 e1 00 24     lwz     r7,36(r1)
     29c:    98 a3 00 2e     stb     r5,46(r3)
     2a0:    88 a4 00 2b     lbz     r5,43(r4)
     2a4:    7c c6 3a 78     xor     r6,r6,r7
     2a8:    80 e1 00 20     lwz     r7,32(r1)
     2ac:    99 e3 00 0b     stb     r15,11(r3)
     2b0:    98 c3 00 2d     stb     r6,45(r3)
     2b4:    7c a5 3a 78     xor     r5,r5,r7
     2b8:    88 c4 00 2a     lbz     r6,42(r4)
     2bc:    81 e1 00 1c     lwz     r15,28(r1)
     2c0:    98 a3 00 2b     stb     r5,43(r3)
     2c4:    88 a4 00 29     lbz     r5,41(r4)
     2c8:    7c c6 7a 78     xor     r6,r6,r15
     2cc:    81 e1 00 18     lwz     r15,24(r1)
     2d0:    88 e4 00 28     lbz     r7,40(r4)
     2d4:    7c a5 7a 78     xor     r5,r5,r15
     2d8:    81 e1 00 14     lwz     r15,20(r1)
     2dc:    98 c3 00 2a     stb     r6,42(r3)
     2e0:    88 c4 00 27     lbz     r6,39(r4)
     2e4:    7c e7 7a 78     xor     r7,r7,r15
     2e8:    81 e1 00 10     lwz     r15,16(r1)
     2ec:    98 a3 00 29     stb     r5,41(r3)
     2f0:    88 a4 00 26     lbz     r5,38(r4)
     2f4:    7c c6 7a 78     xor     r6,r6,r15
     2f8:    81 e1 00 0c     lwz     r15,12(r1)
     2fc:    9a 03 00 0a     stb     r16,10(r3)
     300:    8a 03 00 25     lbz     r16,37(r3)
     304:    7c a5 7a 78     xor     r5,r5,r15
     308:    98 e3 00 28     stb     r7,40(r3)
     30c:    88 e4 00 25     lbz     r7,37(r4)
     310:    9a 23 00 09     stb     r17,9(r3)
     314:    8a 23 00 24     lbz     r17,36(r3)
     318:    7c e7 82 78     xor     r7,r7,r16
     31c:    98 c3 00 27     stb     r6,39(r3)
     320:    88 c4 00 24     lbz     r6,36(r4)
     324:    9a 43 00 08     stb     r18,8(r3)
     328:    8a 43 00 23     lbz     r18,35(r3)
     32c:    7c c6 8a 78     xor     r6,r6,r17
     330:    98 a3 00 26     stb     r5,38(r3)
     334:    88 a4 00 23     lbz     r5,35(r4)
     338:    9a 63 00 07     stb     r19,7(r3)
     33c:    8a 63 00 22     lbz     r19,34(r3)
     340:    7c a5 92 78     xor     r5,r5,r18
     344:    98 e3 00 25     stb     r7,37(r3)
     348:    88 e4 00 22     lbz     r7,34(r4)
     34c:    9a 83 00 06     stb     r20,6(r3)
     350:    8a 83 00 21     lbz     r20,33(r3)
     354:    7c e7 9a 78     xor     r7,r7,r19
     358:    98 c3 00 24     stb     r6,36(r3)
     35c:    88 c4 00 21     lbz     r6,33(r4)
     360:    9a a3 00 05     stb     r21,5(r3)
     364:    8a a3 00 20     lbz     r21,32(r3)
     368:    7c c6 a2 78     xor     r6,r6,r20
     36c:    98 a3 00 23     stb     r5,35(r3)
     370:    88 a4 00 20     lbz     r5,32(r4)
     374:    9a c3 00 04     stb     r22,4(r3)
     378:    8a c3 00 2c     lbz     r22,44(r3)
     37c:    7c a5 aa 78     xor     r5,r5,r21
     380:    98 e3 00 22     stb     r7,34(r3)
     384:    88 e4 00 2c     lbz     r7,44(r4)
     388:    9a e3 00 03     stb     r23,3(r3)
     38c:    8a e3 00 3f     lbz     r23,63(r3)
     390:    7c e7 b2 78     xor     r7,r7,r22
     394:    98 c3 00 21     stb     r6,33(r3)
     398:    88 c4 00 3f     lbz     r6,63(r4)
     39c:    9b 03 00 02     stb     r24,2(r3)
     3a0:    8b 03 00 3e     lbz     r24,62(r3)
     3a4:    7c c6 ba 78     xor     r6,r6,r23
     3a8:    98 a3 00 20     stb     r5,32(r3)
     3ac:    88 a4 00 3e     lbz     r5,62(r4)
     3b0:    9b 23 00 01     stb     r25,1(r3)
     3b4:    8b 23 00 3d     lbz     r25,61(r3)
     3b8:    7c a5 c2 78     xor     r5,r5,r24
     3bc:    98 e3 00 2c     stb     r7,44(r3)
     3c0:    88 e4 00 3d     lbz     r7,61(r4)
     3c4:    9b 43 00 0c     stb     r26,12(r3)
     3c8:    8b 43 00 3b     lbz     r26,59(r3)
     3cc:    7c e7 ca 78     xor     r7,r7,r25
     3d0:    98 c3 00 3f     stb     r6,63(r3)
     3d4:    88 c4 00 3b     lbz     r6,59(r4)
     3d8:    9b 63 00 1f     stb     r27,31(r3)
     3dc:    8b 63 00 3a     lbz     r27,58(r3)
     3e0:    7c c6 d2 78     xor     r6,r6,r26
     3e4:    98 a3 00 3e     stb     r5,62(r3)
     3e8:    88 a4 00 3a     lbz     r5,58(r4)
     3ec:    9b 83 00 1e     stb     r28,30(r3)
     3f0:    8b 83 00 39     lbz     r28,57(r3)
     3f4:    7c a5 da 78     xor     r5,r5,r27
     3f8:    98 e3 00 3d     stb     r7,61(r3)
     3fc:    88 e4 00 39     lbz     r7,57(r4)
     400:    99 c3 00 15     stb     r14,21(r3)
     404:    89 c3 00 38     lbz     r14,56(r3)
     408:    7c e7 e2 78     xor     r7,r7,r28
     40c:    98 c3 00 3b     stb     r6,59(r3)
     410:    88 c4 00 38     lbz     r6,56(r4)
     414:    9b e3 00 14     stb     r31,20(r3)
     418:    8b e3 00 37     lbz     r31,55(r3)
     41c:    7c c6 72 78     xor     r6,r6,r14
     420:    98 a3 00 3a     stb     r5,58(r3)
     424:    88 a4 00 37     lbz     r5,55(r4)
     428:    9b a3 00 1d     stb     r29,29(r3)
     42c:    8b a3 00 36     lbz     r29,54(r3)
     430:    7c a5 fa 78     xor     r5,r5,r31
     434:    98 e3 00 39     stb     r7,57(r3)
     438:    88 e4 00 36     lbz     r7,54(r4)
     43c:    9b c3 00 1b     stb     r30,27(r3)
     440:    8b c3 00 35     lbz     r30,53(r3)
     444:    7c e7 ea 78     xor     r7,r7,r29
     448:    98 c3 00 38     stb     r6,56(r3)
     44c:    88 c4 00 35     lbz     r6,53(r4)
     450:    98 03 00 1a     stb     r0,26(r3)
     454:    88 03 00 34     lbz     r0,52(r3)
     458:    7c c6 f2 78     xor     r6,r6,r30
     45c:    98 a3 00 37     stb     r5,55(r3)
     460:    88 a4 00 34     lbz     r5,52(r4)
     464:    99 83 00 19     stb     r12,25(r3)
     468:    89 83 00 33     lbz     r12,51(r3)
     46c:    7c a5 02 78     xor     r5,r5,r0
     470:    98 e3 00 36     stb     r7,54(r3)
     474:    88 e4 00 33     lbz     r7,51(r4)
     478:    99 63 00 18     stb     r11,24(r3)
     47c:    89 63 00 32     lbz     r11,50(r3)
     480:    7c e7 62 78     xor     r7,r7,r12
     484:    98 c3 00 35     stb     r6,53(r3)
     488:    88 c4 00 32     lbz     r6,50(r4)
     48c:    99 43 00 17     stb     r10,23(r3)
     490:    89 43 00 31     lbz     r10,49(r3)
     494:    7c c6 5a 78     xor     r6,r6,r11
     498:    98 a3 00 34     stb     r5,52(r3)
     49c:    88 a4 00 31     lbz     r5,49(r4)
     4a0:    99 23 00 16     stb     r9,22(r3)
     4a4:    89 23 00 30     lbz     r9,48(r3)
     4a8:    7c a5 52 78     xor     r5,r5,r10
     4ac:    98 e3 00 33     stb     r7,51(r3)
     4b0:    88 e4 00 30     lbz     r7,48(r4)
     4b4:    89 03 00 3c     lbz     r8,60(r3)
     4b8:    98 c3 00 32     stb     r6,50(r3)
     4bc:    88 c4 00 3c     lbz     r6,60(r4)
     4c0:    98 a3 00 31     stb     r5,49(r3)
     4c4:    7c e5 4a 78     xor     r5,r7,r9
     4c8:    98 a3 00 30     stb     r5,48(r3)
     4cc:    7c c5 42 78     xor     r5,r6,r8
     4d0:    98 a3 00 3c     stb     r5,60(r3)
     4d4:    42 00 fb 88     bdnz    5c <__xor_altivec_2+0x5c>
     4d8:    83 e1 00 8c     lwz     r31,140(r1)
     4dc:    83 c1 00 88     lwz     r30,136(r1)
     4e0:    83 a1 00 84     lwz     r29,132(r1)
     4e4:    83 81 00 80     lwz     r28,128(r1)
     4e8:    83 61 00 7c     lwz     r27,124(r1)
     4ec:    83 41 00 78     lwz     r26,120(r1)
     4f0:    83 21 00 74     lwz     r25,116(r1)
     4f4:    83 01 00 70     lwz     r24,112(r1)
     4f8:    82 e1 00 6c     lwz     r23,108(r1)
     4fc:    82 c1 00 68     lwz     r22,104(r1)
     500:    82 a1 00 64     lwz     r21,100(r1)
     504:    82 81 00 60     lwz     r20,96(r1)
     508:    82 61 00 5c     lwz     r19,92(r1)
     50c:    82 41 00 58     lwz     r18,88(r1)
     510:    82 21 00 54     lwz     r17,84(r1)
     514:    82 01 00 50     lwz     r16,80(r1)
     518:    81 e1 00 4c     lwz     r15,76(r1)
     51c:    81 c1 00 48     lwz     r14,72(r1)
     520:    38 21 00 90     addi    r1,r1,144
     524:    4e 80 00 20     blr
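
For readers who don't have the source at hand, here is a rough model of the loop both dumps implement. It is a minimal sketch and not the kernel code: the names `xor_2_sketch` and `xor_2_bytewise` and the `vec16` type are hypothetical, and it uses the generic GCC/Clang `vector_size` extension rather than AltiVec types so it builds on any host. It assumes what the GCC dump suggests: the byte count is shifted right by 6, and each pass XORs 64 bytes as four 16-byte vectors.

```c
#include <assert.h>
#include <stddef.h>
#include <string.h>

/* 16-byte vector type via the GCC/Clang vector_size extension.
 * Assumption made for portability; this is not the type the kernel file uses. */
typedef unsigned char vec16 __attribute__((vector_size(16)));

/* Hypothetical stand-in for __xor_altivec_2(): dst ^= src, 64 bytes per pass. */
static void xor_2_sketch(size_t bytes, unsigned char *dst, const unsigned char *src)
{
    size_t lines = bytes / 64;  /* same as the "shift right by 6" in the dumps */

    assert(bytes % 64 == 0);    /* the sketch only handles whole 64-byte lines */

    for (size_t i = 0; i < lines; i++) {
        vec16 a[4], b[4];

        /* memcpy keeps the sketch free of alignment and aliasing assumptions */
        memcpy(a, dst + 64 * i, sizeof(a));
        memcpy(b, src + 64 * i, sizeof(b));

        for (int k = 0; k < 4; k++)
            a[k] ^= b[k];       /* ideally one vector XOR per 16 bytes */

        memcpy(dst + 64 * i, a, sizeof(a));
    }
}

/* Byte-wise reference: the shape of the lbz/xor/stb sequence in the Clang dump. */
static void xor_2_bytewise(size_t bytes, unsigned char *dst, const unsigned char *src)
{
    for (size_t i = 0; i < bytes; i++)
        dst[i] ^= src[i];
}
```

Both functions compute the same result; the difference the dumps show is only in how many instructions and how much register pressure it takes to get there (4 vector XORs per 64 bytes versus 64 scalar ones).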

nickdesaulniers added the [BUG] llvm (A bug that should be fixed in upstream LLVM) and [ARCH] powerpc (This bug impacts ARCH=powerpc) labels Sep 21, 2022
@nickdesaulniers
Member

Thanks for the report. I'm immediately reminded of:

@nathanchance
Member

@chleroy
Author

chleroy commented Sep 22, 2022

No, that's without KASAN. You can see in the generated code that there are no calls to KASAN functions.

@ernsteiswuerfel

ernsteiswuerfel commented Jan 20, 2023

Same on ppc64 (PowerMac G5 build).

GCC 12.2.1_p20221126:

gcc12_xor_vmx.o:     file format elf64-powerpc


Disassembly of section .text:

0000000000000000 <__xor_altivec_2>:
   0:   78 6a d1 82     srdi    r10,r3,6
   4:   39 20 00 00     li      r9,0
   8:   38 e4 00 10     addi    r7,r4,16
   c:   39 04 00 20     addi    r8,r4,32
  10:   7d 49 03 a6     mtctr   r10
  14:   39 65 00 10     addi    r11,r5,16
  18:   39 44 00 30     addi    r10,r4,48
  1c:   38 65 00 20     addi    r3,r5,32
  20:   38 c5 00 30     addi    r6,r5,48
  24:   60 00 00 00     nop
  28:   60 00 00 00     nop
  2c:   60 42 00 00     ori     r2,r2,0
  30:   7d 04 48 ce     lvx     v8,r4,r9
  34:   7d a7 48 ce     lvx     v13,r7,r9
  38:   7c 28 48 ce     lvx     v1,r8,r9
  3c:   7c 0a 48 ce     lvx     v0,r10,r9
  40:   7d 85 48 ce     lvx     v12,r5,r9
  44:   7d 2b 48 ce     lvx     v9,r11,r9
  48:   7d 43 48 ce     lvx     v10,r3,r9
  4c:   7d 66 48 ce     lvx     v11,r6,r9
  50:   11 8c 44 c4     vxor    v12,v12,v8
  54:   11 ad 4c c4     vxor    v13,v13,v9
  58:   10 21 54 c4     vxor    v1,v1,v10
  5c:   10 00 5c c4     vxor    v0,v0,v11
  60:   7d 84 49 ce     stvx    v12,r4,r9
  64:   7d a7 49 ce     stvx    v13,r7,r9
  68:   7c 28 49 ce     stvx    v1,r8,r9
  6c:   7c 0a 49 ce     stvx    v0,r10,r9
  70:   39 29 00 40     addi    r9,r9,64
  74:   42 00 ff bc     bdnz    30 <__xor_altivec_2+0x30>
  78:   4e 80 00 20     blr
  7c:   60 42 00 00     ori     r2,r2,0

CLANG 15.0.6:

clang15_xor_vmx.o:     file format elf64-powerpc


Disassembly of section .text:

0000000000000000 <__xor_altivec_2>:
       0:       3c 4c 00 00     addis   r2,r12,0
       4:       38 42 00 00     addi    r2,r2,0
       8:       f9 c1 ff 70     std     r14,-144(r1)
       c:       78 66 d1 82     srdi    r6,r3,6
      10:       38 64 ff c0     addi    r3,r4,-64
      14:       38 85 ff c0     addi    r4,r5,-64
      18:       f9 e1 ff 78     std     r15,-136(r1)
      1c:       fa 01 ff 80     std     r16,-128(r1)
      20:       fa 21 ff 88     std     r17,-120(r1)
      24:       fa 41 ff 90     std     r18,-112(r1)
      28:       fa 61 ff 98     std     r19,-104(r1)
      2c:       fa 81 ff a0     std     r20,-96(r1)
      30:       fa a1 ff a8     std     r21,-88(r1)
      34:       fa c1 ff b0     std     r22,-80(r1)
      38:       fa e1 ff b8     std     r23,-72(r1)
      3c:       fb 01 ff c0     std     r24,-64(r1)
      40:       fb 21 ff c8     std     r25,-56(r1)
      44:       fb 41 ff d0     std     r26,-48(r1)
      48:       fb 61 ff d8     std     r27,-40(r1)
      4c:       fb 81 ff e0     std     r28,-32(r1)
      50:       fb a1 ff e8     std     r29,-24(r1)
      54:       fb c1 ff f0     std     r30,-16(r1)
      58:       fb e1 ff f8     std     r31,-8(r1)
      5c:       f8 41 ff 68     std     r2,-152(r1)
      60:       7c c9 03 a6     mtctr   r6
      64:       60 00 00 00     nop
      68:       60 00 00 00     nop
      6c:       60 00 00 00     nop
      70:       8c c4 00 40     lbzu    r6,64(r4)
      74:       8c a3 00 40     lbzu    r5,64(r3)
      78:       7c c5 2a 78     xor     r5,r6,r5
      7c:       88 e3 00 0f     lbz     r7,15(r3)
      80:       98 a3 00 00     stb     r5,0(r3)
      84:       88 a4 00 0f     lbz     r5,15(r4)
      88:       7c a5 3a 78     xor     r5,r5,r7
      8c:       89 03 00 0e     lbz     r8,14(r3)
      90:       98 a3 00 0f     stb     r5,15(r3)
      94:       88 a4 00 0e     lbz     r5,14(r4)
      98:       7c a5 42 78     xor     r5,r5,r8
      9c:       88 43 00 0d     lbz     r2,13(r3)
      a0:       98 a3 00 0e     stb     r5,14(r3)
      a4:       88 a4 00 0d     lbz     r5,13(r4)
      a8:       7c a5 12 78     xor     r5,r5,r2
      ac:       8b e3 00 0b     lbz     r31,11(r3)
      b0:       98 a3 00 0d     stb     r5,13(r3)
      b4:       88 a4 00 0b     lbz     r5,11(r4)
      b8:       7c a5 fa 78     xor     r5,r5,r31
      bc:       89 c3 00 0a     lbz     r14,10(r3)
      c0:       98 a3 00 0b     stb     r5,11(r3)
      c4:       88 a4 00 0a     lbz     r5,10(r4)
      c8:       7c a5 72 78     xor     r5,r5,r14
      cc:       89 e3 00 09     lbz     r15,9(r3)
      d0:       98 a3 00 0a     stb     r5,10(r3)
      d4:       88 a4 00 09     lbz     r5,9(r4)
      d8:       7c a5 7a 78     xor     r5,r5,r15
      dc:       8a 03 00 08     lbz     r16,8(r3)
      e0:       98 a3 00 09     stb     r5,9(r3)
      e4:       88 a4 00 08     lbz     r5,8(r4)
      e8:       7c a5 82 78     xor     r5,r5,r16
      ec:       8a 23 00 07     lbz     r17,7(r3)
      f0:       98 a3 00 08     stb     r5,8(r3)
      f4:       88 a4 00 07     lbz     r5,7(r4)
      f8:       7c a5 8a 78     xor     r5,r5,r17
      fc:       8a 43 00 06     lbz     r18,6(r3)
     100:       98 a3 00 07     stb     r5,7(r3)
     104:       88 a4 00 06     lbz     r5,6(r4)
     108:       7c a5 92 78     xor     r5,r5,r18
     10c:       8a 63 00 05     lbz     r19,5(r3)
     110:       98 a3 00 06     stb     r5,6(r3)
     114:       88 a4 00 05     lbz     r5,5(r4)
     118:       7c a5 9a 78     xor     r5,r5,r19
     11c:       8a 83 00 04     lbz     r20,4(r3)
     120:       98 a3 00 05     stb     r5,5(r3)
     124:       88 a4 00 04     lbz     r5,4(r4)
     128:       7c a5 a2 78     xor     r5,r5,r20
     12c:       8a a3 00 03     lbz     r21,3(r3)
     130:       98 a3 00 04     stb     r5,4(r3)
     134:       88 a4 00 03     lbz     r5,3(r4)
     138:       7c a5 aa 78     xor     r5,r5,r21
     13c:       8a c3 00 02     lbz     r22,2(r3)
     140:       98 a3 00 03     stb     r5,3(r3)
     144:       88 a4 00 02     lbz     r5,2(r4)
     148:       7c a5 b2 78     xor     r5,r5,r22
     14c:       8a e3 00 01     lbz     r23,1(r3)
     150:       98 a3 00 02     stb     r5,2(r3)
     154:       88 a4 00 01     lbz     r5,1(r4)
     158:       7c a5 ba 78     xor     r5,r5,r23
     15c:       8b 03 00 0c     lbz     r24,12(r3)
     160:       98 a3 00 01     stb     r5,1(r3)
     164:       88 a4 00 0c     lbz     r5,12(r4)
     168:       7c a5 c2 78     xor     r5,r5,r24
     16c:       8b 23 00 1f     lbz     r25,31(r3)
     170:       98 a3 00 0c     stb     r5,12(r3)
     174:       88 a4 00 1f     lbz     r5,31(r4)
     178:       7c a5 ca 78     xor     r5,r5,r25
     17c:       8b 43 00 1e     lbz     r26,30(r3)
     180:       98 a3 00 1f     stb     r5,31(r3)
     184:       88 a4 00 1e     lbz     r5,30(r4)
     188:       7c a5 d2 78     xor     r5,r5,r26
     18c:       8b 63 00 1d     lbz     r27,29(r3)
     190:       98 a3 00 1e     stb     r5,30(r3)
     194:       88 a4 00 1d     lbz     r5,29(r4)
     198:       7c a5 da 78     xor     r5,r5,r27
     19c:       8b 83 00 1b     lbz     r28,27(r3)
     1a0:       98 a3 00 1d     stb     r5,29(r3)
     1a4:       88 a4 00 1b     lbz     r5,27(r4)
     1a8:       7c a5 e2 78     xor     r5,r5,r28
     1ac:       8b a3 00 1a     lbz     r29,26(r3)
     1b0:       98 a3 00 1b     stb     r5,27(r3)
     1b4:       88 a4 00 1a     lbz     r5,26(r4)
     1b8:       88 c3 00 1c     lbz     r6,28(r3)
     1bc:       7c a5 ea 78     xor     r5,r5,r29
     1c0:       8b c3 00 19     lbz     r30,25(r3)
     1c4:       98 a3 00 1a     stb     r5,26(r3)
     1c8:       88 a4 00 19     lbz     r5,25(r4)
     1cc:       90 c1 ff 50     stw     r6,-176(r1)
     1d0:       7c a5 f2 78     xor     r5,r5,r30
     1d4:       88 c3 00 2e     lbz     r6,46(r3)
     1d8:       88 03 00 18     lbz     r0,24(r3)
     1dc:       98 a3 00 19     stb     r5,25(r3)
     1e0:       88 a4 00 18     lbz     r5,24(r4)
     1e4:       90 c1 ff 4c     stw     r6,-180(r1)
     1e8:       7c a5 02 78     xor     r5,r5,r0
     1ec:       88 c3 00 2d     lbz     r6,45(r3)
     1f0:       89 83 00 17     lbz     r12,23(r3)
     1f4:       90 c1 ff 48     stw     r6,-184(r1)
     1f8:       88 c3 00 2b     lbz     r6,43(r3)
     1fc:       98 a3 00 18     stb     r5,24(r3)
     200:       88 a4 00 17     lbz     r5,23(r4)
     204:       90 c1 ff 44     stw     r6,-188(r1)
     208:       7c a5 62 78     xor     r5,r5,r12
     20c:       88 c3 00 2a     lbz     r6,42(r3)
     210:       89 63 00 16     lbz     r11,22(r3)
     214:       98 a3 00 17     stb     r5,23(r3)
     218:       88 a4 00 16     lbz     r5,22(r4)
     21c:       90 c1 ff 40     stw     r6,-192(r1)
     220:       7c a5 5a 78     xor     r5,r5,r11
     224:       88 c3 00 29     lbz     r6,41(r3)
     228:       89 43 00 15     lbz     r10,21(r3)
     22c:       89 23 00 14     lbz     r9,20(r3)
     230:       98 a3 00 16     stb     r5,22(r3)
     234:       88 a4 00 15     lbz     r5,21(r4)
     238:       90 c1 ff 3c     stw     r6,-196(r1)
     23c:       7c a5 52 78     xor     r5,r5,r10
     240:       88 c3 00 28     lbz     r6,40(r3)
     244:       91 21 ff 54     stw     r9,-172(r1)
     248:       89 23 00 13     lbz     r9,19(r3)
     24c:       90 c1 ff 38     stw     r6,-200(r1)
     250:       98 a3 00 15     stb     r5,21(r3)
     254:       88 a4 00 14     lbz     r5,20(r4)
     258:       80 c1 ff 54     lwz     r6,-172(r1)
     25c:       91 21 ff 58     stw     r9,-168(r1)
     260:       7c a5 32 78     xor     r5,r5,r6
     264:       89 23 00 12     lbz     r9,18(r3)
     268:       98 a3 00 14     stb     r5,20(r3)
     26c:       88 a4 00 13     lbz     r5,19(r4)
     270:       80 c1 ff 58     lwz     r6,-168(r1)
     274:       91 21 ff 5c     stw     r9,-164(r1)
     278:       7c a5 32 78     xor     r5,r5,r6
     27c:       89 23 00 11     lbz     r9,17(r3)
     280:       98 a3 00 13     stb     r5,19(r3)
     284:       88 a4 00 12     lbz     r5,18(r4)
     288:       80 c1 ff 5c     lwz     r6,-164(r1)
     28c:       91 21 ff 60     stw     r9,-160(r1)
     290:       7c a5 32 78     xor     r5,r5,r6
     294:       89 23 00 10     lbz     r9,16(r3)
     298:       98 a3 00 12     stb     r5,18(r3)
     29c:       88 a4 00 11     lbz     r5,17(r4)
     2a0:       80 c1 ff 60     lwz     r6,-160(r1)
     2a4:       91 21 ff 64     stw     r9,-156(r1)
     2a8:       7c a5 32 78     xor     r5,r5,r6
     2ac:       98 a3 00 11     stb     r5,17(r3)
     2b0:       88 a4 00 10     lbz     r5,16(r4)
     2b4:       80 c1 ff 64     lwz     r6,-156(r1)
     2b8:       7c a5 32 78     xor     r5,r5,r6
     2bc:       80 c1 ff 50     lwz     r6,-176(r1)
     2c0:       98 a3 00 10     stb     r5,16(r3)
     2c4:       88 a4 00 1c     lbz     r5,28(r4)
     2c8:       7c a5 32 78     xor     r5,r5,r6
     2cc:       88 e3 00 2f     lbz     r7,47(r3)
     2d0:       98 a3 00 1c     stb     r5,28(r3)
     2d4:       88 a4 00 2f     lbz     r5,47(r4)
     2d8:       7c a5 3a 78     xor     r5,r5,r7
     2dc:       80 c1 ff 4c     lwz     r6,-180(r1)
     2e0:       98 a3 00 2f     stb     r5,47(r3)
     2e4:       88 a4 00 2e     lbz     r5,46(r4)
     2e8:       7c a5 32 78     xor     r5,r5,r6
     2ec:       80 41 ff 48     lwz     r2,-184(r1)
     2f0:       98 a3 00 2e     stb     r5,46(r3)
     2f4:       88 a4 00 2d     lbz     r5,45(r4)
     2f8:       88 c4 00 2b     lbz     r6,43(r4)
     2fc:       7c a5 12 78     xor     r5,r5,r2
     300:       80 41 ff 44     lwz     r2,-188(r1)
     304:       98 a3 00 2d     stb     r5,45(r3)
     308:       7c c6 12 78     xor     r6,r6,r2
     30c:       88 a4 00 2a     lbz     r5,42(r4)
     310:       80 41 ff 40     lwz     r2,-192(r1)
     314:       98 c3 00 2b     stb     r6,43(r3)
     318:       7c a5 12 78     xor     r5,r5,r2
     31c:       88 c4 00 29     lbz     r6,41(r4)
     320:       80 41 ff 3c     lwz     r2,-196(r1)
     324:       98 a3 00 2a     stb     r5,42(r3)
     328:       7c c6 12 78     xor     r6,r6,r2
     32c:       88 a4 00 28     lbz     r5,40(r4)
     330:       80 41 ff 38     lwz     r2,-200(r1)
     334:       8b e3 00 27     lbz     r31,39(r3)
     338:       7c a5 12 78     xor     r5,r5,r2
     33c:       98 c3 00 29     stb     r6,41(r3)
     340:       88 c4 00 27     lbz     r6,39(r4)
     344:       89 c3 00 26     lbz     r14,38(r3)
     348:       7c c6 fa 78     xor     r6,r6,r31
     34c:       98 a3 00 28     stb     r5,40(r3)
     350:       88 a4 00 26     lbz     r5,38(r4)
     354:       89 e3 00 25     lbz     r15,37(r3)
     358:       7c a5 72 78     xor     r5,r5,r14
     35c:       98 c3 00 27     stb     r6,39(r3)
     360:       88 c4 00 25     lbz     r6,37(r4)
     364:       8a 03 00 24     lbz     r16,36(r3)
     368:       7c c6 7a 78     xor     r6,r6,r15
     36c:       98 a3 00 26     stb     r5,38(r3)
     370:       88 a4 00 24     lbz     r5,36(r4)
     374:       8a 23 00 23     lbz     r17,35(r3)
     378:       7c a5 82 78     xor     r5,r5,r16
     37c:       98 c3 00 25     stb     r6,37(r3)
     380:       88 c4 00 23     lbz     r6,35(r4)
     384:       8a 43 00 22     lbz     r18,34(r3)
     388:       7c c6 8a 78     xor     r6,r6,r17
     38c:       98 a3 00 24     stb     r5,36(r3)
     390:       88 a4 00 22     lbz     r5,34(r4)
     394:       8a 63 00 21     lbz     r19,33(r3)
     398:       7c a5 92 78     xor     r5,r5,r18
     39c:       98 c3 00 23     stb     r6,35(r3)
     3a0:       88 c4 00 21     lbz     r6,33(r4)
     3a4:       8a 83 00 20     lbz     r20,32(r3)
     3a8:       7c c6 9a 78     xor     r6,r6,r19
     3ac:       98 a3 00 22     stb     r5,34(r3)
     3b0:       88 a4 00 20     lbz     r5,32(r4)
     3b4:       8a a3 00 2c     lbz     r21,44(r3)
     3b8:       7c a5 a2 78     xor     r5,r5,r20
     3bc:       98 c3 00 21     stb     r6,33(r3)
     3c0:       88 c4 00 2c     lbz     r6,44(r4)
     3c4:       8a c3 00 3f     lbz     r22,63(r3)
     3c8:       7c c6 aa 78     xor     r6,r6,r21
     3cc:       98 a3 00 20     stb     r5,32(r3)
     3d0:       88 a4 00 3f     lbz     r5,63(r4)
     3d4:       8a e3 00 3e     lbz     r23,62(r3)
     3d8:       7c a5 b2 78     xor     r5,r5,r22
     3dc:       98 c3 00 2c     stb     r6,44(r3)
     3e0:       88 c4 00 3e     lbz     r6,62(r4)
     3e4:       8b 03 00 3d     lbz     r24,61(r3)
     3e8:       7c c6 ba 78     xor     r6,r6,r23
     3ec:       98 a3 00 3f     stb     r5,63(r3)
     3f0:       88 a4 00 3d     lbz     r5,61(r4)
     3f4:       8b 23 00 3b     lbz     r25,59(r3)
     3f8:       7c a5 c2 78     xor     r5,r5,r24
     3fc:       98 c3 00 3e     stb     r6,62(r3)
     400:       88 c4 00 3b     lbz     r6,59(r4)
     404:       8b 43 00 3a     lbz     r26,58(r3)
     408:       7c c6 ca 78     xor     r6,r6,r25
     40c:       98 a3 00 3d     stb     r5,61(r3)
     410:       88 a4 00 3a     lbz     r5,58(r4)
     414:       8b 63 00 39     lbz     r27,57(r3)
     418:       7c a5 d2 78     xor     r5,r5,r26
     41c:       98 c3 00 3b     stb     r6,59(r3)
     420:       88 c4 00 39     lbz     r6,57(r4)
     424:       8b 83 00 38     lbz     r28,56(r3)
     428:       7c c6 da 78     xor     r6,r6,r27
     42c:       98 a3 00 3a     stb     r5,58(r3)
     430:       88 a4 00 38     lbz     r5,56(r4)
     434:       8b a3 00 37     lbz     r29,55(r3)
     438:       7c a5 e2 78     xor     r5,r5,r28
     43c:       98 c3 00 39     stb     r6,57(r3)
     440:       88 c4 00 37     lbz     r6,55(r4)
     444:       8b c3 00 36     lbz     r30,54(r3)
     448:       7c c6 ea 78     xor     r6,r6,r29
     44c:       98 a3 00 38     stb     r5,56(r3)
     450:       88 a4 00 36     lbz     r5,54(r4)
     454:       88 03 00 35     lbz     r0,53(r3)
     458:       7c a5 f2 78     xor     r5,r5,r30
     45c:       98 c3 00 37     stb     r6,55(r3)
     460:       88 c4 00 35     lbz     r6,53(r4)
     464:       89 83 00 34     lbz     r12,52(r3)
     468:       7c c6 02 78     xor     r6,r6,r0
     46c:       98 a3 00 36     stb     r5,54(r3)
     470:       88 a4 00 34     lbz     r5,52(r4)
     474:       89 63 00 33     lbz     r11,51(r3)
     478:       7c a5 62 78     xor     r5,r5,r12
     47c:       98 c3 00 35     stb     r6,53(r3)
     480:       88 c4 00 33     lbz     r6,51(r4)
     484:       89 43 00 32     lbz     r10,50(r3)
     488:       7c c6 5a 78     xor     r6,r6,r11
     48c:       98 a3 00 34     stb     r5,52(r3)
     490:       88 a4 00 32     lbz     r5,50(r4)
     494:       89 23 00 31     lbz     r9,49(r3)
     498:       7c a5 52 78     xor     r5,r5,r10
     49c:       98 c3 00 33     stb     r6,51(r3)
     4a0:       88 c4 00 31     lbz     r6,49(r4)
     4a4:       89 03 00 30     lbz     r8,48(r3)
     4a8:       7c c6 4a 78     xor     r6,r6,r9
     4ac:       98 a3 00 32     stb     r5,50(r3)
     4b0:       88 a4 00 30     lbz     r5,48(r4)
     4b4:       88 e3 00 3c     lbz     r7,60(r3)
     4b8:       7c a5 42 78     xor     r5,r5,r8
     4bc:       98 c3 00 31     stb     r6,49(r3)
     4c0:       88 c4 00 3c     lbz     r6,60(r4)
     4c4:       98 a3 00 30     stb     r5,48(r3)
     4c8:       7c c5 3a 78     xor     r5,r6,r7
     4cc:       98 a3 00 3c     stb     r5,60(r3)
     4d0:       42 00 fb a0     bdnz    70 <__xor_altivec_2+0x70>
     4d4:       e8 41 ff 68     ld      r2,-152(r1)
     4d8:       eb e1 ff f8     ld      r31,-8(r1)
     4dc:       eb c1 ff f0     ld      r30,-16(r1)
     4e0:       eb a1 ff e8     ld      r29,-24(r1)
     4e4:       eb 81 ff e0     ld      r28,-32(r1)
     4e8:       eb 61 ff d8     ld      r27,-40(r1)
     4ec:       eb 41 ff d0     ld      r26,-48(r1)
     4f0:       eb 21 ff c8     ld      r25,-56(r1)
     4f4:       eb 01 ff c0     ld      r24,-64(r1)
     4f8:       ea e1 ff b8     ld      r23,-72(r1)
     4fc:       ea c1 ff b0     ld      r22,-80(r1)
     500:       ea a1 ff a8     ld      r21,-88(r1)
     504:       ea 81 ff a0     ld      r20,-96(r1)
     508:       ea 61 ff 98     ld      r19,-104(r1)
     50c:       ea 41 ff 90     ld      r18,-112(r1)
     510:       ea 21 ff 88     ld      r17,-120(r1)
     514:       ea 01 ff 80     ld      r16,-128(r1)
     518:       e9 e1 ff 78     ld      r15,-136(r1)
     51c:       e9 c1 ff 70     ld      r14,-144(r1)
     520:       4e 80 00 20     blr

gcc12_xor_vmx.o.gz
clang15_xor_vmx.o.gz

@ernsteiswuerfel

Seems even a bit worse with CLANG 16.0.4 (Talos II build).

GCC 12.2.1_p20230428:

/usr/src/linux-stable/arch/powerpc/lib/xor_vmx.o:     file format elf64-powerpc

Disassembly of section .text:

0000000000000000 <__xor_altivec_2>:
   0:	78 6a d1 82 	srdi    r10,r3,6
   4:	39 20 00 00 	li      r9,0
   8:	38 e4 00 10 	addi    r7,r4,16
   c:	39 04 00 20 	addi    r8,r4,32
  10:	39 65 00 10 	addi    r11,r5,16
  14:	38 65 00 20 	addi    r3,r5,32
  18:	38 c5 00 30 	addi    r6,r5,48
  1c:	7d 49 03 a6 	mtctr   r10
  20:	39 44 00 30 	addi    r10,r4,48
  24:	60 00 00 00 	nop
  28:	60 00 00 00 	nop
  2c:	60 00 00 00 	nop
  30:	7d 04 48 ce 	lvx     v8,r4,r9
  34:	7d a7 48 ce 	lvx     v13,r7,r9
  38:	7c 28 48 ce 	lvx     v1,r8,r9
  3c:	7c 0a 48 ce 	lvx     v0,r10,r9
  40:	7d 85 48 ce 	lvx     v12,r5,r9
  44:	7d 2b 48 ce 	lvx     v9,r11,r9
  48:	7d 43 48 ce 	lvx     v10,r3,r9
  4c:	7d 66 48 ce 	lvx     v11,r6,r9
  50:	11 8c 44 c4 	vxor    v12,v12,v8
  54:	11 ad 4c c4 	vxor    v13,v13,v9
  58:	10 21 54 c4 	vxor    v1,v1,v10
  5c:	10 00 5c c4 	vxor    v0,v0,v11
  60:	7d 84 49 ce 	stvx    v12,r4,r9
  64:	7d a7 49 ce 	stvx    v13,r7,r9
  68:	7c 28 49 ce 	stvx    v1,r8,r9
  6c:	7c 0a 49 ce 	stvx    v0,r10,r9
  70:	39 29 00 40 	addi    r9,r9,64
  74:	42 00 ff bc 	bdnz    30 <__xor_altivec_2+0x30>
  78:	4e 80 00 20 	blr
  7c:	60 00 00 00 	nop

CLANG 16.0.4:

/usr/src/linux-stable/arch/powerpc/lib/xor_vmx.o:    file format elf64-powerpc

Disassembly of section .text:

0000000000000000 <__xor_altivec_2>:
       0:	3c 4c 00 00 	addis   r2,r12,0
       4:	38 42 00 00 	addi    r2,r2,0
       8:	f8 21 fe b1 	stdu    r1,-336(r1)
       c:	78 66 d1 82 	srdi    r6,r3,6
      10:	f9 c1 00 c0 	std     r14,192(r1)
      14:	38 64 ff c0 	addi    r3,r4,-64
      18:	38 85 ff c0 	addi    r4,r5,-64
      1c:	f9 e1 00 c8 	std     r15,200(r1)
      20:	fa 01 00 d0 	std     r16,208(r1)
      24:	fa 21 00 d8 	std     r17,216(r1)
      28:	fa 41 00 e0 	std     r18,224(r1)
      2c:	fa 61 00 e8 	std     r19,232(r1)
      30:	fa 81 00 f0 	std     r20,240(r1)
      34:	fa a1 00 f8 	std     r21,248(r1)
      38:	fa c1 01 00 	std     r22,256(r1)
      3c:	fa e1 01 08 	std     r23,264(r1)
      40:	fb 01 01 10 	std     r24,272(r1)
      44:	fb 21 01 18 	std     r25,280(r1)
      48:	fb 41 01 20 	std     r26,288(r1)
      4c:	fb 61 01 28 	std     r27,296(r1)
      50:	fb 81 01 30 	std     r28,304(r1)
      54:	fb a1 01 38 	std     r29,312(r1)
      58:	fb c1 01 40 	std     r30,320(r1)
      5c:	fb e1 01 48 	std     r31,328(r1)
      60:	f8 41 00 b8 	std     r2,184(r1)
      64:	7c c9 03 a6 	mtctr   r6
      68:	60 00 00 00 	nop
      6c:	60 00 00 00 	nop
      70:	8c a3 00 40 	lbzu    r5,64(r3)
      74:	8a 83 00 0c 	lbz     r20,12(r3)
      78:	8d 44 00 40 	lbzu    r10,64(r4)
      7c:	88 44 00 0c 	lbz     r2,12(r4)
      80:	8a a3 00 01 	lbz     r21,1(r3)
      84:	8b e4 00 01 	lbz     r31,1(r4)
      88:	8a c3 00 02 	lbz     r22,2(r3)
      8c:	89 c4 00 02 	lbz     r14,2(r4)
      90:	8a e3 00 03 	lbz     r23,3(r3)
      94:	89 e4 00 03 	lbz     r15,3(r4)
      98:	8b 03 00 04 	lbz     r24,4(r3)
      9c:	8a 04 00 04 	lbz     r16,4(r4)
      a0:	8b 23 00 05 	lbz     r25,5(r3)
      a4:	8a 24 00 05 	lbz     r17,5(r4)
      a8:	8b 43 00 06 	lbz     r26,6(r3)
      ac:	8a 44 00 06 	lbz     r18,6(r4)
      b0:	8b 63 00 07 	lbz     r27,7(r3)
      b4:	8a 64 00 07 	lbz     r19,7(r4)
      b8:	8b 83 00 08 	lbz     r28,8(r3)
      bc:	89 03 00 0e 	lbz     r8,14(r3)
      c0:	89 63 00 0f 	lbz     r11,15(r3)
      c4:	8b a3 00 09 	lbz     r29,9(r3)
      c8:	8b c3 00 0a 	lbz     r30,10(r3)
      cc:	88 03 00 0b 	lbz     r0,11(r3)
      d0:	89 83 00 0d 	lbz     r12,13(r3)
      d4:	88 c3 00 19 	lbz     r6,25(r3)
      d8:	88 e3 00 1a 	lbz     r7,26(r3)
      dc:	89 23 00 1b 	lbz     r9,27(r3)
      e0:	90 a1 00 b4 	stw     r5,180(r1)
      e4:	88 a3 00 1d 	lbz     r5,29(r3)
      e8:	90 a1 00 b0 	stw     r5,176(r1)
      ec:	7c 45 a2 78 	xor     r5,r2,r20
      f0:	88 44 00 36 	lbz     r2,54(r4)
      f4:	8a 84 00 37 	lbz     r20,55(r4)
      f8:	90 a1 00 ac 	stw     r5,172(r1)
      fc:	7f e5 aa 78 	xor     r5,r31,r21
     100:	8b e4 00 33 	lbz     r31,51(r4)
     104:	8a a4 00 3a 	lbz     r21,58(r4)
     108:	90 a1 00 a8 	stw     r5,168(r1)
     10c:	7d c5 b2 78 	xor     r5,r14,r22
     110:	8a c4 00 1c 	lbz     r22,28(r4)
     114:	89 c4 00 32 	lbz     r14,50(r4)
     118:	90 a1 00 a4 	stw     r5,164(r1)
     11c:	7d e5 ba 78 	xor     r5,r15,r23
     120:	89 e4 00 31 	lbz     r15,49(r4)
     124:	8a e4 00 3e 	lbz     r23,62(r4)
     128:	90 a1 00 a0 	stw     r5,160(r1)
     12c:	7e 05 c2 78 	xor     r5,r16,r24
     130:	8b 04 00 0e 	lbz     r24,14(r4)
     134:	8a 04 00 30 	lbz     r16,48(r4)
     138:	90 a1 00 9c 	stw     r5,156(r1)
     13c:	7e 25 ca 78 	xor     r5,r17,r25
     140:	8a 24 00 34 	lbz     r17,52(r4)
     144:	8b 24 00 3f 	lbz     r25,63(r4)
     148:	90 a1 00 98 	stw     r5,152(r1)
     14c:	7e 45 d2 78 	xor     r5,r18,r26
     150:	8b 44 00 0f 	lbz     r26,15(r4)
     154:	7f 08 42 78 	xor     r8,r24,r8
     158:	8b 03 00 1c 	lbz     r24,28(r3)
     15c:	8a 44 00 35 	lbz     r18,53(r4)
     160:	90 a1 00 94 	stw     r5,148(r1)
     164:	7e 65 da 78 	xor     r5,r19,r27
     168:	8b 64 00 08 	lbz     r27,8(r4)
     16c:	91 01 00 5c 	stw     r8,92(r1)
     170:	8a 64 00 38 	lbz     r19,56(r4)
     174:	7f 48 5a 78 	xor     r8,r26,r11
     178:	8b 43 00 10 	lbz     r26,16(r3)
     17c:	89 63 00 11 	lbz     r11,17(r3)
     180:	90 a1 00 90 	stw     r5,144(r1)
     184:	88 a3 00 18 	lbz     r5,24(r3)
     188:	7f 7c e2 78 	xor     r28,r27,r28
     18c:	83 61 00 b4 	lwz     r27,180(r1)
     190:	91 01 00 54 	stw     r8,84(r1)
     194:	89 03 00 12 	lbz     r8,18(r3)
     198:	93 81 00 8c 	stw     r28,140(r1)
     19c:	8b 84 00 09 	lbz     r28,9(r4)
     1a0:	7d 4a da 78 	xor     r10,r10,r27
     1a4:	91 41 00 48 	stw     r10,72(r1)
     1a8:	7e ca c2 78 	xor     r10,r22,r24
     1ac:	8b 04 00 10 	lbz     r24,16(r4)
     1b0:	7f 9d ea 78 	xor     r29,r28,r29
     1b4:	8b 84 00 0a 	lbz     r28,10(r4)
     1b8:	8a c4 00 3b 	lbz     r22,59(r4)
     1bc:	91 41 00 b4 	stw     r10,180(r1)
     1c0:	93 a1 00 88 	stw     r29,136(r1)
     1c4:	8b a3 00 17 	lbz     r29,23(r3)
     1c8:	7f 0a d2 78 	xor     r10,r24,r26
     1cc:	8b 44 00 11 	lbz     r26,17(r4)
     1d0:	7f 9e f2 78 	xor     r30,r28,r30
     1d4:	8b 84 00 0b 	lbz     r28,11(r4)
     1d8:	91 41 00 84 	stw     r10,132(r1)
     1dc:	93 c1 00 7c 	stw     r30,124(r1)
     1e0:	8b c3 00 16 	lbz     r30,22(r3)
     1e4:	7f 4a 5a 78 	xor     r10,r26,r11
     1e8:	89 64 00 12 	lbz     r11,18(r4)
     1ec:	7f 80 02 78 	xor     r0,r28,r0
     1f0:	8b 84 00 0d 	lbz     r28,13(r4)
     1f4:	90 01 00 74 	stw     r0,116(r1)
     1f8:	88 03 00 15 	lbz     r0,21(r3)
     1fc:	91 41 00 80 	stw     r10,128(r1)
     200:	7d 68 42 78 	xor     r8,r11,r8
     204:	7f 8c 62 78 	xor     r12,r28,r12
     208:	8b 83 00 13 	lbz     r28,19(r3)
     20c:	91 01 00 78 	stw     r8,120(r1)
     210:	89 04 00 13 	lbz     r8,19(r4)
     214:	91 81 00 6c 	stw     r12,108(r1)
     218:	89 83 00 14 	lbz     r12,20(r3)
     21c:	7d 08 e2 78 	xor     r8,r8,r28
     220:	91 01 00 70 	stw     r8,112(r1)
     224:	89 04 00 14 	lbz     r8,20(r4)
     228:	7d 08 62 78 	xor     r8,r8,r12
     22c:	89 84 00 24 	lbz     r12,36(r4)
     230:	91 01 00 68 	stw     r8,104(r1)
     234:	89 04 00 15 	lbz     r8,21(r4)
     238:	7d 08 02 78 	xor     r8,r8,r0
     23c:	91 01 00 64 	stw     r8,100(r1)
     240:	89 04 00 16 	lbz     r8,22(r4)
     244:	7d 08 f2 78 	xor     r8,r8,r30
     248:	91 01 00 58 	stw     r8,88(r1)
     24c:	89 04 00 17 	lbz     r8,23(r4)
     250:	7d 08 ea 78 	xor     r8,r8,r29
     254:	91 01 00 50 	stw     r8,80(r1)
     258:	89 04 00 18 	lbz     r8,24(r4)
     25c:	7d 05 2a 78 	xor     r5,r8,r5
     260:	89 04 00 1e 	lbz     r8,30(r4)
     264:	90 a1 00 44 	stw     r5,68(r1)
     268:	88 a4 00 19 	lbz     r5,25(r4)
     26c:	7c a5 32 78 	xor     r5,r5,r6
     270:	88 c4 00 1d 	lbz     r6,29(r4)
     274:	90 a1 00 40 	stw     r5,64(r1)
     278:	88 a4 00 1a 	lbz     r5,26(r4)
     27c:	7c a5 3a 78 	xor     r5,r5,r7
     280:	90 a1 00 34 	stw     r5,52(r1)
     284:	88 a4 00 1b 	lbz     r5,27(r4)
     288:	7c a5 4a 78 	xor     r5,r5,r9
     28c:	90 a1 00 2c 	stw     r5,44(r1)
     290:	80 a1 00 b0 	lwz     r5,176(r1)
     294:	7c d8 2a 78 	xor     r24,r6,r5
     298:	88 c3 00 1e 	lbz     r6,30(r3)
     29c:	9b 03 00 1d 	stb     r24,29(r3)
     2a0:	7d 1b 32 78 	xor     r27,r8,r6
     2a4:	88 c3 00 1f 	lbz     r6,31(r3)
     2a8:	89 04 00 1f 	lbz     r8,31(r4)
     2ac:	9b 63 00 1e 	stb     r27,30(r3)
     2b0:	7d 1d 32 78 	xor     r29,r8,r6
     2b4:	88 c3 00 2c 	lbz     r6,44(r3)
     2b8:	89 04 00 2c 	lbz     r8,44(r4)
     2bc:	9b a3 00 1f 	stb     r29,31(r3)
     2c0:	83 a1 00 2c 	lwz     r29,44(r1)
     2c4:	7d 05 32 78 	xor     r5,r8,r6
     2c8:	88 c3 00 20 	lbz     r6,32(r3)
     2cc:	89 04 00 20 	lbz     r8,32(r4)
     2d0:	90 a1 00 b0 	stw     r5,176(r1)
     2d4:	9b a3 00 1b 	stb     r29,27(r3)
     2d8:	83 a1 00 34 	lwz     r29,52(r1)
     2dc:	7d 05 32 78 	xor     r5,r8,r6
     2e0:	88 c3 00 21 	lbz     r6,33(r3)
     2e4:	89 04 00 21 	lbz     r8,33(r4)
     2e8:	90 a1 00 60 	stw     r5,96(r1)
     2ec:	9b a3 00 1a 	stb     r29,26(r3)
     2f0:	83 a1 00 40 	lwz     r29,64(r1)
     2f4:	7d 05 32 78 	xor     r5,r8,r6
     2f8:	88 c3 00 22 	lbz     r6,34(r3)
     2fc:	89 04 00 22 	lbz     r8,34(r4)
     300:	90 a1 00 4c 	stw     r5,76(r1)
     304:	9b a3 00 19 	stb     r29,25(r3)
     308:	83 a1 00 44 	lwz     r29,68(r1)
     30c:	7d 05 32 78 	xor     r5,r8,r6
     310:	88 c3 00 23 	lbz     r6,35(r3)
     314:	89 04 00 23 	lbz     r8,35(r4)
     318:	90 a1 00 3c 	stw     r5,60(r1)
     31c:	9b a3 00 18 	stb     r29,24(r3)
     320:	83 a1 00 50 	lwz     r29,80(r1)
     324:	7d 05 32 78 	xor     r5,r8,r6
     328:	88 c3 00 24 	lbz     r6,36(r3)
     32c:	89 03 00 25 	lbz     r8,37(r3)
     330:	90 a1 00 30 	stw     r5,48(r1)
     334:	9b a3 00 17 	stb     r29,23(r3)
     338:	83 a1 00 58 	lwz     r29,88(r1)
     33c:	7d 85 32 78 	xor     r5,r12,r6
     340:	89 84 00 25 	lbz     r12,37(r4)
     344:	90 a1 00 24 	stw     r5,36(r1)
     348:	9b a3 00 16 	stb     r29,22(r3)
     34c:	83 a1 00 64 	lwz     r29,100(r1)
     350:	7d 9a 42 78 	xor     r26,r12,r8
     354:	89 03 00 26 	lbz     r8,38(r3)
     358:	89 84 00 26 	lbz     r12,38(r4)
     35c:	9b a3 00 15 	stb     r29,21(r3)
     360:	83 a1 00 68 	lwz     r29,104(r1)
     364:	9b 43 00 25 	stb     r26,37(r3)
     368:	7d 9c 42 78 	xor     r28,r12,r8
     36c:	89 03 00 27 	lbz     r8,39(r3)
     370:	89 84 00 27 	lbz     r12,39(r4)
     374:	9b a3 00 14 	stb     r29,20(r3)
     378:	83 a1 00 70 	lwz     r29,112(r1)
     37c:	9b 83 00 26 	stb     r28,38(r3)
     380:	7d 9e 42 78 	xor     r30,r12,r8
     384:	89 03 00 28 	lbz     r8,40(r3)
     388:	89 84 00 28 	lbz     r12,40(r4)
     38c:	9b a3 00 13 	stb     r29,19(r3)
     390:	83 a1 00 78 	lwz     r29,120(r1)
     394:	9b c3 00 27 	stb     r30,39(r3)
     398:	7d 80 42 78 	xor     r0,r12,r8
     39c:	89 03 00 29 	lbz     r8,41(r3)
     3a0:	89 84 00 29 	lbz     r12,41(r4)
     3a4:	9b a3 00 12 	stb     r29,18(r3)
     3a8:	83 a1 00 80 	lwz     r29,128(r1)
     3ac:	98 03 00 28 	stb     r0,40(r3)
     3b0:	7d 8b 42 78 	xor     r11,r12,r8
     3b4:	89 03 00 2a 	lbz     r8,42(r3)
     3b8:	89 84 00 2a 	lbz     r12,42(r4)
     3bc:	9b a3 00 11 	stb     r29,17(r3)
     3c0:	83 a1 00 84 	lwz     r29,132(r1)
     3c4:	99 63 00 29 	stb     r11,41(r3)
     3c8:	7d 8a 42 78 	xor     r10,r12,r8
     3cc:	89 03 00 2b 	lbz     r8,43(r3)
     3d0:	89 84 00 2b 	lbz     r12,43(r4)
     3d4:	9b a3 00 10 	stb     r29,16(r3)
     3d8:	83 a1 00 b4 	lwz     r29,180(r1)
     3dc:	99 43 00 2a 	stb     r10,42(r3)
     3e0:	7d 89 42 78 	xor     r9,r12,r8
     3e4:	89 03 00 2d 	lbz     r8,45(r3)
     3e8:	89 84 00 2d 	lbz     r12,45(r4)
     3ec:	9b a3 00 1c 	stb     r29,28(r3)
     3f0:	99 23 00 2b 	stb     r9,43(r3)
     3f4:	7d 87 42 78 	xor     r7,r12,r8
     3f8:	89 03 00 2e 	lbz     r8,46(r3)
     3fc:	89 84 00 2e 	lbz     r12,46(r4)
     400:	98 e3 00 2d 	stb     r7,45(r3)
     404:	7d 86 42 78 	xor     r6,r12,r8
     408:	89 03 00 2f 	lbz     r8,47(r3)
     40c:	89 84 00 2f 	lbz     r12,47(r4)
     410:	98 c3 00 2e 	stb     r6,46(r3)
     414:	7d 85 42 78 	xor     r5,r12,r8
     418:	89 03 00 3c 	lbz     r8,60(r3)
     41c:	89 84 00 3c 	lbz     r12,60(r4)
     420:	98 a3 00 2f 	stb     r5,47(r3)
     424:	80 a1 00 24 	lwz     r5,36(r1)
     428:	7d 88 42 78 	xor     r8,r12,r8
     42c:	89 84 00 39 	lbz     r12,57(r4)
     430:	91 01 00 38 	stw     r8,56(r1)
     434:	89 03 00 30 	lbz     r8,48(r3)
     438:	98 a3 00 24 	stb     r5,36(r3)
     43c:	80 a1 00 30 	lwz     r5,48(r1)
     440:	7e 08 42 78 	xor     r8,r16,r8
     444:	8a 03 00 31 	lbz     r16,49(r3)
     448:	98 a3 00 23 	stb     r5,35(r3)
     44c:	80 a1 00 3c 	lwz     r5,60(r1)
     450:	91 01 00 28 	stw     r8,40(r1)
     454:	89 04 00 3d 	lbz     r8,61(r4)
     458:	7d f0 82 78 	xor     r16,r15,r16
     45c:	89 e3 00 32 	lbz     r15,50(r3)
     460:	98 a3 00 22 	stb     r5,34(r3)
     464:	80 a1 00 4c 	lwz     r5,76(r1)
     468:	9a 03 00 31 	stb     r16,49(r3)
     46c:	7d cf 7a 78 	xor     r15,r14,r15
     470:	89 c3 00 33 	lbz     r14,51(r3)
     474:	98 a3 00 21 	stb     r5,33(r3)
     478:	80 a1 00 60 	lwz     r5,96(r1)
     47c:	99 e3 00 32 	stb     r15,50(r3)
     480:	7f ee 72 78 	xor     r14,r31,r14
     484:	8b e3 00 34 	lbz     r31,52(r3)
     488:	98 a3 00 20 	stb     r5,32(r3)
     48c:	80 a1 00 b0 	lwz     r5,176(r1)
     490:	99 c3 00 33 	stb     r14,51(r3)
     494:	7e 31 fa 78 	xor     r17,r17,r31
     498:	8b e3 00 35 	lbz     r31,53(r3)
     49c:	98 a3 00 2c 	stb     r5,44(r3)
     4a0:	80 a1 00 28 	lwz     r5,40(r1)
     4a4:	9a 23 00 34 	stb     r17,52(r3)
     4a8:	7e 52 fa 78 	xor     r18,r18,r31
     4ac:	8b e3 00 36 	lbz     r31,54(r3)
     4b0:	98 a3 00 30 	stb     r5,48(r3)
     4b4:	80 a1 00 38 	lwz     r5,56(r1)
     4b8:	9a 43 00 35 	stb     r18,53(r3)
     4bc:	7c 5f fa 78 	xor     r31,r2,r31
     4c0:	88 43 00 37 	lbz     r2,55(r3)
     4c4:	98 a3 00 3c 	stb     r5,60(r3)
     4c8:	9b e3 00 36 	stb     r31,54(r3)
     4cc:	7e 94 12 78 	xor     r20,r20,r2
     4d0:	88 43 00 38 	lbz     r2,56(r3)
     4d4:	9a 83 00 37 	stb     r20,55(r3)
     4d8:	7e 73 12 78 	xor     r19,r19,r2
     4dc:	88 43 00 39 	lbz     r2,57(r3)
     4e0:	9a 63 00 38 	stb     r19,56(r3)
     4e4:	7d 8c 12 78 	xor     r12,r12,r2
     4e8:	88 43 00 3a 	lbz     r2,58(r3)
     4ec:	99 83 00 39 	stb     r12,57(r3)
     4f0:	7e b5 12 78 	xor     r21,r21,r2
     4f4:	88 43 00 3b 	lbz     r2,59(r3)
     4f8:	9a a3 00 3a 	stb     r21,58(r3)
     4fc:	7e d6 12 78 	xor     r22,r22,r2
     500:	88 43 00 3d 	lbz     r2,61(r3)
     504:	9a c3 00 3b 	stb     r22,59(r3)
     508:	7d 08 12 78 	xor     r8,r8,r2
     50c:	88 43 00 3e 	lbz     r2,62(r3)
     510:	99 03 00 3d 	stb     r8,61(r3)
     514:	7e f7 12 78 	xor     r23,r23,r2
     518:	88 43 00 3f 	lbz     r2,63(r3)
     51c:	9a e3 00 3e 	stb     r23,62(r3)
     520:	7f 39 12 78 	xor     r25,r25,r2
     524:	80 41 00 48 	lwz     r2,72(r1)
     528:	9b 23 00 3f 	stb     r25,63(r3)
     52c:	98 43 00 00 	stb     r2,0(r3)
     530:	80 41 00 54 	lwz     r2,84(r1)
     534:	98 43 00 0f 	stb     r2,15(r3)
     538:	80 41 00 5c 	lwz     r2,92(r1)
     53c:	98 43 00 0e 	stb     r2,14(r3)
     540:	80 41 00 6c 	lwz     r2,108(r1)
     544:	98 43 00 0d 	stb     r2,13(r3)
     548:	80 41 00 74 	lwz     r2,116(r1)
     54c:	98 43 00 0b 	stb     r2,11(r3)
     550:	80 41 00 7c 	lwz     r2,124(r1)
     554:	98 43 00 0a 	stb     r2,10(r3)
     558:	80 41 00 88 	lwz     r2,136(r1)
     55c:	98 43 00 09 	stb     r2,9(r3)
     560:	80 41 00 8c 	lwz     r2,140(r1)
     564:	98 43 00 08 	stb     r2,8(r3)
     568:	80 41 00 90 	lwz     r2,144(r1)
     56c:	98 43 00 07 	stb     r2,7(r3)
     570:	80 41 00 94 	lwz     r2,148(r1)
     574:	98 43 00 06 	stb     r2,6(r3)
     578:	80 41 00 98 	lwz     r2,152(r1)
     57c:	98 43 00 05 	stb     r2,5(r3)
     580:	80 41 00 9c 	lwz     r2,156(r1)
     584:	98 43 00 04 	stb     r2,4(r3)
     588:	80 41 00 a0 	lwz     r2,160(r1)
     58c:	98 43 00 03 	stb     r2,3(r3)
     590:	80 41 00 a4 	lwz     r2,164(r1)
     594:	98 43 00 02 	stb     r2,2(r3)
     598:	80 41 00 a8 	lwz     r2,168(r1)
     59c:	98 43 00 01 	stb     r2,1(r3)
     5a0:	80 41 00 ac 	lwz     r2,172(r1)
     5a4:	98 43 00 0c 	stb     r2,12(r3)
     5a8:	42 00 fa c8 	bdnz    70 <__xor_altivec_2+0x70>
     5ac:	e8 41 00 b8 	ld      r2,184(r1)
     5b0:	eb e1 01 48 	ld      r31,328(r1)
     5b4:	eb c1 01 40 	ld      r30,320(r1)
     5b8:	eb a1 01 38 	ld      r29,312(r1)
     5bc:	eb 81 01 30 	ld      r28,304(r1)
     5c0:	eb 61 01 28 	ld      r27,296(r1)
     5c4:	eb 41 01 20 	ld      r26,288(r1)
     5c8:	eb 21 01 18 	ld      r25,280(r1)
     5cc:	eb 01 01 10 	ld      r24,272(r1)
     5d0:	ea e1 01 08 	ld      r23,264(r1)
     5d4:	ea c1 01 00 	ld      r22,256(r1)
     5d8:	ea a1 00 f8 	ld      r21,248(r1)
     5dc:	ea 81 00 f0 	ld      r20,240(r1)
     5e0:	ea 61 00 e8 	ld      r19,232(r1)
     5e4:	ea 41 00 e0 	ld      r18,224(r1)
     5e8:	ea 21 00 d8 	ld      r17,216(r1)
     5ec:	ea 01 00 d0 	ld      r16,208(r1)
     5f0:	e9 e1 00 c8 	ld      r15,200(r1)
     5f4:	e9 c1 00 c0 	ld      r14,192(r1)
     5f8:	38 21 01 50 	addi    r1,r1,336
     5fc:	4e 80 00 20 	blr
	...
     60c:	60 00 00 00 	nop

gcc12-xor_vmx.o.gz
clang16-xor_vmx.o.gz

@ernsteiswuerfel

In numbers the performance difference of the 2 compilers looks like this (on a Talos II).

GCC 12.2.1_p20221126:

xor: measuring software checksum speed
   8regs           :  7446 MB/sec
   8regs_prefetch  :  5979 MB/sec
   32regs          :  7467 MB/sec
   32regs_prefetch :  5914 MB/sec
   altivec         :  8519 MB/sec
xor: using function: altivec (8519 MB/sec)

CLANG 16.0.4:

xor: measuring software checksum speed
   8regs           :  7204 MB/sec
   8regs_prefetch  :  5670 MB/sec
   32regs          :  7303 MB/sec
   32regs_prefetch :  6180 MB/sec
   altivec         :   700 MB/sec
xor: using function: 32regs (7303 MB/sec)
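
The two logs above suggest that the kernel times each XOR routine at boot and keeps the fastest one, which is why the Clang 16 build ends up on `32regs` rather than `altivec`. A minimal sketch of that selection step, using the figures from the logs; `struct xor_result` and `pick_fastest` are hypothetical names for illustration, not the kernel's actual code:

```c
#include <stddef.h>

/* Hypothetical record for one measured XOR routine (not a kernel struct). */
struct xor_result {
    const char *name;
    unsigned int mb_per_sec;
};

/* Return the entry with the highest measured throughput; n must be >= 1. */
static const struct xor_result *pick_fastest(const struct xor_result *r, size_t n)
{
    const struct xor_result *best = &r[0];

    for (size_t i = 1; i < n; i++)
        if (r[i].mb_per_sec > best->mb_per_sec)
            best = &r[i];
    return best;
}

/* Figures copied from the GCC 12.2.1 log above. */
static const struct xor_result gcc12[] = {
    { "8regs", 7446 }, { "8regs_prefetch", 5979 },
    { "32regs", 7467 }, { "32regs_prefetch", 5914 },
    { "altivec", 8519 },
};

/* Figures copied from the CLANG 16.0.4 log above. */
static const struct xor_result clang16[] = {
    { "8regs", 7204 }, { "8regs_prefetch", 5670 },
    { "32regs", 7303 }, { "32regs_prefetch", 6180 },
    { "altivec", 700 },
};
```

With these inputs the sketch picks `altivec` for the GCC figures and `32regs` for the Clang figures, matching the "using function" lines in the logs.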

@ernsteiswuerfel

ernsteiswuerfel commented Apr 15, 2024

Seems this issue has been taken care of in CLANG 18.1.x. 👍 The resulting code is still larger than GCC's, but performance is now on par with GCC (the in-kernel xor benchmark is faster with CLANG, the raid6 benchmark is faster with GCC).

GCC 13.2.1_p20240210:

0000000000000000 <__xor_altivec_2>:
   0:	78 6a d1 82 	srdi    r10,r3,6
   4:	39 20 00 00 	li      r9,0
   8:	38 e4 00 10 	addi    r7,r4,16
   c:	39 04 00 20 	addi    r8,r4,32
  10:	39 65 00 10 	addi    r11,r5,16
  14:	38 65 00 20 	addi    r3,r5,32
  18:	38 c5 00 30 	addi    r6,r5,48
  1c:	7d 49 03 a6 	mtctr   r10
  20:	39 44 00 30 	addi    r10,r4,48
  24:	60 00 00 00 	nop
  28:	60 00 00 00 	nop
  2c:	60 00 00 00 	nop
  30:	7d 04 48 ce 	lvx     v8,r4,r9
  34:	7d a7 48 ce 	lvx     v13,r7,r9
  38:	7c 28 48 ce 	lvx     v1,r8,r9
  3c:	7c 0a 48 ce 	lvx     v0,r10,r9
  40:	7d 85 48 ce 	lvx     v12,r5,r9
  44:	7d 2b 48 ce 	lvx     v9,r11,r9
  48:	7d 43 48 ce 	lvx     v10,r3,r9
  4c:	7d 66 48 ce 	lvx     v11,r6,r9
  50:	11 8c 44 c4 	vxor    v12,v12,v8
  54:	11 ad 4c c4 	vxor    v13,v13,v9
  58:	10 21 54 c4 	vxor    v1,v1,v10
  5c:	10 00 5c c4 	vxor    v0,v0,v11
  60:	7d 84 49 ce 	stvx    v12,r4,r9
  64:	7d a7 49 ce 	stvx    v13,r7,r9
  68:	7c 28 49 ce 	stvx    v1,r8,r9
  6c:	7c 0a 49 ce 	stvx    v0,r10,r9
  70:	39 29 00 40 	addi    r9,r9,64
  74:	42 00 ff bc 	bdnz    30 <__xor_altivec_2+0x30>
  78:	38 60 00 00 	li      r3,0
  7c:	38 80 00 00 	li      r4,0
  80:	38 a0 00 00 	li      r5,0
  84:	38 c0 00 00 	li      r6,0
  88:	38 e0 00 00 	li      r7,0
  8c:	39 00 00 00 	li      r8,0
  90:	39 20 00 00 	li      r9,0
  94:	39 40 00 00 	li      r10,0
  98:	39 60 00 00 	li      r11,0
  9c:	4e 80 00 20 	blr

CLANG 18.1.3:

0000000000000000 <__xor_altivec_2>:
   0:	78 66 d1 82 	srdi    r6,r3,6
   4:	70 63 00 40 	andi.   r3,r3,64
   8:	40 82 00 0c 	bne     14 <__xor_altivec_2+0x14>
   c:	7c c3 33 78 	mr      r3,r6
  10:	48 00 00 5c 	b       6c <__xor_altivec_2+0x6c>
  14:	7c 40 20 ce 	lvx     v2,0,r4
  18:	7c 00 28 ce 	lvx     v0,0,r5
  1c:	38 60 00 10 	li      r3,16
  20:	38 e0 00 20 	li      r7,32
  24:	39 00 00 30 	li      r8,48
  28:	7c 64 18 ce 	lvx     v3,r4,r3
  2c:	7c 84 38 ce 	lvx     v4,r4,r7
  30:	7c a4 40 ce 	lvx     v5,r4,r8
  34:	7c 25 18 ce 	lvx     v1,r5,r3
  38:	7c c5 40 ce 	lvx     v6,r5,r8
  3c:	10 40 14 c4 	vxor    v2,v0,v2
  40:	7c 05 38 ce 	lvx     v0,r5,r7
  44:	38 a5 00 40 	addi    r5,r5,64
  48:	10 61 1c c4 	vxor    v3,v1,v3
  4c:	10 a6 2c c4 	vxor    v5,v6,v5
  50:	7c 40 21 ce 	stvx    v2,0,r4
  54:	7c 64 19 ce 	stvx    v3,r4,r3
  58:	38 66 ff ff 	addi    r3,r6,-1
  5c:	7c a4 41 ce 	stvx    v5,r4,r8
  60:	10 80 24 c4 	vxor    v4,v0,v4
  64:	7c 84 39 ce 	stvx    v4,r4,r7
  68:	38 84 00 40 	addi    r4,r4,64
  6c:	28 26 00 01 	cmpldi  r6,1
  70:	4d 82 00 20 	beqlr
  74:	38 c0 00 00 	li      r6,0
  78:	38 e0 00 10 	li      r7,16
  7c:	39 00 00 20 	li      r8,32
  80:	39 20 00 30 	li      r9,48
  84:	fb a1 ff e8 	std     r29,-24(r1)
  88:	39 40 00 40 	li      r10,64
  8c:	39 60 00 50 	li      r11,80
  90:	39 80 00 60 	li      r12,96
  94:	38 00 00 70 	li      r0,112
  98:	fb c1 ff f0 	std     r30,-16(r1)
  9c:	60 00 00 00 	nop
  a0:	7c 44 30 ce 	lvx     v2,r4,r6
  a4:	7c 65 30 ce 	lvx     v3,r5,r6
  a8:	7f c4 32 14 	add     r30,r4,r6
  ac:	7f a5 32 14 	add     r29,r5,r6
  b0:	38 63 ff fe 	addi    r3,r3,-2
  b4:	7c 1d 38 ce 	lvx     v0,r29,r7
  b8:	7c fe 50 ce 	lvx     v7,r30,r10
  bc:	7c 9e 40 ce 	lvx     v4,r30,r8
  c0:	7c 3d 40 ce 	lvx     v1,r29,r8
  c4:	7c be 48 ce 	lvx     v5,r30,r9
  c8:	7c dd 48 ce 	lvx     v6,r29,r9
  cc:	28 23 00 00 	cmpldi  r3,0
  d0:	10 43 14 c4 	vxor    v2,v3,v2
  d4:	7c 7e 38 ce 	lvx     v3,r30,r7
  d8:	7c 44 31 ce 	stvx    v2,r4,r6
  dc:	7c 5d 50 ce 	lvx     v2,r29,r10
  e0:	10 81 24 c4 	vxor    v4,v1,v4
  e4:	7c 3e 60 ce 	lvx     v1,r30,r12
  e8:	10 a6 2c c4 	vxor    v5,v6,v5
  ec:	7c de 00 ce 	lvx     v6,r30,r0
  f0:	38 c6 00 80 	addi    r6,r6,128
  f4:	7c 9e 41 ce 	stvx    v4,r30,r8
  f8:	7c be 49 ce 	stvx    v5,r30,r9
  fc:	10 60 1c c4 	vxor    v3,v0,v3
 100:	7c 1e 58 ce 	lvx     v0,r30,r11
 104:	10 42 3c c4 	vxor    v2,v2,v7
 108:	7c fd 58 ce 	lvx     v7,r29,r11
 10c:	7c 7e 39 ce 	stvx    v3,r30,r7
 110:	7c 5e 51 ce 	stvx    v2,r30,r10
 114:	10 07 04 c4 	vxor    v0,v7,v0
 118:	7c fd 60 ce 	lvx     v7,r29,r12
 11c:	7c 1e 59 ce 	stvx    v0,r30,r11
 120:	10 27 0c c4 	vxor    v1,v7,v1
 124:	7c fd 00 ce 	lvx     v7,r29,r0
 128:	7c 3e 61 ce 	stvx    v1,r30,r12
 12c:	10 c7 34 c4 	vxor    v6,v7,v6
 130:	7c de 01 ce 	stvx    v6,r30,r0
 134:	40 82 ff 6c 	bne     a0 <__xor_altivec_2+0xa0>
 138:	eb c1 ff f0 	ld      r30,-16(r1)
 13c:	eb a1 ff e8 	ld      r29,-24(r1)
 140:	4e 80 00 20 	blr
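
One reading of the Clang 18.1.3 listing above is that the loop has been unrolled by two, with a single 64-byte iteration peeled off first when the line count is odd. That would account for the larger code size. A rough C sketch of that control flow, assuming `bytes` is a non-zero multiple of 64; `xor_64` and `xor_2_unrolled` are hypothetical names, and the byte loop stands in for the four vector XORs:

```c
#include <stddef.h>
#include <stdint.h>

/* XOR one 64-byte line of p2 into p1 (stand-in for 4x lvx/vxor/stvx). */
static void xor_64(uint8_t *p1, const uint8_t *p2)
{
    for (size_t i = 0; i < 64; i++)
        p1[i] ^= p2[i];
}

/* Control flow as read from the listing: peel one line if the count is
 * odd, then handle two lines (128 bytes) per loop iteration. */
static void xor_2_unrolled(size_t bytes, uint8_t *p1, const uint8_t *p2)
{
    size_t lines = bytes >> 6;      /* srdi r6,r3,6 */
    size_t remaining = lines;

    if (lines & 1) {                /* andi. r3,r3,64 tests the same bit */
        xor_64(p1, p2);
        p1 += 64;
        p2 += 64;
        remaining = lines - 1;
    }
    if (lines == 1)                 /* cmpldi r6,1 ; beqlr */
        return;

    do {
        xor_64(p1, p2);
        xor_64(p1 + 64, p2 + 64);
        p1 += 128;
        p2 += 128;
        remaining -= 2;
    } while (remaining != 0);
}
```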

Here is the performance comparison again on my Talos, kernel v6.9-rc4:

GCC 13.2.1_p20240210:

 xor: measuring software checksum speed
    8regs           :  7384 MB/sec
    8regs_prefetch  :  5888 MB/sec
    32regs          :  7468 MB/sec
    32regs_prefetch :  6010 MB/sec
    altivec         :  8517 MB/sec
 xor: using function: altivec (8517 MB/sec)

CLANG 18.1.3:

 xor: measuring software checksum speed
    8regs           :  7321 MB/sec
    8regs_prefetch  :  5664 MB/sec
    32regs          :  7196 MB/sec
    32regs_prefetch :  6188 MB/sec
    altivec         :  9779 MB/sec
 xor: using function: altivec (9779 MB/sec)

dmesg_69-rc4_p9_clang18.txt
dmesg_69-rc4_p9_gcc13.txt

config_69-rc4_p9_clang18.txt
config_69-rc4_p9_gcc13.txt

@nathanchance
Member

nathanchance commented Apr 15, 2024

Ha, so this appears to be fixed by my commit 35f20786c481 ("powerpc: xor_vmx: Add '-mhard-float' to CFLAGS"), which was fixing a new compiler error from clang-19, but I think this is basically pointing out that prior to the compiler error being added, -maltivec was having no effect in combination with -msoft-float. Now that -mhard-float overrides -msoft-float, we get nice vectorized code with -maltivec.

Closing this up for now.
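
To illustrate the comment above: the same C loop can come out as a handful of vector instructions or as a long byte-by-byte sequence, depending on whether the compiler is allowed to use the vector unit. Below is a minimal sketch of such a loop, written with the GCC/Clang `vector_size` extension so that it builds on any target. It is not the kernel's actual source, and `vec16` and `xor_2_sketch` are hypothetical names. It assumes `bytes` is a non-zero multiple of 64.

```c
#include <stddef.h>
#include <stdint.h>
#include <string.h>

/* 16-byte vector type via the GCC/Clang vector extension (an assumption
 * made for this sketch so that it is portable). */
typedef uint8_t vec16 __attribute__((vector_size(16)));

/* XOR `bytes` bytes of p2 into p1, four 16-byte vectors per iteration. */
static void xor_2_sketch(size_t bytes, uint8_t *p1, const uint8_t *p2)
{
    size_t lines = bytes >> 6;   /* 64 bytes per line, as in srdi ...,6 */

    do {
        for (int i = 0; i < 4; i++) {
            vec16 a, b;

            /* memcpy keeps the sketch free of alignment assumptions. */
            memcpy(&a, p1 + 16 * i, sizeof(a));
            memcpy(&b, p2 + 16 * i, sizeof(b));
            a ^= b;              /* one vector XOR when a vector unit is usable */
            memcpy(p1 + 16 * i, &a, sizeof(a));
        }
        p1 += 64;
        p2 += 64;
    } while (--lines > 0);
}
```

When the vector unit is enabled, a compiler can map each 16-byte XOR onto a single vector instruction. When it is effectively disabled, as described for `-maltivec` combined with `-msoft-float`, the compiler has to lower the vector operations to scalar code along the lines of the `lbz`/`xor`/`stb` listings earlier in this thread.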

@nathanchance nathanchance added [BUG] linux A bug that should be fixed in the mainline kernel. [FIXED][LINUX] 6.9 This bug was fixed in Linux 6.9 and removed [BUG] llvm A bug that should be fixed in upstream LLVM labels Apr 15, 2024