gcc -march=armv8-a generates LSE instructions but not -march=armv8.2-a or -mcpu=neoverse-n1 #107

Closed
FranckPachot opened this issue May 16, 2021 · 2 comments

@FranckPachot

Hi,

I compiled PostgreSQL without specific flags (on m6gd, see this blog post: https://dev.to/aws-heroes/aws-postgresql-on-graviton2-with-newer-gcc-3aha), and ./configure compiles with -march=armv8-a+crc.
I tried -march=armv8.2-a, -mcpu=neoverse-n1 and -mtune=neoverse-n1, since that is what Graviton2 is supposed to be, but didn't get LSE optimisations.

I've tested different combinations.
I have a c6gn instance:

[ec2-user@ip-172-31-4-46 ~]$ echo $(curl -s http://169.254.169.254/latest/meta-data/instance-type) $(uname -m)
c6gn.xlarge aarch64

This is Graviton 2:

[ec2-user@ip-172-31-4-46 ~]$ head /proc/cpuinfo
processor       : 0
BogoMIPS        : 243.75
Features        : fp asimd evtstrm aes pmull sha1 sha2 crc32 atomics fphp asimdhp cpuid asimdrdm lrcpc dcpop asimddp ssbs
CPU implementer : 0x41
CPU architecture: 8
CPU variant     : 0x3
CPU part        : 0xd0c
CPU revision    : 1

This is an Arm (implementer 0x41) Neoverse N1 (part 0xd0c).

I have the latest PostgreSQL devel snapshot (https://ftp.postgresql.org/pub/snapshot/dev/postgresql-snapshot.tar.gz) and GCC 11:

PostgreSQL 14devel on aarch64-unknown-linux-gnu, compiled by gcc (GCC) 11.1.1 20210515, 64-bit

Without setting CFLAGS, ./configure uses -march=armv8-a+crc, and I see the following ARM atomic helpers and instructions:

[ec2-user@ip-172-31-4-46 postgresql-14devel]$ objdump -d /usr/local/pgsql/bin/postgres | awk '/(ld|st)a?xr/{print $3}/__aarch64_/{sub(/[>+].*/,">",$0);print $NF}' | sort | uniq -c
    112 <__aarch64_cas4_acq_rel>
     13 <__aarch64_cas8_acq_rel>
     13 <__aarch64_ldadd4_acq_rel>
      5 <__aarch64_ldadd8_acq_rel>
      4 <__aarch64_ldclr4_acq_rel>
      6 <__aarch64_ldset4_acq_rel>
     34 <__aarch64_swp4_acq>
      7 ldaxr
      1 stxr

However, if I add -mcpu=neoverse-n1, with CFLAGS="-march=armv8-a -mcpu=neoverse-n1", I see fewer cas4 references:

[ec2-user@ip-172-31-4-46 postgresql-14devel]$ objdump -d /usr/local/pgsql/bin/postgres | awk '/(ld|st)a?xr/{print $3}/__aarch64_/{sub(/[>+].*/,">",$0);print $NF}' | sort | uniq -c
     10 <__aarch64_cas4_acq_rel>
     13 <__aarch64_cas8_acq_rel>
     13 <__aarch64_ldadd4_acq_rel>
      5 <__aarch64_ldadd8_acq_rel>
      4 <__aarch64_ldclr4_acq_rel>
      6 <__aarch64_ldset4_acq_rel>
     34 <__aarch64_swp4_acq>
      7 ldaxr
      1 stxr

With CFLAGS="-march=armv8-a+crc -mtune=neoverse-n1", there are more cas8 references:

[ec2-user@ip-172-31-4-46 postgresql-14devel]$ objdump -d /usr/local/pgsql/bin/postgres | awk '/(ld|st)a?xr/{print $3}/__aarch64_/{sub(/[>+].*/,">",$0);print $NF}' | sort | uniq -c
     10 <__aarch64_cas4_acq_rel>
    117 <__aarch64_cas8_acq_rel>
     13 <__aarch64_ldadd4_acq_rel>
      5 <__aarch64_ldadd8_acq_rel>
      4 <__aarch64_ldclr4_acq_rel>
      6 <__aarch64_ldset4_acq_rel>
     34 <__aarch64_swp4_acq>
      7 ldaxr
      1 stxr

Finally, with the flags recommended in https://github.com/aws/aws-graviton-getting-started/blob/main/c-c%2B%2B.md#cc-on-graviton, "-march=armv8.2-a+fp16+rcpc+dotprod+crypto -mtune=neoverse-n1":

[ec2-user@ip-172-31-4-46 postgresql-14devel]$ objdump -d /usr/local/pgsql/bin/postgres | awk '/(ld|st)a?xr/{print $3}/__aarch64_/{sub(/[>+].*/,">",$0);print $NF}' | sort | uniq -c
[ec2-user@ip-172-31-4-46 postgresql-14devel]$ nm ./src/backend/postgres | grep -E "aarch64(_have_lse_atomics)?"

I see no LSE instructions.

@FranckPachot FranckPachot changed the title gcc -march=armv8 generates LSE instructions but not -march=armv8.2-a or -mcpu=neoverse-n1 gcc -march=armv8-a generates LSE instructions but not -march=armv8.2-a or -mcpu=neoverse-n1 May 16, 2021
@AGSaidi
Member

AGSaidi commented May 17, 2021

The symbols you're searching for (e.g. __aarch64_cas4_acq_rel) are helper functions used by -moutline-atomics to decide at run time whether an LSE atomic (e.g. casal) or a load/store-exclusive sequence (e.g. ldaxr/stlxr) is used:

0000000000400650 <__aarch64_cas4_acq_rel>:
  400650:	90000110 	adrp	x16, 420000 <getauxval@GLIBC_2.17>
  400654:	39409610 	ldrb	w16, [x16, #37]
  400658:	34000070 	cbz	w16, 400664 <__aarch64_cas4_acq_rel+0x14>
  40065c:	88e0fc41 	casal	w0, w1, [x2]
  400660:	d65f03c0 	ret
  400664:	2a0003f0 	mov	w16, w0
  400668:	885ffc40 	ldaxr	w0, [x2]
  40066c:	6b10001f 	cmp	w0, w16
  400670:	54000061 	b.ne	40067c <__aarch64_cas4_acq_rel+0x2c>  // b.any
  400674:	8811fc41 	stlxr	w17, w1, [x2]
  400678:	35ffff91 	cbnz	w17, 400668 <__aarch64_cas4_acq_rel+0x18>
  40067c:	d65f03c0 	ret

When you compile with -march=armv8-a and either -mno-outline-atomics or a compiler that doesn't make -moutline-atomics the default, the compiler simply inlines the ldaxr/stlxr loop in the calling function, so the symbols you reference above won't be present. (Note: GCC 7 in Amazon Linux 2 includes backported patches that make -moutline-atomics the default.)

Similarly, when you compile with -march=armv8.2-a or -mcpu=neoverse-n1, a casal or other LSE atomic is simply inlined in the caller instead of jumping to the helpers that come from libgcc, which include the __aarch64_* functions mentioned above.

@FranckPachot
Author

Thanks, got it.
