AARCH64 some changes / performance increasing for Apple M1 / Cortex-A7x #7

Merged: 3 commits, Dec 12, 2020

magurosan
Collaborator

@magurosan magurosan commented Dec 10, 2020

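Changes

  1. In Makefile.aarch64, the default compiler is now clang, in line with current compiler trends (Apple and Google are moving away from GNU).
  2. Also in Makefile.aarch64: self-hosted AARCH64 (ARM64) environments have become more common than they were four years ago, so the targets are renamed from test-arm-M* to test-std-M*, matching x86. Following Arm's naming convention for AARCH64, neon is renamed to asimd.
  3. The 32-bit ARM side, on the other hand, is left as it is.
  4. SFMT-neon.h is modified to improve performance on Cortex-A75 and later. It also uses the SHA3 instructions supported by Apple A13/A14/M1 for a further performance gain.
  5. These changes do not alter the generated random number sequence.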
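As a rough illustration of item 4 (a sketch, not the code in SFMT-neon.h): the Armv8.2-A SHA3 extension adds EOR3, a three-way XOR, which ACLE exposes as `veor3q_u32` when `__ARM_FEATURE_SHA3` is defined. A recursion that XORs several 128-bit terms can then use fewer instructions. The assumption here is that EOR3 is the SHA3 instruction that matters; the description above does not name it. The helper name `xor3_u32x4` is hypothetical, and the scalar fallback exists only so the sketch builds on non-ARM hosts.

```c
#include <stdint.h>

#if defined(__aarch64__) && defined(__ARM_NEON)
#include <arm_neon.h>
#define HAVE_ASIMD 1
#endif

/* Hypothetical helper: out = a ^ b ^ c over four 32-bit lanes (128 bits).
 * All three code paths compute the same result, which is consistent with
 * item 5: the instruction choice must not change the generated sequence. */
static void xor3_u32x4(uint32_t out[4], const uint32_t a[4],
                       const uint32_t b[4], const uint32_t c[4])
{
#if defined(HAVE_ASIMD)
    uint32x4_t va = vld1q_u32(a);
    uint32x4_t vb = vld1q_u32(b);
    uint32x4_t vc = vld1q_u32(c);
#if defined(__ARM_FEATURE_SHA3)
    /* SHA3 extension available: a single EOR3 instruction. */
    uint32x4_t r = veor3q_u32(va, vb, vc);
#else
    /* Plain ASIMD: two dependent EOR instructions. */
    uint32x4_t r = veorq_u32(veorq_u32(va, vb), vc);
#endif
    vst1q_u32(out, r);
#else
    /* Portable scalar fallback for non-AArch64 builds. */
    for (int i = 0; i < 4; i++) {
        out[i] = a[i] ^ b[i] ^ c[i];
    }
#endif
}
```

The EOR3 path is only selected when the compiler is told the target supports it, for example with `-march=armv8.2-a+sha3`; otherwise the same source compiles to the two-EOR form.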

Performance improvement

Ubuntu on UserLAnd on Oculus Quest 2 (Cortex-A77)

[Before]
32 bit BLOCK:259ms for 100000000 randoms generation
32 bit SEQUE:311ms for 100000000 randoms generation
64 bit BLOCK:259ms for 50000000 randoms generation
64 bit SEQUE:266ms for 50000000 randoms generation

[After]
32 bit BLOCK:149ms for 100000000 randoms generation
32 bit SEQUE:314ms for 100000000 randoms generation
64 bit BLOCK:149ms for 50000000 randoms generation
64 bit SEQUE:228ms for 50000000 randoms generation

Apple M1 on macOS 11 / MacBook Pro

[Before]
32 bit BLOCK:84ms for 100000000 randoms generation
32 bit SEQUE:119ms for 100000000 randoms generation
64 bit BLOCK:84ms for 50000000 randoms generation
64 bit SEQUE:102ms for 50000000 randoms generation

[After]
32 bit BLOCK:44ms for 100000000 randoms generation
32 bit SEQUE:78ms for 100000000 randoms generation
64 bit BLOCK:44ms for 50000000 randoms generation
64 bit SEQUE:62ms for 50000000 randoms generation

[After (+SHA3)]
32 bit BLOCK:37ms for 100000000 randoms generation
32 bit SEQUE:71ms for 100000000 randoms generation
64 bit BLOCK:37ms for 50000000 randoms generation
64 bit SEQUE:55ms for 50000000 randoms generation
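To put the timings above in relative terms, here is a small sketch that turns a before/after pair into a speedup factor and a throughput figure. The function names are hypothetical and not part of SFMT. With the numbers reported above, BLOCK generation goes from 259 ms to 149 ms on the Cortex-A77 (about 1.74x) and from 84 ms to 37 ms on the M1 with SHA3 (about 2.27x), while 32-bit SEQUE on the Cortex-A77 is essentially unchanged (311 ms vs 314 ms).

```c
#include <assert.h>

/* Speedup factor: elapsed time before the change divided by time after.
 * Values above 1.0 mean the new code is faster. */
static double speedup(double before_ms, double after_ms)
{
    assert(before_ms > 0.0 && after_ms > 0.0);
    return before_ms / after_ms;
}

/* Throughput in millions of random numbers per second.
 * count / elapsed_ms is numbers per millisecond; multiplying by 1000 gives
 * numbers per second and dividing by 1e6 gives millions, hence / 1000. */
static double mrand_per_sec(double count, double elapsed_ms)
{
    assert(count > 0.0 && elapsed_ms > 0.0);
    return count / elapsed_ms / 1000.0;
}
```

For example, `mrand_per_sec(100000000, 37)` corresponds to the 32-bit BLOCK run on the M1 with SHA3, roughly 2700 million 32-bit values per second.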

@magurosan magurosan changed the title AARCH64 some changes / performance increasing for Apple M1 / Cortex A7x AARCH64 some changes / performance increasing for Apple M1 / Cortex-A7x Dec 10, 2020
sha3-check added
@MSaito MSaito merged commit 49940fc into MersenneTwister-Lab:master Dec 12, 2020
@MSaito
Member

MSaito commented Dec 12, 2020

Thank you.
