M1 performance improvements #82

chriselrod · 2021-05-06T02:02:55Z

No description provided.

chriselrod · 2021-05-06T02:13:11Z

Two items:

1. The M1 seems to have higher threading overhead than x86 chips, so I increased the threading threshold for non-x86 architectures. Until I've tried other ARM CPUs or Power, I figured it's safer to go with the higher threshold. I've already made the same change in LoopVectorization.
2. The M1 has 2 cache levels: a 64 KiB L1D and a 4 MiB L2, both core-local. These are massive! More to the point, the x86 chips I tested on have three cache levels: the first two are local (like the M1's), and the third is shared. For comparison, Haswell, the first AVX2 chip, has a 32 KiB L1D and a 0.25 MiB L2, while Tiger Lake, Intel's latest, has a 48 KiB L1D and a 1.25 MiB L2.

   On the x86 chips, A (in C = A*B) is blocked to fit in the L2 cache, while B is blocked to fit in the L3 cache and shared among threads (since the L3 is shared). On Octavian master, the M1 instead had A blocked to fit in the L1 and B blocked to fit in the L2. This was bad: although the M1's 64 KiB L1D is very large for an L1, it is still much smaller than an L2, and blocking the shared B in the L2 makes no sense because the L2 caches are not shared between cores.

   The updated behavior therefore matches 3-cache-level x86: block A in the L2 (allowing tremendous reuse, thanks to the L2's size) and leave B at the next level up. Unfortunately, on the M1 that next level isn't a cache but system RAM, so more testing and optimization will be needed eventually to figure out the best approach there. My guess is that it's probably best not to block B at all.
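The threshold logic in the first item can be illustrated with a minimal Python sketch. The threshold values, the names `threading_threshold` and `should_thread`, and the use of m·k·n as the size measure are all hypothetical; Octavian's actual heuristic is written in Julia and its real thresholds are not stated in this PR.

```python
import platform
from typing import Optional

# Hypothetical threshold values (edge length of a cube-shaped M x K x N problem).
# Octavian's real thresholds are not stated in this PR.
X86_THRESHOLD = 48
NON_X86_THRESHOLD = 96  # higher: threading overhead appeared larger on the M1

# Names that platform.machine() commonly reports for x86 CPUs.
X86_NAMES = {"x86_64", "amd64", "i386", "i686", "x86"}


def is_x86(machine: str) -> bool:
    return machine.lower() in X86_NAMES


def threading_threshold(machine: str) -> int:
    # Conservative default: anything that isn't x86 (ARM, Power, ...) gets
    # the higher threshold until it has been benchmarked.
    return X86_THRESHOLD if is_x86(machine) else NON_X86_THRESHOLD


def should_thread(m: int, k: int, n: int, machine: Optional[str] = None) -> bool:
    """Return True if C = A*B (A is m x k, B is k x n) is big enough to multithread."""
    if machine is None:
        machine = platform.machine()
    t = threading_threshold(machine)
    # Compare the amount of work, m*k*n, against a t x t x t problem
    # using exact integer arithmetic.
    return m * k * n > t ** 3
```

With these made-up numbers, a 64×64×64 multiply would be threaded on `x86_64` (262144 > 48³) but run single-threaded on `arm64` (262144 < 96³). Raising the threshold only for non-x86 keeps the tuned x86 behavior unchanged while erring on the side of fewer threads elsewhere.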
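The blocking rule in the second item can be sketched roughly in Python. This assumes Float64 (8-byte) elements and a hypothetical 50% cache-occupancy `fraction`, and it ignores the K-dimension (kc) and register/micro-kernel blocking a real GEMM also performs. `CacheInfo` and `block_sizes` are hypothetical names, not Octavian's code; the 8 MiB L3 in the x86 example is likewise an assumed value.

```python
from dataclasses import dataclass
from typing import Optional, Tuple


@dataclass
class CacheInfo:
    l2: int            # bytes, core-local
    l3: Optional[int]  # bytes, shared between cores; None if there is no L3


def block_sizes(cache: CacheInfo, m: int, k: int, n: int,
                elsize: int = 8, fraction: float = 0.5) -> Tuple[int, int]:
    """Return (mc, nc) for C = A*B with A m x k and B k x n.

    - An mc x k block of A is sized to fill `fraction` of the core-local L2.
    - A k x nc block of B is sized to fill `fraction` of the shared L3.
    - With no L3 (the M1 as described above), B is left unblocked (nc = n).
    """
    bytes_per_row_of_A = k * elsize
    mc = max(1, min(m, int(fraction * cache.l2) // bytes_per_row_of_A))
    if cache.l3 is None:
        nc = n
    else:
        bytes_per_col_of_B = k * elsize
        nc = max(1, min(n, int(fraction * cache.l3) // bytes_per_col_of_B))
    return mc, nc


KiB, MiB = 1024, 1024 ** 2
m1 = CacheInfo(l2=4 * MiB, l3=None)        # sizes as quoted in the PR
x86 = CacheInfo(l2=256 * KiB, l3=8 * MiB)  # Haswell-sized L2, hypothetical 8 MiB L3

print(block_sizes(m1, 2000, 256, 2000))   # (1024, 2000)
print(block_sizes(x86, 2000, 256, 4000))  # (64, 2048)
```

The large M1 L2 lets each thread hold a 1024-row block of A (16× taller than on the Haswell-sized L2), which is where the reuse comes from; with no shared level above it, the sketch follows the "don't block B at all" guess.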

Unfortunately, Octavian isn't going to compete with Accelerate anytime soon: Octavian uses Neon, while Accelerate uses Apple's undocumented AMX matrix instructions.

codecov · 2021-05-06T02:14:32Z

Codecov Report

Merging #82 (02149f7) into master (9457cdb) will decrease coverage by 1.85%.
The diff coverage is 42.10%.

```diff
@@            Coverage Diff             @@
##           master      #82      +/-   ##
==========================================
- Coverage   86.54%   84.69%   -1.86%     
==========================================
  Files          10       10              
  Lines         565      575      +10     
==========================================
- Hits          489      487       -2     
- Misses         76       88      +12
```

| Impacted Files | Coverage Δ | |
|---|---|---|
| src/global_constants.jl | `50.00% <0.00%> (-16.67%)` | ⬇️ |
| src/matmul.jl | `89.25% <80.00%> (-0.18%)` | ⬇️ |
| src/block_sizes.jl | `94.91% <0.00%> (-1.70%)` | ⬇️ |

Legend: Δ = absolute <relative> (impact), ø = not affected, ? = missing data
Powered by Codecov. Last update 9457cdb...02149f7.

chriselrod · 2021-05-06T02:27:51Z

The B-blocking optimizations can come in a separate PR. For now, I'll merge this once tests pass.

chriselrod added 2 commits April 5, 2021 07:58

- Adjust threading threshold for better performance on M1 (c35f6c5)
- Fix cache sizes for M1 (2266e62)

chriselrod requested a review from DilumAluthge May 6, 2021 02:02

- Merge branch 'master' into m1threadthresh (0261f6c)
- Bump version. (02149f7)

chriselrod enabled auto-merge (squash) May 6, 2021 02:27

chriselrod merged commit 2dd77ea into master May 6, 2021

chriselrod deleted the m1threadthresh branch May 6, 2021 02:41
