## Week 6 Day 1: Vectorization

Notes from before:

- Issue on Quiz 5.2 (s.1?)
- Projects
- FD vs. CD for search



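Memory Layout:

Memory is one long 1D vector of values, so a 2D array is stored in one of two orders:

- Row-major: "C" style (C, Python)
- Column-major: "Fortran" style (Fortran, MATLAB)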

$$A = \begin{bmatrix} 1 & 2 \\ 3 & 4 \end{bmatrix}$$

Note:

$$A^{T} = \begin{bmatrix} 1 & 3 \\ 2 & 4 \end{bmatrix}$$
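A minimal sketch of the two layouts in plain Python, assuming memory is modeled as a flat list. The helper names (`flatten_row_major`, `flatten_col_major`, `transpose`) are hypothetical and not from the lecture. It suggests why the transpose is worth noting: the column-major layout of $A$ holds the same sequence as the row-major layout of $A^{T}$.

```python
# Hypothetical helpers for illustration; not from the lecture.

def flatten_row_major(m):
    """Store rows one after another ("C" style)."""
    return [value for row in m for value in row]


def flatten_col_major(m):
    """Store columns one after another."""
    n_rows, n_cols = len(m), len(m[0])
    return [m[i][j] for j in range(n_cols) for i in range(n_rows)]


def transpose(m):
    """Swap rows and columns."""
    return [list(col) for col in zip(*m)]


A = [[1, 2],
     [3, 4]]

row_major = flatten_row_major(A)   # [1, 2, 3, 4]
col_major = flatten_col_major(A)   # [1, 3, 2, 4]

# Element (i, j) sits at a different flat offset in each layout.
n_rows, n_cols = 2, 2
i, j = 1, 0                        # the entry 3
assert row_major[i * n_cols + j] == 3
assert col_major[j * n_rows + i] == 3

# Column-major A is the same sequence as row-major A^T.
assert col_major == flatten_row_major(transpose(A))
```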

Also note: matrix indices are ordered (row, column), but this is (Y, X) in a graph!

Memory Access:

Note: there are several levels.

- RAM: slowest, on the order of 100s of clock cycles
- Cache: 10s of clock cycles (or less)
- Registers

Computers love sequential access! Computers tend to load wide batches of memory from RAM into cache. Once elements 0-4 have been loaded into your cache, asking for 0-4 is much faster, but 5+ is still slow.

Hardware vectorization (Cache -> Registers):

Modern hardware can compute truly in parallel on some registers!

$$\begin{bmatrix} a & a \end{bmatrix} + \begin{bmatrix} b & b \end{bmatrix} = \begin{bmatrix} c & c \end{bmatrix}$$

on the CPU! You might be able to take advantage of this, but only with a proper data layout.
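One way to see which direction is sequential, sketched with NumPy (an assumption; the notes do not name a library, and the array shape is arbitrary). The `strides` attribute reports how many bytes separate neighbouring elements along each axis, so a stride equal to the item size means that axis is walked sequentially in memory.

```python
import numpy as np

# NumPy is an assumed choice here; the shape is arbitrary.
A = np.zeros((1000, 1000), dtype=np.float64)   # default is row-major ("C" order)
F = np.asfortranarray(A)                       # same values, column-major

# Bytes to step one element along (axis 0, axis 1).
print(A.strides)   # (8000, 8): moving along a row is sequential
print(F.strides)   # (8, 8000): moving down a column is sequential

# A row of the C-ordered array is one contiguous block; a column is not.
row = A[0, :]
col = A[:, 0]
print(row.flags["C_CONTIGUOUS"], col.flags["C_CONTIGUOUS"])
```

Loops that walk the small-stride axis touch neighbouring bytes, which is the access pattern the cache rewards. How much faster that is in practice depends on the machine and the array size.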

| Instruction set | 32 bit | 64 bit | Width    |
|-----------------|--------|--------|----------|
| SSE             | 4      | ×      | 128 bits |
| SSE2            | 4      | 2      | 128 bits |
| AVX (1+2)       | 8      | 4      | 256 bits |
| AVX512          | 16     | 8      | 512 bits |
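The counts in the table follow from dividing the register width by the element size. A small sketch, where `lanes` is a hypothetical helper name and the widths are the 128, 256, and 512 bits listed above:

```python
def lanes(register_bits, element_bits):
    """Number of elements that fit in one register (hypothetical helper)."""
    return register_bits // element_bits


# Print width, 32-bit lane count, 64-bit lane count for each register width.
for register_bits in (128, 256, 512):
    print(register_bits, lanes(register_bits, 32), lanes(register_bits, 64))
```

This only counts how many values fit in a register. Whether a given instruction set actually supports a given element type is a separate question.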

Take-away message: if you avoid explicit loops, you'll be ready for vectorization!

Note: From now on, we'll mostly use the term "vectorization" in the MATLAB sense, for array-at-a-time programming.

But at least now you know what real vectorization is!
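To make the take-away concrete, here is a sketch of the same elementwise sum written first with an explicit loop and then array-at-a-time. NumPy is an assumption here (the notes mention MATLAB), and the array names are illustrative.

```python
import numpy as np

a = np.arange(8, dtype=np.float32)   # 0, 1, ..., 7
b = np.ones(8, dtype=np.float32)

# Explicit loop: one element per iteration.
c_loop = np.empty_like(a)
for i in range(len(a)):
    c_loop[i] = a[i] + b[i]

# Array-at-a-time: the whole operation is handed to the library,
# which is free to exploit contiguous memory and SIMD instructions.
c_vec = a + b

assert np.array_equal(c_loop, c_vec)
```

Both versions compute the same result. The second leaves the looping to compiled code, which is what makes hardware vectorization possible in the first place.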