# High Performance Programming SIMD – MMX/SSE/AVX FISE2-INFO2

Guillaume MULLER

April 11, 2021

#### Classical approach to HPP = distribute on cluster

- What if the initial code is inefficient?
- ⇒ Optimize locally first!

#### Current machines are already massively parallel

- Multi-Core, Multi-Thread...

#### A large part of optimizations cannot be handled automatically

- ⇒ Impossible to rely solely on tools written by others
- ⇒ For (future) engineers in CS: mandatory knowledge

# Why SIMD/MMX/SSE/AVX?

#### Task Parallelism

- Execute coarse-grained pieces of code on different pieces of hardware
- Multi-Core, Multi-Threading...

#### Instruction/Data Parallelism

- Execute tiny pieces of code/data simultaneously on the same piece of hardware
- SIMD
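
To make the data-parallel idea concrete, here is a minimal sketch in C that adds two `float` arrays twice: once element by element, and once four elements at a time with SSE intrinsics from `<xmmintrin.h>`. The function names `add_scalar` and `add_simd` are hypothetical and chosen for illustration only. The sketch assumes a compiler that defines `__SSE__` on x86 targets (as GCC and Clang do); elsewhere it silently falls back to the scalar loop.

```c
#include <assert.h>
#include <stddef.h>

#if defined(__SSE__)
#include <xmmintrin.h>  /* SSE: __m128, _mm_loadu_ps, _mm_add_ps, _mm_storeu_ps */
#endif

/* Scalar version: one addition per loop iteration. */
void add_scalar(const float *a, const float *b, float *out, size_t n)
{
    assert(a && b && out);
    for (size_t i = 0; i < n; i++)
        out[i] = a[i] + b[i];
}

/* SIMD version: one instruction adds four floats when SSE is available. */
void add_simd(const float *a, const float *b, float *out, size_t n)
{
    assert(a && b && out);
    size_t i = 0;
#if defined(__SSE__)
    for (; i + 4 <= n; i += 4) {
        __m128 va = _mm_loadu_ps(a + i);            /* load 4 floats (unaligned OK) */
        __m128 vb = _mm_loadu_ps(b + i);
        _mm_storeu_ps(out + i, _mm_add_ps(va, vb)); /* 4 additions at once */
    }
#endif
    /* Remaining elements (or all of them without SSE) are handled one by one. */
    for (; i < n; i++)
        out[i] = a[i] + b[i];
}
```

Both functions produce the same results; the SIMD version simply issues fewer instructions for the same amount of data. The unaligned load/store variants are used here so that the sketch works on any array; aligned variants exist but require 16-byte aligned pointers.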

- Who has already used/programmed a non-Intel processor?

*[Table: Designer / Micro-Architecture]*

# "Reminder"



#### Instruction Types

- Flow: IP / JMP ...
- Mem: LD / ST ...
- Calc: ADD / SUB / MULT / DIV ...

#### Instruction Cycle

- Fetch
- Decode
- Execute
- ...
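
The fetch/decode/execute loop can be sketched as a toy interpreter in C. Everything here is hypothetical and for illustration only: the accumulator machine, the `struct instr` layout, the opcode names and the `run` function do not correspond to any real instruction set. The opcodes cover the three instruction types listed above: memory (`OP_LD`, `OP_ST`), calculation (`OP_ADD`, `OP_SUB`) and flow (`OP_JNZ`, `OP_HALT`).

```c
#include <assert.h>
#include <stddef.h>

/* Hypothetical opcodes for a toy accumulator machine. */
enum opcode { OP_LD, OP_ST, OP_ADD, OP_SUB, OP_JNZ, OP_HALT };

struct instr {
    enum opcode op;
    int arg;        /* memory index, or jump target for OP_JNZ */
};

/* Runs prog on the data memory mem and returns the final accumulator. */
int run(const struct instr *prog, size_t prog_len, int *mem)
{
    assert(prog && mem);
    size_t ip = 0;  /* instruction pointer */
    int acc = 0;    /* accumulator register */

    while (ip < prog_len) {
        struct instr in = prog[ip++];                  /* fetch, then advance IP */
        switch (in.op) {                               /* decode */
        case OP_LD:   acc = mem[in.arg];  break;       /* execute: memory load */
        case OP_ST:   mem[in.arg] = acc;  break;       /* execute: memory store */
        case OP_ADD:  acc += mem[in.arg]; break;       /* execute: calculation */
        case OP_SUB:  acc -= mem[in.arg]; break;
        case OP_JNZ:  if (acc != 0) ip = (size_t)in.arg; break; /* flow: overwrite IP */
        case OP_HALT: return acc;
        }
    }
    return acc;
}
```

As a usage example, with `mem = {3, 5, 0, 1}` the program `LD 2; ADD 1; ST 2; LD 0; SUB 3; ST 0; JNZ 0; LD 2; HALT` adds `mem[1]` to `mem[2]` three times and returns 15. Note that a jump is nothing more than a write to the instruction pointer, which is why flow instructions are listed next to IP above.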