#### last time

#### hazards in pipelines

hazard = extra work needed to make instruction run correctly in pipeline

data hazard = ...reading value with pending update

control hazard = ...can't compute next instruction to fetch

stalling (pause instructions until ready) to resolve hazards

forwarding: take pending value from later in pipeline

MUX to select versus value from register

compare register numbers to see if forwarding needed

can combine with stalling

branch prediction: guess what jump will do

if wrong: undo guess when actual outcome known

## anonymous feedback (1)

pipeline assignment — deadline move?

I know we didn't completely cover branch prediction

think assignment text is enough + no quiz due next Tuesday

past experience: assignment is quicker to do than typical assignment

pipeline assignment — how partial credit?

rubric categories checking for things like:

one instruction per stage per cycle

instructions pass through stages in order + never skip stages

identifies when misprediction occurs

instruction X fetched after instruction Y

correctly identifies data hazard requiring stalling

...

#### on upcoming quiz

next quiz due Tuesday after Thanksgiving

will release tomorrow (so you can start early if you want)

```
        cmpq %r8, %r9
        jne LABEL            // not taken
        xorq %r10, %r11
        movq %r11, 0(%r12)

                    cycle #  0  1  2  3  4  5  6  7  8
cmpq %r8, %r9                F  D  E  M  W
jne LABEL                       F  D  E  M  W
(do nothing)                       F  D  E  M  W
(do nothing)                          F  D  E  M  W
xorq %r10, %r11                          F  D  E  M  W
movq %r11, 0(%r12)                          F  D  E  M

cycle 2, cmpq in E: compare sets flags
cycle 3, jne in E:  compute if jump goes to LABEL
cycle 4, xorq in F: use computed result
```

#### making guesses

```
cmpq %r8, %r9
jne LABEL
xorq %r10, %r11
movq %r11, 0(%r12)
...
```

```
LABEL: addq %r8, %r9
       imul %r13, %r14
```

speculate (guess): jne won't go to LABEL

right: 2 cycles faster!; wrong: undo guess before too late

# jXX: speculating right (1)

```
        cmpq %r8, %r9
        jne LABEL
        xorq %r10, %r11
        movq %r11, 0(%r12)
        ...
LABEL:  addq %r8, %r9
        imul %r13, %r14
        ...

                    cycle #  0  1  2  3  4  5  6  7  8
cmpq %r8, %r9                F  D  E  M  W
jne LABEL                       F  D  E  M  W
xorq %r10, %r11                    F  D  E  M  W
movq %r11, 0(%r12)                    F  D  E  M  W
...
```

## jXX: speculating wrong

```
                    cycle #  0  1  2  3  4  5  6  7  8
cmpq %r8, %r9                F  D  E  M  W
jne LABEL                       F  D  E  M  W
xorq %r10, %r11                    F  D                 instruction "squashed"
(inserted nop)                           E  M  W
movq %r11, 0(%r12)                    F                 instruction "squashed"
(inserted nop)                           D  E  M  W
LABEL: addq %r8, %r9                     F  D  E  M  W
imul %r13, %r14                             F  D  E  M
```


#### "squashed" instructions

on misprediction need to undo partially executed instructions

mostly: remove from pipeline registers

more complicated pipelines: replace written values in cache/registers/etc.

## performance

#### hypothetical instruction mix

| kind          | portion | cycles (predict not-taken) | cycles (stall) |
|---------------|---------|----------------------------|----------------|
| taken jXX     | 3%      | 3                                | 3                 |
| non-taken jXX | 5%      | 1                                | 3                 |
| others        | 92%     | 1*                               | 1*                |
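
The table can be turned into an average cycles-per-instruction figure by weighting each row by its portion. A minimal sketch, treating the starred `1*` entries as exactly 1 cycle (an assumption; the names `MIX` and `average_cycles` are made up for this illustration):

```python
# Hypothetical instruction mix copied from the table:
# kind -> (portion, cycles with predict-not-taken, cycles with stalling)
MIX = {
    "taken jXX":     (0.03, 3, 3),
    "non-taken jXX": (0.05, 1, 3),
    "others":        (0.92, 1, 1),   # assumption: the starred 1* counted as 1
}

def average_cycles(mix, column):
    """Weighted average cycles per instruction.

    column 1 = predict not-taken, column 2 = stall.
    """
    return sum(row[0] * row[column] for row in mix.values())

predict = average_cycles(MIX, 1)
stall = average_cycles(MIX, 2)
print(f"predict not-taken: {predict:.2f} cycles/instruction")
print(f"stall:             {stall:.2f} cycles/instruction")
```

With this mix the only difference comes from the non-taken jXX row: 5% of instructions save 2 cycles each.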

```
exercise: predict+forward (1)
                    cycle #  0  1  2  3  4  5  6  7  8
addq %r8, %r9                F  D  E  M  W
subq %r7, %r8                   F  D  E  M  W
jle foo (taken)                    F  D  E  M  W
foo: andq %r9, %r8                    F  D  E  M  W

if jle is correctly predicted:
    in andq, %r9 is _____ addq.
    in andq, %r8 is _____ subq.
    A: not forwarded from  [assume read while writing requires forwarding]
    B-D: forwarded to decode from {execute, memory, writeback} stage of
```

```
exercise: predict+forward (2)
                    cycle #  0  1  2  3  4  5  6  7  8
addq %r8, %r9                F  D  E  M  W
subq %r7, %r8                   F  D  E  M  W
jle foo (taken)                    F  D  E  M  W
(mispredicted)                        F  D              (squashed)
(mispredicted)                           F              (squashed)
foo: andq %r9, %r8                          F  D  E  M

if jle is mispredicted + resolved after jle's execute:
    in andq, %r9 is _____ addq.
    in andq, %r8 is _____ subq.
    A: not forwarded from  [assume read while writing requires forwarding]
    B-D: forwarded to decode from {execute, memory, writeback} stage of
```
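
One way to reason about questions like these: find the cycle in which the consumer is in decode, then see which stage the producer occupies in that cycle. A small sketch for the 5-stage pipeline, assuming no stalls and using made-up helper names (`stage_at`, `forwarded_from`):

```python
STAGES = ["F", "D", "E", "M", "W"]

def stage_at(fetch_cycle, cycle):
    """Stage occupied in `cycle` by an instruction fetched in `fetch_cycle`.

    Assumes the instruction never stalls; returns None if it is not in
    the pipeline during that cycle.
    """
    position = cycle - fetch_cycle
    return STAGES[position] if 0 <= position < len(STAGES) else None

def forwarded_from(producer_fetch, consumer_fetch):
    """Stage the producer is in while the consumer is in decode.

    None means the producer already finished writeback, so the value is
    read from the register file (not forwarded).
    """
    decode_cycle = consumer_fetch + 1
    return stage_at(producer_fetch, decode_cycle)

# Producer fetched in cycle 0, consumer fetched right behind it in cycle 1.
print(forwarded_from(0, 1))
# Consumer fetched in cycle 4: the producer has left the pipeline.
print(forwarded_from(0, 4))
```

The fetch cycle of `andq` is the only thing that changes between the correctly predicted and mispredicted versions of the exercise.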

### other pipelines?

showed fetch / decode / execute / memory / writeback

very common early pipeline design

not only option!

### hazards versus dependencies

dependency — X needs result of instruction Y?

has potential for being messed up by pipeline
(since part of X may run before Y finishes)

hazard — will it not work in some pipeline?

before extra work is done to "resolve" hazards
multiple kinds: so far, data hazards

```
addq %rax, %rbx
subq %rax, %rcx
movq $100, %rcx
addq %rcx, %r10
addq %rbx, %r10
```

where are dependencies? which are hazards in our pipeline? which are resolved with forwarding?
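
A rough way to check answers mechanically: track the most recent writer of each register and report each read-after-write pair with its distance. The sketch below is an illustration, not part of the slides. It assumes the 5-stage pipeline, no loads, and that a dependency is a hazard when the consumer is at most 3 instructions behind the producer (reading while the producer is in writeback still counts). It only looks at read-after-write dependencies.

```python
# Each instruction: (text, registers read, registers written).
PROGRAM = [
    ("addq %rax, %rbx", {"%rax", "%rbx"}, {"%rbx"}),
    ("subq %rax, %rcx", {"%rax", "%rcx"}, {"%rcx"}),
    ("movq $100, %rcx", set(),            {"%rcx"}),
    ("addq %rcx, %r10", {"%rcx", "%r10"}, {"%r10"}),
    ("addq %rbx, %r10", {"%rbx", "%r10"}, {"%r10"}),
]

def read_after_write(program, max_hazard_distance=3):
    """Return (producer index, consumer index, register, is_hazard) tuples."""
    last_writer = {}   # register -> index of most recent instruction writing it
    found = []
    for i, (_, reads, writes) in enumerate(program):
        for reg in sorted(reads):
            if reg in last_writer:
                j = last_writer[reg]
                found.append((j, i, reg, i - j <= max_hazard_distance))
        for reg in writes:
            last_writer[reg] = i
    return found

for producer, consumer, reg, hazard in read_after_write(PROGRAM):
    kind = "hazard" if hazard else "dependency only"
    print(f"{PROGRAM[producer][0]}  ->  {PROGRAM[consumer][0]}  on {reg}: {kind}")
```

Passing `max_hazard_distance=2` models a pipeline where the value is written back one cycle sooner.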

## pipeline with different hazards

```
example: 4-stage pipeline:
fetch/decode/execute+memory/writeback

(stage each instruction is in while andq is in decode)
                   // 4 stage   // 5 stage
addq %rax, %r8     //           // W
subq %rax, %r9     // W         // M
xorq %rax, %r10    // EM        // E
andq %r8, %r11     // D         // D

addq/andq is a hazard with the 5-stage pipeline
addq/andq is not a hazard with the 4-stage pipeline
```

more hazards with more pipeline stages

split execute into two stages: F/D/E1/E2/M/W

result only available near end of second execute stage

where do forwarding and stalls occur?

| cycle #              | 0 | 1 | 2  | 3  | 4 | 5 | 6 | 7 | 8 |  |
|----------------------|---|---|----|----|---|---|---|---|---|--|
| (1) addq %rcx, %r9   | F | D | E1 | E2 | М | W |   |   |   |  |
| (2) addq %r9, %rbx   |   |   |    |    |   |   |   |   |   |  |
| (3) addq %rax, %r9   |   |   |    |    |   |   |   |   |   |  |
| (4) movq %r9, (%rbx) |   |   |    |    |   |   |   |   |   |  |
| (5) movq %rcx, %r9   |   |   |    |    |   |   |   |   |   |  |

without stalls, (2) would reach E1 in cycle 3, before (1)'s result is available near the end of E2

with stalls: (2) waits one extra cycle in decode, and (3) waits one extra cycle in fetch behind it

| cycle #              | 0 | 1 | 2  | 3  | 4  | 5  | 6  | 7  | 8 | 9 |
|----------------------|---|---|----|----|----|----|----|----|---|---|
| (1) addq %rcx, %r9   | F | D | E1 | E2 | M  | W  |    |    |   |   |
| (2) addq %r9, %rbx   |   | F | D  | D  | E1 | E2 | M  | W  |   |   |
| (3) addq %rax, %r9   |   |   | F  | F  | D  | E1 | E2 | M  | W |   |
| (4) movq %r9, (%rbx) |   |   |    |    | F  | D  | E1 | E2 | M | W |

#### static branch prediction

```
forward (target > PC) not taken; backward taken

intuition: loops:

LOOP: ...
      je LOOP

LOOP: ...
      jne SKIP_LOOP
      jmp LOOP
SKIP_LOOP:
```
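
The rule itself is a single comparison. A minimal sketch with made-up addresses and helper names (`predict_taken`, `count_mispredictions`) that counts how often the static guess is wrong for a given list of outcomes:

```python
def predict_taken(branch_pc, target_pc):
    """Forward-not-taken, backward-taken: guess taken only for backward targets."""
    return target_pc < branch_pc

def count_mispredictions(branch_pc, target_pc, outcomes):
    """outcomes: list of booleans, True where the branch was actually taken."""
    guess = predict_taken(branch_pc, target_pc)
    return sum(1 for taken in outcomes if taken != guess)

# Hypothetical loop-closing branch at 0x40 jumping back to 0x10:
# taken 9 times, then falls through once when the loop ends.
loop_outcomes = [True] * 9 + [False]
print(count_mispredictions(0x40, 0x10, loop_outcomes))   # backward: 1 wrong guess

# The same outcomes for a forward branch (target 0x80 after the branch).
print(count_mispredictions(0x40, 0x80, loop_outcomes))   # forward: 9 wrong guesses
```

The guess never changes for a given branch, so a branch that behaves against the rule is mispredicted every time it does so.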

#### exercise: static prediction

```
.global foo
foo:
    xor %eax, %eax          // eax <- 0
foo_loop_top:
    test $0x1, %edi
    je foo_loop_bottom      // if (edi & 1 == 0) goto foo_loop_bottom
    add %edi, %eax
foo_loop_bottom:
    jg foo_loop_top         // if (edi > 0) goto foo_loop_top
    ret

suppose %edi = 3 (initially)
and using forward-not-taken, backwards-taken strategy:
how many mispredictions for je? for jg?
```

#### exercise

```
use 1-bit predictor on this loop
    executed in outer loop (not shown) many, many times
what is the conditional branch misprediction rate?

int i = 0;
while (true) {
  if (i % 3 == 0) goto next;
next:
  i += 1;
  if (i == 50) break;
}
```
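
One way to check an answer is to simulate the predictor. The sketch below assumes each `if` compiles to one conditional branch that is taken when its condition is true, that the `while (true)` back edge is an unconditional jump (not counted), and that each branch has its own 1-bit entry. A compiler could invert the branch senses, so treat this as an illustration; the names are made up.

```python
def predict_and_update(table, name, taken):
    """1-bit predictor: guess the last outcome, then remember this one."""
    wrong = table[name] != taken
    table[name] = taken
    return wrong

def run_inner_loop(table):
    """Run the loop once; return (mispredictions, conditional branches executed)."""
    wrong = total = 0
    i = 0
    while True:
        wrong += predict_and_update(table, "if_mod3", i % 3 == 0)
        i += 1
        done = (i == 50)
        wrong += predict_and_update(table, "if_break", done)
        total += 2
        if done:
            return wrong, total

# Assumption: both entries start out predicting not-taken.
table = {"if_mod3": False, "if_break": False}
first = run_inner_loop(table)    # warm-up run from the initial state
steady = run_inner_loop(table)   # later runs all start from the same state
print(first, steady)
print(f"steady-state misprediction rate: {steady[0] / steady[1]:.0%}")
```

Because the outer loop repeats many times, the second run is representative of the steady state.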

# 1-cycle fetch?

assumption so far:

1 cycle to fetch instruction + identify if jmp, etc.

often not really practical

especially if:

complex machine code format

many pipeline stages

more complex instruction cache

(future idea) fetching 2+ instructions/cycle

## branch target buffer

what if we can't decode LABEL from machine code for jmp LABEL or jle LABEL fast?

will happen in more complex pipelines

what if we can't decode that there's a RET, CALL, etc. fast?

## BTB: cache for branch targets

| idx  | valid | tag   | ofst | type | target   | (more info?) |
|------|-------|-------|------|------|----------|--------------|
| 0x00 | 1     | 0x400 | 5    | Jxx  | 0x3FFFF3 | •••          |
| 0x01 | 1     | 0x401 | C    | JMP  | 0x401035 |              |
| 0x02 | 0     |       |      |      |          |              |
| 0x03 | 1     | 0x400 | 9    | RET  |          | •••          |
| •••  | •••   | •••   | •••  | •••  | •••      | •••          |
| 0xFF | 1     | 0x3FF | 8    | CALL | 0x404033 | •••          |

0x3FFFF3: movq %rax, %rsi

0x3FFFF7: pushq %rbx

0x3FFFF8: call 0x404033

0x400001: popq %rbx

0x400003: cmpq %rbx, %rax

0x400005: jle 0x3FFFF3

•••

0x400031: ret

. ..
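
The lookup can be sketched like a cache lookup on the instruction address. The bit split below (4 offset bits, 8 index bits, the rest as tag) is inferred from the table rows and is an assumption, as are the helper names. Only some of the table's rows are copied.

```python
OFFSET_BITS = 4   # low hex digit of the address ("ofst" column)
INDEX_BITS = 8    # next two hex digits ("idx" column)

def split_pc(pc):
    """Split an instruction address into (tag, index, offset)."""
    offset = pc & ((1 << OFFSET_BITS) - 1)
    index = (pc >> OFFSET_BITS) & ((1 << INDEX_BITS) - 1)
    tag = pc >> (OFFSET_BITS + INDEX_BITS)
    return tag, index, offset

# index -> (valid, tag, offset, type, target); a subset of the table's rows
BTB = {
    0x00: (1, 0x400, 0x5, "Jxx", 0x3FFFF3),
    0x01: (1, 0x401, 0xC, "JMP", 0x401035),
    0x02: (0, None, None, None, None),
    0xFF: (1, 0x3FF, 0x8, "CALL", 0x404033),
}

def btb_lookup(pc):
    """Return (type, target) if the BTB has a valid matching entry, else None."""
    tag, index, offset = split_pc(pc)
    entry = BTB.get(index)
    if entry is None:
        return None
    valid, entry_tag, entry_offset, kind, target = entry
    if valid and entry_tag == tag and entry_offset == offset:
        return kind, target
    return None

print(btb_lookup(0x400005))   # the jle in the listing
print(btb_lookup(0x3FFFF8))   # the call in the listing
print(btb_lookup(0x400003))   # the cmpq: same index as the jle, no match
```

A miss just means fetch continues with the next sequential address, as it would for a non-branch.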

## predicting ret: ministack of return addresses

predicting ret: ministack in processor registers

push on ministack on call; pop on ret

ministack overflows? discard oldest, mispredict it later

| stack in memory     |
|---------------------|
| baz saved registers |
| baz return address  |
| bar saved registers |
| bar return address  |
| foo local variables |
| foo saved registers |
| foo return address  |
| foo saved registers |

| (partial?) stack in CPU registers |
|-----------------------------------|
| baz return address                |
| bar return address                |
| foo return address                |

# 4-entry return address stack

4-entry return address stack in CPU



next saved return address from call

on call: increment index, save return address in that slot

on ret: read prediction from index, decrement index
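
A sketch of the 4-entry stack as a circular buffer. The wrap-around on overflow is an assumption about how "discard oldest" is implemented; the class name and the addresses are made up.

```python
class ReturnAddressStack:
    def __init__(self, size=4):
        self.slots = [None] * size
        self.index = 0

    def on_call(self, return_address):
        # increment index (wrapping around), save return address in that slot
        self.index = (self.index + 1) % len(self.slots)
        self.slots[self.index] = return_address

    def on_ret(self):
        # read prediction from index, then decrement index (wrapping around)
        prediction = self.slots[self.index]
        self.index = (self.index - 1) % len(self.slots)
        return prediction

# Five nested calls overflow the four slots: the oldest return address is lost.
ras = ReturnAddressStack()
actual = [0x100, 0x200, 0x300, 0x400, 0x500]   # hypothetical return addresses
for address in actual:
    ras.on_call(address)

predictions = [ras.on_ret() for _ in actual]
for predicted, real in zip(predictions, reversed(actual)):
    print(hex(predicted), hex(real), "ok" if predicted == real else "mispredicted")
```

The four most recent returns are predicted correctly; the return for the overwritten entry is mispredicted later.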

## beyond pipelining: multiple issue

start more than one instruction/cycle

multiple parallel pipelines; many-input/output register file

#### hazard handling much more complex

•••

# beyond pipelining: out-of-order

find later instructions to do instead of stalling

lists of available instructions in pipeline registers

take any instruction with available values

provide illusion that work is still done in order

much more complicated hazard handling logic

```
cycle #: 0 1 2 3 4 5 6 7 8 9 10 11

mov 0(%rbx), %r8    stages: F D R I E M M M W C
sub %r8, %r9        stages: F D R I E W C
add %r10, %r11      stages: F D R I E W C
xor %r12, %r13      stages: F D R I E W C
```

•••

#### interlude: real CPUs

modern CPUs:

execute multiple instructions at once

execute instructions out of order — whenever values available

#### out-of-order and hazards

out-of-order execution makes hazards harder to handle

#### problems for forwarding:

value in last stage may not be most up-to-date

older value may be written back before newer value?

#### problems for branch prediction:

mispredicted instructions may complete execution before squashing

#### which instructions to dispatch?

how to quickly find instructions that are ready?

# read-after-write examples (1)

```
cycle #          0 1 2 3 4 5 6 7 8
addq %r10, %r8   F D E M W
```

normal pipeline: two options for %r8? choose the one from *earliest stage* because it's from the most recent instruction

out-of-order execution: %r8 from earliest stage might be from *delayed instruction*

can't use same forwarding logic

addq %r10, %r8

movq %r8, (%rax)

movq $100, %r8

addq %r13, %r8

### register version tracking

goal: track different versions of registers

out-of-order execution: may compute versions at different times

only forward the correct version

strategy for doing this: preprocess instructions to represent version info

makes forwarding, etc. lookup easier

# rewriting hazard examples (1)

```
addq %r10, %r8  |  addq %r10, %r8_v1 -> %r8_v2
addq %r11, %r8  |  addq %r11, %r8_v2 -> %r8_v3
addq %r12, %r8  |  addq %r12, %r8_v3 -> %r8_v4
```

read different version than the one written

represent with three-argument pseudo-instructions

forwarding a value? must match version exactly

for now: version numbers

later: something simpler to implement
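
A sketch of the preprocessing step with explicit version numbers. Unlike the rewritten example, this version also tags the source register, so the forwarding check can compare versions on every operand. The function name and tuple format are made up.

```python
def add_versions(instructions):
    """Rewrite two-operand instructions into three-argument pseudo-instructions.

    instructions: list of (opcode, source, dest), where dest is both read
    and written. Every register starts at version 1 (an assumption).
    """
    version = {}   # register -> current version number
    rewritten = []
    for op, src, dst in instructions:
        src_v = version.setdefault(src, 1)
        old_v = version.setdefault(dst, 1)
        version[dst] = old_v + 1          # each write creates a new version
        rewritten.append(
            f"{op} {src}_v{src_v}, {dst}_v{old_v} -> {dst}_v{old_v + 1}"
        )
    return rewritten

program = [
    ("addq", "%r10", "%r8"),
    ("addq", "%r11", "%r8"),
    ("addq", "%r12", "%r8"),
]
for line in add_versions(program):
    print(line)
```

A forwarded value is only usable when both the register name and the version number match.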

```
cycle #: 0 1 2 3 4 5 6 7 8

addq %r10, %r8       stages: F F D E M W
movq %r8, (%rax)     stages: F D E M W
movq %r8, 8(%rax)    stages: F D E M W
movq $100, %r8       stages: F D E M W
addq %r13, %r8       stages: F D E M W
```

out-of-order execution: if we don't do something, newest value could be overwritten!

two instructions that haven't been started could need *different versions* of %r8!

```
cycle # 0 1 2 3 4 5 6 7 8
addq %r10, %r8
movq %r8, (%rax)
movq %r11, %r8
movq %r8, 8(%rax)
movq $100, %r8
addq %r13, %r8
```

### keeping multiple versions

for write-after-write problem: need to keep copies of multiple versions

both the new version and the old version needed by delayed instructions

for read-after-write problem: need to distinguish different versions

solution: have lots of extra registers

...and assign each version a new 'real' register

called register renaming

### register renaming

rename architectural registers to physical registers

different physical register for each version of architectural register

track which physical registers are ready

compare physical register numbers to do forwarding





branch prediction needs to happen before instructions decoded
done with cache-like tables of information about recent branches

register renaming done here
stage needs to keep mapping from architectural to physical names

instruction queue holds pending renamed instructions
combined with register-ready info to *issue* instructions (issue = start executing)

read from much larger register file and handle forwarding
register file: typically read 6+ registers at a time
(extra data paths wires for forwarding not shown)

many execution units actually do math or memory load/store
some may have multiple pipeline stages
some may take variable time (data cache, integer divide, ...)

writeback results to physical registers
register file: typically support writing 3+ registers at a time

new commit (sometimes *retire*) stage finalizes instruction
figures out when physical registers can be reused again

commit stage also handles branch misprediction
reorder buffer tracks enough information to undo mispredicted instrs.

```
cycle #: 0 1 2 3 4 5 6 7 8 9 10 11

addq %r01, %r05
addq %r02, %r05
addq %r03, %r04
cmpq %r04, %r08
jne ...
addq %r01, %r05
addq %r02, %r05
addq %r03, %r04
cmpq %r04, %r08

issue instructions (to "execution units") when operands ready
commit instructions in order, waiting until next complete
```

### register renaming

rename architectural registers to physical registers

architectural = part of instruction set architecture

different name for each version of architectural register

## register renaming state

| original      | renamed |
|---------------|---------|
| add %r10, %r8 |         |
| add %r11, %r8 |         |
| add %r12, %r8 |         |

### arch → phys register map

table for architectural (external) and physical (internal) name (for next instr. to process)

| arch | phys |
|------|------|
| %rax | %x04 |
| %rcx | %x09 |
| •••  | •••  |
| %r8  | %x13 |
| %r9  | %x17 |
| %r10 | %x19 |
| %r11 | %x07 |
| %r12 | %x05 |
| •••  | •••  |

free reg list

list of available physical registers; added to as instructions finish

| free regs |
|-----------|
| %x18      |
| %x20      |
| %x21      |
| %x23      |
| %x24      |
| •••       |

```
original          renamed
add %r10, %r8     add %x19, %x13 -> %x18
add %r11, %r8
add %r12, %r8
```

#### arch → phys register map

| arch | phys                 |
|------|----------------------|
| %rax | %x04                 |
| %rcx | %x09                 |
| •••  | •••                  |
| %r8  | <del>%x13</del> %x18 |
| %r9  | %x17                 |
| %r10 | %x19                 |
| %r11 | %x07                 |
| %r12 | %x05                 |
| •••  | •••                  |



```
original            renamed
add %r10, %r8       add %x19, %x13 → %x18
add %r11, %r8       add %x07, %x18 → %x20
add %r12, %r8
```

#### arch $\rightarrow$ phys register map

| %rax | %x04         |
|------|--------------|
| %rcx | %x09         |
| •••  | •••          |
| %r8  | <del>%x13 %x18</del> %x20 |
| %r9  | %x17         |
| %r10 | %x19         |
| %r11 | %x07         |
| %r12 | %x05         |
| •••  | •••          |

free regs <del>%x18</del> <del>%x20</del> %x21 %x23 %x24 ...

```
original            renamed
add %r10, %r8       add %x19, %x13 → %x18
add %r11, %r8       add %x07, %x18 → %x20
add %r12, %r8       add %x05, %x20 → %x21
```

#### $\operatorname{arch} \to \operatorname{phys} \operatorname{register} \operatorname{map}$

| %rax | %x04             |
|------|------------------|
| %rcx | %x09             |
| •••  | •••              |
| %r8  | <del>%x13 %x18 %x20</del> %x21 |
| %r9  | %x17             |
| %r10 | %x19             |
| %r11 | %x07             |
| %r12 | %x05             |
| •••  | •••              |

free regs <del>%x18</del> <del>%x20</del> <del>%x21</del> %x23 %x24 ...

```
original              renamed
addq %r10, %r8
movq %r8, (%rax)
subq %r8, %r11
movq 8(%r11), %r11
movq $100, %r8
addq %r11, %r8
```

#### $\operatorname{arch} \to \operatorname{phys} \operatorname{register} \operatorname{map}$

| %rax | %x04 |
|------|------|
| %rcx | %x09 |
| •••  | •••  |
| %r8  | %x13 |
| %r9  | %x17 |
| %r10 | %x19 |
| %r11 | %x07 |
| %r12 | %x05 |
| %r13 | %x02 |
| •••  | •••  |

free regs %x18 %x20 %x21 %x23 %x24 ...

```
original
addq %r10, %r8
movq %r8, (%rax)
subq %r8, %r11
movq 8(%r11), %r11
movq $100, %r8
addq %r11, %r8
```

#### $\operatorname{arch} \to \operatorname{phys} \operatorname{register} \operatorname{map}$

| %rax | %x04                 |
|------|----------------------|
| %rcx | %x09                 |
| •••  | •••                  |
| %r8  | <del>%x13</del> %x18 |
| %r9  | %x17                 |
| %r10 | %x19                 |
| %r11 | %x07                 |
| %r12 | %x05                 |
| %r13 | %x02                 |
| •••  | •••                  |

renamed

addq %x19, %x13 → %x18

free regs <del>%x18</del> %x20 %x21 %x23 %x24

```
original              renamed
addq %r10, %r8        addq %x19, %x13 → %x18
movq %r8, (%rax)      movq %x18, (%x04) → (memory)
subq %r8, %r11
movq 8(%r11), %r11
movq $100, %r8
addq %r11, %r8
```

#### $\operatorname{arch} \to \operatorname{phys} \operatorname{register} \operatorname{map}$

| %rax | %x04                 |
|------|----------------------|
| %rcx | %x09                 |
| •••  | •••                  |
| %r8  | <del>%x13</del> %x18 |
| %r9  | %x17                 |
| %r10 | %x19                 |
| %r11 | %x07                 |
| %r12 | %x05                 |
| %r13 | %x02                 |
| •••  | •••                  |

free regs <del>%x18</del> %x20 %x21 %x23 %x24 ...

```
original              renamed
addq %r10, %r8        addq %x19, %x13 → %x18
movq %r8, (%rax)      movq %x18, (%x04) → (memory)
subq %r8, %r11
movq 8(%r11), %r11
movq $100, %r8
addq %r11, %r8
```

#### $\operatorname{arch} \to \operatorname{phys} \operatorname{register} \operatorname{map}$

| %rax | %x04                 |
|------|----------------------|
| %rcx | %x09                 |
| •••  | •••                  |
| %r8  | <del>%x13</del> %x18 |
| %r9  | %x17                 |
| %r10 | %x19                 |
| %r11 | %x07                 |
| %r12 | %x05                 |
| %r13 | %x02                 |
| •••  | •••                  |

free regs <del>%x18</del> %x20 %x21 %x23 %x24 ...

could be that %rax = 8+%r11

could load before value written!

possible data hazard!

not handled via register renaming

option 1: run loads + stores in order

option 2: compare load/store addresses
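Option 2 above can be sketched as a check made before a load is allowed to run ahead of older stores. This is only a toy model, not how any particular processor does it: `load_may_run_early` and the list of pending store addresses are hypothetical names, addresses are plain integers, and it assumes every earlier store's address is already known (a real design also has to cope with store addresses that have not been computed yet).

```python
def load_may_run_early(load_addr, pending_store_addrs, size=8):
    """Return True if a load overlaps no earlier, not-yet-written store.

    Toy model: every access is `size` bytes and all store addresses are known.
    """
    for store_addr in pending_store_addrs:
        # Two size-byte ranges overlap when their start addresses are less than size apart.
        if abs(load_addr - store_addr) < size:
            return False   # possible data hazard: the load must wait for the store
    return True

# movq %r8, (%rax) is still pending; movq 8(%r11), %r11 wants to load.
rax = 0x1008                                                       # made-up store address
same_location = load_may_run_early(0x1000 + 8, [rax])              # %rax == 8 + %r11
other_location = load_may_run_early(0x2000 + 8, [rax])             # unrelated address
print(same_location, other_location)
```

When the addresses match, the load has to wait (option 1 would make it wait unconditionally).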

```
original
addq %r10, %r8
movq %r8, (%rax)
subq %r8, %r11
movq 8(%r11), %r11
movq $100, %r8
addq %r11, %r8
```

```
renamed
addq %x19, %x13 → %x18
movq %x18, (%x04) → (memory)
subq %x18, %x07 → %x20
```

#### $\operatorname{arch} \to \operatorname{phys} \operatorname{register} \operatorname{map}$

| %rax | %x04                 |
|------|----------------------|
| %rcx | %x09                 |
| •••  | •••                  |
| %r8  | <del>%x13</del> %x18 |
| %r9  | %x17                 |
| %r10 | %x19                 |
| %r11 | <del>%x07</del> %x20 |
| %r12 | %x05                 |
| %r13 | %x02                 |
| •••  | •••                  |

free regs <del>%x18</del> <del>%x20</del> %x21 %x23 %x24

```
original
addq %r10, %r8
movq %r8, (%rax)
subq %r8, %r11
movq 8(%r11), %r11
movq $100, %r8
addq %r11, %r8
```

```
renamed
addq %x19, %x13 → %x18
movq %x18, (%x04) → (memory)
subq %x18, %x07 → %x20
movq 8(%x20), (memory) → %x21
```

#### $\operatorname{arch} \to \operatorname{phys} \operatorname{register} \operatorname{map}$

| %rax | %x04                     |
|------|--------------------------|
| %rcx | %x09                     |
| •••  | •••                      |
| %r8  | <del>%x13</del> %x18     |
| %r9  | %x17                     |
| %r10 | %x19                     |
| %r11 | <del>%x07%x20</del> %x21 |
| %r12 | %x05                     |
| %r13 | %x02                     |
| •••  | •••                      |

free regs <del>%x18</del> <del>%x20</del> <del>%x21</del> %x23 %x24 ...

```
original
addq %r10, %r8
movq %r8, (%rax)
subq %r8, %r11
movq 8(%r11), %r11
movq $100, %r8
addq %r11, %r8
```

```
renamed
addq %x19, %x13 → %x18
movq %x18, (%x04) → (memory)
subq %x18, %x07 → %x20
movq 8(%x20), (memory) → %x21
movq $100 → %x23
```

#### $\operatorname{arch} \to \operatorname{phys} \operatorname{register} \operatorname{map}$

| %rax | %x04                     |
|------|--------------------------|
| %rcx | %x09                     |
| •••  | •••                      |
| %r8  | <del>%x13 %x18</del> %x23 |
| %r9  | %x17                     |
| %r10 | %x19                     |
| %r11 | <del>%x07%x20</del> %x21 |
| %r12 | %x05                     |
| %r13 | %x02                     |
| •••  | •••                      |

free regs <del>%x18</del> <del>%x20</del> <del>%x21</del> <del>%x23</del> %x24

```
original              renamed
addq %r10, %r8        addq %x19, %x13 → %x18
movq %r8, (%rax)      movq %x18, (%x04) → (memory)
subq %r8, %r11        subq %x18, %x07 → %x20
movq 8(%r11), %r11    movq 8(%x20), (memory) → %x21
movq $100, %r8        movq $100 → %x23
addq %r11, %r8        addq %x21, %x23 → %x24
```

#### $\operatorname{arch} \to \operatorname{phys} \operatorname{register} \operatorname{map}$

| %rax | %x04                     |
|------|--------------------------|
| %rcx | %x09                     |
| •••  | •••                      |
| %r8  | <del>%x13 %x18 %x23</del> %x24 |
| %r9  | %x17                     |
| %r10 | %x19                     |
| %r11 | <del>%x07%x20</del> %x21 |
| %r12 | %x05                     |
| %r13 | %x02                     |
| •••  | •••                      |

free regs <del>%x18</del> <del>%x20</del> <del>%x21</del> <del>%x23</del> <del>%x24</del> ...
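The walk-through above can be sketched in a few lines of Python. This is an illustrative model only, assuming two-operand instructions in AT&T order whose last operand is the destination (read and written, except for `movq`, which only writes it). The names `rename`, `reg_map`, and `free_regs` are made up for this sketch and are not from the slides.

```python
import re

def rename(instructions, reg_map, free_regs):
    """Rename architectural registers to physical ones, in program order.

    Hypothetical helper: each instruction is (opcode, source, destination).
    Returns (renamed instruction strings, final map, remaining free list).
    """
    reg_map = dict(reg_map)        # arch -> phys; copied so the caller's map is untouched
    free_regs = list(free_regs)    # available physical registers, taken from the front
    renamed = []

    def lookup(operand):
        # Replace each %reg (also inside "8(%r11)") with its current physical name.
        return re.sub(r"%\w+", lambda m: reg_map[m.group(0)], operand)

    for op, src, dst in instructions:
        new_src = lookup(src)              # sources use the map *before* it is updated
        if dst.startswith("%"):
            old_dst = reg_map[dst]
            new_dst = free_regs.pop(0)     # every register write gets a fresh physical register
            reg_map[dst] = new_dst
            if not op.startswith("mov"):
                renamed.append(f"{op} {new_src}, {old_dst} → {new_dst}")
            elif "(" in src:
                renamed.append(f"{op} {new_src}, (memory) → {new_dst}")   # load
            else:
                renamed.append(f"{op} {new_src} → {new_dst}")
        else:
            # Store: the destination is memory, so no register is allocated.
            renamed.append(f"{op} {new_src}, {lookup(dst)} → (memory)")
    return renamed, reg_map, free_regs

initial_map = {"%rax": "%x04", "%rcx": "%x09", "%r8": "%x13", "%r9": "%x17",
               "%r10": "%x19", "%r11": "%x07", "%r12": "%x05", "%r13": "%x02"}
program = [("addq", "%r10", "%r8"), ("movq", "%r8", "(%rax)"),
           ("subq", "%r8", "%r11"), ("movq", "8(%r11)", "%r11"),
           ("movq", "$100", "%r8"), ("addq", "%r11", "%r8")]
renamed, final_map, remaining = rename(program, initial_map,
                                       ["%x18", "%x20", "%x21", "%x23", "%x24"])
for line in renamed:
    print(line)
```

Reading the source mapping before allocating the destination is what makes `movq 8(%r11), %r11` come out as a load from the old `%r11` into a new physical register.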

### register renaming exercise

original

addq %r8, %r9

movq $100, %r10

subq %r10, %r8

xorq %r8, %r9

andq %rax, %r9

arch → phys

| %rax | %x04 |
|------|------|
| %rcx | %x09 |
| •••  | •••  |
| %r8  | %x13 |
| %r9  | %x17 |
| %r10 | %x19 |
| %r11 | %x29 |
| %r12 | %x05 |
| %r13 | %x02 |
| •••  | •••  |

renamed

free regs

%x18 %x20 %x21 %x23 %x24 ...

## instruction queue and dispatch

#### instruction queue

| # | instruction                           |
|---|---------------------------------------|
| 1 | addq %x01, %x05 → %x06                |
| 2 | addq %x02, %x06 → %x07                |
| 3 | addq %x03, %x07 $\rightarrow$ %x08    |
| 4 | cmpq %x04, %x08 → %x09.cc             |
| 5 | jne %x09.cc,                          |
| 6 | addq %x01, %x08 $\rightarrow$ %x10    |
| 7 | addq %x02, %x10 $\rightarrow$ %x11    |
| 8 | addq %x03, %x11 $\rightarrow$ %x12    |
| 9 | cmpq %x04, %x12 $\rightarrow$ %x13.cc |
|   |                                       |

### scoreboard

| reg  | status  |
|------|---------|
| %x01 | ready   |
| %x02 | ready   |
| %x03 | ready   |
| %x04 | ready   |
| %x05 | ready   |
| %x06 | pending |
| %x07 | pending |
| %x08 | pending |
| %x09 | pending |
| %x10 | pending |
| %x11 | pending |
| %x12 | pending |
| %x13 | pending |
| •••  |         |
|      |         |

| execution unit |
|----------------|
| ALU 1          |
| ALU 2          |

#### instruction queue

| # | instruction                               |
|---|-------------------------------------------|
| 1 | addq %x01, %x05 → %x06                    |
| 2 | addq %x02, %x06 $\rightarrow$ %x07        |
| 3 | addq %x03, %x07 → %x08                    |
| 4 | cmpq %x04, %x08 → %x09.cc                 |
| 5 | jne %x09.cc,                              |
| 6 | addq %x01, %x08 $\rightarrow$ %x10        |
| 7 | addq %x02, %x10 → %x11                    |
| 8 | addq %x03, %x11 → %x12                    |
| 9 | cmpq %x04, %x12 → %x13.cc                 |
|   |                                           |

| execution unit | cycle# 1 |
|----------------|----------|
| ALU 1          |          |
| ALU 2          |          |

| reg  | status  |
|------|---------|
| %x01 | ready   |
| %x02 | ready   |
| %x03 | ready   |
| %x04 | ready   |
| %x05 | ready   |
| %x06 | pending |
| %x07 | pending |
| %x08 | pending |
| %x09 | pending |
| %x10 | pending |
| %x11 | pending |
| %x12 | pending |
| %x13 | pending |
| •••  |         |

### instruction queue

| # | instruction                           |
|---|---------------------------------------|
| 1 | addq %x01, %x05 → %x06                |
| 2 | addq %x02, %x06 → %x07                |
| 3 | addq %x03, %x07 → %x08                |
| 4 | cmpq %x04, %x08 → %x09.cc             |
| 5 | jne %x09.cc,                          |
| 6 | addq %x01, %x08 $\rightarrow$ %x10    |
| 7 | addq %x02, %x10 → %x11                |
| 8 | addq %x03, %x11 $\rightarrow$ %x12    |
| 9 | cmpq %x04, %x12 $\rightarrow$ %x13.cc |

| reg  | status  |
|------|---------|
| %x01 | ready   |
| %x02 | ready   |
| %x03 | ready   |
| %x04 | ready   |
| %x05 | ready   |
| %x06 | pending |
| %x07 | pending |
| %x08 | pending |
| %x09 | pending |
| %x10 | pending |
| %x11 | pending |
| %x12 | pending |
| %x13 | pending |
| •••  |         |

## instruction queue

| # | instruction               |
|---|---------------------------|
| 1 | addq %x01, %x05 → %x06    |
| 2 | addq %x02, %x06 → %x07    |
| 3 | addq %x03, %x07 → %x08    |
| 4 | cmpq %x04, %x08 → %x09.cc |
| 5 | jne %x09.cc,              |
| 6 | addq %x01, %x08 → %x10    |
| 7 | addq %x02, %x10 → %x11    |
| 8 | addq %x03, %x11 → %x12    |
| 9 | cmpq %x04, %x12 → %x13.cc |

| execution unit | cycle# 1 |
|----------------|----------|
| ALU 1          |          |
| ALU 2          |          |

| reg  | status        |
|------|---------------|
| %x01 | ready         |
| %x02 | ready         |
| %x03 | ready         |
| %x04 | ready         |
| %x05 | ready         |
| %x06 | <del>pending</del> ready |
| %x07 | pending       |
| %x08 | pending       |
| %x09 | pending       |
| %x10 | pending       |
| %x11 | pending       |
| %x12 | pending       |
| %x13 | pending       |
| •••  |               |

### instruction queue

| #            | instruction               |
|--------------|---------------------------|
| <del>1</del> | addq %x01, %x05 → %x06    |
| 2            | addq %x02, %x06 → %x07    |
| 3            | addq %x03, %x07 → %x08    |
| 4            | cmpq %x04, %x08 → %x09.cc |
| 5            | jne %x09.cc,              |
| 6            | addq %x01, %x08 → %x10    |
| 7            | addq %x02, %x10 → %x11    |
| 8            | addq %x03, %x11 → %x12    |
| 9            | cmpq %x04, %x12 → %x13.cc |

| execution unit | cycle# 1 | 2 |
|----------------|----------|---|
| ALU 1          | 1        | 2 |
| ALU 2          |          |   |

| reg  | status                   |
|------|--------------------------|
| %x01 | ready                    |
| %x02 | ready                    |
| %x03 | ready                    |
| %x04 | ready                    |
| %x05 | ready                    |
| %x06 | <del>pending</del> ready |
| %x07 | <del>pending</del> ready |
| %x08 | pending                  |
| %x09 | pending                  |
| %x10 | pending                  |
| %x11 | pending                  |
| %x12 | pending                  |
| %x13 | pending                  |
| •••  |                          |

#### instruction queue

| #            | instruction               |
|--------------|---------------------------|
| <del>1</del> | addq %x01, %x05 → %x06    |
| <del>2</del> | addq %x02, %x06 → %x07    |
| 3            | addq %x03, %x07 → %x08    |
| 4            | cmpq %x04, %x08 → %x09.cc |
| 5            | jne %x09.cc,              |
| 6            | addq %x01, %x08 → %x10    |
| 7            | addq %x02, %x10 → %x11    |
| 8            | addq %x03, %x11 → %x12    |
| 9            | cmpq %x04, %x12 → %x13.cc |

| execution unit | cycle# 1 | 2 | 3 |
|----------------|----------|---|---|
| ALU 1          | 1        | 2 | 3 |
| ALU 2          | _        | _ | _ |

| reg  | status                   |
|------|--------------------------|
| %x01 | ready                    |
| %x02 | ready                    |
| %x03 | ready                    |
| %x04 | ready                    |
| %x05 | ready                    |
| %x06 | <del>pending</del> ready |
| %x07 | <del>pending</del> ready |
| %x08 | <del>pending</del> ready |
| %x09 | pending                  |
| %x10 | pending                  |
| %x11 | pending                  |
| %x12 | pending                  |
| %x13 | pending                  |
| •••  |                          |

### instruction queue

| #            | instruction               |
|--------------|---------------------------|
| <del>1</del> | addq %x01, %x05 → %x06    |
| <del>2</del> | addq %x02, %x06 → %x07    |
| <del>3</del> | addq %x03, %x07 → %x08    |
| 4            | cmpq %x04, %x08 → %x09.cc |
| 5            | jne %x09.cc,              |
| 6            | addq %x01, %x08 → %x10    |
| 7            | addq %x02, %x10 → %x11    |
| 8            | addq %x03, %x11 → %x12    |
| 9            | cmpq %x04, %x12 → %x13.cc |

| reg  | status                   |
|------|--------------------------|
| %x01 | ready                    |
| %x02 | ready                    |
| %x03 | ready                    |
| %x04 | ready                    |
| %x05 | ready                    |
| %x06 | <del>pending</del> ready |
| %x07 | <del>pending</del> ready |
| %x08 | <del>pending</del> ready |
| %x09 | pending                  |
| %x10 | pending                  |
| %x11 | pending                  |
| %x12 | pending                  |
| %x13 | pending                  |
| •••  |                          |

### instruction queue

| #            | instruction               |
|--------------|---------------------------|
| <del>1</del> | addq %x01, %x05 → %x06    |
| <del>2</del> | addq %x02, %x06 → %x07    |
| <del>3</del> | addq %x03, %x07 → %x08    |
| 4            | cmpq %x04, %x08 → %x09.cc |
| 5            | jne %x09.cc,              |
| 6            | addq %x01, %x08 → %x10    |
| 7            | addq %x02, %x10 → %x11    |
| 8            | addq %x03, %x11 → %x12    |
| 9            | cmpq %x04, %x12 → %x13.cc |

## scoreboard

| reg  | status                   |
|------|--------------------------|
| %x01 | ready                    |
| %x02 | ready                    |
| %x03 | ready                    |
| %x04 | ready                    |
| %x05 | ready                    |
| %x06 | <del>pending</del> ready |
| %x07 | <del>pending</del> ready |
| %x08 | <del>pending</del> ready |
| %x09 | <del>pending</del> ready |
| %x10 | <del>pending</del> ready |
| %x11 | pending                  |
| %x12 | pending                  |
| %x13 | pending                  |
| •••  |                          |

•••

### instruction queue

| #            | instruction               |
|--------------|---------------------------|
| <del>1</del> | addq %x01, %x05 → %x06    |
| <del>2</del> | addq %x02, %x06 → %x07    |
| <del>3</del> | addq %x03, %x07 → %x08    |
| <del>4</del> | cmpq %x04, %x08 → %x09.cc |
| 5            | jne %x09.cc,              |
| <del>6</del> | addq %x01, %x08 → %x10    |
| 7            | addq %x02, %x10 → %x11    |
| 8            | addq %x03, %x11 → %x12    |
| 9            | cmpq %x04, %x12 → %x13.cc |

| reg  | status                   |
|------|--------------------------|
| %x01 | ready                    |
| %x02 | ready                    |
| %x03 | ready                    |
| %x04 | ready                    |
| %x05 | ready                    |
| %x06 | <del>pending</del> ready |
| %x07 | <del>pending</del> ready |
| %x08 | <del>pending</del> ready |
| %x09 | <del>pending</del> ready |
| %x10 | <del>pending</del> ready |
| %x11 | pending                  |
| %x12 | pending                  |
| %x13 | pending                  |
| •••  |                          |

| execution unit | cycle# 1 | 2 | 3 | 4 |
|----------------|----------|---|---|---|
| ALU 1          | 1        | 2 | 3 | 4 |
| ALU 2          |          |   |   | 6 |

## instruction queue

| #            | instruction               |
|--------------|---------------------------|
| <del>1</del> | addq %x01, %x05 → %x06    |
| <del>2</del> | addq %x02, %x06 → %x07    |
| <del>3</del> | addq %x03, %x07 → %x08    |
| <del>4</del> | cmpq %x04, %x08 → %x09.cc |
| <del>5</del> | jne %x09.cc,              |
| <del>6</del> | addq %x01, %x08 → %x10    |
| <del>7</del> | addq %x02, %x10 → %x11    |
| 8            | addq %x03, %x11 → %x12    |
| 9            | cmpq %x04, %x12 → %x13.cc |

| reg  | status                   |
|------|--------------------------|
| %x01 | ready                    |
| %x02 | ready                    |
| %x03 | ready                    |
| %x04 | ready                    |
| %x05 | ready                    |
| %x06 | <del>pending</del> ready |
| %x07 | <del>pending</del> ready |
| %x08 | <del>pending</del> ready |
| %x09 | <del>pending</del> ready |
| %x10 | <del>pending</del> ready |
| %x11 | pending                  |
| %x12 | pending                  |
| %x13 | pending                  |
| •••  |                          |

| execution unit | cycle# 1 | 2 | 3 | 4 | 5 |
|----------------|----------|---|---|---|---|
| ALU 1          | 1        | 2 | 3 | 4 | 5 |
| ALU 2          |          | _ | _ | 6 | 7 |

### instruction queue

| #            | instruction               |
|--------------|---------------------------|
| <del>1</del> | addq %x01, %x05 → %x06    |
| <del>2</del> | addq %x02, %x06 → %x07    |
| <del>3</del> | addq %x03, %x07 → %x08    |
| <del>4</del> | cmpq %x04, %x08 → %x09.cc |
| <del>5</del> | jne %x09.cc,              |
| <del>6</del> | addq %x01, %x08 → %x10    |
| <del>7</del> | addq %x02, %x10 → %x11    |
| <del>8</del> | addq %x03, %x11 → %x12    |
| 9            | cmpq %x04, %x12 → %x13.cc |

| reg  | status                   |
|------|--------------------------|
| %x01 | ready                    |
| %x02 | ready                    |
| %x03 | ready                    |
| %x04 | ready                    |
| %x05 | ready                    |
| %x06 | <del>pending</del> ready |
| %x07 | <del>pending</del> ready |
| %x08 | <del>pending</del> ready |
| %x09 | <del>pending</del> ready |
| %x10 | <del>pending</del> ready |
| %x11 | <del>pending</del> ready |
| %x12 | pending                  |
| %x13 | pending                  |
| •••  |                          |

## instruction queue

| #            | instruction               |
|--------------|---------------------------|
| <del>1</del> | addq %x01, %x05 → %x06    |
| <del>2</del> | addq %x02, %x06 → %x07    |
| <del>3</del> | addq %x03, %x07 → %x08    |
| <del>4</del> | cmpq %x04, %x08 → %x09.cc |
| <del>5</del> | jne %x09.cc,              |
| <del>6</del> | addq %x01, %x08 → %x10    |
| <del>7</del> | addq %x02, %x10 → %x11    |
| <del>8</del> | addq %x03, %x11 → %x12    |
| <del>9</del> | cmpq %x04, %x12 → %x13.cc |

| reg  | status                   |
|------|--------------------------|
| %x01 | ready                    |
| %x02 | ready                    |
| %x03 | ready                    |
| %x04 | ready                    |
| %x05 | ready                    |
| %x06 | <del>pending</del> ready |
| %x07 | <del>pending</del> ready |
| %x08 | <del>pending</del> ready |
| %x09 | <del>pending</del> ready |
| %x10 | <del>pending</del> ready |
| %x11 | <del>pending</del> ready |
| %x12 | <del>pending</del> ready |
| %x13 | pending                  |
| •••  |                          |

## instruction queue

| #            | instruction               |
|--------------|---------------------------|
| <del>1</del> | addq %x01, %x05 → %x06    |
| <del>2</del> | addq %x02, %x06 → %x07    |
| <del>3</del> | addq %x03, %x07 → %x08    |
| <del>4</del> | cmpq %x04, %x08 → %x09.cc |
| <del>5</del> | jne %x09.cc,              |
| <del>6</del> | addq %x01, %x08 → %x10    |
| <del>7</del> | addq %x02, %x10 → %x11    |
| <del>8</del> | addq %x03, %x11 → %x12    |
| <del>9</del> | cmpq %x04, %x12 → %x13.cc |

| reg  | status                   |
|------|--------------------------|
| %x01 | ready                    |
| %x02 | ready                    |
| %x03 | ready                    |
| %x04 | ready                    |
| %x05 | ready                    |
| %x06 | <del>pending</del> ready |
| %x07 | <del>pending</del> ready |
| %x08 | <del>pending</del> ready |
| %x09 | <del>pending</del> ready |
| %x10 | <del>pending</del> ready |
| %x11 | <del>pending</del> ready |
| %x12 | <del>pending</del> ready |
| %x13 | <del>pending</del> ready |
| •••  |                          |
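The dispatch steps above can be replayed with a small simulation. This is a sketch under simplifying assumptions that are mine, not the slides': every instruction takes one cycle on any ALU, a result is usable in the following cycle, and the oldest ready instructions are picked first. `schedule` and its argument layout are hypothetical names for illustration.

```python
def schedule(queue, ready, num_alus=2):
    """Issue instructions out of order; return {cycle: [instruction numbers]}.

    queue: list of (number, source_regs, dest_reg) in program order.
    ready: physical registers whose values are available before cycle 1.
    """
    ready = set(ready)             # plays the role of the scoreboard
    waiting = list(queue)
    timeline = {}
    cycle = 0
    while waiting:
        cycle += 1
        # Oldest instructions whose sources are all ready, at most one per ALU.
        issued = [ins for ins in waiting if all(s in ready for s in ins[1])][:num_alus]
        if not issued:
            raise ValueError("nothing can issue: some source is never produced")
        timeline[cycle] = [number for number, _, _ in issued]
        waiting = [ins for ins in waiting if ins not in issued]
        # Mark results ready only now, so they are first usable in the next cycle.
        ready.update(dst for _, _, dst in issued if dst is not None)
    return timeline

queue = [
    (1, ["%x01", "%x05"], "%x06"),
    (2, ["%x02", "%x06"], "%x07"),
    (3, ["%x03", "%x07"], "%x08"),
    (4, ["%x04", "%x08"], "%x09.cc"),
    (5, ["%x09.cc"], None),              # jne: reads condition codes, writes no register
    (6, ["%x01", "%x08"], "%x10"),
    (7, ["%x02", "%x10"], "%x11"),
    (8, ["%x03", "%x11"], "%x12"),
    (9, ["%x04", "%x12"], "%x13.cc"),
]
timeline = schedule(queue, ready=["%x01", "%x02", "%x03", "%x04", "%x05"])
for cycle, numbers in timeline.items():
    print(cycle, numbers)
```

Under these assumptions the second ALU is first used in cycle 4, when instructions 4 and 6 both become ready at once.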

#### instruction queue

| # | instruction                        |
|---|------------------------------------|
| 1 | mrmovq (%x04) → %x06               |
| 2 | mrmovq (%x05) → %x07               |
| 3 | addq %x01, %x02 $\rightarrow$ %x08 |
| 4 | addq %x01, %x06 → %x09             |
| 5 | addq %x01, %x07 → %x10             |

| reg  | status |
|------|--------|
| %x01 | ready  |
| %x02 | ready  |
| %x03 | ready  |
| %x04 | ready  |
| %x05 | ready  |
| %x06 |        |
| %x07 |        |
| %x08 |        |
| %x09 |        |
| %x10 |        |
| •••  |        |

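| execution unit | cycle# 1 | 2 | 3 | 4 | 5 | 6 | 7 |
|----------------|----------|---|---|---|---|---|---|
| ALU            |          |   |   |   |   |   |   |
| data cache     |          |   |   |   |   |   |   |

assume 1 cycle/access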

## register renaming: missing pieces

what about "hidden" inputs like %rsp, condition codes?

one solution: translate to instructions with additional register parameters

making %rsp explicit parameter

turning hidden condition codes into operands!

bonus: can also translate complex instructions to simpler ones





## an OOO pipeline diagram

```
[pipeline diagram, cycles 0-11: stages D, R, I, E, W shown per instruction]
addq %r01, %r05
addq %r02, %r05
addq %r03, %r04
cmpq %r04, %r08
jne ...
addq %r01, %r05
addq %r02, %r05
addq %r03, %r04
cmpq %r04, %r08
```

## execution units AKA functional units (1)

where actual work of instruction is done

e.g. the actual ALU, or data cache

sometimes pipelined:

(here: 1 op/cycle; 3 cycle latency)



exercise: how long to compute  $A \times (B \times (C \times D))$ ?
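One way to check an answer to the exercise is a small timing model. This is a sketch under assumptions that go beyond the slide: a single pipelined multiplier that can start one multiply per cycle with a 3-cycle latency, A through D all available at cycle 0, and multiplies started in the order the expression is written. `finish_time` and the tuple encoding of expressions are made up for illustration.

```python
LATENCY = 3   # cycles from starting a multiply until its result is usable

def finish_time(expr, state=None):
    """Cycle at which the value of expr is available.

    expr is a variable name (available at cycle 0) or a pair (left, right)
    meaning left * right. state["next_start"] is the earliest cycle at which
    the multiplier can accept another multiply.
    """
    if state is None:
        state = {"next_start": 0}
    if isinstance(expr, str):
        return 0
    left, right = expr
    operands_ready = max(finish_time(left, state), finish_time(right, state))
    start = max(operands_ready, state["next_start"])
    state["next_start"] = start + 1    # pipelined: one new multiply per cycle
    return start + LATENCY

chain = ("A", ("B", ("C", "D")))       # A x (B x (C x D)): each multiply waits for the previous one
balanced = (("A", "B"), ("C", "D"))    # (A x B) x (C x D): two multiplies overlap in the pipeline
print(finish_time(chain), finish_time(balanced))
```

The dependent chain cannot use the pipelining at all, while the regrouped product overlaps its first two multiplies.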

## execution units AKA functional units (2)

where actual work of instruction is done

e.g. the actual ALU, or data cache

sometimes unpipelined:



#### instruction queue

| #  | instruction              |
|----|--------------------------|
| 1  | add %x01, %x02 → %x03    |
| 2  | imul %x04, %x05 → %x06   |
| 3  | imul %x03, %x07 → %x08   |
| 4  | cmp %x03, %x08 → %x09.cc |
| 5  | jle %x09.cc,             |
| 6  | add %x01, %x03 → %x11    |
| 7  | imul %x04, %x06 → %x12   |
| 8  | imul %x03, %x08 → %x13   |
| 9  | cmp %x11, %x13 → %x14.cc |
| 10 | jle %x14.cc,             |
|    |                          |

| execution unit        |
|-----------------------|
| ALU 1 (add, cmp, jxx) |
| ALU 2 (add, cmp, jxx) |
| ALU 3 (mul) start     |
| ALU 3 (mul) end       |

| reg  | status  |
|------|---------|
| %x01 | ready   |
| %x02 | ready   |
| %x03 | pending |
| %x04 | ready   |
| %x05 | ready   |
| %x06 | pending |
| %x07 | ready   |
| %x08 | pending |
| %x09 | pending |
| %x10 | pending |
| %x11 | pending |
| %x12 | pending |
| %x13 | pending |
| %x14 | pending |

#### instruction queue

| #  | instruction                           |
|----|---------------------------------------|
| 1  | add %x01, %x02 → %x03                 |
| 2  | imul %x04, %x05 → %x06                |
| 3  | imul %x03, %x07 → %x08                |
| 4  | cmp %x03, %x08 → %x09.cc              |
| 5  | jle %x09.cc,                          |
| 6  | add %x01, %x03 → %x11                 |
| 7  | imul %x04, %x06 → %x12                |
| 8  | imul %x03, %x08 → %x13                |
| 9  | cmp %x11, %x13 → %x14.cc              |
| 10 | jle %x14.cc,                          |
|    |                                       |

| execution unit        | cycle# 1 |
|-----------------------|----------|
| ALU 1 (add, cmp, jxx) | 1        |
| ALU 2 (add, cmp, jxx) |          |
| ALU 3 (mul) start     | 2        |
| ALU 3 (mul) end       |          |

| reg  | status  |
|------|---------|
| %x01 | ready   |
| %x02 | ready   |
| %x03 | pending |
| %x04 | ready   |
| %x05 | ready   |
| %x06 | pending |
| %x07 | ready   |
| %x08 | pending |
| %x09 | pending |
| %x10 | pending |
| %x11 | pending |
| %x12 | pending |
| %x13 | pending |
| %x14 | pending |
| •••  |         |

| #            | instruction              |
|--------------|--------------------------|
| <del>1</del> | add %x01, %x02 → %x03    |
| <del>2</del> | imul %x04, %x05 → %x06   |
| 3            | imul %x03, %x07 → %x08   |
| 4            | cmp %x03, %x08 → %x09.cc |
| 5            | jle %x09.cc,             |
| 6            | add %x01, %x03 → %x11    |
| 7            | imul %x04, %x06 → %x12   |
| 8            | imul %x03, %x08 → %x13   |
| 9            | cmp %x11, %x13 → %x14.cc |
| 10           | jle %x14.cc,             |

| execution unit        | cycle# 1 | 2 | 3 |
|-----------------------|----------|---|---|
| ALU 1 (add, cmp, jxx) | 1        | 6 |   |
| ALU 2 (add, cmp, jxx) | _        | _ |   |
| ALU 3 (mul) start     | 2        | 3 |   |
| ALU 3 (mul) end       |          | 2 | 3 |

| reg  | status          |
|------|-----------------|
| %x01 | ready           |
| %x02 | ready           |
| %x03 | <del>pending</del> ready |
| %x04 | ready           |
| %x05 | ready           |
| %x06 | pending (still) |
| %x07 | ready           |
| %x08 | pending         |
| %x09 | pending         |
| %x10 | pending         |
| %x11 | pending         |
| %x12 | pending         |
| %x13 | pending         |
| %x14 | pending         |
| •••  |                 |

| #             | instruction              |
|---------------|--------------------------|
| •••           | •••                      |
| <del>1</del>  | add %x01, %x02 → %x03    |
| <del>2</del>  | imul %x04, %x05 → %x06   |
| <del>3</del>  | imul %x03, %x07 → %x08   |
| 4             | cmp %x03, %x08 → %x09.cc |
| 5             | jle %x09.cc,             |
| <del>6</del>  | add %x01, %x03 → %x11    |
| 7             | imul %x04, %x06 → %x12   |
| 8             | imul %x03, %x08 → %x13   |
| 9             | cmp %x11, %x13 → %x14.cc |
| 10            | jle %x14.cc,             |
| •••           | •••                      |

| execution unit        | cycle# 1 | 2 | 3 | 4 |
|-----------------------|----------|---|---|---|
| ALU 1 (add, cmp, jxx) | 1        | 6 | _ |   |
| ALU 2 (add, cmp, jxx) | _        | _ | _ |   |
| ALU 3 (mul) start     | 2        | 3 | 7 |   |
| ALU 3 (mul) end       |          | 2 | 3 | 7 |

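| reg  | status                   |
|------|--------------------------|
| %x01 | ready                    |
| %x02 | ready                    |
| %x03 | <del>pending</del> ready |
| %x04 | ready                    |
| %x05 | ready                    |
| %x06 | <del>pending</del> ready |
| %x07 | ready                    |
| %x08 | pending (still)          |
| %x09 | pending                  |
| %x10 | pending                  |
| %x11 | <del>pending</del> ready |
| %x12 | pending                  |
| %x13 | pending                  |
| %x14 | pending                  |
| •••  | •••                      |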

| #             | instruction              |
|---------------|--------------------------|
| •••           | •••                      |
| <del>1</del>  | add %x01, %x02 → %x03    |
| <del>2</del>  | imul %x04, %x05 → %x06   |
| <del>3</del>  | imul %x03, %x07 → %x08   |
| <del>4</del>  | cmp %x03, %x08 → %x09.cc |
| 5             | jle %x09.cc,             |
| <del>6</del>  | add %x01, %x03 → %x11    |
| <del>7</del>  | imul %x04, %x06 → %x12   |
| 8             | imul %x03, %x08 → %x13   |
| 9             | cmp %x11, %x13 → %x14.cc |
| 10            | jle %x14.cc,             |
| •••           | •••                      |

| execution unit        | cycle# 1 | 2 | 3 | 4 |
|-----------------------|----------|---|---|---|
| ALU 1 (add, cmp, jxx) | 1        | 6 | _ | 4 |
| ALU 2 (add, cmp, jxx) | _        | _ | _ | _ |
| ALU 3 (mul) start     | 2        | 3 | 7 | 8 |
| ALU 3 (mul) end       |          | 2 | 3 | 7 |

| reg  | status                   |
|------|--------------------------|
| %x01 | ready                    |
| %x02 | ready                    |
| %x03 | <del>pending</del> ready |
| %x04 | ready                    |
| %x05 | ready                    |
| %x06 | <del>pending</del> ready |
| %x07 | ready                    |
| %x08 | <del>pending</del> ready |
| %x09 | <del>pending</del> ready |
| %x10 | pending                  |
| %x11 | <del>pending</del> ready |
| %x12 | pending (still)          |
| %x13 | pending                  |
| %x14 | pending                  |
| •••  | •••                      |

| #             | instruction              |
|---------------|--------------------------|
| •••           | •••                      |
| <del>1</del>  | add %x01, %x02 → %x03    |
| <del>2</del>  | imul %x04, %x05 → %x06   |
| <del>3</del>  | imul %x03, %x07 → %x08   |
| <del>4</del>  | cmp %x03, %x08 → %x09.cc |
| <del>5</del>  | jle %x09.cc,             |
| <del>6</del>  | add %x01, %x03 → %x11    |
| <del>7</del>  | imul %x04, %x06 → %x12   |
| 8             | imul %x03, %x08 → %x13   |
| 9             | cmp %x11, %x13 → %x14.cc |
| 10            | jle %x14.cc,             |
| •••           | •••                      |

| execution unit        | cycle# 1 | 2 | 3 | 4 | 5 |
|-----------------------|----------|---|---|---|---|
| ALU 1 (add, cmp, jxx) | 1        | 6 | _ | 4 | 5 |
| ALU 2 (add, cmp, jxx) | _        | _ | _ | _ | _ |
| ALU 3 (mul) start     | 2        | 3 | 7 | 8 | _ |
| ALU 3 (mul) end       |          | 2 | 3 | 7 | 8 |

| reg  | status                   |
|------|--------------------------|
| %x01 | ready                    |
| %x02 | ready                    |
| %x03 | <del>pending</del> ready |
| %x04 | ready                    |
| %x05 | ready                    |
| %x06 | <del>pending</del> ready |
| %x07 | ready                    |
| %x08 | <del>pending</del> ready |
| %x09 | <del>pending</del> ready |
| %x10 | pending                  |
| %x11 | <del>pending</del> ready |
| %x12 | <del>pending</del> ready |
| %x13 | pending (still)          |
| %x14 | pending                  |
| •••  | •••                      |

| #             | instruction              |
|---------------|--------------------------|
| •••           | •••                      |
| <del>1</del>  | add %x01, %x02 → %x03    |
| <del>2</del>  | imul %x04, %x05 → %x06   |
| <del>3</del>  | imul %x03, %x07 → %x08   |
| <del>4</del>  | cmp %x03, %x08 → %x09.cc |
| <del>5</del>  | jle %x09.cc,             |
| <del>6</del>  | add %x01, %x03 → %x11    |
| <del>7</del>  | imul %x04, %x06 → %x12   |
| <del>8</del>  | imul %x03, %x08 → %x13   |
| 9             | cmp %x11, %x13 → %x14.cc |
| 10            | jle %x14.cc,             |
| •••           | •••                      |

| execution unit        | cycle# 1 | 2 | 3 | 4 | 5 |
|-----------------------|----------|---|---|---|---|
| ALU 1 (add, cmp, jxx) | 1        | 6 | _ | 4 | 5 |
| ALU 2 (add, cmp, jxx) | _        | _ | _ | _ | _ |
| ALU 3 (mul) start     | 2        | 3 | 7 | 8 | _ |
| ALU 3 (mul) end       |          | 2 | 3 | 7 | 8 |

| reg  | status                   |
|------|--------------------------|
| %x01 | ready                    |
| %x02 | ready                    |
| %x03 | <del>pending</del> ready |
| %x04 | ready                    |
| %x05 | ready                    |
| %x06 | <del>pending</del> ready |
| %x07 | ready                    |
| %x08 | <del>pending</del> ready |
| %x09 | <del>pending</del> ready |
| %x10 | pending                  |
| %x11 | <del>pending</del> ready |
| %x12 | <del>pending</del> ready |
| %x13 | <del>pending</del> ready |
| %x14 | pending                  |
| •••  | •••                      |

| #             | instruction              |
|---------------|--------------------------|
| •••           | •••                      |
| <del>1</del>  | add %x01, %x02 → %x03    |
| <del>2</del>  | imul %x04, %x05 → %x06   |
| <del>3</del>  | imul %x03, %x07 → %x08   |
| <del>4</del>  | cmp %x03, %x08 → %x09.cc |
| <del>5</del>  | jle %x09.cc,             |
| <del>6</del>  | add %x01, %x03 → %x11    |
| <del>7</del>  | imul %x04, %x06 → %x12   |
| <del>8</del>  | imul %x03, %x08 → %x13   |
| <del>9</del>  | cmp %x11, %x13 → %x14.cc |
| 10            | jle %x14.cc,             |

| execution unit        | cycle# 1 | 2 | 3 | 4 | 5 |
|-----------------------|----------|---|---|---|---|
| ALU 1 (add, cmp, jxx) | 1        | 6 | _ | 4 | 5 |
| ALU 2 (add, cmp, jxx) | _        | _ | _ | _ | _ |
| ALU 3 (mul) start     | 2        | 3 | 7 | 8 | _ |
| ALU 3 (mul) end       |          | 2 | 3 | 7 | 8 |

| reg  | status                   |
|------|--------------------------|
| %x01 | ready                    |
| %x02 | ready                    |
| %x03 | <del>pending</del> ready |
| %x04 | ready                    |
| %x05 | ready                    |
| %x06 | <del>pending</del> ready |
| %x07 | ready                    |
| %x08 | <del>pending</del> ready |
| %x09 | <del>pending</del> ready |
| %x10 | pending                  |
| %x11 | <del>pending</del> ready |
| %x12 | <del>pending</del> ready |
| %x13 | <del>pending</del> ready |
| %x14 | <del>pending</del> ready |
| •••  | •••                      |

| #             | instruction              |
|---------------|--------------------------|
| <del>1</del>  | add %x01, %x02 → %x03    |
| <del>2</del>  | imul %x04, %x05 → %x06   |
| <del>3</del>  | imul %x03, %x07 → %x08   |
| <del>4</del>  | cmp %x03, %x08 → %x09.cc |
| <del>5</del>  | jle %x09.cc,             |
| <del>6</del>  | add %x01, %x03 → %x11    |
| <del>7</del>  | imul %x04, %x06 → %x12   |
| <del>8</del>  | imul %x03, %x08 → %x13   |
| <del>9</del>  | cmp %x11, %x13 → %x14.cc |
| <del>10</del> | jle %x14.cc,             |
| •••           | •••                      |

| execution unit        | cycle# 1 | 2 | 3 | 4 | 5 |
|-----------------------|----------|---|---|---|---|
| ALU 1 (add, cmp, jxx) | 1        | 6 | _ | 4 | 5 |
| ALU 2 (add, cmp, jxx) | _        | _ | _ | _ | _ |
| ALU 3 (mul) start     | 2        | 3 | 7 | 8 | _ |
| ALU 3 (mul) end       |          | 2 | 3 | 7 | 8 |

| reg  | status                   |
|------|--------------------------|
| %x01 | ready                    |
| %x02 | ready                    |
| %x03 | <del>pending</del> ready |
| %x04 | ready                    |
| %x05 | ready                    |
| %x06 | <del>pending</del> ready |
| %x07 | ready                    |
| %x08 | <del>pending</del> ready |
| %x09 | <del>pending</del> ready |
| %x10 | pending                  |
| %x11 | <del>pending</del> ready |
| %x12 | <del>pending</del> ready |
| %x13 | <del>pending</del> ready |
| %x14 | <del>pending</del> ready |
| •••  | •••                      |
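
The issue order in the execution-unit tables can be reproduced with a small greedy simulation. This is only a sketch under assumptions the slides do not state outright: the oldest ready instruction is picked first, a result can be used in the cycle after it is produced (forwarding), `add`/`cmp`/`jxx` take one cycle on either ALU 1 or ALU 2, and the multiplier is a two-stage pipeline that accepts one new `imul` per cycle. The names `PROGRAM`, `simulate`, and `SCHEDULE` are made up for this example.

```python
import math

# (number, kind, source registers, destination register or None), copied from the table
PROGRAM = [
    (1, "add", ["x01", "x02"], "x03"),
    (2, "mul", ["x04", "x05"], "x06"),
    (3, "mul", ["x03", "x07"], "x08"),
    (4, "cmp", ["x03", "x08"], "x09"),
    (5, "jle", ["x09"], None),
    (6, "add", ["x01", "x03"], "x11"),
    (7, "mul", ["x04", "x06"], "x12"),
    (8, "mul", ["x03", "x08"], "x13"),
    (9, "cmp", ["x11", "x13"], "x14"),
    (10, "jle", ["x14"], None),
]
LATENCY = {"add": 1, "cmp": 1, "jle": 1, "mul": 2}  # assumed cycles until result is usable
UNITS_PER_CYCLE = {"alu": 2, "mul": 1}  # ALU 1 + ALU 2; one multiply may start per cycle


def simulate(program, ready_regs):
    """Return {cycle: [instruction numbers issued in that cycle]}."""
    usable_from = {reg: 1 for reg in ready_regs}  # first cycle a consumer may issue
    waiting = list(program)
    issued = {}
    cycle = 1
    while waiting:
        free = dict(UNITS_PER_CYCLE)
        started = []
        for instr in waiting:  # program order, so the oldest ready instruction wins
            num, kind, srcs, dst = instr
            unit = "mul" if kind == "mul" else "alu"
            operands_ready = all(usable_from.get(r, math.inf) <= cycle for r in srcs)
            if operands_ready and free[unit] > 0:
                free[unit] -= 1
                started.append(instr)
        for num, kind, srcs, dst in started:
            if dst is not None:
                usable_from[dst] = cycle + LATENCY[kind]  # mark 'pending' -> 'ready'
        waiting = [instr for instr in waiting if instr not in started]
        issued[cycle] = [num for num, *_ in started]
        cycle += 1
    return issued


SCHEDULE = simulate(PROGRAM, ["x01", "x02", "x04", "x05", "x07"])
for cycle, nums in SCHEDULE.items():
    print(cycle, nums)
```

Under these assumptions the first five cycles match the table (1 and 2, then 3 and 6, then 7, then 4 and 8, then 5); instructions 9 and 10 follow in cycles 6 and 7.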

## OOO limitations

can't always find instructions to run

plenty of instructions, but all depend on unfinished ones

programmer can adjust program to help this

need to track all uncommitted instructions

can only go so far ahead

e.g. Intel Skylake: 224-entry reorder buffer, 168 physical registers

branch misprediction has a big cost (relative to pipelined)

e.g. Intel Skylake: up to approx. 16 cycles (v. 2 for simple pipelined CPU)

## some performance examples

```
example1:
    movq $10000000000, %rax
loop1:
    addq %rbx, %rcx
    decq %rax
    jge loop1
    ret
```

about 30B instructions

my desktop: approx 2.65 sec

```
example2:
    movq $10000000000, %rax
loop2:
    addq %rbx, %rcx
    addq %r8, %r9
    decq %rax
    jge loop2
    ret
```

about 40B instructions

my desktop: approx 2.65 sec
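
A quick arithmetic check on the two timings. The iteration count and the 2.65 sec figure come from the slides; the names `ITERATIONS`, `SECONDS`, and `instrs_per_second` are made up for this sketch. The interpretation in the note afterwards is one plausible reading, not something the slides state.

```python
ITERATIONS = 10_000_000_000  # initial value of %rax in both examples
SECONDS = 2.65               # measured time reported for both loops


def instrs_per_second(instrs_per_iteration):
    """Instructions retired per second for a loop body of the given size."""
    return ITERATIONS * instrs_per_iteration / SECONDS


for name, body_size in [("example1", 3), ("example2", 4)]:
    rate = instrs_per_second(body_size)
    print(f"{name}: {rate / 1e9:.1f} billion instructions/sec")

print(f"iterations/sec: {ITERATIONS / SECONDS / 1e9:.2f} billion")
```

Both loops complete the same number of iterations per second, so the extra `addq %r8, %r9` appears to be free. A likely explanation is that it does not depend on the other instructions in the loop, so it can run on a spare ALU in parallel; the iteration rate would then be set by the dependency chains through `%rcx` and `%rax` rather than by instruction count.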

# data flow model and limits (1)



each yellow box = instruction

arrows = dependences

instructions only executed when dependencies ready

## reassociation

with pipelined, 5-cycle latency multiplier; how long does each take to compute?

$$((a \times b) \times c) \times d$$

$$(a \times b) \times (c \times d)$$

imulq %rbx, %rax
imulq %rcx, %rdx
imulq %rdx, %rax
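
One way to work out the answer is to track when each multiply can start. The sketch below assumes a single pipelined multiplier with the 5-cycle latency from the question, that at most one multiply starts per cycle, that multiplies start in the order listed, and that `a`, `b`, `c`, `d` are available at cycle 0. The names `MUL_LATENCY` and `finish_times` are made up for this example.

```python
MUL_LATENCY = 5  # cycles from start of a multiply until its result is usable


def finish_times(ops):
    """ops: list of (result name, [input names]) in issue order.

    Inputs that no listed op produces are treated as available at cycle 0.
    Returns {result name: cycle at which the result is available}.
    """
    done = {}
    next_start_slot = 0  # pipelined unit: one new multiply per cycle
    for name, inputs in ops:
        operands_ready = max((done.get(i, 0) for i in inputs), default=0)
        start = max(operands_ready, next_start_slot)
        done[name] = start + MUL_LATENCY
        next_start_slot = start + 1
    return done


# ((a * b) * c) * d: each multiply waits for the previous one
CHAIN = finish_times([("ab", ["a", "b"]), ("abc", ["ab", "c"]), ("abcd", ["abc", "d"])])

# (a * b) * (c * d): the first two multiplies overlap in the pipeline
TREE = finish_times([("ab", ["a", "b"]), ("cd", ["c", "d"]), ("abcd", ["ab", "cd"])])

print("chain:", CHAIN["abcd"], "cycles")
print("tree: ", TREE["abcd"], "cycles")
```

Under these assumptions the chained form takes 15 cycles and the reassociated form takes 11, since `c × d` starts one cycle after `a × b` and the final multiply waits only for the later of the two.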


## Intel Skylake OOO design

- 2015 Intel design codename 'Skylake'
- 94-entry instruction queue-equivalent
- 168 physical integer registers
- 168 physical floating point registers
- 4 ALU functional units (but some can handle more/different types of operations than others)
- 2 load functional units (but pipelined: supports multiple pending cache misses in parallel)
- 1 store functional unit
- 224-entry reorder buffer (determines how far ahead branch mispredictions, etc. can happen)

# backup slides

### indirect branch prediction

```
jmp *%rax or jmp *(%rax, %rcx, 8)
```

BTB can provide a prediction

but can do better with more context

example: predict based on other recent computed jumps

good for polymorphic method calls

table lookup with Hash(last few jmps) instead of Hash(this jmp)
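
A toy comparison of the two lookup schemes. This is a sketch, not how any real predictor is built: Python dictionaries stand in for fixed-size hardware tables, the "hash" is simply the dictionary's own hashing of the key, and the class names, the history length of 2, and the jump addresses are all made up for illustration.

```python
class LastTargetPredictor:
    """BTB-style: table indexed by the address of the jump itself."""

    def __init__(self):
        self.table = {}

    def predict(self, pc):
        return self.table.get(pc)

    def update(self, pc, target):
        self.table[pc] = target


class HistoryPredictor:
    """Table indexed by the targets of the last few computed jumps."""

    def __init__(self, history_len=2):
        self.table = {}
        self.history = (0,) * history_len

    def predict(self, pc):
        return self.table.get(self.history)

    def update(self, pc, target):
        self.table[self.history] = target
        self.history = self.history[1:] + (target,)


def count_correct(predictor, pc, targets):
    correct = 0
    for target in targets:
        if predictor.predict(pc) == target:
            correct += 1
        predictor.update(pc, target)
    return correct


# one call site whose target alternates, e.g. a method call on two object types
TARGETS = [0x400, 0x500] * 50
LAST_TARGET_CORRECT = count_correct(LastTargetPredictor(), 0x1000, TARGETS)
HISTORY_CORRECT = count_correct(HistoryPredictor(), 0x1000, TARGETS)
print(LAST_TARGET_CORRECT, HISTORY_CORRECT)
```

On this alternating pattern the last-target scheme always predicts the previous target and is never right, while the history-based scheme is correct once it has seen each history once.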

### an OOO pipeline diagram

```
cycle #
                0 1 2 3 4 5 6 7 8 9 10 11
addq %r01, %r05
                     RIEW
addq %r02, %r05
                         IEW
                     R
addq %r03, %r04
                    DRIE
cmpq %r04, %r08
                            IEW
jne ...
                              I E
                         R
                                  W
addq %r01, %r05
                       DRIE
                                W
addq %r02, %r05
                            RI
                                Ε
                                   W
addq %r03, %r04
                                IE
                         D
                           R
                                     W
cmpq %r04, %r08
                                   IEW
```

arch → phys reg for new instrs

| arch. reg | phys. reg |
|-----------|-----------|
| %rax      | %x12      |
| %rcx      | %x17      |
| %rbx      | %x13      |
| %rdx      | %x07      |
| •••       | •••       |

#### free list

| %x19 |  |
|------|--|
| %x23 |  |
| •••  |  |
| •••  |  |

#### reorder buffer (ROB)

| instr num. | PC     | dest. reg   | done? | mispred? / except? |
|------------|--------|-------------|-------|--------------------|
| 14         | 0x1233 | %rbx / %x23 |       |                    |
| 15         | 0x1239 | %rax / %x30 |       |                    |
| 16         | 0x1242 | %rcx / %x31 |       |                    |
| 17         | 0x1244 | %rcx / %x32 |       |                    |
| 18         | 0x1248 | %rdx / %x34 |       |                    |
| 19         | 0x1249 | %rax / %x38 |       |                    |
| 20         | 0x1254 | PC          |       |                    |
| 21         | 0x1260 | %rcx / %x17 |       |                    |
| •••        | •••    | •••         |       |                    |
| 31         | 0x129f | %rax / %x12 |       |                    |

reorder buffer contains instructions started, but not fully finished

new entries created on rename (not enough space? stall rename stage)

arch → phys reg for new instrs

| arch. reg | phys. reg |
|-----------|-----------|
| %rax      | %x12      |
| %rcx      | %x17      |
| %rbx      | %x13      |
| %rdx      | %x07      |
| •••       | •••       |

#### free list

| %x19 |  |
|------|--|
| %x23 |  |
| •••  |  |
| •••  |  |

#### reorder buffer (ROB)

|                         | instr num. | PC     | dest. reg   | done? | mispred? / except? |
|-------------------------|------------|--------|-------------|-------|--------------------|
| remove here on commit → | 14         | 0x1233 | %rbx / %x23 |       |                    |
|                         | 15         | 0x1239 | %rax / %x30 |       |                    |
|                         | 16         | 0x1242 | %rcx / %x31 |       |                    |
|                         | 17         | 0x1244 | %rcx / %x32 |       |                    |
|                         | 18         | 0x1248 | %rdx / %x34 |       |                    |
|                         | 19         | 0x1249 | %rax / %x38 |       |                    |
|                         | 20         | 0x1254 | PC          |       |                    |
|                         | 21         | 0x1260 | %rcx / %x17 |       |                    |
|                         | •••        | •••    | •••         |       |                    |
|                         | 31         | 0x129f | %rax / %x12 |       |                    |
| add here on rename →    |            |        |             |       |                    |

place newly started instruction at end of buffer

remember at least its destination register (both architectural and physical versions)

arch → phys reg for new instrs

| arch. reg | phys. reg            |
|-----------|----------------------|
| %rax      | %x12                 |
| %rcx      | %x17                 |
| %rbx      | %x13                 |
| %rdx      | <del>%x07</del> %x19 |
| •••       | •••                  |

#### free list

| %x19 |  |
|------|--|
| %x23 |  |
| •••  |  |
| •••  |  |

#### reorder buffer (ROB)

|                         | instr num. | PC     | dest. reg   | done? | mispred? / except? |
|-------------------------|------------|--------|-------------|-------|--------------------|
| remove here on commit → | 14         | 0x1233 | %rbx / %x23 |       |                    |
|                         | 15         | 0x1239 | %rax / %x30 |       |                    |
|                         | 16         | 0x1242 | %rcx / %x31 |       |                    |
|                         | 17         | 0x1244 | %rcx / %x32 |       |                    |
|                         | 18         | 0x1248 | %rdx / %x34 |       |                    |
|                         | 19         | 0x1249 | %rax / %x38 |       |                    |
|                         | 20         | 0x1254 | PC          |       |                    |
|                         | 21         | 0x1260 | %rcx / %x17 |       |                    |
|                         | •••        | •••    | •••         |       |                    |
|                         | 31         | 0x129f | %rax / %x12 |       |                    |
|                         | 32         | 0x1230 | %rdx / %x19 |       |                    |
| add here on rename →    |            |        |             |       |                    |

next renamed instruction goes in next slot, etc.
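
The rename step shown in these tables can be sketched as a few lines of bookkeeping. The register numbers and instruction 32 come from the slides; everything else is an assumption made for illustration: the function name `rename`, the dictionary layout of a ROB entry, the choice of `%rdx` as instruction 32's source register (the slides do not say what it reads), and the use of Skylake's 224 entries as the capacity.

```python
from collections import deque

ROB_CAPACITY = 224  # Skylake figure quoted in the slides; any fixed size would do

rename_map = {"rax": "x12", "rcx": "x17", "rbx": "x13", "rdx": "x07"}  # for new instrs
free_list = deque(["x19", "x23"])
rob = deque()  # oldest entry on the left, newest on the right


def rename(instr_num, pc, src_archs, dest_arch):
    """Rename one instruction; return its ROB entry, or None if rename must stall."""
    if len(rob) >= ROB_CAPACITY or not free_list:
        return None  # no ROB space or no free physical register: stall rename stage
    src_phys = [rename_map[reg] for reg in src_archs]  # look up sources first
    dest_phys = free_list.popleft()
    rename_map[dest_arch] = dest_phys  # later instructions now read the new register
    entry = {
        "num": instr_num,
        "pc": pc,
        "dest_arch": dest_arch,
        "dest_phys": dest_phys,
        "src_phys": src_phys,
        "done": False,
    }
    rob.append(entry)  # newly started instruction goes at the end of the buffer
    return entry


ENTRY = rename(32, 0x1230, ["rdx"], "rdx")  # source register is hypothetical
print(ENTRY)
print(rename_map, list(free_list))
```

After the call, `%rdx` maps to `%x19`, `%x19` is gone from the free list, and the old mapping `%x07` is still what instruction 32 itself reads.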

arch → phys. reg for new instrs

| arch. reg | phys. reg            |
|-----------|----------------------|
| %rax      | %x12                 |
| %rcx      | %x17                 |
| %rbx      | %x13                 |
| %rdx      | <del>%x07</del> %x19 |
| •••       | •••                  |

#### free list

| %x19 |  |
|------|--|
| %x13 |  |
| •••  |  |
| •••  |  |

#### reorder buffer (ROB)

|                         | instr num. | PC     | dest. reg   | done? | mispred? / except? |
|-------------------------|------------|--------|-------------|-------|--------------------|
| remove here on commit → | 14         | 0x1233 | %rbx / %x24 |       |                    |
|                         | 15         | 0x1239 | %rax / %x30 |       |                    |
|                         | 16         | 0x1242 | %rcx / %x31 | ✓     |                    |
|                         | 17         | 0x1244 | %rcx / %x32 |       |                    |
|                         | 18         | 0x1248 | %rdx / %x34 | ✓     |                    |
|                         | 19         | 0x1249 | %rax / %x38 | ✓     |                    |
|                         | 20         | 0x1254 | PC          |       |                    |
|                         | 21         | 0x1260 | %rcx / %x17 |       |                    |
|                         | •••        | •••    | •••         |       |                    |
|                         | 31         | 0x129f | %rax / %x12 |       | ✓                  |

instructions marked done in reorder buffer when computed

but not removed ('committed') yet

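arch → phys. reg for new instrs

| arch. reg | phys. reg            |
|-----------|----------------------|
| %rax      | %x12                 |
| %rcx      | %x17                 |
| %rbx      | %x13                 |
| %rdx      | <del>%x07</del> %x19 |
| •••       | •••                  |

arch → phys reg for committed

| arch. reg | phys. reg |
|-----------|-----------|
| %rax      | %x30      |
| %rcx      | %x28      |
| %rbx      | %x23      |
| %rdx      | %x21      |

#### free list

| %x19 |  |
|------|--|
| %x13 |  |
| •••  |  |

#### reorder buffer (ROB)

|                         | instr num. | PC     | dest. reg   | done? | mispred? / except? |
|-------------------------|------------|--------|-------------|-------|--------------------|
| remove here on commit → | 14         | 0x1233 | %rbx / %x24 |       |                    |
|                         | 15         | 0x1239 | %rax / %x30 |       |                    |
|                         | 16         | 0x1242 | %rcx / %x31 |       |                    |
|                         | 17         | 0x1244 | %rcx / %x32 |       |                    |
|                         | 18         | 0x1248 | %rdx / %x34 |       |                    |
|                         | 19         | 0x1249 | %rax / %x38 |       |                    |
|                         | 20         | 0x1254 | PC          |       |                    |
|                         | 21         | 0x1260 | %rcx / %x17 |       |                    |
|                         | •••        | •••    | •••         |       |                    |
|                         | 31         | 0x129f | %rax / %x12 |       |                    |

commit stage tracks architectural to physical register map for committed instructions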

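arch → phys. reg for new instrs

| arch. reg | phys. reg            |
|-----------|----------------------|
| %rax      | %x12                 |
| %rcx      | %x17                 |
| %rbx      | %x13                 |
| %rdx      | <del>%x07</del> %x19 |
| •••       | •••                  |

arch → phys reg for committed

| arch. reg | phys. reg            |
|-----------|----------------------|
| %rax      | %x30                 |
| %rcx      | %x28                 |
| %rbx      | <del>%x23</del> %x24 |
| %rdx      | %x21                 |

#### free list

| %x19 |  |
|------|--|
| %x13 |  |
| •••  |  |
| %x23 |  |

#### reorder buffer (ROB)

|                         | instr num. | PC     | dest. reg   | done? | mispred? / except? |
|-------------------------|------------|--------|-------------|-------|--------------------|
| remove here on commit → | 14         | 0x1233 | %rbx / %x24 |       |                    |
|                         | 15         | 0x1239 | %rax / %x30 |       |                    |
|                         | 16         | 0x1242 | %rcx / %x31 |       |                    |
|                         | 17         | 0x1244 | %rcx / %x32 |       |                    |
|                         | 18         | 0x1248 | %rdx / %x34 |       |                    |
|                         | 19         | 0x1249 | %rax / %x38 |       |                    |
|                         | 20         | 0x1254 | PC          |       |                    |
|                         | 21         | 0x1260 | %rcx / %x17 |       |                    |
|                         | •••        | •••    | •••         |       |                    |
|                         | 31         | 0x129f | %rax / %x12 |       |                    |
|                         | 32         | 0x1230 | %rdx / %x19 |       |                    |

when next-to-commit instruction is done:

update this register map and free register list

and remove instr. from reorder buffer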

arch → phys. reg for new instrs

| arch. reg | phys. reg            |
|-----------|----------------------|
| %rax      | %x12                 |
| %rcx      | %x17                 |
| %rbx      | %x13                 |
| %rdx      | <del>%x07</del> %x19 |
| •••       | •••                  |

arch → phys reg for committed

| arch. reg | phys. reg            |
|-----------|----------------------|
| %rax      | %x30                 |
| %rcx      | %x28                 |
| %rbx      | <del>%x23</del> %x24 |
| %rdx      | %x21                 |

#### free list

| %x19 |  |
|------|--|
| %x13 |  |
| •••  |  |
| %x23 |  |

#### reorder buffer (ROB)

|               | instr num. | PC     | dest. reg   | done? | mispred? / except? |
|---------------|------------|--------|-------------|-------|--------------------|
| remove here → | 15         | 0x1239 | %rax / %x30 |       |                    |
|               | 16         | 0x1242 | %rcx / %x31 |       |                    |
|               | 17         | 0x1244 | %rcx / %x32 |       |                    |
|               | 18         | 0x1248 | %rdx / %x34 |       |                    |
|               | 19         | 0x1249 | %rax / %x38 |       |                    |
|               | 20         | 0x1254 | PC          |       |                    |
|               | 21         | 0x1260 | %rcx / %x17 |       |                    |
|               | •••        | •••    | •••         |       |                    |
|               | 31         | 0x129f | %rax / %x12 |       |                    |
|               | 32         | 0x1230 | %rdx / %x19 |       |                    |

when next-to-commit instruction is done:

update this register map and free register list

and remove instr. from reorder buffer
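
The commit step described above, as a minimal sketch. The register numbers for instruction 14 and the `%rbx` mapping come from the slides; the function name `commit_ready`, the entry layout, and showing only part of the committed map are choices made for this example. Instruction 14 is assumed to be done, since that is the case the caption describes.

```python
from collections import deque

committed_map = {"rcx": "x28", "rbx": "x23", "rdx": "x21"}  # part of the slide's table
free_list = deque(["x19", "x13"])
rob = deque([
    {"num": 14, "dest_arch": "rbx", "dest_phys": "x24", "done": True},
    {"num": 15, "dest_arch": "rax", "dest_phys": "x30", "done": False},
])


def commit_ready():
    """Commit done instructions from the head of the ROB, in program order."""
    committed = []
    while rob and rob[0]["done"]:
        entry = rob.popleft()  # remove instr. from reorder buffer
        if entry["dest_arch"] is not None:
            old_phys = committed_map.get(entry["dest_arch"])
            committed_map[entry["dest_arch"]] = entry["dest_phys"]  # update register map
            if old_phys is not None:
                free_list.append(old_phys)  # old value can no longer be needed
        committed.append(entry["num"])
    return committed


COMMITTED = commit_ready()
print(COMMITTED, committed_map, list(free_list))
```

Only instruction 14 commits here: instruction 15 is not done, so it and everything after it stay in the buffer even if some later instructions have finished.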

arch → phys reg for new instrs

| arch. reg | phys. reg |
|-----------|-----------|
| %rax      | %x12      |
| %rcx      | %x17      |
| %rbx      | %x13      |
| %rdx      | %x19      |
| •••       | •••       |

arch → phys reg for committed

| arch. reg | phys. reg            |
|-----------|----------------------|
| %rax      | <del>%x30</del> %x38 |
| %rcx      | <del>%x31</del> %x32 |
| %rbx      | <del>%x23</del> %x24 |
| %rdx      | <del>%x21</del> %x34 |

#### reorder buffer (ROB)

|   | instr num. | PC     | dest. reg   | done? | mispred? / except? |
|---|------------|--------|-------------|-------|--------------------|
|   | 14         | 0x1233 | %rbx / %x24 | ✓     |                    |
|   | 15         | 0x1239 | %rax / %x30 | ✓     |                    |
|   | 16         | 0x1242 | %rcx / %x31 | ✓     |                    |
|   | 17         | 0x1244 | %rcx / %x32 | ✓     |                    |
|   | 18         | 0x1248 | %rdx / %x34 | ✓     |                    |
|   | 19         | 0x1249 | %rax / %x38 | ✓     |                    |
| → | 20         | 0x1254 | PC          | ✓     | ✓                  |
|   | 21         | 0x1260 | %rcx / %x17 |       |                    |
|   | •••        | •••    | •••         |       |                    |
|   | 31         | 0x129f | %rax / %x12 | ✓     |                    |
|   | 32         | 0x1230 | %rdx / %x19 |       |                    |

#### free list

| <del>%x19</del> |  |
|-----------------|--|
| %x13            |  |
| •••             |  |
| •••             |  |

when committing a mispredicted instruction...

this is where we undo mispredicted instructions
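
One way the undo could work, sketched with the values from the tables. This is an assumption about the mechanism, not something the slides spell out: every instruction after the mispredicted one is discarded, its physical register goes back to the free list, and the map for new instructions is overwritten with the committed map. The function name `commit_mispredicted`, the entry layout, and the starting free list are made up for this example.

```python
from collections import deque

rename_map = {"rax": "x12", "rcx": "x17", "rbx": "x13", "rdx": "x19"}     # for new instrs
committed_map = {"rax": "x38", "rcx": "x32", "rbx": "x24", "rdx": "x34"}  # after instr 19
free_list = deque(["x23"])  # hypothetical starting contents
rob = deque([
    {"num": 20, "dest_phys": None, "done": True, "mispredicted": True},   # the branch
    {"num": 21, "dest_phys": "x17", "done": False, "mispredicted": False},
    {"num": 31, "dest_phys": "x12", "done": True, "mispredicted": False},
    {"num": 32, "dest_phys": "x19", "done": False, "mispredicted": False},
])


def commit_mispredicted():
    """Commit the mispredicted branch at the ROB head and undo everything after it."""
    branch = rob.popleft()
    assert branch["done"] and branch["mispredicted"]
    squashed = []
    while rob:  # every remaining entry was fetched down the wrong path
        entry = rob.pop()
        if entry["dest_phys"] is not None:
            free_list.append(entry["dest_phys"])  # its result is never needed
        squashed.append(entry["num"])
    rename_map.clear()
    rename_map.update(committed_map)  # new instructions see only committed state
    return squashed


SQUASHED = commit_mispredicted()
print(SQUASHED, rename_map, list(free_list))
```

Waiting until the branch reaches the head of the buffer keeps this simple, since the committed map is then exactly the state just before the wrong-path instructions, but it delays recovery.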





### better? alternatives

can take snapshots of register map on each branch

don't need to reconstruct the table (but how to efficiently store them)

can reconstruct register map before we commit the branch instruction

need to let reorder buffer be accessed even more?

can track more/different information in reorder buffer
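
The snapshot alternative could look like the sketch below, which keeps a full copy of the register map per renamed branch. The storage scheme, the names `rename_branch` and `recover`, and the starting map (the committed values after instruction 19) are assumptions for illustration; the slides leave efficient storage as an open question.

```python
rename_map = {"rax": "x38", "rcx": "x32", "rbx": "x24", "rdx": "x34"}
snapshots = {}  # branch instruction number -> copy of rename_map when it was renamed


def rename_branch(instr_num):
    """Record the register map as it stands when the branch is renamed."""
    snapshots[instr_num] = dict(rename_map)


def recover(instr_num):
    """Restore the map saved at a mispredicted branch, without waiting for commit."""
    rename_map.clear()
    rename_map.update(snapshots[instr_num])
    for num in [n for n in snapshots if n >= instr_num]:
        del snapshots[num]  # this branch and any younger ones are resolved or squashed


rename_branch(20)
# instructions after the branch update the map as they are renamed
rename_map["rcx"] = "x17"  # instr 21
rename_map["rax"] = "x12"  # instr 31
rename_map["rdx"] = "x19"  # instr 32
recover(20)
print(rename_map)
```

The cost is one full map copy per in-flight branch, which is the storage question raised above.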





free regs

| X19 |
|-----|
| X23 |
| ••• |

for new instrs

| arch. reg | phys. reg |
|-----------|-----------|
| RAX       | X15       |
| RCX       | X17       |
| RBX       | X13       |
| RDX       | X07       |
| •••       | •••       |

for complete instrs

| arch. reg | phys. reg         |
|-----------|-------------------|
| RAX       | X21               |
| RCX       | <del>X2</del> X32 |
| RBX       | X48               |
| RDX       | X37               |
| instr num. | PC     | dest. reg | done? | except? |
|------------|--------|-----------|-------|---------|
| 17         | 0x1244 | RCX / X32 | ✓     |         |
| 18         | 0x1248 | RDX / X34 |       |         |
| 19         | 0x1249 | RAX / X38 | ✓     |         |
| 20         | 0x1254 | R8 / X05  |       |         |
| 21         | 0x1260 | R8 / X06  |       |         |
| •••        | •••    | •••       |       |         |



| free regs |
|-----------|
| X19       |
| X23       |
| •••       |

for new instrs

| arch. reg | phys. reg |
|-----------|-----------|
| RAX       | X15       |
| RCX       | X17       |
| RBX       | X13       |
| RBX       | X07       |
| •••       | •••       |

for complete instrs

| arch. reg | phys. reg         |
|-----------|-------------------|
| RAX       | X21               |
| RCX       | <del>X2</del> X32 |
| RBX       | X48               |
| RDX       | X37               |
| •••       | •••               |

| instr num. | PC                | dest. reg | done? | except? |
|------------|-------------------|-----------|-------|---------|
| 17         | <del>0x1244</del> | RCX / X32 | ✓     |         |
| 18         | 0x1248            | RDX / X34 |       |         |
| 19         | 0x1249            | RAX / X38 | ✓     |         |
| 20         | 0x1254            | R8 / X05  | ✓     | ✓       |
| 21         | 0x1260            | R8 / X06  |       |         |
| •••        |                   |           |       |         |



| free regs |
|-----------|
| X19       |
| X23       |
| •••       |

for new instrs

| arch. reg | phys. reg |
|-----------|-----------|
| RAX       | X15       |
| RCX       | X17       |
| RBX       | X13       |
| RBX       | X07       |
| •••       | •••       |

for complete instrs

| arch. reg | phys. reg         |
|-----------|-------------------|
| RAX       | X21 X38           |
| RCX       | <del>X2</del> X32 |
| RBX       | X48               |
| RDX       | X37 X34           |
| •••       | •••               |

| instr num. | PC                | dest. reg | done? | except? |
|------------|-------------------|-----------|-------|---------|
| 17         | <del>0x1244</del> | RCX / X32 | ✓     |         |
| 18         | 0x1248            | RDX / X34 | ✓     |         |
| 19         | 0x1249            | RAX / X38 | ✓     |         |
| 20         | 0x1254            | R8 / X05  | ✓     | ✓       |
| 21         | 0x1260            | R8 / X06  |       |         |
| •••        |                   |           |       |         |









| free regs |
|-----------|
| X19       |
| X23       |
| •••       |

for new instrs

| arch. reg | phys. reg |
|-----------|-----------|
| RAX       | X15       |
| RCX       | X17       |
| RBX       | X13       |
| RBX       | X07       |
| •••       | •••       |

for complete instrs

| arch. reg | phys. reg          |
|-----------|--------------------|
| RAX       | <del>X21</del> X38 |
| RCX       | <del>X2</del> X32  |
| RBX       | X48                |
| RDX       | <del>X37</del> X34 |
| •••       | •••                |

| instr num. | PC                | dest. reg | done? | except? |
|------------|-------------------|-----------|-------|---------|
| 17         | <del>0x1244</del> | RCX / X32 | ✓     |         |
| 18         | 0x1248            | RDX / X34 | ✓     |         |
| 19         | 0x1249            | RAX / X38 | ✓     |         |
| 20         | 0x1254            | R8 / X05  | ✓     | ✓       |
| 21         | 0x1260            | R8 / X06  |       |         |
| •••        |                   |           |       |         |

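The tables above can be read as an in-order commit loop over two register maps. Below is a rough sketch under several assumptions that are not from the slides: the entry layout and the name `commit_ready` are hypothetical, a flagged entry is treated as an exception (it and everything after it are squashed), and the previously committed physical register is returned to the free list on commit. Register values loosely follow the tables, starting after instruction 17 has committed.

```python
from collections import deque


def commit_ready(rob, committed_map, speculative_map, free_regs):
    """Commit done instructions in order; roll back at the first flagged one.

    Returns the number of the flagged instruction, or None if none was reached.
    """
    while rob and rob[0]["done"]:
        entry = rob[0]
        if entry["exc"]:
            # Squash this and all later instructions; their registers become free.
            free_regs.extend(e["phys"] for e in rob)
            rob.clear()
            # New instructions restart from the committed state.
            speculative_map.clear()
            speculative_map.update(committed_map)
            return entry["num"]
        rob.popleft()
        old = committed_map.get(entry["arch"])
        committed_map[entry["arch"]] = entry["phys"]
        if old is not None:
            free_regs.append(old)  # assumption: old committed value is now unused
    return None


rob = deque([
    {"num": 18, "arch": "RDX", "phys": "X34", "done": True, "exc": False},
    {"num": 19, "arch": "RAX", "phys": "X38", "done": True, "exc": False},
    {"num": 20, "arch": "R8", "phys": "X05", "done": True, "exc": True},
    {"num": 21, "arch": "R8", "phys": "X06", "done": False, "exc": False},
])
committed = {"RAX": "X21", "RCX": "X32", "RBX": "X48", "RDX": "X37"}  # for complete instrs
speculative = {"RAX": "X15", "RCX": "X17", "RBX": "X13"}              # for new instrs
free_regs = deque(["X19", "X23"])

flagged = commit_ready(rob, committed, speculative, free_regs)
```

In this run, instructions 18 and 19 commit (RDX becomes X34, RAX becomes X38), then instruction 20 triggers the rollback and the map for new instructions is overwritten with the committed one.
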
### handling memory accesses?

one idea:

list of done + uncommitted loads+stores

execute load early + double-check on commit
have data cache watch for changes to addresses on list
if changed, treat like branch misprediction

loads check list of stores so you read back own values
actually finish store on commit
maybe treat like branch misprediction if conflict?
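
A small sketch of the store half of this idea, assuming a dictionary stands in for the data cache. The class `StoreQueue` and its method names are hypothetical. It only shows loads reading back pending stores and stores finishing on commit; the double-check of early loads is left out.

```python
class StoreQueue:
    """Toy list of done-but-uncommitted stores (illustrative, single core)."""

    def __init__(self, memory):
        self.memory = memory   # dict: address -> value (stands in for the data cache)
        self.pending = []      # (address, value) pairs, oldest first

    def store(self, address, value):
        self.pending.append((address, value))  # not visible in memory yet

    def load(self, address):
        # Check pending stores newest-first so a load reads back our own latest value.
        for addr, value in reversed(self.pending):
            if addr == address:
                return value
        return self.memory.get(address, 0)

    def commit_oldest_store(self):
        addr, value = self.pending.pop(0)
        self.memory[addr] = value              # the store actually finishes here

    def squash(self):
        self.pending.clear()                   # e.g. after a branch misprediction


mem = {0x100: 1}
sq = StoreQueue(mem)
sq.store(0x100, 7)
early = sq.load(0x100)        # sees the pending store
before_commit = mem[0x100]    # memory itself is unchanged so far
sq.commit_oldest_store()
after_commit = mem[0x100]
```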

### the open-source BROOM pipeline



### data flow model and limits



#### better data-flow



how to (in hardware) connect A and B?

one wire carrying binary signals?









here: duplex via multiple wires (simplest scheme)
can achieve effect electrically/etc. via one wire
example: cable Internet (how is topic for ECE class)

how to connect?















# shared bus, really?

common for parts of internals of computers (topic later)

model for wifi
radio "channel" kinda similar to shared wire

how the early versions of Ethernet worked
"vampire taps" physically attached to shared cable

# shared bus, messages for who?



messages need a 'header' to say who they're to/from

everyone needs to filter out messages that aren't theirs

Figure 6-1: Data Link Layer Frame Format
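
As a rough illustration of the header idea, here is a sketch where every machine on the bus sees every frame and keeps only those addressed to it. The `Frame` fields and one-letter addresses are made up for the example and do not follow any real frame format.

```python
from dataclasses import dataclass


@dataclass
class Frame:
    dst: str      # header: who the message is to
    src: str      # header: who the message is from
    payload: str


def receive(my_address, frames_on_bus):
    """Every machine sees every frame on a shared bus; keep only our own."""
    return [f for f in frames_on_bus if f.dst == my_address]


bus = [
    Frame("B", "A", "hello B"),
    Frame("C", "A", "hello C"),
    Frame("B", "C", "hi"),
]
for_b = receive("B", bus)
```

Here `for_b` contains the two frames whose `dst` is `"B"`; the frame for `"C"` is filtered out.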

# taking turns on shared bus?

#### token ring

one machine has a 'token' = can send
send special message to pass to another machine

free-for-all: collision detection + retry
detect if you're transmitting when someone else is
wait (usually randomized amount of time) and retry

coordinating machine transmits timeslots
part of common cellphone design (TDMA: time division multiple access)

make bus support multiple transmitters?
requires understanding how interference works
another part of common cell phone design
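
The free-for-all scheme can be sketched as a slot-by-slot simulation. This is a simplified model under stated assumptions: time is divided into slots, each station has one message, and the random wait range `max_wait` is fixed (real Ethernet grows the range after repeated collisions). The function name `simulate` and its parameters are hypothetical.

```python
import random


def simulate(stations, rng, max_wait=4, max_slots=1000):
    """Return the order in which stations get their one message through."""
    wait = {s: 0 for s in stations}  # slots each station still waits before trying
    delivered = []
    for _ in range(max_slots):
        if not wait:
            break
        ready = [s for s, w in wait.items() if w == 0]
        for s in wait:
            if wait[s] > 0:
                wait[s] -= 1
        if len(ready) == 1:
            delivered.append(ready[0])   # sole transmitter: message gets through
            del wait[ready[0]]
        elif len(ready) > 1:
            for s in ready:              # collision detected by every transmitter
                wait[s] = rng.randint(0, max_wait)  # wait a random time, then retry
    return delivered


order = simulate(["A", "B", "C"], random.Random(0))
```

All three stations collide in the first slot; the randomized waits are what eventually let each one transmit alone.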



#### what does the hub do?

#### simple version:

imitate shared bus: copy messages to everyone else
something to handle two messages sent at once

#### less simple:

read "header" on message + send to destination only
requires some way to figure out destinations
queue of messages waiting to be sent
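
The two versions can be contrasted in a short sketch. Everything here is illustrative: `hub_forward` and `Switch` are hypothetical names, and the destination-to-port table is simply assumed to be known in advance, which sidesteps the "some way to figure out destinations" problem.

```python
from collections import deque


def hub_forward(in_port, message, ports):
    """Simple version: imitate a shared bus by copying to every other port."""
    return [(p, message) for p in ports if p != in_port]


class Switch:
    """Less simple: read the header, queue the message for its destination only."""

    def __init__(self, port_of):
        self.port_of = port_of  # destination address -> port (assumed known)
        self.queues = {p: deque() for p in set(port_of.values())}

    def accept(self, message):
        self.queues[self.port_of[message["dst"]]].append(message)

    def send_next(self, port):
        q = self.queues[port]
        return q.popleft() if q else None


copies = hub_forward(1, "hi", [1, 2, 3])

sw = Switch({"A": 1, "B": 2, "C": 3})
sw.accept({"dst": "B", "src": "A", "data": "x"})
sw.accept({"dst": "B", "src": "C", "data": "y"})  # waits in B's queue behind "x"
first = sw.send_next(2)
second = sw.send_next(2)
nothing = sw.send_next(3)
```

The per-port queue is what handles two messages arriving for the same destination at once: they are sent one after the other instead of colliding.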



# more complicated designs

hierarchies

networks of networks "internetworks"

so far still have single points of failure



# individual computers are networks

individual computers are (kinda) networks of...

processors
memories
I/O devices

so what topology (layout) do those networks have?

# the "bus"



# example: 80386 signal pins

| name      | purpose                | kind     |
|-----------|------------------------|----------|
| CLK2      | clock for bus          | timing   |
| W/R#      | write or read?         | metadata |
| D/C#      | data or control?       | metadata |
| M/IO#     | memory or I/O?         | metadata |
| INTR      | interrupt request      | metadata |
|           | other metadata signals | metadata |
| BE0#-BE3# | (4) byte enable        | address  |
| A2-A31    | (30) address bits      | address  |
| D0-D31    | (32) data signals      | data     |

# example: AMD EPYC (1 socket)



Fig. 21. Single-socket AMD EPYC™ system (SP3). Figure from Burd et al., "'Zeppelin': An SoC for Multichip Architectures" (IEEE JSSC Vol. 54, No. 1)

# example: Intel Skylake-SP



# extra trips to CPU


















