# Machine Scheduler

Fine grain resource allocation using ResourceSegments

Adam Nemet, Francesco Petrogalli (speaker), Francis Visoiu-Mistrih - Apple

### What is this all about

- MachineScheduler and SchedMachineModel
- No InstrItineraries
- Representation of hardware resources in the SchedMachineModel
- Improved estimates of execution traces
  - Better scheduling
- Ongoing effort

# Background information

Instruction is Ready @<cycle>

All input data needed by an instruction is ready


ADD r2, r1, r0

ADD r5, r4, r3

MUL r6, r5, r2


Instruction is Available @<cycle>

All hardware resources that execute the instruction are available
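The "Ready" half of this definition can be illustrated with a small sketch. It is not scheduler code: the latencies are hypothetical values chosen for illustration, the helper `ready_cycles` is an invented name, and resources are assumed never to be contended, so only data dependencies matter.

```python
# Hypothetical latencies, chosen only for illustration (not from the talk).
LATENCY = {"ADD": 1, "MUL": 3}


def ready_cycles(program):
    """Return, per instruction, the first cycle at which all its inputs exist.

    `program` is a list of (opcode, dest, sources). Each instruction is
    assumed to start as soon as it is ready; hardware resources are ignored.
    """
    produced_at = {}  # register -> cycle at which its value becomes available
    ready = []
    for opcode, dest, sources in program:
        # Registers never written in this block are treated as live-ins (cycle 0).
        cycle = max((produced_at.get(src, 0) for src in sources), default=0)
        ready.append(cycle)
        produced_at[dest] = cycle + LATENCY[opcode]
    return ready


program = [
    ("ADD", "r2", ("r1", "r0")),
    ("ADD", "r5", ("r4", "r3")),
    ("MUL", "r6", ("r5", "r2")),  # needs the results of both ADDs
]
ready = ready_cycles(program)
```

Under these assumptions both ADDs are ready at cycle 0, while the MUL only becomes ready once both ADD results exist. Whether a ready instruction is also available depends on the hardware resources, which is the subject of the rest of the talk.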




# All instructions used in this presentation are READY

```
ADD r2, r0, r1
ADD r4, r3, r2
```

### Focus on structural hazards



### Sequence of stages

Fetch







### Instructions used in this talk

### **ADD**

Fetch Decode Execute ADD ... Retire

#### **MADD**



# Let's schedule some code!

#### **MADD**



MADD r3, r2, r1, r0

MADD r7, r6, r5, r4

MADD r11, r10, r9, r8



























# Focus on the instruction-specific resources






## Pipelined execution

## Break up the execution in separate stages

#### Hardware feature



## Pipelined resources: faster execution!



## What happens when pipeline execution shares functional units?

## Reminder

#### Focus on the execution units



MADD r2, r1, r0
ADD r5, r4, r3
ADD r8, r7, r6







Postpone execution (stall)







```
def : WriteRes<WriteADD, [Adder]> {
    let ResourceCycles = [ 1];
}
```






```
ADD:  Adder (1 cycle)
MADD: Multiplier, Adder (1 cycle)
```

```
def : WriteRes<WriteADD, [Adder]> {
    let ResourceCycles = [ 1];
}
def : WriteRes<WriteMADD, [Multiplier, Adder]> {
    let ResourceCycles = [ 2, 3];
}
```












#### TableGen description

```
def : WriteRes<WriteADD, [Adder]> {
    let ResourceCycles = [ 1];
}
def : WriteRes<WriteMADD, [Multiplier, Adder]> {
    let ResourceCycles = [ 2, 3];
}
```

The Adder resource is overbooked for 2 extra cycles in the MADD instruction
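The arithmetic behind the "2 extra cycles" can be sketched as follows. The sketch assumes that each `ResourceCycles` entry books its resource from cycle 0, and that the hardware only needs the Adder during the last cycle of the MADD. The helper name `booked_intervals` and the `actual_adder_use` interval are illustrative assumptions, not LLVM code.

```python
def booked_intervals(resources, resource_cycles):
    """Book each resource for the half-open interval [0, cycles)."""
    return {res: (0, cycles) for res, cycles in zip(resources, resource_cycles)}


# Mirrors: WriteRes<WriteMADD, [Multiplier, Adder]> with ResourceCycles = [2, 3]
madd = booked_intervals(["Multiplier", "Adder"], [2, 3])

# Assumption for illustration: the Adder is really needed only in cycle 2.
actual_adder_use = (2, 3)

booked_start, booked_end = madd["Adder"]
needed = actual_adder_use[1] - actual_adder_use[0]
overbooked_cycles = (booked_end - booked_start) - needed
```

With these inputs the Adder is booked for 3 cycles but needed for 1, which gives the 2 overbooked cycles mentioned above.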





## LLVM estimation of execution with shared resources.

MADD r2, r1, r0

ADD r5, r4, r3

ADD r8, r7, r6









## Overbooking of resources leads to longer traces

#### What LLVM estimates

#### What hardware does










## Fine grain resolution of resource usage



```
def : WriteRes<WriteADD, [Adder]> {
    let ResourceCycles = [ 1];
}
def : WriteRes<WriteMADD, [Multiplier, Adder]> {
    let ResourceCycles = [ 2, 3];
}
```

```
ADD r2, r1, r0
    Adder      [0, 1)

MADD r2, r1, r0
    Multiplier [0, 2)
    Adder      [0, 3)
```




```
def : WriteRes<WriteADD, [Adder]> {
    let ResourceCycles = [ 1];
    let StartAtCycle = [0];
}
def : WriteRes<WriteMADD, [Multiplier, Adder]> {
    let ResourceCycles = [ 2, 3];
    let StartAtCycle = [0, 2];
}
```
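One way to read the two lists together is that each resource is booked for the half-open interval from its `StartAtCycle` entry to its `ResourceCycles` entry. The sketch below works under that reading; the function name `resource_segments` is hypothetical, and the default of all zeros models descriptions that do not set `StartAtCycle`.

```python
def resource_segments(resources, resource_cycles, start_at_cycle=None):
    """Turn per-resource (StartAtCycle, ResourceCycles) pairs into [start, end)."""
    if start_at_cycle is None:
        # A model that does not set StartAtCycle books everything from cycle 0.
        start_at_cycle = [0] * len(resources)
    segments = {}
    for res, start, end in zip(resources, start_at_cycle, resource_cycles):
        if start > end:
            raise ValueError(f"{res}: start cycle {start} is after end cycle {end}")
        segments[res] = (start, end)
    return segments


add_segments = resource_segments(["Adder"], [1], [0])
madd_coarse = resource_segments(["Multiplier", "Adder"], [2, 3])
madd_fine = resource_segments(["Multiplier", "Adder"], [2, 3], [0, 2])
```

Without `StartAtCycle` the MADD books the Adder for [0, 3). With `StartAtCycle = [0, 2]` the booking shrinks to [2, 3), leaving cycles 0 and 1 of the Adder free for other instructions.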

## Intermission

Advertise a new feature

#### From fancy tables...



#### ...to text tables!


# Debug messages generated by the compiler!

```
*** Final schedule for %bb.0 ***
* Schedule table (TopDown):
 i: issue
 x: resource booked
Cycle
MADD r2, r1, r0 | i |
     Multiplier | x | x
          Adder
ADD r5, r4, r3
          Adder
ADD r8, r7, r6
          Adder
```

## llc -misched-dump-schedule-trace

LIT unit tests for resource usage in scheduling models

```
# CHECK-LABEL: *** Final schedule for %bb.0 ***
# CHECK-NEXT: * Schedule table (TopDown):
# CHECK-NEXT: i: issue
# CHECK-NEXT: x: resource booked
# CHECK-NEXT: Cycle
# CHECK-NEXT: MADD r2, r1, r0
# CHECK-NEXT: Multiplier | x
# CHECK-NEXT: Adder
# CHECK-NEXT: ADD r5, r4, r3
# CHECK-NEXT: Adder
# CHECK-NEXT: ADD r8, r7, r6
# CHECK-NEXT: Adder
```

## Implementation

#### What changes in the code

#### TableGen representation and MachineScheduler

- TableGen:
  - list<int> StartAtCycle = []; added to the WriteRes class;
  - Backend changes in llvm/utils/TableGen/SubtargetEmitter.cpp
- MachineScheduler:
  - Data structure to handle intervals
  - New fine grain bookkeeping algorithm
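A data structure for such intervals could look like the sketch below. It is a minimal illustration and not the MachineScheduler implementation: the class name `ResourceIntervals` and its methods are assumptions. It keeps the bookings of one resource as sorted, disjoint half-open intervals, rejects double bookings, and merges bookings that touch.

```python
class ResourceIntervals:
    """Sorted, disjoint half-open intervals [start, end) booked on one resource."""

    def __init__(self):
        self.intervals = []

    def intersects(self, start, end):
        """True if [start, end) overlaps any interval that is already booked."""
        return any(s < end and start < e for s, e in self.intervals)

    def add(self, start, end):
        """Book [start, end); touching intervals are merged into one."""
        if start > end:
            raise ValueError("interval ends before it starts")
        if start == end:
            return  # zero-length usage books nothing
        if self.intersects(start, end):
            raise ValueError("resource is already booked in that interval")
        self.intervals.append((start, end))
        self.intervals.sort()
        merged = [self.intervals[0]]
        for s, e in self.intervals[1:]:
            last_start, last_end = merged[-1]
            if s == last_end:
                merged[-1] = (last_start, e)
            else:
                merged.append((s, e))
        self.intervals = merged


adder = ResourceIntervals()
adder.add(0, 1)  # an ADD at cycle 0
adder.add(1, 2)  # an ADD at cycle 1
adder.add(4, 5)  # Adder stage of a MADD, assuming it was issued at cycle 2
```

After these three bookings the Adder holds two disjoint intervals, [0, 2) and [4, 5), and the gap between them is what a fine grain bookkeeping algorithm can exploit.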

# Fine grain bookkeeping

Keeping track of resource intervals across the schedule

#### Current algorithm

ADD r2, r1, r0

ADD r5, r4, r3

MADD r9, r8, r7, r6

ADD r12, r11, r10

Last Seen

Multiplier

Adder

#### New algorithm

ADD r2, r1, r0

ADD r5, r4, r3

MADD r9, r8, r7, r6

ADD r12, r11, r10

All Seen

Multiplier

Adder


















#### Finds the gap in the disjoint interval!
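The difference between "Last Seen" and "All Seen" can be sketched with two small functions. Both are illustrative assumptions rather than the scheduler's code: the first only remembers when the resource was last released, the second scans every booked interval for the first gap that is wide enough. The example bookings assume two ADDs at cycles 0 and 1 followed by a MADD at cycle 2 whose Adder stage occupies [4, 5).

```python
def first_free_last_seen(intervals, from_cycle):
    """Current approach: the resource is free only after its last booking ends."""
    last_release = max((end for _, end in intervals), default=0)
    return max(from_cycle, last_release)


def first_free_all_seen(intervals, from_cycle, length):
    """New approach: first cycle >= from_cycle where `length` free cycles fit."""
    cycle = from_cycle
    for start, end in sorted(intervals):
        if cycle + length <= start:
            break  # the gap before this booking is wide enough
        cycle = max(cycle, end)  # otherwise move past this booking
    return cycle


# Adder bookings: two ADDs merged into [0, 2), then the MADD's Adder stage.
adder_booked = [(0, 2), (4, 5)]

# Next ADD needs the Adder for 1 cycle and cannot start before cycle 3
# (assumption: one instruction is issued per cycle).
old_estimate = first_free_last_seen(adder_booked, from_cycle=3)
new_estimate = first_free_all_seen(adder_booked, from_cycle=3, length=1)
```

Here the last-seen bookkeeping postpones the ADD to cycle 5, while the all-seen search places it at cycle 3, inside the gap left between the two bookings.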














#### Better estimate of execution.







## Performance improvements

#### Example 1: from 25 cycles to 12 cycles

#### **Top-down scheduling**

```
$x10 = ADD $x9, $x9
$x12 = SUB $x11, $x11
$x16 = SLL $x15, $x15
$x14 = MUL $x13, $x13
$x18 = SRL $x17, $x17

test001:%bb.0
*** Final schedule for %bb.0 ***
 * Schedule table (TopDown):
  i: issue
  x: resource booked
Cycle
ADD
      ResX0
      ResX1
      ResX2
      ResX3
      ResX4
SUB
      ResX2
      ResX3
      ResX4
      ResX0
      ResX1
SLL
      ResX1
      ResX2
      ResX3
      ResX4
      ResX0
MUL
      ResX4
      ResX0
      ResX1
      ResX2
      ResX3
SRL
      ResX3
      ResX4
      ResX0
      ResX1
      ResX2
```

liveins: \$x9, \$x11, \$x13, \$x15, \$x17, \$x19

#### Example 2: from 17 cycles to 7 cycles

#### **Bottom-up scheduling**

```
bb.0:
  $x10 = ADD $x9, $x9
  $x12 = SUB $x11, $x11
  $x14 = MUL $x13, $x13
  $x16 = SLL $x15, $x15
  $x18 = SRL $x17, $x17
  $x20 = DIV $x19, $x19
```

```
test001:%bb.0
*** Final schedule for %bb.0 ***
 * Schedule table (BottomUp):
  i: issue
  x: resource booked
```

| Cycle | 5 | 4 | 3 | 2 | 1 | 0 | -1 |
|-------|---|---|---|---|---|---|----|
| ADD   | i |   |   |   |   |   |    |
| ResX0 | x |   |   |   |   |   |    |
| ResX1 |   | x | x | x | x |   |    |
| ResX3 |   |   |   |   |   | x |    |
| ResX2 |   |   |   |   |   |   | x  |
| SUB   |   | i |   |   |   |   |    |
| ResX2 |   | x |   |   |   |   |    |
| ResX0 |   |   | x |   |   |   |    |
| MUL   |   |   | i |   |   |   |    |
| ResX3 |   |   | x |   |   |   |    |
| ResX0 |   |   |   | x |   |   |    |
| ResX2 |   |   |   |   | x |   |    |
| SLL   |   |   |   | i |   |   |    |
| ResX2 |   |   |   | x |   |   |    |
| ResX0 |   |   |   |   | x | x |    |
| SRL   |   |   |   |   | i |   |    |
| ResX3 |   |   |   |   | x |   |    |
| ResX2 |   |   |   |   |   | x |    |
| DIV   |   |   |   |   |   | i |    |
| ResX1 |   |   |   |   |   | x |    |
| ResX0 |   |   |   |   |   |   | x  |

liveins: \$x9, \$x11, \$x13, \$x15, \$x17, \$x19

#### Average improvements

#### Artificial test cases (LIT)

| TEST       | Top-down, current (cycles) | Top-down, new (cycles) | Bottom-up, current (cycles) | Bottom-up, new (cycles) |
|------------|----------------------------|------------------------|-----------------------------|-------------------------|
| test-001   | 4                          | 4                      | 4                           | 4                       |
| test-002   | 9                          | 5                      | 9                           | 5                       |
| test-003   | 7                          | 4                      | 7                           | 4                       |
| test-004   | 7                          | 4                      | 7                           | 4                       |
| test-005   | 9                          | 6                      | 9                           | 6                       |
| test-006   | 16                         | 7                      | 17                          | 7                       |
| test-007.A | 25                         | 12                     | 25                          | 12                      |
| test-007.B | 25                         | 9                      | 25                          | 9                       |
| test-008   | 12                         | 8                      | 12                          | 6                       |
| test-009   | 9                          | 5                      | 9                           | 5                       |
| test-010   | 11                         | 8                      | 13                          | 11                      |
| test-012   | N/A                        | N/A                    | 12                          | 8                       |
| TOTAL      | 134                        | 72                     | 149                         | 81                      |

New/Current: 0.53731 (top-down), 0.54362 (bottom-up)
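The totals and ratios can be re-derived from the per-test cycle counts in the table. The sketch below copies those counts in table order; test-012 has no top-down entry, so it only appears in the bottom-up lists. The variable and function names are illustrative.

```python
# Cycle counts copied from the table above, in table order.
top_down_current = [4, 9, 7, 7, 9, 16, 25, 25, 12, 9, 11]
top_down_new = [4, 5, 4, 4, 6, 7, 12, 9, 8, 5, 8]
bottom_up_current = [4, 9, 7, 7, 9, 17, 25, 25, 12, 9, 13, 12]
bottom_up_new = [4, 5, 4, 4, 6, 7, 12, 9, 6, 5, 11, 8]


def new_over_current(current, new):
    """Ratio of total estimated cycles, rounded as on the slide."""
    return round(sum(new) / sum(current), 5)


top_down_ratio = new_over_current(top_down_current, top_down_new)
bottom_up_ratio = new_over_current(bottom_up_current, bottom_up_new)
```

Both ratios come out slightly above one half, so on these artificial test cases the estimated traces are a little under half as long with the new bookkeeping.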

### Recap

#### Better scheduling

...and testing!






#### llc -misched-dump-schedule-trace

#### What's next

#### Adoption steps

- All current models default to StartAtCycle = [0, ..., 0].
- The aim is to replace the current bookkeeping in the machine scheduler with the new one.
- A bit switch in the scheduling model class enables the new code path.
- Further investigations:
  - A few CodeGen issues (the new algorithm seems to find gaps that could not be found before).
  - Compile time (threshold of 10 intervals per resource).
- Work is ongoing, but WIP patches are up for review, feedback, and trying out.
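The compile-time threshold can be pictured as a cap on how many intervals are remembered per resource. The sketch below is one possible reading, not the patch itself: it assumes that intervals arrive in scheduling order and that the oldest ones are dropped once the cap is reached, which the slides do not spell out. The class name `BoundedIntervalHistory` is hypothetical.

```python
from collections import deque


class BoundedIntervalHistory:
    """Remember at most `cutoff` intervals for one resource."""

    def __init__(self, cutoff=10):
        if cutoff <= 0:
            raise ValueError("cutoff must be positive")
        # A deque with maxlen silently discards the oldest entry when full.
        self.intervals = deque(maxlen=cutoff)

    def add(self, start, end):
        self.intervals.append((start, end))


history = BoundedIntervalHistory(cutoff=10)
for n in range(12):
    history.add(2 * n, 2 * n + 1)  # twelve 1-cycle bookings with gaps between them
```

After twelve bookings only the ten most recent intervals remain, so any gap search scans at most ten entries per resource, at the cost of forgetting gaps in the discarded part of the history.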

#### Reviews on Phabricator

#### Feedback is welcome!

- D150310: Adding StartAtCycle to WriteRes (NFC)
- D150311: Schedule traces in debug
- D150312: Modify MachineScheduler to use StartAtCycle

# Thank you!

Questions?