## GlobalISel

Past, Present, and Future

Ahmed Bougacha and Quentin Colombet, LLVM DevMtg, October 2017

## GlobalISel Recap

Revamp of our instruction selection framework

LLVM Dev Meeting talks:

- https://www.youtube.com/watch?v=F6GGbYtae3g
- https://www.youtube.com/watch?v=6tfb344A7w8

LLVM Dev RFCs:

- http://lists.llvm.org/pipermail/llvm-dev/2013-August/064696.html
- http://lists.llvm.org/pipermail/llvm-dev/2015-November/092566.html

# GlobalISel Pipeline

Generic MachineInstr

MachineInstr

# Past

#### More Concise Type System

#### Displacement

```
%0(_,p0) = COPY %x0
%1(_,s64) = G_PTRTOINT %0
%2(_,s64) = G_CONSTANT 4
%3(_,s64) = G_ADD %1, %2
%4(_,p0) = G_INTTOPTR %3
%5(_,s32) = G_LOAD %4(_,p0)
```

```
%0(_,p0) = COPY %x0
%1(_,s64) = G_CONSTANT 4
%4(_,p0) = G_GEP %0, %1
%5(_,s32) = G_LOAD %4(_,p0)
```

```
%0(_,p0) = COPY %x0
%1(_,s64) = G_PTRTOINT %0
%2(_,s64) = G_CONSTANT ~3
%3(_,s64) = G_AND %1, %2
%4(_,p0) = G_INTTOPTR %3
```

```
%0(_,p0) = COPY %x0
%4(_,p0) = G_PTR_MASK %0, 2
```
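To make the equivalence between the long and short forms above concrete, here is a small C++ sketch. It is not LLVM code: it models a `p0` pointer as a plain 64-bit integer, the function names `gep`, `maskViaInt`, and `ptrMask` are made up for illustration, and it assumes that `G_PTR_MASK %0, 2` clears the low 2 bits, which is what the `G_AND` with `~3` does.

```cpp
#include <cassert>
#include <cstdint>

// Hypothetical model: a p0 pointer is just a 64-bit integer here.
using Ptr = std::uint64_t;

// Mirrors G_GEP %0, %1: add a displacement without leaving the pointer type.
Ptr gep(Ptr base, std::uint64_t offset) { return base + offset; }

// Mirrors the G_PTRTOINT / G_AND / G_INTTOPTR sequence with the constant ~3.
Ptr maskViaInt(Ptr p) {
  std::uint64_t asInt = p;                           // G_PTRTOINT
  std::uint64_t masked = asInt & ~std::uint64_t{3};  // G_AND with ~3
  return masked;                                     // G_INTTOPTR
}

// Mirrors G_PTR_MASK %0, N, assuming it clears the low N bits of the pointer.
Ptr ptrMask(Ptr p, unsigned lowBits) {
  assert(lowBits < 64 && "shift amount must stay below the pointer width");
  return p & ~((std::uint64_t{1} << lowBits) - 1);
}
```

Under these assumptions, `ptrMask(p, 2)` and `maskViaInt(p)` agree for every `p`, so the single pointer-typed instruction can stand in for the three-instruction round trip through `s64`.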



Easier Bring Up

[Figure: the functions of a module, @fctI ... @fctN]

-global-isel-abort=0 or -global-isel-abort=2 (for warnings)

```
fctI:
... = op1 ...
... = opN ...
```

```
fctI:
... = legOp1 ...
... = legOpN ...
```

[Figure: in the last build steps, fctI is marked with an X]







### What Changed?

- CallLowering API
- More instructions supported



`%6(_,s128) = G_OR %4, %5`

```
%6.low(_,s64) = G_OR %4.low, %5.low
%6.high(_,s64) = G_OR %4.high, %5.high
```

```
%7(_,s64) = G_OR %4.low, %5.low
%8(_,s64) = G_OR %4.high, %5.high
%6(_,s128) = G_SEQUENCE %7, %8
```

```
%0(_,s64), %1(_,s64) = G_EXTRACT %4, 0, 64
%2(_,s64), %3(_,s64) = G_EXTRACT %5, 0, 64
%7(_,s64) = G_OR %0, %2
%8(_,s64) = G_OR %1, %3
%6(_,s128) = G_SEQUENCE %7, %8
```

```
%0(_,s64), %1(_,s64) = G_EXTRACT %4, 0, 64
%2(_,s64), %3(_,s64) = G_EXTRACT %5, 0, 64
%7(_,s64) = G_OR %0, %2
%8(_,s64) = G_OR %1, %3
%6(_,s128) = G_MERGE_VALUES %7, %8
```

#### Legalization Artifacts

Simpler, More Regular Semantics

```
%0(_,s64), %1(_,s64) = G_UNMERGE_VALUES %4
%2(_,s64), %3(_,s64) = G_UNMERGE_VALUES %5
%7(_,s64) = G_OR %0, %2
%8(_,s64) = G_OR %1, %3
%6(_,s128) = G_MERGE_VALUES %7, %8
```

- Merged (or unmerged) pieces all have the same type
- G\_SEQUENCEs are gone
- G\_EXTRACTs still exist but have only one result
- G\_[UN]MERGE\_VALUES is used instead for these use cases
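The narrowed `G_OR` above can be mimicked in plain C++. This is only a sketch: `S128`, `unmergeValues`, `mergeValues`, and `narrowedOr` are hypothetical names, and it assumes the first piece holds the low 64 bits.

```cpp
#include <cstdint>
#include <utility>

// Hypothetical stand-in for an s128 value: two s64 halves, low half first.
struct S128 {
  std::uint64_t lo;
  std::uint64_t hi;
};

// G_UNMERGE_VALUES: split one s128 into two s64 pieces of the same type.
std::pair<std::uint64_t, std::uint64_t> unmergeValues(S128 v) {
  return std::make_pair(v.lo, v.hi);
}

// G_MERGE_VALUES: concatenate two s64 pieces back into one s128.
S128 mergeValues(std::uint64_t lo, std::uint64_t hi) { return S128{lo, hi}; }

// Narrowed form of `%6(_,s128) = G_OR %4, %5`.
S128 narrowedOr(S128 a, S128 b) {
  std::pair<std::uint64_t, std::uint64_t> pa = unmergeValues(a);  // %0, %1
  std::pair<std::uint64_t, std::uint64_t> pb = unmergeValues(b);  // %2, %3
  std::uint64_t low = pa.first | pb.first;     // %7 = G_OR %0, %2
  std::uint64_t high = pa.second | pb.second;  // %8 = G_OR %1, %3
  return mergeValues(low, high);               // %6 = G_MERGE_VALUES %7, %8
}
```

Each helper has one job and same-typed pieces, which is the regularity the bullets describe: no offsets to carry around, unlike the multi-result `G_EXTRACT` and `G_SEQUENCE` forms.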

How Does It Work?

Legalization actions:

- Legal
- NarrowScalar
- WidenScalar
- FewerElements
- MoreElements
- Lower
- Libcall
- Custom
- Unsupported
- NotFound

```
%0(_,s16) = G_UREM %1, %2
```

After WidenScalar:

```
%3(_,s32) = G_ZEXT %1
%4(_,s32) = G_ZEXT %2
%5(_,s32) = G_UREM %3, %4
%0(_,s16) = G_TRUNC %5
```

After Lower:

```
%3(_,s32) = G_ZEXT %1
%4(_,s32) = G_ZEXT %2
%6(_,s32) = G_UDIV %3, %4
%7(_,s32) = G_MUL %6, %4
%5(_,s32) = G_SUB %3, %7
%0(_,s16) = G_TRUNC %5
```

Targets pick the action for each opcode and type:

`setAction({Opcode, [OpIdx,] Type}, Action)`
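As a rough illustration of what a `setAction` table amounts to, here is a toy lookup structure in C++. It is not the LLVM `LegalizerInfo` class: `ToyLegalizerInfo`, its string keys, and `makeExampleInfo` are all hypothetical, and the actions chosen for `G_UREM`, `G_UDIV`, and `G_MUL` are assumptions picked to match the `G_UREM` walkthrough in these slides.

```cpp
#include <map>
#include <string>
#include <tuple>

// The actions listed on the slide.
enum class Action {
  Legal, NarrowScalar, WidenScalar, FewerElements, MoreElements,
  Lower, Libcall, Custom, Unsupported, NotFound
};

// Hypothetical toy table keyed on {opcode, operand index, type}.
class ToyLegalizerInfo {
public:
  void setAction(const std::string &opcode, unsigned opIdx,
                 const std::string &type, Action action) {
    table[std::make_tuple(opcode, opIdx, type)] = action;
  }

  // Anything that was never registered reports NotFound.
  Action getAction(const std::string &opcode, unsigned opIdx,
                   const std::string &type) const {
    auto it = table.find(std::make_tuple(opcode, opIdx, type));
    return it == table.end() ? Action::NotFound : it->second;
  }

private:
  std::map<std::tuple<std::string, unsigned, std::string>, Action> table;
};

// Assumed configuration mirroring the G_UREM example.
ToyLegalizerInfo makeExampleInfo() {
  ToyLegalizerInfo info;
  info.setAction("G_UREM", 0, "s16", Action::WidenScalar);
  info.setAction("G_UREM", 0, "s32", Action::Lower);
  info.setAction("G_UDIV", 0, "s32", Action::Libcall);
  info.setAction("G_MUL", 0, "s32", Action::Legal);
  return info;
}
```

A legalizer built on such a table would keep querying it until every instruction reports `Legal`; the sketch covers only the lookup step.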

After Libcall:

```
%3(_,s32) = G_ZEXT %1
%4(_,s32) = G_ZEXT %2
%r0 = COPY %3
%r1 = COPY %4
%r0 = BLX uidivmod %r0, %r1
%6(_,s32) = COPY %r0
%7(_,s32) = G_MUL %6, %4
%5(_,s32) = G_SUB %3, %7
%0(_,s16) = G_TRUNC %5
```
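The fully legalized sequence can be checked against an ordinary 16-bit remainder with a short C++ model. This is a sketch under assumptions: `udivLibcall` is a hypothetical stand-in for the `BLX uidivmod` call and is assumed to return the quotient, and the divisor must be non-zero.

```cpp
#include <cassert>
#include <cstdint>

// Hypothetical stand-in for the libcall; assumed to return the quotient.
std::uint32_t udivLibcall(std::uint32_t n, std::uint32_t d) {
  assert(d != 0 && "division by zero is not modelled");
  return n / d;
}

// Mirrors the legalized form of `%0(_,s16) = G_UREM %1, %2`.
std::uint16_t legalizedURem16(std::uint16_t a, std::uint16_t b) {
  std::uint32_t wideA = a;                        // %3 = G_ZEXT %1
  std::uint32_t wideB = b;                        // %4 = G_ZEXT %2
  std::uint32_t quot = udivLibcall(wideA, wideB); // %6 = COPY %r0 after BLX
  std::uint32_t prod = quot * wideB;              // %7 = G_MUL %6, %4
  std::uint32_t rem = wideA - prod;               // %5 = G_SUB %3, %7
  return static_cast<std::uint16_t>(rem);         // %0 = G_TRUNC %5
}
```

For any non-zero `b`, the result should equal `a % b`: zero-extension preserves the unsigned values, and the remainder always fits back into 16 bits.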





## What Changed?

- Partial TableGen support
- Statically allocated structures

```
bool AArch64DAGToDAGISel::
    SelectArithImmed(
    SDValue N, [...])
```

```
InstructionSelector::ComplexRendererFn
AArch64InstructionSelector::
    selectArithImmed(
        MachineOperand &Root) const
```

- Non-optional library
- Ports in progress:

X86 AMDGPU AArch64 ARM Hexagon PowerPC Mips SystemZ WebAssembly AVR NVPTX Sparc BPF XCore Lanai RISCV MSP430 Nios2 ARC

Status: AArch64 -O0

- Test Suite + SPEC
- Clang self-host
- Our internal software

Status: Select TableGen

[Figure: % imported patterns for AArch64, ARM, and X86]



# Performance

#### Code size

[Figure: code size]

#### Constant placement

[Figures: constant placement with -regalloc=fast; constant placement with the Localizer]

[Figures: code size before; code size with localizer]

#### Compile time

[Figures: compile time; compile time per pass, normalized]

#### Runtime

[Figures: runtime on SPEC2006; runtime on the Test Suite]



- Flesh out the pass pipeline
- Kill Localizer
- Kill CodeGenPrepare

### Future: Beyond the SelectionDAG GlobalISelEmitter

- Emit legality info

```
def : Pat<
  (i32 (mul GPR32:$Rn, GPR32:$Rm)),
  (...)
>;
```

```
setAction({G_MUL, 0, s32}, Legal)
```

- Emit register bank mapping
- Support pure GlobalISel patterns

### Future: Beyond SelectionDAG

- Feature parity (remember vectors?)
- Improved performance
- More testing!

```
def : Pat<
  (f32 (fma (fneg FPR32:$Rn), FPR32:$Rm, FPR32:$Ra)),
  (FMSUBSrrr FPR32:$Rn, FPR32:$Rm, FPR32:$Ra)
>;
```

```
def Mul2ToShl1 : Combine<
    (mul type0:$src0, 2),
    (shl type0:$src0, 1),
>;
```

```
def : RunWhen<Mul2ToShl1,
              [BeforeLegalized,
               AfterLegalized]>
```

```
def MyPattern : (MyOp(...));

def : RunWhen<MyPattern,
              [AfterRegBankSelect]>
```

[Figure labels: TableGen Backend; Target Config .td; InstCombine .inc; GISel Combine .inc; Target Combine .inc]
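To show what a rule like `Mul2ToShl1` would do once applied, here is a toy rewrite in C++. It is a sketch only: `Inst`, `Opcode`, `combineMul2ToShl1`, and `evaluate` are hypothetical names, the instruction is reduced to "opcode, source, immediate", and none of this reflects how a generated combiner would actually be structured.

```cpp
#include <cassert>
#include <cstdint>

enum class Opcode { Mul, Shl, Add };

// Hypothetical toy instruction of the form `op $src0, imm`.
struct Inst {
  Opcode op;
  std::uint64_t imm;
};

// Applies the Mul2ToShl1 rewrite in place; returns true if the rule matched.
bool combineMul2ToShl1(Inst &inst) {
  if (inst.op != Opcode::Mul || inst.imm != 2)
    return false;          // pattern (mul $src0, 2) did not match
  inst.op = Opcode::Shl;   // replace with (shl $src0, 1)
  inst.imm = 1;
  return true;
}

// Evaluates the toy instruction on a concrete 64-bit source value.
std::uint64_t evaluate(const Inst &inst, std::uint64_t src0) {
  switch (inst.op) {
  case Opcode::Mul:
    return src0 * inst.imm;
  case Opcode::Shl:
    assert(inst.imm < 64 && "shift amount must stay below the bit width");
    return src0 << inst.imm;
  case Opcode::Add:
    return src0 + inst.imm;
  }
  return 0;
}
```

Evaluating an instruction before and after the rewrite gives the same value for any source, including values that wrap around, since both forms are computed modulo 2^64.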

Tools to Help the Transition

- Better .mir test format
- Better MachineInstr API
- Regression tests generator
- SDNode to GMIR migrator

### Contributions Welcome!

### Questions?