

- The machine description describes valid instructions for a machine.
- It is implemented using a YACC grammar and an associated group of semantic actions.
- Typical organization:
  - instructions
  - effects (register transfers) of instructions
  - operands of instructions



- Used as a recognizer any time a new instruction is created or modified.
  - instruction selection
  - common subexpression elimination

```
R[r[13]+i.]=r[7];
R[r[13]+i.]=R[r[13]+i.]+1;
=>
R[r[13]+i.]=r[7];
R[r[13]+i.]=r[7]+1; // legal?
```

  - code motion
  - loop strength reduction



- Translate an RTL to an assembly language or machine code instruction.
- Determine an estimated cost of the instruction. This is used by the CSE phase to check if the replacement RTL is cheaper than the original.
- Determine the type of an instruction for instruction scheduling, etc.
- Produce detailed measurements.



- Evaluation order determination is performed to reduce the number of registers required by an expression.
- Treats an expression as a tree. Determines the cost (number of registers) of performing each subtree. Orders the generation of code for the tree so that the most expensive subtrees are done first.
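
The ordering rule above can be sketched with Sethi-Ullman-style numbering. This is a minimal illustration, assuming binary expression trees in which every leaf needs one register; `Node`, `need`, and `order` are hypothetical names and are not taken from VPO.

```python
from dataclasses import dataclass
from typing import List, Optional


@dataclass
class Node:
    op: str                          # operator, or a variable name for a leaf
    left: Optional["Node"] = None
    right: Optional["Node"] = None


def need(n: Node) -> int:
    """Registers needed to evaluate the subtree rooted at n (assumed cost model)."""
    if n.left is None and n.right is None:
        return 1
    l, r = need(n.left), need(n.right)
    return max(l, r) if l != r else l + 1


def order(n: Node, out: List[str]) -> List[str]:
    """Emit a postorder in which the more expensive subtree is evaluated first."""
    if n.left is None and n.right is None:
        out.append(n.op)
        return out
    if need(n.left) >= need(n.right):
        first, second = n.left, n.right
    else:
        first, second = n.right, n.left
    order(first, out)
    order(second, out)
    out.append(n.op)
    return out


# a + (b * (c - d))
tree = Node("+", Node("a"), Node("*", Node("b"), Node("-", Node("c"), Node("d"))))
registers = need(tree)
schedule = order(tree, [])
```

For this tree the sketch evaluates `c - d` first, and two registers suffice under the assumed cost model.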



- Data-flow analysis collects global information about how a function manipulates its data and distributes this information to each block in the control-flow graph.
- Data-flow information can be collected by solving systems of data-flow equations that relate information at various points in the function.
- Information is usually represented as bits in a vector so operations (e.g., union and intersection) can be efficiently applied.
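
As a small illustration of the bit-vector representation, the sketch below packs a set of registers into one Python integer so that union, intersection, and difference become single bitwise operations. The register universe `REGS` and the helper names are assumptions made for the example.

```python
# Hypothetical universe of items tracked by the analysis; bit i stands for REGS[i].
REGS = ["r[2]", "r[5]", "r[8]", "r[9]", "r[11]"]
BIT = {name: 1 << i for i, name in enumerate(REGS)}


def to_bits(names):
    """Pack a collection of register names into one integer bit vector."""
    v = 0
    for n in names:
        v |= BIT[n]
    return v


def to_names(v):
    """Unpack a bit vector back into register names, in REGS order."""
    return [n for n in REGS if v & BIT[n]]


a = to_bits(["r[2]", "r[8]"])
b = to_bits(["r[8]", "r[11]"])

union = a | b          # set union
intersection = a & b   # set intersection
difference = a & ~b    # set difference a - b
```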

### General Types of Data-Flow Analysis

- structural analysis
  - Uses detailed information about control structures to produce data-flow equations.
  - Can be efficient when control-flow graphs are guaranteed to be reducible (only one entry for each loop).
- iterative analysis
  - Solved by iteration until information reaches a fixed point.
  - Easy to implement and works on any flow graph.
  - Most commonly used.
- demand-driven analysis
  - Obtains only the needed data-flow information, on demand.
  - Limits the analysis to the portion of the program representation needed to answer the specified query.
  - Is usually accomplished in a recursive manner.



- For variable x and point p, we wish to know whether the value of x at p could be used along some path in the flow graph starting at p. If so, x is live at p; otherwise x is dead at p.
- Used for establishing links for instruction selection and performing register assignment, register allocation, code motion, basic induction variable elimination, etc.



- Live variable analysis in VPO is associated with registers and scalars that are local variables or arguments.
- def[B]: the set of items assigned values in B prior to any use of the item in B
- use[B]: the set of items whose values are used in B prior to any assignment of the item in B
- The defs and uses are later used to calculate the ins and outs.

### Example of Defs and Uses

- Say block B consists of the following RTLs.

```
r[8]=r[8]+1;
r[5]=13;
r[2]=r[5]+r[8];
r[9]=r[2]*r[11];
```

- The following information can be calculated for this block.

```
def[B]=r[5],r[2],r[9]
use[B]=r[8],r[11]
```
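
The sets above can be reproduced with a single forward pass over the block. This is a minimal sketch, assuming every RTL has the form `register=expression;` and that only `r[n]` registers are tracked; `defs_and_uses` is a hypothetical helper, not a VPO routine.

```python
import re

REG = re.compile(r"r\[\d+\]")


def defs_and_uses(rtls):
    """Return (def, use) lists for one basic block, in order of first appearance."""
    defs, uses = [], []
    for rtl in rtls:
        dst, src = rtl.rstrip(";").split("=", 1)
        # A register read before any assignment in the block belongs to use[B].
        for r in REG.findall(src):
            if r not in defs and r not in uses:
                uses.append(r)
        # A register assigned before any use in the block belongs to def[B].
        if dst not in uses and dst not in defs:
            defs.append(dst)
    return defs, uses


block = ["r[8]=r[8]+1;", "r[5]=13;", "r[2]=r[5]+r[8];", "r[9]=r[2]*r[11];"]
block_defs, block_uses = defs_and_uses(block)
```

Note that `r[8]` lands in use[B] only, because it is read before it is written.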

### Ins and Outs for Live Variable Analysis

- in[B]
  - the set of items which are live immediately before entering B
  - in[B] = use[B] ∪ (out[B] - def[B])
- out[B]
  - the set of items which are live immediately after exiting B
  - out[B] = ∪ in[S], for each immediate successor S of B
- Note that in[B] depends on out[B] and out[B] depends on in[S].
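
Because in[B] and out[B] depend on each other, the equations are usually solved by iterating to a fixed point. The sketch below is one way to do that with Python sets; the three-block flow graph and its use/def sets are made up for the example.

```python
def live_variables(blocks, succ, use, defs):
    """Iterate the in/out equations for liveness until nothing changes."""
    live_in = {b: set() for b in blocks}
    live_out = {b: set() for b in blocks}
    changed = True
    while changed:
        changed = False
        # Visiting blocks in reverse order tends to converge faster for a
        # backward problem, but any order reaches the same fixed point.
        for b in reversed(blocks):
            out_b = set()
            for s in succ[b]:
                out_b |= live_in[s]
            in_b = use[b] | (out_b - defs[b])
            if in_b != live_in[b] or out_b != live_out[b]:
                live_in[b], live_out[b] = in_b, out_b
                changed = True
    return live_in, live_out


# Hypothetical flow graph: B1 -> B2, B2 -> B2 (loop), B2 -> B3.
blocks = ["B1", "B2", "B3"]
succ = {"B1": ["B2"], "B2": ["B2", "B3"], "B3": []}
use = {"B1": set(), "B2": {"i", "n"}, "B3": {"i"}}
defs = {"B1": {"i"}, "B2": set(), "B3": set()}

live_in, live_out = live_variables(blocks, succ, use, defs)
```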





### Defining Links within a Basic Block

- VPO determines where each register is set and where the first use of that register is within the block.
- A link is set if it is safe.
- Example of an unsafe link:

```
r[17]=r[0]; r[0]:
...
ST=HI[foo]+LO[foo];
Sr[0]:...
...
r[18]=r[17]; r[17]:
```

### Example of an Unsafe Global Link

- Global links cross basic block boundaries.

```
Before                          After
     IC=r[6]?5;                      IC=r[6]?5;
     PC=IC==0,L5;                    PC=IC==0,L5;
     r[7]=0;                         r[7]=0;
     PC=L6;                          PC=L6;
L5:                             L5:
5    r[7]=1;
L6:                             L6:
6    R[base]=r[7];                   R[base]=1;
```



- Define a link from the use of an item only if just one definition of the item reaches the use.
- VPO uses SSA information to detect if more than one definition reaches a use.
- In the previous example, two definitions of r[7] reach the use at RTL 6.





- A link can only be defined from the first use if it is the only first use of the definition.
- In the previous example there is a first use at RTL 4 and a first use at RTL 6.

### Register Assignment

- Pseudo registers contain temporary values.
- Register assignment is the mapping of pseudo registers to hardware registers.
- Sometimes this phase is called local register allocation, and it may also be combined with code generation.



- The code expander uses on average about 10 pseudo registers for each source-level statement.
- After the initial instruction selection, this average is reduced to about 5.
- Pseudo registers are never live across source-level statements.
- Usually at most 2 or 3 pseudo registers are live simultaneously at any point in the function.
- Evaluation order determination can decrease this number.



```
r[1]:
-- r[1] set, not dead
r[34]=R[_n];
r[1]=R[_a]; r[1]=r[1]+1;
R[_m]=r[1];
r[34]=R[_n];
=>
```

### Register Spills

- Register spills are introduced when the number of live pseudo registers exceeds the number of allocable registers available on the target machine.
- The hardware register chosen to spill (a store is generated) is the one whose next pseudo register reference is furthest away.
- When the next reference to the spilled pseudo register is encountered, a new hardware register is associated with the pseudo register and a load is inserted to get the value from memory.

### Example of a Register Spill

```
 1 r[33]=R[i];
 2 r[35]=r[33];
10 r[38]=R[j];         -- spill needed here
16 r[38]=r[38]+r[33];
=>
 1 r[2]=R[i];
 2 r[3]=r[2];
   R[tmp]=r[2];        -- spill is inserted
10 r[2]=R[j];          -- now can use r[2]
15 r[3]=R[tmp];        -- r[3] now available
16 r[2]=r[2]+r[3];
```
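
The spill choice described above (spill the register whose next reference is furthest away) can be sketched as follows. `choose_spill` is a hypothetical helper; the next reference of r[33] at RTL 16 comes from the example, while the next reference of r[35] at RTL 12 is an assumption, since that RTL is elided.

```python
def choose_spill(assigned, next_ref):
    """Pick the pseudo register to spill.

    assigned: pseudo registers that currently hold a hardware register
    next_ref: RTL number of each pseudo register's next reference; a register
              with no further reference is treated as infinitely far away
    """
    return max(assigned, key=lambda p: next_ref.get(p, float("inf")))


# State just before RTL 10 in the example: both allocable registers are taken.
assigned = {"r[33]": "r[2]", "r[35]": "r[3]"}
next_ref = {"r[33]": 16, "r[35]": 12}   # 12 is assumed for illustration

victim = choose_spill(assigned, next_ref)
freed_hardware_register = assigned[victim]
```

Under these assumptions the sketch spills r[33] and frees r[2], which matches the transformed code in the example.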



### Example of a Dedicated Use of a Hardware Register

```
...=a+foo();
=>
r[32]=R[_a];
ST=foo;
rr[0] -- RESLINE
Ur[0]r[1]...r[15] -- USELINE
r[33]=r[0]; r[0]:
r[33]=r[33]+r[32];
```

- r[32] must be assigned to a nonscratch register.

### Detecting Loops

- Detecting loops is important since code inside loops is typically executed much more often than code outside of loops.
- Optimizations associated with loops in VPO include:
  - loop inversion
  - loop-invariant code motion
  - loop strength reduction
  - induction variable elimination
  - recurrence elimination
  - loop unrolling

- Block d dominates block n if every path from the initial block of the flow graph to n goes through d. A block always dominates itself.
- Dominator information is used to calculate natural loops, which will be described later.
- Dominators are also used in a variety of other optimizations.
  - loop-invariant code motion
  - detection of basic induction variables
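
The dominator definition above leads to a simple iterative computation: a block's dominators are the block itself plus whatever dominates all of its predecessors. The sketch below is a minimal set-based version; the five-block flow graph is hypothetical.

```python
def dominators(blocks, preds, entry):
    """Return dom[b], the set of blocks that dominate b."""
    dom = {b: set(blocks) for b in blocks}   # start from "everything dominates"
    dom[entry] = {entry}
    changed = True
    while changed:
        changed = False
        for b in blocks:
            if b == entry:
                continue
            pred_doms = [dom[p] for p in preds[b]]
            common = set.intersection(*pred_doms) if pred_doms else set()
            new = {b} | common
            if new != dom[b]:
                dom[b] = new
                changed = True
    return dom


# Hypothetical graph: 1 -> 2, 2 -> 3, 2 -> 4, 3 -> 5, 4 -> 5, 5 -> 2
blocks = [1, 2, 3, 4, 5]
preds = {1: [], 2: [1, 5], 3: [2], 4: [2], 5: [3, 4]}
dom = dominators(blocks, preds, entry=1)
```

Block 5 is reached through either 3 or 4, so neither of them dominates it, while 1 and 2 do.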













- A reducible flow graph is one whose edges can be partitioned into two sets:
  - backward edges as defined earlier
  - forward edges form a DAG (directed acyclic graph)



### Example of a Nonreducible Flow Graph

- 2 → 3 and 3 → 2 are not backedges since neither node dominates the other.
- However, the graph is still cyclic.
- Most compilers only perform optimizations on natural loops.
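
The point can be checked mechanically: an edge n → d is a backedge only if d dominates n. The sketch below assumes the figure is the classic three-node graph with edges 1 → 2, 1 → 3, 2 → 3, and 3 → 2 (the figure itself is not reproduced here), and compares it with a hypothetical graph that has a natural loop. It also assumes every node is reachable from the entry.

```python
def dominators(succ, entry):
    """Iteratively compute dom[n], the set of nodes that dominate n."""
    nodes = list(succ)
    preds = {n: [p for p in nodes if n in succ[p]] for n in nodes}
    dom = {n: set(nodes) for n in nodes}
    dom[entry] = {entry}
    changed = True
    while changed:
        changed = False
        for n in nodes:
            if n == entry:
                continue
            new = {n} | set.intersection(*(dom[p] for p in preds[n]))
            if new != dom[n]:
                dom[n], changed = new, True
    return dom


def back_edges(succ, entry):
    """Edges n -> d whose target d dominates the source n."""
    dom = dominators(succ, entry)
    return [(n, d) for n in succ for d in succ[n] if d in dom[n]]


nonreducible = {1: [2, 3], 2: [3], 3: [2]}   # assumed shape of the figure
natural_loop = {1: [2], 2: [3], 3: [2]}      # hypothetical contrast case

edges_nonreducible = back_edges(nonreducible, entry=1)
edges_natural = back_edges(natural_loop, entry=1)
```

The first graph has a cycle between 2 and 3 but no backedge, so no natural loop is found; the second has the backedge 3 → 2.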





























### What Variables Can Be Allocated?

- Only scalar variables are candidates for register allocation.
- VPO does not allocate variables that are indirectly referenced.

```
foo(&a);
=>
   r[8]=r[14]+a.;
   ST=HI[foo]+LO[foo];
```

### Benefit Must Outweigh the Cost

- The cost of using a nonscratch register on most machines is a save and a restore (2 memory references), so the benefit must be more than 2 estimated references.
- Parameters have to be loaded from the stack (unless they are passed through a register). Allocating a stack parameter would require another memory reference.

### Which Variables Are Allocated First?

- VPO first allocates the live ranges of variables that have the greatest potential benefit.
- VPO uses a simple estimate of the frequency for each variable reference based on the loop nesting level in which the reference appears.
  - (loop nesting level << 4) + 1
- The estimate for a live range is the sum of the estimates for each reference of the variable within the live range.
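
The estimate can be written out directly. This sketch takes the formula as printed (a left shift by 4) and uses the save/restore cost of 2 memory references from the previous section as the threshold; the function names and the sample live range are hypothetical.

```python
def reference_weight(loop_depth):
    """Estimated frequency of one reference at the given loop nesting level."""
    return (loop_depth << 4) + 1


def live_range_benefit(ref_depths):
    """Sum of the estimates for every reference in the live range."""
    return sum(reference_weight(d) for d in ref_depths)


def worth_allocating(ref_depths, cost=2):
    """Allocate only if the estimated benefit exceeds the save/restore cost."""
    return live_range_benefit(ref_depths) > cost


# Hypothetical live range: one reference outside any loop, two inside one loop.
benefit = live_range_benefit([0, 1, 1])
```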



- Given an assignment $x \leftarrow y$, replace later uses of x with y as long as intervening instructions have not changed the value of either x or y.

```
r[2]=r[1];            r[2]=r[1];
r[3]=r[3]+r[2];       r[3]=r[3]+r[1];
r[2]=r[4]+r[2];  =>   r[2]=r[4]+r[1];
r[4]=r[4]+r[2];       r[4]=r[4]+r[2];
```
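
The transformation in the example can be reproduced with a small pass over one basic block. This is a sketch only, assuming every RTL has the form `register=expression;`; `propagate_copies` is a hypothetical name.

```python
import re

REG = re.compile(r"r\[\d+\]")


def propagate_copies(rtls):
    """Replace uses of x with y while a copy x=y is still valid."""
    copies = {}                      # x -> y for each copy that is still valid
    out = []
    for rtl in rtls:
        dst, src = rtl.rstrip(";").split("=", 1)
        # Rewrite the right-hand side using the copies that are still valid.
        src = REG.sub(lambda m: copies.get(m.group(), m.group()), src)
        # Assigning dst invalidates every copy that mentions it on either side.
        copies = {x: y for x, y in copies.items() if dst not in (x, y)}
        # A plain register-to-register move creates a new copy.
        if REG.fullmatch(src) and src != dst:
            copies[dst] = src
        out.append(f"{dst}={src};")
    return out


before = ["r[2]=r[1];", "r[3]=r[3]+r[2];", "r[2]=r[4]+r[2];", "r[4]=r[4]+r[2];"]
after = propagate_copies(before)
```

The last RTL keeps `r[2]` because the third RTL redefined it, which ended the copy.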











- A peephole optimization rule is typically expressed with the following parts:
  - matching pattern which can match one or more assembly instructions
  - semantic checks
  - replacement pattern which replaces these matched instructions

## Example Peephole Optimization Rule

- Below is a rule to eliminate an unconditional jump by reversing a conditional branch.

```
"       b%0     L%1",
"       ba      L%2",
"L%1:"
invert(%0,%3)
=>
"       b%3     L%2",
"L%1:"
```
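
To make the three parts of the rule concrete, the sketch below applies the same rewrite to a list of assembly lines. The condition mnemonics in `INVERT` are assumed SPARC-like names chosen for illustration, and `reverse_branches` is a hypothetical helper rather than the rule engine itself.

```python
import re

# Assumed condition codes and their inverses (the semantic check of the rule).
INVERT = {"e": "ne", "ne": "e", "l": "ge", "ge": "l", "g": "le", "le": "g"}


def reverse_branches(lines):
    """Rewrite 'bCC L1; ba L2; L1:' into 'b!CC L2; L1:'."""
    out, i = [], 0
    while i < len(lines):
        if i + 2 < len(lines):
            cond = re.fullmatch(r"b(\w+)\s+(\w+)", lines[i])
            jump = re.fullmatch(r"ba\s+(\w+)", lines[i + 1])
            if (cond and jump and cond.group(1) in INVERT
                    and lines[i + 2] == cond.group(2) + ":"):
                out.append(f"b{INVERT[cond.group(1)]} {jump.group(1)}")
                out.append(lines[i + 2])   # the label is kept, as in the rule
                i += 3
                continue
        out.append(lines[i])
        i += 1
    return out


code = ["cmp r1,r2", "bne L1", "ba L2", "L1:", "nop"]
optimized = reverse_branches(code)
```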



- Peephole optimization is a convenient method for dealing with a variety of special cases.
- One can quickly specify a number of rules that are appropriate for a particular architecture.
- The validity and completeness of such optimizations would always be a concern.



- Common subexpression elimination replaces code that recalculates an expression whose value is already available in a cheaper form.
- Most implementations work on machine-independent intermediate code (trees, triples, quads, etc) and not on machine instructions.

```
b = a[i];
v = i*4;
```

### Common Subexpression Elimination in VPO

- VPO symbolically simulates register transfers and records the values that they store.
- When it encounters an instruction that recomputes an existing value and the replacement appears beneficial, it replaces the calculation with the earlier value.

### Symbolic Expressions

- An s-expr is a symbolic expression. Two s-exprs a and b are said to match at a given point in the function if and only if it is determined that a and b have the same value at that point.

After symbolic simulation of the second instruction, r[1] matches M[\_a]+M[\_b].


- After the first two instructions, we have the equivalence class (M[\_c], r[1], M[\_a]).
- After the 3rd instruction, we find the equivalence class that contains M[\_c] and replace it with the cheapest member in the class, r[1]. So the 3rd instruction is updated as shown below and the death of r[1] is moved to the 3rd instruction.

```
r[1]=M[_c];
M[_a]=r[1];
r[2]=r[1]; r[1]:
```
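
The equivalence-class bookkeeping can be sketched as follows. This is a simplified illustration, not VPO's algorithm: it assumes straight-line code in which no location is overwritten after it has joined a class, treats any register as cheaper than a memory reference, and does not model death markers. The function names are hypothetical.

```python
def is_reg(expr):
    return expr.startswith("r[")


def simulate(rtls):
    """Symbolically simulate (dst, src) pairs and reuse cheaper equivalents."""
    klass = {}                      # expression -> shared set of matching s-exprs
    out = []
    for dst, src in rtls:
        # Replace a memory source with a register from the same class, if any.
        if not is_reg(src):
            for member in sorted(klass.get(src, {src})):
                if is_reg(member):
                    src = member
                    break
        # After the assignment, dst matches everything src matches.
        members = klass.setdefault(src, {src})
        members.add(dst)
        klass[dst] = members
        out.append((dst, src))
    return out, klass


rtls = [("r[1]", "M[_c]"), ("M[_a]", "r[1]"), ("r[2]", "M[_c]")]
result, classes = simulate(rtls)
```

After the first two RTLs the class holds `M[_c]`, `r[1]`, and `M[_a]`, and the third RTL is rewritten to read `r[1]`.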



- Dead assignment elimination eliminates useless assignments to a register or to memory.
- It is performed during the CSE phase in VPO since the CSE phase keeps track of the number of uses of a variable or a register after it is updated.

```
r[4]=r[5];            r[4]=r[5];
r[3]=r[4];        =>  r[3]=r[5];        =>  r[3]=r[5];
r[2]=r[4]; r[4]:      r[2]=r[5]; r[4]:      r[2]=r[5];
```
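
The last step of the example can be sketched with a backward scan over one block. Note that this uses a liveness scan instead of the use counts that the CSE phase keeps, so it illustrates the effect rather than VPO's mechanism; `eliminate_dead` and the `live_out` set are assumptions for the example.

```python
import re

REG = re.compile(r"r\[\d+\]")


def eliminate_dead(rtls, live_out):
    """Drop register assignments whose value is never used afterwards."""
    live = set(live_out)             # registers needed after the block
    kept = []
    for rtl in reversed(rtls):
        dst, src = rtl.rstrip(";").split("=", 1)
        if REG.fullmatch(dst):
            if dst not in live:
                continue             # dead assignment: nothing reads dst later
            live.discard(dst)
        else:
            # Registers inside a memory address are uses, and stores are kept.
            live.update(REG.findall(dst))
        live.update(REG.findall(src))
        kept.append(rtl)
    return kept[::-1]


block = ["r[4]=r[5];", "r[3]=r[5];", "r[2]=r[5];"]
cleaned = eliminate_dead(block, live_out={"r[2]", "r[3]"})
```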







