### Code Generation: Sethi Ullman Algorithm

Amey Karkare karkare@cse.iitk.ac.in

March 28, 2019

Generates code for expression trees (not dags).

- Generates code for expression trees (not dags).
- Target machine model is simple. Has

- Generates code for expression trees (not dags).
- ► Target machine model is simple. Has
  - a load instruction,

- Generates code for expression trees (not dags).
- ► Target machine model is simple. Has
  - a load instruction,
  - a store instruction, and

- Generates code for expression trees (not dags).
- Target machine model is simple. Has
  - a load instruction,
  - a store instruction, and
  - binary operations involving either a register and a memory, or two registers.

- Generates code for expression trees (not dags).
- ► Target machine model is simple. Has
  - a load instruction,
  - a store instruction, and
  - binary operations involving either a register and a memory, or two registers.
- ▶ Does not use algebraic properties of operators. If a \* b has to be evaluated using  $r_1 \leftarrow r_1 * r_2$ , then a and b have to be necessarily loaded in  $r_1$  and  $r_2$  respectively.

- Generates code for expression trees (not dags).
- ► Target machine model is simple. Has
  - a load instruction,
  - a store instruction, and
  - binary operations involving either a register and a memory, or two registers.
- Does not use algebraic properties of operators. If a \* b has to be evaluated using r₁ ← r₁ \* r₂, then a and b have to be necessarily loaded in r₁ and r₂ respectively.
- Extensions to take into account algebraic properties of operators.

- Generates code for expression trees (not dags).
- ► Target machine model is simple. Has
  - a load instruction,
  - a store instruction, and
  - binary operations involving either a register and a memory, or two registers.
- ▶ Does not use algebraic properties of operators. If a \* b has to be evaluated using r<sub>1</sub> ← r<sub>1</sub> \* r<sub>2</sub>, then a and b have to be necessarily loaded in r<sub>1</sub> and r<sub>2</sub> respectively.
- Extensions to take into account algebraic properties of operators.
- ▶ Generates optimal code i.e. code with least number of instructions. There may be other notions of optimality.

- Generates code for expression trees (not dags).
- ► Target machine model is simple. Has
  - a load instruction,
  - a store instruction, and
  - binary operations involving either a register and a memory, or two registers.
- ▶ Does not use algebraic properties of operators. If a \* b has to be evaluated using r<sub>1</sub> ← r<sub>1</sub> \* r<sub>2</sub>, then a and b have to be necessarily loaded in r<sub>1</sub> and r<sub>2</sub> respectively.
- Extensions to take into account algebraic properties of operators.
- ▶ Generates optimal code i.e. code with least number of instructions. There may be other notions of optimality.
- Complexity is linear in the size of the expression tree. Reasonably efficient.



▶ Here is the expression a/(b+c)-c\*(d+e) represented as a tree:

▶ Here is the expression a/(b+c)-c\*(d+e) represented as a tree:

▶ Here is the expression a/(b+c)-c\*(d+e) represented as a tree:



We have not identified common sub-expressions; else we would have a directed acyclic graph (DAG):



Let  $\Sigma$  be a countable set of variable names, and  $\Theta$  be a finite set of binary operators. Then,

- Let  $\Sigma$  be a countable set of variable names, and  $\Theta$  be a finite set of binary operators. Then,
  - 1. A single vertex labeled by a name from  $\Sigma$  is an expression tree.

- Let  $\Sigma$  be a countable set of variable names, and  $\Theta$  be a finite set of binary operators. Then,
  - 1. A single vertex labeled by a name from  $\Sigma$  is an expression tree.
  - 2. If  $T_1$  and  $T_2$  are expression trees and  $\theta$  is a operator in  $\Theta$ ,



- Let  $\Sigma$  be a countable set of variable names, and  $\Theta$  be a finite set of binary operators. Then,
  - 1. A single vertex labeled by a name from  $\Sigma$  is an expression tree.
  - 2. If  $T_1$  and  $T_2$  are expression trees and  $\theta$  is a operator in  $\Theta$ ,



▶ In this example

$$\Sigma = \{a, b, c, d, e, \dots\}, \text{ and } \Theta = \{+, -, *, /, \dots\}$$



▶ We assume a machine with finite set of registers  $r_0$ ,  $r_1$ , ...,  $r_k$ , countable set of memory locations, and instructions of the form:

▶ We assume a machine with finite set of registers  $r_0$ ,  $r_1$ , ...,  $r_k$ , countable set of memory locations, and instructions of the form:

```
1. m \leftarrow r (store instruction)
```

We assume a machine with finite set of registers r<sub>0</sub>, r<sub>1</sub>, ..., r<sub>k</sub>, countable set of memory locations, and instructions of the form:

```
1. m \leftarrow r (store instruction)
```

2. 
$$r \leftarrow m$$
 (load instruction)

▶ We assume a machine with finite set of registers  $r_0$ ,  $r_1$ , ...,  $r_k$ , countable set of memory locations, and instructions of the form:

```
1. m \leftarrow r (store instruction)
```

- 2.  $r \leftarrow m$  (load instruction)
- 3.  $r \leftarrow r \ op \ m$  (the result of  $r \ op \ m$  is stored in r)

▶ We assume a machine with finite set of registers  $r_0$ ,  $r_1$ , ...,  $r_k$ , countable set of memory locations, and instructions of the form:

```
1. m \leftarrow r (store instruction)
```

- 2.  $r \leftarrow m$  (load instruction)
- 3.  $r \leftarrow r \ op \ m$  (the result of  $r \ op \ m$  is stored in r)
- 4.  $r_2 \leftarrow r_2$  op  $r_1$  (the result of  $r_2$  op  $r_1$  is stored in  $r_2$ )

▶ We assume a machine with finite set of registers  $r_0$ ,  $r_1$ , ...,  $r_k$ , countable set of memory locations, and instructions of the form:

```
    m ← r (store instruction)
    r ← m (load instruction)
    r ← r op m (the result of r op m is stored in r)
    r<sub>2</sub> ← r<sub>2</sub> op r<sub>1</sub> (the result of r<sub>2</sub> op r<sub>1</sub> is stored in r<sub>2</sub>)
```

Note:

We assume a machine with finite set of registers r<sub>0</sub>, r<sub>1</sub>, ..., r<sub>k</sub>, countable set of memory locations, and instructions of the form:

```
1. m \leftarrow r (store instruction)
```

- 2.  $r \leftarrow m$  (load instruction)
- 3.  $r \leftarrow r \ op \ m$  (the result of  $r \ op \ m$  is stored in r)
- 4.  $r_2 \leftarrow r_2$  op  $r_1$  (the result of  $r_2$  op  $r_1$  is stored in  $r_2$ )
- Note:
  - 1. In instruction 3, the memory location is the right operand.

We assume a machine with finite set of registers r<sub>0</sub>, r<sub>1</sub>, ..., r<sub>k</sub>, countable set of memory locations, and instructions of the form:

```
1. m \leftarrow r (store instruction)
```

- 2.  $r \leftarrow m$  (load instruction)
- 3.  $r \leftarrow r \ op \ m$  (the result of  $r \ op \ m$  is stored in r)
- 4.  $r_2 \leftarrow r_2$  op  $r_1$  (the result of  $r_2$  op  $r_1$  is stored in  $r_2$ )

#### Note:

- 1. In instruction 3, the memory location is the right operand.
- In instruction 4, the destination register is the same as the left operand register.

▶ Determines an evaluation order of the subtrees which requires minimum number of registers.

- Determines an evaluation order of the subtrees which requires minimum number of registers.
- ▶ If the left and right subtrees require  $l_1$ , and  $l_2$  ( $l_1 < l_2$ ) registers respectively, what should be the order of evaluation?



#### ► Choice 1

1. Evaluate left subtree first, leaving result in a register. This requires upto  $l_1$  registers.

- 1. Evaluate left subtree first, leaving result in a register. This requires upto  $l_1$  registers.
- 2. Evaluate the right subtree. During this we might require upto  $l_2+1$  registers ( $l_2$  registers for evaluating the right subtree and one register to hold the value of the left subtree.)

- 1. Evaluate left subtree first, leaving result in a register. This requires upto  $l_1$  registers.
- 2. Evaluate the right subtree. During this we might require upto  $l_2+1$  registers ( $l_2$  registers for evaluating the right subtree and one register to hold the value of the left subtree.)
- ▶ The maximum register requirement in this case is  $max(l_1, l_2 + 1) = l_2 + 1$ .

#### ► Choice 2

1. Evaluate the right subtree first, leaving the result in a register. During this evaluation we shall require upto  $l_2$  registers.

- 1. Evaluate the right subtree first, leaving the result in a register. During this evaluation we shall require upto  $I_2$  registers.
- 2. Evaluate the left subtree. During this, we might require upto  $l_1+1$  registers.

- 1. Evaluate the right subtree first, leaving the result in a register. During this evaluation we shall require upto  $I_2$  registers.
- 2. Evaluate the left subtree. During this, we might require upto  $\it l_1+1$  registers.
- ▶ The maximum register requirement over the whole tree is

$$max(l_1 + 1, l_2) = l_2$$

# Key Idea

#### ► Choice 2

- 1. Evaluate the right subtree first, leaving the result in a register. During this evaluation we shall require upto  $I_2$  registers.
- 2. Evaluate the left subtree. During this, we might require upto  $\it l_1+1$  registers.
- ▶ The maximum register requirement over the whole tree is

$$max(l_1 + 1, l_2) = l_2$$

# Key Idea

- ► Choice 2
  - 1. Evaluate the right subtree first, leaving the result in a register. During this evaluation we shall require upto  $I_2$  registers.
  - 2. Evaluate the left subtree. During this, we might require upto  $\it l_1+1$  registers.
- ▶ The maximum register requirement over the whole tree is

$$max(l_1 + 1, l_2) = l_2$$

Therefore the subtree requiring more registers should be evaluated first.



► Label each node by the number of registers required to evaluate it in a store free manner.

► Label each node by the number of registers required to evaluate it in a store free manner.

► Label each node by the number of registers required to evaluate it in a store free manner.



► Label each node by the number of registers required to evaluate it in a store free manner.



▶ Left and the right leaves are labeled 1 and 0 respectively, because the left leaf must necessarily be in a register, whereas the right leaf can reside in memory.



▶ Visit the tree in post-order. For every node visited do:

- Visit the tree in post-order. For every node visited do:
  - 1. Label each left leaf by 1 and each right leaf by 0.

- Visit the tree in post-order. For every node visited do:
  - 1. Label each left leaf by 1 and each right leaf by 0.
  - 2. If the labels of the children of a node n are  $l_1$  and  $l_2$  respectively, then

$$label(n) = max(l_1, l_2), if l_1 \neq l_2$$
  
=  $l_1 + 1$ , otherwise

 The code generation algorithm is represented as a function gencode(n), which produces code to evaluate the node labeled n.

- The code generation algorithm is represented as a function gencode(n), which produces code to evaluate the node labeled n.
- 2. Register allocation is done from a stack of register names rstack, initially containing  $r_0, r_1, \ldots, r_k$  (with  $r_0$  on top of the stack).

- 1. The code generation algorithm is represented as a function gencode(n), which produces code to evaluate the node labeled n.
- 2. Register allocation is done from a stack of register names rstack, initially containing  $r_0, r_1, \ldots, r_k$  (with  $r_0$  on top of the stack).
- 3. gencode(n) evaluates n in the register on the top of the stack.

- 1. The code generation algorithm is represented as a function gencode(n), which produces code to evaluate the node labeled n.
- 2. Register allocation is done from a stack of register names rstack, initially containing  $r_0, r_1, \ldots, r_k$  (with  $r_0$  on top of the stack).
- 3. gencode(n) evaluates n in the register on the top of the stack.
- 4. Temporary allocation is done from a stack of temporary names tstack, initially containing  $t_0, t_1, \ldots, t_k$  (with  $t_0$  on top of the stack).

- 1. The code generation algorithm is represented as a function gencode(n), which produces code to evaluate the node labeled n.
- 2. Register allocation is done from a stack of register names rstack, initially containing  $r_0, r_1, \ldots, r_k$  (with  $r_0$  on top of the stack).
- 3. gencode(n) evaluates n in the register on the top of the stack.
- 4. Temporary allocation is done from a stack of temporary names tstack, initially containing  $t_0, t_1, \ldots, t_k$  (with  $t_0$  on top of the stack).
- 5. swap(rstack) swaps the top two registers on the stack.

gencode(n) described by case analysis on the type of the node n.

- gencode(n) described by case analysis on the type of the node n.
  - 1. n is a left leaf:



- gencode(n) described by case analysis on the type of the node n.
  - 1. n is a left leaf:



- gencode(n) described by case analysis on the type of the node n.
  - 1. n is a left leaf:



$$gen(top(rstack) \leftarrow name)$$

Comments: n is named by a variable say name. Code is generated to load name into a register.

2. n's right child is a leaf:



2. n's right child is a leaf:



#### 2. n's right child is a leaf:



Comments:  $n_1$  is first evaluated in the register on the top of the stack, followed by the operation op leaving the result in the same register.

3. The left child of n requires lesser number of registers. This requirement is strictly less than the available number of registers



3. The left child of n requires lesser number of registers. This requirement is strictly less than the available number of registers



3. The left child of n requires lesser number of registers. This requirement is strictly less than the available number of registers



swap(rstack);

Right child goes into next to top register

3. The left child of n requires lesser number of registers. This requirement is strictly less than the available number of registers



swap(rstack); Right child goes into next to top register  $gencode(n_2)$ ; Evaluate right child

3. The left child of n requires lesser number of registers. This requirement is strictly less than the available number of registers



swap(rstack); Right child goes into next to top register  $gencode(n_2)$ ; Evaluate right child

R := pop(rstack);

3. The left child of n requires lesser number of registers. This requirement is strictly less than the available number of registers



swap(rstack); Right child goes into next to top register

 $gencode(n_2)$ ; Evaluate right child

R := pop(rstack);

 $gencode(n_1)$ ; Evaluate left child

3. The left child of n requires lesser number of registers. This requirement is strictly less than the available number of registers



```
swap(rstack); Right child goes into next to top register gencode(n_2); Evaluate right child R := pop(rstack); gencode(n_1); Evaluate left child gen(top(rstack) \leftarrow top(rstack) \ op \ R); Issue op
```

3. The left child of n requires lesser number of registers. This requirement is strictly less than the available number of registers



```
swap(rstack); Right child goes into next to top register gencode(n_2); Evaluate right child R := pop(rstack); gencode(n_1); Evaluate left child gen(top(rstack) \leftarrow top(rstack) \ op \ R); Issue op push(rstack, R);
```

3. The left child of n requires lesser number of registers. This requirement is strictly less than the available number of registers



```
swap(rstack); Right child goes into next to top register gencode(n_2); Evaluate right child R := pop(rstack); gencode(n_1); Evaluate left child gen(top(rstack) \leftarrow top(rstack) \ op \ R); Issue op push(rstack, R); swap(rstack) Restore register stack
```





4. The right child of n requires lesser (or the same) number of registers than the left child, and this requirement is strictly less than the available number of registers



 $gencode(n_1);$ 



```
gencode(n_1);

R := pop(rstack);
```



```
gencode(n_1);

R := pop(rstack);

gencode(n_2);
```



```
gencode(n_1);

R := pop(rstack);

gencode(n_2);

gen(R \leftarrow R \ op \ top(rstack));
```

4. The right child of n requires lesser (or the same) number of registers than the left child, and this requirement is strictly less than the available number of registers



```
gencode(n_1);
R := pop(rstack);
gencode(n_2);
gen(R \leftarrow R \ op \ top(rstack));
push(rstack, R)
```

4. The right child of n requires lesser (or the same) number of registers than the left child, and this requirement is strictly less than the available number of registers



```
gencode(n_1);
R := pop(rstack);
gencode(n_2);
gen(R \leftarrow R \ op \ top(rstack));
push(rstack, R)
```

Comments: Same as case 3, except that the left sub-tree is evaluated first.

















5. Both the children of n require registers greater or equal to the available number of registers.



Comments: In this case the right sub-tree is first evaluated into a temporary. This is followed by the evaluations of the left sub-tree and n into the register on the top of the stack.

# An Example

For the example:



# An Example

For the example:



assuming two available registers  $r_0$  and  $r_1$ , the calls to gencode and the generated code are shown on the next slide.



# An Example



▶ The algorithm is optimal because

- ▶ The algorithm is optimal because
  - 1. The number of load instructions generated is optimal.

- ▶ The algorithm is optimal because
  - 1. The number of load instructions generated is optimal.
  - 2. Each binary operation specified in the expression tree is performed only once.

- ▶ The algorithm is optimal because
  - 1. The number of load instructions generated is optimal.
  - 2. Each binary operation specified in the expression tree is performed only once.
  - 3. The number of stores is optimal.

- ▶ The algorithm is optimal because
  - 1. The number of load instructions generated is optimal.
  - 2. Each binary operation specified in the expression tree is performed only once.
  - 3. The number of stores is optimal.
- We shall now elaborate on each of these.

 It is easy to verify that the number of loads required by any program computing an expression tree is at least equal to the number of left leaves. This algorithm generates no more loads than this.

- It is easy to verify that the number of loads required by any program computing an expression tree is at least equal to the number of left leaves. This algorithm generates no more loads than this.
- 2. Each node of the expression tree is visited exactly once. If this node specifies a binary operation, then the algorithm branches into steps 2,3,4 or 5, and at each of these places code is generated to perform this operation exactly once.

3. The number of stores is optimal: this is harder to show.

- 3. The number of stores is optimal: this is harder to show.
  - ▶ Define a *major node* as a node, each of whose children has a label at least equal to the number of available registers.

- 3. The number of stores is optimal: this is harder to show.
  - ▶ Define a *major node* as a node, each of whose children has a label at least equal to the number of available registers.
  - If we can show that the number of stores required by any program computing an expression tree is at least equal the number of major nodes, then our algorithm produces minimal number of stores (Why?)

► To see this, consider an expression tree and the code generated by any optimal algorithm for this tree.

- ► To see this, consider an expression tree and the code generated by any optimal algorithm for this tree.
- ► Assume that the tree has *M* major nodes.

- ► To see this, consider an expression tree and the code generated by any optimal algorithm for this tree.
- ► Assume that the tree has *M* major nodes.
- Now consider a tree formed by replacing the subtree S evaluated by the first store, with a leaf labeled by a name I.

- ► To see this, consider an expression tree and the code generated by any optimal algorithm for this tree.
- ► Assume that the tree has *M* major nodes.
- Now consider a tree formed by replacing the subtree S evaluated by the first store, with a leaf labeled by a name I.

- ➤ To see this, consider an expression tree and the code generated by any optimal algorithm for this tree.
- Assume that the tree has M major nodes.
- Now consider a tree formed by replacing the subtree S evaluated by the first store, with a leaf labeled by a name I.



Let n be the major node in the original tree, just above S, and  $n_1$  and  $n_2$  be its immediate descendants ( $n_1$  could be l itself).

1. In the modified tree, the (modified) label of  $n_1$  might have decreased but the label of  $n_2$  remains unaffected ( $\geq k$ , the available number of registers).

- 1. In the modified tree, the (modified) label of  $n_1$  might have decreased but the label of  $n_2$  remains unaffected ( $\geq k$ , the available number of registers).
- 2. The label of n is  $\geq k$ .

- 1. In the modified tree, the (modified) label of  $n_1$  might have decreased but the label of  $n_2$  remains unaffected ( $\geq k$ , the available number of registers).
- 2. The label of n is  $\geq k$ .
- The node n may no longer be a major node but all other major nodes in the original tree continue to be major nodes in the modified tree.

- 1. In the modified tree, the (modified) label of  $n_1$  might have decreased but the label of  $n_2$  remains unaffected ( $\geq k$ , the available number of registers).
- 2. The label of n is  $\geq k$ .
- 3. The node *n* may no longer be a major node but all other major nodes in the original tree continue to be major nodes in the modified tree.
- 4. Therefore the number of major nodes in the modified tree is M-1.

- 1. In the modified tree, the (modified) label of  $n_1$  might have decreased but the label of  $n_2$  remains unaffected ( $\geq k$ , the available number of registers).
- 2. The label of n is  $\geq k$ .
- 3. The node *n* may no longer be a major node but all other major nodes in the original tree continue to be major nodes in the modified tree.
- 4. Therefore the number of major nodes in the modified tree is M-1.
- 5. If we assume as induction hypothesis that the number of stores for the modified tree is at least M-1, then the number of stores for the original tree is at least M.

#### SETHI-ULLMAN ALGORITHM: COMPLEXITY

Since the algorithm visits every node of the expression tree twice – once during labeling, and once during code generation, the complexity of the algorithm is O(n).