### Basic Blocks and Common Subexpressions

In [1]:
import nbimporter; nbimporter.options["only_defs"] = False
from P0 import compileString

*Common subexpression elimination* is a code optimization technique that is suitable for register-based architectures. The algorithm for eliminating common subexpressions, while parsing the input from left to right, builds a directed acyclic graph that is represented by a table. The table numbers each subexpression. Each time a new expression is recognized, it is looked up in the table and if it is not present, it is assigned a new number. The key is that subexpressions are referred to by their number. For example, when in 

    a × (a + b) + (a + b) × c

the first `a` is recognized, it gets the number `$1`; when the second `a` is recognized, nothing changes as `a` has been given a number already; when `b` is recognized, it gets the number `$2`. When `a + b` is then recognized, its operands are replaced with their numbers (by looking them up in the table constructed so far) and `$1 + $2` is given the new number `$3`. This process continues until the end of the expression; the second occurrence of `a + b` is found in the table as `$3` and is not numbered again.

| expression | number |
|:-----------|:-------|
| `a`        | `$1`   |
| `b`        | `$2`   |
| `$1 + $2`  | `$3`   |
| `$1 × $3`  | `$4`   |
| `c`        | `$5`   |
| `$3 × $5`  | `$6`   |
| `$4 + $6`  | `$7`   |
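
The numbering scheme can be sketched in a few lines of Python. This is only an illustration under assumptions, not P0's code: the expression is assumed to be given as nested tuples `(op, left, right)` with variable names as strings, and the names `number` and `table` are hypothetical.

```python
def number(expr, table):
    """Return the number of expr, adding rows to table where needed.

    expr is a variable name (str) or a tuple (op, left, right);
    table maps a variable name or a tuple (op, '$i', '$j') to its number.
    """
    if isinstance(expr, tuple):
        op, left, right = expr
        # Number the operands left to right, then refer to them by number.
        key = (op, number(left, table), number(right, table))
    else:
        key = expr
    if key not in table:
        table[key] = f"${len(table) + 1}"  # next unused number
    return table[key]

table = {}
a_plus_b = ("+", "a", "b")
root = number(("+", ("×", "a", a_plus_b), ("×", a_plus_b, "c")), table)
for key, num in table.items():
    print(key, "->", num)
```

Because the lookup key of a compound expression is built from the numbers of its operands, the second `a + b` produces the same key as the first and so reuses its row.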

*Three-address code* is an abstract representation of machine code consisting of:

- `$r := c`, loading constant `c` to register `$r`
- `$r := x`, loading the memory at location `x` to register `$r`
- `x := $r`, storing register `$r` to memory at location `x`
- `$r := *$s`, loading the memory at `$s` to register `$r`
- `*$r := $s`, storing register `$s` to memory at `$r`
- `$r := $s ⊕ $t`, binary operation `⊕` with three registers
- `goto L`, unconditional jump to instruction with label `L`
- `if $r goto L`, `ifnot $r goto L`, `if $r ⊜ $s goto L`, conditional jump to `L` with relation `⊜`

Formally, if `M` is the memory,
```
x := *y   stands for   x := M[y]
*x := y   stands for   M[x] := y
```

Three-address code assumes that arbitrarily many registers are available. It is used as an intermediate representation between the front end and the back end, for example by [LLVM](http://llvm.org/). Typically, it also includes instructions for parameter passing and procedure calls, which we don't consider here.

The table above for the DAG of expression `a × (a + b) + (a + b) × c` corresponds line by line to the following three-address code:

```
$1 := a
$2 := b
$3 := $1 + $2
$4 := $1 × $3
$5 := c
$6 := $3 × $5
$7 := $4 + $6
```
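
To check that this sequence computes the value of the original expression, it can be run through a small interpreter. The sketch below is an illustration under assumptions: it handles only the two instruction forms that occur here (`$r := x` and `$r := $s ⊕ $t`), and the function name `run` and the sample memory contents are made up.

```python
import operator

OPS = {"+": operator.add, "×": operator.mul}

def run(code, memory):
    """Execute three-address code and return the register contents."""
    regs = {}
    for line in code.strip().splitlines():
        dest, rhs = (part.strip() for part in line.split(":="))
        parts = rhs.split()
        if len(parts) == 1:      # $r := x, load from memory location x
            regs[dest] = memory[parts[0]]
        else:                    # $r := $s ⊕ $t, binary operation on registers
            s, op, t = parts
            regs[dest] = OPS[op](regs[s], regs[t])
    return regs

code = """
$1 := a
$2 := b
$3 := $1 + $2
$4 := $1 × $3
$5 := c
$6 := $3 × $5
$7 := $4 + $6
"""
memory = {"a": 2, "b": 3, "c": 4}   # hypothetical sample values
regs = run(code, memory)
a, b, c = memory["a"], memory["b"], memory["c"]
print(regs["$7"], a * (a + b) + (a + b) * c)
```

Both printed values agree, although `a + b` is computed only once, in `$3`.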

This in turn translates immediately to RISC-V code or other RISC code:

```
lw $1, a
lw $2, b
add $3, $1, $2
mul $4, $1, $3
lw $5, c
mul $6, $3, $5
add $7, $4, $6
```
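
The translation is mechanical enough to express as a line-by-line mapping. The following sketch assumes the same two instruction forms as above and the simplified `lw $r, x` notation of this section; the names `translate` and `MNEMONIC` are hypothetical.

```python
MNEMONIC = {"+": "add", "×": "mul"}

def translate(line):
    """Translate one three-address instruction to a RISC-style instruction."""
    dest, rhs = (part.strip() for part in line.split(":="))
    parts = rhs.split()
    if len(parts) == 1:                        # $r := x becomes a load
        return f"lw {dest}, {parts[0]}"
    s, op, t = parts                           # $r := $s ⊕ $t
    return f"{MNEMONIC[op]} {dest}, {s}, {t}"

three_address = ["$1 := a", "$2 := b", "$3 := $1 + $2", "$4 := $1 × $3"]
asm = [translate(line) for line in three_address]
print("\n".join(asm))
```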

Just as the parse tree is not explicitly constructed, the DAG does not need to be explicitly constructed here. Perhaps surprisingly, common subexpressions can be eliminated in a one-pass compiler like P0. The implementation in P0 is rather simple:

- Set `regs` records which registers are currently in use.
- Dictionary `reguse` records which register holds the value of which variable or the result of which three-address instruction.

For example, while parsing program `x := a × (a + b) + (a + b) × c`, the dictionary `reguse` could be:
```Python
{<ST.Var object at 0x7f45666b5450>: 's11',
<ST.Var object at 0x7f45666b7ad0>: 's3',
('add', 's11', 's3'): 's2',
('mul', 's11', 's2'): 's6',
<ST.Var object at 0x7f45666b6190>: 's7',
('mul', 's2', 's7'): 's8',
('add', 's6', 's8'): 's5'}
```
In `put` of `CGriscv`, each time a new register is used, its use is recorded in `reguse`; before an instruction is generated, `reguse` is checked to see if the result of that instruction is already in a register. With `cd` being the instruction and `x`, `y` the operands, `put` contains:

```Python
        if (cd, x.reg, y.reg) in reguse: x.reg = reguse[(cd, x.reg, y.reg)]
        else:
            x.reg, r = obtainReg(), x.reg # r is source, x.reg is destination
            putOp(cd, x.reg, r, y.reg)#; releaseReg(y.reg)
            reguse[(cd, r, y.reg)] = x.reg
```
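
The effect of this lookup can be tried out in isolation. The following is a simplified stand-alone model, not the code of `CGriscv`: operands are passed as plain register names instead of objects with a `reg` field, and `free`, `emitted`, and `obtain_reg` are hypothetical stand-ins for P0's register allocation and code output. Like the excerpt above, it never releases a register.

```python
reguse = {}                                  # (cd, reg, reg) -> register with the result
free = ["t0", "t1", "t2", "t3", "t4", "t5"]  # hypothetical pool of unused registers
emitted = []                                 # instructions generated so far

def obtain_reg():
    return free.pop(0)

def put(cd, x_reg, y_reg):
    """Return a register holding cd(x_reg, y_reg); emit code only if needed."""
    key = (cd, x_reg, y_reg)
    if key in reguse:                        # result already in a register: reuse it
        return reguse[key]
    dest = obtain_reg()
    emitted.append(f"{cd} {dest}, {x_reg}, {y_reg}")
    reguse[key] = dest
    return dest

# a × (a + b) + (a + b) × c, assuming a, b, c are already in s1, s2, s3
first = put("add", "s1", "s2")               # a + b
left = put("mul", "s1", first)               # a × (a + b)
second = put("add", "s1", "s2")              # a + b again, no instruction emitted
right = put("mul", second, "s3")             # (a + b) × c
put("add", left, right)
print("\n".join(emitted))
```

Only four instructions are emitted for the five operations, since the second `a + b` is answered from `reguse`.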

A *basic block* is a sequence of three-address code instructions where only the first instruction can be the target of a goto instruction. A block may contain conditional goto instructions anywhere but can have an unconditional goto or a procedure call only as the last instruction. (Note that other definitions of basic blocks may differ.) Analyze the code generated for `p1`:

In [2]:
compileString("""
var a, b, c: integer
program p1
  a := 1
  if (b = 2) or (c = 3) then
      a := 4; b := 5
  else a := 6
  b := 7
""", target = 'riscv')

	.data
a_:	.space 4
b_:	.space 4
c_:	.space 4
	.text
	.globl main
main:	
	jal ra, p1
	addi a0, zero, 0
	addi a7, zero, 93
	scall
	.globl p1
p1:	
	addi sp, sp, -16
	sw ra, 12(sp)
	sw s0, 8(sp)
	addi s0, sp, 16
	addi s3, zero, 1
	la s11, a_
	sw s3, 0(s11)
	la s8, b_
	lw s6, 0(s8)
	addi s9, zero, 2
	beq s6, s9, L2
L1:	
	la s4, c_
	lw s9, 0(s4)
	addi s5, zero, 3
	bne s9, s5, L3
L4:	
L2:	
	addi s3, zero, 4
	la s11, a_
	sw s3, 0(s11)
	addi s6, zero, 5
	la s8, b_
	sw s6, 0(s8)
	j L5
L3:	
	addi s3, zero, 6
	la s11, a_
	sw s3, 0(s11)
L5:	
	addi s3, zero, 7
	la s11, b_
	sw s3, 0(s11)
	lw ra, 12(sp)
	lw s0, 8(sp)
	addi sp, sp, 16
	ret


The basic blocks are delineated by labels. There are in total six basic blocks. 

Identify the basic blocks of `p2` and explain them in terms of the source code!

In [3]:
compileString("""
var x, y: integer
program p2
  x := 1
  while y < 2 do
      x := x + 3
""", target = 'riscv')

	.data
x_:	.space 4
y_:	.space 4
	.text
	.globl main
main:	
	jal ra, p2
	addi a0, zero, 0
	addi a7, zero, 93
	scall
	.globl p2
p2:	
	addi sp, sp, -16
	sw ra, 12(sp)
	sw s0, 8(sp)
	addi s0, sp, 16
	addi s3, zero, 1
	la s11, x_
	sw s3, 0(s11)
L1:	
	la s11, y_
	lw s3, 0(s11)
	addi s6, zero, 2
	bge s3, s6, L2
L3:	
	la s11, x_
	lw s3, 0(s11)
	addi s6, s3, 3
	la s8, x_
	sw s6, 0(s8)
	j L1
L2:	
	lw ra, 12(sp)
	lw s0, 8(sp)
	addi sp, sp, 16
	ret


Block 1:
<pre style="font-family:monospace;color:royalblue">
addi sp, sp, -16
sw ra, 12(sp)
sw s0, 8(sp)
addi s0, sp, 16
addi s3, zero, 1
la s11, x_
sw s3, 0(s11)    
</pre>
This block sets up the stack frame and then stores `1` in `x`. The first four instructions are the procedure prologue; the last three correspond to the assignment `x := 1` in the source code.

Block 2:
<pre style="font-family:monospace;color:royalblue">
L1:
la s11, y_
lw s3, 0(s11)
addi s6, zero, 2
bge s3, s6, L2  
</pre>
This block checks the loop condition `y < 2`. If `y` is greater than or equal to `2`, it branches to `L2`, otherwise, it continues execution in the loop. It corresponds to the loop condition check in the source code.

Block 3:
<pre style="font-family:monospace;color:royalblue">
L3:
la s11, x_
lw s3, 0(s11)
addi s6, s3, 3
la s8, x_
sw s6, 0(s8)
j L1
</pre>
This block represents the body of the loop. It increments the value of `x` by `3` in each iteration and then jumps back to `L1` to re-evaluate the loop condition. It corresponds to the loop body in the source code.

Block 4:
<pre style="font-family:monospace;color:royalblue">
L2:
lw ra, 12(sp)
lw s0, 8(sp)
addi sp, sp, 16
ret
</pre>
This block is the exit from the procedure: it restores the saved registers `ra` and `s0`, releases the stack frame, and returns control to the caller. It corresponds to the end of the program body, which is reached when the loop condition fails.
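
The decomposition above follows the labels, so it can be reproduced mechanically. The sketch below is a simplification under assumptions: it starts a new block at every label and does nothing else, so it does not check which labels are actual jump targets and does not split after unconditional jumps or calls. The name `split_blocks` is hypothetical.

```python
def split_blocks(listing):
    """Split an assembly listing into blocks, one per label."""
    blocks = []
    for line in listing.strip().splitlines():
        line = line.strip()
        if not line:
            continue
        if line.endswith(":") or not blocks:   # a label starts a new block
            blocks.append([])
        blocks[-1].append(line)
    return blocks

p2 = """
p2:
    addi sp, sp, -16
    sw ra, 12(sp)
    sw s0, 8(sp)
    addi s0, sp, 16
    addi s3, zero, 1
    la s11, x_
    sw s3, 0(s11)
L1:
    la s11, y_
    lw s3, 0(s11)
    addi s6, zero, 2
    bge s3, s6, L2
L3:
    la s11, x_
    lw s3, 0(s11)
    addi s6, s3, 3
    la s8, x_
    sw s6, 0(s8)
    j L1
L2:
    lw ra, 12(sp)
    lw s0, 8(sp)
    addi sp, sp, 16
    ret
"""
blocks = split_blocks(p2)
for block in blocks:
    print(block[0], len(block) - 1, "instructions")
```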

The RISC-V code generator eliminates common subexpressions only within a basic block. Consider `p4`:

In [4]:
compileString("""
var x, y: integer
program p4
  while x + y < 3 do
    x := x + y
""", target = 'riscv')

	.data
x_:	.space 4
y_:	.space 4
	.text
	.globl main
main:	
	jal ra, p4
	addi a0, zero, 0
	addi a7, zero, 93
	scall
	.globl p4
p4:	
	addi sp, sp, -16
	sw ra, 12(sp)
	sw s0, 8(sp)
	addi s0, sp, 16
L1:	
	la s11, x_
	lw s3, 0(s11)
	la s8, y_
	lw s6, 0(s8)
	add s9, s3, s6
	addi s4, zero, 3
	bge s9, s4, L2
L3:	
	la s11, x_
	lw s3, 0(s11)
	la s8, y_
	lw s6, 0(s8)
	add s9, s3, s6
	la s4, x_
	sw s9, 0(s4)
	j L1
L2:	
	lw ra, 12(sp)
	lw s0, 8(sp)
	addi sp, sp, 16
	ret


The expression `x + y` is evaluated only once, as the two occurrences are in one basic block. Now consider `p5`:

In [5]:
compileString("""
var x, y: integer
program p5
  y := x + 3
  if y > 0 then
    y := x + 3
  else
    y := x + 3
""", target = 'riscv')

	.data
x_:	.space 4
y_:	.space 4
	.text
	.globl main
main:	
	jal ra, p5
	addi a0, zero, 0
	addi a7, zero, 93
	scall
	.globl p5
p5:	
	addi sp, sp, -16
	sw ra, 12(sp)
	sw s0, 8(sp)
	addi s0, sp, 16
	la s11, x_
	lw s3, 0(s11)
	addi s6, s3, 3
	la s8, y_
	sw s6, 0(s8)
	la s4, y_
	lw s9, 0(s4)
	bge zero, s9, L1
L2:	
	la s11, x_
	lw s3, 0(s11)
	addi s6, s3, 3
	la s8, y_
	sw s6, 0(s8)
	j L3
L1:	
	la s11, x_
	lw s3, 0(s11)
	addi s6, s3, 3
	la s8, y_
	sw s6, 0(s8)
L3:	
	lw ra, 12(sp)
	lw s0, 8(sp)
	addi sp, sp, 16
	ret


The first two occurrences of `x + 3` are in the same basic block, so they are shared; the last occurrence is in a different basic block, so it is evaluated again, even though that is unnecessary here. Eliminating such redundant evaluations would require a *global analysis*, which is beyond a one-pass compiler.

If within a basic block a variable is overwritten, the earlier expressions in which that variable occurred are no longer valid. For example, in `p6`, the value of `x + 3` is kept in a register but must not be used in the second occurrence:

In [6]:
compileString("""
var x, y: integer
program p6
  x := x + 3
  x := x + 3
""", target = 'riscv')

	.data
x_:	.space 4
y_:	.space 4
	.text
	.globl main
main:	
	jal ra, p6
	addi a0, zero, 0
	addi a7, zero, 93
	scall
	.globl p6
p6:	
	addi sp, sp, -16
	sw ra, 12(sp)
	sw s0, 8(sp)
	addi s0, sp, 16
	la s11, x_
	lw s3, 0(s11)
	addi s6, s3, 3
	la s8, x_
	sw s6, 0(s8)
	la s4, x_
	lw s9, 0(s4)
	addi s5, s9, 3
	la s10, x_
	sw s5, 0(s10)
	lw ra, 12(sp)
	lw s0, 8(sp)
	addi sp, sp, 16
	ret


The solution in P0 is that at an assignment to, say, `x`, the entry in `reguse` that holds the register of `x` is deleted. Thus the new value of `x` has to be loaded into a new register. Note that compilers may instead *rename variables* such that each variable is assigned only once, leading to an intermediate form called *static single assignment* (SSA).
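
The deletion can be illustrated with a small stand-alone model. It is a sketch under assumptions rather than P0's implementation: variables are keyed by name, registers come from a hypothetical endless supply `r1`, `r2`, ... and are never recycled, and `lw`/`sw` are written with a variable name as the address for brevity.

```python
reguse = {}      # variable name or (cd, reg, imm) -> register holding that value
emitted = []     # instructions generated so far
counter = 0      # hypothetical supply of fresh register names

def fresh():
    global counter
    counter += 1
    return f"r{counter}"

def load(var):
    """Return a register holding var, loading it only if none holds it yet."""
    if var not in reguse:
        reguse[var] = fresh()
        emitted.append(f"lw {reguse[var]}, {var}")
    return reguse[var]

def addi(reg, imm):
    """Return a register holding reg + imm, reusing an earlier result if any."""
    key = ("addi", reg, imm)
    if key not in reguse:
        reguse[key] = fresh()
        emitted.append(f"addi {reguse[key]}, {reg}, {imm}")
    return reguse[key]

def assign(var, reg):
    """Store reg to var and forget which register held the old value of var."""
    emitted.append(f"sw {reg}, {var}")
    reguse.pop(var, None)

# x := x + 3; x := x + 3
assign("x", addi(load("x"), 3))
assign("x", addi(load("x"), 3))
print("\n".join(emitted))
```

Since the second `load` puts `x` in a register that was not used before, the lookup key for the second `x + 3` differs from the first and the addition is generated again. This relies on registers never being recycled in the sketch.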