# CS202: Compiler Construction

## In-class Exercises, Week of 01/30/2023

----
# Select-instructions

The select-instructions pass transforms a sequence of statements into X86 assembly instructions.

Consider the grammar of the input language:

```
op   ::= 'add'
Atm  ::= Var(x) | Constant(n)
Stmt ::= Assign(x, Prim(op, [Atm])) | Assign(x, Atm) | Print(Atm)
LVar ::= Program([Stmt])
```

## Question 1

Convert the following `Lvar` code into a pseudo-x86 assembly program.

```
Program([
  Assign("y", Constant(5)),
  Assign("x", Var("y")),
  Print(Var("x"))
])
```

"""
movq $5, #y       ---   #y is a variable in our pseudo-x86 language
movq #y, #x
movq #x, %rdi     ---   note the nearly 1-1 mapping from the previous language: sometimes 2 instructions per statement, but not so bad
callq print_int
"""

## Question 2

Describe the structure of select-instructions.

The pass should convert each statement from the program into one or more x86 instructions (with variables).

Here's the Grammar:
```
op   ::= 'add'
Atm  ::= Var(x) | Constant(n)
Stmt ::= Assign(x, Prim(op, [Atm])) | Assign(x, Atm) | Print(Atm)
LVar ::= Program([Stmt])
```
We will structure the pass based on this grammar.

 - si_atm converts an Lvar atomic into an x86 atomic
     - Var(x) becomes x86.Var(x)
     - Constant(n) becomes x86.Immediate(n)
 - si_stmt converts a stmt into one or more x86 instructions
     - Assign(x, Prim(op, [Atm]))
         - the only op is "add", so this is always:
         - Assign(x, Prim('add', [atm1, atm2]))
             - movq si_atm(atm1), #x
             - addq si_atm(atm2), #x
     - Assign(x, Atm)
         - Assign(x, atm1)
             - movq si_atm(atm1), #x   <---- call si_atm to build the argument; don't write it out literally
     - Print(Atm)
         - Print(atm1)
             - movq si_atm(atm1), %rdi
             - callq print_int
 - si_stmts compiles a list of statements by calling si_stmt on each one
 
BOOK: Section 2.5 for reference
 - Don't worry about producing the more "efficient" code that the book does
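
The structure above can be sketched in Python roughly as follows. This is a minimal sketch, not the course's reference solution: the dataclasses are hypothetical stand-ins for the Lvar and pseudo-x86 AST nodes, and the extra branch for `x = atm1 + x` is an assumption added so that the `movq` does not overwrite an operand that is still needed.

```python
from dataclasses import dataclass
from typing import List, Union

# Hypothetical stand-ins for the Lvar AST nodes (not the course library).
@dataclass
class Var:
    name: str

@dataclass
class Constant:
    value: int

@dataclass
class Prim:
    op: str
    args: List[Union[Var, Constant]]

@dataclass
class Assign:
    name: str
    rhs: Union[Prim, Var, Constant]

@dataclass
class Print:
    arg: Union[Var, Constant]

# Hypothetical stand-ins for the pseudo-x86 AST nodes.
@dataclass
class X86Var:
    name: str

@dataclass
class Immediate:
    value: int

@dataclass
class Reg:
    name: str

@dataclass
class NamedInstr:
    op: str
    args: list

@dataclass
class Callq:
    label: str

def si_atm(a):
    """Convert an Lvar atom into a pseudo-x86 argument."""
    if isinstance(a, Var):
        return X86Var(a.name)
    if isinstance(a, Constant):
        return Immediate(a.value)
    raise ValueError(f"si_atm: unexpected atom {a!r}")

def si_stmt(s):
    """Convert one statement into a list of pseudo-x86 instructions."""
    if isinstance(s, Assign) and isinstance(s.rhs, Prim):
        if s.rhs.op != "add":
            raise ValueError(f"si_stmt: unexpected op {s.rhs.op!r}")
        atm1, atm2 = s.rhs.args
        dest = X86Var(s.name)
        if atm2 == Var(s.name):
            # x = atm1 + x: add in place so a movq cannot clobber x first.
            return [NamedInstr("addq", [si_atm(atm1), dest])]
        return [NamedInstr("movq", [si_atm(atm1), dest]),
                NamedInstr("addq", [si_atm(atm2), dest])]
    if isinstance(s, Assign):
        return [NamedInstr("movq", [si_atm(s.rhs), X86Var(s.name)])]
    if isinstance(s, Print):
        return [NamedInstr("movq", [si_atm(s.arg), Reg("rdi")]),
                Callq("print_int")]
    raise ValueError(f"si_stmt: unexpected statement {s!r}")

def si_stmts(stmts):
    """Compile a list of statements by concatenating the per-statement output."""
    instrs = []
    for s in stmts:
        instrs.extend(si_stmt(s))
    return instrs

# The program from Question 1.
prog = [Assign("y", Constant(5)), Assign("x", Var("y")), Print(Var("x"))]
for instr in si_stmts(prog):
    print(instr)
```

Returning a list from `si_stmt` keeps the "one statement, one or more instructions" shape explicit, and `si_stmts` only has to flatten the results.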

----
# Assign-homes pass

The assign-homes pass places each program variable in a *stack location* in memory, eliminating variables from the program.

See Section 2.2 for details; especially see Figure 2.8 for details on the memory layout of stack frames.



registers   %rdi      16 general-purpose registers, and they are FAST
memory      n(%rbp)   LOTS (maybe 16 GB of it), SLOW
        memory dereference: take the memory address stored in %rbp, add the offset n, and use the memory at that address
        Also called an indirect reference
        
 - Use: put some memory address in %rbp   <--- %rbp is conventionally used for this purpose,
        then use it to read/write
The Stack:
                  16 GB (high addresses)

         |                     |

         |   OLD STACK FRAME   |  the new frame starts just below it

  0(%rbp)| old %rbp (saved)    | <---- %rbp (base pointer) points here

         |                     |  Below it we can store info: x=5, y=6, z=7

 -8(%rbp)| 5                   |  x   (the negative offset is how far below %rbp the slot is)

-16(%rbp)| 6                   |  y

-24(%rbp)| 7                   |  z

-32(%rbp)| unused (padding)    | <---- %rsp (stack pointer) points here

         |                     |

         |                     |

                  0 (low addresses)

Assign-homes deals with the stack, but the stack frame itself is set up by hand in the prelude + conclusion pass.
Assign-homes only eliminates variables by turning them into stack memory locations.

(This example makes room for 4 variables)

pushq %rbp        --- saves the old %rbp by pushing it onto the stack
movq %rsp, %rbp   --- copies the stack pointer into %rbp, so %rbp marks the base of the new frame
subq $32, %rsp    --- moves %rsp down to make room for the variables, so we subtract a number of bytes
        * The number of variables times 8 (rounded up to a multiple of 16)

next we need to make room for the rest of the variables

## Question 3

Write X86 assembly that prepares a stack frame for four variables and puts the values 1,2,3,4 in stack locations.

In [6]:
from cs202_support.eval_x86 import X86Emulator

# Try some of these one at a time to figure out what exactly is going on here

asm = """
pushq %rbp
movq %rsp, %rbp
subq $32, %rsp
movq $1, -8(%rbp)
movq $2, -16(%rbp)
movq $3, -24(%rbp)
movq $4, -32(%rbp)
"""

X86Emulator(logging=False).eval_instructions(asm)

| | Location | Old | New |
|---|---|---|---|
| 0 | mem 992 | | 1000 |
| 1 | mem 984 | | 1 |
| 2 | mem 976 | | 2 |
| 3 | mem 968 | | 3 |
| 4 | mem 960 | | 4 |


Tearing down the stack frame uses the reverse instructions
"addq $32, %rsp
popq %rbp"
at the end of the above.
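
To see why the teardown mirrors the setup, here is a small Python trace of `%rsp` and `%rbp` through both halves. It is only a sketch: `simulate_frame` is a hypothetical helper, not part of `cs202_support`, and the starting value of 1000 for both registers is taken from the emulator output above.

```python
def simulate_frame(frame_bytes: int, rsp: int = 1000, rbp: int = 1000):
    """Trace %rsp and %rbp through the frame setup and teardown."""
    mem = {}

    # pushq %rbp: move %rsp down 8 bytes, then store the old %rbp there
    rsp -= 8
    mem[rsp] = rbp
    # movq %rsp, %rbp: %rbp now marks the base of the new frame
    rbp = rsp
    # subq $frame_bytes, %rsp: reserve space for the variables
    rsp -= frame_bytes
    inside = {"rbp": rbp, "rsp": rsp}

    # addq $frame_bytes, %rsp: release the space for the variables
    rsp += frame_bytes
    # popq %rbp: restore the old %rbp, then move %rsp back up 8 bytes
    rbp = mem[rsp]
    rsp += 8
    after = {"rbp": rbp, "rsp": rsp}

    return mem, inside, after

mem, inside, after = simulate_frame(32)
print(mem)     # {992: 1000}
print(inside)  # {'rbp': 992, 'rsp': 960}
print(after)   # {'rbp': 1000, 'rsp': 1000}
```

Both registers end where they started, which is the point of undoing the setup in reverse order.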

## Question 4

Write X86 assembly that prepares a stack frame for three variables and puts the values 1,2,3 in stack locations. Why is this situation different than above?

In [10]:
asm = """
pushq %rbp
movq %rsp, %rbp
subq $32, %rsp
movq $1, -8(%rbp)
movq $2, -16(%rbp)
movq $3, -24(%rbp)
"""

X86Emulator(logging=False).eval_instructions(asm)

| | Location | Old | New |
|---|---|---|---|
| 0 | mem 992 | | 1000 |
| 1 | mem 984 | | 1 |
| 2 | mem 976 | | 2 |
| 3 | mem 968 | | 3 |
| 4 | reg rbp | 1000.0 | 992 |
| 5 | reg rsp | 1000.0 | 960 |


- This situation is different from above because we leave one stack location unused. Three variables need only 24 bytes, but we still subtract 32 so that the value of `rsp` stays divisible by 16 (the *16-byte alignment*), as required by the x86-64 calling convention whenever a function is called.
- Your assign-homes pass should *ensure 16-byte alignment* of the stack frame.

`rsp` should be divisible by 16 whenever we execute a `callq`.

## Question 5

Implement a function `align` to ensure 16-byte alignment.

In [12]:
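def align(num_bytes: int) -> int:
    # Every variable takes 8 bytes, so num_bytes is a multiple of 8:
    # it is either already a multiple of 16 or exactly 8 short of one.
    if num_bytes % 16 == 0:
        return num_bytes
    else:
        return num_bytes + 8
# This would need to be more complicated if we had variables that needed more than 8 bytes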
    
print(align(32))
print(align(8))
print(align(24))

32
16
32
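
If the 8-bytes-per-variable assumption ever stops holding, one common way to generalize is to round up to the next multiple of the boundary. The sketch below is an illustration only; `align_general` and its `boundary` parameter are hypothetical names, not part of the assignment.

```python
def align_general(num_bytes: int, boundary: int = 16) -> int:
    """Round num_bytes up to the nearest multiple of boundary."""
    # Adding boundary - 1 pushes any partial block over the next multiple;
    # integer division then truncates back down to that multiple.
    return (num_bytes + boundary - 1) // boundary * boundary

print(align_general(32))  # 32
print(align_general(8))   # 16
print(align_general(24))  # 32
print(align_general(20))  # 32
```

For multiples of 8 it agrees with the version above, so either one works for this assignment.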


## Question 6

Describe the assign-homes pass.

This pass looks for variables in the program and replaces each one with a stack location (a Deref AST node).


Keep a dictionary called 'homes': Dict[str, Deref]
So when we find a variable:
 - If it already exists in 'homes', just use that home
 - If it doesn't exist in 'homes', create a home for it and add it to the dictionary
     - New home uses offset -8*(len(homes) + 1) from %rbp

**homes = {}** -- define this at the top of the pass, so it is local to the pass and shared by all of the following functions

ah_arg(a: x86Arg)
    uses the homes dictionary and checks the two things above:
        - Three cases we have: Registers, Variables, Immediates
        - Variables are really the only ones that matter (reg and imm just return themselves)
            - Var case:
                - if it exists in homes, use that home
                - else create a home and use it

ah_instr(instr: x86Instr)
    match w/ one case per instruction type --> NamedInstr is the only one that can have variables
                                               NamedInstr(op, args)
                                               call ah_arg on each arg -- ah_arg is already responsible for finding mem locs
                                               (this will be a list comprehension or for loop)

ah_block(instrs: List[x86Instr]) -> List[x86Instr]
    a program is like:
        main: (these labels are held in a dictionary)
            BLOCK
            call ah_instr for each instr


----
# Patch-instructions pass

The patch-instructions pass fixes instructions with two in-memory arguments, by using the `rax` register as a temporary location.
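
As a rough illustration of that idea, the sketch below splits any two-argument instruction whose arguments are both memory references into a `movq` through `%rax` followed by the original operation. The dataclasses and the name `pi_instr` are hypothetical stand-ins, not the course library's definitions.

```python
from dataclasses import dataclass
from typing import List

# Hypothetical stand-ins for the x86 AST nodes (not the course library).
@dataclass
class Immediate:
    value: int

@dataclass
class Reg:
    name: str

@dataclass
class Deref:
    reg: str
    offset: int

@dataclass
class NamedInstr:
    op: str
    args: list

def pi_instr(instr) -> list:
    """Return one or two instructions that avoid two memory arguments."""
    if (isinstance(instr, NamedInstr)
            and len(instr.args) == 2
            and all(isinstance(a, Deref) for a in instr.args)):
        src, dst = instr.args
        return [
            NamedInstr("movq", [src, Reg("rax")]),      # load source into %rax
            NamedInstr(instr.op, [Reg("rax"), dst]),    # original op, now reg -> mem
        ]
    return [instr]

def pi_block(instrs: list) -> list:
    out = []
    for i in instrs:
        out.extend(pi_instr(i))
    return out

block = [
    NamedInstr("movq", [Deref("rbp", -8), Deref("rbp", -16)]),
    NamedInstr("addq", [Deref("rbp", -24), Deref("rbp", -16)]),
]
for i in pi_block(block):
    print(i)
```

Instructions with at most one memory argument pass through unchanged, so the pass can be run over every block without special cases.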

## Question 7

What is wrong with the following instructions?

```
movq -8(%rbp), -16(%rbp)
addq -24(%rbp), -16(%rbp)
```

YOUR SOLUTION HERE

## Question 8

Fix the instructions above.

YOUR SOLUTION HERE

## Question 9

Describe the patch-instructions pass.

YOUR SOLUTION HERE

----
# Prelude & conclusion pass

The prelude & conclusion pass adds code to the beginning and end of the `main` block to prepare and tear down the program's stack frame.
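
A minimal sketch of that wrapping step, working on instruction strings for brevity. The function names are hypothetical, the frame size is assumed to be 8 bytes per variable rounded up to a multiple of 16, and the final `retq` is an assumption added here so that `main` returns to its caller.

```python
from typing import List

def align(num_bytes: int) -> int:
    """Round a multiple of 8 up to a multiple of 16."""
    return num_bytes if num_bytes % 16 == 0 else num_bytes + 8

def prelude_and_conclusion(body: List[str], num_vars: int) -> List[str]:
    """Wrap the body of main with stack frame setup and teardown."""
    frame_bytes = align(8 * num_vars)
    prelude = [
        "pushq %rbp",                  # save the caller's base pointer
        "movq %rsp, %rbp",             # start the new frame
        f"subq ${frame_bytes}, %rsp",  # reserve space for the variables
    ]
    conclusion = [
        f"addq ${frame_bytes}, %rsp",  # release the space
        "popq %rbp",                   # restore the caller's base pointer
        "retq",                        # assumption: return from main
    ]
    return prelude + body + conclusion

body = ["movq $1, -8(%rbp)", "movq -8(%rbp), %rdi", "callq print_int"]
print("\n".join(prelude_and_conclusion(body, num_vars=1)))
```

With one variable the frame is 16 bytes rather than 8, which keeps `%rsp` aligned at the `callq`.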

## Question 10

The program `print(5+6)` compiles to the following x86 program:

```
  .globl main
main:
  movq $5, %rax
  addq $6, %rax
  movq %rax, -8(%rbp)
  movq -8(%rbp), %rdi
  callq print_int
```

Add the *prelude* and *conclusion* to the `main` block of this program.

YOUR SOLUTION HERE

## Question 11

Describe the prelude & conclusion pass.

YOUR SOLUTION HERE