# CS202: Compiler Construction

## In-class Exercises, Week of 04/10/2023

----

TODO:
1) Do a git pull (some errors occur)
    - Added a field to store stack space in AST node for X86FunctionDef
2) Re-Install cs202-support (instructions in readme)

# Part 1: Functions and Lfun

Functions: 
 - We will treat them like mini-programs
     - Basically we can do everything to our functions that we did to our programs and in the end smash them all together

## Question 1

Write an `Lfun` program with a function called `add1` that adds 1 to its argument, and a call to the function.

In [3]:
def add1(n: int) -> int:
    return n+1

add1(5)

6

## Question 2

Write a recursive program to calculate the factorial of 5.

In [4]:
def fact(n: int) -> int:
    if n == 0:
        return 1
    else:
        return n * fact(n-1)

fact(5)

120

## Question 3

Summarize the changes to the compiler to support functions.

Theme: treat each function as its own mini-program

1. before explicate control: add cases for function definitions, treating function definitions as statments
2. explicate-control: contruct CFG for each function definition, output program is a list of function definitions (each is a mini-program)
3. after explicate-control: call passes we wrote for A6 on each mini-program separately

----

# Part 2: Typechecking for Functions

## Question 4

What are the types of the functions `add1` and `*`?

```
add1: 
in Python: Callable[[int], int]] or int -> int

*:
Callable[[int, int], int] or int * int -> int
```

## Question 5

Why do we need to specify the types of a function's arguments?

In [None]:
# We can't deduce type signature of this
def identity(x)
    return x

 - So we can typecheck function calls?
 - Overloading?
 - Modularity: need to know the types of inputs in ordern to typecheck fucntion definitions in isolation
 - Recursive Functions:
     - need functions output type in order to typecheck it's body (or else you get an infinite loop)

## Question 6

Write a function `length` such that the following expression returns 3:

```
v = (1, (2, (3, 0)))
print(length(v))
```

def length(v: List[int]) -> int: # We won't have this kind of type, we can't write the type down
    if v == 0:
        return 1
    else return 1 + length(v[1])

We can't write this function in our language, because of the type
    

## Question 7

How do we typecheck a function call of the form `f(a1, ..., ak)`?

New case in tc_exp:
    case Call(f, args)

1. Assume we havce already type-checked the definition of f
2. typecheck f; it should have the type Callable([t1, ... , tk], t) --- (assertion)
3. Typecheck a1, ... , ak and ensure they have the same types as the arguments in the function def (t1, .. tk) --- (lots of assertions)
4. Return the type t

## Question 8

How do we typecheck a function definition?

```
def fact(n: int) -> int:
    if n == 0:
        return 1
    else:
        return n * fact(n-1)
```

New case in tc_stmt:
    case FunctionDef(name, args_and_types, body_stmts, return_type)
   
Update the type environment (somehow)
1. Update type env to have types for the function arguments
2. Typecheck body_stmts (tc_stmts(body_stmts, env))
3. env[name] = Callable([arg_types, return_type])

## Question 9

How do we typecheck a `Lfun` program?

Two new cases:

1. FunctionDef(name, params, body_stmts, return_type)
2. Return(e)
3. Call(func, args)

    
 - For FunctionDef(name, params, body_stmts, return_type)
 1. env[name] = Callable(param_types, return_types)
 2. Copy env into new_env
 3. Add bindings to new_env for each var in params
 4. Handle return type: new_env['ret_val'] = return_type
 5. Call tc_stmts(body_stmts, new_env)
 6. Add name to function_names
 
 - For Return(e)
 1. assert that tc_exp(e, env) == env['retval']

 - For Call(func, args):
     - Treat it like Prim
     - Except you need to call tc_exp(func, env)
     - Except that the resulting type is a Callable
     - Check that each arg has the expected type by the Callable
     - Return type is Callable's return type

----
# Part 3: Changes to RCO and Expose-Alloc

## Question 10

Describe the changes to RCO.

def f():
    pass

f(3)

tmp_ = f
tmp_(3)

New Cases:

1. FunctionDef in rco_stmt
    - just call rco_stmts on the body
2. Return in rco_stmt
    - call rco_exp on the returned expression
    - should make returned expression atomic
3. Call in rco_exp
    - like Prim
    - also call rco_exp on the func
4. Var in rco_exp, when the variable is a function reference
    - if the var is a function name, generate a tmp for it and return that
        - check by asking if Var in function names
    
How do we know whats a function name?
    - we add them to a set function_names
    - add adding these into the typechecker

## Question 11

Describe the changes to expose-alloc.

1. add a case to ea_stmt for FnctionDef that calls ea_stmts on the body

----

# Part 4: Functions in x86 Assembly

## Question 12

Write an x86 assembly program corresponding to the `add1` program.

1. Functions in x86
    callq, retq + calling convention
    
    callq(l) is a fancy jump except it remembers where it came from
        under the hood:
            pushq %rip
            jmp l
    
    retq - goes back to where the function was called
        under the hood:
            popq %rip
            jmp *%rip <- jumping to spot in the code

JMP:
    - can go to memory addresses, but that's not really a modern way to do it
    - RIP - Process remembers where it is in program here

In [3]:
from cs202_support.eval_x86 import X86Emulator

# Use callq as a jmp that remembers where it came from;
# retq to return to where you came from
# calling convention:
#   put arguments into the registers rdi, rsi, rdx, rcx, r8, r9 - more arguments just turn last one into a tuple
#   put return value into rax

asm = """
add1:
    movq %rdi, %r8
    addq $1, %r8
    movq %r8, %rax
main:
    movq $5, %rdi
    callq add1
    movq %rax, %rdi
    callq print_int
"""

emu = X86Emulator(logging=True)
emu.eval_program(asm)
emu.print_state()

CALL TO print_int: 6
  Location                        Value
0  reg rbp                         1000
1  reg rsp                         1000
2  reg rdi                            6
3   reg r8                            6
4  reg rax                            6
5     add1  FunPointer(fun_name='add1')
6     main  FunPointer(fun_name='main')
FINAL STATE:
  Location                        Value
0  reg rbp                         1000
1  reg rsp                         1000
2  reg rdi                            6
3   reg r8                            6
4  reg rax                            6
5     add1  FunPointer(fun_name='add1')
6     main  FunPointer(fun_name='main')
OUTPUT: [6]


Unnamed: 0,Location,Value
0,reg rbp,1000
1,reg rsp,1000
2,reg rdi,6
3,reg r8,6
4,reg rax,6
5,add1,FunPointer(fun_name='add1')
6,main,FunPointer(fun_name='main')


## Question 13

Describe the *calling convention* we will use for functions in Rfun.

calling convention:
  - put arguments into the registers rdi, rsi, rdx, rcx, r8, r9
  - put return value into rax
  - book says: for more than 6 parameters put the extra ones in a tuple
  - our compiler is limited to <= 6 params

## Question 14

Describe the management of the *stack* and *root stack* performed on function entry and exit.

On function entry:
 - Allocate new stack frame with slots for stack-allocated variables of the function
 - Allocate a root stack frame with slots for root-stack allocated variables

On function exit:
 - Reclaim the stack space we allocated
 - Reclaim the root stack space we allocated
 
We do it in the exact same way way as for programs except that we don't initialize the heap on function entry

## Question 15

Modify the program from earlier to correctly manage the stack and root stack. Allocate the variable `n` on the stack.

In [4]:
asm = """
addstart1:
    movq %rdi, %r8
    addq $1, %r8
    movq %r8, %rax
    jmp add1conclusion
add1:
    pushq %rbp
    movq %rsp, %rbp
    subq $0, %rsp
    jmp add1start
add1conclusion:
    addq $0, %rsp
    popq %rbp
    retq
main:
    movq $5, %rdi
    callq add1
    movq %rax, %rdi
    callq print_int
"""

emu = X86Emulator(logging=True)
emu.eval_program(asm)
emu.print_state()



KeyError: 'add1start'

## Question 16

Modify the program again, to save and restore the *callee-saved registers*.

In [3]:
# There are two kinda of registers: callee-saved and caller-saved
# function can do whatever it wants with caller-saved registers
# function MUST maintain value of callee-saved registers

asm = """
addstart1:
    movq %rdi, %r8
    addq $1, %r8
    movq %r8, %rax
    jmp add1conclusion
add1:
    pushq %rbp
    movq %rsp, %rbp
    subq $0, %rsp
    pushq %rbx
    pushq %r12
    pushq %r13
    pushq %r14
    jmp add1start
add1conclusion:
    addq $0, %rsp
    pushq %r14
    pushq %r13
    pushq %r12
    pushq %rbx
    popq %rbp
    retq
main:
    movq $5, %rdi
    callq add1
    movq %rax, %rdi
    callq print_int
"""

emu = X86Emulator(logging=True)
emu.eval_program(asm)
emu.print_state()

CALL TO initialize: 16384, 16
           Location                                   Value
0           mem 992                                    1000
1           reg rbp                                     992
2           reg rsp                                     992
3           reg rdi                                   16384
4           reg rsi                                      16
5              main             FunPointer(fun_name='main')
6              add1             FunPointer(fun_name='add1')
7        add1_start       FunPointer(fun_name='add1_start')
8   add1_conclusion  FunPointer(fun_name='add1_conclusion')
9             start            FunPointer(fun_name='start')
10       conclusion       FunPointer(fun_name='conclusion')
11  rootstack_begin                                    2000
12    rootstack_end                                   18384
13         free_ptr                                  100000
14  fromspace_begin                                  100000
15    from

Unnamed: 0,Location,Value
0,mem 952,
1,mem 960,
2,mem 968,
3,mem 976,42
4,mem 984,992
5,mem 992,1000
6,reg rbp,1000
7,reg rsp,1000
8,reg rdi,42
9,reg rsi,16


----
# Part 5: Explicate-Control

## Question 16

Describe the changes to explicate-control.

Goal of explicate control
old:
  - covert statements to control-flow graph

new:
  - convert statements to a list of function definitions, each with its own control flow graph
  
Explicate-Control works as before, but has three new cases:
 1. Return to ec_stmt
 2. Add Call to ec_exp: like Prim, but call ec_atm on the function
 3. Add FunctionDef to ec_stmt: call ec_function


## Question 17

Describe the ec_function function.

Change the pass globally:
 1. Add a global var to the pass called `current_function` that tracks the function being compiled. It starts out as `main`.
 2. Add a global var to pass called `functions` this is a list of function definitions. The ec_function function will add to this list
 3. Modify create_block to add the functions name as a prefix to the label it creates

`ec_function` function:
 1. Save `basic_blocks` abd `current_function` so we can restore them at the end
 2. Set `basic_blocks` to {} and `current_function` to the function's name
 3. Call ec_stmts on the body statements, with the continuation `Return(0)`
 4. Set `basic_blocks[name+ 'start']` to the result of step 3
 5. Construct a cfun.FunctionDef with the name, parameter names, and basic_blocks
 6. Append the rfunction def to `functions`
 7. Restore `basic_blocks` and `current_function`


In [None]:
x = 1
def add1(n):
    return n+1
y = x + 4
# statments before function def should belong to outside block