# CS202: Compiler Construction

## In-class Exercises, Week of 01/16/2023

----
# PART I: Abstract Syntax Trees

## Question 1

The following grammar defines the *concrete syntax* for a language of integer arithmetic (numbers and the "plus" operator).

\begin{align*}
expr &::= n \\
&\mid expr + expr
\end{align*}

The following class hierarchy defines an *abstract syntax* for the same language.

In [1]:
from dataclasses import dataclass
from cs202_support.python import AST, print_ast

@dataclass
class Expr(AST):
    pass

@dataclass
class Constant(Expr):
    val: int

@dataclass
class Plus(Expr):
    e1: Expr
    e2: Expr

Write an abstract syntax tree for the expression `1 + 2 + 3`.

In [2]:
ast = Plus(Constant(1), Plus(Constant(2),Constant(3)))
print(print_ast(ast))

Plus(
 Constant(1),
 Plus(
  Constant(2),
  Constant(3)))


## Question 2

The code below defines a parser that transforms concrete syntax for this simple language into abstract syntax trees.

In [3]:
from lark import Lark
_rint_parser = Lark(r"""
    ?exp: NUMBER -> int_e
        | exp "+" exp -> plus_e
        | "(" exp ")"
    %import common.NUMBER
    %import common.CNAME
    %import common.WS
    %ignore WS
    """, start='exp')

def parse(s):
    def t_ast(e):
        if e.data == 'int_e':
            return Constant(int(e.children[0]))
        elif e.data == 'plus_e':
            e1, e2 = e.children
            return Plus(t_ast(e1), t_ast(e2))

    parsed = _rint_parser.parse(s)
    #print(parsed)
    ast = t_ast(parsed)
    return ast

Write code to use the parser above to parse the expression `1 + 2 + 3` into an abstract syntax tree.

In [4]:
ast = parse("1+2+3")
print(ast)
print(print_ast(ast))

Plus(e1=Plus(e1=Constant(val=1), e2=Constant(val=2)), e2=Constant(val=3))
Plus(
 Plus(
  Constant(1),
  Constant(2)),
 Constant(3))


## Question 3

Write an *interpreter* for this language.

**The structure of your function should follow the structure of the AST**

In [5]:
def eval_rint(e: Expr) -> int:
    match e:
        case Plus(e1,e2):
            return eval_rint(e1) + eval_rint(e2)
        case Constant(val):
            return val

In [6]:
# TEST CASE
assert eval_rint(parse('1 + 2 + 3')) == 6
assert eval_rint(parse('42 + 20 + 10 + 5 + 5')) == 82

----
# PART II: x86 Assembly

In [7]:
from cs202_support.eval_x86 import X86Emulator

## Question 4

Write x86 assembly code to add the numbers 1 and 2, putting the result in the register `rax`.

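Class notes:

- `movq` moves a value: here it moves the constant 1 into register `rax`.
- `addq` adds two values: here it adds the constant 2 to whatever is in `rax` and stores the sum back in `rax`.
- `retq` returns to the caller.

`%rax` is a register. The CPU has 16 general-purpose registers, each holding a 64-bit value.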

In [8]:
emu = X86Emulator(logging = False)
emu.eval_instructions("""
                        movq $1, %rax
                        addq $2, %rax
                        retq""")

| Location | Old | New |
|----------|-----|-----|
| reg rax  |     | 3   |


In these examples, instruction arguments are either registers or immediates (constants). In `movq $1, %rax`:

- `movq` is the instruction
- `$1` is the immediate (constant) 1
- `%rax` is a register
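To make the semantics of `movq` and `addq` concrete, here is a toy interpreter sketch. It is not the course's `X86Emulator`. The function name `run` is hypothetical, and it only supports an immediate source with a register destination, which is all the examples above need.

```python
def run(asm: str) -> dict:
    """Toy interpreter: movq/addq with an immediate source and a register destination."""
    regs = {}
    for line in asm.strip().splitlines():
        parts = line.replace(",", " ").split()
        if not parts:
            continue
        op, args = parts[0], parts[1:]
        if op == "retq":
            break                        # stop executing at the return
        src, dst = args
        value = int(src.lstrip("$"))     # "$1"   -> 1     (immediate)
        reg = dst.lstrip("%")            # "%rax" -> "rax" (register)
        if op == "movq":
            regs[reg] = value            # overwrite the destination
        elif op == "addq":
            regs[reg] = regs[reg] + value  # add into the destination
        else:
            raise ValueError(f"unsupported instruction: {op}")
    return regs

print(run("""
    movq $1, %rax
    addq $2, %rax
    retq
"""))
```

Note that the destination is the second argument: `addq $2, %rax` both reads and writes `rax`.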

## Question 5 

Write x86 assembly code to add the numbers 1, 2, 3, and 4, putting the result in the register `rdi`.

In [9]:
emu = X86Emulator(logging = False)
emu.eval_instructions("""
                        movq $1, %rdi
                        addq $2, %rdi
                        addq $3, %rdi
                        addq $4, %rdi
                        retq""")

| Location | Old | New |
|----------|-----|-----|
| reg rdi  |     | 10  |


## Question 6

Write a complete x86 program to:

- Place the number 42 in the register `rdi`
- Call the function `print_int` in the runtime
- Return cleanly to the operating system

Hint: try using the [Assignment 1 online compiler](https://jnear.w3.uvm.edu/cs202/compiler-a1.php).

In [10]:
emu = X86Emulator(logging = False)
emu.eval_program("""
.globl main
main:
  movq $42, %rdi
  callq print_int
  retq
""")

[42]

In [11]:
# Note what happens when you callq twice
emu = X86Emulator(logging = False)
emu.eval_program("""
.globl main
main:
  movq $42, %rdi
  callq print_int
  callq print_int
  retq
""")

[42, 42]

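The emulator returns the list of values printed: `[42]` for one call to `print_int`, and `[42, 42]` for two.

Note that `callq print_int` takes no explicit arguments. By convention, the first argument to a function is passed in register `%rdi`.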
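The convention can be sketched with a toy model in Python. The names `regs` and `output` are hypothetical, and the model assumes the call leaves `rdi` unchanged, which matches the emulator output above.

```python
regs = {}     # register file of a toy machine
output = []   # values "printed" so far

def print_int():
    # Takes no Python arguments: it reads its first argument from rdi.
    output.append(regs["rdi"])

regs["rdi"] = 42   # movq $42, %rdi
print_int()        # callq print_int
print_int()        # callq print_int (rdi still holds 42 in this toy model)
print(output)
```

The caller's job is to place the argument in `rdi` before the call; the callee finds it there.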

## Question 7

Write code to generate a *pseudo-x86 abstract syntax tree* for the `main` block in the program above.

Hint: reference the [pseudo-x86 AST class hierarchy](https://github.com/jnear/cs202-assignments/blob/master/cs202_support/x86exp.py). Debug your solution using the online compiler's output for the `select instructions` pass.

In [12]:
import cs202_support.x86 as x86

ast = x86.X86Program(
 {
  'main':
   [
    x86.NamedInstr(
     "movq",
     [
      x86.Immediate(42),
      x86.Reg("rdi")
     ]),
    x86.Callq("print_int"),
    x86.Retq()
   ]
 },
 None)

print(print_ast(ast))

X86Program(
 {
  'main':
   [
    NamedInstr(
     "movq",
     [
      Immediate(42),
      Reg("rdi")
     ]),
    Callq("print_int"),
    Retq()
   ]
 },
 None)


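- `NamedInstr`: an instruction identified by name (here `"movq"`) together with its list of arguments
- `Immediate`: a constant
- `Reg`: a register
- `Callq`: a call to the named function (here `print_int`)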

## Question 8

What is the purpose of the `select_instructions` pass of the compiler? How should it be implemented?

It translates the abstract syntax of the input program into pseudo-x86 abstract syntax: the concrete instructions the assembly program will need. Later passes can extract information from this form more easily.

It should be implemented by pattern matching on the input AST and building the corresponding `X86Program`. The next pass can then turn each `NamedInstr`, `Callq`, and `Retq` into a separate line of x86 assembly.

In [13]:
from typing import List, Set, Dict, Tuple
import sys

from cs202_support.python import *
import cs202_support.x86 as x86


##################################################
# select-instructions
##################################################

def select_instructions(program: Program) -> x86.X86Program:
    """
    Transforms an Lmin program into a pseudo-x86 assembly program.
    :param program: an Lmin program
    :return: a pseudo-x86 program
    """
    match program:
        case Program([Print(Constant(val))]):
            return x86.X86Program(
                {
                    'main':
                        [
                            x86.NamedInstr(
                                "movq",
                                [
                                    x86.Immediate(val),
                                    x86.Reg("rdi")
                                ]),
                            x86.Callq("print_int"),
                            x86.Retq()
                        ]
                },
                None)
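        case _:
            # Only programs of the form print(<constant>) are handled so far.
            raise NotImplementedError(f'unsupported program: {program}')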


##################################################
# Compiler definition
##################################################

compiler_passes = {
    'select instructions': select_instructions,
    'print x86': x86.print_x86
}


def run_compiler(s, logging=False):
    current_program = parse(s)

    if logging:
        print()
        print('==================================================')
        print(' Input program')
        print('==================================================')
        print()
        print(print_ast(current_program))

    for pass_name, pass_fn in compiler_passes.items():
        current_program = pass_fn(current_program)

        if logging:
            print()
            print('==================================================')
            print(f' Output of pass: {pass_name}')
            print('==================================================')
            print()
            print(print_ast(current_program))

    return current_program


if __name__ == '__main__':
    if len(sys.argv) != 2:
        print('Usage: python compiler.py <source filename>')
    else:
        file_name = sys.argv[1]
        with open(file_name) as f:
            print(f'Compiling program {file_name}...')

            try:
                input_program = f.read()
                x86_program = run_compiler(input_program, logging=True)

                with open(file_name + '.s', 'w') as output_file:
                    output_file.write(x86_program)

            except Exception as e:
                raise Exception('Error during compilation:', e)


NameError: name 'Program' is not defined
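The loop in `run_compiler` threads the program through each pass in turn, feeding the output of one pass to the next. Because Python dictionaries preserve insertion order (Python 3.7+), the passes run in the order they are listed in `compiler_passes`. A minimal sketch of that pattern, using two hypothetical toy passes (`double` and `describe`) in place of the real compiler passes:

```python
def double(x: int) -> int:
    """Toy pass 1: transform the 'program' (here just an int)."""
    return x * 2

def describe(x: int) -> str:
    """Toy pass 2: render the result as text, like a final printing pass."""
    return f"result = {x}"

# Passes run in insertion order.
toy_passes = {
    'double': double,
    'describe': describe,
}

def run_passes(program, passes):
    current = program
    for pass_name, pass_fn in passes.items():
        current = pass_fn(current)   # output of one pass is input to the next
        print(f'after {pass_name}: {current!r}')
    return current

final = run_passes(21, toy_passes)
```

The last pass returns a string, which is why `run_compiler` can write its result directly to the `.s` file.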