# Lab 3

- It is recommended to **go through this file with a partner**.
- Be sure to **ask** if anything is unclear: first your partner, then a lab helper.
- Go through the code accompanying the lectures first.

In this lab, you will write a full parser for ``SIMP``. 

**HINT:** As it's easy to get stuck here, the solutions are already available in the repository.

In [2]:
from jupyterquiz import display_quiz

question_path="./"

Here is a grammar for ``SIMP``.
(This one has a slightly different definition for declarations compared to the grammar you saw in the lecture.)

```
program ::=  [declarations] commands 
declarations ::= declaration; | declaration; declarations 
declaration ::= VAR identifier

commands ::= command | command; commands
command ::= identifier := exp | IF condexp THEN command | IF condexp THEN command ELSE command | WHILE condexp DO command | BEGIN program END | INPUT identifier | PRINT exp 

comp ::= = | != | <= | < | >= | >
condexp ::= exp comp exp 
exp  ::= identifier | number | exp + exp | exp - exp | exp * exp | exp / exp | - exp 
```

Recall abstract syntax and tokens for ``SIMP``:

In [6]:
(* Abstract syntax, tokens and helper functions *)

exception SyntaxError of string

type op = Plus | Minus | Mult | Div 

type exp = Id of string | Numb of int | Op of exp * op * exp | Neg of exp

type cond = Eq | Neq | Lte | Lt | Gte | Gt 
type condexp = Cop of exp * cond * exp
                                          
type cmd = Asgn of string * exp 
         | Ite of condexp * cmd * cmd | If of condexp * cmd 
         | While of condexp * cmd
         | Begin of program 
         | Input of string
         | Print of exp 
         
and program = Program of string list * cmd list

type token = SEMI | VAR | ASGN | IF | THEN | ELSE
            | WHILE | DO | BEGIN | END | INPUT | PRINT
            | EQ | NEQ | LTE | LT | GTE | GT 
            | ID of string | INT of int
            | PLUS  | MINUS | STAR | SLASH 
            | LBRA | RBRA 
            | EOF

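(* Consume the expected token x from the front of xs, or fail.
   Structural equality (=) is used so that tokens carrying data compare by value. *)
let parse_token (x : token) (xs : token list) = match xs with 
| y :: ys -> if (x = y) then ys else raise (SyntaxError "Token expected.")
| _ -> raise (SyntaxError "Token expected.") 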

val parse_token : token -> token list -> token list = <fun>
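
To see what `parse_token` does with a matching and a non-matching head token, here is a minimal sketch. It assumes a reduced `token` type (only a few constructors, to keep the example short) and compares tokens with structural equality `=`. The names `rest`, `failed` and `structural` are illustrative only.

```ocaml
exception SyntaxError of string

(* Reduced token type for this sketch; the lab's full type has more constructors. *)
type token = SEMI | VAR | ID of string | INT of int | EOF

(* Same shape as the lab's helper, written with structural equality (=). *)
let parse_token (x : token) (xs : token list) : token list =
  match xs with
  | y :: ys when x = y -> ys
  | _ -> raise (SyntaxError "Token expected.")

(* Success: the expected SEMI is consumed and the remaining tokens are returned. *)
let rest = parse_token SEMI [SEMI; VAR; ID "x"]

(* Failure: the head is VAR, not SEMI, so SyntaxError is raised. *)
let failed =
  try ignore (parse_token SEMI [VAR; ID "x"]); false
  with SyntaxError _ -> true

(* Tokens that carry data, such as ID "x", compare by value under (=).
   Physical equality (==) gives no such guarantee for them. *)
let structural = (ID "x" = ID "x")
```

Here `rest` is `[VAR; ID "x"]`, and both `failed` and `structural` are `true`.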


## 1. LL(1) 

In [3]:
display_quiz(question_path+"questions31.json")

## 2. Extend Expressions by Unary Negation 

1. Make ``exp`` LL(1) by building in precedence and associativity, and by eliminating left recursion.

Take care: Compared to the lecture, expressions can now start with a unary minus, i.e. 
```
[MINUS; ID "x"]
```
is a valid token sequence for expressions.

All binary operators are right-associative. Negation binds stronger than multiplication/division, which bind stronger than addition/subtraction.
E.g. `- 2 * 3 + 5` binds as `((-2) * 3) + 5`.

(* SOLUTION *)
```
exp ::= term [{+|-} exp]
term ::= factor [{*|/} term]
factor ::= [-] base
base ::= identifier | number | (exp)
```
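
As a sanity check of the intended precedence and associativity, the trees one would expect for two small inputs can be written down directly in the abstract syntax. This is only a sketch: `show` is a hypothetical helper (not part of the lab code) that prints an `exp` fully parenthesised.

```ocaml
type op = Plus | Minus | Mult | Div
type exp = Id of string | Numb of int | Op of exp * op * exp | Neg of exp

(* - 2 * 3 + 5 read as ((-2) * 3) + 5: negation binds tightest, then *, then +. *)
let prec_example : exp = Op (Op (Neg (Numb 2), Mult, Numb 3), Plus, Numb 5)

(* 1 - 2 - 3 read right-associatively as 1 - (2 - 3). *)
let assoc_example : exp = Op (Numb 1, Minus, Op (Numb 2, Minus, Numb 3))

(* Hypothetical helper: render an exp with every sub-expression parenthesised. *)
let rec show (e : exp) : string =
  match e with
  | Id x -> x
  | Numb n -> string_of_int n
  | Neg e' -> "(-" ^ show e' ^ ")"
  | Op (l, o, r) ->
    let s = match o with Plus -> "+" | Minus -> "-" | Mult -> "*" | Div -> "/" in
    "(" ^ show l ^ " " ^ s ^ " " ^ show r ^ ")"
```

`show prec_example` yields `"(((-2) * 3) + 5)"` and `show assoc_example` yields `"(1 - (2 - 3))"`.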

2. Below is the previous definition of an expression parser without negation. Extend it with the new unary negation operation. You'll need a new syntactic category (see 1).

In [7]:
let rec parse_exp (xs : token list) : exp * token list = let 
  (e1, xs') = parse_term xs in 
  match xs' with 
  | PLUS :: xs'' -> let 
      (e2, xs''') = parse_exp xs'' 
      in (Op (e1, Plus, e2), xs''')
  | MINUS :: xs'' -> let 
      (e2, xs''') = parse_exp xs'' 
      in (Op (e1, Minus, e2), xs''')
  | _ -> (e1, xs') 
           
and parse_term (xs : token list) : exp * token list = let 
  (e1, xs') = parse_base xs in 
  match xs' with 
  | STAR :: xs'' -> let 
    (e2, xs''') = parse_term xs''
      in (Op (e1, Mult, e2), xs''') 
  | SLASH :: xs'' -> let 
    (e2, xs''') = parse_term xs''
      in (Op (e1, Div, e2), xs''')    
  | _ -> (e1, xs')
  
and parse_base (xs : token list) : exp * token list = match xs with 
  | ID x :: xs' -> (Id x, xs')
  | INT x :: xs' -> (Numb x , xs')
  | LBRA :: xs' -> (let 
        (e, xs'') = parse_exp xs' in let
         xs''' = parse_token RBRA xs''
      in (e, xs'''))
  | _ -> raise (SyntaxError "Expected ID, INT or LBRA.")

val parse_exp : token list -> exp * token list = <fun>
val parse_term : token list -> exp * token list = <fun>
val parse_base : token list -> exp * token list = <fun>


In [9]:
(* SOLUTION *)

let rec parse_exp (xs : token list) : exp * token list = let 
  (e1, xs') = parse_term xs in 
  match xs' with 
  | PLUS :: xs'' -> let 
      (e2, xs''') = parse_exp xs'' 
      in (Op (e1, Plus, e2), xs''')
  | MINUS :: xs'' -> let 
      (e2, xs''') = parse_exp xs'' 
      in (Op (e1, Minus, e2), xs''')
  | _ -> (e1, xs') 
           
and parse_term (xs : token list) : exp * token list = let 
  (e1, xs') = parse_factor xs in 
  match xs' with 
  | STAR :: xs'' -> let 
    (e2, xs''') = parse_term xs''
      in (Op (e1, Mult, e2), xs''') 
  | SLASH :: xs'' -> let 
    (e2, xs''') = parse_term xs''
      in (Op (e1, Div, e2), xs''')    
  | _ -> (e1, xs')
  
and parse_factor (xs : token list) : exp * token list = match xs with 
  | MINUS :: xs' -> let (e, xs'') = parse_base xs' in 
                    (Neg e, xs'')
  | _ -> parse_base xs
  
and parse_base (xs : token list) : exp * token list = match xs with 
  | ID x :: xs' -> (Id x, xs')
  | INT x :: xs' -> (Numb x , xs')
  | LBRA :: xs' -> (let 
        (e, xs'') = parse_exp xs' in let
         xs''' = parse_token RBRA xs''
      in (e, xs'''))
  | _ -> raise (SyntaxError "Expected ID, INT or LBRA.") 

val parse_exp : token list -> exp * token list = <fun>
val parse_term : token list -> exp * token list = <fun>
val parse_factor : token list -> exp * token list = <fun>
val parse_base : token list -> exp * token list = <fun>


3. Test the parser with one token list that should be accepted and one token list that should not be accepted.

4. Explain why ``[MINUS; INT 3]`` is a valid ``exp``-sentence using your implementation of a parser.

(* SOLUTION *)

```
parse_exp [MINUS; INT 3] 
1. parse_term [MINUS; INT 3], then match on the remaining list according to parse_exp in 2.
1.1 parse_factor [MINUS; INT 3], then match on the remaining list according to parse_term in 1.2
1.1.1 parse_base [INT 3], then return the negation of the yielded expression/remaining list of tokens. 
1.1.1.1 We get Numb 3 and the empty list. Go to 1.1.2.
1.1.2 We return Neg (Numb 3) and the empty list. Go back to 1.2. 
1.2 As the remaining list of tokens is empty, we match with the last case and just return Neg (Numb 3) and the empty list. 
2. As the remaining list of tokens is empty, we match with the last case and just return Neg (Numb 3) and the empty list. 
 
```
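
The trace above can also be checked mechanically. The following self-contained sketch restates the expression parser with a reduced `token` type (an assumption to keep the block short) and runs it on two token lists that should be accepted and one that should be rejected. The names `accepted`, `accepted2` and `rejected` are illustrative.

```ocaml
exception SyntaxError of string

type op = Plus | Minus | Mult | Div
type exp = Id of string | Numb of int | Op of exp * op * exp | Neg of exp

(* Reduced token type: only the tokens that expressions need. *)
type token = ID of string | INT of int | PLUS | MINUS | STAR | SLASH | LBRA | RBRA

let parse_token (x : token) (xs : token list) : token list =
  match xs with
  | y :: ys when x = y -> ys
  | _ -> raise (SyntaxError "Token expected.")

(* exp ::= term [{+|-} exp] *)
let rec parse_exp (xs : token list) : exp * token list =
  let (e1, xs') = parse_term xs in
  match xs' with
  | PLUS :: xs'' -> let (e2, xs''') = parse_exp xs'' in (Op (e1, Plus, e2), xs''')
  | MINUS :: xs'' -> let (e2, xs''') = parse_exp xs'' in (Op (e1, Minus, e2), xs''')
  | _ -> (e1, xs')

(* term ::= factor [{*|/} term] *)
and parse_term (xs : token list) : exp * token list =
  let (e1, xs') = parse_factor xs in
  match xs' with
  | STAR :: xs'' -> let (e2, xs''') = parse_term xs'' in (Op (e1, Mult, e2), xs''')
  | SLASH :: xs'' -> let (e2, xs''') = parse_term xs'' in (Op (e1, Div, e2), xs''')
  | _ -> (e1, xs')

(* factor ::= [-] base *)
and parse_factor (xs : token list) : exp * token list =
  match xs with
  | MINUS :: xs' -> let (e, xs'') = parse_base xs' in (Neg e, xs'')
  | _ -> parse_base xs

(* base ::= identifier | number | (exp) *)
and parse_base (xs : token list) : exp * token list =
  match xs with
  | ID x :: xs' -> (Id x, xs')
  | INT n :: xs' -> (Numb n, xs')
  | LBRA :: xs' -> let (e, xs'') = parse_exp xs' in (e, parse_token RBRA xs'')
  | _ -> raise (SyntaxError "Expected ID, INT or LBRA.")

(* Accepted: the whole input is consumed, so the remaining list is empty. *)
let accepted = parse_exp [MINUS; INT 3]
let accepted2 = parse_exp [MINUS; INT 2; STAR; INT 3; PLUS; INT 5]

(* Rejected: a right operand is missing after PLUS. *)
let rejected =
  try ignore (parse_exp [INT 1; PLUS]); false
  with SyntaxError _ -> true
```

`accepted` is `(Neg (Numb 3), [])`, matching the trace. Note that the parser returns leftover tokens instead of failing on them, so a test for acceptance should also check that the remaining list is empty.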

## 3. Conditional Expressions 

Extend the parser with conditional expressions by completing the following code. 

Test the parser with one token list that should be accepted and one token list that should not be accepted.

In [12]:
let parse_comp (ts : token list) : cond * token list = 
    raise (SyntaxError "TODO: IMPLEMENT")
    
let parse_cond (ts : token list) : condexp * token list = 
   raise (SyntaxError "TODO: IMPLEMENT")

val parse_comp : token list -> cond * token list = <fun>


val parse_cond : token list -> condexp * token list = <fun>


In [None]:
(* SOLUTION *)

let parse_comp (ts : token list) : cond * token list = match ts with 
    | LT :: ts' -> (Lt, ts')
    | LTE :: ts' -> (Lte, ts')
    | EQ :: ts' -> (Eq, ts')
    | NEQ :: ts' ->  (Neq, ts')
    | GT :: ts' -> (Gt, ts')
    | GTE :: ts' -> (Gte, ts')
    | _ -> raise (SyntaxError "Comparison token expected.")
    
let parse_cond (ts : token list) : condexp * token list = let 
        (e1, ts') = parse_exp ts in let
        (c, ts'') = parse_comp ts' in let
        (e2, ts''') = parse_exp ts'' in 
        (Cop (e1, c, e2), ts''')  

val parse_comp : token list -> cond * token list = <fun>


val parse_cond : token list -> condexp * token list = <fun>
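
One way to test the condition parser is sketched below. To stay short and self-contained, it assumes a reduced `token` type and a deliberately simplified stand-in for `parse_exp` that only handles identifiers and numbers; in the lab you would use your full expression parser instead.

```ocaml
exception SyntaxError of string

type exp = Id of string | Numb of int
type cond = Eq | Neq | Lte | Lt | Gte | Gt
type condexp = Cop of exp * cond * exp

(* Reduced token type for this sketch. *)
type token = ID of string | INT of int | EQ | NEQ | LTE | LT | GTE | GT | THEN

(* Stand-in for the lab's parse_exp: identifiers and numbers only (an assumption). *)
let parse_exp (ts : token list) : exp * token list =
  match ts with
  | ID x :: ts' -> (Id x, ts')
  | INT n :: ts' -> (Numb n, ts')
  | _ -> raise (SyntaxError "Expression expected.")

(* comp ::= = | != | <= | < | >= | > *)
let parse_comp (ts : token list) : cond * token list =
  match ts with
  | EQ :: ts' -> (Eq, ts')
  | NEQ :: ts' -> (Neq, ts')
  | LTE :: ts' -> (Lte, ts')
  | LT :: ts' -> (Lt, ts')
  | GTE :: ts' -> (Gte, ts')
  | GT :: ts' -> (Gt, ts')
  | _ -> raise (SyntaxError "Comparison token expected.")

(* condexp ::= exp comp exp *)
let parse_cond (ts : token list) : condexp * token list =
  let (e1, ts') = parse_exp ts in
  let (c, ts'') = parse_comp ts' in
  let (e2, ts''') = parse_exp ts'' in
  (Cop (e1, c, e2), ts''')

(* Accepted: x <= 10, with THEN left over for the caller. *)
let accepted = parse_cond [ID "x"; LTE; INT 10; THEN]

(* Rejected: the comparison operator between the two operands is missing. *)
let rejected =
  try ignore (parse_cond [ID "x"; INT 10]); false
  with SyntaxError _ -> true
```

`accepted` is `(Cop (Id "x", Lte, Numb 10), [THEN])`, and `rejected` is `true`.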

## 4. Abstract Syntax to Concrete Syntax 

Change the grammar such that the remaining parts are LL(1).

(* SOLUTION *)

```
program ::=  [declarations] commands 
declarations ::= declaration; [declarations] 
declaration ::= VAR identifier

commands ::= command[; commands]
command ::= identifier := exp | IF condexp THEN command [ELSE command] | WHILE condexp DO command | BEGIN program END | INPUT identifier | PRINT exp 
 
comp ::= = | != | <= | < | >= | >
condexp ::= exp comp exp 

exp ::= term [{+|-} exp]
term ::= factor [{*|/} term]
factor ::= [-] base
base ::= identifier | number | (exp)
```

## 5. Full Parser 

Below you see a partial definition of the full parser for ``SIMP``. 
The parts which you have seen in the lecture have been filled in. 

a) Ensure you understand the return types of all functions.

b) Why do ``parse_program``/``parse_while``/``parse_if``/``parse_command``/``parse_commands`` have to be declared mutually recursive? 

c) Explain why ``[VAR; SEMI]`` is not a valid ``declaration``-sentence given the implementation of the parser. 

d) Complete the definition of a parser. 

e) Test the parser with one token list that should be accepted and one token list that should not be accepted.
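
For question c), it can help to run `parse_declaration` on the input in question. The sketch below assumes a reduced `token` type with just the constructors a declaration needs. The names `ok` and `message` are illustrative.

```ocaml
exception SyntaxError of string

(* Reduced token type for this sketch. *)
type token = SEMI | VAR | ID of string

(* declaration ::= VAR identifier *)
let parse_declaration (ts : token list) : string * token list =
  match ts with
  | VAR :: ID x :: ts' -> (x, ts')
  | _ -> raise (SyntaxError "Declaration expected.")

(* VAR followed by an identifier matches the first case; SEMI is left over. *)
let ok = parse_declaration [VAR; ID "x"; SEMI]

(* VAR followed by SEMI does not match VAR :: ID x :: _, so the catch-all case raises. *)
let message =
  try ignore (parse_declaration [VAR; SEMI]); "no error"
  with SyntaxError m -> m
```

`ok` is `("x", [SEMI])`, while `message` is `"Declaration expected."`: the pattern requires an `ID` token directly after `VAR`.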

In [14]:
let parse_declaration (ts : token list) : string * token list = match ts with 
  | VAR :: ID x :: ts' -> (x, ts')
  | _ ->  raise (SyntaxError "Declaration expected.")

let rec parse_declarations (ts : token list) : string list * token list = let 
  (dcl, ts') = parse_declaration ts in let 
  ts'' = parse_token SEMI ts' in 
  match ts'' with 
  | VAR :: _ -> (let (dcls, ts''') = parse_declarations ts'' in 
                  (dcl :: dcls, ts''')
                  )
  | _ -> ([dcl], ts'')


let parse_assign x ts = 
    raise (SyntaxError "TODO: IMPLEMENT")

let parse_input ts =
    raise (SyntaxError "TODO: IMPLEMENT")

let parse_print ts = 
     raise (SyntaxError "TODO: IMPLEMENT")

let rec parse_program (ts : token list) : program * token list = match ts with 
    | VAR :: _ -> let 
                  (dcls, ts') = parse_declarations ts in let 
                  (cs, ts'') = parse_commands ts' in 
                  (Program (dcls, cs), ts'')
    | _ -> let 
          (cs, ts'') = parse_commands ts in 
          (Program ([], cs), ts'')

and parse_commands (ts : token list) : cmd list * token list = let 
    (c, ts') = parse_command ts in 
    match ts' with 
    | SEMI :: ts'' -> let 
                      (cs, ts''') = parse_commands ts''
                      in (c :: cs, ts''')
    | _ -> ([c], ts')

and parse_command (ts : token list) : cmd * token list = match ts with 
  | ID x :: ts' -> parse_assign x ts'
  | INPUT :: ts' -> parse_input ts'
  | PRINT :: ts' -> parse_print ts'
  | IF :: ts' -> parse_if ts'
  | WHILE :: ts' -> parse_while ts'
  | BEGIN :: ts' -> parse_block ts'
  | _ -> raise (SyntaxError "Command expected.")
    
and parse_if (ts : token list) : cmd * token list = let 
    (b, ts') = parse_cond ts in let 
    ts'' = parse_token THEN ts' in let
    (c1, ts''') = parse_command ts'' in 
        match ts''' with 
        | ELSE :: ts'''' -> let (c2, ts''''') = parse_command ts'''' in 
                           (Ite (b, c1, c2), ts''''')
        | _ -> (If (b, c1), ts''')

and parse_while (ts : token list) : cmd * token list = 
   raise (SyntaxError "TODO: IMPLEMENT")

and parse_block (ts : token list) : cmd * token list =
  raise (SyntaxError "TODO: IMPLEMENT")

val parse_declaration : token list -> string * token list = <fun>


val parse_declarations : token list -> string list * token list = <fun>


val parse_assign : 'a -> 'b -> 'c = <fun>


val parse_input : 'a -> 'b = <fun>


val parse_print : 'a -> 'b = <fun>


val parse_program : token list -> program * token list = <fun>
val parse_commands : token list -> cmd list * token list = <fun>
val parse_command : token list -> cmd * token list = <fun>
val parse_if : token list -> cmd * token list = <fun>
val parse_while : token list -> cmd * token list = <fun>
val parse_block : token list -> cmd * token list = <fun>


In [15]:
(* SOLUTION *)

let parse_declaration (ts : token list) : string * token list = match ts with 
  | VAR :: ID x :: ts' -> (x, ts')
  | _ ->  raise (SyntaxError "Declaration expected.")

let rec parse_declarations (ts : token list) : string list * token list = let 
  (dcl, ts') = parse_declaration ts in let 
  ts'' = parse_token SEMI ts' in 
  match ts'' with 
  | VAR :: _ -> (let (dcls, ts''') = parse_declarations ts'' in 
                  (dcl :: dcls, ts''')
                  )
  | _ -> ([dcl], ts'')

   
let parse_assign x ts = let 
    ts' = parse_token ASGN ts in let 
    (e, ts'') = parse_exp ts' in 
    ((Asgn (x, e)), ts'')

let parse_input ts = match ts with 
    | ID x :: ts' -> (Input x, ts')
    | _ -> raise (SyntaxError "Identifier expected.")

let parse_print ts = let 
    (e, ts') = parse_exp ts in 
    (Print e, ts')


let rec parse_program (ts : token list) : program * token list = match ts with 
    | VAR :: _ -> let 
                  (dcls, ts') = parse_declarations ts in let 
                  (cs, ts'') = parse_commands ts' in 
                  (Program (dcls, cs), ts'')
    | _ -> let 
          (cs, ts'') = parse_commands ts in 
          (Program ([], cs), ts'')

and parse_commands (ts : token list) : cmd list * token list = let 
    (c, ts') = parse_command ts in 
    match ts' with 
    | SEMI :: ts'' -> let 
                      (cs, ts''') = parse_commands ts''
                      in (c :: cs, ts''')
    | _ -> ([c], ts')

and parse_command (ts : token list) : cmd * token list = match ts with 
  | ID x :: ts' -> parse_assign x ts'
  | INPUT :: ts' -> parse_input ts'
  | PRINT :: ts' -> parse_print ts'
  | IF :: ts' -> parse_if ts'
  | WHILE :: ts' -> parse_while ts'
  | BEGIN :: ts' -> parse_block ts'
  | _ -> raise (SyntaxError "Command expected.")
    
and parse_if (ts : token list) : cmd * token list = let 
    (b, ts') = parse_cond ts in let 
    ts'' = parse_token THEN ts' in let
    (c1, ts''') = parse_command ts'' in 
        match ts''' with 
        | ELSE :: ts'''' -> let (c2, ts''''') = parse_command ts'''' in 
                           (Ite (b, c1, c2), ts''''')
        | _ -> (If (b, c1), ts''')

and parse_while (ts : token list) : cmd * token list = let 
    (b, ts') = parse_cond ts in let 
    ts'' = parse_token DO ts' in let 
    (c, ts''') = parse_command ts'' in 
    (While (b, c), ts''')

and parse_block (ts : token list) : cmd * token list = let 
    (p, ts') = parse_program ts in let 
    ts'' = parse_token END ts' in 
    (Begin p, ts'')

val parse_declaration : token list -> string * token list = <fun>


val parse_declarations : token list -> string list * token list = <fun>


val parse_assign : string -> token list -> cmd * token list = <fun>


val parse_input : token list -> cmd * token list = <fun>


val parse_program : token list -> program * token list = <fun>
val parse_commands : token list -> cmd list * token list = <fun>
val parse_command : token list -> cmd * token list = <fun>
val parse_if : token list -> cmd * token list = <fun>
val parse_while : token list -> cmd * token list = <fun>
val parse_block : token list -> cmd * token list = <fun>


In [20]:
parse_declarations [VAR; ID "A"; SEMI; VAR; ID "B"; SEMI;
   VAR; ID "C"; SEMI; VAR; ID "D"; SEMI;
   VAR; ID "Z"; SEMI; ID "A"; ASGN; INT 128;
   SEMI; ID "B"; ASGN; INT 64; SEMI; ID "C";
   ASGN; INT 32; SEMI; ID "D"; ASGN; INT 16;
   SEMI; ID "Z"; ASGN; LBRA; ID "A"; PLUS;
   ID "B"; RBRA; PLUS; LBRA; ID "C"; PLUS;
   ID "D"; RBRA]

- : string list * token list =
(["A"; "B"; "C"; "D"; "Z"],
 [ID "A"; ASGN; INT 128; SEMI; ID "B"; ASGN; INT 64; SEMI; ID "C"; ASGN;
  INT 32; SEMI; ID "D"; ASGN; INT 16; SEMI; ID "Z"; ASGN; LBRA; ID "A"; PLUS;
  ID "B"; RBRA; PLUS; LBRA; ID "C"; PLUS; ID "D"; RBRA])


## 6. Challenge 

Change the parser of expressions so that all binary operators are **left-associative**. 
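
One possible approach, sketched here under assumptions rather than given as the reference solution, is to read the grammar as `exp ::= term {(+|-) term}` and `term ::= factor {(*|/) factor}` and to fold each further operand into an accumulator on the left. The helpers `exp_rest` and `term_rest` are hypothetical names, and the `token` type is reduced to what expressions need.

```ocaml
exception SyntaxError of string

type op = Plus | Minus | Mult | Div
type exp = Id of string | Numb of int | Op of exp * op * exp | Neg of exp
type token = ID of string | INT of int | PLUS | MINUS | STAR | SLASH | LBRA | RBRA

let parse_token (x : token) (xs : token list) : token list =
  match xs with
  | y :: ys when x = y -> ys
  | _ -> raise (SyntaxError "Token expected.")

let rec parse_exp (xs : token list) : exp * token list =
  let (e1, xs') = parse_term xs in
  exp_rest e1 xs'

(* Hypothetical helper: acc is the tree built so far; each new term nests acc on the left. *)
and exp_rest (acc : exp) (xs : token list) : exp * token list =
  match xs with
  | PLUS :: xs' -> let (e, xs'') = parse_term xs' in exp_rest (Op (acc, Plus, e)) xs''
  | MINUS :: xs' -> let (e, xs'') = parse_term xs' in exp_rest (Op (acc, Minus, e)) xs''
  | _ -> (acc, xs)

and parse_term (xs : token list) : exp * token list =
  let (e1, xs') = parse_factor xs in
  term_rest e1 xs'

(* Hypothetical helper: the same accumulator idea for * and /. *)
and term_rest (acc : exp) (xs : token list) : exp * token list =
  match xs with
  | STAR :: xs' -> let (e, xs'') = parse_factor xs' in term_rest (Op (acc, Mult, e)) xs''
  | SLASH :: xs' -> let (e, xs'') = parse_factor xs' in term_rest (Op (acc, Div, e)) xs''
  | _ -> (acc, xs)

and parse_factor (xs : token list) : exp * token list =
  match xs with
  | MINUS :: xs' -> let (e, xs'') = parse_base xs' in (Neg e, xs'')
  | _ -> parse_base xs

and parse_base (xs : token list) : exp * token list =
  match xs with
  | ID x :: xs' -> (Id x, xs')
  | INT n :: xs' -> (Numb n, xs')
  | LBRA :: xs' -> let (e, xs'') = parse_exp xs' in (e, parse_token RBRA xs'')
  | _ -> raise (SyntaxError "Expected ID, INT or LBRA.")

(* 1 - 2 - 3 should now group as (1 - 2) - 3, and 8 / 4 / 2 as (8 / 4) / 2. *)
let sub = parse_exp [INT 1; MINUS; INT 2; MINUS; INT 3]
let div = parse_exp [INT 8; SLASH; INT 4; SLASH; INT 2]
```

With this sketch, `sub` is `(Op (Op (Numb 1, Minus, Numb 2), Minus, Numb 3), [])`. The grammar stays LL(1) because each helper still decides on the next token alone; only the way the tree is assembled changes.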