# Lab 2

- It is recommended to **work through this file with a partner**.
- Be sure to **ask** if anything is unclear: first your partner, then a lab helper.
- First go through the accompanying code from the lectures.

In this lab, you will write a full lexer for ``SIMP``. 

Writing a lexer consists of the following parts: 
1. Identifying all necessary tokens. 
2. Setting up the tokenisation. 
3. Setting up the needed regular expressions.

During this lab, you will work with both this file and the input file for the lexer generator, ``simp.mll``.
Open ``simp.mll`` now.

In [12]:
from jupyterquiz import display_quiz

question_path="./"

## 1. Tokens 

Recall the description of ``SIMP``. 

```
program ::=  [declarations;] commands 
declarations ::= declaration | declaration; declarations 
declaration ::= VAR identifier

commands ::= command | command; commands
command ::= identifier := exp | IF condexp THEN command | IF condexp THEN command ELSE command | WHILE condexp DO command | BEGIN program END | INPUT identifier | PRINT exp 

comp ::= = | != | <= | < | >= | >
condexp ::= exp comp exp 
exp  ::= identifier | number | exp + exp | exp - exp | exp * exp | exp / exp | - exp 
```

1.  Write down a suitable data type of tokens: 

In [None]:
type token = (* TODO *)

error: compile_error
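As a reminder of the general shape of such a type, here is a minimal sketch for a tiny arithmetic language that is *not* ``SIMP``. All names (``mini_token``, ``NUM``, ``NAME``, and so on) are hypothetical and chosen only for illustration. The idea is that tokens with a payload, such as numbers and names, become constructors with an argument, while fixed symbols become constant constructors.

```ocaml
(* Hypothetical token type for a tiny arithmetic language (not SIMP). *)
type mini_token =
  | NUM of int        (* an integer literal carries its value *)
  | NAME of string    (* an identifier carries its name *)
  | ADD               (* the plus symbol *)
  | LPAR              (* an opening parenthesis *)
  | RPAR              (* a closing parenthesis *)
  | STOP              (* end of input *)

(* Render a token as text, which is handy when debugging a lexer. *)
let string_of_mini_token = function
  | NUM n -> "NUM " ^ string_of_int n
  | NAME s -> "NAME " ^ s
  | ADD -> "ADD"
  | LPAR -> "LPAR"
  | RPAR -> "RPAR"
  | STOP -> "STOP"
```

For ``SIMP`` you would follow the same pattern, with one constructor per keyword, operator, and separator in the grammar above.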

In [14]:
display_quiz(question_path+"questions21.json")

2. Insert the definition of tokens into the user definitions of ``simp.mll``. 

## 2. Define Rules For Tokens 


Let us start setting up the ``token`` function. 

In [15]:
display_quiz(question_path+"questions22.json")

1. Set up the ``token`` function such that 
    - The end-of-file symbol is recognized. 
    - An exception is thrown in case there is an unexpected character. You will want to define the exception in the user definitions. 

2. Include one rule for each token except identifiers and integers. 
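The error-handling idea in step 1 can be sketched in plain OCaml, outside of ``ocamllex``. The names ``Unexpected_char``, ``sym``, and ``sym_of_char`` are hypothetical; your own exception and token names in ``simp.mll`` may differ. The final catch-all case plays the same role as a last ``_`` rule in an ``ocamllex`` rule set: anything not matched earlier raises the exception.

```ocaml
(* Hypothetical names throughout; the real definitions belong in simp.mll. *)

(* The exception carries the offending character for error reporting. *)
exception Unexpected_char of char

(* Two illustrative symbols only, not the full SIMP token set. *)
type sym = PLUS_SYM | SEMI_SYM

(* Map a single character to a symbol, or fail on anything else. *)
let sym_of_char = function
  | '+' -> PLUS_SYM
  | ';' -> SEMI_SYM
  | c -> raise (Unexpected_char c)
```

Carrying the character in the exception makes it easier to see which input caused the failure when you test your lexer.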

Let's test whether your code runs.

**IMPORTANT** Jupyter Notebook automatically saves some output information. 
Each time you change the ``simp.mll`` file and want to re-run the following commands, 
first choose Kernel -> Restart & Clear Output in the menu to ensure your changed file is used.

In [16]:
#require "jupyter.notebook" ;;

open Jupyter_notebook;;

In [17]:
Process.sh "ocamllex simp.mll";;
Process.sh "ocamlc -c simp.ml";;

62 states, 3074 transitions, table size 12668 bytes


- : Jupyter_notebook.Process.t =
{Jupyter_notebook.Process.exit_status = Unix.WEXITED 0; stdout = None;
 stderr = None}


- : Jupyter_notebook.Process.t =
{Jupyter_notebook.Process.exit_status = Unix.WEXITED 0; stdout = None;
 stderr = None}


In [18]:
#load "simp.cmo"

In [19]:
Simp.token;;

let rec stream_to_list buffer = 
    match Simp.token buffer with 
    | EOF -> []
    | x -> x :: stream_to_list buffer

let res = stream_to_list (Lexing.from_string "WHILE < = BEGIN READ END PRINT <=")

- : Lexing.lexbuf -> Simp.token = <fun>


val stream_to_list : Lexing.lexbuf -> Simp.token list = <fun>


val res : Simp.token list =
  [Simp.WHILE; Simp.LT; Simp.EQ; Simp.BEGIN; Simp.ID "READ"; Simp.END;
   Simp.PRINT; Simp.LTE]
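In the output above, ``WHILE`` becomes a keyword token while ``READ``, which is not a ``SIMP`` keyword, becomes an identifier. Since every keyword also has the shape of an identifier, the lexer must decide between the two. ``ocamllex`` prefers the longest match and, among matches of equal length, the rule that appears first, so keyword rules placed before the identifier rule win. A commonly used alternative is a single identifier rule followed by a table lookup. The sketch below shows that lookup in plain OCaml; the type ``tok`` and the function ``keyword_or_id`` are hypothetical and cover only a few keywords.

```ocaml
(* Hypothetical, reduced token type for illustration only. *)
type tok = WHILE | BEGIN | END | PRINT | ID of string

(* Association list from keyword spelling to keyword token. *)
let keywords = [ ("WHILE", WHILE); ("BEGIN", BEGIN); ("END", END); ("PRINT", PRINT) ]

(* Classify a word that already has the shape of an identifier. *)
let keyword_or_id s =
  match List.assoc_opt s keywords with
  | Some kw -> kw
  | None -> ID s
```

Either design should produce the same token stream; which one you use in ``simp.mll`` is a matter of taste.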


## 3. Set Up Regular Expressions 

We come to the last part, where you will set up the right regular expressions for white space, integers, and identifiers. 
Remember that we want: 
- white space to be either a blank ``" "``, a horizontal tab ``\t``, or the newline symbol ``\n``.
- integers to be positive or negative whole numbers.
- identifiers to be a letter followed by an arbitrary number of letters or digits.
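To make these three descriptions concrete, here is a minimal sketch that expresses them as plain OCaml predicates on strings and characters. This is not ``ocamllex`` syntax, and all function names are hypothetical. It also assumes that a negative integer is written as a minus sign directly followed by digits and that letters are the ASCII letters; the lab text does not spell out either detail.

```ocaml
let is_letter c = (c >= 'a' && c <= 'z') || (c >= 'A' && c <= 'Z')
let is_digit c = c >= '0' && c <= '9'

(* Blank, horizontal tab, or newline. *)
let is_white c = c = ' ' || c = '\t' || c = '\n'

(* Check that predicate p holds for every character of s from index start. *)
let for_all_from p s start =
  let rec go i = i >= String.length s || (p s.[i] && go (i + 1)) in
  go start

(* A letter followed by any number of letters or digits. *)
let is_identifier s =
  String.length s > 0
  && is_letter s.[0]
  && for_all_from (fun c -> is_letter c || is_digit c) s 1

(* Assumption: an optional leading minus sign, then at least one digit. *)
let is_integer s =
  let start = if String.length s > 0 && s.[0] = '-' then 1 else 0 in
  String.length s > start && for_all_from is_digit s start
```

Each predicate corresponds to one regular expression you will name in ``simp.mll``. Note that if the integer pattern includes the minus sign, longest match means an input such as ``s -1`` would yield an identifier followed by a negative integer rather than a subtraction, so it is worth testing that case.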

1. Define patterns for white space, numbers, and identifiers. 
2. Add rules for white space, numbers, and identifiers.

In [9]:
display_quiz(question_path+"questions23.json")

3. Once you have updated the file ``simp.mll``, reset the kernel: choose Kernel -> Restart & Clear Output in the menu to ensure your changed file is used.

Let's test a final program: 

In [20]:
let p = "VAR s;
VAR n;
s := 0; 
INPUT n;
WHILE n := 0 DO 
    s := s + n; 
    INPUT n
END;
PRINT s
"

let res = stream_to_list (Lexing.from_string p)

val p : string =
  "VAR s;\nVAR n;\ns := 0; \nINPUT n;\nWHILE n := 0 DO \n    s := s + n; \n    INPUT n\nEND;\nPRINT s\n"


val res : Simp.token list =
  [Simp.VAR; Simp.ID "s"; Simp.SEMI; Simp.VAR; Simp.ID "n"; Simp.SEMI;
   Simp.ID "s"; Simp.ASGN; Simp.INT 0; Simp.SEMI; Simp.INPUT; Simp.ID "n";
   Simp.SEMI; Simp.WHILE; Simp.ID "n"; Simp.ASGN; Simp.INT 0; Simp.DO;
   Simp.ID "s"; Simp.ASGN; Simp.ID "s"; Simp.PLUS; Simp.ID "n"; Simp.SEMI;
   Simp.INPUT; Simp.ID "n"; Simp.END; Simp.SEMI; Simp.PRINT; Simp.ID "s"]


## Challenge 

Extend the lexer file for ``Simp`` so that comments are recognized. 

1. Implement one-line comments, starting with ``#`` and extending to the end of the line. 
 E.g., 
  ```
  # This is a comment 
   2 + 3
  ```
would yield ``[INT 2; PLUS; INT 3]``.

2. Implement comments spanning several lines. Comments start with ``{``, end with ``}``, and all content in between is ignored. 
    I.e., there are no nested comments. 
 
 E.g., 
  ```
  { This is a comment 
      of several lines. }
   2 + 3
   { Comment at the end of the file. }
  ```
would yield ``[INT 2; PLUS; INT 3]``.    
    

3. Can you define nested comments?
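As a hint for the nested case, the core idea is to keep a depth counter: every ``{`` increases it, every ``}`` decreases it, and the comment ends when the counter returns to zero. The sketch below shows this idea in plain OCaml on a string; the function name ``skip_comment`` is hypothetical and this is not ``ocamllex`` code. In ``ocamllex``, one way to carry such a counter is a separate entry point that takes the depth as an argument.

```ocaml
(* Return the index just past the comment that opens at position i.
   Assumes s.[i] is an opening brace. Raises Failure if the comment
   is never closed. *)
let skip_comment s i =
  let n = String.length s in
  let rec go j depth =
    if j >= n then failwith "unterminated comment"
    else
      match s.[j] with
      | '{' -> go (j + 1) (depth + 1)
      | '}' -> if depth = 1 then j + 1 else go (j + 1) (depth - 1)
      | _ -> go (j + 1) depth
  in
  go i 0
```

For example, ``skip_comment "{a{b}c} 2" 0`` returns ``7``, the index of the blank after the outer closing brace. Deciding what to do with an unterminated comment is a design choice; raising an exception is just one option.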

In [22]:
let p = "# Comment 1
VAR s;
VAR n;
{ Comment over several 
lines }
s := 0; 
INPUT n;
WHILE n := 0 DO 
    s := s + n; 
    INPUT n
END;
PRINT s
{ Second comment. }
"

let res = stream_to_list (Lexing.from_string p)

val p : string =
  "# Comment 1\nVAR s;\nVAR n;\n{ Comment over several \nlines }\ns := 0; \nINPUT n;\nWHILE n := 0 DO \n    s := s + n; \n    INPUT n\nEND;\nPRINT s\n{ Second comment. }\n"


val res : Simp.token list =
  [Simp.VAR; Simp.ID "s"; Simp.SEMI; Simp.VAR; Simp.ID "n"; Simp.SEMI;
   Simp.ID "s"; Simp.ASGN; Simp.INT 0; Simp.SEMI; Simp.INPUT; Simp.ID "n";
   Simp.SEMI; Simp.WHILE; Simp.ID "n"; Simp.ASGN; Simp.INT 0; Simp.DO;
   Simp.ID "s"; Simp.ASGN; Simp.ID "s"; Simp.PLUS; Simp.ID "n"; Simp.SEMI;
   Simp.INPUT; Simp.ID "n"; Simp.END; Simp.SEMI; Simp.PRINT; Simp.ID "s"]
