# CW 2.4: Lexer for FUNC

**CW 2.4** consists of writing a lexer for FUNC.

**Submissions** For this week only, submit a .zip file containing both this notebook and the file ``CW/func.mll``.

## The Source Language: FUNC

Recall the syntax of FUNC:

```
<program> ::= <methods> 
<methods> ::= <method>;[<methods>] 
<method> ::= method <id>([<args>]) [vars <args>] 
	begin <statements> [return <id>;] endmethod
<args> ::= <id>[,<args>] 
<statements> ::= <statement>;[<statements>] 
<statement> ::= <assign> | <if> | <while> | <rw>
<rw> ::= read <id> | write <exp>
<assign> ::= <id> := <exp>
<if> ::= if  <cond> then <statements> [else <statements>] endif 
<while> ::= while <cond> begin <statements> endwhile
<cond> ::= <bop> ( [<exps>] ) 
<bop> ::= less | lessEq | eq | nEq 
<exps> ::= <exp> [,<exps>] 
<exp> ::= <id>[( [<exps>] )] | <int> 
<int> is a natural number (no leading zeroes) 
<id> is any string starting with a letter followed by letters or digits (that is not already a keyword)
```
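
The two lexical rules at the end of the grammar can be written down as plain OCaml predicates, which is sometimes a handy way to check one's reading of them before writing regular expressions. The sketch below is only an illustration: all helper names are hypothetical, it assumes that `0` on its own is a valid `<int>` (the example programs use it), and it deliberately does not check that an identifier is not a keyword.

```ocaml
(* Hypothetical helpers for illustration; none of these names come from the coursework. *)
let is_letter c = (c >= 'a' && c <= 'z') || (c >= 'A' && c <= 'Z')
let is_digit c = c >= '0' && c <= '9'

(* True when predicate p holds for every character of s from index i onwards. *)
let rec all_from p s i =
  i >= String.length s || (p s.[i] && all_from p s (i + 1))

(* <int>: digits only, and no leading zero unless the literal is exactly "0". *)
let is_func_int s =
  let n = String.length s in
  n > 0 && all_from is_digit s 0 && (n = 1 || s.[0] <> '0')

(* <id> shape: a letter, then letters or digits. Keywords are NOT excluded here. *)
let has_id_shape s =
  String.length s > 0
  && is_letter s.[0]
  && all_from (fun c -> is_letter c || is_digit c) s 1
```

Under the same assumptions, the corresponding ocamllex patterns would look roughly like `'0' | ['1'-'9'] ['0'-'9']*` for integers and `['a'-'z' 'A'-'Z'] ['a'-'z' 'A'-'Z' '0'-'9']*` for identifiers.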

- Each program must have a function called ``main`` with no arguments and no return value. 
- All other functions may have an optional return value. A function without a return value implicitly returns `0`.
- Support the following built-in functions (assume they are already defined). Each accepts two integers and returns an integer:
     - ``plus``, which adds its arguments;
     - ``times``, which multiplies its arguments;
     - ``minus``, which subtracts its arguments;
     - ``divide``, which divides its arguments.
- All the boolean operators (``less``, ``lessEq``, ``eq``, ``nEq``) are also binary, i.e. take two arguments.
- The ``read`` command assumes that the given variable is an ``int`` variable.
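
For orientation, the built-ins and the boolean operators listed above can be modelled as ordinary OCaml functions. This is a rough sketch, not part of the lexer: the names `builtins` and `conditions` are hypothetical, and it assumes that arguments are taken left to right (so `minus(a,b)` means `a - b`) and that `divide` is integer division, neither of which the text above states explicitly.

```ocaml
(* Assumption: arguments are applied left to right, so minus(a, b) is a - b
   and divide(a, b) is OCaml integer division a / b. *)
let builtins : (string * (int -> int -> int)) list =
  [ ("plus", ( + )); ("times", ( * )); ("minus", ( - )); ("divide", ( / )) ]

(* The four boolean operators, each taking two integers. *)
let conditions : (string * (int -> int -> bool)) list =
  [ ("less", ( < )); ("lessEq", ( <= )); ("eq", ( = )); ("nEq", ( <> )) ]
```

Note that the built-in function names do not appear as terminals in the grammar, so a lexer would normally return them as ordinary identifiers, whereas `less`, `lessEq`, `eq` and `nEq` are terminals of `<bop>`.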

##### Example 

The following is a valid FUNC program (more examples appear later in the document):

```
method pow(x, y) vars i, res
begin
    res := x; 
    i := 1; 
    while less(i,y)
    begin
        res := times(res,x);
        i := plus(i,1); 
    endwhile;
    write res;
    return res;
endmethod;

method main() vars a, b, x
begin
    a := 5; b := 2; 
    x := pow(b,a);
    if  eq(x,32) then write 1; else write 0; endif; 
endmethod;
```

## Lexing

**Task** Write a lexer in ``CW/func.mll``, together with a suitable representation of tokens.
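
One possible representation of tokens is a variant type with one constructor per keyword and punctuation symbol, plus constructors carrying a payload for identifiers and integers. The sketch below is only a starting point: the constructor names are illustrative choices, not names required by the coursework.

```ocaml
(* A hypothetical token type for FUNC; constructor names are illustrative. *)
type token =
  | METHOD | ENDMETHOD | VARS | BEGIN | RETURN
  | READ | WRITE
  | IF | THEN | ELSE | ENDIF
  | WHILE | ENDWHILE
  | LESS | LESSEQ | EQ | NEQ
  | LPAREN | RPAREN | COMMA | SEMICOLON | ASSIGN
  | ID of string   (* identifiers such as res or plus *)
  | INT of int     (* natural-number literals *)
  | EOF            (* end of input *)

(* The token sequence one would expect for:  res := plus(res,1);  *)
let example =
  [ ID "res"; ASSIGN; ID "plus"; LPAREN; ID "res"; COMMA; INT 1; RPAREN;
    SEMICOLON ]
```

In an ocamllex file, a type definition like this would typically live in the header section between `{` and `}` at the top of the `.mll` file, so that the rule actions can refer to its constructors.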

**IMPORTANT** Jupyter Notebook automatically saves some output information. 
Each time you change the ``func.mll`` file and want to re-run the following commands, 
first choose Kernel -> Restart & Clear Output from the menu to ensure your changed file is used.

In [22]:
#require "jupyter.notebook" ;;
open Jupyter_notebook ;;

In [24]:
(* Run the lexer generator *)
Process.sh "ocamllex func.mll";;

(* Compile and load the file produced by the lexer *)
Process.sh "ocamlc -c func.ml";;
#load "func.cmo";;

(* Convert the buffer into a list for further processing:
   call the lexer repeatedly until it returns EOF. *)
let rec stream_to_list buffer = 
    match Func.token buffer with 
    | Func.EOF -> []   (* qualified, since Func is not opened yet *)
    | x -> x :: stream_to_list buffer

70 states, 4692 transitions, table size 19188 bytes


- : Jupyter_notebook.Process.t =
{Jupyter_notebook.Process.exit_status = Unix.WEXITED 0; stdout = None;
 stderr = None}


- : Jupyter_notebook.Process.t =
{Jupyter_notebook.Process.exit_status = Unix.WEXITED 0; stdout = None;
 stderr = None}


The files func.cmo and func.cmo disagree over interface Func


val stream_to_list : Lexing.lexbuf -> Func.token list = <fun>


In [25]:
(*
You can test your lexer here. 
You will want to test your lexer with more code snippets!
*)

let p_basic = 
"
method main() vars inp, res
begin
read inp;
res:=0;
while less(0,inp)
begin
res := plus(res,inp);
inp := minus(inp,1);
endwhile;
write res;
endmethod;
";;

open Func

let res = stream_to_list (Lexing.from_string p_basic) 

val p_basic : string =
  "\nmethod main() vars inp, res\nbegin\nread inp;\nres:=0;\nwhile less(0,inp)\nbegin\nres := plus(res,inp);\ninp := minus(inp,1);\nendwhile;\nwrite res;\nendmethod;\n"


val res : Func.token list =
  [METHOD; ID "main"; LEFTPARANTHESIS; RIGHTPARANTHESIS; ID "vars"; ID "inp";
   COMMA; ID "res"; ID "begin"; ID "read"; ID "inp"; SEMICOLON; ID "res";
   ASSIGN; INT 0; SEMICOLON; ID "while"; ID "less"; LEFTPARANTHESIS; 
   INT 0; COMMA; ID "inp"; RIGHTPARANTHESIS; ID "begin"; ID "res"; ASSIGN;
   ID "plus"; LEFTPARANTHESIS; ID "res"; COMMA; ID "inp"; RIGHTPARANTHESIS;
   SEMICOLON; ID "inp"; ASSIGN; ID "minus"; LEFTPARANTHESIS; ID "inp"; COMMA;
   INT 1; RIGHTPARANTHESIS; SEMICOLON; ID "endwhile"; SEMICOLON; ID "write";
   ID "res"; SEMICOLON; ENDMETHOD; SEMICOLON]
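
In the output above, `method` and `endmethod` come out as dedicated tokens, while other terminals of the grammar such as `vars`, `begin`, `read`, `while`, `endwhile` and `write` come out as `ID`. If the intent is for every keyword to have its own token, one common approach is to match all words with the identifier rule and then look the lexeme up in a keyword list. The sketch below shows only that lookup step. It uses a simplified, hypothetical token type (`KEYWORD of string` instead of one constructor per keyword), and treating the `<bop>` names as keywords is an assumption based on the grammar.

```ocaml
(* Simplified, hypothetical token type used only for this illustration. *)
type token = KEYWORD of string | ID of string

(* Terminals of the FUNC grammar that look like identifiers.
   Assumption: the <bop> names count as keywords too. *)
let keywords =
  [ "method"; "endmethod"; "vars"; "begin"; "return";
    "read"; "write"; "if"; "then"; "else"; "endif";
    "while"; "endwhile"; "less"; "lessEq"; "eq"; "nEq" ]

(* Decide whether a matched word is a keyword or a plain identifier. *)
let classify word =
  if List.mem word keywords then KEYWORD word else ID word
```

An alternative is to give each keyword its own rule in the `.mll` file. ocamllex prefers the longest match and, when two rules match the same length, the rule that appears first, so keyword rules placed before the identifier rule take precedence while a longer word such as `whilex` is still lexed as an identifier.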
