## The P0 Compiler – for Lab 11 only
#### COMP SCI 4TB3/6TB3, McMaster University
#### Original Author: Emil Sekerinski, revised March 2021

This collection of _Jupyter notebooks_ develops a compiler for P0, a programming language inspired by Pascal, as Pascal was designed for ease of compilation. The compiler currently generates WebAssembly and MIPS code, but is modularized to facilitate other targets. WebAssembly is representative of stack-based virtual machines, while the MIPS architecture is representative of Reduced Instruction Set Computing (RISC) processors.

### The P0 Language
The main syntactic elements of P0 are *statements*, *declarations*, *types*, and *expressions*.

#### Statements
* _Assignment statement_ (`d` designator, `e` expressions):
      d := e
* _Procedure call_ (`p` procedure identifier, `e₁`, `e₂`, … expressions, `d` designator):
      p(e₁, e₂, …)
      d ← p(e₁, e₂, …)
* _Sequential composition_ (`S₁`, `S₂`, … statements):
      S₁; S₂; …
* _If-statements_ (`B` Boolean expression, `S`, `T` statements):
      if B then S
      if B then S else T
* _While-statements_ (`B` Boolean expression, `S` statement):
      while B do S

#### Declarations
* _Constant declaration_ (`c` constant identifier, `e` constant expression):
      const c =  e
* _Type declaration_ (`t` type identifier, `T` type):
      type t = T
* _Variable declaration_ (`x₁`, `x₂`, … variable identifiers, `T` type):
      var x₁, x₂, …: T
* _Procedure declaration_ (`p` procedure identifier, `v₁`, `v₂`, … variable identifiers, `T₁`, `T₂`, …, `U`  types, `D₁`, `D₂`, … declarations, `S` statement):
      procedure p (v₁: T₁, v₂: T₂, …) → (r: U)
          D₁
          D₂
          …
              S

#### Types
* _Elementary Types:_
      integer, boolean
* _Arrays_ (`m`, `n` integer expressions, `T` type):
      [m .. n] → T

#### Expressions
* _Constants:_
      number, identifier
* _Designator_ (`x` identifier, `i` expression):
      x
      x[i]
* _Operators,_ in order of their binding power (e, e₁, e₂ are expressions):
      (e), ¬ e
      e₁ × e₂, e₁ div e₂, e₁ mod e₂, e₁ and e₂
      + e, – e, e₁ + e₂, e₁ – e₂, e₁ or e₂
      e₁ = e₂, e₁ ≠ e₂, e₁ < e₂, e₁ ≤ e₂, e₁ > e₂, e₁ ≥ e₂

Types `integer` and `boolean` and procedures `read`, `write`, `writeln` are not symbols of the grammar; they are _standard identifiers_ (*predefined identifiers*).

### P0 Examples


```Pascal
procedure quot(x, y: integer) → (q: integer)
    var r: integer
      q := 0; r := x
      while r ≥ y do { q × y + r = x ∧ r ≥ y }
        r := r - y; q := q + 1

program arithmetic
    var x, y, q, r: integer
      x ← read(); y ← read()
      q ← quot(x, y)
      write(q); writeln()
```

```Pascal
procedure fact(n: integer) → (f: integer)
    if n = 0 then f := 1
    else
        f ← fact(n - 1); f := f × n

program factorial
    var y, z: integer
        y ← read(); z ← fact(y); write(z)
```

```Pascal
const N = 10
var a: [0 .. N - 1] → integer

procedure has(x: integer) → (r: boolean)
    var i: integer
        i := 0
        while i < N and a[i] ≠ x do i := i + 1
        r := i < N

```

### The P0 Grammar

    designator ::= ident { "[" expression "]" }
    factor ::= designator | integer | "(" expression ")" | "¬" factor
    term ::= factor {("×" | "div" | "mod" | "and") factor}
    simpleExpression ::= ["+" | "-"] term {("+" | "-" | "or") term}
    expression ::= simpleExpression
        {("=" | "≠" | "<" | "≤" | ">" | "≥") simpleExpression}
    statementList ::= statement {";" statement}
    statementBlock ::= statementList {statementList}
    statementSuite ::= statementList | INDENT statementBlock DEDENT
    statement ::=
        designator ":=" expression |
        designator "←" ident "(" [expression {"," expression}] ")" |
        "if" expression "then" statementSuite ["else" statementSuite] |
        "while" expression "do" statementSuite
    type ::=
        ident |
        "[" expression ".." expression "]" "→" type
    typedIds ::= ident ":" type {"," ident ":" type}
    declarations ::= 
        {"const" ident "=" expression}
        {"type" ident "=" type}
        {"var" typedIds}
        {"procedure" ident "(" [typedIds] ")" [ "→" "(" typedIds ")" ] body}
    body ::= INDENT declarations (statementBlock | INDENT statementBlock DEDENT) DEDENT
    program ::= declarations "program" ident body

### The P0 Grammar - OO

    designator ::= ident { "[" expression "]" }
    factor ::= designator | integer | "(" expression ")" | "¬" factor
    term ::= factor {("×" | "div" | "mod" | "and") factor}
    simpleExpression ::= ["+" | "-"] term {("+" | "-" | "or") term}
    expression ::= simpleExpression
        {("=" | "≠" | "<" | "≤" | ">" | "≥") simpleExpression}
    statementList ::= statement {";" statement}
    statementBlock ::= statementList {statementList}
    statementSuite ::= statementList | INDENT statementBlock DEDENT
    statement ::=
        designator ":=" expression |
        designator "←" ident "(" [expression {"," expression}] ")" |
        "if" expression "then" statementSuite ["else" statementSuite] |
        "while" expression "do" statementSuite
    type ::=
        ident |
        "[" expression ".." expression "]" "→" type
    typedIds ::= ident ":" type {"," ident ":" type}
    classAttributes ::= {"var" typedIds}
    classMethods ::= {"procedure" ident "(" "self" ["," typedIds] ")" [ "→" "(" typedIds ")" ] body}
    classBody ::= INDENT classAttributes classMethods DEDENT
    declarations ::= 
        {"const" ident "=" expression}
        {"type" ident "=" type}
        {"class" ident ["extends" ident] classBody}
        {"var" typedIds}
        {"object" typedIds}
        {"procedure" ident "(" [typedIds] ")" [ "→" "(" typedIds ")" ] body}
    body ::= INDENT declarations (statementBlock | INDENT statementBlock DEDENT) DEDENT
    program ::= declarations "program" ident body

### Modularization
<div><span style="float:right"><img width="60%" src="modularization.svg"/></span></div>

- The parser, `P0`, parses the source text, type-checks it, evaluates constant expressions, and generates target code, in one pass over the source text.
- The scanner, `SC`, reads characters of the source text and provides the next symbol to the parser; it allows errors to be reported at the current position in the source text.
- The symbol table, `ST`, stores all currently valid declarations, as needed for type-checking.
- The code generator, `CG`, provides the parser with procedures for generating code for P0 expressions, statements, variable declarations, and procedure declarations.

The parser is the main program that calls the scanner, symbol table, and code generator. All call the scanner for error reporting. The code generator augments the entries in the symbol table, for example with the size and location of variables. There are three code generators: `CGwat` generates WebAssembly code, `CGmips` generates MIPS code, and `CGast` generates only an abstract syntax tree.

### The Parser
The scanner and symbol table are always imported. Depending on the selected target, a different code generator is imported when compilation starts.
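
The target-dependent import can be pictured as a lookup from a target name to a code generator. The following is a minimal sketch under assumptions: the target names `'wat'`, `'mips'`, and `'ast'`, the function `select_code_generator`, and the stub classes are hypothetical stand-ins for the three code generator modules named above.

```python
# Stub generators standing in for the real code generator modules
# (hypothetical; the real ones export genVar, genConst, genAssign, ...).
class WasmGen:
    target = "WebAssembly"

class MipsGen:
    target = "MIPS"

class AstGen:
    target = "abstract syntax tree"

# Assumed target names; the notebooks may spell them differently.
GENERATORS = {"wat": WasmGen, "mips": MipsGen, "ast": AstGen}

def select_code_generator(target):
    """Return the generator registered under `target`."""
    if target not in GENERATORS:
        raise ValueError(f"unknown target: {target}")
    return GENERATORS[target]

CG = select_code_generator("mips")  # the parser would then call CG.gen... procedures
print(CG.target)
```

Because every generator offers the same procedure names, the parser code stays the same for all targets; only the binding of `CG` changes.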

In [None]:
import nbimporter; nbimporter.options["only_defs"] = False
import SC  #  used for SC.init, SC.sym, SC.val, SC.error
from SC import TIMES, DIV, MOD, AND, PLUS, MINUS, OR, EQ, NE, LT, GT, \
    LE, GE, PERIOD, COMMA, COLON, NOT, LPAREN, RPAREN, LBRAK, RBRAK, \
    LARROW, RARROW, OF, THEN, DO, BECOMES, NUMBER, IDENT, SEMICOLON, ELSE, \
    IF, WHILE, ARRAY, RECORD, CONST , TYPE, VAR, PROCEDURE, PROGRAM, \
    INDENT, DEDENT, NL, EOF, getSym, mark, \
    CLASS, EXTENDS, OBJECT, printMark, SELF

import ST  #  used for ST.init
from ST import Var, Ref, Const, Type, Proc, StdProc, Int, Bool, Enum, \
    Record, Array, newDecl, find, openScope, topScope, closeScope, \
    Object, Class

The first sets for recursive descent parsing are:

In [None]:
FIRSTFACTOR = {IDENT, NUMBER, LPAREN, NOT}
FIRSTEXPRESSION = {PLUS, MINUS, IDENT, NUMBER, LPAREN, NOT}
FIRSTSTATEMENT = {IDENT, IF, WHILE}
FIRSTTYPE = {IDENT, RECORD, ARRAY, LPAREN}
FIRSTDECL = {CONST, TYPE, VAR, PROCEDURE, CLASS, OBJECT}
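
A first set answers one question: given the next symbol, can a particular construct start here? The sketch below shows that lookup in isolation. It uses strings as token kinds instead of the scanner's constants, and `can_start` is a hypothetical helper, not part of the parser.

```python
# Token kinds as strings (the scanner SC defines its own constants).
IDENT, NUMBER, LPAREN, NOT = "IDENT", "NUMBER", "LPAREN", "NOT"
PLUS, MINUS, IF, WHILE = "PLUS", "MINUS", "IF", "WHILE"

FIRSTFACTOR = {IDENT, NUMBER, LPAREN, NOT}
FIRSTEXPRESSION = {PLUS, MINUS} | FIRSTFACTOR   # optional sign, then a term
FIRSTSTATEMENT = {IDENT, IF, WHILE}

def can_start(sym):
    """Return the constructs that the lookahead symbol `sym` may begin."""
    table = [("factor", FIRSTFACTOR),
             ("expression", FIRSTEXPRESSION),
             ("statement", FIRSTSTATEMENT)]
    return [name for name, first in table if sym in first]

print(can_start(MINUS))   # ['expression']
print(can_start(IDENT))   # ['factor', 'expression', 'statement']
```

The parsing procedures use the sets the same way, for example `SC.sym in FIRSTSTATEMENT` to decide whether another statement list follows.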

Procedure `designator()` parses

    designator ::= ident { "[" expression "]" }

and generates code for the designator if no error is reported. If the designator is a constant, a `Const` item is returned (and code may not be generated); if the designator is not a constant, the location of the result is returned.

In [None]:
def designator():
    global Debug
    if Debug: printMark("designator start")
    x = find(SC.val)
    if type(x) == Var: 
        x = CG.genVar(x)
        getSym()
    elif type(x) == Const: x = Const(x.tp, x.val); x = CG.genConst(x); getSym()
    else: mark('designator expected')
    while SC.sym == LBRAK:
        getSym(); y = expression()
        if type(x.tp) == Array:
            if y.tp == Int:
                if type(y) == Const and (y.val < x.tp.lower or y.val >= x.tp.lower + x.tp.length):
                    mark('index out of bounds')
                else: x = CG.genIndex(x, y)
            else: mark('index not integer')
        else: mark('not an array')
        if SC.sym == RBRAK: getSym()
        else: mark("] expected")
    if Debug: printMark("designator end")
    return x
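
The index check in `designator()` can only reject an index whose value is known at compile time. A small sketch of that decision, with the hypothetical helper `check_index` returning a string instead of calling `mark` or `CG.genIndex`:

```python
def check_index(index_is_const, index_val, lower, length):
    """Mimic the decision for an index into an array [lower .. lower + length - 1]."""
    if index_is_const and (index_val < lower or index_val >= lower + length):
        return "index out of bounds"   # reported while compiling
    return "genIndex"                  # code is generated; no compile-time check

# a: [0 .. 9] → integer has lower = 0 and length = 10
print(check_index(True, 10, 0, 10))    # index out of bounds
print(check_index(True, 9, 0, 10))     # genIndex
print(check_index(False, None, 0, 10)) # genIndex
```

For a non-constant index such as `a[i]`, nothing can be decided by the parser; whether a run-time check is emitted is left to the code generator.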

Procedure `factor()` parses

    factor ::= designator | integer | "(" expression ")" | "¬" factor.

and generates code for the factor if no error is reported. If the factor is a constant, a `Const` item is returned (and code may not be generated); if the factor is not a constant, the location of the result is returned. 

In [None]:
def factor():
    global Debug
    if Debug: printMark("factor start")
    if SC.sym == IDENT: x = designator()
    elif SC.sym == NUMBER:
        x = Const(Int, SC.val); x = CG.genConst(x); getSym()
    elif SC.sym == LPAREN:
        getSym(); x = expression()
        if SC.sym == RPAREN: getSym()
        else: mark(") expected")
    elif SC.sym == NOT:
        getSym(); x = factor()
        if x.tp != Bool: mark('not boolean')
        elif type(x) == Const: x.val = 1 - x.val # constant folding
        else: x = CG.genUnaryOp(NOT, x)
    else: mark("expression expected")
    if Debug: printMark("factor end")
    return x

Procedure `term()` parses

    term ::= factor {("×" | "div" | "mod" | "and") factor}.

and generates code for the term if no error is reported. If the term is a constant, a `Const` item is returned (and code may not be generated); if the term is not a constant, the location of the result is returned. 

In [None]:
def term():
    global Debug
    if Debug: printMark("term start")
    x = factor()
    while SC.sym in {TIMES, DIV, MOD, AND}:
        op = SC.sym; getSym();
        if op == AND and type(x) != Const: x = CG.genUnaryOp(AND, x)
        y = factor() # x op y
        if x.tp == Int == y.tp and op in {TIMES, DIV, MOD}:
            if type(x) == Const == type(y): # constant folding
                if op == TIMES: x.val = x.val * y.val
                elif op == DIV: x.val = x.val // y.val
                elif op == MOD: x.val = x.val % y.val
            else: x = CG.genBinaryOp(op, x, y)
        elif x.tp == Bool == y.tp and op == AND:
            if type(x) == Const: # constant folding
                if x.val: x = y # if x is true, take y, else x
            else: x = CG.genBinaryOp(AND, x, y)
        else: mark('bad type')
    if Debug: printMark("term end")
    return x
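
Constant folding in `term()` evaluates the operator in Python while compiling. The sketch below isolates the folding rules for two constant operands; the `Const` dataclass is a simplified stand-in for `ST.Const`, and `fold_term` is a hypothetical helper.

```python
from dataclasses import dataclass

@dataclass
class Const:
    """Simplified stand-in for the symbol table's Const entry."""
    tp: str   # "int" or "bool"
    val: int

def fold_term(op, x, y):
    """Fold `x op y` for two constants, following the cases in term()."""
    if op == "×":
        return Const("int", x.val * y.val)
    if op == "div":
        return Const("int", x.val // y.val)
    if op == "mod":
        return Const("int", x.val % y.val)
    if op == "and":
        return y if x.val else x   # if x is true, take y, else x
    raise ValueError(f"not a term operator: {op}")

print(fold_term("×", Const("int", 6), Const("int", 7)).val)     # 42
print(fold_term("div", Const("int", -7), Const("int", 2)).val)  # -4
```

Python's `//` and `%` round toward negative infinity, so a folded `-7 div 2` yields `-4`. A target whose division instruction truncates toward zero would compute `-3` for the same operands at run time, which is worth keeping in mind when comparing folded and generated results.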

Procedure `simpleExpression()` parses

    simpleExpression ::= ["+" | "-"] term {("+" | "-" | "or") term}.

and generates code for the simple expression if no error is reported. If the simple expression is a constant, a `Const` item is returned (and code may not be generated); if the simple expression is not constant, the location of the result is returned. 

In [None]:
def simpleExpression():
    global Debug
    if Debug: printMark("simpleExp start")
    if SC.sym == PLUS:
        getSym(); x = term()
    elif SC.sym == MINUS:
        getSym(); x = term()
        if x.tp != Int: mark('bad type')
        elif type(x) == Const: x.val = - x.val # constant folding
        else: x = CG.genUnaryOp(MINUS, x)
    else: x = term()
    while SC.sym in {PLUS, MINUS, OR}:
        op = SC.sym; getSym()
        if op == OR and type(x) != Const: x = CG.genUnaryOp(OR, x)
        y = term() # x op y
        if x.tp == Int == y.tp and op in {PLUS, MINUS}:
            if type(x) == Const == type(y): # constant folding
                if op == PLUS: x.val = x.val + y.val
                elif op == MINUS: x.val = x.val - y.val
            else: x = CG.genBinaryOp(op, x, y)
        elif x.tp == Bool == y.tp and op == OR:
            if type(x) == Const: # constant folding
                if not x.val: x = y # if x is false, take y, else x
            else: x = CG.genBinaryOp(OR, x, y)
        else: mark('bad type')
    if Debug: printMark("simpleExp end")
    return x

Procedure `expression()` parses

    expression ::= simpleExpression
                 {("=" | "≠" | "<" | "≤" | ">" | "≥") simpleExpression}.

and generates code for the expression if no error is reported. If the expression is a constant, a `Const` item is returned (and code may not be generated); if the expression is not constant, the location of the result is returned. 

In [None]:
def expression():
    global Debug
    if Debug: printMark("Exp start")
    x = simpleExpression()
    while SC.sym in {EQ, NE, LT, LE, GT, GE}:
        op = SC.sym; getSym(); y = simpleExpression() # x op y
        if x.tp == y.tp in (Int, Bool):
            if type(x) == Const == type(y): # constant folding
                if op == EQ: x.val = x.val == y.val
                elif op == NE: x.val = x.val != y.val
                elif op == LT: x.val = x.val < y.val
                elif op == LE: x.val = x.val <= y.val
                elif op == GT: x.val = x.val > y.val
                elif op == GE: x.val = x.val >= y.val
                x.tp = Bool
            else: x = CG.genRelation(op, x, y)
        else: mark('bad type')
    if Debug: printMark("Exp end")
    return x
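
The type test in `expression()` relies on Python's chained comparisons: `x.tp == y.tp in (Int, Bool)` means `x.tp == y.tp and y.tp in (Int, Bool)`. A small sketch, with strings standing in for the type descriptors (an assumption made for illustration):

```python
# Strings stand in for the symbol table's type descriptors.
Int, Bool, IntArray = "Int", "Bool", "IntArray"

def comparable(xtp, ytp):
    """True if a relational operator may be applied to operands of these types."""
    return xtp == ytp in (Int, Bool)   # chained: xtp == ytp and ytp in (Int, Bool)

print(comparable(Int, Int))            # True
print(comparable(Int, Bool))           # False
print(comparable(IntArray, IntArray))  # False
```

Equal types alone are not enough: two arrays of the same type are still rejected, since only `Int` and `Bool` operands can be compared.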

Procedure `statementList()` parses

    statementList ::= statement {";" statement}

and generates code for the statement list if no error is reported.

In [None]:
def statementList():
    global Debug
    if Debug: printMark("stmtList start")
    x = statement()
    while SC.sym == SEMICOLON:
        getSym(); y = statement(); x = CG.genSeq(x, y)
    if Debug: printMark("stmtList end")
    return x

Procedure `statementBlock()` parses

    statementBlock ::= statementList {statementList}

and generates code for the statement block if no error is reported. Each statement list has to start on a new line.

In [None]:
def statementBlock():
    global Debug
    if Debug: printMark("stmtBlock start")
    x = statementList()
    while SC.sym in FIRSTSTATEMENT:
        if not SC.newline: mark('new line expected')
        y = statementList(); x = CG.genSeq(x, y)
    if Debug: printMark("stmtBlock end")
    return x

Procedure `statementSuite()` parses

    statementSuite ::= statementList | INDENT statementBlock DEDENT

and generates code for the statement suite if no error is reported.

In [None]:
def statementSuite():
    global Debug
    if Debug: printMark("stmtSuite start")
    if SC.sym in FIRSTSTATEMENT: x = statementList()
    elif SC.sym == INDENT:
        getSym(); x = statementBlock()
        if SC.sym == DEDENT: getSym();
        else: mark('dedent expected')
    else: mark("(indented) statement expected!")
    if Debug: printMark("stmtSuite end")
    return x
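
`statementSuite()` relies on the scanner to deliver `INDENT` and `DEDENT` symbols. One common way to derive them is a stack of indentation columns, as in Python's own tokenizer. The sketch below illustrates that idea only; it is an assumption about how such symbols can be produced, not a description of the `SC` scanner, and `layout_tokens` is a hypothetical name.

```python
def layout_tokens(lines):
    """Return the stripped lines interleaved with 'INDENT' and 'DEDENT' markers.

    Blank lines are skipped. A dedent to a column that was never opened is
    not detected in this sketch.
    """
    stack = [0]   # indentation columns of the enclosing blocks
    out = []
    for line in lines:
        if not line.strip():
            continue
        col = len(line) - len(line.lstrip(" "))
        if col > stack[-1]:
            stack.append(col)
            out.append("INDENT")
        while col < stack[-1]:
            stack.pop()
            out.append("DEDENT")
        out.append(line.strip())
    while len(stack) > 1:   # close blocks still open at the end of the text
        stack.pop()
        out.append("DEDENT")
    return out

src = ["while r ≥ y do", "    r := r - y", "    q := q + 1", "write(q)"]
print(layout_tokens(src))
```

With such markers in the symbol stream, the body of the `while` above matches `INDENT statementBlock DEDENT`, and the two assignments are separate statement lists because each starts on a new line.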

Procedure `statement()` parses

    statement ::=
        designator ":=" expression |
        designator "←" ident "(" [expression {"," expression}] ")" |
        "if" expression "then" statementSuite ["else" statementSuite] |
        "while" expression "do" statementSuite

and generates code for the statement if no error is reported.

In [None]:
def statement():
    global Debug
    if Debug: printMark("stmt start")
    if SC.sym == IDENT: # x := y, y(...), x ← y(...)
        # type(x) == Proc, StdProc: check no result parameters needed; call, y := true, x
        # type(x) ≠ Proc, StdProc: x := designator():
        #   sym == BECOMES: assignment; call := false
        #   sym == LARROW: check result parameter match, type(y) is Proc, StdProc
        # record SC.val in case it is a class method (j.setGrade)
        tmp = SC.val
        #ST.printSymTab()
        x = find(SC.val)
        # classMet records whether x is a class method
        classMet = False
        if type(x) in {Proc, StdProc}: # call
            if type(x) == Proc:
                #print("Proc found", x)
                # each Proc type has an attribute recording if 
                # it is a class method
                classMet = x.classMet
            if x.res != []: mark('designator for result expected')
            getSym(); call, y, x = True, x, None
        elif type(x) in {Var, Ref}:
            x = designator()
            if SC.sym == BECOMES:
                getSym(); y = expression(); call = False # x := y
                if x.tp == y.tp in {Bool, Int}: x = CG.genAssign(x, y)
                else: mark('incompatible assignment')
            elif SC.sym == LARROW:
                getSym()
                if SC.sym == IDENT: y = find(SC.val); getSym(); call = True
                else: mark('procedure identifier expected')
                if type(y) in {Proc, StdProc}:
                    if len(y.res) != 1 or x.tp != y.res[0].tp: mark('incompatible call')
                else: mark('procedure expected')
            else: mark(':= or ← expected')
        else: mark("variable or procedure expected")
        if call: # call y(ap) or x ← y(ap)
            #print("a calll for ", y, "with par: ")
            #for each in y.par:
            #    print(each, "type:", type(each))
            #print("end of par")
            fp, ap, i = y.par, [], 0   #  list of formals, list of actuals
            #this is a class method, so it starts with self
            if classMet:
                # j.setGrade -> objName = j
                objName = tmp.split('.')[0]
                #print("--",procName, tmp, objName)
                # find the name (e.g. j) in symbol table
                obj = find(objName)
                #pass the base address
                ap.append(CG.genActualPara(obj,fp[0],0))
                i = i + 1
            if SC.sym == LPAREN: getSym()
            else: mark("'(' expected")
            if SC.sym in FIRSTEXPRESSION:
                a = expression()
                if i < len(fp):
                    if fp[i].tp == a.tp: ap.append(CG.genActualPara(a, fp[i], i))
                    else: mark('incompatible parameter')
                else: mark('extra parameter')
                i = i + 1
                while SC.sym == COMMA:
                    getSym()
                    a = expression()
                    if i < len(fp):
                        if fp[i].tp == a.tp: ap.append(CG.genActualPara(a, fp[i], i))
                        else: mark('incompatible parameter')
                    else: mark('extra parameter')
                    i = i + 1
            if SC.sym == RPAREN: getSym()
            else: mark("')' expected")
            if i < len(fp): mark('too few parameters')
            elif type(y) == StdProc:
                if y.name == 'read': x = CG.genRead(x)
                elif y.name == 'write': x = CG.genWrite(a)
                elif y.name == 'writeln': x = CG.genWriteln()
            else: 
                #print("At gencall, x, y, ap",x, y, ap)
                x = CG.genCall(x, y, ap)
    elif SC.sym == IF:
        getSym(); x = expression();
        if x.tp == Bool: x = CG.genThen(x)
        else: mark('boolean expected')
        if SC.sym == THEN: getSym()
        else: mark("'then' expected")
        y = statementSuite()
        if SC.sym == ELSE:
            getSym()
            y = CG.genElse(x, y)
            z = statementSuite()
            x = CG.genIfElse(x, y, z)
        else:
            x = CG.genIfThen(x, y)
    elif SC.sym == WHILE:
        getSym(); t = CG.genWhile(); x = expression()
        if x.tp == Bool: x = CG.genDo(x)
        else: mark('boolean expected')
        if SC.sym == DO: getSym()
        else: mark("'do' expected")
        y = statementSuite()
        x = CG.genWhileDo(t, x, y)
    else: mark('statement expected')
    if Debug: printMark("stmt end")
    return x
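
The parameter checks in the call branch compare the actual parameters with the formal ones position by position. The sketch below extracts that logic; `check_call` is a hypothetical helper that collects the messages in a list instead of calling `mark`, and types are plain strings.

```python
def check_call(formal_types, actual_types):
    """Return the messages the call check would report, in order."""
    errors = []
    i = 0
    for a in actual_types:
        if i < len(formal_types):
            if formal_types[i] != a:
                errors.append("incompatible parameter")
        else:
            errors.append("extra parameter")
        i += 1
    if i < len(formal_types):
        errors.append("too few parameters")
    return errors

print(check_call(["integer", "integer"], ["integer", "integer"]))  # []
print(check_call(["integer"], ["boolean"]))   # ['incompatible parameter']
print(check_call(["integer", "integer"], ["integer"]))  # ['too few parameters']
```

For a class method, the counter `i` in `statement()` starts at 1 because the object's base address has already been passed for the `self` parameter.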

Procedure `typ` parses

    type ::=
        ident |
        "[" expression ".." expression "]" "→" type

and returns a type descriptor if no error is reported. The array bounds are checked to be constants; the lower bound must be non-negative and less than or equal to the upper bound.

In [None]:
def typ():
    global Debug
    if Debug: printMark("typ start")
    if SC.sym == IDENT:
        #x is the type/class the ident corresponds to
        ident = SC.val; x = find(ident)
        if type(x) == Class: getSym()
        elif type(x) == Type: x = Type(x.val); getSym()
        else: 
            print('failed to find (or incorrect type)--', ident, '--current symtab')
            ST.printSymTab()
            mark('type identifier expected')
    elif SC.sym == LBRAK:
        getSym(); x = expression()
        if SC.sym == PERIOD: getSym()
        else: mark("'.' expected")
        if SC.sym == PERIOD: getSym()
        else: mark("'.' expected")
        y = expression()
        if SC.sym == RBRAK: getSym()
        else: mark("']' expected")
        if SC.sym == RARROW: getSym()
        else: mark("'→' expected")
        z = typ().val;
        if type(x) != Const or x.val < 0: mark('bad lower bound')
        elif type(y) != Const or y.val < x.val: mark('bad upper bound')
        else: x = Type(CG.genArray(Array(z, x.val, y.val - x.val + 1)))
    else: mark('type expected')
    if Debug: printMark("typ end")
    return x
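
An array type `[m .. n] → T` is stored with its lower bound and its length, where the length is `n - m + 1`. A minimal sketch of the bound checks for constant bounds; `array_descriptor` is a hypothetical helper that raises `ValueError` where `typ()` calls `mark`:

```python
def array_descriptor(lower, upper):
    """Check constant bounds as typ() does and return (lower, length)."""
    if lower < 0:
        raise ValueError("bad lower bound")
    if upper < lower:
        raise ValueError("bad upper bound")
    return lower, upper - lower + 1

N = 10
print(array_descriptor(0, N - 1))   # (0, 10)
print(array_descriptor(3, 3))       # (3, 1)
```

The smallest legal array therefore has one element; an empty range such as `[5 .. 4]` is rejected.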

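Procedure `typedIds()` parses

    typedIds ::= ident ":" type {"," ident ":" type}.

and updates the top scope of the symbol table; an error is reported if an identifier is already in the top scope. As in the examples, the implementation also accepts several identifiers separated by commas before a single type, as in `x, y: integer`.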

In [None]:
# returns a list of tuples (name, type), one per declared identifier;
# each identifier is also entered into the top scope of the symbol table
# (classAttributes() uses the returned list for the attributes of a class)
def typedIds():
    global Debug
    if Debug: printMark("typedIds start")
    #list of newly created ident in tuples
    newIdents = []
    if SC.sym == IDENT: tid = [SC.val]; getSym()
    else: mark("identifier expected")
    while SC.sym == COMMA:
        getSym()
        if SC.sym == IDENT: tid.append(SC.val); getSym()
        else: mark('identifier expected')
    if SC.sym == COLON: getSym()
    else: mark("':' expected")
    tp = typ().val
    for i in tid: 
        newDecl(i, Var(tp))
        #add the tuple to the result list
        newIdents.append((i, tp))
    while SC.sym == COMMA:
        getSym()
        if SC.sym == IDENT: tid = [SC.val]; getSym()
        else: mark("identifier expected")
        while SC.sym == COMMA:
            getSym()
            if SC.sym == IDENT: tid.append(SC.val); getSym()
            else: mark('identifier expected')
        if SC.sym == COLON: getSym()
        else: mark("':' expected")
        tp = typ().val
        for i in tid: 
            newDecl(i, Var(tp))
            #add the tuple to result list
            newIdents.append((i, tp))
    if Debug: printMark("typedIds end")
    return newIdents
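
The loop structure of `typedIds()` is easier to see on a plain token list. The sketch below follows the same shape: collect identifiers up to the colon, read the type, then repeat after a comma. It assumes well-formed input and that a type is a single token; `parse_typed_ids` is a hypothetical helper.

```python
def parse_typed_ids(tokens):
    """Parse tokens such as ['x', ',', 'y', ':', 'integer'] into (name, type) pairs."""
    pos, result = 0, []
    while True:
        names = [tokens[pos]]
        pos += 1
        while tokens[pos] == ",":          # further identifiers sharing the type
            names.append(tokens[pos + 1])
            pos += 2
        if tokens[pos] != ":":
            raise SyntaxError("':' expected")
        tp = tokens[pos + 1]               # assumption: the type is one token
        pos += 2
        result.extend((name, tp) for name in names)
        if pos < len(tokens) and tokens[pos] == ",":
            pos += 1                       # another group follows
        else:
            return result

tokens = ["x", ",", "y", ":", "integer", ",", "b", ":", "boolean"]
print(parse_typed_ids(tokens))
```

The result pairs each identifier with its type in declaration order, which is the shape of the list that `typedIds()` returns.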

Procedure `declarations(allocVar)` parses

    declarations ::=
        {"const" ident "=" expression}
        {"type" ident "=" type}
        {"class" ident ["extends" ident] classBody}
        {"var" typedIds}
        {"object" typedIds}
        {"procedure" ident "(" [typedIds] ")" [ "→" "(" typedIds ")" ] body}

and updates the top scope of the symbol table; an error is reported if an identifier is already in the top scope. An error is also reported if the expression of a constant declaration is not constant. For each procedure, a new scope is opened for its formal parameters and local declarations, the formal parameters are added to the symbol table, and code is generated for the body. The size of the variable declarations is returned, as determined by calling parameter `allocVar`. The class productions are parsed by the helper procedures `classAttributes`, `classMethods`, and `classBody`, which are defined first.

In [None]:
# return a list of tuples consisting of (attribute name, type)
# 'self.' is not in the names
def classAttributes(currentClass):
    global Debug
    if Debug: printMark("classatt start")
    newAttDefs = []
    while SC.sym == VAR:
        getSym();
        newAttDefs.extend(typedIds())
    if Debug: printMark("classatt end")
    return newAttDefs

In [None]:
# returns two lists:
# 1. the names of the class methods, in the form class.method_name,
#    e.g. Person.printSIN
# 2. the Proc entries created for the methods

# takes three parameters:
# currentClass: the Class object that these methods belong to
# className : name of the above class
# att_offset : list of attributes, in the form of tuples:
#                 (name, type, offset)
#  offset is with respect to base address of the class (see classBody)
def classMethods(currentClass, className, att_offset):
    global Debug
    if Debug: printMark("classmet start")
    #list of names of methods, names are in the form :
    #                class.method_name  Person.printSIN
    newMethods = []
    newProc = []
    while SC.sym == PROCEDURE:
        getSym()
        if SC.sym == IDENT: ident = SC.val; getSym()
        else: mark("method name expected")
        ident = className + '.' + ident
        #add to result list
        newMethods.append(ident)
        newDecl(ident, Proc([], [], True))
        sc = topScope(); openScope() # new scope for parameters and body
        if SC.sym == LPAREN: getSym()
        else: mark("( expected")
        # keyword 'self' must be the first parameter
        if SC.sym == SELF:
            getSym()
            # 'self' is a reference to the currentClass type
            vari = Ref(currentClass)
            newDecl("self", vari)
            # att_offset: list of tuples
            # attrib: tuple (name, type, offset)
            for attrib in att_offset:
                attName = attrib[0]       # name of attribute
                attTp = attrib[1]         # type of attribute
                attOffset = attrib[2]     # offset of attribute
                newAtt = Var(attTp)       # create a new Var
                newAtt.setClass(currentClass)  # set the Var to be part of currentClass
                newAtt.adr = attOffset    # set address to offset, its reg will be set to the base address
                newDecl("self." + attName, newAtt)  # put in symbol table with 'self.' prepended
        else:
            mark("expect keyword self in class methods")
        
        if SC.sym == COMMA: getSym(); typedIds()
        fp_1 = topScope()
        fp = []
        #exclude 'self.' from fp
        for each in fp_1: 
            if "self." not in each.name:
                fp.append(each)
        
        parN = len(fp) #including self
        if SC.sym == RPAREN: getSym()
        else: mark(") expected")
        d = len(fp)
        if SC.sym == RARROW:
            getSym()
            if SC.sym == LPAREN: getSym()
            else: mark('( expected')
            typedIds()
            if SC.sym == RPAREN: getSym()
            else: mark(') expected')
        sc[-1].par, sc[-1].res = fp[:d], fp[d:] #  procedure parameters updated
        parsize = CG.genProcStart(ident, fp[:d], fp[d:])
        #true represents the body of a class method, see body
        body(ident, parsize, True)
        #append the Proc to newProc list
        newProc.append(sc[-1]); closeScope() #  scope for parameters and body close
    if Debug: printMark("classmet end")
    return newMethods, newProc

In [None]:
# takes in three:
# currentClass: the Class object that these methods belong to
# className : name of the above class
# ext: name of class extended
# (currently only for single inheritance, so just one string)
def classBody(className, ext, currentClass):
    global Debug
    if Debug: printMark("classbody start")
    if ext == "":
        # no attribute/methods inherited
        extClass = Class("", "")
        extClassAtt = []
        extClassMet = []
    else:
        extClass = find(ext)
        #get attributes and methods of the parent class
        extClassAtt = extClass.att.copy()
        extClassMet = extClass.methods.copy() 
    if SC.sym == INDENT: getSym()
    else: mark("indent expected in class")
    #open new scope for the class (attributes, methods)
    sc = topScope(); openScope()
    attr = extClassAtt
    
    #list of names of the attributes of super class
    attrNameL = []
    for each in attr:
        attrNameL.append(each[0])
    
    # get the list of tuples consisting of (attribute name, type)
    currentClassAttr = classAttributes(currentClass)
    
    #check for overloads, on name clash,
    # if B extends A, type in B overloads A
    for name, tp in currentClassAttr:
        # i.e. name in A's attributes
        # which means there is already a tuple
        # (name, A's type) in attr
        if name in attrNameL:
            for eachIndex in range(len(attr)):
                each = attr[eachIndex]
                #found the tuple
                if each[0] == name:
                    #change the type
                    attr[eachIndex] = (name, tp)
        else:
            attr.append((name, tp))
    #otherwise, the layout is always base class attributes
    #first, then derived class e.g.    base address         base address
    # B: var sin, age : integer   B:  | sin : int |+0  A: | sin : int   |+0
    # A: var grade : integer          | age : int |+4     | age : int   |+4
    # A extends B                                         | grade : int |+8
    
    size = 0
    #list of attribute, with offset calculated
    # (name, type, offset)
    att_offset = []
    for each in attr:
        name = each[0]
        tp = each[1]
        current_att = (name, tp, size)
        att_offset.append(current_att)
        size += tp.size
        
    sc[-1].att = att_offset #set att, size, methods for the Class object
    sc[-1].size = size
    newMethods, newProc = classMethods(currentClass, className, att_offset)
    if extClassMet != []:
        #add base class methods
        newMethods.extend(extClassMet)
    sc[-1].methods = newMethods
    closeScope() #close the scope for this class
    # currently, the class methods are only seen in the class
    # copy them to the outer scope
    for each in newProc:
        each.lev = each.lev - 1
        topScope().append(each)
    if SC.sym == DEDENT: getSym()
    else: mark("dedent expected in class")
    if Debug: printMark("classbody end")

In [None]:
def declarations(allocVar):
    global Debug
    if Debug: printMark("declarations start")
    while SC.sym == CONST:
        getSym()
        if SC.sym == IDENT: ident = SC.val; getSym()
        else: mark("constant name expected")
        if SC.sym == EQ: getSym()
        else: mark("= expected")
        x = expression()
        if type(x) == Const: newDecl(ident, x)
        else: mark('expression not constant')
    while SC.sym == TYPE:
        getSym()
        if SC.sym == IDENT: ident = SC.val; getSym()
        else: mark("type name expected")
        if SC.sym == EQ: getSym()
        else: mark("= expected")
        x = typ(); newDecl(ident, x)  #  x is of type ST.Type
    #'Class' ident
    while SC.sym == CLASS:
        getSym()
        if SC.sym == IDENT: ident = SC.val; getSym()
        else: mark("class name expected")
        #below takes care of extends, 
        # ext: name of the class, "" if no extends
        ext = ""
        if SC.sym == EXTENDS:
            getSym()
            if SC.sym == IDENT: ext = SC.val; getSym()
            else: mark("extends class name expected")
        currentClass = Class("", ext)
        newDecl(ident, currentClass) #add empty class with name ident into symbol table
        classBody(ident, ext, currentClass)
    start = len(topScope())
    while SC.sym == VAR:
        getSym(); typedIds()
    while SC.sym == OBJECT:
        #objects are declared as Var, with type (.tp) being the Class() created
        getSym(); typedIds()
    varsize = allocVar(topScope(), start)
    while SC.sym == PROCEDURE:
        getSym()
        if SC.sym == IDENT: ident = SC.val; getSym()
        else: mark("procedure name expected")
        newDecl(ident, Proc([], [])) #  entered without parameters
        sc = topScope(); openScope() # new scope for parameters and body
        if SC.sym == LPAREN: getSym()
        else: mark("( expected")
        if SC.sym == IDENT: typedIds()
        fp = topScope()
        if SC.sym == RPAREN: getSym()
        else: mark(") expected")
        d = len(fp)
        if SC.sym == RARROW:
            getSym()
            if SC.sym == LPAREN: getSym()
            else: mark('( expected')
            typedIds()
            if SC.sym == RPAREN: getSym()
            else: mark(') expected')
        sc[-1].par, sc[-1].res = fp[:d], fp[d:] #  procedure parameters updated
        parsize = CG.genProcStart(ident, fp[:d], fp[d:])
        body(ident, parsize); closeScope() #  scope for parameters and body closed
    if Debug: printMark("declarations end")
    return varsize
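
The procedure case above keeps parameters and results in one scope list: `d` records how many entries exist once the parameters are parsed, so `fp[:d]` are the parameters and `fp[d:]` the results. A minimal sketch of that slicing follows. It assumes the scope is a plain Python list that `typedIds()` appends to in place; strings stand in for symbol table entries, and the names `a`, `b`, `r` are illustrative only.

```python
fp = []                    # stands in for the fresh scope opened for the procedure
fp.extend(['a', 'b'])      # parameters, as the first typedIds() would enter them
d = len(fp)                # number of parameters, recorded before results are parsed
fp.append('r')             # result, as the second typedIds() would enter it
params, results = fp[:d], fp[d:]
print(params, results)
```

If no `→ (…)` part follows, nothing is appended after `d` is taken, so `fp[d:]` is simply the empty list.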

Procedure `body` parses

    body ::= INDENT declarationBlock (statementBlock | INDENT statementBlock DEDENT) DEDENT

and returns the generated code if no error is reported.

In [None]:
#classMet : boolean for whether this body is for class method
def body(ident, parsize, classMet = False):
    global Debug
    if Debug: printMark("body start")
    if SC.sym == INDENT: getSym()
    else: mark('indent expected')
    start = len(topScope())
    localsize = declarations(CG.genLocalVars)
    CG.genProcEntry(ident, parsize, localsize)
    # if this is the body for a class method
    # need to give the method the base address of the object
    if classMet:
        #r : register that contains the base address
        r = CG.genClassMetEntry(ident, parsize, localsize)
        # in top scope, need to assign 'self' with r
        # also, assign all Var with 'self.' the register r
        # (they already have the offset in their .adr)
        for each in topScope():
            if each.name == 'self':
                each.reg = r
            elif 'self.' in each.name:
                each.reg = r
    if SC.sym in FIRSTSTATEMENT: 
        x = statementBlock()
    elif SC.sym == INDENT:
        getSym(); x = statementBlock()
        if SC.sym == DEDENT: getSym()
        else: mark('dedent expected')
    else: mark('statement expected')
    CG.genProcExit(x, parsize, localsize)
    if SC.sym == DEDENT: getSym()
    else: mark('dedent expected')
    if Debug: printMark("body end")
    return x
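
The two alternatives in the `body` grammar differ only in whether the statement block is indented one level further than the declarations. The sketch below is a toy recognizer, not the parser itself: it omits declarations, and the single token `STMT` (an assumption made for illustration) stands in for a whole statement block.

```python
# Token names mirror the scanner symbols used in body(); STMT is a stand-in.
INDENT, DEDENT, STMT = 'INDENT', 'DEDENT', 'STMT'

def accepts_body(tokens):
    """Return True if tokens match INDENT (STMT | INDENT STMT DEDENT) DEDENT."""
    pos = 0

    def take(sym):
        # Consume one token if it equals sym.
        nonlocal pos
        if pos < len(tokens) and tokens[pos] == sym:
            pos += 1
            return True
        return False

    if not take(INDENT):
        return False
    if take(STMT):
        pass                                  # first alternative: statements at the same level
    elif take(INDENT):
        if not (take(STMT) and take(DEDENT)):
            return False                      # second alternative: statements nested one level deeper
    else:
        return False
    return take(DEDENT) and pos == len(tokens)

print(accepts_body([INDENT, STMT, DEDENT]))
print(accepts_body([INDENT, INDENT, STMT, DEDENT, DEDENT]))
```

Unlike this recognizer, `body` does not stop at the first mismatch; it calls `mark` and continues parsing.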

Procedure `program` parses

    program ::= declarations "program" ident body
    
and returns the generated code if no error is reported. The standard identifiers are entered initially in the symbol table.

In [None]:
def program():
    newDecl('boolean', Type(CG.genBool(Bool)))
    newDecl('integer', Type(CG.genInt(Int)))
    newDecl('true', Const(Bool, 1))
    newDecl('false', Const(Bool, 0))
    newDecl('read', StdProc([], [Var(Int)]))
    newDecl('write', StdProc([Var(Int)], []))
    newDecl('writeln', StdProc([], []))
    CG.genProgStart()
    declarations(CG.genGlobalVars)
    if SC.sym == PROGRAM: getSym()
    else: mark("'program' expected")
    ident = SC.val
    if SC.sym == IDENT: getSym()
    else: mark('program name expected')
    openScope(); CG.genProgEntry(ident); x = body(ident, 0)
    closeScope(); x = CG.genProgExit(x)
    return x
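
Because `program` enters the standard identifiers before any scope for the program body is opened, they end up in the outermost scope and remain visible from nested ones. The following is a rough sketch of that lookup behaviour under assumed data structures. The helpers `new_decl`, `open_scope`, `close_scope`, and `find` are hypothetical stand-ins and not the interface of the symbol table module used above.

```python
scopes = [{}]              # innermost scope last; scopes[0] is the outermost scope

def new_decl(name, entry):
    scopes[-1][name] = entry          # declare in the current (innermost) scope

def open_scope():
    scopes.append({})

def close_scope():
    scopes.pop()

def find(name):
    # Search from the innermost scope outwards.
    for scope in reversed(scopes):
        if name in scope:
            return scope[name]
    return None

new_decl('true', ('const', 'boolean', 1))    # standard identifiers, entered first
new_decl('false', ('const', 'boolean', 0))
open_scope()                                 # scope for the program body
new_decl('x', ('var', 'integer'))
print(find('true'), find('x'))
close_scope()
print(find('x'))                             # x is gone once its scope is closed
```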

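Procedure `compileString(src, dstfn, target, debug)` compiles the source given by string `src`; if `dstfn` is provided, the code is written to a file by that name, otherwise it is printed on the screen. If `target` is omitted, RISC-V code is generated; the other supported targets are `'wat'`, `'mips'`, and `'ast'`. Setting `debug` to `True` enables the `printMark` tracing in the parsing procedures.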

In [None]:
def compileString(src, dstfn = None, target = 'riscv', debug = False):
    global CG, Debug
    Debug = debug
    if target == 'wat': import CGwat as CG
    elif target == 'mips': import CGmips as CG
    elif target == 'ast': import CGast as CG
    elif target == 'riscv': import CGriscv as CG
    else: print('unknown target'); return
    try:
        SC.init(src); ST.init(); p = program()
        if dstfn is None: print(p)
        else:
            with open(dstfn, 'w') as f: f.write(p)
    except Exception as msg:
        raise Exception(str(msg))
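
The chain of `if`/`elif` imports above maps a target name to a code generator module. As an illustration only, the same mapping can be written as a table lookup. The module names are copied from the import statements in `compileString`; the helper `generator_module` is hypothetical and not part of the compiler.

```python
# Target name -> code generator module name, as imported in compileString.
TARGET_MODULES = {
    'wat': 'CGwat',
    'mips': 'CGmips',
    'ast': 'CGast',
    'riscv': 'CGriscv',
}

def generator_module(target='riscv'):
    """Return the module name for target, or None if the target is unknown."""
    return TARGET_MODULES.get(target)

print(generator_module())         # default target, as in compileString's signature
print(generator_module('wat'))
print(generator_module('x86'))    # unknown target
```

A name obtained this way could be passed to `importlib.import_module`; the explicit imports used above keep the set of targets visible at a glance.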

Procedure `compileFile(srcfn, target)` compiles the file named `srcfn`, which must have the extension `.p`, and writes the generated code to a file with the same name but extension `.s`. If `target` is omitted, WebAssembly code (target `'wat'`) is generated.

In [None]:
def compileFile(srcfn, target = 'wat'):
    if srcfn.endswith('.p'):
        with open(srcfn, 'r') as f: src = f.read()
        dstfn = srcfn[:-2] + '.s'
        compileString(src, dstfn, target)
    else: print("'.p' file extension expected")
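
The destination name is derived by dropping the last two characters of the source name and appending `.s`. A small sketch of just that step, with a hypothetical helper `dest_name` that is not part of the compiler:

```python
def dest_name(srcfn):
    """Map 'name.p' to 'name.s'; return None for any other extension."""
    if not srcfn.endswith('.p'):
        return None
    return srcfn[:-2] + '.s'      # drop the two characters '.p', then append '.s'

print(dest_name('myprog.p'))
print(dest_name('myprog.txt'))
```

The `.s` extension is used regardless of the chosen target.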

Sample usage (in code cell):

    cd /path/to/my/prog
    compileFile('myprog.p')