In [None]:
import { display } from "tslab";
import { readFileSync } from "fs";

const css = readFileSync("../style.css", "utf8");
display.html(`<style>${css}</style>`);

In [None]:
import {
  createToken,
  Lexer,
  CstParser,
  IToken,
  CstNode,
  TokenType,
} from "chevrotain";
import * as fs from "fs";
import * as readlineSync from "readline-sync";

# An Interpreter for a Simple Programming Language

In this notebook we develop an interpreter for a small programming language.

We allow both *single-line comments* and *multi-line comments*.
- The regular expression `\/\*[^]*?\*\/` recognizes multi-line comments.
  Multi-line comments start with the string `/*` and end with the string `*/`.
  The character class `[^]` matches any character, including newlines.
  Note the use of the *non-greedy* quantifier `*?`.  If we have code like
  ```
  /* blah */ a := 1; /* blub */
  ```
  the greedy quantifier `*` would recognize the whole line as one comment.
- The regular expression `\/\/[^\n]*` recognizes single-line comments.
  A single-line comment starts with the string `//` and extends to the end of the line.

Both alternatives are combined into the single token `COMMENT`, which the lexer skips.
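The difference between the greedy and the non-greedy quantifier can be checked with plain JavaScript regular expressions, independent of Chevrotain. The following is only a small sketch; the variable names are made up for this illustration.

```typescript
// One line of code with a comment before and after an assignment.
const line: string = "/* blah */ a := 1; /* blub */";

// Greedy: `*` extends the match up to the LAST `*/` on the line.
const greedy: RegExp = /\/\*[^]*\*\//;
// Non-greedy: `*?` stops at the FIRST `*/`.
const nonGreedy: RegExp = /\/\*[^]*?\*\//;

const greedyMatch: string = line.match(greedy)?.[0] ?? "";
const nonGreedyMatch: string = line.match(nonGreedy)?.[0] ?? "";

console.log(greedyMatch);    // the whole line, including the assignment
console.log(nonGreedyMatch); // only "/* blah */"
```

With the greedy pattern, the assignment `a := 1;` would be swallowed by the comment.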

In [None]:
const Comment: TokenType = createToken({
  name: "COMMENT",
  pattern: /\/\*[^]*?\*\/|\/\/[^\n]*/,
  group: Lexer.SKIPPED,
});

The token `NUMBER` specifies a natural number.

In [None]:
const NumberTok: TokenType = createToken({
  name: "NUMBER",
  pattern: /0|[1-9][0-9]*/,
});

In [None]:
const Assign: TokenType = createToken({ name: "ASSIGN", pattern: /:=/ });
const Eq: TokenType = createToken({ name: "EQ", pattern: /==/ });
const Ne: TokenType = createToken({ name: "NE", pattern: /!=/ });
const Le: TokenType = createToken({ name: "LE", pattern: /<=/ });
const Ge: TokenType = createToken({ name: "GE", pattern: />=/ });

The keywords `if` and `while` have to be dealt with separately, as they are syntactically identical to identifiers. Each keyword gets its own token type. These token types have to precede `ID` in the token list, since the lexer tries the patterns in the order in which they are listed.

In [None]:
// \b prevents a keyword from matching a prefix of an identifier such as `iffy`.
const IfTok: TokenType = createToken({ name: "IF", pattern: /if\b/ });
const WhileTok: TokenType = createToken({ name: "WHILE", pattern: /while\b/ });

An identifier starts with a letter, which may be followed by letters, digits, and underscores.  Since the keyword token types precede `ID` in the token list, the lexer recognizes the words `if` and `while` as keywords.  Every other word gets the token type `ID`.
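Why the order and the exact shape of the keyword patterns matter can be seen with a small first-match scanner written in plain TypeScript. This is only a sketch under simplifying assumptions: the type `Rule` and the function `firstToken` are hypothetical and are not part of Chevrotain.

```typescript
// Hypothetical mini-scanner: try each pattern at position 0, first match wins.
type Rule = { tokenName: string; pattern: RegExp };

function firstToken(rules: Rule[], text: string): string {
  for (const rule of rules) {
    const m: RegExpExecArray | null = rule.pattern.exec(text);
    // Only accept a match that starts at the very beginning of the text.
    if (m !== null && m.index === 0) {
      return `${rule.tokenName}(${m[0]})`;
    }
  }
  return "ERROR";
}

const idRule: Rule = { tokenName: "ID", pattern: /[a-zA-Z][a-zA-Z0-9_]*/ };
const naiveIf: Rule = { tokenName: "IF", pattern: /if/ };
const boundedIf: Rule = { tokenName: "IF", pattern: /if\b/ };

console.log(firstToken([naiveIf, idRule], "if"));     // IF(if)
console.log(firstToken([naiveIf, idRule], "iffy"));   // IF(if): a keyword is split off the identifier
console.log(firstToken([boundedIf, idRule], "iffy")); // ID(iffy)
console.log(firstToken([idRule, naiveIf], "if"));     // ID(if): the keyword is never recognized
```

Putting the keyword first and ending its pattern with a word boundary avoids both problems. Chevrotain's `longer_alt` token option is another way to resolve the conflict between keywords and identifiers.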

In [None]:
const IdTok: TokenType = createToken({
  name: "ID",
  pattern: /[a-zA-Z][a-zA-Z0-9_]*/,
});

Every operator and delimiter consisting of a single character also gets a token type of its own,
because the parser rules refer to tokens by their token type.

In [None]:
const Plus: TokenType = createToken({ name: "Plus", pattern: /\+/ });
const Minus: TokenType = createToken({ name: "Minus", pattern: /-/ });
const Mul: TokenType = createToken({ name: "Mul", pattern: /\*/ });
const Div: TokenType = createToken({ name: "Div", pattern: /\// });
const Mod: TokenType = createToken({ name: "Mod", pattern: /%/ });
const LParen: TokenType = createToken({ name: "LParen", pattern: /\(/ });
const RParen: TokenType = createToken({ name: "RParen", pattern: /\)/ });
const LBrace: TokenType = createToken({ name: "LBrace", pattern: /\{/ });
const RBrace: TokenType = createToken({ name: "RBrace", pattern: /\}/ });
const Semi: TokenType = createToken({ name: "Semi", pattern: /;/ });
const LT: TokenType = createToken({ name: "LT", pattern: /</ });
const GT: TokenType = createToken({ name: "GT", pattern: />/ });
const Comma: TokenType = createToken({ name: "Comma", pattern: /,/ });

White space, i.e. *space characters*, *tabulators*, *carriage returns*, and *newlines*, is ignored.

In [None]:
const WhiteSpace: TokenType = createToken({
  name: "WhiteSpace",
  pattern: /[ \t\r\n]+/,
  group: Lexer.SKIPPED,
});

Syntactically, newline characters are ignored. However, we still need the current line number for error messages. We do not have to count newlines ourselves: when the lexer is created with the option `positionTracking: "full"`, it stores the line and column of every token.

Given a `token`, the function `findColumn` returns the column where `token` starts.  This is possible because every token records its position as `token.startLine` and `token.startColumn`.  Furthermore, `token.startOffset` is the number of characters that precede `token`.

The lexer does not throw an exception when it encounters a character that it cannot scan.  Instead, the unrecognized characters are discarded and reported in the array `errors` of the object returned by `tokenize`.  After that, scanning proceeds as if nothing had happened.
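To illustrate what tracking line numbers involves, here is a minimal sketch, independent of Chevrotain, that computes the line and column of a character offset by counting newlines. The helper `lineAndColumn` and the sample text are hypothetical, and the sketch assumes 1-based numbering.

```typescript
// Hypothetical helper: compute the 1-based line and column of a character offset.
function lineAndColumn(
  text: string,
  offset: number
): { line: number; column: number } {
  let line = 1;
  let lineStart = 0; // offset of the first character of the current line
  for (let i = 0; i < offset && i < text.length; i++) {
    if (text[i] === "\n") {
      line += 1;
      lineStart = i + 1;
    }
  }
  return { line, column: offset - lineStart + 1 };
}

// Made-up two-line program text.
const source: string = "n := 10;\ns := 0;\n";
const pos = lineAndColumn(source, source.indexOf("s :="));
console.log(pos); // { line: 2, column: 1 }
```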

In [None]:
const allTokens: TokenType[] = [
  WhiteSpace,
  Comment,
  Assign,
  Eq,
  Ne,
  Le,
  Ge,
  Plus,
  Minus,
  Mul,
  Div,
  Mod,
  LParen,
  RParen,
  LBrace,
  RBrace,
  Semi,
  LT,
  GT,
  Comma,
  IfTok,
  WhileTok,
  IdTok,
  NumberTok,
];

In [None]:
const SLexer: Lexer = new Lexer(allTokens, { positionTracking: "full" });

In [None]:
function findColumn(token: IToken): number {
  return token.startColumn ?? 0;
}

In [None]:
function testScanner(fileName: string): void {
  const program: string = fs.readFileSync(fileName, "utf-8");
  console.log(program);
  const result = SLexer.tokenize(program);

  if (result.errors.length > 0) {
    for (const err of result.errors) {
      console.error(
        `Illegal character at offset ${err.offset}: ${err.message}`
      );
    }
  }

  for (const t of result.tokens) {
    const line: number = t.startLine ?? 0;
    const col: number = t.startColumn ?? 0;
    const val: string | number =
      t.tokenType.name === "NUMBER" ? parseInt(t.image, 10) : `'${t.image}'`;
    console.log(`LexToken(${t.tokenType.name},${val},${line},${col})`);
  }
}

In [None]:
testScanner('sum.sl')

In [None]:
testScanner('factorial.sl')

## Parser specification
We translate the grammar shown below into Chevrotain rules.  
The start rule is `program` and we first build a concrete syntax tree (CST).

Below is the grammar for our language:
```
program
    : /* epsilon */
    | stmnt program
    
stmnt 
    : IF '(' bool_expr ')' stmnt                 
    | WHILE '(' bool_expr ')' stmnt
    | '{' stmnt_list '}' 
    | ID ':=' expr ';'  
    | expr ';'       

bool_expr 
    : expr '==' expr     
    | expr '!=' expr     
    | expr '<=' expr     
    | expr '>=' expr     
    | expr '<'  expr      
    | expr '>'  expr     
 
expr: expr '+' product                 
    | expr '-' product
    | product
              
product
    : product '*' factor               
    | product '/' factor
    | product '%' factor 
    | factor

factor
    : '(' expr ')' 
    | NUMBER
    | ID
    | ID '(' expr_list ')'

expr_list
    :
    | ne_expr_list

ne_expr_list
    : expr
    | expr ',' ne_expr_list
```

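The *start variable* of our grammar is `program`.  The variable `stmnt_list`, which occurs in the rule for blocks, is not defined separately: like `program`, it stands for a possibly empty sequence of statements.  The rules for `expr` and `product` are left-recursive.  As a top-down parser cannot handle left recursion, the Chevrotain rules express these rules via repetition instead.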
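The left-recursive rule `expr : expr '+' product` can be rewritten as the repetition `product (('+' | '-') product)*`, which is the form used in the comments of the parser class. The following sketch shows, in plain TypeScript and without Chevrotain, how such a repetition is parsed with a loop while the result is still grouped to the left. The type `Tree` and the function `parseExpr` are hypothetical, and the sketch handles only numbers and the operators `+`, `-`, and `*`.

```typescript
// Hypothetical mini-parser over pre-split tokens; numbers, '+', '-', '*' only.
type Tree = number | [string, Tree, Tree];

function parseExpr(tokens: string[]): Tree {
  let pos = 0;

  // product : factor ('*' factor)*
  function product(): Tree {
    let node: Tree = Number(tokens[pos++]);
    while (tokens[pos] === "*") {
      pos++;
      node = ["*", node, Number(tokens[pos++])];
    }
    return node;
  }

  // expr : product (('+' | '-') product)*
  let node: Tree = product();
  while (tokens[pos] === "+" || tokens[pos] === "-") {
    const op: string = tokens[pos++];
    node = [op, node, product()]; // the tree built so far becomes the left operand
  }
  return node;
}

console.log(JSON.stringify(parseExpr(["8", "-", "3", "+", "2", "*", "5"])));
// ["+",["-",8,3],["*",2,5]]
```

The loop replaces the recursion, and making the tree built so far the left operand keeps `-` and `+` left-associative.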

In [None]:
class SLParser extends CstParser {
  public program!: (idx?: number) => CstNode;
  public stmnt_list!: (idx?: number) => CstNode;
  public stmnt!: (idx?: number) => CstNode;
  public bool_expr!: (idx?: number) => CstNode;
  public expr!: (idx?: number) => CstNode;
  public product!: (idx?: number) => CstNode;
  public factor!: (idx?: number) => CstNode;
  public expr_list!: (idx?: number) => CstNode;

  constructor() {
    super(allTokens, { maxLookahead: 2 });
    const $ = this;

    // program : stmnt_list
    $.RULE("program", () => {
      $.SUBRULE($.stmnt_list);
    });

    // stmnt_list : (stmnt)*
    $.RULE("stmnt_list", () => {
      $.MANY(() => {
        $.SUBRULE($.stmnt);
      });
    });

    // stmnt :
    //   IF '(' bool_expr ')' stmnt
    // | WHILE '(' bool_expr ')' stmnt
    // | '{' stmnt_list '}'
    // | ID ASSIGN expr ';'
    // | expr ';'
    $.RULE("stmnt", () => {
      $.OR([
        {
          ALT: () => {
            $.CONSUME(IfTok);
            $.CONSUME1(LParen);
            $.SUBRULE($.bool_expr);
            $.CONSUME1(RParen);
            $.SUBRULE1($.stmnt);
          },
        },
        {
          ALT: () => {
            $.CONSUME(WhileTok);
            $.CONSUME2(LParen);
            $.SUBRULE2($.bool_expr);
            $.CONSUME2(RParen);
            $.SUBRULE2($.stmnt);
          },
        },
        {
          ALT: () => {
            $.CONSUME(LBrace);
            $.SUBRULE($.stmnt_list);
            $.CONSUME(RBrace);
          },
        },
        {
          ALT: () => {
            $.CONSUME(IdTok);
            $.CONSUME(Assign);
            $.SUBRULE3($.expr);
            $.CONSUME(Semi);
          },
        },
        {
          ALT: () => {
            $.SUBRULE4($.expr);
            $.CONSUME2(Semi);
          },
        },
      ]);
    });

    // bool_expr : expr (EQ|NE|LE|GE|LT|GT) expr
    $.RULE("bool_expr", () => {
      $.SUBRULE($.expr);
      $.OR([
        { ALT: () => $.CONSUME(Eq) },
        { ALT: () => $.CONSUME(Ne) },
        { ALT: () => $.CONSUME(Le) },
        { ALT: () => $.CONSUME(Ge) },
        { ALT: () => $.CONSUME(LT) },
        { ALT: () => $.CONSUME(GT) },
      ]);
      $.SUBRULE2($.expr);
    });

    // expr : product (('+' | '-') product)*
    $.RULE("expr", () => {
      $.SUBRULE($.product);
      $.MANY(() => {
        $.OR([
          {
            ALT: () => {
              $.CONSUME(Plus);
              $.SUBRULE2($.product);
            },
          },
          {
            ALT: () => {
              $.CONSUME(Minus);
              $.SUBRULE3($.product);
            },
          },
        ]);
      });
    });

    // product : factor (('*' | '/' | '%') factor)*
    $.RULE("product", () => {
      $.SUBRULE($.factor);
      $.MANY(() => {
        $.OR([
          {
            ALT: () => {
              $.CONSUME(Mul);
              $.SUBRULE2($.factor);
            },
          },
          {
            ALT: () => {
              $.CONSUME(Div);
              $.SUBRULE3($.factor);
            },
          },
          {
            ALT: () => {
              $.CONSUME(Mod);
              $.SUBRULE4($.factor);
            },
          },
        ]);
      });
    });

    // factor :
    //   '(' expr ')'
    // | ID '(' expr_list? ')'
    // | NUMBER
    // | ID
    $.RULE("factor", () => {
      $.OR([
        {
          ALT: () => {
            $.CONSUME1(LParen);
            $.SUBRULE($.expr);
            $.CONSUME1(RParen);
          },
        },
        {
          ALT: () => {
            $.CONSUME1(IdTok);
            $.CONSUME2(LParen);
            $.OPTION(() => {
              $.SUBRULE($.expr_list);
            });
            $.CONSUME2(RParen);
          },
        },
        { ALT: () => $.CONSUME(NumberTok) },
        { ALT: () => $.CONSUME2(IdTok) },
      ]);
    });

    // expr_list : (expr (',' expr)*)?
    $.RULE("expr_list", () => {
      $.OPTION(() => {
        $.SUBRULE($.expr);
        $.MANY(() => {
          $.CONSUME(Comma);
          $.SUBRULE2($.expr);
        });
      });
    });

    this.performSelfAnalysis();
  }
}

const parser: SLParser = new SLParser();

## CST → AST: nested tuples

We represent programs as *nested tuples* in the same style as the Python notebook, with `'.'` for statement lists and tags like `':='`, `'if'`, `'while'`, `'call'`.
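As an illustration of this representation, the following sketch writes down by hand the nested tuple we would expect for a small made-up program. The type `Ast` repeats the shape of the type `NestedTuple` defined in the next cell so that the sketch is self-contained; the program itself is hypothetical.

```typescript
// Same shape as the notebook's NestedTuple type, repeated to keep the sketch self-contained.
type Ast = number | string | [string, ...Ast[]];

// Hypothetical program:
//   x := 1 + 2 * 3;
//   print(x);
const example: Ast = [
  ".",                                 // statement list
  [":=", "x", ["+", 1, ["*", 2, 3]]],  // assignment; '*' binds tighter than '+'
  ["call", "print", "x"],              // function call used as a statement
];

console.log(JSON.stringify(example));
// [".",[":=","x",["+",1,["*",2,3]]],["call","print","x"]]
```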

In [None]:
const BaseCstVisitor = parser.getBaseCstVisitorConstructor();

type NestedTuple = number | string | [string, ...NestedTuple[]];

class ToASTVisitor extends BaseCstVisitor {
  constructor() {
    super();
    (this as unknown as { validateVisitor(): void }).validateVisitor();
  }

  // Helper function that gives visit a typed signature
  private v(node: CstNode): NestedTuple {
    return (this as unknown as { visit(node: CstNode): NestedTuple }).visit(
      node
    );
  }

  // program : stmnt_list
  public program(ctx: { stmnt_list: CstNode[] }): NestedTuple {
    return this.v(ctx.stmnt_list[0]);
  }

  // stmnt_list : (stmnt)*
  public stmnt_list(ctx: { stmnt?: CstNode[] }): NestedTuple {
    const stmts: NestedTuple[] = ctx.stmnt
      ? ctx.stmnt.map((s: CstNode): NestedTuple => this.v(s))
      : [];
    return [".", ...stmts];
  }

  // stmnt :
  //   IF '(' bool_expr ')' stmnt
  // | WHILE '(' bool_expr ')' stmnt
  // | '{' stmnt_list '}'
  // | ID ASSIGN expr ';'
  // | expr ';'
  public stmnt(ctx: {
    IF?: IToken[];
    WHILE?: IToken[];
    LBrace?: IToken[];
    stmnt_list?: CstNode[];
    bool_expr?: CstNode[];
    stmnt?: CstNode[];
    ID?: IToken[];
    IdTok?: IToken[];
    ASSIGN?: IToken[];
    Assign?: IToken[];
    expr?: CstNode[];
  }): NestedTuple {
    if (ctx.IF && ctx.bool_expr && ctx.stmnt) {
      return ["if", this.v(ctx.bool_expr[0]), this.v(ctx.stmnt[0])];
    }

    if (ctx.WHILE && ctx.bool_expr && ctx.stmnt) {
      return ["while", this.v(ctx.bool_expr[0]), this.v(ctx.stmnt[0])];
    }

    if (ctx.LBrace && ctx.stmnt_list) {
      return this.v(ctx.stmnt_list[0]);
    }

    if ((ctx.ASSIGN || ctx.Assign) && ctx.expr) {
      const idTok: IToken = (ctx.ID?.[0] ?? ctx.IdTok?.[0]) as IToken;
      const varName: string = idTok.image;
      return [":=", varName, this.v(ctx.expr[0])];
    }

    if (ctx.expr) {
      return this.v(ctx.expr[0]);
    }

    throw new Error("Unexpected statement");
  }

  // bool_expr : expr (EQ|NE|LE|GE|LT|GT) expr
  public bool_expr(ctx: {
    expr: CstNode[];
    EQ?: IToken[];
    NE?: IToken[];
    LE?: IToken[];
    GE?: IToken[];
    LT?: IToken[];
    GT?: IToken[];
  }): NestedTuple {
    const left: NestedTuple = this.v(ctx.expr[0]);
    const right: NestedTuple = this.v(ctx.expr[1]);

    let op: string = "?";
    if (ctx.EQ) op = "==";
    else if (ctx.NE) op = "!=";
    else if (ctx.LE) op = "<=";
    else if (ctx.GE) op = ">=";
    else if (ctx.LT) op = "<";
    else if (ctx.GT) op = ">";

    return [op, left, right];
  }

  // expr : product (('+' | '-') product)*
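  public expr(ctx: {
    product: CstNode[];
    Plus?: IToken[];
    Minus?: IToken[];
  }): NestedTuple {
    // The CST groups operator tokens by type, so we first restore their source order.
    const ops: IToken[] = [...(ctx.Plus ?? []), ...(ctx.Minus ?? [])].sort(
      (a: IToken, b: IToken): number => a.startOffset - b.startOffset
    );

    // Fold to the left: a - b + c becomes ["+", ["-", a, b], c].
    let node: NestedTuple = this.v(ctx.product[0]);
    ops.forEach((op: IToken, i: number): void => {
      node = [op.image, node, this.v(ctx.product[i + 1])];
    });
    return node;
  }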

  // product : factor (('*' | '/' | '%') factor)*
  public product(ctx: {
    factor: CstNode[];
    Mul?: IToken[];
    Div?: IToken[];
    Mod?: IToken[];
  }): NestedTuple {
    // The CST groups operator tokens by type, so we first restore their source order.
    const ops: IToken[] = [
      ...(ctx.Mul ?? []),
      ...(ctx.Div ?? []),
      ...(ctx.Mod ?? []),
    ].sort((a: IToken, b: IToken): number => a.startOffset - b.startOffset);

    // Fold to the left: a / b * c becomes ["*", ["/", a, b], c].
    let node: NestedTuple = this.v(ctx.factor[0]);
    ops.forEach((op: IToken, i: number): void => {
      node = [op.image, node, this.v(ctx.factor[i + 1])];
    });
    return node;
  }

  // factor :
  //   '(' expr ')'
  // | ID '(' expr_list? ')'
  // | NUMBER
  // | ID
  public factor(ctx: {
    LParen?: IToken[];
    RParen?: IToken[];
    expr?: CstNode[];
    IdTok?: IToken[];
    ID?: IToken[];
    NUMBER?: IToken[];
    expr_list?: CstNode[];
  }): NestedTuple {
    // ( expr )
    if (ctx.LParen && ctx.expr && ctx.RParen && !ctx.IdTok && !ctx.ID) {
      return this.v(ctx.expr[0]);
    }

    // Function call: ID '(' expr_list? ')'
    if ((ctx.IdTok || ctx.ID) && ctx.LParen && ctx.RParen && ctx.expr_list) {
      const idTok: IToken = (ctx.IdTok?.[0] ?? ctx.ID?.[0]) as IToken;
      const fname: string = idTok.image;
      const listNode: NestedTuple = this.v(ctx.expr_list[0]);
      let args: NestedTuple[] = [];

      if (Array.isArray(listNode) && listNode[0] === ".") {
        args = listNode.slice(1) as NestedTuple[];
      } else if (typeof listNode !== "number" && typeof listNode !== "string") {
        args = [listNode];
      }

      return args.length === 0
        ? ["call", fname]
        : (["call", fname, ...args] as [string, ...NestedTuple[]]);
    }

    // Function call without arguments: ID '(' ')'
    if ((ctx.IdTok || ctx.ID) && ctx.LParen && ctx.RParen && !ctx.expr_list) {
      const idTok: IToken = (ctx.IdTok?.[0] ?? ctx.ID?.[0]) as IToken;
      const fname: string = idTok.image;
      return ["call", fname];
    }

    // NUMBER
    if (ctx.NUMBER?.[0]) {
      return parseInt(ctx.NUMBER[0].image, 10);
    }

    // Variable ID
    if (ctx.IdTok?.[0] || ctx.ID?.[0]) {
      const tok: IToken = (ctx.IdTok?.[0] ?? ctx.ID?.[0]) as IToken;
      return tok.image;
    }

    throw new Error("Unknown factor");
  }

  // expr_list : (expr (',' expr)*)?
  public expr_list(ctx: { expr?: CstNode[] }): NestedTuple {
    if (!ctx.expr || ctx.expr.length === 0) {
      return ["."];
    }
    const first: NestedTuple = this.v(ctx.expr[0]);
    const rest: NestedTuple[] = ctx.expr
      .slice(1)
      .map((n: CstNode): NestedTuple => this.v(n));
    return [".", first, ...rest];
  }
}

const toAST: ToASTVisitor = new ToASTVisitor();

## Interpreter: execute and evaluate

We now define the interpreter that executes the nested-tuple AST:  
- `execute` runs statements,  
- `evaluate` evaluates arithmetic expressions (including `read()`),  
- `evaluateBool` evaluates boolean expressions.

In [None]:
type NumberValue = number;
type Env = Record<string, NumberValue>;

// Evaluate an expression (arithmetic and the call of read)
function evaluate(expr: NestedTuple, values: Env): NumberValue {
  if (typeof expr === "number") return expr;
  if (typeof expr === "string") return values[expr];

  const tag: string = expr[0];

  switch (tag) {
    case "call": {
      const fname: string = expr[1] as string;
      if (fname === "read") {
        const input: string = readlineSync.question(
          "Please enter a natural number: "
        );
        return parseInt(input, 10);
      }
      throw new Error(`Unknown function call ${fname}`);
    }
    case "+": {
      return evaluate(expr[1], values) + evaluate(expr[2], values);
    }
    case "-": {
      return evaluate(expr[1], values) - evaluate(expr[2], values);
    }
    case "*": {
      return evaluate(expr[1], values) * evaluate(expr[2], values);
    }
    case "/": {
      return evaluate(expr[1], values) / evaluate(expr[2], values);
    }
    case "%": {
      return evaluate(expr[1], values) % evaluate(expr[2], values);
    }
    default:
      throw new Error(`${JSON.stringify(expr)} unexpected`);
  }
}

// Evaluate a boolean expression
function evaluateBool(expr: NestedTuple, values: Env): boolean {
  const tag: string = (expr as [string, NestedTuple, NestedTuple])[0];
  const lhs: NumberValue = evaluate((expr as [string, NestedTuple, NestedTuple])[1], values);
  const rhs: NumberValue = evaluate((expr as [string, NestedTuple, NestedTuple])[2], values);

  switch (tag) {
    case "==":
      return lhs === rhs;
    case "!=":
      return lhs !== rhs;
    case "<=":
      return lhs <= rhs;
    case ">=":
      return lhs >= rhs;
    case "<":
      return lhs < rhs;
    case ">":
      return lhs > rhs;
    default:
      throw new Error(`${JSON.stringify(expr)} unexpected`);
  }
}

// Statement list ('.', s1, s2, ...)
function executeList(statementList: NestedTuple, values: Env): void {
  if (!Array.isArray(statementList) || statementList[0] !== ".") return;
  const items: NestedTuple[] = statementList.slice(1);
  for (const st of items) {
    execute(st, values);
  }
}

// Execute a single statement
function execute(stmnt: NestedTuple, values: Env): void {
  if (Array.isArray(stmnt) && stmnt[0] === ".") {
    executeList(stmnt, values);
  } else if (Array.isArray(stmnt) && stmnt[0] === ":=") {
    const variable: string = stmnt[1] as string;
    values[variable] = evaluate(stmnt[2], values);
  } else if (
    Array.isArray(stmnt) &&
    stmnt[0] === "call" &&
    stmnt[1] === "print"
  ) {
    console.log(evaluate(stmnt[2], values));
  } else if (Array.isArray(stmnt) && stmnt[0] === "if") {
    if (evaluateBool(stmnt[1], values)) {
      execute(stmnt[2], values);
    }
  } else if (Array.isArray(stmnt) && stmnt[0] === "while") {
    while (evaluateBool(stmnt[1], values)) {
      execute(stmnt[2], values);
    }
  } else {
    throw new Error(`${JSON.stringify(stmnt)} unexpected`);
  }
}

// ============================================================================
// 5. parseProgram + runFile (like parse/main in the Python notebook)
// ============================================================================

function parseProgram(source: string): NestedTuple {
  const lexRes = SLexer.tokenize(source);
  if (lexRes.errors.length > 0) {
    throw new Error(lexRes.errors[0].message);
  }
  parser.input = lexRes.tokens;
  const cst: CstNode = parser.program();
  if (parser.errors.length > 0) {
    throw new Error(parser.errors[0].message);
  }
  return toAST.visit(cst) as NestedTuple;
}

function runFile(fileName: string): void {
  const program: string = fs.readFileSync(fileName, "utf-8");
  const ast: NestedTuple = parseProgram(program);
  console.log(JSON.stringify(ast));
  const env: Env = {};
  execute(ast, env);
}

## Parsing and running a program

Finally, we run the interpreter on two example programs.  The helpers `parseProgram` and `runFile` defined above are analogous to the Python `parse` and `main` functions.

In [None]:
runFile("sum.sl");

In [None]:
runFile("factorial.sl");