In [None]:
import { display } from "tslab";
import { readFileSync } from "fs";

const css : string = readFileSync("../style.css", "utf8");
display.html(`<style>${css}</style>`);

# Evaluating an Exam Using Lezer

This notebook shows how we can use the module [`lezer`](https://lezer.codemirror.net/docs/guide/#writing-a-grammar) to implement a scanner (and parser).

Our goal is to implement a program that can be used to evaluate the results of an exam.

Assume the result of an exam is stored in the string `data` that is defined below:

In [None]:
const data: string = `Class: Algorithms and Complexity
          Group: TINF22AI1
          MaxPoints = 60
   
          Exercise:      1. 2. 3. 4. 5. 6.
          Jim Smith:     9 12 10  6  6  0
          John Slow:     4  4  2  0  -  -
          Susi Sorglos:  9 12 12  9  9  6
          1609922:       7  4 12  5  5  3
       `;

These data show that there was an exam on the subject <em style="color:blue">Algorithms and Complexity</em>
in the group <em style="color:blue">TINF22AI1</em>.  Furthermore, the equation
```
   MaxPoints = 60
```
shows that in order to achieve the best mark, <em style="color:blue">60</em> points would have been necessary.

There were 6 exercises in this exam and, in this small example, only four students took part, namely *Jim Smith*, *John Slow*, *Susi Sorglos*, and one student who is identified only by their matriculation number. Each row describing a student's results begins with the name (or matriculation number) of the student, followed by the number of points achieved in each exercise. A dash marks an exercise that was not attempted. Our goal is to write a program that computes the marks for all students.

## Importing the Lezer Library

We will use the package [Lezer](https://lezer.codemirror.net/).

Lezer uses a declarative grammar. We need:

- `buildParser` from `@lezer/generator` to compile the grammar.
- `Tree` and `TreeCursor` from `@lezer/common` to traverse the syntax tree.
- `LRParser` from `@lezer/lr` as the type of the compiled parser.

In [None]:
import { buildParser } from '@lezer/generator';
import { Tree, TreeCursor } from '@lezer/common';
import { LRParser } from '@lezer/lr';

## Auxiliary Functions

The function `mark(maxPoints: number, points: number): number` takes two arguments and returns a numeric grade:

**Parameters:**
- `maxPoints: number` - The number of points needed to achieve the best mark of 1.0
- `points: number` - The number of points achieved by the student

**Return value:**
- `number` - The calculated grade (between 1.0 and 5.0)

It is assumed that the relation between the mark and the points is mostly linear. A student who achieves 50% of `maxPoints` will get the mark 4.0, while 100% results in mark 1.0.

The formula to calculate the grade is:
$$ \textrm{grade} = 7 - 6 \cdot \frac{\texttt{points}}{\texttt{maxPoints}} $$

However, the worst mark is 5.0. The `Math.min()` function ensures the grade does not exceed 5.0. The result is rounded to one decimal place using `Math.round()`.

In [None]:
function mark(maxPoints: number, points: number): number {
    if (maxPoints === 0) return 0; // Prevent division by zero
    const grade = 7 - (6 * points) / maxPoints;
    return Math.round(Math.min(5.0, grade) * 10) / 10;
}
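
As a quick check of the formula, the sketch below recomputes the marks for the four students in `data`. It repeats the grading logic under the hypothetical name `markSketch` so that the snippet is self-contained, and the point totals are summed by hand from the sample data, so treat it as an illustration rather than part of the program.

```typescript
// Hypothetical stand-alone copy of the grading formula (same logic as `mark`).
function markSketch(maxPoints: number, points: number): number {
    if (maxPoints === 0) return 0; // same guard against division by zero
    const grade = 7 - (6 * points) / maxPoints;
    return Math.round(Math.min(5.0, grade) * 10) / 10;
}

// Point totals summed by hand from the sample data (MaxPoints = 60).
const totals: [string, number][] = [
    ["Jim Smith", 43],
    ["John Slow", 10],
    ["Susi Sorglos", 57],
    ["1609922", 36],
];

const sampleMarks: number[] = totals.map(([, points]) => markSketch(60, points));
console.log(sampleMarks); // [ 2.7, 5, 1.3, 3.4 ]
```

John Slow's uncapped grade would be $7 - 6 \cdot 10/60 = 6$, so his result shows the cap at 5.0 in action.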

## Token Extraction Logic

The `Token` interface formally defines a semantic unit as a tuple $t = (\text{type}, \text{value})$, mapping a syntactic category to a specific substring of the source code.

In [None]:
interface Token {
    type: string;
    value: string;
}

The `extractTokens` function linearizes the hierarchical syntax tree by walking its nodes with a cursor in pre-order and collecting a flat sequence of these tokens. During traversal, structural non-terminals (such as `ExamData` or `Header`) are filtered out based on an exclusion set $\mathcal{F}$, so that mostly leaf-level elements remain; `MaxDef` is kept because its text carries the maximum score. The function also normalizes the parser's error nodes, mapping the node name `⚠` to the type `Error` via a function $\eta$. Finally, the accumulated tokens are returned as a list $L = [t_1, \dots, t_n]$ for further analysis.

In [None]:
function extractTokens(tree: Tree, source: string): Token[] {
    const cursor: TreeCursor = tree.cursor();
    const tokens: Token[] = [];

    do {
        if (
            [
                "ExamData",
                "line",
                "StudentRecord",
                "EmptyLine",
                "Header",
            ].includes(cursor.name)
        ) {
            continue;
        }

        const token: Token = {
            type: cursor.name === "⚠" ? "Error" : cursor.name,
            value: source.substring(cursor.from, cursor.to),
        };

        tokens.push(token);
    } while (cursor.next());

    return tokens;
}
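
The linearization can be illustrated without Lezer. The sketch below uses a hand-built tree with a hypothetical `ToyNode` type in place of Lezer's `Tree` and `TreeCursor`, and a recursive pre-order walk in place of the cursor loop. The filtering and the `⚠` to `Error` mapping follow the same idea as `extractTokens`.

```typescript
// Hypothetical minimal tree node; Lezer's real Tree/TreeCursor API is richer.
interface ToyNode {
    name: string;
    from: number;
    to: number;
    children: ToyNode[];
}

interface ToyToken {
    type: string;
    value: string;
}

// Structural node names that are dropped from the token list.
const STRUCTURAL = new Set(["ExamData", "StudentRecord", "EmptyLine", "Header"]);

// Pre-order walk: visit a node first, then its children from left to right.
function flatten(node: ToyNode, source: string, out: ToyToken[] = []): ToyToken[] {
    if (!STRUCTURAL.has(node.name)) {
        out.push({
            type: node.name === "⚠" ? "Error" : node.name,
            value: source.substring(node.from, node.to),
        });
    }
    for (const child of node.children) flatten(child, source, out);
    return out;
}

const toySource = "Jim Smith: 9 12\n";
const leaf = (name: string, from: number, to: number): ToyNode =>
    ({ name, from, to, children: [] });

// Hand-built tree for the single line above; offsets index into toySource.
const toyTree: ToyNode = {
    name: "ExamData", from: 0, to: 16, children: [{
        name: "StudentRecord", from: 0, to: 16, children: [
            leaf("Name", 0, 10),
            leaf("Number", 11, 12),
            leaf("Number", 13, 15),
            leaf("Linebreak", 15, 16),
        ],
    }],
};

const toyTokens = flatten(toyTree, toySource);
console.log(toyTokens.map(t => t.type).join(" ")); // Name Number Number Linebreak
```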

## Visualizing the Grading Function

To better understand how our `mark()` function converts points to grades, let's visualize it:

In [None]:
import { plotGradeFunction } from "./utils/plotGrade";

plotGradeFunction(mark, 60);

The resulting plot shows that the grade stays at 5.0 (worst) for up to 20 points, one third of the maximum, and then decreases linearly to 1.0 (best) at 60 points, passing through 4.0 at exactly 50% of the maximum points (30 points).
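
The point at which the cap of 5.0 takes effect follows directly from the grading formula. Setting the uncapped grade equal to 5.0 gives:

$$ 7 - 6 \cdot \frac{\texttt{points}}{\texttt{maxPoints}} = 5 \quad\Longleftrightarrow\quad \texttt{points} = \frac{\texttt{maxPoints}}{3} $$

For `maxPoints = 60` this is 20 points: every total of 20 points or fewer is graded 5.0, and above that each additional point lowers the mark by $6/60 = 0.1$.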

## Defining the Grammar

We will define the grammar in segments, explaining the purpose of each rule before adding it.

### 1. Entry Point and Structure

First, we define the structure of our document. The `@top` rule declares that our file (`ExamData`) consists of a sequence of lines (`line*`).

A `line` can be one of several types:

* A `Header` (informational text)
* A `MaxDef` (configuration of max points)
* A `StudentRecord` (the actual grading data)
* An `EmptyLine`

We map these structural rules to the specific tokens we will define later.


In [None]:
const entryPoint: string = `
  @top ExamData { line* }

  line {
    Header |
    MaxDef |
    StudentRecord |
    EmptyLine
  }

  // Structure Mapping
  Header { header }
  MaxDef { maxdef }
  StudentRecord { (Name | Matriculation) Number* Linebreak }
  EmptyLine { Linebreak }
`;

### 2. Token Block Start
We begin the `@tokens` block, where we define the lexical patterns (Regular Expressions) for our data.

In [None]:
const tokenStart: string = `
  @tokens {
`;

### 3. Informational Headers

The `header` token matches lines like `Class: ...` or `Group: ...`.
The pattern `$[A-Za-z]+ ":" ![\n]* "\n"` matches:

1. One or more letters.
2. A colon.
3. Any content that is *not* a newline.
4. The newline character itself.

In [None]:
const headerTokens: string = `
    header { $[A-Za-z]+ ":" ![\\n]* "\\n" }
`;

### 4. Configuration (MaxPoints)
The `maxdef` token extracts the maximum points definition.
The pattern matches the literal `MaxPoints`, optional spaces or tabs, an equals sign, optional spaces or tabs again, and a positive integer without leading zeros (a non-zero digit followed by any number of digits).

In [None]:
const configTokens: string = `
    maxdef { "MaxPoints" $[ \\t]* "=" $[ \\t]* $[1-9] $[0-9]* }
`;

### 5. Student Identifiers

We need to identify students either by name or matriculation number.

* `Name`: Matches two or more words consisting of letters, separated by single spaces and ending with a colon (e.g., "Jim Smith:").
* `Matriculation`: Matches exactly 7 digits followed by a colon (e.g., "1609922:").

In [None]:
const identityTokens: string = `
    Name { $[A-Za-z]+ (" " $[A-Za-z]+)+ ":" }
    Matriculation { $[0-9] $[0-9] $[0-9] $[0-9] $[0-9] $[0-9] $[0-9] ":" }
`;
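
To see which strings these patterns accept, the token definitions can be approximated with ordinary regular expressions. The sketch below is only an analogy: the names `nameRe`, `matriculationRe`, and `headerRe` are hypothetical, and JavaScript's `RegExp` stands in for Lezer's token syntax. The header pattern is the one from section 3.

```typescript
// Hypothetical regex equivalents of the Lezer token patterns,
// anchored so that the whole string has to match.
const nameRe = /^[A-Za-z]+( [A-Za-z]+)+:$/; // two or more words, single spaces
const matriculationRe = /^[0-9]{7}:$/;      // exactly seven digits
const headerRe = /^[A-Za-z]+:[^\n]*\n$/;    // one word, colon, rest of the line

const checks = {
    fullName: nameRe.test("Jim Smith:"),              // true
    singleWord: nameRe.test("Jim:"),                  // false: only one word
    sevenDigits: matriculationRe.test("1609922:"),    // true
    sixDigits: matriculationRe.test("160992:"),       // false
    header: headerRe.test("Group: TINF22AI1\n"),      // true
    nameAsHeader: headerRe.test("Jim Smith: 9\n"),    // false: space before the colon
    singleWordAsHeader: headerRe.test("Jim: 9 12\n"), // true
};
console.log(checks);
```

The last check points at a limitation, at least under these regex approximations: a student listed under a single word, such as `Jim:`, does not match `Name` but does match the `header` pattern, so such a line would be treated as informational text.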


### 6. Scores and Values

For the points, we define:

* `Number`: Either "0" or a number starting with 1-9 (preventing leading zeros like "01").
* `Dash`: A single `-`, representing a skipped exercise.
* `Linebreak`: Specifically captures `\n` to signal the end of a student record. Its definition appears in the code cell of the next section.


In [None]:
const valueTokens: string = `
    Number { "0" | $[1-9] $[0-9]* }
    Dash { "-" }
`;
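
The effect of the `Number` pattern on leading zeros can be checked with an equivalent regular expression. This is a sketch with a hypothetical name `numberRe`; it tests whole strings, whereas the Lezer tokenizer works on a character stream.

```typescript
// Hypothetical regex equivalent of the Number token, anchored to the whole string.
const numberRe = /^(0|[1-9][0-9]*)$/;

const numberChecks = ["0", "12", "60", "01", "-"].map(s => numberRe.test(s));
console.log(numberChecks); // [ true, true, true, false, false ]
```

The string `01` is not a single `Number`. A tokenizer that takes the longest match at each position would presumably read it as `0` followed by `1`, which is an assumption about the tokenizer's behavior rather than something shown in this notebook.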

### 7. Whitespace and Skipping

Finally, we define the `Linebreak` token announced above and whitespace (`space`) as spaces, tabs, or carriage returns.
We then close the `@tokens` block and define a `@skip` block. This tells the parser to automatically ignore `space` and `Dash` tokens, so we only process meaningful data.

In [None]:
const skipAndClose: string = `
    Linebreak { "\\n" }
    space { $[ \\t\\r]+ }
  }

  @skip { space | Dash }
`;

### Building the Final Grammar

We concatenate all the parts to form the complete grammar string and build the parser.

In [None]:
const finalGrammar: string =
    entryPoint +
    tokenStart +
    headerTokens +
    configTokens +
    identityTokens +
    valueTokens +
    skipAndClose;


In [None]:
finalGrammar

In [None]:
const parser: LRParser = buildParser(finalGrammar);

## Processing the Exam Data

Now we implement the logic to process the token stream. We iterate over the tokens extracted from the tree and update our state machine accordingly.

* **`MaxDef`**: Updates the maximum possible points.
* **`Name` / `Matriculation`**: Resets the point counter and sets the current student name.
* **`Number`**: Adds to the current student's point total.
* **`Linebreak`**: Triggers the calculation and output of the grade.

### Step 1: Numeric Value Extraction

The `extractMaxPoints` function accepts the text $S$ of a `MaxDef` token (e.g., `"MaxPoints = 60"`) and isolates the numeric value embedded in it. It uses the regular expression `/[1-9][0-9]*/` to find the first sequence of digits in $S$ that represents a positive integer, ignoring the surrounding text. If a match is found, the substring is parsed into a decimal integer $N$; otherwise the default value $0$ is returned, so the function always yields a number. This turns the configuration token into a numeric limit that can be used in computations.

In [None]:
function extractMaxPoints(tokenImage: string): number {
    const match = tokenImage.match(/[1-9][0-9]*/);
    return match ? parseInt(match[0], 10) : 0; // explicit radix 10
}
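
A few sample calls show the behavior, including the fallback. The helper is repeated under the hypothetical name `extractMaxPointsSketch` so that the snippet runs on its own; the second and third inputs are made-up test strings, not part of the exam data.

```typescript
// Hypothetical stand-alone copy of the helper.
function extractMaxPointsSketch(tokenImage: string): number {
    const match = tokenImage.match(/[1-9][0-9]*/);
    return match ? parseInt(match[0], 10) : 0;
}

const extracted: number[] = [
    extractMaxPointsSketch("MaxPoints = 60"), // token text from the sample data
    extractMaxPointsSketch("MaxPoints=100"),  // no whitespace around "="
    extractMaxPointsSketch("MaxPoints"),      // no digits: falls back to 0
];
console.log(extracted); // [ 60, 100, 0 ]
```

A result of 0 means that `mark` takes its division-by-zero guard, so a missing or malformed definition shows up as a mark of 0 rather than as an exception.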

### Step 2: Starting a New Student Record

The `startNewStudent` function pre-processes identifier tokens. Its input string $S$ is the text of a `Name` or `Matriculation` token (e.g., `"Jim Smith:"`), which includes the trailing colon that the grammar uses to recognize the start of a record. This character must be removed to obtain the actual identifier. The function returns the substring $S' = S[0, \dots, |S|-2]$, i.e., $S$ without its last character, which is the clean student identifier used to initialize a new record.

In [None]:
function startNewStudent(tokenImage: string): string {
    return tokenImage.slice(0, -1);
}
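
Applied to the two kinds of identifier tokens in the sample data, the slicing behaves as follows. The name `stripColon` is hypothetical and simply mirrors `startNewStudent`.

```typescript
// Hypothetical stand-alone equivalent of startNewStudent.
const stripColon = (tokenImage: string): string => tokenImage.slice(0, -1);

const ids: string[] = ["Jim Smith:", "1609922:"].map(s => stripColon(s));
console.log(ids); // [ 'Jim Smith', '1609922' ]
```

The function removes the last character unconditionally. This is safe here because both token patterns end with a colon.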

### Step 3: Outputting a Student's Grade

The `outputGrade` function completes the processing of a student record. It accepts the identifier $Id$, the accumulated total $P_{total}$, and the reference maximum $P_{max}$, and delegates the evaluation to the auxiliary `mark` function, which computes the grade $G = \texttt{mark}(P_{max}, P_{total})$. The function then interpolates these values into a format string and writes the human-readable result to the standard output stream.

In [None]:
function outputGrade(
    name: string,
    totalPoints: number,
    maxPoints: number,
): void {
    const grade = mark(maxPoints, totalPoints);
    console.log(
        `${name} has ${totalPoints} points and achieved the mark ${grade}.`,
    );
}

### Step 4: Processing State

The `ProcessingState` interface defines the mutable context required to maintain data continuity while traversing the linear token stream. It encapsulates the global constraint $P_{max}$ (`maxPoints`) alongside transient variables specific to the active record. The `currentName` acts as a temporary identifier for the student currently under analysis, while `sumPoints` serves as a running accumulator $\Sigma p_i$ to aggregate individual sub-scores. This data structure allows the parser to persist state across disjoint tokens, ensuring that distributed data points are correctly unified into a coherent semantic object.

In [None]:
interface ProcessingState {
    maxPoints: number;
    currentName: string;
    sumPoints: number;
}

### Step 5: The Main Processing Loop

The `processExamData` function is the central interpreter of the system: it orchestrates the transition from raw text to semantic output. The pipeline begins with a **Syntactic Analysis Phase**, in which the input string $S$ is parsed into a syntax tree. The call is wrapped in a `try`/`catch` block so that an exception raised during parsing is reported and processing stops. Note that a Lezer parser recovers from syntax errors by default instead of throwing: invalid input ($S \notin \mathcal{L}_{valid}$) yields error nodes (`⚠`), which `extractTokens` maps to `Error` tokens and which the loop below ignores. The tree is then linearized into a token stream $T = [t_1, t_2, \dots, t_n]$ to facilitate sequential processing.

To handle the dependencies between disjoint tokens, the function initializes a mutable state vector $\sigma$:

$$
\sigma = (P_{max}, \text{ID}_{curr}, \Sigma_{pts})
$$

Here, $P_{max}$ represents the global maximum score, $\text{ID}_{curr}$ tracks the active student identifier, and $\Sigma_{pts}$ serves as the running accumulator for the current record. The function then iterates through $T$, behaving as a **Finite State Machine** where specific token types trigger distinct transitions:

1.  **Global Configuration:** A `MaxDef` token updates the global constraint $P_{max}$.
2.  **Context Initialization:** `Name` or `Matriculation` tokens signal the start of a new entity, resetting the accumulator $\Sigma_{pts} \to 0$.
3.  **Aggregation:** `Number` tokens trigger an additive update to the state: $\Sigma_{pts} \leftarrow \Sigma_{pts} + \text{value}$.
4.  **Termination & Output:** A `Linebreak` token acts as a record delimiter. If a valid context exists ($\text{ID}_{curr} \neq \emptyset$), the system triggers the `outputGrade` function to emit the calculated result and subsequently clears the buffer.

Finally, to prevent data loss, a post-execution check runs after the loop to capture any "orphaned" record that may exist at the very end of the stream without a trailing newline delimiter.

In [None]:
function processExamData(input: string): void {
    let tree: Tree;
    try {
        tree = parser.parse(input);
    } catch (e) {
        console.error("Parsing failed", e);
        return;
    }
    const tokens: Token[] = extractTokens(tree, input);
    const state: ProcessingState = {
        maxPoints: 0,
        currentName: "",
        sumPoints: 0,
    };
    for (const token of tokens) {
        switch (token.type) {
            case "maxdef":
            case "MaxDef":
                state.maxPoints = extractMaxPoints(token.value);
                break;
            case "Name":
            case "Matriculation":
                // Start a fresh record: set the name and reset the accumulator.
                state.currentName = startNewStudent(token.value);
                state.sumPoints = 0;
                break;
            case "Number":
                state.sumPoints += parseInt(token.value, 10);
                break;
            case "Linebreak":
                if (state.currentName !== "") {
                    outputGrade(
                        state.currentName,
                        state.sumPoints,
                        state.maxPoints,
                    );
                    state.currentName = "";
                }
                break;
        }
    }
    if (state.currentName !== "")
        outputGrade(state.currentName, state.sumPoints, state.maxPoints);
}

Now let's run our scanner on the exam data and see the results:

In [None]:
processExamData(data);

### How It Works: Example Trace

Let's trace through what happens for one student when the loop processes the tokens:


| Matched Token Type | Token Image | Action / Helper Function | State Update | Output |
| :-- | :-- | :-- | :-- | :-- |
| `Name` | `"Jim Smith:"` | `startNewStudent()` | `currentName = "Jim Smith"`, `sumPoints = 0` | |
| `Number` | `"9"` | `state.sumPoints += ...` | `sumPoints = 9` | |
| `Number` | `"12"` | `state.sumPoints += ...` | `sumPoints = 21` | |
| `Number` | `"10"` | `state.sumPoints += ...` | `sumPoints = 31` | |
| `Number` | `"6"` | `state.sumPoints += ...` | `sumPoints = 37` | |
| `Number` | `"6"` | `state.sumPoints += ...` | `sumPoints = 43` | |
| `Number` | `"0"` | `state.sumPoints += ...` | `sumPoints = 43` | |
| `Linebreak` | `"\n"` | `outputGrade()`, `state.currentName = ''` | `currentName = ""` | `"Jim Smith has 43 points..."` |
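
The trace above can be reproduced without a parser by feeding a hand-written token list into a pure variant of the loop. The sketch below is an illustration under assumptions: `SketchToken`, `gradeLines`, and `jimTokens` are hypothetical names, the function returns the output lines instead of printing them, and the check for a record without a trailing line break is omitted because the token list ends with a `Linebreak`.

```typescript
interface SketchToken {
    type: string;
    value: string;
}

// Hypothetical pure variant of the processing loop.
function gradeLines(tokens: SketchToken[]): string[] {
    const lines: string[] = [];
    let maxPoints = 0;
    let currentName = "";
    let sumPoints = 0;
    for (const token of tokens) {
        switch (token.type) {
            case "MaxDef": {
                const m = token.value.match(/[1-9][0-9]*/);
                maxPoints = m ? parseInt(m[0], 10) : 0;
                break;
            }
            case "Name":
            case "Matriculation":
                currentName = token.value.slice(0, -1); // drop the trailing colon
                sumPoints = 0;
                break;
            case "Number":
                sumPoints += parseInt(token.value, 10);
                break;
            case "Linebreak":
                if (currentName !== "") {
                    // Same formula as `mark`, inlined to keep the sketch self-contained.
                    const grade = maxPoints === 0 ? 0
                        : Math.round(Math.min(5.0, 7 - (6 * sumPoints) / maxPoints) * 10) / 10;
                    lines.push(`${currentName} has ${sumPoints} points and achieved the mark ${grade}.`);
                    currentName = "";
                }
                break;
        }
    }
    return lines;
}

// Hand-written tokens for the MaxPoints line and Jim Smith's record.
const jimTokens: SketchToken[] = [
    { type: "MaxDef", value: "MaxPoints = 60" },
    { type: "Linebreak", value: "\n" }, // ignored: no student is active yet
    { type: "Name", value: "Jim Smith:" },
    ...["9", "12", "10", "6", "6", "0"].map(v => ({ type: "Number", value: v })),
    { type: "Linebreak", value: "\n" },
];

console.log(gradeLines(jimTokens));
// [ 'Jim Smith has 43 points and achieved the mark 2.7.' ]
```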