In [1]:
import { display } from "tslab";
import { readFileSync } from "fs";

const css : string = readFileSync("../style.css", "utf8");
display.html(`<style>${css}</style>`);

# Evaluating an Exam Using Lezer

This notebook shows how we can use the module [`lezer`](https://lezer.codemirror.net/docs/guide/#writing-a-grammar) to implement a scanner (and parser).

Our goal is to implement a program that can be used to evaluate the results of an exam.

Assume the result of an exam is stored in the string `data` that is defined below:

In [2]:
const data : string = `Class: Algorithms and Complexity
          Group: TINF22AI1
          MaxPoints = 60
   
          Exercise:      1. 2. 3. 4. 5. 6.
          Jim Smith:     9 12 10  6  6  0
          John Slow:     4  4  2  0  -  -
          Susi Sorglos:  9 12 12  9  9  6
          1609922:       7  4 12  5  5  3
       `;

This data shows that there has been an exam on the subject <em style="color:blue">Algorithms and Complexity</em>
in the group <em style="color:blue">TINF22AI1</em>.  Furthermore, the equation
```
   MaxPoints = 60
```
shows that in order to achieve the best mark, <em style="color:blue">60</em> points would have been necessary.

There have been 6 different exercises in this exam and, in this small example, only four students took part, namely *Jim Smith*, *John Slow*, *Susi Sorglos*, and one student who is represented only by their matriculation number. Each of the rows describing the results of the students begins with the name (or matriculation number) of the student, followed by the number of points that they have achieved in the different exercises. Our goal is to write a program that is able to compute the marks for all students.

## Importing the Lezer Library

We will use the package [Lezer](https://lezer.codemirror.net/).

Lezer uses a declarative grammar. We need:

- `buildParser` from `@lezer/generator` to compile the grammar.
- `Tree` and `TreeCursor` from `@lezer/common` to traverse the syntax tree.
- `LRParser` from `@lezer/lr` as the type of the parser returned by `buildParser`.

In [3]:
import { buildParser } from '@lezer/generator';
import { Tree, TreeCursor } from '@lezer/common';
import { LRParser } from '@lezer/lr';

## Auxiliary Functions

The function `mark(maxPoints: number, points: number): number` takes two arguments and returns a numeric grade:

**Parameters:**
- `maxPoints: number` - The number of points needed to achieve the best mark of 1.0
- `points: number` - The number of points achieved by the student

**Return value:**
- `number` - The calculated grade (between 1.0 and 5.0 when `0 <= points <= maxPoints`; `0` if `maxPoints` is `0`)

It is assumed that the relation between the mark and the points is mostly linear. A student who achieves 50% of `maxPoints` will get the mark 4.0, while 100% results in mark 1.0.

The formula to calculate the grade is:
$$ \textrm{grade} = 7 - 6 \cdot \frac{\texttt{points}}{\texttt{maxPoints}} $$

However, the worst mark is 5.0. The `Math.min()` function ensures the grade does not exceed 5.0. The result is rounded to one decimal place using `Math.round()`.

In [4]:
function mark(maxPoints: number, points: number): number {
    if (maxPoints === 0) return 0; // Prevent division by zero
    const grade = 7 - 6 * points / maxPoints;
    return Math.round(Math.min(5.0, grade) * 10) / 10;
}
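As a quick sanity check of the formula, the following sketch evaluates it at a few point totals for `maxPoints = 60`. The name `markSketch` is hypothetical and only chosen so that the copy does not clash with `mark` above. Note that the clamp to 5.0 applies to every total up to one third of `maxPoints`, since $7 - 6 \cdot \frac{20}{60} = 5$.

```typescript
// Hypothetical stand-alone copy of the grading formula, for illustration only.
function markSketch(maxPoints: number, points: number): number {
    if (maxPoints === 0) return 0; // same guard as in mark()
    const grade = 7 - 6 * points / maxPoints;
    return Math.round(Math.min(5.0, grade) * 10) / 10;
}

// 0 and 20 points are both clamped to 5.0; 30 points is exactly 50%.
const samplePoints: number[] = [0, 20, 30, 43, 60];
const sampleMarks: number[] = samplePoints.map(p => markSketch(60, p));
console.log(sampleMarks); // [ 5, 5, 4, 2.7, 1 ]
```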

Since Lezer produces a hierarchical tree, we use a helper to flatten it into a linear stream of tokens for easier processing.

In [5]:
interface Token {
  type: string;
  value: string;
}

function extractTokens(tree: Tree, source: string): Token[] {
  const cursor: TreeCursor = tree.cursor();
  const tokens: Token[] = [];

  do {
    // We ignore structural nodes that don't contain data, 
    // but keep 'MaxDef' available so we don't miss the score definition.
    if (['ExamData', 'line', 'StudentRecord', 'EmptyLine', 'Header'].includes(cursor.name)) {
      continue;
    }

    const token: Token = {
      type: cursor.name === '⚠' ? 'Error' : cursor.name,
      value: source.substring(cursor.from, cursor.to)
    };

    tokens.push(token);
  } while (cursor.next());

  return tokens;
}
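The flattening idea can be tried out without Lezer. The sketch below assumes a hypothetical miniature tree type `MiniNode` (Lezer's real `Tree` and `TreeCursor` are richer) and walks it in pre-order, which is the order in which `cursor.next()` visits nodes. Structural nodes listed in `skip` are left out of the result, but their children are still visited.

```typescript
// Hypothetical miniature tree type; not part of the Lezer API.
interface MiniNode {
  name: string;
  from: number;
  to: number;
  children: MiniNode[];
}

interface FlatToken {
  type: string;
  value: string;
}

// Pre-order walk: visit a node first, then its children from left to right.
function flatten(node: MiniNode, source: string, skip: string[], out: FlatToken[] = []): FlatToken[] {
  if (skip.indexOf(node.name) === -1) {
    out.push({ type: node.name, value: source.substring(node.from, node.to) });
  }
  for (const child of node.children) {
    flatten(child, source, skip, out);
  }
  return out;
}

const miniSource = "Jim Smith: 9 12\n";
const miniTree: MiniNode = {
  name: "StudentRecord", from: 0, to: 16, children: [
    { name: "Name", from: 0, to: 10, children: [] },
    { name: "Number", from: 11, to: 12, children: [] },
    { name: "Number", from: 13, to: 15, children: [] },
    { name: "Linebreak", from: 15, to: 16, children: [] },
  ],
};

const flat: FlatToken[] = flatten(miniTree, miniSource, ["StudentRecord"]);
console.log(flat.map(t => t.type).join(" ")); // Name Number Number Linebreak
```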

## Visualizing the Grading Function

To better understand how our `mark()` function converts points to grades, let's visualize it:

In [6]:
import { plotGradeFunction } from "./utils/plotGrade";

plotGradeFunction(mark, 60);

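The resulting plot shows that the grade stays at 5.0 (worst) from 0 up to 20 points (one third of the maximum), because the formula would yield values above 5.0 there and is clamped. From 20 points on, the grade decreases linearly to 1.0 (best) at 60 points, passing through 4.0 at exactly 50% of the maximum points (30 points).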

## Defining the Grammar

We will define the grammar in segments, explaining the purpose of each rule before adding it.

### 1. Entry Point and Structure

First, we define the structure of our document. The `@top` rule declares that our file (`ExamData`) consists of a sequence of lines (`line*`).

A `line` can be one of several types:

* A `Header` (informational text)
* A `MaxDef` (configuration of max points)
* A `StudentRecord` (the actual grading data)
* An `EmptyLine`

We map these structural rules to the specific tokens we will define later.


In [7]:
const entryPoint: string = `
  @top ExamData { line* }

  line {
    Header |
    MaxDef |
    StudentRecord |
    EmptyLine
  }

  // Structure Mapping
  Header { header }
  MaxDef { maxdef }
  StudentRecord { (Name | Matriculation) Number* Linebreak }
  EmptyLine { Linebreak }
`;

### 2. Token Block Start
We begin the `@tokens` block, where we define the lexical patterns (Regular Expressions) for our data.

In [8]:
const tokenStart: string = `
  @tokens {
`;

### 3. Informational Headers

The `header` token matches lines like `Class: ...` or `Group: ...`.
The pattern `$[A-Za-z]+ ":" ![\n]* "\n"` matches:

1. One or more letters.
2. A colon.
3. Any content that is *not* a newline.
4. The newline character itself.

In [9]:
const headerTokens: string = `
    header { $[A-Za-z]+ ":" ![\\n]* "\\n" }
`;
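Lezer's token notation is not JavaScript regex syntax, but the `header` pattern can be approximated by a plain `RegExp` for experimentation. The translation below is an assumption made for illustration (`$[A-Za-z]+` becomes `[A-Za-z]+`, `![\n]*` becomes `[^\n]*`); the names `headerLike` and `headerSamples` are hypothetical.

```typescript
// Approximate JavaScript counterpart of the Lezer `header` token:
// letters, a colon, the rest of the line, and the newline itself.
const headerLike: RegExp = /^[A-Za-z]+:[^\n]*\n/;

const headerSamples: string[] = [
  "Class: Algorithms and Complexity\n",
  "Exercise:      1. 2. 3. 4. 5. 6.\n",
  "Jim Smith:     9 12 10  6  6  0\n", // a space precedes the colon: no match
  "MaxPoints = 60\n",                  // no colon at all: no match
];

const headerResults: boolean[] = headerSamples.map(s => headerLike.test(s));
console.log(headerResults); // [ true, true, false, false ]
```

The third sample illustrates why student rows with a two-word name are not swallowed by the `header` token.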

### 4. Configuration (MaxPoints)
The `maxdef` token extracts the maximum points definition.
The pattern matches the literal "MaxPoints", an equals sign with optional spaces or tabs on either side, and a number (defined as a non-zero digit followed by any digits).

In [10]:
const configTokens: string = `
    maxdef { "MaxPoints" $[ \\t]* "=" $[ \\t]* $[1-9] $[0-9]* }
`;
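A rough JavaScript counterpart of this token, with a capture group added around the number, might look as follows. This is only a sketch for experimentation; `maxDefLike` and `maxPointsOf` are hypothetical names and not part of the notebook's parser.

```typescript
// Approximate RegExp for the `maxdef` token; the parentheses capture the number.
const maxDefLike: RegExp = /^MaxPoints[ \t]*=[ \t]*([1-9][0-9]*)/;

// Returns the configured maximum, or null if the line is not a MaxPoints definition.
function maxPointsOf(line: string): number | null {
  const m = maxDefLike.exec(line);
  return m ? parseInt(m[1], 10) : null;
}

console.log(maxPointsOf("MaxPoints = 60"));  // 60
console.log(maxPointsOf("MaxPoints=\t100")); // 100
console.log(maxPointsOf("MaxPoints = 0"));   // null, since the number must start with 1-9
```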

### 5. Student Identifiers

We need to identify students either by name or matriculation number.

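* `Name`: Matches two or more words consisting of letters, separated by single spaces and ending with a colon (e.g., "Jim Smith:"). A single word followed by a colon, such as "Exercise:", is not a `Name`.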
* `Matriculation`: Matches exactly 7 digits followed by a colon (e.g., "1609922:").

In [11]:
const identityTokens: string = `
    Name { $[A-Za-z]+ (" " $[A-Za-z]+)+ ":" }
    Matriculation { $[0-9] $[0-9] $[0-9] $[0-9] $[0-9] $[0-9] $[0-9] ":" }
`;
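To see which identifiers the two patterns accept, they can be approximated with JavaScript regular expressions. The sketch below is an assumption-based translation tested against whole strings; `nameLike`, `matriculationLike`, and `classifyId` are hypothetical names.

```typescript
// Approximate counterparts of the `Name` and `Matriculation` tokens.
const nameLike: RegExp = /^[A-Za-z]+( [A-Za-z]+)+:$/;
const matriculationLike: RegExp = /^[0-9]{7}:$/;

function classifyId(text: string): string {
  if (nameLike.test(text)) return "Name";
  if (matriculationLike.test(text)) return "Matriculation";
  return "neither";
}

console.log(classifyId("Jim Smith:")); // Name
console.log(classifyId("1609922:"));   // Matriculation
console.log(classifyId("Exercise:"));  // neither (only one word)
console.log(classifyId("160992:"));    // neither (only six digits)
```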


### 6. Scores and Values

For the points, we define:

* `Number`: Either "0" or a number starting with 1-9 (preventing leading zeros like "01").
* `Dash`: A single `-`, representing a skipped exercise.
* `Linebreak`: Specifically captures `\n` to signal the end of a student record (its definition appears in the next cell, together with whitespace).


In [12]:
const valueTokens: string = `
    Number { "0" | $[1-9] $[0-9]* }
    Dash { "-" }
`;
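The effect of the `Number` pattern can be checked with an approximate JavaScript regular expression. The sketch below tests whole strings only, which is a simplification compared to how a tokenizer consumes input; `numberLike` and `numberSamples` are hypothetical names.

```typescript
// Approximate counterpart of the `Number` token: "0", or a non-zero digit
// followed by any digits.
const numberLike: RegExp = /^(0|[1-9][0-9]*)$/;

const numberSamples: string[] = ["0", "12", "01", "-"];
const numberResults: boolean[] = numberSamples.map(s => numberLike.test(s));
console.log(numberResults); // [ true, true, false, false ]
```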

### 7. Whitespace and Skipping

Finally, we define the `Linebreak` token and whitespace (`space`) as spaces, tabs, or carriage returns.
We close the `@tokens` block and define a `@skip` block. This tells the parser to automatically ignore `space` and `Dash` tokens, so we only process meaningful data.

In [13]:
const skipAndClose: string = `
    Linebreak { "\\n" }
    space { $[ \\t\\r]+ }
  }

  @skip { space | Dash }
`;
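The practical consequence of skipping `space` and `Dash` is that a missing exercise simply contributes nothing to a student's total. The following sketch mimics this with plain string handling instead of the parser; `sumScores` is a hypothetical helper that is not used elsewhere in this notebook.

```typescript
// Hypothetical helper: sum the scores of one row, ignoring whitespace and dashes.
function sumScores(row: string): number {
  let total = 0;
  for (const field of row.split(/[ \t\r]+/)) {
    if (field === "" || field === "-") continue; // skipped, like `space` and `Dash`
    total += parseInt(field, 10);
  }
  return total;
}

console.log(sumScores("4  4  2  0  -  -")); // 10
console.log(sumScores("9 12 10  6  6  0")); // 43
```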

### Building the Final Grammar

We concatenate all the parts to form the complete grammar string and build the parser.

In [14]:
const finalGrammar: string = 
  entryPoint + 
  tokenStart + 
  headerTokens + 
  configTokens + 
  identityTokens + 
  valueTokens + 
  skipAndClose;

In [15]:
finalGrammar


  @top ExamData { line* }

  line {
    Header |
    MaxDef |
    StudentRecord |
    EmptyLine
  }

  // Structure Mapping
  Header { header }
  MaxDef { maxdef }
  StudentRecord { (Name | Matriculation) Number* Linebreak }
  EmptyLine { Linebreak }

  @tokens {

    header { $[A-Za-z]+ ":" ![\n]* "\n" }

    maxdef { "MaxPoints" $[ \t]* "=" $[ \t]* $[1-9] $[0-9]* }

    Name { $[A-Za-z]+ (" " $[A-Za-z]+)+ ":" }
    Matriculation { $[0-9] $[0-9] $[0-9] $[0-9] $[0-9] $[0-9] $[0-9] ":" }

    Number { "0" | $[1-9] $[0-9]* }
    Dash { "-" }

    Linebreak { "\n" }
    space { $[ \t\r]+ }
  }

  @skip { space | Dash }



In [16]:
const parser: LRParser = buildParser(finalGrammar);

## Processing the Exam Data

Now we implement the logic to process the token stream. We iterate over the tokens extracted from the tree and update our state machine accordingly.

* **`MaxDef`**: Updates the maximum possible points.
* **`Name` / `Matriculation`**: Resets the point counter and sets the current student name.
* **`Number`**: Adds to the current student's point total.
* **`Linebreak`**: Triggers the calculation and output of the grade.

### Step 1: Extracting Maximum Points

When we encounter a `MaxDef` token (e.g., `"MaxPoints = 60"`), we need to extract the number:

In [17]:
function extractMaxPoints(tokenImage: string): number {
  const match = tokenImage.match(/[1-9][0-9]*/);
  return match ? parseInt(match[0], 10) : 0; // explicit radix; 0 if no number is found
}

This function uses a regex to find the numeric value and returns it as an integer.

### Step 2: Starting a New Student Record

When we see a `Name` or `Matriculation` token, we begin tracking a new student:

In [18]:
function startNewStudent(tokenImage: string): string {
  // Removes the trailing colon
  return tokenImage.slice(0, -1);
}

We simply remove the trailing colon (`:`) from the token to get the clean name or ID.

### Step 3: Outputting a Student's Grade

When we reach a `Linebreak`, we calculate and display the student's grade:

In [19]:
function outputGrade(name: string, totalPoints: number, maxPoints: number): void {
  const grade = mark(maxPoints, totalPoints);
  console.log(`${name} has ${totalPoints} points and achieved the mark ${grade}.`);
}

This function uses our previously defined `mark()` function to calculate the grade and formats the output message.

### Step 4: Processing State

To track our progress through the input, we maintain a state object:

In [20]:
interface ProcessingState {
  maxPoints: number;
  currentName: string;
  sumPoints: number;
}

The state keeps track of:
- **`maxPoints`**: The maximum achievable points (from `MaxDef`)
- **`currentName`**: The student currently being processed
- **`sumPoints`**: Running total of points for the current student


### Step 5: The Main Processing Loop

Now we can assemble our processing function from these building blocks:

In [21]:
function processExamData(input: string): void {
  let tree: Tree;
  try {
      tree = parser.parse(input);
  } catch (e) {
      console.error("Parsing failed", e);
      return;
  }

  const tokens: Token[] = extractTokens(tree, input);

  const state: ProcessingState = {
    maxPoints: 0,
    currentName: '',
    sumPoints: 0
  };

  for (const token of tokens) {
    // Check for both the rule name (MaxDef) and the token name (maxdef)
    if (token.type === 'maxdef' || token.type === 'MaxDef') {
        state.maxPoints = extractMaxPoints(token.value);
    }
    else if (token.type === 'Name' || token.type === 'Matriculation') {
        // A new identifier always starts a fresh record.
        state.currentName = startNewStudent(token.value);
        state.sumPoints = 0;
    }
    else if (token.type === 'Number') {
        state.sumPoints += parseInt(token.value, 10);
    }
    else if (token.type === 'Linebreak') {
        if (state.currentName !== '') {
          outputGrade(state.currentName, state.sumPoints, state.maxPoints);
          state.currentName = '';
        }
    }
  }

  // Handle last line if no trailing newline exists
  if (state.currentName !== '') {
    outputGrade(state.currentName, state.sumPoints, state.maxPoints);
  }
}

Now let's run our scanner on the exam data and see the results:

In [22]:
processExamData(data);

Jim Smith has 43 points and achieved the mark 2.7.
John Slow has 10 points and achieved the mark 5.
Susi Sorglos has 57 points and achieved the mark 1.3.
1609922 has 36 points and achieved the mark 3.4.


### How It Works: Example Trace

Let's trace through what happens for one student when the loop processes the tokens:


| Matched Token Type | Token Image | Action / Helper Function | State Update | Output |
| :-- | :-- | :-- | :-- | :-- |
| `Name` | `"Jim Smith:"` | `startNewStudent()` | `currentName = "Jim Smith"`, `sumPoints = 0` | |
| `Number` | `"9"` | `state.sumPoints += ...` | `sumPoints = 9` | |
| `Number` | `"12"` | `state.sumPoints += ...` | `sumPoints = 21` | |
| `Number` | `"10"` | `state.sumPoints += ...` | `sumPoints = 31` | |
| `Number` | `"6"` | `state.sumPoints += ...` | `sumPoints = 37` | |
| `Number` | `"6"` | `state.sumPoints += ...` | `sumPoints = 43` | |
| `Number` | `"0"` | `state.sumPoints += ...` | `sumPoints = 43` | |
| `Linebreak` | `"\n"` | `outputGrade()`, `state.currentName = ''` | `currentName = ""` | `"Jim Smith has 43 points..."` |
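The trace above can be reproduced without the parser by feeding a hand-written token list into a reduced version of the loop. This is a minimal sketch under the assumption that the tokens arrive exactly as listed in the table; `TraceToken`, `gradeLines`, and `jimTokens` are hypothetical names, and the grading formula is inlined so that the block is self-contained.

```typescript
interface TraceToken {
  type: string;
  value: string;
}

// Reduced version of the processing loop: returns the output lines
// instead of printing them.
function gradeLines(tokens: TraceToken[], maxPoints: number): string[] {
  const lines: string[] = [];
  let currentName = "";
  let sumPoints = 0;
  for (const token of tokens) {
    if (token.type === "Name" || token.type === "Matriculation") {
      currentName = token.value.slice(0, -1); // drop the trailing colon
      sumPoints = 0;
    } else if (token.type === "Number") {
      sumPoints += parseInt(token.value, 10);
    } else if (token.type === "Linebreak" && currentName !== "") {
      const grade = Math.round(Math.min(5.0, 7 - 6 * sumPoints / maxPoints) * 10) / 10;
      lines.push(`${currentName} has ${sumPoints} points and achieved the mark ${grade}.`);
      currentName = "";
    }
  }
  return lines;
}

const jimTokens: TraceToken[] = [
  { type: "Name", value: "Jim Smith:" },
  { type: "Number", value: "9" },
  { type: "Number", value: "12" },
  { type: "Number", value: "10" },
  { type: "Number", value: "6" },
  { type: "Number", value: "6" },
  { type: "Number", value: "0" },
  { type: "Linebreak", value: "\n" },
];

const jimLines: string[] = gradeLines(jimTokens, 60);
console.log(jimLines[0]); // Jim Smith has 43 points and achieved the mark 2.7.
```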