In [1]:
import { display } from "tslab";
import { readFileSync } from "fs";

const css = readFileSync("../style.css", "utf8");
display.html(`<style>${css}</style>`);

# Evaluating an Exam Using Chevrotain

This notebook shows how we can use the module [`chevrotain`](https://chevrotain.io/docs/) to implement a scanner.

Our goal is to implement a program that can be used to evaluate the results of an exam.

Assume the result of an exam is stored in the string `data` that is defined below:

In [2]:
const data = `Class: Algorithms and Complexity
          Group: TINF22AI1
          MaxPoints = 60
   
          Exercise:      1. 2. 3. 4. 5. 6.
          Jim Smith:     9 12 10  6  6  0
          John Slow:     4  4  2  0  -  -
          Susi Sorglos:  9 12 12  9  9  6
          1609922:       7  4 12  5  5  3
       `;

This data shows that there has been an exam with the subject <em style="color:blue">Algorithms and Complexity</em>
in the group <em style="color:blue">TINF22AI1</em>.  Furthermore, the equation
```
   MaxPoints = 60
```
shows that in order to achieve the best mark, <em style="color:blue">60</em> points would have been necessary.

There have been 6 different exercises in this exam and, in this small example, only four students took part, namely *Jim Smith*, *John Slow*, *Susi Sorglos*, and one student who is only represented by their matriculation number.  Each of the rows describing the results of the students begins with the name (or matriculation number) of the student, followed by the number of points that they have achieved in the different exercises. Our goal is to write a program that is able to compute the marks for all students.

We will use the package [Chevrotain](https://chevrotain.io/).

In particular, we will use the lexer generator that is provided by `createToken` and the `Lexer` class.

Furthermore, we will use TypeScript's built-in regular expressions to match and extract patterns.

In [3]:
import { createToken, Lexer } from "chevrotain";

## Auxiliary Functions

The function `mark(maxPoints, points)` takes two arguments:

- `maxPoints` - The number of points needed to achieve the best mark of 1.0
- `points` - The number of points achieved by the student

It is assumed that the relation between the mark and the points is mostly linear. A student who achieves 50% of `maxPoints` will get the mark 4.0, while 100% results in mark 1.0.

The formula to calculate the grade is:
$$ \textrm{grade} = 7 - 6 \cdot \frac{\texttt{points}}{\texttt{maxPoints}} $$
However, the worst mark is 5.0. The `Math.min()` function ensures the grade does not exceed 5.0. The result is rounded to one decimal place.
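
The cap at 5.0 only takes effect for low scores. As a short derivation from the formula above, the uncapped grade is at least 5.0 exactly when
$$ 7 - 6 \cdot \frac{\texttt{points}}{\texttt{maxPoints}} \geq 5 \iff \texttt{points} \leq \frac{\texttt{maxPoints}}{3} $$
With `maxPoints = 60`, every score of 20 points or fewer therefore results in the mark 5.0.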


In [4]:
function mark(maxPoints: number, points: number): number {
    const grade = 7 - 6 * points / maxPoints;
    return Math.round(Math.min(5.0, grade) * 10) / 10;
}
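
A few spot checks make the formula concrete. The sketch below repeats the definition of `mark` so that it runs on its own; the chosen sample scores are arbitrary.

```typescript
// Copy of the notebook's mark() so that this example is self-contained.
function mark(maxPoints: number, points: number): number {
    const grade = 7 - 6 * points / maxPoints;
    return Math.round(Math.min(5.0, grade) * 10) / 10;
}

// Spot checks for an exam where 60 points are needed for the best mark.
for (const points of [0, 20, 30, 43, 60]) {
    console.log(`${points} points -> mark ${mark(60, points).toFixed(1)}`);
}
// 0 points -> mark 5.0
// 20 points -> mark 5.0
// 30 points -> mark 4.0
// 43 points -> mark 2.7
// 60 points -> mark 1.0
```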

Let's test this function by visualizing it.

Since interactive plotting libraries like Plotly can be unstable in TypeScript Jupyter notebooks, we'll create our visualization using **SVG (Scalable Vector Graphics)**. SVG is a web standard for describing vector graphics as markup, and Jupyter renders it without any external dependencies.

The code below might look complex at first glance, but it follows a clear structure:

1. **Setup**: Define dimensions and create data points
2. **Scaling**: Convert our data values to pixel coordinates
3. **Grid**: Draw background grid lines for better readability
4. **Axes**: Draw the main X and Y axes
5. **Plot**: Draw the line connecting all points
6. **Markers**: Add blue dots to highlight data points
7. **Labels**: Add titles and axis descriptions

While this approach requires more code than a plotting library, it gives us full control and works reliably in any environment that supports HTML/SVG.

In [5]:
import { display } from "tslab";

// mark() is defined in the previous cell and is reused here.

const maxPoints = 60;
const width = 800;
const height = 500;
const padding = 60;

// Generate points
const chartData = Array.from({ length: maxPoints + 1 }, (_, i) => ({
    x: i,
    y: mark(maxPoints, i)
}));

// Scale functions
const xScale = (x: number) => padding + (x / maxPoints) * (width - 2 * padding);
const yScale = (y: number) => height - padding - ((y - 1) / 4) * (height - 2 * padding);

// Build SVG
let svg = `<svg width="${width}" height="${height}" xmlns="http://www.w3.org/2000/svg" style="background: white;">`;

// Grid lines (horizontal)
for (let grade = 1.0; grade <= 5.0; grade += 0.5) {
    const y = yScale(grade);
    svg += `<line x1="${padding}" y1="${y}" x2="${width-padding}" y2="${y}" stroke="#e0e0e0" stroke-width="1"/>`;
    svg += `<text x="${padding - 10}" y="${y + 5}" text-anchor="end" font-size="12" fill="#666">${grade.toFixed(1)}</text>`;
}

// Grid lines (vertical) - every 10 points
for (let pts = 0; pts <= maxPoints; pts += 10) {
    const x = xScale(pts);
    svg += `<line x1="${x}" y1="${padding}" x2="${x}" y2="${height-padding}" stroke="#e0e0e0" stroke-width="1"/>`;
    svg += `<text x="${x}" y="${height-padding + 20}" text-anchor="middle" font-size="12" fill="#666">${pts}</text>`;
}

// Axes (thicker, on top of grid)
svg += `<line x1="${padding}" y1="${padding}" x2="${padding}" y2="${height-padding}" stroke="black" stroke-width="2"/>`;
svg += `<line x1="${padding}" y1="${height-padding}" x2="${width-padding}" y2="${height-padding}" stroke="black" stroke-width="2"/>`;

// Plot line
const pathData = chartData.map((d, i) => 
    `${i === 0 ? 'M' : 'L'} ${xScale(d.x)} ${yScale(d.y)}`
).join(' ');
svg += `<path d="${pathData}" fill="none" stroke="#1f77b4" stroke-width="2"/>`;

// Points (with markers every 5 points for clarity)
chartData.forEach((d, i) => {
    if (i % 5 === 0) {
        svg += `<circle cx="${xScale(d.x)}" cy="${yScale(d.y)}" r="4" fill="#1f77b4" stroke="white" stroke-width="1"/>`;
    }
});

// Axis labels
svg += `<text x="${width/2}" y="${height-10}" text-anchor="middle" font-size="14" font-weight="bold">Points</text>`;
svg += `<text x="15" y="${height/2}" text-anchor="middle" font-size="14" font-weight="bold" transform="rotate(-90, 15, ${height/2})">Grade</text>`;

// Title
svg += `<text x="${width/2}" y="30" text-anchor="middle" font-size="16" font-weight="bold">Grade as a Function of Points (Max Points = 60)</text>`;

svg += `</svg>`;

display.html(svg);


The resulting plot shows that the grade stays at 5.0 (worst) for up to 20 points, which is one third of the maximum, and then decreases linearly to 1.0 (best) at 60 points. A grade of 4.0 is achieved at exactly 50% of the maximum points (30 points).
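
The two scale functions of the plot are plain linear maps from data values to pixel coordinates. The following sketch repeats them with the same dimensions as the plot and evaluates them at the corners and the centre of the drawing area.

```typescript
// Same dimensions as in the plot.
const width = 800;
const height = 500;
const padding = 60;
const maxPoints = 60;

// Points 0..maxPoints are mapped linearly onto the horizontal drawing area.
const xScale = (x: number): number => padding + (x / maxPoints) * (width - 2 * padding);

// Grades 1..5 are mapped onto the vertical drawing area. SVG y coordinates grow
// downwards, so grade 1.0 lands on the bottom edge and grade 5.0 on the top edge.
const yScale = (y: number): number => height - padding - ((y - 1) / 4) * (height - 2 * padding);

console.log(xScale(0), xScale(30), xScale(60)); // 60 400 740
console.log(yScale(1), yScale(3), yScale(5));   // 440 250 60
```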

## Token Definitions

In this section, we will define the tokens needed to process our exam data.

Each token is created using Chevrotain's `createToken` function, which takes a configuration object with two main properties:
- `name` - A string identifying the token type
- `pattern` - A regular expression that defines what strings this token matches

### The `HEADER` Token

The `HEADER` token is designed to match informational lines at the beginning of our exam data.

Looking at our example data:

```
Class: Algorithms and Complexity
Group: TINF22AI1
Exercise: 1. 2. 3. 4. 5. 6.
```

Each HEADER line follows this pattern:
1. It starts with one or more letters (for example, "Class", "Group", or "Exercise")
2. This is followed by a colon `:`
3. After the colon comes any descriptive text (such as the course name, group, or exercise numbers)
4. The line ends with a newline character

The regular expression `/[A-Za-z]+:.*\n/` captures this pattern:
- `[A-Za-z]+` matches one or more letters (upper or lowercase)
- `:` matches the literal colon character
- `.*` matches any characters after the colon (the descriptive text)
- `\n` matches the newline at the end

**Note:** By including the newline in the pattern, we ensure that the entire line is recognized as a single token.

In [6]:
const Header = createToken({ 
  name: "HEADER", 
  pattern: /[A-Za-z]+:.*\n/ 
});
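
The pattern can be tried out with a plain regular expression, without Chevrotain. The sketch below adds the sticky flag `y`, so the pattern has to match exactly at the start position, which is how a lexer applies its patterns. The helper `matchAtStart` is hypothetical and only serves this illustration.

```typescript
// The HEADER pattern with the sticky flag: it must match at lastIndex, not somewhere later.
const headerPattern = /[A-Za-z]+:.*\n/y;

// Hypothetical helper: returns the text matched at index 0, or null.
function matchAtStart(pattern: RegExp, text: string): string | null {
    pattern.lastIndex = 0;
    const m = pattern.exec(text);
    return m ? m[0] : null;
}

const twoLines = "Class: Algorithms and Complexity\nGroup: TINF22AI1\n";

// Only the first line is consumed, because `.` does not match a newline.
console.log(JSON.stringify(matchAtStart(headerPattern, twoLines)));
// "Class: Algorithms and Complexity\n"

console.log(matchAtStart(headerPattern, "MaxPoints = 60\n"));  // null: no colon after the letters
console.log(matchAtStart(headerPattern, "Jim Smith: 9 12\n")); // null: a space follows "Jim"
```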

### The `MAXDEF` Token

The `MAXDEF` token matches the line that defines the maximum number of points for the exam.

In our example data, this line looks like:

```
MaxPoints = 60
```

The regular expression `/MaxPoints\s*=\s*[1-9][0-9]*/` captures this pattern:
- `MaxPoints` matches the literal string
- `\s*` matches any amount of whitespace before and after the equals sign
- `=` matches the literal equals sign
- `[1-9][0-9]*` matches a number without leading zeros (e.g., "60", "100")

This token is important because it tells us how many points are needed for the best possible grade.

In [7]:
const MaxDef = createToken({ 
  name: "MAXDEF", 
  pattern: /MaxPoints\s*=\s*[1-9][0-9]*/ 
});
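
To read the number out of such a line, a capture group can be placed around the digits. This is a minimal sketch with a hypothetical variant of the pattern and a hypothetical helper `parseMaxPoints`; the token definition above does not use a capture group.

```typescript
// Hypothetical variant of the MAXDEF pattern with a capture group around the number.
const maxDefPattern = /MaxPoints\s*=\s*([1-9][0-9]*)/;

// Returns the maximum number of points, or null if the line does not match.
function parseMaxPoints(line: string): number | null {
    const m = maxDefPattern.exec(line);
    return m ? parseInt(m[1], 10) : null;
}

console.log(parseMaxPoints("MaxPoints = 60")); // 60
console.log(parseMaxPoints("MaxPoints=100"));  // 100
console.log(parseMaxPoints("MaxPoints = 0"));  // null: the number must start with 1-9
```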

### The `NAME` Token

The `NAME` token matches the name of a student, which is always followed by a colon.

Student names consist of two or more words that are made up of letters and separated by single spaces. For example:

```
Jim Smith:
Susi Sorglos:
```

The regular expression `/[A-Za-z]+(?: [A-Za-z]+)+:/` ensures:
- The name starts with one or more letters
- It contains at least one space (to distinguish names from headers)
- It ends with a colon `:`

This token helps us identify which student the following points belong to.

In [8]:
const Name = createToken({ 
  name: "NAME", 
  pattern: /[A-Za-z]+(?: [A-Za-z]+)+:/ 
});
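
The following sketch checks the pattern against a few sample strings. It anchors the pattern with `^` so that the check starts at the beginning of each string; the sample strings other than the two names from the exam data are made up for illustration.

```typescript
// Pattern of the NAME token, anchored at the start of the string.
const namePattern = /^[A-Za-z]+(?: [A-Za-z]+)+:/;

const samples = ["Jim Smith:", "Susi Sorglos:", "Class:", "Anna-Lena Meier:", "Jim  Smith:"];
for (const s of samples) {
    console.log(`${JSON.stringify(s)} -> ${namePattern.test(s)}`);
}
// "Jim Smith:" -> true
// "Susi Sorglos:" -> true
// "Class:" -> false            (a single word is not a name)
// "Anna-Lena Meier:" -> false  (hyphens are not part of the pattern)
// "Jim  Smith:" -> false       (only single spaces are allowed between words)
```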

### The `MATRICULATION` Token

The `MATRICULATION` token matches a student identification number.

Some students are identified by a 7-digit matriculation number followed by a colon, for example:

```
1609922:
```

The regular expression `/[0-9]{7}:/` ensures:
- Exactly seven digits (`[0-9]{7}`)
- Followed by a colon (`:`)

This token helps us process students who are listed by their ID instead of their name.

In [9]:
const Matriculation = createToken({ 
  name: "MATRICULATION", 
  pattern: /[0-9]{7}:/ 
});

### The `NUMBER` Token

The `NUMBER` token matches the points a student achieved in an exercise.

A number is either exactly `0` or starts with a digit from 1-9 followed by any number of digits. This prevents leading zeros, so "007" would be tokenized as three separate numbers: `0`, `0`, `7`.

The regular expression `/0|[1-9][0-9]*/` ensures:
- Either a single zero (`0`)
- Or a non-zero digit followed by more digits (`[1-9][0-9]*`)

These tokens are used to sum up the points for each student.

In [10]:
const Number = createToken({ 
  name: "NUMBER", 
  pattern: /0|[1-9][0-9]*/ 
});
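
The behaviour for leading zeros can be reproduced with a plain regular expression. The sketch below adds the global flag `g`, so that `match` returns every occurrence in turn.

```typescript
// NUMBER pattern with the global flag, so that match() returns all occurrences.
const numberPattern = /0|[1-9][0-9]*/g;

const zeros = "007".match(numberPattern);
console.log(zeros); // [ '0', '0', '7' ]

const row = " 9 12 10  6  6  0".match(numberPattern);
console.log(row);   // [ '9', '12', '10', '6', '6', '0' ]
```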

### The `DASH` Token

The `DASH` token matches a hyphen/minus character `-`.

In the exam data, dashes indicate that a student did not attempt a specific exercise. For example:

```
John Slow: 4 4 2 0 - -
```


Here, John Slow didn't attempt exercises 5 and 6 (indicated by the dashes).

The regular expression `/-/` simply matches a single dash character.

Since dashes don't contribute to the point total, we add this token to the `SKIPPED` group. This means:
- The lexer recognizes dashes (so they don't cause errors)
- They are not included in the token stream
- They effectively represent 0 points

This is similar to how we handle whitespace - recognized but not processed.

In [11]:
const Dash = createToken({ 
  name: "DASH", 
  pattern: /-/, 
  group: Lexer.SKIPPED 
});
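
The effect of skipping dashes can be sketched with plain string handling instead of a lexer. The helper `sumRow` is hypothetical: it splits the points part of a row on whitespace and drops the dashes before summing, which mirrors how skipped tokens never reach the summation.

```typescript
// Hypothetical helper: sums the points of one row, ignoring dashes.
function sumRow(pointsPart: string): number {
    return pointsPart
        .trim()
        .split(/\s+/)
        .filter(entry => entry !== "-")   // dashes are dropped, like SKIPPED tokens
        .reduce((sum, entry) => sum + parseInt(entry, 10), 0);
}

console.log(sumRow("4  4  2  0  -  -")); // 10
console.log(sumRow("9 12 10  6  6  0")); // 43
```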

### The `WS` Token

Whitespace between the other tokens (spaces, tabs, and carriage returns) should be ignored.

In Chevrotain, we use a token in the `SKIPPED` group to recognize and discard this whitespace. The regular expression `/[ \t\r]+/` matches any sequence of spaces, tabs, or carriage returns. Newlines are deliberately excluded, because they are handled by the `LINEBREAK` token.

This ensures that indentation and the spacing between columns do not affect the processing.

In [12]:
const Whitespace = createToken({ 
  name: "WS", 
  pattern: /[ \t\r]+/, 
  group: Lexer.SKIPPED 
});


### The `LINEBREAK` Token

The `LINEBREAK` token matches the newline character `\n`.

This token is important for detecting the end of a student's record. When we reach a LINEBREAK, we know it's time to calculate and output the student's grade.

The regular expression `/\n/` matches a single newline character.


In [13]:
const Linebreak = createToken({ 
  name: "LINEBREAK", 
  pattern: /\n/ 
});


## Creating the Lexer

Now that we have defined all our tokens, we need to collect them in an array and create the lexer.

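**Important:** The order of tokens matters! Chevrotain tries the patterns in the order in which they appear in the array and uses the first one that matches at the current position, so more specific patterns must come before more general ones:
- `MATRICULATION` must come before `NUMBER`. Otherwise `1609922:` would be split into the number `1609922` followed by a stray colon that no token matches.
- `MAXDEF` and `HEADER` cannot match at the same position, because `MaxPoints = 60` has no colon directly after the letters. Listing `MAXDEF` first is therefore only a convention.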

In [14]:
const allTokens = [
  Whitespace,
  Dash,
  MaxDef,
  Header,
  Matriculation,
  Name,
  Number,
  Linebreak
];

const lexer = new Lexer(allTokens, { positionTracking: "full" });
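
Why the position of `MATRICULATION` relative to `NUMBER` matters can be illustrated without Chevrotain. The following sketch is a deliberately simplified stand-in, not Chevrotain's implementation: the `Rule` type and the helper `firstMatchTokens` are hypothetical, and the way errors are reported here is an assumption made for the illustration.

```typescript
type Rule = { name: string; pattern: RegExp };

// Simplified first-match tokenizer: at each position, the first rule in the
// list whose pattern matches wins. Whitespace is skipped.
function firstMatchTokens(rules: Rule[], text: string): string[] {
    const out: string[] = [];
    let pos = 0;
    while (pos < text.length) {
        if (/\s/.test(text[pos])) { pos++; continue; }
        let matched = false;
        for (const rule of rules) {
            // The sticky flag forces the match to start exactly at pos.
            const sticky = new RegExp(rule.pattern.source, "y");
            sticky.lastIndex = pos;
            const m = sticky.exec(text);
            if (m) {
                out.push(`${rule.name}(${m[0]})`);
                pos += m[0].length;
                matched = true;
                break;
            }
        }
        if (!matched) { out.push(`ERROR(${text[pos]})`); pos++; }
    }
    return out;
}

const matriculation: Rule = { name: "MATRICULATION", pattern: /[0-9]{7}:/ };
const num: Rule = { name: "NUMBER", pattern: /0|[1-9][0-9]*/ };

const goodOrder = firstMatchTokens([matriculation, num], "1609922: 7 4");
console.log(goodOrder); // [ 'MATRICULATION(1609922:)', 'NUMBER(7)', 'NUMBER(4)' ]

const badOrder = firstMatchTokens([num, matriculation], "1609922: 7 4");
console.log(badOrder);  // [ 'NUMBER(1609922)', 'ERROR(:)', 'NUMBER(7)', 'NUMBER(4)' ]
```

With `NUMBER` listed first, the seven digits are consumed as an ordinary number and the colon is left over.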

## Processing the Exam Data

In Chevrotain, token recognition (lexing) and data processing are separate concerns.

We'll create a function that:
1. Tokenizes the input using our lexer
2. Iterates through the tokens
3. Maintains state (current student, their points)
4. Calculates and outputs grades when we reach the end of a student's line

This approach is cleaner and more maintainable than mixing lexing with business logic.

In [15]:
function processExamData(input: string): void {
  // Step 1: Tokenize the input
  const result = lexer.tokenize(input);
  
  // Check for lexing errors
  if (result.errors.length > 0) {
    result.errors.forEach(err => {
      console.log(`Illegal character at line ${err.line}.`);
    });
    return;
  }
  
  // Step 2: Initialize state variables
  let maxPoints = 0;      // Maximum points for best grade
  let currentName = '';   // Current student being processed
  let sumPoints = 0;      // Total points for current student
  
  // Step 3: Process each token
  for (const token of result.tokens) {
    switch (token.tokenType.name) {
      case 'MAXDEF': {
        // Extract maximum points using regex
        const match = token.image.match(/[1-9][0-9]*/);
        if (match) {
          maxPoints = parseInt(match[0], 10);
        }
        currentName = '';
        break;
      }
        
      case 'NAME':
        // Start processing a new student (remove trailing colon)
        currentName = token.image.slice(0, -1);
        sumPoints = 0;
        break;
        
      case 'MATRICULATION':
        // Start processing a student by ID (remove trailing colon)
        currentName = token.image.slice(0, -1);
        sumPoints = 0;
        break;
        
      case 'NUMBER':
        // Add points to current student's total
        sumPoints += parseInt(token.image, 10);
        break;
        
      case 'LINEBREAK':
        // End of student record - calculate and output grade
        if (currentName !== '') {
          const grade = mark(maxPoints, sumPoints);
          console.log(`${currentName} has ${sumPoints} points and achieved the mark ${grade}.`);
          currentName = '';
        }
        break;
        
      case 'HEADER':
        // Headers are recognized but not processed
        break;
    }
  }
}

Now let's run our scanner on the exam data and see the results:

In [16]:
processExamData(data);

Jim Smith has 43 points and achieved the mark 2.7.
John Slow has 10 points and achieved the mark 5.
Susi Sorglos has 57 points and achieved the mark 1.3.
1609922 has 36 points and achieved the mark 3.4.


## How it works

Let's trace through what happens for one student:

1. We encounter a `NAME` token: "Jim Smith:"
   - Store "Jim Smith" as `currentName`
   - Reset `sumPoints` to 0

2. We encounter `NUMBER` tokens: "9", "12", "10", "6", "6", "0"
   - Each number is added to `sumPoints`
   - After all numbers: `sumPoints = 43`

3. We encounter a `LINEBREAK` token
   - Calculate grade: `mark(60, 43) = 2.7`
   - Output: "Jim Smith has 43 points and achieved the mark 2.7."
   - Reset `currentName` to empty string

This process repeats for each student in the data.
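
The same state handling can be tried out on a hand-written token list, independent of the lexer. This is a minimal sketch under stated assumptions: the `Tok` and `Result` types and the function `collectResults` are hypothetical, and the function returns the collected point totals instead of printing marks.

```typescript
type Tok = { type: string; image: string };       // hypothetical, minimal token shape
type Result = { name: string; points: number };

function collectResults(tokens: Tok[]): Result[] {
    const results: Result[] = [];
    let currentName = "";
    let sumPoints = 0;
    for (const tok of tokens) {
        switch (tok.type) {
            case "NAME":
            case "MATRICULATION":
                currentName = tok.image.slice(0, -1);   // drop the trailing colon
                sumPoints = 0;
                break;
            case "NUMBER":
                sumPoints += parseInt(tok.image, 10);
                break;
            case "LINEBREAK":
                // Only a line that started with a name or ID produces a record.
                if (currentName !== "") {
                    results.push({ name: currentName, points: sumPoints });
                    currentName = "";
                }
                break;
        }
    }
    return results;
}

const tokens: Tok[] = [
    { type: "NAME", image: "John Slow:" },
    ...["4", "4", "2", "0"].map(image => ({ type: "NUMBER", image })),
    { type: "LINEBREAK", image: "\n" },
    { type: "LINEBREAK", image: "\n" },             // an empty line produces no record
    { type: "MATRICULATION", image: "1609922:" },
    ...["7", "4", "12", "5", "5", "3"].map(image => ({ type: "NUMBER", image })),
    { type: "LINEBREAK", image: "\n" },
];

const results = collectResults(tokens);
console.log(results);
// [ { name: 'John Slow', points: 10 }, { name: '1609922', points: 36 } ]
```

Returning the records rather than logging them keeps the state handling easy to check in isolation.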