MSU Rating Report Scraper

Fetches MSU (Mindanao State University) SASE rating report PDFs and extracts their contents into structured JSON. There are three ways to run it:

a local Express server (for exploring/debugging),
a CLI script (extract.js) for one-shot runs,
a GitHub Actions workflow with a name input, so you can search by examinee name from the Actions tab without running anything locally.

What it does

The MSU SASE result portal serves each examinee's rating report as a PDF at a URL of the form:

https://saseresult-rating.msumain.edu.ph/reportOfRating.php?xnum=<examineeNo>&referkey=<key>

The scraper walks a range of xnum values, downloads each PDF, parses it with pdf-parse, and keeps the records whose name matches your query (case-insensitive substring match against last name, first name, MI, and full name).
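The matching rule can be sketched as follows. This is a minimal illustration, not the code in parser.js: the function name matchesQuery and the record field names (lastName, firstName, middleInitial, fullName) are assumptions made for this example, as is the choice to let an empty query match nothing.

```javascript
// Hypothetical record shape; the real field names in parser.js may differ.
function matchesQuery(record, query) {
  const needle = String(query).trim().toLowerCase();
  if (needle === "") return false; // assumption: an empty query matches nothing

  const fields = [
    record.lastName,
    record.firstName,
    record.middleInitial,
    record.fullName,
  ];

  // Case-insensitive substring test against each name field that is present.
  return fields.some(
    (value) => typeof value === "string" && value.toLowerCase().includes(needle)
  );
}

const sample = {
  lastName: "Dela Cruz",
  firstName: "Juan",
  middleInitial: "P",
  fullName: "Dela Cruz, Juan P.",
};
console.log(matchesQuery(sample, "CRUZ")); // true
console.log(matchesQuery(sample, "santos")); // false
```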

Extracted fields

For each report, the parser pulls out:

Examinee No., MSU System ID, Date of Test
Last name, first name, middle initial, full name
School and school address
Section scores: AP (30), LU (80), MA (40), SC (30), GR (180)
Sex, religion, tribe
Preferred courses (1st / 2nd / 3rd)
Testing center, campus preferred
Rating remarks, online reference key

Requirements

Node.js 18+ (uses the built-in fetch)
npm

Install

npm install

Run on GitHub Actions (recommended)

The repo ships a workflow_dispatch workflow at .github/workflows/extract.yml.

1. Push the repo to GitHub.
2. Go to the Actions tab → Extract Examinee Records → Run workflow.
3. Fill in the inputs:
   - name: name to search for (case-insensitive substring, matched against last name, first name, MI, and full name)
   - start_xnum: start of the xnum range (inclusive). Default 662406.
   - end_xnum: end of the xnum range (exclusive). Default 662506.
   - referkey: the referkey query parameter. Default 321e56229a1143e.
4. When the run finishes, download the results-<name> artifact. It contains results.json with every match.

The workflow has a 350-minute timeout. The default 100-record range finishes quickly; a range of around 300,000 xnum values will not fit within the timeout, so split it across multiple runs.

Run the CLI locally

NAME=revilla START_XNUM=662406 END_XNUM=662506 npm run extract

Configurable via env vars:

Variable	Default	Notes
`NAME`	(required)	Substring to search for, case-insensitive
`START_XNUM`	`662406`	Inclusive start of the `xnum` range
`END_XNUM`	`662506`	Exclusive end of the `xnum` range
`REFERKEY`	`321e56229a1143e`	`referkey` query parameter
`OUTPUT`	`results.json`	Output file path

Matches are written to OUTPUT as a JSON array, with each record prefixed by its xnum.
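Reading these variables with their defaults might look like the sketch below. readConfig is a hypothetical helper written for this example and is not taken from extract.js; the default values are the ones listed in the table above, and the validation rules are assumptions.

```javascript
// Hypothetical helper: builds a config object from an env-like map.
function readConfig(env) {
  const name = (env.NAME || "").trim();
  if (name === "") {
    throw new Error("NAME is required");
  }

  const start = Number.parseInt(env.START_XNUM || "662406", 10);
  const end = Number.parseInt(env.END_XNUM || "662506", 10);
  // Assumption: reject non-numeric values and empty or reversed ranges.
  if (!Number.isInteger(start) || !Number.isInteger(end) || start >= end) {
    throw new Error("START_XNUM must be an integer smaller than END_XNUM");
  }

  return {
    name,
    start, // inclusive
    end, // exclusive
    referkey: env.REFERKEY || "321e56229a1143e",
    output: env.OUTPUT || "results.json",
  };
}

const config = readConfig({ NAME: "revilla" });
console.log(config.end - config.start); // 100 xnum values in the default range
```

In a real script the argument would be process.env rather than a literal object.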

On Windows PowerShell, set env vars inline like this:

$env:NAME='revilla'; $env:START_XNUM='662406'; $env:END_XNUM='662506'; npm run extract

Run the Express server

npm start      # node server.js
npm run dev    # nodemon server.js (auto-restart on changes)

The server listens on http://localhost:3000 and exposes:

GET / — Fetches a single hard-coded PDF and streams it back to the browser inline.
GET /extract?name=<name>&start=<xnum>&end=<xnum> — Walks the xnum range, filters by name, and returns the matches as JSON. All three query params are optional; defaults match the CLI.

Example:

http://localhost:3000/extract?name=revilla&start=662406&end=662506

Project layout

server.js — Express server with GET / and GET /extract.
extract.js — CLI runner used by the workflow and npm run extract.
parser.js — Shared PDF-to-JSON parsing logic (extractFromPDF, matchesName).
.github/workflows/extract.yml — GitHub Actions workflow with the name input.

Notes

Requests are sequential with no delay; a wide range will take a while and may stress the upstream server.
The referkey value is treated as a static query parameter — the same key is reused for every request in the range.
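If load on the upstream server is a concern, a pause between requests is one option. The sketch below is an assumption about how that could be added and is not part of the current code; sleep, runSequentially, and delayMs are hypothetical names, and the task passed in stands for whatever per-xnum work the caller does.

```javascript
// Resolve after the given number of milliseconds.
const sleep = (ms) => new Promise((resolve) => setTimeout(resolve, ms));

// Run `task` on each item one at a time, pausing `delayMs` between items.
async function runSequentially(items, task, delayMs) {
  const results = [];
  for (let i = 0; i < items.length; i++) {
    if (i > 0 && delayMs > 0) {
      await sleep(delayMs); // no pause before the first item
    }
    results.push(await task(items[i]));
  }
  return results;
}

// Usage with a stand-in task (no network access):
runSequentially([1, 2, 3], async (n) => n * 2, 50).then((out) => {
  console.log(out); // [ 2, 4, 6 ]
});
```

The trade-off is run time: a pause of delayMs adds roughly delayMs × (number of items − 1) to the total, which matters against the workflow timeout.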
