Fetches MSU (Mindanao State University) SASE rating report PDFs and extracts the data inside them into structured JSON. Comes with three ways to run:
- a local Express server (for exploring/debugging),
- a CLI script (`extract.js`) for one-shot runs,
- a GitHub Actions workflow with a `name` input, so you can search by examinee name from the Actions tab without running anything locally.
The MSU SASE result portal serves each examinee's rating report as a PDF at a URL of the form:
https://saseresult-rating.msumain.edu.ph/reportOfRating.php?xnum=<examineeNo>&referkey=<key>
The scraper walks a range of xnum values, downloads each PDF, parses it with pdf-parse, and keeps the records whose name matches your query (case-insensitive substring match against last name, first name, MI, and full name).
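The URL pattern and name filter described above can be sketched as follows. This is a minimal illustration, not the code in `parser.js`: the helper names (`buildReportUrl`, `reportUrls`, `nameMatches`) and the record field names (`lastName`, `firstName`, `middleInitial`, `fullName`) are assumptions, as is the choice to treat an empty query as matching nothing.

```javascript
// Hypothetical helpers; names and record shape are illustrative, not taken from the repo.
const BASE_URL = "https://saseresult-rating.msumain.edu.ph/reportOfRating.php";

// Build the report URL for one examinee number.
function buildReportUrl(xnum, referkey) {
  const url = new URL(BASE_URL);
  url.searchParams.set("xnum", String(xnum));
  url.searchParams.set("referkey", referkey);
  return url.toString();
}

// Yield every report URL in the half-open range [start, end).
function* reportUrls(start, end, referkey) {
  for (let xnum = start; xnum < end; xnum++) {
    yield buildReportUrl(xnum, referkey);
  }
}

// Case-insensitive substring match against the name fields of a parsed record.
// Assumption: an empty query matches nothing.
function nameMatches(record, query) {
  const needle = query.trim().toLowerCase();
  if (!needle) return false;
  const fields = [
    record.lastName,
    record.firstName,
    record.middleInitial,
    record.fullName,
  ];
  return fields.some(
    (field) => typeof field === "string" && field.toLowerCase().includes(needle)
  );
}
```

Building the query string with `URL` and `searchParams` keeps the parameters correctly encoded even if a `referkey` ever contains characters that need escaping.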
For each report, the parser pulls out:
- Examinee No., MSU System ID, Date of Test
- Last name, first name, middle initial, full name
- School and school address
- Section scores: AP (30), LU (80), MA (40), SC (30), GR (180)
- Sex, religion, tribe
- Preferred courses (1st / 2nd / 3rd)
- Testing center, campus preferred
- Rating remarks, online reference key
- Node.js 18+ (uses the built-in `fetch`)
- npm

Install dependencies:

    npm install

The repo ships a `workflow_dispatch` workflow at `.github/workflows/extract.yml`.
- Push the repo to GitHub.
- Go to the Actions tab → Extract Examinee Records → Run workflow.
- Fill in the inputs:
  - `name`: name to search for (case-insensitive substring, matches across last/first/MI/full name)
  - `start_xnum`: start of the `xnum` range (inclusive). Default `662406`.
  - `end_xnum`: end of the `xnum` range (exclusive). Default `662506`.
  - `referkey`: the `referkey` query parameter. Default `321e56229a1143e`.
- When the run finishes, download the `results-<name>` artifact. It contains `results.json` with every match.
The workflow has a 350-minute timeout. The default 100-record range finishes quickly; a 300k range will not fit, so split it across multiple runs.
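One way to split a wide range across several runs is to cut it into consecutive half-open chunks and start one workflow run per chunk. The sketch below is illustrative only: `splitRange` is a hypothetical helper that is not part of the repo, and the chunk size of 100 is a placeholder, since how many records fit in one run depends on how fast the upstream server responds.

```javascript
// Hypothetical helper: split the half-open range [start, end) into
// consecutive half-open chunks of at most `chunkSize` values each.
function splitRange(start, end, chunkSize) {
  if (!Number.isInteger(chunkSize) || chunkSize <= 0) {
    throw new RangeError("chunkSize must be a positive integer");
  }
  const chunks = [];
  for (let s = start; s < end; s += chunkSize) {
    chunks.push({ start: s, end: Math.min(s + chunkSize, end) });
  }
  return chunks;
}

// Print one set of workflow inputs per run.
for (const { start, end } of splitRange(662406, 662706, 100)) {
  console.log(`start_xnum=${start} end_xnum=${end}`);
}
```

Because `end_xnum` is exclusive, each chunk's end can be reused as the next chunk's start without fetching any `xnum` twice.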
To run the CLI locally:

    NAME=revilla START_XNUM=662406 END_XNUM=662506 npm run extract

Configurable via env vars:
| Variable | Default | Notes |
|---|---|---|
| `NAME` | (required) | Substring to search for, case-insensitive |
| `START_XNUM` | `662406` | Inclusive start of the `xnum` range |
| `END_XNUM` | `662506` | Exclusive end of the `xnum` range |
| `REFERKEY` | `321e56229a1143e` | `referkey` query parameter |
| `OUTPUT` | `results.json` | Output file path |
Matches are written to OUTPUT as a JSON array, with each record prefixed by its xnum.
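As a rough sketch of how the variables and defaults in the table above could be read, assuming nothing about how `extract.js` actually does it: `readConfig` and the shape of the object it returns are hypothetical, and the range validation is an added assumption.

```javascript
// Hypothetical config reader; defaults are copied from the table above.
function readConfig(env = process.env) {
  const name = (env.NAME || "").trim();
  if (!name) {
    throw new Error("NAME is required");
  }

  const start = Number.parseInt(env.START_XNUM ?? "662406", 10);
  const end = Number.parseInt(env.END_XNUM ?? "662506", 10);
  // Assumption: reject non-numeric values and empty or reversed ranges early.
  if (!Number.isInteger(start) || !Number.isInteger(end) || start >= end) {
    throw new Error("START_XNUM must be an integer smaller than END_XNUM");
  }

  return {
    name,
    start, // inclusive
    end, // exclusive
    referkey: env.REFERKEY ?? "321e56229a1143e",
    output: env.OUTPUT ?? "results.json",
  };
}
```

Passing the environment in as a parameter makes the function easy to exercise without touching `process.env`, for example `readConfig({ NAME: "revilla" })`.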
On Windows PowerShell, set env vars inline like this:

    $env:NAME='revilla'; $env:START_XNUM='662406'; $env:END_XNUM='662506'; npm run extract

Start the local server with:

    npm start      # node server.js
    npm run dev    # nodemon server.js (auto-restart on changes)

The server listens on http://localhost:3000 and exposes:
- `GET /`: fetches a single hard-coded PDF and streams it back to the browser inline.
- `GET /extract?name=<name>&start=<xnum>&end=<xnum>`: walks the `xnum` range, filters by `name`, and returns the matches as JSON. All three query params are optional; defaults match the CLI.
Example:
http://localhost:3000/extract?name=revilla&start=662406&end=662506
- `server.js`: Express server with `GET /` and `GET /extract`.
- `extract.js`: CLI runner used by the workflow and `npm run extract`.
- `parser.js`: shared PDF-to-JSON parsing logic (`extractFromPDF`, `matchesName`).
- `.github/workflows/extract.yml`: GitHub Actions workflow with the `name` input.
- Requests are sequential with no delay; a wide range will take a while and may stress the upstream server.
- The `referkey` value is treated as a static query parameter; the same key is reused for every request in the range.
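If load on the upstream server is a concern, one option is to keep the requests sequential but pause briefly between them. The sketch below is not how the repo currently behaves (it sends requests with no delay); `walkRange`, the caller-supplied `fetchOne` callback, and the 250 ms default are all assumptions chosen for illustration.

```javascript
// Resolve after `ms` milliseconds.
const sleep = (ms) => new Promise((resolve) => setTimeout(resolve, ms));

// Visit each xnum in [start, end) one at a time, pausing between requests.
// `fetchOne` is a caller-supplied async function (hypothetical) that takes an
// xnum and resolves to a parsed record, or null when there is no match.
async function walkRange(start, end, fetchOne, delayMs = 250) {
  const results = [];
  for (let xnum = start; xnum < end; xnum++) {
    const record = await fetchOne(xnum);
    if (record) {
      // Prefix each record with its xnum, as in the CLI output.
      results.push({ xnum, ...record });
    }
    // Skip the pause after the final request.
    if (delayMs > 0 && xnum < end - 1) {
      await sleep(delayMs);
    }
  }
  return results;
}
```

A delay trades run time for politeness: at 250 ms per request, the default 100-record range spends roughly 25 extra seconds waiting, which matters when sizing ranges against the workflow timeout.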