arukenimon/msu

MSU Rating Report Scraper

Fetches MSU (Mindanao State University) SASE rating report PDFs and extracts the data inside them into structured JSON. Comes with three ways to run:

  • a local Express server (for exploring/debugging),
  • a CLI script (extract.js) for one-shot runs,
  • a GitHub Actions workflow with a name input, so you can search by examinee name from the Actions tab without running anything locally.

What it does

The MSU SASE result portal serves each examinee's rating report as a PDF at a URL of the form:

https://saseresult-rating.msumain.edu.ph/reportOfRating.php?xnum=<examineeNo>&referkey=<key>

The scraper walks a range of xnum values, downloads each PDF, parses it with pdf-parse, and keeps the records whose name matches your query (case-insensitive substring match against last name, first name, MI, and full name).
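
As a rough illustration of the walk-and-filter step described above, the sketch below builds the report URL for each xnum and applies the case-insensitive substring match. The helper names (`buildReportUrl`, `matchesName`, `reportUrls`) and the record field names (`lastName`, `firstName`, `middleInitial`, `fullName`) are assumptions made for this example and may not match the repo's code. Downloading and PDF parsing are left out.

```javascript
// Sketch only: helper and field names are assumptions, not the repo's actual code.
const BASE_URL = 'https://saseresult-rating.msumain.edu.ph/reportOfRating.php';

// Build the report URL for one examinee number.
function buildReportUrl(xnum, referkey) {
  const params = new URLSearchParams({ xnum: String(xnum), referkey });
  return `${BASE_URL}?${params}`;
}

// Case-insensitive substring match against the four name fields.
function matchesName(record, query) {
  const needle = query.trim().toLowerCase();
  const fields = [record.lastName, record.firstName, record.middleInitial, record.fullName];
  return fields.some((f) => typeof f === 'string' && f.toLowerCase().includes(needle));
}

// Enumerate report URLs for the half-open range [start, end).
function* reportUrls(start, end, referkey) {
  for (let xnum = start; xnum < end; xnum += 1) {
    yield buildReportUrl(xnum, referkey);
  }
}
```

Using a generator keeps the range lazy, so a wide range does not allocate every URL up front.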

Extracted fields

For each report, the parser pulls out:

  • Examinee No., MSU System ID, Date of Test
  • Last name, first name, middle initial, full name
  • School and school address
  • Section scores: AP (30), LU (80), MA (40), SC (30), GR (180)
  • Sex, religion, tribe
  • Preferred courses (1st / 2nd / 3rd)
  • Testing center, campus preferred
  • Rating remarks, online reference key
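
The section maxima listed above lend themselves to a quick sanity check on parsed records. The sketch below is one possible validator, not part of the repo: the `scores` object shape is hypothetical, and treating GR as the sum of the other four sections is an inference from the maxima (30 + 80 + 40 + 30 = 180) rather than something the README states.

```javascript
// Maximum score per section, taken from the list above.
const SECTION_MAX = { AP: 30, LU: 80, MA: 40, SC: 30, GR: 180 };

// Return a list of problems found in a scores object such as { AP: 20, LU: 55, ... }.
// The `scores` shape is hypothetical; the real parser output may differ.
function checkScores(scores) {
  const problems = [];
  for (const [section, max] of Object.entries(SECTION_MAX)) {
    const value = scores[section];
    if (!Number.isFinite(value)) {
      problems.push(`${section}: missing or not a number`);
    } else if (value < 0 || value > max) {
      problems.push(`${section}: ${value} is outside 0..${max}`);
    }
  }
  // Assumption: GR is the total of the four section scores.
  const parts = ['AP', 'LU', 'MA', 'SC'].map((k) => scores[k]);
  if (parts.every((v) => Number.isFinite(v)) && Number.isFinite(scores.GR)) {
    const sum = parts.reduce((a, b) => a + b, 0);
    if (sum !== scores.GR) problems.push(`GR: expected ${sum}, got ${scores.GR}`);
  }
  return problems;
}
```

An empty result means the record passed every check; a non-empty one usually points at a PDF whose text layout the parser misread.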

Requirements

  • Node.js 18+ (uses the built-in fetch)
  • npm

Install

npm install

Run on GitHub Actions (recommended)

The repo ships a workflow_dispatch workflow at .github/workflows/extract.yml.

  1. Push the repo to GitHub.
  2. Go to the Actions tab → Extract Examinee Records → Run workflow.
  3. Fill in the inputs:
    • name — name to search for (case-insensitive substring, matches across last/first/MI/full name)
    • start_xnum — start of the xnum range (inclusive). Default 662406.
    • end_xnum — end of the xnum range (exclusive). Default 662506.
    • referkey — the referkey query parameter. Default 321e56229a1143e.
  4. When the run finishes, download the results-<name> artifact — it contains results.json with every match.

The workflow has a 350-minute timeout. The default 100-record range finishes quickly; a range of about 300,000 xnum values will not fit in a single run, so split it across multiple runs.
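
To plan such a split, a small helper can turn one wide range into per-run start_xnum / end_xnum pairs. This is a sketch under assumptions: `splitRange` and `recordsPerRun` are hypothetical names, and the seconds-per-record figure must come from your own measurement, since the README gives no throughput number.

```javascript
// Upper bound on records one run can process, given a measured per-record time.
// 350 minutes is the workflow timeout stated above.
function recordsPerRun(secondsPerRecord, timeoutMinutes = 350) {
  return Math.floor((timeoutMinutes * 60) / secondsPerRecord);
}

// Split the half-open range [start, end) into consecutive chunks of at most `size` values.
function splitRange(start, end, size) {
  if (!Number.isInteger(size) || size <= 0) {
    throw new RangeError('size must be a positive integer');
  }
  const chunks = [];
  for (let s = start; s < end; s += size) {
    chunks.push({ start_xnum: s, end_xnum: Math.min(s + size, end) });
  }
  return chunks;
}
```

Each chunk keeps the inclusive-start, exclusive-end convention of the workflow inputs, so consecutive runs neither overlap nor skip an xnum. In practice, pick a chunk size comfortably below `recordsPerRun(...)` to leave headroom for slow responses.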

Run the CLI locally

NAME=revilla START_XNUM=662406 END_XNUM=662506 npm run extract

Configurable via env vars:

Variable     Default           Notes
NAME         (required)        Substring to search for, case-insensitive
START_XNUM   662406            Inclusive start of the xnum range
END_XNUM     662506            Exclusive end of the xnum range
REFERKEY     321e56229a1143e   referkey query parameter
OUTPUT       results.json      Output file path

Matches are written to OUTPUT as a JSON array, with each record prefixed by its xnum.
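
For readers who want to see how these variables and defaults might be read, here is a minimal sketch. It is not the contents of extract.js: `loadConfig` is a hypothetical helper, and the validation rules (non-empty NAME, integer bounds, start below end) are assumptions layered on top of the documented defaults.

```javascript
// Read the CLI settings from an environment object, applying the documented defaults.
// Hypothetical helper; the real extract.js may read and validate these differently.
function loadConfig(env = process.env) {
  const name = (env.NAME || '').trim();
  if (!name) throw new Error('NAME is required');

  const start = Number.parseInt(env.START_XNUM || '662406', 10);
  const end = Number.parseInt(env.END_XNUM || '662506', 10);
  if (!Number.isInteger(start) || !Number.isInteger(end) || start >= end) {
    throw new Error('START_XNUM must be an integer smaller than END_XNUM');
  }

  return {
    name,
    start, // inclusive
    end, // exclusive
    referkey: env.REFERKEY || '321e56229a1143e',
    output: env.OUTPUT || 'results.json',
  };
}
```

Passing the environment in as a parameter makes the function easy to exercise without touching the real process environment, for example `loadConfig({ NAME: 'revilla' })`.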

On Windows PowerShell, set env vars inline like this:

$env:NAME='revilla'; $env:START_XNUM='662406'; $env:END_XNUM='662506'; npm run extract

Run the Express server

npm start      # node server.js
npm run dev    # nodemon server.js (auto-restart on changes)

The server listens on http://localhost:3000 and exposes:

  • GET / — Fetches a single hard-coded PDF and streams it back to the browser inline.
  • GET /extract?name=<name>&start=<xnum>&end=<xnum> — Walks the xnum range, filters by name, and returns the matches as JSON. All three query params are optional; defaults match the CLI.

Example:

http://localhost:3000/extract?name=revilla&start=662406&end=662506
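
When calling the endpoint from a script, the query string can be assembled with the standard `URL` API so that values are encoded correctly and omitted parameters fall back to the server defaults. `extractUrl` is a hypothetical helper written for this example, and the default origin assumes the server is running locally on port 3000 as described above.

```javascript
// Build a request URL for the local /extract endpoint.
// Parameters left undefined are omitted so that the server-side defaults apply.
function extractUrl({ name, start, end } = {}, origin = 'http://localhost:3000') {
  const url = new URL('/extract', origin);
  for (const [key, value] of Object.entries({ name, start, end })) {
    if (value !== undefined) url.searchParams.set(key, String(value));
  }
  return url.toString();
}

// With the server running, the matches could then be fetched like this (Node.js 18+):
//   const matches = await (await fetch(extractUrl({ name: 'revilla' }))).json();
```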

Project layout

Notes

  • Requests are sequential with no delay; a wide range will take a while and may stress the upstream server.
  • The referkey value is treated as a static query parameter — the same key is reused for every request in the range.
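
If the load on the upstream server is a concern, one option is to pause between requests. The sketch below is not what the repo does today (the notes above say there is no delay); `walkWithDelay` and the caller-supplied `fetchOne` function are hypothetical, and the 250 ms default is an arbitrary starting point.

```javascript
// Resolve after `ms` milliseconds.
const sleep = (ms) => new Promise((resolve) => setTimeout(resolve, ms));

// Visit each xnum in [start, end) one at a time, pausing between requests.
// `fetchOne` is a caller-supplied async function that downloads and parses one report.
async function walkWithDelay(start, end, fetchOne, delayMs = 250) {
  const results = [];
  for (let xnum = start; xnum < end; xnum += 1) {
    try {
      results.push({ xnum, record: await fetchOne(xnum) });
    } catch (err) {
      // Record the failure and keep going instead of aborting the whole range.
      results.push({ xnum, error: String(err) });
    }
    // No pause is needed after the final request.
    if (xnum + 1 < end) await sleep(delayMs);
  }
  return results;
}
```

A delay lengthens the total run time proportionally, so it trades against the workflow timeout when walking large ranges.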
