Scrape Carleton program pages into a structured requirements manifest, so we can later build program templates and decide which non-COMP courses to include.
🧠 Context
This is the program-side counterpart to the course scraper (ticket 08). Its job is to extract, per program, the structured degree requirements: which courses are required, which "choose N credits from this set" groups exist, and how many elective credits of each category are needed. It produces a requirements manifest. Arranging requirements into terms for recommended plans is manual content work and is out of scope here.
This builds on the scraper infrastructure from ticket 09 (the scripts/ folder, cheerio, Node's built-in fetch, and the fixture-based test pattern). Reuse that setup, and add new dependencies only if necessary.
The page to scrape: the Computer Science programs page — https://calendar.carleton.ca/undergrad/undergradprograms/computerscience/ — which lists the CS programs and streams (e.g. "Computer Science B.C.S. Honours", "Computer Science Software Engineering Stream B.C.S. Honours", "Computer Science B.C.S. Major", and several other streams). See the calendar page for the full list.
For the JSON keys, the simplest consistent approach is to lowercase and kebab-case each program name (e.g. "Computer Science Software Engineering Stream B.C.S. Honours" → computer-science-software-engineering-stream-bcs-honours). Any consistent scheme is fine — the exact key format doesn't matter as long as it's stable and unique per program.
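Here is a minimal sketch of that key scheme in TypeScript, assuming program titles are available as plain strings. The helper names programKey and assignKeys are hypothetical. Periods are dropped before kebab-casing so that "B.C.S." becomes "bcs", and the collision check enforces the "unique per program" requirement instead of silently overwriting an entry.

```typescript
// Hypothetical helper: derives a stable JSON key from a program title.
// Periods are dropped first so "B.C.S." collapses to "bcs" rather than "b-c-s".
export function programKey(title: string): string {
  return title
    .toLowerCase()
    .replace(/\./g, "") // "b.c.s." -> "bcs"
    .replace(/[^a-z0-9]+/g, "-") // any other run of non-alphanumerics -> "-"
    .replace(/^-+|-+$/g, ""); // trim leading/trailing dashes
}

// Hypothetical helper: maps each key to its title and fails loudly if two
// different titles collapse to the same key.
export function assignKeys(titles: string[]): Map<string, string> {
  const keys = new Map<string, string>(); // key -> original title
  for (const title of titles) {
    const key = programKey(title);
    const existing = keys.get(key);
    if (existing !== undefined && existing !== title) {
      throw new Error(`Key collision: "${title}" and "${existing}" both map to "${key}"`);
    }
    keys.set(key, title);
  }
  return keys;
}
```

For example, programKey("Computer Science B.C.S. Major") yields computer-science-bcs-major.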
Write it to generalize: other program pages on the calendar share this same structure. We're only scraping CS for now, but structure the code so it can be pointed at another program listing page with minimal change, rather than hard-coding CS-specific assumptions.
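One way to keep the entry point generic is to treat the listing URL and the output path as inputs, with the CS values as defaults. This is a sketch. resolveTarget and ScrapeTarget are hypothetical names. Reading them from process.argv is an assumption about how the script is invoked. The hostname guard assumes every target page lives on calendar.carleton.ca.

```typescript
// Hypothetical entry-point config: the page to scrape and the output file are
// inputs, with the CS listing as the default, so pointing the scraper at another
// program listing page needs no code change.
const DEFAULT_LISTING_URL =
  "https://calendar.carleton.ca/undergrad/undergradprograms/computerscience/";
const DEFAULT_OUTPUT_PATH = "scripts/output/programs-requirements.json";

export interface ScrapeTarget {
  listingUrl: string;
  outputPath: string;
}

// argv is expected to be process.argv.slice(2): [listingUrl?, outputPath?].
export function resolveTarget(argv: readonly string[]): ScrapeTarget {
  const urlArg: string | undefined = argv[0];
  const outputArg: string | undefined = argv[1];
  const url = new URL(urlArg ?? DEFAULT_LISTING_URL); // throws on a malformed URL
  // Assumption: the parser only understands calendar.carleton.ca markup.
  if (url.hostname !== "calendar.carleton.ca") {
    throw new Error(`Unsupported host: ${url.hostname}`);
  }
  return { listingUrl: url.href, outputPath: outputArg ?? DEFAULT_OUTPUT_PATH };
}

// In the script itself (not executed here):
// const target = resolveTarget(process.argv.slice(2));
```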
Output shape — write scripts/output/programs-requirements.json:
{
  "bcs-general": {
    "url": "https://calendar.carleton.ca/...",
    "requiredCourses": ["COMP 1405", "COMP 1406", "..."],
    "chooseGroups": [
      { "credits": 2.0, "courses": ["COMP 3803", "COMP 4001", "COMP 4801", "COMP 4804"] }
    ],
    "electives": [
      { "category": "Breadth Elective", "credits": 5.0 },
      { "category": "Free Elective", "credits": 4.0 }
    ]
  }
}
This shape is a draft — a starting point, not a strict contract. If slightly different fields or structure make more sense once you see the actual page markup, that's fine; just keep it consistent across programs.
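The draft shape above can be pinned down as TypeScript types plus a runtime guard. This is a minimal sketch that assumes the fields stay as drafted. The interface names and isProgramRequirements are hypothetical. A guard like this lets the scraper or its tests reject a malformed entry before the JSON is written.

```typescript
// Hypothetical types mirroring the draft manifest shape; adjust them if the
// shape changes once the real markup has been inspected.
export interface ChooseGroup {
  credits: number;
  courses: string[];
}

export interface Elective {
  category: string;
  credits: number;
}

export interface ProgramRequirements {
  url: string;
  requiredCourses: string[];
  chooseGroups: ChooseGroup[];
  electives: Elective[];
}

// Keyed by program key, e.g. "bcs-general".
export type RequirementsManifest = Record<string, ProgramRequirements>;

const isRecord = (v: unknown): v is Record<string, unknown> =>
  typeof v === "object" && v !== null;

const isStringArray = (v: unknown): v is string[] =>
  Array.isArray(v) && v.every((x) => typeof x === "string");

const isChooseGroup = (v: unknown): v is ChooseGroup =>
  isRecord(v) && typeof v.credits === "number" && isStringArray(v.courses);

const isElective = (v: unknown): v is Elective =>
  isRecord(v) && typeof v.category === "string" && typeof v.credits === "number";

// Runtime check for one manifest entry.
export function isProgramRequirements(v: unknown): v is ProgramRequirements {
  if (!isRecord(v)) return false;
  const { url, requiredCourses, chooseGroups, electives } = v;
  return (
    typeof url === "string" &&
    isStringArray(requiredCourses) &&
    Array.isArray(chooseGroups) &&
    chooseGroups.every(isChooseGroup) &&
    Array.isArray(electives) &&
    electives.every(isElective)
  );
}
```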
🛠️ Implementation Plan
- Create scripts/scrape-programs.ts, run via pnpm run scrape:programs (add the script to package.json).
- Use cheerio for parsing and Node's built-in fetch (or cheerio's equivalent) for HTTP; the course scraper is a good example. Do not add dependencies like axios. If a dependency install is blocked by pnpm-workspace.yaml policy, flag it to Jacc. The existing setup from the course scraper should be reusable here.
- Inspect the Carleton program pages in your browser to understand how required courses, "choose from" groups, and elective credit lines are marked up. Save a real program page (e.g. BCS General) as a fixture under scripts/fixtures/.
- Parse, per program: requiredCourses (flat list of codes), chooseGroups ({ credits, courses[] }), and electives ({ category, credits }).
- Write the manifest to scripts/output/programs-requirements.json.
- Write tests in scripts/scrape-programs.test.ts that run against the saved fixture (no live network). Assert that a known program is extracted with the right required courses and at least one choose-group (e.g. the game development stream has one).
- Run pnpm typecheck and pnpm lint.
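The per-program parsing step can be sketched as a pure function over the text of each requirement block. That way it can be unit-tested on fixture text with no network access and no DOM. The sketch rests on several assumptions. The phrasing it recognizes is a guess to be checked against the saved fixture:

- "N credits from:" marks a choose group.
- "N credits in:" followed by course codes marks required courses.
- "N credits in ... electives" marks an elective line.

Course codes are assumed to be three or four capital letters followed by four digits. classifyRequirement and buildProgramEntry are hypothetical names.

```typescript
// Hypothetical sketch: classify the text of one requirement block (e.g. the
// joined .text() of a header row and its course rows, extracted with cheerio)
// and fold the results into one program entry. Wording patterns are ASSUMPTIONS:
//   "2.0 credits from: COMP 3803, COMP 4001, ..."  -> choose group
//   "3.5 credits in: COMP 1405, COMP 1406, ..."    -> required courses
//   "4.0 credits in free electives"                -> elective (no course codes)

type Requirement =
  | { kind: "required"; courses: string[] }
  | { kind: "choose"; credits: number; courses: string[] }
  | { kind: "elective"; category: string; credits: number };

interface ProgramEntry {
  url: string;
  requiredCourses: string[];
  chooseGroups: { credits: number; courses: string[] }[];
  electives: { category: string; credits: number }[];
}

const CODE = /\b([A-Z]{3,4})\s+(\d{4})\b/g; // any subject prefix, not just COMP
const CREDITS = /(\d+(?:\.\d+)?)\s+credits?\b/i;

function courseCodes(text: string): string[] {
  // \s also matches non-breaking spaces; rebuild each code with a plain space
  // and de-duplicate while keeping first-seen order.
  return [...new Set([...text.matchAll(CODE)].map((m) => `${m[1]} ${m[2]}`))];
}

export function classifyRequirement(text: string): Requirement | null {
  // Look for "N credits" and "from" only in the header (the text before the
  // first course code), so words inside course titles cannot trigger a match.
  const firstCode = text.search(/\b[A-Z]{3,4}\s+\d{4}\b/);
  const header = firstCode === -1 ? text : text.slice(0, firstCode);
  const creditMatch = CREDITS.exec(header);
  if (creditMatch === null) return null; // not a credit-bearing line
  const credits = Number(creditMatch[1]);
  const courses = courseCodes(text);

  if (courses.length > 0) {
    return /\bfrom\b/i.test(header)
      ? { kind: "choose", credits, courses }
      : { kind: "required", courses };
  }
  if (/elective/i.test(header)) {
    // Category = whatever follows "N credits in/of", minus trailing punctuation.
    const category = header
      .replace(/^.*?\d+(?:\.\d+)?\s+credits?\s+(?:in|of)\s+/i, "")
      .replace(/[\s.:;]+$/, "");
    return { kind: "elective", category, credits };
  }
  return null; // unrecognized wording
}

export function buildProgramEntry(url: string, blocks: string[]): ProgramEntry {
  const entry: ProgramEntry = { url, requiredCourses: [], chooseGroups: [], electives: [] };
  for (const block of blocks) {
    const req = classifyRequirement(block);
    if (req === null) continue; // skipped here; a real scraper might log these
    if (req.kind === "required") entry.requiredCourses.push(...req.courses);
    else if (req.kind === "choose") entry.chooseGroups.push({ credits: req.credits, courses: req.courses });
    else entry.electives.push({ category: req.category, credits: req.credits });
  }
  return entry;
}
```

Lines the classifier does not recognize return null and are skipped. Logging them instead would make it easier to spot markup that the assumed patterns miss.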
✅ Acceptance Criteria
- pnpm run scrape:programs runs and writes scripts/output/programs-requirements.json
- Entries include url, requiredCourses, chooseGroups, and electives in the shape above, or the updated shape has been documented
- requiredCourses are course code strings; chooseGroups carry credits + a courses list; electives carry category + credits
- pnpm typecheck passes