feat(scraper): add courses scraping#85

Merged
chenxin-yan merged 14 commits into main from feat/course-scraping
Nov 16, 2025

Conversation

@xyspg
Collaborator

@xyspg xyspg commented Nov 8, 2025

📌 What's Changed

Scraper for courses. Closes #7.

✅ Actions

  • [ ]

📝 Notes for Reviewer

I tried Firecrawl. For simple tasks like discoverCourses it works well, but it isn't needed because a plain fetch does the job. For scraping each course, calling it 10K+ times could get very expensive.
Cloudflare Queues has a batch request limit, so I added pagination logic, which may also be helpful for other scrapers.
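
For reference, the batching idea can be sketched roughly as below. This is a minimal illustration, not the code in this PR: `MAX_BATCH_SIZE`, `BatchQueue`, `chunk`, and `enqueueAll` are hypothetical names, and the cap of 100 messages per call is an assumption that should be checked against the current Cloudflare Queues limits.

```typescript
// Assumed per-call cap on messages; verify against the current Cloudflare Queues limits.
const MAX_BATCH_SIZE = 100;

// Minimal shape of a queue producer, declared here so the sketch is self-contained.
interface BatchQueue<T> {
  sendBatch(messages: { body: T }[]): Promise<void>;
}

// Split `items` into consecutive chunks of at most `size` elements.
function chunk<T>(items: readonly T[], size: number): T[][] {
  if (!Number.isInteger(size) || size <= 0) {
    throw new RangeError("size must be a positive integer");
  }
  const chunks: T[][] = [];
  for (let i = 0; i < items.length; i += size) {
    chunks.push(items.slice(i, i + size));
  }
  return chunks;
}

// Enqueue every URL with one sendBatch call per chunk; returns the number of calls made.
async function enqueueAll(
  queue: BatchQueue<string>,
  urls: readonly string[],
): Promise<number> {
  let calls = 0;
  for (const batch of chunk(urls, MAX_BATCH_SIZE)) {
    await queue.sendBatch(batch.map((url) => ({ body: url })));
    calls += 1;
  }
  return calls;
}
```

Because `chunk` is generic, other scrapers could reuse it for their own message types.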

I added "Non-School Based Programs - UG" to the school names, as the beta class search suggests (the Albert class search is down).

CleanShot 2025-11-08 at 03 22 30@2x

Should we add a note to prerequisites to display things like "with a Minimum Grade of C" and "and not open to students who have already completed CSCI-UA 467."?
Or are we going to do more sophisticated logic, like handling them automatically based on the student's past grades and courses?

@chenxin-yan
Member

I think it's fine for now not to include "with a Minimum Grade of C", as that is basically the requirement for any class to count toward a prerequisite. So when handling the logic, any course with a grade lower than C will not count.
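
A rough sketch of that rule, under stated assumptions: the grade scale in `GRADE_ORDER` and the names `CompletedCourse`, `countsTowardPrereq`, and `eligibleCourseCodes` are hypothetical and not taken from the codebase.

```typescript
// Letter grades ordered from best to worst; this scale is an assumption.
const GRADE_ORDER = ["A", "A-", "B+", "B", "B-", "C+", "C", "C-", "D+", "D", "F"] as const;
type Grade = (typeof GRADE_ORDER)[number];

interface CompletedCourse {
  code: string; // e.g. "CSCI-UA 467"
  grade: Grade;
}

// A completed course counts toward prerequisites only at `minimum` (default C) or better.
function countsTowardPrereq(course: CompletedCourse, minimum: Grade = "C"): boolean {
  return GRADE_ORDER.indexOf(course.grade) <= GRADE_ORDER.indexOf(minimum);
}

// Codes of the completed courses that can satisfy a prerequisite.
function eligibleCourseCodes(history: readonly CompletedCourse[]): Set<string> {
  return new Set(history.filter((c) => countsTowardPrereq(c)).map((c) => c.code));
}
```

With this shape, the scraper does not need to store the minimum-grade note; the check lives in one place on the consumer side.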

@chenxin-yan chenxin-yan force-pushed the main branch 3 times, most recently from 7103a52 to 70e93ad Compare November 9, 2025 03:31
@chenxin-yan chenxin-yan changed the title feat(scraper): course scraping feat(scraper): add courses scraping Nov 10, 2025
@chenxin-yan chenxin-yan marked this pull request as ready for review November 10, 2025 23:05
Member

@chenxin-yan chenxin-yan left a comment

It works really well. Good job!

Just a few things we should finalize:

Do you think we should make the level field in courses capture whether the course is a graduate or undergraduate course? Would that be more helpful? Then, for getCourses, we can make level optional and use it to filter for undergraduate or graduate courses.

I am planning to add a level field to the student table, inferred from the school they filled out during onboarding, so that we can expose only undergrad courses to undergrad students.

Do you think we should add a short break between course scrapes, so that we don't accidentally overload NYU's server?

After these are addressed we can merge this.
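
On the short break between scrapes, one possible shape is sketched below. It is an illustration only: `sleep`, `scrapeSequentially`, the injected `scrapeOne` callback, and the 500 ms default are all hypothetical, not what the PR implements.

```typescript
// Resolve after `ms` milliseconds.
const sleep = (ms: number): Promise<void> =>
  new Promise((resolve) => setTimeout(resolve, ms));

// Scrape each URL in order, pausing `delayMs` between consecutive requests.
async function scrapeSequentially<T>(
  urls: readonly string[],
  scrapeOne: (url: string) => Promise<T>,
  delayMs = 500, // hypothetical default; tune to what the server tolerates
  pause: (ms: number) => Promise<void> = sleep, // injectable for tests
): Promise<T[]> {
  const results: T[] = [];
  let first = true;
  for (const url of urls) {
    if (!first) await pause(delayMs); // no pause before the first request
    first = false;
    results.push(await scrapeOne(url));
  }
  return results;
}
```

If the scrapes run as separate queue consumers rather than in one loop, the same effect would need to come from the queue's own delivery settings instead.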

@xyspg
Collaborator Author

xyspg commented Nov 10, 2025

I think there is no need to display only freshman courses or only sophomore courses, but it does matter to distinguish between undergraduate and graduate courses. If we stick to the official naming:

1XXX - Freshman Level
2XXX - Sophomore Level
3XXX - Junior Level
4XXX - Senior Level
5XXX to 9XXX - Graduate level

then we can make the API accept a range of levels, e.g. defaulting to 100 <= x < 500 for undergraduate. For edge cases, like undergraduate students taking a graduate class (I don't know, just guessing) and vice versa, we could add a toggle in the settings to display all courses.

#93 is blocked by this. The frontend is now calling getCourses with a level parameter.

@chenxin-yan
Member

chenxin-yan commented Nov 10, 2025

Yes, I think we could keep it simple by just having a union of the string literals "undergraduate" and "graduate", just like the rest of the tables. I haven't looked into it, but does the bulletin page mark whether the courses on the page are undergraduate or graduate? Can we just use that? Then we don't have to maintain our own logic for determining the level, which keeps it more consistent.

Regarding the students table, I changed it in #53 so that the school field contains an Id<"schools">, and for getCurrentStudent the school record is joined to the student record, so we can get the currently authenticated student's level through student.school.level, which is either graduate or undergraduate.
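
To make the proposal concrete, here is a minimal in-memory sketch of the literal union and the optional level filter. It is not the actual schema or query: `Course`, `Student`, `filterCourses`, and `visibleCourses` are hypothetical names, and only the `"undergraduate"`/`"graduate"` literals and the `student.school.level` path come from the discussion.

```typescript
type Level = "undergraduate" | "graduate";

interface Course {
  code: string;
  level: Level;
}

// Student with the school record already joined in, as described for getCurrentStudent.
interface Student {
  school: { level: Level };
}

// Return all courses, or only those at `level` when one is given.
function filterCourses(courses: readonly Course[], level?: Level): Course[] {
  return level === undefined ? [...courses] : courses.filter((c) => c.level === level);
}

// Courses shown to a student by default: those matching their school's level.
function visibleCourses(student: Student, courses: readonly Course[]): Course[] {
  return filterCourses(courses, student.school.level);
}
```

Keeping `level` optional means callers that want every course, such as an admin view or a "show all" toggle, simply omit it.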

@xyspg
Collaborator Author

xyspg commented Nov 10, 2025

It doesn't, but we can tell from the course code: UA = Undergraduate Arts, GA = Graduate Arts, etc.
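
A sketch of how that could look, with one loud assumption: it generalizes from the two examples above and treats any two-letter suffix starting with "U" as undergraduate and any starting with "G" as graduate. Suffixes that fit neither pattern return `null` so they can be handled explicitly. `levelFromCourseCode` is a hypothetical name.

```typescript
type Level = "undergraduate" | "graduate";

// Infer the level from a code such as "CSCI-UA 467" (subject "CSCI", suffix "UA").
// Assumption: the first letter of the suffix encodes the level (U or G).
function levelFromCourseCode(code: string): Level | null {
  const match = /^[A-Z0-9]+-([A-Z]{2})\b/.exec(code.trim().toUpperCase());
  const suffix = match?.[1];
  if (!suffix) return null; // code does not look like SUBJECT-XX
  if (suffix.startsWith("U")) return "undergraduate";
  if (suffix.startsWith("G")) return "graduate";
  return null; // unknown suffix: leave the decision to the caller
}
```

Returning `null` rather than guessing keeps unexpected suffixes visible during scraping instead of silently mislabeling them.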

@chenxin-yan
Member

Great. Then we should make the change.

@chenxin-yan
Member

I just changed the schema so that level in courses indicates whether a course is an "undergraduate" or a "graduate" course.

@chenxin-yan chenxin-yan force-pushed the main branch 3 times, most recently from 551a188 to 78a92a9 Compare November 12, 2025 09:40
@chenxin-yan chenxin-yan merged commit c7af17b into main Nov 16, 2025
2 checks passed
@chenxin-yan chenxin-yan deleted the feat/course-scraping branch November 16, 2025 01:43
Development

Successfully merging this pull request may close these issues.

implement scraper for courses on bulletin
