feat(scraper): add courses scraping #85
I think it's fine for now not to include "with a Minimum Grade of C", as it's basically the requirement for any class to count toward a prereq. So when handling the logic, any course with a grade lower than C will not count.
7103a52 to 70e93ad
It works really well. Good job!
Just a few things we could finalize:
Do you think we should add a level field to courses to capture whether a course is graduate or undergraduate? Would that be more helpful? Then, for getCourses, we can make level an optional parameter used to filter for undergraduate or graduate courses.
I am planning to add a level field to the student table, inferred from the school they filled out during onboarding, so that we can expose only undergrad courses to undergrad students.
Do you think we should add a short break between course scrapes, so that we don't accidentally overload NYU's server?
After these are addressed we can merge this.
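A minimal sketch of the "short break between course scrapes" idea, for discussion. The names `sleep`, `scrapeSequentially`, and `scrapeOne`, and the 500 ms default, are assumptions for illustration and are not taken from this PR's code.

```typescript
// Hypothetical helper: resolves after `ms` milliseconds.
function sleep(ms: number): Promise<void> {
  return new Promise<void>((resolve) => setTimeout(resolve, ms));
}

// Scrapes items one at a time, pausing between consecutive requests.
// `scrapeOne` stands in for whatever function scrapes a single course page.
async function scrapeSequentially<T, R>(
  items: T[],
  scrapeOne: (item: T) => Promise<R>,
  delayMs: number = 500, // assumed value, tune to what the server tolerates
): Promise<R[]> {
  const results: R[] = [];
  let first = true;
  for (const item of items) {
    // No pause before the first request, one pause before every later one.
    if (!first) await sleep(delayMs);
    first = false;
    results.push(await scrapeOne(item));
  }
  return results;
}
```

Running the requests sequentially keeps at most one request in flight, so the delay directly bounds the request rate.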
I think there is no need to display only freshman or sophomore courses, but it does matter to distinguish between undergraduate and graduate courses. If we stick to the official naming:
then we can make the API accept a range of
#93 is blocked by this. The frontend is now calling
Yes, I think we could keep it simple by just having a union of the string literals "undergraduate" and "graduate", just like the rest of the tables. I haven't looked into it, but does the bulletin page mark whether the courses on the page are undergraduate or graduate? Can we just use that, so we don't have to maintain our own logic for determining the level and it stays more consistent? Regarding
It doesn't, but we can tell from the course code: UA = Undergraduate Arts, GA = Graduate Arts, etc.
Great. Then we should make the change.
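A rough sketch of inferring the level from the course code, as suggested above. Only UA and GA are confirmed in this thread. Treating any school suffix starting with "U" as undergraduate and "G" as graduate is an assumption, and `inferLevel`, `Course`, and `filterByLevel` are hypothetical names, not the PR's actual schema or API.

```typescript
type CourseLevel = "undergraduate" | "graduate";

// Assumes codes shaped like "CSCI-UA 467": subject, two-letter school suffix, number.
// Returns null when the code does not match or the suffix is not recognized.
function inferLevel(courseCode: string): CourseLevel | null {
  const match = /^[A-Z]+-([A-Z]{2})\s+\d+/.exec(courseCode.trim());
  if (!match) return null;
  const suffix = match[1] ?? "";
  // Assumption: first letter of the suffix encodes the level (UA, GA, ...).
  if (suffix.charAt(0) === "U") return "undergraduate";
  if (suffix.charAt(0) === "G") return "graduate";
  return null;
}

// Hypothetical course shape, just enough for the filter below.
interface Course {
  code: string;
  level: CourseLevel;
}

// Optional `level`: when omitted, all courses are returned unfiltered.
function filterByLevel(courses: Course[], level?: CourseLevel): Course[] {
  if (level === undefined) return courses;
  return courses.filter((course) => course.level === level);
}
```

Returning null for unrecognized suffixes, rather than guessing, leaves room to handle schools whose codes do not follow this pattern.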
I just changed the schema so that
551a188 to 78a92a9
📌 What's Changed
Scraper for courses. Closes #7.
✅ Actions
📝 Notes for Reviewer
I tried Firecrawl. For simple tasks like discoverCourses it seems to work well, but there is no need for it because a simple fetch can do the work. For scraping each course, it might be very expensive to call it 10K+ times.
There is a batch request limit for Cloudflare queues, so I added the relevant pagination logic, which may also be helpful for other scrapers.
I added Non-School Based Programs - UG to the school names, as the beta class search suggests (the Albert class search is down).
Should we add a note in prerequisites to display text like "with a Minimum Grade of C" or "and not open to students who have already completed CSCI-UA 467."?
Or are we doing more sophisticated logic, like handling them automatically based on the student's past grades and courses?
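For readers unfamiliar with the batch limit mentioned in the notes above, here is a minimal sketch of splitting work into fixed-size batches before enqueueing. `chunk`, `enqueueInBatches`, the `send` callback, and the default batch size of 100 are assumptions for illustration. They are not this PR's pagination code, and the actual limit should be checked against the Cloudflare Queues documentation.

```typescript
// Splits `items` into consecutive batches of at most `size` elements.
function chunk<T>(items: T[], size: number): T[][] {
  if (Math.floor(size) !== size || size < 1) {
    throw new RangeError("size must be a positive integer");
  }
  const batches: T[][] = [];
  for (let start = 0; start < items.length; start += size) {
    batches.push(items.slice(start, start + size));
  }
  return batches;
}

// Sends every batch in order. `send` is a hypothetical callback that would
// wrap the real queue producer call; it is not a Cloudflare API signature.
async function enqueueInBatches<T>(
  items: T[],
  send: (batch: T[]) => Promise<void>,
  maxBatchSize: number = 100, // assumed limit, verify before relying on it
): Promise<number> {
  const batches = chunk(items, maxBatchSize);
  for (const batch of batches) {
    await send(batch);
  }
  return batches.length; // number of batch requests made
}
```

Keeping the chunking separate from the sending makes the same helper reusable by other scrapers, which matches the intent stated in the notes.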