Skip to content
This repository has been archived by the owner on Apr 26, 2024. It is now read-only.

Implemented course scraping #94

Merged

Conversation

ThatJuanGuy
Copy link
Collaborator

…_instructor functions

There were some data fields I couldn't find in the JSON. I just left these as empty strings or False for now. Will add comments over the code to show where these are.

@ThatJuanGuy ThatJuanGuy added the backend Anything related to the backend API/Django label Nov 19, 2019
@ThatJuanGuy ThatJuanGuy self-assigned this Nov 19, 2019
section_term_code = course['term']

section_min_credits = course['creditHourLow']
section_max_credits = course['creditHourHigh']
Copy link
Collaborator Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

This is different between the course and section models. Course just has one field for credit hours but section has 2. Should this be consistent across both?

Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

It probably should, seeing as on our website we'll probably want to display the credit hours for a course overall (maybe in the course search, definitely when viewing schedule overviews). As for how we would implement this, whenever adding a section, we could update the corresponding course's minimum credits.

@gannonprudhomme
Copy link
Member

Honestly I had no idea you were doing the entire thing, but great work! In that case, this should be merging into backend/master rather than backend/scraper/course-scheduling, since this completes the entirety of #44

def parse_instructor(instructor):
" parse_course() must be done first, and pass the data for the instructor here "
#creates and saves section object
s = Section(id=section_id, subject=section_subject, course_num=course_number, section_num=section_number, term_code=section_term_code, min_credits=section_min_credits, max_credits=section_max_credits, max_enrollment=section_max_enrollment, current_enrollment=section_current_enrollment, instructor=section_instructor)
Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

You'll want to run pylint on this file (message me on Slack if you need help setting it up), as there are a few lines that it wants fixed (for example, this one gives the message Line too long (216/90))

Copy link
Collaborator

@rachelconn rachelconn left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

A lot of this looks good, but I can't get it to run on my computer: even with the changes I proposed it gives me an error django.db.utils.DataError: value too long for type character varying(10). Have you tried running this code locally, and making sure it fills the tables correctly?

@gannonprudhomme gannonprudhomme changed the title Wrote most of the parse_course,parse_section,parse_meeting, and parse… Implemented course scraping Nov 20, 2019
@gannonprudhomme gannonprudhomme changed the base branch from backend/scraper/course-scraping to backend/master November 22, 2019 16:46
@gannonprudhomme
Copy link
Member

Also, since we have no immediate plans to use prerequisites, corequisites, cd, or icd, I say we can remove them from the Course model. If we need them in the future we can always add them.

@gannonprudhomme gannonprudhomme force-pushed the backend/scraper/course-scraping-parse-course branch from ab0ba35 to 0b48dca Compare November 25, 2019 06:48
@gannonprudhomme

This comment has been minimized.

@gannonprudhomme

This comment has been minimized.

i = Instructor(id=instructor_id, email_address=instructor_email, name=instructor_name)
i.save()
#calls parse_section
parse_section(course,i)
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Since the faculty block is like its own part of the section input, I think parse_instructor should return the Instructor model, and instead of calling parse_section in it, you can call it within parse_course.

ThatJuanGuy and others added 20 commits February 4, 2020 18:38
-ran pylint. fixed all problem messages except for some "missing docstring"
-commented out print statement at the end as well as related lines. Didn't outright delete them in case they are helpful for future debugging.
-removed meeting count and replaced with enumerate
-generate_meeting_time now checks for empty string and returns None if that is the case. moved function to scrape_courses.py
-removed name field for instructor because name was used as the primary id
-the loop that finds meeting class days was replaced with a function called generate_meeting_days. the method uses list comprehension.
-removed course_desc and course_core_curriculum fields from course model
-primary id for course maxlength is now 15 instead of 14
-meeting type is now 4 characters long
-generate_meeting_id now uses section id instead of crn
General:
Debugged scraper so it runs for all departments.

Detailed:
scrape_courses.py
-generate_meeting_time is now convert_meeting_time and checks for "" and None
-generate_meeting_days is now parse_meeting_days
-added the elapsed time back in

section.py
-course number changed to CharField in section.py. Now it matches with course.py and doesn't crash the program
-section number changed to CharField
-instructor can now be null
-building max_length switched to 5 from 4
-null instructor is now handled better in sections
-made Adel's requested change and depts now queries database instead of making banner request
Also removed some unneeded comments and elaborated in a few places
Shortened variables like "section_subject" => "subject"

Also changed / elaborated on some comments

Also added return / parameter types for model arguments (i.e. Instructor, Section)
This makes it so that rather than choosing the first instructor in the list of faculty, it saves the primary one.
Changed parse_meeting_days to return list comprehension immediately

Changed instructor name dict retrieval to .get()
Removes parentheses from multi-line strings

Changed dept_name default to correctly catch errors
Renamed some functions, better use of assertions

Whitespace/comment fixes
Fixed query for csce section test
@gannonprudhomme gannonprudhomme force-pushed the backend/scraper/course-scraping-parse-course branch from 9ce20f4 to 66cd8fb Compare February 5, 2020 00:44
@gannonprudhomme
Copy link
Member

First, I rebased this onto backend/master so we would have the reset migrations fix for Actions, then I cherry-picked all of the necessary commits so there weren't any merges / duplicate commits. Should be good to go now

Copy link
Collaborator

@rachelconn rachelconn left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Looks good now :feelsgood:

@gannonprudhomme gannonprudhomme dismissed stale reviews from firejake308 and themself February 5, 2020 00:54

addressed changes

@gannonprudhomme gannonprudhomme merged commit 1e69238 into backend/master Feb 5, 2020
@gannonprudhomme
Copy link
Member

FINALLY MUAHAHAHAH

@gannonprudhomme gannonprudhomme deleted the backend/scraper/course-scraping-parse-course branch February 5, 2020 00:55
Sign up for free to subscribe to this conversation on GitHub. Already have an account? Sign in.
Labels
backend Anything related to the backend API/Django
Projects
None yet
Development

Successfully merging this pull request may close these issues.

None yet

4 participants