Implemented course scraping #94
Implemented course scraping #94
Conversation
section_term_code = course['term'] | ||
|
||
section_min_credits = course['creditHourLow'] | ||
section_max_credits = course['creditHourHigh'] |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
This is different between the course and section models. Course just has one field for credit hours but section has 2. Should this be consistent across both?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
It probably should, seeing as on our website we'll probably want to display the credit hours for a course overall (maybe in the course search, definitely when viewing schedule overviews). As for how we would implement this, whenever adding a section, we could update the corresponding course's minimum credits.
Honestly I had no idea you were doing the entire thing, but great work! In that case, this should be merging into |
def parse_instructor(instructor): | ||
" parse_course() must be done first, and pass the data for the instructor here " | ||
#creates and saves section object | ||
s = Section(id=section_id, subject=section_subject, course_num=course_number, section_num=section_number, term_code=section_term_code, min_credits=section_min_credits, max_credits=section_max_credits, max_enrollment=section_max_enrollment, current_enrollment=section_current_enrollment, instructor=section_instructor) |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
You'll want to run pylint on this file (message me on Slack if you need help setting it up), as there are a few lines that it wants fixed (for example, this one gives the message Line too long (216/90)
)
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
A lot of this looks good, but I can't get it to run on my computer: even with the changes I proposed it gives me an error django.db.utils.DataError: value too long for type character varying(10)
. Have you tried running this code locally, and making sure it fills the tables correctly?
Also, since we have no immediate plans to use |
ab0ba35
to
0b48dca
Compare
This comment has been minimized.
This comment has been minimized.
This comment has been minimized.
This comment has been minimized.
i = Instructor(id=instructor_id, email_address=instructor_email, name=instructor_name) | ||
i.save() | ||
#calls parse_section | ||
parse_section(course,i) |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Since the faculty block is like its own part of the section input, I think parse_instructor
should return the Instructor model, and instead of calling parse_section in it, you can call it within parse_course
.
-ran pylint. fixed all problem messages except for some "missing docstring" -commented out print statement at the end as well as related lines. Didn't outright delete them in case they are helpful for future debugging. -removed meeting count and replaced with enumerate -generate_meeting_time now checks for empty string and returns None if that is the case. moved function to scrape_courses.py -removed name field for instructor because name was used as the primary id -the loop that finds meeting class days was replaced with a function called generate_meeting_days. the method uses list comprehension. -removed course_desc and course_core_curriculum fields from course model -primary id for course maxlength is now 15 instead of 14 -meeting type is now 4 characters long -generate_meeting_id now uses section id instead of crn
General: Debugged scraper so it runs for all departments. Detailed: scrape_courses.py -generate_meeting_time is now convert_meeting_time and checks for "" and None -generate_meeting_days is now parse_meeting_days -added the elapsed time back in section.py -course number changed to CharField in section.py. Now it matches with course.py and doesn't crash the program -section number changed to CharField -instructor can now be null -building max_length switched to 5 from 4
-null instructor is now handled better in sections -made Adel's requested change and depts now queries database instead of making banner request
Also removed some unneeded comments and elaborated in a few places
Shortened variables like "section_subject" => "subject" Also changed / elaborated on some comments Also added return / parameter types for model arguments (i.e. Instructor, Section)
This makes it so that rather than choosing the first instructor in the list of faculty, it saves the primary one.
Changed parse_meeting_days to return list comprehension immediately Changed instructor name dict retrieval to .get()
Removes parentheses from multi-line strings Changed dept_name default to correctly catch errors
Fixed assertions
Renamed some functions, better use of assertions Whitespace/comment fixes
Fixed query for csce section test
9ce20f4
to
66cd8fb
Compare
First, I rebased this onto |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Looks good now
addressed changes
FINALLY MUAHAHAHAH |
…_instructor functions
There were some data fields I couldn't find in the JSON. I just left these as empty strings or False for now. Will add comments over the code to show where these are.