Implemented course scraping #94

ThatJuanGuy · 2019-11-19T19:04:12Z

…_instructor functions

There were some data fields I couldn't find in the JSON. I just left these as empty strings or False for now. Will add comments over the code to show where these are.

autoscheduler/scraper/management/commands/scrape_courses.py

ThatJuanGuy · 2019-11-19T19:38:27Z

autoscheduler/scraper/management/commands/scrape_courses.py

+    section_term_code = course['term']
+
+    section_min_credits = course['creditHourLow']
+    section_max_credits = course['creditHourHigh']


This is different between the course and section models. Course just has one field for credit hours but section has 2. Should this be consistent across both?

It probably should, seeing as on our website we'll probably want to display the credit hours for a course overall (maybe in the course search, definitely when viewing schedule overviews). As for how we would implement this, whenever adding a section, we could update the corresponding course's minimum credits.
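A rough sketch of what that update could look like, using plain dataclasses as stand-ins for the Django models so it runs on its own. The `min_credits`/`max_credits` fields on `Course` and the `add_section` helper are hypothetical and not part of this PR; widening the maximum as well as the minimum is an extra assumption.

```python
from dataclasses import dataclass
from typing import Optional


@dataclass
class Course:
    """Stand-in for the Course model; both credit fields are hypothetical."""
    min_credits: Optional[float] = None
    max_credits: Optional[float] = None


@dataclass
class Section:
    """Stand-in for the Section model, keeping only the credit fields."""
    min_credits: float
    max_credits: float


def add_section(course: Course, section: Section) -> None:
    """Widen the course's credit range so it covers the new section."""
    if course.min_credits is None or section.min_credits < course.min_credits:
        course.min_credits = section.min_credits
    if course.max_credits is None or section.max_credits > course.max_credits:
        course.max_credits = section.max_credits


course = Course()
add_section(course, Section(min_credits=3, max_credits=3))
add_section(course, Section(min_credits=1, max_credits=4))
print(course.min_credits, course.max_credits)
```

In the real scraper the same comparison would presumably happen right before the section is saved, followed by saving the course.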


gannonprudhomme · 2019-11-20T03:49:54Z

Honestly I had no idea you were doing the entire thing, but great work! In that case, this should be merging into backend/master rather than backend/scraper/course-scheduling, since this completes the entirety of #44

rachelconn · 2019-11-20T03:56:10Z

autoscheduler/scraper/management/commands/scrape_courses.py

-def parse_instructor(instructor):
-    " parse_course() must be done first, and pass the data for the instructor here "
+    #creates and saves section object
+    s = Section(id=section_id, subject=section_subject, course_num=course_number, section_num=section_number, term_code=section_term_code, min_credits=section_min_credits, max_credits=section_max_credits, max_enrollment=section_max_enrollment, current_enrollment=section_current_enrollment, instructor=section_instructor)


You'll want to run pylint on this file (message me on Slack if you need help setting it up), as there are a few lines that it wants fixed (for example, this one gives the message Line too long (216/90))
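For the line-length message specifically, one common fix is to put one keyword argument per line. The sketch below uses a dataclass stand-in for `Section` and placeholder values (both are assumptions made only so the example runs without Django); the keyword names are the ones from the diff above.

```python
from dataclasses import dataclass
from typing import Optional


@dataclass
class Section:
    """Stand-in for the Django Section model so the example runs on its own."""
    id: int
    subject: str
    course_num: str
    section_num: str
    term_code: str
    min_credits: float
    max_credits: Optional[float]
    max_enrollment: int
    current_enrollment: int
    instructor: Optional[str]


# One keyword argument per line keeps each line far below the 90-character limit.
# All values here are placeholders, not real scraped data.
s = Section(
    id=1,
    subject="CSCE",
    course_num="121",
    section_num="501",
    term_code="201931",
    min_credits=4,
    max_credits=None,
    max_enrollment=50,
    current_enrollment=42,
    instructor=None,
)
```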

rachelconn

A lot of this looks good, but I can't get it to run on my computer: even with the changes I proposed it gives me an error django.db.utils.DataError: value too long for type character varying(10). Have you tried running this code locally, and making sure it fills the tables correctly?
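One way to narrow down which column triggers this kind of error is to compare each string against its length limit before saving. This is only a sketch: `find_overlong_values`, the limits, and the sample row are all hypothetical, and it does not claim to know which field is actually the `varying(10)` one.

```python
from typing import Dict, List, Tuple


def find_overlong_values(
    values: Dict[str, str], max_lengths: Dict[str, int]
) -> List[Tuple[str, int, int]]:
    """Return (field, actual_length, max_length) for each value that is too long."""
    return [
        (field, len(value), max_lengths[field])
        for field, value in values.items()
        if field in max_lengths and len(value) > max_lengths[field]
    ]


# Hypothetical limits and row; real limits come from the models' max_length settings.
limits = {"id": 10, "subject": 4}
row = {"id": "CSCE-121-501", "subject": "CSCE"}
print(find_overlong_values(row, limits))
```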


gannonprudhomme · 2019-11-22T17:09:50Z

Also, since we have no immediate plans to use prerequisites, corequisites, cd, or icd, I say we can remove them from the Course model. If we need them in the future we can always add them.

autoscheduler/scraper/management/commands/scrape_courses.py

gannonprudhomme · 2019-11-25T23:51:01Z

autoscheduler/scraper/management/commands/scrape_courses.py

+        i = Instructor(id=instructor_id, email_address=instructor_email, name=instructor_name)
+        i.save()
+        #calls parse_section
+        parse_section(course,i)


Since the faculty block is like its own part of the section input, I think parse_instructor should return the Instructor model, and instead of calling parse_section in it, you can call it within parse_course.
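A minimal sketch of that structure, with dataclasses standing in for the models and no database calls (the real code would also call `.save()`). The JSON keys `faculty`, `bannerId`, `emailAddress`, `displayName`, and `id` are assumptions about the input, not taken from this PR.

```python
from dataclasses import dataclass
from typing import Any, Dict, Optional


@dataclass
class Instructor:
    """Stand-in for the Instructor model."""
    id: str
    email_address: Optional[str]
    name: Optional[str]


@dataclass
class Section:
    """Stand-in for the Section model, reduced to two fields."""
    id: int
    instructor: Optional[Instructor]


def parse_instructor(course: Dict[str, Any]) -> Optional[Instructor]:
    """Build an Instructor from the faculty block, or None if it is empty."""
    faculty = course.get("faculty") or []
    if not faculty:
        return None
    first = faculty[0]  # assumed key names below
    return Instructor(
        id=first["bannerId"],
        email_address=first.get("emailAddress"),
        name=first.get("displayName"),
    )


def parse_section(course: Dict[str, Any], instructor: Optional[Instructor]) -> Section:
    """Build a Section that points at an already-parsed instructor."""
    return Section(id=course["id"], instructor=instructor)


def parse_course(course: Dict[str, Any]) -> Section:
    """parse_course drives both steps; parse_instructor no longer calls parse_section."""
    instructor = parse_instructor(course)
    return parse_section(course, instructor)


sample = {"id": 1, "faculty": [{"bannerId": "123", "emailAddress": "a@b.edu"}]}
section = parse_course(sample)
```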

-ran pylint. fixed all problem messages except for some "missing docstring"
-commented out print statement at the end as well as related lines. Didn't outright delete them in case they are helpful for future debugging.
-removed meeting count and replaced with enumerate
-generate_meeting_time now checks for empty string and returns None if that is the case. moved function to scrape_courses.py
-removed name field for instructor because name was used as the primary id
-the loop that finds meeting class days was replaced with a function called generate_meeting_days. the method uses list comprehension.
-removed course_desc and course_core_curriculum fields from course model
-primary id for course maxlength is now 15 instead of 14
-meeting type is now 4 characters long
-generate_meeting_id now uses section id instead of crn
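To illustrate the enumerate change, here is a small sketch. The id scheme (section id followed by the meeting's position) and the `meeting_ids` helper are hypothetical; the commit only says that `generate_meeting_id` now uses the section id instead of the CRN.

```python
from typing import Any, Dict, List


def generate_meeting_id(section_id: str, meeting_index: int) -> str:
    """Hypothetical id scheme: section id followed by the meeting's position."""
    return f"{section_id}{meeting_index}"


def meeting_ids(section_id: str, meetings: List[Dict[str, Any]]) -> List[str]:
    """Use enumerate for the index instead of a hand-maintained counter."""
    return [generate_meeting_id(section_id, i) for i, _ in enumerate(meetings)]


print(meeting_ids("4971", [{}, {}]))
```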

General: Debugged scraper so it runs for all departments.
Detailed: scrape_courses.py
-generate_meeting_time is now convert_meeting_time and checks for "" and None
-generate_meeting_days is now parse_meeting_days
-added the elapsed time back in section.py
-course number changed to CharField in section.py. Now it matches with course.py and doesn't crash the program
-section number changed to CharField
-instructor can now be null
-building max_length switched to 5 from 4
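For reference, the two renamed helpers could look roughly like this. It is a sketch under assumptions: the input time is taken to be an "HHMM" string, and the weekday flags are taken to be booleans keyed by lowercase day name; neither format is confirmed in this PR.

```python
import datetime
from typing import Any, Dict, List, Optional

# Assumed key names for the per-day flags in the meeting data.
DAYS = ("monday", "tuesday", "wednesday", "thursday", "friday", "saturday", "sunday")


def convert_meeting_time(string_time: Optional[str]) -> Optional[datetime.time]:
    """Turn an "HHMM" string into a time; return None for "" or None."""
    if not string_time:
        return None
    return datetime.time(int(string_time[:2]), int(string_time[2:4]))


def parse_meeting_days(meeting: Dict[str, Any]) -> List[bool]:
    """Return one flag per weekday, Monday first, via a list comprehension."""
    return [bool(meeting.get(day, False)) for day in DAYS]
```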

-null instructor is now handled better in sections
-made Adel's requested change and depts now queries database instead of making banner request

Also removed some unneeded comments and elaborated in a few places

Shortened variables like "section_subject" => "subject"
Also changed / elaborated on some comments
Also added return / parameter types for model arguments (i.e. Instructor, Section)

This makes it so that rather than choosing the first instructor in the list of faculty, it saves the primary one.
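A sketch of picking the primary entry rather than the first one. The `primaryIndicator` key and the `pick_primary` name are assumptions about the faculty data, used here only for illustration.

```python
from typing import Any, Dict, List, Optional


def pick_primary(faculty: List[Dict[str, Any]]) -> Optional[Dict[str, Any]]:
    """Return the faculty entry flagged as primary, or None if there is none."""
    return next((f for f in faculty if f.get("primaryIndicator")), None)


faculty = [
    {"displayName": "A", "primaryIndicator": False},
    {"displayName": "B", "primaryIndicator": True},
]
primary = pick_primary(faculty)
```

Returning None when no entry is flagged keeps the caller free to store a null instructor.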

Changed parse_meeting_days to return list comprehension immediately
Changed instructor name dict retrieval to .get()

Removes parentheses from multi-line strings
Changed dept_name default to correctly catch errors

Fixed assertions

Renamed some functions, better use of assertions
Whitespace/comment fixes

Fixed query for csce section test

gannonprudhomme · 2020-02-05T00:45:45Z

First, I rebased this onto backend/master so we would have the reset migrations fix for Actions, then I cherry-picked all of the necessary commits so there weren't any merges / duplicate commits. Should be good to go now

rachelconn

Looks good now

addressed changes

gannonprudhomme · 2020-02-05T00:55:43Z

FINALLY MUAHAHAHAH

ThatJuanGuy added the backend label (Anything related to the backend API/Django) Nov 19, 2019

ThatJuanGuy requested review from gannonprudhomme and rachelconn November 19, 2019 19:04

ThatJuanGuy self-assigned this Nov 19, 2019
