Added scrape_grades command #159
Commits on Mar 22, 2020
Added documents/ to .gitignore
This is needed so the downloaded grade distribution PDFs are not tracked by git.
Commit 17440e7
Most of this is from Good Bull Schedules, but it will most likely change as we go along. Also added __init__.py for it.
Commit fced98b
Commit a4ff5df
Updated load_json with load_pdf function
Also added a _generate_path function for use in load_pdf and load_json_file.
Commit 029e8db
Commit a1a1ed4
Extracted out functions from pdf_parser
Moved them into pdf_helper and simplified the parse_page function accordingly.
Commit 2c01670
Commit 41078cc
Commit 6fe96b2
Changed pdf_parser functions to use GradeData
This cleans up the return types so they're easier to understand.
Commit 6d791cf
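A minimal sketch of the kind of return-type cleanup this commit describes, assuming GradeData is a typed NamedTuple so callers read fields by name instead of by tuple position. The field names, the `parse_row` helper, and the row layout (section, A, B, C, D, F, GPA) are hypothetical and not taken from the repository.

```python
from typing import Dict, List, NamedTuple


class GradeData(NamedTuple):
    """Hypothetical shape of one section's parsed grades; the real fields may differ."""
    section_num: str
    gpa: float
    letter_grades: Dict[str, int]


def parse_row(cells: List[str]) -> GradeData:
    """Turn one already-split row (hypothetical layout: section, A, B, C, D, F, GPA)
    into GradeData instead of returning an anonymous tuple."""
    section_num, *counts, gpa = cells
    letter_grades = dict(zip("ABCDF", (int(count) for count in counts)))
    return GradeData(section_num, float(gpa), letter_grades)


# Callers now write data.gpa rather than remembering that the GPA is item [1].
data = parse_row(["501", "10", "5", "2", "0", "1", "3.28"])
```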
Extracted out extract_letter_grades from parse_page
This is purely for readability; the behavior is the same. Also removed the unused function generate_year_semesters().
Commit eb20c98
Commit db67218
Added parse_page test to pdf_parser_tests
Also some misc linting fixes
Commit 8d2a00c
Added PyPDF2 to requirements for pdf_parser
Also added it to the lint-requirements for GitHub Actions
Commit 33bb200
Changed generate_path to be a public function in load_json.py
Also removed the redundant json.close() in load_json_file and instead returned the result directly
Commit dadf932
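A sketch of a public generate_path used by load_json_file, assuming both take a file name plus a base directory (a hypothetical signature; the real functions may resolve the directory differently). Returning from inside the `with` block is what makes an explicit close() redundant: the context manager closes the file as the function returns.

```python
import json
import os
import tempfile
from typing import Any


def generate_path(file_name: str, base_dir: str) -> str:
    """Build the path to a data file (hypothetical signature)."""
    return os.path.join(base_dir, file_name)


def load_json_file(file_name: str, base_dir: str) -> Any:
    """Parse and return a JSON file. The with-block closes the file on return,
    so no separate close() call is needed."""
    with open(generate_path(file_name, base_dir)) as json_file:
        return json.load(json_file)


# Usage: write a throwaway JSON file, then read it back.
with tempfile.TemporaryDirectory() as tmp_dir:
    with open(generate_path("grades.json", tmp_dir), "w") as out:
        json.dump({"ACCT": 229}, out)
    loaded = load_json_file("grades.json", tmp_dir)
```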
Commit a5f2012
Added return of pdf_data in pdf_parse.parse_pdf
Also changed pdf_reader.getNumPages() to .numPages. Also fixed a linting error.
Commit 573bd95
Changed get_pdf_skip_count to assign returned variables inline
Removed an extra iteration over the grades by adding up num_students in the existing for-loop. Changed the list addition operator to .extend for readability.
Commit 79080ef
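A sketch of the loop pattern this commit describes: summing num_students inside a loop that already walks the grades, and using `.extend` instead of `list + list`. The page and section structure and the `combine_pages` name are hypothetical.

```python
from typing import Dict, List, Tuple

# Hypothetical shape: each page is a list of sections; each section maps letter -> count.
Page = List[Dict[str, int]]


def combine_pages(pages: List[Page]) -> Tuple[Page, int]:
    all_sections: Page = []
    num_students = 0
    for page in pages:
        # Appends in place, rather than rebuilding with all_sections = all_sections + page
        all_sections.extend(page)
        for grades in page:
            # Count students here instead of looping over all_sections a second time
            num_students += sum(grades.values())
    return all_sections, num_students


sections, total = combine_pages([[{"A": 3, "B": 2}], [{"A": 1}, {"C": 4}]])
```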
Added Grades model + GradeManager
GradeManager is used for calculating an instructor's past grade distributions
Commit c777a45
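A plain-Python sketch of the kind of aggregation GradeManager performs for an instructor's past grade distributions, assuming each grade row carries an instructor name and per-letter counts. GradeRow, past_distribution, and the A=4 to F=0 GPA weighting are assumptions; a Django manager would more likely push this work into a database query. The `Dict[str, Union[float, int]]` return type mirrors the "float or int" Dict values mentioned later in this log.

```python
from collections import Counter
from typing import Dict, Iterable, NamedTuple, Union

LETTERS = "ABCDF"
GRADE_POINTS = {"A": 4, "B": 3, "C": 2, "D": 1, "F": 0}  # assumed weighting


class GradeRow(NamedTuple):
    """Hypothetical stand-in for one row of the Grades model."""
    instructor: str
    counts: Dict[str, int]


def past_distribution(rows: Iterable[GradeRow], instructor: str) -> Dict[str, Union[float, int]]:
    totals: Counter = Counter({letter: 0 for letter in LETTERS})
    for row in rows:
        if row.instructor == instructor:
            totals.update(row.counts)  # Counter.update adds counts instead of replacing them
    num_students = sum(totals.values())
    grade_points = sum(GRADE_POINTS[letter] * n for letter, n in totals.items())
    # Integer counts per letter plus a float GPA, hence the Union value type
    result: Dict[str, Union[float, int]] = dict(totals)
    result["gpa"] = round(grade_points / num_students, 2) if num_students else 0.0
    return result


rows = [
    GradeRow("Smith", {"A": 10, "B": 5, "C": 2, "D": 0, "F": 1}),
    GradeRow("Smith", {"A": 2, "B": 2}),
    GradeRow("Jones", {"A": 1}),
]
smith = past_distribution(rows, "Smith")
```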
Commit 107ad0d
Minor fixes in models_tests & grades model
Changed the instructor_performance return type to specify that the Dict values can be a float or an int. The rest of the commit is minor comment fixes.
Commit 6a130a9
Added beautifulsoup and lxml to requirements.txt
Also added beautifulsoup to lint-requirements
Commit 692c49c
Commit bbf4fb1
These are incomplete; more need to be added, as noted in the comments.
Commit 0a49dc5
Updated pdf_parser to work with old pdf style
Only the header row of the PDF indicates that it's in the old PDF style, so we only knew it was the old style for that first row and not for the rows holding the actual grades. Because the old style has a different format, this prevented us from correctly parsing the section's grades. To remedy this, whenever old_pdf_style is True in pdf_helper.get_pdf_skip_count, we store it (in pdf_parser.parse_page) and use it for the rest of the page. Also adds the corresponding tests.
Commit bfcf7b4
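A sketch of the "remember the old style once the header reveals it" idea from this commit. The header markers, the `classify_row` helper, and the assumption that old-style grade rows carry one extra leading column are all hypothetical; the real get_pdf_skip_count works on text extracted from the PDF and will differ.

```python
from typing import Dict, List, Tuple

HEADER_NEW = "SECTION"        # hypothetical header markers
HEADER_OLD = "SECTION (OLD)"


def classify_row(row: List[str]) -> Tuple[bool, bool]:
    """Return (is_header, old_pdf_style). Only a header row can reveal the old style."""
    is_header = row[0] in (HEADER_NEW, HEADER_OLD)
    return is_header, row[0] == HEADER_OLD


def parse_page(rows: List[List[str]]) -> List[Tuple[str, Dict[str, int]]]:
    old_pdf_style = False
    sections = []
    for row in rows:
        is_header, row_is_old = classify_row(row)
        if row_is_old:
            # Store the flag so the grade rows below the header use the old layout too
            old_pdf_style = True
        if is_header:
            continue
        offset = 1 if old_pdf_style else 0  # hypothetical: old rows have an extra leading column
        counts = [int(cell) for cell in row[offset + 1: offset + 6]]
        sections.append((row[offset], dict(zip("ABCDF", counts))))
    return sections


new_style = parse_page([["SECTION"], ["501", "10", "5", "2", "0", "1"]])
old_style = parse_page([["SECTION (OLD)"], ["ACCT-229", "501", "3", "2", "1", "0", "0"]])
```

Without the stored flag, the old-style grade row would be read with offset 0, and its section number would be taken from the wrong column.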
Added suggestions to scrape_grades
- Changed PDF_DOWNLOAD_DIR to use dirname instead of a relative path
- Changed scrape_pdf's counts dictionary to use defaultdict
- Other misc semantic and syntax changes
Commit bc21c29
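A sketch of both suggestions, under stated assumptions. The first function resolves the download folder from a module's own location via `os.path.dirname`, so the result no longer depends on the current working directory; placing documents/grade_dists next to the module is an assumption about the layout. The second uses `defaultdict(int)`, so a counter starts at 0 without an `if key not in counts` check. The function names and status strings are hypothetical.

```python
import os
from collections import defaultdict
from typing import Dict, Iterable


def pdf_download_dir(module_file: str) -> str:
    """Resolve documents/grade_dists next to the given module file (assumed layout)."""
    return os.path.join(os.path.dirname(os.path.abspath(module_file)), "documents", "grade_dists")


def tally(statuses: Iterable[str]) -> Dict[str, int]:
    counts: Dict[str, int] = defaultdict(int)  # missing keys start at 0
    for status in statuses:
        counts[status] += 1
    return counts


# Inside a real module this would typically be called as pdf_download_dir(__file__).
download_dir = pdf_download_dir("/repo/scraper/management/commands/scrape_grades.py")
counts = tally(["downloaded", "skipped", "downloaded"])
```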
Commit e3fba69
Updated documents/grade_dists error catching
Moved it to _create_documents_folder, since that's where the actual error occurs.
Commit dc9fbf5
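A minimal sketch of catching the error inside the folder-creation helper itself. The name `ensure_download_folder` and the choice to re-raise as RuntimeError are assumptions; the commit only says the error catching moved into _create_documents_folder.

```python
import os
import tempfile


def ensure_download_folder(path: str) -> None:
    """Create the grade-distribution folder, handling errors where they occur (hypothetical helper)."""
    try:
        # exist_ok=True: an existing directory is fine; a file already at that path still raises
        os.makedirs(path, exist_ok=True)
    except OSError as err:
        raise RuntimeError(f"Could not create {path}: {err}") from err


with tempfile.TemporaryDirectory() as tmp:
    target = os.path.join(tmp, "documents", "grade_dists")
    ensure_download_folder(target)
    ensure_download_folder(target)  # calling it again is harmless
    created = os.path.isdir(target)
```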
Added optional arguments for scrape_grades
Example usage: python manage.py scrape_grades -c EN --year 2015
Commit 12ae93c
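Django management commands declare their options in `add_arguments(self, parser)`, which receives an argparse-style parser. The sketch below uses plain argparse so it runs without Django. The `-c` and `--year` flags come from the example usage above, while the `college` destination name, the help strings, and the `int` type are assumptions.

```python
import argparse


def build_parser() -> argparse.ArgumentParser:
    parser = argparse.ArgumentParser(prog="scrape_grades")
    # dest="college" and the help text are hypothetical; only the flags come from the commit
    parser.add_argument("-c", dest="college", help="college abbreviation, e.g. EN")
    parser.add_argument("--year", type=int, help="only scrape this year")
    return parser


args = build_parser().parse_args(["-c", "EN", "--year", "2015"])
defaults = build_parser().parse_args([])  # both options are optional and default to None
```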
Misc semantic changes in scrape_grades
Also adds SSL verification back to scrape_grades.fetch_page_data
Commit 463c1e6
Minor syntax changes in scrape_grades per PR comments
- Removed unnecessary import to pass linting
- Changed task collecting to use list comprehension
- Changed colleges & years assignment to use ternary operators
Commit d9947e9
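A sketch of the last two refactors listed in this commit. Ternary expressions fall back to every college or year when no option is given, and a list comprehension builds the work items. ALL_COLLEGES, ALL_YEARS, and plan_tasks are hypothetical, and the real command's tasks are not necessarily (college, year) tuples.

```python
from typing import List, Optional, Tuple

ALL_COLLEGES = ["AG", "AR", "BA", "EN"]   # hypothetical subset of college codes
ALL_YEARS = list(range(2013, 2020))       # hypothetical year range


def plan_tasks(college: Optional[str] = None, year: Optional[int] = None) -> List[Tuple[str, int]]:
    # Ternaries: scrape only what was requested, otherwise everything
    colleges = [college] if college is not None else ALL_COLLEGES
    years = [year] if year is not None else ALL_YEARS
    # One list comprehension instead of appending inside nested for-loops
    return [(c, y) for c in colleges for y in years]
```

For example, `plan_tasks("EN", 2015)` yields a single task, while calling it with no arguments covers every college and year.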