Scrape the UiO servers for visible and hidden files. Effortlessly finds exams, exam solutions, lectures, exercises, etc. Great for exam preparation or for finding solutions to exercises.
Requires Python3 as well as the module bs4:
pip install bs4
Clone this repo:
git clone https://www.github.com/BrorH/UiOScrape.git
scraper.py is the main script. Pass a UiO subject code to start scraping, for example:
python3.8 scraper.py FYS-MEK1110
The subject code is case-insensitive.
You will be asked to enter your UiO username and password (see below for why). You will have to enter these credentials every time you run the scraper.
The scraped files are downloaded into a folder named downloads inside the UiOScrape directory; the folder is created the first time you run the scraper. In the example above, all files are downloaded into ./downloads/FYS-MEK1110.
Run scraper.py --help for help.
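A minimal sketch of how the subject argument and the download folder could be handled. It assumes argparse and assumes the folder name is the upper-cased subject code (the README only says the code is case-insensitive). The names parse_args and download_dir are hypothetical, not the scraper's actual functions, and no other command-line options are implied.

```python
import argparse
from pathlib import Path


def parse_args(argv=None):
    # Hypothetical parser: only the positional subject code comes from the README.
    parser = argparse.ArgumentParser(description="Scrape UiO course pages for files.")
    parser.add_argument(
        "subject", help="UiO subject code, e.g. FYS-MEK1110 (case-insensitive)"
    )
    return parser.parse_args(argv)


def download_dir(subject, root="downloads"):
    # Assumption: normalise the code so fys-mek1110 and FYS-MEK1110 share one folder.
    path = Path(root) / subject.upper()
    # Create ./downloads/<SUBJECT> on the first run; no error if it already exists.
    path.mkdir(parents=True, exist_ok=True)
    return path
```

For example, download_dir(parse_args(["fys-mek1110"]).subject) would return the path downloads/FYS-MEK1110, creating it if it does not exist yet.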
- Smart downloading ensures that duplicate files are not downloaded, even if they have different filenames.
- If two or more files have different content but the same name, a filename suffix is added, e.g. file.pdf and file(1).pdf.
- Supports ALL subjects at UiO
- You can specify filename patterns to ignore
- Add support for file types other than PDF
- Add more command-line arguments, such as a maximum number of downloads or a maximum download size
- Add command-line autocompletion for all subject codes
- Add a smart filter that selects only specific kinds of files (e.g. exams or obligs)
- bugsbugsbugs
- Add Windows / MacOS support
- Revamp the entire scraping system to rely on requests rather than mounting
- Implement a hashing/UUID system to ensure no file is downloaded twice
- Group results in a prettier manner
- Eat 🍕
- Extract files through mounting of the UiO WebDAV server
- Safer unmounting system
- Create installation script
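The feature list above describes three download behaviours: duplicates are skipped even under different filenames, same-name files with different content get a suffix such as file(1).pdf, and filename patterns can be ignored. Below is a minimal sketch of how these could fit together. It assumes content is compared by SHA-256 hash (the to-do list mentions a hashing system, so this is not necessarily how the current scraper works) and that ignore patterns are shell-style globs. All function names are hypothetical.

```python
import hashlib
from fnmatch import fnmatch
from pathlib import Path


def sha256_of(data):
    """Hex digest of the file contents; identical bytes give identical digests."""
    return hashlib.sha256(data).hexdigest()


def existing_hashes(directory):
    """Hashes of files already on disk, so a re-run also skips them."""
    return {sha256_of(p.read_bytes()) for p in Path(directory).iterdir() if p.is_file()}


def is_ignored(filename, patterns):
    """True if the filename matches any ignore pattern (assumed glob syntax, case-insensitive)."""
    return any(fnmatch(filename.lower(), p.lower()) for p in patterns)


def unique_name(directory, filename):
    """Return directory/filename, or file(1).pdf, file(2).pdf, ... if the name is taken."""
    stem, suffix = Path(filename).stem, Path(filename).suffix
    candidate, n = Path(directory) / filename, 1
    while candidate.exists():
        candidate = Path(directory) / f"{stem}({n}){suffix}"
        n += 1
    return candidate


def save(directory, filename, data, seen_hashes, ignore=()):
    """Write data unless it is ignored or identical content was already saved.

    Returns the path written, or None if the file was skipped.
    """
    if is_ignored(filename, ignore):
        return None
    digest = sha256_of(data)
    if digest in seen_hashes:
        # Same bytes already saved, possibly under a different name.
        return None
    path = unique_name(directory, filename)
    path.write_bytes(data)
    seen_hashes.add(digest)
    return path
```

Seeding seen_hashes with existing_hashes(folder) before a run means files from earlier runs are also recognised, at the cost of reading every existing file once.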
As a UiO student you are granted access to most semester pages of most subjects. Once you authenticate as a student, you can view and download resources that do not appear on the public semester web pages. The scraper crawls these pages, which requires authentication, and to comply with UiO's IT Rules you have to provide your credentials every time.
The great thing about open source is that you can confirm for yourself that the scraper does nothing dubious with your credentials.
Your password is handled by the getpass module and is only used once to create an HTTP authentication manager. If you have any concerns, please contact me.
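The credential handling described above can be sketched with the standard library: getpass reads the password without echoing it, and it is passed once to an HTTP password manager. This sketch assumes HTTP Basic authentication and takes the base URL as a parameter, since the README does not state the actual endpoint. It also swaps bs4 for the standard-library html.parser so the example has no dependencies; build_opener, LinkCollector and file_links are hypothetical names, not the scraper's code.

```python
import getpass
import urllib.request
from html.parser import HTMLParser
from urllib.parse import urljoin


def build_opener(base_url, username, password=None):
    """Prompt for the password (unless given) and wrap it in a Basic auth handler."""
    if password is None:
        password = getpass.getpass(f"UiO password for {username}: ")
    manager = urllib.request.HTTPPasswordMgrWithDefaultRealm()
    # The password is only handed to the manager; it is not stored anywhere else.
    manager.add_password(None, base_url, username, password)
    handler = urllib.request.HTTPBasicAuthHandler(manager)
    return urllib.request.build_opener(handler)


class LinkCollector(HTMLParser):
    """Collect absolute URLs of <a href> targets ending in the given extensions."""

    def __init__(self, base_url, extensions=(".pdf",)):
        super().__init__()
        self.base_url = base_url
        self.extensions = extensions
        self.links = []

    def handle_starttag(self, tag, attrs):
        if tag != "a":
            return
        href = dict(attrs).get("href")
        if href and href.lower().endswith(self.extensions):
            self.links.append(urljoin(self.base_url, href))


def file_links(opener, url):
    """Fetch a page through the authenticated opener and return its file links."""
    with opener.open(url) as response:
        html = response.read().decode("utf-8", errors="replace")
    collector = LinkCollector(url)
    collector.feed(html)
    return collector.links
```

Reusing one opener for every page keeps the password in a single place for the whole run, which matches the "used once" behaviour described above.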
No. However, netiquette applies: don't behave inappropriately, don't be an idiot, and everything will be fine. These files are available to all UiO students; this program just collects them quickly and neatly. The scraper is built with UiO's IT Rules in mind, so as long as you don't deliberately try to annoy the servers, you will be OK.