UiOScrape

Scrape the UiO servers for visible and hidden files. Effortlessly finds exams, exam solutions, lectures, exercises, etc. Great for exam studying or finding solutions to exercises.

Installing

Requires Python 3 as well as the module bs4:

pip install bs4

Clone this repo:

git clone https://www.github.com/BrorH/UiOScrape.git

Usage

scraper.py

Main script. Pass any UiO subject code to start scraping. Example: python3.8 scraper.py FYS-MEK1110. The subject code is case-insensitive.
You will be asked to enter your UiO username and password (see the FAQ for why). You must enter these credentials every time you run the scraper. The scraped files are downloaded into a folder named downloads within the UiOScrape directory, which is created the first time you run the scraper. In the example above, all files are downloaded into the directory ./downloads/FYS-MEK1110.

Run scraper.py --help for help.
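
The mapping from a subject code to its download folder can be sketched as follows. This is a minimal illustration, not the actual code in scraper.py: the use of argparse, the helper names build_parser and download_dir, and the choice to normalise the code to upper case are all assumptions made for the example.

```python
import argparse
from pathlib import Path


def build_parser():
    # Hypothetical parser; the real scraper.py may define its arguments differently.
    parser = argparse.ArgumentParser(
        prog="scraper.py", description="Scrape files for a UiO subject."
    )
    parser.add_argument(
        "subject", help="subject code, e.g. FYS-MEK1110 (case-insensitive)"
    )
    return parser


def download_dir(subject, root=Path("downloads")):
    # Assumption: case-insensitivity is achieved by upper-casing the code,
    # so "fys-mek1110" and "FYS-MEK1110" share one folder.
    return root / subject.upper()


# An explicit argument list is passed here for illustration only;
# a real script would call parse_args() with no arguments.
args = build_parser().parse_args(["fys-mek1110"])
target = download_dir(args.subject)
print(target)
```

With this sketch, any spelling of the subject code resolves to the same folder under downloads.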

Features

Smart downloading which ensures that duplicate files won't be downloaded (even if they have different filenames).
If two or more files have different content but the same name, an appropriate filename suffix is constructed, e.g. file.pdf and file(1).pdf.
Supports ALL subjects at UiO.
You can add patterns to ignore within the filenames.
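
The duplicate detection and filename suffixing described above could look roughly like the sketch below. It is not taken from the scraper's source: SHA-256 as the content fingerprint, plain substring matching for ignore patterns, and the helper names content_digest, unique_name and plan_downloads are all assumptions made for illustration.

```python
import hashlib
from pathlib import Path


def content_digest(data: bytes) -> str:
    # Identify a file by its content rather than its name (SHA-256 is an assumption).
    return hashlib.sha256(data).hexdigest()


def unique_name(name: str, taken: set) -> str:
    # Return name unchanged, or with a (n) suffix before the extension if it is taken.
    if name not in taken:
        return name
    path = Path(name)
    n = 1
    while f"{path.stem}({n}){path.suffix}" in taken:
        n += 1
    return f"{path.stem}({n}){path.suffix}"


def plan_downloads(files, ignore=()):
    # files: iterable of (filename, content bytes) pairs.
    # Returns a dict mapping the name each file would be saved under to its content.
    seen = set()
    saved = {}
    for name, data in files:
        # Assumption: an ignore pattern is a plain substring of the filename.
        if any(pattern in name for pattern in ignore):
            continue
        digest = content_digest(data)
        if digest in seen:
            continue  # same content already kept, possibly under another name
        seen.add(digest)
        saved[unique_name(name, set(saved))] = data
    return saved


files = [
    ("file.pdf", b"A"),
    ("copy.pdf", b"A"),         # same content as file.pdf, skipped
    ("file.pdf", b"B"),         # same name, new content, gets a suffix
    ("draft_notes.pdf", b"C"),  # matches the ignore pattern, skipped
]
result = plan_downloads(files, ignore=("draft",))
print(list(result))
```

Comparing digests instead of names is what lets two differently named copies of the same document collapse into one download.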

To-do list

Add support for file types other than PDFs
Add more command line arguments, like max downloads, max download size, etc.
Add autocomplete for all subjects via command line
Add smart filter which detects only specific file types (e.g. exams or obligs)
bugsbugsbugs

To-done list

Add Windows / MacOS support
Revamp entire scraping system to rely on requests rather than mounting
Implement hashing/UUID system to ensure no file is downloaded twice.
Group results in a prettier manner
Eat 🍕
~~Extract files through mounting of UiO webDav server~~
~~Safer unmounting system~~
~~Create installation script~~

FAQ

Why does the program ask me to log in with my UiO account?

As a UiO student you are granted access to the semester pages of most subjects. Simply telling the UiO servers that you are indeed a student grants you the ability to view and download resources that do not appear publicly on the semester web pages. The script scrapes such pages, which require authentication. In order to adhere to UiO's IT Rules you will have to provide credentials every time.

The great thing about open source is that you can confirm for yourself that the scraper does nothing dubious with your credentials. Your password is handled by the getpass module and is only used once to create an HTTP authentication manager. If you have any concerns, please contact me.
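
For readers who want a feel for what this handling amounts to, here is a minimal sketch using only the standard library. It is not the scraper's actual code: the use of urllib.request with HTTP Basic authentication, the helper names prompt_credentials and make_auth_opener, and the base URL are assumptions chosen for the example.

```python
import getpass
import urllib.request


def prompt_credentials():
    # getpass reads the password without echoing it to the terminal.
    username = input("UiO username: ")
    password = getpass.getpass("UiO password: ")
    return username, password


def make_auth_opener(base_url, username, password):
    # Assumption: the server accepts HTTP Basic authentication.
    # The password is handed to the manager once and not stored anywhere else.
    manager = urllib.request.HTTPPasswordMgrWithDefaultRealm()
    manager.add_password(None, base_url, username, password)
    handler = urllib.request.HTTPBasicAuthHandler(manager)
    return urllib.request.build_opener(handler), manager


# Placeholder credentials and an illustrative base URL; a real run would use
# prompt_credentials() instead of hard-coded values.
opener, manager = make_auth_opener("https://www.uio.no/", "user", "secret")
```

Every request made through the opener can then reuse the manager, so the password never needs to be written to disk or asked for again within a single run.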

Can I get in trouble for using this?

No. However, netiquette applies: don't behave inappropriately, don't be an idiot, and everything will be fine. These files are available to all UiO students; this program just collects them quickly and neatly. The scraper is built with UiO's IT Rules in mind, meaning that as long as you don't deliberately try to annoy the servers, you will be OK.
