OP1.FUN SCRAPER

Scrapes all packages from the op1.fun site. At the time I wrote this script there was no API, so I used Google Chrome's headless mode to download the files.

The script scans each page of the packages listing for package links, then downloads the collected links in a second step.

Google Chrome's headless mode makes both the login and the downloading straightforward.
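
For illustration, the login part of such a script might look like the sketch below. The sign-in URL and the form field names are assumptions, not taken from the script:

  from selenium import webdriver

  # Start a headless Chrome session (Selenium 3 style, matching the
  # find_element_by_* calls quoted later in this README).
  options = webdriver.ChromeOptions()
  options.add_argument('--headless')
  driver = webdriver.Chrome(options=options)

  # Log in; the URL and the form field names below are assumptions.
  driver.get('https://op1.fun/users/sign_in')
  driver.find_element_by_name('user[email]').send_keys('YOUR_EMAIL')
  driver.find_element_by_name('user[password]').send_keys('YOUR_PASSWORD')
  driver.find_element_by_name('commit').click()  # submit the login form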

SETUP

SET USERNAME/PASSWORD

Create an account on the op1.fun site and change the settings in the script:

  • line 39: replace YOUR_EMAIL with your email
  • line 42: replace YOUR_PASSWORD with your password
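
For orientation, the placeholders presumably sit on simple assignments like these (the variable names are illustrative, not taken from the script):

  EMAIL = "YOUR_EMAIL"        # script line 39: your op1.fun email
  PASSWORD = "YOUR_PASSWORD"  # script line 42: your op1.fun password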

RUN

Place the script and the ChromeDriver binary in the same directory.

  • run the ChromeDriver in a separate terminal: ./chromedriver
  • run python ./src/op1fun_package_scraper.py to start downloading
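
Because ChromeDriver is started by hand, the script can attach to it as a remote endpoint. A minimal sketch of that connection (port 9515 is ChromeDriver's default; the packs URL is an assumption):

  from selenium import webdriver

  options = webdriver.ChromeOptions()
  options.add_argument('--headless')

  # Attach to the ChromeDriver started in the other terminal
  # (it listens on http://127.0.0.1:9515 by default).
  driver = webdriver.Remote(
      command_executor='http://127.0.0.1:9515',
      options=options,
  )
  driver.get('https://op1.fun/packs')  # packs listing URL: an assumption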

NOTES

Op1.fun hosts its files on AWS. The script waits a bit after each download so as not to trigger AWS's spam/DDoS protection, and after every 12 downloads it inserts an additional, longer delay. This helps prevent access errors. With my delay settings, a complete download of all packages (140 pages as of 03.06.2019) takes about two days. You may be able to decrease the wait times on a public network.
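
The throttling described above amounts to something like this sketch (the sleep durations are placeholders, not the script's real values; driver and pack_links come from the sketches above):

  import time

  DELAY_PER_DOWNLOAD = 60   # placeholder: seconds to wait after each download
  DELAY_PER_BATCH = 600     # placeholder: extra pause after every 12 downloads

  for i, link in enumerate(pack_links, start=1):
      driver.get(link)
      driver.find_element_by_class_name('download').click()
      time.sleep(DELAY_PER_DOWNLOAD)   # per-download delay
      if i % 12 == 0:
          time.sleep(DELAY_PER_BATCH)  # longer pause to stay under AWS limits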

MODIFY

The following regex statements are used to find the links in the pages (a usage sketch follows this list):

  • \?page\=(\d)*\">Last" to find the last page index (line 55)

  • <a class=\"pack-name parent-link\" href=\"(.)*\">(.)*<\/a to collect, on each listing page, the links to the individual packs (line 93)

  • elem = driver.find_element_by_class_name('download').click() to find and click the download button on a pack page
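
A rough usage sketch of these patterns, adapted slightly for Python's re module (the capture groups are widened so the whole number/URL is captured; driver comes from the sketches above):

  import re

  LAST_PAGE_RE = re.compile(r'\?page=(\d+)">Last')
  PACK_LINK_RE = re.compile(r'<a class="pack-name parent-link" href="(.*?)">(.*?)</a')

  html = driver.page_source
  match = LAST_PAGE_RE.search(html)
  last_page = int(match.group(1)) if match else 1
  # The hrefs on the listing pages appear to be relative, hence the
  # absolute-URL prefix (an assumption).
  pack_links = ['https://op1.fun' + href for href, _name in PACK_LINK_RE.findall(html)]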

TESTED macOS, op1.fun as of 03.06.2019
