Scrapes all packages from the op1.fun site. At the time I wrote this script there was no API, so I used Google Chrome in headless mode to download the files.
The script scans each page of the packages listing for package links. In the next step, the collected links are downloaded.
Google Chrome's headless mode makes handling the login system and the downloads very easy.
- install python
- install google chrome
- install the google chrome ChromeDriver
- install selenium
pip install selenium
Create an account at the op1.fun site and change the settings in the script:
line 39: replace YOUR_EMAIL with your email
line 42: replace YOUR_PASSWORD with your password
Put the script and the ChromeDriver in the same directory.
- run the ChromeDriver in another terminal
- run python ./src/op1fun_package_scraper.py to start downloading
Op1.fun hosts its files on AWS. The script waits a bit after each download so that it does not trigger the AWS spam/DDoS protection. After 12 downloads there is another delay as well, which helps to prevent access errors. A complete download of all packages (140 pages as of 03.06.2019) takes about two days with my delay settings. You may be able to decrease the wait time on a public network.
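The delay logic described above could look roughly like the following sketch. The function name, the parameter names and the delay values are assumptions made for illustration and are not taken from the script; only the batch size of 12 comes from the description above, and the sketch assumes the extra delay repeats after every 12 downloads.

```python
import time


def pause_after_download(count, short_delay=30.0, long_delay=600.0,
                         batch_size=12, sleep=time.sleep):
    """Sleep after a download and return the number of seconds waited.

    count       -- number of downloads finished so far (1-based)
    short_delay -- hypothetical pause after every download, in seconds
    long_delay  -- hypothetical extra pause after each full batch
    batch_size  -- downloads per batch (12 in the description above)
    sleep       -- injectable sleep function, handy for dry runs
    """
    delay = short_delay
    # Assumption: the extra delay is added on top of the normal one
    # every time a multiple of batch_size downloads is reached.
    if count % batch_size == 0:
        delay += long_delay
    sleep(delay)
    return delay
```

Calling pause_after_download(n) after the n-th download keeps the request rate low. Passing a different sleep function makes it possible to check the schedule without actually waiting.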
The following regex statements are used to find the links in the page:
\?page\=(\d)*\">Last" to find the last page index (line 55)
<a class=\"pack-name parent-link\" href=\"(.)*\">(.)*<\/a to get the links to the packs on each packs page (line 93)
elem = driver.find_element_by_class_name('download').click() to find and click the download button on a pack page
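The two regex patterns above can be tried out on a page source with Python's re module. This is a minimal sketch, not the code from the script: the function names and the sample HTML are made up for illustration, and the patterns are slightly changed. It uses (\d+) instead of (\d)* because a repeated capture group only keeps its last repetition, so (\d)* would return just the last digit of the page number. It also captures only the href value of each pack link.

```python
import re

# Variant of the "last page" pattern; (\d+) captures the whole number.
LAST_PAGE_RE = re.compile(r'\?page=(\d+)">Last')
# Variant of the pack link pattern; captures only the href value.
PACK_LINK_RE = re.compile(r'<a class="pack-name parent-link" href="([^"]*)">')


def last_page_index(html):
    """Return the index of the last page, or 1 if no 'Last' link is found."""
    match = LAST_PAGE_RE.search(html)
    return int(match.group(1)) if match else 1


def pack_links(html):
    """Return the href of every pack link found in the page source."""
    return PACK_LINK_RE.findall(html)


# Made-up sample markup, only shaped like the patterns expect.
sample = (
    '<a class="pack-name parent-link" href="/users/a/packs/one">One</a>'
    '<a class="pack-name parent-link" href="/users/b/packs/two">Two</a>'
    '<a href="/packs?page=140">Last</a>'
)

print(last_page_index(sample))
print(pack_links(sample))
```

In the scraper, the html argument would be the page source returned by the browser for each packages page.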