Itxaka/scrapereplacementdocs

scrape replacementdocs.com

A simple Scrapy spider that downloads the PDFs from replacementdocs.com.

replacementdocs.com is a great resource for downloading console manuals. Unfortunately, the site has been broken for a while (as of April 2018), with no fix in the pipeline.

I'm worried the site may go down at any moment because of these issues, so I'd rather keep a local copy of all the manuals in case it does (plus, I'm a bit of a data hoarder).

TODO

  • Do not re-download a PDF that has already been saved
  • Save the PDFs to separate folders based on their source system
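Neither TODO item is implemented yet. As a rough sketch of how both could work together, the helpers below build a per-system destination path and skip files that are already on disk. The function names (`target_path`, `should_download`, `save_pdf`) and the folder layout are hypothetical and are not taken from the spider's code; how the system name and file name are extracted from the site is left out.

```python
import re
from pathlib import Path


def target_path(base_dir, system, filename):
    """Build <base_dir>/<system>/<filename> with a folder-safe system name."""
    # Collapse anything that is not a letter, digit, underscore, or dash,
    # so the system name cannot contain path separators or "..".
    safe_system = re.sub(r"[^A-Za-z0-9_-]+", "_", system.strip()) or "unknown"
    # Keep only the last path component so the file stays inside its folder.
    safe_name = Path(filename).name
    return Path(base_dir) / safe_system / safe_name


def should_download(path):
    """Return True unless a non-empty file already exists at path."""
    path = Path(path)
    return not (path.is_file() and path.stat().st_size > 0)


def save_pdf(base_dir, system, filename, body):
    """Write body (bytes) into the per-system folder unless already present.

    Returns (path, written), where written is False when the file was skipped.
    """
    path = target_path(base_dir, system, filename)
    if not should_download(path):
        return path, False
    path.parent.mkdir(parents=True, exist_ok=True)
    path.write_bytes(body)
    return path, True
```

In a spider, `should_download(target_path(...))` would ideally be checked before the request for the PDF is issued, so that skipping a file also saves the site the bandwidth; `save_pdf` alone only avoids overwriting.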

Usage

  • pipenv install
  • pipenv run scrapy crawl rpd

Caveats

Keep in mind that replacementdocs provides a free service, so don't download the whole thing for no good reason: it holds over 9000 manuals, around 25 GB in total. If you need the whole pack, I can provide a torrent for it, to spare the site the bandwidth of multiple users each downloading every manual.
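If you do run the spider, throttling it reduces the load on the site. The fragment below is a sketch of Scrapy settings that could go in the project's `settings.py`. The setting names are standard Scrapy settings, but the values are illustrative assumptions, not the configuration this repository ships with.

```python
# settings.py (sketch): values are assumptions chosen to be gentle on the site

# Respect the site's robots.txt rules.
ROBOTSTXT_OBEY = True

# Wait between requests and fetch one file at a time from the domain.
DOWNLOAD_DELAY = 2.0
CONCURRENT_REQUESTS_PER_DOMAIN = 1

# Let Scrapy slow down further when the server responds slowly.
AUTOTHROTTLE_ENABLED = True
AUTOTHROTTLE_START_DELAY = 5.0
AUTOTHROTTLE_MAX_DELAY = 60.0
```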
