Script that converts JPEGs from the magazines section of oldgames.sk into multi-page PDFs. No copyright infringement intended.
The script is written in Python 2.7 and depends on some built-in modules (urllib2) and these third-party packages:
- Pillow (2.7.9)
- PyPDF2 (1.24)
- BeautifulSoup (2.7.0)
These can all be installed easily via pip like this:
pip install Pillow PyPDF2 beautifulsoup4
Everything works well in virtualenv, and that is actually the recommended way to run this script.
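To confirm that the install step succeeded, a quick import check can help. The following is a minimal sketch, not part of the script itself; the helper name `missing_modules` is hypothetical. It relies on the fact that the import names differ from the pip package names (Pillow is imported as `PIL`, beautifulsoup4 as `bs4`).

```python
import importlib

# Import names differ from the pip package names:
# Pillow -> PIL, beautifulsoup4 -> bs4.
REQUIRED_MODULES = ["PIL", "PyPDF2", "bs4"]


def missing_modules(names):
    """Return the names from `names` that cannot be imported."""
    missing = []
    for name in names:
        try:
            importlib.import_module(name)
        except ImportError:
            missing.append(name)
    return missing


if __name__ == "__main__":
    missing = missing_modules(REQUIRED_MODULES)
    if missing:
        print("Missing modules: " + ", ".join(missing))
    else:
        print("All required modules are importable.")
```

Run it inside the same virtualenv you use for the script, so that it checks the right set of installed packages.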
Everything should work fine if you have the Xcode tools installed. If you are getting CodecError, you might need to install the libjpeg library, either via MacPorts or Homebrew:
brew install libjpeg
Ubuntu can give you trouble because of missing Python development files. If that's the case, run sudo apt-get install python-dev to get everything you need.
Make sure you have the libjpeg library installed to prevent CodecError. Uninstall Pillow/PIL if it is already on your system: pip uninstall PIL.
yum: yum install libjpeg-devel
apt-get: apt-get install libjpeg-dev
Make sure you install Pillow again after the library is installed.
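After reinstalling Pillow, you may want to verify that JPEG support is actually present. The sketch below is one way to do it, under the assumption that a Pillow build without libjpeg fails with an IOError when asked to encode a JPEG; the function name `jpeg_supported` is hypothetical and not part of the script.

```python
from io import BytesIO


def jpeg_supported():
    """Return True/False for JPEG support, or None if Pillow is not installed."""
    try:
        from PIL import Image
    except ImportError:
        return None
    try:
        # Encoding a 1x1 image exercises the JPEG encoder without touching disk.
        Image.new("RGB", (1, 1)).save(BytesIO(), "JPEG")
    except (IOError, KeyError):
        # Assumption: a build without libjpeg raises one of these here.
        return False
    return True


if __name__ == "__main__":
    print("JPEG support: %s" % jpeg_supported())
```

If this reports False, install libjpeg first and then reinstall Pillow, as described above.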
The script will work fine with the provided modules (see Requirements above).
Run the script with one or more arguments. The first argument is the magazine title; the remaining ones are issue numbers (see the note on numbering below).
The following command will download issues 2, 5 and 6 of Score magazine:
python oldGamesScraper.py score 2 5 6
If you omit the issue numbers, the script will download the entire catalog of issues. For example, to download all issues of Excalibur, run:
python oldGamesScraper.py excalibur
If you need to see all available magazines, launch the script with the --list argument:
python oldGamesScraper.py --list
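The command-line shape described above (a magazine title, optional issue numbers, and a --list flag) can be sketched with argparse. This is only an illustration of the interface; the actual script may read its arguments differently, and `build_parser` and the `list_magazines` attribute are hypothetical names.

```python
import argparse


def build_parser():
    """Build a parser mirroring the usage described above (a sketch only)."""
    parser = argparse.ArgumentParser(prog="oldGamesScraper.py")
    parser.add_argument("--list", action="store_true", dest="list_magazines",
                        help="show all available magazines and exit")
    # Optional so that "--list" can be used on its own.
    parser.add_argument("magazine", nargs="?",
                        help="magazine title, e.g. score or excalibur")
    # Zero issue numbers means "download the entire catalog".
    parser.add_argument("issues", nargs="*", type=int,
                        help="issue indexes to download")
    return parser


if __name__ == "__main__":
    args = build_parser().parse_args()
    if args.list_magazines:
        print("Would list all magazines")
    elif args.issues:
        print("Would download %s issues %s" % (args.magazine, args.issues))
    else:
        print("Would download all issues of %s" % args.magazine)
```

Declaring the issue numbers with type=int means a non-numeric issue argument is rejected with a usage error before any download starts.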
A note on the numbering of the magazines: the numbers are actually indexes, not issue numbers. For example, issue Excalibur 20+ has index 24, and Excalibur Zero (0) has index 1.
To find the index of the issue you want, count its position on the webpage, starting from one. For most magazines the index numbers should equal the actual issue numbers. The PDF filenames should correspond to the actual issue number (as stated on the website). Some magazines are renamed due to illegal characters like / and \.
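The index-based numbering and the renaming of titles with illegal characters can be illustrated as follows. This is a sketch under assumptions: the issue titles in the example are made up, the helper names are hypothetical, and the underscore used as the replacement character is a guess, since the script's actual replacement is not stated here.

```python
import re


def select_by_index(issues, indexes):
    """Pick issues by 1-based position, in the order listed on the webpage."""
    selected = []
    for index in indexes:
        if not 1 <= index <= len(issues):
            raise ValueError("index %d is out of range" % index)
        selected.append(issues[index - 1])
    return selected


def safe_filename(title, replacement="_"):
    """Replace forward and backward slashes so the title is a valid file name."""
    return re.sub(r"[/\\]", replacement, title) + ".pdf"


if __name__ == "__main__":
    # Hypothetical issue titles, in webpage order.
    issues = ["Excalibur 0", "Excalibur 1", "Excalibur 2/3"]
    for title in select_by_index(issues, [1, 3]):
        print(safe_filename(title))
```

With this made-up list, index 1 selects "Excalibur 0", which mirrors how Excalibur Zero has index 1 on the real page.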
You can add additional magazines to the MAGAZINES dictionary in the commented section.
- fix tests
- add argument for JPEG-only download
- add ePub export