This is a tool for downloading free O'Reilly ebooks of different categories from the remains of what used to be the O'Reilly online shop (before it morphed into a book and video streaming service). While you can, of course, download these free ebooks directly online, fetching a whole bunch of them one by one is slow and tedious.
freebora deliberately does not use scrapy (partly to avoid a configuration fest) and instead works in two phases:
1. crawl the online site to compile a list of URLs for PDF files to download, and
2. download all files from the list created in step 1.
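The first phase can be pictured with a minimal sketch that pulls PDF links out of a page using only the standard library. This is not freebora's actual code: the names `PdfLinkParser` and `collect_pdf_urls`, the sample HTML, and the example.com URLs are all assumptions made for illustration.

```python
from html.parser import HTMLParser
from urllib.parse import urljoin


class PdfLinkParser(HTMLParser):
    """Collect the targets of <a> tags that point to PDF files."""

    def __init__(self, base_url):
        super().__init__()
        self.base_url = base_url
        self.pdf_urls = []

    def handle_starttag(self, tag, attrs):
        if tag != "a":
            return
        href = dict(attrs).get("href")
        if href and href.lower().endswith(".pdf"):
            # Resolve relative links against the URL of the page.
            self.pdf_urls.append(urljoin(self.base_url, href))


def collect_pdf_urls(html, base_url):
    """Return absolute URLs of all PDF links found in the given HTML."""
    parser = PdfLinkParser(base_url)
    parser.feed(html)
    return parser.pdf_urls


# Hypothetical page content and URL, for illustration only.
sample_html = '<a href="/free/book-a.pdf">A</a> <a href="about.html">About</a>'
pdf_urls = collect_pdf_urls(sample_html, "https://example.com/free/index.html")
print(pdf_urls)
```

A real crawl would also have to fetch each category page and follow pagination, which this sketch leaves out.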
Step 1 is done sequentially (for now), while for step 2 you can choose between a sequential version using requests and a parallel version using aiohttp.
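The difference between the two download modes can be sketched as follows. To keep the example runnable without network access, the HTTP calls are replaced by hypothetical stand-ins (`fake_fetch_sync` and `fake_fetch_async`). In a real setup these would wrap requests and aiohttp calls. The function names and the `limit` parameter are assumptions, not part of freebora.

```python
import asyncio
import time


def fake_fetch_sync(url):
    """Hypothetical stand-in for a blocking HTTP GET."""
    time.sleep(0.01)  # simulate network latency
    return ("content of " + url).encode()


async def fake_fetch_async(url):
    """Hypothetical stand-in for a non-blocking HTTP GET."""
    await asyncio.sleep(0.01)  # simulate network latency
    return ("content of " + url).encode()


def download_sequential(urls, fetch):
    # One request at a time, in list order.
    return [fetch(url) for url in urls]


async def download_parallel(urls, fetch, limit=4):
    # All requests are scheduled at once; the semaphore caps how many
    # are in flight at the same time.
    semaphore = asyncio.Semaphore(limit)

    async def bounded(url):
        async with semaphore:
            return await fetch(url)

    # gather() returns results in the same order as the input URLs.
    return await asyncio.gather(*(bounded(url) for url in urls))


urls = ["https://example.com/a.pdf", "https://example.com/b.pdf"]
seq = download_sequential(urls, fake_fetch_sync)
par = asyncio.run(download_parallel(urls, fake_fetch_async))
```

Both variants return the same data in the same order. The parallel one simply overlaps the waiting time of the individual requests.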
See the files in the docs/sessions folder for some use cases of varying sizes.
You can install freebora from the Python Package Index with a simple pip install freebora, or by cloning or downloading this code from GitHub and running python3 setup.py install in its root directory.
At the moment it is intended to work on Python 3 only.
You can run the (pretty small) test suite like this:
# using py.test (needs a pip install pytest):
py.test -v tests
# using a minified version of py.test, included in runtests.py:
python3 setup.py test
Individual tests can be run like this:
py.test -s tests/test_oreilly_shop.py
- add an async version of the function to collect URLs (step 1 above)
- improve command-line interface
- add feature to download not only PDFs, but other formats, too
- add feature to interactively select individual ebooks to download