Have you ever wanted to rip your favorite blog? Like, download all posts, convert them to PDFs, print them out, or whatever.
No?
Well, you have a chance to do that now!
This script intelligently transforms all HTML files into beautiful PDFs. Additionally, it can strip special characters from the final filenames! Yeah!
The process looks like this:
- the script searches for all HTML files in the current directory
- it identifies the website's core content and extracts it (extractor.py)
- inserts your own headers and footers (extractor.py)
- converts the result to beautiful PDF files using Prince XML
- finally, it copies all the PDFs to one directory
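The real extraction step uses the Beautiful Soup library (see extractor.py), but the idea can be sketched with nothing but the standard library. This illustration is written in modern Python 3 (the project itself targets Python ~2.7), and the `<div id="content">` marker, `HEADER`, and `FOOTER` are placeholders, not the script's actual selectors:

```python
from html.parser import HTMLParser

class CoreExtractor(HTMLParser):
    """Collects everything inside the first <div id="content"> element.
    The id is a placeholder; real blogs use different content markers."""
    def __init__(self):
        super().__init__()
        self.depth = 0      # div-nesting depth inside the target element
        self.chunks = []    # raw HTML fragments of the core content

    def handle_starttag(self, tag, attrs):
        if self.depth:
            self.depth += tag == "div"
            self.chunks.append(self.get_starttag_text())
        elif tag == "div" and dict(attrs).get("id") == "content":
            self.depth = 1  # entered the wrapper; don't keep the wrapper itself

    def handle_startendtag(self, tag, attrs):
        if self.depth:
            self.chunks.append(self.get_starttag_text())

    def handle_endtag(self, tag):
        if self.depth:
            if tag == "div":
                self.depth -= 1
                if not self.depth:
                    return      # closed the wrapper; drop its closing tag
            self.chunks.append("</%s>" % tag)

    def handle_data(self, data):
        if self.depth:
            self.chunks.append(data)

HEADER = "<html><body><h1>My header</h1>"   # placeholder custom header
FOOTER = "<p>My footer</p></body></html>"   # placeholder custom footer

def extract_core(html: str) -> str:
    """Return just the core content, wrapped in our own header/footer."""
    parser = CoreExtractor()
    parser.feed(html)
    return HEADER + "".join(parser.chunks) + FOOTER
```

The output of `extract_core` is what would then be handed to Prince XML for the PDF step.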
You will need:
- downloaded HTML files (grabbed with the ScrapBook Firefox add-on, for example)
- Prince XML
- Python ~2.7
- the Beautiful Soup library
- Linux
To convert:
$ ripper-start.sh <HTMLs directory> <PDFs directory>
To clean filenames:
$ ripper-rename.sh
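The filename-cleaning step can be sketched as a tiny shell function. The exact character rules here are an assumption for illustration; the real ripper-rename.sh may keep or drop a different set:

```shell
#!/bin/sh
# Sketch of the special-character cleanup (assumed rules, not the
# script's actual ones): keep letters, digits, dots and dashes,
# squeeze every other run of characters into a single underscore.
clean_name() {
    printf '%s' "$1" | tr -cs 'A-Za-z0-9.-' '_'
}
```

Applied over a PDF directory, something like `mv "$f" "$(clean_name "$f")"` in a loop would give you the tidy filenames.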
- Karol Bonenberg
This project is licensed under the GNU GPL Version 3 - see the LICENSE.md file for details