bonenberg/Blog-Ripper-Part2

Blog ripper suite - part 2

Have you ever wanted to rip your favorite blog? Like, download all the posts, convert them to PDFs, print them out, or whatever.

No?

Well, you have a chance to do that now!

This script intelligently transforms all HTML files into beautiful PDFs. Additionally, it can remove special characters from the final filenames! Yeah!

How it works

The process looks like this:

  1. the script searches for all HTML files in the current directory
  2. it identifies the core content of each page and extracts it (extractor.py)
  3. it inserts your own headers and footers (extractor.py)
  4. it converts the result to beautiful PDF files using Prince XML
  5. finally, it copies all the PDFs to one directory
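
The discovery and conversion steps above can be sketched in Python roughly as follows. This is a minimal illustration, not the code shipped in this repository: the function names `find_html_files`, `prince_command` and `convert_all` are hypothetical, the content extraction done by extractor.py is left out, and the sketch assumes the `prince` executable is on your PATH and is invoked as `prince input.html -o output.pdf`.

```python
import subprocess
from pathlib import Path


def find_html_files(directory):
    """Return the HTML files directly inside `directory`, sorted by name."""
    return sorted(
        p for p in Path(directory).iterdir()
        if p.is_file() and p.suffix.lower() in {".html", ".htm"}
    )


def prince_command(html_path, pdf_path):
    """Build the Prince command line for one file (hypothetical helper)."""
    return ["prince", str(html_path), "-o", str(pdf_path)]


def convert_all(html_dir, pdf_dir, run=subprocess.run):
    """Convert every HTML file in html_dir into a PDF inside pdf_dir.

    `run` defaults to subprocess.run; passing another callable makes it
    possible to try the sketch without Prince installed.
    """
    pdf_dir = Path(pdf_dir)
    pdf_dir.mkdir(parents=True, exist_ok=True)
    outputs = []
    for html in find_html_files(html_dir):
        pdf = pdf_dir / (html.stem + ".pdf")
        # check=True raises if Prince exits with a non-zero status
        run(prince_command(html, pdf), check=True)
        outputs.append(pdf)
    return outputs
```

Writing each PDF straight into the target directory stands in for the final copy step; the real scripts may handle that differently.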

Requirements

Getting Started

To convert:

$ ripper-start.sh <HTMLs directory> <PDFs directory>

To clean filenames:

$ ripper-rename.sh 
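
As a rough idea of what cleaning a filename can look like, here is a small Python sketch. It is an assumption about the general technique, not a description of what ripper-rename.sh actually does: the function name `clean_filename`, the set of allowed characters and the `untitled` fallback are all made up for illustration.

```python
import os
import re


def clean_filename(name):
    """Replace runs of special characters in the file stem with "_".

    Hypothetical rule: keep only ASCII letters, digits, "_" and "-",
    and leave the extension untouched.
    """
    stem, ext = os.path.splitext(name)
    cleaned = re.sub(r"[^A-Za-z0-9_-]+", "_", stem).strip("_")
    # Fall back to a placeholder if nothing usable is left
    return (cleaned or "untitled") + ext


print(clean_filename("My Post: What's new?.pdf"))  # My_Post_What_s_new.pdf
```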

Authors

  • Karol Bonenberg

License

This project is licensed under the GNU GPL Version 3. See the LICENSE.md file for details.

About

Creates clean PDFs from websites. A part of my blog ripper suite - an autonomous html to PDF converter.
