motoom/gutenberg-ebook-scraping

This is a set of Python scripts that download all
Dutch ebooks from Project Gutenberg, rename them to
human-readable filenames, format them so they display well
on my ebook reader, and toss them into subdirectories for
easier navigation.

Written by Michiel Overtoom, motoom@xs4all.nl

How to use:

- Run bulkdownload.py to download the raw texts from a mirror of Project Gutenberg's eBook archive.
- Run gutenberg.py to reformat and rename the raw texts.
- Run toss.py to distribute them over subdirectories.
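
The three steps above can be chained from a small driver script. This is only a sketch: the script names come from the steps above, but running them without arguments from the repository directory is an assumption, and `pipeline_commands` and `run_pipeline` are hypothetical helper names that are not part of this repository.

```python
import subprocess
import sys

# Script names as listed in the steps above, in the order they must run.
# Assumption: each script runs without command-line arguments.
STEPS = ["bulkdownload.py", "gutenberg.py", "toss.py"]


def pipeline_commands(python=sys.executable):
    """Return one command line per step, using the given interpreter."""
    return [[python, script] for script in STEPS]


def run_pipeline():
    """Run every step in order and stop at the first one that fails."""
    for command in pipeline_commands():
        # check=True raises CalledProcessError on a non-zero exit status,
        # so a failed download is not followed by a reformat of stale files.
        subprocess.run(command, check=True)
```

Calling `run_pipeline()` from the directory that holds the scripts runs the download, reformat, and toss steps back to back.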

After that, upload them to your eBook reader, and enjoy!

In March 2016 I reworked this program, since scraping Gutenberg's main
web site is no longer allowed. This newer version:

- downloads from a mirror instead of scraping Gutenberg's main web site
- lets you specify the language
- has better input encoding detection
- outputs UTF-8 encoded text files
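
To illustrate the last two points, here is one simple way to guess an input encoding and rewrite a file as UTF-8. It is a sketch of a common fallback technique, not necessarily what gutenberg.py does; the candidate list and the function names `decode_text` and `convert_to_utf8` are assumptions made for this example.

```python
from pathlib import Path

# Encodings to try, strictest first. This list is an assumption for the
# sketch. "utf-8-sig" also accepts UTF-8 without a byte order mark.
CANDIDATE_ENCODINGS = ("utf-8-sig", "cp1252", "latin-1")


def decode_text(raw: bytes) -> str:
    """Decode raw bytes with the first candidate encoding that fits."""
    for encoding in CANDIDATE_ENCODINGS:
        try:
            return raw.decode(encoding)
        except UnicodeDecodeError:
            continue
    # Not reached in practice: latin-1 maps every byte to a character.
    return raw.decode("latin-1", errors="replace")


def convert_to_utf8(source: Path, target: Path) -> None:
    """Read a raw text file and write it back out as UTF-8."""
    text = decode_text(source.read_bytes())
    # Writing bytes keeps the original line endings untouched.
    target.write_bytes(text.encode("utf-8"))
```

Trying strict UTF-8 first matters because the single-byte encodings accept almost any input, so they only make sense as fallbacks.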

About

Download, convert and organize Gutenberg books for eBook Readers
