A template project for building an eBook, using Python, Pandoc and Markdown.

ebook-template

Overview

This repository is a template for a project that'll build an eBook (in ePub, PDF, Microsoft Word and HTML form) from Markdown input files.

tl;dr: You write your book as a series of Markdown files, adhering to some file naming conventions, and you run the ./build command (see Building your book) to build your book.

There are sample files in this repository, so you can build a (completely pointless and utterly useless) eBook right away.

This tooling has been tested with Pandoc versions 2.0.4 and 2.0.5.

If you're impatient, jump to Getting Started.

What's where

  • Your book's Markdown sources, cover image, and some metadata go in the book subdirectory. This is where you'll be doing your editing.

  • The files subdirectory contains files used by the build. For instance, the HTML and ePub style sheets are there, as are LaTeX templates (used for PDF output) and a Microsoft Word style reference document. You shouldn't need to touch anything in files.

  • The scripts subdirectory currently just contains a Pandoc filter used to provide enhanced markup. You shouldn't need to touch anything in scripts.

  • The lib directory contains some additional Python code used by the build. Ignore it.

  • Your book output files (book.docx, book.epub, book.pdf and book.html) are generated in the topmost directory.

  • The build will also generate a subdirectory called tmp to hold some temporary files. Git is configured to ignore that directory.

Supported output formats

This tooling will generate your book in the following formats:

ePub

book.epub

ePub is the format used by Apple's iBooks and various free readers, including Calibre.

PDF

book.pdf is a single PDF document, generated by LaTeX or Weasy Print.

Issues:

  • LaTeX PDF generation uses the LaTeX "article" document class, rather than the seemingly more suitable "book" class, because the "book" class, combined with Pandoc's LaTeX generation, is just a little too funky.
  • Weasy Print-generated PDF has no table of contents.

HTML

book.html is a single-page HTML, styled in a pleasant format.

Microsoft Word

book.docx is a Microsoft Word version of your book.

Issues:

  • There's no table of contents.
  • The cover image is not included in the Word document.

Unsupported formats

Kindle (MOBI)

Pandoc can't generate books in Kindle format. However, there are several options for generating Kindle content:

  • Haul the Microsoft Word version into Kindle Create

  • Use the free and open source Calibre suite to convert the ePub format to Kindle format.

Getting started

Start by downloading and unpacking the latest release of this repository. (By downloading a release, instead of cloning the repository, you can more easily create your own Git repository from the results.)

Then, install the required software and update the configuration files.

Can I use Docker? Why, yes!

If you don't want to install the dependencies on your machine, you can create a Docker image to isolate them. Courtesy of @szaffarano, there's a ./build-docker script in the top-level directory.

Instead of running ./build to build your book, simply run ./build-docker. The first time you run it, the script will build a Docker image with all the dependencies. (This process can take some time.)

Upgrading

If you're already using this tooling for one of your books, and you want to upgrade to a newer version, the process (currently) is straightforward:

  • Download and unpack the new version, as described above. Don't unpack it over your project!
  • From your project's top-level directory, run the new version's upgrade.py, passing it the path to the unpacked new release.

For example:

cd /tmp
tar xf /path/to/downloaded/ebook-template-X.Y.Z.tgz
cd /path/to/your/ebook
/tmp/ebook-template-X.Y.Z/upgrade.py /tmp/ebook-template-X.Y.Z

Note that this copies files, removing ones that aren't necessary any more. If there are metadata changes, however, upgrade.py won't apply them. Be sure to read the change log for the new release.

Required software

  1. Install pandoc.
  2. Install a Python distribution, version 3.6 or better.
  3. I recommend creating and activating a Python virtual environment, to keep the installed version of Python 3 more or less pristine.
  4. Once you have your Python 3 environment set up (and activated, if you're using a virtual environment), install the required Python packages with pip install -r requirements.txt
  5. You can generate PDF via either LaTeX or Weasy Print. There are advantages and disadvantages to each; see Generating PDF, below.

Installing TeX Live

  • On macOS, use MacTeX, and ensure that /Library/TeX/texbin is in your path.
  • On Ubuntu/Debian, install texlive, texlive-latex-recommended and texlive-latex-extra.
  • On Windows, this might work: https://www.tug.org/texlive/windows.html.

WARNING: I avoid Windows as much as possible. I do not (and, likely, never will) test this stuff on Windows. If you insist on using that platform, you're more or less on your own.

Initial configuration

Create your cover image

In your book directory, create a cover image, as a PNG. If you haven't settled on a cover image yet, you can use the dummy image that's already there. Currently, the cover image is not optional.

Fill in the metadata

Edit book/metadata.yaml, and fill in the relevant pieces. Both Pandoc and the build tooling use this metadata.

Note: This file contains Pandoc YAML Metadata, with some additional fields used by this build tooling.

The following elements are recognized. Each is marked as required or optional.

  • title (Required): The book title.

  • subtitle (Optional): Subtitle, if any.

  • author (Required): A YAML list of authors. If there is only one author, use a single-element YAML list. For example:

author:
- Joe Horrid

or, with more than one author:

author:
- Joe Horrid
- Frances Horrid

  • copyright (Required): A block with two required fields, owner and year. See the existing sample metadata.yaml for an example.

  • publisher (Required): The publisher of the book.

  • language (Required): The language in which the book is written. The value can be a 2-letter ISO 639-1 code, such as "en" or "fr". It can also be a 2-part string consisting of the ISO 639-1 language code and the 2-letter ISO 3166-1 country code, such as "en-US", "en-GB", "fr-CA", "fr-FR", etc.

  • genre (Required): The book's genre. See https://wiki.mobileread.com/wiki/Genre for a list of genres.
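
As a rough illustration of the required fields above, the following sketch checks a metadata mapping that has already been parsed from YAML. The function name check_metadata and the language pattern are assumptions made for this example; the build tooling may validate metadata.yaml differently, or not at all.

```python
import re

# Top-level keys marked "(Required)" in the list above.
REQUIRED_KEYS = ("title", "author", "copyright", "publisher", "language", "genre")

# Assumed pattern: "en" or "en-US" (language code, optional country code).
LANGUAGE_RE = re.compile(r"^[a-z]{2}(-[A-Z]{2})?$")


def check_metadata(metadata):
    """Return a list of human-readable problems; an empty list means OK."""
    problems = [f"missing required key: {key}"
                for key in REQUIRED_KEYS if key not in metadata]

    # "author" must be a YAML list, even for a single author.
    if "author" in metadata and not isinstance(metadata["author"], list):
        problems.append("author must be a list")

    # "copyright" is a block with two required fields.
    copyright_block = metadata.get("copyright")
    if isinstance(copyright_block, dict):
        for field in ("owner", "year"):
            if field not in copyright_block:
                problems.append(f"missing copyright field: {field}")
    elif "copyright" in metadata:
        problems.append("copyright must be a block with owner and year")

    language = metadata.get("language")
    if language is not None and not LANGUAGE_RE.match(str(language)):
        problems.append(f"unexpected language value: {language}")

    return problems
```

Running a check like this before a build would catch a missing field early, rather than leaving it to surface as a Pandoc or build error.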

Edit the copyright information

Edit the book/copyright.md file. You can leave % tokens in there; they'll be substituted as described below in Additional markup. The meaning of the {<} is also explained in that section.

Markup notes

Your book will use Markdown, as interpreted by Pandoc. The following Pandoc extensions are enabled. See the Pandoc User's Guide for full details.

Additional markup

The build tool uses a Pandoc filter (in scripts/pandoc-filter.py) to enrich the Markdown slightly:

  1. Level 1 headings denote new chapters and force a new page.
  2. If you want to force a new page without starting a new chapter, just include an empty level-1 header (#). See book/copyright-template.md for an example.
  3. A paragraph containing just the line +++ is replaced by a centered line containing "• • •". This is a useful separator.
  4. A paragraph that starts with {<} followed by at least one space is left-justified. See book/copyright-template.md for an example.
  5. A paragraph that starts with {>} followed by at least one space is right-justified.
  6. A paragraph that starts with {|} followed by at least one space is centered.
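
The paragraph-level rules above (items 3 through 6) can be sketched in plain Python. This is only an illustration of the marker syntax, not the code in scripts/pandoc-filter.py, which works on Pandoc's document tree via Panflute; the name classify_paragraph is hypothetical.

```python
import re

# Marker -> alignment, as listed above: {<} left, {>} right, {|} centered.
ALIGNMENTS = {"<": "left", ">": "right", "|": "center"}

# A marker must be followed by at least one whitespace character.
MARKER_RE = re.compile(r"^\{([<>|])\}\s+(.*)$", re.DOTALL)


def classify_paragraph(text):
    """Return (alignment, body) for one paragraph of Markdown text.

    alignment is "left", "right", "center", or None for an ordinary paragraph.
    """
    if text.strip() == "+++":
        # The separator paragraph becomes a centered row of bullets.
        return ("center", "• • •")
    match = MARKER_RE.match(text)
    if match:
        return (ALIGNMENTS[match.group(1)], match.group(2))
    return (None, text)
```

Note that a marker with no following space, such as "{<}text", is treated as an ordinary paragraph in this sketch, matching the "followed by at least one space" wording above.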

Note, too, that Pandoc automatically converts your quotation marks into smart quotes, triple dots (...) into an ellipsis, and two dashes (--) into an em-dash.

(The filter is written in Python, using the Panflute package.)

Support for PlantUML

If you set use_plantuml to true in your metadata, you can use PlantUML diagrams in your book, using special fenced code blocks. For instance:

~~~plantuml
@startuml
client->server: SYN
server->client: SYN+ACK
client->server: ACK
@enduml
~~~

You can use either tildes or backticks, and you can also use Pandoc-style fenced code blocks to supply attributes. If you specify an "alt" attribute or a "title" attribute, it will be used as the title for the image (for readers that display the title). If you specify both, "title" is preferred. For example:

~~~ {.plantuml title="4-way handshake"}
@startuml
client->server: FIN
server->client: ACK
server->client: FIN
client->server: ACK
@enduml
~~~

Chapter 1 of the sample book contains these two examples.

Book source file names

The tooling expects your book's Markdown sources to be in the book subdirectory and to adhere to the following conventions:

  • All files must have the extension .md.

  • If you create a file called dedication.md, it'll be placed right after the copyright page in the generated output. See dedication.md for an example. If you don't want a dedication, simply delete the provided dedication.md.

  • If your book has a foreword, just create file foreward.md (the file name keeps this spelling), and it'll be inserted right after the dedication. Otherwise, just delete the supplied sample foreward.md.

  • If your book has a preface, just create file preface.md, and it'll be inserted right after the foreword. Otherwise, just delete the supplied sample preface.md.

  • If the book has a prologue, put it in file prologue.md. It'll appear before the first chapter. If you don't want a prologue, simply delete the provided prologue.md.

  • Keep each chapter in a separate file. (This is easier for editing, source control, etc.) Name the files chapter-NN.md. For instance, chapter-01.md, chapter-02.md, etc. The chapter files are sorted lexically, so the leading zeros are necessary if you have more than 9 chapters. If you have 100 or more chapters (seriously?), just add another leading zero (e.g., chapter-001.md). If you must put the entire content in one file, the file's name must start with chapter- and end in .md.

  • If the book has an epilogue, put it in file epilogue.md. It'll follow the last chapter. If you don't want an epilogue, simply delete the provided epilogue.md.

  • If you create a file called acknowledgments.md, it'll be placed after the epilogue. If you don't want an acknowledgments chapter, simply delete the provided acknowledgments.md.

  • If you need one or more appendices, just create files that start with appendix- and end with .md. Note that the files are sorted lexically. There are sample appendix files in book; delete them if you don't want any appendices.

  • If you plan to provide a glossary, create glossary.md. If you don't need a glossary, delete the provided sample file.

  • If you want to include an author biography, just create author.md.

  • If you need a references (bibliography) section, create references.yaml, as described below. If you don't need a bibliography section, just delete the provided sample references.yaml.

NOTE: There's currently no support for generating an index.
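
To see why the leading zeros matter for chapter and appendix file names, here is a small demonstration of lexical sorting using Python's built-in sorted. The file names are examples only.

```python
# Without zero padding, "chapter-10.md" sorts before "chapter-2.md",
# because the strings are compared character by character.
unpadded = sorted(["chapter-1.md", "chapter-2.md", "chapter-10.md"])

# With zero padding, lexical order matches reading order.
padded = sorted(["chapter-01.md", "chapter-02.md", "chapter-10.md"])

print(unpadded)
print(padded)
```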

Summary of chapter ordering

  • title page
  • dedication (if present)
  • foreword (if present)
  • preface (if present)
  • prologue (if present)
  • all chapters
  • epilogue (if present)
  • acknowledgments (if present)
  • appendices (if present)
  • glossary (if present)
  • author (if present)
  • references (if present)
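
The ordering above can be expressed as a short function over the Markdown file names found in the book directory. This is a sketch under stated assumptions, not the build's actual code: order_sources is a hypothetical name, and the title page and references section are left out because they don't come from Markdown files named in this list.

```python
# Fixed-name files, in the order given in the summary above.
BEFORE_CHAPTERS = ["dedication.md", "foreward.md", "preface.md", "prologue.md"]
AFTER_CHAPTERS = ["epilogue.md", "acknowledgments.md"]
AFTER_APPENDICES = ["glossary.md", "author.md"]


def order_sources(names):
    """Order the Markdown file names found in the book directory."""
    present = set(names)

    def matching(prefix):
        # Lexical sort, as the naming conventions describe.
        return sorted(n for n in present
                      if n.startswith(prefix) and n.endswith(".md"))

    def existing(fixed):
        # Keep only the optional files that are actually present.
        return [n for n in fixed if n in present]

    return (existing(BEFORE_CHAPTERS) + matching("chapter-")
            + existing(AFTER_CHAPTERS) + matching("appendix-")
            + existing(AFTER_APPENDICES))
```

Files that match none of the conventions are simply ignored by this sketch.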

Generating PDF

You can generate PDF via either LaTeX or Weasy Print.

PDF engine  | Advantages                          | Disadvantages
LaTeX       | rich typesetting, table of contents | LaTeX fonts aren't supported by all printers
Weasy Print | good printer font support           | no table of contents

Images

Image references to files are relative to the top directory, not to the book directory. It's best to stick with PNG images.

Table of contents

  • PDF: If you're using LaTeX, Pandoc automatically generates the table of contents in the PDF. If you're using Weasy Print, there's no table of contents.

  • ePub: Pandoc generates the table of contents as part of the ePub package.

  • HTML: The build tool includes JavaScript that generates a table of contents in the browser.

  • Word: Pandoc doesn't generate a table of contents for Microsoft Word, because it's trivial to create your own. In newer versions of Microsoft Word (e.g., the version you get with Office 365):

    • Insert a page break to create a new, blank page.
    • Select "References" from the menu bar.
    • Select "Table of Contents", and select your desired style.

Bibliographic references

If you're writing a book that needs a bibliography and uses citations in the text, there's a bit of extra work.

First, install pandoc-citeproc.

  • On macOS, use brew install pandoc-citeproc.
  • On Ubuntu/Debian, it should have been installed when you installed pandoc.
  • On Windows, it should have been installed when you installed pandoc.

Next, you'll need to create the bibliography YAML file, book/references.yaml, suitably organized for pandoc to consume. The sample book/references.yaml contains a single entry. You can hand-code this file, or you can use pandoc-citeproc to generate it from an existing bibliographic file (e.g., a BibTeX file).

See the citations section in the Pandoc User's Guide and the pandoc-citeproc man page for more details.

NOTE: The presence of a book/references.yaml file triggers the build tooling to include a References chapter, to which pandoc will add any cited works. Your bibliography (book/references.yaml) can contain as many references as you want; only the ones you actually cite in your text will show up in the References section. If your text contains no citations, the References section will be empty. The build tooling does not check first to see whether you actually have any citations in your text.

An example of a citation is:

[See @WatsonCrick1953]

Again, see the citations section of the Pandoc User's Guide for full details.
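
Since the build doesn't check whether your text actually cites anything, you may want a quick check of your own. The sketch below scans Markdown text for citation keys; the regular expression only approximates Pandoc's citation-key syntax, and citation_keys is a hypothetical helper, not part of this tooling.

```python
import re

# Approximation of Pandoc's citation-key syntax: "@" followed by
# alphanumerics/underscores, with single internal punctuation characters.
# The lookbehind skips "@" inside words, such as email addresses.
CITATION_RE = re.compile(
    r"(?<!\w)@([A-Za-z0-9_]+(?:[:.#$%&+?<>~/-][A-Za-z0-9_]+)*)"
)


def citation_keys(markdown_text):
    """Return the distinct citation keys found in the text, in first-seen order."""
    seen = []
    for key in CITATION_RE.findall(markdown_text):
        if key not in seen:
            seen.append(key)
    return seen
```

If this returns an empty list for all of your chapters, the generated References section will be empty, and you may prefer to remove book/references.yaml.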

Styling your book

The ePub styling uses files/epub.css, and the HTML is styled with files/html.css.

You can change the styling by providing your own version of those files in the book directory. That is:

  • If book/html.css exists, it will be used instead of files/html.css.
  • If book/epub.css exists, it will be used instead of files/epub.css.
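
The override rule amounts to "prefer the copy in book, fall back to the copy in files". A minimal sketch of that lookup, assuming a hypothetical helper named pick_stylesheet (the build script's own logic may differ in detail):

```python
from pathlib import Path


def pick_stylesheet(project_dir, name):
    """Return book/<name> if it exists, otherwise files/<name>."""
    project = Path(project_dir)
    override = project / "book" / name
    return override if override.is_file() else project / "files" / name
```

For example, pick_stylesheet(".", "epub.css") returns book/epub.css when you've supplied your own style sheet, and files/epub.css otherwise.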

Building your book

Once you've prepared everything, as described above, you can rebuild the book by running the command:

./build

./build is a Python script using the Python doit build tool. You should not need to edit it; editing metadata.yaml is sufficient to specify the information about your book.

Other useful build targets

  • ./build version: Show what version of this tooling you have.
  • ./build docx: Build just the Microsoft Word version of the book.
  • ./build pdf: Build just the PDF version of the book.
  • ./build epub: Build just the ePub version of the book.
  • ./build html: Build just the HTML version of the book.

You can combine targets:

./build docx pdf

Cleaning up generated files

To clean up the built targets:

./build clean

To clean everything out (except doit-db.json, which won't go away):

./build clobber

Auto-building

Because ./build is a doit script, it supports auto-building. If you run it as follows:

./build auto

it will build your book (if it's not up-to-date), then wait; any time one or more of the source Markdown files changes, it will automatically rebuild your book. To stop it, just hit Ctrl-C.

NOTE: Auto-building will not detect the addition of new files. For instance, if you're running in auto-build mode, and you add a new chapter-03.md file, the build script will not detect it. You'll have to kill the auto-build and restart it.

Gotchas

Combining targets doesn't always work as expected. For instance, from traditional make(1) usage, you might expect ./build clean pdf to run the "clean" target, then run the "pdf" target. Instead, it just runs the "clean" operation for the PDF. (That's a doit quirk.)
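
One way around this quirk is to invoke the build once per target. The sketch below is an assumption about how you might script that; build_in_sequence and its runner parameter are hypothetical and not part of this tooling.

```python
import subprocess


def build_in_sequence(targets, runner=subprocess.run):
    """Run ./build once per target, stopping at the first failure."""
    for target in targets:
        # check=True raises CalledProcessError if the build step fails.
        runner(["./build", target], check=True)
```

Calling build_in_sequence(["clean", "pdf"]) from the project's top-level directory runs ./build clean and then ./build pdf as two separate invocations.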

Copyright and License

This software is copyright © 2017 Brian M. Clapper and is released under the GPL, version 3, similar to the license the underlying Pandoc software uses. See the LICENSE for further details.