
EPUB KB policy testing

Contents of this repository

A set of EPUB test files, specifically created for the purpose of testing an automated procedure to validate EPUBs against KB institutional policies. These policies require the following:

  • Files must be valid EPUB (version 2 or 3)
  • Files must not contain DRM or encryption (edge case: font obfuscation, also known as font mangling, should be permitted)
  • All resources in the container must fall within the Core Media Types
  • No Digital Talking Book (DTB) content documents

As a result, most of the files in this repo deliberately violate one or more of the above requirements.

Some of the files were newly created (with a little help from Sigil), whereas others were taken or adapted from other openly-licensed data sets.
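As a rough illustration of the encryption requirement listed above (and its font obfuscation edge case), the following Python sketch inspects META-INF/encryption.xml inside an EPUB. It is not part of this repository and is not how the KB procedure is implemented. The function name encryption_status and its return labels are hypothetical, and it assumes that font obfuscation is declared with the algorithm URI defined in the EPUB specification; other obfuscation schemes would be reported as encryption.

```python
import zipfile
import xml.etree.ElementTree as ET

# Algorithm URI that the EPUB specification defines for font obfuscation.
IDPF_FONT_OBFUSCATION = "http://www.idpf.org/2008/embedding"
# XML namespace of the EncryptionMethod element (W3C XML Encryption).
XMLENC_NS = "{http://www.w3.org/2001/04/xmlenc#}"


def encryption_status(epub):
    """Hypothetical helper: classify an EPUB (path or file object).

    Returns "none", "font-obfuscation-only" or "encrypted".
    """
    with zipfile.ZipFile(epub) as zf:
        if "META-INF/encryption.xml" not in zf.namelist():
            return "none"
        root = ET.fromstring(zf.read("META-INF/encryption.xml"))

    # Collect every declared encryption algorithm.
    algorithms = {
        method.get("Algorithm")
        for method in root.iter(XMLENC_NS + "EncryptionMethod")
    }
    # Only accept the file if all declared algorithms are font obfuscation.
    # An encryption.xml without any declared algorithm is treated
    # conservatively as encrypted.
    if algorithms and algorithms <= {IDPF_FONT_OBFUSCATION}:
        return "font-obfuscation-only"
    return "encrypted"
```

Note that a check like this only looks at what encryption.xml declares; it does not verify whether the listed resources are actually encrypted.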

Directories

  • content - uncompressed contents of each test file (each subdirectory represents one epub)
  • build - actual epub builds
  • epubcheckout - epubcheck output
  • pubresources - various resources (files) that were used for creating the epubs.

Build script

The script build.sh iterates over all subdirectories in the content folder and compresses the contents of each to a functional epub file in the build directory.

For an explanation of how the build process works, see here.
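The general idea of such a build step can be sketched in Python as follows. This is not the contents of build.sh, only an illustration under stated assumptions: the function names build_epub and build_all are hypothetical, and the sketch assumes that every subdirectory of content contains a mimetype file. It follows the EPUB container convention that the mimetype entry comes first in the ZIP archive and is stored uncompressed.

```python
import zipfile
from pathlib import Path


def build_epub(source_dir, target):
    """Hypothetical helper: zip one uncompressed EPUB directory into target."""
    source_dir = Path(source_dir)
    with zipfile.ZipFile(target, "w") as zf:
        # The mimetype entry goes first and must not be compressed.
        zf.write(source_dir / "mimetype", "mimetype",
                 compress_type=zipfile.ZIP_STORED)
        for path in sorted(source_dir.rglob("*")):
            arcname = path.relative_to(source_dir).as_posix()
            if path.is_file() and arcname != "mimetype":
                zf.write(path, arcname, compress_type=zipfile.ZIP_DEFLATED)


def build_all(content_dir="content", build_dir="build"):
    """Build one .epub per subdirectory of content_dir; return the paths."""
    build_dir = Path(build_dir)
    build_dir.mkdir(parents=True, exist_ok=True)
    built = []
    for sub in sorted(Path(content_dir).iterdir()):
        if sub.is_dir():
            target = build_dir / (sub.name + ".epub")
            build_epub(sub, target)
            built.append(target)
    return built
```

Calling build_all() from the repository root would, under these assumptions, write one file per test case into the build directory.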

Analyse script

The script analyse.sh validates all epubs in the build directory with Epubcheck, using both the stable 3.0 version and the alpha 4.0.0 version. You have to install both versions yourself, and then update the file paths assigned to epubcheck3Jar and epubcheck4Jar at the top of the script.
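A comparable validation loop could look like the Python sketch below. It is not the contents of analyse.sh: the function names, the report file naming and the jar paths in the usage note are hypothetical. The sketch assumes that Java is on the PATH and that each Epubcheck jar is invoked as java -jar followed by the jar path and the epub path; it simply captures whatever Epubcheck prints.

```python
import subprocess
from pathlib import Path


def epubcheck_command(jar, epub):
    """Command line for validating one epub with one Epubcheck jar."""
    return ["java", "-jar", str(jar), str(epub)]


def analyse(build_dir, jars, out_dir, run=subprocess.run):
    """Hypothetical helper: validate every epub in build_dir with every jar.

    jars maps a label (used in the report file name) to a jar path.
    The run parameter exists only so the subprocess call can be replaced
    in tests.
    """
    out_dir = Path(out_dir)
    out_dir.mkdir(parents=True, exist_ok=True)
    reports = []
    for epub in sorted(Path(build_dir).glob("*.epub")):
        for label, jar in jars.items():
            result = run(epubcheck_command(jar, epub),
                         capture_output=True, text=True)
            # Keep both output streams in a single report file.
            report = out_dir / f"{epub.stem}_{label}.txt"
            report.write_text(result.stdout + result.stderr)
            reports.append(report)
    return reports
```

A call such as analyse("build", {"epubcheck3": "/path/to/epubcheck3.jar", "epubcheck4": "/path/to/epubcheck4.jar"}, "epubcheckout") would then produce two reports per epub; the jar paths here are placeholders.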

Description of test files

| File name | Epub version | Description | Epubcheck output |
|---|---|---|---|
| epub20_minimal.epub | 2 | Basic file with one text resource and one image | 3, 4 |
| epub20_minimal_encryption.epub | 2 | Includes encryption.xml resource in META-INF, indicating that main text resource is encrypted (text resource is not actually encrypted, BTW) | 3, 4 |
| epub30_font_obfuscation.epub | 3 | Includes fonts that are obfuscated (which results in hasEncryption in epubcheck). Taken from EPUB 3 Sample Documents (wasteland with OTF fonts, obfuscated). | 3, 4 |
| epub20_foreign_resource_no_fallback.epub | 2 | Includes JP2 image, which is a format that is not on the list of Core Media Types; no fallback defined | 3, 4 |
| epub20_foreign_resource_with_fallback.epub | 2 | Includes JP2 image, which is a format that is not on the list of Core Media Types; fallback defined in manifest, identifier in content document | 3, 4 |
| epub20_foreign_resource_with_fallback_noID.epub | 2 | Includes JP2 image, which is a format that is not on the list of Core Media Types; fallback defined in manifest, no identifier in content document | 3, 4 |
| epub20_dtbook.epub | 2 | Includes Digital Talking Book content. Taken from threepress, published under BSD 3 license. | 3, 4 |

How to add a new test file

  1. Add uncompressed directory structure to content folder
  2. Run build.sh to update the builds
  3. Add descriptive entry to table above

License

All files here are released under the Creative Commons BY-SA 3.0 license, unless stated otherwise.
