ink-engine

The core engine behind Ink's file processing. Currently supports docx and epub (2.0 and 3.0).

License

Apache 2.0

Install

This package has not yet been published on npm, but you can install it directly from the GitHub repository.

npm:

npm install RebusFoundation/ink-engine

Example

const engine = require("ink-engine");

const epubPath = "path/to/epubfile.epub";

// Called for every file in the publication, including files generated by
// the engine, such as the publication JSON file itself.
async function extract(vfile, resource, metadata) {
  // Do something with the vfile, e.g. upload it to Google Storage,
  // then return the full URL of the uploaded resource.
  return "uploaded/" + resource.url;
}

// Named `processPublication` to avoid shadowing Node's global `process`.
async function processPublication(file) {
  const result = await engine(file, extract);
  // `result` is the publication metadata object. It conforms, for the most
  // part, to the W3C Web Publications (wpub) note.
  console.dir(result);
}

// Log failures instead of leaving the rejection unhandled.
processPublication(epubPath).catch(console.error);

API

engine(filepath, extractCallback[, options])

Returns a Promise for a publication object that conforms (for the most part) to the W3C Web Publications note.

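extractCallback(vfile, resource, metadata)

Called for every file in the publication, including files generated by the engine, such as the publication JSON file itself. It should return the full URL of the stored resource, as in the example above.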
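
As a rough illustration of what an extract callback might do, the sketch below writes each file to a local directory instead of uploading it. The names `extractToDisk` and `outputDir` are hypothetical and not part of ink-engine. It also assumes that `resource.url` is a relative path and that the file data sits on `vfile.value` (newer vfile releases) or `vfile.contents` (older ones); check which applies to the vfile version you have installed.

```javascript
const fs = require("fs");
const os = require("os");
const path = require("path");

// Hypothetical output directory; not part of ink-engine.
const outputDir = path.join(os.tmpdir(), "ink-engine-output");

async function extractToDisk(vfile, resource, metadata) {
  // Assumption: resource.url is a relative path inside the publication.
  const target = path.resolve(outputDir, resource.url);
  // Refuse paths that would escape the output directory (e.g. "../x").
  if (!target.startsWith(outputDir + path.sep)) {
    throw new Error("Refusing to write outside " + outputDir);
  }
  // Assumption: data lives on `value` (newer vfile) or `contents` (older vfile).
  const data = vfile.value !== undefined ? vfile.value : vfile.contents;
  await fs.promises.mkdir(path.dirname(target), { recursive: true });
  await fs.promises.writeFile(target, data);
  // Return the URL under which the stored file will be served.
  return "uploaded/" + resource.url;
}
```

A callback like this can be passed as the second argument to engine() in place of `extract` from the example above.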

options

  • options.sanitize (default: true): whether the extracted files should be sanitized before being handed over to the extractCallback. This removes all JS files and sanitizes CSS, HTML, SVG and XHTML files.
  • options.cssPrefix (default: #ink-engine): the selector prefix that should be used to sandbox the selectors in extracted CSS.

Markup is sanitized using DOMPurify. CSS is sanitized using an internal PostCSS module that filters out all unknown properties, prefixes every selector with your chosen selector prefix, and removes position: fixed declarations. It also transforms body and html element selectors to ink-body and ink-html for additional rendering control.
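
The options described above are passed as the third argument to engine(). The sketch below shows one way to do that. The wrapper name `processForReader` and the `#reader-pane` selector are hypothetical, and the engine function is passed in as a parameter only so that the snippet does not depend on how the package was installed.

```javascript
// Hypothetical wrapper around engine(filepath, extractCallback, options).
function processForReader(engine, file, extract) {
  const options = {
    // The documented default, spelled out here for clarity.
    sanitize: true,
    // Hypothetical selector replacing the default "#ink-engine" prefix.
    cssPrefix: "#reader-pane",
  };
  return engine(file, extract, options);
}
```

In an application this would be called as processForReader(require("ink-engine"), epubPath, extract).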

Generated hast JSON files

The engine also generates hast JSON files for all HTML and XHTML files. hast is an abstract syntax tree format used by the unified collection of processing tools. These JSON files let clients skip the HTML parsing stage when rendering publications as part of a website. Each file also embeds a version of the publication object and, if available, a ToC in object form, under the data.book and data.toc properties respectively. The LinkedResource object for the current file is available under data.resource.
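
A client might read one of these generated files roughly as follows. This is a minimal sketch that assumes the `data` object sits on the root node of the tree; the function names `textContent` and `readGeneratedFile` are hypothetical. The node shapes (`type`, `children`, `value`) follow the hast format.

```javascript
// Collect the text of a hast node by walking its children depth-first.
function textContent(node) {
  if (node.type === "text") return node.value;
  if (!Array.isArray(node.children)) return "";
  return node.children.map(textContent).join("");
}

// Parse a generated hast JSON file and pull out the embedded objects.
// Assumption: `data` is attached to the root node of the tree.
function readGeneratedFile(json) {
  const tree = JSON.parse(json);
  const data = tree.data || {};
  return {
    book: data.book, // version of the publication object
    toc: data.toc, // table of contents, if available
    resource: data.resource, // LinkedResource for this file
    text: textContent(tree),
  };
}
```

Because the tree is already parsed, a renderer can walk the `children` arrays directly instead of running an HTML parser first.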

Prerender

The prerender directory contains code that uses the rehype-annotate module to process, prepare, and match annotations to the prerendered file.

TODO

  • Broader format support in general
  • WARC support
  • Figure out if and how paged media formats can be supported (PDF, CBZ, CBR)
  • Figure out if and how time-based media formats can be supported (video, audio, podcasts, audiobooks)
  • Support the W3C's proposed Lightweight Packaging Format
  • Make the publication object more compliant with the W3C Web Publications Manifest note
