

Deriving HTML from PDF

An implementation of an algorithm that converts well-tagged PDF files into HTML.

Since 2016 we have been actively participating in the PDF Association Technical Working Group, with the aim of addressing the industry's need to change the way PDF files are consumed on mobile devices. The main question was whether a traditional fixed-layout PDF contains enough information to be safely and unambiguously interpreted as HTML, and therefore be responsive and reusable in different environments. The output of this work is the Derivation algorithm, a document that describes how the conversion can be done. As part of the work we also produced a reference set of PDF documents and an implementation. Together these should provide enough insight into the whole concept.

We presented the idea at PDF Days Europe 2017 in Berlin.

This repository contains a command-line tool that converts well-tagged PDFs into HTML files, and a set of examples (manually crafted PDF files) that show how specific structure elements, attributes, and associated files are used during derivation and how an author can turn static PDFs into dynamic HTML.
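To give a rough feel for what deriving HTML from structure elements means, here is a minimal sketch in Python. It is not the code of this tool and not the Derivation algorithm itself: `StructElem` is a hypothetical in-memory stand-in for a tagged PDF structure tree, and `TAG_MAP` is an illustrative assumption about how a few standard structure types might map to HTML tags. Attributes and associated files are ignored entirely.

```python
from dataclasses import dataclass, field
from html import escape
from typing import List

# Illustrative mapping from a few PDF standard structure types to HTML tags.
# This is an assumption for the sketch, not the mapping used by this tool.
TAG_MAP = {
    "Document": "div",
    "H1": "h1",
    "H2": "h2",
    "P": "p",
    "L": "ul",
    "LI": "li",
    "Table": "table",
    "TR": "tr",
    "TH": "th",
    "TD": "td",
    "Span": "span",
}


@dataclass
class StructElem:
    """Hypothetical stand-in for one element of a tagged PDF structure tree."""
    type: str
    text: str = ""
    kids: List["StructElem"] = field(default_factory=list)


def derive_html(elem: StructElem) -> str:
    """Recursively emit an HTML element for a structure element and its kids."""
    tag = TAG_MAP.get(elem.type, "div")  # unknown types fall back to <div>
    inner = escape(elem.text) + "".join(derive_html(k) for k in elem.kids)
    return f"<{tag}>{inner}</{tag}>"


doc = StructElem("Document", kids=[
    StructElem("H1", "Title"),
    StructElem("P", "Fish & chips"),
    StructElem("L", kids=[StructElem("LI", "one"), StructElem("LI", "two")]),
])
print(derive_html(doc))
```

The point of the sketch is that a well-tagged PDF already carries a logical tree, so the conversion can walk that tree instead of guessing structure from the page layout.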


Requirements: Visual C++ Redistributable for Visual Studio 2015


Feel free to submit comments, questions, and suggestions, and to discuss them with us.

