Deriving HTML from PDF
An implementation of an algorithm that converts well-tagged PDF files into HTML
Since 2016 we have been actively participating in the PDF Association Technical Working Group, with the aim of addressing the industry's need to change the way PDF files are consumed on mobile devices. The main concern was whether a traditional fixed-layout PDF contains enough information to be safely and unambiguously interpreted as HTML, and therefore to be responsive and reusable in different environments. The output of this work is the Derivation algorithm, a document that describes how the conversion can be done. As part of the work we produced a reference set of PDF documents and an implementation. Together these should provide enough insight into the whole concept.
We presented the idea at PDF Days Europe 2017 in Berlin.
This repository contains a command-line tool that converts well-tagged PDFs into HTML files, and a set of examples (manually crafted PDF files) that show how specific structure elements, attributes, and associated files are used during the derivation, and how an author can turn a static PDF into dynamic HTML.
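To give a rough feel for what deriving HTML from a structure tree involves, here is a minimal Python sketch. It is not the tool in this repository and not the Derivation algorithm itself. The `TAG_MAP` table, the `derive_html` function, and the dictionary-based tree are all hypothetical, chosen only to illustrate the idea of walking tagged structure elements and emitting matching HTML elements with their attributes.

```python
from html import escape

# Hypothetical mapping from a few PDF standard structure types to HTML tags.
# This is an illustrative assumption, not the table used by the Derivation algorithm.
TAG_MAP = {
    "Document": "div",
    "P": "p",
    "H1": "h1",
    "H2": "h2",
    "L": "ul",
    "LI": "li",
    "Table": "table",
    "TR": "tr",
    "TH": "th",
    "TD": "td",
    "Span": "span",
}


def derive_html(node):
    """Recursively render a toy structure tree as an HTML string."""
    if isinstance(node, str):
        # Leaf text content: escape it so it is safe inside HTML.
        return escape(node)
    # Unknown structure types fall back to a generic <div> in this sketch.
    tag = TAG_MAP.get(node["type"], "div")
    attrs = "".join(
        f' {escape(name)}="{escape(value)}"'
        for name, value in node.get("attrs", {}).items()
    )
    children = "".join(derive_html(kid) for kid in node.get("kids", []))
    return f"<{tag}{attrs}>{children}</{tag}>"


# A toy structure tree standing in for the tags of a well-tagged PDF.
tree = {
    "type": "Document",
    "kids": [
        {"type": "H1", "kids": ["Deriving HTML from PDF"]},
        {"type": "P", "attrs": {"class": "intro"}, "kids": ["Tagged & reusable"]},
    ],
}

html_out = derive_html(tree)
print(html_out)
```

A real implementation would read the structure tree from the PDF and would also handle associated files, which this sketch leaves out.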
Requirements: Visual C++ Redistributable for Visual Studio 2015
Feel free to submit comments, questions, and suggestions, and to discuss them with us.