PDF files

Even though the derivation algorithm is designed to work properly for all tagged PDF files, the best visual accuracy is expected with PDF files that are tagged for this purpose. The pdf folder contains a set of manually crafted PDF examples that show how specific structure elements, attributes, and associated files are used during derivation, and how an author can turn a static PDF into dynamic HTML. We also provide samples of tagged PDF files that do not perform well during derivation. The reason is highlighted in each sample; the usual causes are sloppy tagging, wrong semantics, missing information such as attributes, and not following best practices.

PAC_Report.pdf

Standard PDF/UA-1 file with neither attributes nor a ClassMap. Because of the way lists are used in the PDF, and with no additional styling, the HTML output looks a little different.

PAC_Report_style_classmap.pdf

Styling of the derived HTML is achieved by introducing CSS attributes in the ClassMap. Structure elements are associated with those attribute classes through the "C" entry. Derivation respects this: it generates CSS styles from the ClassMap and then refers to them through the class attribute.
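
The ClassMap-to-CSS step described above can be pictured with a minimal Python sketch. It assumes the ClassMap has already been read into a plain dictionary of class names and CSS properties; the function names `classmap_to_css` and `element_to_html`, and the sample class, are hypothetical and do not come from the derivation algorithm or from this sample file.

```python
from html import escape


def classmap_to_css(class_map):
    """Render {class name: {CSS property: value}} as one CSS rule per class."""
    rules = []
    for class_name, properties in class_map.items():
        declarations = " ".join(
            f"{prop}: {value};" for prop, value in properties.items()
        )
        rules.append(f".{class_name} {{ {declarations} }}")
    return "\n".join(rules)


def element_to_html(tag, text, classes=()):
    """Emit an element whose class attribute lists the names from its "C" entry."""
    class_attr = ""
    if classes:
        joined = " ".join(classes)
        class_attr = f' class="{escape(joined)}"'
    return f"<{tag}{class_attr}>{escape(text)}</{tag}>"


# Hypothetical class name and values, not taken from PAC_Report_style_classmap.pdf.
class_map = {"title": {"color": "#003366", "font-weight": "bold"}}
print(classmap_to_css(class_map))
print(element_to_html("p", "PAC report", classes=["title"]))
```

A real implementation would read the ClassMap and the "C" entries from the PDF structure tree; the dictionary here only stands in for that data.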

BMIBAI_form.pdf

AcroForm fields with various JavaScript actions. Changing units recalculates the values. The form calculates BMI and BAI and allows submitting data to a shared Google spreadsheet. The same functionality works in the PDF and in the derived HTML. No specific styling is applied to structure elements, to keep the output as simple as possible.

BMIBAI_form_css.pdf

Same as BMIBAI_form.pdf, but with styling optimized for mobile/small devices. The styling is achieved by attaching an associated file of type CSS to the Document structure element. Classes are referenced through the "C" entry in the structure element dictionary.

wikipedia_html_NS_no_styling.pdf

File created by converting an HTML file to PDF while capturing the original HTML structure. Each structure element type is in the HTML namespace, which is a PDF 2.0 feature. The derivation is straightforward. This sample does not contain any styling information from the original HTML, so the derived HTML looks different in terms of layout.
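
To illustrate why derivation is straightforward for such files, here is a small Python sketch. It assumes the HTML namespace is identified by the XHTML namespace URI and that the structure type then already is the HTML tag name. The fallback table for other structure types is purely illustrative and is not the mapping defined by the derivation algorithm; `html_tag_for` is a hypothetical helper.

```python
# Assumption: structure elements in the HTML namespace carry this namespace URI.
HTML_NS = "http://www.w3.org/1999/xhtml"

# Illustrative fallback for standard structure types, not the official mapping.
FALLBACK_TAGS = {
    "P": "p",
    "H1": "h1",
    "Table": "table",
    "TR": "tr",
    "TH": "th",
    "TD": "td",
    "L": "ul",
    "LI": "li",
}


def html_tag_for(struct_type, namespace=None):
    """Pick an HTML tag name for a structure element type."""
    if namespace == HTML_NS:
        # The structure type is already an HTML element name.
        return struct_type.lower()
    # Otherwise fall back to a lookup, with a generic container as default.
    return FALLBACK_TAGS.get(struct_type, "div")


print(html_tag_for("section", HTML_NS))
print(html_tag_for("Table"))
```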

Animals_in_research_attributes.pdf

Standard Layout attributes used to preserve layout information like color or styling in the table.

table_with_attributes.pdf

Standard Layout attributes used to preserve layout information like color or styling in the table.
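
As a rough picture of how Layout attributes such as those in the two samples above could carry over to HTML, the following Python sketch turns a few of them into an inline style string. It assumes the attributes are already available as a dictionary, with colors as RGB components in the range 0.0 to 1.0. The helper names and the choice of CSS properties are assumptions made for illustration, not the mapping used by the derivation algorithm.

```python
def rgb_to_css(components):
    """Convert RGB components in the range 0.0-1.0 to a CSS rgb() value."""
    red, green, blue = (
        round(max(0.0, min(1.0, c)) * 255) for c in components
    )
    return f"rgb({red}, {green}, {blue})"


# Illustrative mapping of TextAlign values to CSS text-align keywords.
TEXT_ALIGN = {
    "Start": "start",
    "Center": "center",
    "End": "end",
    "Justify": "justify",
}


def layout_to_style(attributes):
    """Build an inline CSS style string from a few Layout attributes."""
    declarations = []
    if "Color" in attributes:
        declarations.append("color: " + rgb_to_css(attributes["Color"]))
    if "BackgroundColor" in attributes:
        declarations.append(
            "background-color: " + rgb_to_css(attributes["BackgroundColor"])
        )
    if "TextAlign" in attributes:
        declarations.append("text-align: " + TEXT_ALIGN[attributes["TextAlign"]])
    return "; ".join(declarations)


# Hypothetical attribute values, not read from either sample file.
print(layout_to_style({"Color": [1, 0, 0], "TextAlign": "Center"}))
```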

table_complex_with_headerIDs.pdf

Each table cell refers to its headers through IDs, which are preserved in the HTML. The PDF and the HTML would be consumed by a screen reader in the same way.
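
A minimal Python sketch of how such header references can look on the HTML side: a header cell keeps its ID in the `id` attribute, and a data cell lists the IDs of its headers in the space-separated `headers` attribute. The function name `cell_to_html` and the sample IDs are hypothetical.

```python
from html import escape


def cell_to_html(tag, text, cell_id=None, headers=()):
    """Emit a th/td element that keeps its ID and its header references."""
    attrs = ""
    if cell_id:
        attrs += f' id="{escape(cell_id)}"'
    if headers:
        # The HTML "headers" attribute is a space-separated list of IDs.
        joined = " ".join(headers)
        attrs += f' headers="{escape(joined)}"'
    return f"<{tag}{attrs}>{escape(text)}</{tag}>"


# Hypothetical IDs, not taken from table_complex_with_headerIDs.pdf.
print(cell_to_html("th", "Year", cell_id="h_year"))
print(cell_to_html("td", "2019", headers=["h_year", "h_total"]))
```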

af_css_table.pdf

The same file as table_complex_with_headerIDs.pdf. Extensive styling is achieved with an associated CSS file attached to the Document structure element. The CSS file is used by the resulting HTML.

af_css_js_sort_table.pdf

Example of an interactive table with associated files. The CSS file styles the "sortable" class, which is assigned to the Table structure element via the "C" entry. The associated JavaScript file allows sorting the table by clicking on a table header.

af_url_css_table.pdf

The Document structure element references an external CSS stylesheet defined by URL: https://www.w3schools.com/w3css/4/w3.css. The Table structure element is then associated with a class through the "C" entry. Referencing external objects might be considered unsafe and could introduce some unpredictability in the output.

Letter_DOC.pdf

Standard Microsoft Word template saved as PDF in MS Word 2016. Microsoft Word doesn't convert any styling information into attributes. The layout isn't preserved either.

Invoice_XLS.pdf

Standard Microsoft Excel invoice template saved as PDF in MS Excel 2016. Microsoft Excel doesn't convert any styling information into attributes.

Samples we are working on

  • more complex styling (CSS in the form of an associated file or as attributes)
  • structure destinations used for better navigation inside the document
  • associated files (AF) used to provide additional information (MathML, CSS styling, etc.)

Feedback

Feel free to submit bugs, problematic files, comments, questions, or suggestions by creating a new issue in the Issues section, or e-mail us.
