pdf2htmlEX is no longer under active development. New maintainers are wanted.

# pdf2htmlEX

My branch differences:

This is my branch of pdf2htmlEX, which I maintain for my own purposes. I have made a number of changes and improvements over the original code:

- Lots of bug fixes, mostly for edge cases
- Integration of the latest Cairo code
- Out-of-source building
- Rewritten handling of obscured/partially obscured text - now much more accurate
- Some support for transparent text
- Improved DPI settings - DPI is clamped to ensure the output graphic isn't too big
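
As a rough illustration of the DPI clamping mentioned in the last item, here is a minimal sketch. It assumes the limit is expressed as a maximum pixel dimension per side of the rendered page image; the function name `clamp_dpi`, the `max_pixels` parameter and its default value are hypothetical and are not taken from the pdf2htmlEX source.

```python
def clamp_dpi(requested_dpi, page_width_pt, page_height_pt, max_pixels=10000):
    """Return the requested DPI, lowered if the rendered page would be too large.

    PDF page sizes are given in points (1 pt = 1/72 inch), so the rendered
    size in pixels is points / 72 * dpi. max_pixels is a hypothetical limit
    chosen only for this illustration.
    """
    longest_side_in = max(page_width_pt, page_height_pt) / 72.0
    if longest_side_in <= 0:
        raise ValueError("page dimensions must be positive")
    max_dpi = max_pixels / longest_side_in  # DPI at which the longest side hits the limit
    return min(requested_dpi, max_dpi)
```

With these assumed numbers, a US Letter page (612 x 792 pt) keeps a requested 300 DPI, while a very large 7200 x 7200 pt page is reduced to 100 DPI.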

--correct-text-visibility checks 4 sample points for each character (currently the 4 corners of the character's bounding box, inset slightly) to determine whether the character is visible. It now has two modes: 1 = fully occluded text is handled (i.e. it doesn't get put into the HTML layer); 2 = partially occluded text is handled.

The default is now "1", so fully occluded text should no longer show through. If "2" is selected, a partially occluded character is drawn in the background layer instead. In this case, the rendered DPI of the page is automatically increased to --covered-text-dpi (default: 300) to reduce the impact of rasterized text.
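
The decision described above can be modelled roughly as follows. This is a simplified sketch, not the actual pdf2htmlEX code: the function names are hypothetical, mode 0 is assumed to mean "no correction", and mode 2 is assumed to also remove fully occluded text from the HTML layer as mode 1 does.

```python
def classify(samples_visible):
    """Classify a character from the visibility of its sample points."""
    visible = sum(1 for v in samples_visible if v)
    if visible == len(samples_visible):
        return "visible"
    if visible == 0:
        return "fully_occluded"
    return "partially_occluded"


def keep_in_html_layer(samples_visible, mode):
    """Return True if the character stays in the HTML text layer."""
    state = classify(samples_visible)
    if state == "visible":
        return True
    if state == "fully_occluded":
        return mode < 1  # assumed: modes 1 and 2 both drop it from the HTML layer
    return mode < 2      # partially occluded: only mode 2 moves it to the background


def page_dpi(base_dpi, has_partially_occluded_text, mode, covered_text_dpi=300):
    """Raise the page DPI when mode 2 rasterizes partially occluded text."""
    if mode == 2 and has_partially_occluded_text:
        return max(base_dpi, covered_text_dpi)
    return base_dpi
```

For example, a character with one hidden corner stays in the HTML layer in mode 1 but is moved to the background in mode 2, and the page is then rendered at no less than the covered-text DPI.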

For maximum accuracy I strongly recommend using the output options: --font-size-multiplier 1 --zoom 25. This will circumvent rounding errors inside web browsers. You will then have to scale down the resulting HTML page using an appropriate "scale" transform.
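
A small sketch of how the recommended options and the matching scale-down might fit together. It assumes that --zoom 25 renders the page at 25 times its natural size, so a CSS scale of 1/25 restores it; the helper names and the choice of a top-left transform origin are illustrative assumptions, not part of pdf2htmlEX.

```python
def build_command(pdf_path, zoom=25):
    """Build the argument list for the recommended high-accuracy conversion."""
    return ["pdf2htmlEX", "--font-size-multiplier", "1", "--zoom", str(zoom), pdf_path]


def css_scale(zoom, target_zoom=1.0):
    """Return a CSS declaration that scales the zoomed page back to target_zoom."""
    factor = target_zoom / zoom
    # Anchoring at the top-left corner is an assumption made for this sketch.
    return f"transform: scale({factor:g}); transform-origin: 0 0;"
```

The list returned by `build_command("input.pdf")` can be passed to `subprocess.run`, and the declaration from `css_scale(25)` applied to the element that wraps the generated pages.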

If you are concerned about the file size of the resulting HTML, I recommend patching FontForge to prevent it from writing the current time into the dumped fonts, and then post-processing the pdf2htmlEX output to remove duplicate files - there will usually be many duplicate background images and fonts.
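
The post-processing step could start with something like the sketch below, which only finds byte-identical files in an output directory by hashing their contents. The function name and the directory layout are assumptions; rewriting the HTML and CSS references to point at a single kept copy is left out.

```python
import hashlib
from collections import defaultdict
from pathlib import Path


def find_duplicates(output_dir):
    """Group files under output_dir whose contents are byte-for-byte identical."""
    by_digest = defaultdict(list)
    for path in sorted(Path(output_dir).rglob("*")):
        if path.is_file():
            digest = hashlib.sha256(path.read_bytes()).hexdigest()
            by_digest[digest].append(path)
    # Keep only the groups that actually contain more than one file.
    return [paths for paths in by_digest.values() if len(paths) > 1]
```

Each returned group lists the paths that share the same content, so all but the first could be deleted once references to them have been updated. Fonts will only match this way if the embedded timestamp has been removed as described above.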

Original README.md follows...

A picture is worth a thousand words
A beautiful demo is worth a thousand words

Bible de Genève, 1564 (fonts and typography): HTML / PDF
Cheat Sheet (math formulas): HTML / PDF
Scientific Paper (text and figures): HTML / PDF
Full Circle Magazine (read while downloading): HTML / PDF
Git Manual (CJK support): HTML / PDF

pdf2htmlEX renders PDF files in HTML, utilizing modern Web technologies. Academic papers with lots of formulas and figures? Magazines with complicated layouts? No problem!

pdf2htmlEX is also an online publishing tool which is flexible for many different use cases.

Learn more about who should use pdf2htmlEX and why.

Features

Native HTML text with precise font and location.
Flexible output: all-in-one HTML or on demand page loading (needs JavaScript).
Moderate file size, sometimes even smaller than PDF.
Support for links, outlines (bookmarks), printing, SVG backgrounds, Type 3 fonts and more...

Compare to others

Portals

LICENSE

pdf2htmlEX, as a whole package, is licensed under GPLv3+. Some resource files are released under more relaxed licenses; see LICENSE for more details.

Acknowledgements

pdf2htmlEX is made possible thanks to the following projects:

pdf2htmlEX is inspired by the following projects:

pdftohtml from poppler
MuPDF
PDF.js
Crocodoc
Google Doc

Special Thanks

Hongliang Tian
Wanmin Liu
