eterps/pdf-struct
This commit does not belong to any branch on this repository, and may belong to a fork outside of the repository.
Folders and files
Name | Name | Last commit message | Last commit date | |
---|---|---|---|---|
Repository files navigation
PDF::Extractor is a library that provides high level access to the text objects of a PDF document. It is actually a simple wrapper around the pdftohtml command with its '-xml' option, so you need the pdftohtml command in your path to be able to use this library. The pdftohtml command is written by Mikhail Kruk (originally written by Gueorgui Ovtcharov and Rainer Dorsch): http://pdftohtml.sourceforge.net If you need to have direct (low level) access to a PDF, use pdf-reader instead: http://github.com/yob/pdf-reader/tree/master Usage: document = PDF::Extractor.open('test.pdf') document.elements.each do |element| puts "#{element.left}, #{element.top}\t'#{element.content}'" end
About
PDF::Extractor is a library that provides high level access to the text objects of a PDF document
Resources
Stars
Watchers
Forks
Releases
No releases published
Packages 0
No packages published