CorentinB/pdftotext-go

 
 

pdftotext-go

Extract text from PDF files together with the corresponding page numbers. Wraps the command-line tool pdftotext (part of poppler-utils).

Usage

  1. poppler-utils (version >=22.05.0) must be installed and available on the PATH.
  2. go get "github.com/heussd/pdftotext-go"
  3. See the tests for code examples.
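
The first step can be verified programmatically before any extraction is attempted. The following is a minimal sketch, not part of this library: the helper names parseVersion, atLeast and checkPoppler are hypothetical, and it assumes that "pdftotext -v" prints a banner containing a dotted version such as "22.05.0".

```go
package main

import (
	"fmt"
	"os/exec"
	"regexp"
	"strconv"
)

// versionRe matches a dotted three-part version such as "22.05.0".
var versionRe = regexp.MustCompile(`(\d+)\.(\d+)\.(\d+)`)

// parseVersion extracts the first major.minor.patch triple found in s.
func parseVersion(s string) ([3]int, error) {
	var v [3]int
	m := versionRe.FindStringSubmatch(s)
	if m == nil {
		return v, fmt.Errorf("no version found in %q", s)
	}
	for i := range v {
		n, err := strconv.Atoi(m[i+1])
		if err != nil {
			return v, err
		}
		v[i] = n
	}
	return v, nil
}

// atLeast reports whether v >= want, comparing component by component.
func atLeast(v, want [3]int) bool {
	for i := range v {
		if v[i] != want[i] {
			return v[i] > want[i]
		}
	}
	return true
}

// checkPoppler (hypothetical helper) verifies that pdftotext is on the PATH
// and that its reported version is at least 22.05.0.
func checkPoppler() error {
	if _, err := exec.LookPath("pdftotext"); err != nil {
		return fmt.Errorf("pdftotext not found on PATH: %w", err)
	}
	// Assumption: the version banner may be written to stderr, so both
	// streams are captured. The exit status is only used for diagnostics.
	out, runErr := exec.Command("pdftotext", "-v").CombinedOutput()
	v, err := parseVersion(string(out))
	if err != nil {
		return fmt.Errorf("cannot determine pdftotext version: %v (run error: %v)", err, runErr)
	}
	if !atLeast(v, [3]int{22, 5, 0}) {
		return fmt.Errorf("pdftotext %d.%02d.%d is older than 22.05.0", v[0], v[1], v[2])
	}
	return nil
}

func main() {
	if err := checkPoppler(); err != nil {
		fmt.Println("poppler check failed:", err)
		return
	}
	fmt.Println("poppler-utils is recent enough")
}
```

Comparing the version as three integers avoids the pitfalls of string comparison, where "22.5.0" and "22.05.0" would otherwise be treated as different values.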

Why poppler version >=22.05.0

Version 22.05.0 of poppler introduced the -tsv option, which extracts PDF content together with its metadata as TSV. This library depends on that output, so older versions of poppler will not work.
