Pure Go Full Text Search of PDF Files

This library implements full text search for PDFs.

There are three command-line programs that demonstrate the library's functionality.

Binary versions (executables) of these three programs are available in releases. There are 64-bit binaries for Windows, Mac and Linux. The binaries do not require a UniDoc license.

Installation

git clone https://github.com/PaperCutSoftware/pdfsearch

Replace uniDocLicenseKey and companyName in unidoc_glue.go with valid UniDoc license fields.

cd pdfsearch/examples
go build pdf_search_demo.go
go build index.go
go build search.go

examples/pdf_search_demo.go

Usage: ./pdf_search_demo -f <PDF path> <search term>

Example: ./pdf_search_demo -f PDF32000_2008.pdf cubic Bézier curve

The example will search PDF32000_2008.pdf for cubic Bézier curve.
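
The usage line passes a multi-word search term without quotes. A minimal sketch of one way such a command line could be parsed follows, assuming the words after the -f flag are joined with spaces into a single term. The function parseArgs is hypothetical and is not taken from pdf_search_demo.go, which may handle its arguments differently.

```go
package main

import (
	"errors"
	"flag"
	"fmt"
	"strings"
)

// parseArgs is a hypothetical helper: it reads the -f flag and joins the
// remaining words into one search term.
func parseArgs(args []string) (pdfPath, term string, err error) {
	fs := flag.NewFlagSet("pdf_search_demo", flag.ContinueOnError)
	fs.StringVar(&pdfPath, "f", "", "PDF file to search")
	if err = fs.Parse(args); err != nil {
		return "", "", err
	}
	term = strings.Join(fs.Args(), " ")
	if pdfPath == "" || term == "" {
		return "", "", errors.New("usage: pdf_search_demo -f <PDF path> <search term>")
	}
	return pdfPath, term, nil
}

func main() {
	// Fixed arguments mirroring the example command line above.
	args := []string{"-f", "PDF32000_2008.pdf", "cubic", "Bézier", "curve"}
	pdfPath, term, err := parseArgs(args)
	if err != nil {
		fmt.Println(err)
		return
	}
	fmt.Printf("search %q for %q\n", pdfPath, term)
}
```

Under this assumption, quoting the term on the command line and leaving it unquoted give the same search string.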

pdf_search_demo.go shows how to use the APIs in index_search.go to

  • create indexes over PDFs,
  • search those indexes using full-text search, and
  • mark up PDFs with the locations of the search matches on pages.

examples/index.go

Usage: ./index <file pattern>

Example: ./index ~/climate/**/*.pdf

The example creates an on-disk index over the PDFs in ~/climate/ and its subdirectories.

examples/search.go

Usage: ./search <search term>

Example: ./search integrated assessment model

The example searches the on-disk index created by examples/index.go for integrated assessment model.

Libraries

index_search.go uses UniDoc for PDF parsing and bleve for search.

Talks about this library

GopherCon AU 2019
