pdf-searcher

Traverses a folder and parses the PDFs it encounters, matching their text against a search pattern. Run it from the command line, e.g.

pdfsearch 'Registration certificate' --maxpages 4 -a 07/24 -b 08/01 -i
Searching C:\Users\bsamm\Google Drive\Scanned for files matching /Registration\s+certificate/gim with less than 4 pages, created (strictly) between "7/24/2018, 12:00:00 AM" and "8/1/2018, 12:00:00 AM"

> C:\Users\bsamm\Google Drive\Scanned\2018_07_25_07_47_00.pdf Matching Content REGISTRATION  CERTIFICATE
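The output above suggests that the query string is turned into a regular expression with flexible whitespace between words, and that files are filtered by creation date with exclusive bounds. A minimal sketch of that idea follows. The helper names `escapeRegExp`, `buildPattern`, and `createdStrictlyBetween` are hypothetical and not taken from the package source, and treating `-i` as the case-insensitive switch is an assumption based on the `gim` flags shown in the example.

```typescript
// Hypothetical helpers: names and behavior are assumptions for illustration,
// not the package's actual implementation.

/** Escape regex metacharacters so each query word is matched literally. */
function escapeRegExp(text: string): string {
  return text.replace(/[.*+?^${}()|[\]\\]/g, "\\$&");
}

/** Join the query's words with \s+ so any run of whitespace matches between them. */
function buildPattern(query: string, ignoreCase: boolean): RegExp {
  const words = query.trim().split(/\s+/).map(escapeRegExp);
  return new RegExp(words.join("\\s+"), ignoreCase ? "gim" : "gm");
}

/** True only when `created` falls strictly inside the (after, before) window. */
function createdStrictlyBetween(created: Date, after: Date, before: Date): boolean {
  const t = created.getTime();
  return t > after.getTime() && t < before.getTime();
}

const pattern = buildPattern("Registration certificate", true);
console.log(String(pattern)); // /Registration\s+certificate/gim

// OCR output often contains doubled spaces; \s+ still matches them.
console.log("REGISTRATION  CERTIFICATE".search(pattern) >= 0); // true
```

Joining words with `\s+` rather than a literal space is what lets the pattern match OCR text such as `REGISTRATION  CERTIFICATE`, where spacing is unpredictable.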

Install with npm install -g pdf-search

This is a pretty rough implementation, thrown together on a Sunday afternoon after I got tired of digging through a folder of scanned PDFs when my scanner already OCRs the documents. Why not use some custom third-party search software? I want to get around to integrating with a node-opencv lib, because nowadays OpenCV has Tesseract built in. I should then be able to scan images and PDF images and pull out text to match against the regexp as well. That'd be a neat script, right?

Licensed under MIT by Benjamin Sammons. Have fun.
