PDF indexer

A simple indexer and text searcher for a collection of PDF documents.

Usage

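First, extract the text from the PDF documents in PDF_DIR and save it to TEXT_PATH (--pool_size optionally sets the size of the worker pool used for extraction):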

python extract_text.py --pdf_dir PDF_DIR --text_path TEXT_PATH [--pool_size POOL_SIZE]

Then, build the index from the extracted text at TEXT_PATH and save it to INDEX_PATH:

python construct_index.py --text_path TEXT_PATH --index_path INDEX_PATH
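This README does not describe the index format. A common structure for this kind of word search is an inverted index, which maps each word to the set of documents that contain it. The sketch below assumes that design and a hypothetical input (a dict from document name to extracted text). The names `tokenize` and `build_inverted_index` are illustrative and do not come from construct_index.py.

```python
import re
from collections import defaultdict


def tokenize(text: str) -> list[str]:
    """Lowercase the text and split it into alphanumeric words."""
    return re.findall(r"[a-z0-9]+", text.lower())


def build_inverted_index(docs: dict[str, str]) -> dict[str, set[str]]:
    """Map every word to the set of document names that contain it.

    Hypothetical input: `docs` maps a document name to its extracted text.
    The real TEXT_PATH format written by extract_text.py may differ.
    """
    index: dict[str, set[str]] = defaultdict(set)
    for name, text in docs.items():
        for word in tokenize(text):
            index[word].add(name)
    return dict(index)


if __name__ == "__main__":
    docs = {
        "a.pdf": "Inverted indexes map words to documents.",
        "b.pdf": "PDF documents contain words.",
    }
    index = build_inverted_index(docs)
    print(sorted(index["documents"]))  # ['a.pdf', 'b.pdf']
    print(sorted(index["pdf"]))        # ['b.pdf']
```

Storing sets of document names keeps each lookup to a single dictionary access per query word, which is what makes the search step cheap once the index has been built.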

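Finally, run the searcher against the index at INDEX_PATH (--max_results optionally limits the number of results returned per query):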

python query.py --index_path INDEX_PATH [--max_results MAX_RESULTS]
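The README does not say how query.py matches or ranks documents. The sketch below assumes an inverted index (word to set of document names) and AND semantics: a document matches only if it contains every query word. Results are sorted by name and truncated to `max_results`. The function `search` is hypothetical. The actual searcher may use other matching or ranking rules.

```python
import re


def search(index: dict[str, set[str]], query: str, max_results: int = 10) -> list[str]:
    """Return up to max_results documents containing every word in the query.

    Assumes `index` maps each lowercase word to the set of documents that
    contain it (a hypothetical format, not necessarily the one in INDEX_PATH).
    """
    words = re.findall(r"[a-z0-9]+", query.lower())
    if not words:
        return []
    # Intersect the document sets of all query words (AND semantics, an assumption).
    matches = set(index.get(words[0], set()))
    for word in words[1:]:
        matches &= index.get(word, set())
    # Sort for deterministic output, then keep at most max_results entries.
    return sorted(matches)[:max_results]


if __name__ == "__main__":
    index = {
        "pdf": {"b.pdf", "c.pdf"},
        "words": {"a.pdf", "b.pdf", "c.pdf"},
    }
    print(search(index, "PDF words"))                 # ['b.pdf', 'c.pdf']
    print(search(index, "PDF words", max_results=1))  # ['b.pdf']
    print(search(index, "missing"))                   # []
```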
