A simple indexer and text searcher for a collection of PDF documents.
First, extract the words from the PDF documents:
python extract_text.py --pdf_dir PDF_DIR --text_path TEXT_PATH [--pool_size POOL_SIZE]
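The script itself is not reproduced here, so the following is only a rough sketch of what a pooled word-extraction step might look like. It assumes the raw text of each PDF has already been obtained (the PDF parsing library is deliberately left out), and the names `tokenize` and `extract_all` are hypothetical. A thread pool is used to keep the sketch simple; the real script may well use worker processes instead.

```python
import re
from multiprocessing.pool import ThreadPool

# Assumption: a "word" is a run of ASCII letters or digits.
WORD_RE = re.compile(r"[a-z0-9]+")

def tokenize(raw_text):
    """Lowercase the text and split it into alphanumeric words."""
    return WORD_RE.findall(raw_text.lower())

def extract_all(raw_texts, pool_size=4):
    """Tokenize many documents in parallel.

    raw_texts maps a document name to its already-extracted text
    (hypothetical input; PDF parsing is out of scope for this sketch).
    """
    names = sorted(raw_texts)
    with ThreadPool(pool_size) as pool:
        # pool.map preserves input order, so names and results line up.
        word_lists = pool.map(tokenize, [raw_texts[n] for n in names])
    return dict(zip(names, word_lists))
```

Here `pool_size` plays the role that POOL_SIZE presumably plays on the command line: it caps how many documents are handled at once.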
Then, construct the index:
python construct_index.py --text_path TEXT_PATH --index_path INDEX_PATH
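For readers unfamiliar with the idea, an index of this kind is commonly an inverted index: a mapping from each word to the documents that contain it. The sketch below illustrates that general technique and is not taken from construct_index.py; the function name `build_index` and the input layout (document name mapped to a list of words) are assumptions.

```python
from collections import defaultdict

def build_index(doc_words):
    """Map each word to the sorted list of documents that contain it.

    doc_words maps a document name to its list of words
    (hypothetical layout, chosen only for this illustration).
    """
    postings = defaultdict(set)
    for doc_name, words in doc_words.items():
        for word in words:
            postings[word].add(doc_name)
    # Sorted lists give a stable, easily serializable result.
    return {word: sorted(docs) for word, docs in postings.items()}
```

For example, `build_index({"a.pdf": ["foo", "bar"], "b.pdf": ["foo"]})` maps "foo" to both documents and "bar" to a.pdf only.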
Finally, run the searcher:
python query.py --index_path INDEX_PATH [--max_results MAX_RESULTS]
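As an illustration of how a lookup with a result cap might work, the sketch below searches a word-to-documents dictionary and returns at most `max_results` names. The index layout, the function name `search`, the default cap of 10, and the choice to require every query word (AND semantics) are all assumptions; query.py may rank or match differently.

```python
def search(index, query, max_results=10):
    """Return up to max_results documents containing every query word.

    index maps a word to a list of document names (hypothetical layout).
    """
    terms = query.lower().split()
    if not terms:
        return []
    # Intersect the posting lists of all terms (AND semantics, an assumption).
    matches = set(index.get(terms[0], []))
    for term in terms[1:]:
        matches &= set(index.get(term, []))
    # Sort for a deterministic order before applying the cap.
    return sorted(matches)[:max_results]
```

A word that is absent from the index yields an empty result, since its posting list is treated as empty.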