The PDF Search Engine is a Python application meant for directories that hold too many PDF documents to search manually.
It is intended as an alternative to semantic-search AI question-answering systems, which can hallucinate or become ineffective as the number of documents grows.
- Clone the repository to your local machine:
git clone https://github.com/StrangeNPC/PDFSearchEngine.git
- Install the required packages:
pip install -r requirements.txt
- Run the Streamlit app:
streamlit run StreamlitPDF.py
- Build the index:
Click on the "Build Index" button in the sidebar. This will create an index of the PDF files in the "SourceDocuments" directory.
- Select the documents to search:
Choose the PDF files to include in the search by selecting them in the multiselect widget in the sidebar. By default, all documents are selected.
- Enter a search query:
Type your search query in the input box below the document selection. Press the "Enter" key or click the "Search" button to perform the search.
- View search results:
The search results will be displayed below the search input box. Each result includes the file name, page number, and the relevant paragraph containing the search query terms.
- Download search result pages:
Click the "Download Page <page_number>" button to download that page of the search results as a separate PDF file.
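The workflow above (build an index, restrict it to selected documents, search, and download a single page) can be illustrated with a small keyword-search sketch. It is not taken from StreamlitPDF.py. It assumes an inverted index mapping each lowercase word to the pages containing it, AND semantics across query terms, and paragraphs separated by blank lines in the extracted text. The helper names (`load_pages`, `build_index`, `search`, `extract_page`, `Hit`) are hypothetical, and PDF reading is assumed to use the `pypdf` package, which is imported lazily so the indexing and search logic runs without it.

```python
import io
import re
from collections import defaultdict
from dataclasses import dataclass
from pathlib import Path


@dataclass(frozen=True)
class Hit:
    file_name: str
    page_number: int  # 1-based, as a user would see it
    paragraph: str


def tokenize(text: str) -> list[str]:
    # Lowercase word tokens; punctuation is dropped.
    return re.findall(r"\w+", text.lower())


def load_pages(directory: str) -> dict[str, list[str]]:
    # Assumption: text extraction via pypdf (imported here so the rest
    # of the sketch does not depend on it).
    from pypdf import PdfReader

    pages = {}
    for path in sorted(Path(directory).glob("*.pdf")):
        reader = PdfReader(path)
        pages[path.name] = [page.extract_text() or "" for page in reader.pages]
    return pages


def build_index(pages: dict[str, list[str]]) -> dict[str, set[tuple[str, int]]]:
    # Inverted index: term -> set of (file name, 0-based page index).
    index = defaultdict(set)
    for file_name, texts in pages.items():
        for page_idx, text in enumerate(texts):
            for term in tokenize(text):
                index[term].add((file_name, page_idx))
    return index


def search(query, index, pages, selected=None) -> list[Hit]:
    terms = tokenize(query)
    if not terms:
        return []
    # Only pages that contain every query term are candidates.
    candidates = set.intersection(*(index.get(t, set()) for t in terms))
    hits = []
    for file_name, page_idx in sorted(candidates):
        # Skip documents the user did not select (None means "all").
        if selected is not None and file_name not in selected:
            continue
        # Report each blank-line-separated paragraph mentioning any term.
        for para in re.split(r"\n\s*\n", pages[file_name][page_idx]):
            if set(terms) & set(tokenize(para)):
                hits.append(Hit(file_name, page_idx + 1, para.strip()))
    return hits


def extract_page(pdf_path: str, page_number: int) -> bytes:
    # Assumption: pypdf; returns a one-page PDF for a 1-based page number.
    from pypdf import PdfReader, PdfWriter

    writer = PdfWriter()
    writer.add_page(PdfReader(pdf_path).pages[page_number - 1])
    buffer = io.BytesIO()
    writer.write(buffer)
    return buffer.getvalue()
```

Keeping the index keyed by page rather than by file is what lets each result report a page number and lets a single page be exported afterwards. In a Streamlit app, bytes such as those returned by `extract_page` could be passed to `st.download_button` as its `data` argument, though the real application may do this differently.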
Contributions are welcome. Please create an issue or submit a pull request if you want to contribute to this project.
