
StrangeNPC/PDFSearchEngine


PDF Search Engine

The PDF Search Engine is a Python application meant for directories that contain more PDF documents than can practically be searched by hand.

It is meant as an alternative to semantic-search AI question-answering systems, which can hallucinate or become less effective as the number of documents grows.


Prerequisites

  1. Clone the repository to your local machine and change into its directory.
git clone https://github.com/StrangeNPC/PDFSearchEngine.git
cd PDFSearchEngine
  2. Install the required packages using the following command.
pip install -r requirements.txt
  3. Run the Streamlit app.
streamlit run StreamlitPDF.py

Usage/Examples

  1. Build the index:

Click the "Build Index" button in the sidebar. This creates an index of the PDF files in the "SourceDocuments" directory.

  2. Select the documents to search:

Choose the PDF files you want to include in the search by selecting them from the multiselect box in the sidebar. By default, all documents are selected.

  3. Enter a search query:

Type your search query in the input box below the document selection. Press the "Enter" key or click the "Search" button to perform the search.

  4. View search results:

The search results are displayed below the search input box. Each result includes the file name, the page number, and the relevant paragraph containing the search query terms.

  5. Download search result pages:

Click the "Download Page <page_number>" button to download that page from the search results as a separate PDF file.
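The README does not describe how the index is built or how matches are found, so the following is only one plausible sketch, not the project's implementation. It builds a simple keyword inverted index (term to pages) over text that has already been extracted page by page, restricts hits to the selected documents, and returns (file name, page number, paragraph) tuples like the results described in the steps above. The `docs` input shape and the `build_index`/`search` names are hypothetical, and requiring every query term to appear (AND matching) is an assumption. Text extraction is omitted to keep the sketch dependency-free; a library such as pypdf (`PdfReader(path).pages[i].extract_text()`) could supply the page texts.

```python
import re
from collections import defaultdict
from typing import Optional

# Hypothetical input shape: {file_name: [text_of_page_1, text_of_page_2, ...]}.
# Extracting this text from the PDFs in "SourceDocuments" is not shown here.
Docs = dict[str, list[str]]
Hit = tuple[str, int]            # (file_name, 1-based page number)
Result = tuple[str, int, str]    # (file_name, page number, matching paragraph)

WORD_RE = re.compile(r"\w+")


def tokenize(text: str) -> list[str]:
    """Split text into lower-case word tokens, ignoring punctuation."""
    return WORD_RE.findall(text.lower())


def build_index(docs: Docs) -> dict[str, set[Hit]]:
    """Map every term to the set of (file, page) locations where it occurs."""
    index: dict[str, set[Hit]] = defaultdict(set)
    for file_name, pages in docs.items():
        for page_no, text in enumerate(pages, start=1):
            for term in tokenize(text):
                index[term].add((file_name, page_no))
    return dict(index)


def search(index: dict[str, set[Hit]], docs: Docs, query: str,
           selected: Optional[set[str]] = None) -> list[Result]:
    """Return paragraphs that contain every query term.

    `selected` limits the search to those file names; None means all files.
    """
    terms = set(tokenize(query))
    if not terms:
        return []
    # Candidate pages: those containing all query terms (assumed AND matching).
    hits = set.intersection(*(index.get(term, set()) for term in terms))
    if selected is not None:
        hits = {hit for hit in hits if hit[0] in selected}

    results: list[Result] = []
    for file_name, page_no in sorted(hits):
        page_text = docs[file_name][page_no - 1]
        # Treat blank-line-separated blocks of text as paragraphs.
        for paragraph in re.split(r"\n\s*\n", page_text):
            if terms <= set(tokenize(paragraph)):
                results.append((file_name, page_no, paragraph.strip()))
    return results


if __name__ == "__main__":
    docs = {
        "report.pdf": ["Quarterly revenue grew.\n\nCosts were flat.",
                       "Revenue outlook is stable."],
        "notes.pdf": ["Meeting notes about revenue targets."],
    }
    index = build_index(docs)
    for file_name, page, paragraph in search(index, docs, "revenue",
                                             selected={"report.pdf"}):
        print(f"{file_name} p.{page}: {paragraph}")
```

Passing `selected=None` searches every indexed document, mirroring the default where all documents are selected. For the download step, a single page could similarly be copied into a new file with pypdf's `PdfWriter.add_page`, though how the app actually does this is not documented here.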

Contributing

Contributions are welcome. Please create an issue or submit a pull request if you want to contribute to this project.

License

MIT
