PDFProcessor

PDFProcessor is a Python application for extracting text from PDF documents and querying it through a conversational interface. It splits the extracted text into chunks, uses embedding-based semantic search to find the passages relevant to a question, and passes them to a large language model served by Ollama to produce an answer.

Features

PDF Upload: Seamlessly upload PDF documents for processing.

Text Extraction: Extracts readable text from PDF files.

Text Chunking: Splits extracted text into manageable chunks for efficient processing.

Semantic Search: Uses embeddings and cosine similarity to find the chunks of the document most relevant to a query.

Conversational Interface: Engage in a chat-like interface to ask questions related to the PDF content.

Real-time Responses: Receive answers in real-time as the system processes your queries.
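
The text chunking step listed above can be sketched as follows. This is a minimal illustration under stated assumptions, not the project's actual code: the function name `chunk_text` and the `chunk_size` and `overlap` parameters are hypothetical, and the real application may split on characters, sentences, or tokens rather than words.

```python
def chunk_text(text: str, chunk_size: int = 200, overlap: int = 50) -> list[str]:
    """Split text into word-based chunks that overlap slightly.

    Hypothetical helper: chunk_size and overlap are counted in words.
    """
    if chunk_size <= 0:
        raise ValueError("chunk_size must be positive")
    if not 0 <= overlap < chunk_size:
        raise ValueError("overlap must be in [0, chunk_size)")

    words = text.split()
    step = chunk_size - overlap  # how far the window advances each time
    chunks = []
    for start in range(0, len(words), step):
        chunks.append(" ".join(words[start:start + chunk_size]))
        # Stop once the current window has reached the end of the text.
        if start + chunk_size >= len(words):
            break
    return chunks
```

The overlap keeps a sentence that straddles a chunk boundary visible in both neighbouring chunks, which tends to help retrieval at the cost of some duplicated text.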

Installation

Prerequisites

Ensure you have the following installed:

Python 3.10 or higher

Ollama (for LLM interactions)

Clone the Repository

    git clone https://github.com/cebause01/PDFProcessor.git
    cd PDFProcessor

Install Dependencies

    pip install -r requirements.txt

Run the Application

    streamlit run main.py

Usage

Upload a PDF: Click on the "Upload a PDF" button in the sidebar to upload your document.

Ask Questions: Once the PDF is processed, type your questions in the input field and press Enter.

View Responses: The system will display answers based on the content of the PDF.
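
Behind the question-and-answer steps above, the application has to hand the question and the relevant PDF text to the model. The sketch below shows one way that could look, assuming a local Ollama server on its default port and its `/api/generate` endpoint. The helper names, the prompt wording, and the model name `llama3` are assumptions for illustration, not taken from the project.

```python
import json
import urllib.request

# Default address of a locally running Ollama server.
OLLAMA_URL = "http://localhost:11434/api/generate"


def build_prompt(question: str, context_chunks: list[str]) -> str:
    """Combine retrieved PDF chunks and the user's question (hypothetical wording)."""
    context = "\n\n".join(context_chunks)
    return (
        "Answer the question using only the context below.\n\n"
        f"Context:\n{context}\n\n"
        f"Question: {question}\nAnswer:"
    )


def build_request(prompt: str, model: str = "llama3") -> urllib.request.Request:
    """Prepare a non-streaming generate request; the model name is an assumption."""
    payload = {"model": model, "prompt": prompt, "stream": False}
    return urllib.request.Request(
        OLLAMA_URL,
        data=json.dumps(payload).encode("utf-8"),
        headers={"Content-Type": "application/json"},
        method="POST",
    )


def ask_ollama(question: str, context_chunks: list[str], model: str = "llama3") -> str:
    """Send the request and return the generated text (needs Ollama running)."""
    request = build_request(build_prompt(question, context_chunks), model)
    with urllib.request.urlopen(request) as response:
        return json.loads(response.read())["response"]
```

With `"stream": False` the server returns a single JSON object. An application that shows answers as they are generated would more likely leave streaming on and read the response line by line.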

Technologies Used

Python: Programming language used for development.

Streamlit: Framework for building the web application.

PyPDF2: Library for PDF text extraction.

NumPy: Library for numerical operations.

Scikit-learn: Library for machine learning and cosine similarity calculations.

Ollama API: Interface for large language model interactions.
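
To illustrate the cosine similarity ranking mentioned above, here is a dependency-free sketch. It is not the project's implementation: `embed` is a toy bag-of-words stand-in for real embeddings, and `top_chunks` is a hypothetical helper. In the application itself, scikit-learn's `cosine_similarity` would compute the same quantity over NumPy arrays.

```python
import math
from collections import Counter


def embed(text: str) -> Counter:
    """Toy stand-in embedding (assumption): a bag-of-words count vector."""
    return Counter(text.lower().split())


def cosine_similarity(a: Counter, b: Counter) -> float:
    """Cosine of the angle between two sparse vectors; 0.0 if either is empty."""
    dot = sum(a[term] * b[term] for term in a.keys() & b.keys())
    norm_a = math.sqrt(sum(v * v for v in a.values()))
    norm_b = math.sqrt(sum(v * v for v in b.values()))
    if norm_a == 0 or norm_b == 0:
        return 0.0
    return dot / (norm_a * norm_b)


def top_chunks(question: str, chunks: list[str], k: int = 3) -> list[str]:
    """Return the k chunks most similar to the question, best first."""
    query_vector = embed(question)
    ranked = sorted(
        chunks,
        key=lambda chunk: cosine_similarity(query_vector, embed(chunk)),
        reverse=True,
    )
    return ranked[:k]
```

Word counts only match exact terms, so this toy version misses paraphrases. Learned embeddings are what make the search semantic, while the ranking logic stays the same.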

Contributing

Contributions are welcome! Please fork the repository, make your changes, and submit a pull request. Ensure your code adheres to the existing style and includes appropriate tests.

License

This project is licensed under the MIT License; see the LICENSE file for details.
