This Streamlit application provides a pedagogical chatbot built on retrieval-augmented generation (RAG). The chatbot uses text extracted from PDF documents and images uploaded by the user to answer questions grounded in that content. It is intended for education and for quick information retrieval from a variety of documents.
- Upload and process multiple PDF files and images.
- Extract text from PDFs and images.
- Generate contextual responses based on the extracted content.
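As a rough illustration of the retrieval step behind the last feature, the sketch below splits extracted text into chunks and ranks them by word overlap with the question. It is not the application's actual implementation: `chunk_text`, `tokenize`, `score`, and `retrieve` are hypothetical names, and plain word overlap stands in for whatever retrieval method the application really uses.

```python
import re


def chunk_text(text: str, max_words: int = 50) -> list[str]:
    """Split text into consecutive chunks of at most max_words words."""
    words = text.split()
    return [
        " ".join(words[i:i + max_words])
        for i in range(0, len(words), max_words)
    ]


def tokenize(text: str) -> set[str]:
    """Lowercase the text and return its set of alphanumeric tokens."""
    return set(re.findall(r"[a-z0-9]+", text.lower()))


def score(question: str, chunk: str) -> int:
    """Count distinct question words that also appear in the chunk."""
    return len(tokenize(question) & tokenize(chunk))


def retrieve(question: str, chunks: list[str], top_k: int = 2) -> list[str]:
    """Return up to top_k chunks with the highest non-zero overlap score."""
    ranked = sorted(chunks, key=lambda c: score(question, c), reverse=True)
    return [c for c in ranked[:top_k] if score(question, c) > 0]
```

In a RAG pipeline, the retrieved chunks would then be passed to the language model as context alongside the user's question.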
To install and run this application, follow these steps:
- Prerequisites:
  - Python 3.10 or newer.
  - Tesseract-OCR for text extraction from images.

- Installing Tesseract-OCR:

  On Ubuntu:

      sudo apt update
      sudo apt install tesseract-ocr

  On Windows, download and install Tesseract from this link, and make sure to add the path to the Tesseract executable to your PATH environment variable.

- Clone the repository or download the files:

      git clone [URL_OF_REPO]
      cd rag3

- Install Python dependencies:

      pip install -r requirements.txt

- Configure environment variables: Copy the .env.sample file to .env and adjust the necessary values:

      cp .env.sample .env

- Launch the application:

      streamlit run app.py
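If the application fails to start, a quick check of the prerequisites listed above can help narrow down the cause. The following is a minimal sketch, not part of the repository: `check_environment` is a hypothetical helper that verifies the Python version, looks for the `tesseract` executable on PATH, and confirms that a .env file exists in the project directory.

```python
import shutil
import sys
from pathlib import Path


def check_environment(project_dir: str = ".") -> list[str]:
    """Return human-readable problems; an empty list means all checks passed."""
    problems = []
    # The installation steps require Python 3.10 or newer.
    if sys.version_info < (3, 10):
        problems.append("Python 3.10 or newer is required.")
    # shutil.which returns None when the executable is not on PATH.
    if shutil.which("tesseract") is None:
        problems.append("The tesseract executable was not found on PATH.")
    # The configuration step copies .env.sample to .env.
    if not (Path(project_dir) / ".env").is_file():
        problems.append("No .env file found; copy .env.sample to .env.")
    return problems


if __name__ == "__main__":
    for problem in check_environment():
        print(f"WARNING: {problem}")
```

Running the script from the repository root prints one warning per missing prerequisite and nothing when every check passes.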
- Launch the application and navigate through the Streamlit interface.
- Use the file uploader to upload PDF documents and images.
- Pose your questions to the chatbot via the user interface to receive answers based on the content of the uploaded documents.
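Because PDFs and images need different text extraction paths (direct extraction versus OCR), uploaded files have to be sorted by type at some point. The sketch below shows one way this could look; `classify_upload` and `group_uploads` are hypothetical helpers, and the extension sets are assumptions rather than the list of formats the application actually accepts.

```python
from pathlib import Path

# Assumed extension sets; the application's real accepted types may differ.
PDF_EXTENSIONS = {".pdf"}
IMAGE_EXTENSIONS = {".png", ".jpg", ".jpeg"}


def classify_upload(filename: str) -> str:
    """Return "pdf", "image", or "unsupported" based on the file extension."""
    suffix = Path(filename).suffix.lower()
    if suffix in PDF_EXTENSIONS:
        return "pdf"
    if suffix in IMAGE_EXTENSIONS:
        return "image"
    return "unsupported"


def group_uploads(filenames: list[str]) -> dict[str, list[str]]:
    """Group file names by the kind of text extraction they would need."""
    groups: dict[str, list[str]] = {"pdf": [], "image": [], "unsupported": []}
    for name in filenames:
        groups[classify_upload(name)].append(name)
    return groups
```

Files in the "image" group would go through Tesseract-OCR, while files in the "pdf" group would go through a PDF text extractor.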
- app.py: The main application file; modify it to adjust the user interface or high-level logic.
- models.py: Contains the logic for file processing and response generation; modify it to adjust the processing or interaction algorithms.
For assistance or to report issues, open an issue in the application's GitHub repository or contact the developer by email.