A powerful RAG (Retrieval-Augmented Generation) chat application that allows you to chat with your documents using local LLMs through Ollama.
- 💬 Interactive Chat Interface - Built with Streamlit for a smooth user experience
- 📄 Document Upload - Support for PDF and TXT files
- 🤖 Local LLM Support - Uses Ollama for complete privacy and offline operation
- 🔍 Vector Search - ChromaDB for efficient document retrieval
- ⚡ Streaming Responses - Real-time response generation
- 🎨 Model Selection - Switch between different Ollama models on the fly
- 📚 Persistent Storage - Documents are stored in a local vector database
- Python 3.10 or higher
- Ollama installed and running locally
- At least one Ollama model downloaded (e.g., `llama3.2:latest`)
- Download and install Ollama from https://ollama.ai/
- Pull a model: `ollama pull llama3.2:latest`
- Verify Ollama is running: `ollama list`
```bash
cd LocalRAG
```

Option A: Using uv (recommended, faster):

```bash
uv pip install --system -r requirements.txt
```

Option B: Using pip:

```bash
pip install -r requirements.txt
```

The application can be configured by editing `config.py` or setting environment variables:
| Setting | Default | Description |
|---|---|---|
| `llm_model` | `llama3.2:latest` | Ollama model to use |
| `text_embedding_model` | `nomic-embed-text` | Embedding model for vector search |
| `chunk_size` | `1512` | Text chunk size for document splitting |
| `chunk_overlap` | `256` | Overlap between chunks |
| `max_context_docs` | `3` | Number of documents to retrieve |
| `temp_folder` | `./_temp` | Temporary file storage |
| `chroma_path` | `./chroma` | Vector database storage path |
| `log_level` | `INFO` | Logging level |
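The settings in the table map naturally onto a single settings object. As a minimal sketch of what `config.py` could look like (the real file may use a different mechanism, e.g. pydantic-settings; names here mirror the table, everything else is an assumption):

```python
# Hypothetical sketch of a settings object matching the table above.
# Each field falls back to its documented default when the
# corresponding environment variable is unset.
import os
from dataclasses import dataclass


@dataclass
class Settings:
    llm_model: str = os.getenv("LLM_MODEL", "llama3.2:latest")
    text_embedding_model: str = os.getenv("TEXT_EMBEDDING_MODEL", "nomic-embed-text")
    chunk_size: int = int(os.getenv("CHUNK_SIZE", "1512"))
    chunk_overlap: int = int(os.getenv("CHUNK_OVERLAP", "256"))
    max_context_docs: int = int(os.getenv("MAX_CONTEXT_DOCS", "3"))
    temp_folder: str = os.getenv("TEMP_FOLDER", "./_temp")
    chroma_path: str = os.getenv("CHROMA_PATH", "./chroma")
    log_level: str = os.getenv("LOG_LEVEL", "INFO")


settings = Settings()
```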
```bash
streamlit run app.py
```

The application will open in your default browser at http://localhost:8501.
1. Upload Documents (Optional)
   - Click the sidebar to expand it
   - Go to "Document Management"
   - Upload PDF or TXT files
   - Wait for processing confirmation
2. Select Model (Optional)
   - In the sidebar under "Model Configuration"
   - Choose from available Ollama models
3. Chat
   - Type your question in the chat input
   - Press Enter to send
   - View streaming responses in real-time
4. Clear History
   - Click "🧹 Clear Chat History" in the sidebar
```
LocalRAG/
├── app.py              # Main Streamlit application
├── query.py            # Query handling and RAG logic
├── embed.py            # Document embedding and processing
├── get_vector_db.py    # Vector database management
├── config.py           # Configuration settings
├── logger_config.py    # Logging configuration
├── requirements.txt    # Python dependencies
├── README.md           # This file
├── chroma/             # Vector database storage (created on first run)
└── _temp/              # Temporary file storage (created on first run)
```
- Document Processing: Uploaded documents are split into chunks and embedded using Ollama's embedding model
- Vector Storage: Embeddings are stored in ChromaDB for efficient retrieval
- Query Processing: User queries are embedded and used to retrieve relevant document chunks
- Response Generation: Retrieved context is sent to the LLM along with the query to generate accurate responses
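The retrieval step in the middle of this pipeline can be sketched in a few lines. In the real app, `embed()` is Ollama's `nomic-embed-text` model and the vector store is ChromaDB; the stand-in below uses a character-frequency embedding so the sketch is self-contained:

```python
# Toy sketch of vector retrieval: rank stored chunks by cosine
# similarity to the query embedding, keep the top k. embed() is a
# stand-in for a real embedding model.
import math


def embed(text):
    # Stand-in embedding: letter-frequency vector over a-z.
    vec = [0.0] * 26
    for ch in text.lower():
        if "a" <= ch <= "z":
            vec[ord(ch) - ord("a")] += 1.0
    return vec


def cosine(a, b):
    dot = sum(x * y for x, y in zip(a, b))
    na = math.sqrt(sum(x * x for x in a))
    nb = math.sqrt(sum(x * x for x in b))
    return dot / (na * nb) if na and nb else 0.0


def retrieve(query, chunks, k=3):
    # Embed the query once, then sort chunks by similarity.
    q = embed(query)
    ranked = sorted(chunks, key=lambda c: cosine(q, embed(c)), reverse=True)
    return ranked[:k]


chunks = [
    "ollama runs local models",
    "chromadb stores embeddings",
    "streamlit renders the chat ui",
]
top = retrieve("where are embeddings stored", chunks, k=1)
```

The top-k chunks returned here are what gets pasted into the LLM prompt as context in the final step.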
Core dependencies:
- `streamlit` - Web interface
- `langchain` - LLM framework
- `langchain-ollama` - Ollama integration
- `langchain-chroma` - ChromaDB integration
- `langchain-community` - Document loaders
- `chromadb` - Vector database
- `pypdf` - PDF processing
See requirements.txt for the complete list.
Problem: "Could not connect to Ollama API"

Solution:
- Ensure Ollama is running: `ollama serve`
- Check that Ollama is accessible: `curl http://localhost:11434/api/tags`
Problem: `ModuleNotFoundError`

Solution: Reinstall dependencies:

```bash
pip install --upgrade -r requirements.txt
```

Problem: RAG not working even after uploading files

Solution:
- Check that the `chroma/` directory exists
- Verify the embedding model is downloaded: `ollama pull nomic-embed-text`
- Check logs for errors
Problem: Responses take too long

Solution:
- Use a smaller/faster model
- Reduce `max_context_docs` in config
- Ensure Ollama has adequate resources
To verify that all modules load correctly, run:

```bash
python -c "from config import settings; from query import get_query_handler; print('✅ All modules working')"
```

Logs are output to the console. Adjust `log_level` in `config.py` for more or less detail:

- `DEBUG` - Detailed information
- `INFO` - General information (default)
- `WARNING` - Warning messages only
- `ERROR` - Error messages only
- Model Selection: Smaller models (e.g., `llama3.2:3b`) are faster than larger ones
- Document Size: Break large documents into smaller files for faster processing
- Chunk Size: Adjust `chunk_size` based on your documents (larger for books, smaller for articles)
- GPU Acceleration: Ollama automatically uses GPU if available
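To see how `chunk_size` and `chunk_overlap` interact, here is a toy sliding-window splitter (illustrative only; the app's `embed.py` may use a smarter splitter that respects sentence boundaries):

```python
# Minimal sliding-window text splitter: consecutive chunks share
# chunk_overlap characters, so context is not lost at chunk edges.
def split_text(text, chunk_size=1512, chunk_overlap=256):
    step = chunk_size - chunk_overlap  # how far the window advances
    return [
        text[i : i + chunk_size]
        for i in range(0, max(len(text) - chunk_overlap, 1), step)
    ]


# A 4000-character document with the default settings yields three
# chunks; each neighboring pair shares exactly 256 characters.
doc = "".join(chr(97 + i % 26) for i in range(4000))
pieces = split_text(doc)
```

A larger `chunk_size` gives the LLM more context per retrieved chunk but makes retrieval coarser; the overlap keeps sentences that straddle a boundary recoverable from at least one chunk.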
This project is open source and available for personal and commercial use.
Contributions are welcome! Please feel free to submit issues or pull requests.
- Ollama - Local LLM inference
- LangChain - LLM framework
- ChromaDB - Vector database
- Streamlit - Web interface
For issues, questions, or suggestions, please open an issue on the repository.
Happy chatting with your documents! 🚀