Leptons1618/LocalRAG
LocalRAG - Local Retrieval-Augmented Generation Chat

A RAG (Retrieval-Augmented Generation) application that lets you chat with your documents using local LLMs served by Ollama.

Features

  • 💬 Interactive Chat Interface - Built with Streamlit for a smooth user experience
  • 📄 Document Upload - Support for PDF and TXT files
  • 🤖 Local LLM Support - Uses Ollama for complete privacy and offline operation
  • 🔍 Vector Search - ChromaDB for efficient document retrieval
  • ⚡ Streaming Responses - Real-time response generation
  • 🎨 Model Selection - Switch between different Ollama models on the fly
  • 📚 Persistent Storage - Documents are stored in a local vector database

Prerequisites

  • Python 3.10 or higher
  • Ollama installed and running locally
  • At least one Ollama model downloaded (e.g., llama3.2:latest)

Installing Ollama

  1. Download and install Ollama from https://ollama.ai/
  2. Pull a model: ollama pull llama3.2:latest
  3. Verify Ollama is running: ollama list

Installation

1. Clone or Download the Repository

cd LocalRAG

2. Install Dependencies

Option A: Using uv (Recommended - Faster)

uv pip install --system -r requirements.txt

Option B: Using pip

pip install -r requirements.txt

Configuration

The application can be configured by editing config.py or setting environment variables:

| Setting                | Default           | Description                          |
| ---------------------- | ----------------- | ------------------------------------ |
| `llm_model`            | `llama3.2:latest` | Ollama model to use                  |
| `text_embedding_model` | `nomic-embed-text`| Embedding model for vector search    |
| `chunk_size`           | `1512`            | Text chunk size for document splitting |
| `chunk_overlap`        | `256`             | Overlap between chunks               |
| `max_context_docs`     | `3`               | Number of documents to retrieve      |
| `temp_folder`          | `./_temp`         | Temporary file storage               |
| `chroma_path`          | `./chroma`        | Vector database storage path         |
| `log_level`            | `INFO`            | Logging level                        |
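As an illustrative sketch of how environment-variable overrides could layer on top of these defaults (the actual `config.py` may be structured differently, and the variable names like `LLM_MODEL` and `CHUNK_SIZE` are assumptions, not documented names):

```python
import os
from dataclasses import dataclass, field

@dataclass
class Settings:
    # Defaults mirror the table above; environment variables
    # (names assumed here for illustration) override them.
    llm_model: str = field(default_factory=lambda: os.getenv("LLM_MODEL", "llama3.2:latest"))
    text_embedding_model: str = field(default_factory=lambda: os.getenv("TEXT_EMBEDDING_MODEL", "nomic-embed-text"))
    chunk_size: int = field(default_factory=lambda: int(os.getenv("CHUNK_SIZE", "1512")))
    chunk_overlap: int = field(default_factory=lambda: int(os.getenv("CHUNK_OVERLAP", "256")))
    max_context_docs: int = field(default_factory=lambda: int(os.getenv("MAX_CONTEXT_DOCS", "3")))
    chroma_path: str = field(default_factory=lambda: os.getenv("CHROMA_PATH", "./chroma"))

settings = Settings()
```

Using `default_factory` means each `Settings()` instance re-reads the environment, so overrides set before startup take effect without code changes.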

Usage

Starting the Application

streamlit run app.py

The application will open in your default browser at http://localhost:8501

Using the Application

  1. Upload Documents (Optional)

    • Click the sidebar to expand it
    • Go to "Document Management"
    • Upload PDF or TXT files
    • Wait for processing confirmation
  2. Select Model (Optional)

    • In the sidebar under "Model Configuration"
    • Choose from available Ollama models
  3. Chat

    • Type your question in the chat input
    • Press Enter to send
    • View streaming responses in real-time
  4. Clear History

    • Click "🧹 Clear Chat History" in the sidebar
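The streaming in step 3 works by rendering the answer incrementally as tokens arrive instead of waiting for the full response. A minimal, dependency-free sketch of the idea (`fake_llm_stream` is a stand-in for the real Ollama client, not the project's code):

```python
def fake_llm_stream(prompt):
    # Stand-in for a streaming LLM client: yields one token at a time.
    for token in ["Local", "RAG ", "answers ", "incrementally."]:
        yield token

def stream_response(prompt):
    # Accumulate tokens so the UI can re-render the partial answer.
    # Streamlit can consume a generator like this with st.write_stream.
    answer = ""
    for token in fake_llm_stream(prompt):
        answer += token
        yield answer  # each yield is the answer-so-far
```

Each intermediate yield lets the chat window update in real time; the last yielded value is the complete answer.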

Project Structure

LocalRAG/
├── app.py                  # Main Streamlit application
├── query.py               # Query handling and RAG logic
├── embed.py               # Document embedding and processing
├── get_vector_db.py       # Vector database management
├── config.py              # Configuration settings
├── logger_config.py       # Logging configuration
├── requirements.txt       # Python dependencies
├── README.md             # This file
├── chroma/               # Vector database storage (created on first run)
└── _temp/                # Temporary file storage (created on first run)

How It Works

  1. Document Processing: Uploaded documents are split into chunks and embedded using Ollama's embedding model
  2. Vector Storage: Embeddings are stored in ChromaDB for efficient retrieval
  3. Query Processing: User queries are embedded and used to retrieve relevant document chunks
  4. Response Generation: Retrieved context is sent to the LLM along with the query to generate accurate responses
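Conceptually, steps 1–4 reduce to: embed the stored chunks, embed the query, rank chunks by similarity, and prompt the LLM with the top matches. A dependency-free sketch of that flow (the toy `embed()` is a bag-of-words stand-in for Ollama's `nomic-embed-text`, and `retrieve`/`build_prompt` are illustrative names, not the project's functions):

```python
import math
from collections import Counter

def embed(text):
    # Toy stand-in for a real embedding model: lowercase bag-of-words.
    return Counter(text.lower().split())

def cosine(a, b):
    # Cosine similarity between two sparse word-count vectors.
    dot = sum(a[w] * b[w] for w in a if w in b)
    na = math.sqrt(sum(v * v for v in a.values()))
    nb = math.sqrt(sum(v * v for v in b.values()))
    return dot / (na * nb) if na and nb else 0.0

def retrieve(query, chunks, k=3):
    # Step 3: rank stored chunks by similarity to the query embedding.
    qv = embed(query)
    ranked = sorted(chunks, key=lambda c: cosine(qv, embed(c)), reverse=True)
    return ranked[:k]

def build_prompt(query, context_chunks):
    # Step 4: retrieved context plus the question goes to the LLM.
    context = "\n".join(context_chunks)
    return f"Answer using this context:\n{context}\n\nQuestion: {query}"

chunks = [
    "Ollama runs large language models locally.",
    "ChromaDB stores embeddings for fast retrieval.",
    "Streamlit renders the chat interface.",
]
top = retrieve("Where are embeddings stored?", chunks, k=1)
```

In the real application, ChromaDB performs the similarity search over persisted embeddings, so the ranking step scales well beyond a handful of in-memory chunks.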

Dependencies

Core dependencies:

  • streamlit - Web interface
  • langchain - LLM framework
  • langchain-ollama - Ollama integration
  • langchain-chroma - ChromaDB integration
  • langchain-community - Document loaders
  • chromadb - Vector database
  • pypdf - PDF processing

See requirements.txt for the complete list.

Troubleshooting

Ollama Connection Error

Problem: "Could not connect to Ollama API"

Solution:

  • Ensure Ollama is running: ollama serve
  • Check if Ollama is accessible: curl http://localhost:11434/api/tags

Import Errors

Problem: ModuleNotFoundError

Solution: Reinstall dependencies

pip install --upgrade -r requirements.txt

No Documents Found

Problem: RAG not working even after uploading files

Solution:

  • Check chroma/ directory exists
  • Verify embedding model is downloaded: ollama pull nomic-embed-text
  • Check logs for errors

Slow Response Times

Problem: Responses take too long

Solution:

  • Use a smaller/faster model
  • Reduce max_context_docs in config
  • Ensure Ollama has adequate resources

Development

Running a Smoke Test

Verify that the core modules import correctly:

python -c "from config import settings; from query import get_query_handler; print('✅ All modules working')"

Logging

Logs are output to the console. Adjust log_level in config.py for more/less detail:

  • DEBUG - Detailed information
  • INFO - General information (default)
  • WARNING - Warning messages only
  • ERROR - Error messages only

Performance Tips

  1. Model Selection: Smaller models (e.g., llama3.2:3b) are faster than larger ones
  2. Document Size: Break large documents into smaller files for faster processing
  3. Chunk Size: Adjust chunk_size based on your documents (larger for books, smaller for articles)
  4. GPU Acceleration: Ollama automatically uses GPU if available
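To see how chunk_size and chunk_overlap (tip 3) interact, here is an illustrative character-based splitter. It is a simplified sketch, not the project's embed.py; the real pipeline uses a LangChain text splitter that also respects separators like paragraphs and sentences:

```python
def split_text(text, chunk_size=1512, chunk_overlap=256):
    # Slide a window of chunk_size characters forward by
    # (chunk_size - chunk_overlap), so neighboring chunks
    # share chunk_overlap characters of context.
    step = chunk_size - chunk_overlap
    chunks = []
    for start in range(0, len(text), step):
        chunks.append(text[start:start + chunk_size])
        if start + chunk_size >= len(text):
            break
    return chunks

chunks = split_text("a" * 4000, chunk_size=1512, chunk_overlap=256)
```

Larger chunks mean fewer, broader retrieval units (better for long-form books); smaller chunks give finer-grained matches (better for short articles), at the cost of more embeddings to store.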

License

This project is open source and available for personal and commercial use.

Contributing

Contributions are welcome! Please feel free to submit issues or pull requests.

Support

For issues, questions, or suggestions, please open an issue on the repository.


Happy chatting with your documents! 🚀
