SmartRAG - Intelligent Multimodal RAG System

A production-ready retrieval-augmented generation (RAG) system for conversing with documents, images, and audio files. It runs on local-first AI models, so content stays on your machine and the system works offline.

Quick Start

# Standard deployment (run from the docker/ directory of the cloned repository)
docker-compose up -d

# Access at http://localhost:8501

Core Features

Multimodal Processing

Documents: PDF, DOCX, TXT, MD with intelligent chunking
Images: OCR + visual understanding via BLIP
Audio: Automatic transcription with Whisper
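
The chunking strategy is not spelled out here. As a minimal sketch, a fixed-size sliding window driven by the chunk_size and chunk_overlap values from config.yaml could look like this. chunk_text is a hypothetical helper, not part of SmartRAG's API, and it counts characters, whereas the real processor may split on tokens or sentence boundaries.

```python
def chunk_text(text: str, chunk_size: int = 1000, chunk_overlap: int = 200) -> list[str]:
    """Split text into character windows of chunk_size that overlap by chunk_overlap."""
    if chunk_size <= 0:
        raise ValueError("chunk_size must be positive")
    if not 0 <= chunk_overlap < chunk_size:
        raise ValueError("chunk_overlap must be in [0, chunk_size)")

    step = chunk_size - chunk_overlap  # distance between consecutive window starts
    chunks = []
    for start in range(0, len(text), step):
        chunks.append(text[start:start + chunk_size])
        if start + chunk_size >= len(text):
            break  # this window already reached the end of the text
    return chunks
```

The overlap means a sentence that straddles a boundary still appears whole in at least one chunk, at the cost of storing some text twice.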

Local AI Stack

Ollama (Llama 3.1 8B) for generation
Nomic Embed Text (768-dim) for embeddings
ChromaDB for vector storage
Complete offline operation
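
To illustrate what the embedding and vector-storage pieces do together, here is a rough, dependency-free sketch of nearest-neighbour retrieval. It is not ChromaDB's API: cosine_similarity and top_k are hypothetical helpers, the in-memory dict stands in for the vector store, and cosine similarity is only one common metric (the metric SmartRAG configures is not stated here).

```python
import math


def cosine_similarity(a: list[float], b: list[float]) -> float:
    """Cosine of the angle between two equal-length vectors (0.0 if either is all zeros)."""
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(y * y for y in b))
    if norm_a == 0 or norm_b == 0:
        return 0.0
    return dot / (norm_a * norm_b)


def top_k(query: list[float], store: dict[str, list[float]], k: int = 3) -> list[str]:
    """Return the ids of the k stored vectors most similar to the query."""
    scored = [(cosine_similarity(query, vec), doc_id) for doc_id, vec in store.items()]
    scored.sort(reverse=True)  # highest similarity first
    return [doc_id for _, doc_id in scored[:k]]
```

In the real system the vectors would be 768-dimensional Nomic embeddings and the search would be delegated to the vector store's index rather than a linear scan.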

Production Ready

Docker deployment with multi-stage builds
Non-root user execution
Health checks and auto-healing
Resource management and monitoring
Security hardening included
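
A health check for the web UI can be as simple as polling an HTTP endpoint until it answers. The sketch below assumes the app is reachable on port 8501 (as in Quick Start) and uses Streamlit's built-in /_stcore/health path, which exists in recent Streamlit releases; whether SmartRAG's container health check uses this exact endpoint is an assumption. http_probe and wait_until_healthy are hypothetical names.

```python
import time
import urllib.request

# Assumed URL: the port comes from Quick Start, the path is Streamlit's health endpoint.
HEALTH_URL = "http://localhost:8501/_stcore/health"


def http_probe(url: str = HEALTH_URL, timeout: float = 2.0) -> bool:
    """Return True if the endpoint answers with HTTP 200, False on any network error."""
    try:
        with urllib.request.urlopen(url, timeout=timeout) as resp:
            return resp.status == 200
    except OSError:  # URLError, HTTPError, and socket timeouts are all OSError subclasses
        return False


def wait_until_healthy(probe=http_probe, attempts: int = 30, delay: float = 3.0,
                       sleep=time.sleep) -> bool:
    """Call probe up to `attempts` times, sleeping `delay` seconds between failures."""
    for attempt in range(attempts):
        if probe():
            return True
        if attempt < attempts - 1:
            sleep(delay)
    return False
```

With the defaults this waits roughly 90 seconds in total, which matches the startup time listed under Performance.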

Technology Stack

Component	Technology
LLM	Llama 3.1 8B via Ollama
Embeddings	Nomic Embed Text (768-dim)
Vector DB	ChromaDB / FAISS
Vision	BLIP + CLIP + Tesseract OCR
Audio	OpenAI Whisper (base)
UI	Streamlit
Storage	SQLite3

Architecture

Installation

Docker (Recommended)

git clone https://github.com/itanishqshelar/SmartRAG.git
cd SmartRAG/docker

# Development
docker-compose up -d

# Production with full stack (PostgreSQL, Redis, Nginx)
docker-compose -f docker-compose.prod.yml up -d

Local Setup

# Install dependencies
pip install -r requirements.txt

# Pull the models (Ollama must already be installed and running)
ollama pull llama3.1:8b
ollama pull nomic-embed-text

# Install system dependencies
# macOS: brew install tesseract ffmpeg
# Ubuntu: sudo apt-get install tesseract-ocr ffmpeg
# Windows: download the Tesseract and FFmpeg installers from their release pages

# Run application
streamlit run chatbot_app.py

Configuration

SmartRAG uses a single config.yaml with Pydantic validation:

models:
  llm_model: "llama3.1:8b"
  embedding_model: "nomic-embed-text"
  vision_model: "Salesforce/blip-image-captioning-base"
  whisper_model: "base"

vector_store:
  type: "chromadb"
  embedding_dimension: 768

processing:
  chunk_size: 1000
  chunk_overlap: 200
  ocr_enabled: true

generation:
  temperature: 0.7
  max_tokens: 2000
  context_window: 4096
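
To show the kind of checks schema validation can catch, here is a hedged sketch for two of the sections above. It deliberately uses standard-library dataclasses instead of Pydantic so that it runs without extra dependencies. The field names mirror the YAML, but the class names, load_sections, and the specific constraints (for example, the temperature range) are assumptions, not the project's actual schema.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class ProcessingConfig:
    chunk_size: int = 1000
    chunk_overlap: int = 200
    ocr_enabled: bool = True

    def __post_init__(self):
        if self.chunk_size <= 0:
            raise ValueError("chunk_size must be positive")
        if not 0 <= self.chunk_overlap < self.chunk_size:
            raise ValueError("chunk_overlap must be smaller than chunk_size")


@dataclass(frozen=True)
class GenerationConfig:
    temperature: float = 0.7
    max_tokens: int = 2000
    context_window: int = 4096

    def __post_init__(self):
        # Assumed bounds; the real schema may allow a different range.
        if not 0.0 <= self.temperature <= 2.0:
            raise ValueError("temperature must be between 0.0 and 2.0")
        if not 0 < self.max_tokens <= self.context_window:
            raise ValueError("max_tokens must fit inside context_window")


def load_sections(raw: dict) -> tuple[ProcessingConfig, GenerationConfig]:
    """Build validated sections from an already-parsed config dict."""
    return (
        ProcessingConfig(**raw.get("processing", {})),
        GenerationConfig(**raw.get("generation", {})),
    )
```

Failing at load time with a clear message is usually preferable to discovering a bad chunk_overlap halfway through ingestion.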

Override via environment variables:

export SMARTRAG_LLM_MODEL=llama2:7b
export SMARTRAG_TEMPERATURE=0.5
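
How the two variables above map onto config.yaml keys is not documented here. One plausible sketch, assuming SMARTRAG_LLM_MODEL targets models.llm_model and SMARTRAG_TEMPERATURE targets generation.temperature, follows. ENV_OVERRIDES and apply_env_overrides are hypothetical names, and the mapping itself is an assumption.

```python
import copy
import os

# Hypothetical mapping: variable name -> (section, key, type conversion).
ENV_OVERRIDES = {
    "SMARTRAG_LLM_MODEL": ("models", "llm_model", str),
    "SMARTRAG_TEMPERATURE": ("generation", "temperature", float),
}


def apply_env_overrides(config: dict, environ=None) -> dict:
    """Return a copy of config with any recognised environment overrides applied."""
    environ = os.environ if environ is None else environ
    merged = copy.deepcopy(config)  # leave the caller's dict untouched
    for name, (section, key, cast) in ENV_OVERRIDES.items():
        if name in environ:
            merged.setdefault(section, {})[key] = cast(environ[name])
    return merged
```

Environment values always arrive as strings, so the explicit cast is what turns "0.5" into the float the generation settings expect.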

Usage

Web Interface

Upload files via drag-and-drop
Ask questions about your content
View source documents inline
Manage chat history and files

Python API

from multimodal_rag.system import MultimodalRAGSystem

system = MultimodalRAGSystem()

# Ingest content
system.ingest_file("document.pdf")
system.ingest_file("screenshot.png")
system.ingest_file("recording.mp3")

# Query with context
response = system.query("Summarize the key points")
print(response.answer)

Batch Processing

# Process directories
results = system.ingest_directory("./docs/", recursive=True)
print(f"Processed {len(results)} files")
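
For intuition about what a directory ingest has to do before any model runs, the sketch below walks a folder and pairs each supported file with a processor kind. It is not the actual ingest_directory implementation: collect_files and PROCESSOR_BY_SUFFIX are hypothetical, and the suffix table only covers the formats this README mentions (the real processors may accept more).

```python
from pathlib import Path

# Suffixes taken from the formats named in this README; the real list may be longer.
PROCESSOR_BY_SUFFIX = {
    ".pdf": "document",
    ".docx": "document",
    ".txt": "document",
    ".md": "document",
    ".png": "image",
    ".mp3": "audio",
}


def collect_files(root, recursive: bool = True) -> list[tuple[Path, str]]:
    """List (path, processor kind) pairs for supported files under root."""
    base = Path(root)
    candidates = base.rglob("*") if recursive else base.glob("*")
    selected = []
    for path in sorted(candidates):
        kind = PROCESSOR_BY_SUFFIX.get(path.suffix.lower())  # case-insensitive match
        if kind is not None and path.is_file():
            selected.append((path, kind))
    return selected
```

Unsupported files are skipped silently in this sketch; a production version would more likely report them so that users know what was left out.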

Project Structure

smartrag/
├── chatbot_app.py              # Streamlit application
├── config.yaml                 # Configuration
├── requirements.txt            # Dependencies
├── multimodal_rag/
│   ├── system.py              # RAG orchestrator
│   ├── processors/            # File type handlers
│   │   ├── document_processor.py
│   │   ├── image_processor.py
│   │   └── audio_processor.py
│   └── vector_stores/         # DB implementations
│       ├── chroma_store.py
│       └── faiss_store.py
├── docker/                    # Production deployment
│   ├── Dockerfile
│   ├── docker-compose.yml
│   └── docker-compose.prod.yml
└── tests/                     # Test suite

Deployment Options

Standard - All-in-one container with Ollama

docker-compose up -d

Lightweight - External Ollama on host

docker-compose -f docker-compose.lite.yml up -d

Production - Full stack with PostgreSQL, Redis, Nginx

docker-compose -f docker-compose.prod.yml up -d

Development

# Run tests
pytest tests/

# Code formatting
black multimodal_rag/ tests/

# Linting
flake8 multimodal_rag/ tests/

Performance

Docker image size: 4.2GB
Memory: 4-8GB recommended
CPU: 2-4 cores recommended
Startup time: ~90s (includes model downloads)
Query latency: <3s typical

Security

Local inference - no external API calls
Non-root container execution
File size limits enforced (50MB default)
No privilege escalation
Security headers in production setup
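
A size limit like the 50MB default above is typically checked before a file is handed to any processor. A minimal sketch, assuming the limit is expressed in binary megabytes and that oversized files are rejected with an exception (check_upload_size and MAX_UPLOAD_BYTES are hypothetical names):

```python
import os

# 50MB default from the list above; interpreting MB as 1024 * 1024 bytes is an assumption.
MAX_UPLOAD_BYTES = 50 * 1024 * 1024


def check_upload_size(path: str, max_bytes: int = MAX_UPLOAD_BYTES) -> int:
    """Return the file size in bytes, or raise ValueError if it exceeds max_bytes."""
    size = os.path.getsize(path)
    if size > max_bytes:
        raise ValueError(f"{path} is {size} bytes, over the {max_bytes}-byte limit")
    return size
```

Checking the size on disk before reading the file avoids loading an oversized upload into memory just to reject it.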

License

MIT License - see LICENSE file for details.

Acknowledgments

Built with ChromaDB, Ollama, Hugging Face Transformers, OpenAI Whisper, and Tesseract OCR.

SmartRAG - Local-first multimodal AI for document intelligence.
