Fully Self-Hosted RAG System for OpenShift Container Platform • CPU-only • Zero External Dependencies
A complete Retrieval-Augmented Generation (RAG) stack designed to run entirely on OpenShift Container Platform without GPU requirements or external API dependencies. Built as a proof-of-value pilot for platform engineers to query internal documentation using natural language.
- 100% CPU-Only: Runs on standard OpenShift nodes without GPU requirements
- Fully Self-Hosted: No external API calls, complete data sovereignty
- Smart Chunking: Markdown-aware chunking preserves document structure
- Real-Time Streaming: Server-Sent Events for a responsive chat experience
- User uploads documents via Frontend
- Ingestion API chunks and embeds documents
- Embeddings stored in ChromaDB
- User asks questions via Chat UI
- Chat API retrieves relevant chunks
- Ollama generates contextual responses
- Streaming response displayed in real-time
- OpenShift 4.14+ cluster with cluster-admin access
- oc CLI configured and authenticated
- 16GB+ available memory across the cluster
- 50GB+ available storage
# Clone repository
git clone https://github.com/devenes/ocp-rag-stack.git
cd ocp-rag-stack
# Deploy entire stack
make deploy
# Wait for all pods to be ready (5-10 minutes for model downloads)
oc get pods -n ocp-rag-stack -w
# Seed with example runbooks
make seed
# Get frontend URL
make demo

# Create namespace and RBAC
oc apply -f deploy/namespace.yaml
oc apply -f deploy/rbac/
# Deploy infrastructure (Ollama + ChromaDB)
oc apply -f deploy/ollama/
oc apply -f deploy/chromadb/
# Wait for Ollama models to download
oc wait --for=condition=complete job/ollama-init-models -n ocp-rag-stack --timeout=600s
# Build and deploy application services
make build-images
oc apply -f deploy/ingestion-api/
oc apply -f deploy/chat-api/
# Deploy frontend
oc apply -f deploy/frontend/
# Verify deployment
oc get pods -n ocp-rag-stack

# Get frontend URL
FRONTEND_URL=$(oc get route frontend -n ocp-rag-stack -o jsonpath='{.spec.host}')
echo "Frontend: https://$FRONTEND_URL"
# Open in browser
open "https://$FRONTEND_URL"

- Click the upload area or drag-and-drop files
- Supported formats: .txt, .md
- Documents are automatically chunked and indexed
- View indexed documents in the sidebar
- Type your question in the chat input
- Press Enter or click Send
- Watch the AI response stream in real-time
- View source citations below each response
"How do I restart the payment service?"
"What should I do if a node shows NotReady status?"
"How do I check etcd cluster health?"
"What are the steps for incident response?"
# Build Go binaries
make build
# Run tests
make test
# Build container images
make build-images
# Push to OpenShift internal registry
make push-images

# Start ChromaDB (requires Docker)
docker run -d -p 8000:8000 chromadb/chroma:0.5.23
# Start Ollama (requires Ollama installed)
ollama serve &
ollama pull nomic-embed-text
ollama pull qwen2.5:1.5b
# Run ingestion API
VECTOR_STORE=chromahttp \
CHROMA_URL=http://localhost:8000 \
OLLAMA_URL=http://localhost:11434 \
go run cmd/ingestion/main.go
# Run chat API (in another terminal)
VECTOR_STORE=chromahttp \
CHROMA_URL=http://localhost:8000 \
OLLAMA_URL=http://localhost:11434 \
go run cmd/chat/main.go
# Open frontend/index.html in a browser

Ingestion API:
PORT=8081 # API port (default: 8081)
VECTOR_STORE=chromahttp # Vector store backend (chromem|chromahttp)
CHROMA_URL=http://chromadb:8000 # ChromaDB URL
OLLAMA_URL=http://ollama:11434 # Ollama URL
EMBED_MODEL=nomic-embed-text # Embedding model
CHUNKING_STRATEGY=markdown # Chunking strategy (fixed|sentence|markdown)
CHUNK_SIZE=512 # Chunk size in tokens
CHUNK_OVERLAP=50 # Chunk overlap in tokens
LOG_LEVEL=info # Log level (debug|info|warn|error)

Chat API:
PORT=8082 # API port (default: 8082)
VECTOR_STORE=chromahttp # Vector store backend
CHROMA_URL=http://chromadb:8000 # ChromaDB URL
OLLAMA_URL=http://ollama:11434 # Ollama URL
EMBED_MODEL=nomic-embed-text # Embedding model
CHAT_MODEL=qwen2.5:1.5b # Chat model
TOP_K=5 # Number of chunks to retrieve
LOG_LEVEL=info # Log level

| Component | CPU Request | CPU Limit | Memory Request | Memory Limit | Storage |
|---|---|---|---|---|---|
| Ollama | 2 cores | 4 cores | 4Gi | 6Gi | 10Gi |
| ChromaDB | 500m | 1 core | 1Gi | 2Gi | 10Gi |
| Ingestion API | 200m | 500m | 256Mi | 512Mi | - |
| Chat API | 200m | 500m | 256Mi | 512Mi | - |
| Frontend | 50m | 200m | 64Mi | 128Mi | - |
Total Cluster Requirements:
- CPU: ~3 cores (requests), ~6.2 cores (limits)
- Memory: ~6Gi (requests), ~10Gi (limits)
- Storage: ~20Gi persistent volumes
Embedding model (nomic-embed-text):
- Size: 274MB
- Dimensions: 768
- Context: 8192 tokens
- Performance: ~100 embeddings/sec on CPU
- Use Case: Document and query embeddings
Chat model (qwen2.5:1.5b):
- Size: 1.1GB
- Parameters: 1.5 billion
- Context: 32K tokens
- Performance: 15-25 tokens/sec on CPU
- Use Case: Conversational responses
Both models are optimized for CPU inference and automatically downloaded during deployment.
# All pods
oc get pods -n ocp-rag-stack
# Specific component logs
make logs-ollama
make logs-chromadb
make logs-ingestion
make logs-chat
make logs-frontend
# Port forward for debugging
make port-forward-ollama # localhost:11434
make port-forward-chromadb # localhost:8000
make port-forward-ingestion # localhost:8081
make port-forward-chat # localhost:8082

# Ingestion API
curl http://ingestion-api:8081/health
# Chat API
curl http://chat-api:8082/health
# Ollama
curl http://ollama:11434/api/tags
# ChromaDB
curl http://chromadb:8000/api/v1/heartbeat

# Check pod status
oc describe pod <pod-name> -n ocp-rag-stack
# Check events
oc get events -n ocp-rag-stack --sort-by='.lastTimestamp'
# Check logs
oc logs <pod-name> -n ocp-rag-stack

# Check init job status
oc get job ollama-init-models -n ocp-rag-stack
# Check job logs
oc logs job/ollama-init-models -n ocp-rag-stack
# Manually trigger model download
oc exec -n ocp-rag-stack deployment/ollama -- ollama pull nomic-embed-text
oc exec -n ocp-rag-stack deployment/ollama -- ollama pull qwen2.5:1.5b

# Check ChromaDB pod
oc get pod -n ocp-rag-stack -l app=chromadb
# Test connectivity from ingestion pod
oc exec -n ocp-rag-stack deployment/ingestion-api -- curl -v http://chromadb:8000/api/v1/heartbeat
# Check network policies
oc get networkpolicy -n ocp-rag-stack

# Check route
oc get route frontend -n ocp-rag-stack
# Check ConfigMap
oc get configmap frontend-html -n ocp-rag-stack
# Restart frontend
oc rollout restart deployment/frontend -n ocp-rag-stack

# All tests
make test
# Specific package
go test ./internal/chunking/... -v
# With coverage
go test ./... -coverprofile=coverage.out
go tool cover -html=coverage.out

# Deploy to test namespace
oc new-project ocp-rag-stack-test
make deploy NAMESPACE=ocp-rag-stack-test
# Run integration tests
make test-integration
# Cleanup
oc delete project ocp-rag-stack-test

POST /api/v1/ingest/text
curl -X POST http://ingestion-api:8081/api/v1/ingest/text \
-F "file=@runbook.md"
Response:
{
"message": "Document ingested successfully",
"chunks_created": 42,
"document_id": "runbook.md"
}

GET /health
curl http://ingestion-api:8081/health
Response:
{
"status": "healthy",
"vector_store": "connected",
"ollama": "connected"
}

POST /api/v1/chat/stream (SSE)
curl -X POST http://chat-api:8082/api/v1/chat/stream \
-H "Content-Type: application/json" \
-d '{"message": "How do I restart a pod?", "stream": true}'
Response (SSE):
data: {"content": "To restart"}
data: {"content": " a pod"}
data: {"content": ", use the"}
data: {"sources": [...]}
data: [DONE]

POST /api/v1/chat (Synchronous)
curl -X POST http://chat-api:8082/api/v1/chat \
-H "Content-Type: application/json" \
-d '{"message": "How do I restart a pod?", "stream": false}'
Response:
{
"response": "To restart a pod, use the oc delete pod command...",
"sources": [
{
"content": "...",
"metadata": {"source": "runbook.md"},
"similarity": 0.89
}
]
}

This project is licensed under the Apache License 2.0 - see the LICENSE file for details.
- Ollama - Local LLM runtime
- ChromaDB - Vector database
- chromem-go - Pure Go vector store
- go-chi - Lightweight HTTP router
- OpenShift - Enterprise Kubernetes platform
Built with ❤️ for Platform Engineers