
OCP RAG Stack

Fully Self-Hosted RAG System for OpenShift Container Platform. CPU-only • Zero External Dependencies


A complete Retrieval-Augmented Generation (RAG) stack designed to run entirely on OpenShift Container Platform without GPU requirements or external API dependencies. Built as a proof-of-value pilot for platform engineers to query internal documentation using natural language.

Architecture Diagram

  • 🚀 100% CPU-Only: Runs on standard OpenShift nodes without GPU requirements
  • 🔒 Fully Self-Hosted: No external API calls, complete data sovereignty
  • 📚 Smart Chunking: Markdown-aware chunking preserves document structure
  • ⚡ Real-Time Streaming: Server-Sent Events for responsive chat experience

πŸ—οΈ Architecture

Architecture Diagram

  1. User uploads documents via Frontend
  2. Ingestion API chunks and embeds documents
  3. Embeddings stored in ChromaDB
  4. User asks questions via Chat UI
  5. Chat API retrieves relevant chunks
  6. Ollama generates contextual responses
  7. Streaming response displayed in real-time

🚀 Quick Start

Resource Requirements

Prerequisites

  • OpenShift 4.14+ cluster with cluster-admin access
  • oc CLI configured and authenticated
  • 16GB+ available memory across cluster
  • 50GB+ available storage

One-Command Deployment

# Clone repository
git clone https://github.com/devenes/ocp-rag-stack.git
cd ocp-rag-stack

# Deploy entire stack
make deploy

# Wait for all pods to be ready (5-10 minutes for model downloads)
oc get pods -n ocp-rag-stack -w

# Seed with example runbooks
make seed

# Get frontend URL
make demo

Manual Deployment

# Create namespace and RBAC
oc apply -f deploy/namespace.yaml
oc apply -f deploy/rbac/

# Deploy infrastructure (Ollama + ChromaDB)
oc apply -f deploy/ollama/
oc apply -f deploy/chromadb/

# Wait for Ollama models to download
oc wait --for=condition=complete job/ollama-init-models -n ocp-rag-stack --timeout=600s

# Build and deploy application services
make build-images
oc apply -f deploy/ingestion-api/
oc apply -f deploy/chat-api/

# Deploy frontend
oc apply -f deploy/frontend/

# Verify deployment
oc get pods -n ocp-rag-stack

📖 Usage

Access the UI

# Get frontend URL
FRONTEND_URL=$(oc get route frontend -n ocp-rag-stack -o jsonpath='{.spec.host}')
echo "Frontend: https://$FRONTEND_URL"

# Open in browser
open "https://$FRONTEND_URL"

Upload Documents

  1. Click the upload area or drag-and-drop files
  2. Supported formats: .txt, .md
  3. Documents are automatically chunked and indexed
  4. View indexed documents in the sidebar

Ask Questions

  1. Type your question in the chat input
  2. Press Enter or click Send
  3. Watch the AI response stream in real-time
  4. View source citations below each response

Example Questions

"How do I restart the payment service?"
"What should I do if a node shows NotReady status?"
"How do I check etcd cluster health?"
"What are the steps for incident response?"

πŸ› οΈ Development

Build Locally

# Build Go binaries
make build

# Run tests
make test

# Build container images
make build-images

# Push to OpenShift internal registry
make push-images

Run Locally (Development)

# Start ChromaDB (requires Docker)
docker run -d -p 8000:8000 chromadb/chroma:0.5.23

# Start Ollama (requires Ollama installed)
ollama serve &
ollama pull nomic-embed-text
ollama pull qwen2.5:1.5b

# Run ingestion API
VECTOR_STORE=chromahttp \
CHROMA_URL=http://localhost:8000 \
OLLAMA_URL=http://localhost:11434 \
go run cmd/ingestion/main.go

# Run chat API (in another terminal)
VECTOR_STORE=chromahttp \
CHROMA_URL=http://localhost:8000 \
OLLAMA_URL=http://localhost:11434 \
go run cmd/chat/main.go

# Open frontend/index.html in browser

🔧 Configuration

Environment Variables

Ingestion API:

PORT=8081                           # API port (default: 8081)
VECTOR_STORE=chromahttp             # Vector store backend (chromem|chromahttp)
CHROMA_URL=http://chromadb:8000     # ChromaDB URL
OLLAMA_URL=http://ollama:11434      # Ollama URL
EMBED_MODEL=nomic-embed-text        # Embedding model
CHUNKING_STRATEGY=markdown          # Chunking strategy (fixed|sentence|markdown)
CHUNK_SIZE=512                      # Chunk size in tokens
CHUNK_OVERLAP=50                    # Chunk overlap in tokens
LOG_LEVEL=info                      # Log level (debug|info|warn|error)

Chat API:

PORT=8082                           # API port (default: 8082)
VECTOR_STORE=chromahttp             # Vector store backend
CHROMA_URL=http://chromadb:8000     # ChromaDB URL
OLLAMA_URL=http://ollama:11434      # Ollama URL
EMBED_MODEL=nomic-embed-text        # Embedding model
CHAT_MODEL=qwen2.5:1.5b             # Chat model
TOP_K=5                             # Number of chunks to retrieve
LOG_LEVEL=info                      # Log level

Resource Requirements Example

| Component     | CPU Request | CPU Limit | Memory Request | Memory Limit | Storage |
|---------------|-------------|-----------|----------------|--------------|---------|
| Ollama        | 2 cores     | 4 cores   | 4Gi            | 6Gi          | 10Gi    |
| ChromaDB      | 500m        | 1 core    | 1Gi            | 2Gi          | 10Gi    |
| Ingestion API | 200m        | 500m      | 256Mi          | 512Mi        | -       |
| Chat API      | 200m        | 500m      | 256Mi          | 512Mi        | -       |
| Frontend      | 50m         | 200m      | 64Mi           | 128Mi        | -       |

Total Cluster Requirements:

  • CPU: ~3 cores (requests), ~6.5 cores (limits)
  • Memory: ~6Gi (requests), ~10Gi (limits)
  • Storage: ~20Gi persistent volumes

📊 Small Models

Embedding Model: nomic-embed-text

  • Size: 274MB
  • Dimensions: 768
  • Context: 8192 tokens
  • Performance: ~100 embeddings/sec on CPU
  • Use Case: Document and query embeddings

Chat Model: qwen2.5:1.5b

  • Size: 1.1GB
  • Parameters: 1.5 billion
  • Context: 32K tokens
  • Performance: 15-25 tokens/sec on CPU
  • Use Case: Conversational responses

Both models are optimized for CPU inference and automatically downloaded during deployment.

πŸ” Monitoring

Check Component Health

# All pods
oc get pods -n ocp-rag-stack

# Specific component logs
make logs-ollama
make logs-chromadb
make logs-ingestion
make logs-chat
make logs-frontend

# Port forward for debugging
make port-forward-ollama      # localhost:11434
make port-forward-chromadb    # localhost:8000
make port-forward-ingestion   # localhost:8081
make port-forward-chat        # localhost:8082

Health Endpoints

# Ingestion API
curl http://ingestion-api:8081/health

# Chat API
curl http://chat-api:8082/health

# Ollama
curl http://ollama:11434/api/tags

# ChromaDB
curl http://chromadb:8000/api/v1/heartbeat

πŸ› Troubleshooting

Pods Not Starting

# Check pod status
oc describe pod <pod-name> -n ocp-rag-stack

# Check events
oc get events -n ocp-rag-stack --sort-by='.lastTimestamp'

# Check logs
oc logs <pod-name> -n ocp-rag-stack

Model Download Timeout

# Check init job status
oc get job ollama-init-models -n ocp-rag-stack

# Check job logs
oc logs job/ollama-init-models -n ocp-rag-stack

# Manually trigger model download
oc exec -n ocp-rag-stack deployment/ollama -- ollama pull nomic-embed-text
oc exec -n ocp-rag-stack deployment/ollama -- ollama pull qwen2.5:1.5b

ChromaDB Connection Issues

# Check ChromaDB pod
oc get pod -n ocp-rag-stack -l app=chromadb

# Test connectivity from ingestion pod
oc exec -n ocp-rag-stack deployment/ingestion-api -- curl -v http://chromadb:8000/api/v1/heartbeat

# Check network policies
oc get networkpolicy -n ocp-rag-stack

Frontend Not Loading

# Check route
oc get route frontend -n ocp-rag-stack

# Check ConfigMap
oc get configmap frontend-html -n ocp-rag-stack

# Restart frontend
oc rollout restart deployment/frontend -n ocp-rag-stack

🧪 Testing

Run Unit Tests

# All tests
make test

# Specific package
go test ./internal/chunking/... -v

# With coverage
go test ./... -coverprofile=coverage.out
go tool cover -html=coverage.out

Integration Testing

# Deploy to test namespace
oc new-project ocp-rag-stack-test
make deploy NAMESPACE=ocp-rag-stack-test

# Run integration tests
make test-integration

# Cleanup
oc delete project ocp-rag-stack-test

📚 API Reference

Ingestion API

POST /api/v1/ingest/text

curl -X POST http://ingestion-api:8081/api/v1/ingest/text \
  -F "file=@runbook.md"

Response:
{
  "message": "Document ingested successfully",
  "chunks_created": 42,
  "document_id": "runbook.md"
}

GET /health

curl http://ingestion-api:8081/health

Response:
{
  "status": "healthy",
  "vector_store": "connected",
  "ollama": "connected"
}

Chat API

POST /api/v1/chat/stream (SSE)

curl -X POST http://chat-api:8082/api/v1/chat/stream \
  -H "Content-Type: application/json" \
  -d '{"message": "How do I restart a pod?", "stream": true}'

Response (SSE):
data: {"content": "To restart"}
data: {"content": " a pod"}
data: {"content": ", use the"}
data: {"sources": [...]}
data: [DONE]

POST /api/v1/chat (Synchronous)

curl -X POST http://chat-api:8082/api/v1/chat \
  -H "Content-Type: application/json" \
  -d '{"message": "How do I restart a pod?", "stream": false}'

Response:
{
  "response": "To restart a pod, use the oc delete pod command...",
  "sources": [
    {
      "content": "...",
      "metadata": {"source": "runbook.md"},
      "similarity": 0.89
    }
  ]
}

📄 License

This project is licensed under the Apache License 2.0 - see the LICENSE file for details.

πŸ™ Acknowledgments


Built with ❤️ for Platform Engineers
