A Retrieval-Augmented Generation (RAG) knowledge-base backend service built with Go, PostgreSQL with the pgvector extension, and Ollama. It enables intelligent question answering over company FAQ documents using vector embeddings and LLM generation.
- 📝 Document Ingestion: Upload and automatically embed documents into vector database
- 🔍 Semantic Search: Fast vector similarity search using HNSW index
- 🤖 RAG Query: Intelligent question-answering with context retrieval
- 🚀 High Performance: Clean architecture with Go for low latency
- 🐳 Docker Ready: Full docker-compose setup with all dependencies
- Backend: Go 1.24+
- Database: PostgreSQL 16 with pgvector extension
- Vector Index: HNSW (Hierarchical Navigable Small World)
- Embedding Model: nomic-embed-text (768 dimensions)
- LLM Model: llama3.2
- AI Runtime: Ollama
- HTTP Router: gorilla/mux
Client Request
↓
POST /api/query
↓
1. Generate embedding from query (Ollama: nomic-embed-text)
2. Vector similarity search (Postgres pgvector HNSW, cosine similarity)
3. Build context from top-K similar documents
4. Generate answer with LLM (Ollama: llama3.2)
↓
Return JSON response with answer + source documents
- Docker and Docker Compose
- Go 1.24+ (for local development)
- curl and jq (for testing)
# Start all services (Postgres, Ollama, API)
docker-compose up -d
# Check logs
docker-compose logs -f

# Pull embedding model (nomic-embed-text)
docker exec -it arsys-ollama ollama pull nomic-embed-text
# Pull LLM model (llama3.2)
docker exec -it arsys-ollama ollama pull llama3.2
# Verify models are installed
docker exec -it arsys-ollama ollama list

# Check health endpoint
curl http://localhost:8080/api/health | jq
# Expected output:
# {
# "status": "healthy",
# "services": {
# "database": "healthy",
# "ollama": "healthy"
# },
# "timestamp": "2026-02-17T..."
# }

# Seed Indonesian FAQ data
bash scripts/seed_data.sh

# Ask a question
curl -X POST http://localhost:8080/api/query \
-H "Content-Type: application/json" \
-d '{
"query": "Jam kerja perusahaan apa?",
"top_k": 5
}' | jq
# Run full test suite
bash scripts/test_api.sh

Health check for all services.
Response:
{
"status": "healthy",
"services": {
"database": "healthy",
"ollama": "healthy"
},
"timestamp": "2026-02-17T10:30:00Z"
}

Ingest a single document into the vector database.
Request:
{
"content": "Jam kerja perusahaan adalah Senin-Jumat 09:00-17:00 WIB",
"source": "hr_policy",
"metadata": {
"category": "working_hours",
"language": "id"
}
}

Response:
{
"document_id": "123e4567-e89b-12d3-a456-426614174000",
"message": "Document ingested successfully"
}

Ingest multiple documents at once.
Request:
{
"documents": [
{
"content": "Document 1 content...",
"source": "hr_policy",
"metadata": {"category": "leave"}
},
{
"content": "Document 2 content...",
"source": "it_support",
"metadata": {"category": "vpn"}
}
]
}

Response:
{
"document_ids": ["uuid1", "uuid2"],
"count": 2,
"message": "Batch ingestion completed"
}

Main RAG endpoint - query the knowledge base.
Request:
{
"query": "Bagaimana cara mengajukan cuti?",
"top_k": 5
}

Response:
{
"answer": "Untuk mengajukan cuti, silakan login ke portal HR...",
"source_documents": [
{
"id": "uuid",
"content": "Untuk mengajukan cuti, silakan...",
"source": "hr_policy",
"similarity": 0.89,
"metadata": {"category": "leave"},
"created_at": "2026-02-17T..."
}
],
"processing_time_ms": 1234
}

Configuration is managed through environment variables. Copy .env.example to .env and adjust as needed.
# Database
DB_HOST=localhost
DB_PORT=55432
DB_USER=arsys
DB_PASSWORD=arsys123
DB_NAME=arsys
# Ollama
OLLAMA_BASE_URL=http://localhost:11434
EMBEDDING_MODEL=nomic-embed-text
LLM_MODEL=llama3.2
# RAG Settings
RAG_TOP_K=5 # Number of similar documents to retrieve
RAG_SIMILARITY_THRESHOLD=0.7 # Minimum similarity score (0-1)
RAG_MAX_CONTEXT_LENGTH=2000 # Max characters for LLM context
# Server
SERVER_PORT=8080
LOG_LEVEL=info

# Install dependencies
go mod download
# Run locally (without Docker)
go run ./cmd/api
# Build binary
make build
# Run binary
./bin/api

AI-Cad/
├── cmd/api/ # Application entry point
├── internal/
│ ├── api/ # HTTP layer
│ │ ├── handlers/ # Request handlers
│ │ ├── middleware/ # HTTP middleware
│ │ └── router.go # Route configuration
│ ├── client/ollama/ # Ollama API client
│ ├── config/ # Configuration management
│ ├── models/ # Data models
│ ├── repository/ # Database layer
│ └── service/ # Business logic (RAG flow)
├── pkg/logger/ # Logging utility
├── migrations/ # Database migrations
├── scripts/ # Helper scripts
├── docker-compose.yaml # Docker services
├── Dockerfile # API container build
└── Makefile # Build commands
make help # Show all available commands
make build # Build Go binary
make run # Run locally
make docker-up # Start all Docker services
make docker-down # Stop all Docker services
make docker-logs # Follow Docker logs
make models-pull # Pull Ollama models
make seed # Seed sample data
make test-api # Run API tests
make clean        # Clean build artifacts

CREATE TABLE documents (
id UUID PRIMARY KEY DEFAULT gen_random_uuid(),
content TEXT NOT NULL,
embedding vector(768), -- nomic-embed-text dimension
metadata JSONB DEFAULT '{}'::jsonb,
source VARCHAR(255),
created_at TIMESTAMP WITH TIME ZONE DEFAULT NOW(),
updated_at TIMESTAMP WITH TIME ZONE DEFAULT NOW()
);
-- HNSW index for fast vector similarity search
CREATE INDEX documents_embedding_idx ON documents
USING hnsw (embedding vector_cosine_ops)
WITH (m = 16, ef_construction = 64);

Index Parameters:
- m = 16: connections per layer (balances recall and memory)
- ef_construction = 64: build quality (higher = better recall, slower build)
- vector_cosine_ops: cosine similarity operator class (standard for text embeddings)
When you ingest a document:
- Content is sent to Ollama for embedding generation (nomic-embed-text)
- Document + embedding + metadata stored in PostgreSQL
- HNSW index automatically updated for fast search
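The ingestion steps above can be sketched in Go. This is a minimal sketch, not the project's actual implementation: `embed`, `vectorLiteral`, and `ingest` are hypothetical helper names, error handling is trimmed, and a `*sql.DB` handle with a registered Postgres driver is assumed. Ollama's `/api/embeddings` endpoint takes `{"model", "prompt"}` and returns `{"embedding": [...]}`; pgvector accepts a text literal like `[0.1,0.2,...]` cast to `vector`.

```go
package main

import (
	"bytes"
	"database/sql"
	"encoding/json"
	"fmt"
	"net/http"
	"strconv"
	"strings"
)

// embed calls Ollama's embeddings endpoint for the configured model.
func embed(baseURL, model, text string) ([]float64, error) {
	body, _ := json.Marshal(map[string]string{"model": model, "prompt": text})
	resp, err := http.Post(baseURL+"/api/embeddings", "application/json", bytes.NewReader(body))
	if err != nil {
		return nil, err
	}
	defer resp.Body.Close()
	var out struct {
		Embedding []float64 `json:"embedding"`
	}
	if err := json.NewDecoder(resp.Body).Decode(&out); err != nil {
		return nil, err
	}
	return out.Embedding, nil
}

// vectorLiteral renders a []float64 as a pgvector input literal: "[1,2,3]".
func vectorLiteral(v []float64) string {
	parts := make([]string, len(v))
	for i, x := range v {
		parts[i] = strconv.FormatFloat(x, 'f', -1, 64)
	}
	return "[" + strings.Join(parts, ",") + "]"
}

// ingest stores content plus its embedding; the HNSW index picks the row up automatically.
func ingest(db *sql.DB, baseURL, model, content, source string) error {
	emb, err := embed(baseURL, model, content)
	if err != nil {
		return err
	}
	_, err = db.Exec(
		`INSERT INTO documents (content, embedding, source) VALUES ($1, $2::vector, $3)`,
		content, vectorLiteral(emb), source)
	return err
}

func main() {
	fmt.Println(vectorLiteral([]float64{0.1, 0.25, -1}))
}
```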
When you query:
- Embedding: Query text → Ollama → 768-dimensional vector
- Search: Vector similarity search using HNSW index (cosine similarity)
- Retrieve: Top-K most similar documents above threshold
- Context: Build context string from retrieved documents
- Generate: Send context + query to LLM (llama3.2)
- Return: Answer + source documents + processing time
Using cosine similarity to find relevant documents:
- Score range: roughly 0 to 1 for text embeddings (1 = identical direction, 0 = unrelated); cosine similarity can be negative in general, but text embeddings rarely produce negative scores
- Default threshold: 0.7 (filters out low-relevance results)
- HNSW provides approximate nearest neighbor search (fast with good recall)
# Check service status
docker-compose ps
# View logs
docker-compose logs postgres
docker-compose logs ollama
docker-compose logs api
# Restart services
docker-compose restart

# Pull models manually
docker exec -it arsys-ollama ollama pull nomic-embed-text
docker exec -it arsys-ollama ollama pull llama3.2
# Check if models are loaded
docker exec -it arsys-ollama ollama list

# Check PostgreSQL is ready
docker exec -it arsys-postgres pg_isready -U arsys
# Verify pgvector extension
docker exec -it arsys-postgres psql -U arsys -d arsys -c "\dx"
# Check documents table
docker exec -it arsys-postgres psql -U arsys -d arsys -c "\d documents"

# Check API logs
docker-compose logs -f api
# Verify health endpoint
curl http://localhost:8080/api/health
# Check if port is in use
netstat -an | grep 8080

- First query is slow: Ollama loads models on first use (normal)
- All queries slow: Check Ollama container resources, consider GPU
- High similarity threshold: lower RAG_SIMILARITY_THRESHOLD to get more results
- Too many documents: the HNSW index handles millions of rows efficiently, but check RAM
- Adjust HNSW parameters for your data size:
  - Small dataset (<10K docs): m=8, ef_construction=32
  - Medium (10K-100K): m=16, ef_construction=64 (default)
  - Large (>100K): m=32, ef_construction=128
- Tune RAG settings:
  - RAG_TOP_K: higher = more context but slower
  - RAG_SIMILARITY_THRESHOLD: lower = more results
  - RAG_MAX_CONTEXT_LENGTH: balance between context and speed
- Ollama optimization:
  - Use GPU for faster inference
  - Keep models loaded (the first query is slower while a model loads)
  - Adjust timeouts for large prompts
MIT License - feel free to use this project for learning or production!
Contributions are welcome! Please feel free to submit a Pull Request.
For issues or questions, please open an issue on GitHub.
Built with ❤️ using Go, PostgreSQL pgvector, and Ollama