A Retrieval-Augmented Generation (RAG) knowledge-base backend service built with Go, PostgreSQL with the pgvector extension, and Ollama. It enables intelligent question answering over company FAQ documents using vector embeddings and LLM generation.
- 📝 Document Ingestion: Upload and automatically embed documents into vector database
- 🔍 Semantic Search: Fast vector similarity search using HNSW index
- 🤖 RAG Query: Intelligent question-answering with context retrieval
- 🚀 High Performance: Clean architecture with Go for low latency
- 🐳 Docker Ready: Full docker-compose setup with all dependencies
- Backend: Go 1.24+
- Database: PostgreSQL 16 with pgvector extension
- Vector Index: HNSW (Hierarchical Navigable Small World)
- Embedding Model: nomic-embed-text (768 dimensions)
- LLM Model: llama3.2
- AI Runtime: Ollama
- HTTP Router: gorilla/mux
Client Request
↓
POST /api/query
↓
1. Generate embedding from query (Ollama: nomic-embed-text)
2. Vector similarity search (Postgres pgvector HNSW, cosine similarity)
3. Build context from top-K similar documents
4. Generate answer with LLM (Ollama: llama3.2)
↓
Return JSON response with answer + source documents
- Docker and Docker Compose
- Go 1.24+ (for local development)
- curl and jq (for testing)
# Start all services (Postgres, Ollama, API)
docker-compose up -d
# Check logs
docker-compose logs -f

# Pull embedding model (nomic-embed-text)
docker exec -it arsys-ollama ollama pull nomic-embed-text
# Pull LLM model (llama3.2)
docker exec -it arsys-ollama ollama pull llama3.2
# Verify models are installed
docker exec -it arsys-ollama ollama list

# Check health endpoint
curl http://localhost:8080/api/health | jq
# Expected output:
# {
# "status": "healthy",
# "services": {
# "database": "healthy",
# "ollama": "healthy"
# },
# "timestamp": "2026-02-17T..."
# }

# Seed Indonesian FAQ data
bash scripts/seed_data.sh

# Ask a question
curl -X POST http://localhost:8080/api/query \
-H "Content-Type: application/json" \
-d '{
"query": "Jam kerja perusahaan apa?",
"top_k": 5
}' | jq
# Run full test suite
bash scripts/test_api.sh

Health check for all services.
Response:
{
"status": "healthy",
"services": {
"database": "healthy",
"ollama": "healthy"
},
"timestamp": "2026-02-17T10:30:00Z"
}

Ingest a single document into the vector database.
Request:
{
"content": "Jam kerja perusahaan adalah Senin-Jumat 09:00-17:00 WIB",
"source": "hr_policy",
"metadata": {
"category": "working_hours",
"language": "id"
}
}

Response:
{
"document_id": "123e4567-e89b-12d3-a456-426614174000",
"message": "Document ingested successfully"
}

Ingest multiple documents at once.
Request:
{
"documents": [
{
"content": "Document 1 content...",
"source": "hr_policy",
"metadata": {"category": "leave"}
},
{
"content": "Document 2 content...",
"source": "it_support",
"metadata": {"category": "vpn"}
}
]
}

Response:
{
"document_ids": ["uuid1", "uuid2"],
"count": 2,
"message": "Batch ingestion completed"
}

Main RAG endpoint - query the knowledge base.
Request:
{
"query": "Bagaimana cara mengajukan cuti?",
"top_k": 5
}

Response:
{
"answer": "Untuk mengajukan cuti, silakan login ke portal HR...",
"source_documents": [
{
"id": "uuid",
"content": "Untuk mengajukan cuti, silakan...",
"source": "hr_policy",
"similarity": 0.89,
"metadata": {"category": "leave"},
"created_at": "2026-02-17T..."
}
],
"processing_time_ms": 1234
}

Configuration is managed through environment variables. Copy .env.example to .env and adjust as needed.
# Database
DB_HOST=localhost
DB_PORT=55432
DB_USER=arsys
DB_PASSWORD=arsys123
DB_NAME=arsys
# Ollama
OLLAMA_BASE_URL=http://localhost:11434
EMBEDDING_MODEL=nomic-embed-text
LLM_MODEL=llama3.2
# RAG Settings
RAG_TOP_K=5 # Number of similar documents to retrieve
RAG_SIMILARITY_THRESHOLD=0.7 # Minimum similarity score (0-1)
RAG_MAX_CONTEXT_LENGTH=2000 # Max characters for LLM context
# Server
SERVER_PORT=8080
LOG_LEVEL=info

# Install dependencies
go mod download
# Run locally (without Docker)
go run ./cmd/api
# Build binary
make build
# Run binary
./bin/api

AI-Cad/
├── cmd/api/ # Application entry point
├── internal/
│ ├── api/ # HTTP layer
│ │ ├── handlers/ # Request handlers
│ │ ├── middleware/ # HTTP middleware
│ │ └── router.go # Route configuration
│ ├── client/ollama/ # Ollama API client
│ ├── config/ # Configuration management
│ ├── models/ # Data models
│ ├── repository/ # Database layer
│ └── service/ # Business logic (RAG flow)
├── pkg/logger/ # Logging utility
├── migrations/ # Database migrations
├── scripts/ # Helper scripts
├── docker-compose.yaml # Docker services
├── Dockerfile # API container build
└── Makefile # Build commands
make help # Show all available commands
make build # Build Go binary
make run # Run locally
make docker-up # Start all Docker services
make docker-down # Stop all Docker services
make docker-logs # Follow Docker logs
make models-pull # Pull Ollama models
make seed # Seed sample data
make test-api # Run API tests
make clean        # Clean build artifacts

CREATE TABLE documents (
id UUID PRIMARY KEY DEFAULT gen_random_uuid(),
content TEXT NOT NULL,
embedding vector(768), -- nomic-embed-text dimension
metadata JSONB DEFAULT '{}'::jsonb,
source VARCHAR(255),
created_at TIMESTAMP WITH TIME ZONE DEFAULT NOW(),
updated_at TIMESTAMP WITH TIME ZONE DEFAULT NOW()
);
-- HNSW index for fast vector similarity search
CREATE INDEX documents_embedding_idx ON documents
USING hnsw (embedding vector_cosine_ops)
WITH (m = 16, ef_construction = 64);

Index Parameters:
- m = 16: connections per layer (balances recall and memory)
- ef_construction = 64: build quality (higher = better recall, slower build)
- vector_cosine_ops: cosine similarity operator class (standard for text embeddings)
When you ingest a document:
- Content is sent to Ollama for embedding generation (nomic-embed-text)
- Document + embedding + metadata stored in PostgreSQL
- HNSW index automatically updated for fast search
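The ingestion steps above can be sketched in Go. This is a minimal sketch, not the project's actual implementation: `embed`, `vectorLiteral`, and `ingest` are hypothetical helper names, error handling is trimmed, and a `*sql.DB` handle with a registered Postgres driver is assumed. Ollama's `/api/embeddings` endpoint takes `{"model", "prompt"}` and returns `{"embedding": [...]}`; pgvector accepts a text literal like `[0.1,0.2,...]` cast to `vector`.

```go
package main

import (
	"bytes"
	"database/sql"
	"encoding/json"
	"fmt"
	"net/http"
	"strconv"
	"strings"
)

// embed calls Ollama's embeddings endpoint for the configured model.
func embed(baseURL, model, text string) ([]float64, error) {
	body, _ := json.Marshal(map[string]string{"model": model, "prompt": text})
	resp, err := http.Post(baseURL+"/api/embeddings", "application/json", bytes.NewReader(body))
	if err != nil {
		return nil, err
	}
	defer resp.Body.Close()
	var out struct {
		Embedding []float64 `json:"embedding"`
	}
	if err := json.NewDecoder(resp.Body).Decode(&out); err != nil {
		return nil, err
	}
	return out.Embedding, nil
}

// vectorLiteral renders a []float64 as a pgvector input literal: "[1,2,3]".
func vectorLiteral(v []float64) string {
	parts := make([]string, len(v))
	for i, x := range v {
		parts[i] = strconv.FormatFloat(x, 'f', -1, 64)
	}
	return "[" + strings.Join(parts, ",") + "]"
}

// ingest stores content plus its embedding; the HNSW index picks the row up automatically.
func ingest(db *sql.DB, baseURL, model, content, source string) error {
	emb, err := embed(baseURL, model, content)
	if err != nil {
		return err
	}
	_, err = db.Exec(
		`INSERT INTO documents (content, embedding, source) VALUES ($1, $2::vector, $3)`,
		content, vectorLiteral(emb), source)
	return err
}

func main() {
	fmt.Println(vectorLiteral([]float64{0.1, 0.25, -1}))
}
```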
When you query:
- Embedding: Query text → Ollama → 768-dimensional vector
- Search: Vector similarity search using HNSW index (cosine similarity)
- Retrieve: Top-K most similar documents above threshold
- Context: Build context string from retrieved documents
- Generate: Send context + query to LLM (llama3.2)
- Return: Answer + source documents + processing time
Using cosine similarity to find relevant documents:
- Score range: roughly 0 to 1 for text embeddings (1 = identical direction, 0 = unrelated); cosine similarity can be negative in general, but text embeddings rarely produce negative scores
- Default threshold: 0.7 (filters out low-relevance results)
- HNSW provides approximate nearest neighbor search (fast with good recall)
# Check service status
docker-compose ps
# View logs
docker-compose logs postgres
docker-compose logs ollama
docker-compose logs api
# Restart services
docker-compose restart

# Pull models manually
docker exec -it arsys-ollama ollama pull nomic-embed-text
docker exec -it arsys-ollama ollama pull llama3.2
# Check if models are loaded
docker exec -it arsys-ollama ollama list

# Check PostgreSQL is ready
docker exec -it arsys-postgres pg_isready -U arsys
# Verify pgvector extension
docker exec -it arsys-postgres psql -U arsys -d arsys -c "\dx"
# Check documents table
docker exec -it arsys-postgres psql -U arsys -d arsys -c "\d documents"

# Check API logs
docker-compose logs -f api
# Verify health endpoint
curl http://localhost:8080/api/health
# Check if port is in use
netstat -an | grep 8080

- First query is slow: Ollama loads models on first use (normal)
- All queries slow: Check Ollama container resources, consider GPU
- High similarity threshold: lower RAG_SIMILARITY_THRESHOLD to get more results
- Too many documents: the HNSW index handles millions of rows efficiently, but check RAM
- Adjust HNSW parameters for your data size:
  - Small dataset (<10K docs): m=8, ef_construction=32
  - Medium (10K-100K): m=16, ef_construction=64 (default)
  - Large (>100K): m=32, ef_construction=128
- Tune RAG settings:
  - RAG_TOP_K: higher = more context but slower
  - RAG_SIMILARITY_THRESHOLD: lower = more results
  - RAG_MAX_CONTEXT_LENGTH: balance between context and speed
- Ollama optimization:
  - Use GPU for faster inference
  - Keep models loaded (the first query is slower while a model loads)
  - Adjust timeouts for large prompts
MIT License - feel free to use this project for learning or production!
Contributions are welcome! Please feel free to submit a Pull Request.
For issues or questions, please open an issue on GitHub.
Built with ❤️ using Go, PostgreSQL pgvector, and Ollama