
🧠 Go-RAG-System

Production-Grade Retrieval Augmented Generation Architecture in Golang



🧬 Project Overview

The Go-RAG-System is a complete, production-ready implementation of a Retrieval-Augmented Generation (RAG) pipeline, written entirely in Go. It combines the raw retrieval power of a PostgreSQL + pgvector vector database with the language generation capabilities of OpenAI GPT-4o and an advanced 17-pattern Agentic Architecture.

Unlike toy implementations, this system is designed to handle real-world complexity:

  • It doesn't just generate text — it reasons, plans, executes tools, and self-corrects.
  • It doesn't just store data — it indexes, retrieves, and augments with semantic similarity search using 1536-dimensional embeddings.
  • It doesn't just run one AI call — it orchestrates multi-agent pipelines with full observability, security authorization, and human approval gates.

πŸ—οΈ System Architecture

The RAG workflow runs through 6 distinct phases before a response is returned:

 User Query
     │
     ▼
┌─────────────────────────────────────┐
│  Phase 1: Embedding                 │
│  text-embedding-3-small (1536d)     │
└─────────────────┬───────────────────┘
                  │
                  ▼
┌─────────────────────────────────────┐
│  Phase 2: Vector DB Retrieval       │
│  PostgreSQL + pgvector              │
│  IVFFlat Cosine Similarity Search   │
│  → Returns Top K Document Chunks    │
└─────────────────┬───────────────────┘
                  │
                  ▼
┌─────────────────────────────────────┐
│  Phase 3: Prompt Augmentation       │
│  Retrieved chunks + Original query  │
│  → Enriched context prompt          │
└─────────────────┬───────────────────┘
                  │
                  ▼
┌─────────────────────────────────────┐
│  Phase 4: LLM Generation            │
│  GPT-4o processes enriched prompt   │
│  → Generates precise answer         │
└─────────────────┬───────────────────┘
                  │
                  ▼
┌─────────────────────────────────────┐
│  Phase 5: Tool Calling (if needed)  │
│  LLM triggers Web Search / DB Query │
│  → External data fetched & merged   │
└─────────────────┬───────────────────┘
                  │
                  ▼
┌─────────────────────────────────────┐
│  Phase 6: Agentic Post-Processing   │
│  Observability → Evaluation         │
│  → Reflection → Final Response      │
└─────────────────────────────────────┘
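The six phases above can be sketched as a simple Go pipeline. The function names and stub bodies here are illustrative stand-ins, not the repository's actual API; real implementations would call OpenAI and PostgreSQL:

```go
package main

import "fmt"

// Hypothetical stand-ins for each phase of the pipeline.
func embed(q string) []float32              { return []float32{0.1, 0.2} }           // Phase 1
func retrieve(e []float32) []string         { return []string{"chunk A", "chunk B"} } // Phase 2
func augment(q string, ctx []string) string { return fmt.Sprintf("Context: %v\nQuestion: %s", ctx, q) }
func generate(prompt string) string         { return "draft answer" }                 // Phase 4
func callTools(draft string) string         { return draft }                          // Phase 5: no-op when no tool is needed
func postProcess(ans string) string         { return ans }                            // Phase 6: trace, evaluate, reflect

// answer runs a query through all six RAG phases in order.
func answer(query string) string {
	embedding := embed(query)
	chunks := retrieve(embedding)
	prompt := augment(query, chunks)
	draft := generate(prompt)
	merged := callTools(draft)
	return postProcess(merged)
}

func main() {
	fmt.Println(answer("What is our refund policy?"))
}
```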

πŸ—„οΈ Database Schema (PostgreSQL + pgvector)

Before running the application, ensure your PostgreSQL instance has pgvector installed and execute the following initialization SQL:

-- Step 1: Enable the pgvector extension
CREATE EXTENSION IF NOT EXISTS vector;

-- Step 2: Create the documents table to store text chunks and their embeddings
CREATE TABLE documents (
    id          SERIAL PRIMARY KEY,
    content     TEXT NOT NULL,
    embedding   VECTOR(1536),       -- 1536 dimensions matches text-embedding-3-small
    metadata    JSONB,              -- Supports hybrid filtering (e.g., source, date)
    created_at  TIMESTAMP DEFAULT NOW()
);

-- Step 3: Create an IVFFlat index for fast approximate nearest-neighbor search
-- 'lists' controls the number of cluster centroids (tune based on dataset size)
CREATE INDEX ON documents
USING ivfflat (embedding vector_cosine_ops)
WITH (lists = 100);

-- Optional: HNSW index for higher recall at slightly more memory cost
-- CREATE INDEX ON documents USING hnsw (embedding vector_cosine_ops);

Index Selection Guide

| Index Type | Recall | Speed | Memory | Best For |
|---|---|---|---|---|
| IVFFlat | ~95% | Very fast | Low | Datasets < 1M rows |
| HNSW | ~99% | Fast | Higher | High-precision requirements |
| No index (exact) | 100% | Slow | Minimal | Dev / < 100k rows |

Why PostgreSQL + pgvector?

  • ACID Compliance: Your document store and relational data live in one reliable, transactional database — no eventual consistency issues.
  • Hybrid Search: Combine vector similarity with SQL WHERE clauses. Example: "Find semantically similar documents, but only from the last 30 days, tagged as compliance."
  • Familiar Tooling: Use standard PostgreSQL tooling (pgAdmin, psql, pg_dump) for backups, monitoring, and schema migrations.
  • No Additional Infrastructure: Eliminates the need for a separate dedicated vector database (Pinecone, Weaviate, etc.), reducing cost and ops overhead.

πŸ“ Repository Structure

Go-RAG-System/
│
├── main.go                 # Application entry point — wires all components
├── security.go             # Agent Security & Authorization (RBAC checks)
├── observability.go        # TraceID generation, structured logging, metrics
├── evaluation.go           # Response quality & safety scoring
├── agent_logic.go          # Reflection & Self-Correction loop
├── react_planner.go        # ReAct (Reason + Act) Planning Framework
├── human_in_the_loop.go    # HITL Approval Gates
├── memory.go               # Short-term & Long-term Agent Memory
├── single_agent.go         # Perceive → Reason → Act → Learn loop
├── db_query_agent.go       # NL2SQL — Natural Language to SQL
├── web_search_agent.go     # Autonomous Web Research (SerpAPI)
├── ml_models.go            # Dynamic AI Model Selection Engine
├── multi_agent.go          # Multi-Agent Coordinator (Planner + Executor)
├── tool_caller.go          # Tool & External API Calling Interface
├── rest_api.go             # REST API CRUD + Bulk Operations mapping
├── goal_based_agent.go     # BFS Goal-Based Pathfinding Agent
├── concurrency.go          # Goroutine-based Concurrent Task Execution
├── custom_errors.go        # AgentHallucinationError & structured exceptions
│
├── go.mod                  # Go module definition
├── go.sum                  # Dependency lock file
├── .env.example            # Environment variable template
├── .github/
│   └── workflows/
│       └── daily-contribution-sync.yml  # Automated daily maintenance
└── README.md

🤖 The 17-Pattern Agentic Architecture

This repository implements 17 major patterns from modern agentic-AI research and engineering:

Core Intelligence Patterns

1. Agent Observability (observability.go) Every agent execution generates a pseudo-UUID TraceID. Structured log events are emitted at every stage — tool call attempts, LLM responses, evaluation scores — providing a complete, reproducible audit trail.

2. Agent Evaluation (evaluation.go) After each response, scores are computed across four dimensions:

  • Goal Achievement: Did the agent actually answer the question?
  • Task Success Rate: How many sub-steps completed without error?
  • Efficiency: Response latency relative to complexity.
  • Safety Score: Did the response stay within ethical bounds?

3. Reflection & Self-Correction (agent_logic.go) When evaluation fails, instead of raising an error, the agent enters a self-correction loop. It analyzes why it failed (e.g., "Irrelevant search results", "Hallucination detected"), updates its strategy ("Add metadata filter", "Use a different tool"), and retries.

// The agent catches its own failure and corrects:
// Attempt 1: Failed — "Irrelevant results"
// Correction: "Narrow query to last 7 days using metadata filter"
// Attempt 2: Success
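The correction loop can be sketched as a retry wrapper that feeds each failure reason back into the next attempt. The types and names here are assumptions for illustration, not the actual agent_logic.go API:

```go
package main

import "fmt"

// evalResult is a hypothetical evaluator verdict: pass/fail plus a reason
// the agent can use to adjust its strategy.
type evalResult struct {
	passed bool
	reason string
}

// runWithReflection retries an attempt up to maxTries times, passing the
// previous failure reason as a hint so the next attempt can self-correct.
func runWithReflection(attempt func(hint string) evalResult, maxTries int) (evalResult, int) {
	hint := ""
	var res evalResult
	for i := 1; i <= maxTries; i++ {
		res = attempt(hint)
		if res.passed {
			return res, i
		}
		hint = res.reason // e.g. "Irrelevant results: add metadata filter"
	}
	return res, maxTries
}

func main() {
	res, tries := runWithReflection(func(hint string) evalResult {
		if hint == "" { // first attempt: no guidance yet, fails
			return evalResult{false, "Irrelevant results: add metadata filter"}
		}
		return evalResult{passed: true} // corrected attempt succeeds
	}, 3)
	fmt.Println(res.passed, tries)
}
```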

Planning & Reasoning Patterns

4. ReAct Framework (react_planner.go) The agent actively reasons in a loop: Thought → Action → Observation → Thought. Each iteration brings it closer to the goal by incorporating real-world feedback from tool executions.

5. Agent Planning / Task Decomposition (react_planner.go) High-level user goals (e.g., "Prepare a Q1 compliance report") are autonomously broken into executable sub-tasks before any action is taken, preventing cascading failures from ambiguous prompts.

6. Goal-Based Agent (goal_based_agent.go) Uses BFS (Breadth-First Search) pathfinding to evaluate multiple action paths and select the optimal sequence to reach a desired end state — critical for multi-step workflows.
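BFS over an action graph guarantees the shortest plan is found first. A minimal sketch of the idea, with a made-up state graph (the repository's actual state model may differ):

```go
package main

import "fmt"

// bfsPlan returns the shortest action sequence from start to goal over a
// graph of state -> (action -> next state), or nil if the goal is unreachable.
func bfsPlan(graph map[string]map[string]string, start, goal string) []string {
	type node struct {
		state string
		path  []string
	}
	visited := map[string]bool{start: true}
	queue := []node{{start, nil}}
	for len(queue) > 0 {
		cur := queue[0]
		queue = queue[1:]
		if cur.state == goal {
			return cur.path
		}
		for action, next := range graph[cur.state] {
			if !visited[next] {
				visited[next] = true
				// Copy the path before appending to avoid shared backing arrays.
				queue = append(queue, node{next, append(append([]string{}, cur.path...), action)})
			}
		}
	}
	return nil
}

func main() {
	// Hypothetical workflow states and actions.
	graph := map[string]map[string]string{
		"idle":      {"fetch_docs": "have_docs"},
		"have_docs": {"summarize": "done", "reindex": "idle"},
	}
	fmt.Println(bfsPlan(graph, "idle", "done"))
}
```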

Safety & Control Patterns

7. Agent Security & Authorization (security.go) Enforces Principle of Least Privilege at the agent layer. Every tool call is checked against an RBAC table before execution.

// Role: "agent" β†’ Action: "execute" β†’ APPROVED βœ…
// Role: "agent" β†’ Action: "delete"  β†’ DENIED ❌ (fallback triggered)

8. Human-in-the-Loop (human_in_the_loop.go) For destructive or irreversible actions, the agent halts and routes to a human approval gate. In production, this triggers a webhook notification to an admin interface and awaits a response.

Data & Tool Patterns

9. Agent Memory (memory.go) Dual-layer memory:

  • Short-Term: An in-memory key-value store cleared at session end.
  • Long-Term: PostgreSQL-backed persistent memory for cross-session recall.
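The short-term layer is essentially a session-scoped key-value store; the long-term layer would persist the same interface to PostgreSQL. A sketch of the short-term side only, with hypothetical method names:

```go
package main

import "fmt"

// shortTermMemory is an in-memory key-value store cleared at session end.
type shortTermMemory struct {
	data map[string]string
}

func newShortTermMemory() *shortTermMemory {
	return &shortTermMemory{data: map[string]string{}}
}

func (m *shortTermMemory) Set(k, v string) { m.data[k] = v }

func (m *shortTermMemory) Get(k string) (string, bool) {
	v, ok := m.data[k]
	return v, ok
}

// Clear wipes the store, modelling end-of-session cleanup.
func (m *shortTermMemory) Clear() { m.data = map[string]string{} }

func main() {
	mem := newShortTermMemory()
	mem.Set("user_goal", "refund policy lookup")
	v, _ := mem.Get("user_goal")
	fmt.Println(v)
	mem.Clear()
	_, ok := mem.Get("user_goal")
	fmt.Println(ok)
}
```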

10. Tool Calling (tool_caller.go) Exposes a structured JSON interface enabling the LLM to invoke real-world tools — calculators, weather APIs, custom REST endpoints — extending its capabilities far beyond its static training data.

11. Database Query Agent / NL2SQL (db_query_agent.go) Translates natural language questions into optimized SQL queries, executes them against the PostgreSQL store, and synthesizes a human-readable response from the result set.

12. Web Search Agent (web_search_agent.go) Connects to SerpAPI to retrieve real-time web results. Parses the JSON payload, extracts the top results, and synthesizes a grounded, citation-backed response.

Scalability Patterns

13. Single Agent Workflow (single_agent.go) A formalized Perceive → Reason → Act → Learn loop — the fundamental building block for all agentic systems.

14. Multi-Agent System (multi_agent.go) A Planner Agent decomposes a complex goal and dispatches sub-tasks to specialized Executor Agents running concurrently via goroutines.

15. Concurrent Task Execution (concurrency.go) Leverages native Go goroutines and sync.WaitGroup to execute multiple agent tasks in parallel — no blocking, no deadlocks.
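The goroutine + WaitGroup pattern looks like this. Each goroutine writes to its own slice index, so no mutex is needed; this is a generic sketch, not concurrency.go verbatim:

```go
package main

import (
	"fmt"
	"sync"
)

// runConcurrently executes each task in its own goroutine and collects the
// results in order; wg.Wait() blocks until every goroutine has finished.
func runConcurrently(tasks []func() string) []string {
	results := make([]string, len(tasks))
	var wg sync.WaitGroup
	for i, task := range tasks {
		wg.Add(1)
		go func(i int, task func() string) {
			defer wg.Done() // always signal completion
			results[i] = task() // distinct index per goroutine: no data race
		}(i, task)
	}
	wg.Wait()
	return results
}

func main() {
	fmt.Println(runConcurrently([]func() string{
		func() string { return "plan" },
		func() string { return "execute" },
	}))
}
```

Passing `i` and `task` as arguments pins each goroutine to its own loop values, which also keeps the code correct on Go versions before 1.22 changed loop-variable scoping.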

16. Dynamic AI Model Selection (ml_models.go) Selects the appropriate algorithmic approach (NLP, Neural Network, Linear Regression) at runtime based on the semantic classification of the task.

17. Custom Exceptions (custom_errors.go) Implements AgentHallucinationError and structured error types that carry rich context (trace ID, tool name, expected vs. actual values), enabling precise error recovery.


🚀 Getting Started

Prerequisites

| Dependency | Version | Notes |
|---|---|---|
| Go | 1.22+ | Install from go.dev |
| PostgreSQL | 15+ | With pgvector extension installed |
| OpenAI API Key | — | Billing account required |

Installation

1. Clone the repository:

git clone https://github.com/Raphasha27/Go-RAG-System.git
cd Go-RAG-System

2. Install Go dependencies:

go mod tidy

3. Configure environment variables:

cp .env.example .env

Edit .env:

DATABASE_URL="postgres://user:password@localhost:5432/ragdb?sslmode=disable"
OPENAI_API_KEY="sk-your-openai-api-key"

4. Initialize the database (save the SQL from the Database Schema section above as schema.sql, then run):

psql -U your_user -d ragdb -f schema.sql

5. Run the application:

go run main.go

Expected output:

[TraceID: a1b2c3d4] RAG System Starting...
[TraceID: a1b2c3d4] Connecting to PostgreSQL...
[TraceID: a1b2c3d4] Embedding query: "What is our refund policy?" (1536 dims)
[TraceID: a1b2c3d4] Retrieved 3 context chunks from pgvector
[TraceID: a1b2c3d4] 🔧 Tool Call Triggered: web_search
[TraceID: a1b2c3d4] Agent Evaluation: Quality=95 Safety=100 ✅ PASSED

πŸ›‘οΈ Security Best Practices

The system implements the following security controls at the agent layer:

| Control | Implementation | File |
|---|---|---|
| Privilege Limitation | RBAC — agents cannot execute delete or write | security.go |
| Audit Logging | Every tool call logged with TraceID | observability.go |
| Hallucination Detection | AgentHallucinationError thrown and caught | custom_errors.go |
| Human Override Gate | High-risk actions paused for approval | human_in_the_loop.go |
| Input Validation | All queries validated before vectorization | main.go |

🧪 Testing

# Run all tests
go test ./...

# Run with verbose output
go test ./... -v

# Run with race condition detector (recommended before production)
go test ./... -race

🔧 Troubleshooting

Issue: pgvector extension not found

Fix: Ensure the extension is installed in your PostgreSQL instance:

CREATE EXTENSION IF NOT EXISTS vector;
-- If this fails, install via: sudo apt install postgresql-15-pgvector

Issue: Goroutine leak detected under -race

Cause: A concurrent task may be holding a resource without releasing it. Fix: Review concurrency.go and ensure every goroutine calls defer wg.Done() as its first statement.

Issue: OpenAI 429 Rate Limit Error

Fix: Add exponential backoff retry logic around the OpenAI chat completion call. The current implementation returns nil on failure — a future PR will add automatic retry.

Issue: IVFFlat index returning poor results

Cause: The lists parameter may be too small for your dataset size. Fix: pgvector's guideline is lists = rows / 1000 for up to ~1M rows and lists = sqrt(rows) beyond that; for 1M rows, use lists = 1000.


πŸ—ΊοΈ Roadmap

  • Add a REST HTTP server (using net/http or gin) to expose as a microservice
  • Implement HNSW index as a configuration option alongside IVFFlat
  • Add streaming response support (Server-Sent Events)
  • Integrate Weaviate as an alternative vector backend
  • Build a CLI tool for document ingestion and index management
  • Add Prometheus metrics endpoint for production monitoring

💼 Real-World Use Cases

  • Chat with your documents / PDFs: Index company policy documents, technical manuals, or legal contracts and query them in plain English.
  • Enterprise Knowledge Base: Build a Confluence / SharePoint replacement that actually understands what you're asking.
  • Customer Support Bots: Ground AI responses in your actual product documentation to sharply reduce hallucinations.
  • Research Assistants: Index academic papers and ask cross-paper synthesis questions.
  • Automated Compliance Reporting: Query regulatory documents and generate gap analysis reports.

🧠 Why RAG Over Fine-Tuning?

| Aspect | Fine-Tuning | RAG |
|---|---|---|
| Cost | Thousands of dollars per run | One-time embedding cost |
| Freshness | Stale after training cutoff | Real-time via DB updates |
| Explainability | Black box | Traceable to source chunks |
| Domain Adaptation | Requires large labeled dataset | Works with any documents |
| Deployment | New model version per update | Index update only |

🤝 Contributing

  1. Fork the repository.
  2. Create a feature branch: git checkout -b feat/your-feature
  3. Commit your changes with a conventional commit message: git commit -m 'feat: add HNSW index support'
  4. Push to the branch: git push origin feat/your-feature
  5. Open a Pull Request describing the change and its motivation.

📜 License

MIT License. See LICENSE for details.


Part of the Future AGI ecosystem — built by Koketso Raphasha (Raphasha27). Kirov Dynamics Technology | Building the Infrastructure of Autonomous Systems.
