NorthCommits/Go-Vector-Database

VecDB - A Mini Vector Database

A lightweight vector database built in Go, inspired by Pinecone. Features HNSW-based approximate nearest neighbor search, metadata filtering, disk persistence, and OpenAI embedding integration.

Features

  • HNSW Index - Approximate nearest neighbor search with roughly O(log n) query time
  • Multiple Distance Metrics - Cosine similarity (default), Euclidean, Dot product
  • Metadata Filtering - MongoDB-style query operators
  • Disk Persistence - Binary format with automatic index serialization
  • OpenAI Integration - Text-to-embedding with text-embedding-3-small
  • REST API - Full CRUD and query endpoints
  • CLI - Command-line interface for all operations
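
For readers unfamiliar with the three distance metrics listed above, here is a minimal Go sketch of what each one computes. The function names are hypothetical and this is not the repository's implementation; it assumes both vectors have the same length.

```go
package main

import (
	"fmt"
	"math"
)

// dot returns the dot product of two equal-length vectors.
func dot(a, b []float64) float64 {
	var sum float64
	for i := range a {
		sum += a[i] * b[i]
	}
	return sum
}

// cosineSimilarity returns dot(a, b) / (|a| * |b|).
// It returns 0 when either vector has zero length, to avoid dividing by zero.
func cosineSimilarity(a, b []float64) float64 {
	normA := math.Sqrt(dot(a, a))
	normB := math.Sqrt(dot(b, b))
	if normA == 0 || normB == 0 {
		return 0
	}
	return dot(a, b) / (normA * normB)
}

// euclidean returns the straight-line (L2) distance between a and b.
func euclidean(a, b []float64) float64 {
	var sum float64
	for i := range a {
		d := a[i] - b[i]
		sum += d * d
	}
	return math.Sqrt(sum)
}

func main() {
	a := []float64{1, 0}
	b := []float64{0, 1}
	fmt.Println("cosine:", cosineSimilarity(a, b))
	fmt.Println("dot:", dot(a, b))
	fmt.Println("euclidean:", euclidean(a, b))
}
```

Cosine similarity and dot product are similarity scores (higher is closer), while Euclidean is a distance (lower is closer), so a ranking routine has to sort in the right direction for the chosen metric.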

Quick Start

1. Setup

# Clone and build
git clone <repo-url>
cd Go-Vector-Database
go build -o vecdb ./cmd/vecdb

# Configure OpenAI API key (required for text embeddings)
echo "OPENAI_API_KEY=your-api-key-here" > .env

2. Start the Server

./vecdb serve
# Server starts on http://localhost:8080

3. Insert Vectors

# Insert with text (auto-embeds via OpenAI)
curl -X POST http://localhost:8080/vectors/text \
  -H "Content-Type: application/json" \
  -d '{
    "id": "doc1",
    "text": "Machine learning is a subset of artificial intelligence",
    "metadata": {"category": "tech", "author": "john"}
  }'

# Insert with raw vector values
curl -X POST http://localhost:8080/vectors \
  -H "Content-Type: application/json" \
  -d '{
    "vectors": [
      {"id": "vec1", "values": [0.1, 0.2, ...], "metadata": {"type": "embedding"}}
    ]
  }'
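
The raw-vector upsert above can also be issued from Go. The sketch below is only an illustration: the `Vector` type and the helper names are hypothetical, only the JSON field names come from the curl example, and the response is returned as raw text because its schema is not documented here. The three-element vector is a placeholder; real vectors presumably need to match the dimension already stored in the database.

```go
package main

import (
	"bytes"
	"encoding/json"
	"fmt"
	"io"
	"net/http"
)

// Vector mirrors one entry of the "vectors" array in the curl example.
// The Go type is hypothetical; only the JSON field names come from the README.
type Vector struct {
	ID       string         `json:"id"`
	Values   []float64      `json:"values"`
	Metadata map[string]any `json:"metadata,omitempty"`
}

// buildUpsertBody wraps the vectors in the {"vectors": [...]} envelope.
func buildUpsertBody(vecs []Vector) ([]byte, error) {
	return json.Marshal(map[string][]Vector{"vectors": vecs})
}

// upsert POSTs the batch to /vectors and returns the raw response body.
func upsert(baseURL string, vecs []Vector) (string, error) {
	body, err := buildUpsertBody(vecs)
	if err != nil {
		return "", err
	}
	resp, err := http.Post(baseURL+"/vectors", "application/json", bytes.NewReader(body))
	if err != nil {
		return "", err
	}
	defer resp.Body.Close()
	raw, err := io.ReadAll(resp.Body)
	if err != nil {
		return "", err
	}
	if resp.StatusCode < 200 || resp.StatusCode > 299 {
		return "", fmt.Errorf("upsert failed: %s: %s", resp.Status, raw)
	}
	return string(raw), nil
}

func main() {
	vecs := []Vector{
		{ID: "vec1", Values: []float64{0.1, 0.2, 0.3}, Metadata: map[string]any{"type": "embedding"}},
	}
	out, err := upsert("http://localhost:8080", vecs)
	if err != nil {
		fmt.Println("upsert error:", err)
		return
	}
	fmt.Println(out)
}
```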

4. Query Similar Vectors

# Query with text
curl -X POST http://localhost:8080/query/text \
  -H "Content-Type: application/json" \
  -d '{
    "text": "AI and deep learning",
    "top_k": 5,
    "include_metadata": true
  }'

# Query with filters
curl -X POST http://localhost:8080/query/text \
  -H "Content-Type: application/json" \
  -d '{
    "text": "programming languages",
    "top_k": 10,
    "filter": {"category": "tech"},
    "include_metadata": true
  }'

5. Save to Disk

curl -X POST http://localhost:8080/save

API Reference

Endpoint        Method   Description
/health         GET      Health check and stats
/vectors        POST     Upsert vectors (batch)
/vectors/text   POST     Upsert single vector with text
/vectors/{id}   GET      Get vector by ID
/vectors/{id}   DELETE   Delete vector
/query          POST     Query with vector values
/query/text     POST     Query with text
/stats          GET      Database statistics
/save           POST     Persist to disk
/load           POST     Load from disk
/clear          POST     Clear all vectors

CLI Commands

# Start HTTP server
./vecdb serve [-port 8080] [-host 0.0.0.0]

# Insert vector with text
./vecdb insert -id doc1 -text "Hello world" -metadata '{"key": "value"}'

# Insert vector with values
./vecdb insert -id vec1 -values "0.1,0.2,0.3,..."

# Query with text
./vecdb query -text "search query" -k 5

# Query with values
./vecdb query -values "0.1,0.2,0.3,..." -k 5

# Show statistics
./vecdb stats

# Show version
./vecdb version

Metadata Filtering

Supports MongoDB-style operators:

// Comparison
{"field": {"$eq": "value"}}
{"field": {"$ne": "value"}}
{"field": {"$gt": 10}}
{"field": {"$gte": 10}}
{"field": {"$lt": 20}}
{"field": {"$lte": 20}}

// Array
{"field": {"$in": ["a", "b", "c"]}}
{"field": {"$nin": ["x", "y"]}}

// Logical
{"$and": [{"field1": "a"}, {"field2": "b"}]}
{"$or": [{"field1": "a"}, {"field2": "b"}]}

// String
{"field": {"$contains": "substring"}}
{"field": {"$startswith": "prefix"}}
{"field": {"$endswith": "suffix"}}

// Existence
{"field": {"$exists": true}}
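
To make the operator semantics concrete, here is one way such filters could be evaluated against a vector's metadata. This is a sketch, not the repository's evaluator. The names `doc`, `matches`, `matchOp`, and `parse` are hypothetical, and several behaviours are assumptions: a bare value means equality, `$ne` and `$nin` match when the field is absent, comparisons apply only to numbers, string operators apply only to strings, and unknown operators never match.

```go
package main

import (
	"encoding/json"
	"fmt"
	"reflect"
	"strings"
)

// doc is a JSON-decoded object.
type doc = map[string]any

// matches reports whether meta satisfies every clause of filter.
func matches(filter, meta doc) bool {
	for key, cond := range filter {
		if key == "$and" || key == "$or" {
			subs, _ := cond.([]any)
			hit := key == "$and" // $and holds until a clause fails; $or fails until one holds
			for _, s := range subs {
				sub, _ := s.(doc)
				if ok := matches(sub, meta); ok != hit {
					hit = ok
					break
				}
			}
			if !hit {
				return false
			}
			continue
		}
		val, present := meta[key]
		ops, isOps := cond.(doc)
		if !isOps { // a bare value means implicit equality, e.g. {"category": "tech"}
			ops = doc{"$eq": cond}
		}
		for op, arg := range ops {
			if !matchOp(op, arg, val, present) {
				return false
			}
		}
	}
	return true
}

// matchOp evaluates one operator against a field value (val is nil when absent).
func matchOp(op string, arg, val any, present bool) bool {
	num, isNum := val.(float64) // JSON numbers decode to float64
	bound, isBound := arg.(float64)
	numeric := isNum && isBound
	str, isStr := val.(string)
	pat, isPat := arg.(string)
	textual := isStr && isPat
	switch op {
	case "$exists":
		want, _ := arg.(bool)
		return present == want
	case "$eq":
		return present && reflect.DeepEqual(val, arg)
	case "$ne":
		return !present || !reflect.DeepEqual(val, arg)
	case "$in", "$nin":
		list, _ := arg.([]any)
		found := false
		for _, item := range list {
			found = found || (present && reflect.DeepEqual(val, item))
		}
		return found == (op == "$in")
	case "$gt":
		return numeric && num > bound
	case "$gte":
		return numeric && num >= bound
	case "$lt":
		return numeric && num < bound
	case "$lte":
		return numeric && num <= bound
	case "$contains":
		return textual && strings.Contains(str, pat)
	case "$startswith":
		return textual && strings.HasPrefix(str, pat)
	case "$endswith":
		return textual && strings.HasSuffix(str, pat)
	}
	return false // unknown operators never match in this sketch
}

func parse(s string) doc {
	var d doc
	if err := json.Unmarshal([]byte(s), &d); err != nil {
		panic(err) // acceptable in a demo; real code should return the error
	}
	return d
}

func main() {
	meta := parse(`{"category": "tech", "author": "john", "year": 2021}`)
	filter := parse(`{"$and": [{"category": "tech"}, {"year": {"$gte": 2020}}]}`)
	fmt.Println(matches(filter, meta))
}
```

Multiple keys in one filter object are treated as an implicit AND here, which is the MongoDB convention; whether VecDB does the same is not stated above.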

Configuration

Environment variables (or .env file):

Variable               Default                  Description
OPENAI_API_KEY         -                        OpenAI API key for embeddings
OPENAI_MODEL           text-embedding-3-small   Embedding model
SERVER_PORT            8080                     HTTP server port
SERVER_HOST            0.0.0.0                  HTTP server host
DATA_DIR               ./data                   Data directory
DATABASE_FILE          vecdb.dat                Database filename
HNSW_M                 16                       HNSW connections per node
HNSW_EF_CONSTRUCTION   200                      HNSW construction parameter
HNSW_EF_SEARCH         50                       HNSW search parameter

Project Structure

.
├── cmd/vecdb/main.go        # CLI entry point
├── pkg/
│   ├── api/                 # REST API server
│   ├── config/              # Configuration
│   ├── database/            # Database orchestrator
│   ├── embedding/           # OpenAI client
│   ├── index/               # HNSW algorithm
│   └── storage/             # Vector storage & persistence
├── test/                    # Tests
├── go.mod
└── .env

How It Works

  1. Embedding: Text is converted to 1536-dimensional vectors via OpenAI's embedding API
  2. Indexing: Vectors are inserted into an HNSW (Hierarchical Navigable Small World) graph
  3. Search: Queries traverse the graph to find approximate nearest neighbors in roughly O(log n) time
  4. Filtering: Metadata filters are applied post-search to refine results, so a filtered query may return fewer than top_k matches
  5. Persistence: Binary format stores vectors + serialized HNSW graph
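
The search and filtering steps above can be illustrated with a small stand-in. The sketch below replaces the HNSW graph with an exhaustive scan (a deliberate simplification, so results are exact rather than approximate) and then applies a metadata predicate after the top-k cut. All type and function names are hypothetical, and vectors are assumed to be unit length so that the dot product equals cosine similarity.

```go
package main

import (
	"fmt"
	"sort"
)

// Record is a stored vector with metadata (hypothetical type).
type Record struct {
	ID       string
	Values   []float64 // assumed unit length, so dot product equals cosine similarity
	Metadata map[string]string
}

// Match is one search result.
type Match struct {
	ID    string
	Score float64
}

// dot returns the dot product of two equal-length vectors.
func dot(a, b []float64) float64 {
	var sum float64
	for i := range a {
		sum += a[i] * b[i]
	}
	return sum
}

// search scores every record, keeps the k best, and only then applies keep.
// An HNSW index would replace the full scan; the post-filter step is the same idea.
func search(records []Record, query []float64, k int, keep func(Record) bool) []Match {
	type scored struct {
		rec   Record
		score float64
	}
	all := make([]scored, 0, len(records))
	for _, r := range records {
		all = append(all, scored{r, dot(query, r.Values)})
	}
	// Highest similarity first; stable sort keeps insertion order on ties.
	sort.SliceStable(all, func(i, j int) bool { return all[i].score > all[j].score })
	if k < 0 {
		k = 0
	}
	if k < len(all) {
		all = all[:k]
	}
	var out []Match
	for _, s := range all {
		if keep == nil || keep(s.rec) {
			out = append(out, Match{ID: s.rec.ID, Score: s.score})
		}
	}
	return out
}

func main() {
	records := []Record{
		{"a", []float64{1, 0}, map[string]string{"category": "tech"}},
		{"b", []float64{0.6, 0.8}, map[string]string{"category": "food"}},
		{"c", []float64{0, 1}, map[string]string{"category": "tech"}},
	}
	isTech := func(r Record) bool { return r.Metadata["category"] == "tech" }
	fmt.Println(search(records, []float64{1, 0}, 2, isTech))
}
```

With k = 2 the two nearest records are "a" and "b"; the filter then drops "b", so only "a" is returned even though "c" would also satisfy the filter. Requesting a larger k than needed is a common way to compensate when filters are selective.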

Running Tests

go test ./... -v

License

MIT
