VecDB - A Mini Vector Database

A lightweight vector database built in Go, inspired by Pinecone. Features HNSW-based approximate nearest neighbor search, metadata filtering, disk persistence, and OpenAI embedding integration.

Features

HNSW Index - Approximate nearest neighbor search, roughly O(log n) per query
Multiple Distance Metrics - Cosine similarity (default), Euclidean, Dot product
Metadata Filtering - MongoDB-style query operators
Disk Persistence - Binary format with automatic index serialization
OpenAI Integration - Text-to-embedding with text-embedding-3-small
REST API - Full CRUD and query endpoints
CLI - Command-line interface for all operations
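
As an illustration of the three distance metrics listed above, here is a minimal Go sketch. The function names and the use of float64 slices are assumptions made for this example; they are not taken from the VecDB source.

```go
package main

import (
	"fmt"
	"math"
)

// Dot returns the dot product of a and b. Both slices must have the same length.
func Dot(a, b []float64) float64 {
	var sum float64
	for i := range a {
		sum += a[i] * b[i]
	}
	return sum
}

// Cosine returns the cosine similarity of a and b, or 0 if either has zero norm.
func Cosine(a, b []float64) float64 {
	na, nb := math.Sqrt(Dot(a, a)), math.Sqrt(Dot(b, b))
	if na == 0 || nb == 0 {
		return 0
	}
	return Dot(a, b) / (na * nb)
}

// Euclidean returns the L2 distance between a and b.
func Euclidean(a, b []float64) float64 {
	var sum float64
	for i := range a {
		d := a[i] - b[i]
		sum += d * d
	}
	return math.Sqrt(sum)
}

func main() {
	a := []float64{1, 0}
	b := []float64{0, 1}
	fmt.Println(Dot(a, b), Cosine(a, b), Euclidean(a, b))
}
```

Cosine similarity and dot product are similarity scores (higher is closer), while Euclidean is a distance (lower is closer), so a ranking routine has to sort in the matching direction.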

Quick Start

1. Setup

# Clone and build
git clone <repo-url>
cd Go-Vector-Database
go build -o vecdb ./cmd/vecdb

# Configure OpenAI API key (required for text embeddings)
echo "OPENAI_API_KEY=your-api-key-here" > .env

2. Start the Server

./vecdb serve
# Server starts on http://localhost:8080

3. Insert Vectors

# Insert with text (auto-embeds via OpenAI)
curl -X POST http://localhost:8080/vectors/text \
  -H "Content-Type: application/json" \
  -d '{
    "id": "doc1",
    "text": "Machine learning is a subset of artificial intelligence",
    "metadata": {"category": "tech", "author": "john"}
  }'

# Insert with raw vector values
curl -X POST http://localhost:8080/vectors \
  -H "Content-Type: application/json" \
  -d '{
    "vectors": [
      {"id": "vec1", "values": [0.1, 0.2, ...], "metadata": {"type": "embedding"}}
    ]
  }'

4. Query Similar Vectors

# Query with text
curl -X POST http://localhost:8080/query/text \
  -H "Content-Type: application/json" \
  -d '{
    "text": "AI and deep learning",
    "top_k": 5,
    "include_metadata": true
  }'

# Query with filters
curl -X POST http://localhost:8080/query/text \
  -H "Content-Type: application/json" \
  -d '{
    "text": "programming languages",
    "top_k": 10,
    "filter": {"category": "tech"},
    "include_metadata": true
  }'
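
The same filtered query can be sent from Go. The sketch below is a hypothetical client, not part of VecDB: the TextQuery type, EncodeQuery, and QueryText are names invented for this example, and only the request fields shown in the curl calls above are used. The response schema is not documented here, so the body is returned undecoded.

```go
package main

import (
	"bytes"
	"encoding/json"
	"fmt"
	"io"
	"net/http"
	"time"
)

// TextQuery mirrors the JSON body used in the curl examples above.
type TextQuery struct {
	Text            string                 `json:"text"`
	TopK            int                    `json:"top_k"`
	Filter          map[string]interface{} `json:"filter,omitempty"`
	IncludeMetadata bool                   `json:"include_metadata"`
}

// EncodeQuery serializes q to the JSON request body.
func EncodeQuery(q TextQuery) ([]byte, error) {
	return json.Marshal(q)
}

// QueryText posts q to baseURL+"/query/text" and returns the raw response body.
func QueryText(client *http.Client, baseURL string, q TextQuery) ([]byte, error) {
	body, err := EncodeQuery(q)
	if err != nil {
		return nil, err
	}
	resp, err := client.Post(baseURL+"/query/text", "application/json", bytes.NewReader(body))
	if err != nil {
		return nil, err
	}
	defer resp.Body.Close()
	data, err := io.ReadAll(resp.Body)
	if err != nil {
		return nil, err
	}
	if resp.StatusCode < 200 || resp.StatusCode > 299 {
		return nil, fmt.Errorf("query failed: %s: %s", resp.Status, data)
	}
	return data, nil
}

func main() {
	client := &http.Client{Timeout: 10 * time.Second}
	out, err := QueryText(client, "http://localhost:8080", TextQuery{
		Text:            "programming languages",
		TopK:            10,
		Filter:          map[string]interface{}{"category": "tech"},
		IncludeMetadata: true,
	})
	if err != nil {
		fmt.Println("error:", err)
		return
	}
	fmt.Println(string(out))
}
```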

5. Save to Disk

curl -X POST http://localhost:8080/save

API Reference

Endpoint	Method	Description
`/health`	GET	Health check and stats
`/vectors`	POST	Upsert vectors (batch)
`/vectors/text`	POST	Upsert single vector with text
`/vectors/{id}`	GET	Get vector by ID
`/vectors/{id}`	DELETE	Delete vector
`/query`	POST	Query with vector values
`/query/text`	POST	Query with text
`/stats`	GET	Database statistics
`/save`	POST	Persist to disk
`/load`	POST	Load from disk
`/clear`	POST	Clear all vectors

CLI Commands

# Start HTTP server
./vecdb serve [-port 8080] [-host 0.0.0.0]

# Insert vector with text
./vecdb insert -id doc1 -text "Hello world" -metadata '{"key": "value"}'

# Insert vector with values
./vecdb insert -id vec1 -values "0.1,0.2,0.3,..."

# Query with text
./vecdb query -text "search query" -k 5

# Query with values
./vecdb query -values "0.1,0.2,0.3,..." -k 5

# Show statistics
./vecdb stats

# Show version
./vecdb version

Metadata Filtering

Supports MongoDB-style operators:

// Comparison
{"field": {"$eq": "value"}}
{"field": {"$ne": "value"}}
{"field": {"$gt": 10}}
{"field": {"$gte": 10}}
{"field": {"$lt": 20}}
{"field": {"$lte": 20}}

// Array
{"field": {"$in": ["a", "b", "c"]}}
{"field": {"$nin": ["x", "y"]}}

// Logical
{"$and": [{"field1": "a"}, {"field2": "b"}]}
{"$or": [{"field1": "a"}, {"field2": "b"}]}

// String
{"field": {"$contains": "substring"}}
{"field": {"$startswith": "prefix"}}
{"field": {"$endswith": "suffix"}}

// Existence
{"field": {"$exists": true}}
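
To make the operator semantics concrete, here is a minimal Go sketch of a filter evaluator covering the operators listed above. It is not the VecDB implementation: the function names are invented for this example, values are assumed to be JSON-decoded (so numbers are float64), and the choice that `$ne` and `$nin` match a missing field is an assumption borrowed from MongoDB's behavior.

```go
package main

import (
	"encoding/json"
	"fmt"
	"reflect"
	"strings"
)

// Match reports whether meta satisfies every condition in filter.
// Values are assumed to come from encoding/json, so numbers are float64.
func Match(meta, filter map[string]interface{}) bool {
	for key, cond := range filter {
		var ok bool
		if key == "$and" || key == "$or" {
			subs, _ := cond.([]interface{})
			ok = matchLogical(key, meta, subs)
		} else {
			val, present := meta[key]
			ok = matchField(val, present, cond)
		}
		if !ok {
			return false
		}
	}
	return true
}

// matchLogical evaluates a list of sub-filters under $and or $or.
func matchLogical(op string, meta map[string]interface{}, subs []interface{}) bool {
	for _, s := range subs {
		sub, isMap := s.(map[string]interface{})
		ok := isMap && Match(meta, sub)
		if op == "$and" && !ok {
			return false
		}
		if op == "$or" && ok {
			return true
		}
	}
	return op == "$and"
}

// matchField handles one field: a bare value is an implicit $eq,
// and an object is a set of operators that must all hold.
func matchField(val interface{}, present bool, cond interface{}) bool {
	ops, isOps := cond.(map[string]interface{})
	if !isOps {
		ops = map[string]interface{}{"$eq": cond}
	}
	for op, arg := range ops {
		if !applyOp(op, val, present, arg) {
			return false
		}
	}
	return true
}

// applyOp evaluates a single operator against a field value.
func applyOp(op string, val interface{}, present bool, arg interface{}) bool {
	switch op {
	case "$exists":
		want, _ := arg.(bool)
		return present == want
	case "$eq":
		return present && reflect.DeepEqual(val, arg)
	case "$ne": // assumption: a missing field counts as "not equal"
		return !present || !reflect.DeepEqual(val, arg)
	case "$in", "$nin":
		list, _ := arg.([]interface{})
		found := false
		for _, item := range list {
			found = found || (present && reflect.DeepEqual(val, item))
		}
		return found == (op == "$in")
	case "$gt", "$gte", "$lt", "$lte":
		a, okA := val.(float64)
		b, okB := arg.(float64)
		return okA && okB && map[string]bool{"$gt": a > b, "$gte": a >= b, "$lt": a < b, "$lte": a <= b}[op]
	case "$contains", "$startswith", "$endswith":
		s, okS := val.(string)
		sub, okSub := arg.(string)
		return okS && okSub && map[string]bool{"$contains": strings.Contains(s, sub), "$startswith": strings.HasPrefix(s, sub), "$endswith": strings.HasSuffix(s, sub)}[op]
	}
	return false // unknown operators never match
}

func main() {
	var filter map[string]interface{}
	raw := `{"$and": [{"category": "tech"}, {"year": {"$gte": 2020}}]}`
	if err := json.Unmarshal([]byte(raw), &filter); err != nil {
		panic(err)
	}
	fmt.Println(Match(map[string]interface{}{"category": "tech", "year": 2021.0}, filter))
}
```

In this sketch, multiple keys in one filter object are combined with an implicit AND, and comparison operators fail closed when the field is missing or has the wrong type.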

Configuration

Environment variables (or .env file):

Variable	Default	Description
`OPENAI_API_KEY`	-	OpenAI API key for embeddings
`OPENAI_MODEL`	text-embedding-3-small	Embedding model
`SERVER_PORT`	8080	HTTP server port
`SERVER_HOST`	0.0.0.0	HTTP server host
`DATA_DIR`	./data	Data directory
`DATABASE_FILE`	vecdb.dat	Database filename
`HNSW_M`	16	HNSW connections per node
`HNSW_EF_CONSTRUCTION`	200	HNSW construction parameter
`HNSW_EF_SEARCH`	50	HNSW search parameter
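
For illustration, the table above could be read into a struct as follows. This is a hedged sketch, not the code in pkg/config: the Config type, its field names, and the helper functions are hypothetical, and the sketch reads process environment variables only (it does not parse a .env file).

```go
package main

import (
	"fmt"
	"os"
	"strconv"
)

// Config holds the settings from the table above (hypothetical field names).
type Config struct {
	OpenAIAPIKey   string
	OpenAIModel    string
	ServerPort     int
	ServerHost     string
	DataDir        string
	DatabaseFile   string
	HNSWM          int
	EfConstruction int
	EfSearch       int
}

// getenv returns the value of key, or def when it is unset or empty.
func getenv(key, def string) string {
	if v, ok := os.LookupEnv(key); ok && v != "" {
		return v
	}
	return def
}

// getenvInt is like getenv but parses the value as an integer.
func getenvInt(key string, def int) (int, error) {
	v, ok := os.LookupEnv(key)
	if !ok || v == "" {
		return def, nil
	}
	n, err := strconv.Atoi(v)
	if err != nil {
		return 0, fmt.Errorf("%s: invalid integer %q", key, v)
	}
	return n, nil
}

// Load builds a Config from the environment, applying the documented defaults.
func Load() (Config, error) {
	cfg := Config{
		OpenAIAPIKey: os.Getenv("OPENAI_API_KEY"),
		OpenAIModel:  getenv("OPENAI_MODEL", "text-embedding-3-small"),
		ServerHost:   getenv("SERVER_HOST", "0.0.0.0"),
		DataDir:      getenv("DATA_DIR", "./data"),
		DatabaseFile: getenv("DATABASE_FILE", "vecdb.dat"),
	}
	ints := []struct {
		key string
		def int
		dst *int
	}{
		{"SERVER_PORT", 8080, &cfg.ServerPort},
		{"HNSW_M", 16, &cfg.HNSWM},
		{"HNSW_EF_CONSTRUCTION", 200, &cfg.EfConstruction},
		{"HNSW_EF_SEARCH", 50, &cfg.EfSearch},
	}
	for _, f := range ints {
		n, err := getenvInt(f.key, f.def)
		if err != nil {
			return Config{}, err
		}
		*f.dst = n
	}
	return cfg, nil
}

func main() {
	cfg, err := Load()
	if err != nil {
		fmt.Println("config error:", err)
		return
	}
	// The API key is deliberately not printed.
	fmt.Printf("listen on %s:%d, model %s, M=%d\n", cfg.ServerHost, cfg.ServerPort, cfg.OpenAIModel, cfg.HNSWM)
}
```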

Project Structure

.
├── cmd/vecdb/main.go        # CLI entry point
├── pkg/
│   ├── api/                 # REST API server
│   ├── config/              # Configuration
│   ├── database/            # Database orchestrator
│   ├── embedding/           # OpenAI client
│   ├── index/               # HNSW algorithm
│   └── storage/             # Vector storage & persistence
├── test/                    # Tests
├── go.mod
└── .env

How It Works

Embedding: Text is converted to 1536-dimensional vectors via OpenAI's embedding API
Indexing: Vectors are inserted into an HNSW (Hierarchical Navigable Small World) graph
Search: Queries traverse the graph to find approximate nearest neighbors in roughly O(log n) time
Filtering: Metadata filters are applied post-search to refine results
Persistence: Binary format stores vectors + serialized HNSW graph
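
One consequence of post-search filtering is worth illustrating: if the search step returns only top_k candidates and some are then filtered out, fewer than top_k results remain. The Go sketch below shows this under stated assumptions. It is not VecDB's code: the Item type, the function names, and the `fetch` parameter are hypothetical, a brute-force dot-product sort stands in for the HNSW traversal, and the filter is reduced to exact string matches. Whether VecDB over-fetches candidates is not stated in this README.

```go
package main

import (
	"fmt"
	"sort"
)

// Item is a hypothetical stored vector with flat string metadata.
type Item struct {
	ID       string
	Values   []float64
	Metadata map[string]string
}

func dot(a, b []float64) float64 {
	var sum float64
	for i := range a {
		sum += a[i] * b[i]
	}
	return sum
}

// nearest returns the n items with the highest dot-product score.
// A brute-force sort stands in for the HNSW graph traversal here.
func nearest(items []Item, query []float64, n int) []Item {
	ranked := append([]Item(nil), items...)
	sort.SliceStable(ranked, func(i, j int) bool {
		return dot(ranked[i].Values, query) > dot(ranked[j].Values, query)
	})
	if n >= 0 && n < len(ranked) {
		ranked = ranked[:n]
	}
	return ranked
}

// SearchFiltered fetches `fetch` candidates, then keeps up to topK of them
// whose metadata equals the filter on every key (post-search filtering).
func SearchFiltered(items []Item, query []float64, topK, fetch int, filter map[string]string) []string {
	var ids []string
	for _, it := range nearest(items, query, fetch) {
		if len(ids) >= topK {
			break
		}
		keep := true
		for k, want := range filter {
			if it.Metadata[k] != want {
				keep = false
			}
		}
		if keep {
			ids = append(ids, it.ID)
		}
	}
	return ids
}

func main() {
	items := []Item{
		{ID: "a", Values: []float64{1, 0}, Metadata: map[string]string{"category": "tech"}},
		{ID: "b", Values: []float64{0.9, 0.1}, Metadata: map[string]string{"category": "science"}},
		{ID: "c", Values: []float64{0.5, 0.5}, Metadata: map[string]string{"category": "tech"}},
	}
	query := []float64{1, 0}
	filter := map[string]string{"category": "tech"}

	// Fetching exactly topK candidates loses a slot to the filtered-out "b".
	fmt.Println(SearchFiltered(items, query, 2, 2, filter))
	// Fetching more candidates than topK lets "c" fill the gap.
	fmt.Println(SearchFiltered(items, query, 2, 3, filter))
}
```

Fetching more candidates than top_k trades extra search work for a better chance of filling the result list when the filter is selective.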

Running Tests

go test ./... -v

License

MIT
