A fully working Vector Database built from scratch in C++ with a web UI.
Implements HNSW, KD-Tree, and Brute Force search algorithms side-by-side, plus a RAG pipeline powered by a local LLM via Ollama.
Built as an educational project to show how production vector databases like Pinecone, Weaviate, and Chroma actually work under the hood.
| Feature | Description |
|---|---|
| 3 Search Algorithms | HNSW (production-grade), KD-Tree, Brute Force — run all three and compare speed |
| 3 Distance Metrics | Cosine similarity, Euclidean distance, Manhattan distance |
| 16D Demo Vectors | 20 pre-loaded semantic vectors across 4 categories (CS, Math, Food, Sports) |
| 2D PCA Scatter Plot | Live visualization of semantic space — watch clusters form |
| Real Document Embedding | Paste any text → Ollama embeds it with nomic-embed-text (768D) |
| RAG Pipeline | Ask questions about your documents → HNSW retrieves context → local LLM answers |
| Full REST API | CRUD endpoints: insert, delete, search, benchmark, hnsw-info |
```
Your Text
    │
    ▼
Ollama (nomic-embed-text)   ← converts text to a 768-dimensional vector
    │
    ▼
HNSW Index (C++)            ← indexes the vector in a multilayer graph
    │
    ▼
Semantic Search             ← finds nearest neighbors in vector space
    │
    ▼
Ollama (llama3.2)           ← reads retrieved chunks, generates an answer
    │
    ▼
Answer
```
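For reference, the embedding step is a single HTTP call to Ollama's `/api/embeddings` endpoint (the same endpoint the OllamaClient described below wraps). You can try it directly from PowerShell:

```
curl http://localhost:11434/api/embeddings `
  -H "Content-Type: application/json" `
  -d '{"model":"nomic-embed-text","prompt":"binary search tree"}'
```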
HNSW (Hierarchical Navigable Small World) is the same algorithm used by Pinecone, Weaviate, Chroma, and Milvus. It builds a multilayer graph where each layer is progressively sparser — searches start at the top layer and zoom in, achieving O(log N) complexity instead of O(N) for brute force.
You need 3 things installed on your Windows laptop:
- MSYS2 (provides the g++ compiler)
- Git
- Ollama (runs the local AI models)
- Go to https://www.msys2.org and download the installer
- Run the installer, keep the default path (`C:\msys64`)
- After install, open MSYS2 UCRT64 from the Start Menu (the orange icon)
- Run these commands inside the MSYS2 terminal:

  ```
  pacman -Syu
  ```

  (Close and reopen the terminal if it asks you to)

  ```
  pacman -S mingw-w64-ucrt-x86_64-gcc
  ```

- Add g++ to your Windows PATH:
  - Press `Win + R`, type `sysdm.cpl`, press Enter
  - Click Advanced → Environment Variables
  - Under System variables, find Path, click Edit
  - Click New and add: `C:\msys64\ucrt64\bin`
  - Click OK on all windows
- Open a new PowerShell and verify:

  ```
  g++ --version
  ```

  You should see something like `g++ (GCC) 15.x.x`
- Go to https://git-scm.com/download/win and download Git for Windows
- Run the installer with default settings
- Verify in PowerShell:

  ```
  git --version
  ```
- Go to https://ollama.com and click Download for Windows
- Run the installer
- Ollama starts automatically in the system tray
- Open PowerShell and pull the two required models:

  ```
  ollama pull nomic-embed-text
  ```

  (~274 MB — this is the embedding model)

  ```
  ollama pull llama3.2
  ```

  (~2 GB — this is the language model)

- Verify Ollama is running:

  ```
  ollama list
  ```

  You should see both models listed.
Hardware note: 8 GB RAM recommended for Ollama; the two models use ~3 GB total.
Open PowerShell and run:

```
git clone https://github.com/YOUR_USERNAME/VectorDB.git
cd VectorDB
```

(Replace YOUR_USERNAME with the actual GitHub username)
Inside the VectorDB folder, run:

```
g++ -std=c++17 -O2 main.cpp -o db -lws2_32
```

This produces db.exe and takes about 10–20 seconds.

Troubleshooting:
- `g++: command not found` → MSYS2 is not in PATH; redo Step 1, point 5
- `undefined reference to WSA...` → the `-lws2_32` flag is missing; add it
- Compile takes too long? Remove `-O2` for a faster compile (at the cost of a slower executable)
Terminal 1 — Start Ollama (if not already running):

```
ollama serve
```

(If Ollama is already in the system tray, skip this)

Terminal 2 — Start the VectorDB server:

```
./db
```

You should see:

```
=== VectorDB Engine ===
http://localhost:8080
20 demo vectors | 16 dims | HNSW+KD-Tree+BruteForce
Ollama: ONLINE
embed model: nomic-embed-text  gen model: llama3.2
```
Open your browser and go to:
http://localhost:8080
- Type any concept in the search box: `binary tree`, `sushi`, `basketball`, `calculus`
- Choose your algorithm: HNSW, KD-Tree, or Brute Force
- Choose a distance metric: Cosine, Euclidean, or Manhattan (see the sketch after this list)
- Click ⚡ SEARCH — results appear with distances, and the matching point glows on the scatter plot
- Click ▶ COMPARE ALL ALGOS to run all 3 algorithms and compare their speed
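Each metric is only a few lines of C++. A minimal sketch (not the exact code in main.cpp); note that cosine similarity is usually converted to a distance as 1 - similarity so that lower is always better:

```cpp
#include <cmath>
#include <vector>

// Euclidean (L2) distance: straight-line distance between the vectors.
float euclidean(const std::vector<float>& a, const std::vector<float>& b) {
    float s = 0;
    for (size_t i = 0; i < a.size(); ++i) { float d = a[i] - b[i]; s += d * d; }
    return std::sqrt(s);
}

// Manhattan (L1) distance: sum of per-coordinate absolute differences.
float manhattan(const std::vector<float>& a, const std::vector<float>& b) {
    float s = 0;
    for (size_t i = 0; i < a.size(); ++i) s += std::fabs(a[i] - b[i]);
    return s;
}

// Cosine distance: 1 - cos(angle between a and b); ignores vector length.
float cosineDistance(const std::vector<float>& a, const std::vector<float>& b) {
    float dot = 0, na = 0, nb = 0;
    for (size_t i = 0; i < a.size(); ++i) {
        dot += a[i] * b[i]; na += a[i] * a[i]; nb += b[i] * b[i];
    }
    return 1.0f - dot / (std::sqrt(na) * std::sqrt(nb));
}
```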
The scatter plot shows all 20 vectors projected to 2D using PCA. Notice how the 4 semantic categories (CS, Math, Food, Sports) form distinct clusters — this is what "semantic similarity" looks like visually.
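If you are curious how such a projection can be computed, here is a rough power-iteration PCA sketch. It illustrates the technique; main.cpp's actual implementation may differ.

```cpp
#include <cmath>
#include <utility>
#include <vector>

using Vec = std::vector<double>;

// One principal component via power iteration on the (implicit) covariance
// matrix; if prev is given, stay orthogonal to it (deflation for the 2nd PC).
Vec principalComponent(const std::vector<Vec>& centered, const Vec* prev) {
    size_t d = centered[0].size();
    Vec v(d);
    for (size_t i = 0; i < d; ++i) v[i] = 1.0 + 0.1 * i; // arbitrary non-degenerate start
    for (int it = 0; it < 200; ++it) {
        if (prev) {                       // remove the component along the first PC
            double p = 0;
            for (size_t i = 0; i < d; ++i) p += v[i] * (*prev)[i];
            for (size_t i = 0; i < d; ++i) v[i] -= p * (*prev)[i];
        }
        Vec w(d, 0.0);                    // w = (X^T X) v without forming X^T X
        for (const Vec& x : centered) {
            double dot = 0;
            for (size_t i = 0; i < d; ++i) dot += x[i] * v[i];
            for (size_t i = 0; i < d; ++i) w[i] += dot * x[i];
        }
        double n = 0;
        for (double c : w) n += c * c;
        n = std::sqrt(n);
        for (size_t i = 0; i < d; ++i) v[i] = w[i] / n;  // normalize
    }
    return v;
}

// Project every vector onto the top-2 principal components -> (x, y) points.
std::vector<std::pair<double, double>> pca2d(std::vector<Vec> X) {
    size_t d = X[0].size();
    Vec mean(d, 0.0);
    for (const Vec& x : X)
        for (size_t i = 0; i < d; ++i) mean[i] += x[i] / X.size();
    for (Vec& x : X)
        for (size_t i = 0; i < d; ++i) x[i] -= mean[i];  // center the data
    Vec e1 = principalComponent(X, nullptr);
    Vec e2 = principalComponent(X, &e1);
    std::vector<std::pair<double, double>> out;
    for (const Vec& x : X) {
        double px = 0, py = 0;
        for (size_t i = 0; i < d; ++i) { px += x[i] * e1[i]; py += x[i] * e2[i]; }
        out.push_back({px, py});
    }
    return out;
}
```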
This uses Ollama to generate real 768-dimensional embeddings from any text.
- Type a title (e.g., `Operating Systems Notes`)
- Paste any text — lecture notes, textbook paragraphs, Wikipedia articles
- Click ⚡ EMBED & INSERT
- Long documents are automatically split into overlapping 250-word chunks (see the sketch below)
- Each chunk gets its own embedding and is stored in a separate HNSW index
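A minimal sketch of that chunking step, assuming a 50-word overlap (only the 250-word chunk size comes from the description above; the actual overlap in main.cpp may differ):

```cpp
#include <algorithm>
#include <sstream>
#include <string>
#include <vector>

// Split text into overlapping word-count chunks (requires overlap < chunkWords).
// NOTE: the 50-word overlap is an assumed value, not taken from main.cpp.
std::vector<std::string> chunkText(const std::string& text,
                                   size_t chunkWords = 250, size_t overlap = 50) {
    std::istringstream in(text);
    std::vector<std::string> words, chunks;
    for (std::string w; in >> w; ) words.push_back(w);
    for (size_t start = 0; start < words.size(); start += chunkWords - overlap) {
        size_t end = std::min(words.size(), start + chunkWords);
        std::string chunk;
        for (size_t i = start; i < end; ++i) {
            if (!chunk.empty()) chunk += ' ';
            chunk += words[i];
        }
        chunks.push_back(chunk);        // each chunk gets its own embedding later
        if (end == words.size()) break; // last chunk reached the end of the text
    }
    return chunks;
}
```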
- Make sure you have inserted some documents in Tab 2 first
- Type a question about your documents
- Click 🤖 ASK AI
What happens behind the scenes:
1. Your question → embedded with nomic-embed-text (768D vector)
2. HNSW search → finds 3 most semantically similar chunks
3. Retrieved chunks → sent as context to llama3.2
4. llama3.2 → generates an answer based only on your documents
The answer streams in with a typewriter effect. Click the context chips to see exactly which chunks the AI used.
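In code, the whole retrieve-then-generate loop is short. A hypothetical sketch: `embedText`, `searchChunks`, and `generateText` stand in for whatever helper functions main.cpp actually uses around its Ollama client.

```cpp
#include <string>
#include <vector>

// Hypothetical helpers -- stand-ins for the real functions in main.cpp,
// which wrap Ollama's /api/embeddings and /api/generate endpoints.
std::vector<float> embedText(const std::string& text);                     // nomic-embed-text, 768D
std::vector<std::string> searchChunks(const std::vector<float>& q, int k); // HNSW k-NN over chunks
std::string generateText(const std::string& prompt);                       // llama3.2

std::string askRag(const std::string& question, int k = 3) {
    std::vector<float> q = embedText(question);          // 1. embed the question
    std::string prompt = "Answer using only this context:\n";
    for (const std::string& chunk : searchChunks(q, k))  // 2. retrieve top-k chunks
        prompt += chunk + "\n---\n";
    prompt += "Question: " + question;                   // 3. build a grounded prompt
    return generateText(prompt);                         // 4. generate the answer
}
```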
The server exposes a full REST API at http://localhost:8080.
| Method | Endpoint | Description |
|---|---|---|
| GET | `/search?v=f1,f2,...&k=5&metric=cosine&algo=hnsw` | K-NN search |
| POST | `/insert` | Insert a demo vector |
| DELETE | `/delete/:id` | Delete by ID |
| GET | `/items` | List all demo vectors |
| GET | `/benchmark?v=...&k=5&metric=cosine` | Compare all 3 algorithms |
| GET | `/hnsw-info` | HNSW graph structure and layer stats |
| GET | `/stats` | Database statistics |
| Method | Endpoint | Body | Description |
|---|---|---|---|
| POST | `/doc/insert` | `{"title":"...","text":"..."}` | Embed and store document |
| GET | `/doc/list` | — | List all stored documents |
| DELETE | `/doc/delete/:id` | — | Delete document chunk |
| POST | `/doc/ask` | `{"question":"...","k":3}` | RAG: retrieve + generate |
| GET | `/status` | — | Ollama status and model info |
```
curl "http://localhost:8080/search?v=0.9,0.8,0.7,0.6,0.1,0.1,0.1,0.1,0.1,0.1,0.1,0.1,0.1,0.1,0.1,0.1&k=3&metric=cosine&algo=hnsw"
```

```
curl -X POST http://localhost:8080/doc/ask `
  -H "Content-Type: application/json" `
  -d '{"question":"What is dynamic programming?","k":3}'
```
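The `/benchmark` endpoint accepts the same query-vector format, so the COMPARE ALL ALGOS comparison can also be run from the command line:

```
curl "http://localhost:8080/benchmark?v=0.9,0.8,0.7,0.6,0.1,0.1,0.1,0.1,0.1,0.1,0.1,0.1,0.1,0.1,0.1,0.1&k=5&metric=cosine"
```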
```
VectorDB/
├── main.cpp      ← C++ backend (HNSW, KD-Tree, BruteForce, REST API, RAG)
├── httplib.h     ← Single-header HTTP server library (cpp-httplib)
├── index.html    ← Frontend (PCA scatter plot, chat UI, benchmark)
└── README.md     ← This file
```
| Component | Search Complexity | Description |
|---|---|---|
| `BruteForce` | O(N·d) | Exact, baseline |
| `KDTree` | O(log N) | Exact, axis-aligned partitioning |
| `HNSW` | O(log N) | Approximate, multilayer small-world graph |
| `VectorDB` | — | Unified interface over all 3 (16D demo vectors) |
| `DocumentDB` | — | HNSW-only index for real Ollama embeddings (768D) |
| `OllamaClient` | — | HTTP client → `/api/embeddings` + `/api/generate` |
Nodes are inserted into a multilayer graph. Each node is randomly assigned a maximum layer. Layer 0 contains all nodes with many connections; higher layers have exponentially fewer nodes with longer-range connections.
Insert: Start at the top layer, greedily find the nearest node, drop a layer, repeat. At each layer from your assigned max down to 0, run a beam search (ef_construction=200) and connect to the M nearest neighbors bidirectionally.
Search: Same greedy descent from top layer. At layer 0, expand to ef nearest candidates using a priority queue.
Why it's fast: The upper layers act like a highway — you quickly get to the right neighborhood, then zoom in at layer 0.
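A condensed C++ sketch of that search path (simplified relative to main.cpp: squared Euclidean distance only, and a plain best-first expansion at layer 0):

```cpp
#include <algorithm>
#include <queue>
#include <unordered_set>
#include <vector>

struct Node {
    std::vector<float> vec;
    std::vector<std::vector<int>> neighbors;  // neighbors[layer] -> node ids
};

// Squared Euclidean distance: same ranking as Euclidean, no sqrt needed.
float dist(const std::vector<float>& a, const std::vector<float>& b) {
    float s = 0;
    for (size_t i = 0; i < a.size(); ++i) { float d = a[i] - b[i]; s += d * d; }
    return s;
}

// Greedy descent within one layer: hop to the closest neighbor until stuck.
int greedyStep(const std::vector<Node>& g, const std::vector<float>& q,
               int cur, int layer) {
    bool improved = true;
    while (improved) {
        improved = false;
        for (int nb : g[cur].neighbors[layer])
            if (dist(g[nb].vec, q) < dist(g[cur].vec, q)) { cur = nb; improved = true; }
    }
    return cur;
}

std::vector<int> hnswSearch(const std::vector<Node>& g, int entry, int topLayer,
                            const std::vector<float>& q, int k, int ef) {
    int cur = entry;
    for (int l = topLayer; l >= 1; --l)  // the "highway": coarse layers first
        cur = greedyStep(g, q, cur, l);

    // Layer 0: expand up to ef candidates best-first from a min-heap.
    auto worse = [&](int a, int b) { return dist(g[a].vec, q) > dist(g[b].vec, q); };
    std::priority_queue<int, std::vector<int>, decltype(worse)> frontier(worse);
    std::unordered_set<int> seen{cur};
    frontier.push(cur);
    std::vector<int> visited;
    while (!frontier.empty() && (int)visited.size() < ef) {
        int c = frontier.top(); frontier.pop();
        visited.push_back(c);
        for (int nb : g[c].neighbors[0])
            if (seen.insert(nb).second) frontier.push(nb);
    }
    std::sort(visited.begin(), visited.end(),
              [&](int a, int b) { return dist(g[a].vec, q) < dist(g[b].vec, q); });
    if ((int)visited.size() > k) visited.resize(k);
    return visited;  // k approximate nearest neighbors
}
```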
Binary space partitioning. Each node splits space along one dimension (cycling through all dimensions). Search prunes entire subtrees when the closest possible point in that subtree can't beat the current best — the "ball within hyperslab" check.
Weakness: Degrades with high dimensions (curse of dimensionality). Works well for ≤20D, becomes close to brute force at 768D.
KD-Tree pruning relies on axis-aligned distance bounds. In high dimensions, almost all the space is near the boundary of the hypersphere — no subtrees get pruned. HNSW's graph-based approach doesn't have this problem.
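Here is that pruning check in miniature, a sketch rather than the exact main.cpp code. The recursion only crosses the splitting plane when the hypersphere around the query (radius = current best distance) reaches the far side:

```cpp
#include <cmath>
#include <limits>
#include <vector>

struct KDNode {
    std::vector<float> point;
    int axis;                  // split dimension (cycles with depth: depth % d)
    KDNode* left = nullptr;
    KDNode* right = nullptr;
};

float sqDist(const std::vector<float>& a, const std::vector<float>& b) {
    float s = 0;
    for (size_t i = 0; i < a.size(); ++i) { float d = a[i] - b[i]; s += d * d; }
    return s;
}

void nearest(const KDNode* n, const std::vector<float>& q,
             const KDNode*& best, float& bestD) {
    if (!n) return;
    float d = sqDist(n->point, q);
    if (d < bestD) { bestD = d; best = n; }
    float delta = q[n->axis] - n->point[n->axis];  // signed distance to split plane
    const KDNode* nearSide = delta < 0 ? n->left : n->right;
    const KDNode* farSide  = delta < 0 ? n->right : n->left;
    nearest(nearSide, q, best, bestD);
    if (delta * delta < bestD)             // prune: skip the far subtree unless
        nearest(farSide, q, best, bestD);  // the query ball crosses the plane
}

// Entry point: find the nearest neighbor of q in the tree rooted at root.
const KDNode* nearestNeighbor(const KDNode* root, const std::vector<float>& q) {
    const KDNode* best = nullptr;
    float bestD = std::numeric_limits<float>::max();
    nearest(root, q, best, bestD);
    return best;
}
```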
| Problem | Fix |
|---|---|
| `Ollama: OFFLINE` in header | Run `ollama serve` in a terminal |
| Embedding takes forever | Ollama is downloading the model on first use; wait ~2 min |
| `g++: command not found` | Add `C:\msys64\ucrt64\bin` to the Windows PATH |
| Port 8080 already in use | Kill the process: `netstat -ano \| findstr 8080` then `taskkill /PID <pid> /F` |
| LLM answer is slow | Normal — llama3.2 takes 10–30 s on a laptop CPU. Use `llama3.2:1b` for faster answers |
If llama3.2 is too slow on your laptop, switch to the 1B model:

```
ollama pull llama3.2:1b
```

Then edit the line in main.cpp where genModel is set:

```cpp
std::string genModel = "llama3.2:1b"; // change this
```

Recompile and restart.
MIT — use this however you want.
