carsonpo/haystackdb

HaystackDB

Minimal but performant Vector DB

Features

  • Binary embeddings by default (int8 reranking coming soon)
  • JSON filtering for queries
  • Scalable, distributed architecture for multi-replica deployments
  • Durable (write-ahead log) and persistent; data is memory-mapped for fast access in the client
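
To make the first two features concrete, here is a minimal sketch of binary quantization, Hamming-distance brute-force search, and exact-match metadata filtering. It is not HaystackDB's code or API: the names `binarize`, `hamming`, and `search`, the record layout, and the sign-based quantization rule are all assumptions chosen for illustration.

```python
import json

def binarize(vector):
    """Pack the sign bits of a float vector into one Python int (1 bit per dimension)."""
    bits = 0
    for value in vector:
        bits = (bits << 1) | (1 if value > 0 else 0)
    return bits

def hamming(a, b):
    """Number of differing bits between two packed vectors."""
    return bin(a ^ b).count("1")

def search(records, query_vector, filters=None, top_k=3):
    """Brute-force scan: keep records whose metadata matches every filter
    key exactly, then rank the survivors by Hamming distance to the query."""
    filters = filters or {}
    query_bits = binarize(query_vector)
    hits = []
    for record in records:
        meta = record["metadata"]
        if all(meta.get(key) == value for key, value in filters.items()):
            hits.append((hamming(query_bits, record["bits"]), record["id"]))
    hits.sort()
    return hits[:top_k]

# Hypothetical records: 4-dimensional vectors with JSON metadata.
records = [
    {"id": "a", "bits": binarize([0.9, -0.2, 0.4, -0.7]), "metadata": json.loads('{"lang": "en"}')},
    {"id": "b", "bits": binarize([0.8, 0.3, 0.5, -0.1]), "metadata": json.loads('{"lang": "de"}')},
    {"id": "c", "bits": binarize([-0.5, -0.6, 0.2, 0.9]), "metadata": json.loads('{"lang": "en"}')},
]

results = search(records, [1.0, -1.0, 1.0, -1.0], filters={"lang": "en"})
print(results)  # [(0, 'a'), (2, 'c')]
```

Each result is a `(distance, id)` pair. A real implementation would pack bits into fixed-width machine words and use a hardware popcount rather than Python integers; the sketch only shows the shape of the computation.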

Benchmarks

Measured on an M2 MacBook with 1024-dimensional, binary-quantized vectors.

FAISS uses a flat index, so it is a brute-force search over data held in memory. Haystack also searches by brute force, but reads its data from disk.

TL;DR: Haystack is roughly 10x faster (between about 9x and 13x in the runs below) despite its data being stored on disk.

Vectors      Haystack    FAISS
100,000      3.44 ms     29.67 ms
500,000      11.98 ms    146.50 ms
1,000,000    22.65 ms    293.91 ms
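
The speedup behind the "~10x" summary can be checked directly from the timings above. The short script below only divides the published numbers; the variable and function names are made up for this example.

```python
# Timings in milliseconds, copied from the benchmark figures above:
# vector count -> (Haystack, FAISS)
timings_ms = {
    100_000: (3.44, 29.67),
    500_000: (11.98, 146.50),
    1_000_000: (22.65, 293.91),
}

def speedup(haystack_ms, faiss_ms):
    """How many times faster Haystack is than FAISS for one run."""
    return faiss_ms / haystack_ms

speedups = {n: round(speedup(h, f), 1) for n, (h, f) in timings_ms.items()}

for n, ratio in speedups.items():
    print(f"{n:>9,} vectors: {ratio}x")
```

This works out to about 8.6x at 100,000 vectors, 12.2x at 500,000, and 13.0x at 1,000,000.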

Roadmap

  • Quickstart Guide
  • Quality benchmarks (in progress)
  • Int8 reranking
  • Richer queries beyond simple equality (done)
  • Full text search
  • Better insertion performance with batch B+Tree insertion (could probably be improved further, but good for now)
  • Point-in-time backups/rollback
    • Currently this is destructive (i.e., you cannot roll forward again after rolling back), so a nondestructive version is next on the to-do list.
  • Cursor-based pagination
  • Schema migrations
  • Vector k-means clustering with centroid similarity for improved search performance
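
The roadmap does not say how int8 reranking will be implemented. One common approach, sketched here purely as an assumption, is to take the shortlist produced by the binary search and re-score it with higher-precision int8 dot products. The function names, the `[-1, 1]` input range, and the symmetric scale of 127 are illustrative choices, not HaystackDB's design.

```python
def quantize_int8(vector):
    """Map floats assumed to lie in [-1, 1] onto integers in [-127, 127]."""
    return [max(-127, min(127, round(x * 127))) for x in vector]

def dot(a, b):
    """Integer dot product of two equal-length vectors."""
    return sum(x * y for x, y in zip(a, b))

def rerank(candidate_ids, int8_vectors, query_vector, top_k=2):
    """Re-score a shortlist (e.g. from a binary Hamming scan) with int8 dot products."""
    query_q = quantize_int8(query_vector)
    scored = [(dot(query_q, int8_vectors[cid]), cid) for cid in candidate_ids]
    scored.sort(reverse=True)  # highest similarity first
    return [cid for _, cid in scored[:top_k]]

# Hypothetical vectors: all components are positive, so their sign-bit
# (binary) codes are identical and a binary search alone cannot order them.
vectors = {
    "a": [0.9, 0.1, 0.4],
    "b": [0.2, 0.8, 0.3],
    "c": [0.7, 0.6, 0.1],
}
int8_vectors = {vid: quantize_int8(v) for vid, v in vectors.items()}

shortlist = ["a", "b", "c"]
ranked = rerank(shortlist, int8_vectors, [1.0, 0.0, 0.5])
print(ranked)  # ['a', 'c']
```

The point of the two-stage design is that the cheap binary pass narrows the candidate set, and the more expensive int8 pass only has to touch that shortlist.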
