Vectorstore Databases overview

Praveen Kumar Anwla edited this page Feb 21, 2024 · 1 revision

Q1: Please share a list of different vector databases.

Ans: 1. Weaviate:

  1. Extreme scalability (hundreds of billions of vectors), which makes it suitable for large organizations.

2. Qdrant:

  1. Written in Rust under the hood.
  2. Hybrid search was not yet available when this was written; newer Qdrant releases support it through sparse vectors.
  3. Lower resource utilization.

3. Milvus/Zilliz:

  1. Built for large volumes of streaming data. For static or small datasets, Qdrant or Weaviate would likely be cheaper and faster.

Hosting options: serverless/embedded, self-hosted (client-server), and a cloud-native distributed SaaS solution.

4. LanceDB:

  1. Uses Lance, a new columnar format that is faster than Parquet and designed for very efficient scans.

5. Vespa:

  1. Its application layer is mostly written in Java, while the backend and the indexing layer are written in C++. This mix makes it harder to maintain over time and less developer-friendly; most newer databases are written in a single language such as Rust or Go.
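
Hybrid search, mentioned under Qdrant above, combines a keyword (sparse) ranking with a vector (dense) ranking of the same documents. One common way to merge the two rankings is reciprocal rank fusion (RRF). The sketch below is a generic illustration and not the API of any database listed here; the function name, the document ids, and the constant k=60 are assumptions chosen for the example.

```python
from collections import defaultdict

def reciprocal_rank_fusion(rankings, k=60):
    """Fuse several ranked lists of document ids into one ranking.

    Each document scores sum(1 / (k + rank)) over the lists it appears in,
    with rank starting at 1. k=60 is a commonly used default (an assumption here).
    """
    scores = defaultdict(float)
    for ranking in rankings:
        for rank, doc_id in enumerate(ranking, start=1):
            scores[doc_id] += 1.0 / (k + rank)
    # Highest fused score first; ties are broken by document id for determinism.
    return sorted(scores, key=lambda d: (-scores[d], d))

# Hypothetical results from a keyword search and a vector search.
keyword_hits = ["doc_a", "doc_b", "doc_c"]
vector_hits = ["doc_c", "doc_a", "doc_d"]

fused = reciprocal_rank_fusion([keyword_hits, vector_hits])
print(fused)  # ['doc_a', 'doc_c', 'doc_b', 'doc_d']
```

Documents that rank well in both lists (doc_a, doc_c) rise to the top, while documents found by only one method still appear further down.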

You can compare vector databases for your use case on the following parameters:

  1. Hosting options
  2. Indexing methods
  3. Programming language under the hood
  4. Scalability
  5. Resource utilization
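
One way to turn these five parameters into a decision is a simple weighted score. The sketch below is only an illustration: the weights, the 1 to 5 ratings, and the candidate names db_a and db_b are hypothetical placeholders, not measurements of any real product.

```python
# The five comparison parameters listed above.
PARAMETERS = ("hosting", "indexing", "language", "scalability", "resources")

def weighted_score(ratings, weights):
    """Return the weighted average of per-parameter ratings (e.g. 1 to 5)."""
    total_weight = sum(weights[p] for p in PARAMETERS)
    return sum(ratings[p] * weights[p] for p in PARAMETERS) / total_weight

# Hypothetical weights: this imaginary project cares most about scalability.
weights = {"hosting": 2, "indexing": 1, "language": 1, "scalability": 4, "resources": 2}

# Hypothetical ratings for two placeholder candidates, not real products.
candidates = {
    "db_a": {"hosting": 5, "indexing": 3, "language": 4, "scalability": 2, "resources": 5},
    "db_b": {"hosting": 3, "indexing": 4, "language": 3, "scalability": 5, "resources": 3},
}

ranked = sorted(
    candidates,
    key=lambda name: weighted_score(candidates[name], weights),
    reverse=True,
)
print(ranked)  # ['db_b', 'db_a']
```

Changing the weights to match your own priorities (for example, raising "resources" for a small deployment) can reverse the ranking, which is the point of scoring per use case.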

Q2: Please share a comparison of different vector databases based on the parameters suggested above.

Ans: Let's compare the vector databases Chroma, Milvus, Weaviate, and FAISS based on the specified parameters for data science projects:

  1. Chroma:

    • Hosting Options: Chroma can be self-hosted or used as a managed service.
    • Indexing Methods: It uses Approximate Nearest Neighbor (ANN) search, backed by an HNSW index, for efficient similarity searches.
    • Programming Language Under the Hood: Chroma is implemented in Python.
    • Scalability: It offers good scalability, especially for large-scale vector datasets.
    • Resource Utilization: Chroma is resource-efficient and optimized for handling high-dimensional data.
  2. Milvus:

    • Hosting Options: Milvus provides both self-hosted and managed options.
    • Indexing Methods: Milvus supports several ANN index types, including inverted-file (IVF) indexes, quantization-based indexes, and graph-based indexes such as HNSW.
    • Programming Language Under the Hood: Milvus is written mainly in Go and C++, with SDKs available for Python, Java, and other languages.
    • Scalability: Milvus is highly scalable and suitable for large-scale data.
    • Resource Utilization: It efficiently manages resources, making it suitable for resource-intensive tasks.
  3. Weaviate:

    • Hosting Options: Weaviate can be self-hosted or used as a managed service.
    • Indexing Methods: It uses graph-based indexing (HNSW) for efficient similarity searches.
    • Programming Language Under the Hood: Weaviate is written in Go.
    • Scalability: It is designed for scalability, especially in distributed environments.
    • Resource Utilization: Weaviate optimizes resource utilization for complex data processing.
  4. FAISS:

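    • Hosting Options: FAISS is a library rather than a database server, so it runs inside your own application (self-hosted only).
    • Indexing Methods: FAISS provides both exact (flat) and approximate indexes, including IVF, HNSW, and product quantization, along with supporting tools such as k-means clustering and PCA for dimensionality reduction.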
    • Programming Language Under the Hood: FAISS is implemented in C++, with Python bindings.
    • Scalability: FAISS is well-suited for large-scale data and high-dimensional embeddings.
    • Resource Utilization: It efficiently manages memory and computational resources.
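
The difference between exact and approximate (ANN) search mentioned above can be made concrete with a small inverted-file (IVF) index. The sketch below is a pure-Python illustration of the idea, not the FAISS API: k-means picks the cell centroids, each vector is stored in the cell of its nearest centroid, and a query scans only the nprobe closest cells. The class name, the parameter names, and the random data are assumptions made for the example.

```python
import random

def l2(a, b):
    """Squared Euclidean distance between two equal-length vectors."""
    return sum((x - y) ** 2 for x, y in zip(a, b))

def exact_search(vectors, query, k):
    """Exact (flat) search: compare the query with every stored vector."""
    order = sorted(range(len(vectors)), key=lambda i: (l2(vectors[i], query), i))
    return order[:k]

def kmeans(vectors, n_clusters, n_iter=10, seed=0):
    """Tiny k-means used only to pick the IVF cell centroids."""
    rng = random.Random(seed)
    centroids = [list(v) for v in rng.sample(vectors, n_clusters)]
    for _ in range(n_iter):
        cells = [[] for _ in range(n_clusters)]
        for v in vectors:
            nearest = min(range(n_clusters), key=lambda c: l2(v, centroids[c]))
            cells[nearest].append(v)
        for c, members in enumerate(cells):
            if members:  # keep the old centroid if a cell ends up empty
                centroids[c] = [sum(col) / len(members) for col in zip(*members)]
    return centroids

class IVFIndex:
    """Inverted-file index: vectors are bucketed by their nearest centroid."""

    def __init__(self, vectors, n_clusters):
        self.vectors = vectors
        self.centroids = kmeans(vectors, n_clusters)
        self.cells = [[] for _ in range(n_clusters)]
        for i, v in enumerate(vectors):
            c = min(range(n_clusters), key=lambda j: l2(v, self.centroids[j]))
            self.cells[c].append(i)

    def search(self, query, k, nprobe=1):
        """Approximate search: scan only the nprobe cells closest to the query."""
        by_distance = sorted(
            range(len(self.centroids)), key=lambda j: l2(query, self.centroids[j])
        )
        candidates = [i for c in by_distance[:nprobe] for i in self.cells[c]]
        candidates.sort(key=lambda i: (l2(self.vectors[i], query), i))
        return candidates[:k]

rng = random.Random(42)
data = [[rng.random() for _ in range(8)] for _ in range(500)]
query = [rng.random() for _ in range(8)]

index = IVFIndex(data, n_clusters=10)
exact = exact_search(data, query, k=5)
approx = index.search(query, k=5, nprobe=2)  # scans fewer vectors, may miss true neighbors
full = index.search(query, k=5, nprobe=10)   # probing every cell matches exact search
print(exact, approx, full == exact)
```

Raising nprobe trades speed for accuracy; real libraries expose a similar knob, but the exact parameter names and defaults differ per product.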

Choosing the best vector database depends on your specific project requirements. Consider factors such as hosting preferences, existing vector embeddings, latency needs, and the complexity of your data. Each of these databases has its strengths, so evaluate them based on your use case and resources.
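
When latency and accuracy matter, it can help to measure them on your own data rather than rely on general claims. The sketch below shows two small helpers, recall@k against a known exact result and average query latency. The helper names and the fake_search stand-in are hypothetical; in practice you would pass a function that wraps your chosen database client.

```python
import time

def recall_at_k(approx_ids, exact_ids):
    """Fraction of the true top-k neighbors that the approximate search returned."""
    if not exact_ids:
        return 0.0
    return len(set(approx_ids) & set(exact_ids)) / len(exact_ids)

def average_latency(search_fn, queries, repeats=3):
    """Mean wall-clock seconds per query for a search callable."""
    start = time.perf_counter()
    for _ in range(repeats):
        for q in queries:
            search_fn(q)
    return (time.perf_counter() - start) / (repeats * len(queries))

# Hypothetical stand-in for a real client call; it ignores the query.
def fake_search(query):
    return [1, 2, 3, 9]

true_neighbors = [1, 2, 3, 4]  # hypothetical ground truth from an exact search
print(recall_at_k(fake_search(None), true_neighbors))  # 0.75
print(average_latency(fake_search, queries=[None] * 100))
```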

Additional resources to follow:

  1. Top vector Databases
  2. Top 15 Vector Databases for Data Science in 2024 - Analytics Vidhya
  3. An Honest Comparison of Open Source Vector Databases
  4. Top 15 Best Vector Databases for 2024 | Detailed List
  5. Best Vector Databases - 2024 Reviews & Comparison - SourceForge