Skip to content

Pasindusameera/Vector_Databases

Repository files navigation

Vector Databases

Overview

A vector database is a type of database designed to store and query vector embeddings efficiently. Vector embeddings are numerical representations of data, such as text, images, or other types of information, transformed into a high-dimensional space by an embedding model. These databases are particularly useful for tasks involving similarity search, clustering, and other operations on high-dimensional data.

Key Concepts

  • Embedding Model: A machine learning model that transforms raw data (e.g., text, images) into vector embeddings. These vectors capture the semantic meaning of the data in a numerical form that can be processed by the vector database.

  • Vector Embeddings: High-dimensional vectors that represent the semantic meaning of the input data. These embeddings enable efficient similarity searches, as similar items will have similar vectors.

  • Indexing: The process of organizing vector embeddings in a way that allows for efficient querying. Various indexing techniques (e.g., KD-trees, HNSW) can be used to speed up search operations.

Diagram

Vector Database

The diagram illustrates the workflow of how data is processed and stored in a vector database:

  • Raw Text Inputs: Various texts (Text 1 to Text 5) are fed into the system.

  • Embedding Model: Each text is processed by an embedding model, which converts the text into a vector embedding.

  • Vector Embeddings: The embeddings for each text are displayed as lists of numerical values (e.g., [4.6, 6.1, 9.1, 7.2, ...]).

  • Indexing: Each vector embedding is indexed, making it easier to search and retrieve similar vectors.

  • Vector Database: The indexed vectors are stored in a vector database. This database allows for efficient querying and retrieval of vectors based on similarity.

About

Pinecone,chromadb vectordatabase

Resources

License

Stars

Watchers

Forks

Releases

No releases published

Packages

 
 
 

Contributors