# Vector Stores in LangChain

* A **vector store** is a system designed to store and retrieve data represented as **numerical vectors**.

### Key Features

1. **Storage** – Ensures that vectors and their associated metadata are retained, whether in-memory for quick lookups or on-disk for durability and large-scale use.
2. **Similarity Search** – Helps retrieve the vectors most similar to a query vector.
3. **Indexing** – Provides a data structure or method that enables fast similarity search on high-dimensional vectors (e.g., approximate nearest neighbor lookups).
4. **CRUD Operations** – Manages the lifecycle of data: adding new vectors, reading them, updating existing entries, and removing outdated ones.
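
To make these features concrete, here is a minimal in-memory sketch. `ToyVectorStore`, `Record`, and `cosine` are hypothetical names invented for illustration, not part of LangChain or any vector database. Storage is a plain dict (nothing survives a restart), CRUD maps to `add`/`get`/`update`/`delete`, and similarity search is an exact brute-force scan rather than an indexed lookup.

```python
from __future__ import annotations

import heapq
import math
import uuid
from dataclasses import dataclass, field


@dataclass
class Record:
    vector: list[float]
    metadata: dict = field(default_factory=dict)


def cosine(a: list[float], b: list[float]) -> float:
    """Cosine similarity; 0.0 if either vector is all zeros."""
    dot = sum(x * y for x, y in zip(a, b))
    denom = math.sqrt(sum(x * x for x in a)) * math.sqrt(sum(y * y for y in b))
    return dot / denom if denom else 0.0


class ToyVectorStore:
    """Hypothetical in-memory store: records live in a dict keyed by a generated id."""

    def __init__(self) -> None:
        self._records: dict[str, Record] = {}

    def add(self, vector: list[float], metadata: dict | None = None) -> str:  # Create
        record_id = str(uuid.uuid4())
        self._records[record_id] = Record(list(vector), dict(metadata or {}))
        return record_id

    def get(self, record_id: str) -> Record | None:  # Read
        return self._records.get(record_id)

    def update(self, record_id: str, vector: list[float] | None = None,
               metadata: dict | None = None) -> None:  # Update
        record = self._records[record_id]  # KeyError if the id is unknown
        if vector is not None:
            record.vector = list(vector)
        if metadata is not None:
            record.metadata.update(metadata)

    def delete(self, record_id: str) -> bool:  # Delete
        return self._records.pop(record_id, None) is not None

    def search(self, query: list[float], k: int = 3) -> list[tuple[float, str]]:
        """Exact top-k by cosine similarity: scores every record (no index)."""
        scored = ((cosine(query, rec.vector), rid) for rid, rec in self._records.items())
        return heapq.nlargest(k, scored)


store = ToyVectorStore()
a = store.add([1.0, 0.0], {"source": "a.txt"})
b = store.add([0.0, 1.0], {"source": "b.txt"})
store.update(b, vector=[0.9, 0.1])
print(store.search([1.0, 0.0], k=2))  # a first (score 1.0), then b
store.delete(a)
print(store.search([1.0, 0.0], k=2))  # only b remains
```

The full scan costs O(n) per query, which is exactly what the indexing feature exists to avoid: production stores trade a little accuracy for speed with approximate nearest neighbor structures.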

### Use Cases

1. Semantic Search
2. RAG
3. Recommender Systems
4. Image/Multimedia Search

### What are Vector Stores?

A **vector store** is a database or storage system built to store and retrieve high-dimensional vectors (embeddings) efficiently. In LangChain and other LLM frameworks, vector stores are used to:

- **Store Embeddings:** Save the vector representations (embeddings) of documents, text chunks, or other data.
- **Metadata Association:** Store metadata (source, author, etc.) alongside each embedding for richer search/filtering.
- **Similarity Search:** At query time, convert the user’s query into an embedding and quickly find stored vectors that are most similar (“closest”) to it.

### Why Vector Stores?

Vector stores are essential in modern AI and large language model (LLM) applications because they enable fast, scalable, and accurate semantic search and retrieval of information. Here’s why they matter:

- **Semantic Search:** Unlike traditional keyword search, vector stores retrieve data based on meaning and context using high-dimensional embeddings. This allows you to find the most relevant documents or text passages for a query, even if no keywords overlap.
- **Retrieval-Augmented Generation (RAG):** Vector stores are core to RAG pipelines, allowing LLMs to access external knowledge, documents, or datasets at runtime.
- **Performance & Scalability:** They are optimized for efficient similarity search on large datasets, making them suitable for enterprise and production workloads.
- **Improved LLM Output:** By retrieving relevant context for a user’s question, vector stores help LLMs generate more accurate and contextually relevant responses.
- **Versatile Applications:** Used in chatbots, document Q&A, semantic search engines, knowledge management, and more.

### How Vector Stores Work in LangChain

1. **Embedding Creation:** Text or document chunks are converted into vector embeddings using an embedding model (e.g., OpenAI, Hugging Face, Cohere).
2. **Storing Embeddings:** Embeddings and their associated metadata (e.g., original text, source) are stored in a vector store.
3. **Similarity Search:** At query time, the query is embedded and compared (via cosine similarity, dot product, etc.) against stored vectors to retrieve the most relevant documents.
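
Step 3 mentions cosine similarity and dot product. The sketch below uses two hand-picked 2-D vectors (not real embeddings; the helper names are illustrative) to show that the choice of metric can change the ranking when vectors are not normalized, and that once vectors are scaled to unit length, dot product and cosine similarity agree.

```python
import math


def dot(a, b):
    return sum(x * y for x, y in zip(a, b))


def norm(a):
    return math.sqrt(dot(a, a))


def cosine_similarity(a, b):
    return dot(a, b) / (norm(a) * norm(b))


def euclidean_distance(a, b):
    return math.sqrt(sum((x - y) ** 2 for x, y in zip(a, b)))


def normalize(a):
    n = norm(a)
    return [x / n for x in a]


query = [1.0, 1.0]
docs = {
    "same_direction": [1.0, 1.0],  # points exactly where the query points
    "bigger_vector": [4.0, 1.0],   # larger magnitude, different direction
}

# Higher is better for cosine and dot product; lower is better for distance.
by_cosine = max(docs, key=lambda name: cosine_similarity(query, docs[name]))
by_dot = max(docs, key=lambda name: dot(query, docs[name]))
by_distance = min(docs, key=lambda name: euclidean_distance(query, docs[name]))
print(by_cosine, by_dot, by_distance)  # same_direction bigger_vector same_direction

# Once every vector is unit length, dot product and cosine similarity coincide.
unit_q = normalize(query)
for name, vec in docs.items():
    assert math.isclose(dot(unit_q, normalize(vec)), cosine_similarity(query, vec))
```

In practice, use the metric the embedding model is documented for; some providers return unit-length embeddings, in which case cosine and dot product rank results identically.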

### Popular Vector Stores Supported by LangChain

- **FAISS:** Fast, open-source library for efficient similarity search.
- **Chroma:** Lightweight, simple, and fully open-source vector DB.
- **Pinecone:** Fully managed, scalable vector database in the cloud.
- **Weaviate:** Open-source vector database with hybrid search and knowledge graph features.
- **Qdrant:** Open-source, production-ready vector search engine.
- **Milvus:** High-performance, enterprise-grade vector database.
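- **Elasticsearch/OpenSearch:** Search engines that also support vector (k-NN) search: built into recent Elasticsearch versions, and provided in OpenSearch by its k-NN plugin.
- **Redis:** Can serve as a vector store through the vector similarity search in Redis Stack (RediSearch).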

#### Example Workflow

#### 1. Create Embeddings and Store in a Vector DB

```python
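# Requires: pip install langchain-community langchain-openai faiss-cpu
# OpenAIEmbeddings also needs an OPENAI_API_KEY environment variable.
from langchain_community.vectorstores import FAISS
from langchain_openai import OpenAIEmbeddings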

# Example: Create embeddings for a list of texts
texts = ["Hello World", "How are you?", "LangChain is awesome!"]
embeddings = OpenAIEmbeddings()

# Store in FAISS vector store
vectorstore = FAISS.from_texts(texts, embeddings)
```

#### 2. Perform Similarity Search

```python
# Query for similar documents
results = vectorstore.similarity_search("Hello there!", k=2)
for doc in results:
    print(doc.page_content)
```

#### 3. Persist and Reload Vector Stores

Some vector stores (e.g., FAISS, Chroma) allow saving to disk and reloading for future queries.

```python
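vectorstore.save_local("faiss_index")  # writes index.faiss + index.pkl to the folder
# Later... reload with the same embedding model used to build the index.
# load_local unpickles index.pkl, so recent versions require this opt-in flag;
# only load files you created yourself.
vectorstore = FAISS.load_local("faiss_index", embeddings, allow_dangerous_deserialization=True)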
```
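
`save_local` is specific to LangChain's FAISS wrapper. To show what persistence involves in general, here is a hypothetical stdlib-only sketch (`save_store` and `load_store` are invented names) that writes vectors and metadata to a JSON file and reads them back. Unlike a pickle, loading JSON cannot execute code, at the cost of larger files and slower loads than a binary index format.

```python
from __future__ import annotations

import json
import tempfile
from pathlib import Path


def save_store(path: Path, records: dict[str, dict]) -> None:
    """Persist {id: {"vector": [...], "metadata": {...}}} as UTF-8 JSON."""
    path.write_text(json.dumps(records), encoding="utf-8")


def load_store(path: Path) -> dict[str, dict]:
    """Read the JSON back; json.loads never executes code, unlike pickle."""
    return json.loads(path.read_text(encoding="utf-8"))


records = {
    "doc-1": {"vector": [0.1, 0.2, 0.3], "metadata": {"source": "intro.md"}},
    "doc-2": {"vector": [0.3, 0.2, 0.1], "metadata": {"source": "faq.md"}},
}

with tempfile.TemporaryDirectory() as tmp:
    index_file = Path(tmp) / "toy_index.json"
    save_store(index_file, records)
    reloaded = load_store(index_file)

print(reloaded == records)  # True: floats survive the JSON round trip
```

Whatever the format, queries against a reloaded store must be embedded with the same model that produced the stored vectors, which is why `FAISS.load_local` takes the embeddings object.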

---

### When to Use Vector Stores

- Whenever you want to retrieve relevant chunks of text based on semantic similarity.
- For building RAG pipelines, document search, chatbots, FAQ bots, or knowledge bases.
- When working with large collections of text or documents.

### References

- [FAISS GitHub](https://github.com/facebookresearch/faiss)
- [Chroma Documentation](https://docs.trychroma.com/)
- [Pinecone Documentation](https://docs.pinecone.io/)

**Summary:**  
Vector stores in LangChain are the backbone for efficient, scalable, and accurate semantic search and retrieval workflows, powering advanced LLM applications by connecting queries to the most relevant information.

# Vector Store vs. Vector Database

### Vector Store

- Typically refers to a lightweight library or service that focuses on storing vectors (embeddings) and performing similarity search.
- May not include many traditional database features like transactions, rich query languages, or role-based access control.
- Ideal for prototyping and smaller-scale applications.
- Example: FAISS, which stores vectors and answers similarity queries but leaves persistence and scaling for you to handle.

### Vector Database

- A full-fledged database system designed to store and query vectors.
- Offers additional **"database-like"** features:
  - Distributed architecture for horizontal scaling.
  - Durability and persistence (replication, backup/restore).
  - Metadata handling (schemas, filters).
  - In some systems, ACID or near-ACID guarantees.
  - Authentication/authorization and more advanced security.
- Geared toward production environments with large datasets and significant scaling needs.
- Examples: Milvus, Qdrant, Weaviate.

A vector database is effectively a vector store with extra database features (e.g., clustering, scaling, security, metadata filtering, and durability).

# Vector Stores in LangChain

- **Supported Stores:** LangChain integrates with multiple vector stores (FAISS, Pinecone, Chroma, Qdrant, Weaviate, etc.), giving you flexibility in scale, features, and deployment.
- **Common Interface:** A uniform Vector Store API lets you swap out one backend (e.g., FAISS) for another (e.g., Pinecone) with minimal code changes.
- **Metadata Handling:** Most vector stores in LangChain allow you to attach metadata (e.g., timestamps, authors) to each document, enabling filter-based retrieval.

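#### Example Methods
- `from_documents(...)` or `from_texts(...)`: class methods that embed the inputs and build a new store in one call.
- `add_documents(...)` or `add_texts(...)`: embed and insert more data into an existing store.
- `similarity_search(query, k=...)`: embed the query and return the `k` most similar documents.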
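
Because every backend exposes the same method names, application code can depend on the interface rather than on a specific store. The sketch below imitates that idea with a `typing.Protocol`. `VectorStoreLike`, both backends, and the `letter_counts` "embedding" (letter frequencies, which capture spelling overlap rather than meaning) are hypothetical stand-ins, not LangChain classes; also, `add_texts` here returns nothing, whereas LangChain's returns the IDs of the added texts.

```python
from __future__ import annotations

import math
from typing import Callable, Protocol

Embed = Callable[[str], list[float]]


def letter_counts(text: str) -> list[float]:
    """Toy embedding: a-z letter counts (spelling overlap, not meaning)."""
    counts = [0.0] * 26
    for ch in text.lower():
        if "a" <= ch <= "z":
            counts[ord(ch) - ord("a")] += 1.0
    return counts


def unit(v: list[float]) -> list[float]:
    n = math.sqrt(sum(x * x for x in v))
    return [x / n for x in v] if n else v


class VectorStoreLike(Protocol):
    def add_texts(self, texts: list[str]) -> None: ...
    def similarity_search(self, query: str, k: int = 4) -> list[str]: ...


class CosineListStore:
    """Backend A: keeps raw vectors, computes cosine similarity at query time."""

    def __init__(self, embed: Embed) -> None:
        self.embed = embed
        self.items: list[tuple[str, list[float]]] = []

    def add_texts(self, texts: list[str]) -> None:
        self.items += [(t, self.embed(t)) for t in texts]

    def similarity_search(self, query: str, k: int = 4) -> list[str]:
        q = unit(self.embed(query))
        ranked = sorted(self.items, key=lambda it: -sum(a * b for a, b in zip(q, unit(it[1]))))
        return [text for text, _ in ranked[:k]]


class NormalizedDotStore:
    """Backend B: normalizes once at insert time, then ranks by plain dot product."""

    def __init__(self, embed: Embed) -> None:
        self.embed = embed
        self.items: list[tuple[str, list[float]]] = []

    def add_texts(self, texts: list[str]) -> None:
        self.items += [(t, unit(self.embed(t))) for t in texts]

    def similarity_search(self, query: str, k: int = 4) -> list[str]:
        q = unit(self.embed(query))
        ranked = sorted(self.items, key=lambda it: -sum(a * b for a, b in zip(q, it[1])))
        return [text for text, _ in ranked[:k]]


def build_and_query(store: VectorStoreLike, query: str, k: int = 1) -> list[str]:
    """Application code: depends only on the shared interface."""
    store.add_texts(["Hello World", "How are you?", "LangChain is awesome!"])
    return store.similarity_search(query, k=k)


for backend in (CosineListStore(letter_counts), NormalizedDotStore(letter_counts)):
    print(type(backend).__name__, build_and_query(backend, "Hello there!"))
```

Swapping backends changes only the constructor call; `build_and_query` stays the same, which is the point of a shared interface.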

#### Metadata-Based Filtering
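
Filtering narrows the candidate set using metadata before (or alongside) the similarity ranking. Many LangChain integrations accept a `filter` argument on `similarity_search`, but its syntax differs between backends, so the sketch below avoids any real API: `Doc`, `filtered_search`, and the `where` dict are hypothetical, and matching is plain key/value equality.

```python
from __future__ import annotations

import math
from dataclasses import dataclass, field


@dataclass
class Doc:
    page_content: str
    vector: list[float]
    metadata: dict = field(default_factory=dict)


def cosine(a, b):
    dot = sum(x * y for x, y in zip(a, b))
    denom = math.sqrt(sum(x * x for x in a)) * math.sqrt(sum(y * y for y in b))
    return dot / denom if denom else 0.0


def filtered_search(docs, query_vec, k=2, where=None):
    """Keep docs whose metadata matches every key/value in `where`, then rank by cosine."""
    where = where or {}
    candidates = [d for d in docs if all(d.metadata.get(key) == val for key, val in where.items())]
    candidates.sort(key=lambda d: cosine(query_vec, d.vector), reverse=True)
    return candidates[:k]


docs = [
    Doc("Q3 revenue summary", [0.9, 0.1], {"author": "alice", "year": 2023}),
    Doc("Q3 revenue details", [0.8, 0.2], {"author": "bob", "year": 2024}),
    Doc("Team offsite notes", [0.1, 0.9], {"author": "alice", "year": 2024}),
]

hits = filtered_search(docs, [1.0, 0.0], k=2, where={"year": 2024})
print([d.page_content for d in hits])  # ['Q3 revenue details', 'Team offsite notes']
```

This version filters first and ranks second, so it always returns up to `k` matching documents. Some approximate-search systems instead filter after retrieving neighbors, which can return fewer than `k` hits when the filter is selective.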

## Chroma Vector Store

Chroma is a lightweight, open-source vector database that is especially friendly for local development and small- to medium-scale production needs.

#### Chroma Tenancy and DB Hierarchy

- At the top is the **Tenant**.
- Each tenant has multiple **Databases**.
- Each database contains multiple **Collections**.
- Each collection contains multiple **Documents** (each with its embedding and optional metadata).
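
The hierarchy behaves like nested namespaces: a name only has to be unique within its parent. The sketch below models it with nested dictionaries; it is a conceptual illustration, not Chroma's client API, and the tenant, database, and collection names are made up.

```python
from collections import defaultdict


def make_hierarchy():
    """tenant -> database -> collection -> {doc_id: text}, created on first use."""
    return defaultdict(lambda: defaultdict(lambda: defaultdict(dict)))


def add_doc(tree, tenant, database, collection, doc_id, text):
    tree[tenant][database][collection][doc_id] = text


tree = make_hierarchy()
add_doc(tree, "acme", "prod", "support_articles", "kb-1", "Resetting a password")
add_doc(tree, "acme", "prod", "support_articles", "kb-2", "Updating billing details")
add_doc(tree, "acme", "staging", "support_articles", "kb-1", "Draft: resetting a password")
add_doc(tree, "globex", "prod", "manuals", "m-1", "Installing the device")

for tenant, databases in tree.items():
    for database, collections in databases.items():
        for collection, documents in collections.items():
            print(f"{tenant}/{database}/{collection}: {len(documents)} document(s)")
```

Here `kb-1` exists in both the `prod` and `staging` databases without colliding, which is the kind of isolation the tenant and database levels provide.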