# Reindexer

[Reindexer](https://github.com/Restream/reindexer) is an embeddable, in-memory, document-oriented database with a high-level Query builder interface.

This notebook shows how to use the Reindexer vector store with LangChain.

## Installation and Setup

```python
%pip install -qU pyreindexer langchain-reindexer langchain_huggingface sentence_transformers
```

## Basic Usage

```python
from langchain_reindexer import ReindexerVectorStore
from langchain_core.embeddings import FakeEmbeddings
from langchain_core.documents import Document
```

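Initialize the embeddings. `FakeEmbeddings` requires no API key, but it returns random vectors, so the search results in this notebook are not semantically meaningful. For production, use `OpenAIEmbeddings`, `HuggingFaceEmbeddings`, or another real embedding model.

```python
embeddings = FakeEmbeddings(size=1024)
```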

Create the vector store:

```python
vector_store = ReindexerVectorStore(
    embedding=embeddings,
    rx_connector_config={"dsn": "builtin://"},  # in-memory storage
    rx_namespace="langchain_docs",
)
```

### OpenAI Embeddings (requires API key)

```python
from langchain_openai import OpenAIEmbeddings

# Requires the langchain-openai package and the OPENAI_API_KEY environment variable
openai_embeddings = OpenAIEmbeddings()

openai_vector_store = ReindexerVectorStore(
    embedding=openai_embeddings,
    rx_connector_config={"dsn": "builtin:///tmp/openai_db"},
)
```

## Connection Options

Reindexer supports different connection types:

- **In-memory builtin**: In-memory database (default)
  ```python
  rx_connector_config={"dsn": "builtin://"}
  ```

- **By-path builtin**: Database stored on disk at the specified path
  ```python
  rx_connector_config={"dsn": "builtin:///tmp/pyrx"}
  ```

- **By-auto-path builtin**: Database stored in an automatically created directory (`reindexer_storage/<namespace>`) inside the directory the script runs from
  ```python
  rx_connector_config={"dsn": "builtin:///"}
  ```

For more information, see [Reindexer Documentation](https://github.com/Restream/reindexer).
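
To keep the three DSN forms apart, the following sketch classifies a DSN string by what follows the `builtin://` prefix. The helper `describe_dsn` is hypothetical and is not part of `langchain_reindexer` or `pyreindexer`; it only mirrors the list above.

```python
def describe_dsn(dsn: str) -> str:
    """Classify a builtin DSN into one of the three modes (hypothetical helper)."""
    prefix = "builtin://"
    if not dsn.startswith(prefix):
        raise ValueError(f"not a builtin DSN: {dsn!r}")
    path = dsn[len(prefix):]
    if path == "":
        return "in-memory"  # builtin://
    if path == "/":
        return "auto-path"  # builtin:///
    return f"by-path: {path}"  # builtin:///some/path


for example_dsn in ("builtin://", "builtin:///", "builtin:///tmp/pyrx"):
    print(example_dsn, "->", describe_dsn(example_dsn))
```

The only difference between the in-memory and auto-path forms is a single trailing slash, so it is worth double-checking the DSN when data unexpectedly does or does not persist.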

## Adding Documents

```python
documents = [
    Document(
        page_content="LangChain is a framework for developing applications powered by language models.",
        metadata={"topic": "framework", "year": 2023},
    ),
    Document(
        page_content="Reindexer is an embeddable, in-memory, document-oriented database.",
        metadata={"topic": "database", "year": 2024},
    ),
    Document(
        page_content="Vector stores enable semantic search over documents using embeddings.",
        metadata={"topic": "vector_search", "year": 2024},
    ),
]
```

```python
# Add documents to the vector store
ids = vector_store.add_documents(documents)
print(f"Added {len(ids)} documents with IDs: {ids}")
```

```
Added 3 documents with IDs: ['edc7651b-3260-4608-83ad-b3cbbc85a2c5', 'fe46e5c8-b240-4184-b829-aa0431a53a4e', '343cbb19-28d6-4492-99c0-599a1bc565f8']
```


## Similarity Search

```python
# Search for similar documents
query = "What is LangChain?"
results = vector_store.similarity_search(query, k=2)

for i, doc in enumerate(results, 1):
    print(f"\nResult {i}:")
    print(f"Content: {doc.page_content}")
    print(f"Metadata: {doc.metadata}")
```

```
Result 1:
Content: LangChain is a framework for developing applications powered by language models.
Metadata: {'topic': 'framework', 'year': 2023}

Result 2:
Content: Reindexer is an embeddable, in-memory, document-oriented database.
Metadata: {'topic': 'database', 'year': 2024}
```
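
Conceptually, `similarity_search` embeds the query and returns the `k` documents whose vectors lie closest to it. The sketch below illustrates that idea with an exact brute-force scan over toy 2-D vectors. It is not how the store works internally: Reindexer builds an HNSW index (see Advanced Configuration), which is approximate. The cosine metric, the helper names `cosine_similarity` and `top_k`, and the toy vectors are all assumptions made for illustration.

```python
import math


def cosine_similarity(a, b):
    """Cosine of the angle between two equal-length vectors."""
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(y * y for y in b))
    return dot / (norm_a * norm_b)


def top_k(query_vec, doc_vecs, k):
    """Return the indices of the k vectors most similar to query_vec."""
    scored = [(cosine_similarity(query_vec, v), i) for i, v in enumerate(doc_vecs)]
    scored.sort(key=lambda pair: pair[0], reverse=True)
    return [i for _, i in scored[:k]]


toy_vectors = [[1.0, 0.0], [0.8, 0.6], [0.0, 1.0]]  # stand-ins for document embeddings
toy_query = [1.0, 0.1]                              # stand-in for the query embedding
print(top_k(toy_query, toy_vectors, k=2))  # prints [0, 1]
```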


## Similarity Search with Scores

```python
# Search with similarity scores
results_with_scores = vector_store.similarity_search_with_score(query, k=2)

for doc, score in results_with_scores:
    print(f"\nScore: {score:.4f}")
    print(f"Content: {doc.page_content}")
    print(f"Metadata: {doc.metadata}")
```

```
Score: 0.0157
Content: Vector stores enable semantic search over documents using embeddings.
Metadata: {'topic': 'vector_search', 'year': 2024}

Score: 0.0086
Content: Reindexer is an embeddable, in-memory, document-oriented database.
Metadata: {'topic': 'database', 'year': 2024}
```


## Search with Metadata Filtering

```python
# Search with metadata filter
filtered_results = vector_store.similarity_search(
    query, k=2, filter={"topic": "database"}
)

for doc in filtered_results:
    print(f"Content: {doc.page_content}")
    print(f"Metadata: {doc.metadata}\n")
```

```
Content: Reindexer is an embeddable, in-memory, document-oriented database.
Metadata: {'topic': 'database', 'year': 2024}
```
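
The output above contains a single document even though `k=2` was requested, which is consistent with the filter acting as an exact match on metadata. A minimal sketch of that behavior follows, assuming every key in the filter must equal the corresponding metadata value. The helper `matches_filter` is hypothetical, and the integration may support richer filter expressions than this.

```python
def matches_filter(metadata: dict, flt: dict) -> bool:
    """True if every key/value pair in flt is present in metadata (assumed semantics)."""
    return all(metadata.get(key) == value for key, value in flt.items())


toy_metadata = [
    {"topic": "framework", "year": 2023},
    {"topic": "database", "year": 2024},
    {"topic": "vector_search", "year": 2024},
]

kept = [m for m in toy_metadata if matches_filter(m, {"topic": "database"})]
print(kept)  # prints [{'topic': 'database', 'year': 2024}]
```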



## Maximal Marginal Relevance (MMR) Search

```python
# MMR search for diverse results
mmr_results = vector_store.max_marginal_relevance_search(
    query="framework", k=2, fetch_k=3, lambda_mult=0.5
)

for doc in mmr_results:
    print(f"Content: {doc.page_content}")
    print(f"Metadata: {doc.metadata}\n")
```

```
Content: LangChain is a framework for developing applications powered by language models.
Metadata: {'topic': 'framework', 'year': 2023}

Content: Vector stores enable semantic search over documents using embeddings.
Metadata: {'topic': 'vector_search', 'year': 2024}
```
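
MMR first fetches `fetch_k` candidates by similarity and then greedily picks `k` of them, trading relevance to the query against redundancy with the documents already picked. `lambda_mult` controls the balance: 1 favors pure relevance, 0 favors maximum diversity. The sketch below shows the general selection rule on toy vectors. It is an illustration under stated assumptions (cosine similarity, hypothetical helpers `cosine` and `mmr_select`), not the code the integration runs.

```python
import math


def cosine(a, b):
    """Cosine similarity between two equal-length vectors."""
    dot = sum(x * y for x, y in zip(a, b))
    return dot / (math.sqrt(sum(x * x for x in a)) * math.sqrt(sum(y * y for y in b)))


def mmr_select(query_vec, cand_vecs, k, lambda_mult=0.5):
    """Greedily pick k candidate indices, balancing relevance against redundancy."""
    selected = []
    remaining = list(range(len(cand_vecs)))
    while remaining and len(selected) < k:
        best_idx, best_score = None, -math.inf
        for i in remaining:
            relevance = cosine(query_vec, cand_vecs[i])
            # Similarity to the closest already-selected candidate (0 if none yet)
            redundancy = max(
                (cosine(cand_vecs[i], cand_vecs[j]) for j in selected), default=0.0
            )
            score = lambda_mult * relevance - (1 - lambda_mult) * redundancy
            if score > best_score:
                best_idx, best_score = i, score
        selected.append(best_idx)
        remaining.remove(best_idx)
    return selected


# Candidates 0 and 1 are near-duplicates; candidate 2 points elsewhere.
toy_candidates = [[1.0, 0.0], [0.99, 0.14], [0.6, 0.8]]
toy_query_vec = [1.0, 0.2]

print(mmr_select(toy_query_vec, toy_candidates, k=2, lambda_mult=0.5))  # prints [1, 2]
print(mmr_select(toy_query_vec, toy_candidates, k=2, lambda_mult=1.0))  # prints [1, 0]
```

With `lambda_mult=0.5` the near-duplicate candidate 0 is skipped in favor of the more distinct candidate 2; with `lambda_mult=1.0` the selection reduces to plain similarity ranking.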



## Using as a Retriever

```python
# Create a retriever from the vector store
retriever = vector_store.as_retriever(search_kwargs={"k": 2})

# Use the retriever
docs = retriever.invoke("What is a vector store?")
for doc in docs:
    print(f"Content: {doc.page_content}\n")
```

```
Content: Reindexer is an embeddable, in-memory, document-oriented database.

Content: LangChain is a framework for developing applications powered by language models.
```



## Advanced Configuration

```python
# Create vector store with custom HNSW parameters
custom_vector_store = ReindexerVectorStore(
    embedding=FakeEmbeddings(size=1024),  # Use same embedding size
    m=32,  # More connections for better recall
    ef_construction=400,  # Larger candidate list during construction
    multithreading=1,  # Enable multithreaded index construction (0 disables)
    rx_connector_config={"dsn": "builtin:///tmp/custom_db"},
    rx_namespace="custom_namespace",
)
```

## Save and Load Configuration

- `builtin://` keeps data in memory only; use `save_local()` to export it to a `reindexer_memory_dump.json` file and `load_local()` to restore it.

- `builtin:///` automatically creates a directory (`reindexer_storage/<namespace>`) in the directory the script runs from and lets Reindexer manage the files itself.

- `builtin:///absolute/path` uses a custom path, which lets you load data prepared in another environment.

```python
# Save vector store configuration
vector_store.save_local("./reindexer_config")
```

```python
# Load vector store configuration
loaded_store = ReindexerVectorStore.load_local(
    "./reindexer_config", embedding=FakeEmbeddings(size=1024)
)
```

```python
# Search for similar documents
query = "What is LangChain?"
results = loaded_store.similarity_search(query, k=2)

for i, doc in enumerate(results, 1):
    print(f"\nResult {i}:")
    print(f"Content: {doc.page_content}")
    print(f"Metadata: {doc.metadata}")
```

```
Result 1:
Content: Reindexer is an embeddable, in-memory, document-oriented database.
Metadata: {'topic': 'database', 'year': 2024}

Result 2:
Content: Vector stores enable semantic search over documents using embeddings.
Metadata: {'topic': 'vector_search', 'year': 2024}
```


## Using with Real Embeddings

In production, replace `FakeEmbeddings` with a real embedding model, for example:

### HuggingFace Embeddings

```python
# Install HuggingFace embeddings: %pip install langchain-huggingface sentence-transformers

from langchain_huggingface import HuggingFaceEmbeddings

# Use HuggingFace embeddings (works locally, no API key required)
hf_embeddings = HuggingFaceEmbeddings(
    model_name="sentence-transformers/all-MiniLM-L6-v2"
)

hf_vector_store = ReindexerVectorStore(
    embedding=hf_embeddings,
    rx_connector_config={"dsn": "builtin:///tmp/hf_db"},
)

# Add documents
hf_vector_store.add_documents(documents)

# Search
results = hf_vector_store.similarity_search("What is LangChain?", k=1)
print(results[0].page_content)
```