# Vector Store Notebook

This notebook walks through the vector store operations used by the RAG system.

## Purpose
This notebook demonstrates how to manage the vector database that backs document retrieval. It covers:

1.  **Initialization**: Setting up the vector store using FAISS or ChromaDB.
2.  **Indexing**: Adding embeddings along with their metadata to the index.
3.  **Search**: Performing semantic similarity searches to find relevant documents for a given query.
4.  **Persistence**: Saving and loading the vector index to/from disk.

## Usage
Import the `VectorStore` class from `src.rag.vector_store` and use this notebook to manage your vector database.


In [None]:
import sys
from pathlib import Path
import numpy as np

# Add the project root to sys.path (assumes the notebook runs from a direct subdirectory of the root)
project_root = Path("..").resolve()
if str(project_root) not in sys.path:
    sys.path.insert(0, str(project_root))

from src.cirq_rag_code_assistant.config import get_config, setup_logging
from src.rag.vector_store import VectorStore

# Setup logging
setup_logging()

### Initialize Vector Store
We initialize the `VectorStore` with a fixed embedding dimension, which must match the output size of the embedding model. This example uses FAISS.

In [None]:
# Configuration
EMBEDDING_DIM = 384  # Example dimension (e.g., for all-MiniLM-L6-v2)
VECTOR_DB_TYPE = "faiss"
INDEX_PATH = project_root / "outputs" / "vector_store_test"

# Initialize
vector_store = VectorStore(
    embedding_dim=EMBEDDING_DIM,
    vector_db_type=VECTOR_DB_TYPE,
    index_path=INDEX_PATH,
    use_gpu=False  # Set to True if you have a GPU and faiss-gpu installed
)

print(f"Vector Store initialized: {vector_store.vector_db_type}")
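
Because the store is created for one fixed dimension, a shape or dtype mismatch in a later `add` call is a likely source of errors. A small guard along the following lines can catch that early. This is a sketch, not part of the project: `check_embeddings` is a hypothetical helper, and the 2-D float32 requirement reflects what FAISS indexes expect rather than anything this notebook confirms about `VectorStore`.

```python
import numpy as np


def check_embeddings(embeddings, expected_dim):
    """Hypothetical helper: return a contiguous float32 (n, expected_dim) array or raise."""
    arr = np.asarray(embeddings)
    if arr.ndim == 1:
        # Treat a single vector as a batch of one.
        arr = arr.reshape(1, -1)
    if arr.ndim != 2 or arr.shape[1] != expected_dim:
        raise ValueError(f"expected shape (n, {expected_dim}), got {arr.shape}")
    # FAISS operates on C-contiguous float32 arrays.
    return np.ascontiguousarray(arr, dtype=np.float32)


checked = check_embeddings(np.random.rand(4, 384), expected_dim=384)
print(checked.shape, checked.dtype)
```

Calling such a guard just before adding vectors turns a confusing low-level failure into a clear `ValueError`.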

### Add Embeddings
Let's add some dummy embeddings and metadata to the store.

In [None]:
# Generate dummy data
num_items = 10
embeddings = np.random.rand(num_items, EMBEDDING_DIM).astype('float32')

# L2-normalize each row (many embedding models emit unit-length vectors)
norms = np.linalg.norm(embeddings, axis=1, keepdims=True)
embeddings = embeddings / norms

ids = [f"doc_{i}" for i in range(num_items)]
metadatas = [
    {"type": "code", "language": "python", "topic": f"topic_{i%3}"}
    for i in range(num_items)
]

# Add to store
vector_store.add(embeddings, ids, metadatas)

print(f"Added {num_items} items. Total size: {vector_store.size()}")
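
The normalization step above is worth a closer look. For unit-length vectors, the inner product equals the cosine similarity, and the squared L2 distance equals 2 minus twice the cosine, so both metrics rank neighbors in the same order. The NumPy check below illustrates these identities on random data. It is independent of `VectorStore`, whose internal metric is not shown in this notebook, and `l2_normalize` is a name introduced here for illustration.

```python
import numpy as np

rng = np.random.default_rng(0)
a = rng.random((5, 8), dtype=np.float32)   # five "document" vectors
b = rng.random((1, 8), dtype=np.float32)   # one "query" vector


def l2_normalize(x):
    """Scale each row of x to unit length."""
    return x / np.linalg.norm(x, axis=1, keepdims=True)


# Cosine similarity computed from the raw vectors.
cosine = (a @ b.T).ravel() / (np.linalg.norm(a, axis=1) * np.linalg.norm(b))

# Inner product and squared L2 distance of the normalized vectors.
a_unit, b_unit = l2_normalize(a), l2_normalize(b)
inner = (a_unit @ b_unit.T).ravel()
sq_l2 = ((a_unit - b_unit) ** 2).sum(axis=1)

print(np.allclose(cosine, inner, atol=1e-5))
print(np.allclose(sq_l2, 2 - 2 * inner, atol=1e-5))
```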

### Search
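Now we can run a similarity search. The query is passed as a batch of shape `(1, EMBEDDING_DIM)`, and `results[0]` holds the hits for that single query.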

In [None]:
# Generate a unit-length query vector of shape (1, EMBEDDING_DIM)
query_embedding = np.random.rand(1, EMBEDDING_DIM).astype('float32')
query_norm = np.linalg.norm(query_embedding, axis=1, keepdims=True)
query_embedding = query_embedding / query_norm

# Search
k = 3
results = vector_store.search(query_embedding, k=k)

print(f"Top {k} results:")
for res in results[0]:
    print(f"ID: {res['id']}, Score: {res['score']:.4f}, Metadata: {res['metadata']}")
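
To make the search step concrete, here is a minimal sketch of what an exact similarity search computes: score every stored vector against the query, then keep the `k` best. FAISS flat indexes perform this kind of exhaustive comparison in optimized form. The function `brute_force_search` is hypothetical, and which index type `VectorStore` builds is not shown in this notebook.

```python
import numpy as np


def brute_force_search(index_vectors, ids, query, k):
    """Hypothetical exact search: score every stored vector against one query."""
    scores = index_vectors @ query.ravel()   # inner product per stored vector
    top = np.argsort(-scores)[:k]            # positions of the k highest scores
    return [{"id": ids[i], "score": float(scores[i])} for i in top]


vectors = np.array([[1.0, 0.0], [0.0, 1.0], [0.6, 0.8]], dtype=np.float32)
doc_ids = ["doc_0", "doc_1", "doc_2"]
query = np.array([[0.0, 1.0]], dtype=np.float32)

for hit in brute_force_search(vectors, doc_ids, query, k=2):
    print(hit["id"], round(hit["score"], 4))
```

The cost grows linearly with the number of stored vectors, which is why approximate indexes become attractive for large collections.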

### Filtering
We can also filter results by metadata.

In [None]:
# Filter by topic
filter_dict = {"topic": "topic_0"}
results_filtered = vector_store.search(query_embedding, k=k, filter_dict=filter_dict)

print(f"Top {k} results with filter {filter_dict}:")
for res in results_filtered[0]:
    print(f"ID: {res['id']}, Score: {res['score']:.4f}, Metadata: {res['metadata']}")
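
One common way to implement metadata filtering on top of an index that only knows about vectors is post-filtering: run the search, then drop hits whose metadata does not match. The sketch below shows that idea in plain Python. It is an assumption about how such filtering could work, not a description of `VectorStore.search`; `matches` and `post_filter` are hypothetical names, and the scores are made-up illustration values.

```python
def matches(metadata, filter_dict):
    """True when every key in filter_dict has the same value in metadata."""
    return all(metadata.get(key) == value for key, value in filter_dict.items())


def post_filter(hits, filter_dict, k):
    """Hypothetical helper: keep the first k hits whose metadata matches."""
    return [hit for hit in hits if matches(hit["metadata"], filter_dict)][:k]


# Hits as they might come back from an unfiltered search, best score first.
hits = [
    {"id": "doc_4", "score": 0.91, "metadata": {"topic": "topic_1"}},
    {"id": "doc_3", "score": 0.88, "metadata": {"topic": "topic_0"}},
    {"id": "doc_9", "score": 0.85, "metadata": {"topic": "topic_0"}},
    {"id": "doc_2", "score": 0.80, "metadata": {"topic": "topic_2"}},
]

print(post_filter(hits, {"topic": "topic_0"}, k=3))
```

Post-filtering can return fewer than `k` hits when matches are rare, so callers often fetch more candidates than they need before filtering.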

### Save and Load
Finally, let's save the index to disk and load it back.

In [None]:
# Save
vector_store.save()
print(f"Vector store saved to {INDEX_PATH}")

# Load into a fresh instance to confirm the index round-trips
new_vector_store = VectorStore(
    embedding_dim=EMBEDDING_DIM,
    vector_db_type=VECTOR_DB_TYPE,
    index_path=INDEX_PATH
)
new_vector_store.load()

print(f"Loaded vector store. Size: {new_vector_store.size()}")
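
A FAISS index stores vectors but not arbitrary metadata, so a store built on it has to persist IDs and metadata alongside the index file. The sketch below illustrates that round trip with NumPy and JSON standing in for a real index file. The file names and the `save_store` and `load_store` helpers are hypothetical; how `VectorStore` actually lays out its files is not shown in this notebook.

```python
import json
import tempfile
from pathlib import Path

import numpy as np


def save_store(directory, vectors, ids, metadatas):
    """Hypothetical layout: vectors in a .npy file, ids and metadata in JSON."""
    directory = Path(directory)
    directory.mkdir(parents=True, exist_ok=True)
    np.save(directory / "vectors.npy", vectors)
    with open(directory / "metadata.json", "w", encoding="utf-8") as fh:
        json.dump({"ids": ids, "metadatas": metadatas}, fh)


def load_store(directory):
    """Read back what save_store wrote."""
    directory = Path(directory)
    vectors = np.load(directory / "vectors.npy")
    with open(directory / "metadata.json", encoding="utf-8") as fh:
        payload = json.load(fh)
    return vectors, payload["ids"], payload["metadatas"]


vectors = np.random.rand(3, 4).astype("float32")
ids = ["doc_0", "doc_1", "doc_2"]
metadatas = [{"topic": f"topic_{i}"} for i in range(3)]

with tempfile.TemporaryDirectory() as tmp:
    save_store(tmp, vectors, ids, metadatas)
    loaded_vectors, loaded_ids, loaded_metadatas = load_store(tmp)

print(np.array_equal(vectors, loaded_vectors), loaded_ids == ids)
```

Comparing the reloaded data against the original, as in the final line, is a cheap way to confirm that nothing was lost on the way to disk.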