In [None]:
# Vector Store Utility Functions
from typing import Any, List, Dict, Optional
from langchain.schema import Document
import numpy as np

class VectorStoreUtils:
    @staticmethod
    def batch_process_documents(
        documents: List[Document],
        batch_size: int = 100,
        callback=None
    ) -> List[List[Document]]:
        """Split documents into batches for efficient processing."""
        batches = [documents[i:i + batch_size] for i in range(0, len(documents), batch_size)]
        if callback:
            for i, batch in enumerate(batches):
                callback(i, len(batches))
        return batches
    
    @staticmethod
    def validate_embeddings(embeddings: List[List[float]], dim: int = 3072) -> bool:
        """Validate embedding dimensions and values."""
        if not embeddings:
            return False
        return all(len(emb) == dim for emb in embeddings)
    
    @staticmethod
    def calculate_similarity(vec1: List[float], vec2: List[float]) -> float:
        """Calculate cosine similarity between vectors."""
        vec1 = np.array(vec1)
        vec2 = np.array(vec2)
        return np.dot(vec1, vec2) / (np.linalg.norm(vec1) * np.linalg.norm(vec2))
    
    @staticmethod
    def deduplicate_documents(
        documents: List[Document],
        threshold: float = 0.95,
        embedding_fn: Any = None
    ) -> List[Document]:
        """Remove near-duplicate documents based on embedding similarity."""
        if not embedding_fn:
            return documents
            
        embeddings = [embedding_fn(doc.page_content) for doc in documents]
        unique_docs = []
        used_indices = set()
        
        for i, emb1 in enumerate(embeddings):
            if i in used_indices:
                continue
                
            unique_docs.append(documents[i])
            used_indices.add(i)
            
            for j, emb2 in enumerate(embeddings[i+1:], i+1):
                if VectorStoreUtils.calculate_similarity(emb1, emb2) > threshold:
                    used_indices.add(j)
                    
        return unique_docs

# Example usage:
print("Vector Store Utilities loaded ✅")

# Comprehensive Vector Store Comparison Guide

## What are Vector Stores?
Vector stores are specialized databases designed to store and efficiently search through vector embeddings (numerical representations of text, images, or other data). They are crucial for:
- Semantic search applications
- Recommendation systems
- Large-scale document retrieval
- RAG (Retrieval Augmented Generation) systems

## Why Compare Different Vector Stores?
Different vector stores have distinct characteristics that make them suitable for different use cases:

1. **Pinecone**
   - Cloud-based, fully managed
   - Excellent scalability
   - Production-ready
   - Pay-as-you-go pricing

2. **FAISS**
   - Local, in-memory storage
   - High performance
   - Open source
   - Great for smaller datasets

3. **Milvus/Zilliz**
   - Hybrid (self-hosted or cloud)
   - High scalability
   - Complex query support
   - Good for large-scale deployments

4. **Weaviate**
   - Graph-based vector search
   - Schema-based organization
   - Multi-modal support
   - Rich filtering capabilities

## Learning Objectives
Through this notebook, you'll learn:
1. How to implement each vector store
2. Pros and cons of each option
3. Performance comparisons
4. Best use cases for each store

# Vector Stores Comparison with LangChain

This notebook demonstrates the implementation and usage of different vector stores:
1. Pinecone (Cloud-based)
2. FAISS (Local, in-memory)
3. Milvus/Zilliz (Self-hosted/Cloud)
4. Weaviate (Self-hosted/Cloud)

We'll use the same dataset across all vector stores to compare their functionality.

## Setting up Environment

First, let's install all required packages and set up our environment:

In [1]:
# Install required packages
# !pip install -q langchain-community langchain pinecone-client faiss-cpu pymilvus weaviate-client python-dotenv

In [7]:
# Import common dependencies
import os
from dotenv import load_dotenv
from langchain_google_genai import GoogleGenerativeAIEmbeddings
from langchain.schema import Document
from typing import List
import warnings
warnings.filterwarnings('ignore')

# Load environment variables
load_dotenv()

# Verify Google API key
google_api_key = os.getenv('GOOGLE_API_KEY')
if not google_api_key:
    raise ValueError("❌ GOOGLE_API_KEY not found in .env file")
print("✅ Google API key found")

# Initialize Gemini embeddings model
embeddings = GoogleGenerativeAIEmbeddings(
    model="gemini-embedding-001",  # or "gemini-embedding-001"
    task_type="retrieval_document",  # Specify task type for better embeddings
    google_api_key=google_api_key
)
print("✅ Embeddings model initialized")

✅ Google API key found
✅ Embeddings model initialized


E0000 00:00:1760016611.373995   78734 alts_credentials.cc:93] ALTS creds ignored. Not running on GCP and untrusted ALTS is not enabled.
E0000 00:00:1760016611.374578   78734 alts_credentials.cc:93] ALTS creds ignored. Not running on GCP and untrusted ALTS is not enabled.


# Understanding Embeddings

## What are Embeddings?
Embeddings are dense vector representations of text or other data that capture semantic meaning. In this example, we use Google's Gemini model for creating embeddings.

## Key Components:
1. **Embedding Model**: 
   - Uses `GoogleGenerativeAIEmbeddings`
   - Model: "gemini-embedding-001" (optimized for text)
   - Task type: "retrieval_document" (specific to document search)

## Why These Settings?
- The Gemini model provides high-quality embeddings
- Retrieval-specific embeddings are optimized for search
- Environment variables keep API keys secure

## Process Flow:
1. Load API keys securely
2. Initialize embedding model
3. Convert text to vectors for storage
4. Use these vectors for similarity search

In [8]:
# Create sample documents
documents = [
    Document(
        page_content="Python is a high-level programming language known for its simplicity and readability.",
        metadata={"type": "programming", "language": "Python"}
    ),
    Document(
        page_content="JavaScript is a scripting language primarily used for web development.",
        metadata={"type": "programming", "language": "JavaScript"}
    ),
    Document(
        page_content="Machine Learning is a subset of AI that focuses on data and algorithms.",
        metadata={"type": "technology", "field": "AI"}
    ),
    Document(
        page_content="Deep Learning is part of machine learning based on artificial neural networks.",
        metadata={"type": "technology", "field": "AI"}
    ),
    Document(
        page_content="Docker is a platform for developing, shipping, and running applications in containers.",
        metadata={"type": "technology", "field": "DevOps"}
    )
]

# Sample Data Creation

## Document Structure
Each document in our test set contains:
1. **Content**: Main text information
2. **Metadata**: Additional context and categorization
3. **Unique Identifiers**: For tracking and retrieval

## Best Practices:
1. **Diverse Data**: Include various topics and lengths
2. **Realistic Content**: Use real-world-like examples
3. **Rich Metadata**: Add useful filtering attributes
4. **Consistent Format**: Maintain uniform structure

## Testing Considerations:
- Different document lengths
- Various content types
- Multiple metadata fields
- Edge cases and special characters

## 1. FAISS Vector Store

FAISS (Facebook AI Similarity Search) is a library for efficient similarity search. It's local and in-memory, making it perfect for smaller datasets and quick experimentation.

In [None]:
# Initialize FAISS vector store with persistence
from langchain.vectorstores import FAISS
import os

# Create a directory for storing the index
persist_directory = "faiss_index"
if not os.path.exists(persist_directory):
    os.makedirs(persist_directory)

try:
    # Try to load existing index
    if os.path.exists(f"{persist_directory}/index.faiss"):
        faiss_store = FAISS.load_local(persist_directory, embeddings)
        print("Loaded existing FAISS index ✅")
    else:
        # Create new index from documents
        faiss_store = FAISS.from_documents(
            documents=documents,
            embedding=embeddings
        )
        # Save the index
        faiss_store.save_local(persist_directory)
        print("Created and saved new FAISS index ✅")
except Exception as e:
    print(f"Error with FAISS store: {str(e)}")

FAISS vector store created successfully! ✅


In [11]:
# Search with FAISS
query = "What is artificial intelligence?"
results = faiss_store.similarity_search_with_score(query, k=2)

print("Query:", query)
print("\nResults:")
for doc, score in results:
    print(f"\nScore: {score}")
    print(f"Content: {doc.page_content}")
    print(f"Metadata: {doc.metadata}")

Query: What is artificial intelligence?

Results:

Score: 0.2305298000574112
Content: Machine Learning is a subset of AI that focuses on data and algorithms.
Metadata: {'type': 'technology', 'field': 'AI'}

Score: 0.2490173727273941
Content: Deep Learning is part of machine learning based on artificial neural networks.
Metadata: {'type': 'technology', 'field': 'AI'}


# Pinecone section removed
# To re-enable Pinecone later:
# 1. Create an index named 'langchain-demo' in Pinecone console with dimension matching your embeddings (3072)
# 2. Add PINECONE_API_KEY to your .env
# 3. Re-insert the Pinecone cell or use LangChain's Pinecone.from_documents/from_existing_index
# (Pinecone removed to avoid package/version conflicts during this demo)
print("⚠️ Pinecone section is disabled in this notebook. Skipping Pinecone tests.")

### Finding Your Pinecone Environment

To find your environment name:
1. Log in to Pinecone Console (https://app.pinecone.io/)
2. Go to "API Keys" in the left sidebar
3. Look for "Environment" or "Default Environment"
   - It will be something like `gcp-starter` or `us-east1-gcp-free`
4. Copy this value and use it as your `PINECONE_ENV` in the .env file

Example .env file:
```
PINECONE_API_KEY=a1b2c3d4-5e6f-7g8h-9i10-j11k12l13m14
PINECONE_ENV=gcp-starter
```

In [10]:
# Test Pinecone Connection
import os
from pinecone import Pinecone

try:
    api_key = os.getenv('PINECONE_API_KEY')
    
    if not api_key:
        print("⚠️ PINECONE_API_KEY not found in .env file")
        print("\nMake sure your .env file contains:")
        print("PINECONE_API_KEY=your_api_key_here")
    else:
        # Initialize Pinecone with your specific configuration
        pc = Pinecone(api_key=api_key)
        
        # List indexes
        active_indexes = pc.list_indexes()
        print("✅ Successfully connected to Pinecone!")
        print(f"\nActive indexes: {active_indexes}")
        
        # Get index details
        if "langchain-demo" in [index.name for index in active_indexes]:
            index = pc.describe_index("langchain-demo")
            print("\nIndex Statistics:")
            print(f"Dimension: {index.dimension}")
            print(f"Metric: {index.metric}")
            print(f"Status: {index.status}")
            
except Exception as e:
    print(f"❌ Error connecting to Pinecone: {str(e)}")

✅ Successfully connected to Pinecone!

Active indexes: [{
    "name": "langchain-demo",
    "metric": "cosine",
    "host": "langchain-demo-7lvw08o.svc.aped-4627-b74a.pinecone.io",
    "spec": {
        "serverless": {
            "cloud": "aws",
            "region": "us-east-1"
        }
    },
    "status": {
        "ready": true,
        "state": "Ready"
    },
    "vector_type": "dense",
    "dimension": 3072,
    "deletion_protection": "disabled",
    "tags": null
}]

Index Statistics:
Dimension: 3072
Metric: cosine
Status: {'ready': True, 'state': 'Ready'}

Index Statistics:
Dimension: 3072
Metric: cosine
Status: {'ready': True, 'state': 'Ready'}


In [None]:
# Initialize Pinecone with comprehensive error handling and setup
import pinecone
from langchain_community.vectorstores import Pinecone as LangchainPinecone
import time
from typing import Optional, Dict, Any

def setup_pinecone(
    api_key: str,
    environment: str,
    index_name: str,
    dimension: int = 3072,
    metric: str = "cosine",
    pod_type: str = "p1",
) -> Optional[Dict[str, Any]]:
    """
    Set up Pinecone with proper error handling and validation.
    """
    try:
        # Initialize Pinecone
        pinecone.init(api_key=api_key, environment=environment)
        print("✅ Pinecone initialized")
        
        # Check if index exists
        if index_name not in pinecone.list_indexes():
            print(f"Creating index '{index_name}'...")
            pinecone.create_index(
                name=index_name,
                dimension=dimension,
                metric=metric,
                pod_type=pod_type
            )
            # Wait for index to be ready
            while not index_name in pinecone.list_indexes():
                time.sleep(1)
        
        # Get index stats
        index = pinecone.Index(index_name)
        stats = index.describe_index_stats()
        
        return {
            "index": index,
            "stats": stats,
            "status": "ready"
        }
        
    except Exception as e:
        print(f"❌ Error setting up Pinecone: {str(e)}")
        return None

def create_pinecone_store(
    documents: list,
    embeddings: Any,
    index_info: Dict[str, Any],
    batch_size: int = 100
) -> Optional[LangchainPinecone]:
    """
    Create Pinecone vector store with batching and progress tracking.
    """
    try:
        # Create vector store
        vector_store = LangchainPinecone.from_documents(
            documents=documents,
            embedding=embeddings,
            index_name=index_info["index"]._index_name,
            namespace="default",
            batch_size=batch_size
        )
        return vector_store
    except Exception as e:
        print(f"❌ Error creating vector store: {str(e)}")
        return None

# Main execution with environment validation
try:
    # Get and validate environment variables
    api_key = os.getenv('PINECONE_API_KEY')
    env = os.getenv('PINECONE_ENV')
    
    if not api_key or not env:
        raise ValueError(
            "Missing environment variables. Please set:\n"
            "- PINECONE_API_KEY\n"
            "- PINECONE_ENV"
        )
    
    # Setup Pinecone
    index_info = setup_pinecone(
        api_key=api_key,
        environment=env,
        index_name="langchain-demo"
    )
    
    if not index_info:
        raise ValueError("Failed to setup Pinecone")
        
    print("\nPinecone Index Statistics:")
    print(f"Total vectors: {index_info['stats'].total_vector_count}")
    print(f"Namespaces: {index_info['stats'].namespaces}")
    
    # Create vector store
    vector_store = create_pinecone_store(
        documents=documents,
        embeddings=embeddings,
        index_info=index_info
    )
    
    if vector_store:
        print("\n✅ Pinecone vector store created successfully!")
        
        # Test search functionality
        query = "What is machine learning?"
        print(f"\nTest search query: {query}")
        
        results = vector_store.similarity_search(
            query=query,
            k=2
        )
        
        print("\nSearch Results:")
        for i, doc in enumerate(results, 1):
            print(f"\n{i}. Content: {doc.page_content[:100]}...")
            print(f"   Metadata: {doc.metadata}")
            
except Exception as e:
    print("\n❌ Error in Pinecone setup:")
    print(str(e))
    print("\nTroubleshooting steps:")
    print("1. Check your .env file has PINECONE_API_KEY and PINECONE_ENV")
    print("2. Verify your Pinecone account status")
    print("3. Check if you've reached your index limit")
    print("4. Verify network connectivity")
    print("\nTo get your environment name:")
    print("1. Log in to Pinecone Console (https://app.pinecone.io/)")
    print("2. Go to API Keys section")
    print("3. Find your environment (e.g., 'gcp-starter')")

Exception: The official Pinecone python package has been renamed from `pinecone-client` to `pinecone`. Please remove `pinecone-client` from your project dependencies and add `pinecone` instead. See the README at https://github.com/pinecone-io/pinecone-python-client for more information on using the python SDK.

# Re-enabled Pinecone Integration with Best Practices

## Why Pinecone?
- Production-ready vector database
- Automatic scaling
- High availability
- Real-time updates

## Configuration Requirements:
1. Pinecone Account Setup
2. Environment Variables
3. Index Configuration
4. Error Handling

This implementation includes:
- Proper environment checks
- Automated index creation
- Connection retry logic
- Error recovery

### Testing Google Generative AI Embeddings

Let's test the embeddings directly to understand how they work:

In [None]:
# Test Google's embedding model directly
test_texts = [
    "Machine learning is amazing",
    "Python programming is fun",
    "AI technology is advancing rapidly"
]

# Get embeddings for each text
try:
    # Single query embedding
    single_embedding = embeddings.embed_query(test_texts[0])
    print(f"Single embedding dimension: {len(single_embedding)}")
    
    # Multiple document embeddings
    doc_embeddings = embeddings.embed_documents(test_texts)
    print(f"\nNumber of document embeddings: {len(doc_embeddings)}")
    print(f"Each document embedding dimension: {len(doc_embeddings[0])}")
    
    print("\n✅ Embeddings generated successfully!")
    
except Exception as e:
    print(f"❌ Error generating embeddings: {str(e)}")

## 3. Milvus Vector Store

Milvus is an open-source vector database that can be self-hosted or used via Zilliz Cloud. For this example, we'll use the Python client to connect to a local Milvus instance.

### Setting up Milvus with Docker

To run Milvus locally, create a `docker-compose.yml` file with:
```yaml
version: '3.5'
services:
  etcd:
    image: quay.io/coreos/etcd:v3.5.5
    environment:
      - ETCD_AUTO_COMPACTION_MODE=revision
      - ETCD_AUTO_COMPACTION_RETENTION=1000
      - ETCD_QUOTA_BACKEND_BYTES=4294967296
    volumes:
      - ${DOCKER_VOLUME_DIRECTORY:-.}/volumes/etcd:/etcd
    command: etcd -advertise-client-urls=http://127.0.0.1:2379 -listen-client-urls http://0.0.0.0:2379 --data-dir /etcd

  milvus:
    image: milvusdb/milvus:latest
    environment:
      - ETCD_ENDPOINTS=etcd:2379
    volumes:
      - ${DOCKER_VOLUME_DIRECTORY:-.}/volumes/milvus:/var/lib/milvus
    ports:
      - "19530:19530"
    depends_on:
      - etcd
```

Then run:
```bash
docker-compose up -d
```

In [None]:
# Initialize Milvus with proper error handling and connection management
from langchain_community.vectorstores import Milvus
from pymilvus import connections, utility

try:
    # First, check if Milvus is running
    connections.connect(
        alias="default",
        host="localhost",
        port="19530",
        timeout=10  # 10 second timeout
    )
    
    # Check server status
    if utility.get_server_version():
        print("Connected to Milvus server ✅")
        
        try:
            # Create vector store
            milvus_store = Milvus.from_documents(
                documents=documents,
                embedding=embeddings,
                connection_args={"host": "localhost", "port": "19530"},
                collection_name="langchain_demo",
                drop_old=True  # Will recreate collection if it exists
            )
            print("Milvus vector store created successfully! ✅")
            
            # Test search functionality
            query = "What is machine learning?"
            results = milvus_store.similarity_search(query, k=2)
            print("\nTest search results:")
            for i, doc in enumerate(results, 1):
                print(f"\n{i}. {doc.page_content[:100]}...")
                
        except Exception as store_error:
            print(f"Error creating vector store: {str(store_error)}")
            
    else:
        print("❌ Could not verify Milvus server version")
        
except Exception as conn_error:
    print("❌ Could not connect to Milvus server")
    print(f"Error: {str(conn_error)}")
    print("\nTroubleshooting steps:")
    print("1. Make sure Milvus is running: docker ps | grep milvus")
    print("2. Check docker-compose logs: docker-compose logs milvus")
    print("3. Verify ports are open: netstat -an | grep 19530")
    
finally:
    # Clean up connection
    try:
        connections.disconnect("default")
        print("\nMilvus connection closed properly")
    except:
        pass

Failed to create new connection using: ed2b3e4ea533416e908aff92ba76fb14


Error initializing Milvus: <MilvusException: (code=2, message=Fail connecting to server on localhost:19530, illegal connection params or server unavailable)>

Make sure you have Milvus running locally or update connection details for cloud deployment


## 4. Weaviate Vector Store

Weaviate is an open-source vector search engine that can be self-hosted or used via Weaviate Cloud Services.

### Setting up Weaviate with Docker

To run Weaviate locally, create a `docker-compose.weaviate.yml` file:
```yaml
version: '3.4'
services:
  weaviate:
    image: semitechnologies/weaviate:latest
    ports:
      - "8080:8080"
    environment:
      QUERY_DEFAULTS_LIMIT: 25
      AUTHENTICATION_ANONYMOUS_ACCESS_ENABLED: 'true'
      PERSISTENCE_DATA_PATH: '/var/lib/weaviate'
      DEFAULT_VECTORIZER_MODULE: 'none'
      CLUSTER_HOSTNAME: 'node1'
    volumes:
      - weaviate_data:/var/lib/weaviate

volumes:
  weaviate_data:
```

Run with:
```bash
docker-compose -f docker-compose.weaviate.yml up -d
```

In [None]:
# Initialize Weaviate with proper error handling and schema
from langchain.vectorstores import Weaviate
import weaviate
import time

try:
    # Initialize client with retry logic
    max_retries = 3
    retry_delay = 2
    
    for attempt in range(max_retries):
        try:
            client = weaviate.Client(
                url="http://localhost:8080",
            )
            # Test connection
            client.schema.get()
            print("Connected to Weaviate ✅")
            break
        except Exception as e:
            if attempt < max_retries - 1:
                print(f"Connection attempt {attempt + 1} failed, retrying in {retry_delay} seconds...")
                time.sleep(retry_delay)
            else:
                raise e
    
    # Define class schema for our documents
    class_name = "RAGDocument"
    class_config = {
        "class": class_name,
        "vectorizer": "none",  # We'll provide our own vectors
        "properties": [
            {
                "name": "content",
                "dataType": ["text"]
            },
            {
                "name": "metadata",
                "dataType": ["object"]
            }
        ]
    }
    
    # Create schema if it doesn't exist
    if not client.schema.exists(class_name):
        client.schema.create_class(class_config)
        print(f"Created schema for class {class_name} ✅")
    
    # Create vector store
    weaviate_store = Weaviate.from_documents(
        documents=documents,
        embedding=embeddings,
        client=client,
        by_text=False,
        index_name=class_name
    )
    print("Weaviate vector store created successfully! ✅")
    
    # Test search
    query = "What is machine learning?"
    results = weaviate_store.similarity_search(query, k=2)
    print("\nTest search results:")
    for i, doc in enumerate(results, 1):
        print(f"\n{i}. {doc.page_content[:100]}...")
    
except Exception as e:
    print(f"\n❌ Error with Weaviate: {str(e)}")
    print("\nTroubleshooting steps:")
    print("1. Check if Weaviate is running: docker ps | grep weaviate")
    print("2. Verify the API is accessible: curl http://localhost:8080/v1/.well-known/ready")
    print("3. Check Weaviate logs: docker-compose -f docker-compose.weaviate.yml logs weaviate")

Error initializing Weaviate: Client.__init__() got an unexpected keyword argument 'url'

Make sure you have Weaviate running locally or update connection details for cloud deployment


## Vector Store Comparison

Here's a quick comparison of the vector stores we've looked at:

1. **FAISS**
   - ✅ Local, in-memory storage
   - ✅ Great for quick prototyping
   - ✅ No external dependencies
   - ❌ Not suitable for large-scale production

2. **Pinecone**
   - ✅ Fully managed cloud service
   - ✅ Highly scalable
   - ✅ Great for production
   - ❌ Paid service

3. **Milvus**
   - ✅ Open source
   - ✅ Can be self-hosted or cloud
   - ✅ Highly scalable
   - ❌ More complex setup

4. **Weaviate**
   - ✅ Advanced features (semantic search)
   - ✅ GraphQL interface
   - ✅ Can be self-hosted or cloud
   - ❌ More resource intensive

## Vector Store Performance Comparison

Let's compare the performance of different vector stores across key metrics:
1. Insertion Speed
2. Query Speed
3. Memory Usage
4. Accuracy

In [None]:
import time
import psutil
import numpy as np
from typing import List, Dict, Any

def measure_performance(
    store_name: str,
    vector_store: Any,
    test_queries: List[str],
    n_runs: int = 5
) -> Dict[str, Any]:
    """
    Measure vector store performance metrics.
    """
    results = {
        "name": store_name,
        "query_times": [],
        "memory_usage": [],
        "results": []
    }
    
    process = psutil.Process()
    
    for query in test_queries:
        query_times = []
        for _ in range(n_runs):
            # Clear memory
            if hasattr(vector_store, 'clear_cache'):
                vector_store.clear_cache()
                
            # Measure memory before
            mem_before = process.memory_info().rss / 1024 / 1024  # MB
            
            # Time the query
            start_time = time.time()
            search_results = vector_store.similarity_search(query, k=2)
            end_time = time.time()
            
            # Measure memory after
            mem_after = process.memory_info().rss / 1024 / 1024  # MB
            
            query_times.append(end_time - start_time)
            results["memory_usage"].append(mem_after - mem_before)
            results["results"].append(len(search_results))
    
        results["query_times"].append(np.mean(query_times))
    
    return results

# Test queries
test_queries = [
    "What is machine learning?",
    "Explain Python programming",
    "Tell me about DevOps",
    "What is artificial intelligence?",
    "Describe software development"
]

# Run performance tests for each available store
stores_to_test = {
    "FAISS": faiss_store,
    # "Pinecone": vector_store,  # Uncomment if Pinecone is configured
    "Milvus": milvus_store if 'milvus_store' in locals() else None,
    "Weaviate": weaviate_store if 'weaviate_store' in locals() else None
}

# Store results
performance_results = {}

for name, store in stores_to_test.items():
    if store is not None:
        try:
            print(f"\nTesting {name}...")
            results = measure_performance(name, store, test_queries)
            performance_results[name] = results
            
            print(f"Average query time: {np.mean(results['query_times']):.4f} seconds")
            print(f"Average memory usage: {np.mean(results['memory_usage']):.2f} MB")
            
        except Exception as e:
            print(f"❌ Error testing {name}: {str(e)}")
    else:
        print(f"\n⚠️ {name} not available for testing")

### Notes — Pinecone removed for this demo

- The Pinecone section has been disabled because the notebook experienced package/version conflicts during setup.
- If you want to re-enable Pinecone later:
  1. Create an index named `langchain-demo` in the Pinecone Console with dimension `3072` and metric `cosine`.
  2. Add `PINECONE_API_KEY` to your `.env` file.
  3. Re-insert the Pinecone cell or use LangChain's Pinecone helper methods.

Troubleshooting:
- Milvus: make sure Milvus is running locally on port `19530` or change `connection_args` to your Milvus host/port.
- Weaviate: ensure `weaviate-client` is the expected version; the client initializer may accept `url=` or `base_url=` depending on version. Update `weaviate.Client(base_url="http://localhost:8080")` if needed.

If you want, I can re-enable Pinecone in the notebook with a pinned, working client version and exact code — or we can keep it disabled and proceed with FAISS/Milvus/Weaviate.