# Embedding Model Comparison for RAG Systems

This notebook compares embedding models for Retrieval-Augmented Generation (RAG) systems. For each model it measures embedding generation time, query response time, and the relevance of the retrieved documents.

## Models Compared
- Llama 3.2 (4096 dimensions)
- Nomic Embed (768 dimensions)
- BGE-M3 (1024 dimensions)

## Process Overview
1. Load and preprocess documents
2. Configure embedding models
3. Run test queries
4. Evaluate and visualize performance metrics
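
The setup cell imports `ndcg_score` from scikit-learn, which suggests that relevance in step 4 is scored with normalized discounted cumulative gain (nDCG). As a rough illustration of what that metric measures, here is a minimal pure-Python sketch using linear gains and a log2 rank discount. The function names and the relevance grades are hypothetical and do not come from the notebook. Unlike scikit-learn, this version does no tie handling, and it computes the ideal ranking only over the retrieved list.

```python
import math

def dcg_at_k(relevances, k):
    """Discounted cumulative gain of the first k relevance grades, in ranked order."""
    return sum(rel / math.log2(rank + 2) for rank, rel in enumerate(relevances[:k]))

def ndcg_at_k(relevances, k):
    """DCG of the given ranking divided by the DCG of the ideal (sorted) ranking."""
    ideal = dcg_at_k(sorted(relevances, reverse=True), k)
    return dcg_at_k(relevances, k) / ideal if ideal > 0 else 0.0

# Hypothetical relevance grades for the top-4 chunks returned for one query
# (2 = highly relevant, 1 = partially relevant, 0 = irrelevant).
retrieved_grades = [2, 0, 1, 0]
score = ndcg_at_k(retrieved_grades, k=4)
print(f"nDCG@4 = {score:.4f}")
```

A ranking that already lists the most relevant chunks first scores 1.0. Pushing a relevant chunk further down the list lowers the score.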

In [None]:
# Install required dependencies
%pip install langchain langchain_community langchain_ollama langchain_text_splitters langchain_huggingface sentence-transformers numpy pandas matplotlib scikit-learn faiss-cpu

Note: you may need to restart the kernel to use updated packages.


## Setup and Dependencies

Import all required libraries for document loading, text splitting, embedding generation, evaluation, and visualization.

In [2]:
# Import all necessary libraries
import re
import time
import numpy as np
import pandas as pd
import matplotlib.pyplot as plt
from sklearn.metrics import ndcg_score

# Langchain imports
from langchain_core.documents import Document
from langchain_core.vectorstores import InMemoryVectorStore
from langchain_community.document_loaders import DirectoryLoader, TextLoader
from langchain_ollama import OllamaEmbeddings
from langchain_huggingface import HuggingFaceEmbeddings

# Configure plot settings
plt.rcParams.update({'font.size': 14})

## Document Loading and Processing

Load text documents from a directory and split them into smaller, context-specific chunks for embedding and retrieval.

In [4]:
# Load all text files from the specified directory
folder_path = "C:/Users/LocalUser/Desktop/RAG_BOT/RAG_BOT/Embedding/sampledata"  # Change as needed
# folder_path = "/content/sampledata"  # Change as needed
document_loader = DirectoryLoader(folder_path, glob="**/*.txt", loader_cls=TextLoader)
raw_documents = document_loader.load()

print(f"Number of documents loaded: {len(raw_documents)}")
print("Documents loaded:")
print(raw_documents)

Number of documents loaded: 1
Documents loaded:


In [5]:
# Split each raw document based on dashed line and create sub-documents
split_documents = []
for doc in raw_documents:
    # Use regex to split based on dashed lines like '-----...'
    parts = re.split(r'-{5,}', doc.page_content)
    for i, part in enumerate(parts):
        cleaned_part = part.strip()
        if cleaned_part:
            split_documents.append(
                Document(
                    page_content=cleaned_part,
                    metadata={"source": doc.metadata["source"], "part": i + 1}
                )
            )

print(f"Total split chunks: {len(split_documents)}")

# Copy the chunks into the final list that is passed to each embedding model
documents = [
    Document(
        page_content=chunk.page_content,
        metadata=chunk.metadata
    ) for chunk in split_documents
]

Total split chunks: 35
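
To make the splitting rule concrete, the following sketch applies the same `-{5,}` pattern to a short string. The helper name `split_on_dashes` and the sample text are assumptions made for illustration. The notebook itself runs the split inline on each document's `page_content`.

```python
import re

def split_on_dashes(text, min_dashes=5):
    """Split text on runs of at least `min_dashes` hyphens and drop empty pieces."""
    parts = re.split(r"-{%d,}" % min_dashes, text)
    return [part.strip() for part in parts if part.strip()]

# Hypothetical sample text; the real input is the content of each loaded file.
sample = "Intro section\n----------\nRefund policy\n-----\n\n-------\nShipping times"
chunks = split_on_dashes(sample)
print(chunks)
```

Two behaviours are worth noting. Runs shorter than five hyphens are not treated as separators. In the notebook's loop, the `part` index comes from `enumerate` over all pieces, including empty ones that are later skipped, so `part` numbers may contain gaps.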


## Embedding Models Configuration

Configure different embedding models with their respective dimensions for comparative evaluation.

In [6]:
# Set up the embedding models (all served by a local Ollama instance) with their nominal dimensions
embedding_models = {
    "Llama 3.2": {
        "model": OllamaEmbeddings(model="llama3.2:latest", base_url="http://localhost:11434"),
        "dimensions": 4096
    },
    "Nomic": {
        "model": OllamaEmbeddings(model="nomic-embed-text:latest", base_url="http://localhost:11434"),
        "dimensions": 768
    },
    "BGE-M3": {
        "model": OllamaEmbeddings(model="bge-m3:567m", base_url="http://localhost:11434"),
        "dimensions": 1024
    }
}
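
The `dimensions` values above are declared by hand rather than measured, so it may be worth checking them against what each model actually returns. The sketch below embeds one short probe string per model and compares the vector length with the declared value. The helper name `measure_dimensions` and the probe text are hypothetical. The only assumption about the models is that each one exposes `embed_query`, as LangChain embedding classes do.

```python
def measure_dimensions(models, probe_text="dimension check"):
    """Embed one short string per model and compare vector length with the declared value."""
    report = {}
    for name, info in models.items():
        # embed_query returns a single embedding as a list of floats
        vector = info["model"].embed_query(probe_text)
        report[name] = {
            "declared": info["dimensions"],
            "measured": len(vector),
            "match": len(vector) == info["dimensions"],
        }
    return report
```

Calling `measure_dimensions(embedding_models)` requires the Ollama server to be running with all three models pulled. Any entry with `"match": False` indicates that the declared dimension should be corrected before results are reported.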

### 1. Create FAISS Vector Stores for Each Model

In [None]:
# time, numpy, and matplotlib are already imported in the setup cell
from langchain_community.vectorstores import FAISS  # needs the faiss-cpu (or faiss-gpu) package

# Dictionary to store vector stores
vector_stores = {}
embedding_times = {}

# Create vector stores for each model
for model_name, model_info in embedding_models.items():
    print(f"Creating vector store for {model_name}...")
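    start_time = time.perf_counter()  # monotonic clock, better suited to timing than time.time()

    # Embed every chunk and build the FAISS index (this is the step being timed)
    vector_store = FAISS.from_documents(
        documents,
        model_info["model"]
    )

    end_time = time.perf_counter()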
    
    # Store vector store and embedding time
    vector_stores[model_name] = vector_store
    embedding_times[model_name] = end_time - start_time
    
    print(f"{model_name} - Embedding time: {embedding_times[model_name]:.2f} seconds")

Creating vector store for Llama 3.2...
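
The same timing pattern can be reused for the query response times mentioned in the introduction. Below is a minimal sketch of a generic timing helper. The name `time_call` and the `repeats` parameter are assumptions and are not part of the notebook. It reports the median of several runs, so a single slow call (for example, a model being loaded on first use) does not dominate the measurement.

```python
import statistics
import time

def time_call(fn, *args, repeats=3, **kwargs):
    """Call fn `repeats` times; return the last result and the median wall-clock seconds."""
    durations = []
    result = None
    for _ in range(repeats):
        start = time.perf_counter()
        result = fn(*args, **kwargs)
        durations.append(time.perf_counter() - start)
    return result, statistics.median(durations)
```

With the stores built above, a query could be timed as `docs, seconds = time_call(vector_stores["Nomic"].similarity_search, "example query", k=3)`, where the query string is a placeholder.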
