## Method 3

#### What happens in this method:

* This method uses my **custom-built and PyPI-published library: RetrievalMind** (`pip install RetrievalMind==0.1.3`), an advanced **Retrieval-Augmented Generation (RAG)** framework.  
* It automates the entire pipeline, including document loading, embedding generation, vector storage, retrieval, and dynamic reasoning through an integrated LLM.  
* Recommendations are generated dynamically: for each query, the most relevant product descriptions are retrieved and passed to the LLM, so the response reflects both the user's wording and the stored catalogue data.  
* The pipeline follows a modular, production-oriented design, with separate components for data retrieval, semantic matching, and language-model reasoning that can be scaled or swapped independently.

In [1]:
# Imports for RAG Integration
from RetrievalMind.embeddings_manager import EmbeddingManager
from RetrievalMind.data_ingestion.text_ingestor import TextDocumentIngestor
from RetrievalMind.vector_store_manager import VectorStore
from RetrievalMind.rag_retriver import Retrieval
import chromadb
import uuid
import logging
logging.basicConfig(level=logging.WARNING, format='[%(levelname)s] %(message)s')
logger = logging.getLogger(__name__)

# Imports for AI Agent
from langchain_core.output_parsers import StrOutputParser
from langchain_core.prompts import ChatPromptTemplate
from langchain_google_genai import GoogleGenerativeAI
from langchain_core.runnables import Runnable
import os

if os.getenv("RAG_DEBUG", "0") == "1":
    logger.setLevel(logging.DEBUG)

### Document Loading

The **`load_text_document()`** function handles loading and preparing text data for embedding and retrieval.  
It converts raw text files into structured document chunks with metadata for downstream vectorization.

**Key Steps:**
1. Initializes a `TextDocumentIngestor` for the given text file.  
2. Loads the file through the ingestor's loader, producing document objects that carry `page_content` and `metadata`.  
3. When `RAG_DEBUG=1`, logs the number of chunks and previews the first five.  
4. Returns the processed chunks for the embedding stage.

**Role in Pipeline:**  
Serves as the **data ingestion stage**, ensuring clean, structured input for semantic embedding and retrieval.

In [2]:
def load_text_document(file_path: str) -> list:
    """
    Load a text document through RetrievalMind's TextDocumentIngestor.

    Args:
        file_path (str): Path to the text file to load (e.g. "data.txt").

    Returns:
        list: Document objects, each exposing `page_content` and `metadata`.
    """
    # For PDFs, RetrievalMind's PDFDocumentIngestor could be used instead (not imported here):
    # pdf_ingestor = PDFDocumentIngestor(file_path=file_path, loader_type='mu')
    text_ingestor = TextDocumentIngestor(file_path=file_path)
    text_loader = text_ingestor.load_document()  # returns a loader object
    document_chunks = text_loader.load()          # runs the loader and returns the documents
    logger.debug(f"Loaded {len(document_chunks)} document chunks from {file_path}")

    # Debug: preview first 5 document chunks
    for idx, chunk in enumerate(document_chunks[:5]):
        logger.debug(f"Document Chunk {idx} preview: {repr(chunk.page_content[:100])}")
    
    return document_chunks
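Judging by the run log at the end of this section ("Current document count: 1", "Skipped 1 duplicates"), `data.txt` appears to be stored as a single document, so no splitting seems to happen at this stage. If the file grows, a chunking step could be added before embedding. The sketch below is a minimal, dependency-free character-window splitter; `split_into_chunks`, `chunk_size` and `overlap` are hypothetical names, not part of RetrievalMind, and a library text splitter would be the more usual choice in practice.

```python
def split_into_chunks(text: str, chunk_size: int = 500, overlap: int = 50) -> list[str]:
    """Split text into fixed-size character windows that overlap by `overlap` characters.

    Hypothetical helper for illustration; not part of RetrievalMind.
    """
    if chunk_size <= 0:
        raise ValueError("chunk_size must be positive")
    if not 0 <= overlap < chunk_size:
        raise ValueError("overlap must satisfy 0 <= overlap < chunk_size")
    step = chunk_size - overlap  # how far each window advances
    chunks = []
    for start in range(0, len(text), step):
        chunks.append(text[start:start + chunk_size])
        if start + chunk_size >= len(text):
            break  # this window already reaches the end of the text
    return chunks


# Tiny example: windows of 4 characters sharing 1 character with their neighbour.
print(split_into_chunks("abcdefghij", chunk_size=4, overlap=1))
```

Overlap keeps text that straddles a boundary visible in both neighbouring chunks, at the cost of some duplicated text in the index.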

### Document Embedding Generation

The **`generate_document_embeddings()`** function encodes document chunks into dense semantic vectors (384 dimensions for the default `all-MiniLM-L6-v2`) using a pretrained **SentenceTransformer** model.  
These embeddings serve as the core representations for similarity search and context retrieval.

**Key Steps:**
1. Extracts text content from document chunks.  
2. Initializes an `EmbeddingManager` with the chosen model (`all-MiniLM-L6-v2` by default).  
3. Generates semantic embeddings for all text segments in batch mode.  
4. Logs processing details, including model and chunk count.  
5. Returns both the generated embeddings and the embedding manager.

**Role in Pipeline:**  
Transforms raw text into semantic vectors, enabling meaningful retrieval and reasoning within the RAG framework.

In [3]:
def generate_document_embeddings(document_chunks: list, embedding_model: str = "all-MiniLM-L6-v2") -> tuple:
    """
    Generate embeddings for a list of document chunks using a SentenceTransformer model.

    Args:
        document_chunks (list): List of document chunks.
        embedding_model (str): Name of the embedding model (default "all-MiniLM-L6-v2").

    Returns:
        tuple: embeddings list, EmbeddingManager instance
    """
    texts = [chunk.page_content for chunk in document_chunks]
    embedding_manager = EmbeddingManager(model_name=embedding_model)
    embeddings = embedding_manager.generate_embeddings(texts)
    logger.debug(f"Generated embeddings for {len(texts)} chunks using model '{embedding_model}'")
    return embeddings, embedding_manager
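Embeddings are useful because nearby vectors correspond to similar text. As a rough illustration, the sketch below computes cosine similarity between toy 3-dimensional vectors in plain Python. The vectors are made up for demonstration (real `all-MiniLM-L6-v2` embeddings have 384 dimensions), and ChromaDB's default distance is squared L2 rather than cosine, so its scores will not match these numbers directly.

```python
import math


def cosine_similarity(a: list[float], b: list[float]) -> float:
    """Cosine of the angle between two vectors: 1.0 = same direction, 0.0 = orthogonal."""
    if len(a) != len(b):
        raise ValueError("vectors must have the same length")
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(y * y for y in b))
    if norm_a == 0 or norm_b == 0:
        return 0.0  # undefined for zero vectors; treat as "no similarity"
    return dot / (norm_a * norm_b)


# Made-up vectors standing in for embeddings of three product descriptions.
hoodie = [0.9, 0.1, 0.0]
joggers = [0.8, 0.3, 0.1]
gown = [0.0, 0.2, 0.9]

# The hoodie vector points in nearly the same direction as joggers, unlike gown.
print(round(cosine_similarity(hoodie, joggers), 3))
print(round(cosine_similarity(hoodie, gown), 3))
```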

### Vector Store Creation and Persistence

The **`store_documents_in_vector_store()`** function handles the storage of document embeddings and metadata in a **ChromaDB vector database**.  
It establishes a persistent, queryable foundation for semantic retrieval in the RAG pipeline.

**Key Steps:**
1. Creates the persistence directory if it does not exist.  
2. Initializes a persistent ChromaDB client and collection.  
3. Iterates through each document and embedding, generating unique IDs and metadata.  
4. Inserts all records — documents, embeddings, and metadata — into the collection.  
5. Logs the number of inserted records and the collection's total document count.

**Role in Pipeline:**  
Transforms generated embeddings into a **persistent vector knowledge base**, enabling scalable, reusable, and low-latency semantic search.

In [4]:
def store_documents_in_vector_store(document_chunks: list, embeddings: list, collection_name: str, persist_dir: str, doc_type: str = "PDF") -> object:
    """
    Store document chunks and embeddings in a ChromaDB vector store.

    Args:
        document_chunks (list): List of document chunks.
        embeddings (list): Corresponding embeddings for each chunk.
        collection_name (str): Name of the vector store collection.
        persist_dir (str): Directory where the vector store is persisted.
        doc_type (str): Type of documents being stored (default "PDF").

    Returns:
        chromadb Collection: The populated ChromaDB collection (not a RetrievalMind VectorStore).
    """
    # Ensure persistence directory exists
    os.makedirs(persist_dir, exist_ok=True)

    # Initialize persistent ChromaDB client and collection
    client = chromadb.PersistentClient(path=persist_dir)
    collection = client.get_or_create_collection(
        name=collection_name,
        metadata={"description": f"{doc_type} document embeddings for RAG pipeline"}
    )

    # Prepare records for ChromaDB
    ids, metadatas, documents_text, embeddings_list = [], [], [], []
    for i, (doc, emb) in enumerate(zip(document_chunks, embeddings)):
        doc_id = f"doc_{uuid.uuid4().hex[:8]}_{i}"
        ids.append(doc_id)

        # Copy the loader's metadata (it already includes "source" when the loader sets one);
        # avoid writing None values, which ChromaDB metadata may reject.
        metadata = dict(getattr(doc, "metadata", {}) or {})
        metadata.update({
            "doc_index": i,
            "content_length": len(getattr(doc, "page_content", "")),
        })
        metadatas.append(metadata)

        documents_text.append(getattr(doc, "page_content", ""))
        embeddings_list.append(emb.tolist() if hasattr(emb, 'tolist') else list(emb))

    collection.add(
        ids=ids,
        documents=documents_text,
        metadatas=metadatas,
        embeddings=embeddings_list
    )

    logger.debug(f"Successfully added {len(documents_text)} documents to '{collection_name}'.")
    logger.debug(f"Total documents in collection: {collection.count()}")
    return collection

### Querying the Vector Store

The **`query_vector_store()`** function performs **semantic retrieval** from a ChromaDB vector store using a natural language query.  
It encodes the query into an embedding, searches stored vectors for semantically similar results, and returns top-*k* matches with metadata and similarity scores.

**Key Steps:**
1. Generate an embedding for the input query.  
2. Query the ChromaDB collection for nearest semantic matches.  
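3. Convert each distance *d* into a similarity score 1 / (1 + *d*), which lies in (0, 1] and grows as documents get closer.  
4. Drop hits below `min_score`; if none remain, or a teacher-related query returns no teacher information, fall back to a keyword scan over `raw_documents`.  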
5. Log all retrieval steps for transparency and debugging.

**Role in Pipeline:**  
Acts as the **retrieval core** of the RAG architecture — connecting user intent to stored knowledge through semantic similarity.

In [5]:
def query_vector_store(collection: object, embedding_manager: EmbeddingManager, query_text: str, top_k: int = 3, min_score: float = 0.0, raw_documents: list | None = None) -> list:
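    """
    Perform semantic search directly against a ChromaDB collection.

    Args:
        collection: ChromaDB collection returned by store_documents_in_vector_store().
        embedding_manager (EmbeddingManager): Embedding generator used for the query.
        query_text (str): Query string for retrieval.
        top_k (int): Number of nearest neighbours to request from Chroma.
        min_score (float): Minimum similarity score, 1 / (1 + distance), to keep a hit.
        raw_documents (list | None): Original documents used for the keyword fallback.

    Returns:
        list: Dicts with 'id', 'content' and 'similarity_score'; semantic hits
        also carry 'metadata' and 'distance'.
    """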
    # Direct Chroma query using the provided collection
    results = []
    try:
        try:
            q_embs = embedding_manager.generate_embeddings([query_text])
            q_emb = q_embs[0] if hasattr(q_embs, '__len__') and len(q_embs) > 0 else q_embs
            logger.debug(f"Query embedding length: {getattr(q_emb, 'shape', None) or len(q_emb)}")
            logger.debug(f"Query embedding (first 6 values): {q_emb[:6]}")
        except Exception as e:
            logger.debug(f"Failed to generate query embedding: {e}")
            raise

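        # Accept both NumPy arrays and plain lists from the embedding manager
        q_vec = q_emb.tolist() if hasattr(q_emb, 'tolist') else list(q_emb)
        raw = collection.query(query_embeddings=[q_vec], n_results=top_k)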
        raw_dist = raw.get('distances', [[]])[0] if raw.get('distances') else []
        raw_docs = raw.get('documents', [[]])[0] if raw.get('documents') else []
        raw_ids = raw.get('ids', [[]])[0] if raw.get('ids') else []
        raw_metas = raw.get('metadatas', [[]])[0] if raw.get('metadatas') else []

        interpreted = []
        for idx, d in enumerate(raw_dist):
            try:
                similarity = 1.0 / (1.0 + float(d))
            except Exception:
                similarity = 0.0

            if similarity >= min_score and idx < len(raw_docs):
                interpreted.append({
                    'id': raw_ids[idx] if idx < len(raw_ids) else f'raw-{idx}',
                    'content': raw_docs[idx],
                    'metadata': raw_metas[idx] if idx < len(raw_metas) else {},
                    'similarity_score': round(similarity, 4),
                    'distance': round(d, 4)
                })

        logger.info(f"Direct Chroma reinterpretation returned {len(interpreted)} results")
        results = interpreted
    except Exception as e:
        logger.debug(f"Direct Chroma query failed: {e}")

    # If still no results or results don't answer a teacher-related query, attempt a keyword fallback over raw documents
    lower_q = query_text.lower()
    teacher_query = any(k in lower_q for k in ['teacher', 'teach', 'instructor', 'who are the teachers'])

    # If no results, or this is a teacher query and top results don't contain teacher info, run fallback
    need_fallback = (len(results) == 0 and raw_documents) or (
        teacher_query and raw_documents and not any(
            any(tok in (r.get('content') or '').lower() for tok in ['teacher', 'instructor', 'himanshu', 'mihir', 'shivam'])
            for r in results
        )
    )

    if need_fallback:
        logger.warning("Running keyword fallback search over raw documents (teacher-related or no direct hits).")
        fallback = []
        for idx, doc in enumerate(raw_documents):
            text = getattr(doc, 'page_content', None) or (doc.get('page_content') if isinstance(doc, dict) else str(doc))
            if not text:
                continue
            if any(k in text.lower() for k in ['teacher', 'instructor', 'himanshu', 'mihir', 'shivam']):
                fallback.append({
                    'id': f'fallback-{idx}',
                    'similarity_score': 1.0,
                    'content': text,
                })
        logger.info(f"Fallback matched {len(fallback)} documents")
        if fallback:
            return fallback

    return results
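`query_vector_store()` turns each Chroma distance *d* into a score 1 / (1 + *d*), which maps *d* = 0 to 1.0 and decreases toward 0 as *d* grows; it is a ranking-friendly rescaling, not a calibrated probability. The sketch below isolates that conversion and the `min_score` filter so the arithmetic can be checked on its own; `score_hits` and the hit dictionaries are illustrative names, not RetrievalMind or ChromaDB APIs.

```python
def distance_to_similarity(distance: float) -> float:
    """Map a non-negative distance to (0, 1]: 0 -> 1.0, larger distances -> smaller scores."""
    return 1.0 / (1.0 + float(distance))


def score_hits(ids: list[str], distances: list[float], min_score: float = 0.0) -> list[dict]:
    """Attach similarity scores and drop hits below min_score (same rule as query_vector_store)."""
    hits = []
    for doc_id, d in zip(ids, distances):
        similarity = distance_to_similarity(d)
        if similarity >= min_score:
            hits.append({"id": doc_id, "distance": d, "similarity_score": round(similarity, 4)})
    return hits


# Scores are 1.0, 0.5 and 0.25; with min_score=0.3 only "a" and "b" survive.
print(score_hits(["a", "b", "c"], [0.0, 1.0, 3.0], min_score=0.3))
```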

### LLM Initialization — Google Gemini

This section defines the **`initialize_llm()`** function, which configures and authenticates the **Google Gemini Large Language Model (LLM)** for use within the RAG or conversational reasoning pipeline.

**Objective:**  
To securely initialize the Gemini API client using an API key stored in an environment variable, so that the LLM is accessible without exposing credentials in the codebase.

**Process Overview:**
1. Retrieves the API key from a specified environment variable (`Gemini_APIKEY` by default).  
2. Raises a `ValueError` that names the missing variable if the key is not set, so configuration errors surface immediately without revealing any secret.  
3. Initializes the **`GoogleGenerativeAI`** client using the chosen Gemini model (here, `gemini-2.5-flash`), which balances speed and reasoning performance.  
4. Returns an authenticated instance of the Gemini model, ready to handle query reasoning, summarization, or contextual answer generation.

**Security and Design Notes:**
- Storing API keys in environment variables prevents accidental exposure in version control systems.  
- Because the model is a LangChain Runnable, another LangChain LLM integration (e.g., for GPT, Claude, or Llama) can be substituted by changing only this function.  
- This initialization step forms the entry point for integrating **LLM reasoning capabilities** into the retrieval-augmented generation workflow.

In [6]:
def initialize_llm(api_key_env_var: str = "Gemini_APIKEY") -> GoogleGenerativeAI:
    """
    Initialize the Google Gemini LLM using the API key from environment variables.
    """
    api_key = os.getenv(api_key_env_var)
    if not api_key:
        raise ValueError(f"API key not found in environment variable '{api_key_env_var}'")
    return GoogleGenerativeAI(model="gemini-2.5-flash", api_key=api_key)

### Prompt Design for the Vibe-Based Fashion Recommender

This section defines the **`get_prompt()`** function, which constructs a structured **chat prompt template** to guide the behavior of the fashion recommendation LLM.  
The prompt is designed to align the model’s responses with the tone, purpose, and style expectations of a digital fashion stylist, ensuring consistency and domain relevance across user interactions.

**Objective:**  
To define a reusable and context-aware prompt framework that enables the AI assistant to interpret user queries (vibe descriptions) and generate precise, stylistically coherent fashion recommendations based on retrieved data.

**Prompt Structure:**
1. **System Role Definition:**  
   Establishes the assistant’s identity as a *professional AI fashion stylist* capable of interpreting fashion aesthetics and recommending suitable outfits.  
   The model is guided to maintain an elegant, helpful, and fashion-oriented tone throughout interactions.

2. **User Context Definition:**  
   Provides the model with the retrieved product data and the user’s query (vibe description).  
   The prompt enforces grounded reasoning — the AI must base its recommendations **only** on the provided data, avoiding fabrication or external inference.

3. **Instructional Directives:**  
   - Focus on product descriptions that match the user’s vibe.  
   - Justify recommendations briefly (e.g., texture, color palette, or design aesthetic).  
   - Respond with polished, concise, and brand-consistent phrasing.  
   - Include a polite fallback response if no relevant items are found.  
   - Return plain text, without markdown or other special formatting.

**Design Rationale:**  
This prompt structure ensures a balance between **creativity** and **factual grounding**, allowing the model to behave like an intelligent stylist rather than a generic chatbot.  
It provides a controlled conversational scaffold for the RAG pipeline, ensuring that retrieved data is meaningfully transformed into high-quality recommendations.

In [7]:
def get_prompt() -> ChatPromptTemplate:
    """
    Returns a ChatPromptTemplate for the RAG assistant acting as a vibe-based fashion stylist.
    """
    return ChatPromptTemplate.from_messages([
        ('system', """You are a helpful and knowledgeable AI fashion stylist and product recommender for an online fashion platform. 
        Your role is to assist users in discovering fashion items that match their personal vibe, mood, or style preferences. 
        You understand aesthetic themes (like 'urban chic', 'boho casual', or 'minimal elegance') and can recommend matching outfits from the retrieved product data. 
        Always maintain a professional, engaging, and style-oriented tone — similar to a fashion consultant."""),


        ('user', """Here are the retrieved product descriptions and metadata relevant to the user's vibe or style preference:
        {retrieved_docs_from_rag}
        User Query: {user_query}

        Instructions:

        - Recommend the most relevant fashion items based **only** on the retrieved product information.
        - Do **not** fabricate or assume product details not present in the data above.
        - If no relevant match is found, respond politely by saying: 
        "It seems we don’t have a perfect match for this vibe right now. You can try a different style keyword, or check back soon for new arrivals!"
        - Keep the tone sophisticated yet accessible, similar to a modern online stylist or fashion consultant.
        - Mention why an item fits the described vibe (e.g., color palette, texture, aesthetic, or design).
        - Keep responses concise, elegant, and engaging.
        - Please provide the answer in plain text format without using markdown, bold text, or any special formatting.
        Provide your recommendation below:""")

    ])

### Chain Construction

The **`get_chain()`** function builds the generation stage of the RAG pipeline by composing the prompt template, the language model, and an output parser with LangChain's `|` operator.  
When the chain is invoked, the filled-in prompt is sent to the LLM and `StrOutputParser` returns the model's reply as a plain string.


In [8]:
def get_chain(llm: GoogleGenerativeAI, prompt_template: ChatPromptTemplate) -> Runnable:
    """
    Create a Runnable chain combining the prompt template, LLM, and output parser.
    """
    return prompt_template | llm | StrOutputParser()
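The `|` operator is LangChain Expression Language (LCEL) syntax: each Runnable's output becomes the next one's input, and `invoke()` runs the whole sequence. As a dependency-free illustration of that data flow (not LangChain's actual implementation), the sketch below chains three plain functions; `fake_prompt`, `fake_llm` and `fake_parser` are hypothetical stand-ins for the prompt template, the Gemini model and `StrOutputParser`.

```python
from functools import reduce
from typing import Any, Callable


def compose(*steps: Callable[[Any], Any]) -> Callable[[Any], Any]:
    """Return a function that feeds its input through each step from left to right."""
    return lambda value: reduce(lambda acc, step: step(acc), steps, value)


def fake_prompt(inputs: dict) -> str:
    # Stand-in for ChatPromptTemplate: fill the placeholders with the input values.
    return f"Docs: {inputs['retrieved_docs_from_rag']}\nUser Query: {inputs['user_query']}"


def fake_llm(prompt: str) -> str:
    # Stand-in for the LLM: echo the last prompt line back in upper case.
    return prompt.splitlines()[-1].upper()


def fake_parser(output: str) -> str:
    # Stand-in for StrOutputParser: hand the model output back as a plain string.
    return str(output)


chain = compose(fake_prompt, fake_llm, fake_parser)  # analogous to prompt | llm | parser
print(chain({"retrieved_docs_from_rag": "Street Hoodie", "user_query": "energetic urban chic"}))
```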

### Query Execution with RAG

The **`ask_with_rag()`** function generates the final AI response by combining the user’s query with retrieved context documents.  
It feeds both into the runnable chain, enabling the LLM to produce a context-aware and grounded recommendation.

In [9]:
def ask_with_rag(chain: Runnable, query: str, retrieved_docs: list) -> str:
    """
    Generate an AI answer for a query based on RAG retrieved documents.
    """
    docs_text = "\n".join([doc['content'] for doc in retrieved_docs])
    input_mapping = {
        "user_query": query,
        "retrieved_docs_from_rag": docs_text
    }
    return chain.invoke(input_mapping)
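`ask_with_rag()` concatenates the retrieved texts with newline separators, so the model cannot easily tell where one product description ends and the next begins. A possible refinement, sketched below as an assumption rather than part of RetrievalMind, is to number each document and keep its ID and score so the answer can refer to distinct items. `format_context` is a hypothetical helper, and the sample records are invented for illustration.

```python
def format_context(retrieved_docs: list[dict], max_chars: int = 1000) -> str:
    """Render retrieved documents as numbered, clearly separated blocks for the prompt."""
    if not retrieved_docs:
        return "(no matching products were retrieved)"
    blocks = []
    for rank, doc in enumerate(retrieved_docs, start=1):
        content = (doc.get("content") or "").strip()[:max_chars]  # cap very long documents
        doc_id = doc.get("id", f"doc-{rank}")
        score = doc.get("similarity_score")
        header = f"[{rank}] id={doc_id}" + (f" score={score}" if score is not None else "")
        blocks.append(f"{header}\n{content}")
    return "\n\n".join(blocks)


docs = [
    {"id": "a1", "similarity_score": 0.82, "content": "Street Hoodie with graffiti prints."},
    {"id": "b2", "content": "Athletic Joggers, slim fit."},
]
print(format_context(docs))
```

Inside `ask_with_rag()`, this would replace the `docs_text = ...` line, e.g. `docs_text = format_context(retrieved_docs)`.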

### Main RAG Pipeline

The **`main()`** function executes the complete RAG workflow — loading text data, generating embeddings, storing them in a vector database, and retrieving the most relevant documents for a given query.  
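It integrates embedding, storage, and retrieval into a single pipeline using RetrievalMind's `VectorStore` and `Retrieval` classes (rather than the standalone ChromaDB helpers defined above) and returns the retrieved documents; the LLM call happens in the next cell.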

In [10]:
def main(query: str):
    """
    Full RAG pipeline: load documents, generate embeddings, store/retrieve, and prepare for LLM query.
    """
    # Configuration
    text_file_path = "data.txt"
    vector_collection_name = "nexora_collection"
    vector_store_directory = "data/nexora_vector_store"

    # Load the product descriptions from the text file
    text_chunks = load_text_document(text_file_path)

    # Generate embeddings
    embeddings, embedding_manager = generate_document_embeddings(text_chunks)

    # Store documents in vector store using RetrievalMind's VectorStore
    try:
        vector_store = VectorStore(collection_name=vector_collection_name, persist_directory=vector_store_directory, document_type="PDF")
        vector_store.add_document(documents=text_chunks, embeddings=embeddings)
        logger.debug(f"Added {len(text_chunks)} documents to RetrievalMind VectorStore '{vector_collection_name}'")
    except Exception as e:
        logger.error(f"Failed to store documents in RetrievalMind VectorStore: {e}")
        raise

    # Retrieve relevant documents using RetrievalMind's Retrieval
    try:
        retrieval_pipeline = Retrieval(vector_store=vector_store, embedding_manager=embedding_manager)
        retrieved_docs = retrieval_pipeline.retrieve(query=query, top_k=5, score_threshold=0)
        logger.info(f"RetrievalMind returned {len(retrieved_docs)} results")
    except Exception as e:
        logger.error(f"RetrievalMind retrieval failed: {e}")
        retrieved_docs = []

    # Preview retrieved docs
    for doc in retrieved_docs:
        logger.debug("Retrieved Document ID: %s", doc.get('id'))
        logger.debug("Similarity Score: %s", doc.get('similarity_score'))
        logger.debug("Content Preview: %s...", (doc.get('content') or '')[:500])
    return retrieved_docs

### End-to-End Execution

This block runs the complete RAG pipeline: retrieves context documents, initializes the LLM, builds the prompt chain, and generates the final vibe-based fashion recommendation.

In [11]:
query = "energetic urban chic"

retrieved_docs = main(query=query)
logger.debug("retrieved_docs: %s", retrieved_docs)

llm_model = initialize_llm()

prompt_template = get_prompt()
rag_chain = get_chain(llm_model, prompt_template)

# Generate the recommendation from the query and the retrieved context
answer = ask_with_rag(rag_chain, query=query, retrieved_docs=retrieved_docs)

print("-" * 80)
print("Query: ", query)
print("Recommendation: ", answer)

[INFO] Vector store ready: 'nexora_collection'
[INFO] Current document count: 1
[INFO] No new documents to add to 'nexora_collection'. Skipped 1 duplicates.
--------------------------------------------------------------------------------
Query:  energetic urban chic
Recommendation:  For an energetic urban chic vibe, I have curated a selection of items that perfectly embody this dynamic aesthetic.

First, the Street Hoodie is an excellent choice. Its graffiti prints and bold typography instantly resonate with an energetic urban feel, offering a youthful edge that truly energizes your look.

Next, consider the Athletic Joggers. These slim-fit joggers are ideal for urban streetwear looks, blending performance with everyday versatility, which perfectly aligns with the active and energetic dimension of your desired style.

Finally, to infuse that essential 'chic' element with an energetic twist, the Leather Biker Jacket is an impeccable piece. Its edgy design, complete with silver zippers a