## Method 2

#### What happens in this method:

* I’ll use **ChromaDB**, a lightweight vector database, to store and search embeddings.
* Product descriptions are embedded once and saved in the vector store; each user query is embedded when it arrives.
* When a new vibe query comes in, the system retrieves the most similar items directly from ChromaDB instead of re-embedding the whole catalog.
* I’ll also time each stage with `timeit.default_timer` to compare performance with Method 1.
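
One way to set up the timing comparison is a small helper around `timeit.default_timer`. This is only a minimal sketch: `time_call` and the two placeholder functions are hypothetical stand-ins, not the actual Method 1 or Method 2 code. It shows the pattern of timing "recompute on every call" against "look up something computed once".

```python
from timeit import default_timer as timer


def time_call(func, *args, repeats=5, **kwargs):
    """Run func several times; return (last_result, best_seconds)."""
    best = float("inf")
    result = None
    for _ in range(repeats):
        start = timer()
        result = func(*args, **kwargs)
        best = min(best, timer() - start)
    return result, best


# Hypothetical stand-ins for the two retrieval strategies.
def recompute_everything(n):
    # Mimics redoing all the work on every query.
    return sum(i * i for i in range(n))


def lookup_precomputed(cache, n):
    # Mimics reading a value that was computed once up front.
    return cache[n]


n = 200_000
cache = {n: recompute_everything(n)}

slow_result, slow_time = time_call(recompute_everything, n)
fast_result, fast_time = time_call(lookup_precomputed, cache, n)

print(f"recompute: {slow_time:.6f} s | lookup: {fast_time:.6f} s")
```

Taking the best of several runs reduces noise from one-off effects such as cache warm-up, which matters when the measured times are only a few milliseconds.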

## Related Projects: Advanced RAG and AI Systems Engineering

Below is a collection of my advanced projects focused on **Retrieval-Augmented Generation (RAG)** architectures, **Agentic AI systems**, and **end-to-end AI pipeline integration**.  
These repositories collectively demonstrate expertise in building production-level retrieval systems, AI orchestration frameworks, and intelligent agent design.

### 1. [Advanced RAG from Scratch](https://github.com/Himanshu7921/Advanced-RAG-From-Scratch)
A complete RAG implementation built entirely from the ground up, showcasing custom retriever logic, context management, and LLM integration for domain-specific knowledge retrieval.

### 2. [ModernAgeCoders — AI Chatbot Backend (Freelance Project)](https://github.com/Himanshu7921/ModernAgeCoder-backend)
Developed a scalable AI backend service for ModernAgeCoders, integrating conversational AI pipelines with vector-based retrieval and contextual memory systems.

### 3. [RetrievalMind — Custom Advanced RAG Framework](https://github.com/Himanshu7921/RetrievalMind)
A self-developed and PyPI-published RAG framework featuring dynamic retrieval orchestration, multi-source knowledge ingestion, and modular retriever-LLM interaction design.

### 4. [Agentic AI Engineering](https://github.com/Himanshu7921/Agentic-AI-Engineering)
An exploration of next-generation **Agentic AI architectures**, emphasizing reasoning, autonomy, and multi-step goal execution through intelligent agent collaboration.

### 5. [PolicyPal — Advanced RAG Agent System](https://github.com/Himanshu7921/PolicyPal-RAG-Agent)
Implements advanced RAG-based policy analysis and summarization, demonstrating deep integration of knowledge retrieval, context refinement, and prompt optimization.

### 6. [AI Startup Analyst](https://github.com/Himanshu7921/ai-startup-analyst)
An AI-powered analytical system that leverages retrieval pipelines and LLM reasoning to evaluate and summarize startup ecosystems, funding trends, and innovation patterns.

### 7. [Agentarium — Multi-Agent Systems Playground](https://github.com/Himanshu7921/Agentarium)
An experimental framework for developing and orchestrating **multi-agent systems** using the **Agentic design pattern**, enabling coordinated autonomous behaviors and modular agent collaboration.

In [1]:
# Import Libraries
import chromadb
from chromadb.utils import embedding_functions
from sentence_transformers import SentenceTransformer
from sklearn.metrics.pairwise import cosine_similarity
from timeit import default_timer as timer
import pandas as pd
import numpy as np

### Step 0: Data Collection and Preparation

This step defines the **`get_data()`** function, which creates a small, curated mock dataset of fashion products used for semantic similarity experiments.

**Objective:**  
To simulate a real-world product catalog by providing structured data containing product descriptions and stylistic attributes for testing the Vibe Matcher system.

**Dataset Composition:**
- Each entry in the dataset represents a unique fashion product.  
- Each product contains two key fields:  
  - **`desc`** — A detailed natural-language description that conveys the product’s visual style, material, and mood.  
  - **`vibes`** — A list of descriptive tags that summarize the product’s aesthetic (e.g., `"boho"`, `"urban"`, `"casual"`).

**Purpose in the Pipeline:**  
This dataset acts as the foundational input for generating text embeddings.  
Each product’s description is later transformed into a numerical vector and stored in **ChromaDB**, enabling semantic retrieval based on user-defined vibe queries.

In [2]:
def get_data():
    fashion_data = {
        "Boho Dress": {
            "desc": "A flowy, earthy-toned dress made from lightweight cotton fabric, perfect for outdoor festivals or beach walks. Its loose fit and floral embroidery reflect a free-spirited, nature-inspired aesthetic.",
            "vibes": ["boho", "cozy", "free-spirited"]
        },
        "Street Hoodie": {
            "desc": "An oversized streetwear hoodie featuring graffiti prints and bold typography. Ideal for casual city outings or skate park sessions, it embodies an energetic and urban vibe with a youthful edge.",
            "vibes": ["urban", "energetic", "casual"]
        },
        "Minimalist Blazer": {
            "desc": "A structured blazer with clean lines and a modern silhouette, designed for professionals who appreciate simplicity and elegance. Its neutral tones and tailored fit exude confidence and minimalism.",
            "vibes": ["minimal", "formal", "modern"]
        },
        "Denim Jacket": {
            "desc": "A rugged blue denim jacket with faded wash and metal buttons. This timeless piece adds a vintage cool factor and pairs effortlessly with both casual tees and stylish dresses.",
            "vibes": ["vintage", "casual", "cool"]
        },
        "Floral Maxi Dress": {
            "desc": "A soft chiffon maxi dress with delicate floral prints, ideal for garden parties or summer dates. The pastel hues and flowing fabric create a romantic, bohemian mood.",
            "vibes": ["feminine", "romantic", "boho"]
        },
        "Athletic Joggers": {
            "desc": "Slim-fit athletic joggers made from breathable stretch fabric for maximum comfort. Whether for gym sessions or urban streetwear looks, they combine performance and everyday versatility.",
            "vibes": ["sporty", "urban", "active"]
        },
        "Leather Biker Jacket": {
            "desc": "A premium black leather biker jacket with silver zippers and quilted shoulders. It adds a rebellious charm and chic boldness, making it a statement piece for any wardrobe.",
            "vibes": ["edgy", "chic", "rock"]
        },
        "Cozy Knit Sweater": {
            "desc": "A warm, chunky knit sweater crafted from soft wool-blend yarn. Designed for cold winter evenings, it wraps you in comfort and homeliness while maintaining a relaxed charm.",
            "vibes": ["cozy", "soft", "homey"]
        },
        "Silk Evening Gown": {
            "desc": "A luxurious silk gown with a flowing silhouette and subtle shimmer. Perfect for red-carpet events or elegant dinners, it radiates sophistication and timeless glamour.",
            "vibes": ["luxury", "classy", "glamorous"]
        },
        "Cargo Pants": {
            "desc": "Durable cargo pants featuring multiple utility pockets and adjustable straps. Their relaxed fit and earthy tones give off an adventurous, practical streetwear vibe.",
            "vibes": ["streetwear", "practical", "adventure"]
        }
    }
    return fashion_data

### Step 1: Initializing the ChromaDB Vector Store

This step sets up the **ChromaDB client and collection**, which serve as the vector database for storing and retrieving semantic embeddings.

**Objective:**  
To create a dedicated ChromaDB collection that manages the storage, indexing, and querying of product embeddings, enabling efficient vector-based search.

**Process Overview:**
1. Initializes a ChromaDB client instance for managing collections and data operations.  
2. Defines an embedding function using the `SentenceTransformerEmbeddingFunction` with the pretrained `all-MiniLM-L6-v2` model.  
3. Creates (or retrieves, if already existing) a collection named **`fashion_collection`**, which acts as the vector storage layer for all embedded fashion descriptions.  
4. Returns the collection object for downstream use in the embedding and query stages.  

Note that `chromadb.Client()` creates an in-memory client, so the collection lives only for the current session. The `model` argument is accepted for consistency with the rest of the pipeline but is not used here: the embedding function loads its own copy of `all-MiniLM-L6-v2`.  
The collection is the foundation for all subsequent embedding and similarity search operations.
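
To make the role of the collection concrete, here is a toy, pure-Python stand-in that stores vectors by ID and returns the nearest ones by squared L2 distance. It is only an illustration of the idea: `ToyCollection` and its hand-written 3-dimensional vectors are hypothetical, and ChromaDB itself relies on an approximate nearest-neighbour index rather than the brute-force scan shown here.

```python
class ToyCollection:
    """Brute-force, in-memory stand-in for a vector collection (illustration only)."""

    def __init__(self):
        self._vectors = {}    # id -> embedding vector
        self._metadatas = {}  # id -> metadata dict

    def add(self, ids, embeddings, metadatas):
        for id_, vec, meta in zip(ids, embeddings, metadatas):
            self._vectors[id_] = list(vec)
            self._metadatas[id_] = dict(meta)

    def query(self, query_embedding, n_results=3):
        """Return up to n_results (id, squared_l2_distance, metadata), nearest first."""
        scored = []
        for id_, vec in self._vectors.items():
            dist = sum((q - v) ** 2 for q, v in zip(query_embedding, vec))
            scored.append((id_, dist, self._metadatas[id_]))
        scored.sort(key=lambda item: item[1])
        return scored[:n_results]


# Made-up 3-dimensional "embeddings"; real sentence embeddings have hundreds of dimensions.
store = ToyCollection()
store.add(
    ids=["0", "1", "2"],
    embeddings=[[1.0, 0.0, 0.0], [0.0, 1.0, 0.0], [0.9, 0.1, 0.0]],
    metadatas=[{"name": "A"}, {"name": "B"}, {"name": "C"}],
)

hits = store.query([1.0, 0.0, 0.0], n_results=2)
for id_, dist, meta in hits:
    print(id_, round(dist, 4), meta["name"])
```

The real collection adds two things this toy leaves out: it computes the embeddings itself from raw text, and it indexes them so that a query does not have to scan every stored vector.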


In [3]:
def initialize_chroma_collection(model):
    print("[INFO] Initializing ChromaDB client and collection...")
    client = chromadb.Client()
    embedding_func = embedding_functions.SentenceTransformerEmbeddingFunction(model_name="all-MiniLM-L6-v2")

    # Create or get collection
    collection = client.get_or_create_collection(
        name="fashion_collection",
        embedding_function=embedding_func
    )
    print("[INFO] ChromaDB collection ready for vector storage.\n")
    return collection

### Step 2: Embedding Generation and Vector Storage

This step defines the **`generate_description_embeddings()`** function, which encodes product descriptions into high-dimensional numerical vectors and stores them in **ChromaDB** for efficient semantic retrieval.

**Objective:**  
To transform raw textual fashion descriptions into vector representations that capture semantic meaning, and to persist these vectors in a searchable vector database.

**Process Overview:**
1. Fetches the mock fashion dataset from the `get_data()` function.  
2. Converts it into a structured DataFrame containing product names, descriptions, and associated vibes.  
3. Transforms the list of vibe tags into comma-separated strings to ensure metadata compatibility with ChromaDB.  
4. Passes the descriptions to `collection.add()`, which generates an embedding for each one through the collection's embedding function (`all-MiniLM-L6-v2`).  
5. Stores the resulting vectors, along with their metadata, in the ChromaDB collection.  
6. Logs the total number of stored vectors and the time taken for embedding and storage operations.

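By embedding and storing product data in ChromaDB, this step provides a queryable semantic representation of the fashion dataset for all subsequent retrieval operations. With the in-memory client used here, the data lasts only for the current session; `chromadb.PersistentClient` would be needed to keep it on disk.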
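
The vibe-tag conversion in step 3 can be sketched in isolation. The helper names `vibes_to_metadata` and `metadata_to_vibes` are hypothetical and not part of the notebook. They show how a list of tags becomes a single string for storage and how it could be split back into a list after retrieval.

```python
def vibes_to_metadata(vibes):
    """Join a list of vibe tags into one comma-separated string."""
    return ", ".join(vibes)


def metadata_to_vibes(vibes_str):
    """Split the stored string back into a list of tags."""
    return [tag.strip() for tag in vibes_str.split(",") if tag.strip()]


product = {
    "name": "Street Hoodie",
    "vibes": ["urban", "energetic", "casual"],
}

# Metadata dict with plain string values only.
metadata = {"name": product["name"], "vibes": vibes_to_metadata(product["vibes"])}
print(metadata)

restored = metadata_to_vibes(metadata["vibes"])
print(restored)
```

The round trip only works as long as no individual tag contains a comma, which holds for the tags in this dataset.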

In [4]:
def generate_description_embeddings(model, collection):
    print("[INFO] Fetching fashion data...")
    fashion_data = get_data()

    print("[INFO] Creating DataFrame from product data...")
    # Convert vibes list → comma-separated string for metadata compatibility
    fashion_df = pd.DataFrame([
        {
            "name": name,
            "desc": details["desc"],
            "vibes": ", ".join(details["vibes"])  # join tags into one string for Chroma metadata
        }
        for name, details in fashion_data.items()
    ])
    print(f"[INFO] Loaded {len(fashion_df)} products into DataFrame.\n")

    start = timer()
    print("[INFO] Generating and storing embeddings into ChromaDB...")

    # Add product data to ChromaDB
    collection.add(
        ids=[str(i) for i in range(len(fashion_df))],
        documents=fashion_df["desc"].tolist(),
        metadatas=fashion_df.to_dict(orient="records")
    )

    end = timer()
    execution_time = end - start
    print(f"[INFO] Embeddings stored in ChromaDB for {len(fashion_df)} products.")
    print(f"[TIME] Embedding generation and storage completed in {execution_time:.4f} seconds.\n")

    return fashion_df, execution_time

### Step 3: Semantic Retrieval and Similarity Matching

This step defines the **`vibe_matcher()`** function, which performs the semantic retrieval stage of the pipeline using **ChromaDB** as the vector search engine.

**Objective:**  
To identify and return the top fashion products whose descriptions are most semantically aligned with a user’s vibe-based query.

**Process Overview:**
1. The function accepts a natural language query and the ChromaDB collection containing pre-stored product embeddings.  
2. It performs a **semantic similarity search** using Chroma’s built-in querying mechanism (`collection.query()`), which embeds the query text and retrieves the nearest stored vectors. Because the collection is created without an explicit distance setting, Chroma uses its default metric (squared L2, per the Chroma documentation) rather than cosine similarity.  
3. The retrieved results include product descriptions, metadata (names, vibes), and distances, where a smaller distance means a closer match.  
4. A results DataFrame is generated to display the top-*k* matches, ordered from closest to farthest.  
5. Execution time for the retrieval operation is measured to evaluate query latency and system responsiveness.  
6. If the query returns no documents, the function prints a warning and returns `None` in place of the DataFrame.

This step represents the **core retrieval logic** of the Vibe Matcher system, where high-dimensional vector representations are leveraged to interpret and fulfill user intent beyond exact keyword matching.
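
The result handling described above can be sketched without a running database. The dictionary below is a hand-written mock shaped like the nested lists the function indexes with `[0]` (one inner list per query text); its distances are made up, not real model output. `flatten_query_result` and its `max_distance` cutoff are hypothetical additions that show one way to drop weak matches, which the notebook's function does not do.

```python
def flatten_query_result(result, max_distance=None):
    """Turn the nested result for the first query into a list of row dicts."""
    rows = []
    docs = result["documents"][0]
    metas = result["metadatas"][0]
    dists = result["distances"][0]
    for doc, meta, dist in zip(docs, metas, dists):
        if max_distance is not None and dist > max_distance:
            continue  # too far from the query to count as a match
        rows.append({"name": meta["name"], "desc": doc, "distance": dist})
    return rows


# Hand-written mock result with made-up distances (not real model output).
mock_result = {
    "ids": [["1", "5", "3"]],
    "documents": [["hoodie text", "joggers text", "jacket text"]],
    "metadatas": [[
        {"name": "Street Hoodie"},
        {"name": "Athletic Joggers"},
        {"name": "Denim Jacket"},
    ]],
    "distances": [[0.8, 1.1, 1.6]],
}

all_rows = flatten_query_result(mock_result)
close_rows = flatten_query_result(mock_result, max_distance=1.2)
print([row["name"] for row in all_rows])
print([row["name"] for row in close_rows])
```

A cutoff like this would have to be tuned on real queries; the value 1.2 is arbitrary and serves only to show the filtering step.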

In [5]:
def vibe_matcher(query, collection, top_k=3):
    print(f"[INFO] Running vibe match for query: '{query}'")
    start = timer()

    # Query ChromaDB using semantic similarity
    results = collection.query(
        query_texts=[query],
        n_results=top_k
    )

    end = timer()
    execution_time = end - start
    print(f"[TIME] Query processed in {execution_time:.4f} seconds.\n")

    if not results["documents"] or len(results["documents"][0]) == 0:
        print("[WARN] No strong vibe match found for the query.")
        return None, execution_time

    # Create results DataFrame.
    # NOTE: Chroma returns distances, so lower values mean closer matches.
    results_df = pd.DataFrame({
        "name": [meta["name"] for meta in results["metadatas"][0]],
        "desc": results["documents"][0],
        "similarity_score": results["distances"][0]
    })

    print("[INFO] Top matching products identified.\n")
    return results_df, execution_time

### Step 4: Pipeline Runner

This section defines the main **pipeline controller** function responsible for executing the complete Vibe Matcher process in a single, cohesive flow.  
It integrates all preceding components — data embedding, vector storage, and semantic retrieval — using ChromaDB as the underlying vector database.

**Function Overview:**
1. Initializes the ChromaDB client and creates (or loads) a vector collection using the embedding function.  
2. Calls `generate_description_embeddings()` to encode and store all product descriptions along with their metadata.  
3. Passes the user’s vibe-based query (e.g., *"energetic urban chic"*) to `vibe_matcher()` for semantic similarity search.  
4. Logs execution times for both embedding and retrieval stages to assess performance.  
5. Returns the resulting DataFrame containing the top-ranked fashion products that best align with the user’s query.

This function acts as the **orchestrator** of the entire semantic recommendation workflow, ensuring clear modular execution and measurable performance across stages.


In [6]:
def run_pipeline(query, model, top_k=3):
    print("=" * 80)
    print(f"Running Vibe Matcher Pipeline with ChromaDB for Query: '{query}'")
    print("=" * 80)

    # Initialize Chroma and embed data
    collection = initialize_chroma_collection(model)
    fashion_df, embed_time = generate_description_embeddings(model, collection)

    # Run query
    results_df, query_time = vibe_matcher(query, collection, top_k)

    print("[INFO] Pipeline executed successfully.")
    print(f"[TIME] Embedding Time: {embed_time:.4f} s | Query Time: {query_time:.4f} s\n")
    print("=" * 80)

    return results_df

### Step 5: Execution and Result Interpretation

This final step executes the complete **Vibe Matcher pipeline** using the ChromaDB-integrated implementation.  
The model generates embeddings for all product descriptions, stores them in a Chroma vector collection, and retrieves the most semantically similar items for a given vibe query.

**Process Summary:**
1. Initialize the `SentenceTransformer` model (`all-MiniLM-L6-v2`) to generate text embeddings locally.  
2. Define a test query (e.g., *"energetic urban chic"*) representing the desired fashion mood or style.  
3. Run the `run_pipeline()` function, which:
   - Initializes the ChromaDB client and collection.  
   - Embeds and stores fashion descriptions with metadata.  
   - Performs similarity search for the input query.  
4. Display the top-ranked matching products, ordered from smallest to largest distance (closest match first).  

This section validates the end-to-end flow of the **semantic recommendation system**, demonstrating how the system interprets a user's descriptive input and returns stylistically relevant fashion items.
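
When reading the ranked results, it helps to know how a squared L2 distance relates to cosine similarity. For unit-length vectors, `||a - b||^2 = 2 - 2 * cos(a, b)`, so both metrics produce the same ordering. The sketch below checks this identity on two made-up vectors. Whether it carries over to the pipeline depends on the embeddings being normalized, which is an assumption here and worth verifying for the model in use.

```python
import math


def normalize(vec):
    """Scale a vector to unit length."""
    norm = math.sqrt(sum(x * x for x in vec))
    return [x / norm for x in vec]


def cosine_sim(a, b):
    """Cosine similarity between two vectors."""
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(y * y for y in b))
    return dot / (norm_a * norm_b)


def squared_l2(a, b):
    """Squared Euclidean distance between two vectors."""
    return sum((x - y) ** 2 for x, y in zip(a, b))


# Made-up vectors, normalized to unit length.
a = normalize([3.0, 4.0, 0.0])
b = normalize([1.0, 2.0, 2.0])

cos = cosine_sim(a, b)
dist = squared_l2(a, b)

print(f"cosine similarity: {cos:.4f}")
print(f"squared L2:        {dist:.4f}")
print(f"2 - 2 * cosine:    {2 - 2 * cos:.4f}")
```

Under that assumption, a distance `d` can be converted to a cosine similarity with `1 - d / 2` when a "higher is better" score is easier to present.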

In [7]:
model = SentenceTransformer("all-MiniLM-L6-v2")
query = "energetic urban chic"

results_df = run_pipeline(query, model)
if results_df is not None:
    print(results_df)

Running Vibe Matcher Pipeline with ChromaDB for Query: 'energetic urban chic'
[INFO] Initializing ChromaDB client and collection...
[INFO] ChromaDB collection ready for vector storage.

[INFO] Fetching fashion data...
[INFO] Creating DataFrame from product data...
[INFO] Loaded 10 products into DataFrame.

[INFO] Generating and storing embeddings into ChromaDB...
[INFO] Embeddings stored in ChromaDB for 10 products.
[TIME] Embedding generation and storage completed in 0.0866 seconds.

[INFO] Running vibe match for query: 'energetic urban chic'
[TIME] Query processed in 0.0354 seconds.

[INFO] Top matching products identified.

[INFO] Pipeline executed successfully.
[TIME] Embedding Time: 0.0866 s | Query Time: 0.0354 s

                name                                               desc  \
0  Floral Maxi Dress  A soft chiffon maxi dress with delicate floral...   
1      Street Hoodie  An oversized streetwear hoodie featuring graff...   
2  Minimalist Blazer  A structured blazer wit