<span style="color: navy;">

# **Myntra Fashion AI Project - Using LlamaIndex**

Dive into the innovative world of artificial intelligence with the "Myntra Fashion AI" project. This initiative offers a unique opportunity for us to engage our enthusiasm for AI technology and develop a cutting-edge search system specifically tailored for the fashion industry.

## Project Objective

Our aim is to construct a generative search system that meticulously sifts through a vast array of fashion product descriptions on Myntra, delivering tailored recommendations based on user queries.

    
### Project Goals for Fashion Search Using LlamaIndex:

The primary objective of the Fashion Search project is to develop a generative search system that enables users to efficiently find and validate fashion items, trends, and recommendations from a vast corpus of fashion data.

#### The main goals of the solution are:

- **Enhance User Experience:** Reduce the time spent searching through fashion databases by providing accurate and relevant search results.
- **Support Trend Analysis:** Help users quickly locate historical and current fashion trends to inform their choices and designs.
- **Improve Data Accessibility:** Ensure that valuable information buried in fashion databases is easily accessible and usable.
- **Increase Efficiency:** Streamline the process of fashion knowledge retrieval, leading to better utilization of resources and informed decision-making in the fashion industry.
</span>



<span style="color: navy;">

### <font color = navy>**2 Implementation**
#### <font color = navy> 2.1. RAG model

![RAG](rag.png)

<span style="color: navy;">
    
    
#### 2.2. Framework Used

LlamaIndex (formerly known as GPT Index) is a data framework for building LLM applications over custom data. It suits the Fashion AI project on Myntra's fashion dataset because it provides data loaders, node parsers, vector indexes, and query engines that fit together with little glue code and can handle large datasets. <br><br>

- **Contextual Understanding:** LlamaIndex nodes retain document metadata and links to neighbouring chunks, which helps preserve context when fashion data is split for retrieval and validation. <br>
- **Easy Integration:** LlamaIndex provides robust APIs and integration tools that facilitate seamless incorporation into existing systems, ensuring smooth implementation with Myntra’s infrastructure. <br>
- **Semantic Search Capabilities:** Beyond simple keyword matching, LlamaIndex supports semantic search, crucial for understanding and retrieving fashion items based on their meaning and context rather than just specific terms. This results in more relevant and precise search outcomes. <br>
- **Scalability:** Designed to scale with growing data needs, LlamaIndex can efficiently handle increasing volumes of fashion data without significant performance degradation as Myntra’s fashion database expands. <br>
</span>


<span style="color: navy;">
    
    
### **3 Key Design Stages**

#### **3.1 Chunking and Parsing**
- **RecursiveCharacterTextSplitter:** This tool breaks down fashion-related text into smaller, manageable segments, ensuring that each chunk adheres to a specified character limit. This approach helps in handling extensive fashion data without losing context.<br>
- **LangchainNodeParser:** This LlamaIndex wrapper runs the LangChain splitter over each document and returns the resulting chunks as nodes, keeping their metadata and context so the data is ready for subsequent embedding and search operations.<br>

#### **3.2 Embedding**
- This stage involves converting fashion data into numerical vectors, which facilitates efficient similarity searches and clustering of fashion items based on their attributes and styles.<br>

#### **3.3 Query Engine**
- The query engine processes user inputs related to fashion trends, preferences, or items and retrieves relevant results from the database. It leverages advanced search algorithms to ensure accurate and timely responses.<br>

#### **3.4 Retrieval**
- In this stage, the system retrieves the most relevant fashion data from the database based on the query results. This involves ranking and filtering to provide the best matches for user queries.<br>

#### **3.5 Response Formation**
- This final stage involves compiling and formatting the retrieved fashion information into coherent responses. It ensures that the information is presented in a user-friendly manner, enhancing the overall experience for users seeking fashion insights.<br>
</span>


<span style="color: navy;">

    
    
#### Install and Import the required Libraries



In [1]:
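# Install OpenAI, ChromaDB, Sentence-Transformers, tiktoken, and the LlamaIndex and LangChain packages imported later in this notebook.
!pip install -U -q openai chromadb==0.5.3 sentence-transformers tiktoken llama-index llama-index-vector-stores-chroma llama-index-readers-file llama-index-llms-openai langchain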

In [2]:
from pathlib import Path
import pandas as pd
import ast
from tqdm import tqdm
import re
from IPython.display import Image, display
from operator import itemgetter
import json
import tiktoken
import urllib.request
from PIL import Image
from io import BytesIO
import textwrap
import openai

import warnings
# Suppress warnings
warnings.filterwarnings("ignore")


<span style="color: navy;">
    
    
#### Load the fashion dataset and configure the display options



In [25]:
pd.set_option('display.max_colwidth', 80)  # Truncate wide columns to 80 characters for display

# Define the path to the Fashion Dataset CSV file.
input_file_path = "/Users/shrinivasd/#Upgrad/#12_GenAi_Upgrad/LammaIndex_Assingment/FashionDataFile/FashionDatasetv2_sampled.csv"

# Read the CSV file into a pandas DataFrame.
fashion_data_all_cols = pd.read_csv(input_file_path)

# Display the first few rows of the DataFrame to inspect the data.
fashion_data_all_cols.head(2)

Unnamed: 0,p_id,name,products,price,colour,brand,img,ratingCount,avg_rating,description,p_attributes
0,18202160,POONAM DESIGNER Women Red Yoke Design Kurti with Trousers,"Kurti, Trousers",2499.0,Red,POONAM DESIGNER,http://assets.myntassets.com/assets/images/18202160/2022/5/9/9677d0a5-387b-4...,7.0,3.714286,Red yoke design Kurti with Trousers <br> <br> <b> Kurti design: </b> <ul> <...,"{'Add-Ons': 'NA', 'Body Shape ID': '333,424', 'Body or Garment Size': 'Garme..."
1,5389041,STREET 9 Women White & Black Checked Culotte Jumpsuit,Culotte Jumpsuit,1899.0,White,STREET 9,http://assets.myntassets.com/assets/images/5389041/2018/4/20/11524219473683-...,105.0,4.171429,"White and black checked culotte jumpsuit with waist tie-up detail, has a sho...","{'Body or Garment Size': 'To-Fit Denotes Body Measurements in', 'Fabric': 'C..."


In [26]:
fashion_data_all_cols.info()

<class 'pandas.core.frame.DataFrame'>
RangeIndex: 2000 entries, 0 to 1999
Data columns (total 11 columns):
 #   Column        Non-Null Count  Dtype  
---  ------        --------------  -----  
 0   p_id          2000 non-null   int64  
 1   name          2000 non-null   object 
 2   products      2000 non-null   object 
 3   price         2000 non-null   float64
 4   colour        2000 non-null   object 
 5   brand         2000 non-null   object 
 6   img           2000 non-null   object 
 7   ratingCount   2000 non-null   float64
 8   avg_rating    2000 non-null   float64
 9   description   2000 non-null   object 
 10  p_attributes  2000 non-null   object 
dtypes: float64(3), int64(1), object(7)
memory usage: 172.0+ KB


<span style="color: navy;">

### Observation

- The dataset contains 2,000 entries across 11 columns.
- It is a clean dataset with no null values.


</span>


In [27]:
fashion_data_all_cols.description[53]

"<ul><li>Light shade, light fade blue jeans</li><li>Slim fit, mid-rise</li><li>Clean look</li><li>Stretchable</li><li>5 pockets</li><li>Length: cropped</li></ul>Fit: Slim Fit<br>The model (height 5'8) is wearing a size 2895% Cotton 5% Lycra<br>Machine wash"

<span style="color: navy;">

### Observation

- Significant cleanup is required because the product descriptions contain many HTML tags.
- These tags need to be removed so that the descriptions are clean and useful for search queries and analysis.
- When creating embeddings, the HTML tags should not be processed, since they add no value.
    

</span>


In [28]:
# Function to eliminate HTML tags from a given string.
def clean_html_tags(html_string):
    # Define a regular expression pattern to identify HTML tags.
    pattern = re.compile(r'<.*?>')

    # Replace HTML tags with spaces using the pattern.
    cleaned_string = re.sub(pattern, ' ', html_string)
    
    # Remove extra spaces: split the string into words, filter out empty elements, and rejoin into a single string.
    cleaned_string = cleaned_string.split(' ')
    cleaned_string = list(filter(None, cleaned_string))
    cleaned_string = " ".join(cleaned_string)
    
    # Return the cleaned string without HTML tags.
    return cleaned_string

In [30]:
# Validate if function worked
clean_html_tags(fashion_data_all_cols.description[53])

"Light shade, light fade blue jeans Slim fit, mid-rise Clean look Stretchable 5 pockets Length: cropped Fit: Slim Fit The model (height 5'8) is wearing a size 2895% Cotton 5% Lycra Machine wash"

<span style="color: navy;">

### Observation

- The HTML tags have been removed, so the description is now clean and usable for search queries and analysis.

</span>
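The regex above only targets tags, so HTML entities such as `&nbsp;` or `&amp;` would pass through unchanged. A minimal sketch of a variant that also decodes entities is shown below; the name `clean_html_text` and the sample string are illustrative assumptions, not part of the original notebook.

```python
import html
import re

TAG_PATTERN = re.compile(r"<[^>]+>")

def clean_html_text(raw: str) -> str:
    """Strip tags, decode HTML entities, and collapse whitespace."""
    without_tags = TAG_PATTERN.sub(" ", raw)
    # html.unescape turns "&nbsp;" into a non-breaking space and "&amp;" into "&".
    decoded = html.unescape(without_tags)
    # str.split() with no argument splits on any whitespace run,
    # including the non-breaking space produced above.
    return " ".join(decoded.split())

# Hypothetical description fragment, for illustration only.
sample = "<p>Cotton Blend Organza&nbsp; Machine Wash</p><ul><li>Black &amp; white</li></ul>"
print(clean_html_text(sample))
```

Decoding happens after tag removal so that escaped markup such as `&lt;b&gt;` is kept as literal text rather than being stripped as a tag.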

In [31]:
# Apply the cleaner to every description and store the result in a new column.
fashion_data_all_cols.loc[:, 'cleaned_description'] = fashion_data_all_cols['description'].apply(clean_html_tags)

# Display the first few rows of the modified DataFrame.
fashion_data_all_cols.head(3)

Unnamed: 0,p_id,name,products,price,colour,brand,img,ratingCount,avg_rating,description,p_attributes,cleaned_description
0,18202160,POONAM DESIGNER Women Red Yoke Design Kurti with Trousers,"Kurti, Trousers",2499.0,Red,POONAM DESIGNER,http://assets.myntassets.com/assets/images/18202160/2022/5/9/9677d0a5-387b-4...,7.0,3.714286,Red yoke design Kurti with Trousers <br> <br> <b> Kurti design: </b> <ul> <...,"{'Add-Ons': 'NA', 'Body Shape ID': '333,424', 'Body or Garment Size': 'Garme...",Red yoke design Kurti with Trousers Kurti design: Yoke design Straight shape...
1,5389041,STREET 9 Women White & Black Checked Culotte Jumpsuit,Culotte Jumpsuit,1899.0,White,STREET 9,http://assets.myntassets.com/assets/images/5389041/2018/4/20/11524219473683-...,105.0,4.171429,"White and black checked culotte jumpsuit with waist tie-up detail, has a sho...","{'Body or Garment Size': 'To-Fit Denotes Body Measurements in', 'Fabric': 'C...","White and black checked culotte jumpsuit with waist tie-up detail, has a sho..."
2,13975624,Vishudh Women Off-White Checked Kurta with Palazzo and Dupatta,"Kurta, Palazzo, Dupatta",2549.0,Off White,Vishudh,http://assets.myntassets.com/assets/images/productimage/2021/3/26/b329d1f5-b...,14.0,2.857143,<p>Length 88 width 40(inches)</p><p>Cotton Machine wash</p>The model (height...,"{'Add-Ons': 'NA', 'Body Shape ID': '443,333,324,424', 'Body or Garment Size'...","Length 88 width 40(inches) Cotton Machine wash The model (height 5'8"") is we..."


<span style="color: navy;">

### Observation

- The HTML tags have been removed across the entire DataFrame, so the descriptions are now clean and usable for search queries and analysis.

</span>


In [32]:
fashion_data_all_cols.info()

<class 'pandas.core.frame.DataFrame'>
RangeIndex: 2000 entries, 0 to 1999
Data columns (total 12 columns):
 #   Column               Non-Null Count  Dtype  
---  ------               --------------  -----  
 0   p_id                 2000 non-null   int64  
 1   name                 2000 non-null   object 
 2   products             2000 non-null   object 
 3   price                2000 non-null   float64
 4   colour               2000 non-null   object 
 5   brand                2000 non-null   object 
 6   img                  2000 non-null   object 
 7   ratingCount          2000 non-null   float64
 8   avg_rating           2000 non-null   float64
 9   description          2000 non-null   object 
 10  p_attributes         2000 non-null   object 
 11  cleaned_description  2000 non-null   object 
dtypes: float64(3), int64(1), object(8)
memory usage: 187.6+ KB


<span style="color: navy;">

### Set OpenAI API Key

In [33]:
with open('/Users/shrinivasd/#Upgrad/#12_GenAi_Upgrad/OpenAI_API_Key.txt', 'r') as file:
    api_key = file.read().strip()
openai.api_key = api_key

<span style="color: navy;">

### Consolidate key information into a single column

In [34]:
def generate_product_assortment(product):

    # Format the product details into a readable string, excluding non-significant fields.
    product_assortment_details = f"""
    Name : {product['name']}
    Products : {product['products']}
    Price : {product['price']}
    Color : {product['colour']}
    Brand : {product['brand']} 
    Rating : {product['avg_rating']:.2f} 
    Description : {product['cleaned_description']}
    Attributes : {product['p_attributes']}
    ImageLink : {product['img']}
    """
   
    return product_assortment_details.strip()

# An example product_assortment
print(generate_product_assortment(fashion_data_all_cols.iloc[5]))

Name : Indo Era Women Pink Ethnic Motifs Embroidered Kurta with Trousers & Dupatta
    Products : Kurta, Trousers, Dupatta
    Price : 5999.0
    Color : Pink
    Brand : Indo Era 
    Rating : 4.22 
    Description : Pink embroidered Kurta with Trousers with dupatta Kurta design: Ethnic motifs embroidered Straight shape Regular style Round neck, three-quarter regular sleeves Calf length with straight hem Cotton blend machine weave fabric Trousers design: Solid Trousers Partially elasticated waistband Slip-on closure 1 pockets Dupatta Lenght : 2.3m, Dupatta Width : 85 cm Size worn by the model: S Chest: 32" Waist: 28" Hips: 33" Height: 5'7" Cotton Blend Organza&nbsp; Machine Wash
    Attributes : {'Add-Ons': 'NA', 'Body Shape ID': '333,424', 'Body or Garment Size': 'Garment Measurements in', 'Bottom Closure': 'Slip-On', 'Bottom Fabric': 'Cotton Blend', 'Bottom Pattern': 'Solid', 'Bottom Type': 'Trousers', 'Character': 'NA', 'Dupatta': 'With Dupatta', 'Dupatta Border': 'Taping', 'Dupatt

In [35]:
# Apply the generate_product_assortment function to each row and set 'product_assortment' column
fashion_data_all_cols.loc[:, 'product_assortment'] = fashion_data_all_cols.apply(generate_product_assortment, axis=1)

# Compute the length of 'product_assortment' and set 'product_assortment_len' column
fashion_data_all_cols.loc[:, 'product_assortment_len'] = fashion_data_all_cols['product_assortment'].apply(len)
fashion_data_all_cols.head(2)

Unnamed: 0,p_id,name,products,price,colour,brand,img,ratingCount,avg_rating,description,p_attributes,cleaned_description,product_assortment,product_assortment_len
0,18202160,POONAM DESIGNER Women Red Yoke Design Kurti with Trousers,"Kurti, Trousers",2499.0,Red,POONAM DESIGNER,http://assets.myntassets.com/assets/images/18202160/2022/5/9/9677d0a5-387b-4...,7.0,3.714286,Red yoke design Kurti with Trousers <br> <br> <b> Kurti design: </b> <ul> <...,"{'Add-Ons': 'NA', 'Body Shape ID': '333,424', 'Body or Garment Size': 'Garme...",Red yoke design Kurti with Trousers Kurti design: Yoke design Straight shape...,Name : POONAM DESIGNER Women Red Yoke Design Kurti with Trousers\n Produc...,1721
1,5389041,STREET 9 Women White & Black Checked Culotte Jumpsuit,Culotte Jumpsuit,1899.0,White,STREET 9,http://assets.myntassets.com/assets/images/5389041/2018/4/20/11524219473683-...,105.0,4.171429,"White and black checked culotte jumpsuit with waist tie-up detail, has a sho...","{'Body or Garment Size': 'To-Fit Denotes Body Measurements in', 'Fabric': 'C...","White and black checked culotte jumpsuit with waist tie-up detail, has a sho...",Name : STREET 9 Women White & Black Checked Culotte Jumpsuit\n Products :...,802


<span style="color: navy;">

### Observation

- Comprehensive Formatting: The `generate_product_assortment` function compiles essential product details (name, products, price, color, brand, rating, description, attributes, and image link) into a single formatted string.
- Exclusion of Non-Significant Fields: Fields deemed non-essential for search relevance, such as 'ratingCount' and 'p_id', are excluded from the formatted string.
- Enhanced Readability: The formatted string enhances readability by clearly presenting product information, which is crucial for generating meaningful search results and recommendations.
- `generate_product_assortment` has been successfully applied to the whole DataFrame, creating the `product_assortment` and `product_assortment_len` columns.
   


</span>
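Because the summary is built from an indented triple-quoted f-string, every line after the first keeps the four-space indentation of the source code (visible as `\n    Products` in the output above). A minimal sketch of an alternative that joins one field per line without indentation follows; `format_product` and the example record are hypothetical and cover only a few of the fields.

```python
def format_product(product: dict) -> str:
    """Build a one-field-per-line summary with no leading indentation."""
    fields = [
        ("Name", product["name"]),
        ("Price", product["price"]),
        ("Color", product["colour"]),
        ("Rating", f"{product['avg_rating']:.2f}"),
    ]
    return "\n".join(f"{label} : {value}" for label, value in fields)

# Hypothetical record using the same column names as the dataset.
example = {"name": "Hypothetical Kurta", "price": 2499.0, "colour": "Red", "avg_rating": 3.714286}
print(format_product(example))
```

Keeping the field list as data also makes it easy to add or drop a field without touching the template text.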


In [36]:
pd.set_option('display.max_colwidth', None)  # No limit for column width
fashion_data = fashion_data_all_cols[['product_assortment']]
fashion_data.head(2)

Unnamed: 0,product_assortment
0,"Name : POONAM DESIGNER Women Red Yoke Design Kurti with Trousers\n Products : Kurti, Trousers\n Price : 2499.0\n Color : Red\n Brand : POONAM DESIGNER \n Rating : 3.71 \n Description : Red yoke design Kurti with Trousers Kurti design: Yoke design Straight shape Regular style Round neck, three-quarter regular sleeves Calf length with straight hem Silk blend machine weave fabric Trousers design: Solid Trousers Partially elasticated waistband Slip-on closure Wash Care :- Dry Clean The model (height 5'8) is wearing a size S\n Attributes : {'Add-Ons': 'NA', 'Body Shape ID': '333,424', 'Body or Garment Size': 'Garment Measurements in', 'Bottom Closure': 'Slip-On', 'Bottom Fabric': 'Silk Blend', 'Bottom Pattern': 'Solid', 'Bottom Type': 'Trousers', 'Character': 'NA', 'Dupatta': 'NA', 'Dupatta Border': 'NA', 'Dupatta Fabric': 'NA', 'Dupatta Pattern': 'NA', 'Main Trend': 'NA', 'Neck': 'Round Neck', 'Number of Pockets': 'NA', 'Occasion': 'Festive', 'Ornamentation': 'NA', 'Pattern Coverage': 'Yoke or Border', 'Sleeve Length': 'Three-Quarter Sleeves', 'Sleeve Styling': 'Regular Sleeves', 'Slit Detail': 'Side Slits', 'Stitch': 'Ready to Wear', 'Sustainable': 'Regular', 'Technique': 'NA', 'Top Design Styling': 'Regular', 'Top Fabric': 'Silk Blend', 'Top Hemline': 'Straight', 'Top Length': 'Calf Length', 'Top Pattern': 'Yoke Design', 'Top Shape': 'Straight', 'Top Type': 'Kurti', 'Waistband': 'Partially Elasticated', 'Wash Care': 'Dry Clean', 'Weave Pattern': 'Regular', 'Weave Type': 'Machine Weave', 'Wedding': 'NA'}\n ImageLink : http://assets.myntassets.com/assets/images/18202160/2022/5/9/9677d0a5-387b-40f3-80b4-1701239b3ac81652097227106POONAMDESIGNERWomenRedYokeDesignKurtiwithTrousers1.jpg"
1,"Name : STREET 9 Women White & Black Checked Culotte Jumpsuit\n Products : Culotte Jumpsuit\n Price : 1899.0\n Color : White\n Brand : STREET 9 \n Rating : 4.17 \n Description : White and black checked culotte jumpsuit with waist tie-up detail, has a shoulder straps, sleevelessCotton Machine-washThe model (height 5'8'') is wearing a size S\n Attributes : {'Body or Garment Size': 'To-Fit Denotes Body Measurements in', 'Fabric': 'Cotton', 'Neck': 'Shoulder Straps', 'Number of Pockets': '2', 'Pattern': 'Checked', 'Sleeve Length': 'Sleeveless', 'Surface Styling': 'NA', 'Type': 'Culotte Jumpsuit', 'Wash Care': 'Machine Wash'}\n ImageLink : http://assets.myntassets.com/assets/images/5389041/2018/4/20/11524219473683-STREET-9-White--Black-Checked-Culotte-Jumpsuit-511524219473492-1.jpg"


<span style="color: navy;">

### Observation

- Only a single column is needed to feed the embedding step and store in ChromaDB.
- A new DataFrame is created with just that column.
   


</span>


In [37]:
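fashion_path = '/Users/shrinivasd/#Upgrad/#12_GenAi_Upgrad/LammaIndex_Assingment/FashionDataFile/out/'
# Create the output folder if it does not exist, then write the single-column CSV.
Path(fashion_path).mkdir(parents=True, exist_ok=True)
fashion_data.to_csv(Path(fashion_path) / "fashion_file.csv", index=False)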

display(fashion_data.shape)

(2000, 1)

In [38]:
fashion_data.info()

<class 'pandas.core.frame.DataFrame'>
RangeIndex: 2000 entries, 0 to 1999
Data columns (total 1 columns):
 #   Column              Non-Null Count  Dtype 
---  ------              --------------  ----- 
 0   product_assortment  2000 non-null   object
dtypes: object(1)
memory usage: 15.8+ KB


In [39]:
from IPython.display import display, HTML
from llama_index.core import VectorStoreIndex, get_response_synthesizer
from llama_index.core.retrievers import VectorIndexRetriever
from llama_index.core.query_engine import RetrieverQueryEngine
from llama_index.core.postprocessor import SimilarityPostprocessor

<span style="color: navy;">

## **Part 1 - Document ingestion to Vector DB**

### Load document from folder
Since the data is sourced from CSV files, the `CSVReader` is utilized for loading. By setting `concat_rows = False`, each row in the CSV is treated as an individual document. This means that each row of fashion information is processed and analyzed separately as a distinct document.




In [40]:
from llama_index.readers.file import (CSVReader)
from llama_index.core.node_parser import SimpleNodeParser, TextSplitter
from llama_index.core import SimpleDirectoryReader

parser = CSVReader(concat_rows=False)
file_extractor = {".csv":parser}

documents = SimpleDirectoryReader(input_dir = fashion_path, 
                                  file_extractor=file_extractor).load_data()

documents[20]

Document(id_='c335387c-4271-4ff4-a0ec-86e53351acdc', embedding=None, metadata={'filename': 'fashion_file.csv', 'extension': '.csv', 'file_path': '/Users/shrinivasd/#Upgrad/#12_GenAi_Upgrad/LammaIndex_Assingment/FashionDataFile/out/fashion_file.csv', 'file_name': 'fashion_file.csv', 'file_type': 'text/csv', 'file_size': 2563113, 'creation_date': '2024-07-25', 'last_modified_date': '2024-07-25'}, excluded_embed_metadata_keys=['file_name', 'file_type', 'file_size', 'creation_date', 'last_modified_date', 'last_accessed_date'], excluded_llm_metadata_keys=['file_name', 'file_type', 'file_size', 'creation_date', 'last_modified_date', 'last_accessed_date'], relationships={}, text="Name : KALINI Women Blue Printed Pure Cotton Kurta with Trousers & With Dupatta\n    Products : Kurta, Trousers, Dupatta\n    Price : 3999.0\n    Color : Blue\n    Brand : KALINI \n    Rating : 3.79 \n    Description : Blue printed Kurta with Trousers with dupatta Kurta design: Woven design printed A-line shape Regul

In [41]:
len(documents)

2001
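
The count of 2,001 is one more than the 2,000 rows in the DataFrame. A likely explanation, assuming `CSVReader` does not treat the first line specially, is that the header line `product_assortment` is also emitted as a document. The sketch below reproduces the effect with the standard `csv` module on a hypothetical two-product file; it does not call LlamaIndex.

```python
import csv
import io

# Hypothetical two-product file with the same single-column layout as fashion_file.csv.
csv_text = 'product_assortment\n"Name : A\n    Price : 1.0"\n"Name : B\n    Price : 2.0"\n'

rows = list(csv.reader(io.StringIO(csv_text)))
print(len(rows))       # header line plus the product rows

data_rows = rows[1:]   # drop the header before treating rows as documents
print(len(data_rows))
```

If this is what happened here, inspecting `documents[0].text` would show the column name, and slicing it off (`documents[1:]`) before parsing would keep the header out of the index.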

<span style="color: navy;">

### Document parsing and chunking
 
- **RecursiveCharacterTextSplitter:** This tool recursively divides text into smaller segments, ensuring each chunk stays within a specified character limit. This approach helps manage large volumes of fashion data while preserving contextual integrity.

- **LangchainNodeParser:** This LlamaIndex wrapper runs the LangChain splitter over each document and returns the resulting chunks as nodes, carrying over document metadata and previous/next relationships so the data is ready for embedding and search operations.
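
To make the recursive idea concrete, here is a simplified pure-Python sketch of recursive character splitting. It is not LangChain's implementation: it has no chunk overlap, and the function name `recursive_split` and the sample text are assumptions for illustration. To my knowledge, `RecursiveCharacterTextSplitter()` with no arguments uses a default chunk size of 4000 characters, so most product texts in this dataset would likely stay in a single chunk.

```python
def recursive_split(text: str, chunk_size: int, separators=("\n\n", "\n", " ", "")) -> list[str]:
    """Split text into chunks of at most chunk_size characters,
    preferring coarse separators (paragraphs) over fine ones (characters)."""
    if len(text) <= chunk_size:
        return [text] if text else []
    # Last resort: no separator left, so cut at fixed character offsets.
    if not separators or separators[0] == "":
        return [text[i:i + chunk_size] for i in range(0, len(text), chunk_size)]
    sep, rest = separators[0], separators[1:]
    chunks, current = [], ""
    for piece in text.split(sep):
        candidate = piece if not current else current + sep + piece
        if len(candidate) <= chunk_size:
            current = candidate  # keep merging small pieces into one chunk
            continue
        if current:
            chunks.append(current)
            current = ""
        if len(piece) <= chunk_size:
            current = piece
        else:
            # The piece alone is too long: retry with the next, finer separator.
            chunks.extend(recursive_split(piece, chunk_size, rest))
    if current:
        chunks.append(current)
    return chunks

# Hypothetical product text, for illustration only.
text = "Name : Kurta\nPrice : 2499.0\n\nDescription : Red yoke design kurti with trousers"
chunks = recursive_split(text, chunk_size=30)
for chunk in chunks:
    print(repr(chunk))
```

The short header block stays together because it fits within the limit, while the long description falls back to splitting on spaces.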
    

In [43]:
from langchain.text_splitter import RecursiveCharacterTextSplitter
from llama_index.core.node_parser import LangchainNodeParser

parser = LangchainNodeParser(RecursiveCharacterTextSplitter())
nodes = parser.get_nodes_from_documents(documents)

parser

LangchainNodeParser(include_metadata=True, include_prev_next_rel=True, callback_manager=<llama_index.core.callbacks.base.CallbackManager object at 0x12ffcbb90>, id_func=<function default_id_func at 0x12ec7d120>)

<span style="color: navy;">

### Generate and Store Embeddings 
    
- In this step, we use the **text-embedding-ada-002** model to embed the product text. The resulting embeddings are stored in a ChromaDB collection, which handles vector data efficiently and supports quick retrieval based on similarity.


</span>


In [45]:
from chromadb.utils.embedding_functions import OpenAIEmbeddingFunction
embedding_model = "text-embedding-ada-002"
embedding_function = OpenAIEmbeddingFunction(api_key=openai.api_key, model_name=embedding_model)
embedding_function

<chromadb.utils.embedding_functions.OpenAIEmbeddingFunction at 0x12eac0d50>

<span style="color: navy;">


### Chroma Vector Store

By default, `VectorStoreIndex` keeps its vectors in a simple in-memory store. Backing it with `ChromaVectorStore` persists the embeddings in a Chroma collection on disk, so the index can be reloaded without re-embedding the catalogue, and similarity search is delegated to Chroma. For a Fashion AI project, this keeps retrieval of relevant fashion data fast as the dataset grows, which makes `ChromaVectorStore` a key component of the search system.


In [50]:
import chromadb
from llama_index.vector_stores.chroma import ChromaVectorStore
from llama_index.core import StorageContext
import os

chromadb_path = os.path.join(fashion_path, 'chromadb/')


chromadb_client = chromadb.PersistentClient(path=chromadb_path)
fashion_collection = chromadb_client.get_or_create_collection(name='RAG_on_fashion', embedding_function=embedding_function)
vector_store = ChromaVectorStore(chroma_collection=fashion_collection, distance_metric="cosine")
storage_context = StorageContext.from_defaults(vector_store=vector_store)

<span style="color: navy;">

### Vector Index

The `VectorStoreIndex` is a key component that manages:

- **Embedding Generation:** Creates embeddings for both fashion documents and user queries.
- **Efficient Storage:** Stores document vectors in a way that optimizes retrieval performance.
- **Similarity Search:** Identifies relevant fashion items or documents based on vector similarity.
- **Filtering and Ranking:** Refines and prioritizes search results to ensure the most relevant fashion data is presented.
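
The similarity search step can be sketched in a few lines of plain Python. This is an illustration of cosine-similarity ranking on toy vectors, not what Chroma or LlamaIndex run internally; the helper names and the three-dimensional "embeddings" are assumptions (real `text-embedding-ada-002` vectors have 1536 dimensions).

```python
import math

def cosine_similarity(a, b):
    """Cosine of the angle between two equal-length vectors."""
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(y * y for y in b))
    return dot / (norm_a * norm_b)

def top_k(query_vec, doc_vecs, k=2):
    """Return the k (name, score) pairs most similar to the query, best first."""
    scored = [(name, cosine_similarity(query_vec, vec)) for name, vec in doc_vecs.items()]
    return sorted(scored, key=lambda pair: pair[1], reverse=True)[:k]

# Toy vectors standing in for product embeddings.
docs = {
    "floral dress": [0.9, 0.1, 0.0],
    "denim jeans": [0.1, 0.9, 0.1],
    "printed kurta": [0.7, 0.2, 0.3],
}
query = [1.0, 0.0, 0.1]
results = top_k(query, docs, k=2)
print(results)
```

A vector store performs the same ranking with an approximate nearest-neighbour index so that it does not have to compare the query against every stored vector.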


In [51]:
# build index
index = VectorStoreIndex(nodes, storage_context=storage_context)
index

<llama_index.core.indices.vector_store.base.VectorStoreIndex at 0x133558fd0>

<span style="color: navy;">

    
## **Part 2 - Chunks retrieval from Vector DB for query**

### Node retriever

In [52]:
from llama_index.core.retrievers import VectorIndexRetriever

retriever = VectorIndexRetriever(
    index=index,
    similarity_top_k=10
)
retriever

<llama_index.core.indices.vector_store.retrievers.retriever.VectorIndexRetriever at 0x1385b6110>

<span style="color: navy;">

### Query Engine with configuration

In [55]:
from llama_index.core.query_engine import RetrieverQueryEngine
from llama_index.core.postprocessor import SimilarityPostprocessor

query_engine = RetrieverQueryEngine(
    retriever=retriever,
    node_postprocessors=[SimilarityPostprocessor(similarity_cutoff=0.75)]
)
query_engine

<llama_index.core.query_engine.retriever_query_engine.RetrieverQueryEngine at 0x1357afc10>
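
The effect of `similarity_cutoff=0.75` can be pictured as a simple filter over the retrieved nodes. The sketch below uses hypothetical `(text, score)` pairs rather than real LlamaIndex node objects, and `apply_similarity_cutoff` is an illustrative name.

```python
def apply_similarity_cutoff(scored_nodes, cutoff):
    """Keep only (text, score) pairs whose score is at least the cutoff."""
    return [(text, score) for text, score in scored_nodes if score >= cutoff]

# Hypothetical retrieval scores, for illustration only.
retrieved = [("floral dress", 0.91), ("printed kurta", 0.78), ("denim jeans", 0.42)]
kept = apply_similarity_cutoff(retrieved, cutoff=0.75)
print(kept)
```

If every retrieved node scores below the cutoff, nothing is passed on as context, so the threshold is worth checking against the scores the vector store actually returns.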

<span style="color: navy;">

## **Part 3 - Response generation by LLM based on chunks retrieved from Vector DB**
### LLM Model

In [None]:
from llama_index.llms.openai import OpenAI
llm = OpenAI(model="gpt-3.5-turbo", temperature=0.4)

<span style="color: navy;">

## Prompt Template

In [74]:
from llama_index.core import ChatPromptTemplate

# Define the QA prompt template for the fashion assistant
qa_prompt_str = ("""You are a knowledgeable fashion assistant who can provide detailed information about 
                    fashion products based on user queries.
                    You have a question asked by the user in {query_str} and you have some search results 
                    from a corpus of fashion product data in the dataframe {context_str}. 
                    These search results include information about various fashion products that 
                    may be relevant to the user query.

                    Use the information in {context_str} to answer the query {query_str}. Frame a detailed 
                    and informative response and provide relevant product information such as 
                    product name, price, description, and image link.

                    Follow the guidelines below when performing the task.
                    1. Provide relevant and accurate product details if available.
                    2. Analyze the information carefully and provide a comprehensive answer to the query.
                    3. If the product details include prices, ensure that they are presented in INR. Provide result within user budget in case query contains price info
                    4. Include product images links at end of each recommended product.
                    5. If you can't provide the complete answer, include any additional information that might 
                       help the user make a better decision.
                    6. Provide clear and concise product information with relevant details.
                    7. Give first row as "Top recommend:", and for rest "Similar product:" title
                    8. Try to suggest top 3 products where-ever applicable 
                    

                    The generated response should address the user query directly and concisely, avoiding 
                    unnecessary information. 
                    If you determine that the query is not relevant to the provided documents, 
                    state that the query is irrelevant.
                    Ensure the final response is well-formatted and easy to read, with all relevant 
                    details included.

                    Answer -
                    {{Provide the detailed answer natural language format without any formatting to the 
                    question asked, ONLY 
                    including 
                    Name , 
                    Products, 
                    Price, 
                    Color , 
                    Rating, 
                    Description, 
                    just few key info from Attributes, 
                    wash care
                    and 
                    ImageLink. 
                    
                    Image link MUST always be with ImageLink: and just link. Image link is in ImageLink}}
                    The answer should look like natural english language, just using values from above columns.

                    """)

# Define the chat messages for the QA prompt template
fashion_text_msgs = [
    (
        "system",
        "You are a knowledgeable fashion assistant who can provide detailed information about fashion products based on user queries.",
    ),
    ("user", qa_prompt_str),
]

# Create the ChatPromptTemplate for the QA prompt
text_qa_template = ChatPromptTemplate.from_messages(fashion_text_msgs)

# ------------------------------------------------------------------------------------------------------------

# Define the refine prompt template for additional context
refine_prompt_str = (
    """
    We have the opportunity to refine the original answer 
    (only if needed) with some more context below.
    ------------
    {context_msg}
    ------------
    Given the new context, refine the original answer to better 
    address the question: {query_str}. 
    If the context isn't useful, output the original answer again.
    Original Answer: {existing_answer}    
    
    """

)

# Define the chat messages for the refine prompt template
chat_refine_msgs = [
    (
        "system",
        "Always answer the question based on the received information, refining as necessary.",
    ),
    ("user", refine_prompt_str),
]

# Create the ChatPromptTemplate for the refine prompt
refine_template = ChatPromptTemplate.from_messages(chat_refine_msgs)
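
The placeholders in these templates follow Python format-string conventions: `{context_str}` and `{query_str}` are substituted at query time, while doubled braces such as `{{ ... }}` survive as literal braces. The sketch below shows this with plain `str.format` on a shortened, hypothetical template; it assumes LlamaIndex fills prompts with the same semantics and does not call the library.

```python
# Shortened, hypothetical template using the same placeholder names as above.
template = (
    "Context:\n{context_str}\n\n"
    "Question: {query_str}\n"
    "Answer -\n{{Name, Price, ImageLink}}"
)

filled = template.format(
    context_str="Name : Hypothetical Floral Dress\nPrice : 1499.0",
    query_str="Suggest me a floral printed dress",
)
print(filled)
```

This is why the answer-format block in the QA prompt is wrapped in double braces: with single braces, its contents would be read as a placeholder.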


<span style="color: navy;">

## Generate Response

In [75]:
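def query_response(user_input):
    """Run the user query through a query engine built from the index and return the response."""
    # Note: as_query_engine() builds a fresh engine with default retrieval settings;
    # the `retriever` and `query_engine` objects configured earlier are not used here.
    engine = index.as_query_engine(
        text_qa_template=text_qa_template,
        refine_template=refine_template,
        embedding_function=embedding_function,
        llm=llm,
    )
    return engine.query(user_input)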

<span style="color: navy;">

## Generate images and display

In [76]:
import re
import pandas as pd
from IPython.display import display, HTML

# Function to parse the response into a DataFrame
def parse_response_to_df(response_text):
    """
    Parse the response text into a DataFrame.
    """
    data = []

    if not response_text:
        return pd.DataFrame(columns=["Title", "Product Info", "Image"])

    # Split the response into sections
    sections = re.split(r'\n\n(?=Top recommend:|Similar product:)', response_text.strip())

    # Process each section
    for section in sections:
        if section.startswith("Top recommend:"):
            title = "Top recommend"
        elif section.startswith("Similar product:"):
            title = "Similar product"
        else:
            continue

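        # Extract product info and image link; the product info runs up to the
        # ImageLink label, or to the end of the section if no link is present
        product_info_match = re.search(r'(Top recommend:|Similar product:)(.*?)(?=ImageLink:|\Z)', section, re.DOTALL)
        image_match = re.search(r'ImageLink: (http[^\s]+)', section)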

        if product_info_match:
            product_info = product_info_match.group(2).strip().replace('\n', ' ')
        else:
            product_info = "Information not available"

        if image_match:
            image_link = image_match.group(1).strip()
        else:
            image_link = "No image available"

        data.append([title, product_info, image_link])

    # Create a DataFrame
    op_df = pd.DataFrame(data, columns=["Title", "Product Info", "Image"])
    return op_df

# Function to display the DataFrame with images
def display_df_with_images(df):
    """
    Display the DataFrame with text and images.
    """
    for _, row in df.iterrows():
        print(f"{row['Title']}:\n{row['Product Info']}\n")
        if row["Image"] != "No image available":
            display(HTML(f'<img src="{row["Image"]}" width="275px" />'))
        else:
            print("Image not available")
        print("\n")
        
# Entry point: query the index, parse the response, and display the results
def process_user_query(user_input):
    response = query_response(user_input)
    fmt_response = response.response
    
    # Parse the formatted response and display it
    op_df = parse_response_to_df(fmt_response)
    display_df_with_images(op_df)        
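
The section-splitting step can be checked without calling the LLM. The sketch below applies the same split and `ImageLink` patterns to a hypothetical response string; the product names and the URL are placeholders rather than real catalogue entries, and it uses plain dictionaries instead of a DataFrame so that it needs only the standard library.

```python
import re

# Hypothetical response text in the layout parse_response_to_df expects.
sample_response = (
    "Top recommend:\n"
    "Name: Example Floral Dress\n"
    "Price: INR 1299.0\n"
    "ImageLink: http://example.com/dress.jpg\n"
    "\n"
    "Similar product:\n"
    "Name: Example Floral Kurta\n"
    "Price: INR 899.0\n"
)

# Same split pattern as parse_response_to_df: break on a blank line that is
# followed by one of the two section labels.
sections = re.split(
    r"\n\n(?=Top recommend:|Similar product:)", sample_response.strip()
)

parsed = []
for section in sections:
    image_match = re.search(r"ImageLink: (http[^\s]+)", section)
    parsed.append(
        {
            # The label before the first colon is the section title.
            "title": section.split(":", 1)[0],
            # None marks a section that carries no image link.
            "image": image_match.group(1) if image_match else None,
        }
    )

for row in parsed:
    print(row)
```

The split relies on the model separating sections with a blank line. If the response uses a single newline between sections, everything is treated as one section.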

<span style="color: navy;">


## Validate user queries

##### Query 1 : "Suggest me Floral Printed dress "

In [119]:
process_user_query("Suggest me Floral Printed dress ")

Top recommend:
Name: Ahalyaa Women Peach-Coloured Floral Printed Regular Sequinned Kurta with Palazzos & With Dupatta Products: Kurta, Palazzos, Dupatta Price: INR 6550.0 Color: Peach Rating: 3.62 Description: Embrace ethnicity with grace by wearing this modish kurta set. This set comprises of an appealing printed kurta and well-fitting printed palazzos to add to your overall style. Trend Alert: Symbols of freshness and beauty, romantic florals in vibrant hues and prints have been a mainstay in fashion for decades. Wash Care: Hand Wash





Similar product:
Name: Berrylush Classic White Polyester Romantic Florals Kurta Products: Kurta Price: INR 1699.0 Color: White Rating: 4.20 Description: Opt for this trendy kurta and be creative with how you want to style it. The eye-catching straight hemline and calf-length will add to your ethnic look. Trend Alert: Symbols of freshness and beauty, romantic florals in vibrant hues and prints have been a mainstay in fashion for decades. Wash Care: Machine Wash







<span style="color: navy;">

### Test result Observation    
    
-  RAG with LlamaIndex recommended the right product.
-  The generative search expressed the same information in natural language.
   

<span style="color: navy;">


## Validate user queries

##### Query 2 : "Suggest me Peach coloured Shorts."

In [106]:
process_user_query("Suggest me Peach coloured Shorts.")

Top recommend:
Name: MANGO Women Peach-Coloured Solid Regular Shorts Products: Shorts Price: INR 2390.0 Color: Peach Rating: 3.58 Description: Peach-coloured solid mid-rise regular shorts, has 2 pockets, slip-on closure. 63% Viscose rayon, 36% Polyester and 1% Elastane. Machine wash. Regular Fit. Wash Care: Machine Wash





Similar product:
Name: HRX By Hrithik Roshan Racket sports Women Optic White & Peach-Coloured Colourblock Shorts Products: Shorts Price: INR 1499.0 Color: White Rating: 4.59 Description: Enjoy a full range of motion for the game with the HRX Women's Racket Sport Shorts. Rapid Dry Technology wicks sweat & makes the fabric dry fast. Anti-microbial technology prevents odor-causing microbes. Anti static technology. Mid rise waist gives secure fit. Colour: Optic White & Peach-coloured. Machine Wash. Regular fit. Wash Care: Machine Wash







<span style="color: navy;">

### Test result Observation    
    
-  RAG with LlamaIndex recommended the right product.
-  The generative search expressed the same information in natural language.
   

<span style="color: navy;">


## Validate user queries

##### Query 3 : "Suggest me top rated kurta with Palazzos from Anubhutee brand"

In [102]:
process_user_query("Suggest me top rated kurta with Palazzos from Anubhutee brand")

Top recommend:
Name: Anubhutee Women Navy Blue & Off-White Bandhani Print Kurta with Palazzos Products: Kurta, Palazzos Price: INR 2749.0 Color: Navy Blue Rating: 4.29 Description: Navy blue and off-white bandhani print kurta with palazzos. Navy blue and off-white bandhani print straight calf length kurta, has a round neck, three-quarter sleeves, straight hem, side slits. Off-white checked palazzos, has elasticated waistband with drawstring closure. Hand-wash.





Similar product:
Name: Anubhutee Women Maroon Embroidered Kurta with Palazzos & Dupatta Products: Kurta, Palazzos, Dupatta Price: INR 4349.0 Color: Maroon Rating: 3.97 Description: Maroon embroidered Kurta with Palazzos with dupatta. Kurta design: Floral embroidered straight shape, regular style, round neck, three-quarter regular sleeves, thread work detail, calf length with straight hem. Palazzos design: Printed palazzos, partially elasticated waistband, hook and eye closure. Hand wash.







<span style="color: navy;">

### Test result Observation    
    
-  RAG with LlamaIndex recommended the right product.
-  The generative search expressed the same information in natural language.
   

<span style="color: navy;">


## Validate user queries

##### Query 4 : "Suggest me red kurta"

In [91]:
process_user_query("Suggest me red kurta")

Top recommend:
Name: Inddus Women Red Ethnic Motifs Embroidered Regular Sequinned Kurta with Sharara & With Dupatta Products: Kurta, Sharara, Dupatta Price: INR 7499.0 Color: Red Rating: 4.24 Description: Red embroidered Kurta with Sharara with dupatta. Ethnic motifs embroidered design with sequinned detail. Knee length with scalloped hem. Net machine weave fabric. Attributes: Sleeve Length - Short Sleeves, Neck - Round Neck, Top Length - Knee Length, Bottom Type - Sharara, Dupatta - With Dupatta, Occasion - Festive Wash Care: Dry Clean





Similar product:
Name: Anubhutee Women Red Ethnic Motifs Printed Empire Pure Cotton Kurta with Trousers & With Dupatta Products: Kurta, Trousers, Dupatta Price: INR 5899.0 Color: Red Rating: 4.25 Description: Red printed Kurta with Trousers with dupatta. Ethnic motifs printed design with empire style. Calf length with flared hem. Pure cotton machine weave fabric. Attributes: Sleeve Length - Three-Quarter Sleeves, Neck - Round Neck, Top Length - Calf Length, Bottom Type - Trousers, Dupatta - With Dupatta, Occasion - Festive Wash Care: Hand Wash







<span style="color: navy;">

### Test result Observation    
    
-  RAG with LlamaIndex recommended the right product.
-  The generative search expressed the same information in natural language.
   

<span style="color: navy;">


## Validate user queries

##### Query 5 : "Suggest me Stretchable Jeans within 2000 rupess"

In [94]:
process_user_query("Suggest me Stretchable Jeans within 2000 rupess")

Top recommend:
Name: Tokyo Talkies Women Blue Super Skinny Fit Mid-Rise Clean Look Stretchable Jeans Products: Jeans Price: 1899.0 INR Color: Blue Rating: 3.91 Description: Blue light wash 5-pocket mid-rise jeans, clean look with no fade, button and zip closure, waistband with belt loops. Super Skinny Fit Stretchable. 98% Cotton 2% Lycra. Machine-wash. Attributes: Super Skinny Fit, Stretchable, Mid-Rise, 5 pockets, Button and Zip closure Wash Care: Machine Wash





Similar product:
Name: Levis Women Navy Blue Super Skinny Fit Stretchable Jeans 710 Products: Jeans Price: 1999.0 INR Color: Navy Blue Rating: 4.12 Description: Dark shade, no fade navy blue jeans. Super skinny fit, mid-rise, clean look, stretchable. 83% Cotton, 16% Polyester, 1% Elastane. Machine wash. Attributes: Super Skinny Fit, Stretchable, Mid-Rise, 5 pockets, Button and Zip closure Wash Care: Machine Wash







<span style="color: navy;">

### Test result Observation    
    
-  RAG with LlamaIndex recommended the right product.
-  The generative search expressed the same information in natural language.
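
A price constraint such as "within 2000" can also be checked programmatically instead of by eye. The sketch below is one possible approach, not part of the original pipeline: `extract_price` and `within_budget` are hypothetical helpers, and the regular expression assumes the price appears as either `Price: INR 6550.0` or `Price: 1899.0 INR`, the two layouts seen in the outputs above.

```python
import re


def extract_price(product_info):
    """Return the price as a float, or None if no price is found.

    Accepts both orderings seen in the outputs above:
    "Price: INR 6550.0" and "Price: 1899.0 INR".
    """
    match = re.search(r"Price:\s*(?:INR\s*)?(\d+(?:\.\d+)?)", product_info)
    return float(match.group(1)) if match else None


def within_budget(product_info, budget):
    """Hypothetical check: True only when a price is found and is <= budget."""
    price = extract_price(product_info)
    return price is not None and price <= budget


# Price fragments copied from the Query 5 and Query 1 outputs.
print(within_budget("Products: Jeans Price: 1899.0 INR Color: Blue", 2000))
print(within_budget("Products: Jeans Price: 1999.0 INR Color: Navy Blue", 2000))
print(within_budget("Products: Kurta Price: INR 6550.0 Color: Peach", 2000))
```

Applied to the `Product Info` column returned by `parse_response_to_df`, a check like this would flag any recommendation that exceeds the budget stated in the query.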
   

<span style="color: navy;">


## Validate user queries

##### Query 6 : "Suggest me Jumpsuit in Navy Blue Color with rating aboue 4"

In [100]:
process_user_query("Suggest me Jumpsuit in Navy Blue Color with rating aboue 4")

Top recommend:
Name: Mast & Harbour Women Navy Blue Solid Cinched Waist Jumpsuit Products: Jumpsuit Price: INR 2199.0 Color: Navy Blue Rating: 4.39 Description: A blend of style and allure, this jumpsuit is ideal for your next outing. Well-crafted with a shirt collar and short sleeves, this jumpsuit lends a playful look. Navy blue shade with a classic solid pattern. Polyester material, machine washable. Trend Alert: A garment with a cinched waist helps define the body shape and lends a slimming effect. Style Tip: Glam up this jumpsuit with bling teardrop earrings, lace-up boots, and rectangular tortoiseshell shades. Where-to-wear: Prep your look for weekend brunch. Attributes: Closure - Button, Neck - Shirt Collar, Number of Pockets - 2, Sleeve Length - Short Sleeves, Type - Basic Jumpsuit Wash Care: Machine Wash





Similar product:
Name: Tokyo Talkies Women Navy Blue Solid Basic Jumpsuit Products: Jumpsuit Price: INR 2299.0 Color: Navy Blue Rating: 4.31 Description: Navy Blue solid basic jumpsuit with a mandarin collar and three-quarter sleeves. Cotton material, machine washable. Attributes: Closure - Button, Neck - Mandarin Collar, Sleeve Length - Three-Quarter Sleeves, Type - Basic Jumpsuit Wash Care: Machine Wash







<span style="color: navy;">

### Test result Observation    
    
-  RAG with LlamaIndex recommended the right product.
-  The generative search expressed the same information in natural language.
   

<span style="color: navy;">

### Final Observation and notes  
    
-  The program works as expected and handles the user queries tested above.
-  Images are loaded directly over HTTP from the `ImageLink` URL in each response and displayed with the matching product.
   

<span style="color: navy;">

# Exploration Zone
    
- Try out further queries, for example:
    


In [None]:
process_user_query("Pink top above 2500 INR")

In [None]:
process_user_query("Floral Top with 4 star rating")