## **HelpMate AI project** (Fashion AI Chatbot)

This project builds an AI-powered fashion search system using LangChain to enable natural-language product discovery. It combines Retrieval-Augmented Generation (RAG), vector search, and LLM-based query processing to search roughly 14,000 fashion product descriptions and recommend the most relevant items. The Myntra dataset from Kaggle serves as the primary data source.

1. **Data Collection & Preprocessing**

Load the Myntra dataset (CSV format). LangChain's CSVLoader is one option; this notebook reads the file with pandas and wraps each row in a LangChain Document.

Clean missing values and normalize text fields (brand names, colors, descriptions).

2. **Text Embedding & Vectorization (LangChain + FAISS/ChromaDB)**

Use LangChain's OpenAIEmbeddings to convert product descriptions into embeddings.

Store the embeddings in a vector database (FAISS or ChromaDB; this notebook uses FAISS) for fast retrieval.

3. **Retrieval-Augmented Generation (RAG) for Fashion Search**

Implement semantic search using LangChain's VectorStoreRetriever.

Retrieve product descriptions similar to the user's query.

4. **LLM-based Query Processing with LangChain**

Use LangChain's LLMChain to refine and interpret user queries.

Convert free-form user queries into structured search parameters.

5. **Search & Ranking Mechanism**

Retrieve top-ranked product matches using cosine similarity.

Apply LangChain's prompt templates to generate search explanations.

6. **Conversational Agent for Interactive Search**

Integrate LangChain's ConversationalRetrievalChain to allow follow-up queries.

Maintain context so that users can refine searches dynamically (e.g., "Show more similar options").
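Step 5 ranks candidates by cosine similarity. As a rough illustration of that measure (not the notebook's retrieval code, which delegates the search to the vector store), the sketch below scores a few toy vectors in plain Python. The vectors and the helper names `cosine_similarity` and `rank_by_cosine` are made up for illustration.

```python
import math


def cosine_similarity(a, b):
    """Cosine of the angle between two equal-length vectors."""
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(y * y for y in b))
    return dot / (norm_a * norm_b)


def rank_by_cosine(query_vec, doc_vecs, top_k=3):
    """Return (index, score) pairs for the top_k most similar vectors."""
    scored = [(i, cosine_similarity(query_vec, v)) for i, v in enumerate(doc_vecs)]
    return sorted(scored, key=lambda pair: pair[1], reverse=True)[:top_k]


# Toy 3-dimensional "embeddings" (illustrative only; real embeddings are much longer).
query = [1.0, 0.0, 1.0]
docs = [
    [1.0, 0.0, 1.0],  # same direction as the query
    [0.0, 1.0, 0.0],  # orthogonal to the query
    [1.0, 1.0, 1.0],  # partially aligned with the query
]

ranking = rank_by_cosine(query, docs)
for index, score in ranking:
    print(f"doc {index}: {score:.3f}")
```

The identical vector ranks first, the partially aligned one second, and the orthogonal one last, which is the ordering a similarity-based retriever is expected to produce.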

# **Instructions for running:**

1. Download the [Myntra dataset](https://www.kaggle.com/datasets/djagatiya/myntra-fashion-product-dataset) from Kaggle.

2. Provide an OpenAI API key.

In [6]:
# Install and import the necessary libraries

In [1]:
!pip install -q openai langchain chromadb faiss-cpu pypdf tiktoken docarray


In [2]:
from google.colab import drive
drive.mount('/content/drive')

Mounted at /content/drive


In [3]:
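import os
import openai
os.chdir('/content/drive/MyDrive/Chatcompletion API docs')

# Read the API key from a file on Drive; the context manager closes the file
with open("OPenAI key.txt", "r") as key_file:
    openai.api_key = key_file.read().strip()
# LangChain's OpenAI integrations read the key from this environment variable
os.environ['OPENAI_API_KEY'] = openai.api_key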

In [4]:

!pip install -q langchain-openai
!pip install -U langchain-community


Collecting langchain-community
  Downloading langchain_community-0.3.20-py3-none-any.whl.metadata (2.4 kB)
Collecting langchain<1.0.0,>=0.3.21 (from langchain-community)
  Downloading langchain-0.3.21-py3-none-any.whl.metadata (7.8 kB)
Collecting dataclasses-json<0.7,>=0.5.7 (from langchain-community)
  Downloading dataclasses_json-0.6.7-py3-none-any.whl.metadata (25 kB)
Collecting pydantic-settings<3.0.0,>=2.4.0 (from langchain-community)
  Downloading pydantic_settings-2.8.1-py3-none-any.whl.metadata (3.5 kB)
Collecting httpx-sse<1.0.0,>=0.4.0 (from langchain-community)
  Downloading httpx_sse-0.4.0-py3-none-any.whl.metadata (9.0 kB)
Collecting marshmallow<4.0.0,>=3.18.0 (from dataclasses-json<0.7,>=0.5.7->langchain-community)
  Downloading marshmallow

In [5]:
import pandas as pd
# ChatOpenAI and OpenAIEmbeddings come from the langchain-openai package installed above
from langchain_openai import ChatOpenAI, OpenAIEmbeddings
# FAISS comes from the langchain-community package installed above
from langchain_community.vectorstores import FAISS
from langchain.chains import ConversationalRetrievalChain
from langchain.memory import ConversationBufferMemory
from langchain.prompts import PromptTemplate

# **Data Collection and Preprocessing**


In [7]:
df=pd.read_csv("Fashion Dataset v2.csv")

In [8]:
df.head()

Unnamed: 0,p_id,name,products,price,colour,brand,img,ratingCount,avg_rating,description,p_attributes
0,17048614,Khushal K Women Black Ethnic Motifs Printed Ku...,"Kurta, Palazzos, Dupatta",5099.0,Black,Khushal K,http://assets.myntassets.com/assets/images/170...,4522.0,4.418399,Black printed Kurta with Palazzos with dupatta...,"{'Add-Ons': 'NA', 'Body Shape ID': '443,333,32..."
1,16524740,InWeave Women Orange Solid Kurta with Palazzos...,"Kurta, Palazzos, Floral Print Dupatta",5899.0,Orange,InWeave,http://assets.myntassets.com/assets/images/165...,1081.0,4.119334,Orange solid Kurta with Palazzos with dupatta<...,"{'Add-Ons': 'NA', 'Body Shape ID': '443,333,32..."
2,16331376,Anubhutee Women Navy Blue Ethnic Motifs Embroi...,"Kurta, Trousers, Dupatta",4899.0,Navy Blue,Anubhutee,http://assets.myntassets.com/assets/images/163...,1752.0,4.16153,Navy blue embroidered Kurta with Trousers with...,"{'Add-Ons': 'NA', 'Body Shape ID': '333,424', ..."
3,14709966,Nayo Women Red Floral Printed Kurta With Trous...,"Kurta, Trouser, Dupatta",3699.0,Red,Nayo,http://assets.myntassets.com/assets/images/147...,4113.0,4.088986,Red printed kurta with trouser and dupatta<br>...,"{'Add-Ons': 'NA', 'Body Shape ID': '333,424', ..."
4,11056154,AHIKA Women Black & Green Printed Straight Kurta,Kurta,1350.0,Black,AHIKA,http://assets.myntassets.com/assets/images/110...,21274.0,3.978377,"Black and green printed straight kurta, has a ...","{'Body Shape ID': '424', 'Body or Garment Size..."


In [9]:
df.info()

<class 'pandas.core.frame.DataFrame'>
RangeIndex: 14214 entries, 0 to 14213
Data columns (total 11 columns):
 #   Column        Non-Null Count  Dtype  
---  ------        --------------  -----  
 0   p_id          14214 non-null  int64  
 1   name          14214 non-null  object 
 2   products      14214 non-null  object 
 3   price         14214 non-null  float64
 4   colour        14214 non-null  object 
 5   brand         14214 non-null  object 
 6   img           14214 non-null  object 
 7   ratingCount   6530 non-null   float64
 8   avg_rating    6530 non-null   float64
 9   description   14214 non-null  object 
 10  p_attributes  14214 non-null  object 
dtypes: float64(3), int64(1), object(7)
memory usage: 1.2+ MB


In [9]:
# eliminate duplicates in df

df = df.drop_duplicates()


In [10]:
df.shape

(14214, 11)
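The `description` column still contains HTML markup such as `<br>`, `<li>` and `&nbsp;`, as the sample rows show. The overview mentions normalizing text fields, so here is a minimal sketch of one way to do that with the standard library before the text is embedded. The helper name `clean_description` is hypothetical and is not part of the original notebook.

```python
import html
import re

TAG_RE = re.compile(r"<[^>]+>")   # any HTML tag, e.g. <br> or </li>
WHITESPACE_RE = re.compile(r"\s+")  # runs of whitespace, including non-breaking spaces


def clean_description(raw):
    """Strip HTML tags, decode entities and collapse whitespace."""
    text = TAG_RE.sub(" ", raw)          # replace tags with a space so words stay separated
    text = html.unescape(text)           # turn &nbsp;, &amp; etc. into real characters
    return WHITESPACE_RE.sub(" ", text).strip()


sample = (
    "Black printed Kurta <br> <b> Kurta design:  </b> <ul> <li> Anarkali shape </li> </ul>"
    "Width:&nbsp;88 cm<br>Machine wash"
)
cleaned = clean_description(sample)
print(cleaned)
```

On the DataFrame this could be applied with `df["description"] = df["description"].map(clean_description)`, assuming every description is a string (which `df.info()` suggests, since the column has no nulls).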

In [41]:
# Combine product ID, name, description, price and image URL into a single 'text' column

df['text'] = df['p_id'].astype(str) + " " + df['name'] + " " + df['description'] + " " + df['price'].astype(str) + " " + df['img']


In [42]:
df['text']

Unnamed: 0,text
0,17048614 Khushal K Women Black Ethnic Motifs P...
1,16524740 InWeave Women Orange Solid Kurta with...
2,16331376 Anubhutee Women Navy Blue Ethnic Moti...
3,14709966 Nayo Women Red Floral Printed Kurta W...
4,11056154 AHIKA Women Black & Green Printed Str...
...,...
14209,15415116 Flying Machine Women Blue Solid Mock-...
14210,16470114 Juelle Women Green Printed Hooded Swe...
14211,16382150 Vero Moda Women Pink Sweatshirt Pink ...
14212,16379664 Vero Moda Women Blue Sweatshirt Blue ...


In [43]:
# Converting the df into LangChain Document

from langchain.docstore.document import Document

documents = [
    Document(
        page_content=row["text"],
        metadata={
            "p_id": row["p_id"],
            "name": row["name"],
            "brand": row["brand"],
            "price": row["price"]
        }
    )
    for _, row in df.iterrows()
]


In [44]:
documents[0]

Document(metadata={'p_id': 17048614, 'name': 'Khushal K Women Black Ethnic Motifs Printed Kurta with Palazzos & With Dupatta', 'brand': 'Khushal K', 'price': 5099.0}, page_content="17048614 Khushal K Women Black Ethnic Motifs Printed Kurta with Palazzos & With Dupatta Black printed Kurta with Palazzos with dupatta <br> <br> <b> Kurta design:  </b> <ul> <li> Ethnic motifs printed </li> <li> Anarkali shape </li> <li> Regular style </li> <li> Mandarin collar,  three-quarter regular sleeves </li> <li> Calf length with flared hem </li> <li> Viscose rayon machine weave fabric </li> </ul> <br> <b> Palazzos design:  </b> <ul> <li> Printed Palazzos </li> <li> Elasticated waistband </li> <li> Slip-on closure </li> </ul>Dupatta Length 2.43 meters Width:&nbsp;88 cm<br>The model (height 5'8) is wearing a size S100% Rayon<br>Machine wash 5099.0 http://assets.myntassets.com/assets/images/17048614/2022/2/4/b0eb9426-adf2-4802-a6b3-5dbacbc5f2511643971561167KhushalKWomenBlackEthnicMotifsAngrakhaBeadsandS

### **Text Embedding & Vectorization (LangChain + FAISS)**

In [45]:
# Create embeddings with OpenAI and build the vector store with FAISS
embeddings = OpenAIEmbeddings()
db = FAISS.from_documents(documents, embeddings)
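Embedding all 14,214 documents in a single call gives no feedback while it runs. A possible variation, sketched below, builds the index batch by batch so progress can be printed. The helper names `batched` and `build_index_in_batches` and the batch size of 500 are assumptions for illustration; the embeddings class may already split requests internally, so this is mainly about visibility rather than speed.

```python
def batched(items, batch_size):
    """Yield consecutive slices of `items`, each at most `batch_size` long."""
    if batch_size <= 0:
        raise ValueError("batch_size must be positive")
    for start in range(0, len(items), batch_size):
        yield items[start:start + batch_size]


def build_index_in_batches(documents, embeddings, batch_size=500):
    """Hypothetical helper: build a FAISS store batch by batch, printing progress."""
    # Imported lazily so the batching helper above can be used without LangChain installed.
    from langchain_community.vectorstores import FAISS

    db = None
    done = 0
    for batch in batched(documents, batch_size):
        if db is None:
            db = FAISS.from_documents(batch, embeddings)  # first batch creates the index
        else:
            db.add_documents(batch)  # later batches are appended to it
        done += len(batch)
        print(f"Indexed {done}/{len(documents)} documents")
    return db


# The batching logic on its own, with integers standing in for documents.
print(list(batched(list(range(5)), 2)))  # [[0, 1], [2, 3], [4]]
```

In the notebook this would be called as `db = build_index_in_batches(documents, embeddings)` in place of the single `FAISS.from_documents` call.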

## **Retrieval-Augmented Generation (RAG) for Fashion Search**

In [46]:
# Use db as the retriever
retriever = db.as_retriever()
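The retriever returns the nearest neighbours of a query and knows nothing about constraints such as "under 6000"; in this notebook that judgement is left to the LLM. As a sketch of the "structured search parameters" idea from the overview, the code below extracts a price cap from the query with a regular expression and filters retrieved documents on their `price` metadata. The names `parse_price_cap` and `filter_by_price` are hypothetical, and `SimpleNamespace` objects stand in for LangChain `Document`s.

```python
import re
from types import SimpleNamespace

# Matches phrases like "under 6000", "below Rs. 1500" or "less than 999.50"
PRICE_CAP_RE = re.compile(
    r"\b(?:under|below|less than)\s+(?:rs\.?\s*)?(\d+(?:\.\d+)?)", re.IGNORECASE
)


def parse_price_cap(query):
    """Return the price cap mentioned in the query, or None if there is none."""
    match = PRICE_CAP_RE.search(query)
    return float(match.group(1)) if match else None


def filter_by_price(docs, price_cap):
    """Keep documents whose metadata price is strictly below the cap."""
    if price_cap is None:
        return list(docs)
    return [d for d in docs if d.metadata.get("price", float("inf")) < price_cap]


# Stand-ins for retrieved Documents; the second price is made up for illustration.
retrieved = [
    SimpleNamespace(metadata={"p_id": 13259360, "price": 4498.0}),
    SimpleNamespace(metadata={"p_id": 1, "price": 6500.0}),
]

cap = parse_price_cap("I want a blue kurta set under 6000")
affordable = filter_by_price(retrieved, cap)
print(cap, [d.metadata["p_id"] for d in affordable])
```

A filter like this would run on the output of the retriever for a given query, before the documents are passed to the LLM as context.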

In [47]:
# define memory to preserve the chat history
memory = ConversationBufferMemory(memory_key="chat_history", return_messages=True)


## **Defining Prompt Template**

In [48]:
# Define the prompt using LangChain's PromptTemplate

prompt_template = PromptTemplate(
    input_variables=["context", "question"],

    template="""
    You are an AI-powered fashion assistant. Use the given context to recommend the best fashion products for the user query.

    Context:
    {context}

    Question:
    {question}

    Provide the Product Name, Product ID, Brand, Description, Price, and Image URL of each relevant product.
    List each product (Product 1, Product 2, etc.) on separate lines, and open with an introduction such as: Here are some of the suggestions I have
    """
)
prompt_template


PromptTemplate(input_variables=['context', 'question'], input_types={}, partial_variables={}, template='\n    You are an AI-powered fashion assistant. Use the given context to recommend the best fashion products for the user query.\n    \n    Context:\n    {context}\n    \n    Question:\n    {question}\n    \n    Provide the Product Name, Product ID, Brand, Description, Price, and Image URL of each relevant product.\n    List each product (Product 1, Product 2, etc.) on separate lines, and open with an introduction such as: Here are some of the suggestions I have\n    ')

# **Initialize LLM & ConversationalRetrievalChain**

In [49]:

# Initialize LLM & ConversationalRetrievalChain

llm = ChatOpenAI(model_name="gpt-4", temperature=0)

conversation_chain = ConversationalRetrievalChain.from_llm(
    llm=llm,
    retriever=retriever,
    memory=memory,
    combine_docs_chain_kwargs={"prompt": prompt_template}
)


# **Conversational chat with Fashion AI**

In [50]:
# Send the query through the conversational chain and return the answer text
def chat_with_fashion_ai(query):
    response = conversation_chain.invoke({"question": query})
    answer = response.get("answer", "")

    # If the answer comes back as a list of strings, join it into one string
    if isinstance(answer, list):
        answer = "\n".join(answer)

    # Remove leading/trailing whitespace
    return answer.strip()

while True:
    user_query = input("You: ")

    # Quit only when the whole input is "exit", not when a query merely contains the word
    if user_query.strip().lower() == "exit":
        print("Thank you for using Fashion AI. Goodbye!")
        break

    ai_response = chat_with_fashion_ai(user_query)


    print("\nAI Response:\n", ai_response)

You: I want a blue kurta set under 6000

AI Response:
 Here are some of the suggestions I have:

Product 1:
- Product Name: Prakhya Women Blue Embroidered Kurta with Trousers & Dupatta
- Product ID: 13259360
- Brand: Prakhya
- Description: Blue embroidered kurta with trousers. Blue straight calf length kurta, has a round neck, three-quarter sleeves, side slits. Blue Embroidered trousers, has elasticated waistband. Dupatta: 2 x 1 meters (length x width). The model (height 5'8") is wearing a size S. Kurta fabric: viscose rayon. Bottom fabric: viscose rayon. Dupatta fabric: silk chiffon. Hand-wash.
- Price: 4498.0
- Image URL: http://assets.myntassets.com/assets/images/productimage/2020/12/16/7c2188e5-8443-4a77-add1-e03c1def6b301608111700911-1.jpg

Product 2:
- Product Name: HERE&NOW Women Blue Solid Kurta with Trousers
- Product ID: 15886988
- Brand: HERE&NOW
- Description: Blue solid Kurta with Trousers. Kurta design: Solid, Straight shape, Regular style, Round neck, Three-quarter regul