# Helpmate Fashion Search AI

#### We are developing a generative search system to recommend fashion products based on user queries. This system leverages the Myntra Fashion dataset from Kaggle to search and filter through a wide range of product descriptions.

# Methodology Used

- Step 1 : Suppressing warnings and installing the required libraries
- Step 2 : Understanding of the Dataset
- Step 3 : Data Pre-processing before creating Chatbot
- Step 4 : Creating an Embedding model
- Step 5 : Defining the Chatbot Logic
- Step 6 : Testing the Chatbot
- Step 7 : Conclusion

# Step 1 : Suppressing warnings and installing the required libraries

In [None]:
import warnings
warnings.filterwarnings('ignore')

In [None]:
!pip install pandas faiss-cpu numpy sentence-transformers flask



In [None]:
import os
print(os.listdir())

['.config', 'sample_data']


In [None]:
# Importing the necessary libraries

import pandas as pd
import faiss
import numpy as np
from sentence_transformers import SentenceTransformer
from flask import Flask, request, jsonify

# Step 2 : Understanding of the Dataset

In [17]:
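# Assumes Google Drive is already mounted in Colab (from google.colab import drive; drive.mount('/content/drive'))
path = "/content/drive/MyDrive/Dataset/Fashion Dataset v2.csv"

df = pd.read_csv(path)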
df.head(5)

|   | p_id | name | products | price | colour | brand | ratingCount | avg_rating | description | p_attributes |
|---|---|---|---|---|---|---|---|---|---|---|
| 0 | 17048614 | Khushal K Women Black Ethnic Motifs Printed Ku... | Kurta, Palazzos, Dupatta | 5099 | Black | Khushal K | 4522 | 4.4 | Black printed Kurta with Palazzos with dupatta... | Add-Ons: NA, Body Shape ID: 443,333,324,424, B... |
| 1 | 16524740 | InWeave Women Orange Solid Kurta with Palazzos... | Kurta, Palazzos, Floral Print Dupatta | 5899 | Orange | InWeave | 1081 | 4.1 | Orange solid Kurta with Palazzos with dupatta,... | Add-Ons: NA, Body Shape ID: 443,333,324,424, B... |
| 2 | 16331376 | Anubhutee Women Navy Blue Ethnic Motifs Embroi... | Kurta, Trousers, Dupatta | 4899 | Navy Blue | Anubhutee | 1752 | 4.2 | Navy blue embroidered Kurta with Trousers with... | Add-Ons: NA, Body Shape ID: 333,424, Body or G... |
| 3 | 14709966 | Nayo Women Red Floral Printed Kurta With Trous... | Kurta, Trouser, Dupatta | 3699 | Red | Nayo | 4113 | 4.1 | Red printed kurta with trouser and dupatta,Kur... | Add-Ons: NA, Body Shape ID: 333,424, Body or G... |
| 4 | 11056154 | AHIKA Women Black & Green Printed Straight Kurta | Kurta | 1350 | Black | AHIKA | 21274 | 4.0 | Black and green printed straight kurta,has a n... | Body Shape ID: 424, Body or Garment Size: Garm... |


In [18]:
df.shape

(14214, 10)

In [19]:
df.info()

<class 'pandas.core.frame.DataFrame'>
RangeIndex: 14214 entries, 0 to 14213
Data columns (total 10 columns):
 #   Column        Non-Null Count  Dtype  
---  ------        --------------  -----  
 0   p_id          14214 non-null  int64  
 1   name          14214 non-null  object 
 2   products      14214 non-null  object 
 3   price         14214 non-null  int64  
 4   colour        14214 non-null  object 
 5   brand         14214 non-null  object 
 6   ratingCount   14214 non-null  int64  
 7   avg_rating    14214 non-null  float64
 8   description   14214 non-null  object 
 9   p_attributes  14214 non-null  object 
dtypes: float64(1), int64(3), object(6)
memory usage: 1.1+ MB


In [20]:
# Checking for missing values

df.isnull().sum()

| Column | Missing values |
|---|---|
| p_id | 0 |
| name | 0 |
| products | 0 |
| price | 0 |
| colour | 0 |
| brand | 0 |
| ratingCount | 0 |
| avg_rating | 0 |
| description | 0 |
| p_attributes | 0 |


#### There are no missing values, which is a good sign.

# Step 3 : Data Pre-processing before creating Chatbot

In [23]:
# Display correct column names
print("Column Names in the Dataset:", df.columns.tolist())

Column Names in the Dataset: ['p_id', 'name', 'products', 'price', 'colour', 'brand', 'ratingCount', 'avg_rating', 'description', 'p_attributes']


In [25]:
# Trim spaces and convert text to lowercase for relevant columns
text_columns = ["description", "brand", "p_attributes"]  # Using correct column names

for col in text_columns:
    if col in df.columns:
        df[col] = df[col].astype(str).str.strip().str.lower()
    else:
        print(f"Warning: Column '{col}' not found in the dataset.")
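The cleaning above trims and lower-cases three columns. Two further steps might be worth considering, shown here as a minimal sketch under our own assumptions rather than as part of the pipeline: decoding HTML entities (the Trendyol description in Step 6 still contains `&amp;`) and collapsing repeated whitespace. The hypothetical `build_search_text` helper also joins name, colour, brand and products with the description, on the assumption that embedding these fields together gives the model more to match a query like "red dress" against. Both function names are illustrative, not from the original notebook.

```python
import html
import re


def clean_text(value) -> str:
    """Decode HTML entities, collapse runs of whitespace, then trim and lower-case."""
    text = html.unescape(str(value))   # e.g. "&amp;" -> "&"
    text = re.sub(r"\s+", " ", text)   # squeeze repeated spaces, tabs and newlines
    return text.strip().lower()


def build_search_text(row) -> str:
    """Hypothetical helper: join the fields a shopper is likely to search on.

    `row` can be a dict or a pandas Series (both support [] and .get()).
    Empty or missing fields are skipped.
    """
    fields = ["name", "colour", "brand", "products", "description"]
    parts = [clean_text(row[f]) for f in fields if row.get(f)]
    return " | ".join(parts)
```

On the DataFrame this could be applied with `df["search_text"] = df.apply(build_search_text, axis=1)` and the resulting column encoded in Step 4 in place of `description`. Whether this actually improves ranking would need to be checked on real queries.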

In [26]:
# Check for duplicates
duplicates = df.duplicated().sum()
print("\nNumber of Duplicate Rows:", duplicates)


Number of Duplicate Rows: 0
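`df.duplicated()` only flags rows that are identical in every column, so the same product listed twice with slightly different text would not be counted. A minimal sketch that also checks the product ID column is shown below; the toy frame is made up for illustration and is not taken from the dataset.

```python
import pandas as pd


def count_duplicates(frame: pd.DataFrame, key: str = "p_id") -> dict:
    """Compare exact-row duplicates with duplicates on a single ID column."""
    return {
        "exact_rows": int(frame.duplicated().sum()),
        f"by_{key}": int(frame.duplicated(subset=[key]).sum()),
    }


# Toy frame (illustrative only): product 2 appears twice with a trailing-space difference
toy = pd.DataFrame({
    "p_id": [1, 2, 2, 3],
    "description": ["black kurta", "red dress", "red dress ", "blue jeans"],
})
print(count_duplicates(toy))  # {'exact_rows': 0, 'by_p_id': 1}
```

If the ID check finds duplicates on the real data, `df.drop_duplicates(subset=["p_id"])` keeps the first listing of each product.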


In [27]:
# Display a sample of cleaned data
print("\nCleaned Data Sample:\n", df.head())


Cleaned Data Sample:
        p_id                                               name  \
0  17048614  Khushal K Women Black Ethnic Motifs Printed Ku...   
1  16524740  InWeave Women Orange Solid Kurta with Palazzos...   
2  16331376  Anubhutee Women Navy Blue Ethnic Motifs Embroi...   
3  14709966  Nayo Women Red Floral Printed Kurta With Trous...   
4  11056154   AHIKA Women Black & Green Printed Straight Kurta   

                                products  price     colour      brand  \
0               Kurta, Palazzos, Dupatta   5099      Black  khushal k   
1  Kurta, Palazzos, Floral Print Dupatta   5899     Orange    inweave   
2               Kurta, Trousers, Dupatta   4899  Navy Blue  anubhutee   
3                Kurta, Trouser, Dupatta   3699        Red       nayo   
4                                  Kurta   1350      Black      ahika   

   ratingCount  avg_rating                                        description  \
0         4522         4.4  black printed kurta with palazzo

In [29]:
df.to_csv('/content/cleaned_fashion_dataset.csv', index=False)  # Save cleaned data
df = pd.read_csv('/content/cleaned_fashion_dataset.csv')  # Load it back

# Step 4 : Creating an Embedding model

In [31]:
# Load the pre-trained sentence-embedding model (another SentenceTransformer model can be swapped in)
model = SentenceTransformer('all-MiniLM-L6-v2')

# Convert product descriptions into unit-length embeddings
product_descriptions = df["description"].astype(str).tolist()  # Ensure every entry is text
product_embeddings = model.encode(product_descriptions, normalize_embeddings=True)

# Create an exact (brute-force) FAISS index; it ranks by squared L2 distance,
# which for unit-length vectors orders results the same way as cosine similarity
dimension = product_embeddings.shape[1]  # Embedding dimensionality
index = faiss.IndexFlatL2(dimension)
index.add(np.asarray(product_embeddings, dtype="float32"))  # FAISS expects float32
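Because the embeddings are normalised, an L2 index and a cosine-similarity ranking return the same neighbours: for unit vectors $\lVert q - d\rVert^2 = 2 - 2\,q \cdot d$. The sketch below reproduces what an exact flat L2 search computes using plain NumPy, so the relationship can be checked without FAISS. It is an illustrative re-implementation, not FAISS's own code; random 384-dimensional vectors (the output size of all-MiniLM-L6-v2) stand in for real embeddings, and `flat_l2_search` is a hypothetical name.

```python
import numpy as np


def normalize_rows(x: np.ndarray) -> np.ndarray:
    """Scale each row to unit L2 norm, as normalize_embeddings=True does."""
    return x / np.linalg.norm(x, axis=1, keepdims=True)


def flat_l2_search(database: np.ndarray, queries: np.ndarray, k: int):
    """Exhaustive k-NN by squared L2 distance (same semantics as IndexFlatL2.search)."""
    # ||q - d||^2 = ||q||^2 + ||d||^2 - 2 q.d for every query/database pair
    sq_dists = (
        (queries ** 2).sum(axis=1, keepdims=True)
        + (database ** 2).sum(axis=1)
        - 2.0 * queries @ database.T
    )
    order = np.argsort(sq_dists, axis=1)[:, :k]  # indices of the k smallest distances
    return np.take_along_axis(sq_dists, order, axis=1), order


# Random stand-ins for product and query embeddings
rng = np.random.default_rng(0)
db = normalize_rows(rng.normal(size=(100, 384)).astype("float32"))
q = normalize_rows(rng.normal(size=(1, 384)).astype("float32"))

dists, idx = flat_l2_search(db, q, k=5)
cosine = (q @ db.T)[0, idx[0]]
print(np.allclose(dists[0], 2 - 2 * cosine, atol=1e-5))  # True
```

So a squared distance `d` from `IndexFlatL2` can be converted to cosine similarity as `1 - d / 2`. Alternatively, `faiss.IndexFlatIP` over the same normalised vectors returns the inner product (cosine similarity) directly, with higher meaning more similar.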



# Step 5 : Defining the Chatbot Logic

In [33]:
def recommend_fashion_items(query, top_n=5):
    # Embed the query with the same model and normalisation used for the products
    query_embedding = model.encode([query], normalize_embeddings=True)
    # IndexFlatL2 returns squared L2 distances: lower means more similar
    distances, indices = index.search(np.asarray(query_embedding, dtype="float32"), top_n)

    recommendations = []
    for dist, idx in zip(distances[0], indices[0]):
        product_info = df.iloc[idx][["name", "brand", "price", "colour", "description"]].to_dict()
        # For unit-length vectors, squared L2 = 2 - 2 * cosine, so convert back to cosine similarity
        product_info["similarity_score"] = float(1 - dist / 2)
        recommendations.append(product_info)

    return recommendations
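The introduction mentions filtering as well as search, but `recommend_fashion_items` only ranks by text similarity. One simple option, sketched here under our own assumptions, is to over-fetch candidates and then apply hard constraints such as a price ceiling or an exact colour. `filter_recommendations` and its parameters are hypothetical; it expects the dicts produced by `recommend_fashion_items`, already in similarity order.

```python
from typing import Iterable, Optional


def filter_recommendations(
    candidates: Iterable[dict],
    max_price: Optional[float] = None,
    colour: Optional[str] = None,
    limit: int = 5,
) -> list:
    """Keep candidates meeting optional price/colour constraints, preserving rank order."""
    kept = []
    for item in candidates:  # assumed to be sorted best match first
        if max_price is not None and item["price"] > max_price:
            continue
        if colour is not None and item["colour"].strip().lower() != colour.strip().lower():
            continue
        kept.append(item)
        if len(kept) == limit:
            break
    return kept
```

A call might look like `filter_recommendations(recommend_fashion_items("red dress", top_n=50), max_price=2000, colour="red")`. Fetching more than five candidates matters here because filtering after retrieval can otherwise leave fewer results than requested.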


# Step 6 : Testing the Chatbot

In [35]:
query = "red dress"
results = recommend_fashion_items(query)

for item in results:
    print(f"Product: {item['name']}, Brand: {item['brand']}, Price: {item['price']}, Colour: {item['colour']}")
    print(f"Description: {item['description']}\n")

Product: Sera Red Two Piece Party Dress, Brand: sera, Price: 1998, Colour: Red
Description: red two piece party dress,red foil print crop top has shoulder strap neck,red foil print minin skirtthe model (height 58) is wearing a size shand wash,flat dry,do not iron

Product: Trendyol Red  Yellow Floral Print Ruched  Ruffled Straight Skirt, Brand: trendyol, Price: 1699, Colour: Red
Description: red &amp; yellow floral print ruched and ruffled straight skirt,has curved hem and slip-on closurethe model (height 58") is wearing a size 3695% polyester 5% elastane,machine wash

Product: Studio Shringaar Red Floral Printed Organza Skirt, Brand: studio shringaar, Price: 4500, Colour: Red
Description: red,floral printed maxi skirt,has a zip and hook closure,has a linning and flared hemthe model (height 58") is wearing a free size100% polyester,hand wash with cold water

Product: Chemistry Women Maroon & Blue Colourblocked Sweater Dress, Brand: chemistry, Price: 2499, Colour: Maroon
Description: re
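Eyeballing a single query is a limited test. A crude but repeatable proxy, sketched below as an assumption rather than an established metric for this project, is the share of results whose product name contains every keyword of the query. `keyword_hit_rate` is a hypothetical helper; it uses plain substring matching, so it will over-count words that appear inside longer words.

```python
def keyword_hit_rate(results, keywords, field="name"):
    """Fraction of results whose `field` contains every keyword (case-insensitive substring)."""
    if not results:
        return 0.0
    hits = sum(
        all(kw.lower() in str(item[field]).lower() for kw in keywords)
        for item in results
    )
    return hits / len(results)
```

For the four result names visible in the output above, `["red", "dress"]` gives 0.25 (only the Sera dress contains both words), while `["dress"]` alone gives 0.5. Running the same check over a small set of queries would make it easier to compare changes to the text that is embedded or to the model.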

# Step 7 : Conclusion

- We have developed a simple, working fashion chatbot that returns product recommendations for free-text user queries. After light preprocessing (trimming and lower-casing the description, brand and attribute fields), each product description is embedded with the all-MiniLM-L6-v2 sentence-transformer and stored in a FAISS index; a query is embedded the same way, and the closest products are returned with their name, brand, price, colour and description. In the "red dress" test, every visible result is a red or maroon garment, although skirts appear alongside dresses, so ranking quality leaves room for improvement. Given the scope of our assignment, we have kept the implementation straightforward while making sure it works end to end. This chatbot lays a foundation for further enhancements, such as integrating more advanced NLP techniques or deploying it as a web-based service. For now, we conclude our project, having met the required objectives with a clean and functional solution.