# E-Commerce Chatbot

Good customer service and personalized shopping experiences are crucial in e-commerce, and an intelligent chatbot is one effective way to provide both. However, building a chatbot that handles both structured data (such as product categories) and unstructured data (such as free-text product descriptions) is a complex challenge.

To address this challenge, we use a two-step approach:
- **Step 1: Data Filtering with LLM** - We use a Large Language Model (LLM) to pre-filter the products by category, based on the user's question.
- **Step 2: Relevant Data Extraction with RAG** - We use the LLM in a Retrieval-Augmented Generation (RAG) architecture to identify the most relevant parts of the filtered data and generate accurate responses.

In our experiments, RAG on its own was not very accurate and often produced seemingly random answers. Pre-filtering the data with an LLM gave more deterministic and consistent results.
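The pre-filtering idea can be sketched without any LLM at all. The snippet below is a minimal illustration, not the notebook's implementation: `narrow_by_category` and `toy_pick` are hypothetical names, the catalog is made-up toy data, and `toy_pick` is a keyword-based stand-in for the LLM call that selects categories.

```python
from typing import Callable

Record = dict[str, str]


def narrow_by_category(
    records: list[Record],
    pick: Callable[[list[str]], list[str]],
    levels: tuple[str, ...] = ("cat-1", "cat-2", "cat-3"),
) -> list[Record]:
    """Filter records level by level, keeping only the categories `pick` selects."""
    for level in levels:
        # Offer only the categories still present after the previous level.
        options = sorted({r[level] for r in records})
        chosen = set(pick(options))
        records = [r for r in records if r[level] in chosen]
    return records


def toy_pick(options: list[str]) -> list[str]:
    """Hypothetical stand-in for the LLM: keep categories mentioning 'Bike' or 'Toys'."""
    return [o for o in options if "Bike" in o or "Toys" in o]


# Made-up toy catalog, only for illustration.
catalog = [
    {"name": "Balance bike", "cat-1": "Toys & Games", "cat-2": "Ride-On Toys", "cat-3": "Balance Bikes"},
    {"name": "Garden chair", "cat-1": "Home & Kitchen", "cat-2": "Furniture", "cat-3": "Chairs"},
    {"name": "Puzzle", "cat-1": "Toys & Games", "cat-2": "Puzzles", "cat-3": "Jigsaw Puzzles"},
]

shortlist = narrow_by_category(catalog, toy_pick)
print([r["name"] for r in shortlist])  # ['Balance bike']
```

Only the shortlist would then be embedded and searched, which is why the retrieval step sees far fewer irrelevant candidates.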

## Load and Pre-Process the Data

### Imports

In [1]:
import pandas as pd

### Read the raw data

Data Source: [Amazon Product Dataset 2020 - Kaggle](https://www.kaggle.com/datasets/promptcloud/amazon-product-dataset-2020?resource=download)


In [2]:
df = pd.read_csv("../data/raw/marketing_sample_for_amazon_com-ecommerce__20200101_20200131__10k_data.csv")
df.fillna("", inplace=True)
df.columns


Index(['Uniq Id', 'Product Name', 'Brand Name', 'Asin', 'Category',
       'Upc Ean Code', 'List Price', 'Selling Price', 'Quantity',
       'Model Number', 'About Product', 'Product Specification',
       'Technical Details', 'Shipping Weight', 'Product Dimensions', 'Image',
       'Variants', 'Sku', 'Product Url', 'Stock', 'Product Details',
       'Dimensions', 'Color', 'Ingredients', 'Direction To Use',
       'Is Amazon Seller', 'Size Quantity Variant', 'Product Description'],
      dtype='object')

In [3]:
print("Number of unique categories: ", len(df["Category"].unique()))

Number of unique categories:  939


### Copy the Raw Data

In [4]:
df_processed = df.copy()

### Extract the Main and Sub Categories

In [5]:
df_processed["cat-1"] = df_processed["Category"].apply(lambda x: x.split("|")[0].strip() if len(x.split("|"))>=1 else "")
df_processed["cat-2"] = df_processed["Category"].apply(lambda x: x.split("|")[1].strip() if len(x.split("|"))>=2 else "")
df_processed["cat-3"] = df_processed["Category"].apply(lambda x: x.split("|")[2].strip() if len(x.split("|"))>=3 else "")

print("Unique values in Cat 1: ", len(df_processed["cat-1"].unique()))
print("Unique values in Cat 2: ", len(df_processed["cat-2"].unique()))
print("Unique values in Cat 3: ", len(df_processed["cat-3"].unique()))

Unique values in Cat 1:  24
Unique values in Cat 2:  108
Unique values in Cat 3:  321
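As a more compact alternative to the three lambdas above, the splitting logic could live in one helper. This is a sketch under the assumption that category paths use `|` as the separator, as in this dataset; `split_category` is a hypothetical name, not part of the notebook.

```python
def split_category(path: str, depth: int = 3, sep: str = "|") -> list[str]:
    """Return the first `depth` levels of a category path, padded with ""."""
    parts = [p.strip() for p in path.split(sep)] if path else []
    # Pad short paths so every product has exactly `depth` levels.
    return (parts + [""] * depth)[:depth]


print(split_category("Toys & Games | Sports & Outdoor Play | Balance Bikes"))
# ['Toys & Games', 'Sports & Outdoor Play', 'Balance Bikes']
print(split_category("Toys & Games"))
# ['Toys & Games', '', '']
```

With pandas, each level could then be taken per row, for example `df_processed["Category"].apply(lambda c: split_category(c)[0])` for the first level.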


### Merge Columns with Text Contents to `page_content_column`

In [6]:
df_processed["page_content_column"] = df_processed.apply(
    lambda x: "\n\n".join(
        [
            col_name + ": " + x[col_name] for col_name in 
            ["Product Name", "About Product", "Product Specification", 
             "Technical Details","Selling Price", "Shipping Weight", 
             "Product Dimensions", "Product Url"]
        ]
    ),
    axis=1
)
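One possible variant of the merge step skips empty fields and coerces values to strings, so a numeric column would not break the concatenation. This is a sketch, not what the cell above does (the cell keeps empty fields); `build_page_content` is a hypothetical helper and the sample row is made up.

```python
def build_page_content(row, columns: list[str]) -> str:
    """Join "column: value" pairs, skipping empty values and coercing to str."""
    parts = []
    for col in columns:
        value = str(row.get(col, "")).strip()
        if value:  # drop fields that were left empty by fillna("")
            parts.append(f"{col}: {value}")
    return "\n\n".join(parts)


# Made-up row; any mapping with a .get method works here.
row = {"Product Name": "Toy Car", "About Product": "", "Selling Price": 12.5}
text = build_page_content(row, ["Product Name", "About Product", "Selling Price"])
print(text)
```

Dropping empty fields keeps the embedded text shorter, at the cost of documents no longer sharing an identical layout.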

### Filter Columns

In [7]:
df_processed = df_processed[
    ["page_content_column", "Image", "Product Url", "cat-1", "cat-2", "cat-3"]
]

### Save Processed DataFrame

In [8]:
df_processed.to_csv("../data/processed/amazon.csv", index=False)

## Chat with Data

### Imports

In [9]:
import pandas as pd

from decouple import config
from IPython.display import Markdown, display
from langchain.chains.combine_documents import create_stuff_documents_chain
from langchain_community.document_loaders import DataFrameLoader
from langchain_community.vectorstores import FAISS
from langchain_core.prompts import ChatPromptTemplate
from langchain_openai import OpenAIEmbeddings, ChatOpenAI
from pprint import pprint

In [10]:
OPENAI_API_KEY = "YOUR_API_KEY" # or config("OPENAI_API_KEY")

### Load the Processed Data

In [11]:
# Empty strings are read back from CSV as NaN, so restore them
df_processed = pd.read_csv("../data/processed/amazon.csv").fillna("")

### Define LLM clients

In [12]:
llm_client = ChatOpenAI(
    model="gpt-4o-mini",
    openai_api_key=OPENAI_API_KEY,
    temperature=0,
)

embd_client = OpenAIEmbeddings(openai_api_key=OPENAI_API_KEY)

In [13]:
# Example
response = llm_client.invoke("Where is Kashan?")
pprint(response.content)

('Kashan is a city located in the Isfahan Province of Iran. It is situated in '
 'the central part of the country, approximately 250 kilometers (about 155 '
 'miles) south of Tehran, the capital of Iran. Kashan is known for its '
 'historical architecture, traditional Persian gardens, and significant '
 'cultural heritage, including beautiful mosques, ancient houses, and the '
 'famous Fin Garden. The city has a rich history that dates back to ancient '
 'times and is often associated with the production of carpets and textiles.')


### Ask Your Question

In [14]:
# Example questions; only the uncommented one is used in the cells below.
# question = "I am looking for a set of table and chairs for my garden for 6-8 people. Can you help me with that?"
question = "Do you have a bike for my 3 year old son?"

### Step 1: Filter the Data Based on the Question Using the LLM

In [15]:
df_filtered = df_processed.copy()

for cat_no in [1,2,3]:
    # Get unique categories
    categories = df_filtered[f"cat-{cat_no}"].unique().tolist()

    # Make the query
    query = (
        f"Given the user question and the "
        f"following list of product categories, identify the categories from "
        f"the list that most closely match the user's intended product. "
        f"If the user input is not close to any of the categories, "
        f"return 'NOT_AVAILABLE'. Provide only the full category name "
        f"from the list of the categories separated by '|'.\n\n"
        f"user input: '{question}'\n\n"
        f"Product Categories: {categories}"
    )

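    # Get the response and parse the '|'-separated category names
    response = llm_client.invoke(query)
    selected = [name.strip() for name in response.content.split("|")]
    print(f"Related Categories at Level {cat_no}: {selected}")

    # Filter the dataframe
    df_filtered = df_filtered[df_filtered[f"cat-{cat_no}"].isin(selected)]
    print(f"No. of records:  {df_filtered.shape[0]}")

    # Stop early if nothing matched (for example, the reply was 'NOT_AVAILABLE')
    if df_filtered.empty:
        break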

Related Categories at Level 1: ['Toys & Games', 'Baby Products']
No. of records:  6876
Related Categories at Level 2: ['Tricycles, Scooters & Wagons', 'Sports & Outdoor Play']
No. of records:  476
Related Categories at Level 3: ['Balance Bikes', 'Ride-On Toys & Accessories']
No. of records:  48
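The model's reply is free text, so it may contain extra spaces, quotes, or names that are not in the list. A minimal sketch of a stricter parser is shown below; `parse_categories` is a hypothetical helper, and the assumption that the model may wrap names in quotes is ours, not something observed in the output above.

```python
def parse_categories(raw: str, allowed: list[str], sentinel: str = "NOT_AVAILABLE") -> list[str]:
    """Split a '|'-separated reply and keep only names present in `allowed`."""
    if raw.strip() == sentinel:
        return []
    # Strip whitespace and stray quotes the model may add around each name.
    candidates = [part.strip().strip("'\"") for part in raw.split("|")]
    allowed_set = set(allowed)
    return [c for c in candidates if c in allowed_set]


allowed = ["Toys & Games", "Baby Products", "Home & Kitchen"]
print(parse_categories("Toys & Games | Baby Products", allowed))
# ['Toys & Games', 'Baby Products']
print(parse_categories("'Toys & Games'|Garden", allowed))
# ['Toys & Games']
print(parse_categories("NOT_AVAILABLE", allowed))
# []
```

An empty result can then be treated as "no matching category" before the dataframe is filtered.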


### Step 2: Make a Retriever with the Filtered Data

In [16]:
loader = DataFrameLoader(
    df_filtered, 
    page_content_column="page_content_column"
)
docs = loader.load()
print("No. of documents: ", len(docs))

No. of documents:  48


In [17]:
vectorstore = FAISS.from_documents(
    documents=docs,
    embedding=embd_client,
)
# Return the 10 most similar documents for each query
retriever = vectorstore.as_retriever(search_kwargs={"k": 10})

### Finding Relevant Docs (to be used as context)

In [18]:
relevant_docs = retriever.invoke(question)
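Conceptually, the retriever performs a nearest-neighbour search over embedding vectors. The toy sketch below illustrates that idea only: it uses hand-made 2-D vectors instead of real embeddings, assumes Euclidean distance as the similarity measure, and does not use FAISS; `top_k` is a hypothetical helper.

```python
import math


def top_k(query_vec, doc_vecs, k=2):
    """Return the indices of the k document vectors closest to the query."""
    distances = [(math.dist(query_vec, vec), idx) for idx, vec in enumerate(doc_vecs)]
    # Sort by distance (ties broken by index) and keep the k nearest.
    return [idx for _, idx in sorted(distances)[:k]]


# Made-up 2-D "embeddings" for four documents.
doc_vecs = [(0.0, 1.0), (0.9, 0.1), (1.0, 0.0), (0.5, 0.5)]
nearest = top_k((1.0, 0.0), doc_vecs, k=2)
print(nearest)  # [2, 1]
```

A vector store does the same ranking at scale, with an index structure that avoids comparing the query against every document.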

### Create the Document Chain

In [19]:
# Prompt for the document chain
prompt = ChatPromptTemplate.from_template(
    "You are **Amazon Assistant**, an adviser for Amazon products. \n\n"
    "Answer the following question based on the provided context. "
    "If the answer is not in the context, avoid being apologetic. "
    "Instead, ask the user for more specific information or "
    "suggest next steps they can take. "
    "Provide your response confidently and clearly.\n\n"
    "Always provide the full product name, product URL and price in your response.\n\n"
    "<context>\n{context}\n</context>\n\n"
    "Question: {input}"
)

document_chain = create_stuff_documents_chain(llm_client, prompt)
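A "stuff" chain places all retrieved documents into the prompt's `{context}` slot at once. The plain-Python sketch below approximates that behaviour for illustration; `stuff_prompt` is a hypothetical helper, the separator is an assumption, and the real chain works with `Document` objects and chat messages rather than raw strings.

```python
def stuff_prompt(template: str, docs: list[str], question: str, separator: str = "\n\n") -> str:
    """Join all document texts and substitute them into the prompt template."""
    context = separator.join(docs)
    return template.format(context=context, input=question)


# Shortened version of the prompt template used above.
template = "<context>\n{context}\n</context>\n\nQuestion: {input}"
prompt_text = stuff_prompt(
    template,
    ["Product Name: Bike A", "Product Name: Bike B"],  # made-up documents
    "Do you have a bike?",
)
print(prompt_text)
```

Because every document ends up in a single prompt, the number of retrieved documents directly affects the prompt length, which is one more reason to pre-filter the data first.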

### Answer the Question

In [20]:
answer = document_chain.invoke({"context": relevant_docs, "input": question})
display(Markdown(answer))

Yes, I recommend the **FirstBIKE Limited Bike with Brake, Yellow**. It's designed for children aged 24 months to 5 years, making it suitable for your 3-year-old son. This balance bike features a lightweight frame, a child-friendly rear drum brake, and a unique design that helps develop balance and motor skills.

- **Product Name:** FirstBIKE Limited Bike with Brake, Yellow
- **Product URL:** [FirstBIKE Limited Bike with Brake, Yellow](https://www.amazon.com/FirstBIKE-Limited-Bike-Brake-Yellow/dp/B0716JNHBX)
- **Price:** $130.03

If you need more options or specific features, let me know!