# Project ShopTalk RAG Implementation

For this example, we follow an approach very similar to the RAG implementation from the Movie Recommendations project. RAG can be somewhat harder to implement here because the incoming queries vary widely. We show that RAG still works with the CSV data via LangChain, although a more agentic approach may be a better fit when the application has to handle different types of queries or actions. Our stack remains very similar to the original project:

- <b>Vector Store</b>: ChromaDB
- <b>LLM</b>: OpenAI API
- <b>Embeddings</b>: HuggingFace Sentence Embeddings
- <b>Orchestration</b>: LangChain

## Setup

In [2]:
!pip install chromadb==0.5.3
!pip install langchain-openai
!pip install langchain_community
!pip install sentence-transformers



In [4]:
!pip install pandas

Collecting pandas
  Downloading pandas-2.2.3-cp313-cp313-win_amd64.whl.metadata (19 kB)
Collecting pytz>=2020.1 (from pandas)
  Downloading pytz-2025.2-py2.py3-none-any.whl.metadata (22 kB)
Collecting tzdata>=2022.7 (from pandas)
  Downloading tzdata-2025.2-py2.py3-none-any.whl.metadata (1.4 kB)
Downloading pandas-2.2.3-cp313-cp313-win_amd64.whl (11.5 MB)
Downloading pytz-2025.2-py2.py3-none-any.whl (509 kB)
Downloading tzdata-2025.2-py2.py3-none-any.whl (347 kB)
Installing collected packages: pytz, tzdata, pandas
Successfully install

In [1]:
import chromadb
import json
import os
import pandas as pd
import chromadb.utils.embedding_functions as embedding_functions

In [None]:
# Globals for OpenAI models. Replace the placeholder below with your own OpenAI API key.
OPENAI_API_KEY = "sk-assjdlajds"
LLM_MODEL_NAME = "gpt-4o-mini-2024-07-18"
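
Hard-coding a key in a notebook makes it easy to leak when the notebook is shared. A minimal sketch of an alternative follows, assuming the key has been exported as the `OPENAI_API_KEY` environment variable before the notebook starts. `load_api_key` is a hypothetical helper, not part of any library.

```python
import os

def load_api_key(env=os.environ, name="OPENAI_API_KEY"):
    """Return the API key stored under `name`, or raise a clear error if it is missing."""
    key = env.get(name, "").strip()
    if not key:
        raise RuntimeError(f"Set the {name} environment variable before running this notebook.")
    return key

# Usage in the notebook (assumes the variable is set in your shell):
# OPENAI_API_KEY = load_api_key()
```

Passing the environment as a parameter keeps the helper easy to test with a plain dictionary.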

In [None]:
# from google.colab import drive
# drive.mount('/content/drive')

In [3]:
import pandas as pd
df = pd.read_csv('data/products_data.csv')
df.head()

| Unnamed: 0.1 | Unnamed: 0 | Item ID | Item Name - Language Tag | Item Name - Value | Main Image ID | Other Image IDs | Item Keywords |
|---|---|---|---|---|---|---|---|
| 0 | 0 | B06X9STHNG | nl_NL | Amazon-merk - vinden. Dames Leder Gesloten Tee... | 81iZlv3bjpL | ['91mIRxgziUL', '91eqBkW06wL', 'A1BHZSKNbkL'] | ['loafer shoes', 'womens shoes', 'womens fashi... |
| 1 | 1 | B07P8ML82R | es_MX | 22" Bottom Mount Drawer Slides, White Powder C... | 619y9YG9cnL | ['51Fqps5k9YL', '51lCKFuYuWL'] | ['café claro', 'pistas', 'con', 'cierre', 'Hog... |
| 2 | 2 | B07H9GMYXS | en_AE | AmazonBasics PETG 3D Printer Filament, 1.75mm,... | 81NP7qh2L6L | ['81A0u5L4VAL', '61xhS6iLrZL'] | ['petg filament', 'petg printer filament', 'tr... |
| 3 | 3 | B07CTPR73M | en_GB | Stone & Beam Stone Brown Swatch, 25020039-01 | 61Rp4qOih9L | [] | ['couch', 'button', 'reclining', 'sets', 'love... |
| 4 | 4 | B01MTEI8M6 | en_AU | The Fix Amazon Brand Women's French Floral Emb... | 714CmIfKIYL | ['71C4hQAAs2L', '718uEco1DAL', '71BMHcaG5GL', ... | ['zapatos shoe para de ladies mujer womans moc... |


## Embeddings Model & Retriever Setup
For this sample we use a Hugging Face (HF) embeddings model for demo purposes. For open-source models, the HF Hub is a convenient source: it hosts many popular models with their weights publicly available.
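
To make concrete what the embedding model contributes: it maps each text to a fixed-length vector (384 dimensions for `all-MiniLM-L6-v2`), and retrieval ranks documents by how close their vectors are to the query vector. The sketch below illustrates that ranking with cosine similarity, one common closeness measure; the metric the vector store actually uses depends on its configuration. The 3-dimensional vectors and product labels are made up for illustration and are not real model output.

```python
import math

def cosine_similarity(a, b):
    """Cosine of the angle between two equal-length vectors."""
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(y * y for y in b))
    return dot / (norm_a * norm_b)

# Hand-made stand-ins for embeddings (hypothetical values).
toy_vectors = {
    "women's loafers": [0.9, 0.1, 0.0],
    "men's sneakers": [0.7, 0.6, 0.1],
    "printer filament": [0.0, 0.1, 0.9],
}
query_vector = [1.0, 0.2, 0.0]  # pretend embedding of "loafer shoes"

# Rank the documents from most to least similar to the query.
ranked = sorted(
    toy_vectors,
    key=lambda name: cosine_similarity(query_vector, toy_vectors[name]),
    reverse=True,
)
print(ranked)
```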

In [4]:
from langchain_community.embeddings import HuggingFaceEmbeddings
persist_directory = 'dbembed/'
embeddings = HuggingFaceEmbeddings(model_name="all-MiniLM-L6-v2")



In [5]:
from langchain_community.document_loaders.csv_loader import CSVLoader
loader = CSVLoader(file_path='data/products_data.csv', source_column='Item ID', encoding='utf-8')
data = loader.load()

In [6]:
data[0]

Document(metadata={'source': 'B06X9STHNG', 'row': 0}, page_content=": 0\nItem ID: B06X9STHNG\nItem Name - Language Tag: nl_NL\nItem Name - Value: Amazon-merk - vinden. Dames Leder Gesloten Teen Hakken,Veelkleurig Vrouw Blauw,5 UK\nMain Image ID: 81iZlv3bjpL\nOther Image IDs: ['91mIRxgziUL', '91eqBkW06wL', 'A1BHZSKNbkL']\nItem Keywords: ['loafer shoes', 'womens shoes', 'womens fashion', 'womens loafers', 'womenswear', 'block heel shoes', 'loafers', 'womens block heel shoes', 'metallic shoes', 'womens loafer shoes']")
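
The `page_content` above shows how each CSV row is flattened: one `column: value` pair per line, with the `source_column` value copied into the metadata. The following is a rough stdlib-only approximation of that behaviour, written to illustrate the output shape rather than to reproduce `CSVLoader` itself. `row_to_document` and the two-column sample are hypothetical.

```python
import csv
import io

def row_to_document(row, row_index, source_column):
    """Flatten one CSV row (a dict) into text plus metadata, mirroring the output shown above."""
    page_content = "\n".join(f"{column}: {value}" for column, value in row.items())
    metadata = {"source": row[source_column], "row": row_index}
    return {"page_content": page_content, "metadata": metadata}

# Tiny in-memory CSV with two of the real columns.
sample_csv = "Item ID,Item Name - Language Tag\nB06X9STHNG,nl_NL\nB07P8ML82R,es_MX\n"
reader = csv.DictReader(io.StringIO(sample_csv))
documents = [row_to_document(row, i, "Item ID") for i, row in enumerate(reader)]
print(documents[0]["page_content"])
```

Note that in the real output the unnamed index column becomes a bare `: 0` line, and the image ID columns are embedded as text too, so everything in the row influences the embedding whether or not it is useful for search.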

In [9]:
from langchain_community.vectorstores import Chroma

# Reuses `embeddings` and `persist_directory` from the embeddings cell above.
# Build the vector store from the loaded CSV rows. Run this only once per
# persist_directory, otherwise every row is added again as a duplicate.
vectordb = Chroma.from_documents(documents=data, embedding=embeddings,
                                 persist_directory=persist_directory)
# Chroma 0.4+ persists automatically, so this call only matters on older versions.
vectordb.persist()

# To reopen an existing store instead of rebuilding it:
# vectordb = Chroma(persist_directory=persist_directory, embedding_function=embeddings)




In [None]:
# Store the data in the vector DB. Run this only once; otherwise duplicate records will be added, which is why it is commented out.
# vectordb.add_documents(documents=data)

['a3425de0-48f5-45ed-a6a8-6cbed4a879e0',
 '9cbfae4c-7c99-4ac4-b4bb-58dc2cf2ae4c',
 '293256a5-f4ef-40f8-9f5a-6f265cb3e7b0',
 'a53bbf28-9d92-4e8f-a2b9-cbb3f310b386',
 '59df4393-9061-4a0b-985c-b943b24c24be',
 '3d6297ac-616d-4561-881b-80f148d99b6d',
 'd51bffe2-d31f-46f2-87d2-06d3b67b2f30',
 '6e6d3d4b-44bc-4997-a39d-0a5373c644e6',
 'b356740b-607a-4274-b932-7b9c91db5875',
 'a7ade38d-2406-49e6-a3be-0faa2d20a352',
 '7b158edc-cea3-4f54-9743-dd89f048593c',
 'c9bc3df0-86d9-499a-a8fb-8daffebe64d0',
 'd89d22ae-7889-4b2a-b3fd-022fec2283c1',
 'd44218a6-241c-41b7-868b-2d51ec2111f7',
 '5e1278ac-8896-444f-bf18-9ed75e849e47',
 '69e73d74-1068-4df3-a421-ddcd53609925',
 '9d17039b-1a45-4efa-b25a-8740191653c3',
 '46f79826-8a88-441d-ae43-0aeb9f2477e5',
 'd76b6aaf-bf13-4a43-bce2-633b364cf996',
 'fd2a2768-bb7f-4c53-a7c2-03131e30ab28',
 '11d68253-369b-49f5-98b3-1411dab505b1',
 'ed422831-aeb8-4ff2-bfa2-be91ac34f345',
 '42d05cd1-3a89-4f10-baea-4268dea82432',
 'ff7d2c57-b1f1-40fa-8c3b-cf633b0bdc0d',
 '6174da1e-8730-
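
The IDs above are random UUIDs, which is why re-running the insert produces duplicates: the store cannot tell that a row was already added. One possible way around this, sketched below, is to derive a deterministic ID from each document's `source` metadata (the Item ID) so the same row always maps to the same ID. `stable_id`, the namespace, and the `shoptalk/product/` prefix are assumptions for illustration, and the sketch assumes Item IDs are unique per row.

```python
import uuid

def stable_id(item_id, namespace=uuid.NAMESPACE_URL):
    """Derive a repeatable UUID from an Item ID (uuid5 is deterministic for the same inputs)."""
    return str(uuid.uuid5(namespace, f"shoptalk/product/{item_id}"))

item_ids = ["B06X9STHNG", "B07P8ML82R", "B06X9STHNG"]  # the last entry repeats the first
ids = [stable_id(item_id) for item_id in item_ids]

# The repeated Item ID yields the same ID, so duplicates can be detected before inserting.
unique_ids = list(dict.fromkeys(ids))
print(len(ids), len(unique_ids))
```

In the notebook, such a list could be built with `[stable_id(doc.metadata['source']) for doc in data]` and supplied through the `ids` argument of the vector store, assuming your installed LangChain version accepts it. Whether a repeated insert then overwrites instead of duplicating depends on how the store handles existing IDs, so verify that before relying on it.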

In [23]:
# MMR retrieval: fetch the 20 most similar documents, then return 5 of them chosen for relevance and diversity.
retriever = vectordb.as_retriever(search_type="mmr", search_kwargs={"fetch_k": 20, "k": 5})
retriever.invoke("loafer shoes")

[Document(metadata={'row': 0, 'source': 'B06X9STHNG'}, page_content=": 0\nItem ID: B06X9STHNG\nItem Name - Language Tag: nl_NL\nItem Name - Value: Amazon-merk - vinden. Dames Leder Gesloten Teen Hakken,Veelkleurig Vrouw Blauw,5 UK\nMain Image ID: 81iZlv3bjpL\nOther Image IDs: ['91mIRxgziUL', '91eqBkW06wL', 'A1BHZSKNbkL']\nItem Keywords: ['loafer shoes', 'womens shoes', 'womens fashion', 'womens loafers', 'womenswear', 'block heel shoes', 'loafers', 'womens block heel shoes', 'metallic shoes', 'womens loafer shoes']"),
 Document(metadata={'row': 457, 'source': 'B079145LXJ'}, page_content=": 457\nItem ID: B079145LXJ\nItem Name - Language Tag: en_IN\nItem Name - Value: Amazon Brand - Symbol Men's Navy Suede Sneakers-8 UK/India (42 EU) (AZ-YS-196 A)\nMain Image ID: 81iQbWaCmXL\nOther Image IDs: ['91HRt1g86hL', '91okhWAqZpL', '91wapAXEbML']\nItem Keywords: ['mens shoes', 'men shoes casual shoes', 'casual shoes for mens', 'lofar shoes for mens', 'sneakers for mens', 'sneakers', 'shoes for me
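
The retriever above uses maximal marginal relevance (MMR): it first fetches `fetch_k` candidates by similarity, then greedily picks `k` of them, each time trading relevance to the query against similarity to what has already been picked. The following sketch shows that selection loop on toy 2-dimensional vectors. It illustrates the general technique and is not LangChain's implementation; `mmr_select`, the `lambda_mult` default of 0.5, and the vectors are assumptions.

```python
import math

def cosine(a, b):
    """Cosine similarity between two equal-length vectors."""
    dot = sum(x * y for x, y in zip(a, b))
    return dot / (math.sqrt(sum(x * x for x in a)) * math.sqrt(sum(y * y for y in b)))

def mmr_select(query, candidates, k, lambda_mult=0.5):
    """Greedily pick up to k candidate indices, balancing query relevance against redundancy."""
    selected = []
    remaining = list(range(len(candidates)))
    while remaining and len(selected) < k:
        def score(i):
            relevance = cosine(query, candidates[i])
            # Redundancy is the highest similarity to anything already selected.
            redundancy = max((cosine(candidates[i], candidates[j]) for j in selected), default=0.0)
            return lambda_mult * relevance - (1 - lambda_mult) * redundancy
        best = max(remaining, key=score)
        selected.append(best)
        remaining.remove(best)
    return selected

query = [1.0, 1.0]
candidates = [
    [1.0, 0.25],  # 0: most relevant
    [1.0, 0.20],  # 1: near-duplicate of 0
    [0.15, 1.0],  # 2: slightly less relevant, but different from 0
]
print(mmr_select(query, candidates, k=2))
```

With these vectors, a plain similarity ranking would return candidates 0 and 1, which are nearly the same item. The MMR loop picks 0 and then 2, because candidate 1 is penalized for being almost identical to the one already chosen.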

## LLM & LangChain Setup

In [12]:
from langchain_openai import ChatOpenAI
os.environ["OPENAI_API_KEY"] = OPENAI_API_KEY
llm = ChatOpenAI(model=LLM_MODEL_NAME)

In [None]:
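# The code below is commented out because it keeps answering "I don't know". A likely cause is that its
# CONDENSE_QUESTION_PROMPT asks the model to answer from context instead of rephrasing the question,
# and no context is available at that step.
# It also differs slightly from the version taught by the instructor in class.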

# from langchain.chains import ConversationalRetrievalChain
# from langchain.prompts import PromptTemplate, SystemMessagePromptTemplate, HumanMessagePromptTemplate, ChatPromptTemplate

# def response_generator(query, chat_history, db, llm):
#     CONDENSE_QUESTION_PROMPT = PromptTemplate.from_template("""Given the following conversation and a follow up question, Use the following pieces of context to answer the question at the end.
#   If you don’t know the answer, just output that ‘I don’t know’, don’t try to make up an answer.
#   Chat History:
#   {chat_history}
#   Follow Up Input: {question}
#   Standalone question:""")
#     general_system_template = r""" You are a ecommerce assistant. Given the following conversation and a follow up question, Use the following pieces of context to answer the question at the end.
#   If you don’t know the answer, just output that ‘I don’t know’, don’t try to make up an answer.
#    ----
#   {context}
#   ----
#   """
#     general_user_template = r"""Chat History: {chat_history} Follow Up Input:{question} """
#     messages = [
#         SystemMessagePromptTemplate.from_template(general_system_template),
#         HumanMessagePromptTemplate.from_template(general_user_template)
#     ]
#     qa_prompt = ChatPromptTemplate.from_messages( messages )

#     qa = ConversationalRetrievalChain.from_llm(llm=llm,
#                     retriever=db.as_retriever(),
#                     condense_question_prompt=CONDENSE_QUESTION_PROMPT,
#                     combine_docs_chain_kwargs={"prompt": qa_prompt},
#                     return_source_documents=True,
#                     max_tokens_limit=4000,
#                     verbose=True)
#     #chat_history = []
#     query = query
#     result = qa({"question": query, "chat_history": chat_history})


#     print("Question:", query)
#     print("Answer:", result["answer"])
#     return result["answer"]

In [24]:
from langchain.chains import ConversationalRetrievalChain
from langchain.prompts import PromptTemplate

CONDENSE_QUESTION_PROMPT = PromptTemplate.from_template(
    """Given the following conversation and a follow up question, rephrase the follow up question to be a standalone question.
  Chat History:
  {chat_history}
  Follow Up Input: {question}
  Standalone question:""")

QA_PROMPT = PromptTemplate.from_template(""" You are an AI based ecommerce assistant for answering questions about products. Use the following pieces of context to answer the question at the end.
  If you don’t know the answer, just output that ‘I don’t know’, don’t try to make up an answer.
  Context:
  {context}
  
  Chat History: 
  {chat_history}
  
  Question: {question}
  Answer:""")
                                         
chain = ConversationalRetrievalChain.from_llm(llm=llm,
                    retriever=retriever,
                    condense_question_prompt=CONDENSE_QUESTION_PROMPT,
                    combine_docs_chain_kwargs={"prompt": QA_PROMPT},
                    return_source_documents=True,
                    max_tokens_limit=4000,
                    verbose=True)
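
With `verbose=True`, the chain's trace shows two LLM calls per turn: one that condenses the follow-up into a standalone question, and one that answers it from the retrieved context. The sketch below mimics only the prompt assembly for those two steps with plain string formatting, so the data flow is visible without calling a model. `format_history` and the shortened templates are hypothetical stand-ins, and the `Human:`/`Assistant:` layout is modeled on the trace printed later in this notebook.

```python
def format_history(chat_history):
    """Render (question, answer) tuples as alternating Human/Assistant lines."""
    lines = []
    for question, answer in chat_history:
        lines.append(f"Human: {question}")
        lines.append(f"Assistant: {answer}")
    return "\n".join(lines)

# Shortened stand-ins for the two prompts used by the chain.
CONDENSE_TEMPLATE = (
    "Rephrase the follow up question to be a standalone question.\n"
    "Chat History:\n{chat_history}\n"
    "Follow Up Input: {question}\n"
    "Standalone question:"
)
QA_TEMPLATE = "Context:\n{context}\n\nQuestion: {question}\nAnswer:"

chat_history = [("loafer shoes", "Here is a women's loafer (Item ID: B06X9STHNG).")]

# Step 1: this prompt goes to the LLM, which replies with a standalone question.
condense_prompt = CONDENSE_TEMPLATE.format(
    chat_history=format_history(chat_history), question="What about men?"
)

# Step 2: the standalone question drives retrieval, and the retrieved text fills {context}.
standalone_question = "What loafer shoes are available for men?"  # placeholder for the LLM's reply
retrieved_text = "Item ID: B01N5XROXR"  # placeholder for the retrieved documents
qa_prompt = QA_TEMPLATE.format(context=retrieved_text, question=standalone_question)
print(condense_prompt)
```

Seeing the two prompts side by side makes it clearer why the condense prompt must only rephrase: it never receives any retrieved context, so it has nothing to answer from.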

In [25]:
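chat_history = []  # (question, answer) tuples shared across turns
def show_result(query):
    # Run one conversational turn, print the answer, and record it in the history.
    result = chain.invoke({"question": query, "chat_history": chat_history})
    print(result["answer"])
    chat_history.append((query, result["answer"]))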

In [27]:
# query = 'loafer shoes'
query = 'What about men?'
# show_result prints the answer itself and returns None, so it is not wrapped in print().
show_result(query)



> Entering new LLMChain chain...
Prompt after formatting:
Given the following conversation and a follow up question, rephrase the follow up question to be a standalone question.
  Chat History:
  
Human: loafer shoes
Assistant: The item related to "loafer shoes" is the Amazon-merk - vinden. Dames Leder Gesloten Teen Hakken, Veelkleurig Vrouw Blauw (Item ID: B06X9STHNG), which is a type of women's loafer shoe. Additionally, there are other related items that include loafers for men, such as the Amazon Brand - 206 Collective Men's Pike Driving Slip-on Loafer (Item ID: B01N5XROXR). If you need more specifics or details, feel free to ask!
  Follow Up Input: What about men?
  Standalone question:

> Finished chain.


> Entering new StuffDocumentsChain chain...


> Entering new LLMChain chain...
Prompt after formatting:
 You are an AI based ecommerce assistant for answering questions about products. Use the following pieces of c

In [None]:
# Dataset preparation/engineering is still really important: RAG often cannot make sense of a raw CSV file and derive recommendations from it on its own.
# Sometimes we need to explain the data to the LLM, for example by parsing and preparing it first and telling the LLM what to do with it, or by taking a different approach altogether.
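
One way to act on this note is to turn each raw row into a short description before embedding it, instead of embedding the CSV columns verbatim. The sketch below is a stdlib-only illustration under assumptions: `describe_product` is a hypothetical helper, and dropping the image ID columns, capping the keywords, and the sentence wording are illustrative choices rather than a recommended recipe. The keyword column holds a stringified Python list, so it is parsed with `ast.literal_eval`.

```python
import ast

def describe_product(row, max_keywords=5):
    """Build a compact natural-language description from one product row (a dict)."""
    try:
        keywords = ast.literal_eval(row.get("Item Keywords", "[]"))
    except (ValueError, SyntaxError):
        keywords = []  # leave keywords out if the cell is not a valid list literal
    if not isinstance(keywords, list):
        keywords = []
    # Remove blanks and duplicates while keeping the original order.
    keywords = list(dict.fromkeys(k.strip() for k in keywords if k.strip()))[:max_keywords]
    parts = [
        f"Product {row['Item ID']}: {row['Item Name - Value']}",
        f"Listing language: {row['Item Name - Language Tag']}",
    ]
    if keywords:
        parts.append("Keywords: " + ", ".join(keywords))
    return ". ".join(parts) + "."

row = {
    "Item ID": "B07CTPR73M",
    "Item Name - Language Tag": "en_GB",
    "Item Name - Value": "Stone & Beam Stone Brown Swatch, 25020039-01",
    # Shortened keyword list with a repeat added, for illustration only.
    "Item Keywords": "['couch', 'button', 'reclining', 'couch']",
}
print(describe_product(row))
```

The resulting strings could replace the raw `page_content` before the documents are embedded, which keeps image IDs and the stray index column out of the vectors.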