# 🔍 RAG-Based Chatbot using LangChain, FAISS, and Groq
This notebook builds a Retrieval-Augmented Generation (RAG) chatbot that answers questions based on Amazon product reviews using LangChain and a Groq-hosted LLM.

In [1]:
!pip install -q kaggle faiss-cpu sentence-transformers pandas langchain langchain-community langchain-groq


## 🔑 Upload Kaggle API Key

In [2]:
from google.colab import files
files.upload()  # Upload kaggle.json

Saving kaggle.json to kaggle.json



## 🔐 Setup Kaggle Credentials

In [3]:
!mkdir -p ~/.kaggle
!cp kaggle.json ~/.kaggle/
!chmod 600 ~/.kaggle/kaggle.json
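
The three shell commands above can also be written in plain Python, which is handy outside Colab. This is a minimal sketch that assumes the username and key are passed in as arguments rather than uploaded as a file. `write_kaggle_credentials` is a hypothetical helper, not part of the Kaggle CLI. It creates `~/.kaggle`, writes `kaggle.json`, and restricts the file to owner read/write (mode 600), because the Kaggle client warns when the key is readable by other users.

```python
import json
import os
from pathlib import Path
from typing import Optional


def write_kaggle_credentials(username: str, key: str, home: Optional[Path] = None) -> Path:
    """Hypothetical helper mirroring `mkdir -p ~/.kaggle`, `cp` and `chmod 600`."""
    home = Path.home() if home is None else Path(home)
    kaggle_dir = home / ".kaggle"
    kaggle_dir.mkdir(parents=True, exist_ok=True)  # like `mkdir -p`

    cred_path = kaggle_dir / "kaggle.json"
    cred_path.write_text(json.dumps({"username": username, "key": key}))
    os.chmod(cred_path, 0o600)  # owner read/write only, like `chmod 600`
    return cred_path
```

For example, `write_kaggle_credentials(os.environ["KAGGLE_USERNAME"], os.environ["KAGGLE_KEY"])` avoids uploading the file. The Kaggle client can also read those two environment variables directly, in which case no `kaggle.json` is needed.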

## 📥 Download and Extract Dataset

In [4]:
!kaggle datasets download -d datafiniti/consumer-reviews-of-amazon-products
!unzip -q consumer-reviews-of-amazon-products.zip

Dataset URL: https://www.kaggle.com/datasets/datafiniti/consumer-reviews-of-amazon-products
License(s): CC-BY-NC-SA-4.0
Downloading consumer-reviews-of-amazon-products.zip to /content
100% 16.3M/16.3M [00:00<00:00, 1.04GB/s]
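
`unzip -q` depends on a shell tool. Python's standard-library `zipfile` module performs the same extraction portably. This is a minimal sketch, and `extract_archive` is a hypothetical helper name. The archive name in the usage comment is the one the Kaggle CLI reported above.

```python
import zipfile
from typing import List


def extract_archive(zip_path: str, dest: str = ".") -> List[str]:
    """Extract every member of `zip_path` into `dest` and return the member names."""
    with zipfile.ZipFile(zip_path) as zf:
        names = zf.namelist()
        zf.extractall(dest)  # same effect as `unzip -q <zip_path> -d <dest>`
    return names


# Usage in this notebook, after the Kaggle download:
# extract_archive("consumer-reviews-of-amazon-products.zip")
```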


## 📊 Load and Preprocess the Dataset

In [5]:
import pandas as pd

df = pd.read_csv("Datafiniti_Amazon_Consumer_Reviews_of_Amazon_Products_May19.csv")
df = df[["name", "reviews.text"]].dropna().reset_index(drop=True)
df["content"] = df["name"] + ": " + df["reviews.text"]
documents = df["content"].tolist()
documents[:2]

['AmazonBasics AAA Performance Alkaline Batteries (36 Count): I order 3 of them and one of the item is bad quality. Is missing backup spring so I have to put a pcs of aluminum to make the battery work.',
 'AmazonBasics AAA Performance Alkaline Batteries (36 Count): Bulk is always the less expensive way to go for products like these']
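
The preprocessing step keeps only rows where both the product name and the review text are present. It then prefixes each review with its product name, so every embedded passage carries its product context. The toy frame below uses made-up rows, not data from the dataset, to show how `dropna` over the two selected columns discards incomplete rows.

```python
import pandas as pd

# Hypothetical toy rows standing in for the Datafiniti CSV
toy = pd.DataFrame({
    "name": ["Echo Dot", None, "Kindle"],
    "reviews.text": ["Great speaker", "Orphan review", None],
})

# Same steps as the notebook cell: select, drop incomplete rows, concatenate
clean = toy[["name", "reviews.text"]].dropna().reset_index(drop=True)
clean["content"] = clean["name"] + ": " + clean["reviews.text"]
docs = clean["content"].tolist()
print(docs)  # ['Echo Dot: Great speaker']
```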

## 🤖 Load Sentence Transformer and Build FAISS Index

In [12]:
from sentence_transformers import SentenceTransformer
import faiss
import numpy as np

model = SentenceTransformer("all-MiniLM-L6-v2")
embeddings = model.encode(documents, show_progress_bar=False)

# Exact (brute-force) L2 index; FAISS expects float32 vectors
index = faiss.IndexFlatL2(embeddings.shape[1])
index.add(np.asarray(embeddings, dtype="float32"))
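
`IndexFlatL2` is an exact index. Each query is compared against every stored vector, and the k smallest squared Euclidean distances are returned (FAISS reports squared L2 distances for this index). The NumPy sketch below reproduces that brute-force search on small arrays to make the semantics concrete. `flat_l2_search` is a hypothetical name, not a FAISS function. The raw `index` built in this cell is not reused by the LangChain cells that follow, which build their own store with `FAISS.from_documents`.

```python
import numpy as np


def flat_l2_search(xb: np.ndarray, xq: np.ndarray, k: int):
    """Brute-force k-NN over squared L2 distance, mirroring IndexFlatL2.search.

    xb: (n, d) database vectors; xq: (m, d) query vectors.
    Returns (D, I): (m, k) squared distances and (m, k) row indices into xb.
    """
    # ||q - x||^2 for every query/database pair, shape (m, n)
    dists = ((xq[:, None, :] - xb[None, :, :]) ** 2).sum(axis=-1)
    # Indices of the k nearest vectors, ordered by increasing distance
    idx = np.argsort(dists, axis=1, kind="stable")[:, :k]
    return np.take_along_axis(dists, idx, axis=1), idx
```

With the real index, the equivalent query is `D, I = index.search(model.encode(["battery life of Fire tablet"]), 5)`, where `I[0]` holds row positions into `documents`.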

## 🔌 Setup LangChain and Groq API

In [13]:
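from langchain_groq import ChatGroq
from langchain.chains import RetrievalQA
from langchain_community.vectorstores import FAISS
from langchain_community.embeddings import HuggingFaceEmbeddings
from langchain_core.documents import Document
import getpass
import os

# Read the Groq key from the environment instead of hard-coding it in the notebook;
# prompt for it interactively if it is not already set.
if "GROQ_API_KEY" not in os.environ:
    os.environ["GROQ_API_KEY"] = getpass.getpass("Groq API key: ")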

## 📚 Wrap Documents and Create Vector Store

In [14]:
wrapped_docs = [Document(page_content=d) for d in documents]

embedding_model = HuggingFaceEmbeddings(model_name="all-MiniLM-L6-v2")
vectorstore = FAISS.from_documents(wrapped_docs, embedding_model)

## 🧠 Load Groq LLM

In [15]:
llm = ChatGroq(
    groq_api_key=os.environ["GROQ_API_KEY"],
    model_name="llama3-70b-8192"
)

## 🧵 Create Retrieval-Based QA Chain

In [16]:
qa_chain = RetrievalQA.from_chain_type(
    llm=llm,
    retriever=vectorstore.as_retriever(search_kwargs={"k": 5}),
    return_source_documents=False
)
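
`RetrievalQA.from_chain_type` uses the "stuff" chain type by default. The `k=5` retrieved reviews are concatenated into one context block and inserted into a single prompt alongside the question. The sketch below illustrates that assembly step in plain Python. `build_stuff_prompt` and its template wording are illustrative assumptions, not LangChain's exact internal prompt.

```python
from typing import List

# Illustrative template; LangChain's built-in prompt wording differs.
TEMPLATE = (
    "Use the following pieces of context to answer the question.\n"
    "If the context does not contain the answer, say you don't know.\n\n"
    "{context}\n\n"
    "Question: {question}\n"
    "Answer:"
)


def build_stuff_prompt(question: str, retrieved: List[str], sep: str = "\n\n") -> str:
    """'Stuff' every retrieved passage into a single prompt string."""
    context = sep.join(retrieved)
    return TEMPLATE.format(context=context, question=question)
```

Because every retrieved passage goes into one prompt, `k` trades recall against prompt length. The 8192 in `llama3-70b-8192` refers to the model's context window, and five short reviews typically fit comfortably within it.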

## ❓ Ask Questions and Save Answers

In [17]:
questions = [
    "What do users think about Kindle?",
    "How is the battery life of Fire tablet?",
    "What are users saying about Echo Dot?"
]

with open("chatbot_responses.txt", "w", encoding="utf-8") as f:
    for q in questions:
        answer = qa_chain.invoke({"query": q})["result"]
        print(f"Q: {q}\nA: {answer}\n")
        f.write(f"Q: {q}\nA: {answer}\n\n")

Q: What do users think about Kindle?
A: Based on the provided context, users think very highly of the Kindle Voyage and Kindle Oasis e-readers. They praise the devices for being portable, having excellent displays that don't bother their eyes, allowing for font and size modifications, and having long battery life. They also appreciate the convenience of connecting to their Goodreads account and the seamless syncing of their e-books. Additionally, they find the devices to be completely user-friendly. Overall, users consider the Kindle e-readers to be the best on the market, with one user stating that the Kindle Voyage is the "best e-reader out there" and another stating that it is "phenomenal".

Q: How is the battery life of Fire tablet?
A: Based on the provided reviews, the battery life of Fire tablets seems to be a concern for many users. Some reviewers mentioned that the battery life does not live up to the estimated 7 hours, with some instances of the battery lasting around 1-4 hour