# 🔍 RAG-Based Chatbot using LangChain, FAISS, and Groq
This notebook builds a Retrieval-Augmented Generation (RAG) chatbot that answers questions about Amazon products by retrieving relevant customer reviews from a FAISS index and passing them to a Groq-hosted LLM through LangChain.

In [1]:
!pip install -q kaggle faiss-cpu sentence-transformers pandas langchain langchain-community langchain-groq


## 🔑 Upload Kaggle API Key

In [2]:
import os

In [3]:
from google.colab import files
import os

print("Please upload your kaggle.json file:")
uploaded = files.upload()

if len(uploaded) == 0:
    raise FileNotFoundError("No file uploaded")

filename = list(uploaded.keys())[0]

os.makedirs("/root/.kaggle", exist_ok=True)
with open("/root/.kaggle/kaggle.json", "wb") as f:
    f.write(uploaded[filename])
os.chmod("/root/.kaggle/kaggle.json", 0o600)

print("kaggle.json saved successfully!")


Please upload your kaggle.json file:


Saving kaggle.json to kaggle.json
kaggle.json saved successfully!
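
Before the uploaded file is written to disk, it can help to confirm that it really is a Kaggle token. The following is a minimal sketch, assuming the standard `kaggle.json` layout with `username` and `key` fields; `validate_kaggle_json` is a hypothetical helper, not part of the Kaggle CLI or Colab.

```python
import json


def validate_kaggle_json(raw: bytes) -> dict:
    """Parse uploaded bytes and check for the fields the Kaggle CLI reads.

    Hypothetical helper for illustration; raises ValueError on bad input.
    """
    try:
        creds = json.loads(raw.decode("utf-8"))
    except (UnicodeDecodeError, json.JSONDecodeError) as exc:
        raise ValueError("kaggle.json is not valid JSON") from exc

    if not isinstance(creds, dict):
        raise ValueError("kaggle.json must contain a JSON object")

    # Both fields must be present and non-empty.
    missing = [field for field in ("username", "key") if not creds.get(field)]
    if missing:
        raise ValueError(f"kaggle.json is missing: {', '.join(missing)}")
    return creds


# Example with placeholder values (not real credentials).
sample = b'{"username": "demo-user", "key": "0123456789abcdef"}'
print(validate_kaggle_json(sample)["username"])
```

In the upload cell above, this could be called as `validate_kaggle_json(uploaded[filename])` just before the file is saved.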


In [4]:
from getpass import getpass
import os

groq_key = getpass("Enter your GROQ_API_KEY (input will be hidden): ").strip()
if not groq_key:
    raise ValueError("You must enter a valid GROQ_API_KEY")

os.environ["GROQ_API_KEY"] = groq_key


Enter your GROQ_API_KEY (input will be hidden): ··········
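
When the notebook is re-run in the same session, the key is usually already in the environment, so the prompt can be skipped. The sketch below shows one way to do that; `resolve_api_key` and its `env` and `prompt` parameters are hypothetical names introduced here so the logic can be exercised without typing a real key.

```python
import os
from getpass import getpass
from typing import Callable, Mapping


def resolve_api_key(
    name: str,
    env: Mapping[str, str] = os.environ,
    prompt: Callable[[str], str] = getpass,
) -> str:
    """Return the key from the environment, prompting only when it is absent."""
    key = env.get(name, "").strip()
    if not key:
        # Fall back to a hidden prompt, as the cell above does.
        key = prompt(f"Enter your {name} (input will be hidden): ").strip()
    if not key:
        raise ValueError(f"You must enter a valid {name}")
    return key


# Possible use in the notebook (commented out so this block never blocks on input):
# os.environ["GROQ_API_KEY"] = resolve_api_key("GROQ_API_KEY")
```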


## 🔐 Set Up Kaggle Credentials

In [5]:
!mkdir -p ~/.kaggle
!cp kaggle.json ~/.kaggle/
!chmod 600 ~/.kaggle/kaggle.json

## 📥 Download and Extract Dataset

In [6]:
!kaggle datasets download -d datafiniti/consumer-reviews-of-amazon-products
!unzip -q consumer-reviews-of-amazon-products.zip

Dataset URL: https://www.kaggle.com/datasets/datafiniti/consumer-reviews-of-amazon-products
License(s): CC-BY-NC-SA-4.0
Downloading consumer-reviews-of-amazon-products.zip to /content
  0% 0.00/16.3M [00:00<?, ?B/s]
100% 16.3M/16.3M [00:00<00:00, 393MB/s]


## 📊 Load and Preprocess the Dataset

In [7]:
import pandas as pd

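df = pd.read_csv("Datafiniti_Amazon_Consumer_Reviews_of_Amazon_Products_May19.csv")
# Keep only the product name and review text; drop rows where either is missing.
df = df[["name", "reviews.text"]].dropna().reset_index(drop=True)
# Prefix each review with its product name so retrieval can match on the product.
df["content"] = df["name"] + ": " + df["reviews.text"]
documents = df["content"].tolist()
documents[:2]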

['AmazonBasics AAA Performance Alkaline Batteries (36 Count): I order 3 of them and one of the item is bad quality. Is missing backup spring so I have to put a pcs of aluminum to make the battery work.',
 'AmazonBasics AAA Performance Alkaline Batteries (36 Count): Bulk is always the less expensive way to go for products like these']

## 🤖 Load Sentence Transformer and Build FAISS Index

In [15]:
from sentence_transformers import SentenceTransformer
import faiss
import numpy as np

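# Embed every review with a small general-purpose sentence encoder.
model = SentenceTransformer("all-MiniLM-L6-v2")
embeddings = model.encode(documents, show_progress_bar=False)

# Exact (brute-force) L2 index; FAISS expects float32 vectors.
index = faiss.IndexFlatL2(embeddings.shape[1])
index.add(np.asarray(embeddings, dtype="float32"))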
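
`IndexFlatL2` performs exact search: it compares the query against every stored vector and returns the `k` smallest squared Euclidean distances. The pure-Python sketch below illustrates that idea on toy 2-D vectors. `flat_l2_search` is a hypothetical stand-in, not the FAISS API, and it ignores FAISS's batching and float32 storage.

```python
from typing import List, Sequence, Tuple


def flat_l2_search(
    vectors: Sequence[Sequence[float]],
    query: Sequence[float],
    k: int,
) -> List[Tuple[int, float]]:
    """Return (row index, squared L2 distance) for the k nearest vectors."""
    scored = []
    for row, vec in enumerate(vectors):
        # Squared Euclidean distance between the stored vector and the query.
        dist = sum((a - b) ** 2 for a, b in zip(vec, query))
        scored.append((row, dist))
    # Smallest distance first; the sort is stable, so ties keep row order.
    scored.sort(key=lambda pair: pair[1])
    return scored[:k]


# Toy data standing in for review embeddings.
toy_vectors = [[0.0, 0.0], [1.0, 0.0], [0.0, 2.0], [3.0, 3.0]]
print(flat_l2_search(toy_vectors, query=[0.9, 0.1], k=2))
```

With the real index, the corresponding call would be along the lines of `distances, ids = index.search(model.encode([question]), k)`, where `question` and `k` are placeholders for a query string and the number of neighbours.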

## 🔌 Set Up LangChain and Groq API

In [9]:
from langchain_groq import ChatGroq
from langchain.chains import RetrievalQA
# Integrations live in langchain-community; the old langchain.* paths are deprecated.
from langchain_community.vectorstores import FAISS
from langchain_community.embeddings import HuggingFaceEmbeddings
from langchain_core.documents import Document
import os



## 📚 Wrap Documents and Create Vector Store

In [10]:
wrapped_docs = [Document(page_content=d) for d in documents]

embedding_model = HuggingFaceEmbeddings(model_name="all-MiniLM-L6-v2")
vectorstore = FAISS.from_documents(wrapped_docs, embedding_model)



## 🧠 Load Groq LLM

In [11]:
llm = ChatGroq(
    groq_api_key=os.environ["GROQ_API_KEY"],
    model_name="llama3-70b-8192"
)

## 🧵 Create Retrieval-Based QA Chain

In [12]:
qa_chain = RetrievalQA.from_chain_type(
    llm=llm,
    retriever=vectorstore.as_retriever(search_kwargs={"k": 5}),
    return_source_documents=False
)
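
With the default `chain_type="stuff"`, `RetrievalQA` concatenates the retrieved documents into one context block and sends it to the LLM together with the question. The sketch below shows that assembly step in plain Python. The template wording, the sample question, and the `build_stuff_prompt` helper are assumptions for illustration, not LangChain's actual prompt; the two reviews are the dataset rows printed earlier.

```python
from typing import Sequence

# Illustrative template; LangChain ships its own wording.
TEMPLATE = (
    "Answer the question using only the context below. "
    "If the context is not enough, say you don't know.\n\n"
    "Context:\n{context}\n\n"
    "Question: {question}\n"
    "Answer:"
)


def build_stuff_prompt(question: str, docs: Sequence[str]) -> str:
    """Join the retrieved documents and place them in a single prompt."""
    context = "\n\n".join(doc.strip() for doc in docs)
    return TEMPLATE.format(context=context, question=question)


retrieved = [
    "AmazonBasics AAA Performance Alkaline Batteries (36 Count): I order 3 of "
    "them and one of the item is bad quality. Is missing backup spring so I "
    "have to put a pcs of aluminum to make the battery work.",
    "AmazonBasics AAA Performance Alkaline Batteries (36 Count): Bulk is "
    "always the less expensive way to go for products like these",
]
print(build_stuff_prompt("Are the AmazonBasics AAA batteries good value?", retrieved))
```

Because every retrieved review goes into the same prompt, the `k` passed to the retriever is limited in practice by the model's context window.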

## ❓ Ask Questions and Save Answers

In [13]:
questions = [
    "What do users think about Kindle?",
    "How is the battery life of Fire tablet?",
    "What are users saying about Echo Dot?"
]

with open("chatbot_responses.txt", "w") as f:
    for q in questions:
        answer = qa_chain.invoke({"query": q})["result"]
        print(f"Q: {q}\nA: {answer}\n")
        f.write(f"Q: {q}\nA: {answer}\n\n")

Q: What do users think about Kindle?
A: Based on the provided context, users have extremely positive opinions about Kindle e-readers. They praise the devices for being small and light, having excellent backlights, long battery life, and being completely user-friendly. They also appreciate the ability to modify font and size, connect to Goodreads, and sync purchases seamlessly. The Kindle Oasis, in particular, is evident in the repeated use of the phrase "Kindle Oasis is phenomenal..." with users loving its portable size, book-like screen, and long battery life. Overall, users seem to think that Kindle e-readers are the best on the market, with one user stating that it's the "best e-reader out there, and a worthy upgrade from a previous version".

Q: How is the battery life of Fire tablet?
A: Based on the provided reviews, the battery life of Fire tablets seems to be a mixed bag. Some reviewers reported poor battery life, with one tablet dying after an hour, another not lasting 4 hours,