# 🔍 RAG-Based Chatbot using LangChain, FAISS, and Groq
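This notebook builds a Retrieval-Augmented Generation (RAG) chatbot that answers questions about Amazon products from their consumer reviews. Review texts are embedded with a sentence-transformer model and indexed in FAISS. For each question, the most relevant reviews are passed to a Groq-hosted LLM through LangChain.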

In [None]:
!pip install -q kaggle faiss-cpu sentence-transformers pandas langchain langchain-community langchain-groq

[2K   [90m━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━[0m [32m31.3/31.3 MB[0m [31m22.8 MB/s[0m eta [36m0:00:00[0m
[2K   [90m━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━[0m [32m2.5/2.5 MB[0m [31m24.9 MB/s[0m eta [36m0:00:00[0m
[2K   [90m━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━[0m [32m129.4/129.4 kB[0m [31m6.6 MB/s[0m eta [36m0:00:00[0m
[2K   [90m━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━[0m [32m44.4/44.4 kB[0m [31m2.7 MB/s[0m eta [36m0:00:00[0m
[2K   [90m━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━[0m [32m363.4/363.4 MB[0m [31m3.2 MB/s[0m eta [36m0:00:00[0m
[2K   [90m━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━[0m [32m13.8/13.8 MB[0m [31m19.9 MB/s[0m eta [36m0:00:00[0m
[2K   [90m━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━[0m [32m24.6/24.6 MB[0m [31m32.6 MB/s[0m eta [36m0:00:00[0m
[2K   [90m━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━[0m [32m883.7/883.7 kB[0m [31m16.9 MB/s[0m eta [36m0:00:00[0m
[2K   [90m━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━

## 🔑 Upload Kaggle API Key

In [None]:
from google.colab import files
files.upload()  # Upload kaggle.json

Saving kaggle.json to kaggle.json


{'kaggle.json': b'{"username":"shrinivasmore","key":"<redacted>"}'}

## 🔐 Setup Kaggle Credentials

In [None]:
!mkdir -p ~/.kaggle
!cp kaggle.json ~/.kaggle/
!chmod 600 ~/.kaggle/kaggle.json
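The three shell commands above create `~/.kaggle`, copy the key file into it, and restrict it to owner read/write. The Kaggle CLI warns when the key file is readable by other users. The sketch below is a minimal pure-Python equivalent, useful outside Colab where `!` shell escapes are unavailable. `install_kaggle_credentials` is a hypothetical helper name and is not part of the Kaggle package.

```python
import os
import shutil
import stat
from pathlib import Path


def install_kaggle_credentials(src: str, kaggle_dir: str) -> Path:
    """Copy kaggle.json into kaggle_dir and make it readable/writable by the owner only."""
    target_dir = Path(kaggle_dir)
    target_dir.mkdir(parents=True, exist_ok=True)   # same as `mkdir -p`
    target = target_dir / "kaggle.json"
    shutil.copyfile(src, target)                     # same as `cp`
    os.chmod(target, stat.S_IRUSR | stat.S_IWUSR)    # same as `chmod 600`
    return target


# Hypothetical usage in Colab:
# install_kaggle_credentials("kaggle.json", os.path.expanduser("~/.kaggle"))
```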

## 📥 Download and Extract Dataset

In [None]:
!kaggle datasets download -d datafiniti/consumer-reviews-of-amazon-products
!unzip -q consumer-reviews-of-amazon-products.zip

Dataset URL: https://www.kaggle.com/datasets/datafiniti/consumer-reviews-of-amazon-products
License(s): CC-BY-NC-SA-4.0
Downloading consumer-reviews-of-amazon-products.zip to /content
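`unzip -q` quietly extracts the archive into the working directory. Python's standard `zipfile` module can do the same step and also return the member names, which is a quick way to confirm the CSV filename used in the next cell. `extract_archive` is a hypothetical helper used only for illustration.

```python
import zipfile


def extract_archive(zip_path: str, dest_dir: str = ".") -> list[str]:
    """Extract every member of zip_path into dest_dir (like `unzip -q`) and return the member names."""
    with zipfile.ZipFile(zip_path) as zf:
        names = zf.namelist()
        zf.extractall(dest_dir)  # creates dest_dir and any sub-directories as needed
    return names


# Hypothetical usage in this notebook:
# print(extract_archive("consumer-reviews-of-amazon-products.zip"))
```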


## 📊 Load and Preprocess the Dataset

In [None]:
import pandas as pd

df = pd.read_csv("Datafiniti_Amazon_Consumer_Reviews_of_Amazon_Products_May19.csv")
df = df[["name", "reviews.text"]].dropna().reset_index(drop=True)
df["content"] = df["name"] + ": " + df["reviews.text"]
documents = df["content"].tolist()
documents[:2]

['AmazonBasics AAA Performance Alkaline Batteries (36 Count): I order 3 of them and one of the item is bad quality. Is missing backup spring so I have to put a pcs of aluminum to make the battery work.',
 'AmazonBasics AAA Performance Alkaline Batteries (36 Count): Bulk is always the less expensive way to go for products like these']
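The pandas cell keeps only rows where both `name` and `reviews.text` are present, then turns each row into a single `"name: review"` string. Prefixing the product name probably helps retrieval, because a question about "Echo Dot" can then match a review whose text never names the product. The sketch below performs the same transformation with only the standard library. The `build_documents` helper is hypothetical, and it treats empty cells as missing, which matches how pandas reads empty CSV cells as NaN.

```python
import csv
import io


def build_documents(csv_text: str) -> list[str]:
    """Keep rows that have both a product name and review text, joined as "name: review"."""
    docs = []
    for row in csv.DictReader(io.StringIO(csv_text)):
        name = row.get("name") or ""
        text = row.get("reviews.text") or ""
        if name and text:  # equivalent of dropna() on the two selected columns
            docs.append(f"{name}: {text}")
    return docs
```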

## 🤖 Load Sentence Transformer and Build FAISS Index

In [None]:
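from sentence_transformers import SentenceTransformer
import faiss
import numpy as np

# Embed every review with a small general-purpose sentence encoder (384-dimensional vectors).
model = SentenceTransformer("all-MiniLM-L6-v2")
embeddings = model.encode(documents, show_progress_bar=True)

# Exact (brute-force) L2 index; FAISS expects float32 input.
# Note: the LangChain cells below build their own FAISS store, so this raw index is not reused there.
index = faiss.IndexFlatL2(embeddings.shape[1])
index.add(np.asarray(embeddings, dtype="float32"))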

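`IndexFlatL2` performs exact, brute-force search. Each query is compared against every stored vector, and the index returns the k smallest squared Euclidean distances. The sketch below reproduces that behaviour in plain Python on toy 2-D vectors. It is illustrative only and far slower than FAISS's vectorised implementation.

```python
import heapq


def squared_l2(a: list[float], b: list[float]) -> float:
    """Squared Euclidean distance, the quantity IndexFlatL2 reports."""
    return sum((x - y) ** 2 for x, y in zip(a, b))


def flat_l2_search(vectors: list[list[float]], query: list[float], k: int) -> list[tuple[float, int]]:
    """Exhaustive k-nearest-neighbour search: score every stored vector, keep the k smallest.

    Returns (squared_distance, position) pairs, closest first.
    """
    scored = ((squared_l2(v, query), i) for i, v in enumerate(vectors))
    return heapq.nsmallest(k, scored)
```

With the real index, the equivalent lookup is `index.search(model.encode([question]), k)`. That call returns a distances array and an ids array, and the ids are positions in `documents`.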

## 🔌 Setup LangChain and Groq API

In [None]:
from langchain_groq import ChatGroq
from langchain.chains import RetrievalQA
from langchain_community.vectorstores import FAISS
from langchain_community.embeddings import HuggingFaceEmbeddings
from langchain_core.documents import Document
from getpass import getpass
import os

# Prompt for the Groq key instead of hardcoding it in the notebook.
os.environ["GROQ_API_KEY"] = getpass("Groq API key: ")
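An API key hardcoded in a notebook is exposed to anyone who sees the file. One alternative is to read the key from the environment and prompt for it only when it is missing. `get_api_key` below is a hypothetical helper. `getpass` keeps the typed key out of the cell output.

```python
import os
from getpass import getpass


def get_api_key(var: str = "GROQ_API_KEY") -> str:
    """Return the key from the environment, prompting (without echo) only if it is unset."""
    key = os.environ.get(var)
    if not key:
        key = getpass(f"{var}: ")
        os.environ[var] = key  # cache it for later cells in the same session
    return key
```

In Colab, `google.colab.userdata.get("GROQ_API_KEY")` can also read a key stored in the notebook's Secrets panel.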

## 📚 Wrap Documents and Create Vector Store

In [None]:
wrapped_docs = [Document(page_content=d) for d in documents]

embedding_model = HuggingFaceEmbeddings(model_name="all-MiniLM-L6-v2")
vectorstore = FAISS.from_documents(wrapped_docs, embedding_model)

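`FAISS.from_documents` embeds each `Document`'s `page_content` with the supplied embedding model and indexes the vectors. It also keeps a mapping from index positions back to the documents, so searches return `Document` objects rather than raw ids. The toy store below sketches that idea under two assumptions. A hypothetical bag-of-words embedding stands in for `all-MiniLM-L6-v2`, and `Doc` stands in for LangChain's `Document`. The method name mirrors LangChain's `similarity_search`, but the class is not LangChain code.

```python
import math
from dataclasses import dataclass, field


@dataclass
class Doc:
    """Minimal stand-in for LangChain's Document (text + metadata)."""
    page_content: str
    metadata: dict = field(default_factory=dict)


class ToyVectorStore:
    """Illustrative in-memory store: embed every document once, then rank by L2 distance.

    The bag-of-words embedding is a hypothetical placeholder for a real sentence encoder.
    """

    def __init__(self, docs: list[Doc]):
        self.docs = docs  # position i in self.vectors maps back to self.docs[i]
        self.vocab = sorted({w for d in docs for w in d.page_content.lower().split()})
        self.vectors = [self._embed(d.page_content) for d in docs]

    def _embed(self, text: str) -> list[float]:
        words = text.lower().split()
        vec = [float(words.count(term)) for term in self.vocab]
        norm = math.sqrt(sum(x * x for x in vec))
        return [x / norm for x in vec] if norm else vec  # unit length, or all zeros

    def similarity_search(self, query: str, k: int = 4) -> list[Doc]:
        q = self._embed(query)
        dists = [sum((a - b) ** 2 for a, b in zip(v, q)) for v in self.vectors]
        order = sorted(range(len(self.docs)), key=lambda i: dists[i])
        return [self.docs[i] for i in order[:k]]
```

On the real store, `vectorstore.similarity_search("battery life of Fire tablet", k=5)` should return the same five documents that the QA chain's retriever passes to the LLM.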


## 🧠 Load Groq LLM

In [None]:
llm = ChatGroq(
    groq_api_key=os.environ["GROQ_API_KEY"],
    model_name="llama3-70b-8192"  # Groq retires model IDs over time; switch to a currently listed model if this one is unavailable
)

## 🧵 Create Retrieval-Based QA Chain

In [None]:
qa_chain = RetrievalQA.from_chain_type(
    llm=llm,
    retriever=vectorstore.as_retriever(search_kwargs={"k": 5}),
    return_source_documents=False
)
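`RetrievalQA.from_chain_type` uses the `"stuff"` chain type by default. The k = 5 retrieved reviews are concatenated into a single prompt, and the LLM is called once. The sketch below shows that prompt assembly. The template wording is an illustrative assumption, not LangChain's exact built-in prompt, and `stuff_prompt` is a hypothetical name.

```python
def stuff_prompt(question: str, contexts: list[str]) -> str:
    """Concatenate ("stuff") all retrieved passages into one prompt for a single LLM call."""
    context_block = "\n\n".join(contexts)  # one blank line between retrieved reviews
    return (
        "Use the following pieces of context to answer the question. "
        "If the context does not contain the answer, say you don't know.\n\n"
        f"{context_block}\n\n"
        f"Question: {question}\n"
        "Answer:"
    )
```

Stuffing is the simplest strategy and suits short reviews like these. To check whether an answer is grounded, set `return_source_documents=True`; the chain then also returns the retrieved documents under the `"source_documents"` key.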

## ❓ Ask Questions and Save Answers

In [None]:
questions = [
    "What do users think about Kindle?",
    "How is the battery life of Fire tablet?",
    "What are users saying about Echo Dot?"
]

with open("chatbot_responses.txt", "w", encoding="utf-8") as f:
    for q in questions:
        answer = qa_chain.invoke({"query": q})["result"]
        print(f"Q: {q}\nA: {answer}\n")
        f.write(f"Q: {q}\nA: {answer}\n\n")

Q: What do users think about Kindle?
A: Based on the provided reviews, users think very highly of the devices. They praise the devices' portability, high-resolution displays, and user-friendly interfaces. They also appreciate the long battery life, ability to modify font sizes, and seamless connectivity to Goodreads accounts. Many reviewers use their Kindles daily, often for extended periods, and have reported no issues or damages. Overall, users seem to be extremely satisfied with their Kindle devices, with some even calling them "phenomenal" and the "best e-reader" available.

Q: How is the battery life of Fire tablet?
A: Based on the provided reviews, the battery life of Fire tablets seems to be a mixed bag. Some reviewers reported poor battery life, with one tablet dying after an hour, another not lasting 4 hours, and another draining quickly under normal use. On the other hand, one reviewer reported excellent battery life, using the tablet off and on for 4-5 days on a single charg