### My Implementation Journey

> This notebook explores three strategies to build a Retrieval-Augmented Generation (RAG)-based QA bot for a retail business.  
> It includes attempts using:
>
> - **OpenAI + Pinecone** → Original plan (the GPT call is mocked because of API rate limits and quota exhaustion)
> - **Gemini + Pinecone** → Fallback using Google’s Gemini API (Gemini handles answer generation only; the Pinecone vectors are random placeholders rather than real embeddings)
> - **HuggingFace + FAISS + FLAN-T5** → Fully open-source RAG pipeline using sentence transformers and a local LLM (`google/flan-t5-base` in the code below), usable offline once the models are downloaded

> This approach showcases practical problem-solving, multi-model integration, and efficient use of both cloud-based and local resources for building scalable and cost-effective QA systems.

> All code is tested in Colab. Models are selected based on accessibility, cost-efficiency, and resource availability.

In [None]:
#Task 1: RAG-Based QA Bot for Retail Business using OpenAI + Pinecone

#Step 1: Install Dependencies
!pip install pinecone openai tiktoken --quiet

#Step 2: Import Libraries
import openai
import pinecone
import os
import tiktoken
import numpy as np
from typing import List
from tqdm import tqdm

#Step 3: Set API Keys
from openai import OpenAI
client = OpenAI(api_key="OPENAI API Key")
from pinecone import Pinecone, ServerlessSpec
pc = Pinecone(api_key="PINECONE API KEY")


#Step 4: Create Pinecone Index
index_name = "retail-rag-bot"
if index_name not in pc.list_indexes().names():
    pc.create_index(
        name=index_name,
        dimension=1536,
        metric="cosine",
        spec=ServerlessSpec(cloud="gcp", region="starter")
    )
index = pc.Index(index_name)


#Step 5: Sample Retail Documents
retail_docs = [
    "You can return electronics within 10 days of delivery if unopened.",
    "Refunds are processed within 5–7 business days after pickup.",
    "Cash on Delivery is available for orders under ₹5,000.",
    "We ship to most Tier 1 and Tier 2 cities across India.",
    "Furniture items are not eligible for return unless damaged in transit.",
    "Track your order anytime via the 'My Orders' section in your account."
]

#Step 6: Upload to Pinecone
embedded_data = []
for i, text in enumerate(tqdm(retail_docs)):
    # Placeholder: a random vector stands in for a real 1536-dimensional embedding of `text`.
    vector = np.random.rand(1536).tolist()
    embedded_data.append({"id": f"doc_{i}", "values": vector, "metadata": {"text": text}})
index.upsert(vectors=embedded_data)

#Step 7: Retrieval Function
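def retrieve_context(query: str, k=3) -> List[str]:
    # Placeholder: the query vector is random as well, so the top-k matches are arbitrary, not semantic.
    query_vec = np.random.rand(1536).tolist()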
    results = index.query(vector=query_vec, top_k=k, include_metadata=True)
    return [match['metadata']['text'] for match in results['matches']]

#Step 8: Generate Answer using GPT
def answer_question(query: str):
    context = retrieve_context(query)
    prompt = f"""
You are a helpful retail support assistant. Use the info below to answer:
{chr(10).join(context)}

Question: {query}
Answer:
"""
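    # The GPT API call is commented out because of quota issues.
    # Note: completion_model is not defined in this notebook; set it to a chat model name before uncommenting.
    # response = client.chat.completions.create(
    #     model=completion_model,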
    #     messages=[{"role": "user", "content": prompt}]
    # )
    # return response.choices[0].message.content

    # Mock response
    return f"(Mock Answer)\nUsing context:\n- " + "\n- ".join(context) + f"\nAnswer to: {query}"

#Step 9: Test Query
query = "Can I return a damaged furniture item?"
print("Q:", query)
print("A:", answer_question(query))

100%|██████████| 6/6 [00:00<00:00, 2956.51it/s]


Q: Can I return a damaged furniture item?
A: (Mock Answer)
Using context:
- Cash on Delivery is available for orders under ₹5,000.
- You can return electronics within 10 days of delivery if unopened.
- Track your order anytime via the 'My Orders' section in your account.
Answer to: Can I return a damaged furniture item?


In [None]:
#NOTE: Due to OpenAI API quota limits, the actual LLM response is mocked for demonstration. I also tried API keys from different accounts, but the same error occurred.
#The Pinecone upsert and query calls run end to end, but the vectors are random placeholders, so the retrieved context shown above is arbitrary rather than relevant to the question.
#To reproduce the quota error, uncomment the GPT call in Step 8.
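
Because the cell above stores random vectors, the retrieved context is unrelated to the question. As a minimal sketch of what similarity-based retrieval is supposed to do, the snippet below swaps in a toy word-count embedding with cosine similarity. This is an assumption made purely for illustration: it is not a neural embedding model and not the method used in this notebook, and the helper names `embed` and `cosine` are hypothetical.

```python
import math
import re
from collections import Counter

# Same sample documents as in the notebook.
retail_docs = [
    "You can return electronics within 10 days of delivery if unopened.",
    "Refunds are processed within 5–7 business days after pickup.",
    "Cash on Delivery is available for orders under ₹5,000.",
    "We ship to most Tier 1 and Tier 2 cities across India.",
    "Furniture items are not eligible for return unless damaged in transit.",
    "Track your order anytime via the 'My Orders' section in your account.",
]

def embed(text):
    """Toy embedding (assumption): lowercase word counts, not a trained model."""
    return Counter(re.findall(r"[a-z0-9]+", text.lower()))

def cosine(a, b):
    """Cosine similarity between two sparse word-count vectors."""
    dot = sum(count * b[token] for token, count in a.items() if token in b)
    norm_a = math.sqrt(sum(v * v for v in a.values()))
    norm_b = math.sqrt(sum(v * v for v in b.values()))
    return dot / (norm_a * norm_b) if norm_a and norm_b else 0.0

def retrieve_context(query, k=3):
    """Return the k documents most similar to the query."""
    query_vec = embed(query)
    ranked = sorted(retail_docs, key=lambda d: cosine(query_vec, embed(d)), reverse=True)
    return ranked[:k]

best = retrieve_context("Can I return a damaged furniture item?", k=1)[0]
print(best)
```

With a meaningful embedding, even one this crude, the furniture policy ranks first for the furniture question, which is the behavior the random placeholder vectors cannot provide.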

In [None]:
#Alternative to OpenAI: Hugging Face + FAISS for Fully Local RAG Inference

#The task initially required the OpenAI API, but rate-limit issues and quota exhaustion during experimentation made it necessary to explore another solution.
#Instead of relying on paid APIs, I chose a fully local RAG pipeline, as I have worked on RAG-based models before (also mentioned in my resume).
#It uses Hugging Face's Sentence Transformers for embedding and google/flan-t5-base for answer generation.

#Benefits of This Approach:
#No API costs or rate limits; works offline once the models have been downloaded.
#Transparent and controllable pipeline: full access to vector similarity, chunking, and model parameters.
#Scalable: Can be deployed in environments without internet or commercial APIs.

#This switch showcases adaptability and deeper understanding of how Retrieval-Augmented Generation can be implemented beyond vendor-specific solutions,
#while still meeting the core objective: accurate, context-based question answering.
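
The benefits above mention control over chunking. The sample documents in this notebook are single sentences, so no chunking is applied, but longer policy documents would normally be split before embedding. The following is a minimal sketch of a word-window chunker with overlap; the function name `chunk_words` and its default sizes are illustrative assumptions, not part of the notebook.

```python
def chunk_words(text, chunk_size=50, overlap=10):
    """Split text into chunks of at most `chunk_size` words.

    Consecutive chunks share `overlap` words so that a sentence cut at a
    chunk boundary still appears intact in at least one chunk.
    """
    if chunk_size <= 0 or not 0 <= overlap < chunk_size:
        raise ValueError("need chunk_size > 0 and 0 <= overlap < chunk_size")
    words = text.split()
    step = chunk_size - overlap
    chunks = []
    for start in range(0, len(words), step):
        chunks.append(" ".join(words[start:start + chunk_size]))
        # Stop once this chunk has reached the end of the text.
        if start + chunk_size >= len(words):
            break
    return chunks

sample = " ".join(f"w{i}" for i in range(10))
for chunk in chunk_words(sample, chunk_size=4, overlap=1):
    print(chunk)
```

Each chunk would then be embedded and indexed in place of the whole document, and the retrieval function would return chunks instead of full documents.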

In [None]:
#Local RAG-Based QA Bot using HuggingFace + FAISS (No API Keys)

#Step 1: Install dependencies
!pip install faiss-cpu sentence-transformers transformers --quiet

#Step 2: Import Libraries
from sentence_transformers import SentenceTransformer
from transformers import pipeline
import faiss
import numpy as np

#Step 3: Sample Retail Documents
retail_docs = [
    "You can return electronics within 10 days of delivery if unopened.",
    "Refunds are processed within 5–7 business days after pickup.",
    "Cash on Delivery is available for orders under ₹5,000.",
    "We ship to most Tier 1 and Tier 2 cities across India.",
    "Furniture items are not eligible for return unless damaged in transit.",
    "Track your order anytime via the 'My Orders' section in your account."
]

#Step 4: Generate Embeddings (SentenceTransformers)
encoder = SentenceTransformer('all-MiniLM-L6-v2')
doc_embeddings = encoder.encode(retail_docs)

#Step 5: Build FAISS Index
index = faiss.IndexFlatL2(doc_embeddings.shape[1])
index.add(np.array(doc_embeddings))

#Step 6: Retrieval Function
def retrieve_context(query, k=3):
    query_vec = encoder.encode([query])
    D, I = index.search(np.array(query_vec), k)
    return [retail_docs[i] for i in I[0]]

#Step 7: QA using HuggingFace model
qa_pipeline = pipeline("text2text-generation", model="google/flan-t5-base")

def answer_question(query):
    context = retrieve_context(query)
    prompt = f"Answer the question based on the context below:\n{chr(10).join(context)}\n\nQuestion: {query}\nAnswer:"
    response = qa_pipeline(prompt, max_new_tokens=100, do_sample=True)
    return response[0]['generated_text']

#Step 8: Test
query = "Can I return a damaged furniture item?"
print("Q:", query)
print("A:", answer_question(query))


Device set to use cpu


Q: Can I return a damaged furniture item?
A: Yes
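
The local pipeline ranks documents with `faiss.IndexFlatL2` (Euclidean distance), while the Pinecone index in the other cells is configured with `metric="cosine"`. The two rankings agree when every vector is scaled to unit length, because for unit vectors the squared L2 distance equals `2 - 2 * cosine`. The sketch below checks this in plain Python; the vectors are made up for illustration and the helper names are hypothetical.

```python
import math

def normalize(v):
    """Scale a vector to unit length."""
    norm = math.sqrt(sum(x * x for x in v))
    return [x / norm for x in v]

def cosine(a, b):
    """Cosine similarity of two dense vectors."""
    dot = sum(x * y for x, y in zip(a, b))
    return dot / (math.sqrt(sum(x * x for x in a)) * math.sqrt(sum(y * y for y in b)))

def squared_l2(a, b):
    """Squared Euclidean distance of two dense vectors."""
    return sum((x - y) ** 2 for x, y in zip(a, b))

# Hypothetical query and document vectors (not real embeddings).
query = [1.0, 2.0, 0.5]
docs = [[2.0, 4.0, 1.0], [0.0, 1.0, 3.0], [5.0, 0.1, 0.1]]

# Rank by cosine similarity (higher is better).
by_cosine = sorted(range(len(docs)), key=lambda i: cosine(query, docs[i]), reverse=True)

# Rank by squared L2 distance on unit-length vectors (lower is better).
q_unit = normalize(query)
by_l2 = sorted(range(len(docs)), key=lambda i: squared_l2(q_unit, normalize(docs[i])))

print(by_cosine, by_l2)
```

In practice this means normalizing the embeddings before adding them to the FAISS index (and normalizing the query the same way) if cosine-style ranking is wanted from an L2 index.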


In [None]:
queries = [
    "Can I return a damaged furniture item?",
    "How long does it take to get a refund?",
    "Is Cash on Delivery available for ₹10,000 orders?",
    "Where do you ship in India?",
    "How can I track my order?",
    "What is the return policy for electronics?"
]

for query in queries:
    print(f"Q: {query}")
    print("A:", answer_question(query))

Q: Can I return a damaged furniture item?
A: Yes
Q: How long does it take to get a refund?
A: 5–7 business days
Q: Is Cash on Delivery available for ₹10,000 orders?
A: no
Q: Where do you ship in India?
A: Tier 1 and Tier 2 cities
Q: How can I track my order?
A: 'My Orders' section in your account
Q: What is the return policy for electronics?
A: You can return electronics within 10 days of delivery if unopened


In [None]:
#Task 1: RAG-Based QA Bot for Retail Business using Gemini + Pinecone

#Step 1: Install Dependencies
!pip install google-generativeai pinecone tiktoken --quiet

#Step 2: Import Libraries
import os
import numpy as np
import tiktoken
from tqdm import tqdm
import google.generativeai as genai
from pinecone import Pinecone, ServerlessSpec
from typing import List

#Step 3: Set API Keys
GOOGLE_API_KEY = "GEMINI API KEY"
PINECONE_API_KEY = "PINECONE API KEY"
genai.configure(api_key=GOOGLE_API_KEY)
pc = Pinecone(api_key=PINECONE_API_KEY)

# Step 4: Create Pinecone Index
index_name = "retail-rag-bot"
if index_name not in pc.list_indexes().names():
    pc.create_index(
        name=index_name,
        dimension=1536,
        metric="cosine",
        spec=ServerlessSpec(cloud="gcp", region="starter")
    )
index = pc.Index(index_name)

#Step 5: Sample Retail Documents
retail_docs = [
    "You can return electronics within 10 days of delivery if unopened.",
    "Refunds are processed within 5–7 business days after pickup.",
    "Cash on Delivery is available for orders under ₹5,000.",
    "We ship to most Tier 1 and Tier 2 cities across India.",
    "Furniture items are not eligible for return unless damaged in transit.",
    "Track your order anytime via the 'My Orders' section in your account."
]

#Step 6: Upload to Pinecone
embedded_data = []
for i, text in enumerate(tqdm(retail_docs)):
    # Placeholder: a random vector stands in for a real 1536-dimensional embedding of `text`.
    vector = np.random.rand(1536).tolist()
    embedded_data.append({"id": f"doc_{i}", "values": vector, "metadata": {"text": text}})
index.upsert(vectors=embedded_data)

#Step 7: Retrieval Function
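def retrieve_context(query: str, k=6) -> List[str]:
    # The query vector is a random placeholder. With k=6 and six stored documents,
    # every document is returned, so the prompt always contains the full document set.
    query_vec = np.random.rand(1536).tolist()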
    results = index.query(vector=query_vec, top_k=k, include_metadata=True)
    return [match['metadata']['text'] for match in results['matches']]

#Step 8: Generate Answer
model = genai.GenerativeModel(model_name="models/gemini-1.5-flash")

def answer_question_gemini(query: str):
    context = retrieve_context(query)
    # Print the retrieved context that the model's answer is based on.
    print("Retrieved Context:", context)
    prompt = f"""
You are a helpful retail support assistant. Use the info below to answer:
{chr(10).join(context)}

Question: {query}
Answer:
"""
    response = model.generate_content(prompt)
    return response.text

#Step 9: Test Query
query = "Can I return a damaged furniture item?"
print("Q:", query)
print("A:", answer_question_gemini(query))


100%|██████████| 6/6 [00:00<00:00, 7791.28it/s]


Q: Can I return a damaged furniture item?
Retrieved Context: ["Track your order anytime via the 'My Orders' section in your account.", 'Cash on Delivery is available for orders under ₹5,000.', 'Furniture items are not eligible for return unless damaged in transit.', 'We ship to most Tier 1 and Tier 2 cities across India.', 'Refunds are processed within 5–7 business days after pickup.', 'You can return electronics within 10 days of delivery if unopened.']
A: Yes, you can return a furniture item if it was damaged during transit.
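
The cells in this notebook assign API keys as string literals. A common alternative is to read them from environment variables so that keys never end up in a shared notebook. The helper below is a minimal sketch; `require_env` is a hypothetical name, and the variable names in the usage comments are assumptions to be replaced with whatever names you export.

```python
import os

def require_env(name: str) -> str:
    """Return the value of an environment variable, or fail with a clear message."""
    value = os.environ.get(name, "").strip()
    if not value:
        raise RuntimeError(f"Environment variable {name} is not set")
    return value

# Usage sketch (variable names are assumptions):
# genai.configure(api_key=require_env("GOOGLE_API_KEY"))
# pc = Pinecone(api_key=require_env("PINECONE_API_KEY"))
```

Failing early with a named variable makes a missing key easier to diagnose than an authentication error raised later by the API client.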



In [None]:
queries = [
    "Can I return a damaged furniture item?",
    "How long does it take to get a refund?",
    "Is Cash on Delivery available for ₹10,000 orders?",
    "Where do you ship in India?",
    "How can I track my order?",
    "What is the return policy for electronics?"
]

for query in queries:
    print(f"Q: {query}")
    print("A:", answer_question_gemini(query))

Q: Can I return a damaged furniture item?
Retrieved Context: ['We ship to most Tier 1 and Tier 2 cities across India.', 'You can return electronics within 10 days of delivery if unopened.', 'Refunds are processed within 5–7 business days after pickup.', 'Cash on Delivery is available for orders under ₹5,000.', 'Furniture items are not eligible for return unless damaged in transit.', "Track your order anytime via the 'My Orders' section in your account."]
A: Yes, you can return a damaged furniture item if it arrived damaged during transit.

Q: How long does it take to get a refund?
Retrieved Context: ['You can return electronics within 10 days of delivery if unopened.', 'We ship to most Tier 1 and Tier 2 cities across India.', 'Refunds are processed within 5–7 business days after pickup.', 'Cash on Delivery is available for orders under ₹5,000.', "Track your order anytime via the 'My Orders' section in your account.", 'Furniture items are not eligible for return unless damaged in transi

In [None]:
#Why did I use Gemini instead of OpenAI?
# I used Google's Gemini model instead of OpenAI GPT because API rate limits and quota exhaustion on free OpenAI accounts blocked development and testing.
# Gemini provides a cost-free, flexible alternative with strong language understanding capabilities, which makes it suitable for prototyping in place of the OpenAI API. Pinecone is still used as the vector store, as required by the task.
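
Transient rate-limit errors, as opposed to an exhausted quota, can often be handled by retrying with exponential backoff. The sketch below is generic and hedged: `RateLimitError` is a hypothetical stand-in class, so substitute the exception your client library actually raises, and note that retrying does not help once the account quota itself is used up.

```python
import random
import time

class RateLimitError(Exception):
    """Hypothetical stand-in for a provider's rate-limit exception."""

def call_with_backoff(fn, max_attempts=5, base_delay=1.0, sleep=time.sleep):
    """Call fn(), retrying on RateLimitError with exponential backoff plus jitter."""
    for attempt in range(max_attempts):
        try:
            return fn()
        except RateLimitError:
            # Give up after the final attempt and surface the error.
            if attempt == max_attempts - 1:
                raise
            # Delay doubles each attempt; jitter avoids synchronized retries.
            delay = base_delay * (2 ** attempt) + random.uniform(0, base_delay)
            sleep(delay)

# Demo with a fake call that fails twice before succeeding.
calls = {"count": 0}

def flaky_call():
    calls["count"] += 1
    if calls["count"] < 3:
        raise RateLimitError("try again later")
    return "ok"

# A no-op sleep keeps the demo instant; use the default time.sleep in practice.
result = call_with_backoff(flaky_call, base_delay=0.01, sleep=lambda seconds: None)
print(result, "after", calls["count"], "attempts")
```

A real API call would be wrapped in a small function or lambda and passed as `fn`.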