In [126]:
import os
from langchain_community.document_loaders import PyPDFLoader
from langchain_community.document_loaders import DirectoryLoader
from langchain_community.embeddings import HuggingFaceEmbeddings
from langchain_community.vectorstores import FAISS
from langchain_text_splitters import RecursiveCharacterTextSplitter
from langchain_groq import ChatGroq
from dotenv import load_dotenv

load_dotenv()

True

In [127]:
loader=DirectoryLoader("data", glob = "./*.pdf", loader_cls= PyPDFLoader)

In [128]:
data=loader.load()

In [129]:
data

[Document(metadata={'producer': 'Microsoft® Word 2021', 'creator': 'Microsoft® Word 2021', 'creationdate': '2025-12-24T00:03:40+05:30', 'author': 'CH Mouni', 'moddate': '2025-12-24T00:03:40+05:30', 'source': 'data\\Cancellation Policy.pdf', 'total_pages': 1, 'page': 0, 'page_label': '1'}, page_content="Cancellation Policy:- \nThe customer can choose to cancel an order any time before it's dispatched. The order \ncannot be canceled once it’s out for delivery. However, the customer may choose to reject it \nat the doorstep. \nThe time window for cancellation varies based on different categories and the order cannot \nbe canceled once the specified time has passed. \nIn some cases, the customer may not be allowed to cancel the order for free, post the \nspecified time and a cancellation fee will be charged. The details about the time window \nmentioned on the product page or order confirmation page will be considered final. \nIn case of any cancellation from the seller due to unforeseen c

In [130]:
import re

def clean_pdf_text(text: str) -> str:
    # Remove page markers such as "Page 1" or "Page 2 of 5".
    text = re.sub(r'Page\s+\d+(\s+of\s+\d+)?', '', text, flags=re.IGNORECASE)

    # Collapse every run of whitespace (including newlines) into a single space.
    # Policy headings such as "Cancellation Policy" are kept as they are.
    text = re.sub(r'\s+', ' ', text)

    return text.strip()


In [131]:
for doc in data:
    doc.page_content = clean_pdf_text(doc.page_content)
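As a quick sanity check, the cleaning steps can be tried on a small made-up string. The sample text below is invented for illustration and does not come from the policy PDFs, and `clean_text_demo` is a hypothetical, condensed copy of the page-number and whitespace steps above.

```python
import re

def clean_text_demo(text: str) -> str:
    # Drop page markers such as "Page 1" or "Page 2 of 5".
    text = re.sub(r'Page\s+\d+(\s+of\s+\d+)?', '', text, flags=re.IGNORECASE)
    # Collapse every run of whitespace (spaces, tabs, newlines) into one space.
    text = re.sub(r'\s+', ' ', text)
    return text.strip()

# Invented sample that mimics line-wrapped PDF text with a page footer.
sample = "Refund Policy:- \nRefunds are issued \nwithin a few days.\n\nPage 1 of 3"
print(clean_text_demo(sample))
```

The line breaks that PyPDFLoader leaves inside sentences disappear, which matches the difference between the raw and the cleaned `page_content` shown in this notebook.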


In [132]:
data[0].page_content

"Cancellation Policy:- The customer can choose to cancel an order any time before it's dispatched. The order cannot be canceled once it’s out for delivery. However, the customer may choose to reject it at the doorstep. The time window for cancellation varies based on different categories and the order cannot be canceled once the specified time has passed. In some cases, the customer may not be allowed to cancel the order for free, post the specified time and a cancellation fee will be charged. The details about the time window mentioned on the product page or order confirmation page will be considered final. In case of any cancellation from the seller due to unforeseen circumstances, a full refund will be initiated for prepaid orders. Flipkart reserves the right to accept the cancellation of any order. Flipkart also reserves the right to waive off or modify the time window or cancellation fee from time to time. Cancellation Policy – Hyperlocal:- The Orders placed by you on the Platform

In [133]:
splitter = RecursiveCharacterTextSplitter(chunk_size=800, chunk_overlap=150)
splitted_data = splitter.split_documents(data)

I chose a chunk size of 800 characters because a policy rule often spans several sentences. Smaller chunks (around 200–300 characters) can split a rule in half and separate the main rule from its exceptions, which can lead to incomplete or misleading answers. Policy documents are usually written in full paragraphs, and each paragraph often explains one complete rule, so a chunk of about 800 characters usually holds a whole rule. A chunk size of 1000 would also be a reasonable choice.

If the chunks are too small, important details such as exceptions or conditions can end up in different chunks, which confuses retrieval. If they are too large, the model may receive extra, unrelated information. The chunk_overlap of 150 characters repeats the end of each chunk at the start of the next, which reduces the chance that information is lost at a chunk boundary. For example, the sentence that is cut off at the end of the first chunk below appears in full in the second chunk.

Overall, 800 characters felt like the right value because it keeps the text of each policy rule together, which should lead to more accurate and reliable answers.
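To make the role of the overlap concrete, here is a simplified sketch that cuts text into fixed-size character windows. It is not how RecursiveCharacterTextSplitter works internally (the library prefers to split on separators such as paragraphs, newlines, and spaces), and `sliding_chunks` and the sample sentence are hypothetical names and data used only for illustration.

```python
def sliding_chunks(text: str, chunk_size: int, chunk_overlap: int) -> list[str]:
    """Cut text into fixed-size character windows that share chunk_overlap characters."""
    if not 0 <= chunk_overlap < chunk_size:
        raise ValueError("chunk_overlap must be non-negative and smaller than chunk_size")
    step = chunk_size - chunk_overlap  # how far the window moves each time
    chunks = []
    for start in range(0, len(text), step):
        chunks.append(text[start:start + chunk_size])
        if start + chunk_size >= len(text):
            break  # the last window already reaches the end of the text
    return chunks

# Invented sample sentence, with small sizes so the overlap is easy to see.
rule = "Orders can be cancelled before dispatch. A fee may apply after the time window."
for chunk in sliding_chunks(rule, chunk_size=40, chunk_overlap=10):
    print(repr(chunk))
```

Each printed chunk starts with the last 10 characters of the previous one, which is the same idea as `chunk_overlap=150` with `chunk_size=800`, only at a smaller scale.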

In [134]:
splitted_data[0]

Document(metadata={'producer': 'Microsoft® Word 2021', 'creator': 'Microsoft® Word 2021', 'creationdate': '2025-12-24T00:03:40+05:30', 'author': 'CH Mouni', 'moddate': '2025-12-24T00:03:40+05:30', 'source': 'data\\Cancellation Policy.pdf', 'total_pages': 1, 'page': 0, 'page_label': '1'}, page_content="Cancellation Policy:- The customer can choose to cancel an order any time before it's dispatched. The order cannot be canceled once it’s out for delivery. However, the customer may choose to reject it at the doorstep. The time window for cancellation varies based on different categories and the order cannot be canceled once the specified time has passed. In some cases, the customer may not be allowed to cancel the order for free, post the specified time and a cancellation fee will be charged. The details about the time window mentioned on the product page or order confirmation page will be considered final. In case of any cancellation from the seller due to unforeseen circumstances, a ful

In [135]:
splitted_data[0].page_content

"Cancellation Policy:- The customer can choose to cancel an order any time before it's dispatched. The order cannot be canceled once it’s out for delivery. However, the customer may choose to reject it at the doorstep. The time window for cancellation varies based on different categories and the order cannot be canceled once the specified time has passed. In some cases, the customer may not be allowed to cancel the order for free, post the specified time and a cancellation fee will be charged. The details about the time window mentioned on the product page or order confirmation page will be considered final. In case of any cancellation from the seller due to unforeseen circumstances, a full refund will be initiated for prepaid orders. Flipkart reserves the right to accept the cancellation"

In [136]:
splitted_data[1].page_content

'seller due to unforeseen circumstances, a full refund will be initiated for prepaid orders. Flipkart reserves the right to accept the cancellation of any order. Flipkart also reserves the right to waive off or modify the time window or cancellation fee from time to time. Cancellation Policy – Hyperlocal:- The Orders placed by you on the Platform are non-cancellable and non-refundable via self serve under MINUTES delivery option owing to quick delivery times, except if cancellation/refund is requested via CX Agent under the following circumstances: • The Order could not be delivered within the estimated time that was displayed while placing the order; • The Order has not been picked by the Delivery Partner; • The Seller has not accepted or has canceled the Order due to reasons not'

In [137]:
embedding = HuggingFaceEmbeddings(
    model_name="sentence-transformers/all-MiniLM-L6-v2"
)

In [138]:
db = FAISS.from_documents(splitted_data, embedding)

In [139]:
import pickle
file_path="vector_index.pkl"
with open(file_path, "wb") as f:
    pickle.dump(db, f)

In [140]:
if os.path.exists(file_path):
    with open(file_path,"rb") as f:
        vectori=pickle.load(f)
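If the pickle file is missing, the cell above leaves `vectori` undefined. One way to guard against that is a small load-or-build helper, sketched here with the standard library only. `load_or_build` is a hypothetical name, and the demo caches a plain dictionary in a temporary folder rather than a real index.

```python
import os
import pickle
import tempfile

def load_or_build(path, build_fn):
    """Return the object pickled at path; if the file is missing, build, save, and return it."""
    if os.path.exists(path):
        with open(path, "rb") as f:
            return pickle.load(f)
    obj = build_fn()
    with open(path, "wb") as f:
        pickle.dump(obj, f)
    return obj

# Demo with a plain dict standing in for the vector index.
demo_path = os.path.join(tempfile.mkdtemp(), "demo_index.pkl")
first = load_or_build(demo_path, lambda: {"chunks": 3})   # built and saved
second = load_or_build(demo_path, lambda: {"chunks": 0})  # loaded from disk
print(first, second)
```

In this notebook the call could look like `vectori = load_or_build(file_path, lambda: FAISS.from_documents(splitted_data, embedding))`. Pickle files should only be loaded when they come from a trusted source. LangChain's FAISS wrapper also provides `save_local` and `load_local`, which may be a better fit than pickling the whole object.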

In [141]:
query="Until what stage of an order can a customer cancel it on Flipkart?"
result=vectori.similarity_search(query)
result

[Document(id='beb8b02b-1da4-484b-9cda-0eb13aca5ddf', metadata={'producer': 'Microsoft® Word 2021', 'creator': 'Microsoft® Word 2021', 'creationdate': '2025-12-24T00:03:40+05:30', 'author': 'CH Mouni', 'moddate': '2025-12-24T00:03:40+05:30', 'source': 'data\\Cancellation Policy.pdf', 'total_pages': 1, 'page': 0, 'page_label': '1'}, page_content="Cancellation Policy:- The customer can choose to cancel an order any time before it's dispatched. The order cannot be canceled once it’s out for delivery. However, the customer may choose to reject it at the doorstep. The time window for cancellation varies based on different categories and the order cannot be canceled once the specified time has passed. In some cases, the customer may not be allowed to cancel the order for free, post the specified time and a cancellation fee will be charged. The details about the time window mentioned on the product page or order confirmation page will be considered final. In case of any cancellation from the s

In [142]:
result[0].page_content

"Cancellation Policy:- The customer can choose to cancel an order any time before it's dispatched. The order cannot be canceled once it’s out for delivery. However, the customer may choose to reject it at the doorstep. The time window for cancellation varies based on different categories and the order cannot be canceled once the specified time has passed. In some cases, the customer may not be allowed to cancel the order for free, post the specified time and a cancellation fee will be charged. The details about the time window mentioned on the product page or order confirmation page will be considered final. In case of any cancellation from the seller due to unforeseen circumstances, a full refund will be initiated for prepaid orders. Flipkart reserves the right to accept the cancellation"

In [143]:
query="What option does a customer have if an order cannot be cancelled once it is out for delivery?"
result2=vectori.similarity_search(query,k=2)
result2

[Document(id='487b324a-5108-480f-9b31-2d6aaec22c4d', metadata={'producer': 'Microsoft® Word 2021', 'creator': 'Microsoft® Word 2021', 'creationdate': '2025-12-24T00:03:40+05:30', 'author': 'CH Mouni', 'moddate': '2025-12-24T00:03:40+05:30', 'source': 'data\\Cancellation Policy.pdf', 'total_pages': 1, 'page': 0, 'page_label': '1'}, page_content='seller due to unforeseen circumstances, a full refund will be initiated for prepaid orders. Flipkart reserves the right to accept the cancellation of any order. Flipkart also reserves the right to waive off or modify the time window or cancellation fee from time to time. Cancellation Policy – Hyperlocal:- The Orders placed by you on the Platform are non-cancellable and non-refundable via self serve under MINUTES delivery option owing to quick delivery times, except if cancellation/refund is requested via CX Agent under the following circumstances: • The Order could not be delivered within the estimated time that was displayed while placing the o

In [66]:
content=""
for doc in result:
    content+=doc.page_content
    content+='.'

In [67]:
content

"Cancellation Policy:- The customer can choose to cancel an order any time before it's dispatched. The order cannot be canceled once it’s out for delivery. However, the customer may choose to reject it at the doorstep. The time window for cancellation varies based on different categories and the order cannot be canceled once the specified time has passed. In some cases, the customer may not be allowed to cancel the order for free, post the specified time and a cancellation fee will be charged. The details about the time window mentioned on the product page or order confirmation page will be considered final. In case of any cancellation from the seller due to unforeseen circumstances, a full refund will be initiated for prepaid orders. Flipkart reserves the right to accept the cancellation.seller due to unforeseen circumstances, a full refund will be initiated for prepaid orders. Flipkart reserves the right to accept the cancellation of any order. Flipkart also reserves the right to wai

In [49]:
llm=ChatGroq(
    model="llama-3.1-8b-instant"
)

In [None]:
from langchain_core.prompts import ChatPromptTemplate

initial_prompt = ChatPromptTemplate.from_template("""
you are an expert in answering questions about company policies.
Answer the following question based only on the provided context.
<context>
{context}
</context>
Question: {input}
""")

In [None]:
chain =initial_prompt|llm

In [69]:
query="Until what stage of an order can a customer cancel it on Flipkart?"
ans = chain.invoke({"context": content,"input": query})

In [70]:
ans

AIMessage(content="According to the Cancellation Policy mentioned in the context, a customer can choose to cancel an order any time before it's dispatched.", additional_kwargs={}, response_metadata={'token_usage': {'completion_tokens': 27, 'prompt_tokens': 711, 'total_tokens': 738, 'completion_time': 0.032044664, 'completion_tokens_details': None, 'prompt_time': 0.064692523, 'prompt_tokens_details': None, 'queue_time': 0.068452247, 'total_time': 0.096737187}, 'model_name': 'llama-3.1-8b-instant', 'system_fingerprint': 'fp_ff2b098aaf', 'service_tier': 'on_demand', 'finish_reason': 'stop', 'logprobs': None, 'model_provider': 'groq'}, id='lc_run--019b4f4e-6a84-7a32-a04b-1aef86b06b74-0', usage_metadata={'input_tokens': 711, 'output_tokens': 27, 'total_tokens': 738})

In [71]:
ans.content

"According to the Cancellation Policy mentioned in the context, a customer can choose to cancel an order any time before it's dispatched."

In [None]:
improved_prompt=ChatPromptTemplate.from_template("""
You are an excellent policy question-answering assistant who answers user questions efficiently and accurately and follows the rules given below.

Rules:
1. Answer the given question only using the information provided in the context.
2. Do NOT use any prior knowledge or make assumptions on your own.
3. If the answer is not explicitly stated or is unclear, respond with:
   "I could not find this information in the provided policy documents."
4. Keep your response concise, factual, and easy to understand.

Context:
{context}

Question:
{input}

Answer Format:
- **Answer**: <clear answer based on context>
- **Source**: <name of the policy document>
""")

The initial prompt simply tells the model to answer questions using the provided context. Although it says to answer "based only on the provided context", it does not explicitly forbid guessing when the answer is unclear. It also does not specify how the final answer should be formatted or how the model should respond when the answer is missing.

In the improved prompt, I added explicit rules that tell the model to rely only on the retrieved context and not to make assumptions. I also included a fixed fallback response for cases where the information is missing or unclear, which helps prevent hallucinated answers. Additionally, the improved prompt enforces a structured output format, which makes the responses more consistent and easier to read.

Overall, these changes make the assistant more reliable and better aligned with the goal of accurate policy-based question answering.
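The answer format asks for a **Source**, but the context string built in this notebook contains only `page_content`, so the model has to infer the document name from the text. A possible refinement is to label each chunk with its file name before it goes into the prompt. The sketch below assumes the `source` metadata key shown in the loader output. `Doc` is a hypothetical stand-in for a LangChain Document, and `format_context` is a hypothetical helper.

```python
import os
from dataclasses import dataclass, field

@dataclass
class Doc:
    """Hypothetical stand-in for a LangChain Document (page_content plus metadata)."""
    page_content: str
    metadata: dict = field(default_factory=dict)

def format_context(docs) -> str:
    """Join chunks into one context string, labelling each with its source file name."""
    blocks = []
    for doc in docs:
        source = doc.metadata.get("source", "unknown")
        # The loader recorded Windows-style paths, so normalise the separators first.
        name = os.path.splitext(os.path.basename(source.replace("\\", "/")))[0]
        blocks.append(f"[Source: {name}]\n{doc.page_content}")
    return "\n\n".join(blocks)

docs = [Doc("Orders can be cancelled before dispatch.",
            {"source": "data\\Cancellation Policy.pdf"})]
print(format_context(docs))
```

With labels like these in the context, the Source line of the answer can be checked against a name that was actually supplied to the model.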

In [None]:
chain2=improved_prompt|llm

In [77]:
ans2=chain2.invoke({"context": content,"input": query})

In [78]:
ans2

AIMessage(content="**Answer**: The customer can cancel an order any time before it's dispatched.\n**Source**: Cancellation Policy", additional_kwargs={}, response_metadata={'token_usage': {'completion_tokens': 23, 'prompt_tokens': 806, 'total_tokens': 829, 'completion_time': 0.033752352, 'completion_tokens_details': None, 'prompt_time': 0.045801656, 'prompt_tokens_details': None, 'queue_time': 0.050376913, 'total_time': 0.079554008}, 'model_name': 'llama-3.1-8b-instant', 'system_fingerprint': 'fp_1151d4f23c', 'service_tier': 'on_demand', 'finish_reason': 'stop', 'logprobs': None, 'model_provider': 'groq'}, id='lc_run--019b4f56-0bdb-7a23-91e1-ef7d8572b0be-0', usage_metadata={'input_tokens': 806, 'output_tokens': 23, 'total_tokens': 829})

In [80]:
print(ans2.content)

**Answer**: The customer can cancel an order any time before it's dispatched.
**Source**: Cancellation Policy


# 4. Evaluation

In [86]:
def content_retrieval(query, k=4):
    # Retrieve the k most similar chunks and join their text into one context string.
    result = vectori.similarity_search(query, k=k)
    content = ""
    for doc in result:
        content += doc.page_content
        content += '.'
    return content

Answerable Questions

In [87]:
query1="Why does the cancellation time window vary across products?"
ans1 = chain2.invoke({"context":content_retrieval(query1),"input": query1})
print(ans1.content)

- **Answer**: The cancellation time window varies across products based on different categories.
- **Source**: Cancellation Policy


In [88]:
query2="What is the maximum order value allowed for Cash on Delivery (COD)?"
ans2 = chain2.invoke({"context":content_retrieval(query2),"input": query2})
print(ans2.content)

**Answer**: ₹50,000
**Source**: The policy document explaining the Cash on Delivery (C-o-D) payment method.


In [89]:
query3="Does Flipkart deliver items internationally?"
ans3 = chain2.invoke({"context":content_retrieval(query3,2),"input": query3})
print(ans3.content)

- **Answer**: As of now, Flipkart doesn't deliver items internationally.
- **Source**: Flipkart policy documents


Partially answerable

In [None]:
query4="How long does Flipkart take to resolve customer complaints?"
ans4 = chain2.invoke({"context":content_retrieval(query4,3),"input": query4})
print(ans4.content)

**Answer**: Flipkart is duty bound to provide fair treatment to our Consumer and Consumer grievances. However, it does not explicitly state the timeframe required to resolve customer complaints.
**Source**: Flipkart's Grievance Redressal Mechanism policy document


Same question with different prompt

In [97]:
query4="How long does Flipkart take to resolve customer complaints?"
ans4 = chain.invoke({"context":content_retrieval(query4,3),"input": query4})
print(ans4.content)

As per the information provided in the context, Flipkart's Grievance Redressal Mechanism has the following timeline for resolving customer complaints:

1. Upon receiving a Consumer Grievance, the Consumer shall receive an acknowledgment within 48 (Forty-Eight) hours through email, phone call, or SMS.
However, the context does not explicitly state the time frame for completely resolving the issue.


In [93]:
query5="Are cancellation fees always charged when canceling an order?"
ans5 = chain2.invoke({"context":content_retrieval(query5,5),"input": query5})
print(ans5.content)

- **Answer**: No, cancellation fees are not always charged when canceling an order. In some cases, customers may not be allowed to cancel the order for free, and a cancellation fee will be charged.
- **Source**: Our Cancellation Policy


Same question with different prompt

In [95]:
query5="Are cancellation fees always charged when canceling an order?"
ans5 = chain.invoke({"context":content_retrieval(query5,5),"input": query5})
print(ans5.content)

Based on the provided context, the answer is no. Cancellation fees are not always charged when canceling an order. According to the context, the customer may not be allowed to cancel the order for free after a specified time, but a cancellation fee will be charged. However, it does not explicitly state that a cancellation fee will always be charged. In some cases, like when the seller cancels the order due to unforeseen circumstances, a full refund will be initiated for prepaid orders, with no mention of any cancellation fee.


Unanswerable Questions

In [102]:
query6="What happens if an international shipment is lost in transit?"
ans6 = chain2.invoke({"context":content_retrieval(query6,1),"input": query6})
print(ans6.content)

**Answer**: I could not find this information in the provided policy documents.
**Source**: Product Availability and Delivery Section


Same question with different prompt

In [107]:
query6="What happens if an international shipment is lost in transit?"
ans6 = chain.invoke({"context":content_retrieval(query6,1),"input": query6})
print(ans6.content)

Unfortunately, the provided context does not specifically address what happens if an international shipment is lost in transit. The context does mention that 'Imported' items can take at least 10 days or more to be delivered, but it does not provide information on what happens if the shipment is lost during transit.


In [105]:
query7="What happens if a payment fails after money is debited from the bank?"
ans7=chain2.invoke({"context":content_retrieval(query7,1),"input": query7})
print(ans7.content)

**Answer**: I could not find this information in the provided policy documents.
**Source**: Policy Document
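Reading the answers one by one works for a handful of questions. For a larger evaluation set, the structured format makes it possible to parse the responses and count refusals automatically. This is a minimal sketch that assumes the model followed the Answer/Source format of the improved prompt. `parse_answer` and `REFUSAL` are hypothetical names.

```python
REFUSAL = "I could not find this information in the provided policy documents."

def parse_answer(text: str) -> dict:
    """Extract the Answer and Source fields from a response and flag refusals."""
    fields = {"answer": None, "source": None}
    for line in text.splitlines():
        # Tolerate an optional leading "- " bullet and the bold markers.
        cleaned = line.strip().lstrip("-").strip().replace("**", "")
        for key in fields:
            prefix = key.capitalize() + ":"
            if cleaned.startswith(prefix):
                fields[key] = cleaned[len(prefix):].strip()
    fields["refused"] = fields["answer"] is not None and REFUSAL in fields["answer"]
    return fields

response = ("**Answer**: I could not find this information in the provided policy documents.\n"
            "**Source**: Policy Document")
print(parse_answer(response))
```

Free-form responses, such as those produced by the initial prompt, come back with `answer` set to `None`, which is itself a useful signal that the format was not followed.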


# 5. Edge Case Handling

Case 1: No Relevant Documents Found

In [124]:
question="Is there a paid express delivery guarantee with compensation?"
ans=chain2.invoke({"context":content_retrieval(question,1),"input": question})
print(ans.content)

**Answer**: No, there is no paid express delivery guarantee with compensation mentioned in the provided policy documents.
**Source**: Cancellation Policy – Hyperlocal
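Note that `similarity_search` returns the nearest chunks no matter how far away they are, so the retriever itself never reports "no relevant documents"; in the case above the decision is left to the prompt. One option is to filter the retrieved chunks by distance before building the context. LangChain's FAISS wrapper offers `similarity_search_with_score`, which returns (document, score) pairs where a lower score means a closer match under the default L2 distance. The sketch below works on plain (text, distance) pairs. The helper names are hypothetical, and the threshold of 1.0 is an arbitrary placeholder that would need tuning on real queries.

```python
NO_ANSWER = "I could not find this information in the provided policy documents."

def filter_by_distance(scored_chunks, max_distance):
    """Keep only the chunk texts whose distance is at or below max_distance."""
    return [text for text, distance in scored_chunks if distance <= max_distance]

def build_context_or_refuse(scored_chunks, max_distance=1.0):
    """Return (context, None) when a close enough chunk exists, else (None, refusal)."""
    kept = filter_by_distance(scored_chunks, max_distance)
    if not kept:
        return None, NO_ANSWER
    return "\n\n".join(kept), None

# Made-up texts and distances, for illustration only.
hits = [("Orders can be cancelled before dispatch.", 0.62),
        ("Imported items may take longer to arrive.", 1.48)]
print(build_context_or_refuse(hits))
print(build_context_or_refuse([("Unrelated chunk", 1.9)]))
```

When nothing passes the threshold, the fallback message can be returned directly without calling the model at all.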


Case 2: Question Outside the Knowledge Base

In [117]:
question="What is Flipkart’s annual revenue?"
ans=chain2.invoke({"context":content_retrieval(question,1),"input": question})
print(ans.content)

**Answer**: I could not find this information in the provided policy documents.
**Source**: Flipkart policy documents
