### RAG → Expand KB with policies, FAQs, and past conversations; add chunking & vector retrieval.

In [1]:
# Run this cell once in the notebook (may take a minute)
!pip install --upgrade pip
!pip install langchain langchain-groq faiss-cpu sentence-transformers transformers python-dotenv

Collecting pip
  Downloading pip-25.2-py3-none-any.whl.metadata (4.7 kB)
Downloading pip-25.2-py3-none-any.whl (1.8 MB)
   ---------------------------------------- 0.0/1.8 MB ? eta -:--:--
   ---------------------------------------- 1.8/1.8 MB 16.0 MB/s eta 0:00:00


ERROR: To modify pip, please run the following command:
C:\Users\saket.khopkar\anaconda3\python.exe -m pip install --upgrade pip




In [2]:
import os
from dotenv import load_dotenv
from typing import List, Dict

load_dotenv()  # loads GROQ_API_KEY from .env if present
GROQ_API_KEY = os.getenv("GROQ_API_KEY")

if not GROQ_API_KEY:
    raise ValueError("Please set GROQ_API_KEY in a .env file or environment variable before running this notebook.")

# LangChain + helpers
from langchain.schema import Document
from langchain.text_splitter import RecursiveCharacterTextSplitter
from langchain.embeddings import HuggingFaceEmbeddings
from langchain.vectorstores import FAISS
from langchain.chains import RetrievalQA

# Groq LLM wrapper for LangChain
from langchain_groq import ChatGroq

# For optional saving/inspection
import pickle, csv

print("GROQ key present:", bool(GROQ_API_KEY))

GROQ key present: True


In [3]:
# Embedder using sentence-transformers (free, local)
EMBED_MODEL = "sentence-transformers/all-MiniLM-L6-v2"
embeddings = HuggingFaceEmbeddings(model_name=EMBED_MODEL)

# Chunking config
splitter = RecursiveCharacterTextSplitter(chunk_size=400, chunk_overlap=80)

print("Embedding model:", EMBED_MODEL)

  embeddings = HuggingFaceEmbeddings(model_name=EMBED_MODEL)


Embedding model: sentence-transformers/all-MiniLM-L6-v2


In [9]:
from langchain.docstore.document import Document
from langchain.document_loaders import TextLoader
from PyPDF2 import PdfReader
import os
from typing import List

kb_documents: List[Document] = []

policies_dir = "./policies"

if os.path.isdir(policies_dir) and any(os.scandir(policies_dir)):
    for entry in os.scandir(policies_dir):
        if entry.is_file():
            # Load .txt and .md files
            if entry.name.lower().endswith((".txt", ".md")):
                loader = TextLoader(entry.path, encoding="utf8")
                docs = loader.load()
                for d in docs:
                    d.metadata = {**d.metadata, "source": entry.name, "type": "policy"}
                kb_documents.extend(docs)

            # Load .pdf files
            elif entry.name.lower().endswith(".pdf"):
                pdf_reader = PdfReader(entry.path)
                full_text = ""
                for page in pdf_reader.pages:
                    text = page.extract_text()
                    if text:
                        full_text += text + "\n"
                if full_text.strip():
                    kb_documents.append(Document(
                        page_content=full_text,
                        metadata={"source": entry.name, "type": "policy"}
                    ))

    print(f"✅ Loaded {len(kb_documents)} docs from {policies_dir}.")

else:
    # Fallback sample KB
    samples = [
        {"source": "refund_policy.txt", "type": "policy",
         "text": "Refund Policy: Customers can request refunds within 30 days of purchase. Refunds are processed in 5-7 business days. Digital goods are non-refundable once downloaded."},
        {"source": "shipping_policy.txt", "type": "policy",
         "text": "Shipping Policy: Orders ship within 2 business days. Standard shipping: 5-7 days. Express shipping: 2-3 days. Free shipping for orders above $50."},
        {"source": "faq.csv", "type": "faq",
         "text": "FAQ: Q: How can I track my order? A: Use your tracking link or enter order id on /track. Q: How to contact support? A: Email support@example.com or use live chat 9am-6pm Mon-Fri."},
        {"source": "conv1.json", "type": "history",
         "text": "Conversation: User: I ordered 5 days ago and haven't received it. Agent: Please provide order id. User: 1234. Agent: It shows shipped 5 days ago."},
        {"source": "conv2.json", "type": "history",
         "text": "Conversation: User: I downloaded the eBook and request refund. Agent: Policy: digital goods non-refundable after download."}
    ]
    for s in samples:
        kb_documents.append(Document(
            page_content=s["text"],
            metadata={"source": s["source"], "type": s["type"]}
        ))

    print(f"⚠️ Using fallback sample KB: {len(kb_documents)} documents.")

✅ Loaded 1 docs from ./policies.


In [10]:
from langchain.text_splitter import RecursiveCharacterTextSplitter

# Split into smaller chunks for better retrieval
# These settings work well for both PDF and text
text_splitter = RecursiveCharacterTextSplitter(
    chunk_size=800,    # Max characters per chunk
    chunk_overlap=100, # Overlap to keep context
    separators=["\n\n", "\n", ".", " ", ""]
)

split_docs = text_splitter.split_documents(kb_documents)
print(f"Split {len(kb_documents)} source docs into {len(split_docs)} chunks.")

Split 1 source docs into 11 chunks.


In [26]:
from langchain.vectorstores import FAISS
from langchain.embeddings import HuggingFaceEmbeddings

# Create embeddings
embeddings = HuggingFaceEmbeddings(model_name="all-MiniLM-L6-v2")

# Store documents in a FAISS vector DB
vectorstore = FAISS.from_documents(kb_documents, embeddings)

# Retriever for semantic search
retriever = vectorstore.as_retriever(search_kwargs={"k": 3})
print("✅ Retriever is ready — can fetch relevant chunks from policies/")

✅ Retriever is ready — can fetch relevant chunks from policies/


In [30]:
from langchain_groq import ChatGroq
import os

# Initialize the Groq LLM
llm = ChatGroq(
    groq_api_key=os.getenv("GROQ_API_KEY"),
    model_name="llama-3.1-8b-instant"  # Recommended replacement for mixtral
)
print("✅ Groq LLM initialized with llama-3.1-70b-versatile")

✅ Groq LLM initialized with llama-3.1-70b-versatile


In [31]:
from langchain.chains import RetrievalQA

# Build RetrievalQA chain with source docs enabled
qa_chain = RetrievalQA.from_chain_type(
    llm=llm,
    retriever=retriever,
    return_source_documents=True
)

print("✅ RetrievalQA chain created — ready for Q&A")

✅ RetrievalQA chain created — ready for Q&A


In [32]:
while True:
    query = input("\nAsk a question (or type 'exit' to quit): ").strip()
    if query.lower() in ("exit", "quit"):
        break

    try:
        result = qa_chain.invoke({"query": query})

        print("\nAnswer:")
        print(result["result"])  # Safe — always a string

        print("\nSources:")
        for src_doc in result["source_documents"]:
            print(f"- {src_doc.metadata.get('source')} ({src_doc.metadata.get('type')})")

    except Exception as e:
        print(f"⚠️ Error: {e}")


Ask a question (or type 'exit' to quit):  What is Casual Leave



Answer:
According to the provided context, Casual Leave (CL) is a type of leave that is granted to employees for unforeseen situations or personal matters that require their attention. 

Here are some key characteristics of Casual Leave:

1. **Maximum 3 days in a month**: Employees can take a maximum of 3 Casual Leave days in a month.
2. **Permission in advance**: Employees need to take permission in advance for Casual Leave, except in cases of unforeseen situations.
3. **Minimum 0.5 to maximum 3 days**: Casual Leave can be taken for a minimum of 0.5 days to a maximum of 3 days.
4. **Not encashable**: Casual Leave is not encashable, meaning it cannot be converted into cash.
5. **No carry-forward**: Unused Casual Leave lapses automatically at the end of the year.
6. **Not clubbable with other leaves**: Casual Leave cannot be clubbed with Earned/Privileged Leave or Sick Leave.
7. **Pro-rata for new joiners and resigned employees**: Casual Leave for new joiners and resigned employees is 


Ask a question (or type 'exit' to quit):  Whom should inform before taking leaves



Answer:
According to the provided leave policy, employees need to apply for each leave and take approval except in cases where approval could not be taken in advance, usually for casual or sick leaves.

Sources:
- Leave-Policy.pdf (policy)



Ask a question (or type 'exit' to quit):  Who approves the leaves?



Answer:
According to the leave policy, the grant of leave shall depend upon the policies of the workplace and is at the discretion of the manager/management. This means that the manager or higher management has the authority to approve or reject leave applications.

Sources:
- Leave-Policy.pdf (policy)



Ask a question (or type 'exit' to quit):  As a male associate, can I take Maternity leave?



Answer:
No, you cannot take Maternity leave as per the provided context. Maternity Leave is covered by Maternity Benefit Act and is specifically mentioned to be applicable to female employees. It is a leave provided for female employees to take care of their pregnancy, childbirth, and related complications.

Sources:
- Leave-Policy.pdf (policy)



Ask a question (or type 'exit' to quit):  I have to go voting in my village, can i get a leave?



Answer:
I don't know if you will be able to get a leave to vote in your village. The leave policy document doesn't specifically mention "Leave for Voting" as an official type of leave, but it does mention "Leave for Voting" as being at the organization's discretion, along with other unpaid or half-paid leaves like Study Leave and Bereavement Leave.

Sources:
- Leave-Policy.pdf (policy)



Ask a question (or type 'exit' to quit):  I am a new joiner, what leaves can I take?



Answer:
As a new joiner, you are entitled to pro-rated leaves. This means that you will get a fraction of the total leaves based on the number of days you have worked.

You can take the following leaves as a new joiner:

1. Casual Leave (CL): You can take a maximum of 3 days of CL in a month, but since you are new, your CL will be pro-rated based on the number of days you have worked.
2. Sick Leave (SL): You can take a maximum of 7 days of SL, and your SL will also be pro-rated based on the number of days you have worked.
3. Earned Leave (EL) or Privilege Leave (PL): You will be entitled to a fraction of the EL or PL based on the number of working days you have completed. Since you are new, you will not have enough working days to accumulate a full EL or PL.

It's worth noting that you should check with your HR or management to confirm the details of your leave entitlement as a new joiner.

Sources:
- Leave-Policy.pdf (policy)



Ask a question (or type 'exit' to quit):  exit
