### Production-Ready RAG: compression, model routing, caching, metrics

In [1]:
!pip install langchain langchain-groq chromadb langchain_community sentence-transformers faiss-cpu pypdf python-dotenv
from langchain_community.document_loaders import PyPDFLoader
from langchain_text_splitters import RecursiveCharacterTextSplitter
from langchain_community.vectorstores import Chroma
from langchain_community.embeddings import HuggingFaceEmbeddings
from langchain_groq import ChatGroq
from langchain.chains import ConversationalRetrievalChain
from langchain.memory import ConversationBufferMemory
import time, json, os
from dotenv import load_dotenv

# Read GROQ_API_KEY from a local .env file (or the existing environment)
load_dotenv()
groq_api_key = os.getenv("GROQ_API_KEY")
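`os.getenv` returns `None` when the variable is missing, so a missing key is not caught in this cell and only surfaces later, in the LLM setup. A minimal fail-fast check, assuming the key is supplied via `.env` or the shell environment; `require_env` is a hypothetical helper, not part of `python-dotenv`:

```python
import os


def require_env(name: str) -> str:
    """Return the value of an environment variable, or raise if it is unset or empty."""
    value = os.getenv(name)
    if not value:
        raise RuntimeError(f"{name} is not set; add it to .env or export it before running.")
    return value


# Hypothetical usage in the setup cell, after load_dotenv():
# groq_api_key = require_env("GROQ_API_KEY")
```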


### Load the docs

In [2]:
import glob

# 🔹 Load all PDFs from 'policies' folder
policy_files = glob.glob("policies/*.pdf")

if not policy_files:
    raise FileNotFoundError("⚠️ No policy PDFs found inside 'policies/' folder!")

docs = []
for file in policy_files:
    loader = PyPDFLoader(file)
    docs.extend(loader.load())

print(f"✅ Loaded {len(policy_files)} documents with {len(docs)} total pages.")

# 🔹 Split into smaller chunks for better retrieval
splitter = RecursiveCharacterTextSplitter(chunk_size=800, chunk_overlap=100)
chunks = splitter.split_documents(docs)
print(f"✂️ Split into {len(chunks)} chunks.")

# 🔹 Use open-source HuggingFace embeddings (no API key needed)
embeddings = HuggingFaceEmbeddings(model_name="all-MiniLM-L6-v2")

# 🔹 Store embeddings in Chroma vector DB
vectorstore = Chroma.from_documents(chunks, embedding=embeddings, persist_directory="chroma_opt")
# 🔹 Return only the single most similar chunk for each query (k=1)
retriever = vectorstore.as_retriever(search_kwargs={"k": 1})

print("✅ Vector store ready and retriever initialized.")

✅ Loaded 3 documents with 15 total pages.
✂️ Split into 46 chunks.



✅ Vector store ready and retriever initialized.
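The cell above splits pages into 800-character chunks with 100 characters of overlap, embeds them, and retrieves only the single best chunk (`k=1`). The sketch below mimics that flow with the standard library only, under two simplifying assumptions: a fixed-size character window (simpler than `RecursiveCharacterTextSplitter`, which prefers paragraph, line, and word boundaries) and bag-of-words counts standing in for `all-MiniLM-L6-v2` embeddings. The function names are hypothetical.

```python
import math
from collections import Counter


def chunk_text(text: str, size: int = 800, overlap: int = 100) -> list[str]:
    """Fixed-size character windows; each window starts `size - overlap` chars after the previous one."""
    if overlap >= size:
        raise ValueError("overlap must be smaller than size")
    step = size - overlap
    # Stop once a new window would add no characters beyond the previous window's end
    return [text[i:i + size] for i in range(0, max(len(text) - overlap, 1), step)]


def embed(text: str) -> Counter:
    """Toy 'embedding': lowercase word counts (stand-in for a sentence-transformer vector)."""
    return Counter(text.lower().split())


def cosine(a: Counter, b: Counter) -> float:
    """Cosine similarity between two sparse word-count vectors."""
    dot = sum(a[w] * b[w] for w in a.keys() & b.keys())
    norm = math.sqrt(sum(v * v for v in a.values())) * math.sqrt(sum(v * v for v in b.values()))
    return dot / norm if norm else 0.0


def top_k(query: str, chunks: list[str], k: int = 1) -> list[str]:
    """Return the k chunks most similar to the query (k=1 mirrors search_kwargs={"k": 1})."""
    q = embed(query)
    return sorted(chunks, key=lambda c: cosine(q, embed(c)), reverse=True)[:k]


docs = ["casual leave is granted for personal matters", "posh policy prevents sexual harassment"]
print(top_k("what is casual leave", docs, k=1))  # ['casual leave is granted for personal matters']
```

With `k=1` the LLM sees a single chunk, so an answer spread across two chunks can come back incomplete; a larger `k` trades a longer prompt for better recall.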


### Set up the LLMs

In [3]:
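# Fast, inexpensive model for short queries
small_model = ChatGroq(
    model="llama-3.1-8b-instant",
    api_key=groq_api_key
)

# Larger model for longer, more complex queries
large_model = ChatGroq(
    model="llama-3.3-70b-versatile",
    api_key=groq_api_key
)

# Keeps the full conversation so follow-up questions can be interpreted in context
memory = ConversationBufferMemory(memory_key="chat_history", return_messages=True)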
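`ConversationBufferMemory` keeps every turn verbatim, so the `chat_history` handed to the chain grows with the conversation, and `ConversationalRetrievalChain` uses that history to rewrite follow-up questions before retrieval. A hypothetical stdlib sketch of the same buffer idea, with an optional window that keeps only the last N exchanges to bound prompt size. `WindowedBuffer` is illustrative, not a LangChain class; LangChain's own windowed variant is `ConversationBufferWindowMemory` (parameter `k`).

```python
from collections import deque
from typing import Optional


class WindowedBuffer:
    """Stores (role, text) messages; keeps at most `max_turns` question/answer pairs if set."""

    def __init__(self, max_turns: Optional[int] = None):
        # Each turn adds two messages (human + AI), so the deque holds 2 * max_turns items;
        # maxlen=None means the buffer is unbounded, like ConversationBufferMemory.
        maxlen = None if max_turns is None else 2 * max_turns
        self.messages: deque = deque(maxlen=maxlen)

    def save_turn(self, question: str, answer: str) -> None:
        self.messages.append(("human", question))
        self.messages.append(("ai", answer))

    def history(self) -> list:
        return list(self.messages)


mem = WindowedBuffer(max_turns=1)
mem.save_turn("question 1", "answer 1")
mem.save_turn("question 2", "answer 2")
print(mem.history())  # [('human', 'question 2'), ('ai', 'answer 2')]
```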



### Smart Query Router

In [6]:
from langchain.chains import ConversationalRetrievalChain

qa_cache = {}  # Simple in-memory cache: exact query string -> answer

def get_model_for_query(query):
    # Heuristic router: queries under 7 words go to the 8B model, longer ones to the 70B model
    if len(query.split()) < 7:
        return small_model
    return large_model

def ask(query):
    # Serve repeated questions from the cache without calling the LLM
    if query in qa_cache:
        print("🧠 Cached answer:")
        return qa_cache[query]

    model = get_model_for_query(query)
    # The chain rewrites follow-ups using chat history, retrieves context, then answers
    qa_chain = ConversationalRetrievalChain.from_llm(model, retriever=retriever, memory=memory)

    start = time.perf_counter()
    result = qa_chain.invoke({"question": query})
    elapsed = time.perf_counter() - start

    qa_cache[query] = result["answer"]

    print(f"⏱ Time taken: {elapsed:.2f}s")
    return result["answer"]
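`qa_cache` is an unbounded dict keyed on the exact query string, so "What is Casual Leave" and "what is casual leave?" miss each other, entries never expire, and a cached answer keeps being served even if the PDFs change. A minimal sketch of a bounded cache, assuming case, whitespace, and trailing-punctuation normalization is acceptable here; `AnswerCache` and `normalize` are hypothetical names, not part of LangChain.

```python
import time
from collections import OrderedDict
from typing import Optional


def normalize(query: str) -> str:
    """Lowercase, collapse whitespace, and drop trailing '?', '.', '!'."""
    return " ".join(query.lower().split()).rstrip("?.!").strip()


class AnswerCache:
    """LRU cache with a per-entry time-to-live, keyed on the normalized query."""

    def __init__(self, max_size: int = 256, ttl_seconds: float = 3600.0):
        self.max_size = max_size
        self.ttl = ttl_seconds
        self._data: OrderedDict = OrderedDict()  # key -> (stored_at, answer)

    def get(self, query: str) -> Optional[str]:
        key = normalize(query)
        item = self._data.get(key)
        if item is None:
            return None
        stored_at, answer = item
        if time.monotonic() - stored_at > self.ttl:
            del self._data[key]  # expired entry
            return None
        self._data.move_to_end(key)  # mark as most recently used
        return answer

    def put(self, query: str, answer: str) -> None:
        key = normalize(query)
        self._data[key] = (time.monotonic(), answer)
        self._data.move_to_end(key)
        if len(self._data) > self.max_size:
            self._data.popitem(last=False)  # evict the least recently used entry


# Hypothetical wiring inside ask():
# cache = AnswerCache(max_size=256, ttl_seconds=3600)
# cached = cache.get(query)
# if cached is not None:
#     return cached
# ...
# cache.put(query, result["answer"])
```

Two caveats carry over from the original design: a cache hit skips the chain, so that turn is never written to `memory`; and the key ignores chat history, so a context-dependent follow-up would share one entry across different conversations.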

In [7]:
print("🧩 Optimized RAG Chat Ready — type 'exit' to quit.\n")

while True:
    q = input("You: ").strip()
    if q.lower() in ["exit", "quit"]:
        print("👋 Bye!")
        break
    print("🤖:", ask(q))

🧩 Optimized RAG Chat Ready — type 'exit' to quit.



You:  What is Casual Leave


⏱ Time taken: 0.46s
🤖: Casual Leave refers to a type of leave that an employee can take for personal or private reasons, such as family emergencies, personal appointments, or other non-work-related matters. It is typically granted by an employer to allow employees to attend to such matters without having to use their annual leave or other types of leave.


You:  What are POSH Policy Guidelines


⏱ Time taken: 0.64s
🤖: Based on the provided context, the POSH Policy Guidelines are as follows:

The POSH (Policy for Prevention of Sexual Harassment) guidelines are established to create and maintain a safe work environment free from sexual harassment and discrimination for all employees of BBIL SYSTEMS. These guidelines are based on the guidelines of "The Sexual harassment of women at workplace (prevention, prohibition & redressal) Act, 2013.

The key points of the POSH policy guidelines are:

- To adopt a zero-tolerance attitude against any kind of Sexual Harassment or discrimination.
- To prevent and prohibit any kind of sexual harassment or discrimination caused by any employee.
- To provide a safe and respectful work environment for all employees, including clients, vendors, and contractors.
- To establish a framework for the prevention, prohibition, and redressal of sexual harassment.

The POSH policy guidelines are applicable to all employees of BBIL SYSTEMS.


You:  exit


👋 Bye!


In [8]:
print("\n📊 Chat Summary — Total Cached Queries:", len(qa_cache))
for q, a in qa_cache.items():
    print(f"🧠 {q} → {a[:60]}...")


📊 Chat Summary — Total Cached Queries: 2
🧠 What is Casual Leave → Casual Leave refers to a type of leave that an employee can ...
🧠 What are POSH Policy Guidelines → Based on the provided context, the POSH Policy Guidelines ar...
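The notebook's only metrics are the per-query `Time taken` print and the cache count above. A small stdlib sketch that also tracks cache hits, which model served each query, and latency percentiles. `QueryMetrics` is a hypothetical helper, and the nearest-rank percentile is one of several common definitions.

```python
import math
from dataclasses import dataclass, field


@dataclass
class QueryMetrics:
    latencies: list = field(default_factory=list)  # seconds, chain calls only
    cache_hits: int = 0
    model_counts: dict = field(default_factory=dict)  # model label -> number of calls

    def record_hit(self) -> None:
        self.cache_hits += 1

    def record_call(self, model_label: str, seconds: float) -> None:
        self.latencies.append(seconds)
        self.model_counts[model_label] = self.model_counts.get(model_label, 0) + 1

    def percentile(self, p: float) -> float:
        """Nearest-rank percentile of recorded latencies, p in (0, 100]."""
        if not self.latencies:
            return 0.0
        ordered = sorted(self.latencies)
        rank = max(1, math.ceil(p / 100 * len(ordered)))
        return ordered[rank - 1]

    def summary(self) -> dict:
        total = self.cache_hits + len(self.latencies)
        return {
            "queries": total,
            "cache_hit_rate": self.cache_hits / total if total else 0.0,
            "p50_s": self.percentile(50),
            "p95_s": self.percentile(95),
            "by_model": dict(self.model_counts),
        }


# Hypothetical wiring inside ask():
# metrics.record_hit() on a cache hit, otherwise after invoke():
# metrics.record_call("small" if model is small_model else "large", elapsed)
```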
