### What is Adaptive RAG?
Adaptive RAG extends traditional RAG with a routing step that decides, for each query:

- When to retrieve (for complex or ambiguous queries).

- When not to retrieve (if the LLM is confident enough to answer from its own knowledge).

- How to balance the two: optionally, honesty probes or confidence thresholds arbitrate between the LLM's internal memory and external vectorstore knowledge.
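
The routing decision described above can be sketched in a few lines of plain Python. This is a minimal illustration, not the notebook's implementation: `route_query`, `toy_confidence`, the numeric score, and the `0.7` threshold are all assumptions made for the example (the notebook itself asks the LLM for a High/Low label instead of a number).

```python
from typing import Callable, Dict


def route_query(
    question: str,
    self_confidence: Callable[[str], float],
    threshold: float = 0.7,  # hypothetical cut-off, chosen only for illustration
) -> Dict[str, str]:
    """Decide whether to answer from model memory or from retrieved context."""
    score = self_confidence(question)
    # Retrieve only when the self-reported confidence falls below the threshold.
    source = "llm_only" if score >= threshold else "retrieved"
    return {"question": question, "source": source}


def toy_confidence(question: str) -> float:
    """Hypothetical stand-in for an LLM probe: domain terms count as low confidence."""
    domain_terms = ("phc", "iphs")
    return 0.2 if any(term in question.lower() for term in domain_terms) else 0.9


print(route_query("Who is Owner of BMW?", toy_confidence))
print(route_query("What proforma for Facility Survey for PHC on IPHS?", toy_confidence))
```

The `source` flag mirrors the `"retrieved"` / `"llm_only"` values used later in the notebook, so a caller can always tell which path produced the answer.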

In [None]:
# ===================== INSTALL DEPENDENCIES =====================
!pip install -q langchain langchain-community langchain-core langchain-groq sentence-transformers faiss-cpu pypdf

In [None]:
# ===================== IMPORTS =====================
import os
from langchain_community.document_loaders import PyPDFLoader
from langchain.text_splitter import CharacterTextSplitter
from langchain_community.embeddings import HuggingFaceEmbeddings
from langchain_community.vectorstores import FAISS
from langchain_groq import ChatGroq
from langchain_core.prompts import ChatPromptTemplate
from langchain_core.runnables import (
    RunnableLambda, RunnableMap, RunnablePassthrough
)
from langchain_core.output_parsers import StrOutputParser

In [None]:
# ===================== PDF INPUT =====================
# Optional: mount Google Drive if the PDFs live there.
# from google.colab import drive
# drive.mount('/content/drive')

# Path to the single PDF to index. To index a whole folder instead, build the list with:
# [os.path.join(folder, f) for f in os.listdir(folder) if f.endswith(".pdf")]
pdf_path = "/content/primay-health-centres.pdf"
pdf_paths = [pdf_path]

In [None]:
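# ===================== LOAD AND SPLIT =====================
# Note: load_and_split() already runs LangChain's default text splitter on each PDF;
# the next cell re-splits the result with CharacterTextSplitter.
all_docs = []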
for pdf in pdf_paths:
    loader = PyPDFLoader(pdf)
    docs = loader.load_and_split()
    all_docs.extend(docs)

In [None]:
# ===================== SPLIT INTO CHUNKS =====================
splitter = CharacterTextSplitter(chunk_size=500, chunk_overlap=50)
docs = splitter.split_documents(all_docs)
print(f"Total chunks: {len(docs)}")

Total chunks: 97
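
To make the roles of `chunk_size=500` and `chunk_overlap=50` concrete, here is a simplified fixed-window sketch. It is not how `CharacterTextSplitter` works internally (that class first splits on a separator and then merges pieces up to the size limit); `sliding_chunks` is a hypothetical helper that only illustrates how consecutive chunks share their boundary characters.

```python
from typing import List


def sliding_chunks(text: str, chunk_size: int = 500, chunk_overlap: int = 50) -> List[str]:
    """Cut text into fixed-size windows where each window repeats the tail of the previous one."""
    if chunk_overlap >= chunk_size:
        raise ValueError("chunk_overlap must be smaller than chunk_size")
    step = chunk_size - chunk_overlap  # how far the window advances each time
    chunks: List[str] = []
    for start in range(0, len(text), step):
        chunks.append(text[start:start + chunk_size])
        if start + chunk_size >= len(text):
            break  # the last window already reached the end of the text
    return chunks


sample = "abcdefghij" * 120  # 1200 characters of dummy text
chunks = sliding_chunks(sample)
print([len(c) for c in chunks])
print(chunks[0][-50:] == chunks[1][:50])  # the shared overlap region
```

The overlap keeps a sentence that straddles a chunk boundary visible in both neighbours, at the cost of indexing some text twice.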


In [None]:
# ===================== EMBEDDINGS + VECTORSTORE =====================
embedding_model = HuggingFaceEmbeddings(model_name="sentence-transformers/all-MiniLM-L6-v2")
vectorstore = FAISS.from_documents(docs, embedding_model)
# lambda_mult is an MMR search parameter, so it belongs inside search_kwargs.
retriever = vectorstore.as_retriever(search_type="mmr", search_kwargs={"k": 5, "lambda_mult": 0.3})
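
The retriever uses maximal marginal relevance (MMR), which trades relevance to the query against redundancy with chunks already selected. The sketch below is a self-contained approximation of that idea in pure Python, not the FAISS or LangChain code path: `mmr_select` and `cosine` are hypothetical helpers, and the two-dimensional vectors are toy data. A `lambda_mult` near 1 favors relevance, while a value near 0 favors diversity.

```python
import math
from typing import List, Sequence


def cosine(a: Sequence[float], b: Sequence[float]) -> float:
    """Cosine similarity between two vectors (0.0 if either has zero length)."""
    dot = sum(x * y for x, y in zip(a, b))
    norm = math.sqrt(sum(x * x for x in a)) * math.sqrt(sum(y * y for y in b))
    return dot / norm if norm else 0.0


def mmr_select(
    query_vec: Sequence[float],
    doc_vecs: Sequence[Sequence[float]],
    k: int = 5,
    lambda_mult: float = 0.3,
) -> List[int]:
    """Greedily pick up to k document indices, balancing relevance and diversity."""
    selected: List[int] = []
    candidates = list(range(len(doc_vecs)))
    while candidates and len(selected) < k:

        def score(i: int) -> float:
            relevance = cosine(query_vec, doc_vecs[i])
            # Redundancy = similarity to the closest already-selected document.
            redundancy = max((cosine(doc_vecs[i], doc_vecs[j]) for j in selected), default=0.0)
            return lambda_mult * relevance - (1 - lambda_mult) * redundancy

        best = max(candidates, key=score)
        selected.append(best)
        candidates.remove(best)
    return selected


query = [1.0, 0.0]
docs = [[1.0, 0.0], [0.99, 0.1], [0.6, 0.8]]  # doc 1 is a near-duplicate of doc 0
print(mmr_select(query, docs, k=2, lambda_mult=0.3))  # diversity-leaning
print(mmr_select(query, docs, k=2, lambda_mult=1.0))  # relevance only
```

With the diversity-leaning setting the near-duplicate is skipped in favor of the less similar document, which is the behavior a low `lambda_mult` is meant to encourage.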

In [None]:
# ===================== DEFINE LLM =====================
from google.colab import userdata

llm = ChatGroq(
    model_name="llama-3.3-70b-versatile",  # matches Groq deployment
    api_key=userdata.get("GROQ_API_KEY")
)
llm

ChatGroq(client=<groq.resources.chat.completions.Completions object at 0x7f7a7462bb50>, async_client=<groq.resources.chat.completions.AsyncCompletions object at 0x7f7a6d1627d0>, model_name='llama-3.3-70b-versatile', model_kwargs={}, groq_api_key=SecretStr('**********'))

In [None]:
# ===================== PROMPTS =====================
query_rewriter_prompt = ChatPromptTemplate.from_template(
    "You are a helpful AI assistant. Rephrase the following question to be more clear and specific for retrieval:\n\n"
    "Original: {question}\n\nRephrased:"
)

context_prompt = ChatPromptTemplate.from_messages([
    ("system", "You are a clinical assistant AI. Use the context to answer medical questions clearly."),
    ("human", "Context:\n{context}\n\nQuestion: {question}")
])

In [None]:
# ===================== QUERY REWRITER CHAIN =====================
query_rewriter_chain = query_rewriter_prompt | llm | StrOutputParser()

In [None]:
# ===================== CONFIDENCE SCORER =====================
confidence_prompt = ChatPromptTemplate.from_template(
    "How confident are you in answering this question from your own knowledge, without external help?\n"
    "Question: {question}\nRespond with one word only: High or Low."
)

confidence_chain = confidence_prompt | llm | StrOutputParser()

def should_retrieve_based_on_confidence(question: str) -> bool:
    confidence = confidence_chain.invoke({"question": question})
    return "low" in confidence.lower()
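
A substring test such as `"low" in reply` also matches words like "following" or "below", and it treats any reply that never says "low" as confident. The sketch below shows one possible stricter parser. `parse_confidence_label` and `should_retrieve` are hypothetical names, and falling back to retrieval on an ambiguous reply is an assumed design choice rather than the notebook's behavior.

```python
import re


def parse_confidence_label(raw: str) -> str:
    """Return 'high', 'low', or 'unknown' from a free-form model reply."""
    # Match whole words only, so 'following' or 'below' do not count as 'low'.
    labels = set(re.findall(r"\b(high|low)\b", raw.lower()))
    if len(labels) == 1:
        return labels.pop()
    return "unknown"  # empty, off-format, or contradictory reply


def should_retrieve(raw_reply: str) -> bool:
    """Retrieve unless the reply is unambiguously 'high' (assumed conservative default)."""
    return parse_confidence_label(raw_reply) != "high"


for reply in ["High", "Low.", "Following the instructions: High", "High, but low on details", ""]:
    print(repr(reply), "->", parse_confidence_label(reply), "| retrieve:", should_retrieve(reply))
```

Defaulting to retrieval costs an extra vector search when the label is unclear, but it avoids answering domain questions from memory just because the model ignored the one-word format.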

In [None]:
# ===================== RETRIEVER WRAPPER (WITH SOURCE FLAG) =====================
def adaptive_retriever(inputs):
    question = inputs["question"]
    if should_retrieve_based_on_confidence(question):
        improved_query = query_rewriter_chain.invoke({"question": question})
        docs = retriever.invoke(improved_query)  # get_relevant_documents() is deprecated
        context = "\n\n".join([doc.page_content for doc in docs])
        source = "retrieved"
    else:
        context = ""
        source = "llm_only"
    return {"context": context, "question": question, "source": source}
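
When the router skips retrieval, `context` is an empty string, yet the system message in `context_prompt` still tells the model to use the context. One possible refinement is to choose the messages based on whether any context exists. The sketch below uses plain `(role, text)` tuples rather than LangChain prompt objects; `build_messages` and the wording of the no-context system message are assumptions for illustration only.

```python
from typing import List, Tuple


def build_messages(question: str, context: str) -> List[Tuple[str, str]]:
    """Return (role, text) pairs, switching instructions when no context was retrieved."""
    if context.strip():
        return [
            ("system", "You are a clinical assistant AI. Use the context to answer medical questions clearly."),
            ("human", f"Context:\n{context}\n\nQuestion: {question}"),
        ]
    # Hypothetical wording for the LLM-only path; not taken from the notebook.
    return [
        ("system", "You are a helpful AI assistant. Answer from your own knowledge and say so if you are unsure."),
        ("human", f"Question: {question}"),
    ]


print(build_messages("Who is Owner of BMW?", "")[1][1])
print(build_messages("What is a PHC?", "A PHC is a primary health centre.")[1][1])
```

Keeping the two paths separate means the LLM-only answer is no longer framed as a clinical, context-grounded response.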

In [None]:
# ===================== FINAL RAG CHAIN WITH SOURCE METADATA =====================
def include_source_in_output(output, source):
    return {
        "answer": output.content,
        "source": source
    }

rag_chain = (
    RunnableLambda(adaptive_retriever)
    | RunnableMap({
        "output": context_prompt | llm,
        "source": lambda x: x["source"]  # propagate source flag
    })
    | RunnableLambda(lambda x: include_source_in_output(x["output"], x["source"]))
)
rag_chain

RunnableLambda(adaptive_retriever)
| {
    output: ChatPromptTemplate(input_variables=['context', 'question'], input_types={}, partial_variables={}, messages=[SystemMessagePromptTemplate(prompt=PromptTemplate(input_variables=[], input_types={}, partial_variables={}, template='You are a clinical assistant AI. Use the context to answer medical questions clearly.'), additional_kwargs={}), HumanMessagePromptTemplate(prompt=PromptTemplate(input_variables=['context', 'question'], input_types={}, partial_variables={}, template='Context:\n{context}\n\nQuestion: {question}'), additional_kwargs={})])
            | ChatGroq(client=<groq.resources.chat.completions.Completions object at 0x7f7a7462bb50>, async_client=<groq.resources.chat.completions.AsyncCompletions object at 0x7f7a6d1627d0>, model_name='llama-3.3-70b-versatile', model_kwargs={}, groq_api_key=SecretStr('**********')),
    source: RunnableLambda(...)
  }
| RunnableLambda(...)

In [None]:
# ===================== RUN THE ADAPTIVE RAG (DOC) =====================
question = "What proforma for Facility Survey for PHC on IPHS?"
response = rag_chain.invoke({"question": question})
print(f"Answer:\n{response['answer']}\n\nSource: {response['source']}")

Answer:
The proforma for Facility Survey for PHC on IPHS includes the following information:

1. **Identification**:
   - Name of the State
   - District
   - Tehsil/Taluk/Block
   - Location & Name of PHC
   - Is the PHC providing 24 hours and 7 days delivery facilities?
   - Date of Data Collection
   - Name and Signature of the Person Collecting Data

2. **Facility Details**:
   - Number of beds available
   - Bed Occupancy Rate in the last 12 months
   - Average daily OPD Attendance (Males and Females)
   - Treatment of specific cases (yes/No):
     - Is the primary management of wounds done at the PHC?
     - Is the primary management of fracture done at the PHC?

3. **Services**:
   - Population covered (in numbers)
   - Type of PHC (Type A or Type B)
   - Assured Services available (yes/No):
     - OPD Services
     - Emergency services (24 Hours)
     - Referral Services
     - In-patient Services

4. **Infrastructure and Behavioral Aspects**:
   - Details about store room, kit

In [None]:
# ===================== RUN THE ADAPTIVE RAG (LLM) =====================
question = "Who is Owner of BMW?"
response = rag_chain.invoke({"question": question})
print(f"Answer:\n{response['answer']}\n\nSource: {response['source']}")

Answer:
The owner of BMW is not an individual, but rather a complex corporate structure. BMW is a publicly-traded company, listed on the Frankfurt Stock Exchange. As such, it is owned by its shareholders. The largest shareholders of BMW include:

1. Stefan Quandt (29%): A German billionaire and member of the Quandt family, which has been involved with BMW for many years.
2. Susanne Klatten (21%): A German billionaire and member of the Quandt family.
3. Public float (around 50%): The remaining shares are held by institutional and individual investors, including pension funds, mutual funds, and private investors.

So, while there isn't a single "owner" of BMW, the Quandt family has a significant stake in the company and plays an important role in its governance and direction.

Source: llm_only
