<a href="https://colab.research.google.com/github/Nareshedagotti/RAG/blob/main/Adaptive_RAG.ipynb" target="_parent"><img src="https://colab.research.google.com/assets/colab-badge.svg" alt="Open In Colab"/></a>

In [None]:
%pip install langchain groq faiss-cpu sentence-transformers PyPDF2 python-docx langchain-community scikit-learn


Text extraction from PDF or DOCX files

In [None]:
from PyPDF2 import PdfReader
from docx import Document

def extract_text(file_paths):
    documents = []
    for file_path in file_paths:
        if file_path.endswith('.pdf'):
            with open(file_path, 'rb') as file:
                pdf_reader = PdfReader(file)
                text = "".join(page.extract_text() or "" for page in pdf_reader.pages)
                documents.append({"content": text, "source": file_path})
        elif file_path.endswith('.docx'):
            doc = Document(file_path)
            text = "\n".join(para.text for para in doc.paragraphs)
            documents.append({"content": text, "source": file_path})
    return documents

file_path = ["/content/Categories-of-Admission_1.pdf"]
text = extract_text(file_path)
print(text)

[{'content': '4. CATEGORIES OF ADMISSION  \n \n4.1 Categories of admission in Ph.D programme:  \n \na. Regular scholars with institute fellowship [HTRA] or with other national level f ellowship or \nwith project s upport  or without f ellowship .   \n \nb. Regular scholars with institute fellowship, other national level fellowships and p roject \ncategories will be considered equivalent to HTRA.  The terms and conditions and eligibility \ncriteria applicable is as same as HTRA for the above candidates.   \nScholars selected Under  HTRA list can move to N -HTRA and revert back to HTRA later.  \n  \nc. Regular scholars who meets HTRA eligibility c riteria but offered under project c ategory can \nbe kept under HTRA Waiting List and converted  to HTRA l ater subject to recommendation \nof DSC/DC and fulfillment of terms and conditions applicable to conversion .  This option is \nnot appl icable to other national level fellowship s cholars.  \n \nd. Regul ar scholars who were IIT Madras pr
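
The extracted text above carries trailing spaces, whitespace-only lines and stray spaces before punctuation. A minimal cleanup sketch follows; `normalize_whitespace` is a hypothetical helper that is not part of the notebook, and it does not attempt to rejoin words that the PDF extraction split (such as "f ellowship").

```python
import re

def normalize_whitespace(text):
    """Collapse runs of spaces/tabs, trim each line, and drop blank lines."""
    lines = []
    for line in text.splitlines():
        line = re.sub(r"[ \t]+", " ", line).strip()
        # PDF extraction often leaves a space before punctuation ("fellowship .")
        line = re.sub(r" ([.,;:])", r"\1", line)
        if line:
            lines.append(line)
    return "\n".join(lines)

sample = "4. CATEGORIES OF ADMISSION  \n \n4.1 Categories of admission in Ph.D programme:  \n"
cleaned = normalize_whitespace(sample)
print(cleaned)
```

Applying something like this to each `doc["content"]` before chunking may give cleaner chunks, though dropping blank lines also removes paragraph boundaries that the splitter could otherwise use.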

Splitting text into chunks

In [None]:
from langchain.text_splitter import RecursiveCharacterTextSplitter

def chunk_text(documents):
    text_splitter = RecursiveCharacterTextSplitter(chunk_size=1000, chunk_overlap=100)
    chunks = []
    for doc in documents:
        split_docs = text_splitter.create_documents(
            [doc["content"]],
            metadatas=[{"source": doc["source"]}]
        )
        chunks.extend(split_docs)
    return chunks

chunks = chunk_text(text)
for i, chunk in enumerate(chunks, 1):
    print(f"\n🔹 **Chunk {i}:**\n{chunk.page_content}\n{'-'*50}")


🔹 **Chunk 1:**
4. CATEGORIES OF ADMISSION  
 
4.1 Categories of admission in Ph.D programme:  
 
a. Regular scholars with institute fellowship [HTRA] or with other national level f ellowship or 
with project s upport  or without f ellowship .   
 
b. Regular scholars with institute fellowship, other national level fellowships and p roject 
categories will be considered equivalent to HTRA.  The terms and conditions and eligibility 
criteria applicable is as same as HTRA for the above candidates.   
Scholars selected Under  HTRA list can move to N -HTRA and revert back to HTRA later.  
  
c. Regular scholars who meets HTRA eligibility c riteria but offered under project c ategory can 
be kept under HTRA Waiting List and converted  to HTRA l ater subject to recommendation 
of DSC/DC and fulfillment of terms and conditions applicable to conversion .  This option is 
not appl icable to other national level fellowship s cholars.
--------------------------------------------------

🔹 **Chunk 
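
To illustrate what `chunk_size` and `chunk_overlap` mean, here is a simplified fixed-width sliding window in plain Python. This is an assumption-laden sketch, not how `RecursiveCharacterTextSplitter` works internally (that splitter prefers to break on separators such as paragraphs and lines); `sliding_window_chunks` is a hypothetical name.

```python
def sliding_window_chunks(text, chunk_size=1000, chunk_overlap=100):
    """Cut text into fixed-size pieces where each piece repeats the
    last chunk_overlap characters of the previous one."""
    if chunk_overlap >= chunk_size:
        raise ValueError("chunk_overlap must be smaller than chunk_size")
    step = chunk_size - chunk_overlap
    chunks = []
    for start in range(0, len(text), step):
        chunks.append(text[start:start + chunk_size])
        if start + chunk_size >= len(text):
            break  # the last piece already reaches the end of the text
    return chunks

demo = "abcdefghijklmnopqrstuvwxyz"
pieces = sliding_window_chunks(demo, chunk_size=10, chunk_overlap=3)
print(pieces)
```

Each piece starts with the last three characters of the one before it, which is the role the 100-character overlap plays in the cell above.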

Converting chunks to embeddings

In [None]:
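from langchain_community.embeddings import HuggingFaceEmbeddings

def generate_embeddings(chunks):
    # Load the embedding model only; the chunk vectors themselves are computed
    # later, when FAISS.from_documents is called with this model.
    embedding_model = HuggingFaceEmbeddings(model_name="sentence-transformers/all-mpnet-base-v2")
    return embedding_model, chunks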

embedding_model, chunks = generate_embeddings(chunks)

Storing embeddings in Vector DB

In [None]:
from langchain_community.vectorstores import FAISS

def store_in_db(chunks, embedding_model):
    vector_db = FAISS.from_documents(chunks, embedding_model)
    return vector_db

vector_db = store_in_db(chunks, embedding_model)

Converting query to embeddings

In [None]:
def query_to_embeddings(query, embedding_model):
    query_embedding = embedding_model.embed_query(query)
    return query_embedding

Retrieving top matches from the vector DB

In [None]:
# Assumes vector_db has already been built in the previous step
def retrieve_top_matches(vector_db, query, k=5):
    """Retrieve the top-k documents and their scores from the vector database."""
    # A single search returns (document, score) pairs, so there is no need to
    # query the index twice. The scores in the output increase with rank,
    # i.e. a lower score means a closer match.
    docs_and_scores = vector_db.similarity_search_with_score(query, k=k)
    top_matches = [doc for doc, _ in docs_and_scores]
    scores = [score for _, score in docs_and_scores]
    return top_matches, scores

# Run Step 1
query = "categories of admission in ph.D programme?"
top_matches, scores = retrieve_top_matches(vector_db, query, k=5)
print("Top Matches Retrieved:")
for i, match in enumerate(top_matches):
    print(f"Match {i+1}: {match.page_content[:100]}... (Score: {scores[i]})")

Top Matches Retrieved:
Match 1: 4. CATEGORIES OF ADMISSION  
 
4.1 Categories of admission in Ph.D programme:  
 
a. Regular scholar... (Score: 0.7639802098274231)
Match 2: requirement within the time limit.   
iii) Leave not required for attending the courses.  
 
m. Fore... (Score: 0.8147319555282593)
Match 3: two 
b. For candidates who have marked option (iii), they would be first selected for regular 
seats... (Score: 0.9749877452850342)
Match 4: have gained at least 1 year experience in a project can be selected through a written test and 
inte... (Score: 0.9884757995605469)
Match 5: at IIT Madras under the supervision of a guide at IIT Madras. The feasibility of doing this 
with su... (Score: 1.0166704654693604)
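
The scores above grow from Match 1 to Match 5, which is what a distance-based ranking looks like: the smallest value is the best match. The sketch below reproduces that ordering with a brute-force search over toy vectors. It assumes a squared L2 distance, which I believe is what the default FAISS index in LangChain reports; the helper names and the toy data are hypothetical.

```python
def squared_l2(a, b):
    """Squared Euclidean distance between two equal-length vectors."""
    return sum((x - y) ** 2 for x, y in zip(a, b))

def top_k_by_distance(query_vec, doc_vecs, k=2):
    """Return (index, distance) pairs for the k closest vectors, closest first."""
    scored = [(squared_l2(query_vec, vec), idx) for idx, vec in enumerate(doc_vecs)]
    scored.sort()  # ascending: smaller distance means more similar
    return [(idx, dist) for dist, idx in scored[:k]]

toy_docs = [[1.0, 0.0], [0.0, 1.0], [0.9, 0.1]]
toy_query = [1.0, 0.0]
results = top_k_by_distance(toy_query, toy_docs, k=2)
print(results)
```

A vector index does the same ranking far more efficiently, but the interpretation of the score is the same: do not read a larger number as a better match here.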


Initialising the LLM

In [None]:
from groq import Groq

def pass_to_llm(query, top_matches, api_key):
    """Generate response using Groq LLM."""
    context = "\n\n".join(match.page_content for match in top_matches)
    prompt = f"Context:\n{context}\n\nQuestion: {query}\nAnswer:"
    client = Groq(api_key=api_key)
    chat_completion = client.chat.completions.create(
        messages=[{"role": "user", "content": prompt}],
        model="gemma2-9b-it"
    )
    return chat_completion.choices[0].message.content, top_matches

# Run Step 2
api_key = "Your_API_Key"
llm_response, returned_matches = pass_to_llm(query, top_matches, api_key)
print("\nLLM Response:")
print(llm_response)


LLM Response:
The provided text outlines several categories of admission for a PhD program. Here's a breakdown:

**General PhD Admission Categories**

* **a. Regular Scholars:**
    * **HTRA (Hostel Term Regular Award):**  Funded by the institute and have specific terms and conditions. They can move to and from the N-HTRA category.
    * **Other National Level Fellowships:**  Scholarships from UGC, CSIR, INSPIRE, etc. Treated the same as HTRA regarding terms and conditions.
    * **Project Category:** Funded by a project, but can be placed on the HTRA waiting list with potential conversion to HTRA later. 
* **b.  Direct PhD Admission:** Applicable to specific departments or programs, like the M.Tech-Ph.D Dual Degree in Engineering Design. Selection criteria are set by the departmental committee.
* **c. Foreign Nationals:**  Must meet specific eligibility requirements detailed elsewhere in the document.

* **d. JR Fellows:** Candidates with Junior Research Fellowships (JRF) from fundin
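
Two small refinements may be worth considering around the prompt step: capping how much context goes into the prompt, and reading the API key from the environment instead of pasting it into the notebook. The sketch below uses the same prompt template as `pass_to_llm`; `build_prompt`, its `max_chars` parameter and the `GROQ_API_KEY` variable name are assumptions for illustration.

```python
import os

def build_prompt(query, contexts, max_chars=6000):
    """Join retrieved chunks into one context block, stopping before
    the combined chunk length would exceed max_chars."""
    selected, used = [], 0
    for text in contexts:
        if used + len(text) > max_chars:
            break
        selected.append(text)
        used += len(text)
    context = "\n\n".join(selected)
    return f"Context:\n{context}\n\nQuestion: {query}\nAnswer:"

# Assumed variable name; an empty string means the key was not set.
api_key_from_env = os.environ.get("GROQ_API_KEY", "")

prompt = build_prompt("What is HTRA?", ["chunk one", "chunk two"], max_chars=12)
print(prompt)
```

With `max_chars=12` only the first toy chunk fits, so the second is left out of the prompt.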

Evaluating the LLM response

In [None]:
from sklearn.metrics.pairwise import cosine_similarity
import numpy as np

def evaluate_response(response, query, embedding_model, min_length=50, relevance_threshold=0.6):
    """Evaluate response quality using semantic relevance and length."""
    # Embed query and response
    query_embedding = embedding_model.embed_query(query)
    response_embedding = embedding_model.embed_query(response)  # Treat response as a "query" for embedding

    # Compute cosine similarity between query and response
    similarity = cosine_similarity(
        np.array(query_embedding).reshape(1, -1),
        np.array(response_embedding).reshape(1, -1)
    )[0][0]

    # Evaluate based on relevance and length
    if similarity < relevance_threshold:
        return "irrelevant"  # Response doesn’t match query semantically
    elif len(response) < min_length:
        return "insufficient"  # Response too short
    return "good"  # Response is relevant and long enough

feedback = evaluate_response(llm_response, query, embedding_model)
print("\nFeedback:", feedback)
print("Response Length:", len(llm_response))


Feedback: good
Response Length: 1995
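
The evaluation rule can be looked at in isolation from the embedding model. The sketch below computes cosine similarity in plain Python and applies the same two checks in the same order as `evaluate_response` (relevance first, then length); `cosine` and `classify` are hypothetical helper names and the inputs are toy values.

```python
import math

def cosine(a, b):
    """Cosine similarity of two equal-length vectors (0.0 if either is all zeros)."""
    dot = sum(x * y for x, y in zip(a, b))
    norm = math.sqrt(sum(x * x for x in a)) * math.sqrt(sum(y * y for y in b))
    return dot / norm if norm else 0.0

def classify(similarity, response_length, min_length=50, relevance_threshold=0.6):
    """Apply the relevance check before the length check."""
    if similarity < relevance_threshold:
        return "irrelevant"
    if response_length < min_length:
        return "insufficient"
    return "good"

print(cosine([1.0, 0.0], [1.0, 0.0]))  # vectors pointing the same way
print(classify(0.9, 1995))
print(classify(0.9, 10))
print(classify(0.2, 1995))
```

Because relevance is tested first, a short and off-topic answer is reported as "irrelevant", never as "insufficient". Note that `len(response)` counts characters, so `min_length=50` is a very low bar.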


Adaptive RAG Loop

In [None]:
def adapt_retrieval(vector_db, query, embedding_model, top_matches, feedback, k=5):
    """Adapt the retrieval strategy based on feedback and re-run the search."""
    # embedding_model and top_matches are not used here; they stay in the
    # signature so the call in run_adaptive_rag does not need to change.
    if feedback == "irrelevant":
        k = k + 2  # Retrieve more chunks if the response was irrelevant
        print(f"Adapting: Increasing k to {k}")
    elif feedback == "insufficient":
        query = f"{query} provide more details"  # Ask for more detail in the search query
        print(f"Adapting: Refining query to '{query}'")
    else:
        # If feedback is "good", search again with the unchanged query and k
        print("Adapting: No changes needed, feedback is 'good'.")

    # similarity_search_with_score returns a list of (document, score) pairs,
    # so split it pair by pair instead of unpacking it into two variables.
    docs_and_scores = vector_db.similarity_search_with_score(query, k=k)
    top_matches = [doc for doc, _ in docs_and_scores]
    scores = [score for _, score in docs_and_scores]
    return top_matches, scores

# Print adapted top matches
print("\nAdapted Top Matches:")
for i, match in enumerate(top_matches):
    print(f"Match {i+1}: {match.page_content[:100]}... (Score: {scores[i]})")


Adapted Top Matches:
Match 1: 4. CATEGORIES OF ADMISSION  
 
4.1 Categories of admission in Ph.D programme:  
 
a. Regular scholar... (Score: 0.7639802098274231)
Match 2: requirement within the time limit.   
iii) Leave not required for attending the courses.  
 
m. Fore... (Score: 0.8147319555282593)
Match 3: two 
b. For candidates who have marked option (iii), they would be first selected for regular 
seats... (Score: 0.9749877452850342)
Match 4: have gained at least 1 year experience in a project can be selected through a written test and 
inte... (Score: 0.9884757995605469)
Match 5: at IIT Madras under the supervision of a guide at IIT Madras. The feasibility of doing this 
with su... (Score: 1.0166704654693604)
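
`adapt_retrieval` always starts from its default `k=5`, so repeated "irrelevant" feedback keeps asking for 7 chunks. One possible variation is to separate the decision from the search and carry `k` forward between rounds. This is a sketch under that assumption; `next_retrieval_params`, `k_step` and `k_max` are hypothetical names.

```python
def next_retrieval_params(query, k, feedback, k_step=2, k_max=15):
    """Return the (query, k) to use for the next retrieval round."""
    if feedback == "irrelevant":
        return query, min(k + k_step, k_max)  # widen the search, up to a cap
    if feedback == "insufficient":
        return f"{query} provide more details", k  # keep k, refine the query
    return query, k  # "good": nothing to change

q, k = "categories of admission in ph.D programme?", 5
history = []
for fb in ["irrelevant", "irrelevant", "insufficient"]:
    q, k = next_retrieval_params(q, k, fb)
    history.append((fb, q, k))
    print(fb, "->", repr(q), k)
```

The caller then passes the returned `q` and `k` to the search. Repeated "insufficient" feedback would append the suffix more than once, so a real loop may want to refine from the original query instead.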


Adaptive RAG final response


In [None]:
def run_adaptive_rag(vector_db, query, embedding_model, api_key, max_iterations=3):
    """Run the full adaptive RAG loop."""
    top_matches, scores = retrieve_top_matches(vector_db, query)
    print("Initial Top Matches:")
    for i, match in enumerate(top_matches):
        print(f"Match {i+1}: {match.page_content[:100]}... (Score: {scores[i]})")

    iteration = 0
    while iteration < max_iterations:
        llm_response, top_matches = pass_to_llm(query, top_matches, api_key)
        print(f"\nIteration {iteration} Response:\n{llm_response}")

        feedback = evaluate_response(llm_response, query, embedding_model)
        print("Feedback:", feedback)

        if feedback == "good" or iteration == max_iterations - 1:
            return llm_response

        top_matches, scores = adapt_retrieval(vector_db, query, embedding_model, top_matches, feedback)
        print("Adapted Top Matches:")
        for i, match in enumerate(top_matches):
            print(f"Match {i+1}: {match.page_content[:100]}... (Score: {scores[i]})")
        iteration += 1

    return llm_response

# Run the full loop on the example query
final_response = run_adaptive_rag(vector_db, query, embedding_model, api_key)
print("\nFinal Response:\n", final_response)

Initial Top Matches:
Match 1: 4. CATEGORIES OF ADMISSION  
 
4.1 Categories of admission in Ph.D programme:  
 
a. Regular scholar... (Score: 0.7639802098274231)
Match 2: requirement within the time limit.   
iii) Leave not required for attending the courses.  
 
m. Fore... (Score: 0.8147319555282593)
Match 3: two 
b. For candidates who have marked option (iii), they would be first selected for regular 
seats... (Score: 0.9749877452850342)
Match 4: have gained at least 1 year experience in a project can be selected through a written test and 
inte... (Score: 0.9884757995605469)
Match 5: at IIT Madras under the supervision of a guide at IIT Madras. The feasibility of doing this 
with su... (Score: 1.0166704654693604)

Iteration 0 Response:
The Ph.D. program admissions are categorized in the following ways: 

**1. Regular Scholars:**

*  **a. With Institute Fellowship (HTRA):**
    *  HTRA scholars are considered the primary category.
    *  They can move to N-HTRA and revert back to HTR
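
The control flow of the loop can be exercised without an API key or a vector store by passing the four steps in as functions. The sketch below is a generic version of that idea, not the notebook's `run_adaptive_rag`; `adaptive_loop` and all of the `fake_*` stubs are hypothetical.

```python
def adaptive_loop(query, retrieve, generate, evaluate, adapt, k=5, max_iterations=3):
    """Retrieve, generate and evaluate until the feedback is "good"
    or max_iterations rounds have been used."""
    current_query, response, used = query, None, 0
    for _ in range(max_iterations):
        used += 1
        matches = retrieve(current_query, k)
        response = generate(query, matches)  # always answer the original question
        feedback = evaluate(response, query)
        if feedback == "good":
            break
        current_query, k = adapt(current_query, k, feedback)
    return response, used

# Stub components standing in for the vector DB, the LLM and the evaluator.
corpus = ["alpha", "beta", "gamma", "delta", "epsilon", "zeta", "eta"]

def fake_retrieve(q, k):
    return corpus[:k]

def fake_generate(q, matches):
    return " ".join(matches)

def fake_evaluate(response, q):
    return "good" if "eta" in response.split() else "irrelevant"

def fake_adapt(q, k, feedback):
    return q, k + 2

answer, rounds = adaptive_loop("demo", fake_retrieve, fake_generate,
                               fake_evaluate, fake_adapt, k=3)
print(rounds, answer)
```

With these stubs the loop widens `k` from 3 to 5 to 7 and stops in the third round, once the needed item is finally retrieved.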

In [None]:
import numpy as np

# Define rewards grid (4x4 grid)
rewards = np.array([
    [-1, -1, -1, 10],   # Row 0 (Treasure at column 3)
    [-1, -1, 10, -1],   # Row 1
    [-1, 10, -1, -1],   # Row 2
    [10, -1, -1, -1]    # Row 3
])

def choose_action(state):
    return np.argmax(rewards[state])  # Choose action with highest reward

# Test the agent
for state in range(4):
    action = choose_action(state)
    print(f"State {state}: Best action → Column {action}")