### In this section, I will build an Agentic RAG

I now have a Rerank RAG that can retrieve relevant medicine documents for a query.

But why do I need an **Agentic RAG**?

Because the Rerank RAG can only run a similarity search with the query exactly as the user typed it. If the query contains no information that links it to the documents in the vector database, it cannot retrieve the relevant documents.

Below are some real-life scenarios we may run into:

### Problem description:

In a real conversation, users can ask anything, and we cannot predict it ahead of time.

For example:
In the third turn, the user really wants to ask 'How do I take Phenylephrine?'

But he types 'How do I take it?'. From the context, 'it' means 'Phenylephrine'.

If we retrieve documents with the query 'How do I take it?', we may get irrelevant documents. 'How do I take Phenylephrine?' makes much more sense as a retrieval query.
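To make the failure concrete, here is a toy sketch that scores both queries against a made-up Phenylephrine snippet with bag-of-words cosine similarity. It is only a stand-in under obvious assumptions: the real Rerank RAG uses dense embeddings plus a cross-encoder, so the actual numbers will differ, but the pronoun 'it' carries no retrieval signal either way.

```python
import math
import re
from collections import Counter


def bow(text: str) -> Counter:
    """Lower-cased bag of words (a crude stand-in for an embedding model)."""
    return Counter(re.findall(r"[a-z]+", text.lower()))


def cosine(a: Counter, b: Counter) -> float:
    """Cosine similarity between two word-count vectors."""
    dot = sum(a[w] * b[w] for w in a)
    norm = math.sqrt(sum(v * v for v in a.values())) * math.sqrt(sum(v * v for v in b.values()))
    return dot / norm if norm else 0.0


# Made-up document text, for illustration only
doc = "Phenylephrine is a nasal decongestant. Take phenylephrine by mouth every 4 hours."
ambiguous = "How do I take it?"
rewritten = "How do I take Phenylephrine?"

print("ambiguous:", round(cosine(bow(ambiguous), bow(doc)), 3))
print("rewritten:", round(cosine(bow(rewritten), bow(doc)), 3))
```

Only the word "take" overlaps with the ambiguous query, while the rewritten query also matches the drug name, which is exactly what a query rewriter should restore.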

Other scenarios:

1. In the first turn, a user just greets us without asking any question.
2. A user asks an unrelated question in the middle of the conversation.
3. .........

### Analysis:

The root problem is how to determine whether a query is a clinical/medical query, and whether it is related to the previous conversation.

### Solution:

#### To handle all of these, I will use a local LLM as a master agent that decides what to do next in each situation.
#### So I will combine a basic RAG, LangGraph, memory, a local LLM, a Wikipedia search tool... so that the RAG can retrieve truly relevant documents by itself.

### Implementation:

* I will use an agent to decide what to do next based on the query and the conversation history.
* Then the agent executes the task, observes the result, and decides again..... until it gets a proper result.
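The decide, act, observe cycle can be sketched in plain Python before bringing in LangGraph. Everything here is hypothetical: `make_agent`, the `decide` rule, and the two toy tools stand in for the LLM grader, the query rewriter, and the retriever. The point is the loop shape and the hard step limit, which plays the same role as the `rewrite_counter` and `wiki_used` guards used later in the graph.

```python
import re
from typing import Callable, Dict, Tuple

# A "tool" takes the current query and returns (new_query_or_answer, done)
Tool = Callable[[str], Tuple[str, bool]]


def make_agent(tools: Dict[str, Tool], decide: Callable[[str], str],
               max_steps: int = 5) -> Callable[[str], str]:
    """Build a decide -> act -> observe loop with a hard step limit."""
    def run(query: str) -> str:
        for _ in range(max_steps):              # the cap guarantees termination
            action = decide(query)              # 1) decide what to do next
            query, done = tools[action](query)  # 2) act, 3) observe the result
            if done:
                return query
        return "sorry"                          # no proper result within max_steps
    return run


# Hypothetical stand-ins for the LLM grader, the rewriter and the retriever
def decide(query: str) -> str:
    return "rewrite" if re.search(r"\bit\b", query) else "retrieve"


tools: Dict[str, Tool] = {
    "rewrite": lambda q: (re.sub(r"\bit\b", "Phenylephrine", q), False),
    "retrieve": lambda q: (f"docs for: {q}", True),
}

agent = make_agent(tools, decide)
answer = agent("How do I take it?")
print(answer)
```

The first pass rewrites the pronoun, the second pass retrieves; an agent whose `decide` never chooses a terminating tool falls out of the loop with "sorry" instead of spinning forever.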

In [None]:
# My own libraries
from mytools import best_dtype, best_device, login_huggingface
from rerank_rag import Rerank_RAG

import os
import json
import re
import copy
import time
import torch
import uuid
import random
import settings

from typing import TypedDict, List, Literal, Any, Dict
from langgraph.graph import StateGraph, START, END
from langchain_core.documents import Document
from sentence_transformers import CrossEncoder
from langchain_core.chat_history import BaseChatMessageHistory
from langchain_community.chat_message_histories import ChatMessageHistory # Short-term Memory
from langchain_core.messages import BaseMessage
from transformers import AutoTokenizer, AutoModelForCausalLM, pipeline
from langchain_huggingface import ChatHuggingFace, HuggingFacePipeline
from langchain.prompts import PromptTemplate
from langchain import hub
from langchain_core.output_parsers import JsonOutputParser, StrOutputParser
from langchain.output_parsers import OutputFixingParser
from langchain_community.tools import WikipediaQueryRun
from langchain_community.utilities import WikipediaAPIWrapper

#### As I mentioned previously:

Bio-Medical-Llama-3-8B model is a specialized large language model designed for biomedical applications. It is fine-tuned from the meta-llama/Meta-Llama-3-8B-Instruct model using a custom dataset containing over 500,000 diverse entries. These entries include a mix of synthetic and manually curated data, ensuring high quality and broad coverage of biomedical topics.

The model is trained to understand and generate text related to various biomedical fields, making it a valuable tool for researchers, clinicians, and other professionals in the biomedical domain.

@misc{ContactDoctor_Bio-Medical-Llama-3-8B, author = {ContactDoctor}, title = {ContactDoctor-Bio-Medical: A High-Performance Biomedical Language Model}, year = {2024}, howpublished = {https://huggingface.co/ContactDoctor/Bio-Medical-Llama-3-8B}, }

In [None]:
model_id = "ContactDoctor/Bio-Medical-Llama-3-8B"

cross_encoder_model_id = "ncbi/MedCPT-Cross-Encoder" 

In [3]:
login_huggingface() 

Login HuggingFace!


In [4]:
# Load a Hugging Face model and run inference on the local GPU.

tokenizer = AutoTokenizer.from_pretrained(model_id)

model = AutoModelForCausalLM.from_pretrained(
    model_id,
    dtype = best_dtype(),
    device_map={"":best_device()},     
    low_cpu_mem_usage=True     
)
print("Load tokenizer and base model done!")


Load tokenizer and base model done!


In [5]:
print(model)                    # full architecture tree (long but useful)
print(model.config)             # core hyperparameters (dims, layers, heads…)

LlamaForCausalLM(
  (model): LlamaModel(
    (embed_tokens): Embedding(128256, 4096)
    (layers): ModuleList(
      (0-31): 32 x LlamaDecoderLayer(
        (self_attn): LlamaAttention(
          (q_proj): Linear(in_features=4096, out_features=4096, bias=False)
          (k_proj): Linear(in_features=4096, out_features=1024, bias=False)
          (v_proj): Linear(in_features=4096, out_features=1024, bias=False)
          (o_proj): Linear(in_features=4096, out_features=4096, bias=False)
        )
        (mlp): LlamaMLP(
          (gate_proj): Linear(in_features=4096, out_features=14336, bias=False)
          (up_proj): Linear(in_features=4096, out_features=14336, bias=False)
          (down_proj): Linear(in_features=14336, out_features=4096, bias=False)
          (act_fn): SiLU()
        )
        (input_layernorm): LlamaRMSNorm((4096,), eps=1e-05)
        (post_attention_layernorm): LlamaRMSNorm((4096,), eps=1e-05)
      )
    )
    (norm): LlamaRMSNorm((4096,), eps=1e-05)
    (rotary_

In [6]:
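original_pipeline = pipeline(
    "text-generation",
    model=model,
    tokenizer=tokenizer,
    do_sample=True,          # temperature is only applied when sampling is enabled
    temperature=0.1,         # low temperature keeps the grader outputs near-deterministic
    return_full_text=False,  # return only the generated text, not the prompt
)

# Wrap the raw transformers pipeline so that LangChain can use it
hug_pipeline = HuggingFacePipeline(pipeline=original_pipeline)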

master_llm = ChatHuggingFace(llm=hug_pipeline) # It is the brain of the whole system

Device set to use cuda
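Why a temperature of 0.1 makes the graders close to deterministic: generation divides the logits by the temperature before the softmax, so a small temperature sharpens the distribution toward the top token. The sketch below uses made-up logits for three candidate tokens. Note that in `transformers` the temperature only takes effect when sampling is enabled (`do_sample=True`); with greedy decoding the top token is always chosen.

```python
import math
from typing import List


def softmax_with_temperature(logits: List[float], temperature: float) -> List[float]:
    """Divide the logits by T, then apply a numerically stable softmax."""
    scaled = [x / temperature for x in logits]
    m = max(scaled)                        # subtract the max to avoid overflow
    exps = [math.exp(x - m) for x in scaled]
    total = sum(exps)
    return [e / total for e in exps]


logits = [2.0, 1.0, 0.5]                   # made-up next-token logits
for t in (1.0, 0.1):
    probs = softmax_with_temperature(logits, t)
    print(f"T={t}: " + ", ".join(f"{p:.4f}" for p in probs))
```

At T=1.0 the top token gets roughly 63% of the probability mass; at T=0.1 it gets almost all of it, so a "yes"/"no" grader rarely flips between runs.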


In [None]:
cross_encoder = CrossEncoder(cross_encoder_model_id)

### The master LLM is ready. Next, the RAG....

In [None]:
rag = Rerank_RAG()
rag.build_medicine_retriever()
wiki = WikipediaQueryRun(api_wrapper=WikipediaAPIWrapper())

In [7]:
# Define a short-term memory

class Short_Term_Memory():
    def __init__(self) -> None: 
        """Initialize the message container and current session id """       
        self.session_store: dict[int,BaseChatMessageHistory] = {}
        self.current_session_id: int = 0

    def get_history(self, session_id: int) -> BaseChatMessageHistory:    
        """return history messages by sessionId"""    
        self.current_session_id = session_id
        if session_id not in self.session_store:
            self.session_store[session_id] = ChatMessageHistory()
        return self.session_store[session_id]
    
    def get_current_history(self) -> BaseChatMessageHistory:
        """return history messages for current session"""
        return self.get_history(self.current_session_id)
    
    def add_message(self, session_id: int, message: BaseMessage) -> None:
        """append a message, keeping only the 5 most recent messages"""
        history_messages = self.get_history(session_id)
        if len(history_messages.messages) >= 5: # Only keep the recent 5 messages
            del history_messages.messages[0] # Remove the oldest message
        history_messages.add_message(message) # Always append the new message
    
    def delete_history(self, session_id: int) -> bool:
        """delete history messages by sessionId; return False if the session does not exist"""
        if session_id in self.session_store:
            del self.session_store[session_id]
            return True
        return False
    
    def delete_current_history(self) -> bool:
        """delete history messages for current session"""
        return self.delete_history(self.current_session_id)
    
# Convert a history chat message to a string
def history_as_text(history: BaseChatMessageHistory) -> str:
    """convert history messages into a string"""
    return "\n".join([
        f"{m.type.upper()}: {m.content}"   # e.g. "HUMAN: …" or "AI: …"
        for m in history.messages])
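The 5-message cap in `Short_Term_Memory` is a sliding window. As a hypothetical alternative (not the class used in this notebook), the standard library's `collections.deque` with `maxlen` enforces the same window automatically. This sketch stores plain `(role, text)` tuples instead of LangChain messages.

```python
from collections import deque
from typing import Deque, Dict, Tuple


class WindowedMemory:
    """Hypothetical per-session memory that keeps only the last N (role, text) pairs."""

    def __init__(self, max_messages: int = 5) -> None:
        self.max_messages = max_messages
        self.sessions: Dict[int, Deque[Tuple[str, str]]] = {}

    def add(self, session_id: int, role: str, text: str) -> None:
        if session_id not in self.sessions:
            # deque(maxlen=N) silently drops the oldest item once it holds N items
            self.sessions[session_id] = deque(maxlen=self.max_messages)
        self.sessions[session_id].append((role, text))

    def as_text(self, session_id: int) -> str:
        """Same 'ROLE: text' layout that history_as_text produces."""
        return "\n".join(f"{role.upper()}: {text}"
                         for role, text in self.sessions.get(session_id, ()))


memory = WindowedMemory(max_messages=5)
for i in range(7):
    memory.add(1, "human" if i % 2 == 0 else "ai", f"message {i}")
print(memory.as_text(1))
```

Because the deque evicts the oldest entry on every append past the limit, there is no separate delete step to forget when saving a new turn.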

In [None]:
class AgentState(TypedDict):
    """
    Represents the state of the graph.

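    Attributes:
        session_id: current session id
        query: user's query or rewritten query
        retrieved_doc: retrieved documents (or an apology message)
        grade: binary score {"score": "yes"/"no"} of the last grader node, used by the routers to decide
        wiki_used: flag whether Wikipedia search has already been used
        rewrite_counter: how many times the query has been rewritten
    """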
    session_id: int
    query: str
    retrieved_doc: str
    grade: dict
    wiki_used: bool      # Avoid an infinite loop in the graph
    rewrite_counter: int # Avoid an infinite loop in the graph

In [9]:
# Initialize a global short-term memory for all users
settings.SHORT_TERM_MEMORY = Short_Term_Memory()

In [None]:
# Unit test function

def unit_test(action_func, decide_function):
    questions = [
        "Why doesn't my friend play tennis with me?",
        "What is the tallest mountain in South America?",
        "How does a solar eclipse occur?",
        "Can you explain how blockchain technology works?",
        "What are the main differences between classical and jazz music?",
        "If you could visit any planet in our solar system, which would you choose and why?",
        "My nasal is disconfort. Do you have a medicine to relieve sinus congestion and pressure?",
        "What are the common side effects of taking ibuprofen daily?",
        "Which symptoms usually appear first in a case of seasonal influenza?",
        "Is it safe to take antihistamines and decongestants at the same time?",
        "What are the warning signs of a severe allergic reaction?",
        "How can I tell the difference between a tension headache and a migraine?"
    ]

    session_id = 1

    for query in questions:
        state = AgentState(session_id=session_id, query=query, wiki_used=False, rewrite_counter=0)
        state = action_func(state)
        result = decide_function(state)
        print(f"Decision: {result} \n")
        print(f"State: {state} \n")

#### Small local LLMs usually do not produce reliable structured output, so I have to prepare for every kind of result they might return.

In [None]:
def _normalize_score(text: str) -> str:
    """Force 'yes'/'no' from messy text."""
    t = text.strip().lower()
    if t in {"yes", "y", "true","ok", "1"}:
        return "yes"
    if t in {"no", "n", "false", "0"}:
        return "no"
    # heuristics: ambiguous/under-specified → "no"
    return "no"


def _extract_json_like(s: str) -> Dict[str, Any]:
    """
    Try hard to find {"score": "..."} inside messy output.
    """
    # 1) quick regex for a minimal JSON object with score
    m = re.search(r'\{[^{}]*"score"\s*:\s*"(?P<score>yes|no|true|false)"[^{}]*\}', s, flags=re.I)
    if m:
        return {"score": _normalize_score(m.group("score"))}

    # 2) fall back: if the model just said "yes"/"no" without JSON
    yn = re.search(r'\b(yes|no|true|false)\b', s, flags=re.I)
    if yn:
        return {"score": _normalize_score(yn.group(1))}

    # 3) last resort default
    return {"score": "no"}

def robust_binary_grader(prompt: PromptTemplate, query: str, document: str = "") ->dict:
    """ 
    Robustly parse the local LLM's binary grade into {"score": "yes"} or {"score": "no"}.
    """
    # Base parser: parse the model output as JSON
    base_parser = JsonOutputParser()
    # Auto-fixing parser: if the model outputs invalid JSON, it asks the LLM to repair it
    fixing_parser = OutputFixingParser.from_llm(parser=base_parser, llm=master_llm)
    chain = prompt | master_llm | fixing_parser
    result = None
    try:
        # First attempt: LLM → (auto-fixing) parser
        if document == "":
            result = chain.invoke({"question": query})  
        else:
            result = chain.invoke({"question": query, "document": document})        
        # result may already be a dict (from parser), but be defensive:
        if isinstance(result, dict) and "score" in result:
            score = _normalize_score(str(result["score"]))
            return {"score": score}  # exact contract            

        # If parser returned a string (some models), try to json-load or extract
        if isinstance(result, str):
            try:
                json_obj = _extract_json_like(result)
                score = _normalize_score(str(json_obj.get("score", "")))
                return {"score": score}                
            except Exception:
                pass
        # Fall through to the worst baseline
        if any(r in str(result).lower() for r in ["yes", "true"]):
            return {"score": "yes"}
        else:
            return {"score": "no"}       

    except Exception as e:
        # Hard fallback path if LLM/parse fails entirely
        print(f"[robust_binary_grader] Warning: parse failed: {e} \n") 

        return {"score": "no"}
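The same fallback cascade can be written with the standard library alone, which makes it easy to test without loading the model. This is a hypothetical variant (`parse_binary_score` is not part of the notebook): it tries a strict `json.loads` first, then a `"score"` field buried in extra text, then a bare yes/no word, and defaults to "no". Unlike `robust_binary_grader`, it never asks the LLM to repair invalid JSON.

```python
import json
import re

# Same "yes" vocabulary as _normalize_score above
YES = {"yes", "y", "true", "ok", "1"}


def parse_binary_score(raw: str) -> str:
    """Return 'yes' or 'no' from messy LLM output, defaulting to 'no'."""
    text = raw.strip()
    # 1) Happy path: the whole output is a valid JSON object
    try:
        obj = json.loads(text)
        if isinstance(obj, dict) and "score" in obj:
            return "yes" if str(obj["score"]).strip().lower() in YES else "no"
    except json.JSONDecodeError:
        pass
    # 2) A "score" field buried in surrounding chatter
    m = re.search(r'"score"\s*:\s*"?(\w+)"?', text, flags=re.I)
    if m:
        return "yes" if m.group(1).lower() in YES else "no"
    # 3) A bare yes/no/true/false word anywhere in the text
    m = re.search(r"\b(yes|no|true|false)\b", text, flags=re.I)
    if m:
        return "yes" if m.group(1).lower() in YES else "no"
    # 4) Anything else is treated as a negative
    return "no"


for raw in ['{"score": "Yes"}', 'Sure! Here it is: {"score":"no"}', "I think the answer is yes.", ""]:
    print(repr(raw), "->", parse_binary_score(raw))
```

Defaulting to "no" is the conservative choice here: a mis-parsed grade sends the query down the apology path rather than retrieving on a bad query.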

#### Local LLMs also wrap their answers in extra text, quotes, or labels, so I clean up whatever they return into a single question.

In [None]:
def _clean_one_line_question(text: str, fallback: str, max_len: int = 100) -> str:
    """
    Make whatever the LLM returned into a clean single-line question.
    - strip code fences, quotes, labels
    - collapse whitespace
    - take the first question-looking sentence if multiple
    - ensure it ends with '?'
    - length-limit (soft)
    """
    if not isinstance(text, str):
        text = str(text or "")

    t = text.strip()

    # remove common code fences or labels
    t = re.sub(r"^`{3,}.*?\n|\n`{3,}$", "", t, flags=re.S)        # ```...```
    t = re.sub(r"^(re.?written|improved|final|answer)\s*:\s*", "", t, flags=re.I)
    t = re.sub(r"^\"|\"$", "", t)  # trim surrounding quotes
    t = re.sub(r"^'+|'+$", "", t)  # trim surrounding single quotes

    # collapse to one line
    t = " ".join(t.split())

    # If LLM returned multiple sentences, try to pick the first question-like sentence.
    # Prefer the first chunk that ends with '?'
    m = re.search(r"([^?]{3,}\?)", t)
    if m:
        t = m.group(1).strip()

    # If still no question mark, try to cut at a sentence boundary and add '?'
    if "?" not in t:
        # take up to first period/exclamation if present, else keep entire
        m2 = re.split(r"[.!]", t, maxsplit=1)
        candidate = m2[0].strip()
        # guard against empty
        if len(candidate) >= 3:
            t = candidate
        if not t.endswith("?"):
            t = t.rstrip("?") + "?"

    # truncate softly (avoid cutting mid-word)
    if len(t) > max_len:
        t = t[:max_len].rsplit(" ", 1)[0].rstrip("?,.;:! ") + "?"

    # last resort fallback
    if len(t) < 3:
        t = fallback.strip()
        if not t.endswith("?"):
            t += "?"

    return t


def robust_question_generater(prompt: PromptTemplate, query: str, document: str = "") -> str:
    """ 
    Robustly extract a single clean question from the local LLM's output.
    """
    chain = prompt | master_llm | StrOutputParser()
    try:
        raw = chain.invoke({"question": query, "document": document})
        result = _clean_one_line_question(raw, fallback=query, max_len=50)
        return result
    except Exception:
        # hard fallback: if model call fails, return original as a question
        return query if query.endswith("?") else (query + "?")

In [None]:
def wiki_to_json(s: str):
    """ 
    Convert the WikipediaQueryRun output ("Page: ... Summary: ..." blocks separated by blank lines) into a list of dicts.
    """
    records = [r.strip() for r in s.strip().split("\n\n") if r.strip()]

    data = []
    for record in records:
        page_match = re.search(r"Page:\s*(.+)", record)
        summary_match = re.search(r"Summary:\s*(.+)", record, re.DOTALL)
        if page_match and summary_match:
            data.append({
                "Page": page_match.group(1).strip(),
                "Summary": summary_match.group(1).strip()
            })

    return data

In [None]:
#Action Node
def grade_selfcontained_query(state: AgentState) -> AgentState:
    """
    Determine whether a query is meaningful, clear, and self-contained
    without relying on prior conversation context.    
    """
    query = state["query"]

    print(f"===Step {settings.STEP}. Got a new query: {query}===\n")
    print("I will check if the query is self-contained.\n")  
    
    prompt = PromptTemplate(
        template="""You are a grader for a question. \n 
        You need to determine whether a question is meaningful, clear, and self-contained, with no ambiguity, assuming you do not know the conversation context. \n
        Here is the user's question: {question} \n
        Give a binary score 'yes' or 'no' to indicate whether the question is meaningful and self-contained. \n
        Only provide the binary score as a JSON object with a single key 'score', for example {{"score": "yes"}} or {{"score": "no"}}. No preamble or explanation.""",
        input_variables=["question"],
    )

    state["grade"] = robust_binary_grader(prompt=prompt, query=query)
    settings.STEP += 1
    return state

In [None]:
#Decision Node
def decide_selfcontained_query(state: AgentState) -> str:
    """ 
    If it's a self-contained query, go to grader node for clinical checking.
    If it's not a self-contained query, go to grader node for history related checking. 
    """
    if state['grade']["score"] == "yes":
        print(f"The query is a self-contained one. We don't need to augment it. Let's check if it is a clinical query.\n")
        return "grade_clinical"
    else:
        print("The query is not self-contained. Let's check if it is related to the conversation history. \n")
        return "grade_related_history"

In [None]:
unit_test(grade_selfcontained_query, decide_selfcontained_query)

In [None]:
#Action Node
def grade_clinical_query(state: AgentState) -> AgentState:
    """
    Determine whether a query is about medicine, clinical questions
    without relying on prior conversation context.    
    """
    query = state["query"]

    print(f"===Step {settings.STEP}. The query is: {query}===\n")
    print("I will check if the query is about medicine or clinical questions.\n")   
    
    prompt = PromptTemplate(
        template="""You are a grader for a question.
        You need to determine if the user's question is a clinical/medical question.
        Consider clinical if it asks about diagnosis, symptoms, treatment, medications (dose, interactions, side effects), test/lab interpretation, procedures, triage ("should I see a doctor/ER?"), risks/prognosis, or health advice for humans or animals.
        Non-clinical includes general health trivia/news, biology concepts without personal care decisions, admin/insurance/scheduling, or unrelated topics.
        Here is the user's question: {question} \n
        Give a binary score 'yes' or 'no' to indicate whether it is a clinical question.
        Only provide the binary score as a JSON with a single key 'score', for example {{"score": "yes"}} or {{"score": "no"}}.
        No preamble or explanation.""",
        input_variables=["question"],
    )    

    state["grade"] = robust_binary_grader(prompt=prompt, query=query)
    settings.STEP += 1
    return state

In [None]:
#Decision Node
def decide_clinical_query(state: AgentState) -> str:
    """ 
    If it is a clinical query and self-contained, go to retrieve node directly.
    If it is not a clinical query at all, go to return_sorry node.
    """
    if state['grade']["score"] == "yes":
        print("The query is a clinical one. We can retrieve some documents now.\n")
        return "retrieve"
    else:
        print("The query is not a clinical query. I have nothing to do with it. \n")
        return "return_sorry"

In [None]:
unit_test(grade_clinical_query, decide_clinical_query)

In [None]:
#Action Node
def grade_history_related_query(state: AgentState) -> AgentState:
    """
    Determine whether a query is related to history conversations.    
    """
    query = state["query"]
    history = settings.SHORT_TERM_MEMORY.get_history(state["session_id"])
    history_conversation = history_as_text(history)

    print(f"===Step {settings.STEP}. The query is: {query}===\n")
    print("I will check if the query is related to the conversation history. \n")   
    print(f"conversation history: {history_conversation} \n")   
    
    prompt = PromptTemplate(
        template="""You are a grader assessing the relevance between the user's current question and the conversation history. \n 
        Here is the current question: {question} \n
        Here is the conversation history: \n {document} \n        
        Give a binary score 'yes' or 'no' to indicate whether the question is related to the conversation history.
        Only provide the binary score as a JSON with a single key 'score', for example {{"score": "yes"}} or {{"score": "no"}}.
        No preamble or explanation.""",
        input_variables=["question", "document"],
    )    

    state["grade"] = robust_binary_grader(prompt=prompt, query=query, document=history_conversation)
    settings.STEP += 1
    return state

In [None]:
#Decision Node
def decide_history_related_query(state: AgentState) -> str:
    """ 
    If the query is related to the conversation history but is not self-contained, go to the rewrite node to augment it.
    If the query is not related to the history, or it has already been rewritten 3 times, go to the "return_sorry" node.
    """
    if state['grade']["score"] != "yes":
        print("The query is not related to the history. I have nothing to do with it. \n")
        return "return_sorry"
    if state.get('rewrite_counter', 0) >= 3:  # cap the rewrite loop
        print("I have already rewritten the query 3 times without success. I have to stop here. \n")
        return "return_sorry"
    print("The query is related to the history. But it is not self-contained. Let's re-write it.\n")
    return "rewrite"

In [None]:
# Testing
history = settings.SHORT_TERM_MEMORY.get_history(session_id = 1)
history.add_user_message("hi, there!")
history.add_ai_message("hi, how can I help you?")
history.add_user_message("My nasal is disconfort. Do you have a medicine to relieve sinus congestion and pressure?")
history.add_ai_message("phenylephrine is used to relieve nasal discomfort caused by colds, allergies, and hay fever. it is also used to relieve sinus congestion and pressure. phenylephrine will relieve symptoms but will not treat the cause of the symptoms or speed recovery. phenylephrine is in a class of medications called nasal decongestants. it works by reducing swelling of the blood vessels in the nasal passages.about Phenylephrine")

In [None]:
# Should be related to history
state = AgentState(session_id=1, query="How can I take it?", wiki_used=False, rewrite_counter=0)

state = grade_history_related_query(state)

result = decide_history_related_query(state)

print(f"Decision: {result} \n")
print(f"State: {state} \n")
# Related case
state = AgentState(session_id=1, query="where can I buy it?", wiki_used=False, rewrite_counter=0)

state = grade_history_related_query(state)

result = decide_history_related_query(state)

print(f"Decision: {result} \n")
print(f"State: {state} \n")

In [None]:
# Should be not related to history
state = AgentState(session_id=1, query="Why doesn't my friend play tennis with me?", wiki_used=False, rewrite_counter=0)

state = grade_history_related_query(state)

result = decide_history_related_query(state)

print(f"Decision: {result} \n")
print(f"State: {state} \n")

In [None]:
#Action Node
def rewrite_query(state: AgentState) -> AgentState:
    """
    Rewrite an ambiguous query into a self-contained one, using the conversation history.
    """
    query = state["query"]
    history = settings.SHORT_TERM_MEMORY.get_history(state["session_id"])
    history_conversation = history_as_text(history)

    print(f"===Step {settings.STEP}. The query is: {query}===\n")
    print("I am rewriting the query so that I can retrieve relevant documents with a new query. \n")   
    print(f"conversation history: {history_conversation} \n")   
    
    prompt = PromptTemplate(
        template="""You are a question re-writer that converts an input question into a better version that is optimized \n 
        for vectorstore retrieval. Use the conversation history to resolve references. Keep the contextual meaning. \n
        Here is the conversation history: \n\n {document} \n\n
        Here is the initial question: \n\n {question}. \n
        Improved question with no preamble.""",
        input_variables=["question", "document"],
    )

    new_query = robust_question_generater(prompt=prompt, query=query, document=history_conversation)
    state["query"] = new_query
    # Count rewrites so that decide_history_related_query can stop an infinite rewrite loop
    state["rewrite_counter"] = state.get("rewrite_counter", 0) + 1
    settings.STEP += 1
    return state

In [None]:
# Should be related to history
state = AgentState(session_id=1, query="How can I take it?", wiki_used=False, rewrite_counter=0)

state = rewrite_query(state)

print(f"State: {state} \n")

In [None]:
#Action Node
def return_without_docs(state: AgentState) -> AgentState:
    """ 
    Called when the query is not clinical, or when no relevant documents were retrieved:
    return an apology message instead of documents.
    """
    print(f"===Step {settings.STEP} ===\n")
    apology_sentences = [
    "I'm sorry, but I wasn't able to find any documents that match your request right now.",
    "Apologies for the inconvenience—our system couldn't locate relevant information for that query.",
    "I'm sorry, I couldn't retrieve the documents you're looking for at the moment.",
    "I apologize that no relevant results were found. I'll keep improving to serve you better.",
    "Sorry about that! I wasn't able to pull up the information you need this time."
    ]

    state["retrieved_doc"] = random.choice(apology_sentences)
    print(state["retrieved_doc"])
    settings.STEP += 1
    return state

In [None]:
#Action Node
def return_with_docs(state: AgentState) -> AgentState:
    """ 
    Called when relevant documents were retrieved successfully:
    pass the state through unchanged so that it can be saved to memory.
    """
    print(f"===Step {settings.STEP} ===\n")        
    print("I am happy to get what you need!\n")
    settings.STEP += 1
    return state

In [None]:
#Action Node
def save_to_memory(state: AgentState) -> AgentState:
    """ 
    Before End, save user's query and final answer to memory 
    """   
    print(f"===Step {settings.STEP} ===\n")
    print("I am saving the user query and RAG response to memory.\n")  
    print(f"""User query: {state["query"]} - RAG response: {state["retrieved_doc"]}""") 
    history = settings.SHORT_TERM_MEMORY.get_history(state["session_id"])
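    history.add_user_message(state["query"])
    history.add_ai_message(state["retrieved_doc"])
    del history.messages[:-5] # Keep only the 5 most recent messages, as in Short_Term_Memory.add_message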
    settings.STEP = 1 # Reset the STEP 
    return state    

In [None]:
#Action Node
def retrieve(state: AgentState) -> AgentState:
    """ 
    Retrieve documents by query.
    Then grade the relevance.
    """

    print(f"===Step {settings.STEP} ===\n")
    print("I am seaching documents.\n")  
    documents = rag.retrieve(state["query"], top_k=3)
    final_documents = [d.metadata["page_content"] for d in documents if d.metadata["rerank_score"] > 0.5]
    state["retrieved_doc"] = ". ".join(final_documents)
    if len(final_documents) == 0:        
        state["grade"].docrelevant = {"score": "no"}
    else:
        state["grade"].docrelevant = {"score": "yes"}

    settings.STEP += 1

    return state
    

In [None]:
#Decision Node
def decide_relevant_docs(state: AgentState) -> str:
    """ 
    If it retrieved relevant documents from RAG, go to "return_with_docs" node
    If it didn't find anything from RAG and Wiki tool has not been used, then go to "wiki_search" tool node.
    If it didn't find anything from RAG and Wiki, the return sorry.
    """
    if state["grade"].docrelevant["score"] == "yes":
        print(f"I found some documents you may need.\n")
        return "return_with_docs"
    elif not state["wiki_used"]:
        print("I am sorry I didn't get the relevant document from RAG. I am going to search on wikipedia.\n")          
        return "wiki_search"
    else:
        print("I am sorry I didn't get the relevant document from RAG and wikipedia.\n")         
        return "return_sorry"

''

In [None]:
#Action Node
def wiki_search(state: AgentState) -> AgentState:
    """ 
    Search documents by Wikipedia seach tool.
    Then grade the relevance.
    """

    print(f"===Step {settings.STEP} ===\n")
    print("I am seaching documents from Wikipedia.\n")  
    documents = wiki.invoke({"query": state["query"]})
    json_list = wiki_to_json(documents)
    
    # Rank the wiki docs with crossEncoder
    pairs = [[state["query"], s["Summary"]] for s in json_list]
    scores = cross_encoder.predict(pairs, batch_size=32)
    for j_l, score in zip(json_list, scores):
        j_l["score"] = float(score)

    final_documents = [d["Summary"] for d in json_list if d["score"] > 0.5]
    state["retrieved_doc"] = ". ".join(final_documents)
    state["wiki_used"] = True # For one good query, only use wiki search once. Avoid infinity loop.
    if len(final_documents) == 0:        
        state["grade"].docrelevant = {"score": "no"}
    else:        
        state["grade"].docrelevant = {"score": "yes"}

    settings.STEP += 1

    return state

In [None]:
# Define graph
agentic_rag_graph = StateGraph(AgentState)
# Add nodes
agentic_rag_graph.add_node("grade_selfcontained_node", grade_selfcontained_query)

#agentic_rag_graph.add_node("decide_selfcontained_node", decide_selfcontained_query)

agentic_rag_graph.add_node("grade_history_related_node", grade_history_related_query)

#agentic_rag_graph.add_node("decide_history_related_node", decide_history_related_query)

agentic_rag_graph.add_node("rewrite_query_node", rewrite_query)

agentic_rag_graph.add_node("grade_clinical_node", grade_clinical_query)

#agentic_rag_graph.add_node("decide_clinical_node", decide_clinical_query)

agentic_rag_graph.add_node("retrieve_node", retrieve)

agentic_rag_graph.add_node("decide_relevant_router", lambda state: state) # Pass-through node; the routing happens in its conditional edges

agentic_rag_graph.add_node("return_sorry_node", return_without_docs)

agentic_rag_graph.add_node("return_docs_node", return_with_docs)

agentic_rag_graph.add_node("save_node", save_to_memory)

agentic_rag_graph.add_node("wiki_search_node", wiki_search)

# Add Edges

agentic_rag_graph.add_edge(START, "grade_selfcontained_node")

agentic_rag_graph.add_conditional_edges(
    source="grade_selfcontained_node",
    path=decide_selfcontained_query,
    path_map={
        "grade_clinical": "grade_clinical_node",
        "grade_related_history": "grade_history_related_node"
    }
)

agentic_rag_graph.add_conditional_edges(
    source="grade_history_related_node",
    path=decide_history_related_query,
    path_map={
        "rewrite": "rewrite_query_node",
        "return_sorry": "return_sorry_node"
    }
)

agentic_rag_graph.add_conditional_edges(
    source="grade_clinical_node",
    path=decide_clinical_query,
    path_map={
        "retrieve": "retrieve_node",
        "return_sorry": "return_sorry_node"
    }
)

agentic_rag_graph.add_edge("rewrite_query_node", "grade_selfcontained_node")

agentic_rag_graph.add_edge("retrieve_node", "decide_relevant_router")

agentic_rag_graph.add_conditional_edges(
    source="decide_relevant_router",
    path=decide_relevant_docs,
    path_map={
        "return_sorry": "return_sorry_node",
        "return_with_docs": "return_docs_node",
        "wiki_search": "wiki_search_node"
    }
)

agentic_rag_graph.add_edge("wiki_search_node", "decide_relevant_router")

agentic_rag_graph.add_edge("return_sorry_node", "save_node")

agentic_rag_graph.add_edge("return_docs_node", "save_node")

agentic_rag_graph.add_edge("save_node", END)

app = agentic_rag_graph.compile()
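To see which paths the compiled graph can take, here is an LLM-free sketch that mirrors the edges above as plain dictionaries and walks them given a list of router decisions. The `ROUTES`/`NEXT` tables and `trace` are illustrative only: the node and decision names are copied from the graph definition, but in the real graph the decisions come from the LLM graders and the retrieval scores.

```python
from typing import Dict, List

# Conditional edges: node -> {decision label returned by its router: next node}
ROUTES: Dict[str, Dict[str, str]] = {
    "grade_selfcontained_node": {"grade_clinical": "grade_clinical_node",
                                 "grade_related_history": "grade_history_related_node"},
    "grade_history_related_node": {"rewrite": "rewrite_query_node",
                                   "return_sorry": "return_sorry_node"},
    "grade_clinical_node": {"retrieve": "retrieve_node",
                            "return_sorry": "return_sorry_node"},
    "decide_relevant_router": {"return_with_docs": "return_docs_node",
                               "wiki_search": "wiki_search_node",
                               "return_sorry": "return_sorry_node"},
}
# Plain (unconditional) edges
NEXT: Dict[str, str] = {
    "rewrite_query_node": "grade_selfcontained_node",
    "retrieve_node": "decide_relevant_router",
    "wiki_search_node": "decide_relevant_router",
    "return_sorry_node": "save_node",
    "return_docs_node": "save_node",
    "save_node": "END",
}


def trace(decisions: List[str]) -> List[str]:
    """Walk from START, consuming one decision at each conditional node."""
    path: List[str] = []
    node = "grade_selfcontained_node"           # START -> grade_selfcontained_node
    pending = list(decisions)
    while node != "END":
        path.append(node)
        if node in ROUTES:
            node = ROUTES[node][pending.pop(0)]  # conditional edge
        else:
            node = NEXT[node]                    # plain edge
    return path


# "How do I take it?": rewritten, judged clinical, RAG misses, Wikipedia hits
print(" -> ".join(trace(["grade_related_history", "rewrite", "grade_clinical",
                         "retrieve", "wiki_search", "return_with_docs"])))
# An off-topic but self-contained question ends with an apology
print(" -> ".join(trace(["grade_clinical", "return_sorry"])))
```

Every path ends at `save_node`, and the only cycles (rewrite back to the self-contained grader, Wikipedia back to the relevance router) are the two that `rewrite_counter` and `wiki_used` are there to bound.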

In [None]:
questions = [    
    "Is there anything I can assist you with?",    
    "Can I help you with anything else?",
    "Do you have any questions about this?",  
    "Are you looking for any particular information?",
    "Do you want me to go over anything again?",
    "What more information do you want?"    
]

user_input = input("I am a Medicine Agentic RAG. I can help you get medical and clinical documents. Just tell me what you need: ")

while user_input.strip().lower() not in ["end", "exit"]:
    query = AgentState(query=user_input, session_id=1,wiki_used=False,rewrite_counter=0)
    result = app.invoke(query)
    print(f"result:{result}")
    user_input = input(random.choice(questions))