### In this section, I will build an Agentic RAG

I now have a basic RAG that retrieves relevant medical documents for a query.

But why do I need an **Agentic RAG**?

### Problem description:

In a real conversation, users can ask things we cannot predict in advance.

For example:
In the third turn, the user really wants to ask 'How do I take Phenylephrine?'

But they type 'How do I take it?'. From the context, 'it' means 'Phenylephrine'.

If we retrieve documents with the query 'How do I take it?', we may get irrelevant documents. Retrieving with 'How do I take Phenylephrine?' makes more sense.
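
One way to resolve 'it' is to ask the LLM to rewrite the follow-up into a standalone question before retrieval. Below is a minimal sketch of how such a prompt could be assembled. The template wording, the `build_rewrite_prompt` helper, and the two earlier turns are made up for illustration; they are not the prompt used later in this notebook.

```python
from typing import List, Tuple

# Hypothetical prompt wording, written only for this illustration.
REWRITE_TEMPLATE = (
    "Given the conversation so far, rewrite the last user question so that "
    "it can be understood without the conversation.\n\n"
    "Conversation:\n{history}\n\n"
    "Last question: {query}\n"
    "Standalone question:"
)


def build_rewrite_prompt(turns: List[Tuple[str, str]], query: str) -> str:
    """Format (role, text) turns and the follow-up query into one prompt."""
    history = "\n".join(f"{role.upper()}: {text}" for role, text in turns)
    return REWRITE_TEMPLATE.format(history=history, query=query)


# Invented earlier turns that give 'it' something to refer to.
turns = [
    ("human", "What is Phenylephrine used for?"),
    ("ai", "Phenylephrine is a decongestant used to relieve nasal congestion."),
]
prompt = build_rewrite_prompt(turns, "How do I take it?")
print(prompt)
```

The LLM's completion of this prompt would then be used as the retrieval query instead of the raw user input.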

Other scenarios:

1. In the first turn, the user just greets without asking a question.
2. The user asks an unrelated question in the middle of the conversation.
3. .........

### Analysis:

The root problem is how to determine whether a query is a clinical/medical query and whether it is related to the previous conversation.
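
These two questions can be treated as two boolean flags that the agent returns as JSON. The sketch below shows one possible way to parse such a reply and map it to a next step. The field names (`is_medical`, `needs_history`), the step names, and the choice to fall back to plain chat on malformed output are all assumptions for illustration.

```python
import json
from typing import TypedDict


class RouteDecision(TypedDict):
    is_medical: bool      # is this a clinical/medical query?
    needs_history: bool   # does it depend on earlier turns?


def parse_decision(raw: str) -> RouteDecision:
    """Parse the agent's JSON reply; fall back to plain chat on bad output."""
    fallback: RouteDecision = {"is_medical": False, "needs_history": False}
    try:
        data = json.loads(raw)
    except json.JSONDecodeError:
        return fallback
    if not isinstance(data, dict):
        return fallback
    # Only a literal JSON `true` counts; missing or odd values become False.
    return {
        "is_medical": data.get("is_medical") is True,
        "needs_history": data.get("needs_history") is True,
    }


def next_step(decision: RouteDecision) -> str:
    """Map the two flags to a (hypothetical) step name."""
    if not decision["is_medical"]:
        return "chat"
    return "rewrite_query" if decision["needs_history"] else "retrieve"


print(next_step(parse_decision('{"is_medical": true, "needs_history": true}')))
print(next_step(parse_decision("hello, this is not JSON")))
```

Parsing defensively matters here because a local LLM does not always return valid JSON.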

### Solution:

#### To handle all of these cases, I will use a local LLM as a master agent that decides what to do next in each situation.
#### RAG, LangGraph, memory, the local LLM, a web search tool, and so on will work together so that the RAG can reason and act (ReAct) on its own.

### Implementation:

* I will add an agent that decides what to do next based on the query and the conversation history.
* The agent then executes the task, observes the result, and decides again, repeating until it gets a proper result.
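
The decide, execute, observe cycle described above can be sketched in plain Python before bringing in LangGraph. Everything here is a stand-in: `run_agent`, `stub_decide`, and `stub_retrieve` are hypothetical names, and the real system would call the LLM and the retriever instead of these stubs.

```python
from typing import Callable, Dict, List, Tuple

# A tool takes a text argument and returns an observation string.
Tool = Callable[[str], str]
# The decision function sees the query and past observations and
# returns (action_name, argument). The action "finish" ends the loop.
Decide = Callable[[str, List[str]], Tuple[str, str]]


def run_agent(query: str, decide: Decide, tools: Dict[str, Tool],
              max_steps: int = 5) -> str:
    """Loop: decide -> execute tool -> observe, until 'finish' or max_steps."""
    observations: List[str] = []
    for _ in range(max_steps):
        action, argument = decide(query, observations)
        if action == "finish":
            return argument
        observations.append(tools[action](argument))
    # Step budget exhausted without a final answer.
    return "Sorry, I could not find an answer."


def stub_decide(query: str, observations: List[str]) -> Tuple[str, str]:
    """Stand-in for the LLM: retrieve once, then answer from the result."""
    if not observations:
        return ("retrieve", query)
    return ("finish", f"Answer based on: {observations[-1]}")


def stub_retrieve(query: str) -> str:
    """Stand-in for the retriever."""
    return f"[document matching '{query}']"


answer = run_agent("How do I take Phenylephrine?", stub_decide,
                   {"retrieve": stub_retrieve})
print(answer)
```

The `max_steps` cap is a design choice in this sketch so that an agent which never decides to finish cannot loop forever.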

In [None]:
from mytools import best_dtype, best_device, login_huggingface
import os
import json
import copy
import time
import torch
import uuid
import settings
from typing import TypedDict, List
from langgraph.graph import StateGraph, START, END

from dotenv import load_dotenv
from langchain.retrievers.multi_vector import MultiVectorRetriever
from langchain_core.documents import Document

from langchain_core.chat_history import BaseChatMessageHistory
from langchain_community.chat_message_histories import ChatMessageHistory # Short-term Memory
from transformers import AutoTokenizer, AutoModelForCausalLM, pipeline
from langchain_huggingface import ChatHuggingFace, HuggingFacePipeline
from langchain.prompts import PromptTemplate
from langchain import hub
from langchain_core.output_parsers import StrOutputParser
from langchain_core.output_parsers import JsonOutputParser

#### As I mentioned previously:

Bio-Medical-Llama-3-8B model is a specialized large language model designed for biomedical applications. It is finetuned from the meta-llama/Meta-Llama-3-8B-Instruct model using a custom dataset containing over 500,000 diverse entries. These entries include a mix of synthetic and manually curated data, ensuring high quality and broad coverage of biomedical topics.

The model is trained to understand and generate text related to various biomedical fields, making it a valuable tool for researchers, clinicians, and other professionals in the biomedical domain.

@misc{ContactDoctor_Bio-Medical-Llama-3-8B, author = ContactDoctor, title = {ContactDoctor-Bio-Medical: A High-Performance Biomedical Language Model}, year = {2024}, howpublished = {https://huggingface.co/ContactDoctor/Bio-Medical-Llama-3-8B}, }

In [None]:
model_id = "ContactDoctor/Bio-Medical-Llama-3-8B"

In [None]:
login_huggingface() 

In [None]:
# Load a Hugging Face model and run inference on the local GPU.

tokenizer = AutoTokenizer.from_pretrained(model_id)

model = AutoModelForCausalLM.from_pretrained(
    model_id,
    dtype = best_dtype(),
    device_map={"":best_device()}, 
    low_cpu_mem_usage=True     
)
print("Load tokenizer and base model done!")

In [None]:
print(model)                    # full architecture tree (long but useful)
print(model.config)             # core hyperparameters (dims, layers, heads…)

In [None]:
original_pipeline = pipeline(
    "text-generation", 
    model=model, 
    tokenizer=tokenizer,
    return_full_text=False,   
)

# Wrap the plain transformers pipeline in a HuggingFacePipeline
hug_pipeline = HuggingFacePipeline(pipeline=original_pipeline)

master_agent = ChatHuggingFace(llm=hug_pipeline) # It is the brain of the whole system

In [None]:
# Define a short-term memory

class Short_Term_Memory():
    def __init__(self) -> None: 
        """Initialize the message container and current session id """       
        self.session_store: dict[int,BaseChatMessageHistory] = {}
        self.current_session_id: int = 0

    def get_history(self, session_id: int) -> BaseChatMessageHistory:    
        """return history messages by sessionId"""    
        self.current_session_id = session_id
        if session_id not in self.session_store:
            self.session_store[session_id] = ChatMessageHistory()
        return self.session_store[session_id]
    
    def get_current_history(self) -> BaseChatMessageHistory:
        """return history messages for current session"""
        return self.get_history(self.current_session_id)
    
    def delete_history(self, session_id: int) -> bool:
        """Delete history messages by session id.

        Always returns True: after the call, no history exists for that id.
        """
        # pop() with a default does not raise if the session id is unknown
        self.session_store.pop(session_id, None)
        return True
    
    def delete_current_history(self) -> bool:
        """delete history messages for current session"""
        return self.delete_history(self.current_session_id)
    
# Convert a history chat message to a string
def history_as_text(history: BaseChatMessageHistory) -> str:
    """Convert history messages into a single string, one message per line."""
    return "\n".join([
        f"{m.type.upper()}: {m.content}"   # e.g. "HUMAN: …" or "AI: …"
        for m in history.messages])

In [None]:
class AgentState(TypedDict):
    """
    Represents the state of the graph.

    Attributes:
        session_id: current session id
        query: user's query or augmented query
        retrieved_doc: retrieved document
        generation: LLM generation        
    """
    session_id: int
    query: str
    retrieved_doc: str        
    generation: str
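
In LangGraph, a node receives the current state and returns only the keys it wants to update; the graph merges that partial update into the state. The sketch below imitates this with a plain dict merge so it runs without LangGraph. `retrieve_node` and its fake document string are hypothetical, and `AgentState` is repeated only to keep the example self-contained.

```python
from typing import TypedDict


class AgentState(TypedDict):
    session_id: int
    query: str
    retrieved_doc: str
    generation: str


def retrieve_node(state: AgentState) -> dict:
    """Hypothetical node: look up a document and return only the changed key."""
    # Stand-in for a real retriever call.
    doc = f"[document matching '{state['query']}']"
    return {"retrieved_doc": doc}


state: AgentState = {
    "session_id": 1,
    "query": "How do I take Phenylephrine?",
    "retrieved_doc": "",
    "generation": "",
}

# Manual merge that mimics how a partial update overwrites matching keys.
state = {**state, **retrieve_node(state)}
print(state["retrieved_doc"])
```

Keys the node does not return, such as `session_id` and `query`, keep their previous values.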

In [None]:
# Initialize a global short-term memory for all users
settings.SHORT_TERM_MEMORY = Short_Term_Memory()

In [None]:
# First turn: greeting
query_1 = "hi, there"
session_id_1 = 1