<a href="https://colab.research.google.com/github/advik-7/Agentic-RAG/blob/main/Medical_Text_RAG.ipynb" target="_parent"><img src="https://colab.research.google.com/assets/colab-badge.svg" alt="Open In Colab"/></a>

In [1]:
!pip install langchain_community
!pip install pypdf

Collecting langchain_community
  Downloading langchain_community-0.3.15-py3-none-any.whl.metadata (2.9 kB)
Collecting dataclasses-json<0.7,>=0.5.7 (from langchain_community)
  Downloading dataclasses_json-0.6.7-py3-none-any.whl.metadata (25 kB)
Collecting httpx-sse<0.5.0,>=0.4.0 (from langchain_community)
  Downloading httpx_sse-0.4.0-py3-none-any.whl.metadata (9.0 kB)
Collecting langchain<0.4.0,>=0.3.15 (from langchain_community)
  Downloading langchain-0.3.15-py3-none-any.whl.metadata (7.1 kB)
Collecting langchain-core<0.4.0,>=0.3.31 (from langchain_community)
  Downloading langchain_core-0.3.31-py3-none-any.whl.metadata (6.3 kB)
Collecting pydantic-settings<3.0.0,>=2.4.0 (from langchain_community)
  Downloading pydantic_settings-2.7.1-py3-none-any.whl.metadata (3.5 kB)
Collecting marshmallow<4.0.0,>=3.18.0 (from dataclasses-json<0.7,>=0.5.7->langchain_community)
  Downloading marshmallow-3.25.1-py3-none-any.whl.metadata (7.3 kB)
Collecting typing-inspect<1,>=0.4.0 (from dataclasses-

In [3]:
!pip install --upgrade google-generativeai -qq

In [4]:

!pip install faiss-cpu==1.7.4

Collecting faiss-cpu==1.7.4
  Downloading faiss_cpu-1.7.4-cp311-cp311-manylinux_2_17_x86_64.manylinux2014_x86_64.whl.metadata (1.3 kB)
Downloading faiss_cpu-1.7.4-cp311-cp311-manylinux_2_17_x86_64.manylinux2014_x86_64.whl (17.6 MB)
Installing collected packages: faiss-cpu
Successfully installed faiss-cpu-1.7.4


In [10]:
import os
# Loaders and vector stores live in langchain_community in LangChain 0.3.x;
# the old `langchain.*` paths are deprecated shims.
from langchain_community.document_loaders import PyPDFLoader
from langchain.text_splitter import RecursiveCharacterTextSplitter
from langchain_community.vectorstores import FAISS
from sentence_transformers import SentenceTransformer
from typing import Optional, List, Mapping, Any
import google.generativeai as genai
from langchain_core.language_models.llms import LLM


class CustomGemini:
    """Custom class to interact with Google Gemini."""

    def __init__(self, temperature: float, max_tokens: int, model: str, google_api_key: str):
        self.temperature = temperature
        self.max_tokens = max_tokens
        self.model = model
        genai.configure(api_key=google_api_key)
        self.generation_config = {
            "temperature": temperature,
            "max_output_tokens": max_tokens,
        }
        self.model_instance = genai.GenerativeModel(
            model_name=model,
            generation_config=self.generation_config,
        )

    def __call__(self, prompt: str, stop: Optional[List[str]] = None) -> str:
        # `stop` is accepted for interface compatibility but is not forwarded to Gemini.
        response = self.model_instance.generate_content(prompt)
        return response.text  # Plain-text content of the response


class CustomLLMWrapper(LLM):
    """Custom wrapper for the Gemini LLM."""

    custom_llm: CustomGemini

    @property
    def _llm_type(self) -> str:
        return "custom_gemini"

    def _call(self, prompt: str, stop: Optional[List[str]] = None) -> str:
        return self.custom_llm(prompt, stop=stop)

    @property
    def _identifying_params(self) -> Mapping[str, Any]:
        return {
            "temperature": self.custom_llm.temperature,
            "max_tokens": self.custom_llm.max_tokens,
            "model": self.custom_llm.model,
        }


def process_query_with_retrieval(file_path: str, query: str, chat_history: Optional[List[str]] = None) -> str:
    """Processes a medical query with RAG and ensures retrieved documents are passed to the model."""

    # Load and preprocess documents
    loader = PyPDFLoader(file_path)
    documents = loader.load()

    text_splitter = RecursiveCharacterTextSplitter(chunk_size=1000, chunk_overlap=0)
    texts = text_splitter.split_documents(documents)

    # Generate embeddings using SentenceTransformer
    model = SentenceTransformer('all-MiniLM-L6-v2')
    embeddings = model.encode([t.page_content for t in texts])

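    # Thin adapter exposing the embed_documents/embed_query methods that
    # LangChain vector stores expect from an embeddings object
    class CustomEmbeddings:
        def __init__(self, model):
            self.model = model

        def embed_documents(self, texts):
            # One vector per text, converted from a NumPy array to plain lists
            return self.model.encode(texts).tolist()

        def embed_query(self, text):
            # Single vector for the query string
            return self.model.encode(text).tolist()

        def __call__(self, text):
            # Kept so the object can also be used as a plain callable
            return self.model.encode(text).tolist()

    custom_embeddings = CustomEmbeddings(model)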

    # Create a FAISS database for retrieval
    db = FAISS.from_embeddings(
        text_embeddings=[(t.page_content, embedding) for t, embedding in zip(texts, embeddings)],
        embedding=custom_embeddings
    )

    # Configure Gemini LLM
    custom_gemini = CustomGemini(
        temperature=0.7,
        max_tokens=512,
        model="gemini-1.5-flash",
        google_api_key=os.environ["GEMINI_API_KEY"]
    )

    wrapped_llm = CustomLLMWrapper(custom_llm=custom_gemini)

    # Retrieve documents for the query
    retriever = db.as_retriever(search_kwargs={"k": 3})
    retrieved_docs = retriever.invoke(query)  # replaces the deprecated get_relevant_documents()

    # Combine retrieved content into context
    retrieved_content = "\n\n".join([doc.page_content for doc in retrieved_docs])

    # Integrate chat history into the enhanced query
    history_content = "\n".join(chat_history) if chat_history else ""
    enhanced_query = f"""
    You are a medical diagnosis expert. When responding to queries:
    1. Provide a diagnosis or possible causes based on the details provided.
    2. If the information provided is insufficient, clearly state what additional features or details are required. Use the format: 'Question: Can you provide more details about [specific feature]?' for requesting further input.

    Here is the chat history for context:
    {history_content}

    Here are some relevant documents for context:
    {retrieved_content}

    Query: {query}
    """

    # Generate a response
    response = wrapped_llm.invoke(enhanced_query)  # calling the LLM object directly is deprecated

    return response


# file_path = "path_to_medical_documents.pdf"
# query = "I have been feeling dizzy for a week."
# chat_history = ["Patient mentioned a headache two days ago.", "Patient also reported nausea yesterday."]
# print(process_query_with_retrieval(file_path, query, chat_history))
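
The retrieval step in `process_query_with_retrieval` asks the FAISS index for the `k = 3` chunks closest to the query embedding and joins them into the context string. The sketch below reproduces that idea in pure Python on toy 2-D vectors. The helper names (`squared_l2`, `top_k`), the toy chunks, and the vectors are hypothetical, and the use of squared Euclidean distance is an assumption about the index's default configuration rather than something the notebook states.

```python
from typing import List, Sequence, Tuple


def squared_l2(a: Sequence[float], b: Sequence[float]) -> float:
    """Squared Euclidean distance between two equal-length vectors."""
    return sum((x - y) ** 2 for x, y in zip(a, b))


def top_k(
    query_vec: Sequence[float],
    chunk_vecs: Sequence[Sequence[float]],
    chunks: Sequence[str],
    k: int = 3,
) -> List[Tuple[str, float]]:
    """Return the k chunks whose vectors are closest to the query (brute force)."""
    scored = [(chunk, squared_l2(query_vec, vec)) for chunk, vec in zip(chunks, chunk_vecs)]
    scored.sort(key=lambda pair: pair[1])  # smallest distance first
    return scored[:k]


# Toy data standing in for real chunk embeddings (hypothetical values).
chunks = ["vertigo and balance", "migraine with aura", "fracture care", "dehydration signs"]
chunk_vecs = [[0.9, 0.1], [0.7, 0.3], [0.0, 1.0], [0.6, 0.2]]
query_vec = [1.0, 0.0]

hits = top_k(query_vec, chunk_vecs, chunks, k=3)

# Same joining step the notebook applies to the retrieved documents.
context = "\n\n".join(text for text, _ in hits)
print(context)
```

A real index avoids the full scan and sort, but the contract is the same: vectors in, the `k` nearest chunks out, and only those chunks reach the prompt.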


In [12]:
file = "/content/L-G-0000597158-0002362898.pdf"
query = "I have been feeling dizzy for a week."
chat_history = ["Patient mentioned a headache two days ago.", "Patient also reported nausea yesterday."]
response = process_query_with_retrieval(file, query, chat_history)  # keep the text; print() returns None
print(response)



The provided information mentions a headache two days ago, nausea yesterday, and dizziness for a week.  This combination of symptoms could have several causes, including:

* **Migraine:**  Migraines can present with headaches, nausea, and dizziness.  The duration of dizziness aligns with the possibility of a prolonged migraine aura.
* **Vestibular disorder:** Dizziness lasting a week points towards a possible inner ear problem affecting balance.  This could be labyrinthitis, vestibular neuritis, or other vestibular disorders.
* **Dehydration:**  Severe dehydration can cause headaches, nausea, and dizziness.
* **Other neurological causes:** While less likely given the information, other neurological conditions could be considered.

**Question: Can you provide more details about the characteristics of your headache (location, severity, type of pain), the nature of your nausea (vomiting, frequency), and the type of dizziness (spinning sensation, lightheadedness, unsteadiness)?**  Also, an
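
The prompt asks the model to phrase requests for more information as `Question: Can you provide more details about ...?`, so a caller can pull those follow-ups out of the response and feed the answers back through `chat_history`. A minimal sketch, assuming the model keeps to that format; the function name `extract_follow_ups` and the sample text are hypothetical:

```python
import re
from typing import List

# Matches "Question: ...?" up to the first question mark, across line breaks.
FOLLOW_UP = re.compile(r"Question:\s*(.+?\?)", re.DOTALL)


def extract_follow_ups(response_text: str) -> List[str]:
    """Return every follow-up question found in a model response."""
    cleaned = response_text.replace("**", "")  # drop Markdown bold markers
    # Collapse internal whitespace so multi-line questions come back on one line.
    return [" ".join(match.split()) for match in FOLLOW_UP.findall(cleaned)]


sample = (
    "Possible causes include migraine.\n\n"
    "**Question: Can you provide more details about the type of dizziness?** Thanks."
)
follow_ups = extract_follow_ups(sample)
print(follow_ups)
```

A response with no `Question:` marker yields an empty list, which a caller could treat as a signal that the model considered the information sufficient.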