<a href="https://colab.research.google.com/github/Pedrohmlara/community-contributions/blob/main/notebooks/MedRAG_MES_RAG_into_Healthcare_with_VertexAI.ipynb" target="_parent"><img src="https://colab.research.google.com/assets/colab-badge.svg" alt="Open In Colab"/></a>

# Entity-Centric RAG for Healthcare: Implementing MES-RAG & MedRAG Principles with Vertex AI and HPO to Address Entity Confusion

This Colab notebook addresses the "Confusion Among Similar Entities" (CASE) problem in RAG systems, where an LLM mixes information from related entities, such as patients with similar symptoms. We implement entity-centric retrieval principles from the MES-RAG and MedRAG frameworks using Vertex AI Gemini and the Human Phenotype Ontology (HPO) to demonstrate how structured, entity-aware retrieval can improve diagnostic accuracy by maintaining clear boundaries between medical entities.

#### References
- [MES-RAG: Bringing Multi-modal, Entity-Storage, and Secure Enhancements to RAG](https://arxiv.org/abs/2503.13563)
- [MedRAG: Enhancing Retrieval-augmented Generation with Knowledge Graph-Elicited Reasoning for Healthcare Copilot](https://arxiv.org/abs/2502.04413)
- [Human Phenotype Ontology](https://hpo.jax.org/)

#### Configuration
In the Colab Secrets tab, configure the following variables from your GCP account:
- **PROJECT_ID**
- **LOCATION** (e.g., us-central1)

**Important:**
- Enable the Vertex AI API in the APIs & Services section of the [Google Cloud Platform](https://console.cloud.google.com/) console.
- Enable billing for your project in [GCP](https://console.cloud.google.com/) as well.

In [119]:
# Install dependencies first so the imports below (and chromadb in the next cell) resolve
!pip install google-cloud-aiplatform chromadb tiktoken -q

from google.colab import auth, userdata
from vertexai.generative_models import GenerativeModel, FinishReason, GenerationConfig
import vertexai

# Authenticate this Colab session against Google Cloud
auth.authenticate_user()

# Read the project settings stored in the Colab Secrets tab
PROJECT_ID = userdata.get('PROJECT_ID')
LOCATION = userdata.get('LOCATION')

vertexai.init(project=PROJECT_ID, location=LOCATION)

In [120]:
import chromadb
from chromadb.utils import embedding_functions
from vertexai.language_models import TextEmbeddingModel, TextGenerationModel
import json
import time
import requests

#### Sample Patient Data

We load the patient data from the provided [sample](https://raw.githubusercontent.com/Pedrohmlara/community-contributions/refs/heads/main/dataset/pacients_dataset.json) on GitHub.

The patient data is structured as a list of individual patient records stored under a top-level `patients` key. Each record contains a `patient_id`, a `record_text` (a description of the patient's medical history and symptoms), and `potential_diagnosis_notes` (notes about a possible diagnosis based on the patient's information).
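Because later cells index into each record by field name, it can help to check the payload's shape right after loading. The following is a minimal sketch, not part of the original notebook: `find_invalid_records` and `sample_payload` are hypothetical names, and the sample mirrors only the three fields described above.

```python
from typing import Any

# Fields every patient record is expected to carry (taken from the description above)
REQUIRED_FIELDS = ("patient_id", "record_text", "potential_diagnosis_notes")

def find_invalid_records(payload: dict[str, Any]) -> list[str]:
    """Return one problem description per missing or empty required field."""
    problems = []
    for index, record in enumerate(payload.get("patients", [])):
        for field in REQUIRED_FIELDS:
            value = record.get(field)
            if not isinstance(value, str) or not value.strip():
                problems.append(f"record {index}: missing or empty '{field}'")
    return problems

# Hypothetical payload in the same shape as the dataset
sample_payload = {
    "patients": [
        {
            "patient_id": "P001",
            "record_text": "Lower back pain relieved by sitting.",
            "potential_diagnosis_notes": "Suspected lumbar spinal stenosis.",
        },
        {"patient_id": "P002", "record_text": ""},
    ]
}

for problem in find_invalid_records(sample_payload):
    print(problem)
```

Running a check like this on `patients_data` before the embedding step would surface malformed records early, rather than as a `None` document further down the pipeline.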


In [121]:
patients_data_url = "https://raw.githubusercontent.com/Pedrohmlara/community-contributions/refs/heads/main/dataset/pacients_dataset.json"

response = requests.get(patients_data_url)
response.raise_for_status()
patients_data = response.json()
print(patients_data)

{'patients': [{'patient_id': 'P001', 'record_text': 'Patient, 74 years old, male, presents with right lower back pain radiating to the right lower limb, with numbness in both feet. Reports that the pain worsens when standing or walking for long distances, but feels significant relief when sitting or leaning forward. Denies fever or weight loss. History of controlled hypertension.', 'potential_diagnosis_notes': 'Suspected lumbar spinal stenosis due to postural relief.'}, {'patient_id': 'P002', 'record_text': 'Patient, 45 years old, female, complains of acute lower back pain that started after lifting a heavy object. The pain radiates to the buttock and posterior aspect of the left thigh, down to the foot. States that the pain worsens when sitting for prolonged periods and when coughing or sneezing. Slightly improves with walking. Denies significant recent trauma.', 'potential_diagnosis_notes': 'Suggestive of sciatica, possibly due to a herniated disc, due to worsening симптоms when sitt

#### Loading the Knowledge Graph (KG) from the HPO JSON file

The KG will provide structured information from the Human Phenotype Ontology to assist Gemini in differential reasoning.

> *Disclaimer: This is a modified and simplified version of the original HPO data. The data has been processed, categorized and restructured for demonstration purposes. For official and complete data, see: https://hpo.jax.org/*

In [122]:
diagnostic_kg_url = "https://raw.githubusercontent.com/Pedrohmlara/community-contributions/refs/heads/main/dataset/human_phenotype_ontology_%5BKG-reduced%5D.json"

response = requests.get(diagnostic_kg_url)
response.raise_for_status()
diagnostic_kg = response.json()
print(diagnostic_kg)

{'Oral synechia': {'common_symptoms': ['Oral fibrous bands'], 'differentiating_symptoms': ['Synechiae of the mouth'], 'description': 'Fibrous band between the mucosal surfaces of the upper and lower alveolar ridges. These bands must be distinguished from synechiaee between the tongue and palate (glossopalatal ankylosis) and from syn...', 'hpo_id': '0010285', 'related_terms': ['Oral fibrous bands', 'Synechiae of the mouth'], 'category': 'general'}, 'Patchy sclerosis of thumb phalanx': {'common_symptoms': ['Patchy sclerosis of the phalanges of the thumb'], 'differentiating_symptoms': ['Uneven increase in bone density in thumb bone'], 'description': 'An uneven increase in bone density of one or more of the phalanges of the thumb.', 'hpo_id': '0009655', 'related_terms': ['Patchy sclerosis of the phalanges of the thumb', 'Uneven increase in bone density in thumb bone'], 'category': 'musculoskeletal'}, 'Reduced leukocyte alkaline phosphatase': {'common_symptoms': ['Low leukocyte alkaline pho
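The printed structure shows that each KG entry is keyed by its exact term name and carries a `related_terms` list. A lookup that only tests exact keys will miss queries that differ in case or use a related term. The sketch below illustrates one way to index the KG for more forgiving lookups; `build_kg_lookup` and `resolve_term` are hypothetical helpers, and `sample_kg` is a one-entry excerpt of the output above.

```python
def build_kg_lookup(kg: dict) -> dict:
    """Map lowercased term names and related terms to their canonical KG key."""
    lookup = {}
    for canonical_name, entry in kg.items():
        # Canonical names always win over aliases
        lookup[canonical_name.lower()] = canonical_name
        for alias in entry.get("related_terms", []):
            # setdefault keeps the first mapping if two entries share an alias
            lookup.setdefault(alias.lower(), canonical_name)
    return lookup

def resolve_term(term: str, lookup: dict):
    """Return the canonical KG key for a term, or None when nothing matches."""
    return lookup.get(term.strip().lower())

# One entry copied from the printed KG, reduced to the fields used here
sample_kg = {
    "Oral synechia": {
        "hpo_id": "0010285",
        "related_terms": ["Oral fibrous bands", "Synechiae of the mouth"],
        "category": "general",
    }
}

lookup = build_kg_lookup(sample_kg)
print(resolve_term("oral fibrous bands", lookup))  # Oral synechia
print(resolve_term("lower back pain", lookup))     # None
```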

#### Defining our Embedding Model and Function



In [123]:
VERTEX_EMBEDDING_MODEL_NAME = "text-embedding-005"
embedding_model_vertex = TextEmbeddingModel.from_pretrained(VERTEX_EMBEDDING_MODEL_NAME)

class VertexAIEmbeddingFunction(embedding_functions.EmbeddingFunction):
    def __init__(self, model, batch_size=5, sleep_time=0.2):
        if model is None:
            raise ValueError("Embedding model was not loaded. Please check the configuration.")
        self.model = model
        self.batch_size = batch_size
        self.sleep_time = sleep_time

    def __call__(self, input_texts: chromadb.Documents) -> chromadb.Embeddings:
        embeddings_list = []
        for i in range(0, len(input_texts), self.batch_size):
            batch = input_texts[i:i + self.batch_size]
            try:
                embeddings_response = self.model.get_embeddings(batch)
                embeddings_list.extend([emb.values for emb in embeddings_response])
                if self.sleep_time > 0 and i + self.batch_size < len(input_texts):
                    time.sleep(self.sleep_time)
            except Exception as e:
                print(f"Error generating embedding for batch: {batch}. Error: {e}")
                embeddings_list.extend([[0.0] * 768] * len(batch))
        return embeddings_list

vertex_chroma_embedding_function = VertexAIEmbeddingFunction(embedding_model_vertex)
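The class above falls back to zero vectors when a batch fails, which keeps the pipeline running but stores placeholders that look like real embeddings at query time. One alternative, sketched here under stated assumptions, is to retry transient failures with exponential backoff and let a persistent failure surface to the caller. `call_with_retries` and `flaky_embed` are hypothetical names; the retry counts and delays are illustrative only.

```python
import time

def call_with_retries(func, *args, max_attempts=3, base_delay=0.5, sleep=time.sleep):
    """Call func(*args), retrying with exponential backoff on any exception."""
    for attempt in range(1, max_attempts + 1):
        try:
            return func(*args)
        except Exception:
            if attempt == max_attempts:
                raise  # give up and surface the last error to the caller
            # Wait base_delay, then 2x, 4x, ... before the next attempt
            sleep(base_delay * 2 ** (attempt - 1))

# Hypothetical embedder that fails twice before succeeding
attempts = {"count": 0}

def flaky_embed(batch):
    attempts["count"] += 1
    if attempts["count"] < 3:
        raise RuntimeError("transient error")
    return [[0.1, 0.2] for _ in batch]

# The sleep function is injected so the demo does not actually wait
vectors = call_with_retries(flaky_embed, ["a", "b"], sleep=lambda seconds: None)
print(len(vectors), attempts["count"])  # 2 3
```

In the class above, the call to `self.model.get_embeddings(batch)` would be the natural place to apply such a wrapper.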

#### Defining our Metadata Extraction Model and function

Instead of manually defining all the metadata, we can use Gemini to extract the relevant information from the patient records. This aligns with the more dynamic and intelligent "Entity-centric Data Construction (EDC)" module envisioned in MES-RAG.

In [124]:
DEFAUTL_MODEL_NAME = "gemini-2.0-flash-001"
llm_extraction_model = GenerativeModel(DEFAUTL_MODEL_NAME)

def extract_metadata_with_llm(patient_id:str, record_text: str):
    prompt = f"""
      Analyze the following patient record and extract the specified information in JSON format.
      The output MUST be only the JSON object, without any surrounding text or markdown.

      Patient Record:
      ---
      {record_text}
      ---

      Extract:
      1.  "main_symptoms_list": A list of up to 3-4 key symptoms described.
      2.  "potential_conditions_list": A list of up to 2-3 potential conditions or diagnoses explicitly mentioned or strongly implied.
      3.  "record_date_iso": If a date is mentioned for the record/visit, format it as YYYY-MM-DD. If not found, use "N/A".
      4.  "age_years": The patient's age in years, if mentioned (as an integer). If not found, use null.
      5.  "gender": The patient's gender ("male", "female", "other", or "N/A" if not mentioned).

      JSON Output:
      """

    generation_config_extraction = GenerationConfig(
        temperature=0.1,
        max_output_tokens=256,
    )

    response = llm_extraction_model.generate_content(
        prompt,
        generation_config=generation_config_extraction,
    )

    json_str = response.text.strip().replace("```json", "").replace("```", "").strip()
    extracted = json.loads(json_str)

    default_values = {
        "main_symptoms_list": [],
        "potential_conditions_list": [],
        "record_date_iso": "N/A",
        "age_years": None,
        "gender": "N/A"
    }

    # Fill in any field the model omitted
    for key, default in default_values.items():
        extracted.setdefault(key, default)

    return extracted


if patients_data:
    sample_patient_for_meta_extraction = patients_data['patients'][0]
    print(f"\n--- Testing Metadata Extraction for Patient {sample_patient_for_meta_extraction.get('patient_id')} ---")
    extracted_meta_sample = extract_metadata_with_llm(
        sample_patient_for_meta_extraction.get('patient_id'),
        sample_patient_for_meta_extraction.get('record_text')
    )
    print(json.dumps(extracted_meta_sample, indent=2))


--- Testing Metadata Extraction for Patient P001 ---
{
  "main_symptoms_list": [
    "right lower back pain",
    "pain radiating to right lower limb",
    "numbness in both feet"
  ],
  "potential_conditions_list": [
    "spinal stenosis",
    "sciatica"
  ],
  "record_date_iso": "N/A",
  "age_years": 74,
  "gender": "male"
}
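The extraction function calls `json.loads` directly on the cleaned response, so a truncated or chatty reply raises an exception and stops the cell. A more tolerant parser could look like the sketch below. `parse_llm_json` is a hypothetical helper, and falling back to the default values on malformed output is one possible policy, not the notebook's behavior.

```python
import copy
import json

def parse_llm_json(raw_text: str, defaults: dict) -> dict:
    """Parse the outermost {...} span in raw_text and fill in missing keys from defaults."""
    result = copy.deepcopy(defaults)
    start, end = raw_text.find("{"), raw_text.rfind("}")
    if start == -1 or end < start:
        return result  # no JSON object present at all
    try:
        parsed = json.loads(raw_text[start:end + 1])
    except json.JSONDecodeError:
        return result  # malformed or truncated JSON: keep the defaults
    if isinstance(parsed, dict):
        result.update(parsed)
    return result

defaults = {"main_symptoms_list": [], "age_years": None, "gender": "N/A"}

fence = "`" * 3  # markdown code fence, built here to keep this example readable
fenced_reply = f'{fence}json\n{{"age_years": 74, "gender": "male"}}\n{fence}'
truncated_reply = '{"main_symptoms_list": ["right lower back pain", "numb'

print(parse_llm_json(fenced_reply, defaults))
print(parse_llm_json(truncated_reply, defaults))
```

Slicing between the first `{` and the last `}` also removes surrounding markdown fences, so the separate `replace` calls are not needed in this variant.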


#### Populating ChromaDB (Our VectorDB) with LLM-Extracted Metadata (Entity-Storage)

Each patient record is stored along with its `patient_id` and the metadata dynamically extracted by Gemini.

In [125]:
client = chromadb.Client()
collection_name = "patient_mes_rag_storage_vertex"

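try:
    # Remove any collection left over from a previous run so the cell can be re-executed
    client.delete_collection(name=collection_name)
except Exception:
    # The collection does not exist on the first run; there is nothing to delete
    pass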

collection = client.create_collection(
    name=collection_name,
    embedding_function=vertex_chroma_embedding_function
)

all_documents = []
all_metadatas = []
all_ids = []

print(f"\n--- Populating VectorDB with LLM-Extracted Metadata ---")
for i, patient_record in enumerate(patients_data['patients']):
    patient_id = patient_record.get("patient_id")
    record_text = patient_record.get("record_text")

    print(f"Processing and extracting metadata for patient: {patient_id}...")
    llm_extracted_meta = extract_metadata_with_llm(patient_id, record_text)

    # Combine predefined and LLM-extracted metadata
    combined_metadata = {
        "patient_id": patient_id,
        "source_description": f"Patient record for {patient_id}",
        "record_date_iso": llm_extracted_meta.get("record_date_iso", "N/A"),
        "age_years": llm_extracted_meta.get("age_years"),
        "gender": llm_extracted_meta.get("gender", "N/A"),
        "main_symptoms_str": ", ".join(llm_extracted_meta.get("main_symptoms_list", [])),
        "potential_conditions_str": ", ".join(llm_extracted_meta.get("potential_conditions_list", []))
    }

    all_documents.append(record_text)
    all_metadatas.append(combined_metadata)
    all_ids.append(f"record_llm_meta_{patient_id}_{i}")
    time.sleep(0.5)

if all_documents:
    collection.add(
        documents=all_documents,
        metadatas=all_metadatas,
        ids=all_ids
    )
    print(f"\nAdded {len(all_documents)} documents with LLM-extracted metadata to collection '{collection_name}'.")
else:
    print("No documents were prepared to be added to the collection.")

print(f"Total documents in collection: {collection.count()}")


--- Populating VectorDB with LLM-Extracted Metadata ---
Processing and extracting metadata for patient: P001...
Processing and extracting metadata for patient: P002...
Processing and extracting metadata for patient: P003...

Added 3 documents with LLM-extracted metadata to collection 'patient_mes_rag_storage_vertex'.
Total documents in collection: 3
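The population cell joins list fields into comma-separated strings before storing them. A general helper for this flattening could look like the sketch below. It assumes the vector store accepts only scalar metadata values (`str`, `int`, `float`, `bool`), which should be checked against the installed Chroma version; `to_scalar_metadata` is a hypothetical name.

```python
def to_scalar_metadata(raw: dict, list_separator: str = ", ") -> dict:
    """Flatten a metadata dict so every value is a str, int, float, or bool."""
    clean = {}
    for key, value in raw.items():
        if value is None:
            continue  # assumption: missing values are dropped rather than stored
        if isinstance(value, (list, tuple)):
            clean[key] = list_separator.join(str(item) for item in value)
        elif isinstance(value, (str, int, float, bool)):
            clean[key] = value
        else:
            clean[key] = str(value)  # last resort for nested or unknown types
    return clean

# Values taken from the metadata extraction output shown earlier
raw_metadata = {
    "patient_id": "P001",
    "age_years": 74,
    "record_date_iso": None,
    "main_symptoms_list": ["right lower back pain", "numbness in both feet"],
}
print(to_scalar_metadata(raw_metadata))
```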


#### Demonstrating the Problem: Naive RAG and Confusion Among Similar Entities (CASE)

Here, we simulate a scenario where the RAG system retrieves context naively, potentially mixing information from different patients, leading to confusion.


In [126]:
rag_llm_model = GenerativeModel(DEFAUTL_MODEL_NAME)

def get_naive_rag_context(query_text: str, k_results: int = 2):
    results = collection.query(
        query_texts=[query_text],
        n_results=min(k_results, collection.count())
    )

    if results['documents']:
        context_str = "\n\n".join(
            [f"Patient ID (from metadata): {meta.get('patient_id', 'Unknown')}\nContent: {doc}"
             for doc, meta in zip(results['documents'][0], results['metadatas'][0])]
        )
        return context_str
    return "No relevant context found with naive retrieval."

def get_llm_rag_response(prompt_text: str):
    if not rag_llm_model: return "RAG LLM not initialized."
    generation_config_rag = GenerationConfig(temperature=0.2, max_output_tokens=512)
    response = rag_llm_model.generate_content(
        prompt_text,
        generation_config=generation_config_rag
    )
    if response.candidates and response.candidates[0].finish_reason == FinishReason.STOP:
        return response.text
    return f"RAG Response Generation stopped or failed. Reason: {response.candidates[0].finish_reason if response.candidates else 'Unknown'}."


confusing_query_symptoms = "lower back pain radiating to the leg"
print(f"\n--- DEMONSTRATING PROBLEM ---")
print(f"Symptoms: {confusing_query_symptoms}")

retrieved_naive_context = get_naive_rag_context(confusing_query_symptoms, k_results=2)
print("\n--- RETRIEVED CONTEXT (Potential CASE) ---")
print(retrieved_naive_context)

prompt_for_naive_rag = f"""
  You are a medical assistant.
  Analyze the following retrieved patient records. These records MIGHT BE FROM DIFFERENT PATIENTS.

  Retrieved Records:
  {retrieved_naive_context}

  Question:
  A patient presents with "lower back pain radiating to the leg". Based strictly on the (potentially mixed) records above, what is the most likely diagnosis and key differentiating factor?
  If the records show conflicting information for such a patient, highlight it.

  Strict Analysis based *only* on provided records:
  """

print("\n--- MODEL RESPONSE (EXPECTING CONFUSION/INCORRECT SYNTHESIS) ---")
response_naive_rag = get_llm_rag_response(prompt_for_naive_rag)

print(response_naive_rag)


--- DEMONSTRATING PROBLEM ---
Symptoms: lower back pain radiating to the leg

--- RETRIEVED CONTEXT (Potential CASE) ---
Patient ID (from metadata): P002
Content: Patient, 45 years old, female, complains of acute lower back pain that started after lifting a heavy object. The pain radiates to the buttock and posterior aspect of the left thigh, down to the foot. States that the pain worsens when sitting for prolonged periods and when coughing or sneezing. Slightly improves with walking. Denies significant recent trauma.

Patient ID (from metadata): P001
Content: Patient, 74 years old, male, presents with right lower back pain radiating to the right lower limb, with numbness in both feet. Reports that the pain worsens when standing or walking for long distances, but feels significant relief when sitting or leaning forward. Denies fever or weight loss. History of controlled hypertension.

--- MODEL RESPONSE (EXPECTING CONFUSION/INCORRECT SYNTHESIS) ---
Based solely on the provided patient
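The retrieved context above contains records from both P002 and P001, which is exactly the condition under which CASE can occur. A small check on the retrieval result can flag this before the prompt is built. The sketch below assumes the result layout used in the cell above (`results['metadatas'][0]` is a list of metadata dicts); `patient_ids_in_results` and `fake_results` are hypothetical.

```python
def patient_ids_in_results(results: dict) -> list:
    """Return the distinct patient IDs in the first query's results, in rank order."""
    metadatas = (results.get("metadatas") or [[]])[0]
    seen = []
    for meta in metadatas:
        pid = (meta or {}).get("patient_id", "Unknown")
        if pid not in seen:
            seen.append(pid)
    return seen

# Hand-written stand-in for a query result with two retrieved records
fake_results = {
    "documents": [["record text for P002", "record text for P001"]],
    "metadatas": [[{"patient_id": "P002"}, {"patient_id": "P001"}]],
}

ids = patient_ids_in_results(fake_results)
print(ids, "-> mixed patients" if len(ids) > 1 else "-> single patient")
```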

#### Advanced Query Parsing with LLM (MES-RAG Principle)

To make the RAG system more robust and user-friendly, we need to automatically identify the target entity (e.g., patient) from the user's query. The MES-RAG framework proposes a Query Parser (QP) module for this. Here, we'll implement a simple version using Gemini.

In [127]:
query_parser_llm = GenerativeModel(DEFAUTL_MODEL_NAME)

def query_parser(user_query: str):
    """
    Parses the user query to extract target patient ID (if any),
    descriptive attributes of an entity mentioned in the query,
    core symptoms/question, and potential HPO terms.
    """

    prompt = f"""
      You are an advanced query understanding system for a medical RAG.
      Your task is to analyze the user's query and extract key information into a structured JSON format.
      The output MUST be only the JSON object, without any surrounding text or markdown.

      Available Patient ID patterns in the system generally look like: "P001", "P002", etc.

      User Query: "{user_query}"

      Extraction Tasks:
      1.  "target_patient_id":
          - If a Patient ID matching the known patterns (e.g., "P001") is explicitly mentioned, extract that ID.
          - If multiple such IDs are mentioned (e.g., for comparison), extract them as a list.
          - If no explicit Patient ID is found, return null.

      2.  "entity_descriptive_attributes":
          - If the query describes a patient without giving an ID (or in addition to an ID), extract descriptive attributes.
          - Attributes to look for: "age_approx" (an integer such as 70 or 60, never a string), "gender" ("male", "female", "other"),
            "key_symptoms_from_query" (a list of 2-4 main symptoms or complaints mentioned *in the user's query*).
          - If no such descriptive attributes are found for an entity, return an empty dictionary {{}}.

      3.  "core_search_phrase":
          - Extract the most relevant phrase or question from the user's query that should be used for a semantic search
            against medical records or a knowledge base. This should capture the essence of the user's information need.

      4.  "mentioned_hpo_or_condition_terms":
          - List any medical conditions, HPO-like terms, or specific diseases mentioned in the query.

      JSON Output Examples:

      Query: "What is the diagnosis for patient P001 considering their symptoms of lower back pain and issues with standing?"
      Output:
      {{
        "target_patient_id": "P001",
        "entity_descriptive_attributes": {{
            "key_symptoms_from_query": ["lower back pain", "issues with standing"]
        }},
        "core_search_phrase": "diagnosis for lower back pain and issues with standing",
        "mentioned_hpo_or_condition_terms": ["lower back pain"]
      }}

      Query: "A 68 year old male patient reports progressively blurred vision, especially at night, and sees halos around lights. What could it be?"
      Output:
      {{
        "target_patient_id": null,
        "entity_descriptive_attributes": {{
          "age_approx": 68,
          "gender": "male",
          "key_symptoms_from_query": ["progressively blurred vision", "difficulty seeing at night", "halos around lights"]
        }},
        "core_search_phrase": "causes for progressively blurred vision, difficulty seeing at night, and halos around lights in a 68 year old male",
        "mentioned_hpo_or_condition_terms": ["blurred vision", "halos around lights"]
      }}

      Query: "Tell me about Areflexia and also Hypoprolinemia."
      Output:
      {{
        "target_patient_id": null,
        "entity_descriptive_attributes": {{}},
        "core_search_phrase": "information about Areflexia and Hypoprolinemia",
        "mentioned_hpo_or_condition_terms": ["Areflexia", "Hypoprolinemia"]
      }}

      JSON Output:
    """
    generation_config_parser = GenerationConfig(temperature=0.0, max_output_tokens=512)

    response = query_parser_llm.generate_content(
        prompt,
        generation_config=generation_config_parser
    )

    json_str = response.text.strip().replace("```json", "").replace("```", "").strip()
    parsed_info = json.loads(json_str)

    parsed_info.setdefault("target_patient_id", None)
    # The model may return null for the attributes; coerce anything that is not a dict to {}
    if not isinstance(parsed_info.get("entity_descriptive_attributes"), dict):
        parsed_info["entity_descriptive_attributes"] = {}
    parsed_info["entity_descriptive_attributes"].setdefault("key_symptoms_from_query", [])
    parsed_info.setdefault("core_search_phrase", user_query)
    parsed_info.setdefault("mentioned_hpo_or_condition_terms", [])

    return parsed_info
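The parser's output types can vary from one call to the next: `target_patient_id` may be `null`, a single string, or a list, and `age_approx` may arrive as an integer or as text. Normalizing these once makes the downstream branches simpler. The following is a sketch of such a step; `normalize_parsed_query` is a hypothetical helper, not part of the notebook.

```python
import re

def normalize_parsed_query(parsed: dict) -> dict:
    """Return a copy with target_patient_id as a list and age_approx as int or None."""
    normalized = dict(parsed)

    target = parsed.get("target_patient_id")
    if target is None:
        normalized["target_patient_id"] = []
    elif isinstance(target, str):
        normalized["target_patient_id"] = [target]
    else:
        normalized["target_patient_id"] = list(target)

    attrs = dict(parsed.get("entity_descriptive_attributes") or {})
    age = attrs.get("age_approx")
    if isinstance(age, str):
        # Take the first number in strings such as "68 year old"
        match = re.search(r"\d+", age)
        attrs["age_approx"] = int(match.group()) if match else None
    normalized["entity_descriptive_attributes"] = attrs
    return normalized

example = {
    "target_patient_id": "P001",
    "entity_descriptive_attributes": {"age_approx": "68 year old", "gender": "male"},
}
print(normalize_parsed_query(example))
```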

#### Testing the Query Parser

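Let's check that the parser handles several kinds of queries: an explicit patient ID, an HPO term, a description without an ID, a comparison of two patients, and an ID that is not in the collection.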

In [128]:
queries_for_parser_test = [
    "What is the diagnosis for patient P001 considering their symptoms of lower back pain and issues with standing?",
    "Tell me about the HPO term Areflexia.",
    "My patient, a 68 year old man, has progressively blurred vision and sees halos around lights. What could be the cause?",
    "Compare P002 and P003 regarding their main complaints.",
    "Any information on headaches for patient P007?"
]

for i, t_query in enumerate(queries_for_parser_test):
    parsed_output = query_parser(t_query)
    print(f"\n--- Query Parser Test {i+1} ---")
    print(f"User Query: \"{t_query}\"")
    print(f"Parsed Output: {json.dumps(parsed_output, indent=2)}")


--- Query Parser Test 1 ---
User Query: "What is the diagnosis for patient P001 considering their symptoms of lower back pain and issues with standing?"
Parsed Output: {
  "target_patient_id": "P001",
  "entity_descriptive_attributes": {
    "key_symptoms_from_query": [
      "lower back pain",
      "issues with standing"
    ]
  },
  "core_search_phrase": "diagnosis for patient with lower back pain and issues with standing",
  "mentioned_hpo_or_condition_terms": [
    "lower back pain"
  ]
}

--- Query Parser Test 2 ---
User Query: "Tell me about the HPO term Areflexia."
Parsed Output: {
  "target_patient_id": null,
  "entity_descriptive_attributes": {
    "key_symptoms_from_query": []
  },
  "core_search_phrase": "information about Areflexia",
  "mentioned_hpo_or_condition_terms": [
    "Areflexia"
  ]
}

--- Query Parser Test 3 ---
User Query: "My patient, a 68 year old man, has progressively blurred vision and sees halos around lights. What could be the cause?"
Parsed Output: {
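Since explicit patient IDs follow a fixed pattern, a deterministic regular expression can cross-check what the LLM parser returns for `target_patient_id`. The sketch below assumes IDs are the letter "P" followed by exactly three digits, as in the sample data; `find_patient_ids` is a hypothetical helper.

```python
import re

# Assumption: patient IDs look like "P001" (capital P plus three digits)
PATIENT_ID_PATTERN = re.compile(r"\bP\d{3}\b")

def find_patient_ids(user_query: str) -> list:
    """Return the distinct patient IDs mentioned in the query, in order of appearance."""
    seen = []
    for pid in PATIENT_ID_PATTERN.findall(user_query):
        if pid not in seen:
            seen.append(pid)
    return seen

print(find_patient_ids("Compare P002 and P003 regarding their main complaints."))  # ['P002', 'P003']
print(find_patient_ids("Tell me about the HPO term Areflexia."))                   # []
```

A disagreement between this list and the parser's `target_patient_id` is a cheap signal that the parsed query deserves a second look before retrieval.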
 

#### Implementing the Solution: Entity-Centric RAG (MES-RAG/MedRAG Principles)

Now we implement the "Entity-Storage" approach from MES-RAG: retrieval is restricted to a single patient by filtering on the LLM-enriched metadata, combined with semantic search over the record text. The Human Phenotype Ontology (HPO) knowledge graph is then integrated so that Gemini receives focused, accurate, and clinically relevant context for its analysis.

In [129]:
rag_llm_model = GenerativeModel(DEFAUTL_MODEL_NAME)

def get_entity_centric_rag_db_context(target_patient_id: str, semantic_search_text: str, k_results: int = 1):
    results = collection.query(
        query_texts=[semantic_search_text],
        n_results=k_results,
        where={"patient_id": target_patient_id}
    )

    if results['documents'] and results['documents'][0]:
        # Return the stored metadata (the LLM-extracted entity attributes) of the top match
        metadata_for_context = results['metadatas'][0][0] if results['metadatas'] and results['metadatas'][0] else {}
        return metadata_for_context

    return ""

def get_attribute_based_rag_db_context(descriptive_attrs: dict, semantic_search_text: str, k_results: int = 1):
    chroma_filters_list_attr = []

    if descriptive_attrs:
        gender = descriptive_attrs.get("gender")
        if gender and gender != "N/A":
            chroma_filters_list_attr.append({"gender": {"$eq": gender}})

        age_from_parser = descriptive_attrs.get("age_approx")
        age_val = None

        if isinstance(age_from_parser, int):
            age_val = age_from_parser
        elif isinstance(age_from_parser, str):
            # Tolerate strings such as "68 year old" by taking the first number found
            import re
            age_numbers = re.findall(r'\d+', age_from_parser)
            if age_numbers:
                age_val = int(age_numbers[0])

        if age_val is not None:
            # Match patients within 7 years of the described age
            chroma_filters_list_attr.append({"age_years": {"$gte": age_val - 7}})
            chroma_filters_list_attr.append({"age_years": {"$lte": age_val + 7}})

    final_where_clause = {}
    if len(chroma_filters_list_attr) > 1:
        final_where_clause = {"$and": chroma_filters_list_attr}
    elif len(chroma_filters_list_attr) == 1:
        final_where_clause = chroma_filters_list_attr[0]

    if not final_where_clause:
        return ""

    try:
        results = collection.query(
            query_texts=[semantic_search_text],
            n_results=k_results,
            where=final_where_clause
        )
        if results['documents'] and results['documents'][0]:
            matched_pid = results['metadatas'][0][0].get('patient_id', 'Unknown')

            return (f"--- Retrieved Record for Patient {matched_pid} (MATCHED BY DESCRIPTION VIA METADATA FILTERS) ---\n"
                    f"Content: {results['documents'][0][0]}\n"
                    f"Matched Metadata from DB: {results['metadatas'][0][0]}")

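        return ""
    except Exception as e:
        # Report filter or query errors instead of silently hiding them
        print(f"Attribute-based retrieval failed: {e}")
        return ""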


def get_hpo_kg_info_for_rag(hpo_term_keys: list):
    kg_details = []
    for term_key in hpo_term_keys:
        if term_key in diagnostic_kg:
            data = diagnostic_kg[term_key]
            kg_details.append(
                f"HPO Term: {term_key} (ID: {data.get('hpo_id')})\n"
                f"  Description: {data.get('description')}\n"
                f"  Common Symptoms: {'; '.join(data.get('common_symptoms', []))}\n"
                f"  Differentiating Symptoms: {'; '.join(data.get('differentiating_symptoms', []))}"
            )
    if not kg_details:
        return ""

    # Separate entries with a blank line so consecutive terms do not run together
    return "\n\n".join(kg_details)


def process_entity_centric_rag_query(user_query_text):
    try:
        parsed_query_info = query_parser(user_query_text)
        if parsed_query_info.get("error"):
            return {
                "error": f"Error on query parsing: {parsed_query_info['error']}",
                "analysis": "",
                "thinking_process": ""
            }
    except Exception as e:
        return {
            "error": f"Error on query parsing: {str(e)}",
            "analysis": "",
            "thinking_process": "Error while parsing the query"
        }

    target_pids_parsed = parsed_query_info.get("target_patient_id")
    descriptive_attrs_parsed = parsed_query_info.get("entity_descriptive_attributes", {})
    core_search_text_parsed = parsed_query_info.get("core_search_phrase", user_query_text)
    mentioned_hpo_terms_parsed = parsed_query_info.get("mentioned_hpo_or_condition_terms", [])

    patient_context_for_llm = "No specific patient context was identified or retrieved."
    focused_pid = None

    if target_pids_parsed:
        pid_to_search = None
        if isinstance(target_pids_parsed, list) and target_pids_parsed:
            pid_to_search = target_pids_parsed[0]
        elif isinstance(target_pids_parsed, str):
            pid_to_search = target_pids_parsed

        if pid_to_search:
            try:
                retrieved_context = get_entity_centric_rag_db_context(pid_to_search, core_search_text_parsed)
                # Keep the default "no context" message when nothing was found for this ID
                if retrieved_context:
                    patient_context_for_llm = retrieved_context
                    focused_pid = pid_to_search
            except Exception as e:
                print(f"Patient context retrieval failed for {pid_to_search}: {e}")

    kg_context_for_llm = get_hpo_kg_info_for_rag(mentioned_hpo_terms_parsed)

    final_rag_prompt_text = f"""
      User's Original Query:
      "{user_query_text}"

      Retrieved Patient Context:
      {patient_context_for_llm}

      Retrieved HPO KG Information:
      {kg_context_for_llm}

      Instruction:
      You are a specialized medical assistant. Your task is to analyze the provided information and respond to the "User's Original Query".
      Your response MUST be a single, valid JSON object with exactly two keys: "analysis" and "thinking_process".
      - "analysis": Provide a concise answer to the user's query based STRICTLY on the "Retrieved Patient Context" (if a specific patient is in focus and context was found) AND/OR the "Retrieved HPO KG Information".
          - If a specific patient's context is available and relevant, focus your analysis ONLY on that patient.
          - If no specific patient context is relevant or successfully retrieved, but HPO information is available, base your analysis on the HPO information.
          - If information is insufficient for a direct answer (e.g., patient P007 not found), the "analysis" should state this clearly (e.g., "Cannot provide information for patient P007 as no record was found.") or be an empty string if no conclusion can be drawn.
          - For the query "lower back pain radiating to the leg" (if no specific patient context is focused), the "analysis" should generally discuss this symptom based on HPO KG info.
      - "thinking_process": Briefly state what information was primarily used to generate the analysis.
          - Example: "Analysis based on the provided record for Patient P001 and HPO information on Cataract."
          - Example: "Analysis based on HPO information for Areflexia as no specific patient context was provided or relevant."
          - Example: "No specific patient context or relevant HPO information was found to address the query."

      JSON Output:
      """

    try:
        raw_llm_response_text = get_llm_rag_response(final_rag_prompt_text)
        try:
            cleaned_response_text = raw_llm_response_text.strip().replace("```json", "").replace("```", "").strip()
            parsed_llm_output = json.loads(cleaned_response_text)
        except json.JSONDecodeError as e:
            # Bind the exception so the message below does not raise a NameError
            parsed_llm_output = {
                "analysis": "Error processing the LLM response",
                "thinking_process": f"JSON parsing error: {str(e)}"
            }
    except Exception as e:
        parsed_llm_output = {
            "analysis": "Error obtaining the LLM response",
            "thinking_process": f"Error communicating with the LLM: {str(e)}"
        }

    parsed_llm_output["query"] = user_query_text
    parsed_llm_output["parsed_query_info"] = parsed_query_info

    return parsed_llm_output


def process_multiple_rag_queries(queries_list):
    results = []

    for i, query in enumerate(queries_list):
        result = process_entity_centric_rag_query(query)
        results.append(result)

    return results
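The core idea of the entity-centric retrieval above, filtering by `patient_id` before ranking by similarity, can be illustrated without Chroma or Vertex AI. The sketch below is a toy in pure Python: the three-dimensional vectors are invented for illustration and are not real embeddings, and `top_k` is a hypothetical stand-in for `collection.query` with and without a `where` filter.

```python
import math

def cosine(a, b):
    """Cosine similarity between two equal-length vectors (0.0 if either is all zeros)."""
    dot = sum(x * y for x, y in zip(a, b))
    norm = math.sqrt(sum(x * x for x in a)) * math.sqrt(sum(y * y for y in b))
    return dot / norm if norm else 0.0

def top_k(query_vec, records, k=2, patient_id=None):
    """Rank records by similarity, optionally restricted to one patient first."""
    candidates = [r for r in records if patient_id is None or r["patient_id"] == patient_id]
    ranked = sorted(candidates, key=lambda r: cosine(query_vec, r["vector"]), reverse=True)
    return [r["patient_id"] for r in ranked[:k]]

# Invented vectors: P001 and P002 are close to each other, P003 is unrelated
records = [
    {"patient_id": "P001", "vector": [0.9, 0.1, 0.0]},
    {"patient_id": "P002", "vector": [0.8, 0.2, 0.1]},
    {"patient_id": "P003", "vector": [0.0, 0.1, 0.9]},
]
query = [1.0, 0.1, 0.0]

print(top_k(query, records))                     # ['P001', 'P002']  (naive: two patients mixed)
print(top_k(query, records, patient_id="P002"))  # ['P002']          (entity-centric: one patient)
```

The naive call returns the two most similar records regardless of owner, while the filtered call can only ever return records of the requested patient, however similar the others are.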

#### Testing across different queries

In [130]:
queries = [
    "What is the diagnosis for patient P001?",
    "Tell me about HPO term Areflexia",
    "lower back pain radiating to the leg"
]

results = process_multiple_rag_queries(queries)

for result in results:
    print(f"Query: {result['query']}")
    print(f"Analysis: {result['analysis']} \n")
    print(f"Query info: {json.dumps(result['parsed_query_info'], indent=2, ensure_ascii=False)} \n")
    print(f"Thinking process: {result['thinking_process']} \n")
    print("-" * 50 + "\n")

Query: What is the diagnosis for patient P001?
Analysis: Based on the patient record for P001, the potential diagnoses are spinal stenosis or herniated disc. The patient presents with right lower back pain radiating to the right lower limb, numbness in both feet, pain worsening with standing/walking, and pain relief with sitting/leaning forward, which are consistent with these conditions. 

Query info: {
  "target_patient_id": "P001",
  "entity_descriptive_attributes": {
    "key_symptoms_from_query": []
  },
  "core_search_phrase": "diagnosis for patient",
  "mentioned_hpo_or_condition_terms": []
} 

Thinking process: Analysis based on the provided record for Patient P001. 

--------------------------------------------------

Query: Tell me about HPO term Areflexia
Analysis: Areflexia (HPO:0001284) refers to the absence of neurologic reflexes, such as the knee-jerk reaction. Common symptoms associated with areflexia include absent reflexes, absent deep tendon reflexes, and absent tend

### Conclusion

We demonstrated an entity-centric Retrieval-Augmented Generation (RAG) system for healthcare, leveraging Vertex AI's Gemini for intelligent metadata extraction, query parsing, and response generation, alongside its Text Embedding Model and ChromaDB for robust entity-storage.

By implementing MES-RAG and MedRAG principles, such as dynamic metadata enrichment and Human Phenotype Ontology (HPO) knowledge graph integration, the system mitigated the "Confusion Among Similar Entities" (CASE) problem in this demonstration: filtering retrieval by `patient_id` kept the context restricted to the requested patient, whereas the naive query returned records from both P001 and P002. The comparison is qualitative and based on three sample records.

This structured, entity-aware methodology establishes a strong foundation for developing more accurate and reliable AI-driven clinical decision support tools, enhancing information management and diagnostic accuracy in complex medical scenarios.