# ❤️‍🩹📖 VitalStory: GenAI Model Evaluation | Feature 2 - Health Log Chatbot Model

**Author:** Tyler Gustafson (Gustani)

This notebook is used to evaluate our model for Feature 2 (Health Log Chatbot Model) - testing different pipelines, models, and parameter tunings.

## 1. 📦 Setup & Installs
First, we set up the required libraries.


In [None]:
# HuggingFace Login

from huggingface_hub import login
import os

# Retrieve the token from the HUGGINGFACE_TOKEN environment variable
hf_token = os.getenv("HUGGINGFACE_TOKEN")
login(hf_token)


In [None]:
%%capture
# Core Installations (the %%capture cell magic must be the first line of the cell)
!pip -q install git+https://github.com/huggingface/transformers
!pip -q install bitsandbytes accelerate  # For 4-bit/8-bit quantized models
!pip -q install sentencepiece einops     # Tokenization & tensor ops
!pip install sentence_transformers

# LangChain & Ecosystem
!pip -q install langchain
!pip -q install langchain_community
!pip -q install langchainhub
!pip -q install -U langchain-huggingface
!pip -q install -U langchain-cohere

# Vector DBs / Search
!pip -q install faiss-gpu               # GPU-accelerated similarity search
!pip -q install --upgrade --quiet chromadb bs4 qdrant-client
!pip -q install --upgrade --quiet wikipedia arxiv pymupdf xmltodict  # Document sources

# Model Tuning / Tools
!pip -q install loralib                # LoRA fine-tuning

# Evaluation Tools
!pip -q install evaluate               # HF evaluation framework
!pip -q install bert_score             # Semantic similarity
!pip -q install ragas                  # RAG evaluation metrics
!pip install -q nltk evaluate
!pip install rouge_score


In [None]:
from transformers import AutoTokenizer, AutoModelForCausalLM, pipeline
# from langchain.llms import HuggingFacePipeline
from langchain_huggingface import HuggingFacePipeline
from langchain_core.prompts import PromptTemplate
from langchain_core.runnables import RunnablePassthrough
from pydantic import BaseModel, Field
from typing import List



# Core Python Libraries
import os
import json
import re
import time
import locale
from pprint import pprint

# Data Processing & Scientific Computing
import numpy as np
import pandas as pd
import torch

# NLP Evaluation Libraries
import nltk
from bert_score import score
from nltk.translate.bleu_score import sentence_bleu, SmoothingFunction
import evaluate

# Download required NLTK data
nltk.download('punkt')
nltk.download('punkt_tab')  # Important! Prevents the LookupError

# Web and Document Parsing
import bs4

# Hugging Face Transformers
from transformers import (
    AutoTokenizer,
    AutoModelForCausalLM,
    pipeline,
    BitsAndBytesConfig
)

# LangChain Core Components
from langchain import PromptTemplate, LLMChain, hub
from langchain_core.prompts import ChatPromptTemplate
from langchain_core.output_parsers import StrOutputParser
from langchain_core.runnables import RunnablePassthrough
from langchain_text_splitters import (
    CharacterTextSplitter,
    RecursiveCharacterTextSplitter
)

# LangChain LLM Interfaces
from langchain_huggingface import HuggingFacePipeline  # not re-imported from the deprecated langchain.llms path
from langchain_cohere import ChatCohere
from langchain_core.output_parsers import StrOutputParser, JsonOutputParser
from pydantic import BaseModel, Field  # langchain_core.pydantic_v1 is deprecated
from typing import List
# from langchain_community.chat_models import ChatCohere  # Commented out

# LangChain Vector Stores
from langchain_community.vectorstores import (
    FAISS,
    Chroma,
    Qdrant
)

# LangChain Embeddings
from langchain_community.embeddings import HuggingFaceEmbeddings
from langchain_community.utils.math import cosine_similarity

# LangChain Document Loaders
from langchain_community.document_loaders import (
    WebBaseLoader,
    TextLoader,
    ArxivLoader,
    WikipediaLoader,
    OnlinePDFLoader,
    PyMuPDFLoader,
    PubMedLoader
)

# Google Colab Utilities
from google.colab import userdata

[nltk_data] Downloading package punkt to /root/nltk_data...
[nltk_data]   Unzipping tokenizers/punkt.zip.
[nltk_data] Downloading package punkt_tab to /root/nltk_data...
[nltk_data]   Unzipping tokenizers/punkt_tab.zip.



In [None]:
locale.getpreferredencoding = lambda: "UTF-8"

In [None]:
# Set up GPU if available
device = "cuda" if torch.cuda.is_available() else "cpu"
print(f"Using device: {device}")
print("CUDA available:", torch.cuda.is_available())
print("Device:", torch.cuda.get_device_name(0) if torch.cuda.is_available() else "CPU")

Using device: cuda
CUDA available: True
Device: Tesla T4


## 2. 📊 Load Gold Reference Dataset (Feature 2)

In [None]:
# 30 synthetic patient logs spanning 2024-07-28 to 2025-03-10 (used as context for answering questions)

example_logs = [
    {
        "log_number": 1,
        "date": "2024-07-28",
        "content": "I woke up today feeling really exhausted with an overwhelming sense of tiredness that seemed out of place. I'm starting to wonder if something is beginning."
    },
    {
        "log_number": 2,
        "date": "2024-08-03",
        "content": "This morning brought a dull headache and some body aches. I’m trying to note every detail as I feel more off than usual."
    },
    {
        "log_number": 3,
        "date": "2024-08-10",
        "content": "Today, I experienced a bout of nausea as soon as I got out of bed, and a dizzy spell hit shortly after. It left me feeling uneasy throughout the day."
    },
    {
        "log_number": 4,
        "date": "2024-08-15",
        "content": "I had a restless night even after 8 hours of sleep. Along with a slight sore throat, the persistent fatigue and dizziness made the day challenging."
    },
    {
        "log_number": 5,
        "date": "2024-08-28",
        "content": "This afternoon, I felt my heart racing unexpectedly and a surge of anxiety. It’s unsettling, and I’m beginning to connect these episodes with my ongoing fatigue."
    },
    {
        "log_number": 6,
        "date": "2024-08-29",
        "content": "Nausea is becoming more noticeable each morning. I spent most of the day resting and hydrating, hoping to ease the discomfort."
    },
    {
        "log_number": 7,
        "date": "2024-09-08",
        "content": "Mid-day chores were interrupted by a sudden bout of dizziness. I had to sit down and rest, feeling the chronic fatigue in every muscle."
    },
    {
        "log_number": 8,
        "date": "2024-09-12",
        "content": "I woke up with a persistent headache and some joint pain today. I took some over-the-counter medicine, but the lingering nausea reminded me that something isn’t right."
    },
    {
        "log_number": 9,
        "date": "2024-09-17",
        "content": "This morning was a mix of emotions—mild nausea, muscle soreness, and an overall tired feeling. I’m starting to document a pattern with these symptoms."
    },
    {
        "log_number": 10,
        "date": "2024-09-19",
        "content": "Today was erratic: a brief fever in the early hours followed by dizziness and fatigue that persisted. I’m noticing my energy dipping unpredictably."
    },
    {
        "log_number": 11,
        "date": "2024-09-21",
        "content": "I felt particularly weak this morning. A dizzy spell and slight stomach upset after lunch made me wonder if my body is signaling an underlying issue."
    },
    {
        "log_number": 12,
        "date": "2024-09-27",
        "content": "Most of the day was spent quietly at home. The constant fatigue made even simple tasks feel overwhelming, and the nausea never quite left."
    },
    {
        "log_number": 13,
        "date": "2024-10-05",
        "content": "Today was slightly better—nausea was less severe and the dizziness was more intermittent. Still, the chronic fatigue remains a constant hurdle."
    },
    {
        "log_number": 14,
        "date": "2024-10-14",
        "content": "I’ve noticed my energy levels fluctuating more than ever. Moments of clarity are interrupted by sudden bouts of tiredness and dizziness."
    },
    {
        "log_number": 15,
        "date": "2024-10-15",
        "content": "Tried some home remedies today: herbal teas and light stretching. I felt a bit of improvement, yet the fatigue and nausea continue to linger."
    },
    {
        "log_number": 16,
        "date": "2024-10-17",
        "content": "It was a long day; even with plenty of rest, the overwhelming tiredness coupled with recurring nausea left me feeling anxious about my health."
    },
    {
        "log_number": 17,
        "date": "2024-11-17",
        "content": "My symptoms seem to be taking a toll on my daily routine. Multiple dizzy spells and a constant feeling of exhaustion have me worried about the underlying cause."
    },
    {
        "log_number": 18,
        "date": "2024-12-17",
        "content": "Woke up with a slight fever and throat irritation today. I can’t shake the feeling that these symptoms are interconnected with my chronic fatigue."
    },
    {
        "log_number": 19,
        "date": "2025-01-21",
        "content": "Today, anxiety hit hard along with a foggy mind. The persistent fatigue is affecting my mood and ability to concentrate at work."
    },
    {
        "log_number": 20,
        "date": "2025-01-23",
        "content": "Another dizzy episode occurred this afternoon. I felt my body strain under the constant fatigue, making even simple tasks seem monumental."
    },
    {
        "log_number": 21,
        "date": "2025-01-25",
        "content": "I experienced some sporadic joint pain and a brief period of numbness in my fingers. I documented every detail, hoping it might help identify a pattern."
    },
    {
        "log_number": 22,
        "date": "2025-01-31",
        "content": "Today was a roller coaster of symptoms: waves of nausea, intermittent dizziness, and a relentless fatigue that made me cancel a few appointments."
    },
    {
        "log_number": 23,
        "date": "2025-02-05",
        "content": "I attempted to push through a busy day at work, but the overwhelming exhaustion and repeated dizzy spells forced me to take frequent breaks."
    },
    {
        "log_number": 24,
        "date": "2025-02-08",
        "content": "There were brief moments of clarity today, but then an unexpected wave of fatigue hit me hard. The recurring nausea is becoming impossible to ignore."
    },
    {
        "log_number": 25,
        "date": "2025-02-11",
        "content": "I skipped lunch due to persistent nausea, which only made the lightheadedness worse. My appetite has noticeably diminished over the past few weeks."
    },
    {
        "log_number": 26,
        "date": "2025-02-17",
        "content": "Today was challenging; even a short walk left me dizzy and weak. The mix of symptoms—chronic fatigue, nausea, and dizziness—is hard to shake."
    },
    {
        "log_number": 27,
        "date": "2025-02-21",
        "content": "I felt a slight improvement today, though the fatigue remains. The recurring pattern of morning nausea continues to disrupt my routine."
    },
    {
        "log_number": 28,
        "date": "2025-02-27",
        "content": "I ended the day feeling unusually irritable and stressed. The cumulative effect of these symptoms is impacting my overall well-being and mood."
    },
    {
        "log_number": 29,
        "date": "2025-03-04",
        "content": "Today I experienced a few hours of relative normalcy, but the familiar cycle of fatigue and nausea soon returned, leaving me exhausted by the evening."
    },
    {
        "log_number": 30,
        "date": "2025-03-10",
        "content": "Reflecting on the past months, I’ve noticed several isolated symptoms like headaches and muscle aches. However, the persistent chronic fatigue with recurring nausea and dizziness remains my primary concern. I hope sharing these detailed logs helps shed light on my overall health journey."
    }
]


In [None]:
# Ramakrishna Ramadurgam - you may have a better way of pulling logs into the context window; FYI, this is how I did it for the eval

# Turn the list of logs into a readable string
full_logs_text = "\n\n".join(
    f"{log['date']}: {log['content']}" for log in example_logs
)
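The join above places every log into the prompt. As a minimal sketch of the same `date: content` formatting with an optional date filter and a rough size check: `format_logs` and `rough_token_estimate` are hypothetical helpers, the two sample entries are shortened stand-ins for `example_logs`, and the 4-characters-per-token figure is only a heuristic assumption, not the model tokenizer's count.

```python
from datetime import date

# Two shortened entries in the same shape as `example_logs` (sample data for illustration).
sample_logs = [
    {"log_number": 1, "date": "2024-07-28", "content": "I woke up today feeling really exhausted."},
    {"log_number": 2, "date": "2024-08-03", "content": "This morning brought a dull headache and some body aches."},
]


def format_logs(logs, start=None, end=None):
    """Hypothetical helper: join logs as 'date: content', optionally limited to [start, end]."""
    selected = []
    for log in logs:
        log_date = date.fromisoformat(log["date"])
        if start is not None and log_date < start:
            continue
        if end is not None and log_date > end:
            continue
        selected.append(f"{log['date']}: {log['content']}")
    return "\n\n".join(selected)


def rough_token_estimate(text):
    """Very rough heuristic (about 4 characters per token); not a real tokenizer count."""
    return len(text) // 4


all_text = format_logs(sample_logs)
august_only = format_logs(sample_logs, start=date(2024, 8, 1))

print(all_text)
print("Estimated tokens:", rough_token_estimate(all_text))
```

A filter like this is one possible way to keep the context shorter when only a date range matters; for an exact count, the tokenizer loaded later in the notebook would be the better measure.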


In [None]:
# Gold-standard question set for evaluating the model (note: questions one and two are particularly interesting if we use them to produce the summary output)

gold_standard = {
    "gold_standard_qa": [
        {
            "question": "Can you write a short, first-person summary of the patient’s experience based on the log entries — as if the patient were explaining how they’ve been feeling in a casual, conversational tone?",
            "answer": "Over the past several months, I’ve been feeling constantly worn out, no matter how much I rest. I keep waking up feeling nauseous, and the dizziness hits me randomly throughout the day. It’s been hard to get through normal tasks, and I’ve had to cancel plans or take breaks more often than usual. There are some good days, but they never last. I’ve also had occasional headaches, joint pain, and this weird numbness in my fingers. My heart has raced a few times for no clear reason, and lately I’ve felt more anxious and foggy-headed. It feels like my body’s running on empty and I can’t seem to bounce back."
        },
        {
            "question": "Can you list the key symptoms the patient is experiencing? Do not provide additional context, just list the individual symptoms.",
            "answer": "Constant fatigue, Morning nausea, Random dizzy spells, Headaches, Joint and muscle aches, Sore throat and occasional fever, Racing heart and anxiety, Brain fog and trouble focusing, Loss of appetite, Finger numbness"
        },
        {
            "question": "When did the patient first mention that they were starting to feel really tired?",
            "answer": "The patient first mentioned feeling unusually tired on 2024-07-28, stating that they woke up with an overwhelming sense of tiredness that seemed out of place."
        },
        {
            "question": "What symptoms were recorded on 2024-08-03?",
            "answer": "On 2024-08-03, the patient reported a dull headache and body aches, mentioning they felt 'off' and wanted to note every detail."
        },
        {
            "question": "Has the patient experienced dizziness?",
            "answer": "Yes, the patient has documented multiple instances of dizziness, including on 2024-08-10, 2024-09-08, 2024-09-21, 2025-02-05, and other occasions."
        },
        {
            "question": "When did the patient first mention nausea?",
            "answer": "The first mention of nausea was on 2024-08-10, when the patient experienced nausea upon getting out of bed, followed by a dizzy spell."
        },
        {
            "question": "Is there any mention of anxiety?",
            "answer": "Yes, the patient mentioned feeling anxious on 2024-08-28 when their heart was racing, and again on 2025-01-21."
        },
        {
            "question": "Has the patient reported morning nausea?",
            "answer": "Yes, on 2024-08-29, the patient noted that nausea was becoming more noticeable each morning."
        },
        {
            "question": "On what date did the patient say their symptoms were impacting their ability to work?",
            "answer": "On 2025-01-21, the patient mentioned that fatigue was affecting their ability to work."
        },
        {
            "question": "What was recorded about joint pain?",
            "answer": "The patient reported joint pain on 2025-01-25, describing sporadic pain and numbness in fingers."
        },
        {
            "question": "Did the patient mention any heart-related symptoms?",
            "answer": "Yes, on 2024-08-28, the patient described their heart racing unexpectedly and feeling anxious."
        },
        {
            "question": "When did the patient mention brain fog?",
            "answer": "On 2025-01-21, the patient reported feeling foggy-minded and struggling with concentration at work."
        },
        {
            "question": "Did the patient experience difficulty sleeping?",
            "answer": "Yes, on 2024-08-15, they mentioned having a restless night despite getting 8 hours of sleep."
        },
        {
            "question": "What day did the patient say they had to take frequent breaks at work?",
            "answer": "On 2025-02-05, the patient mentioned that overwhelming exhaustion and dizzy spells forced them to take frequent breaks."
        },
        {
            "question": "When was the first time dizziness was reported?",
            "answer": "Dizziness was first reported on 2024-08-10, following a bout of nausea in the morning."
        },
        {
            "question": "When was the last recorded log, and what was mentioned?",
            "answer": "The last recorded log was on 2025-03-10, where the patient reflected on months of symptoms and noted that chronic fatigue, nausea, and dizziness remained their primary concern."
        },
        {
            "question": "Did the patient ever report a fever?",
            "answer": "Yes, the patient noted a brief fever on 2024-09-19 and again on 2024-12-17."
        },
        {
            "question": "Did the patient ever try home remedies?",
            "answer": "Yes, on 2024-10-15, the patient mentioned trying herbal teas and light stretching but still experiencing lingering fatigue and nausea."
        },
        {
            "question": "What symptoms were reported on 2025-02-11?",
            "answer": "On 2025-02-11, the patient skipped lunch due to persistent nausea, which worsened their lightheadedness and impacted their appetite."
        },
        {
            "question": "Did the patient report ringing in their ears?",
            "answer": "No explicit mention of ringing sound in their ears."
        },
        {
            "question": "What did the patient say about their appetite?",
            "answer": "On 2025-02-11, the patient noted a noticeable decline in their appetite due to persistent nausea."
        },
        {
            "question": "Did the patient ever report shortness of breath?",
            "answer": "No explicit mention of shortness of breath, though dizziness and fatigue were frequently noted."
        },
        {
            "question": "What symptoms were reported on 2024-09-17?",
            "answer": "On 2024-09-17, the patient noted mild nausea, muscle soreness, and an overall tired feeling."
        },
        {
            "question": "Did the patient ever describe a day of relative normalcy?",
            "answer": "Yes, on 2025-03-04, the patient mentioned experiencing a few hours of relative normalcy before fatigue and nausea returned."
        },
        {
            "question": "Did the patient report headaches?",
            "answer": "Yes, headaches were reported on multiple occasions, including 2024-08-03 and 2024-09-12."
        },
        {
            "question": "What symptoms were recorded on 2024-10-17?",
            "answer": "On 2024-10-17, the patient reported feeling extremely tired despite resting and was concerned about their ongoing nausea and fatigue."
        },
        {
            "question": "Did the patient experience any muscle aches?",
            "answer": "Yes, on 2024-08-03 and 2024-09-17, the patient described muscle soreness and body aches."
        },
        {
            "question": "What symptoms did the patient describe on 2024-08-15?",
            "answer": "On 2024-08-15, the patient noted a restless night, a sore throat, and ongoing fatigue and dizziness."
        },
        {
            "question": "When did the patient first describe nausea becoming a persistent issue?",
            "answer": "On 2024-08-29, the patient mentioned that nausea was becoming more noticeable every morning."
        },
        {
            "question": "When did the patient first mention dizziness interfering with their day?",
            "answer": "On 2024-09-08, the patient stated that a dizzy spell interrupted their mid-day chores and made them feel weak."
        },
        {
            "question": "What did the patient say about their symptoms on 2025-02-17?",
            "answer": "On 2025-02-17, the patient reported that the day was challenging; even a short walk left them dizzy and weak. The mix of symptoms (chronic fatigue, nausea, and dizziness) was hard to shake."
        },
        {
            "question": "Did the patient experience any skin-related symptoms?",
            "answer": "No specific skin-related symptoms were documented in the logs."
        },
        {
            "question": "Has the patient ever described their symptoms as cyclical?",
            "answer": "Yes, on 2025-03-10, the patient reflected on months of symptoms, stating that chronic fatigue, nausea, and dizziness followed a persistent pattern."
        },
        {
            "question": "When did the patient mention experiencing numbness?",
            "answer": "On 2025-01-25, the patient reported sporadic joint pain and numbness in their fingers."
        },
        {
            "question": "What was the patient's overall assessment of their symptoms at the end?",
            "answer": "In the final log on 2025-03-10, the patient summarized their symptoms, emphasizing chronic fatigue, nausea, and dizziness as the most persistent concerns."
        }
    ]
}
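Because many gold answers cite specific dates, a small consistency check can help catch dates that are malformed or that have no matching log. The sketch below is an assumption about how such a check might look: `check_gold_dates` is a hypothetical helper, and the sample data is a shortened stand-in (with two deliberate mistakes) rather than the real `gold_standard`.

```python
import re

# Shortened sample data in the same shape as `example_logs` and `gold_standard`.
sample_logs = [
    {"log_number": 1, "date": "2024-07-28", "content": "I woke up today feeling really exhausted."},
    {"log_number": 19, "date": "2025-01-21", "content": "Today, anxiety hit hard along with a foggy mind."},
]
sample_gold = {
    "gold_standard_qa": [
        {"question": "When was tiredness first mentioned?",
         "answer": "The patient first mentioned feeling tired on 2024-07-28."},
        {"question": "Is there any mention of anxiety?",
         "answer": "Yes, on 2025-1-21."},   # malformed date, on purpose
        {"question": "Was dizziness reported?",
         "answer": "Yes, on 2024-12-01."},  # no log on this date, on purpose
    ]
}

DATE_LIKE = re.compile(r"\b\d{4}-\d{1,2}-\d{1,2}\b")  # anything that looks like a date
ISO_DATE = re.compile(r"\d{4}-\d{2}-\d{2}")            # strict YYYY-MM-DD


def check_gold_dates(gold, logs):
    """Hypothetical check: return (question, date, problem) tuples for suspicious dates."""
    log_dates = {log["date"] for log in logs}
    problems = []
    for qa in gold["gold_standard_qa"]:
        for found in DATE_LIKE.findall(qa["answer"]):
            if not ISO_DATE.fullmatch(found):
                problems.append((qa["question"], found, "not YYYY-MM-DD"))
            elif found not in log_dates:
                problems.append((qa["question"], found, "no log on this date"))
    return problems


issues = check_gold_dates(sample_gold, sample_logs)
for question, found, problem in issues:
    print(f"{found}: {problem} ({question})")
```

Running the same function on `gold_standard` and `example_logs` would only confirm that cited dates exist; it does not verify that the symptom named in the answer appears in that day's log.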


## 3. ⚙️ Model & Pipeline Setup
This section loads the language model and sets it up to generate text from prompts.

The tokenizer converts input text into tokens for the model and decodes the model's output tokens back into text.
The model predicts new text given the input tokens.

### **3a. Load model, tokenizer and initial pipeline functions**

In [None]:
model_names = {
    "mistral": "mistralai/Mistral-7B-Instruct-v0.3",
    "llama3": "meta-llama/Llama-3.2-3B-Instruct",
    "med42_8b": "m42-health/Llama3-Med42-8B",
    "med42_70b": "m42-health/Llama3-Med42-70B"
}

In [None]:
# Ramakrishna Ramadurgam - loaded model (note 4 bit)

model_name = model_names["med42_8b"]
tokenizer = AutoTokenizer.from_pretrained(model_name)

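model = AutoModelForCausalLM.from_pretrained(
    model_name,
    torch_dtype="auto",
    device_map="auto",
    # load_in_4bit=True is deprecated; pass a BitsAndBytesConfig (imported in the setup cell) instead
    quantization_config=BitsAndBytesConfig(load_in_4bit=True)
)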


The secret `HF_TOKEN` does not exist in your Colab secrets.
To authenticate with the Hugging Face Hub, create a token in your settings tab (https://huggingface.co/settings/tokens), set it as secret in your Google Colab and restart your session.
You will be able to reuse this secret in all of your notebooks.
Please note that authentication is recommended but still optional to access public models or datasets.



### **3b. Model Pipeline & Prompt Engineering**

In [None]:
# Ramakrishna Ramadurgam - pipeline and parser

from transformers import AutoModelForCausalLM, AutoTokenizer, pipeline
from langchain_huggingface import HuggingFacePipeline  # replaces the deprecated langchain_community.llms import
from langchain_core.prompts import PromptTemplate
from langchain_core.runnables import RunnablePassthrough

def extract_answer_only(text: str) -> str:
    """
    Extracts the final answer from a model response that may contain the full prompt.
    Assumes the answer comes after the last occurrence of 'Answer:'.
    """
    if "Answer:" in text:
        return text.split("Answer:")[-1].strip()
    return text.strip()



# 1. Wrap the already-loaded model + tokenizer in a text-generation pipeline
def build_health_chain_with_model(model, tokenizer, prompt_template, temperature=0.2, max_new_tokens=100):
    text_gen_pipeline = pipeline(
        "text-generation",
        model=model,
        tokenizer=tokenizer,
        max_new_tokens=max_new_tokens,
        temperature=temperature,
        do_sample=True
    )

    llm = HuggingFacePipeline(pipeline=text_gen_pipeline)

    # 🔁 2. Build the LangChain pipeline
    # The chain is invoked with a dict that already holds "patient_logs" and "question",
    # so it feeds the prompt directly. Mapping both keys to RunnablePassthrough() would
    # place the entire input dict into each placeholder.
    chain = (
        prompt_template
        | llm
        | extract_answer_only  # parser
    )
    return chain
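To make the parser's behavior concrete, here is a small standalone sketch of `extract_answer_only` applied to two made-up generations. The strings are illustrative only; whether the real pipeline echoes the prompt depends on the library versions in use.

```python
def extract_answer_only(text: str) -> str:
    """Same logic as the parser above: keep what follows the last 'Answer:'."""
    if "Answer:" in text:
        return text.split("Answer:")[-1].strip()
    return text.strip()


# Made-up generation that echoes the prompt before the completion.
echoed = (
    "Patient Logs:\n2024-07-28: I woke up feeling exhausted.\n\n"
    "Question:\nWhen was tiredness first mentioned?\n\n"
    "Answer: On 2024-07-28."
)
first = extract_answer_only(echoed)
print(first)

# Caveat: if the completion itself contains 'Answer:', only the text after the LAST one is kept.
nested = "Question: ...\n\nAnswer: The log says 'Answer: rest more' on 2024-08-29."
second = extract_answer_only(nested)
print(second)
```

The second case shows a limitation of splitting on the last marker: a completion that repeats the word `Answer:` gets truncated, which is worth keeping in mind when reading low-scoring responses.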


In [None]:
mistral_qa_prompt = PromptTemplate.from_template(
    """You are a helpful medical assistant. Given the patient's health log history and a specific question, provide a concise, informative answer.

Patient's Health Log History:
{patient_logs}

Question:
{question}

Answer:"""
)


In [None]:
llama_friendly_qa_prompt = PromptTemplate.from_template(
    """You are a helpful medical assistant. Given the patient's health log history and a question, provide a short but complete answer using only the information from the logs.

Do not repeat the question or logs. Do not include explanations or disclaimers. Just provide a factual, well-formed sentence.

Patient Logs:
{patient_logs}

Question:
{question}

Answer:"""
)

In [None]:
med42__friendly_qa_prompt = PromptTemplate.from_template(
    """You are a helpful medical assistant. Given the patient's health log history and a specific question, provide a short and informative answer using only the information in the logs.

If the answer involves a date, also include one short sentence explaining what was mentioned on that date.

Do not repeat the question or logs. Do not include disclaimers or generalizations.

Patient Logs:
{patient_logs}

Question:
{question}

Answer:"""
)


In [None]:
# CHAIN OF THOUGHT PROMPT

med42__cot_qa_prompt = PromptTemplate.from_template(
    """You are a helpful medical assistant. Given the patient's health log history and a specific question, follow these steps to arrive at a concise answer:

1. First, reason step by step through the patient's logs to identify relevant entries.
2. Then, synthesize those findings into a short and informative answer.
3. If the answer involves a date, include one short sentence explaining what was mentioned on that date.

Do not include disclaimers or repeat the question or logs in your answer.

Patient Logs:
{patient_logs}

Question:
{question}

Answer:"""
)



In [None]:
# Ramakrishna Ramadurgam - final prompt for final config

# FEW-SHOT PROMPT (best prompt for Med42-8B); the examples use a scenario unrelated to the gold dataset

med42__fewshot_qa_prompt = PromptTemplate.from_template(
    """You are a helpful medical assistant. Given the patient’s health log history and a specific question, provide a short and informative answer using only information found in the logs.

If the answer involves a date, include one short sentence explaining what was mentioned on that date.

Do not repeat the question or logs. Do not include disclaimers or generalizations.

Few Shot Examples:

Example 1
Question: When did the patient first report breathing difficulties?
Answer: On 2023-06-12, the patient noted experiencing shortness of breath while climbing stairs.

Example 2
Question: Has the patient ever experienced chest tightness?
Answer: Yes, chest tightness was reported on 2023-07-04, especially in the evening after outdoor exposure.

Example 3
Question: What symptoms were mentioned on 2023-08-01?
Answer: On 2023-08-01, the patient reported wheezing and using their rescue inhaler twice.

Patient Logs:
{patient_logs}

Question:
{question}

Answer:"""
)
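All of the prompts above share the same two placeholders, `{patient_logs}` and `{question}`, and end with `Answer:`. As a rough illustration of what the model receives once they are filled, the sketch below uses plain `str.format` as a stand-in for `PromptTemplate`; the shortened template and the `render_prompt` helper are assumptions made for this example.

```python
# Plain-Python stand-in for a PromptTemplate with the same two placeholders (shortened template).
TEMPLATE = (
    "You are a helpful medical assistant.\n\n"
    "Patient Logs:\n{patient_logs}\n\n"
    "Question:\n{question}\n\n"
    "Answer:"
)


def render_prompt(patient_logs: str, question: str) -> str:
    """Hypothetical helper: fill the template and return the final prompt string."""
    return TEMPLATE.format(patient_logs=patient_logs, question=question)


prompt_text = render_prompt(
    patient_logs="2024-07-28: I woke up today feeling really exhausted.",
    question="When did the patient first mention feeling tired?",
)
print(prompt_text)
print("Characters in prompt:", len(prompt_text))
```

Printing the rendered prompt is a cheap way to confirm that each placeholder holds only its intended value and to see how much of the context window the logs consume.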



### **3c. Set model hyperparameters and build model**

In [None]:
# Ramakrishna Ramadurgam - chain build

# Create chains - build only ONE chain at a time (building several will crash CUDA), and remember to set the model a couple of cells up

# Model comparison (set the model a couple of cells up, and make sure the prompt matches it)
# qa_chain = build_health_chain_with_model(model, tokenizer, prompt_template=mistral_qa_prompt, temperature=0.2, max_new_tokens=150) # Mistral
# qa_chain = build_health_chain_with_model(model, tokenizer, prompt_template=llama_friendly_qa_prompt, temperature=0.2, max_new_tokens=150) # Llama-3.2-3B
# qa_chain = build_health_chain_with_model(model, tokenizer, prompt_template=med42__friendly_qa_prompt, temperature=0.2, max_new_tokens=150) # Med42-8b

# Temperature Tuning (med42-8b)
# qa_chain = build_health_chain_with_model(model, tokenizer, prompt_template=med42__friendly_qa_prompt, temperature=0.1, max_new_tokens=150) # Med42-8b
# qa_chain = build_health_chain_with_model(model, tokenizer, prompt_template=med42__friendly_qa_prompt, temperature=0.2, max_new_tokens=150) # Med42-8b
# qa_chain = build_health_chain_with_model(model, tokenizer, prompt_template=med42__friendly_qa_prompt, temperature=0.5, max_new_tokens=150) # Med42-8b (Highest)
# qa_chain = build_health_chain_with_model(model, tokenizer, prompt_template=med42__friendly_qa_prompt, temperature=0.9, max_new_tokens=150) # Med42-8b

# Prompt Engineering (med42-8b)
# qa_chain = build_health_chain_with_model(model, tokenizer, prompt_template=med42__friendly_qa_prompt, temperature=0.5, max_new_tokens=150) # Med42-8b - Zero shot
# qa_chain = build_health_chain_with_model(model, tokenizer, prompt_template=med42__cot_qa_prompt, temperature=0.5, max_new_tokens=150) # Med42-8b - Chain of Thought
qa_chain = build_health_chain_with_model(model, tokenizer, prompt_template=med42__fewshot_qa_prompt, temperature=0.5, max_new_tokens=150) # Med42-8b - Few shot



Device set to use cuda:0


In [None]:
# # ✅ Run test inference
# test_log = "I've been having really bad headaches lately."
# response = health_chain.invoke({"health_log": test_log})
# print(response)

In [None]:
import torch
torch.cuda.empty_cache()


In [None]:
response = qa_chain.invoke({
    "patient_logs": full_logs_text,
    "question": "When did the patient first mention that they were starting to feel really tired?"
})

print(response)

Token indices sequence length is longer than the specified maximum sequence length for this model (2565 > 2048). Running this sequence through the model will result in indexing errors


On 2024-07-28, the patient noted feeling "really exhausted with an overwhelming sense of tiredness that seemed out of place."
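Before running the full evaluation, a single response like the one above can be sanity-checked against its gold answer with a simple token-overlap F1. This is a lightweight lexical stand-in, not BERTScore: it only counts shared words, so it will under-credit correct paraphrases. The `tokenize` and `token_f1` helpers are hypothetical names for this sketch.

```python
import re
from collections import Counter


def tokenize(text):
    """Lowercase and split into word/number tokens (keeps dates like 2024-07-28 together)."""
    return re.findall(r"[a-z0-9]+(?:-[a-z0-9]+)*", text.lower())


def token_f1(candidate, reference):
    """Token-overlap F1 between a model answer and a gold answer (not BERTScore)."""
    cand, ref = tokenize(candidate), tokenize(reference)
    if not cand or not ref:
        return 0.0
    overlap = sum((Counter(cand) & Counter(ref)).values())
    if overlap == 0:
        return 0.0
    precision = overlap / len(cand)
    recall = overlap / len(ref)
    return 2 * precision * recall / (precision + recall)


# Model response printed above and the matching gold answer from the gold-standard set.
model_answer = (
    'On 2024-07-28, the patient noted feeling "really exhausted with an '
    'overwhelming sense of tiredness that seemed out of place."'
)
gold_answer = (
    "The patient first mentioned feeling unusually tired on 2024-07-28, stating that "
    "they woke up with an overwhelming sense of tiredness that seemed out of place."
)

print(f"Token F1: {token_f1(model_answer, gold_answer):.2f}")
```

A very low value on a question with a clear factual answer (a date, a symptom name) is a hint to inspect that response by hand before trusting an aggregate score.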


## 6. 📈 Evaluation (BERTScore)

### **6a. Generate model responses for test / eval questions**

In [None]:
from tqdm import tqdm

generated_qa = []

for qa in tqdm(gold_standard["gold_standard_qa"]):

    question = qa["question"]

    try:
        response = qa_chain.invoke({
            "patient_logs": full_logs_text,  # string with logs
            "question": question
        })

        generated_qa.append({
            "question": question,
            "gold_answer": qa["answer"],
            "model_answer": response
        })

        print(f"✅ Q: {question}\n💬 A: {response}\n")
        print("-" * 60)
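    except Exception as e:
        # Record the failure so it is scored as 0 later, and surface it in the log
        print(f"❌ Q: {question}\nError: {e}\n")
        generated_qa.append({
            "question": question,
            "gold_answer": qa["answer"],
            "model_answer": f"Error: {e}"
        })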


  3%|▎         | 1/35 [00:26<14:58, 26.44s/it]

✅ Q: Can you write a short, first-person summary of the patient’s experience based on the log entries — as if the patient were explaining how they’ve been feeling in a casual, conversational tone?
💬 A: Lately, I've been feeling really exhausted, like something's off. My mornings often bring headaches and body aches. Nausea and dizziness have become more frequent, especially after waking up and eating. Even simple tasks feel overwhelming due to persistent fatigue. Some days are better than others, but overall, I'm worried about the underlying cause of these symptoms.

------------------------------------------------------------


  6%|▌         | 2/35 [01:00<16:55, 30.79s/it]

✅ Q: Can you list the key symptoms the patient is experiencing? Do not provide additional context, just list the individual symptoms.
💬 A: chronic fatigue, nausea, dizziness, headache, body aches, sore throat, joint pain, fever, anxiety, muscle soreness, stomach upset, fatigue, lightheadedness, appetite loss, irritability, stress, exhaustion, numbness, fatigue, fatigue, fatigue, fatigue, fatigue, fatigue, fatigue, fatigue, fatigue, fatigue, fatigue, fatigue, fatigue, fatigue, fatigue, fatigue, fatigue, fatigue, fatigue, fatigue, fatigue, fatigue, fatigue, fatigue, fatigue, fatigue, fatigue, fatigue, fatigue, fatigue, fatigue, fatigue, fatigue, fatigue, fatigue, fatigue, fatigue, fatigue, fatigue, fatigue, fatigue, fatigue, fatigue, fatigue, fatigue, fatigue, fatigue, fatigue,

------------------------------------------------------------


  9%|▊         | 3/35 [01:20<13:57, 26.16s/it]

✅ Q: When did the patient first mention that they were starting to feel really tired?
💬 A: On 2024-07-28, the patient noted feeling "really exhausted with an overwhelming sense of tiredness that seemed out of place."

------------------------------------------------------------


 11%|█▏        | 4/35 [01:41<12:17, 23.78s/it]

✅ Q: What symptoms were recorded on 2024-08-03?
💬 A: On 2024-08-03, the patient reported a dull headache and body aches.

------------------------------------------------------------


 14%|█▍        | 5/35 [02:04<11:50, 23.70s/it]

✅ Q: Has the patient experienced dizziness?
💬 A: Yes, the patient experienced dizziness on multiple occasions, as noted in entries such as 2024-08-10, 2024-09-08, 2024-09-21, and 2025-01-23.

------------------------------------------------------------


 17%|█▋        | 6/35 [02:25<10:57, 22.67s/it]

✅ Q: When did the patient first mention nausea?
💬 A: On 2024-08-10, the patient reported experiencing a bout of nausea as soon as they got out of bed.

------------------------------------------------------------


 20%|██        | 7/35 [02:48<10:37, 22.78s/it]

✅ Q: Is there any mention of anxiety?
💬 A: Yes, anxiety was reported on 2024-08-28, especially in the morning along with heart racing and a surge of anxiety. It’s unsettling, and I’m beginning to connect these episodes with my ongoing fatigue.

------------------------------------------------------------


 23%|██▎       | 8/35 [03:09<10:02, 22.30s/it]

✅ Q: Has the patient reported morning nausea?
💬 A: Yes, morning nausea was reported on 2024-08-10 and has been noted as recurring since then, especially in the morning.

------------------------------------------------------------


 26%|██▌       | 9/35 [03:32<09:42, 22.41s/it]

You seem to be using the pipelines sequentially on GPU. In order to maximize efficiency please use a dataset


✅ Q: On what date did the patient say their symptoms were impacting their ability to work?
💬 A: 2025-01-21: Today, anxiety hit hard along with a foggy mind. The persistent fatigue is affecting my mood and ability to concentrate at work.

------------------------------------------------------------


 29%|██▊       | 10/35 [03:53<09:10, 22.01s/it]

✅ Q: What was recorded about joint pain?
💬 A: On 2024-09-12, the patient noted experiencing joint pain along with a persistent headache and lingering nausea.

------------------------------------------------------------


 31%|███▏      | 11/35 [04:15<08:52, 22.19s/it]

✅ Q: Did the patient mention any heart-related symptoms?
💬 A: Yes, on 2024-08-28, the patient felt their heart racing unexpectedly and experienced a surge of anxiety, which they are beginning to connect with their ongoing fatigue.

------------------------------------------------------------


 34%|███▍      | 12/35 [04:37<08:24, 21.94s/it]

✅ Q: When did the patient mention brain fog?
💬 A: On 2025-01-21, the patient reported experiencing anxiety along with a foggy mind, indicating brain fog.

------------------------------------------------------------


 37%|███▋      | 13/35 [04:59<08:01, 21.90s/it]

✅ Q: Did the patient experience difficulty sleeping?
💬 A: On 2024-08-15, the patient reported having a restless night even after 8 hours of sleep, suggesting difficulty sleeping.

------------------------------------------------------------


 40%|████      | 14/35 [05:21<07:42, 22.01s/it]

✅ Q: What day did the patient say they had to take frequent breaks at work?
💬 A: On 2025-02-05, the patient mentioned having to take frequent breaks at work due to overwhelming exhaustion and repeated dizzy spells.

------------------------------------------------------------


 43%|████▎     | 15/35 [05:42<07:16, 21.84s/it]

✅ Q: When was the first time dizziness was reported?
💬 A: On 2024-08-10, the patient experienced a dizzy spell shortly after waking up and feeling nauseous.

------------------------------------------------------------


 46%|████▌     | 16/35 [06:06<07:03, 22.29s/it]

✅ Q: When was the last recorded log, and what was mentioned?
💬 A: The last recorded log was on 2025-03-10, where the patient reflected on the past months, noting persistent chronic fatigue, recurring nausea, and dizziness as their primary concern.

------------------------------------------------------------


 49%|████▊     | 17/35 [06:28<06:39, 22.17s/it]

✅ Q: Did the patient ever report a fever?
💬 A: Yes, a brief fever was reported on 2024-09-19, which was followed by dizziness and fatigue that persisted.

------------------------------------------------------------


 51%|█████▏    | 18/35 [06:49<06:12, 21.91s/it]

✅ Q: Did the patient ever try home remedies?
💬 A: Yes, the patient tried home remedies on 2024-10-15, specifically herbal teas and light stretching.

------------------------------------------------------------


 54%|█████▍    | 19/35 [07:12<05:58, 22.42s/it]

✅ Q: What symptoms were reported on 2025-02-11?
💬 A: On 2025-02-11, the patient skipped lunch due to persistent nausea, which only made the lightheadedness worse. Their appetite has noticeably diminished over the past few weeks.

------------------------------------------------------------


 57%|█████▋    | 20/35 [07:32<05:24, 21.64s/it]

✅ Q: Did the patient report ringing in their ears?
💬 A: No, the patient did not report ringing in their ears.

------------------------------------------------------------


 60%|██████    | 21/35 [07:54<05:01, 21.53s/it]

✅ Q: What did the patient say about their appetite?
💬 A: On 2025-02-11, the patient noted a noticeable diminishment of appetite due to persistent nausea.

------------------------------------------------------------


 63%|██████▎   | 22/35 [08:14<04:35, 21.22s/it]

✅ Q: Did the patient ever report shortness of breath?
💬 A: No, the patient did not report shortness of breath in their health logs.

------------------------------------------------------------


 66%|██████▌   | 23/35 [08:37<04:20, 21.75s/it]

✅ Q: What symptoms were reported on 2024-09-17?
💬 A: On 2024-09-17, the patient reported mild nausea, muscle soreness, and an overall tired feeling, starting to document a pattern with these symptoms.

------------------------------------------------------------


 69%|██████▊   | 24/35 [08:58<03:56, 21.53s/it]

✅ Q: Did the patient ever describe a day of relative normalcy?
💬 A: On 2025-03-04, the patient reported experiencing a few hours of relative normalcy.

------------------------------------------------------------


 71%|███████▏  | 25/35 [09:19<03:33, 21.36s/it]

✅ Q: Did the patient report headaches?
💬 A: Yes, the patient reported headaches on 2024-08-03, along with body aches.

------------------------------------------------------------


 74%|███████▍  | 26/35 [09:41<03:13, 21.54s/it]

✅ Q: What symptoms were recorded on 2024-10-17?
💬 A: On 2024-10-17, the patient reported overwhelming tiredness, recurring nausea, and fatigue that persisted despite rest.

------------------------------------------------------------


 77%|███████▋  | 27/35 [10:02<02:50, 21.35s/it]

✅ Q: Did the patient experience any muscle aches?
💬 A: Yes, muscle aches were reported on 2024-08-03, specifically body aches.

------------------------------------------------------------


 80%|████████  | 28/35 [10:26<02:34, 22.13s/it]

✅ Q: What symptoms did the patient describe on 2024-08-15?
💬 A: On 2024-08-15, the patient reported having a restless night even after 8 hours of sleep, along with a slight sore throat, persistent fatigue, and dizziness, making the day challenging.

------------------------------------------------------------


 83%|████████▎ | 29/35 [10:48<02:12, 22.05s/it]

✅ Q: When did the patient first describe nausea becoming a persistent issue?
💬 A: On 2024-08-10, the patient noted experiencing nausea as soon as they got out of bed, along with a dizzy spell.

------------------------------------------------------------


 86%|████████▌ | 30/35 [11:11<01:51, 22.32s/it]

✅ Q: When did the patient first mention dizziness interfering with their day?
💬 A: On 2024-08-10, the patient reported a dizzy spell that interfered with their day, along with nausea and feeling uneasy throughout the day.

------------------------------------------------------------


 89%|████████▊ | 31/35 [11:34<01:30, 22.53s/it]

✅ Q: What did the patient say about their symptoms on 2025-02-17?
💬 A: Today was challenging; even a short walk left me dizzy and weak. The mix of symptoms—chronic fatigue, nausea, and dizziness—is hard to shake.

------------------------------------------------------------


 91%|█████████▏| 32/35 [11:54<01:05, 21.74s/it]

✅ Q: Did the patient experience any skin-related symptoms?
💬 A: No, the patient did not report any skin-related symptoms.

------------------------------------------------------------


 94%|█████████▍| 33/35 [12:18<00:44, 22.45s/it]

✅ Q: Has the patient ever described their symptoms as cyclical?
💬 A: No, the patient has not described their symptoms as cyclical. The patient's logs indicate a persistent and fluctuating pattern of fatigue, nausea, dizziness, and other symptoms, but there is no indication of a cyclical pattern.

------------------------------------------------------------


 97%|█████████▋| 34/35 [12:39<00:22, 22.10s/it]

✅ Q: When did the patient mention experiencing numbness?
💬 A: On 2025-01-25, the patient reported experiencing a brief period of numbness in their fingers.

------------------------------------------------------------


100%|██████████| 35/35 [13:03<00:00, 22.40s/it]

✅ Q: What was the patient's overall assessment of their symptoms at the end?
💬 A: At the end, the patient described their symptoms as a persistent chronic fatigue with recurring nausea and dizziness, affecting their daily routine and overall well-being. Despite some slight improvements, the symptoms continue to disrupt their routine and impact their mood.

------------------------------------------------------------





In [None]:
generated_qa[1]

{'question': 'Can you list the key symptoms the patient is experiencing? Do not provide additional context, just list the individual symptoms.',
 'gold_answer': 'Constant fatigue, Morning nausea, Random dizzy spells, Headaches, Joint and muscle aches, Sore throat and occasional fever, Racing heart and anxiety, Brain fog and trouble focusing, Loss of appetite, Finger numbness',
 'model_answer': 'chronic fatigue, nausea, dizziness, headache, body aches, sore throat, joint pain, fever, anxiety, muscle soreness, stomach upset, fatigue, lightheadedness, appetite loss, irritability, stress, exhaustion, numbness, fatigue, fatigue, fatigue, fatigue, fatigue, fatigue, fatigue, fatigue, fatigue, fatigue, fatigue, fatigue, fatigue, fatigue, fatigue, fatigue, fatigue, fatigue, fatigue, fatigue, fatigue, fatigue, fatigue, fatigue, fatigue, fatigue, fatigue, fatigue, fatigue, fatigue, fatigue, fatigue, fatigue, fatigue, fatigue, fatigue, fatigue, fatigue, fatigue, fatigue, fatigue, fatigue, fatigue,
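
The answer above degenerates into a long run of "fatigue" and is cut off at the token limit. One way to flag such outputs before scoring is to measure how many comma-separated items repeat an earlier one. This is a post-processing sketch, not part of the original pipeline; `split_items`, `dedupe_list_answer`, and `repetition_ratio` are hypothetical names introduced for illustration.

```python
def split_items(answer: str) -> list:
    """Split a comma-separated answer into stripped, non-empty items."""
    return [item.strip() for item in answer.split(",") if item.strip()]


def dedupe_list_answer(answer: str) -> str:
    """Keep the first occurrence of each item (case-insensitive), preserving order."""
    seen = set()
    kept = []
    for item in split_items(answer):
        key = item.lower()
        if key not in seen:
            seen.add(key)
            kept.append(item)
    return ", ".join(kept)


def repetition_ratio(answer: str) -> float:
    """Fraction of items that repeat an earlier item; 0.0 means no repeats."""
    items = split_items(answer)
    if not items:
        return 0.0
    unique = len({item.lower() for item in items})
    return 1.0 - unique / len(items)


example = "nausea, dizziness, fatigue, numbness, fatigue, fatigue, fatigue,"
print(dedupe_list_answer(example))
print(round(repetition_ratio(example), 2))
```

A high ratio is probably better logged as a signal to revisit the generation settings than silently cleaned, since deduplicating changes what the metrics measure.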

In [None]:
# import pandas as pd

# # Convert to DataFrame
# df_qa = pd.DataFrame(generated_qa)

# df_qa.rename(columns={
#     "question": "Question",
#     "gold_answer": "Gold Answer",
#     "model_answer": "Model Answer"
# }, inplace=True)

# # Save to Excel
# excel_path = "/content/generated_qa.xlsx"
# df_qa.to_excel(excel_path, index=False)

# # Download in Colab
# from google.colab import files
# files.download(excel_path)



## 7. 📊 Results Visualization

In [None]:
torch.cuda.empty_cache()

import logging
logging.getLogger("transformers.modeling_utils").setLevel(logging.ERROR)

### **7b. Define Metric Calculation Functions**

In [None]:
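# BERTScore, BLEU, and ROUGE functions
import evaluate
import nltk
from bert_score import score
from nltk.translate.bleu_score import SmoothingFunction, sentence_bleu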

def calculate_bertscore(generated_texts, reference_texts):
    """
    Calculate BERTScore for a list of generated texts against reference texts.

    Args:
        generated_texts: List of generated text strings
        reference_texts: List of reference text strings

    Returns:
        Dict containing lists of precision, recall, f1 scores and their averages
    """
    # Calculate BERTScore
    P, R, F1 = score(generated_texts, reference_texts, lang="en", verbose=False)

    return {
        "precision": P.tolist(),
        "recall": R.tolist(),
        "f1": F1.tolist(),
        "avg_precision": float(P.mean()),
        "avg_recall": float(R.mean()),
        "avg_f1": float(F1.mean())
    }

def calculate_bleu(generated_text, reference_text):
    """
    Calculate BLEU score for a single generated text against a reference.

    Args:
        generated_text: String of generated text
        reference_text: String of reference text

    Returns:
        BLEU score as a float
    """
    smooth = SmoothingFunction().method1
    reference = [nltk.word_tokenize(reference_text)]
    hypothesis = nltk.word_tokenize(generated_text)

    return sentence_bleu(reference, hypothesis, smoothing_function=smooth)

def calculate_rouge(generated_texts, reference_texts):
    """
    Calculate ROUGE scores for a list of generated texts against reference texts.

    Args:
        generated_texts: List of generated text strings
        reference_texts: List of reference text strings

    Returns:
        Dict containing ROUGE scores
    """
    rouge = evaluate.load("rouge")
    return rouge.compute(predictions=generated_texts, references=reference_texts)
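
To make the `rougeL` value more concrete, here is a minimal pure-Python sketch of an F-measure based on the longest common subsequence (LCS). It is an illustration under simplifying assumptions (lowercasing and whitespace tokenization only), so its numbers will not necessarily match those from `evaluate.load("rouge")`, which applies its own tokenization. `lcs_length` and `rouge_l_f1` are hypothetical helper names.

```python
def lcs_length(a: list, b: list) -> int:
    """Length of the longest common subsequence of two token lists."""
    prev = [0] * (len(b) + 1)
    for tok_a in a:
        curr = [0]
        for j, tok_b in enumerate(b, start=1):
            if tok_a == tok_b:
                curr.append(prev[j - 1] + 1)
            else:
                curr.append(max(prev[j], curr[j - 1]))
        prev = curr
    return prev[-1]


def rouge_l_f1(prediction: str, reference: str) -> float:
    """LCS-based F1 between a prediction and a reference (whitespace tokens)."""
    pred_tokens = prediction.lower().split()
    ref_tokens = reference.lower().split()
    if not pred_tokens or not ref_tokens:
        return 0.0
    lcs = lcs_length(pred_tokens, ref_tokens)
    if lcs == 0:
        return 0.0
    precision = lcs / len(pred_tokens)
    recall = lcs / len(ref_tokens)
    return 2 * precision * recall / (precision + recall)


pred = "the patient reported a dull headache"
gold = "patient reported a headache and body aches"
print(round(rouge_l_f1(pred, gold), 3))
```

Because the score rewards shared word order rather than meaning, a correct answer phrased differently from the gold answer can score low, which is one reason the composite score leans on BERTScore.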

### **7c. Calculate Metrics**

In [None]:
from bert_score import BERTScorer
scorer = BERTScorer(lang="en", model_type="roberta-large")

qa_results = []

for entry in generated_qa:
    question = entry["question"]
    gold = entry["gold_answer"]
    pred = entry["model_answer"]

    # Failed generations are stored as "Error: ..." and scored as 0
    if pred.startswith("Error: "):
        qa_results.append({
            "question": question,
            "avg_bert_f1": 0,
            "rouge_l": 0,
            "bleu": 0,
            "composite": 0
        })
        continue

    # Evaluate scores
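    # Reuse the preloaded scorer so the model is not reloaded for every question
    P, R, F1 = scorer.score([pred], [gold])
    bert_result = {"avg_f1": float(F1.mean())}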
    rouge_result = calculate_rouge([pred], [gold])
    bleu_result = calculate_bleu(pred, gold)

    # Composite weighted score (adjust weights if needed)
    weights = {"bert": 0.65, "rouge": 0.3, "bleu": 0.05}
    composite_score = (
        weights["bert"] * bert_result["avg_f1"] +
        weights["rouge"] * rouge_result["rougeL"] +
        weights["bleu"] * bleu_result
    )

    qa_results.append({
        "question": question,
        "avg_bert_f1": bert_result["avg_f1"],
        "rouge_l": rouge_result["rougeL"],
        "bleu": bleu_result,
        "composite": composite_score
    })



### **7e. Display evaluation summary with overall averages**

In [None]:
import pandas as pd

# Create a DataFrame
qa_df = pd.DataFrame(qa_results)

# Calculate overall averages
overall_avg = {
    "question": "OVERALL AVG",
    "avg_bert_f1": qa_df["avg_bert_f1"].mean(),
    "rouge_l": qa_df["rouge_l"].mean(),
    "bleu": qa_df["bleu"].mean(),
    "composite": qa_df["composite"].mean()
}

# Append overall avg
qa_df_with_avg = pd.concat([
    qa_df,
    pd.DataFrame([overall_avg])
], ignore_index=True)

# Display
print("📊 Q&A Model Evaluation Summary:")
display(qa_df_with_avg[["question", "composite", "avg_bert_f1", "rouge_l", "bleu"]])


📊 Q&A Model Evaluation Summary:


| | question | composite | avg_bert_f1 | rouge_l | bleu |
|---|---|---|---|---|---|
| 0 | Can you write a short, first-person summary of... | 0.619911 | 0.885444 | 0.146893 | 0.006089 |
| 1 | Can you list the key symptoms the patient is e... | 0.588548 | 0.825031 | 0.173077 | 0.007084 |
| 2 | When did the patient first mention that they w... | 0.827647 | 0.950322 | 0.625 | 0.448752 |
| 3 | What symptoms were recorded on 2024-08-03? | 0.857761 | 0.957858 | 0.722222 | 0.369724 |
| 4 | Has the patient experienced dizziness? | 0.804499 | 0.950933 | 0.566038 | 0.331627 |
| 5 | When did the patient first mention nausea? | 0.749569 | 0.943329 | 0.444444 | 0.061436 |
| 6 | Is there any mention of anxiety? | 0.662689 | 0.88849 | 0.280702 | 0.019196 |
| 7 | Has the patient reported morning nausea? | 0.693661 | 0.914701 | 0.324324 | 0.036155 |
| 8 | On what date did the patient say their symptom... | 0.731814 | 0.916266 | 0.45 | 0.024826 |
| 9 | What was recorded about joint pain? | 0.690048 | 0.923554 | 0.294118 | 0.030065 |
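
As a sanity check on the composite column, the weighted sum can be recomputed by hand from one row of the summary. The sketch below uses the weights from the metric cell (0.65, 0.3, 0.05) and the values shown for the question with index 2. `composite_score` is a hypothetical helper that mirrors the inline expression, not a function defined in the notebook.

```python
WEIGHTS = {"bert": 0.65, "rouge": 0.3, "bleu": 0.05}


def composite_score(bert_f1: float, rouge_l: float, bleu: float, weights=WEIGHTS) -> float:
    """Weighted sum of the three metrics, mirroring the notebook's inline formula."""
    return (
        weights["bert"] * bert_f1
        + weights["rouge"] * rouge_l
        + weights["bleu"] * bleu
    )


# Values copied from the summary row with index 2 (rounded as displayed).
row = {"avg_bert_f1": 0.950322, "rouge_l": 0.625, "bleu": 0.448752}
value = composite_score(row["avg_bert_f1"], row["rouge_l"], row["bleu"])
print(round(value, 6))  # the table reports 0.827647 for this row
```

Since BERTScore F1 stays above 0.82 in every row shown while ROUGE-L and BLEU vary widely, most of the spread in the composite comes from the 0.3 weight on ROUGE-L.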


In [None]:
# # Save just the selected columns to Excel
# qa_df_with_avg[["question", "composite", "avg_bert_f1", "rouge_l", "bleu"]].to_excel("/content/qa_evaluation_metrics.xlsx", index=False)
# from google.colab import files
# files.download("/content/qa_evaluation_metrics.xlsx")

