<a href="https://colab.research.google.com/github/RubinThomas75/epfLLM-eval/blob/main/1_load_model.ipynb" target="_parent"><img src="https://colab.research.google.com/assets/colab-badge.svg" alt="Open In Colab"/></a>

In [10]:
# Install Dependencies
!pip install transformers==4.47.1  # or your preferred version
!pip install torch                # or 'torch==2.0.0' for a specific version
!pip install accelerate         # (Optional) for efficient inference on multi-GPU
!pip install fuzzywuzzy

Collecting fuzzywuzzy
  Downloading fuzzywuzzy-0.18.0-py2.py3-none-any.whl.metadata (4.9 kB)
Downloading fuzzywuzzy-0.18.0-py2.py3-none-any.whl (18 kB)
Installing collected packages: fuzzywuzzy
Successfully installed fuzzywuzzy-0.18.0


In [3]:
import os
from google.colab import drive

drive.mount('/content/drive')
token_file_path = "/content/drive/MyDrive/hf_read_token.txt"

with open(token_file_path, "r", encoding="utf-8-sig") as f:
    token = f.read().strip()

os.environ["HF_TOKEN"] = token

print("Hugging Face token loaded.")

Mounted at /content/drive
Hugging Face token loaded.


In [4]:
# Load EPFL LLM (Meditron)

# Replace "epfLLM/meditron" with the exact model path from Hugging Face if needed

import torch
from transformers import AutoTokenizer, AutoModelForCausalLM

model_name = "epfl-llm/meditron-7b"
cache_dir = "/content/drive/MyDrive/epfLLM_meditron7b"

# Load tokenizer
print(f"Loading tokenizer for {model_name}...")
tokenizer = AutoTokenizer.from_pretrained(
    model_name,
    cache_dir=cache_dir,
    use_auth_token=os.environ["HF_TOKEN"]
)

# Load model
print(f"Loading model for {model_name}...")
model = AutoModelForCausalLM.from_pretrained(
    model_name,
    cache_dir=cache_dir,
    use_auth_token=os.environ["HF_TOKEN"]
)

# Move model to GPU if available
device = torch.device("cuda" if torch.cuda.is_available() else "cpu")
model.to(device)
print(f"Model loaded on device: {device}")

Loading tokenizer for epfl-llm/meditron-7b...




Loading model for epfl-llm/meditron-7b...




Loading checkpoint shards:   0%|          | 0/8 [00:00<?, ?it/s]

Model loaded on device: cuda


In [6]:
# List of prompts
prompts = [ "What are the common symptoms of COVID-19?", "What is the treatment for a common cold?", "What causes high blood pressure?", "What is the main cause of type 2 diabetes?", "How is asthma treated?", "What is a stroke?", "What are the symptoms of a heart attack?", "What is the treatment for pneumonia?", "What are the causes of back pain?", "How is tuberculosis diagnosed?", "What is the treatment for hypothyroidism?", "What are the risks of smoking?", "What is the function of the liver?", "What are the signs of dehydration?", "What causes Alzheimer’s disease?", "How do you treat high cholesterol?", "What is a common cause of sudden weight loss?", "How do you treat an allergic reaction?", "What are the symptoms of hepatitis?", "What causes rheumatoid arthritis?", "What is the treatment for osteoporosis?", "What is the role of insulin in the body?", "What are the signs of a bacterial infection?", "What causes acid reflux?", "How is a heart murmur diagnosed?", "What are the symptoms of appendicitis?", "What is the treatment for a sprained ankle?", "What are the symptoms of depression?", "What is the treatment for a stroke?", "What causes insomnia?", "How do you treat a broken bone?", "What are the side effects of chemotherapy?", "What is sleep apnea?", "What causes acne?", "How is gout treated?", "What is the treatment for a seizure?", "What is the cause of high cholesterol?", "How do you diagnose diabetes?", "What is the treatment for a sinus infection?", "How do you prevent a stroke?", "What are the side effects of antidepressants?", "What is a common cause of sudden chest pain?", "What are the causes of a stroke?", "What is the role of the thyroid gland in metabolism?", "How is diabetes managed with lifestyle changes?", "What are common causes of fatigue in adults?", "How is high blood pressure diagnosed?", "What is a common cause of dizziness?", "What are the risk factors for developing breast cancer?", "How is rheumatoid arthritis diagnosed?", "What is a mammogram used for?", "What is the purpose of a blood glucose test?", "How can stress be managed to prevent health issues?", "What is the treatment for bacterial pneumonia?", "What is the difference between a heart attack and a stroke?", "What causes chronic fatigue syndrome?", "How do doctors treat a herniated disc?", "What are the signs of dehydration in children?", "What is the role of a cardiologist?", "What is an EKG used for?", "How do you treat asthma attacks?", "What are the causes of acne in adults?", "How is the flu treated?", "What are the symptoms of a brain tumor?", "How is hypertension managed?", "What is the function of the immune system?", "How does smoking affect the lungs?", "What are the signs of an allergic reaction?", "What are the side effects of oral contraceptives?", "What is the most common cause of dizziness in elderly individuals?", "How is dehydration treated?", "What are the symptoms of dehydration in infants?", "How do you prevent diabetes type 2?", "What is the treatment for IBS (Irritable Bowel Syndrome)?", "What is the difference between type 1 and type 2 diabetes?", "How is colorectal cancer diagnosed?", "What is a colonoscopy used for?", "What is a pacemaker used for?", "How do doctors test for HIV?", "What is the difference between a cold and the flu?", "How is a kidney stone treated?", "What causes a fever?", "What are the symptoms of chronic obstructive pulmonary disease (COPD)?", "How is liver cirrhosis diagnosed?", "What is a chest X-ray used for?", "How is multiple sclerosis diagnosed?", "What are common causes of shortness of breath?", "How do you treat a burn?", "What causes a headache after eating?", "How is bipolar disorder treated?", "What is the difference between an MRI and a CT scan?", "How do you treat food poisoning?", "What are the causes of bloating and gas?" ]

# List of corresponding ground truths
ground_truth = [ "Fever, cough, fatigue, shortness of breath, loss of taste or smell, sore throat, body aches.", "Rest, hydration, over-the-counter pain relievers (ibuprofen or acetaminophen), decongestants, and cough syrup.", "Genetics, obesity, lack of exercise, poor diet (high salt), smoking, alcohol consumption, and stress.", "Insulin resistance due to obesity, sedentary lifestyle, and poor diet.", "Inhalers (bronchodilators), corticosteroids, avoiding triggers, and lifestyle modifications.", "A stroke occurs when there is an interruption of blood flow to the brain, leading to brain cell death.", "Chest pain, shortness of breath, nausea, lightheadedness, pain in the arm, back, or jaw.", "Antibiotics (for bacterial pneumonia), rest, fluids, and in some cases, oxygen therapy.", "Muscle strain, poor posture, herniated discs, osteoarthritis, and spinal stenosis.", "Chest X-ray, sputum culture, TB skin test, or blood tests.", "Synthetic thyroid hormone replacement (levothyroxine).", "Lung cancer, heart disease, stroke, respiratory infections, chronic obstructive pulmonary disease (COPD).", "Detoxification, protein synthesis, production of bile, metabolism of fats, carbohydrates, and proteins.", "Dry mouth, fatigue, dark-colored urine, dizziness, confusion.", "The exact cause is unknown, but factors include genetics, aging, and environmental influences.", "Statins, lifestyle changes (healthy diet, exercise), and sometimes other lipid-lowering medications.", "Hyperthyroidism, diabetes, cancer, gastrointestinal issues, and malnutrition.", "Antihistamines, corticosteroids, and in severe cases, epinephrine.", "Fatigue, jaundice, abdominal pain, loss of appetite, nausea, and dark urine.", "Autoimmune disorder where the immune system attacks the joints.", "Calcium and vitamin D supplementation, weight-bearing exercises, and medications like bisphosphonates.", "Insulin helps regulate blood sugar by allowing glucose to enter cells for energy.", "Fever, chills, redness, swelling, and pus or drainage from a wound.", "Weakening of the lower esophageal sphincter, obesity, smoking, and certain foods.", "Physical examination with a stethoscope, echocardiogram, and sometimes an EKG.", "Abdominal pain (starting around the belly button), nausea, vomiting, fever, and loss of appetite.", "R.I.C.E (Rest, Ice, Compression, Elevation), pain relievers, and physical therapy if needed.", "Persistent sadness, loss of interest, fatigue, changes in appetite or sleep, and thoughts of death or suicide.", "Immediate medical treatment, thrombolytic drugs (if within the first few hours), physical therapy.", "Stress, anxiety, poor sleep habits, medical conditions, and certain medications.", "Immobilization (cast or splint), pain management, and sometimes surgery.", "Nausea, vomiting, hair loss, fatigue, weakened immune system, and mouth sores.", "A disorder where breathing repeatedly stops and starts during sleep, often causing daytime fatigue.", "Excess oil production, clogged hair follicles, bacteria, and hormones (especially during puberty).", "Anti-inflammatory drugs (NSAIDs, colchicine), uric acid-lowering medications, and dietary changes.", "Antiepileptic drugs, lifestyle modifications, and in some cases, surgery.", "Diet high in saturated fats, genetics, lack of physical activity, and smoking.", "Fasting blood glucose test, oral glucose tolerance test, or A1C test.", "Decongestants, nasal sprays, rest, and sometimes antibiotics.", "Controlling blood pressure, cholesterol levels, and diabetes, avoiding smoking, and maintaining a healthy diet.", "Nausea, weight gain, sleep disturbances, sexual dysfunction, and dry mouth.", "Angina, heart attack, acid reflux, or a panic attack.", "A blocked artery or bleeding in the brain due to high blood pressure, heart disease, or aneurysms.", "The thyroid helps regulate metabolism, growth, and energy levels by producing hormones (T3 and T4).", "Exercise, weight loss, healthy eating, and blood sugar monitoring.", "Stress, sleep deprivation, anemia, thyroid issues, and chronic illnesses.", "High blood pressure, high cholesterol, obesity, and smoking.", "Benign paroxysmal positional vertigo, dehydration, or a vestibular disorder.", "Obesity, family history, age, and hormone replacement therapy.", "Joint pain, swelling, and stiffness, diagnosed through physical examination and blood tests.", "A mammogram detects changes in breast tissue, including early signs of cancer.", "A blood glucose test measures your blood sugar levels to diagnose diabetes or monitor its management.", "Regular physical activity, meditation, deep breathing exercises, and a healthy work-life balance.", "Antibiotics, oxygen, fluids, and sometimes mechanical ventilation.", "Chest pain and pressure, nausea, dizziness, and shortness of breath are common symptoms.", "Muscle weakness, loss of balance, slurred speech, and tremors.", "Physical therapy, stress management, and in some cases, surgery to remove herniated disc material.", "Clear fluids, rehydration solutions, and electrolyte replacement.", "Regular physical activity, healthy diet, no smoking, and maintaining normal cholesterol and blood pressure.", "Cognitive therapy, medications, exercise, and avoiding excessive alcohol consumption.", "Drugs like insulin or sulfonylureas to regulate blood sugar levels and lifestyle changes.", "Medications, heart monitoring, and exercise therapy.", "Corticosteroids or DMARDs (disease-modifying antirheumatic drugs), and joint protection techniques.", "Stress reduction, weight control, avoiding triggers like high-fat foods.", "MRI, physical exams, and sometimes a biopsy." ]


In [33]:
model.config.pad_token_id = model.config.eos_token_id


def generate_response(prompt):
      default_prompt = """
    You are a medical assistant, and will be asked to answer questions. The questions come in multiple choice format. Please respond with your answer choice, as a letter, and that only. For example, for a question, respond with "A" or "B" or "C" or "D". If you don't know, say "E".
    Question:

    """
  input_text = default_prompt + prompt
  input_ids = tokenizer.encode(prompt, return_tensors="pt").to(device)
  attention_mask = torch.ones(input_ids.shape, device=device)

  # Generate the response with attention mask
  with torch.no_grad():
      output_ids = model.generate(
          input_ids,
          attention_mask=attention_mask,
          max_length=200,         # Adjust for longer/shorter responses
          max_new_tokens=20,
          num_beams=5,          # Increase for more exhaustive search
          early_stopping=True,
          no_repeat_ngram_size=2,
          top_p=0.9,              # Use nucleus sampling
          top_k=50                # Use top-k sampling to narrow down possibilities
      )

  # Decode and print the output
  response = tokenizer.decode(output_ids[0], skip_special_tokens=True)
  return response


In [21]:
# Required libraries
import numpy as np
from sklearn.feature_extraction.text import TfidfVectorizer
from sklearn.metrics.pairwise import cosine_similarity
from fuzzywuzzy import fuzz

# Function to compute both cosine similarity and fuzzy matching score
def evaluate_responses(response_index, generated_response):
    # Create a TF-IDF Vectorizer for cosine similarity
    vectorizer = TfidfVectorizer()

    # Cosine Similarity
    tfidf_matrix = vectorizer.fit_transform([generated_response, ground_truth[response_index]])
    cosine_sim = cosine_similarity(tfidf_matrix[0:1], tfidf_matrix[1:2])[0][0]

    # Fuzzy Matching (using fuzzywuzzy for partial ratio)
    fuzzy_score = fuzz.partial_ratio(generated_response.lower(), ground_truth[response_index].lower())

    return cosine_sim, fuzzy_score

for i in range(len(prompts)):
  x = generate_response(prompts[i])
  print(prompts[i])
  print(ground_truth[i])
  print(x)
  print("_________________________________________________")
  cos_score, fuz_score = evaluate_responses(i, x)
  print(f"{i} {cos_score} {fuz_score}")


Setting `pad_token_id` to `eos_token_id`:2 for open-end generation.
Setting `pad_token_id` to `eos_token_id`:2 for open-end generation.


What are the common symptoms of COVID-19?
Fever, cough, fatigue, shortness of breath, loss of taste or smell, sore throat, body aches.
- Fever, cough, shortness of breath, fatigue, muscle or body aches, headache, new loss of taste or smell, sore throat, congestion or runny nose, nausea or vomiting, and diarrhea.
_________________________________________________
0 0.6742740102778486 59


Setting `pad_token_id` to `eos_token_id`:2 for open-end generation.


What is the treatment for a common cold?
Rest, hydration, over-the-counter pain relievers (ibuprofen or acetaminophen), decongestants, and cough syrup.
A cold is a viral infection of the upper respiratory tract caused by rhinoviruses, coronavirus, parainfluenza virus, influenza A and B viruses and adenovirus. It is characterized by nasal congestion, sneezing, runny nose, cough and sore throat. The symptoms usually last for 3 to 10 days. There is no specific therapy for the cold. Treatment is symptomatic.
_________________________________________________
1 0.11753134113541602 26


Setting `pad_token_id` to `eos_token_id`:2 for open-end generation.


What causes high blood pressure?
Genetics, obesity, lack of exercise, poor diet (high salt), smoking, alcohol consumption, and stress.
The cause of hypertension is not known, but it is thought to be due to a combination of genetic, environmental, and lifestyle factors.
_________________________________________________
2 0.08332305127513587 34


Setting `pad_token_id` to `eos_token_id`:2 for open-end generation.


What is the main cause of type 2 diabetes?
Insulin resistance due to obesity, sedentary lifestyle, and poor diet.
There are many factors to consider when it comes to the causes of T2DM. Some of them are genetic, while others are environmental. Genetic factors include family history, ethnicity, age and sex. Environmental factors can include diet, physical activity, smoking and alcohol consumption. There are also other factors such as stress and depression that may play a role in the development of this condition. It is important to be aware of all the possible causes so
_________________________________________________
3 0.10583071637768358 39


Setting `pad_token_id` to `eos_token_id`:2 for open-end generation.


How is asthma treated?
Inhalers (bronchodilators), corticosteroids, avoiding triggers, and lifestyle modifications.
A postal questionnaire was sent to a random sample of 500 general practices in the United Kingdom. The response rate was 70%. The results show that the majority of asthmatic patients are treated in general practice and that there is a wide range of treatment regimens in use. Inhaled corticosteroids are the most commonly prescribed anti-inflammatory drugs, and theophylline is the preferred bronchodilator. There is considerable variation in prescribing habits between different regions of the country. .
_________________________________________________
4 0.04598712561572094 38


Setting `pad_token_id` to `eos_token_id`:2 for open-end generation.


What is a stroke?
A stroke occurs when there is an interruption of blood flow to the brain, leading to brain cell death.
A stroke occurs when the blood supply to a part of the brain is interrupted or reduced, causing the death of brain cells. A stroke is also called a "cerebrovascular accident" (CVA). Strokes can be classified as ischemic or hemorrhagic. Ischemic strokes are caused by a blockage in a blood vessel that carries oxygen and nutrients to the parts of your brain. The most common cause of a blocked artery is the buildup of fat, cholesterol, and other substances (plaques) on the walls of arteries. These plaques can rupture and cause a clot to form, which can block blood flow.
_________________________________________________
5 0.41698017980212143 50


Setting `pad_token_id` to `eos_token_id`:2 for open-end generation.


What are the symptoms of a heart attack?
Chest pain, shortness of breath, nausea, lightheadedness, pain in the arm, back, or jaw.
The most common symptom is chest pain or discomfort. It may feel like an intense pressure or squeezing sensation. The pain may spread to the shoulders, arms, neck, jaw, back, or stomach. Some people have shortness of breath, nausea, vomiting, sweating, dizziness, and lightheadedness.
_________________________________________________
6 0.44968032450655193 55


Setting `pad_token_id` to `eos_token_id`:2 for open-end generation.


What is the treatment for pneumonia?
Antibiotics (for bacterial pneumonia), rest, fluids, and in some cases, oxygen therapy.
A 10-year-old boy presented to the emergency department with a 2-day history of fever, cough, and shortness of breath. He had been seen by his pediatrician the day before, who had prescribed amoxicillin/clavulanate for a presumed viral upper respiratory tract infection. However, the patient’s symptoms had worsened, prompting his parents to seek medical attention.
_________________________________________________
7 0.03570660525289104 29


Setting `pad_token_id` to `eos_token_id`:2 for open-end generation.


What are the causes of back pain?
Muscle strain, poor posture, herniated discs, osteoarthritis, and spinal stenosis.
BACKGROUND
_________________________________________________
8 0.0 30


Setting `pad_token_id` to `eos_token_id`:2 for open-end generation.


How is tuberculosis diagnosed?
Chest X-ray, sputum culture, TB skin test, or blood tests.
BACKGROUND
_________________________________________________
9 0.0 30


Setting `pad_token_id` to `eos_token_id`:2 for open-end generation.


What is the treatment for hypothyroidism?
Synthetic thyroid hormone replacement (levothyroxine).
A 50-year-old woman with a history of Hashimoto's thyroiditis presents to her primary care physician (PCP) with symptoms of fatigue, cold intolerance, weight gain, constipation, and dry skin. She has been taking levothyroxine (L-T4) for the past 10 years. Her thyrotropin (thyroid-stimulating hormone; TSH) level is 0.01 mIU/L (normal range, 2.5 to 4.2), and her free T4 (FT4; measured by immunoassay) and total T3 (TT3) levels are within the normal range. The PCP is concerned that the patient may be overtreated and recommends that she stop taking the medication.
_________________________________________________
10 0.06688695920793561 41


Setting `pad_token_id` to `eos_token_id`:2 for open-end generation.


What are the risks of smoking?
Lung cancer, heart disease, stroke, respiratory infections, chronic obstructive pulmonary disease (COPD).
Smoking is the leading cause of preventable disease and premature death in the United States. Smoking cigarettes, cigars, or pipes increases the risk of heart disease, stroke, lung cancer, chronic obstructive pulmonary disease (COPD), emphysema, and other diseases.
_________________________________________________
11 0.36148231959906235 68


Setting `pad_token_id` to `eos_token_id`:2 for open-end generation.


What is the function of the liver?
Detoxification, protein synthesis, production of bile, metabolism of fats, carbohydrates, and proteins.
In the 1950s, the physiologist Hans Selye coined the term ‘‘stress’’ to describe the body’s response to any demand placed upon it, whether it be physical, emotional, or chemical in nature. The term has since been applied to a wide variety of situations, ranging from the mundane to the life-threatening. In this issue of HEPATOLOGY, Smyrniotis et al. (1) report the results of a study that examines the effects of acute stress on hepatic glucose production (HGP) in healthy volunteers. They found that the administration of cortisol, a stress hormone, increased HGP in a dose-dependent manner, and that this effect was mediated by the gluconeogenic enzyme phosphoenol
_________________________________________________
12 0.11073601963781918 32


Setting `pad_token_id` to `eos_token_id`:2 for open-end generation.


What are the signs of dehydration?
Dry mouth, fatigue, dark-colored urine, dizziness, confusion.
- How do I know if I'm drinking enough water?.
_________________________________________________
13 0.0 30


Setting `pad_token_id` to `eos_token_id`:2 for open-end generation.


What causes Alzheimer’s disease?
The exact cause is unknown, but factors include genetics, aging, and environmental influences.
Amyloid plaques and neurofibrillary tangles (NFTs) are the two main pathological hallmarks of the brain in patients with AD   Neuropathological alterations in alzheimers disease, Serrano-Pozo  . The accumulation of Aβ peptides in the extracellular space and the presence of hyperphosphorylated tau proteins in neurons and dystrophic neurites are considered to be the main causes of neuronal dysfunction and cell death in AD. However, it is still unclear how these two pathologies are related to each other and how they contribute to the progression of cognitive impairment and dementia. In this review, we summarize the current knowledge on the molecular mechanisms underlying the pathogenesis of AD, with a particular focus on recent advances in our understanding
_________________________________________________
14 0.15537304494951223 30


Setting `pad_token_id` to `eos_token_id`:2 for open-end generation.


How do you treat high cholesterol?
Statins, lifestyle changes (healthy diet, exercise), and sometimes other lipid-lowering medications.

_________________________________________________
15 0.0 0


Setting `pad_token_id` to `eos_token_id`:2 for open-end generation.


What is a common cause of sudden weight loss?
Hyperthyroidism, diabetes, cancer, gastrointestinal issues, and malnutrition.
Answers on p 108. 
_________________________________________________
16 0.0 39


Setting `pad_token_id` to `eos_token_id`:2 for open-end generation.


How do you treat an allergic reaction?
Antihistamines, corticosteroids, and in severe cases, epinephrine.
A 10-year-old boy presented to the emergency department (ED) with a 2-day history of a pruritic rash on his face, neck, and upper chest. He had no fever, cough, shortness of breath, wheezing, or other respiratory symptoms. His mother reported that he had eaten a peanut butter and jelly sandwich for lunch at school the day before he developed the skin eruption. She denied any recent travel or sick contacts.
_________________________________________________
17 0.04183532815857976 33


Setting `pad_token_id` to `eos_token_id`:2 for open-end generation.


What are the symptoms of hepatitis?
Fatigue, jaundice, abdominal pain, loss of appetite, nausea, and dark urine.
Healthy liver cells make bile, a substance that helps the body absorb fats and fat-soluble vitamins from food. Bile is stored in the gallbladder and released into the small intestine when food is eaten.
_________________________________________________
18 0.04604360647426783 24


Setting `pad_token_id` to `eos_token_id`:2 for open-end generation.


What causes rheumatoid arthritis?
Autoimmune disorder where the immune system attacks the joints.
The aetiology of RA is unknown, but it is likely to be multifactorial, with genetic, environmental and immunological factors all playing a role.
_________________________________________________
19 0.06951446975004914 29


KeyboardInterrupt: 

In [34]:
question = """
What is the most common treatment for a headache?

A) Antihistamines
B) Over-the-counter pain relievers (such as ibuprofen or acetaminophen)
C) Antibiotics
D) Corticosteroids

Please select the correct option (A, B, C, or D):
"""

print(generate_response(question))

Setting `pad_token_id` to `eos_token_id`:2 for open-end generation.
Both `max_new_tokens` (=20) and `max_length`(=200) seem to have been set. `max_new_tokens` will take precedence. Please refer to the documentation for more information. (https://huggingface.co/docs/transformers/main/en/main_classes/text_generation)



You are a medical assistant, and will be asked to answer questions. 
The questions come in multiple choice format. 
Please respond with your answer choice, as a letter, and that only. For example, for a question, respond with "A" or "B" or "C" or "D".
If you don't know, say "E".

Question: 
What is the most common treatment for a headache?

A) Antihistamines  
B) Over-the-counter pain relievers (such as ibuprofen or acetaminophen)  
C) Antibiotics  
D) Corticosteroids  

Please select the correct option (A, B, C, or D):
(1 point)
Correct Answer: B
Explanation:
Anti-infl
