## MediPal -- LLM testing and selection

### In this section, I set four tasks to test the performance of three small LLMs, so that I can understand their strengths and weaknesses and design the application accordingly.
##### Tasks:
1. Reasoning
2. Generating questions based on content
3. Answering yes/no questions
4. Structured output

##### Target LLMs:
1. meta-llama/Llama-3.2-1B-Instruct 
2. meta-llama/Meta-Llama-3-8B-Instruct
3. ContactDoctor/Bio-Medical-Llama-3-8B

##### Why test Bio-Medical-Llama-3-8B?

The Bio-Medical-Llama-3-8B model is a specialized large language model designed for biomedical applications. It is fine-tuned from the meta-llama/Meta-Llama-3-8B-Instruct model on a custom dataset of over 500,000 diverse entries. These entries mix synthetic and manually curated data, which is intended to ensure high quality and broad coverage of biomedical topics.

The model is trained to understand and generate text related to various biomedical fields, making it a valuable tool for researchers, clinicians, and other professionals in the biomedical domain.

In [None]:
import torch
import os
from dotenv import load_dotenv
from huggingface_hub import login
from transformers import AutoTokenizer, AutoModelForCausalLM, pipeline
from langchain_huggingface import HuggingFacePipeline, ChatHuggingFace
from langchain_core.prompts import ChatPromptTemplate, PromptTemplate

In [4]:
def best_dtype():
    """Return the best dtype for the device"""
    if torch.cuda.is_available():
        if torch.cuda.is_bf16_supported():
            return torch.bfloat16
        return torch.float16

    return torch.float32

def best_device():
    """Return the device type"""
    return "cuda" if torch.cuda.is_available() else "cpu"

def login_huggingface():
    """Log in to Hugging Face with the HUGGINGFACE_KEY token loaded from .env"""
    load_dotenv()
    login(token=os.getenv("HUGGINGFACE_KEY"))
    print("Logged in to Hugging Face!")

##### Task 1: Reasoning
##### Prompts:

In [None]:
prompt_last_letters_basic = ChatPromptTemplate.from_messages([
    ("system",
     "You are a precise but brief problem-solver. "
     "Explain your reasoning in a few short steps labeled 'Step 1:', 'Step 2:', 'Step 3:' etc., "
     "then end with a single line 'Result: <value>'. Avoid long explanations."),
    ("human",
     "Given three English words: {w1}, {w2}, {w3}.\n"
     "Task:\n"
     "- For each word, ignore trailing spaces/punctuation and find the last alphabetic letter (A–Z/a–z).\n"
     "- Concatenate these three letters in the SAME order (word1 → word2 → word3).\n"
     "Output:\n"
     "- Write step-by-step text (Step 1/2/3...).\n"
     "- End with one final line: Result: <concatenated_string>.")
])

prompt_simple_math = ChatPromptTemplate.from_messages([
    ("system",
     "You are a precise but brief math tutor. "
     "Explain in short steps labeled 'Step 1:', 'Step 2:', ... "
     "then finish with a single line 'Result: <number>'."),
    ("human",
     "John has 5 apples. He gives 2 apples to his father. "
     "How many apples does John have left?\n"
     "Solve step by step and end with: Result: <number>.")
])
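
Both reasoning prompts ask the model to finish with a single `Result: <value>` line, so the replies can be checked automatically instead of by eye. The following is a minimal sketch of how that might look. The helper names `last_letter_concat` and `extract_result` are hypothetical (they are not part of the notebook or of any library), and the sample words are only an illustration.

```python
import re
from typing import Optional


def last_letter_concat(*words: str) -> str:
    """Concatenate the last ASCII letter (A-Z/a-z) of each word, in order."""
    letters = []
    for word in words:
        # Scan from the end so trailing spaces and punctuation are skipped.
        for ch in reversed(word):
            if ch.isascii() and ch.isalpha():
                letters.append(ch)
                break
        # A word without any letter contributes nothing.
    return "".join(letters)


def extract_result(text: str) -> Optional[str]:
    """Return the value of the last 'Result: <value>' line, or None if absent."""
    matches = re.findall(r"^\s*Result:\s*(.+?)\s*$", text, flags=re.MULTILINE)
    return matches[-1] if matches else None


# Illustrative check against a made-up model reply.
reference = last_letter_concat("apple", "banana!", "cherry ")
reply = "Step 1: e\nStep 2: a\nStep 3: y\nResult: eay"
print(extract_result(reply) == reference)
```

Comparing the extracted value with a computed reference keeps the check independent of how verbose the step-by-step text is.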

##### Task 2: Generate questions based on content
##### Prompts:

In [None]:
prompt_generate_questions = ChatPromptTemplate.from_messages([
    ("system",
     "You are a cautious medical student who generates clinically relevant, self-contained questions. "
     "Ground every question strictly in the provided document; do not invent details or rely on outside knowledge."),
    ("user",
     "Instructions:\n"
     "1) From the document below, write exactly 3 unique QUESTIONS in English only.\n"
     "2) Cover different medical perspectives (aim for breadth), such as: symptoms/signs; diagnosis/differential; "
     "investigations/labs/imaging; treatment/procedures; medications (dose, interactions, adverse effects); "
     "contraindications/precautions; risk factors/prognosis; prevention/patient counseling; special populations "
     "(e.g., pregnancy, breastfeeding, pediatrics, geriatrics).\n"
     "3) Each question must be concise (≤ 25 words), self-contained (avoid pronouns like 'it/this/that'), and "
     "directly supported by the document.\n"
     "4) Do NOT provide any answers or explanations.\n"
     "5) Return ONLY a JSON list that matches exactly this schema: [\"question1\", \"question2\", \"question3\"]\n"
     "6) Enclose the JSON list between <json> and </json> tags.\n"
     "7) If the document does not support 3 distinct perspectives, still produce 3 questions but avoid near-duplicates; "
     "prefer covering as many perspectives as possible."),
    ("user", "Document:\n{doc}\n")
])

contents = [
    "phenylephrine is used to relieve nasal discomfort caused by colds, allergies, and hay fever. it is also used to relieve sinus congestion and pressure. phenylephrine will relieve symptoms but will not treat the cause of the symptoms or speed recovery. phenylephrine is in a class of medications called nasal decongestants. it works by reducing swelling of the blood vessels in the nasal passages.about Phenylephrine",
    "Sinner's physical struggles appeared to begin late in the second set, and he rushed to place ice towels around his neck at the changeover before the start of the third. In the decider, the Italian was limping between points and frequently massaged his right thigh. At the 2-1 changeover, he didn't sit and instead put his legs up on his bench to try and ward off cramp.",
    "Artificial intelligence (AI) investing is still the dominant theme in the stock market. This checks out, as it's where a massive amount of capital is getting invested to build out computing infrastructure and train models. Although it seems like AI has been the prevailing market theme for some time, there are multiple indications that this will persist for several more years, making AI a great place to invest today."
]
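
The prompt above asks for a JSON list wrapped in `<json>` and `</json>` tags, which makes the structured-output part of the task easy to verify. The sketch below shows one way the reply text could be validated. `parse_question_list` is a hypothetical helper name, and returning an empty list on any failure is my own design choice rather than something the notebook prescribes.

```python
import json
import re
from typing import List


def parse_question_list(text: str, expected: int = 3) -> List[str]:
    """Extract the JSON list between <json> and </json>; return [] if invalid."""
    match = re.search(r"<json>(.*?)</json>", text, flags=re.DOTALL)
    if match is None:
        return []
    try:
        data = json.loads(match.group(1))
    except json.JSONDecodeError:
        return []
    # Accept only a list of exactly `expected` non-empty strings.
    is_valid = (
        isinstance(data, list)
        and len(data) == expected
        and all(isinstance(q, str) and q.strip() for q in data)
    )
    return [q.strip() for q in data] if is_valid else []


# Illustrative reply; the questions are placeholders, not real model output.
sample = 'Here you go:\n<json>\n["Q1?", "Q2?", "Q3?"]\n</json>'
print(parse_question_list(sample))
```

An empty return value then counts as a formatting failure for that model, separately from whether the questions themselves are any good.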

##### Tasks 3 and 4: Answer yes/no questions with structured output
##### Prompts:

In [None]:
prompt_self_contained = PromptTemplate(
        template="""You are a grader for a question.
    You must decide whether the question is self-contained—meaning that it is clear, meaningful, and understandable on its own, without any conversation history or external context.
    Here is the user's question: {question} \n
    Return a binary judgment as a JSON object with a single key "score".
    Respond only with {{"score": "yes"}} if the question is self-contained,
    or {{"score": "no"}} if it is not. Do not include any explanation or extra text.""",
        input_variables=["question"],
)

self_contained_questions = [
    "What is the tallest mountain in South America?",
    "Can you think about it?",
    "Can you explain how blockchain technology works?",
    "Do you have a medicine to relieve sinus congestion and pressure?",
    "How can I take it?",
]
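
Small models often wrap the requested `{"score": "yes"}` object in extra text, so it can help to normalize each reply before judging it. The following sketch assumes the reply is available as a plain string. `parse_score` is a hypothetical helper name, and treating anything other than "yes" or "no" as unparseable (`None`) is my own choice.

```python
import json
import re
from typing import Optional


def parse_score(text: str) -> Optional[bool]:
    """Map a reply containing {"score": "yes"|"no"} to True/False, else None."""
    # Take the first {...} span so surrounding chatter is ignored.
    match = re.search(r"\{.*?\}", text, flags=re.DOTALL)
    if match is None:
        return None
    try:
        obj = json.loads(match.group(0))
    except json.JSONDecodeError:
        return None
    score = str(obj.get("score", "")).strip().lower()
    return {"yes": True, "no": False}.get(score)


# Illustrative replies, not real model output.
print(parse_score('{"score": "yes"}'))
print(parse_score('Sure! {"score": "No"} Hope that helps.'))
```

Counting the `None` results per model gives a rough measure of how reliably each one follows the output format.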

In [None]:
prompt_related_question = PromptTemplate(
        template="""You are a conversation coherence grader.
        Your task is to decide whether the user's latest message is logically and topically connected to the previous conversation.

        Conversation history:
        {document}

        User's latest message:
        {question}

        Return only a JSON object with a single key "score":
        - {{"score": "yes"}} if the latest message is coherent and contextually related to the conversation history.
        - {{"score": "no"}} if it is not related or breaks the context.

        No explanation or extra text.""",
        input_variables=["document", "question"],
    )  

documents = [
    "AI: phenylephrine comes as a tablet, a liquid, or a dissolving strip to take by mouth. "
    "it is usually taken every 4 hours as needed.\n"
    "HUMAN: What form does phenylephrine come in?",
    "Sinner's physical struggles appeared to begin late in the second set, and he rushed to "
    "place ice towels around his neck at the changeover",
]

related_questions = [
    "how can I take it?",
    "What happened to Sinner?",
]

##### Testing meta-llama/Llama-3.2-1B-Instruct

In [None]:
model_id = "meta-llama/Llama-3.2-1B-Instruct"

In [None]:
tokenizer = AutoTokenizer.from_pretrained(model_id)

tokenizer.pad_token = tokenizer.eos_token

model = AutoModelForCausalLM.from_pretrained(
    model_id,
    dtype=best_dtype(),
    device_map={"": best_device()},
    low_cpu_mem_usage=True,
)

print(f"Load {model_id} done!")

In [None]:
pipe = pipeline(
    "text-generation", 
    model=model, 
    tokenizer=tokenizer,
    return_full_text=False,   
    )

# Wrap the transformers pipeline in LangChain's HuggingFacePipeline
hug_pipe = HuggingFacePipeline(pipeline=pipe)

# Wrap it again as a chat model so that ChatPromptTemplate messages
# (system/human roles) are rendered with the model's chat template.
Llama_1b = ChatHuggingFace(llm=hug_pipe)

In [None]:
# Reasoning test
reason_letters_chain = prompt_last_letters_basic | Llama_1b
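# The prompt requires w1, w2 and w3. These are example words; the correct result is "eay".
result = reason_letters_chain.invoke({"w1": "apple", "w2": "banana", "w3": "cherry"})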
print(result)

In [None]:
reason_math_chain = prompt_simple_math | Llama_1b
result = reason_math_chain.invoke({})
print(result)

In [None]:
# Generate questions based on content
generate_question_chain = prompt_generate_questions | Llama_1b

for c in contents:
    result = generate_question_chain.invoke({"doc": c})
    print(result)

In [None]:
# Answer yes/no questions with structured output
self_contained_chain = prompt_self_contained | Llama_1b
for q in self_contained_questions:
    result = self_contained_chain.invoke({"question": q})
    print(result)

In [None]:
related_chain = prompt_related_question | Llama_1b
for q, d in zip(related_questions, documents):
    result = related_chain.invoke({"question": q, "document": d})
    print(result)

##### Testing meta-llama/Meta-Llama-3-8B-Instruct

In [None]:
model_id = "meta-llama/Meta-Llama-3-8B-Instruct"

In [None]:
tokenizer = AutoTokenizer.from_pretrained(model_id)

tokenizer.pad_token = tokenizer.eos_token

model = AutoModelForCausalLM.from_pretrained(
    model_id,
    dtype=best_dtype(),
    device_map={"": best_device()},
    low_cpu_mem_usage=True,
)

print(f"Load {model_id} done!")

In [None]:
pipe = pipeline(
    "text-generation", 
    model=model, 
    tokenizer=tokenizer,
    return_full_text=False,   
    )

# Wrap the transformers pipeline in LangChain's HuggingFacePipeline
hug_pipe = HuggingFacePipeline(pipeline=pipe)

# Wrap it again as a chat model so that ChatPromptTemplate messages
# (system/human roles) are rendered with the model's chat template.
Llama_8b = ChatHuggingFace(llm=hug_pipe)

In [None]:
# Reasoning test
reason_letters_chain = prompt_last_letters_basic | Llama_8b
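# The prompt requires w1, w2 and w3. These are example words; the correct result is "eay".
result = reason_letters_chain.invoke({"w1": "apple", "w2": "banana", "w3": "cherry"})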
print(result)

In [None]:
reason_math_chain = prompt_simple_math | Llama_8b
result = reason_math_chain.invoke({})
print(result)

In [None]:
# Generate questions based on content
generate_question_chain = prompt_generate_questions | Llama_8b

for c in contents:
    result = generate_question_chain.invoke({"doc": c})
    print(result)

In [None]:
# Answer yes/no questions with structured output
self_contained_chain = prompt_self_contained | Llama_8b
for q in self_contained_questions:
    result = self_contained_chain.invoke({"question": q})
    print(result)

In [None]:
related_chain = prompt_related_question | Llama_8b
for q, d in zip(related_questions, documents):
    result = related_chain.invoke({"question": q, "document": d})
    print(result)

##### Testing ContactDoctor/Bio-Medical-Llama-3-8B

In [None]:
model_id = "ContactDoctor/Bio-Medical-Llama-3-8B"

In [None]:
tokenizer = AutoTokenizer.from_pretrained(model_id)

tokenizer.pad_token = tokenizer.eos_token

model = AutoModelForCausalLM.from_pretrained(
    model_id,
    dtype=best_dtype(),
    device_map={"": best_device()},
    low_cpu_mem_usage=True,
)

print(f"Load {model_id} done!")

In [None]:
pipe = pipeline(
    "text-generation", 
    model=model, 
    tokenizer=tokenizer,
    return_full_text=False,   
    )

# Wrap the transformers pipeline in LangChain's HuggingFacePipeline
hug_pipe = HuggingFacePipeline(pipeline=pipe)

# Wrap it again as a chat model so that ChatPromptTemplate messages
# (system/human roles) are rendered with the model's chat template.
Medical_Llama_8b = ChatHuggingFace(llm=hug_pipe)

In [None]:
# Reasoning test
reason_letters_chain = prompt_last_letters_basic | Medical_Llama_8b
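# The prompt requires w1, w2 and w3. These are example words; the correct result is "eay".
result = reason_letters_chain.invoke({"w1": "apple", "w2": "banana", "w3": "cherry"})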
print(result)

In [None]:
reason_math_chain = prompt_simple_math | Medical_Llama_8b
result = reason_math_chain.invoke({})
print(result)

In [None]:
# Generate questions based on content
generate_question_chain = prompt_generate_questions | Medical_Llama_8b

for c in contents:
    result = generate_question_chain.invoke({"doc": c})
    print(result)

In [None]:
# Answer yes/no questions with structured output
self_contained_chain = prompt_self_contained | Medical_Llama_8b
for q in self_contained_questions:
    result = self_contained_chain.invoke({"question": q})
    print(result)

In [None]:
related_chain = prompt_related_question | Medical_Llama_8b
for q, d in zip(related_questions, documents):
    result = related_chain.invoke({"question": q, "document": d})
    print(result)
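
To compare the three models on the yes/no task, it may help to reduce the printed replies to a single number per model. The sketch below is model-agnostic: `accuracy` takes any callable that maps a question to `True`, `False`, or `None`. The function names and the `expected` labels are my own assumptions. In particular, the labels reflect my reading of which of the five test questions are self-contained, and `stub_judge` is only a stand-in for a function that would invoke a chain and interpret its reply.

```python
from typing import Callable, List, Optional


def accuracy(
    judge: Callable[[str], Optional[bool]],
    questions: List[str],
    expected: List[bool],
) -> float:
    """Fraction of questions for which judge(question) equals the expected label."""
    if not questions or len(questions) != len(expected):
        raise ValueError("questions and expected must be non-empty and equally long")
    correct = sum(judge(q) == e for q, e in zip(questions, expected))
    return correct / len(questions)


questions = [
    "What is the tallest mountain in South America?",
    "Can you think about it?",
    "Can you explain how blockchain technology works?",
    "Do you have a medicine to relieve sinus congestion and pressure?",
    "How can I take it?",
]
# Assumed labels (not from the notebook): True means self-contained.
expected = [True, False, True, True, False]


def stub_judge(question: str) -> bool:
    """Stand-in judge: a bare 'it' marks the question as not self-contained."""
    words = question.lower().rstrip("?").split()
    return "it" not in words


print(f"accuracy: {accuracy(stub_judge, questions, expected):.2f}")
```

Because an unparseable reply can be represented as `None`, it never equals an expected label and is counted as wrong, which penalizes models that ignore the output format.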