## MediPal -- LLM testing and selection

### In this section, I set four tasks to test the performance of three small LLMs, so that I can understand their strengths and limits and design the application accordingly.
##### Tasks:
1. Reasoning
2. Generate questions based on content
3. Answer yes/no questions
4. Structured output

##### Target LLMs:
1. meta-llama/Llama-3.2-1B-Instruct 
2. meta-llama/Meta-Llama-3-8B-Instruct
3. ContactDoctor/Bio-Medical-Llama-3-8B

##### Why test Bio-Medical-Llama-3-8B?

The Bio-Medical-Llama-3-8B model is a specialized large language model designed for biomedical applications. It is fine-tuned from the meta-llama/Meta-Llama-3-8B-Instruct model using a custom dataset containing over 500,000 diverse entries. These entries include a mix of synthetic and manually curated data, ensuring high quality and broad coverage of biomedical topics.

The model is trained to understand and generate text related to various biomedical fields, making it a valuable tool for researchers, clinicians, and other professionals in the biomedical domain.

In [None]:
import torch
import os
from dotenv import load_dotenv
from huggingface_hub import login
from transformers import AutoTokenizer, AutoModelForCausalLM, pipeline
from langchain_huggingface import HuggingFacePipeline, ChatHuggingFace
from langchain_core.prompts import ChatPromptTemplate, PromptTemplate


In [2]:
def best_dtype():
    """Return the best dtype for the device"""
    if torch.cuda.is_available():
        if torch.cuda.is_bf16_supported():
            return torch.bfloat16
        else:
            return torch.float16
        
    return torch.float32

def best_device():
    """Return the device type"""
    return "cuda" if torch.cuda.is_available() else "cpu"

def login_huggingface():
    """Log in to Hugging Face with the token stored in HUGGINGFACE_KEY"""
    load_dotenv()
    login(os.getenv("HUGGINGFACE_KEY"))
    print("Login HuggingFace!")  

In [3]:
login_huggingface()

Login HuggingFace!


##### Task1: Reasoning
##### Prompts:

In [4]:
prompt_last_letters_basic = ChatPromptTemplate.from_messages([
    ("system",
     "You are a precise but brief problem-solver. "
     "Explain your reasoning in a few short steps labeled 'Step 1:', 'Step 2:', 'Step 3:' etc., "
     "then end with a single line 'Result: <value>'. Avoid long explanations."),
    ("human",
     "Given three English words: {w1}, {w2}, {w3}.\n"
     "Task:\n"
     "- For each word, ignore trailing spaces/punctuation and find the last alphabetic letter (A–Z/a–z).\n"
     "- Concatenate these three letters in the SAME order (word1 → word2 → word3).\n"
     "- For example, you got 'how', 'are', 'you', each word's last letter are 'w', 'e', 'u'. Output: 'weu'  .\n"
     "Output:\n"
     "- Write step-by-step text (Step 1/2/3...).\n"
     "- End with one final line: Result: <concatenated_string>.")
])

prompt_simple_math = ChatPromptTemplate.from_messages([
    ("system",
     "You are a precise but brief math tutor. "
     "Explain in short steps labeled 'Step 1:', 'Step 2:', ... "
     "then finish with a single line 'Result: <number>'."),
    ("human",
     "John has 5 apples. His father has 8 apples. He gives 2 apples to his father. "
     "How many apples does his father have?\n"
     "Solve step by step and end with: Result: <number>.")
])

prompt_moderate_math = ChatPromptTemplate.from_messages([
    ("system",
     "You are a precise but brief math tutor. "
     "Explain in short steps labeled 'Step 1:', 'Step 2:', ... "
     "then finish with a single line 'Result: <number>'."),
    ("human",
     "John has 12 apples. His father has 9 apples. John gives 3 apples to his father. "
     "Then John's father gives one-third (1/3) of his apples to John's sister. "
     "John's sister eats 2 apples and returns the rest to John. "
     "How many apples does John have at the end?\n"
     "Solve step by step and end with: Result: <number>.")
])
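The expected answers quoted in the test cells (10 for the simple problem, 11 for the moderate one) can be checked by simulating the apple transfers directly, and the final `Result: <value>` line that the prompts request can be pulled out with a regular expression. The sketch below is only an illustration: the helper names `simulate_simple_problem`, `simulate_moderate_problem`, and `extract_result` are hypothetical and not part of the original notebook.

```python
import re


def simulate_simple_problem():
    """John has 5 apples, his father has 8, and John gives 2 to his father."""
    john, father = 5, 8
    john -= 2
    father += 2
    return father


def simulate_moderate_problem():
    """Follow the moderate prompt step by step and return John's final count."""
    john, father, sister = 12, 9, 0
    # John gives 3 apples to his father.
    john -= 3
    father += 3
    # The father gives one-third of his apples to John's sister.
    given = father // 3
    father -= given
    sister += given
    # The sister eats 2 apples and returns the rest to John.
    sister -= 2
    john += sister
    return john


# Matches a line of the form "Result: <value>" anywhere in the text.
RESULT_LINE = re.compile(r"^\s*Result:\s*(.+?)\s*$", re.MULTILINE)


def extract_result(text):
    """Return the value of the last 'Result:' line, or None if there is none."""
    matches = RESULT_LINE.findall(text)
    return matches[-1] if matches else None
```

A response can then be graded by comparing `extract_result(result.content)` with the simulated value converted to a string; a response without a `Result:` line yields `None` and counts as a format failure.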

##### Task2: Generate questions based on content
##### Prompts:

In [5]:
prompt_generate_questions = ChatPromptTemplate.from_messages([
    ("system",
     "You are a cautious medical student who generates clinically relevant, self-contained questions. "
     "Ground every question strictly in the provided document; do not invent details or rely on outside knowledge."),
    ("user",
     "Instructions:\n"
     "1) From the document below, write exactly 3 unique QUESTIONS in English only.\n"
     "2) Cover different medical perspectives (aim for breadth), such as: symptoms/signs; diagnosis/differential; "
     "investigations/labs/imaging; treatment/procedures; medications (dose, interactions, adverse effects); "
     "contraindications/precautions; risk factors/prognosis; prevention/patient counseling; special populations "
     "(e.g., pregnancy, breastfeeding, pediatrics, geriatrics).\n"
     "3) Each question must be concise (≤ 25 words), self-contained (avoid pronouns like 'it/this/that'), and "
     "directly supported by the document.\n"
     "4) Do NOT provide any answers or explanations.\n"
     "5) Return ONLY a JSON list that matches exactly this schema: [\"question1\", \"question2\", ..., \"question3\"]\n"
     "6) Enclose the JSON list between <json> and </json> tags.\n"
     "7) If the document does not support 3 distinct perspectives, still produce 3 questions but avoid near-duplicates; "
     "prefer covering as many perspectives as possible."),
    ("user", "Document:\n{doc}\n")
])

contents = [
    "phenylephrine is used to relieve nasal discomfort caused by colds, allergies, and hay fever. it is also used to relieve sinus congestion and pressure. phenylephrine will relieve symptoms but will not treat the cause of the symptoms or speed recovery. phenylephrine is in a class of medications called nasal decongestants. it works by reducing swelling of the blood vessels in the nasal passages.about Phenylephrine",
    "Sinner's physical struggles appeared to begin late in the second set, and he rushed to place ice towels around his neck at the changeover before the start of the third. In the decider, the Italian was limping between points and frequently massaged his right thigh. At the 2-1 changeover, he didn't sit and instead put his legs up on his bench to try and ward off cramp.",
    "Artificial intelligence (AI) investing is still the dominant theme in the stock market. This checks out, as it's where a massive amount of capital is getting invested to build out computing infrastructure and train models. Although it seems like AI has been the prevailing market theme for some time, there are multiple indications that this will persist for several more years, making AI a great place to invest today.",
    "phenylephrine comes as a tablet, a liquid, or a dissolving strip to take by mouth. it is usually taken every 4 hours as needed. follow the directions on your prescription label or the package label carefully, and ask your doctor or pharmacist to explain any part you do not understand. take phenylephrine exactly as directed. do not take more or less of it or take it more often than prescribed by your doctor or directed on the label.phenylephrine comes alone and in combination with other medications. ask your doctor or pharmacist for advice on which product is best for your symptoms. check nonprescription cough and cold product labels carefully before using two or more products at the same time. these products may contain the same active ingredient(s) and taking them together could cause you to receive an overdose. this is especially important if you will be giving cough and cold medications to a child.nonprescription cough and cold combination products, including products that contain phenylephrine, can cause serious side effects or death in young children. do not give these products to children younger than 4 years of age. if you give these products to children 4 to 11 years of age, use caution and follow the package directions carefully.if you are giving phenylephrine or a combination product that contains phenylephrine to a child, read the package label carefully to be sure that it is the right product for a child of that age. do not give phenylephrine products that are made for adults to children.before you give a phenylephrine product to a child, check the package label to find out how much medication the child should receive. give the dose that matches the child's age on the chart. ask the child's doctor if you don't know how much medication to give the child.if you are taking the liquid, do not use a household spoon to measure your dose. 
use the measuring spoon or cup that came with the medication or use a spoon made especially for measuring medication.if your symptoms do not get better within 7 days or if you have a fever, stop taking phenylephrine and call your doctor.if you are taking the dissolving strips, place one strip on your tongue and allow it to dissolve.about Phenylephrine",
    "if you are taking scheduled doses of aluminum hydroxide, magnesium hydroxide, take the missed dose as soon as you remember it. however, if it is almost time for the next dose, skip the missed dose and continue your regular dosing schedule. do not take a double dose to make up for a missed one.about Aluminum Hydroxide and Magnesium Hydroxide"
]
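Because the prompt asks for a JSON list wrapped in `<json>` and `</json>` tags, a small parser makes it possible to judge format compliance mechanically rather than by eye. The following is a minimal sketch under that assumption; `extract_question_list` is a hypothetical helper, not something the original notebook defines.

```python
import json
import re

# Captures everything between the first <json> tag and the next </json> tag.
JSON_BLOCK = re.compile(r"<json>(.*?)</json>", re.DOTALL)


def extract_question_list(text, expected_count=3):
    """Return the list of questions, or None if the output breaks the schema."""
    match = JSON_BLOCK.search(text)
    if match is None:
        return None
    try:
        data = json.loads(match.group(1))
    except json.JSONDecodeError:
        return None
    # The schema is a flat list of exactly `expected_count` non-empty strings.
    if not isinstance(data, list) or len(data) != expected_count:
        return None
    if not all(isinstance(q, str) and q.strip() for q in data):
        return None
    return data
```

Passing `result.content` from a chain call returns the three questions when the model followed the schema and `None` otherwise, for example when the closing `]` is missing or when the model emits key/value pairs instead of a list.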

##### Tasks 3 and 4: Answer yes/no questions with structured output
##### Prompts:

In [6]:
prompt_self_contained = PromptTemplate(
        template="""You are a grader for a question.
    You must decide whether the question is self-contained—meaning that it is clear, meaningful, and understandable on its own, without any conversation history or external context.
    Here is the user's question: {question} \n
    Return a binary judgment as a JSON object with a single key "score".
    Respond only with {{"score": "yes"}} if the question is self-contained,
    or {{"score": "no"}} if it is not. Do not include any explanation or extra text.""",
        input_variables=["question"],
)

self_contained_questions = [
    {"question":"What is the tallest mountain in South America?", "expect":"""{"score": "yes"}"""},   
    {"question":"Can you think about it?", "expect":"""{"score": "no"}"""},   
    {"question":"Can you explain how blockchain technology works?", "expect":"""{"score": "yes"}"""},  
    {"question":"Do you have a medicine to relieve sinus congestion and pressure?", "expect":"""{"score": "yes"}"""},        
    {"question":"How can I take it?", "expect":"""{"score": "no"}"""},          
    {"question": "Who discovered penicillin?", "expect": """{"score": "yes"}"""},  
    {"question": "Can you tell me more about it?", "expect": """{"score": "no"}"""},  
    {"question": "What are the side effects of ibuprofen?", "expect": """{"score": "yes"}"""},  
    {"question": "Why do you think so?", "expect": """{"score": "no"}"""},  
    {"question": "What is the capital city of Japan?", "expect": """{"score": "yes"}"""},  
]

In [7]:
prompt_related_question = PromptTemplate(
        template="""You are a conversation coherence grader.
        Your task is to decide whether the user's latest message is logically and topically connected to the previous conversation.

        Conversation history:
        {document}

        User's latest message:
        {question}

        Return only a JSON object with a single key "score":
        - {{"score": "yes"}} if the latest message is coherent and contextually related to the conversation history.
        - {{"score": "no"}} if it is not related or breaks the context.

        No explanation or extra text.""",
        input_variables=["document", "question"],
    )  

documents = [
    """AI: phenylephrine comes as a tablet, a liquid, or a dissolving strip to take by mouth. it is usually taken every 4 hours as needed.\n 
             HUMAN: What form does Phenylephrine come? """,

    """Sinner's physical struggles appeared to begin late in the second set, and he rushed to place ice towels around his neck at the changeover""",
             
    """AI: amoxicillin is available as a capsule, tablet, chewable tablet, and liquid suspension to take by mouth.\n 
       HUMAN: What forms does amoxicillin come in?""",

    """AI: metformin is usually taken once or twice daily with meals to reduce stomach upset.\n 
       HUMAN: When should I take metformin?""",

    """AI: ibuprofen works by reducing hormones that cause inflammation and pain in the body.\n 
       HUMAN: How does ibuprofen work?""",

    """AI: loratadine may cause headache, dry mouth, or drowsiness in some people.\n 
       HUMAN: What are common side effects of loratadine?""",

    """AI: acetaminophen is generally used to relieve mild to moderate pain and reduce fever.\n 
       HUMAN: What is acetaminophen used for?""",

    """AI: azithromycin is typically taken once daily for 3 to 5 days, depending on the infection type.\n 
       HUMAN: How long do I need to take azithromycin?""",

    """AI: omeprazole should be taken before meals, usually in the morning, to reduce stomach acid production.\n 
       HUMAN: When is the best time to take omeprazole?""",

    """AI: insulin helps lower blood sugar by allowing glucose to enter cells and be used for energy.\n 
       HUMAN: What does insulin do in the body?"""

]  

related_questions = [
    "how can I take it?",
    "what did happen to John?",
    "how can I take it?",
    "how can I take azithromycin?",
    "how can I take it?",
    "how can I take it?",
    "how can I take azithromycin?",
    "how can I take it?",
    "how can I take it?",
    "how can I take it?",
    ]

expecting = [
    """{"score": "yes"}""", 
    """{"score": "no"}""",
    """{"score": "yes"}""", 
    """{"score": "no"}""", 
    """{"score": "yes"}""", 
    """{"score": "yes"}""", 
    """{"score": "no"}""", 
    """{"score": "yes"}""", 
    """{"score": "yes"}""", 
    """{"score": "yes"}""", 
    ]
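Both grader prompts ask for a JSON object with a single `score` key, so the "N out of 10" tallies can be computed instead of counted by hand. The sketch below assumes the raw model text is available as a string (for a chat model response, its `content` attribute); `parse_score` and `accuracy` are hypothetical helper names introduced here for illustration.

```python
import json


def parse_score(raw):
    """Return 'yes' or 'no' from a {"score": ...} JSON string, else None."""
    try:
        data = json.loads(raw.strip())
    except (json.JSONDecodeError, AttributeError):
        return None
    if not isinstance(data, dict):
        return None
    score = data.get("score")
    return score if score in ("yes", "no") else None


def accuracy(expected, actual):
    """Fraction of answers whose parsed score equals the expected score."""
    if not expected:
        return 0.0
    hits = 0
    for exp, act in zip(expected, actual):
        parsed = parse_score(act)
        # An unparseable answer counts as a miss, not as a match.
        if parsed is not None and parsed == parse_score(exp):
            hits += 1
    return hits / len(expected)
```

Stripping whitespace before parsing matters here because some models prepend a space to the JSON object; any extra explanation around the object makes `parse_score` return `None`, which is counted as a wrong answer.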

In [8]:
# Pure structured output
prompt_structured_output = ChatPromptTemplate.from_messages([
    ("system",
     "You are a precise assistant. "
     "Read the user's request and produce your answer as structured JSON. "
     "Do not include extra explanations or text outside the JSON."),
    ("human",
     "Extract the following information from this sentence:\n"
     "'Alice bought 3 apples for $5 on Monday.'\n\n"
     "Return the result in this JSON format:\n"
     "{{\n"
     '  "person": "",\n'
     '  "item": "",\n'
     '  "quantity": ,\n'
     '  "price": "",\n'
     '  "day": ""\n'
     "}}")
])
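The structured-output prompt fixes five keys, so a response can be checked for valid JSON, missing keys, wrong types, and empty values. This is a minimal sketch: `EXPECTED_FIELDS` and `check_extraction` are hypothetical names, and accepting either a string or a number for `price` is an assumption made here because the prompt template does not pin down that type.

```python
import json

# Key -> accepted Python type(s) after json.loads. Accepting both string and
# numeric prices is an assumption of this sketch.
EXPECTED_FIELDS = {
    "person": str,
    "item": str,
    "quantity": int,
    "price": (str, int, float),
    "day": str,
}


def check_extraction(raw):
    """Return a list of problems found in the response (empty if none)."""
    try:
        data = json.loads(raw)
    except json.JSONDecodeError as exc:
        return [f"invalid JSON: {exc.msg}"]
    if not isinstance(data, dict):
        return ["top-level value is not an object"]
    problems = []
    for key, types in EXPECTED_FIELDS.items():
        if key not in data:
            problems.append(f"missing key: {key}")
        elif isinstance(data[key], bool) or not isinstance(data[key], types):
            # bool is a subclass of int in Python, so reject it explicitly.
            problems.append(f"wrong type for {key}")
        elif data[key] == "":
            problems.append(f"empty value for {key}")
    return problems
```

An empty list means the response is well-formed under these assumptions; a response that leaves `person` blank is reported as `empty value for person`.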

##### First, testing meta-llama/Llama-3.2-1B-Instruct

In [9]:
model_id = "meta-llama/Llama-3.2-1B-Instruct"

In [10]:
tokenizer = AutoTokenizer.from_pretrained(model_id)

tokenizer.pad_token = tokenizer.eos_token

model = AutoModelForCausalLM.from_pretrained(
            model_id,
            dtype = best_dtype(),
            device_map={"":best_device()}, 
            low_cpu_mem_usage=True           
        )

print(f"Load {model_id} done!")

Load meta-llama/Llama-3.2-1B-Instruct done!


In [11]:
pipe = pipeline(
    "text-generation", 
    model=model, 
    tokenizer=tokenizer,
    return_full_text=False,   
    )

# Wrap the plain transformers pipeline in a LangChain HuggingFacePipeline
hug_pipe = HuggingFacePipeline(pipeline=pipe)

# Wrap the pipeline as a chat model,
# so that ChatPromptTemplate can be used to build structured prompts.
Llama_1b = ChatHuggingFace(llm=hug_pipe)

Device set to use cuda


In [None]:
# Reasoning test
reason_letters_chain = prompt_last_letters_basic | Llama_1b
result = reason_letters_chain.invoke({"w1":"good", "w2":"morning", "w3": "Medipal"})
print("Expecting: dgl\n")
print(result)
# Expecting: 'dgl'
# Result is 'DGGL'
# Close, but not exact: an extra 'G', uppercase letters, and no final 'Result:' line.

Expecting: dgl

content="Step 1: \nRemove trailing spaces from 'good' to get 'good'.\nStep 2: \nFind last alphabetic letter in 'good' which is 'D'.\nStep 3: \nFind last alphabetic letter in'morning' which is 'G'.\nStep 4: \nFind last alphabetic letter in 'Medipal' which is 'L'.\nStep 5: \nConcatenate these three letters in the same order to get 'DGGL'." additional_kwargs={} response_metadata={} id='run--c224fad4-a593-4f4a-8632-e769d2fa7a7b-0'


In [None]:
# simple math
reason_math_chain = prompt_simple_math | Llama_1b
result = reason_math_chain.invoke({})
print(result)

# Expecting: 10
# Result is 3
# It is bad!

content='Step 1: Start with the initial number of apples John has.\nJohn has 5 apples.\n\nStep 2: Calculate the number of apples John gives to his father.\nJohn gives 2 apples to his father.\n\nStep 3: Subtract the number of apples John gave to his father from the initial number of apples John has.\n5 - 2 = 3\n\nResult: 3' additional_kwargs={} response_metadata={} id='run--f64dffbd-9538-4ed5-b943-2a46489568cb-0'


In [None]:
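# a little harder math
reason_math_moderate_chain = prompt_moderate_math | Llama_1b
# Bug fix: invoke the moderate chain (the original cell called reason_math_chain by mistake).
result = reason_math_moderate_chain.invoke({})
print(result)
# Expecting: 11
# Note: the recorded output below (Result: 3) came from the simple-math chain,
# so it does not measure performance on the moderate problem.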

content='Step 1: Find the initial number of apples John has. \nJohn has 5 apples.\n\nStep 2: Find the number of apples John gave to his father. \nJohn gave 2 apples to his father.\n\nStep 3: Subtract the number of apples given from the initial number of apples to find the number of apples his father has.\n5 - 2 = 3\n\nResult: 3' additional_kwargs={} response_metadata={} id='run--0bf10b0a-741b-4a43-8652-12fc9f09179b-0'


In [None]:
# Generate questions based on content
generate_question_chain = prompt_generate_questions | Llama_1b

for c in contents:
    result = generate_question_chain.invoke({"doc": c})
    print(result)

# The generated questions are no

content='Here are three questions based on the provided information:\n\n<json>\n"question1", "question2", "question3"\n"Is phenylephrine used to treat colds, allergies, and hay fever?", "Does phenylephrine relieve sinus congestion and pressure?", "Is phenylephrine a class of medications called nasal decongestants?"\n</json>\n\n"question4", "question5", "question6"\n"Does phenylephrine relieve nasal discomfort caused by colds, allergies, and hay fever?", "Does phenylephrine reduce swelling of the blood vessels in the nasal passages?", "Is phenylephrine used to relieve sinus pressure?"\n</json>\n\n"question7", "question8", "question9"\n"Is phenylephrine used to relieve nasal discomfort caused by colds, allergies, and hay fever?", "Is phenylephrine used to relieve sinus congestion?", "Is phenylephrine used to relieve nasal pressure?"\n</json>' additional_kwargs={} response_metadata={} id='run--7c33200c-72df-45cd-b410-e4cc40cbaf49-0'
content='<json>\n    "question1": "What is the likely di

In [18]:
# Solve yes/no questions with structured output
self_contained_chain = prompt_self_contained | Llama_1b
for q in self_contained_questions:
    result = self_contained_chain.invoke({"question": q["question"]})
    print(f"""Expecting:{q["expect"]}, actual result:{result}""")

Expecting:{"score": "yes"}, actual result:content='{"score": "no"}' additional_kwargs={} response_metadata={} id='run--13fd686f-7e9c-4a88-8603-d662d8ab2ca5-0'
Expecting:{"score": "no"}, actual result:content='{"score": "no"}' additional_kwargs={} response_metadata={} id='run--a4b889f1-632d-46ae-9b8f-db85d9291776-0'
Expecting:{"score": "yes"}, actual result:content='{"score": "no"}' additional_kwargs={} response_metadata={} id='run--f8bce220-7728-494e-ae32-292f8a859450-0'
Expecting:{"score": "yes"}, actual result:content='{"score": "no"}' additional_kwargs={} response_metadata={} id='run--713c188b-5aa0-4426-b72f-e99432a34ce4-0'
Expecting:{"score": "no"}, actual result:content='{"score": "no"}' additional_kwargs={} response_metadata={} id='run--0538ac44-d930-45d8-b155-fe15c8cfe4f3-0'
Expecting:{"score": "yes"}, actual result:content='{"score": "no"}' additional_kwargs={} response_metadata={} id='run--7fcddd22-a9dd-46be-bd2d-ca27d2d25ac5-0'
Expecting:{"score": "no"}, actual result:content

In [20]:
related_chain = prompt_related_question | Llama_1b
for q, d, e in zip(related_questions, documents, expecting):
    result  = related_chain.invoke({"question": q, "document": d})
    print(f"""Expecting:{e}, actual result: {result} """)

Expecting:{"score": "yes"}, actual result: content='{"score": "no"}' additional_kwargs={} response_metadata={} id='run--f82fe778-b8f1-41c8-823c-307e729dfdb1-0' 
Expecting:{"score": "no"}, actual result: content='{"score": "no"}' additional_kwargs={} response_metadata={} id='run--b1094875-b0f9-4041-8ed1-ea9fdd5e1d58-0' 
Expecting:{"score": "yes"}, actual result: content='{"score": "no"}' additional_kwargs={} response_metadata={} id='run--57d554ae-899c-4eef-8dd9-22ea6087360b-0' 
Expecting:{"score": "no"}, actual result: content='{"score": "no"}' additional_kwargs={} response_metadata={} id='run--2e51e744-63d2-4166-9d1a-729867c4689a-0' 
Expecting:{"score": "yes"}, actual result: content='{"score": "no"}' additional_kwargs={} response_metadata={} id='run--a183cefd-6b6a-4314-8268-7f906d8c2ba8-0' 
Expecting:{"score": "yes"}, actual result: content='{"score": "no"}' additional_kwargs={} response_metadata={} id='run--63be0bf9-031d-4dc4-8110-dfb454270036-0' 
Expecting:{"score": "no"}, actual re

In [None]:
structured_output_chain = prompt_structured_output | Llama_1b
result  = structured_output_chain.invoke({})
print(f"""Result: {result} """)

# Expecting {\n  "person": "Alice ",\n  "item": "apples",\n  "quantity": 3,\n  "price": "5",\n  "day": "Monday"\n}
# The format is OK, but the content is not accurate: "person" is left empty.

Result: content='{\n  "person": "",\n  "item": "apples",\n  "quantity": 3,\n  "price": "$5",\n  "day": "Monday"\n}' additional_kwargs={} response_metadata={} id='run--45caa7c9-4848-4bfd-814e-a133f5360225-0' 


##### Testing meta-llama/Meta-Llama-3-8B-Instruct

In [26]:
model_id = "meta-llama/Meta-Llama-3-8B-Instruct"

In [27]:
tokenizer = AutoTokenizer.from_pretrained(model_id)

tokenizer.pad_token = tokenizer.eos_token

model = AutoModelForCausalLM.from_pretrained(
            model_id,
            dtype = best_dtype(),
            device_map={"":best_device()}, 
            low_cpu_mem_usage=True           
        )

print(f"Load {model_id} done!")

Load meta-llama/Meta-Llama-3-8B-Instruct done!


In [28]:
pipe = pipeline(
    "text-generation", 
    model=model, 
    tokenizer=tokenizer,
    return_full_text=False,   
    )

# Wrap the plain transformers pipeline in a LangChain HuggingFacePipeline
hug_pipe = HuggingFacePipeline(pipeline=pipe)

# Wrap the pipeline as a chat model,
# so that ChatPromptTemplate can be used to build structured prompts.
Llama_8b = ChatHuggingFace(llm=hug_pipe)

Device set to use cuda


In [None]:
# Reasoning test
reason_letters_chain = prompt_last_letters_basic | Llama_8b
result = reason_letters_chain.invoke({"w1":"good", "w2":"morning", "w3": "Medipal"})
print(result)
# Expecting: 'dgl'
# Result is 'dgl'
# It is pretty good!

content='Here are the steps to solve the problem:\n\nStep 1: Remove trailing spaces and punctuation from each word.\ngood -> good, morning -> morning, Medipal -> Medipal\n\nStep 2: Find the last alphabetic letter (A-Z/a-z) of each word.\ngood -> d, morning -> g, Medipal -> L\n\nStep 3: Concatenate these letters in the same order (word1 → word2 → word3).\nd-g-L\n\nResult: dgl' additional_kwargs={} response_metadata={} id='run--e1852036-12c9-4dda-9575-6da38c4f4b10-0'


In [None]:
reason_math_chain = prompt_simple_math | Llama_8b
result = reason_math_chain.invoke({})
print(result)
# Expecting: 10
# Result is 10
# It is pretty good!

content="Step 1: John's father initially has 8 apples.\nStep 2: John gives 2 apples to his father, so the father receives 2 more apples.\nStep 3: To find the total number of apples his father has now, add the initial 8 apples to the 2 apples received: 8 + 2 = 10\nResult: 10" additional_kwargs={} response_metadata={} id='run--cfa477ae-9516-4d08-823f-21b69ead357d-0'


In [None]:
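# a little harder math
reason_math_moderate_chain = prompt_moderate_math | Llama_8b
# Bug fix: invoke the moderate chain (the original cell called reason_math_chain by mistake).
result = reason_math_moderate_chain.invoke({})
print(result)
# Expecting: 11
# Note: the recorded output below (Result: 10) came from the simple-math chain,
# so it does not measure performance on the moderate problem.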

content="Step 1: Calculate the total number of apples John has initially.\nJohn has 5 apples.\n\nStep 2: Calculate the number of apples John's father has after receiving 2 apples from John.\nJohn's father has 8 apples initially. \nAfter receiving 2 apples from John, his father has 8 + 2 = 10 apples.\n\nResult: 10" additional_kwargs={} response_metadata={} id='run--07c4e435-a6c5-4839-bd58-24b2242423fc-0'


In [None]:
# Generate questions based on content
generate_question_chain = prompt_generate_questions | Llama_8b

for c in contents:
    result = generate_question_chain.invoke({"doc": c})
    print(result)

# Pretty good!

content='<json>\n["What are the common uses of phenylephrine?", "Can phenylephrine treat the underlying cause of nasal discomfort?", "What is the mechanism of action of phenylephrine as a nasal decongestant?"]\n</json>' additional_kwargs={} response_metadata={} id='run--ef813a3f-c710-4f41-b9b3-7a6998f2a8aa-0'
content='<json>\n["What are the initial symptoms exhibited by Sinner during the second set?", "What is the potential cause of Sinner\'s limping and thigh massaging in the decider?", "What is Sinner\'s attempt to alleviate his cramp symptoms at the 2-1 changeover?"]\n</json>' additional_kwargs={} response_metadata={} id='run--ac0d01d6-b3ba-4bb3-bcf8-19071b901edd-0'
content='<json>\n["What are the key areas of computing infrastructure being built to support AI investing?", "What are the indications that AI will remain a prevailing market theme for several more years?", "What is the expected impact of AI investing on the stock market in the near future?"]\n</json>' additional_kwargs=

In [None]:
# Solve yes/no questions with structured output
self_contained_chain = prompt_self_contained | Llama_8b
for q in self_contained_questions:
    result = self_contained_chain.invoke({"question": q["question"]})
    print(f"""Expecting:{q["expect"]}, actual result:{result}""")

# 10 out of 10

Expecting:{"score": "yes"}, actual result:content='{"score": "yes"}' additional_kwargs={} response_metadata={} id='run--7a7660c2-e76b-4dad-96ba-1d710374f8bb-0'
Expecting:{"score": "no"}, actual result:content='{"score": "no"}' additional_kwargs={} response_metadata={} id='run--e9816c7c-1ce0-4b22-b2f8-e9c6524cab80-0'
Expecting:{"score": "yes"}, actual result:content='{"score": "yes"}' additional_kwargs={} response_metadata={} id='run--7754d14f-57f2-473b-abf3-823c69640242-0'
Expecting:{"score": "yes"}, actual result:content='{"score": "yes"}' additional_kwargs={} response_metadata={} id='run--8a07e2aa-9402-4df4-81a9-b2bb131524d2-0'
Expecting:{"score": "no"}, actual result:content='{"score": "no"}' additional_kwargs={} response_metadata={} id='run--82701dd2-6e18-46b9-a7e5-b10e0fca5370-0'
Expecting:{"score": "yes"}, actual result:content='{"score": "yes"}' additional_kwargs={} response_metadata={} id='run--f0c498ef-7abc-42c3-86e3-6ce908791831-0'
Expecting:{"score": "no"}, actual result:con

In [None]:
related_chain = prompt_related_question | Llama_8b
for q, d, e in zip(related_questions, documents, expecting):
    result  = related_chain.invoke({"question": q, "document": d})
    print(f"""Expecting:{e}, actual result: {result} """)

# 9 out of 10

Expecting:{"score": "yes"}, actual result: content='{"score": "yes"}' additional_kwargs={} response_metadata={} id='run--0a6f9a37-b2ca-4804-b967-040f5a5cd4e0-0' 
Expecting:{"score": "no"}, actual result: content='{"score": "no"}' additional_kwargs={} response_metadata={} id='run--354220e5-078e-44b3-a89d-2283b627ceef-0' 
Expecting:{"score": "yes"}, actual result: content='{"score": "yes"}' additional_kwargs={} response_metadata={} id='run--cba23984-4bdc-4b44-848f-a0c77a94ef3f-0' 
Expecting:{"score": "no"}, actual result: content='{"score": "no"}' additional_kwargs={} response_metadata={} id='run--e8c968f2-6aed-466f-aca0-a61ca31ff99f-0' 
Expecting:{"score": "yes"}, actual result: content='{"score": "yes"}' additional_kwargs={} response_metadata={} id='run--23d103f8-b169-4053-8fc2-3856b2fab385-0' 
Expecting:{"score": "yes"}, actual result: content='{"score": "yes"}' additional_kwargs={} response_metadata={} id='run--fd8f7d73-9123-41e9-9199-3e684d34d983-0' 
Expecting:{"score": "no"}, actua

In [None]:
structured_output_chain = prompt_structured_output | Llama_8b
result  = structured_output_chain.invoke({})
print(f"""Result: {result} """)

# Expecting {\n  "person": "Alice ",\n  "item": "apples",\n  "quantity": 3,\n  "price": "5",\n  "day": "Monday"\n}
# It is pretty good!

Result: content='{\n  "person": "Alice",\n  "item": "apples",\n  "quantity": 3,\n  "price": 5,\n  "day": "Monday"\n}' additional_kwargs={} response_metadata={} id='run--5283cdc7-1d15-44d8-8a45-a94768beefa4-0' 


##### Testing ContactDoctor/Bio-Medical-Llama-3-8B

In [10]:
model_id = "ContactDoctor/Bio-Medical-Llama-3-8B"

In [11]:
tokenizer = AutoTokenizer.from_pretrained(model_id)

tokenizer.pad_token = tokenizer.eos_token

model = AutoModelForCausalLM.from_pretrained(
            model_id,
            dtype = best_dtype(),
            device_map={"":best_device()}, 
            low_cpu_mem_usage=True           
        )

print(f"Load {model_id} done!")

Load ContactDoctor/Bio-Medical-Llama-3-8B done!


In [12]:
pipe = pipeline(
    "text-generation", 
    model=model, 
    tokenizer=tokenizer,
    return_full_text=False,   
    )

# Wrap the plain transformers pipeline in a LangChain HuggingFacePipeline
hug_pipe = HuggingFacePipeline(pipeline=pipe)

# Wrap the pipeline as a chat model,
# so that ChatPromptTemplate can be used to build structured prompts.
Medical_Llama_8b = ChatHuggingFace(llm=hug_pipe)

Device set to use cuda


In [None]:
# Reasoning test
reason_letters_chain = prompt_last_letters_basic | Medical_Llama_8b
result = reason_letters_chain.invoke({"w1":"good", "w2":"morning", "w3": "Medipal"})
print(result)

# Expecting: 'dgl'
# Result is 'mg'
# It is bad!

content=' Step 1: Remove trailing spaces from the input words.\nStep 2: Remove punctuation from the input words.\nStep 3: Find the last alphabetic letter of each word.\nStep 4: Concatenate the last letters of the words.\nResult: mg' additional_kwargs={} response_metadata={} id='run--b7d2e97b-fd76-459e-afb8-2e05057f4892-0'


In [None]:
reason_math_chain = prompt_simple_math | Medical_Llama_8b
result = reason_math_chain.invoke({})
print(result)
# Expecting: 10
# Result is 6
# It is wrong: the model subtracted the 2 apples instead of adding them.

content=" Step 1: Initial number of apples father has: 8\nStep 2: Apples John gives to father: 2\nStep 3: Subtract apples given from father's initial apples: 8 - 2 = 6\nResult: 6" additional_kwargs={} response_metadata={} id='run--53f40231-ee84-4a4e-b7a3-19884634620a-0'


In [16]:
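# a little harder math
reason_math_moderate_chain = prompt_moderate_math | Medical_Llama_8b
# Bug fix: invoke the moderate chain (the original cell called reason_math_chain by mistake).
result = reason_math_moderate_chain.invoke({})
print(result)
# Expecting: 11
# Note: the recorded output below (Result: 10) came from the simple-math chain,
# so it does not measure performance on the moderate problem.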

content=' Step 1: John has 5 apples and gives 2 apples to his father, so John has 5 - 2 = 3 apples left.\nStep 2: His father had 8 apples and received 2 apples from John, so his father now has 8 + 2 = 10 apples.\nResult: 10' additional_kwargs={} response_metadata={} id='run--23b5571c-9927-4655-a6a5-754168a5aa5f-0'


In [17]:
# Generate questions based on content
generate_question_chain = prompt_generate_questions | Medical_Llama_8b

for c in contents:
    result = generate_question_chain.invoke({"doc": c})
    print(result)

content=' <json>\n["Is phenylephrine effective in relieving nasal discomfort caused by allergic rhinitis?", "Can phenylephrine be used to treat sinus congestion and pressure?", "Does phenylephrine belong to the class of medications known as nasal decongestants?"\n</json>' additional_kwargs={} response_metadata={} id='run--d172c326-5995-4153-b08e-937131a17ef4-0'
content=' <json>\n["Why would a player use ice towels during a set break?", "What might cause a player to limp between points?", "Why might a player try to prevent cramping during a match?"\n</json>' additional_kwargs={} response_metadata={} id='run--db5681a9-30c5-476e-8064-67208208748c-0'
content=' <json>\n["What are the key drivers of the dominance of artificial intelligence investing in the stock market?", "How will the trend of AI investing in the stock market persist in the coming years?", "What are the implications of AI investing being the dominant theme in the stock market?"\n</json>' additional_kwargs={} response_metada

In [None]:
# Solve yes/no questions with structured output
self_contained_chain = prompt_self_contained | Medical_Llama_8b
for q in self_contained_questions:
    result = self_contained_chain.invoke({"question": q["question"]})
    print(f"""Expecting:{q["expect"]}, actual result:{result}""")

# 8 out of 10

Expecting:{"score": "yes"}, actual result:content=' {"score": "yes"}' additional_kwargs={} response_metadata={} id='run--e54e8da6-4919-4135-8010-56fbfef7343a-0'


Expecting:{"score": "no"}, actual result:content=' {"score": "yes"}' additional_kwargs={} response_metadata={} id='run--b9fb0ca1-39eb-4506-b9c3-ff344f83205a-0'
Expecting:{"score": "yes"}, actual result:content=' {"score": "yes"}' additional_kwargs={} response_metadata={} id='run--9d16c68e-1462-4cbf-acf4-236ee2a0500f-0'
Expecting:{"score": "yes"}, actual result:content=' {"score": "yes"}' additional_kwargs={} response_metadata={} id='run--96cbffff-ffc4-46b8-9c5b-754241ad009c-0'
Expecting:{"score": "no"}, actual result:content=' {"score": "no"}' additional_kwargs={} response_metadata={} id='run--57e89f08-cf5a-49a6-8849-4d5f7a60732e-0'
Expecting:{"score": "yes"}, actual result:content=' {"score": "yes"}' additional_kwargs={} response_metadata={} id='run--71ecbcbb-70e3-4864-9ca1-2c22f44fdc01-0'
Expecting:{"score": "no"}, actual result:content=' {"score": "no"}' additional_kwargs={} response_metadata={} id='run--eb32dc49-9576-493a-815d-84ed7c21cd80-0'
Expecting:{"score": "yes"}, actual resu

In [None]:
related_chain = prompt_related_question | Medical_Llama_8b
for q, d, e in zip(related_questions, documents, expecting):
    result  = related_chain.invoke({"question": q, "document": d})
    print(f"""Expecting:{e}, actual result: {result} """)

# 9 out of 10

Expecting:{"score": "yes"}, actual result: content=' {"score": "yes"}' additional_kwargs={} response_metadata={} id='run--9ae7e78c-73df-4f9e-9392-915ecd34b520-0' 
Expecting:{"score": "no"}, actual result: content=' {"score": "no"}' additional_kwargs={} response_metadata={} id='run--f2b2d779-e781-4619-9461-9e47bede35ae-0' 
Expecting:{"score": "yes"}, actual result: content=' {"score": "yes"}' additional_kwargs={} response_metadata={} id='run--2c2c6068-7ec9-413b-8885-e65f0c753d08-0' 
Expecting:{"score": "no"}, actual result: content=' {"score": "no"}' additional_kwargs={} response_metadata={} id='run--2e38731a-f0fd-4bbd-be1c-036b2d95cf51-0' 
Expecting:{"score": "yes"}, actual result: content=' {"score": "yes"}' additional_kwargs={} response_metadata={} id='run--8f9898e8-8283-471a-9ac8-88dfecad9ee2-0' 
Expecting:{"score": "yes"}, actual result: content=' {"score": "yes"}' additional_kwargs={} response_metadata={} id='run--8225a82d-b702-4892-9e08-75cc479b79fe-0' 
Expecting:{"score": "no"},

In [21]:
structured_output_chain = prompt_structured_output | Medical_Llama_8b
result  = structured_output_chain.invoke({})
print(f"""Result: {result} """)

# Expecting {\n  "person": "Alice ",\n  "item": "apples",\n  "quantity": 3,\n  "price": "5",\n  "day": "Monday"\n}
# It is pretty good!

Result: content=' {\n  "person": "Alice",\n  "item": "apples",\n  "quantity": 3,\n  "price": 5,\n  "day": "Monday"\n}' additional_kwargs={} response_metadata={} id='run--e228831b-023b-4f5f-a97a-401b93cde14a-0' 
