<a href="https://colab.research.google.com/github/Yujia1104/fe/blob/main/NLP_Assignment2_223040118.ipynb" target="_parent"><img src="https://colab.research.google.com/assets/colab-badge.svg" alt="Open In Colab"/></a>

# **Task 1: LLMs as a knowledgeable doctor**

The pharmacist licensure exam is a cornerstone in the pharmacy profession, ensuring that candidates possess the requisite knowledge and skills for safe and effective practice. Its significance lies not only in validating credentials but also in safeguarding public health, enabling professional recognition, and ensuring adherence to legal and regulatory standards.

Advanced models like ChatGPT have significant potential in exam preparation, boasting an extensive knowledge base and the capability to provide in-depth explanations and clarify complex concepts. However, despite the prowess of such large models, if prompts are not designed appropriately, the information retrieved might be inaccurate or incomplete, potentially hindering success in the pharmacist exam.

In [None]:
!pip install langchain
!pip install langchain-openai
!pip install langchain-deepseek
!pip install retrying
!pip install langchain_core
!pip install tqdm
!pip install jsonlines

In [None]:
import os
import time
import getpass
import random
import json
import requests
from retrying import retry
from langchain_openai import ChatOpenAI
from langchain_core.prompts import ChatPromptTemplate
from langchain_core.output_parsers import StrOutputParser
from langchain_deepseek import ChatDeepSeek

# DeepSeek credentials: read the key from the environment (or prompt for it) instead of hard-coding it in the notebook
if "DEEPSEEK_API_KEY" not in os.environ:
    os.environ["DEEPSEEK_API_KEY"] = getpass.getpass("DeepSeek API key: ")
os.environ["DEEPSEEK_BASE_URL"] = "https://api.deepseek.com/v1"

deepseek_chat = ChatDeepSeek(model="deepseek-chat", temperature=1)
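The `retrying` import above is never used. The evaluation loops below make 100 API calls in a row, so a single transient network or rate-limit error can abort a long run. A minimal stdlib-only sketch of a retry decorator with exponential back-off follows; `with_retries`, `max_attempts` and `base_delay` are hypothetical names, not part of the original notebook. The `retrying` package's `@retry(stop_max_attempt_number=3, wait_fixed=2000)` should give similar behaviour with a fixed wait.

```python
import functools
import time


def with_retries(max_attempts=3, base_delay=1.0, exceptions=(Exception,)):
    """Hypothetical helper: retry a flaky call with exponential back-off."""
    def decorator(func):
        @functools.wraps(func)
        def wrapper(*args, **kwargs):
            for attempt in range(1, max_attempts + 1):
                try:
                    return func(*args, **kwargs)
                except exceptions:
                    if attempt == max_attempts:
                        raise  # give up and surface the last error
                    # wait base_delay, 2*base_delay, 4*base_delay, ... seconds
                    time.sleep(base_delay * 2 ** (attempt - 1))
        return wrapper
    return decorator


# Usage sketch (assumes `chain` from the cells below):
# @with_retries(max_attempts=3, base_delay=2.0)
# def ask(query):
#     return chain.invoke(query)
```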

In [None]:
prompt = ChatPromptTemplate.from_messages(
    [
        ("system", "You are an AI assistant, please answer user's question."),
        ("user", "{input}")
    ]
)

model = ChatDeepSeek(model="deepseek-chat")

In [None]:
chain = prompt | model | StrOutputParser()

response = chain.invoke({"input": "Hello"})



---



In [None]:
!wget https://NLP-course-cuhksz.github.io/Assignments/Assignment1/task1/data/1.exam.json

--2025-03-25 13:06:08--  https://nlp-course-cuhksz.github.io/Assignments/Assignment1/task1/data/1.exam.json
Resolving nlp-course-cuhksz.github.io (nlp-course-cuhksz.github.io)... 185.199.108.153, 185.199.111.153, 185.199.110.153, ...
Connecting to nlp-course-cuhksz.github.io (nlp-course-cuhksz.github.io)|185.199.108.153|:443... connected.
HTTP request sent, awaiting response... 200 OK
Length: 86227 (84K) [application/json]
Saving to: ‘1.exam.json.1’


2025-03-25 13:06:09 (834 KB/s) - ‘1.exam.json.1’ saved [86227/86227]



In [None]:
import json
with open('1.exam.json') as f:
  data = json.load(f)

In [None]:
data[0]

{'question': '27. 根据国家药品监督管理局，公安部，国家卫⽣健康委员会的有关规定，⼜服固体制剂每剂量单位含羟考酮碱不超过5毫克，且不含其他⿇醉药品，精神药品或者药品类易制毒化学品的复⽅制剂列⼊（）。',
 'option': {'A': '含⿇醉药品复⽅制剂的管理',
  'B': '第⼆类精神药品管理',
  'C': '第⼀类精神药品管理',
  'D': '医疗⽤毒性药品管理',
  'E': ''},
 'analysis': '⼜服固体制剂每剂量单位含羟考酮碱不超过5毫克，且不含其他⿇醉药品、精神药品或药品类易制毒化学品的复⽅制剂列⼊第⼆类精神药品管理。',
 'answer': 'B',
 'question_type': '最佳选择题',
 'source': '2021年执业药师职业资格考试《药事管理与法规》'}
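Several characters in the question text above (for example `⽣` in `卫⽣` and `⼆` in `第⼆`) appear to be code points from the Unicode Kangxi Radicals block (U+2F00 to U+2FD5), a common artefact of PDF extraction, rather than the ordinary CJK ideographs. NFKC normalization maps these compatibility characters to the standard ideographs. A minimal sketch, assuming the item layout shown above; `normalize_item` is a hypothetical helper, and whether cleaner text changes the model's accuracy was not measured here.

```python
import unicodedata


def normalize_item(item):
    """Hypothetical helper: return a copy of an exam item with NFKC-normalized text fields."""
    fixed = dict(item)  # shallow copy; the original item is left untouched
    for key in ("question", "analysis"):
        if isinstance(fixed.get(key), str):
            fixed[key] = unicodedata.normalize("NFKC", fixed[key])
    # Options are a dict of letter -> option text
    fixed["option"] = {k: unicodedata.normalize("NFKC", v) for k, v in item["option"].items()}
    return fixed


# Kangxi radical U+2F63 is mapped to the ideograph U+751F
print(unicodedata.normalize("NFKC", "卫\u2f63"))  # prints 卫生
```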

In [None]:
your_prompt = """请回答下面的多选题，请直接正确答案选项，不要输出其他内容。
{question}
{options}"""

def get_query(da):
  da['options'] = '\n'.join([f"{k}:{v}" for k,v in da['option'].items()])
  return your_prompt.format_map(da)

get_query(data[0])

'请回答下面的多选题，请直接正确答案选项，不要输出其他内容。\n27. 根据国家药品监督管理局，公安部，国家卫⽣健康委员会的有关规定，⼜服固体制剂每剂量单位含羟考酮碱不超过5毫克，且不含其他⿇醉药品，精神药品或者药品类易制毒化学品的复⽅制剂列⼊（）。\nA:含⿇醉药品复⽅制剂的管理\nB:第⼆类精神药品管理\nC:第⼀类精神药品管理\nD:医疗⽤毒性药品管理\nE:'
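The formatted query above still ends with an empty `E:` line, because this question has only four real options, and `get_query` also writes an `options` key back into the data item. Note too that the template calls every item a 多选题 (multiple-answer question), while `data[0]` is a 最佳选择题 (single best answer). A sketch of a side-effect-free variant that drops empty options; `build_query` and its `template` parameter are hypothetical and not part of the original notebook.

```python
def build_query(item, template):
    """Hypothetical helper: format one exam item without mutating it."""
    # Keep only options with non-empty text, e.g. drop "E": ""
    options = "\n".join(f"{k}:{v}" for k, v in item["option"].items() if v.strip())
    return template.format(question=item["question"], options=options)


template = "{question}\n{options}"
item = {"question": "Q?", "option": {"A": "x", "B": "y", "E": ""}}
print(build_query(item, template))
```

Passing the template explicitly, instead of reading the global `your_prompt`, also makes it clear which prompt each evaluation cell is using.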

In [None]:
chain.invoke(get_query(data[0]))

'B:第⼆类精神药品管理'

### Default

In [None]:
# Compute answer accuracy
your_prompt = """Please answer the multiple choice questions below, please direct the correct answer option and do not output anything else.
{question}
{options}"""

import re
from tqdm import tqdm

def get_ans(ans):
    match = re.findall(r'.*?([A-E]+(?:[、, ]+[A-E]+)*)', ans)
    if match:
        last_match = match[-1]
        return ''.join(re.split(r'[、, ，]+', last_match))
    return ''

correct_num = 0
total_num = 0
for da in tqdm(data[:100]):
  da['deepseek_ans'] =  chain.invoke(get_query(da))
  if get_ans(da['deepseek_ans']) == da['answer']:
    correct_num += 1
  total_num += 1
print(f"模型准确率: {correct_num/total_num:.2%}")

100%|██████████| 100/100 [08:35<00:00,  5.15s/it]

模型准确率: 78.00%
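`get_ans` returns the last run of capital letters A to E found anywhere in the reply. That works for bare replies such as `B:第⼆类精神药品管理`. With longer, reasoned replies, however, a stray capital that appears after the final answer (an option letter mentioned in a closing remark, or a word made only of the letters A to E) can be picked up instead. A sketch of a stricter extractor that first looks for an explicit `Answer:` or `答案：` marker and only falls back to the last standalone letter group; `extract_answer` is a hypothetical name, and the marker list is an assumption about how the model formats its replies.

```python
import re

# Explicit markers such as "Answer: B", "Answer: [D]", "答案：A、C"; a letter directly
# followed by a lowercase letter (e.g. the "B" of "Because") is not treated as an option
_MARKER = re.compile(
    r"(?:[Aa]nswer|答案)\s*[:：]\s*\[?([A-E](?![a-z])(?:\s*[、,，]?\s*[A-E](?![a-z]))*)\]?"
)
# Fallback: standalone letter groups that are not part of a longer word
_STANDALONE = re.compile(r"(?<![A-Za-z])([A-E]+(?:\s*[、,，]\s*[A-E]+)*)(?![A-Za-z])")


def extract_answer(reply):
    """Hypothetical extractor: prefer an explicit answer marker, else the last standalone letter group."""
    markers = _MARKER.findall(reply)
    if markers:
        found = markers[-1]
    else:
        standalone = _STANDALONE.findall(reply)
        if not standalone:
            return ""
        found = standalone[-1]
    # Normalize "A, C" / "A、C" to "AC" so it can be compared with data[i]['answer']
    return "".join(re.findall(r"[A-E]", found))
```

It can replace `get_ans` in any of the evaluation loops; it is most likely to matter for the prompts that ask for reasoning before the final answer.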





In [None]:
# few_shot_prompt
your_prompt = """Here are some examples of how to answer professional multiple-choice questions:

Example 1:
Question: 根据以下规定，哪项属于第一类精神药品管理？
Options: A. 氯胺酮 B. 哌替啶 C. 吗啡 D. 芬太尼
Answer: A

Example 2:
Question: 以下关于执业药师的职责描述，哪一项是正确的？
Options: A. 只能在药店工作 B. 仅限处方药指导 C. 提供用药咨询 D. 不可进行药品调配
Answer: C

Now, please answer the following multiple-choice question:
Question: {question}
Options:
{options}

Please answer using the format: Answer: [OPTION LETTER]
"""


import re
from tqdm import tqdm

def get_ans(ans):
    match = re.findall(r'.*?([A-E]+(?:[、, ]+[A-E]+)*)', ans)
    if match:
        last_match = match[-1]
        return ''.join(re.split(r'[、, ，]+', last_match))
    return ''

correct_num = 0
total_num = 0
for da in tqdm(data[:100]):
  da['deepseek_ans'] =  chain.invoke(get_query(da))
  if get_ans(da['deepseek_ans']) == da['answer']:
    correct_num += 1
  total_num += 1
print(f"模型准确率: {correct_num/total_num:.2%}")

100%|██████████| 100/100 [08:40<00:00,  5.20s/it]

模型准确率: 80.00%
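The two exemplars above are hand-written, so each one has to be fact-checked before it goes into the prompt: under China's controlled-drug catalogues, for example, morphine (吗啡) is managed as a narcotic drug (麻醉药品), not as a Class I psychotropic. A sketch of an alternative that draws exemplars from questions outside the evaluated slice, so the examples use the exam's own wording and answer key; `build_few_shot_prompt`, the `pool` slice and `k=3` are assumptions, and the effect on accuracy was not measured here.

```python
import random


def format_item(item):
    """Render one exam item as the question plus its non-empty options."""
    options = "\n".join(f"{k}:{v}" for k, v in item["option"].items() if v.strip())
    return f"Question: {item['question']}\nOptions:\n{options}"


def build_few_shot_prompt(target, pool, k=3, seed=0):
    """Hypothetical helper: prepend k solved items drawn from `pool` to the target question."""
    rng = random.Random(seed)  # fixed seed so every question sees the same exemplars
    shots = rng.sample(pool, k)
    parts = ["Here are some solved examples of pharmacist licensure exam questions:\n"]
    for i, shot in enumerate(shots, 1):
        parts.append(f"Example {i}:\n{format_item(shot)}\nAnswer: {shot['answer']}\n")
    parts.append("Now answer the following question.\n" + format_item(target))
    parts.append("Please answer using the format: Answer: [OPTION LETTER]")
    return "\n".join(parts)


# Usage sketch (assumes `data` holds more than 100 questions and `chain` from the cells above):
# pool = data[100:]   # exemplars never overlap the evaluated data[:100]
# reply = chain.invoke(build_few_shot_prompt(data[0], pool))
```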





In [None]:
# chain_of_thought_prompt
your_prompt = """Let’s break down the following medical/pharmaceutical regulatory question step by step.

Question: {question}
Options:
{options}

Step 1: Identify the key regulatory terms or numerical thresholds in the question.
Step 2: Recall the relevant classification rules or policies (e.g., regarding drug control categories).
Step 3: Eliminate incorrect options based on mismatch with the criteria.
Step 4: Choose the most accurate answer based on the remaining valid choice.

Provide your reasoning first, then give the final answer as: Answer: [OPTION LETTER]
"""


import re
from tqdm import tqdm

def get_ans(ans):
    # First, look for multiple answers or formatted choices
    match = re.findall(r'[A-E]+(?:[、, ]+[A-E]+)*', ans)
    if match:
        last_match = match[-1]
        # Clean up the extracted answer, remove any unwanted characters
        return ''.join(re.split(r'[、, ，]+', last_match))
    return ans.strip()

correct_num = 0
total_num = 0
for da in tqdm(data[:100]):
  da['deepseek_ans'] =  chain.invoke(get_query(da))
  if get_ans(da['deepseek_ans']) == da['answer']:
    correct_num += 1
  total_num += 1
print(f"模型准确率: {correct_num/total_num:.2%}")

100%|██████████| 100/100 [39:31<00:00, 23.72s/it]

模型准确率: 87.00%
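The chain-of-thought run took about 40 minutes (roughly 24 s per question) because the 100 requests are sent one after another. Since each question is independent, the calls can be issued concurrently. A sketch using the standard library's `ThreadPoolExecutor`, written around a generic `ask` callable so it can wrap `chain.invoke`; `evaluate`, `ask`, `extract` and `max_workers=8` are assumptions, and a suitable worker count depends on the API's rate limits. LangChain runnables also offer `chain.batch(queries, config={"max_concurrency": 8})`, which should serve the same purpose.

```python
from concurrent.futures import ThreadPoolExecutor


def evaluate(items, build_query, ask, extract, max_workers=8):
    """Hypothetical helper: query the model for every item concurrently and return the accuracy."""
    queries = [build_query(item) for item in items]
    with ThreadPoolExecutor(max_workers=max_workers) as pool:
        # map() preserves input order, so replies line up with items
        replies = list(pool.map(ask, queries))
    correct = 0
    for item, reply in zip(items, replies):
        item["deepseek_ans"] = reply  # keep the raw reply for error analysis
        if extract(reply) == item["answer"]:
            correct += 1
    return correct / len(items) if items else 0.0


# Usage sketch (assumes `data`, `get_query`, `chain` and `get_ans` from the cells above):
# acc = evaluate(data[:100], get_query, chain.invoke, get_ans)
# print(f"Accuracy: {acc:.2%}")
```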





In [None]:
# self_consistency_prompt
your_prompt = """You are answering a multiple-choice regulatory question. You will approach it from three different reasoning paths, then decide which answer appears most consistently.

Question: {question}
Options:
{options}

Answer the question three times using varied reasoning or assumptions. Then, identify the most frequently occurring answer.

Output only the final consistent answer in the format: Answer: [OPTION LETTER]
"""



import re
from tqdm import tqdm

def get_ans(ans):
    match = re.findall(r'.*?([A-E]+(?:[、, ]+[A-E]+)*)', ans)
    if match:
        last_match = match[-1]
        return ''.join(re.split(r'[、, ，]+', last_match))
    return ''

correct_num = 0
total_num = 0
for da in tqdm(data[:100]):
  da['deepseek_ans'] =  chain.invoke(get_query(da))
  if get_ans(da['deepseek_ans']) == da['answer']:
    correct_num += 1
  total_num += 1
print(f"模型准确率: {correct_num/total_num:.2%}")

100%|██████████| 100/100 [31:38<00:00, 18.98s/it]

模型准确率: 81.00%
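The prompt above asks the model to imitate three reasoning paths inside a single reply. Self-consistency as usually described instead samples several independent completions at a non-zero temperature and takes a majority vote over their final answers. A sketch of that voting step, written around a hypothetical `ask` callable (for example `.invoke` of a chain built on the `deepseek_chat` model configured with `temperature=1`) and an `extract` function such as `get_ans`; `self_consistent_answer` and `n=5` are assumptions, and this costs `n` API calls per question.

```python
from collections import Counter


def self_consistent_answer(query, ask, extract, n=5):
    """Hypothetical helper: sample n replies and return the most common extracted answer."""
    votes = Counter()
    for _ in range(n):
        answer = extract(ask(query))
        if answer:  # ignore replies where no option letter could be parsed
            votes[answer] += 1
    if not votes:
        return ""
    # most_common() orders equal counts by first occurrence, so ties go to the earliest answer
    return votes.most_common(1)[0][0]


# Usage sketch (assumes `prompt`, `deepseek_chat`, `StrOutputParser`, `get_query`, `get_ans`, `data`):
# sampling_chain = prompt | deepseek_chat | StrOutputParser()
# answer = self_consistent_answer(get_query(data[0]), sampling_chain.invoke, get_ans)
```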





<font color="blue">You need to optimize the prompt to improve the performance (accuracy) of large language models (LLMs).</font>

### Agent-based

In [22]:
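import os
import json
import re
import getpass
from tqdm import tqdm
from langchain_deepseek import ChatDeepSeek
from langchain_core.prompts import ChatPromptTemplate
from langchain_core.output_parsers import StrOutputParser

# Environment configuration: read the API key from the environment (or prompt for it) instead of hard-coding it
if "DEEPSEEK_API_KEY" not in os.environ:
    os.environ["DEEPSEEK_API_KEY"] = getpass.getpass("DeepSeek API key: ")
os.environ["DEEPSEEK_BASE_URL"] = "https://api.deepseek.com/v1"

# Initialise the model and prompt template; temperature is a regular ChatDeepSeek argument,
# so pass it directly rather than through model_kwargs
deepseek_chat = ChatDeepSeek(model="deepseek-chat", temperature=1.0)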

prompt = ChatPromptTemplate.from_messages(
    [
        ("system", "You are an AI assistant, please answer user's question."),
        ("user", "{input}")
    ]
)

# Build the prompt -> model -> output-parser chain
model = deepseek_chat
chain = prompt | model | StrOutputParser()

# Download the exam questions
!wget https://NLP-course-cuhksz.github.io/Assignments/Assignment1/task1/data/1.exam.json

# Load the data
with open('1.exam.json') as f:
    data = json.load(f)

# Define the prompt template
your_prompt = """请回答下面的多选题，请直接正确答案选项，不要输出其他内容。
{question}
{options}"""

def get_query(da):
    da['options'] = '\n'.join([f"{k}:{v}" for k,v in da['option'].items()])
    return your_prompt.format_map(da)

# Extract the option letter(s) from the model's reply
def get_ans(ans):
    match = re.findall(r'.*?([A-E]+(?:[、, ]+[A-E]+)*)', ans)
    if match:
        last_match = match[-1]
        return ''.join(re.split(r'[、, ，]+', last_match))
    return ''

class QuestionAnswerAgent:
    def __init__(self, model_chain):
        self.model_chain = model_chain
        self.correct_num = 0
        self.total_num = 0

    def ask_question(self, question_data):
        query = get_query(question_data)
        return self.model_chain.invoke(query)

    def evaluate_answer(self, predicted_answer, correct_answer):
        if get_ans(predicted_answer) == correct_answer:
            self.correct_num += 1
        self.total_num += 1

    def get_accuracy(self):
        return self.correct_num / self.total_num if self.total_num > 0 else 0

# Create the agent
agent = QuestionAnswerAgent(model_chain=chain)

# Evaluate accuracy on the first 100 questions
for da in tqdm(data[:100]):
    deepseek_answer = agent.ask_question(da)
    agent.evaluate_answer(deepseek_answer, da['answer'])

# Print the accuracy
print(f"模型准确率: {agent.get_accuracy():.2%}")



--2025-03-25 15:01:40--  https://nlp-course-cuhksz.github.io/Assignments/Assignment1/task1/data/1.exam.json
Resolving nlp-course-cuhksz.github.io (nlp-course-cuhksz.github.io)... 185.199.108.153, 185.199.109.153, 185.199.110.153, ...
Connecting to nlp-course-cuhksz.github.io (nlp-course-cuhksz.github.io)|185.199.108.153|:443... connected.
HTTP request sent, awaiting response... 200 OK
Length: 86227 (84K) [application/json]
Saving to: ‘1.exam.json.5’


2025-03-25 15:01:40 (847 KB/s) - ‘1.exam.json.5’ saved [86227/86227]



100%|██████████| 100/100 [08:10<00:00,  4.91s/it]

模型准确率: 71.00%
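`QuestionAnswerAgent` above sends the same single prompt as the default run and only keeps score, so it does not yet make any decisions of its own. One way to give it an agent-style control step is to inspect each reply and re-ask, with a stricter formatting instruction, whenever no option letter can be parsed. A minimal sketch; `answer_with_followup`, the follow-up wording and `max_turns=2` are hypothetical, and whether this improves on the 71% above was not tested.

```python
def answer_with_followup(query, ask, extract, max_turns=2):
    """Hypothetical agent step: re-ask with a stricter instruction until an option letter is parsed."""
    reply = ask(query)
    answer = extract(reply)
    turns = 1
    while not answer and turns < max_turns:
        # Feed the unusable reply back and ask for the bare option letter(s)
        followup = (
            f"{query}\n\nYour previous reply was:\n{reply}\n"
            "It did not state an option letter. Reply with only the letter(s) of the correct option(s)."
        )
        reply = ask(followup)
        answer = extract(reply)
        turns += 1
    return answer, reply


# Usage sketch (assumes `chain`, `get_query`, `get_ans` and `data` from the cell above):
# answer, raw_reply = answer_with_followup(get_query(data[0]), chain.invoke, get_ans)
```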



