# Intelligent systems for medical question answering

Gyovana M. Moriyama (216190)

Rafael A. Matumoto (273085)

## Chain-of-thought: implementation and experiments

In [None]:
!pip install -qU langchain_openai langchain-community faiss-cpu sentence-transformers openai datasets pydantic


In [None]:
import os
import datasets
import datetime

from google.colab import userdata, drive

from tqdm import tqdm
from pydantic import BaseModel, Field
from typing import List, Optional, Literal

from langchain_openai import ChatOpenAI
from langchain_core.prompts import ChatPromptTemplate

In [None]:
drive.mount('/content/drive')

Mounted at /content/drive


In [None]:
os.environ['OPENAI_API_KEY'] = userdata.get('api_key')

### MedQA-USMLE-4-options dataset

In [None]:
data = datasets.load_dataset('GBaker/MedQA-USMLE-4-options', split='test')



Generating train split: 10178 examples
Generating test split: 1273 examples

In [None]:
sampled_question = data[0]

### Model and prompt chain

In [None]:
class Answer(BaseModel):
    ''' Your answer to the question.
    '''
    final_answer: Literal['A', 'B', 'C', 'D'] = Field(description='Final answer to the question')
    step_by_step: Optional[str] = Field(description='Step-by-step reasoning to answer the question', default=None)

In [None]:
model = 'gpt-4o-mini'
model_temp = 0.5

In [None]:
llm = ChatOpenAI(
    model=model,
    temperature=model_temp,
)

In [None]:
cot_prompt_template = '''
Answer the question below.

Question: {question}
Options: {options}

Answer: Let's think step-by-step...
'''.strip()

In [None]:
cot_prompt = ChatPromptTemplate.from_template(cot_prompt_template)

In [None]:
chain = cot_prompt | llm.with_structured_output(Answer, include_raw=True)

In [None]:
res = chain.invoke({'question': sampled_question['question'], 'options': sampled_question['options']})
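The call above passes `sampled_question['options']` as a Python dict, so the rendered prompt contains the dict's `repr`, braces and quotes included. A minimal sketch of an alternative follows, assuming a hypothetical helper `format_options` that is not part of the original notebook; it renders one lettered option per line. Whether this formatting affects accuracy was not measured in these experiments.

```python
def format_options(options: dict) -> str:
    """Render an options dict as one 'LETTER) text' line per option.

    Hypothetical helper for illustration; the notebook passes the raw dict.
    """
    return "\n".join(
        f"{letter}) {text}" for letter, text in sorted(options.items())
    )


# Options taken from the sampled question shown later in this notebook
example_options = {
    "A": "Decrease medication dosage",
    "B": "Reassurance",
    "C": "Ocular examination under anesthesia",
    "D": "Slit-lamp examination",
}

formatted = format_options(example_options)
print(formatted)
```

The resulting string could be passed as the `options` value in `chain.invoke` without changing the prompt template.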

In [None]:
res

{'raw': AIMessage(content='', additional_kwargs={'tool_calls': [{'id': 'call_s9VS2oejpIgNINEtI7UhD6Eg', 'function': {'arguments': '{"final_answer":"A","step_by_step":"1. The resident made an error during the surgery by cutting a flexor tendon, which is a significant event that should be documented. \\n2. The attending physician suggests not reporting this complication, arguing that it is minor and won\'t harm the patient. However, ethical medical practice dictates that all complications, regardless of perceived severity, should be disclosed to the patient and documented in the operative report. \\n3. Option A (Disclose the error to the patient and put it in the operative report) is the most ethical choice, as it prioritizes transparency and patient safety. \\n4. Option B (Tell the attending that he cannot fail to disclose this mistake) may create conflict but does not address the need for disclosure directly to the patient. \\n5. Option C (Report the physician to the ethics committee) 

In [None]:
res['parsed']

Answer(final_answer='A', step_by_step="1. The resident made an error during the surgery by cutting a flexor tendon, which is a significant event that should be documented. \n2. The attending physician suggests not reporting this complication, arguing that it is minor and won't harm the patient. However, ethical medical practice dictates that all complications, regardless of perceived severity, should be disclosed to the patient and documented in the operative report. \n3. Option A (Disclose the error to the patient and put it in the operative report) is the most ethical choice, as it prioritizes transparency and patient safety. \n4. Option B (Tell the attending that he cannot fail to disclose this mistake) may create conflict but does not address the need for disclosure directly to the patient. \n5. Option C (Report the physician to the ethics committee) may be excessive at this stage and does not address the immediate need to inform the patient. \n6. Option D (Refuse to dictate the op

In [None]:
res['parsed'].final_answer

'A'

In [None]:
print(res['parsed'].step_by_step)

1. The resident made an error during the surgery by cutting a flexor tendon, which is a significant event that should be documented. 
2. The attending physician suggests not reporting this complication, arguing that it is minor and won't harm the patient. However, ethical medical practice dictates that all complications, regardless of perceived severity, should be disclosed to the patient and documented in the operative report. 
3. Option A (Disclose the error to the patient and put it in the operative report) is the most ethical choice, as it prioritizes transparency and patient safety. 
4. Option B (Tell the attending that he cannot fail to disclose this mistake) may create conflict but does not address the need for disclosure directly to the patient. 
5. Option C (Report the physician to the ethics committee) may be excessive at this stage and does not address the immediate need to inform the patient. 
6. Option D (Refuse to dictate the operative report) is not a constructive action

In [36]:
answer_list = []
reasoning_list = []
parsing_error = []

# Timestamped log file so repeated runs do not overwrite each other
timestamp = datetime.datetime.now().strftime("%Y%m%d_%H%M%S")
log_path = f'/content/drive/MyDrive/ia024a/projeto/entrega2/answers_cot_{timestamp}.txt'

with open(log_path, 'w') as f:

    # Header recording the run configuration
    f.write('-' * 10 + '\n')
    f.write(f'Model: {model}\n')
    f.write(f'Temperature: {model_temp}\n')
    f.write(f'Prompt: {cot_prompt_template}\n')
    f.write('-' * 10 + '\n')

    for n, question in enumerate(tqdm(data)):

        res = chain.invoke({'question': question['question'], 'options': question['options']})
        # Log the full response (raw message, parsed object, parsing error)
        f.write(f'{n}: {res}\n')

        # 'parsed' can be None even without a parsing error (e.g. no tool call),
        # so check both before reading its fields
        if res['parsing_error'] is None and res['parsed'] is not None:
            answer_list.append(res['parsed'].final_answer)
            reasoning_list.append(res['parsed'].step_by_step)
        else:
            # Store None so list positions stay aligned with the dataset rows
            answer_list.append(None)
            reasoning_list.append(None)
            parsing_error.append(res)

100%|██████████| 1273/1273 [1:12:35<00:00,  3.42s/it]
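The loop above issues 1273 sequential API calls over more than an hour, and an unhandled exception in any one of them would abort the whole run. The following is a minimal sketch of a retry wrapper with exponential backoff, assuming a hypothetical helper `invoke_with_retry` that is not part of the original notebook. A stand-in function replaces `chain.invoke` so the example runs without network access. `ChatOpenAI` also accepts a `max_retries` argument, which may already cover some transient failures.

```python
import time


def invoke_with_retry(invoke, payload, max_attempts=3, base_delay=1.0, sleep=time.sleep):
    """Call invoke(payload), retrying with exponential backoff on any exception.

    Hypothetical helper: 'invoke' would be chain.invoke in the notebook.
    """
    for attempt in range(1, max_attempts + 1):
        try:
            return invoke(payload)
        except Exception:
            if attempt == max_attempts:
                raise  # give up after the last attempt
            sleep(base_delay * 2 ** (attempt - 1))


# Stand-in for chain.invoke that fails twice before succeeding
calls = {"count": 0}


def flaky_invoke(payload):
    calls["count"] += 1
    if calls["count"] < 3:
        raise RuntimeError("transient error")
    return {"parsed": "A", "parsing_error": None}


# Record the requested delays instead of actually sleeping
delays = []
result = invoke_with_retry(
    flaky_invoke, {"question": "...", "options": {}}, sleep=delays.append
)
print(result, delays)
```

In the loop, `chain.invoke(...)` could be replaced by `invoke_with_retry(chain.invoke, {...})`, leaving the logging and parsing logic unchanged.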


### Results

In [37]:
# unanswered questions or answers not properly parsed
parsing_error

[]

In [39]:
data = data.add_column('answer_cot', answer_list)
data = data.add_column('reasoning_cot', reasoning_list)

In [40]:
df = data.to_pandas()

In [41]:
data.save_to_disk('/content/drive/MyDrive/ia024a/projeto/entrega2/cot_results_entrega')


In [42]:
df.to_csv('/content/drive/MyDrive/ia024a/projeto/entrega2/cot_results_entrega.csv')

In [43]:
# chain-of-thought accuracy
accuracy_cot = df['answer_cot'].eq(df['answer_idx']).mean()
accuracy_cot

0.7415553809897879
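The accuracy above is a point estimate from a single run at temperature 0.5. As a rough indication of sampling uncertainty, the sketch below computes a 95% Wilson score interval using only the standard library. The helper name `wilson_interval` is hypothetical, and the count of correct answers is recovered from the reported accuracy and the 1273 test questions rather than read from the DataFrame.

```python
import math


def wilson_interval(correct: int, total: int, z: float = 1.96):
    """Wilson score interval for a binomial proportion (z=1.96 for about 95%)."""
    p = correct / total
    denom = 1 + z**2 / total
    center = (p + z**2 / (2 * total)) / denom
    half = z * math.sqrt(p * (1 - p) / total + z**2 / (4 * total**2)) / denom
    return center - half, center + half


n_total = 1273
# Count recovered from the reported accuracy (assumption: no rounding loss)
n_correct = round(0.7415553809897879 * n_total)

low, high = wilson_interval(n_correct, n_total)
print(f"correct: {n_correct}/{n_total}, 95% interval: {low:.3f} to {high:.3f}")
```

This interval only reflects question sampling; it does not capture run-to-run variation from the non-zero temperature.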

In [46]:
# chain-of-thought accuracy for step 1 questions
step1 = df[df['meta_info'].eq('step1')]

step1_acc = step1['answer_cot'].eq(step1['answer_idx']).mean()
step1_acc

0.7393225331369662

In [47]:
# chain-of-thought accuracy for step 2&3 questions
step23 = df[df['meta_info'].eq('step2&3')]

step23_acc = step23['answer_cot'].eq(step23['answer_idx']).mean()
step23_acc

0.7441077441077442
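The two per-step accuracies differ by about half a percentage point. A minimal sketch of a two-proportion z-test is given below to gauge whether such a gap is distinguishable from noise. The counts (502 of 679 for step 1, 442 of 594 for step 2&3) are assumptions inferred from the reported accuracies and the 1273-question total, not values printed by the notebook, and `two_proportion_z_test` is a hypothetical helper.

```python
import math


def two_proportion_z_test(k1: int, n1: int, k2: int, n2: int):
    """Two-sided z-test for the difference between two proportions."""
    p1, p2 = k1 / n1, k2 / n2
    pooled = (k1 + k2) / (n1 + n2)
    se = math.sqrt(pooled * (1 - pooled) * (1 / n1 + 1 / n2))
    z = (p1 - p2) / se
    # Two-sided p-value from the standard normal distribution
    p_value = math.erfc(abs(z) / math.sqrt(2))
    return z, p_value


# Assumed counts, inferred from the accuracies reported above
z_stat, p_value = two_proportion_z_test(502, 679, 442, 594)
print(f"z = {z_stat:.3f}, p = {p_value:.3f}")
```

Under these assumed counts, the gap is small relative to its standard error, so this run alone does not indicate a real difference between step 1 and step 2&3 questions.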

In [54]:
# distribution of wrong answers
df[~df['answer_cot'].eq(df['answer_idx'])].groupby('answer_cot').size()

answer_cot  count
A           94
B           90
C           78
D           67


In [58]:
# sample of correctly answered question
sample_answer = df[df['answer_cot'].eq(df['answer_idx'])].sample(1)

In [69]:
print(sample_answer['question'].values[0])

A 40-year-old man is referred to an optometrist. He complains of mild vision impairment over the last 6 months. His vision has continued to slowly deteriorate and his condition is now affecting his night driving. Past medical history is significant for well-controlled schizophrenia. He takes a low-potency typical antipsychotics and a multivitamin every day. He has been compliant with his medication and has regular follow-up visits. What is the best first step in the management of this patient’s symptoms?


In [66]:
print(sample_answer['options'].values[0])

{'A': 'Decrease medication dosage', 'B': 'Reassurance', 'C': 'Ocular examination under anesthesia', 'D': 'Slit-lamp examination'}


In [67]:
print(sample_answer['answer_cot'].values[0])

D


In [68]:
print(sample_answer['reasoning_cot'].values[0])

1. The patient is experiencing mild vision impairment that has been progressively worsening, particularly affecting his night driving. This suggests a potential ocular issue that needs to be evaluated.
2. Given the patient's age and the nature of his symptoms, it is important to conduct a thorough ocular examination to identify any underlying conditions, such as cataracts or other retinal issues.
3. A slit-lamp examination is a non-invasive procedure that allows for detailed visualization of the anterior segment of the eye, which can help diagnose various ocular conditions.
4. Decreasing the medication dosage (Option A) might not address the underlying cause of the vision impairment and could worsen his psychiatric condition. 
5. Reassurance (Option B) is not sufficient without a proper evaluation of the cause of vision changes.
6. Ocular examination under anesthesia (Option C) is typically reserved for more complex cases or when a detailed examination cannot be performed awake, which 