# Components

In [82]:
sys_prompt = '''
You are given a golden standard answer and two candidates' answers.
Your job is to evaluate the accuracy of each candidate's answer against the golden standard.
Consider only whether each candidate correctly identified the abnormal values and gave the correct reference ranges.
The reference ranges are very important and must not differ from the golden standard, even slightly.
Disregard any extra or hallucinated content in a candidate's answer.
Finally, explain each candidate's flaws and choose the winner.
Some candidate answers may be in Vietnamese, but your response should still be in English.

Example format:
Cand A's Accuracy(%): ...
Cand B's Accuracy(%): ...

Comment: ...

Winner: ...
'''

In [83]:
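import os

from openai import OpenAI

# The API key is read from the OPENAI_API_KEY environment variable.
client = OpenAI(api_key=os.getenv("OPENAI_API_KEY"))


def ask_gpt(prompt: str) -> str:
    """Send `prompt` to the judge model, with `sys_prompt` as the system message, and return the reply text."""
    response = client.chat.completions.create(
        model="gpt-4o",
        messages=[
            {"role": "system", "content": sys_prompt},
            {"role": "user", "content": prompt},
        ],
    )
    return response.choices[0].message.content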

In [84]:
prompt = '''
Golden Standard Answer: {Golden}
iKNOW's Answer: {iKNOW}
SurgeryLLM's Answer: {SurgeryLLM}

iKNOW's Accuracy(%): ...
SurgeryLLM's Accuracy(%): ...

Comment: ...

Winner: ...
'''

Golden = '''
"- Sodium is elevated, above the normal range of 136–146 mEq/L (hypernatremia).  
- Potassium is decreased, below the normal range of 3.5–5.0 mEq/L (hypokalemia).  
- Bicarbonate is elevated, above the normal range of 22–28 mEq/L.  
- Creatinine is elevated, above the normal range of 0.6–1.2 mg/dL, indicating possible kidney function impairment.  
- Fasting glucose is slightly elevated, above the normal range of 70–100 mg/dL.  
- LDL Cholesterol is elevated, above the normal upper limit of <160 mg/dL.  
- Platelet count is slightly elevated, above the normal range of 150,000–400,000/mm3 (thrombocytosis).  
- Hemoglobin A1c is elevated, indicating poorly controlled blood sugar levels over time, above normal range of ≤6%.  
- TSH is elevated, above the normal range of 0.4–4.0 μU/mL, which may suggest hypothyroidism."
'''

iKNOW = '''
"### Kết quả ngoài phạm vi tham chiếu:

1. **Creatinine**: 1.8 mg/dL (Ngoài phạm vi tham chiếu: 0.6-1.2 mg/dL)
   - Ý nghĩa: Chỉ số creatinine cao hơn bình thường có thể chỉ ra vấn đề về chức năng thận.

2. **LDL Cholesterol**: 165 mg/dL (Ngoài phạm vi tham chiếu: <160 mg/dL)
   - Ý nghĩa: Mức LDL cholesterol cao hơn bình thường có thể tăng nguy cơ mắc bệnh tim mạch.

3. **Hemoglobin A1c**: 7% (Ngoài phạm vi tham chiếu: ≤42 mmol/mol hoặc <5.7%)
   - Ý nghĩa: Chỉ số Hemoglobin A1c cao hơn bình thường có thể chỉ ra tình trạng đường huyết không được kiểm soát tốt, đặc biệt là ở bệnh nhân tiểu đường.

4. **TSH**: 5.5 μU/mL (Ngoài phạm vi tham chiếu: 0.4-4.0 μU/mL)
   - Ý nghĩa: Mức TSH cao hơn bình thường có thể chỉ ra tình trạng suy giáp."
'''

SurgeryLLM = '''
"Based on the patient record and reference ranges provided, the following results are outside of the reference ranges:

1. **Sodium**: 148 mEq/L (Reference range: 136-145 mEq/L) - Elevated
2. **Potassium**: 3.0 mEq/L (Reference range: 3.5-5.0 mEq/L) - Low
3. **Bicarbonate**: 30 mEq/L (Reference range: 22-28 mEq/L) - Elevated
4. **Creatinine**: 1.8 mg/dL (Reference range: 0.6-1.2 mg/dL) - Elevated
5. **Glucose (Fasting)**: 110 mg/dL (Reference range: 70-100 mg/dL) - Elevated
6. **LDL Cholesterol**: 165 mg/dL (Note: Typically, a desirable level is less than 100 mg/dL, though this can vary with guidelines) - Elevated
7. **Hemoglobin A1c**: 7% (Reference range: ≤6%) - Elevated
8. **TSH**: 5.5 μU/mL (Normal range is typically around 0.4-4.0 μU/mL, though this can vary with specific guidelines) - Elevated"
'''
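The system prompt requires reference ranges to match the golden standard exactly, so a deterministic cross-check alongside the LLM judge may be useful. The following is a minimal sketch, not part of the original pipeline: `extract_ranges` and `RANGE` are hypothetical names, the two sample strings are lines copied from the answers above, and the regex only handles simple `low-high` ranges (no thousands separators, no one-sided limits such as `<160`).

```python
import re

# Hypothetical pattern: two numbers joined by a hyphen or an en dash, e.g. "136-145" or "3.5–5.0".
RANGE = re.compile(r"(\d+(?:\.\d+)?)\s*[-–]\s*(\d+(?:\.\d+)?)")


def extract_ranges(answer: str, analytes: list[str]) -> dict:
    """Map each analyte name to the first (low, high) range found on a line that mentions it."""
    ranges = {}
    for line in answer.splitlines():
        for name in analytes:
            if name.lower() in line.lower():
                match = RANGE.search(line)
                if match:
                    ranges[name] = (float(match.group(1)), float(match.group(2)))
    return ranges


# Two lines from the golden standard and from SurgeryLLM's answer.
golden_sample = (
    "- Sodium is elevated, above the normal range of 136–146 mEq/L (hypernatremia).\n"
    "- Potassium is decreased, below the normal range of 3.5–5.0 mEq/L (hypokalemia)."
)
candidate_sample = (
    "1. **Sodium**: 148 mEq/L (Reference range: 136-145 mEq/L) - Elevated\n"
    "2. **Potassium**: 3.0 mEq/L (Reference range: 3.5-5.0 mEq/L) - Low"
)

analytes = ["Sodium", "Potassium"]
gold = extract_ranges(golden_sample, analytes)
cand = extract_ranges(candidate_sample, analytes)

# Keep only the analytes whose candidate range differs from the golden one.
mismatches = {
    name: (gold[name], cand.get(name))
    for name in gold
    if cand.get(name) != gold[name]
}
print(mismatches)
```

On these two lines the check flags sodium, where the candidate's upper limit (145) differs from the golden standard's (146), and passes potassium.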

In [85]:
response = ask_gpt(prompt.format(Golden=Golden, iKNOW=iKNOW, SurgeryLLM=SurgeryLLM))
print(response)

iKNOW's Accuracy(%): 44%
SurgeryLLM's Accuracy(%): 100%

Comment: iKNOW identified three abnormalities correctly: elevated creatinine, elevated LDL cholesterol, and elevated TSH. However, the reference range for Hemoglobin A1c was incorrect and did not mention other abnormalities such as sodium, potassium, bicarbonate, fasting glucose, and platelet count. SurgeryLLM correctly identified all abnormalities with accurate reference ranges, except for the description of the LDL cholesterol reference range which mentioned a desirable level rather than a specific upper limit, which slightly differs from the golden standard. However, the rest were perfect matches.

Winner: SurgeryLLM
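
Since the judge replies in a fixed format, the scores and the winner can be pulled out for aggregation over many cases. The sketch below assumes the reply follows the `<name>'s Accuracy(%): ...` and `Winner: ...` layout requested in the prompt; `parse_verdict` is a hypothetical helper, and the sample string is abbreviated from the output above.

```python
import re


def parse_verdict(text: str, candidates: list[str]) -> dict:
    """Extract each candidate's accuracy and the winner from the judge's reply."""
    result = {"accuracy": {}, "winner": None}
    for name in candidates:
        # Matches e.g. "iKNOW's Accuracy(%): 44%"; the trailing "%" is ignored.
        match = re.search(
            rf"{re.escape(name)}'s Accuracy\(%\):\s*(\d+(?:\.\d+)?)", text
        )
        if match:
            result["accuracy"][name] = float(match.group(1))
    # The winner is whatever follows "Winner:" on its own line.
    match = re.search(r"^Winner:\s*(.+)$", text, flags=re.MULTILINE)
    if match:
        result["winner"] = match.group(1).strip()
    return result


sample = """iKNOW's Accuracy(%): 44%
SurgeryLLM's Accuracy(%): 100%

Comment: ...

Winner: SurgeryLLM
"""

verdict = parse_verdict(sample, ["iKNOW", "SurgeryLLM"])
print(verdict)
```

A candidate whose score line is missing or malformed is simply left out of `accuracy`, and `winner` stays `None` if no `Winner:` line is found, so callers can detect replies that drifted from the requested format.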
