<a href="https://colab.research.google.com/github/christiejibaraki/CUREBench/blob/main/notebooks/1-inspect_datasets.ipynb" target="_parent"><img src="https://colab.research.google.com/assets/colab-badge.svg" alt="Open In Colab"/></a>

# Inspect provided datasets
- curebench_testset_phase1.jsonl
- curebench_valset_pharse1.jsonl

### basic setup
- clone repo
- Do NOT build the environment from `requirements.txt`; it will take too long ❌
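
Skipping the full install only works if the packages this notebook imports are already present, which is typically the case on Colab. A minimal sketch for checking that up front; `missing_packages` is a hypothetical helper, not part of the CUREBench repo.

```python
import importlib.util


def missing_packages(names):
    """Return the top-level package names that cannot be found for import."""
    return [name for name in names if importlib.util.find_spec(name) is None]


# pandas and torch are the third-party packages imported later in this notebook
print(missing_packages(["pandas", "torch"]))
```

An empty list means the imports below should succeed without installing anything.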

In [1]:
!git clone https://github.com/christiejibaraki/CUREBench.git

Cloning into 'CUREBench'...
remote: Enumerating objects: 137, done.
remote: Counting objects: 100% (48/48), done.
remote: Compressing objects: 100% (36/36), done.
remote: Total 137 (delta 32), reused 12 (delta 12), pack-reused 89 (from 2)
Receiving objects: 100% (137/137), 2.85 MiB | 12.51 MiB/s, done.
Resolving deltas: 100% (61/61), done.


In [2]:
%cd CUREBench

/content/CUREBench


In [27]:
import os
import json
import pandas as pd
from typing import Dict, List
from torch.utils.data import DataLoader
from dataset_utils import build_dataset
from core.eval_framework import load_and_merge_config, create_metadata_parser

In [21]:
# pandas settings
pd.set_option('display.max_colwidth', None)
pd.set_option('display.max_columns', None)
pd.set_option('display.max_rows', None)

In [15]:
def load_dataset_by_config(config_path):
    """Build the dataset named in a config file and return its rows as a list of dicts."""
    # load config file to get dataset info
    config = {}
    if config_path:
        with open(config_path, "r") as f:
            config = json.load(f)
    # fall back to an empty dict so a missing 'dataset' key cannot raise a NameError
    dataset_config = config.get("dataset", {})
    print(f"\nconfig file: {config_path}\ncontents:\n{dataset_config}")

    # build dataset
    dataset = build_dataset(dataset_config.get("dataset_path"))
    dataloader = DataLoader(dataset, batch_size=1, shuffle=False)
    dataset_list = []

    # batch_size=1, so [0] takes the single item from each batched field
    for batch in dataloader:
        question_type = batch[0][0]
        if question_type not in ("multi_choice", "open_ended_multi_choice", "open_ended"):
            continue  # unrecognized question types are skipped
        row = {
            "question_type": question_type,
            "id": batch[1][0],
            "question": batch[2][0],
            "answer": batch[3][0],
        }
        # meta_question is read only for open_ended_multi_choice items
        if question_type == "open_ended_multi_choice":
            row["meta_question"] = batch[4][0]
        dataset_list.append(row)
    return dataset_list
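
`load_dataset_by_config` only exposes the fields that `build_dataset` returns. To see what the raw file contains, the JSONL can be read directly. A minimal sketch, assuming each non-empty line is a standalone JSON object; `peek_jsonl` is a hypothetical helper, not part of the CUREBench repo.

```python
import json


def peek_jsonl(path, n=3):
    """Return the first n records of a JSON Lines file as dicts."""
    records = []
    with open(path, "r", encoding="utf-8") as f:
        for line in f:
            if len(records) >= n:
                break
            line = line.strip()
            if not line:
                continue  # tolerate blank lines
            records.append(json.loads(line))
    return records


# Usage inside the cloned repo (path as named in the title of this notebook):
# for record in peek_jsonl("resources/curebench_valset_pharse1.jsonl"):
#     print(sorted(record.keys()))
```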



In [17]:
test_data_config_path = "metadata_config_test.json"
val_data_config_path = "metadata_config_val.json"
test_data_list = load_dataset_by_config(test_data_config_path)
val_data_list = load_dataset_by_config(val_data_config_path)


config file: metadata_config_test.json
contents:
{'dataset_name': 'cure_bench_phase1_test', 'dataset_path': 'resources/curebench_testset_phase1.jsonl', 'description': 'CureBench 2025 test questions'}
dataset_path: resources/curebench_testset_phase1.jsonl
CureBenchDataset initialized with 2079 examples

config file: metadata_config_val.json
contents:
{'dataset_name': 'cure_bench_phase1_val', 'dataset_path': 'resources/curebench_valset_pharse1.jsonl', 'description': 'CureBench 2025 val questions'}
dataset_path: resources/curebench_valset_pharse1.jsonl
CureBenchDataset initialized with 459 examples


### val data

In [22]:
val_df = pd.DataFrame(val_data_list)

In [43]:
print(f"number of rows in val set: {len(val_df):,}")

number of rows in val set: 459


In [23]:
val_df.head()

,question_type,id,question,answer,meta_question
0,multi_choice,U9PHZ83RKYV8,Which drug brand name is associated with the treatment of acne?\nA: Salicylic Acid\nB: Minoxidil\nC: Ketoconazole\nD: Fluocinonide,A,
1,open_ended_multi_choice,vIGwm8qguXYi,What should patients do if they experience severe allergic reactions during or after receiving fosaprepitant for injection?,B,"The following is a multiple choice question about medicine and the agent's open-ended answer to the question. Convert the agent's answer to the final answer format using the corresponding option label, e.g., 'A', 'B', 'C', 'D', 'E' or 'None'. \n\nQuestion: What should patients do if they experience severe allergic reactions during or after receiving fosaprepitant for injection?\nA: Wait for the symptoms to resolve on their own.\nB: Inform their healthcare provider immediately and seek emergency medical care.\nC: Stop chemotherapy treatment permanently.\nD: Take over-the-counter antihistamines.\n\n"
2,open_ended_multi_choice,GlpDnJvMaWbs,What should you do if the dose indicator on Stiolto Respimat reaches 0?,B,"The following is a multiple choice question about medicine and the agent's open-ended answer to the question. Convert the agent's answer to the final answer format using the corresponding option label, e.g., 'A', 'B', 'C', 'D', 'E' or 'None'. \n\nQuestion: What should you do if the dose indicator on Stiolto Respimat reaches 0?\nA: Continue using the inhaler until the cartridge is empty.\nB: Prepare and use a new Stiolto Respimat inhaler.\nC: Turn the clear base to reset the dose indicator.\nD: Clean the mouthpiece and continue using the inhaler.\n\n"
3,open_ended_multi_choice,WfWiWK0yULaX,Which of the following conditions is a contraindication for the use of Gadavist?,B,"The following is a multiple choice question about medicine and the agent's open-ended answer to the question. Convert the agent's answer to the final answer format using the corresponding option label, e.g., 'A', 'B', 'C', 'D', 'E' or 'None'. \n\nQuestion: Which of the following conditions is a contraindication for the use of Gadavist?\nA: Mild hypersensitivity reactions to Gadavist\nB: History of severe hypersensitivity reactions to Gadavist\nC: Renal impairment\nD: Liver dysfunction\n\n"
4,multi_choice,wzkMQ7uHtlLs,"What is the primary consideration for lactating mothers using Albuterol Sulfate HFA?\nA: It is contraindicated during lactation.\nB: Plasma levels of albuterol are low, and effects on breastfed children are likely minimal.\nC: It significantly reduces milk production.\nD: It should only be used in emergencies.",B,
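
In the rows above, the answer options appear as `A: ...` lines, inside `question` for `multi_choice` items and inside `meta_question` for `open_ended_multi_choice` items. A minimal sketch for pulling them out, assuming the `\n` shown in the output are real newline characters and that each option fits on one line; `parse_options` is a hypothetical helper.

```python
import re

# one option per line, labelled A through E (the meta prompt mentions labels up to 'E')
OPTION_LINE = re.compile(r"^([A-E]): (.*)$", flags=re.MULTILINE)


def parse_options(text):
    """Map option labels to option text for lines shaped like 'A: ...'."""
    return {label: body.strip() for label, body in OPTION_LINE.findall(text)}


question = (
    "Which drug brand name is associated with the treatment of acne?\n"
    "A: Salicylic Acid\nB: Minoxidil\nC: Ketoconazole\nD: Fluocinonide"
)
print(parse_options(question))
# {'A': 'Salicylic Acid', 'B': 'Minoxidil', 'C': 'Ketoconazole', 'D': 'Fluocinonide'}
```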


In [30]:
val_answer_counts = val_df.groupby(['answer'], as_index=False).size()
val_answer_counts.rename(columns={"size": "count"}, inplace=True)
val_answer_counts

,answer,count
0,A,118
1,B,262
2,C,66
3,D,13
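
The counts are far from uniform, so a useful reference point is the accuracy of always guessing the most frequent label. A minimal sketch using the counts printed above; in the notebook, `val_df['answer'].value_counts(normalize=True)` gives the same shares.

```python
# answer counts copied from the output above
val_answer_counts = {"A": 118, "B": 262, "C": 66, "D": 13}

total = sum(val_answer_counts.values())
majority_label, majority_count = max(val_answer_counts.items(), key=lambda kv: kv[1])
baseline = majority_count / total

print(f"{majority_label}: {baseline:.1%}")
# B: 57.1%
```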


In [40]:
val_df['meta_question'].isna().sum()

np.int64(229)
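
Missing values are expected here because `load_dataset_by_config` only sets `meta_question` for `open_ended_multi_choice` rows. A minimal sketch for confirming which question types account for the gaps; `missing_meta_by_type` is a hypothetical helper and the toy rows are made up for illustration.

```python
from collections import Counter


def missing_meta_by_type(rows):
    """Count rows without a meta_question, grouped by question_type."""
    return Counter(
        row["question_type"] for row in rows if row.get("meta_question") is None
    )


# made-up rows shaped like the dicts returned by load_dataset_by_config
toy_rows = [
    {"question_type": "multi_choice", "id": "x1"},
    {"question_type": "multi_choice", "id": "x2"},
    {"question_type": "open_ended", "id": "x3"},
    {"question_type": "open_ended_multi_choice", "id": "x4", "meta_question": "..."},
]
print(missing_meta_by_type(toy_rows))
```

In the notebook, `missing_meta_by_type(val_data_list)` or `val_df[val_df['meta_question'].isna()]['question_type'].value_counts()` would give the breakdown for the real data.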

### test data

In [25]:
test_df = pd.DataFrame(test_data_list)

In [42]:
print(f"number of rows in test set: {len(test_df):,}")

number of rows in test set: 2,079


In [31]:
test_answer_counts = test_df.groupby(['answer'], as_index=False).size()
test_answer_counts.rename(columns={"size": "count"}, inplace=True)
test_answer_counts

,answer,count
0,,2079


In [34]:
# every test answer is empty (see the counts above), so the column carries no information
test_df = test_df.drop(columns=['answer'])

In [35]:
test_df.head()

,question_type,id,question,meta_question
0,open_ended_multi_choice,qHBAJ2T5cs5U,A 10-year-old child diagnosed with juvenile rheumatoid arthritis (JRA) requires treatment. Genetic testing reveals the child is a poor CYP2C9 metabolizer. Which drug is the most appropriate for this patient?,"The following is a multiple choice question about medicine and the agent's open-ended answer to the question. Convert the agent's answer to the final answer format using the corresponding option label, e.g., 'A', 'B', 'C', 'D', 'E' or 'None'. \n\nQuestion: A 10-year-old child diagnosed with juvenile rheumatoid arthritis (JRA) requires treatment. Genetic testing reveals the child is a poor CYP2C9 metabolizer. Which drug is the most appropriate for this patient?\nA: Celecoxib 200 mg\nB: First Aid Direct Chewable Aspirin\nC: Florexa\nD: None of the above\n\n"
1,open_ended_multi_choice,dYtEF1FnUwSQ,"A 70-year-old male with familial chylomicronemia syndrome (FCS) presents with memory issues, recurrent pancreatitis, and mild renal impairment (eGFR 60 mL/min). He has no known history of hypersensitivity reactions. What is the most suitable treatment option for this patient?","The following is a multiple choice question about medicine and the agent's open-ended answer to the question. Convert the agent's answer to the final answer format using the corresponding option label, e.g., 'A', 'B', 'C', 'D', 'E' or 'None'. \n\nQuestion: A 70-year-old male with familial chylomicronemia syndrome (FCS) presents with memory issues, recurrent pancreatitis, and mild renal impairment (eGFR 60 mL/min). He has no known history of hypersensitivity reactions. What is the most suitable treatment option for this patient?\nA: TRYNGOLZA (Olezarsen Sodium) without dose adjustment\nB: TRYNGOLZA with dose adjustment due to renal impairment\nC: A triglyceride-lowering drug contraindicated for geriatric patients\nD: A drug targeting APOC-III but requiring pediatric dosing\n\n"
2,open_ended_multi_choice,IKWfGHlG9aaL,Which thyroid hormone replacement therapy is considered both safe and effective for a 32-year-old pregnant woman in her second trimester who has familial papillary thyroid carcinoma and hypothyroidism?,"The following is a multiple choice question about medicine and the agent's open-ended answer to the question. Convert the agent's answer to the final answer format using the corresponding option label, e.g., 'A', 'B', 'C', 'D', 'E' or 'None'. \n\nQuestion: Which thyroid hormone replacement therapy is considered both safe and effective for a 32-year-old pregnant woman in her second trimester who has familial papillary thyroid carcinoma and hypothyroidism?\nA: Doxorubicin Hydrochloride\nB: NP Thyroid 120\nC: Cholestyramine\nD: Estrogen-containing oral contraceptives\n\n"
3,open_ended_multi_choice,XCM1462WATI2,Which over-the-counter supplement should be avoided while taking venlafaxine due to potential interactions?,"The following is a multiple choice question about medicine and the agent's open-ended answer to the question. Convert the agent's answer to the final answer format using the corresponding option label, e.g., 'A', 'B', 'C', 'D', 'E' or 'None'. \n\nQuestion: Which over-the-counter supplement should be avoided while taking venlafaxine due to potential interactions?\nA: Vitamin D\nB: St. John's Wort\nC: Omega-3 fatty acids\nD: Calcium\n\n"
4,multi_choice,tvnSoQOIv0R0,"What was the purpose of the 6-month study conducted in CD-1 mice using cV1q anti-mouse TNFα, an analogous antibody to REMICADE?\nA: To study the effects of REMICADE on cardiovascular health.\nB: To evaluate the pharmacokinetics of REMICADE in mice.\nC: To assess the tumorigenic potential of cV1q anti-mouse TNFα.\nD: To determine the optimal dosage of REMICADE for mice.",


In [39]:
test_df['meta_question'].isna().sum()

np.int64(805)