# Turn BioDEX into a QA task

### How do we measure success?
1. Via transfer to a different QA dataset?
2. Via results on unvalidated QA instances on PVQA?
3. Via validated instances on PVQA? 

### How do we create the dataset?
1. Load BioDEX-raw
2. Create some QA templates


In [1]:
from src import Match, Icsr
from src.utils import get_matches

from datetime import datetime
import random
import datasets
import pandas as pd



In [2]:
# load matches
dataset = datasets.load_dataset("FAERS-PubMed/raw_dataset")
matches = get_matches(dataset['train'])
print(len(matches))

Using custom data configuration FAERS-PubMed--raw_dataset-0b83cc0b498dbbb2
Found cached dataset json (/Users/kldooste/.cache/huggingface/datasets/FAERS-PubMed___json/FAERS-PubMed--raw_dataset-0b83cc0b498dbbb2/0.0.0/e6070c77f18f01a5ad4551a8b7edfba20b8438b7cad4d94e6ad9378022ce4aab)
100%|██████████| 1/1 [00:00<00:00, 26.52it/s]


65648


### Filter and split

In [3]:
# arguments
report_cutoff = 10
fulltext_only = True
commercial_only = False
test_cutoff = datetime(year=2021, month=1, day=1)

In [4]:
# filter too many reports
matches = [m for m in matches if len(m.reports) <= report_cutoff]
print(f'Matches with <= {report_cutoff} reports: {len(matches):,}')

Matches with <= 10 reports: 62,168


In [5]:
# get articles with full text
if fulltext_only:
    matches = [m for m in matches if m.article.fulltext]
    print(f'Matches with full text: {len(matches):,}')

Matches with full text: 18,678


In [None]:
# TODO split
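
One possible way to fill in the split is a temporal cut on `test_cutoff`. The sketch below is only an assumption about how that could look: `split_by_date` and the `get_date` accessor are hypothetical names, and the notebook does not show which attribute of a match carries a date, so the accessor is left to the caller.

```python
from datetime import datetime


def split_by_date(items, cutoff, get_date):
    """Split items into (train, test, undated) around a cutoff date.

    Items dated strictly before `cutoff` go to train, the rest to test.
    Items for which `get_date` returns None are returned separately so
    they are not silently assigned to either side.
    """
    train, test, undated = [], [], []
    for item in items:
        date = get_date(item)
        if date is None:
            undated.append(item)
        elif date < cutoff:
            train.append(item)
        else:
            test.append(item)
    return train, test, undated


# Hypothetical usage; `pubdate` is an assumed attribute name, not one
# confirmed by the notebook:
# train, test, undated = split_by_date(
#     matches, test_cutoff, lambda m: m.article.pubdate)
```

Keeping undated items apart makes it easy to see how many matches cannot be placed before committing to a split.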

### Create QA instances
1. Create QA templates based on reports
2. Sample a report
3. Sample applicable QA templates and fill them in
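
The three steps above could be sketched roughly as follows. This is a minimal illustration, not the final pipeline: `sample_qa_instances`, its parameters, and `ToyWeightQuestion` are hypothetical, and the toy reports are stand-ins built with `SimpleNamespace`. The only thing taken from the notebook is the template interface (`check_report` / `from_report`).

```python
import random
from types import SimpleNamespace


def sample_qa_instances(reports, templates, n_reports, max_templates, seed=0):
    """Sample reports, then sample applicable templates and fill them in."""
    rng = random.Random(seed)  # local RNG so the sampling is reproducible
    qa_tuples = []
    for report in rng.sample(reports, min(n_reports, len(reports))):
        # keep only templates whose required fields are present in this report
        applicable = [t for t in templates if t.check_report(report)]
        chosen = rng.sample(applicable, min(max_templates, len(applicable)))
        for template in chosen:
            qa_tuples.extend(template.from_report(report))
    return qa_tuples


class ToyWeightQuestion:
    """Stand-in template with the same interface as the notebook's classes."""
    q = 'What is the weight of the patient?'

    def check_report(self, report):
        return bool(report.patient.patientweight)

    def from_report(self, report):
        return [(self.q, f'{report.patient.patientweight} kg.')]


# Toy reports: one with a weight, one without.
toy_reports = [
    SimpleNamespace(patient=SimpleNamespace(patientweight='70')),
    SimpleNamespace(patient=SimpleNamespace(patientweight=None)),
]
toy_qa = sample_qa_instances(
    toy_reports, [ToyWeightQuestion()], n_reports=2, max_templates=1)
print(toy_qa)
```

Only the report that has a weight yields a QA pair; the other is filtered out by `check_report`.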

Some sample questions:
 - What was the indication for drug X taken by patient Y?
 - What was the dosage of drug X taken by patient Y?
 - Which drug(s) led to reaction(s) X (for patient Y)?
 - Which reaction(s) were associated with drug(s) X (for patient Y)?
 - What was the outcome of reaction X (for patient Y)?


Note: there are text-form explanations of all these fields. Can we leverage them to generate interesting questions?
 - They would definitely help with data augmentation, and would help rephrase questions in terms of other concepts.
 - Are they needed to translate integer field values into concepts? Is that important for few-shot learning etc.? People could also just inject this via prompting.

Note: can we generate questions straight from the source material, without requiring the report? It might work, but the questions might not be relevant to PV. Without some high-level field to validate against, most questions might boil down to plain retrieval. It is unclear whether our reports actually contain such high-level values or are mainly retrieval as well.
 


In [67]:
# class Question(object):
    
#     def __init__(self):
#         pass

#     def check_report(self, report):
#         pass

#     def from_report(self, report):
#         pass

class WeightQuestion(object):
    q = 'What is the weight of the patient?'
    a = '{patientweight} kg.'

    def check_report(self, report):
        return bool(report.patient.patientweight)
    
    def from_report(self, report):
        return [(self.q, self.a.format(patientweight=report.patient.patientweight))]
    
class DrugsGivenReactionQuestion(object):
    q = "Give an alphabetized list of all active substances of drugs taken by the patient who experienced '{reaction}'."

    def check_report(self, report):
        return True
    
    def from_report(self, report):
        # get all activesubstances
        activesubstances = []
        for drug in report.patient.drug:
            if drug.activesubstance and drug.activesubstance.activesubstancename:
                activesubstances.append(drug.activesubstance.activesubstancename)
        # deduplicate and sort
        activesubstances = ", ".join(sorted(set(activesubstances)))

        q_list = []
        a_list = [activesubstances] * len(report.patient.reaction)

        for reaction in report.patient.reaction:
            q_list.append(self.q.format(reaction=reaction.reactionmeddrapt))

        return list(zip(q_list, a_list))
    
class DrugIndicationQuestion(object):
    q = "What was the indication of drug {drug}?"

    def check_report(self, report):
        return True
    
    def from_report(self, report):
        q_list = []
        a_list = []

        for drug in report.patient.drug:
            if drug.activesubstance and drug.activesubstance.activesubstancename:
                if drug.drugindication:
                    q_list.append(self.q.format(drug=drug.activesubstance.activesubstancename))
                    a_list.append(drug.drugindication)
        
        return list(zip(q_list, a_list))


class DrugAdministrationRouteQuestion(object):
    q = "What was the administration route of drug '{drug}'?"

    administrationroute_map = {            
                  '001': "Auricular (otic)"
                  ,'002': "Buccal"
                  ,'003': "Cutaneous"
                  ,'004': "Dental"
                  ,'005': "Endocervical"
                  ,'006': "Endosinusial"
                  ,'007': "Endotracheal"
                  ,'008': "Epidural"
                  ,'009': "Extra-amniotic"
                  ,'010': "Hemodialysis"
                  ,'011': "Intra corpus cavernosum"
                  ,'012': "Intra-amniotic"
                  ,'013': "Intra-arterial"
                  ,'014': "Intra-articular"
                  ,'015': "Intra-uterine"
                  ,'016': "Intracardiac"
                  ,'017': "Intracavernous"
                  ,'018': "Intracerebral"
                  ,'019': "Intracervical"
                  ,'020': "Intracisternal"
                  ,'021': "Intracorneal"
                  ,'022': "Intracoronary"
                  ,'023': "Intradermal"
                  ,'024': "Intradiscal (intraspinal)"
                  ,'025': "Intrahepatic"
                  ,'026': "Intralesional"
                  ,'027': "Intralymphatic"
                  ,'028': "Intramedullar (bone marrow)"
                  ,'029': "Intrameningeal"
                  ,'030': "Intramuscular"
                  ,'031': "Intraocular"
                  ,'032': "Intrapericardial"
                  ,'033': "Intraperitoneal"
                  ,'034': "Intrapleural"
                  ,'035': "Intrasynovial"
                  ,'036': "Intratumor"
                  ,'037': "Intrathecal"
                  ,'038': "Intrathoracic"
                  ,'039': "Intratracheal"
                  ,'040': "Intravenous bolus"
                  ,'041': "Intravenous drip"
                  ,'042': "Intravenous (not otherwise specified)"
                  ,'043': "Intravesical"
                  ,'044': "Iontophoresis"
                  ,'045': "Nasal"
                  ,'046': "Occlusive dressing technique"
                  ,'047': "Ophthalmic"
                  ,'048': "Oral"
                  ,'049': "Oropharyngeal"
                  ,'050': "Other"
                  ,'051': "Parenteral"
                  ,'052': "Periarticular"
                  ,'053': "Perineural"
                  ,'054': "Rectal"
                  ,'055': "Respiratory (inhalation)"
                  ,'056': "Retrobulbar"
                  ,'057': "Subconjunctival"
                  ,'058': "Subcutaneous"
                  ,'059': "Subdermal"
                  ,'060': "Sublingual"
                  ,'061': "Topical"
                  ,'062': "Transdermal"
                  ,'063': "Transmammary"
                  ,'064': "Transplacental"
                  ,'065': "Unknown"
                  ,'066': "Urethral"
                  ,'067': "Vaginal"
                  }

    def check_report(self, report):
        return True
    
    def from_report(self, report):
        q_list = []
        a_list = []

        for drug in report.patient.drug:
            if drug.activesubstance and drug.activesubstance.activesubstancename:
                if drug.drugadministrationroute:
                    q_list.append(self.q.format(drug=drug.activesubstance.activesubstancename))
                    a_list.append(self.administrationroute_map[drug.drugadministrationroute])
        
        return list(zip(q_list, a_list))

class DrugDosageQuestion(object):
    q = "What was the dosage of drug '{drug}'?"
    a = "{dosagenumb} {dosageunit}."

    dosage_map = {            
                '001': "kg (kilograms)",
                '002': "g (grams)",
                '003': "mg (milligrams)",
                '004': "µg (micrograms)"
        }

    def check_report(self, report):
        return True
    
    def from_report(self, report):
        q_list = []
        a_list = []

        for drug in report.patient.drug:
            if drug.activesubstance and drug.activesubstance.activesubstancename:
                # skip unit codes that are missing from dosage_map to avoid a KeyError
                if drug.drugstructuredosagenumb and drug.drugstructuredosageunit in self.dosage_map:
                    q_list.append(self.q.format(drug=drug.activesubstance.activesubstancename))
                    a_list.append(self.a.format(
                        dosagenumb = drug.drugstructuredosagenumb,
                        dosageunit = self.dosage_map[drug.drugstructuredosageunit]
                    ))
        
        return list(zip(q_list, a_list))
    
class ReactionOutcomeQuestion(object):
    q = "What was the outcome of reaction '{reaction}'?"

    outcome_map = {
        '1': "Recovered",
        '2': "Recovering",
        '3': "Not recovered",
        '4': "Recovered with sequelae (consequent health issues)",
        '5': "Fatal",
        '6': "Unknown"
    }
    
    def check_report(self, report):
        return True
    
    def from_report(self, report):
        q_list = []
        a_list = []

        for reaction in report.patient.reaction:
            if reaction.reactionoutcome:
                q_list.append(self.q.format(reaction = reaction.reactionmeddrapt))
                a_list.append(self.outcome_map[reaction.reactionoutcome])

        return list(zip(q_list, a_list))
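
The commented-out `Question` class at the top of this cell hints at a shared interface. A minimal sketch of what that base class might look like is given below, assuming an abstract base class is wanted. The `generate` helper and the `WeightQuestionV2` name are hypothetical and only serve the illustration; the toy report is a `SimpleNamespace` stand-in.

```python
from abc import ABC, abstractmethod
from types import SimpleNamespace


class Question(ABC):
    """Shared interface for QA templates (sketch)."""

    def check_report(self, report):
        # default: the template applies to every report
        return True

    @abstractmethod
    def from_report(self, report):
        """Return a list of (question, answer) tuples."""

    def generate(self, report):
        # hypothetical helper: combine the applicability check and the fill-in
        return self.from_report(report) if self.check_report(report) else []


class WeightQuestionV2(Question):
    q = 'What is the weight of the patient?'
    a = '{patientweight} kg.'

    def check_report(self, report):
        return bool(report.patient.patientweight)

    def from_report(self, report):
        return [(self.q, self.a.format(patientweight=report.patient.patientweight))]


# Toy report used only to exercise the sketch.
toy_report = SimpleNamespace(patient=SimpleNamespace(patientweight='70'))
print(WeightQuestionV2().generate(toy_report))
```

With such a base class, templates that apply to every report would not need to repeat `check_report`, and the driver loop could call `generate` directly.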


In [68]:
matches[3].reports[0].dict()['patient']['patientweight']

report = matches[0].reports[0]

questions = [
    WeightQuestion(),
    DrugsGivenReactionQuestion(),
    DrugAdministrationRouteQuestion(),
    DrugDosageQuestion(),
    ReactionOutcomeQuestion()
]
qa_tuples = []

for question in questions:
    if question.check_report(report):
        qa_tuples.extend(question.from_report(report))

qa_tuples

[("Give an alphabetized list of all active substances of drugs taken by the patient who experienced 'Kounis syndrome'.",
  'ATROPINE SULFATE, MIDAZOLAM, SEVOFLURANE'),
 ("Give an alphabetized list of all active substances of drugs taken by the patient who experienced 'Hypersensitivity'.",
  'ATROPINE SULFATE, MIDAZOLAM, SEVOFLURANE'),
 ("What was the administration route of drug 'ATROPINE SULFATE'?",
  'Intravenous bolus'),
 ("What was the administration route of drug 'MIDAZOLAM'?",
  'Intravenous bolus'),
 ("What was the administration route of drug 'SEVOFLURANE'?",
  'Respiratory (inhalation)'),
 ("What was the outcome of reaction 'Kounis syndrome'?", 'Recovered'),
 ("What was the outcome of reaction 'Hypersensitivity'?", 'Recovered')]

In [56]:
print(matches[0].reports[0].dict()['patient']['patientonsetage'])
print(matches[0].reports[0].dict()['patient']['patientonsetageunit'])
print(matches[0].reports[0].dict()['patient']['summary'])

12
801
None
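
The age above is stored as a value (`12`) plus a unit code (`801`), so an age question needs the same kind of code-to-concept translation as the route and outcome templates. The sketch below assumes the unit codes follow the openFDA field reference for `patientonsetageunit` (800 to 805); `AGE_UNIT_MAP` and `format_age` are hypothetical names, and the mapping should be checked against that reference before use.

```python
# Assumed mapping of patientonsetageunit codes to units.
AGE_UNIT_MAP = {
    '800': 'decade',
    '801': 'year',
    '802': 'month',
    '803': 'week',
    '804': 'day',
    '805': 'hour',
}


def format_age(value, unit_code):
    """Render an age answer, or return None if a field is missing or unknown."""
    unit = AGE_UNIT_MAP.get(str(unit_code))
    if value is None or unit is None:
        return None
    plural = '' if str(value) == '1' else 's'
    return f'{value} {unit}{plural}.'


print(format_age('12', '801'))  # -> 12 years.
```

Returning `None` for unknown codes lets a template's `check_report` skip the report instead of raising a `KeyError`.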


In [7]:
print(len(matches[0].reports[0].dict()['patient']['drug']))
print(matches[0].reports[0].dict()['patient']['drug'][-1])
print(matches[0].reports[0].dict()['patient']['drug'][-1]['drugcharacterization'])

3
{'actiondrug': '1', 'activesubstance': {'activesubstancename': 'SEVOFLURANE'}, 'drugadditional': '1', 'drugadministrationroute': '055', 'drugauthorizationnumb': None, 'drugbatchnumb': None, 'drugcharacterization': '1', 'drugcumulativedosagenumb': None, 'drugcumulativedosageunit': None, 'drugdosageform': None, 'drugdosagetext': 'CONTINUOUS INHALED', 'drugenddate': None, 'drugenddateformat': None, 'drugindication': 'Anaesthesia', 'drugintervaldosagedefinition': None, 'drugintervaldosageunitnumb': None, 'drugrecurreadministration': None, 'drugseparatedosagenumb': None, 'drugstartdate': None, 'drugstartdateformat': None, 'drugstructuredosagenumb': None, 'drugstructuredosageunit': None, 'drugtreatmentduration': None, 'drugtreatmentdurationunit': None, 'medicinalproduct': 'SEVOFLURANE', 'drugrecurrence': None}
1


In [8]:
print(matches[0].article.fulltext)


==== Front
Front Cardiovasc Med
Front Cardiovasc Med
Front. Cardiovasc. Med.
Frontiers in Cardiovascular Medicine
2297-055X
Frontiers Media S.A.

10.3389/fcvm.2021.676188
Cardiovascular Medicine
Case Report
Case Report: Perioperative Kounis Syndrome in an Adolescent With Congenital Glaucoma
Capponi Guglielmo 1

Giovannini Mattia 2

Koniari Ioanna 3
Mori Francesca 2

Rubino Chiara 4

Spaziani Gaia 1
Calabri Giovanni Battista 1
Favilli Silvia 1
Novembre Elio 2

Indolfi Giuseppe 4 5

De Simone Luciano 1 †
Trapani Sandra 4 6 * †

1Cardiology Unit, Department of Pediatrics, Meyer Children's University Hospital, Florence, Italy
2Allergy Unit, Department of Pediatrics, Meyer Children's University Hospital, Florence, Italy
3Electrophysiology and Device Department, University Hospital of South Manchester NHS Foundation Trust, Manchester, United Kingdom
4Department of Pediatrics, Meyer Children's Hospital, Florence, Italy
5Department of NEUROFARBA, Meyer Children's Hospital, University of Flore

In [9]:
print(matches[0].article.mesh_terms)

None


In [10]:
print(matches[0].article.keywords)

Kounis syndrome; coronary artery; midazolam; pediatrics; perioperative; sevoflurane


In [11]:
print(matches[0].article.chemical_list)

None


In [12]:
print(matches[0].article.abstract)

A 12-year-old male patient suffering from congenital glaucoma developed bradycardia, left ventricular failure, and hypotension after induction of anesthesia. Electrocardiography and echocardiography revealed a complete normalization of ECG and a complete spontaneous recovery in the cardiac function 72 hours from the beginning of the clinical manifestations, while cardiac Magnetic Resonance Imaging was performed, and coronary Computed Tomography scan revealed a myocardial bridge of a tract of the left anterior descendent coronary artery. Diagnosis of Kounis syndrome (KS) was made, a relatively novel, under-recognized clinical condition, defined as the manifestation of an acute coronary syndrome accompanied by mast cell activation and platelet aggregation involving interrelated and interacting inflammatory cells in the setting of allergic, hypersensitivity, anaphylactic or anaphylactoid insults. We described one of the first pediatric cases of KS related to anesthetic medications. In chi