# FHIR for Research Workshop - Exercise 3

## Drug on Drug Interactions

For this exercise we will explore potential drug-on-drug (DOD) interactions in a patient cohort by drawing on RxNav, a drug database from the NIH's National Library of Medicine (NLM) that offers open APIs we can query.

For this exercise we will explore the following scenario.


## Scenario: Potential DOD Interaction Risks in a Patient Population
For this scenario we will pull the list of prescriptions for our patient cohort, and then leverage the NIH DOD API to determine whether any patients are at potential risk of adverse events.

For this initial analysis we will want to do the following:
### Key Activities:
<ol>
    <li> Query all active prescriptions in our patient cohort</li>
    <li> Determine how to query the API, and then construct a mechanism to feed our patient info into that API to determine whether a known DOD interaction could occur.</li>
    <li> Construct a composite list of all drugs per patient (so we can determine a potential DOD interaction).</li>
    <li> Construct a program to loop through our entire cohort and determine the aggregate risk.</li>
</ol>

### Motivation/Purpose
From a research perspective, we can envision leveraging these sorts of analyses for post-market surveillance of drugs, both to determine the rate of known adverse events among patients and to flag additional risks not yet identified.

Clinically, this exercise demonstrates the power of a SMART on FHIR application, where third-party data (in this case DOD interaction data) can be pulled in, paired with FHIR-formatted clinical data, and then leveraged to better inform patient care in the form of Clinical Decision Support tools.

### Key Skills Practiced
For this scenario, the key challenge will be in mapping the API query to our data and constructing a data structure capable of leveraging it automatically. 

## Step 1: Query all active prescriptions in our patient cohort

For this exercise we will call on the 'MedicationRequest' resource, which is the closest FHIR equivalent to a prescription. Each entry is effectively a single prescription.

We will also make sure to include the relevant patient information (via `_include=MedicationRequest:patient`) so that we can map multiple prescriptions to individual patients.

In [2]:
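import requests
import json
# Note: verify=False disables TLS certificate verification for this request
r = requests.get("https://api.logicahealth.org/researchonfhir/open/MedicationRequest?_include=MedicationRequest:patient", headers={'Accept':'application/fhir+json'}, verify=False)
r.raise_for_status()  # stop early on an HTTP error instead of parsing an error page
bundle = r.json()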



In [3]:
bundle

{'resourceType': 'Bundle',
 'id': '922af671-7e37-4de3-a0a9-9cdf06a60c66',
 'meta': {'lastUpdated': '2022-01-09T01:49:11.467+00:00'},
 'type': 'searchset',
 'link': [{'relation': 'self',
   'url': 'https://api.logicahealth.org/researchonfhir/open/MedicationRequest?_include=MedicationRequest%3Apatient'},
  {'relation': 'next',
   'url': 'https://api.logicahealth.org/researchonfhir/open?_getpages=922af671-7e37-4de3-a0a9-9cdf06a60c66&_getpagesoffset=50&_count=50&_pretty=true&_bundletype=searchset'}],
 'entry': [{'fullUrl': 'https://api.logicahealth.org/researchonfhir/open/MedicationRequest/smart-MedicationRequest-101',
   'resource': {'resourceType': 'MedicationRequest',
    'id': 'smart-MedicationRequest-101',
    'meta': {'versionId': '1',
     'lastUpdated': '2020-07-15T02:51:25.000+00:00',
     'source': '#KQSArAdbxORTtqVw'},
    'text': {'status': 'generated',
     'div': '<div xmlns="http://www.w3.org/1999/xhtml">Nizatidine 15 MG/ML Oral Solution [Axid] (rxnorm: 582620)</div>'},
    

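We can use Python's built-in `open` function to create a new file and then write the raw response body (the JSON bundle) to it. Note that the `fhir-data` folder must already exist; the cell prints the number of bytes written.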

In [4]:
open('fhir-data/data.json', 'wb').write(r.content)

101114
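The bundle printed above is only the first page of results: its `link` array contains a `next` URL (`_getpagesoffset=50&_count=50`), so a single `requests.get` sees at most 50 entries. Step 1 also asks for *active* prescriptions, which the query above does not filter on. The sketch below is one way to handle both, assuming the server honours the standard FHIR `status` search parameter on MedicationRequest; `fetch_json`, `next_link` and `iter_entries` are hypothetical helper names, not part of any library. If the endpoint raises certificate errors, `verify=False` could be passed as in the cell above.

```python
from typing import Callable, Iterator, Optional

BASE_URL = "https://api.logicahealth.org/researchonfhir/open"


def fetch_json(url: str) -> dict:
    """GET a FHIR search URL and return the parsed JSON Bundle."""
    import requests  # imported here so the paging logic below works without it

    r = requests.get(url, headers={"Accept": "application/fhir+json"}, timeout=30)
    r.raise_for_status()
    return r.json()


def next_link(bundle: dict) -> Optional[str]:
    """Return the URL of the Bundle's 'next' page, or None on the last page."""
    for link in bundle.get("link", []):
        if link.get("relation") == "next":
            return link.get("url")
    return None


def iter_entries(first_url: str, fetch: Callable[[str], dict] = fetch_json) -> Iterator[dict]:
    """Yield every entry from every page of a searchset Bundle."""
    url = first_url
    while url:
        bundle = fetch(url)
        yield from bundle.get("entry", [])
        url = next_link(bundle)


# Active prescriptions only, plus the Patient resources they reference
query = f"{BASE_URL}/MedicationRequest?status=active&_include=MedicationRequest:patient"
# entries = list(iter_entries(query))  # makes network calls, one per page
```

To reuse the existing parsing pipeline, the collected entries could then be saved as a single bundle, e.g. `json.dump({'resourceType': 'Bundle', 'type': 'searchset', 'entry': entries}, f)` into `fhir-data/data.json`, since `read_bundle_from_file` only looks at the `entry` key.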

## Step 1 (continued): Mount Data onto a Pandas Dataframe

Now that we've extracted the information we need, we will convert the FHIR-formatted data into a pandas dataframe for subsequent analysis.

The following class and helper functions parse the JSON into a pandas dataframe.

In [5]:
from pandas import json_normalize  # pandas >= 1.0; the pandas.io.json import path is deprecated
import pandas as pd
import os


class Fhiry(object):
    def __init__(self):
        self._df = None
        self._filename = ""
        self._folder = ""

    @property
    def df(self):
        return self._df

    @property
    def filename(self):
        return self._filename

    @property
    def folder(self):
        return self._folder

    @filename.setter
    def filename(self, filename):
        self._filename = filename
        self._df = self.read_bundle_from_file(filename)

    @folder.setter
    def folder(self, folder):
        self._folder = folder

    def read_bundle_from_file(self, filename):
        with open(filename, 'r') as f:
            json_in = f.read()
            json_in = json.loads(json_in)
            return json_normalize(json_in['entry'])

    def delete_unwanted_cols(self):
        del self._df['resource.text.div']

    def process_df(self):
        """Read a single JSON resource or a directory full of JSON resources
        ONLY COMMON FIELDS IN ALL resources will be mapped
        """
        if self._folder:
            df = pd.DataFrame(columns=[])
            for file in os.listdir(self._folder):
                if file.endswith(".json"):
                    self._df = self.read_bundle_from_file(
                        os.path.join(self._folder, file))
                    self.delete_unwanted_cols()
                    self.convert_object_to_list()
                    self.add_patient_id()
                    if df.empty:
                        df = self._df
                    else:
                        df = pd.concat([df, self._df])
            self._df = df
        elif self._filename:
            self._df = self.read_bundle_from_file(self._filename)
            self.delete_unwanted_cols()
            self.convert_object_to_list()
            self.add_patient_id()

    def process_file(self, filename):
        self._df = self.read_bundle_from_file(filename)
        self.delete_unwanted_cols()
        self.convert_object_to_list()
        self.add_patient_id()
        return self._df

    def convert_object_to_list(self):
        """Convert object to a list of codes
        """
        for col in self._df.columns:
            if 'coding' in col:
                codes = self._df.apply(
                    lambda x: self.process_list(x[col]), axis=1)
                self._df = pd.concat(
                    [self._df, codes.to_frame(name=col+'codes')], axis=1)
                del self._df[col]
            if 'display' in col:
                codes = self._df.apply(
                    lambda x: self.process_list(x[col]), axis=1)
                self._df = pd.concat(
                    [self._df, codes.to_frame(name=col+'display')], axis=1)
                del self._df[col]

    def add_patient_id(self):
        """Create a patientId column with the resource.id of the first Patient resource
        """
        self._df['patientId'] = self._df[(
            self._df['resource.resourceType'] == "Patient")].iloc[0]['resource.id']

    def get_info(self):
        if self._df is None:
            return "Dataframe is empty"
        return self._df.info()

    def process_list(self, myList):
        """Extracts the codes from a list of objects
        Args:
            myList (list): A list of objects
        Returns:
            list: A list of codes
        """
        myCodes = []
        if isinstance(myList, list):
            for entry in myList:
                if 'code' in entry:
                    myCodes.append(entry['code'])
                else:
                    myCodes.append(entry['display'])
        return myCodes
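`read_bundle_from_file` relies on `json_normalize`, which flattens nested JSON keys into dot-separated column names; that is where names such as `resource.medicationCodeableConcept.text` come from. Lists are not expanded, which is why `convert_object_to_list` is needed for the `coding` arrays. A minimal illustration on a hand-written, abbreviated entry (not real server output):

```python
import pandas as pd

# Abbreviated, hand-written bundle entry for illustration only
entries = [
    {
        "fullUrl": "MedicationRequest/example-1",
        "resource": {
            "resourceType": "MedicationRequest",
            "id": "example-1",
            "status": "active",
            "medicationCodeableConcept": {
                "coding": [{"system": "http://www.nlm.nih.gov/research/umls/rxnorm", "code": "582620"}],
                "text": "Nizatidine 15 MG/ML Oral Solution [Axid]",
            },
            "subject": {"reference": "Patient/smart-1081332"},
        },
    }
]

flat = pd.json_normalize(entries)
# Nested dicts become dotted columns; the 'coding' list stays in a single column
print(sorted(flat.columns))
```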

In [6]:
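# Parse every JSON bundle in a folder, in parallel where possible
import multiprocessing as mp


def process_files(file):
    f = Fhiry()
    return f.process_file(file)


# NOTE: Fhirndjson is not defined in this notebook, so the NDJSON helpers
# (process_ndjson, ndjson) will raise NameError if called; only process() is used here.
def process_ndjson(file):
    f = Fhirndjson()
    return f.process_file(file)


def process(folder):
    json_files = [os.path.join(folder, name) for name in os.listdir(folder) if name.endswith('.json')]
    try:
        # One worker per CPU core; each worker turns one bundle file into a dataframe
        with mp.Pool(mp.cpu_count()) as pool:
            list_of_dataframes = pool.map(process_files, json_files)
        return pd.concat(list_of_dataframes)
    except Exception:
        # Fall back to sequential parsing, e.g. when worker processes cannot
        # import functions defined interactively in the notebook
        f = Fhiry()
        f.folder = folder
        f.process_df()
        return f.df


def ndjson(folder):
    try:
        with mp.Pool(mp.cpu_count()) as pool:
            list_of_dataframes = pool.map(
                process_ndjson, [os.path.join(folder, row) for row in os.listdir(folder)])
        return pd.concat(list_of_dataframes)
    except Exception:
        f = Fhirndjson()
        f.folder = folder
        f.process_df()
        return f.df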

We can now create our dataframe by calling the `process` function to parse all the JSON files within a given directory:

In [20]:
df = process('fhir-data')
df.info()

<class 'pandas.core.frame.DataFrame'>
RangeIndex: 60 entries, 0 to 59
Data columns (total 32 columns):
fullUrl                                                   60 non-null object
resource.resourceType                                     60 non-null object
resource.id                                               60 non-null object
resource.meta.versionId                                   60 non-null object
resource.meta.lastUpdated                                 60 non-null object
resource.meta.source                                      60 non-null object
resource.text.status                                      60 non-null object
resource.status                                           50 non-null object
resource.intent                                           50 non-null object
resource.medicationCodeableConcept.text                   50 non-null object
resource.subject.reference                                50 non-null object
resource.dosageInstruction                        

In [8]:
df.columns

Index(['fullUrl', 'resource.resourceType', 'resource.id',
       'resource.meta.versionId', 'resource.meta.lastUpdated',
       'resource.meta.source', 'resource.text.status', 'resource.status',
       'resource.intent', 'resource.medicationCodeableConcept.text',
       'resource.subject.reference', 'resource.dosageInstruction',
       'resource.dispenseRequest.numberOfRepeatsAllowed',
       'resource.dispenseRequest.quantity.value',
       'resource.dispenseRequest.quantity.unit',
       'resource.dispenseRequest.quantity.system',
       'resource.dispenseRequest.quantity.code',
       'resource.dispenseRequest.expectedSupplyDuration.value',
       'resource.dispenseRequest.expectedSupplyDuration.unit',
       'resource.dispenseRequest.expectedSupplyDuration.system',
       'resource.dispenseRequest.expectedSupplyDuration.code', 'search.mode',
       'resource.identifier', 'resource.active', 'resource.name',
       'resource.telecom', 'resource.gender', 'resource.birthDate',
       '

In [9]:
df.head(5)

| | fullUrl | resource.resourceType | resource.id | resource.meta.versionId | resource.meta.lastUpdated | resource.meta.source | resource.text.status | resource.status | resource.intent | resource.medicationCodeableConcept.text | ... | resource.identifier | resource.active | resource.name | resource.telecom | resource.gender | resource.birthDate | resource.address | resource.generalPractitioner | resource.medicationCodeableConcept.codingcodes | patientId |
|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|
| 0 | https://api.logicahealth.org/researchonfhir/op... | MedicationRequest | smart-MedicationRequest-101 | 1 | 2020-07-15T02:51:25.000+00:00 | #KQSArAdbxORTtqVw | generated | active | order | Nizatidine 15 MG/ML Oral Solution [Axid] | ... | | | | | | | | | [582620] | smart-1081332 |
| 1 | https://api.logicahealth.org/researchonfhir/op... | MedicationRequest | smart-MedicationRequest-102 | 1 | 2020-07-15T02:51:26.000+00:00 | #WnCTEkK79sEBIQNe | generated | active | order | Amoxicillin 80 MG/ML Oral Suspension | ... | | | | | | | | | [308189] | smart-1081332 |
| 2 | https://api.logicahealth.org/researchonfhir/op... | MedicationRequest | smart-MedicationRequest-103 | 1 | 2020-07-15T02:51:26.000+00:00 | #WnCTEkK79sEBIQNe | generated | active | order | Amoxicillin 120 MG/ML / clavulanate potassium ... | ... | | | | | | | | | [617993] | smart-1081332 |
| 3 | https://api.logicahealth.org/researchonfhir/op... | MedicationRequest | smart-MedicationRequest-104 | 1 | 2020-07-15T02:51:26.000+00:00 | #WnCTEkK79sEBIQNe | generated | active | order | Azithromycin 20 MG/ML Oral Suspension [Zithromax] | ... | | | | | | | | | [211307] | smart-1081332 |
| 4 | https://api.logicahealth.org/researchonfhir/op... | MedicationRequest | smart-MedicationRequest-105 | 1 | 2020-07-15T02:51:26.000+00:00 | #WnCTEkK79sEBIQNe | generated | active | order | cefdinir 25 MG/ML Oral Suspension [Omnicef] | ... | | | | | | | | | [261091] | smart-1081332 |


In [10]:
df['patientId'].unique()

array(['smart-1081332'], dtype=object)

So we now have a basic dataframe with drug and patient information. Before we can begin constructing a parser, we need to examine the API to see how data is submitted and returned.

## Step 2: Determine how to query the API and then construct a mechanism to feed our patient info into that API to determine if a known DOD interaction could occur

Reviewing the documentation for the NLM's RxNav APIs (https://lhncbc.nlm.nih.gov/RxNav/APIs/index.html), we see that drugs are identified by their RxCUI (RxNorm Concept Unique Identifier). If needed, the RxNorm API can also look up the NDC (National Drug Code) codes associated with an RxCUI:
https://lhncbc.nlm.nih.gov/RxNav/APIs/api-RxNorm.getNDCs.html

This correlates with our patient data column `resource.medicationCodeableConcept.codingcodes` (quite a mouthful, but we'll deal with that shortly), which holds the RxNorm code of each MedicationRequest (e.g. `rxnorm: 582620` in the resource text above).
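The `codingcodes` column produced by `process_list` keeps every `code` found in `coding`, regardless of terminology. If a MedicationRequest ever carries codes from several systems, filtering on the coding `system` is safer. A sketch, assuming the RxNorm system URI `http://www.nlm.nih.gov/research/umls/rxnorm` used by FHIR; `rxnorm_codes` is a hypothetical helper, and the second coding in the example uses a made-up system purely to show the filtering:

```python
RXNORM_SYSTEM = "http://www.nlm.nih.gov/research/umls/rxnorm"


def rxnorm_codes(medication_request: dict) -> list:
    """Return the RxCUIs in a MedicationRequest's medicationCodeableConcept."""
    concept = medication_request.get("medicationCodeableConcept", {})
    return [
        coding["code"]
        for coding in concept.get("coding", [])
        if coding.get("system") == RXNORM_SYSTEM and "code" in coding
    ]


example = {
    "resourceType": "MedicationRequest",
    "medicationCodeableConcept": {
        "coding": [
            {"system": RXNORM_SYSTEM, "code": "582620"},
            {"system": "http://example.org/other-codes", "code": "XYZ"},  # made-up system
        ]
    },
}
print(rxnorm_codes(example))
```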

Let's pull a sample interaction using the following general notation:

`https://rxnav.nlm.nih.gov/REST/interaction/list.json?rxcuis=[code 1]+[code 2]`

In [11]:
url = 'https://rxnav.nlm.nih.gov/REST/interaction/list.json?rxcuis=207106+656659'
response = requests.get(url, timeout=30)
response.raise_for_status()  # fail loudly on HTTP errors
response_json = response.json()
response_json

{'nlmDisclaimer': 'It is not the intention of NLM to provide specific medical advice, but rather to provide users with information to better understand their health and their medications. NLM urges you to consult with a qualified physician for advice about medications.',
 'userInput': {'sources': [''], 'rxcuis': ['207106', '656659']},
 'fullInteractionTypeGroup': [{'sourceDisclaimer': 'DrugBank is intended for educational and scientific research purposes only and you expressly acknowledge and agree that use of DrugBank is at your sole risk. The accuracy of DrugBank information is not guaranteed and reliance on DrugBank shall be at your sole risk. DrugBank is not intended as a substitute for professional medical advice, diagnosis or treatment..[www.drugbank.ca]',
   'sourceName': 'DrugBank',
   'fullInteractionType': [{'comment': 'Drug1 (rxcui = 207106, name = fluconazole 50 MG Oral Tablet [Diflucan], tty = SBD). Drug2 (rxcui = 656659, name = bosentan 125 MG Oral Tablet, tty = SCD). Dru

We now have a target to work toward! For each patient, we need to compile a list of RxCUI codes and then append them to our NIH API query, with a '+' between each code. Let's go about constructing that!
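A small helper for that step, as a sketch: it removes duplicate codes (a patient can hold two prescriptions for the same product) while keeping their order, then joins them with `+`. `build_interaction_url` is a hypothetical name; the base URL is the one used in the sample cell above.

```python
INTERACTION_URL = "https://rxnav.nlm.nih.gov/REST/interaction/list.json"


def build_interaction_url(codes) -> str:
    """Join a patient's RxCUIs with '+' into an interaction-list query URL."""
    unique_codes = list(dict.fromkeys(str(c) for c in codes))  # dedupe, keep order
    return f"{INTERACTION_URL}?rxcuis={'+'.join(unique_codes)}"


print(build_interaction_url([207106, "656659", 207106]))
```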

## Step 3: Construct a composite list of all drugs per patient (so we can determine a potential DOD interaction)

As a first step, let's extract the two features we need (patient ID and drug code) into a new dataframe, and then rename our drug code feature to something easier to work with!

In [22]:
# .copy() makes dfnew an independent dataframe, avoiding pandas' SettingWithCopyWarning
dfnew = df[['resource.medicationCodeableConcept.codingcodes','patientId']].copy()
dfnew['code'] = dfnew['resource.medicationCodeableConcept.codingcodes']


In [23]:
dfnew = dfnew.drop(['resource.medicationCodeableConcept.codingcodes'], axis = 1)

In [24]:
dfnew.head()

| | patientId | code |
|---|---|---|
| 0 | smart-1081332 | [582620] |
| 1 | smart-1081332 | [308189] |
| 2 | smart-1081332 | [617993] |
| 3 | smart-1081332 | [211307] |
| 4 | smart-1081332 | [261091] |


Now that's a dataframe we can work with! One final issue, though: each drug code is still wrapped in its own single-element list. To make it easier to build a list of codes per patient, we'll unwrap each list into a plain code value. The following code fragment accomplishes this:

In [27]:
# Take the first (and only) code in each list; rows with an empty list (e.g. Patient rows) become NaN
dfnew['cleancode'] = dfnew['code'].str[0]
dfnew = dfnew.drop(['code'], axis = 1)
dfnew.head()

| | patientId | cleancode |
|---|---|---|
| 0 | smart-1081332 | 582620 |
| 1 | smart-1081332 | 308189 |
| 2 | smart-1081332 | 617993 |
| 3 | smart-1081332 | 211307 |
| 4 | smart-1081332 | 261091 |


Now we are ready to rumble! Let's use a groupby with a custom aggregation to collect each patient's drugs into a single list.

In [28]:
groups_by_patient = dfnew.groupby('patientId', sort=False)['cleancode'].apply(lambda x: x.values.tolist())
groups_by_patient.head()

patientId
smart-1081332    [582620, 308189, 617993, 211307, 261091, 40463...
Name: cleancode, dtype: object

OK, so this code is working (yay!). However, something upstream is clearly off, as we only have one patient. This still needs fixing! For this rough draft, though, I'm 'cheating' and using a CSV file of Synthea data (`medications.csv`) to replicate this outcome.
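One likely culprit is `add_patient_id`: as its docstring says, it stamps every row with the id of the *first* Patient resource in the bundle. That suits a bundle holding one patient's record, but this search page appears to mix 50 MedicationRequests with 10 included Patient resources (60 entries, only 50 with a `resource.status`, per `df.info()` above), so every prescription gets attributed to a single patient. A sketch of an alternative that derives the patient from each MedicationRequest's own `subject.reference`; `add_patient_id_from_subject` is a hypothetical helper:

```python
import pandas as pd


def add_patient_id_from_subject(df: pd.DataFrame) -> pd.DataFrame:
    """Set patientId per row from the row's own resource, not the first Patient."""
    out = df.copy()
    # MedicationRequest rows: "Patient/<id>" -> "<id>" (missing references stay NaN)
    from_subject = out["resource.subject.reference"].str.split("/").str[-1]
    # Patient rows: use their own resource.id instead
    is_patient = out["resource.resourceType"] == "Patient"
    out["patientId"] = from_subject.where(~is_patient, out["resource.id"])
    return out
```

Calling `df = add_patient_id_from_subject(df)` right after `df = process('fhir-data')` should then yield one `patientId` per referenced patient, so the groupby above would produce one drug list per patient.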

In [12]:
# Read CSV file
df = pd.read_csv("medications.csv")

# Create new DataFrame with relevant columns
df = df[['START','STOP','PATIENT','CODE']]

# Exclude rows where the medication has been stopped (assumption: no "current" drug interactions can occur for stopped medications)
df = df[df['STOP'].isnull()]

# Create groups of drugs for each patient
groups_by_patient = df.groupby('PATIENT', sort=False)['CODE'].apply(lambda x: x.values.tolist())

# Declare variables for index counting and sum total of drug interactions
index = 1
count_drug_int = 0

In [13]:
df.head()

| | START | STOP | PATIENT | CODE |
|---|---|---|---|---|
| 11 | 2020-03-22T00:26:23Z | | 8d4c4326-e9de-4f45-9a4c-f8c36bff89ae | 1000126 |
| 34 | 2020-02-11T15:35:36Z | | b58731cc-2d8b-4c2d-b327-4cab771af3ef | 748856 |
| 41 | 2019-05-25T06:54:57Z | | bfb6537b-535a-4f31-9a56-073220f96a17 | 748879 |
| 62 | 2019-10-09T23:09:32Z | | 83719bd7-7a41-4c87-93f9-c5de4db6a14a | 746030 |
| 63 | 1982-10-25T18:19:08Z | | 76982e06-f8b8-4509-9ca3-65a99c8650fe | 1049630 |


In [14]:
groups_by_patient = df.groupby('PATIENT', sort=False)['CODE'].apply(lambda x: x.values.tolist())
groups_by_patient.head()

PATIENT
8d4c4326-e9de-4f45-9a4c-f8c36bff89ae            [1000126]
b58731cc-2d8b-4c2d-b327-4cab771af3ef             [748856]
bfb6537b-535a-4f31-9a56-073220f96a17             [748879]
83719bd7-7a41-4c87-93f9-c5de4db6a14a             [746030]
76982e06-f8b8-4509-9ca3-65a99c8650fe    [1049630, 310325]
Name: CODE, dtype: object

## Step 4: Construct a program to loop through our entire cohort and determine the aggregate risk.

OK, so to recap: we now have a list of patients with their associated RxNorm codes in list form, and we have a mechanism to query the RxNav API to determine whether a DOD interaction exists. As a last step, let's write a function to call the API and a loop that iterates through our patient list and, for each patient, reports whether or not a DOD interaction could occur.

In [15]:
# Function for calling the NIH (NLM RxNav) interaction API
def get_api_data(drug_list):
    """Return the parsed JSON response for a '+'-joined string of RxCUIs."""
    url = 'https://rxnav.nlm.nih.gov/REST/interaction/list.json?rxcuis=' + drug_list
    response = requests.get(url, timeout=30)
    response.raise_for_status()  # surface HTTP errors instead of failing on JSON parsing
    return response.json()
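The loop in the next cell only checks whether `fullInteractionTypeGroup` is present. To report *which* interactions were found, the nested response can be walked. This sketch follows the layout seen in the sample response above (`fullInteractionTypeGroup` containing `sourceName` and `fullInteractionType`) and additionally assumes each `fullInteractionType` holds an `interactionPair` list whose items carry a `description`; that last part is an assumption about the response schema, so check it against a real response. `interaction_descriptions` is a hypothetical helper. Note also that NLM has since discontinued the RxNav drug interaction API (in January 2024), so live calls may no longer return results.

```python
def interaction_descriptions(response_json: dict) -> list:
    """Collect (source, description) tuples from an interaction/list.json response."""
    found = []
    for group in response_json.get("fullInteractionTypeGroup", []):
        source = group.get("sourceName", "unknown")
        for interaction_type in group.get("fullInteractionType", []):
            # Assumed schema: each interaction type lists the interacting pairs
            for pair in interaction_type.get("interactionPair", []):
                found.append((source, pair.get("description", "")))
    return found
```

Every lookup uses `.get` with a default, so a response without interactions simply yields an empty list.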

In [16]:
# Iterate through each patient's list of medications
for drug_list in groups_by_patient:
    # Join the RxCUIs with '+' as the API expects, e.g. "310798+897718"
    joined_drug_list = "+".join(str(i) for i in drug_list)
    index += 1
    data = get_api_data(joined_drug_list)  # returns the parsed JSON response
    if 'fullInteractionTypeGroup' not in data:
        print('No drug interaction: ', joined_drug_list)
        continue
    count_drug_int += 1
    print('Drug interaction: ', joined_drug_list)

# count_drug_int counts patients whose medication list has at least one interaction
print('Patients with a potential drug interaction: ', count_drug_int)

No drug interaction:  1000126
No drug interaction:  748856
No drug interaction:  748879
No drug interaction:  746030
No drug interaction:  1049630+310325
No drug interaction:  310798
Drug interaction:  310798+897718+197604+855332
No drug interaction:  999967
No drug interaction:  746030
No drug interaction:  310325+748856+310798
Drug interaction:  997488+1870230+895994+2123111
Drug interaction:  997488+1870230+895994+2123111
Drug interaction:  310798+895994+2123111
No drug interaction:  429503
No drug interaction:  748856
Drug interaction:  665078+1870230
No drug interaction:  314231
No drug interaction:  999967
Drug interaction:  849574+309362+705129+312961+197361
No drug interaction:  314231+429503
Drug interaction:  141918+1870230+746030
Drug interaction:  665078+1870230
No drug interaction:  999967
No drug interaction:  1000126
Drug interaction:  1014676+1870230
No drug interaction:  314231
No drug interaction:  860975
No drug interaction:  746030
No drug interaction:  746030
Drug 

No drug interaction:  705129
Drug interaction:  866414+313988+309362+705129+312961+197361+1719286
No drug interaction:  1049630
No drug interaction:  429503
No drug interaction:  389221
Drug interaction:  314231+897718+197604+855332
No drug interaction:  314231+106892
No drug interaction:  429503
No drug interaction:  1367439
Drug interaction:  997488+1870230
No drug interaction:  999967
No drug interaction:  748856
No drug interaction:  310798
Drug interaction:  997488+1870230
Drug interaction:  198031+1860480+752899+310436+309362+106892+705129+897718+312961+860975+197604+197361+855332
Drug interaction:  1049630+314231
No drug interaction:  314231
Drug interaction:  1000126+310798
Drug interaction:  2001499+896209
Drug interaction:  309362+999967+705129+312961+197361+866414+313988+1719286
Drug interaction:  849574+866414+313988+1719286
No drug interaction:  314231
No drug interaction:  197591
No drug interaction:  999967
Drug interaction:  1860480+752899+314231+106892+860975
No drug i

Drug interaction:  665078+1870230
Drug interaction:  895994+2123111+1014676+1870230
Drug interaction:  665078+1870230+895994+2123111
Drug interaction:  997488+1870230
Drug interaction:  197378+1870230+310798
Drug interaction:  1860480+752899+314231+746030+705129+896209
Drug interaction:  897718+895994+197604+2123111+855332+314231
No drug interaction:  429503
No drug interaction:  999967
Drug interaction:  1049630+866414+313988
Drug interaction:  866414+313988+904419+1719286+746030
No drug interaction:  999967
Drug interaction:  849574+1860480+752899+996740+897718+197604+855332
No drug interaction:  746030
Drug interaction:  997488+1870230+895994+2123111
No drug interaction:  314231
No drug interaction:  2001499
Drug interaction:  1049630+314231+895994+2123111
Drug interaction:  997488+1870230
Drug interaction:  665078+1870230+895994+2123111
No drug interaction:  749762
No drug interaction:  310798
No drug interaction:  746030
Drug interaction:  849574+389221
Drug interaction:  1860480+

Drug interaction:  665078+1870230+748879+895994+2123111
Drug interaction:  861467+314231+106892+746030
No drug interaction:  748856
Drug interaction:  1049630+429503
Drug interaction:  665078+1870230+757594
Drug interaction:  1000126+310798
Drug interaction:  997488+1870230
Drug interaction:  1049221+849574+746030+897718+197604+855332+314231
Drug interaction:  895994+2123111
Drug interaction:  895994+2123111
Drug interaction:  849574+314231
Drug interaction:  314231+746030+705129+897718+197604+855332
No drug interaction:  999967
No drug interaction:  1049630
Drug interaction:  310325+1803932+1736776+866414+313988+1719286
Drug interaction:  860975+896209
Drug interaction:  665078+1870230+895994+2123111
No drug interaction:  310798
Drug interaction:  997488+1870230+749762+310798
No drug interaction:  477045
Drug interaction:  141918+1870230+310798+895994+2123111
Drug interaction:  904419+866414+313988+1719286
Drug interaction:  198031+1860480+752899+312961+197361
No drug interaction:  83
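The loop above iterates over the Series values, so each result is no longer tied to a patient ID, and `count_drug_int` is a count of patients rather than a rate. A sketch that keeps per-patient results and turns them into an aggregate rate; `screen_cohort` and `interaction_rate` are hypothetical helpers, and `lookup` would be `get_api_data`. It skips the API call when a patient has fewer than two distinct codes, consistent with the output above, where every single-code list printed "No drug interaction".

```python
from typing import Callable, Dict, List


def screen_cohort(groups: Dict[str, List], lookup: Callable[[str], dict]) -> Dict[str, bool]:
    """Map each patient ID to True if the API reports any interaction among their drugs."""
    results = {}
    for patient, codes in groups.items():
        unique = list(dict.fromkeys(str(c) for c in codes))  # dedupe, keep order
        if len(unique) < 2:
            results[patient] = False  # no pair of distinct drugs to check
            continue
        results[patient] = "fullInteractionTypeGroup" in lookup("+".join(unique))
    return results


def interaction_rate(results: Dict[str, bool]) -> float:
    """Share of patients flagged with at least one potential interaction."""
    return sum(results.values()) / len(results) if results else 0.0


# Usage (network calls):
# results = screen_cohort(groups_by_patient.to_dict(), get_api_data)
# print(f"{interaction_rate(results):.1%} of patients have a potential DOD interaction")
```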

## Summary

So this is a really cool example of how FHIR data can interact with the broader ecosystem of Healthcare data and resources to determine additional health care insights. 