# FHIR for Research Workshop

## Exercise 1 

Intro: see https://github.com/NIH-NCPI/fhir-101/blob/master/FHIR%20101%20-%20Practical%20Guide.ipynb as a great example

## What is this notebook?

(common overview of the FHIR Training)

(overview of this specific notebook)




### Icons in this Guide
 📘 A link to a useful external reference related to the section the icon appears in  

 ⚡️ A key takeaway for the section that this icon appears in  

 🖐 A hands-on section where you will code something or interact with the server  


(any required MITRE legalese should either go here or at the very bottom of the notebook)

## Motivation / Purpose

## Scenario

(this section describes the specifics of the use case: what is the problem statement, what is the basic approach we are going to take, etc)


## Initial Setup

In [None]:
# import any required libraries here.
#  - requests
#  - fhirclient: https://github.com/smart-on-fhir/client-py
#  - Pandas - DataFrames
#  - NumPy - basic data analysis
#  - matplotlib
#  - maybe seaborn for viz on top of matplotlib ?

## Step 1 Connect to Client

First, let's connect to our source server and pull a sample patient record.

We use the requests library to send an HTTP GET request to the server's URL, then parse the JSON response body into a Python dictionary.

In [1]:
import requests
import json

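# verify=False disables TLS certificate verification; only use it against a trusted sandbox server.
r = requests.get("https://api.logicahealth.org/researchonfhir/open/Patient/BILIBABY", headers={'Accept':'application/fhir+json'}, verify=False)
r.raise_for_status()  # stop early if the server returned an error status
bundle = r.json()  # parsed response body as a Python dict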



In [None]:
#import requests
#import json

#r = requests.get(f"https://api.logicahealth.org/researchonfhir/open/Condition?code:text=headache", headers={'Accept':'application/fhir+json'}, verify=False)
#bundle = r.json()
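
Search requests such as the commented-out Condition query above are easier to vary when the URL is assembled from a parameter dictionary. The following is a minimal sketch using only the standard library; `fhir_search_url` and `FHIR_BASE` are hypothetical names introduced for illustration, and the base URL is the one used in the cells above.

```python
from urllib.parse import urlencode

# Base URL taken from the cells above.
FHIR_BASE = "https://api.logicahealth.org/researchonfhir/open"


def fhir_search_url(resource_type, params, base=FHIR_BASE):
    """Build a FHIR search URL such as <base>/Condition?code:text=headache."""
    # safe=":" keeps search modifiers like "code:text" readable instead of "%3A".
    query = urlencode(params, safe=":")
    if not query:
        return f"{base}/{resource_type}"
    return f"{base}/{resource_type}?{query}"


url = fhir_search_url("Condition", {"code:text": "headache"})
print(url)
```

The resulting string can be passed to `requests.get` in place of a hand-written URL.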

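Let's output the result to confirm we successfully accessed the server and queried data. Note that reading a single resource by its ID returns the resource itself (here a Patient), not a Bundle, even though the variable is named `bundle`.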

In [2]:
bundle

{'resourceType': 'Patient',
 'id': 'BILIBABY',
 'meta': {'versionId': '1',
  'lastUpdated': '2020-07-15T02:51:23.000+00:00',
  'source': '#mNKBng6Y74bFyYWP'},
 'text': {'status': 'generated',
  'div': '<div xmlns="http://www.w3.org/1999/xhtml">Bili Baby</div>'},
 'extension': [{'url': 'http://hl7.org/fhir/StructureDefinition/patient-birthTime',
   'valueDateTime': '2016-01-04T00:00:00-06:00'}],
 'active': True,
 'name': [{'family': 'Bili', 'given': ['Baby']}],
 'gender': 'male',
 'birthDate': '2016-01-04'}
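
Because the parsed resource is a plain nested dictionary, individual fields can be pulled out with ordinary dictionary and list access. The sketch below flattens a few fields of the Patient shown above; `summarize_patient` is a hypothetical helper, and the sample dictionary is a trimmed copy of that output so the code runs without a server call.

```python
BIRTH_TIME_URL = "http://hl7.org/fhir/StructureDefinition/patient-birthTime"


def summarize_patient(patient):
    """Flatten a few fields of a FHIR Patient resource into a plain dict."""
    names = patient.get("name", [])
    first = names[0] if names else {}
    full_name = " ".join(first.get("given", []) + [first.get("family", "")]).strip()
    # Extensions are a list of {url, value[x]} objects; look one up by its URL.
    birth_time = next(
        (ext.get("valueDateTime") for ext in patient.get("extension", [])
         if ext.get("url") == BIRTH_TIME_URL),
        None,
    )
    return {
        "id": patient.get("id"),
        "name": full_name,
        "gender": patient.get("gender"),
        "birthDate": patient.get("birthDate"),
        "birthTime": birth_time,
    }


# Trimmed copy of the Patient resource printed above.
sample_patient = {
    "resourceType": "Patient",
    "id": "BILIBABY",
    "extension": [{"url": BIRTH_TIME_URL,
                   "valueDateTime": "2016-01-04T00:00:00-06:00"}],
    "name": [{"family": "Bili", "given": ["Baby"]}],
    "gender": "male",
    "birthDate": "2016-01-04",
}
summary = summarize_patient(sample_patient)
print(summary)
```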

## Step 2 Query Data

We are now positioned to submit specific queries to our source server and retrieve data. We then save the results locally in JSON files for subsequent parsing.

In [None]:
r = requests.get(f"https://api.logicahealth.org/researchonfhir/open/MedicationRequest?_include=MedicationRequest:patient", headers={'Accept':'application/fhir+json'}, verify=False)
bundle = r.json()

In [None]:
bundle
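
A search like this one returns a Bundle whose `entry` list mixes the matched MedicationRequest resources with the Patient resources pulled in by `_include`, and whose `link` list may point to a further page of results. The sketch below shows one way to separate entries by type and to find the next page; `group_entries` and `next_page_url` are hypothetical helpers, and the sample Bundle is hand-written for illustration rather than taken from the server.

```python
from collections import defaultdict


def group_entries(bundle):
    """Group the resources in a searchset Bundle by resourceType."""
    grouped = defaultdict(list)
    for entry in bundle.get("entry", []):
        resource = entry.get("resource", {})
        grouped[resource.get("resourceType", "Unknown")].append(resource)
    return dict(grouped)


def next_page_url(bundle):
    """Return the URL of the next page of results, or None on the last page."""
    for link in bundle.get("link", []):
        if link.get("relation") == "next":
            return link.get("url")
    return None


# Hypothetical, hand-written Bundle used only to exercise the helpers.
sample_bundle = {
    "resourceType": "Bundle",
    "type": "searchset",
    "link": [
        {"relation": "self", "url": "https://example.org/fhir/MedicationRequest"},
        {"relation": "next", "url": "https://example.org/fhir?page=2"},
    ],
    "entry": [
        {"resource": {"resourceType": "MedicationRequest", "id": "mr1"},
         "search": {"mode": "match"}},
        {"resource": {"resourceType": "Patient", "id": "p1"},
         "search": {"mode": "include"}},
    ],
}

grouped = group_entries(sample_bundle)
print({kind: len(items) for kind, items in grouped.items()})
print(next_page_url(sample_bundle))
```

To collect a full result set, one could keep requesting `next_page_url(bundle)` until it returns `None`.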

We can use the open function to create a new file and write the raw response content to it. The `fhir-data` directory must already exist.

In [None]:
with open('fhir-data/data.json', 'wb') as f:  # the with block closes the file after writing
    f.write(r.content)
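
The write step above fails if the target directory is missing. A small helper can create the directory first and write the parsed JSON back out; this is a sketch under that assumption, and `save_bundle` is a hypothetical name. The usage lines write into a temporary directory so that running the example leaves no files behind.

```python
import json
import tempfile
from pathlib import Path


def save_bundle(bundle, folder, name="data.json"):
    """Write a parsed FHIR Bundle to <folder>/<name>, creating the folder if needed."""
    path = Path(folder)
    path.mkdir(parents=True, exist_ok=True)
    target = path / name
    with target.open("w", encoding="utf-8") as f:
        json.dump(bundle, f)
    return target


# Usage with a throwaway directory and an empty, hand-written Bundle.
with tempfile.TemporaryDirectory() as tmp:
    saved = save_bundle({"resourceType": "Bundle", "entry": []}, Path(tmp) / "fhir-data")
    saved_name = saved.name
    round_trip = json.loads(saved.read_text(encoding="utf-8"))
print(saved_name, round_trip["resourceType"])
```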

## Step 3 Mount Data onto Pandas Dataframe

Now that we have extracted the information we need, we convert the FHIR-formatted data into a pandas DataFrame for subsequent analysis.

The following class parses the saved JSON Bundle into a pandas DataFrame.

In [3]:
import json
import os

import pandas as pd
from pandas import json_normalize  # pandas.io.json.json_normalize was deprecated in pandas 1.0


class Fhiry(object):
    def __init__(self):
        self._df = None
        self._filename = ""
        self._folder = ""

    @property
    def df(self):
        return self._df

    @property
    def filename(self):
        return self._filename

    @property
    def folder(self):
        return self._folder

    @filename.setter
    def filename(self, filename):
        self._filename = filename
        self._df = self.read_bundle_from_file(filename)

    @folder.setter
    def folder(self, folder):
        self._folder = folder

    def read_bundle_from_file(self, filename):
        with open(filename, 'r') as f:
            json_in = f.read()
            json_in = json.loads(json_in)
            return json_normalize(json_in['entry'])

    def delete_unwanted_cols(self):
        del self._df['resource.text.div']

    def process_df(self):
        """Read a single JSON resource or a directory full of JSON resources
        ONLY COMMON FIELDS IN ALL resources will be mapped
        """
        if self._folder:
            df = pd.DataFrame(columns=[])
            for file in os.listdir(self._folder):
                if file.endswith(".json"):
                    self._df = self.read_bundle_from_file(
                        os.path.join(self._folder, file))
                    self.delete_unwanted_cols()
                    self.convert_object_to_list()
                    self.add_patient_id()
                    if df.empty:
                        df = self._df
                    else:
                        df = pd.concat([df, self._df])
            self._df = df
        elif self._filename:
            self._df = self.read_bundle_from_file(self._filename)
            self.delete_unwanted_cols()
            self.convert_object_to_list()
            self.add_patient_id()

    def process_file(self, filename):
        self._df = self.read_bundle_from_file(filename)
        self.delete_unwanted_cols()
        self.convert_object_to_list()
        self.add_patient_id()
        return self._df

    def convert_object_to_list(self):
        """Convert object to a list of codes
        """
        for col in self._df.columns:
            if 'coding' in col:
                codes = self._df.apply(
                    lambda x: self.process_list(x[col]), axis=1)
                # axis must be passed by keyword in recent pandas versions
                self._df = pd.concat(
                    [self._df, codes.to_frame(name=col+'codes')], axis=1)
                del self._df[col]
            # elif: a column already replaced above must not be processed twice
            elif 'display' in col:
                codes = self._df.apply(
                    lambda x: self.process_list(x[col]), axis=1)
                self._df = pd.concat(
                    [self._df, codes.to_frame(name=col+'display')], axis=1)
                del self._df[col]

    def add_patient_id(self):
        """Create a patientId column with the resource.id of the first Patient resource
        """
        self._df['patientId'] = self._df[(
            self._df['resource.resourceType'] == "Patient")].iloc[0]['resource.id']

    def get_info(self):
        if self._df is None:
            return "Dataframe is empty"
        return self._df.info()

    def process_list(self, myList):
        """Extracts the codes from a list of objects
        Args:
            myList (list): A list of objects
        Returns:
            list: A list of codes
        """
        myCodes = []
        if isinstance(myList, list):
            for entry in myList:
                if 'code' in entry:
                    myCodes.append(entry['code'])
                else:
                    myCodes.append(entry['display'])
        return myCodes
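
To see why the class refers to columns such as `resource.resourceType` and looks for `coding` in column names, it helps to run `json_normalize` on a tiny Bundle. The sketch below uses a hand-written, hypothetical two-entry Bundle (the IDs and the `text` value are placeholders); nested dictionaries become dot-separated column names, while lists such as `coding` stay as Python lists inside a single cell.

```python
import pandas as pd

# Hypothetical two-entry Bundle, written by hand for illustration only.
sample_bundle = {
    "resourceType": "Bundle",
    "entry": [
        {"resource": {"resourceType": "Patient", "id": "p1"}},
        {"resource": {
            "resourceType": "MedicationRequest",
            "id": "mr1",
            "subject": {"reference": "Patient/p1"},
            "medicationCodeableConcept": {
                "coding": [{"system": "http://www.nlm.nih.gov/research/umls/rxnorm",
                            "code": "834060"}],
                "text": "placeholder medication text",
            },
        }},
    ],
}

# Each entry becomes one row; nested keys are joined with ".".
flat = pd.json_normalize(sample_bundle["entry"])
print(sorted(flat.columns))

# The coding list is kept intact in one cell of the MedicationRequest row.
coding_cell = flat.loc[1, "resource.medicationCodeableConcept.coding"]
print(coding_cell)
```

Fields that a resource does not have (for example, the medication columns on the Patient row) are filled with `NaN`.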

In [None]:
# parallel file
import multiprocessing as mp



def process_files(file):
    f = Fhiry()
    return f.process_file(file)


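# NOTE: Fhirndjson is not defined anywhere in this notebook; process_ndjson and
# ndjson raise a NameError until that class is defined or imported.
def process_ndjson(file):
    f = Fhirndjson()
    return f.process_file(file)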

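def process(folder):
    """Parse every .json file in `folder` and return one combined DataFrame."""
    files = [os.path.join(folder, row) for row in os.listdir(folder) if row.endswith('.json')]
    try:
        # The context manager cleans up the worker pool even if a worker fails.
        with mp.Pool(mp.cpu_count()) as pool:
            list_of_dataframes = pool.map(process_files, files)
        return pd.concat(list_of_dataframes)
    except Exception:
        # Fall back to sequential processing if the pool cannot be used.
        f = Fhiry()
        f.folder = folder
        f.process_df()
        return f.df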


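def ndjson(folder):
    """Parse every file in `folder` as NDJSON and return one combined DataFrame."""
    files = [os.path.join(folder, row) for row in os.listdir(folder)]
    try:
        # The context manager cleans up the worker pool even if a worker fails.
        with mp.Pool(mp.cpu_count()) as pool:
            list_of_dataframes = pool.map(process_ndjson, files)
        return pd.concat(list_of_dataframes)
    except Exception:
        # Fall back to sequential processing if the pool cannot be used.
        f = Fhirndjson()
        f.folder = folder
        f.process_df()
        return f.df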

We can now create our DataFrame by calling the process function, which parses all the JSON files within a given directory.

In [None]:
df = process('fhir-data')
df.info()

In [None]:
df.columns

In [None]:
df.head(5)

In [None]:
df['patientId'].unique()

## Step 4 Exploratory Data Analysis 

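For demonstration purposes, we conduct a limited exploratory data analysis (EDA): checking each patient's active medications for drug-drug interactions using the NLM RxNav interaction API. The first cell queries the API for a single pair of RxNorm codes (RxCUIs).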

In [4]:
url = 'https://rxnav.nlm.nih.gov/REST/interaction/list.json?rxcuis=207106+656659'
response = (requests.get(url).text)
response_json = json.loads(response)
response_json

{'nlmDisclaimer': 'It is not the intention of NLM to provide specific medical advice, but rather to provide users with information to better understand their health and their medications. NLM urges you to consult with a qualified physician for advice about medications.',
 'userInput': {'sources': [''], 'rxcuis': ['207106', '656659']},
 'fullInteractionTypeGroup': [{'sourceDisclaimer': 'DrugBank is intended for educational and scientific research purposes only and you expressly acknowledge and agree that use of DrugBank is at your sole risk. The accuracy of DrugBank information is not guaranteed and reliance on DrugBank shall be at your sole risk. DrugBank is not intended as a substitute for professional medical advice, diagnosis or treatment..[www.drugbank.ca]',
   'sourceName': 'DrugBank',
   'fullInteractionType': [{'comment': 'Drug1 (rxcui = 207106, name = fluconazole 50 MG Oral Tablet [Diflucan], tty = SBD). Drug2 (rxcui = 656659, name = bosentan 125 MG Oral Tablet, tty = SCD). Dru
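
The response is deeply nested, so a small generator helps turn it into one record per interacting drug pair. The sketch below walks the `fullInteractionTypeGroup` and `fullInteractionType` levels visible in the output above; the deeper field names (`interactionPair`, `interactionConcept`, `minConceptItem`, `description`) are assumptions about the part of the response that is cut off here and should be checked against a real response. `interaction_pairs` is a hypothetical helper, and the sample dictionary is hand-written with placeholder text.

```python
def interaction_pairs(response_json):
    """Yield (source, drug_a, drug_b, description) for each reported interaction."""
    for group in response_json.get("fullInteractionTypeGroup", []):
        source = group.get("sourceName")
        for interaction_type in group.get("fullInteractionType", []):
            # Assumed field names below this level; verify against a live response.
            for pair in interaction_type.get("interactionPair", []):
                names = [concept.get("minConceptItem", {}).get("name")
                         for concept in pair.get("interactionConcept", [])]
                if len(names) == 2:
                    yield source, names[0], names[1], pair.get("description")


# Hand-written stand-in for a response; the description is placeholder text.
sample_response = {
    "fullInteractionTypeGroup": [{
        "sourceName": "DrugBank",
        "fullInteractionType": [{
            "interactionPair": [{
                "interactionConcept": [
                    {"minConceptItem": {"rxcui": "207106", "name": "drug A"}},
                    {"minConceptItem": {"rxcui": "656659", "name": "drug B"}},
                ],
                "description": "placeholder description",
            }],
        }],
    }],
}

pairs = list(interaction_pairs(sample_response))
print(pairs)
```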

In [5]:
# Read CSV file
df = pd.read_csv("medications.csv")

# Create new DataFrame with relevant columns
df = df[['START','STOP','PATIENT','CODE']]

# Exclude rows for medications that have been stopped (assumption: a stopped medication cannot cause a "current" drug interaction)
df = df[df['STOP'].isnull()]

# Create groups of drugs for each patient
groups_by_patient = df.groupby('PATIENT', sort=False)['CODE'].apply(lambda x: x.values.tolist())

# Declare variables for index counting and sum total of drug interactions
index = 1
count_drug_int = 0

In [6]:
df.head()

Unnamed: 0,START,STOP,PATIENT,CODE
0,1988-09-05,,71949668-1c2e-43ae-ab0a-64654608defb,834060
11,1943-04-13,,c2caaace-9119-4b2d-a2c3-4040f5a9cf32,834060
12,1948-11-05,,c2caaace-9119-4b2d-a2c3-4040f5a9cf32,834060
13,1951-05-26,,c2caaace-9119-4b2d-a2c3-4040f5a9cf32,1049221
14,1943-12-11,,96b24072-e1fe-49cd-a22a-6dfb92c3994c,834060


In [7]:
groups_by_patient = df.groupby('PATIENT', sort=False)['CODE'].apply(lambda x: x.values.tolist())
groups_by_patient.head()

PATIENT
71949668-1c2e-43ae-ab0a-64654608defb                            [834060]
c2caaace-9119-4b2d-a2c3-4040f5a9cf32           [834060, 834060, 1049221]
96b24072-e1fe-49cd-a22a-6dfb92c3994c    [834060, 834101, 860975, 106892]
de43eb48-496c-46d4-8c5b-be6125a38c15            [834060, 896188, 860975]
31410948-38be-4990-be5e-a47ab44f33a1                    [849727, 904420]
Name: CODE, dtype: object
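
Several of these lists repeat a code (for example, 834060 appears twice for the second patient), which adds nothing to an interaction lookup. A minimal sketch for removing repeats while keeping the original order, and building the `+`-joined string the interaction URL expects, could look like this; `build_rxcui_query` is a hypothetical helper name.

```python
def build_rxcui_query(codes):
    """Join RxCUI codes with '+', dropping repeats but keeping first-seen order."""
    # dict.fromkeys keeps insertion order, so it de-duplicates without sorting.
    unique_codes = list(dict.fromkeys(codes))
    return "+".join(str(code) for code in unique_codes)


print(build_rxcui_query([834060, 834060, 1049221]))  # prints 834060+1049221
```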

In [8]:
# Function for calling the NLM RxNav interaction API
def get_api_data(drug_list):
    """Return the parsed JSON response for a '+'-joined string of RxCUIs."""
    url = 'https://rxnav.nlm.nih.gov/REST/interaction/list.json?rxcuis=' + drug_list
    response = requests.get(url)
    response.raise_for_status()  # surface HTTP errors instead of failing later in JSON parsing
    return response.json()

In [None]:
# Iterate through each patient's list of medications
for drug_list in groups_by_patient:
    joined_drug_list = "+".join(str(i) for i in drug_list)
    data = get_api_data(joined_drug_list) # returns JSON response
    # The API only includes 'fullInteractionTypeGroup' when it finds an interaction
    if 'fullInteractionTypeGroup' not in data:
        print('No drug interaction: ', joined_drug_list)
        continue
    count_drug_int += 1
    print('Drug interaction: ', joined_drug_list)

# count_drug_int counts patients with at least one interaction, not individual interactions
print('Total number of patients with a drug interaction: ', count_drug_int)

No drug interaction:  834060
No drug interaction:  834060+834060+1049221
Drug interaction:  834060+834101+860975+106892
Drug interaction:  834060+896188+860975
Drug interaction:  849727+904420
Drug interaction:  895994+745679+834101+896188+834101
No drug interaction:  834060+1803932+575971+1803932+575971+1803932+575971+1803932+575971+1803932+575971+1803932+575971+904420
Drug interaction:  834101+1359133
No drug interaction:  834060
No drug interaction:  834060
Drug interaction:  849727+824184
Drug interaction:  834060+1803932+575971+1803932+575971+1803932+575971+1803932+575971+1803932+575971+1803932+575971+896188
No drug interaction:  834060+834101
Drug interaction:  861467+834101+896188
No drug interaction:  831533
No drug interaction:  834101
Drug interaction:  849727+106258+1049630+727316+896188
Drug interaction:  860975+897122+309362+312961+197361+564666
Drug interaction:  834060+834101+309362+312961+197361+564666
No drug interaction:  834060+834101+834101+904420
No drug interactio

No drug interaction:  834060
Drug interaction:  1049630+727316
Drug interaction:  239981+895994+745679+849727
Drug interaction:  239981+727316+834060+834060+1536586
Drug interaction:  239981+727316+896188
No drug interaction:  834060+834101
No drug interaction:  834101+834101
No drug interaction:  834060
No drug interaction:  834101
Drug interaction:  665078+727316+834060+895994+745679
No drug interaction:  834060
No drug interaction:  834060
No drug interaction:  834101
No drug interaction:  834060
Drug interaction:  834101+309362+312961+197361+564666
Drug interaction:  834060+834060+834101+309362+312961+197361+564666
No drug interaction:  239981+757594
Drug interaction:  834101+860975+312961+197361+106892+564666+309362
No drug interaction:  834060+834060
No drug interaction:  834060
Drug interaction:  834101+106258+311372+727316
No drug interaction:  834060
No drug interaction:  834060+834060+834101
Drug interaction:  834060+106258+849727+834101
No drug interaction:  834060+834101+86

No drug interaction:  834060
Drug interaction:  896188+861467
No drug interaction:  860975
Drug interaction:  834060+309362+312961+197361+564666
Drug interaction:  834060+849727+860975+831533+309362+312961+197361+564666
Drug interaction:  997488+727316
No drug interaction:  834101+834101
No drug interaction:  757594
Drug interaction:  834060+834060+904420+309362+312961+197361+564666
No drug interaction:  849437
No drug interaction:  834060+860975
No drug interaction:  834060+564666
No drug interaction:  834060+1803932+575971+849727+1803932+575971+1803932+575971+1803932+575971+1803932+575971+1803932+575971
No drug interaction:  834060
No drug interaction:  834060+1049221+834101
Drug interaction:  895994+745679
No drug interaction:  834060+834101
Drug interaction:  997488+727316
No drug interaction:  834060+860975
No drug interaction:  834060
No drug interaction:  834060
Drug interaction:  312961+197361+564666+309362
No drug interaction:  834101
No drug interaction:  834060
Drug interact
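
The output above shows the same medication lists recurring across many patients, so the loop repeats identical API calls. One possible refinement, sketched below, caches results per distinct drug list and skips lists with fewer than two distinct drugs, on the assumption that a single drug has nothing to interact with. `count_patients_with_interactions` is a hypothetical helper, and `fake_fetch` is a stand-in with made-up responses so the example runs offline; in the notebook, `get_api_data` would be passed instead.

```python
def count_patients_with_interactions(drug_lists, fetch):
    """Count patients whose drug list has an interaction, calling fetch once per distinct list."""
    cache = {}
    flagged = 0
    for drug_list in drug_lists:
        unique = list(dict.fromkeys(drug_list))  # drop repeats, keep order
        if len(unique) < 2:
            continue  # assumption: a single drug cannot interact with itself
        key = "+".join(str(code) for code in unique)
        if key not in cache:
            cache[key] = "fullInteractionTypeGroup" in fetch(key)
        if cache[key]:
            flagged += 1
    return flagged, len(cache)


calls = []


def fake_fetch(key):
    """Offline stand-in for get_api_data; responses are made up for the demo."""
    calls.append(key)
    return {"fullInteractionTypeGroup": []} if key == "849727+904420" else {}


demo_lists = [[834060], [849727, 904420], [849727, 904420], [834060, 834060]]
flagged, distinct_lookups = count_patients_with_interactions(demo_lists, fake_fetch)
print(flagged, distinct_lookups, calls)
```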

## Summary

(A review of what was done in this notebook, possibly reinforcing how this kind of use case could be useful in the real world)