In [13]:
%load_ext autoreload
%autoreload 2

The autoreload extension is already loaded. To reload it, use:
  %reload_ext autoreload


In [14]:
from fhir_client import FHIRClient
import logging
import pandas as pd

logger = logging.getLogger(__name__)
logging.basicConfig(level=logging.INFO)

client = FHIRClient(service_base_url='https://r3.smarthealthit.org', logger=logger)

INFO:__main__:Capability statement of https://r3.smarthealthit.org was successfully received.


## Querying Patients
There are two general ways of searching for patients with specific properties.

The first is to search by coding system and code. Listing the distinct procedure codings first shows which codes are available:

In [3]:
procedures = client.get_all_procedures()
pd.DataFrame([prod.code['coding'][0] for prod in procedures]).drop_duplicates().sort_values(by=['display']).head()

INFO:root:Received 18117 procedures in 61.48 seconds.


| | code | display | system |
|---|---|---|---|
| 893 | 183450002 | Admission to burn unit | http://snomed.info/sct |
| 1911 | 305340004 | Admission to long stay hospital | http://snomed.info/sct |
| 83 | 305428000 | Admission to orthopedic department | http://snomed.info/sct |
| 6217 | 305433001 | Admission to trauma surgery department | http://snomed.info/sct |
| 13687 | 35637008 | Alcohol rehabilitation | http://snomed.info/sct |


In [12]:
patients_by_procedure_code = client.get_patients_by_procedure_code("http://snomed.info/sct","73761001")
"Retrieved {} patients with a total of {} observations".format( len(patients_by_procedure_code), 
                                                               sum([len(pat.observations) for pat in patients_by_procedure_code]))

INFO:root:Received 47 observations in 0.32 seconds.
INFO:root:Received 165 observations in 0.90 seconds.
INFO:root:Received 162 observations in 0.81 seconds.
INFO:root:Received 54 observations in 0.48 seconds.
INFO:root:Received 178 observations in 0.87 seconds.
INFO:root:Received 63 observations in 0.40 seconds.
INFO:root:Received 45 observations in 0.26 seconds.
INFO:root:Received 48 observations in 0.28 seconds.
INFO:root:Received 61 observations in 0.43 seconds.
INFO:root:Received 98 observations in 0.45 seconds.
INFO:root:Received 155 observations in 0.79 seconds.
INFO:root:Received 49 observations in 0.25 seconds.
INFO:root:Received 89 observations in 0.59 seconds.
INFO:root:Received 41 observations in 0.25 seconds.
INFO:root:Received 139 observations in 0.71 seconds.
INFO:root:Received 33 observations in 0.22 seconds.
INFO:root:Received 163 observations in 0.83 seconds.
INFO:root:Received 163 observations in 0.81 seconds.
INFO:root:Received 37 observations in 0.22 seconds.
INFO:

INFO:root:Received 201 observations in 0.85 seconds.
INFO:root:Received 55 observations in 0.42 seconds.
INFO:root:Received 113 observations in 0.51 seconds.
INFO:root:Received 162 observations in 0.67 seconds.
INFO:root:Received 24 observations in 0.18 seconds.
INFO:root:Received 149 observations in 0.51 seconds.
INFO:root:Received 52 observations in 0.31 seconds.
INFO:root:Received 52 observations in 0.31 seconds.
INFO:root:Received 195 observations in 0.69 seconds.
INFO:root:Received 41 observations in 0.19 seconds.
INFO:root:Received 147 observations in 0.52 seconds.
INFO:root:Received 52 observations in 0.32 seconds.
INFO:root:Received 33 observations in 0.18 seconds.
INFO:root:Received 62 observations in 0.34 seconds.
INFO:root:Received 155 observations in 0.71 seconds.
INFO:root:Received 64 observations in 0.35 seconds.
INFO:root:Received 86 observations in 0.35 seconds.
INFO:root:Received 36 observations in 0.18 seconds.
INFO:root:Received 34 observations in 0.19 seconds.
INFO:

'Retrieved 252 patients with a total of 23145 observations'
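
For orientation, a search by coding system corresponds to a FHIR token search, in which the system and the code are joined with `|`. The sketch below builds such a request URL using only the standard library. `build_procedure_search_url` is a hypothetical helper and not part of `fhir_client`, which may construct its requests differently.

```python
from urllib.parse import urlencode


def build_procedure_search_url(base_url, system, code):
    """Build a FHIR Procedure search URL for a token of the form system|code."""
    # FHIR token parameters combine the coding system and the code with '|'
    query = urlencode({"code": "{}|{}".format(system, code)})
    return "{}/Procedure?{}".format(base_url.rstrip("/"), query)


url = build_procedure_search_url(
    "https://r3.smarthealthit.org", "http://snomed.info/sct", "73761001"
)
print(url)
```

The reserved characters in the system URL and the `|` separator are percent-encoded by `urlencode`, so the token survives as a single query value.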

The second way is to search by text. The search text is matched against `CodeableConcept.text`, `Coding.display`, or `Identifier.type.text`:

In [11]:
conditions = client.get_all_conditions()
pd.DataFrame([cond.code['coding'][0] for cond in conditions]).drop_duplicates(subset=['display']).sort_values(by='display', ascending=True).head()

| | code | display | system |
|---|---|---|---|
| 488 | 30473006 | Abdominal pain | http://snomed.info/sct |
| 140 | 102594003 | Abnormal ECG | http://snomed.info/sct |
| 6801 | 26079004 | Abnormal involuntary movement | http://snomed.info/sct |
| 6276 | 168750009 | Abnormal mammogram, unspecified | http://snomed.info/sct |
| 6569 | 312399001 | Abnormal results of thyroid function studies | http://snomed.info/sct |


In [5]:
pd.DataFrame([cond.code['coding'][0] for cond in conditions])

| | code | display | system |
|---|---|---|---|
| 0 | 10509002 | Acute bronchitis (disorder) | http://snomed.info/sct |
| 1 | 443165006 | Pathological fracture due to osteoporosis (dis... | http://snomed.info/sct |
| 2 | 36971009 | Sinusitis (disorder) | http://snomed.info/sct |
| 3 | 195662009 | Acute viral pharyngitis (disorder) | http://snomed.info/sct |
| 4 | 193590000 | Mature cataract | http://snomed.info/sct |
| 5 | 197927001 | Recurrent urinary tract infection | http://snomed.info/sct |
| 6 | 201834006 | Localized, primary osteoarthritis of the hand | http://snomed.info/sct |
| 7 | 75498004 | Acute bacterial sinusitis (disorder) | http://snomed.info/sct |
| 8 | 15777000 | Prediabetes | http://snomed.info/sct |
| 9 | 39848009 | Whiplash injury to neck | http://snomed.info/sct |


In [15]:
patients_by_condition_text = client.get_patients_by_condition_text("Abdominal pain")
"Retrieved {} patients with a total of {} observations".format( len(patients_by_condition_text), 
                                                               sum([len(pat.observations) for pat in patients_by_condition_text]))


INFO:root:Received 154 observations in 0.64 seconds.
INFO:root:Received 59 observations in 0.32 seconds.
INFO:root:Received 162 observations in 0.67 seconds.
INFO:root:Received 206 observations in 0.82 seconds.
INFO:root:Received 4 patients in 2.82 seconds.


'Retrieved 4 patients with a total of 581 observations'
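
To make the text search more concrete, the sketch below checks a condition's `code` element locally against a search string. It looks only at `CodeableConcept.text` and `Coding.display`, and it assumes case-insensitive prefix matching; an actual FHIR server may match differently. `matches_text` is a hypothetical helper, not part of `fhir_client`.

```python
def matches_text(codeable_concept, search_text):
    """Return True if search_text matches CodeableConcept.text or any Coding.display."""
    needle = search_text.lower()
    candidates = [codeable_concept.get("text")]
    candidates += [coding.get("display") for coding in codeable_concept.get("coding", [])]
    # Assumption: case-insensitive prefix match; real servers may behave differently
    return any(c is not None and c.lower().startswith(needle) for c in candidates)


condition_code = {
    "coding": [
        {"system": "http://snomed.info/sct", "code": "30473006", "display": "Abdominal pain"}
    ]
}
print(matches_text(condition_code, "abdominal"))
print(matches_text(condition_code, "bronchitis"))
```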

## Machine Learning

In [16]:
from ml_on_fhir import MLOnFHIRClassifier
from fhir_objects.patient import Patient
from sklearn.tree import DecisionTreeClassifier

ml_fhir = MLOnFHIRClassifier(Patient, feature_attrs=['birthDate'], label_attrs=['gender'])
X, y, trained_clf = ml_fhir.fit(patients_by_condition_text, DecisionTreeClassifier())

from sklearn.metrics import accuracy_score

# Accuracy is measured on the training data itself, so it is an optimistic estimate
print("Prediction accuracy {}".format(accuracy_score(y, trained_clf.predict(X))))

INFO:root:Extracting attributes from data set
INFO:root:Preprocessing data
INFO:root:Started training of clf
INFO:root:Training completed
INFO:root:Accuracy : 1.0, F1-score : 1.0


Prediction accuracy 1.0


## Custom Preprocessing Classes

#### The first five values of the `birthDate` feature, which has been preprocessed into an age in years:

In [6]:
X[:5]

array([[57],
       [56],
       [72],
       [61],
       [77]])

If you want to preprocess FHIR resources differently, you can implement your own preprocessing class and register it with the `register_preprocessor` function. The class name must follow this naming scheme:

    "<FHIR_Object_Name><FHIR_Resource_Name>Processor"

For example, the preprocessor for the `birthDate` attribute of `Patient` is named `PatientBirthdateProcessor`. Furthermore, the class must implement at least the methods `fit` and `transform`.

In [3]:
from preprocessing import register_preprocessor

from sklearn.base import BaseEstimator
import datetime as dt
import numpy as np
from fhir_objects.fhir_resources import date_format

class PatientBirthdateProcessor(BaseEstimator):
    """
    Calculates the age in days to use birthdate as a feature 
    """
    def transform(self, X, **transform_params):
        ages = []
        for birthdate in X:
            b_date = dt.datetime.strptime(birthdate[0], date_format)
            ages.append([int(
                            (dt.datetime.now().date() - b_date.date()).days)])
        return np.array(ages)

    def fit(self, X, y=None, **fit_params):
        return self
    
register_preprocessor(PatientBirthdateProcessor)
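
How `register_preprocessor` stores and finds the class is not shown in this notebook. One plausible reading of the naming scheme is a lookup keyed by class name. The sketch below is purely illustrative: `_registry`, `register`, and `lookup` are hypothetical names rather than the library's API, and the stub class only mirrors the name used above.

```python
_registry = {}  # hypothetical: maps class name -> preprocessor class


def register(cls):
    """Store a preprocessor class under its own name after basic checks."""
    if not cls.__name__.endswith("Processor"):
        raise ValueError("Class name must end with 'Processor'")
    for method in ("fit", "transform"):
        if not callable(getattr(cls, method, None)):
            raise TypeError("{} must implement {}()".format(cls.__name__, method))
    _registry[cls.__name__] = cls
    return cls


def lookup(object_name, attribute_name):
    """Find the preprocessor for e.g. ('Patient', 'birthDate'), or None."""
    # Assumption: the attribute name is capitalized, so 'birthDate' becomes 'Birthdate'
    key = "{}{}Processor".format(object_name, attribute_name.capitalize())
    return _registry.get(key)


class PatientBirthdateProcessor:
    """Stub that only demonstrates the naming scheme."""

    def fit(self, X, y=None):
        return self

    def transform(self, X):
        return X


register(PatientBirthdateProcessor)
print(lookup("Patient", "birthDate").__name__)
```

Under this assumption, a class whose name deviates from the scheme would never be found for its attribute, which would explain why the naming is described as crucial.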

In [4]:
ml_fhir = MLOnFHIRClassifier(Patient, feature_attrs=['birthDate'], label_attrs=['gender'])
X, y, trained_clf = ml_fhir.fit(patients_by_procedure_code, DecisionTreeClassifier())

#### The `birthDate` feature is now the age in days:

In [9]:
X[:5]

array([[21140],
       [20769],
       [26591],
       [22512],
       [28271]])
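
As a quick sanity check, converting these day counts back to whole years reproduces the five values from the age-in-years output earlier in this section. The divisor of 365.25 days per year is an assumption; the default preprocessor may compute the age differently.

```python
ages_in_days = [21140, 20769, 26591, 22512, 28271]

# Assumption: a mean year length of 365.25 days, truncated to whole years
ages_in_years = [int(days / 365.25) for days in ages_in_days]
print(ages_in_years)  # [57, 56, 72, 61, 77]
```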