# Notebook for exploratory data analysis of FHIR data

The .json data follows the [FHIR](https://www.hl7.org/fhir/overview.html) standard.
- The basic building block in FHIR is a Resource.
- Each resource consists of data elements that describe a healthcare concept.
- See the [FHIR Resource List](https://www.hl7.org/fhir/resourcelist.html) for the full list of resource types.
- For our use case, we primarily use the [Patient](https://www.hl7.org/fhir/patient.html) resource.
- The outcome of this EDA is a set of actionable testing/cleaning steps, which will be implemented in tools/data_tests.

In [40]:
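import importlib
import json
import sys

import pandas as pd

# Add the parent directory to the import path so the local `tools` package can be imported.
sys.path.insert(1, '..')

# Reload any already-imported `tools` modules so code edits take effect without restarting the kernel.
for name, module in list(sys.modules.items()):
    if name.startswith('tools'):
        importlib.reload(module)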

## Bundle EDA

In [37]:
# Read in an example FHIR patient data file.
with open('../data/Aaron697_Jerde200_6fa23508-960e-ff22-c3d0-0519a036543b.json') as f:
    bundle_json = json.load(f)

In [39]:
# Data records are stored as FHIR bundles. Each bundle contains a list of entries.
from fhir.resources.bundle import Bundle

bundle = Bundle.parse_obj(bundle_json)
# View the distinct resource types in the bundle. Patient should be the only one relevant to this project.
print({e.resource.resource_type for e in bundle.entry})

# TODO: check that every .json file in the data folder contains exactly one patient per bundle,
#  and that the patient is the first entry (the entry field is a list). This can be implemented as a test.

{'Condition', 'DiagnosticReport', 'AllergyIntolerance', 'Immunization', 'Procedure', 'Patient', 'Provenance', 'Claim', 'Observation', 'CarePlan', 'ExplanationOfBenefit', 'CareTeam', 'MedicationRequest', 'Encounter', 'DocumentReference'}
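
The TODO above could be sketched as a small check on the raw bundle dictionary. This is only a minimal sketch, not the implementation in tools/data_tests: the function name `check_single_leading_patient` and the hand-written `sample_bundle` are hypothetical, and the code assumes each entry carries a `resource` dict with a `resourceType` key, as in FHIR JSON.

```python
def check_single_leading_patient(bundle_dict):
    """Return a list of problems found; an empty list means the bundle passes."""
    problems = []
    entries = bundle_dict.get("entry") or []
    # Collect the resource type of every entry, tolerating missing keys.
    resource_types = [
        (entry.get("resource") or {}).get("resourceType") for entry in entries
    ]
    patient_count = resource_types.count("Patient")
    if patient_count != 1:
        problems.append(f"expected exactly 1 Patient, found {patient_count}")
    if not resource_types or resource_types[0] != "Patient":
        problems.append("first entry is not a Patient")
    return problems


# Hypothetical, hand-written bundle used only to illustrate the check.
sample_bundle = {
    "resourceType": "Bundle",
    "entry": [
        {"resource": {"resourceType": "Patient", "id": "example"}},
        {"resource": {"resourceType": "Encounter", "id": "enc-1"}},
    ],
}
print(check_single_leading_patient(sample_bundle))
```

To cover the whole data folder, the same function could be applied to each file after `json.load`, for example by iterating over `pathlib.Path('../data').glob('*.json')`.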


## Patient EDA

In [35]:
# Extract the patient data. This assumes the patient is the first entry in the bundle.
from fhir.resources.patient import Patient

patient = Patient.parse_obj(bundle.entry[0].resource)

# Print all fields in the patient data object.
patient_field_list = [field for field, _ in patient]
print(patient_field_list)

# TODO: we need to check that all fields in the patient data object are valid/expected based on the FHIR model. This can be implemented as a test.
# TODO: we also need to check that all field values in the patient data object are valid/expected based on the FHIR model. This can also be implemented as a test.

['resource_type', 'fhir_comments', 'id', 'implicitRules', 'implicitRules__ext', 'language', 'language__ext', 'meta', 'contained', 'extension', 'modifierExtension', 'text', 'active', 'active__ext', 'address', 'birthDate', 'birthDate__ext', 'communication', 'contact', 'deceasedBoolean', 'deceasedBoolean__ext', 'deceasedDateTime', 'deceasedDateTime__ext', 'gender', 'gender__ext', 'generalPractitioner', 'identifier', 'link', 'managingOrganization', 'maritalStatus', 'multipleBirthBoolean', 'multipleBirthBoolean__ext', 'multipleBirthInteger', 'multipleBirthInteger__ext', 'name', 'photo', 'telecom']
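
The two TODOs above could start from something like the following check on a raw patient dictionary. It is a sketch under stated assumptions rather than the project's actual test: the names `ALLOWED_PATIENT_KEYS`, `check_patient_fields`, and `sample_patient` are hypothetical; the allowed keys are copied from the field list printed above (using the JSON spelling `resourceType` and leaving out the `__ext` companions); the gender values are the four FHIR administrative gender codes; and the date check only validates the format (year, year-month, or full date), not whether the calendar date exists.

```python
import re

# Assumption: element names copied from the field list printed above,
# using the JSON spelling "resourceType" and omitting the "__ext" companions.
ALLOWED_PATIENT_KEYS = {
    "resourceType", "id", "implicitRules", "language", "meta", "contained",
    "extension", "modifierExtension", "text", "active", "address", "birthDate",
    "communication", "contact", "deceasedBoolean", "deceasedDateTime", "gender",
    "generalPractitioner", "identifier", "link", "managingOrganization",
    "maritalStatus", "multipleBirthBoolean", "multipleBirthInteger", "name",
    "photo", "telecom",
}
ALLOWED_GENDERS = {"male", "female", "other", "unknown"}
# FHIR dates may be a year, a year-month, or a full date (format check only).
FHIR_DATE = re.compile(r"^\d{4}(-(0[1-9]|1[0-2])(-(0[1-9]|[12]\d|3[01]))?)?$")


def check_patient_fields(patient_dict):
    """Return a list of problems found; an empty list means the patient passes."""
    problems = []
    for key in patient_dict:
        # Keys starting with "_" carry primitive extensions in FHIR JSON; skipped here.
        if key.startswith("_"):
            continue
        if key not in ALLOWED_PATIENT_KEYS:
            problems.append(f"unexpected field: {key}")
    gender = patient_dict.get("gender")
    if gender is not None and gender not in ALLOWED_GENDERS:
        problems.append(f"invalid gender: {gender!r}")
    birth_date = patient_dict.get("birthDate")
    if birth_date is not None and not FHIR_DATE.match(str(birth_date)):
        problems.append(f"invalid birthDate: {birth_date!r}")
    return problems


# Hypothetical patient with three deliberate problems, for illustration only.
sample_patient = {
    "resourceType": "Patient",
    "id": "example",
    "gender": "M",
    "birthDate": "1990-13-01",
    "nickname": "x",
}
print(check_patient_fields(sample_patient))
```

Returning a list of problems instead of raising on the first one makes it easier to summarise data quality across many files during EDA.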
