# Importing MIMIC-IV data into a graph database

### About the data  
The data were obtained from PhysioNet, where the following abstract describes them: "The Medical Information Mart for Intensive Care (MIMIC)-III database provided critical care data for over 40,000 patients admitted to intensive care units at the Beth Israel Deaconess Medical Center (BIDMC). Importantly, MIMIC-III was deidentified, and patient identifiers were removed according to the Health Insurance Portability and Accountability Act (HIPAA) Safe Harbor provision. MIMIC-III has been integral in driving large amounts of research in clinical informatics, epidemiology, and machine learning. Here we present MIMIC-IV, an update to MIMIC-III, which incorporates contemporary data and improves on numerous aspects of MIMIC-III. MIMIC-IV adopts a modular approach to data organization, highlighting data provenance and facilitating both individual and combined use of disparate data sources. MIMIC-IV is intended to carry on the success of MIMIC-III and support a broad set of applications within healthcare."

## Import MIMIC-IV data into a graph database
---

The data were downloaded from https://physionet.org/content/mimiciv/0.4/ on 11 February 2021. This is version 0.4 of the MIMIC-IV data, which was published on 13 August 2020.

The downloaded data consisted of the following 27 CSV files, which together occupy 66.5 GB of disk space:
- admissions.csv
- chartevents.csv
- datetimeevents.csv
- d_hcpcs.csv
- diagnoses_icd.csv
- d_icd_diagnoses.csv
- d_icd_procedures.csv
- d_items.csv
- d_labitems.csv
- drgcodes.csv
- emar.csv
- emar_detail.csv
- hcpcsevents.csv
- icustays.csv
- inputevents.csv
- labevents.csv
- microbiologyevents.csv
- outputevents.csv
- patients.csv
- pharmacy.csv
- poe.csv
- poe_detail.csv
- prescriptions.csv
- procedureevents.csv
- procedures_icd.csv
- services.csv
- transfers.csv

These files were placed in the import folder of the MIMIC-IV Neo4j database, where LOAD CSV can read them through file:/// URLs.

In [3]:
import pandas as pd

In [4]:
path = '/home/tim/.local/share/neo4j-relate/dbmss/dbms-3077a569-80a7-4968-9e81-743773698121/import/'

# Create a list of all CSV files to import
csv_files = ['admissions.csv', 'chartevents.csv', 'datetimeevents.csv', 'd_hcpcs.csv', 'diagnoses_icd.csv', 'd_icd_diagnoses.csv', 'd_icd_procedures.csv', 'd_items.csv', 'd_labitems.csv', 'drgcodes.csv', 'emar.csv', 'emar_detail.csv', 'hcpcsevents.csv', 'icustays.csv', 'inputevents.csv', 'labevents.csv', 'microbiologyevents.csv', 'outputevents.csv', 'patients.csv', 'pharmacy.csv', 'poe.csv', 'poe_detail.csv', 'prescriptions.csv', 'procedureevents.csv', 'procedures_icd.csv', 'services.csv', 'transfers.csv']

# Create a dictionary with file names as keys and the list of headers for each file as values 
headers_dict = {}
for file in csv_files:
    # Read a single row so that only the column names need to be parsed
    headers = pd.read_csv(path + file, nrows=1).columns.tolist()
    if file not in headers_dict:
        headers_dict[file] = headers
        
# Inspect an example item in the dictionary
print(headers_dict['admissions.csv'])

['subject_id', 'hadm_id', 'admittime', 'dischtime', 'deathtime', 'admission_type', 'admission_location', 'discharge_location', 'insurance', 'language', 'marital_status', 'ethnicity', 'edregtime', 'edouttime', 'hospital_expire_flag']
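
The relationships created later in this notebook rely on columns that appear in more than one file. As an illustration only, the header dictionary can be inverted to list such columns. The helper name `shared_columns` and the two-file sample are hypothetical, and a shared column name is only a hint that two files can be joined, not proof of it.

```python
from collections import defaultdict


def shared_columns(headers_by_file):
    """Map each column name to the files containing it, keeping only shared columns."""
    files_by_column = defaultdict(list)
    for file_name, headers in headers_by_file.items():
        for header in headers:
            files_by_column[header].append(file_name)
    # Keep only the columns that occur in at least two files
    return {column: files for column, files in files_by_column.items() if len(files) > 1}


# Hypothetical two-file sample; in the notebook, pass headers_dict instead
sample = {
    'patients.csv': ['subject_id', 'gender'],
    'admissions.csv': ['subject_id', 'hadm_id'],
}
print(shared_columns(sample))
```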


In [5]:
# Create a function that writes the string for a cypher command
# to create nodes from each CSV file

def csv_to_node(csv_file):

    # Create the node label from the CSV file name: drop the '.csv' suffix and apply title case
    label = csv_file[:-4].title()

    # Convert the CSV's headers into node properties, addressing each column by position
    properties = ', '.join(
        '{header}:COLUMN[{index}]'.format(header=header, index=index)
        for index, header in enumerate(headers_dict[csv_file])
    )
    properties = '{' + properties + '}'

    # Compile the complete cypher command
    cypher = '''USING PERIODIC COMMIT 100000 LOAD CSV FROM "file:///{csv_file}" AS COLUMN CREATE (n:Mimic4:{label} {properties})'''.format(csv_file=csv_file, label=label, properties=properties)
    return cypher

In [6]:
# Generate the cypher code for a single csv file to test in the Neo4j browser
csv_to_node('d_labitems.csv')

'USING PERIODIC COMMIT 100000 LOAD CSV FROM "file:///d_labitems.csv" AS COLUMN CREATE (n:Mimic4:D_Labitems {itemid:COLUMN[0], label:COLUMN[1], fluid:COLUMN[2], category:COLUMN[3], loinc_code:COLUMN[4]})'
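
LOAD CSV without WITH HEADERS treats the first line of each file as data, so the command above also creates one node per file from the header row. A possible alternative, sketched here on the assumption that the Neo4j version in use accepts USING PERIODIC COMMIT together with LOAD CSV WITH HEADERS (as Neo4j 3.x and 4.x do), addresses columns by name instead of position. The function name `csv_to_node_with_headers` and the row variable `row` are hypothetical and not part of the original notebook.

```python
def csv_to_node_with_headers(csv_file, headers):
    """Build a LOAD CSV WITH HEADERS command for one CSV file (hypothetical helper)."""
    # Same labelling rule as csv_to_node: drop '.csv' and apply title case
    label = csv_file[:-4].title()
    # Each property is read from the row map by column name
    properties = ', '.join(f'{header}: row.{header}' for header in headers)
    return (
        f'USING PERIODIC COMMIT 100000 '
        f'LOAD CSV WITH HEADERS FROM "file:///{csv_file}" AS row '
        f'CREATE (n:Mimic4:{label} {{{properties}}})'
    )


# Example with two of the d_labitems.csv columns
print(csv_to_node_with_headers('d_labitems.csv', ['itemid', 'label']))
```

In the notebook, the second argument would be `headers_dict[csv_file]`.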

### Initialize a connection to the Neo4j database

In [None]:
import getpass
password = getpass.getpass("\nPlease enter the Neo4j database password to continue \n")

In [12]:
from neo4j import GraphDatabase
driver = GraphDatabase.driver(uri="bolt://localhost:7687", auth=('neo4j', password))
session = driver.session()

In [8]:
# Create all nodes
for csv_name in csv_files:
    query = csv_to_node(csv_name)
    session.run(query)
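
The loop above gives no feedback while the larger files load. A minimal sketch of a progress-reporting variant follows. `run_all` is a hypothetical helper. It assumes only that the session object offers `run()` and that the returned result offers `consume()`, as the Neo4j Python driver does. Calling `consume()` waits for each statement to finish before the next one starts.

```python
import time


def run_all(session, queries):
    """Run each query in turn, wait for it to finish, and report progress."""
    timings = []
    for position, query in enumerate(queries, start=1):
        started = time.perf_counter()
        # consume() blocks until the server has finished the statement
        session.run(query).consume()
        elapsed = time.perf_counter() - started
        timings.append(elapsed)
        print(f'{position} of {len(queries)} done in {elapsed:.1f} s')
    return timings
```

With the objects defined in this notebook, it could be called as `run_all(session, [csv_to_node(name) for name in csv_files])`.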

### Create relationships

In [13]:
# Create all relationships
# Each tuple is (parent label, child label, child property, parent property)
relationships = [
    ('Admissions', 'Chartevents', 'hadm_id', 'hadm_id'),
    ('Admissions', 'Datetimeevents', 'hadm_id', 'hadm_id'),
    ('Admissions', 'Diagnoses_Icd', 'hadm_id', 'hadm_id'),
    ('Admissions', 'Drgcodes', 'hadm_id', 'hadm_id'),
    ('Admissions', 'Icustays', 'hadm_id', 'hadm_id'),
    ('Admissions', 'Inputevents', 'hadm_id', 'hadm_id'),
    ('Admissions', 'Labevents', 'hadm_id', 'hadm_id'),
    ('Admissions', 'Microbiologyevents', 'hadm_id', 'hadm_id'),
    ('Admissions', 'Outputevents', 'hadm_id', 'hadm_id'),
    ('Admissions', 'Prescriptions', 'hadm_id', 'hadm_id'),
    ('Admissions', 'Procedureevents', 'hadm_id', 'hadm_id'),
    ('Admissions', 'Procedures_Icd', 'hadm_id', 'hadm_id'),
    ('Admissions', 'Services', 'hadm_id', 'hadm_id'),
    ('Admissions', 'Transfers', 'hadm_id', 'hadm_id'),
    ('D_Items', 'Chartevents', 'itemid', 'itemid'),
    ('D_Items', 'Datetimeevents', 'itemid', 'itemid'),
    ('D_Items', 'Inputevents', 'itemid', 'itemid'),
    ('D_Items', 'Microbiologyevents', 'spec_itemid', 'itemid'),
    ('D_Items', 'Microbiologyevents', 'org_itemid', 'itemid'),
    ('D_Items', 'Microbiologyevents', 'ab_itemid', 'itemid'),
    ('D_Items', 'Outputevents', 'itemid', 'itemid'),
    ('D_Items', 'Procedureevents', 'itemid', 'itemid'),
    ('D_Labitems', 'Labevents', 'itemid', 'itemid'),
    ('Diagnoses_Icd', 'D_Icd_Diagnoses', 'icd_code', 'icd_code'),
    ('Emar_Detail', 'Emar', 'emar_id', 'emar_id'),
    ('Hcpcsevents', 'D_Hcpcs', 'code', 'hcpcs_cd'),
    ('Icustays', 'Chartevents', 'stay_id', 'stay_id'),
    ('Icustays', 'Datetimeevents', 'stay_id', 'stay_id'),
    ('Icustays', 'Inputevents', 'stay_id', 'stay_id'),
    ('Icustays', 'Outputevents', 'stay_id', 'stay_id'),
    ('Icustays', 'Procedureevents', 'stay_id', 'stay_id'),
    ('Icustays', 'Transfers', 'icustay_id', 'stay_id'),
    ('Patients', 'Admissions', 'subject_id', 'subject_id'),
    ('Patients', 'Chartevents', 'subject_id', 'subject_id'),
    ('Patients', 'Datetimeevents', 'subject_id', 'subject_id'),
    ('Patients', 'Diagnoses_Icd', 'subject_id', 'subject_id'),
    ('Patients', 'Drgcodes', 'subject_id', 'subject_id'),
    ('Patients', 'Icustays', 'subject_id', 'subject_id'),
    ('Patients', 'Inputevents', 'subject_id', 'subject_id'),
    ('Patients', 'Labevents', 'subject_id', 'subject_id'),
    ('Patients', 'Microbiologyevents', 'subject_id', 'subject_id'),
    ('Patients', 'Outputevents', 'subject_id', 'subject_id'),
    ('Patients', 'Prescriptions', 'subject_id', 'subject_id'),
    ('Patients', 'Procedureevents', 'subject_id', 'subject_id'),
    ('Patients', 'Procedures_Icd', 'subject_id', 'subject_id'),
    ('Patients', 'Services', 'subject_id', 'subject_id'),
    ('Patients', 'Transfers', 'subject_id', 'subject_id'),
    ('Pharmacy', 'Emar', 'pharmacy_id', 'pharmacy_id'),
    ('Pharmacy', 'Emar_Detail', 'pharmacy_id', 'pharmacy_id'),  # label corrected from 'Emar_detail' to match csv_to_node
    ('Poe', 'Emar', 'poe_id', 'poe_id'),
    ('Poe', 'Pharmacy', 'poe_id', 'poe_id'),
    ('Poe_Detail', 'Poe', 'poe_id', 'poe_id'),
    ('Prescriptions', 'Pharmacy', 'pharmacy_id', 'pharmacy_id'),
    ('Procedures_Icd', 'D_Icd_Procedures', 'icd_code', 'icd_code'),
]

# Build one cypher command per relationship
template = 'MATCH (n:{parent}), (m:{child}) WHERE m.{child_key} = n.{parent_key} MERGE (n)<-[:SQL_CHILD_OF]-(m)'
query_list = [
    template.format(parent=parent, child=child, child_key=child_key, parent_key=parent_key)
    for parent, child, child_key, parent_key in relationships
]

# Run the commands one at a time and report progress
for count, command in enumerate(query_list, start=1):
    session.run(command)
    print('{} of {}'.format(count, len(query_list)))

KeyboardInterrupt: 
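
Each relationship query pairs every node of one label with every node of another and then filters on a property, which can take a long time on tables of this size when the properties are not indexed. One option is to create indexes on the join properties before building the relationships. The sketch below only generates the statements. `index_statements` and `example_keys` are hypothetical names, and the `CREATE INDEX FOR ... ON ...` form assumes Neo4j 4.0 or later.

```python
def index_statements(join_keys):
    """Return one CREATE INDEX statement per distinct (label, property) pair."""
    return [
        f'CREATE INDEX FOR (n:{label}) ON (n.{prop})'
        # set() drops duplicates; sorted() keeps the output order stable
        for label, prop in sorted(set(join_keys))
    ]


# Hypothetical subset of the join keys used by the relationship queries
example_keys = [
    ('Patients', 'subject_id'),
    ('Admissions', 'subject_id'),
    ('Admissions', 'hadm_id'),
    ('Patients', 'subject_id'),  # duplicate on purpose, emitted only once
]
for statement in index_statements(example_keys):
    print(statement)
```

Each generated statement could then be passed to `session.run()` before the relationship queries are executed.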

### Close the connection to the Neo4j database

In [9]:
session.close()
driver.close()  # also release the driver's connection pool

### Data references:
MIMIC-IV:
Johnson, A., Bulgarelli, L., Pollard, T., Horng, S., Celi, L. A., & Mark, R. (2020). MIMIC-IV (version 0.4). PhysioNet. https://doi.org/10.13026/a3wn-hq05

PhysioNet:
Goldberger, A., Amaral, L., Glass, L., Hausdorff, J., Ivanov, P. C., Mark, R., ... & Stanley, H. E. (2000). PhysioBank, PhysioToolkit, and PhysioNet: Components of a new research resource for complex physiologic signals. Circulation, 101(23), e215-e220.