# Virtualized Patient Population

To-Do:
- change output icd9_code to procedure_icd9_code
- create ICD diagnosis x other entities probabilities
- determine significance thresholds for each pair of entities
- Incorporate ICD diagnoses into problems
- Create OCCURS_WITH relationships with z-scores between continuous variables of interest
- Figure out how to add timedeltas between problems and the change in z-scores for labs or the occurrence of prescriptions

In [1]:
from datetime import datetime
from progressbar import ProgressBar
import pandas as pd
import time

In [2]:
from neo4j import GraphDatabase
driver=GraphDatabase.driver(uri="bolt://localhost:7687", auth=('neo4j','NikeshIsCool'))
session=driver.session()

Entities of interest:
- Problem
- Prescriptions -> Concept
- Labevents -> D_Labitems -> Concept
- Diagnoses_Icd -> D_Icd_Diagnoses -> Concept (timedelta of limited utility, since ICD codes pertain to entire admission)
- Procedures_Icd -> D_Icd_Procedures -> Concept (timedelta of limited utility, since ICD codes pertain to entire admission)

Relationships to create between entities of interest:
![virtualized relationships](images/Virtual_relationship_schema.png)

We'll start with these relationships to improve performance on our current use cases:  
(:Problem) - [:INSTANCE_OF] -> (:Concept) - [:OCCURS_WITH {co_occurrence_probability: __, source: 'MIMIC-III v1.4', updated: timestamp}] - (:Concept) <- [:INSTANCE_OF] - (:Prescriptions)  
(:Problem) - [:INSTANCE_OF] -> (:Concept) - [:OCCURS_WITH {co_occurrence_probability: __, z_score: __, source: 'MIMIC-III v1.4', updated: timestamp}] - (:Concept) <- [:INSTANCE_OF] - (:Labevents)  

The core equation used to determine the co-occurrence probability for our purposes is:  
(Probability of A and B co-occurring during an admission) / (Probability of A + Probability of B)  
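
As a quick worked example of the ratio above, the sketch below evaluates it for made-up probabilities. The function name `co_occurrence_score` and all numbers are illustrative assumptions, not part of the notebook. Because the joint probability can never exceed either individual probability, the ratio is bounded above by 0.5.

```python
def co_occurrence_score(p_joint, p_a, p_b):
    """Return P(A and B) / (P(A) + P(B)), the ratio described above."""
    denominator = p_a + p_b
    if denominator == 0:
        raise ValueError("at least one entity must have a non-zero probability")
    return p_joint / denominator


# Hypothetical numbers: A occurs in 20% of admissions, B in 30%, both in 10%.
score = co_occurrence_score(p_joint=0.10, p_a=0.20, p_b=0.30)
print(round(score, 3))

# When A and B always occur together, the joint probability equals both
# individual probabilities and the score reaches its upper bound of 0.5.
upper_bound = co_occurrence_score(p_joint=0.25, p_a=0.25, p_b=0.25)
print(upper_bound)
```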

For labs, we calculate the probability that the lab will be flagged as abnormal, not the probability that it will be ordered. For prescriptions and procedures, we calculate the probability that they will be ordered.

## Find the probability of each individual entity of interest occurring during an admission

In [5]:
# Get the probability of each problem in the general population
query = '''
MATCH (ptTotal:Admissions)
WITH count(ptTotal) AS ptTotal
MATCH (ad:Admissions)<-[:HAD]-(Pt:Patients)-[:HAD_PROBLEM]->(b:Problem)
WITH b.cui AS Problem_CUI, count(distinct(ad)) AS probTotal, ptTotal, count(distinct(Pt)) AS Pt
WITH Problem_CUI, toFloat(probTotal)/ptTotal AS Probability_Entity, Pt
WHERE Pt > 20
RETURN Problem_CUI, Probability_Entity
'''
data = session.run(query)
problem_probabilities = pd.DataFrame([dict(record) for record in data])

In [5]:
# Get the probability of each prescription in the general population
query = '''
MATCH (ptTotal:Admissions)
WITH count(ptTotal) AS ptTotal
MATCH (ad:Admissions)<-[:HAD]-(Pt:Patients)-[:HAD]-(rx:Prescriptions)-[:INSTANCE_OF]->(b:Concept)
WITH b.cui AS Rx_CUI, count(distinct(ad)) AS RxTotal, ptTotal, count(distinct(Pt)) AS Pt
WHERE Pt > 20
RETURN Rx_CUI, toFloat(RxTotal)/ptTotal AS Probability_Entity
'''
data = session.run(query)
rx_probabilities = pd.DataFrame([dict(record) for record in data])

In [3]:
# Get the probability of each abnormal lab in the general population
query = '''
MATCH (ptTotal:Admissions)
WITH count(ptTotal) AS ptTotal
MATCH (ad:Admissions)<-[:HAD]-(Pt:Patients)-[:HAD]-(lab:Labevents)-[:INSTANCE_OF]->(b:Concept)
WHERE lab.flag IS NOT NULL
WITH b.cui AS Lab_CUI, count(distinct(ad)) AS LabTotal, ptTotal, count(distinct(Pt)) AS Pt
WHERE Pt > 20
RETURN Lab_CUI, toFloat(LabTotal)/ptTotal AS Probability_Entity
'''
data = session.run(query)
abnormal_lab_probabilities = pd.DataFrame([dict(record) for record in data])

In [10]:
# Get the probability of each procedure in the general population
query = '''
MATCH (ptTotal:Admissions)
WITH count(ptTotal) AS ptTotal
MATCH (ad:Admissions)<-[:HAD]-(Pt:Patients)-[:HAD]->(proc:Procedures_Icd)
WITH proc.icd9_code AS procedure_icd9_code, count(distinct(ad)) AS ProcTotal, ptTotal, count(distinct(Pt)) AS Pt
WHERE Pt > 20
RETURN procedure_icd9_code, toFloat(ProcTotal)/ptTotal AS Probability_Entity
'''
data = session.run(query)
procedure_probabilities = pd.DataFrame([dict(record) for record in data])

In [8]:
# Get the probability of each ICD diagnosis in the general population
query = '''
MATCH (ptTotal:Admissions)
WITH count(ptTotal) AS ptTotal
MATCH (ad:Admissions)<-[:HAD]-(Pt:Patients)-[:HAD]->(dx:Diagnoses_Icd)
WITH dx.icd9_code AS dx_icd9_code, count(distinct(ad)) AS ProcTotal, ptTotal, count(distinct(Pt)) AS Pt
WHERE Pt > 20
RETURN dx_icd9_code, toFloat(ProcTotal)/ptTotal AS Probability_Entity
'''
data = session.run(query)
diagnosis_probabilities = pd.DataFrame([dict(record) for record in data])
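
The five queries in this section share one pattern: count the distinct admissions matched for an entity, divide by the total number of admissions, and keep only entities seen in more than 20 distinct patients. A minimal in-memory sketch of that pattern, assuming a hypothetical list of `(patient_id, admission_id, entity_id)` rows in place of the graph, might look like this:

```python
from collections import defaultdict


def entity_probabilities(rows, total_admissions, min_patients=20):
    """Map each entity to (distinct admissions with it) / (total admissions).

    `rows` is a hypothetical iterable of (patient_id, admission_id, entity_id)
    tuples. Entities seen in `min_patients` or fewer distinct patients are
    dropped, mirroring the `WHERE Pt > 20` filter in the Cypher queries.
    """
    admissions = defaultdict(set)
    patients = defaultdict(set)
    for patient_id, admission_id, entity_id in rows:
        admissions[entity_id].add(admission_id)
        patients[entity_id].add(patient_id)
    return {
        entity: len(admissions[entity]) / total_admissions
        for entity in admissions
        if len(patients[entity]) > min_patients
    }


# Toy data: patient 1 has two admissions with entity "X", patient 2 has one.
toy_rows = [(1, 10, "X"), (1, 11, "X"), (2, 20, "X"), (2, 20, "Y")]
toy_probabilities = entity_probabilities(toy_rows, total_admissions=4, min_patients=1)
print(toy_probabilities)
```

In the toy call the patient threshold is lowered to 1 so that the example stays small; "Y" is dropped because only one patient has it.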

## Find the probability of each pair of different entities of interest occurring together

In [5]:
def two_different_entities_co_occurrence_normalized(entity1_df, entity2_df, entity1_id, entity2_id):
    
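    # Note: combined_entities_df is read from the notebook's global scope
    # (it is set by the query cell that runs just before each call)
    # Calculate the co-occurrence probability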
    co_occurrence_df = pd.merge(combined_entities_df, entity1_df, on=entity1_id)
    co_occurrence_df = pd.merge(co_occurrence_df, entity2_df, on=entity2_id, suffixes=['_1', '_2'])
    co_occurrence_df['co_occurrence_probability'] = co_occurrence_df.combined_entities_probability / (co_occurrence_df.Probability_Entity_1 + co_occurrence_df.Probability_Entity_2)

    # Normalize the co-occurrence probability using the min-max method
    co_occurrence_df['normalized_co_occurrence_probability'] = (co_occurrence_df.co_occurrence_probability-co_occurrence_df.co_occurrence_probability.min())/(co_occurrence_df.co_occurrence_probability.max()-co_occurrence_df.co_occurrence_probability.min())

    # Sort the dataframe
    co_occurrence_df.sort_values(by='normalized_co_occurrence_probability', ascending=False, inplace=True)
    
    # Write out to CSV
    timestamp = datetime.now().strftime("%Y-%m-%d_%H-%M-%S")
    filename = 'normalized_co_occurrence_probability_'+entity1_id+'_'+entity2_id+'_'+timestamp+'.csv'
    co_occurrence_df.loc[:,[entity1_id, entity2_id, 'normalized_co_occurrence_probability']].to_csv(filename, index=False)
    
    return 'Saved results as '+filename
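
The min-max step in the function above rescales the raw scores so that the smallest becomes 0 and the largest becomes 1. The standalone sketch below shows the same arithmetic on a plain list. `min_max_normalize` is a hypothetical helper, and the guard for identical values is an added assumption (the pandas expression above would produce NaN in that case).

```python
def min_max_normalize(values):
    """Rescale values linearly so that min -> 0.0 and max -> 1.0."""
    lowest, highest = min(values), max(values)
    if highest == lowest:
        # Assumption: treat a constant column as all zeros instead of dividing by zero.
        return [0.0 for _ in values]
    return [(value - lowest) / (highest - lowest) for value in values]


raw_scores = [0.05, 0.20, 0.35]  # made-up co-occurrence scores
normalized_scores = min_max_normalize(raw_scores)
print(normalized_scores)
```

Note that the normalized value depends on every pair in the table, so scores from different entity-pair tables are not directly comparable.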

In [63]:
# Get the probability of each pair of abnormal lab and problem
query = '''
MATCH (ptTotal:Admissions)
WITH count(ptTotal) AS ptTotal
MATCH (ad:Admissions)<-[:HAD]-(Pt:Patients), (p:Problem)<-[:HAD_PROBLEM]-(Pt)-[:HAD]->(lab:Labevents)-[:INSTANCE_OF]->(c:Concept)
WHERE lab.flag IS NOT NULL
WITH p.cui AS Problem_CUI, c.cui AS Lab_CUI, count(distinct(ad)) AS combinedTotal, ptTotal, count(distinct(Pt)) AS Pts
WHERE Pts > 20
RETURN Problem_CUI, Lab_CUI, toFloat(combinedTotal)/ptTotal AS combined_entities_probability
'''
data = session.run(query)
combined_entities_df = pd.DataFrame([dict(record) for record in data])

two_different_entities_co_occurrence_normalized(
    entity1_df=problem_probabilities, 
    entity2_df=abnormal_lab_probabilities, 
    entity1_id='Problem_CUI', 
    entity2_id='Lab_CUI')

'Saved results as normalized_co_occurrence_probability_Problem_CUI_Lab_CUI_2021-12-16_16-23-07.csv'

In [67]:
# Get the probability of each pair of prescription and problem
query = '''
MATCH (ptTotal:Admissions)
WITH count(ptTotal) AS ptTotal
MATCH (ad:Admissions)<-[:HAD]-(Pt:Patients), (p:Problem)<-[:HAD_PROBLEM]-(Pt)-[:HAD]-(rx:Prescriptions)-[:INSTANCE_OF]->(c:Concept)
WITH p.cui AS Problem_CUI, c.cui AS Rx_CUI, count(distinct(ad)) AS combinedTotal, ptTotal, count(distinct(Pt)) AS Pts
WHERE Pts > 20
RETURN Problem_CUI, Rx_CUI, toFloat(combinedTotal)/ptTotal AS combined_entities_probability
'''
data = session.run(query)
combined_entities_df = pd.DataFrame([dict(record) for record in data])

two_different_entities_co_occurrence_normalized(
    entity1_df=problem_probabilities, 
    entity2_df=rx_probabilities, 
    entity1_id='Problem_CUI', 
    entity2_id='Rx_CUI')

'Saved results as normalized_co_occurrence_probability_Problem_CUI_Rx_CUI_2021-12-16_16-36-33.csv'

In [73]:
# Get the probability of each pair of procedure and problem
query = '''
MATCH (ptTotal:Admissions)
WITH count(ptTotal) AS ptTotal
MATCH (ad:Admissions)<-[:HAD]-(Pt:Patients), (p:Problem)<-[:HAD_PROBLEM]-(Pt)-[:HAD]->(proc:Procedures_Icd)
WITH p.cui AS Problem_CUI, proc.icd9_code AS icd9_code, count(distinct(ad)) AS combinedTotal, ptTotal, count(distinct(Pt)) AS Pts
WHERE Pts > 20
RETURN Problem_CUI, icd9_code, toFloat(combinedTotal)/ptTotal AS combined_entities_probability
'''
data = session.run(query)
combined_entities_df = pd.DataFrame([dict(record) for record in data])

two_different_entities_co_occurrence_normalized(
    entity1_df=problem_probabilities, 
    entity2_df=procedure_probabilities, 
    entity1_id='Problem_CUI', 
    entity2_id='icd9_code')

'Saved results as normalized_co_occurrence_probability_Problem_CUI_icd9_code_2021-12-16_19-13-17.csv'

In [11]:
# Get the probability of each pair of procedure and abnormal lab
query = '''
MATCH (ptTotal:Admissions)
WITH count(ptTotal) AS ptTotal
MATCH (ad:Admissions)<-[:HAD]-(Pt:Patients), (proc:Procedures_Icd)<-[:HAD]-(Pt)-[:HAD]->(lab:Labevents)-[:INSTANCE_OF]->(c:Concept)
WHERE lab.flag IS NOT NULL
WITH proc.icd9_code AS icd9_code, c.cui AS Lab_CUI, count(distinct(ad)) AS combinedTotal, ptTotal, count(distinct(Pt)) AS Pts
WHERE Pts > 20
RETURN icd9_code, Lab_CUI, toFloat(combinedTotal)/ptTotal AS combined_entities_probability
'''
data = session.run(query)
combined_entities_df = pd.DataFrame([dict(record) for record in data])

two_different_entities_co_occurrence_normalized(
    entity1_df=procedure_probabilities, 
    entity2_df=abnormal_lab_probabilities, 
    entity1_id='icd9_code', 
    entity2_id='Lab_CUI')

'Saved results as normalized_co_occurrence_probability_icd9_code_Lab_CUI_2021-12-17_08-51-05.csv'

In [13]:
# Get the probability of each pair of procedure and prescription
query = '''
MATCH (ptTotal:Admissions)
WITH count(ptTotal) AS ptTotal
MATCH (ad:Admissions)<-[:HAD]-(Pt:Patients), (proc:Procedures_Icd)<-[:HAD]-(Pt)-[:HAD]-(rx:Prescriptions)-[:INSTANCE_OF]->(c:Concept)
WITH proc.icd9_code AS icd9_code, c.cui AS Rx_CUI, count(distinct(ad)) AS combinedTotal, ptTotal, count(distinct(Pt)) AS Pts
WHERE Pts > 20
RETURN icd9_code, Rx_CUI, toFloat(combinedTotal)/ptTotal AS combined_entities_probability
'''
data = session.run(query)
combined_entities_df = pd.DataFrame([dict(record) for record in data])

two_different_entities_co_occurrence_normalized(
    entity1_df=procedure_probabilities, 
    entity2_df=rx_probabilities, 
    entity1_id='icd9_code', 
    entity2_id='Rx_CUI')

'Saved results as normalized_co_occurrence_probability_icd9_code_Rx_CUI_2021-12-17_08-54-58.csv'

In [7]:
# Get the probability of each pair of prescription and abnormal lab
query = '''
MATCH (ptTotal:Admissions)
WITH count(ptTotal) AS ptTotal
MATCH (ad:Admissions)<-[:HAD]-(Pt:Patients), (c1:Concept)<-[:INSTANCE_OF]-(rx:Prescriptions)<-[:HAD]-(Pt)-[:HAD]->(lab:Labevents)-[:INSTANCE_OF]->(c2:Concept)
WHERE lab.flag IS NOT NULL
WITH c1.cui AS Rx_CUI, c2.cui AS Lab_CUI, count(distinct(ad)) AS combinedTotal, ptTotal, count(distinct(Pt)) AS Pts
WHERE Pts > 20
RETURN Rx_CUI, Lab_CUI, toFloat(combinedTotal)/ptTotal AS combined_entities_probability
'''
data = session.run(query)
combined_entities_df = pd.DataFrame([dict(record) for record in data])

two_different_entities_co_occurrence_normalized(
    entity1_df=rx_probabilities, 
    entity2_df=abnormal_lab_probabilities, 
    entity1_id='Rx_CUI', 
    entity2_id='Lab_CUI')

'Saved results as normalized_co_occurrence_probability_Rx_CUI_Lab_CUI_2021-12-17_09-20-27.csv'

## Find the probability of each entity of interest occurring with another of its kind (e.g. problems with problems)

In [6]:
def pair_same_entity_co_occurrence_normalized(entity_df, entity_id):
    
    entity1_id = entity_id+'_1'
    entity2_id = entity_id+'_2'
    
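    # Note: combined_entities_df is read from the notebook's global scope
    # (it is set by the query cell that runs just before each call)
    # Calculate the co-occurrence probability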
    co_occurrence_df = pd.merge(combined_entities_df, entity_df, left_on=entity1_id, right_on=entity_id)
    
    co_occurrence_df = pd.merge(co_occurrence_df, entity_df, left_on=entity2_id, right_on=entity_id)
    co_occurrence_df['co_occurrence_probability'] = co_occurrence_df.combined_entities_probability / (co_occurrence_df.Probability_Entity_x + co_occurrence_df.Probability_Entity_y)
    
    # Drop duplicates: sort each pair so that (A, B) and (B, A) collapse into one row
    co_occurrence_df['ID_pair'] = co_occurrence_df.loc[:,[entity1_id, entity2_id]].values.tolist()
    co_occurrence_df['ID_pair'] = co_occurrence_df.ID_pair.apply(sorted)
    co_occurrence_df[[entity1_id,entity2_id]] = pd.DataFrame(co_occurrence_df.ID_pair.tolist(), index= co_occurrence_df.index)
    co_occurrence_df.drop_duplicates(subset=[entity1_id,entity2_id], inplace=True)
    
    # Normalize the co-occurrence probability using the min-max method
    co_occurrence_df['normalized_co_occurrence_probability'] = (co_occurrence_df.co_occurrence_probability-co_occurrence_df.co_occurrence_probability.min())/(co_occurrence_df.co_occurrence_probability.max()-co_occurrence_df.co_occurrence_probability.min())
    
    # Sort the dataframe
    co_occurrence_df.sort_values(by='normalized_co_occurrence_probability', ascending=False, inplace=True)
    
    # Write out to CSV
    timestamp = datetime.now().strftime("%Y-%m-%d_%H-%M-%S")
    filename = 'normalized_co_occurrence_probability_paired_'+entity_id+'_'+timestamp+'.csv'
    co_occurrence_df.loc[:,[entity1_id, entity2_id, 'normalized_co_occurrence_probability']].to_csv(filename, index=False)
    
    return 'Saved results as '+filename
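
Because the pair queries return both (A, B) and (B, A), the function above sorts each pair before dropping duplicates. The same idea in plain Python, using made-up identifiers and a hypothetical helper name, can be sketched as:

```python
def drop_mirrored_pairs(rows):
    """Keep the first row for each unordered pair of identifiers.

    `rows` is a list of (id_1, id_2, score) tuples; (A, B) and (B, A) are
    treated as the same pair, as in the "Drop duplicates" step above.
    """
    seen = set()
    unique_rows = []
    for id_1, id_2, score in rows:
        key = tuple(sorted((id_1, id_2)))
        if key in seen:
            continue
        seen.add(key)
        unique_rows.append((key[0], key[1], score))
    return unique_rows


pairs = [("C002", "C001", 0.4), ("C001", "C002", 0.4), ("C001", "C003", 0.1)]
unique_pairs = drop_mirrored_pairs(pairs)
print(unique_pairs)
```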

In [106]:
# Get the probability of each pair of problems co-occurring
query = '''
MATCH (ptTotal:Admissions)
WITH count(ptTotal) AS ptTotal
MATCH (ad:Admissions)<-[:HAD]-(Pt:Patients), (p1:Problem)<-[:HAD_PROBLEM]-(Pt)-[:HAD_PROBLEM]->(p2:Problem)
WHERE p1.cui <> p2.cui
WITH p1.cui AS Problem_CUI_1, p2.cui AS Problem_CUI_2, count(distinct(ad)) AS ProblemPairTotal, ptTotal
RETURN Problem_CUI_1, Problem_CUI_2, toFloat(ProblemPairTotal)/ptTotal AS combined_entities_probability
'''
data = session.run(query)
combined_entities_df = pd.DataFrame([dict(record) for record in data])

pair_same_entity_co_occurrence_normalized(
    entity_df = problem_probabilities,
    entity_id = 'Problem_CUI')

'Saved results as normalized_co_occurrence_probability_paired_Problem_CUI_2021-12-17_07-32-44.csv'

In [7]:
# Get the probability of each pair of abnormal lab events co-occurring
query = '''
MATCH (ptTotal:Admissions)
WITH count(ptTotal) AS ptTotal
MATCH (p1:Concept)<-[:INSTANCE_OF]-(lab1:Labevents)-[:OCCURRED_DURING]->(ad:Admissions)<-[:OCCURRED_DURING]-(lab2:Labevents)-[:INSTANCE_OF]->(p2:Concept)
WHERE lab1.flag IS NOT NULL AND lab2.flag IS NOT NULL AND p1.cui <> p2.cui
WITH p1.cui AS Lab_CUI_1, p2.cui AS Lab_CUI_2, count(distinct(ad)) AS ProblemPairTotal, ptTotal
RETURN Lab_CUI_1, Lab_CUI_2, toFloat(ProblemPairTotal)/ptTotal AS combined_entities_probability
'''
data = session.run(query)
combined_entities_df = pd.DataFrame([dict(record) for record in data])

pair_same_entity_co_occurrence_normalized(
    entity_df = abnormal_lab_probabilities,
    entity_id = 'Lab_CUI')

'Saved results as normalized_co_occurrence_probability_paired_Lab_CUI_2021-12-17_15-18-35.csv'

In [11]:
# Get the probability of each pair of procedures co-occurring
query = '''
MATCH (ptTotal:Admissions)
WITH count(ptTotal) AS ptTotal
MATCH (ad:Admissions)<-[:HAD]-(Pt:Patients), (p1:Procedures_Icd)<-[:HAD]-(Pt)-[:HAD]->(p2:Procedures_Icd)
WHERE p1.icd9_code <> p2.icd9_code
WITH p1.icd9_code AS procedure_icd9_code_1, p2.icd9_code AS procedure_icd9_code_2, count(distinct(ad)) AS ProblemPairTotal, ptTotal
RETURN procedure_icd9_code_1, procedure_icd9_code_2, toFloat(ProblemPairTotal)/ptTotal AS combined_entities_probability
'''
data = session.run(query)
combined_entities_df = pd.DataFrame([dict(record) for record in data])

pair_same_entity_co_occurrence_normalized(
    entity_df = procedure_probabilities,
    entity_id = 'procedure_icd9_code')

'Saved results as normalized_co_occurrence_probability_paired_procedure_icd9_code_2021-12-17_16-09-08.csv'

In [9]:
# Get the probability of each pair of diagnoses co-occurring
query = '''
MATCH (ptTotal:Admissions)
WITH count(ptTotal) AS ptTotal
MATCH (ad:Admissions)<-[:HAD]-(Pt:Patients), (p1:Diagnoses_Icd)<-[:HAD]-(Pt)-[:HAD]->(p2:Diagnoses_Icd)
WHERE p1.icd9_code <> p2.icd9_code
WITH p1.icd9_code AS dx_icd9_code_1, p2.icd9_code AS dx_icd9_code_2, count(distinct(ad)) AS ProblemPairTotal, ptTotal
RETURN dx_icd9_code_1, dx_icd9_code_2, toFloat(ProblemPairTotal)/ptTotal AS combined_entities_probability
'''
data = session.run(query)
combined_entities_df = pd.DataFrame([dict(record) for record in data])

pair_same_entity_co_occurrence_normalized(
    entity_df = diagnosis_probabilities,
    entity_id = 'dx_icd9_code')

'Saved results as normalized_co_occurrence_probability_paired_dx_icd9_code_2021-12-17_16-06-52.csv'

In [20]:
# Get the probability of each pair of prescriptions co-occurring
query = '''
MATCH (ptTotal:Admissions)
WITH count(ptTotal) AS ptTotal
MATCH (ad:Admissions)<-[:HAD]-(Pt:Patients), (p1:Concept)<-[:INSTANCE_OF]-(:Prescriptions)<-[:HAD]-(Pt)-[:HAD]-(:Prescriptions)-[:INSTANCE_OF]->(p2:Concept)
WHERE p1.cui <> p2.cui
WITH p1.cui AS Rx_CUI_1, p2.cui AS Rx_CUI_2, count(distinct(ad)) AS ProblemPairTotal, ptTotal
RETURN Rx_CUI_1, Rx_CUI_2, toFloat(ProblemPairTotal)/ptTotal AS combined_entities_probability
'''
data = session.run(query)
combined_entities_df = pd.DataFrame([dict(record) for record in data])

pair_same_entity_co_occurrence_normalized(
    entity_df = rx_probabilities,
    entity_id = 'Rx_CUI')

'Saved results as normalized_co_occurrence_probability_paired_Rx_CUI_2021-12-17_11-24-12.csv'

## Determine significance thresholds for the normalized probability of co-occurrence for each pair of entities  

![problems x problems](images/normalized_co_occurence_problem_pair.png)  
(Picture above) Normalized co-occurrence probabilities for problem pairs.

![rx x rx](images/normalized_co_occurence_prescription_pair.png)  
(Picture above) Normalized co-occurrence probabilities for prescription pairs.

![procedures x procedures](images/normalized_co_occurence_procedures_pair.png)  
(Picture above) Normalized co-occurrence probabilities for procedure pairs.

![labs x labs](images/normalized_co_occurence_abnormal_lab_pair.png)  
(Picture above) Normalized co-occurrence probabilities for abnormal lab pairs.

![labs x rx](images/normalized_co_occurence_labs_x_prescriptions.png)  
(Picture above) Normalized co-occurrence probabilities for abnormal labs and prescriptions.

![labs x procedures](images/normalized_co_occurence_procedures_x_labs.png)  
(Picture above) Normalized co-occurrence probabilities for abnormal labs and procedures.

![labs x problems](images/normalized_co_occurence_problem_x_abnormal_labs.png)  
(Picture above) Normalized co-occurrence probabilities for problems and abnormal labs.

![rx x problems](images/normalized_co_occurence_problem_x_prescriptions.png)  
(Picture above) Normalized co-occurrence probabilities for problems and prescriptions.

![procedures x problems](images/normalized_co_occurence_problem_x_procedures.png)  
(Picture above) Normalized co-occurrence probabilities for problems and procedures.

![rx x procedures](images/normalized_co_occurence_procedures_x_prescriptions.png)  
(Picture above) Normalized co-occurrence probabilities for prescriptions and procedures.

## Create 2-way co-occurrence nodes/relationships
Move the CSV into the database's Import folder
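
The helper functions above return a message of the form `'Saved results as <filename>'`, and the import cells in this section expect a `filename` variable. One possible way to recover the name and move the file is sketched below. The helper name, the `source_dir` parameter, and the import directory are assumptions; the real import folder depends on the Neo4j installation.

```python
import shutil
import tempfile
from pathlib import Path

PREFIX = "Saved results as "


def move_csv_to_import_dir(saved_message, import_dir, source_dir="."):
    """Extract the CSV name from the helper's return message and move the file.

    `import_dir` is an assumption: pass the import folder of your own Neo4j
    instance. Returns the bare filename for use in `file:///{filename}`.
    """
    if not saved_message.startswith(PREFIX):
        raise ValueError("unexpected message: " + saved_message)
    filename = saved_message[len(PREFIX):]
    shutil.move(str(Path(source_dir) / filename), str(Path(import_dir) / filename))
    return filename


# Demo with throwaway directories so the sketch runs without a database.
with tempfile.TemporaryDirectory() as work_dir, tempfile.TemporaryDirectory() as fake_import_dir:
    (Path(work_dir) / "demo.csv").write_text("Problem_CUI,Lab_CUI,normalized_co_occurrence_probability\n")
    moved_name = move_csv_to_import_dir("Saved results as demo.csv", fake_import_dir, source_dir=work_dir)
    moved_ok = (Path(fake_import_dir) / moved_name).exists()
print(moved_name, moved_ok)
```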

In [12]:
# Import the problem-lab (CUI-based) co-occurrence probabilities into the database
timestamp = datetime.now().strftime("%Y-%m-%dT%H:%M:%S")

command = '''
USING PERIODIC COMMIT 100000 LOAD CSV WITH HEADERS FROM "file:///{filename}" AS COLUMN
MATCH (prob:Concept)
WHERE prob.cui = COLUMN.Problem_CUI AND prob.cui_pref_term IS NOT NULL
MATCH (lab:Concept)
WHERE lab.cui = COLUMN.Lab_CUI AND lab.cui_pref_term IS NOT NULL
CREATE (prob)<-[r:OCCURS_WITH {{co_occurrence_probability:toFloat(COLUMN.normalized_co_occurrence_probability), source:'MIMIC-III v1.4', updated:'{timestamp}'}}]-(lab)
'''.format(timestamp=timestamp, filename=filename)

session.run(command)

<neo4j.work.result.Result at 0x7f27766f6640>

In [109]:
# Import the problem-prescription co-occurrence probabilities into the database
timestamp = datetime.now().strftime("%Y-%m-%dT%H:%M:%S")

command = '''
USING PERIODIC COMMIT 100000 LOAD CSV WITH HEADERS FROM "file:///{filename}" AS COLUMN
MATCH (prob:Concept)
WHERE prob.cui = COLUMN.Problem_CUI AND prob.cui_pref_term IS NOT NULL
MATCH (rx:Concept)
WHERE rx.cui = COLUMN.Rx_CUI AND rx.cui_pref_term IS NOT NULL
CREATE (prob)<-[r:OCCURS_WITH {{co_occurrence_probability:toFloat(COLUMN.normalized_co_occurrence_probability), source:'MIMIC-III v1.4', updated:'{timestamp}'}}]-(rx)
'''.format(timestamp=timestamp, filename=filename)

session.run(command)

<neo4j.work.result.Result at 0x7f6e96d6cd30>

In [None]:
# Import the other entity-procedure co-occurrence probabilities into the database

In [None]:
# Import the procedure-procedure co-occurrence probabilities into the database

In [74]:
# Import the problem-pair co-occurrence probabilities into the database
timestamp = datetime.now().strftime("%Y-%m-%dT%H:%M:%S")

command = '''
USING PERIODIC COMMIT 100000 LOAD CSV WITH HEADERS FROM "file:///{filename}" AS COLUMN
MATCH (prob1:Concept)
WHERE prob1.cui = COLUMN.Problem_CUI_1 AND prob1.cui_pref_term IS NOT NULL
MATCH (prob2:Concept)
WHERE prob2.cui = COLUMN.Problem_CUI_2 AND prob2.cui_pref_term IS NOT NULL
MERGE (prob1)<-[r:OCCURS_WITH {{co_occurrence_probability:toFloat(COLUMN.normalized_co_occurrence_probability), source:'MIMIC-III v1.4', updated:'{timestamp}'}}]-(prob2)
'''.format(timestamp=timestamp, filename=filename)

session.run(command)

<neo4j.work.result.Result at 0x7fec48db34f0>

In [27]:
# Import the procedure-problem co-occurrence probabilities into the database
timestamp = datetime.now().strftime("%Y-%m-%dT%H:%M:%S")

command = '''
USING PERIODIC COMMIT 100000 LOAD CSV WITH HEADERS FROM "file:///{filename}" AS COLUMN
MATCH (prob:Concept)
WHERE prob.cui = COLUMN.Problem_CUI AND prob.cui_pref_term IS NOT NULL
MATCH (proc:D_Icd_Procedures)
WHERE proc.icd9_code = COLUMN.icd9_code
CREATE (prob)<-[r:OCCURS_WITH {{co_occurrence_probability:toFloat(COLUMN.normalized_co_occurrence_probability), source:'MIMIC-III v1.4', updated:'{timestamp}'}}]-(proc)
'''.format(timestamp=timestamp, filename=filename)

session.run(command)

<neo4j.work.result.Result at 0x7f2f847e40a0>

## Queries

In [36]:
def LikelyOrders(cui_prob_list):
    
    # Find prescriptions associated with the input problem    
    query = '''
    MATCH p=(ord:Concept)-[r:OCCURS_WITH]->(c:Concept) 
    WHERE c.cui IN {cui_prob_list} AND ord.semantic_type IN ["['Clinical Drug']"]
    WITH round(r.co_occurrence_probability, 5)*1000 AS Score, ord, r
    WHERE Score > 20
    RETURN ord.term AS `Order`, Score
    ORDER BY r.co_occurrence_probability DESC
    '''.format(cui_prob_list=cui_prob_list)
    data = session.run(query)
    orders_likely_rx = pd.DataFrame([dict(record) for record in data])
    
    # Find abnormal labs associated with the input problem
    query = '''
    MATCH p=(ord:Concept)-[r:OCCURS_WITH]->(c:Concept) 
    WHERE c.cui IN {cui_prob_list} AND ord.semantic_type IN ["['Clinical Attribute']"]
    WITH round(r.co_occurrence_probability, 5)*1000 AS Score, ord, r
    WHERE Score > 20
    RETURN ord.description AS `Order`, Score
    ORDER BY r.co_occurrence_probability DESC
    '''.format(cui_prob_list=cui_prob_list)
    data = session.run(query)
    orders_likely_lab = pd.DataFrame([dict(record) for record in data])
    
    # Find procedures associated with the input problem
    query = '''
    MATCH p=(ord:D_Icd_Procedures)-[r:OCCURS_WITH]->(c:Concept) 
    WHERE c.cui IN {cui_prob_list}
    WITH round(r.co_occurrence_probability, 5)*1000 AS Score, ord, r
    WHERE Score > 20
    RETURN ord.long_title AS `Order`, Score
    ORDER BY r.co_occurrence_probability DESC
    '''.format(cui_prob_list=cui_prob_list)
    data = session.run(query)
    orders_likely_procedure = pd.DataFrame([dict(record) for record in data])
    
    return orders_likely_rx, orders_likely_lab, orders_likely_procedure
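
`LikelyOrders` builds its Cypher by formatting the Python list into the query string. An alternative is to pass the list as a query parameter, which the Neo4j Python driver accepts as keyword arguments to `session.run`. The sketch below does this for the prescription query only. `likely_rx_orders`, `min_score`, and the recording stand-in session are hypothetical names used for illustration; the function accepts any object with a compatible `run` method.

```python
RX_QUERY = '''
MATCH (ord:Concept)-[r:OCCURS_WITH]->(c:Concept)
WHERE c.cui IN $cui_prob_list AND ord.semantic_type IN ["['Clinical Drug']"]
WITH round(r.co_occurrence_probability, 5)*1000 AS Score, ord, r
WHERE Score > $min_score
RETURN ord.term AS `Order`, Score
ORDER BY r.co_occurrence_probability DESC
'''


def likely_rx_orders(session, cui_prob_list, min_score=20):
    """Run the prescription query with parameters instead of string formatting."""
    result = session.run(RX_QUERY, cui_prob_list=list(cui_prob_list), min_score=min_score)
    return [dict(record) for record in result]


class RecordingSession:
    """Stand-in that records calls, used only to show the call shape without a database."""

    def __init__(self):
        self.calls = []

    def run(self, query, **parameters):
        self.calls.append((query, parameters))
        return []


demo_session = RecordingSession()
demo_rows = likely_rx_orders(demo_session, ['C0022661'])
print(demo_session.calls[0][1])
```

With a real driver session, the returned list of dictionaries can be passed to `pd.DataFrame` exactly as in the cells above.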

In [37]:
start_time = time.time()

cui_prob_list = ['C0022661']
orders_likely_rx, orders_likely_lab, orders_likely_procedure = LikelyOrders(cui_prob_list)

print("Total runtime:", time.time() - start_time, "seconds")

Total runtime: 0.015000104904174805 seconds


In [30]:
orders_likely_rx

| | Order | Score |
|---|---|---|
| 0 | calcitriol 0.00025 MG Oral Capsule | 109.59 |
| 1 | sevelamer carbonate 800 MG Oral Tablet [Renvela] | 106.88 |
| 2 | sodium polystyrene sulfonate 250 MG/ML Oral Su... | 103.32 |
| 3 | 1 ML epoetin alfa 4000 UNT/ML Injection [Procrit] | 79.31 |
| 4 | 150 ML Glucose 50 MG/ML Injection | 76.26 |
| ... | ... | ... |
| 92 | 100 ML Glucose 50 MG/ML Injection | 20.79 |
| 93 | Docusate Sodium 10 MG/ML Oral Suspension | 20.71 |
| 94 | Aspirin 81 MG Chewable Tablet | 20.34 |
| 95 | Amiodarone hydrochloride 200 MG Oral Tablet | 20.32 |


In [31]:
orders_likely_lab

| | Order | Score |
|---|---|---|
| 0 | Macrophage in Ascites | 57.87 |
| 1 | Lymphocytes in Other Body Fluid | 57.69 |
| 2 | Polys in Other Body Fluid | 53.77 |
| 3 | Lymphocytes in Ascites | 52.82 |
| 4 | RBC, Ascites in Ascites | 51.48 |
| 5 | WBC, Ascites in Ascites | 50.71 |
| 6 | Monocytes in Ascites | 49.68 |
| 7 | Polys in Ascites | 48.77 |
| 8 | Protein/Creatinine Ratio in Urine | 48.01 |
| 9 | pH in Urine | 44.46 |


In [35]:
orders_likely_procedure

| | Order | Score |
|---|---|---|
| 0 | Hemodialysis | 69.13 |
| 1 | Venous catheterization for renal dialysis | 64.32 |
| 2 | Other endoscopy of small intestine | 24.7 |
| 3 | Closed [endoscopic] biopsy of bronchus | 23.03 |
| 4 | Venous catheterization, not elsewhere classified | 21.32 |
| 5 | Transfusion of packed cells | 21.29 |
| 6 | Insertion of endotracheal tube | 20.16 |
