# Virtualized Patient Population

To-Do:
- Incorporate ICD diagnoses into problems
- Incorporate ICD procedures into treatments
- Create OCCURS_WITH relationships between entities of interest that have z-scores or odds ratios
- Figure out how to add timedeltas between problems and the change in z-scores for labs or the occurrence of prescriptions

In [1]:
from datetime import datetime
from progressbar import ProgressBar
import pandas as pd
import time

In [2]:
from neo4j import GraphDatabase
driver=GraphDatabase.driver(uri="bolt://localhost:7687", auth=('neo4j','NikeshIsCool'))
session=driver.session()

Entities of interest:
- Caregivers
- Patients
- Problem
- Prescriptions -> Concept
- Labevents -> D_Labitems -> Concept
- Diagnoses_Icd -> D_Icd_Diagnoses -> Concept (timedelta of limited utility, since ICD codes pertain to entire admission)
- Procedures_Icd -> D_Icd_Procedures -> Concept (timedelta of limited utility, since ICD codes pertain to entire admission)

Relationships to create between entities of interest:
![virtualized relationships](images/Virtual_relationship_schema.png)

We'll start with these relationships to improve performance on our current use cases:  
(:Problem) - [:INSTANCE_OF] -> (:Concept) - [:OCCURS_WITH {odds_ratio: __, source: 'MIMIC-III v1.4', updated: timestamp}] - (:Concept) <- [:INSTANCE_OF] - (:Prescriptions)  
(:Problem) - [:INSTANCE_OF] -> (:Concept) - [:OCCURS_WITH {odds_ratio: __, z_score: __, source: 'MIMIC-III v1.4', updated: timestamp}] - (:Concept) <- [:INSTANCE_OF] - (:Labevents)  
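
The `odds_ratio` property in this schema is not computed later in the notebook, which stores a `co_occurrance_probability` instead. As a rough sketch of how an odds ratio could be derived from admission counts, assuming a standard 2x2 contingency table (the function name and the example counts are hypothetical, not taken from MIMIC-III):

```python
def odds_ratio(both, problem_only, rx_only, neither):
    """Odds ratio from a 2x2 contingency table of admission counts.

    both         -- admissions with the problem and the prescription
    problem_only -- admissions with the problem but not the prescription
    rx_only      -- admissions with the prescription but not the problem
    neither      -- admissions with neither
    """
    if problem_only == 0 or rx_only == 0:
        # The ratio is undefined when an off-diagonal cell is empty.
        return float("nan")
    return (both * neither) / (problem_only * rx_only)


# Hypothetical counts, for illustration only
example_or = odds_ratio(both=30, problem_only=10, rx_only=20, neither=40)
print(example_or)
```

A value above 1 would suggest the pair occurs together more often than expected if the two were independent.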

## Probability of a prescription-problem pair occurring together vs separately

In [89]:
# Get the probability of each Problem in the general population
query = '''
MATCH (ptTotal:Admissions)
WITH count(ptTotal) AS ptTotal
MATCH (ad:Admissions)<-[:HAD]-(Pt:Patients)-[:HAD_PROBLEM]-(b:Problem)
WITH b.cui AS Problem_CUI, count(distinct(ad)) AS probTotal, ptTotal, count(distinct(Pt)) AS Pt
WITH Problem_CUI, toFloat(probTotal)/ptTotal AS problem_gen_pop_probability, Pt
WHERE Pt > 20
RETURN Problem_CUI, problem_gen_pop_probability
ORDER BY problem_gen_pop_probability DESC
'''
data = session.run(query)
problem_gen_pop_probability = pd.DataFrame([dict(record) for record in data])
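
The pattern of running a Cypher query and collecting the records into a DataFrame repeats throughout this notebook. A small helper could wrap it. `run_to_df` is a hypothetical name, and the sketch only assumes that `session.run` returns an iterable of records that `dict()` can convert, as the cells here already rely on. A stand-in session is used so the example runs without a database.

```python
import pandas as pd


def run_to_df(session, query, **params):
    """Run a Cypher query and return the records as a DataFrame."""
    result = session.run(query, **params)
    return pd.DataFrame([dict(record) for record in result])


class FakeSession:
    """Stand-in for a neo4j session, used only to make this sketch runnable."""

    def run(self, query, **params):
        return [
            {"Problem_CUI": "C0022661", "problem_gen_pop_probability": 0.009495},
            {"Problem_CUI": "C0020517", "problem_gen_pop_probability": 0.008427},
        ]


df = run_to_df(FakeSession(), "RETURN 1")
print(df)
```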

In [90]:
# Get the probability of each prescription in the general population
query = '''
MATCH (ptTotal:Admissions)
WITH count(ptTotal) AS ptTotal
MATCH (ad:Admissions)<-[:HAD]-(Pt:Patients)-[:HAD]-(rx:Prescriptions)-[:INSTANCE_OF]->(b:Concept)
WITH b.cui AS Rx_CUI, count(distinct(ad)) AS RxTotal, ptTotal, count(distinct(Pt)) AS Pt
WHERE Pt > 20
RETURN Rx_CUI, toFloat(RxTotal)/ptTotal AS RxProbability
ORDER BY RxProbability DESC
'''
data = session.run(query)
Rx_gen_pop_probability = pd.DataFrame([dict(record) for record in data])

In [91]:
# Get the probability of a specific pair of prescription and problem
query = '''
MATCH (ptTotal:Admissions)
WITH count(ptTotal) AS ptTotal
MATCH (ad:Admissions)<-[:HAD]-(Pt:Patients), (p:Problem)<-[:HAD_PROBLEM]-(Pt)-[:HAD]-(rx:Prescriptions)-[:INSTANCE_OF]->(c:Concept)
WITH p.cui AS Problem_CUI, c.cui AS Rx_CUI, count(distinct(ad)) AS RxProbTotal, ptTotal, count(distinct(Pt)) AS Pts
WHERE Pts > 20
RETURN Problem_CUI, Rx_CUI, toFloat(RxProbTotal)/ptTotal AS Rx_Problem_Probability
ORDER BY Rx_Problem_Probability DESC
'''
data = session.run(query)
Rx_Problem_Probability = pd.DataFrame([dict(record) for record in data])

In [112]:
print(len(problem_gen_pop_probability))
print(len(Rx_gen_pop_probability))
print(len(Rx_Problem_Probability))

465
946
11073


In [95]:
Rx_Problem_Probability_merged = pd.merge(Rx_Problem_Probability, problem_gen_pop_probability, on=['Problem_CUI'])
Rx_Problem_Probability_merged = pd.merge(Rx_Problem_Probability_merged, Rx_gen_pop_probability, on=['Rx_CUI'])
Rx_Problem_Probability_merged['co_occurrance_probability'] = Rx_Problem_Probability_merged.Rx_Problem_Probability / (Rx_Problem_Probability_merged.problem_gen_pop_probability + Rx_Problem_Probability_merged.RxProbability)
Rx_Problem_Probability_merged

| | Problem_CUI | Rx_CUI | Rx_Problem_Probability | problem_gen_pop_probability | RxProbability | co_occurrance_probability |
|---|---|---|---|---|---|---|
| 0 | C0022661 | C0977439 | 0.008665 | 0.009495 | 0.369981 | 0.022833 |
| 1 | C0020517 | C0977439 | 0.007698 | 0.008427 | 0.369981 | 0.020343 |
| 2 | C0011860 | C0977439 | 0.007291 | 0.007902 | 0.369981 | 0.019295 |
| 3 | C2830004 | C0977439 | 0.006664 | 0.007138 | 0.369981 | 0.017670 |
| 4 | C0039239 | C0977439 | 0.006613 | 0.007003 | 0.369981 | 0.017541 |
| ... | ... | ... | ... | ... | ... | ... |
| 11068 | C0085762 | C0980635 | 0.001882 | 0.003595 | 0.031708 | 0.053314 |
| 11069 | C0085762 | C0688559 | 0.001713 | 0.003595 | 0.014107 | 0.096743 |
| 11070 | C0236663 | C0688559 | 0.001543 | 0.001797 | 0.014107 | 0.097015 |
| 11071 | C1321878 | C0354080 | 0.000899 | 0.002120 | 0.028639 | 0.029217 |
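
As a check on the formula, the score in row 0 can be reproduced from the rounded values displayed in the output. The score divides the joint probability by the sum of the two marginal probabilities, so it reads as a relative co-occurrence score rather than a probability in the strict sense. The function name below is hypothetical.

```python
def co_occurrence_score(p_joint, p_problem, p_other):
    """Joint probability divided by the sum of the two marginal probabilities."""
    return p_joint / (p_problem + p_other)


# Row 0 of the output: Problem_CUI C0022661, Rx_CUI C0977439
score = co_occurrence_score(0.008665, 0.009495, 0.369981)
print(round(score, 4))  # 0.0228
```

Because the joint probability cannot exceed either marginal, a score computed this way stays at or below 0.5.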


In [96]:
Rx_Problem_Probability_merged.sort_values(by='co_occurrance_probability', ascending=False, inplace=True)
Rx_Problem_Probability_merged.head(20)

| | Problem_CUI | Rx_CUI | Rx_Problem_Probability | problem_gen_pop_probability | RxProbability | co_occurrance_probability |
|---|---|---|---|---|---|---|
| 11020 | C0021400 | C0875805 | 0.003306 | 0.006409 | 0.012361 | 0.176152 |
| 11060 | C0085605 | C1584819 | 0.002052 | 0.003442 | 0.014582 | 0.113829 |
| 9713 | C0022661 | C0975120 | 0.002577 | 0.009495 | 0.014023 | 0.109589 |
| 9385 | C0022661 | C1967412 | 0.002899 | 0.009495 | 0.017634 | 0.106875 |
| 9538 | C0022661 | C1951501 | 0.002798 | 0.009495 | 0.017583 | 0.103319 |
| 11070 | C0236663 | C0688559 | 0.001543 | 0.001797 | 0.014107 | 0.097015 |
| 11069 | C0085762 | C0688559 | 0.001713 | 0.003595 | 0.014107 | 0.096743 |
| 11049 | C0036572 | C1739168 | 0.002425 | 0.005087 | 0.022179 | 0.08893 |
| 11045 | C0036572 | C0875827 | 0.002577 | 0.005087 | 0.026197 | 0.082385 |
| 11064 | C0085605 | C0690704 | 0.000882 | 0.003442 | 0.007545 | 0.080247 |


In [98]:
# Write out to CSV
timestamp = datetime.now().strftime("%Y-%m-%d_%H-%M-%S")
filename = 'Rx_Problem_co_occurrance_probability_'+timestamp+'.csv'
Rx_Problem_Probability_merged.loc[:,['Problem_CUI', 'Rx_CUI', 'co_occurrance_probability']].to_csv(filename, index=False)

Move the CSV into the Neo4j database's import folder so that `LOAD CSV` can read it through a `file:///` URL.
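
This step can also be scripted. A minimal sketch using `shutil.move`, assuming the notebook has write access to the import directory. The directory path depends on the Neo4j installation, so temporary directories and a hypothetical filename stand in for both here:

```python
import shutil
import tempfile
from pathlib import Path


def move_to_import(csv_path, import_dir):
    """Move a CSV file into the given directory and return its new path."""
    destination = Path(import_dir) / Path(csv_path).name
    shutil.move(str(csv_path), str(destination))
    return destination


# Temporary directories stand in for the working and import folders.
work_dir = Path(tempfile.mkdtemp())
import_dir = Path(tempfile.mkdtemp())
csv_path = work_dir / "example_scores.csv"  # hypothetical filename
csv_path.write_text("Problem_CUI,Rx_CUI,co_occurrance_probability\n")

moved = move_to_import(csv_path, import_dir)
print(moved.exists(), csv_path.exists())  # True False
```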

In [109]:
# Import the co-occurrence probabilities into the database
timestamp = datetime.now().strftime("%Y-%m-%dT%H:%M:%S")

command = '''
USING PERIODIC COMMIT 100000 LOAD CSV WITH HEADERS FROM "file:///{filename}" AS COLUMN
MATCH (prob:Concept)
WHERE prob.cui = COLUMN.Problem_CUI AND prob.cui_pref_term IS NOT NULL
MATCH (rx:Concept)
WHERE rx.cui = COLUMN.Rx_CUI AND rx.cui_pref_term IS NOT NULL
CREATE (prob)<-[r:OCCURS_WITH {{co_occurrance_probability:toFloat(COLUMN.co_occurrance_probability), source:'MIMIC-III v1.4', updated:'{timestamp}'}}]-(rx)
'''.format(timestamp=timestamp, filename=filename)

session.run(command)

<neo4j.work.result.Result at 0x7f6e96d6cd30>

## Probability of an abnormal lab-problem pair occurring together vs separately

In [42]:
# Get the probability of each Problem in the general population
query = '''
MATCH (ptTotal:Admissions)
WITH count(ptTotal) AS ptTotal
MATCH (ad:Admissions)<-[:HAD]-(Pt:Patients)-[:HAD_PROBLEM]-(b:Problem)
WITH b.cui AS Problem_CUI, count(distinct(ad)) AS probTotal, ptTotal, count(distinct(Pt)) AS Pt
WITH Problem_CUI, toFloat(probTotal)/ptTotal AS problem_gen_pop_probability, Pt
WHERE Pt > 20
RETURN Problem_CUI, problem_gen_pop_probability
ORDER BY problem_gen_pop_probability DESC
'''
data = session.run(query)
problem_gen_pop_probability = pd.DataFrame([dict(record) for record in data])

In [31]:
# Get the probability of each abnormal lab in the general population
query = '''
MATCH (ptTotal:Admissions)
WITH count(ptTotal) AS ptTotal
MATCH (ad:Admissions)<-[:HAD]-(Pt:Patients)-[:HAD]-(lab:Labevents)-[:INSTANCE_OF]->(b:Concept)
WHERE lab.flag IS NOT NULL
WITH b.cui AS Lab_CUI, count(distinct(ad)) AS LabTotal, ptTotal, count(distinct(Pt)) AS Pt
WHERE Pt > 20
RETURN Lab_CUI, toFloat(LabTotal)/ptTotal AS LabAbnormalProbability
ORDER BY LabAbnormalProbability DESC
'''
data = session.run(query)
Lab_gen_pop_probability = pd.DataFrame([dict(record) for record in data])

In [34]:
# Get the probability of a specific pair of abnormal lab and problem
query = '''
MATCH (ptTotal:Admissions)
WITH count(ptTotal) AS ptTotal
MATCH (ad:Admissions)<-[:HAD]-(Pt:Patients), (p:Problem)<-[:HAD_PROBLEM]-(Pt)-[:HAD]-(lab:Labevents)-[:INSTANCE_OF]->(c:Concept)
WHERE lab.flag IS NOT NULL
WITH p.cui AS Problem_CUI, c.cui AS Lab_CUI, count(distinct(ad)) AS LabProbTotal, ptTotal, count(distinct(Pt)) AS Pts
WHERE Pts > 20
RETURN Problem_CUI, Lab_CUI, toFloat(LabProbTotal)/ptTotal AS Lab_Problem_Probability
ORDER BY Lab_Problem_Probability DESC
'''
data = session.run(query)
Lab_Problem_Probability = pd.DataFrame([dict(record) for record in data])

In [35]:
problem_gen_pop_probability

| | Problem_CUI | problem_gen_pop_probability |
|---|---|---|
| 0 | C0022661 | 0.009495 |
| 1 | C0020517 | 0.008427 |
| 2 | C0011860 | 0.007902 |
| 3 | C2830004 | 0.007138 |
| 4 | C0039239 | 0.007003 |
| ... | ... | ... |
| 314 | C2346457 | 0.000899 |
| 315 | C2936910 | 0.000865 |
| 316 | C3160739 | 0.000865 |
| 317 | C0085616 | 0.000848 |


In [36]:
Lab_gen_pop_probability

| | Lab_CUI | LabAbnormalProbability |
|---|---|---|
| 0 | C0362923 | 0.921595 |
| 1 | C0366777 | 0.901265 |
| 2 | C0362910 | 0.877916 |
| 3 | C0362934 | 0.812636 |
| 4 | C0362978 | 0.755358 |
| ... | ... | ... |
| 174 | C1114256 | 0.001017 |
| 175 | C0801528 | 0.000763 |
| 176 | C0943517 | 0.000712 |
| 177 | C0365157 | 0.000560 |


In [37]:
Lab_Problem_Probability

| | Problem_CUI | Lab_CUI | Lab_Problem_Probability |
|---|---|---|---|
| 0 | C0022661 | C0362910 | 0.009495 |
| 1 | C0022661 | C0366777 | 0.009495 |
| 2 | C0022661 | C0362923 | 0.009495 |
| 3 | C0022661 | C0364096 | 0.009241 |
| 4 | C0022661 | C0364133 | 0.009224 |
| ... | ... | ... | ... |
| 13768 | C0575090 | C0368013 | 0.000475 |
| 13769 | C0575090 | C0363885 | 0.000458 |
| 13770 | C3263723 | C0364290 | 0.000458 |
| 13771 | C0267841 | C0368036 | 0.000458 |


In [38]:
Lab_Problem_Probability_merged = pd.merge(Lab_Problem_Probability, problem_gen_pop_probability, on=['Problem_CUI'])
Lab_Problem_Probability_merged = pd.merge(Lab_Problem_Probability_merged, Lab_gen_pop_probability, on=['Lab_CUI'])
Lab_Problem_Probability_merged['co_occurrance_probability'] = Lab_Problem_Probability_merged.Lab_Problem_Probability / (Lab_Problem_Probability_merged.problem_gen_pop_probability + Lab_Problem_Probability_merged.LabAbnormalProbability)
Lab_Problem_Probability_merged

| | Problem_CUI | Lab_CUI | Lab_Problem_Probability | problem_gen_pop_probability | LabAbnormalProbability | co_occurrance_probability |
|---|---|---|---|---|---|---|
| 0 | C0022661 | C0362910 | 0.009495 | 0.009495 | 0.877916 | 0.010700 |
| 1 | C0020517 | C0362910 | 0.008308 | 0.008427 | 0.877916 | 0.009374 |
| 2 | C0011860 | C0362910 | 0.007868 | 0.007902 | 0.877916 | 0.008882 |
| 3 | C2830004 | C0362910 | 0.007105 | 0.007138 | 0.877916 | 0.008027 |
| 4 | C0039239 | C0362910 | 0.006884 | 0.007003 | 0.877916 | 0.007779 |
| ... | ... | ... | ... | ... | ... | ... |
| 11910 | C0085605 | C1114285 | 0.001814 | 0.003442 | 0.026095 | 0.061424 |
| 11911 | C0751781 | C1114285 | 0.001119 | 0.003391 | 0.026095 | 0.037953 |
| 11912 | C0003962 | C1114285 | 0.001390 | 0.002560 | 0.026095 | 0.048521 |
| 11913 | C0014867 | C1114285 | 0.000966 | 0.001611 | 0.026095 | 0.034884 |
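
The prescription and lab sections compute the score with the same two merges. A hedged sketch of how that step might be factored into one function follows. `add_co_occurrence_score` and its parameters are hypothetical names, and the sample rows reuse values displayed in the outputs above:

```python
import pandas as pd


def add_co_occurrence_score(pairs, problems, others, other_key, joint_col, other_col):
    """Merge marginal probabilities onto the pair table and add the score."""
    merged = pd.merge(pairs, problems, on=["Problem_CUI"])
    merged = pd.merge(merged, others, on=[other_key])
    merged["co_occurrance_probability"] = merged[joint_col] / (
        merged["problem_gen_pop_probability"] + merged[other_col]
    )
    return merged


pairs = pd.DataFrame({
    "Problem_CUI": ["C0022661", "C0020517"],
    "Lab_CUI": ["C0362910", "C0362910"],
    "Lab_Problem_Probability": [0.009495, 0.008308],
})
problems = pd.DataFrame({
    "Problem_CUI": ["C0022661", "C0020517"],
    "problem_gen_pop_probability": [0.009495, 0.008427],
})
labs = pd.DataFrame({
    "Lab_CUI": ["C0362910"],
    "LabAbnormalProbability": [0.877916],
})

merged = add_co_occurrence_score(
    pairs, problems, labs,
    other_key="Lab_CUI",
    joint_col="Lab_Problem_Probability",
    other_col="LabAbnormalProbability",
)
print(merged[["Problem_CUI", "Lab_CUI", "co_occurrance_probability"]])
```

The same call with `other_key="Rx_CUI"`, `joint_col="Rx_Problem_Probability"`, and `other_col="RxProbability"` would cover the prescription case.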


In [39]:
Lab_Problem_Probability_merged.sort_values(by='co_occurrance_probability', ascending=False, inplace=True)
Lab_Problem_Probability_merged.head(20)

| | Problem_CUI | Lab_CUI | Lab_Problem_Probability | problem_gen_pop_probability | LabAbnormalProbability | co_occurrance_probability |
|---|---|---|---|---|---|---|
| 11905 | C0031039 | C0942431 | 0.001814 | 0.002628 | 0.021619 | 0.074825 |
| 11900 | C0032227 | C1544494 | 0.002866 | 0.005901 | 0.03303 | 0.073606 |
| 11901 | C0032227 | C1114284 | 0.002645 | 0.005901 | 0.031623 | 0.070493 |
| 11829 | C0032227 | C0942424 | 0.003442 | 0.005901 | 0.044849 | 0.067825 |
| 11809 | C0032227 | C0942443 | 0.003459 | 0.005901 | 0.046409 | 0.066126 |
| 11822 | C0032227 | C0942432 | 0.003459 | 0.005901 | 0.048189 | 0.06395 |
| 11815 | C0032227 | C0942477 | 0.00334 | 0.005901 | 0.046493 | 0.063754 |
| 11902 | C0032227 | C1315831 | 0.002628 | 0.005901 | 0.035743 | 0.063111 |
| 11910 | C0085605 | C1114285 | 0.001814 | 0.003442 | 0.026095 | 0.061424 |
| 11656 | C0085605 | C1544491 | 0.001882 | 0.003442 | 0.0283 | 0.059295 |


In [40]:
# Write out to CSV
timestamp = datetime.now().strftime("%Y-%m-%d_%H-%M-%S")
filename = 'LabAbnormal_Problem_co_occurrance_probability_'+timestamp+'.csv'
Lab_Problem_Probability_merged.loc[:,['Problem_CUI', 'Lab_CUI', 'co_occurrance_probability']].to_csv(filename, index=False)

Move the CSV into the Neo4j database's import folder so that `LOAD CSV` can read it through a `file:///` URL.

In [41]:
# Import the co-occurrence probabilities into the database
timestamp = datetime.now().strftime("%Y-%m-%dT%H:%M:%S")

command = '''
USING PERIODIC COMMIT 100000 LOAD CSV WITH HEADERS FROM "file:///{filename}" AS COLUMN
MATCH (prob:Concept)
WHERE prob.cui = COLUMN.Problem_CUI AND prob.cui_pref_term IS NOT NULL
MATCH (lab:Concept)
WHERE lab.cui = COLUMN.Lab_CUI AND lab.cui_pref_term IS NOT NULL
CREATE (prob)<-[r:OCCURS_WITH {{co_occurrance_probability:toFloat(COLUMN.co_occurrance_probability), source:'MIMIC-III v1.4', updated:'{timestamp}'}}]-(lab)
'''.format(timestamp=timestamp, filename=filename)

session.run(command)

<neo4j.work.result.Result at 0x7f570f520f40>

## Queries

In [60]:
def LikelyOrders(cui_prob_list):
    # Pass the CUI list as a query parameter instead of formatting it into the Cypher text
    query = '''
    MATCH (ord:Concept)-[r:OCCURS_WITH]->(c:Concept)
    WHERE c.cui IN $cui_prob_list
    WITH round(r.co_occurrance_probability, 3)*1000 AS Score, ord, r
    WHERE Score > 20
    RETURN ord.term AS `Order`, ord.description AS AlternateDescription, Score
    ORDER BY r.co_occurrance_probability DESC
    '''
    data = session.run(query, cui_prob_list=cui_prob_list)
    # Use a distinct local name so the DataFrame does not shadow the function
    likely_orders = pd.DataFrame([dict(record) for record in data])

    # Assign prescriptions (no alternate description) to a dataframe
    orders_likely_rx = likely_orders[likely_orders.AlternateDescription.isnull()]
    orders_likely_rx = orders_likely_rx.loc[:,['Order', 'Score']]

    # Assign labs likely to be abnormal (alternate description present) to a dataframe
    orders_likely_lab = likely_orders[likely_orders.AlternateDescription.notnull()]
    orders_likely_lab = orders_likely_lab.loc[:,['AlternateDescription', 'Score']]
    orders_likely_lab.columns = ['Order', 'Score']

    return orders_likely_rx, orders_likely_lab
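
The `Score` in this query is the stored co-occurrence value rounded to three decimals and multiplied by 1000, and only relationships scoring above 20 are returned. A sketch of the same arithmetic in Python, useful for seeing which raw values pass the cutoff. The function name is hypothetical, and Python's `round` may differ from Cypher's rounding exactly at ties, so treat this as an approximation:

```python
def order_score(co_occurrence, cutoff=20):
    """Return the scaled score, or None when it does not exceed the cutoff."""
    score = round(co_occurrence, 3) * 1000
    return score if score > cutoff else None


# 0.109589 is one of the prescription scores shown earlier for C0022661
print(order_score(0.109589))
# A raw value of 0.015 scales to about 15 and falls below the cutoff
print(order_score(0.015))
```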

In [68]:
start_time = time.time()

cui_prob_list = ['C0022661']
orders_likely_rx, orders_likely_lab = LikelyOrders(cui_prob_list)

print("Total runtime:", time.time() - start_time, "seconds")

Total runtime: 0.009356498718261719 seconds


In [69]:
orders_likely_rx

| | Order | Score |
|---|---|---|
| 0 | calcitriol 0.00025 MG Oral Capsule | 110.0 |
| 1 | sevelamer carbonate 800 MG Oral Tablet [Renvela] | 107.0 |
| 2 | sodium polystyrene sulfonate 250 MG/ML Oral Su... | 103.0 |
| 3 | 1 ML epoetin alfa 4000 UNT/ML Injection [Procrit] | 79.0 |
| 4 | 150 ML Glucose 50 MG/ML Injection | 76.0 |
| ... | ... | ... |
| 119 | 200 ML vancomycin 5 MG/ML Injection | 21.0 |
| 120 | acetaminophen 32 MG/ML Oral Solution | 21.0 |
| 121 | 1000 ML glucose 50 MG/ML / sodium chloride 4.5... | 21.0 |
| 125 | 100 ML Glucose 50 MG/ML Injection | 21.0 |


In [66]:
orders_likely_lab

| | Order | Score |
|---|---|---|
| 11 | Macrophage in Ascites | 58.0 |
| 12 | Lymphocytes in Other Body Fluid | 58.0 |
| 13 | Polys in Other Body Fluid | 54.0 |
| 14 | Lymphocytes in Ascites | 53.0 |
| 16 | RBC, Ascites in Ascites | 51.0 |
| 17 | WBC, Ascites in Ascites | 51.0 |
| 18 | Monocytes in Ascites | 50.0 |
| 19 | Polys in Ascites | 49.0 |
| 20 | Protein/Creatinine Ratio in Urine | 48.0 |
| 22 | pH in Urine | 44.0 |
