# Virtualized Patient Population

To-Do:
- Incorporate ICD diagnoses into problems
- Incorporate ICD procedures into treatments
- Create OCCURS_WITH relationships between entities of interest that have z-scores or odds ratios
- Figure out how to add timedeltas between problems and the change in z-scores for labs or the occurrence of prescriptions

In [1]:
from datetime import datetime
from progressbar import ProgressBar
import pandas as pd
import time

In [2]:
from neo4j import GraphDatabase
driver=GraphDatabase.driver(uri="bolt://localhost:7687", auth=('neo4j','NikeshIsCool'))
session=driver.session()

Entities of interest:
- Problem
- Prescriptions -> Concept
- Labevents -> D_Labitems -> Concept
- Diagnoses_Icd -> D_Icd_Diagnoses -> Concept (timedelta of limited utility, since ICD codes pertain to entire admission)
- Procedures_Icd -> D_Icd_Procedures -> Concept (timedelta of limited utility, since ICD codes pertain to entire admission)

Relationships to create between entities of interest:
![virtualized relationships](images/Virtual_relationship_schema.png)

We'll start with these relationships to improve performance on our current use cases:  
(:Problem) - [:INSTANCE_OF] -> (:Concept) - [:OCCURS_WITH {co_occurrance_probability: __, source: 'MIMIC-III v1.4', updated: timestamp}] - (:Concept) <- [:INSTANCE_OF] - (:Prescriptions)  
(:Problem) - [:INSTANCE_OF] -> (:Concept) - [:OCCURS_WITH {co_occurrance_probability: __, z_score: __, source: 'MIMIC-III v1.4', updated: timestamp}] - (:Concept) <- [:INSTANCE_OF] - (:Labevents)  

The essential equation used to determine co-occurrence probability for our purposes is:  
(Probability of A and B co-occurring during an admission) / (Probability of A + Probability of B)  

For labs, we calculate the probability that the lab will be flagged as abnormal, not the probability that it will be ordered. For prescriptions and procedures, we calculate the probability that they will be ordered.
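
As a minimal sketch of the ratio above, the score for a single pair can be computed directly from the three probabilities. The helper name `co_occurrence_score` is hypothetical and not part of this notebook; the inputs are the rounded values reported further down for problem C0022661 and prescription C0977439.

```python
def co_occurrence_score(p_joint, p_a, p_b):
    """Return P(A and B) / (P(A) + P(B)); hypothetical helper for illustration."""
    denominator = p_a + p_b
    if denominator == 0:
        raise ValueError("P(A) + P(B) must be positive")
    return p_joint / denominator

# Rounded probabilities for problem C0022661 and prescription C0977439
score = co_occurrence_score(p_joint=0.008665, p_a=0.009495, p_b=0.369981)
print(round(score, 4))
```

Since P(A and B) cannot exceed either P(A) or P(B), this ratio tops out at 0.5 for true probabilities, so it is probably best read as a ranking score rather than a probability in the strict sense.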

## Probability of a prescription-problem pair occurring together vs separately

In [89]:
# Get the probability of each Problem in the general population
query = '''
MATCH (ptTotal:Admissions)
WITH count(ptTotal) AS ptTotal
MATCH (ad:Admissions)<-[:HAD]-(Pt:Patients)-[:HAD_PROBLEM]-(b:Problem)
WITH b.cui AS Problem_CUI, count(distinct(ad)) AS probTotal, ptTotal, count(distinct(Pt)) AS Pt
WITH Problem_CUI, toFloat(probTotal)/ptTotal AS problem_gen_pop_probability, Pt
WHERE Pt > 20
RETURN Problem_CUI, problem_gen_pop_probability
ORDER BY problem_gen_pop_probability DESC
'''
data = session.run(query)
problem_gen_pop_probability = pd.DataFrame([dict(record) for record in data])

In [90]:
# Get the probability of each prescription in the general population
query = '''
MATCH (ptTotal:Admissions)
WITH count(ptTotal) AS ptTotal
MATCH (ad:Admissions)<-[:HAD]-(Pt:Patients)-[:HAD]-(rx:Prescriptions)-[:INSTANCE_OF]->(b:Concept)
WITH b.cui AS Rx_CUI, count(distinct(ad)) AS RxTotal, ptTotal, count(distinct(Pt)) AS Pt
WHERE Pt > 20
RETURN Rx_CUI, toFloat(RxTotal)/ptTotal AS RxProbability
ORDER BY RxProbability DESC
'''
data = session.run(query)
Rx_gen_pop_probability = pd.DataFrame([dict(record) for record in data])

In [91]:
# Get the probability of a specific pair of prescription and problem
query = '''
MATCH (ptTotal:Admissions)
WITH count(ptTotal) AS ptTotal
MATCH (ad:Admissions)<-[:HAD]-(Pt:Patients), (p:Problem)<-[:HAD_PROBLEM]-(Pt)-[:HAD]-(rx:Prescriptions)-[:INSTANCE_OF]->(c:Concept)
WITH p.cui AS Problem_CUI, c.cui AS Rx_CUI, count(distinct(ad)) AS RxProbTotal, ptTotal, count(distinct(Pt)) AS Pts
WHERE Pts > 20
RETURN Problem_CUI, Rx_CUI, toFloat(RxProbTotal)/ptTotal AS Rx_Problem_Probability
ORDER BY Rx_Problem_Probability DESC
'''
data = session.run(query)
Rx_Problem_Probability = pd.DataFrame([dict(record) for record in data])

In [112]:
print(len(problem_gen_pop_probability))
print(len(Rx_gen_pop_probability))
print(len(Rx_Problem_Probability))

465
946
11073


In [95]:
Rx_Problem_Probability_merged = pd.merge(Rx_Problem_Probability, problem_gen_pop_probability, on=['Problem_CUI'])
Rx_Problem_Probability_merged = pd.merge(Rx_Problem_Probability_merged, Rx_gen_pop_probability, on=['Rx_CUI'])
Rx_Problem_Probability_merged['co_occurrance_probability'] = Rx_Problem_Probability_merged.Rx_Problem_Probability / (Rx_Problem_Probability_merged.problem_gen_pop_probability + Rx_Problem_Probability_merged.RxProbability)
Rx_Problem_Probability_merged

Unnamed: 0,Problem_CUI,Rx_CUI,Rx_Problem_Probability,problem_gen_pop_probability,RxProbability,co_occurrance_probability
0,C0022661,C0977439,0.008665,0.009495,0.369981,0.022833
1,C0020517,C0977439,0.007698,0.008427,0.369981,0.020343
2,C0011860,C0977439,0.007291,0.007902,0.369981,0.019295
3,C2830004,C0977439,0.006664,0.007138,0.369981,0.017670
4,C0039239,C0977439,0.006613,0.007003,0.369981,0.017541
...,...,...,...,...,...,...
11068,C0085762,C0980635,0.001882,0.003595,0.031708,0.053314
11069,C0085762,C0688559,0.001713,0.003595,0.014107,0.096743
11070,C0236663,C0688559,0.001543,0.001797,0.014107,0.097015
11071,C1321878,C0354080,0.000899,0.002120,0.028639,0.029217
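
`pd.merge` defaults to an inner join, so a pair is dropped without warning when its problem or prescription is missing from the corresponding general-population table (for instance, because it fell below the patient-count threshold in that query). The following is a minimal sketch, on made-up toy data with hypothetical CUIs, of using `how='left'` with `indicator=True` to list such pairs before they disappear.

```python
import pandas as pd

# Toy stand-ins for the query results; the CUIs and values are made up
pairs = pd.DataFrame({
    'Problem_CUI': ['P1', 'P1', 'P2'],
    'Rx_CUI': ['R1', 'R2', 'R1'],
    'Rx_Problem_Probability': [0.010, 0.004, 0.002],
})
problems = pd.DataFrame({
    'Problem_CUI': ['P1'],
    'problem_gen_pop_probability': [0.020],
})

# A left join keeps every pair; the _merge column records whether a match was found
checked = pd.merge(pairs, problems, on='Problem_CUI', how='left', indicator=True)
unmatched = checked[checked['_merge'] == 'left_only']
print(unmatched[['Problem_CUI', 'Rx_CUI']])
```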


In [96]:
Rx_Problem_Probability_merged.sort_values(by='co_occurrance_probability', ascending=False, inplace=True)
Rx_Problem_Probability_merged.head(20)

Unnamed: 0,Problem_CUI,Rx_CUI,Rx_Problem_Probability,problem_gen_pop_probability,RxProbability,co_occurrance_probability
11020,C0021400,C0875805,0.003306,0.006409,0.012361,0.176152
11060,C0085605,C1584819,0.002052,0.003442,0.014582,0.113829
9713,C0022661,C0975120,0.002577,0.009495,0.014023,0.109589
9385,C0022661,C1967412,0.002899,0.009495,0.017634,0.106875
9538,C0022661,C1951501,0.002798,0.009495,0.017583,0.103319
11070,C0236663,C0688559,0.001543,0.001797,0.014107,0.097015
11069,C0085762,C0688559,0.001713,0.003595,0.014107,0.096743
11049,C0036572,C1739168,0.002425,0.005087,0.022179,0.08893
11045,C0036572,C0875827,0.002577,0.005087,0.026197,0.082385
11064,C0085605,C0690704,0.000882,0.003442,0.007545,0.080247


In [98]:
# Write out to CSV
timestamp = datetime.now().strftime("%Y-%m-%d_%H-%M-%S")
filename = 'Rx_Problem_co_occurrance_probability_'+timestamp+'.csv'
Rx_Problem_Probability_merged.loc[:,['Problem_CUI', 'Rx_CUI', 'co_occurrance_probability']].to_csv(filename, index=False)

Move the CSV into the database's Import folder
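
This manual step could be scripted. The sketch below assumes the import folder is reachable from the machine running the notebook; its location depends on the Neo4j installation, so `import_dir` is left as a parameter, and the helper name `copy_to_import_dir` is hypothetical. The demonstration uses a temporary directory rather than a real Neo4j path.

```python
import shutil
import tempfile
from pathlib import Path

def copy_to_import_dir(csv_path, import_dir):
    """Copy a CSV into the given import directory and return the new path."""
    import_dir = Path(import_dir)
    if not import_dir.is_dir():
        raise FileNotFoundError(f"Import directory not found: {import_dir}")
    return Path(shutil.copy(csv_path, import_dir))

# Demonstration with temporary folders standing in for the real locations
with tempfile.TemporaryDirectory() as tmp:
    src = Path(tmp) / 'example.csv'
    src.write_text('Problem_CUI,Rx_CUI,co_occurrance_probability\n')
    target_dir = Path(tmp) / 'import'
    target_dir.mkdir()
    copied = copy_to_import_dir(src, target_dir)
    copied_ok = copied.exists()
    copied_name = copied.name
```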

In [109]:
# Import the co-occurrence probabilities into the database
timestamp = datetime.now().strftime("%Y-%m-%dT%H:%M:%S")

command = '''
USING PERIODIC COMMIT 100000 LOAD CSV WITH HEADERS FROM "file:///{filename}" AS COLUMN
MATCH (prob:Concept)
WHERE prob.cui = COLUMN.Problem_CUI AND prob.cui_pref_term IS NOT NULL
MATCH (rx:Concept)
WHERE rx.cui = COLUMN.Rx_CUI AND rx.cui_pref_term IS NOT NULL
CREATE (prob)<-[r:OCCURS_WITH {{co_occurrance_probability:toFloat(COLUMN.co_occurrance_probability), source:'MIMIC-III v1.4', updated:'{timestamp}'}}]-(rx)
'''.format(timestamp=timestamp, filename=filename)

session.run(command)

<neo4j.work.result.Result at 0x7f6e96d6cd30>
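
Because the import uses `CREATE`, running the cell a second time would add another `OCCURS_WITH` relationship for every pair. One possible alternative, offered as an assumption rather than as this notebook's method, is to `MERGE` on the relationship type alone and `SET` the properties, so that a re-run updates the existing relationship. The helper name `build_import_command` is hypothetical, and the sketch only builds the command text.

```python
from datetime import datetime

def build_import_command(filename, timestamp):
    """Build a LOAD CSV command that updates an existing relationship instead of duplicating it."""
    return '''
USING PERIODIC COMMIT 100000 LOAD CSV WITH HEADERS FROM "file:///{filename}" AS COLUMN
MATCH (prob:Concept)
WHERE prob.cui = COLUMN.Problem_CUI AND prob.cui_pref_term IS NOT NULL
MATCH (rx:Concept)
WHERE rx.cui = COLUMN.Rx_CUI AND rx.cui_pref_term IS NOT NULL
MERGE (prob)<-[r:OCCURS_WITH]-(rx)
SET r.co_occurrance_probability = toFloat(COLUMN.co_occurrance_probability),
    r.source = 'MIMIC-III v1.4',
    r.updated = '{timestamp}'
'''.format(filename=filename, timestamp=timestamp)

# 'example.csv' is a placeholder filename for illustration
command = build_import_command('example.csv', datetime.now().strftime("%Y-%m-%dT%H:%M:%S"))
print(command)
```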

## Probability of an abnormal lab-problem pair occurring together vs separately

In [3]:
# Get the probability of each Problem in the general population
query = '''
MATCH (ptTotal:Admissions)
WITH count(ptTotal) AS ptTotal
MATCH (ad:Admissions)<-[:HAD]-(Pt:Patients)-[:HAD_PROBLEM]-(b:Problem)
WITH b.cui AS Problem_CUI, count(distinct(ad)) AS probTotal, ptTotal, count(distinct(Pt)) AS Pt
WITH Problem_CUI, toFloat(probTotal)/ptTotal AS problem_gen_pop_probability, Pt
WHERE Pt > 20
RETURN Problem_CUI, problem_gen_pop_probability
ORDER BY problem_gen_pop_probability DESC
'''
data = session.run(query)
problem_gen_pop_probability = pd.DataFrame([dict(record) for record in data])

In [4]:
# Get the probability of each abnormal lab in the general population
query = '''
MATCH (ptTotal:Admissions)
WITH count(ptTotal) AS ptTotal
MATCH (ad:Admissions)<-[:HAD]-(Pt:Patients)-[:HAD]-(lab:Labevents)-[:INSTANCE_OF]->(b:Concept)
WHERE lab.flag IS NOT NULL
WITH b.cui AS Lab_CUI, count(distinct(ad)) AS LabTotal, ptTotal, count(distinct(Pt)) AS Pt
WHERE Pt > 20
RETURN Lab_CUI, toFloat(LabTotal)/ptTotal AS LabAbnormalProbability
ORDER BY LabAbnormalProbability DESC
'''
data = session.run(query)
Lab_gen_pop_probability = pd.DataFrame([dict(record) for record in data])

In [5]:
# Get the probability of a specific pair of abnormal lab and problem
query = '''
MATCH (ptTotal:Admissions)
WITH count(ptTotal) AS ptTotal
MATCH (ad:Admissions)<-[:HAD]-(Pt:Patients), (p:Problem)<-[:HAD_PROBLEM]-(Pt)-[:HAD]-(lab:Labevents)-[:INSTANCE_OF]->(c:Concept)
WHERE lab.flag IS NOT NULL
WITH p.cui AS Problem_CUI, c.cui AS Lab_CUI, count(distinct(ad)) AS LabProbTotal, ptTotal, count(distinct(Pt)) AS Pts
WHERE Pts > 20
RETURN Problem_CUI, Lab_CUI, toFloat(LabProbTotal)/ptTotal AS Lab_Problem_Probability
ORDER BY Lab_Problem_Probability DESC
'''
data = session.run(query)
Lab_Problem_Probability = pd.DataFrame([dict(record) for record in data])

In [21]:
problem_gen_pop_probability

Unnamed: 0,Problem_CUI,problem_gen_pop_probability
0,C0022661,0.009495
1,C0020517,0.008427
2,C0011860,0.007902
3,C2830004,0.007138
4,C0039239,0.007003
...,...,...
460,C0085669,0.000577
461,C0017086,0.000577
462,C0155626,0.000577
463,C0575090,0.000577


In [22]:
Lab_gen_pop_probability

Unnamed: 0,Lab_CUI,LabAbnormalProbability
0,C0362923,0.921595
1,C0366777,0.901265
2,C0362910,0.877916
3,C0362934,0.812636
4,C0362978,0.755358
...,...,...
174,C1114256,0.001017
175,C0801528,0.000763
176,C0943517,0.000712
177,C0365157,0.000560


In [23]:
Lab_Problem_Probability

Unnamed: 0,Problem_CUI,Lab_CUI,Lab_Problem_Probability
0,C0022661,C0362910,0.009495
1,C0022661,C0366777,0.009495
2,C0022661,C0362923,0.009495
3,C0022661,C0364096,0.009241
4,C0022661,C0364133,0.009224
...,...,...,...
13768,C0575090,C0368013,0.000475
13769,C0575090,C0363885,0.000458
13770,C3263723,C0364290,0.000458
13771,C0267841,C0368036,0.000458


In [25]:
# Calculate the co-occurrence probability
Lab_Problem_Probability_merged = pd.merge(Lab_Problem_Probability, problem_gen_pop_probability, on=['Problem_CUI'])
Lab_Problem_Probability_merged = pd.merge(Lab_Problem_Probability_merged, Lab_gen_pop_probability, on=['Lab_CUI'])
Lab_Problem_Probability_merged['co_occurrance_probability'] = Lab_Problem_Probability_merged.Lab_Problem_Probability / (Lab_Problem_Probability_merged.problem_gen_pop_probability + Lab_Problem_Probability_merged.LabAbnormalProbability)

# Normalize the co-occurrence probability using the min-max method
Lab_Problem_Probability_merged['normalized_co_occurrance_probability'] = (Lab_Problem_Probability_merged.co_occurrance_probability-Lab_Problem_Probability_merged.co_occurrance_probability.min())/(Lab_Problem_Probability_merged.co_occurrance_probability.max()-Lab_Problem_Probability_merged.co_occurrance_probability.min())

# Sort the dataframe and present it for inspection
Lab_Problem_Probability_merged.sort_values(by='normalized_co_occurrance_probability', ascending=False, inplace=True)
Lab_Problem_Probability_merged

Unnamed: 0,Problem_CUI,Lab_CUI,Lab_Problem_Probability,problem_gen_pop_probability,LabAbnormalProbability,co_occurrance_probability,normalized_co_occurrance_probability
13763,C0031039,C0942431,0.001814,0.002628,0.021619,0.074825,1.000000
13758,C0032227,C1544494,0.002866,0.005901,0.033030,0.073606,0.983581
13759,C0032227,C1114284,0.002645,0.005901,0.031623,0.070493,0.941638
13684,C0032227,C0942424,0.003442,0.005901,0.044849,0.067825,0.905704
13664,C0032227,C0942443,0.003459,0.005901,0.046409,0.066126,0.882824
...,...,...,...,...,...,...,...
921,C0683278,C0366777,0.000560,0.000560,0.901265,0.000620,0.000432
460,C1261287,C0362910,0.000543,0.000577,0.877916,0.000618,0.000394
1383,C0683278,C0362923,0.000560,0.000560,0.921595,0.000607,0.000248
922,C1261287,C0366777,0.000543,0.000577,0.901265,0.000602,0.000179
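
The min-max step used above can be illustrated on its own. This is a minimal sketch on made-up scores; the helper name `min_max_normalize` is hypothetical.

```python
import pandas as pd

def min_max_normalize(values):
    """Rescale a Series linearly so its minimum maps to 0 and its maximum to 1."""
    span = values.max() - values.min()
    if span == 0:
        raise ValueError("All values are identical; min-max scaling is undefined")
    return (values - values.min()) / span

scores = pd.Series([0.02, 0.05, 0.08])  # made-up scores
normalized = min_max_normalize(scores)
print(normalized.tolist())
```

Note that normalized values depend on the minimum and maximum of the particular result set, so they are not directly comparable with the un-normalized scores computed for prescriptions or procedures.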


In [11]:
# Write out to CSV
timestamp = datetime.now().strftime("%Y-%m-%d_%H-%M-%S")
filename = 'LabAbnormal_Problem_co_occurrance_probability_'+timestamp+'.csv'
Lab_Problem_Probability_merged.loc[:,['Problem_CUI', 'Lab_CUI', 'normalized_co_occurrance_probability']].to_csv(filename, index=False)

Move the CSV into the database's Import folder

In [12]:
# Import the co-occurrence probabilities into the database
timestamp = datetime.now().strftime("%Y-%m-%dT%H:%M:%S")

command = '''
USING PERIODIC COMMIT 100000 LOAD CSV WITH HEADERS FROM "file:///{filename}" AS COLUMN
MATCH (prob:Concept)
WHERE prob.cui = COLUMN.Problem_CUI AND prob.cui_pref_term IS NOT NULL
MATCH (lab:Concept)
WHERE lab.cui = COLUMN.Lab_CUI AND lab.cui_pref_term IS NOT NULL
CREATE (prob)<-[r:OCCURS_WITH {{co_occurrance_probability:toFloat(COLUMN.normalized_co_occurrance_probability), source:'MIMIC-III v1.4', updated:'{timestamp}'}}]-(lab)
'''.format(timestamp=timestamp, filename=filename)

session.run(command)

<neo4j.work.result.Result at 0x7f27766f6640>

## Probability of a problem-problem pair occurring together vs separately

In [3]:
# Get the probability of each Problem in the general population
query = '''
MATCH (ptTotal:Admissions)
WITH count(ptTotal) AS ptTotal
MATCH (ad:Admissions)<-[:HAD]-(Pt:Patients)-[:HAD_PROBLEM]-(b:Problem)
WITH b.cui AS Problem_CUI, count(distinct(ad)) AS probTotal, ptTotal, count(distinct(Pt)) AS Pt
WITH Problem_CUI, toFloat(probTotal)/ptTotal AS problem_gen_pop_probability, Pt
WHERE Pt > 20
RETURN Problem_CUI, problem_gen_pop_probability
'''
data = session.run(query)
problem_gen_pop_probability = pd.DataFrame([dict(record) for record in data])

In [9]:
# Get the probability of a specific pair of problems co-occurring
query = '''
MATCH (ptTotal:Admissions)
WITH count(ptTotal) AS ptTotal
MATCH (ad:Admissions)<-[:HAD]-(Pt:Patients), (p1:Problem)<-[:HAD_PROBLEM]-(Pt)-[:HAD_PROBLEM]->(p2:Problem)
WHERE p1.cui <> p2.cui
WITH p1.cui AS Problem1_CUI, p2.cui AS Problem2_CUI, count(distinct(ad)) AS ProblemPairTotal, ptTotal
RETURN Problem1_CUI, Problem2_CUI, toFloat(ProblemPairTotal)/ptTotal AS ProblemPairProbability
ORDER BY ProblemPairProbability DESC
'''
data = session.run(query)
ProblemPairProbability = pd.DataFrame([dict(record) for record in data])

In [5]:
problem_gen_pop_probability

Unnamed: 0,Problem_CUI,problem_gen_pop_probability
0,C2711480,0.001780
1,C0038358,0.002950
2,C1861101,0.000797
3,C0002886,0.001458
4,C0029443,0.002255
...,...,...
460,C0085610,0.000848
461,C0001339,0.000746
462,C0521614,0.000695
463,C1136033,0.000746


In [10]:
ProblemPairProbability

Unnamed: 0,Problem1_CUI,Problem2_CUI,ProblemPairProbability
0,C0022661,C0022658,0.001831
1,C0022658,C0022661,0.001831
2,C0020517,C0268529,0.001611
3,C0268529,C0020517,0.001611
4,C0745283,C2830004,0.001594
...,...,...,...
309953,C0161679,C0394996,0.000017
309954,C2830004,C0394996,0.000017
309955,C0394996,C0037199,0.000017
309956,C0161679,C0037199,0.000017


In [70]:
ProblemPairProbability_merged = pd.merge(ProblemPairProbability, problem_gen_pop_probability, left_on='Problem1_CUI', right_on='Problem_CUI')
ProblemPairProbability_merged = pd.merge(ProblemPairProbability_merged, problem_gen_pop_probability, left_on='Problem2_CUI', right_on='Problem_CUI')
ProblemPairProbability_merged['co_occurrance_probability'] = ProblemPairProbability_merged.ProblemPairProbability / (ProblemPairProbability_merged.problem_gen_pop_probability_x + ProblemPairProbability_merged.problem_gen_pop_probability_y)
ProblemPairProbability_merged

Unnamed: 0,Problem1_CUI,Problem2_CUI,ProblemPairProbability,Problem_CUI_x,problem_gen_pop_probability_x,Problem_CUI_y,problem_gen_pop_probability_y,co_occurrance_probability
0,C0022661,C0022658,0.001831,C0022661,0.009495,C0022658,0.005629,0.121076
1,C0020517,C0022658,0.000322,C0020517,0.008427,C0022658,0.005629,0.022919
2,C0268529,C0022658,0.000153,C0268529,0.002170,C0022658,0.005629,0.019565
3,C0745283,C0022658,0.000627,C0745283,0.004256,C0022658,0.005629,0.063465
4,C2830004,C0022658,0.000712,C2830004,0.007138,C0022658,0.005629,0.055777
...,...,...,...,...,...,...,...,...
97721,C0026771,C0020255,0.000034,C0026771,0.001000,C0020255,0.000933,0.017544
97722,C0002962,C0020255,0.000102,C0002962,0.000848,C0020255,0.000933,0.057143
97723,C0001418,C0020255,0.000017,C0001418,0.000661,C0020255,0.000933,0.010638
97724,C3263723,C0020255,0.000034,C3263723,0.001000,C0020255,0.000933,0.017544


In [71]:
ProblemPairProbability_merged.sort_values(by='co_occurrance_probability', ascending=False, inplace=True)
ProblemPairProbability_merged.head(20)

Unnamed: 0,Problem1_CUI,Problem2_CUI,ProblemPairProbability,Problem_CUI_x,problem_gen_pop_probability_x,Problem_CUI_y,problem_gen_pop_probability_y,co_occurrance_probability
17629,C0042487,C0014122,0.00078,C0042487,0.00173,C0014122,0.001373,0.251366
13604,C0014122,C0042487,0.00078,C0014122,0.001373,C0042487,0.00173,0.251366
72055,C0085281,C0149521,0.000831,C0085281,0.001662,C0149521,0.00173,0.245
95630,C0149521,C0085281,0.000831,C0149521,0.00173,C0085281,0.001662,0.245
95644,C0438696,C0085281,0.000729,C0438696,0.00134,C0085281,0.001662,0.242938
96837,C0085281,C0438696,0.000729,C0085281,0.001662,C0438696,0.00134,0.242938
12512,C0398623,C0740991,0.00078,C0398623,0.001577,C0740991,0.001916,0.223301
12752,C0740991,C0398623,0.00078,C0740991,0.001916,C0398623,0.001577,0.223301
762,C0745136,C0745138,0.00139,C0745136,0.00256,C0745138,0.003679,0.222826
3829,C0745138,C0745136,0.00139,C0745138,0.003679,C0745136,0.00256,0.222826


In [72]:
# Drop duplicates: (A, B) and (B, A) are the same pair, so sort each pair before de-duplicating
ProblemPairProbability_merged['Problems'] = ProblemPairProbability_merged.loc[:,['Problem1_CUI', 'Problem2_CUI']].values.tolist()
ProblemPairProbability_merged['Problems'] = ProblemPairProbability_merged.Problems.apply(sorted)
ProblemPairProbability_merged[['Problem1_CUI','Problem2_CUI']] = pd.DataFrame(ProblemPairProbability_merged.Problems.tolist(), index=ProblemPairProbability_merged.index)
ProblemPairProbability_merged.drop_duplicates(subset=['Problem1_CUI','Problem2_CUI'], inplace=True)
ProblemPairProbability_merged

Unnamed: 0,Problem1_CUI,Problem2_CUI,ProblemPairProbability,Problem_CUI_x,problem_gen_pop_probability_x,Problem_CUI_y,problem_gen_pop_probability_y,co_occurrance_probability,Problems
17629,C0014122,C0042487,0.000780,C0042487,0.001730,C0014122,0.001373,0.251366,"[C0014122, C0042487]"
72055,C0085281,C0149521,0.000831,C0085281,0.001662,C0149521,0.001730,0.245000,"[C0085281, C0149521]"
95644,C0085281,C0438696,0.000729,C0438696,0.001340,C0085281,0.001662,0.242938,"[C0085281, C0438696]"
12512,C0398623,C0740991,0.000780,C0398623,0.001577,C0740991,0.001916,0.223301,"[C0398623, C0740991]"
762,C0745136,C0745138,0.001390,C0745136,0.002560,C0745138,0.003679,0.222826,"[C0745136, C0745138]"
...,...,...,...,...,...,...,...,...,...
41321,C0011860,C0151744,0.000017,C0011860,0.007902,C0151744,0.003256,0.001520,"[C0011860, C0151744]"
85084,C0003486,C0022661,0.000017,C0022661,0.009495,C0003486,0.001831,0.001497,"[C0003486, C0022661]"
87618,C0022661,C3203359,0.000017,C3203359,0.001865,C0022661,0.009495,0.001493,"[C0022661, C3203359]"
87459,C0019163,C0022661,0.000017,C0019163,0.001984,C0022661,0.009495,0.001477,"[C0019163, C0022661]"
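
The de-duplication above treats (A, B) and (B, A) as the same pair. An alternative way to get the same canonical ordering, sketched here on made-up data with placeholder CUIs, is to sort the two columns row-wise with NumPy before calling `drop_duplicates`.

```python
import numpy as np
import pandas as pd

# Made-up pairs; the first two rows describe the same unordered pair
pairs = pd.DataFrame({
    'Problem1_CUI': ['B', 'A', 'C'],
    'Problem2_CUI': ['A', 'B', 'A'],
    'co_occurrance_probability': [0.25, 0.25, 0.10],
})

# Sort each row's two CUIs so (B, A) and (A, B) become the same key
cols = ['Problem1_CUI', 'Problem2_CUI']
pairs[cols] = np.sort(pairs[cols].to_numpy(), axis=1)
unique_pairs = pairs.drop_duplicates(subset=cols).reset_index(drop=True)
print(unique_pairs)
```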


In [73]:
# Write out to CSV
timestamp = datetime.now().strftime("%Y-%m-%d_%H-%M-%S")
filename = 'Problem_co_occurrance_probability_'+timestamp+'.csv'
ProblemPairProbability_merged.loc[:,['Problem1_CUI', 'Problem2_CUI', 'co_occurrance_probability']].to_csv(filename, index=False)

Move the CSV into the database's Import folder

In [74]:
# Import the co-occurrence probabilities into the database
timestamp = datetime.now().strftime("%Y-%m-%dT%H:%M:%S")

command = '''
USING PERIODIC COMMIT 100000 LOAD CSV WITH HEADERS FROM "file:///{filename}" AS COLUMN
MATCH (prob1:Concept)
WHERE prob1.cui = COLUMN.Problem1_CUI AND prob1.cui_pref_term IS NOT NULL
MATCH (prob2:Concept)
WHERE prob2.cui = COLUMN.Problem2_CUI AND prob2.cui_pref_term IS NOT NULL
MERGE (prob1)<-[r:OCCURS_WITH {{co_occurrance_probability:toFloat(COLUMN.co_occurrance_probability), source:'MIMIC-III v1.4', updated:'{timestamp}'}}]-(prob2)
'''.format(timestamp=timestamp, filename=filename)

session.run(command)

<neo4j.work.result.Result at 0x7fec48db34f0>

## Probability of a procedure-problem pair occurring together vs separately

In [17]:
# Get the probability of each Problem in the general population
query = '''
MATCH (ptTotal:Admissions)
WITH count(ptTotal) AS ptTotal
MATCH (ad:Admissions)<-[:HAD]-(Pt:Patients)-[:HAD_PROBLEM]-(b:Problem)
WITH b.cui AS Problem_CUI, count(distinct(ad)) AS probTotal, ptTotal, count(distinct(Pt)) AS Pt
WITH Problem_CUI, toFloat(probTotal)/ptTotal AS problem_gen_pop_probability, Pt
WHERE Pt > 20
RETURN Problem_CUI, problem_gen_pop_probability
ORDER BY problem_gen_pop_probability DESC
'''
data = session.run(query)
problem_gen_pop_probability = pd.DataFrame([dict(record) for record in data])

In [18]:
# Get the probability of each procedure in the general population
query = '''
MATCH (ptTotal:Admissions)
WITH count(ptTotal) AS ptTotal
MATCH (ad:Admissions)<-[:HAD]-(Pt:Patients)-[:HAD]->(proc:Procedures_Icd)
WITH proc.icd9_code AS icd9_code, count(distinct(ad)) AS ProcTotal, ptTotal, count(distinct(Pt)) AS Pt
WHERE Pt > 20
RETURN icd9_code, toFloat(ProcTotal)/ptTotal AS ProcedureProbability
ORDER BY ProcedureProbability DESC
'''
data = session.run(query)
procedure_gen_pop_probability = pd.DataFrame([dict(record) for record in data])

In [19]:
# Get the probability of a specific pair of procedure and problem
query = '''
MATCH (ptTotal:Admissions)
WITH count(ptTotal) AS ptTotal
MATCH (ad:Admissions)<-[:HAD]-(Pt:Patients), (p:Problem)<-[:HAD_PROBLEM]-(Pt)-[:HAD]->(proc:Procedures_Icd)
WITH p.cui AS Problem_CUI, proc.icd9_code AS icd9_code, count(distinct(ad)) AS ProcProbTotal, ptTotal, count(distinct(Pt)) AS Pts
WHERE Pts > 20
RETURN Problem_CUI, icd9_code, toFloat(ProcProbTotal)/ptTotal AS Proc_Problem_Probability
ORDER BY Proc_Problem_Probability DESC
'''
data = session.run(query)
Proc_Problem_Probability = pd.DataFrame([dict(record) for record in data])

In [20]:
problem_gen_pop_probability

Unnamed: 0,Problem_CUI,problem_gen_pop_probability
0,C0022661,0.009495
1,C0020517,0.008427
2,C0011860,0.007902
3,C2830004,0.007138
4,C0039239,0.007003
...,...,...
460,C0085669,0.000577
461,C0017086,0.000577
462,C0155626,0.000577
463,C0575090,0.000577


In [21]:
procedure_gen_pop_probability

Unnamed: 0,icd9_code,ProcedureProbability
0,3893,0.318146
1,9604,0.246219
2,9671,0.220971
3,966,0.213850
4,9904,0.177699
...,...,...
604,7857,0.000407
605,7813,0.000407
606,7919,0.000390
607,9959,0.000390


In [22]:
Proc_Problem_Probability

Unnamed: 0,Problem_CUI,icd9_code,Proc_Problem_Probability
0,C0022661,3893,0.006986
1,C0020517,3893,0.006274
2,C0022661,3995,0.005799
3,C2830004,3893,0.005494
4,C0011860,3893,0.005494
...,...,...,...
1068,C0026771,9672,0.000560
1069,C0021308,3961,0.000560
1070,C0267841,3893,0.000543
1071,C0026771,966,0.000543


In [23]:
Proc_Problem_Probability_merged = pd.merge(Proc_Problem_Probability, problem_gen_pop_probability, on=['Problem_CUI'])
Proc_Problem_Probability_merged = pd.merge(Proc_Problem_Probability_merged, procedure_gen_pop_probability, on=['icd9_code'])
Proc_Problem_Probability_merged['co_occurrance_probability'] = Proc_Problem_Probability_merged.Proc_Problem_Probability / (Proc_Problem_Probability_merged.problem_gen_pop_probability + Proc_Problem_Probability_merged.ProcedureProbability)
Proc_Problem_Probability_merged

Unnamed: 0,Problem_CUI,icd9_code,Proc_Problem_Probability,problem_gen_pop_probability,ProcedureProbability,co_occurrance_probability
0,C0022661,3893,0.006986,0.009495,0.318146,0.021322
1,C0020517,3893,0.006274,0.008427,0.318146,0.019211
2,C2830004,3893,0.005494,0.007138,0.318146,0.016889
3,C0011860,3893,0.005494,0.007902,0.318146,0.016850
4,C0302148,3893,0.005426,0.006935,0.318146,0.016691
...,...,...,...,...,...,...
1068,C0085616,3972,0.000577,0.000848,0.008580,0.061151
1069,C0031039,370,0.000916,0.002628,0.011242,0.066015
1070,C0018946,0131,0.000814,0.002493,0.011106,0.059850
1071,C0398738,0066,0.001204,0.001746,0.031369,0.036354


In [24]:
Proc_Problem_Probability_merged.sort_values(by='co_occurrance_probability', ascending=False, inplace=True)
Proc_Problem_Probability_merged.head(20)

Unnamed: 0,Problem_CUI,icd9_code,Proc_Problem_Probability,problem_gen_pop_probability,ProcedureProbability,co_occurrance_probability
1067,C0002940,3972,0.001102,0.003578,0.00858,0.090656
289,C0022661,3995,0.005799,0.009495,0.074386,0.069133
1069,C0031039,370,0.000916,0.002628,0.011242,0.066015
593,C0022661,3895,0.003849,0.009495,0.050343,0.064324
1047,C0032227,3491,0.003374,0.005901,0.047816,0.062816
1068,C0085616,3972,0.000577,0.000848,0.00858,0.061151
1070,C0018946,131,0.000814,0.002493,0.011106,0.05985
1064,C0027651,9925,0.000899,0.003459,0.012327,0.056928
1072,C0008311,5187,0.000661,0.001153,0.01192,0.050584
1036,C0085605,5491,0.001984,0.003442,0.039508,0.04619


In [26]:
# Write out to CSV
timestamp = datetime.now().strftime("%Y-%m-%d_%H-%M-%S")
filename = 'Procedure_Problem_co_occurrance_probability_'+timestamp+'.csv'
Proc_Problem_Probability_merged.loc[:,['Problem_CUI', 'icd9_code', 'co_occurrance_probability']].to_csv(filename, index=False)

Move the CSV into the database's Import folder

In [27]:
# Import the co-occurrence probabilities into the database
timestamp = datetime.now().strftime("%Y-%m-%dT%H:%M:%S")

command = '''
USING PERIODIC COMMIT 100000 LOAD CSV WITH HEADERS FROM "file:///{filename}" AS COLUMN
MATCH (prob:Concept)
WHERE prob.cui = COLUMN.Problem_CUI AND prob.cui_pref_term IS NOT NULL
MATCH (proc:D_Icd_Procedures)
WHERE proc.icd9_code = COLUMN.icd9_code
CREATE (prob)<-[r:OCCURS_WITH {{co_occurrance_probability:toFloat(COLUMN.co_occurrance_probability), source:'MIMIC-III v1.4', updated:'{timestamp}'}}]-(proc)
'''.format(timestamp=timestamp, filename=filename)

session.run(command)

<neo4j.work.result.Result at 0x7f2f847e40a0>

## Queries

In [36]:
def LikelyOrders(cui_prob_list):
    
    # Find prescriptions associated with the input problem    
    query = '''
    MATCH p=(ord:Concept)-[r:OCCURS_WITH]->(c:Concept) 
    WHERE c.cui IN {cui_prob_list} AND ord.semantic_type IN ["['Clinical Drug']"]
    WITH round(r.co_occurrance_probability, 5)*1000 AS Score, ord, r
    WHERE Score > 20
    RETURN ord.term AS `Order`, Score
    ORDER BY r.co_occurrance_probability DESC
    '''.format(cui_prob_list=cui_prob_list)
    data = session.run(query)
    orders_likely_rx = pd.DataFrame([dict(record) for record in data])
    
    # Find abnormal labs associated with the input problem
    query = '''
    MATCH p=(ord:Concept)-[r:OCCURS_WITH]->(c:Concept) 
    WHERE c.cui IN {cui_prob_list} AND ord.semantic_type IN ["['Clinical Attribute']"]
    WITH round(r.co_occurrance_probability, 5)*1000 AS Score, ord, r
    WHERE Score > 20
    RETURN ord.description AS `Order`, Score
    ORDER BY r.co_occurrance_probability DESC
    '''.format(cui_prob_list=cui_prob_list)
    data = session.run(query)
    orders_likely_lab = pd.DataFrame([dict(record) for record in data])
    
    # Find procedures associated with the input problem
    query = '''
    MATCH p=(ord:D_Icd_Procedures)-[r:OCCURS_WITH]->(c:Concept) 
    WHERE c.cui IN {cui_prob_list}
    WITH round(r.co_occurrance_probability, 5)*1000 AS Score, ord, r
    WHERE Score > 20
    RETURN ord.long_title AS `Order`, Score
    ORDER BY r.co_occurrance_probability DESC
    '''.format(cui_prob_list=cui_prob_list)
    data = session.run(query)
    orders_likely_procedure = pd.DataFrame([dict(record) for record in data])
    
    return orders_likely_rx, orders_likely_lab, orders_likely_procedure
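
`LikelyOrders` formats the Python list straight into the Cypher text. A possible variant, sketched here for the prescription query only, passes the list as a query parameter through the driver's `session.run(query, **parameters)` call instead. The function name `likely_prescriptions`, the `min_score` argument, and the stub session are hypothetical; the stub exists only so the sketch runs without a database.

```python
def likely_prescriptions(session, cui_prob_list, min_score=20):
    """Return (order, score) tuples for prescriptions linked to the given problem CUIs."""
    query = '''
    MATCH (ord:Concept)-[r:OCCURS_WITH]->(c:Concept)
    WHERE c.cui IN $cui_prob_list AND ord.semantic_type IN ["['Clinical Drug']"]
    WITH round(r.co_occurrance_probability, 5)*1000 AS Score, ord, r
    WHERE Score > $min_score
    RETURN ord.term AS `Order`, Score
    ORDER BY r.co_occurrance_probability DESC
    '''
    # Values are sent as query parameters instead of being formatted into the text
    result = session.run(query, cui_prob_list=cui_prob_list, min_score=min_score)
    return [(record['Order'], record['Score']) for record in result]

class _StubSession:
    """Stand-in for a neo4j session so the sketch runs without a database."""
    def run(self, query, **parameters):
        self.last_parameters = parameters
        return [{'Order': 'example order', 'Score': 42.0}]

stub = _StubSession()
rows = likely_prescriptions(stub, ['C0022661'])
print(rows)
```

With a real connection, the notebook's own `session` object would be passed in place of the stub.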

In [37]:
start_time = time.time()

cui_prob_list = ['C0022661']
orders_likely_rx, orders_likely_lab, orders_likely_procedure = LikelyOrders(cui_prob_list)

print("Total runtime:", time.time() - start_time, "seconds")

Total runtime: 0.015000104904174805 seconds


In [30]:
orders_likely_rx

Unnamed: 0,Order,Score
0,calcitriol 0.00025 MG Oral Capsule,109.59
1,sevelamer carbonate 800 MG Oral Tablet [Renvela],106.88
2,sodium polystyrene sulfonate 250 MG/ML Oral Su...,103.32
3,1 ML epoetin alfa 4000 UNT/ML Injection [Procrit],79.31
4,150 ML Glucose 50 MG/ML Injection,76.26
...,...,...
92,100 ML Glucose 50 MG/ML Injection,20.79
93,Docusate Sodium 10 MG/ML Oral Suspension,20.71
94,Aspirin 81 MG Chewable Tablet,20.34
95,Amiodarone hydrochloride 200 MG Oral Tablet,20.32
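
The `Score` column is the stored co-occurrence value rounded to five decimals and multiplied by 1000, so the `Score > 20` filter corresponds to a stored value of roughly 0.02 or more. Below is a minimal sketch of that conversion in Python, using the value 0.109589 reported earlier for C0022661 and C0975120. The helper name `to_score` is hypothetical, and Python's `round` may differ from Cypher's in exact tie cases.

```python
def to_score(co_occurrence_probability):
    """Mirror the Cypher expression round(p, 5) * 1000 used in LikelyOrders."""
    return round(co_occurrence_probability, 5) * 1000

score = to_score(0.109589)
passes_filter = score > 20
print(round(score, 2), passes_filter)
```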


In [31]:
orders_likely_lab

Unnamed: 0,Order,Score
0,Macrophage in Ascites,57.87
1,Lymphocytes in Other Body Fluid,57.69
2,Polys in Other Body Fluid,53.77
3,Lymphocytes in Ascites,52.82
4,"RBC, Ascites in Ascites",51.48
5,"WBC, Ascites in Ascites",50.71
6,Monocytes in Ascites,49.68
7,Polys in Ascites,48.77
8,Protein/Creatinine Ratio in Urine,48.01
9,pH in Urine,44.46


In [35]:
orders_likely_procedure

Unnamed: 0,Order,Score
0,Hemodialysis,69.13
1,Venous catheterization for renal dialysis,64.32
2,Other endoscopy of small intestine,24.7
3,Closed [endoscopic] biopsy of bronchus,23.03
4,"Venous catheterization, not elsewhere classified",21.32
5,Transfusion of packed cells,21.29
6,Insertion of endotracheal tube,20.16
