# Dealing with Time

![Time tree](images/TimeTree_demo.png)

The [MIMIC III documentation](https://mimic.physionet.org/mimicdata/time/) describes all of the time fields. Some key points are:  
"Time in the database is stored with one of two suffixes: TIME and DATE. If a column has TIME as the suffix, e.g. CHARTTIME, then the data resolution is down to the minute. If the column has DATE as the suffix, e.g. CHARTDATE, then the data resolution is down to the day."

In [16]:
import pandas as pd

### Obtain a list of all timestamp fields in the MIMIC-III data  
- Copy table of all data columns from https://mit-lcp.github.io/mimic-schema-spy/columns.byTable.html into a spreadsheet
- Filter the "Type" column on timestamp and delete all other rows
- Note that all timestamp columns have a size of 22 characters, suggesting that they likely have a uniform format across the database
- Keep only the "Table" and "Column" columns and save to CSV

In [17]:
time_fields = pd.read_csv('MIMIC-III_timestamp_fields.csv')

In [18]:
time_fields.loc[:,'Table'] = time_fields.loc[:,'Table'].str.title()
time_fields.loc[:,'Column'] = time_fields.loc[:,'Column'].str.upper()
time_fields

Unnamed: 0,Table,Column
0,Admissions,ADMITTIME
1,Admissions,DEATHTIME
2,Admissions,DISCHTIME
3,Admissions,EDOUTTIME
4,Admissions,EDREGTIME
...,...,...
69,Procedureevents_Mv,STARTTIME
70,Procedureevents_Mv,STORETIME
71,Services,TRANSFERTIME
72,Transfers,INTIME


### Initialize a connection to the neo4j database.

In [19]:
import getpass
password = getpass.getpass("\nPlease enter the Neo4j database password to continue \n")


Please enter the Neo4j database password to continue 
 ·······


In [20]:
from neo4j import GraphDatabase
driver=GraphDatabase.driver(uri="bolt://localhost:7687", auth=('neo4j',password))
session=driver.session()

### MERGE all timestamps as relationships into the time tree

In [5]:
# Create the root node of the time tree
query = 'MERGE (t:Timetree {name:"Time Tree"})'
session.run(query)

<neo4j.work.result.Result at 0x7f12acdb3cd0>

In [25]:
# Create all the year, month, and day nodes with relationships

# Column name from MIMIC becomes relationship name.
# Each node contains the datetime informationa for all higher nodes in the tree (example: month nodes contain
# We iterate through all of the data fields that contain timestamp
# data to write cypher code, which is then passed to Neo4j using the
# apoc.periodic.iterate function.
gaps = 1
iteration = 1

while gaps > 0:

    count = 0
    for index, row in time_fields.iterrows():
        Label = row[0]
        prop = row[1]
        query = '''
    "MATCH (n:{Label})
    WHERE NOT (n)-[:{prop}]->(:Day) AND n.{prop} =~ '[0-9]{{4}}-[0-9]{{2}}-[0-9]{{2}} [0-9]{{2}}:[0-9]{{2}}:[0-9]{{2}}' 
    WITH apoc.date.parse(n.{prop}, 'ms', 'yyyy-MM-dd HH:mm:ss', 'America/New York') AS ms, n
    WITH datetime({{epochmillis: ms}}).year AS yr, datetime({{epochmillis: ms}}).month AS mo, datetime({{epochmillis: ms}}).day AS dt,
    ms, n
    RETURN yr, mo, dt, ms, n",
    "MATCH (t:Timetree {{name:'Time Tree'}})
    MERGE (t)<-[:OF]-(y:Year {{year:yr}})
    MERGE (y)<-[:OF]-(m:Month {{year:yr, month:mo}})
    MERGE (m)<-[:OF]-(d:Day {{year:yr, month:mo, day:dt}})
    MERGE (d)<-[:{prop} {{{prop}:datetime({{epochmillis: ms}})}}]-(n)"'''.format(Label=Label, prop=prop)
        count += 1
#         print(Label, prop, count)
        command = 'CALL apoc.periodic.iterate('+query+', {batchSize:1000, parallel: true, iterateList:true})'
        session.run(command)

    query = '''
    MATCH (n:Noteevents)
    WHERE NOT (n)-[:STORETIME]->(:Day) AND EXISTS(n.STORETIME)
    RETURN count(n) AS gaps'''
    data = session.run(query)
    for node in data:
        gaps = node.get('gaps')
        print ('Gaps: ',gaps, 'Iteration: ',iteration)
    iteration =+ 1

# Print a test query 
# print(command)

Gaps:  376405 Iteration:  1
Gaps:  280405 Iteration:  1
Gaps:  190405 Iteration:  1
Gaps:  131405 Iteration:  1
Gaps:  81405 Iteration:  1
Gaps:  35001 Iteration:  1
Gaps:  13001 Iteration:  1
Gaps:  4001 Iteration:  1
Gaps:  1 Iteration:  1
Gaps:  1 Iteration:  1
Gaps:  1 Iteration:  1
Gaps:  1 Iteration:  1
Gaps:  1 Iteration:  1
Gaps:  1 Iteration:  1
Gaps:  1 Iteration:  1
Gaps:  1 Iteration:  1
Gaps:  1 Iteration:  1
Gaps:  1 Iteration:  1
Gaps:  1 Iteration:  1


KeyboardInterrupt: 

In [15]:
# Check for variations in the date formats of various properties
for index, row in time_fields.iterrows():
    Label = row[0]
    prop = row[1]
    query = 'MATCH (n:{Label}) RETURN n.{prop} AS prop LIMIT 6'''.format(Label=Label, prop=prop)
    data = session.run(query)
    for node in data:
        print(node.get('prop'), Label, prop)

2196-04-09 12:26:00 Admissions ADMITTIME
2153-09-03 07:15:00 Admissions ADMITTIME
2157-10-18 19:34:00 Admissions ADMITTIME
2139-06-06 16:14:00 Admissions ADMITTIME
2160-11-02 02:06:00 Admissions ADMITTIME
2126-05-06 15:16:00 Admissions ADMITTIME
None Admissions DEATHTIME
None Admissions DEATHTIME
None Admissions DEATHTIME
None Admissions DEATHTIME
None Admissions DEATHTIME
None Admissions DEATHTIME
2196-04-10 15:54:00 Admissions DISCHTIME
2153-09-08 19:10:00 Admissions DISCHTIME
2157-10-25 14:00:00 Admissions DISCHTIME
2139-06-09 12:48:00 Admissions DISCHTIME
2160-11-05 14:55:00 Admissions DISCHTIME
2126-05-13 15:00:00 Admissions DISCHTIME
2196-04-09 13:24:00 Admissions EDOUTTIME
None Admissions EDOUTTIME
None Admissions EDOUTTIME
None Admissions EDOUTTIME
2160-11-02 04:27:00 Admissions EDOUTTIME
None Admissions EDOUTTIME
2196-04-09 10:06:00 Admissions EDREGTIME
None Admissions EDREGTIME
None Admissions EDREGTIME
None Admissions EDREGTIME
2160-11-02 01:01:00 Admissions EDREGTIME
None A

### Parameters used with apoc.periodic.iterate function  
Multiple parameters were attempted for the apoc.periodic.iterate function while building the time tree, noted below:

|batchSize|parallel|iterateList|observed velocity of relationship creation|CPU notes|Java RAM usage|  
|:---|:---|:---|:---|:---|:---|  
|10|true|true|3500/second|all 12 CPUs at ~100% initially, then all dropped to ~25%|6.9GB|  
|100|true|true|22600/second (may be incorrect)|all 12 CPUs at ~100% initially, then all dropped to ~40-45%|7.6GB|  
|1000|true|true|11850/second|all 12 CPUs at ~100% initially, then all dropped to ~75%|8.1GB|  

Using parallell=false was also attempted, but this setting used only 1 CPU and was visibly slower than the other methods, so it was aborted.

### Size of the tree
Prior to building time tree there were about 587,770,000 relationships in the database. After building the time tree there were 904,108,341 relationships in the database. The tree itself contained:

|Label|Nodes Count|
|:---|:---|   
|Days|55848|  
|Months|3314|  
|Years|311|  
|Time Tree root|1| 

The years spanned from year 1800 to year 2244, which is 444 years. As the table above demonstrates, only 311 of the possible 444 years actually appeared in timestamp data, which makes sense given the randomization scheme for shifting dates used during the anonymization process ([see here for details](https://mimic.physionet.org/mimicdata/time/)). 


The available disk space decreased by 149 GB after the time tree was created.

## Performance Testing
---

### Close the connection to the neo4j database

In [5]:
session.close()