## TNXpersontime tutorial

The TNXpersontime package is a small tool suite that estimates time-at-risk from encounter data and provides simple risk and rate calculations. This directory includes a sample dataset of diabetic ketoacidosis (DKD) patients. To set up, either pip install the tnxpersontime Python module (included in the root directory of this package) into your environment of choice, or activate the bundled virtual environment tnxpersontime-venv, which has tnxpersontime and its dependencies pre-installed.

! source tnxpersontime-venv/bin/activate

In [6]:
import tnxpersontime as t

## Data load-in

The TNXDateCsv class is a convenience class that converts specified columns to datetime on read-in.

In [18]:
import pandas as pd

index_file = t.TNXDateCsv('sample_data/input_file.csv', dt_cols=['index_d', 'date_death', 'date_HF'])

encounter = t.TNXDateCsv('sample_data/encounter.csv', dt_cols=['start_date', 'end_date'], dt_format='%Y%m%d')
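
For readers who want to see what date conversion on read-in amounts to, here is a minimal pandas sketch. It is not the package's implementation: the helper name `read_with_dates` and the inline two-row extract are hypothetical, and only the `dt_cols` / `dt_format` naming is borrowed from the calls above.

```python
import io
import pandas as pd

# Hypothetical two-row extract laid out like encounter.csv (dates stored as YYYYMMDD).
raw = io.StringIO(
    "patient_id,start_date,end_date\n"
    "p1,20181229,20181229\n"
    "p1,20181230,20190103\n"
)

def read_with_dates(source, dt_cols, dt_format=None):
    """Read a CSV and convert the named columns to datetime64."""
    # Read the date columns as strings so a format like %Y%m%d can be applied.
    df = pd.read_csv(source, dtype={col: str for col in dt_cols})
    for col in dt_cols:
        # errors="coerce" turns missing or unparseable values into NaT.
        df[col] = pd.to_datetime(df[col], format=dt_format, errors="coerce")
    return df

demo = read_with_dates(raw, dt_cols=["start_date", "end_date"], dt_format="%Y%m%d")
print(demo.dtypes)
```

Coercing bad values to NaT rather than raising matches the NaT entries seen in the sample index data, but whether TNXDateCsv behaves this way is an assumption.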

The loaded DataFrame is accessible through the df attribute, or alternatively by calling the TNXDateCsv instance directly.

In [19]:
encounter.df.head()

Unnamed: 0,encounter_id,patient_id,start_date,end_date,type,derived_by_TriNetX
0,4bd804a77ab72423f24ec21a305ab1f5895df1d4,40f426d38dc3b27a4593b1adb6863faaaac15c8f,2018-12-29,2018-12-29,EMER,F
1,dee2045a192a24f1ff176005fcd1bdafbb6fbdc2,40f426d38dc3b27a4593b1adb6863faaaac15c8f,2018-12-30,2018-12-30,EMER,F
2,f339d7168e947a2501425ffe35c374378a117823,40f426d38dc3b27a4593b1adb6863faaaac15c8f,2018-12-30,2019-01-03,IMP,F
3,cee2baf88ec76205f9b444e40803805b3b167d80,9dd1441bc6bb7f72553ccd8fcc57729faf67be83,2013-07-15,2013-07-16,IMP,F
4,5cb6102ec1823db6591a918dd8cf6398a1943b7c,9dd1441bc6bb7f72553ccd8fcc57729faf67be83,2013-01-07,2013-01-07,AMB,F


In [20]:
index_file().head()

Unnamed: 0,patient_id,index_d,date_death,date_HF,HF,sex,age_index,reth,age_cat,combo_v4
0,0000b8d53ec0e7912f7227ad839828f9b66e995a,2018-05-17,NaT,NaT,0,F,72.421918,4,3,no change - healthy
1,0000beb21c1a19bf2c09215e58bb59ec1cc39bcd,2017-08-18,NaT,NaT,0,F,66.673973,1,3,no change - healthy
2,000196275411136c759a4c763c157eb52e5d74fa,2017-08-08,NaT,NaT,0,M,61.643836,4,3,
3,00034a621c0d2f093945b748d477b9a5c6c06ed3,2018-07-27,NaT,NaT,0,F,54.605479,2,2,change - increased from unhealthy to unhealthy
4,0003fade5db6291f8bfdcd0324129eb2607ac754,2019-07-06,NaT,NaT,0,M,78.561644,1,3,no change - healthy


The input file must be provided by the investigator. At minimum it must include a column of unique patient ids with a corresponding index date column. The index date is the date at which the patient is considered to have entered the at-risk population. In this example, the index_d column is the date that the given patient was diagnosed with DKD.
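
These minimum requirements can be checked before any analysis is run. The sketch below is a hypothetical helper, not part of tnxpersontime: the function name `check_index_file` and the demo data are made up for illustration, and the column names default to those used in this sample.

```python
import pandas as pd

def check_index_file(df, patient_id_col="patient_id", index_date_col="index_d"):
    """Return a list of problems found in an index DataFrame (empty if none)."""
    problems = []
    for col in (patient_id_col, index_date_col):
        if col not in df.columns:
            problems.append(f"missing column: {col}")
    if problems:
        # Without both columns the remaining checks cannot run.
        return problems
    if df[patient_id_col].duplicated().any():
        problems.append("duplicate patient ids")
    if df[index_date_col].isna().any():
        problems.append("missing index dates")
    return problems

# Hypothetical index data with one repeated patient id.
demo = pd.DataFrame({
    "patient_id": ["a", "b", "b"],
    "index_d": pd.to_datetime(["2018-05-17", "2017-08-18", "2017-08-08"]),
})
print(check_index_file(demo))
```

An empty list means the frame satisfies both requirements; for the tutorial data the check could be run as `check_index_file(index_file.df)`.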

Optionally, the index file may include additional endpoint measurements - in this example, the date of death (date_death) and the date of a heart failure incident (date_HF), as applicable. These points are considered hard endpoints at which the patient stops contributing time-at-risk.
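
To make the idea of a hard endpoint concrete, the following sketch computes days at risk for a single patient. It assumes that follow-up stops at the earliest recorded endpoint, or at the end of the last encounter if no endpoint occurred first. This rule and the function `days_at_risk` are illustrative assumptions; the package's own estimation logic may differ.

```python
from datetime import date

def days_at_risk(index_date, last_encounter_end, endpoints):
    """Days from the index date to the earliest of the hard endpoints and the last encounter."""
    # Endpoints that never occurred are passed as None (NaT in the DataFrame).
    observed = [d for d in endpoints if d is not None]
    # Follow-up stops at whichever comes first: an endpoint or the last encounter.
    stop = min(observed + [last_encounter_end])
    # A stop date before the index date contributes no time at risk.
    return max((stop - index_date).days, 0)

# Hypothetical patient: indexed 2018-05-17, last seen 2019-01-03,
# no recorded death, heart failure on 2018-11-01.
print(days_at_risk(date(2018, 5, 17), date(2019, 1, 3), [None, date(2018, 11, 1)]))  # 168
```

Without the heart failure date, the same patient would contribute 231 days, the full span from the index date to the last encounter.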

Furthermore, if rate and risk comparisons are to be performed, the index file must also include a categorical column containing the exposure variable.



In [None]:
persontime = t.PersonTime(
    encounter,
    index_file,
    index_file_endpoints=["date_death", "date_HF"],
    index_date_alias="index_d",
    patient_id_alias="patient_id",
)