## TNXpersontime Python tutorial

The TNXpersontime package is a small tool suite that estimates time-at-risk from encounter data and provides simple risk and rate calculations. This directory includes a sample dataset of diabetic kidney disease (DKD) patients. To set up, either pip install the tnxpersontime Python module (included in the root directory of this package) into your environment of choice, or activate the bundled virtual environment tnxpersontime-venv, which has tnxpersontime and its dependencies pre-installed.

! source tnxpersontime-venv/bin/activate

In [1]:
import tnxpersontime as t

## Data load-in

The TNXCsv class is a convenience class that converts specified columns to datetime on read-in.

In [2]:
import pandas as pd

index_file = t.TNXCsv('sample_data/sample_DKD_index.csv', dt_cols=['index_d', 'date_death', 'date_HF'])

encounter = t.TNXCsv('sample_data/sample_DKD_encounter.csv', dt_cols=['start_date', 'end_date'], dt_format='%Y%m%d')
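
To make the datetime conversion concrete, the sketch below does roughly the same thing with plain pandas. It is an illustration only and not the TNXCsv implementation; the helper name `read_with_dates` and the two inline sample rows are made up for this example.

```python
import io

import pandas as pd


def read_with_dates(source, dt_cols, dt_format=None):
    """Hypothetical helper: read a CSV and parse the given columns as datetimes."""
    # Read the date columns as strings so values like 20181229 are not parsed as integers
    df = pd.read_csv(source, dtype={col: str for col in dt_cols})
    for col in dt_cols:
        # errors="coerce" turns empty or unparseable cells into NaT
        df[col] = pd.to_datetime(df[col], format=dt_format, errors="coerce")
    return df


# Two made-up encounter rows with dates stored as YYYYMMDD
raw = io.StringIO(
    "patient_id,start_date,end_date\n"
    "p1,20181229,20181229\n"
    "p1,20181230,20190103\n"
)
demo = read_with_dates(raw, ["start_date", "end_date"], dt_format="%Y%m%d")
print(demo.dtypes)
```

Passing an explicit format, as the encounter file does above, avoids ambiguity when dates are stored as compact digit strings.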

The loaded DataFrame is accessible through the df attribute or, alternatively, by calling the TNXCsv object directly.

In [19]:
encounter.df.head()

Unnamed: 0,encounter_id,patient_id,start_date,end_date,type,derived_by_TriNetX
0,4bd804a77ab72423f24ec21a305ab1f5895df1d4,40f426d38dc3b27a4593b1adb6863faaaac15c8f,2018-12-29,2018-12-29,EMER,F
1,dee2045a192a24f1ff176005fcd1bdafbb6fbdc2,40f426d38dc3b27a4593b1adb6863faaaac15c8f,2018-12-30,2018-12-30,EMER,F
2,f339d7168e947a2501425ffe35c374378a117823,40f426d38dc3b27a4593b1adb6863faaaac15c8f,2018-12-30,2019-01-03,IMP,F
3,cee2baf88ec76205f9b444e40803805b3b167d80,9dd1441bc6bb7f72553ccd8fcc57729faf67be83,2013-07-15,2013-07-16,IMP,F
4,5cb6102ec1823db6591a918dd8cf6398a1943b7c,9dd1441bc6bb7f72553ccd8fcc57729faf67be83,2013-01-07,2013-01-07,AMB,F


In [20]:
index_file().head()

Unnamed: 0,patient_id,index_d,date_death,date_HF,HF,sex,age_index,reth,age_cat,combo_v4
0,0000b8d53ec0e7912f7227ad839828f9b66e995a,2018-05-17,NaT,NaT,0,F,72.421918,4,3,no change - healthy
1,0000beb21c1a19bf2c09215e58bb59ec1cc39bcd,2017-08-18,NaT,NaT,0,F,66.673973,1,3,no change - healthy
2,000196275411136c759a4c763c157eb52e5d74fa,2017-08-08,NaT,NaT,0,M,61.643836,4,3,
3,00034a621c0d2f093945b748d477b9a5c6c06ed3,2018-07-27,NaT,NaT,0,F,54.605479,2,2,change - increased from unhealthy to unhealthy
4,0003fade5db6291f8bfdcd0324129eb2607ac754,2019-07-06,NaT,NaT,0,M,78.561644,1,3,no change - healthy


The index file must be provided by the investigator. At minimum, it must include a column of unique patient ids and a corresponding index date column. The index date is the date on which the patient is considered to have entered the at-risk population; in this example, the index_d column is the date on which the patient was diagnosed with DKD.

Optionally, the index file may include additional endpoint measurements - in this example, the date of death (date_death) and the date of a heart failure incident (date_HF), as applicable. If present, these dates are considered hard endpoints at which a patient stops contributing time-at-risk.
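
One way to picture how several hard endpoints combine is to take the earliest recorded date per patient. The sketch below shows this with plain pandas on made-up rows. It is an assumption about how such endpoints could be combined, not a description of the package's internals, and the `first_endpoint` column name is hypothetical.

```python
import pandas as pd

# Made-up index rows; NaT means the event was never recorded
index_demo = pd.DataFrame({
    "patient_id": ["p1", "p2", "p3"],
    "date_death": pd.to_datetime(["2020-03-01", None, None]),
    "date_HF": pd.to_datetime(["2019-06-15", "2021-01-10", None]),
})

# The row-wise minimum skips NaT, so each patient gets the earliest recorded endpoint.
# A patient with no recorded endpoint stays NaT, meaning there is no hard stop.
index_demo["first_endpoint"] = index_demo[["date_death", "date_HF"]].min(axis=1)
print(index_demo[["patient_id", "first_endpoint"]])
```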

Furthermore, if rate and risk metrics are to be calculated, a categorical column containing the exposure variable should be included in the index file. Here, the 'combo_v4' column indicates the category representing the change in estimated glomerular filtration rate (eGFR) of the patient. NaN represents missing eGFR change information.

In [21]:
index_file()['combo_v4'].unique()

array(['no change - healthy', nan,
       'change - increased from unhealthy to unhealthy',
       'change - decreased from unhealthy to unhealthy',
       'no change - unhealthy',
       'change - decreased from healthy to unhealthy',
       'change - increased from unhealthy to healthy'], dtype=object)
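
Because missing exposure values show up as NaN, it can help to count how many patients fall into each category, including the missing group, before computing any rates. The sketch below uses a small made-up Series in place of the real combo_v4 column.

```python
import numpy as np
import pandas as pd

# Made-up exposure values standing in for the combo_v4 column
exposure = pd.Series(
    ["no change - healthy", np.nan, "no change - unhealthy", "no change - healthy", np.nan],
    name="combo_v4",
)

# dropna=False keeps the NaN group in the tally instead of silently dropping it
counts = exposure.value_counts(dropna=False)
n_missing = int(exposure.isna().sum())

print(counts)
print("missing:", n_missing)
```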

## Person time-at-risk calculation

The index object and encounter object can now be folded into a PersonTime object. Note that this automatically subsets the encounter file to only include patients that are in the index file. 


In [3]:
persontime = t.PersonTime(
    encounter,
    index_file,
    index_file_endpoints=["date_death", "date_HF"],
    index_date_alias="index_d"
)
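
The subsetting step mentioned above can be pictured as a simple membership filter. The sketch below shows the idea with plain pandas on made-up tables; it is not the PersonTime implementation.

```python
import pandas as pd

# Made-up index and encounter tables
index_demo = pd.DataFrame({"patient_id": ["p1", "p2"]})
enc_demo = pd.DataFrame({
    "encounter_id": ["e1", "e2", "e3"],
    "patient_id": ["p1", "p3", "p2"],
})

# Keep only encounters whose patient appears in the index table
subset = enc_demo[enc_demo["patient_id"].isin(index_demo["patient_id"])]
print(subset)
```

Here the encounter for p3 is dropped because p3 has no row in the index table.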

Once the index and encounter objects are loaded, person days-at-risk can be computed patient by patient. The window_days argument specifies the number of days before and after each encounter that are added to a patient's days-at-risk count. The optional index_offset argument shifts the start of the person-time tally to a point before or after the index date. Here, time-at-risk is only added to a patient's pool if it falls later than one year after that patient's index date.

Note that this is a computationally intensive process. For convenience, you can skip the following code block and load the supplied precomputed data instead.

In [None]:
persontime.generate_person_time_df(window_days=30, index_offset=365, output_save_path='data/new_days_30_df.csv')

In [23]:
# Load the precomputed person-time data instead
persontime.load_person_time_df('sample_data/30day_preprocessed_persontime_DKD.csv')

In [27]:
persontime.person_time_df.head()

Unnamed: 0,patient_id,window_time,total_time
0,0000b8d53ec0e7912f7227ad839828f9b66e995a,171,141
1,0000beb21c1a19bf2c09215e58bb59ec1cc39bcd,432,402
2,000196275411136c759a4c763c157eb52e5d74fa,0,0
3,00034a621c0d2f093945b748d477b9a5c6c06ed3,0,0
4,0003fade5db6291f8bfdcd0324129eb2607ac754,0,0


## Sample patient

To illustrate the day-at-risk tallying algorithm, we can follow a single patient's encounter history to see how the days-at-risk were derived:

In [29]:
pt = '00047d62549eeb80ecedd202d4b8f5cb5869367d'
persontime.person_time_df[persontime.person_time_df.patient_id == pt]

Unnamed: 0,patient_id,window_time,total_time
5,00047d62549eeb80ecedd202d4b8f5cb5869367d,49,19


We can access the selected patient's history from the PersonTime object's patient_dict attribute:

In [30]:
persontime.patient_dict[pt]

{'index_d': Timestamp('2018-11-11 00:00:00'),
 'date_death': NaT,
 'date_HF': NaT}

In [28]:
encounter()[encounter().patient_id == pt]

Unnamed: 0,encounter_id,patient_id,start_date,end_date,type,derived_by_TriNetX
1976346,be2cc39330f66bc37e1bda82bfa9e5e8622b7e35,00047d62549eeb80ecedd202d4b8f5cb5869367d,2018-10-15,2018-10-15,UNKNOWN,F
1976347,ffb3dbb6786851bb7674e1345bc724fb701c8078,00047d62549eeb80ecedd202d4b8f5cb5869367d,2018-11-11,2018-11-11,UNKNOWN,F
1976348,8309f61372c909b2a89b545ca8e57d0621f64462,00047d62549eeb80ecedd202d4b8f5cb5869367d,2018-11-11,2018-11-11,UNKNOWN,F
1976349,b78d698ab154b0a55d85d3ec8fa9629597276c5e,00047d62549eeb80ecedd202d4b8f5cb5869367d,2019-01-06,2019-01-06,UNKNOWN,F
1976350,85bc11887e04b3da9213a162e9c8e0456173cd1e,00047d62549eeb80ecedd202d4b8f5cb5869367d,2019-07-14,2019-07-14,UNKNOWN,F
1976351,28abbd04f57e3112c8e08e0546061bdbb891ea20,00047d62549eeb80ecedd202d4b8f5cb5869367d,2019-07-05,2019-07-05,UNKNOWN,F
1976352,b19fe620649838fe60e2488f3aab818099176bd9,00047d62549eeb80ecedd202d4b8f5cb5869367d,2019-08-30,2019-08-30,UNKNOWN,F
1976353,9b95774adcc97e25ba71e4e6af57a10a9dbd2256,00047d62549eeb80ecedd202d4b8f5cb5869367d,2019-11-10,2019-11-10,UNKNOWN,F
1976354,65ba26ceeee48e4b339d50166950bad7c302b34a,00047d62549eeb80ecedd202d4b8f5cb5869367d,2019-11-29,2019-11-29,UNKNOWN,F
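
The windowing idea can be sketched by hand for this patient. The code below is a minimal illustration under stated assumptions, not the package's implementation. It assumes each encounter is padded by window_days on both sides, that overlapping windows are merged so no day is counted twice, that only days on or after index date + index_offset count, and that both ends of an interval are counted. The helper name `windowed_days` and its `risk_end` parameter are hypothetical.

```python
import pandas as pd


def windowed_days(encounters, risk_start, window_days, risk_end=None):
    """Hypothetical helper: count distinct days covered by padded encounter windows.

    encounters: iterable of (start, end) date pairs
    risk_start: first day that may count
    risk_end: optional hard endpoint (last day that may count)
    """
    pad = pd.Timedelta(days=window_days)
    one_day = pd.Timedelta(days=1)

    # Pad each encounter, clip it to the at-risk period, and drop empty intervals
    intervals = []
    for start, end in encounters:
        lo = max(pd.Timestamp(start) - pad, risk_start)
        hi = pd.Timestamp(end) + pad
        if risk_end is not None and not pd.isna(risk_end):
            hi = min(hi, risk_end)
        if lo <= hi:
            intervals.append((lo, hi))

    # Merge overlapping or adjacent intervals, then sum their inclusive lengths
    total = 0
    cur_lo = cur_hi = None
    for lo, hi in sorted(intervals):
        if cur_hi is not None and lo <= cur_hi + one_day:
            cur_hi = max(cur_hi, hi)
        else:
            if cur_hi is not None:
                total += (cur_hi - cur_lo).days + 1
            cur_lo, cur_hi = lo, hi
    if cur_hi is not None:
        total += (cur_hi - cur_lo).days + 1
    return total


# Encounter dates copied from the table above (start and end dates are identical)
dates = ["2018-10-15", "2018-11-11", "2018-11-11", "2019-01-06", "2019-07-14",
         "2019-07-05", "2019-08-30", "2019-11-10", "2019-11-29"]
encounters = [(d, d) for d in dates]

# index_d is 2018-11-11 and index_offset is 365 days
risk_start = pd.Timestamp("2018-11-11") + pd.Timedelta(days=365)
print(windowed_days(encounters, risk_start, window_days=30))
```

Under these assumptions the sketch returns 49 for this patient, which agrees with the window_time value shown above. Only the last two encounters contribute, because every earlier window ends before the offset start date of 2019-11-11. The agreement on one patient does not establish that the package follows the same steps.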
