# Assessment of nephrotoxicity of vancomycin

The aim of this study was to quantify the association between vancomycin and nephrotoxicity in a large, multi-center US database.
The study matches patients who were admitted to the ICU from the emergency department and received vancomycin on ICU admission with patients who did not receive vancomycin on admission. Matching is done on the APACHE-IV score component (which is equivalent to the APACHE-III score).


## Definitions

* **drug on admission:** the patient received a medication order between 12 hours before and 12 hours after ICU admission
* **baseline creatinine:** the first creatinine value between 12 hours before and 12 hours after ICU admission
* **AKI:** any instance of AKI between 2 and 7 days after ICU admission, following KDIGO guidelines using only creatinine

The KDIGO creatinine criteria for AKI are: an increase of >= 50% from baseline within 7 days, or an absolute increase of >= 0.3 mg/dL in creatinine within 48 hours.
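The two creatinine criteria can be illustrated with a short sketch. It is not the query that builds the `aki` table used later: the function name `flag_aki`, the assumption that `chartoffset` is in minutes from ICU admission, and the choice of the lowest value in the preceding 48 hours as the reference are all assumptions made for illustration. The restriction to days 2-7 is left out for brevity.

```python
import pandas as pd

def flag_aki(cr, baseline):
    """Flag the two KDIGO creatinine criteria for a single stay.

    cr: DataFrame with 'chartoffset' (assumed minutes from ICU admission)
        and 'creatinine' (assumed mg/dL).
    baseline: baseline creatinine for the stay (mg/dL).
    """
    cr = cr.sort_values('chartoffset').reset_index(drop=True)

    # 7-day criterion: creatinine at least 50% above baseline
    cr['aki_7d'] = (cr['creatinine'] >= 1.5 * baseline).astype(int)

    # 48-hour criterion: rise of >= 0.3 over the lowest value
    # measured in the preceding 48 hours (assumed reference)
    aki_48h = []
    for _, row in cr.iterrows():
        in_window = cr[(cr['chartoffset'] >= row['chartoffset'] - 48 * 60)
                       & (cr['chartoffset'] <= row['chartoffset'])]
        reference = in_window['creatinine'].min()
        # round to avoid floating-point noise at the 0.3 threshold
        aki_48h.append(int(round(row['creatinine'] - reference, 2) >= 0.3))
    cr['aki_48h'] = aki_48h
    return cr
```

A stay would then count as having AKI if either flag is 1 on any row, which mirrors how the flags are combined later in this notebook.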

# 0. Setup

In [1]:
# Requires pandas-gbq. Link: https://pandas-gbq.readthedocs.io/en/latest/install.html#pip
import pandas as pd
import pandas_gbq
import numpy as np
import matplotlib.pyplot as plt

# helper functions stored in local py file
import utils

project_id = 'lcp-internal'

# Helper function to read data from BigQuery into a pandas dataframe.
def run_query(query):
    return pandas_gbq.read_gbq(query,
                               project_id=project_id,
                               dialect='standard')

# 1. Summarize cohort

In [2]:
query = """
select *
from `lcp-internal.vanco.cohort`
"""
co = run_query(query)

In [3]:
print('== EXCLUSIONS - TOTAL ==')
N = co.shape[0]
print(f'{N:6d} unique unit stays.')
for c in co.columns:
    if c.startswith('exclude_'):
        N = co[c].sum()
        mu = co[c].mean()*100.0
        print(f'  {N:6d} ({mu:4.1f}%) - {c}')
        
print('\n== EXCLUSIONS - SEQUENTIAL ==')
N = co.shape[0]
print(f'{N:7d} unique unit stays.')
idx = co['patientunitstayid'].notnull()
for c in co.columns:
    if c.startswith('exclude_'):
        # index patients removed by this exclusion
        idxRem = (co[c]==1)
        # calculate number of patients being removed, after applying prev excl
        N = (idx & idxRem).sum()
        mu = N/co.shape[0]*100.0
        idx = idx & (~idxRem)
        n_rem = idx.sum()
        
        print(f'- {N:5d} = {n_rem:6d} ({mu:4.1f}% removed) - {c}')

== EXCLUSIONS - TOTAL ==
200859 unique unit stays.
   25239 (12.6%) - exclude_sdu
    4930 ( 2.5%) - exclude_short_stay
  111265 (55.4%) - exclude_non_ed_admit
   12280 ( 6.1%) - exclude_secondary_stay
   33858 (16.9%) - exclude_no_med_interface
    8059 ( 4.0%) - exclude_dialysis_chronic
   18265 ( 9.1%) - exclude_dialysis_first_week
   23330 (11.6%) - exclude_cr_missing_baseline
   72280 (36.0%) - exclude_cr_missing_followup

== EXCLUSIONS - SEQUENTIAL ==
 200859 unique unit stays.
- 25239 = 175620 (12.6% removed) - exclude_sdu
-  3286 = 172334 ( 1.6% removed) - exclude_short_stay
- 87643 =  84691 (43.6% removed) - exclude_non_ed_admit
- 11597 =  73094 ( 5.8% removed) - exclude_secondary_stay
- 15006 =  58088 ( 7.5% removed) - exclude_no_med_interface
-  2238 =  55850 ( 1.1% removed) - exclude_dialysis_chronic
-  3320 =  52530 ( 1.7% removed) - exclude_dialysis_first_week
-  3456 =  49074 ( 1.7% removed) - exclude_cr_missing_baseline
- 18823 =  30251 ( 9.4% removed) - exclude_cr_missing_followup

# 3. Analysis


## Get data from BigQuery

In [4]:
# covariates from APACHE table
query = """
SELECT dem.*
FROM `hst-953-2018.team_i.demographics` dem
"""
dem = run_query(query)

# vancomycin drug doses
query = """
SELECT v.*
FROM `lcp-internal.vanco.vanco` v
"""
v = run_query(query)

# AKI
query = """
SELECT 
  patientunitstayid
  , chartoffset
  , creatinine, creatinine_reference, creatinine_baseline
  , aki_48h, aki_7d
FROM `lcp-internal.vanco.aki`
"""
aki = run_query(query)

## Collapse vancomycin data

The `v` dataframe has every vancomycin administration for a patient.

Here we collapse it into two binary columns:

* `vanco_adm` - vancomycin was administered on ICU admission (between hours -12 and 12)
* `vanco_wk` - vancomycin was administered sometime between 2-7 days after ICU admission

In [5]:
v_df = utils.extract_adm_and_wk(v, 'vanco')
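
The helper `utils.extract_adm_and_wk` lives in a local file and is not shown in this notebook. The sketch below is a hypothetical stand-in that shows one way such a collapse could work. The column name `drugstartoffset` (assumed to be minutes from ICU admission) and the reading of "2-7 days" as hours 48 to 168 are assumptions, not details confirmed by the source.

```python
import pandas as pd

def extract_adm_and_wk_sketch(drug_df, name, offset_col='drugstartoffset'):
    """Collapse per-administration rows into two binary flags per stay.

    Hypothetical stand-in for utils.extract_adm_and_wk. offset_col is an
    assumed column holding minutes from ICU admission.
    """
    hours = drug_df[offset_col] / 60.0
    flags = pd.DataFrame({
        'patientunitstayid': drug_df['patientunitstayid'],
        # administered within 12 hours either side of ICU admission
        f'{name}_adm': ((hours >= -12) & (hours <= 12)).astype(int),
        # administered from day 2 through day 7 (assumed: hours 48 to 168)
        f'{name}_wk': ((hours >= 48) & (hours <= 168)).astype(int),
    })
    # a stay is flagged if any of its administrations falls in the window
    return flags.groupby('patientunitstayid').max()
```

The result is indexed by `patientunitstayid`, which matches how `v_df` is used in the cells that follow (lookups through `v_df.index` and `v_df.loc`).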

Print out the proportion of patients with/without vancomycin after exclusions.

In [6]:
# get patient unit stay ID after applying exclusions
idxKeep = co['patientunitstayid'].notnull()
for c in co.columns:
    if c.startswith('exclude_'):
        idxKeep = idxKeep & (co[c]==0)
        
ptid = co.loc[idxKeep, 'patientunitstayid'].values
n_pt = len(ptid)
# limit to those in vancomycin dataframe
ptid = [x for x in ptid if x in v_df.index]

N = len(ptid)
print(f'{n_pt} stays after exclusions.')
for c in v_df.columns:
    N = v_df.loc[ptid, c].sum()
    mu = N / n_pt * 100.0
    print(f'  {N} ({mu:3.1f}%) with {c}')
    
# if a stay has both flags (adm and wk), its row-wise sum is greater than 1
N = (v_df.loc[ptid, :] == 1).sum(axis=1)
N = (N>1).sum()
mu = N / n_pt * 100.0
print(f'  {N} ({mu:3.1f}%) with both')

30251 stays after exclusions.
  4404 (14.6%) with vanco_adm
  3841 (12.7%) with vanco_wk
  2095 (6.9%) with both


## Create a dataframe for analysis

The code block below:

* Applies exclusions
* Adds vancomycin binary flags
* Adds AKI flag

In [7]:
# drop exclusions
idxKeep = co['patientunitstayid'].notnull()
for c in co.columns:
    if c.startswith('exclude_'):
        idxKeep = idxKeep & (co[c]==0)

# combine data into single dataframe
df = co.loc[idxKeep, ['patientunitstayid']].merge(dem, how='inner', on='patientunitstayid')

# add vanco administration
df = df.merge(v_df, how='left', on='patientunitstayid')
# if ptid missing in vanco dataframe, then no vanco was received
# therefore impute 0
for c in v_df.columns:
    df[c] = df[c].fillna(0).astype(int)

# collapse AKI to one row per stay: the max of a binary flag is 1 if AKI ever occurred
aki_grp = aki.groupby('patientunitstayid')[['creatinine', 'aki_48h', 'aki_7d']].max()
aki_grp.reset_index(inplace=True)
df = df.merge(aki_grp, how='inner', on='patientunitstayid')

df['aki'] = ((df['aki_48h'] == 1) | (df['aki_7d'] == 1)).astype(int)
print(df.shape)
df.head()

(30251, 16)


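| | patientunitstayid | unitdischargeoffset | age | gender | weight | height | BMI | BMI_group | apachescore | apache_group | vanco_adm | vanco_wk | creatinine | aki_48h | aki_7d | aki |
|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|
| 0 | 3036317 | 3980 | 72 | Male | 84.8 | 185.4 | 25.0 | overweight | 51 | 51-60 | 0 | 0 | 1.22 | 0 | 0 | 0 |
| 1 | 3052627 | 4260 | 71 | Male | 57.51 | 172.7 | 19.0 | normal | 47 | 41-50 | 0 | 0 | 2.19 | 0 | 0 | 0 |
| 2 | 3054721 | 2495 | 37 | Male | 72.2 | 190.5 | 20.0 | normal | 53 | 51-60 | 0 | 0 | 0.57 | 0 | 1 | 1 |
| 3 | 3072232 | 555 | > 89 | Female | 66.8 | 124.5 | 43.0 | overweight | 61 | 61-70 | 0 | 0 | 1.4 | 0 | 0 | 0 |
| 4 | 3079003 | 8825 | 88 | Male | 87.0 | 175.3 | 28.0 | overweight | 85 | 81-90 | 0 | 0 | 3.39 | 1 | 0 | 1 |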


## Propensity matching

In [8]:
# Vanco + No Vanco Analysis
print('\n=== Cross-tabulation of vanco on admission vs. vanco during the week (days 2-7) ===')
display(pd.crosstab(df['vanco_adm'], df['vanco_wk'], margins=True))
print('Normalized:')
display(pd.crosstab(df['vanco_adm'], df['vanco_wk'], margins=True, normalize=True))


=== Cross-tabulation of vanco on admission vs. vanco during the week (days 2-7) ===


| vanco_adm (rows) / vanco_wk (columns) | 0 | 1 | All |
|---|---|---|---|
| 0 | 24101 | 1746 | 25847 |
| 1 | 2309 | 2095 | 4404 |
| All | 26410 | 3841 | 30251 |


Normalized:


| vanco_adm (rows) / vanco_wk (columns) | 0 | 1 | All |
|---|---|---|---|
| 0 | 0.796701 | 0.057717 | 0.854418 |
| 1 | 0.076328 | 0.069254 | 0.145582 |
| All | 0.873029 | 0.126971 | 1.0 |


### Primary analysis

* exposure: treated with vancomycin on admission to the ICU
* control: *not* treated with vancomycin, for the first 7 days of stay, starting at unit admit time
* excluded
  * patients treated with vanco later in the ICU stay, but not on admission
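
The matching itself happens inside `utils.match_and_print_or`, which is not shown in this notebook. As a rough illustration only, the sketch below assumes 1:1 frequency matching within each `apache_group` stratum by random sampling. The function name `match_on_group` and its arguments are hypothetical, and the actual helper may work differently.

```python
import pandas as pd

def match_on_group(exposure, control, group_col='apache_group', seed=0):
    """1:1 frequency matching within each stratum of group_col (illustrative)."""
    matched_exp, matched_ctl = [], []
    # groupby skips rows whose group is missing, so unscored stays drop out
    for grp, exp_grp in exposure.groupby(group_col):
        ctl_grp = control[control[group_col] == grp]
        # cannot form more pairs than the smaller stratum allows
        n = min(len(exp_grp), len(ctl_grp))
        if n == 0:
            continue
        matched_exp.append(exp_grp.sample(n=n, random_state=seed))
        matched_ctl.append(ctl_grp.sample(n=n, random_state=seed))
    return pd.concat(matched_exp), pd.concat(matched_ctl)
```

After matching under this scheme, both groups have identical counts in every APACHE stratum, so any remaining difference in mean APACHE score comes only from variation within a stratum.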

In [9]:
# INITIAL VANCO vs. NO VANCO
novanco = df[(df['vanco_wk'] == 0) & (df['vanco_adm'] == 0)]
vanco = df[(df['vanco_adm'] == 1)]
utils.match_and_print_or(exposure=vanco, control=novanco, seed=12938)

24101 in control group.
4404 in exposure group.

=== APACHE distribution, unmatched data ===

Counts of Apache Scores for Control Group and Treatment Group

ApacheGroups	Control	Treatment
0-10		18	2
11-20		465	36
21-30		1957	164
31-40		3484	389
41-50		4185	603
51-60		3829	703
61-70		2893	632
71-80		1894	502
81-90		1028	365
91-100		581	238
101-110		374	145
111-120		226	96
121-130		140	75
131-140		78	22
>140		82	34

Absolute Mean Difference of APACHE Score: -10.490506265657203

=== Match groups on APACHE ===

Shape of treatment group: (4006, 16)
Shape of control group: (4006, 16)
Counts of Apache Scores for Control Group and Treatment Group

ApacheGroups	Control	Treatment
0-10		2	2
11-20		36	36
21-30		164	164
31-40		389	389
41-50		603	603
51-60		703	703
61-70		632	632
71-80		502	502
81-90		365	365
91-100		238	238
101-110		145	145
111-120		96	96
121-130		75	75
131-140		22	22
>140		34	34

Absolute Mean Difference of APACHE Score: -0.21417873190215175

=== Odds ratio of exposure ===

Diseas
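
The odds ratio step is computed inside `utils.match_and_print_or`, whose code is not shown here. A minimal sketch follows, assuming an unconditional odds ratio from the 2x2 table of exposure by AKI with a Woolf (log-based) 95% confidence interval. The function name `odds_ratio` is hypothetical, and the actual helper may use a different interval or a paired method.

```python
import numpy as np

def odds_ratio(exposed_outcome, control_outcome):
    """Odds ratio and Woolf 95% CI from two arrays of binary outcomes."""
    exposed_outcome = np.asarray(exposed_outcome)
    control_outcome = np.asarray(control_outcome)
    a = (exposed_outcome == 1).sum()  # diseased, exposed
    b = (exposed_outcome == 0).sum()  # not diseased, exposed
    c = (control_outcome == 1).sum()  # diseased, unexposed
    d = (control_outcome == 0).sum()  # not diseased, unexposed
    or_ = (a * d) / (b * c)
    # standard error of log(OR); undefined if any cell is zero
    se = np.sqrt(1 / a + 1 / b + 1 / c + 1 / d)
    lower, upper = np.exp(np.log(or_) + np.array([-1.96, 1.96]) * se)
    return or_, lower, upper
```

With the matched groups it could be called as `odds_ratio(vanco_matched['aki'], novanco_matched['aki'])`, where both dataframe names are placeholders for the matched exposure and control groups.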

### Secondary analysis

Slight alterations in how the exposure and control groups are defined.

* exposure: treated with vancomycin on admission to the ICU *and* during the week (days 2-7)
* control: *not* treated with vancomycin, for the first 7 days of stay, starting at unit admit time
* excluded
  * patients treated with vanco later in the ICU stay, but not on admission
  * patients treated with vanco on admission, but not later in the week

In [11]:
# Vanco on admission AND during days 2-7 vs. no vanco
novanco = df[(df['vanco_wk'] == 0) & (df['vanco_adm'] == 0)]
vanco = df[(df['vanco_wk'] == 1) & (df['vanco_adm'] == 1)]
utils.match_and_print_or(exposure=vanco, control=novanco, seed=12301)

24101 in control group.
2095 in exposure group.

=== APACHE distribution, unmatched data ===

Counts of Apache Scores for Control Group and Treatment Group

ApacheGroups	Control	Treatment
0-10		18	1
11-20		465	18
21-30		1957	68
31-40		3484	183
41-50		4185	269
51-60		3829	345
61-70		2893	296
71-80		1894	247
81-90		1028	177
91-100		581	129
101-110		374	74
111-120		226	52
121-130		140	46
131-140		78	15
>140		82	20

Absolute Mean Difference of APACHE Score: -12.042524749903407

=== Match groups on APACHE ===

Shape of treatment group: (1940, 16)
Shape of control group: (1940, 16)
Counts of Apache Scores for Control Group and Treatment Group

ApacheGroups	Control	Treatment
0-10		1	1
11-20		18	18
21-30		68	68
31-40		183	183
41-50		269	269
51-60		345	345
61-70		296	296
71-80		247	247
81-90		177	177
91-100		129	129
101-110		74	74
111-120		52	52
121-130		46	46
131-140		15	15
>140		20	20

Absolute Mean Difference of APACHE Score: -0.21701030927835063

=== Odds ratio of exposure ===

Diseased + E

In [12]:
# Vanco on admission only (none during days 2-7) vs. no vanco
novanco = df[(df['vanco_wk'] == 0) & (df['vanco_adm'] == 0)]
vanco = df[(df['vanco_wk'] == 0) & (df['vanco_adm'] == 1)]
utils.match_and_print_or(exposure=vanco, control=novanco, seed=4765)

24101 in control group.
2309 in exposure group.

=== APACHE distribution, unmatched data ===

Counts of Apache Scores for Control Group and Treatment Group

ApacheGroups	Control	Treatment
0-10		18	1
11-20		465	18
21-30		1957	96
31-40		3484	206
41-50		4185	334
51-60		3829	358
61-70		2893	336
71-80		1894	255
81-90		1028	188
91-100		581	109
101-110		374	71
111-120		226	44
121-130		140	29
131-140		78	7
>140		82	14

Absolute Mean Difference of APACHE Score: -9.033859056487096

=== Match groups on APACHE ===

Shape of treatment group: (2066, 16)
Shape of control group: (2066, 16)
Counts of Apache Scores for Control Group and Treatment Group

ApacheGroups	Control	Treatment
0-10		1	1
11-20		18	18
21-30		96	96
31-40		206	206
41-50		334	334
51-60		358	358
61-70		336	336
71-80		255	255
81-90		188	188
91-100		109	109
101-110		71	71
111-120		44	44
121-130		29	29
131-140		7	7
>140		14	14

Absolute Mean Difference of APACHE Score: -0.25847047434655934

=== Odds ratio of exposure ===

Diseased + Expos