# Lactate Discordance Project
## Subcohort Mortality
### C.V. Cosgriff, MIT Critical Data

In this notebook we briefly examine mortality with respect to lactate discordance in the two disease-severity subpopulations for which we built classifiers. The value of predicting who these patients are lies in outcomes, so the key question we seek to answer is: in our two subpopulations, how does discordance affect mortality?

Because we did not extract mortality data at study outset, we'll pull the ICU death status for our cohort directly from the database. We'll then build logistic regression models adjusting for baseline covariates.

## Step 0: Environment Setup

In [1]:
import numpy as np
import pandas as pd
import matplotlib
import matplotlib.pyplot as plt
from statsmodels.formula.api import logit
# workaround so logit's summary() works: older statsmodels calls stats.chisqprob, which newer SciPy removed
from scipy import stats
stats.chisqprob = lambda chisq, df: stats.chi2.sf(chisq, df)
import psycopg2

# postgres environment setup; placeholders here, substitute your own info
sqluser = 'mimicuser'
userpass = 'harvardmit2018'
dbname = 'eicu'
schema_name = 'eicu_crd'
host = '10.8.0.1'

query_schema = 'SET search_path TO ' + schema_name + ';'

# connect to the database
con = psycopg2.connect(dbname = dbname, user = sqluser, host = host, password = userpass)

# "Tableau 20" colors as RGB.   
tableau20 = [(31, 119, 180), (174, 199, 232), (255, 127, 14), (255, 187, 120),    
             (44, 160, 44), (152, 223, 138), (214, 39, 40), (255, 152, 150),    
             (148, 103, 189), (197, 176, 213), (140, 86, 75), (196, 156, 148),    
             (227, 119, 194), (247, 182, 210), (127, 127, 127), (199, 199, 199),    
             (188, 189, 34), (219, 219, 141), (23, 190, 207), (158, 218, 229)]  
  
# Scale the RGB values to the [0, 1] range, which is the format matplotlib accepts.    
for i in range(len(tableau20)):    
    r, g, b = tableau20[i]    
    tableau20[i] = (r / 255., g / 255., b / 255.)

marker = ['v','o','d','^','s','>','+']
ls = ['-','-','-','-','-','-','--','--']  # 's' is a marker code, not a valid line style

# configure matplotlib
plt.rcParams.update({'font.size': 22})
plt.style.use('classic')
plt.rcParams.update({'figure.max_open_warning': 0})

# configure jupyter for using matplotlib
%config InlineBackend.figure_format = 'retina'
%matplotlib inline

cohort = pd.read_csv('../cleaned_cohort.csv')


## Step 1: Extract Mortality

We begin by extracting mortality from the database.

In [27]:
query = query_schema + '''
SELECT patientunitstayid
    , CASE
    WHEN actualicumortality = 'EXPIRED' THEN 1
    WHEN actualicumortality = 'ALIVE' THEN 0
    ELSE null
    END AS mortality
    
FROM apachepatientresult
WHERE apacheversion = 'IVa'; -- or else we'll get double results for every patient
'''

icu_mortality = pd.read_sql_query(query, con)

Next, we join these data to our cohort data.

In [29]:
cohort_mortality = cohort.merge(icu_mortality, on = 'patientunitstayid')
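
Because `merge` defaults to an inner join, any stay without an APACHE IVa result row is silently dropped from the cohort. A minimal sketch of how one might check for this, using small made-up frames (`toy_cohort` and `toy_mortality` are hypothetical stand-ins, not the project data):

```python
import pandas as pd

# Hypothetical stand-ins for `cohort` and `icu_mortality`.
toy_cohort = pd.DataFrame({'patientunitstayid': [1, 2, 3, 4]})
toy_mortality = pd.DataFrame({'patientunitstayid': [1, 2, 4],
                              'mortality': [0, 1, None]})

# A left join with indicator=True flags stays that have no mortality row;
# validate='one_to_one' raises if either side has duplicate stay IDs.
check = toy_cohort.merge(toy_mortality, on='patientunitstayid', how='left',
                         indicator=True, validate='one_to_one')

n_unmatched = int((check['_merge'] == 'left_only').sum())
n_missing_outcome = int(check['mortality'].isna().sum())
print(n_unmatched, n_missing_outcome)
```

On the real data, the same two counts would show how many stays the inner join discards and how many retained stays have a null outcome.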

Finally, since we are building models, we'll get our data into proper form.

In [30]:
# extra index we don't need from csv
cohort_mortality = cohort_mortality.drop('Unnamed: 0', axis = 1)

# rename ethnicity labels to make them like variable names
cohort_mortality.loc[cohort_mortality.ethnicity == 'African American', 'ethnicity'] = 'african_american'
cohort_mortality.loc[cohort_mortality.ethnicity == 'Native American', 'ethnicity'] = 'native_american'
cohort_mortality.loc[cohort_mortality.ethnicity == 'Other/Unknown', 'ethnicity'] = 'other'
cohort_mortality.loc[cohort_mortality.ethnicity == 'Asian', 'ethnicity'] = 'asian'
cohort_mortality.loc[cohort_mortality.ethnicity == 'Caucasian', 'ethnicity'] = 'caucasian'
cohort_mortality.loc[cohort_mortality.ethnicity == 'Hispanic', 'ethnicity'] = 'hispanic'
eth = pd.get_dummies(cohort_mortality.ethnicity, prefix = 'eth')
eth = eth.drop('eth_caucasian', axis = 1)
cohort_mortality = pd.concat([cohort_mortality, eth], axis = 1)
cohort_mortality = cohort_mortality.drop('ethnicity', axis = 1)

# set index to stay ID
cohort_mortality = cohort_mortality.set_index('patientunitstayid')
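
The dummy coding above drops `eth_caucasian`, which makes Caucasian the reference level for every ethnicity coefficient in the models that follow. A small sketch on made-up labels (the `toy` frame is hypothetical) shows the resulting layout:

```python
import pandas as pd

# Hypothetical labels, already renamed to variable-style names.
toy = pd.DataFrame({'ethnicity': ['caucasian', 'asian', 'hispanic', 'caucasian']})

# One indicator column per level, then drop the chosen reference level.
dummies = pd.get_dummies(toy['ethnicity'], prefix='eth')
dummies = dummies.drop('eth_caucasian', axis=1)

# Rows whose indicators are all zero belong to the reference group.
is_reference = dummies.sum(axis=1) == 0
print(list(dummies.columns))
print(is_reference.tolist())
```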

## Step 2: Low APACHE IVa, Mortality Model

We'll now build a mortality model for lactate discordance in the low APACHE IVa population. In this population, lactate discordance means a lactate higher than would be expected in the non-critically ill. We'll adjust for age, gender, ethnicity, ventilation, pressors, and APACHE IVa score, using an outcome regression of the form: $$logit\,Pr[Y=1|X,L] = \beta_0 + \beta_1X^T + \beta_2L^T$$

In [47]:
low_apache = cohort_mortality.loc[cohort_mortality.apache_quartile == cohort_mortality.apache_quartile.min(), :]
mortality_glm = logit(formula = '''mortality ~ lactate_discordance + age + male_gender + eth_african_american
                       + eth_native_american + eth_other + eth_asian + eth_hispanic + apachescore + oobintubday1
                       + pressor''', data = low_apache).fit()
mortality_glm.summary()

Optimization terminated successfully.
         Current function value: 0.061115
         Iterations 21


| | | | |
|---|---|---|---|
| Dep. Variable: | mortality | No. Observations: | 13204 |
| Model: | Logit | Df Residuals: | 13192 |
| Method: | MLE | Df Model: | 11 |
| Date: | Sun, 06 May 2018 | Pseudo R-squ.: | 0.05763 |
| Time: | 10:24:21 | Log-Likelihood: | -806.96 |
| converged: | True | LL-Null: | -856.31 |
| | | LLR p-value: | 3.241e-16 |

| | coef | std err | z | P>\|z\| | [0.025 | 0.975] |
|---|---|---|---|---|---|---|
| Intercept | -7.9053 | 0.539 | -14.672 | 0.000 | -8.961 | -6.849 |
| lactate_discordance[T.True] | 0.7705 | 0.199 | 3.875 | 0.000 | 0.381 | 1.160 |
| age | 0.0295 | 0.006 | 5.101 | 0.000 | 0.018 | 0.041 |
| male_gender | 0.3335 | 0.165 | 2.017 | 0.044 | 0.009 | 0.658 |
| eth_african_american | -0.0176 | 0.264 | -0.067 | 0.947 | -0.535 | 0.500 |
| eth_native_american | -59.5788 | 9.35e+12 | -6.37e-12 | 1.000 | -1.83e+13 | 1.83e+13 |
| eth_other | 0.3967 | 0.321 | 1.237 | 0.216 | -0.232 | 1.025 |
| eth_asian | -0.2081 | 0.719 | -0.290 | 0.772 | -1.617 | 1.201 |
| eth_hispanic | 0.0199 | 0.423 | 0.047 | 0.962 | -0.810 | 0.850 |
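
One row in the table above deserves caution: the `eth_native_american` coefficient of -59.58 with a standard error near 9e12 is the pattern typically produced by quasi-complete separation, for example when a small group has no deaths in the subcohort. That is a plausible reading rather than something verified here. A minimal sketch of how one might check, using a made-up frame (`toy` is hypothetical; on the real data the same `pd.crosstab` call would be applied to `low_apache`):

```python
import pandas as pd

# Hypothetical data in which the indicator group has no deaths at all.
toy = pd.DataFrame({
    'eth_native_american': [0, 0, 0, 0, 1, 1],
    'mortality':           [0, 1, 0, 1, 0, 0],
})

# Cross-tabulate group membership against the outcome.
counts = pd.crosstab(toy['eth_native_american'], toy['mortality'])
# reindex so that an entirely absent level still appears as a zero count
counts = counts.reindex(index=[0, 1], columns=[0, 1], fill_value=0)

deaths_in_group = int(counts.loc[1, 1])
print(counts)
print(deaths_in_group)
```

A zero cell like this one means the likelihood keeps improving as the coefficient heads toward negative infinity, so the reported estimate and interval for that term should not be interpreted.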


We note a strong signal for `lactate_discordance`; exponentiating its coefficient and 95% CI converts them to an odds ratio (OR).

In [57]:
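coef = mortality_glm.params
conf = mortality_glm.conf_int()
conf['OR'] = coef
conf.columns = ['2.5%', '97.5%', 'OR']
# exponentiate the log-odds estimates; select the row by name rather than by position
print(np.exp(conf.loc[['lactate_discordance[T.True]']]))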

                                 2.5%     97.5%        OR
lactate_discordance[T.True]  1.463416  3.190797  2.160894


After adjustment for baseline covariates, lactate discordance is associated with increased mortality in this subpopulation (OR 2.16, 95% CI 1.46 to 3.19).
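
The conversion from coefficient to odds ratio is plain exponentiation of the log-odds estimate and its interval bounds. As a quick hand check, the sketch below uses the rounded values from the `lactate_discordance[T.True]` row of the summary table above, so the last digits can differ slightly from the full-precision output:

```python
import math

# Rounded values copied from the lactate_discordance row of the summary table.
coef, ci_low, ci_high = 0.7705, 0.381, 1.160

# exp() maps log-odds to odds ratios; applying it to the CI bounds gives the OR interval.
odds_ratio = math.exp(coef)
or_low, or_high = math.exp(ci_low), math.exp(ci_high)

print(f'OR = {odds_ratio:.2f} (95% CI {or_low:.2f} to {or_high:.2f})')
# prints: OR = 2.16 (95% CI 1.46 to 3.19)
```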

## Step 3: High APACHE IVa, Mortality Model

We'll now build a mortality model for lactate discordance in the high APACHE IVa population. In this population, lactate discordance means a lactate lower than would be expected in the severely critically ill; we might hypothesize that it would be protective. We'll adjust for age, gender, ethnicity, ventilation, pressors, and APACHE IVa score, using an outcome regression of the form: $$logit\,Pr[Y=1|X,L] = \beta_0 + \beta_1X^T + \beta_2L^T$$

In [59]:
high_apache = cohort_mortality.loc[cohort_mortality.apache_quartile == cohort_mortality.apache_quartile.max(), :]
mortality_glm = logit(formula = '''mortality ~ lactate_discordance + age + male_gender + eth_african_american
                       + eth_native_american + eth_other + eth_asian + eth_hispanic + apachescore + oobintubday1
                       + pressor''', data = high_apache).fit()
mortality_glm.summary()

Optimization terminated successfully.
         Current function value: 0.522754
         Iterations 6


| | | | |
|---|---|---|---|
| Dep. Variable: | mortality | No. Observations: | 12244 |
| Model: | Logit | Df Residuals: | 12232 |
| Method: | MLE | Df Model: | 11 |
| Date: | Sun, 06 May 2018 | Pseudo R-squ.: | 0.1301 |
| Time: | 10:31:54 | Log-Likelihood: | -6400.6 |
| converged: | True | LL-Null: | -7357.7 |
| | | LLR p-value: | 0.0 |

| | coef | std err | z | P>\|z\| | [0.025 | 0.975] |
|---|---|---|---|---|---|---|
| Intercept | -4.4646 | 0.160 | -27.892 | 0.000 | -4.778 | -4.151 |
| lactate_discordance[T.True] | -0.7902 | 0.055 | -14.484 | 0.000 | -0.897 | -0.683 |
| age | 0.0010 | 0.001 | 0.646 | 0.519 | -0.002 | 0.004 |
| male_gender | 0.0424 | 0.044 | 0.967 | 0.334 | -0.044 | 0.128 |
| eth_african_american | -0.1910 | 0.068 | -2.804 | 0.005 | -0.324 | -0.057 |
| eth_native_american | 0.0859 | 0.205 | 0.419 | 0.675 | -0.316 | 0.488 |
| eth_other | -0.0504 | 0.101 | -0.496 | 0.620 | -0.249 | 0.149 |
| eth_asian | 0.0246 | 0.177 | 0.139 | 0.889 | -0.322 | 0.371 |
| eth_hispanic | -0.2755 | 0.123 | -2.233 | 0.026 | -0.517 | -0.034 |


We again note a strong signal for `lactate_discordance`, this time in the opposite direction; exponentiating its coefficient and 95% CI converts them to an odds ratio (OR).

In [60]:
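coef = mortality_glm.params
conf = mortality_glm.conf_int()
conf['OR'] = coef
conf.columns = ['2.5%', '97.5%', 'OR']
# exponentiate the log-odds estimates; select the row by name rather than by position
print(np.exp(conf.loc[['lactate_discordance[T.True]']]))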

                                 2.5%     97.5%       OR
lactate_discordance[T.True]  0.407725  0.504948  0.45374


As we hypothesized, lactate discordance is associated with decreased mortality in this subpopulation (OR 0.45, 95% CI 0.41 to 0.50).