# Sepsis-3 evaluation in the MIMIC-III database

This notebook evaluates the Sepsis-3 guidelines in the MIMIC-III database. The goals of this analysis are:

1. Evaluating the Sepsis-3 guidelines in MIMIC using the same methodology as in the research paper
2. Evaluating the Sepsis-3 guidelines against the Angus criteria
3. Assessing whether the criteria miss any interesting subgroups

In [1]:
from __future__ import print_function

# Import libraries
import numpy as np
import pandas as pd
import matplotlib.pyplot as plt

from sepsis_utils import sepsis_utils as su
from sepsis_utils import roc_utils as ru

# used to calculate AUROC
from sklearn import metrics

# default colours for prettier plots
col = [[0.9047, 0.1918, 0.1988],
    [0.2941, 0.5447, 0.7494],
    [0.3718, 0.7176, 0.3612],
    [1.0000, 0.5482, 0.1000],
    [0.4550, 0.4946, 0.4722],
    [0.6859, 0.4035, 0.2412],
    [0.9718, 0.5553, 0.7741],
    [0.5313, 0.3359, 0.6523]]
# marker and line styles cycled through when plotting
marker = ['v','o','d','^','s','o','+']
ls = ['-','-','-','-','-','-','--','--']

%matplotlib inline

In [2]:
# load data
df = pd.read_csv('sepsis3-df.csv',sep=',')
df_mdl = pd.read_csv('sepsis3-design-matrix.csv',sep=',')

# define outcome
target_header = "angus"
y = df[target_header].values == 1

# define the covariates to be added in the MFP model (used for table of AUROCs)
preds_header = ['sirs','qsofa','sofa','mlods']

# Study questions

1. How well do the guidelines detect sepsis (Angus criteria) in the antibiotics/culture subset?
2. How well do the guidelines predict mortality (in-hospital) in the antibiotics/culture subset?
3. What factors would improve the sensitivity of the guidelines?
4. What factors would improve the specificity of the guidelines?

## Angus criteria evaluation

In [3]:
yhat = df.sepsis3.values
print('\n SEPSIS-3 guidelines for Angus criteria sepsis \n')
print('Accuracy = {}'.format(metrics.accuracy_score(y, yhat)))
su.print_cm(y, yhat, header1='ang',header2='sep3') # print confusion matrix


 SEPSIS-3 guidelines for Angus criteria sepsis 

Accuracy = 0.595797825448

Confusion matrix
      	ang=0 	ang=1 
sep3=0	  2076	  1274	NPV=61.97
sep3=1	  1477	  1979	PPV=57.26
   	58.43	60.84	Acc=59.58
   	Spec	Sens
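
As a cross-check on the printed matrix, the sketch below recomputes the same summary statistics with `sklearn.metrics.confusion_matrix`. It does not call `su.print_cm`; the label vectors are rebuilt from the four counts shown above purely for illustration. Note that scikit-learn puts the true class on the rows and the prediction on the columns, which is the transpose of the layout printed above.

```python
import numpy as np
from sklearn.metrics import confusion_matrix

# Counts copied from the confusion matrix printed above
tn, fp, fn, tp = 2076, 1477, 1274, 1979

# Rebuild label vectors that reproduce those counts (illustration only,
# this is not the patient-level data)
y_true = np.array([0] * (tn + fp) + [1] * (fn + tp))
y_pred = np.array([0] * tn + [1] * fp + [0] * fn + [1] * tp)

# sklearn convention: rows = true class, columns = predicted class
cm = confusion_matrix(y_true, y_pred)
tn_, fp_, fn_, tp_ = cm.ravel()

sens = tp_ / (tp_ + fn_)     # true positives / all with the outcome
spec = tn_ / (tn_ + fp_)     # true negatives / all without the outcome
ppv = tp_ / (tp_ + fp_)      # true positives / all positive predictions
npv = tn_ / (tn_ + fn_)      # true negatives / all negative predictions
acc = (tp_ + tn_) / cm.sum()

print('Sens={:.2f} Spec={:.2f} PPV={:.2f} NPV={:.2f} Acc={:.2f}'.format(
    100 * sens, 100 * spec, 100 * ppv, 100 * npv, 100 * acc))
# prints: Sens=60.84 Spec=58.43 PPV=57.26 NPV=61.97 Acc=59.58
```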


Predictions using various levels of confounder adjustment are calculated by the helper function `calc_predictions`, controlled by its `model` argument:

* `model=None` - the severity scores on their own
* `model='baseline'` - the severity scores in a vanilla regression
* `model='mfp'` - the severity scores in a fractional polynomial regression (calls an R script)

For the Angus criteria, we do not adjust for other factors when presenting the AUROCs.
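
To make the difference between the unadjusted and adjusted settings concrete, here is a minimal sketch on synthetic data. It is not the implementation of `su.calc_predictions`, whose internals are not shown in this notebook. The covariate `age`, the simulated `score`, and all coefficients are assumptions made up for the example.

```python
import numpy as np
from sklearn.linear_model import LogisticRegression
from sklearn.metrics import roc_auc_score

rng = np.random.default_rng(0)
n = 2000

# Synthetic stand-ins (hypothetical, NOT the MIMIC data)
age = rng.normal(65, 15, n)       # assumed confounder
score = rng.integers(0, 5, n)     # assumed severity score in 0..4
logit = -3.0 + 0.6 * score + 0.03 * age
outcome = rng.random(n) < 1 / (1 + np.exp(-logit))

# Analogue of model=None: the raw score is used directly as the prediction
auc_raw = roc_auc_score(outcome, score)

# Analogue of model='baseline': the score enters a plain logistic
# regression alongside the assumed covariate
X = np.column_stack([score, age])
clf = LogisticRegression(max_iter=1000).fit(X, outcome)
auc_adj = roc_auc_score(outcome, clf.predict_proba(X)[:, 1])

print('AUROC raw score      = {:.3f}'.format(auc_raw))
print('AUROC adjusted model = {:.3f}'.format(auc_adj))
```

With a single score and no covariates, a logistic regression only rescales the score monotonically, so its AUROC would match the raw score; the two settings differ only once extra covariates enter the model.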

In [4]:
preds = su.calc_predictions(df, preds_header, target_header, model=None)

In [5]:
# reproduce the AUC table
su.print_auc_table(preds, y, preds_header)
su.print_auc_table_to_file(preds, y, preds_header=preds_header,
                           filename='auc-table.csv')

     	sirs                	qsofa               	sofa                	mlods               	
sirs 	0.607 [0.594, 0.620]	0.435 [0.412, 0.456]	0.179 [0.161, 0.197]	0.227 [0.206, 0.248]	
qsofa	0.424               	0.601 [0.588, 0.613]	0.270 [0.259, 0.281]	0.356 [0.344, 0.368]	
sofa 	< 0.001               	< 0.001               	0.682 [0.670, 0.695]	0.872 [0.866, 0.877]	
mlods	< 0.001               	< 0.001               	0.600               	0.684 [0.672, 0.697]	
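
The bracketed ranges next to each AUROC read as confidence intervals. How `su.print_auc_table` derives them is not shown here, so the sketch below uses a percentile bootstrap, which is one common approach and may differ from the helper's method. The function name `bootstrap_auroc_ci` and the demo data are hypothetical.

```python
import numpy as np
from sklearn.metrics import roc_auc_score

def bootstrap_auroc_ci(y_true, y_score, n_boot=500, alpha=0.05, seed=0):
    """Point estimate and percentile-bootstrap CI for the AUROC."""
    rng = np.random.default_rng(seed)
    y_true = np.asarray(y_true)
    y_score = np.asarray(y_score)
    n = len(y_true)
    aucs = []
    for _ in range(n_boot):
        idx = rng.integers(0, n, n)  # resample patients with replacement
        if y_true[idx].min() == y_true[idx].max():
            continue  # AUROC is undefined if only one class was drawn
        aucs.append(roc_auc_score(y_true[idx], y_score[idx]))
    lo, hi = np.percentile(aucs, [100 * alpha / 2, 100 * (1 - alpha / 2)])
    return roc_auc_score(y_true, y_score), lo, hi

# Synthetic demo data (not the MIMIC data)
rng = np.random.default_rng(1)
y_demo = rng.integers(0, 2, 400)
score_demo = y_demo + rng.normal(0, 1.0, 400)

auc, lo, hi = bootstrap_auroc_ci(y_demo, score_demo)
print('AUROC = {:.3f} [{:.3f}, {:.3f}]'.format(auc, lo, hi))
```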


## Operating point statistics

This section evaluates the standard operating point statistics:

* sensitivity (% of patients with the outcome who are correctly classified as positive)
* specificity (% of patients without the outcome who are correctly classified as negative)
* positive predictive value (given a positive prediction, the % that are correct)
* negative predictive value (given a negative prediction, the % that are correct)
* F1 score (harmonic mean of sensitivity and PPV)

In addition, we evaluate the number of false positives per 100 cases, or NFP/100. We feel this gives helpful perspective when interpreting the positive predictive value of a prediction and its relationship to the prevalence of the outcome. In this context, the measure can be summarized as: given 100 patients with suspected infection, how many will each algorithm incorrectly flag as positive?
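
As a small worked example of the last two measures, the sketch below computes the F1 score and NFP/100 from confusion matrix counts. The function name `f1_and_nfp_per_100` is hypothetical and is not part of `sepsis_utils`. The NFP/100 formula (false positives divided by all patients, times 100) is our reading of the description above. The counts are those printed for Sepsis-3 against the Angus criteria earlier in this notebook.

```python
def f1_and_nfp_per_100(tn, fp, fn, tp):
    """Return (F1 score, false positives per 100 patients)."""
    n = tn + fp + fn + tp
    # F1 is the harmonic mean of sensitivity and PPV, which simplifies to:
    f1 = 2 * tp / (2 * tp + fp + fn)
    # Assumed definition: false positives per 100 patients evaluated
    nfp_100 = 100 * fp / n
    return f1, nfp_100

# Sepsis-3 vs. Angus counts from the confusion matrix earlier in the notebook
f1, nfp = f1_and_nfp_per_100(tn=2076, fp=1477, fn=1274, tp=1979)
print('F1 = {:.2f}, NFP/100 = {:.2f}'.format(100 * f1, nfp))
# prints: F1 = 59.00, NFP/100 = 21.70
```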

In [6]:
# sepsis3 defined as qSOFA >= 2 and SOFA >= 2
yhat_all = [df.sepsis3.values,
            df.qsofa.values >= 2,
            df.sofa.values >= 2,
            df.sirs.values >= 2,
            df.mlods.values >= 2]
yhat_names = ['seps3', 'qSOFA', 'SOFA', 'SIRS', 'mLODS']

# define "targets", Angus criteria (the same outcome is repeated for each prediction)
y_all = [y for _ in yhat_names]

stats_all = su.get_op_stats(yhat_all, y_all,
               yhat_names=yhat_names,
               header=target_header)

su.print_op_stats(stats_all,
               yhat_names=yhat_names,
               header=target_header)

Metric


     	seps3               	qSOFA               	SOFA                	SIRS                	mLODS               
TN   	 2076           	 1893           	  795           	  998           	 1651           
FP   	 1477           	 1660           	 2758           	 2555           	 1902           
FN   	 1274           	 1172           	  265           	  531           	  682           
TP   	 1979           	 2081           	 2988           	 2722           	 2571           
Sens 	60.84 [0.59, 0.63]	63.97 [0.62, 0.66]	91.85 [0.91, 0.93]	83.68 [0.82, 0.85]	79.03 [0.78, 0.80]
Spec 	58.43 [0.57, 0.60]	53.28 [0.52, 0.55]	22.38 [0.21, 0.24]	28.09 [0.27, 0.30]	46.47 [0.45, 0.48]
PPV  	57.26 [0.56, 0.59]	55.63 [0.54, 0.57]	52.00 [0.51, 0.53]	51.58 [0.50, 0.53]	57.48 [0.56, 0.59]
NPV  	61.97 [0.60, 0.64]	61.76 [0.60, 0.63]	75.00 [0.72, 0.78]	65.27 [0.63, 0.68]	70.77 [0.69, 0.73]
F1   	59.00             	59.51             	66.41             	63.82             	66.55             
NTP  	29.08