<p style="padding: 10px;
          color:#FFFFFF;
          font-weight: bold;
          text-align: center;
          background-color:#006400;
          font-size:260%;">
Early Prediction of Sepsis
     </p>

<b>Problem Statement:</b> <p> Sepsis is a life-threatening condition that occurs when the body's response to infection causes tissue damage, organ failure, or death (Singer et al., 2016). In the U.S., nearly 1.7 million people develop sepsis and 270,000 people die from sepsis each year; over one third of people who die in U.S. hospitals have sepsis (CDC).</p>

<p>Early detection and antibiotic treatment are critical for improving sepsis outcomes: each hour of delayed treatment has been associated with roughly a 4-8% increase in mortality.</p>

<p style="padding: 10px;
          color:#FFFFFF;
          font-weight: bold;
          text-align: center;
          background-color:#006400;
          font-size:260%;">
Importing Libraries
     </p>

In [None]:
#  import the necessary libraries and load the files needed for Exploratory Data Analysis

import pandas as pd  # data manipulation
import numpy as np   # linear algebra
import seaborn as sns 
import matplotlib.pyplot as plt # matplotlib for plotting graphs

# %matplotlib inline renders plot inline on your page
%matplotlib inline

In [None]:
# the %pip magic installs into the environment of the running kernel
%pip install hvplot

In [None]:
import hvplot.pandas
from scipy import stats

<p style="padding: 10px;
          color:#FFFFFF;
          font-weight: bold;
          text-align: center;
          background-color:#006400;
          font-size:260%;">
Loading Data
     </p>

In [None]:
# Read the CSV file and load it into a dataframe

# Location of the dataset file
file = "../input/prediction-of-sepsis/Dataset.csv"

df = pd.read_csv(file)

<p style="padding: 10px;
          color:#FFFFFF;
          font-weight: bold;
          text-align: center;
          background-color:#006400;
          font-size:260%;">
Overview of Data
     </p>

In [None]:
# By default, pandas truncates the display to a limited number of rows and columns.
# Set the options below so that all rows and columns are visible

pd.set_option("display.max_columns", None)
pd.set_option("display.max_rows", None)

In [None]:
# Number of rows and columns in the dataframe

df.shape

In [None]:
df.head()

In [None]:
df.tail()

In [None]:
# display datatype of each column
df.dtypes

In [None]:
# Summary statistics for all columns
# (the datetime_is_numeric argument was removed in pandas 2.0, so it is not passed here)

df.describe(include="all")

The `describe` command shows summary statistics such as the count, mean, minimum and maximum of every column.

The columns belong to the int, float, datetime and object datatypes.

<p style="padding: 10px;
          color:#FFFFFF;
          font-weight: bold;
          text-align: center;
          background-color:#006400;
          font-size:260%;">
Analysis of Missing Values
     </p>

In [None]:
round(100*(df.isnull().sum()/len(df.index)),2).plot.bar(figsize=(15,5))

### For all lab values, more than 80% of the data are missing
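To read the 80% figure off as numbers rather than from the bar heights, the same percentage can be filtered by a threshold. This is a minimal sketch on a made-up toy frame; `toy`, `missing_pct` and `mostly_missing` are illustrative names and the values are not taken from the real dataset.

```python
import numpy as np
import pandas as pd

# Toy frame standing in for the real data (values are invented for illustration)
toy = pd.DataFrame({
    "HR": [80.0] * 9 + [np.nan],        # 1 of 10 rows missing
    "Lactate": [np.nan] * 9 + [1.2],    # 9 of 10 rows missing
    "WBC": [np.nan] * 10,               # all rows missing
})

# isnull().mean() is the fraction of missing values per column
missing_pct = (toy.isnull().mean() * 100).round(2)

# keep only the columns where more than 80% of the rows are missing
mostly_missing = missing_pct[missing_pct > 80].sort_values(ascending=False)
print(mostly_missing)
```

On the real dataframe, replacing `toy` with `df` would list the columns behind the tallest bars of the plot.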

The dataset is also severely imbalanced. We could rebalance it (oversampling or undersampling) or proceed without balancing,
but we chose to select only the patients who contracted sepsis before or after admission to the ICU.
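The degree of imbalance depends on whether it is measured per hourly row or per patient. The sketch below shows both views on a small invented frame; it assumes one row per patient-hour with `Patient_ID` and `SepsisLabel` columns, and the numbers are illustrative only.

```python
import pandas as pd

# Toy data: 4 patients, only patient 1 ever has SepsisLabel == 1
toy = pd.DataFrame({
    "Patient_ID": [1, 1, 1, 2, 2, 3, 3, 3, 4, 4],
    "SepsisLabel": [0, 0, 1, 0, 0, 0, 0, 0, 0, 0],
})

# share of rows (patient-hours) per label
row_share = toy["SepsisLabel"].value_counts(normalize=True)

# a patient counts as septic if any of their rows is labelled 1
patient_label = toy.groupby("Patient_ID")["SepsisLabel"].max()
patient_share = patient_label.value_counts(normalize=True)

print(row_share)
print(patient_share)
```

In this toy example the septic share is 10% of rows but 25% of patients, so the two views can differ considerably.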

<p style="padding: 10px;
          color:#FFFFFF;
          font-weight: bold;
          text-align: center;
          background-color:#006400;
          font-size:260%;">
Data Preprocessing
     </p>

<p style="padding: 10px;
          color:#000000;
          font-weight: bold;
          text-align: center;
          background-color:#FFFFFF;
          font-size:150%;">
Divide patients into 3 types - NonSepsis, SepsisAfterAdm, SepsisBeforeAdm
     </p>

In [None]:
# get the list of patients who have sepsis
septic_shock_patients=df['Patient_ID'][df['SepsisLabel']==1].unique()

In [None]:
# construct a dataframe with all rows of the patients who have sepsis at some point
septic_df=df[df.Patient_ID.isin(septic_shock_patients)]

In [None]:
# get the list of patients who have sepsis before admission to ICU
admitted_with_sepsis_patients=df['Patient_ID'][(df['SepsisLabel']==1) & (df['Hour']==0)]

In [None]:
# construct dataframe of patients who have sepsis before admission to ICU
admitted_with_sepsis_df=df[df.Patient_ID.isin(admitted_with_sepsis_patients)]

In [None]:
# construct dataframe of patients who have sepsis after admission to ICU
sepsis_after_adm_df=septic_df.merge(admitted_with_sepsis_df, how = 'outer' ,indicator=True).loc[lambda x : x['_merge']=='left_only']

In [None]:
# construct dataframe of patients who have no sepsis
non_septic_df = df.merge(septic_df, how = 'outer' ,indicator=True).loc[lambda x : x['_merge']=='left_only']
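The two merges above act as anti-joins: they keep the rows of the left dataframe that have no match in the right one. Because the split is defined by `Patient_ID` alone, a boolean mask with `isin` is a possible lighter alternative. The sketch below compares both on an invented toy frame; `toy`, `via_merge` and `via_mask` are illustrative names.

```python
import pandas as pd

# Toy data: patients 1 and 2 become septic, patient 3 never does
toy = pd.DataFrame({
    "Patient_ID": [1, 1, 2, 2, 3, 3],
    "Hour": [0, 1, 0, 1, 0, 1],
    "SepsisLabel": [1, 1, 0, 1, 0, 0],
})
septic_ids = toy.loc[toy["SepsisLabel"] == 1, "Patient_ID"].unique()
septic = toy[toy["Patient_ID"].isin(septic_ids)]

# anti-join via merge: rows of `toy` that do not appear in `septic`
via_merge = (
    toy.merge(septic, how="outer", indicator=True)
       .loc[lambda x: x["_merge"] == "left_only"]
       .drop(columns="_merge")
)

# anti-join via a boolean mask on the patient identifier
via_mask = toy[~toy["Patient_ID"].isin(septic_ids)]

print(via_merge)
print(via_mask)
```

The mask version does not compare every column and does not leave a `_merge` column behind, which may matter on a dataframe with many rows and many missing values.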

In [None]:
# add a new, empty column sepsisType (object dtype, so the string labels assigned below fit without a dtype change)
df['sepsisType']=None

In [None]:
# update sepsisType to SepsisBeforeAdm for patients who were admitted to ICU with sepsis
df.loc[df.Patient_ID.isin(admitted_with_sepsis_patients), 'sepsisType'] = 'SepsisBeforeAdm'

In [None]:
# update sepsisType to SepsisAfterAdm for patients who contracted sepsis after admission to ICU
df.loc[df.Patient_ID.isin(septic_shock_patients) & df['sepsisType'].isnull(), 'sepsisType'] = 'SepsisAfterAdm'

In [None]:
# update sepsisType to NonSepsis for patients who never got Sepsis
df.loc[df['sepsisType'].isnull(), 'sepsisType'] = 'NonSepsis'

In [None]:
df['sepsisType'].value_counts()

In [None]:
df['sepsisType'].value_counts().plot.bar()
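The three-way labelling can also be written in one step with `np.select`, which picks the first matching condition for each row. This is a hedged sketch on invented data, not a replacement for the cells above; it assumes the same rule (sepsis label at `Hour == 0` means sepsis before admission).

```python
import numpy as np
import pandas as pd

# Toy data: patient 1 is septic at hour 0, patient 2 becomes septic later, patient 3 never
toy = pd.DataFrame({
    "Patient_ID": [1, 1, 2, 2, 3, 3],
    "Hour": [0, 1, 0, 1, 0, 1],
    "SepsisLabel": [1, 1, 0, 1, 0, 0],
})

# True on every row of a patient who has SepsisLabel == 1 at any hour
ever_septic = toy.groupby("Patient_ID")["SepsisLabel"].transform("max").eq(1)

# True on every row of a patient who is already septic at hour 0
hour0_ids = toy.loc[(toy["SepsisLabel"] == 1) & (toy["Hour"] == 0), "Patient_ID"]
septic_on_adm = toy["Patient_ID"].isin(hour0_ids)

# conditions are checked in order, so SepsisBeforeAdm takes priority
toy["sepsisType"] = np.select(
    [septic_on_adm, ever_septic],
    ["SepsisBeforeAdm", "SepsisAfterAdm"],
    default="NonSepsis",
)

# one label per patient, for counting patients rather than hourly rows
per_patient = toy.drop_duplicates("Patient_ID").set_index("Patient_ID")["sepsisType"]
print(per_patient.value_counts())
```

Note that `value_counts()` on the full dataframe counts hourly rows, so patients with longer stays weigh more; counting on `per_patient` gives the number of patients in each group.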

In [None]:
df.head()

In [None]:
# total number of patients
len(pd.unique(df['Patient_ID']))

In [None]:
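# total number of patients who developed sepsis after admission
# (print is used because a cell only displays its last expression)
print(len(pd.unique(sepsis_after_adm_df['Patient_ID'])))

# total number of non-septic patients
print(len(pd.unique(non_septic_df['Patient_ID'])))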

In [None]:
# total number of septic patients
len(pd.unique(septic_df['Patient_ID']))

In [None]:
# total number of patients who already had sepsis at admission
len(pd.unique(admitted_with_sepsis_df['Patient_ID']))

<p style="padding: 10px;
          color:#000000;
          font-weight: bold;
          text-align: center;
          background-color:#FFFFFF;
          font-size:150%;">
Calculate SIRS
     </p>

In [None]:
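# rows where none of the four SIRS inputs (Temp, HR, Resp, WBC) was recorded
condition=(np.isnan(df['Temp'])& np.isnan(df['HR'])& np.isnan(df['Resp']) &np.isnan(df['WBC']))

In [None]:
# hasSIRS is 1 when at least one SIRS input is available and 0 when all four are missing
df['hasSIRS'] = np.where(condition, 0, 1)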

In [None]:
df['hasSIRS'].value_counts()

In [None]:
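# SIRS criteria; a comparison against NaN is False, so a missing value never meets a criterion
condition_temp=(df['Temp']>38) | (df['Temp']<36)
condition_HR=(df['HR']>90)
condition_Resp=(df['Resp']>20) | (df['PaCO2']<32)
# assumption: WBC is reported in 10^3 cells/µL (as in the PhysioNet 2019 challenge data),
# so the usual cut-offs of 12,000 and 4,000 cells/µL become 12 and 4
condition_wbc=(df['WBC']>12) | (df['WBC']<4)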

In [None]:
# each criterion that is met adds one point, giving a score from 0 to 4
df['SIRS_Score'] = (condition_temp.astype(int) + condition_HR.astype(int)
                    + condition_Resp.astype(int) + condition_wbc.astype(int))

In [None]:
df['SIRS_Score'].value_counts()

In [None]:
df['SIRS_Score'].value_counts().plot.bar()
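The scoring logic can be checked on a few hand-made rows. The sketch below wraps the four criteria in a hypothetical helper `sirs_score`; it assumes WBC is given in 10^3 cells/µL (cut-offs 12 and 4), so the thresholds would need scaling by 1000 if the column held raw cell counts. All values are invented.

```python
import numpy as np
import pandas as pd

def sirs_score(frame: pd.DataFrame) -> pd.Series:
    """Count how many of the four SIRS criteria each row meets (0 to 4)."""
    temp = (frame["Temp"] > 38) | (frame["Temp"] < 36)
    hr = frame["HR"] > 90
    resp = (frame["Resp"] > 20) | (frame["PaCO2"] < 32)
    # assumption: WBC in 10^3 cells/µL
    wbc = (frame["WBC"] > 12) | (frame["WBC"] < 4)
    return temp.astype(int) + hr.astype(int) + resp.astype(int) + wbc.astype(int)

toy = pd.DataFrame({
    "Temp":  [38.5, 37.0, np.nan],   # fever, normal, not measured
    "HR":    [110, 80, np.nan],
    "Resp":  [24, 16, np.nan],
    "PaCO2": [np.nan, 40, np.nan],
    "WBC":   [15.0, 8.0, np.nan],
})
toy["SIRS_Score"] = sirs_score(toy)
print(toy)
```

The third row has no measurements at all and still scores 0, the same as the healthy second row. A score of 0 therefore means "no criterion observed" rather than "all criteria ruled out", which is worth keeping in mind given how many lab values are missing.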

<p style="padding: 10px;
          color:#000000;
          font-weight: bold;
          text-align: center;
          background-color:#FFFFFF;
          font-size:150%;">
Export CSV with SepsisType and SIRS Score
     </p>

In [None]:
#df.to_csv('sepsistype_updated_df.csv')