Sepsis is a life-threatening condition that arises when the body's response to an infection becomes dysregulated, leading to widespread inflammation, organ dysfunction, and a risk of death. Analyzing and modeling sepsis is therefore important in healthcare, because early identification and timely intervention can markedly improve patient outcomes.

### Column descriptions

These columns contain various measurements, demographic information, and indicators related to the presence of sepsis.

ID : Unique number to represent patient ID

PRG : Plasma glucose

PL : Blood Work Result-1 (mu U/ml)

PR : Blood Pressure (mm Hg)

SK : Blood Work Result-2 (mm)

TS : Blood Work Result-3 (mu U/ml)

M11 : Body mass index (weight in kg/(height in m)^2)

BD2 : Blood Work Result-4 (mu U/ml)

Age : Patient's age (years)

Insurance : Whether the patient holds a valid insurance card

Sepsis : Positive if a patient in the ICU will develop sepsis, Negative otherwise
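The column list above can double as a lightweight schema check when the CSV files are loaded. The sketch below is an illustrative addition, not part of the original notebook: `EXPECTED_COLUMNS` and `missing_columns` are hypothetical names, and the one-row `demo` frame uses invented values purely to show the check working.

```python
import pandas as pd

# Expected raw column names, taken from the column descriptions.
EXPECTED_COLUMNS = ["ID", "PRG", "PL", "PR", "SK", "TS", "M11", "BD2",
                    "Age", "Insurance", "Sepsis"]


def missing_columns(df: pd.DataFrame, expected=EXPECTED_COLUMNS) -> list:
    """Return the expected columns that are absent from df (empty list if none)."""
    return [col for col in expected if col not in df.columns]


# Invented one-row frame for illustration only; not real patient data.
demo = pd.DataFrame({"ID": ["demo_1"], "PRG": [6], "PL": [148], "PR": [72],
                     "SK": [35], "TS": [0], "M11": [33.6], "BD2": [0.627],
                     "Age": [50], "Insurance": [0], "Sepsis": ["Positive"]})

print(missing_columns(demo))                          # []
print(missing_columns(demo.drop(columns="Sepsis")))   # ['Sepsis']
```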

### Business Questions

1. What are the contributing factors to the onset of sepsis in the provided dataset?

2. Is there a significant difference in the age distribution between patients who tested positive for sepsis and those who did not?

3. Is there any pattern or association between insurance coverage and the occurrence of sepsis?

4. Are there noteworthy differences in the prevalence of sepsis across levels of specific medical parameters?

5. How are various medical measurements (e.g., pregnancy, blood pressure, glucose levels) correlated with the presence of sepsis?

6. Is there any link between the presence of sepsis and specific demographic factors like gender or age?

7. Is it possible to investigate the association between sepsis and other variables such as BMI, diabetes pedigree function, or the length of hospital stay?
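Questions 2 and 3 map onto two standard pandas operations: a per-group summary of `Age` and a row-normalised cross-tabulation of `Insurance` against `Sepsis`. A minimal sketch, assuming the raw column names described above and using an invented eight-row toy frame in place of `train_df`:

```python
import pandas as pd

# Hypothetical toy data standing in for train_df; all values are invented.
df = pd.DataFrame({
    "Age":       [25, 40, 55, 60, 33, 47, 70, 29],
    "Insurance": [1, 0, 1, 1, 0, 1, 0, 1],
    "Sepsis":    ["Negative", "Positive", "Positive", "Positive",
                  "Negative", "Negative", "Positive", "Negative"],
})

# Question 2: summary statistics of Age for each sepsis outcome.
age_by_sepsis = df.groupby("Sepsis")["Age"].describe()
print(age_by_sepsis[["count", "mean", "50%"]])

# Question 3: share of each sepsis outcome within each insurance group
# (normalize="index" makes every row sum to 1).
insurance_vs_sepsis = pd.crosstab(df["Insurance"], df["Sepsis"], normalize="index")
print(insurance_vs_sepsis)
```

Normalising by row lets insurance groups of different sizes be compared directly. Whether an observed gap is significant still needs a formal statistical test.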

### Hypothesis

##### Null Hypothesis (H0): The features PRG, PL, PR, SK, TS, M11, BD2, Age, and Insurance have no predictive significance for the presence or absence of sepsis.

##### Alternative Hypothesis (H1): The features PRG, PL, PR, SK, TS, M11, BD2, Age, and Insurance have predictive significance for the presence or absence of sepsis.
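One simple way to probe H0 for a single numeric feature is a permutation test: if the feature carries no information about sepsis, shuffling the Positive/Negative labels should produce mean differences as large as the observed one fairly often. The sketch below is one such test written with NumPy only; `permutation_p_value` is a hypothetical helper and the toy arrays are invented. It checks features one at a time rather than their joint predictive power, and running it across all nine features would also call for a multiple-comparison correction.

```python
import numpy as np


def permutation_p_value(values, labels, n_permutations=10_000, seed=0):
    """Two-sided permutation test for a difference in group means.

    values: 1-D array holding one feature (e.g. Age).
    labels: 1-D boolean array, True where Sepsis is Positive.
    Both groups must be non-empty.
    Returns the share of label shuffles whose absolute mean difference is at
    least as large as the observed one (with +1 smoothing, so never 0).
    """
    values = np.asarray(values, dtype=float)
    labels = np.asarray(labels, dtype=bool)
    rng = np.random.default_rng(seed)

    observed = abs(values[labels].mean() - values[~labels].mean())
    count = 0
    for _ in range(n_permutations):
        shuffled = rng.permutation(labels)
        diff = abs(values[shuffled].mean() - values[~shuffled].mean())
        if diff >= observed:
            count += 1
    return (count + 1) / (n_permutations + 1)


# Invented toy data: the positive group has clearly higher values.
labels = np.array([False] * 10 + [True] * 10)
separated = np.r_[np.zeros(10), np.full(10, 10.0)]
p_sep = permutation_p_value(separated, labels, n_permutations=2000)

# A feature identical across groups gives no evidence against H0.
p_same = permutation_p_value(np.ones(20), labels, n_permutations=2000)
print(p_sep, p_same)  # p_sep is small, p_same is 1.0
```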

### IMPORTATION OF LIBRARIES & PACKAGES

```python
# Data handling
import pandas as pd
import numpy as np

# Visualisation
import seaborn as sns
import matplotlib.pyplot as plt
import plotly.express as px
```

### DATA HANDLING

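```python
# Load the training and evaluation (test) datasets.
# Forward slashes work on Windows, macOS and Linux and avoid the invalid "\P" escape.
train_df = pd.read_csv('data/Paitients_Files_Train.csv')
test_df = pd.read_csv('data/Paitients_Files_Test.csv')
```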

### EXPLORATORY DATA ANALYSIS

```python
# Preview the first five rows of each dataset
print(train_df.head())
print(test_df.head())
```
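Before modeling, it is worth checking how balanced the `Sepsis` target is, since a heavily skewed split affects both the choice of evaluation metric and the baseline accuracy. A minimal sketch with pandas `value_counts`, using a short invented Series as a stand-in for `train_df["Sepsis"]`:

```python
import pandas as pd

# Invented stand-in for train_df["Sepsis"]; swap in the real column.
sepsis = pd.Series(["Negative", "Positive", "Negative", "Negative", "Positive"])

counts = sepsis.value_counts()                 # absolute count per class
shares = sepsis.value_counts(normalize=True)   # proportions that sum to 1
print(pd.DataFrame({"count": counts, "share": shares}))
```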

```python
# Rename the abbreviated columns to descriptive headers
new_column_names = {
    'PRG': 'Plasma_Glucose',
    'PL': 'Blood_Work_Result1',
    'PR': 'Blood_Pressure',
    'SK': 'Blood_Work_Result2',
    'TS': 'Blood_Work_Result3',
    'M11': 'Body_mass_index',
    'BD2': 'Blood_Work_Result4',
}

train_df.rename(columns=new_column_names, inplace=True)
test_df.rename(columns=new_column_names, inplace=True)
```
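A quick way to confirm the rename worked, and to prepare the target for modeling, is sketched below. It runs on an invented two-row toy frame rather than the real data, and the `Sepsis_flag` column, encoding Positive as 1 and Negative as 0, is a hypothetical addition for illustration, not a step from the original notebook.

```python
import pandas as pd

# Same mapping as the rename cell, repeated so this sketch is self-contained.
new_column_names = {
    'PRG': 'Plasma_Glucose', 'PL': 'Blood_Work_Result1', 'PR': 'Blood_Pressure',
    'SK': 'Blood_Work_Result2', 'TS': 'Blood_Work_Result3',
    'M11': 'Body_mass_index', 'BD2': 'Blood_Work_Result4',
}

# Invented two-row frame with the raw headers (not real patient data).
toy = pd.DataFrame({
    "PRG": [1, 2], "PL": [100, 120], "PR": [70, 80], "SK": [20, 30],
    "TS": [0, 50], "M11": [25.0, 31.5], "BD2": [0.3, 0.6],
    "Age": [30, 52], "Insurance": [1, 0], "Sepsis": ["Negative", "Positive"],
})
renamed = toy.rename(columns=new_column_names)

# Any raw abbreviation still present means the mapping missed a column.
leftover = set(new_column_names) & set(renamed.columns)
print(leftover)  # set()

# Hypothetical binary target: Positive -> 1, Negative -> 0.
# Labels not in the dict become NaN, which makes unexpected values easy to spot.
renamed["Sepsis_flag"] = renamed["Sepsis"].map({"Positive": 1, "Negative": 0})
print(renamed[["Sepsis", "Sepsis_flag"]])
```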

```python
# Column dtypes, non-null counts and memory usage
train_df.info()
```
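`info()` reports the non-null count for each column; the same information can be collected into a table sorted by how much is missing. A sketch, with `missing_summary` as a hypothetical helper and an invented demo frame. Note that `isna()` only flags NaN/None, so a placeholder value such as 0 in a measurement column would not be counted as missing.

```python
import numpy as np
import pandas as pd


def missing_summary(df: pd.DataFrame) -> pd.DataFrame:
    """Per-column count and share of NaN/None values, most-missing first."""
    counts = df.isna().sum()
    return (pd.DataFrame({"missing": counts, "share": counts / len(df)})
              .sort_values("missing", ascending=False))


# Invented example frame with a few gaps.
demo = pd.DataFrame({"Age": [30, np.nan, 45, 50],
                     "Blood_Pressure": [72, 80, np.nan, np.nan],
                     "Insurance": [1, 0, 1, 1]})
summary = missing_summary(demo)
print(summary)
```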