## A CLASSIFICATION PROJECT - THE SEPSIS CASE STUDY

#### BUSINESS UNDERSTANDING
Sepsis is a critical health condition with significant implications for patient outcomes and healthcare systems. Predicting and managing sepsis effectively can reduce mortality rates, improve patient recovery, and decrease healthcare costs. This project aims to improve the early detection and management of sepsis through a machine learning-based predictive model deployed as an API with FastAPI, enabling timely interventions and supporting clinical decision-making. By addressing a critical healthcare challenge, the project seeks to improve patient outcomes, optimize resource utilization, and provide decision support for healthcare providers. Success will be measured by the project's impact on clinical practice and its ability to deliver timely, accurate predictions in a real-world healthcare setting.

#### Business Objectives
1. Early Detection: Develop a predictive model to identify patients at risk of sepsis early, allowing for prompt intervention and treatment.

2. Reduce Mortality Rates: Use the predictive model to minimize the time to diagnosis and treatment, thereby reducing sepsis-related deaths.

3. Optimize Resource Utilization: Allocate medical resources more efficiently by identifying high-risk patients and reducing unnecessary testing and treatment for low-risk individuals.

4. Enhance Clinical Decision-Making: Provide healthcare professionals with reliable tools to support clinical decisions, improving patient care quality.

##### HYPOTHESIS
NULL HYPOTHESIS: There is no significant relationship between sepsis and PRG (plasma glucose).

ALTERNATE HYPOTHESIS: There is a significant relationship between sepsis and PRG (plasma glucose).
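
One possible way to test this hypothesis is a two-sided permutation test on the difference in mean PRG between the two sepsis classes. The sketch below is an assumption about how the test could be run, not the method used in this notebook: the function name `permutation_test_mean_diff` and its parameters are hypothetical, and the small `sample` frame contains only the rows visible in the dataset preview, so its p-value is illustrative rather than a result.

```python
import numpy as np
import pandas as pd


def permutation_test_mean_diff(df, feature="PRG", target="Sepssis",
                               positive="Positive", n_permutations=10_000, seed=0):
    """Two-sided permutation test for a difference in group means.

    Hypothetical helper: shuffles the class labels to estimate how often a
    mean difference at least as large as the observed one arises by chance.
    """
    values = df[feature].to_numpy(dtype=float)
    is_positive = (df[target] == positive).to_numpy()

    # Observed difference: mean(feature | Positive) - mean(feature | Negative)
    observed = values[is_positive].mean() - values[~is_positive].mean()

    rng = np.random.default_rng(seed)
    at_least_as_extreme = 0
    for _ in range(n_permutations):
        shuffled = rng.permutation(is_positive)  # break any feature/label link
        diff = values[shuffled].mean() - values[~shuffled].mean()
        if abs(diff) >= abs(observed):
            at_least_as_extreme += 1

    # Add-one correction keeps the estimated p-value strictly above zero.
    p_value = (at_least_as_extreme + 1) / (n_permutations + 1)
    return observed, p_value


# Illustrative data: only the rows shown in the dataset preview.
sample = pd.DataFrame({
    "PRG": [6, 1, 8, 1, 0, 6, 0, 0, 1],
    "Sepssis": ["Positive", "Negative", "Positive", "Negative", "Positive",
                "Negative", "Positive", "Negative", "Negative"],
})
observed, p_value = permutation_test_mean_diff(sample, n_permutations=2_000)
print(f"mean difference = {observed:.2f}, p = {p_value:.3f}")
```

On the full training set, the null hypothesis would be rejected if the p-value falls below the chosen significance level (commonly 0.05).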

#### ANALYTICAL QUESTIONS
1. How does the distribution of plasma glucose (PRG) differ between patients who develop sepsis and those who don't?

2. What is the correlation between blood pressure (PR) and the likelihood of sepsis development?

3. Are there any noticeable differences in body mass index (M11) between patients with and without sepsis?

4. How does age vary between patients who develop sepsis and those who don't?

5. Is there a pattern in the blood work results (PL, SK, TS, BD2) that distinguishes patients with sepsis from those without?
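
Most of these questions compare a numeric feature across the two sepsis classes, so they could be approached with per-class summary statistics and a point-biserial correlation (Pearson correlation against a 0/1 encoding of the target). The helpers below are a minimal sketch under that assumption: `summarize_by_sepsis` and `correlation_with_sepsis` are hypothetical names, and the `sample` frame holds only the rows visible in the dataset preview.

```python
import pandas as pd


def summarize_by_sepsis(df, features, target="Sepssis"):
    """Mean, median and standard deviation of each feature per sepsis class."""
    return df.groupby(target)[features].agg(["mean", "median", "std"])


def correlation_with_sepsis(df, feature, target="Sepssis", positive="Positive"):
    """Point-biserial correlation: Pearson r between a feature and a 0/1 target."""
    is_positive = (df[target] == positive).astype(int)
    return df[feature].corr(is_positive)


# Illustrative data: only the rows shown in the dataset preview.
sample = pd.DataFrame({
    "PRG": [6, 1, 8, 1, 0, 6, 0, 0, 1],
    "PR": [72, 66, 64, 66, 40, 72, 82, 76, 24],
    "Age": [50, 31, 32, 21, 33, 34, 22, 46, 21],
    "Sepssis": ["Positive", "Negative", "Positive", "Negative", "Positive",
                "Negative", "Positive", "Negative", "Negative"],
})

summary = summarize_by_sepsis(sample, ["PRG", "Age"])  # questions 1 and 4
r = correlation_with_sepsis(sample, "PR")              # question 2
print(summary)
print(f"PR vs. sepsis correlation: {r:.3f}")
```

The same `summarize_by_sepsis` call with `["M11"]` or `["PL", "SK", "TS", "BD2"]` would cover questions 3 and 5.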

#### DATA UNDERSTANDING

#### Load the necessary packages

In [None]:
import pandas as pd
import numpy as np
import seaborn as sns



#### Load the datasets

In [9]:
df = pd.read_csv("Datasets/Paitients_Files_Train.csv")
df

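| | ID | PRG | PL | PR | SK | TS | M11 | BD2 | Age | Insurance | Sepssis |
|---|---|---|---|---|---|---|---|---|---|---|---|
| 0 | ICU200010 | 6 | 148 | 72 | 35 | 0 | 33.6 | 0.627 | 50 | 0 | Positive |
| 1 | ICU200011 | 1 | 85 | 66 | 29 | 0 | 26.6 | 0.351 | 31 | 0 | Negative |
| 2 | ICU200012 | 8 | 183 | 64 | 0 | 0 | 23.3 | 0.672 | 32 | 1 | Positive |
| 3 | ICU200013 | 1 | 89 | 66 | 23 | 94 | 28.1 | 0.167 | 21 | 1 | Negative |
| 4 | ICU200014 | 0 | 137 | 40 | 35 | 168 | 43.1 | 2.288 | 33 | 1 | Positive |
| ... | ... | ... | ... | ... | ... | ... | ... | ... | ... | ... | ... |
| 594 | ICU200604 | 6 | 123 | 72 | 45 | 230 | 33.6 | 0.733 | 34 | 0 | Negative |
| 595 | ICU200605 | 0 | 188 | 82 | 14 | 185 | 32.0 | 0.682 | 22 | 1 | Positive |
| 596 | ICU200606 | 0 | 67 | 76 | 0 | 0 | 45.3 | 0.194 | 46 | 1 | Negative |
| 597 | ICU200607 | 1 | 89 | 24 | 19 | 25 | 27.8 | 0.559 | 21 | 0 | Negative |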
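
After loading, a quick overview of shape, missing values, duplicate rows, and class balance is a common first check. The sketch below is illustrative only: `quick_overview` is a hypothetical helper, and the CSV text is a five-row excerpt of the preview, read through `io.StringIO` so the example runs without the file on disk.

```python
import io

import pandas as pd


def quick_overview(df, target="Sepssis"):
    """Collect basic sanity checks for a freshly loaded frame."""
    return {
        "shape": df.shape,
        "missing_per_column": df.isna().sum().to_dict(),
        "duplicate_rows": int(df.duplicated().sum()),
        "class_share": df[target].value_counts(normalize=True).to_dict(),
    }


# Excerpt of the preview rows; the real data comes from the CSV file.
csv_text = """ID,PRG,PL,PR,SK,TS,M11,BD2,Age,Insurance,Sepssis
ICU200010,6,148,72,35,0,33.6,0.627,50,0,Positive
ICU200011,1,85,66,29,0,26.6,0.351,31,0,Negative
ICU200012,8,183,64,0,0,23.3,0.672,32,1,Positive
ICU200013,1,89,66,23,94,28.1,0.167,21,1,Negative
ICU200014,0,137,40,35,168,43.1,2.288,33,1,Positive
"""
sample = pd.read_csv(io.StringIO(csv_text))

overview = quick_overview(sample)
for name, value in overview.items():
    print(f"{name}: {value}")
```

Calling `quick_overview(df)` on the full frame loaded above would give the same checks for the training data.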


#### Data Description
Summary of the dataset attributes:

* ID: Patient identifier

* PRG: Number of pregnancies

* PL: Plasma glucose concentration

* PR: Diastolic blood pressure

* SK: Skinfold thickness

* TS: 2-Hour serum insulin

* M11: Body mass index

* BD2: Diabetes pedigree function

* Age: Age in years

* Insurance: Insurance status (binary: 0 or 1)

* Sepssis: Sepsis status (binary: Positive or Negative)
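
The attribute list above can be turned into a lightweight schema check before any modelling. This is a minimal sketch, not part of the original notebook: `validate_schema` and `EXPECTED_COLUMNS` are hypothetical names, and treating `ID` as unique per row is an assumption that the description does not state.

```python
import pandas as pd

# Column names as listed in the data description.
EXPECTED_COLUMNS = ["ID", "PRG", "PL", "PR", "SK", "TS", "M11", "BD2",
                    "Age", "Insurance", "Sepssis"]
NUMERIC_COLUMNS = ["PRG", "PL", "PR", "SK", "TS", "M11", "BD2", "Age", "Insurance"]


def validate_schema(df):
    """Return a list of human-readable problems; an empty list means no issues found."""
    missing = [col for col in EXPECTED_COLUMNS if col not in df.columns]
    if missing:
        return [f"missing columns: {missing}"]

    problems = []
    for col in NUMERIC_COLUMNS:
        if not pd.api.types.is_numeric_dtype(df[col]):
            problems.append(f"{col} is not numeric")
    if not df["Insurance"].isin([0, 1]).all():
        problems.append("Insurance contains values other than 0 or 1")
    if not df["Sepssis"].isin(["Positive", "Negative"]).all():
        problems.append("Sepssis contains values other than Positive or Negative")
    # Assumption: each patient identifier appears only once.
    if df["ID"].duplicated().any():
        problems.append("ID contains duplicates")
    return problems


# Two rows taken from the dataset preview, plus a deliberately broken copy.
good = pd.DataFrame({
    "ID": ["ICU200010", "ICU200011"],
    "PRG": [6, 1], "PL": [148, 85], "PR": [72, 66], "SK": [35, 29],
    "TS": [0, 0], "M11": [33.6, 26.6], "BD2": [0.627, 0.351],
    "Age": [50, 31], "Insurance": [0, 0],
    "Sepssis": ["Positive", "Negative"],
})
bad = good.assign(Insurance=[0, 2])

print(validate_schema(good))
print(validate_schema(bad))
```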