## P5-Sepsis-Prediction: Building a Classification Model to Predict Sepsis

### Business Understanding

##### `Project Overview`
The primary objective of the "P5 Sepsis Prediction" project is to develop a robust machine learning model that predicts the likelihood of sepsis in ICU patients. Sepsis is a life-threatening condition caused by the body's response to infection, and early detection is critical for patient survival. By embedding the predictive model in an API, we aim to provide healthcare providers with a powerful tool for real-time sepsis prediction, enhancing decision-making and patient outcomes.


##### `Project Goal`
Build a classification model to predict the likelihood of sepsis in ICU patients.


##### `Business Objectives`
- Early Detection of Sepsis: Improve patient outcomes by predicting sepsis early, enabling timely intervention and treatment.
- Enhanced Clinical Decision Support: Provide healthcare professionals with actionable insights through an easily integrated API, supporting clinical decisions in real time.
- Optimized Resource Allocation: Help healthcare facilities optimize the allocation of resources by identifying high-risk patients who may require intensive monitoring and care.


##### `Source of Data`
The dataset provided for this project is a modified version of a publicly available dataset from Johns Hopkins University, obtained via Kaggle. It includes various patient attributes and their corresponding sepsis status. The dataset is subject to strict usage restrictions and may be used only for the purpose of this assignment.


##### `Key Stakeholders`
- Healthcare Providers: Doctors, nurses, and other medical staff who will use the sepsis prediction API to make informed clinical decisions.
- Hospital Administrators: Individuals responsible for resource management and policy implementation in healthcare facilities.
- Data Scientists and Developers: Team members involved in the development, training, and deployment of the machine learning model and API.


##### `Success Criteria`
- Accuracy: The final model should obtain an accuracy of 85% or higher.
- Precision and Recall: The final model should maintain both Precision and Recall scores of 0.75 or above.
- F1 Score: The final model should attain an F1 score in the range of 0.75 to 0.85 or higher, in line with state-of-the-art (SOTA) models.
- Area Under the Receiver Operating Characteristic Curve (AUC-ROC): The final model should achieve an AUC-ROC score in the range of 0.85 to 0.90 or higher, in line with state-of-the-art (SOTA) sepsis prediction models.
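
The criteria above can be checked mechanically once a model produces predictions. The following is a minimal sketch in plain Python, not the project's actual evaluation code. The function names, the `THRESHOLDS` dictionary, and the choice of the lower bound of each stated range as the pass mark are assumptions made for illustration; labels are assumed to be encoded as 1 (sepsis) and 0 (no sepsis).

```python
def classification_metrics(y_true, y_pred):
    """Accuracy, precision, recall and F1 for binary labels (1 = sepsis)."""
    pairs = list(zip(y_true, y_pred))
    tp = sum(1 for t, p in pairs if t == 1 and p == 1)
    tn = sum(1 for t, p in pairs if t == 0 and p == 0)
    fp = sum(1 for t, p in pairs if t == 0 and p == 1)
    fn = sum(1 for t, p in pairs if t == 1 and p == 0)
    precision = tp / (tp + fp) if (tp + fp) else 0.0
    recall = tp / (tp + fn) if (tp + fn) else 0.0
    f1 = (2 * precision * recall / (precision + recall)
          if (precision + recall) else 0.0)
    return {
        "accuracy": (tp + tn) / len(pairs),
        "precision": precision,
        "recall": recall,
        "f1": f1,
    }


def roc_auc(y_true, y_score):
    """AUC-ROC as the probability that a random positive case is scored
    above a random negative case (ties count as half)."""
    pos = [s for t, s in zip(y_true, y_score) if t == 1]
    neg = [s for t, s in zip(y_true, y_score) if t == 0]
    wins = sum((p > n) + 0.5 * (p == n) for p in pos for n in neg)
    return wins / (len(pos) * len(neg))


# Assumed pass marks: the lower bound of each range in the success criteria.
THRESHOLDS = {"accuracy": 0.85, "precision": 0.75, "recall": 0.75,
              "f1": 0.75, "auc_roc": 0.85}


def check_success_criteria(y_true, y_pred, y_score):
    """Return {metric: (value, passed)} for every success criterion."""
    metrics = classification_metrics(y_true, y_pred)
    metrics["auc_roc"] = roc_auc(y_true, y_score)
    return {name: (value, value >= THRESHOLDS[name])
            for name, value in metrics.items()}


# Made-up labels and scores, for illustration only.
y_true = [1, 1, 0, 0, 1, 0]
y_pred = [1, 1, 0, 0, 0, 0]
y_score = [0.9, 0.8, 0.2, 0.1, 0.4, 0.3]
report = check_success_criteria(y_true, y_pred, y_score)
for name, (value, passed) in report.items():
    print(f"{name}: {value:.3f} ({'pass' if passed else 'fail'})")
```

In practice, scikit-learn's `accuracy_score`, `precision_score`, `recall_score`, `f1_score`, and `roc_auc_score` compute the same quantities; the hand-written versions here only make the definitions explicit.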


##### `Data Dictionary`

| Column Name       | Attribute/Target | Data Type | Description                                                                                 |
|-------------------|------------------|------------|---------------------------------------------------------------------------------------------|
| **ID**            | N/A              | Integer    | Unique identifier for each patient.                                                         |
| **PRG**           | Attribute        | Float      | Plasma glucose: Measurement of plasma glucose levels.                                       |
| **PL**            | Attribute        | Float      | Blood Work Result-1: Blood work result in mu U/ml.                                          |
| **PR**            | Attribute        | Float      | Blood Pressure: Measurement of blood pressure in mm Hg.                                     |
| **SK**            | Attribute        | Float      | Blood Work Result-2: Blood work result in mm.                                               |
| **TS**            | Attribute        | Float      | Blood Work Result-3: Blood work result in mu U/ml.                                          |
| **M11**           | Attribute        | Float      | Body Mass Index: BMI calculated as weight in kg/(height in m)^2.                            |
| **BD2**           | Attribute        | Float      | Blood Work Result-4: Blood work result in mu U/ml.                                          |
| **Age**           | Attribute        | Integer    | Age: Age of the patient in years.                                                           |
| **Insurance**     | N/A              | Boolean    | Insurance: Indicates whether the patient holds a valid insurance card.                      |
| **Sepsis**        | Target           | Boolean    | Sepsis: Target variable indicating whether the patient will develop sepsis (Positive) or not (Negative). |
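
As a rough illustration of how the data dictionary maps onto code, the sketch below parses rows into typed records and separates the attribute columns from the target. It uses only the standard library. The names `SCHEMA`, `FEATURES`, `TARGET`, and `load_records` are hypothetical, the two sample rows are invented, and the encodings of `Insurance` (0/1) and `Sepsis` ("Positive"/"Negative") are assumptions. `ID` is listed as an integer but is kept as raw text here, since it is only an identifier.

```python
import csv
import io

# One parser per column, following the data dictionary.
# Assumed encodings: Insurance as 0/1, Sepsis as "Positive"/"Negative".
SCHEMA = {
    "ID": str,  # identifier only, kept as raw text
    "PRG": float,
    "PL": float,
    "PR": float,
    "SK": float,
    "TS": float,
    "M11": float,
    "BD2": float,
    "Age": int,
    "Insurance": lambda v: v.strip() == "1",
    "Sepsis": lambda v: v.strip() == "Positive",
}

# Columns marked "Attribute" and "Target" in the data dictionary.
FEATURES = ["PRG", "PL", "PR", "SK", "TS", "M11", "BD2", "Age"]
TARGET = "Sepsis"


def load_records(handle):
    """Read CSV rows from a file-like object and cast each column."""
    reader = csv.DictReader(handle)
    return [{col: parse(row[col]) for col, parse in SCHEMA.items()}
            for row in reader]


# Two invented rows, for illustration only (not taken from the dataset).
SAMPLE_CSV = """ID,PRG,PL,PR,SK,TS,M11,BD2,Age,Insurance,Sepsis
1,2,110,70,20,80,24.5,0.3,40,1,Negative
2,5,160,85,30,120,31.0,0.6,58,0,Positive
"""

records = load_records(io.StringIO(SAMPLE_CSV))
X = [[rec[col] for col in FEATURES] for rec in records]  # model inputs
y = [rec[TARGET] for rec in records]                     # True = sepsis
```

`ID` and `Insurance` are marked N/A in the table, so they are parsed but left out of `FEATURES`.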


##### `Hypothesis Statement`
- `Null Hypothesis (Ho)`: There is no correlation between old age and an individual's likelihood of developing sepsis.
- `Alternative Hypothesis (Ha)`: There is a statistically significant correlation between old age and an individual's likelihood of developing sepsis.
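
One way such a hypothesis could be tested is with a permutation test on the correlation between age and a 0/1 sepsis indicator (Pearson's r on a binary variable is the point-biserial correlation). The sketch below is an illustration under stated assumptions, not the analysis used in this project: the function names are hypothetical, the ages and outcomes are invented, and the 0.05 significance level is an assumed convention.

```python
import random


def pearson(x, y):
    """Pearson correlation coefficient of two equal-length sequences."""
    n = len(x)
    mean_x, mean_y = sum(x) / n, sum(y) / n
    cov = sum((a - mean_x) * (b - mean_y) for a, b in zip(x, y))
    spread_x = sum((a - mean_x) ** 2 for a in x) ** 0.5
    spread_y = sum((b - mean_y) ** 2 for b in y) ** 0.5
    return cov / (spread_x * spread_y)


def permutation_test(ages, sepsis, n_permutations=5000, seed=0):
    """Two-sided permutation test of Ho: no correlation between age and sepsis.

    Shuffling the sepsis labels breaks any link with age, so the shuffled
    correlations approximate the distribution of r under Ho.
    """
    rng = random.Random(seed)
    observed = pearson(ages, sepsis)
    shuffled = list(sepsis)
    extreme = 0
    for _ in range(n_permutations):
        rng.shuffle(shuffled)
        if abs(pearson(ages, shuffled)) >= abs(observed):
            extreme += 1
    # Add-one correction keeps the estimated p-value strictly above zero.
    p_value = (extreme + 1) / (n_permutations + 1)
    return observed, p_value


# Invented data, for illustration only: 1 = developed sepsis, 0 = did not.
ages = [22, 25, 31, 38, 45, 52, 60, 67, 71, 80]
sepsis = [0, 0, 0, 0, 1, 0, 1, 1, 1, 1]

r, p_value = permutation_test(ages, sepsis)
ALPHA = 0.05  # assumed significance level
print(f"r = {r:.3f}, p = {p_value:.4f}, reject Ho: {p_value < ALPHA}")
```

If SciPy is available, `scipy.stats.pointbiserialr` gives the same coefficient together with a parametric p-value.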


##### `Analytical Questions`
1. Are elderly people at a higher risk of developing sepsis compared to younger individuals?
2. Are patients with a high BMI at a greater risk of developing sepsis?
3. Does having a valid insurance card reduce the likelihood of developing sepsis?
4. How does high blood pressure affect the likelihood of developing sepsis?
5. Can the likelihood of sepsis be predicted based on a patient's blood work results?
6. Does higher plasma glucose increase the likelihood of developing sepsis?
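
Several of these questions reduce to comparing the sepsis rate across groups of patients. The sketch below shows the idea for question 2, using the standard WHO BMI cut-offs (18.5, 25, 30) to form the groups. The helper names and the sample records are invented for illustration, and `Sepsis` is assumed to be already converted to a boolean.

```python
from collections import defaultdict


def bmi_band(bmi):
    """Map a BMI value (column M11) to a WHO weight category."""
    if bmi < 18.5:
        return "underweight"
    if bmi < 25:
        return "normal"
    if bmi < 30:
        return "overweight"
    return "obese"


def sepsis_rate_by_group(records, group_fn):
    """Return {group: share of patients with sepsis} for any grouping."""
    counts = defaultdict(lambda: [0, 0])  # group -> [positives, total]
    for rec in records:
        group = group_fn(rec)
        counts[group][0] += int(rec["Sepsis"])
        counts[group][1] += 1
    return {group: pos / total for group, (pos, total) in counts.items()}


# Invented records, for illustration only.
records = [
    {"M11": 22.1, "Sepsis": False},
    {"M11": 27.4, "Sepsis": False},
    {"M11": 33.6, "Sepsis": True},
    {"M11": 35.0, "Sepsis": False},
    {"M11": 24.0, "Sepsis": False},
    {"M11": 31.2, "Sepsis": True},
]

rates = sepsis_rate_by_group(records, lambda rec: bmi_band(rec["M11"]))
for band, rate in rates.items():
    print(f"{band}: {rate:.2f}")
```

The same helper can be reused for the other questions by swapping the grouping function, for example an age band for question 1 or the `Insurance` flag for question 3.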


### Data Understanding

#### Importations

```python
print("hello world")
```

```
hello world
```
