<a href="https://colab.research.google.com/github/DavidCastro88/PredictDeathCovid-19/blob/main/Predictive_Mortality_Risk_Model.ipynb" target="_parent"><img src="https://colab.research.google.com/assets/colab-badge.svg" alt="Open In Colab"/></a>

# ***Predictive Mortality Risk Model: Analysis of Symptoms and Medical Data in Patients with COVID-19***

### **Objective:**

The goal of this project is to build a machine learning model that estimates, from the symptoms, current condition and medical history of a patient with COVID-19, the probability that the patient is at significantly elevated risk of death. The intent is to provide a tool for proactive, patient-specific assessment of mortality risk. The approach seeks not only to diagnose the presence of the virus, but also to anticipate the severity of the patient's condition, which can support medical decisions and more efficient allocation of resources.

In [1]:
# Common libraries
import pandas as pd
import matplotlib.pyplot as plt
import numpy as np
import seaborn as sns

# Statistics & Mathematics
import scipy.stats as stats
import statsmodels.api as sm

# Preprocessing data
from sklearn.preprocessing import MinMaxScaler,StandardScaler

# Model selection (train/test splitting)
from sklearn.model_selection import train_test_split

# Machine Learning metrics
from sklearn import metrics

# ML classifiers
from sklearn.ensemble import (
    HistGradientBoostingClassifier, AdaBoostClassifier,
    RandomForestClassifier, GradientBoostingClassifier,
    StackingClassifier, VotingClassifier
    )
from sklearn.neighbors import KNeighborsClassifier
from sklearn.linear_model import LogisticRegression

# ML regressors
from sklearn.linear_model import LinearRegression, Ridge, Lasso, ElasticNet
from sklearn.tree import DecisionTreeRegressor
from sklearn.ensemble import RandomForestRegressor, GradientBoostingRegressor, AdaBoostRegressor
from sklearn.svm import SVR
from sklearn.neighbors import KNeighborsRegressor
from sklearn.neural_network import MLPRegressor

# Classification metrics
from sklearn.metrics import (accuracy_score, precision_score, recall_score, f1_score,
                             confusion_matrix, classification_report, roc_auc_score, roc_curve,
                             precision_recall_curve, average_precision_score
)

# Regression metrics
from sklearn.metrics import (mean_squared_error, mean_absolute_error, r2_score,
                             explained_variance_score, max_error, mean_poisson_deviance, mean_gamma_deviance
                             )
# Randomizer
import random

# Encoder of categorical variables
from sklearn.preprocessing import OrdinalEncoder, OneHotEncoder

## **1. Historical data collection**


In [2]:
data = pd.read_csv('https://raw.githubusercontent.com/DavidCastro88/PredictDeathCovid-19/main/Covid_Data.csv', sep=',', decimal='.')

In [3]:
data.head()

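| | USMER | MEDICAL_UNIT | SEX | PATIENT_TYPE | DATE_DIED | INTUBED | PNEUMONIA | AGE | PREGNANT | DIABETES | ... | ASTHMA | INMSUPR | HIPERTENSION | OTHER_DISEASE | CARDIOVASCULAR | OBESITY | RENAL_CHRONIC | TOBACCO | CLASIFFICATION_FINAL | ICU |
|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|
| 0 | 2 | 1 | 1 | 1 | 03/05/2020 | 97 | 1 | 65 | 2 | 2 | ... | 2 | 2 | 1 | 2 | 2 | 2 | 2 | 2 | 3 | 97 |
| 1 | 2 | 1 | 2 | 1 | 03/06/2020 | 97 | 1 | 72 | 97 | 2 | ... | 2 | 2 | 1 | 2 | 2 | 1 | 1 | 2 | 5 | 97 |
| 2 | 2 | 1 | 2 | 2 | 09/06/2020 | 1 | 2 | 55 | 97 | 1 | ... | 2 | 2 | 2 | 2 | 2 | 2 | 2 | 2 | 3 | 2 |
| 3 | 2 | 1 | 1 | 1 | 12/06/2020 | 97 | 2 | 53 | 2 | 2 | ... | 2 | 2 | 2 | 2 | 2 | 2 | 2 | 2 | 7 | 97 |
| 4 | 2 | 1 | 2 | 1 | 21/06/2020 | 97 | 2 | 68 | 97 | 1 | ... | 2 | 2 | 1 | 2 | 2 | 2 | 2 | 2 | 3 | 97 |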


The dataset was provided by the Mexican government ([Link](https://datos.gob.mx/busca/dataset/informacion-referente-a-casos-covid-19-en-mexico)). It contains a large amount of anonymized patient-level information, including pre-existing conditions. The raw dataset consists of 21 features and 1,048,576 patients.

### ***Explanation of variables***

In the Boolean features, 1 means "yes" and 2 means "no"; the values 97 and 99 denote missing data.

USMER: Indicates whether the patient was treated in a first-, second- or third-level medical unit (1, 2 or 3).

MEDICAL_UNIT: Type of institution of the National Health System that provided the care. (1-13)

SEX: 1 for female, 2 for male.

PATIENT_TYPE: Type of care the patient received in the unit: 1 for returned home, 2 for hospitalization.

DATE_DIED: The date of death if the patient died, and 9999-99-99 otherwise.

INTUBED: Whether the patient was connected to a ventilator.

PNEUMONIA: Whether the patient already has inflammation of the air sacs (pneumonia).

AGE: Age of the patient.

PREGNANT: Whether the patient is pregnant.

DIABETES: Whether the patient has diabetes.

COPD: Whether the patient has chronic obstructive pulmonary disease.

ASTHMA: Whether the patient has asthma or not.

INMSUPR: Whether the patient is immunosuppressed or not.

HIPERTENSION: Whether the patient has hypertension or not.

OTHER_DISEASE: Whether the patient has another disease.

CARDIOVASCULAR: Whether the patient has a disease of the heart or blood vessels.

OBESITY: Whether the patient is obese or not.

RENAL_CHRONIC: Whether the patient has chronic renal disease or not.

TOBACCO: Whether the patient is a tobacco user.

CLASIFFICATION_FINAL: COVID test result. Values 1-3 mean that the patient was diagnosed with COVID in different degrees; 4 or higher means that the patient is not a carrier of COVID or that the test is inconclusive.

ICU: Whether the patient was admitted to an intensive care unit.
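
The encodings described above suggest how a mortality label could be derived. The following is a minimal sketch on a toy frame, not the notebook's actual preprocessing: the column names `DIED` and `COVID_POSITIVE` and the helper `add_targets` are hypothetical, and it assumes survivors are marked with `9999-99-99` exactly as stated in the variable list.

```python
import pandas as pd

# Value of DATE_DIED for patients who did not die (per the variable list above)
NO_DEATH_SENTINEL = "9999-99-99"


def add_targets(df: pd.DataFrame) -> pd.DataFrame:
    """Return a copy with two hypothetical binary columns added."""
    out = df.copy()
    # DIED (hypothetical name): 1 if a real date of death is recorded, else 0
    out["DIED"] = (out["DATE_DIED"] != NO_DEATH_SENTINEL).astype(int)
    # COVID_POSITIVE (hypothetical name): 1 if CLASIFFICATION_FINAL is 1, 2 or 3
    out["COVID_POSITIVE"] = out["CLASIFFICATION_FINAL"].between(1, 3).astype(int)
    return out


# Toy rows that mimic the layout of the real dataset
toy = pd.DataFrame({
    "DATE_DIED": ["03/05/2020", "9999-99-99", "21/06/2020"],
    "CLASIFFICATION_FINAL": [3, 7, 5],
})
toy_with_targets = add_targets(toy)
print(toy_with_targets)
```

Keeping the two flags separate leaves open whether to train on all patients or only on confirmed cases.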




In [4]:
data['INTUBED'].unique()

array([97,  1,  2, 99])
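
The output confirms that `INTUBED` carries both missing-data codes. One possible way to handle them, sketched here on a toy frame, is to mask 97 and 99 as `NaN` in the Boolean-coded columns only. The helper `mask_missing` is hypothetical, and `AGE` is deliberately left out because 97 and 99 are plausible ages.

```python
import pandas as pd

# Codes that denote missing data in the Boolean-coded features
MISSING_CODES = [97, 99]


def mask_missing(df: pd.DataFrame, columns: list) -> pd.DataFrame:
    """Return a copy where the missing-data codes in `columns` become NaN."""
    out = df.copy()
    # mask() replaces values with NaN wherever the condition is True
    out[columns] = out[columns].mask(out[columns].isin(MISSING_CODES))
    return out


# Toy rows: INTUBED and ICU use the codes, AGE holds genuine ages
toy = pd.DataFrame({
    "INTUBED": [97, 1, 2, 99],
    "ICU": [97, 97, 2, 97],
    "AGE": [65, 97, 55, 99],
})
boolean_columns = ["INTUBED", "ICU"]
cleaned = mask_missing(toy, boolean_columns)

# Share of missing values per Boolean-coded column
missing_share = cleaned[boolean_columns].isna().mean()
print(missing_share)
```

Inspecting the share of missing values per column before modelling helps decide whether a feature should be imputed, recoded, or dropped.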

In [5]:
data.dtypes

USMER                    int64
MEDICAL_UNIT             int64
SEX                      int64
PATIENT_TYPE             int64
DATE_DIED               object
INTUBED                  int64
PNEUMONIA                int64
AGE                      int64
PREGNANT                 int64
DIABETES                 int64
COPD                     int64
ASTHMA                   int64
INMSUPR                  int64
HIPERTENSION             int64
OTHER_DISEASE            int64
CARDIOVASCULAR           int64
OBESITY                  int64
RENAL_CHRONIC            int64
TOBACCO                  int64
CLASIFFICATION_FINAL     int64
ICU                      int64
dtype: object
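
`DATE_DIED` is the only column stored as `object`, since it mixes date strings with the `9999-99-99` placeholder. A minimal sketch of converting it to a datetime column follows. It assumes the dates are day-first (`DD/MM/YYYY`), which the value `21/06/2020` in the preview suggests but does not prove for every row. The helper `parse_date_died` is hypothetical.

```python
import pandas as pd


def parse_date_died(dates: pd.Series) -> pd.Series:
    """Parse day-first date strings; anything unparseable becomes NaT."""
    # errors="coerce" turns the 9999-99-99 placeholder into NaT
    return pd.to_datetime(dates, format="%d/%m/%Y", errors="coerce")


# Toy values taken from the preview plus the survivor placeholder
sample = pd.Series(["03/05/2020", "9999-99-99", "21/06/2020"], name="DATE_DIED")
parsed = parse_date_died(sample)
print(parsed)
```

Because `errors="coerce"` also hides genuinely malformed dates, it is worth comparing the number of `NaT` values against the number of placeholder strings before relying on the result.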