<a href="https://colab.research.google.com/github/SachinYallapurkar/Cardiovascular-Risk-Prediction/blob/main/Cardiovascular_Risk_Prediction.ipynb" target="_parent"><img src="https://colab.research.google.com/assets/colab-badge.svg" alt="Open In Colab"/></a>

# **Project Title:** **Cardiovascular-Risk-Prediction**

## **Problem Description**

### The dataset comes from an ongoing cardiovascular study of residents of the town of Framingham, Massachusetts. The classification goal is to predict whether a patient is at risk of developing coronary heart disease (CHD) within the next 10 years. The dataset includes over 4,000 records and 15 attributes covering each patient's demographic, behavioral, and medical information.


## <u>**Data Description**</u><br><br>
### **Variables:**<br>
Each attribute is a potential risk factor. The risk factors fall into three groups: demographic, behavioral, and medical.

### **Demographic:**<br>
* <font color = 'green'>**Sex:**</font> male or female ("M" or "F")
* <font color = 'green'>**Age:**</font> age of the patient (Continuous - although the recorded ages have been truncated to whole numbers, the concept of age is continuous)

### **Behavioral:**<br>
* <font color = 'green'>**is_smoking:**</font> whether or not the patient is a current smoker ("YES" or "NO")
* <font color = 'green'>**Cigs Per Day:**</font> the number of cigarettes the person smoked on average in one day (can be considered continuous, as one can smoke any number of cigarettes, even half a cigarette)

### **Medical (history):**<br>
* <font color = 'green'> **BP Meds:**</font> whether or not the patient was on blood pressure medication (Nominal)
* <font color = 'green'> **Prevalent Stroke:**</font> whether or not the patient had previously had a stroke (Nominal)
* <font color = 'green'> **Prevalent Hyp:**</font> whether or not the patient was hypertensive (Nominal)
* <font color = 'green'> **Diabetes:**</font> whether or not the patient had diabetes (Nominal)

### **Medical (current):**<br>
* <font color = 'green'> **Tot Chol:**</font> total cholesterol level (Continuous)
* <font color = 'green'> **Sys BP:**</font> systolic blood pressure (Continuous)
* <font color = 'green'> **Dia BP:**</font> diastolic blood pressure (Continuous)
* <font color = 'green'>**BMI:**</font> Body Mass Index (Continuous)
* <font color = 'green'>**Heart Rate:**</font> heart rate (Continuous - in medical research, variables such as heart rate, though in fact discrete, are treated as continuous because of the large number of possible values)
* <font color = 'green'>**Glucose:**</font> glucose level (Continuous)

### **Predict variable (desired target):**<br>
10-year risk of <font color = 'green'>**coronary heart disease (CHD)**</font> (binary: “1” means “Yes”, “0” means “No”) - the dependent variable (DV)
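The two string-valued attributes described above (`sex` and `is_smoking`) need a numeric form before most classifiers can use them. The following is a minimal sketch of one way to do that with a plain pandas `map`; the 0/1 assignment is an assumption made for illustration, not necessarily the encoding used later in this notebook (which imports `LabelEncoder`). The sample values are the first five rows of the DataFrame preview shown in the Data Inspection section.

```python
import pandas as pd

# First five rows of the DataFrame preview (selected columns only).
sample = pd.DataFrame({
    "sex": ["F", "M", "F", "M", "F"],
    "is_smoking": ["YES", "NO", "YES", "YES", "YES"],
    "cigsPerDay": [3.0, 0.0, 10.0, 20.0, 30.0],
    "TenYearCHD": [1, 0, 0, 1, 0],
})

# Assumed encoding (illustrative): F -> 0, M -> 1 and NO -> 0, YES -> 1.
encoded = sample.assign(
    sex=sample["sex"].map({"F": 0, "M": 1}),
    is_smoking=sample["is_smoking"].map({"NO": 0, "YES": 1}),
)

# Any value outside the mapping would become NaN, so check for that.
print(encoded[["sex", "is_smoking"]].isna().sum())
print(encoded.dtypes)
```

An explicit mapping keeps the meaning of each code visible, whereas `LabelEncoder` assigns codes by sorted label order.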

## **Importing Libraries**

In [1]:
# Importing necessary libraries
import warnings
warnings.filterwarnings('ignore')

# Data handling
import numpy as np
import pandas as pd

# Visualization
import matplotlib.pyplot as plt
import seaborn as sns
import missingno as msno
import plotly.express as px
import plotly.graph_objects as go
%matplotlib inline
sns.set_style('darkgrid')

# Preprocessing, resampling, and data splitting
from imblearn.over_sampling import SMOTE
from sklearn.preprocessing import LabelEncoder
from sklearn.model_selection import train_test_split

# Models
from sklearn.linear_model import LogisticRegression
from sklearn.tree import DecisionTreeClassifier
from sklearn.ensemble import ExtraTreesClassifier, RandomForestClassifier
from sklearn.neighbors import KNeighborsClassifier
from sklearn.svm import SVC
from xgboost import XGBClassifier

# Evaluation metrics
# Note: plot_confusion_matrix was removed in scikit-learn 1.2; on newer
# versions, import ConfusionMatrixDisplay instead.
from sklearn.metrics import (precision_score, recall_score, accuracy_score, f1_score,
                             confusion_matrix, roc_auc_score, classification_report,
                             plot_confusion_matrix)
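
The import cell above pulls in `plot_confusion_matrix`, which was removed in scikit-learn 1.2, so the import fails on recent versions. The sketch below shows the replacement class, `ConfusionMatrixDisplay`, on synthetic toy data. The data, the model, and the label names are assumptions made purely for illustration and are not part of this project's pipeline.

```python
import numpy as np
from sklearn.linear_model import LogisticRegression
from sklearn.metrics import ConfusionMatrixDisplay, confusion_matrix

# Synthetic toy data, for illustration only (not the Framingham data).
rng = np.random.default_rng(0)
X = rng.normal(size=(200, 3))
y = (X[:, 0] + 0.5 * X[:, 1] > 0).astype(int)

model = LogisticRegression().fit(X, y)
y_pred = model.predict(X)

# Rows are true classes, columns are predicted classes.
cm = confusion_matrix(y, y_pred)
disp = ConfusionMatrixDisplay(confusion_matrix=cm,
                              display_labels=["No CHD", "CHD"])
print(cm)
```

Calling `disp.plot()` in a notebook cell renders the matrix with matplotlib. `ConfusionMatrixDisplay.from_estimator(model, X, y)` is the closest equivalent of the old `plot_confusion_matrix(model, X, y)` call.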

## **Data Inspection**

In [2]:
#Mounting Google Drive
from google.colab import drive
drive.mount('/content/drive/')

Mounted at /content/drive/
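
The mount step above only works inside Google Colab. The following sketch shows one way to let the same notebook run locally as well. The names `IN_COLAB`, `data_dir`, and `data_path` are hypothetical, and the local fallback assumes the CSV file sits next to the notebook.

```python
from pathlib import Path

try:
    # google.colab is only importable inside a Colab runtime.
    from google.colab import drive
    IN_COLAB = True
except ImportError:
    IN_COLAB = False

if IN_COLAB:
    drive.mount('/content/drive/')
    data_dir = Path('/content/drive/MyDrive/Cardiovascular-Risk-Prediction')
else:
    # Assumption: when running locally, the CSV is in the working directory.
    data_dir = Path('.')

data_path = data_dir / 'data_cardiovascular_risk.csv'
print(data_path)
```

`pd.read_csv` accepts a `Path` object, so `data_path` can be passed to it directly.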


In [3]:
df = pd.read_csv('/content/drive/MyDrive/Cardiovascular-Risk-Prediction/data_cardiovascular_risk.csv')
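
A quick check after loading is to look at the shape and count missing values per column, since the preview of `df` shows empty cells (for example in `BMI` and `BPMeds`). The sketch below does this on three rows copied from that preview, parsed from an in-memory string so it runs without the Drive file. It assumes the CSV columns match the displayed frame.

```python
import io
import pandas as pd

# Three rows copied from the preview of df; empty fields are missing values.
csv_text = """id,age,education,sex,is_smoking,cigsPerDay,BPMeds,prevalentStroke,prevalentHyp,diabetes,totChol,sysBP,diaBP,BMI,heartRate,glucose,TenYearCHD
0,64,2.0,F,YES,3.0,0.0,0,0,0,221.0,148.0,85.0,,90.0,80.0,1
1,36,4.0,M,NO,0.0,0.0,0,1,0,212.0,168.0,98.0,29.77,72.0,75.0,0
3388,60,1.0,M,NO,0.0,,0,1,0,191.0,167.0,105.0,23.01,80.0,85.0,0
"""

# read_csv turns empty fields into NaN by default.
sample = pd.read_csv(io.StringIO(csv_text))

# Count missing values per column and keep only columns that have any.
missing = sample.isna().sum()
print(sample.shape)
print(missing[missing > 0])
```

On the full dataset, the same two calls (`df.shape` and `df.isna().sum()`) give the record count and the per-column number of missing values.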

In [4]:
df

Unnamed: 0,id,age,education,sex,is_smoking,cigsPerDay,BPMeds,prevalentStroke,prevalentHyp,diabetes,totChol,sysBP,diaBP,BMI,heartRate,glucose,TenYearCHD
0,0,64,2.0,F,YES,3.0,0.0,0,0,0,221.0,148.0,85.0,,90.0,80.0,1
1,1,36,4.0,M,NO,0.0,0.0,0,1,0,212.0,168.0,98.0,29.77,72.0,75.0,0
2,2,46,1.0,F,YES,10.0,0.0,0,0,0,250.0,116.0,71.0,20.35,88.0,94.0,0
3,3,50,1.0,M,YES,20.0,0.0,0,1,0,233.0,158.0,88.0,28.26,68.0,94.0,1
4,4,64,1.0,F,YES,30.0,0.0,0,0,0,241.0,136.5,85.0,26.42,70.0,77.0,0
...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...
3385,3385,60,1.0,F,NO,0.0,0.0,0,0,0,261.0,123.5,79.0,29.28,70.0,103.0,0
3386,3386,46,1.0,F,NO,0.0,0.0,0,0,0,199.0,102.0,56.0,21.96,80.0,84.0,0
3387,3387,44,3.0,M,YES,3.0,0.0,0,1,0,352.0,164.0,119.0,28.92,73.0,72.0,1
3388,3388,60,1.0,M,NO,0.0,,0,1,0,191.0,167.0,105.0,23.01,80.0,85.0,0
