# Understanding the Problem Statement

The World Health Organization has estimated that 12 million deaths occur worldwide every year due to heart disease. Half the deaths in the United States and other developed countries are due to cardiovascular diseases. It is therefore crucial to predict and prevent heart failure and strokes. Early prognosis of cardiovascular disease can help high-risk patients decide on lifestyle changes and, in turn, reduce complications.

#### Proposed Approach

The proposed approach is called Complex Event Processing for Heart Failure Prediction (CEP4HFP). <br>
It is based on the Complex Event Processing (CEP) methodology, combined with statistical approaches. The approach: <br>
1) collects health parameters, <br>
2) processes the collected data by executing analysis rules, and <br>
3) triggers alarms if heart failure is detected.
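
The three steps above can be sketched in a few lines of Python. This is only an illustration under stated assumptions, not the CEP4HFP implementation: the `Rule` class, the `process` function, the rule names and the thresholds are all hypothetical, and the thresholds are not clinical guidance. A real CEP engine would also match patterns across several events and time windows, whereas this sketch evaluates each reading on its own.

```python
from dataclasses import dataclass
from typing import Callable, Dict, Iterable, List

# Step 1 (collection): one reading is a mapping of health parameter -> value.
Reading = Dict[str, float]


@dataclass
class Rule:
    """A named analysis rule: a condition evaluated against a single reading."""
    name: str
    condition: Callable[[Reading], bool]


# Hypothetical rules with illustrative thresholds (assumptions, not clinical advice).
RULES = [
    Rule("high_systolic_bp", lambda r: r["sysBP"] >= 140.0),
    Rule("high_heart_rate", lambda r: r["heartRate"] >= 100.0),
]


def process(readings: Iterable[Reading], rules: List[Rule]) -> List[str]:
    """Steps 2 and 3: run every rule on every reading and collect the alarms."""
    alarms = []
    for index, reading in enumerate(readings):
        for rule in rules:
            if rule.condition(reading):
                alarms.append(f"reading {index}: {rule.name}")
    return alarms


# Two readings using the sysBP and heartRate column names from the dataset.
readings = [
    {"sysBP": 106.0, "heartRate": 80.0},
    {"sysBP": 150.0, "heartRate": 65.0},
]
alarms = process(readings, RULES)
print(alarms)
```

Keeping rules as data (a name plus a condition) means new checks can be added without touching the processing loop, which is the main design idea the sketch is meant to show.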

## Data Preparation

The dataset comes from the **Framingham Heart Study**, an ongoing cardiovascular study of residents of the town of Framingham, Massachusetts. The aim of this study is to predict whether a patient has a 10-year risk of future Coronary Heart Disease (CHD). The dataset contains over 4,000 patient records, each with 15 attributes (potential risk factors) plus the target variable.

In [1]:
import pandas as pd

chd_data = pd.read_csv('Data/data.csv')

In [2]:
print(chd_data.head())

   male  age  education  currentSmoker  cigsPerDay  BPMeds  prevalentStroke  \
0     1   39        4.0              0         0.0     0.0                0   
1     0   46        2.0              0         0.0     0.0                0   
2     1   48        1.0              1        20.0     0.0                0   
3     0   61        3.0              1        30.0     0.0                0   
4     0   46        3.0              1        23.0     0.0                0   

   prevalentHyp  diabetes  totChol  sysBP  diaBP    BMI  heartRate  glucose  \
0             0         0    195.0  106.0   70.0  26.97       80.0     77.0   
1             0         0    250.0  121.0   81.0  28.73       95.0     76.0   
2             0         0    245.0  127.5   80.0  25.34       75.0     70.0   
3             1         0    225.0  150.0   95.0  28.58       65.0    103.0   
4             0         0    285.0  130.0   84.0  23.10       85.0     85.0   

   TenYearCHD  
0           0  
1           0  
2 

In [3]:
print("Total number of records in the dataset = {}".format(chd_data.shape[0]))
print("Total number of features in the dataset = {}".format(chd_data.shape[1]))

Total number of records in the dataset = 4240
Total number of features in the dataset = 16


### Features from the data

Each feature is a potential risk factor. The risk factors are demographic, behavioural and medical.<br>
- **Demographic:** <br>
1) `male`: sex of the patient, stored as 1 for male and 0 for female (Nominal/Categorical) <br>
2) `age`: age of the patient (Continuous, although the recorded ages have been truncated to whole numbers) <br>
3) `education`: level of education of patient (Nominal) <br>
- **Behavioural:** <br>
4) `currentSmoker`: whether or not the patient is a current smoker (Nominal) <br>
5) `cigsPerDay`: the number of cigarettes the patient smoked on average per day (Continuous) <br>
- **Medical (History):** <br>
6) `BPMeds`: whether or not the patient was on blood pressure medication (Nominal) <br>
7) `prevalentStroke`: whether or not the patient had previously had a stroke (Nominal) <br>
8) `prevalentHyp`: whether or not the patient was hypertensive (Nominal) <br>
9) `diabetes`: whether or not the patient had diabetes (Nominal) <br>
- **Medical (Current):** <br>
10) `totChol`: total cholesterol level (Continuous) <br>
11) `sysBP`: systolic blood pressure (Continuous) <br>
12) `diaBP`: diastolic blood pressure (Continuous) <br>
13) `BMI`: Body Mass Index (Continuous) <br>
14) `heartRate`: heart rate (Continuous) <br>
15) `glucose`: glucose level (Continuous) <br>
- **Target (CHD):** <br>
16) `TenYearCHD`: 10-year risk of coronary heart disease (CHD) (Binary: "1" means "Yes", "0" means "No")
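
The grouping above can also be written down in code, which makes it easy to check that every listed column is present before modelling. The snippet below is a minimal sketch: the `FEATURE_GROUPS` dictionary and the helper names `flatten_features` and `missing_columns` are assumptions introduced for illustration, while the column names are copied from the `head()` output shown earlier.

```python
# Feature groups as listed above; keys are hypothetical labels for each group.
FEATURE_GROUPS = {
    "demographic": ["male", "age", "education"],
    "behavioural": ["currentSmoker", "cigsPerDay"],
    "medical_history": ["BPMeds", "prevalentStroke", "prevalentHyp", "diabetes"],
    "medical_current": ["totChol", "sysBP", "diaBP", "BMI", "heartRate", "glucose"],
}
TARGET = "TenYearCHD"


def flatten_features(groups):
    """Return all feature column names in group order."""
    return [col for cols in groups.values() for col in cols]


def missing_columns(available, groups, target):
    """Return expected columns (features plus target) absent from `available`."""
    expected = flatten_features(groups) + [target]
    return [col for col in expected if col not in available]


# Column names as they appear in the printed head() output.
columns = [
    "male", "age", "education", "currentSmoker", "cigsPerDay", "BPMeds",
    "prevalentStroke", "prevalentHyp", "diabetes", "totChol", "sysBP",
    "diaBP", "BMI", "heartRate", "glucose", "TenYearCHD",
]

feature_columns = flatten_features(FEATURE_GROUPS)
print(len(feature_columns))                             # number of features
print(missing_columns(columns, FEATURE_GROUPS, TARGET))  # columns not found
```

With the DataFrame loaded earlier, `chd_data[feature_columns]` and `chd_data[TARGET]` would then select the 15 features and the target respectively.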

### Machine Learning Pipeline

![ML_Pipeline.png](attachment:ML_Pipeline.png)