# Introduction

## Data Description

Classification project - Cardio Catch Diseases - Predicting Cardiovascular Diseases

There are three types of input features:  

Objective: factual information;  
Examination: results of medical examination;  
Subjective: information given by the patient.  

| Feature | Feature type | Column | Data type |
|---|---|---|---|
| Age | Objective Feature | age | int (days) |
| Height | Objective Feature | height | int (cm) |
| Weight | Objective Feature | weight | float (kg) |
| Gender | Objective Feature | gender | categorical code |
| Systolic blood pressure | Examination Feature | ap_hi | int |
| Diastolic blood pressure | Examination Feature | ap_lo | int |
| Cholesterol | Examination Feature | cholesterol | 1: normal, 2: above normal, 3: well above normal |
| Glucose | Examination Feature | gluc | 1: normal, 2: above normal, 3: well above normal |
| Smoking | Subjective Feature | smoke | binary |
| Alcohol intake | Subjective Feature | alco | binary |
| Physical activity | Subjective Feature | active | binary |
| Presence or absence of cardiovascular disease | Target Variable | cardio | binary |

All of the dataset values were collected at the moment of medical examination.  
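
Because `age` is stored in days, it is usually easier to read in years. The snippet below is a minimal sketch of that conversion, using the first five rows shown by `data_raw.head()` further down in this notebook. The column name `age_years` and the constant `DAYS_PER_YEAR` are assumptions introduced here for illustration; they are not part of the dataset.

```python
import pandas as pd

# First five rows of the dataset, as shown by data_raw.head() in this notebook
sample = pd.DataFrame({
    "age": [18393, 20228, 18857, 17623, 17474],   # days
    "height": [168, 156, 165, 169, 156],          # cm
    "weight": [62.0, 85.0, 64.0, 82.0, 56.0],     # kg
})

# Assumption: average year length, including leap years
DAYS_PER_YEAR = 365.25

# age_years is a hypothetical helper column, not an original feature
sample["age_years"] = sample["age"] / DAYS_PER_YEAR

print(sample[["age", "age_years"]].round(1))
```

Dividing by 365 instead would shift the result by only a few days per patient, so either choice is reasonable as long as it is applied consistently.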

### Systolic and Diastolic Blood Pressure

Blood pressure is the pressure of blood pushing against the walls of your arteries. Arteries carry blood from your heart to other parts of your body (CDC.gov). The image below illustrates blood pressure.

![image-2.png](attachment:image-2.png)

According to *heart.org*, the image below shows the blood pressure ranges for each category, from "normal" to "hypertensive crisis", as recommended by the American Heart Association.  
As the image shows, a blood pressure reading consists of two values: systolic and diastolic.

![image.png](attachment:image.png)

- Systolic Blood Pressure - Indicates how much pressure your blood is exerting against your artery walls when the heart beats.
- Diastolic Blood Pressure - Indicates how much pressure your blood is exerting against your artery walls while the heart is resting between beats.

According to *heart.org*, more attention is given to **Systolic Blood Pressure**, as a major risk factor for cardiovascular disease for people over 50.
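
The categories in the chart above can be turned into a derived feature. The function below is a sketch only: `bp_category` is a hypothetical helper, and the thresholds are the commonly published American Heart Association cut-offs (in mm Hg), which should be checked against the chart in reference [2] before use.

```python
def bp_category(systolic: int, diastolic: int) -> str:
    """Map a (systolic, diastolic) reading to a blood pressure category.

    Thresholds are assumed from the American Heart Association chart;
    checks run from the most severe category down to the least severe.
    """
    if systolic > 180 or diastolic > 120:
        return "hypertensive crisis"
    if systolic >= 140 or diastolic >= 90:
        return "hypertension stage 2"
    if systolic >= 130 or diastolic >= 80:
        return "hypertension stage 1"
    if systolic >= 120:
        # Reaching this line implies diastolic < 80
        return "elevated"
    return "normal"


# Readings taken from the first rows of the dataset (ap_hi, ap_lo)
for reading in [(110, 80), (140, 90), (130, 70), (100, 60)]:
    print(reading, "->", bp_category(*reading))
```

Checking the most severe category first keeps each branch simple, because a reading is assigned to the highest category that either of its two values reaches.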

#### Data Information
- Type: Int
- Range: 
    - Systolic
    
### Cholesterol

https://www.cdc.gov/cholesterol/about.htm

## Project Methodology - CRISP-DM

We are going to use the CRISP-DM (Cross-Industry Standard Process for Data Mining) methodology to build this project.

![image.png](attachment:image.png)

### Business Case

#### Key Facts
- Cardiovascular Diseases (CVDs) are the leading cause of death globally.
- 17.9 million people died from CVDs in 2019, representing 32% of all global deaths. Of these, 85% were due to heart attack and stroke.
- Over three quarters of CVD deaths take place in low- and middle-income countries.
    - This information would be very useful if we were analyzing data from different locations.
- Most cardiovascular diseases can be prevented by addressing behavioural risk factors such as tobacco use, unhealthy diet and obesity, physical inactivity and harmful use of alcohol.
    - This information suggests an important hypothesis: these unhealthy behaviours are significantly correlated with CVDs.
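
A first look at the behavioural-risk hypothesis could compare the share of patients with `cardio == 1` across the levels of each binary subjective feature. The sketch below assumes the column names from the data description; `cvd_rate_by_factor` is a hypothetical helper, and the `toy` rows are made up purely to show the output shape. They are not taken from the dataset and say nothing about the real relationship.

```python
import pandas as pd


def cvd_rate_by_factor(df: pd.DataFrame,
                       factors=("smoke", "alco", "active")) -> pd.DataFrame:
    """Share of rows with cardio == 1 for each level (0/1) of each factor."""
    rates = {f: df.groupby(f)["cardio"].mean() for f in factors}
    # Rows are the factor levels (0 and 1), columns are the factors
    return pd.DataFrame(rates)


# Toy rows invented for illustration only; NOT real patient data
toy = pd.DataFrame({
    "smoke":  [0, 0, 1, 1, 0, 1],
    "alco":   [0, 1, 0, 1, 0, 1],
    "active": [1, 1, 0, 0, 1, 0],
    "cardio": [0, 0, 1, 1, 0, 1],
})

print(cvd_rate_by_factor(toy))
```

On the real data, the same call would be `cvd_rate_by_factor(data_raw)`. A difference in rates is only descriptive; a statistical test would be needed before calling it significant.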

#### What are CVDs
There are many types of CVDs, including *coronary heart disease, cerebrovascular disease, congenital heart disease*, among others.  

Heart attacks and strokes are usually acute events and are mainly caused by a blockage that prevents blood from flowing to the heart or brain. The most common reason for this is a build-up of fatty deposits on the inner walls of the blood vessels that supply the heart or brain. (World Health Organization, 2021)

Cessation of tobacco use, reduction of salt in the diet, eating more fruit and vegetables, regular physical activity and avoiding harmful use of alcohol have been **shown to reduce the risk of cardiovascular disease**. Other determinants of CVDs include poverty, stress and hereditary factors. (World Health Organization, 2021)
- Again, this information could be very helpful if we were analyzing other factors, such as GDP and location, but that is not the case here.  

#### How can CVDs be reduced?
The key to cardiovascular disease reduction lies in the inclusion of cardiovascular disease management interventions in universal health coverage packages, although in a high number of countries health systems require significant investment and reorientation to effectively manage CVDs. (World Health Organization, 2021)

Evidence from 18 countries has shown that hypertension programmes can be implemented efficiently and cost-effectively at the primary care level, which will ultimately result in reduced coronary heart disease and stroke.
- This information can be very useful if any feature in the dataset is directly related to hypertension.

# Imports

In [1]:
import pandas as pd

# Data Loading

In [5]:
# load the raw data (the file is semicolon-delimited)
data_raw = pd.read_csv('cardio_train.csv', delimiter=';')

In [6]:
data_raw.head()

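|   | id | age | gender | height | weight | ap_hi | ap_lo | cholesterol | gluc | smoke | alco | active | cardio |
|---|---|---|---|---|---|---|---|---|---|---|---|---|---|
| 0 | 0 | 18393 | 2 | 168 | 62.0 | 110 | 80 | 1 | 1 | 0 | 0 | 1 | 0 |
| 1 | 1 | 20228 | 1 | 156 | 85.0 | 140 | 90 | 3 | 1 | 0 | 0 | 1 | 1 |
| 2 | 2 | 18857 | 1 | 165 | 64.0 | 130 | 70 | 3 | 1 | 0 | 0 | 0 | 1 |
| 3 | 3 | 17623 | 2 | 169 | 82.0 | 150 | 100 | 1 | 1 | 0 | 0 | 1 | 1 |
| 4 | 4 | 17474 | 1 | 156 | 56.0 | 100 | 60 | 1 | 1 | 0 | 0 | 0 | 0 |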
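
The "Data Information" notes for blood pressure leave the value ranges unfilled. One way to obtain them once the data is loaded is sketched below. `value_ranges` is a hypothetical helper, and the sample frame only repeats the five rows shown above so that the snippet runs on its own; on the full dataset the call would be `value_ranges(data_raw, ["ap_hi", "ap_lo"])`.

```python
import pandas as pd


def value_ranges(df: pd.DataFrame, columns) -> pd.DataFrame:
    """Return the minimum and maximum of each requested column."""
    return df[list(columns)].agg(["min", "max"])


# The five rows displayed by data_raw.head()
head_rows = pd.DataFrame({
    "ap_hi": [110, 140, 130, 150, 100],
    "ap_lo": [80, 90, 70, 100, 60],
})

print(value_ranges(head_rows, ["ap_hi", "ap_lo"]))
```

These five rows are not representative of the whole file, so the ranges printed here should not be copied into the data description.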


# Business Hypothesis

- Among people over 50, systolic blood pressure is a stronger risk factor for cardiovascular disease than diastolic blood pressure. [2]
- Systolic blood pressure rises steadily with age. [2]
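
The second hypothesis could be checked by averaging `ap_hi` within age bands. The following is a minimal sketch under stated assumptions: `mean_systolic_by_age_band` is a hypothetical helper, the 5-year band width is arbitrary, and age is converted from days using 365.25 days per year. The sample frame reuses the five rows from `data_raw.head()` only to make the snippet runnable; five rows cannot confirm or reject the hypothesis.

```python
import pandas as pd


def mean_systolic_by_age_band(df: pd.DataFrame, band_years: int = 5) -> pd.Series:
    """Mean systolic pressure (ap_hi) per age band, labelled by the band's lower bound."""
    age_years = df["age"] / 365.25  # age is stored in days
    # Floor each age to the start of its band, e.g. 51.6 -> 50 for 5-year bands
    band = (age_years // band_years * band_years).astype(int)
    return df.groupby(band.rename("age_band"))["ap_hi"].mean()


# The five rows displayed by data_raw.head()
head_rows = pd.DataFrame({
    "age":   [18393, 20228, 18857, 17623, 17474],
    "ap_hi": [110, 140, 130, 150, 100],
})

print(mean_systolic_by_age_band(head_rows))
```

On the full dataset, a steadily increasing sequence of band means would be consistent with the hypothesis, although implausible `ap_hi` values, if any exist, would need to be handled first.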

# References

1. https://www.who.int/en/news-room/fact-sheets/detail/cardiovascular-diseases-(cvds)
2. https://www.heart.org/en/health-topics/high-blood-pressure/understanding-blood-pressure-readings
3. https://www.cdc.gov/bloodpressure/about.htm