##### Imports

In [1]:
import pandas as pd
import sagemaker

## EDA of cardiovascular disease data

The dataset consists of 70,000 patient records with 12 variables: 11 input features, such as age, gender, systolic blood pressure, and diastolic blood pressure, plus the target. The target class "cardio" equals 1 when the patient has cardiovascular disease and 0 when the patient is healthy.

The task is to predict the presence or absence of cardiovascular disease (CVD) from patient examination results.

#### Data description

There are 3 types of input features:

- *Objective*: factual information;
- *Examination*: results of medical examination;
- *Subjective*: information given by the patient.

| Feature | Variable Type | Variable      | Value Type |
|---------|--------------|---------------|------------|
| Age | Objective Feature | age | int (days) |
| Height | Objective Feature | height | int (cm) |
| Weight | Objective Feature | weight | float (kg) |
| Gender | Objective Feature | gender | categorical code |
| Systolic blood pressure | Examination Feature | ap_hi | int |
| Diastolic blood pressure | Examination Feature | ap_lo | int |
| Cholesterol | Examination Feature | cholesterol | 1: normal, 2: above normal, 3: well above normal |
| Glucose | Examination Feature | gluc | 1: normal, 2: above normal, 3: well above normal |
| Smoking | Subjective Feature | smoke | binary |
| Alcohol intake | Subjective Feature | alco | binary |
| Physical activity | Subjective Feature | active | binary |
| Presence or absence of cardiovascular disease | Target Variable | cardio | binary |

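The encodings in the table are easier to read once decoded. The following is a minimal sketch, not part of the original notebook: it converts `age` from days to years and maps the ordinal `cholesterol` and `gluc` codes to the labels listed in the table. The two sample rows are copied from the first rows of the dataset; the names `sample`, `LEVELS`, `age_years`, and the `*_label` columns are hypothetical, and dividing by 365.25 is an assumed approximation.

```python
import pandas as pd

# Two sample rows copied from the first rows of cardio.csv.
sample = pd.DataFrame({
    "age": [18393, 20228],    # age in days
    "cholesterol": [1, 3],    # ordinal code, see the table above
    "gluc": [1, 1],           # ordinal code, see the table above
})

# Assumption: 365.25 days per year is precise enough for exploration.
sample["age_years"] = sample["age"] / 365.25

# Labels taken from the data description table.
LEVELS = {1: "normal", 2: "above normal", 3: "well above normal"}
sample["cholesterol_label"] = sample["cholesterol"].map(LEVELS)
sample["gluc_label"] = sample["gluc"].map(LEVELS)

print(sample)
```

Keeping the decoded values in new columns leaves the original integer codes available for modeling.
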
### Initial analysis
Let's look at the dataset and given variables.

In [2]:
df = pd.read_csv('./data/cardio.csv')
df.head()

Unnamed: 0,id,age,gender,height,weight,ap_hi,ap_lo,cholesterol,gluc,smoke,alco,active,cardio
0,0,18393,2,168,62.0,110,80,1,1,0,0,1,0
1,1,20228,1,156,85.0,140,90,3,1,0,0,1,1
2,2,18857,1,165,64.0,130,70,3,1,0,0,0,1
3,3,17623,2,169,82.0,150,100,1,1,0,0,1,1
4,4,17474,1,156,56.0,100,60,1,1,0,0,0,0


In [3]:
df.shape

(70000, 13)
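
The 13 columns appear to be the `id` column, the 11 input features, and the `cardio` target. As a rough sketch (not part of the original notebook), the snippet below separates these three groups on a small frame built from the preview rows above. The names `preview`, `X`, and `y` are illustrative, and dropping `id` assumes it is a plain row identifier with no predictive meaning.

```python
import pandas as pd

COLUMNS = ["id", "age", "gender", "height", "weight", "ap_hi", "ap_lo",
           "cholesterol", "gluc", "smoke", "alco", "active", "cardio"]

# First three rows from the df.head() output above.
ROWS = [
    [0, 18393, 2, 168, 62.0, 110, 80, 1, 1, 0, 0, 1, 0],
    [1, 20228, 1, 156, 85.0, 140, 90, 3, 1, 0, 0, 1, 1],
    [2, 18857, 1, 165, 64.0, 130, 70, 3, 1, 0, 0, 0, 1],
]
preview = pd.DataFrame(ROWS, columns=COLUMNS)

# Assumption: `id` is only an identifier, so it is excluded from the features.
X = preview.drop(columns=["id", "cardio"])  # 11 input features
y = preview["cardio"]                       # binary target

print(X.shape, y.shape)
```

The same two lines applied to the full `df` would give the feature matrix and target for all 70,000 records.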

##### Upload the dataset from local storage to S3

In [4]:
session = sagemaker.Session()
default_bucket = session.default_bucket()
print(f'Default S3 bucket = {default_bucket}')

Default S3 bucket = sagemaker-us-east-1-119174016168


In [5]:
%%capture

!aws s3 cp ./data/ s3://{default_bucket}/autopilot/ --recursive
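
As an alternative to the AWS CLI, the upload can be sketched with the SageMaker Python SDK's `Session.upload_data`, which returns the S3 URI of the uploaded object. This is a hedged sketch rather than the notebook's own method: the helper names `s3_uri` and `upload_with_sdk` are hypothetical, the `autopilot` prefix mirrors the CLI command above, and running the upload requires valid AWS credentials.

```python
def s3_uri(bucket: str, prefix: str, filename: str) -> str:
    """Build the S3 URI where a file is expected to land (hypothetical helper)."""
    return f"s3://{bucket}/{prefix}/{filename}"


def upload_with_sdk(local_path: str = "./data/cardio.csv",
                    prefix: str = "autopilot") -> str:
    """Upload a local file to the session's default bucket and return its S3 URI."""
    # Imported lazily so that s3_uri() stays usable without the SDK installed.
    import sagemaker

    session = sagemaker.Session()
    return session.upload_data(
        path=local_path,
        bucket=session.default_bucket(),
        key_prefix=prefix,
    )


# Expected destination for the CSV, using a placeholder bucket name.
print(s3_uri("my-bucket", "autopilot", "cardio.csv"))
```

Calling `upload_with_sdk()` inside the notebook would perform the actual transfer; the final `print` only shows the URI format and touches no AWS resources.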