![Heart Attack](./misc/heart-attack-24.jpg "Heart Attack")

<h1 align="center"> Heart Attack - Exploratory Data Analysis</h1>

1. [Introduction](#1)
    - 1.1 [Data Dictionary](#2)
    - 1.2 [Brief Explanation of Variables](#3)
    - 1.3 [Task](#4)
2. [Preparation](#5)
    - 2.1 [Packages](#6)
    - 2.2 [Loading Data](#7)
    - 2.3 [Understanding the Data](#8)
3. [Exploratory Data Analysis](#9)
    - 3.1 [Univariate Analysis](#10)
    - 3.2 [Bivariate Analysis](#11)
4. [Conclusion](#12)
    - 4.1 [Conclusions from the EDA](#13)
5. [References](#14)

### 1. Introduction <a id=1></a>

The dataset used in this notebook is from the [UC Irvine Machine Learning Repository](https://archive-beta.ics.uci.edu/ml/datasets/heart+disease)¹.

#### 1.1 Data Dictionary <a id=2></a>
`age` - Age of the patient (years)

`sex` - Sex of the patient

`cp` - Chest pain type: 0 = Typical Angina, 1 = Atypical Angina, 2 = Non-anginal Pain, 3 = Asymptomatic

`trtbps` - Resting blood pressure (mm Hg)

`chol` - Cholesterol (mg/dl)

`fbs` - Fasting blood sugar > 120 mg/dl: 1 = True, 0 = False

`restecg` - Resting electrocardiographic results: 0 = Normal, 1 = ST-T wave abnormality, 2 = Left ventricular hypertrophy

`thalachh` - Maximum heart rate achieved

`exng` - Exercise induced angina: 1 = Yes, 0 = No

`oldpeak` - ST depression induced by exercise relative to rest

`slp` - Slope of the peak exercise ST segment

`caa` - Number of major vessels colored by fluoroscopy

`thall` - Thallium stress test result (values 0 to 3)

`output` - Target variable
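Because every categorical variable is stored as a numeric code, plots and tables are easier to read with the labels attached. Below is a minimal sketch, assuming the column names above. `add_labels` and the `*_LABELS` dictionaries are hypothetical names for illustration, and the label text is taken from this data dictionary.

```python
import pandas as pd

# Code-to-label lookups following the data dictionary above
CP_LABELS = {
    0: "Typical Angina",
    1: "Atypical Angina",
    2: "Non-anginal Pain",
    3: "Asymptomatic",
}
FBS_LABELS = {1: True, 0: False}   # fasting blood sugar > 120 mg/dl
EXNG_LABELS = {1: "Yes", 0: "No"}  # exercise induced angina


def add_labels(df: pd.DataFrame) -> pd.DataFrame:
    """Hypothetical helper: return a copy of df with readable *_label
    columns added next to the original numeric codes."""
    out = df.copy()
    out["cp_label"] = out["cp"].map(CP_LABELS)
    out["fbs_label"] = out["fbs"].map(FBS_LABELS)
    out["exng_label"] = out["exng"].map(EXNG_LABELS)
    return out


# cp, fbs and exng values of rows 0, 1 and 4 of heart.csv
sample = pd.DataFrame({"cp": [3, 2, 0], "fbs": [1, 0, 0], "exng": [0, 0, 1]})
labeled = add_labels(sample)
print(labeled)
```

Adding separate label columns, rather than overwriting the codes, keeps the numeric columns available for statistics and for any later modelling.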

#### 1.2 Brief Explanation of Variables <a id=3></a>

This section briefly explains some of the variables, since some of the medical terms may be unfamiliar to readers.

`cp` is the chest pain type. Its categories use the word **angina**, the medical term for chest pain or discomfort caused when the heart muscle does not get enough oxygen-rich blood.²

`trtbps` is the resting blood pressure, measured in millimeters of mercury (mm Hg).³

`chol` is the cholesterol level, measured in milligrams per decilitre (mg/dl).

`restecg` refers to resting electrocardiographic (ECG) results. An ECG records the electrical activity of the heart.

`thall` refers to a thallium stress test, which shows how well blood flows to the heart muscle while you are at rest and while exercising.

#### 1.3 Task <a id=4></a>


Our task is to perform exploratory data analysis (EDA): compute descriptive statistics that help us understand the data, and then use them to draw some interesting insights.
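As a rough sketch of the kind of statistics we mean, here is `describe()` for an overall summary and a `groupby` on the target to compare the two classes. It uses only the nine rows and four columns displayed in section 2.2, so the numbers just illustrate the calls and are not findings about the dataset.

```python
import pandas as pd

# The first five and last four rows shown in section 2.2 (subset of columns)
sample = pd.DataFrame({
    "age": [63, 37, 41, 56, 57, 57, 45, 68, 57],
    "chol": [233, 250, 204, 236, 354, 241, 264, 193, 131],
    "thalachh": [150, 187, 172, 178, 163, 123, 132, 141, 115],
    "output": [1, 1, 1, 1, 1, 0, 0, 0, 0],
})

# Count, mean, std, min, quartiles and max for every numeric column
summary = sample.describe()
print(summary)

# Mean of each feature within each value of the target variable
group_means = sample.groupby("output").mean()
print(group_means)
```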

TODO: Run some predictions using a few algorithms.

### 2. Preparation <a id=5></a>

We will import the required packages and do a preliminary inspection of the data.

#### 2.1 Packages <a id=6></a>

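```python
import pandas as pd               # data loading and manipulation
import numpy as np                # numerical operations
import matplotlib.pyplot as plt   # plotting
import seaborn as sns             # statistical visualizations
```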


#### 2.2 Loading Data <a id=7></a>

```python
df_1 = pd.read_csv('./data/heart.csv', sep=",")
df_1
```

| Unnamed: 0 | age,sex,cp,trtbps,chol,fbs,restecg,thalachh,exng,oldpeak,slp,caa,thall,output |
|---|---|
| 0 | 63,1,3,145,233,1,0,150,0,2.3,0,0,1,1 |
| 1 | 37,1,2,130,250,0,1,187,0,3.5,0,0,2,1 |
| 2 | 41,0,1,130,204,0,0,172,0,1.4,2,0,2,1 |
| 3 | 56,1,1,120,236,0,1,178,0,0.8,2,0,2,1 |
| 4 | 57,0,0,120,354,0,1,163,1,0.6,2,0,2,1 |
| ... | ... |
| 298 | 57,0,0,140,241,0,1,123,1,0.2,1,0,3,0 |
| 299 | 45,1,3,110,264,0,1,132,0,1.2,1,0,3,0 |
| 300 | 68,1,0,144,193,1,1,141,0,3.4,1,2,3,0 |
| 301 | 57,1,0,130,131,0,1,115,1,1.2,1,1,3,0 |


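Looking at the .csv file, the values in each row (everything after the leading index) are wrapped in quotation marks. As a result, pandas reads them as a single column containing one long string of comma-separated numbers, next to an `Unnamed: 0` column holding the row index.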

There are a couple of ways to handle this problem. The easiest one I found was to open the .csv file in a text editor, use Find and Replace (Edit > Replace) to replace every quotation mark with nothing, and save the result as a .txt file. Then we can read the .txt file directly.
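The same cleanup can also be done in code, which avoids the manual edit and the intermediate .txt file. This is a minimal sketch, assuming the raw file has the layout suggested by the output above (an unnamed index field followed by one quoted field per row). The sample text reuses the header and first two rows from that output, and `read_unquoted` is a hypothetical helper, not part of pandas.

```python
import io

import pandas as pd

# Header plus two rows in the quoted layout assumed above
QUOTED_SAMPLE = (
    ',"age,sex,cp,trtbps,chol,fbs,restecg,thalachh,exng,oldpeak,slp,caa,thall,output"\n'
    '0,"63,1,3,145,233,1,0,150,0,2.3,0,0,1,1"\n'
    '1,"37,1,2,130,250,0,1,187,0,3.5,0,0,2,1"\n'
)


def read_unquoted(raw_text: str) -> pd.DataFrame:
    """Hypothetical helper: drop every double quote, then parse the text as CSV.

    Assumes no value legitimately contains a quote character.
    index_col=0 uses the unnamed leading field as the index instead of
    keeping it as an 'Unnamed: 0' column.
    """
    cleaned = raw_text.replace('"', "")
    return pd.read_csv(io.StringIO(cleaned), index_col=0)


# Reading the quoted text as-is reproduces the problem: one long string column
as_is = pd.read_csv(io.StringIO(QUOTED_SAMPLE))
print(as_is.shape)  # (2, 2)

# Stripping the quotes first gives one numeric column per variable
fixed = read_unquoted(QUOTED_SAMPLE)
print(fixed.shape)  # (2, 14)
print(fixed.dtypes["oldpeak"])  # float64
```

On the real file this would be `read_unquoted(Path("./data/heart.csv").read_text())` with `from pathlib import Path`. Because of `index_col=0`, the result has 14 columns instead of 15: the row numbers become the index rather than an `Unnamed: 0` column.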

```python
df = pd.read_csv("./data/heart.txt", sep=",")
df
```

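| Unnamed: 0 | age | sex | cp | trtbps | chol | fbs | restecg | thalachh | exng | oldpeak | slp | caa | thall | output |
|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|
| 0 | 63 | 1 | 3 | 145 | 233 | 1 | 0 | 150 | 0 | 2.3 | 0 | 0 | 1 | 1 |
| 1 | 37 | 1 | 2 | 130 | 250 | 0 | 1 | 187 | 0 | 3.5 | 0 | 0 | 2 | 1 |
| 2 | 41 | 0 | 1 | 130 | 204 | 0 | 0 | 172 | 0 | 1.4 | 2 | 0 | 2 | 1 |
| 3 | 56 | 1 | 1 | 120 | 236 | 0 | 1 | 178 | 0 | 0.8 | 2 | 0 | 2 | 1 |
| 4 | 57 | 0 | 0 | 120 | 354 | 0 | 1 | 163 | 1 | 0.6 | 2 | 0 | 2 | 1 |
| ... | ... | ... | ... | ... | ... | ... | ... | ... | ... | ... | ... | ... | ... | ... |
| 298 | 57 | 0 | 0 | 140 | 241 | 0 | 1 | 123 | 1 | 0.2 | 1 | 0 | 3 | 0 |
| 299 | 45 | 1 | 3 | 110 | 264 | 0 | 1 | 132 | 0 | 1.2 | 1 | 0 | 3 | 0 |
| 300 | 68 | 1 | 0 | 144 | 193 | 1 | 1 | 141 | 0 | 3.4 | 1 | 2 | 3 | 0 |
| 301 | 57 | 1 | 0 | 130 | 131 | 0 | 1 | 115 | 1 | 1.2 | 1 | 1 | 3 | 0 |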


#### 2.3 Understanding the Data <a id=8></a>

### 5. References <a id=14></a>

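¹ UC Irvine Machine Learning Repository, Heart Disease dataset: https://archive-beta.ics.uci.edu/ml/datasets/heart+disease

² American Heart Association, angina (chest pain): https://www.heart.org/en/health-topics/heart-attack/angina-chest-pain

³ Healthline, blood pressure readings explained: https://www.healthline.com/health/high-blood-pressure-hypertension/blood-pressure-reading-explained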