# <center>A Report on the Heart Disease Dataset</center>

## 1. Data Exploration

The heart disease dataset$^{[1]}$ contains 76 attributes, of which our dataset uses a subset of 14. The `target` field is binary: 0 represents the absence of heart disease in a patient and 1 represents its presence.

In [1]:
import pandas as pd

In [2]:
train_df = pd.read_csv('data/Heart_train.csv')
test_df = pd.read_csv('data/Heart_test.csv')

In [3]:
train_df.head()

| |age|sex|cp|trestbps|chol|fbs|restecg|thalach|exang|oldpeak|slope|ca|thal|target|
|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|
|0|67|1|2|152|212|0|0|150|0|0.8|1|0|3|0|
|1|53|1|2|130|246|1|0|173|0|0.0|2|3|2|1|
|2|61|1|3|134|234|0|1|145|0|2.6|1|2|2|0|
|3|45|1|1|128|308|0|0|170|0|0.0|2|0|2|1|
|4|50|1|0|144|200|0|0|126|1|0.9|1|0|3|0|


In [4]:
test_df.head()

| |age|sex|cp|trestbps|chol|fbs|restecg|thalach|exang|oldpeak|slope|ca|thal|target|
|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|
|0|58|1|1|120|284|0|0|160|0|1.8|1|0|2|0|
|1|52|1|0|112|230|0|1|160|0|0.0|2|1|2|0|
|2|42|0|2|120|209|0|1|173|0|0.0|1|0|2|1|
|3|55|1|1|130|262|0|1|155|0|0.0|2|0|2|1|
|4|53|0|0|130|264|0|0|143|0|0.4|1|0|2|1|


In [5]:
train_df.shape

(242, 14)

In [6]:
test_df.shape

(61, 14)

Our dataset contains **242 entries in the training set** and **61 entries in the testing set**. Each set has **14 columns**: 13 features plus the `target` label.
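The shapes above imply roughly an 80/20 train/test split. A minimal sketch of that arithmetic, using only the row counts reported by `train_df.shape` and `test_df.shape` (the variable names are illustrative):

```python
# Row counts reported by train_df.shape and test_df.shape above.
n_train, n_test = 242, 61

total = n_train + n_test
train_fraction = n_train / total
test_fraction = n_test / total

print(f"total rows: {total}")                    # total rows: 303
print(f"train fraction: {train_fraction:.3f}")   # train fraction: 0.799
print(f"test fraction: {test_fraction:.3f}")     # test fraction: 0.201
```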

In [7]:
train_df.info()

<class 'pandas.core.frame.DataFrame'>
RangeIndex: 242 entries, 0 to 241
Data columns (total 14 columns):
age         242 non-null int64
sex         242 non-null int64
cp          242 non-null int64
trestbps    242 non-null int64
chol        242 non-null int64
fbs         242 non-null int64
restecg     242 non-null int64
thalach     242 non-null int64
exang       242 non-null int64
oldpeak     242 non-null float64
slope       242 non-null int64
ca          242 non-null int64
thal        242 non-null int64
target      242 non-null int64
dtypes: float64(1), int64(13)
memory usage: 26.6 KB
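The `info()` output reports 242 non-null values in every column, so the training set appears to have no missing entries. The same check can be made explicit with `isnull().sum()`. The sketch below runs on a small illustrative frame (`sample_df`, built from four columns of the five rows shown by `train_df.head()`) so that it is self-contained; in the notebook one would call it on `train_df` directly.

```python
import pandas as pd

# Illustrative sample: four columns from the five rows shown by train_df.head().
sample_df = pd.DataFrame({
    "age": [67, 53, 61, 45, 50],
    "chol": [212, 246, 234, 308, 200],
    "oldpeak": [0.8, 0.0, 2.6, 0.0, 0.9],
    "target": [0, 1, 0, 1, 0],
})

# Count missing values per column; all zeros means no gaps.
null_counts = sample_df.isnull().sum()
print(null_counts)
print("any missing:", bool(null_counts.any()))

# Column dtypes, the same information info() reports.
print(sample_df.dtypes)
```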


A brief description of the attributes is tabulated below$^{[1]}$:

S.No.|Attribute|Description
-----|:--------|:----------
1.|age|Age in years
2.|sex|1 = male; 0 = female
3.|cp|Chest pain type<br><br>Value 1: typical angina<br>Value 2: atypical angina<br>Value 3: non-anginal pain<br>Value 4: asymptomatic
4.|trestbps|Resting blood pressure
5.|chol|Serum cholesterol in mg/dl
6.|fbs|Fasting blood sugar > 120 mg/dl (1 = true; 0 = false)
7.|restecg|Resting electrocardiographic results<br><br>Value 0: normal<br>Value 1: having ST-T wave abnormality (T wave inversions and/or ST elevation or depression of > 0.05 mV)<br>Value 2: showing probable or definite left ventricular hypertrophy by Estes' criteria
8.|thalach|Maximum heart rate achieved
9.|exang|Exercise induced angina (1 = yes; 0 = no)
10.|oldpeak|ST depression induced by exercise relative to rest
11.|slope|Slope of the peak exercise ST segment<br><br>Value 1: upsloping<br>Value 2: flat<br>Value 3: downsloping
12.|ca|Number of major vessels (0-3) colored by fluoroscopy
13.|thal|3 = normal; 6 = fixed defect; 7 = reversible defect
14.|target|Diagnosis of heart disease (angiographic disease status)<br><br>Value 0: < 50% diameter narrowing<br>Value 1: > 50% diameter narrowing
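In the rows printed by `train_df.head()`, `cp` takes the value 0 and `thal` the value 2, neither of which is listed in the table, so the file may use a different encoding than the documented one. A minimal sketch for surfacing such codes is shown below. `sample_df` and `documented_codes` are illustrative names; the sample holds only the five printed rows, and the documented sets are copied from the table.

```python
import pandas as pd

# Illustrative sample: cp, slope and thal from the five rows of train_df.head().
sample_df = pd.DataFrame({
    "cp": [2, 2, 3, 1, 0],
    "slope": [1, 2, 1, 2, 1],
    "thal": [3, 2, 2, 2, 3],
})

# Codes listed in the attribute table above.
documented_codes = {
    "cp": {1, 2, 3, 4},
    "slope": {1, 2, 3},
    "thal": {3, 6, 7},
}

# For each column, collect observed values that the table does not document.
undocumented = {}
for column, codes in documented_codes.items():
    observed = set(sample_df[column].tolist())
    undocumented[column] = sorted(observed - codes)
    print(f"{column}: observed={sorted(observed)}, undocumented={undocumented[column]}")
```

Running the same loop on the full `train_df` would show whether the mismatch is limited to these rows; the sketch does not say what the undocumented codes mean.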

We rename the columns of the training set to more descriptive names:

In [10]:
train_df.columns = [
    'age', 'sex', 'chest_pain_type', 'resting_blood_pressure', 'cholesterol',
    'fasting_blood_sugar', 'rest_ecg', 'max_heart_rate_achieved',
    'exercise_induced_angina', 'st_depression', 'st_slope', 'num_major_vessels',
    'thalassemia', 'target',
]

In [11]:
train_df.head()

| |age|sex|chest_pain_type|resting_blood_pressure|cholesterol|fasting_blood_sugar|rest_ecg|max_heart_rate_achieved|exercise_induced_angina|st_depression|st_slope|num_major_vessels|thalassemia|target|
|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|
|0|67|1|2|152|212|0|0|150|0|0.8|1|0|3|0|
|1|53|1|2|130|246|1|0|173|0|0.0|2|3|2|1|
|2|61|1|3|134|234|0|1|145|0|2.6|1|2|2|0|
|3|45|1|1|128|308|0|0|170|0|0.0|2|0|2|1|
|4|50|1|0|144|200|0|0|126|1|0.9|1|0|3|0|
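Assigning a list to `train_df.columns` depends on the column order and renames only the training frame. A minimal sketch of an alternative, assuming the test frame should carry the same names: map old names to new ones with `DataFrame.rename` and apply the same mapping to both frames. The `COLUMN_NAMES` dict, the `rename_columns` helper and the one-row sample frames are illustrative and not part of the original notebook.

```python
import pandas as pd

# Old name -> new name; columns not listed (age, sex, target) keep their names.
COLUMN_NAMES = {
    "cp": "chest_pain_type",
    "trestbps": "resting_blood_pressure",
    "chol": "cholesterol",
    "fbs": "fasting_blood_sugar",
    "restecg": "rest_ecg",
    "thalach": "max_heart_rate_achieved",
    "exang": "exercise_induced_angina",
    "oldpeak": "st_depression",
    "slope": "st_slope",
    "ca": "num_major_vessels",
    "thal": "thalassemia",
}


def rename_columns(df: pd.DataFrame) -> pd.DataFrame:
    """Return a copy of df with the descriptive column names applied."""
    return df.rename(columns=COLUMN_NAMES)


original_columns = ["age", "sex", "cp", "trestbps", "chol", "fbs", "restecg",
                    "thalach", "exang", "oldpeak", "slope", "ca", "thal", "target"]

# Illustrative one-row frames: the first row of each head() output above.
train_sample = pd.DataFrame(
    [[67, 1, 2, 152, 212, 0, 0, 150, 0, 0.8, 1, 0, 3, 0]], columns=original_columns)
test_sample = pd.DataFrame(
    [[58, 1, 1, 120, 284, 0, 0, 160, 0, 1.8, 1, 0, 2, 0]], columns=original_columns)

train_renamed = rename_columns(train_sample)
test_renamed = rename_columns(test_sample)

print(list(train_renamed.columns))
print(list(train_renamed.columns) == list(test_renamed.columns))
```

Because the mapping is keyed by name, it keeps working if the column order changes, and both frames end up with identical schemas.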


### A brief background on the domain

Cardiovascular disease generally refers to conditions that involve narrowed or **blocked blood vessels** that can lead to a heart attack, **chest pain (angina)** or stroke.$^{[2]}$

Cardiovascular disease symptoms may be different for men and women. For instance, **men are more likely to have chest pain**; women are more likely to have other symptoms along with chest discomfort, such as shortness of breath, nausea and extreme fatigue.

Heart disease symptoms can also be caused by **abnormal heartbeats** (arrhythmias), in which the heart beats too quickly, too slowly or irregularly.

Risk factors for heart disease include:

•	**Aging increases your risk** of damaged and narrowed arteries and weakened or thickened heart muscle.<br>
•	**Men are generally at greater risk** of heart disease. However, women's risk increases after menopause.<br>
•	**Uncontrolled high blood pressure** can result in hardening and thickening of your arteries, narrowing the vessels through which blood flows.<br>
•	**High levels of cholesterol** in your blood can increase the risk of formation of plaques and atherosclerosis.
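Several of these risk factors correspond to columns in the dataset (age, sex, resting blood pressure, cholesterol), so one way to start relating them to the diagnosis is to compare group means by `target`. The sketch below is illustrative only: `sample_df` holds three columns from the five rows printed by `train_df.head()`, using the renamed column names, and five rows are far too few to support any conclusion. On the real data the same `groupby` would be applied to `train_df`.

```python
import pandas as pd

# Illustrative sample: three risk-factor columns and the target
# from the five rows shown by train_df.head().
sample_df = pd.DataFrame({
    "age": [67, 53, 61, 45, 50],
    "resting_blood_pressure": [152, 130, 134, 128, 144],
    "cholesterol": [212, 246, 234, 308, 200],
    "target": [0, 1, 0, 1, 0],
})

risk_columns = ["age", "resting_blood_pressure", "cholesterol"]

# Mean of each risk-factor column within each target class.
summary = sample_df.groupby("target")[risk_columns].mean()
print(summary.round(2))
```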


### Pandas Profiling

In [12]:
from pandas_profiling import ProfileReport

In [13]:
profile = ProfileReport(train_df, title='Pandas Profiling Report', html={'style':{'full_width':True}})

In [14]:
profile.to_widgets()


## References

1. https://archive.ics.uci.edu/ml/datasets/Heart+Disease
2. https://www.mayoclinic.org/diseases-conditions/heart-disease/symptoms-causes/syc-20353118