## About the Dataset

Data: https://drive.google.com/file/d/1vy8itSePLx5k3c_HaoeLGbKJZAULOv47/view?usp=sharing

Note: ST refers to the ST segment of the electrocardiogram (ECG), the interval between the S wave and the T wave.

- age: age in years
- sex:
  - 1: male
  - 0: female
- cp: chest pain type
  - 0: typical angina
  - 1: atypical angina
  - 2: non-anginal pain
  - 3: asymptomatic
- trestbps: resting blood pressure (mm Hg)
- chol: serum cholesterol in mg/dl
- fbs: fasting blood sugar > 120 mg/dl
  - 1: true
  - 0: false
- restecg: resting electrocardiographic results (values 0, 1, 2)
  - 0: normal
  - 1: having ST-T wave abnormality
  - 2: showing probable or definite left ventricular hypertrophy by Estes' criteria
- thalach: maximum heart rate achieved
- exang: exercise-induced angina
  - 0: no
  - 1: yes
- oldpeak: ST depression induced by exercise relative to rest
- slope: the slope of the peak exercise ST segment
  - 1: upsloping
  - 2: flat
  - 3: downsloping
- ca: number of major vessels (0-3) colored by fluoroscopy
- thal: results of thallium stress testing, a nuclear imaging technique used to evaluate blood flow to the heart muscle. A radioactive isotope of thallium is injected into the bloodstream during the stress test, and a special camera detects its distribution in the heart muscle. The values are:
  - 0 = normal: no significant abnormalities in the distribution of thallium in the heart muscle, suggesting normal blood flow to all regions of the heart at rest and under stress.
  - 1 = fixed defect: a region (or regions) of the heart muscle where thallium uptake is abnormal both at rest and under stress. This typically indicates scar tissue or permanent damage to the heart muscle, such as from a previous heart attack.
  - 2 = reversible defect: a region (or regions) of the heart muscle where thallium uptake is reduced under stress but improves at rest. Blood flow to these areas is compromised only when demand increases, such as during exercise. Reversible defects can indicate ischemia (inadequate blood flow to part of the heart), possibly due to narrowing or blockages in the coronary arteries.
- target:
  - 0: normal
  - 1: heart disease
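The code tables above can be turned into readable labels with `Series.map`. The sketch below is a minimal illustration, assuming the raw column names (`sex`, `cp`, `fbs`, `exang`, `target`) and the codes exactly as listed in the dictionary; the helper `decode`, the `CODE_LABELS` mapping, and the three-row sample are hypothetical and not part of the original notebook.

```python
import pandas as pd

# Hypothetical label maps, copied from the data dictionary above.
CODE_LABELS = {
    "sex": {1: "male", 0: "female"},
    "cp": {0: "typical angina", 1: "atypical angina",
           2: "non-anginal pain", 3: "asymptomatic"},
    "fbs": {1: True, 0: False},
    "exang": {0: "no", 1: "yes"},
    "target": {0: "normal", 1: "heart disease"},
}


def decode(df: pd.DataFrame) -> pd.DataFrame:
    """Return a copy of df with the coded columns replaced by their labels.

    Codes missing from a map become NaN, which makes unexpected codes easy to spot.
    """
    out = df.copy()
    for col, labels in CODE_LABELS.items():
        if col in out.columns:
            out[col] = out[col].map(labels)
    return out


# Hypothetical three-row sample using the raw column names.
sample = pd.DataFrame({"sex": [1, 0, 1], "cp": [0, 2, 3],
                       "fbs": [0, 1, 0], "exang": [1, 0, 0],
                       "target": [1, 0, 1]})
decoded = decode(sample)
print(decoded)
```

Working on a copy keeps the numeric codes available for plotting or modelling while the labelled frame is used for display.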

## Explanation of the Chest Pain Types

Meaning of Angina:

Angina is chest pain that occurs due to insufficient blood flow to the heart, often felt as pressure or squeezing in the chest. It is a symptom of ischemic heart disease and can be triggered by physical activity or stress.

The chest pain type feature (`cp`) takes integer values from 0 to 3: 0 is typical angina, 1 is atypical angina, 2 is non-anginal pain, and 3 is asymptomatic.

Typical Angina: This type of chest pain is called typical because it shows the classic features of angina: chest discomfort that is brought on by physical exertion or emotional stress and relieved by rest or nitroglycerin. People experiencing typical angina are at high risk of having underlying coronary artery disease.

Atypical Angina: Atypical angina refers to chest pain that doesn't fit the classic description of typical angina. It may not consistently respond to rest or nitroglycerin.

Non-Anginal Pain: This category includes chest discomfort or pain that is not related to coronary artery disease or angina. It could be caused by various factors such as musculoskeletal issues, gastrointestinal problems, anxiety, or respiratory conditions. Non-anginal pain may mimic angina in some cases but is typically not associated with the same underlying heart issues.

Asymptomatic: Asymptomatic means that there are no noticeable symptoms present. In the context of heart disease, it indicates that the individual is not experiencing any chest pain or related symptoms at the time of assessment. However, it's important to note that some people with heart disease may not experience noticeable symptoms, especially in the early stages or if they have other conditions masking the symptoms.

## Data Cleaning
Installing the necessary libraries: pandas, Matplotlib, seaborn, and Dash.

```python
%pip install dash
```

Note: you may need to restart the kernel to use updated packages.

```python
import pandas as pd
import matplotlib.pyplot as plt
import seaborn as sns
```

```python
df = pd.read_csv('heart.csv')
```

```python
df.rename(columns={'age': 'Age',
                   'sex': 'Sex',
                   'cp': 'Chest Pain',
                   'trestbps': 'Resting Blood Pressure',
                   'chol': 'Cholestrol',
                   'fbs': 'Blood Sugar',
                   'restecg': 'Resting ECG',
                   'thalach': 'Max Heart Rate',
                   'exang': 'Exercise Induced Angina',
                   'oldpeak': 'Depression',
                   'slope': 'Slope',
                   'ca': 'Vessels colored by flourosopy',
                   'thal': 'Thallium',
                   'target': 'Heart Condition'},
          inplace=True)
```

```python
df.isnull().sum()
```

```
Age                              0
Sex                              0
Chest Pain                       0
Resting Blood Pressure           0
Cholestrol                       0
Blood Sugar                      0
Resting ECG                      0
Max Heart Rate                   0
Exercise Induced Angina          0
Depression                       0
Slope                            0
Vessels colored by flourosopy    0
Thallium                         0
Heart Condition                  0
dtype: int64
```

Checking for missing values in the data: the counts above show that no column has any missing values.
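When there are many columns, a single boolean is quicker to read than the per-column counts. The sketch below uses a small hypothetical frame with one deliberately missing value in place of the notebook's `df`; only standard pandas calls (`isnull`, `sum`, `any`) are used.

```python
import numpy as np
import pandas as pd

# Hypothetical frame standing in for df, with one missing Age value.
frame = pd.DataFrame({"Age": [63, 37, np.nan],
                      "Cholestrol": [233, 250, 204]})

missing_per_column = frame.isnull().sum()           # NaN count per column
has_missing = bool(frame.isnull().any().any())      # True if any cell is NaN

print(missing_per_column)
print("any missing:", has_missing)
```

Applied to the notebook's `df`, the same `df.isnull().any().any()` check should return `False`, consistent with the all-zero counts above.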

```python
df.dtypes
```

```
Age                                int64
Sex                                int64
Chest Pain                         int64
Resting Blood Pressure             int64
Cholestrol                         int64
Blood Sugar                        int64
Resting ECG                        int64
Max Heart Rate                     int64
Exercise Induced Angina            int64
Depression                       float64
Slope                              int64
Vessels colored by flourosopy      int64
Thallium                           int64
Heart Condition                    int64
dtype: object
```

## EDA

```python
df.duplicated()
```

```
0       False
1       False
2       False
3       False
4       False
        ...  
1020     True
1021     True
1022     True
1023     True
1024     True
Length: 1025, dtype: bool
```

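Examining the dataset for duplicates revealed redundant rows; for example, rows 1020 to 1024 are flagged. We chose not to remove them. Note, however, that `df.duplicated()` flags a row only when every column, including continuous ones such as Age, Cholestrol, and Max Heart Rate, exactly matches an earlier row, so the flagged rows are exact copies rather than rows that merely look similar because many features are binary. Keeping them means that repeated patients carry extra weight in the summary statistics that follow.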
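The truncated output above does not say how many rows are duplicated. A minimal sketch for counting and, if desired, dropping exact duplicates, using a small hypothetical sample in which rows 0 and 2 are identical in every column:

```python
import pandas as pd

# Hypothetical sample: rows 0 and 2 match in every column.
sample = pd.DataFrame({
    "Age": [58, 34, 58, 47],
    "Sex": [0, 0, 0, 1],
    "Cholestrol": [248, 210, 248, 275],
})

# duplicated() marks each row that repeats an earlier row (keep="first" by default).
n_duplicates = int(sample.duplicated().sum())

# drop_duplicates() keeps the first occurrence of each distinct row.
deduped = sample.drop_duplicates().reset_index(drop=True)

print(n_duplicates, len(deduped))
```

On the notebook's `df`, `df.duplicated().sum()` gives the exact number of flagged rows; `df.duplicated(keep=False)` instead marks every copy, including the first, which helps when inspecting the repeated records.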

```python
df.describe().T  # .T transposes the summary so each feature is a row
```

| Feature | count | mean | std | min | 25% | 50% | 75% | max |
|---|---|---|---|---|---|---|---|---|
| Age | 1025.0 | 54.434146 | 9.07229 | 29.0 | 48.0 | 56.0 | 61.0 | 77.0 |
| Sex | 1025.0 | 0.69561 | 0.460373 | 0.0 | 0.0 | 1.0 | 1.0 | 1.0 |
| Chest Pain | 1025.0 | 0.942439 | 1.029641 | 0.0 | 0.0 | 1.0 | 2.0 | 3.0 |
| Resting Blood Pressure | 1025.0 | 131.611707 | 17.516718 | 94.0 | 120.0 | 130.0 | 140.0 | 200.0 |
| Cholestrol | 1025.0 | 246.0 | 51.59251 | 126.0 | 211.0 | 240.0 | 275.0 | 564.0 |
| Blood Sugar | 1025.0 | 0.149268 | 0.356527 | 0.0 | 0.0 | 0.0 | 0.0 | 1.0 |
| Resting ECG | 1025.0 | 0.529756 | 0.527878 | 0.0 | 0.0 | 1.0 | 1.0 | 2.0 |
| Max Heart Rate | 1025.0 | 149.114146 | 23.005724 | 71.0 | 132.0 | 152.0 | 166.0 | 202.0 |
| Exercise Induced Angina | 1025.0 | 0.336585 | 0.472772 | 0.0 | 0.0 | 0.0 | 1.0 | 1.0 |
| Depression | 1025.0 | 1.071512 | 1.175053 | 0.0 | 0.0 | 0.8 | 1.8 | 6.2 |
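The quartiles in the summary can be used to flag outliers with the 1.5 × IQR rule (Tukey's fences). This is a standard technique that the original notebook does not apply; the helper `iqr_bounds` and the sample readings below are hypothetical.

```python
import pandas as pd


def iqr_bounds(s: pd.Series, k: float = 1.5) -> tuple[float, float]:
    """Tukey's fences: values outside [Q1 - k*IQR, Q3 + k*IQR] count as outliers."""
    q1, q3 = s.quantile(0.25), s.quantile(0.75)
    iqr = q3 - q1
    return q1 - k * iqr, q3 + k * iqr


# Hypothetical cholesterol readings in mg/dl.
chol = pd.Series([126, 211, 240, 275, 564])
low, high = iqr_bounds(chol)
outliers = chol[(chol < low) | (chol > high)]
print(low, high, outliers.tolist())
```

With the summary's quartiles for Cholestrol (25% = 211, 75% = 275), the IQR is 64 and the upper fence is 275 + 1.5 × 64 = 371 mg/dl, so the maximum of 564 lies well beyond it.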


```python
df['Sex'] = df['Sex'].map({0: 'Female', 1: 'Male'})
df['Sex'].value_counts()
```

```
Sex
Male      713
Female    312
Name: count, dtype: int64
```

```python
df.groupby('Sex')['Depression'].mean()  # mean ST depression (oldpeak) by sex
```

```
Sex
Female    0.921154
Male      1.137307
Name: Depression, dtype: float64
```
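The summary table shows a median ST depression of 0.8 but a maximum of 6.2, so a group mean can be pulled up by a few large values. A minimal sketch that reports the median and group size alongside the mean, using a hypothetical sample with the notebook's `Sex` and `Depression` column names:

```python
import pandas as pd

# Hypothetical sample mirroring the notebook's 'Sex' and 'Depression' (oldpeak) columns.
sample = pd.DataFrame({
    "Sex": ["Female", "Male", "Male", "Female", "Male"],
    "Depression": [1.0, 4.4, 0.0, 0.6, 1.0],
})

# One row per sex; median and count give context for the mean.
summary = sample.groupby("Sex")["Depression"].agg(["mean", "median", "count"])
print(summary)
```

In this sample the male mean (1.8) sits well above the male median (1.0) because of the single 4.4 reading, which is the kind of gap the mean alone hides.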

```python
# Keep patients whose resting blood pressure is strictly between 80 and 120 mm Hg
# (normal range); the lower bound guards against implausibly low readings.
health_df = df[(df['Resting Blood Pressure'] > 80) &
               (df['Resting Blood Pressure'] < 120)]
health_df
```

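| | Age | Sex | Chest Pain | Resting Blood Pressure | Cholestrol | Blood Sugar | Resting ECG | Max Heart Rate | Exercise Induced Angina | Depression | Slope | Vessels colored by flourosopy | Thallium | Heart Condition |
|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|
| 5 | 58 | Female | 0 | 100 | 248 | 0 | 0 | 122 | 0 | 1.0 | 1 | 0 | 2 | 1 |
| 6 | 58 | Male | 0 | 114 | 318 | 0 | 2 | 140 | 0 | 4.4 | 0 | 3 | 1 | 0 |
| 10 | 71 | Female | 0 | 112 | 149 | 0 | 1 | 125 | 0 | 1.6 | 1 | 0 | 2 | 1 |
| 12 | 34 | Female | 1 | 118 | 210 | 0 | 1 | 192 | 0 | 0.7 | 2 | 0 | 2 | 1 |
| 15 | 34 | Female | 1 | 118 | 210 | 0 | 1 | 192 | 0 | 0.7 | 2 | 0 | 2 | 1 |
| ... | ... | ... | ... | ... | ... | ... | ... | ... | ... | ... | ... | ... | ... | ... |
| 1014 | 44 | Female | 2 | 108 | 141 | 0 | 1 | 175 | 0 | 0.6 | 1 | 0 | 2 | 1 |
| 1018 | 41 | Male | 0 | 110 | 172 | 0 | 0 | 158 | 0 | 0.0 | 2 | 0 | 3 | 0 |
| 1019 | 47 | Male | 0 | 112 | 204 | 0 | 1 | 143 | 0 | 0.1 | 2 | 0 | 2 | 1 |
| 1022 | 47 | Male | 0 | 110 | 275 | 0 | 0 | 118 | 1 | 1.0 | 1 | 1 | 2 | 0 |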


This selects the patients with normal resting blood pressure (above 80 and below 120 mm Hg).
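The same strict-bounds filter can be written with `Series.between`, which states the range in one call. A minimal sketch on hypothetical readings, checking that both forms select the same rows; note that `inclusive="neither"` (exclusive at both ends) assumes pandas 1.3 or later.

```python
import pandas as pd

# Hypothetical resting blood pressure readings in mm Hg.
bp = pd.Series([94, 100, 118, 120, 130])

# Original style: two comparisons combined with &.
mask_ops = (bp > 80) & (bp < 120)

# Equivalent: between() with both bounds excluded.
mask_between = bp.between(80, 120, inclusive="neither")

print(bp[mask_between].tolist())
```

Because both bounds are exclusive, a reading of exactly 120 mm Hg is left out, matching the `< 120` comparison in the notebook.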

```python
# Percentage of patients with non-anginal pain (Chest Pain code 2)
(df['Chest Pain'] == 2).sum() / len(df) * 100
```

```
27.707317073170735
```
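The cell above counts code 2, which the data dictionary defines as non-anginal pain. Assuming "patients with angina" means codes 0 (typical) and 1 (atypical), a minimal sketch of that percentage, plus the share of every chest pain type, on hypothetical codes:

```python
import pandas as pd

# Hypothetical chest pain codes: 0 typical, 1 atypical, 2 non-anginal, 3 asymptomatic.
cp = pd.Series([0, 0, 1, 2, 3, 0, 2, 1])

# Mean of a boolean mask is the fraction of True values.
angina_pct = cp.isin([0, 1]).mean() * 100

# Percentage of each chest pain type, ordered by code.
share_by_type = cp.value_counts(normalize=True).sort_index() * 100

print(angina_pct)
print(share_by_type)
```

On the notebook's data the equivalent would be `df['Chest Pain'].isin([0, 1]).mean() * 100`.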

```python
df.to_csv('Data.csv', index=False)  # index=False keeps the row index out of the file
```
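By default `to_csv` writes the row index as an unnamed first column, which `read_csv` then loads as `Unnamed: 0`. A minimal sketch of the difference, using an in-memory buffer and a hypothetical two-row frame instead of a file:

```python
import io

import pandas as pd

# Hypothetical two-row frame.
sample = pd.DataFrame({"Age": [63, 37], "Sex": ["Male", "Female"]})

# Default: the index is written as an extra, unnamed first column.
with_index = io.StringIO()
sample.to_csv(with_index)
with_index.seek(0)
cols_with_index = pd.read_csv(with_index).columns.tolist()

# index=False writes only the data columns.
without_index = io.StringIO()
sample.to_csv(without_index, index=False)
without_index.seek(0)
cols_without_index = pd.read_csv(without_index).columns.tolist()

print(cols_with_index)
print(cols_without_index)
```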