# Heart Disease Diagnostic Analysis

#### This notebook aims to perform a comprehensive analysis of the heart disease dataset. We will explore the data, perform statistical analysis, visualize key relationships, and derive insights.


## Importing Libraries

In [1]:
import pandas as pd
import numpy as np
import matplotlib.pyplot as plt
import seaborn as sns
import warnings

## Data Loading

In [2]:
# Load the dataset
file_path = "C:/Users/h/Downloads/Heart_Disease_Dataset.csv"
data = pd.read_csv(file_path)
data.head()

FileNotFoundError: [Errno 2] No such file or directory: 'C:/Users/h/Downloads/Heart_Disease_Dataset.csv'
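
The `FileNotFoundError` above means the hard-coded path does not exist on the machine running the notebook. A minimal sketch of a more defensive loader follows. The helper name `load_dataset` and the throwaway demo file are hypothetical and only illustrate the idea; the real CSV location still has to be supplied by whoever runs the notebook.

```python
import tempfile
from pathlib import Path

import pandas as pd


def load_dataset(path):
    """Read a CSV, failing early with a readable message if the file is missing."""
    csv_path = Path(path).expanduser()
    if not csv_path.is_file():
        raise FileNotFoundError(f"Dataset not found at {csv_path}; check the path.")
    return pd.read_csv(csv_path)


# Demo on a temporary file with made-up rows so the sketch runs anywhere.
with tempfile.TemporaryDirectory() as tmp:
    demo_path = Path(tmp) / "demo.csv"
    demo_path.write_text("age,sex,num\n63,1,0\n67,1,1\n")
    demo = load_dataset(demo_path)

print(demo.shape)
```

In the notebook itself, the call would be `data = load_dataset(file_path)`.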

## Exploring Dataset Columns

In [None]:
data.columns

### Overview of the Thirteen Dataset Attributes and the Target

age: The person's age in years

sex: The person's sex (1 = male, 0 = female)

cp: The chest pain experienced (Value 1: typical angina, Value 2: atypical angina, Value 3: non-anginal pain, Value 4: asymptomatic)

trestbps: The person's resting blood pressure (mm Hg on admission to the hospital)

chol: The person's cholesterol measurement in mg/dl

fbs: The person's fasting blood sugar (> 120 mg/dl, 1 = true; 0 = false)

restecg: Resting electrocardiographic measurement (0 = normal, 1 = having ST-T wave abnormality, 2 = showing probable or definite left ventricular hypertrophy by Estes' criteria)

thalach: The person's maximum heart rate achieved

exang: Exercise induced angina (1 = yes; 0 = no)

oldpeak: ST depression induced by exercise relative to rest

slope: the slope of the peak exercise ST segment (Value 1: upsloping, Value 2: flat, Value 3: downsloping)

ca: The number of major vessels (0-3) colored by fluoroscopy

thal: A blood disorder called thalassemia (3 = normal; 6 = fixed defect; 7 = reversible defect)

num: Heart disease (0 = no, 1 = yes)
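
The coded attributes above are easier to read in tables and plots once the codes are turned into names. The sketch below shows one way to do that with `Series.map`. The dictionaries are copied from the descriptions above; the sample rows and the `*_label` column names are made up for illustration.

```python
import pandas as pd

# Code-to-label mappings copied from the attribute descriptions.
CP_LABELS = {1: "Typical Angina", 2: "Atypical Angina",
             3: "Non-Anginal Pain", 4: "Asymptomatic"}
SLOPE_LABELS = {1: "Upsloping", 2: "Flat", 3: "Downsloping"}

# Hypothetical rows, only to demonstrate the mapping.
sample = pd.DataFrame({"cp": [4, 1, 3], "slope": [2, 1, 3]})
sample["cp_label"] = sample["cp"].map(CP_LABELS)
sample["slope_label"] = sample["slope"].map(SLOPE_LABELS)

print(sample)
```

Codes that are missing from a dictionary become `NaN`, which makes unexpected values easy to spot.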

## Data Preprocessing

#### Check for missing values

In [None]:
data.isnull().sum()

##### There are no missing values in our dataset
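
`isnull()` only counts real `NaN` values. Some copies of heart disease data mark missing entries with a placeholder such as `"?"`. Whether this particular CSV does so is an assumption, not something the notebook confirms. A small sketch of how such placeholders could be checked, using made-up rows:

```python
import numpy as np
import pandas as pd

# Hypothetical frame with one real NaN and one "?" placeholder.
sample = pd.DataFrame({"ca": ["0", "?", "2"], "chol": [233.0, np.nan, 250.0]})

# isnull() sees the NaN in chol but not the "?" in ca.
raw_counts = sample.isnull().sum()
print(raw_counts)

# Treat "?" as missing, then count again.
cleaned = sample.replace("?", np.nan)
missing_counts = cleaned.isnull().sum()
print(missing_counts)
```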

## Heart Disease Rate in the Population

In [None]:
## Grouping the data by the 'num' column and calculating the size of each group
num=data.groupby('num').size()
num

In [None]:
# Converting Numerical Data into Categorical Data
def heart_disease(row):
    if row == 0:
        return 'Absence'
    elif row == 1:
        return 'Presence'

In [None]:
# Applying converted data into our dataset with new column - Heart_Disease
data['Heart_Disease'] = data['num'].apply(heart_disease)
data.head()

In [None]:
# Grouping by Heart_Disease to get the count
hd = data.groupby('Heart_Disease').size()
hd

In [None]:
# Pie Chart Creation of Heart Disease Population % using Matplotlib
plt.figure(figsize=(5,7))
# Take the labels from the grouped index so they always match the wedge order
plt.pie(hd, labels=hd.index, autopct='%0.0f%%', startangle=90, colors=['#66b3ff', '#ff6666'])
plt.title('Heart Disease Population %', fontsize=20)
plt.show()

In our dataset:

46% of individuals have heart disease.

54% of individuals do not have heart disease.
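
The percentages shown on the pie chart can also be computed directly, which avoids reading them off the figure. A minimal sketch with `value_counts(normalize=True)`, using a made-up target column rather than the real data:

```python
import pandas as pd

# Hypothetical target column: 0 = absence, 1 = presence.
num = pd.Series([0, 1, 0, 0, 1], name="num")

# normalize=True returns proportions instead of raw counts.
labels = num.map({0: "Absence", 1: "Presence"})
share = labels.value_counts(normalize=True) * 100

print(share.round(1))
```

On the notebook's data, the equivalent call would be `data['Heart_Disease'].value_counts(normalize=True) * 100`.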

In [None]:
# plotting countplot of population age using matplotlib and seaborn

plt.figure(figsize=(13,13))
plt.title('Population Age')
sns.countplot(x='age', data=data,hue="age", palette='bright',dodge=False)
plt.xlabel('Age')
plt.ylabel('Count')
plt.show()

⇒ The plot shows how many people there are at each age, covering young, middle-aged, senior, and elderly people.

In [None]:
## Statistical Analysis

# Calculate Minimum Age
Min_Age = data['age'].min()

# Calculate Maximum Age
Max_Age = data['age'].max()

# Calculate Mean Age
Mean_Age = data['age'].mean()

print("Minimum Age =", Min_Age)
print("Maximum Age =", Max_Age)
print("Mean Age =", Mean_Age)

In [None]:
# Categorical Analysis

Young_Ages=data[(data['age']>=29) & (data['age']<40)]
Middle_Ages=data[(data['age']>=40) & (data['age']<55)]
Senior_Ages=data[(data['age']>=55) & (data['age']<65)]
# Use >= so that people aged exactly 65 are counted as elderly
Elderly_Ages=data[(data['age']>=65)]
print('Young Ages =',len(Young_Ages))
print('Middle Ages =',len(Middle_Ages))
print('Senior Ages =',len(Senior_Ages))
print('Elderly Ages =',len(Elderly_Ages))

In [None]:
#Bar Plot Creation of Age Category using MatplotLib and Seaborn
# Ensure categories are a list
categories = ['Young_Ages', 'Middle_Ages', 'Senior_Ages', 'Elderly_Ages']
counts = [len(Young_Ages), len(Middle_Ages), len(Senior_Ages), len(Elderly_Ages)]
    
# Create the bar plot
sns.barplot(x=categories, y=counts, hue=categories, dodge=False)

# Set the title and labels
plt.title('Distribution of Age Categories', fontsize=17)
plt.xlabel('Age Category', fontsize=15)
plt.ylabel('Count', fontsize=15)
plt.show()

In [None]:
#Converting Numerical Data into Categorical Data

def gender(row):
    if row==1:
        return 'Male'
    elif row==0:
        return 'Female'

In [None]:
#Applying converted data into our dataset with new column - sex1

data['sex1']=data['sex'].apply(gender)
data.head()

In [None]:
#Converting Numerical Data into Categorical Data

def age_Category(row):
    if row>=29 and row<40:
        return 'Young Age'
    elif row>=40 and row<55:
        return 'Middle Age'
    elif row>=55 and row<65:
        return 'Senior Age'
    elif row>=65:
        return 'Elder Age'

In [None]:
#Applying converted data into our dataset with new column - Age_Category

data['Age_Category']=data['age'].apply(age_Category)
data.head()
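
The same age bands can be expressed with `pd.cut`, which keeps the bin edges in one place. This is a sketch, assuming the bands are 29-39, 40-54, 55-64 and 65 or older; the ages below are made up to hit every bin edge.

```python
import numpy as np
import pandas as pd

# Hypothetical ages covering every bin edge.
ages = pd.Series([29, 39, 40, 54, 55, 64, 65, 77], name="age")

# right=False makes each bin include its left edge: [29, 40), [40, 55), ...
bins = [29, 40, 55, 65, np.inf]
labels = ["Young Age", "Middle Age", "Senior Age", "Elder Age"]
age_category = pd.cut(ages, bins=bins, labels=labels, right=False)

# Count people per band, keeping the bands in their natural order.
print(age_category.value_counts().reindex(labels))
```

Ages below 29 would fall outside every bin and come back as `NaN`, which mirrors the function above returning `None` for them.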

In [None]:
#Swarm Plot Creation of Gender Based Age Category using MatplotLib and Seaborn

plt.figure(figsize=(10,7))
sns.swarmplot(x='Age_Category', y='age', hue='sex1', data=data,dodge=True, order=['Young Age','Middle Age','Senior Age','Elder Age'], palette='Oranges_r')
plt.title('Gender Based Age Category', fontsize=17)
plt.xlabel('Age Category', fontsize=15)
plt.ylabel('Age', fontsize=15)
plt.show()

⇒ In the given dataset, males outnumber females in every age group.

In [None]:
#Count Plot Creation of Heart Disease Based On Age Category using MatplotLib and Seaborn

plt.figure(figsize=(7,5))
hue_order=['Young Age', 'Middle Age','Senior Age','Elder Age']
sns.countplot(x='Heart_Disease', hue='Age_Category', data=data, order=['Presence','Absence'], hue_order=hue_order)
plt.title('Heart Disease Based On Age Category', fontsize=17)
plt.xlabel('Heart Disease', fontsize=15)
plt.ylabel('Counts', fontsize=15)
plt.show()

#### Senior Age people are the most affected by heart disease, while Middle Age people are mostly free of it

In [None]:
#Count Plot Creation of Heart Disease Based on Gender using MatplotLib and Seaborn

plt.figure(figsize=(7,5))
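# hue_order fixes the bar order; seaborn builds the legend from the sex1 values,
# so no manual legend labels are needed (manual labels can end up on the wrong bars)
sns.countplot(x='Heart_Disease', hue='sex1', data=data, hue_order=['Male','Female'], palette='BuGn_r')
plt.xlabel('Heart Disease', fontsize=15)
plt.ylabel('Count',fontsize=15)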
plt.title('Heart Disease Based on Gender',fontsize=17)
plt.show()

#### We can see that more males than females have heart disease in this dataset
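
A count plot compares raw counts, and males outnumber females in this dataset, so a larger male bar does not by itself show a higher rate. The sketch below shows how the within-group rate could be checked with `pd.crosstab(..., normalize="index")`. The rows are made up; they were chosen so that the counts differ while the rates are equal.

```python
import pandas as pd

# Hypothetical rows; the notebook would pass data['sex1'] and data['Heart_Disease'].
sample = pd.DataFrame({
    "sex1": ["Male", "Male", "Male", "Male", "Female", "Female"],
    "Heart_Disease": ["Presence", "Presence", "Absence", "Absence",
                      "Presence", "Absence"],
})

# Raw counts per group.
counts = pd.crosstab(sample["sex1"], sample["Heart_Disease"])

# normalize="index" turns each row into within-group proportions.
rates = pd.crosstab(sample["sex1"], sample["Heart_Disease"], normalize="index")

print(counts)
print(rates)
```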

In [None]:
#Count Plot Creation of Chest Pain Experienced using MatplotLib and Seaborn

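# Map chest pain codes to names (codes as listed in the attribute overview) so the legend is built from the data
cp_names = {1: 'Typical Angina', 2: 'Atypical Angina', 3: 'Non-Anginal Pain', 4: 'Asymptomatic'}
sns.countplot(x='Heart_Disease', hue=data['cp'].map(cp_names), data=data, order=['Presence','Absence'], hue_order=list(cp_names.values()))
plt.title('Chest Pain Experienced', fontsize=17)
plt.xlabel('Heart Disease',fontsize=15)
plt.ylabel('Counts',fontsize=15)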
plt.show()

####  It seems people having asymptomatic chest pain have a higher chance of heart disease

#### Asymptomatic means the person shows no chest pain symptoms, even though heart disease may be present.

In [None]:
#Count Plot Creation of Chest Pain Based On Gender using MatplotLib and Seaborn

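# Map chest pain codes to names (codes as listed in the attribute overview) so the legend is built from the data
cp_names = {1: 'Typical Angina', 2: 'Atypical Angina', 3: 'Non-Anginal Pain', 4: 'Asymptomatic'}
sns.countplot(x='sex1', hue=data['cp'].map(cp_names), data=data, hue_order=list(cp_names.values()))
plt.title('Chest Pain Based On Gender', fontsize=17)
plt.xlabel('Sex', fontsize=15)
plt.ylabel('Counts', fontsize=15)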
plt.show()

####  We can see that a higher number of men are suffering from Asymptomatic type of Chest Pain

In [None]:
#Count Plot Creation of Chest Pain Based On Age Category using MatplotLib and Seaborn

# Map chest pain codes to names (codes as listed in the attribute overview) so the legend is built from the data
cp_names = {1: 'Typical Angina', 2: 'Atypical Angina', 3: 'Non-Anginal Pain', 4: 'Asymptomatic'}
sns.countplot(x='Age_Category', hue=data['cp'].map(cp_names), data=data, order=['Young Age', 'Middle Age','Senior Age', 'Elder Age'], hue_order=list(cp_names.values()), palette='BrBG')
plt.title('Chest Pain Based On Age Category', fontsize=17)
plt.xlabel('Age Category', fontsize=15)
plt.ylabel('Counts', fontsize=15)
plt.show()

#### There is a very high number of asymptomatic cases in the Elder Age category

In [None]:
#Bar Plot Creation of Person's Resting Blood Pressure (mm Hg) using MatplotLib and Seaborn

sns.barplot(x='sex1', y='trestbps', data=data, hue='sex1', palette='plasma',dodge=False)
plt.title("Blood Pressure", fontsize=17)
plt.xlabel('Sex',fontsize=15)
plt.ylabel("Person's Resting Blood Pressure (mm Hg)", fontsize=12)
plt.show()

#### Mean resting blood pressure is almost equal in males and females

In [None]:
#Bar Plot Creation of Cholesterol Level Based On Gender using MatplotLib and Seaborn

sns.barplot(x='sex1', y='chol', data=data, hue='sex1', palette='turbo',dodge=False)
plt.title("Cholesterol Level Based On Gender", fontsize=17)
plt.xlabel('Sex',fontsize=15)
plt.ylabel("Cholesterol", fontsize=15)
plt.show()

#### Females have slightly higher cholesterol than males

In [None]:
#Bar Plot Creation of Cholesterol VS Heart Disease using MatplotLib and Seaborn

sns.barplot(x='Heart_Disease', y='chol', data=data, hue='Heart_Disease', palette='ocean_r', dodge=False)
plt.title('Cholesterol VS Heart Disease', fontsize=17)
plt.xlabel('Heart Disease', fontsize=15)
plt.ylabel('Cholesterol', fontsize=15)
plt.show()

Higher cholesterol levels are associated with a higher chance of heart disease.

In [None]:
#Bar Plot Creation of Blood Pressure VS Heart Disease using MatplotLib and Seaborn

sns.barplot(x='Heart_Disease', y='trestbps', data=data,hue='Heart_Disease', palette='tab20b_r', dodge=False)
plt.title('Blood Pressure VS Heart Disease', fontsize=17)
plt.xlabel('Heart Disease', fontsize=15)
plt.ylabel('Blood Pressure', fontsize=15)
plt.show()

Higher blood pressure levels are associated with a higher chance of heart disease.
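
By default, `sns.barplot` draws the mean of the `y` variable for each category, so the bars in the two plots above are group means. The numbers behind them can be printed with a `groupby`. The sketch uses made-up values, not results from the dataset.

```python
import pandas as pd

# Hypothetical measurements for two people per group.
sample = pd.DataFrame({
    "Heart_Disease": ["Absence", "Absence", "Presence", "Presence"],
    "chol": [200, 240, 250, 270],
    "trestbps": [120, 130, 130, 150],
})

# Mean cholesterol and resting blood pressure per group,
# the same statistic a default barplot draws as bar heights.
group_means = sample.groupby("Heart_Disease")[["chol", "trestbps"]].mean()

print(group_means)
```

Printing the means next to the plot makes it easier to judge how large the difference between the groups really is.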

In [None]:
#Line Plot Creation of Blood Pressure VS Age using MatplotLib and Seaborn

sns.lineplot(x='age', y='trestbps', data=data, color='r')
plt.title('Blood Pressure VS Age', fontsize=17)
plt.xlabel('Age', fontsize=15)
plt.ylabel('Blood Pressure', fontsize=15)
plt.show()

Here we can observe that blood pressure increases between the ages of 50 and 60, and the pattern roughly continues until 70.

In [None]:
#Line Plot Creation of Cholesterol VS Age using MatplotLib and Seaborn

sns.lineplot(x='age', y='chol', data=data, color='b')
plt.title('Cholesterol VS Age', fontsize=17)
plt.xlabel('Age', fontsize=15)
plt.ylabel('Cholesterol', fontsize=15)
plt.show()

Similarly, cholesterol increases in the 50-60 age group.
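
When several rows share the same `x` value, `sns.lineplot` draws the mean of `y` at that value. Ages with only a few people therefore produce noisy points on the line. A rough way to see how much data sits behind each point is to tabulate the mean and the count per age, sketched here on made-up rows:

```python
import pandas as pd

# Hypothetical rows: three people aged 50, two aged 60, one aged 70.
sample = pd.DataFrame({
    "age": [50, 50, 50, 60, 60, 70],
    "chol": [220, 240, 260, 250, 270, 300],
})

# Mean cholesterol and number of observations at each age.
per_age = sample.groupby("age")["chol"].agg(["mean", "count"])

print(per_age)
```

Points backed by a single observation, like age 70 here, deserve less weight when reading the trend.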

In [None]:
#Line Plot Creation of ST Depression VS Age using MatplotLib and Seaborn

sns.lineplot(x='age', y='oldpeak', data=data, color='g')
plt.title('ST Depression VS Age', fontsize=17)
plt.xlabel('Age', fontsize=15)
plt.ylabel('ST depression', fontsize=15)
plt.show()

We can observe that ST depression mostly increases between the ages of 30 and 40.

ST depression refers to a finding on an electrocardiogram, wherein the trace in the ST segment is abnormally low below the baseline.

In [None]:
#Bar Plot Creation of ST depression VS Gender using MatplotLib and Seaborn

sns.barplot(x='sex1', y='oldpeak', data=data,hue='sex1', palette='twilight_r',dodge=False)
plt.title('ST depression VS Gender', fontsize=17)
plt.xlabel('Sex', fontsize=15)
plt.ylabel('ST depression', fontsize=15)
plt.show()

Males are more prone to ST depression than females.

In [None]:
#Bar Plot Creation of Exercise With Angina VS Heart Disease using MatplotLib and Seaborn

sns.barplot(x='Heart_Disease', y='exang', data=data,hue='Heart_Disease', palette='viridis',dodge=False)
plt.title('Exercise With Angina VS Heart Disease', fontsize=17)
plt.xlabel('Heart Disease', fontsize=15)
plt.ylabel('Exercise With Angina', fontsize=15)
plt.show()

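Because exang is coded 0/1, each bar's height is the share of people in that group who experience exercise-induced angina. If you suffer from angina, you may be concerned that exercise will make your symptoms worse.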

In [None]:
#Bar Plot Creation of Exercise With Angina VS Gender using MatplotLib and Seaborn

sns.barplot(x='sex1', y='exang', data=data,hue='sex1', palette='binary_r',dodge=False)
plt.title('Exercise With Angina VS Gender', fontsize=17)
plt.xlabel('Sex', fontsize=15)
plt.ylabel('Exercise With Angina', fontsize=15)
plt.show()

Males have a higher rate of exercise-induced angina.

Angina is a type of chest pain caused by reduced blood flow to the heart.

In [None]:
#Bar Plot Creation of Fasting Blood Sugar VS Gender using MatplotLib and Seaborn

sns.barplot(y='fbs', x='sex1', data=data,hue='sex1', palette='hsv',dodge=False)
plt.title('Fasting Blood Sugar VS Gender', fontsize=17)
plt.xlabel('Sex', fontsize=15)
plt.ylabel('Fasting Blood Sugar', fontsize=15)
plt.show()

A higher proportion of males have fasting blood sugar over 120 mg/dl.

In [None]:
# Select only numeric columns for correlation calculation
numeric_data = data.select_dtypes(include=['float64', 'int64'])

# Create the heatmap using only numeric data
plt.figure(figsize=(16, 9))
# seaborn's heatmap parameter for the gap between cells is `linewidths`
sns.heatmap(numeric_data.corr(), annot=True, linewidths=3)
plt.title('Correlation Heatmap', fontsize=17)
plt.show()
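
A full heatmap can be hard to scan when the main question is which attributes move with the target. One option is to pull out the `num` column of the correlation matrix and sort it by absolute value. The sketch below does this on made-up numbers, so the values it prints say nothing about the real dataset.

```python
import pandas as pd

# Hypothetical rows; only the column names follow the dataset.
sample = pd.DataFrame({
    "age": [45, 50, 55, 60, 65, 70],
    "thalach": [180, 170, 165, 150, 140, 130],
    "num": [0, 0, 0, 1, 1, 1],
})

numeric = sample.select_dtypes(include="number")

# Correlation of every numeric column with the target, strongest first.
target_corr = (
    numeric.corr()["num"]
    .drop("num")
    .sort_values(key=abs, ascending=False)
)

print(target_corr)
```

Correlation only captures linear association, so a low value here does not rule out a relationship.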


In [None]:
# Export the final dataset as a CSV file for dashboarding in Power BI

# Raw string so the backslash in the Windows path is not read as an escape sequence
data.to_csv(r"D:\my_data1.csv", index=False)
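
Before handing the file to a dashboard tool, it can be worth reading it back to confirm that the columns and rows survived the export. A minimal round-trip sketch follows. It writes made-up rows to a temporary directory; the file name `my_data1.csv` is reused from the cell above and the real output location is up to the user.

```python
import tempfile
from pathlib import Path

import pandas as pd

# Hypothetical rows standing in for the final dataset.
sample = pd.DataFrame({"age": [63, 67], "Heart_Disease": ["Absence", "Presence"]})

with tempfile.TemporaryDirectory() as tmp:
    out_path = Path(tmp) / "my_data1.csv"
    # index=False keeps the row index out of the file.
    sample.to_csv(out_path, index=False)
    reloaded = pd.read_csv(out_path)

print(reloaded.shape)
print(list(reloaded.columns))
```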