
# 🩺 Heart Disease Dataset - Exploratory Data Analysis (EDA)

**Author:** *Your Name*  
**Date:** *June 2025*  
**Dataset:** [Heart Disease Dataset](https://www.kaggle.com/datasets/fedesoriano/heart-disease-dataset)

---



## 🔍 Introduction

This notebook walks through an exploratory data analysis (EDA) of a heart disease dataset.  
The goals are to examine feature distributions, check for missing values and outliers, and explore relationships between the features and the target variable.


## 📥 1. Load and Inspect the Dataset

In [None]:

import pandas as pd
import matplotlib.pyplot as plt
import seaborn as sns

# Set style
sns.set(style="whitegrid")
plt.rcParams["figure.figsize"] = (10, 6)

# Load dataset
df = pd.read_csv("heart.csv")
df.head()


## 📊 2. Dataset Summary

In [None]:
df.info()

In [None]:
df.describe()

## 🧩 3. Missing Value Analysis

In [None]:

df.isnull().sum()


In [None]:

sns.heatmap(df.isnull(), cbar=False, cmap='viridis')
plt.title("Missing Value Heatmap")
plt.show()
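
The raw counts from `df.isnull().sum()` are easier to judge next to percentages. The helper below is a minimal sketch of that idea; `missing_summary` and the toy frame are hypothetical names and made-up values for illustration, not part of the dataset.

```python
import pandas as pd


def missing_summary(frame: pd.DataFrame) -> pd.DataFrame:
    """Return per-column missing-value counts and percentages, worst first."""
    counts = frame.isnull().sum()
    percent = (frame.isnull().mean() * 100).round(2)
    summary = pd.DataFrame({"missing": counts, "percent": percent})
    return summary.sort_values("missing", ascending=False)


# Toy data for illustration only (values are invented, not read from heart.csv)
toy = pd.DataFrame({"age": [63, None, 41, 56], "chol": [233, 250, 204, 236]})
print(missing_summary(toy))
```

In the notebook, the same call would be `missing_summary(df)`.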


## 📈 4. Univariate Analysis

In [None]:

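# Plot a histogram for every numeric column
df.hist(figsize=(14, 12), bins=20, edgecolor='black')
plt.suptitle("Distributions of Numerical Features", fontsize=16)
# Reserve space at the top so the suptitle does not overlap the subplots
plt.tight_layout(rect=[0, 0, 1, 0.96])
plt.show()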


In [None]:

sns.countplot(x='sex', data=df)
plt.title("Distribution by Sex")
plt.show()


## ⚠️ 5. Outlier Detection

In [None]:

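# All columns share one axis, so large-scale features such as chol dominate;
# use this as a rough overview and inspect individual features separately
sns.boxplot(data=df)
plt.xticks(rotation=90)
plt.title("Boxplot for Outlier Detection")
plt.show()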


In [None]:

sns.boxplot(y='chol', data=df)
plt.title("Outliers in Cholesterol")
plt.show()
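
Boxplots show outliers visually but do not count them. As a rough sketch, the 1.5 × IQR rule (the default whisker rule in seaborn's boxplot) can be applied numerically. The function name `iqr_outlier_counts`, the multiplier `k`, and the toy values are assumptions made for illustration.

```python
import pandas as pd


def iqr_outlier_counts(frame: pd.DataFrame, columns, k: float = 1.5) -> pd.Series:
    """Count values outside [Q1 - k*IQR, Q3 + k*IQR] for each given column."""
    counts = {}
    for col in columns:
        q1 = frame[col].quantile(0.25)
        q3 = frame[col].quantile(0.75)
        iqr = q3 - q1
        lower, upper = q1 - k * iqr, q3 + k * iqr
        is_outlier = (frame[col] < lower) | (frame[col] > upper)
        counts[col] = int(is_outlier.sum())
    return pd.Series(counts, name="outliers")


# Toy data for illustration only (values are invented, not read from heart.csv)
toy = pd.DataFrame({"chol": [200, 210, 220, 230, 240, 600]})
print(iqr_outlier_counts(toy, ["chol"]))
```

In the notebook, a call such as `iqr_outlier_counts(df, ["chol", "thalach"])` would give a per-feature count. Whether flagged values are errors or genuine extreme measurements still needs a manual look.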


## 🧠 6. Correlation Analysis

In [None]:

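# numeric_only=True (pandas >= 1.5) keeps corr() from failing on any non-numeric columns
sns.heatmap(df.corr(numeric_only=True), annot=True, cmap='coolwarm', fmt=".2f")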
plt.title("Correlation Heatmap")
plt.show()
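
A full heatmap can be hard to read when the main question is which features relate most to the target. One possible sketch is to pull out the target column of the correlation matrix and rank it by absolute value. `target_correlations` and the toy frame are hypothetical, and the column name `target` is taken from the plots in this notebook.

```python
import pandas as pd


def target_correlations(frame: pd.DataFrame, target: str = "target") -> pd.Series:
    """Pearson correlation of each numeric column with the target, strongest first."""
    numeric = frame.select_dtypes(include="number")
    corr = numeric.corr()[target].drop(target)
    # Order by absolute value so strong negative correlations rank high too
    order = corr.abs().sort_values(ascending=False).index
    return corr.reindex(order)


# Toy data for illustration only (values are invented, not read from heart.csv)
toy = pd.DataFrame({
    "exang": [1, 1, 0, 0, 1, 0],
    "age": [60, 45, 50, 62, 58, 41],
    "target": [0, 0, 1, 1, 0, 1],
})
print(target_correlations(toy))
```

Pearson correlation treats integer-coded categories such as `cp` as if they were ordered numbers, so these values are best read as a rough screen rather than a definitive ranking.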


## 🔗 7. Relationships with Target Variable

In [None]:

sns.countplot(x='target', data=df)
plt.title("Target Variable Distribution")
plt.show()


In [None]:

sns.countplot(x='cp', hue='target', data=df)
plt.title("Chest Pain Type vs. Heart Disease")
plt.show()
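
The grouped count plot can be backed by numbers: the share of positive cases within each chest pain category. The sketch below assumes `target` is coded as 0/1, so that its mean within a group is a proportion; `rate_by_category` and the toy values are hypothetical.

```python
import pandas as pd


def rate_by_category(frame: pd.DataFrame, feature: str, target: str = "target") -> pd.DataFrame:
    """Share of rows with target == 1 in each category, plus the group size."""
    grouped = frame.groupby(feature)[target].agg(rate="mean", n="count")
    return grouped.sort_values("rate", ascending=False)


# Toy data for illustration only (values are invented, not read from heart.csv)
toy = pd.DataFrame({
    "cp": [0, 0, 0, 0, 1, 1, 2, 2],
    "target": [0, 0, 0, 1, 1, 1, 1, 0],
})
print(rate_by_category(toy, "cp"))
```

In the notebook, `rate_by_category(df, "cp")` or `rate_by_category(df, "exang")` would give the corresponding tables. The group size `n` matters here, because a rate computed from a handful of rows is not very reliable.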


In [None]:

sns.scatterplot(x='age', y='chol', hue='target', data=df)
plt.title("Age vs. Cholesterol by Target")
plt.show()


## 📝 8. Summary of Findings


- ✅ No missing values were found in the dataset.
- ⚠️ Some outliers exist in features such as cholesterol (`chol`) and maximum heart rate (`thalach`).
- 🧬 Features such as `cp` (chest pain type) and `exang` (exercise-induced angina) appear strongly associated with heart disease.
- 🔎 The correlation heatmap shows several inter-feature relationships worth examining further during modeling.

---
