# Heart Disease Exploratory Data Analysis (EDA)

This notebook covers the initial analysis of the UCI Heart Disease dataset.

## Step 0: Load Libraries and Data
First, we import the necessary libraries and load the dataset into a pandas DataFrame.

In [None]:
import pandas as pd
import seaborn as sns
import matplotlib.pyplot as plt

# Load the dataset (assumes 'heart.csv' is in the working directory;
# change the path, e.g. to 'data/heart.csv', if it lives elsewhere)
df = pd.read_csv('heart.csv')

# Display the first 5 rows to confirm it's loaded correctly
df.head()
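
If the file may live either next to the notebook or in a `data/` folder, one option is to search both locations before loading. The helper below is a minimal sketch: `locate_csv` and its `search_dirs` parameter are hypothetical names, not part of pandas, and the demo uses a temporary directory with a made-up two-line file so it runs without the real dataset.

```python
import tempfile
from pathlib import Path

import pandas as pd


def locate_csv(filename="heart.csv", search_dirs=(".", "data")):
    """Return the first existing path to `filename` among `search_dirs`."""
    for folder in search_dirs:
        candidate = Path(folder) / filename
        if candidate.is_file():
            return candidate
    raise FileNotFoundError(
        f"{filename} not found in: {[str(d) for d in search_dirs]}"
    )


# Demo with a throwaway directory and made-up content (not real patient data)
with tempfile.TemporaryDirectory() as tmp:
    data_dir = Path(tmp) / "data"
    data_dir.mkdir()
    (data_dir / "heart.csv").write_text("age,target\n63,1\n")

    found = locate_csv("heart.csv", search_dirs=(tmp, data_dir))
    demo_df = pd.read_csv(found)
    found_name = found.name
    found_parent = found.parent.name

print(found_parent, found_name, demo_df.shape)
```

In the notebook itself, the load would then read `df = pd.read_csv(locate_csv())`.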

## Step 1: High-Level Overview

This step shows the basic shape and technical summary of the dataset: the number of rows (patients), the number of columns (features), and the data type of each column.

In [None]:
# See the number of rows and columns
print("Data Shape:")
print(df.shape)
print("-" * 30)

# Get a technical summary (column names, non-null counts, data types)
print("Data Info:")
df.info()
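
Data types alone do not reveal whether an integer column is a continuous measurement or a coded category. A compact way to check is to put each column's dtype next to its number of distinct values. The sketch below uses a small made-up frame; the column names `age`, `sex`, and `oldpeak` and all values are assumptions for illustration, not taken from the dataset.

```python
import pandas as pd

# Made-up toy data standing in for the real DataFrame
toy = pd.DataFrame({
    "age": [63, 37, 41, 56],
    "sex": [1, 1, 0, 1],
    "oldpeak": [2.3, 3.5, 1.4, 0.8],
})

# One row per column: its dtype and how many distinct values it holds
overview = pd.DataFrame({
    "dtype": toy.dtypes.astype(str),
    "n_unique": toy.nunique(),
})
print(overview)
```

A column with an integer dtype and only a handful of unique values is usually a coded category rather than a measurement.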

## Step 2: Check for Missing Values

Checking for missing values is one of the most important data quality checks. Missing data can cause errors in downstream code or bias our analysis and models.

In [None]:
# Check for any missing values in each column
print("Missing Values:")
print(df.isnull().sum())
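
Raw counts are easier to judge alongside the share of rows they represent. The sketch below adds a percentage and keeps only the columns that actually have gaps; `missing_summary` is a hypothetical helper name, and the toy frame is made up for illustration.

```python
import numpy as np
import pandas as pd


def missing_summary(frame: pd.DataFrame) -> pd.DataFrame:
    """Return missing counts and percentages per column, worst first."""
    counts = frame.isnull().sum()
    summary = pd.DataFrame({
        "missing_count": counts,
        "missing_pct": (counts / len(frame) * 100).round(1),
    })
    # Keep only columns that have at least one missing value
    affected = summary[summary["missing_count"] > 0]
    return affected.sort_values("missing_count", ascending=False)


# Made-up toy data with a few gaps
toy = pd.DataFrame({
    "age": [63, 37, np.nan, 56],
    "chol": [233, np.nan, np.nan, 236],
    "target": [1, 1, 0, 0],
})
report = missing_summary(toy)
print(report)
```

On the real data this would be called as `missing_summary(df)`; an empty result means no column has missing values.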

## Step 3: Get a Statistical Summary

`df.describe()` gives a quick statistical overview of every numerical column: count, mean, standard deviation, minimum, quartiles, and maximum. It's useful for spotting outliers or potential data entry errors, such as a minimum or maximum that falls outside a plausible range.

In [None]:
# Get descriptive statistics (mean, std, min, max, etc.)
df.describe()
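
To move from eyeballing the summary to counting suspicious values, one common rule of thumb flags points more than 1.5 times the interquartile range (IQR) beyond the quartiles. This is a swapped-in heuristic, not something the summary table computes itself. `iqr_outlier_counts` is a hypothetical helper, and the toy values below are made up.

```python
import pandas as pd


def iqr_outlier_counts(frame: pd.DataFrame, k: float = 1.5) -> pd.Series:
    """Count values per numeric column lying outside [Q1 - k*IQR, Q3 + k*IQR]."""
    numeric = frame.select_dtypes(include="number")
    q1 = numeric.quantile(0.25)
    q3 = numeric.quantile(0.75)
    iqr = q3 - q1
    # Boolean frame: True where a value falls outside the fences
    outside = (numeric < q1 - k * iqr) | (numeric > q3 + k * iqr)
    return outside.sum()


# Made-up toy data: one implausibly high cholesterol value
toy = pd.DataFrame({
    "age": [50, 52, 54, 56, 58, 60],
    "chol": [200, 210, 220, 230, 240, 600],
})
outlier_counts = iqr_outlier_counts(toy)
print(outlier_counts)
```

A flagged value is only a candidate for review; whether it is a data entry error or a genuine extreme depends on the clinical context.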

## Step 4: Visualize the Outcome

We visualize the distribution of our target variable (`target`) to see how many patients in the dataset have heart disease versus those who don't. This helps us check for class imbalance.

In [None]:
# Create a plot to see the distribution of the target variable
sns.countplot(x='target', data=df)
plt.title('Distribution of Heart Disease Outcome')
plt.xlabel('0 = No Heart Disease, 1 = Has Heart Disease')
plt.ylabel('Patient Count')
plt.show()
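
The plot shows the balance visually; the exact counts and proportions can be read off with `value_counts`. The sketch below uses a made-up `target` column of five values in place of the real data.

```python
import pandas as pd

# Made-up toy labels standing in for df['target']
toy = pd.DataFrame({"target": [1, 1, 1, 0, 0]})

counts = toy["target"].value_counts()
shares = toy["target"].value_counts(normalize=True)

# Combine counts and proportions into one small table
balance = pd.DataFrame({"count": counts, "share": shares.round(3)})
print(balance)
```

Shares close to 0.5 for each class suggest a roughly balanced target, while a heavily skewed split would call for extra care when evaluating models.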