# Mastering Seaborn Visualizations for Data Science with Python
**Dataset:** Heart Disease Dataset<br>
Kaggle:<br>
https://www.kaggle.com/datasets/johnsmith88/heart-disease-dataset <br>
GitHub:<br>
https://github.com/Sargei95/seaborn-data-visualisation <br>
YouTube Playlist:<br>
https://www.youtube.com/playlist?list=PLeKQXSD4L73wXw7AojiZp5ynKdfS-nUDQ <br>

# What you will learn!

Welcome to our video series on mastering data visualization with Seaborn.

In the world of data science, the ability to visualize data clearly and effectively is just as important as analyzing it.

Whether you're exploring data, identifying patterns, or communicating results, strong visualizations help transform raw numbers into actionable insights. This is where Seaborn comes in.

Seaborn is a powerful Python library built on top of Matplotlib that simplifies the process of creating beautiful and informative statistical graphics. It comes with high-level functions for drawing attractive and meaningful plots with just a few lines of code.

In this session, we’ll focus on the most important Seaborn plots every data scientist should know.

These include:

- Distribution plots for understanding the shape of your data
- Categorical plots for comparing variables across groups
- Relational plots for exploring relationships between features
- Heatmaps for visualizing correlations and matrix data
- Pair plots for quick exploratory data analysis across multiple dimensions

By the end of this lesson, you’ll have a solid grasp of how to use Seaborn to create the right plots for the right questions, enabling clearer insights and stronger storytelling with data.

# Setup and requirements

1. Pandas: a Python library for data manipulation and analysis that provides data structures such as the DataFrame.
2. Matplotlib: a comprehensive Python library for creating static, animated, and interactive visualizations.
3. Seaborn: a Python data visualization library built on Matplotlib that provides a high-level interface for drawing attractive statistical graphics.
---
1. pip install pandas
2. pip install matplotlib
3. pip install seaborn
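
After running the install commands, a quick check like the one below can confirm that the three packages are importable. This is a minimal sketch using only the standard library; the variable names `required` and `missing` are illustrative.

```python
import importlib.util

# Packages this notebook depends on
required = ("pandas", "matplotlib", "seaborn")

# find_spec returns None when a package cannot be found in the environment
missing = [name for name in required if importlib.util.find_spec(name) is None]

if missing:
    print("Missing packages:", ", ".join(missing))
else:
    print("All required packages are installed.")
```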

# Libraries

In [1]:
import pandas as pd
import matplotlib.pyplot as plt
import seaborn as sns

# Dataset information
The dataset compiles heart disease patient records from four sources (Cleveland, Hungary, Switzerland, and Long Beach V) and dates back to 1988. The original data contains 76 attributes, but most work uses a subset of 14 key features, such as age, sex, chest pain type, blood pressure, cholesterol, ECG results, maximum heart rate, exercise-induced angina, ST depression, ST slope, number of major vessels, and thalassemia. The goal is to predict the presence of heart disease, and the target variable is binary (0 = no disease, 1 = disease). Note that the CSV file loaded in this notebook contains 12 columns (11 features plus the `HeartDisease` target), so not every attribute listed here appears in it.

This dataset is widely used to build and benchmark machine learning models (such as Logistic Regression, Random Forests, SVMs, and Neural Networks) for binary classification, that is, distinguishing between patients with and without heart disease.
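
Because the target is binary, a common first check before any modelling is how the two classes are balanced. The sketch below shows the idea on a short, made-up list of labels; on the real data the same calls would be applied to the `HeartDisease` column of the loaded DataFrame.

```python
import pandas as pd

# Made-up labels for illustration (0 = no disease, 1 = disease)
target = pd.Series([0, 1, 1, 1, 0], name="HeartDisease")

# Absolute number of rows per class
counts = target.value_counts()

# Share of each class, useful for spotting class imbalance
shares = target.value_counts(normalize=True)

print(counts)
print(shares)
```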

# Reading Data

In [2]:
# define the file path to the CSV file
file = "heart_disease.csv"

# read the data
df = pd.read_csv(file, sep=',')

# view a reproducible random sample of five rows
df.sample(5, random_state=500)

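|     | Age | Sex | ChestPainType | RestingBP | Cholesterol | FastingBS | RestingECG | MaxHR | ExerciseAngina | Oldpeak | ST_Slope | HeartDisease |
|-----|-----|-----|---------------|-----------|-------------|-----------|------------|-------|----------------|---------|----------|--------------|
| 496 | 58  | M   | ASY           | 132       | 458         | 1         | Normal     | 69    | N              | 1.0     | Down     | 0            |
| 580 | 51  | M   | ASY           | 131       | 152         | 1         | LVH        | 130   | Y              | 1.0     | Flat     | 1            |
| 897 | 55  | F   | ASY           | 128       | 205         | 0         | ST         | 130   | Y              | 2.0     | Flat     | 1            |
| 341 | 64  | M   | ASY           | 110       | 0           | 1         | Normal     | 114   | Y              | 1.3     | Down     | 1            |
| 721 | 51  | M   | NAP           | 100       | 222         | 0         | Normal     | 143   | Y              | 1.2     | Flat     | 0            |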
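
The loading and sampling step can also be tried without the CSV file at hand. The sketch below writes the five rows displayed above as in-memory CSV text (the index column is left out) and shows that a fixed `random_state` makes `sample` return the same rows on every call. The names `csv_text`, `demo`, `first`, and `second` are illustrative.

```python
import io

import pandas as pd

# The five sample rows from the output above, as CSV text (index omitted)
csv_text = """Age,Sex,ChestPainType,RestingBP,Cholesterol,FastingBS,RestingECG,MaxHR,ExerciseAngina,Oldpeak,ST_Slope,HeartDisease
58,M,ASY,132,458,1,Normal,69,N,1.0,Down,0
51,M,ASY,131,152,1,LVH,130,Y,1.0,Flat,1
55,F,ASY,128,205,0,ST,130,Y,2.0,Flat,1
64,M,ASY,110,0,1,Normal,114,Y,1.3,Down,1
51,M,NAP,100,222,0,Normal,143,Y,1.2,Flat,0
"""

# read_csv accepts any file-like object, not only a path on disk
demo = pd.read_csv(io.StringIO(csv_text))

# The same seed yields the same sample, which keeps notebook output stable
first = demo.sample(3, random_state=500)
second = demo.sample(3, random_state=500)
print(first.equals(second))  # True
```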


In [3]:
# data information
df.info()

<class 'pandas.core.frame.DataFrame'>
RangeIndex: 918 entries, 0 to 917
Data columns (total 12 columns):
 #   Column          Non-Null Count  Dtype  
---  ------          --------------  -----  
 0   Age             918 non-null    int64  
 1   Sex             918 non-null    object 
 2   ChestPainType   918 non-null    object 
 3   RestingBP       918 non-null    int64  
 4   Cholesterol     918 non-null    int64  
 5   FastingBS       918 non-null    int64  
 6   RestingECG      918 non-null    object 
 7   MaxHR           918 non-null    int64  
 8   ExerciseAngina  918 non-null    object 
 9   Oldpeak         918 non-null    float64
 10  ST_Slope        918 non-null    object 
 11  HeartDisease    918 non-null    int64  
dtypes: float64(1), int64(6), object(5)
memory usage: 86.2+ KB


In [4]:
# convert object (string) columns to the category dtype
for col in df.select_dtypes(include='object').columns:
    df[col] = df[col].astype('category')
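
To see why this conversion is worthwhile, the sketch below repeats it on a small made-up DataFrame and compares memory use before and after. It uses a dictionary passed to `astype` as an alternative to the loop; both approaches should give the same result. The `toy` data and variable names are assumptions for illustration.

```python
import pandas as pd

# Made-up data: two string columns with few distinct values, one numeric column
toy = pd.DataFrame({
    "Sex": pd.Series(["M", "F", "M", "M"] * 250, dtype=object),
    "ST_Slope": pd.Series(["Up", "Flat", "Down", "Flat"] * 250, dtype=object),
    "Age": [40, 49, 37, 48] * 250,
})

# deep=True counts the Python string objects, not only the pointers to them
before = toy.memory_usage(deep=True).sum()

# Convert every object column in one call
object_cols = toy.select_dtypes(include="object").columns
converted = toy.astype({col: "category" for col in object_cols})

after = converted.memory_usage(deep=True).sum()

print(f"object dtype:   {before} bytes")
print(f"category dtype: {after} bytes")
print(converted["ST_Slope"].cat.categories)
```

A category column stores each distinct label once and keeps compact integer codes per row, so the saving grows with the number of rows and shrinks with the number of distinct values.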

In [5]:
# view the data info again to confirm the new dtypes
df.info()

<class 'pandas.core.frame.DataFrame'>
RangeIndex: 918 entries, 0 to 917
Data columns (total 12 columns):
 #   Column          Non-Null Count  Dtype   
---  ------          --------------  -----   
 0   Age             918 non-null    int64   
 1   Sex             918 non-null    category
 2   ChestPainType   918 non-null    category
 3   RestingBP       918 non-null    int64   
 4   Cholesterol     918 non-null    int64   
 5   FastingBS       918 non-null    int64   
 6   RestingECG      918 non-null    category
 7   MaxHR           918 non-null    int64   
 8   ExerciseAngina  918 non-null    category
 9   Oldpeak         918 non-null    float64 
 10  ST_Slope        918 non-null    category
 11  HeartDisease    918 non-null    int64   
dtypes: category(5), float64(1), int64(6)
memory usage: 55.5 KB


In [6]:
# checking for missing values
df.isna().sum()

Age               0
Sex               0
ChestPainType     0
RestingBP         0
Cholesterol       0
FastingBS         0
RestingECG        0
MaxHR             0
ExerciseAngina    0
Oldpeak           0
ST_Slope          0
HeartDisease      0
dtype: int64
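
One caveat: `isna()` only detects values stored as `NaN`. The sample output earlier in this notebook includes a row with a `Cholesterol` value of 0, which may be a placeholder for an unrecorded measurement. That interpretation is an assumption and should be checked against the dataset's documentation. The sketch below uses two numeric columns from those five sample rows to show how such zeros can be counted alongside the regular missing-value check.

```python
import pandas as pd

# RestingBP and Cholesterol values from the five sample rows shown earlier
demo = pd.DataFrame({
    "RestingBP": [132, 131, 128, 110, 100],
    "Cholesterol": [458, 152, 205, 0, 222],
})

# Regular check: counts NaN values per column
missing = demo.isna().sum()

# Extra check: counts zeros, which isna() does not flag
zeros = (demo == 0).sum()

print(missing)
print(zeros)
```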