# Seaborn Notebook


This notebook covers the fundamentals of Seaborn, a Python statistical visualization library built on top of Matplotlib.

## 1. Setup and Imports

In [None]:
import seaborn as sns
import matplotlib.pyplot as plt
import pandas as pd
import numpy as np

# Set style for better-looking plots
sns.set_theme()

# For displaying plots in notebook
%matplotlib inline

print("Seaborn version:", sns.__version__)

## 2. Loading Sample Datasets

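Seaborn provides example datasets that are handy for practice (`load_dataset()` downloads them from seaborn's online data repository on first use, so an internet connection is required). We'll use the **tips** dataset, which contains information about restaurant bills, tips, and customer details.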

In [None]:
# Load the tips dataset
tips = sns.load_dataset('tips')

# Display first few rows to understand the data
print("Tips Dataset:")
print(tips.head(10))
print("\nShape:", tips.shape)
print("\nColumns:", tips.columns.tolist())

## 3. Understanding the Two Main Plot Functions

Seaborn has two figure-level functions that cover most everyday plotting needs:

### `relplot()` - Relationship plots
Used to visualize relationships between numerical variables
- **scatter**: Shows individual data points (default)
- **line**: Shows trends over time or continuous variables

### `catplot()` - Categorical plots
Used to compare categories or groups
- **strip**: Individual points for each category
- **box**: Shows distribution with quartiles
- **violin**: Shows distribution shape
- **bar**: Shows average values with confidence intervals

Let's explore each!

## 4. Relationship Plots with `relplot()`

### 4.1 Scatter Plots
**Goal**: See if there's a relationship between total bill and tip amount

In [None]:
# Basic scatter plot: Do higher bills lead to higher tips?
sns.relplot(data=tips, x='total_bill', y='tip', kind='scatter')
plt.title('Relationship between Total Bill and Tip')
plt.show()

In [None]:
# Add color to show male vs female tippers
sns.relplot(data=tips, x='total_bill', y='tip', hue='sex')
plt.title('Total Bill vs Tip by Gender')
plt.show()

In [None]:
# Add more dimensions: color for time, different panels for smoker/non-smoker
sns.relplot(data=tips, x='total_bill', y='tip', hue='time', col='smoker')
plt.show()
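
A multi-panel figure has one set of axes per panel, so a single `plt.title()` call would only label one of them. The sketch below shows one way to title the panels and the whole figure through the grid object that `relplot()` returns. The `demo` DataFrame is a small hypothetical stand-in that reuses the tips column names, and the label texts are illustrative.

```python
import pandas as pd
import seaborn as sns
import matplotlib.pyplot as plt

# Hypothetical stand-in for the tips data (same column names, made-up rows)
demo = pd.DataFrame({
    'total_bill': [10.0, 20.0, 30.0, 15.0, 25.0, 35.0],
    'tip': [1.5, 3.0, 4.5, 2.0, 3.5, 5.0],
    'time': ['Lunch', 'Dinner', 'Dinner', 'Lunch', 'Dinner', 'Lunch'],
    'smoker': ['Yes', 'Yes', 'Yes', 'No', 'No', 'No'],
})

# relplot returns a FacetGrid that manages all panels
g = sns.relplot(data=demo, x='total_bill', y='tip', hue='time', col='smoker')

# Per-panel titles: {col_name} is filled in with each value of `smoker`
g.set_titles(col_template='Smoker: {col_name}')
g.set_axis_labels('Total bill', 'Tip')

# One title for the whole figure, nudged up so it clears the panel titles
g.figure.suptitle('Total Bill vs Tip by Smoking Status', y=1.03)
plt.show()
```

The same calls should work on the grid returned by `catplot()`, since both functions return a `FacetGrid`.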

### 4.2 Line Plots
**Goal**: See trends over time

Let's load a time-series dataset about flight passengers

In [None]:
# Load flights dataset
flights = sns.load_dataset('flights')
print(flights.head())

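# Plot: How did passenger numbers change over years?
# Each year has 12 monthly values, so the line shows the yearly mean
# and the shaded band shows a 95% confidence interval around it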
sns.relplot(data=flights, x='year', y='passengers', kind='line')
plt.title('Airline Passengers Over Time')
plt.show()
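
When several rows share the same `x` value, `relplot(kind='line')` plots their mean by default. A rough pandas equivalent of that aggregation is sketched below, using a handful of illustrative rows (an assumption for the example, not the full flights dataset).

```python
import pandas as pd

# Illustrative rows only: three months for each of two years
demo = pd.DataFrame({
    'year': [1949, 1949, 1949, 1950, 1950, 1950],
    'month': ['Jan', 'Feb', 'Mar', 'Jan', 'Feb', 'Mar'],
    'passengers': [112, 118, 132, 115, 126, 141],
})

# Mean passengers per year: the quantity the plotted line traces
yearly_mean = demo.groupby('year')['passengers'].mean()
print(yearly_mean)
```

Running the same `groupby` on the full `flights` DataFrame should give the values that the line passes through.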

## 5. Categorical Plots with `catplot()`

### 5.1 Strip Plots
**Goal**: See all individual data points for each category

In [None]:
# Compare bills across different days of the week
sns.catplot(data=tips, x='day', y='total_bill', kind='strip')
plt.title('Total Bills by Day of Week')
plt.show()

### 5.2 Box Plots
**Goal**: See the distribution and compare medians across categories

Box plots show:
- The middle line = median (50th percentile)
- Box edges = 25th and 75th percentiles
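- Whiskers = extend to the most extreme points within 1.5 times the interquartile range (IQR) of the box edges (the default)
- Dots = outliers, i.e. points beyond the whiskers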
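
To make these elements concrete, here is a minimal sketch that computes them by hand with NumPy, assuming the default whisker rule of 1.5 times the IQR. The `bills` values are hypothetical and chosen so that one of them is clearly extreme.

```python
import numpy as np

# Hypothetical bill amounts with one unusually large value
bills = np.array([10.0, 12.0, 14.0, 15.0, 16.0, 18.0, 20.0, 50.0])

# Box: 25th percentile, median, 75th percentile
q1, median, q3 = np.percentile(bills, [25, 50, 75])
iqr = q3 - q1

# Fences sit 1.5 * IQR beyond the box edges
lower_fence = q1 - 1.5 * iqr
upper_fence = q3 + 1.5 * iqr

# Whiskers end at the most extreme data points inside the fences
inside = bills[(bills >= lower_fence) & (bills <= upper_fence)]
whisker_low, whisker_high = inside.min(), inside.max()

# Anything outside the fences is drawn as an individual dot
outliers = bills[(bills < lower_fence) | (bills > upper_fence)]

print("box:", q1, median, q3)
print("whiskers:", whisker_low, whisker_high)
print("outliers:", outliers)
```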

In [None]:
# Compare bill distributions by day
sns.catplot(data=tips, x='day', y='total_bill', kind='box')
plt.title('Distribution of Bills by Day')
plt.show()

In [None]:
# Split by gender to compare male vs female
sns.catplot(data=tips, x='day', y='total_bill', hue='sex', kind='box')
plt.title('Bill Distribution by Day and Gender')
plt.show()

### 5.3 Violin Plots
**Goal**: See the shape of the distribution (like a box plot but shows more detail)

Wider sections = more data points near that value (the outline is a smoothed density estimate)

In [None]:
# See the full distribution shape for each day
sns.catplot(data=tips, x='day', y='total_bill', kind='violin')
plt.title('Distribution Shape of Bills by Day')
plt.show()

### 5.4 Bar Plots
**Goal**: Compare average values across categories

Bar height = mean of the group; the dark vertical line = 95% confidence interval for that mean (estimated by bootstrapping by default)
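
The idea behind the error bar can be sketched with NumPy: resample the data with replacement many times, take the mean of each resample, and read off the middle 95% of those means. The `bills` values, the seed, and the 1000 resamples are assumptions made for this illustration, so the numbers will not match seaborn's output exactly.

```python
import numpy as np

rng = np.random.default_rng(0)  # fixed seed so the sketch is repeatable

# Hypothetical bills for a single day
bills = np.array([12.0, 15.5, 18.0, 21.0, 24.5, 30.0, 16.0, 19.5])

# Bar height: the plain mean
mean_bill = bills.mean()

# Bootstrap: means of 1000 resamples drawn with replacement
boot_means = np.array([
    rng.choice(bills, size=bills.size, replace=True).mean()
    for _ in range(1000)
])

# Error bar: the 2.5th and 97.5th percentiles of the resampled means
ci_low, ci_high = np.percentile(boot_means, [2.5, 97.5])

print(f"mean = {mean_bill:.2f}, 95% CI = ({ci_low:.2f}, {ci_high:.2f})")
```

A narrow interval suggests the mean is estimated precisely; groups with few or widely scattered values tend to produce wider intervals.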

In [None]:
# What's the average bill on each day?
sns.catplot(data=tips, x='day', y='total_bill', kind='bar')
plt.title('Average Total Bill by Day')
plt.show()

In [None]:
# Compare lunch vs dinner
sns.catplot(data=tips, x='day', y='total_bill', hue='time', kind='bar')
plt.title('Average Bill by Day and Time')
plt.show()

## 6. Distribution Plots

### 6.1 Histogram
**Goal**: See how frequently different values occur

In [None]:
# How are total bills distributed?
sns.histplot(data=tips, x='total_bill')
plt.title('Distribution of Total Bills')
plt.show()

In [None]:
# Add a smooth curve (kernel density estimate, KDE) to see the shape better
sns.histplot(data=tips, x='total_bill', kde=True)
plt.title('Total Bill Distribution with Smooth Curve')
plt.show()

In [None]:
# Compare male vs female distributions
sns.histplot(data=tips, x='total_bill', hue='sex', kde=True)
plt.title('Bill Distribution by Gender')
plt.show()

### 6.2 Count Plot
**Goal**: Count how many observations in each category
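
A count plot draws one bar per category whose height is the number of rows in that category. The same numbers can be obtained with pandas, as in this small sketch (the `days` values are hypothetical).

```python
import pandas as pd

# Hypothetical day labels, one per recorded bill
days = pd.Series(['Thur', 'Fri', 'Sat', 'Sat', 'Sun', 'Sun', 'Sun'], name='day')

# Number of rows per category: the bar heights of a count plot
counts = days.value_counts()
print(counts)
```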

In [None]:
# How many bills were recorded on each day? (each row is one table's bill)
sns.countplot(data=tips, x='day')
plt.title('Number of Bills by Day')
plt.show()

In [None]:
# Break down by smoker status
sns.countplot(data=tips, x='day', hue='smoker')
plt.title('Bill Count by Day and Smoking Status')
plt.show()

## 7. Correlation Analysis

### Heatmap
**Goal**: See which variables are related to each other

Values close to 1 = strong positive relationship  
Values close to -1 = strong negative relationship  
Values close to 0 = little or no linear relationship (a curved relationship can still exist)
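
The correlation coefficient used here only measures straight-line relationships. The following sketch, built on made-up columns, shows a variable that depends entirely on `x` yet has a correlation of about zero with it.

```python
import numpy as np
import pandas as pd

# Hypothetical data: x is symmetric around zero
x = np.arange(-5, 6)
demo = pd.DataFrame({
    'x': x,
    'linear': 2 * x + 1,   # perfect straight-line relationship with x
    'squared': x ** 2,     # fully determined by x, but not linear
})

corr = demo.corr()
print(corr.round(2))
```

In the printed matrix, `x` and `linear` correlate at 1, while `x` and `squared` come out at about 0 even though `squared` is computed directly from `x`. A scatter plot is a useful check before concluding that two variables are unrelated.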

In [None]:
# Calculate correlations between numerical columns
correlation = tips.corr(numeric_only=True)

# Visualize as a heatmap
plt.figure(figsize=(8, 6))
sns.heatmap(correlation, annot=True, cmap='coolwarm', center=0)
plt.title('Correlation Between Variables')
plt.show()

## 8. Customization Basics

### 8.1 Color Palettes

In [None]:
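# Try different color schemes
# seaborn 0.13+ expects `hue` whenever `palette` is set; legend=False hides the redundant legend
sns.catplot(data=tips, x='day', y='total_bill', hue='day', kind='bar', palette='pastel', legend=False)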
plt.title('Using Pastel Colors')
plt.show()

In [None]:
# Bright colors (hue='day' again, so that the palette is applied per day)
sns.catplot(data=tips, x='day', y='total_bill', hue='day', kind='bar', palette='bright', legend=False)
plt.title('Using Bright Colors')
plt.show()

### 8.2 Figure Size

In [None]:
# Make plots larger or smaller: height is each panel's height in inches,
# aspect is the panel's width-to-height ratio
sns.catplot(data=tips, x='day', y='total_bill', kind='box', height=6, aspect=1.5)
plt.title('Larger Plot')
plt.show()
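
For a single-panel plot without a legend, the figure width works out to `height * aspect`. The sketch below checks this on a tiny hypothetical DataFrame; the column names mirror the tips data, and the values of `height` and `aspect` are arbitrary.

```python
import pandas as pd
import seaborn as sns

# Hypothetical stand-in data with two bills per day
demo = pd.DataFrame({
    'day': ['Thur', 'Thur', 'Fri', 'Fri', 'Sat', 'Sat'],
    'total_bill': [12.0, 18.5, 15.0, 22.0, 20.0, 31.5],
})

# height=4 inches tall, aspect=2 means twice as wide as tall
g = sns.catplot(data=demo, x='day', y='total_bill', kind='box', height=4, aspect=2)

# Inspect the resulting figure size in inches
width, height = g.figure.get_size_inches()
print(width, height)
```

With `col` or `row` panels, `height` and `aspect` apply to each panel, so the overall figure grows with the number of panels.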

## 9. Practice Exercises

### Exercise 1: Explore the Penguins Dataset

In [None]:
# Load the penguins dataset
penguins = sns.load_dataset('penguins')
print(penguins.head())

# Your tasks:
# 1. Create a scatter plot: bill_length_mm vs bill_depth_mm, colored by species
# 2. Create a box plot: body_mass_g by species
# 3. Create a histogram of flipper_length_mm

# Write your code below:


### Exercise 2: Analyze the Titanic Dataset

In [None]:
# Load the titanic dataset
titanic = sns.load_dataset('titanic')
print(titanic.head())

# Your tasks:
# 1. Create a count plot: How many passengers in each class?
# 2. Create a bar plot: Average age by class
# 3. Create a box plot: Fare by class, split by sex

# Write your code below:


### Exercise 3: Back to Tips Dataset

In [None]:
# Your tasks:
# 1. Do smokers or non-smokers give bigger tips on average? (bar plot)
# 2. What's the relationship between party size (the `size` column) and total bill? (scatter plot)
# 3. How does tip distribution differ between lunch and dinner? (histogram with hue)

# Write your code below:


## 10. Summary

### Key Functions to Remember:

**For relationships between numbers:**
- `relplot(kind='scatter')` - scatter plots
- `relplot(kind='line')` - line plots

**For comparing categories:**
- `catplot(kind='strip')` - see all points
- `catplot(kind='box')` - see distribution summary
- `catplot(kind='violin')` - see distribution shape
- `catplot(kind='bar')` - compare averages

**For distributions:**
- `histplot()` - histogram
- `countplot()` - count categories

**For correlations:**
- `heatmap()` - correlation matrix

### Common Parameters:
- `data` - your DataFrame
- `x`, `y` - columns to plot
- `hue` - add color by category
- `col`, `row` - create separate panels
- `palette` - change colors

### Resources:
- Official Documentation: https://seaborn.pydata.org/
- Gallery with Examples: https://seaborn.pydata.org/examples/index.html