# Seaborn: Statistical Data Visualization

Welcome to our guide to Seaborn, a Python library for creating attractive and informative statistical visualizations. This notebook is designed for students who have a basic understanding of Matplotlib and want to explore higher-level plotting techniques.

**What is Seaborn?**

Seaborn is a data visualization library built on top of Matplotlib. It provides a high-level interface for creating a wide range of statistical plots, making it easier to explore and understand your data. With Seaborn, you can create complex visualizations with just a few lines of code.

**Why Use Seaborn?**

*   **Ease of Use:** Seaborn simplifies the process of creating common statistical plots, such as histograms (`histplot`), scatter plots (`scatterplot`), and bar charts (`barplot`).
*   **Aesthetics:** Seaborn comes with a variety of built-in themes and color palettes, making your plots visually appealing and easy to read.
*   **Integration with Pandas:** Seaborn works seamlessly with Pandas DataFrames, allowing you to create plots directly from your data.
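
The theme, palette, and DataFrame points above can be tried together in a few lines. This is a minimal sketch with a small made-up DataFrame; the name `df_demo` and the columns `group` and `value` are hypothetical:

```python
import matplotlib.pyplot as plt
import pandas as pd
import seaborn as sns

# Apply a built-in theme and color palette to all subsequent plots
sns.set_theme(style='whitegrid', palette='pastel')

# Hypothetical data, for illustration only
df_demo = pd.DataFrame({
    'group': ['A', 'A', 'B', 'B', 'C', 'C'],
    'value': [3, 5, 4, 6, 7, 9],
})

# Column names are passed as strings; Seaborn looks them up in the DataFrame
ax = sns.barplot(data=df_demo, x='group', y='value')
ax.set_title('Mean value per group')
plt.show()

# A named palette can also be inspected as a list of RGB tuples
print(sns.color_palette('pastel', n_colors=3))
```

Note that `sns.set_theme` changes global plotting defaults, so later plots in the same session pick up the theme too; `sns.reset_defaults()` restores the original settings.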

In this notebook, we'll explore some of the key features of Seaborn and learn how to create a variety of statistical plots.

## 1. Distribution Plots: Visualizing Data Distributions

**What are Distribution Plots?**

Distribution plots visualize the distribution of a single variable. They show where values cluster (central tendency), how widely they vary (spread), and whether the shape is symmetric, skewed, or has several peaks.

**When to Use Distribution Plots:**

*   Analyzing the distribution of exam scores
*   Visualizing the age distribution of a population
*   Examining the distribution of income levels

Let's create a histogram with a kernel density estimate (KDE) curve overlaid to visualize a simulated sample of 100 heights.

In [None]:
import seaborn as sns
import matplotlib.pyplot as plt
import numpy as np

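# Sample data: simulated heights (cm) of 100 individuals,
# drawn from a seeded generator so the plot is reproducible
rng = np.random.default_rng(42)
heights = rng.normal(loc=170, scale=10, size=100)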

# Create the distribution plot
plt.figure(figsize=(8, 5))
sns.histplot(heights, kde=True, bins=15, color='skyblue')

# Add titles and labels
plt.title('Distribution of Heights')
plt.xlabel('Height (cm)')
plt.ylabel('Count')

# Show the plot
plt.show()

## 2. Categorical Plots: Comparing Categories

**What are Categorical Plots?**

Categorical plots are used to visualize the relationship between a numerical variable and one or more categorical variables. They can help you compare different groups and identify patterns in your data.

**When to Use Categorical Plots:**

*   Comparing the average income of different education levels
*   Visualizing the distribution of exam scores for different subjects
*   Analyzing the relationship between gender and tipping behavior

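Let's create a box plot to compare the distribution of exam scores for different subjects. A box plot summarizes each group by its median, its quartiles (the box), and whiskers that extend toward the smallest and largest values.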

In [None]:
import pandas as pd

# Sample data: Exam scores for different subjects
data = {'Subject': ['Math', 'Math', 'Math', 'Science', 'Science', 'Science', 'English', 'English', 'English'],
        'Score': [85, 90, 92, 78, 88, 80, 75, 82, 88]}
df = pd.DataFrame(data)

# Create the box plot
plt.figure(figsize=(8, 5))
sns.boxplot(x='Subject', y='Score', data=df)

# Add titles and labels
plt.title('Distribution of Exam Scores by Subject')
plt.xlabel('Subject')
plt.ylabel('Score')

# Show the plot
plt.show()
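
With only a few observations per category, a box plot can hide how little data sits behind each box. One option, sketched here, is to overlay the raw points with `sns.stripplot` and to print a per-group summary alongside. The scores below are made up for illustration, and the names `scores_demo` and `summary` are hypothetical:

```python
import matplotlib.pyplot as plt
import pandas as pd
import seaborn as sns

# Hypothetical exam scores, five per subject, for illustration only
scores_demo = pd.DataFrame({
    'Subject': ['Math'] * 5 + ['Science'] * 5 + ['English'] * 5,
    'Score': [85, 90, 92, 70, 88,
              78, 88, 80, 95, 67,
              75, 82, 88, 91, 60],
})

fig, ax = plt.subplots(figsize=(8, 5))
# Box plot for the summary, strip plot on top for the individual scores
sns.boxplot(data=scores_demo, x='Subject', y='Score', color='lightgray', ax=ax)
sns.stripplot(data=scores_demo, x='Subject', y='Score', color='black', ax=ax)
ax.set_title('Exam Scores by Subject (box plot with raw points)')
plt.show()

# Number of observations and median score for each subject
summary = scores_demo.groupby('Subject')['Score'].agg(['count', 'median'])
print(summary)
```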

## 3. Relational Plots: Exploring Relationships

**What are Relational Plots?**

Relational plots visualize the relationship between two numerical variables. Additional variables can be encoded through marker color (`hue`), marker size (`size`), or marker style (`style`). They can help you identify correlations, trends, and patterns in your data.

**When to Use Relational Plots:**

*   Investigating the relationship between advertising spending and sales
*   Exploring the connection between a car's horsepower and its MPG
*   Analyzing the correlation between a person's age and their blood pressure

Let's create a scatter plot to explore the relationship between a car's horsepower and its MPG, coloring the points by region of origin and sizing them by vehicle weight.

In [None]:
# Load the 'mpg' example dataset (downloaded from Seaborn's online
# data repository on first use, so this needs an internet connection)
mpg = sns.load_dataset('mpg')

# Create the scatter plot
plt.figure(figsize=(8, 5))
sns.scatterplot(x='horsepower', y='mpg', data=mpg, hue='origin', size='weight', alpha=0.7)

# Add titles and labels
plt.title('Relationship Between Horsepower and MPG')
plt.xlabel('Horsepower')
plt.ylabel('Miles Per Gallon (MPG)')

# Show the plot
plt.show()

## 4. Regression Plots: Visualizing Linear Relationships

**What are Regression Plots?**

Regression plots draw a scatter plot of two numerical variables together with a fitted linear regression line. By default, `sns.regplot` also shades a 95% confidence interval around the line. They help you judge the direction of the relationship and how closely the points follow a straight line.

**When to Use Regression Plots:**

*   Visualizing the relationship between years of experience and salary
*   Analyzing the connection between a student's attendance and their grades
*   Exploring the correlation between temperature and ice cream sales

Let's create a regression plot to visualize the relationship between a person's age and their monthly income.

In [None]:
# Sample data: Age and monthly income
age = np.array([22, 25, 30, 35, 40, 45, 50, 55])
income = np.array([2500, 3000, 4000, 5500, 7000, 8000, 9500, 11000])

# Create the regression plot
plt.figure(figsize=(8, 5))
sns.regplot(x=age, y=income, color='purple')

# Add titles and labels
plt.title('Relationship Between Age and Monthly Income')
plt.xlabel('Age')
plt.ylabel('Monthly Income ($)')

# Show the plot
plt.show()
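
`sns.regplot` draws the fitted line but does not report its coefficients. The sketch below shows one way to obtain them, using `np.polyfit` for an ordinary least-squares line on the same age and income values. The fit is computed separately here rather than extracted from the plot, on the assumption that `regplot` is used with its default settings as above:

```python
import numpy as np

# Same sample data as in the regression plot above
age = np.array([22, 25, 30, 35, 40, 45, 50, 55])
income = np.array([2500, 3000, 4000, 5500, 7000, 8000, 9500, 11000])

# A degree-1 polynomial fit is a least-squares straight line
slope, intercept = np.polyfit(age, income, deg=1)

# Pearson correlation coefficient between the two variables
r = np.corrcoef(age, income)[0, 1]

print(f'income = {slope:.1f} * age + ({intercept:.1f})')
print(f'Pearson r = {r:.3f}')
```

A positive slope means income rises with age in this sample, and a value of `r` close to 1 indicates that the points lie close to a straight line.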

## Conclusion and Next Steps

Congratulations! You've learned how to create four kinds of statistical plots with Seaborn: distribution, categorical, relational, and regression plots. With its high-level interface and built-in themes, Seaborn is a powerful tool for exploring and communicating your data.

**Exercises:**

1.  **Create a distribution plot:** Load the 'tips' dataset from Seaborn and visualize the distribution of the 'total_bill' column.
2.  **Create a categorical plot:** Use the 'titanic' dataset to compare the distribution of 'age' across the passenger 'class' categories.
3.  **Create a relational plot:** Load the 'iris' dataset and create a scatter plot to visualize the relationship between 'sepal_length' and 'sepal_width'.
4.  **Create a regression plot:** Use the 'tips' dataset to visualize the relationship between 'total_bill' and 'tip'.

In the next notebook, we'll dive into more advanced statistical visualizations, including heatmaps, pair plots, and facet grids.