# Session 5: Data Visualization with Matplotlib and Seaborn

**Objective:** Create visualizations for exploratory analysis and publication purposes using Matplotlib and Seaborn.

## 1. Introduction to Matplotlib and Seaborn

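- **Matplotlib**: Python's foundational plotting library for static, animated, and interactive visualizations. It is comparatively low-level: you control each element of a figure (axes, lines, labels, ticks) explicitly.
- **Seaborn**: A high-level statistical visualization library built on top of Matplotlib. It works directly with pandas DataFrames, ships with polished default themes, and produces statistical plots (for example, histograms with KDE curves, or bar charts of group means with error bars) in a single function call.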

In [None]:
# Install the libraries (uncomment if needed; %pip installs into the running kernel's environment)
# %pip install matplotlib seaborn

In [None]:
# Importing libraries
import matplotlib.pyplot as plt
import seaborn as sns
import numpy as np
import pandas as pd
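Some Seaborn arguments changed between releases (for example, passing `palette` to a categorical plot without `hue` is deprecated as of Seaborn 0.13), so it can help to record which versions the notebook runs with. A small sketch:

```python
import matplotlib
import seaborn as sns

# Record library versions, since some plotting arguments differ between releases
print("Matplotlib:", matplotlib.__version__)
print("Seaborn:", sns.__version__)
```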

## 2. Creating Basic Plots

In [None]:
# Scatter Plot
x = np.random.rand(50)
y = np.random.rand(50)
plt.scatter(x, y, color='blue', alpha=0.5)
plt.title("Scatter Plot Example")
plt.xlabel("X-axis")
plt.ylabel("Y-axis")
plt.show()
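The scatter plot above draws different points on every run because `np.random.rand` is unseeded. The sketch below is a variation, not part of the original exercise: it seeds a NumPy generator for reproducibility and maps a third variable to color with `c=` and a colorbar. The variable `z` is a made-up quantity used only to demonstrate the color mapping.

```python
import matplotlib.pyplot as plt
import numpy as np

# A seeded generator makes the "random" points identical on every run
rng = np.random.default_rng(42)
x = rng.random(50)
y = rng.random(50)
z = x + y  # illustrative third variable, used only to color the points

points = plt.scatter(x, y, c=z, cmap='viridis', alpha=0.7)
plt.colorbar(points, label="x + y")  # adds a color scale as a second Axes
plt.title("Scatter Plot Colored by a Third Variable")
plt.xlabel("X-axis")
plt.ylabel("Y-axis")
fig = plt.gcf()
plt.show()
```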

In [None]:
# Line Plot
x = np.linspace(0, 10, 100)
y = np.sin(x)
plt.plot(x, y, label='Sine Wave')
plt.title("Line Plot Example")
plt.xlabel("X-axis")
plt.ylabel("Y-axis")
plt.legend()
plt.show()
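The `plt.*` calls above act on an implicit "current" figure. Matplotlib also offers an object-oriented interface, where `plt.subplots()` returns the `Figure` and `Axes` explicitly; this tends to scale better once a figure has several panels. A sketch of the same line plot with an added cosine curve (the cosine is an illustrative addition, and the names `t` and `ax` are chosen here so the `x` and `y` used by later cells are left untouched):

```python
import matplotlib.pyplot as plt
import numpy as np

t = np.linspace(0, 10, 100)

# plt.subplots() returns the Figure and an Axes; every call below targets `ax`
fig, ax = plt.subplots(figsize=(6, 4))
ax.plot(t, np.sin(t), label="Sine Wave")
ax.plot(t, np.cos(t), label="Cosine Wave", linestyle="--")
ax.set_title("Line Plot (Object-Oriented Interface)")
ax.set_xlabel("X-axis")
ax.set_ylabel("Y-axis")
ax.legend()
plt.show()
```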

In [None]:
# Bar Plot
categories = ['A', 'B', 'C', 'D']
values = [5, 7, 3, 8]
plt.bar(categories, values, color='green')
plt.title("Bar Chart Example")
plt.xlabel("Categories")
plt.ylabel("Values")
plt.show()
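For bar charts it is often useful to print each value on its bar. A sketch using `Axes.bar_label`, which assumes Matplotlib 3.4 or newer (where that method was introduced); `counts` simply repeats the values from the cell above:

```python
import matplotlib.pyplot as plt

categories = ['A', 'B', 'C', 'D']
counts = [5, 7, 3, 8]

fig, ax = plt.subplots()
bars = ax.bar(categories, counts, color='green')
# Axes.bar_label writes each bar's height just above the bar
labels = ax.bar_label(bars)
ax.margins(y=0.1)  # leave headroom so the tallest label is not clipped
ax.set_title("Bar Chart with Value Labels")
ax.set_xlabel("Categories")
ax.set_ylabel("Values")
plt.show()
```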

In [None]:
# Histogram
data = np.random.randn(1000)
plt.hist(data, bins=20, color='purple', edgecolor='black')
plt.title("Histogram Example")
plt.xlabel("Value")
plt.ylabel("Frequency")
plt.show()
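A histogram splits the data range into equal-width bins and counts the values in each. `np.histogram` returns those counts and the bin edges without drawing anything, and `plt.hist` returns the same counts alongside the bars it draws. A sketch (the seeded `sample` array stands in for `data` so that variable is left unchanged for later cells):

```python
import matplotlib.pyplot as plt
import numpy as np

rng = np.random.default_rng(0)
sample = rng.standard_normal(1000)

# Per-bin counts and bin edges, computed without plotting
counts, edges = np.histogram(sample, bins=20)

# plt.hist returns the counts, the edges, and the bar artists it drew
n, bins, patches = plt.hist(sample, bins=20, color='purple', edgecolor='black')
plt.title("Histogram of 1,000 Standard Normal Draws")
plt.show()

print(len(counts), len(edges))  # → 20 21
print(counts.sum())             # → 1000 (every value lands in exactly one bin)
```

There is always one more edge than there are bins, since each bin is bounded on both sides.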

## 3. Customizing Plots

In [None]:
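# Titles, Labels, Legends, Grid
# Reuses x and y (the sine wave) defined in the line plot cell above
plt.plot(x, y, label='Sine Wave', color='red')
plt.title("Customized Line Plot")
plt.xlabel("X-axis Label")
plt.ylabel("Y-axis Label")
plt.legend()
plt.grid(True)  # explicit True: plt.grid() with no arguments toggles the grid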
plt.show()
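For publication, a figure usually needs to be saved to a file rather than only shown in the notebook. A sketch using `Figure.savefig`; the file names are illustrative. PDF is a vector format that scales cleanly in print, `dpi=300` produces a sharp raster PNG, and `bbox_inches="tight"` trims surplus whitespace.

```python
import os

import matplotlib.pyplot as plt
import numpy as np

t = np.linspace(0, 10, 100)

fig, ax = plt.subplots(figsize=(6, 4))
ax.plot(t, np.sin(t), label='Sine Wave', color='red')
ax.set_title("Customized Line Plot")
ax.set_xlabel("X-axis Label")
ax.set_ylabel("Y-axis Label")
ax.legend()
ax.grid(True)

# Save through `fig` before plt.show(): after show(), pyplot's "current figure"
# may be a new, empty one, so plt.savefig() could write a blank image
fig.savefig("sine_wave.pdf", bbox_inches="tight")
fig.savefig("sine_wave.png", dpi=300, bbox_inches="tight")
plt.show()
```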

In [None]:
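# Seaborn Theme and Histogram with KDE
# set_theme() changes Matplotlib's global settings, so this style applies to every later plot
sns.set_theme(style="darkgrid")
# Reuses `data` from the histogram cell above; kde=True overlays a kernel density estimate curve
sns.histplot(data, bins=20, kde=True, color='blue')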
plt.title("Seaborn Histogram with KDE")
plt.show()

## Activity: Generate and Customize Visualizations from a Cleaned Dataset

In [None]:
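# Task 1: Load and visualize distribution
# "data.csv" and "column_name" are placeholders: use your cleaned file and one of its numeric columns
df = None  # stays None if loading fails, so later tasks can detect it
try:
    df = pd.read_csv("data.csv")
    sns.histplot(df["column_name"], bins=30, kde=True)
    plt.title("Data Distribution")
    plt.show()
except FileNotFoundError:
    print("data.csv not found.")
except KeyError:
    print("'column_name' not found in the dataset.")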

In [None]:
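# Task 2: Category comparison bar chart
# By default each bar shows the mean of value_column for one category, with an error bar
# Assumes Seaborn >= 0.13, which expects `hue` when `palette` is given (legend=False hides the redundant legend)
if globals().get("df") is None:
    print("No dataset loaded; run Task 1 with a valid data.csv first.")
else:
    try:
        sns.barplot(data=df, x="category_column", y="value_column",
                    hue="category_column", palette="viridis", legend=False)
        plt.title("Category Comparison")
        plt.show()
    except (KeyError, ValueError):
        # Seaborn typically raises ValueError when a column name is not found in `data`
        print("'category_column' or 'value_column' not found in the dataset.")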