# Matplotlib Basics Demo (Enhanced with Explanations)

This notebook demonstrates fundamental matplotlib operations for data visualization with detailed explanations of both what the code does and why it's useful.

In [None]:
# Import necessary libraries
import pandas as pd  # For data manipulation and analysis
import numpy as np   # For numerical operations
import matplotlib.pyplot as plt  # The main plotting library
from matplotlib.gridspec import GridSpec  # For creating complex grid layouts

# Create a directory for saving plots if it doesn't exist
# This gives every saved plot a consistent location
import os
os.makedirs('plots', exist_ok=True)  # exist_ok=True makes this a no-op if the directory is already there

# Set the style for plots
# A predefined style gives consistent, readable defaults without manual styling
# Note: the 'seaborn-v0_8-*' style names require Matplotlib 3.6 or newer
plt.style.use('seaborn-v0_8-whitegrid')
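
If the notebook has to run on installs where that style name may be missing, the lookup can be made defensive. The following is a minimal sketch, assuming that the older name `seaborn-whitegrid` is an acceptable second choice and that falling back to Matplotlib's `default` style is fine; the helper name `pick_style` is hypothetical and not part of Matplotlib.

```python
import matplotlib.pyplot as plt

def pick_style(candidates, fallback="default"):
    """Return the first candidate listed in plt.style.available, else the fallback."""
    for name in candidates:
        if name in plt.style.available:
            return name
    return fallback

# Candidate names are assumptions: the newer name first, then the pre-3.6 name
style_name = pick_style(["seaborn-v0_8-whitegrid", "seaborn-whitegrid"])
plt.style.use(style_name)
print("Using style:", style_name)
```

Printing the chosen name makes it obvious which style was applied when the output looks different from what was expected.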

## Create sample data

Before we can create visualizations, we need data to visualize. Here we'll create synthetic data that demonstrates different patterns.

In [None]:
# Set a random seed for reproducibility
# This ensures that the "random" data will be the same each time we run the code
np.random.seed(42)  

# Create an array of 50 evenly spaced values from 0 to 10
# This will be our x-axis data - evenly spaced points are ideal for showing trends
x = np.linspace(0, 10, 50)

# Create a linear relationship with some random noise
# y1 = 3x + 5 + noise - this creates data with a clear linear trend plus some randomness
y1 = 3 * x + 5 + np.random.normal(0, 2, 50)

# Create a quadratic relationship with some random noise
# y2 = 2x² + noise - this creates data with a curved pattern plus some randomness
y2 = 2 * x**2 + np.random.normal(0, 10, 50)

# Create a DataFrame to store our data
# Using a DataFrame makes it easier to organize and access our data for plotting
df = pd.DataFrame({
    'x': x,
    'y1': y1,
    'y2': y2,
    'category': np.random.choice(['A', 'B', 'C', 'D'], 50)  # Add categorical data for more complex plots
})

# Display the first few rows to see what our data looks like
df.head()
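
A quick sanity check on the synthetic data is to fit a straight line to `y1` and see whether the estimates land near the slope (3) and intercept (5) used to generate it. This is a sketch that rebuilds `x` and `y1` with the same seed so it runs on its own; the estimates will not match 3 and 5 exactly because of the added noise.

```python
import numpy as np

np.random.seed(42)  # same seed as the cell above
x = np.linspace(0, 10, 50)
y1 = 3 * x + 5 + np.random.normal(0, 2, 50)

# Fit a degree-1 polynomial; polyfit returns coefficients from the highest degree down
slope, intercept = np.polyfit(x, y1, 1)
print(f"estimated slope = {slope:.2f}, estimated intercept = {intercept:.2f}")
```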

## 1. Line Plot

Line plots are ideal for showing trends over time or across a continuous variable. They connect consecutive data points with line segments, which makes patterns and changes in the data easy to follow.

In [None]:
# Create a figure with a specific size
# The figsize parameter sets the width and height in inches - larger figures show more detail
plt.figure(figsize=(10, 6))

# Plot the first line with customized appearance
# We use markers (o) to highlight individual data points while the line shows the overall trend
plt.plot(x, y1, label='Linear Trend', color='blue', linestyle='-', linewidth=2, marker='o', markersize=5)

# Plot the second line with different styling
# Using different colors and line styles helps distinguish between multiple data series
plt.plot(x, y2, label='Quadratic Trend', color='red', linestyle='--', linewidth=2, marker='s', markersize=5)

# Add a title to explain what the plot shows
# Titles should be descriptive and help viewers understand the visualization's purpose
plt.title('Basic Line Plot Example', fontsize=16)

# Label the axes to clarify what each dimension represents
# Clear axis labels are essential for understanding what the plot is showing
plt.xlabel('X-axis Label', fontsize=12)
plt.ylabel('Y-axis Label', fontsize=12)

# Add a grid to make it easier to read values from the plot
# Grids help viewers estimate values more accurately
plt.grid(True, linestyle='--', alpha=0.7)

# Add a legend to identify each line
# Legends explain what each visual element represents
plt.legend(fontsize=12)

# Adjust the layout to ensure everything fits well
# tight_layout automatically adjusts subplot parameters for better spacing
plt.tight_layout()

# Save the plot as an image file
# Saving plots allows you to use them in reports or presentations
plt.savefig('plots/line_plot.png', dpi=300)  # dpi=300 creates a high-resolution image

# Display the plot in the notebook
plt.show()
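
The cell above uses the `plt.*` state-based interface. The same plot can also be written against explicit `Figure` and `Axes` objects, which is the form the subplot section relies on later. The sketch below regenerates the data so it is self-contained and omits the save step; the title text is illustrative.

```python
import numpy as np
import matplotlib.pyplot as plt

np.random.seed(42)
x = np.linspace(0, 10, 50)
y1 = 3 * x + 5 + np.random.normal(0, 2, 50)
y2 = 2 * x**2 + np.random.normal(0, 10, 50)

# plt.subplots() with no grid arguments returns one Figure and one Axes
fig, ax = plt.subplots(figsize=(10, 6))
ax.plot(x, y1, label='Linear Trend', color='blue', linestyle='-', marker='o')
ax.plot(x, y2, label='Quadratic Trend', color='red', linestyle='--', marker='s')

# Axes methods use set_* names where pyplot uses plt.title, plt.xlabel, and so on
ax.set_title('Basic Line Plot Example (Axes interface)')
ax.set_xlabel('X-axis Label')
ax.set_ylabel('Y-axis Label')
ax.grid(True, linestyle='--', alpha=0.7)
ax.legend()
fig.tight_layout()
```

In a notebook the figure renders at the end of the cell; in a script, call `plt.show()` afterwards.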

## 2. Scatter Plot

Scatter plots are perfect for showing the relationship between two variables. Each point represents an individual data point, allowing you to see patterns, clusters, or outliers in the data. They're especially useful for identifying correlations between variables.

In [None]:
# Create a figure with a specific size
plt.figure(figsize=(10, 6))

# Create a scatter plot for the first dataset
# We use alpha (transparency) to help see overlapping points
# The 's' parameter controls the size of the markers
plt.scatter(x, y1, label='Group 1', color='blue', marker='o', s=50, alpha=0.7)

# Create a scatter plot for the second dataset with different markers
# Using different markers helps distinguish between different groups
plt.scatter(x, y2, label='Group 2', color='red', marker='x', s=50, alpha=0.7)

# Add a title and axis labels
plt.title('Scatter Plot Example', fontsize=16)
plt.xlabel('X-axis Label', fontsize=12)
plt.ylabel('Y-axis Label', fontsize=12)

# Add a grid to help with reading values
plt.grid(True, linestyle='--', alpha=0.7)

# Add a legend to identify each group
plt.legend(fontsize=12)

# Adjust layout and save
plt.tight_layout()
plt.savefig('plots/scatter_plot.png', dpi=300)
plt.show()
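
The `category` column in the DataFrame is a natural way to split a scatter plot into groups. The sketch below rebuilds a smaller version of the frame (so the random values differ from `df` above) and draws one `scatter` call per category, letting Matplotlib's default color cycle assign the colors.

```python
import numpy as np
import pandas as pd
import matplotlib.pyplot as plt

np.random.seed(42)
x = np.linspace(0, 10, 50)
demo_df = pd.DataFrame({
    'x': x,
    'y1': 3 * x + 5 + np.random.normal(0, 2, 50),
    'category': np.random.choice(['A', 'B', 'C', 'D'], 50),
})

fig, ax = plt.subplots(figsize=(10, 6))
# groupby yields one (name, sub-DataFrame) pair per distinct category value;
# each scatter call takes the next color from the default color cycle
for name, group in demo_df.groupby('category'):
    ax.scatter(group['x'], group['y1'], label=f'Category {name}', s=50, alpha=0.7)

ax.set_xlabel('x')
ax.set_ylabel('y1')
ax.legend(title='category')
fig.tight_layout()
```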

## 3. Bar Plot

Bar plots are excellent for comparing categorical data. The height of each bar represents a value, making it easy to compare values across different categories. They're particularly useful for showing differences between groups or categories.

In [None]:
# Create sample categorical data
categories = ['Category A', 'Category B', 'Category C', 'Category D', 'Category E']
values1 = np.random.randint(10, 100, 5)  # Random integers from 10 to 99 for the first group
values2 = np.random.randint(10, 100, 5)  # Random integers from 10 to 99 for the second group

plt.figure(figsize=(10, 6))

# Calculate positions for the bars
# For grouped bar charts, we need to position each group's bars side by side
bar_width = 0.35  # Width of each bar
x_pos = np.arange(len(categories))  # Positions for the categories

# Create the first group of bars
# We offset these bars to the left of each position using bar_width/2
plt.bar(x_pos - bar_width/2, values1, bar_width, label='Group 1', color='skyblue', edgecolor='black')

# Create the second group of bars
# We offset these bars to the right of each position using bar_width/2
plt.bar(x_pos + bar_width/2, values2, bar_width, label='Group 2', color='lightcoral', edgecolor='black')

# Add title and labels
plt.title('Bar Plot Example', fontsize=16)
plt.xlabel('Categories', fontsize=12)
plt.ylabel('Values', fontsize=12)

# Set the x-tick positions and labels
# This centers each category label under its pair of bars
plt.xticks(x_pos, categories, fontsize=10)

# Add a grid for the y-axis only
# Horizontal grid lines help compare bar heights accurately
plt.grid(True, linestyle='--', alpha=0.7, axis='y')

# Add a legend
plt.legend(fontsize=12)

# Adjust layout and save
plt.tight_layout()
plt.savefig('plots/bar_plot.png', dpi=300)
plt.show()
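
The `± bar_width/2` offsets above are the two-group case of a more general rule: center the groups around each category position. A minimal sketch of that rule follows; the helper name `group_offsets` is hypothetical.

```python
import numpy as np

def group_offsets(n_groups, bar_width):
    """Offsets, relative to each category position, that center n_groups adjacent bars."""
    return (np.arange(n_groups) - (n_groups - 1) / 2) * bar_width

# Two groups reproduce the -bar_width/2 and +bar_width/2 shifts used above
print(group_offsets(2, 0.35))
# Three groups place the middle bar exactly on the category position
print(group_offsets(3, 0.25))
```

Each group would then be drawn with `plt.bar(x_pos + offset, values, bar_width)`, using one offset per group. Keeping `n_groups * bar_width` below 1 stops neighbouring categories from overlapping.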

## 4. Histogram

Histograms show the distribution of a dataset. They group data into bins (ranges) and show the frequency of data points in each bin. Histograms are ideal for understanding the shape, center, and spread of your data distribution.

In [None]:
plt.figure(figsize=(10, 6))

# Create a histogram for the first dataset
# bins=15 divides the range of y1 into 15 equal-width bins
# alpha controls transparency, which helps when overlaying multiple histograms
plt.hist(y1, bins=15, alpha=0.7, color='skyblue', edgecolor='black', label='Distribution 1')

# Create a histogram for the second dataset with a lower alpha (0.5)
# The extra transparency keeps overlapping regions visible; note that bins=15 is computed from y2's own range, so these bin edges differ from the ones used for y1
plt.hist(y2, bins=15, alpha=0.5, color='lightcoral', edgecolor='black', label='Distribution 2')

# Add title and labels
plt.title('Histogram Example', fontsize=16)
plt.xlabel('Values', fontsize=12)
plt.ylabel('Frequency', fontsize=12)

# Add a grid
plt.grid(True, linestyle='--', alpha=0.7)

# Add a legend
plt.legend(fontsize=12)

# Adjust layout and save
plt.tight_layout()
plt.savefig('plots/histogram.png', dpi=300)
plt.show()
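
When two histograms are overlaid, passing the same bin edges to both makes the bars directly comparable. One way to do that, sketched here with regenerated data, is to compute the edges once from the combined values with `np.histogram_bin_edges` and pass them as `bins`.

```python
import numpy as np
import matplotlib.pyplot as plt

np.random.seed(42)
x = np.linspace(0, 10, 50)
y1 = 3 * x + 5 + np.random.normal(0, 2, 50)
y2 = 2 * x**2 + np.random.normal(0, 10, 50)

# 15 equal-width bins spanning the combined range of both datasets (16 edges)
edges = np.histogram_bin_edges(np.concatenate([y1, y2]), bins=15)

fig, ax = plt.subplots(figsize=(10, 6))
# hist returns (counts, edges, patches); only the counts are kept here
counts1, _, _ = ax.hist(y1, bins=edges, alpha=0.7, color='skyblue',
                        edgecolor='black', label='Distribution 1')
counts2, _, _ = ax.hist(y2, bins=edges, alpha=0.5, color='lightcoral',
                        edgecolor='black', label='Distribution 2')
ax.set_xlabel('Values')
ax.set_ylabel('Frequency')
ax.legend()
fig.tight_layout()
```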

## 5. Subplots

Subplots allow you to create multiple plots in a single figure. This is useful for comparing different visualizations of the same data or showing related plots together. It helps viewers see relationships and patterns across different aspects of the data.

In [None]:
# Create a figure with a 2x2 grid of subplots
# The figsize parameter sets the overall figure size
# The 2,2 parameters create a grid with 2 rows and 2 columns
fig, axes = plt.subplots(2, 2, figsize=(12, 10))

# Line plot in the first subplot (top-left)
# axes[0, 0] refers to the subplot in the first row, first column
axes[0, 0].plot(x, y1, color='blue', marker='o')
axes[0, 0].set_title('Line Plot')  # Each subplot gets its own title
axes[0, 0].set_xlabel('X-axis')    # Each subplot gets its own axis labels
axes[0, 0].set_ylabel('Y-axis')
axes[0, 0].grid(True)              # Each subplot can have its own grid

# Scatter plot in the second subplot (top-right)
# axes[0, 1] refers to the subplot in the first row, second column
axes[0, 1].scatter(x, y2, color='red', marker='x')
axes[0, 1].set_title('Scatter Plot')
axes[0, 1].set_xlabel('X-axis')
axes[0, 1].set_ylabel('Y-axis')
axes[0, 1].grid(True)

# Bar plot in the third subplot (bottom-left)
# axes[1, 0] refers to the subplot in the second row, first column
# We're using just the first 3 categories to keep it simple
axes[1, 0].bar(categories[:3], values1[:3], color='green')
axes[1, 0].set_title('Bar Plot')
axes[1, 0].set_xlabel('Categories')
axes[1, 0].set_ylabel('Values')
axes[1, 0].grid(True)

# Histogram in the fourth subplot (bottom-right)
# axes[1, 1] refers to the subplot in the second row, second column
axes[1, 1].hist(y1, bins=10, color='purple', alpha=0.7)
axes[1, 1].set_title('Histogram')
axes[1, 1].set_xlabel('Values')
axes[1, 1].set_ylabel('Frequency')
axes[1, 1].grid(True)

# Adjust layout to prevent overlap and save
# tight_layout automatically adjusts spacing between subplots
plt.tight_layout()
plt.savefig('plots/subplots.png', dpi=300)
plt.show()
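
Settings that are identical across subplots, such as the grid, can be applied in a loop instead of being repeated for each panel. The sketch below shows the pattern on an empty 2x2 grid; the titles are the ones used above.

```python
import matplotlib.pyplot as plt

fig, axes = plt.subplots(2, 2, figsize=(12, 10))
titles = ['Line Plot', 'Scatter Plot', 'Bar Plot', 'Histogram']

# axes is a 2x2 NumPy array of Axes; .flat iterates over it in row-major order
for ax, title in zip(axes.flat, titles):
    ax.set_title(title)
    ax.grid(True)

fig.tight_layout()
```

Row-major order means the sequence is top-left, top-right, bottom-left, bottom-right, which matches the order in which the panels were filled in above.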

## 6. Pair Plot (using pandas)

Pair plots show relationships between multiple variables in a dataset. They create a grid of plots where each variable is plotted against every other variable, with histograms on the diagonal. Here pandas holds the data and computes the correlations, while the grid itself is built by hand with Matplotlib's GridSpec. This kind of plot is useful for exploring multivariate data and spotting correlated variables.

In [None]:
# Create a more interesting dataset for the pair plot
# We use multiple normally distributed variables with different means and standard deviations
np.random.seed(42)
pair_data = pd.DataFrame({
    'feature1': np.random.normal(0, 1, 100),    # Mean=0, SD=1
    'feature2': np.random.normal(5, 2, 100),    # Placeholder (Mean=5, SD=2); overwritten below
    'feature3': np.random.normal(-3, 1.5, 100), # Mean=-3, SD=1.5
    'feature4': np.random.normal(10, 3, 100)    # Mean=10, SD=3
})

# Create a relationship: overwrite feature2 so that it is strongly correlated with feature1
pair_data['feature2'] = 2 * pair_data['feature1'] + np.random.normal(0, 0.5, 100)

# Calculate correlations between features
# This gives us a measure of how strongly each pair of variables is related
correlations = pair_data.corr()
correlations
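
The correlation between `feature1` and `feature2` can be predicted from how `feature2` is built. If `feature2 = a * feature1 + noise` with independent noise, then r = a·sd(feature1) / sqrt(a²·sd(feature1)² + sd(noise)²). With a = 2, sd(feature1) = 1 and sd(noise) = 0.5 this gives 2 / sqrt(4.25), roughly 0.97. The sketch below compares that value with a sample estimate; it draws its own random numbers, so the sample figure will differ slightly from the one in the table above.

```python
import numpy as np

a, sd_x, sd_noise = 2.0, 1.0, 0.5

# Population correlation implied by the construction feature2 = a * feature1 + noise
r_theory = a * sd_x / np.sqrt(a**2 * sd_x**2 + sd_noise**2)

np.random.seed(42)
f1 = np.random.normal(0, sd_x, 100)
f2 = a * f1 + np.random.normal(0, sd_noise, 100)

# np.corrcoef returns the 2x2 correlation matrix; the off-diagonal entry is r
r_sample = np.corrcoef(f1, f2)[0, 1]

print(f"theoretical r = {r_theory:.3f}, sample r = {r_sample:.3f}")
```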

In [None]:
# Create a figure with a grid of subplots
# We need a flexible grid layout where each variable gets a row and column
fig = plt.figure(figsize=(12, 10))
n_vars = len(pair_data.columns)  # Number of variables
grid = GridSpec(n_vars, n_vars)  # Create a grid with n_vars × n_vars cells

# Loop through all pairs of variables to create a grid of plots
# This creates a matrix where each variable is compared with every other variable
for i, var1 in enumerate(pair_data.columns):
    for j, var2 in enumerate(pair_data.columns):
        # Create a subplot at position [i, j] in our grid
        # Adding it through the figure object ties it explicitly to fig
        ax = fig.add_subplot(grid[i, j])
        
        # Diagonal plots (where i=j) show the distribution of a single variable
        # We use histograms on the diagonal because they're ideal for showing how values are distributed
        if i == j:
            ax.hist(pair_data[var1], bins=20, color='skyblue', edgecolor='black')
            ax.set_title(f'{var1}', fontsize=10)
        # Off-diagonal plots show relationships between two different variables
        # We use scatter plots because they're best for showing how two variables relate to each other
        else:
            ax.scatter(pair_data[var2], pair_data[var1], alpha=0.6, s=20)
            # Display the correlation coefficient (r) to quantify the relationship strength
            # This helps viewers quickly understand if there's a positive, negative, or no correlation
            ax.set_title(f'r = {correlations.iloc[i, j]:.2f}', fontsize=8)
        
        # Only show x-axis labels for the bottom row to avoid redundancy
        # This reduces clutter while ensuring all variables are still labeled
        if i == n_vars - 1:
            ax.set_xlabel(var2, fontsize=8)
        else:
            ax.set_xticklabels([])
        
        # Only show y-axis labels for the first column to avoid redundancy
        # This keeps the plot clean while maintaining necessary information
        if j == 0:
            ax.set_ylabel(var1, fontsize=8)
        else:
            ax.set_yticklabels([])

# Adjust spacing between subplots for better readability
# Without this, plots might overlap or have too much empty space
plt.tight_layout()

# Save the plot as a high-resolution image for reports or presentations
# DPI=300 ensures good quality for both screen viewing and printing
plt.savefig('plots/pair_plot.png', dpi=300)

# Display the plot in the notebook or interactive environment
# This allows immediate visualization of the results
plt.show()
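
For quick exploration, pandas also ships a ready-made version of this grid, `pandas.plotting.scatter_matrix`. The sketch below is an alternative to the manual GridSpec loop, not a reproduction of it: it does not annotate correlation coefficients, and it rebuilds a comparable frame (named `demo` here) so that it runs on its own.

```python
import numpy as np
import pandas as pd
import matplotlib.pyplot as plt

np.random.seed(42)
demo = pd.DataFrame({
    'feature1': np.random.normal(0, 1, 100),
    'feature3': np.random.normal(-3, 1.5, 100),
    'feature4': np.random.normal(10, 3, 100),
})
# feature2 is built from feature1 plus noise, as in the cell above
demo['feature2'] = 2 * demo['feature1'] + np.random.normal(0, 0.5, 100)

# Returns a 2-D NumPy array of Axes, one per (row variable, column variable) pair
axes = pd.plotting.scatter_matrix(demo, figsize=(12, 10), alpha=0.6, diagonal='hist')
```

The manual loop remains the better choice when each panel needs custom titles or annotations, such as the `r` values shown above.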