# Data Visualization with Matplotlib: A Comprehensive Introduction

Data visualization makes the patterns, trends, and outliers in a dataset visible. In this notebook, we'll explore the fundamentals of creating effective visualizations with Matplotlib: basic plots, statistical plots, more advanced plot types, and styling.

## Why Data Visualization Matters

- **Exploratory Data Analysis (EDA)**: Understand data distributions and relationships
- **Communication**: Present findings to stakeholders effectively
- **Pattern Recognition**: Identify trends, outliers, and anomalies
- **Model Validation**: Visualize model performance and diagnostics

In [None]:
import numpy as np
import matplotlib  # top-level package, needed for matplotlib.__version__ below
import matplotlib.pyplot as plt
import pandas as pd
from sklearn.datasets import make_classification, make_regression
import seaborn as sns

# Set style for better looking plots
plt.style.use('seaborn-v0_8')
sns.set_palette("husl")
np.random.seed(42)

print("Libraries imported successfully!")
print(f"Matplotlib version: {matplotlib.__version__}")
print(f"NumPy version: {np.__version__}")

## 1. Basic Line Plots and Scatter Plots

Let's start with the fundamental plot types:

In [None]:
# Generate sample data
x = np.linspace(0, 10, 100)
y1 = np.sin(x)
y2 = np.cos(x)
y3 = np.sin(x) * np.exp(-x/5)  # Damped sine wave

# Create subplots
fig, axes = plt.subplots(2, 2, figsize=(15, 10))
fig.suptitle('Basic Plot Types', fontsize=16, fontweight='bold')

# Line plot
axes[0, 0].plot(x, y1, label='sin(x)', linewidth=2)
axes[0, 0].plot(x, y2, label='cos(x)', linewidth=2, linestyle='--')
axes[0, 0].set_title('Line Plot: Trigonometric Functions')
axes[0, 0].set_xlabel('x')
axes[0, 0].set_ylabel('y')
axes[0, 0].legend()
axes[0, 0].grid(True, alpha=0.3)

# Scatter plot with regression data
X_reg, y_reg = make_regression(n_samples=50, n_features=1, noise=10, random_state=42)
axes[0, 1].scatter(X_reg, y_reg, alpha=0.7, s=50)
axes[0, 1].set_title('Scatter Plot: Regression Data')
axes[0, 1].set_xlabel('Feature')
axes[0, 1].set_ylabel('Target')
axes[0, 1].grid(True, alpha=0.3)

# Damped oscillation
axes[1, 0].plot(x, y3, color='purple', linewidth=2, label='Damped sine')
axes[1, 0].fill_between(x, y3, alpha=0.3, color='purple')
axes[1, 0].set_title('Area Plot: Damped Oscillation')
axes[1, 0].set_xlabel('Time')
axes[1, 0].set_ylabel('Amplitude')
axes[1, 0].legend()
axes[1, 0].grid(True, alpha=0.3)

# Multiple series with different styles
axes[1, 1].plot(x, y1, 'b-', label='sin(x)', linewidth=2)
axes[1, 1].plot(x[::10], y2[::10], 'ro', label='cos(x) - sampled', markersize=6)
axes[1, 1].plot(x, y3, 'g--', label='damped sin(x)', linewidth=2)
axes[1, 1].set_title('Mixed Plot Styles')
axes[1, 1].set_xlabel('x')
axes[1, 1].set_ylabel('y')
axes[1, 1].legend()
axes[1, 1].grid(True, alpha=0.3)

plt.tight_layout()
plt.show()

## 2. Statistical Visualizations

Essential plots for understanding data distributions:

In [None]:
# Generate sample datasets
normal_data = np.random.normal(100, 15, 1000)
exponential_data = np.random.exponential(2, 1000)
uniform_data = np.random.uniform(0, 10, 1000)

# Create classification data for box plots
X_class, y_class = make_classification(n_samples=300, n_features=2, n_redundant=0, 
                                      n_informative=2, n_clusters_per_class=1, 
                                      random_state=42)

fig, axes = plt.subplots(2, 3, figsize=(18, 12))
fig.suptitle('Statistical Visualizations', fontsize=16, fontweight='bold')

# Histogram with multiple distributions
axes[0, 0].hist(normal_data, bins=30, alpha=0.7, label='Normal', density=True)
axes[0, 0].hist(exponential_data, bins=30, alpha=0.7, label='Exponential', density=True)
axes[0, 0].set_title('Histograms: Comparing Distributions')
axes[0, 0].set_xlabel('Value')
axes[0, 0].set_ylabel('Density')
axes[0, 0].legend()
axes[0, 0].grid(True, alpha=0.3)

# Box plot
data_for_boxplot = [normal_data, exponential_data, uniform_data]
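# Set tick labels separately: the `labels` keyword of boxplot() is deprecated since Matplotlib 3.9
box_plot = axes[0, 1].boxplot(data_for_boxplot)
axes[0, 1].set_xticklabels(['Normal', 'Exponential', 'Uniform'])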
axes[0, 1].set_title('Box Plot: Distribution Comparison')
axes[0, 1].set_ylabel('Value')
axes[0, 1].grid(True, alpha=0.3)

# Violin plot (Matplotlib's own violinplot, drawn from 200-sample subsets)
df_violin = pd.DataFrame({
    'Normal': normal_data[:200],
    'Exponential': exponential_data[:200],
    'Uniform': uniform_data[:200]
})
axes[0, 2].violinplot([df_violin['Normal'], df_violin['Exponential'], df_violin['Uniform']], 
                     positions=[1, 2, 3])
axes[0, 2].set_title('Violin Plot: Distribution Shapes')
axes[0, 2].set_xticks([1, 2, 3])
axes[0, 2].set_xticklabels(['Normal', 'Exponential', 'Uniform'])
axes[0, 2].grid(True, alpha=0.3)

# Classification scatter plot
colors = ['red', 'blue']
for i, color in enumerate(colors):
    mask = y_class == i
    axes[1, 0].scatter(X_class[mask, 0], X_class[mask, 1], 
                      c=color, alpha=0.7, label=f'Class {i}', s=50)
axes[1, 0].set_title('Classification Scatter Plot')
axes[1, 0].set_xlabel('Feature 1')
axes[1, 0].set_ylabel('Feature 2')
axes[1, 0].legend()
axes[1, 0].grid(True, alpha=0.3)

# Hexbin density plot of two correlated variables
correlation_data = np.random.multivariate_normal([0, 0], [[1, 0.5], [0.5, 1]], 100)
axes[1, 1].hexbin(correlation_data[:, 0], correlation_data[:, 1], 
                 gridsize=20, cmap='Blues')
axes[1, 1].set_title('Hexbin Plot: 2D Distribution')
axes[1, 1].set_xlabel('X')
axes[1, 1].set_ylabel('Y')

# Q-Q plot for normality testing
from scipy import stats
stats.probplot(normal_data, dist="norm", plot=axes[1, 2])
axes[1, 2].set_title('Q-Q Plot: Testing Normality')
axes[1, 2].grid(True, alpha=0.3)

plt.tight_layout()
plt.show()

## 3. Advanced Visualization Techniques

More sophisticated plots for complex data analysis:

In [None]:
# Generate time series data
dates = pd.date_range('2020-01-01', periods=365, freq='D')
trend = np.linspace(100, 150, 365)
seasonal = 20 * np.sin(2 * np.pi * np.arange(365) / 365.25 * 4)
noise = np.random.normal(0, 5, 365)
time_series = trend + seasonal + noise

# Create advanced visualizations
fig = plt.figure(figsize=(20, 15))

# Time series plot
ax1 = plt.subplot(3, 3, 1)
plt.plot(dates, time_series, linewidth=1, alpha=0.7, label='Data')
plt.plot(dates, trend, linewidth=2, color='red', label='Trend')
plt.title('Time Series Analysis')
plt.xlabel('Date')
plt.ylabel('Value')
plt.legend()
plt.grid(True, alpha=0.3)
plt.xticks(rotation=45)

# 3D surface plot
ax2 = plt.subplot(3, 3, 2, projection='3d')
x = np.linspace(-5, 5, 50)
y = np.linspace(-5, 5, 50)
X, Y = np.meshgrid(x, y)
Z = np.sin(np.sqrt(X**2 + Y**2))
surface = ax2.plot_surface(X, Y, Z, cmap='viridis', alpha=0.8)
ax2.set_title('3D Surface Plot')
ax2.set_xlabel('X')
ax2.set_ylabel('Y')
ax2.set_zlabel('Z')

# Contour plot
ax3 = plt.subplot(3, 3, 3)
contour = plt.contour(X, Y, Z, levels=15)
plt.colorbar(contour)
plt.title('Contour Plot')
plt.xlabel('X')
plt.ylabel('Y')

# Polar plot
ax4 = plt.subplot(3, 3, 4, projection='polar')
theta = np.linspace(0, 2*np.pi, 100)
r1 = 1 + 0.3*np.sin(5*theta)
r2 = 1 + 0.3*np.cos(3*theta)
ax4.plot(theta, r1, label='Flower 1')
ax4.plot(theta, r2, label='Flower 2')
ax4.set_title('Polar Plot')
ax4.legend()

# Stem plot
ax5 = plt.subplot(3, 3, 5)
x_stem = np.linspace(0, 10, 20)
y_stem = np.sin(x_stem) * np.exp(-x_stem/5)
plt.stem(x_stem, y_stem, basefmt=" ")
plt.title('Stem Plot')
plt.xlabel('X')
plt.ylabel('Y')
plt.grid(True, alpha=0.3)

# Step plot
ax6 = plt.subplot(3, 3, 6)
x_step = np.arange(10)
y_step = np.random.randint(1, 10, 10)
plt.step(x_step, y_step, where='mid', linewidth=2, label='Steps')
plt.fill_between(x_step, y_step, step='mid', alpha=0.3)
plt.title('Step Plot')
plt.xlabel('X')
plt.ylabel('Y')
plt.legend()
plt.grid(True, alpha=0.3)

# Error bars
ax7 = plt.subplot(3, 3, 7)
x_err = np.arange(5)
y_err = [1, 4, 2, 8, 5]
y_error = [0.5, 1, 0.8, 1.2, 0.7]
plt.errorbar(x_err, y_err, yerr=y_error, fmt='o-', capsize=5, capthick=2, linewidth=2)
plt.title('Error Bars')
plt.xlabel('X')
plt.ylabel('Y')
plt.grid(True, alpha=0.3)

# Pie chart
ax8 = plt.subplot(3, 3, 8)
sizes = [30, 25, 20, 15, 10]
labels = ['A', 'B', 'C', 'D', 'E']
colors = plt.cm.Set3(np.linspace(0, 1, len(sizes)))
plt.pie(sizes, labels=labels, colors=colors, autopct='%1.1f%%', startangle=90)
plt.title('Pie Chart')

# Stacked bar chart
ax9 = plt.subplot(3, 3, 9)
categories = ['A', 'B', 'C', 'D']
values1 = [20, 35, 30, 25]
values2 = [15, 25, 35, 30]
values3 = [10, 15, 20, 25]

width = 0.6
plt.bar(categories, values1, width, label='Series 1')
plt.bar(categories, values2, width, bottom=values1, label='Series 2')
plt.bar(categories, values3, width, 
        bottom=[i+j for i,j in zip(values1, values2)], label='Series 3')
plt.title('Stacked Bar Chart')
plt.ylabel('Values')
plt.legend()

plt.tight_layout()
plt.show()

## 4. Customization and Best Practices

Making professional, publication-ready visualizations:

In [None]:
# Generate sample data for demonstration
np.random.seed(42)
x = np.linspace(0, 10, 100)
y1 = np.sin(x) + 0.1 * np.random.randn(100)
y2 = np.cos(x) + 0.1 * np.random.randn(100)

# Create before/after comparison
fig, (ax1, ax2) = plt.subplots(1, 2, figsize=(16, 6))

# "Before" - Basic plot
ax1.plot(x, y1)
ax1.plot(x, y2)
ax1.set_title('Before: Basic Plot')

# "After" - Professional plot
ax2.plot(x, y1, linewidth=2, alpha=0.8, label='sin(x) + noise', color='#2E86AB')
ax2.plot(x, y2, linewidth=2, alpha=0.8, label='cos(x) + noise', color='#A23B72')

# Professional styling
ax2.set_title('After: Professional Plot', fontsize=14, fontweight='bold', pad=20)
ax2.set_xlabel('Time (seconds)', fontsize=12)
ax2.set_ylabel('Amplitude', fontsize=12)

# Customize legend
ax2.legend(frameon=True, fancybox=True, shadow=True, loc='upper right')

# Grid and styling
ax2.grid(True, alpha=0.3, linestyle='--')
ax2.spines['top'].set_visible(False)
ax2.spines['right'].set_visible(False)

# Set axis limits
ax2.set_xlim(0, 10)
ax2.set_ylim(-1.5, 1.5)

plt.tight_layout()
plt.show()

# Demonstrate color palettes and themes
fig, axes = plt.subplots(2, 2, figsize=(15, 10))
fig.suptitle('Color Palettes and Themes', fontsize=16)

# Generate sample data
categories = ['A', 'B', 'C', 'D', 'E']
values = [23, 45, 56, 78, 32]

# Default colors
axes[0, 0].bar(categories, values)
axes[0, 0].set_title('Default Colors')

# Custom color palette
custom_colors = ['#FF6B6B', '#4ECDC4', '#45B7D1', '#96CEB4', '#FFEAA7']
axes[0, 1].bar(categories, values, color=custom_colors)
axes[0, 1].set_title('Custom Color Palette')

# Gradient colors
gradient_colors = plt.cm.viridis(np.linspace(0, 1, len(categories)))
axes[1, 0].bar(categories, values, color=gradient_colors)
axes[1, 0].set_title('Viridis Gradient')

# Monochrome with highlights
mono_colors = ['#CCCCCC'] * len(categories)
mono_colors[2] = '#FF6B6B'  # Highlight category C
axes[1, 1].bar(categories, values, color=mono_colors)
axes[1, 1].set_title('Monochrome with Highlight')

plt.tight_layout()
plt.show()

## Key Principles for Effective Data Visualization

### 1. **Choose the Right Plot Type**
- **Line plots**: Time series, continuous relationships
- **Bar charts**: Categorical comparisons
- **Scatter plots**: Correlation between variables
- **Histograms**: Data distribution
- **Box plots**: Distribution statistics and outliers

### 2. **Design Principles**
- **Clarity over complexity**: Simple is often better
- **Appropriate color use**: Consider colorblind-friendly palettes
- **Consistent styling**: Use the same style throughout
- **Meaningful labels**: Clear titles, axis labels, and legends
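
The colorblind-friendly and consistent-styling points above can be sketched with one of Matplotlib's built-in style sheets. This is a minimal illustration, assuming the `tableau-colorblind10` style sheet that ships with Matplotlib; the data and series names are made up for the demonstration.

```python
import matplotlib
matplotlib.use("Agg")  # headless backend for running as a script; omit this line inside a notebook
import matplotlib.pyplot as plt
import numpy as np

x = np.linspace(0, 10, 100)  # illustrative data, not taken from the cells above

# Apply the style only inside this block so the rest of the notebook keeps its own style
with plt.style.context("tableau-colorblind10"):
    fig, ax = plt.subplots(figsize=(8, 5))
    for k in range(1, 4):
        # Each line takes the next color from the style sheet's color cycle
        ax.plot(x, np.sin(x + k), linewidth=2, label=f"Series {k}")
    ax.set_title("Colorblind-Friendly Color Cycle")
    ax.set_xlabel("x")
    ax.set_ylabel("y")
    ax.legend()
    ax.grid(True, alpha=0.3)
    fig.tight_layout()

# Collect the colors that were actually assigned to the three lines
line_colors = [line.get_color() for line in ax.get_lines()]
print(line_colors)
```

Using `plt.style.context` rather than `plt.style.use` keeps the change local, which makes it easy to compare palettes without altering the style set at the top of the notebook.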

### 3. **Technical Best Practices**
```python
# Good practices demonstrated (x and y stand for any pair of equal-length arrays):
plt.figure(figsize=(10, 6))  # Appropriate size
plt.plot(x, y, linewidth=2, alpha=0.8, label='Series')  # Readable line; the label feeds the legend
plt.grid(True, alpha=0.3)  # Subtle grid
plt.legend(frameon=True)  # Clear legend
plt.tight_layout()  # Proper spacing
```

### 4. **Common Mistakes to Avoid**
- 3D effects when 2D is sufficient
- Too many colors or patterns
- Missing or unclear labels
- Misleading scales or proportions
- Cluttered layouts
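
The "misleading scales" item is easiest to see side by side. The sketch below uses made-up values and a hypothetical helper, `visible_height_ratio`, to compare how the same bars look on a truncated y-axis and on a zero-based one.

```python
import matplotlib
matplotlib.use("Agg")  # headless backend for running as a script; omit this line inside a notebook
import matplotlib.pyplot as plt

categories = ["A", "B", "C"]  # illustrative data
values = [98, 100, 102]

fig, (ax_trunc, ax_zero) = plt.subplots(1, 2, figsize=(12, 4))

# Truncated axis: the baseline starts just below the smallest value
ax_trunc.bar(categories, values, color="#CCCCCC")
ax_trunc.set_ylim(97, 103)
ax_trunc.set_title("Truncated Axis: Differences Look Large")

# Zero baseline: bar heights stay proportional to the values
ax_zero.bar(categories, values, color="#2E86AB")
ax_zero.set_ylim(0, 110)
ax_zero.set_title("Zero Baseline: Differences in Proportion")

for ax in (ax_trunc, ax_zero):
    ax.set_ylabel("Value")
    ax.grid(True, axis="y", alpha=0.3)


def visible_height_ratio(ax, data):
    """Ratio of the tallest to the shortest bar as drawn above the lower y-limit."""
    y_min, _ = ax.get_ylim()
    return (max(data) - y_min) / (min(data) - y_min)


trunc_ratio = visible_height_ratio(ax_trunc, values)
zero_ratio = visible_height_ratio(ax_zero, values)
print(f"Truncated axis ratio: {trunc_ratio:.2f}")
print(f"Zero baseline ratio:  {zero_ratio:.2f}")

fig.tight_layout()
```

With these values the tallest bar appears five times as tall as the shortest on the truncated axis, although the underlying values differ by only about 4%.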

### 5. **Advanced Tips**
- Use subplots for comparative analysis
- Implement interactive features with libraries like Plotly
- Export high-quality figures for publications
- Consider your audience and their expertise level
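
For the export tip, one possible approach is `Figure.savefig` with an explicit resolution and tight bounding box. The sketch below writes to a temporary directory; the file names and the 300 dpi setting are assumptions chosen for illustration, so adjust them to the requirements of your target publication.

```python
import os
import tempfile

import matplotlib
matplotlib.use("Agg")  # headless backend for running as a script; omit this line inside a notebook
import matplotlib.pyplot as plt
import numpy as np

x = np.linspace(0, 10, 100)

fig, ax = plt.subplots(figsize=(6, 4))
ax.plot(x, np.sin(x), linewidth=2, label="sin(x)")
ax.set_xlabel("x")
ax.set_ylabel("y")
ax.legend()

out_dir = tempfile.mkdtemp()  # hypothetical output location
png_path = os.path.join(out_dir, "figure.png")
pdf_path = os.path.join(out_dir, "figure.pdf")

# Raster output: dpi controls the pixel resolution of the saved image
fig.savefig(png_path, dpi=300, bbox_inches="tight")
# Vector output: scales without pixelation, so dpi matters less here
fig.savefig(pdf_path, bbox_inches="tight")

plt.close(fig)  # release the figure's memory once it is saved
print(os.listdir(out_dir))
```

Passing `bbox_inches="tight"` trims surplus whitespace around the axes, which tends to help when figures are placed in a document with fixed column widths.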

This introduction covers the foundations for creating effective data visualizations in your machine learning and data science projects.