# 📊 Chapter 03: Matplotlib - Data Visualization with Multiple Datasets

## 🎯 Learning Objectives
By the end of this chapter, you will:
- ✅ Understand Matplotlib and why it's the foundation of Python visualization
- ✅ Create line plots, scatter plots, bar charts, histograms
- ✅ Apply visualization techniques on **3 different datasets**
- ✅ Customize plots (colors, labels, legends, styles)
- ✅ Create subplots and multi-panel dashboards

---

## 📁 Datasets Used

### 1️⃣ Stock Prices (Time Series)
### 2️⃣ Iris Flowers (Scatter & Distributions)
### 3️⃣ Tips Restaurant (Categorical Analysis)

**Why 3 datasets?** Each one calls for a different kind of plot: time series, distributions, and categorical comparisons.

---

## 📚 Table of Contents
1. [Introduction to Matplotlib](#intro)
2. [Loading Datasets](#loading)
3. [Line Plots](#line)
4. [Scatter Plots](#scatter)
5. [Bar Charts](#bar)
6. [Histograms & Distributions](#hist)
7. [Subplots](#subplots)
8. [Customization](#custom)
9. [Practice Exercises](#exercises)
10. [Next Steps: Projects](#projects)

In [1]:
# Import visualization libraries
# matplotlib.pyplot is the main plotting interface (imported as 'plt' by convention)
import matplotlib.pyplot as plt

# seaborn provides beautiful default styles and additional plot types
import seaborn as sns

# numpy for numerical operations
import numpy as np

# pandas for loading and manipulating data
import pandas as pd

# Set plot style for better aesthetics
# seaborn styles make plots look professional with minimal effort
plt.style.use('seaborn-v0_8-darkgrid')  # Use seaborn's dark grid style

# Set default figure size (width, height in inches)
plt.rcParams['figure.figsize'] = (10, 6)

# Check versions
print(f"Matplotlib version: {plt.matplotlib.__version__}")
print(f"Seaborn version: {sns.__version__}")
print(f"NumPy version: {np.__version__}")
print(f"pandas version: {pd.__version__}")
print("\n✅ All libraries imported successfully!")

Matplotlib version: 3.10.7
Seaborn version: 0.13.2
NumPy version: 2.3.1
pandas version: 2.3.0

✅ All libraries imported successfully!


---

<a id="intro"></a>
## **1. Introduction to Matplotlib**

**What is Matplotlib?**
- The **foundational** plotting library in Python (first released in 2003)
- Its pyplot interface is modeled on MATLAB's plotting commands
- Several other Python visualization tools are built on top of it (seaborn, pandas plotting), though not all of them (Plotly and Bokeh are independent)

**Why Learn Matplotlib?**
- ✅ Complete control over every plot element
- ✅ Industry standard - used in research, business, data science
- ✅ Create publication-quality figures
- ✅ Integrate with Jupyter notebooks, web apps, GUI applications

**The Figure-Axes Hierarchy:**
```
Figure (entire window)
└── Axes (actual plot area)
    ├── Title
    ├── X-axis (label, ticks, limits)
    ├── Y-axis (label, ticks, limits)
    ├── Legend
    └── Plot elements (lines, points, bars, etc.)
```
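
To make the hierarchy concrete, here is a minimal sketch that builds one Figure with one Axes and then walks down the tree. The data values and label strings are made up for illustration.

```python
import matplotlib.pyplot as plt

# One Figure containing one Axes
fig, ax = plt.subplots(figsize=(6, 4))

# Plot elements, title, axis labels and legend all live on the Axes
(line,) = ax.plot([1, 2, 3], [2, 4, 8], label='demo series')  # illustrative data
ax.set_title('Demo title')
ax.set_xlabel('x value')
ax.set_ylabel('y value')
legend = ax.legend()

# Walk the hierarchy from the top down
print(fig.axes == [ax])                  # the Figure owns the Axes
print(ax.figure is fig)                  # the Axes points back to its Figure
print(ax.get_title())                    # Title
print(ax.xaxis.get_label().get_text())   # X-axis label
print(len(ax.lines))                     # Plot elements: number of lines drawn
```

In a notebook the figure is rendered at the end of the cell; in a plain script you would finish with `plt.show()`.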

**Two Ways to Use Matplotlib:**
1. **pyplot interface** (MATLAB-style): `plt.plot(x, y)` - Easier for beginners
2. **Object-oriented** (fig, ax): `fig, ax = plt.subplots()` - More control

We'll start with pyplot, then move to OO for advanced plots.
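
As a rough side-by-side, the sketch below draws the same illustrative data through both interfaces. The variable names (`ax_pyplot`, `ax_oo`) are hypothetical and only there to compare the results.

```python
import matplotlib.pyplot as plt

x = [0, 1, 2, 3]
y = [0, 1, 4, 9]  # illustrative data

# 1) pyplot interface: functions act on the implicit "current" figure and axes
plt.figure()
plt.plot(x, y)
plt.title('pyplot style')
ax_pyplot = plt.gca()  # grab the current Axes so we can inspect it

# 2) Object-oriented interface: you hold the Figure and Axes explicitly
fig, ax_oo = plt.subplots()
ax_oo.plot(x, y)
ax_oo.set_title('OO style')

# Both routes end up with an Axes that holds a line with the same data
same_data = list(ax_pyplot.lines[0].get_ydata()) == list(ax_oo.lines[0].get_ydata())
print(same_data)
```

The pyplot calls are shorter, but they depend on which figure is "current". Holding `ax` explicitly removes that ambiguity once a figure has several panels.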

---

<a id="loading"></a>
## **2. Loading 3 Datasets for Visualization**

We'll practice on 3 different types of data:
1. **Stock Prices** - Time series data (dates + values)
2. **Iris Flowers** - Multi-dimensional data (measurements)
3. **Tips Restaurant** - Categorical data (groups + amounts)
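
The stock series in the next cell is synthetic: it is an additive random walk, where each day's price is the starting price plus the running total of all daily changes so far. A tiny sketch with hand-picked changes (not the random ones used below) shows what `cumsum` does:

```python
import numpy as np

start_price = 100.0
daily_changes = np.array([1.0, -2.0, 0.5])  # hand-picked dollar changes (illustrative)

# cumsum turns per-day changes into the total change since the start
cumulative_change = np.cumsum(daily_changes)   # [1.0, -1.0, -0.5]
prices = start_price + cumulative_change       # [101.0, 99.0, 99.5]
print(prices)
```

Note that this models dollar moves. A percentage-return model would multiply instead of add, which is a different technique from the one used in this chapter.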

In [2]:
print("=" * 70)
print("LOADING 3 DATASETS FOR VISUALIZATION")
print("=" * 70)

# ============================================================================
# DATASET 1: Stock Prices - Time Series Data
# ============================================================================
print("\n📈 DATASET 1: Stock Prices (Time Series)")
print("-" * 70)

# Generate synthetic stock price data (resembles real stock movement)
# Using random walk to simulate realistic price changes
np.random.seed(42)  # For reproducible results

# Create 365 daily timestamps (2024 is a leap year, so the range ends on Dec 30)
dates = pd.date_range(start='2024-01-01', periods=365, freq='D')

# Starting price
start_price = 100

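# Daily price changes in dollars: normally distributed with a standard deviation of 2
# (most days land between about -$4 and +$4). These are additive dollar moves, not percentages.
# cumsum() converts daily changes into cumulative price movement
daily_returns = np.random.randn(365) * 2  # Standard normal draws scaled to a $2 standard deviation
cumulative_returns = np.cumsum(daily_returns)  # Running total of the daily changes
stock_prices = start_price + cumulative_returns  # Add to starting price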

# Create DataFrame
stock_data = pd.DataFrame({
    'date': dates,
    'price': stock_prices,
    'volume': np.random.randint(1000000, 5000000, size=365)  # Trading volume
})

print(f"Shape: {stock_data.shape}")
print(f"Date range: {stock_data['date'].min()} to {stock_data['date'].max()}")
print(f"Price range: ${stock_data['price'].min():.2f} - ${stock_data['price'].max():.2f}")
print("\nFirst 3 days:")
print(stock_data.head(3))

# ============================================================================
# DATASET 2: Iris Flowers - Classic ML Dataset
# ============================================================================
print("\n\n🌸 DATASET 2: Iris Flowers (Multi-dimensional)")
print("-" * 70)

# Load famous iris dataset (sepal/petal measurements of 3 flower species)
iris = sns.load_dataset('iris')

print(f"Shape: {iris.shape}")
print(f"Species: {iris['species'].unique()}")
print("\nColumns (features):")
print(iris.columns.tolist())
print("\nFirst 3 flowers:")
print(iris.head(3))

# ============================================================================
# DATASET 3: Tips - Restaurant Tipping Data
# ============================================================================
print("\n\n🍽️ DATASET 3: Restaurant Tips (Categorical)")
print("-" * 70)

# Load tips dataset (bills, tips, and categorical info)
tips = sns.load_dataset('tips')

print(f"Shape: {tips.shape}")
print(f"Days: {tips['day'].unique()}")
print(f"Time periods: {tips['time'].unique()}")
print("\nFirst 3 transactions:")
print(tips.head(3))

print("\n✅ All 3 datasets loaded successfully!")
print("=" * 70)

LOADING 3 DATASETS FOR VISUALIZATION

📈 DATASET 1: Stock Prices (Time Series)
----------------------------------------------------------------------
Shape: (365, 3)
Date range: 2024-01-01 00:00:00 to 2024-12-30 00:00:00
Price range: $72.95 - $113.58

First 3 days:
        date       price   volume
0 2024-01-01  100.993428  4388532
1 2024-01-02  100.716900  2425541
2 2024-01-03  102.012277  2509433


🌸 DATASET 2: Iris Flowers (Multi-dimensional)
----------------------------------------------------------------------
Shape: (150, 5)
Species: ['setosa' 'versicolor' 'virginica']

Columns (features):
['sepal_length', 'sepal_width', 'petal_length', 'petal_width', 'species']

First 3 flowers:
   sepal_length  sepal_width  petal_length  petal_width species
0           5.1          3.5           1.4          0.2  setosa
1           4.9          3.0           1.4          0.2  setosa
2           4.7          3.2           1.3          0.2  setosa


🍽️ DATASET 3: Restaurant Tips (Categorical)
----

---

<a id="line"></a>
## **3. Line Plots - Time Series & Trends**

**When to Use Line Plots:**
- Showing trends over time (stock prices, temperature, sales)
- Connecting continuous data points
- Comparing multiple time series

**Key Functions:**
- `plt.plot(x, y)` - Basic line plot
- `plt.plot(x, y, 'ro-')` - Line with red circles
- `plt.xlabel()`, `plt.ylabel()` - Axis labels
- `plt.title()` - Plot title
- `plt.legend()` - Add legend
- `plt.grid()` - Add gridlines
- `plt.show()` - Display plot (automatic in notebooks)
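
Here is a small sketch that uses the functions above together and then inspects what the `'ro-'` format string actually set. The data points are illustrative.

```python
import matplotlib.pyplot as plt
from matplotlib.colors import to_rgba

plt.figure(figsize=(6, 4))

# 'ro-' packs three settings into one string: r = red, o = circle marker, - = solid line
(line,) = plt.plot([1, 2, 3], [3, 1, 2], 'ro-', label='demo series')  # illustrative data

plt.xlabel('x')
plt.ylabel('y')
plt.title('Format string demo')
plt.legend()
plt.grid(True, alpha=0.3)

# plt.plot returns the Line2D objects it created, so the style can be checked
print(to_rgba(line.get_color()))                 # red as an RGBA tuple
print(line.get_marker(), line.get_linestyle())   # marker and line style codes
```

Format strings are handy for quick plots. The keyword arguments used in the next cell (`color=`, `linestyle=`, `marker=`) say the same thing more explicitly.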

In [None]:
print("=" * 70)
print("LINE PLOTS - TIME SERIES VISUALIZATION")
print("=" * 70)

# ============================================================================
# PART 1: Basic Line Plot
# ============================================================================
print("\n1️⃣ BASIC LINE PLOT")
print("-" * 70)

# Create a simple line plot
# plt.figure() creates a new figure window
plt.figure(figsize=(12, 5))  # width=12 inches, height=5 inches

# Plot stock price over time
# x-axis: dates, y-axis: prices
plt.plot(stock_data['date'], stock_data['price'])

# Add labels and title
plt.xlabel('Date', fontsize=12)  # X-axis label with font size
plt.ylabel('Stock Price ($)', fontsize=12)  # Y-axis label
plt.title('Stock Price Over Time (2024)', fontsize=14, fontweight='bold')

# Add grid for easier reading
plt.grid(True, alpha=0.3)  # alpha controls transparency (0=invisible, 1=solid)

# Rotate x-axis labels for better readability
plt.xticks(rotation=45)  # Rotate labels 45 degrees

# Adjust layout to prevent label cutoff
plt.tight_layout()

# Display the plot
plt.show()

print("✅ Basic line plot created!")

# ============================================================================
# PART 2: Customizing Line Style
# ============================================================================
print("\n\n2️⃣ CUSTOMIZING LINE STYLE")
print("-" * 70)

plt.figure(figsize=(12, 5))

# Plot with custom line style
# Color codes: 'b'=blue, 'r'=red, 'g'=green, 'k'=black
# Line styles: '-'=solid, '--'=dashed, '-.'=dash-dot, ':'=dotted
# Markers: 'o'=circle, 's'=square, '^'=triangle, '*'=star

plt.plot(stock_data['date'], stock_data['price'], 
         color='darkblue',           # Line color
         linestyle='-',               # Solid line
         linewidth=2,                 # Line thickness
         marker='o',                  # Circle markers at data points
         markersize=3,                # Marker size
         markerfacecolor='red',       # Marker fill color
         markeredgecolor='darkred',   # Marker outline color
         alpha=0.7)                   # Transparency

plt.xlabel('Date', fontsize=12)
plt.ylabel('Price ($)', fontsize=12)
plt.title('Stock Price with Styled Line', fontsize=14, fontweight='bold')
plt.grid(True, alpha=0.3)
plt.xticks(rotation=45)
plt.tight_layout()
plt.show()

print("✅ Styled line plot created!")

# ============================================================================
# PART 3: Multiple Lines on Same Plot
# ============================================================================
print("\n\n3️⃣ MULTIPLE LINES - COMPARING TRENDS")
print("-" * 70)

# Create moving averages for comparison
# Rolling mean smooths out short-term fluctuations
stock_data['ma_7'] = stock_data['price'].rolling(window=7).mean()   # 7-day average
stock_data['ma_30'] = stock_data['price'].rolling(window=30).mean()  # 30-day average

plt.figure(figsize=(14, 6))

# Plot actual price
plt.plot(stock_data['date'], stock_data['price'], 
         label='Actual Price',      # Label for legend
         color='lightblue',
         linewidth=1,
         alpha=0.5)

# Plot 7-day moving average
plt.plot(stock_data['date'], stock_data['ma_7'], 
         label='7-Day MA',
         color='orange',
         linewidth=2)

# Plot 30-day moving average
plt.plot(stock_data['date'], stock_data['ma_30'], 
         label='30-Day MA',
         color='red',
         linewidth=2.5)

plt.xlabel('Date', fontsize=12)
plt.ylabel('Price ($)', fontsize=12)
plt.title('Stock Price with Moving Averages', fontsize=14, fontweight='bold')

# Add legend to identify lines
# loc='best' automatically finds best position
plt.legend(loc='best', fontsize=11, framealpha=0.9)

plt.grid(True, alpha=0.3)
plt.xticks(rotation=45)
plt.tight_layout()
plt.show()

print("✅ Multiple lines plotted with legend!")

# ============================================================================
# PART 4: Highlighting Areas
# ============================================================================
print("\n\n4️⃣ FILL BETWEEN - HIGHLIGHTING AREAS")
print("-" * 70)

plt.figure(figsize=(14, 6))

# Plot the main line
plt.plot(stock_data['date'], stock_data['price'], 
         label='Stock Price',
         color='darkblue',
         linewidth=2)

# Fill area between price and moving average
# Highlights when price is above/below average
plt.fill_between(stock_data['date'], 
                 stock_data['price'], 
                 stock_data['ma_30'],
                 where=(stock_data['price'] > stock_data['ma_30']),  # Condition
                 color='green',
                 alpha=0.2,
                 label='Above 30-Day MA')

plt.fill_between(stock_data['date'], 
                 stock_data['price'], 
                 stock_data['ma_30'],
                 where=(stock_data['price'] <= stock_data['ma_30']),
                 color='red',
                 alpha=0.2,
                 label='Below 30-Day MA')

# Add 30-day MA line
plt.plot(stock_data['date'], stock_data['ma_30'], 
         label='30-Day MA',
         color='black',
         linewidth=1.5,
         linestyle='--')

plt.xlabel('Date', fontsize=12)
plt.ylabel('Price ($)', fontsize=12)
plt.title('Stock Price with Filled Areas', fontsize=14, fontweight='bold')
plt.legend(loc='best', fontsize=10)
plt.grid(True, alpha=0.3)
plt.xticks(rotation=45)
plt.tight_layout()
plt.show()

print("✅ Fill between areas added!")

# ============================================================================
# PART 5: Subplot - Multiple Plots Vertically
# ============================================================================
print("\n\n5️⃣ SUBPLOTS - PRICE AND VOLUME")
print("-" * 70)

# Create figure with 2 subplots stacked vertically
# nrows=2, ncols=1 means 2 rows, 1 column
# sharex=True means they share the same x-axis
fig, (ax1, ax2) = plt.subplots(nrows=2, ncols=1, figsize=(14, 8), sharex=True)

# Subplot 1: Stock Price
ax1.plot(stock_data['date'], stock_data['price'], color='darkblue', linewidth=2)
ax1.set_ylabel('Price ($)', fontsize=12)
ax1.set_title('Stock Price and Trading Volume', fontsize=14, fontweight='bold')
ax1.grid(True, alpha=0.3)

# Subplot 2: Trading Volume (bar chart)
ax2.bar(stock_data['date'], stock_data['volume'], 
        color='steelblue', alpha=0.6, width=1)  # width=1 removes gaps
ax2.set_xlabel('Date', fontsize=12)
ax2.set_ylabel('Volume', fontsize=12)
ax2.grid(True, alpha=0.3)

# Format y-axis to show millions
ax2.yaxis.set_major_formatter(plt.FuncFormatter(lambda x, p: f'{x/1e6:.1f}M'))

plt.xticks(rotation=45)
plt.tight_layout()
plt.show()

print("✅ Subplots created - Price and Volume!")

print("\n✅ Line plots mastered on stock price data!")
print("=" * 70)
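
One detail worth noticing in the moving-average plots: the 7-day and 30-day lines start later than the price line. A rolling window returns NaN until it has enough observations, and Matplotlib skips NaN points when drawing a line. A minimal sketch on a short, made-up series:

```python
import pandas as pd

prices = pd.Series([10.0, 12.0, 11.0, 13.0, 15.0])  # illustrative prices

# A 3-value window needs 3 observations, so the first 2 results are NaN
ma_3 = prices.rolling(window=3).mean()
print(ma_3.tolist())

# min_periods=1 averages whatever is available so far instead of returning NaN
ma_3_partial = prices.rolling(window=3, min_periods=1).mean()
print(ma_3_partial.tolist())
```

Whether to use `min_periods` is a judgment call: it fills the gap at the start, but those early values are averages over fewer points than the window size suggests.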

---

<a id="scatter"></a>
## **4. Scatter Plots - Relationships & Distributions**

**When to Use Scatter Plots:**
- Showing relationship between two numerical variables
- Identifying correlations, clusters, outliers
- Visualizing multi-dimensional data with color/size encoding

**Key Functions:**
- `plt.scatter(x, y)` - Basic scatter plot
- `plt.scatter(x, y, c=colors)` - Color by category
- `plt.scatter(x, y, s=sizes)` - Size by value
- `plt.colorbar()` - Add color scale legend

**Best Practices:**
- Use transparency (`alpha`) when points overlap
- Color by categories to show groups
- Size by importance/magnitude to show third dimension
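
The sketch below combines these practices on random data. The variables `magnitude` and `score` are hypothetical stand-ins for a third and fourth column in a real dataset.

```python
import numpy as np
import matplotlib.pyplot as plt

rng = np.random.default_rng(0)            # seeded so the sketch is repeatable
x = rng.normal(size=50)
y = rng.normal(size=50)
magnitude = rng.uniform(1, 5, size=50)    # hypothetical third variable -> marker size
score = rng.uniform(0, 1, size=50)        # hypothetical fourth variable -> color

fig, ax = plt.subplots()
points = ax.scatter(x, y,
                    s=magnitude * 20,     # s is the marker area in points^2
                    c=score, cmap='viridis',
                    alpha=0.6)            # transparency helps where points overlap
cbar = fig.colorbar(points, ax=ax)        # the colorbar explains the color scale
cbar.set_label('score')
```

Because `s` is an area, doubling it does not double the visible diameter. The scale factor (20 here) is arbitrary and usually tuned by eye.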

In [None]:
print("=" * 70)
print("SCATTER PLOTS - RELATIONSHIPS & PATTERNS")
print("=" * 70)

# ============================================================================
# PART 1: Basic Scatter Plot
# ============================================================================
print("\n1️⃣ BASIC SCATTER PLOT")
print("-" * 70)

plt.figure(figsize=(10, 6))

# Plot sepal length vs sepal width
# x-axis: sepal_length, y-axis: sepal_width
plt.scatter(iris['sepal_length'], iris['sepal_width'])

plt.xlabel('Sepal Length (cm)', fontsize=12)
plt.ylabel('Sepal Width (cm)', fontsize=12)
plt.title('Iris: Sepal Length vs Width', fontsize=14, fontweight='bold')
plt.grid(True, alpha=0.3)
plt.tight_layout()
plt.show()

print("✅ Basic scatter plot created!")
print("Notice: Hard to see patterns without color coding by species")

# ============================================================================
# PART 2: Scatter Plot with Color by Category
# ============================================================================
print("\n\n2️⃣ COLOR BY CATEGORY - REVEALING GROUPS")
print("-" * 70)

plt.figure(figsize=(10, 6))

# Get unique species
species = iris['species'].unique()

# Define colors for each species
colors = ['red', 'green', 'blue']

# Plot each species separately with different colors
for species_name, color in zip(species, colors):
    # Filter data for this species
    species_data = iris[iris['species'] == species_name]
    
    # Plot this species
    plt.scatter(species_data['sepal_length'], 
               species_data['sepal_width'],
               c=color,                    # Color for this species
               label=species_name.title(), # Legend label (capitalize first letter)
               alpha=0.6,                  # Transparency
               s=50,                       # Point size
               edgecolors='black',         # Black outline around points
               linewidth=0.5)              # Outline thickness

plt.xlabel('Sepal Length (cm)', fontsize=12)
plt.ylabel('Sepal Width (cm)', fontsize=12)
plt.title('Iris Species: Sepal Dimensions', fontsize=14, fontweight='bold')
plt.legend(title='Species', fontsize=11)
plt.grid(True, alpha=0.3)
plt.tight_layout()
plt.show()

print("✅ Color coding reveals the groups: setosa separates cleanly, while versicolor and virginica overlap in sepal dimensions")

# ============================================================================
# PART 3: Scatter Plot with Size Encoding
# ============================================================================
print("\n\n3️⃣ SIZE ENCODING - 3RD DIMENSION")
print("-" * 70)

plt.figure(figsize=(10, 6))

# Plot petal length vs petal width
# Size of points represents sepal length
# Larger points = longer sepals
plt.scatter(iris['petal_length'], 
           iris['petal_width'],
           s=iris['sepal_length'] * 30,  # Size proportional to sepal length (scaled by 30)
           c=iris['sepal_width'],         # Color by sepal width
           cmap='viridis',                # Color map (purple = low, yellow = high)
           alpha=0.6,
           edgecolors='black',
           linewidth=0.5)

plt.xlabel('Petal Length (cm)', fontsize=12)
plt.ylabel('Petal Width (cm)', fontsize=12)
plt.title('Iris: Petal Dimensions (size=sepal length, color=sepal width)', 
         fontsize=13, fontweight='bold')

# Add colorbar to show what colors mean
cbar = plt.colorbar()
cbar.set_label('Sepal Width (cm)', fontsize=11)

plt.grid(True, alpha=0.3)
plt.tight_layout()
plt.show()

print("✅ 4 dimensions visualized: x, y, size, and color!")

# ============================================================================
# PART 4: Multiple Scatter Plots - Comparison
# ============================================================================
print("\n\n4️⃣ MULTIPLE SCATTER PLOTS - PAIRWISE COMPARISONS")
print("-" * 70)

# Create 2x2 grid of scatter plots (4 of the 6 possible feature pairs)
fig, axes = plt.subplots(2, 2, figsize=(12, 10))
fig.suptitle('Iris Dataset: Selected Pairwise Comparisons', fontsize=16, fontweight='bold')

# Define color map for species
color_map = {'setosa': 'red', 'versicolor': 'green', 'virginica': 'blue'}

# Subplot 1: Sepal Length vs Sepal Width
ax = axes[0, 0]
for species_name in iris['species'].unique():
    species_data = iris[iris['species'] == species_name]
    ax.scatter(species_data['sepal_length'], species_data['sepal_width'],
              c=color_map[species_name], label=species_name.title(), alpha=0.6)
ax.set_xlabel('Sepal Length', fontsize=10)
ax.set_ylabel('Sepal Width', fontsize=10)
ax.set_title('Sepal: Length vs Width')
ax.legend(fontsize=8)
ax.grid(True, alpha=0.3)

# Subplot 2: Sepal Length vs Petal Length
ax = axes[0, 1]
for species_name in iris['species'].unique():
    species_data = iris[iris['species'] == species_name]
    ax.scatter(species_data['sepal_length'], species_data['petal_length'],
              c=color_map[species_name], label=species_name.title(), alpha=0.6)
ax.set_xlabel('Sepal Length', fontsize=10)
ax.set_ylabel('Petal Length', fontsize=10)
ax.set_title('Sepal Length vs Petal Length')
ax.legend(fontsize=8)
ax.grid(True, alpha=0.3)

# Subplot 3: Petal Length vs Petal Width
ax = axes[1, 0]
for species_name in iris['species'].unique():
    species_data = iris[iris['species'] == species_name]
    ax.scatter(species_data['petal_length'], species_data['petal_width'],
              c=color_map[species_name], label=species_name.title(), alpha=0.6)
ax.set_xlabel('Petal Length', fontsize=10)
ax.set_ylabel('Petal Width', fontsize=10)
ax.set_title('Petal: Length vs Width')
ax.legend(fontsize=8)
ax.grid(True, alpha=0.3)

# Subplot 4: Sepal Width vs Petal Width
ax = axes[1, 1]
for species_name in iris['species'].unique():
    species_data = iris[iris['species'] == species_name]
    ax.scatter(species_data['sepal_width'], species_data['petal_width'],
              c=color_map[species_name], label=species_name.title(), alpha=0.6)
ax.set_xlabel('Sepal Width', fontsize=10)
ax.set_ylabel('Petal Width', fontsize=10)
ax.set_title('Sepal Width vs Petal Width')
ax.legend(fontsize=8)
ax.grid(True, alpha=0.3)

plt.tight_layout()
plt.show()

print("✅ 4 pairwise scatter plots created (petal length and petal width are strongly correlated)")
print("Key insight: Petal dimensions clearly separate the 3 species")

# ============================================================================
# PART 5: Tips Dataset - Bill vs Tip Scatter
# ============================================================================
print("\n\n5️⃣ TIPS DATASET - RELATIONSHIP ANALYSIS")
print("-" * 70)

plt.figure(figsize=(10, 6))

# Scatter plot: total bill vs tip
# Color by time of day
time_colors = {'Lunch': 'orange', 'Dinner': 'purple'}

for time_period in tips['time'].unique():
    time_data = tips[tips['time'] == time_period]
    plt.scatter(time_data['total_bill'], 
               time_data['tip'],
               c=time_colors[time_period],
               label=time_period,
               alpha=0.6,
               s=60,
               edgecolors='black',
               linewidth=0.5)

# Add trendline
# np.polyfit fits a polynomial (degree 1 = straight line)
z = np.polyfit(tips['total_bill'], tips['tip'], 1)  # Get slope and intercept
p = np.poly1d(z)  # Create polynomial function
# Evaluate the line at the two endpoints so the dashed line is drawn once, left to right
x_line = np.array([tips['total_bill'].min(), tips['total_bill'].max()])
plt.plot(x_line, p(x_line),
        "r--", linewidth=2, label=f'Trend: y={z[0]:.2f}x{z[1]:+.2f}')

plt.xlabel('Total Bill ($)', fontsize=12)
plt.ylabel('Tip ($)', fontsize=12)
plt.title('Restaurant Tips: Bill vs Tip Amount', fontsize=14, fontweight='bold')
plt.legend(fontsize=11)
plt.grid(True, alpha=0.3)
plt.tight_layout()
plt.show()

print("✅ Positive correlation: Higher bills tend to come with higher tips")
print(f"Trendline equation: Tip = {z[0]:.3f} × Bill + {z[1]:.3f}")

print("\n✅ Scatter plots mastered on Iris and Tips datasets!")
print("=" * 70)
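
To see what `np.polyfit` and `np.poly1d` return, it helps to fit data where the answer is known in advance. This sketch uses points that lie exactly on y = 2x + 1 (made-up data, not the tips dataset):

```python
import numpy as np

# Points that lie exactly on y = 2x + 1 (illustrative)
x = np.array([0.0, 1.0, 2.0, 3.0, 4.0])
y = 2 * x + 1

# A degree-1 fit returns coefficients highest power first: [slope, intercept]
slope, intercept = np.polyfit(x, y, 1)
trend = np.poly1d([slope, intercept])

# Evaluate on an ordered grid so a plotted line runs once, left to right
x_line = np.linspace(x.min(), x.max(), 50)
y_line = trend(x_line)

# Pearson correlation measures how close the points are to a straight line
r = np.corrcoef(x, y)[0, 1]
print(round(slope, 3), round(intercept, 3), round(r, 3))
```

On real data the fit will not be exact, and the correlation coefficient gives a quick sense of how much scatter there is around the line.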

---

<a id="bar"></a>
## **5. Bar Charts - Categorical Comparisons**

**When to Use Bar Charts:**
- Comparing quantities across categories (sales by region, scores by student)
- Showing counts or totals for different groups
- Horizontal bars for long category names

**Types:**
- **Vertical bars**: `plt.bar(categories, values)`
- **Horizontal bars**: `plt.barh(categories, values)`
- **Grouped bars**: Multiple bars per category
- **Stacked bars**: Bars on top of each other
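
Stacked bars are built with the `bottom=` argument: the second series starts where the first one ends. A minimal sketch with hypothetical categories and values:

```python
import numpy as np
import matplotlib.pyplot as plt

categories = ['A', 'B', 'C']      # hypothetical categories
first = np.array([3, 5, 2])       # hypothetical values for group 1
second = np.array([1, 2, 4])      # hypothetical values for group 2

fig, ax = plt.subplots()
bars_first = ax.bar(categories, first, label='Group 1')
# bottom= lifts each second bar so it sits on top of the first
bars_second = ax.bar(categories, second, bottom=first, label='Group 2')
ax.set_ylabel('Total')
ax.legend()

# The top of each stack equals the sum of both groups
tops = [float(bar.get_y() + bar.get_height()) for bar in bars_second]
print(tops)
```

For a third series you would pass `bottom=first + second`, and so on.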

---

<a id="hist"></a>
## **6. Histograms & Distributions**

**When to Use Histograms:**
- Understanding data distribution (normal, skewed, bimodal)
- Finding data range, center, spread
- Identifying outliers

**Key Concepts:**
- **Bins**: Intervals that divide the data range
- **Frequency**: Count of values in each bin
- **KDE** (Kernel Density Estimate): Smooth curve showing distribution
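
These three ideas can be checked numerically before plotting anything. The sketch below uses a small made-up sample. `gaussian_kde_sketch` and its `bandwidth=0.8` are hypothetical, hand-rolled for illustration; real libraries choose the bandwidth from the data.

```python
import numpy as np

data = np.array([1.0, 1.5, 2.0, 2.2, 2.8, 3.0, 3.1, 4.5, 5.0, 9.0])  # illustrative values

# Bins: 4 equal-width intervals between the min and max of the data
counts, edges = np.histogram(data, bins=4)
print(edges)    # 5 edges define 4 bins
print(counts)   # frequency: how many values fall in each bin

# density=True rescales the heights so the total bar area is 1
density, _ = np.histogram(data, bins=4, density=True)
area = (density * np.diff(edges)).sum()

# Hand-rolled Gaussian KDE: the average of one bell curve centred on each data point.
# The bandwidth is a hand-picked assumption for this sketch.
def gaussian_kde_sketch(points, grid, bandwidth=0.8):
    z = (grid[:, None] - points[None, :]) / bandwidth
    kernels = np.exp(-0.5 * z**2) / (bandwidth * np.sqrt(2 * np.pi))
    return kernels.mean(axis=1)

grid = np.linspace(-5, 15, 2001)
kde = gaussian_kde_sketch(data, grid)
kde_area = (kde * (grid[1] - grid[0])).sum()   # simple Riemann sum
print(round(area, 3), round(kde_area, 3))
```

Both areas come out at about 1, which is why a histogram needs `density=True` before a KDE curve can be overlaid on the same axis.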

In [None]:
print("=" * 70)
print("BAR CHARTS & HISTOGRAMS")
print("=" * 70)

# ============================================================================
# PART 1: Vertical Bar Chart - Tips by Day
# ============================================================================
print("\n1️⃣ VERTICAL BAR CHART")
print("-" * 70)

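# Calculate average tip per day
# 'day' is a categorical column; passing observed=True explicitly avoids a pandas FutureWarning
avg_tip_per_day = tips.groupby('day', observed=True)['tip'].mean().sort_values(ascending=False)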

plt.figure(figsize=(10, 6))

# Create bar chart
# x=categories (days), height=values (average tips)
bars = plt.bar(avg_tip_per_day.index, avg_tip_per_day.values,
              color=['#FF6B6B', '#4ECDC4', '#45B7D1', '#FFA07A'],  # Custom colors
              edgecolor='black',
              linewidth=1.5)

# Add value labels on top of bars
for bar in bars:
    height = bar.get_height()  # Get bar height
    plt.text(bar.get_x() + bar.get_width()/2., height,  # Position: center of bar, at top
            f'${height:.2f}',  # Text: formatted as currency
            ha='center', va='bottom', fontsize=11, fontweight='bold')

plt.xlabel('Day of Week', fontsize=12)
plt.ylabel('Average Tip ($)', fontsize=12)
plt.title('Average Tips by Day of Week', fontsize=14, fontweight='bold')
plt.ylim(0, avg_tip_per_day.max() * 1.15)  # Add 15% space at top for labels
plt.grid(axis='y', alpha=0.3)
plt.tight_layout()
plt.show()

print(f"✅ Best tipping day: {avg_tip_per_day.idxmax()} (${avg_tip_per_day.max():.2f})")

# ============================================================================
# PART 2: Horizontal Bar Chart - Meal Counts
# ============================================================================
print("\n\n2️⃣ HORIZONTAL BAR CHART")
print("-" * 70)

# Count meals per day
meal_counts = tips['day'].value_counts().sort_values()  # Sort for better viz

plt.figure(figsize=(10, 6))

# Horizontal bar chart
plt.barh(meal_counts.index, meal_counts.values,
        color='steelblue',
        edgecolor='navy',
        linewidth=1.5)

# Add value labels at end of bars
for i, (day, count) in enumerate(meal_counts.items()):
    plt.text(count + 1, i, str(count),  # Position slightly right of bar
            va='center', fontsize=11, fontweight='bold')

plt.xlabel('Number of Meals', fontsize=12)
plt.ylabel('Day of Week', fontsize=12)
plt.title('Total Meals Served by Day', fontsize=14, fontweight='bold')
plt.grid(axis='x', alpha=0.3)
plt.tight_layout()
plt.show()

print("✅ Horizontal bars are great for long category names!")

# ============================================================================
# PART 3: Grouped Bar Chart - Tips by Day and Time
# ============================================================================
print("\n\n3️⃣ GROUPED BAR CHART - MULTIPLE CATEGORIES")
print("-" * 70)

# Calculate average tip by day AND time
grouped_data = tips.groupby(['day', 'time'], observed=True)['tip'].mean().unstack()
# unstack() pivots so time becomes columns; day/time pairs with no meals become NaN and draw no bar

plt.figure(figsize=(12, 6))

# Get x positions for bars
x = np.arange(len(grouped_data.index))
width = 0.35  # Width of bars

# Plot bars for each time period side by side
bars1 = plt.bar(x - width/2, grouped_data['Lunch'], width,
               label='Lunch', color='orange', edgecolor='black')
bars2 = plt.bar(x + width/2, grouped_data['Dinner'], width,
               label='Dinner', color='purple', edgecolor='black')

plt.xlabel('Day of Week', fontsize=12)
plt.ylabel('Average Tip ($)', fontsize=12)
plt.title('Average Tips: Lunch vs Dinner by Day', fontsize=14, fontweight='bold')
plt.xticks(x, grouped_data.index)  # Set x-tick labels to days
plt.legend(fontsize=11)
plt.grid(axis='y', alpha=0.3)
plt.tight_layout()
plt.show()

print("✅ Grouped bars compare two categories side by side!")

# ============================================================================
# PART 4: Histogram - Distribution of Tips
# ============================================================================
print("\n\n4️⃣ HISTOGRAM - DATA DISTRIBUTION")
print("-" * 70)

plt.figure(figsize=(12, 5))

# Plot histogram
# bins=20 divides data into 20 intervals
n, bins, patches = plt.hist(tips['tip'], bins=20,
                            color='skyblue',
                            edgecolor='black',
                            linewidth=1.2,
                            alpha=0.7)

# Color bars based on frequency (darker = more common)
# Normalize colors from light to dark
cm = plt.cm.Blues  # Color map
norm = plt.Normalize(vmin=n.min(), vmax=n.max())
for count, patch in zip(n, patches):
    patch.set_facecolor(cm(norm(count)))

plt.xlabel('Tip Amount ($)', fontsize=12)
plt.ylabel('Frequency (Number of Occurrences)', fontsize=12)
plt.title('Distribution of Tip Amounts', fontsize=14, fontweight='bold')

# Add statistical lines
mean_tip = tips['tip'].mean()
median_tip = tips['tip'].median()
plt.axvline(mean_tip, color='red', linestyle='--', linewidth=2, label=f'Mean: ${mean_tip:.2f}')
plt.axvline(median_tip, color='green', linestyle='--', linewidth=2, label=f'Median: ${median_tip:.2f}')

plt.legend(fontsize=11)
plt.grid(axis='y', alpha=0.3)
plt.tight_layout()
plt.show()

print(f"✅ Distribution shows: Mean=${mean_tip:.2f}, Median=${median_tip:.2f}")
print("Most tips are between $2-4")

# ============================================================================
# PART 5: Multiple Histograms - Species Comparison
# ============================================================================
print("\n\n5️⃣ OVERLAPPING HISTOGRAMS - COMPARING DISTRIBUTIONS")
print("-" * 70)

plt.figure(figsize=(12, 6))

# Plot histogram for each species
species = iris['species'].unique()
colors = ['red', 'green', 'blue']

for sp, color in zip(species, colors):
    species_data = iris[iris['species'] == sp]['petal_length']
    plt.hist(species_data, bins=15,
            alpha=0.5,  # Transparency so we can see overlaps
            label=sp.title(),
            color=color,
            edgecolor='black',
            linewidth=1)

plt.xlabel('Petal Length (cm)', fontsize=12)
plt.ylabel('Frequency', fontsize=12)
plt.title('Iris: Petal Length Distribution by Species', fontsize=14, fontweight='bold')
plt.legend(fontsize=11)
plt.grid(axis='y', alpha=0.3)
plt.tight_layout()
plt.show()

print("✅ Overlapping histograms show setosa has shorter petals!")

# ============================================================================
# PART 6: Histogram with KDE - Smooth Distribution
# ============================================================================
print("\n\n6️⃣ HISTOGRAM + KDE CURVE")
print("-" * 70)

fig, axes = plt.subplots(1, 3, figsize=(15, 5))
fig.suptitle('Iris: Feature Distributions with KDE', fontsize=16, fontweight='bold')

features = ['sepal_length', 'sepal_width', 'petal_length']

for ax, feature in zip(axes, features):
    # Plot histogram
    # density=True normalizes the histogram so its scale matches the KDE curve
    ax.hist(iris[feature], bins=20, color='lightblue', edgecolor='black', alpha=0.7, density=True)
    
    # Plot KDE (smooth curve); pandas computes it with SciPy, which must be installed
    iris[feature].plot(kind='kde', ax=ax, color='red', linewidth=2, label='KDE')
    
    ax.set_xlabel(feature.replace('_', ' ').title(), fontsize=11)
    ax.set_ylabel('Density', fontsize=11)
    ax.legend(fontsize=10)
    ax.grid(True, alpha=0.3)

plt.tight_layout()
plt.show()

print("✅ KDE shows smooth probability distribution!")

print("\n✅ Bar charts and histograms mastered on the Tips and Iris datasets!")
print("=" * 70)
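
The value labels above were placed with a manual `plt.text` loop. As an alternative sketch, assuming Matplotlib 3.4 or newer, `Axes.bar_label` does the positioning for you. The categories and values here are hypothetical.

```python
import matplotlib.pyplot as plt

fig, ax = plt.subplots()
bars = ax.bar(['A', 'B', 'C'], [2.5, 3.25, 1.75])   # hypothetical values

# bar_label reads each bar's height and places a formatted label above it
labels = ax.bar_label(bars, fmt='%.2f', padding=3)
ax.margins(y=0.15)   # leave headroom so the labels are not clipped

print([t.get_text() for t in labels])
```

The manual loop is still useful when labels need custom positions, but for the common case `bar_label` is shorter and works for horizontal bars too.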

---

<a id="subplots"></a>
## **7. Advanced Subplots - Multi-Panel Dashboards**

**Why Use Subplots:**
- Compare multiple views side by side
- Create dashboards with different plot types
- Save space by combining related visualizations

**Methods:**
- `plt.subplots(nrows, ncols)` - Grid of plots
- `plt.subplot(rows, cols, index)` - Add plots one by one
- `plt.tight_layout()` - Auto-adjust spacing
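
A short sketch of both methods, using sine curves as placeholder content:

```python
import numpy as np
import matplotlib.pyplot as plt

# plt.subplots returns the Figure plus a NumPy array of Axes shaped (nrows, ncols)
fig, axes = plt.subplots(nrows=2, ncols=3, figsize=(9, 5))
print(axes.shape)

x = np.linspace(0, 2 * np.pi, 100)
# axes.flat walks the grid row by row, which avoids nested loops
for i, ax in enumerate(axes.flat, start=1):
    ax.plot(x, np.sin(i * x))
    ax.set_title(f'sin({i}x)')

fig.tight_layout()   # adjusts spacing so titles and tick labels do not collide

# plt.subplot adds one Axes at a time; the index counts from 1, row by row
fig_single = plt.figure()
ax_last = plt.subplot(2, 3, 6)   # bottom-right cell of a 2x3 grid
```

`plt.subplots` is usually the more convenient choice because every panel is available up front in one array.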

---

<a id="custom"></a>
## **8. Customization - Professional Polish**

**Professional Touches:**
- Color palettes and themes
- Annotations and text
- Custom styles
- Exporting high-quality images
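
As a minimal sketch of three of these touches (a temporary style, an annotation, and a high-resolution export), assuming made-up data with a peak at x=3:

```python
import io
import matplotlib.pyplot as plt

# style.context applies a built-in style only inside the with-block
with plt.style.context('ggplot'):
    fig, ax = plt.subplots(figsize=(6, 4))
    x = [1, 2, 3, 4, 5]
    y = [2, 3, 7, 4, 5]   # illustrative data
    ax.plot(x, y, color='tab:blue', marker='o')

    # Annotation: text plus an arrow pointing at a data coordinate
    note = ax.annotate('Peak', xy=(3, 7), xytext=(4, 6.5),
                       arrowprops=dict(arrowstyle='->'))

# Export: dpi sets the resolution, bbox_inches='tight' trims surrounding whitespace.
# An in-memory buffer is used here; pass a filename such as 'figure.png' to write a file.
buffer = io.BytesIO()
fig.savefig(buffer, format='png', dpi=300, bbox_inches='tight')
png_bytes = buffer.getvalue()
print(len(png_bytes) > 0)
```

A dpi of 300 is a common choice for print. For vector output, `format='pdf'` or `format='svg'` keeps lines sharp at any zoom level.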

In [None]:
print("=" * 70)
print("ADVANCED SUBPLOTS & CUSTOMIZATION")
print("=" * 70)

# ============================================================================
# PART 1: Complex Dashboard - All 3 Datasets
# ============================================================================
print("\n1️⃣ COMPREHENSIVE DASHBOARD - 5 PLOTS")
print("-" * 70)

# Create a 3-row x 2-column grid; the top row holds one wide plot, so there are 5 panels
fig = plt.figure(figsize=(16, 12))
gs = fig.add_gridspec(3, 2, hspace=0.3, wspace=0.3)

# Plot 1: Stock Price Line Chart
ax1 = fig.add_subplot(gs[0, :])  # Top row, span both columns
ax1.plot(stock_data['date'], stock_data['price'], color='darkblue', linewidth=2, label='Price')
ax1.plot(stock_data['date'], stock_data['ma_30'], color='red', linestyle='--', linewidth=2, label='30-Day MA')
ax1.set_title('📈 Stock Price with Moving Average', fontsize=13, fontweight='bold')
ax1.set_xlabel('Date')
ax1.set_ylabel('Price ($)')
ax1.legend()
ax1.grid(True, alpha=0.3)
plt.setp(ax1.xaxis.get_majorticklabels(), rotation=45)

# Plot 2: Iris Scatter
ax2 = fig.add_subplot(gs[1, 0])
for species_name in iris['species'].unique():
    species_data = iris[iris['species'] == species_name]
    ax2.scatter(species_data['petal_length'], species_data['petal_width'],
               label=species_name.title(), alpha=0.6, s=40)
ax2.set_title('🌸 Iris: Petal Dimensions', fontsize=12, fontweight='bold')
ax2.set_xlabel('Petal Length (cm)')
ax2.set_ylabel('Petal Width (cm)')
ax2.legend(fontsize=9)
ax2.grid(True, alpha=0.3)

# Plot 3: Tips Bar Chart
ax3 = fig.add_subplot(gs[1, 1])
avg_tip = tips.groupby('day')['tip'].mean().sort_values(ascending=False)
ax3.bar(avg_tip.index, avg_tip.values, color='coral', edgecolor='black')
ax3.set_title('🍽️ Tips: Average by Day', fontsize=12, fontweight='bold')
ax3.set_xlabel('Day')
ax3.set_ylabel('Avg Tip ($)')
ax3.grid(axis='y', alpha=0.3)

# Plot 4: Stock Volume
ax4 = fig.add_subplot(gs[2, 0])
ax4.bar(stock_data['date'], stock_data['volume'], color='steelblue', alpha=0.6, width=1)
ax4.set_title('📊 Trading Volume', fontsize=12, fontweight='bold')
ax4.set_xlabel('Date')
ax4.set_ylabel('Volume')
ax4.yaxis.set_major_formatter(plt.FuncFormatter(lambda x, p: f'{x/1e6:.0f}M'))
plt.setp(ax4.xaxis.get_majorticklabels(), rotation=45)
ax4.grid(axis='y', alpha=0.3)

# Plot 5: Tips Histogram
ax5 = fig.add_subplot(gs[2, 1])
ax5.hist(tips['tip'], bins=15, color='lightgreen', edgecolor='black')
mean_val = tips['tip'].mean()
ax5.axvline(mean_val, color='red', linestyle='--', linewidth=2, label=f'Mean: ${mean_val:.2f}')
ax5.set_title('💵 Tip Distribution', fontsize=12, fontweight='bold')
ax5.set_xlabel('Tip ($)')
ax5.set_ylabel('Frequency')
ax5.legend(fontsize=9)
ax5.grid(axis='y', alpha=0.3)

# Main title
fig.suptitle('📊 COMPREHENSIVE DATA VISUALIZATION DASHBOARD', 
            fontsize=18, fontweight='bold', y=0.995)

plt.show()

print("✅ Complex dashboard with 5 plots created!")

# ============================================================================
# PART 2: Customization - Annotations
# ============================================================================
print("\n\n2️⃣ ANNOTATIONS - HIGHLIGHTING KEY POINTS")
print("-" * 70)

plt.figure(figsize=(12, 6))

# Plot stock price
plt.plot(stock_data['date'], stock_data['price'], color='darkblue', linewidth=2)

# Find max and min prices
max_idx = stock_data['price'].idxmax()
min_idx = stock_data['price'].idxmin()

max_price = stock_data.loc[max_idx, 'price']
min_price = stock_data.loc[min_idx, 'price']
max_date = stock_data.loc[max_idx, 'date']
min_date = stock_data.loc[min_idx, 'date']

# Annotate maximum
plt.annotate(f'Peak: ${max_price:.2f}',
            xy=(max_date, max_price),      # Point to annotate
            xytext=(max_date, max_price + 10),  # Text position
            fontsize=11,
            fontweight='bold',
            color='green',
            bbox=dict(boxstyle='round,pad=0.5', facecolor='lightgreen', alpha=0.7),
            arrowprops=dict(arrowstyle='->', color='green', lw=2))

# Annotate minimum
plt.annotate(f'Low: ${min_price:.2f}',
            xy=(min_date, min_price),
            xytext=(min_date, min_price - 10),
            fontsize=11,
            fontweight='bold',
            color='red',
            bbox=dict(boxstyle='round,pad=0.5', facecolor='lightcoral', alpha=0.7),
            arrowprops=dict(arrowstyle='->', color='red', lw=2))

plt.xlabel('Date', fontsize=12)
plt.ylabel('Price ($)', fontsize=12)
plt.title('Stock Price with Annotated Extremes', fontsize=14, fontweight='bold')
plt.grid(True, alpha=0.3)
plt.xticks(rotation=45)
plt.tight_layout()
plt.show()

print(f"✅ Peak: ${max_price:.2f} on {max_date.strftime('%Y-%m-%d')}")
print(f"✅ Low: ${min_price:.2f} on {min_date.strftime('%Y-%m-%d')}")

# ============================================================================
# PART 3: Custom Styles & Themes
# ============================================================================
print("\n\n3️⃣ CUSTOM STYLES")
print("-" * 70)

# List available styles
print("Available matplotlib styles:")
print(plt.style.available[:10])  # Show first 10

# Apply different styles
styles = ['seaborn-v0_8-darkgrid', 'ggplot', 'fivethirtyeight']

# Style settings are read when an Axes is created, so each subplot
# must be added inside its own style context for the style to show
fig = plt.figure(figsize=(16, 5))

for position, style in enumerate(styles, start=1):
    with plt.style.context(style):  # Temporarily apply style
        ax = fig.add_subplot(1, 3, position)
        # Create simple plot
        ax.plot(iris['sepal_length'], iris['sepal_width'], 'o', alpha=0.5)
        ax.set_title(f'Style: {style}', fontsize=11, fontweight='bold')
        ax.set_xlabel('Sepal Length')
        ax.set_ylabel('Sepal Width')
        ax.grid(True)

plt.tight_layout()
plt.show()

print("✅ Different styles change the overall look!")

# ============================================================================
# PART 4: Color Palettes
# ============================================================================
print("\n\n4️⃣ COLOR PALETTES")
print("-" * 70)

fig, axes = plt.subplots(2, 2, figsize=(14, 10))

# Palette 1: Default
ax = axes[0, 0]
for species_name in iris['species'].unique():
    species_data = iris[iris['species'] == species_name]
    ax.scatter(species_data['petal_length'], species_data['petal_width'],
              label=species_name.title(), alpha=0.6, s=50)
ax.set_title('Default Colors', fontsize=12, fontweight='bold')
ax.legend()

# Palette 2: Pastel
ax = axes[0, 1]
pastel = ['#FFB3BA', '#BAFFC9', '#BAE1FF']
for i, species_name in enumerate(iris['species'].unique()):
    species_data = iris[iris['species'] == species_name]
    ax.scatter(species_data['petal_length'], species_data['petal_width'],
              color=pastel[i], label=species_name.title(), alpha=0.8, s=50, edgecolors='black')
ax.set_title('Pastel Colors', fontsize=12, fontweight='bold')
ax.legend()

# Palette 3: Dark
ax = axes[1, 0]
dark = ['#8B0000', '#006400', '#00008B']
for i, species_name in enumerate(iris['species'].unique()):
    species_data = iris[iris['species'] == species_name]
    ax.scatter(species_data['petal_length'], species_data['petal_width'],
              color=dark[i], label=species_name.title(), alpha=0.7, s=50)
ax.set_title('Dark Colors', fontsize=12, fontweight='bold')
ax.legend()

# Palette 4: Colormap
# A single scatter call puts all points on one shared color scale;
# a colorbar (rather than a legend) explains the continuous mapping
ax = axes[1, 1]
points = ax.scatter(iris['petal_length'], iris['petal_width'],
                    c=iris['sepal_length'], cmap='viridis', alpha=0.7, s=50)
fig.colorbar(points, ax=ax, label='Sepal Length (cm)')
ax.set_title('Viridis Colormap (Sepal Length)', fontsize=12, fontweight='bold')

fig.suptitle('Color Palette Examples', fontsize=16, fontweight='bold')
plt.tight_layout()
plt.show()

print("✅ Choose colors that match your message!")

# ============================================================================
# PART 5: Exporting High-Quality Plots
# ============================================================================
print("\n\n5️⃣ SAVING PLOTS")
print("-" * 70)

# Create a publication-quality plot
plt.figure(figsize=(10, 6))

for species_name in iris['species'].unique():
    species_data = iris[iris['species'] == species_name]
    plt.scatter(species_data['petal_length'], species_data['petal_width'],
               label=species_name.title(), alpha=0.7, s=60, edgecolors='black', linewidth=0.5)

plt.xlabel('Petal Length (cm)', fontsize=13)
plt.ylabel('Petal Width (cm)', fontsize=13)
plt.title('Iris Species: Petal Dimensions', fontsize=15, fontweight='bold', pad=15)
plt.legend(title='Species', fontsize=11, title_fontsize=12)
plt.grid(True, alpha=0.3)
plt.tight_layout()

# Save in multiple formats (call savefig before plt.show(); afterwards the figure may be empty)
# plt.savefig('iris_plot.png', dpi=300, bbox_inches='tight')  # PNG - web/presentations
# plt.savefig('iris_plot.pdf', bbox_inches='tight')  # PDF - publications
# plt.savefig('iris_plot.svg', bbox_inches='tight')  # SVG - scalable vector

plt.show()

print("✅ Plot ready for export!")
print("Formats: PNG (web), PDF (print), SVG (scalable)")
print("Commands commented out to avoid creating files")

print("\n✅ Advanced subplots and customization mastered!")
print("=" * 70)

---
<a id="projects"></a>
## 🚀 Chapter Completed! Now Practice with Projects

### 🎉 Congratulations! You've Mastered Matplotlib!

### 📝 Recommended Projects:

#### **Project 02: Visualization & EDA** ⭐⭐ Beginner-Intermediate
**Link:** [Open Project 02](../projects/Project_02_Visualization.md)

**Time:** 3-4 hours

---

## 📚 Continue Learning

### ➡️ **Chapter 04: scikit-learn - Machine Learning**
**Link:** [Open Chapter 04](04_ScikitLearn_MachineLearning.ipynb)

---

## 🔗 Navigation

- **Previous**: [Chapter 02: pandas](02_Pandas_DataManipulation.ipynb)
- **Next**: [Chapter 04: scikit-learn](04_ScikitLearn_MachineLearning.ipynb)
- **Home**: [START HERE](../START_HERE.md)
- **Index**: [Main Index](../index.md)