# Module 01: Matplotlib Fundamentals

**Estimated Time**: 90 minutes  
**Difficulty**: Beginner

## Learning Objectives

By the end of this module, you will:
- Understand Matplotlib's figure and axes architecture
- Create line plots and scatter plots
- Build bar charts and histograms
- Add titles, labels, and legends
- Save figures to files

---

In [None]:
# Import required libraries
%matplotlib inline

import matplotlib.pyplot as plt
import numpy as np
import pandas as pd

# Set random seed for reproducibility
np.random.seed(42)

print("Libraries imported successfully!")

## Part 1: Understanding Figure and Axes

Matplotlib has a hierarchical structure:

```
Figure (the whole window)
    └─ Axes (the plot area - can have multiple)
        ├─ X-axis
        ├─ Y-axis
        ├─ Title
        ├─ Legend
        └─ Plot elements (lines, points, bars, etc.)
```

### Key Concepts
- **Figure**: The entire image or window
- **Axes**: A single plot area inside the figure (a Matplotlib term, not just the plural of "axis")
- **Axis**: The X or Y line with ticks and labels
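
To make the hierarchy concrete, the short sketch below inspects the objects that `plt.subplots()` returns. It relies only on attributes that Matplotlib figures and axes expose (`fig.axes`, `ax.figure`, `ax.xaxis`, `ax.yaxis`), and nothing is drawn.

```python
import matplotlib.pyplot as plt

fig, ax = plt.subplots()

# The Figure keeps a list of the Axes it contains
print(fig.axes == [ax])  # True

# Each Axes knows which Figure it belongs to
print(ax.figure is fig)  # True

# Each Axes owns two Axis objects (the X and Y axis)
print(type(ax.xaxis).__name__, type(ax.yaxis).__name__)  # XAxis YAxis

plt.close(fig)  # nothing to display, so release the figure
```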

### Two Ways to Create Plots

1. **pyplot interface** (simple, quick): `plt.plot()`
2. **Object-oriented interface** (more control): `fig, ax = plt.subplots()`

We'll use both, but focus on the OO interface because it scales better to figures with multiple plots.

In [None]:
# Example: Simple plot with pyplot
plt.plot([1, 2, 3, 4], [1, 4, 9, 16])
plt.title("Simple Plot (pyplot style)")
plt.show()

print("This works, but we'll use the object-oriented approach for better control.")

In [None]:
# Example: Same plot with OO interface (RECOMMENDED)
fig, ax = plt.subplots(figsize=(8, 5))
ax.plot([1, 2, 3, 4], [1, 4, 9, 16])
ax.set_title("Simple Plot (OO style)", fontsize=14)
ax.set_xlabel("X values")
ax.set_ylabel("Y values")
plt.tight_layout()
plt.show()

print("This gives us more control and is better for complex figures!")
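
The two interfaces are not separate systems: pyplot functions act on a "current" figure and axes that Matplotlib tracks for you. A minimal sketch, assuming a fresh figure created with `plt.subplots()`:

```python
import matplotlib.pyplot as plt

fig, ax = plt.subplots()

# After plt.subplots(), the new figure and axes become the "current" ones
print(plt.gcf() is fig)  # True
print(plt.gca() is ax)   # True

# A pyplot call is forwarded to the current axes
plt.title("Set through pyplot")
print(ax.get_title())  # Set through pyplot

plt.close(fig)
```

This is why pyplot-style code becomes harder to follow with several axes: every call depends on which axes happens to be current.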

## Part 2: Line Plots

Line plots show trends and continuous data. Perfect for time series, functions, and relationships.

### When to Use Line Plots
- Time series data (stock prices, temperature over time)
- Continuous functions (y = f(x))
- Showing trends and changes
- Comparing multiple series

In [None]:
# Example 1: Basic line plot
x = np.linspace(0, 10, 100)  # 100 points from 0 to 10
y = np.sin(x)

fig, ax = plt.subplots(figsize=(10, 6))
ax.plot(x, y)
ax.set_title("Sine Wave", fontsize=16, fontweight="bold")
ax.set_xlabel("X (radians)", fontsize=12)
ax.set_ylabel("sin(x)", fontsize=12)
ax.grid(True, alpha=0.3)
plt.tight_layout()
plt.show()

In [None]:
# Example 2: Multiple lines on same plot
x = np.linspace(0, 10, 100)
y1 = np.sin(x)
y2 = np.cos(x)
y3 = np.sin(x) * np.cos(x)

fig, ax = plt.subplots(figsize=(10, 6))
ax.plot(x, y1, label="sin(x)", linewidth=2)
ax.plot(x, y2, label="cos(x)", linewidth=2)
ax.plot(x, y3, label="sin(x) * cos(x)", linewidth=2, linestyle="--")

ax.set_title("Trigonometric Functions", fontsize=16, fontweight="bold")
ax.set_xlabel("X (radians)", fontsize=12)
ax.set_ylabel("Y value", fontsize=12)
ax.legend(loc="upper right", fontsize=10)
ax.grid(True, alpha=0.3)
plt.tight_layout()
plt.show()

print("Notice how the legend helps identify each line!")

In [None]:
# Example 3: Real-world data - Stock price simulation
days = np.arange(1, 31)
stock_price = 100 + np.cumsum(np.random.randn(30) * 2)  # Random walk

fig, ax = plt.subplots(figsize=(12, 6))
ax.plot(days, stock_price, marker="o", linewidth=2, markersize=6, color="darkgreen")
ax.set_title("Stock Price Over 30 Days", fontsize=16, fontweight="bold")
ax.set_xlabel("Day", fontsize=12)
ax.set_ylabel("Price ($)", fontsize=12)
ax.grid(True, alpha=0.3, linestyle=":")
ax.axhline(y=100, color="red", linestyle="--", label="Starting Price", alpha=0.7)
ax.legend()
plt.tight_layout()
plt.show()

print(f"Final price: ${stock_price[-1]:.2f} (Started at $100.00)")
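
The random walk above relies on `np.cumsum`, which turns a sequence of daily changes into running totals. A tiny worked example with made-up step values (the `steps` array is purely illustrative):

```python
import numpy as np

# Hypothetical daily price changes
steps = np.array([1.0, -2.0, 3.0, -1.0])

# Running total of the changes, shifted to start from a base price of 100
prices = 100 + np.cumsum(steps)
print(prices.tolist())  # [101.0, 99.0, 102.0, 101.0]
```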

## Part 3: Scatter Plots

Scatter plots show individual data points and reveal relationships between two variables.

### When to Use Scatter Plots
- Exploring relationships between two variables
- Identifying correlations and patterns
- Detecting outliers
- Showing distributions in 2D space

In [None]:
# Example 1: Basic scatter plot
x = np.random.randn(50)
y = 2 * x + np.random.randn(50) * 0.5  # Linear relationship with noise

fig, ax = plt.subplots(figsize=(8, 6))
ax.scatter(x, y, s=50, alpha=0.6)
ax.set_title("Scatter Plot: Positive Correlation", fontsize=16, fontweight="bold")
ax.set_xlabel("Variable X", fontsize=12)
ax.set_ylabel("Variable Y", fontsize=12)
ax.grid(True, alpha=0.3)
plt.tight_layout()
plt.show()

print("Notice the positive linear relationship between X and Y")
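
The visual impression can be backed by a number. As a rough sketch, `np.corrcoef` returns the Pearson correlation matrix, and its off-diagonal entry is the correlation between the two variables. The data is regenerated here with its own seed, so the value will not match the plot above exactly.

```python
import numpy as np

np.random.seed(42)
x = np.random.randn(50)
y = 2 * x + np.random.randn(50) * 0.5  # Linear relationship with noise

# corrcoef returns a 2x2 matrix; entry [0, 1] is the correlation of x with y
r = np.corrcoef(x, y)[0, 1]
print(f"Pearson r = {r:.3f}")
```

A value close to +1 indicates a strong positive linear relationship, close to -1 a strong negative one, and close to 0 little linear relationship.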

In [None]:
# Example 2: Scatter with colors and sizes
n_points = 100
x = np.random.rand(n_points) * 100
y = np.random.rand(n_points) * 100
colors = np.random.rand(n_points)  # Random values in [0, 1), mapped to colors by cmap
sizes = np.random.rand(n_points) * 500  # Random marker areas (s is in points squared)

fig, ax = plt.subplots(figsize=(10, 8))
scatter = ax.scatter(x, y, c=colors, s=sizes, alpha=0.5, cmap="viridis")
ax.set_title("Scatter Plot with Varying Colors and Sizes", fontsize=16, fontweight="bold")
ax.set_xlabel("X values", fontsize=12)
ax.set_ylabel("Y values", fontsize=12)
plt.colorbar(scatter, ax=ax, label="Color value")
plt.tight_layout()
plt.show()

print("Color and size can represent additional dimensions of data!")

In [None]:
# Example 3: Real-world - Height vs Weight
heights = np.random.normal(170, 10, 100)  # Mean 170cm, std 10cm
weights = heights * 0.7 - 50 + np.random.normal(0, 5, 100)  # Correlated with height (mean about 69 kg)

# Separate by gender
gender = np.random.choice(["Male", "Female"], 100)
male_mask = gender == "Male"
female_mask = gender == "Female"

fig, ax = plt.subplots(figsize=(10, 7))
ax.scatter(heights[male_mask], weights[male_mask], label="Male", alpha=0.6, s=80, color="blue")
ax.scatter(heights[female_mask], weights[female_mask], label="Female", alpha=0.6, s=80, color="red")

ax.set_title("Height vs Weight by Gender", fontsize=16, fontweight="bold")
ax.set_xlabel("Height (cm)", fontsize=12)
ax.set_ylabel("Weight (kg)", fontsize=12)
ax.legend(loc="upper left", fontsize=11)
ax.grid(True, alpha=0.3)
plt.tight_layout()
plt.show()
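
The grouping above works through boolean masks: comparing a NumPy array to a value yields an array of `True`/`False`, and indexing with that array keeps only the `True` positions. A small sketch with made-up values:

```python
import numpy as np

# Illustrative data, not taken from the plot above
gender = np.array(["Male", "Female", "Female", "Male"])
heights = np.array([180.0, 165.0, 170.0, 175.0])

male_mask = gender == "Male"
print(male_mask.tolist())           # [True, False, False, True]
print(heights[male_mask].tolist())  # [180.0, 175.0]
```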

## Part 4: Bar Charts

Bar charts compare quantities across categories.

### When to Use Bar Charts
- Comparing quantities across categories
- Showing rankings
- Discrete data (not continuous)
- Survey results and categorical data

In [None]:
# Example 1: Basic bar chart
languages = ["Python", "JavaScript", "Java", "C++", "Go"]
popularity = [85, 70, 65, 55, 45]

fig, ax = plt.subplots(figsize=(10, 6))
ax.bar(languages, popularity, color="steelblue", alpha=0.8)
ax.set_title("Programming Language Popularity", fontsize=16, fontweight="bold")
ax.set_xlabel("Programming Language", fontsize=12)
ax.set_ylabel("Popularity Score", fontsize=12)
ax.grid(axis="y", alpha=0.3)
plt.tight_layout()
plt.show()

In [None]:
# Example 2: Horizontal bar chart
fig, ax = plt.subplots(figsize=(10, 6))
ax.barh(languages, popularity, color="coral", alpha=0.8)
ax.set_title("Programming Language Popularity (Horizontal)", fontsize=16, fontweight="bold")
ax.set_xlabel("Popularity Score", fontsize=12)
ax.set_ylabel("Programming Language", fontsize=12)
ax.grid(axis="x", alpha=0.3)
plt.tight_layout()
plt.show()

print("Horizontal bars are great for long category names!")
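
For rankings, bar order matters. `barh` draws the first category at the bottom, so data listed from highest to lowest ends up with the top entry at the bottom of the chart. One possible approach, sketched here with the same data, is to sort in ascending order before plotting so the largest bar lands on top:

```python
import matplotlib.pyplot as plt
import numpy as np

languages = ["Python", "JavaScript", "Java", "C++", "Go"]
popularity = [85, 70, 65, 55, 45]

# Indices that sort the scores from lowest to highest
order = np.argsort(popularity)
sorted_langs = [languages[i] for i in order]
sorted_scores = [popularity[i] for i in order]

fig, ax = plt.subplots(figsize=(10, 6))
ax.barh(sorted_langs, sorted_scores, color="coral", alpha=0.8)
ax.set_title("Programming Language Popularity (Ranked)", fontsize=16, fontweight="bold")
ax.set_xlabel("Popularity Score", fontsize=12)
ax.set_ylabel("Programming Language", fontsize=12)
ax.grid(axis="x", alpha=0.3)
plt.tight_layout()
plt.show()
```

Calling `ax.invert_yaxis()` on the unsorted chart is an alternative that flips the axis instead of reordering the data.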

In [None]:
# Example 3: Grouped bar chart
categories = ["Q1", "Q2", "Q3", "Q4"]
product_a = [12, 19, 15, 25]
product_b = [10, 15, 20, 22]
product_c = [8, 12, 18, 20]

x = np.arange(len(categories))
width = 0.25

fig, ax = plt.subplots(figsize=(10, 6))
ax.bar(x - width, product_a, width, label="Product A", color="skyblue")
ax.bar(x, product_b, width, label="Product B", color="lightcoral")
ax.bar(x + width, product_c, width, label="Product C", color="lightgreen")

ax.set_title("Quarterly Sales by Product", fontsize=16, fontweight="bold")
ax.set_xlabel("Quarter", fontsize=12)
ax.set_ylabel("Sales (Thousands)", fontsize=12)
ax.set_xticks(x)
ax.set_xticklabels(categories)
ax.legend()
ax.grid(axis="y", alpha=0.3)
plt.tight_layout()
plt.show()
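
The `x - width`, `x`, `x + width` positions above are a special case for three series. A sketch of one way to generalize the offsets to any number of series, centered on each tick (the `sales` dictionary is just the same data regrouped for looping):

```python
import matplotlib.pyplot as plt
import numpy as np

categories = ["Q1", "Q2", "Q3", "Q4"]
sales = {
    "Product A": [12, 19, 15, 25],
    "Product B": [10, 15, 20, 22],
    "Product C": [8, 12, 18, 20],
}

x = np.arange(len(categories))
width = 0.25
n_series = len(sales)

# Offsets centered on zero: series i sits at (i - (n - 1) / 2) * width
offsets = (np.arange(n_series) - (n_series - 1) / 2) * width
print(offsets.tolist())  # [-0.25, 0.0, 0.25]

fig, ax = plt.subplots(figsize=(10, 6))
for offset, (name, values) in zip(offsets, sales.items()):
    ax.bar(x + offset, values, width, label=name)

ax.set_xticks(x)
ax.set_xticklabels(categories)
ax.legend()
plt.tight_layout()
plt.show()
```

Keeping `n_series * width` below 1 leaves a gap between neighboring groups.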

## Part 5: Histograms

Histograms show the distribution of a single numerical variable.

### When to Use Histograms
- Understanding data distribution
- Checking for normality
- Identifying outliers
- Seeing data spread and central tendency

In [None]:
# Example 1: Basic histogram
data = np.random.normal(100, 15, 1000)  # Normal distribution

fig, ax = plt.subplots(figsize=(10, 6))
ax.hist(data, bins=30, color="purple", alpha=0.7, edgecolor="black")
ax.set_title("Distribution of Test Scores", fontsize=16, fontweight="bold")
ax.set_xlabel("Score", fontsize=12)
ax.set_ylabel("Frequency", fontsize=12)
ax.axvline(data.mean(), color="red", linestyle="--", linewidth=2, label=f"Mean: {data.mean():.1f}")
ax.legend()
ax.grid(axis="y", alpha=0.3)
plt.tight_layout()
plt.show()

print(f"Mean: {data.mean():.2f}")
print(f"Std Dev: {data.std():.2f}")

In [None]:
# Example 2: Comparing distributions
data1 = np.random.normal(100, 15, 1000)
data2 = np.random.normal(110, 12, 1000)

fig, ax = plt.subplots(figsize=(10, 6))
ax.hist(data1, bins=30, alpha=0.5, label="Group A", color="blue", edgecolor="black")
ax.hist(data2, bins=30, alpha=0.5, label="Group B", color="red", edgecolor="black")
ax.set_title("Comparing Two Distributions", fontsize=16, fontweight="bold")
ax.set_xlabel("Value", fontsize=12)
ax.set_ylabel("Frequency", fontsize=12)
ax.legend()
ax.grid(axis="y", alpha=0.3)
plt.tight_layout()
plt.show()

print("Overlapping histograms help compare distributions")
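
One caveat: with `bins=30` passed to each call, every histogram computes its own bin edges from its own data range, so the bars of the two groups have different widths and positions. A sketch of one way to make the comparison like-for-like is to build a single set of edges and pass it to both calls (data regenerated here with its own seed):

```python
import matplotlib.pyplot as plt
import numpy as np

np.random.seed(42)
data1 = np.random.normal(100, 15, 1000)
data2 = np.random.normal(110, 12, 1000)

# 31 edges spanning both samples define 30 shared bins
combined = np.concatenate([data1, data2])
bin_edges = np.linspace(combined.min(), combined.max(), 31)

fig, ax = plt.subplots(figsize=(10, 6))
counts_a, edges_a, _ = ax.hist(data1, bins=bin_edges, alpha=0.5, label="Group A",
                               color="blue", edgecolor="black")
counts_b, edges_b, _ = ax.hist(data2, bins=bin_edges, alpha=0.5, label="Group B",
                               color="red", edgecolor="black")
ax.set_title("Comparing Two Distributions (Shared Bins)", fontsize=16, fontweight="bold")
ax.set_xlabel("Value", fontsize=12)
ax.set_ylabel("Frequency", fontsize=12)
ax.legend()
ax.grid(axis="y", alpha=0.3)
plt.tight_layout()
plt.show()
```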

In [None]:
# Example 3: Different bin sizes show different patterns
data = np.random.exponential(2, 1000)

fig, axes = plt.subplots(1, 3, figsize=(15, 5))

for ax, bins in zip(axes, [10, 30, 50]):
    ax.hist(data, bins=bins, color="green", alpha=0.7, edgecolor="black")
    ax.set_title(f"{bins} Bins", fontsize=14, fontweight="bold")
    ax.set_xlabel("Value")
    ax.set_ylabel("Frequency")
    ax.grid(axis="y", alpha=0.3)

fig.suptitle("Effect of Bin Size on Histogram", fontsize=16, fontweight="bold")
plt.tight_layout()
plt.show()

print("Bin size matters! Too few bins hide detail, too many add noise.")
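
If you would rather not pick a bin count by hand, NumPy offers named rules that estimate one from the data, and `ax.hist` accepts the same rule names as its `bins` argument. A small sketch comparing a few of them on similar exponential data (the counts depend on the sample, so none are quoted here):

```python
import numpy as np

np.random.seed(42)
data = np.random.exponential(2, 1000)

# Number of bins suggested by each rule for this sample
bin_counts = {}
for rule in ["sturges", "fd", "auto"]:
    edges = np.histogram_bin_edges(data, bins=rule)
    bin_counts[rule] = len(edges) - 1
    print(f"{rule}: {bin_counts[rule]} bins")
```

These rules are starting points rather than final answers. It is still worth trying a few values, as in the example above.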

## Part 6: Plot Anatomy - Understanding All Elements

In [None]:
# Creating a complete, annotated plot
x = np.linspace(0, 10, 50)
y1 = np.sin(x)
y2 = np.cos(x)

fig, ax = plt.subplots(figsize=(12, 7))

# Plot lines
ax.plot(x, y1, label="sin(x)", linewidth=2.5, color="blue")
ax.plot(x, y2, label="cos(x)", linewidth=2.5, color="red", linestyle="--")

# Add titles and labels
ax.set_title("Complete Plot Anatomy", fontsize=18, fontweight="bold", pad=20)
ax.set_xlabel("X values (radians)", fontsize=14)
ax.set_ylabel("Y values", fontsize=14)

# Add grid
ax.grid(True, alpha=0.3, linestyle=":")

# Add horizontal and vertical lines
ax.axhline(y=0, color="black", linewidth=0.8)
ax.axvline(x=np.pi, color="gray", linewidth=0.8, linestyle=":", label="π")

# Add legend (after all labeled elements exist, so the π line is included)
ax.legend(loc="upper right", fontsize=12, framealpha=0.9)

# Add annotation
ax.annotate(
    "Maximum",
    xy=(np.pi / 2, 1),
    xytext=(np.pi / 2 + 1, 0.8),
    arrowprops=dict(arrowstyle="->", color="black", lw=1.5),
    fontsize=12,
    fontweight="bold",
)

# Set axis limits
ax.set_xlim(0, 10)
ax.set_ylim(-1.5, 1.5)

plt.tight_layout()
plt.show()

print("This plot demonstrates all key elements of a complete visualization!")
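
The annotation above points at a hard-coded location, `(π/2, 1)`. With real data you usually do not know the peak in advance. A sketch of one way to derive the annotation point from the data with `np.argmax` (the label text and offsets are arbitrary choices):

```python
import matplotlib.pyplot as plt
import numpy as np

x = np.linspace(0, 10, 50)
y = np.sin(x)

# Index of the largest sampled value, and its coordinates
i_max = np.argmax(y)
x_max, y_max = x[i_max], y[i_max]

fig, ax = plt.subplots(figsize=(10, 6))
ax.plot(x, y, label="sin(x)", linewidth=2)
ax.annotate(
    f"Maximum ({x_max:.2f}, {y_max:.2f})",
    xy=(x_max, y_max),                  # point being annotated
    xytext=(x_max + 1, y_max + 0.3),    # where the label text is placed
    arrowprops=dict(arrowstyle="->", color="black", lw=1.5),
)
ax.set_ylim(-1.5, 1.5)
ax.legend(loc="lower left")
plt.tight_layout()
plt.show()
```

Because the curve is sampled at only 50 points, the reported maximum is the highest sample, which sits near but not exactly at the true peak.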

## Part 7: Saving Figures to Files

Once you've created a great plot, you'll want to save it!

In [None]:
# Create a sample plot
fig, ax = plt.subplots(figsize=(10, 6))
x = np.linspace(0, 10, 100)
ax.plot(x, np.sin(x), linewidth=2)
ax.set_title("Sample Plot for Saving", fontsize=16, fontweight="bold")
ax.set_xlabel("X values")
ax.set_ylabel("sin(x)")
ax.grid(True, alpha=0.3)

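# Save in different formats (savefig does not create missing folders, so create it first)
from pathlib import Path

output_dir = Path("../notebooks/outputs")
output_dir.mkdir(parents=True, exist_ok=True)
fig.savefig(output_dir / "sample_plot.png", dpi=300, bbox_inches="tight")
fig.savefig(output_dir / "sample_plot.pdf", bbox_inches="tight")
fig.savefig(output_dir / "sample_plot.svg", bbox_inches="tight")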

plt.show()

print("Figure saved as:")
print("  - PNG (raster, 300 DPI - good for presentations)")
print("  - PDF (vector - perfect for papers)")
print("  - SVG (vector - great for web)")
print("\nFiles saved to: notebooks/outputs/")

### File Format Guide

| Format | Type   | Use Case                          | Pros                    | Cons                                    |
|--------|--------|-----------------------------------|-------------------------|-----------------------------------------|
| PNG    | Raster | Presentations, web, general use   | Universal support       | Pixelates when enlarged                 |
| PDF    | Vector | Academic papers, print            | Scalable, professional  | Files grow large for very dense plots   |
| SVG    | Vector | Web, graphic design               | Scalable, editable      | Limited support                         |
| JPG    | Raster | Photos (not recommended for plots)| Small files             | Lossy compression blurs lines and text  |

**Recommendation**: Use PNG (300 DPI) for most cases, PDF for publications.
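
For raster formats, the pixel dimensions of the saved image are the figure size in inches multiplied by the DPI. The sketch below checks this by saving to an in-memory buffer and reading the width and height from the PNG header. The 4 x 3 inch figure is an arbitrary choice, and `bbox_inches="tight"` is left out on purpose because it crops the figure and changes the final size.

```python
import io
import struct

import matplotlib.pyplot as plt

fig, ax = plt.subplots(figsize=(4, 3))  # size in inches
ax.plot([0, 1], [0, 1])

pixel_sizes = {}
for dpi in (100, 300):
    buffer = io.BytesIO()
    fig.savefig(buffer, format="png", dpi=dpi)
    # A PNG file stores width and height as big-endian integers at bytes 16-24
    width, height = struct.unpack(">II", buffer.getvalue()[16:24])
    pixel_sizes[dpi] = (width, height)
    print(f"dpi={dpi}: {width} x {height} pixels")

plt.close(fig)
```

At 100 DPI the image is 400 x 300 pixels, and at 300 DPI it is 1200 x 900. Vector formats such as PDF and SVG have no pixel grid, so DPI matters far less for them.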

## Part 8: Key Takeaways

### What You've Learned
✓ Figure and axes architecture in Matplotlib  
✓ Object-oriented approach: `fig, ax = plt.subplots()`  
✓ **Line plots**: Trends and continuous data  
✓ **Scatter plots**: Relationships and correlations  
✓ **Bar charts**: Categorical comparisons  
✓ **Histograms**: Data distributions  
✓ Adding titles, labels, legends, and grids  
✓ Saving figures in multiple formats  

### Best Practices
1. Always add **titles** and **axis labels**
2. Use **legends** when plotting multiple series
3. Add **grids** for easier reading (but keep them subtle)
4. Choose **appropriate plot types** for your data
5. Save at **high DPI** (300) for presentations and papers

### What's Next
In **Module 02**, you'll learn to customize plots:
- Colors, markers, and line styles
- Creating subplots
- Annotations and text
- Style sheets and themes
- Advanced layouts

---

## Exercises

Practice what you've learned!

### Exercise 1: Temperature Data
Create a line plot showing temperature over a week:
- Days: Monday through Sunday
- Temperatures: [22, 24, 23, 25, 27, 26, 24] (°C)
- Add a horizontal line showing the average temperature
- Include title, labels, and legend

In [None]:
# Your code here

### Exercise 2: Exam Scores
Create a scatter plot showing study hours vs exam scores:
- Study hours: Generate 30 random values between 0 and 10
- Scores: Create a positive relationship (e.g., score = 50 + 5*hours + noise)
- Color points by score (use a colormap)
- Add appropriate labels and title

In [None]:
# Your code here

### Exercise 3: Survey Results
Create a horizontal bar chart showing favorite programming languages:
- Languages: ['Python', 'Java', 'C++', 'JavaScript', 'Ruby']
- Votes: [45, 30, 20, 35, 15]
- Color bars by vote count (darker = more votes)
- Add labels and title

In [None]:
# Your code here

### Exercise 4: Data Distribution
Create a histogram of randomly generated height data:
- Generate 500 heights with mean 170cm and std 10cm
- Use 25 bins
- Add a vertical line showing the mean
- Include title and labels
- Calculate and print mean and standard deviation

In [None]:
# Your code here

### Challenge: Combine Multiple Plot Types
Create a figure with 4 subplots (2x2) showing:
1. Line plot of a simple function
2. Scatter plot of random data
3. Bar chart of categorical data
4. Histogram of a distribution

All in one figure!

In [None]:
# Your code here

---

**Congratulations!** You've mastered Matplotlib fundamentals. You can now create the four most common plot types and understand the anatomy of a complete visualization.

**Next**: Module 02 - Customizing Plots