# Module 00: Setup & Introduction to Data Visualization

**Estimated Time**: 45 minutes  
**Difficulty**: Beginner

## Learning Objectives

By the end of this module, you will:
- Understand what data visualization is and why it matters
- Set up your Python environment for visualization
- Know when to use Matplotlib, Seaborn, or Plotly
- Create your first simple plot
- Understand the visualization workflow

---

## Part 1: What is Data Visualization?

### Definition
Data visualization is the graphical representation of information and data. By using visual elements like charts, graphs, and maps, data visualization tools provide an accessible way to see and understand trends, outliers, and patterns in data.

### Why Visualize Data?

1. **Charts are faster to read than tables** - A trend that takes minutes to find in rows of numbers is often visible within seconds in a chart
2. **Reveal patterns and trends** - Patterns invisible in raw data become obvious when visualized
3. **Communicate insights effectively** - A picture is worth a thousand words (or numbers!)
4. **Support decision-making** - Visual data helps stakeholders make informed decisions quickly
5. **Tell compelling stories** - Transform dry statistics into engaging narratives

### Real-World Examples
- **Business**: Sales dashboards, KPI tracking, market analysis
- **Science**: Research findings, experimental results, climate data
- **Healthcare**: Patient monitoring, epidemic tracking, treatment outcomes
- **Finance**: Stock prices, portfolio performance, risk analysis
- **Journalism**: Data-driven stories, infographics, interactive articles

## Part 2: Environment Setup

Let's verify that all required libraries are installed correctly.

In [None]:
# Enable inline plotting for Jupyter
%matplotlib inline

# Import libraries and check versions
import sys

print(f"Python version: {sys.version}")
print("=" * 50)

# Import and check matplotlib
import matplotlib
import matplotlib.pyplot as plt

print(f"✓ Matplotlib version: {matplotlib.__version__}")

# Import and check seaborn
import seaborn as sns

print(f"✓ Seaborn version: {sns.__version__}")

# Import and check plotly
import plotly
import plotly.express as px
import plotly.graph_objects as go

print(f"✓ Plotly version: {plotly.__version__}")

# Import data libraries
import numpy as np
import pandas as pd

print(f"✓ NumPy version: {np.__version__}")
print(f"✓ Pandas version: {pd.__version__}")

print("=" * 50)
print("All libraries imported successfully!")
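If any import above fails with a `ModuleNotFoundError`, the cell stops at the first missing package. Here is a minimal sketch of a more forgiving check using the standard library's `importlib.metadata`. The helper name `check_packages` is hypothetical, and the package list simply mirrors the imports above.

```python
from importlib.metadata import PackageNotFoundError, version


def check_packages(names):
    """Return {package name: installed version, or None if not installed}."""
    found = {}
    for name in names:
        try:
            found[name] = version(name)
        except PackageNotFoundError:
            found[name] = None
    return found


required = ["matplotlib", "seaborn", "plotly", "numpy", "pandas"]
report = check_packages(required)

for name, installed in report.items():
    status = f"version {installed}" if installed else f"MISSING (try: pip install {name})"
    print(f"{name:<12} {status}")
```

Because this reads package metadata instead of importing each library, it reports every missing package in one pass.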

## Part 3: The Three Core Libraries

This course focuses on three widely used Python visualization libraries, each with different strengths:

### 1. Matplotlib - The Foundation
**What it is**: The long-established, foundational plotting library for Python.

**Strengths**:
- Complete control over every element of your plot
- Publication-quality figures
- Works everywhere (Jupyter, scripts, web apps)
- Massive ecosystem - many libraries build on top of it

**Best for**:
- Static plots for papers and reports
- When you need precise control
- Scientific and engineering applications

**Learning curve**: Moderate - powerful but syntax can be verbose

---

### 2. Seaborn - Beautiful Statistical Graphics
**What it is**: High-level interface built on Matplotlib, designed for statistical visualization.

**Strengths**:
- Beautiful default styles
- Excellent for exploring data
- Works seamlessly with pandas DataFrames
- Statistical functions built-in (regression lines, confidence intervals)

**Best for**:
- Quick exploratory data analysis
- Statistical plots (distributions, correlations)
- Making Matplotlib plots prettier with less code

**Learning curve**: Easy - intuitive and concise
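To illustrate the built-in statistical functions mentioned above, here is a small sketch using `sns.regplot`, which fits a regression line and shades a confidence interval around it. The data is synthetic and the names `study_df`, `hours`, and `score` are made up for this example.

```python
import matplotlib.pyplot as plt
import numpy as np
import pandas as pd
import seaborn as sns

# Synthetic data (an assumption for illustration): score rises with hours, plus noise
rng = np.random.default_rng(0)
hours = rng.uniform(0, 10, size=50)
score = 50 + 4 * hours + rng.normal(0, 5, size=50)
study_df = pd.DataFrame({"hours": hours, "score": score})

# One call draws the scatter points, the fitted line, and a 95% confidence band
ax = sns.regplot(data=study_df, x="hours", y="score", ci=95)
ax.set_title("Regression line with a 95% confidence band")
plt.show()
```

Doing the same in plain Matplotlib would mean fitting the line and computing the band yourself.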

---

### 3. Plotly - Interactive Visualizations
**What it is**: Library for creating interactive, web-based visualizations.

**Strengths**:
- Interactive plots (zoom, pan, hover, click)
- Works in Jupyter and web browsers
- Animations and 3D plots
- Can export to HTML files

**Best for**:
- Dashboards and web applications
- Presentations where interactivity adds value
- Exploring complex datasets

**Learning curve**: Moderate - two interfaces (Express is easy, Graph Objects is powerful)

---

### Decision Tree: Which Library to Use?

```
Do you need interactivity?
├─ YES → Use Plotly
└─ NO
   └─ Is it a statistical plot?
      ├─ YES → Use Seaborn
      └─ NO → Use Matplotlib
```
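The same decision tree can be written as a tiny function. This is only a restatement of the diagram above; the name `choose_library` and its parameters are hypothetical.

```python
def choose_library(needs_interactivity: bool, is_statistical: bool) -> str:
    """Mirror the decision tree: interactivity first, then plot type."""
    if needs_interactivity:
        return "Plotly"
    if is_statistical:
        return "Seaborn"
    return "Matplotlib"


print(choose_library(needs_interactivity=False, is_statistical=True))  # Seaborn
```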

**Pro tip**: You can combine them! Seaborn draws on Matplotlib figures and axes, so you can use Seaborn for quick exploration and then fine-tune the same plot with Matplotlib commands for publication figures.

## Part 4: Your First Plots

Let's create the same simple visualization using each library to see the differences.

In [None]:
# Create simple sample data
x = [1, 2, 3, 4, 5]
y = [2, 4, 6, 8, 10]

print("Data to visualize:")
print(f"x: {x}")
print(f"y: {y}")

### Example 1: Matplotlib

In [None]:
# Create a plot with Matplotlib
plt.figure(figsize=(8, 5))
plt.plot(x, y, marker="o", color="blue", linewidth=2)
plt.title("My First Matplotlib Plot", fontsize=14, fontweight="bold")
plt.xlabel("X values", fontsize=12)
plt.ylabel("Y values", fontsize=12)
plt.grid(True, alpha=0.3)
plt.tight_layout()
plt.show()

print("Notice: Matplotlib gives you precise control but requires more code")

### Example 2: Seaborn

In [None]:
# Create a plot with Seaborn
# First, convert data to DataFrame (Seaborn loves DataFrames!)
df = pd.DataFrame({"x": x, "y": y})

plt.figure(figsize=(8, 5))
sns.lineplot(data=df, x="x", y="y", marker="o", linewidth=2)
plt.title("My First Seaborn Plot", fontsize=14, fontweight="bold")
plt.tight_layout()
plt.show()

print("Notice: Seaborn has beautiful defaults and works great with DataFrames")

### Example 3: Plotly

In [None]:
# Create a plot with Plotly Express
fig = px.line(
    df,
    x="x",
    y="y",
    markers=True,
    title="My First Plotly Plot",
    labels={"x": "X values", "y": "Y values"},
)

fig.update_traces(line=dict(width=3))
fig.show()

print("Notice: Plotly is interactive! Hover over points, zoom, pan, and more.")

## Part 5: The Data Visualization Workflow

Creating an effective visualization typically follows five steps:

### 1. Understand Your Data
- What type of data do you have? (numerical, categorical, time series)
- What are you trying to show? (comparison, distribution, relationship, trend)
- What questions are you trying to answer?
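A quick way to answer the questions above is to inspect the DataFrame before plotting anything. The sketch below uses a small made-up `sales` table; the column names and values are assumptions for illustration only.

```python
import pandas as pd

# Made-up example data: one date column, one categorical, one numerical
sales = pd.DataFrame({
    "date": pd.to_datetime(["2024-01-01", "2024-02-01", "2024-03-01", "2024-04-01"]),
    "region": ["North", "South", "North", "South"],
    "revenue": [120.0, 95.5, 130.25, 101.0],
})

print(sales.dtypes)                    # which columns are dates, text, or numbers?
print(sales["revenue"].describe())     # summary statistics for a numerical column
print(sales["region"].value_counts())  # how often each category appears
```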

### 2. Choose the Right Chart Type
- **Comparison**: Bar chart, grouped bar chart
- **Distribution**: Histogram, box plot, violin plot
- **Relationship**: Scatter plot, line plot
- **Trend over time**: Line chart, area chart
- **Part-to-whole**: Pie chart (use sparingly!), stacked bar
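To make a few of these pairings concrete, here is a sketch that draws one chart each for comparison, distribution, and relationship using Matplotlib. All the data is synthetic and chosen only for illustration.

```python
import matplotlib.pyplot as plt
import numpy as np

rng = np.random.default_rng(1)
values = rng.normal(loc=50, scale=10, size=200)  # synthetic measurements
x = rng.uniform(0, 10, size=100)
y = 2 * x + rng.normal(0, 2, size=100)           # y depends on x, plus noise

fig, axes = plt.subplots(1, 3, figsize=(12, 3.5))

axes[0].bar(["A", "B", "C"], [23, 17, 35])
axes[0].set_title("Comparison: bar chart")

axes[1].hist(values, bins=20)
axes[1].set_title("Distribution: histogram")

axes[2].scatter(x, y)
axes[2].set_title("Relationship: scatter plot")

fig.tight_layout()
plt.show()
```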

### 3. Create the Basic Plot
- Start simple
- Get the data on the screen
- Don't worry about aesthetics yet

### 4. Customize and Refine
- Add clear labels and titles
- Choose appropriate colors
- Add annotations if needed
- Remove chart junk (unnecessary elements)
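The refinements listed above could be sketched as follows with Matplotlib's `Axes` methods. The title wording and annotation position are arbitrary choices for this example.

```python
import matplotlib.pyplot as plt

x = [1, 2, 3, 4, 5]
y = [2, 4, 6, 8, 10]

fig, ax = plt.subplots(figsize=(8, 5))
ax.plot(x, y, marker="o", color="tab:blue")

# Clear labels and a title that states the message
ax.set_title("Y doubles as X increases")
ax.set_xlabel("X values")
ax.set_ylabel("Y values")

# An annotation to draw attention to one point
ax.annotate("highest value", xy=(5, 10), xytext=(3, 9.5),
            arrowprops={"arrowstyle": "->"})

# Remove chart junk: the top and right borders carry no information
for side in ("top", "right"):
    ax.spines[side].set_visible(False)

plt.show()
```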

### 5. Validate and Test
- Is the message clear?
- Could it be misleading?
- Is it accessible (colorblind-friendly)?
- Get feedback from others
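For the accessibility check, one option is Matplotlib's built-in `tableau-colorblind10` style. The sketch below also varies marker and line style so that color is not the only cue; the series names and values are made up.

```python
import matplotlib.pyplot as plt

x = [1, 2, 3, 4, 5]

# Apply the colorblind-friendly color cycle only inside this block
with plt.style.context("tableau-colorblind10"):
    fig, ax = plt.subplots(figsize=(8, 5))
    ax.plot(x, [2, 4, 6, 8, 10], marker="o", linestyle="-", label="Series A")
    ax.plot(x, [1, 3, 4, 6, 7], marker="s", linestyle="--", label="Series B")
    ax.set_title("Distinguishable by color, marker, and line style")
    ax.legend()
    plt.show()
```

Using `plt.style.context` as a context manager leaves the rest of the notebook's styling unchanged.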

## Part 6: A Slightly More Interesting Example

Let's visualize some temperature data to see a more realistic use case.

In [None]:
# Create sample temperature data
np.random.seed(42)  # For reproducibility

months = ["Jan", "Feb", "Mar", "Apr", "May", "Jun", "Jul", "Aug", "Sep", "Oct", "Nov", "Dec"]
temperature = [5, 7, 12, 17, 22, 27, 30, 29, 24, 18, 11, 6]

# Add random variation of -2 to +2 °C (randint's upper bound, 3, is exclusive)
temperature_variation = [temp + np.random.randint(-2, 3) for temp in temperature]

temp_df = pd.DataFrame({"Month": months, "Temperature (°C)": temperature_variation})

print(temp_df)
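The cell above seeds NumPy's global random state with `np.random.seed`. As an alternative sketch, NumPy's newer `Generator` API keeps the seed local to one object. It produces different offsets from the cell above, and the names `baseline` and `temp_df_alt` are made up so that `temp_df` is left untouched.

```python
import numpy as np
import pandas as pd

rng = np.random.default_rng(42)  # local generator; global random state is not changed

months = ["Jan", "Feb", "Mar", "Apr", "May", "Jun",
          "Jul", "Aug", "Sep", "Oct", "Nov", "Dec"]
baseline = [5, 7, 12, 17, 22, 27, 30, 29, 24, 18, 11, 6]

# Whole-degree offsets from -2 to +2 (the upper bound, 3, is exclusive)
offsets = rng.integers(-2, 3, size=len(baseline))

temp_df_alt = pd.DataFrame({
    "Month": months,
    "Temperature (°C)": np.array(baseline) + offsets,
})
print(temp_df_alt)
```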

In [None]:
# Visualize with Matplotlib
plt.figure(figsize=(10, 6))
plt.plot(
    temp_df["Month"],
    temp_df["Temperature (°C)"],
    marker="o",
    color="crimson",
    linewidth=2.5,
    markersize=8,
)
plt.title("Average Monthly Temperature", fontsize=16, fontweight="bold")
plt.xlabel("Month", fontsize=12)
plt.ylabel("Temperature (°C)", fontsize=12)
plt.grid(True, alpha=0.3, linestyle="--")
plt.tight_layout()
plt.show()

In [None]:
# Same data with Plotly for interactivity
fig = px.line(
    temp_df,
    x="Month",
    y="Temperature (°C)",
    markers=True,
    title="Average Monthly Temperature (Interactive)",
)

fig.update_traces(line=dict(width=3, color="crimson"), marker=dict(size=10))
fig.update_layout(hovermode="x unified")
fig.show()

print("Try hovering over the line to see exact values!")

## Part 7: Key Takeaways

### What You've Learned
✓ Data visualization transforms numbers into insights  
✓ Python has three excellent visualization libraries, each with strengths  
✓ Matplotlib: Complete control, publication-quality  
✓ Seaborn: Beautiful defaults, great for statistics  
✓ Plotly: Interactive, web-ready visualizations  
✓ Your environment is set up correctly  
✓ You've created your first plots!  

### What's Next
In **Module 01**, you'll dive deep into Matplotlib to learn:
- Figure and axes architecture
- Line plots, scatter plots, bar charts, and histograms
- Saving figures to files
- Understanding plot anatomy

---

## Exercise: Experiment!

Before moving to the next module, try these challenges:

1. **Modify the temperature plot**:
   - Change the line color to blue
   - Add a different marker style (try 's' for squares or '^' for triangles)
   - Change the title to your city's name

2. **Create your own data**:
   - Make a list of your favorite 5 foods and ratings (1-10)
   - Create a bar chart to visualize your ratings

3. **Explore interactivity**:
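   - In the Plotly temperature plot, use the toolbar that appears at the top right to switch between zoom and pan (a single-series chart has no legend to click)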
   - Try zooming in on a specific part of the chart
   - Hover over different points

Use the code cells below to experiment!

In [None]:
# Your experimentation space - Challenge 1

In [None]:
# Your experimentation space - Challenge 2

In [None]:
# Your experimentation space - Challenge 3

## Additional Resources

- [Matplotlib Gallery](https://matplotlib.org/stable/gallery/index.html) - Examples of every plot type
- [Seaborn Gallery](https://seaborn.pydata.org/examples/index.html) - Beautiful statistical plots
- [Plotly Gallery](https://plotly.com/python/) - Interactive visualization examples
- [From Data to Viz](https://www.data-to-viz.com/) - Choosing the right chart type

---

**Congratulations!** You've completed Module 00. You're now ready to dive deep into Matplotlib in Module 01.