# Data Visualization Cheat Sheet - Overview
## Essential Knowledge for Data Scientists

This notebook summarizes the essentials of data visualization in Python, focusing on concepts that apply across visualization libraries.

## 1. Python Visualization Ecosystem Overview

### Main Libraries and Their Strengths

| Library | Best For | Key Features |
|---------|----------|--------------|
| **Matplotlib** | Publication-quality static plots | - Fine-grained control over every plot element<br>- Wide range of plot types<br>- Foundation for pandas plotting and seaborn |
| **Pandas .plot()** | Quick exploratory data analysis | - Built on matplotlib<br>- Integrated with DataFrames<br>- Minimal code for basic plots |
| **Seaborn** | Statistical visualizations | - Beautiful default styles<br>- Statistical plot types<br>- Handles DataFrames natively |
| **Plotly** | Interactive web visualizations | - Interactive by default<br>- 3D plots<br>- Dashboard integration |

### Quick Decision Guide
```
Need a quick plot from DataFrame? → pandas .plot()
Need statistical visualization? → seaborn
Need publication-quality static plot? → matplotlib
Need interactive/web visualization? → plotly
```
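
As a rough illustration of the first and third rows of the guide, the sketch below draws the same line twice: once with pandas `.plot()` and once with explicit matplotlib calls. The `month`/`sales` DataFrame is made-up data for this example, and the `Agg` backend line is only there so the snippet runs outside a notebook.

```python
import matplotlib
matplotlib.use("Agg")  # headless backend for scripts; omit inside Jupyter
import matplotlib.pyplot as plt
import numpy as np
import pandas as pd

# Hypothetical data, invented for this sketch
df = pd.DataFrame({"month": np.arange(1, 13),
                   "sales": np.linspace(100, 210, 12)})

# pandas: one call is enough for a quick exploratory look
ax_quick = df.plot(x="month", y="sales", title="Quick look")

# matplotlib: more code, but every element is set explicitly
fig, ax = plt.subplots(figsize=(8, 4))
ax.plot(df["month"], df["sales"], marker="o", color="tab:blue", label="sales")
ax.set_xlabel("Month")
ax.set_ylabel("Sales (units)")
ax.set_title("Monthly sales")
ax.legend()
fig.tight_layout()
```

The pandas call returns a regular matplotlib `Axes`, so a quick plot can still be refined later with the same `ax.set_*` methods.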


## 2. Essential Setup and Imports

### Standard Import Convention


```python
# Standard imports for data visualization
import numpy as np
import pandas as pd
import matplotlib.pyplot as plt
import seaborn as sns
import plotly.graph_objects as go
import plotly.express as px

# Jupyter notebook setup
# %matplotlib inline  # For static plots in notebook
# or
# %matplotlib notebook  # For interactive matplotlib plots (classic Notebook only;
#                       # JupyterLab needs `%matplotlib widget`, which requires ipympl)

# Set display options
pd.set_option('display.max_columns', None)  # Show all DataFrame columns
plt.rcParams['figure.figsize'] = (10, 6)  # Default figure size
sns.set_style("whitegrid")  # Seaborn style
```


## 3. Universal Visualization Principles

### The Grammar of Graphics
Every plot consists of:
1. **Data** - What you're visualizing
2. **Aesthetics** - How data maps to visual properties (x, y, color, size)
3. **Geometries** - The type of plot (points, lines, bars)
4. **Facets** - Subplots by categories
5. **Statistics** - Transformations (mean, count, etc.)
6. **Coordinates** - Coordinate system (cartesian, polar)
7. **Themes** - Overall visual styling
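
One way to see these components in practice is a single seaborn call in which each argument corresponds to one layer. This is only a sketch: the `group`/`site`/`value` columns are made-up data, and the numbered comments map back to the list above.

```python
import matplotlib
matplotlib.use("Agg")  # headless backend for scripts; omit inside Jupyter
import numpy as np
import pandas as pd
import seaborn as sns

# Hypothetical data: 60 measurements across 3 groups and 2 sites
rng = np.random.default_rng(0)
df = pd.DataFrame({
    "group": np.tile(["A", "B", "C"], 20),
    "site": np.repeat(["north", "south"], 30),
    "value": rng.normal(loc=10, scale=2, size=60),
})

sns.set_style("whitegrid")        # 7. Theme
g = sns.catplot(
    data=df,                      # 1. Data
    x="group", y="value",         # 2. Aesthetics (column -> axis mapping)
    kind="bar",                   # 3. Geometry
    col="site",                   # 4. Facets (one panel per site)
    estimator=np.mean,            # 5. Statistic (bar height = group mean)
)                                 # 6. Coordinates: Cartesian by default
g.set_axis_labels("Group", "Mean value")
```

Changing a single argument (for example `kind="box"`) swaps one layer while the rest of the specification stays the same.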

### Key Questions Before Plotting
1. **What is your message?** - What insight do you want to communicate?
2. **Who is your audience?** - Technical vs. non-technical
3. **What data type?** - Continuous, categorical, time series
4. **How many variables?** - Univariate, bivariate, multivariate

### Common Pitfalls to Avoid
- 🚫 Using 3D when 2D would be clearer
- 🚫 Too many colors or categories
- 🚫 Missing axis labels or units
- 🚫 Inappropriate plot type for data
- 🚫 Not considering colorblind users
- 🚫 Cluttered or busy plots


## 4. Essential Plot Types Quick Reference

### By Data Relationship

| Relationship | Plot Types | When to Use |
|-------------|------------|-------------|
| **Distribution** | Histogram, KDE, Box plot, Violin plot | Understanding data spread and outliers |
| **Comparison** | Bar chart, Grouped bars, Box plots | Comparing categories or groups |
| **Relationship** | Scatter plot, Line plot, Heatmap | Finding correlations or trends |
| **Composition** | Pie chart, Stacked bar, Treemap | Showing parts of a whole |
| **Time Series** | Line plot, Area chart | Showing change over time |

### By Number of Variables

| Variables | Best Plot Types |
|-----------|----------------|
| **1 Variable** | Histogram, Density plot, Box plot |
| **2 Variables** | Scatter (continuous), Bar (categorical), Line (time) |
| **3+ Variables** | Bubble plot, Parallel coordinates, Facet grids |
| **Many-to-Many** | Heatmap, Correlation matrix |
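
For the single-variable row, a minimal matplotlib sketch might place a histogram and a box plot side by side. The log-normal sample is invented for illustration; any numeric column could be used instead.

```python
import matplotlib
matplotlib.use("Agg")  # headless backend for scripts; omit inside Jupyter
import matplotlib.pyplot as plt
import numpy as np

# Hypothetical right-skewed sample
rng = np.random.default_rng(1)
values = rng.lognormal(mean=0.0, sigma=0.5, size=500)

fig, (ax_hist, ax_box) = plt.subplots(1, 2, figsize=(10, 4))

# Histogram: shows the overall shape and skew
counts, bin_edges, _ = ax_hist.hist(values, bins=30, edgecolor="white")
ax_hist.set_xlabel("Value")
ax_hist.set_ylabel("Count")
ax_hist.set_title("Histogram")

# Box plot: shows median, quartiles, and outliers
ax_box.boxplot(values)
ax_box.set_ylabel("Value")
ax_box.set_title("Box plot")

fig.tight_layout()
```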


## 5. Color Guidelines

### Color Usage Best Practices

1. **Categorical Data**: Use distinct hues
   - Maximum 7-10 categories
   - Consider colorblind-safe palettes

2. **Sequential Data**: Use gradient of single hue
   - Light → Dark for low → high values
   - Examples: Blues, Greens, Reds

3. **Diverging Data**: Two hues with neutral center
   - For data with meaningful center (0, mean)
   - Example: Red ← White → Blue
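
The sequential and diverging cases can be sketched with matplotlib colormaps. The grid of values below is made up, and using `TwoSlopeNorm` to pin the neutral color to zero is one possible approach, not the only one.

```python
import matplotlib
matplotlib.use("Agg")  # headless backend for scripts; omit inside Jupyter
import matplotlib.pyplot as plt
import numpy as np
from matplotlib.colors import TwoSlopeNorm

# Hypothetical grid of changes ranging from -2 to 10 (asymmetric around zero)
change = np.linspace(-2.0, 10.0, 64).reshape(8, 8)

fig, (ax_seq, ax_div) = plt.subplots(1, 2, figsize=(10, 4))

# Sequential: magnitudes only, light -> dark for low -> high
im_seq = ax_seq.imshow(np.abs(change), cmap="Blues")
fig.colorbar(im_seq, ax=ax_seq, label="Magnitude")
ax_seq.set_title("Sequential (Blues)")

# Diverging: keep the neutral midpoint of the colormap at zero
norm = TwoSlopeNorm(vmin=change.min(), vcenter=0.0, vmax=change.max())
im_div = ax_div.imshow(change, cmap="RdBu", norm=norm)
fig.colorbar(im_div, ax=ax_div, label="Change")
ax_div.set_title("Diverging (RdBu, centered at 0)")

fig.tight_layout()
```

Without the explicit norm, the white midpoint would fall at the middle of the data range (4 here) rather than at the meaningful center.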

### Colorblind-Safe Palettes
```python
# Seaborn colorblind palette
sns.color_palette("colorblind")

# Matplotlib colorblind-friendly
plt.cm.viridis  # Sequential
plt.cm.RdBu     # Diverging
```

### Cultural Color Considerations
- Red: Danger/Loss in the West, Prosperity in China
- Green: Go/Profit in the West; meaning varies by culture
- Blue: Generally safe, professional
- Gray: Neutral, good for de-emphasis


## 6. Quick Plotting Decision Tree

```
Start Here
    ↓
Is your data time-based?
    Yes → Line plot, Area chart
    No ↓
    
How many variables?
    1 → Distribution plot (Histogram, Density, Box)
    3+ → Consider: Bubble plot, Pair plot, Facet grid
    2 ↓
    
Are both continuous?
    Yes → Scatter plot
    No ↓
    
Is exactly one categorical?
    Yes → Bar chart (if other is numeric)
         → Count plot (if counting occurrences)
    No ↓
    
Both are categorical
    → Heatmap, Grouped bars
```
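
The tree can also be written as a small helper function. The name `suggest_plots` and its arguments are hypothetical, chosen for this sketch; it returns candidate plot types and does no plotting.

```python
def suggest_plots(var_types, time_based=False):
    """Suggest plot types following the decision tree above.

    var_types: list with one entry per variable, each either
               "continuous" or "categorical" (hypothetical convention).
    time_based: True if the data is indexed by time.
    """
    if not var_types:
        raise ValueError("at least one variable is required")
    if time_based:
        return ["line plot", "area chart"]
    if len(var_types) == 1:
        return ["histogram", "density plot", "box plot"]
    if len(var_types) >= 3:
        return ["bubble plot", "pair plot", "facet grid"]

    # Exactly two variables: branch on how many are categorical
    n_categorical = var_types.count("categorical")
    if n_categorical == 0:
        return ["scatter plot"]
    if n_categorical == 1:
        return ["bar chart", "count plot"]
    return ["heatmap", "grouped bars"]


print(suggest_plots(["continuous", "categorical"]))
```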

## 7. Performance Tips

1. **Large Datasets** (>10k points)
   - Sample or aggregate data
   - Use `alpha` for transparency in scatter plots
   - Consider hexbin or 2D histograms
   - Use datashader for millions of points

2. **Many Subplots**
   - Pre-create figure and axes: `fig, axes = plt.subplots(n, m)`
   - Reuse axes objects
   - Clear memory: `plt.close('all')`

3. **File Export**
   - Vector formats (SVG, PDF) for publication
   - PNG for web; match DPI to the target (screens need far less than the 300 DPI typical for print)
   - `plt.savefig('plot.png', dpi=300, bbox_inches='tight')`
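
A minimal sketch of the large-dataset and export tips, assuming synthetic data: it plots a random sample with `alpha`, bins the full set with hexbin, writes a PNG, and closes the figure. The sample size, `gridsize`, and DPI are illustrative choices, not recommendations.

```python
import io
import matplotlib
matplotlib.use("Agg")  # headless backend for scripts; omit inside Jupyter
import matplotlib.pyplot as plt
import numpy as np

# Hypothetical dataset with 200,000 correlated points
rng = np.random.default_rng(42)
x = rng.normal(size=200_000)
y = 0.5 * x + rng.normal(size=200_000)

# Pre-create the figure and both axes in one call
fig, (ax_sample, ax_hex) = plt.subplots(1, 2, figsize=(12, 5))

# Option 1: plot a random sample, with transparency to reveal density
idx = rng.choice(x.size, size=5_000, replace=False)
ax_sample.scatter(x[idx], y[idx], s=5, alpha=0.3)
ax_sample.set_title("Random sample of 5,000 points")

# Option 2: aggregate all points into hexagonal bins
hb = ax_hex.hexbin(x, y, gridsize=50, cmap="viridis", mincnt=1)
fig.colorbar(hb, ax=ax_hex, label="Points per bin")
ax_hex.set_title("Hexbin of all 200,000 points")

# Export as PNG (to an in-memory buffer here; a file path works the same way)
buffer = io.BytesIO()
fig.savefig(buffer, format="png", dpi=150, bbox_inches="tight")

# Release the figure's memory once it has been saved
plt.close(fig)
```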
