# 📊 Data Visualization Examples

Welcome to the world of data visualization! This notebook will teach you how to create compelling charts and graphs using NumPy, Pandas, Matplotlib, and Seaborn.

## What You'll Learn
- Basic plotting techniques
- Pandas built-in visualization
- Advanced visualization methods
- Best practices for data presentation

Let's create some amazing visualizations! 🎨

In [None]:
# Import required libraries
import numpy as np
import pandas as pd
import matplotlib.pyplot as plt
import seaborn as sns

# Reset Matplotlib to its default style and choose a Seaborn color palette
plt.style.use('default')
sns.set_palette("husl")

print("Libraries loaded successfully!")
print(f"Matplotlib version: {plt.matplotlib.__version__}")
print(f"Seaborn version: {sns.__version__}")

## 📈 Basic Plotting Examples

Let's start with fundamental plot types that form the foundation of data visualization.

In [None]:
print("=== Creating Basic Plots ===")

# Generate sample data
x = np.linspace(0, 10, 100)
y1 = np.sin(x)
y2 = np.cos(x)

print(f"Generated {len(x)} data points for trigonometric functions")

In [None]:
# Create subplots for multiple charts
fig, axes = plt.subplots(2, 2, figsize=(12, 8))
fig.suptitle('Basic Plot Types', fontsize=16, fontweight='bold')

# 1. Line plot - great for time series and continuous data
axes[0, 0].plot(x, y1, label='sin(x)', color='blue', linewidth=2)
axes[0, 0].plot(x, y2, label='cos(x)', color='red', linewidth=2)
axes[0, 0].set_title('Trigonometric Functions')
axes[0, 0].set_xlabel('X values')
axes[0, 0].set_ylabel('Y values')
axes[0, 0].legend()
axes[0, 0].grid(True, alpha=0.3)

# 2. Scatter plot - perfect for showing relationships
np.random.seed(42)
x_scatter = np.random.randn(100)
y_scatter = 2 * x_scatter + np.random.randn(100)
axes[0, 1].scatter(x_scatter, y_scatter, alpha=0.6, s=50)
axes[0, 1].set_title('Scatter Plot (Correlation)')
axes[0, 1].set_xlabel('X values')
axes[0, 1].set_ylabel('Y values')
axes[0, 1].grid(True, alpha=0.3)

# 3. Histogram - shows distribution of data
data = np.random.normal(100, 15, 1000)
axes[1, 0].hist(data, bins=30, alpha=0.7, color='green', edgecolor='black')
axes[1, 0].axvline(np.mean(data), color='red', linestyle='--', 
                   label=f'Mean: {np.mean(data):.1f}')
axes[1, 0].set_title('Normal Distribution')
axes[1, 0].set_xlabel('Values')
axes[1, 0].set_ylabel('Frequency')
axes[1, 0].legend()

# 4. Bar plot - great for categorical data
categories = ['Product A', 'Product B', 'Product C', 'Product D', 'Product E']
values = [23, 45, 56, 78, 32]
bars = axes[1, 1].bar(categories, values, color='orange', alpha=0.8)
axes[1, 1].set_title('Sales by Product')
axes[1, 1].set_xlabel('Products')
axes[1, 1].set_ylabel('Sales (units)')
axes[1, 1].tick_params(axis='x', rotation=45)

# Add value labels on bars
for bar, value in zip(bars, values):
    axes[1, 1].text(bar.get_x() + bar.get_width()/2, bar.get_height() + 1, 
                    str(value), ha='center', va='bottom')

plt.tight_layout()
plt.show()

print("✅ Basic plots created successfully!")
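
The loop above positions each bar label by hand with `text()`. As a possible alternative, Matplotlib 3.4 and later offer `Axes.bar_label`, which handles the placement. The sketch below reuses the product figures from the bar chart; the `Agg` backend line is an assumption added only so the snippet also runs outside a notebook.

```python
import matplotlib
matplotlib.use("Agg")  # headless backend for scripts; omit this line inside Jupyter
import matplotlib.pyplot as plt

categories = ['Product A', 'Product B', 'Product C', 'Product D', 'Product E']
values = [23, 45, 56, 78, 32]

fig, ax = plt.subplots(figsize=(6, 4))
bars = ax.bar(categories, values, color='orange', alpha=0.8)

# bar_label (Matplotlib >= 3.4) creates one annotation per bar in the container
labels = ax.bar_label(bars, padding=2)
ax.set_ylabel('Sales (units)')

# Collect the label strings to confirm what was drawn
label_texts = [t.get_text() for t in labels]
print(label_texts)
plt.close(fig)
```

The `padding` argument is measured in points, so the gap between bar and label does not depend on the data scale.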

## 🐼 Pandas Built-in Visualization

Pandas makes visualization incredibly easy with built-in plotting methods.
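
Before building the full sales dataset, here is a minimal sketch of what "built-in" means: calling `.plot()` on a Series draws with Matplotlib and returns the Axes, so the usual Axes methods still apply. The values are made up for illustration, and the `Agg` backend line is only there so the snippet runs outside a notebook.

```python
import matplotlib
matplotlib.use("Agg")  # headless backend for scripts; omit this line inside Jupyter
import matplotlib.pyplot as plt
import pandas as pd

# Illustrative values, not taken from the notebook's datasets
s = pd.Series([3, 1, 4, 1, 5], index=list('abcde'), name='demo')

# Series.plot returns a Matplotlib Axes that can be customized further
ax = s.plot(kind='bar', title='Series.plot demo')
ax.set_ylabel('Value')

n_bars = len(ax.patches)  # one rectangle per value
print(f"Drew {n_bars} bars on {type(ax).__name__}")
plt.close(ax.figure)
```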

In [None]:
print("=== Pandas Visualization ===")

# Create sample sales data
np.random.seed(42)
dates = pd.date_range('2024-01-01', periods=365)
sales_data = pd.DataFrame({
    'Date': dates,
    'Sales': np.random.randint(1000, 5000, 365) + np.sin(np.arange(365) * 2 * np.pi / 365) * 500,
    'Product': np.random.choice(['Laptop', 'Phone', 'Tablet'], 365),
    'Region': np.random.choice(['North', 'South', 'East', 'West'], 365)
})

# Set date as index for time series operations
sales_data.set_index('Date', inplace=True)

print(f"Created sales dataset with {len(sales_data)} records")
print("Sample data:")
print(sales_data.head())

In [None]:
# Create comprehensive visualizations
fig, axes = plt.subplots(2, 2, figsize=(15, 10))
fig.suptitle('Pandas Visualization Examples', fontsize=16, fontweight='bold')

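# 1. Time series plot - monthly average sales
# 'MS' labels each bin by the first day of the month; the month-end alias 'M'
# is deprecated in pandas 2.2+ (renamed to 'ME')
monthly_sales = sales_data['Sales'].resample('MS').mean()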
monthly_sales.plot(ax=axes[0, 0], title='Monthly Average Sales Trend', 
                   color='blue', linewidth=2, marker='o')
axes[0, 0].set_ylabel('Sales ($)')
axes[0, 0].grid(True, alpha=0.3)

# 2. Box plot by product - shows distribution and outliers
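sales_data.boxplot(column='Sales', by='Product', ax=axes[0, 1])
# boxplot(by=...) overwrites the figure title with "Boxplot grouped by Product"; restore ours
fig.suptitle('Pandas Visualization Examples', fontsize=16, fontweight='bold')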
axes[0, 1].set_title('Sales Distribution by Product')
axes[0, 1].set_xlabel('Product')
axes[0, 1].set_ylabel('Sales ($)')

# 3. Pie chart - sales distribution by region
region_sales = sales_data.groupby('Region')['Sales'].sum()
axes[1, 0].pie(region_sales.values, labels=region_sales.index, autopct='%1.1f%%',
               startangle=90, colors=['#ff9999', '#66b3ff', '#99ff99', '#ffcc99'])
axes[1, 0].set_title('Sales Distribution by Region')

# 4. Correlation heatmap
sales_data['Month'] = sales_data.index.month
sales_data['DayOfYear'] = sales_data.index.dayofyear
correlation_data = sales_data[['Sales', 'Month', 'DayOfYear']].corr()

im = axes[1, 1].imshow(correlation_data, cmap='coolwarm', aspect='auto', vmin=-1, vmax=1)
axes[1, 1].set_xticks(range(len(correlation_data.columns)))
axes[1, 1].set_yticks(range(len(correlation_data.columns)))
axes[1, 1].set_xticklabels(correlation_data.columns)
axes[1, 1].set_yticklabels(correlation_data.columns)
axes[1, 1].set_title('Correlation Matrix')

# Add correlation values to heatmap
for i in range(len(correlation_data.columns)):
    for j in range(len(correlation_data.columns)):
        axes[1, 1].text(j, i, f'{correlation_data.iloc[i, j]:.2f}', 
                       ha='center', va='center', color='white', fontweight='bold')

plt.tight_layout()
plt.show()

print("✅ Pandas visualizations created successfully!")
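
The monthly trend relies on `resample`, which groups a datetime-indexed series into calendar bins before aggregating. The sketch below uses made-up values (not the sales data) and assumes the month-start alias `'MS'`; it checks that the result matches a plain `groupby` on the month number.

```python
import numpy as np
import pandas as pd

# 60 consecutive days: all of January 2024 plus the 29 days of February
idx = pd.date_range('2024-01-01', periods=60)
daily = pd.Series(np.arange(60, dtype=float), index=idx)

# Calendar-aware binning: one row per month, labeled by the month's first day
monthly = daily.resample('MS').mean()

# Equivalent aggregation by month number (only valid within a single year)
by_month = daily.groupby(daily.index.month).mean()

print(monthly)
print(by_month)
```

January holds the values 0 to 30 and February 31 to 59, so the two means are 15 and 45.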

In [None]:
# Additional Pandas plotting examples
fig, axes = plt.subplots(1, 3, figsize=(18, 5))

# 1. Product sales comparison (horizontal bar)
product_totals = sales_data.groupby('Product')['Sales'].sum().sort_values()
product_totals.plot(kind='barh', ax=axes[0], color='skyblue')
axes[0].set_title('Total Sales by Product')
axes[0].set_xlabel('Total Sales ($)')

# 2. Daily sales pattern (line plot with rolling average)
daily_avg = sales_data['Sales'].resample('D').mean()
rolling_avg = daily_avg.rolling(window=7).mean()

axes[1].plot(daily_avg.index[:30], daily_avg.values[:30], alpha=0.3, label='Daily')
axes[1].plot(rolling_avg.index[:30], rolling_avg.values[:30], linewidth=2, label='7-day Average')
axes[1].set_title('Daily Sales Pattern (First 30 Days)')
axes[1].set_ylabel('Sales ($)')
axes[1].legend()
axes[1].tick_params(axis='x', rotation=45)

# 3. Sales distribution histogram
sales_data['Sales'].hist(bins=30, ax=axes[2], alpha=0.7, color='green')
axes[2].axvline(sales_data['Sales'].mean(), color='red', linestyle='--', 
                label=f'Mean: ${sales_data["Sales"].mean():.0f}')
axes[2].axvline(sales_data['Sales'].median(), color='orange', linestyle='--', 
                label=f'Median: ${sales_data["Sales"].median():.0f}')
axes[2].set_title('Sales Distribution')
axes[2].set_xlabel('Sales ($)')
axes[2].set_ylabel('Frequency')
axes[2].legend()

plt.tight_layout()
plt.show()

print("📊 Additional Pandas plots completed!")
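
One detail of the 7-day average above: `rolling(window=7)` returns `NaN` until a full window is available, which is why the smoothed line starts later than the daily line. A small sketch with a 3-value window on made-up numbers shows the effect and the optional `min_periods` argument:

```python
import pandas as pd

s = pd.Series([10.0, 20.0, 30.0, 40.0, 50.0])

# Default: the first window-1 results are NaN
strict = s.rolling(window=3).mean()

# min_periods=1 averages whatever is available so far
relaxed = s.rolling(window=3, min_periods=1).mean()

n_missing = int(strict.isna().sum())
print(strict.tolist())
print(relaxed.tolist())
print(f"Leading NaN values: {n_missing}")
```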

## 🚀 Advanced Visualization Techniques

Let's explore more sophisticated visualization methods for complex data analysis.

In [None]:
print("=== Advanced Visualization ===")

# Create complex dataset for advanced visualizations
np.random.seed(42)
n_samples = 1000

data = pd.DataFrame({
    'x': np.random.randn(n_samples),
    'y': np.random.randn(n_samples),
    'category': np.random.choice(['A', 'B', 'C'], n_samples),
    'size': np.random.randint(20, 200, n_samples),
    'value': np.random.exponential(2, n_samples)
})

print(f"Created complex dataset with {len(data)} samples")
print("Dataset preview:")
print(data.head())

In [None]:
# Create advanced plots
fig, axes = plt.subplots(2, 2, figsize=(15, 12))
fig.suptitle('Advanced Visualization Techniques', fontsize=16, fontweight='bold')

# 1. Bubble chart - shows 4 dimensions of data
colors = ['red', 'blue', 'green']
for i, category in enumerate(data['category'].unique()):
    subset = data[data['category'] == category]
    axes[0, 0].scatter(subset['x'], subset['y'], s=subset['size']/5, 
                      alpha=0.6, label=f'Category {category}', color=colors[i])
axes[0, 0].set_title('Bubble Chart (4D Data)')
axes[0, 0].set_xlabel('X values')
axes[0, 0].set_ylabel('Y values')
axes[0, 0].legend()
axes[0, 0].grid(True, alpha=0.3)

# 2. Density plot (hexbin) - shows data concentration
hb = axes[0, 1].hexbin(data['x'], data['y'], gridsize=20, cmap='Blues', alpha=0.8)
axes[0, 1].set_title('Density Plot (Hexbin)')
axes[0, 1].set_xlabel('X values')
axes[0, 1].set_ylabel('Y values')
plt.colorbar(hb, ax=axes[0, 1], label='Count')

# 3. Violin plot - shows distribution shape
category_data = [data[data['category'] == cat]['value'].values for cat in ['A', 'B', 'C']]
violin_parts = axes[1, 0].violinplot(category_data, positions=[1, 2, 3], 
                                    showmeans=True, showmedians=True)
axes[1, 0].set_xticks([1, 2, 3])
axes[1, 0].set_xticklabels(['Category A', 'Category B', 'Category C'])
axes[1, 0].set_title('Violin Plot (Distribution Shape)')
axes[1, 0].set_ylabel('Values')
axes[1, 0].grid(True, alpha=0.3)

# 4. Stacked area chart - shows composition over time
dates = pd.date_range('2024-01-01', periods=50)
area_data = pd.DataFrame({
    'Product A': np.cumsum(np.random.randn(50) + 2),
    'Product B': np.cumsum(np.random.randn(50) + 1),
    'Product C': np.cumsum(np.random.randn(50) + 1.5)
}, index=dates)

axes[1, 1].stackplot(dates, area_data['Product A'], area_data['Product B'], area_data['Product C'],
                    labels=['Product A', 'Product B', 'Product C'], alpha=0.8)
axes[1, 1].set_title('Stacked Area Chart')
axes[1, 1].set_xlabel('Date')
axes[1, 1].set_ylabel('Cumulative Sales')
axes[1, 1].legend(loc='upper left')
axes[1, 1].tick_params(axis='x', rotation=45)

plt.tight_layout()
plt.show()

print("✅ Advanced visualizations created successfully!")
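
The hexbin panel colors each hexagon by how many points fall inside it. As a rough sketch, those counts can be read back from the returned collection with `get_array()`. The data here is freshly generated for illustration (it is not the `data` frame above), and the `Agg` backend line is only there so the snippet runs outside a notebook.

```python
import matplotlib
matplotlib.use("Agg")  # headless backend for scripts; omit this line inside Jupyter
import matplotlib.pyplot as plt
import numpy as np

rng = np.random.default_rng(42)  # seed chosen for this sketch only
x = rng.standard_normal(1000)
y = rng.standard_normal(1000)

fig, ax = plt.subplots()
hb = ax.hexbin(x, y, gridsize=20, cmap='Blues')
fig.colorbar(hb, ax=ax, label='Count')

# One entry per hexagon; with the default settings this is the point count
counts = hb.get_array()
total = int(counts.sum())
busiest = int(counts.max())
print(f"{total} points binned, densest hexagon holds {busiest}")
plt.close(fig)
```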

## 🎨 Seaborn Statistical Visualizations

Seaborn provides beautiful statistical visualizations with minimal code.

In [None]:
# Create sample dataset for Seaborn examples
np.random.seed(42)
seaborn_data = pd.DataFrame({
    'score': np.random.normal(75, 15, 200),
    'subject': np.random.choice(['Math', 'Science', 'English'], 200),
    'grade': np.random.choice(['A', 'B', 'C'], 200),
    'hours_studied': np.random.randint(1, 10, 200)
})

# Add some correlation
seaborn_data['score'] = seaborn_data['score'] + seaborn_data['hours_studied'] * 2

print("Seaborn dataset created:")
print(seaborn_data.head())

In [None]:
# Seaborn visualizations
fig, axes = plt.subplots(2, 2, figsize=(15, 10))
fig.suptitle('Seaborn Statistical Visualizations', fontsize=16, fontweight='bold')

# 1. Box plot with Seaborn
sns.boxplot(data=seaborn_data, x='subject', y='score', ax=axes[0, 0])
axes[0, 0].set_title('Score Distribution by Subject')

# 2. Scatter plot with regression line
sns.scatterplot(data=seaborn_data, x='hours_studied', y='score', 
                hue='subject', ax=axes[0, 1])
sns.regplot(data=seaborn_data, x='hours_studied', y='score', 
            scatter=False, ax=axes[0, 1], color='red')
axes[0, 1].set_title('Score vs Hours Studied')

# 3. Heatmap of correlations
# Create numeric data for correlation
# Note: 'subject' has no natural order, so its numeric code is for illustration only
numeric_data = seaborn_data.copy()
numeric_data['subject_num'] = numeric_data['subject'].map({'Math': 1, 'Science': 2, 'English': 3})
numeric_data['grade_num'] = numeric_data['grade'].map({'A': 3, 'B': 2, 'C': 1})

correlation_matrix = numeric_data[['score', 'hours_studied', 'subject_num', 'grade_num']].corr()
sns.heatmap(correlation_matrix, annot=True, cmap='coolwarm', center=0, ax=axes[1, 0])
axes[1, 0].set_title('Correlation Heatmap')

# 4. Overlaid histograms per subject (plain Matplotlib on a shared axis)
for subject in seaborn_data['subject'].unique():
    subset = seaborn_data[seaborn_data['subject'] == subject]
    axes[1, 1].hist(subset['score'], alpha=0.7, label=subject, bins=15)
axes[1, 1].set_title('Score Distribution by Subject')
axes[1, 1].set_xlabel('Score')
axes[1, 1].set_ylabel('Frequency')
axes[1, 1].legend()

plt.tight_layout()
plt.show()

print("📊 Seaborn visualizations completed!")
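
The red line in the "Score vs Hours Studied" panel is a fitted linear regression. As a rough cross-check, assuming an ordinary least-squares fit, `np.polyfit` should recover a slope near the 2 points per hour that was added to the scores. The data below is regenerated with its own seed for this sketch, so the exact numbers will differ from the notebook's.

```python
import numpy as np

rng = np.random.default_rng(0)  # seed chosen for this sketch only
hours = rng.integers(1, 10, size=200)             # 1..9 hours, like the notebook data
score = rng.normal(75, 15, size=200) + 2 * hours  # same construction as above

# A degree-1 polyfit returns [slope, intercept] of the least-squares line
slope, intercept = np.polyfit(hours, score, deg=1)
print(f"slope ~ {slope:.2f}, intercept ~ {intercept:.2f}")
```

With noise of standard deviation 15, the estimate will only be approximately 2, which is a useful reminder that a fitted line carries uncertainty.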

## 🎯 Practice Exercises

Try these visualization exercises to reinforce your skills:

In [None]:
# Exercise 1: Create a comprehensive sales dashboard
print("Exercise 1: Sales Dashboard")

# Create sample sales data
np.random.seed(42)
dashboard_data = pd.DataFrame({
    'month': pd.date_range('2024-01-01', periods=12, freq='MS'),  # month starts; alias 'M' is deprecated in pandas 2.2+
    'sales': np.random.randint(10000, 50000, 12),
    'profit': np.random.randint(2000, 10000, 12),
    'customers': np.random.randint(100, 500, 12)
})

# Create dashboard
fig, axes = plt.subplots(2, 2, figsize=(15, 10))
fig.suptitle('Sales Dashboard 2024', fontsize=16, fontweight='bold')

# Sales trend
axes[0, 0].plot(dashboard_data['month'], dashboard_data['sales'], marker='o', linewidth=2)
axes[0, 0].set_title('Monthly Sales Trend')
axes[0, 0].set_ylabel('Sales ($)')
axes[0, 0].tick_params(axis='x', rotation=45)

# Profit vs Sales scatter
axes[0, 1].scatter(dashboard_data['sales'], dashboard_data['profit'], s=100, alpha=0.7)
axes[0, 1].set_title('Profit vs Sales')
axes[0, 1].set_xlabel('Sales ($)')
axes[0, 1].set_ylabel('Profit ($)')

# Customer growth
axes[1, 0].bar(range(len(dashboard_data)), dashboard_data['customers'], alpha=0.8)
axes[1, 0].set_title('Monthly Customers')
axes[1, 0].set_xlabel('Month')
axes[1, 0].set_ylabel('Customers')
axes[1, 0].set_xticks(range(len(dashboard_data)))
axes[1, 0].set_xticklabels([d.strftime('%b') for d in dashboard_data['month']], rotation=45)

# Performance metrics
metrics = ['Total Sales', 'Total Profit', 'Avg Customers']
values = [dashboard_data['sales'].sum(), dashboard_data['profit'].sum(), dashboard_data['customers'].mean()]
colors = ['blue', 'green', 'orange']

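bars = axes[1, 1].bar(metrics, values, color=colors, alpha=0.8)
# The metrics differ by orders of magnitude, so a log axis keeps the smallest bar visible
axes[1, 1].set_yscale('log')
axes[1, 1].set_title('Key Metrics (log scale)')
axes[1, 1].set_ylabel('Value')

# Add value labels (multiplicative offset, since the axis is logarithmic)
for bar, value in zip(bars, values):
    axes[1, 1].text(bar.get_x() + bar.get_width()/2, bar.get_height() * 1.05,
                    f'{value:,.0f}', ha='center', va='bottom')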

plt.tight_layout()
plt.show()

print("✅ Sales dashboard created!")
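
The month labels on the customer chart come from formatting each timestamp with `strftime('%b')`. A small sketch, assuming the month-start frequency `'MS'` and an English (default) locale for the abbreviated month names:

```python
import pandas as pd

# Twelve month-start timestamps covering 2024
months = pd.date_range('2024-01-01', periods=12, freq='MS')

# '%b' gives the locale's abbreviated month name
labels = [d.strftime('%b') for d in months]
print(labels)
```

Because the labels are plain strings, they can be passed straight to `set_xticklabels` as in the dashboard above.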

In [None]:
# Exercise 2: Create a multi-dimensional analysis
print("\nExercise 2: Multi-dimensional Analysis")

# Create complex dataset
np.random.seed(42)
analysis_data = pd.DataFrame({
    'age': np.random.randint(18, 65, 300),
    'income': np.random.randint(30000, 120000, 300),
    'education': np.random.choice(['High School', 'Bachelor', 'Master', 'PhD'], 300),
    'satisfaction': np.random.randint(1, 11, 300)
})

# Add some realistic correlations
analysis_data.loc[analysis_data['education'] == 'PhD', 'income'] += 20000
analysis_data.loc[analysis_data['education'] == 'Master', 'income'] += 10000

# Create comprehensive analysis
fig, axes = plt.subplots(2, 3, figsize=(18, 10))
fig.suptitle('Multi-dimensional Customer Analysis', fontsize=16, fontweight='bold')

# Age distribution
axes[0, 0].hist(analysis_data['age'], bins=20, alpha=0.7, color='skyblue')
axes[0, 0].set_title('Age Distribution')
axes[0, 0].set_xlabel('Age')
axes[0, 0].set_ylabel('Count')

# Income by education
education_order = ['High School', 'Bachelor', 'Master', 'PhD']
income_by_edu = [analysis_data[analysis_data['education'] == edu]['income'].values for edu in education_order]
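axes[0, 1].boxplot(income_by_edu)
# Set tick labels separately: the `labels` keyword is deprecated in Matplotlib 3.9+
axes[0, 1].set_xticklabels(education_order)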
axes[0, 1].set_title('Income by Education Level')
axes[0, 1].set_ylabel('Income ($)')
axes[0, 1].tick_params(axis='x', rotation=45)

# Age vs Income scatter
colors = {'High School': 'red', 'Bachelor': 'blue', 'Master': 'green', 'PhD': 'purple'}
for edu in analysis_data['education'].unique():
    subset = analysis_data[analysis_data['education'] == edu]
    axes[0, 2].scatter(subset['age'], subset['income'], alpha=0.6, 
                      label=edu, color=colors[edu], s=30)
axes[0, 2].set_title('Age vs Income by Education')
axes[0, 2].set_xlabel('Age')
axes[0, 2].set_ylabel('Income ($)')
axes[0, 2].legend()

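# Satisfaction by education
# groupby sorts its keys alphabetically, so reindex to the intended order
# and reuse the per-level colors from the scatter plot
satisfaction_by_edu = (analysis_data.groupby('education')['satisfaction']
                       .mean().reindex(education_order))
axes[1, 0].bar(satisfaction_by_edu.index, satisfaction_by_edu.values,
               color=[colors[edu] for edu in satisfaction_by_edu.index], alpha=0.8)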
axes[1, 0].set_title('Average Satisfaction by Education')
axes[1, 0].set_ylabel('Satisfaction (1-10)')
axes[1, 0].tick_params(axis='x', rotation=45)

# Income distribution
axes[1, 1].hist(analysis_data['income'], bins=25, alpha=0.7, color='lightgreen')
axes[1, 1].axvline(analysis_data['income'].mean(), color='red', linestyle='--', 
                   label=f'Mean: ${analysis_data["income"].mean():,.0f}')
axes[1, 1].set_title('Income Distribution')
axes[1, 1].set_xlabel('Income ($)')
axes[1, 1].set_ylabel('Count')
axes[1, 1].legend()

# Satisfaction distribution
satisfaction_counts = analysis_data['satisfaction'].value_counts().sort_index()
axes[1, 2].bar(satisfaction_counts.index, satisfaction_counts.values, 
               alpha=0.8, color='orange')
axes[1, 2].set_title('Satisfaction Score Distribution')
axes[1, 2].set_xlabel('Satisfaction Score')
axes[1, 2].set_ylabel('Count')

plt.tight_layout()
plt.show()

print("✅ Multi-dimensional analysis completed!")
print(f"Dataset summary: {len(analysis_data)} customers analyzed")
print(f"Average satisfaction: {analysis_data['satisfaction'].mean():.2f}/10")
print(f"Income range: ${analysis_data['income'].min():,} - ${analysis_data['income'].max():,}")
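
One thing to watch in analyses like this: `groupby` returns its groups sorted by key, so education levels come back alphabetically rather than from lowest to highest. A minimal sketch of restoring a meaningful order with `reindex`, using made-up numbers:

```python
import pandas as pd

education_order = ['High School', 'Bachelor', 'Master', 'PhD']

# Illustrative rows, not the notebook's analysis_data
df = pd.DataFrame({
    'education': ['PhD', 'Bachelor', 'High School', 'Master', 'Bachelor'],
    'satisfaction': [8, 6, 5, 7, 4],
})

# Default result: keys sorted alphabetically
alphabetical = df.groupby('education')['satisfaction'].mean()

# Reorder the rows to follow the intended progression
ordered = alphabetical.reindex(education_order)

print(list(alphabetical.index))
print(list(ordered.index))
```

An ordered `pd.Categorical` column would be another way to get the same effect for every later groupby.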

## 🎉 Congratulations!

You've worked through the core of data visualization with Python! You now know how to:

✅ Create basic plots (line, scatter, histogram, bar)  
✅ Use Pandas built-in visualization methods  
✅ Build advanced visualizations (bubble charts, heatmaps, violin plots)  
✅ Create statistical visualizations with Seaborn  
✅ Design comprehensive dashboards  
✅ Analyze multi-dimensional data visually  

## 🚀 Next Steps

1. **Explore Interactive Visualizations**
   - Learn Plotly for interactive charts
   - Try Bokeh for web-based visualizations
   - Experiment with Altair for statistical graphics

2. **Advanced Techniques**
   - 3D plotting with Matplotlib
   - Geographic visualizations with Folium
   - Animation and dynamic plots

3. **Best Practices**
   - Color theory for data visualization
   - Accessibility in chart design
   - Storytelling with data

4. **Real-World Applications**
   - Business intelligence dashboards
   - Scientific data visualization
   - Financial market analysis

Keep creating beautiful and meaningful visualizations! 📊✨