# 📈 Notebook 04: Data Visualization with Matplotlib & Seaborn

**Week 1-2: Python & ML Foundations**  
**Gen AI Masters Program**

---

## 📋 Objectives

By the end of this notebook, you will master:
1. ✅ Matplotlib basics for plotting
2. ✅ Seaborn for statistical visualizations
3. ✅ Customizing plots for professional reports
4. ✅ Creating dashboards and subplots
5. ✅ Visualizing manufacturing/quality control data
6. ✅ Best practices for data visualization

**Estimated Time:** 2-3 hours

---

## 📚 Why Visualization?

"A picture is worth a thousand words" - especially in data science!

- 🎯 **Understand patterns** quickly
- 📊 **Communicate insights** effectively
- 🔍 **Detect anomalies** visually
- 📈 **Track trends** over time

Let's create beautiful, insightful visualizations! 🎨

In [None]:
# Import libraries
import numpy as np
import pandas as pd
import matplotlib.pyplot as plt
import seaborn as sns

# Set style for better-looking plots
sns.set_style("whitegrid")
plt.rcParams['figure.figsize'] = (10, 6)
plt.rcParams['font.size'] = 11

# Create sample manufacturing data
np.random.seed(42)
n_samples = 100

manufacturing_data = pd.DataFrame({
    'batch_id': range(1, n_samples + 1),
    'temperature': np.random.normal(75, 5, n_samples),
    'pressure': np.random.normal(120, 10, n_samples),
    'humidity': np.random.normal(50, 8, n_samples),
    'defect_count': np.random.poisson(2, n_samples),
    'production_time': np.random.normal(45, 10, n_samples),
    'shift': np.random.choice(['Morning', 'Evening', 'Night'], n_samples)
})

manufacturing_data['quality_score'] = (
    100 - (manufacturing_data['defect_count'] * 5) + 
    np.random.normal(0, 2, n_samples)
).clip(0, 100)

print("✅ Sample data generated successfully!")
print(manufacturing_data.head())

## 1️⃣ Matplotlib Basics

### Line Plots

In [None]:
# Simple line plot
plt.figure(figsize=(12, 5))

plt.subplot(1, 2, 1)
plt.plot(manufacturing_data['batch_id'][:30], 
         manufacturing_data['temperature'][:30],
         marker='o', linestyle='-', color='blue', label='Temperature')
plt.xlabel('Batch ID')
plt.ylabel('Temperature (°C)')
plt.title('Temperature vs Batch ID')
plt.legend()
plt.grid(True, alpha=0.3)

plt.subplot(1, 2, 2)
plt.plot(manufacturing_data['batch_id'][:30], 
         manufacturing_data['quality_score'][:30],
         marker='s', linestyle='--', color='green', label='Quality Score')
plt.xlabel('Batch ID')
plt.ylabel('Quality Score')
plt.title('Quality Score vs Batch ID')
plt.legend()
plt.grid(True, alpha=0.3)

plt.tight_layout()
plt.show()

### Scatter Plots

In [None]:
# Scatter plot with color mapping
plt.figure(figsize=(10, 6))

scatter = plt.scatter(
    manufacturing_data['temperature'],
    manufacturing_data['quality_score'],
    c=manufacturing_data['defect_count'],
    s=100,
    cmap='RdYlGn_r',
    alpha=0.6,
    edgecolors='black',
    linewidth=0.5
)

plt.colorbar(scatter, label='Defect Count')
plt.xlabel('Temperature (°C)')
plt.ylabel('Quality Score')
plt.title('Temperature vs Quality Score\n(Color = Defect Count)')
plt.grid(True, alpha=0.3)
plt.show()

### Bar Charts and Histograms

In [None]:
# Bar chart: Average quality by shift
plt.figure(figsize=(12, 5))

plt.subplot(1, 2, 1)
shift_quality = manufacturing_data.groupby('shift')['quality_score'].mean().sort_values(ascending=False)
colors = ['#2ecc71', '#3498db', '#e74c3c']
bars = plt.bar(shift_quality.index, shift_quality.values, color=colors, edgecolor='black')
plt.xlabel('Shift')
plt.ylabel('Average Quality Score')
plt.title('Quality Score by Shift')
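# Zoomed y-axis to make small differences visible; a truncated baseline
# exaggerates the gaps between bars, so call it out when sharing the chart
plt.ylim(85, 95)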

# Add value labels on bars
for bar in bars:
    height = bar.get_height()
    plt.text(bar.get_x() + bar.get_width()/2., height,
             f'{height:.1f}',
             ha='center', va='bottom', fontweight='bold')

# Histogram: Distribution of defect counts
plt.subplot(1, 2, 2)
plt.hist(manufacturing_data['defect_count'], bins=10, color='coral', 
         edgecolor='black', alpha=0.7)
plt.xlabel('Defect Count')
plt.ylabel('Frequency')
plt.title('Distribution of Defect Counts')
plt.axvline(manufacturing_data['defect_count'].mean(), 
            color='red', linestyle='--', linewidth=2, label=f'Mean: {manufacturing_data["defect_count"].mean():.2f}')
plt.legend()

plt.tight_layout()
plt.show()
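
Because `defect_count` holds whole numbers, equal-width bins that don't line up with integers can leave empty gaps or merge neighboring values. A minimal sketch of one way around this, assuming bin edges placed at half-integers; the `counts` array is made up for illustration and is not the notebook's data:

```python
import numpy as np

# Illustrative integer counts (not the notebook's data)
counts = np.array([0, 1, 1, 2, 2, 2, 3, 5])

# Edges at -0.5, 0.5, 1.5, ... so that each bin is centered on one integer
edges = np.arange(counts.min() - 0.5, counts.max() + 1.5, 1.0)

# np.histogram uses the same binning rules as plt.hist
hist, _ = np.histogram(counts, bins=edges)

print("Bin edges:", edges)
print("Counts per integer value:", hist)
```

The same `edges` array can be passed as `bins=edges` to `plt.hist` to get one bar per defect count.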

## 2️⃣ Seaborn Statistical Plots

### Box Plots and Violin Plots

In [None]:
# Box plot and Violin plot comparison
plt.figure(figsize=(14, 6))

plt.subplot(1, 2, 1)
sns.boxplot(data=manufacturing_data, x='shift', y='quality_score', 
            palette='Set2', hue='shift', legend=False)
plt.title('Quality Score Distribution by Shift\n(Box Plot)', fontsize=12, fontweight='bold')
plt.ylabel('Quality Score')
plt.xlabel('Shift')

plt.subplot(1, 2, 2)
sns.violinplot(data=manufacturing_data, x='shift', y='quality_score',
               palette='Set3', hue='shift', legend=False)
plt.title('Quality Score Distribution by Shift\n(Violin Plot)', fontsize=12, fontweight='bold')
plt.ylabel('Quality Score')
plt.xlabel('Shift')

plt.tight_layout()
plt.show()
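
To read a box plot it helps to know what the box and whiskers stand for. The sketch below computes the quartiles, the interquartile range (IQR), and the 1.5 × IQR fences that Matplotlib and seaborn use by default to decide which points are drawn as outliers. The helper name `box_stats` and the sample values are assumptions for illustration, and the quartile values rely on NumPy's default linear interpolation:

```python
import numpy as np

def box_stats(values):
    """Return the summary statistics that a default box plot displays."""
    values = np.asarray(values, dtype=float)
    q1, median, q3 = np.percentile(values, [25, 50, 75])
    iqr = q3 - q1
    # Points beyond these fences are drawn individually as outliers
    lower_fence = q1 - 1.5 * iqr
    upper_fence = q3 + 1.5 * iqr
    outliers = values[(values < lower_fence) | (values > upper_fence)]
    return {
        "q1": q1,
        "median": median,
        "q3": q3,
        "iqr": iqr,
        "lower_fence": lower_fence,
        "upper_fence": upper_fence,
        "outliers": outliers.tolist(),
    }

# Illustrative values with one obvious outlier
stats = box_stats([1, 2, 3, 4, 5, 6, 7, 8, 9, 50])
for name, value in stats.items():
    print(f"{name}: {value}")
```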

### Correlation Heatmap

In [None]:
# Correlation heatmap
plt.figure(figsize=(10, 8))

# Select numeric columns
numeric_cols = ['temperature', 'pressure', 'humidity', 'defect_count', 
                'production_time', 'quality_score']
correlation_matrix = manufacturing_data[numeric_cols].corr()

sns.heatmap(correlation_matrix, annot=True, fmt='.2f', 
            cmap='coolwarm', center=0, 
            square=True, linewidths=1, cbar_kws={"shrink": 0.8})
plt.title('Correlation Heatmap - Manufacturing Parameters', 
          fontsize=14, fontweight='bold', pad=20)
plt.tight_layout()
plt.show()

# Insights
print("🔍 Key Insights:")
print(f"Strong negative correlation between defect_count and quality_score: {correlation_matrix.loc['defect_count', 'quality_score']:.2f}")
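
Rather than reading the strongest relationships off the heatmap by eye, they can be pulled out of the correlation matrix directly. This is a minimal sketch, assuming a hypothetical helper `top_correlations` and a small hand-made `demo` frame (not the notebook's data); it keeps only the upper triangle so each pair of columns appears once:

```python
import numpy as np
import pandas as pd

def top_correlations(df: pd.DataFrame, n: int = 3) -> pd.Series:
    """Return the n column pairs with the largest absolute correlation."""
    corr = df.corr()
    cols = corr.columns
    # Indices of the strictly upper triangle: each pair once, no diagonal
    rows, columns = np.triu_indices(len(cols), k=1)
    pairs = pd.Series(
        corr.to_numpy()[rows, columns],
        index=pd.MultiIndex.from_arrays([cols[rows], cols[columns]]),
    )
    # Sort by strength regardless of sign, strongest first
    return pairs.sort_values(key=lambda s: s.abs(), ascending=False).head(n)

# Illustrative data: quality is an exact linear function of defects
demo = pd.DataFrame({
    "defect_count": [0, 1, 2, 3, 4, 5],
    "quality_score": [100, 95, 90, 85, 80, 75],
    "temperature": [75, 71, 78, 74, 77, 73],
})

top = top_correlations(demo, n=1)
print(top)
```

On the notebook's data, the call would be `top_correlations(manufacturing_data[numeric_cols])`.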

### Pair Plots

In [None]:
# Pair plot for key variables
# Fix random_state so the sampled subset (and the plot) is reproducible
subset_data = manufacturing_data[['temperature', 'pressure', 'defect_count', 
                                   'quality_score', 'shift']].sample(50, random_state=42)

pairplot = sns.pairplot(subset_data, hue='shift', palette='husl', 
                        diag_kind='kde', plot_kws={'alpha': 0.6, 's': 80, 'edgecolor': 'k'},
                        height=2.5)
# Use .figure; the older .fig attribute is deprecated in recent seaborn releases
pairplot.figure.suptitle('Pair Plot - Manufacturing Parameters by Shift', 
                          y=1.02, fontsize=16, fontweight='bold')
plt.show()

## 3️⃣ Advanced Visualizations

### Multi-Panel Dashboard

In [None]:
# Create a comprehensive dashboard
fig = plt.figure(figsize=(16, 10))
fig.suptitle('Manufacturing Quality Control Dashboard', fontsize=18, fontweight='bold', y=0.995)

# 1. Temperature time series
ax1 = plt.subplot(2, 3, 1)
ax1.plot(manufacturing_data['batch_id'], manufacturing_data['temperature'], 
         color='#e74c3c', linewidth=1, alpha=0.7)
ax1.axhline(manufacturing_data['temperature'].mean(), color='blue', 
            linestyle='--', label=f'Mean: {manufacturing_data["temperature"].mean():.1f}°C')
ax1.set_title('Temperature Monitoring', fontweight='bold')
ax1.set_xlabel('Batch ID')
ax1.set_ylabel('Temperature (°C)')
ax1.legend()
ax1.grid(True, alpha=0.3)

# 2. Quality distribution
ax2 = plt.subplot(2, 3, 2)
ax2.hist(manufacturing_data['quality_score'], bins=20, color='#2ecc71', 
         edgecolor='black', alpha=0.7)
ax2.axvline(90, color='red', linestyle='--', linewidth=2, label='Target: 90')
ax2.set_title('Quality Score Distribution', fontweight='bold')
ax2.set_xlabel('Quality Score')
ax2.set_ylabel('Frequency')
ax2.legend()

# 3. Defect count by shift
ax3 = plt.subplot(2, 3, 3)
shift_defects = manufacturing_data.groupby('shift')['defect_count'].sum().sort_values()
colors_shift = ['#2ecc71', '#f39c12', '#e74c3c']
ax3.barh(shift_defects.index, shift_defects.values, color=colors_shift, edgecolor='black')
ax3.set_title('Total Defects by Shift', fontweight='bold')
ax3.set_xlabel('Total Defect Count')
ax3.set_ylabel('Shift')

# 4. Scatter: Temperature vs Quality
ax4 = plt.subplot(2, 3, 4)
scatter = ax4.scatter(manufacturing_data['temperature'], 
                      manufacturing_data['quality_score'],
                      c=manufacturing_data['defect_count'], 
                      s=50, cmap='RdYlGn_r', alpha=0.6, edgecolors='black')
plt.colorbar(scatter, ax=ax4, label='Defects')
ax4.set_title('Temperature vs Quality', fontweight='bold')
ax4.set_xlabel('Temperature (°C)')
ax4.set_ylabel('Quality Score')
ax4.grid(True, alpha=0.3)

# 5. Box plot: Quality by shift
ax5 = plt.subplot(2, 3, 5)
# Draw with seaborn directly onto ax5. pandas' boxplot(by=...) replaces the
# figure-level suptitle, which would wipe out the dashboard title set above.
sns.boxplot(data=manufacturing_data, x='shift', y='quality_score', ax=ax5)
ax5.set_title('Quality by Shift', fontweight='bold')
ax5.set_xlabel('Shift')
ax5.set_ylabel('Quality Score')

# 6. Production time distribution
ax6 = plt.subplot(2, 3, 6)
ax6.hist(manufacturing_data['production_time'], bins=15, color='#3498db', 
         edgecolor='black', alpha=0.7)
ax6.axvline(manufacturing_data['production_time'].mean(), color='red', 
            linestyle='--', linewidth=2, 
            label=f'Mean: {manufacturing_data["production_time"].mean():.1f} min')
ax6.set_title('Production Time Distribution', fontweight='bold')
ax6.set_xlabel('Production Time (min)')
ax6.set_ylabel('Frequency')
ax6.legend()

plt.tight_layout()
plt.show()

### Time Series with Trends

In [None]:
# Create time-based data
# Lowercase 'h' is the hourly alias; uppercase 'H' is deprecated since pandas 2.2
dates = pd.date_range('2025-01-01', periods=100, freq='h')
time_series_data = pd.DataFrame({
    'timestamp': dates,
    'sensor_value': np.cumsum(np.random.randn(100)) + 100,
    'threshold': 100
})

# Add rolling mean
time_series_data['rolling_mean'] = time_series_data['sensor_value'].rolling(window=10).mean()

plt.figure(figsize=(14, 6))
plt.plot(time_series_data['timestamp'], time_series_data['sensor_value'], 
         label='Sensor Reading', alpha=0.5, color='blue')
plt.plot(time_series_data['timestamp'], time_series_data['rolling_mean'], 
         label='10-Hour Moving Average', color='red', linewidth=2)
plt.axhline(y=100, color='green', linestyle='--', linewidth=2, label='Threshold')

# Fill area above/below threshold
plt.fill_between(time_series_data['timestamp'], 
                 time_series_data['sensor_value'], 100,
                 where=(time_series_data['sensor_value'] > 100),
                 alpha=0.2, color='red', label='Above Threshold')
plt.fill_between(time_series_data['timestamp'], 
                 time_series_data['sensor_value'], 100,
                 where=(time_series_data['sensor_value'] <= 100),
                 alpha=0.2, color='green', label='Below Threshold')

plt.xlabel('Time')
plt.ylabel('Sensor Value')
plt.title('Sensor Monitoring with Trend Analysis', fontsize=14, fontweight='bold')
plt.legend(loc='best')
plt.grid(True, alpha=0.3)
plt.tight_layout()
plt.show()
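
The moving-average line above starts later than the raw signal because `rolling(window=10)` returns `NaN` until a full window of 10 readings is available. A minimal sketch of that behavior and of the `min_periods` option, using a made-up series rather than the notebook's sensor data:

```python
import pandas as pd

# Illustrative series: 1.0, 2.0, ..., 20.0
readings = pd.Series(range(1, 21), dtype=float)

# Default: the first window - 1 = 9 results are NaN
strict = readings.rolling(window=10).mean()

# min_periods=1 averages whatever is available so far, so there are no NaNs
relaxed = readings.rolling(window=10, min_periods=1).mean()

print("NaNs with default settings:", strict.isna().sum())
print("NaNs with min_periods=1:  ", relaxed.isna().sum())
print("First complete window mean:", strict.iloc[9])
```

With `min_periods=1` the early values are averaged over fewer points and are therefore noisier, so whether to use it depends on how the chart will be read.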

## 4️⃣ Customization and Styling

In [None]:
# Professional styled plot
fig, ax = plt.subplots(figsize=(12, 7))

# Data for plot
shifts = ['Morning', 'Evening', 'Night']
metrics = {
    'Production': [850, 820, 780],
    'Quality Score': [92, 89, 85],
    'Efficiency': [88, 85, 80]
}

x = np.arange(len(shifts))
width = 0.25

colors = ['#3498db', '#2ecc71', '#f39c12']

for i, (attribute, measurement) in enumerate(metrics.items()):
    offset = width * i  # shift each metric's bars sideways within its group
    rects = ax.bar(x + offset, measurement, width, label=attribute, 
                   color=colors[i], edgecolor='black', linewidth=1.5)
    ax.bar_label(rects, padding=3, fontweight='bold')

# Customize
ax.set_xlabel('Shift', fontsize=12, fontweight='bold')
ax.set_ylabel('Score / Units', fontsize=12, fontweight='bold')
ax.set_title('Manufacturing Performance Metrics by Shift', 
             fontsize=14, fontweight='bold', pad=20)
ax.set_xticks(x + width, shifts)
ax.legend(loc='upper right', framealpha=0.9, edgecolor='black')
ax.set_ylim(0, 1000)
ax.grid(True, axis='y', alpha=0.3, linestyle='--')

# Add company branding (example)
ax.text(0.02, 0.98, 'Manufacturing Analytics Dashboard', 
        transform=ax.transAxes, fontsize=10, verticalalignment='top',
        bbox=dict(boxstyle='round', facecolor='wheat', alpha=0.5))

plt.tight_layout()
plt.show()

## 5️⃣ Saving Plots

In [None]:
# Create a plot to save
fig, ax = plt.subplots(figsize=(10, 6))
# Keep a handle to the scatter artist so the colorbar can reference it directly
scatter = ax.scatter(manufacturing_data['temperature'], 
                     manufacturing_data['quality_score'],
                     c=manufacturing_data['defect_count'],
                     s=100, cmap='RdYlGn_r', alpha=0.6, edgecolors='black')
ax.set_xlabel('Temperature (°C)', fontweight='bold')
ax.set_ylabel('Quality Score', fontweight='bold')
ax.set_title('Quality Analysis Report', fontsize=14, fontweight='bold')
fig.colorbar(scatter, ax=ax, label='Defect Count')

# Save in multiple formats
plt.savefig('quality_analysis.png', dpi=300, bbox_inches='tight')
plt.savefig('quality_analysis.pdf', bbox_inches='tight')
plt.show()

print("✅ Plots saved as:")
print("   - quality_analysis.png (high resolution)")
print("   - quality_analysis.pdf (vector format)")

## 🎯 Visualization Best Practices

### Do's ✅
- Use appropriate chart types for your data
- Label all axes with units
- Add titles and legends
- Use color meaningfully
- Keep it simple and focused
- Use high DPI for professional reports
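
What "high DPI" means in practice is simple arithmetic: the pixel size of a saved raster image is the figure size in inches multiplied by the DPI. The helper `pixel_size` below is a hypothetical name used only for this sketch. It ignores `bbox_inches='tight'`, which crops the canvas and so changes the final dimensions somewhat:

```python
def pixel_size(figsize, dpi):
    """Approximate pixel dimensions of a raster image saved at the given DPI."""
    width_in, height_in = figsize
    return round(width_in * dpi), round(height_in * dpi)

# The notebook's default figure size is (10, 6) inches
screen = pixel_size((10, 6), dpi=100)
report = pixel_size((10, 6), dpi=300)

print("Screen quality (100 DPI):", screen)
print("Report quality (300 DPI):", report)
```

Tripling the DPI triples each dimension, so the image holds nine times as many pixels and the file grows accordingly.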

### Don'ts ❌
- Don't use 3D charts unless necessary
- Avoid too many colors
- Don't use pie charts for >5 categories
- Don't distort axes to mislead
- Avoid chartjunk (unnecessary decorations)
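
To see how a truncated axis can mislead, compare how tall two bars appear relative to each other when the y-axis starts at zero versus at a higher baseline. This sketch uses the Morning (92) and Night (85) quality scores from the styling example; the function name `apparent_ratio` and the baseline of 80 are assumptions chosen for illustration:

```python
def apparent_ratio(a, b, baseline=0.0):
    """Ratio of drawn bar heights for values a and b when the axis starts at baseline."""
    if min(a, b) <= baseline:
        raise ValueError("Both values must lie above the baseline")
    return (a - baseline) / (b - baseline)

morning, night = 92, 85

honest = apparent_ratio(morning, night)                   # axis starts at 0
truncated = apparent_ratio(morning, night, baseline=80)   # axis starts at 80

print(f"Axis from 0:  Morning bar looks {honest:.2f}x as tall as Night")
print(f"Axis from 80: Morning bar looks {truncated:.2f}x as tall as Night")
```

The underlying difference is about 8%, but with the axis starting at 80 the Morning bar is drawn more than twice as tall.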

### Chart Selection Guide
- **Line Plot**: Time series, trends
- **Scatter Plot**: Correlations, relationships
- **Bar Chart**: Comparisons between categories
- **Histogram**: Distributions
- **Box Plot**: Statistical distributions, outliers
- **Heatmap**: Correlations, matrices
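
The guide above can be sketched as a simple lookup table. The keys in `CHART_GUIDE` and the helper `suggest_chart` are illustrative names for this sketch, not part of any library:

```python
# Lookup table mirroring the chart selection guide; keys are illustrative labels
CHART_GUIDE = {
    "trend": "line plot",
    "relationship": "scatter plot",
    "comparison": "bar chart",
    "distribution": "histogram",
    "outliers": "box plot",
    "correlation matrix": "heatmap",
}

def suggest_chart(goal: str) -> str:
    """Return the chart type the guide recommends for an analysis goal."""
    key = goal.strip().lower()
    if key not in CHART_GUIDE:
        options = ", ".join(sorted(CHART_GUIDE))
        raise ValueError(f"Unknown goal {goal!r}; expected one of: {options}")
    return CHART_GUIDE[key]

print(suggest_chart("Distribution"))
print(suggest_chart("trend"))
```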

## 🎉 Summary

You've mastered data visualization! Key takeaways:

### Matplotlib
- ✅ Line, scatter, bar, histogram plots
- ✅ Subplots and figure customization
- ✅ Color mapping and styling
- ✅ Annotations and labels

### Seaborn
- ✅ Statistical visualizations
- ✅ Box plots and violin plots
- ✅ Correlation heatmaps
- ✅ Pair plots for multi-variable analysis

### Professional Skills
- ✅ Creating dashboards
- ✅ Customizing for reports
- ✅ Saving in multiple formats
- ✅ Following best practices

---

### 📚 Next Steps

Continue to **Notebook 05: Machine Learning with scikit-learn** to start building ML models!

<div align="center">
<b>Fantastic! You can now tell compelling data stories! 📊</b>
</div>