# Episode 3: Data Visualization with Matplotlib

Visualization is crucial for understanding data patterns. In this notebook, we'll learn how to create informative plots of inflammation data using Matplotlib.

## Learning Objectives
- Import and configure Matplotlib
- Create basic plots (line, scatter, histogram)
- Customize plot appearance
- Create subplots and multi-panel figures
- Save plots to files

## Introduction

Matplotlib is one of the most widely used plotting libraries for Python. It gives you fine-grained control over nearly every element of a figure, from axes and ticks to colors and fonts.

## 1. Setting Up Matplotlib

Import the necessary libraries and configure for notebook display:

In [None]:
import numpy as np
import matplotlib
import matplotlib.pyplot as plt

# Configure matplotlib for notebook display
%matplotlib inline
plt.style.use('default')  # Use default style

# Set figure size default
plt.rcParams['figure.figsize'] = [10, 6]

# The version string lives on the top-level package, not on pyplot
print("Matplotlib version:", matplotlib.__version__)

## 2. Prepare Sample Data

Let's create some synthetic inflammation data to visualize:

In [None]:
# Create sample inflammation data
np.random.seed(42)  # For reproducible results

# Patient data over 40 days
days = np.arange(1, 41)
patient_1 = np.concatenate([
    np.linspace(0, 6, 20),  # Increasing inflammation
    np.linspace(6, 0, 20)   # Decreasing inflammation
]) + np.random.normal(0, 0.5, 40)  # Add some noise

patient_2 = np.concatenate([
    np.linspace(0, 4, 15),
    np.linspace(4, 8, 10),
    np.linspace(8, 0, 15)
]) + np.random.normal(0, 0.3, 40)

patient_3 = 2 * np.sin(days * np.pi / 20) + 3 + np.random.normal(0, 0.4, 40)

# Ensure no negative values
patient_1 = np.maximum(patient_1, 0)
patient_2 = np.maximum(patient_2, 0)
patient_3 = np.maximum(patient_3, 0)

print(f"Data prepared for {len(days)} days and 3 patients")
print(f"Patient 1 inflammation range: {patient_1.min():.1f} - {patient_1.max():.1f}")
print(f"Patient 2 inflammation range: {patient_2.min():.1f} - {patient_2.max():.1f}")
print(f"Patient 3 inflammation range: {patient_3.min():.1f} - {patient_3.max():.1f}")

## 3. Basic Line Plots

Start with simple line plots to show inflammation over time:

In [None]:
# Simple line plot
plt.figure(figsize=(10, 6))
plt.plot(days, patient_1)
plt.title('Patient 1 Inflammation Over Time')
plt.xlabel('Day')
plt.ylabel('Inflammation Level')
plt.grid(True, alpha=0.3)
plt.show()

In [None]:
# Multiple patients on one plot
plt.figure(figsize=(12, 6))
plt.plot(days, patient_1, label='Patient 1', linewidth=2)
plt.plot(days, patient_2, label='Patient 2', linewidth=2)
plt.plot(days, patient_3, label='Patient 3', linewidth=2)

plt.title('Inflammation Levels for Multiple Patients', fontsize=16)
plt.xlabel('Day', fontsize=12)
plt.ylabel('Inflammation Level', fontsize=12)
plt.legend(fontsize=12)
plt.grid(True, alpha=0.3)
plt.tight_layout()
plt.show()
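
The cells above rely on pyplot's implicit "current figure". As a hedged sketch, the same kind of multi-line plot can also be written with the object-oriented interface that section 7 uses, where every call names the Axes it draws on. The arrays `demo_days`, `demo_a`, and `demo_b` are stand-in values invented for this illustration, not the notebook's patient data.

```python
import numpy as np
import matplotlib.pyplot as plt

# Stand-in data for illustration only (hypothetical values)
demo_days = np.arange(1, 11)
demo_a = np.linspace(0, 5, 10)
demo_b = np.linspace(5, 0, 10)

# plt.subplots() returns the Figure and a single Axes explicitly
fig, ax = plt.subplots(figsize=(8, 4))
line_a, = ax.plot(demo_days, demo_a, label='Series A', linewidth=2)
line_b, = ax.plot(demo_days, demo_b, label='Series B', linewidth=2)

# Axes methods use a set_ prefix where pyplot uses bare names
ax.set_title('Two Series on One Axes')
ax.set_xlabel('Day')
ax.set_ylabel('Inflammation Level')
ax.legend()
ax.grid(True, alpha=0.3)
fig.tight_layout()
# With %matplotlib inline the figure renders when the cell finishes;
# in a plain script, call plt.show() here.
```

Holding explicit `fig` and `ax` handles tends to pay off once a figure has more than one panel, because there is no ambiguity about which Axes a call modifies.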

## 4. Customizing Plot Appearance

Make plots more informative and visually appealing:

In [None]:
# Customized plot with colors, markers, and styles
plt.figure(figsize=(12, 8))

plt.plot(days, patient_1, 'b-', label='Patient 1', linewidth=2, marker='o', markersize=4, alpha=0.8)
plt.plot(days, patient_2, 'r--', label='Patient 2', linewidth=2, marker='s', markersize=4, alpha=0.8)
plt.plot(days, patient_3, 'g:', label='Patient 3', linewidth=3, marker='^', markersize=4, alpha=0.8)

plt.title('Inflammation Study Results', fontsize=18, fontweight='bold')
plt.xlabel('Day of Treatment', fontsize=14)
plt.ylabel('Inflammation Level (arbitrary units)', fontsize=14)

# Add grid
plt.grid(True, linestyle='-', alpha=0.2)

# Set axis limits
plt.xlim(0, 42)
plt.ylim(-1, 9)

# Add horizontal line for reference
plt.axhline(y=3, color='black', linestyle='--', alpha=0.5, label='Normal level')

# Customize legend (called after all labeled artists exist, so the reference line is listed too)
plt.legend(loc='upper right', fontsize=12, frameon=True, fancybox=True, shadow=True)

plt.tight_layout()
plt.show()
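
The strings `'b-'`, `'r--'`, and `'g:'` in the cell above are Matplotlib format strings that pack color and line style (and optionally a marker) into one argument. The sketch below, on made-up `x` and `y` values, checks that a format string and the equivalent explicit keywords give a line the same properties. Colors are compared through `to_rgba` because the two forms store the color under different names.

```python
import matplotlib.pyplot as plt
from matplotlib.colors import to_rgba

# Made-up points, for illustration only
x = [1, 2, 3, 4]
y = [1, 4, 2, 3]

fig, ax = plt.subplots()

# Short form: 'r' = red, 's' = square marker, '--' = dashed line
short_line, = ax.plot(x, y, 'rs--')

# Long form: the same three properties as explicit keywords
long_line, = ax.plot(x, y, color='red', marker='s', linestyle='--')

# Compare the stored style properties of the two lines
same_color = to_rgba(short_line.get_color()) == to_rgba(long_line.get_color())
same_marker = short_line.get_marker() == long_line.get_marker()
same_style = short_line.get_linestyle() == long_line.get_linestyle()
print(same_color, same_marker, same_style)

plt.close(fig)  # this figure exists only for the comparison
```

The keyword form is longer but easier to read and extend. Mixing the two, as the cell above does with `'b-'` plus `marker='o'`, is also valid.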

### Exercise 3.1
Create a plot showing only the first 20 days of data for all patients, with different colors and markers:

In [None]:
# Exercise 3.1 - Your code here

## 5. Statistical Plots

Create plots showing statistical summaries:

In [None]:
# Calculate daily statistics across all patients
all_patients = np.array([patient_1, patient_2, patient_3])
daily_mean = np.mean(all_patients, axis=0)
daily_std = np.std(all_patients, axis=0)
daily_min = np.min(all_patients, axis=0)
daily_max = np.max(all_patients, axis=0)

# Plot with error bars
plt.figure(figsize=(12, 8))

# Main line with error bars
plt.errorbar(days, daily_mean, yerr=daily_std, 
             label='Mean ± Std', linewidth=2, capsize=5, capthick=2)

# Fill between min and max
plt.fill_between(days, daily_min, daily_max, alpha=0.2, label='Min-Max range')

plt.title('Daily Inflammation Statistics Across All Patients', fontsize=16)
plt.xlabel('Day', fontsize=12)
plt.ylabel('Inflammation Level', fontsize=12)
plt.legend()
plt.grid(True, alpha=0.3)
plt.tight_layout()
plt.show()
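
The `axis=0` argument in the statistics above collapses the patient dimension and leaves one value per day. The following sketch uses a small stand-in array (`demo`, invented for this example) to show both axis choices, along with the `ddof` argument of `np.std`. NumPy defaults to `ddof=0`, the population formula. With only three patients, the sample formula (`ddof=1`) gives noticeably larger error bars, so which one suits this study is a judgment call.

```python
import numpy as np

# Stand-in array shaped like all_patients: 3 patients (rows) x 4 days (columns)
demo = np.array([
    [1.0, 2.0, 3.0, 4.0],
    [2.0, 4.0, 6.0, 8.0],
    [3.0, 6.0, 9.0, 12.0],
])

per_day_mean = demo.mean(axis=0)      # collapses the patient axis: one value per day
per_patient_mean = demo.mean(axis=1)  # collapses the day axis: one value per patient

# np.std divides by N by default (ddof=0); ddof=1 divides by N - 1
population_std = demo.std(axis=0)
sample_std = demo.std(axis=0, ddof=1)

print("Per-day mean shape:", per_day_mean.shape)
print("Per-patient mean shape:", per_patient_mean.shape)
print("Day 1 std (ddof=0 vs ddof=1):", population_std[0], sample_std[0])
```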

## 6. Different Plot Types

Explore scatter plots, histograms, and box plots:

In [None]:
# Scatter plot
plt.figure(figsize=(10, 6))
plt.scatter(patient_1, patient_2, alpha=0.6, s=60, c=days, cmap='viridis')
plt.colorbar(label='Day')
plt.xlabel('Patient 1 Inflammation')
plt.ylabel('Patient 2 Inflammation')
plt.title('Patient 1 vs Patient 2 Inflammation Correlation')
plt.grid(True, alpha=0.3)
plt.tight_layout()
plt.show()
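
A scatter plot shows the relationship between two series but does not quantify it. As a hedged companion to the plot above, this sketch computes Pearson's correlation coefficient with `np.corrcoef` on two hypothetical series (`series_a`, `series_b`); the same call should work on `patient_1` and `patient_2`.

```python
import numpy as np

rng = np.random.default_rng(0)  # seeded generator, seed chosen arbitrarily

# Hypothetical paired measurements: series_b follows series_a plus noise
series_a = np.linspace(0, 6, 40)
series_b = 0.8 * series_a + rng.normal(0, 0.5, 40)

# np.corrcoef returns a 2x2 matrix; the off-diagonal entry is Pearson's r
corr_matrix = np.corrcoef(series_a, series_b)
r = corr_matrix[0, 1]
print(f"Pearson r = {r:.2f}")
```

Pearson's r only measures linear association, so it is best read alongside the scatter plot, not as a replacement for it.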

In [None]:
# Histograms
plt.figure(figsize=(15, 5))

# Subplot 1: Individual histograms
plt.subplot(1, 3, 1)
plt.hist(patient_1, bins=15, alpha=0.7, color='blue', label='Patient 1')
plt.hist(patient_2, bins=15, alpha=0.7, color='red', label='Patient 2')
plt.hist(patient_3, bins=15, alpha=0.7, color='green', label='Patient 3')
plt.xlabel('Inflammation Level')
plt.ylabel('Frequency')
plt.title('Distribution of Inflammation Levels')
plt.legend()

# Subplot 2: Stacked histogram
plt.subplot(1, 3, 2)
plt.hist([patient_1, patient_2, patient_3], bins=15, 
         stacked=True, label=['Patient 1', 'Patient 2', 'Patient 3'])
plt.xlabel('Inflammation Level')
plt.ylabel('Frequency')
plt.title('Stacked Distribution')
plt.legend()

# Subplot 3: Box plot
plt.subplot(1, 3, 3)
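# Tick labels are set through plt.xticks below: boxplot's `labels=` keyword was
# renamed to `tick_labels=` in Matplotlib 3.9, so this form avoids a deprecation warning
plt.boxplot([patient_1, patient_2, patient_3])
plt.ylabel('Inflammation Level')
plt.title('Box Plot Comparison')
plt.xticks([1, 2, 3], ['Patient 1', 'Patient 2', 'Patient 3'], rotation=45)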

plt.tight_layout()
plt.show()
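
In the first panel above, each `plt.hist` call with `bins=15` picks its own bin edges from that patient's range, so the overlaid bars are not aligned bin for bin. One possible refinement, sketched here on two hypothetical samples (`group_a`, `group_b`), is to compute a single set of edges with `np.histogram_bin_edges` and pass it to every call.

```python
import numpy as np
import matplotlib.pyplot as plt

rng = np.random.default_rng(1)  # arbitrary seed for a repeatable sketch

# Hypothetical samples with different centers and spreads
group_a = rng.normal(3, 1.0, 40)
group_b = rng.normal(5, 1.5, 40)

# One set of 15 bins spanning the combined range of both samples
shared_edges = np.histogram_bin_edges(np.concatenate([group_a, group_b]), bins=15)

fig, ax = plt.subplots(figsize=(8, 4))
counts_a, edges_a, _ = ax.hist(group_a, bins=shared_edges, alpha=0.7, label='Group A')
counts_b, edges_b, _ = ax.hist(group_b, bins=shared_edges, alpha=0.7, label='Group B')
ax.set_xlabel('Inflammation Level')
ax.set_ylabel('Frequency')
ax.legend()
# With %matplotlib inline the figure renders when the cell finishes;
# in a plain script, call plt.show() here.
```

With shared edges, bar heights in the same bin can be compared directly between groups.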

## 7. Subplots and Multi-Panel Figures

Create complex multi-panel figures for comprehensive analysis:

In [None]:
# Create a comprehensive 2x2 subplot figure
fig, axes = plt.subplots(2, 2, figsize=(15, 12))

# Top left: Individual patient data
axes[0, 0].plot(days, patient_1, 'b-', label='Patient 1', linewidth=2)
axes[0, 0].plot(days, patient_2, 'r-', label='Patient 2', linewidth=2)
axes[0, 0].plot(days, patient_3, 'g-', label='Patient 3', linewidth=2)
axes[0, 0].set_title('Individual Patient Data')
axes[0, 0].set_xlabel('Day')
axes[0, 0].set_ylabel('Inflammation Level')
axes[0, 0].legend()
axes[0, 0].grid(True, alpha=0.3)

# Top right: Statistical summary
axes[0, 1].errorbar(days, daily_mean, yerr=daily_std, capsize=3)
axes[0, 1].fill_between(days, daily_min, daily_max, alpha=0.2)
axes[0, 1].set_title('Daily Statistics (Mean ± Std)')
axes[0, 1].set_xlabel('Day')
axes[0, 1].set_ylabel('Inflammation Level')
axes[0, 1].grid(True, alpha=0.3)

# Bottom left: Correlation
scatter = axes[1, 0].scatter(patient_1, patient_2, c=days, cmap='viridis', alpha=0.6)
axes[1, 0].set_title('Patient 1 vs Patient 2 Correlation')
axes[1, 0].set_xlabel('Patient 1 Inflammation')
axes[1, 0].set_ylabel('Patient 2 Inflammation')
axes[1, 0].grid(True, alpha=0.3)
plt.colorbar(scatter, ax=axes[1, 0], label='Day')

# Bottom right: Distribution
axes[1, 1].hist(patient_1, bins=12, alpha=0.7, color='blue', label='Patient 1')
axes[1, 1].hist(patient_2, bins=12, alpha=0.7, color='red', label='Patient 2')
axes[1, 1].hist(patient_3, bins=12, alpha=0.7, color='green', label='Patient 3')
axes[1, 1].set_title('Inflammation Distribution')
axes[1, 1].set_xlabel('Inflammation Level')
axes[1, 1].set_ylabel('Frequency')
axes[1, 1].legend()

# Add overall title
fig.suptitle('Comprehensive Inflammation Data Analysis', fontsize=16, fontweight='bold')

plt.tight_layout()
plt.show()
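
When every panel shows the same kind of plot, indexing `axes[i, j]` by hand becomes repetitive. The sketch below loops over the Axes array instead and uses `sharey=True` so all panels have one y-scale. The `series` dictionary and its values are hypothetical stand-ins, not the patient arrays.

```python
import numpy as np
import matplotlib.pyplot as plt

x = np.arange(1, 21)

# Hypothetical series keyed by panel title
series = {
    'Rising': np.linspace(0, 8, 20),
    'Falling': np.linspace(8, 0, 20),
    'Flat': np.full(20, 4.0),
}

# sharey=True links the y-axis limits across the row of panels
fig, axes = plt.subplots(1, 3, figsize=(12, 4), sharey=True)

# axes.flat iterates over every Axes, for 1-D and 2-D grids alike
for ax, (name, values) in zip(axes.flat, series.items()):
    ax.plot(x, values, linewidth=2)
    ax.set_title(name)
    ax.set_xlabel('Day')
    ax.grid(True, alpha=0.3)

axes[0].set_ylabel('Inflammation Level')  # label only the leftmost panel
fig.tight_layout()
# With %matplotlib inline the figure renders when the cell finishes;
# in a plain script, call plt.show() here.
```

A shared y-axis makes magnitudes comparable across panels at a glance, though it can flatten a panel whose values span a much smaller range than the others.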

### Exercise 3.2
Create a 1x3 subplot showing:
1. Line plot of Patient 1 data
2. Bar chart of average inflammation per week (group days 1-7, 8-14, etc.; with 40 days of data, the final group covers only days 36-40)
3. Pie chart showing the proportion of days with low (<2), medium (2-5), and high (>5) inflammation for Patient 1

In [None]:
# Exercise 3.2 - Your code here

## 8. Advanced Plotting Techniques

Learn some advanced visualization techniques:

In [None]:
# Heatmap of patient data
plt.figure(figsize=(12, 4))

# Create a 2D array for heatmap (patients × days)
heatmap_data = np.array([patient_1, patient_2, patient_3])

im = plt.imshow(heatmap_data, cmap='YlOrRd', aspect='auto', interpolation='nearest')
plt.colorbar(im, label='Inflammation Level')
plt.title('Inflammation Heatmap: Patients vs Days')
plt.xlabel('Day')
plt.ylabel('Patient')
plt.yticks([0, 1, 2], ['Patient 1', 'Patient 2', 'Patient 3'])

# Add some day markers
plt.xticks(np.arange(0, 40, 10), np.arange(1, 41, 10))
plt.tight_layout()
plt.show()

In [None]:
# Violin plot for distribution comparison
plt.figure(figsize=(10, 6))

# Create violin plot data
data_for_violin = [patient_1, patient_2, patient_3]
parts = plt.violinplot(data_for_violin, positions=[1, 2, 3], showmeans=True, showmedians=True)

# Customize violin plot
colors = ['blue', 'red', 'green']
for i, pc in enumerate(parts['bodies']):
    pc.set_facecolor(colors[i])
    pc.set_alpha(0.7)

plt.title('Distribution Comparison: Violin Plot')
plt.xlabel('Patient')
plt.ylabel('Inflammation Level')
plt.xticks([1, 2, 3], ['Patient 1', 'Patient 2', 'Patient 3'])
plt.grid(True, alpha=0.3)
plt.tight_layout()
plt.show()

## 9. Annotations and Text

Add annotations to highlight important features:

In [None]:
# Plot with annotations
plt.figure(figsize=(12, 8))

plt.plot(days, patient_1, 'b-', linewidth=2, label='Patient 1')

# Find and annotate maximum (np.argmax returns a 0-based index, so the day number is max_day + 1)
max_day = np.argmax(patient_1)
max_value = patient_1[max_day]

plt.annotate(f'Peak inflammation\nDay {max_day+1}: {max_value:.1f}',
             xy=(max_day+1, max_value), xytext=(max_day+10, max_value+1),
             arrowprops=dict(arrowstyle='->', color='red', lw=2),
             fontsize=12, ha='center',
             bbox=dict(boxstyle='round,pad=0.3', facecolor='yellow', alpha=0.7))

# Add text box with study information
textstr = '\n'.join([
    'Study Information:',
    f'Duration: {len(days)} days',
    f'Mean inflammation: {np.mean(patient_1):.1f}',
    f'Std deviation: {np.std(patient_1):.1f}'
])
props = dict(boxstyle='round', facecolor='wheat', alpha=0.8)
plt.text(0.02, 0.98, textstr, transform=plt.gca().transAxes, fontsize=10,
         verticalalignment='top', bbox=props)

plt.title('Patient 1 Inflammation with Annotations', fontsize=16)
plt.xlabel('Day')
plt.ylabel('Inflammation Level')
plt.grid(True, alpha=0.3)
plt.tight_layout()
plt.show()

## 10. Saving Plots

Save your plots in various formats:

In [None]:
# Create a final summary plot
plt.figure(figsize=(12, 8))

plt.plot(days, patient_1, 'b-', linewidth=2, label='Patient 1')
plt.plot(days, patient_2, 'r-', linewidth=2, label='Patient 2')
plt.plot(days, patient_3, 'g-', linewidth=2, label='Patient 3')
plt.plot(days, daily_mean, 'k--', linewidth=3, label='Daily Mean')

plt.fill_between(days, daily_mean - daily_std, daily_mean + daily_std, 
                 alpha=0.2, color='gray', label='±1 Std Dev')

plt.title('Inflammation Study: Complete Results', fontsize=18, fontweight='bold')
plt.xlabel('Day of Treatment', fontsize=14)
plt.ylabel('Inflammation Level (arbitrary units)', fontsize=14)
plt.legend(loc='upper right', fontsize=12)
plt.grid(True, alpha=0.3)

# Add study metadata
plt.figtext(0.02, 0.02, 'Generated with Python/Matplotlib for Software Carpentry', 
            fontsize=8, style='italic')

plt.tight_layout()

# Save in multiple formats
plt.savefig('inflammation_study_results.png', dpi=300, bbox_inches='tight')
plt.savefig('inflammation_study_results.pdf', bbox_inches='tight')
plt.savefig('inflammation_study_results.svg', bbox_inches='tight')

plt.show()

print("Plots saved as:")
print("- inflammation_study_results.png (high resolution)")
print("- inflammation_study_results.pdf (vector format)")
print("- inflammation_study_results.svg (web-friendly vector)")
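
The same export can be written against an explicit Figure object, which avoids depending on which figure is "current" when `plt.savefig` runs. This is a minimal sketch under a few assumptions: the file name `demo_plot` is made up, the plotted data are placeholders, and the files go to a temporary directory that is deleted at the end, so nothing is left in the working directory.

```python
import tempfile
from pathlib import Path

import numpy as np
import matplotlib.pyplot as plt

# Placeholder data, for illustration only
x = np.arange(1, 11)
fig, ax = plt.subplots(figsize=(6, 4))
ax.plot(x, x ** 2)
ax.set_xlabel('Day')
ax.set_ylabel('Value')

sizes = {}
with tempfile.TemporaryDirectory() as tmp:
    for ext in ('png', 'pdf', 'svg'):
        path = Path(tmp) / f'demo_plot.{ext}'  # hypothetical file name
        # The output format is inferred from the file extension;
        # dpi mainly matters for raster output such as PNG
        fig.savefig(path, dpi=300, bbox_inches='tight')
        sizes[path.name] = path.stat().st_size  # record size before cleanup

plt.close(fig)  # release the figure once it has been written
for name, size in sizes.items():
    print(f"{name}: {size} bytes")
```

Calling `plt.close(fig)` after saving is worth considering in loops that produce many figures, since open figures stay in memory until they are closed.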

## Summary

In this episode, we learned:
- **Basic plotting**: Line plots, scatter plots, histograms
- **Customization**: Colors, markers, labels, legends
- **Statistical plots**: Error bars, box plots, violin plots
- **Subplots**: Multi-panel figures for comprehensive analysis
- **Advanced techniques**: Heatmaps, annotations, styling
- **File output**: Saving plots in multiple formats

Effective visualization is key to understanding and communicating scientific results!

## Clean up

Remove generated files (optional):

In [None]:
import os

# List of files to clean up
files_to_remove = [
    'inflammation_study_results.png',
    'inflammation_study_results.pdf',
    'inflammation_study_results.svg'
]

for filename in files_to_remove:
    if os.path.exists(filename):
        os.remove(filename)
        print(f"Removed {filename}")
    else:
        print(f"{filename} not found")