# Lecture 6: Visualization & Communication
## Exercise Solutions

**Course:** Introduction to Scientific Programming  
**Institution:** CNC-UC - Center for Neuroscience and Cell Biology

---

This notebook contains detailed solutions to all exercises with explanations.

In [None]:
# Import all required libraries
import numpy as np
import pandas as pd
import matplotlib.pyplot as plt
import seaborn as sns
import plotly.express as px
import plotly.graph_objects as go
from plotly.subplots import make_subplots
from scipy import stats, signal
from matplotlib.gridspec import GridSpec
import warnings
warnings.filterwarnings('ignore')

# Set random seed for reproducibility
np.random.seed(42)

print("✓ All libraries loaded successfully")

## Exercise 1 Solution: Multi-Panel Publication Figure

### Key Learning Points:
- Creating multi-panel layouts with `plt.subplots()`
- Adding panel labels programmatically
- Removing unnecessary spines
- Using colorblind-safe colors
- Proper export settings

In [None]:
# Generate data
time = np.linspace(0, 10, 100)
control = 10 + 3 * np.sin(2 * np.pi * 0.5 * time) + np.random.normal(0, 0.8, 100)
treatment1 = 12 + 4 * np.sin(2 * np.pi * 0.5 * time + 0.3) + np.random.normal(0, 0.8, 100)
treatment2 = 15 + 5 * np.sin(2 * np.pi * 0.5 * time + 0.6) + np.random.normal(0, 0.9, 100)

# Mean activity for bar plot
conditions = ['Control', 'Treatment 1', 'Treatment 2']
means = [np.mean(control), np.mean(treatment1), np.mean(treatment2)]
sems = [stats.sem(control), stats.sem(treatment1), stats.sem(treatment2)]

# Correlation data
var1 = np.random.randn(50) * 2 + 10
var2 = var1 * 0.8 + np.random.randn(50) * 1.5

# Firing rate distribution
firing_rates = np.concatenate([control, treatment1, treatment2])

# Define colorblind-safe colors (from tab10 palette)
colors = ['#1f77b4', '#ff7f0e', '#2ca02c']  # blue, orange, green

# Create figure
fig, axes = plt.subplots(2, 2, figsize=(7, 6))

# Panel A: Line plot
axes[0, 0].plot(time, control, label='Control', color=colors[0], linewidth=2)
axes[0, 0].plot(time, treatment1, label='Treatment 1', color=colors[1], linewidth=2)
axes[0, 0].plot(time, treatment2, label='Treatment 2', color=colors[2], linewidth=2)
axes[0, 0].set_xlabel('Time (s)', fontsize=10)
axes[0, 0].set_ylabel('Neural Activity (Hz)', fontsize=10)
axes[0, 0].set_title('Time Course', fontweight='bold', fontsize=11)
axes[0, 0].legend(frameon=False, fontsize=9)

# Panel B: Bar plot with error bars
x_pos = np.arange(len(conditions))
axes[0, 1].bar(x_pos, means, yerr=sems, color=colors, capsize=5, alpha=0.8)
axes[0, 1].set_xticks(x_pos)
axes[0, 1].set_xticklabels(conditions, rotation=45, ha='right', fontsize=9)
axes[0, 1].set_ylabel('Mean Activity (Hz)', fontsize=10)
axes[0, 1].set_title('Mean Comparison', fontweight='bold', fontsize=11)

# Panel C: Scatter plot with correlation
axes[1, 0].scatter(var1, var2, alpha=0.6, s=50, color=colors[0])
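# Add regression line (evaluated on sorted x so the line is drawn left to right)
z = np.polyfit(var1, var2, 1)
p = np.poly1d(z)
x_fit = np.sort(var1)
axes[1, 0].plot(x_fit, p(x_fit), 'r--', linewidth=2, alpha=0.8)
# Calculate correlation and report the computed p-value instead of a fixed string
r, p_val = stats.pearsonr(var1, var2)
p_text = 'p < 0.001' if p_val < 0.001 else f'p = {p_val:.3f}'
axes[1, 0].text(0.05, 0.95, f'r = {r:.2f}, {p_text}',
                transform=axes[1, 0].transAxes, va='top', fontsize=9)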
axes[1, 0].set_xlabel('Variable 1', fontsize=10)
axes[1, 0].set_ylabel('Variable 2', fontsize=10)
axes[1, 0].set_title('Correlation', fontweight='bold', fontsize=11)

# Panel D: Histogram
axes[1, 1].hist(firing_rates, bins=30, color=colors[0], alpha=0.7, edgecolor='black')
axes[1, 1].set_xlabel('Firing Rate (Hz)', fontsize=10)
axes[1, 1].set_ylabel('Count', fontsize=10)
axes[1, 1].set_title('Distribution', fontweight='bold', fontsize=11)

# Add panel labels and remove spines
for i, ax in enumerate(axes.flat):
    # Panel label
    label = chr(65 + i)  # A, B, C, D
    ax.text(-0.15, 1.1, label, transform=ax.transAxes,
            fontsize=16, fontweight='bold', va='top')
    
    # Remove top and right spines
    ax.spines['top'].set_visible(False)
    ax.spines['right'].set_visible(False)

plt.tight_layout()

# Save figure
fig.savefig('/home/user/exercise1_solution.pdf', dpi=300, bbox_inches='tight')
print("✓ Figure saved as 'exercise1_solution.pdf'")

plt.show()

**Explanation:**

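1. **Color choice**: Used the first three tab10 colors (blue, orange, green). For stricter colorblind safety, a dedicated palette such as Okabe-Ito or Seaborn's `colorblind` is a safer choice
2. **Panel labels**: Added programmatically using `chr(65 + i)` to generate A, B, C, D
3. **Spines**: Removed top and right spines for cleaner appearance
4. **Error bars**: Used SEM (standard error of the mean) for the bar plot
5. **Export**: PDF (vector) format with `bbox_inches='tight'` to trim surrounding whitespace; `dpi=300` only affects rasterized elements in a PDF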
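
The SEM used for the error bars in Panel B is the sample standard deviation divided by the square root of the sample size. The following is a minimal sketch that reproduces it by hand with NumPy. The `sample` array and its seed are illustrative assumptions, not the exercise data.

```python
import numpy as np

rng = np.random.default_rng(0)           # local generator; seed chosen arbitrarily
sample = rng.normal(10, 0.8, size=100)   # hypothetical stand-in for one condition

# SEM = sample standard deviation (ddof=1) / sqrt(n)
sem_manual = sample.std(ddof=1) / np.sqrt(sample.size)

# Rough 95% interval under a normal approximation: mean +/- 1.96 * SEM
ci_low = sample.mean() - 1.96 * sem_manual
ci_high = sample.mean() + 1.96 * sem_manual

print(f"mean = {sample.mean():.2f}, SEM = {sem_manual:.3f}")
print(f"approx. 95% CI = [{ci_low:.2f}, {ci_high:.2f}]")
```

`scipy.stats.sem` also uses `ddof=1` by default, so it should agree with `sem_manual`. Note that in the exercise the 100 values per condition are time points of a single simulated trace, so the SEM there describes variation over time rather than across independent replicates.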

## Exercise 2 Solution: Statistical Comparison with Seaborn

### Key Learning Points:
- Using Seaborn for statistical visualization
- Combining multiple plot types
- Working with long-format data
- Creating pairplots for multi-dimensional exploration

In [None]:
# Generate data
n_subjects = 60

data = pd.DataFrame({
    'subject_id': range(n_subjects),
    'control_rt': np.random.normal(450, 50, n_subjects),
    'drug_rt': np.random.normal(400, 45, n_subjects),
    'age': np.random.randint(20, 65, n_subjects),
    'age_group': np.random.choice(['Young', 'Old'], n_subjects),
    'sex': np.random.choice(['Male', 'Female'], n_subjects),
    'accuracy': np.clip(0.7 + 0.2 * np.random.random(n_subjects), 0, 1)
})

# Convert to long format
data_long = pd.melt(data,
                     id_vars=['subject_id', 'age_group', 'sex', 'age', 'accuracy'],
                     value_vars=['control_rt', 'drug_rt'],
                     var_name='condition',
                     value_name='reaction_time')
data_long['condition'] = data_long['condition'].str.replace('_rt', '').str.capitalize()

# Set Seaborn style
sns.set_theme(style='whitegrid', palette='colorblind')

print("Data prepared for visualization")
print(f"Total observations: {len(data_long)}")

In [None]:
# Plot 1: Violin plot with individual points
fig, ax = plt.subplots(figsize=(10, 6))

# Violin plot
sns.violinplot(data=data_long, x='condition', y='reaction_time', hue='age_group',
               split=True, inner=None, alpha=0.5, ax=ax)

# Add individual points
sns.stripplot(data=data_long, x='condition', y='reaction_time', hue='age_group',
              dodge=True, size=3, alpha=0.6, ax=ax, legend=False)

ax.set_ylabel('Reaction Time (ms)', fontsize=12)
ax.set_xlabel('Condition', fontsize=12)
ax.set_title('Reaction Time by Condition and Age Group', fontsize=14, fontweight='bold')
ax.legend(title='Age Group', frameon=False, fontsize=10)

plt.tight_layout()
fig.savefig('/home/user/exercise2_violin.png', dpi=600, bbox_inches='tight')
plt.show()

print("✓ Violin plot saved")

In [None]:
# Plot 2: Point plot with confidence intervals
fig, ax = plt.subplots(figsize=(10, 6))

sns.pointplot(data=data_long, x='condition', y='reaction_time', hue='age_group',
              errorbar='ci', markers=['o', 's'], linestyles=['-', '--'],
              capsize=0.1, ax=ax)

ax.set_ylabel('Mean Reaction Time (ms)', fontsize=12)
ax.set_xlabel('Condition', fontsize=12)
ax.set_title('Mean RT with 95% Confidence Intervals', fontsize=14, fontweight='bold')
ax.legend(title='Age Group', frameon=False, fontsize=10)

plt.tight_layout()
fig.savefig('/home/user/exercise2_pointplot.png', dpi=600, bbox_inches='tight')
plt.show()

print("✓ Point plot saved")

In [None]:
# Plot 3: Pairplot
# Average the two conditions into a single per-subject reaction time
plot_data = data.copy()
plot_data['mean_rt'] = (plot_data['control_rt'] + plot_data['drug_rt']) / 2

g = sns.pairplot(plot_data[['mean_rt', 'accuracy', 'age', 'age_group']],
                 hue='age_group', diag_kind='kde', corner=False, height=2.5)

g.figure.suptitle('Pairwise Relationships in Reaction Time Data',
                  y=1.01, fontsize=14, fontweight='bold')

# Save through the grid object so the pairplot figure is targeted explicitly
g.savefig('/home/user/exercise2_pairplot.png', dpi=600, bbox_inches='tight')
plt.show()

print("✓ Pairplot saved")
print("\n📊 All three plots completed successfully!")

**Explanation:**

1. **Data format**: Converted to long format using `pd.melt()` for easier plotting with Seaborn
2. **Violin + Strip**: Combined to show both distribution and individual data points
3. **Point plot**: Shows group means with 95% bootstrap confidence intervals (`errorbar='ci'`), useful for comparing groups
4. **Pairplot**: Automatically creates scatter plots for all variable pairs, with a KDE of each variable on the diagonal
5. **Colorblind-safe**: Used Seaborn's 'colorblind' palette throughout
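
To make the wide-to-long conversion concrete, here is a small sketch on a hypothetical three-subject table (the values are made up for illustration). It uses the same `pd.melt()` arguments as the solution, reduced to one identifier column.

```python
import pandas as pd

# Hypothetical wide-format table: one row per subject
wide = pd.DataFrame({
    'subject_id': [0, 1, 2],
    'control_rt': [450.0, 470.0, 430.0],
    'drug_rt': [400.0, 420.0, 390.0],
})

# Long format: one row per (subject, condition) observation
long = pd.melt(wide,
               id_vars=['subject_id'],
               value_vars=['control_rt', 'drug_rt'],
               var_name='condition',
               value_name='reaction_time')

# Turn 'control_rt' / 'drug_rt' into 'Control' / 'Drug'
long['condition'] = (long['condition']
                     .str.replace('_rt', '', regex=False)
                     .str.capitalize())

print(long)
```

Three subjects with two conditions each give six rows. Seaborn can then map `condition` to `x` or `hue` directly, which is why the long layout is convenient here.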

## Exercise 3 Solution: Interactive Dashboard with Plotly

### Key Learning Points:
- Creating multi-panel layouts with `make_subplots()`
- Combining different plot types in one figure
- Adding 3D visualizations
- Exporting interactive HTML

In [None]:
# Generate data
n_trials = 200

data = pd.DataFrame({
    'trial': range(n_trials),
    'firing_rate': 10 + 5 * np.sin(2 * np.pi * np.arange(n_trials) / 50) + np.random.randn(n_trials) * 2,
    'condition': np.random.choice(['Control', 'Treatment'], n_trials),
    'neuron_id': np.random.choice([f'N{i+1}' for i in range(10)], n_trials),
    'session': np.tile(range(1, 21), n_trials // 20)
})

# PCA coordinates (simulated)
data['PC1'] = np.random.randn(n_trials) * 2 + (data['condition'] == 'Treatment') * 3
data['PC2'] = np.random.randn(n_trials) * 1.5 + (data['condition'] == 'Treatment') * 2
data['PC3'] = np.random.randn(n_trials) * 1.2

print("Data generated for interactive dashboard")

In [None]:
# Create subplots
fig = make_subplots(
    rows=2, cols=2,
    subplot_titles=('Firing Rate Over Trials', 'Average Firing Rate by Session',
                    'Firing Rate Distribution', 'PCA Clustering (3D)'),
    specs=[[{'type': 'scatter'}, {'type': 'scatter'}],
           [{'type': 'box'}, {'type': 'scatter3d'}]],
    vertical_spacing=0.12,
    horizontal_spacing=0.1
)

# Fix one color per condition so every panel uses the same encoding
# (Okabe-Ito blue and orange, chosen here as a colorblind-safe pair)
condition_colors = {'Control': '#0072B2', 'Treatment': '#E69F00'}

# Panel 1: Scatter plot of firing rate over trials
for condition, color in condition_colors.items():
    subset = data[data['condition'] == condition]
    fig.add_trace(
        go.Scatter(
            x=subset['trial'],
            y=subset['firing_rate'],
            mode='markers',
            name=condition,
            legendgroup=condition,
            marker=dict(size=4, opacity=0.6, color=color),
            customdata=subset[['neuron_id']],
            hovertemplate='Trial: %{x}<br>FR: %{y:.2f} Hz<br>Neuron: %{customdata[0]}<extra></extra>'
        ),
        row=1, col=1
    )

# Panel 2: Line plot of average firing rate by session
session_avg = data.groupby(['session', 'condition'])['firing_rate'].mean().reset_index()
for condition, color in condition_colors.items():
    subset = session_avg[session_avg['condition'] == condition]
    fig.add_trace(
        go.Scatter(
            x=subset['session'],
            y=subset['firing_rate'],
            mode='lines+markers',
            name=condition,
            legendgroup=condition,
            showlegend=False,
            line=dict(width=2, color=color),
            marker=dict(size=8, color=color)
        ),
        row=1, col=2
    )

# Panel 3: Box plot
for condition, color in condition_colors.items():
    subset = data[data['condition'] == condition]
    fig.add_trace(
        go.Box(
            y=subset['firing_rate'],
            name=condition,
            legendgroup=condition,
            showlegend=False,
            marker=dict(opacity=0.7, color=color)
        ),
        row=2, col=1
    )

# Panel 4: 3D scatter plot
for condition, color in condition_colors.items():
    subset = data[data['condition'] == condition]
    fig.add_trace(
        go.Scatter3d(
            x=subset['PC1'],
            y=subset['PC2'],
            z=subset['PC3'],
            mode='markers',
            name=condition,
            legendgroup=condition,
            showlegend=False,
            marker=dict(size=4, opacity=0.7, color=color)
        ),
        row=2, col=2
    )

# Update axes labels
fig.update_xaxes(title_text='Trial', row=1, col=1)
fig.update_yaxes(title_text='Firing Rate (Hz)', row=1, col=1)

fig.update_xaxes(title_text='Session', row=1, col=2)
fig.update_yaxes(title_text='Mean FR (Hz)', row=1, col=2)

fig.update_yaxes(title_text='Firing Rate (Hz)', row=2, col=1)

# Update layout
fig.update_layout(
    height=800,
    title_text='Neural Recording Interactive Dashboard',
    title_font_size=16,
    showlegend=True,
    legend=dict(orientation='v', yanchor='top', y=1, xanchor='left', x=1.02)
)

# Export as HTML
fig.write_html('/home/user/exercise3_dashboard.html')
print("✓ Interactive dashboard saved as 'exercise3_dashboard.html'")
print("  Open in browser for full interactivity!")

fig.show()

**Explanation:**

1. **Subplot specs**: Defined different plot types for each panel
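2. **Legend groups**: `legendgroup` ties each condition's traces together across panels so one legend click toggles them all; `showlegend=False` on the repeated traces avoids duplicate legend entries
3. **Custom hover**: Added `customdata` and `hovertemplate` for detailed information
4. **3D visualization**: Rotating and zooming helps when exploring PCA results with three components
5. **HTML export**: Creates a standalone file (Plotly.js is embedded by default) that can be shared via email or web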

## Exercise 4 Solution: Fixing Bad Visualizations

### Problems in Original Code:
1. ❌ Truncated y-axis (95-102 instead of 0-102)
2. ❌ Red-green colors (not colorblind-safe)
3. ❌ Overlapping bars (hard to compare)
4. ❌ Uninformative title
5. ❌ No axis labels or units
6. ❌ Legend has frame
7. ❌ Top and right spines present

### Fixed Version:

In [None]:
# Data
data1 = [98, 99, 100, 101, 99.5]
data2 = [97, 98, 99, 100, 98.5]
categories = ['Subject A', 'Subject B', 'Subject C', 'Subject D', 'Subject E']

# Create figure with proper size
fig, ax = plt.subplots(figsize=(8, 6))

# Use colorblind-safe colors
colors_safe = ['#0173B2', '#DE8F05']  # Blue and orange

# Create grouped bar plot (not overlapping)
x = np.arange(len(categories))
width = 0.35

bars1 = ax.bar(x - width/2, data1, width, label='Group 1', 
               color=colors_safe[0], alpha=0.8)
bars2 = ax.bar(x + width/2, data2, width, label='Group 2', 
               color=colors_safe[1], alpha=0.8)

# Fix y-axis: bar charts should start at 0 so bar length reflects the value
ax.set_ylim(0, 105)

# Add informative labels with units
ax.set_xlabel('Subjects', fontsize=12)
ax.set_ylabel('Task Performance Score (%)', fontsize=12)
ax.set_title('Comparison of Task Performance Between Groups', 
             fontsize=14, fontweight='bold')

# Set x-tick labels
ax.set_xticks(x)
ax.set_xticklabels(categories)

# Add legend without frame
ax.legend(frameon=False, fontsize=11)

# Remove top and right spines
ax.spines['top'].set_visible(False)
ax.spines['right'].set_visible(False)

# Add grid for easier reading
ax.yaxis.grid(True, alpha=0.3, linestyle='--')
ax.set_axisbelow(True)

plt.tight_layout()
fig.savefig('/home/user/exercise4_fixed.pdf', dpi=300, bbox_inches='tight')
plt.show()

print("✓ Fixed visualization saved")
print("\nImprovements made:")
print("  1. Y-axis starts at 0 (shows true scale)")
print("  2. Used colorblind-safe blue and orange")
print("  3. Bars are grouped side-by-side (easy comparison)")
print("  4. Informative title and labels with units")
print("  5. Removed legend frame")
print("  6. Removed top and right spines")
print("  7. Added subtle grid for readability")

**Key Improvements:**

1. **Honest y-axis**: Bars encode value by length, so starting at 0 shows the true scale of the differences
2. **Accessible colors**: Blue and orange remain distinguishable under the most common forms of color vision deficiency
3. **Clear comparison**: Side-by-side bars are easier to compare than overlapping
4. **Complete labels**: Title, axis labels, and units provide full context
5. **Clean design**: Removed unnecessary visual elements
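
A quick calculation shows how much a truncated baseline distorts a bar chart. The sketch below uses the Subject A values from the data (98 and 97) and the truncated lower limit of 95 from the problem list; `apparent_ratio` is a hypothetical helper introduced only for this illustration.

```python
def apparent_ratio(a, b, baseline):
    """Ratio of drawn bar heights when the y-axis starts at `baseline`."""
    return (a - baseline) / (b - baseline)

group1, group2 = 98, 97   # Subject A values from the exercise data

print(f"true ratio:              {group1 / group2:.3f}")
print(f"y-axis starting at 0:    {apparent_ratio(group1, group2, 0):.3f}")
print(f"y-axis truncated at 95:  {apparent_ratio(group1, group2, 95):.3f}")
```

With the axis truncated at 95, the first bar is drawn 1.5 times as tall as the second, although the values differ by only about 1%.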

## Exercise 5 Solution: Multi-Dimensional Data Exploration

### Key Learning Points:
- Using FacetGrid for conditional visualization
- Creating correlation heatmaps
- Interactive 3D exploration with Plotly

In [None]:
# Generate multi-dimensional neural data
n_neurons = 150

data = pd.DataFrame({
    'neuron_id': range(n_neurons),
    'region': np.random.choice(['V1', 'V2', 'MT'], n_neurons),
    'condition': np.random.choice(['Rest', 'Task'], n_neurons),
    'firing_rate': np.random.gamma(5, 2, n_neurons),
    'synchrony': np.random.beta(2, 5, n_neurons),
    'variability': np.random.exponential(0.3, n_neurons),
    'response_latency': np.random.normal(150, 30, n_neurons)
})

# Add some correlations
data['firing_rate'] += data['synchrony'] * 3
data['variability'] -= data['synchrony'] * 0.2

print("Multi-dimensional neural dataset created")
print(f"Neurons: {n_neurons}")
print(f"Regions: {data['region'].unique()}")
print(f"Conditions: {data['condition'].unique()}")

In [None]:
# 1. FacetGrid: Firing rates by region and condition
sns.set_theme(style='whitegrid', palette='colorblind')

g = sns.FacetGrid(data, col='region', row='condition', hue='region',
                  height=3, aspect=1.2, margin_titles=True)
g.map(sns.histplot, 'firing_rate', kde=True, alpha=0.6, bins=15)
g.set_axis_labels('Firing Rate (Hz)', 'Count')
g.set_titles(col_template='{col_name}', row_template='{row_name}')
g.figure.suptitle('Firing Rate Distributions by Region and Condition',
                  y=1.02, fontsize=14, fontweight='bold')

# Save through the grid object so the FacetGrid figure is targeted explicitly
g.savefig('/home/user/exercise5_facetgrid.png', dpi=600, bbox_inches='tight')
plt.show()

print("✓ FacetGrid saved")

In [None]:
# 2. Pairplot: Relationships between neural properties
g = sns.pairplot(data[['firing_rate', 'synchrony', 'variability', 'region']],
                 hue='region', diag_kind='kde', corner=False, height=2.5)
g.figure.suptitle('Pairwise Relationships in Neural Properties',
                  y=1.01, fontsize=14, fontweight='bold')

# Save through the grid object so the pairplot figure is targeted explicitly
g.savefig('/home/user/exercise5_pairplot.png', dpi=600, bbox_inches='tight')
plt.show()

print("✓ Pairplot saved")

In [None]:
# 3. Correlation heatmap
numeric_cols = ['firing_rate', 'synchrony', 'variability', 'response_latency']
correlation = data[numeric_cols].corr()

fig, ax = plt.subplots(figsize=(8, 6))
sns.heatmap(correlation, annot=True, fmt='.2f', cmap='RdBu_r',
            center=0, vmin=-1, vmax=1, square=True, ax=ax,
            cbar_kws={'label': 'Pearson Correlation'})
ax.set_title('Correlation Matrix of Neural Properties',
             fontsize=14, fontweight='bold', pad=15)

plt.tight_layout()
fig.savefig('/home/user/exercise5_correlation.png', dpi=600, bbox_inches='tight')
plt.show()

print("✓ Correlation heatmap saved")

In [None]:
# 4. Interactive 3D clustering visualization
fig = px.scatter_3d(data,
                    x='firing_rate',
                    y='synchrony',
                    z='variability',
                    color='region',
                    symbol='condition',
                    opacity=0.7,
                    hover_data=['neuron_id', 'response_latency'],
                    title='Neural Properties in 3D Space',
                    labels={'firing_rate': 'Firing Rate (Hz)',
                           'synchrony': 'Synchrony',
                           'variability': 'Variability'})

fig.update_layout(
    scene=dict(
        xaxis_title='Firing Rate (Hz)',
        yaxis_title='Synchrony',
        zaxis_title='Variability'
    ),
    height=700
)

fig.write_html('/home/user/exercise5_3d_clustering.html')
print("✓ Interactive 3D plot saved as HTML")

fig.show()

**Explanation:**

1. **FacetGrid**: Shows how distributions vary across categorical combinations
2. **Pairplot**: Quickly reveals all pairwise relationships and potential correlations
3. **Heatmap**: Quantifies correlations between continuous variables
4. **3D interactive**: Allows exploration of clustering in multi-dimensional space
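
Alongside the heatmap, it can help to list the variable pairs ranked by correlation strength. The following sketch does this on a small simulated table that mirrors the exercise's distributions. The seed, the `df` table, and the `"a vs b"` labels are assumptions made for illustration.

```python
import numpy as np
import pandas as pd

rng = np.random.default_rng(1)   # arbitrary seed for this sketch
n = 150
df = pd.DataFrame({
    'synchrony': rng.beta(2, 5, n),
    'firing_rate': rng.gamma(5, 2, n),
    'variability': rng.exponential(0.3, n),
})
df['firing_rate'] += df['synchrony'] * 3   # same induced dependence as in the exercise

corr = df.corr()   # Pearson correlation by default

# Indices of the upper triangle (k=1 skips the diagonal) so each pair appears once
cols = corr.columns
i, j = np.triu_indices(len(cols), k=1)
pairs = pd.Series(corr.to_numpy()[i, j],
                  index=[f"{cols[a]} vs {cols[b]}" for a, b in zip(i, j)])

# Order pairs from strongest to weakest absolute correlation
pairs = pairs.reindex(pairs.abs().sort_values(ascending=False).index)
print(pairs.round(2))
```

Because the gamma-distributed firing rate varies far more than the added `synchrony * 3` term, the induced correlation is expected to be weak, so a pale cell in the heatmap for that pair is not necessarily a mistake.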

## Summary

### Key Takeaways from Solutions:

1. **Planning**: Think about your message before choosing plot type
2. **Accessibility**: Always use colorblind-safe palettes
3. **Clarity**: Remove unnecessary elements (chartjunk)
4. **Completeness**: Include all labels, units, and legends
5. **Consistency**: Apply same styling across all figures
6. **Format**: Choose appropriate format (PDF for publications, HTML for interactivity)

### Common Pitfalls to Avoid:

- ❌ Truncated axes
- ❌ Red-green color schemes
- ❌ Missing labels or units
- ❌ Overlapping elements
- ❌ Too much information in one plot
- ❌ Inappropriate plot types

### Next Steps:

1. Practice with your own research data
2. Create a personal style template
3. Build a library of reusable plotting functions
4. Always get feedback on figures before publication
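
As a starting point for steps 2 and 3, a style template and a reusable helper might look like the sketch below. The `STYLE` dictionary and the `label_panels` function are hypothetical names, and the values simply echo the settings used in the Exercise 1 solution.

```python
import matplotlib.pyplot as plt

# Hypothetical personal style template; values are illustrative
STYLE = {
    'axes.spines.top': False,
    'axes.spines.right': False,
    'axes.labelsize': 10,
    'axes.titlesize': 11,
    'axes.titleweight': 'bold',
    'legend.frameon': False,
    'savefig.dpi': 300,
    'savefig.bbox': 'tight',
}

def label_panels(axes, x=-0.15, y=1.1):
    """Add bold A, B, C, ... labels to each axes and return the labels used."""
    labels = []
    for i, ax in enumerate(axes.flat):
        label = chr(65 + i)
        ax.text(x, y, label, transform=ax.transAxes,
                fontsize=16, fontweight='bold', va='top')
        labels.append(label)
    return labels

# rc_context applies the template only inside the with-block
with plt.rc_context(STYLE):
    fig, axes = plt.subplots(2, 2, figsize=(7, 6))
    labels = label_panels(axes)
    top_visible = axes[0, 0].spines['top'].get_visible()

print(labels, top_visible)
plt.close(fig)
```

Using `plt.rc_context` keeps the template from leaking into other figures. Calling `plt.rcParams.update(STYLE)` once at the top of a notebook would instead apply it globally.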

**Remember:** A figure should tell its story without requiring the caption!