# Part 3: Data Visualization - Sports Radio Dashboard
## CSC 2053 - Lab 12

Time to make your data visual! In this lab, you'll create compelling charts and graphs to tell the story of sports radio.

**What You'll Learn:**
- Matplotlib fundamentals
- Pandas built-in plotting
- Seaborn for statistical graphics
- Customizing visualizations
- Creating multi-panel dashboards

**What You'll Create:**
- Bar charts of top ownership groups
- Geographic distribution maps
- Power distribution histograms
- Format breakdown pie charts
- A comprehensive sports radio dashboard



---
## Setup: Load Your Data

Let's load the same data from Lab 2 and prepare for visualization.

In [None]:
import pandas as pd
import numpy as np
import matplotlib.pyplot as plt
import seaborn as sns

# Set style for better-looking plots
sns.set_style('whitegrid')
plt.rcParams['figure.figsize'] = (10, 6)

# Choose your sport: "MLB", "NHL", "NFL", or "NBA"
sport = "MLB"  # Change this!

url = f'https://raw.githubusercontent.com/CSC-2053-100-Fall25/python-datascience-template/main/{sport}.csv'
df = pd.read_csv(url)

print(f"✓ Loaded {sport} data: {len(df)} stations")
print("✓ Matplotlib and Seaborn ready!")

---
## Part 1: Matplotlib Basics

Matplotlib is Python's foundational plotting library. The Pandas plotting methods and Seaborn, both covered later in this lab, are built on top of it.
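Matplotlib can be driven in two ways: through `plt.*` calls that act on the "current" figure, or through explicit `Figure` and `Axes` objects. The sketch below shows the explicit form, which the multi-panel sections of this lab rely on. The sample values are made up for illustration and are not taken from the lab dataset.

```python
import matplotlib.pyplot as plt

# Hypothetical sample values, not taken from the lab dataset
frequencies = [88.1, 94.7, 98.5, 101.1, 106.9]
powers = [3.0, 50.0, 25.0, 100.0, 6.0]

# Explicit interface: create the Figure and Axes, then call methods on the Axes
fig, ax = plt.subplots(figsize=(6, 4))
ax.scatter(frequencies, powers, alpha=0.7)
ax.set_xlabel('Frequency (MHz)')
ax.set_ylabel('Power (kW)')
ax.set_title('Explicit Axes interface')
plt.show()
```

Calls such as `plt.xlabel(...)` and `ax.set_xlabel(...)` do the same job; the second form is easier to follow once a figure holds more than one plot.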

### Your First Plot

In [None]:
# Simple scatter plot: frequency vs power
plt.figure(figsize=(10, 6))
plt.scatter(df['frequency'], df['erp'], alpha=0.5)
plt.xlabel('Frequency (MHz)')
plt.ylabel('Power (kW)')
plt.title('Station Power vs Frequency')
plt.show()

### Bar Charts

In [None]:
# Top 10 states by station count
top_states = df['state'].value_counts().head(10)

plt.figure(figsize=(12, 6))
plt.bar(top_states.index, top_states.values, color='steelblue')
plt.xlabel('State')
plt.ylabel('Number of Stations')
plt.title(f'Top 10 States - {sport} Radio Affiliates')
plt.xticks(rotation=45)
plt.tight_layout()
plt.show()

### Histograms

In [None]:
# Distribution of station power
plt.figure(figsize=(10, 6))
plt.hist(df['erp'], bins=50, color='coral', edgecolor='black', alpha=0.7)
plt.xlabel('Power (kW)')
plt.ylabel('Number of Stations')
plt.title('Distribution of Station Power')
plt.axvline(df['erp'].mean(), color='red', linestyle='--', label=f'Mean: {df["erp"].mean():.1f} kW')
plt.legend()
plt.show()

### Exercise 1.1: Create Basic Plots

Make three different plot types to explore your data.

In [None]:
# YOUR CODE HERE

# 1. Create a histogram of frequencies
# Hint: Use plt.hist() with df['frequency']


# 2. Create a horizontal bar chart of top 10 formats
# Hint: Use .value_counts() and plt.barh()


# 3. Create a scatter plot of latitude vs longitude
# Hint: Put df['lon'] on the x-axis and df['lat'] on the y-axis so the plot resembles a map


---
## Part 2: Pandas Built-in Plotting

Pandas DataFrames have plotting methods built right in - super convenient!
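As a minimal sketch of how this works: `.plot()` draws with Matplotlib and returns a Matplotlib `Axes`, so you can keep customizing the result with the usual `set_*` methods. The `toy` DataFrame below is hypothetical and only stands in for the lab's station table.

```python
import pandas as pd
import matplotlib.pyplot as plt

# Hypothetical toy data standing in for the lab's station table
toy = pd.DataFrame({'state': ['PA', 'PA', 'NJ', 'NY', 'PA', 'NJ']})

# value_counts() returns a Series sorted from most to least common
counts = toy['state'].value_counts()

# .plot() returns a Matplotlib Axes that can be customized further
ax = counts.plot(kind='bar', color='teal')
ax.set_ylabel('Number of Stations')
ax.set_title('Stations per State (toy data)')
plt.show()
```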

### Direct from DataFrames

In [None]:
# Top formats - one line!
df['new_format'].value_counts().head(10).plot(kind='barh', figsize=(10, 6), color='teal')
plt.xlabel('Number of Stations')
plt.title('Top 10 Radio Formats')
plt.tight_layout()
plt.show()

In [None]:
# Stations by state - sorted
by_state = df['state'].value_counts().head(15)
by_state.plot(kind='bar', figsize=(12, 6), color='darkgreen')
plt.ylabel('Number of Stations')
plt.title(f'{sport} Radio Network - Geographic Distribution')
plt.xticks(rotation=45)
plt.tight_layout()
plt.show()

### Pie Charts

In [None]:
# Format breakdown (top 5 + other)
format_counts = df['new_format'].value_counts()
top_5_formats = format_counts.head(5)
other = format_counts[5:].sum()
pie_data = pd.concat([top_5_formats, pd.Series({'Other': other})])

plt.figure(figsize=(10, 8))
plt.pie(pie_data, labels=pie_data.index, autopct='%1.1f%%', startangle=90)
plt.title(f'{sport} Station Formats')
plt.axis('equal')
plt.show()

### Exercise 2.1: Pandas Plotting

Use Pandas' built-in plotting for quick visualizations.

In [None]:
# YOUR CODE HERE

# 1. Create a horizontal bar chart of top 10 ownership groups
# Hint: df['owner'].value_counts().head(10).plot(kind='barh')


# 2. Create a pie chart showing USA vs Canada stations
# Hint: Use df['country'].value_counts()


# 3. Create a line plot showing cumulative station count by frequency
# Hint: Sort by frequency, then plot the sorted values against a running count such as range(1, len(df) + 1)


---
## Part 3: Seaborn for Statistical Graphics

Seaborn makes beautiful, informative statistical plots with minimal code.
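A small sketch of what "minimal code" means here: Seaborn functions take a DataFrame plus column names, and some of them aggregate for you. With default settings, `sns.barplot` draws the mean of `y` for each category of `x`. The `toy` data and its values are hypothetical, chosen only so the means are easy to check by hand.

```python
import pandas as pd
import matplotlib.pyplot as plt
import seaborn as sns

# Hypothetical toy data: power (kW) for a handful of stations
toy = pd.DataFrame({
    'country': ['USA', 'USA', 'USA', 'Canada', 'Canada'],
    'erp': [10.0, 20.0, 30.0, 5.0, 15.0],
})

# The same aggregation done by hand with Pandas, for comparison
means = toy.groupby('country')['erp'].mean()
print(means)

# One call: Seaborn groups by 'country' and plots the mean of 'erp'
ax = sns.barplot(data=toy, x='country', y='erp')
ax.set_title('Mean Power by Country (toy data)')
plt.show()
```

Seaborn labels the axes with the column names automatically; override them with `ax.set_xlabel()` and `ax.set_ylabel()` when you want friendlier text.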

### Count Plots

In [None]:
# Top 10 states with seaborn
top_10_states = df['state'].value_counts().head(10).index
df_top_states = df[df['state'].isin(top_10_states)]

plt.figure(figsize=(12, 6))
sns.countplot(data=df_top_states, y='state', order=top_10_states, palette='viridis')
plt.xlabel('Number of Stations')
plt.ylabel('State')
plt.title(f'{sport} Radio Affiliates by State (Top 10)')
plt.tight_layout()
plt.show()

### Box Plots

In [None]:
# Power distribution by format (top 8 formats)
top_formats = df['new_format'].value_counts().head(8).index
df_top_formats = df[df['new_format'].isin(top_formats)]

plt.figure(figsize=(14, 6))
sns.boxplot(data=df_top_formats, x='new_format', y='erp', palette='Set2')
plt.xlabel('Format')
plt.ylabel('Power (kW)')
plt.title('Station Power Distribution by Format')
plt.xticks(rotation=45, ha='right')
plt.tight_layout()
plt.show()

### Scatter Plots with Regression

In [None]:
# Is there a relationship between frequency and power?
plt.figure(figsize=(10, 6))
sns.regplot(data=df, x='frequency', y='erp', scatter_kws={'alpha': 0.3}, line_kws={'color': 'red'})
plt.xlabel('Frequency (MHz)')
plt.ylabel('Power (kW)')
plt.title('Frequency vs Power (with regression line)')
plt.tight_layout()
plt.show()

### Exercise 3.1: Seaborn Visualizations

Create sophisticated statistical graphics with Seaborn.

In [None]:
# YOUR CODE HERE

# 1. Create a count plot of the top 10 ownership groups
# Hint: Similar to the state example above


# 2. Create a violin plot comparing power distribution between USA and Canada
# Hint: sns.violinplot(data=df, x='country', y='erp')


# 3. Create a scatter plot of population vs number of stations in each market
# Hint: You'll need to group by market first, then plot


---
## Part 4: Customization and Styling

Make your plots publication-ready with professional styling.

### Color Palettes

In [None]:
# Compare different color schemes
top_5_states = df['state'].value_counts().head(5)

fig, axes = plt.subplots(1, 3, figsize=(18, 5))

# Single color
top_5_states.plot(kind='bar', ax=axes[0], color='steelblue')
axes[0].set_title('Single Color')
axes[0].set_ylabel('Station Count')

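# Colormap: sample one color per bar (a Series given colormap= is drawn in a single color)
plasma_colors = list(plt.cm.plasma(np.linspace(0, 1, len(top_5_states))))
top_5_states.plot(kind='bar', ax=axes[1], color=plasma_colors)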
axes[1].set_title('Plasma Colormap')
axes[1].set_ylabel('Station Count')

# Custom palette
colors = ['#FF6B6B', '#4ECDC4', '#45B7D1', '#FFA07A', '#98D8C8']
top_5_states.plot(kind='bar', ax=axes[2], color=colors)
axes[2].set_title('Custom Colors')
axes[2].set_ylabel('Station Count')

plt.tight_layout()
plt.show()

### Annotations and Labels

In [None]:
# Top owners with value labels
top_owners = df['owner'].value_counts().head(10)

fig, ax = plt.subplots(figsize=(12, 6))
bars = ax.barh(range(len(top_owners)), top_owners.values, color='darkslateblue')
ax.set_yticks(range(len(top_owners)))
ax.set_yticklabels(top_owners.index)
ax.set_xlabel('Number of Stations', fontsize=12, fontweight='bold')
ax.set_title(f'{sport} Radio - Top 10 Broadcasting Companies', fontsize=14, fontweight='bold')

# Add value labels just past the end of each bar
for i, value in enumerate(top_owners.values):
    ax.text(value + 0.5, i, str(value), va='center', fontweight='bold')

ax.grid(axis='x', alpha=0.3)
plt.tight_layout()
plt.show()

### Themes and Styles

In [None]:
# Try different seaborn styles
# A style only affects axes created while it is active, so each
# subplot is added inside its own sns.axes_style() context.
styles = ['darkgrid', 'whitegrid', 'dark', 'white', 'ticks']

fig = plt.figure(figsize=(18, 10))

for idx, style in enumerate(styles):
    with sns.axes_style(style):
        ax = fig.add_subplot(2, 3, idx + 1)
    df['erp'].plot(kind='hist', bins=30, ax=ax, color='coral', edgecolor='black')
    ax.set_title(f'Style: {style}', fontweight='bold')
    ax.set_xlabel('Power (kW)')
    ax.set_ylabel('Number of Stations')

plt.tight_layout()
plt.show()

# The global style is still 'whitegrid': the context manager restores it

### Exercise 4.1: Professional Styling

Create a beautifully styled, publication-ready chart.

In [None]:
# YOUR CODE HERE

# Create a bar chart of top 8 formats with:
# 1. Custom color palette (your choice)
# 2. Value labels on each bar
# 3. Bold, descriptive title
# 4. Rotated x-axis labels
# 5. Grid lines
# 6. Tight layout

# Make it look professional!


---
## Part 5: Multi-Panel Figures

Combine multiple plots into comprehensive dashboards.
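Before the full examples, here is a minimal sketch of the mechanics: `plt.subplots(rows, cols)` returns one `Figure` and a NumPy array of `Axes`, which you can index as `axes[row, col]` or loop over with `axes.flat`. The data is random and purely illustrative.

```python
import numpy as np
import matplotlib.pyplot as plt

rng = np.random.default_rng(0)  # synthetic data, not the lab dataset

fig, axes = plt.subplots(2, 2, figsize=(8, 6))
print(axes.shape)  # (2, 2): index a panel with axes[row, col]

# axes.flat walks the panels left to right, top to bottom
for ax, letter in zip(axes.flat, 'ABCD'):
    ax.hist(rng.normal(size=200), bins=20, color='steelblue')
    ax.set_title(f'{letter}) Panel', fontweight='bold', loc='left')

fig.suptitle('Four panels, one figure')
fig.tight_layout()
plt.show()
```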

### Subplots Basics

In [None]:
# 2x2 grid of different analyses
fig, axes = plt.subplots(2, 2, figsize=(16, 12))

# Top left: Power histogram
axes[0, 0].hist(df['erp'], bins=50, color='coral', edgecolor='black', alpha=0.7)
axes[0, 0].set_xlabel('Power (kW)')
axes[0, 0].set_ylabel('Number of Stations')
axes[0, 0].set_title('A) Power Distribution', fontweight='bold', loc='left')
axes[0, 0].axvline(df['erp'].median(), color='red', linestyle='--', label=f'Median: {df["erp"].median():.1f}')
axes[0, 0].legend()

# Top right: Top states
top_states = df['state'].value_counts().head(10)
axes[0, 1].barh(range(len(top_states)), top_states.values, color='steelblue')
axes[0, 1].set_yticks(range(len(top_states)))
axes[0, 1].set_yticklabels(top_states.index)
axes[0, 1].set_xlabel('Number of Stations')
axes[0, 1].set_title('B) Top 10 States', fontweight='bold', loc='left')
axes[0, 1].grid(axis='x', alpha=0.3)

# Bottom left: Frequency histogram
axes[1, 0].hist(df['frequency'], bins=40, color='teal', edgecolor='black', alpha=0.7)
axes[1, 0].set_xlabel('Frequency (MHz)')
axes[1, 0].set_ylabel('Number of Stations')
axes[1, 0].set_title('C) Frequency Distribution', fontweight='bold', loc='left')

# Bottom right: Geographic scatter
axes[1, 1].scatter(df['lon'], df['lat'], alpha=0.5, c='darkgreen', s=20)
axes[1, 1].set_xlabel('Longitude')
axes[1, 1].set_ylabel('Latitude')
axes[1, 1].set_title('D) Geographic Distribution', fontweight='bold', loc='left')

plt.suptitle(f'{sport} Radio Affiliates - Overview Dashboard', fontsize=16, fontweight='bold', y=0.995)
plt.tight_layout()
plt.show()

### Advanced Layouts

In [None]:
# Custom layout with different sized panels
fig = plt.figure(figsize=(16, 10))
gs = fig.add_gridspec(3, 3, hspace=0.3, wspace=0.3)

# Large plot spanning top row
ax1 = fig.add_subplot(gs[0, :])
top_owners = df['owner'].value_counts().head(15)
ax1.bar(range(len(top_owners)), top_owners.values, color='darkslateblue')
ax1.set_xticks(range(len(top_owners)))
ax1.set_xticklabels(top_owners.index, rotation=45, ha='right')
ax1.set_ylabel('Number of Stations')
ax1.set_title('Top 15 Broadcasting Companies', fontweight='bold', fontsize=12)
ax1.grid(axis='y', alpha=0.3)

# Bottom left: Format pie
ax2 = fig.add_subplot(gs[1:, 0])
format_counts = df['new_format'].value_counts()
top_formats = format_counts.head(5)
other = format_counts[5:].sum()
pie_data = pd.concat([top_formats, pd.Series({'Other': other})])
ax2.pie(pie_data, labels=pie_data.index, autopct='%1.1f%%', startangle=90)
ax2.set_title('Format Breakdown', fontweight='bold')

# Bottom middle: Power by format box
ax3 = fig.add_subplot(gs[1:, 1])
top_5_formats = df['new_format'].value_counts().head(5).index
df_top_5 = df[df['new_format'].isin(top_5_formats)]
df_top_5.boxplot(column='erp', by='new_format', ax=ax3)
ax3.set_xlabel('Format')
ax3.set_ylabel('Power (kW)')
ax3.set_title('Power by Format (Top 5)', fontweight='bold')
plt.sca(ax3)
plt.xticks(rotation=45, ha='right')
plt.suptitle('')  # Remove auto-generated title

# Bottom right: Geographic
ax4 = fig.add_subplot(gs[1:, 2])
ax4.scatter(df['lon'], df['lat'], alpha=0.4, c='crimson', s=15)
ax4.set_xlabel('Longitude')
ax4.set_ylabel('Latitude')
ax4.set_title('Station Locations', fontweight='bold')

fig.suptitle(f'{sport} Radio Network Analysis', fontsize=16, fontweight='bold')
plt.show()

### Exercise 5.1: Build Your Dashboard

Create a 2x2 dashboard with different analyses.

In [None]:
# YOUR CODE HERE

# Create a 2x2 subplot dashboard with:
# 1. Top-left: Histogram of frequencies
# 2. Top-right: Bar chart of top 8 markets (by station count)
# 3. Bottom-left: Scatter plot of lat/lon colored by power
# 4. Bottom-right: Box plot of power by country

# Add an overall title and make it look professional!


---
## Putting It All Together

Create a comprehensive **Sports Radio Dashboard** that tells the complete story of your data.

In [None]:
# YOUR CODE HERE

# Create a comprehensive dashboard with at least 6 visualizations:

# 1. Geographic distribution (scatter plot of lat/lon)
# 2. Top ownership groups (bar chart)
# 3. Format breakdown (pie chart or bar chart)
# 4. Power distribution (histogram with statistics)
# 5. Top markets by station count
# 6. Any additional analysis you find interesting!

# Use a custom layout (3x2, 2x3, or custom gridspec)
# Add professional styling, colors, labels, and titles
# Tell a story with your visualizations!

print(f"Creating {sport} Radio Network Dashboard...")

# Your dashboard code here


---
## Challenge Problem

**Power vs Coverage Analysis**

Create a sophisticated multi-panel visualization exploring the relationship between station power, geographic coverage, and market size:

1. **Main plot** (large, spanning 2 columns): Scatter plot of latitude vs longitude where:
   - Point size represents station power (ERP)
   - Point color represents format (use top 5 formats + "Other")
   - Add a legend showing format colors
   - Title should indicate the sport

2. **Side panel 1**: Distribution of power (histogram) with vertical lines showing:
   - Mean (red dashed line)
   - Median (blue dashed line)
   - 25th and 75th percentiles (green dashed lines)
   - Add legend

3. **Side panel 2**: Top 10 markets by average station power (horizontal bar chart)
   - Color bars by average power (use colormap)
   - Add value labels on bars

4. **Bottom panel**: Scatter plot of market population vs average power in that market
   - Size of points by number of stations in market
   - Add trend line
   - Label outliers (markets with unusual power/population ratios)

**Bonus:** Add interactive tooltips or annotations for the highest-powered stations.

**Extra Bonus:** Save your dashboard as a high-resolution PNG file using `plt.savefig()`
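As a rough sketch of the saving step, the example below writes a small placeholder figure to a PNG. The file name and the temporary-directory location are assumptions made for illustration; in your own notebook you can pass any path. `dpi=300` raises the resolution, and `bbox_inches='tight'` trims extra whitespace around the figure.

```python
import os
import tempfile
import matplotlib.pyplot as plt

# Placeholder figure; your dashboard figure would go here instead
fig, ax = plt.subplots(figsize=(6, 4))
ax.plot([0, 1, 2], [0, 1, 4], marker='o')
ax.set_title('Save example')

# Hypothetical output path in the system's temporary directory
out_path = os.path.join(tempfile.gettempdir(), 'example_dashboard.png')

# Save before calling plt.show() so the figure is still fully populated
fig.savefig(out_path, dpi=300, bbox_inches='tight')
print(os.path.exists(out_path))
```

Calling `fig.savefig()` on the figure object, rather than `plt.savefig()`, makes it explicit which figure is written when several are open.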

In [None]:
# YOUR CODE HERE

def create_power_coverage_dashboard(df, sport):
    """
    Create a sophisticated multi-panel visualization analyzing
    power, coverage, and market relationships.
    
    Parameters:
        df (DataFrame): Sports radio affiliate data
        sport (str): Sport name for title
    
    Returns:
        fig: Matplotlib figure object
    """
    # YOUR CODE HERE
    pass

# Create and display the dashboard
fig = create_power_coverage_dashboard(df, sport)
# plt.savefig(f'{sport}_power_coverage_dashboard.png', dpi=300, bbox_inches='tight')  # Uncomment to save
plt.show()

---
## Wrap-Up

### Real Skills You've Gained:
- Transform data into visual insights
- Choose the right plot type for your question
- Create publication-ready visualizations
- Build comprehensive dashboards
- Tell stories with data

### Visualization Best Practices:
1. **Choose the right chart type:**
   - Comparisons → Bar charts
   - Distributions → Histograms, box plots
   - Relationships → Scatter plots
   - Proportions → Pie charts
   - Trends → Line plots

2. **Keep it simple:** Don't overload with information
3. **Label everything:** Axes, titles, legends
4. **Use color thoughtfully:** Highlight important information
5. **Tell a story:** Guide viewers to your insights

### Next Steps:
- **Lab 4:** Create interactive web maps with Folium
- **Beyond:** Explore Plotly for interactive visualizations

