# Executive Video Insights -- Interview Preparation Reference

**Purpose:** Comprehensive analysis of video performance across 11 Brightcove accounts  
**Framework:** STAR-based talking points with quantified metrics for interview preparation  
**Data Source:** Unified daily_analytics table in DuckDB (consolidated ETL pipeline output)

---

### How to Use This Notebook

1. **Run all cells** -- each section computes metrics and stores them for the final cheat sheet
2. **Review each "Key Takeaway"** -- fill in the **[bracketed placeholders]** with your actual numbers
3. **Jump to Section 11** for the compiled interview cheat sheet and 7 STAR talking points
4. **Customize** -- modify SQL queries to focus on specific time periods or channels
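
Step 2 can be partly automated. The sketch below is not part of the original notebook: it fills `[key]` placeholders in a takeaway string from one of the metric dictionaries. The helper name `fill_placeholders` and the sample values are hypothetical and purely illustrative.

```python
import re


def fill_placeholders(template, metrics):
    """Replace [key] placeholders with values from `metrics`.

    Keys missing from `metrics` are left untouched so that gaps stay visible.
    """
    def _sub(match):
        key = match.group(1)
        return str(metrics[key]) if key in metrics else match.group(0)

    return re.sub(r"\[(\w+)\]", _sub, template)


# Illustrative values only, not real results
sample_metrics = {"total_videos": "1,234", "total_channels": 11}
template = ("covering **[total_videos]** videos across **[total_channels]** "
            "accounts over **[period_days]** days")
filled = fill_placeholders(template, sample_metrics)
print(filled)
```

Leaving unknown keys in place makes it easy to spot which numbers still need to be looked up before the interview.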

### Sections

| # | Section | Business Question |
|---|---------|-------------------|
| 1 | Executive Summary Dashboard | Overall platform health at a glance |
| 2 | Viewing Volume & Adoption Trends | Is video consumption growing? |
| 3 | Engagement Quality Scorecard | Are viewers actually watching our content? |
| 4 | Engagement Funnel & Drop-off (The Big Win) | Where exactly do viewers drop off? |
| 5 | Content Strategy: Duration Sweet Spot | What video length drives the best engagement? |
| 6 | Top Performing Content & Content Gaps | Greatest hits vs. problem videos |
| 7 | Channel/Account Performance Comparison | Where should we invest or consolidate? |
| 8 | Device & Platform Strategy | How are employees watching videos? |
| 9 | Content Lifecycle & Freshness | Which content is stale? |
| 10 | Regional & Temporal Patterns | When and where are employees watching? |
| 11 | Interview Cheat Sheet | All key numbers + 7 STAR talking points |

---
## Section 0: Setup & Configuration

In [None]:
# ---------------------------------------------------------------------------
# Imports & Configuration
# ---------------------------------------------------------------------------
import duckdb
import pandas as pd
import numpy as np
from pathlib import Path
from datetime import datetime, timedelta

import matplotlib.pyplot as plt
import matplotlib.ticker as mticker
import seaborn as sns

# Display
pd.set_option('display.max_columns', None)
pd.set_option('display.max_rows', 100)
pd.set_option('display.float_format', '{:,.2f}'.format)
pd.set_option('display.max_colwidth', 60)

# Style
plt.style.use('seaborn-v0_8-whitegrid')
FIGSIZE = (12, 5)

# Color scheme
C = {
    'success':  '#27ae60',
    'fail':     '#e74c3c',
    'warn':     '#f39c12',
    'neutral':  '#2c3e50',
    'gray':     '#95a5a6',
    'success_light': '#a9dfbf',
    'fail_light':    '#f5b7b1',
    'warn_light':    '#fad7a0',
    'neutral_light': '#aeb6bf',
    'gray_light':    '#d5dbdb',
    'blue':     '#2980b9',
    'blue_light': '#85c1e9',
    'purple':   '#8e44ad',
    'purple_light': '#d2b4de',
}

# Funnel palette (one color per stage, green at the first stage through to red at the last)
FUNNEL_COLORS = [C['success'], C['blue'], C['warn'], '#e67e22', C['fail']]

# ---------------------------------------------------------------------------
# Database connection
# ---------------------------------------------------------------------------
DB_PATH = Path('../output/analytics.duckdb')
if not DB_PATH.exists():
    raise FileNotFoundError(f"Database not found at {DB_PATH}. Run the pipeline first.")

conn = duckdb.connect(str(DB_PATH), read_only=True)


def query(sql):
    """Execute SQL and return a DataFrame."""
    return conn.execute(sql).fetchdf()


def fmt_num(n):
    """Format number with comma separators."""
    if pd.isna(n):
        return 'N/A'
    return f"{int(n):,}"


def fmt_pct(p):
    """Format percentage to one decimal."""
    if pd.isna(p):
        return 'N/A'
    return f"{p:.1f}%"


def truncate_name(name, max_len=50):
    """Truncate long video names for chart labels."""
    if pd.isna(name):
        return '(untitled)'
    name = str(name)  # slicing below requires a string
    return name[:max_len] + '...' if len(name) > max_len else name


def annotate_bars(ax, fmt=',.0f', suffix='', fontsize=9, offset=0.5, horizontal=None):
    """Add value labels to bar chart patches.

    Orientation is guessed from patch shape unless `horizontal` is given.
    The guess compares data-unit dimensions, so pass horizontal=True/False
    explicitly when bar values are small.
    """
    for p in ax.patches:
        is_horizontal = (p.get_width() > p.get_height()) if horizontal is None else horizontal
        if is_horizontal:
            val = p.get_width()   # bar length is the value
            ax.text(val + offset, p.get_y() + p.get_height()/2,
                    f'{val:{fmt}}{suffix}', va='center', fontsize=fontsize)
        else:
            val = p.get_height()  # bar height is the value
            ax.text(p.get_x() + p.get_width()/2, val + offset,
                    f'{val:{fmt}}{suffix}', ha='center', fontsize=fontsize)


# ---------------------------------------------------------------------------
# Metric accumulators for the final cheat sheet (Section 11)
# ---------------------------------------------------------------------------
EXEC = {}       # Executive summary KPIs
TRENDS = {}     # Viewing trends
ENGAGEMENT = {} # Engagement quality
FUNNEL = {}     # Engagement funnel
CONTENT = {}    # Duration / content strategy
TOPVIDS = {}    # Top performing content
CHANNELS = {}   # Channel performance
DEVICE = {}     # Device strategy
LIFECYCLE = {}  # Content lifecycle
REGIONAL = {}   # Regional patterns

print(f"Connected to: {DB_PATH}")
print(f"Database size: {DB_PATH.stat().st_size / (1024*1024):.1f} MB")
print(f"Notebook generated: {datetime.now().strftime('%Y-%m-%d %H:%M')}")

---
## Section 1: Executive Summary Dashboard

**Business Question:** What is the overall health of our video platform, at a glance?

In [None]:
# ---------------------------------------------------------------------------
# 1. Executive Summary -- KPIs
# ---------------------------------------------------------------------------
kpi = query("""
    SELECT
        COUNT(DISTINCT video_id)                              AS total_videos,
        COUNT(DISTINCT channel)                               AS total_channels,
        SUM(video_view)                                       AS total_views,
        SUM(video_impression)                                 AS total_impressions,
        ROUND(SUM(video_seconds_viewed) / 3600.0, 0)         AS total_watch_hours,
        ROUND(AVG(engagement_score), 1)                       AS avg_engagement,
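        -- video_engagement_100 is a raw count (see Section 4), so use a ratio of sums
        ROUND(SUM(video_engagement_100) * 100.0 / NULLIF(SUM(video_view), 0), 1) AS avg_completion_rate,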
        ROUND(SUM(video_view)*100.0 / NULLIF(SUM(video_impression),0), 1) AS play_rate,
        MIN(date)                                             AS period_start,
        MAX(date)                                             AS period_end,
        DATEDIFF('day', MIN(date), MAX(date))                 AS period_days
    FROM daily_analytics
""").iloc[0]

# Store for cheat sheet
EXEC = kpi.to_dict()

# Print dashboard
print("=" * 66)
print("          EXECUTIVE SUMMARY -- VIDEO PLATFORM HEALTH")
print("=" * 66)
print(f"")
print(f"  Period:                {kpi['period_start']}  to  {kpi['period_end']}  ({int(kpi['period_days'])} days)")
print(f"  Data freshness:        {(datetime.now().date() - pd.Timestamp(kpi['period_end']).date()).days} days since last data point")
print(f"")
print(f"  Total Videos:          {fmt_num(kpi['total_videos'])}")
print(f"  Total Channels:        {fmt_num(kpi['total_channels'])}")
print(f"  Total Views:           {fmt_num(kpi['total_views'])}")
print(f"  Total Impressions:     {fmt_num(kpi['total_impressions'])}")
print(f"  Total Watch Hours:     {fmt_num(kpi['total_watch_hours'])}")
print(f"")
print(f"  Avg Engagement Score:  {fmt_pct(kpi['avg_engagement'])}")
print(f"  Avg Completion Rate:   {fmt_pct(kpi['avg_completion_rate'])}")
print(f"  Play Rate:             {fmt_pct(kpi['play_rate'])}  (views / impressions)")
print("=" * 66)

### Key Takeaway (Interview-Ready)

> I built and managed a unified video analytics platform covering **[total_videos]** videos across **[total_channels]** Brightcove accounts in 4 business categories (internet/intranet, research, global wealth management, events). Over **[period_days]** days, the platform tracked **[total_views]** views and **[total_watch_hours]** watch hours, with an average engagement score of **[avg_engagement]** and a play rate of **[play_rate]**.
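
As a worked illustration of two KPIs in this summary, the sketch below mirrors the SQL arithmetic in plain Python: play rate is views as a percentage of impressions (with the same divide-by-zero guard that `NULLIF` provides), and watch hours are seconds viewed divided by 3600. The function names and inputs are hypothetical.

```python
def play_rate(views, impressions):
    """Views as a percentage of impressions; None when there are no impressions."""
    if not impressions:
        return None  # mirrors NULLIF(SUM(video_impression), 0) yielding NULL
    return round(views * 100.0 / impressions, 1)


def watch_hours(seconds_viewed):
    """Convert total seconds viewed to whole hours."""
    return round(seconds_viewed / 3600.0)


# Illustrative inputs only
rate_example = play_rate(450, 1000)
no_impressions = play_rate(10, 0)
hours_example = watch_hours(7200)
print(rate_example, no_impressions, hours_example)
```

Computing the rate from totals, rather than averaging per-row rates, keeps low-traffic days from having the same weight as high-traffic days.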

---
## Section 2: Viewing Volume & Adoption Trends

**Business Question:** Is video consumption growing? What are the trends?

In [None]:
# ---------------------------------------------------------------------------
# 2. Viewing Volume & Adoption Trends
# ---------------------------------------------------------------------------
# Exclude the current (incomplete) month to avoid skewing trend comparisons
monthly = query("""
    SELECT
        DATE_TRUNC('month', date)      AS month,
        SUM(video_view)                AS total_views,
        COUNT(DISTINCT video_id)       AS unique_videos
    FROM daily_analytics
    WHERE video_view > 0
      AND DATE_TRUNC('month', date) < DATE_TRUNC('month', CURRENT_DATE)
    GROUP BY 1
    ORDER BY 1
""")

# Growth calculation: first complete month vs last complete month
if len(monthly) >= 2:
    first_views = monthly['total_views'].iloc[0]
    last_views  = monthly['total_views'].iloc[-1]
    growth_pct  = (last_views - first_views) / first_views * 100 if first_views > 0 else 0
    avg_monthly = monthly['total_views'].mean()
    trend_dir   = 'upward' if growth_pct > 5 else ('downward' if growth_pct < -5 else 'stable')
else:
    growth_pct = 0
    avg_monthly = monthly['total_views'].mean()
    trend_dir = 'insufficient data'

TRENDS = {
    'growth_pct': growth_pct,
    'avg_monthly_views': avg_monthly,
    'trend_direction': trend_dir,
    'num_months': len(monthly),
    'first_month_views': first_views if len(monthly) >= 2 else 0,
    'last_month_views': last_views if len(monthly) >= 2 else 0,
}

# -- Chart: Monthly views (bars) + unique videos (line overlay) --
fig, ax1 = plt.subplots(figsize=FIGSIZE)

x = range(len(monthly))
labels = monthly['month'].dt.strftime('%Y-%m').tolist()

bars = ax1.bar(x, monthly['total_views'], color=C['blue_light'], edgecolor=C['blue'],
               linewidth=0.5, label='Total Views', zorder=2)
ax1.set_ylabel('Total Views', fontsize=11, color=C['blue'])
ax1.yaxis.set_major_formatter(mticker.FuncFormatter(lambda v, _: fmt_num(v)))
ax1.tick_params(axis='y', labelcolor=C['blue'])

ax2 = ax1.twinx()
ax2.plot(x, monthly['unique_videos'], color=C['fail'], marker='o', linewidth=2,
         markersize=5, label='Unique Videos Viewed', zorder=3)
ax2.set_ylabel('Unique Videos Viewed', fontsize=11, color=C['fail'])
ax2.tick_params(axis='y', labelcolor=C['fail'])

ax1.set_xticks(x)
ax1.set_xticklabels(labels, rotation=45, ha='right', fontsize=9)
ax1.set_title('Monthly Viewing Volume & Content Breadth (complete months only)',
              fontsize=14, fontweight='bold')

# Combined legend
lines1, labels1 = ax1.get_legend_handles_labels()
lines2, labels2 = ax2.get_legend_handles_labels()
ax1.legend(lines1 + lines2, labels1 + labels2, loc='upper left', framealpha=0.9)

plt.tight_layout()
plt.show()

# Summary stats
print(f"Growth (first complete month vs last complete month): {growth_pct:+.1f}%")
print(f"Average monthly views: {fmt_num(avg_monthly)}")
print(f"Trend direction: {trend_dir}")

### Key Takeaway (Interview-Ready)

> Over **[num_months]** complete months, video consumption showed a **[trend_direction]** trend, with a **[growth_pct]** change in views between the first and the last complete month. Monthly views averaged **[avg_monthly_views]**, and content breadth was tracked through the number of unique videos viewed each month -- evidence of the platform's reach as a communication channel across the organization.
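
The growth figure and trend label in this takeaway come from a simple endpoint comparison. A minimal sketch of that logic, using the same 5% threshold as the cell above, might look like this; `classify_trend` and the sample series are hypothetical.

```python
def classify_trend(monthly_views, threshold_pct=5.0):
    """Return (growth_pct, direction) comparing the first and last month."""
    if len(monthly_views) < 2:
        return 0.0, "insufficient data"
    first, last = monthly_views[0], monthly_views[-1]
    growth = (last - first) / first * 100 if first > 0 else 0.0
    if growth > threshold_pct:
        direction = "upward"
    elif growth < -threshold_pct:
        direction = "downward"
    else:
        direction = "stable"
    return growth, direction


# Illustrative monthly view totals, not real data
growth, direction = classify_trend([1000, 1100, 950, 1200])
print(f"{growth:+.1f}% -> {direction}")
```

Because only the two endpoints are compared, a single unusual month at either end can flip the label, which is worth mentioning if an interviewer asks how robust the trend is.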

---
## Section 3: Engagement Quality Scorecard

**Business Question:** Are viewers actually watching our content? Is quality improving?

In [None]:
# ---------------------------------------------------------------------------
# 3. Engagement Quality Scorecard
# ---------------------------------------------------------------------------

# Monthly averages (exclude current incomplete month)
quality_monthly = query("""
    SELECT
        DATE_TRUNC('month', date)                       AS month,
        ROUND(AVG(engagement_score), 1)                 AS avg_engagement,
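        -- video_engagement_X fields are raw counts (see Section 4), so use a ratio of sums
        ROUND(SUM(video_engagement_100) * 100.0 / NULLIF(SUM(video_view), 0), 1) AS completion_rate,
        ROUND(SUM(video_engagement_50)  * 100.0 / NULLIF(SUM(video_view), 0), 1) AS halfway_rate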
    FROM daily_analytics
    WHERE video_view > 0
      AND DATE_TRUNC('month', date) < DATE_TRUNC('month', CURRENT_DATE)
    GROUP BY 1
    ORDER BY 1
""")

# Overall averages for scorecard
overall = query("""
    SELECT
        ROUND(AVG(engagement_score), 1)                 AS avg_engagement,
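        -- video_engagement_X fields are raw counts (see Section 4), so use a ratio of sums
        ROUND(SUM(video_engagement_100) * 100.0 / NULLIF(SUM(video_view), 0), 1) AS completion_rate,
        ROUND(SUM(video_engagement_50)  * 100.0 / NULLIF(SUM(video_view), 0), 1) AS halfway_rate,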
        ROUND(AVG(video_percent_viewed), 1)             AS avg_percent_viewed
    FROM daily_analytics
    WHERE video_view > 0
""").iloc[0]

ENGAGEMENT = overall.to_dict()

# -- Chart: Monthly quality trends --
fig, ax = plt.subplots(figsize=FIGSIZE)

months_str = quality_monthly['month'].dt.strftime('%Y-%m').tolist()
ax.plot(months_str, quality_monthly['avg_engagement'], marker='o', linewidth=2,
        color=C['blue'], label='Engagement Score')
ax.plot(months_str, quality_monthly['completion_rate'], marker='s', linewidth=2,
        color=C['success'], label='Completion Rate (100%)')
ax.plot(months_str, quality_monthly['halfway_rate'], marker='^', linewidth=2,
        color=C['warn'], label='Halfway Rate (50%)')

ax.set_ylim(0, 100)
ax.set_xlabel('Month', fontsize=11)
ax.set_ylabel('Percentage (%)', fontsize=11)
ax.set_title('Engagement Quality Trends Over Time (complete months only)',
             fontsize=14, fontweight='bold')
ax.legend(loc='best', framealpha=0.9)
ax.tick_params(axis='x', rotation=45)

plt.tight_layout()
plt.show()

# -- Quality Assessment Table --
def rate(val, target_good, target_fair):
    if val >= target_good:
        return 'Good'
    elif val >= target_fair:
        return 'Fair'
    return 'Needs Improvement'

scorecard = pd.DataFrame([
    {'Metric': 'Engagement Score', 'Value': fmt_pct(overall['avg_engagement']),
     'Target': '>50%', 'Rating': rate(overall['avg_engagement'], 50, 35)},
    {'Metric': 'Completion Rate',  'Value': fmt_pct(overall['completion_rate']),
     'Target': '>40%', 'Rating': rate(overall['completion_rate'], 40, 25)},
    {'Metric': 'Halfway Rate',     'Value': fmt_pct(overall['halfway_rate']),
     'Target': '>55%', 'Rating': rate(overall['halfway_rate'], 55, 40)},
    {'Metric': 'Avg % Viewed',     'Value': fmt_pct(overall['avg_percent_viewed']),
     'Target': '>50%', 'Rating': rate(overall['avg_percent_viewed'], 50, 35)},
])

print("\nENGAGEMENT QUALITY SCORECARD")
print("=" * 66)
display(scorecard)

# Trend direction
if len(quality_monthly) >= 3:
    first_eng = quality_monthly['avg_engagement'].iloc[:3].mean()
    last_eng  = quality_monthly['avg_engagement'].iloc[-3:].mean()
    eng_trend = 'improving' if last_eng > first_eng + 1 else ('declining' if last_eng < first_eng - 1 else 'stable')
    ENGAGEMENT['trend'] = eng_trend
    print(f"\nEngagement trend (first 3 months avg vs last 3 months avg): {eng_trend}")
    print(f"  Early period: {first_eng:.1f}%  |  Recent period: {last_eng:.1f}%")
else:
    # Set the key anyway so it always exists for the cheat sheet
    ENGAGEMENT['trend'] = 'insufficient data'
    print("\nEngagement trend: insufficient data (fewer than 3 complete months)")

### Key Takeaway (Interview-Ready)

> Our engagement quality is rated **[rating]** overall: average engagement score of **[avg_engagement]**, completion rate of **[completion_rate]**, and halfway rate of **[halfway_rate]**. The trend is **[trend_direction]** over the measured period, suggesting **[improving content quality / opportunity for content optimization]**.
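
The trend label above comes from comparing the average of the first three months with the average of the last three, using a tolerance of 1 percentage point. The sketch below restates that comparison in plain Python; `window_trend` and the sample series are hypothetical.

```python
def window_trend(series, window=3, tolerance=1.0):
    """Compare the mean of the first and last `window` points of a series."""
    if len(series) < window:
        return "insufficient data"
    early = sum(series[:window]) / window
    recent = sum(series[-window:]) / window
    if recent > early + tolerance:
        return "improving"
    if recent < early - tolerance:
        return "declining"
    return "stable"


# Illustrative monthly engagement scores (%), not real data
rising = window_trend([40, 42, 41, 45, 47, 46])
flat = window_trend([40, 41, 40, 41])
print(rising, flat)
```

With fewer than six months of data the two windows overlap, so the comparison becomes less sensitive; in that case the label is best treated as indicative only.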

---
## Section 4: Engagement Funnel & Drop-off Analysis (The Big Win)

**Business Question:** Where exactly do viewers drop off, and what does that tell us about content quality?

This is the **most actionable finding** -- the engagement funnel reveals precisely where content loses viewers, enabling targeted improvements to content format and structure.

In [None]:
# ---------------------------------------------------------------------------
# 4a. Overall Engagement Funnel
# ---------------------------------------------------------------------------
# The video_engagement_X fields are raw view counts at each X% milestone of
# video duration. Values can exceed video_view due to replays/rewinds (Brightcove
# counts every pass through a milestone). Some accounts (e.g. Internet) have autoplay/loop
# behavior that inflates these counts. We show two views:
#   1. All channels (raw, may exceed 100% due to replays)
#   2. Excluding outlier channels where replay inflation distorts the funnel

# --- Overall funnel (all channels) ---
funnel_all = query("""
    SELECT
        ROUND(SUM(video_engagement_1)   * 100.0 / NULLIF(SUM(video_view), 0), 1) AS started,
        ROUND(SUM(video_engagement_25)  * 100.0 / NULLIF(SUM(video_view), 0), 1) AS reached_25,
        ROUND(SUM(video_engagement_50)  * 100.0 / NULLIF(SUM(video_view), 0), 1) AS reached_50,
        ROUND(SUM(video_engagement_75)  * 100.0 / NULLIF(SUM(video_view), 0), 1) AS reached_75,
        ROUND(SUM(video_engagement_100) * 100.0 / NULLIF(SUM(video_view), 0), 1) AS completed
    FROM daily_analytics
    WHERE video_view > 0
""").iloc[0]

# Identify channels with replay inflation (started > 100%)
replay_channels = query("""
    SELECT channel,
        ROUND(SUM(video_engagement_1) * 100.0 / NULLIF(SUM(video_view), 0), 1) AS started
    FROM daily_analytics
    WHERE video_view > 0
    GROUP BY channel
    HAVING SUM(video_engagement_1) * 100.0 / NULLIF(SUM(video_view), 0) > 100
""")
replay_ch_list = replay_channels['channel'].tolist()

# --- Funnel excluding replay-inflated channels ---
if len(replay_ch_list) > 0:
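    # Double any single quotes so channel names cannot break the SQL string
    placeholders = ', '.join("'" + str(c).replace("'", "''") + "'" for c in replay_ch_list)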
    funnel = query(f"""
        SELECT
            ROUND(SUM(video_engagement_1)   * 100.0 / NULLIF(SUM(video_view), 0), 1) AS started,
            ROUND(SUM(video_engagement_25)  * 100.0 / NULLIF(SUM(video_view), 0), 1) AS reached_25,
            ROUND(SUM(video_engagement_50)  * 100.0 / NULLIF(SUM(video_view), 0), 1) AS reached_50,
            ROUND(SUM(video_engagement_75)  * 100.0 / NULLIF(SUM(video_view), 0), 1) AS reached_75,
            ROUND(SUM(video_engagement_100) * 100.0 / NULLIF(SUM(video_view), 0), 1) AS completed
        FROM daily_analytics
        WHERE video_view > 0
          AND channel NOT IN ({placeholders})
    """).iloc[0]
    print(f"NOTE: Excluding {replay_ch_list} from funnel (replay/autoplay inflation > 100%).")
    print(f"These channels are analyzed separately in the per-channel table below.\n")
else:
    funnel = funnel_all

stages = ['Started (1%)', 'Reached 25%', 'Reached 50%', 'Reached 75%', 'Completed (100%)']
values = [funnel['started'], funnel['reached_25'], funnel['reached_50'],
          funnel['reached_75'], funnel['completed']]

# Calculate drop-offs between stages
dropoffs = {
    '1% to 25%':   values[0] - values[1],
    '25% to 50%':  values[1] - values[2],
    '50% to 75%':  values[2] - values[3],
    '75% to 100%': values[3] - values[4],
}
biggest_drop_stage = max(dropoffs, key=dropoffs.get)
biggest_drop_val   = dropoffs[biggest_drop_stage]

FUNNEL = {
    'started': values[0], 'reached_25': values[1], 'reached_50': values[2],
    'reached_75': values[3], 'completed': values[4],
    'dropoffs': dropoffs,
    'biggest_drop_stage': biggest_drop_stage,
    'biggest_drop_val': biggest_drop_val,
    'excluded_channels': replay_ch_list,
}

# -- Chart: Funnel --
fig, (ax1, ax2) = plt.subplots(1, 2, figsize=(14, 6))

y_pos = range(len(stages) - 1, -1, -1)
bars = ax1.barh(y_pos, values, color=FUNNEL_COLORS, edgecolor='white', linewidth=1.5, height=0.6)
ax1.set_yticks(y_pos)
ax1.set_yticklabels(stages, fontsize=11)  # same order as values: first stage at the top
ax1.set_xlabel('Percentage of Viewers (%)', fontsize=11)
ax1.set_xlim(0, max(values) * 1.2)
subtitle = ' (excl. replay-inflated channels)' if replay_ch_list else ''
ax1.set_title(f'Engagement Funnel{subtitle}', fontsize=14, fontweight='bold')

# Bars are created in the same order as values, so pair them directly
for bar, val in zip(bars, values):
    ax1.text(val + 1, bar.get_y() + bar.get_height()/2,
             fmt_pct(val), va='center', fontsize=11, fontweight='bold')

# Right: drop-off between stages
drop_labels = list(dropoffs.keys())
drop_values = list(dropoffs.values())
drop_colors = [C['fail'] if v == biggest_drop_val else C['warn_light'] for v in drop_values]

drop_bars = ax2.barh(range(len(drop_labels) - 1, -1, -1), drop_values,
                     color=drop_colors, edgecolor='white', linewidth=1.5, height=0.6)
ax2.set_yticks(range(len(drop_labels) - 1, -1, -1))
ax2.set_yticklabels(drop_labels, fontsize=11)  # same order as drop_values
ax2.set_xlabel('Drop-off (percentage points)', fontsize=11)
ax2.set_title('Drop-off Between Stages', fontsize=14, fontweight='bold')

# Bars are created in the same order as drop_values, so pair them directly
for bar, val in zip(drop_bars, drop_values):
    ax2.text(val + 0.3, bar.get_y() + bar.get_height()/2,
             f"{val:.1f} pp", va='center', fontsize=10, fontweight='bold')

plt.tight_layout()
plt.show()

print(f"Biggest drop-off: {biggest_drop_stage} ({biggest_drop_val:.1f} percentage points)")
print(f"Overall completion rate: {fmt_pct(values[4])}")

In [None]:
# ---------------------------------------------------------------------------
# 4b. Funnel by Channel -- Which channels retain viewers best?
# ---------------------------------------------------------------------------
# Same ratio formula: SUM(engagement_count) / SUM(views) * 100
funnel_by_ch = query("""
    SELECT
        channel,
        ROUND(SUM(video_engagement_1)   * 100.0 / NULLIF(SUM(video_view), 0), 1) AS started,
        ROUND(SUM(video_engagement_25)  * 100.0 / NULLIF(SUM(video_view), 0), 1) AS reached_25,
        ROUND(SUM(video_engagement_50)  * 100.0 / NULLIF(SUM(video_view), 0), 1) AS reached_50,
        ROUND(SUM(video_engagement_75)  * 100.0 / NULLIF(SUM(video_view), 0), 1) AS reached_75,
        ROUND(SUM(video_engagement_100) * 100.0 / NULLIF(SUM(video_view), 0), 1) AS completed,
        SUM(video_view) AS total_views
    FROM daily_analytics
    WHERE video_view > 0
    GROUP BY channel
    ORDER BY completed DESC
""")

# Retention ratio: completed / started
funnel_by_ch['retention_ratio'] = (
    funnel_by_ch['completed'] / funnel_by_ch['started'].replace(0, np.nan) * 100
).round(1)

print("ENGAGEMENT FUNNEL BY CHANNEL")
print("=" * 80)
display(funnel_by_ch[['channel', 'started', 'reached_25', 'reached_50',
                       'reached_75', 'completed', 'retention_ratio', 'total_views']])

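# Best and worst by completion rate (table is sorted by completion, descending).
# Skip the replay-inflated channels flagged in 4a: their percentages are not comparable.
ranked = funnel_by_ch[~funnel_by_ch['channel'].isin(replay_ch_list)]
if ranked.empty:
    ranked = funnel_by_ch
best_ch = ranked.iloc[0]['channel']
worst_ch = ranked.iloc[-1]['channel']
FUNNEL['best_retention_channel'] = best_ch
FUNNEL['worst_retention_channel'] = worst_ch

print(f"\nBest retention: {best_ch} ({ranked.iloc[0]['completed']:.1f}% completion)")
print(f"Lowest retention: {worst_ch} ({ranked.iloc[-1]['completed']:.1f}% completion)")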

### Key Takeaway (Interview-Ready)

> **THE BIG WIN:** I identified that the largest viewer drop-off occurs at the **[biggest_drop_stage]** mark, with **[biggest_drop_val]** percentage points lost. Only **[completion_rate]** of viewers complete videos. This led me to recommend stronger opening hooks, front-loading key messages in the first 25% of content, and establishing optimal duration guidelines. Channel **[best_channel]** showed the highest retention, which I used as a benchmark for content production standards across all accounts.
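
To make the drop-off arithmetic behind this takeaway concrete, here is a minimal sketch with made-up counts. Each stage is expressed as a percentage of total views (a ratio of sums, as in the funnel queries), and the drop-off is the difference between adjacent stages in percentage points. The function names and numbers are hypothetical.

```python
def funnel_percentages(stage_counts, views):
    """Express each stage count as a percentage of total views."""
    return {stage: round(count * 100.0 / views, 1)
            for stage, count in stage_counts.items()}


def biggest_dropoff(pcts):
    """Return (transition, drop in percentage points, all drops)."""
    stages = list(pcts)  # dicts preserve insertion order
    drops = {f"{a} to {b}": round(pcts[a] - pcts[b], 1)
             for a, b in zip(stages, stages[1:])}
    worst = max(drops, key=drops.get)
    return worst, drops[worst], drops


# Illustrative counts only, not real data
counts = {"1%": 980, "25%": 700, "50%": 520, "75%": 430, "100%": 350}
pcts = funnel_percentages(counts, views=1000)
worst_stage, worst_drop, all_drops = biggest_dropoff(pcts)
print(worst_stage, worst_drop)
```

In this toy example the steepest loss falls between the 1% and 25% marks, which is the pattern that would support a recommendation for stronger opening hooks.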

---
## Section 5: Content Strategy -- Duration Sweet Spot

**Business Question:** What video length drives the best engagement? What should we tell content producers?

In [None]:
# ---------------------------------------------------------------------------
# 5. Duration Sweet Spot Analysis
# ---------------------------------------------------------------------------
duration = query("""
    SELECT
        CASE
            WHEN video_duration <= 60   THEN '1. 0-1 min'
            WHEN video_duration <= 180  THEN '2. 1-3 min'
            WHEN video_duration <= 300  THEN '3. 3-5 min'
            WHEN video_duration <= 600  THEN '4. 5-10 min'
            WHEN video_duration <= 1200 THEN '5. 10-20 min'
            WHEN video_duration <= 1800 THEN '6. 20-30 min'
            ELSE '7. 30+ min'
        END AS duration_bucket,
        COUNT(DISTINCT video_id)                AS num_videos,
        SUM(video_view)                         AS total_views,
        ROUND(AVG(engagement_score), 1)         AS avg_engagement,
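        -- video_engagement_X fields are raw counts (see Section 4), so use a ratio of sums
        ROUND(SUM(video_engagement_100) * 100.0 / NULLIF(SUM(video_view), 0), 1) AS completion_rate,
        ROUND(SUM(video_engagement_50)  * 100.0 / NULLIF(SUM(video_view), 0), 1) AS halfway_rate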
    FROM daily_analytics
    WHERE video_view > 0 AND video_duration > 0
    GROUP BY 1
    ORDER BY 1
""")

# Identify sweet spot
sweet_spot_idx  = duration['completion_rate'].idxmax()
sweet_spot      = duration.loc[sweet_spot_idx]
worst_idx       = duration['completion_rate'].idxmin()
worst_bucket    = duration.loc[worst_idx]
penalty         = sweet_spot['completion_rate'] - worst_bucket['completion_rate']

# Production vs performance mismatch
most_produced_idx = duration['num_videos'].idxmax()
most_produced     = duration.loc[most_produced_idx]
mismatch = most_produced['duration_bucket'] != sweet_spot['duration_bucket']

CONTENT = {
    'sweet_spot_bucket':     sweet_spot['duration_bucket'],
    'sweet_spot_completion': sweet_spot['completion_rate'],
    'sweet_spot_engagement': sweet_spot['avg_engagement'],
    'worst_bucket':          worst_bucket['duration_bucket'],
    'worst_completion':      worst_bucket['completion_rate'],
    'penalty_pp':            penalty,
    'most_produced_bucket':  most_produced['duration_bucket'],
    'production_mismatch':   mismatch,
}

# -- Charts --
fig, axes = plt.subplots(1, 3, figsize=(16, 5))

# Completion rate by bucket
colors_comp = [C['success'] if i == sweet_spot_idx else C['blue_light'] for i in range(len(duration))]
axes[0].bar(range(len(duration)), duration['completion_rate'], color=colors_comp, edgecolor='white')
axes[0].set_xticks(range(len(duration)))
axes[0].set_xticklabels([b.split('. ')[1] for b in duration['duration_bucket']], rotation=45, ha='right')
axes[0].set_ylabel('Completion Rate (%)')
axes[0].set_title('Completion Rate by Duration', fontsize=12, fontweight='bold')
for i, v in enumerate(duration['completion_rate']):
    axes[0].text(i, v + 0.5, fmt_pct(v), ha='center', fontsize=8)

# Engagement score by bucket
colors_eng = [C['success'] if i == duration['avg_engagement'].idxmax() else C['purple_light']
              for i in range(len(duration))]
axes[1].bar(range(len(duration)), duration['avg_engagement'], color=colors_eng, edgecolor='white')
axes[1].set_xticks(range(len(duration)))
axes[1].set_xticklabels([b.split('. ')[1] for b in duration['duration_bucket']], rotation=45, ha='right')
axes[1].set_ylabel('Engagement Score (%)')
axes[1].set_title('Engagement Score by Duration', fontsize=12, fontweight='bold')
for i, v in enumerate(duration['avg_engagement']):
    axes[1].text(i, v + 0.5, fmt_pct(v), ha='center', fontsize=8)

# Video count distribution (what are we producing?)
colors_prod = [C['warn'] if i == most_produced_idx else C['gray_light'] for i in range(len(duration))]
axes[2].bar(range(len(duration)), duration['num_videos'], color=colors_prod, edgecolor='white')
axes[2].set_xticks(range(len(duration)))
axes[2].set_xticklabels([b.split('. ')[1] for b in duration['duration_bucket']], rotation=45, ha='right')
axes[2].set_ylabel('Number of Videos')
axes[2].set_title('Production Volume by Duration', fontsize=12, fontweight='bold')
for i, v in enumerate(duration['num_videos']):
    axes[2].text(i, v + 0.5, fmt_num(v), ha='center', fontsize=8)

plt.tight_layout()
plt.show()

# Summary table
print("\nDURATION PERFORMANCE BREAKDOWN")
print("=" * 80)
display(duration)

print(f"\nSweet spot: {sweet_spot['duration_bucket']} ({fmt_pct(sweet_spot['completion_rate'])} completion)")
print(f"Worst performer: {worst_bucket['duration_bucket']} ({fmt_pct(worst_bucket['completion_rate'])} completion)")
print(f"Penalty for wrong duration: {penalty:.1f} percentage points")
if mismatch:
    print(f"\nPRODUCTION MISMATCH: We produce most videos in '{most_produced['duration_bucket']}' "
          f"but the sweet spot is '{sweet_spot['duration_bucket']}'")

### Key Takeaway (Interview-Ready)

> Analysis of all videos revealed a clear sweet spot at **[sweet_spot_bucket]** with a **[sweet_spot_completion]** completion rate -- **[penalty_pp]** percentage points higher than the worst-performing bucket (**[worst_bucket]**). I discovered a production-vs-performance mismatch: we produce the most videos at **[most_produced_bucket]**, but the data shows **[sweet_spot_bucket]** delivers the best engagement. This insight led to updated content guidelines for producers, recommending front-loading value and keeping content within the optimal range.
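
The duration buckets behind this finding are defined by the SQL `CASE` expression above. The same mapping can be sketched in Python for spot-checking individual videos, assuming `video_duration` is in seconds (as the 60-second boundary for the "0-1 min" bucket implies). `duration_bucket` is a hypothetical helper, not part of the notebook.

```python
from bisect import bisect_left

# Inclusive upper bounds in seconds, matching the CASE expression
BUCKET_EDGES = [60, 180, 300, 600, 1200, 1800]
BUCKET_LABELS = [
    '1. 0-1 min', '2. 1-3 min', '3. 3-5 min', '4. 5-10 min',
    '5. 10-20 min', '6. 20-30 min', '7. 30+ min',
]


def duration_bucket(seconds):
    """Map a duration in seconds to its bucket label."""
    # bisect_left returns the first edge >= seconds, i.e. "duration <= edge"
    return BUCKET_LABELS[bisect_left(BUCKET_EDGES, seconds)]


for secs in (45, 60, 61, 1800, 5400):
    print(secs, duration_bucket(secs))
```

The numeric prefixes in the labels exist so that a plain alphabetical sort keeps the buckets in duration order.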

---
## Section 6: Top Performing Content & Content Gaps

**Business Question:** Which videos are our greatest hits? Which have high impressions but low play rates (thumbnail/title problems)?
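
The two content-gap categories in this section can be expressed as simple rules. The sketch below restates the thresholds used in cell 6c (at least 200 impressions with a play rate under 30% for problem videos; engagement above 70% with 20 to 199 views for hidden gems). `classify_video` and the sample inputs are hypothetical.

```python
def classify_video(impressions, views, avg_engagement):
    """Return the content-gap labels that apply to one video's totals."""
    labels = []
    # Shown often but rarely played: likely a thumbnail/title issue
    if impressions >= 200 and views * 100.0 / impressions < 30:
        labels.append("problem")
    # Resonates with the few viewers it reaches: underpromoted
    if avg_engagement > 70 and 20 <= views < 200:
        labels.append("hidden gem")
    return labels


# Illustrative totals only, not real data
print(classify_video(1000, 150, 40))
print(classify_video(300, 120, 82))
print(classify_video(500, 50, 90))
```

A video can carry both labels at once, which would suggest fixing the thumbnail or title first, since the content itself already performs well.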

In [None]:
# ---------------------------------------------------------------------------
# 6a. Top 20 Videos by Total Views
# ---------------------------------------------------------------------------
top_by_views = query("""
    SELECT
        channel,
        video_id,
        MAX(name) AS video_name,
        SUM(video_view) AS total_views,
        ROUND(AVG(engagement_score), 1) AS avg_engagement,
        ROUND(SUM(video_engagement_100) * 100.0 / NULLIF(SUM(video_view), 0), 1) AS completion_rate
    FROM daily_analytics
    WHERE video_view > 0
    GROUP BY channel, video_id
    ORDER BY total_views DESC
    LIMIT 20
""")

print("TOP 20 VIDEOS BY TOTAL VIEWS")
print("=" * 80)

# Horizontal bar chart
fig, ax = plt.subplots(figsize=(12, 8))
labels = [truncate_name(n) for n in top_by_views['video_name']]
y_pos = range(len(labels) - 1, -1, -1)

ax.barh(y_pos, top_by_views['total_views'], color=C['blue_light'], edgecolor=C['blue'], linewidth=0.5)
ax.set_yticks(y_pos)
ax.set_yticklabels(labels, fontsize=8)  # tick order matches bar order (top video first)
ax.set_xlabel('Total Views')
ax.set_title('Top 20 Videos by Total Views', fontsize=14, fontweight='bold')
ax.xaxis.set_major_formatter(mticker.FuncFormatter(lambda v, _: fmt_num(v)))

for i, (views, eng) in enumerate(zip(top_by_views['total_views'][::-1],
                                      top_by_views['avg_engagement'][::-1])):
    ax.text(views + max(top_by_views['total_views']) * 0.01, i,
            f"{fmt_num(views)} | {fmt_pct(eng)} eng.", va='center', fontsize=7)

plt.tight_layout()
plt.show()

In [None]:
# ---------------------------------------------------------------------------
# 6b. Top 20 Videos by Engagement (min 100 views for significance)
# ---------------------------------------------------------------------------
top_by_eng = query("""
    SELECT
        channel,
        video_id,
        MAX(name) AS video_name,
        SUM(video_view) AS total_views,
        ROUND(AVG(engagement_score), 1) AS avg_engagement,
        ROUND(SUM(video_engagement_100) * 100.0 / NULLIF(SUM(video_view), 0), 1) AS completion_rate
    FROM daily_analytics
    WHERE video_view > 0
    GROUP BY channel, video_id
    HAVING SUM(video_view) >= 100
    ORDER BY avg_engagement DESC
    LIMIT 20
""")

print("TOP 20 VIDEOS BY ENGAGEMENT (min 100 views)")
print("=" * 80)

fig, ax = plt.subplots(figsize=(14, 8))
labels = [truncate_name(n, 40) for n in top_by_eng['video_name']]
y_pos = range(len(labels) - 1, -1, -1)

ax.barh(y_pos, top_by_eng['avg_engagement'], color=C['success_light'],
        edgecolor=C['success'], linewidth=0.5)
ax.set_yticks(y_pos)
ax.set_yticklabels(labels, fontsize=8)  # tick order matches bar order (top video first)
ax.set_xlabel('Engagement Score (%)')
ax.set_xlim(0, 110)
ax.set_title('Top 20 Videos by Engagement Score (min 100 views)', fontsize=14, fontweight='bold')

for i, (eng, views) in enumerate(zip(top_by_eng['avg_engagement'][::-1],
                                      top_by_eng['total_views'][::-1])):
    ax.text(eng + 1, i, f"{fmt_pct(eng)} | {fmt_num(views)} views", va='center', fontsize=7)

plt.tight_layout()
plt.show()

In [None]:
# ---------------------------------------------------------------------------
# 6c. Problem Videos & Hidden Gems
# ---------------------------------------------------------------------------

# Problem videos: high impressions but low play rate (thumbnail/title issue)
problem_videos = query("""
    SELECT
        channel,
        video_id,
        MAX(name) AS video_name,
        SUM(video_impression) AS impressions,
        SUM(video_view) AS views,
        ROUND(SUM(video_view) * 100.0 / NULLIF(SUM(video_impression), 0), 1) AS play_rate,
        ROUND(AVG(engagement_score), 1) AS avg_engagement
    FROM daily_analytics
    WHERE video_impression > 0
    GROUP BY channel, video_id
    HAVING SUM(video_impression) >= 200
       AND SUM(video_view) * 100.0 / NULLIF(SUM(video_impression), 0) < 30
    ORDER BY impressions DESC
    LIMIT 20
""")

print("PROBLEM VIDEOS: High Impressions, Low Play Rate (<30%)")
print("These videos are being shown but failing to convert -- thumbnail/title issue")
print("=" * 80)
if len(problem_videos) > 0:
    display(problem_videos)
else:
    print("No videos found matching these criteria.")

# Hidden gems: high engagement but low view count
hidden_gems = query("""
    SELECT
        channel,
        video_id,
        MAX(name) AS video_name,
        SUM(video_view) AS total_views,
        ROUND(AVG(engagement_score), 1) AS avg_engagement,
        ROUND(AVG(video_engagement_100), 1) AS completion_rate
    FROM daily_analytics
    WHERE video_view > 0
    GROUP BY channel, video_id
    HAVING AVG(engagement_score) > 70
       AND SUM(video_view) < 200
       AND SUM(video_view) >= 20
    ORDER BY avg_engagement DESC
    LIMIT 20
""")

print("\nHIDDEN GEMS: High Engagement (>70%), Low Views (<200)")
print("Underpromotion opportunity -- these resonate but are not being discovered")
print("=" * 80)
if len(hidden_gems) > 0:
    display(hidden_gems)
else:
    print("No hidden gems found matching these criteria (try adjusting thresholds).")

TOPVIDS = {
    'top_video_name':    top_by_views.iloc[0]['video_name'] if len(top_by_views) > 0 else 'N/A',
    'top_video_views':   top_by_views.iloc[0]['total_views'] if len(top_by_views) > 0 else 0,
    'problem_video_count':  len(problem_videos),
    'hidden_gem_count':     len(hidden_gems),
}

### Key Takeaway (Interview-Ready)

> I created a content performance intelligence framework analogous to Search Analytics success/relevance analysis. Our top video achieved **[top_video_views]** views. I identified **[problem_video_count]** "problem videos" with high impressions but play rates under 30% -- these have discoverability but fail to convert, indicating thumbnail or title issues. Conversely, I found **[hidden_gem_count]** "hidden gems" with engagement above 70% but fewer than 200 views -- underpromotion opportunities. This framework gave content producers data-driven feedback for the first time.
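
The bracketed placeholders in this takeaway correspond to keys of the `TOPVIDS` dictionary built above. A minimal sketch of filling them programmatically is shown below. `fill_placeholders` is a hypothetical helper (not part of the notebook), it assumes placeholder names match dictionary keys exactly, and the sample values are illustrative only.

```python
import re

def fill_placeholders(template, values):
    """Replace [key] tokens with values[key]; leave unknown tokens untouched."""
    def _substitute(match):
        key = match.group(1)
        # Keep the original "[key]" text when no value is available
        return str(values[key]) if key in values else match.group(0)
    return re.sub(r"\[([a-z0-9_]+)\]", _substitute, template)

# Illustrative values only; in the notebook these would come from TOPVIDS
example_values = {"top_video_views": 1200, "problem_video_count": 7}
filled = fill_placeholders(
    "Top video: **[top_video_views]** views; "
    "**[problem_video_count]** problem videos; [hidden_gem_count] gems.",
    example_values,
)
print(filled)
```

Unknown placeholders are left in place on purpose, so a missing metric stays visible in the text rather than silently disappearing.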

---
## Section 7: Channel/Account Performance Comparison

**Business Question:** How do our 11 accounts perform? Where should we invest or consolidate?

In [None]:
# ---------------------------------------------------------------------------
# 7a. Channel Performance Comparison
# ---------------------------------------------------------------------------
ch_perf = query("""
    SELECT
        channel,
        COUNT(DISTINCT video_id)                              AS num_videos,
        SUM(video_view)                                       AS total_views,
        ROUND(SUM(video_view) * 1.0 / NULLIF(COUNT(DISTINCT video_id), 0), 0) AS views_per_video,
        ROUND(AVG(engagement_score), 1)                       AS avg_engagement,
        ROUND(AVG(video_engagement_100), 1)                   AS completion_rate
    FROM daily_analytics
    WHERE video_view > 0
    GROUP BY channel
    ORDER BY total_views DESC
""")

print("CHANNEL PERFORMANCE COMPARISON")
print("=" * 80)
display(ch_perf)

# -- Chart: Views vs Engagement scatter --
fig, ax = plt.subplots(figsize=FIGSIZE)

# Medians for quadrant lines
med_views = ch_perf['total_views'].median()
med_eng   = ch_perf['avg_engagement'].median()

ax.axhline(y=med_eng, color=C['gray'], linestyle='--', alpha=0.5)
ax.axvline(x=med_views, color=C['gray'], linestyle='--', alpha=0.5)

scatter = ax.scatter(ch_perf['total_views'], ch_perf['avg_engagement'],
                     s=ch_perf['num_videos'] * 3, c=C['blue'], alpha=0.7, edgecolors=C['neutral'])

for _, row in ch_perf.iterrows():
    ax.annotate(row['channel'], (row['total_views'], row['avg_engagement']),
                textcoords='offset points', xytext=(8, 4), fontsize=8)

# Quadrant labels
ax.text(0.02, 0.98, 'OPPORTUNITIES\n(High Eng, Low Reach)', transform=ax.transAxes,
        fontsize=8, va='top', ha='left', color=C['warn'], fontstyle='italic')
ax.text(0.98, 0.98, 'STARS\n(High Eng, High Reach)', transform=ax.transAxes,
        fontsize=8, va='top', ha='right', color=C['success'], fontstyle='italic')
ax.text(0.02, 0.02, 'RECONSIDER\n(Low Eng, Low Reach)', transform=ax.transAxes,
        fontsize=8, va='bottom', ha='left', color=C['fail'], fontstyle='italic')
ax.text(0.98, 0.02, 'CASH COWS\n(Low Eng, High Reach)', transform=ax.transAxes,
        fontsize=8, va='bottom', ha='right', color=C['neutral'], fontstyle='italic')

ax.set_xlabel('Total Views', fontsize=11)
ax.set_ylabel('Avg Engagement Score (%)', fontsize=11)
ax.set_title('BCG-Style Channel Matrix: Reach vs. Engagement\n(bubble size = number of videos)',
             fontsize=13, fontweight='bold')
ax.xaxis.set_major_formatter(mticker.FuncFormatter(lambda v, _: fmt_num(v)))

plt.tight_layout()
plt.show()

In [None]:
# ---------------------------------------------------------------------------
# 7b. BCG Classification & Consolidation Analysis
# ---------------------------------------------------------------------------
def classify_channel(row):
    high_reach = row['total_views'] >= med_views
    high_eng   = row['avg_engagement'] >= med_eng
    if high_reach and high_eng:
        return 'Star'
    elif not high_reach and high_eng:
        return 'Opportunity'
    elif high_reach and not high_eng:
        return 'Cash Cow'
    else:
        return 'Reconsider'

ch_perf['bcg_category'] = ch_perf.apply(classify_channel, axis=1)

print("BCG CHANNEL CLASSIFICATION")
print("=" * 66)
for cat in ['Star', 'Opportunity', 'Cash Cow', 'Reconsider']:
    channels_in_cat = ch_perf[ch_perf['bcg_category'] == cat]['channel'].tolist()
    print(f"  {cat:15} : {', '.join(channels_in_cat) if channels_in_cat else '(none)'}")

# Consolidation candidates
reconsider = ch_perf[ch_perf['bcg_category'] == 'Reconsider']
if len(reconsider) > 0:
    reconsider_views = reconsider['total_views'].sum()
    total_views_all  = ch_perf['total_views'].sum()
    reconsider_pct   = reconsider_views / total_views_all * 100 if total_views_all > 0 else 0
    print(f"\nConsolidation candidates represent {fmt_pct(reconsider_pct)} of total views")
    print(f"({fmt_num(reconsider_views)} out of {fmt_num(total_views_all)} views)")
else:
    reconsider_pct = 0

CHANNELS = {
    'total_channels':      len(ch_perf),
    'stars':               ch_perf[ch_perf['bcg_category'] == 'Star']['channel'].tolist(),
    'opportunities':       ch_perf[ch_perf['bcg_category'] == 'Opportunity']['channel'].tolist(),
    'cash_cows':           ch_perf[ch_perf['bcg_category'] == 'Cash Cow']['channel'].tolist(),
    'reconsider':          ch_perf[ch_perf['bcg_category'] == 'Reconsider']['channel'].tolist(),
    'best_engagement_ch':  ch_perf.loc[ch_perf['avg_engagement'].idxmax(), 'channel'],
    'best_reach_ch':       ch_perf.iloc[0]['channel'],
    'consolidation_pct':   reconsider_pct,
}

### Key Takeaway (Interview-Ready)

> Using a BCG-style matrix, I categorized **[total_channels]** Brightcove accounts by reach and engagement. **[stars]** emerged as Stars (invest more), **[opportunities]** as Opportunities (high engagement, promote more), **[cash_cows]** as Cash Cows (maintain), and **[reconsider]** as Reconsider candidates. The Reconsider channels accounted for only **[consolidation_pct]** of total views, supporting a consolidation recommendation that would reduce operational overhead while concentrating resources on high-performing channels.
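
One caveat on the matrix inputs: `avg_engagement` in 7a is `AVG(engagement_score)` over daily rows, so a row with 1 view counts as much as a row with 1,000. The sketch below contrasts that with a view-weighted mean on toy data. The `view_weighted_mean` helper and the numbers are illustrative assumptions, not notebook output, and whether weighting is appropriate depends on how `engagement_score` is defined upstream.

```python
import pandas as pd

def view_weighted_mean(df, value_col="engagement_score", weight_col="video_view"):
    """Mean of value_col weighted by weight_col; NaN when total weight is zero."""
    total_weight = df[weight_col].sum()
    if total_weight == 0:
        return float("nan")
    return float((df[value_col] * df[weight_col]).sum() / total_weight)

# Toy daily rows: channel "a" has one tiny day with very high engagement
toy = pd.DataFrame({
    "channel": ["a", "a", "b", "b"],
    "video_view": [1, 999, 50, 50],
    "engagement_score": [90.0, 40.0, 60.0, 70.0],
})

# Unweighted: what AVG(engagement_score) GROUP BY channel computes
unweighted = toy.groupby("channel")["engagement_score"].mean()

# Weighted: each row contributes in proportion to its views
weighted = pd.Series({ch: view_weighted_mean(g) for ch, g in toy.groupby("channel")})

print(pd.DataFrame({"unweighted": unweighted, "weighted": weighted}).round(2))
```

If the two columns diverge sharply for a channel, its quadrant placement may be driven by low-traffic days, which is worth checking before recommending consolidation.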

---
## Section 8: Device & Platform Strategy

**Business Question:** How are employees watching videos? Is mobile growing?

In [None]:
# ---------------------------------------------------------------------------
# 8a. Device Breakdown
# ---------------------------------------------------------------------------
devices = query("""
    SELECT
        SUM(views_desktop)  AS desktop,
        SUM(views_mobile)   AS mobile,
        SUM(views_tablet)   AS tablet,
        SUM(views_other)    AS other_device
    FROM daily_analytics
""").iloc[0]

device_data = {k: int(v) for k, v in devices.items() if v and v > 0}
total_device = sum(device_data.values())
device_pcts  = {k: v / total_device * 100 for k, v in device_data.items()}

DEVICE['breakdown'] = device_pcts
DEVICE['total'] = total_device

# -- Charts: Pie + Bar --
fig, (ax1, ax2) = plt.subplots(1, 2, figsize=FIGSIZE)

device_colors = [C['blue'], C['fail'], C['success'], C['gray']]
device_labels_map = {'desktop': 'Desktop', 'mobile': 'Mobile', 'tablet': 'Tablet', 'other_device': 'Other'}
labels_nice = [device_labels_map.get(k, k) for k in device_data.keys()]

ax1.pie(device_data.values(), labels=labels_nice, autopct='%1.1f%%',
        colors=device_colors[:len(device_data)], startangle=90, textprops={'fontsize': 10})
ax1.set_title('Views by Device Type', fontsize=13, fontweight='bold')

bars = ax2.bar(labels_nice, device_data.values(), color=device_colors[:len(device_data)],
               edgecolor='white')
ax2.set_ylabel('Total Views')
ax2.set_title('Total Views by Device', fontsize=13, fontweight='bold')
ax2.yaxis.set_major_formatter(mticker.FuncFormatter(lambda v, _: fmt_num(v)))
for bar, (k, v) in zip(bars, device_data.items()):
    ax2.text(bar.get_x() + bar.get_width()/2, bar.get_height() + total_device * 0.01,
             f"{v/total_device*100:.1f}%", ha='center', fontsize=10, fontweight='bold')

plt.tight_layout()
plt.show()

In [None]:
# ---------------------------------------------------------------------------
# 8b. Mobile Percentage Trend (exclude current incomplete month)
# ---------------------------------------------------------------------------
device_trend = query("""
    SELECT
        DATE_TRUNC('month', date) AS month,
        SUM(views_desktop) AS desktop,
        SUM(views_mobile)  AS mobile,
        SUM(views_tablet)  AS tablet,
        SUM(video_view)    AS total_views
    FROM daily_analytics
    WHERE video_view > 0
      AND DATE_TRUNC('month', date) < DATE_TRUNC('month', CURRENT_DATE)
    GROUP BY 1
    ORDER BY 1
""")

device_trend['mobile_pct'] = (device_trend['mobile'] / device_trend['total_views'] * 100).round(1)
device_trend['desktop_pct'] = (device_trend['desktop'] / device_trend['total_views'] * 100).round(1)

fig, ax = plt.subplots(figsize=FIGSIZE)

months_str = device_trend['month'].dt.strftime('%Y-%m').tolist()
ax.plot(months_str, device_trend['mobile_pct'], marker='o', linewidth=2, markersize=6,
        color=C['fail'], label='Mobile %')
ax.plot(months_str, device_trend['desktop_pct'], marker='s', linewidth=2, markersize=6,
        color=C['blue'], label='Desktop %')
ax.axhline(y=30, color=C['gray'], linestyle='--', alpha=0.6, label='30% Mobile Threshold')

ax.set_xlabel('Month', fontsize=11)
ax.set_ylabel('Percentage of Views (%)', fontsize=11)
ax.set_title('Device Usage Trend Over Time (complete months only)',
             fontsize=14, fontweight='bold')
ax.legend(loc='best', framealpha=0.9)
ax.tick_params(axis='x', rotation=45)

plt.tight_layout()
plt.show()

# Mobile trend summary
if len(device_trend) >= 2:
    first_mobile = device_trend['mobile_pct'].iloc[0]
    last_mobile  = device_trend['mobile_pct'].iloc[-1]
    mobile_growth = last_mobile - first_mobile
    DEVICE['mobile_first'] = first_mobile
    DEVICE['mobile_last']  = last_mobile
    DEVICE['mobile_growth_pp'] = mobile_growth
    DEVICE['above_30_threshold'] = last_mobile > 30
    print(f"Mobile trend: {first_mobile:.1f}% -> {last_mobile:.1f}% ({mobile_growth:+.1f} pp)")
    if last_mobile > 30:
        print("Mobile exceeds 30% threshold -- mobile-first strategy recommended")

In [None]:
# ---------------------------------------------------------------------------
# 8c. Device Type vs Engagement
# ---------------------------------------------------------------------------
# This query does not compute per-device engagement directly; it lists each
# channel's mobile share next to its overall engagement as a rough proxy
device_eng = query("""
    SELECT
        channel,
        ROUND(SUM(views_mobile)*100.0 / NULLIF(SUM(video_view),0), 1) AS mobile_pct,
        ROUND(AVG(engagement_score), 1)         AS avg_engagement,
        ROUND(AVG(video_engagement_100), 1)     AS completion_rate,
        SUM(video_view)                         AS total_views
    FROM daily_analytics
    WHERE video_view > 0
    GROUP BY channel
    HAVING SUM(video_view) >= 100
    ORDER BY mobile_pct DESC
""")

print("DEVICE MIX & ENGAGEMENT BY CHANNEL")
print("=" * 66)
display(device_eng)

### Key Takeaway (Interview-Ready)

> Desktop accounts for **[desktop_pct]** of views, mobile **[mobile_pct]**, and tablet **[tablet_pct]**. Mobile viewing **[grew/declined]** from **[mobile_first_pct]** to **[mobile_last_pct]** over the measurement period (**[mobile_growth_pp]** percentage points). **[If above 30%: This exceeds the 30% threshold, justifying a mobile-first content strategy including larger on-screen text, subtitles, and mobile-optimized thumbnails.]** **[If below 30%: Desktop-first remains appropriate, though the growth trend should be monitored quarterly.]**
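
The two conditional sentences in this takeaway can be selected from the values stored in `DEVICE`. The sketch below is one way to do that. `mobile_takeaway` is a hypothetical helper, the wording is abbreviated from the template above, and the 30% default mirrors the threshold used in 8b.

```python
def mobile_takeaway(first_pct, last_pct, threshold=30.0):
    """Build the mobile-trend sentence and pick the matching strategy sentence."""
    growth_pp = last_pct - first_pct
    direction = "grew" if growth_pp >= 0 else "declined"
    trend = (
        f"Mobile viewing {direction} from {first_pct:.1f}% to {last_pct:.1f}% "
        f"({growth_pp:+.1f} percentage points)."
    )
    # Strict comparison, matching `last_mobile > 30` in cell 8b
    if last_pct > threshold:
        strategy = (
            f"This exceeds the {threshold:.0f}% threshold, "
            "justifying a mobile-first content strategy."
        )
    else:
        strategy = (
            "Desktop-first remains appropriate, though the trend "
            "should be monitored quarterly."
        )
    return f"{trend} {strategy}"

# Illustrative inputs only; in the notebook these would be
# DEVICE['mobile_first'] and DEVICE['mobile_last']
print(mobile_takeaway(22.4, 33.1))
print(mobile_takeaway(18.0, 15.5))
```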

---
## Section 9: Content Lifecycle & Freshness

**Business Question:** Which content is stale? What is the archival opportunity?

In [None]:
# ---------------------------------------------------------------------------
# 9a. Stale Content (not viewed in 180+ days)
# ---------------------------------------------------------------------------
stale = query("""
    SELECT
        channel,
        video_id,
        MAX(name)                                               AS video_name,
        MAX(dt_last_viewed)                                     AS last_viewed,
        SUM(video_view)                                         AS lifetime_views,
        MAX(created_at)::DATE                                   AS created_date,
        ROUND(MAX(video_duration) / 60.0, 1)                    AS duration_min,
        DATEDIFF('day', MAX(dt_last_viewed)::DATE, CURRENT_DATE) AS days_since_viewed
    FROM daily_analytics
    WHERE dt_last_viewed IS NOT NULL
    GROUP BY channel, video_id
    HAVING DATEDIFF('day', MAX(dt_last_viewed)::DATE, CURRENT_DATE) > 180
    ORDER BY lifetime_views DESC
""")

print(f"STALE CONTENT: {len(stale)} videos not viewed in 180+ days")
print("=" * 80)
if len(stale) > 0:
    display(stale.head(20))
    total_stale_hours = stale['duration_min'].sum() / 60
    print(f"\nTotal stale video duration: {total_stale_hours:,.1f} hours of content")
    print(f"Total lifetime views of stale content: {fmt_num(stale['lifetime_views'].sum())}")
else:
    print("No stale content found (all videos viewed within 180 days).")
    total_stale_hours = 0

In [None]:
# ---------------------------------------------------------------------------
# 9b. Content Age Distribution
# ---------------------------------------------------------------------------
# First aggregate per video, then bucket by age (avoids aggregate in GROUP BY)
age_dist = query("""
    WITH video_summary AS (
        SELECT
            video_id,
            MAX(created_at)::DATE                                   AS created_date,
            SUM(video_view)                                         AS total_views,
            AVG(engagement_score)                                   AS avg_engagement
        FROM daily_analytics
        WHERE created_at IS NOT NULL AND video_view > 0
        GROUP BY video_id
    )
    SELECT
        CASE
            WHEN DATEDIFF('day', created_date, CURRENT_DATE) <= 90  THEN '1. <3 months'
            WHEN DATEDIFF('day', created_date, CURRENT_DATE) <= 180 THEN '2. 3-6 months'
            WHEN DATEDIFF('day', created_date, CURRENT_DATE) <= 365 THEN '3. 6-12 months'
            WHEN DATEDIFF('day', created_date, CURRENT_DATE) <= 730 THEN '4. 1-2 years'
            ELSE '5. 2+ years'
        END AS content_age,
        COUNT(DISTINCT video_id) AS num_videos,
        SUM(total_views)         AS total_views,
        ROUND(AVG(avg_engagement), 1) AS avg_engagement
    FROM video_summary
    GROUP BY 1
    ORDER BY 1
""")

fig, (ax1, ax2) = plt.subplots(1, 2, figsize=FIGSIZE)

age_labels = [a.split('. ')[1] for a in age_dist['content_age']]

ax1.bar(age_labels, age_dist['num_videos'], color=C['blue_light'], edgecolor=C['blue'])
ax1.set_ylabel('Number of Videos')
ax1.set_title('Content Age Distribution', fontsize=12, fontweight='bold')
ax1.tick_params(axis='x', rotation=45)
for i, v in enumerate(age_dist['num_videos']):
    ax1.text(i, v + max(age_dist['num_videos']) * 0.02, fmt_num(v), ha='center', fontsize=9)

ax2.bar(age_labels, age_dist['avg_engagement'], color=C['purple_light'], edgecolor=C['purple'])
ax2.set_ylabel('Avg Engagement Score (%)')
ax2.set_title('Engagement by Content Age', fontsize=12, fontweight='bold')
ax2.tick_params(axis='x', rotation=45)
for i, v in enumerate(age_dist['avg_engagement']):
    ax2.text(i, v + 0.5, fmt_pct(v), ha='center', fontsize=9)

plt.tight_layout()
plt.show()

print("CONTENT AGE DISTRIBUTION")
print("=" * 66)
display(age_dist)

In [None]:
# ---------------------------------------------------------------------------
# 9c. Recently Created: High Performers vs Underperformers
# ---------------------------------------------------------------------------
recent_created = query("""
    SELECT
        channel,
        video_id,
        MAX(name) AS video_name,
        MAX(created_at)::DATE AS created_date,
        SUM(video_view) AS total_views,
        ROUND(AVG(engagement_score), 1) AS avg_engagement,
        ROUND(AVG(video_engagement_100), 1) AS completion_rate
    FROM daily_analytics
    WHERE created_at IS NOT NULL
      AND DATEDIFF('day', created_at::DATE, CURRENT_DATE) <= 90
      AND video_view > 0
    GROUP BY channel, video_id
    ORDER BY total_views DESC
""")

if len(recent_created) > 0:
    median_views = recent_created['total_views'].median()
    high_perf = recent_created[recent_created['total_views'] > median_views * 2].head(10)
    low_perf  = recent_created[recent_created['total_views'] < median_views * 0.5].tail(10)

    print("RECENTLY CREATED HIGH PERFORMERS (last 90 days, above 2x median views)")
    print("=" * 80)
    display(high_perf[['channel', 'video_name', 'total_views', 'avg_engagement', 'completion_rate']])

    print(f"\nRECENTLY CREATED UNDERPERFORMERS (last 90 days, below 0.5x median views)")
    print("=" * 80)
    display(low_perf[['channel', 'video_name', 'total_views', 'avg_engagement', 'completion_rate']])
else:
    print("No recently created content found in the last 90 days.")

LIFECYCLE = {
    'stale_count':       len(stale),
    'stale_hours':       total_stale_hours,
    'stale_lifetime_views': stale['lifetime_views'].sum() if len(stale) > 0 else 0,
    'recent_count':      len(recent_created),
}

### Key Takeaway (Interview-Ready)

> I identified **[stale_count]** videos not viewed in over 180 days, representing **[stale_hours]** hours of stored content. These had **[stale_lifetime_views]** lifetime views, indicating they were once valuable but are now candidates for archival. Archiving stale content reduces storage costs and improves content discoverability for active videos. I also analyzed recently created content to identify early high performers vs underperformers, enabling faster feedback loops for content producers.
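
Because the stale-content query measures against `CURRENT_DATE`, the counts quoted here change each time the notebook is rerun. The sketch below pins the calculation to a fixed as-of date using pandas on toy data. `stale_as_of` and the sample rows are hypothetical, and the 180-day default mirrors the threshold in 9a.

```python
import pandas as pd

def stale_as_of(df, as_of, threshold_days=180):
    """Return videos whose last view is more than threshold_days before as_of."""
    as_of = pd.Timestamp(as_of)
    out = df.copy()
    out["days_since_viewed"] = (as_of - pd.to_datetime(out["last_viewed"])).dt.days
    stale_rows = out[out["days_since_viewed"] > threshold_days]
    return stale_rows.sort_values("lifetime_views", ascending=False)

# Toy per-video summary (illustrative values, not real data)
toy = pd.DataFrame({
    "video_id": ["v1", "v2", "v3"],
    "last_viewed": ["2024-01-10", "2024-09-01", "2023-11-30"],
    "lifetime_views": [500, 1200, 80],
})

stale_toy = stale_as_of(toy, as_of="2024-10-01")
print(stale_toy[["video_id", "days_since_viewed", "lifetime_views"]])
```

Quoting the as-of date alongside the stale count keeps the number reproducible when the talking point is reused later.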

---
## Section 10: Regional & Temporal Patterns

**Business Question:** When and where are employees watching? What does that tell us about our global audience?

In [None]:
# ---------------------------------------------------------------------------
# 10a. Views by Country / Region
# ---------------------------------------------------------------------------
regional = query("""
    SELECT
        COALESCE(country, 'Not Specified') AS country,
        SUM(video_view) AS total_views,
        COUNT(DISTINCT video_id) AS unique_videos,
        ROUND(AVG(engagement_score), 1) AS avg_engagement,
        ROUND(AVG(video_engagement_100), 1) AS completion_rate
    FROM daily_analytics
    WHERE video_view > 0
    GROUP BY 1
    HAVING SUM(video_view) >= 50
    ORDER BY total_views DESC
    LIMIT 15
""")

if len(regional) > 1 and regional.iloc[0]['country'] != 'Not Specified':
    fig, (ax1, ax2) = plt.subplots(1, 2, figsize=(14, 6))

    top_n = min(12, len(regional))
    reg_top = regional.head(top_n)

    ax1.barh(range(top_n - 1, -1, -1), reg_top['total_views'],
             color=C['blue_light'], edgecolor=C['blue'])
    ax1.set_yticks(range(top_n - 1, -1, -1))
    ax1.set_yticklabels(reg_top['country'][::-1], fontsize=9)
    ax1.set_xlabel('Total Views')
    ax1.set_title('Views by Country/Region', fontsize=13, fontweight='bold')
    ax1.xaxis.set_major_formatter(mticker.FuncFormatter(lambda v, _: fmt_num(v)))

    ax2.barh(range(top_n - 1, -1, -1), reg_top['avg_engagement'],
             color=C['success_light'], edgecolor=C['success'])
    ax2.set_yticks(range(top_n - 1, -1, -1))
    ax2.set_yticklabels(reg_top['country'][::-1], fontsize=9)
    ax2.set_xlabel('Avg Engagement Score (%)')
    ax2.set_title('Engagement by Country/Region', fontsize=13, fontweight='bold')
    ax2.set_xlim(0, 100)

    plt.tight_layout()
    plt.show()

    REGIONAL['top_country'] = regional.iloc[0]['country']
    REGIONAL['top_country_views'] = regional.iloc[0]['total_views']
    # Note: capped at 15 by the LIMIT above and excludes countries under 50 views
    REGIONAL['num_countries'] = len(regional)
else:
    print("Regional data not populated or single region -- skipping geographic chart.")
    REGIONAL['top_country'] = 'N/A'
    REGIONAL['num_countries'] = 0

print("\nREGIONAL PERFORMANCE")
print("=" * 66)
display(regional)

In [None]:
# ---------------------------------------------------------------------------
# 10b. Day-of-Week Patterns
# ---------------------------------------------------------------------------
dow = query("""
    SELECT
        DAYOFWEEK(date) AS dow_num,
        CASE DAYOFWEEK(date)
            WHEN 0 THEN 'Sunday'
            WHEN 1 THEN 'Monday'
            WHEN 2 THEN 'Tuesday'
            WHEN 3 THEN 'Wednesday'
            WHEN 4 THEN 'Thursday'
            WHEN 5 THEN 'Friday'
            WHEN 6 THEN 'Saturday'
        END AS day_name,
        SUM(video_view) AS total_views,
        ROUND(AVG(engagement_score), 1) AS avg_engagement
    FROM daily_analytics
    WHERE video_view > 0
    GROUP BY 1, 2
    ORDER BY 1
""")

fig, (ax1, ax2) = plt.subplots(1, 2, figsize=FIGSIZE)

# Color weekdays vs weekends differently
colors_dow = [C['gray_light'] if d in (0, 6) else C['blue_light'] for d in dow['dow_num']]

ax1.bar(dow['day_name'], dow['total_views'], color=colors_dow, edgecolor=C['neutral'], linewidth=0.5)
ax1.set_ylabel('Total Views')
ax1.set_title('Views by Day of Week', fontsize=12, fontweight='bold')
ax1.tick_params(axis='x', rotation=45)
ax1.yaxis.set_major_formatter(mticker.FuncFormatter(lambda v, _: fmt_num(v)))

ax2.bar(dow['day_name'], dow['avg_engagement'], color=colors_dow, edgecolor=C['neutral'], linewidth=0.5)
ax2.set_ylabel('Avg Engagement Score (%)')
ax2.set_title('Engagement by Day of Week', fontsize=12, fontweight='bold')
ax2.tick_params(axis='x', rotation=45)

plt.tight_layout()
plt.show()

# Weekday vs weekend
weekday_views = dow[~dow['dow_num'].isin([0, 6])]['total_views'].sum()
weekend_views = dow[dow['dow_num'].isin([0, 6])]['total_views'].sum()
total_all = weekday_views + weekend_views
weekday_pct = weekday_views / total_all * 100 if total_all > 0 else 0

peak_day = dow.loc[dow['total_views'].idxmax(), 'day_name']

REGIONAL['peak_day'] = peak_day
REGIONAL['weekday_pct'] = weekday_pct

print(f"Weekday vs Weekend: {weekday_pct:.1f}% weekday / {100 - weekday_pct:.1f}% weekend")
print(f"Peak viewing day: {peak_day}")

In [None]:
# ---------------------------------------------------------------------------
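# 10c. Monthly Seasonality Patterns
# Note: totals are summed across all years, so calendar months covered by
# more years of data will appear larger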
# ---------------------------------------------------------------------------
seasonality = query("""
    SELECT
        EXTRACT(MONTH FROM date) AS month_num,
        CASE EXTRACT(MONTH FROM date)
            WHEN 1 THEN 'Jan' WHEN 2 THEN 'Feb' WHEN 3 THEN 'Mar'
            WHEN 4 THEN 'Apr' WHEN 5 THEN 'May' WHEN 6 THEN 'Jun'
            WHEN 7 THEN 'Jul' WHEN 8 THEN 'Aug' WHEN 9 THEN 'Sep'
            WHEN 10 THEN 'Oct' WHEN 11 THEN 'Nov' WHEN 12 THEN 'Dec'
        END AS month_name,
        SUM(video_view) AS total_views,
        ROUND(AVG(engagement_score), 1) AS avg_engagement
    FROM daily_analytics
    WHERE video_view > 0
    GROUP BY 1, 2
    ORDER BY 1
""")

if len(seasonality) > 3:
    fig, ax = plt.subplots(figsize=FIGSIZE)
    ax.bar(seasonality['month_name'], seasonality['total_views'],
           color=C['blue_light'], edgecolor=C['blue'])
    ax.set_ylabel('Total Views')
    ax.set_title('Seasonal Viewing Patterns (Aggregated Across Years)', fontsize=13, fontweight='bold')
    ax.yaxis.set_major_formatter(mticker.FuncFormatter(lambda v, _: fmt_num(v)))
    for i, v in enumerate(seasonality['total_views']):
        ax.text(i, v + max(seasonality['total_views']) * 0.02, fmt_num(v),
                ha='center', fontsize=8, rotation=45)
    plt.tight_layout()
    plt.show()

    peak_month = seasonality.loc[seasonality['total_views'].idxmax(), 'month_name']
    low_month  = seasonality.loc[seasonality['total_views'].idxmin(), 'month_name']
    REGIONAL['peak_month'] = peak_month
    REGIONAL['low_month'] = low_month
    print(f"Peak month: {peak_month}")
    print(f"Lowest month: {low_month}")
else:
    REGIONAL['peak_month'] = 'N/A'
    REGIONAL['low_month'] = 'N/A'
    print("Insufficient data for seasonality analysis.")

In [None]:
# ---------------------------------------------------------------------------
# 10d. Channel Performance Variation by Region (top 30 country x channel combinations)
# ---------------------------------------------------------------------------
ch_region = query("""
    SELECT
        COALESCE(country, 'Not Specified') AS country,
        channel,
        SUM(video_view) AS total_views,
        ROUND(AVG(engagement_score), 1) AS avg_engagement
    FROM daily_analytics
    WHERE video_view > 0
    GROUP BY 1, 2
    HAVING SUM(video_view) >= 50
    ORDER BY total_views DESC
    LIMIT 30
""")

if len(ch_region) > 0 and ch_region.iloc[0]['country'] != 'Not Specified':
    print("CHANNEL PERFORMANCE BY REGION (top combinations)")
    print("=" * 66)
    display(ch_region.head(20))
else:
    print("Regional x channel cross-analysis not available (country not populated).")

### Key Takeaway (Interview-Ready)

> Viewing patterns show **[weekday_pct]** of consumption on business days, with **[peak_day]** as the peak viewing day. Seasonality analysis reveals **[peak_month]** as the highest-volume month and **[low_month]** as the lowest -- useful for content release planning and campaign timing. **[If regional data: The top viewing region is [top_country], and engagement varies by [X] percentage points across regions, suggesting potential for localized content strategies.]**
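
The **[X]** percentage-point spread in the regional sentence is not computed by any cell above. One possible way to derive it from the `regional` DataFrame is sketched below. `engagement_spread_pp` is a hypothetical helper, the toy rows are illustrative, and excluding the `'Not Specified'` bucket is an assumption about what "across regions" should mean.

```python
import pandas as pd

def engagement_spread_pp(regional_df, unknown_label="Not Specified"):
    """Max minus min avg_engagement across known countries, in percentage points."""
    known = regional_df[regional_df["country"] != unknown_label]
    if len(known) < 2:
        return None  # A spread needs at least two known regions
    spread = known["avg_engagement"].max() - known["avg_engagement"].min()
    return round(float(spread), 1)

# Toy frame with the same column names as `regional` (illustrative values)
toy_regional = pd.DataFrame({
    "country": ["US", "CH", "Not Specified", "GB"],
    "avg_engagement": [61.5, 48.2, 12.0, 55.0],
})

spread = engagement_spread_pp(toy_regional)
print(f"Engagement spread across regions: {spread} pp")
```

Since `regional` only includes countries with at least 50 views, the spread reflects those countries rather than the full audience.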

---
## Section 11: Interview Cheat Sheet

All key numbers compiled from the sections above, plus 7 STAR talking points.

In [None]:
# ===================================================================
#  INTERVIEW CHEAT SHEET -- Compiled Reference Card
# ===================================================================

sep  = "=" * 72
dash = "-" * 72

print(sep)
print("        VIDEO ANALYTICS -- INTERVIEW CHEAT SHEET")
print(sep)

# --- SCOPE ---
print(f"\n  SCOPE")
print(dash)
print(f"  Period:             {EXEC.get('period_start', 'N/A')} to {EXEC.get('period_end', 'N/A')} ({int(EXEC.get('period_days', 0))} days)")
print(f"  Total Videos:       {fmt_num(EXEC.get('total_videos', 0))}")
print(f"  Total Channels:     {fmt_num(EXEC.get('total_channels', 0))} (across 4 categories)")
print(f"  Total Views:        {fmt_num(EXEC.get('total_views', 0))}")
print(f"  Total Watch Hours:  {fmt_num(EXEC.get('total_watch_hours', 0))}")
print(f"  Total Impressions:  {fmt_num(EXEC.get('total_impressions', 0))}")

# --- QUALITY METRICS ---
print(f"\n  QUALITY METRICS")
print(dash)
print(f"  Engagement Score:   {fmt_pct(ENGAGEMENT.get('avg_engagement', 0))}")
print(f"  Completion Rate:    {fmt_pct(ENGAGEMENT.get('completion_rate', 0))}")
print(f"  Halfway Rate:       {fmt_pct(ENGAGEMENT.get('halfway_rate', 0))}")
print(f"  Play Rate:          {fmt_pct(EXEC.get('play_rate', 0))}  (views / impressions)")
print(f"  Trend:              {ENGAGEMENT.get('trend', 'N/A')}")

# --- ENGAGEMENT FUNNEL ---
print(f"\n  ENGAGEMENT FUNNEL")
print(dash)
print(f"  Started (1%):       {fmt_pct(FUNNEL.get('started', 0))}")
print(f"  Reached 25%:        {fmt_pct(FUNNEL.get('reached_25', 0))}")
print(f"  Reached 50%:        {fmt_pct(FUNNEL.get('reached_50', 0))}")
print(f"  Reached 75%:        {fmt_pct(FUNNEL.get('reached_75', 0))}")
print(f"  Completed (100%):   {fmt_pct(FUNNEL.get('completed', 0))}")
print(f"  Biggest Drop-off:   {FUNNEL.get('biggest_drop_stage', 'N/A')} ({FUNNEL.get('biggest_drop_val', 0):.1f} pp)")
print(f"  Best Retention:     {FUNNEL.get('best_retention_channel', 'N/A')}")

# --- CONTENT STRATEGY ---
print(f"\n  CONTENT STRATEGY")
print(dash)
print(f"  Optimal Duration:   {CONTENT.get('sweet_spot_bucket', 'N/A')}")
print(f"  Sweet Spot Compl.:  {fmt_pct(CONTENT.get('sweet_spot_completion', 0))}")
print(f"  Worst Duration:     {CONTENT.get('worst_bucket', 'N/A')} ({fmt_pct(CONTENT.get('worst_completion', 0))})")
print(f"  Duration Penalty:   {CONTENT.get('penalty_pp', 0):.1f} pp")
print(f"  Production Mismatch: {'Yes' if CONTENT.get('production_mismatch') else 'No'}")

# --- CHANNEL PERFORMANCE ---
print(f"\n  CHANNEL PERFORMANCE")
print(dash)
# `or ['N/A']` covers empty lists, which .get()'s default alone would not
print(f"  Stars:              {', '.join(CHANNELS.get('stars') or ['N/A'])}")
print(f"  Opportunities:      {', '.join(CHANNELS.get('opportunities') or ['N/A'])}")
print(f"  Cash Cows:          {', '.join(CHANNELS.get('cash_cows') or ['N/A'])}")
print(f"  Reconsider:         {', '.join(CHANNELS.get('reconsider') or ['N/A'])}")
print(f"  Consolidation Opp.: {fmt_pct(CHANNELS.get('consolidation_pct', 0))} of views")

# --- DEVICE SPLIT ---
print(f"\n  DEVICE SPLIT")
print(dash)
# Defined once outside the loop; repeated here so this cell runs independently of 8a
device_labels_map = {'desktop': 'Desktop', 'mobile': 'Mobile', 'tablet': 'Tablet', 'other_device': 'Other'}
for device, pct in DEVICE.get('breakdown', {}).items():
    print(f"  {device_labels_map.get(device, device):18} {pct:.1f}%")
print(f"  Mobile Trend:       {DEVICE.get('mobile_first', 0):.1f}% -> {DEVICE.get('mobile_last', 0):.1f}% ({DEVICE.get('mobile_growth_pp', 0):+.1f} pp)")
print(f"  Above 30% Thresh:   {'Yes' if DEVICE.get('above_30_threshold') else 'No'}")

# --- CONTENT LIFECYCLE ---
print(f"\n  CONTENT LIFECYCLE")
print(dash)
print(f"  Stale Videos (180d): {fmt_num(LIFECYCLE.get('stale_count', 0))}")
print(f"  Stale Hours:         {LIFECYCLE.get('stale_hours', 0):,.1f} hours")
print(f"  Recently Created:    {fmt_num(LIFECYCLE.get('recent_count', 0))} (last 90 days)")

# --- AUDIENCE ---
print(f"\n  AUDIENCE")
print(dash)
print(f"  Top Region:         {REGIONAL.get('top_country', 'N/A')}")
print(f"  Regions Tracked:    {REGIONAL.get('num_countries', 0)}")
print(f"  Peak Day:           {REGIONAL.get('peak_day', 'N/A')}")
print(f"  Weekday Viewing:    {REGIONAL.get('weekday_pct', 0):.1f}%")
print(f"  Peak Month:         {REGIONAL.get('peak_month', 'N/A')}")
print(f"  Low Month:          {REGIONAL.get('low_month', 'N/A')}")

print(f"\n{sep}")

### 7 STAR Talking Points

---

#### 1. Building the Analytics Pipeline (Technical Leadership)

**Situation:** The organization ran 11 separate Brightcove video accounts across 4 business categories (internet/intranet, research, global wealth management, events) with no unified view of video performance. Each account was managed independently, so there was no way to benchmark accounts against one another or spot patterns that cut across them.

**Task:** Build an automated ETL pipeline to consolidate all 11 accounts into a single analytics platform with daily granularity.

**Action:** I designed and implemented a Python-based pipeline using the Brightcove Analytics API, storing data in DuckDB for fast analytical queries. The pipeline handles CMS metadata enrichment, incremental daily updates with overlap for data corrections, and video lifecycle tracking (dt_last_viewed). I also created executive dashboards for non-technical stakeholders.

**Result:** Unified **[total_videos]** videos across **[total_channels]** channels into a single analytics platform tracking **[total_views]** views and **[total_watch_hours]** watch hours. For the first time, the organization had a cross-account view of video performance, enabling data-driven content strategy.
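
A minimal sketch of the "incremental daily updates with overlap" idea from the Action step. The function names, the 3-day overlap, and the in-memory `store` dict are illustrative assumptions, not the pipeline's actual code (the real pipeline writes to DuckDB).

```python
from datetime import date, timedelta


def incremental_window(last_loaded, today, overlap_days=3):
    """Return the inclusive (start, end) dates to fetch on this run.

    The window re-fetches the last `overlap_days` days that were already
    loaded (so corrected figures can overwrite them) and extends through
    yesterday, the most recent complete day.
    """
    start = last_loaded - timedelta(days=overlap_days - 1)
    end = today - timedelta(days=1)
    return start, end


def upsert(store, rows):
    """Insert or overwrite rows keyed by (video_id, date)."""
    for row in rows:
        store[(row["video_id"], row["date"])] = row


# Hypothetical example: one row is already loaded, then a corrected value arrives.
store = {("v1", date(2024, 3, 10)): {"video_id": "v1", "date": date(2024, 3, 10), "views": 90}}
upsert(store, [
    {"video_id": "v1", "date": date(2024, 3, 10), "views": 100},  # correction
    {"video_id": "v1", "date": date(2024, 3, 11), "views": 40},   # new day
])
```

Because rows are keyed by video and date, re-fetching the overlap window never creates duplicates; it only replaces earlier values.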

---

#### 2. Engagement Funnel Optimization -- THE BIG WIN (Business Impact)

**Situation:** Content producers were creating videos without understanding how viewers consumed them. There was no data on where viewers dropped off or why engagement varied across channels.

**Task:** Identify specific viewer drop-off patterns and translate them into actionable content production guidelines.

**Action:** I built an engagement funnel analysis tracking viewers through 5 milestones (1%, 25%, 50%, 75%, 100%). I discovered the biggest drop-off occurred at the **[biggest_drop_stage]** mark (**[biggest_drop_val]** percentage points). I segmented this by channel to identify which channels retained viewers best (**[best_retention_channel]**), then analyzed what those channels did differently.

**Result:** This analysis led to three concrete recommendations: (1) stronger opening hooks in the first 15 seconds, (2) front-loading key messages before the **[biggest_drop_stage]** mark, and (3) establishing optimal duration guidelines based on engagement data. The overall completion rate benchmark was set at **[completion_rate]** with a target of improving by **[X]** percentage points.
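
The funnel calculation in the Action step can be sketched as follows. The viewer counts are made-up illustration values, and `funnel_dropoff` is a hypothetical helper name; only the five milestones come from the analysis above.

```python
MILESTONES = ["1%", "25%", "50%", "75%", "100%"]


def funnel_dropoff(counts):
    """Compute retention and stage-to-stage drop-off in percentage points.

    counts: viewers reaching each milestone, in the same order as MILESTONES.
    Returns (retention_pct_per_milestone, drop_pp_per_stage, biggest_drop_stage).
    """
    if len(counts) != len(MILESTONES):
        raise ValueError("expected one count per milestone")
    if counts[0] <= 0:
        raise ValueError("no viewers reached the first milestone")

    # Retention is expressed relative to viewers who reached the first milestone.
    retention = [100.0 * c / counts[0] for c in counts]

    drops = {}
    for i in range(1, len(MILESTONES)):
        stage = f"{MILESTONES[i - 1]} -> {MILESTONES[i]}"
        drops[stage] = retention[i - 1] - retention[i]

    biggest = max(drops, key=drops.get)
    return retention, drops, biggest


# Illustrative counts only
retention, drops, biggest = funnel_dropoff([1000, 700, 550, 450, 400])
```

Running the same function once per channel gives the per-channel comparison used to find the best-retaining channel.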

---

#### 3. Optimal Video Duration (Data-Driven Content Strategy)

**Situation:** Content producers had no guidance on ideal video length. Videos ranged from under 1 minute to over 30 minutes with no data-driven rationale for duration choices.

**Task:** Determine the optimal video duration that maximizes viewer engagement and completion.

**Action:** I analyzed all videos across 7 duration buckets, measuring both completion rate and engagement score for each. I also compared what we were producing most (production volume) against what performed best (engagement).

**Result:** Discovered a clear sweet spot at **[sweet_spot_bucket]** with **[sweet_spot_completion]** completion -- **[penalty_pp]** percentage points higher than **[worst_bucket]**. I also identified a production-vs-performance mismatch: we produced the most content at **[most_produced_bucket]** but the sweet spot was **[sweet_spot_bucket]**. This led to updated content guidelines and better resource allocation.
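
A rough sketch of the bucket comparison described above. The seven bucket boundaries and the unweighted average are assumptions chosen for illustration (the notebook's actual boundaries are not restated in this section), and the sample videos are invented.

```python
from bisect import bisect_right
from collections import defaultdict

# Hypothetical bucket edges in minutes; 6 edges give 7 buckets.
EDGES_MIN = [1, 2, 5, 10, 20, 30]
LABELS = ["<1 min", "1-2 min", "2-5 min", "5-10 min", "10-20 min", "20-30 min", "30+ min"]


def bucket_for(duration_min):
    """Map a duration in minutes to its bucket label (lower edge inclusive)."""
    return LABELS[bisect_right(EDGES_MIN, duration_min)]


def duration_summary(videos):
    """videos: iterable of (duration_min, completion_rate_pct) pairs."""
    rates = defaultdict(list)
    for duration_min, completion in videos:
        rates[bucket_for(duration_min)].append(completion)

    avg = {bucket: sum(vals) / len(vals) for bucket, vals in rates.items()}
    sweet_spot = max(avg, key=avg.get)
    worst = min(avg, key=avg.get)
    most_produced = max(rates, key=lambda b: len(rates[b]))
    return {
        "sweet_spot_bucket": sweet_spot,
        "worst_bucket": worst,
        "penalty_pp": avg[sweet_spot] - avg[worst],
        "most_produced_bucket": most_produced,
        "production_mismatch": most_produced != sweet_spot,
    }


# Invented sample data
summary = duration_summary([(0.5, 50), (3, 60), (4, 70), (12, 40), (15, 30), (18, 35), (45, 10)])
```

The `production_mismatch` flag is simply whether the most-produced bucket differs from the best-performing one, which is the comparison the Result paragraph reports.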

---

#### 4. Channel Rationalization (Cost Optimization)

**Situation:** The organization operated 11 Brightcove accounts, each with its own license, administration overhead, and content management workflow. There was no framework for evaluating which accounts delivered value.

**Task:** Develop a data-driven framework to evaluate channel performance and identify consolidation opportunities.

**Action:** I created a BCG-style matrix categorizing channels by reach (views) and engagement, classifying each as Stars (invest), Opportunities (promote), Cash Cows (maintain), or Reconsider (consolidate). I quantified the view share of each quadrant.

**Result:** Identified **[reconsider_count]** channels in the "Reconsider" quadrant, representing only **[consolidation_pct]** of total views. Stars included **[stars]**, providing clear investment priorities. This framework supported the business case for account consolidation, potentially reducing licensing and administrative overhead.
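
One way the quadrant classification could look, as a sketch. Splitting at the median of views and engagement, and the mapping of quadrants to reach/engagement combinations, are assumptions made for this example; the channel names and numbers are invented.

```python
from statistics import median


def classify_channels(channels):
    """channels: dict of name -> (views, engagement_score).

    Assumed mapping:
      high reach + high engagement -> stars
      low reach  + high engagement -> opportunities
      high reach + low engagement  -> cash_cows
      low reach  + low engagement  -> reconsider
    """
    view_cut = median(v for v, _ in channels.values())
    eng_cut = median(e for _, e in channels.values())

    quadrants = {"stars": [], "opportunities": [], "cash_cows": [], "reconsider": []}
    for name, (views, engagement) in channels.items():
        high_reach = views >= view_cut
        high_engagement = engagement >= eng_cut
        if high_reach and high_engagement:
            key = "stars"
        elif high_engagement:
            key = "opportunities"
        elif high_reach:
            key = "cash_cows"
        else:
            key = "reconsider"
        quadrants[key].append(name)

    # Share of total views held by channels in the "reconsider" quadrant
    total_views = sum(v for v, _ in channels.values())
    reconsider_views = sum(channels[n][0] for n in quadrants["reconsider"])
    consolidation_pct = 100.0 * reconsider_views / total_views if total_views else 0.0
    return quadrants, consolidation_pct


# Invented sample channels
quadrants, consolidation_pct = classify_channels({
    "Channel A": (5000, 70.0),
    "Channel B": (4000, 30.0),
    "Channel C": (600, 65.0),
    "Channel D": (400, 20.0),
})
```

A median split guarantees a roughly even division of channels; fixed business thresholds would be an equally valid choice and would change which channels land in each quadrant.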

---

#### 5. Mobile Strategy (Trend Identification)

**Situation:** Content was produced primarily for desktop consumption, but there was no data on how viewing habits were evolving across devices.

**Task:** Quantify the device distribution and identify trends to inform platform investment decisions.

**Action:** I tracked device-level viewing data (desktop, mobile, tablet) monthly, establishing a 30% mobile threshold as the trigger for a mobile-first content strategy. I correlated device type with engagement metrics to understand whether mobile viewers engaged differently.

**Result:** Mobile viewing **[grew/declined]** from **[mobile_first_pct]** to **[mobile_last_pct]** (**[mobile_growth_pp]** pp). **[If above 30%: This exceeded the 30% threshold, justifying investment in mobile optimization: larger on-screen text, subtitles by default, and mobile-friendly thumbnails.]** **[If below 30%: While desktop-first remained appropriate, the growth trajectory informed planning for mobile investment within [X] months.]**
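
A small sketch of the mobile-share trend and threshold check. It assumes the threshold is tested against the most recent month's share (rather than the overall share), and the monthly figures are invented.

```python
def mobile_trend(monthly, threshold_pct=30.0):
    """monthly: chronological list of (month, mobile_views, total_views)."""
    shares = [(month, 100.0 * mobile / total) for month, mobile, total in monthly if total > 0]
    if not shares:
        raise ValueError("no months with views")

    first = shares[0][1]
    last = shares[-1][1]
    return {
        "mobile_first": first,
        "mobile_last": last,
        "mobile_growth_pp": last - first,           # change in percentage points
        "above_30_threshold": last >= threshold_pct,
    }


# Invented monthly data
trend = mobile_trend([
    ("2024-01", 200, 1000),
    ("2024-02", 250, 1000),
    ("2024-03", 320, 1000),
])
```

The growth figure is a difference of shares (percentage points), not a relative percent change, which matches the "pp" wording in the talking point.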

---

#### 6. Content Lifecycle Management (Operational Efficiency)

**Situation:** The video library continued to grow with no systematic process for identifying or archiving stale content. This increased storage costs and made it harder for employees to find relevant, current content.

**Task:** Develop a content freshness framework and identify archival candidates.

**Action:** I tracked the `dt_last_viewed` field for every video and flagged content not viewed in 180+ days. I also analyzed the content age distribution and compared recently created high performers against underperformers to understand what keeps content in use.

**Result:** Identified **[stale_count]** stale videos representing **[stale_hours]** hours of stored content with **[stale_lifetime_views]** lifetime views. Recommended an archival process (excluding compliance materials) to reduce storage costs and improve content discoverability. Also established early-warning metrics for new content performance.
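
The stale-content flag can be sketched like this. Only `dt_last_viewed` and the 180-day cutoff come from the text above; the `duration_sec` and `lifetime_views` field names, the treatment of never-viewed videos as stale, and the sample records are assumptions.

```python
from datetime import date


def stale_summary(videos, as_of, stale_days=180):
    """Flag videos whose last view is `stale_days` or more days before `as_of`."""
    stale = []
    for video in videos:
        last_viewed = video["dt_last_viewed"]
        # Assumption: a video with no recorded view counts as stale.
        if last_viewed is None or (as_of - last_viewed).days >= stale_days:
            stale.append(video)

    return {
        "stale_count": len(stale),
        "stale_hours": sum(v["duration_sec"] for v in stale) / 3600.0,
        "stale_lifetime_views": sum(v["lifetime_views"] for v in stale),
    }


# Invented sample records
result = stale_summary(
    [
        {"dt_last_viewed": date(2024, 6, 15), "duration_sec": 600, "lifetime_views": 900},
        {"dt_last_viewed": date(2023, 12, 1), "duration_sec": 1800, "lifetime_views": 40},
        {"dt_last_viewed": date(2023, 12, 20), "duration_sec": 5400, "lifetime_views": 10},
    ],
    as_of=date(2024, 7, 1),
)
```

Any exclusion list (for example compliance materials) would be applied before or after this step; it is left out here to keep the sketch short.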

---

#### 7. Content Performance Intelligence (Stakeholder Value)

**Situation:** Content producers had no feedback loop -- they published videos and never learned which resonated, which failed, or why. There was no equivalent of a "greatest hits" analysis or problem diagnosis.

**Task:** Create a content performance framework that gives producers actionable intelligence, analogous to how Search Analytics identifies success stories, relevance problems, and content gaps.

**Action:** I built three performance lenses: (1) "Greatest Hits" -- top videos by views and engagement for replication; (2) "Problem Videos" -- high impressions but low play rate (<30%), indicating thumbnail/title issues; (3) "Hidden Gems" -- high engagement (>70%) but low views (<200), indicating underpromotion opportunities.

**Result:** Identified **[problem_video_count]** problem videos that were being shown but failing to convert (thumbnail/title issue), and **[hidden_gem_count]** hidden gems with strong engagement that deserved more promotion. This framework gave content producers data-driven feedback for the first time, shifting the organization from intuition-based to evidence-based content strategy.
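
A sketch of the three lenses as simple rules. The 30% play-rate, 70% engagement, and 200-view thresholds come from the Action step; the `min_impressions` cutoff for "high impressions", the field names, the definition of play rate as views divided by impressions, and the sample videos are assumptions.

```python
def classify_video(video, min_impressions=1000):
    """Return 'problem', 'hidden_gem', or 'other' for one video record."""
    impressions = video["impressions"]
    play_rate = 100.0 * video["views"] / impressions if impressions else 0.0

    # Shown often but rarely played: likely a thumbnail/title issue.
    if impressions >= min_impressions and play_rate < 30.0:
        return "problem"
    # Strong engagement but small audience: candidate for promotion.
    if video["engagement_score"] > 70.0 and video["views"] < 200:
        return "hidden_gem"
    return "other"


def greatest_hits(videos, top_n=10):
    """Top videos by views, highest first."""
    return sorted(videos, key=lambda v: v["views"], reverse=True)[:top_n]


# Invented sample records
videos = [
    {"name": "a", "impressions": 5000, "views": 1000, "engagement_score": 55.0},
    {"name": "b", "impressions": 300, "views": 150, "engagement_score": 82.0},
    {"name": "c", "impressions": 4000, "views": 2400, "engagement_score": 65.0},
]
labels = {v["name"]: classify_video(v) for v in videos}
```

In this sketch the "problem" rule is checked first, so a video that meets both conditions is reported as a problem video rather than a hidden gem.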

---
## Cleanup

In [None]:
# ---------------------------------------------------------------------------
# Close database connection
# ---------------------------------------------------------------------------
from datetime import datetime  # harmless if already imported earlier in the notebook

conn.close()
print("Database connection closed.")
print(f"Notebook completed: {datetime.now().strftime('%Y-%m-%d %H:%M')}")