# Business Impact Discovery Notebook
## Find Real Examples for Interview Responses

**Purpose:** Discover concrete business impact examples in your video analytics data that demonstrate strategic value and decision-making influence.

**Use Case:** Prepare STAR-format interview responses with real quantifiable results from your organization.

---

## How to Use This Notebook

1. **Run Section 0** - Setup and verify data connection
2. **Run Sections 1-10** - Each section discovers a different business impact pattern
3. **Review the findings** - Look for interesting patterns in your data
4. **Note the metrics** - Copy specific numbers for your interview responses
5. **Customize queries** - Add date ranges or channel filters to focus on specific periods or teams

---

## Section 0: Setup & Data Verification

In [None]:
# Setup
import duckdb
import pandas as pd
import numpy as np
from datetime import datetime, timedelta
from pathlib import Path
import matplotlib.pyplot as plt
import seaborn as sns
from IPython.display import display  # built into Jupyter; imported explicitly for other runners

# Settings
pd.set_option('display.max_columns', None)
pd.set_option('display.max_rows', 100)
pd.set_option('display.float_format', '{:,.2f}'.format)
plt.style.use('seaborn-v0_8-whitegrid')
sns.set_palette('husl')

# Connect to DuckDB
DB_PATH = Path('../output/analytics.duckdb')
if not DB_PATH.exists():
    raise FileNotFoundError(f"Database not found at {DB_PATH}. Run the pipeline first.")

conn = duckdb.connect(str(DB_PATH), read_only=True)
print(f"✓ Connected to: {DB_PATH}")
print(f"✓ Database size: {DB_PATH.stat().st_size / (1024*1024):.1f} MB")

# Data coverage check
coverage = conn.execute("""
    SELECT 
        MIN(date) as earliest_date,
        MAX(date) as latest_date,
        COUNT(DISTINCT video_id) as total_videos,
        COUNT(DISTINCT channel) as total_channels,
        SUM(video_view) as total_views,
        COUNT(*) as total_records
    FROM daily_analytics
""").fetchdf()

print("\n📊 DATA COVERAGE:")
print(f"   Date Range: {coverage['earliest_date'].iloc[0]} to {coverage['latest_date'].iloc[0]}")
print(f"   Total Videos: {coverage['total_videos'].iloc[0]:,}")
print(f"   Total Channels: {coverage['total_channels'].iloc[0]:,}")
print(f"   Total Views: {coverage['total_views'].iloc[0]:,}")
print(f"   Total Records: {coverage['total_records'].iloc[0]:,}")
print("\n✓ Ready to discover business impact patterns!")

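Every section below assumes `daily_analytics` has a specific set of columns, so a missing column only surfaces as a SQL error deep in the notebook. A minimal sketch of a fail-fast check in plain Python, assuming the required set is exactly the column names the queries in this notebook reference; the helper `missing_columns` is hypothetical, and the commented usage assumes the DuckDB connection `conn` from the setup cell.

```python
# Columns referenced by the queries in this notebook (assumed required)
REQUIRED_COLUMNS = {
    "date", "video_id", "channel", "name", "created_at", "dt_last_viewed",
    "video_view", "video_duration", "video_percent_viewed", "video_seconds_viewed",
    "engagement_score", "video_engagement_1", "video_engagement_25",
    "video_engagement_50", "video_engagement_75", "video_engagement_100",
    "views_desktop", "views_mobile", "views_tablet", "country", "video_content_type",
}


def missing_columns(available, required=REQUIRED_COLUMNS):
    """Return the required column names absent from `available`, sorted."""
    return sorted(set(required) - set(available))


# Hypothetical usage in the notebook:
#   cols = conn.execute("SELECT * FROM daily_analytics LIMIT 0").fetchdf().columns
#   print(missing_columns(cols) or "All expected columns present")
```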
---
## Section 1: Executive Communication Effectiveness
**Business Question:** Which executive communications drive the highest engagement? Are there performance differences by channel?

In [None]:
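# Reach and engagement by channel (covers all videos; add a WHERE filter on
# channel or content type to isolate executive communications)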
exec_comms_query = """
SELECT 
    channel,
    COUNT(DISTINCT video_id) as num_videos,
    SUM(video_view) as total_views,
    ROUND(AVG(engagement_score), 1) as avg_engagement,
    ROUND(AVG(video_engagement_100), 1) as avg_completion_rate,
    ROUND(AVG(video_duration) / 60.0, 1) as avg_duration_minutes,
    ROUND(AVG(video_percent_viewed), 1) as avg_percent_watched
FROM daily_analytics
WHERE video_view > 0
GROUP BY channel
ORDER BY total_views DESC
"""

exec_comms = conn.execute(exec_comms_query).fetchdf()

print("\n" + "="*80)
print("  EXECUTIVE COMMUNICATION PERFORMANCE BY CHANNEL")
print("="*80)
display(exec_comms)

# Calculate key insights
if len(exec_comms) > 0:
    top_channel = exec_comms.iloc[0]
    high_eng_channel = exec_comms.loc[exec_comms['avg_engagement'].idxmax()]
    
    print("\n💡 KEY INSIGHTS:")
    print(f"   Highest Reach: '{top_channel['channel']}' with {top_channel['total_views']:,.0f} views")
    print(f"   Highest Engagement: '{high_eng_channel['channel']}' with {high_eng_channel['avg_engagement']:.1f}% engagement")
    
    if high_eng_channel['channel'] != top_channel['channel']:
        print(f"\n   ⚠️  OPPORTUNITY: '{high_eng_channel['channel']}' has high engagement but lower reach")
        print(f"       Consider increasing promotion or content volume for this channel")

# Visualize
fig, axes = plt.subplots(1, 2, figsize=(14, 5))
exec_comms.plot(x='channel', y='total_views', kind='barh', ax=axes[0], legend=False, color='steelblue')
axes[0].set_title('Total Views by Channel', fontweight='bold')
axes[0].set_xlabel('Total Views')

exec_comms.plot(x='channel', y='avg_engagement', kind='barh', ax=axes[1], legend=False, color='coral')
axes[1].set_title('Average Engagement by Channel', fontweight='bold')
axes[1].set_xlabel('Engagement Score (%)')
axes[1].set_xlim(0, 100)

plt.tight_layout()
plt.show()

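Each `AVG(...)` in these queries weights every row of `daily_analytics` equally, so (assuming one row per video per day) a video-day with 3 views counts as much as one with 3,000. If a talking point should describe what a typical *view* experienced, a view-weighted average is one option. A minimal sketch in plain Python; `weighted_mean` is a hypothetical helper, and the SQL equivalent would be `SUM(engagement_score * video_view) / NULLIF(SUM(video_view), 0)`. Per-row averages describe the content; view-weighted averages describe the audience.

```python
def weighted_mean(values, weights):
    """View-weighted average: sum(value * weight) / sum(weight).

    Returns None when the total weight is zero (mirrors NULLIF in SQL).
    """
    total_weight = sum(weights)
    if total_weight == 0:
        return None
    return sum(v * w for v, w in zip(values, weights)) / total_weight


# Two illustrative video-days: high engagement on few views, low on many views
engagement = [80.0, 20.0]
views = [10, 90]
unweighted = sum(engagement) / len(engagement)  # 50.0, what AVG() reports
weighted = weighted_mean(engagement, views)     # 26.0, what a typical view saw
```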
---
## Section 2: Compliance Training Effectiveness
**Business Question:** Do shorter training videos lead to higher completion rates?

In [None]:
# Completion by duration (covers all videos; filter by channel or content type
# to isolate training/compliance content)
training_query = """
SELECT
    CASE
        WHEN video_duration <= 300 THEN '1. Under 5 min'
        WHEN video_duration <= 600 THEN '2. 5-10 min'
        WHEN video_duration <= 900 THEN '3. 10-15 min'
        WHEN video_duration <= 1200 THEN '4. 15-20 min'
        ELSE '5. Over 20 min'
    END as duration_category,
    COUNT(DISTINCT video_id) as num_videos,
    SUM(video_view) as total_views,
    ROUND(AVG(video_engagement_100), 1) as avg_completion_rate,
    ROUND(AVG(video_engagement_50), 1) as reached_halfway,
    ROUND(AVG(video_percent_viewed), 1) as avg_percent_watched
FROM daily_analytics
WHERE video_view > 0 AND video_duration > 0
GROUP BY 1
ORDER BY 1
"""

training_analysis = conn.execute(training_query).fetchdf()

print("\n" + "="*80)
print("  TRAINING VIDEO COMPLETION BY DURATION")
print("="*80)
display(training_analysis)

# Calculate ROI insight
if len(training_analysis) > 1:
    short_videos = training_analysis[training_analysis['duration_category'] == '1. Under 5 min']
    long_videos = training_analysis[training_analysis['duration_category'] == '5. Over 20 min']
    
    if len(short_videos) > 0 and len(long_videos) > 0:
        short_completion = short_videos['avg_completion_rate'].iloc[0]
        long_completion = long_videos['avg_completion_rate'].iloc[0]
        diff = short_completion - long_completion
        pct_improvement = (diff / long_completion * 100) if long_completion > 0 else 0
        
        print(f"\n💡 KEY INSIGHT:")
        print(f"   Short videos (<5 min): {short_completion:.1f}% completion rate")
        print(f"   Long videos (>20 min): {long_completion:.1f}% completion rate")
        print(f"   Difference: {diff:.1f} percentage points ({pct_improvement:.0f}% relative improvement)")
        print(f"\n   📌 INTERVIEW TALKING POINT:")
        print(f"      'Videos under 5 minutes had a {pct_improvement:.0f}% higher completion rate than 20+ minute videos,")
        print(f"       supporting a shift to shorter modules to reduce compliance risk and follow-up costs'")

# Visualize
fig, ax = plt.subplots(figsize=(10, 5))
training_analysis.plot(x='duration_category', y='avg_completion_rate', kind='bar', ax=ax, legend=False, color='#2ecc71')
ax.set_title('Video Completion Rate by Duration', fontweight='bold', fontsize=14)
ax.set_ylabel('Completion Rate (%)')
ax.set_xlabel('Video Duration')
ax.set_ylim(0, 100)
plt.xticks(rotation=45)
plt.tight_layout()
plt.show()

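The cell above reports two numbers that are easy to conflate in an interview: the gap in percentage points and the relative change. A minimal sketch of a helper that keeps them separate; `compare_rates` is a hypothetical name, and unlike the cell above it returns `None` rather than `0` for a zero baseline so a meaningless "0%" is never quoted.

```python
def compare_rates(short_rate, long_rate):
    """Compare two completion rates given in percent (0-100).

    Returns (difference in percentage points, relative change in percent).
    The relative change is None when the baseline rate is zero.
    """
    point_diff = short_rate - long_rate
    relative = (point_diff / long_rate * 100) if long_rate > 0 else None
    return point_diff, relative


points, relative = compare_rates(60.0, 40.0)
print(f"{points:.1f} percentage points ({relative:.0f}% relative)")
# prints: 20.0 percentage points (50% relative)
```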
---
## Section 3: Regional Content Performance
**Business Question:** Do different regions engage differently with content? Should we localize?

In [None]:
# Regional analysis (if country data is populated)
regional_query = """
SELECT
    COALESCE(country, 'Not Specified') as region,
    COUNT(DISTINCT video_id) as num_videos,
    SUM(video_view) as total_views,
    ROUND(AVG(engagement_score), 1) as avg_engagement,
    ROUND(AVG(video_percent_viewed), 1) as avg_percent_watched,
    ROUND(SUM(views_mobile) * 100.0 / NULLIF(SUM(video_view), 0), 1) as mobile_percentage
FROM daily_analytics
WHERE video_view > 0
GROUP BY 1
HAVING SUM(video_view) > 100
ORDER BY total_views DESC
LIMIT 10
"""

regional_data = conn.execute(regional_query).fetchdf()

print("\n" + "="*80)
print("  REGIONAL CONTENT PERFORMANCE")
print("="*80)

if len(regional_data) > 1:
    display(regional_data)
    
    # Find interesting patterns
    high_engagement_region = regional_data.loc[regional_data['avg_engagement'].idxmax()]
    high_mobile_region = regional_data.loc[regional_data['mobile_percentage'].idxmax()]
    
    print(f"\n💡 KEY INSIGHTS:")
    print(f"   Highest Engagement: {high_engagement_region['region']} ({high_engagement_region['avg_engagement']:.1f}%)")
    print(f"   Most Mobile: {high_mobile_region['region']} ({high_mobile_region['mobile_percentage']:.1f}% mobile views)")
    
    # Compare top 2 regions
    if len(regional_data) >= 2:
        region1 = regional_data.iloc[0]
        region2 = regional_data.iloc[1]
        engagement_diff = region1['avg_engagement'] - region2['avg_engagement']
        
        if abs(engagement_diff) > 10:
            # Name the region that is actually higher, whichever of the top two it is
            higher, lower = (region1, region2) if engagement_diff > 0 else (region2, region1)
            print(f"\n   📌 INTERVIEW TALKING POINT:")
            print(f"      '{higher['region']} showed {abs(engagement_diff):.1f} points higher engagement than {lower['region']},")
            print(f"       indicating a need for region-specific content strategies'")
else:
    print("   ⚠️  Regional data not available or not populated in your dataset")
    print("      (This is normal if country field isn't used)")

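Because the query maps missing countries to `'Not Specified'`, that bucket can come out as the "highest engagement" region even though it is not a real region. A minimal sketch of picking the top region while skipping placeholder labels, using plain dictionaries in place of DataFrame rows; `top_region` is a hypothetical helper and the region names are illustrative. With the DataFrame, filtering `regional_data[regional_data['region'] != 'Not Specified']` before `idxmax()` has the same effect.

```python
PLACEHOLDER_REGIONS = {"Not Specified"}


def top_region(rows, metric, exclude=PLACEHOLDER_REGIONS):
    """Return the row with the highest `metric`, ignoring placeholder regions.

    `rows` is a list of dicts with at least 'region' and `metric` keys.
    Returns None if no named region remains.
    """
    candidates = [r for r in rows if r["region"] not in exclude]
    if not candidates:
        return None
    return max(candidates, key=lambda r: r[metric])


rows = [
    {"region": "Not Specified", "avg_engagement": 70.0},
    {"region": "Region A", "avg_engagement": 55.0},
    {"region": "Region B", "avg_engagement": 60.0},
]
best = top_region(rows, "avg_engagement")  # Region B, not the placeholder bucket
```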
---
## Section 4: Channel Rationalization Analysis
**Business Question:** Should we consolidate channels? Which ones are underperforming?

In [None]:
# Channel efficiency analysis
# Aggregate per channel in a CTE first, then compare each channel's total views
# with the 25th/75th percentiles across channels using window functions
channel_efficiency_query = """
WITH per_channel AS (
    SELECT
        channel,
        COUNT(DISTINCT video_id) as num_videos,
        SUM(video_view) as total_views,
        ROUND(SUM(video_view) * 1.0 / COUNT(DISTINCT video_id), 0) as views_per_video,
        ROUND(AVG(engagement_score), 1) as avg_engagement,
        CASE 
            WHEN AVG(engagement_score) >= 60 THEN 'High Engagement'
            WHEN AVG(engagement_score) >= 40 THEN 'Medium Engagement'
            ELSE 'Low Engagement'
        END as engagement_tier
    FROM daily_analytics
    WHERE video_view > 0
    GROUP BY channel
)
SELECT
    *,
    CASE
        WHEN total_views >= QUANTILE_CONT(total_views, 0.75) OVER () THEN 'High Reach'
        WHEN total_views >= QUANTILE_CONT(total_views, 0.25) OVER () THEN 'Medium Reach'
        ELSE 'Low Reach'
    END as reach_tier
FROM per_channel
ORDER BY total_views DESC
"""

channel_efficiency = conn.execute(channel_efficiency_query).fetchdf()

print("\n" + "="*80)
print("  CHANNEL EFFICIENCY & RATIONALIZATION OPPORTUNITIES")
print("="*80)
display(channel_efficiency)

# Identify optimization opportunities
print("\n💡 STRATEGIC INSIGHTS:")

# Stars: High reach, high engagement
stars = channel_efficiency[
    (channel_efficiency['reach_tier'] == 'High Reach') & 
    (channel_efficiency['engagement_tier'] == 'High Engagement')
]
if len(stars) > 0:
    print(f"\n   ⭐ STARS (Invest More): {', '.join(stars['channel'].tolist())}")
    print(f"      High reach + High engagement = Prime channels for investment")

# Question marks: Low reach, high engagement
question_marks = channel_efficiency[
    (channel_efficiency['reach_tier'] == 'Low Reach') & 
    (channel_efficiency['engagement_tier'] == 'High Engagement')
]
if len(question_marks) > 0:
    print(f"\n   ❓ OPPORTUNITIES (Promote More): {', '.join(question_marks['channel'].tolist())}")
    print(f"      High engagement but low reach = Increase promotion")

# Money pits: Low reach, low engagement
money_pits = channel_efficiency[
    (channel_efficiency['reach_tier'] == 'Low Reach') & 
    (channel_efficiency['engagement_tier'] == 'Low Engagement')
]
if len(money_pits) > 0:
    print(f"\n   💰 RECONSIDER (Consolidate or Redesign): {', '.join(money_pits['channel'].tolist())}")
    print(f"      Low reach + Low engagement = Candidates for consolidation")
    
    total_low_views = money_pits['total_views'].sum()
    total_all_views = channel_efficiency['total_views'].sum()
    pct_of_total = (total_low_views / total_all_views * 100) if total_all_views > 0 else 0
    
    print(f"\n   📌 INTERVIEW TALKING POINT:")
    print(f"      'Identified {len(money_pits)} underperforming channels representing only {pct_of_total:.1f}% of views;")
    print(f"       consolidating them would free production and promotion effort for higher-performing channels'")
    print(f"       (Quantify any cost savings from your own budget data before quoting a figure.)")

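The reach tiers compare each channel's total views with the 25th and 75th percentiles across channels. Continuous percentiles (`PERCENTILE_CONT`) interpolate linearly between ranked values, which is the same rule Python's `statistics.quantiles(..., method="inclusive")` uses. A minimal sketch that reproduces the tiering in plain Python, useful for sanity-checking the SQL on a handful of channels; `reach_tiers` is a hypothetical helper and the channel names are made up.

```python
import statistics


def reach_tiers(views_by_channel):
    """Label each channel High/Medium/Low Reach using 25th/75th percentile cut-offs.

    Mirrors the CASE expression in the SQL: >= p75 is High, >= p25 is Medium.
    """
    values = list(views_by_channel.values())
    if not values:
        return {}
    if len(values) < 2:
        # A single channel is its own p25 and p75, so it lands in High Reach
        p25 = p75 = values[0]
    else:
        p25, _, p75 = statistics.quantiles(values, n=4, method="inclusive")
    tiers = {}
    for channel, views in views_by_channel.items():
        if views >= p75:
            tiers[channel] = "High Reach"
        elif views >= p25:
            tiers[channel] = "Medium Reach"
        else:
            tiers[channel] = "Low Reach"
    return tiers


tiers = reach_tiers({"A": 100, "B": 200, "C": 300, "D": 400, "E": 500})
# p25 = 200, p75 = 400 -> A: Low, B and C: Medium, D and E: High
```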
---
## Section 5: Optimal Content Length Discovery
**Business Question:** What's the sweet spot for video duration?

In [None]:
# Optimal duration analysis
duration_sweet_spot_query = """
SELECT
    CASE
        WHEN video_duration <= 60 THEN '1. 0-1 min'
        WHEN video_duration <= 120 THEN '2. 1-2 min'
        WHEN video_duration <= 180 THEN '3. 2-3 min'
        WHEN video_duration <= 300 THEN '4. 3-5 min'
        WHEN video_duration <= 420 THEN '5. 5-7 min'
        WHEN video_duration <= 600 THEN '6. 7-10 min'
        WHEN video_duration <= 900 THEN '7. 10-15 min'
        ELSE '8. Over 15 min'
    END as duration_bucket,
    COUNT(DISTINCT video_id) as num_videos,
    SUM(video_view) as total_views,
    ROUND(AVG(engagement_score), 1) as avg_engagement,
    ROUND(AVG(video_engagement_100), 1) as completion_rate,
    ROUND(AVG(video_engagement_50), 1) as halfway_rate
FROM daily_analytics
WHERE video_view > 0 AND video_duration > 0
GROUP BY 1
ORDER BY 1
"""

duration_analysis = conn.execute(duration_sweet_spot_query).fetchdf()

print("\n" + "="*80)
print("  OPTIMAL VIDEO DURATION ANALYSIS")
print("="*80)
display(duration_analysis)

# Find sweet spot
sweet_spot = duration_analysis.loc[duration_analysis['completion_rate'].idxmax()]
high_engagement = duration_analysis.loc[duration_analysis['avg_engagement'].idxmax()]

print(f"\n💡 KEY INSIGHTS:")
print(f"   Highest Completion: {sweet_spot['duration_bucket']} ({sweet_spot['completion_rate']:.1f}%)")
print(f"   Highest Engagement: {high_engagement['duration_bucket']} ({high_engagement['avg_engagement']:.1f}%)")

# Compare the sweet spot with the longest duration bucket
longest = duration_analysis.iloc[-1]
completion_diff = sweet_spot['completion_rate'] - longest['completion_rate']

print(f"\n   📌 INTERVIEW TALKING POINT:")
print(f"      'Analysis of {duration_analysis['num_videos'].sum():,.0f} videos showed {sweet_spot['duration_bucket']}")
print(f"       had the highest completion rate at {sweet_spot['completion_rate']:.1f}%,")
print(f"       {completion_diff:.0f} points higher than {longest['duration_bucket']} content'")

# Visualize
fig, axes = plt.subplots(1, 2, figsize=(14, 5))
duration_analysis.plot(x='duration_bucket', y='completion_rate', kind='bar', ax=axes[0], legend=False, color='#3498db')
axes[0].set_title('Completion Rate by Duration', fontweight='bold')
axes[0].set_ylabel('Completion Rate (%)')
axes[0].set_xlabel('Video Duration')
plt.setp(axes[0].xaxis.get_majorticklabels(), rotation=45)

duration_analysis.plot(x='duration_bucket', y='avg_engagement', kind='bar', ax=axes[1], legend=False, color='#e74c3c')
axes[1].set_title('Engagement Score by Duration', fontweight='bold')
axes[1].set_ylabel('Engagement Score (%)')
axes[1].set_xlabel('Video Duration')
plt.setp(axes[1].xaxis.get_majorticklabels(), rotation=45)

plt.tight_layout()
plt.show()

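Taking `idxmax()` over duration buckets treats a bucket with two videos the same as one with two hundred, so the "sweet spot" can be an artifact of a tiny sample. A minimal sketch that only considers buckets above a minimum video count; the threshold of 10, the sample rows, and the helper name `sweet_spot` are illustrative assumptions, not values from the source. With the DataFrame, filtering `duration_analysis[duration_analysis['num_videos'] >= 10]` before `idxmax()` does the same.

```python
def sweet_spot(buckets, metric="completion_rate", min_videos=10):
    """Return the bucket with the highest `metric` among buckets with enough videos.

    `buckets` is a list of dicts with 'duration_bucket', 'num_videos', and `metric`.
    Returns None when no bucket meets `min_videos`.
    """
    eligible = [b for b in buckets if b["num_videos"] >= min_videos]
    if not eligible:
        return None
    return max(eligible, key=lambda b: b[metric])


buckets = [
    {"duration_bucket": "1. 0-1 min", "num_videos": 3, "completion_rate": 90.0},
    {"duration_bucket": "3. 2-3 min", "num_videos": 40, "completion_rate": 72.0},
    {"duration_bucket": "6. 7-10 min", "num_videos": 25, "completion_rate": 51.0},
]
best = sweet_spot(buckets)  # "3. 2-3 min": the 0-1 min bucket has too few videos
```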
---
## Section 6: Mobile Strategy & Device Trends
**Business Question:** Is mobile viewing growing? Should we optimize for mobile?

In [None]:
# Mobile trend analysis
mobile_trend_query = """
SELECT
    DATE_TRUNC('month', date) as month,
    SUM(views_desktop) as desktop_views,
    SUM(views_mobile) as mobile_views,
    SUM(views_tablet) as tablet_views,
    SUM(video_view) as total_views,
    ROUND(SUM(views_mobile) * 100.0 / NULLIF(SUM(video_view), 0), 1) as mobile_percentage
FROM daily_analytics
WHERE video_view > 0
GROUP BY 1
ORDER BY 1
"""

mobile_trends = conn.execute(mobile_trend_query).fetchdf()

print("\n" + "="*80)
print("  MOBILE VIEWING TREND ANALYSIS")
print("="*80)
display(mobile_trends)

if len(mobile_trends) >= 2:
    first_month = mobile_trends.iloc[0]
    last_month = mobile_trends.iloc[-1]
    
    mobile_growth = last_month['mobile_percentage'] - first_month['mobile_percentage']
    mobile_growth_pct = (mobile_growth / first_month['mobile_percentage'] * 100) if first_month['mobile_percentage'] > 0 else 0
    
    print(f"\n💡 KEY INSIGHTS:")
    print(f"   Mobile viewing: {first_month['mobile_percentage']:.1f}% → {last_month['mobile_percentage']:.1f}%")
    print(f"   Change: {mobile_growth:+.1f} percentage points ({mobile_growth_pct:+.0f}%)")
    
    if last_month['mobile_percentage'] > 30:
        print(f"\n   ⚠️  STRATEGIC RECOMMENDATION:")
        print(f"       Mobile now represents {last_month['mobile_percentage']:.1f}% of views")
        print(f"       → Mobile-first production approach recommended")
    
    # Only frame this as growth if mobile share actually increased
    if mobile_growth > 0:
        print(f"\n   📌 INTERVIEW TALKING POINT:")
        print(f"      'Mobile viewing grew from {first_month['mobile_percentage']:.1f}% to {last_month['mobile_percentage']:.1f}% of views,")
        print(f"       justifying investment in mobile optimization (larger text, subtitles, vertical formats)'")

# Visualize trend
fig, ax = plt.subplots(figsize=(12, 5))
ax.plot(mobile_trends['month'].astype(str), mobile_trends['mobile_percentage'], 
        marker='o', linewidth=2, markersize=8, label='Mobile %', color='#e74c3c')
ax.axhline(y=30, color='gray', linestyle='--', alpha=0.5, label='30% threshold')
ax.set_title('Mobile Viewing Trend Over Time', fontweight='bold', fontsize=14)
ax.set_xlabel('Month')
ax.set_ylabel('Mobile Percentage (%)')
ax.legend()
ax.set_ylim(0, max(mobile_trends['mobile_percentage'].max() + 5, 40))
plt.xticks(rotation=45)
plt.tight_layout()
plt.show()

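Comparing only the first and last month is sensitive to a single unusual month, and the latest month in the data is often partial. An ordinary least-squares slope across all months gives the average change per month instead. A minimal sketch in plain Python, assuming the monthly percentages are evenly spaced and in chronological order (for example `mobile_trends['mobile_percentage'].tolist()` with no missing values); `monthly_slope` is a hypothetical helper.

```python
def monthly_slope(percentages):
    """Least-squares slope of a monthly series, in percentage points per month.

    Uses x = 0, 1, 2, ... for consecutive months. Returns None for fewer than 2 points.
    """
    n = len(percentages)
    if n < 2:
        return None
    xs = range(n)
    mean_x = (n - 1) / 2
    mean_y = sum(percentages) / n
    num = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, percentages))
    den = sum((x - mean_x) ** 2 for x in xs)
    return num / den


slope = monthly_slope([20.0, 22.0, 24.0, 26.0])  # 2.0 points per month
```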
---
## Section 7: Content Archive Candidates
**Business Question:** Which content is stale and should be archived or refreshed?

In [None]:
# Stale content analysis (using dt_last_viewed)
stale_content_query = """
SELECT
    channel,
    video_id,
    MAX(name) as video_name,
    MAX(dt_last_viewed) as last_viewed,
    SUM(video_view) as total_lifetime_views,
    MAX(created_at)::DATE as created_date,
    ROUND(MAX(video_duration) / 60.0, 1) as duration_minutes,
    DATE_DIFF('day', MAX(dt_last_viewed)::DATE, CURRENT_DATE) as days_since_viewed
FROM daily_analytics
WHERE dt_last_viewed IS NOT NULL
GROUP BY channel, video_id
HAVING DATE_DIFF('day', MAX(dt_last_viewed)::DATE, CURRENT_DATE) > 180
ORDER BY total_lifetime_views DESC
"""

try:
    stale_content = conn.execute(stale_content_query).fetchdf()
    
    print("\n" + "="*80)
    print("  STALE CONTENT CANDIDATES (Not viewed in 180+ days)")
    print("="*80)
    
    if len(stale_content) > 0:
        display(stale_content.head(20))
        
        total_stale = len(stale_content)
        total_stale_views = stale_content['total_lifetime_views'].sum()
        
        print(f"\n💡 KEY INSIGHTS:")
        print(f"   {total_stale} videos not viewed in 180+ days")
        print(f"   These videos had {total_stale_views:,.0f} lifetime views (once valuable!)")
        
        # Total runtime of stale content (a rough proxy for its storage footprint)
        total_duration_hours = (stale_content['duration_minutes'].sum() / 60)
        print(f"   Total duration: {total_duration_hours:,.1f} hours of content")
        
        print(f"\n   📌 INTERVIEW TALKING POINT:")
        print(f"      'Identified {total_stale} videos not accessed in 6+ months;")
        print(f"       archiving stale content can reduce storage costs and improve search relevance'")
    else:
        print("   ✓ No stale content found - excellent content lifecycle management!")
        
except Exception as e:
    print(f"   ⚠️  Could not analyze stale content: {e}")
    print("      (dt_last_viewed may not be populated in your dataset)")

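The SQL measures staleness against `CURRENT_DATE`, so the stale list changes every day the notebook is rerun. If you want numbers that stay stable while preparing answers, one option is to pin a reference date such as the `latest_date` from Section 0. A minimal sketch of the same "more than 180 days" rule in plain Python; `stale_videos` is a hypothetical helper and the video IDs and dates are made up. Passing the reference date in explicitly is the design choice that makes the result reproducible.

```python
from datetime import date


def stale_videos(last_viewed, as_of, threshold_days=180):
    """Return IDs whose last view is more than `threshold_days` before `as_of`.

    `last_viewed` maps video_id -> date of last view; the result is sorted with
    the longest-idle videos first.
    """
    idle = {vid: (as_of - d).days for vid, d in last_viewed.items()}
    stale = [vid for vid, days in idle.items() if days > threshold_days]
    return sorted(stale, key=lambda vid: idle[vid], reverse=True)


last_viewed = {
    "vid_a": date(2024, 1, 15),
    "vid_b": date(2024, 6, 1),
    "vid_c": date(2023, 11, 1),
}
print(stale_videos(last_viewed, as_of=date(2024, 8, 1)))
```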
---
## Section 8: Content Type Performance (if available)
**Business Question:** Which content types drive the most engagement?

In [None]:
# Content type performance
content_type_query = """
SELECT
    COALESCE(video_content_type, 'Unclassified') as content_type,
    COUNT(DISTINCT video_id) as num_videos,
    SUM(video_view) as total_views,
    ROUND(AVG(engagement_score), 1) as avg_engagement,
    ROUND(AVG(video_engagement_100), 1) as completion_rate,
    ROUND(AVG(video_duration) / 60.0, 1) as avg_duration_min
FROM daily_analytics
WHERE video_view > 0
GROUP BY 1
HAVING SUM(video_view) >= 100
ORDER BY total_views DESC
LIMIT 15
"""

content_types = conn.execute(content_type_query).fetchdf()

print("\n" + "="*80)
print("  CONTENT TYPE PERFORMANCE ANALYSIS")
print("="*80)

if len(content_types) > 1:
    display(content_types)
    
    # Find insights
    top_views = content_types.iloc[0]
    top_engagement = content_types.loc[content_types['avg_engagement'].idxmax()]
    
    print(f"\n💡 KEY INSIGHTS:")
    print(f"   Most Popular: '{top_views['content_type']}' with {top_views['total_views']:,.0f} views")
    print(f"   Most Engaging: '{top_engagement['content_type']}' with {top_engagement['avg_engagement']:.1f}% engagement")
    
    if top_engagement['content_type'] != top_views['content_type']:
        engagement_diff = top_engagement['avg_engagement'] - top_views['avg_engagement']
        print(f"\n   📌 INTERVIEW TALKING POINT:")
        print(f"      '{top_engagement['content_type']} content showed {engagement_diff:.0f} points higher engagement")
        print(f"       than {top_views['content_type']} content, indicating an opportunity to expand this content type'")
else:
    print("   ⚠️  Content type data not available or not classified")
    print("      (This is normal if video_content_type field isn't populated)")

---
## Section 9: Engagement Drop-off Analysis
**Business Question:** Where in videos do viewers drop off?

In [None]:
# Engagement funnel analysis
engagement_funnel_query = """
SELECT
    channel,
    ROUND(AVG(video_engagement_1), 1) as pct_started,
    ROUND(AVG(video_engagement_25), 1) as pct_reached_25,
    ROUND(AVG(video_engagement_50), 1) as pct_reached_50,
    ROUND(AVG(video_engagement_75), 1) as pct_reached_75,
    ROUND(AVG(video_engagement_100), 1) as pct_completed
FROM daily_analytics
WHERE video_view > 0
GROUP BY channel
ORDER BY pct_completed DESC
"""

engagement_funnel = conn.execute(engagement_funnel_query).fetchdf()

print("\n" + "="*80)
print("  ENGAGEMENT DROP-OFF BY CHANNEL")
print("="*80)
display(engagement_funnel)

# Calculate overall drop-off
overall_funnel = conn.execute("""
    SELECT
        ROUND(AVG(video_engagement_1), 1) as started,
        ROUND(AVG(video_engagement_25), 1) as reached_25,
        ROUND(AVG(video_engagement_50), 1) as reached_50,
        ROUND(AVG(video_engagement_75), 1) as reached_75,
        ROUND(AVG(video_engagement_100), 1) as completed
    FROM daily_analytics
    WHERE video_view > 0
""").fetchdf()

if len(overall_funnel) > 0:
    values = overall_funnel.iloc[0].values
    drop_0_25 = values[0] - values[1]
    drop_25_50 = values[1] - values[2]
    drop_50_75 = values[2] - values[3]
    drop_75_100 = values[3] - values[4]
    
    print(f"\n💡 OVERALL ENGAGEMENT FUNNEL:")
    print(f"   Started:      {values[0]:.1f}%")
    print(f"   Reached 25%:  {values[1]:.1f}% (drop: {drop_0_25:.1f} points)")
    print(f"   Reached 50%:  {values[2]:.1f}% (drop: {drop_25_50:.1f} points)")
    print(f"   Reached 75%:  {values[3]:.1f}% (drop: {drop_50_75:.1f} points)")
    print(f"   Completed:    {values[4]:.1f}% (drop: {drop_75_100:.1f} points)")
    
    # Find biggest drop-off
    drops = [drop_0_25, drop_25_50, drop_50_75, drop_75_100]
    stages = ['0-25%', '25-50%', '50-75%', '75-100%']
    biggest_drop_idx = drops.index(max(drops))
    
    print(f"\n   Biggest drop-off: {stages[biggest_drop_idx]} ({max(drops):.1f} percentage points)")
    
    if biggest_drop_idx == 0:
        print(f"\n   📌 INTERVIEW TALKING POINT:")
        print(f"      'Analysis revealed the largest viewer drop-off ({drop_0_25:.1f} percentage points) in the first 25% of videos;")
        print(f"       recommended stronger opening hooks and front-loading key messages'")

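The drop-off figures above are absolute percentage points, which naturally shrink later in a video because fewer viewers remain. Dividing each drop by the share still watching at the start of the stage gives the fraction of *remaining* viewers lost, which can point to a different problem segment. A minimal sketch in plain Python, assuming the five checkpoint averages are passed in order (started, 25%, 50%, 75%, completed); `funnel_losses` is a hypothetical helper and the sample values are illustrative.

```python
STAGES = ["0-25%", "25-50%", "50-75%", "75-100%"]


def funnel_losses(checkpoints):
    """Return (stage, absolute_drop, relative_drop_pct) for each funnel stage.

    `checkpoints` holds five percentages: started, 25%, 50%, 75%, completed.
    relative_drop_pct is None when nobody was left at the start of the stage.
    """
    losses = []
    for stage, before, after in zip(STAGES, checkpoints, checkpoints[1:]):
        drop = before - after
        relative = (drop / before * 100) if before > 0 else None
        losses.append((stage, drop, relative))
    return losses


losses = funnel_losses([100.0, 80.0, 60.0, 50.0, 40.0])
biggest_absolute = max(losses, key=lambda s: s[1])[0]         # "0-25%" (first of the tied 20-point drops)
biggest_relative = max(losses, key=lambda s: s[2] or 0.0)[0]  # "25-50%" (25% of remaining viewers)
```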
---
## Section 10: ROI Summary & Business Value
**Business Question:** What's the overall business value delivered?

In [None]:
# Business value summary
print("\n" + "="*80)
print("  BUSINESS VALUE SUMMARY FOR INTERVIEW")
print("="*80)

# Get key metrics
summary_query = """
SELECT
    COUNT(DISTINCT video_id) as total_videos,
    COUNT(DISTINCT channel) as total_channels,
    SUM(video_view) as total_views,
    ROUND(AVG(engagement_score), 1) as avg_engagement,
    ROUND(AVG(video_engagement_100), 1) as avg_completion,
    SUM(video_seconds_viewed) / 3600.0 as total_watch_hours
FROM daily_analytics
WHERE video_view > 0
"""

summary = conn.execute(summary_query).fetchdf().iloc[0]

print("\n📊 QUANTIFIABLE METRICS:")
print(f"   Total Videos Analyzed: {summary['total_videos']:,.0f}")
print(f"   Total Channels: {summary['total_channels']:.0f}")
print(f"   Total Views: {summary['total_views']:,.0f}")
print(f"   Total Watch Time: {summary['total_watch_hours']:,.0f} hours")
print(f"   Average Engagement: {summary['avg_engagement']:.1f}%")
print(f"   Average Completion: {summary['avg_completion']:.1f}%")

print("\n💡 KEY TALKING POINTS FOR INTERVIEWS:")
print("\n1. SCALE:")
print(f"   'Built analytics infrastructure covering {summary['total_videos']:,.0f} videos across")
print(f"    {summary['total_channels']:.0f} channels, tracking {summary['total_views']:,.0f} views'")

print("\n2. INSIGHTS ENABLED:")
print("   'Transformed fragmented data into actionable insights that informed:")
print("    - Content strategy (optimal duration, content types)")
print("    - Channel rationalization (consolidate underperformers)")
print("    - Device optimization (mobile-first approach)")
print("    - Training effectiveness (completion rate improvement)'")

print("\n3. BUSINESS IMPACT:")
print("   'Data-driven decisions led to:")
print("    - Higher content completion rates (quantified by duration optimization)")
print("    - Reduced operational costs (channel consolidation)")
print("    - Improved mobile engagement (optimization based on trends)")
print("    - Better resource allocation (invest in high-engagement channels)'")

print("\n4. STRATEGIC INFLUENCE:")
print("   'Shifted Internal Communications from intuition-based to evidence-based:")
print("    - Executive reports now include engagement metrics")
print("    - Content producers receive performance feedback")
print("    - Budget decisions tied to channel performance data'")

print("\n" + "="*80)
print("  TIP: Replace the generic claims above with outcomes you can back with your own findings!")
print("="*80)

---
## BONUS: Custom Analysis Section
Add your own queries here to dig deeper into specific patterns.

In [None]:
# Your custom analysis here
# Example: daily history for a specific video (the ID is passed as a query parameter)

# video_id = 'YOUR_VIDEO_ID'
# video_details_query = """
# SELECT *
# FROM daily_analytics
# WHERE video_id = ?
# ORDER BY date
# """

# results = conn.execute(video_details_query, [video_id]).fetchdf()
# display(results)

---
## Cleanup

In [None]:
# Close connection
conn.close()
print("\n✓ Analysis complete! Database connection closed.")
print("\n" + "="*80)
print("  NEXT STEPS")
print("="*80)
print("\n1. Review the insights above and note specific numbers")
print("2. Identify 2-3 strongest business impact examples")
print("3. Craft your STAR responses using actual metrics from your data")
print("4. Practice articulating business value, not technical details")
print("\nGood luck with your interviews! 🚀")