# Advanced Call Center Analytics with AI_TRANSCRIBE

This notebook combines Snowflake's AI_TRANSCRIBE function with other Cortex AI functions to extract insights from call center audio files. We will:

- **Transcribe** audio files using the new AI_TRANSCRIBE function
- **Analyze** call patterns and trends using AI_COMPLETE
- **Extract** key metrics and insights using structured prompts
- **Classify** calls by intent, urgency, and satisfaction
- **Discover** anomalies and opportunities for improvement
- **Generate** actionable recommendations for call center optimization

## Key Features of AI_TRANSCRIBE
- High-quality transcription using latest AI models
- Support for 40+ languages
- Maximum file size: 700 MB
- Maximum duration: 90 minutes
- Supports .mp3 and .wav formats


In [None]:
# Import required packages
import pandas as pd
import json
import numpy as np
import time
from datetime import datetime, timedelta
import streamlit as st

from snowflake.snowpark import Session, DataFrame
from snowflake.snowpark import functions as F
from snowflake.snowpark import types as T
from snowflake.snowpark.context import get_active_session  # needed for get_active_session() below
from snowflake.snowpark.version import VERSION

# Get active session
session = get_active_session()
session.use_role("call_center_analytics_role")
session.use_schema("analytics")

# Add query tag for monitoring
session.query_tag = {"origin":"sf_sit", "name":"call_center_analytics_2", "version":{"major":1, "minor":0}, "attributes":{"is_quickstart":1, "source":"notebook"}}

print("❄️ Snowflake Session Details:")
print(f"Role: {session.get_current_role()}")
print(f"Warehouse: {session.get_current_warehouse()}")
print(f"Database.Schema: {session.get_fully_qualified_current_schema()}")
print(f"Snowpark Version: {VERSION}")


In [None]:
-- List files in the stage
LIST @audio_files/;

## Transcribing Audio Files with AI_TRANSCRIBE

Now we'll use AI_TRANSCRIBE to convert our audio files into text. We'll create a table to store FILE objects and process them in batch.


In [None]:
-- Create table with FILE objects for batch transcription
CREATE OR REPLACE TABLE audio_files_for_transcription AS
SELECT 
    RELATIVE_PATH as file_path,
    TO_FILE('@audio_files', RELATIVE_PATH) as audio_file,
    SIZE as file_size_bytes,
    LAST_MODIFIED as upload_time,
    SPLIT_PART(RELATIVE_PATH, '.', -1) as file_extension,
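    -- Strip either supported extension so .wav files also get a clean call_id
    REGEXP_REPLACE(RELATIVE_PATH, '\\.(mp3|wav)$', '', 1, 0, 'i') as call_id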
FROM DIRECTORY('@audio_files')
WHERE RELATIVE_PATH ILIKE '%.mp3' OR RELATIVE_PATH ILIKE '%.wav';
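
The file filter and `call_id` derivation above can be sanity-checked offline before any audio is processed. The sketch below is a plain-Python approximation, not part of the notebook's pipeline: the helper names and the `.WAV` and `.txt` sample paths are hypothetical, and it assumes the intended `call_id` is the file name with its audio extension removed.

```python
import re

# Hypothetical helpers approximating the SQL filter and call_id derivation.
AUDIO_EXT = re.compile(r"\.(mp3|wav)$", re.IGNORECASE)


def is_supported_audio(relative_path: str) -> bool:
    """True when the path ends in .mp3 or .wav, case-insensitively (like ILIKE)."""
    return AUDIO_EXT.search(relative_path) is not None


def derive_call_id(relative_path: str) -> str:
    """Strip a trailing .mp3/.wav extension; other paths are returned unchanged."""
    return AUDIO_EXT.sub("", relative_path)


# Sample paths: the first matches the file used later in this notebook,
# the other two are made up for illustration.
paths = ["CALL_20250728_10050.mp3", "CALL_20250728_10051.WAV", "notes.txt"]
call_ids = [derive_call_id(p) for p in paths if is_supported_audio(p)]
print(call_ids)
```

Running a check like this against a listing of the stage makes it easy to spot file names that would produce an unexpected `call_id`.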


In [None]:
-- Perform AI transcription and save results
-- 🎙️ Starting AI transcription process...
CREATE OR REPLACE TABLE ai_transcribed_calls AS
SELECT 
    call_id,
    file_path,
    file_size_bytes,
    upload_time,
    AI_TRANSCRIBE(audio_file) as transcription_result,
    transcription_result:text::STRING as transcript_text,
    CURRENT_TIMESTAMP() as transcription_timestamp,
    LENGTH(transcription_result:text::STRING) as transcript_length,
    ARRAY_SIZE(SPLIT(transcription_result:text::STRING, ' ')) as word_count,
    CASE 
        WHEN transcription_result:text IS NULL THEN 'FAILED'
        WHEN LENGTH(transcription_result:text::STRING) < 10 THEN 'SHORT'
        ELSE 'SUCCESS'
    END as transcription_status
FROM audio_files_for_transcription
ORDER BY file_size_bytes ASC;  -- Sorts the stored rows by file size; it does not control the order in which files are transcribed
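
The status and word-count rules in the query above are simple enough to mirror in plain Python, which can help when unit-testing thresholds. This is a sketch only: the function names are hypothetical, and it assumes the transcription result is a JSON object with a `text` key, as the SQL path `transcription_result:text` suggests.

```python
from typing import Optional


def transcription_status(result: Optional[dict]) -> str:
    """Mirror the SQL CASE: FAILED when text is missing, SHORT under 10 characters."""
    text = (result or {}).get("text")
    if text is None:
        return "FAILED"
    if len(text) < 10:
        return "SHORT"
    return "SUCCESS"


def word_count(text: str) -> int:
    """Count space-separated pieces, like ARRAY_SIZE(SPLIT(text, ' '))."""
    return len(text.split(" "))


# Made-up results for illustration.
statuses = [
    transcription_status(None),
    transcription_status({"text": "Hello."}),
    transcription_status({"text": "Thank you for calling support."}),
]
print(statuses, word_count("Thank you for calling support."))
```

Note that splitting on a single space counts empty pieces between consecutive spaces, so both the SQL and this sketch can overcount words in transcripts with irregular spacing.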


In [None]:
# let's play an example audio file before looking at the transcript
# with Snowflake notebooks, you can use Streamlit components directly
stage_path = "@call_center_analytics_db.analytics.audio_files/CALL_20250728_10050.mp3"

try:
    # Read the audio file from the internal stage as bytes
    with session.file.get_stream(stage_path) as f:
        audio_bytes = f.read()

    # Use st.audio to play the MP3
    # Specify the format as "audio/mpeg" for MP3s
    st.audio(audio_bytes, format="audio/mpeg", start_time=0)

    st.success(f"Successfully loaded and playing: {stage_path}")

except Exception as e:
    st.error(f"Error loading or playing audio: {e}")
    st.info("Please ensure the audio file exists on the stage and the stage path is correct.")

In [None]:
-- View example transcript
SELECT
    transcript_text
FROM ai_transcribed_calls
WHERE call_id = 'CALL_20250728_10050';

In [None]:
-- View transcription results summary
SELECT 
    transcription_status,
    COUNT(*) as call_count,
    AVG(word_count) as avg_word_count,
    AVG(transcript_length) as avg_transcript_length
FROM ai_transcribed_calls
GROUP BY transcription_status
ORDER BY call_count DESC;


## Advanced Analytics with Cortex Functions

Now we'll use various Cortex AI functions to extract meaningful insights from our transcriptions.


In [None]:
-- 🔍 Performing advanced AI analysis on transcriptions...
-- Create comprehensive structured analysis using AI_COMPLETE
CREATE OR REPLACE TABLE comprehensive_call_analysis AS
SELECT 
    call_id,
    transcript_text,
    word_count,
    
    -- Sentiment Analysis
    SNOWFLAKE.CORTEX.SENTIMENT(transcript_text) as sentiment_score,
    -- Reuse the sentiment_score alias instead of calling SENTIMENT again
    CASE 
        WHEN sentiment_score > 0.1 THEN 'POSITIVE'
        WHEN sentiment_score < -0.1 THEN 'NEGATIVE'
        ELSE 'NEUTRAL'
    END as sentiment_category,
    
    -- Call Summary
    SNOWFLAKE.CORTEX.SUMMARIZE(transcript_text) as call_summary,
    
    -- Advanced structured extraction using AI_COMPLETE with JSON response format
    AI_COMPLETE(
        model => 'claude-4-sonnet',
        prompt => 'Analyze this call center conversation and extract structured information. Call transcript: ' || transcript_text,
        model_parameters => {'temperature': 0.1, 'max_tokens': 2048},
        response_format => {
            'type': 'json',
            'schema': {
                'type': 'object',
                'properties': {
                    'call_type': {'type': 'string', 'enum': ['inbound', 'outbound', 'transfer']},
                    'customer_name': {'type': 'string'},
                    'agent_name': {'type': 'string'},
                    'primary_intent': {'type': 'string', 'enum': ['billing', 'technical_support', 'complaint', 'information', 'sales', 'cancellation', 'other']},
                    'urgency_level': {'type': 'string', 'enum': ['low', 'medium', 'high', 'critical']},
                    'issue_resolved': {'type': 'string', 'enum': ['yes', 'no', 'partial']},
                    'escalation_required': {'type': 'string', 'enum': ['yes', 'no']},
                    'customer_satisfaction': {'type': 'string', 'enum': ['satisfied', 'neutral', 'dissatisfied']},
                    'call_duration_estimate': {'type': 'string', 'enum': ['short', 'medium', 'long']},
                    'key_issues': {'type': 'array', 'items': {'type': 'string'}},
                    'action_items': {'type': 'array', 'items': {'type': 'string'}},
                    'policy_numbers': {'type': 'array', 'items': {'type': 'string'}},
                    'monetary_amounts': {'type': 'array', 'items': {'type': 'string'}},
                    'appointment_scheduled': {'type': 'string', 'enum': ['yes', 'no']},
                    'callback_requested': {'type': 'string', 'enum': ['yes', 'no']}
                },
                'required': ['call_type', 'customer_name', 'agent_name', 'primary_intent', 'urgency_level', 'issue_resolved', 'escalation_required', 'customer_satisfaction']
            }
        }
    ) as call_analysis,
    
    -- Quality scoring with AI_COMPLETE
    TRY_CAST(
        AI_COMPLETE(
            model => 'claude-4-sonnet',
            prompt => 'Rate this call center conversation on a scale of 1-10 for agent performance considering: professionalism, problem-solving, communication clarity, and customer service. Provide only the numeric score (no text). If you cannot determine a score, return null and nothing else: ' || transcript_text,
            model_parameters => {'temperature': 0, 'max_tokens': 10}
       )::VARCHAR AS NUMBER(3,1)
    ) as agent_performance_score,
    
    -- Identify improvement opportunities using AI_COMPLETE
    AI_COMPLETE(
        model => 'claude-4-sonnet',
        prompt => 'List 3 specific improvement opportunities for this call center conversation in bullet points: ' || transcript_text,
        model_parameters => {'temperature': 0.3, 'max_tokens': 500}
    ) as improvement_opportunities,

    CURRENT_TIMESTAMP() as analysis_timestamp
    
FROM ai_transcribed_calls
WHERE transcription_status = 'SUCCESS'
AND transcript_text IS NOT NULL
AND LENGTH(transcript_text) > 50;  -- Filter out very short transcripts
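
Two small rules in the query above are easy to get subtly wrong: the sentiment bucketing and the conversion of the model's free-text score into a number. The sketch below approximates both in plain Python. The function names are hypothetical, and `parse_score` only approximates `TRY_CAST(... AS NUMBER(3,1))`; it is not a faithful reimplementation of Snowflake's casting rules.

```python
from decimal import Decimal, InvalidOperation, ROUND_HALF_UP
from typing import Optional


def sentiment_category(score: float) -> str:
    """Bucket a sentiment score using the same +/-0.1 thresholds as the SQL CASE."""
    if score > 0.1:
        return "POSITIVE"
    if score < -0.1:
        return "NEGATIVE"
    return "NEUTRAL"


def parse_score(raw: str) -> Optional[Decimal]:
    """Approximate TRY_CAST to NUMBER(3,1): None when raw is not a usable number."""
    try:
        value = Decimal(raw.strip())
        if not value.is_finite():  # rejects NaN and Infinity
            return None
        rounded = value.quantize(Decimal("0.1"), rounding=ROUND_HALF_UP)
    except InvalidOperation:
        return None
    # NUMBER(3,1) holds at most two digits before the decimal point.
    return rounded if abs(rounded) < 100 else None


print(sentiment_category(0.05), parse_score("8"), parse_score("null"))
```

The point of the second helper is that a reply such as `null` or `Score: 8` becomes a missing value rather than an error, which is why the averages computed later should be read alongside the count of non-null scores.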


In [None]:
-- preview the data
SELECT * FROM comprehensive_call_analysis;
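
When previewing the results, it can be worth checking that each `call_analysis` value actually follows the schema that was requested. The following is a minimal standard-library sketch, assuming the value is available as a JSON string; the function name is hypothetical, and only the required fields and their enums from the schema above are checked.

```python
import json
from typing import List

# Required fields and allowed values, copied from the response_format schema above.
REQUIRED = [
    "call_type", "customer_name", "agent_name", "primary_intent",
    "urgency_level", "issue_resolved", "escalation_required",
    "customer_satisfaction",
]
ENUMS = {
    "call_type": ("inbound", "outbound", "transfer"),
    "primary_intent": ("billing", "technical_support", "complaint",
                       "information", "sales", "cancellation", "other"),
    "urgency_level": ("low", "medium", "high", "critical"),
    "issue_resolved": ("yes", "no", "partial"),
    "escalation_required": ("yes", "no"),
    "customer_satisfaction": ("satisfied", "neutral", "dissatisfied"),
}


def validate_call_analysis(payload: str) -> List[str]:
    """Return a list of problems; an empty list means the payload passes."""
    try:
        data = json.loads(payload)
    except json.JSONDecodeError as exc:
        return [f"invalid JSON: {exc.msg}"]
    if not isinstance(data, dict):
        return ["top-level value is not an object"]
    problems = [f"missing required field: {k}" for k in REQUIRED if k not in data]
    for key, allowed in ENUMS.items():
        if key in data and data[key] not in allowed:
            problems.append(f"unexpected value for {key}: {data[key]!r}")
    return problems


# Made-up payload for illustration: agent_name is missing, urgency is off-schema.
sample = json.dumps({
    "call_type": "inbound", "customer_name": "Jane Doe",
    "primary_intent": "billing", "urgency_level": "urgent",
    "issue_resolved": "yes", "escalation_required": "no",
    "customer_satisfaction": "satisfied",
})
print(validate_call_analysis(sample))
```

A check like this is most useful on a sample of rows before the extracted fields are used in aggregate reports.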

In [None]:
-- Extract JSON fields for easier querying
ALTER TABLE comprehensive_call_analysis 
ADD COLUMN 
    call_type STRING,
    customer_name STRING,
    agent_name STRING,
    primary_intent STRING,
    urgency_level STRING,
    issue_resolved STRING,
    escalation_required STRING,
    customer_satisfaction STRING;


UPDATE comprehensive_call_analysis
SET
    call_type = call_analysis:call_type::STRING,
    customer_name = call_analysis:customer_name::STRING,
    agent_name = call_analysis:agent_name::STRING,
    primary_intent = call_analysis:primary_intent::STRING,
    urgency_level = call_analysis:urgency_level::STRING,
    issue_resolved = call_analysis:issue_resolved::STRING,
    escalation_required = call_analysis:escalation_required::STRING,
    customer_satisfaction = call_analysis:customer_satisfaction::STRING;

In [None]:
-- 📊 Analysis Summary
SELECT 
    COUNT(*) as total_calls,
    ROUND(AVG(sentiment_score), 3) as avg_sentiment,
    ROUND(AVG(agent_performance_score), 1) as avg_agent_score,
    COUNT(DISTINCT agent_name) as unique_agents,
    COUNT(DISTINCT primary_intent) as unique_intents
FROM comprehensive_call_analysis;


## Discovering Key Insights and Patterns

Let's analyze the data to uncover interesting patterns and actionable insights.


In [None]:
-- 🏆 Agent Performance Analysis
SELECT 
    agent_name,
    COUNT(*) as total_calls,
    ROUND(AVG(sentiment_score), 3) as avg_sentiment,
    ROUND(AVG(agent_performance_score), 1) as avg_performance_score,
    
    -- Resolution effectiveness
    SUM(CASE WHEN issue_resolved = 'yes' THEN 1 ELSE 0 END) as resolved_calls,
    ROUND(SUM(CASE WHEN issue_resolved = 'yes' THEN 1 ELSE 0 END) / COUNT(*) * 100, 1) as resolution_rate,
    
    -- Customer satisfaction
    SUM(CASE WHEN customer_satisfaction = 'satisfied' THEN 1 ELSE 0 END) as satisfied_customers,
    ROUND(SUM(CASE WHEN customer_satisfaction = 'satisfied' THEN 1 ELSE 0 END) / COUNT(*) * 100, 1) as satisfaction_rate,
    
    -- Escalation patterns
    SUM(CASE WHEN escalation_required = 'yes' THEN 1 ELSE 0 END) as escalations,
    ROUND(SUM(CASE WHEN escalation_required = 'yes' THEN 1 ELSE 0 END) / COUNT(*) * 100, 1) as escalation_rate
    
FROM comprehensive_call_analysis
WHERE agent_name != 'Not Available' AND agent_name IS NOT NULL
GROUP BY agent_name
ORDER BY avg_performance_score DESC;
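
The per-agent rates above are conditional counts divided by the total number of calls. The sketch below reproduces that arithmetic in plain Python on a few made-up rows, assuming rows are dictionaries keyed by the column names used in the query; the function name and the agent names are hypothetical.

```python
from collections import defaultdict
from typing import Dict, Iterable


def agent_rates(rows: Iterable[dict]) -> Dict[str, dict]:
    """Compute per-agent call counts and percentage rates, skipping unknown agents."""
    counts = defaultdict(lambda: {"calls": 0, "resolved": 0, "escalated": 0})
    for row in rows:
        name = row.get("agent_name")
        if name is None or name == "Not Available":
            continue  # same filter as the WHERE clause
        c = counts[name]
        c["calls"] += 1
        c["resolved"] += int(row.get("issue_resolved") == "yes")
        c["escalated"] += int(row.get("escalation_required") == "yes")
    # Python's round() may differ from SQL ROUND on exact .x5 ties.
    return {
        name: {
            "total_calls": c["calls"],
            "resolution_rate": round(100 * c["resolved"] / c["calls"], 1),
            "escalation_rate": round(100 * c["escalated"] / c["calls"], 1),
        }
        for name, c in counts.items()
    }


sample_rows = [
    {"agent_name": "Alice", "issue_resolved": "yes", "escalation_required": "no"},
    {"agent_name": "Alice", "issue_resolved": "yes", "escalation_required": "yes"},
    {"agent_name": "Alice", "issue_resolved": "partial", "escalation_required": "no"},
    {"agent_name": "Bob", "issue_resolved": "yes", "escalation_required": "no"},
    {"agent_name": "Not Available", "issue_resolved": "no", "escalation_required": "yes"},
]
rates = agent_rates(sample_rows)
print(rates)
```

Because `partial` resolutions count as unresolved here, just as in the SQL, an agent with many partial outcomes will show a lower resolution rate than a looser definition would give.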


In [None]:
-- 📊 Call Pattern Analysis
WITH call_patterns AS (
    SELECT 
        primary_intent,
        urgency_level,
        COUNT(*) as call_count,
        ROUND(AVG(sentiment_score), 3) as avg_sentiment,
        ROUND(AVG(agent_performance_score), 1) as avg_agent_score,
        
        -- Resolution patterns
        ROUND(SUM(CASE WHEN issue_resolved = 'yes' THEN 1 ELSE 0 END) / COUNT(*) * 100, 1) as resolution_rate,
        
        -- Satisfaction patterns
        ROUND(SUM(CASE WHEN customer_satisfaction = 'satisfied' THEN 1 ELSE 0 END) / COUNT(*) * 100, 1) as satisfaction_rate,
        
        -- Escalation patterns
        ROUND(SUM(CASE WHEN escalation_required = 'yes' THEN 1 ELSE 0 END) / COUNT(*) * 100, 1) as escalation_rate
        
    FROM comprehensive_call_analysis
    WHERE primary_intent IS NOT NULL AND primary_intent != 'Not Available'
    GROUP BY primary_intent, urgency_level
)
SELECT 
    primary_intent,
    urgency_level,
    call_count,
    avg_sentiment,
    avg_agent_score,
    resolution_rate || '%' as resolution_rate_pct,
    satisfaction_rate || '%' as satisfaction_rate_pct,
    escalation_rate || '%' as escalation_rate_pct,
    
    -- Performance flags
    CASE 
        WHEN resolution_rate < 70 THEN '⚠️ Low Resolution'
        WHEN satisfaction_rate < 60 THEN '⚠️ Low Satisfaction'
        WHEN escalation_rate > 30 THEN '⚠️ High Escalation'
        ELSE '✅ Good Performance'
    END as performance_flag
    
FROM call_patterns
ORDER BY call_count DESC;
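
One property of the `performance_flag` CASE expression is that only the first matching condition is reported, so a segment with both low resolution and high escalation is labelled with low resolution only. The sketch below mirrors that ordering in Python and, as a hypothetical alternative that is not part of the notebook, shows a variant that returns every breached threshold.

```python
from typing import List

# (label, predicate) pairs in the same order as the SQL CASE branches.
RULES = [
    ("⚠️ Low Resolution", lambda res, sat, esc: res < 70),
    ("⚠️ Low Satisfaction", lambda res, sat, esc: sat < 60),
    ("⚠️ High Escalation", lambda res, sat, esc: esc > 30),
]


def performance_flag(resolution: float, satisfaction: float, escalation: float) -> str:
    """Return the first matching flag, like the SQL CASE expression."""
    for label, breached in RULES:
        if breached(resolution, satisfaction, escalation):
            return label
    return "✅ Good Performance"


def all_flags(resolution: float, satisfaction: float, escalation: float) -> List[str]:
    """Hypothetical variant: return every breached threshold, not just the first."""
    return [label for label, breached in RULES
            if breached(resolution, satisfaction, escalation)]


# Made-up rates for illustration.
print(performance_flag(65, 50, 40))
print(all_flags(65, 50, 40))
```

If segments often breach several thresholds at once, reporting all of them may give a more complete picture than the single flag.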


## Conclusion and Next Steps

This notebook has demonstrated the powerful capabilities of AI_TRANSCRIBE combined with Snowflake's Cortex AI functions to:

✅ **Automatically transcribe** call center audio files with high accuracy  
✅ **Extract structured insights** from unstructured conversation data  
✅ **Identify performance patterns** and improvement opportunities  
✅ **Detect anomalies** and quality issues automatically  
✅ **Generate actionable recommendations** using AI analysis  
✅ **Create comprehensive reports** for management decision-making  

### Key Advantages of AI_TRANSCRIBE:
- **No infrastructure management** - serverless transcription
- **Multi-language support** - 40+ languages supported
- **High accuracy** - latest AI models for transcription
- **Scalable processing** - handle large volumes of audio files
- **Integrated analytics** - seamless combination with other Cortex functions

### Recommended Next Steps:
1. **View the Streamlit application** for natural language analysis using Cortex Agents
2. **Upload your own recordings** to the stage to apply this to your use case
3. **Set up automated pipelines** for real-time call analysis