# Call Center Audio Transcription with Snowflake Cortex AI

This notebook demonstrates Snowflake's native audio transcription capabilities using the **AI_TRANSCRIBE** function. We'll process real call center audio files and convert them to text for further AI-powered analysis.

## What We'll Accomplish:
* **Audio File Processing**: Load and process English MP3 call center recordings
* **AI-Powered Transcription**: Convert speech to text using Snowflake's built-in AI_TRANSCRIBE function
* **Data Foundation**: Create structured tables for downstream AI analytics

## Business Value:
* **No Complex Setup**: Native Snowflake function - no external services required
* **Enterprise Ready**: Secure, scalable audio processing within your data warehouse
* **Foundation for Insights**: Transcribed text becomes the foundation for advanced AI analytics

**Competing with GONG**: Showcasing Snowflake's integrated approach to call analytics


## Setup Session and Context


In [None]:
# Import python packages
import streamlit as st
import pandas as pd

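# Wildcard imports from snowflake.snowpark.functions shadow Python built-ins such as
# sum, min, and max, so import the modules under aliases instead
import snowflake.snowpark.functions as F
import snowflake.snowpark.types as T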

# Get active Snowflake session
from snowflake.snowpark.context import get_active_session
session = get_active_session()

# Set context
session.sql("USE DATABASE CALL_CENTER_ANALYTICS").collect()
session.sql("USE SCHEMA AUDIO_PROCESSING").collect()
session.sql("USE WAREHOUSE AUDIO_CORTEX_WH").collect()

st.write("✅ Session setup complete - Ready for audio transcription!")


## Explore Available Audio Files
Let's examine the call center audio files we'll be processing.


In [None]:
-- List available English MP3 audio files
SELECT 
    RELATIVE_PATH AS filename,
    SIZE AS file_size_bytes,
    ROUND(SIZE/1024/1024, 2) AS file_size_mb,
    LAST_MODIFIED
FROM DIRECTORY('@CALL_CENTER_AUDIO_FILES')
WHERE RELATIVE_PATH LIKE '%.mp3'
ORDER BY RELATIVE_PATH;

## Audio File Preview
Let's listen to our call center recordings to understand what we're working with.


In [None]:
# Get list of audio files and create interactive player
files_df = session.sql("""
    SELECT 
        RELATIVE_PATH,
        GET_PRESIGNED_URL('@CALL_CENTER_AUDIO_FILES', RELATIVE_PATH) AS URL
    FROM DIRECTORY('@CALL_CENTER_AUDIO_FILES')
    WHERE RELATIVE_PATH LIKE '%.mp3'
    ORDER BY RELATIVE_PATH
""").to_pandas()

if not files_df.empty:
    selected_file = st.selectbox('🎧 Select Call Recording to Listen:', files_df['RELATIVE_PATH'])
    
    if selected_file:
        url = files_df[files_df['RELATIVE_PATH'] == selected_file]['URL'].iloc[0]
        st.audio(url, format="audio/mpeg")
        st.write(f"**Playing**: {selected_file}")
else:
    st.error("No MP3 files found. Please check the setup.")


## Audio Transcription with AI_TRANSCRIBE

Now for the core step! We'll use Snowflake's native **AI_TRANSCRIBE** function to convert our audio files to text. The function takes a FILE value (built with TO_FILE from a stage path) and returns a JSON object; the transcript itself is in that object's `text` field.


In [None]:
-- Create a table to store FILE objects for transcription
CREATE OR REPLACE TABLE AUDIO_FILES_FOR_TRANSCRIPTION AS
SELECT 
    RELATIVE_PATH AS filename,
    TO_FILE('@CALL_CENTER_AUDIO_FILES', RELATIVE_PATH) AS audio_file
FROM DIRECTORY('@CALL_CENTER_AUDIO_FILES')
WHERE RELATIVE_PATH LIKE '%.mp3';

-- Sort at query time; row order inside a table is not guaranteed
SELECT * FROM AUDIO_FILES_FOR_TRANSCRIPTION ORDER BY filename;


In [None]:
-- Perform AI transcription on our audio files
CREATE OR REPLACE TABLE CALL_TRANSCRIPTS AS
SELECT 
    filename AS audio_file_name,
    AI_TRANSCRIBE(audio_file):text::STRING AS transcript_text,
    CURRENT_TIMESTAMP() AS processing_timestamp
FROM AUDIO_FILES_FOR_TRANSCRIPTION;

-- View transcription results
SELECT * FROM CALL_TRANSCRIPTS;
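
The `:text` path in the query above picks a single field out of the JSON value that AI_TRANSCRIBE returns. As a rough illustration of the same extraction outside SQL, the Python sketch below parses a made-up result string. The sample value and the helper name `extract_transcript` are assumptions for illustration only, not real function output or part of any Snowflake API.

```python
import json

# Hypothetical result string shaped like the JSON that the SQL above reads with :text.
# The transcript value is invented for illustration.
sample_result = '{"text": "Thank you for calling, how can I help you today?"}'


def extract_transcript(raw_json):
    """Return the transcript text, or None when the input or the field is missing."""
    if raw_json is None:
        return None
    payload = json.loads(raw_json)
    return payload.get("text")


transcript = extract_transcript(sample_result)
print(transcript)
```

Returning `None` for a missing field mirrors what the SQL path expression does: a key that is absent yields NULL rather than an error.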

## Transcription Results Display

Let's create an interactive display of our transcription results with some basic analysis.


In [None]:
# Display transcription results with analysis
transcripts_df = session.table('CALL_TRANSCRIPTS').to_pandas()

st.markdown("### 🎯 Transcription Results")

if not transcripts_df.empty:
    # Display metrics
    col1, col2, col3 = st.columns(3)
    
    with col1:
        st.metric("Files Processed", len(transcripts_df))
    
    with col2:
        avg_length = transcripts_df['TRANSCRIPT_TEXT'].str.len().mean()
        st.metric("Avg Transcript Length", f"{avg_length:.0f} chars")
    
    with col3:
        # Cast to int so the metric shows a whole number even if pandas returns a float
        total_words = int(transcripts_df['TRANSCRIPT_TEXT'].str.split().str.len().sum())
        st.metric("Total Words", f"{total_words:,}")
    
    # Display individual transcripts
    st.markdown("### 📝 Individual Call Transcripts")
    
    for idx, row in transcripts_df.iterrows():
        # Treat a missing transcript as empty so len() and split() do not fail on NULL values
        transcript = row['TRANSCRIPT_TEXT'] if isinstance(row['TRANSCRIPT_TEXT'], str) else ""
        with st.expander(f"📞 {row['AUDIO_FILE_NAME']}", expanded=False):
            st.write(f"**Processed At**: {row['PROCESSING_TIMESTAMP']}")
            st.write(f"**Transcript Length**: {len(transcript)} characters")
            st.write(f"**Word Count**: {len(transcript.split())} words")
            st.text_area("Full Transcript:", transcript, height=200, key=f"transcript_{idx}")
else:
    st.error("No transcripts found. Please check the transcription process.")


In [None]:
-- Apply basic AI functions to our audio transcripts
CREATE OR REPLACE TABLE AUDIO_AI_ANALYSIS AS
WITH scored AS (
    -- Call each AI function once per transcript; the outer query reuses the results
    SELECT 
        audio_file_name,
        transcript_text,
        processing_timestamp,
        -- AI Summarization
        SNOWFLAKE.CORTEX.SUMMARIZE(transcript_text) AS call_summary,
        -- Sentiment Analysis (score ranges from -1 to 1)
        SNOWFLAKE.CORTEX.SENTIMENT(transcript_text) AS sentiment_score,
        -- Basic Classification
        AI_CLASSIFY(
            transcript_text, 
            ['Insurance Inquiry', 'Technical Support', 'Complaint', 'Sales Call', 'General Information']
        ):labels[0]::STRING AS call_classification
    FROM CALL_TRANSCRIPTS
)
SELECT 
    audio_file_name,
    transcript_text,
    processing_timestamp,
    call_summary,
    sentiment_score,
    -- Derive the category from the stored score instead of calling SENTIMENT again
    CASE 
        WHEN sentiment_score >= 0.1 THEN '😊 Positive'
        WHEN sentiment_score <= -0.1 THEN '😞 Negative'
        ELSE '😐 Neutral'
    END AS sentiment_category,
    call_classification
FROM scored;

-- Display AI analysis results
SELECT 
    audio_file_name,
    sentiment_category,
    sentiment_score,
    call_classification,
    call_summary
FROM AUDIO_AI_ANALYSIS;
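
The sentiment category above comes from two fixed thresholds on the sentiment score. A minimal Python sketch of that bucketing, useful for checking boundary values such as exactly 0.1, might look like the following. The function name `sentiment_category` and the `threshold` parameter are hypothetical; only the cutoffs and labels are taken from the CASE expression.

```python
def sentiment_category(score, threshold=0.1):
    """Bucket a sentiment score using the same cutoffs as the SQL CASE expression."""
    # A missing score falls through to the ELSE branch in SQL, so treat it as neutral.
    if score is None:
        return "😐 Neutral"
    if score >= threshold:
        return "😊 Positive"
    if score <= -threshold:
        return "😞 Negative"
    return "😐 Neutral"


# Boundary values are inclusive on both sides, matching >= and <= in the query.
for value in (0.5, 0.1, 0.05, -0.1, -0.7, None):
    print(value, "->", sentiment_category(value))
```

The neutral band between the two cutoffs keeps weakly scored calls from being labeled positive or negative; widening `threshold` makes that band larger.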

In [None]:
-- Add basic priority filtering to audio transcripts
CREATE OR REPLACE TABLE AUDIO_PRIORITY_ANALYSIS AS
WITH flagged AS (
    -- Run each AI_FILTER check once per call; the outer query reuses the boolean results
    SELECT 
        *,
        -- Check for frustration or dissatisfaction
        AI_FILTER(PROMPT('Does this call indicate customer frustration or dissatisfaction? {0}', transcript_text)) AS shows_frustration,
        -- Check for urgent requests
        AI_FILTER(PROMPT('Does this call contain urgent requests or time-sensitive issues? {0}', transcript_text)) AS urgent_request
    FROM AUDIO_AI_ANALYSIS
)
SELECT 
    *,
    -- Priority level assignment
    CASE 
        WHEN shows_frustration AND sentiment_score <= -0.1 THEN '🔴 HIGH PRIORITY'
        WHEN urgent_request THEN '🟡 MEDIUM PRIORITY'
        ELSE '🟢 STANDARD'
    END AS priority_level
FROM flagged;

-- Display priority analysis
SELECT 
    audio_file_name,
    call_classification,
    sentiment_category,
    shows_frustration,
    urgent_request,
    priority_level,
    call_summary
FROM AUDIO_PRIORITY_ANALYSIS
ORDER BY 
    CASE priority_level 
        WHEN '🔴 HIGH PRIORITY' THEN 1
        WHEN '🟡 MEDIUM PRIORITY' THEN 2
        ELSE 3
    END;
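
The priority rules above are evaluated in order: frustration combined with negative sentiment wins first, then urgency, then everything else. The Python sketch below restates that ordering and the sort used in the final query so the rule precedence is easy to test. The function names and the three sample rows are hypothetical and invented for illustration; they are not output from the notebook.

```python
HIGH = "🔴 HIGH PRIORITY"
MEDIUM = "🟡 MEDIUM PRIORITY"
STANDARD = "🟢 STANDARD"


def priority_level(shows_frustration, urgent_request, sentiment_score):
    """Apply the same rule order as the SQL CASE expression."""
    negative = sentiment_score is not None and sentiment_score <= -0.1
    if shows_frustration and negative:
        return HIGH
    if urgent_request:
        return MEDIUM
    return STANDARD


def sort_rank(level):
    """Mirror the ORDER BY: high first, medium second, anything else last."""
    return {HIGH: 1, MEDIUM: 2}.get(level, 3)


# Hypothetical rows, not real transcription results.
calls = [
    {"file": "call_a.mp3", "frustrated": False, "urgent": False, "score": 0.4},
    {"file": "call_b.mp3", "frustrated": True, "urgent": True, "score": -0.6},
    {"file": "call_c.mp3", "frustrated": True, "urgent": True, "score": 0.0},
]

for call in calls:
    call["priority"] = priority_level(call["frustrated"], call["urgent"], call["score"])

ordered = sorted(calls, key=lambda call: sort_rank(call["priority"]))
for call in ordered:
    print(call["file"], call["priority"])
```

Note that a frustrated caller with a neutral score (`call_c.mp3` here) is not high priority under these rules; it lands in the medium tier only because it is also flagged as urgent.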
