# Getting Started with 🎤 Audio Sentiment Analysis using Snowflake Notebooks 📊

Transform audio files into actionable insights by analyzing emotional tone and sentiment using Snowflake Notebooks! ⚡️

This notebook demonstrates how to build an end-to-end application that:
1. Processes audio files using PyTorch and Hugging Face pipelines
2. Extracts emotional tone and transcripts from audio
3. Performs sentiment analysis on transcribed text
4. Compares emotional tone with sentiment scores

## Setting Up Your Environment 🎒

First, install the required packages:
- `torch`: For deep learning and neural network operations
- `librosa`: For loading and manipulating audio files
- `transformers`: For accessing pre-trained models

In [None]:
!pip install librosa transformers torch
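
If the install step fails silently, the later imports will raise errors that are harder to trace. As an optional check, the sketch below reports which of the three packages cannot be found on the import path. The helper name `missing_packages` is an illustrative assumption and is not part of the original notebook.

```python
import importlib.util


def missing_packages(names):
    """Return the package names that cannot be found on the current import path."""
    return [name for name in names if importlib.util.find_spec(name) is None]


# The three packages this notebook installs
required = ["torch", "librosa", "transformers"]
missing = missing_packages(required)
print("Missing packages:", missing or "none")
```

An empty result means all three packages can be located; it does not verify that their versions are compatible with each other.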

## Configuring the Environment 🔧

Import the required packages, get the active Snowflake session, and select the compute device:

In [None]:
import torch
import librosa
import pandas as pd

from transformers import pipeline
from snowflake.snowpark.context import get_active_session
from snowflake.cortex import Sentiment 

session = get_active_session()
stage_name = 'audio_files'
device = "cuda" if torch.cuda.is_available() else "cpu"

## Processing Audio Files 🎧

The main processing cell:
1. Loads audio files from a Snowflake stage
2. Analyzes emotional tone using a wav2vec2 model
3. Transcribes audio using a Whisper model
4. Performs sentiment analysis on the transcripts
5. Compares emotional tone with sentiment scores

Key components:
- Audio classification pipeline for detecting emotional tone, using `ehcalabres/wav2vec2-lg-xlsr-en-speech-emotion-recognition`
- Speech-to-text using Whisper (`openai/whisper-base`)
- Sentiment analysis using Snowflake Cortex (`Sentiment`)
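
The tone selection and the comparison in step 5 can be sketched as two standalone helpers, which makes the matching rule easy to test without loading any model. This is a minimal sketch: the helper names `top_emotion` and `tone_sentiment_match` and the sample predictions are illustrative assumptions. It assumes the classifier returns a list of dictionaries with `label` and `score` keys and that negative sentiment scores indicate negative sentiment.

```python
def top_emotion(predictions):
    """Pick the highest-scoring entry from a list of {'label', 'score'} dicts."""
    return max(predictions, key=lambda p: p["score"])


def tone_sentiment_match(label, sentiment_score):
    """Compare an emotion label with a sentiment score.

    Only 'angry' and 'happy' are compared; every other label is 'Unknown'.
    """
    if label == "angry":
        return "Match" if sentiment_score < 0 else "Do Not Match"
    if label == "happy":
        return "Match" if sentiment_score > 0 else "Do Not Match"
    return "Unknown"


# Hypothetical classifier output, for illustration only
sample = [{"label": "happy", "score": 0.21}, {"label": "angry", "score": 0.64}]
best = top_emotion(sample)
print(best["label"], tone_sentiment_match(best["label"], -0.4))  # prints: angry Match
```

Note that a sentiment score of exactly 0 counts as "Do Not Match" for both labels under this rule.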

In [None]:
def set_seed(seed):
    torch.manual_seed(seed)
    torch.cuda.manual_seed(seed)
    torch.cuda.manual_seed_all(seed)  # for multi-GPU
    torch.backends.cudnn.deterministic = True
    torch.backends.cudnn.benchmark = False

# Fix the random seed so the quickstart produces consistent output
set_seed(1280)

# Collect one result dictionary per audio file
results = []

# Initialize both pipelines
audio_pipeline = pipeline("audio-classification", model="ehcalabres/wav2vec2-lg-xlsr-en-speech-emotion-recognition", device=device)
whisper_pipeline = pipeline("automatic-speech-recognition", model="openai/whisper-base", device=device)

files = session.sql(f"LIST @{stage_name}").collect()
file_names = [file['name'].split('/')[-1] for file in files]

for file_name in file_names:
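    # Download the file from the stage, then load it as 16 kHz mono audio,
    # the sampling rate both pipelines assume for raw arrays
    session.file.get(f'@{stage_name}/{file_name}', "downloads/")
    audio, rate = librosa.load(f'downloads/{file_name}', sr=16000, mono=True)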
    
    # Get emotional tone
    result = audio_pipeline(audio)
    emotion = sorted(result, key=lambda x: x['score'], reverse=True)[0]
    
    # Get transcript and sentiment
    transcript = whisper_pipeline(audio)
    sentiment_score = Sentiment(transcript['text'])
    
    # Determine if emotion and sentiment match
    match = "Unknown"
    if emotion['label'] == "angry":
        match = "Match" if sentiment_score < 0 else "Do Not Match"
    elif emotion['label'] == "happy":
        match = "Match" if sentiment_score > 0 else "Do Not Match"
    
    # Store results in dictionary
    results.append({
        'File': file_name,
        'Emotion': emotion['label'],
        'Emotion_Score': round(emotion['score'], 3),
        'Transcript': transcript['text'],
        'Sentiment_Score': sentiment_score,
        'Tone_Sentiment_Match': match
    })

# Create DataFrame
df = pd.DataFrame(results)

# Display the DataFrame
print(df)
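
Once the results are collected, a quick tally shows how often tone and sentiment agree. The sketch below works on a list of dictionaries shaped like `results`. The helper name `summarize_matches` and the sample rows are illustrative assumptions, not output from the notebook.

```python
from collections import Counter


def summarize_matches(rows):
    """Count each Tone_Sentiment_Match value and compute the match rate.

    The rate ignores 'Unknown' rows and is None when nothing was compared.
    """
    counts = Counter(row["Tone_Sentiment_Match"] for row in rows)
    compared = counts["Match"] + counts["Do Not Match"]
    rate = counts["Match"] / compared if compared else None
    return counts, rate


# Hypothetical rows, for illustration only
sample_rows = [
    {"File": "a.wav", "Tone_Sentiment_Match": "Match"},
    {"File": "b.wav", "Tone_Sentiment_Match": "Do Not Match"},
    {"File": "c.wav", "Tone_Sentiment_Match": "Unknown"},
    {"File": "d.wav", "Tone_Sentiment_Match": "Match"},
]
counts, rate = summarize_matches(sample_rows)
print(dict(counts), rate)
```

With the notebook's own data, the same helper could be called as `summarize_matches(results)`.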
