# Build a Call Center Speech Analytics Workflow in Python

Chanin Nantasenamat, PhD

[Data Professor YouTube channel](https://youtube.com/dataprofessor)

> This tutorial demonstrates how to build a simple call center analytics tool in Python.
>
> You'll learn how to transcribe audio, identify speakers, analyze sentiment, and visualize the results for a call recording.

## Setting Up the Environment

Let's start by installing and importing the necessary libraries.


In [None]:
%pip install assemblyai

In [None]:
import assemblyai as aai
import json
import time
import pandas as pd
import matplotlib.pyplot as plt
from IPython.display import display, Audio, Markdown, HTML
import os

## Load AssemblyAI API token

First, let's load the API token. In Google Colab, it is read from the notebook's secrets, where it is stored under the name `AAI_KEY`.

In [None]:
from google.colab import userdata
aai_key = userdata.get('AAI_KEY')

Assign the API token to the AssemblyAI SDK.

In [None]:
aai.settings.api_key = aai_key
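
The `google.colab` module is only available inside Colab. When running the notebook elsewhere, one option is to read the key from an environment variable instead. The following is a minimal sketch under that assumption; the helper name `load_api_key` and the choice of `AAI_KEY` as the variable name are illustrative and not part of the AssemblyAI SDK.

```python
import os


def load_api_key(env_var="AAI_KEY"):
    """Return the API key stored in an environment variable (hypothetical helper)."""
    key = os.environ.get(env_var, "").strip()
    if not key:
        raise RuntimeError(
            f"Set the {env_var} environment variable to your AssemblyAI API key."
        )
    return key
```

With the variable set, `aai_key = load_api_key()` could stand in for the `userdata.get('AAI_KEY')` call.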

## Instantiate the Transcriber

Let's instantiate the `Transcriber` class so that we can transcribe audio into text.

In [None]:
transcriber = aai.Transcriber()

## Audio Selection

You can use a sample URL or upload your own audio file. Here, we provide options for both.

In [None]:
# Option 1: Use a sample call center audio from a URL
audio_input = "https://github.com/dataprofessor/assemblyai/raw/refs/heads/master/call-center-recording.wav"

# Option 2: Use a local file (uncomment and update path)
# audio_input = "./call-center-recording.wav"

In [None]:
# Hear the audio
display(Audio(audio_input))
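
When using the local-file option, it can help to confirm that the path exists before sending it for transcription. The sketch below is one way to do that; `check_audio_input` is a hypothetical helper, and it assumes that anything that is not an `http` or `https` URL is meant to be a local path.

```python
from pathlib import Path
from urllib.parse import urlparse


def check_audio_input(audio_input):
    """Return the input unchanged if it is an http(s) URL or an existing local file."""
    if urlparse(audio_input).scheme in ("http", "https"):
        return audio_input  # Remote file: nothing to check locally
    if not Path(audio_input).is_file():
        raise FileNotFoundError(f"Audio file not found: {audio_input}")
    return audio_input
```

Calling `check_audio_input(audio_input)` before transcription surfaces a mistyped path early.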

## Process the Call Recording


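Let's specify the transcription configuration through the AssemblyAI SDK. Setting `speaker_labels=True` turns on speaker labels for each utterance, and `sentiment_analysis=True` turns on sentiment analysis.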


In [None]:
config = aai.TranscriptionConfig(speaker_labels=True,
                                 sentiment_analysis=True
                                 )

Let's create a `Transcriber` instance and use its `transcribe()` method to transcribe the audio with this configuration.

In [None]:
transcriber = aai.Transcriber()
transcript = transcriber.transcribe(audio_input, config)
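
Transcription can fail, for example when the audio URL is unreachable, so it is worth checking the result before using it. The sketch below assumes the transcript exposes a `status` (a string, or an enum whose value is a string such as `"error"`) and an `error` message. The helper `ensure_completed` is hypothetical, and the stand-in object only mimics those two attributes.

```python
from types import SimpleNamespace


def ensure_completed(transcript):
    """Return the transcript, or raise if its status reports an error."""
    # Assumption: `status` is a plain string or an enum whose value is a string
    status = getattr(transcript.status, "value", transcript.status)
    if status == "error":
        raise RuntimeError(f"Transcription failed: {transcript.error}")
    return transcript


# Stand-in object for illustration; a real transcript comes from transcriber.transcribe()
fake_ok = SimpleNamespace(status="completed", error=None)
ensure_completed(fake_ok)
```

In the notebook, `ensure_completed(transcript)` would be called right after `transcriber.transcribe(...)`.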

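Once transcription finishes, check the audio duration (in seconds) and the number of transcribed words.

In [None]: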
transcript.audio_duration

In [None]:
len(transcript.words)

## Speaker identification

Because speaker labels were enabled, each utterance in the transcript carries a speaker label:

In [None]:
transcript.utterances

In [None]:
text_with_speaker_labels = ""

for utt in transcript.utterances:
    text_with_speaker_labels += f"Speaker {utt.speaker}: {utt.text}\n"

In [None]:
print(text_with_speaker_labels)
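
To make the resulting format concrete, here is a small sketch with stand-in utterances (the sample sentences are made up for illustration). It builds the same kind of speaker-labelled text with `str.join`, which avoids repeated string concatenation. Unlike the loop above, it leaves no trailing newline.

```python
from types import SimpleNamespace

# Stand-in utterances for illustration; real ones come from transcript.utterances
sample_utterances = [
    SimpleNamespace(speaker="A", text="Hello, I need help with my order."),
    SimpleNamespace(speaker="B", text="Of course, let me look that up."),
]

# str.join builds the string in one pass instead of repeated concatenation
labelled_text = "\n".join(
    f"Speaker {utt.speaker}: {utt.text}" for utt in sample_utterances
)
print(labelled_text)
```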

## Infer and count the number of unique speakers

Collect the unique speaker labels, then create a `LemurQuestion` for each one. Lastly, ask LeMUR the questions, passing `text_with_speaker_labels` as `input_text`.

In [None]:
unique_speakers = set(utterance.speaker for utterance in transcript.utterances)

questions = []
for speaker in unique_speakers:
    questions.append(
        aai.LemurQuestion(
            question=f"Who is speaker {speaker}?",
            answer_format="<First Name> <Last Name (if applicable)>"
        )
    )

result = aai.Lemur().question(
    questions,
    input_text=text_with_speaker_labels,
    final_model=aai.LemurModel.claude3_5_sonnet,
    context="Your task is to infer the speaker's name from the speaker-labelled transcript"
)


In [None]:
result.response
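
A Python `set` has no guaranteed iteration order, so the questions may be generated in a different order between runs. The sketch below uses stand-in labels to show how sorting makes the order reproducible and how to obtain the speaker count. It only builds the question strings and does not call LeMUR.

```python
# Stand-in speaker labels for illustration; real ones come from transcript.utterances
labels = ["A", "B", "A", "B", "B"]

unique_labels = sorted(set(labels))  # Sorting makes the order reproducible
print(f"Number of speakers: {len(unique_labels)}")

# One question string per unique speaker label
question_texts = [f"Who is speaker {label}?" for label in unique_labels]
print(question_texts)
```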

## Map speaker labels in transcript

Here, we're doing two things:
- Identifying speakers
  - By default, the transcript assigns generic labels (A and B) to the different speakers.
  - We ask LeMUR to infer who each speaker is. Simply put, the LLM works out the speaker names from how they are mentioned in the transcript.
- Mapping speaker labels in the transcript
  - The generic labels A and B are replaced with the inferred names through a mapping.
  - For this recording: `A = Michael Johnson` and `B = Sarah`

In [None]:
import re

speaker_mapping = {}
pattern = re.compile(r"Who is speaker (\w)\?")  # Captures the single-character speaker label

for qa_response in result.response:
    match = pattern.search(qa_response.question)
    # Keep the first answer returned for each speaker label
    if match and match.group(1) not in speaker_mapping:
        speaker_mapping[match.group(1)] = qa_response.answer

In [None]:
speaker_mapping
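
To see how the regular expression pulls the label out of each question, here is a self-contained sketch. The stand-in objects mimic only the `question` and `answer` attributes used above, and the names are the ones reported in this tutorial.

```python
import re
from types import SimpleNamespace

# Stand-in for result.response; the names are the ones reported in this tutorial
sample_responses = [
    SimpleNamespace(question="Who is speaker A?", answer="Michael Johnson"),
    SimpleNamespace(question="Who is speaker B?", answer="Sarah"),
]

pattern = re.compile(r"Who is speaker (\w)\?")  # (\w) captures one label character

sample_mapping = {}
for qa in sample_responses:
    match = pattern.search(qa.question)
    if match:
        # setdefault keeps the first answer seen for a label
        sample_mapping.setdefault(match.group(1), qa.answer)

print(sample_mapping)
```

Note that `(\w)` matches exactly one character, so labels longer than one character would need `(\w+)` instead.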

Print the transcript with speaker names.

In [None]:
for utterance in transcript.utterances:
    speaker_name = speaker_mapping[utterance.speaker]
    print(f"{speaker_name}: {utterance.text}")

So far, we've only printed the transcript with the mapped speaker names.

Next, we collect the same lines in a list so that the speaker-mapped transcript can be reused.

In [None]:
dialogue_list = []

for utterance in transcript.utterances:
    speaker_name = speaker_mapping[utterance.speaker]
    dialogue_list.append(f"{speaker_name}: {utterance.text}")

dialogue_list
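
Looking up `speaker_mapping[utterance.speaker]` raises a `KeyError` if no name was inferred for a label. A hedged alternative is to fall back to the generic label with `dict.get`, sketched here with stand-in data in which label `C` deliberately has no mapped name.

```python
from types import SimpleNamespace

# Stand-in data for illustration; label "C" deliberately has no mapped name
sample_mapping = {"A": "Michael Johnson", "B": "Sarah"}
sample_utterances = [
    SimpleNamespace(speaker="A", text="Hi, this is Michael."),
    SimpleNamespace(speaker="C", text="Sorry, wrong line."),
]

sample_dialogue = [
    # dict.get falls back to the generic label when no name was inferred
    f"{sample_mapping.get(utt.speaker, f'Speaker {utt.speaker}')}: {utt.text}"
    for utt in sample_utterances
]
print(sample_dialogue)
```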

## Entities visualization

Named entities in text can be visualized using the `displacy` visualizer from the `spacy` library.

First, we'll prepare the text by joining the dialogue into a single string (the dialogue is currently a list, whereas spaCy's pipeline expects a string).

In [None]:
text = '\n'.join(dialogue_list)

In [None]:
text

Next, we run the spaCy pipeline on the text and highlight the named entities it identifies.

In [None]:
# Visualize the entities
# Requires the English model; if it is missing, run: python -m spacy download en_core_web_sm
import spacy
from spacy import displacy

nlp = spacy.load("en_core_web_sm")
doc = nlp(text)
displacy.render(doc, style="ent", jupyter=True)

## Sentiment analysis

Now, let's analyze the sentiment of the transcript. Because we set `sentiment_analysis=True` in the configuration, the results are available through the `sentiment_analysis` attribute of the transcript, that is, `transcript.sentiment_analysis`.

In [None]:
transcript.sentiment_analysis

Let's structure the data into a DataFrame, with one row per analyzed sentence.

In [None]:
# Create a DataFrame of speaker, sentiment, and text
data = []

for index_value, sentiment in enumerate(transcript.sentiment_analysis):
    data.append({
        'speaker': speaker_mapping[sentiment.speaker],  # Apply our speaker mapping
        'sentiment': sentiment.sentiment.value,
        'text': sentiment.text,
        'index': index_value,  # Preserve the order in which the text was spoken
    })

df = pd.DataFrame(data)
df

### Heatmap of Sentiment Analysis 1

Here, we'll count the occurrences of each speaker-sentiment combination and define the heatmap chart.

In [None]:
# Count the occurrences of each speaker-sentiment combination
import altair as alt

heatmap_data = df.groupby(['speaker', 'sentiment']).size().reset_index(name='count')

font_size = 14

# Create the base chart
base = alt.Chart(heatmap_data).encode(
    x=alt.X('speaker', axis=alt.Axis(title='Speaker', titleFontSize=font_size, labelFontSize=font_size)),
    y=alt.Y('sentiment', axis=alt.Axis(title='Sentiment', titleFontSize=font_size, labelFontSize=font_size))
)

# Create the heatmap rectangles
heatmap = base.mark_rect().encode(
    color=alt.Color('count', title='Count', scale=alt.Scale(range='heatmap')),
    tooltip=['speaker', 'sentiment', 'count']
)

# Add the text labels
# Use a plain Python float so the threshold embeds cleanly in the chart expression
threshold = float(heatmap_data['count'].max()) / 2  # Adjust the threshold as needed
labels = base.mark_text(fontSize=font_size, fontWeight='bold').encode(
    text=alt.Text('count'),
    color=alt.condition(
        alt.datum.count > threshold,
        alt.value('white'),
        alt.value('black')
    )
)

# Combine the heatmap and text
chart = (heatmap + labels).properties(
    # title='Sentiment by Speaker',
    width=300,
    height=300
).interactive()

With the chart defined, we can display the heatmap, which shows how often each sentiment occurs for each speaker.

In [None]:
# Display the chart
chart
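
The counting step behind this heatmap can be illustrated without pandas. The sketch below uses made-up stand-in rows and `collections.Counter` to count each speaker-sentiment pair, which mirrors what `groupby(['speaker', 'sentiment']).size()` computes for the real DataFrame. In both cases, combinations that never occur are simply absent.

```python
from collections import Counter

# Stand-in rows for illustration; real rows come from the DataFrame `df`
rows = [
    {"speaker": "Sarah", "sentiment": "POSITIVE"},
    {"speaker": "Sarah", "sentiment": "NEUTRAL"},
    {"speaker": "Michael Johnson", "sentiment": "NEGATIVE"},
    {"speaker": "Sarah", "sentiment": "POSITIVE"},
]

# Count each (speaker, sentiment) pair, as groupby(...).size() does per group
pair_counts = Counter((row["speaker"], row["sentiment"]) for row in rows)

for (speaker, sentiment), count in sorted(pair_counts.items()):
    print(f"{speaker:16} {sentiment:9} {count}")
```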

### Heatmap of Sentiment Analysis 2

Let's now zoom in on the individual sentences and see the sentiment of each one, in the order it was spoken in the transcript.

In [None]:
font_size = 12

# Define the color scale for sentiment
sentiment_colors = {
    'POSITIVE': '#4CAF50',  # Green
    'NEUTRAL': '#9E9E9E',   # Gray
    'NEGATIVE': '#F44336'    # Red
}

# Create the base chart
base = alt.Chart(df).encode(
    x=alt.X('speaker:N', axis=alt.Axis(title='Speaker', titleFontSize=font_size, labelFontSize=font_size)),
    y=alt.Y('index:O', axis=alt.Axis(title=None, labels=False))  # Use 'index' for Y-axis, hide labels
)

# Create the heatmap rectangles with black stroke (border)
heatmap = base.mark_rect(stroke='black').encode(
    color=alt.Color('sentiment:N', scale=alt.Scale(domain=list(sentiment_colors.keys()), range=list(sentiment_colors.values())),
                    legend=alt.Legend(orient='bottom')),  # Move legend to the bottom
    tooltip=['speaker:N', 'sentiment:N', 'text:N']
).properties(
    width=200,  # Reduced width for the heatmap
    height=df.shape[0] * 20  # Adjust height based on the number of rows
)

# Add the text column to the right of the heatmap and hide its y-axis
text_right = alt.Chart(df).mark_text(align='left', baseline='middle', dx=5).encode(
    y=alt.Y('index:O', axis=None),  # Remove y-axis from text
    text=alt.Text('text:N'),
    color=alt.value('black')
).properties(
    width=10,  # Adjust width for the text column
    height=df.shape[0] * 20  # Ensure consistent height
)

# Combine the heatmap and the text
chart = alt.concat(
    heatmap,
    text_right
).properties(
    # title='Call Center Data Visualization',
).configure_axis(
    labelFontSize=font_size,
    titleFontSize=font_size
).configure_view(
    strokeOpacity=0
    #strokeWidth=1,  # Add a border to the entire view
    #stroke='black'  # Make the border black
).interactive()

chart

## References

Here are additional resources to dive deeper into topics mentioned in this tutorial.

- [🔑 Sign up to get free AssemblyAI API token](https://www.assemblyai.com/?utm_source=youtube&utm_medium=influencer&utm_campaign=dataprofessor&utm_content=apr_25)
- [📚 AssemblyAI Documentation](https://www.assemblyai.com/docs/)
- [🙂 Sentiment analysis](https://www.assemblyai.com/docs/audio-intelligence/sentiment-analysis)
- [👥 Speaker identification](https://www.assemblyai.com/docs/guides/speaker-identification)
- [📊 Altair User Guide](https://altair-viz.github.io/user_guide/data.html)