<a href="https://colab.research.google.com/github/csetanmayjain/Customer-Care-Call-Analysis/blob/main/Customer_Care_Call_Analysis.ipynb" target="_parent"><img src="https://colab.research.google.com/assets/colab-badge.svg" alt="Open In Colab"/></a>


**Input:** A stereo audio recording of the call, with one speaker on each channel

**Output:** A summary of the call, along with the following insights:

  1. Sentiment Analysis
  2. Problem Resolution
  3. Important Keyword Detection
  4. Agent Behavior Analysis
  5. Product Enhancement Opportunities
  6. New Product Features

**Approach:**
The pipeline uses the following tools and frameworks:

  1. Silero VAD: For voice activity detection (VAD) and for chunking the audio into speech segments
  2. NVIDIA NeMo: For performing Automatic Speech Recognition (ASR)
  3. LLaMA 3 (accessed through the Groq API): For analyzing the transcription and extracting insights with a large language model (LLM)


In [None]:
## Install vad dependencies
!pip install --quiet pydub
!pip install --quiet silero-vad

## Install ASR dependencies
!pip install --quiet wget
!apt-get install -y sox libsndfile1 ffmpeg
!pip install --quiet text-unidecode

## Install NeMo
BRANCH = 'main'
!python -m pip --quiet install git+https://github.com/NVIDIA/NeMo.git@$BRANCH#egg=nemo_toolkit[asr]

## Install LLM dependencies
!pip install --quiet groq
!pip install --quiet gradio

In [None]:
# Import VAD dependencies
from silero_vad import load_silero_vad, read_audio, get_speech_timestamps
from pydub import AudioSegment

# Import ASR dependencies
import nemo.collections.asr as nemo_asr
from nemo.utils import logging
# Silence NeMo's verbose logging
logging.setLevel(logging.CRITICAL)

# Import LLM dependencies
from groq import Groq

import os
import numpy as np
from huggingface_hub import login

In [None]:
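def load_models(asr_model_path):
    """Load the VAD and ASR models once and return them."""

    # Authenticate with Hugging Face using the token stored in the environment
    login(token=os.environ['hf_token'])

    global vad_model, asr_model

    # Load each model only if it has not been loaded already
    if vad_model is None:
        vad_model = load_silero_vad()

    if asr_model is None:
        asr_model = nemo_asr.models.EncDecCTCModelBPE.restore_from(asr_model_path)

    return vad_model, asr_model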
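
The load-once pattern used by `load_models` can also be written without module-level globals. The sketch below is one possible alternative, not the notebook's method: `_MODEL_CACHE` and `get_or_load` are hypothetical names, and the loader is any zero-argument callable.

```python
# Hypothetical helper: cache models by name so each loader runs at most once.
_MODEL_CACHE = {}


def get_or_load(name, loader):
    """Return the cached model for `name`, calling `loader()` only on first use."""
    if name not in _MODEL_CACHE:
        _MODEL_CACHE[name] = loader()
    return _MODEL_CACHE[name]


# Assumed usage in this notebook (not executed here):
#   vad_model = get_or_load("vad", load_silero_vad)
#   asr_model = get_or_load(
#       "asr",
#       lambda: nemo_asr.models.EncDecCTCModelBPE.restore_from(asr_model_path),
#   )
```

Re-running the cell then reuses the cached objects instead of reloading the models.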

In [None]:
def split_stereo_to_mono(input_file, output_left, output_right):

    # Load the stereo audio file
    audio = AudioSegment.from_file(input_file)

    # Ensure the audio is stereo; fail loudly so callers do not go on to read files that were never written
    if audio.channels != 2:
        raise ValueError(f"Expected stereo audio, got {audio.channels} channel(s): {input_file}")

    # Split into left and right channels
    left_channel, right_channel = audio.split_to_mono()

    # Set sample rate to 16000 Hz and bit depth to 16 bits
    left_channel = left_channel.set_frame_rate(16000).set_sample_width(2)
    right_channel = right_channel.set_frame_rate(16000).set_sample_width(2)

    # Export the left and right channels as separate mono files
    left_channel.export(output_left, format="wav")
    right_channel.export(output_right, format="wav")
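
To make the channel split concrete: in a 16-bit stereo WAV file the samples are interleaved as left, right, left, right, and so on. The sketch below de-interleaves them using only the standard library. It illustrates the idea and is not how pydub implements `split_to_mono`; `split_stereo_pcm16` and `make_stereo_wav` are hypothetical helper names.

```python
import array
import io
import sys
import wave


def make_stereo_wav(left, right, rate=16000):
    """Build an in-memory 16-bit stereo WAV from two equal-length sample lists."""
    interleaved = array.array("h", [s for pair in zip(left, right) for s in pair])
    if sys.byteorder == "big":
        interleaved.byteswap()  # WAV data is little-endian
    buf = io.BytesIO()
    with wave.open(buf, "wb") as wav:
        wav.setnchannels(2)
        wav.setsampwidth(2)
        wav.setframerate(rate)
        wav.writeframes(interleaved.tobytes())
    return buf.getvalue()


def split_stereo_pcm16(wav_bytes):
    """Return (left, right) sample lists from a 16-bit stereo WAV byte string."""
    with wave.open(io.BytesIO(wav_bytes), "rb") as wav:
        if wav.getnchannels() != 2 or wav.getsampwidth() != 2:
            raise ValueError("expected 16-bit stereo audio")
        frames = wav.readframes(wav.getnframes())
    samples = array.array("h")
    samples.frombytes(frames)
    if sys.byteorder == "big":
        samples.byteswap()
    # Even positions hold the left channel, odd positions the right channel
    return samples[0::2].tolist(), samples[1::2].tolist()


left, right = split_stereo_pcm16(make_stereo_wav([1, 2, 3], [-1, -2, -3]))
print(left, right)
```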

In [None]:
def get_chunks_timestamps(audio_file_path):
    split_stereo_to_mono(audio_file_path, "left_channel.wav", "right_channel.wav")

    speaker0 = read_audio('left_channel.wav') # backend (sox, soundfile, or ffmpeg) required!
    speaker0_speech_timestamps = get_speech_timestamps(speaker0, vad_model)

    speaker1 = read_audio('right_channel.wav') # backend (sox, soundfile, or ffmpeg) required!
    speaker1_speech_timestamps = get_speech_timestamps(speaker1, vad_model)

    # Combine both lists with an additional key to store their origin
    combined = [{'start': item['start'], 'end': item['end'], 'origin': 'speaker0'} for item in speaker0_speech_timestamps] + \
               [{'start': item['start'], 'end': item['end'], 'origin': 'speaker1'} for item in speaker1_speech_timestamps]

    # Sort the combined list based on the start time
    sorted_combined = sorted(combined, key=lambda x: x['start'])

    # Slice each segment out of the channel it came from (start/end are sample indices)
    channels = {'speaker0': speaker0, 'speaker1': speaker1}
    audio_chunks = [channels[seg['origin']][seg['start']:seg['end']] for seg in sorted_combined]

    os.remove("left_channel.wav")
    os.remove("right_channel.wav")

    return sorted_combined, audio_chunks
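
The merge step in `get_chunks_timestamps` can be tried on plain dictionaries, without any audio. The sketch below assumes segments shaped like the ones the function builds (`start` and `end` as sample offsets); `merge_speaker_timelines` is a hypothetical name, and the sample values are invented for illustration.

```python
def merge_speaker_timelines(per_speaker):
    """Tag each segment with its speaker and sort all segments by start offset."""
    merged = [
        {"start": seg["start"], "end": seg["end"], "origin": speaker}
        for speaker, segments in per_speaker.items()
        for seg in segments
    ]
    return sorted(merged, key=lambda seg: seg["start"])


# Illustrative offsets only (in samples); not taken from a real call
timeline = merge_speaker_timelines({
    "speaker0": [{"start": 0, "end": 16000}, {"start": 48000, "end": 64000}],
    "speaker1": [{"start": 20000, "end": 40000}],
})
print([seg["origin"] for seg in timeline])
```

Because the sort key is the start offset alone, overlapping speech from the two channels stays in start order rather than being merged into a single turn.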

In [None]:
# ASR

In [None]:
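def get_asr(audio_chunks, timestamping):
    """Transcribe each chunk and prefix every line with the speaker it came from."""

    # Perform inference
    transcriptions = asr_model.transcribe(audio=audio_chunks, batch_size=8)

    lines = []
    for segment, transcription in zip(timestamping, transcriptions):
        # Some NeMo versions return hypothesis objects with a .text attribute instead of plain strings
        transcript_text = getattr(transcription, "text", transcription)
        lines.append(f"{segment['origin']}: {transcript_text}")

    # One "speaker: text" line per chunk, each terminated by a newline
    return "".join(line + "\n" for line in lines)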
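
If the transcript handed to the LLM should also carry timing, the sample offsets from the VAD step can be converted to seconds. This is a minimal sketch under two assumptions: the offsets are sample indices, and the audio is at 16 kHz, as exported by `split_stereo_to_mono`. `format_transcript` and `SAMPLE_RATE` are hypothetical names, and the example texts are invented.

```python
SAMPLE_RATE = 16000  # assumed: matches the 16 kHz export in split_stereo_to_mono


def format_transcript(timestamping, texts, sample_rate=SAMPLE_RATE):
    """Render one '[start-end] speaker: text' line per transcribed chunk."""
    lines = []
    for segment, text in zip(timestamping, texts):
        start_s = segment["start"] / sample_rate
        end_s = segment["end"] / sample_rate
        lines.append(f"[{start_s:.2f}-{end_s:.2f}s] {segment['origin']}: {text}")
    return "\n".join(lines)


print(format_transcript(
    [{"start": 0, "end": 16000, "origin": "speaker0"},
     {"start": 24000, "end": 40000, "origin": "speaker1"}],
    ["hello", "hi there"],
))
```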

In [None]:
# LLM

In [None]:
def get_analysis(text):

    # Groq() reads the API key from the GROQ_API_KEY environment variable
    client = Groq()
    completion = client.chat.completions.create(
        model="llama3-8b-8192",
        messages=[
            {
                "role": "system",
                "content": "Give the output in english only.\nWrite a summary of the chunk of text that includes the main points and any important details.\nAdditionally extract the following six insights from the conversation:\n1. Sentiment Analysis: What is the customer's sentiment in one word from the following Satisfaction, Anger, Frustration, Resolution, Escalation.\n2. Problem Resolution: Was the issue resolved? (In-Progress/ Resolved/ Unresolved/ Unclear)\n3. Keyword Detection: List important keyword or phrase spoken if any\n4. Agent Behavior: Evaluate the agent's performance and professionalism up to 2 words only.\n5. Enhancement Opportunities: Suggest ways to improve service or operations.\n6. New Product Features: Identify any new feature requests or suggestions that can be incorporated in the existing product."
            },
            {
                "role": "user",
                "content": text
            }
        ],
        temperature=0.6,
        max_tokens=1024,
        top_p=1,
        stream=True,
        stop=None,
    )

    # Accumulate the streamed response; a chunk's delta.content may be None, hence the `or ""`
    output = ""
    for chunk in completion:
        output += chunk.choices[0].delta.content or ""

    return output
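
The system prompt in `get_analysis` is a single long string, which makes it awkward to add or reorder insights. One possible way to keep it maintainable is to assemble it from a list, as sketched below. `INSIGHTS` and `build_system_prompt` are hypothetical names; the instruction texts are copied from the prompt above, except that the count is written as a digit derived from the list length.

```python
# Hypothetical structure: (insight name, instruction) pairs taken from the prompt above
INSIGHTS = [
    ("Sentiment Analysis", "What is the customer's sentiment in one word from the following Satisfaction, Anger, Frustration, Resolution, Escalation."),
    ("Problem Resolution", "Was the issue resolved? (In-Progress/ Resolved/ Unresolved/ Unclear)"),
    ("Keyword Detection", "List important keyword or phrase spoken if any"),
    ("Agent Behavior", "Evaluate the agent's performance and professionalism up to 2 words only."),
    ("Enhancement Opportunities", "Suggest ways to improve service or operations."),
    ("New Product Features", "Identify any new feature requests or suggestions that can be incorporated in the existing product."),
]


def build_system_prompt(insights=INSIGHTS):
    """Assemble the system prompt: fixed header followed by one numbered line per insight."""
    header = (
        "Give the output in english only.\n"
        "Write a summary of the chunk of text that includes the main points and any important details.\n"
        f"Additionally extract the following {len(insights)} insights from the conversation:"
    )
    numbered = [
        f"{i}. {name}: {instruction}"
        for i, (name, instruction) in enumerate(insights, start=1)
    ]
    return "\n".join([header, *numbered])


print(build_system_prompt())
```

The returned string could then be passed as the `content` of the system message.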


In [None]:
# Driver function

In [None]:
# Module-level handles for the models; load_models() fills them in on first use
vad_model = None
asr_model = None

In [None]:
def driver_code(input_audio_path):

    print("Getting Speaker Timestamp")
    timestamping, audio_chunks = get_chunks_timestamps(input_audio_path)

    print("Getting Transcription")
    text = get_asr(audio_chunks, timestamping)

    print("Getting Analysis")
    analysis = get_analysis(text)

    return analysis

In [None]:
# Set up environment variables for the Hugging Face token and the Groq API key
os.environ['hf_token'] = ""  # Insert your Hugging Face token here
os.environ['GROQ_API_KEY'] = ""  # Insert your Groq API key here

# Define the path for the NeMo ASR model to be used
asr_model_path = ""  # Specify the path where your NeMo ASR model is stored

# Check if the VAD (Voice Activity Detection) and ASR models are already loaded
if vad_model is None or asr_model is None:
    # If either VAD or ASR models are not loaded, load them from the specified path
    vad_model, asr_model = load_models(asr_model_path)

# Provide the path for the input audio file that needs to be processed
input_audio_path = ""  # Specify the path to the input audio file

# Perform analysis on the audio using the loaded models
analysis = driver_code(input_audio_path)  # Call the driver function to analyze the audio


In [None]:
print(analysis)

**Summary:**

The customer is upset about the delayed delivery of their order, which was supposed to be delivered two days ago. The agent apologizes and confirms that the order is being processed. The customer asks to schedule a delivery time and the agent agrees to send a delivery person between 10:00 am to 1:00 pm the next day. The customer requests that the delivery person calls them before arriving and that they will be available at home after 6:00 pm. The agent confirms the details and assures the customer that the delivery will be made.

**Insights:**

1. **Sentiment Analysis:** Frustration
2. **Problem Resolution:** Resolved
3. **Keyword Detection:** Delivery, Order Number, Schedule, Security
4. **Agent Behavior:** Professional and courteous
5. **Enhancement Opportunities:** Improve communication with customers about delivery schedules and provide a clear timeline for delivery.
6. **New Product Features:** None mentioned
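
When the analysis needs to feed a dashboard or a spreadsheet, the numbered insight lines can be pulled into a dictionary. The sketch below assumes the model keeps the `N. **Name:** value` layout seen in the output above, which an LLM does not guarantee; `INSIGHT_LINE` and `parse_insights` are hypothetical names.

```python
import re

# Matches lines such as "1. **Sentiment Analysis:** Frustration" (bold markers optional)
INSIGHT_LINE = re.compile(
    r"^\s*\d+\.\s*\*{0,2}(?P<name>[^:*\n]+):\*{0,2}\s*(?P<value>.*)$",
    re.MULTILINE,
)


def parse_insights(analysis_text):
    """Map each numbered insight name to its value; lines in other formats are skipped."""
    return {
        match.group("name").strip(): match.group("value").strip()
        for match in INSIGHT_LINE.finditer(analysis_text)
    }


example = "1. **Sentiment Analysis:** Frustration\n2. **Problem Resolution:** Resolved\n"
print(parse_insights(example))
```

A missing key is then a cheap signal that the model drifted from the expected layout and the raw text should be inspected.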
