# Contact Center Insights

This notebook demonstrates extracting valuable insights for contact centers using NVIDIA Riva and NVIDIA NIM microservices. 

It uses NVIDIA's Parakeet CTC 1.1b ASR model to transcribe audio conversations between two speakers. The NVIDIA NIM for Llama 3.3 70B then processes the transcripts to extract key entities and evaluate agent performance, producing insights that can be used to improve contact center operations.

Here is an architecture diagram of the workflow:

![Contact Center Insights Architecture Diagram](./Architecture_Diagram.png)

## Contact Center Insights generation involves two steps:

1. **Transcription with Speaker Diarization**
   - **NVIDIA Riva Integration:** Transcribes incoming audio calls between two speakers using NVIDIA Riva's Parakeet CTC 1.1b ASR model and creates a structured transcript.

2. **Insight Generation**
   - **Entity Extraction:** Extracts key entities such as the customer and agent names and the topic and subtopic of the conversation.
   - **Agent Performance Evaluation:** Evaluates agent performance based on several key metrics.
   - **Combine Insights:** Combines all extracted insights into a structured JSON object.
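
The final "Combine Insights" step could be sketched as below. This is a minimal illustration only: the function name `combine_insights`, the top-level keys, and the example field names and values are assumptions made here, not a schema defined by this notebook.

```python
import json


def combine_insights(transcript: str, entities: dict, performance: dict) -> str:
    """Merge the outputs of the earlier steps into one JSON document."""
    insights = {
        "transcript": transcript,
        "entities": entities,
        "agent_performance": performance,
    }
    return json.dumps(insights, indent=2)


# Hypothetical values, for illustration only.
example = combine_insights(
    transcript="Speaker 1: Hello, how can I help?\nSpeaker 2: My order is late.",
    entities={"agent_name": "Alex", "customer_name": "Sam", "topic": "orders"},
    performance={"empathy": 4, "resolution": 5},
)
print(example)
```

Keeping the three parts under separate top-level keys means each step can be regenerated independently without touching the others.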

## Content Overview
1. [Install dependencies](#Install-dependencies)
2. [Set required environment variables](#Set-required-environment-variables)
3. [Transcribe Audio](#Transcribe-Audio)
4. [Generate Insights](#Generate-Insights)

# 1. Install dependencies

In [None]:
%pip install -r requirements.txt

# 2. Set required environment variables

In [None]:
import getpass
import os
from dotenv import load_dotenv
from io import BytesIO
from pydub import AudioSegment
import riva.client

load_dotenv()

# validate we have the required variables
REQUIRED_VARIABLES = [
    "NVIDIA_PARAKEET_NIM_API_KEY",
    "NVIDIA_LLAMA_NIM_API_KEY",
]

for var in REQUIRED_VARIABLES:
    if var not in os.environ:
        # Prompt for the value without echoing it to the notebook output
        os.environ[var] = getpass.getpass(f"Enter a value for {var}: ")

# Use the first WAV file in the audio folder (sorted so the choice is deterministic)
audio_files = sorted(f for f in os.listdir("audio") if f.lower().endswith(".wav"))
if not audio_files:
    raise FileNotFoundError("No WAV files found in the audio folder.")

AUDIO_FILE = audio_files[0]
print(f"Using audio file: {AUDIO_FILE}")

# 3. Transcribe Audio

In [2]:
def prepare_audio_file(filename) -> BytesIO:
    """Return the audio file as a BytesIO object and convert it to mono channel."""
    
    audio_segment = AudioSegment.from_file(f"audio/{filename}", format="wav")
    audio_segment = audio_segment.set_channels(1)

    result_audio_bytes = BytesIO()
    audio_segment.export(result_audio_bytes, format="wav")

    return result_audio_bytes

In [4]:
def transcribe_with_riva(audio_bytes):
    NVIDIA_PARAKEET_NIM_API_KEY = os.environ["NVIDIA_PARAKEET_NIM_API_KEY"]

    # Initialize Riva ASR client
    auth = riva.client.Auth(
        uri="grpc.nvcf.nvidia.com",  # Replace with your Riva server address
        use_ssl=True,
        metadata_args=[
            ["authorization", "Bearer " + NVIDIA_PARAKEET_NIM_API_KEY],
            ["function-id", "1598d209-5e27-4d3c-8079-4751568b1081"],
        ],
    )

    asr_client = riva.client.ASRService(auth)

    # Recognition configuration
    config = riva.client.RecognitionConfig(
        language_code="en-US",
        enable_word_time_offsets=True,  # Enables word timestamps
        max_alternatives=1,  # Set to 1 for single-best result
    )

    # Transcribe the audio file
    response = asr_client.offline_recognize(audio_bytes, config)

    # Return the raw response; word timestamps are included in the recognition results
    return response

In [5]:
audio_bytes = prepare_audio_file(AUDIO_FILE)
riva_response = transcribe_with_riva(audio_bytes.getvalue())

In [None]:
# TODO: Convert the response to a dialog-like format
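
One possible way to approach this step is to group consecutive words by speaker into dialog turns. The sketch below assumes that speaker diarization has been enabled on the recognition request (the configuration above does not set it) and that each word object exposes `word`, `start_time`, `end_time`, and `speaker_tag`. The helper names `words_to_dialog` and `format_dialog` are hypothetical, and `SimpleNamespace` objects stand in for real Riva words so the example runs on its own.

```python
from types import SimpleNamespace


def words_to_dialog(words):
    """Group consecutive words with the same speaker tag into dialog turns."""
    turns = []
    for w in words:
        if turns and turns[-1]["speaker"] == w.speaker_tag:
            # Same speaker as the previous word: extend the current turn
            turns[-1]["text"] += " " + w.word
            turns[-1]["end"] = w.end_time
        else:
            # Speaker changed (or first word): start a new turn
            turns.append(
                {
                    "speaker": w.speaker_tag,
                    "start": w.start_time,
                    "end": w.end_time,
                    "text": w.word,
                }
            )
    return turns


def format_dialog(turns):
    """Render turns as one 'Speaker N: text' line per turn."""
    return "\n".join(f"Speaker {t['speaker']}: {t['text']}" for t in turns)


# Stand-in words for illustration; times are made-up millisecond offsets.
sample_words = [
    SimpleNamespace(word="Hello", start_time=0, end_time=400, speaker_tag=1),
    SimpleNamespace(word="there", start_time=400, end_time=800, speaker_tag=1),
    SimpleNamespace(word="Hi", start_time=900, end_time=1100, speaker_tag=2),
]
print(format_dialog(words_to_dialog(sample_words)))
```

With a real response, the word list would likely be collected from the first alternative of each entry in `riva_response.results`; check the response object before relying on that layout.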

# 4. Generate Insights

In [None]:
# TODO: Set up the NVIDIA LangChain integration with access to the Llama model
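
A minimal sketch of this setup follows. It assumes the `langchain-nvidia-ai-endpoints` package is installed (it may or may not be listed in `requirements.txt`) and that the model is published under the identifier `meta/llama-3.3-70b-instruct`; both should be verified. The helper names `llm_kwargs` and `build_llm` and the sampling settings are choices made for this illustration.

```python
import os

# Assumed model identifier in the NVIDIA API catalog; verify before use.
LLAMA_MODEL = "meta/llama-3.3-70b-instruct"


def llm_kwargs(env=os.environ) -> dict:
    """Collect the keyword arguments that will be passed to ChatNVIDIA."""
    return {
        "model": LLAMA_MODEL,
        "api_key": env["NVIDIA_LLAMA_NIM_API_KEY"],
        "temperature": 0.0,  # Deterministic output suits extraction tasks
        "max_tokens": 1024,
    }


def build_llm():
    """Create the chat model; the import is deferred until it is needed."""
    from langchain_nvidia_ai_endpoints import ChatNVIDIA

    return ChatNVIDIA(**llm_kwargs())
```

Calling `llm = build_llm()` and then `llm.invoke("...")` would send a prompt to the hosted model, assuming the API key set earlier in the notebook is valid.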

In [None]:
# TODO: Prompt Llama 3.3 70B to generate entities - Agent name, Customer name, Topic
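
Entity extraction could be sketched as a prompt builder plus a tolerant parser for the model's reply. The key names in `ENTITY_KEYS`, the prompt wording, and both helper functions are assumptions made for illustration; the call to the model itself is left out so the example runs without network access.

```python
import json

# Hypothetical output keys; adjust to the schema you want.
ENTITY_KEYS = ("agent_name", "customer_name", "topic", "subtopic")


def build_entity_prompt(dialog: str) -> str:
    """Build an instruction asking the model for a single JSON object."""
    return (
        "Extract the following fields from the contact center transcript below "
        "and reply with a single JSON object with exactly these keys: "
        + ", ".join(ENTITY_KEYS)
        + ". Use null for any field that is not stated.\n\n"
        + "Transcript:\n"
        + dialog
    )


def parse_entities(reply: str) -> dict:
    """Parse the model reply, tolerating extra text around the JSON object."""
    start, end = reply.find("{"), reply.rfind("}")
    if start == -1 or end <= start:
        raise ValueError("No JSON object found in model reply")
    data = json.loads(reply[start : end + 1])
    # Keep only the expected keys; missing ones become None
    return {key: data.get(key) for key in ENTITY_KEYS}
```

The parser slices from the first `{` to the last `}` because chat models sometimes wrap JSON in explanatory text, even when asked not to.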

In [None]:
# TODO: Prompt Llama 3.3 70B to generate agent performance metrics
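
For agent performance, one option is to ask the model for integer scores on a fixed scale and validate the reply before using it. The metric names in `METRICS`, the 1 to 5 scale, and the helper names are hypothetical; the notebook does not specify which metrics are evaluated.

```python
import json

# Hypothetical metrics, for illustration only.
METRICS = ("greeting", "empathy", "resolution", "professionalism")


def build_performance_prompt(dialog: str, low: int = 1, high: int = 5) -> str:
    """Build an instruction asking for one integer score per metric."""
    return (
        f"Rate the agent in the transcript below from {low} to {high} on each of: "
        + ", ".join(METRICS)
        + ". Reply with only a JSON object mapping each metric to an integer.\n\n"
        + "Transcript:\n"
        + dialog
    )


def validate_scores(reply: str, low: int = 1, high: int = 5) -> dict:
    """Parse the reply and check every metric is an in-range integer."""
    data = json.loads(reply)
    scores = {}
    for metric in METRICS:
        value = data.get(metric)
        if (
            not isinstance(value, int)
            or isinstance(value, bool)
            or not low <= value <= high
        ):
            raise ValueError(f"Invalid score for {metric!r}: {value!r}")
        scores[metric] = value
    return scores
```

Rejecting out-of-range or missing scores early keeps malformed model output from reaching the combined insights.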