# ITI108 Assignment 1 - Shayman/9442307k

## **Assignment Overview**

This program implements an AI-based solution for auditing customer service calls. It combines
speech recognition (using Whisper) with language model analysis (using Gemini) to evaluate
customer service quality across multiple criteria.

The solution consists of several key components:
1. Audio transcription using Whisper
2. Word Error Rate (WER) calculation using jiwer
3. Quality assessment using Gemini
4. Comprehensive report generation

Author: Shayman/9442307k

# **Base Code**

## 1: Installing Libraries

The solution requires several specialized libraries:
- openai-whisper: Advanced speech recognition model
- jiwer: Industry-standard WER calculation
- transformers: NLP model support
- torch: Deep learning framework (required by Whisper)
- accelerate: Performance optimization
- sentencepiece: Text tokenization
- google-generativeai: Gemini API for quality assessment

In [None]:
#Installing of Libraries
!pip install openai-whisper
!pip install jiwer
!pip install transformers
!pip install torch
!pip install accelerate
!pip install sentencepiece
!pip install google-generativeai



## 2: Importing Libraries

This block imports the libraries installed in the previous step, mounts Google Drive so the audio files and ground-truth transcripts stored there can be read, and configures the `genai` client with an API key for Gemini. Whisper, jiwer, `re`, `string`, and `genai` are used directly by the functions defined below; `torch` is a dependency of Whisper.

In [None]:
#Importing of Libraries
from jiwer import wer  # For WER calculation
import whisper  # For speech-to-text
from transformers import AutoModelForCausalLM, AutoTokenizer, pipeline  # NLP support
import string  # String manipulation utilities
import re  # Regular expression support
import torch  # Deep learning framework
from google.colab import drive  # Google Drive integration
drive.mount('/content/drive')
import google.generativeai as genai  # Gemini API

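# Configure Gemini API
# The key is read from an environment variable instead of being hardcoded in the notebook.
# GEMINI_API_KEY is an assumed variable name; set it before running this cell.
import os
genai.configure(api_key=os.environ["GEMINI_API_KEY"])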

Mounted at /content/drive


## 3: Transcribing Audio

Transcribes an audio file to text using the Whisper ASR model.

Args:
    audio_file_path (str): Path to the audio file to be transcribed
    
Returns:
    str: Transcribed text from the audio file
    
Implementation details:
- Uses the 'medium' size Whisper model for balanced accuracy and performance
- Handles automatic language detection and timestamps
- Returns only the transcribed text portion of the result

In [None]:
def transcribe_audio(audio_file_path):
    model = whisper.load_model("medium") # Loading the Whisper model
    result = model.transcribe(audio_file_path) # Transcribing audio

    return result['text']
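
As written, `transcribe_audio` reloads the Whisper checkpoint on every call, so the model is loaded once per audio file. A minimal sketch of one way to avoid that is shown below, assuming the same `whisper.load_model` and `model.transcribe` calls used above. The names `get_whisper_model` and `transcribe_audio_cached` are hypothetical helpers, not part of the original notebook.

```python
from functools import lru_cache


@lru_cache(maxsize=2)
def get_whisper_model(size="medium"):
    """Load a Whisper checkpoint once and reuse it on later calls (hypothetical helper)."""
    import whisper  # imported lazily, only when a model is actually requested
    return whisper.load_model(size)


def transcribe_audio_cached(audio_file_path, size="medium"):
    """Same behaviour as transcribe_audio, but the model is loaded only once per size."""
    model = get_whisper_model(size)
    result = model.transcribe(audio_file_path)
    # Whisper output often starts with a space, so strip surrounding whitespace
    return result["text"].strip()
```

With this arrangement, the second and later files reuse the model already in memory, which mainly saves load time when several recordings are processed in one session.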

## 4: Calculating Word Error Rate

Calculates Word Error Rate between reference and transcribed text with preprocessing.

Args:
    reference (str): Ground truth text
    hypothesis (str): Transcribed text to compare
    
Returns:
    float: Word Error Rate score (lower is better)
    
Preprocessing steps:
1. Convert to lowercase
2. Remove punctuation
3. Normalize whitespace

This preprocessing improves WER accuracy by focusing on word-level differences
rather than formatting variations.

In [None]:
def calculate_wer(reference, hypothesis):
    def preprocess(text):
        text = text.lower() # Converting to lowercase
        text = text.translate(str.maketrans('', '', string.punctuation)) # Removing punctuation
        text = re.sub(r'\s+', ' ', text).strip() # Removing extra whitespaces
        
        return text

    cleaned_reference = preprocess(reference)
    cleaned_hypothesis = preprocess(hypothesis)

    return wer(cleaned_reference, cleaned_hypothesis)
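
To make the effect of the preprocessing concrete, the sketch below computes WER by hand as word-level edit distance divided by the number of reference words. It is a simplified stand-in for illustration only and is not how `jiwer` is implemented internally; `simple_wer` is a hypothetical name and the two sentences are made-up examples.

```python
import re
import string


def preprocess(text):
    """Lowercase, strip punctuation, and collapse whitespace (same steps as above)."""
    text = text.lower()
    text = text.translate(str.maketrans("", "", string.punctuation))
    return re.sub(r"\s+", " ", text).strip()


def simple_wer(reference, hypothesis):
    """Word-level Levenshtein distance divided by the reference length."""
    ref, hyp = reference.split(), hypothesis.split()
    if not ref:
        raise ValueError("reference must contain at least one word")
    prev = list(range(len(hyp) + 1))  # distances for an empty reference prefix
    for i, ref_word in enumerate(ref, 1):
        curr = [i]
        for j, hyp_word in enumerate(hyp, 1):
            cost = 0 if ref_word == hyp_word else 1
            curr.append(min(prev[j] + 1,          # deletion
                            curr[j - 1] + 1,      # insertion
                            prev[j - 1] + cost))  # substitution or match
        prev = curr
    return prev[-1] / len(ref)


reference = "Good afternoon, Elkins Builders."
hypothesis = "good afternoon elkins builder"

raw_score = simple_wer(reference, hypothesis)
clean_score = simple_wer(preprocess(reference), preprocess(hypothesis))
print(raw_score)    # 1.0  (every word differs in case or punctuation)
print(clean_score)  # 0.25 (only "builders" vs "builder" differs)
```

Without preprocessing, all four words count as errors purely because of capitalization and punctuation; after preprocessing, only the one genuine word difference remains.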

## 5: Auditing Transcription

Evaluates customer service quality using the Gemini language model.

Args:
    transcribed_text (str): Transcribed conversation to evaluate
    
Returns:
    dict: Evaluation results for each criterion
    
Evaluation criteria:
- Introduction
- Customer Information Collection
- Politeness
- Empathy
- Gratitude
- Conclusion
- Clarifying Questions
- Language Clarity
- Information Relevance

Each criterion is evaluated with:
- Pass/Fail result
- Confidence score (0-100%)
- Detailed reasoning

In [None]:
def audit_transcription(transcribed_text):
    # Defining audit criteria
    audit_criteria = [
        {
            "name": "Introduction",
            "description": "Did the agent conduct a proper introductory greeting before starting the conversation?"
        },
        {
            "name": "Acquire Customer Information",
            "description": "Did the agent systematically collect essential customer information before proceeding?"
        },
        {
            "name": "Politeness and Respect",
            "description": "Was the agent's language consistently polite and respectful throughout the interaction?"
        },
        {
            "name": "Empathy and Understanding",
            "description": "Did the agent demonstrate empathy and a genuine understanding of the customer's needs?"
        },
        {
            "name": "Gratitude",
            "description": "Did the agent express gratitude for the customer's interest or engagement?"
        },
        {
            "name": "Provide Conclusion",
            "description": "Did the agent provide a clear summary of the customer's request and next steps?"
        },
        {
            "name": "Clarifying Questions",
            "description": "Did the agent ask clarifying questions to fully understand the customer's requirements?"
        },
        {
            "name": "Clarity of Language",
            "description": "Was the agent's language clear, concise, and easily understandable?"
        },
        {
            "name": "Relevance of Information",
            "description": "Did the agent provide information directly relevant to the customer's request?"
        }
    ]

    # Initializing Gemini model
    model = genai.GenerativeModel('gemini-pro')

    # System prompt for Gemini
    system_prompt = """You are an expert customer service quality analyst. Your task is to evaluate customer service call transcripts based on specific criteria. Provide clear, objective assessments with specific examples from the transcript to support your evaluation."""

    audit_results = {}

    for criterion in audit_criteria:
        try:
            # Creating the prompt for Gemini
            prompt = f"""{system_prompt}

Please evaluate the following customer service transcript for this specific criterion:

Criterion: {criterion['name']}
Description: {criterion['description']}

Transcript:
{transcribed_text}

Provide your evaluation in the following format:
RESULT: (Write either "Pass" or "Fail")
CONFIDENCE: (Provide a percentage between 0-100)
REASONING: (Provide specific examples from the transcript to justify your evaluation)
"""

            # Generate evaluation using Gemini
            response = model.generate_content(prompt)
            response_text = response.text

            # Parse the response
            result = "Not Applicable"
            confidence = 0.0
            reasoning = "Unable to evaluate"

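            # Extract information using regex
            result_match = re.search(r'RESULT:\s*(Pass|Fail)', response_text, re.IGNORECASE)
            confidence_match = re.search(r'CONFIDENCE:\s*(\d+)', response_text, re.IGNORECASE)
            # Greedy match with DOTALL keeps multi-line reasoning instead of stopping at the first newline
            reasoning_match = re.search(r'REASONING:\s*(.+)', response_text, re.IGNORECASE | re.DOTALL)

            # Update results if matches found
            if result_match:
                # Normalize case so the report's == 'Pass' check also counts "pass" or "PASS"
                result = result_match.group(1).capitalize()
            if confidence_match:
                # Cap at 100 so a malformed value cannot exceed 100%
                confidence = min(float(confidence_match.group(1)), 100.0) / 100
            if reasoning_match:
                reasoning = reasoning_match.group(1).strip()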

            # Store the result
            audit_results[criterion['name']] = {
                'Result': result,
                'Confidence': confidence,
                'Reason': reasoning
            }

        except Exception as e:
            # Handling errors
            audit_results[criterion['name']] = {
                'Result': 'Not Applicable',
                'Confidence': 0.0,
                'Reason': f"Error in evaluation: {str(e)}"
            }

    return audit_results
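
The parsing step can be checked in isolation, without calling Gemini. The sketch below applies the same RESULT / CONFIDENCE / REASONING format to a made-up response string. `parse_evaluation` is a hypothetical helper and `sample_response` is invented for illustration; it is not real model output.

```python
import re


def parse_evaluation(response_text):
    """Extract result, confidence, and reasoning, falling back to defaults when missing."""
    parsed = {"Result": "Not Applicable", "Confidence": 0.0, "Reason": "Unable to evaluate"}

    result_match = re.search(r"RESULT:\s*(Pass|Fail)", response_text, re.IGNORECASE)
    confidence_match = re.search(r"CONFIDENCE:\s*(\d+)", response_text, re.IGNORECASE)
    # DOTALL lets the reasoning span several lines up to the end of the response
    reasoning_match = re.search(r"REASONING:\s*(.+)", response_text,
                                re.IGNORECASE | re.DOTALL)

    if result_match:
        parsed["Result"] = result_match.group(1).capitalize()
    if confidence_match:
        parsed["Confidence"] = min(float(confidence_match.group(1)), 100.0) / 100
    if reasoning_match:
        parsed["Reason"] = reasoning_match.group(1).strip()
    return parsed


# Invented example response, including lowercase "pass" and two lines of reasoning
sample_response = (
    "RESULT: pass\n"
    "CONFIDENCE: 85%\n"
    "REASONING: The agent opened with a greeting.\n"
    "The company name was also stated."
)

parsed = parse_evaluation(sample_response)
print(parsed["Result"], parsed["Confidence"])
print(parsed["Reason"])
```

Testing the parser this way makes it easier to see how the defaults behave when the model returns text that does not follow the requested format.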

## 6: Generating Audit Report

Creates a formatted audit report combining transcription accuracy and quality assessment.
   
Args:
    transcribed_text (str): The transcribed conversation
    audit_results (dict): Results from audit_transcription()
    wer (float): Word Error Rate from calculate_wer()

Returns:
    str: Formatted audit report

Report sections:
1. WER score
2. Detailed results for each criterion
3. Summary statistics

In [None]:
def generate_audit_report(transcribed_text, audit_results, wer):
    report = "Customer Service Call Audit Report\n" + "="*50 + "\n\n"
    report += f"Transcription Word Error Rate: {wer:.2%}\n\n"
    report += "Detailed Audit Results:\n" + "-"*30 + "\n\n"

    # Add detailed results
    for criteria, result in audit_results.items():
        report += f"Criterion: {criteria}\n"
        report += f"Result: {result['Result']}\n"
        report += f"Confidence: {result['Confidence']:.2%}\n"
        report += f"Justification: {result['Reason']}\n\n"
        report += "-"*30 + "\n\n"

    # Calculate and add summary
    pass_count = sum(1 for result in audit_results.values() if result['Result'] == 'Pass')
    total_count = len(audit_results)
    overall_score = (pass_count / total_count) * 100 if total_count > 0 else 0

    report += f"\nSummary:\n"
    report += f"Total Criteria: {total_count}\n"
    report += f"Passed Criteria: {pass_count}\n"
    report += f"Overall Score: {overall_score:.2f}%\n"

    return report
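
The summary statistics can be illustrated with a small worked example. The sketch below repeats the pass-count arithmetic from `generate_audit_report` on invented results where 7 of 9 criteria pass; `summarize` is a hypothetical helper and the criterion names are placeholders.

```python
def summarize(audit_results):
    """Return (passed, total, score) using the same arithmetic as the report summary."""
    total = len(audit_results)
    passed = sum(1 for r in audit_results.values() if r["Result"] == "Pass")
    score = (passed / total) * 100 if total else 0.0
    return passed, total, score


# Invented results: the first seven placeholder criteria pass, the last two fail
example_results = {
    f"criterion_{i}": {"Result": "Pass" if i < 7 else "Fail"} for i in range(9)
}

passed, total, score = summarize(example_results)
print(f"{passed}/{total} passed, overall score {score:.2f}%")
# 7/9 passed, overall score 77.78%
```

Note that a criterion marked "Not Applicable" (for example after an API error) still counts toward the total, so it lowers the overall score in the same way as a "Fail".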

## 7: Processing the Audio File and Reading the Ground Truth Text

**def process_audio_file:**

Orchestrates the complete audio processing and evaluation workflow.

Args:
    audio_path (str): Path to the audio file
    ground_truth_path (str): Path to the ground truth transcription
    
Returns:
    tuple: (audit_report, transcribed_text, word_error_rate)
    
Workflow:
1. Transcribe audio using transcribe_audio()
2. Read ground truth using read_ground_truth()
3. Calculate WER using calculate_wer()
4. Perform audit using audit_transcription()
5. Generate report using generate_audit_report()

**def read_ground_truth:**

Reads and preprocesses the ground truth transcription file.

Args:
    file_path (str): Path to the ground truth text file
    
Returns:
    str: Cleaned ground truth text


In [9]:
def process_audio_file(audio_path, ground_truth_path):
    # Calling transcribe audio function
    transcribed_text = transcribe_audio(audio_path)

    # Read ground truth text
    ground_truth_text = read_ground_truth(ground_truth_path)

    # Calling WER function to calculate WER
    word_error_rate = calculate_wer(ground_truth_text, transcribed_text)

    # Calling Audit transcription function
    audit_results = audit_transcription(transcribed_text)

    # Calling Audit Report generation function
    audit_report = generate_audit_report(transcribed_text, audit_results, word_error_rate)

    return audit_report, transcribed_text, word_error_rate


def read_ground_truth(file_path):
    with open(file_path, 'r', encoding='utf-8') as f:  # explicit encoding; assumes UTF-8 text files
        return f.read().strip()
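
Since every recording follows the same steps, the workflow can also be driven from a list of file names. The sketch below is one possible arrangement, assuming each audio file sits next to a `.txt` ground truth with the same base name, as in the dataset folder used in this notebook. `paths_for` and `run_all` are hypothetical helpers, and the processing function is passed in as an argument so the sketch stays independent of the cells above.

```python
from pathlib import PurePosixPath

# Dataset folder and base names as used in this notebook
DATA_DIR = PurePosixPath("/content/drive/MyDrive/NYP AAI DATASET/ITI108/Audio_label")
STEMS = [
    "Custom-Home-Builder",
    "Inbound-sales-audio-sample",
    "Local-Plumber",
    "Property-Management-Office",
    "Real-State-Lead-Gen-1",
    "Travel-Reservation",
]


def paths_for(stem, data_dir=DATA_DIR):
    """Return the (audio_path, ground_truth_path) pair for one recording."""
    return str(data_dir / f"{stem}.mp3"), str(data_dir / f"{stem}.txt")


def run_all(process_fn, stems=STEMS):
    """Call process_fn(audio_path, ground_truth_path) for every recording."""
    outputs = {}
    for stem in stems:
        audio_path, truth_path = paths_for(stem)
        outputs[stem] = process_fn(audio_path, truth_path)
    return outputs
```

In the notebook, `run_all(process_audio_file)` would return one `(audit_report, transcribed_text, word_error_rate)` tuple per recording, which can then be printed cell by cell.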

# Test with each of the given audio files

**Ensure you print out the results in each cell for marking**

## Custom-Home-Builder.mp3

In [10]:
audio_file = '/content/drive/MyDrive/NYP AAI DATASET/ITI108/Audio_label/Custom-Home-Builder.mp3'
ground_truth_file = '/content/drive/MyDrive/NYP AAI DATASET/ITI108/Audio_label/Custom-Home-Builder.txt'

# Transcribe audio and calculate WER
transcribed_text = transcribe_audio(audio_file)
ground_truth_text = read_ground_truth(ground_truth_file)
word_error_rate = calculate_wer(ground_truth_text, transcribed_text)

# Audit transcription using Gemini
audit_results = audit_transcription(transcribed_text)

# Generate and print the report
audit_report = generate_audit_report(transcribed_text, audit_results, word_error_rate)
print("Transcription:", transcribed_text)
print(audit_report)

100%|█████████████████████████████████████| 1.42G/1.42G [00:22<00:00, 67.8MiB/s]
  checkpoint = torch.load(fp, map_location=device)


Transcription:  Call is now being recorded. Good afternoon, Elkins Builders. Yeah, hi. I'm calling to speak to someone about building a house and a property I'm looking to purchase. Oh, okay, great. Let me get your name. What's your first name, please? Kenny. And your last name? Lindstrom. It's L-I-N-D-S-T-R-O-N. Thank you. And may I have your callback number? It's 610-265-1715. That's 610-265-1715? Yes. And where is the property that you're looking for an estimate on? It's in Westchester. I haven't purchased the land yet. I'd like to see if I could get an estimate or have them take a look at it before I do. Okay, no problem. Is there a good time to reach you with this number or is that at any time? That's my cell phone. If they could call me back today, that would be great. Okay, no problem. I'll pass your message along and somebody should be getting back to you this afternoon. Great. Thank you so much. You're welcome and thank you for calling Elkins Builders. Bye-bye. Bye. Thank you.

## Inbound-sales-audio-sample.mp3

In [11]:
audio_file = '/content/drive/MyDrive/NYP AAI DATASET/ITI108/Audio_label/Inbound-sales-audio-sample.mp3'
ground_truth_file = '/content/drive/MyDrive/NYP AAI DATASET/ITI108/Audio_label/Inbound-sales-audio-sample.txt'

# Transcribe audio and calculate WER
transcribed_text = transcribe_audio(audio_file)
ground_truth_text = read_ground_truth(ground_truth_file)
word_error_rate = calculate_wer(ground_truth_text, transcribed_text)

# Audit transcription using Gemini
audit_results = audit_transcription(transcribed_text)

# Generate and print the report
audit_report = generate_audit_report(transcribed_text, audit_results, word_error_rate)
print("Transcription:", transcribed_text)
print(audit_report)



Transcription:  Thank you for calling Brentburg. This is Jessica. How may I help you? Hi, Jessica. My name is John and I'm from Sydney, Australia, and I run a tech marketplace business, a startup, and I'm run off my feet at the moment. I'm looking for someone to help me get a virtual assistant. Is that something you can help me with? Yes, definitely, John. Thank you so much for that information. May I know what are the different tasks this virtual assistant would be doing? Yeah, I just really need basic administrative work. I need someone to do my email management. I need someone to manage my calendar, do some scheduling for me, maybe book some travel if I need to from time to time. Some data entry, some pretty basic stuff just to help me with work that I shouldn't be focused on while I'm trying to launch this new tech company. Yes, definitely. I can help you out with that. You said that what you're looking for is someone who can do email management, calendar management, data entry, a 

## Local-Plumber.mp3

In [12]:
audio_file = '/content/drive/MyDrive/NYP AAI DATASET/ITI108/Audio_label/Local-Plumber.mp3'
ground_truth_file = '/content/drive/MyDrive/NYP AAI DATASET/ITI108/Audio_label/Local-Plumber.txt'

# Transcribe audio and calculate WER
transcribed_text = transcribe_audio(audio_file)
ground_truth_text = read_ground_truth(ground_truth_file)
word_error_rate = calculate_wer(ground_truth_text, transcribed_text)

# Audit transcription using Gemini
audit_results = audit_transcription(transcribed_text)

# Generate and print the report
audit_report = generate_audit_report(transcribed_text, audit_results, word_error_rate)
print("Transcription:", transcribed_text)
print(audit_report)



Transcription:  Call is now being recorded. ABC Plumbing and Heating, this is Betty. Hi Betty, I'm having a problem with my sewer drain. Oh, I'm so sorry to hear that sir. Would you like me to get a hold of the plumber for you? Do you know how much it's going to cost? Unfortunately, I wouldn't be able to quote prices, but I can get a hold of someone who would be able to give you a better idea. I'm getting a backup throughout the entire house. Are you a client of ABC? I did use them before to put in my garbage disposal, but nothing major like this. Okay, let me get your name and number and I can patch you right through to the plumber. Your name sir? Mike Barry. Would you spell your last name for me please? That's B-A-R-R-Y. Okay, and your callback number? 610-265-1714. Okay, that's 610-265-1714. Yes, will you call me right back? Actually Mr. Barry, I'm going to call him right now and patch you directly through to the plumber. Would you stay on the line for a moment? Oh sure, awesome, th

## Property-Management-Office.mp3

In [13]:
audio_file = '/content/drive/MyDrive/NYP AAI DATASET/ITI108/Audio_label/Property-Management-Office.mp3'
ground_truth_file = '/content/drive/MyDrive/NYP AAI DATASET/ITI108/Audio_label/Property-Management-Office.txt'

# Transcribe audio and calculate WER
transcribed_text = transcribe_audio(audio_file)
ground_truth_text = read_ground_truth(ground_truth_file)
word_error_rate = calculate_wer(ground_truth_text, transcribed_text)

# Audit transcription using Gemini
audit_results = audit_transcription(transcribed_text)

# Generate and print the report
audit_report = generate_audit_report(transcribed_text, audit_results, word_error_rate)
print("Transcription:", transcribed_text)
print(audit_report)



Transcription:  Call is now being recorded. Good evening, Kingswood Apartments. This is Alex. How may I help you? Hey, yeah, I'm in Apartment 104 on the first floor. I'm calling to complain about my neighbors. Okay. What seems to be the problem, sir? It's more or less I just got my newborn baby to sleep, and they're being loud again. I've brought this to their attention several times, but they never, you know, never stop. Okay. I'm very sorry about that. I'm going to take down your contact information, and I'll contact the landlord right away to get this all straightened out. Okay. My name's Jeff Matthews. Okay. Mr. Matthews, can you spell that for me? First name is Jeff, J-E-F-F. Last name is Matthews, M-A-T-T-H-E-W-S. Okay. Mr. Matthews, can I have the best number you can be reached at and also your apartment number again? Sure. It's 610-265-1714, and I'm in Apartment 104 on the first floor. Okay. I have 610-265-1714 and Apartment 104. Yes. All right. Mr. Matthews, I'm going to pass 

## Real-State-Lead-Gen-1.mp3



In [14]:
audio_file = '/content/drive/MyDrive/NYP AAI DATASET/ITI108/Audio_label/Real-State-Lead-Gen-1.mp3'
ground_truth_file = '/content/drive/MyDrive/NYP AAI DATASET/ITI108/Audio_label/Real-State-Lead-Gen-1.txt'

# Transcribe audio and calculate WER
transcribed_text = transcribe_audio(audio_file)
ground_truth_text = read_ground_truth(ground_truth_file)
word_error_rate = calculate_wer(ground_truth_text, transcribed_text)

# Audit transcription using Gemini
audit_results = audit_transcription(transcribed_text)

# Generate and print the report
audit_report = generate_audit_report(transcribed_text, audit_results, word_error_rate)
print("Transcription:", transcribed_text)
print(audit_report)



Transcription:  Hello, Eloise speaking. Hi, Eloise. This is Sophia. Good day. I'm phoning from the realestateleagenerations.com.au. We just want to quickly let you know. Our custom lead generation packages are now live. We're getting real estate agent leads as we speak. Now, I know you're busy, Eloise, but I hope your team is already set up with a stable stream of incoming leads, property appraisals, property management, and general inquiries. That's fine. We'll leave you to it. But we've been working with real estate agencies for over 10 years, and there's always a common theme. Where is your next lead going to come from? So my call today is just to book a time with one of our real estate lead specialists who are based in both Sydney and Melbourne. They'll quickly introduce the package to you. Just run through a few of the finer details, Eloise. It'll only take 10 to 20 minutes to give you the background of the package to let you ask any questions you may have. Are you free next week,

## Travel-Reservation.mp3

In [15]:
audio_file = '/content/drive/MyDrive/NYP AAI DATASET/ITI108/Audio_label/Travel-Reservation.mp3'
ground_truth_file = '/content/drive/MyDrive/NYP AAI DATASET/ITI108/Audio_label/Travel-Reservation.txt'

# Transcribe audio and calculate WER
transcribed_text = transcribe_audio(audio_file)
ground_truth_text = read_ground_truth(ground_truth_file)
word_error_rate = calculate_wer(ground_truth_text, transcribed_text)

# Audit transcription using Gemini
audit_results = audit_transcription(transcribed_text)

# Generate and print the report
audit_report = generate_audit_report(transcribed_text, audit_results, word_error_rate)
print("Transcription:", transcribed_text)
print(audit_report)



Transcription:  Hi, thank you for calling Hotel California. This is Candice. How may I help you? Hi Candice, this is Steven. I'd like to book for four people please. And that would be for August 18th. Sure, Steven. I'd be happy to help you with that. So which room do you have in mind? I gotta tell you first that I am super busy. So I haven't been able to browse through your website. And I don't know, I don't really know the rooms you have right now. But I could give you my requirements and then you suggest the best rooms for me. Could we do that? Absolutely. What do you need? So there are four of us including me. And we need something with a wifi and maybe a swimming pool. Got it. Will you four be staying in one room or separate rooms? Oh, sorry. I forgot to tell you. That would be separate because you see we're two couples. So we need two rooms. I see. So in that case, I recommend two deluxe Kings. Each room has a king size bed which fits two people, a wifi, and access to a swimming p