# Project: Candidate Interview Audio Analysis

In this project, we will analyze a candidate's interview response video to evaluate their communication skills. The process involves several key steps:

1. **Video Processing and Audio Extraction**: We will extract the audio track from the provided video file and save it as a .wav file.
2. **Audio Preprocessing and Feature Extraction**: We will load the extracted audio file and implement functions to extract relevant features, including speech rate, average volume/energy, and pauses.
3. **Basic Sentiment Analysis**: Finally, we will transcribe the audio to text and perform sentiment analysis on the transcribed text to categorize the overall sentiment as positive, neutral, or negative.

The project aims to provide insights into the candidate's communication skills based on the audio analysis of their interview response.


## Import Required Libraries

In this cell, we will import the necessary libraries that will be used throughout the project, including `pydub`, `speech_recognition`, and `textblob`.


In [7]:

import os
from pydub import AudioSegment
import speech_recognition as sr
from textblob import TextBlob


## Video Processing and Audio Extraction

In this cell, we will define a function to extract the audio track from the video file. 

**Function: `extract_audio_from_video`**
- **Input**: 
  - `video_path`: The path to the input video file (str).
  - `audio_output_path`: The path to save the extracted audio file (str).
- **Output**: 
  - The audio is saved as a .wav file at the specified output path.


In [8]:
# For this task, ensure you have the video file 'candidate_interview.mp4' in the 'data' folder.

from moviepy.editor import VideoFileClip

def extract_audio_from_video(video_path, audio_output_path):
    video = VideoFileClip(video_path)
    audio = video.audio
    audio.write_audiofile(audio_output_path, codec='pcm_s16le')
    video.close()

# Define paths
video_path = "data/candidate_interview.mp4"
audio_output_path = "data/extracted_audio.wav"

# Extract audio
extract_audio_from_video(video_path, audio_output_path)
print("Audio extraction complete.")


MoviePy - Writing audio in data/extracted_audio.wav


                                                                        

MoviePy - Done.
Audio extraction complete.


## Splitting Audio into Chunks

In this cell, we will define a function to split the extracted audio file into smaller chunks of a maximum duration of 5 minutes.

**Function: `split_audio`**
- **Input**: 
  - `audio_path`: The path to the audio file (str).
  - `max_duration_ms`: The maximum duration for each chunk in milliseconds (int).
- **Output**: 
  - A list of audio chunks, each as an `AudioSegment` object.


### Audio Feature Extraction Across Chunks

In this cell, we:
1. **Split the Audio into Chunks**: Since Google’s Speech Recognition API has a limitation for files longer than 5 minutes, we split the audio file into chunks of 5 minutes or less.
2. **Calculate Speech Rate (Words Per Minute)**:
   - For each chunk, we transcribe the audio and count the words.
   - We compute the speech rate in words per minute (WPM) and average it across all chunks.
3. **Calculate Average Volume**:
   - We measure the Root Mean Square (RMS) volume, which reflects the audio’s average energy, across each chunk.
4. **Determine Number and Duration of Pauses**:
   - We analyze each chunk to identify pauses based on low audio volume.
   - For each pause, we record its start and end times, allowing us to calculate both the number of pauses and total pause duration for the complete audio.

Each function returns a metric that we then average or accumulate across all chunks for a comprehensive analysis of the candidate's audio. This provides valuable insights into communication style, including pacing, volume, and pauses.


In [15]:
import os
import librosa
import speech_recognition as sr
from pydub import AudioSegment
import numpy as np

# Load audio helper function
def load_audio(audio_path):
    audio, sr = librosa.load(audio_path, sr=None)
    return audio, sr

# Function to split audio into chunks (5 minutes max)
def split_audio_chunks(audio_path, chunk_duration_ms=5 * 60 * 1000):
    audio = AudioSegment.from_file(audio_path)
    return [audio[i:i + chunk_duration_ms] for i in range(0, len(audio), chunk_duration_ms)]

# Calculate speech rate (words per minute) from audio chunks
def calculate_speech_rate_from_chunks(chunks):
    recognizer = sr.Recognizer()
    words_count = 0
    total_duration = 0
    for i, chunk in enumerate(chunks):
        chunk_path = f"temp_chunk_{i}.wav"
        chunk.export(chunk_path, format="wav")
        
        with sr.AudioFile(chunk_path) as source:
            audio_data = recognizer.record(source)
            try:
                text = recognizer.recognize_google(audio_data)
                words_count += len(text.split())
                total_duration += librosa.get_duration(filename=chunk_path)
            except sr.RequestError:
                print("API unavailable for chunk:", i)
            except sr.UnknownValueError:
                print("Unable to recognize speech for chunk:", i)
        
        os.remove(chunk_path)
    
    words_per_minute = (words_count / total_duration) * 60 if total_duration > 0 else 0
    return words_per_minute

# Calculate average volume across audio chunks
def calculate_average_volume_from_chunks(chunks):
    rms_values = []
    for chunk in chunks:
        audio, _ = librosa.load(chunk.export("temp_chunk.wav", format="wav"), sr=None)
        rms = np.sqrt(np.mean(audio ** 2))
        rms_values.append(rms)
        os.remove("temp_chunk.wav")
    return np.mean(rms_values)

# Calculate number and duration of pauses across audio chunks
def calculate_pauses_from_chunks(chunks, pause_threshold=0.3):
    num_pauses = 0
    total_pause_duration = 0
    for chunk in chunks:
        audio, sr = librosa.load(chunk.export("temp_chunk.wav", format="wav"), sr=None)
        pause_frames = librosa.effects.split(audio, top_db=30)
        pauses = [(start / sr, end / sr) for start, end in pause_frames]
        num_pauses += len(pauses)
        total_pause_duration += sum(end - start for start, end in pauses)
        os.remove("temp_chunk.wav")
    return num_pauses, total_pause_duration

# Example usage
audio_path = "data/extracted_audio.wav"  
audio_chunks = split_audio_chunks(audio_path)

# Extract features from chunks and calculate averages
speech_rate = calculate_speech_rate_from_chunks(audio_chunks)
average_volume = calculate_average_volume_from_chunks(audio_chunks)
num_pauses, total_pause_duration = calculate_pauses_from_chunks(audio_chunks)

print(f"Speech Rate: {speech_rate:.2f} words per minute")
print(f"Average Volume: {average_volume:.4f} (RMS)")
print(f"Number of Pauses: {num_pauses}")
print(f"Total Duration of Pauses: {total_pause_duration:.2f} seconds")


Speech Rate: 115.57 words per minute
Average Volume: 0.1012 (RMS)
Number of Pauses: 1876
Total Duration of Pauses: 591.79 seconds


In [9]:

def split_audio(audio_path, max_duration_ms):
    audio = AudioSegment.from_wav(audio_path)
    chunks = []
    for i in range(0, len(audio), max_duration_ms):
        chunk = audio[i:i + max_duration_ms]
        chunks.append(chunk)
    return chunks

# Set maximum duration (5 minutes in milliseconds)
max_duration_minutes = 5
max_duration_ms = max_duration_minutes * 60 * 1000  # Convert minutes to milliseconds

# Split the audio into chunks
audio_chunks = split_audio(audio_output_path, max_duration_ms)
print(f"Audio split into {len(audio_chunks)} chunks of {max_duration_minutes} minutes or less.")


Audio split into 4 chunks of 5 minutes or less.


## Transcribing Audio Chunks

In this cell, we will define a function to transcribe the audio chunks using Google Speech Recognition.

**Function: `transcribe_audio_chunks`**
- **Input**: 
  - `chunks`: A list of audio chunks (list of `AudioSegment`).
- **Output**: 
  - A string containing the full transcription of the audio chunks combined.


In [10]:

def transcribe_audio_chunks(chunks):
    recognizer = sr.Recognizer()
    transcriptions = []
    
    for i, chunk in enumerate(chunks):
        # Save each chunk to a temporary file
        chunk_path = f"temp_chunk_{i}.wav"
        chunk.export(chunk_path, format="wav")
        
        # Transcribe the chunk
        with sr.AudioFile(chunk_path) as source:
            audio_data = recognizer.record(source)
            try:
                transcription = recognizer.recognize_google(audio_data)
                transcriptions.append(transcription)
                print(f"Chunk {i + 1}: {transcription}")
            except sr.UnknownValueError:
                print(f"Chunk {i + 1}: Google Speech Recognition could not understand the audio.")
            except sr.RequestError as e:
                print(f"Chunk {i + 1}: Could not request results from Google Speech Recognition service: {e}")
        
        # Remove the temporary file
        os.remove(chunk_path)

    return " ".join(transcriptions)

# Transcribe the audio chunks
full_transcription = transcribe_audio_chunks(audio_chunks)
print("Full Transcription Completed.")


Chunk 1: SO2 if you talking I cannot hear you oh yeah yes can you hear me now yes I can how much you can you hear me how are you doing today yeah I'm doing quite well hope you're not as well on my head where are you calling from yes I'm currently at Accra Accra where where in a car are you basin site yeah ok great awesome cool well I really do appreciate taking the time out to to meet with me on today kick off I was going to just tell you there but ending opportunity and then afterwards actually a few questions then we could talk about the Next Steps how does that sound that sounds fine so bridge of that we are a software developer staffing company where we intentionally only higher African developers we should be believe that Africa is the next Wave of tech Talent often when us companies are hiring official developers they tend to go to India is in Europe and those regions however that comes with the lord of disadvantages one is the time zone India and the US has a significant time zo

In [14]:
print(full_transcription)

SO2 if you talking I cannot hear you oh yeah yes can you hear me now yes I can how much you can you hear me how are you doing today yeah I'm doing quite well hope you're not as well on my head where are you calling from yes I'm currently at Accra Accra where where in a car are you basin site yeah ok great awesome cool well I really do appreciate taking the time out to to meet with me on today kick off I was going to just tell you there but ending opportunity and then afterwards actually a few questions then we could talk about the Next Steps how does that sound that sounds fine so bridge of that we are a software developer staffing company where we intentionally only higher African developers we should be believe that Africa is the next Wave of tech Talent often when us companies are hiring official developers they tend to go to India is in Europe and those regions however that comes with the lord of disadvantages one is the time zone India and the US has a significant time zone differ

## Basic Sentiment Analysis

In this cell, we will define a function to analyze the sentiment of the transcribed text.

**Function: `analyze_sentiment`**
- **Input**: 
  - `text`: The transcribed text to analyze (str).
- **Output**: 
  - A tuple containing the sentiment category (str) and polarity score (float).


In [11]:

def analyze_sentiment(text):
    blob = TextBlob(text)
    polarity = blob.sentiment.polarity
    
    # Classify sentiment based on polarity
    if polarity > 0.1:
        sentiment = "Positive"
    elif polarity < -0.1:
        sentiment = "Negative"
    else:
        sentiment = "Neutral"
    
    print(f"Sentiment: {sentiment} (Polarity: {polarity:.2f})")
    return sentiment, polarity

# Analyze Sentiment on the full transcription
if full_transcription:
    sentiment, polarity = analyze_sentiment(full_transcription)


Sentiment: Positive (Polarity: 0.21)
