# Dataset

This project utilizes the **RAVDESS (Ryerson Audio-Visual Database of Emotional Speech and Song)** dataset.

Original Source:
* Livingstone, S. R., & Russo, F. A. (2018). The Ryerson Audio-Visual Database of Emotional Speech and Song (RAVDESS): A dynamic, multimodal set of facial and vocal expressions in North American English for research on emotion perception. PLoS ONE, 13(5), e0196391.

Dataset hosted on Kaggle:
* [RAVDESS Emotional Speech Audio Dataset](https://www.kaggle.com/datasets/uwrfkaggler/ravdess-emotional-speech-audio/data)

# Speaker Identification Pipeline
## Project Overview
This project implements a complete, end-to-end speaker identification pipeline utilizing a pre-trained ECAPA-TDNN deep learning model from the SpeechBrain toolkit. It focuses on identifying known speakers from audio inputs and robustly rejecting unknown or unauthorized individuals, forming the core of a voice biometric system.

## Methodology & Pipeline Stages
The pipeline encompasses several critical stages:

### Data Preparation

Meticulously filters and organizes audio files from the RAVDESS emotional speech dataset (Actor_01 to Actor_24).
Splits data into distinct enrollment (training) and test sets for enrolled actors (Actors 01-06) based on specific criteria (e.g., statement, emotion, repetition).
Prepares a large set of "unknown" (imposter) actor data (Actors 07-24) for robust imposter rejection testing, covering all statement 1 files with diverse emotions and intensities.

### Speaker Enrollment:

Generates unique voiceprint embeddings for each enrolled speaker.
This is achieved by extracting and averaging multiple embeddings from their designated enrollment audio files, creating a robust representation.

### 1:N Speaker Identification:

Develops a core function to identify a speaker from any new audio input.
Compares the incoming audio's embedding against all enrolled voiceprints using cosine similarity.
Applies a dynamically calibrated similarity threshold to make identification decisions (accept/reject).


### Imposter Rejection Testing:

Performs extensive evaluation of the model's ability to reject unknown (imposter) speakers.
Tests against a large, dedicated set of unenrolled actors (Actors 07-24) to quantify the system's robustness against unauthorized access attempts.

### Key Findings & Achievements
High Imposter Rejection Accuracy: Achieved a remarkable 95.7% accuracy in correctly rejecting unseen and unenrolled actors (imposters) across a diverse set of audio files.
Identified Model Nuances: Through detailed analysis, observed specific instances of potential confusion, notably between Actor 01 and Actor 23. This highlights areas for future model refinement or the need for more diverse enrollment data to enhance intra-class separation.
Threshold Optimization Insight: Demonstrated the critical importance and practical utility of adjusting the detection threshold. This allows for strategic prioritization, such as minimizing False Acceptances for security-sensitive applications, even at the potential cost of slightly higher False Rejection Rates.

In [1]:
import os
import torch
import torchaudio
from speechbrain.inference.speaker import EncoderClassifier
from pydub import AudioSegment
from pydub.playback import play
import numpy as np
from sklearn.metrics import roc_curve, auc
import matplotlib.pyplot as plt
from tqdm import tqdm

In [2]:
# --- Configuration ---
RAVDESS_PATH = r"D:\Data_and_AI\Datasets\RAVDESS Emotion Classification Dataset"
OUTPUT_DIR = "voice_auth_data" # Directory to save embeddings and results
os.makedirs(OUTPUT_DIR, exist_ok=True)

# Selected actors (3 males, 3 females)
# Actor IDs: Odd are male, Even are female.
# Actor 01, 02, 03, 04, 05, 06
SELECTED_ACTORS = [f"Actor_{i:02d}" for i in [1, 2, 3, 4, 5, 6]]

# RAVDESS file naming convention mapping
# File format: 03-01-XX-YY-01-ZZ-AA.wav
# Index 2: Emotion (01=neutral, 02=calm, 03=happy, 04=sad, 05=angry, 06=fearful, 07=disgust, 08=surprised)
# Index 3: Emotional intensity (01=normal, 02=strong)
# Index 4: Statement (01="Kids are talking by the door", 02="Dogs are sitting by the door")
# Index 5: Repetition (01=1st repetition, 02=2nd repetition)

STATEMENT_ID = '01' # "Kids are talking by the door"
EMOTION_IDS = ['02', '03', '05'] # Calm, Happy, Angry
INTENSITY_IDS = ['01', '02'] # Normal, Strong
REPETITION_ENROLL = '01' # Use 1st repetition for enrollment
REPETITION_TEST = '02' # Use 2nd repetition for testing/calibration

In [3]:
# --- Step 1: Setup and Imports (SpeechBrain Model) ---
print("Loading SpeechBrain ECAPA-TDNN model...")
# Using `savedir` to cache the model locally after first download
speaker_model = EncoderClassifier.from_hparams(
    source="speechbrain/spkrec-ecapa-voxceleb",
    savedir="pretrained_ecapa_model",
    run_opts={"device":"cpu"} # Force CPU usage
)
print("Model loaded successfully to CPU.")

Loading SpeechBrain ECAPA-TDNN model...


  wrapped_fwd = torch.cuda.amp.custom_fwd(fwd, cast_inputs=cast_inputs)


Model loaded successfully to CPU.


In [4]:
# --- Helper Functions ---

def load_audio(path):
    """Loads an audio file and resamples it to 16kHz if necessary."""

    signal, fs = torchaudio.load(path)

    if fs != 16000:
        resampler = torchaudio.transforms.Resample(orig_freq=fs, new_freq=16000)
        signal = resampler(signal)
      
    # Ensure signal is mono and has shape [batch, samples] for SpeechBrain
    if signal.dim() == 1:
        # [samples] → [1, samples]
        signal = signal.unsqueeze(0)

    elif signal.dim() == 2:
        # [channels, samples] → [1, samples] if multi-channel (e.g., from torchaudio.load)
        if signal.shape[0] > 1:
            signal = signal.mean(dim=0, keepdim=True)

    elif signal.dim() == 3:
        # [batch, channels, samples] → [batch, samples] (mono)
        signal = signal.mean(dim=1)

    else:
        raise ValueError(f"Unexpected signal shape: {signal.shape}")

    return signal


def get_embedding(audio_path):
    """Extracts speaker embedding from an audio file."""
    signal = load_audio(audio_path)
    
    with torch.no_grad(): # Disable gradient computation for inference
        # The encode_batch method often returns [batch, 1, embedding_dim]
        embeddings = speaker_model.encode_batch(signal.to(speaker_model.device))
    
    # Return the embedding vector, typically squeezing out unnecessary dimensions
    return embeddings.squeeze().cpu() # Move to CPU and remove batch/channel dims


def cosine_similarity(emb1, emb2):
    """Computes cosine similarity between two embedding vectors."""
    return torch.nn.functional.cosine_similarity(emb1, emb2, dim=0).item()

In [5]:
# --- Step 2: Data Collection and Pre-processing ---
# List to store actor info and file paths
actor_data = {}

print("\nCollecting RAVDESS audio file paths...")
for actor_id_num in [int(a.split('_')[1]) for a in SELECTED_ACTORS]:
    actor_folder = f"Actor_{actor_id_num:02d}"
    actor_full_path = os.path.join(RAVDESS_PATH, actor_folder)

    actor_files_enroll = []
    actor_files_test = []

    for filename in os.listdir(actor_full_path):
        if not filename.endswith('.wav'):
            continue
        
        parts = filename.split('-')
        modality, vocal_channel, emotion, intensity, statement, repetition, actor_num = parts[0], parts[1], parts[2], parts[3], parts[4], parts[5], parts[6].split('.')[0]
        
        # Filter based on your criteria
        if (modality == '03' and vocal_channel == '01' and # audio-only speech
            statement == STATEMENT_ID and # "Kids are talking by the door"
            emotion in EMOTION_IDS and # Calm, Happy, Angry
            intensity in INTENSITY_IDS): # Normal, Strong

            full_file_path = os.path.join(actor_full_path, filename)
            
            if repetition == REPETITION_ENROLL:
                actor_files_enroll.append(full_file_path)
            elif repetition == REPETITION_TEST:
                actor_files_test.append(full_file_path)

    actor_data[f"Actor_{actor_id_num:02d}"] = {
        'enrollment_files': actor_files_enroll,
        'test_files': actor_files_test
    }
    print(f"Actor {actor_id_num:02d}: {len(actor_files_enroll)} enrollment files, {len(actor_files_test)} test files found.")



Collecting RAVDESS audio file paths...
Actor 01: 6 enrollment files, 6 test files found.
Actor 02: 6 enrollment files, 6 test files found.
Actor 03: 6 enrollment files, 6 test files found.
Actor 04: 6 enrollment files, 6 test files found.
Actor 05: 6 enrollment files, 6 test files found.
Actor 06: 6 enrollment files, 6 test files found.


In [6]:
# --- Step 3: Speaker Enrollment ---
# Dictionary to store enrolled voiceprints (mean embeddings)
enrolled_voiceprints = {}

print("\nEnrolling selected actors...")
for actor_id, data in tqdm(actor_data.items(), desc="Enrollment Progress"):
    enrollment_embeddings = []
    for filepath in data['enrollment_files']:
        emb = get_embedding(filepath)
        enrollment_embeddings.append(emb)
    
    # Compute the mean embedding for enrollment
    if enrollment_embeddings:
        mean_embedding = torch.mean(torch.stack(enrollment_embeddings), dim=0)
        enrolled_voiceprints[actor_id] = mean_embedding
        print(f"  Enrolled {actor_id} with mean embedding of shape {mean_embedding.shape}")
    else:
        print(f"  Warning: No enrollment files found for {actor_id}. Skipping enrollment.")

# Save enrolled voiceprints for later use
torch.save(enrolled_voiceprints, os.path.join(OUTPUT_DIR, "enrolled_voiceprints.pt"))
print(f"Enrolled voiceprints saved to {os.path.join(OUTPUT_DIR, 'enrolled_voiceprints.pt')}")


Enrolling selected actors...


Enrollment Progress:  17%|██████████▌                                                    | 1/6 [00:02<00:11,  2.35s/it]

  Enrolled Actor_01 with mean embedding of shape torch.Size([192])


Enrollment Progress:  33%|█████████████████████                                          | 2/6 [00:04<00:09,  2.31s/it]

  Enrolled Actor_02 with mean embedding of shape torch.Size([192])


Enrollment Progress:  50%|███████████████████████████████▌                               | 3/6 [00:07<00:07,  2.36s/it]

  Enrolled Actor_03 with mean embedding of shape torch.Size([192])


Enrollment Progress:  67%|██████████████████████████████████████████                     | 4/6 [00:09<00:04,  2.33s/it]

  Enrolled Actor_04 with mean embedding of shape torch.Size([192])


Enrollment Progress:  83%|████████████████████████████████████████████████████▌          | 5/6 [00:11<00:02,  2.43s/it]

  Enrolled Actor_05 with mean embedding of shape torch.Size([192])


Enrollment Progress: 100%|███████████████████████████████████████████████████████████████| 6/6 [00:14<00:00,  2.45s/it]

  Enrolled Actor_06 with mean embedding of shape torch.Size([192])
Enrolled voiceprints saved to voice_auth_data\enrolled_voiceprints.pt





In [7]:
# --- Step 4: Speaker Verification Function ---

def identify_speaker(audio_path, enrolled_prints, threshold=0.5):
    """
    Identifies the speaker in audio_path from enrolled_prints.
    Returns (identified_actor_id, highest_score) if identified and above threshold,
    else returns (None, highest_score) if not identified (imposter).
    """
    live_embedding = get_embedding(audio_path)

    best_match_id = None
    highest_score = -1.0 # Cosine similarity ranges from -1 to 1

    for actor_id, stored_embedding in enrolled_prints.items():
        score = cosine_similarity(live_embedding, stored_embedding)
        if score > highest_score:
            highest_score = score
            best_match_id = actor_id
    
    # Apply the threshold for open-set identification
    if highest_score >= threshold:
        return best_match_id, highest_score # Identified
    else:
        return None, highest_score # Not identified (imposter or unknown speaker)

In [15]:
actor_data['Actor_01']['test_files']

['D:\\Data_and_AI\\Datasets\\RAVDESS Emotion Classification Dataset\\Actor_01\\03-01-02-01-01-02-01.wav',
 'D:\\Data_and_AI\\Datasets\\RAVDESS Emotion Classification Dataset\\Actor_01\\03-01-02-02-01-02-01.wav',
 'D:\\Data_and_AI\\Datasets\\RAVDESS Emotion Classification Dataset\\Actor_01\\03-01-03-01-01-02-01.wav',
 'D:\\Data_and_AI\\Datasets\\RAVDESS Emotion Classification Dataset\\Actor_01\\03-01-03-02-01-02-01.wav',
 'D:\\Data_and_AI\\Datasets\\RAVDESS Emotion Classification Dataset\\Actor_01\\03-01-05-01-01-02-01.wav',
 'D:\\Data_and_AI\\Datasets\\RAVDESS Emotion Classification Dataset\\Actor_01\\03-01-05-02-01-02-01.wav']

In [22]:
def parse_ravdess_filename(filename):
    """
    Parses a RAVDESS filename to extract relevant features.
    Example: '03-01-02-01-01-02-01.wav'
    """
    parts = os.path.basename(filename).split('.')[0].split('-')
    
    # Define mappings (based on RAVDESS documentation)
    emotion_map = {
        '01': 'Neutral', '02': 'Calm', '03': 'Happy', '04': 'Sad',
        '05': 'Angry', '06': 'Fearful', '07': 'Disgust', '08': 'Surprised'
    }
    intensity_map = {'01': 'Normal', '02': 'Strong'}
    
    # Extract relevant parts
    actor_id = int(parts[6]) # Actor ID is 7th part (index 6)
    emotion_code = parts[2]  # Emotion is 3rd part (index 2)
    intensity_code = parts[3] # Intensity is 4th part (index 3)
    repetition_code = parts[4] # Repetition is 5th part (index 4)

    emotion = emotion_map.get(emotion_code, f'Unknown({emotion_code})')
    intensity = intensity_map.get(intensity_code, f'Unknown({intensity_code})')
    repetition = f'Repetition {repetition_code}'
    
    return f"Actor {actor_id:02d}, {emotion}, {intensity}, {repetition}"

### Showing Scores and Labels for Training and Test Sets

In [24]:
# --- Step 5: Calibration/Testing ---

print("\n--- Calibration and Testing for 1:N Identification ---")

# --- Individual Scores for ENROLLMENT Files (Training Set) ---
print("\n--- Scores for ENROLLMENT Files (Individual Voiceprints) ---")
print("{:<35} {:<10} {:<10}".format("File Details", "Actor ID", "Similarity Score"))
print("-" * 60)

for actor_id, data in tqdm(actor_data.items(), desc="Processing Enrollment Files"):
    for enroll_file_path in data['enrollment_files']:
        live_embedding = get_embedding(enroll_file_path)
        
        # Compare enrollment file embedding against its OWN enrolled voiceprint
        # (This is expected to be high, as it's part of the averaged embedding)
        score_to_self = cosine_similarity(live_embedding, enrolled_voiceprints[actor_id])
        
        file_details = parse_ravdess_filename(enroll_file_path)
        print(f"{file_details:<35} {actor_id:<10} {score_to_self:.4f}")

# --- Generate 1:N Test Scores ---
print("\n--- Generating Scores for 1:N Test Trials ---")
genuine_scores = []
imposter_scores = [] 

labels = [] # 1 for genuine (best match is true actor), 0 for imposter (best match is not true actor)
scores = [] # The highest similarity score for each trial


# Iterate through each actor's test files to simulate identification attempts
for true_actor_id, data in tqdm(actor_data.items(), desc="Generating 1:N Test Scores"):
    for test_file_path in data['test_files']:
        live_embedding = get_embedding(test_file_path)

        best_match_id = None
        highest_score_for_trial = -1.0 # Cosine similarity ranges from -1 to 1

        # Find the best match among ALL enrolled speakers for this test file
        for enrolled_actor_id, stored_embedding in enrolled_voiceprints.items():
            score = cosine_similarity(live_embedding, stored_embedding)
            if score > highest_score_for_trial:
                highest_score_for_trial = score
                best_match_id = enrolled_actor_id
        
        scores.append(highest_score_for_trial)
        
        file_details = parse_ravdess_filename(test_file_path)
        
        print_line = f"{file_details:<35} | True Speaker: {true_actor_id:<10} | " \
                     f"Best Match: {best_match_id:<10} | Score: {highest_score_for_trial:.4f}"
        
        if best_match_id == true_actor_id:
            labels.append(1) # Correctly identified as the true actor
            genuine_scores.append(highest_score_for_trial)
            print(f"{print_line} | Status: GENUINE")
        else:
            labels.append(0) # Incorrectly identified or an imposter scenario
            imposter_scores.append(highest_score_for_trial)
            print(f"{print_line} | Status: IMPOSTER/MISIDENTIFIED")

print(f"\nTotal Genuine Test Trials: {len(genuine_scores)}")
print(f"Total Imposter/Misidentified Test Trials: {len(imposter_scores)}")


--- Calibration and Testing for 1:N Identification ---

--- Scores for ENROLLMENT Files (Individual Voiceprints) ---
File Details                        Actor ID   Similarity Score
------------------------------------------------------------


Processing Enrollment Files:   0%|                                                               | 0/6 [00:00<?, ?it/s]

Actor 01, Calm, Normal, Repetition 01 Actor_01   0.8230
Actor 01, Calm, Strong, Repetition 01 Actor_01   0.7639
Actor 01, Happy, Normal, Repetition 01 Actor_01   0.7962
Actor 01, Happy, Strong, Repetition 01 Actor_01   0.7975
Actor 01, Angry, Normal, Repetition 01 Actor_01   0.7862


Processing Enrollment Files:  17%|█████████▏                                             | 1/6 [00:02<00:11,  2.35s/it]

Actor 01, Angry, Strong, Repetition 01 Actor_01   0.5373
Actor 02, Calm, Normal, Repetition 01 Actor_02   0.8253
Actor 02, Calm, Strong, Repetition 01 Actor_02   0.7205
Actor 02, Happy, Normal, Repetition 01 Actor_02   0.8404
Actor 02, Happy, Strong, Repetition 01 Actor_02   0.7677
Actor 02, Angry, Normal, Repetition 01 Actor_02   0.7722


Processing Enrollment Files:  33%|██████████████████▎                                    | 2/6 [00:04<00:09,  2.37s/it]

Actor 02, Angry, Strong, Repetition 01 Actor_02   0.5553
Actor 03, Calm, Normal, Repetition 01 Actor_03   0.8496
Actor 03, Calm, Strong, Repetition 01 Actor_03   0.8558
Actor 03, Happy, Normal, Repetition 01 Actor_03   0.8216
Actor 03, Happy, Strong, Repetition 01 Actor_03   0.7216
Actor 03, Angry, Normal, Repetition 01 Actor_03   0.6990


Processing Enrollment Files:  50%|███████████████████████████▌                           | 3/6 [00:07<00:07,  2.44s/it]

Actor 03, Angry, Strong, Repetition 01 Actor_03   0.6535
Actor 04, Calm, Normal, Repetition 01 Actor_04   0.8455
Actor 04, Calm, Strong, Repetition 01 Actor_04   0.7776
Actor 04, Happy, Normal, Repetition 01 Actor_04   0.8834
Actor 04, Happy, Strong, Repetition 01 Actor_04   0.8027
Actor 04, Angry, Normal, Repetition 01 Actor_04   0.8435


Processing Enrollment Files:  67%|████████████████████████████████████▋                  | 4/6 [00:09<00:04,  2.42s/it]

Actor 04, Angry, Strong, Repetition 01 Actor_04   0.6844
Actor 05, Calm, Normal, Repetition 01 Actor_05   0.7486
Actor 05, Calm, Strong, Repetition 01 Actor_05   0.6939
Actor 05, Happy, Normal, Repetition 01 Actor_05   0.7772
Actor 05, Happy, Strong, Repetition 01 Actor_05   0.7324
Actor 05, Angry, Normal, Repetition 01 Actor_05   0.7271


Processing Enrollment Files:  83%|█████████████████████████████████████████████▊         | 5/6 [00:12<00:02,  2.51s/it]

Actor 05, Angry, Strong, Repetition 01 Actor_05   0.6164
Actor 06, Calm, Normal, Repetition 01 Actor_06   0.7294
Actor 06, Calm, Strong, Repetition 01 Actor_06   0.7514
Actor 06, Happy, Normal, Repetition 01 Actor_06   0.6801
Actor 06, Happy, Strong, Repetition 01 Actor_06   0.5477
Actor 06, Angry, Normal, Repetition 01 Actor_06   0.7065


Processing Enrollment Files: 100%|███████████████████████████████████████████████████████| 6/6 [00:14<00:00,  2.47s/it]


Actor 06, Angry, Strong, Repetition 01 Actor_06   0.6546

--- Generating Scores for 1:N Test Trials ---


Generating 1:N Test Scores:   0%|                                                                | 0/6 [00:00<?, ?it/s]

Actor 01, Calm, Normal, Repetition 01 | True Speaker: Actor_01   | Best Match: Actor_01   | Score: 0.7423 | Status: GENUINE
Actor 01, Calm, Strong, Repetition 01 | True Speaker: Actor_01   | Best Match: Actor_01   | Score: 0.7867 | Status: GENUINE
Actor 01, Happy, Normal, Repetition 01 | True Speaker: Actor_01   | Best Match: Actor_01   | Score: 0.7854 | Status: GENUINE
Actor 01, Happy, Strong, Repetition 01 | True Speaker: Actor_01   | Best Match: Actor_01   | Score: 0.6680 | Status: GENUINE
Actor 01, Angry, Normal, Repetition 01 | True Speaker: Actor_01   | Best Match: Actor_01   | Score: 0.7347 | Status: GENUINE


Generating 1:N Test Scores:  17%|█████████▎                                              | 1/6 [00:02<00:12,  2.55s/it]

Actor 01, Angry, Strong, Repetition 01 | True Speaker: Actor_01   | Best Match: Actor_01   | Score: 0.5512 | Status: GENUINE
Actor 02, Calm, Normal, Repetition 01 | True Speaker: Actor_02   | Best Match: Actor_02   | Score: 0.6408 | Status: GENUINE
Actor 02, Calm, Strong, Repetition 01 | True Speaker: Actor_02   | Best Match: Actor_02   | Score: 0.6663 | Status: GENUINE
Actor 02, Happy, Normal, Repetition 01 | True Speaker: Actor_02   | Best Match: Actor_02   | Score: 0.7390 | Status: GENUINE
Actor 02, Happy, Strong, Repetition 01 | True Speaker: Actor_02   | Best Match: Actor_02   | Score: 0.7165 | Status: GENUINE
Actor 02, Angry, Normal, Repetition 01 | True Speaker: Actor_02   | Best Match: Actor_02   | Score: 0.7023 | Status: GENUINE


Generating 1:N Test Scores:  33%|██████████████████▋                                     | 2/6 [00:05<00:10,  2.53s/it]

Actor 02, Angry, Strong, Repetition 01 | True Speaker: Actor_02   | Best Match: Actor_02   | Score: 0.5924 | Status: GENUINE
Actor 03, Calm, Normal, Repetition 01 | True Speaker: Actor_03   | Best Match: Actor_03   | Score: 0.8258 | Status: GENUINE
Actor 03, Calm, Strong, Repetition 01 | True Speaker: Actor_03   | Best Match: Actor_03   | Score: 0.7887 | Status: GENUINE
Actor 03, Happy, Normal, Repetition 01 | True Speaker: Actor_03   | Best Match: Actor_03   | Score: 0.7558 | Status: GENUINE
Actor 03, Happy, Strong, Repetition 01 | True Speaker: Actor_03   | Best Match: Actor_03   | Score: 0.7060 | Status: GENUINE
Actor 03, Angry, Normal, Repetition 01 | True Speaker: Actor_03   | Best Match: Actor_03   | Score: 0.6774 | Status: GENUINE


Generating 1:N Test Scores:  50%|████████████████████████████                            | 3/6 [00:07<00:07,  2.62s/it]

Actor 03, Angry, Strong, Repetition 01 | True Speaker: Actor_03   | Best Match: Actor_03   | Score: 0.6122 | Status: GENUINE
Actor 04, Calm, Normal, Repetition 01 | True Speaker: Actor_04   | Best Match: Actor_04   | Score: 0.8695 | Status: GENUINE
Actor 04, Calm, Strong, Repetition 01 | True Speaker: Actor_04   | Best Match: Actor_04   | Score: 0.8403 | Status: GENUINE
Actor 04, Happy, Normal, Repetition 01 | True Speaker: Actor_04   | Best Match: Actor_04   | Score: 0.8045 | Status: GENUINE
Actor 04, Happy, Strong, Repetition 01 | True Speaker: Actor_04   | Best Match: Actor_04   | Score: 0.7520 | Status: GENUINE
Actor 04, Angry, Normal, Repetition 01 | True Speaker: Actor_04   | Best Match: Actor_04   | Score: 0.6777 | Status: GENUINE


Generating 1:N Test Scores:  67%|█████████████████████████████████████▎                  | 4/6 [00:10<00:05,  2.62s/it]

Actor 04, Angry, Strong, Repetition 01 | True Speaker: Actor_04   | Best Match: Actor_04   | Score: 0.6347 | Status: GENUINE
Actor 05, Calm, Normal, Repetition 01 | True Speaker: Actor_05   | Best Match: Actor_05   | Score: 0.6742 | Status: GENUINE
Actor 05, Calm, Strong, Repetition 01 | True Speaker: Actor_05   | Best Match: Actor_05   | Score: 0.6826 | Status: GENUINE
Actor 05, Happy, Normal, Repetition 01 | True Speaker: Actor_05   | Best Match: Actor_05   | Score: 0.6762 | Status: GENUINE
Actor 05, Happy, Strong, Repetition 01 | True Speaker: Actor_05   | Best Match: Actor_05   | Score: 0.7258 | Status: GENUINE
Actor 05, Angry, Normal, Repetition 01 | True Speaker: Actor_05   | Best Match: Actor_05   | Score: 0.6503 | Status: GENUINE


Generating 1:N Test Scores:  83%|██████████████████████████████████████████████▋         | 5/6 [00:13<00:02,  2.64s/it]

Actor 05, Angry, Strong, Repetition 01 | True Speaker: Actor_05   | Best Match: Actor_05   | Score: 0.5128 | Status: GENUINE
Actor 06, Calm, Normal, Repetition 01 | True Speaker: Actor_06   | Best Match: Actor_06   | Score: 0.6851 | Status: GENUINE
Actor 06, Calm, Strong, Repetition 01 | True Speaker: Actor_06   | Best Match: Actor_06   | Score: 0.6571 | Status: GENUINE
Actor 06, Happy, Normal, Repetition 01 | True Speaker: Actor_06   | Best Match: Actor_06   | Score: 0.7246 | Status: GENUINE
Actor 06, Happy, Strong, Repetition 01 | True Speaker: Actor_06   | Best Match: Actor_06   | Score: 0.6075 | Status: GENUINE
Actor 06, Angry, Normal, Repetition 01 | True Speaker: Actor_06   | Best Match: Actor_06   | Score: 0.6920 | Status: GENUINE


Generating 1:N Test Scores: 100%|████████████████████████████████████████████████████████| 6/6 [00:15<00:00,  2.64s/it]

Actor 06, Angry, Strong, Repetition 01 | True Speaker: Actor_06   | Best Match: Actor_06   | Score: 0.6207 | Status: GENUINE

Total Genuine Test Trials: 36
Total Imposter/Misidentified Test Trials: 0





### Testing the pipeline on the rest of the dataset for further accuracy assestment
#### Printing tests with a score higher than 0.4 for manual inspection

In [36]:
print("\n--- Testing on UNKNOWN Speakers (Actors 07-24) with Full RAVDESS Criteria ---")

# Define ALL unenrolled actors
ALL_UNENROLLED_ACTORS = [f"Actor_{i:02d}" for i in range(7, 25)]

identification_threshold = 0.5
print(f"Using identification threshold: {identification_threshold:.4f}")

UNKNOWN_TEST_STATEMENT_ID = '01'
UNKNOWN_TEST_EMOTION_IDS = ['01', '02', '03', '04', '05', '06', '07', '08'] # All 8 emotions
UNKNOWN_TEST_REPETITION_IDS = ['01', '02'] # Both repetitions

unknown_actor_data_full = {}
print("\nCollecting unknown speaker audio files based on new criteria...")
for actor_id_num in tqdm(range(7, 25), desc="Collecting Unknown Actor Files"):
    actor_folder = f"Actor_{actor_id_num:02d}"
    actor_full_path = os.path.join(RAVDESS_PATH, actor_folder)

    actor_files_test = [] 

    if not os.path.exists(actor_full_path):
        continue

    for filename in os.listdir(actor_full_path):
        if not filename.endswith('.wav'):
            continue
        
        parts = filename.split('-')
        if len(parts) < 7: 
            continue

        modality, vocal_channel, emotion_code, intensity_code, statement_code, repetition_code, actor_num_str = \
            parts[0], parts[1], parts[2], parts[3], parts[4], parts[5], parts[6].split('.')[0]
        
        is_speech = (modality == '03' and vocal_channel == '01') # Audio-only speech
        is_statement_1 = (statement_code == UNKNOWN_TEST_STATEMENT_ID)
        is_valid_emotion = (emotion_code in UNKNOWN_TEST_EMOTION_IDS)
        is_valid_repetition = (repetition_code in UNKNOWN_TEST_REPETITION_IDS)

        is_valid_intensity = False
        if emotion_code == '01': # Neutral emotion (01) must be normal intensity (01)
            if intensity_code == '01':
                is_valid_intensity = True
        else: 
            if intensity_code in ['01', '02']:
                is_valid_intensity = True

        if (is_speech and is_statement_1 and is_valid_emotion and 
            is_valid_intensity and is_valid_repetition):
            
            full_file_path = os.path.join(actor_full_path, filename)
            actor_files_test.append(full_file_path)

    unknown_actor_data_full[f"Actor_{actor_id_num:02d}"] = {'test_files': actor_files_test}


imposter_rejections_total = 0
false_acceptances_total = 0 
false_acceptance_details = [] 

print("\nProcessing all unknown speaker test files with new criteria...")

printed_header_for_details = False

for true_actor_id, data in tqdm(unknown_actor_data_full.items(), desc="Testing All Unknown Actors"):
    if not data['test_files']:
        continue

    for test_file_path in data['test_files']:
        identified_id, score = identify_speaker(test_file_path, enrolled_voiceprints, threshold=identification_threshold)
        
        file_details = parse_ravdess_filename(test_file_path)
        filename_only = os.path.basename(test_file_path)
        
        status_text = ""
        is_false_acceptance = False
        if identified_id is not None: # If it was identified as someone (not None)
            false_acceptances_total += 1
            false_acceptance_details.append({
                'file_details': file_details,
                'filename': filename_only,
                'true_actor': true_actor_id,
                'identified_as': identified_id,
                'score': score
            })
            is_false_acceptance = True
            status_text = "False Acceptance"
        else:
            imposter_rejections_total += 1
            status_text = "Correctly Rejected (Imposter)"
        
        # Print correctly rejected cases ONLY if score > 0.4 and all False Acceptances
        if is_false_acceptance or (not is_false_acceptance and score > 0.4):
            if not printed_header_for_details:

                print(f"{'File Details':<40} {'True':<8} {'Identified':<10} {'Score':<8} {'Status'}")
                print("-" * 80)
                printed_header_for_details = True
            
            print(f"{file_details:<40} {true_actor_id:<8} {str(identified_id):<10} {score:<8.4f} {status_text}")


print("\n--- Summary for All Unknown Speaker Tests (Actors 07-24) ---")
print(f"Total Unknown Speaker Trials: {imposter_rejections_total + false_acceptances_total}")
print(f"Correct Imposter Rejections: {imposter_rejections_total}")
print(f"False Acceptances (Unknown speaker identified as enrolled): {false_acceptances_total}")

if false_acceptances_total > 0:
    fa_scores = [d['score'] for d in false_acceptance_details]
    print("\nFalse Acceptance Score Statistics:")
    print(f"  Min FA Score: {np.min(fa_scores):.4f}")
    print(f"  Mean FA Score: {np.mean(fa_scores):.4f}")
    print(f"  Max FA Score: {np.max(fa_scores):.4f}")
    
    print("\nDetails of False Acceptances (All FAs listed regardless of score):")

    print(f"{'Filename':<15} {'File Details':<40} {'True':<8} {'Identified':<10} {'Score':<8}")
    print("-" * 95)
    for detail in false_acceptance_details:

        print(f"{detail['filename']:<15} {detail['file_details']:<40} {detail['true_actor']:<8} {detail['identified_as']:<10} {detail['score']:<8.4f}")
else:
    print("\nNo False Acceptances detected. Excellent imposter rejection for all tested unknown speakers!")


--- Testing on UNKNOWN Speakers (Actors 07-24) with Full RAVDESS Criteria ---
Using identification threshold: 0.5000

Collecting unknown speaker audio files based on new criteria...


Collecting Unknown Actor Files: 100%|████████████████████████████████████████████████| 18/18 [00:00<00:00, 2578.38it/s]



Processing all unknown speaker test files with new criteria...


Testing All Unknown Actors:   0%|                                                               | 0/18 [00:00<?, ?it/s]

File Details                             True     Identified Score    Status
--------------------------------------------------------------------------------
Actor 07, Sad, Strong, Repetition 01     Actor_07 None       0.4204   Correctly Rejected (Imposter)
Actor 07, Disgust, Normal, Repetition 01 Actor_07 None       0.4203   Correctly Rejected (Imposter)


Testing All Unknown Actors:   6%|███                                                    | 1/18 [00:10<02:59, 10.55s/it]

Actor 08, Neutral, Normal, Repetition 01 Actor_08 None       0.4310   Correctly Rejected (Imposter)
Actor 08, Neutral, Normal, Repetition 01 Actor_08 None       0.4571   Correctly Rejected (Imposter)
Actor 08, Calm, Normal, Repetition 01    Actor_08 None       0.4652   Correctly Rejected (Imposter)
Actor 08, Calm, Normal, Repetition 01    Actor_08 None       0.4965   Correctly Rejected (Imposter)
Actor 08, Calm, Strong, Repetition 01    Actor_08 None       0.4737   Correctly Rejected (Imposter)
Actor 08, Calm, Strong, Repetition 01    Actor_08 None       0.4464   Correctly Rejected (Imposter)
Actor 08, Happy, Normal, Repetition 01   Actor_08 None       0.4360   Correctly Rejected (Imposter)
Actor 08, Happy, Normal, Repetition 01   Actor_08 None       0.4951   Correctly Rejected (Imposter)
Actor 08, Happy, Strong, Repetition 01   Actor_08 None       0.4766   Correctly Rejected (Imposter)
Actor 08, Happy, Strong, Repetition 01   Actor_08 None       0.4707   Correctly Rejected (Imposter)


Testing All Unknown Actors:  11%|██████                                                 | 2/18 [00:22<02:57, 11.11s/it]

Actor 08, Surprised, Strong, Repetition 01 Actor_08 None       0.4139   Correctly Rejected (Imposter)


Testing All Unknown Actors:  22%|████████████▏                                          | 4/18 [00:48<02:55, 12.56s/it]

Actor 11, Neutral, Normal, Repetition 01 Actor_11 None       0.4320   Correctly Rejected (Imposter)
Actor 11, Neutral, Normal, Repetition 01 Actor_11 None       0.4652   Correctly Rejected (Imposter)
Actor 11, Calm, Normal, Repetition 01    Actor_11 None       0.4223   Correctly Rejected (Imposter)
Actor 11, Calm, Normal, Repetition 01    Actor_11 None       0.4670   Correctly Rejected (Imposter)
Actor 11, Calm, Strong, Repetition 01    Actor_11 None       0.4490   Correctly Rejected (Imposter)
Actor 11, Happy, Normal, Repetition 01   Actor_11 None       0.4868   Correctly Rejected (Imposter)
Actor 11, Happy, Strong, Repetition 01   Actor_11 None       0.4222   Correctly Rejected (Imposter)
Actor 11, Sad, Normal, Repetition 01     Actor_11 Actor_01   0.5737   False Acceptance
Actor 11, Sad, Normal, Repetition 01     Actor_11 Actor_01   0.5300   False Acceptance
Actor 11, Angry, Strong, Repetition 01   Actor_11 None       0.4261   Correctly Rejected (Imposter)
Actor 11, Angry, Strong, R

Testing All Unknown Actors:  33%|██████████████████▎                                    | 6/18 [01:15<02:37, 13.17s/it]

Actor 13, Angry, Strong, Repetition 01   Actor_13 None       0.4181   Correctly Rejected (Imposter)
Actor 13, Fearful, Strong, Repetition 01 Actor_13 Actor_03   0.5076   False Acceptance


Testing All Unknown Actors:  39%|█████████████████████▍                                 | 7/18 [01:27<02:19, 12.66s/it]

Actor 14, Neutral, Normal, Repetition 01 Actor_14 None       0.4719   Correctly Rejected (Imposter)
Actor 14, Neutral, Normal, Repetition 01 Actor_14 None       0.4100   Correctly Rejected (Imposter)
Actor 14, Calm, Strong, Repetition 01    Actor_14 None       0.4210   Correctly Rejected (Imposter)
Actor 14, Happy, Normal, Repetition 01   Actor_14 None       0.4309   Correctly Rejected (Imposter)
Actor 14, Angry, Normal, Repetition 01   Actor_14 None       0.4108   Correctly Rejected (Imposter)
Actor 14, Surprised, Normal, Repetition 01 Actor_14 Actor_04   0.5132   False Acceptance
Actor 14, Surprised, Normal, Repetition 01 Actor_14 None       0.4700   Correctly Rejected (Imposter)


Testing All Unknown Actors:  44%|████████████████████████▍                              | 8/18 [01:40<02:07, 12.78s/it]

Actor 14, Surprised, Strong, Repetition 01 Actor_14 None       0.4021   Correctly Rejected (Imposter)


Testing All Unknown Actors:  56%|██████████████████████████████                        | 10/18 [02:05<01:42, 12.80s/it]

Actor 17, Fearful, Strong, Repetition 01 Actor_17 None       0.4265   Correctly Rejected (Imposter)
Actor 17, Fearful, Strong, Repetition 01 Actor_17 None       0.4299   Correctly Rejected (Imposter)


Testing All Unknown Actors:  61%|█████████████████████████████████                     | 11/18 [02:18<01:29, 12.82s/it]

Actor 18, Calm, Normal, Repetition 01    Actor_18 None       0.4254   Correctly Rejected (Imposter)
Actor 18, Sad, Strong, Repetition 01     Actor_18 None       0.4011   Correctly Rejected (Imposter)
Actor 18, Angry, Normal, Repetition 01   Actor_18 None       0.4103   Correctly Rejected (Imposter)
Actor 18, Angry, Normal, Repetition 01   Actor_18 None       0.4317   Correctly Rejected (Imposter)
Actor 18, Disgust, Strong, Repetition 01 Actor_18 None       0.4722   Correctly Rejected (Imposter)
Actor 18, Surprised, Strong, Repetition 01 Actor_18 None       0.4274   Correctly Rejected (Imposter)


Testing All Unknown Actors:  67%|████████████████████████████████████                  | 12/18 [02:31<01:16, 12.73s/it]

Actor 18, Surprised, Strong, Repetition 01 Actor_18 None       0.4750   Correctly Rejected (Imposter)
Actor 19, Neutral, Normal, Repetition 01 Actor_19 Actor_01   0.5439   False Acceptance
Actor 19, Neutral, Normal, Repetition 01 Actor_19 None       0.4637   Correctly Rejected (Imposter)
Actor 19, Calm, Normal, Repetition 01    Actor_19 None       0.4068   Correctly Rejected (Imposter)
Actor 19, Calm, Normal, Repetition 01    Actor_19 None       0.4840   Correctly Rejected (Imposter)
Actor 19, Calm, Strong, Repetition 01    Actor_19 None       0.4028   Correctly Rejected (Imposter)
Actor 19, Happy, Normal, Repetition 01   Actor_19 None       0.4353   Correctly Rejected (Imposter)
Actor 19, Happy, Strong, Repetition 01   Actor_19 None       0.4473   Correctly Rejected (Imposter)
Actor 19, Surprised, Normal, Repetition 01 Actor_19 None       0.4380   Correctly Rejected (Imposter)
Actor 19, Surprised, Strong, Repetition 01 Actor_19 None       0.4834   Correctly Rejected (Imposter)


Testing All Unknown Actors:  72%|███████████████████████████████████████               | 13/18 [02:44<01:05, 13.00s/it]

Actor 20, Neutral, Normal, Repetition 01 Actor_20 None       0.4035   Correctly Rejected (Imposter)
Actor 20, Happy, Normal, Repetition 01   Actor_20 None       0.4185   Correctly Rejected (Imposter)
Actor 20, Sad, Normal, Repetition 01     Actor_20 None       0.4028   Correctly Rejected (Imposter)
Actor 20, Sad, Normal, Repetition 01     Actor_20 None       0.4110   Correctly Rejected (Imposter)
Actor 20, Fearful, Normal, Repetition 01 Actor_20 None       0.4074   Correctly Rejected (Imposter)
Actor 20, Disgust, Strong, Repetition 01 Actor_20 None       0.4182   Correctly Rejected (Imposter)


Testing All Unknown Actors:  78%|██████████████████████████████████████████            | 14/18 [02:57<00:51, 12.78s/it]

Actor 21, Sad, Normal, Repetition 01     Actor_21 None       0.4341   Correctly Rejected (Imposter)
Actor 21, Sad, Normal, Repetition 01     Actor_21 None       0.4635   Correctly Rejected (Imposter)
Actor 21, Surprised, Normal, Repetition 01 Actor_21 None       0.4222   Correctly Rejected (Imposter)
Actor 21, Surprised, Normal, Repetition 01 Actor_21 None       0.4013   Correctly Rejected (Imposter)
Actor 21, Surprised, Strong, Repetition 01 Actor_21 None       0.4448   Correctly Rejected (Imposter)


Testing All Unknown Actors:  83%|█████████████████████████████████████████████         | 15/18 [03:09<00:38, 12.74s/it]

Actor 21, Surprised, Strong, Repetition 01 Actor_21 None       0.4106   Correctly Rejected (Imposter)


Testing All Unknown Actors:  89%|████████████████████████████████████████████████      | 16/18 [03:21<00:24, 12.47s/it]

Actor 23, Neutral, Normal, Repetition 01 Actor_23 Actor_01   0.5640   False Acceptance
Actor 23, Neutral, Normal, Repetition 01 Actor_23 Actor_01   0.5306   False Acceptance
Actor 23, Calm, Normal, Repetition 01    Actor_23 Actor_01   0.5007   False Acceptance
Actor 23, Calm, Normal, Repetition 01    Actor_23 Actor_01   0.5955   False Acceptance
Actor 23, Calm, Strong, Repetition 01    Actor_23 Actor_01   0.5213   False Acceptance
Actor 23, Calm, Strong, Repetition 01    Actor_23 Actor_01   0.5434   False Acceptance
Actor 23, Happy, Normal, Repetition 01   Actor_23 Actor_01   0.6119   False Acceptance
Actor 23, Happy, Normal, Repetition 01   Actor_23 Actor_01   0.6359   False Acceptance
Actor 23, Happy, Strong, Repetition 01   Actor_23 Actor_01   0.5205   False Acceptance
Actor 23, Happy, Strong, Repetition 01   Actor_23 Actor_01   0.5779   False Acceptance
Actor 23, Sad, Normal, Repetition 01     Actor_23 Actor_01   0.5101   False Acceptance
Actor 23, Sad, Normal, Repetition 01     Ac

Testing All Unknown Actors:  94%|███████████████████████████████████████████████████   | 17/18 [03:33<00:12, 12.24s/it]

Actor 23, Surprised, Strong, Repetition 01 Actor_23 Actor_01   0.5670   False Acceptance
Actor 24, Fearful, Strong, Repetition 01 Actor_24 None       0.4751   Correctly Rejected (Imposter)


Testing All Unknown Actors: 100%|██████████████████████████████████████████████████████| 18/18 [03:46<00:00, 12.56s/it]


--- Summary for All Unknown Speaker Tests (Actors 07-24) ---
Total Unknown Speaker Trials: 540
Correct Imposter Rejections: 517
False Acceptances (Unknown speaker identified as enrolled): 23

False Acceptance Score Statistics:
  Min FA Score: 0.5005
  Mean FA Score: 0.5457
  Max FA Score: 0.6359

Details of False Acceptances (All FAs listed regardless of score):
Filename        File Details                             True     Identified Score   
-----------------------------------------------------------------------------------------------
03-01-04-01-01-01-11.wav Actor 11, Sad, Normal, Repetition 01     Actor_11 Actor_01   0.5737  
03-01-04-01-01-02-11.wav Actor 11, Sad, Normal, Repetition 01     Actor_11 Actor_01   0.5300  
03-01-06-02-01-02-13.wav Actor 13, Fearful, Strong, Repetition 01 Actor_13 Actor_03   0.5076  
03-01-08-01-01-01-14.wav Actor 14, Surprised, Normal, Repetition 01 Actor_14 Actor_04   0.5132  
03-01-01-01-01-01-19.wav Actor 19, Neutral, Normal, Repetition 01 Acto




* The model seems to confuse Actor 1 and Actor 23. It confuses other actors as well, but with a lower margin.
* The model has 95.7% accuracy on unseen actors.
* The detection threshold can be adjusted to prioritize eliminating False Acceptances.