# Speaker Verification using PyAnnote

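This notebook checks whether two audio files were spoken by the same person, using a speaker embedding model from `pyannote.audio`.

It loads the pre-trained `pyannote/embedding` model from Hugging Face, extracts one embedding vector per file, and compares the two vectors with cosine similarity: if the similarity reaches a threshold, both files are attributed to the same speaker.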
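As a minimal illustration of the comparison step, the sketch below computes cosine similarity in plain Python: the dot product of two vectors divided by the product of their lengths. The toy 3-dimensional vectors are made up (real speaker embeddings have hundreds of dimensions), and `cosine_similarity` is a hypothetical helper. Note that SciPy's `cosine` returns a cosine *distance*, which is why the notebook computes `1 - cosine(emb1, emb2)`.

```python
import math


def cosine_similarity(a, b):
    """Cosine similarity between two equal-length vectors: dot(a, b) / (|a| * |b|)."""
    if len(a) != len(b):
        raise ValueError("vectors must have the same length")
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(y * y for y in b))
    return dot / (norm_a * norm_b)


# Hypothetical toy "embeddings"
emb_a = [1.0, 2.0, 2.0]
emb_b = [2.0, 4.0, 4.0]   # same direction as emb_a, twice as long
emb_c = [2.0, -1.0, 0.0]  # orthogonal to emb_a

print(cosine_similarity(emb_a, emb_b))  # 1.0
print(cosine_similarity(emb_a, emb_c))  # 0.0
```

Because only the angle between the vectors matters, scaling an embedding does not change the score. With the notebook's default `threshold=0.75`, the first pair would be accepted as the same speaker and the second would not.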


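```python
from pyannote.audio import Audio
from pyannote.audio.pipelines.speaker_verification import PretrainedSpeakerEmbedding
from scipy.spatial.distance import cosine
import torch

# Load the speaker embedding model from Hugging Face.
# "pyannote/embedding" is a gated model: accept its user conditions on Hugging Face
# and log in (or pass use_auth_token=...) before running this cell.
model = PretrainedSpeakerEmbedding("pyannote/embedding", device=torch.device("cpu"))

# Load audio at 16 kHz and downmix multi-channel files to mono,
# since the embedding model only accepts single-channel input.
audio = Audio(sample_rate=16000, mono="downmix")
```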


Loading the checkpoint prints the following warnings, mainly because the model was trained with much older versions of pyannote.audio (0.0.1) and torch (1.8.1) than the ones installed:

```text
c:\Users\Aadil\anaconda3\envs\voice-auth-project-env\lib\site-packages\pytorch_lightning\utilities\migration\migration.py:208: You have multiple `ModelCheckpoint` callback states in this checkpoint, but we found state keys that would end up colliding with each other after an upgrade, which means we can't differentiate which of your checkpoint callbacks needs which states. At least one of your `ModelCheckpoint` callbacks will not be able to reload the state.
Lightning automatically upgraded your loaded checkpoint from v1.2.7 to v2.5.1. To apply the upgrade to your files permanently, run `python -m pytorch_lightning.utilities.upgrade_checkpoint C:\Users\Aadil\.cache\torch\pyannote\models--pyannote--embedding\snapshots\4db4899737a38b2d618bbd74350915aa10293cb2\pytorch_model.bin`

Model was trained with pyannote.audio 0.0.1, yours is 3.3.2. Bad things might happen unless you revert pyannote.audio to 0.x.
Model was trained with torch 1.8.1+cu102, yours is 2.5.1+cu121. Bad things might happen unless you revert torch to 1.x.

c:\Users\Aadil\anaconda3\envs\voice-auth-project-env\lib\site-packages\pytorch_lightning\core\saving.py:195: Found keys that are not in the model state dict but in the checkpoint: ['loss_func.W']
```


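```python
def get_embedding(file_path):
    """Return a 1-D speaker embedding (NumPy array) for a whole audio file."""
    # waveform has shape (num_channels, num_samples)
    waveform, sample_rate = audio(file_path)
    # PretrainedSpeakerEmbedding expects a batch of shape
    # (batch_size, num_channels, num_samples), so add a leading batch dimension.
    # It returns an array of shape (batch_size, dimension).
    with torch.inference_mode():
        embedding = model(waveform[None])
    return embedding.squeeze()


def compare_speakers(file1, file2, threshold=0.75):
    """Return (cosine similarity, same-speaker decision) for two audio files."""
    emb1 = get_embedding(file1)
    emb2 = get_embedding(file2)
    # scipy's cosine() returns a distance, so convert it back to a similarity
    similarity = 1 - cosine(emb1, emb2)
    # 0.75 is a fixed cut-off; tune it on labeled data for a real application
    same_speaker = similarity >= threshold
    return similarity, same_speaker
```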
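The default `threshold=0.75` is a fixed cut-off rather than a value derived from data. One common way to choose it is to sweep candidate thresholds over labeled trial pairs and pick the one where the false acceptance rate (different speakers accepted) and the false rejection rate (same speaker rejected) are closest, an approximation of the equal error rate (EER) operating point. The sketch below assumes you already have similarity scores and same/different labels for a set of trials; the scores are invented, and `error_rates` and `pick_eer_threshold` are hypothetical helpers, not part of `pyannote.audio`.

```python
def error_rates(scores, labels, threshold):
    """Return (false acceptance rate, false rejection rate) at a threshold.

    scores: similarity per trial (higher means more likely the same speaker)
    labels: True for same-speaker trials, False for different-speaker trials
    A trial is accepted when score >= threshold, matching compare_speakers.
    """
    impostor = [s for s, same in zip(scores, labels) if not same]
    genuine = [s for s, same in zip(scores, labels) if same]
    if not impostor or not genuine:
        raise ValueError("need both same-speaker and different-speaker trials")
    far = sum(s >= threshold for s in impostor) / len(impostor)
    frr = sum(s < threshold for s in genuine) / len(genuine)
    return far, frr


def pick_eer_threshold(scores, labels):
    """Pick the observed score where |FAR - FRR| is smallest (approximate EER)."""
    best_threshold, best_gap, best_eer = None, float("inf"), None
    for t in sorted(set(scores)):
        far, frr = error_rates(scores, labels, t)
        gap = abs(far - frr)
        if gap < best_gap:
            best_threshold, best_gap, best_eer = t, gap, (far + frr) / 2
    return best_threshold, best_eer


# Hypothetical similarity scores for eight labeled trial pairs
scores = [0.82, 0.71, 0.55, 0.90, 0.40, 0.15, 0.60, 0.30]
labels = [True, True, True, True, False, False, False, False]

threshold, eer = pick_eer_threshold(scores, labels)
print(f"threshold={threshold:.2f}, approx. EER={eer:.2%}")
# threshold=0.60, approx. EER=25.00%
```

A handful of trials gives a noisy estimate; in practice the threshold would be tuned on many same-speaker and different-speaker pairs from the conditions the system will actually face.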



```python
# Two VoxCeleb1 utterances from different speaker IDs (id10001 and id11250)
file1 = "datasets/vox1_dev/id10001/1zcIwhmdeo4/00001.wav"
file2 = "datasets/vox1_dev/id11250/1BmQvhvvrhY/00001.wav"

similarity, same = compare_speakers(file1, file2)
print(f"Similarity: {similarity:.4f}")
print("Same speaker" if same else "Different speakers")
```



```text
Similarity: 0.1032
Different speakers
```
