# Speaker Diarization
In this project we perform speaker diarization, that is, determining who spoke when in an audio recording. Specifically, we build custom pipelines to answer the following questions:

- Which clustering algorithms are best for speaker diarization?

- Do clustering algorithms applied to Deep Neural Network embeddings outperform traditional clustering algorithms?

- Can end-to-end Deep Neural Network models outperform traditional clustering algorithms?

The ground truth is provided as RTTM files, which have the following format:
```
SPEAKER <NA> 1 0.00 0.39 <NA> <NA> spk_0 <NA>
SPEAKER <NA> 1 0.39 0.01 <NA> <NA> spk_1 <NA>
```
Each line describes one speaker turn. The third field is the channel, the fourth field is the start time in seconds, the fifth field is the duration in seconds, and the eighth field is the speaker ID.
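To make the field layout concrete, here is a minimal sketch of an RTTM reader. It assumes whitespace-separated fields with the start time, duration, and speaker label in the fourth, fifth, and eighth positions, as in the sample above. The names `Turn` and `parse_rttm` are hypothetical helpers introduced for illustration and are not part of the project code.

```python
from typing import NamedTuple


class Turn(NamedTuple):
    start: float     # turn onset, in seconds
    duration: float  # turn duration, in seconds
    speaker: str     # speaker label, e.g. "spk_0"


def parse_rttm(text: str) -> list[Turn]:
    """Parse the SPEAKER lines of an RTTM string into a list of turns."""
    turns = []
    for line in text.splitlines():
        fields = line.split()
        # Skip blank lines and anything that is not a SPEAKER record.
        if len(fields) < 8 or fields[0] != "SPEAKER":
            continue
        turns.append(Turn(float(fields[3]), float(fields[4]), fields[7]))
    return turns


sample = """
SPEAKER <NA> 1 0.00 0.39 <NA> <NA> spk_0 <NA>
SPEAKER <NA> 1 0.39 0.01 <NA> <NA> spk_1 <NA>
"""
turns = parse_rttm(sample)
for turn in turns:
    print(turn.speaker, turn.start, turn.start + turn.duration)
```

The end time of a turn is not stored in the file; it is obtained as start plus duration, as in the final loop.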



    

### All imports 

In [6]:
# Typical imports
from tqdm import tqdm
import numpy as np
import torch
import torch.nn as nn
import matplotlib.pyplot as plt
import os
from dotenv import load_dotenv
import speechbrain as sb

# Scipy
from scipy.spatial.distance import cdist


# All Pyannote imports
from pyannote.audio import Audio, Inference, Model, Pipeline
from pyannote.core import Annotation, Segment

# Load environment variables from auths.env
load_dotenv("auths.env")
# API_KEY holds the access token passed to Pipeline.from_pretrained below
api_key = os.getenv("API_KEY")

### Loading the train data and its ground truth
- The code cell in this section is a playground that loads the train data (here, the `Dev` split) and its ground truth for a single file. Later this will be done for all the files.
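For the later step of processing every file, one possible approach is to match each audio file to its label file by name. The sketch below assumes that every `.wav` file shares its stem with an `.rttm` file, as `ahnss.wav` and `ahnss.rttm` do in this project. The helper `pair_audio_with_rttm` and the demo file names are hypothetical and only illustrate the idea.

```python
import tempfile
from pathlib import Path


def pair_audio_with_rttm(audio_dir, rttm_dir):
    """Return (wav_path, rttm_path) pairs matched by file stem."""
    rttm_dir = Path(rttm_dir)
    pairs = []
    for wav_path in sorted(Path(audio_dir).glob("*.wav")):
        rttm_path = rttm_dir / f"{wav_path.stem}.rttm"
        # Keep only recordings that have a ground-truth file.
        if rttm_path.exists():
            pairs.append((wav_path, rttm_path))
    return pairs


# Demo on a throwaway directory tree (file names are made up).
with tempfile.TemporaryDirectory() as root:
    audio_dir, rttm_dir = Path(root, "Audio"), Path(root, "RTTMs")
    audio_dir.mkdir()
    rttm_dir.mkdir()
    for name in ("a.wav", "b.wav"):
        (audio_dir / name).touch()
    (rttm_dir / "a.rttm").touch()  # "b" has no label file and is skipped
    demo_pairs = [
        (wav.name, rttm.name)
        for wav, rttm in pair_audio_with_rttm(audio_dir, rttm_dir)
    ]

print(demo_pairs)
```

In this project the same function would be called with the `../Dataset/Audio/Dev` and `../Dataset/RTTMs/Dev` directories.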

In [2]:
train_data_path = "../Dataset/Audio/Dev"
train_label_path = "../Dataset/RTTMs/Dev"

# Experimental data --> just one audio file and its corresponding label
dummy_train_data_path = "../Dataset/Audio/Dev/ahnss.wav"
dummy_train_label_path = "../Dataset/RTTMs/Dev/ahnss.rttm"


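# Run on the GPU when available, otherwise fall back to the CPU
device = torch.device("cuda" if torch.cuda.is_available() else "cpu")

# Load the pretrained voice activity detection (VAD) pipeline
pipeline = Pipeline.from_pretrained(
    "pyannote/voice-activity-detection", use_auth_token=api_key
)
pipeline.to(device)

# 1. Voice activity detection: regions that contain speech
# (despite its name, vad_pipeline holds the pipeline output, an Annotation)
vad_pipeline = pipeline(dummy_train_data_path)
vad_timeline = vad_pipeline.get_timeline().support()

# 2. Overlapped speech detection: regions where speakers talk simultaneously
osd_pipeline = Pipeline.from_pretrained(
    "pyannote/overlapped-speech-detection", use_auth_token=api_key
)
osd_pipeline.to(device)
output = osd_pipeline(dummy_train_data_path)
osd_timeline = output.get_timeline().support()

# Combine the two timelines and merge any overlapping segments
combined_timeline = vad_timeline.union(osd_timeline).support()
combined_annotation = Annotation()
for segment in combined_timeline:
    combined_annotation[segment] = "speech"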



Lightning automatically upgraded your loaded checkpoint from v1.1.3 to v2.2.5. To apply the upgrade to your files permanently, run `python -m pytorch_lightning.utilities.upgrade_checkpoint C:\Users\rakin\.cache\torch\pyannote\models--pyannote--segmentation\snapshots\059e96f964841d40f1a5e755bb7223f76666bba4\pytorch_model.bin`


Model was trained with pyannote.audio 0.0.1, yours is 3.2.0. Bad things might happen unless you revert pyannote.audio to 0.x.
Model was trained with torch 1.7.1, yours is 2.3.0+cu121. Bad things might happen unless you revert torch to 1.x.


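The step that combines the VAD and overlapped-speech timelines amounts to taking the union of two sets of time intervals. The sketch below shows that idea in plain Python, without pyannote: it merges overlapping or touching `(start, end)` intervals, which is roughly the effect of calling `union` followed by `support()` on pyannote timelines. The function `merge_intervals` and the example intervals are made up for illustration.

```python
def merge_intervals(intervals):
    """Merge overlapping or touching (start, end) intervals."""
    merged = []
    for start, end in sorted(intervals):
        if merged and start <= merged[-1][1]:
            # Overlaps (or touches) the previous interval: extend it.
            merged[-1] = (merged[-1][0], max(merged[-1][1], end))
        else:
            merged.append((start, end))
    return merged


vad = [(0.0, 2.5), (4.0, 6.0)]  # hypothetical speech regions, in seconds
osd = [(1.0, 1.5), (5.5, 6.5)]  # hypothetical overlapped-speech regions
combined = merge_intervals(vad + osd)
print(combined)  # [(0.0, 2.5), (4.0, 6.5)]
```

Because overlapped speech is itself speech, most overlapped-speech regions already fall inside a VAD region, as `(1.0, 1.5)` does here. The union only changes the result where the overlap detector extends past the VAD boundaries, as `(5.5, 6.5)` does.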
