This repository implements the paper "Siamese Capsule Network
for End-to-End Speaker Recognition in the Wild" (Paper link).
It contains only the front-end part of the model;
the back-end part still needs to be implemented. I have changed some of the parameters
from the paper, including the window length and hop size, among others.
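
To make the effect of these two parameters concrete, here is a minimal sketch of how window length and hop size determine the shape of a spectrogram. It assumes a one-sided STFT whose FFT size equals the window length and no padding of the signal. The function name `spectrogram_shape` and the example values (a 3-second clip at 16 kHz, 400- and 800-sample windows, a 160-sample hop) are illustrative assumptions, not the settings used in this repository or in the paper.

```python
def spectrogram_shape(num_samples, win_length, hop_length):
    """Return (frequency bins, time frames) for an unpadded, one-sided STFT.

    Assumes the FFT size equals the window length (an assumption of this sketch).
    """
    if num_samples < win_length:
        raise ValueError("signal is shorter than one window")
    n_freq_bins = win_length // 2 + 1                      # one-sided spectrum
    n_frames = 1 + (num_samples - win_length) // hop_length
    return n_freq_bins, n_frames


sample_rate = 16000            # assumed sample rate
clip = 3 * sample_rate         # a hypothetical 3-second recording

short_window = spectrogram_shape(clip, win_length=400, hop_length=160)
long_window = spectrogram_shape(clip, win_length=800, hop_length=160)

print(short_window)  # (201, 298)
print(long_window)   # (401, 296)
```

Under these assumptions, doubling the window length roughly doubles the number of values stored per spectrogram, because the number of frequency bins grows while the number of frames barely changes.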
One drawback of the siamese network is that, for a dataset with N samples,
the dataset preprocessor expands the dataset to N x N pairs, which
requires more computational power and more training time. This also makes spectrogram
size matter: with a larger window length, the dimensions of the spectrograms increase
and they take up a large amount of space on disk.
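
The N x N growth can be sketched as follows. This is an illustrative pairing routine, not the repository's actual preprocessor: `make_pairs` and the toy file names are hypothetical, and the sketch assumes that every ordered pair is kept (including a sample paired with itself), labeled 1 when both samples come from the same speaker and 0 otherwise.

```python
from itertools import product


def make_pairs(samples):
    """Build all ordered pairs from a list of (speaker_id, utterance) tuples.

    Returns a list of (utterance_a, utterance_b, label) tuples, where label
    is 1 for a same-speaker pair and 0 for a different-speaker pair.
    """
    pairs = []
    for (spk_a, utt_a), (spk_b, utt_b) in product(samples, repeat=2):
        pairs.append((utt_a, utt_b, int(spk_a == spk_b)))
    return pairs


# Toy data: two speakers, three recordings (hypothetical names).
samples = [("spk1", "a.wav"), ("spk1", "b.wav"), ("spk2", "c.wav")]
pairs = make_pairs(samples)

print(len(samples), len(pairs))  # 3 9
```

With N = 3 the routine produces 9 pairs; with a few thousand recordings the pair count reaches the millions, which is where the extra compute and training time come from.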
In my implementation I used a customised version of the VoxCeleb dataset. It contains only recordings of Indian celebrities, and for ease of implementation I took only 25-30 recordings per speaker.
- Speaker-Verification: a repo for DNN-based speaker recognition and verification.