
Tensorflow-2 Implementation of the paper Siamese Capsule Network for End-to-End Speaker Recognition in the Wild

ND15/SiameseSpeech

Siamese Capsule Network for End-to-End Speaker Recognition in the Wild

This repository implements the paper Siamese Capsule Network for End-to-End Speaker Recognition in the Wild (Paper link). It currently contains only the front-end part of the model; the back-end part still needs to be implemented. I have changed some of the parameters from the paper, including the window length and hop size.

Comments

One drawback of the siamese network is that it trains on pairs of samples: for a dataset with N samples, the dataset preprocessor builds N x N pairs, which requires more computational power and more training time. A bigger window length also increases the dimensions of the spectrograms, so the preprocessed data takes a huge amount of space on disk.
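To make the growth concrete, here is a minimal sketch in plain Python. It is not the preprocessor from this repository: `make_pairs`, `spectrogram_shape`, and the file names are hypothetical, and the shape formula assumes an unpadded STFT whose FFT size equals the window length.

```python
from itertools import product


def make_pairs(samples):
    """Build every ordered pair of clips with a same-speaker label.

    `samples` is a list of (speaker_id, clip) tuples. This is a
    hypothetical helper for illustration, not the repository's code.
    """
    pairs = []
    for (spk_a, clip_a), (spk_b, clip_b) in product(samples, repeat=2):
        # Label 1 when both clips come from the same speaker, else 0.
        pairs.append((clip_a, clip_b, int(spk_a == spk_b)))
    return pairs


def spectrogram_shape(num_samples, window_length, hop_size):
    """Return (frames, frequency_bins) for an unpadded STFT.

    Assumption: the FFT size equals the window length, so the number
    of frequency bins is window_length // 2 + 1.
    """
    frames = 1 + (num_samples - window_length) // hop_size
    bins = window_length // 2 + 1
    return frames, bins


# Three clips become 3 x 3 = 9 pairs.
samples = [("spk1", "a.wav"), ("spk1", "b.wav"), ("spk2", "c.wav")]
pairs = make_pairs(samples)
print(len(pairs))

# One second of 16 kHz audio with a 400-sample and an 800-sample window.
small = spectrogram_shape(16000, window_length=400, hop_size=160)
large = spectrogram_shape(16000, window_length=800, hop_size=160)
print(small, large)
```

Under these assumptions, doubling the window length roughly doubles the number of frequency bins per spectrogram, and that cost is paid again for every stored pair.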

In my implementation I used a customised version of the VoxCeleb dataset. It contains only recordings of Indian celebrities, and for ease of implementation I took only 25-30 recordings per speaker.

Updates

Links
