.wav inputs specifics #49

Closed
loregagliard opened this issue Dec 17, 2018 · 6 comments

Comments

@loregagliard

Hi guys,
I have a question regarding the input wav files used for training. What are the audio format specifications?
I used VoxCeleb (http://www.robots.ox.ac.uk/~vgg/data/voxceleb/) as the dataset, but it is giving me some trouble.
Do you know of any other usable dataset?

Thank you ;)

@imranparuk

VoxCeleb is a good one. Could you be more specific about the issues you are having?
From what I know, you need mono wav files, for one. The input shape needs to be two-dimensional.
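A minimal sketch of the mono requirement mentioned above: downmix a stereo signal (shape `(samples, channels)`) to mono before feature extraction. The function name and the file path are illustrative assumptions, not part of this project.

```python
import numpy as np

def to_mono(signal: np.ndarray) -> np.ndarray:
    """Downmix a (samples, channels) array to mono; pass a mono signal through unchanged."""
    if signal.ndim == 2:
        return signal.mean(axis=1).astype(signal.dtype)
    return signal

# Typical usage (hypothetical file), e.g. with scipy:
#   from scipy.io import wavfile
#   rate, data = wavfile.read("sample.wav")
#   mono = to_mono(data)
```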

@loregagliard
Author

loregagliard commented Dec 20, 2018

Does it work for you?
I am checking the files, and some audios are noisy even though the interviewee talks most of the time (e.g. the interviewer talking, a guitar playing, etc.).
I made it run with a batch size of 3 and it gives me a train accuracy of either 100 or 0, with no middle values.
With larger batches I don't have much better luck.
I know the dataset needs Voice Activity Detection to remove silence in order to be effective, so maybe that is the problem. What algorithm did you use?
I'd also like to know whether there are constraints on the 'quality' of the audio files.

@MSAlghamdi

@loregagliard

I made it run with a batch size of 3 and it gives me a train accuracy of either 100 or 0, with no middle values.

I have the same issue, due to the feature-map values. Did you use the input_feature.py published with the project as-is? If you did, then the problem is in the input features coming out of input_feature.py. I think (correct me if I'm wrong) that's because it uses the log-energy, which most likely produces negative values.

Please let me know once you solve this issue, since I'm stuck on it.
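A quick numeric check of the log-energy point above: frame energies below 1.0 yield negative log-energies, which can be problematic if the features are fed to the network unnormalized. The frame length and amplitude here are illustrative assumptions.

```python
import numpy as np

rng = np.random.default_rng(0)
frame = 0.01 * rng.standard_normal(400)   # a quiet 25 ms frame at 16 kHz (assumed)
energy = np.sum(frame ** 2)               # frame energy is well below 1.0
log_energy = np.log(energy + 1e-12)       # small epsilon guards against log(0)
print(log_energy < 0)                     # negative for low-energy frames
```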

@imranparuk

Hi guys, if you read the paper, the author does perform VAD; however, he stated it was done in MATLAB, if I'm not mistaken. You will be able to find some VAD solutions in Python, but they do not produce good results. My advice is not to worry about the VAD; the models will work without it. Please try out the Keras implementation here -> https://github.com/imranparuk/speaker-recognition-3d-cnn and see if that works for you. It's a work in progress; if it works, it will help you understand what is being accomplished in this repository.
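For anyone who still wants to try VAD in Python, here is a minimal energy-threshold sketch (this is NOT the author's MATLAB VAD; the frame length and threshold ratio are illustrative assumptions):

```python
import numpy as np

def energy_vad(signal: np.ndarray, frame_len: int = 400,
               threshold_ratio: float = 0.1) -> np.ndarray:
    """Drop frames whose energy falls below a fraction of the mean frame energy."""
    n_frames = len(signal) // frame_len
    frames = signal[: n_frames * frame_len].reshape(n_frames, frame_len)
    energies = (frames ** 2).sum(axis=1)
    keep = energies > threshold_ratio * energies.mean()
    return frames[keep].reshape(-1)
```

A fixed relative threshold like this is crude; dedicated libraries (e.g. WebRTC's VAD) are usually more robust, but as noted above, the models should work without any VAD.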

@loregagliard
Author

loregagliard commented Jan 7, 2019

@MSAlghamdi

Did you use input_feature.py published with the project as it is?

Yes, the input_feature.py file is the same as in the project.
I just added code to generate the hdf5 files, 'development_sample_dataset_speaker.hdf5' and 'enrollment-evaluation_sample_dataset.hdf5'. I took that code from other issue discussions here.
Just to be completely clear: VoxCeleb appears to me as a directory containing one sub-directory per identity.
Each of those speaker directories contains sub-directories that hold the wav files (finally!).
So I generated the dataset by copying the audios and prepending the names of the speaker directory and sub-directory to each file name.
The speaker labels were generated by applying the ASCII table to the names of the speaker directories and then reindexing them to 0, 1, 2, 3, ...
The audios range in duration from a few seconds to several minutes.
Should I perhaps merge the audios of a given speaker?
Anyway, I chose the even-indexed files as my training set and the odd-indexed ones as my test set, so that each speaker has a sufficient number of audios.
Is there a way to feed just one feature to the network and see what the output is?

Thank you! (and happy new year!!)
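The labeling and even/odd split described above could be sketched like this (a hedged reconstruction of the scheme in the comment, not code from the project; the function name and directory layout are assumptions based on the VoxCeleb structure described):

```python
import os

def build_dataset(root: str):
    """Map each speaker directory to a label 0, 1, 2, ... and split its wavs
    into even-indexed (train) and odd-indexed (test) lists of (path, label)."""
    speakers = sorted(os.listdir(root))                    # one dir per identity
    label_of = {spk: i for i, spk in enumerate(speakers)}  # reindexed to 0,1,2,...
    train, test = [], []
    for spk in speakers:
        wavs = sorted(
            os.path.join(dirpath, f)
            for dirpath, _, files in os.walk(os.path.join(root, spk))
            for f in files if f.endswith(".wav")
        )
        for idx, path in enumerate(wavs):
            target = train if idx % 2 == 0 else test       # even -> train, odd -> test
            target.append((path, label_of[spk]))
    return train, test
```

Sorting the speaker names before enumerating makes the label assignment deterministic, which matters if the train and test hdf5 files are generated in separate runs.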

@astorfi
Owner

astorfi commented Jan 7, 2019

Dear all,

Please refer to the PyTorch implementation, which uses the VoxCeleb dataset.

@astorfi astorfi closed this as completed Jan 12, 2019