Problem with evaluation. #35

Closed
duynguyen5896 opened this issue Jul 14, 2018 · 16 comments

Comments

@duynguyen5896

duynguyen5896 commented Jul 14, 2018

Hi @astorfi, thanks for your great work. I use all the same settings, but I store the training data in HDF5 instead of using the Audio Dataset. However, my evaluation result is poor: the EER is as high as 40%. I think there is something wrong with my setup. Do you have any idea how to fix this?
I use the VoxCeleb dataset for the background model, with only 1 sample per speaker.
50 people are enrolled and 50 are not enrolled (to be rejected).
I use 4 samples for evaluation.

Thanks for your help.
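
For reference, the EER figure discussed in this thread can be estimated from two lists of trial scores. The following is a minimal sketch, not the repository's evaluation code: it assumes similarity scores where a higher value means "more likely the same speaker", and the function name `compute_eer` is hypothetical.

```python
import numpy as np

def compute_eer(genuine_scores, impostor_scores):
    """Return (eer, threshold) for similarity scores (higher = same speaker)."""
    genuine = np.asarray(genuine_scores, dtype=float)
    impostor = np.asarray(impostor_scores, dtype=float)
    # Candidate thresholds: every score that actually occurs.
    thresholds = np.unique(np.concatenate([genuine, impostor]))
    # False rejection rate: genuine trials scoring below the threshold.
    frr = np.array([(genuine < t).mean() for t in thresholds])
    # False acceptance rate: impostor trials scoring at or above the threshold.
    far = np.array([(impostor >= t).mean() for t in thresholds])
    # The EER is taken where the two error rates are closest to each other.
    idx = int(np.argmin(np.abs(far - frr)))
    return (far[idx] + frr[idx]) / 2.0, thresholds[idx]

eer, threshold = compute_eer([0.9, 0.6, 0.4, 0.8], [0.1, 0.5, 0.7, 0.2])
print(f"EER = {eer:.2%} at threshold {threshold}")
```

An EER near 50% means the scores barely separate genuine from impostor trials, so a value around 40% often points to a problem in how the trials or the features are built rather than to a small modelling gap.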

@SpongebBob

SpongebBob commented Jul 16, 2018

I tried the ai-shell dataset: the Kaldi i-vector system gets around 2% EER,
but this 3D-convolutional-speaker-recognition model gets 20% EER.
I don't think it is actually such a novel model.

@duynguyen5896
Author

@SpongebBob, how did you get 20% EER in the evaluation phase? Did you reuse this code?

@astorfi
Owner

astorfi commented Jul 16, 2018

@duynguyen5896
I think one sample per speaker is not enough for training the background model.
What do you mean by unenrollment?

The evaluation samples cannot give correct statistics either.

@SpongebBob

@duynguyen5896 More details are in #33.

@astorfi
Owner

astorfi commented Jul 16, 2018

@SpongebBob Please make sure to do the correct experiments ... This is a deep learning method and for a new dataset, it needs a lot of tweaking.

@astorfi
Owner

astorfi commented Jul 16, 2018

@SpongebBob Please reopen #33 if the problem has not been resolved.

@duynguyen5896
Author

duynguyen5896 commented Jul 16, 2018

@astorfi, by unenrollment I mean rejection: I don't enroll those people, and I want to see whether the model can classify them correctly or not.
As for the evaluation samples, how many samples per person do you use for evaluation?
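
The rejection test described here can be sketched as a threshold on the best match against the enrolled speaker models. This is only an illustration of the idea, not something the repository provides: the embeddings, the speaker names, the threshold value, and the function `identify_or_reject` are all assumptions.

```python
import numpy as np

def cosine_similarity(a, b):
    """Cosine similarity between two 1-D vectors."""
    a = np.asarray(a, dtype=float)
    b = np.asarray(b, dtype=float)
    return float(np.dot(a, b) / (np.linalg.norm(a) * np.linalg.norm(b)))

def identify_or_reject(test_embedding, speaker_models, threshold):
    """Return (speaker, score); speaker is None when the best score is below threshold.

    speaker_models is a dict mapping an enrolled speaker name to its model vector.
    """
    scores = {
        name: cosine_similarity(test_embedding, model)
        for name, model in speaker_models.items()
    }
    best = max(scores, key=scores.get)
    if scores[best] >= threshold:
        return best, scores[best]
    return None, scores[best]  # treated as an unenrolled speaker

# Toy 2-D models purely for illustration.
models = {"spk_a": [1.0, 0.0], "spk_b": [0.0, 1.0]}
print(identify_or_reject([0.9, 0.1], models, threshold=0.8))
print(identify_or_reject([1.0, 1.0], models, threshold=0.8))
```

In such a setup the threshold would have to be chosen on held-out data, since it trades false acceptances of unenrolled speakers against false rejections of enrolled ones.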

@duynguyen5896
Author

@SpongebBob, can you share your evaluation source code? I don't understand from #33 how you evaluate the model.

@astorfi
Owner

astorfi commented Jul 16, 2018

@duynguyen5896 I don't think SpongebBob used a comparable experimental setup for 3D-Conv and the Kaldi i-vector. 2% EER is not very realistic for 0.8 seconds of data in a text-independent setting.

About VoxCeleb, I am trying to use PyTorch for the same setup. However, VoxCeleb is huge and parameter tuning does not seem to be trivial.

@duynguyen5896
Author

@astorfi Can you give me more detail about your experiment in the enrollment and evaluation phases? I can see in your paper that you used 100 speakers for enrollment and evaluation.
Did you enroll all 100 speakers?

@astorfi
Owner

astorfi commented Jul 18, 2018

Yes, all 100 speakers are enrolled. In the evaluation, different enrollments of the same speakers are used.
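
One way to picture this closed-set protocol is to score every evaluation utterance against every enrolled speaker model, counting same-speaker pairs as genuine trials and all other pairs as impostor trials. The sketch below assumes fixed-length embeddings and cosine scoring, neither of which is confirmed in this thread, and `score_trials` is a hypothetical helper.

```python
import numpy as np

def score_trials(speaker_models, eval_embeddings):
    """Split cosine scores into genuine and impostor trials.

    speaker_models: dict mapping speaker name -> enrolled model vector.
    eval_embeddings: list of (true_speaker, embedding) pairs taken from
        utterances that were not used for enrollment.
    """
    genuine, impostor = [], []
    for true_speaker, emb in eval_embeddings:
        emb = np.asarray(emb, dtype=float)
        for claimed, model in speaker_models.items():
            model = np.asarray(model, dtype=float)
            score = float(emb @ model / (np.linalg.norm(emb) * np.linalg.norm(model)))
            # Same speaker -> genuine trial, any other enrolled speaker -> impostor trial.
            (genuine if claimed == true_speaker else impostor).append(score)
    return genuine, impostor

models = {"spk_a": [1.0, 0.0], "spk_b": [0.0, 1.0]}
evals = [("spk_a", [1.0, 0.1]), ("spk_b", [0.2, 1.0])]
genuine, impostor = score_trials(models, evals)
print(len(genuine), "genuine trials,", len(impostor), "impostor trials")
```

With S enrolled speakers and one evaluation utterance each, this produces S genuine trials and S(S-1) impostor trials, and every evaluation speaker has a model to be compared against.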

@duynguyen5896
Author

@astorfi, for the enrollment phase, I see that you merge 20 utterances of 1 speaker. Are those utterances selected randomly, or are they a continuous stretch of speech?

I tried both enrolling all 100 speakers and testing with 50 enrolled and 50 not enrolled. However, the results are not good in either case: the result seems to be independent of the number of enrolled speakers, and the EER is still about 40%.

@astorfi
Owner

astorfi commented Jul 21, 2018

@duynguyen5896 For selecting utterances, either of the cases works. However, I did that selection randomly.

May I know why you are splitting like that, with 50 enrolled and 50 unenrolled?
I think you are making a mistake. Please read the paper, as I have to emphasize once again.

All 100 speakers must be used in enrollment and evaluation stages as we are comparing the known speakers with the speaker models. For unenrolled subjects, we do not have any model since this model is not end-to-end. Please make sure that you understand the speaker verification setup we are using.
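
The utterance selection mentioned above, random or contiguous, could look roughly like the sketch below. It assumes that "merging" means stacking per-utterance feature maps along a new leading axis, which is an interpretation rather than something stated in this thread. The 80 x 40 feature shape and the name `build_enrollment_cube` are illustrative only.

```python
import numpy as np

def build_enrollment_cube(utterance_features, num_utterances=20,
                          contiguous=False, seed=None):
    """Stack feature maps from one speaker into a single array.

    utterance_features: sequence of equally shaped 2-D arrays (frames x features).
    Returns an array of shape (num_utterances, frames, features).
    """
    n = len(utterance_features)
    if n < num_utterances:
        raise ValueError(f"need at least {num_utterances} utterances, got {n}")
    rng = np.random.default_rng(seed)
    if contiguous:
        # A consecutive run of utterances starting at a random position.
        start = int(rng.integers(0, n - num_utterances + 1))
        chosen = np.arange(start, start + num_utterances)
    else:
        # Random selection without repetition.
        chosen = rng.choice(n, size=num_utterances, replace=False)
    return np.stack([np.asarray(utterance_features[i]) for i in chosen], axis=0)

# 30 dummy utterances with an assumed 80 x 40 feature map each.
features = [np.full((80, 40), float(i)) for i in range(30)]
cube = build_enrollment_cube(features, num_utterances=20, seed=0)
print(cube.shape)
```

Utterances used to build a speaker's enrollment input should be kept out of that speaker's evaluation set, otherwise the genuine scores are inflated.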

@duynguyen5896
Author

duynguyen5896 commented Jul 21, 2018

@astorfi Actually, I want to test whether the model can handle unenrolled speakers well or not. However, when I use the same setup as you, with 100 enrolled speakers, the result is also not good. I think the reason is the different dataset.
Anyway, thanks for your help and kindness.
I'm testing whether the model works well on a small dataset. I have a small dataset (only 46 speakers), and I want to see whether the result is better when using 10 utterances per sample. How do you set up the model structure for 10 utterances?

@astorfi
Owner

astorfi commented Aug 22, 2018

@duynguyen5896 Yes, unfortunately, the dataset is not public, and adapting the model to a new dataset requires tuning.
I don't think it works for small datasets.

@astorfi closed this as completed Sep 21, 2018