
Asking for clarification #12

Closed
decuvillo opened this issue Mar 7, 2019 · 3 comments

@decuvillo

Hi,
Thank you for the code; it is extremely helpful. I have a few questions, please:

  1. In the paper, it is stated that the reported results cover both speaker identification and verification. I presume that the accuracy stands for the identification result, but I'm still confused, since the speakers in the training set are different from those in the test set.

  2. What is the difference between select_batch.create_data_producer and stochastic_mini_batch?

  3. Which variable indicates the number of epochs used?

  4. In the paper, it is indicated that 64 utterances are used per mini-batch. Does this correspond to the candidates_per_batch variable, which is set to 640?

  5. For running the code (training and testing), is running train.py sufficient? How long does it take to get results, please?

Thank you in advance

@Walleclipse
Owner

Walleclipse commented Mar 13, 2019

Hi,
I apologize for the late reply.

  1. The core of this paper is speaker embedding. If you want to use the embeddings for an identification task (classification), you need to ensure that the speakers in the test data also appear in the training data. I believe the identification result in the paper is evaluated on the same speakers (mainly for the pretraining procedure).

  2. random_batch.stochastic_mini_batch selects the negative samples completely at random, whereas select_batch.create_data_producer creates the batch with hard negative samples (it runs in multiprocessing). You can check issue #11 and issue #8.

  3. Sorry, I did not record the number of epochs for the training procedure; I only recorded the number of steps as grad_steps. You can compute the epoch as epoch = grad_steps // len(train_data). PS: I do not set a maximum step (or epoch) to terminate training. You can set one yourself by modifying line 117 in train.py.

  4. candidates_per_batch does not need to correspond to the 64 utterances; please check select_batch.py. candidates_per_batch=640 means that at each step there are 640 candidate utterances, from which I select the best 64 (batch_size). You can adjust candidates_per_batch yourself.

  5. If you have already collected the data, you just need to run train.py. For this repo, you can simply clone it and run train.py (I already prepared the data in the audio folder). I ran the code on a GPU; it takes about 4~5 hours.
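
To illustrate answer 2, here is a minimal sketch contrasting purely random negative selection with hard-negative selection (the function names here are invented for illustration; they are not the repo's actual code). The idea is that a "hard" negative is one whose embedding is most similar to the anchor:

```python
import numpy as np

def stochastic_negatives(num_candidates, batch_size, rng=None):
    # Pick negatives uniformly at random, in the spirit of
    # random_batch.stochastic_mini_batch.
    rng = rng or np.random.default_rng()
    return rng.choice(num_candidates, size=batch_size, replace=False)

def hard_negatives(anchor_similarities, batch_size):
    # Pick the negatives most similar to the anchor (the "hardest" ones),
    # in the spirit of select_batch's hard-negative mining.
    return np.argsort(anchor_similarities)[::-1][:batch_size]

sims = np.array([0.1, 0.9, 0.5, 0.3])
print(hard_negatives(sims, 2))  # indices of the two most anchor-similar candidates
```

The random variant ignores the model entirely, while the hard variant needs up-to-date similarity scores, which is why the repo computes them in a separate producer process.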
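
For answer 3, the epoch formula can be written as a one-line helper (a sketch; grad_steps and train_size stand in for the repo's step counter and len(train_data)):

```python
def epochs_completed(grad_steps, train_size):
    # epoch = grad_steps // len(train_data), per the formula above
    return grad_steps // train_size

# e.g. 2500 gradient steps over 1000 training utterances:
print(epochs_completed(2500, 1000))  # -> 2
```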
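
Answer 4 can be pictured as a top-k step: score all candidates_per_batch utterances, then keep only the batch_size best. A hedged sketch (the scoring itself is assumed to come from the model; the helper name is hypothetical):

```python
import numpy as np

def select_best_utterances(candidate_scores, batch_size=64):
    # np.argpartition finds the top batch_size entries in O(n),
    # avoiding a full sort of all candidates_per_batch scores.
    return np.argpartition(candidate_scores, -batch_size)[-batch_size:]

scores = np.arange(640, dtype=float)  # stand-in for 640 candidate scores
batch = select_best_utterances(scores, batch_size=64)
assert len(batch) == 64
```

So candidates_per_batch=640 only controls the size of the pool; the mini-batch that reaches the model is still 64 utterances.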

@decuvillo
Author

Thank you very much!

@Walleclipse
Owner

You are welcome
