paper details #3
Hi @cyrta, thanks for the reply.
Also, what is the input shape to the network (that is, the shape of the input STFT)? These details are not given in the paper. If possible, could you share the Keras model summary? That would clear up the confusion. Regards.
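For context on the input-shape question, here is a minimal sketch of how the shape of an STFT follows from the framing parameters. The sample rate, clip length, FFT size, and hop length are hypothetical values chosen for illustration; none of them come from the paper. The formulas follow the common convention of `1 + n_fft // 2` frequency bins, with the frame count depending on whether the signal is padded so that frames are centered.

```python
def stft_shape(num_samples, n_fft, hop_length, center=True):
    """Return (n_freq_bins, n_frames) for a real-valued signal.

    center=True assumes the signal is padded by n_fft // 2 on both sides,
    so that each frame is centered on a multiple of hop_length.
    """
    n_freq_bins = 1 + n_fft // 2
    if center:
        n_frames = 1 + num_samples // hop_length
    else:
        n_frames = 1 + (num_samples - n_fft) // hop_length
    return n_freq_bins, n_frames


# Hypothetical settings, NOT taken from the paper:
SAMPLE_RATE = 16000        # Hz
CLIP_SECONDS = 1           # length of one input segment
N_FFT = 512                # FFT window size
HOP_LENGTH = 160           # 10 ms hop at 16 kHz

num_samples = SAMPLE_RATE * CLIP_SECONDS
centered_shape = stft_shape(num_samples, N_FFT, HOP_LENGTH, center=True)
uncentered_shape = stft_shape(num_samples, N_FFT, HOP_LENGTH, center=False)
print(centered_shape, uncentered_shape)
```

With the actual parameters from the authors, the same arithmetic would give the real input shape, which is what the model summary would confirm.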
Hi @cyrta, can you please elaborate on this sentence from the paper? Here is my understanding; please correct me if I am wrong.
"we use activations from the last layer of neural network as speaker embeddings"
This seems odd, because given the network's loss function the last layer would be a softmax layer. Or did you mean that there is a dense layer with sigmoid activation before the softmax layer, and that its activations are used as the speaker embeddings? Also, what is the size of the extracted embeddings? Thanks.
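To make the second reading concrete, here is a minimal pure-Python sketch of a network whose embedding is the activation of a sigmoid dense layer placed just before the softmax classifier. This only illustrates my interpretation and is not the authors' confirmed architecture; the layer sizes, the random weights, and all function names are hypothetical.

```python
import math
import random


def dense(x, weights, bias):
    """Affine layer: one output per weight row."""
    return [sum(w * xi for w, xi in zip(row, x)) + b
            for row, b in zip(weights, bias)]


def sigmoid(v):
    return [1.0 / (1.0 + math.exp(-z)) for z in v]


def softmax(v):
    m = max(v)  # subtract the max for numerical stability
    exps = [math.exp(z - m) for z in v]
    total = sum(exps)
    return [e / total for e in exps]


def random_layer(n_out, n_in, rng):
    """Hypothetical untrained layer: random weights, zero bias."""
    weights = [[rng.uniform(-1, 1) for _ in range(n_in)] for _ in range(n_out)]
    return weights, [0.0] * n_out


def forward(features, hidden, output):
    # Penultimate layer: dense + sigmoid. Under this reading, these
    # activations are what would be kept as the speaker embedding.
    embedding = sigmoid(dense(features, *hidden))
    # Final layer: dense + softmax over speaker classes (training only).
    probs = softmax(dense(embedding, *output))
    return embedding, probs


# Hypothetical sizes, NOT taken from the paper:
FEATURE_DIM, EMBEDDING_DIM, NUM_SPEAKERS = 8, 4, 3

rng = random.Random(0)
hidden = random_layer(EMBEDDING_DIM, FEATURE_DIM, rng)
output = random_layer(NUM_SPEAKERS, EMBEDDING_DIM, rng)
features = [rng.uniform(-1, 1) for _ in range(FEATURE_DIM)]

embedding, probs = forward(features, hidden, output)
print(len(embedding), len(probs))
```

In this sketch the embedding size equals the width of the sigmoid layer, which is the number I am asking about.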