
Accelerating evaluation speed #4

Closed
dopiwoo opened this issue Oct 28, 2021 · 8 comments

Comments

@dopiwoo

dopiwoo commented Oct 28, 2021

During evaluation, the current implementation calculates the similarity scores one by one in a for loop, which can become slow as the size of "lines" grows. Is there an elegant way to vectorize it?

@TaoRuijie
Owner

One option is to use a dataloader for evaluation.

Suppose we have M utterances in the evaluation set; we sort them by utterance length. In each minibatch we select data of similar length (all utterances in the same batch are cut to the same length) before extracting the features. That is faster.

I did not do that in this project, but you can see it in another project, https://github.com/TaoRuijie/TalkNet_ASD/blob/main/dataLoader.py, lines 96 to 104.

My explanation can be found in https://github.com/TaoRuijie/TalkNet_ASD/blob/main/FAQ.md under "1.2 How to figure the variable length of data during training?"
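The length-sorted batching idea can be sketched roughly as below. This is a hypothetical illustration, not the actual TalkNet_ASD code; the function name and list-based representation are assumptions.

```python
def make_batches(utterances, batch_size):
    """Group utterances of similar length into minibatches; crop each
    batch to its shortest member so it can be stacked into one tensor."""
    # Sort by length so neighbours in the list have similar lengths.
    ordered = sorted(utterances, key=len)
    batches = []
    for i in range(0, len(ordered), batch_size):
        batch = ordered[i:i + batch_size]
        min_len = len(batch[0])  # shortest utterance in this batch
        # Cut every utterance to the same length for stacking.
        batches.append([u[:min_len] for u in batch])
    return batches
```

Each returned batch is rectangular, so it can be stacked into a single tensor and fed through the model in one forward pass instead of one utterance at a time.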

@dopiwoo
Author

dopiwoo commented Oct 28, 2021

I have found an alternative way of doing this, using torch.combinations.
Say list1 contains (utter1, utter2, ..., utter5) and list2 contains (utter1, utter2, ..., utter5).
We want to compute every possible similarity combination between list1 and list2, so we need the indices (0, 0) ... (4, 4). One way without a DataLoader is `torch.combinations(torch.arange(N), r=2, with_replacement=True)`, where N is the number of utterances. After generating the indices, the files (features) can be loaded with the correct labels.
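For concreteness, the pair-index generation looks like this (a minimal sketch; loading the features for each index pair is left out):

```python
import torch

N = 5  # number of utterances
# All unordered pairs including self-pairs: (0, 0), (0, 1), ..., (4, 4).
pairs = torch.combinations(torch.arange(N), r=2, with_replacement=True)
print(pairs.shape)  # N * (N + 1) / 2 = 15 pairs, each a row of 2 indices
```

With `with_replacement=True`, self-pairs like (0, 0) are included, giving N(N+1)/2 rows rather than N(N-1)/2.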

@TaoRuijie
Owner

I understand. But if I follow correctly, the GPU utilization is still the same: each time you can only feed one utterance into the GPU to extract the features. Am I right?

@dopiwoo
Author

dopiwoo commented Oct 28, 2021

Yes.
Currently I do not modify anything before embedding: the wavs are loaded one by one onto the GPU and the features are computed separately during evaluation, which is inefficient. Only EER and minDCF are computed in vectorized form on the GPU. Maybe I will rework it into a fully vectorized (DataLoader) form and submit a pull request?
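The scoring step itself can also be vectorized: with all embeddings stacked into one tensor, every pairwise cosine similarity comes out of a single matrix multiply instead of a Python loop. A sketch (the function name and dimensions are illustrative assumptions, not the repository's code):

```python
import torch
import torch.nn.functional as F

def score_all_pairs(embeddings):
    """Cosine similarity for every utterance pair in one matmul.

    embeddings: [N, D] tensor, one row per utterance.
    Returns an [N, N] similarity matrix.
    """
    emb = F.normalize(embeddings, p=2, dim=1)  # unit-length rows
    return emb @ emb.T  # dot products of unit vectors = cosine similarity

emb = torch.randn(4, 192)  # e.g. 4 utterances, 192-dim embeddings
scores = score_all_pairs(emb)
# Diagonal entries are self-similarities, all equal to 1.
```

The trial scores for specific pairs can then be gathered from this matrix using the index pairs from torch.combinations.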

@TaoRuijie
Owner

Yes. May I ask how long your evaluation takes?

In my case, evaluation on Vox1_O takes only 1+ minutes, so I did not switch it to the DataLoader format, since that would add more code.

For Vox1_E and Vox1_H it takes about 20 minutes, but I only run it once after all the experiments, so I did not add the dataloader.

@dopiwoo
Author

dopiwoo commented Oct 28, 2021

OK, I see. I use a custom dataset where the number of comparison pairs is 7 million+, which takes more than half an hour running ECAPAModel.py, lines 82-91, so I had to rewrite the score-calculation part of the evaluation. Maybe it's fine to leave it unchanged if it runs fast on VoxCeleb. (Only 3k pairs? Perhaps not all possible utterance pairs are considered.)

@TaoRuijie
Owner

Oh, I understand your meaning and your proposed method.

I guess you mean: the most time-consuming part of your project is not extracting the speaker embedding for each utterance, but computing the scores between those embeddings.

Given that, I think the method you proposed is reasonable.

  1. If L57 to L79 is time-consuming, you can try the method I mentioned (use a dataloader to extract the embeddings in parallel).
  2. If L82 to L91 is time-consuming, the method you mentioned should save time.

By the way, I am somewhat surprised by that; 7 million+ pairs is a very large number.

@dopiwoo
Author

dopiwoo commented Oct 28, 2021

Yes, exactly. I have modified the full evaluation into vectorized form on the GPU, and the EER of ECAPA-TDNN + ArcFace is better than I expected. So I think this issue can safely be closed, since evaluation time is not a major issue on VoxCeleb.

@dopiwoo dopiwoo closed this as completed Oct 28, 2021