Accelerating evaluation speed #4
Comments
One method is to use a DataLoader for evaluation. Say we have M utterances in the evaluation set; we sort them by utterance length. In each minibatch, we select data of similar length (all utterances in the same batch are cut to the same length) to extract the features, which is faster. I did not do that in this project, but you can see it in another project (https://github.com/TaoRuijie/TalkNet_ASD/blob/main/dataLoader.py), lines 96 to 104. My explanation can be found here (https://github.com/TaoRuijie/TalkNet_ASD/blob/main/FAQ.md): "1.2 How to figure the variable length of data during training?"
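The grouping idea above can be sketched in plain Python. This is a hypothetical helper (not from the project): `utts` is assumed to be a list of `(utterance_id, num_frames)` pairs, and each batch is cut to the shortest length it contains so it can be stacked into one tensor.

```python
# Hypothetical sketch: group evaluation utterances into minibatches of
# similar length so each batch can be cut to one common length.
def make_length_sorted_batches(utts, batch_size):
    # Sort by length so neighbours have similar durations.
    ordered = sorted(utts, key=lambda u: u[1])
    batches = []
    for i in range(0, len(ordered), batch_size):
        chunk = ordered[i:i + batch_size]
        # Cut every utterance in the batch to the shortest length in it,
        # so the batch can be stacked into a single tensor.
        cut_len = min(n for _, n in chunk)
        batches.append(([uid for uid, _ in chunk], cut_len))
    return batches
```

Because neighbouring utterances have similar lengths, little audio is discarded by the cut, while the GPU processes `batch_size` utterances per forward pass instead of one.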
I have found an alternative way of doing this, using torch.combinations.
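For illustration, a minimal sketch of the torch.combinations approach (the variable names are assumptions, not the commenter's actual code): given an `(M, D)` tensor of per-utterance embeddings, all unordered pairs can be enumerated and scored in one batched call instead of a Python loop.

```python
import torch

# Hypothetical sketch: `embeddings` stands in for an (M, D) tensor of
# per-utterance speaker embeddings.
embeddings = torch.randn(4, 192)

# All M*(M-1)/2 unordered index pairs in a single call.
pairs = torch.combinations(torch.arange(embeddings.size(0)), r=2)

# Score every pair with one batched cosine-similarity call.
scores = torch.nn.functional.cosine_similarity(
    embeddings[pairs[:, 0]], embeddings[pairs[:, 1]], dim=1)
```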
I understand. But if I understand correctly, GPU utilization stays the same: each time you can still only feed one utterance into the GPU to extract the features. Am I right?
Yes. |
Yes. May I ask how long your evaluation takes? In my case, evaluation on Vox1_O takes only a minute or so, so I did not change it to the DataLoader format, because that would add more code. For Vox1_E and Vox1_H it takes about 20 minutes, but I only run it once after all the experiments, so I did not add the DataLoader.
OK, I see. I use a custom dataset with 7 million+ comparison pairs, which takes more than half an hour in ECAPAModel.py, L82-L91, so I have to rewrite the score-calculation part of the evaluation. It may be fine to leave it unchanged if it runs fast on VoxCeleb. (Only 3k pairs? Perhaps not all possible utterance pairs are considered.)
Oh, I understand your meaning and your proposed method. I guess you mean: the most time-consuming part of your project is not extracting the speaker embedding for each utterance, but computing the scores between these embeddings. Given that, I think the method you proposed is reasonable.
Btw, I am somewhat surprised by that: 7 million+ pairs is a very large number.
Yes, exactly. I have rewritten the full evaluation in vectorized form on the GPU, and the EER of ECAPA-TDNN + ArcFace is better than I expected. So I think the issue can safely be closed, since evaluation time is not a major issue on VoxCeleb.
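A minimal sketch of what such vectorized trial scoring might look like (an assumption, not the commenter's actual code): `emb` stands in for an `(M, D)` embedding tensor, and `enrol_idx` / `test_idx` index the two sides of each trial pair, so all scores come from one batched operation.

```python
import torch
import torch.nn.functional as F

# Hypothetical sketch of vectorized trial scoring, replacing a
# per-pair Python loop. With L2-normalized embeddings, the batched
# dot product below equals cosine similarity.
emb = F.normalize(torch.randn(5, 192), dim=1)
enrol_idx = torch.tensor([0, 1, 2])
test_idx = torch.tensor([3, 4, 0])

# One elementwise multiply + sum scores every trial pair at once.
scores = (emb[enrol_idx] * emb[test_idx]).sum(dim=1)
```

On a GPU the same code scores millions of pairs in a handful of kernel launches, which is where the half-hour-to-seconds speedup described above would come from.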
During evaluation, the current implementation calculates the similarity scores one by one in a for loop, which could be slow as the size of "lines" grows. Is there an elegant way of vectorizing it?