
[QUESTION] [TTS] 'num_elements_batch_sampler' loses the randomness of the samples #5690

Open
dbkest opened this issue Mar 1, 2024 · 1 comment
Labels
Question



dbkest commented Mar 1, 2024

Hello, could you help me with a question, please? The script espnet2/samplers/num_elements_batch_sampler.py implements bin-based batch division, in which each batch loses the randomness of its samples. Does that significantly affect training (even though there is still randomness between batches)? For a TTS task, the data are first sorted globally by length and then divided into batches according to that order and the bin size. This happens once, before training starts (espnet2/tasks/abs_task.py: build_batch_sampler); during training, only the order of the batches is shuffled.
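Here is a minimal sketch of the behavior I mean (my own simplification with made-up lengths and bin size, not the actual ESPnet code):

```python
import random

def build_batches_by_elements(lengths, batch_bins):
    """Simplified sketch of length-sorted, bin-based batching: sort
    utterances globally by length, then greedily pack each batch until
    its element count reaches batch_bins."""
    order = sorted(range(len(lengths)), key=lambda i: lengths[i])
    batches, current, current_bins = [], [], 0
    for idx in order:
        current.append(idx)
        current_bins += lengths[idx]
        if current_bins >= batch_bins:
            batches.append(current)
            current, current_bins = [], 0
    if current:
        batches.append(current)
    return batches

# Built once before training; each epoch only the batch order is shuffled,
# so the membership of every batch stays fixed across epochs.
lengths = [120, 80, 300, 95, 210, 150, 60, 240]  # made-up frame counts
batches = build_batches_by_elements(lengths, batch_bins=400)
random.shuffle(batches)  # batch-level shuffling only
```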

I look forward to your reply.

sw005320 (Contributor) commented Mar 1, 2024

Good question.
The reason for this implementation is to balance random shuffling against GPU memory usage: batching length-sorted utterances keeps per-batch padding, and therefore peak memory, predictable.

We actually ran an experiment (about seven years ago) for ASR comparing utterance-level shuffling with batch-level shuffling, and the difference was marginal (although the two setups produced different effective batch sizes, so the comparison could have been better).
Some people even sort all utterances from short to long and report that it works better (a curriculum-learning effect).
So entirely random shuffling may not be needed.

However, this experience is old.
Many technologies have changed since then, and we might reach different conclusions today; it is worth revisiting.
Also, in some projects we have started to use fixed-length utterances (with padding), which lets us shuffle randomly at the utterance level.
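As an illustration of that fixed-length case, here is a minimal sketch (an assumed simplification, not actual ESPnet code):

```python
import random

def fixed_length_batches(num_samples, batch_size, seed=0):
    """Sketch under the fixed-length assumption: when every utterance is
    cropped or padded to the same length, each sample costs the same number
    of elements, so batches can be drawn from a fully shuffled utterance
    order without a long-utterance batch exhausting GPU memory."""
    order = list(range(num_samples))
    random.Random(seed).shuffle(order)  # utterance-level shuffling
    return [order[i:i + batch_size] for i in range(0, num_samples, batch_size)]

print(fixed_length_batches(num_samples=10, batch_size=4))
```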

It would be great if you could investigate this.
