In https://discourse.mozilla.org/t/custom-voice-tts-not-learning/40897/5, @erogol mentioned that one way to weed out bad samples in the data is to run the training network on them and see which produce the highest loss. Is there an easy way to do this? I take the comment to mean that we'd need to narrow the training list to just a few files at a time, run training, check the loss value, and then repeat for each handful of sample files until a pattern emerges. If so, that could take quite some time, unless there is a report or something I'm not aware of.
As we all know, training data set quality is the biggest factor influencing training results. So anything we can do to flag sub-optimal training samples that the CheckDataset notebook doesn't otherwise catch would be ideal.
To that end, is there any opportunity for the training process to track and report the average loss associated with each file? In other words, what if training recorded the loss value observed each time a file appears in a batch and averaged it over time? That could drive a heatmap of which files coincide with higher loss, so users could quickly identify the outliers in the data set that contribute most to it. A rough sketch of what that bookkeeping might look like is below.
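For illustration only, here is a minimal sketch of such a per-file loss tracker. Nothing like this exists in TTS today; the class name `SampleLossTracker` and its methods are made up, and the batch would need to carry the wav paths for this to work:

```python
from collections import defaultdict

class SampleLossTracker:
    """Accumulate a running average of the loss seen for each training file
    (hypothetical helper, not part of TTS)."""

    def __init__(self):
        self.loss_sums = defaultdict(float)
        self.counts = defaultdict(int)

    def update(self, wav_paths, per_item_losses):
        """Record one batch: wav_paths and per_item_losses are parallel lists."""
        for path, loss in zip(wav_paths, per_item_losses):
            self.loss_sums[path] += float(loss)
            self.counts[path] += 1

    def report(self, top_k=20):
        """Return the top_k files with the highest average loss."""
        averages = ((path, self.loss_sums[path] / self.counts[path])
                    for path in self.loss_sums)
        return sorted(averages, key=lambda item: item[1], reverse=True)[:top_k]
```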
I don't think this is part of what TTS aims for. That said, it is easy to hack into the training code: just take the loss values for the whole epoch and report them in sorted order.
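For anyone wanting to try that hack, here is a minimal sketch under some assumptions: a PyTorch setup, a loss criterion built with `reduction="none"` so each sample keeps its own value, and a loader that yields the wav paths alongside each batch. `run_loss_report` and the batch layout are illustrative, not TTS's actual API:

```python
import torch

def run_loss_report(model, data_loader, criterion):
    """One pass over the data that keeps a loss value per sample instead of
    a batch average, then prints the files sorted from worst to best.
    Assumes each batch is (inputs, targets, wav_paths) -- adjust to the
    actual loader -- and that criterion uses reduction="none".
    """
    model.eval()
    epoch_losses = []  # (wav_path, loss) pairs for the whole epoch
    with torch.no_grad():
        for inputs, targets, wav_paths in data_loader:
            outputs = model(inputs)
            # collapse every dimension except the batch one, leaving a
            # single loss value per sample
            per_item = criterion(outputs, targets).flatten(start_dim=1).mean(dim=1)
            epoch_losses.extend(zip(wav_paths, per_item.tolist()))
    for wav_path, loss in sorted(epoch_losses, key=lambda p: p[1], reverse=True):
        print(f"{loss:.4f}  {wav_path}")
```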