In https://discourse.mozilla.org/t/custom-voice-tts-not-learning/40897/5, @erogol mentioned that one way to weed out bad samples in the data is to run the training network on them and see which produce the highest loss. Is there an easy way to do this? I take the comment to mean that we'd need to narrow the training list to just a few files at a time, run training, check the loss value, and then repeat for each handful of sample files until a pattern emerges. If so, that could take quite some time, unless there is a report or something I'm not aware of.
As we all know, training data set quality is the biggest factor influencing training results. So anything we can do to flag sub-optimal training samples that the CheckDataset notebook doesn't otherwise catch would be ideal.
To that end, is there any opportunity for the training process to track and report the average loss associated with each file? In other words, what if training recorded the loss value observed each time a file appears in a batch and averaged it over time? That could drive a heatmap of which files coincide with higher loss, so users could quickly identify the outliers in the data set that contribute most to it. A rough sketch of what that bookkeeping might look like is below.
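For illustration only, here is a minimal sketch of such a per-file loss tracker. Nothing like this exists in TTS today; the class name `SampleLossTracker` and its methods are made up, and the batch would need to carry the wav paths for this to work:

```python
from collections import defaultdict

class SampleLossTracker:
    """Accumulate a running average of the loss seen for each training file
    (hypothetical helper, not part of TTS)."""

    def __init__(self):
        self.loss_sums = defaultdict(float)
        self.counts = defaultdict(int)

    def update(self, wav_paths, per_item_losses):
        """Record one batch: wav_paths and per_item_losses are parallel lists."""
        for path, loss in zip(wav_paths, per_item_losses):
            self.loss_sums[path] += float(loss)
            self.counts[path] += 1

    def report(self, top_k=20):
        """Return the top_k files with the highest average loss."""
        averages = ((path, self.loss_sums[path] / self.counts[path])
                    for path in self.loss_sums)
        return sorted(averages, key=lambda item: item[1], reverse=True)[:top_k]
```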
I don't think this is part of what TTS aims for. That said, it is easy to hack into the training code: just take the loss values for the whole epoch and report them in sorted order.
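For anyone wanting to try that hack, here is a minimal sketch under some assumptions: a PyTorch setup, a loss criterion built with `reduction="none"` so each sample keeps its own value, and a loader that yields the wav paths alongside each batch. `run_loss_report` and the batch layout are illustrative, not TTS's actual API:

```python
import torch

def run_loss_report(model, data_loader, criterion):
    """One pass over the data that keeps a loss value per sample instead of
    a batch average, then prints the files sorted from worst to best.
    Assumes each batch is (inputs, targets, wav_paths) -- adjust to the
    actual loader -- and that criterion uses reduction="none".
    """
    model.eval()
    epoch_losses = []  # (wav_path, loss) pairs for the whole epoch
    with torch.no_grad():
        for inputs, targets, wav_paths in data_loader:
            outputs = model(inputs)
            # collapse every dimension except the batch one, leaving a
            # single loss value per sample
            per_item = criterion(outputs, targets).flatten(start_dim=1).mean(dim=1)
            epoch_losses.extend(zip(wav_paths, per_item.tolist()))
    for wav_path, loss in sorted(epoch_losses, key=lambda p: p[1], reverse=True):
        print(f"{loss:.4f}  {wav_path}")
```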