
How to interpret the training result: high Loss, low WER? #4423

Answered by titu1994
psydok asked this question in Q&A

Your model is overfitting severely: the training loss continues to go down while the val loss spikes. You should use stronger SpecAugment (say, 5 time masks rather than 2) to slow down the overfitting.
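To make the "5 time masks rather than 2" suggestion concrete, here is a minimal NumPy sketch of time masking. It is an illustration only, not NeMo's implementation: the function name `apply_time_masks`, the `max_width` of 50 frames, and the 80 x 1000 spectrogram shape are all assumptions chosen for the example.

```python
import numpy as np


def apply_time_masks(spec, num_masks, max_width, rng):
    """Zero out `num_masks` random spans of time steps in a (features, time) array.

    Hypothetical helper for illustration; real SpecAugment implementations
    differ in details such as mask value and width sampling.
    """
    masked = spec.copy()  # leave the caller's array untouched
    num_frames = masked.shape[1]
    for _ in range(num_masks):
        # Sample a mask width in [0, max_width], capped at the utterance length.
        width = min(int(rng.integers(0, max_width + 1)), num_frames)
        # Sample a start so the whole span fits inside the utterance.
        start = int(rng.integers(0, num_frames - width + 1))
        masked[:, start:start + width] = 0.0
    return masked


rng = np.random.default_rng(0)
spec = np.ones((80, 1000))  # assumed shape: 80 mel bins x 1000 frames

light = apply_time_masks(spec, num_masks=2, max_width=50, rng=rng)
heavy = apply_time_masks(spec, num_masks=5, max_width=50, rng=rng)

# Count fully masked frames; more masks hide more of the input at most.
print("masked frames with 2 masks:", int((light == 0).all(axis=0).sum()))
print("masked frames with 5 masks:", int((heavy == 0).all(axis=0).sum()))
```

With more masks the model sees less of each utterance per step, which makes it harder to memorize the training set.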

Another issue is that your training duration distribution is very wide: a max duration of 55 seconds limits your batch size too much. Conformer needs a global batch size of at least 256 to converge stably with CTC loss.
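As a rough sketch of the batch size arithmetic: the global batch size is the per-device batch size times the number of devices times the number of gradient accumulation steps. Gradient accumulation is not mentioned above; it is one common way to reach a larger effective batch when long utterances cap the per-device batch. The helper names, the example numbers, and the `duration` key on each entry are assumptions for illustration.

```python
import math


def global_batch_size(per_device_batch, num_devices, accumulation_steps=1):
    """Effective number of samples contributing to one optimizer step."""
    return per_device_batch * num_devices * accumulation_steps


def accumulation_steps_needed(target_global_batch, per_device_batch, num_devices):
    """Smallest accumulation step count that reaches the target global batch."""
    return math.ceil(target_global_batch / (per_device_batch * num_devices))


def drop_long_utterances(entries, max_duration):
    """Keep only entries whose (assumed) `duration` field is within the limit."""
    return [e for e in entries if e["duration"] <= max_duration]


# Hypothetical setup: 4 GPUs that each fit only 8 utterances per step.
steps = accumulation_steps_needed(256, per_device_batch=8, num_devices=4)
print(steps, global_batch_size(8, 4, steps))

# Hypothetical manifest entries; trimming the long tail frees memory
# so that a larger per-device batch fits.
entries = [{"duration": 3.2}, {"duration": 55.0}, {"duration": 12.5}]
print(drop_long_utterances(entries, max_duration=20.0))
```

Capping the max duration and accumulating gradients can be combined; which mix works best depends on how much data sits in the long tail.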

Val loss and val WER are only loosely correlated in ASR training, which is one of the reasons we select models directly by WER rather than by loss. However, I have not seen a divergence this large where the WER still keeps going down.
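To illustrate selecting by WER rather than loss, here is a small self-contained sketch. WER is computed as word-level edit distance divided by the reference length, and the checkpoint is picked by whichever metric you name. The `history` values are made up for the example (they are not the numbers from this run), and `best_checkpoint` is a hypothetical helper, not a NeMo API.

```python
def word_error_rate(reference, hypothesis):
    """Word-level Levenshtein distance divided by the number of reference words."""
    ref = reference.split()
    hyp = hypothesis.split()
    # prev[j] = edit distance between the reference words seen so far and hyp[:j].
    prev = list(range(len(hyp) + 1))
    for i, ref_word in enumerate(ref, start=1):
        curr = [i]
        for j, hyp_word in enumerate(hyp, start=1):
            substitution = prev[j - 1] + (0 if ref_word == hyp_word else 1)
            deletion = prev[j] + 1
            insertion = curr[j - 1] + 1
            curr.append(min(substitution, deletion, insertion))
        prev = curr
    return prev[-1] / max(len(ref), 1)


def best_checkpoint(history, metric="val_wer"):
    """Return the logged epoch with the lowest value of `metric`."""
    return min(history, key=lambda row: row[metric])


# Made-up log in which val loss rises while val WER keeps improving.
history = [
    {"epoch": 1, "val_loss": 30.0, "val_wer": 0.40},
    {"epoch": 2, "val_loss": 45.0, "val_wer": 0.30},
    {"epoch": 3, "val_loss": 60.0, "val_wer": 0.25},
]

print(best_checkpoint(history, metric="val_wer")["epoch"])
print(best_checkpoint(history, metric="val_loss")["epoch"])
```

The two selections disagree here because the loss also reflects how confident the model is in its wrong predictions, while WER only counts the errors in the decoded text.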

The fact that train wer is so much lower means that ther…

Answer selected by ericharper

This discussion was converted from issue #4351 on June 22, 2022 16:38.