Unstable training #388
-
@WeberJulian which attention mechanism are you using, and are you on T1 or T2? For me, training T1 with GST only yielded good results with the original attention. I think Graves attention has a bug with T1 (#383), and the Dynamic Convolution yielded noise after 80k steps or so. I'm not sure about the dynamic convolution, but the original attention definitely works; Graves just needs debugging.
-
you can try Also
-
So I tried with
-
I'm training on the French M-AILABS dataset with GST, speaker embedding, and mixed precision enabled, and my training is very chaotic. (dev)
Here is my tensorboard (the blue part corresponds to a continue_training at 10k):
The test samples speak for themselves:
Here is 13.8k (very good):
https://sndup.net/6h5y
Here is 15.7k (hell):
https://sndup.net/7qmz
Here is a link to my config:
https://pastebin.com/2iCygE62