[src] batch renormalization finished #65
base: svd_draft
Conversation
…r#2907) [scripts] Fix bug related to multi-task in train_raw_rnn.py. Thx: tessfu2001@gmail.com
…dlium, in accordance with the IS17 paper. (kaldi-asr#2774)
…di-asr#2945) thx: Maxim Korenevsky.
…aldi-asr#2947) note: if this breaks someone's build we'll have to debug it then.
struct Memo {
  // number of frames (after any reshaping).
  int32 num_frames;
  // 'sum_sumsq_scale' is of dimension 5 by block_dim_:
You need to keep the documentation up to date!
But I may rewrite parts of this, so it may not end up mattering.
My concern is that the original formulation of BatchNorm does not make sense when minibatch sizes differ and the stats may differ substantially (e.g. because the language differs).
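
For reference, here is a minimal sketch of the batch-renorm correction from the Batch Renormalization paper (Ioffe, 2017); all identifiers below are illustrative, not the ones used in this PR. The correction terms r and d measure how far the minibatch stats are from the moving averages, are clipped to [1/r_max, r_max] and [-d_max, d_max] respectively, and are treated as constants in the backward pass:

#include <algorithm>

// Sketch only; names such as batch_mean / running_mean are assumptions.
inline float Clip(float x, float lo, float hi) {
  return std::min(std::max(x, lo), hi);
}

// Per-dimension batch-renorm forward pass for a single activation x.
float BatchRenormForward(float x,
                         float batch_mean, float batch_stddev,      // minibatch stats
                         float running_mean, float running_stddev,  // moving averages
                         float r_max, float d_max) {
  // r and d receive no gradient in backprop (treated as constants).
  float r = Clip(batch_stddev / running_stddev, 1.0f / r_max, r_max);
  float d = Clip((batch_mean - running_mean) / running_stddev, -d_max, d_max);
  // With r_max = 1 and d_max = 0 this reduces to ordinary batch norm.
  return (x - batch_mean) / batch_stddev * r + d;
}

Because r and d are computed against the moving averages, the normalized output tracks the population stats even when minibatch stats drift, which is what makes the scheme more robust to heterogeneous minibatches.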
  WriteToken(os, binary, "</BatchRenormComponent>");
}

- void BatchRenormComponent::Scale(BaseFloat scale) {
+ void BatchRenormComponent::Scale_Training(BaseFloat scale) {
This is against the Google style guide; it should be ScaleTraining.
However, I think it would be better to just use the regular Scale() function; you can see how I've done it in my version.
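
For comparison, here is a sketch of what reusing the regular Scale() could look like, assuming the component stores its accumulated stats in count_, stats_sum_ and stats_sumsq_ the way Kaldi's existing BatchNormComponent does; those member names are an assumption, not necessarily what this PR uses.

// Sketch: scale the accumulated stats directly; scaling by zero resets them.
void BatchRenormComponent::Scale(BaseFloat scale) {
  if (scale == 0.0) {
    count_ = 0.0;
    stats_sum_.SetZero();
    stats_sumsq_.SetZero();
  } else {
    count_ *= scale;
    stats_sum_.Scale(scale);
    stats_sumsq_.Scale(scale);
  }
}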
I have finished the draft of batch-renorm.
I tested it on Switchboard with a 2-layer TDNN-F (without dropout), 3 epochs, and minibatch size 64, comparing two configurations:
A: batch-renorm with r:1.0, d:0.0 throughout (with these limits it reduces to plain batch norm)
B: batch-renorm with r:1.0, d:0.0 -> r:1.2, d:0.4 (at iter 8) -> r:1.6, d:0.8 (at iter 45)
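
As an illustration, the stepwise ramp in configuration B could be expressed with a small hypothetical helper; the function name and iteration thresholds below simply restate the schedule above and are not code from this PR.

// Hypothetical helper mirroring schedule B: the renorm limits start at the
// plain batch-norm setting (r:1.0, d:0.0) and are raised at fixed iterations.
void GetRenormLimits(int32 iter, BaseFloat *r_max, BaseFloat *d_max) {
  if (iter < 8)       { *r_max = 1.0; *d_max = 0.0; }  // plain batch norm
  else if (iter < 45) { *r_max = 1.2; *d_max = 0.4; }
  else                { *r_max = 1.6; *d_max = 0.8; }
}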
Some notes: