[src] batch renormalization finished #65

Open

wants to merge 30 commits into base: svd_draft
Conversation

@GaofengCheng commented Jan 5, 2019

I have finished the draft of batch-renorm.
I tested it on Switchboard with a 2-layer TDNN-F (without dropout), 3 epochs, and minibatch size 64.
A: batch-renorm r:1.0 d:0.0
B: batch-renorm r:1.0 d:0.0 -> r:1.2 d:0.4 (at iter 8) -> r:1.6 d:0.8 (at iter 45)

#                                  A         B
# WER on train_dev(tg)         18.37     18.47
# WER on train_dev(fg)         16.69     16.79
# WER on eval2000(tg)           20.7      20.7
# WER on eval2000(fg)           18.8      18.6
# WER on rt03(tg)               25.7      25.6
# WER on rt03(fg)               22.2      22.2
# Final train prob            -0.121    -0.113
# Final valid prob            -0.130    -0.128
# Final train prob (xent)     -2.545    -2.525
# Final valid prob (xent)    -2.4666   -2.4552
# Num-parameters            20992380  20992380

Some notes:

  1. I have verified that batch-renorm with r:1.0 d:0.0 performs almost the same as the previous batch-norm component.
  2. For batch-renorm, I picked the moving averages from one of the parallel jobs instead of averaging them across jobs.
  3. For debugging, I kept the sum_mean_/uvar_ of batch-renorm, but they are not needed for batch-renorm; we can remove them later.
  4. I have checked my backprop derivatives and believe they are correct. Please double-check them against the original equations from the authors (I have checked them once, but since this is a fundamental component, I think it is worth doing twice); a sketch of the forward pass is given after this list.
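
To make the r and d parameters above concrete, here is a minimal sketch of the per-dimension batch-renorm forward pass as described in the original Batch Renormalization paper (Ioffe, 2017). It is illustrative only; the function and variable names are not from this PR. With r:1.0 and d:0.0 the correction terms are clipped to r = 1 and d = 0, so the computation reduces to plain batch-norm, which is consistent with note 1.

// Illustrative sketch (not code from this PR): the per-dimension batch-renorm
// forward pass following Ioffe (2017).  r and d are clipped to
// [1/r_max, r_max] and [-d_max, d_max] and are treated as constants in backprop.
#include <algorithm>
#include <cmath>
#include <vector>

void BatchRenormForward(std::vector<float> *x,   // one dimension of a minibatch
                        float moving_mean,       // running mean
                        float moving_stddev,     // running standard deviation
                        float r_max, float d_max,
                        float epsilon = 1.0e-03f) {
  const int m = x->size();
  float mean = 0.0f, var = 0.0f;
  for (float v : *x) mean += v;
  mean /= m;
  for (float v : *x) var += (v - mean) * (v - mean);
  float stddev = std::sqrt(var / m + epsilon);
  // Correction terms; with r_max = 1.0 and d_max = 0.0 they are fixed to
  // r = 1, d = 0 and the result is ordinary batch-norm.
  float r = std::min(r_max, std::max(1.0f / r_max, stddev / moving_stddev));
  float d = std::min(d_max, std::max(-d_max, (mean - moving_mean) / moving_stddev));
  for (float &v : *x)
    v = (v - mean) / stddev * r + d;
}

Config B above simply relaxes these clamps during training: r_max goes 1.0 -> 1.2 -> 1.6 and d_max goes 0.0 -> 0.4 -> 0.8 at iterations 8 and 45.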

keli78 and others added 26 commits December 5, 2018 23:17
…r#2907)

[scripts] Fix bug related to multi-task in train_raw_rnn.py. Thx:tessfu2001@gmail.com
…aldi-asr#2947)

note: if this breaks someone's build we'll have to debug it then.
struct Memo {
  // number of frames (after any reshaping).
  int32 num_frames;
  // 'sum_sumsq_scale' is of dimension 5 by block_dim_:
Owner commented:
You need to keep the documentation up to date!
But I may rewrite parts of this, so it may not end up mattering.
My concern is that the original formulation of BatchNorm does not make sense when minibatch sizes differ and where the stats may differ substantially (e.g. because the language differs).

WriteToken(os, binary, "</BatchRenormComponent>");
}

void BatchRenormComponent::Scale(BaseFloat scale) {
void BatchRenormComponent::Scale_Training(BaseFloat scale) {
Owner commented:
This is against the Google style guide; should be ScaleTraining.
However, I think it would be better to just use the regular Scale() function; you can see how I've done it in my version.
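
For comparison, a minimal sketch of what reusing the regular Scale() could look like, in the style of the existing BatchNormComponent::Scale(). The member names count_, stats_sum_ and stats_sumsq_ are assumptions borrowed from that component, not taken from this PR:

// Hypothetical sketch, assuming the renorm stats live in count_, stats_sum_
// and stats_sumsq_ as in BatchNormComponent: scale them in the regular
// Scale() rather than adding a separate Scale_Training().
void BatchRenormComponent::Scale(BaseFloat scale) {
  count_ *= scale;            // accumulated frame count
  stats_sum_.Scale(scale);    // accumulated per-dimension sums
  stats_sumsq_.Scale(scale);  // accumulated per-dimension sums of squares
}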
