Is ResNet 3f causal? #18
Comments
Hi,
The layer normalisation typically employed in networks is sequence-wise layer normalisation: the statistics are calculated over all the features of a frame and over the entire sequence, which includes future frames. So you are right, that form of layer normalisation is non-causal.
However, ResNet 3f uses frame-wise layer normalisation, where the statistics are calculated only over the features of the current frame. No features from future frames are used for the current frame, which gives us a causal form of layer normalisation.
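To make the distinction concrete, here is a minimal NumPy sketch. It is not the DeepXi implementation: the function names, the `(num_frames, num_features)` layout, the `eps` value, and the array sizes are all assumptions for illustration, and the learned scale and shift that layer normalisation normally carries are left out. It perturbs only the last frame and checks whether the outputs for the earlier frames change.

```python
import numpy as np

def frame_wise_layer_norm(x, eps=1e-6):
    """Normalise each frame using statistics from that frame only.

    x has shape (num_frames, num_features). Hypothetical helper.
    """
    mean = x.mean(axis=-1, keepdims=True)  # one mean per frame
    var = x.var(axis=-1, keepdims=True)    # one variance per frame
    return (x - mean) / np.sqrt(var + eps)

def sequence_wise_layer_norm(x, eps=1e-6):
    """Normalise using statistics pooled over all frames and features."""
    mean = x.mean()  # single mean over the whole sequence
    var = x.var()    # single variance over the whole sequence
    return (x - mean) / np.sqrt(var + eps)

rng = np.random.default_rng(0)
x = rng.standard_normal((5, 8))  # 5 frames, 8 features (arbitrary sizes)

# Perturb only the last ("future") frame.
x_future_changed = x.copy()
x_future_changed[-1] += 10.0

# Frame-wise: outputs for frames 0..3 do not depend on the last frame.
frame_wise_is_causal = np.allclose(
    frame_wise_layer_norm(x)[:-1],
    frame_wise_layer_norm(x_future_changed)[:-1],
)

# Sequence-wise: outputs for frames 0..3 shift, because the pooled
# statistics now include the perturbed future frame.
sequence_wise_is_causal = np.allclose(
    sequence_wise_layer_norm(x)[:-1],
    sequence_wise_layer_norm(x_future_changed)[:-1],
)

print(frame_wise_is_causal, sequence_wise_is_causal)  # True False
```

Only the axis over which the statistics are reduced differs between the two functions, and that is what decides whether future frames leak into the current one.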
Hope this helps.
…________________________________
From: cohendrake <notifications@github.com>
Sent: Tuesday, 21 January 2020 1:37 PM
To: anicolson/DeepXi <DeepXi@noreply.github.com>
Cc: Subscribed <subscribed@noreply.github.com>
Subject: [anicolson/DeepXi] Is ResNet 3f causal? (#18)
Good job. As layer normalization is widely used in the ResNet 3f, I doubt that it's a causal network—‘future’ features are actually included. I've tried to remove all the layer normalizations and the results turn out to be much worse.
Is layer normalization dispensable?
Thanks! Do you mean that this causal form of layer normalization is applied over the feature dimensions of the current frame?
So the statistics are calculated over all of the dimensions of the current frame, e.g. over the 257 frequency bins of the current frame, giving a single scalar mean and a single scalar variance. Each of the frequency bins is then normalised using that same mean and variance. Hope this helps.
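As a rough sketch of the single-frame case described above, assuming random values as a stand-in for a real frame and an assumed `eps` for numerical stability (neither is taken from DeepXi):

```python
import numpy as np

num_bins = 257  # frequency bins per frame, as in the reply above
rng = np.random.default_rng(0)
frame = rng.standard_normal(num_bins)  # stand-in for one frame's features

mean = frame.mean()  # scalar: one mean for the whole frame
var = frame.var()    # scalar: one variance for the whole frame
eps = 1e-6           # assumed small constant to avoid division by zero

# Every bin is normalised with the same scalar mean and variance.
normalised = (frame - mean) / np.sqrt(var + eps)

print(np.ndim(mean), np.ndim(var), normalised.shape)  # 0 0 (257,)
```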
Got it.