Mean subtraction inside the network declaration (Python) #2465
Comments
Hi @pepetko80, the above code doesn't subtract the mean; as you can see, there is no subtract operator inside the graph. The following should do the job: def create_my_network(features, num_classes): This way you're subtracting from the second dense layer output (h2) the mean taken from that same layer, after which you send the result to the next layer's input (h3). Morgan
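A minimal NumPy sketch of the arithmetic Morgan describes (subtracting from a layer's output the mean taken from that same output); the shapes are made-up for illustration and this is not the CNTK code from the thread:

```python
import numpy as np

# Hypothetical minibatch output of the second dense layer:
# 4 samples, 8 hidden units each (sizes assumed for illustration).
h2 = np.arange(32, dtype=np.float64).reshape(4, 8)

# Subtract each sample's own mean activation (mean over the feature
# axis, kept as a column so broadcasting lines up):
h2_centered = h2 - h2.mean(axis=1, keepdims=True)

# Each row of h2_centered now has zero mean; this centered result,
# not h2 itself, would feed the next layer (h3).
```

Note this centers each sample by its own mean; whether the reduction should run over the feature axis or over the minibatch axis is exactly the question the rest of the thread works through.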
Thanks Morgan! Please forgive the "bug", as I was not focusing on the operation itself. My concern was with the data access. I understand from your response that I do indeed get access to the minibatch data, between two layers, by operating on the second dimension.
Glad it helped :)
I am afraid that the problem persists! The code above didn't work, since the input to the network is a 1D tensor (effectively a bunch of features for ASR). Then, based on an example in the documentation, I tried using cntk.Axis.default_batch_axis. Here is the setup: def create_my_network(features, num_classes): ... So, is there a way to achieve what I need? Any ideas and possible workarounds are welcome. Thank you!
Does your 1D input tensor need the sequence axis? If not, you can remove the sequence axis from the input definition:
This way features will only have one dynamic axis, the batch one, and the above reduce_mean will work. If you do need the sequence axis, that's a problem...
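What Petko is after here is the minibatch mean, i.e. a reduction over the batch axis rather than within each sample. A NumPy sketch of that arithmetic (the minibatch size is an assumption; the 440-feature width comes from later in the thread):

```python
import numpy as np

# Minibatch of 6 samples, 440 features each (minibatch size assumed).
batch = np.random.default_rng(0).normal(size=(6, 440))

# Mean over the batch axis (axis 0): one value per feature.
batch_mean = batch.mean(axis=0, keepdims=True)   # shape (1, 440)

# Subtract it from every minibatch member via broadcasting.
centered = batch - batch_mean

# Each feature column of `centered` now averages to ~0 across the
# minibatch, which is the behaviour the thread is trying to build
# into the network graph.
```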
Thanks Morgan! This is definitely a step closer. I can now pass the mean-subtraction stage in the network declaration and continue towards optimization. However, the trainer now fails. There are no error messages; it simply exits on the first minibatch/utterance optimization round. Could there be something else that needs to be set/reset so that the batch axis becomes the default dynamic axis? Regarding the use of the sequence axis: if I understand this correctly, it would be needed if I used multiple utterances in the minibatch, e.g., for a very large minibatch size. In my case I stick to a minibatch size which is smaller than or equal to the utterance length. As a result, the sequence axis is always equal to one, and I don't make any explicit use of it.
The features variable definition already sets the batch axis as the only dynamic axis, so it should be fine in this direction.
When you say "it exits", does the global Python script exit? If yes, what is the result code? Without any message it's quite hard to guess what's going on :(
Indeed, it's hard to guess. Below is the exception I get, the key being in the first two lines. It seems that the sequence axis should somehow be eliminated completely, so that the value rank exceeds the variable rank by at most the number of dynamic axes. For reference, [1 x 728 x 440] means one sequence/sentence, 728 frames in the batch, and 440 features (40 +/- 5x context). Any idea how I could get around this?
Value rank (3) should be larger than the Variable rank (1) at most by number of dynamic axes (1); Variable = 'Input('Input3', [#], [440])', Value shape = '[1 x 728 x 440]'. [CALL STACK]
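The rank arithmetic in that error can be checked outside CNTK. A NumPy sketch of the shapes involved (the shapes are taken from the error message; the squeeze is only an illustration of why the length-1 sequence axis is the culprit, not a CNTK fix):

```python
import numpy as np

# Value shape reported in the error: 1 sequence x 728 frames x 440 features.
value = np.zeros((1, 728, 440))

# A variable of shape [440] with a single dynamic (batch) axis accepts
# values of rank at most 1 + 1 = 2, but this value has rank 3 -- which
# is exactly what the exception complains about.
assert value.ndim == 3

# Dropping the length-1 sequence axis yields a rank-2 value that would
# satisfy the rank constraint: 728 samples of 440 features.
squeezed = np.squeeze(value, axis=0)  # shape (728, 440)
```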
I still have not found a solution to my problem. Would it be possible to use a custom BatchNormalization layer without learnable parameters? The issue then is that I don't want to touch the variance, only the mean.
Perhaps I was looking at it from the wrong angle. Starting from #1855, my current approach is listed below. Can someone comment on caveats with this solution? Will it subtract the mean of the minibatch from each minibatch member at that particular point in the network? Thank you! def create_my_network(features, num_classes): # Network declaration
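One way to sanity-check the intended behaviour outside the graph is a plain NumPy version of the step, which, like a BatchNormalization layer without learnable parameters, touches only the mean and leaves the variance alone. This is a sketch with made-up layer sizes, not the CNTK code from #1855:

```python
import numpy as np

def subtract_minibatch_mean(h):
    """Center a layer's minibatch output: subtract from each minibatch
    member the mean taken across the minibatch (axis 0). The variance
    is deliberately left untouched."""
    return h - h.mean(axis=0, keepdims=True)

# Pretend h2 is the second dense layer's output for a minibatch of 5
# samples with 16 hidden units (sizes assumed for illustration).
h2 = np.random.default_rng(1).normal(size=(5, 16))
h2_centered = subtract_minibatch_mean(h2)

# h2_centered, not h2 itself, would feed the next layer (h3); each
# hidden unit now has zero mean over the minibatch, while the
# per-unit variance is unchanged.
```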
Going back to a previous question of mine (#2192), I need to subtract the minibatch mean between two network layers, i.e., the value will be changing as the optimization progresses.
The problem I have is that I don't understand whether I can operate on this axis from within the network declaration. So, as a dummy example, if I were to do:
def create_my_network(features, num_classes):
h1 = Dense(...)(features)
h2 = Dense(...)(h1)
h2_mr = cntk.reduce_mean(h2, axis=1)
h3 = Dense(...)(h2_mr)
........
would that subtract the minibatch mean at hidden layer two or not?
If not, how can I achieve this?
Thank you kindly!
Petko