Mean subtraction inside the network declaration (Python) #2465

Open
pnpetkov opened this issue Oct 9, 2017 · 10 comments

@pnpetkov

pnpetkov commented Oct 9, 2017

Going back to a previous question of mine (#2192), I need to subtract the minibatch mean between two network layers, i.e., the value will change as the optimization progresses.

The problem is that I don't understand whether I can operate on the minibatch axis from inside the network declaration. So (as a dummy example), if I were to do:

def create_my_network(features, num_classes):
    h1 = Dense(...)(features)
    h2 = Dense(...)(h1)
    h2_mr = cntk.reduce_mean(h2, axis=1)
    h3 = Dense(...)(h2_mr)
    ...

would that subtract the minibatch mean at hidden layer two or not?

If not, how can I achieve this?

Thank you kindly!
Petko

@mfuntowicz

Hi @pnpetkov,

The above code doesn't subtract the mean; as you can see, there is no subtraction operator inside the graph.

The following should do the job:

def create_my_network(features, num_classes):
    h1 = Dense(...)(features)
    h2 = Dense(...)(h1)
    h2_mr = h2 - cntk.reduce_mean(h2, axis=1)
    h3 = Dense(...)(h2_mr)

This way you're subtracting from the second dense layer's output (h2) the mean taken from that same layer, and then sending the result to the next layer's input (h3).

Morgan

@pnpetkov
Author

Thanks Morgan! Please forgive the "bug"; I was not focusing on the operation itself. My concern was with the data access.

I understand from your response that I do indeed get access to the minibatch data between two layers by operating on the second dimension.

@mfuntowicz

Glad it helped :)

@pnpetkov
Author

I am afraid that the problem persists! The code above didn't work, since the input to the network is a 1D tensor (effectively a bunch of features for ASR). Then, based on an example in the documentation, I tried using cntk.Axis.default_batch_axis. Here is the setup:

def create_my_network(features, num_classes):
    new_features = some_nonlinear_operation(features)  # This will contain optimizable parameters
    mean_sbtr_ftrs = new_features - cntk.reduce_mean(new_features, cntk.Axis.default_batch_axis())

    h1 = Dense(...)(mean_sbtr_ftrs)
    h2 = ...
    ...
    return hk

...........................................................
The error I get is as follows:
"RuntimeError: ReduceElements: operand Output('Pow6_Output_0', [#, *], [440]); Reduction along the batch axis on input sequence is currently unsupported."

So, is there a way to achieve what I need? Any ideas and possible workarounds are welcome.

Thank you!

@pnpetkov pnpetkov reopened this Oct 11, 2017
@mfuntowicz

mfuntowicz commented Oct 11, 2017

Does your 1D input tensor need the sequence axis? If not, you can remove the sequence axis from the input definition:

features = cntk.input_variable(shape, dynamic_axes=[cntk.Axis.default_batch_axis()])

This way features will only have one dynamic axis, the batch axis, and the above reduce_mean will work.
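
For example, here is a minimal self-contained sketch of this idea (the 440-dim shape and the variable names are just illustrative, and it assumes a CNTK version, 2.2 or later, in which reduce_mean over the batch axis is supported for non-sequence inputs):

import numpy as np
import cntk as C

feat_dim = 440  # illustrative input dimension

# Input with only the batch dynamic axis (no sequence axis)
features = C.input_variable(feat_dim, dynamic_axes=[C.Axis.default_batch_axis()])

# Subtract the across-minibatch mean from every sample in the minibatch
centered = features - C.reduce_mean(features, axis=C.Axis.default_batch_axis())

# Quick forward check on a dummy minibatch of 8 samples:
# the per-feature mean of the output should be (numerically) zero
data = np.random.randn(8, feat_dim).astype(np.float32)
print(np.abs(centered.eval({features: data}).mean(axis=0)).max())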

If you do need the sequence axis, that's a problem...

@pnpetkov
Author

Thanks Morgan! This is definitely getting a step closer. I can now pass the mean-subtraction stage in the network declaration and continue towards optimization. However, the trainer now fails. There are no error messages; it simply exits on the first minibatch/utterance optimization round. Could there be something else that needs to be set/reset so that the batch axis becomes the default dynamic axis?

Regarding use of the sequence axis, if I understand this correctly, it would be needed if I used multiple utterances in the minibatch, e.g., for a very large minibatch size. In my case I stick to a minibatch size that is smaller than or equal to the utterance length. As a result, the sequence axis is always equal to one, and I don't make any explicit use of it.

@mfuntowicz

mfuntowicz commented Oct 12, 2017

Could there be something else that needs to be set/reset so that the batch axis becomes the default dynamic axis?

The features variable definition already sets the batch axis as the only dynamic axis, so that direction should be fine.
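
If you want to double-check, you can inspect the dynamic axes directly (using the variable names from your earlier snippet):

print(features.dynamic_axes)               # should list only the batch axis
print(mean_sbtr_ftrs.output.dynamic_axes)  # and the same for the intermediate node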

it simply exits on the first minibatch/utterance optimization round

When you say "it exits", does the whole Python script exit? If so, what is the exit code?
Is it possible you're training for only one epoch?

Without any message it's quite hard to guess what's going on :(

@pnpetkov
Author

Indeed, it's hard to guess. Below is the exception I get, the key being in the first two lines. It seems that the sequence axis should somehow be eliminated completely, such that the value rank is at most one greater than the variable rank. For reference, [1 x 728 x 440] means one sequence/sentence, 728 frames in the batch, and 440 features (40 +/- 5x context). Any idea how I could get around this?

Value rank (3) should be larger than the Variable rank (1) at most by number of dynamic axes (1); Variable = 'Input('Input3', [#], [440])', Value shape = '[1 x 728 x 440]'.

[CALL STACK]
[0x7f34d17f99cc] + 0x63c9cc
[0x7f34d18c20e9] CNTK::Utils::VerifyVariableValueCompatibility (CNTK::Variable const&, std::shared_ptr<CNTK::Value> const&, CNTK::NDShape*) + 0xa09
[0x7f34d183c6ce] CNTK::CompositeFunction::InferFreeDimensionsOfArguments (std::unordered_map<CNTK::Variable,std::shared_ptr<CNTK::Value>,std::hash<CNTK::Variable>,std::equal_to<CNTK::Variable>,std::allocator<std::pair<CNTK::Variable const,std::shared_ptr<CNTK::Value>>>> const&) + 0x11e
[0x7f34d184117e] CNTK::CompositeFunction::Forward (std::unordered_map<CNTK::Variable,std::shared_ptr<CNTK::Value>,std::hash<CNTK::Variable>,std::equal_to<CNTK::Variable>,std::allocator<std::pair<CNTK::Variable const,std::shared_ptr<CNTK::Value>>>> const&, std::unordered_map<CNTK::Variable,std::shared_ptr<CNTK::Value>,std::hash<CNTK::Variable>,std::equal_to<CNTK::Variable>,std::allocator<std::pair<CNTK::Variable const,std::shared_ptr<CNTK::Value>>>>&, CNTK::DeviceDescriptor const&, std::unordered_set<CNTK::Variable,std::hash<CNTK::Variable>,std::equal_to<CNTK::Variable>,std::allocator<CNTK::Variable>> const&, std::unordered_set<CNTK::Variable,std::hash<CNTK::Variable>,std::equal_to<CNTK::Variable>,std::allocator<CNTK::Variable>> const&) + 0x46e
[0x7f34d1803891] CNTK::Function::Forward (std::unordered_map<CNTK::Variable,std::shared_ptr<CNTK::Value>,std::hash<CNTK::Variable>,std::equal_to<CNTK::Variable>,std::allocator<std::pair<CNTK::Variable const,std::shared_ptr<CNTK::Value>>>> const&, std::unordered_map<CNTK::Variable,std::shared_ptr<CNTK::Value>,std::hash<CNTK::Variable>,std::equal_to<CNTK::Variable>,std::allocator<std::pair<CNTK::Variable const,std::shared_ptr<CNTK::Value>>>>&, CNTK::DeviceDescriptor const&, std::unordered_set<CNTK::Variable,std::hash<CNTK::Variable>,std::equal_to<CNTK::Variable>,std::allocator<CNTK::Variable>> const&, std::unordered_set<CNTK::Variable,std::hash<CNTK::Variable>,std::equal_to<CNTK::Variable>,std::allocator<CNTK::Variable>> const&) + 0x81
[0x7f34d18b633d] CNTK::Trainer::ExecuteForwardBackward (std::unordered_map<CNTK::Variable,std::shared_ptr<CNTK::Value>,std::hash<CNTK::Variable>,std::equal_to<CNTK::Variable>,std::allocator<std::pair<CNTK::Variable const,std::shared_ptr<CNTK::Value>>>> const&, std::unordered_map<CNTK::Variable,std::shared_ptr<CNTK::Value>,std::hash<CNTK::Variable>,std::equal_to<CNTK::Variable>,std::allocator<std::pair<CNTK::Variable const,std::shared_ptr<CNTK::Value>>>>&, CNTK::DeviceDescriptor const&, std::unordered_map<CNTK::Variable,std::shared_ptr<CNTK::Value>,std::hash<CNTK::Variable>,std::equal_to<CNTK::Variable>,std::allocator<std::pair<CNTK::Variable const,std::shared_ptr<CNTK::Value>>>>&) + 0x24d
[0x7f34d18b6e83] CNTK::Trainer::TrainLocalMinibatch (std::unordered_map<CNTK::Variable,std::shared_ptr<CNTK::Value>,std::hash<CNTK::Variable>,std::equal_to<CNTK::Variable>,std::allocator<std::pair<CNTK::Variable const,std::shared_ptr<CNTK::Value>>>> const&, std::unordered_map<CNTK::Variable,std::shared_ptr<CNTK::Value>,std::hash<CNTK::Variable>,std::equal_to<CNTK::Variable>,std::allocator<std::pair<CNTK::Variable const,std::shared_ptr<CNTK::Value>>>>&, bool, CNTK::DeviceDescriptor const&) + 0xd3
[0x7f34d18b790b] CNTK::Trainer::TrainMinibatch (std::unordered_map<CNTK::Variable,CNTK::MinibatchData,std::hash<CNTK::Variable>,std::equal_to<CNTK::Variable>,std::allocator<std::pair<CNTK::Variable const,CNTK::MinibatchData>>> const&, std::unordered_map<CNTK::Variable,std::shared_ptr<CNTK::Value>,std::hash<CNTK::Variable>,std::equal_to<CNTK::Variable>,std::allocator<std::pair<CNTK::Variable const,std::shared_ptr<CNTK::Value>>>>&, CNTK::DeviceDescriptor const&) + 0x8b
[0x7f34d18b7ab0] CNTK::Trainer::TrainMinibatch (std::unordered_map<CNTK::Variable,CNTK::MinibatchData,std::hash<CNTK::Variable>,std::equal_to<CNTK::Variable>,std::allocator<std::pair<CNTK::Variable const,CNTK::MinibatchData>>> const&, CNTK::DeviceDescriptor const&) + 0xa0
[0x7f34d21e52d6] + 0x2172d6
[0x7f3503fc5bc2] PyEval_EvalFrameEx + 0x8992
[0x7f3503fc654e] PyEval_EvalCodeEx + 0x88e
[0x7f3503fc4022] PyEval_EvalFrameEx + 0x6df2
[0x7f3503fc654e] PyEval_EvalCodeEx + 0x88e
[0x7f3503fc4022] PyEval_EvalFrameEx + 0x6df2
[0x7f3503fc5450] PyEval_EvalFrameEx + 0x8220
[0x7f3503fc654e] PyEval_EvalCodeEx + 0x88e
[0x7f3503fc4022] PyEval_EvalFrameEx + 0x6df2
[0x7f3503fc654e] PyEval_EvalCodeEx + 0x88e
[0x7f3503fc661b] PyEval_EvalCode + 0x3b
[0x7f3503fb9baa] + 0x10ebaa
[0x7f3503fc5782] PyEval_EvalFrameEx + 0x8552
[0x7f3503fc654e] PyEval_EvalCodeEx + 0x88e
[0x7f3503fc4022] PyEval_EvalFrameEx + 0x6df2
[0x7f3503fc654e] PyEval_EvalCodeEx + 0x88e
[0x7f3503fc4022] PyEval_EvalFrameEx + 0x6df2
[0x7f3503fc654e] PyEval_EvalCodeEx + 0x88e
[0x7f3503fc661b] PyEval_EvalCode + 0x3b
[0x7f3503fe26c4] + 0x1376c4
[0x7f3503fe4c0d] PyRun_FileExFlags + 0x9d
[0x7f3503fe63d1] PyRun_SimpleFileExFlags + 0x111
[0x7f3503ffbdfc] Py_Main + 0xd8c
[0x400ad9] main + 0x169
[0x7f35031e4c05] __libc_start_main + 0xf5
[0x400b86]

@pnpetkov
Author

I still have not found a solution to my problem. Would it be possible to use a custom BatchNormalization layer without learnable parameters? The issue then is that I want to touch only the mean, not the variance.
Thank you!

@pnpetkov
Author

Perhaps I was looking at it from the wrong angle. Starting from #1855, my current approach is listed below. Can someone comment on caveats with this solution? Will it subtract the mean of the minibatch from each minibatch member at that particular point in the network? Thank you!

def create_my_network(features, num_classes):   # Network declaration

    ...                                          # Layer with optimizable parameters
    ...                                          # Layer with optimizable parameters

    intrm_rprs = ...                             # intermediate representation: 440x1

    # To subtract the mean of the minibatch (at this point in the network)
    seq_sum = cntk.sequence.reduce_sum(intrm_rprs)
    seq_cnt = cntk.sequence.reduce_sum(cntk.plus(cntk.element_times(intrm_rprs, 0), 1))  # number of elements in the sequence (contained by the minibatch)

    seq_cnt_rbr = cntk.sequence.broadcast_as(seq_cnt, intrm_rprs)   # re-broadcast to the desired size
    seq_sum_rbr = cntk.sequence.broadcast_as(seq_sum, intrm_rprs)

    ms_intrm_rprs = cntk.minus(intrm_rprs, cntk.element_divide(seq_sum_rbr, seq_cnt_rbr))

    ...                                          # more network layers

    return ...
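
For what it's worth, here is a small self-contained check of the same construction (the variable names and the 440-dim dummy data are illustrative, and it assumes a CNTK version, 2.2 or later, where cntk.sequence.input_variable is available). With one utterance per minibatch, as described above, the per-sequence mean and the minibatch mean coincide:

import numpy as np
import cntk as C

feat_dim = 440
x = C.sequence.input_variable(feat_dim)  # batch and sequence dynamic axes

# Same construction as above: per-sequence sum and element count ...
seq_sum = C.sequence.reduce_sum(x)
seq_cnt = C.sequence.reduce_sum(C.plus(C.element_times(x, 0), 1))

# ... broadcast both back along the sequence and subtract the mean
seq_cnt_rbr = C.sequence.broadcast_as(seq_cnt, x)
seq_sum_rbr = C.sequence.broadcast_as(seq_sum, x)
ms_x = C.minus(x, C.element_divide(seq_sum_rbr, seq_cnt_rbr))

# One dummy sequence of 728 frames; the per-feature mean over the frames
# of the output should come out (numerically) zero
data = [np.random.randn(728, feat_dim).astype(np.float32)]
out = ms_x.eval({x: data})[0]
print(np.abs(out.mean(axis=0)).max())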
