Mean subtraction inside the network declaration (Python) #2465

Open
pnpetkov opened this issue Oct 9, 2017 · 10 comments

@pnpetkov

pnpetkov commented Oct 9, 2017

Going back to a previous question of mine (#2192), I need to subtract the minibatch mean between two network layers, i.e., the value will change as the optimization progresses.

The problem is that I don't understand whether I can operate on the minibatch axis from inside the network declaration. So (as a dummy example), if I were to do:

def create_my_network(features, num_classes):
    h1 = Dense(...)(features)
    h2 = Dense(...)(h1)
    h2_mr = cntk.reduce_mean(h2, axis=1)
    h3 = Dense(...)(h2_mr)
    ...

would that subtract the minibatch mean at hidden layer two or not?

If not, how can I achieve this?

Thank you kindly!
Petko

@mfuntowicz

Hi @pnpetkov,

The above code doesn't subtract the mean; as you can see, there is no subtraction operator inside the graph.

The following should do the job:

def create_my_network(features, num_classes):
    h1 = Dense(...)(features)
    h2 = Dense(...)(h1)
    h2_mr = h2 - cntk.reduce_mean(h2, axis=1)
    h3 = Dense(...)(h2_mr)

This way you're subtracting from the second dense layer's output (h2) the mean taken from that same layer, and then sending the result to the next layer's input (h3).

Morgan

@pnpetkov
Author

Thanks Morgan! Please forgive the "bug"; I was not focusing on the operation itself. My concern was with the data access.

I understand from your response that I do indeed get access to the minibatch data between two layers by operating on the second dimension.

@mfuntowicz

Glad it helped :)

@pnpetkov
Author

I am afraid that the problem persists! The code above didn't work, since the input to the network is a 1D tensor (effectively a bunch of features for ASR). Then, based on an example in the documentation, I tried using cntk.Axis.default_batch_axis. Here is the setup:

def create_my_network(features, num_classes):
    new_features = some_nonlinear_operation(features)  # This will contain optimizable parameters
    mean_sbtr_ftrs = new_features - cntk.reduce_mean(new_features, cntk.Axis.default_batch_axis())

    h1 = Dense(...)(mean_sbtr_ftrs)
    h2 = ...
    ...
    return hk

...........................................................
The error I get is as follows:
"RuntimeError: ReduceElements: operand Output('Pow6_Output_0', [#, *], [440]); Reduction along the batch axis on input sequence is currently unsupported."

So, is there a way to achieve what I need? Any ideas and possible workarounds are welcome.

Thank you!

@pnpetkov pnpetkov reopened this Oct 11, 2017
@mfuntowicz

mfuntowicz commented Oct 11, 2017

Does your 1D input tensor need the sequence axis? If not, you can remove the sequence axis from the input definition:

features = cntk.input_variable(shape, dynamic_axes=[cntk.Axis.default_batch_axis()])

This way features will only have one dynamic axis, the batch axis, and the above reduce_mean will work.
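
For example, here is a minimal self-contained sketch of this idea (the 440-dim shape and the variable names are just illustrative, and it assumes a CNTK version, 2.2 or later, in which reduce_mean over the batch axis is supported for non-sequence inputs):

import numpy as np
import cntk as C

feat_dim = 440  # illustrative input dimension

# Input with only the batch dynamic axis (no sequence axis)
features = C.input_variable(feat_dim, dynamic_axes=[C.Axis.default_batch_axis()])

# Subtract the across-minibatch mean from every sample in the minibatch
centered = features - C.reduce_mean(features, axis=C.Axis.default_batch_axis())

# Quick forward check on a dummy minibatch of 8 samples:
# the per-feature mean of the output should be (numerically) zero
data = np.random.randn(8, feat_dim).astype(np.float32)
print(np.abs(centered.eval({features: data}).mean(axis=0)).max())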

If you do need the sequence axis, that's a problem...

@pnpetkov
Author

Thanks Morgan! This is definitely getting a step closer. I can now pass the mean-subtraction stage in the network declaration and continue towards optimization. However, the trainer now fails. There are no error messages; it simply exits on the first minibatch/utterance optimization round. Could there be something else that needs to be set/reset so that the batch axis becomes the default dynamic axis?

Regarding use of the sequence axis, if I understand this correctly, it would be needed if I used multiple utterances in the minibatch, e.g., for a very large minibatch size. In my case I stick to a minibatch size that is smaller than or equal to the utterance length. As a result, the sequence axis is always equal to one, and I don't make any explicit use of it.

@mfuntowicz

mfuntowicz commented Oct 12, 2017

Could there be something else that needs to be set/reset so that the batch axis becomes the default dynamic axis?

The features variable definition already sets the batch axis as the only dynamic axis, so that direction should be fine.
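
If you want to double-check, you can inspect the dynamic axes directly (using the variable names from your earlier snippet):

print(features.dynamic_axes)               # should list only the batch axis
print(mean_sbtr_ftrs.output.dynamic_axes)  # and the same for the intermediate node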

it simply exits on the first minibatch/utterance optimization round

When you say "it exits", does the whole Python script exit? If so, what is the exit code?
Is it possible you're training for only one epoch?

Without any message it's quite hard to guess what's going on :(

@pnpetkov
Author

Indeed, it's hard to guess. Below is the exception I get, the key being in the first two lines. It seems that the sequence axis should somehow be eliminated completely, such that the value rank is at most one greater than the variable rank. For reference, [1 x 728 x 440] means one sequence/sentence, 728 frames in the batch, and 440 features (40 +/- 5x context). Any idea how I could get around this?

Value rank (3) should be larger than the Variable rank (1) at most by number of dynamic axes (1); Variable = 'Input('Input3', [#], [440])', Value shape = '[1 x 728 x 440]'.

[CALL STACK]
[0x7f34d17f99cc] + 0x63c9cc
[0x7f34d18c20e9] CNTK::Utils::VerifyVariableValueCompatibility (CNTK::Variable const&, std::shared_ptr<CNTK::Value> const&, CNTK::NDShape*) + 0xa09
[0x7f34d183c6ce] CNTK::CompositeFunction::InferFreeDimensionsOfArguments (std::unordered_map<CNTK::Variable,std::shared_ptr<CNTK::Value>,std::hash<CNTK::Variable>,std::equal_to<CNTK::Variable>,std::allocator<std::pair<CNTK::Variable const,std::shared_ptr<CNTK::Value>>>> const&) + 0x11e
[0x7f34d184117e] CNTK::CompositeFunction::Forward (std::unordered_map<CNTK::Variable,std::shared_ptr<CNTK::Value>,std::hash<CNTK::Variable>,std::equal_to<CNTK::Variable>,std::allocator<std::pair<CNTK::Variable const,std::shared_ptr<CNTK::Value>>>> const&, std::unordered_map<CNTK::Variable,std::shared_ptr<CNTK::Value>,std::hash<CNTK::Variable>,std::equal_to<CNTK::Variable>,std::allocator<std::pair<CNTK::Variable const,std::shared_ptr<CNTK::Value>>>>&, CNTK::DeviceDescriptor const&, std::unordered_set<CNTK::Variable,std::hash<CNTK::Variable>,std::equal_to<CNTK::Variable>,std::allocator<CNTK::Variable>> const&, std::unordered_set<CNTK::Variable,std::hash<CNTK::Variable>,std::equal_to<CNTK::Variable>,std::allocator<CNTK::Variable>> const&) + 0x46e
[0x7f34d1803891] CNTK::Function::Forward (std::unordered_map<CNTK::Variable,std::shared_ptr<CNTK::Value>,std::hash<CNTK::Variable>,std::equal_to<CNTK::Variable>,std::allocator<std::pair<CNTK::Variable const,std::shared_ptr<CNTK::Value>>>> const&, std::unordered_map<CNTK::Variable,std::shared_ptr<CNTK::Value>,std::hash<CNTK::Variable>,std::equal_to<CNTK::Variable>,std::allocator<std::pair<CNTK::Variable const,std::shared_ptr<CNTK::Value>>>>&, CNTK::DeviceDescriptor const&, std::unordered_set<CNTK::Variable,std::hash<CNTK::Variable>,std::equal_to<CNTK::Variable>,std::allocator<CNTK::Variable>> const&, std::unordered_set<CNTK::Variable,std::hash<CNTK::Variable>,std::equal_to<CNTK::Variable>,std::allocator<CNTK::Variable>> const&) + 0x81
[0x7f34d18b633d] CNTK::Trainer::ExecuteForwardBackward (std::unordered_map<CNTK::Variable,std::shared_ptr<CNTK::Value>,std::hash<CNTK::Variable>,std::equal_to<CNTK::Variable>,std::allocator<std::pair<CNTK::Variable const,std::shared_ptr<CNTK::Value>>>> const&, std::unordered_map<CNTK::Variable,std::shared_ptr<CNTK::Value>,std::hash<CNTK::Variable>,std::equal_to<CNTK::Variable>,std::allocator<std::pair<CNTK::Variable const,std::shared_ptr<CNTK::Value>>>>&, CNTK::DeviceDescriptor const&, std::unordered_map<CNTK::Variable,std::shared_ptr<CNTK::Value>,std::hash<CNTK::Variable>,std::equal_to<CNTK::Variable>,std::allocator<std::pair<CNTK::Variable const,std::shared_ptr<CNTK::Value>>>>&) + 0x24d
[0x7f34d18b6e83] CNTK::Trainer::TrainLocalMinibatch (std::unordered_map<CNTK::Variable,std::shared_ptr<CNTK::Value>,std::hash<CNTK::Variable>,std::equal_to<CNTK::Variable>,std::allocator<std::pair<CNTK::Variable const,std::shared_ptr<CNTK::Value>>>> const&, std::unordered_map<CNTK::Variable,std::shared_ptr<CNTK::Value>,std::hash<CNTK::Variable>,std::equal_to<CNTK::Variable>,std::allocator<std::pair<CNTK::Variable const,std::shared_ptr<CNTK::Value>>>>&, bool, CNTK::DeviceDescriptor const&) + 0xd3
[0x7f34d18b790b] CNTK::Trainer::TrainMinibatch (std::unordered_map<CNTK::Variable,CNTK::MinibatchData,std::hash<CNTK::Variable>,std::equal_to<CNTK::Variable>,std::allocator<std::pair<CNTK::Variable const,CNTK::MinibatchData>>> const&, std::unordered_map<CNTK::Variable,std::shared_ptr<CNTK::Value>,std::hash<CNTK::Variable>,std::equal_to<CNTK::Variable>,std::allocator<std::pair<CNTK::Variable const,std::shared_ptr<CNTK::Value>>>>&, CNTK::DeviceDescriptor const&) + 0x8b
[0x7f34d18b7ab0] CNTK::Trainer::TrainMinibatch (std::unordered_map<CNTK::Variable,CNTK::MinibatchData,std::hash<CNTK::Variable>,std::equal_to<CNTK::Variable>,std::allocator<std::pair<CNTK::Variable const,CNTK::MinibatchData>>> const&, CNTK::DeviceDescriptor const&) + 0xa0
[0x7f34d21e52d6] + 0x2172d6
[0x7f3503fc5bc2] PyEval_EvalFrameEx + 0x8992
[0x7f3503fc654e] PyEval_EvalCodeEx + 0x88e
[0x7f3503fc4022] PyEval_EvalFrameEx + 0x6df2
[0x7f3503fc654e] PyEval_EvalCodeEx + 0x88e
[0x7f3503fc4022] PyEval_EvalFrameEx + 0x6df2
[0x7f3503fc5450] PyEval_EvalFrameEx + 0x8220
[0x7f3503fc654e] PyEval_EvalCodeEx + 0x88e
[0x7f3503fc4022] PyEval_EvalFrameEx + 0x6df2
[0x7f3503fc654e] PyEval_EvalCodeEx + 0x88e
[0x7f3503fc661b] PyEval_EvalCode + 0x3b
[0x7f3503fb9baa] + 0x10ebaa
[0x7f3503fc5782] PyEval_EvalFrameEx + 0x8552
[0x7f3503fc654e] PyEval_EvalCodeEx + 0x88e
[0x7f3503fc4022] PyEval_EvalFrameEx + 0x6df2
[0x7f3503fc654e] PyEval_EvalCodeEx + 0x88e
[0x7f3503fc4022] PyEval_EvalFrameEx + 0x6df2
[0x7f3503fc654e] PyEval_EvalCodeEx + 0x88e
[0x7f3503fc661b] PyEval_EvalCode + 0x3b
[0x7f3503fe26c4] + 0x1376c4
[0x7f3503fe4c0d] PyRun_FileExFlags + 0x9d
[0x7f3503fe63d1] PyRun_SimpleFileExFlags + 0x111
[0x7f3503ffbdfc] Py_Main + 0xd8c
[0x400ad9] main + 0x169
[0x7f35031e4c05] __libc_start_main + 0xf5
[0x400b86]

@pnpetkov
Author

I still have not found a solution to my problem. Would it be possible to use a custom BatchNormalization layer without learnable parameters? The issue then is that I want to touch only the mean, not the variance.
Thank you!

@pnpetkov
Author

Perhaps I was looking at it from the wrong angle. Starting from #1855, my current approach is listed below. Can someone comment on caveats with this solution? Will it subtract the mean of the minibatch from each minibatch member at that particular point in the network? Thank you!

def create_my_network(features, num_classes):   # Network declaration

    ...                                          # Layer with optimizable parameters
    ...                                          # Layer with optimizable parameters

    intrm_rprs = ...                             # intermediate representation: 440x1

    # To subtract the mean of the minibatch (at this point in the network)
    seq_sum = cntk.sequence.reduce_sum(intrm_rprs)
    seq_cnt = cntk.sequence.reduce_sum(cntk.plus(cntk.element_times(intrm_rprs, 0), 1))  # number of elements in the sequence (contained by the minibatch)

    seq_cnt_rbr = cntk.sequence.broadcast_as(seq_cnt, intrm_rprs)   # re-broadcast to the desired size
    seq_sum_rbr = cntk.sequence.broadcast_as(seq_sum, intrm_rprs)

    ms_intrm_rprs = cntk.minus(intrm_rprs, cntk.element_divide(seq_sum_rbr, seq_cnt_rbr))

    ...                                          # more network layers

    return ...
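
For what it's worth, here is a small self-contained check of the same construction (the variable names and the 440-dim dummy data are illustrative, and it assumes a CNTK version, 2.2 or later, where cntk.sequence.input_variable is available). With one utterance per minibatch, as described above, the per-sequence mean and the minibatch mean coincide:

import numpy as np
import cntk as C

feat_dim = 440
x = C.sequence.input_variable(feat_dim)  # batch and sequence dynamic axes

# Same construction as above: per-sequence sum and element count ...
seq_sum = C.sequence.reduce_sum(x)
seq_cnt = C.sequence.reduce_sum(C.plus(C.element_times(x, 0), 1))

# ... broadcast both back along the sequence and subtract the mean
seq_cnt_rbr = C.sequence.broadcast_as(seq_cnt, x)
seq_sum_rbr = C.sequence.broadcast_as(seq_sum, x)
ms_x = C.minus(x, C.element_divide(seq_sum_rbr, seq_cnt_rbr))

# One dummy sequence of 728 frames; the per-feature mean over the frames
# of the output should come out (numerically) zero
data = [np.random.randn(728, feat_dim).astype(np.float32)]
out = ms_x.eval({x: data})[0]
print(np.abs(out.mean(axis=0)).max())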
