
Mean aggregator now supports more than scalars. #595

Merged
merged 4 commits into mila-iqia:master from sotelo:fix_aggregation on Apr 26, 2015
Conversation

@sotelo (Member) commented Apr 24, 2015

With these modifications, the mean aggregator should be able to work with more than scalars. I tested it with vectors and it seems to work well. Maybe further tests with matrices or tensors would be appropriate.

The relevant issue is Issue #132.

I think there might be some issues with formatting, and I would REALLY appreciate it if you could check the test I provided; it's the first time I have written a test like this. This is also the first time I have contributed to Blocks, so it's quite likely that I'm missing something. I'd be very willing to make all the suggested changes to improve this PR.

    self.numerator + numerator_acc,
    self.numerator)
    initialization_updates = [(numerator_acc, tensor.zeros_like(numerator_acc)),
                              (denominator_acc, 0.0),
Review comment from a Contributor on the diff above:

We could allow a tensor denominator by using tensor.zeros_like(denominator_acc), couldn't we? I can hardly imagine who might need it, but it is nice to have complete generality when it comes almost for free, I think.
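
For illustration, a minimal sketch of that suggestion (the shared-variable setup is hypothetical; only the zeros_like reset mirrors the diff):

    import numpy
    import theano
    import theano.tensor as tensor

    # Hypothetical accumulator: a vector denominator rather than a scalar.
    denominator_acc = theano.shared(
        numpy.zeros((3,), dtype=theano.config.floatX))

    # Resetting with a matching tensor of zeros instead of the scalar 0.0
    # covers scalar, vector, and higher-rank accumulators alike:
    initialization_update = (denominator_acc,
                             tensor.zeros_like(denominator_acc))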

@rizar (Contributor) commented Apr 24, 2015

Very nice PR! You will have to fix lots of code style issues reported here though.

A question: could we branch on numerator_acc.shape.sum() instead? This way we would not have an additional shared variable. I guess there is no reason whatsoever for people to use shapes containing zeros in normal conditions.

@sotelo (Member, Author) commented Apr 24, 2015

Thanks for the fast review!

I'll try branching on the number of elements of the numerator, extending the denominator for tensors, and working on the style issues. Mohammed recommended a Sublime Text plugin to me that checks PEP 8, but I can't remember what it was. Do you know about it?

@sotelo (Member, Author) commented Apr 24, 2015

I was also thinking that it could be a good idea to change the default behaviour here:

https://github.com/bartvm/blocks/blob/master/blocks/monitoring/evaluators.py#L148

@rizar (Contributor) commented Apr 24, 2015

> I was also thinking that it could be a good idea to change the default behaviour here:

Sure, good point!

I am using flake8 (pip install flake8) to check the code style. Using a plugin sounds like a nice idea though :)

@sotelo (Member, Author) commented Apr 24, 2015

I'm not sure exactly why, but using numerator_acc.shape.sum() instead of the extra variable gives different results when computing the mean of scalars.

In the example I have, instead of giving 35, it gives 11.5 as the result. This means that it's just giving the last value (divided by 2, the number of minibatches) and not computing the mean.

@rizar (Contributor) commented Apr 24, 2015

Sure, my bad! For a scalar the shape always sums to zero.

Seems like an additional variable is indeed necessary.
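
A quick NumPy illustration of why a branch on the summed shape cannot work for scalars (Theano's symbolic shapes behave the same way):

    import numpy as np

    # A scalar has an empty shape, so its shape always sums to zero;
    # the branch could never tell a scalar accumulator apart from an
    # "uninitialized" one.
    scalar = np.array(35.0)
    assert scalar.shape == ()
    assert sum(scalar.shape) == 0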

@sotelo mentioned this pull request Apr 25, 2015
@sotelo (Member, Author) commented Apr 25, 2015

Could you also have a look at what @dwf suggested here:

PR #513

I think the proper way of implementing his idea would be to use the aggregator functionality, and I'd be willing to implement it. Would you suggest creating a new aggregation scheme, MeanAndVariance, that deals with this functionality? Or would it be better to replace the Mean aggregator so that we don't repeat ourselves?

Thanks for your comments!

@rizar (Contributor) commented Apr 25, 2015

I am quite happy with this PR and can merge it. Would you like to change the default there first?

As for #513, please give me some time to understand the context :)

rizar added a commit that referenced this pull request Apr 26, 2015
Mean aggregator now supports more than scalars.
@rizar rizar merged commit 003b2d6 into mila-iqia:master Apr 26, 2015
@rizar (Contributor) commented Apr 26, 2015

Good job and thanks!

@sotelo (Member, Author) commented Apr 26, 2015

Sorry for raising this so late, but I have a concern about this work that should perhaps be documented, or maybe even changed. I came across it while experimenting with the code. I think that if you use minibatches of different sizes, e.g. if the last minibatch is smaller than the others, you can get undesired results or an error.

First, this code works well if the monitored quantity has the same size for all minibatches. So it works well if you are monitoring something like the per-minibatch mean of some quantity, for example of each unit in a hidden layer. The problem comes when you monitor the units of the hidden layer directly instead (not a new Theano variable that corresponds to the mean). If the last minibatch doesn't have the same number of examples, two things can happen (see the NumPy sketch after this list):

  • If it only has 1 example, the broadcasting rules will cause that single example to be repeated several times when it is summed into the accumulated values.
  • If it has more than 1 example, but not the same number as the other minibatches, it will raise an "Input dimension mismatch" error.
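
A NumPy analogue of the two failure modes (shapes are hypothetical; Theano raises its dimension-mismatch error where NumPy raises ValueError):

    import numpy as np

    acc = np.zeros((10, 5))   # accumulator shaped like a full minibatch:
                              # 10 examples, 5 hidden units
    acc += np.ones((10, 5))   # regular minibatch: shapes match, fine

    acc += np.ones((1, 5))    # 1-example last minibatch: silently broadcasts,
                              # the single example is added to all 10 rows

    acc += np.ones((7, 5))    # 7-example last minibatch: raises ValueError,
                              # operands could not be broadcast together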

I think most users would prefer not to have to specify a new Theano variable (the mean over the minibatch) in their code. One approach I can think of to deal with this is to assume that the first dimension of the tensor that represents each variable corresponds to the examples dimension, and to perform the mean over that dimension as well. I think this should be discussed further, but at least it should be documented. I'd be willing to document this in the code if you find it appropriate.

Sorry for not catching this before.

@rizar (Contributor) commented Apr 26, 2015

Hold on, it does not make sense to aggregate a variable that has different shapes for different batch sizes. Maybe I am misunderstanding you; could you provide an example to illustrate what you mean?

@sotelo (Member, Author) commented Apr 27, 2015

Sorry for the delay in answering. What would be the best way to provide this example? Could I email you a commented script with examples of what I mean? Or maybe attach it to the Google Groups thread that started this discussion, so that a record of it is public?

I think I've found a solution, but it depends on assuming that the first dimension of the tensor corresponds to the minibatch axis, which might not always be the case (and it would not apply to scalars, like the cost, which is usually calculated as the mean over the minibatch).

@bartvm (Member) commented Apr 27, 2015

You can just post the code here. It gets a bit lengthy, but it's worthwhile to have the code somewhere accessible to everyone (and GitHub does syntax highlighting, unlike Google Groups).

@bartvm (Member) commented Apr 27, 2015

Alternatively, create a Gist and link to that here.

@sotelo (Member, Author) commented Apr 27, 2015

I hope this is useful:
Gist

I would be happy to make the changes necessary to make this work well. Sorry if I'm missing something trivial. In particular, what do you think of making the assumption that the first dimension corresponds to the num-examples dimension when aggregating tensors? For scalars, I think we should adjust for the batch size in some way. This could probably be done using the denominator argument; unfortunately, it is not done by default now.

@rizar (Contributor) commented Apr 27, 2015

Thanks for the example, Jose. It seems to me the right way to implement what you want is:

    y.tag.aggregation_scheme = Mean(y.sum(axis=0), y.shape[0])

That is, one should marginalize out the dimension that varies from batch to batch before attaching the aggregation scheme.
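
A self-contained sketch of that suggestion (assuming Mean is the aggregation scheme from blocks.monitoring.aggregation and that the first axis of y indexes examples):

    import theano.tensor as tensor
    from blocks.monitoring.aggregation import Mean

    y = tensor.matrix('y')  # axis 0 indexes the examples in the minibatch

    # Marginalize out the batch axis inside the scheme: accumulate per-unit
    # sums and the total number of examples seen so far.
    y.tag.aggregation_scheme = Mean(y.sum(axis=0), y.shape[0])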

@sotelo (Member, Author) commented Apr 27, 2015

For tensor variables that seems to work well. I was suggesting changing that here:

https://github.com/bartvm/blocks/blob/master/blocks/monitoring/evaluators.py#L145

However, there are two caveats:

  • I can run that with no issues on the CPU; however, on the GPU I get: TypeError: CudaNdarrayType only supports dtype float32 for now. Tried using dtype int64 for variable None. I think this refers to the y.shape[0] part (a possible workaround is sketched after this list).
  • Since the shape of scalars is not defined in the same way, something else should be done. Maybe using the shape of the inputs instead?
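
One untested workaround for the first caveat might be to cast the integer shape to floatX before it enters the GPU graph (an assumption, not the fix that was eventually adopted; see the Theano PR linked further below):

    import theano
    import theano.tensor as tensor

    y = tensor.matrix('y')
    # y.shape[0] has dtype int64; CudaNdarrayType only supports float32,
    # so cast the denominator explicitly:
    denominator = tensor.cast(y.shape[0], theano.config.floatX)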

@rizar (Contributor) commented Apr 27, 2015

We cannot change it in evaluators.py, because at that point it is not known which dimension stands for the index within the batch. There is no global convention in Blocks for that, not least because for RNN inputs it is typically the second dimension, while for feed-forward nets it is typically the first.

Your error is weird, because it says that you have some variable equal to None. Can you make a runnable gist that reproduces this error?

If you are aggregating a scalar over a minibatch, then typically you first compute its value for all batch elements. That yields a vector, e.g. example_costs, for which you can then create an aggregation scheme Mean(example_costs.sum(), example_costs.shape[0]).
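
A minimal sketch of that pattern (names are illustrative, again assuming Mean from blocks.monitoring.aggregation):

    import theano.tensor as tensor
    from blocks.monitoring.aggregation import Mean

    example_costs = tensor.vector('example_costs')  # one cost per example

    cost = example_costs.mean()
    cost.name = 'cost'
    # Sum the per-example costs and the example count across minibatches,
    # so the aggregated mean stays exact even with a smaller final minibatch:
    cost.tag.aggregation_scheme = Mean(example_costs.sum(),
                                       example_costs.shape[0])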

@rizar mentioned this pull request May 15, 2015
@sotelo (Member, Author) commented May 18, 2015

@rizar Sorry for the SUPER LATE reply.
In this gist, there's an example of the last issue I was explaining. Curiously enough, it gives different results on a CPU and a GPU: on the CPU it gives the correct result, but on the GPU there's an error.

@sotelo (Member, Author) commented May 18, 2015

Theano/Theano#2914 solves that problem.

@sotelo deleted the fix_aggregation branch October 27, 2015 03:08