
Mean aggregator now supports more than scalars. #595

Merged
merged 4 commits into mila-iqia:master from sotelo:fix_aggregation on Apr 26, 2015
Conversation

@sotelo (Member) commented Apr 24, 2015

With these modifications, the mean aggregator should be able to work with more than scalars. I tested it with vectors and it seems to work well. Maybe further tests with matrices or tensors would be appropriate.

The relevant issue is Issue #132.

I think there might be some issues with formatting, and I would REALLY appreciate it if you could check the test I provided; it's the first time I have written a test like this. This is also the first time I have contributed to Blocks, so it's quite likely that I'm missing something. I'd be very willing to make all the suggested changes to improve this PR.

    self.numerator + numerator_acc,
    self.numerator)
    initialization_updates = [(numerator_acc, tensor.zeros_like(numerator_acc)),
                              (denominator_acc, 0.0),
Review comment from a Contributor on the diff above:

We could allow a tensor denominator by using tensor.zeros_like(denominator_acc), couldn't we? I can hardly imagine who might need it, but it is nice to have complete generality when it comes almost for free, I think.
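
For illustration, a minimal sketch of that suggestion (the shared-variable setup is hypothetical; only the zeros_like reset mirrors the diff):

    import numpy
    import theano
    import theano.tensor as tensor

    # Hypothetical accumulator: a vector denominator rather than a scalar.
    denominator_acc = theano.shared(
        numpy.zeros((3,), dtype=theano.config.floatX))

    # Resetting with a matching tensor of zeros instead of the scalar 0.0
    # covers scalar, vector, and higher-rank accumulators alike:
    initialization_update = (denominator_acc,
                             tensor.zeros_like(denominator_acc))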

@rizar (Contributor) commented Apr 24, 2015

Very nice PR! You will have to fix lots of code style issues reported here though.

A question: could we branch on numerator_acc.shape.sum() instead? This way we would not have an additional shared variable. I guess there is no reason whatsoever for people to use shapes containing zeros in normal conditions.

@sotelo (Member, Author) commented Apr 24, 2015

Thanks for the fast review!

I'll try branching on the number of elements of the numerator, extending the denominator for tensors, and working on the style issues. Mohammed recommended a Sublime Text plugin to me that checks PEP 8, but I can't remember what it was. Do you know about it?

@sotelo (Member, Author) commented Apr 24, 2015

I was also thinking that it could be a good idea to change the default behaviour here:

https://github.com/bartvm/blocks/blob/master/blocks/monitoring/evaluators.py#L148

@rizar (Contributor) commented Apr 24, 2015

> I was also thinking that it could be a good idea to change the default behaviour here:

Sure, good point!

I am using flake8 (pip install flake8) to check the code style. Using a plugin sounds like a nice idea though :)

@sotelo (Member, Author) commented Apr 24, 2015

I'm not sure exactly why, but using numerator_acc.shape.sum() instead of the extra variable gives different results when computing the mean of scalars.

In the example I have, instead of giving 35, it gives 11.5 as the result. This means that it's just giving the last value (divided by 2, the number of minibatches) and not computing the mean.

@rizar (Contributor) commented Apr 24, 2015

Sure, my bad! For a scalar the shape always sums to zero.

Seems like an additional variable is indeed necessary.
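
A quick NumPy illustration of why a branch on the summed shape cannot work for scalars (Theano's symbolic shapes behave the same way):

    import numpy as np

    # A scalar has an empty shape, so its shape always sums to zero;
    # the branch could never tell a scalar accumulator apart from an
    # "uninitialized" one.
    scalar = np.array(35.0)
    assert scalar.shape == ()
    assert sum(scalar.shape) == 0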

@sotelo mentioned this pull request Apr 25, 2015
@sotelo (Member, Author) commented Apr 25, 2015

Could you also have a look at what @dwf suggested here:

PR #513

I think the proper way of implementing his idea would be to use the aggregator functionality, and I'd be willing to implement it. Would you suggest creating a new aggregation scheme, MeanAndVariance, that deals with this functionality? Or would it be better to replace the Mean aggregator so that we don't repeat ourselves?

Thanks for your comments!

@rizar (Contributor) commented Apr 25, 2015

I am quite happy with this PR and can merge it. Would you like to change the default there first?

As for #513, please give me some time to understand the context :)

rizar added a commit that referenced this pull request Apr 26, 2015
Mean aggregator now supports more than scalars.
@rizar rizar merged commit 003b2d6 into mila-iqia:master Apr 26, 2015
@rizar (Contributor) commented Apr 26, 2015

Good job and thanks!

@sotelo (Member, Author) commented Apr 26, 2015

Sorry for raising this so late, but I have a concern about this work that should perhaps be documented, or maybe even changed. I came across it while experimenting with the code. I think that if you use minibatches of different sizes, e.g. if the last minibatch is smaller than the others, you can get undesired results or an error.

First, this code works well if the monitored quantity has the same size for all minibatches. So it works well if you are monitoring something like the per-minibatch mean of some quantity, for example of each unit in a hidden layer. The problem comes when you monitor the units of the hidden layer directly instead (not a new Theano variable that corresponds to the mean). If the last minibatch doesn't have the same number of examples, two things can happen (see the NumPy sketch after this list):

  • If it only has 1 example, the broadcasting rules will cause that single example to be repeated several times when it is summed into the accumulated values.
  • If it has more than 1 example, but not the same number as the other minibatches, it will raise an "Input dimension mismatch" error.
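
A NumPy analogue of the two failure modes (shapes are hypothetical; Theano raises its dimension-mismatch error where NumPy raises ValueError):

    import numpy as np

    acc = np.zeros((10, 5))   # accumulator shaped like a full minibatch:
                              # 10 examples, 5 hidden units
    acc += np.ones((10, 5))   # regular minibatch: shapes match, fine

    acc += np.ones((1, 5))    # 1-example last minibatch: silently broadcasts,
                              # the single example is added to all 10 rows

    acc += np.ones((7, 5))    # 7-example last minibatch: raises ValueError,
                              # operands could not be broadcast together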

I think most users would prefer not to have to specify a new Theano variable (the mean over the minibatch) in their code. One approach I can think of to deal with this is to assume that the first dimension of the tensor that represents each variable corresponds to the examples dimension, and to perform the mean over that dimension as well. I think this should be discussed further, but at least it should be documented. I'd be willing to document this in the code if you find it appropriate.

Sorry for not catching this before.

@rizar (Contributor) commented Apr 26, 2015

Hold on, it does not make sense to aggregate a variable that has different shapes for different batch sizes. Maybe I am misunderstanding you; could you provide an example to illustrate what you mean?

@sotelo (Member, Author) commented Apr 27, 2015

Sorry for the delay in answering. What would be the best way to provide this example? Could I email you a commented script with examples of what I mean? Or maybe attach it to the Google Groups thread that started this discussion, so that a record of it is public?

I think I've found a solution, but it depends on assuming that the first dimension of the tensor corresponds to the minibatch axis, which might not always be the case (and it would not apply to scalars, like the cost, which is usually calculated as the mean over the minibatch).

@bartvm (Member) commented Apr 27, 2015

You can just post the code here. It gets a bit lengthy, but it's worthwhile to have the code somewhere accessible to everyone (and GitHub does syntax highlighting, unlike Google Groups).

@bartvm (Member) commented Apr 27, 2015

Alternatively, create a Gist and link to that here.

@sotelo (Member, Author) commented Apr 27, 2015

I hope this is useful:
Gist

I would be happy to make the changes necessary to make this work well. Sorry if I'm missing something trivial. In particular, what do you think of making the assumption that the first dimension corresponds to the num-examples dimension when aggregating tensors? For scalars, I think we should adjust for the batch size in some way. This could probably be done using the denominator argument; unfortunately, it is not done by default now.

@rizar (Contributor) commented Apr 27, 2015

Thanks for the example, Jose. It seems to me the right way to implement what you want is:

    y.tag.aggregation_scheme = Mean(y.sum(axis=0), y.shape[0])

That is, one should marginalize out the dimension that varies from batch to batch before attaching the aggregation scheme.
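
A self-contained sketch of that suggestion (assuming Mean is the aggregation scheme from blocks.monitoring.aggregation and that the first axis of y indexes examples):

    import theano.tensor as tensor
    from blocks.monitoring.aggregation import Mean

    y = tensor.matrix('y')  # axis 0 indexes the examples in the minibatch

    # Marginalize out the batch axis inside the scheme: accumulate per-unit
    # sums and the total number of examples seen so far.
    y.tag.aggregation_scheme = Mean(y.sum(axis=0), y.shape[0])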

@sotelo (Member, Author) commented Apr 27, 2015

For tensor variables that seems to work well. I was suggesting changing that here:

https://github.com/bartvm/blocks/blob/master/blocks/monitoring/evaluators.py#L145

However, there are two caveats:

  • I can run that with no issues on the CPU; however, on the GPU I get: TypeError: CudaNdarrayType only supports dtype float32 for now. Tried using dtype int64 for variable None. I think this refers to the y.shape[0] part (a possible workaround is sketched after this list).
  • Since the shape of scalars is not defined in the same way, something else should be done. Maybe using the shape of the inputs instead?
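
One untested workaround for the first caveat might be to cast the integer shape to floatX before it enters the GPU graph (an assumption, not the fix that was eventually adopted; see the Theano PR linked further below):

    import theano
    import theano.tensor as tensor

    y = tensor.matrix('y')
    # y.shape[0] has dtype int64; CudaNdarrayType only supports float32,
    # so cast the denominator explicitly:
    denominator = tensor.cast(y.shape[0], theano.config.floatX)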

@rizar (Contributor) commented Apr 27, 2015

We cannot change it in evaluators.py, because at that point it is not known which dimension stands for the index within the batch. There is no global convention in Blocks for that, not least because for RNN inputs it is typically the second dimension, while for feed-forward nets it is typically the first.

Your error is weird, because it says that you have some variable equal to None. Can you make a runnable gist that reproduces this error?

If you are aggregating a scalar over a minibatch, then typically you first compute its value for all batch elements. That yields a vector, e.g. example_costs, for which you can then create an aggregation scheme Mean(example_costs.sum(), example_costs.shape[0]).
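
A minimal sketch of that pattern (names are illustrative, again assuming Mean from blocks.monitoring.aggregation):

    import theano.tensor as tensor
    from blocks.monitoring.aggregation import Mean

    example_costs = tensor.vector('example_costs')  # one cost per example

    cost = example_costs.mean()
    cost.name = 'cost'
    # Sum the per-example costs and the example count across minibatches,
    # so the aggregated mean stays exact even with a smaller final minibatch:
    cost.tag.aggregation_scheme = Mean(example_costs.sum(),
                                       example_costs.shape[0])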

@rizar mentioned this pull request May 15, 2015
@sotelo (Member, Author) commented May 18, 2015

@rizar Sorry for the SUPER LATE reply.
In this gist, there's an example of the last issue I was explaining. Curiously enough, it gives different results on a CPU and a GPU: on the CPU it gives the correct result, but on the GPU there's an error.

@sotelo (Member, Author) commented May 18, 2015

Theano/Theano#2914 solves that problem.

@sotelo deleted the fix_aggregation branch October 27, 2015 03:08