Mean aggregator now supports more than scalars. #595
Conversation
    self.numerator + numerator_acc,
    self.numerator)
initialization_updates = [(numerator_acc, tensor.zeros_like(numerator_acc)),
                          (denominator_acc, 0.0),
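For context, the accumulation scheme in this diff can be sketched outside Theano with NumPy (the variable names mirror the diff; the data and shapes are purely illustrative):

```python
import numpy as np

# Stand-in for the aggregator's state: the numerator may now be a tensor,
# while the denominator stays a scalar count of minibatches.
numerator_acc = np.zeros(3)   # same shape as the monitored quantity
denominator_acc = 0.0

# Two illustrative "minibatch" readouts of a length-3 monitored variable.
for numerator in [np.array([1.0, 2.0, 3.0]), np.array([3.0, 4.0, 5.0])]:
    numerator_acc = numerator_acc + numerator  # accumulation update
    denominator_acc = denominator_acc + 1.0    # one unit per minibatch

mean = numerator_acc / denominator_acc
print(mean)  # [2. 3. 4.]
```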
We could allow a tensor denominator by using tensor.zeros_like(denominator_acc), couldn't we? I can hardly imagine who might need it, but it is nice to have complete generality when it comes almost for free, I think.
Very nice PR! You will have to fix lots of code style issues reported here, though. A question: could we branch on the number of elements?
Thanks for the fast review! I'll try branching on the number of elements of the numerator, extending the denominator to tensors, and working on the style issues. Mohammed recommended a Sublime Text plugin that checks PEP 8, but I can't remember what it was. Do you know about it?
I was also thinking that it could be a good idea to change the default behaviour here: https://github.com/bartvm/blocks/blob/master/blocks/monitoring/evaluators.py#L148
Sure, good point! I am using …
I'm not sure exactly why, but using numerator_acc.shape.sum() instead of the extra variable gives different results when computing the mean of scalars. In my example, instead of giving 35, it gives 11.5. This means it's just taking the last value (divided by 2, the number of minibatches) and not computing the mean.
Sure, my bad! For a scalar, the shape always sums to zero. It seems an additional variable is indeed necessary.
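The scalar case is easy to check with NumPy, which uses the same empty-tuple shape convention for 0-d values:

```python
import numpy as np

scalar = np.array(35.0)
assert scalar.shape == ()      # a 0-d array has an empty shape tuple
assert sum(scalar.shape) == 0  # so summing the shape gives 0, not a usable count

vector = np.array([1.0, 2.0, 3.0])
assert sum(vector.shape) == 3  # only non-scalars give a nonzero value
```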
Could you also have a look at what @dwf suggested in PR #513? I think the proper way of implementing his idea would be to use the aggregators functionality, and I'd be willing to implement it. Would you suggest creating a new MeanAndVariance aggregator that deals with this, or would it be better to extend the Mean aggregator so that we don't repeat ourselves? Thanks for your comments!
Good job and thanks!
Sorry for raising this so late, but I have a concern about this work that should maybe be documented, or maybe even changed; I came across it while experimenting with the code. If you use minibatches of different sizes, e.g. when the last minibatch is smaller, you can get undesired results or an error. The code works well when the monitored quantity has the same shape for all minibatches, for instance when you monitor the per-minibatch mean of some quantity, such as each unit in a hidden layer. The problem arises when you monitor the units of the hidden layer directly (not a new Theano variable that corresponds to the mean). If the last minibatch doesn't have the same number of examples, two things can happen:
I think most users would prefer not to have to specify a new Theano variable (the minibatch mean) in their code. One approach I can think of is to assume that the first dimension of each variable's tensor corresponds to the example (minibatch) dimension and to perform the mean over that dimension as well. I think this should be discussed further, but at the very least it should be documented; I'd be willing to document it in the code if you find that appropriate. Sorry for not catching this before.
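A NumPy sketch of that idea, under the stated assumption that axis 0 is the example axis (the data here is illustrative):

```python
import numpy as np

# Three minibatches of 2-unit hidden activations; the last one is smaller.
batches = [np.ones((4, 2)), np.ones((4, 2)) * 3.0, np.ones((2, 2)) * 5.0]

numerator = np.zeros(2)   # per-unit running sum
denominator = 0           # running count of examples, not of batches

for batch in batches:
    numerator += batch.sum(axis=0)  # also reduce over the example axis
    denominator += batch.shape[0]   # weight each batch by its size

per_unit_mean = numerator / denominator
print(per_unit_mean)  # [2.6 2.6]
```

This handles the smaller final batch correctly because each batch contributes in proportion to its number of examples.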
Hold on, it does not make sense to aggregate a variable that has different shapes for different batch sizes. Maybe I am misunderstanding you; could you provide an example to illustrate what you mean?
Sorry for the delay in answering. What would be the best way to provide this example? Can I email you a commented script with examples of what I mean? Or maybe attach it to the Google Groups thread that started this discussion, so that a public record of it exists? I think I've found a solution, but it depends on assuming that the first dimension of the tensor corresponds to the minibatch axis, which might not always be the case (and it would not cover scalars, like the cost, which is usually computed as the mean over the minibatch).
You can just post the code here. It gets a bit lengthy, but it's worthwhile to have the code somewhere accessible to everyone (and GitHub does syntax highlighting, unlike Google Groups).
Alternatively, create a Gist and link to it here.
I hope this is useful. I would be happy to make the changes necessary for making this work well; sorry if I'm missing something trivial. In particular, what do you guys think of making the assumption that the first dimension corresponds to the minibatch axis?
Thanks for the example Jose. It seems to me the right way to implement what you want is the following:
That is, one should marginalize out the dimension that varies from batch to batch before attaching the aggregation scheme.
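Numerically, marginalizing out the batch axis first is safe only if the aggregator then weights each per-batch mean by its batch size; an unweighted average of per-batch means is biased when the last batch is smaller. A NumPy illustration (the data is made up):

```python
import numpy as np

batches = [np.array([1.0, 2.0, 3.0, 4.0]), np.array([10.0, 20.0])]

true_mean = np.concatenate(batches).mean()            # 40 / 6
naive = np.mean([b.mean() for b in batches])          # (2.5 + 15) / 2 = 8.75
weighted = (sum(b.mean() * len(b) for b in batches)
            / sum(len(b) for b in batches))           # recovers 40 / 6

print(true_mean, naive, weighted)  # the weighted form matches the true mean
```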
For tensor variables that seems to work well; I was suggesting changing that here: https://github.com/bartvm/blocks/blob/master/blocks/monitoring/evaluators.py#L145 However, there are two caveats:
We cannot change that in … Your error is weird, because it says that you have some variable equal to … If you are aggregating a scalar over a minibatch, then typically you first compute its value for all batch elements. That yields you a vector, e.g. …
This Theano PR solves that problem: Theano/Theano#2914.
With these modifications, the mean aggregator should be able to work with more than scalars. I tested it with vectors and it seems to work well. Further tests with matrices or higher-order tensors would be appropriate.
This issue is referred to here: Issue #132
I think there might be some issues with formatting, and I would really appreciate it if you could check the test I provided; it's the first time I've written a test like this. This is also my first contribution to Blocks, so it's quite likely that I'm missing something. I'd be very willing to make any suggested changes to improve this PR.
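For what it's worth, the kind of check such a test could make can be sketched with a NumPy reference implementation (this is only a stand-in, not the actual Blocks test harness):

```python
import numpy as np

def reference_mean(batches):
    """Reference result the aggregator should reproduce: the element-wise
    mean of the monitored quantity over minibatches."""
    numerator_acc = np.zeros_like(batches[0])
    denominator_acc = 0.0
    for batch in batches:
        numerator_acc = numerator_acc + batch  # accumulate the numerator
        denominator_acc += 1.0                 # count the minibatches
    return numerator_acc / denominator_acc

# Works unchanged for vectors, matrices, or higher-order tensors.
matrices = [np.ones((2, 2)), np.ones((2, 2)) * 3.0]
result = reference_mean(matrices)
assert np.allclose(result, 2.0)  # (1 + 3) / 2 element-wise
```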