Transformer #361
Conversation
I cleaned up the docs and added some very simple tests.
Thanks. I'll wait for #364 to be merged.
Merged! If you want, you can rebase your branch now: […]
And no doc errors, except for my own...
```
The network that calculates the parameters of the affine
transformation. See the example for how to initialize to the identity
transform. Note that all parameters are clipped to lie in the range
``[-1, 1]``, similar to [1]_.
```
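For context, "initialize to the identity transform" usually means the final layer of the localization network starts out predicting the 2x3 identity matrix. A minimal sketch of that idea (the input shape and the names `l_in` / `l_loc` are assumptions for illustration, not the PR's own example):

```python
import numpy as np
import lasagne
from lasagne.layers import InputLayer, DenseLayer

# Localization network that initially predicts the identity transform:
# zero weights, and a bias holding the flattened 2x3 identity matrix,
# so training starts from "no transformation".
l_in = InputLayer((None, 1, 28, 28))
b = np.zeros((2, 3), dtype='float32')
b[0, 0] = 1.0
b[1, 1] = 1.0
l_loc = DenseLayer(l_in, num_units=6,
                   W=lasagne.init.Constant(0.0),
                   b=b.flatten(),
                   nonlinearity=None)
```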
Does that mean you're clipping the values in the code? Wouldn't it be more efficient to just require the `localization_network` to output values in [-1, 1] in the first place?
Yes, you could do that. But if you happen to output values outside of -1/+1, I think you would get indexing errors, which are pretty hard to debug.
But I'm very open to suggestions. Not sure if I have done it in the best way.
Maybe an option `clip_transform=True` that clips the transformation parameters when activated (which should be the default), but that can be disabled for improved performance if your network has a tanh output, for example.
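A sketch of the "tanh output" case mentioned here (the layer names and the 128-unit input size are assumptions; the `clip_transform` option itself was deferred in this PR and is not part of it):

```python
from lasagne.layers import InputLayer, DenseLayer
from lasagne.nonlinearities import tanh

# With a tanh nonlinearity on the final layer, the six affine parameters
# are already bounded in (-1, 1), so an extra clipping step would add
# nothing in this case.
l_loc_in = InputLayer((None, 128))  # assumed flattened feature input
l_loc_out = DenseLayer(l_loc_in, num_units=6, nonlinearity=tanh)
```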
I'm not sure, but I think the performance drop from using clipping is negligible?
Could be. Let's just leave that for later (i.e., a future PR, if it turns out to be worth it)!
Just realized that what I have said about the range of the affine parameters was not correct. The parameters are free to take any value; the clipping is there to ensure that we do not look up illegal indices. I removed the references to [-1, +1] from the documentation.
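To illustrate the distinction (a hypothetical sketch with illustrative names, not the PR's actual code): the affine parameters stay unconstrained, and only the normalized sampling coordinates are clipped to [-1, 1] before being mapped to pixel indices, so out-of-bounds lookups cannot occur.

```python
import theano.tensor as T

def _clip_sampling_grid(x_s, y_s):
    # Clip the normalized sampling coordinates, not the affine parameters,
    # so that the subsequent index computation stays inside the image.
    x_s = T.clip(x_s, -1.0, 1.0)
    y_s = T.clip(y_s, -1.0, 1.0)
    return x_s, y_s
```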
```python
def get_output_shape_for(self, input_shapes):
    shp = input_shapes[0]
    dsf = self.downsample_factor
    return shp[:2] + tuple(int(s//dsf) for s in shp[2:])
```
Just realized that this won't work with `None`s. You'd need:

```python
return (shp[:2] + tuple(None if s is None
                        else int(s//dsf) for s in shp[2:]))
```
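Putting the two snippets together, the corrected method could read as follows (a sketch combining the original code with the suggested fix, not necessarily the code as finally merged):

```python
def get_output_shape_for(self, input_shapes):
    # First input is the image; batch size and spatial sizes may be None
    # when they are not known at compile time.
    shp = input_shapes[0]
    dsf = self.downsample_factor
    # Keep batch and channel dimensions, floor-divide the spatial ones,
    # and propagate None for dimensions that are unknown.
    return shp[:2] + tuple(None if s is None else int(s // dsf)
                           for s in shp[2:])
```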
And maybe we shouldn't use integer division if we allow floats for downsample factors?
For non-integer downsample factors I just floor the output shapes. Is that the question?
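A quick illustration of that flooring, assuming the expression from the snippet above (the concrete sizes are just examples):

```python
>>> int(28 // 1.5)   # non-integer factor: the spatial size is floored
18
>>> int(28 // 2)     # integer factor behaves as before
14
```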
Great, I like your documentation extensions! Just two minor things, and then I'd be all for squashing and merging!
I think this is done, if anyone else wants to have a look? I would like to test the code on MNIST before I merge it. I have used the code and it worked fine, but just in case I have messed something up :)
```
input : a :class:`Layer` instance
    The input where the affine transformation is applied. This should
    have convolution format, i.e. (num_batch, channels, height, width).
localization_network: : a :class:`Layer` instance
```
There's one colon too many (the one in `localization_network:`). Didn't see that before, sorry. Guess it might confuse Sphinx. Feel free to amend your commit if you have the time, otherwise we can leave that for some general docstring proofreading sometime.
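For readers of the docstring excerpt above, wiring the two documented inputs together might look like this (a sketch; `l_in` and `l_loc` are the hypothetical layers from the earlier snippet, and the downsample factor is just an example value):

```python
from lasagne.layers import TransformerLayer

# The image input and the localization network are passed as the two
# arguments; the output is the transformed (and here downsampled) image.
l_trans = TransformerLayer(l_in, l_loc, downsample_factor=2.0)
```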
Looks good except for one minor thing I've overlooked before, sorry!
When you're testing again on MNIST, could you try replacing the integer […]? Let me know when you've tested on MNIST and are fine with this PR to be merged!
I fixed the last docstring error. Using […]. I'll get back when I have run the MNIST test. I'll probably add a recipe to reproduce the results in the paper.
Cool! Feel free to commit / squash / push so Travis can run already. (Just to be sure, note that there are two occurrences of […].)
Any news? Can we merge this today, or do you still want to test something?
No. I'm still in Montreal, but I'm going home today. I'll be able to test it over the weekend.
I ran a test on cluttered MNIST and the layer seems to be working fine. I'm fine with merging. The experiment is here: https://github.com/skaae/Recipes/blob/spatial_transform/examples/spatial_transformer_network.ipynb The last plot shows that it does zoom as expected.
Cool, thanks a lot! Will merge this after #382. Your recipe looks very nice as well.
Thinking again, didn't we want to move that to `special.py`? Do you want to rename it here (and squash or amend), or shall we merge #373, rebase yours and move it? I guess the former would be less work for you, then I can take care of rebasing #373.
All right, merged it. Rebasing this one should just trigger trivial conflicts in […].
I moved it to `special.py`. Hopefully Travis passes :)
Cool, that was fast! Looks good to me, let's see what Travis and Coveralls think!
Merging, thanks a lot! We can now figure out if there are ways to make it faster if the image shape is known at compile time (which is probably a common use case).