GitHub - OwalnutO/spatial_transformer_network: Tensorflow Implementation of Spatial Transformer Networks

Spatial Transformer Networks

This is a TensorFlow implementation of the paper Spatial Transformer Networks by Max Jaderberg, Karen Simonyan, Andrew Zisserman and Koray Kavukcuoglu.

A Spatial Transformer Network (STN) is a differentiable module that can be inserted anywhere in a ConvNet architecture to increase its geometric invariance. It gives the network the ability to spatially transform feature maps with no extra data or supervision cost.

To-Do

fixed slicing issue which was causing incorrect output
add option to upsample or downsample output image
add option to restrict transformation to "attention"

Background Information

The STN is composed of three elements:

localization network: takes the feature map as input and outputs the parameters of the affine transformation that should be applied to that feature map.
grid generator: uses the affine parameters to generate a grid of (x, y) coordinates. Each coordinate is a point at which the input feature map should be sampled to produce the transformed output feature map.
bilinear sampler: takes the input feature map and the grid produced by the grid generator, and produces the output feature map using bilinear interpolation.
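
The grid generator and bilinear sampler described above can be sketched in a few lines of dependency-free Python. This is an illustration only, not the code in this repository: the function names `affine_grid` and `bilinear_sample` are hypothetical, the image is a single-channel list of rows, and coordinates are assumed to be normalized to [-1, 1] as in the paper, with out-of-range samples clamped to the border.

```python
def affine_grid(theta, out_h, out_w):
    """Return rows of (x_s, y_s) source coordinates, normalized to [-1, 1].

    theta is a flat list of 6 numbers, read row by row as a 2x3 matrix.
    """
    a, b, tx, c, d, ty = theta
    grid = []
    for i in range(out_h):
        # normalized target y coordinate in [-1, 1]
        y_t = -1.0 + 2.0 * i / (out_h - 1) if out_h > 1 else 0.0
        row = []
        for j in range(out_w):
            x_t = -1.0 + 2.0 * j / (out_w - 1) if out_w > 1 else 0.0
            # apply the affine map to the target point
            row.append((a * x_t + b * y_t + tx, c * x_t + d * y_t + ty))
        grid.append(row)
    return grid


def bilinear_sample(image, grid):
    """Sample a 2-D image (list of rows) at the normalized grid points."""
    h, w = len(image), len(image[0])
    out = []
    for grid_row in grid:
        out_row = []
        for x_s, y_s in grid_row:
            # map [-1, 1] to pixel coordinates [0, w - 1] and [0, h - 1]
            x = (x_s + 1.0) * (w - 1) / 2.0
            y = (y_s + 1.0) * (h - 1) / 2.0
            # clamp so that all four neighbours stay inside the image
            x = min(max(x, 0.0), w - 1.0)
            y = min(max(y, 0.0), h - 1.0)
            x0, y0 = int(x), int(y)
            x1, y1 = min(x0 + 1, w - 1), min(y0 + 1, h - 1)
            wx, wy = x - x0, y - y0
            # interpolate along x on both rows, then along y
            top = (1 - wx) * image[y0][x0] + wx * image[y0][x1]
            bottom = (1 - wx) * image[y1][x0] + wx * image[y1][x1]
            out_row.append((1 - wy) * top + wy * bottom)
        out.append(out_row)
    return out


image = [[0, 1, 2], [3, 4, 5], [6, 7, 8]]
identity = [1.0, 0.0, 0.0, 0.0, 1.0, 0.0]

# identity transform at the input resolution reproduces the image
same = bilinear_sample(image, affine_grid(identity, 3, 3))
# identity transform on a coarser 2x2 grid downsamples to the corners
small = bilinear_sample(image, affine_grid(identity, 2, 2))
```

Because every step is a weighted sum of pixel values and affine parameters, the same computation written with tensor operations is differentiable, which is what lets the module be trained end to end.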

The affine transformation has been constrained to one of "attention": it allows cropping, translation and isotropic scaling, and is passed as a 6-parameter (2×3) transformation matrix.
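
In the paper, the attention form of the matrix has a single scale s on the diagonal and a translation (t_x, t_y) in the last column. The sketch below builds the corresponding flat 6-parameter vector. The helper name `attention_theta` is hypothetical, the row-by-row flattening is assumed from the identity initialization used in the sample code of this README, and whether the layer enforces this form itself or expects the caller to supply it is not stated here.

```python
def attention_theta(s, tx, ty):
    """Flat 6-parameter vector for the attention form

        [[s, 0, tx],
         [0, s, ty]]

    flattened row by row.
    """
    return [s, 0.0, tx, 0.0, s, ty]


# s = 1 with no translation is the identity transform
identity = attention_theta(1.0, 0.0, 0.0)

# s = 0.5 with no translation: assuming coordinates normalized to [-1, 1],
# the sampling grid covers the central half of the input along each axis
center_crop = attention_theta(0.5, 0.0, 0.0)
```

Only three of the six values are free in this form, which is why it cannot express rotation, shear or anisotropic scaling.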

API

Calling the STN layer is done as follows:

out = spatial_transformer_network(input_feature_map, theta, out_dims)

Parameters

input_feature_map: the output of the layer preceding the localization network. If the STN layer is the first layer of the network, then this corresponds to the input images. Shape should be (B, H, W, C).
theta: the output of the localization network. Shape should be (B, 6).
out_dims: desired (H, W) of the output feature map. Useful for upsampling or downsampling. If not specified, then output dimensions will be equal to input_feature_map dimensions.
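
The shape rules above can be summarized with a small checker. This is a hypothetical helper for illustration, not part of this repository, and it assumes that the batch size B and channel count C pass through the layer unchanged.

```python
def stn_output_shape(input_shape, theta_shape, out_dims=None):
    """Validate the argument shapes and return the expected output shape."""
    if len(input_shape) != 4:
        raise ValueError("input_feature_map must have shape (B, H, W, C)")
    B, H, W, C = input_shape
    if tuple(theta_shape) != (B, 6):
        raise ValueError("theta must have shape (B, 6)")
    if out_dims is None:
        # default: same spatial size as the input feature map
        return (B, H, W, C)
    out_h, out_w = out_dims
    return (B, out_h, out_w, C)


same_size = stn_output_shape((2, 200, 200, 3), (2, 6))
downsampled = stn_output_shape((2, 200, 200, 3), (2, 6), out_dims=(100, 100))
```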

Note

You must define a localization network right before using this layer. The localization network is usually a ConvNet or a fully connected network with 6 output nodes (the 6 parameters of the affine transformation).

You can initialize the localization network to the identity transform before starting the training process. Here is a short code sample for illustration purposes.

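import numpy as np
import tensorflow as tf  # TensorFlow 1.x API (tf.placeholder)
# assumes the layer is defined in this repository's spatial_transformer.py
from spatial_transformer import spatial_transformer_network

# params
n_fc = 6
B, H, W, C = (2, 200, 200, 3)

# identity transform: 2x3 affine matrix flattened to 6 values
initial = np.array([[1., 0, 0], [0, 1., 0]])
initial = initial.astype('float32').flatten()

# input placeholder
x = tf.placeholder(tf.float32, [B, H, W, C])

# localization network: one FC layer with zero weights and identity bias,
# so it outputs the identity transform for any input before training
W_fc1 = tf.Variable(tf.zeros([H*W*C, n_fc]), name='W_fc1')
b_fc1 = tf.Variable(initial_value=initial, name='b_fc1')
h_fc1 = tf.matmul(tf.reshape(x, [B, H*W*C]), W_fc1) + b_fc1

# spatial transformer layer
h_trans = spatial_transformer_network(x, h_fc1)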
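
The reason this initialization yields the identity transform can be checked without TensorFlow. The sketch below is a plain-Python stand-in for the fully connected layer, with a hypothetical `dense` helper and a tiny input size in place of H*W*C: when the weights are all zero, the output equals the bias for every input.

```python
def dense(x, weights, bias):
    """Compute x @ weights + bias for one flattened input vector x."""
    return [
        sum(x[i] * weights[i][k] for i in range(len(x))) + bias[k]
        for k in range(len(bias))
    ]


n_in, n_fc = 12, 6                             # tiny stand-in for H*W*C inputs
weights = [[0.0] * n_fc for _ in range(n_in)]  # zero weights, like W_fc1
bias = [1.0, 0.0, 0.0, 0.0, 1.0, 0.0]          # flattened identity, like b_fc1

# two different inputs produce the same theta: the identity transform
theta_a = dense([0.3] * n_in, weights, bias)
theta_b = dense([float(i) for i in range(n_in)], weights, bias)
```

Training then moves the weights away from zero, so the predicted transform starts at the identity and becomes input-dependent only as far as the loss requires.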

Requirements

Attribution

Torch Blog Post on STN's
daviddao's Tensorflow Implementation
Shoutout to Eder Santana for introducing and helping me understand the paper!
