Unpooling layer in tensorflow #632
For deconv, you can use "conv2d_backprop_input" with stride to achieve a similar effect. It is the gradient of the conv with stride.
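For anyone looking for the concrete call, here is a minimal TF1-style sketch of that trick using tf.nn.conv2d_transpose (which wraps conv2d_backprop_input under the hood); all shapes and variable names below are made up for illustration:

import tensorflow as tf

# Hypothetical sizes: batch of 8 maps, 16x16 spatial, 64 channels in, 32 out.
x = tf.placeholder(tf.float32, [8, 16, 16, 64])
# Filter layout for conv2d_transpose is [height, width, out_channels, in_channels].
w = tf.get_variable('w', [3, 3, 32, 64])
# Stride 2 doubles the spatial resolution -- the "deconv" / gradient-of-conv trick.
y = tf.nn.conv2d_transpose(x, w, output_shape=[8, 32, 32, 32],
                           strides=[1, 2, 2, 1], padding='SAME')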
My implementation:

def unpool(value, name='unpool'):
    """N-dimensional version of the unpooling operation from
    https://www.robots.ox.ac.uk/~vgg/rg/papers/Dosovitskiy_Learning_to_Generate_2015_CVPR_paper.pdf

    :param value: A Tensor of shape [b, d0, d1, ..., dn, ch]
    :return: A Tensor of shape [b, 2*d0, 2*d1, ..., 2*dn, ch]
    """
    with tf.name_scope(name) as scope:
        sh = value.get_shape().as_list()
        dim = len(sh[1:-1])
        out = tf.reshape(value, [-1] + sh[-dim:])
        for i in range(dim, 0, -1):
            out = tf.concat([out, tf.zeros_like(out)], i)
        out_size = [-1] + [s * 2 for s in sh[1:-1]] + [sh[-1]]
        out = tf.reshape(out, out_size, name=scope)
    return out
def pool(value, name='pool'):
    """Downsampling operation.

    :param value: A Tensor of shape [b, d0, d1, ..., dn, ch]
    :return: A Tensor of shape [b, d0/2, d1/2, ..., dn/2, ch]
    """
    with tf.name_scope(name) as scope:
        sh = value.get_shape().as_list()
        out = value
        for sh_i in sh[1:-1]:
            assert sh_i % 2 == 0
        for i in range(len(sh[1:-1])):
            out = tf.reshape(out, (-1, 2, np.prod(sh[i + 2:])))
            out = out[:, 0, :]
        out_size = [-1] + [math.ceil(s / 2) for s in sh[1:-1]] + [sh[-1]]
        out = tf.reshape(out, out_size, name=scope)
    return out
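Assuming the usual imports (tf, np, math) as used in the snippet above, a quick usage sketch of the two helpers with arbitrary shapes:

import math
import numpy as np
import tensorflow as tf

x = tf.placeholder(tf.float32, [None, 8, 8, 3])
down = pool(x)     # -> shape [None, 4, 4, 3]
up = unpool(down)  # -> shape [None, 8, 8, 3], with zeros interleaved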
I've been interested in this as well; I'm currently working on 'what-where' / convolutional autoencoders (à la Zhao et al.). Thanks @daeyun for the code, I've been trying to figure this out myself. Dosovitskiy uses a Kronecker product with a block mask (same shape as the pooling window, all zeros with a 1 in the upper left) to unpool. However, as observed in the paper (fig. 9), this fails to reconstruct meaningful structure in deeper feature maps. An alternative proposed by Zeiler uses 'switches' (essentially the argmax of the max-pooling operation) to reconstruct using the exact location of the maxima. I've been playing around with tf.nn.max_pool_with_argmax in an attempt to reproduce the 'switched' unpooling experiments first explored by Zeiler and extended by Zhao. Any thoughts on how this could be implemented?
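For what it's worth, a small NumPy sketch of the fixed-position (Kronecker-product) unpooling described above, with each value placed in the upper-left corner of its block; purely illustrative:

import numpy as np

def kron_unpool2d(x, k=2):
    """Place each value in the upper-left corner of a k x k block of zeros.
    x has shape (height, width); output has shape (k*height, k*width)."""
    mask = np.zeros((k, k), dtype=x.dtype)
    mask[0, 0] = 1
    return np.kron(x, mask)

a = np.array([[1., 2.], [3., 4.]])
print(kron_unpool2d(a))
# [[1. 0. 2. 0.]
#  [0. 0. 0. 0.]
#  [3. 0. 4. 0.]
#  [0. 0. 0. 0.]]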
What's the mathematical definition of unpooling?
The unpooling that I had in mind is described here: http://www.matthewzeiler.com/pubs/iccv2011/iccv2011.pdf
@ziky90 That's the gradient of max pooling, which we already have as an op.
@girving Thank you for pointing me to the gradient of max pooling. Though it's really difficult to find it as the gradient of max pooling, and it's also not well documented. By the way, it seems that this confuses people and leads them to build custom solutions instead of simply using something that already exists.
Yes, giving it a more discoverable name would help. As a tip for the future, though: this is one advantage of trying to understand the mathematical relationship between different operations. Once you know that unpooling is just the gradient of pooling, it's clear that TensorFlow already implements it, even if the name is different from what one might expect.
Could you share a code example of how to implement unpooling using the gradient of max pooling?
It's currently hidden at https://github.com/tensorflow/tensorflow/blob/master/tensorflow/python/ops/nn_grad.py#L353. There's also …
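For anyone wanting a concrete example, here is a rough sketch of calling the internal max-pool gradient op directly to get an unpooling effect. The exact symbol (gen_nn_ops.max_pool_grad vs. gen_nn_ops._max_pool_grad) has moved between TensorFlow versions, so treat this as an assumption rather than a stable API:

import tensorflow as tf
from tensorflow.python.ops import gen_nn_ops  # internal module; name may differ per version

big = tf.placeholder(tf.float32, [1, 4, 4, 1])           # original (pre-pooling) tensor
small = tf.nn.max_pool(big, ksize=[1, 2, 2, 1],
                       strides=[1, 2, 2, 1], padding='SAME')

# Treat the pooled tensor as the "incoming gradient": its values get scattered
# back to the argmax positions, which is the unpooling behaviour discussed here.
# NOTE: you need the original input -- exactly the awkwardness pointed out below.
unpooled = gen_nn_ops.max_pool_grad(big, small, small,
                                    ksize=[1, 2, 2, 1], strides=[1, 2, 2, 1],
                                    padding='SAME')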
Any plans to get an unpool layer into TensorFlow? @girving, as you point out, if the gradient operation already exists, then it doesn't seem like much work to get it working?
@LeavesBreathe I was wrong initially about how easy it would be, since the gradient operators as written take the original input. Thus, we probably do need a new exposed op, though it may be able to use the same underlying compute kernels (I'm not sure).
Are there any performance gains or losses if one uses the second output of tf.nn.max_pool_with_argmax?
@syed-ahmed That doesn't work: if you are doing unpooling, you don't start out with an input that you could pass to max_pool_with_argmax.
@girving Can we not just save the indices from tf.nn.max_pool_with_argmax during downsampling for reuse during upsampling? We would use the saved argmax indices to inform us where we want the input to the corresponding upsample layer to go.
@syed-ahmed To clarify, it will work but it's a bit awkward. You can certainly store the indices, but the current gradient op takes the original input tensor rather than just its shape, which is a memory usage bug. The same bug occurred in the initial version of conv_3d. If anyone does this, the new op can be given a nicer name like unpool.
@girving Thanks for clarifying! I totally forgot about the gradient case. I'll try to fix this issue.
Hi @girving, could you please tell me what error would result from the memory usage bug? Just to clarify, is it a bug because it's not best practice, or did you encounter an error in that initial version of conv_3d? I get the following error for the implementation described above with MaxPoolWithArgmax and was wondering if anybody has encountered it before:
@syed-ahmed It's not an actual error unless you run out of memory. The issue is that if the gradient takes the original input tensor rather than the shape, the original input must be stored for the remainder of the forward pass and the backward pass up to that point. If only the shape is needed, that's a long time to hold onto otherwise unneeded memory. |
@girving Thanks for your reply. I am defining a MaxUnpoolGrad for the corresponding MaxUnpool operation that I have implemented. The following is what I declare as top_offset and bottom_offset for MaxUnpoolGrad:
The corresponding CUDA kernel declared in maxpooling_op_gpu.cu.cc is:
My graph builds, but when the session runs I get the following error:
In nn_grad.py, I am also returning the following:
where:
I have made sure the max unpooling op and its grad operation take an input shape rather than an input 4D tensor. Do you know how to debug these CUDA errors, or any tool that can help find where they originate? What do these errors indicate? I read a comment in maxpooling_op_gpu.cu.cc about race conditions. Is this related to that in any way?
@syed-ahmed Is it possible to use cuDNN for these operations? Writing them yourself will result in very slow code. The same goes for CPU: it would be better to use existing Eigen code if possible.
@girving Thank you for your reply. I will try implementing the cuDNN version once I get this CUDA one running. I was able to use cuda-gdb to get some sort of trace of where my error is originating from. Here's the output from cuda-gdb:
Here's how it is defined in the cu.cc file:
I am kind of lost since I'm a beginner with CUDA. Does anybody have any idea what might be going wrong?
It's impossible to debug this without seeing your code. As a wild guess: maybe you are running GPU kernels on Tensor objects stored on the CPU?
Hi @girving. Sorry for not posting the full code; I didn't want to lengthen this issue by posting all of it. You can review the changes at this link. I am calling the max unpool op like this:
I am not sure whether the origin_input_tensor and argmax_tensor objects are on the CPU or the GPU. The cuda-gdb output for MaxUnpoolForward suggests that "This occurs when any thread within a warp accesses an address that is outside the valid range of local or shared memory regions." (GPU error reporting)
Also, there is a lot of code duplication in my changes. I can make the unpool op use the same compute kernel; I was just testing whether using the same compute kernel was causing the CUDA error in the version I posted here.
In the TensorFlow implementation (https://github.com/MarvinTeichmann/tensorflow-fcn/blob/master/fcn32_vgg.py) of the fully convolutional model (https://people.eecs.berkeley.edu/~jonlong/long_shelhamer_fcn.pdf), the author defines an upsampling function.
It looks like the author just uses tf.nn.conv2d_transpose to do the upsampling. Is my understanding correct?
@wenouyang Yes, in the FCN of https://people.eecs.berkeley.edu/~jonlong/long_shelhamer_fcn.pdf they use only tf.nn.conv2d_transpose.
Sorry for the delay, taking a look at your code now.
I must not understand your code. How are you doing an effectively 3D unpooling operation (batch, height, width) with a 1D loop that does only one integer division? One integer division is only powerful enough to express a 2D loop.
@girving I followed the MaxPoolBackward code in maxpooling_op_gpu.cu.cc. I thought the n dimensions of the tensor were taken care of by the following in maxpooling_op.cc, in the LaunchMaxUnpooling function I defined (like LaunchMaxPoolingGradWithArgmax):
@chrisranderson Hi, I'm new to deep learning and TensorFlow. Could you please explain in more detail how to implement unpooling using tensorflow/tensorflow#16885? Thanks a lot!
@XXY0118 Copy and paste these lines: https://github.com/rayanelleuch/tensorflow/blob/b46d50583d8f4893f1b1d629d0ac9cb2cff580af/tensorflow/contrib/layers/python/layers/layers.py#L2291-L2327, and you should be good to go. I wish GitHub allowed some kind of DM for occasions like this.
@daeyun, please swap the parameters in the tf.concat call (the argument order changed in TensorFlow 1.0). Other than that, it works fine for unpooling without position indices. Thanks!
Is there a strided version of the unpool function?
We are going to close this issue. Feel free to reopen it if you want to contribute and link the PR to it.
A differentiable and GPU-safe implementation:

def avg_unpool2d(x, factor):
    '''
    Performs "average un-pooling", i.e. nearest neighbor upsampling,
    without the faulty `tf.image.resize_nearest_neighbor` op.
    '''
    x = tf.transpose(x, [1, 2, 3, 0])
    x = tf.expand_dims(x, 0)
    x = tf.tile(x, [factor**2, 1, 1, 1, 1])
    x = tf.batch_to_space_nd(x, [factor, factor], [[0, 0], [0, 0]])
    x = tf.transpose(x[0], [3, 0, 1, 2])
    return x
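A quick sanity check of the helper above (shapes arbitrary; it also works with an unknown batch dimension):

import tensorflow as tf

x = tf.placeholder(tf.float32, [None, 8, 8, 16])
y = avg_unpool2d(x, 2)  # each pixel repeated into a 2x2 block -> [None, 16, 16, 16]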
I believe that TensorFlow doesn't ask people to implement deconvolution themselves, even though technically it can be expressed as a convolution. Why? It's convenient and it lets researchers focus on more important things. The same goes for unpooling.
@greydanus @jkyl I'd love to approve a PR adding this max_unpool implementation to TF, along with a unit test.
I'm working on a PR + unit test. More to come.
Reopening so @graydanus's PR can close it.
I revisited the implementations in the current thread and found that @rayanelleuch's solution from Oct 24, 2017 works best for me. It works with batches (i.e., the first dimension of the input tensor is None), produces a known output shape, and produces no type errors. I also added tf.keras layers for MaxPoolingWithArgmax and Unpooling (previously mentioned versions did not work with tf.keras but, somehow, worked with plain keras) here: https://github.com/yselivonchyk/Tensorflow_WhatWhereAutoencoder/blob/master/pooling.py
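For readers who only want the general shape of such a layer, a rough, untested sketch of wrapping tf.nn.max_pool_with_argmax in a tf.keras layer might look like the following (this is not the code from the linked repo, just an illustration; a matching Unpooling layer would scatter values back using the returned indices, as in the snippets elsewhere in this thread):

import tensorflow as tf

class MaxPoolingWithArgmax2D(tf.keras.layers.Layer):
    """Max pooling that also returns the flattened argmax indices for unpooling."""

    def __init__(self, pool_size=2, **kwargs):
        super(MaxPoolingWithArgmax2D, self).__init__(**kwargs)
        self.pool_size = pool_size

    def call(self, inputs):
        k = [1, self.pool_size, self.pool_size, 1]
        output, argmax = tf.nn.max_pool_with_argmax(
            inputs, ksize=k, strides=k, padding='SAME')
        # Keep the indices as a second output so a downstream unpooling layer can use them.
        return [output, tf.cast(argmax, tf.int32)]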
Hello everybody! As @Panaetius highlighted, the unpooling layers presented here have a drawback: they don't account for padding. Here is a version that does:

def max_unpool(pool, ind, prev_tensor, scope='unpool_2d'):
    """
    Implement the unpooling operation, as explained here:
    https://stackoverflow.com/questions/36548736/tensorflow-unpooling

    Args:
        pool (tensor): Input tensor of shape (N, H, W, C)
        ind (tensor): Input tensor of shape (N, H, W, C) containing the maximum
            flattened indices (see https://www.tensorflow.org/api_docs/python/tf.nn.max_pool_with_argmax)
        prev_tensor (tensor): previous tensor, whose shape is the unpooling target
        scope (str): scope in which to register the operations
    Return:
        ret (tensor): tensor with the same shape as prev_tensor that corresponds to the
            "inverse" of the max pooling operation
    """
    with tf.variable_scope(scope):
        # input_shape = [N, H, W, C]
        input_shape = tf.shape(pool)
        o_shape = tf.shape(prev_tensor)

        output_shape = [input_shape[0], o_shape[1], o_shape[2], input_shape[3]]

        # N * H * W * C
        flat_input_size = tf.reduce_prod(input_shape)

        # flat output_shape = [N, 4 * H * W * C]
        flat_output_shape = [output_shape[0], output_shape[1] * output_shape[2] * output_shape[3]]

        updates = tf.reshape(pool, [flat_input_size])

        # create the tensor [ [[[0]]], [[[1]]], ..., [[[N-1]]] ]
        batch_range = tf.reshape(
            tf.range(tf.cast(output_shape[0], tf.int64), dtype=ind.dtype),
            shape=[input_shape[0], 1, 1, 1])

        # b is a tensor of size (N, H, W, C) whose first element of the batch is a
        # 3D array full of 0s, second element of the batch a 3D array full of 1s, ...
        b = tf.ones_like(ind) * batch_range
        b = tf.reshape(b, [flat_input_size, 1])

        # indices = [ [0, ind_1], [0, ind_2], ..., [0, ind_k], ..., [N-1, ind_{N*H*W*C-1}], [N-1, ind_{N*H*W*C}] ]
        indices = tf.reshape(ind, [flat_input_size, 1])
        indices = tf.concat([b, indices], axis=-1)

        ret = tf.scatter_nd(indices, updates, shape=tf.cast(flat_output_shape, tf.int64))
        ret = tf.reshape(ret, output_shape)

        set_input_shape = pool.get_shape()
        prev_tensor_shape = prev_tensor.get_shape()

        set_output_shape = [set_input_shape[0], prev_tensor_shape[1], prev_tensor_shape[2], set_input_shape[3]]
        ret.set_shape(set_output_shape)

        return ret

You can use it as follows:

maxpool_layer, maxpool_idx = tf.nn.max_pool_with_argmax(
    your_input,
    [1, 2, 2, 1], [1, 2, 2, 1],
    padding='SAME',
    name="max_pooling_5")

conv_layer = tf.layers.conv2d(
    maxpool_layer,
    filters=4096,
    kernel_size=7,
    name='conv')

deconv_layer = tf.layers.conv2d(
    conv_layer,
    filters=512,
    kernel_size=1,
    kernel_initializer=tf.contrib.layers.xavier_initializer(),
    name="deconv")

unpooling_layer5 = max_unpool(deconv_layer, maxpool_idx, your_input, scope="Unpooling_5")

This implementation works well with padding and doesn't need set_shape() while reading the tf_records, which means that at prediction time you can pass a single image at a time (batch_size=1) with images of totally different sizes and it won't break.
MaxUnpool is not supported by TensorFlow by default. Refer to https://github.com/tensorflow/tensorflow/issues/2169 for more information. The current solution uses proposed code from the above issue, with modifications to support padding and strides.
This sounds like a great feature! Adding support for unpooling is outside of the scope of TensorFlow Core, but would be a fantastic addition to TensorFlow Addons. Transferring this issue now; @seanpmorgan for visibility. |
Thanks for transferring. This seems like a nice fit in addons, though it will need to be converted to fit the Keras Layer API and have appropriate test cases. |
Here is my implementation, also posted on Stack Overflow. You should apply the max pooling using tf.nn.max_pool_with_argmax and then pass the argmax it returns to the unpool function:
This has a small bug/feature: if argmax has a repeated value, it will perform an addition instead of writing the value once. Beware of this if the stride is 1. I don't know, however, whether this is desired or not. This behavior was also present in @Twice22's solution, as I based my implementation on his code.
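That add-on-duplicates behaviour comes from tf.scatter_nd itself, which sums updates targeting the same index; a tiny standalone illustration (TF1-style session for brevity):

import tensorflow as tf

indices = tf.constant([[1], [1]])   # the same output position twice
updates = tf.constant([5.0, 7.0])
out = tf.scatter_nd(indices, updates, shape=[4])

with tf.Session() as sess:
    print(sess.run(out))  # [ 0. 12.  0.  0.] -- the duplicate updates are summed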
It would be nice to also have an unpooling layer in TensorFlow, as described in the paper on deconvolution networks: http://cvlab.postech.ac.kr/research/deconvnet/
I was googling a bit and found that an unpooling layer would be handy for others as well:
http://stackoverflow.com/questions/36548736/tensorflow-unpooling