
Generalize bilinear interpolation filler to N-D multilinear/multicubic/lanczos filler #4198

Open · christianpayer wants to merge 4 commits into master
Conversation

christianpayer

This branch implements an n-dimensional generalization of the bilinear filler (#2213) and adds cubic and Lanczos fillers.

It makes #3984 obsolete, as it uses a new common base class for all interpolation fillers. The base class InterpolationFillerBase calculates the weight values by calling the virtual interpolation functions of its derived classes. This PR implements linear, cubic, and Lanczos interpolation fillers; additional interpolation functions (e.g. Hermite, Mitchell, Gaussian) can easily be added by deriving from InterpolationFillerBase and implementing the virtual functions f() and support().
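
For illustration, here is a rough Python sketch of the 1-D kernels that f() and support() would encode for the fillers in this PR. The PR itself is C++; the exact kernel constants below are the standard Matlab-imresize-style choices and are my assumption, not copied from the code:

```python
import numpy as np

# Sketch of the 1-D interpolation kernels behind f()/support().
# Assumption: the cubic kernel is the Keys kernel with a = -0.5, as in
# Matlab's imresize; the PR's C++ code may differ in details.

def linear(x):
    """Linear (triangle) kernel; support() would return 1."""
    x = abs(x)
    return 1.0 - x if x < 1.0 else 0.0

def cubic(x, a=-0.5):
    """Keys cubic kernel; support() would return 2."""
    x = abs(x)
    if x < 1.0:
        return (a + 2.0) * x**3 - (a + 3.0) * x**2 + 1.0
    if x < 2.0:
        return a * x**3 - 5.0 * a * x**2 + 8.0 * a * x - 4.0 * a
    return 0.0

def lanczos(x, n=2):
    """Lanczos-n kernel; support() would return n (2, 3, 4)."""
    return np.sinc(x) * np.sinc(x / n) if abs(x) < n else 0.0
```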

I have written a simple test script in Python that tests the fillers with various scaling factors in x and y. I will upload it after some cleanup.

If you have comments/suggestions, let me know!

Christian Payer added 4 commits May 23, 2016 13:59

- add generic interpolation weight filler
  - add base class InterpolationFillerBase that implements common interpolation function calculations
  - change 'linear' filler to 'multilinear' filler
  - add 'cubic', 'lanczos2', 'lanczos3' and 'lanczos4' fillers
  - all fillers support N dimensions and different integer scaling factors per dimension
- remove everything except multilinear
@christianpayer (Author)

This Jupyter notebook tests the new fillers:
http://nbviewer.jupyter.org/gist/christianpayer/3ee54b5c6463927410fff0fbc7cf556d

@Coldmooon commented May 24, 2016

Can this PR be used to downsample a feature map with a fractional factor?

@christianpayer (Author)

@Coldmooon The fillers can also be used to downsample a feature map with fractional factors (1/2, 1/3, 1/4, ...), as they work with both Deconvolution and Convolution layers, where the strides define the up- and downsampling factors, respectively.
As these fillers were originally intended to be used as deconvolution fillers, using them with a convolution layer may not give the desired result: the resulting feature map values are scaled by factor_x * factor_y. So if you use these fillers for downsampling, you need to append an additional layer (e.g. PowerLayer) that scales the values by 1 / (factor_x * factor_y) in order to get the same results as with Matlab, etc.
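
To see where that factor_x * factor_y scale comes from, here is a minimal sketch of my own, assuming the standard Caffe bilinear filler construction: the 1-D kernel for an integer factor sums to the factor, so the separable 2-D kernel sums to factor_x * factor_y, which is exactly what the PowerLayer divides away.

```python
import numpy as np

def bilinear_kernel_1d(factor):
    # same construction as Caffe's bilinear filler for an integer factor
    size = 2 * factor - factor % 2
    f = np.ceil(size / 2.0)
    c = (2 * f - 1 - f % 2) / (2.0 * f)
    return np.array([1 - abs(x / f - c) for x in range(size)])

k = bilinear_kernel_1d(2)   # [0.25, 0.75, 0.75, 0.25]
k2d = np.outer(k, k)        # separable 2-D kernel for factor (2, 2)
print(k.sum())              # 2.0 == factor
print(k2d.sum())            # 4.0 == factor_x * factor_y, hence scale 1/4
```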

@Coldmooon commented May 29, 2016

@christianpayer Thank you very much~! Recently, I've been studying resampling with Caffe, and your PR helps me a lot. I modified Test_interpolation_fillers.ipynb a little to test downsampling. The code is:

```python
import tempfile

def get_kernel(factor, support):
    # kernel size for an integer scaling factor and a given kernel support
    return 2 * support * factor - factor % 2

def get_pad(factor, support):
    # padding that centers the kernel on the sampling grid
    return int(((2 * support - 1) * factor - factor % 2) / 2.)

def downsample_net_file(in_size, factor, method, support):
    kernel = (get_kernel(factor[0], support), get_kernel(factor[1], support))
    pad = (get_pad(factor[0], support), get_pad(factor[1], support))
    stride = (factor[0], factor[1])
    # float division, otherwise the scale is truncated to 0 in Python 2
    scale_factor = 1.0 / (stride[0] * stride[1])
    with tempfile.NamedTemporaryFile(mode='w+', delete=False) as f:
        f.write("""name: 'pythonnet' force_backward: true
          input: 'data' input_shape { dim: %d dim: %d dim: %d dim: %d }
          layer { type: 'Convolution' name: 'downsample' bottom: 'data' top: 'downsample'
          convolution_param { kernel_size: %d kernel_size: %d stride: %d stride: %d pad: %d pad: %d
          num_output: %d group: %d weight_filler: { type: '%s' } bias_term: false } }
          layer { type: 'Power' name: 'scale' bottom: 'downsample' top: 'downsample'
          power_param { scale: %f } }""" % (
          in_size[0], in_size[1], in_size[2], in_size[3],
          kernel[0], kernel[1], stride[0], stride[1], pad[0], pad[1],
          in_size[1], in_size[1], method, scale_factor))
    return f.name
```

Using the multilinear mode, the above network outputs a 180*240 downsampled image that looks very good. But the comparison between out and reference is NOK. The code is:

```python
import os
import numpy
import skimage.transform
import caffe

# img: the H x W x C test image loaded earlier in the notebook
img_blob = img.reshape(1, *img.shape).transpose(0, 3, 1, 2)  # to N x C x H x W
support = 1
factor = (2, 2)
net_file = downsample_net_file(img_blob.shape, factor, 'multilinear', support)
net = caffe.Net(net_file, caffe.TEST)
os.remove(net_file)
net.blobs['data'].data[...] = img_blob
net.forward()
out = net.blobs['downsample'].data[0].transpose(1, 2, 0)
reference = skimage.transform.rescale(img, (0.5, 0.5), mode='constant', order=1, clip=False, cval=0)

if numpy.allclose(out, reference, atol=1e-05):
    print("factor %d %d: OK" % factor)
else:
    print("factor %d %d: NOK" % factor)
```

A further question: what about non-integer factors? For example, 1.4286 for upsampling and 0.7143 for downsampling. A fractional stride will be rounded down in Caffe. In these cases, I use another way to compute kernel_size, stride, and pad, but I don't know if it's correct.

```
# factor: 1.4286; upsample image from 7*7 to 10*10
# solution:
# compute output_size first:
# output_size = factor * input_size = 7 * 1.4286 = 10
# Then, set kernel_size, stride, and pad accordingly:
# (output_size + 2*pad - kernel_size)/stride + 1 = input_size
# we obtain: kernel_size = 4; stride = 1; pad = 0
# The final prototxt is:
layer {
  name: "upsample-7to10"
  type: "Deconvolution"
  bottom: "data"
  top: "upsample"
  param {
    lr_mult: 0
    decay_mult: 0
  }
  convolution_param {
    kernel_size: 4
    stride: 1
    num_output: 3
    group: 3
    weight_filler {
      type: "multilinear"
    }
    bias_term: false
  }
}
```

```
# factor: 0.7143; image size: 7*7 -> 5*5
# solution:
# output_size = factor * input_size = 0.7143 * 7 = 5
# Then, set kernel_size, stride, and pad accordingly:
# (input_size + 2*pad - kernel_size)/stride + 1 = output_size
# we obtain: kernel_size = 3; stride = 1; pad = 0
# The final prototxt is:
layer {
  name: "downsample-7to5"
  type: "Convolution"
  bottom: "data"
  top: "downsample"
  param {
    lr_mult: 0
    decay_mult: 0
  }
  convolution_param {
    kernel_size: 3
    stride: 1
    num_output: 3
    group: 3
    weight_filler {
      type: "multilinear"
    }
    bias_term: false
  }
}
```

@ajtulloch (Contributor)

Really nice work! Could you add some unit tests, following the existing examples in test_filler.cpp?

@christianpayer (Author)

@Coldmooon Your changes for downsampling look OK. You are right, there is a difference between the downsampling of this PR and the reference created with skimage. I don't know exactly how skimage performs downsampling, as it can be implemented in multiple ways. Even if you compare the outputs of skimage's linear/cubic resampling with OpenCV's, you will see many differences.
With this PR I tried to resample (almost) exactly like Matlab's imresize. So if you compare the outputs pixel by pixel, you should compare them with Matlab (and hopefully see no difference).

Regarding your question about non-integer factors: this is not possible with a Caffe convolution. With an integer factor, you get the same kernel for every x/y coordinate of the image. With non-integer factors this is not the case; you would need different kernels at different coordinates of the image, which the convolution layer cannot express.

If you want more insight into this, I suggest debugging Matlab's imresize, looking at its 'contributions' function, and comparing its outputs for integer and non-integer factors.
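
To illustrate the point with a sketch of my own (not taken from imresize): the linear weights at an output position depend only on the fractional part of its source coordinate, and that fractional part repeats only for integer factors.

```python
import numpy as np

# For output index i, the continuous source coordinate is
# (i + 0.5) / factor - 0.5, and the linear interpolation weights depend
# only on its fractional part (the "phase").
def fractional_phases(factor, n_out=8):
    coords = (np.arange(n_out) + 0.5) / factor - 0.5
    return coords - np.floor(coords)

print(fractional_phases(2.0))
# [0.75 0.25 0.75 0.25 ...]: 2 repeating phases -> one stride-2 kernel suffices
print(fractional_phases(1.4286))
# no short repeating pattern -> a different kernel per position, which a
# single convolution kernel cannot express
```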

@christianpayer (Author)

@ajtulloch Thanks! I can add some unit tests, but I don't know what should be tested. I could check whether the kernels sum up to a fixed value, or I could compare against hardcoded weights (possibly created with Matlab?) and check for differences.
What would you suggest?
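
For what it's worth, one possible shape for such a check, sketched in Python against pycaffe rather than test_filler.cpp, reusing the downsample_net_file helper from above. The expected weights are the classic 2x bilinear kernel and are my assumption for what the multilinear filler should produce:

```python
import numpy as np
import caffe  # assumes pycaffe is built

# Hypothetical weight check: for factor 2 and support 1, the classic
# 1-D bilinear kernel is [0.25, 0.75, 0.75, 0.25], and the 2-D kernel
# should be its outer product.
expected_1d = np.array([0.25, 0.75, 0.75, 0.25])
net_file = downsample_net_file((1, 3, 360, 480), (2, 2), 'multilinear', 1)
net = caffe.Net(net_file, caffe.TEST)
w = net.params['downsample'][0].data[0, 0]
assert np.allclose(w, np.outer(expected_1d, expected_1d))
assert np.isclose(w.sum(), 4.0)  # kernels sum to factor_x * factor_y
```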

@Coldmooon commented Jun 1, 2016

@christianpayer I tested upsampling and downsampling with non-integer factors. The resampled images in both cases look noisy, much like a downsampled image produced with an integer factor but without the appended Power layer. Now I can see what you meant about the non-integer case. Thank you~.

Maybe it's possible to implement a new nearest-neighbor interpolation layer for non-integer factors. By the way, it seems that STN can create a bilinear sampling kernel; I don't know if STN works with non-integer factors.
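
A rough sketch of what such a nearest-neighbor layer would compute, in 1-D with a hypothetical nearest_resize_1d helper (my illustration, not an existing Caffe layer):

```python
import numpy as np

# Nearest-neighbor resampling with an arbitrary (non-integer) factor:
# round each output's continuous source coordinate to the closest sample.
def nearest_resize_1d(x, factor):
    n_out = int(round(len(x) * factor))
    idx = np.clip(((np.arange(n_out) + 0.5) / factor - 0.5).round().astype(int),
                  0, len(x) - 1)
    return x[idx]

print(nearest_resize_1d(np.arange(7), 10.0 / 7.0))  # 7 samples -> 10 samples
```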
