Spatial Pooling for Torch7

This repository provides implementations of several spatial pooling strategies used for weakly supervised learning of deep ConvNets.

Installation

$ git clone https://github.com/durandtibo/spatial-pooling.torch.git
$ cd spatial-pooling.torch
$ luarocks make rocks/spatial-pooling-scm-1.rockspec

To test the installation, you can run:

$ th test/test.lua

Modules

GlobalMaxPooling (GMP)

Global Max Pooling is a spatial pooling strategy used in "Is object localization for free? – Weakly-supervised learning with convolutional neural networks".

module = nn.GlobalMaxPooling()

Applies 2D max-pooling operation on the whole image. The number of output features is equal to the number of input planes.

If the input is a 4D tensor of size nBatchImage x nInputPlane x w x h, where w and h are the spatial dimensions, the output has size nBatchImage x nInputPlane x 1 x 1.

If the input is a 3D tensor of size nInputPlane x w x h, the output has size nInputPlane x 1 x 1.
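For illustration, here is a minimal usage sketch. The require name 'spatialpooling' is an assumption; use whatever module name this rock actually installs.

require 'nn'
require 'spatialpooling' -- assumed module name for this package; adjust if needed
local input = torch.randn(2, 512, 14, 14) -- nBatchImage x nInputPlane x w x h
local gmp = nn.GlobalMaxPooling()
local output = gmp:forward(input)
print(output:size()) -- 2 x 512 x 1 x 1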

Equation

We denote by z^c the c-th map of the input and by s^c the c-th map of the output.

s^c = max_{i,j} z^c_{i,j}
References

@inproceedings{Oquab_DeepMIL_CVPR15,
author = {Oquab, M. and Bottou, L. and Laptev, I. and Sivic, J.},
title = {{Is object localization for free? – Weakly-supervised learning with convolutional neural networks}},
booktitle = {CVPR},
year = {2015}
}

GlobalkMaxPooling

GlobalkMaxPooling generalizes GlobalMaxPooling to multiple maxima. This spatial pooling strategy is inspired by the Top Instances model of "Multiple Instance Learning for Soft Bags via Top Instances".

module = nn.GlobalkMaxPooling(kMax)

Applies 2D k-max-pooling operation on the whole image. The number of output features is equal to the number of input planes.

The parameter is the following:

  • kMax: The number of top instances. kMax can define either the number of selected regions (kMax >= 1) or the proportion of selected regions (0 < kMax < 1). If kMax <= 0, all the regions are selected. Default is kMax = 1.

If the input is a 4D tensor of size nBatchImage x nInputPlane x w x h, where w and h are the spatial dimensions, the output has size nBatchImage x nInputPlane x 1 x 1.

If the input is a 3D tensor of size nInputPlane x w x h, the output has size nInputPlane x 1 x 1.
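The sketch below shows the two ways of setting kMax, as a count or as a proportion of regions; the require name 'spatialpooling' is the same assumption as in the GlobalMaxPooling example.

require 'nn'
require 'spatialpooling' -- assumed module name; adjust if needed
local input = torch.randn(2, 512, 14, 14)
local topCount = nn.GlobalkMaxPooling(4)   -- average of the 4 highest activations per map
local topRatio = nn.GlobalkMaxPooling(0.1) -- average of the top 10% of regions per map
print(topCount:forward(input):size()) -- 2 x 512 x 1 x 1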

Equation

We denote by z^c the c-th map of the input and by s^c the c-th map of the output.

s^c = max_{h in H_kMax} 1 / kMax sum_{i,j} h_{i,j} z^c_{i,j}

where H_k is the set of selection masks h such that h_{i,j} in {0, 1} and sum_{i,j} h_{i,j} = k

Special cases
  • GlobalMaxPooling: if kMax = 1
  • GlobalAveragePooling: if kMax = h x w

GlobalAveragePooling (GAP)

Global Average Pooling is a spatial pooling strategy used in "Learning Deep Features for Discriminative Localization".

module = nn.GlobalAveragePooling()

Applies 2D average-pooling operation on the whole image. The number of output features is equal to the number of input planes.

If the input is a 4D tensor of size nBatchImage x nInputPlane x w x h, where w and h are the spatial dimensions, the output has size nBatchImage x nInputPlane x 1 x 1.

If the input is a 3D tensor of size nInputPlane x w x h, the output has size nInputPlane x 1 x 1.
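A minimal sketch with a 3D (single image) input, under the same assumed require names as above.

require 'nn'
require 'spatialpooling' -- assumed module name; adjust if needed
local input = torch.randn(512, 14, 14) -- nInputPlane x w x h
local gap = nn.GlobalAveragePooling()
print(gap:forward(input):size()) -- 512 x 1 x 1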

Equation

We denote by z^c the c-th map of the input and by s^c the c-th map of the output.

s^c = 1 / (h * w) sum_{i,j} z^c_{i,j}

References

@inproceedings{Zhou_2016_CVPR,
author = {Zhou, Bolei and Khosla, Aditya and Lapedriza, Agata and Oliva, Aude and Torralba, Antonio},
title = {{Learning Deep Features for Discriminative Localization}},
booktitle = {CVPR},
year = {2016}
}

LogSumExpPooling

LogSumExpPooling is a spatial pooling strategy used in "From Image-level to Pixel-level Labeling with Convolutional Networks".

module = nn.LogSumExpPooling(beta)

Applies 2D LogSumExp-pooling operation on the whole image. The number of output features is equal to the number of input planes.

The parameter is the following:

  • beta: The anti-temperature parameter. Default is beta = 1.

If the input is a 4D tensor of size nBatchImage x nInputPlane x w x h, where w and h are the spatial dimensions, the output has size nBatchImage x nInputPlane x 1 x 1.

If the input is a 3D tensor of size nInputPlane x w x h, the output has size nInputPlane x 1 x 1.
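A short sketch (same assumed require names as above); a larger beta makes the pooling behave closer to GlobalMaxPooling, a smaller one closer to GlobalAveragePooling.

require 'nn'
require 'spatialpooling' -- assumed module name; adjust if needed
local input = torch.randn(2, 512, 14, 14)
local soft = nn.LogSumExpPooling(1)   -- beta = 1
local sharp = nn.LogSumExpPooling(10) -- larger beta, closer to max pooling
print(sharp:forward(input):size()) -- 2 x 512 x 1 x 1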

Equation

We denote by z^c the c-th map of the input and by s^c the c-th map of the output.

s^c = (1 / beta) log( 1 / (h * w) sum_{i,j} exp(beta * z^c_{i,j}) )

Special cases
  • GlobalMaxPooling: if beta = +inf
  • GlobalAveragePooling: if beta = 0

References

@inproceedings{pinheiro_weak_seg_cvpr15,
author = {Pedro O. Pinheiro and Ronan Collobert},
title = {{From Image-level to Pixel-level Labeling with Convolutional Networks}},
booktitle = {CVPR},
year = {2015}
}

WeldonPooling

WeldonPooling is a spatial pooling module used in "WELDON: Weakly Supervised Learning of Deep Convolutional Neural Networks".

module = nn.WeldonPooling(kMax, kMin)

Applies 2D WELDON-pooling operation on the whole image. The number of output features is equal to the number of input planes.

The parameters are the following:

  • kMax: The number of top instances. It is possible to define the number of selected regions (kMax >= 1) or the proportion of selected regions (0 <= kMax < 1). If kMax < 0, kMax is set to 0. Default is kMax = 1.
  • kMin: The number of low instances. It is possible to define the number of selected regions (kMin >= 1) or the proportion of selected regions (0 <= kMin < 1). If kMin < 0, kMin is set to 0. Default is kMin = 1.

If the input is a 4D tensor of size nBatchImage x nInputPlane x w x h, where w and h are the spatial dimensions, the output has size nBatchImage x nInputPlane x 1 x 1.

If the input is a 3D tensor of size nInputPlane x w x h, the output has size nInputPlane x 1 x 1.
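A minimal sketch selecting the 3 highest- and 3 lowest-scoring regions per map (same assumed require names as above).

require 'nn'
require 'spatialpooling' -- assumed module name; adjust if needed
local input = torch.randn(2, 512, 14, 14)
local weldon = nn.WeldonPooling(3, 3) -- kMax = 3 top regions, kMin = 3 low regions
print(weldon:forward(input):size()) -- 2 x 512 x 1 x 1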

Equation

We denote by z^c the c-th map of the input and by s^c the c-th map of the output.

s^c = max_{h in H_kMax} 1 / kMax sum_{i,j} h_{i,j} z^c_{i,j} + min_{h in H_kMin} 1 / kMin sum_{i,j} h_{i,j} z^c_{i,j}

where H_k is the set of selection masks h such that h_{i,j} in {0, 1} and sum_{i,j} h_{i,j} = k

Special cases
  • GlobalMaxPooling: if kMax = 1 and kMin = 0
  • GlobalAveragePooling: if kMax = h x w and kMin = 0
  • MantraPooling: if kMax = 1 and kMin = 1
References

@inproceedings{Durand_WELDON_CVPR_2016,
author = {Durand, Thibaut and Thome, Nicolas and Cord, Matthieu},
title = {{WELDON: Weakly Supervised Learning of Deep Convolutional Neural Networks}},
booktitle = {CVPR},
year = {2016}
}

License

MIT License
