Sliding Window, Varying input/output size and Dense, multiscale extraction #189

Closed
akosiorek opened this Issue Mar 5, 2014 · 34 comments

@akosiorek
Contributor

akosiorek commented Mar 5, 2014

[1] enables varying input/output sizes in order to perform multiscale, multiview image processing, so as to bolster classification confidence and to perform localisation and object detection. I wonder if and how it could be implemented in Caffe?

One possibility would be to set blob sizes to their maximum expected values and then account for the actual input size during computation at each layer. I am not familiar enough with the Caffe sources to predict the overhead this approach might cause; I imagine it can lead to redundant memory copying and involved index arithmetic to access the right data.

What are other possibilities? I would be happy to PR it should we be able to work out a decent solution.

[1] Pierre Sermanet, David Eigen, Xiang Zhang, Michael Mathieu, Rob Fergus, Yann LeCun. OverFeat: Integrated Recognition, Localization and Detection using Convolutional Networks. arXiv:1312.6229 [cs.CV].

@nian-liu

nian-liu commented Mar 5, 2014

I am also concerned about similar issues: does Caffe support multiple input data layers, and situations where multiple layers feed into a higher layer and vice versa?

@mavenlin

Contributor

mavenlin commented Mar 5, 2014

As for convolution, Caffe processes images one by one.
In this sense, the size of each image can vary; the im2col buffer can be preallocated to fit the largest image.
For the inner-product layer, batch mode will no longer work. (But then, a network involving multiscaling would have no inner-product layer.)
Dropout is also not a problem. I didn't read the pooling code, so I have no idea whether it is a problem there.
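To make the preallocation point above concrete, here is a minimal plain-Python sketch (not Caffe's actual code; the function names are made up for illustration) of why an im2col buffer sized for the largest expected image also covers every smaller one:

```python
# Sketch: size the im2col buffer once for the largest expected input and
# reuse it, since a smaller image needs strictly fewer columns.

def conv_out_dim(in_dim, kernel, pad=0, stride=1):
    """Output spatial dimension of a convolution/pooling layer."""
    return (in_dim + 2 * pad - kernel) // stride + 1

def im2col_buffer_elems(channels, height, width, kernel, pad=0, stride=1):
    """Number of elements the im2col buffer needs for one image."""
    out_h = conv_out_dim(height, kernel, pad, stride)
    out_w = conv_out_dim(width, kernel, pad, stride)
    return (channels * kernel * kernel) * (out_h * out_w)

# Preallocate for the largest anticipated image...
max_elems = im2col_buffer_elems(3, 454, 454, kernel=11, stride=4)
# ...and any smaller image fits in the same buffer.
small_elems = im2col_buffer_elems(3, 227, 227, kernel=11, stride=4)
assert small_elems <= max_elems
```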

@shelhamer

Member

shelhamer commented Mar 5, 2014

@kosiorekadam For varying output size with input size, the inner product layers for classification can be made convolutional too, such that the network makes a spatial output map. This can be done in Caffe as-is with the proper network definition. We will try to include an example.

Dense, multiscale feature extraction (that's fast!) is afforded by convolutional architectures if done right, and has been done within the BVLC in Caffe. We hope to publicly release this enhancement before long.

@mavenlin while convolution is the bottleneck in the current pipeline, images of varying dimensions and scale can be accommodated in a single convolutional pass with the right indexing. Essentially, one packs a pyramid or image set into a "plane" for processing through the net. This amortizes the convolutional computation across windows. By careful indexing one can extract the features/output as if they were processed one-by-one.

@forresti is a BVLC member working on amortized computation, reduced memory usage, and further efficiency improvements to Caffe among other projects.
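The equivalence behind "inner product layers can be made convolutional" can be sketched in plain Python (single channel, stride 1, made-up names; purely illustrative, not Caffe code): an FC layer over an HxW input is a convolution whose kernel is HxW, and sliding that kernel over a larger input yields a spatial map of classifier outputs.

```python
# Sketch: apply fully-connected weights convolutionally.
def fc_as_conv(image, weights, stride=1):
    """Slide FC weights (kh x kw) over the image; returns an output map."""
    kh, kw = len(weights), len(weights[0])
    out = []
    for y in range(0, len(image) - kh + 1, stride):
        row = []
        for x in range(0, len(image[0]) - kw + 1, stride):
            row.append(sum(image[y + i][x + j] * weights[i][j]
                           for i in range(kh) for j in range(kw)))
        out.append(row)
    return out

weights = [[1, 0], [0, 1]]               # a tiny 2x2 "fully-connected" layer
crop = [[1, 2], [3, 4]]                  # FC on a 2x2 crop: one scalar output
assert fc_as_conv(crop, weights) == [[5]]

big = [[1, 2, 0], [3, 4, 0], [0, 0, 0]]  # larger input: a 2x2 output map
print(fc_as_conv(big, weights))          # -> [[5, 2], [3, 4]]
```

The top-left cell of the map equals the FC output on the top-left crop, which is why the same weights produce a spatial output map on larger inputs.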

@shelhamer

Member

shelhamer commented Mar 5, 2014

@nian-liu Caffe layers can have multiple inputs and outputs. Caffe networks can have any DAG (directed acyclic graph) structure #114 #129 , so many kinds of branching are supported. Although there aren't examples included yet, it is done by listing multiple outputs in the network definition, which are then automatically connected by inserting split layers.

For multiple inputs, you might find the concatenation layer helpful #125. This combines multiple input images into a single input blob. This could be used for example to process consecutive frames of video together.

@akosiorek

Contributor

akosiorek commented Mar 7, 2014

I've done a bit of code reading, and as I understand it both the convolution and pooling layers can work with changing image sizes. They only have to be preallocated to fit the biggest anticipated image, just as @mavenlin mentioned.

However, this approach results in convolving and pooling a small image with a lot of padding (corresponding to the maximum-size image). To narrow the computation down to the area of the currently processed image, I need to store the size somewhere. I could feed the image to the network and compute the size after each layer inside Net::Forward, or add a couple of fields to the Blob to store the size. Of course, I would have to change the API to allow input of a different size than indicated in the layer_param. Am I correct?
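The bookkeeping described above can be sketched in a few lines of plain Python (hypothetical helper names, not a Caffe API): given each layer's kernel, pad, and stride, the effective size of the current image can be propagated through the net.

```python
# Sketch: track the effective spatial size of the current input through a
# stack of (kernel, pad, stride) layers when blobs are allocated at max size.

def layer_out(size, kernel, pad, stride):
    return (size + 2 * pad - kernel) // stride + 1

def track_sizes(input_size, layers):
    """Return the effective spatial size after each layer."""
    sizes = [input_size]
    for kernel, pad, stride in layers:
        sizes.append(layer_out(sizes[-1], kernel, pad, stride))
    return sizes

# conv1 and pool1 of an AlexNet-like net, for illustration:
net = [(11, 0, 4), (3, 0, 2)]
print(track_sizes(227, net))  # -> [227, 55, 27]
```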

@kloudkl

Contributor

kloudkl commented Mar 9, 2014

Torch7's PyramidPacker does exactly what we want.
@shelhamer, is your internal BVLC implementation the same as Torch7's? If it is, and you cannot open-source it shortly for some reason, we understand, and would like to implement one to benefit everyone as soon as possible.

@shelhamer

Member

shelhamer commented Mar 10, 2014

@kloudkl the BVLC implementation is the same as the Torch7 PyramidPacker at least in spirit; I have not read the Torch7 code yet to compare the details.

Our pipeline is not quite identical, but it is a pack, plane, unpack method.

I agree it is time for dense extraction in Caffe. Since there are several design choices, it is unlikely that the implementation planned here will be identical to the private (and still experimental) implementation. I suggest we move ahead on a public implementation, and then we can compare and draw from the strengths of both implementations in the end. The BVLC pyramid team agrees with this path and will continue work on their implementation too.

PRs for dense + pyramid extraction are welcome!

@sguada

Contributor

sguada commented Mar 10, 2014

@shelhamer making inner-product layers into convolutional layers slows down the process a lot. I ran some tests changing the inner-product layers to convolution layers with 4096 filters, and the running time goes from 1.25 seconds per batch (of 256 227x227 images) to 4.37 seconds, so almost 4x slower.

When I increase the size of the images to 454x454 I have to reduce the batch size to 128, otherwise it doesn't fit in the 12 GB of memory of the K40; then the time per batch is 4.23 seconds, which means the time to process 256 images would be 8.47 seconds.
That would make the network impractical for training, since it would take ~30 days; however, it could be used for testing or deployment.

Maybe a different way of doing the convolutions could help in that case.
Also, by changing the size of the inputs as in #195, one could pass multiple scales independently instead of all together.

@shelhamer

Member

shelhamer commented Mar 10, 2014

Thanks for the timing evaluation @sguada. The convolutional bottleneck is an important target for improvement. @forresti, I think you had some ideas for this?

However, it's important to note the overall efficiency of this scheme. In the 454x454 case an 8x8 classification map is computed, so the convolutional fully-connected net is doing 64x the work in ~8x the time (I did this math in my head, so someone might check me on this).

Further, one need not necessarily compute the classifier densely. One could fuse the dense and selective approaches by densely extracting features (across space and scale as desired), then selectively computing the fully-connected layers at selective-search proposals mapped from the image into feature-space coordinates.

Perhaps #194 might help alleviate the issue: instead of convolutional fully-connected layers, one could tile the inner-product layer weights to compute the classification map as one massive multiplication, although this is of course wasteful in memory.

@sguada

Contributor

sguada commented Mar 10, 2014

@shelhamer your math was almost correct: the final output map is 9x9, so it is in fact doing 81x the work in ~8x the time. However, there is something to look into in the convolutional layer, since when given the same image size, and having to do exactly the same work, it requires 4x the time.

Still, this approach would work for big images, since the extra cost is amortized very quickly.

@forresti and I have been looking into how to speed up the convolution, but so far without success. Maybe for this case, where there are a lot of filters with many channels, it could work.

@forresti

Contributor

forresti commented Mar 10, 2014

@sguada oh cool, thanks for doing some benchmarking with the convolutional fc6 and fc7.

To begin with, I'll see if I can discern why conv is slower than innerproduct for the standard 227x227 setup.

@kloudkl Do you have any thoughts on the computational efficiency of Torch7 for AlexNet and similar deep models? Are there any particularly interesting scenarios where Caffe is much faster or slower than Torch7?

@rodrigob

Contributor

rodrigob commented Mar 10, 2014

It might be interesting to look into the implementation details of OverFeat since it is supposedly optimized for the "dense sliding window" use case.
https://github.com/sermanet/OverFeat

@rodrigob

Contributor

rodrigob commented Mar 10, 2014

Scratch my last comment: for now OverFeat has only released source code for the CPU version, and binaries for the GPU version (?!).

For now, then, we can look at Torch's GPU code:

https://github.com/torch/cunn
https://github.com/torch/cutorch/blob/master/lib/THC/THCTensorConv.cu

@rodrigob

Contributor

rodrigob commented Mar 10, 2014

A nice demo for this new feature would be a face detector similar to
http://eblearn.sourceforge.net/face_detector.html

@kloudkl

Contributor

kloudkl commented Mar 11, 2014

@forresti, I have read some of the Torch7 code but never run it. Unlike Caffe, only a small part of Torch7 is written in CUDA. Torch7 and Theano were benchmarked against each other in the pre-Caffe era. The results largely depend on who does the benchmarking, when (both teams never stop optimizing performance), and on which GPU device they are benchmarked.

To inspire further discussion, here is a summary of material that can be found in many CUDA courses. To gain insight into performance bottlenecks and their root causes, the orthogonal method is profiling. The usual suspects are device utilization and memory-bus utilization: the former can be tuned with the launch configuration (#111); the latter is higher when memory access is coalesced. Latency-hiding techniques can also increase throughput, while warp divergence does the opposite.

If optimization becomes a really high priority, systematically studying the relevant professional techniques will help a lot.

@shelhamer

Member

shelhamer commented Mar 11, 2014

I do not see Torch7 and Theano so much as guides for our computational pipeline and convolutional architecture, but as machine learning / deep learning libraries we can take as inspiration for features.

The central feature relevant to dense and pyramid processing in Torch7 is pyramid packing and unpacking. While optimization of the indexing, convolution, and fully-connected layers will be important for a widely-useful implementation, first we must have an implementation. From pyramid processing we can go in many directions, including for problems other than detection, and of course work on a Caffe reference implementation of OverFeat.

Thanks @kloudkl for the review of CUDA optimization and benchmark history. Perhaps we could have an "on CUDA optimization" section of the developer documentation to keep your pointers together.

The face detector highlighted by @rodrigob would be a nice demo for pyramid processing.

@sergeyk sergeyk added this to the 1.0 milestone Mar 13, 2014

@shelhamer

Member

shelhamer commented Mar 13, 2014

The BVLC pyramid team is working on integrating their implementation into dev ASAP. The only hold-up is the usual integration hacking and a license complication that is being hammered out now. Thanks all for your patience while this feature coalesces.

However, I stand by my original suggestion that a Torch7 style pyramid pack/plane/unpack method be pursued in the community so that we can analyze and improve on the differences. There are many design choices in such a feature.

@shelhamer

Member

shelhamer commented Mar 13, 2014

Re: @forresti's #189 (comment), the convolutional implementation is slowed by the roll / unroll and copy instead of straight dgemm as in the InnerProduct layer.

@rodrigob

Contributor

rodrigob commented Mar 22, 2014

Thanks @shelhamer for looking into the topic. Any update on the BVLC pyramid integration plans? Is there a branch where we can track progress on this topic?

@kloudkl

Contributor

kloudkl commented Mar 23, 2014

It is scheduled for the 1.0 milestone. There is no PR or branch to track yet, but anyone should feel free to develop one.

@shelhamer

Member

shelhamer commented Mar 23, 2014

The BVLC pyramid team hopes to make a public PR in the next week. That said, it was developed somewhat independently of Caffe and is going to take serious effort to integrate, so the appearance of the PR doesn't signal that the feature is ready.

My honest suggestion is that anyone interested pursue the Torch7-style pyramid pack/plane/unpack line of thought. There is understanding, and there are improvements, to be had in comparing implementations. As @kloudkl noted, this is a milestone feature for us, so we could help review and discuss any contributions in this direction.

One could even prototype it in Python instead of coding it directly into the library, to first understand the choices to make. For instance, the Torch7 packing, when convolved as one plane, will not produce the same filter activations as running separate inputs; there will be border effects according to the kernel sizes. Likewise, how should one pad to avoid false edge responses along the negative space where no image is packed? Yet another issue is that a mean image will not work unless it is scaled and applied to each packed image; one might instead use a channel mean that is spatially uniform.

These options are worth exploring in more than a single thread.
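One of the design choices above (padding against cross-image responses) can be prototyped in a few lines of plain Python. This is an illustrative sketch, not the BVLC or Torch7 implementation: it packs same-height images side by side with a gap of at least (kernel - 1) pixels, so that no filter window straddles two images.

```python
# Sketch: pack images into one row of a "plane" with a kernel-safe gap of
# negative space, so convolution responses from neighbors cannot mix.

def pack_row(images, kernel, fill=0):
    """Pack images (lists of rows) side by side with a (kernel-1) gap."""
    gap = kernel - 1
    height = max(len(im) for im in images)
    rows = []
    for y in range(height):
        row = []
        for im in images:
            line = im[y] if y < len(im) else [fill] * len(im[0])
            row.extend(line + [fill] * gap)
        rows.append(row[:-gap] if gap else row)  # drop the trailing gap
    return rows

a = [[1, 1], [1, 1]]
b = [[2, 2, 2], [2, 2, 2]]
packed = pack_row([a, b], kernel=3)
# Width is 2 + gap(2) + 3; no 3-wide window touches both a and b.
assert all(len(r) == 2 + 2 + 3 for r in packed)
```

A real packer would also record each image's offset so the output map can be unpacked per image afterwards.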

@kloudkl

Contributor

kloudkl commented Mar 23, 2014

Although already mentioned in a comment two weeks ago, I think it is still very relevant and useful to post the links related to @clementfarabet's implementation.

  1. torch7-demos / face-detector / PyramidPacker.lua
  2. torch7-demos / face-detector / PyramidUnPacker.lua
  3. Purdue's demonstrative tutorial
@rodrigob

Contributor

rodrigob commented Apr 8, 2014

Now that DenseNet is out, we should be able to close this item soon?

http://arxiv-web3.library.cornell.edu/abs/1404.1869

@moskewcz

moskewcz commented Apr 8, 2014

I just pushed the DenseNet code public and opened PR #308 (not #307, wrong target branch) to discuss the various TODOs and/or integration plans.

@bhack

Contributor

bhack commented May 31, 2014

Please take a look here: http://arxiv.org/abs/1405.3866v1

@bhack bhack referenced this issue in lisa-lab/pylearn2 May 31, 2014

Closed

sliding window #943

@shelhamer

Member

shelhamer commented Jun 12, 2014

#455 might be of interest in the meantime. It shows how to make a fully-convolutional model for dense feature extraction or sliding window classification inference.

@bhack

Contributor

bhack commented Sep 8, 2014

This could also include regression support for a variable-length set of bounding-box coordinates and sizes.

@dasguptar

dasguptar Sep 28, 2014

I was a bit curious regarding the current status of sliding window based dense multiscale extraction. Any plans to integrate it into Caffe anytime soon?

@melgor

melgor commented Oct 27, 2014

Since the talks started 7 months ago, has there been any progress in this area? Has anyone implemented an "Efficient Sliding Window" like in OverFeat?

@EvanWeiner

EvanWeiner Apr 10, 2015

Echo @melgor -- any progress on the "Efficient Sliding Window" like in OverFeat in Caffe?

@melgor

melgor commented Apr 10, 2015

@EvanWeiner, everything is now implemented in Caffe, in a similar fashion to OverFeat.
To run it you need two things:

  • #1313, which enables varying the input/output size. You can use multiple scales of the same image to get multiscale extraction; just call "Reshape" with the new size.
  • #455, which shows how to transform your model into a fully-convolutional one. This acts like an "Efficient Sliding Window".

These two things are merged in Caffe, so you can use them. As an example of the output, take a look here: http://nbviewer.ipython.org/github/BVLC/caffe/blob/master/examples/net_surgery.ipynb

I think this issue can be closed.

@EvanWeiner

EvanWeiner Apr 11, 2015

@melgor Thank you. But how can I use an output matrix like:

[[282 282 281 281 281 281 277 282]
[281 283 283 281 281 281 281 282]
[283 283 283 283 283 283 287 282]
[283 283 283 281 283 283 283 259]
[283 283 283 283 283 283 283 259]
[283 283 283 283 283 283 259 259]
[283 283 283 283 259 259 259 277]
[335 335 283 259 263 263 263 277]]

to locate an object within the photo? These values correspond to ImageNet class indices, but it seems the output has the same or a similar class in all locations. How do I discern a particular object?

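Each cell of such a map comes from one window of the input, so localisation means mapping cells back to input coordinates. Here is a hedged plain-Python sketch; the helper names (`cell_to_window`, `locate`) and the values (total stride 32, 227x227 receptive field, typical of a fully-convolutionalised AlexNet-style net) are assumptions for illustration, not something stated in this thread.

```python
# Sketch: map output-map cell (i, j) back to its input window, then collect
# the windows whose predicted class matches a target class index.

def cell_to_window(i, j, total_stride=32, receptive_field=227):
    """Input-space window (y0, x0, y1, x1) behind output-map cell (i, j)."""
    y0, x0 = i * total_stride, j * total_stride
    return (y0, x0, y0 + receptive_field, x0 + receptive_field)

def locate(class_map, target):
    """All input windows whose predicted class index equals `target`."""
    return [cell_to_window(i, j)
            for i, row in enumerate(class_map)
            for j, cls in enumerate(row) if cls == target]

class_map = [[282, 282], [283, 259]]
print(locate(class_map, 259))  # -> [(32, 32, 259, 259)]
```

When most cells share one class, as in the matrix above, the windows largely overlap the same object; the OverFeat paper combines per-window box regression and score accumulation to get a single detection.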

@melgor

melgor commented Apr 11, 2015

@EvanWeiner you can find more information on the Caffe mailing list: https://groups.google.com/forum/#!searchin/caffe-users/Object$20Detection/caffe-users/5TyzPCEjuRs/7sJA0DXhJ-kJ

There I point out what you could do to detect objects. Read the OverFeat paper; all the information is there.

@shelhamer

Member

shelhamer commented Mar 23, 2017

Closing, as this is handled by fully convolutional networks and their implementation in Caffe through coordinate mapping and cropping.

@shelhamer shelhamer closed this Mar 23, 2017

@shelhamer shelhamer removed this from the 1.0 milestone Mar 23, 2017
