
About the optimization of memory consumption #3

Closed
KaneKun opened this issue Apr 23, 2019 · 4 comments

Comments

KaneKun commented Apr 23, 2019

In Section 4 of the paper, an efficient PointConv implementation is discussed. From the analysis, particularly the comparison between Fig. 3 and Fig. 5, we know that the naive implementation costs K x C_in x C_out memory, while the optimized solution costs only C_in x C_mid. Therefore, after optimization, the memory footprint should be reduced by a factor of C_mid / (K x C_out).

Here I have a question. In Fig. 5, the input to the 1x1 conv is a feature of size 1 x (C_in x C_mid), and the kernel of the 1x1 conv has size C_mid x C_in x C_out. Does this mean that the multiplication applied here for the convolution incurs a memory cost of C_mid x C_in x C_out (for storing the convolution kernel parameters)? If so, the improvement ratio should be C_mid / K, and since the parameters used in the experiments are C_mid = K = 32, there would be no memory reduction at all.

I wonder if my analysis is incorrect.

@DylanWusee (Owner)
Thank you for your interest in our work.

The part you might be missing is that the weights of the 1x1 conv in Fig. 5 are shared across all the points in the point cloud, while the output of MLP1' in Fig. 5 is not. The reason is that a local region of a point cloud does not lie on a regular grid the way an image does; see Fig. 2(b) and (c) for an example.

So, in the original version of PointConv, the main memory consumption is B x N x K x (C_in x C_out), while in the efficient version it is B x N x K x C_mid + B x N x C_in x C_mid. The C_mid x C_in x C_out you mention is just a standard convolution kernel, which is not the main memory consumption in this case.
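To see the shared-kernel point numerically, here is a small sketch with hypothetical layer sizes (not the paper's exact configuration): the 1x1-conv kernel is stored once and shared across all B x N points, while the naive intermediate tensor exists per point, so the kernel parameters are negligible by comparison.

```python
# Hypothetical sizes for illustration only.
B, N, K = 8, 1024, 32                # batch, points per cloud, neighbors
C_in, C_mid, C_out = 1024, 32, 1024  # channel sizes

# The 1x1-conv kernel (C_mid x C_in x C_out) is shared, stored once.
kernel_params = C_mid * C_in * C_out

# The naive version materializes a per-point B x N x K x (C_in * C_out) tensor.
naive_acts = B * N * K * C_in * C_out

print(kernel_params)                # 33554432
print(naive_acts)                   # 274877906944
print(naive_acts // kernel_params)  # 8192 -> activations dominate the kernel
```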

ratio: (B x N x (K + C_in) x C_mid) / (B x N x K x C_in x C_out) = ((K + C_in) x C_mid) / (K x C_in x C_out)

Typical values are C_in = 64-1024 and K = 8-32, so (K + C_in) is of the same magnitude as C_in.

So, the ratio becomes approximately C_mid / (K x C_out).
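To make the ratio concrete, here is a rough sketch counting only the dominant float32 activation tensors, with hypothetical sizes picked from the ranges above (not the paper's exact configuration):

```python
def naive_mem(B, N, K, C_in, C_out, bytes_per=4):
    """Dominant tensor of the naive version: B x N x K x (C_in * C_out)."""
    return B * N * K * C_in * C_out * bytes_per

def efficient_mem(B, N, K, C_in, C_mid, bytes_per=4):
    """Dominant tensors of the efficient version:
    B x N x K x C_mid plus B x N x C_in x C_mid."""
    return (B * N * K * C_mid + B * N * C_in * C_mid) * bytes_per

# Hypothetical sizes: C_in = C_out = 1024, K = C_mid = 32.
B, N, K, C_in, C_mid, C_out = 8, 1024, 32, 1024, 32, 1024
gib = 1024 ** 3
print(naive_mem(B, N, K, C_in, C_out) / gib)      # 1024.0 GiB -> infeasible
print(efficient_mem(B, N, K, C_in, C_mid) / gib)  # 1.03125 GiB -> fits easily

ratio = efficient_mem(B, N, K, C_in, C_mid) / naive_mem(B, N, K, C_in, C_out)
print(ratio)  # ~0.001, close to the approximation C_mid / (K * C_out)
```

With these sizes the exact ratio ((K + C_in) x C_mid) / (K x C_in x C_out) is about 0.00101, and the approximation C_mid / (K x C_out) gives 0.00098, consistent since K is small relative to C_in.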

Hope that answered your question.

KaneKun commented Apr 23, 2019

Your analysis is correct. I hope you will add this detail to the main paper :)

By the way, in practice, roughly how much memory could you save with the efficient solution (e.g., for the classification network with a given batch size and number of points)?

@DylanWusee (Owner)
That is simple.

In practice, we were not able to train the original version of PointConv on a common GPU such as a GTX 1080 Ti, but we can easily train the efficient version with the same structure.

The memory saving is:
untrainable/unrunnable -> trainable/runnable


KaneKun commented Apr 23, 2019

Excellent! Thank you for your kind explanation.
