
About the optimization of memory consumption #3

Closed
KaneKun opened this issue Apr 23, 2019 · 4 comments

Comments

KaneKun commented Apr 23, 2019

In Section 4 of the paper, an efficient PointConv implementation is discussed. From the analysis, particularly the comparison between Fig. 3 and Fig. 5, we know that the naive implementation costs K x C_in x C_out memory, while the optimized solution costs only C_in x C_mid. Therefore, after optimization, the memory footprint should be reduced by a factor of C_mid / (K x C_out).

Here I have a question. In Fig. 5, the input to the 1x1 conv is a feature of size 1 x (C_in x C_mid), and the kernel of the 1x1 conv has size C_mid x C_in x C_out. Does this mean that the multiplication applied here for the convolution incurs a memory cost of C_mid x C_in x C_out (for storing the convolution kernel parameters)? If so, the improvement ratio should be C_mid / K, and since the parameters used in the experiments are C_mid = K = 32, there would be no memory reduction at all.

I wonder if my analysis is incorrect.

@DylanWusee (Owner)
Thank you for your interest in our work.

The part you might be missing is that the weights of the 1x1 conv in Fig. 5 are shared across all the points in the point cloud, while the output of MLP1' in Fig. 5 is not. The reason is that a local region of a point cloud does not lie on a regular grid the way an image does; see Fig. 2(b) and (c) for an example.

So, in the original version of PointConv, the main memory consumption is B x N x K x (C_in x C_out), while in the efficient version it is B x N x K x C_mid + B x N x C_in x C_mid. The C_mid x C_in x C_out you mention is just a standard convolution kernel, which is not the main memory consumption in this case.
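To see the shared-kernel point numerically, here is a small sketch with hypothetical layer sizes (not the paper's exact configuration): the 1x1-conv kernel is stored once and shared across all B x N points, while the naive intermediate tensor exists per point, so the kernel parameters are negligible by comparison.

```python
# Hypothetical sizes for illustration only.
B, N, K = 8, 1024, 32                # batch, points per cloud, neighbors
C_in, C_mid, C_out = 1024, 32, 1024  # channel sizes

# The 1x1-conv kernel (C_mid x C_in x C_out) is shared, stored once.
kernel_params = C_mid * C_in * C_out

# The naive version materializes a per-point B x N x K x (C_in * C_out) tensor.
naive_acts = B * N * K * C_in * C_out

print(kernel_params)                # 33554432
print(naive_acts)                   # 274877906944
print(naive_acts // kernel_params)  # 8192 -> activations dominate the kernel
```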

ratio: (B x N x (K + C_in) x C_mid) / (B x N x K x C_in x C_out) = ((K + C_in) x C_mid) / (K x C_in x C_out)

Typical values are C_in = 64-1024 and K = 8-32, so (K + C_in) is of the same magnitude as C_in.

So, the ratio becomes approximately C_mid / (K x C_out).
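To make the ratio concrete, here is a rough sketch counting only the dominant float32 activation tensors, with hypothetical sizes picked from the ranges above (not the paper's exact configuration):

```python
def naive_mem(B, N, K, C_in, C_out, bytes_per=4):
    """Dominant tensor of the naive version: B x N x K x (C_in * C_out)."""
    return B * N * K * C_in * C_out * bytes_per

def efficient_mem(B, N, K, C_in, C_mid, bytes_per=4):
    """Dominant tensors of the efficient version:
    B x N x K x C_mid plus B x N x C_in x C_mid."""
    return (B * N * K * C_mid + B * N * C_in * C_mid) * bytes_per

# Hypothetical sizes: C_in = C_out = 1024, K = C_mid = 32.
B, N, K, C_in, C_mid, C_out = 8, 1024, 32, 1024, 32, 1024
gib = 1024 ** 3
print(naive_mem(B, N, K, C_in, C_out) / gib)      # 1024.0 GiB -> infeasible
print(efficient_mem(B, N, K, C_in, C_mid) / gib)  # 1.03125 GiB -> fits easily

ratio = efficient_mem(B, N, K, C_in, C_mid) / naive_mem(B, N, K, C_in, C_out)
print(ratio)  # ~0.001, close to the approximation C_mid / (K * C_out)
```

With these sizes the exact ratio ((K + C_in) x C_mid) / (K x C_in x C_out) is about 0.00101, and the approximation C_mid / (K x C_out) gives 0.00098, consistent since K is small relative to C_in.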

Hope that answered your question.

KaneKun commented Apr 23, 2019

Your analysis is correct. I hope you will add this detail to the main paper :)

By the way, in practice, roughly how much memory could you save with the efficient solution (e.g., for the classification network with a given batch size and number of points)?

@DylanWusee (Owner)
That is simple.

In practice, we were not able to train the original version of PointConv on a common GPU such as a GTX 1080 Ti, but we can easily train the efficient version with the same structure.

The memory saving is:
untrainable/unrunnable -> trainable/runnable


KaneKun commented Apr 23, 2019

Excellent! Thank you for your kind explanation.
