
Improve ncnn memory estimation #2352

Open
JeremyRand opened this issue Nov 29, 2023 · 4 comments

Comments

@JeremyRand
Contributor

Splitting out from #2070 since this deserves its own issue.

Quoting @theflyingzamboni:

Since you have so much RAM, I'm wondering if you'd be willing to run some tests. I've done some testing on NCNN VRAM estimation in the past (though only with ESRGAN), and I can already tell you that this isn't really correct, and neither is what we already have in place. I've only got 8GB VRAM, so I could never extend my tests far enough to gather fully complete data.

What I can say is that:

  1. Model size is not strongly correlated with VRAM usage. This is because a model can be made larger simply by performing more convolutions, which does not matter for VRAM usage because they are done in sequence. The only thing model size definitely correlates with is how much VRAM it takes to store the model itself.
  2. Individual weight sizes have a correlation with VRAM usage when running a model.
  3. Scale needs to be accounted for, which our estimation does not currently do. This estimation is based on a 4x scale, but an 8x model will blow past it.

I abandoned this back in the day because there seemed to be further factors I couldn't account for with the data I had, but maybe we can finally figure it out. Unfortunately, I seem to have deleted the set of different scale/nf/nb ESRGAN models I generated for these tests. If I can remember how I generated them all, I could send them to you.
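
To put rough numbers on point 3 above: the output tensor (and the last upsampled feature maps) grow with the square of the scale factor, so a heuristic calibrated at 4x will badly underestimate 8x. A quick back-of-the-envelope sketch, where RGB channels and fp16 storage are illustrative assumptions:

```python
# Why scale must be in the estimate: output tensor size grows
# quadratically with the scale factor. RGB channels and fp16
# storage are assumptions for illustration only.

def output_tensor_bytes(h, w, scale, channels=3, bytes_per_elem=2):
    """Bytes needed to hold the final output tensor of an upscaler."""
    return (h * scale) * (w * scale) * channels * bytes_per_elem

for scale in (2, 4, 8):
    mib = output_tensor_bytes(512, 512, scale) / 2**20
    print(f"{scale}x on a 512x512 input: ~{mib:.0f} MiB for the output alone")
# 4x -> ~24 MiB, 8x -> ~96 MiB: an estimate calibrated at 4x
# underestimates 8x by roughly a factor of four.
```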

I'm hoping that there may be some way to parse the ncnn parameter file in a way that reveals the memory usage, so that we don't need to rely on heuristics and black-box reverse engineering as described above; I haven't investigated this yet.
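
For reference, the .param text format is simple enough to parse directly: a magic number (7767517), a layer/blob count line, then one line per layer with its type, name, blob names, and id=value parameters. A minimal sketch, where "model.param" is a placeholder path (id 6 is weight_data_size for Convolution layers per the ncnn operation docs):

```python
# A minimal sketch of parsing an ncnn .param file for the values a
# heuristic would need. Each layer line looks like:
#   <type> <name> <n_in> <n_out> <input blobs...> <output blobs...> <k=v ...>

def parse_ncnn_param(path):
    with open(path) as f:
        rows = [line.split() for line in f if line.strip()]
    assert rows[0] == ["7767517"], "not an ncnn .param file"
    layer_count, blob_count = map(int, rows[1])
    layers = []
    for tok in rows[2:2 + layer_count]:
        layer_type, name = tok[0], tok[1]
        n_in, n_out = int(tok[2]), int(tok[3])
        blobs = tok[4:4 + n_in + n_out]
        # Layer-specific params are id=value pairs; array params
        # (negative ids) are kept as raw strings here.
        params = dict(kv.split("=", 1) for kv in tok[4 + n_in + n_out:])
        layers.append((layer_type, name, blobs[:n_in], blobs[n_in:], params))
    return layers

# e.g. the largest single Convolution weight, as used in the tests below
# (id 6 is weight_data_size for Convolution layers):
layers = parse_ncnn_param("model.param")
conv_weights = [int(p["6"]) for t, _, _, _, p in layers
                if t == "Convolution" and "6" in p]
print("largest conv weight (elements):", max(conv_weights))
```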

@joeyballentine
Member

If you're able to figure something out, that knowledge could also be used to improve VRAM estimation for PyTorch, since the two should have the same kind of VRAM usage. Very interested to see what comes of this, if anything.

@theflyingzamboni
Collaborator

theflyingzamboni commented Nov 29, 2023

I'm hoping that there may be some way to parse the ncnn parameter file in a way that reveals the memory usage, so that we don't need to rely on heuristics and black-box reverse engineering as described above; I haven't investigated this yet.

So this is basically what I was doing, aside from model size. My process was to generate synthetic ESRGAN models at 1x, 2x, 4x, and 8x scale, and permute those scales with varying nf (number of convolution filters, which determines the size of each layer's weights), nb (number of blocks, basically how many times the model runs through a sequence of layers), and image size. If I recall correctly, I also directly used maximum layer weight (as in, the largest weight for a single convolution layer in the model). Scale, nf, nb, and layer weight were all taken from the param file.

I then generated a series of plots against VRAM usage for each of these variables to look for trends/effects. What I found, as I recall, was that nb was irrelevant, while nf, scale, and image size each had an effect, and I believe there was an interaction effect between some or all of those three as well. The problem was that my GPU only has 8GB, so my data was incomplete in a way that made it impossible to fully extrapolate the trends. At higher scales and image sizes, I simply could not process the test images. That also doesn't get into the potential differences between model arches, since this was really a test of very basic convolution models. This is why it would be valuable to have those tests rerun by someone with a ton of memory to work with.
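
For anyone rerunning this, the fitting step could look something like the sketch below. The data rows are fake placeholders; real values would come from the rerun measurements, and you'd want far more rows than coefficients:

```python
# A hedged sketch of the curve-fitting step described above: fit peak
# VRAM against nf, scale, and input pixel count with interaction terms.
import numpy as np

# columns: nf, scale, input pixels, measured peak VRAM (MiB) -- fake rows
runs = np.array([
    [32.0, 2.0, 512 * 512,  900.0],
    [64.0, 4.0, 512 * 512, 2100.0],
    [64.0, 8.0, 256 * 256, 1800.0],
    # ... many more measurements needed in practice ...
])
nf, scale, px, vram = runs.T

# Design matrix with main effects plus the interactions the plots hinted at.
X = np.column_stack([
    np.ones_like(nf),     # intercept
    nf, scale**2, px,     # main effects (scale enters squared: output area)
    nf * px,              # nf x image-size interaction
    (scale**2) * px,      # scale x image-size interaction (output pixels)
])
coef, *_ = np.linalg.lstsq(X, vram, rcond=None)
print("fitted coefficients:", coef)
```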

@joeyballentine
Member

Testing with something like Compact, or maybe something even simpler, would I think show a much clearer trend. Compact is a really simple arch by comparison, but it also doesn't use upconv like ESRGAN; it uses pixelshuffle, so the VRAM usage when upscaling is going to be different.

Anyway, I think what matters most is the max tensor size of the model, given the specific image. I don't know how to verify that, though.
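
One way to sanity-check that idea, at least for a plain sequential conv net like Compact: walk the layers and track the largest intermediate tensor. A rough sketch, where stride-1 padded convs and a single trailing PixelShuffle are simplifying assumptions (this ignores branching, Winograd workspace, and allocator overhead, so it's a lower bound at best):

```python
def max_tensor_elems(h, w, in_ch, conv_channels, scale):
    """Largest single tensor (in elements) for a plain sequential
    conv net: stride-1 padded convs, then one PixelShuffle."""
    peak = h * w * in_ch
    for out_ch in conv_channels:
        peak = max(peak, h * w * out_ch)  # spatial dims unchanged
    # PixelShuffle rearranges c*scale^2 channels into scale-x spatial
    # resolution, so element count is unchanged; the final output is
    # (h*scale) x (w*scale) x 3.
    peak = max(peak, (h * scale) * (w * scale) * 3)
    return peak

# Compact-like toy: 16 convs at nf=64, final conv to 3*scale^2 channels
elems = max_tensor_elems(512, 512, 3, [64] * 16 + [3 * 4**2], scale=4)
print(f"peak single tensor: ~{elems * 4 / 2**20:.0f} MiB at fp32")
```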

@JeremyRand
Contributor Author

There are a bunch of configurable options in ncnn that affect memory usage. Winograd convolution is the main one -- it makes things much faster but also uses more memory. It would be nice to know exactly what the impact of those are.
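
One way to quantify it would be to measure empirically rather than model it: load the same model with the option on and off and compare peak memory under identical inputs. A sketch using the ncnn Python bindings, where the option names mirror ncnn's C++ Option struct and the model paths are placeholders:

```python
import ncnn

def load_net(use_winograd):
    net = ncnn.Net()
    # Toggle the Winograd fast-convolution path before loading the model.
    net.opt.use_winograd_convolution = use_winograd
    net.load_param("model.param")  # placeholder path
    net.load_model("model.bin")    # placeholder path
    return net

# Run the same upscale with load_net(True) vs load_net(False) while
# watching VRAM/RSS (nvidia-smi, or a memory profiler) to measure the
# Winograd overhead directly.
```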
