
Make ncnn memory budget configurable (v2) #2351

Merged: 2 commits merged into chaiNNer-org:main on Dec 5, 2023

Conversation

@JeremyRand (Contributor):

#1867 didn't support automatic estimation of tile size, which was not great for UX. This PR adds that missing support by allowing the user to choose a custom memory budget. Choosing a memory budget is likely to be better UX than choosing a tile size, since users are likely to know how much RAM or VRAM they have better than they know what tile size will work with their hardware and the model they picked. This should also work fine with Vulkan (and is likely to help with this UX issue), though I wasn't able to test that.

This PR is a rewritten version of #2070 and supersedes it; chaiNNer has had enough code churn since then that it was easier to rewrite it as a separate PR. This PR was substantially easier for me to do than messing with rebasing #2088, and is hopefully also easier to review (though #2088 is still worth doing eventually).

@JeremyRand (Contributor, Author):

@RunDevelopment Your concern from the previous PR about exposing the 1 PiB number to the UI has hopefully been addressed; 0 is now special-cased to mean "no limit".
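The "0 means no limit" special-casing described here could look roughly like the following sketch. The function name, the setting's GiB units, and the 1 PiB sentinel value are assumptions for illustration, not chaiNNer's actual identifiers:

```python
# Hypothetical sketch of the "0 means no limit" special case.
# The function name and the 1 PiB sentinel are illustrative,
# not chaiNNer's real identifiers.

PIB = 1024 ** 5  # 1 PiB, used internally as "effectively unlimited"


def effective_memory_budget(user_budget_gib: int) -> int:
    """Translate the user-facing GiB setting into a byte budget.

    A value of 0 is special-cased to mean "no limit", so the UI never
    has to display the huge internal sentinel value.
    """
    if user_budget_gib == 0:
        return PIB
    return user_budget_gib * 1024 ** 3
```

This keeps the sentinel as an internal implementation detail while the UI only ever deals in small, human-readable numbers.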

@joeyballentine (Member):

Thanks for taking the time to redo this and sorry for all the code churn making this an issue. I'll review this in a bit

@JeremyRand (Contributor, Author):

> Thanks for taking the time to redo this and sorry for all the code churn making this an issue. I'll review this in a bit

The code churn isn't a bad thing in this case; it cut the effort involved in writing and testing the new PR by about an order of magnitude compared to how the old PR worked.

@joeyballentine (Member) previously approved these changes Nov 30, 2023:


Code itself looks good. I'll test it out tomorrow when I have some time.

@joeyballentine (Member):

Do you have a good test case for this? It doesn't seem to work for me. @theflyingzamboni would you mind also testing this?

@JeremyRand (Contributor, Author):

@joeyballentine I took a 1600x1200 image and upscaled it 4x in ncnn CPU mode with automatic tiling. With no limit (i.e. the budget set to 0), the max memory usage near the end of the upscale was 83 GiB. I then set a much lower limit (I think around 20 GiB?), restarted chaiNNer, and did another upscale; the memory usage never went above that limit (IIRC it came within 10% of the limit, though I don't remember the exact numbers).

Hopefully that helps you narrow down what's different about your tests and mine.

@theflyingzamboni (Collaborator):


This appears to have little effect when used with a GPU. I set the memory budget to 2 GB and upscaled a 2048x2048 image, and it still split into 512x512 tiles and used around 6 GB. It only drops down to 256x256 tiles if I set the budget to 1 GB, and even then it really uses more like 2 GB. This is with an fp16 ESRGAN model, so the bin is ~32 MB.

Also, is there anything to enforce the budget limit during tile splitting if memory is mis-estimated?

@JeremyRand (Contributor, Author):

> This appears to have little effect when used with a GPU. I set the memory budget to 2 GB and upscaled a 2048x2048 image, and it still split into 512x512 tiles and used around 6 GB. It only drops down to 256x256 tiles if I set the budget to 1 GB, and even then it really uses more like 2 GB. This is with an fp16 ESRGAN model, so the bin is ~32 MB.

@theflyingzamboni That sounds like the memory usage estimate for your model on a GPU is faulty. This PR doesn't change that logic for Vulkan mode (unless I slipped up while refactoring -- please double-check me on this); all it does in Vulkan mode is lower the memory budget (which gets compared to that estimate) if the user picks one that's lower than what the GPU says it supports. So my tentative guess is that the issue you're seeing is out of scope for this PR (although it would be great to fix it in a follow-up PR; see #2352).
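The Vulkan-mode behavior described here, where the user's setting can only lower the GPU-reported budget and never raise it, could be sketched as follows. The function name and byte units are illustrative, not chaiNNer's actual code:

```python
def vulkan_memory_budget(user_budget_bytes: int, gpu_reported_bytes: int) -> int:
    """Clamp the budget used for tile-size estimation in Vulkan mode.

    The user's setting can only lower the limit below what the GPU
    reports; the memory-usage estimate itself is computed elsewhere
    and is unchanged. (Illustrative sketch, not chaiNNer's actual
    implementation.)
    """
    return min(user_budget_bytes, gpu_reported_bytes)
```

Under this scheme a faulty per-model estimate would produce bad tile sizes regardless of the budget, which matches the out-of-scope diagnosis above.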

> Also, is there anything to enforce the budget limit during tile splitting if memory is mis-estimated?

This PR doesn't change that behavior either (again, unless I made a mistake -- I might have). AFAIK the pre-existing behavior is that in GPU mode an OOM error will be caught and the tile size lowered, while in CPU mode Very Bad Things (TM) will happen; both should still be the case with this PR applied.
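The pre-existing GPU-mode fallback described above (catch the out-of-memory error, halve the tile size, retry) might look roughly like this sketch. `OutOfMemoryError` and the helper names are stand-ins, not ncnn's or chaiNNer's real identifiers:

```python
class OutOfMemoryError(RuntimeError):
    """Stand-in for whatever error the GPU backend raises on allocation failure."""


def upscale_with_fallback(image, tile_size, upscale_tiled, min_tile=32):
    """Retry a tiled upscale with progressively smaller tiles whenever an
    out-of-memory error is caught (GPU mode). Illustrative sketch only;
    CPU mode has no such safety net, per the discussion above.
    """
    while tile_size >= min_tile:
        try:
            return upscale_tiled(image, tile_size)
        except OutOfMemoryError:
            tile_size //= 2  # halve the tile size and try again
    raise OutOfMemoryError("out of memory even at the smallest tile size")
```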

@theflyingzamboni (Collaborator) commented Dec 3, 2023:

Regarding your first point, that may indeed be outside the scope of the PR. If you do want to handle it in another PR, I wonder if we should only show the memory budget setting when the provider is set to CPU. Otherwise we have an option that looks like it should apply universally but is definitely not functioning as intended on GPU.

Looking at the estimation code, however, I believe it is bugged. When I manually divide the estimate to get the amount required for a 256x256 tile, I get 1.84 GB, which is certainly too much for a 1 GB budget. So that calculation causes us to generate a tile size that uses more memory than the budget allows. This isn't necessarily a problem with the memory estimation itself (though we know that's not super accurate either); it's that we're not even producing a correct tile size from the budget using the estimate we have.
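The tile-size selection being discussed (pick the largest tile whose estimated memory use fits within the budget) can be sketched as below. The power-of-two search and the per-tile cost function are assumptions for illustration, not chaiNNer's actual estimator:

```python
def max_tile_size(budget_bytes, estimate_bytes, min_tile=32, max_tile=1024):
    """Return the largest power-of-two tile size whose estimated memory
    use fits within the budget, falling back to min_tile.
    (Illustrative sketch, not chaiNNer's actual algorithm.)
    """
    size = max_tile
    while size > min_tile and estimate_bytes(size) > budget_bytes:
        size //= 2
    return size


# Made-up quadratic cost of 28,000 bytes per pixel of tile area: a
# 256x256 tile then estimates to ~1.84 GB, roughly the figure quoted
# above, so a 1 GB budget should force the tile size down to 128.
cost = lambda size: size * size * 28_000
```

The bug described would then be any code path where the selected size's estimate still exceeds the budget, rather than an inaccuracy in the estimate itself.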

For the latter point, I was wondering whether there is some means by which we could set a memory allocation limit so that the OOM point would align with the budget. I'm not sure NCNN provides the tools for that, though.

@JeremyRand (Contributor, Author):

> Regarding your first point, that may indeed be outside the scope of the PR. If you do want to handle it in another PR, I wonder if we should only show the memory budget setting when the provider is set to CPU. Otherwise we have an option that looks like it should apply universally but is definitely not functioning as intended on GPU.

@theflyingzamboni I think it's still useful to have this available in Vulkan mode, even if the user needs to fiddle with the actual number to get the desired results. Part of my motivation for this feature was to facilitate using chaiNNer simultaneously with something else that uses nontrivial memory (or, for that matter, multiple chaiNNer instances at once), and for that purpose the main requirement is that the user can decrease the budget, not so much that the exact number matches reality.

I would be totally fine with adding a warning to the text indicating that the user might have to fiddle with the number because it's not exact yet. Let me know if you'd like me to do that. I also don't feel incredibly strongly about whether this is enabled for Vulkan yet, so if @joeyballentine wants it disabled completely in Vulkan mode until the issues are worked out, I'll defer to him.

> Looking at the estimation code, however, I believe it is bugged. When I manually divide the estimate to get the amount required for a 256x256 tile, I get 1.84 GB, which is certainly too much for a 1 GB budget. So that calculation causes us to generate a tile size that uses more memory than the budget allows. This isn't necessarily a problem with the memory estimation itself (though we know that's not super accurate either); it's that we're not even producing a correct tile size from the budget using the estimate we have.

That's really interesting; it hadn't even occurred to me that there could be a bug there. Can we factor that out into its own GitHub issue so it doesn't get lost here?

@joeyballentine (Member):

A warning and an explanation as to why it's inaccurate would be good enough for me.

@JeremyRand (Contributor, Author):

> A warning and an explanation as to why it's inaccurate would be good enough for me.

@joeyballentine OK, works for me. I'll add a warning; hopefully will have it pushed by tomorrow.

@theflyingzamboni (Collaborator):

> That's really interesting, it hadn't even occurred to me that there could be a bug there. Can we factor that out into its own GitHub issue so it doesn't get lost here?

Actually, scratch this; I was miscalculating.

@theflyingzamboni (Collaborator):

One other suggestion: Would you change it to allow one decimal place? It would be nice to be a little more specific than a full GB.

@JeremyRand (Contributor, Author):

> One other suggestion: Would you change it to allow one decimal place? It would be nice to be a little more specific than a full GB.

@theflyingzamboni No objection in principle, but SettingsParser doesn't have any mechanism for retrieving float settings, so we'd have to abuse either an int or a str, which would not be great for type safety. So I'd prefer to make that change in a subsequent PR, once SettingsParser supports either float or decimal types.
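One possible interim workaround for the int-only limitation (purely illustrative; SettingsParser's real API isn't shown and the helper name is made up) would be to store the setting as an integer number of tenths of a GiB:

```python
def budget_bytes_from_tenths(tenths_of_gib: int) -> int:
    """Decode a budget stored as an integer count of tenths of a GiB,
    so the UI could offer one decimal place without float settings.
    E.g. 25 -> 2.5 GiB. (Hypothetical workaround, not chaiNNer code.)
    """
    return tenths_of_gib * (1024 ** 3) // 10
```

The type-safety concern above still applies, since the stored int no longer means what its name suggests; a proper float or decimal setting type is the cleaner fix.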

@JeremyRand (Contributor, Author):

> A warning and an explanation as to why it's inaccurate would be good enough for me.

@joeyballentine Warning added, let me know if the wording needs any changes.

@joeyballentine (Member):


Looks good to me. Thanks!

@joeyballentine joeyballentine merged commit 747a82f into chaiNNer-org:main Dec 5, 2023
14 checks passed
@JeremyRand JeremyRand deleted the budget-v2 branch December 5, 2023 23:02