
Batch Size Optimization #486

Merged
merged 14 commits into from
Jan 13, 2023

Conversation


@NullSenseStudio NullSenseStudio commented Dec 29, 2022

I've seen a great speed improvement when generating multiple images in batches. So far I've tested batch sizes of 2-5 and reduced run time by 26-39% on a GTX 1070 with the default half precision and attention slicing optimizations.

It's implemented as a new speed optimization option that only takes effect when iterations or file batch are used. It splits the requested images into as many full batches as it can, then runs a smaller batch on any remainder at the end.
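The splitting described above can be sketched in a few lines (a hypothetical helper for illustration, not the PR's actual code):

```python
def split_batches(total_images: int, batch_size: int) -> list:
    """Split a requested image count into full batches plus one smaller
    remainder batch, e.g. 11 images at batch size 4 -> [4, 4, 3]."""
    batches = [batch_size] * (total_images // batch_size)
    remainder = total_images % batch_size
    if remainder:
        batches.append(remainder)
    return batches
```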

Progress

  • prompt_to_image
  • image_to_image
  • inpaint
  • outpaint
  • depth_to_image
  • tiled step preview
  • Stability SDK
  • VAE slicing
  • upscale (waiting for Automatic Seamless Detection #410 first)

Issues

This feature may only be suitable for CUDA users right now.
https://huggingface.co/docs/diffusers/optimization/mps#known-issues

> Generating multiple prompts in a batch crashes or doesn’t work reliably. We believe this is related to the mps backend in PyTorch. This is being resolved, but for now we recommend to iterate instead of batching.

The only other downsides are extra memory usage and results that aren't entirely deterministic.
Testing indicates that determinism isn't affected by the seeds or prompts of other images in the same batch. The same batch size with the same settings will produce the same image for any single or multiple images in it; only the batch size itself causes the odd behavior. I'm not yet sure whether this is a bug in diffusers or in this implementation.

@NullSenseStudio NullSenseStudio added the enhancement New feature or request label Dec 29, 2022
@NullSenseStudio

I've found that VAE slicing allows for a much higher batch size, and I've gathered more data on how much time certain batch sizes save on a GTX 1070.
[image: batch size timing results]
Denoising is strictly the time spent in the denoising loop, whereas total also includes VAE decoding. The time savings might be even higher if I factored in the full round-trip time from requesting images to their being available on the frontend.

All testing was done with VAE slicing enabled. With VAE slicing disabled, performance was actually worse at a batch size of just 4, though I assume it could be better on cards with more VRAM (a larger batch size will likely save more time than a batch size small enough to allow VAE slicing to be disabled).

I've expanded the scope of this PR to include an optimization toggle for VAE slicing and will also attempt to implement batching in upscaling since it requires so many tiles.
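For reference, diffusers exposes VAE slicing on its pipelines via `pipe.enable_vae_slicing()`. Conceptually, it decodes latents one image at a time instead of all at once, capping peak decode memory at a single image's worth. A rough sketch of the idea, with `decode_one` standing in for the real VAE decoder:

```python
def decode_latents_sliced(decode_one, latents):
    """Decode each latent individually rather than as one big batch;
    peak memory stays at one image's worth at the cost of a little speed."""
    return [decode_one(latent) for latent in latents]
```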

@NullSenseStudio

Seeing similar gains in upscaling, though not as sharp a performance increase as prompt_to_image.
[image: upscaling timing results]
Using the same optimization settings and the default 128 tile size.

The VAE only runs in float32, which seems to be the main reason I can't reach as high a batch size. I can't even bump the tile size up to 192 without running out of VRAM when decoding latents.
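Batching the upscale tiles amounts to grouping them before each model call. A minimal sketch (hypothetical helper; `tiles` is any sequence of tile tensors):

```python
def batch_tiles(tiles, batch_size):
    """Yield successive groups of up to batch_size tiles, so several tiles
    go through the upscale model per step instead of one at a time."""
    for i in range(0, len(tiles), batch_size):
        yield tiles[i:i + batch_size]
```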

@NullSenseStudio

Just found out that in image_to_image.py the line `image = init_image or image` was caching the init_image value from the first call and continuing to use it regardless of which image was passed later. I didn't realize I'd accidentally fix a bug by removing it.
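A hedged reproduction of the bug pattern (names hypothetical, not the actual dream-textures code): once a truthy init_image has been captured, `or` makes every later call prefer the stale capture over the image actually passed in.

```python
def patch_pipeline(init_image):
    def prepare(image):
        # Bug: init_image was captured when the pipeline was patched, so
        # `or` always short-circuits to the stale value while it's truthy.
        image = init_image or image
        return image
    return prepare

prepare = patch_pipeline("first.png")
prepare("second.png")  # still yields "first.png"
```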

@NullSenseStudio

I'm having trouble implementing batching properly for the Stability SDK. It accepts a list of prompts, but I can only get it to generate images based on the first one. I'll leave it as simple iteration.

@NullSenseStudio NullSenseStudio marked this pull request as ready for review January 6, 2023 23:03
@carson-katri carson-katri changed the base branch from iterations-fix to main January 8, 2023 15:44
Owner

@carson-katri carson-katri left a comment


Overall looks good, and it also worked on mps. The only thing to look out for is huggingface/diffusers#1909: some schedulers don't handle a list of generators properly yet. Could it be adjusted to pass a single generator when the batch size is 1?
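The suggested adjustment could look something like this (a sketch under the assumption that the seeds are already known; the helper name is hypothetical):

```python
import torch

def make_generator(seeds, device="cpu"):
    """Build one torch.Generator per seed, but return a bare generator for
    batch size 1 so schedulers that can't handle a list of generators still
    work (see huggingface/diffusers#1909)."""
    generators = [torch.Generator(device).manual_seed(seed) for seed in seeds]
    return generators[0] if len(generators) == 1 else generators
```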

@NullSenseStudio

I believe that's all the necessary changes. I'll also note that, unlike the other actions, upscaling doesn't use a list of generators when batching, since I don't think seed reproducibility is as important there.

@carson-katri carson-katri merged commit f70c30d into main Jan 13, 2023
@carson-katri carson-katri added this to the v0.0.10 milestone Jan 15, 2023
@NullSenseStudio NullSenseStudio deleted the batch-size branch January 15, 2023 04:20
NullSenseStudio added a commit that referenced this pull request Jan 22, 2023
carson-katri added a commit that referenced this pull request Jan 30, 2023
* proof of concept

* other actions

* add package release workflow

* Update .github/workflows/package-release.yml

Co-authored-by: Carson Katri <Carson.katri@gmail.com>

* fix seamless axes

* half precision

* fix model download

* half precision fixes

inpainting, upscaling, depth with color

* more half precision fixes

* modify revision selection

* remove obsolete patches

* remove #486 upscale timing

* validate snapshot before selection

Co-Authored-By: Carson Katri <Carson.katri@gmail.com>

* tqdm update missing in pipelines

* fix upscaling non-square images

fixes #528

---------

Co-authored-by: Carson Katri <carson.katri@gmail.com>
Development

Successfully merging this pull request may close these issues.

Compositing Dream Textures renders the same frame over and over again (animation and image).