
Batch Size Optimization #486

Merged
merged 14 commits into from
Jan 13, 2023

Conversation


@NullSenseStudio NullSenseStudio commented Dec 29, 2022

I've seen a great speed improvement when generating multiple images in batches. So far I've tested batch sizes of 2-5 and reduced run time by 26-39% on a GTX 1070 with the default half precision and attention slicing optimizations.

It's implemented as a new speed optimization option that only takes effect when iterations or file batch are used. It splits the requested images into as many full batches as it can, then runs a smaller batch on any remainder at the end.
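The splitting described above can be sketched in a few lines (a hypothetical helper for illustration, not the PR's actual code):

```python
def split_batches(total_images: int, batch_size: int) -> list:
    """Split a requested image count into full batches plus one smaller
    remainder batch, e.g. 11 images at batch size 4 -> [4, 4, 3]."""
    batches = [batch_size] * (total_images // batch_size)
    remainder = total_images % batch_size
    if remainder:
        batches.append(remainder)
    return batches
```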

Progress

  • prompt_to_image
  • image_to_image
  • inpaint
  • outpaint
  • depth_to_image
  • tiled step preview
  • Stability SDK
  • VAE slicing
  • upscale (waiting for Automatic Seamless Detection #410 first)

Issues

This feature may only be suitable for CUDA users right now.
https://huggingface.co/docs/diffusers/optimization/mps#known-issues

> Generating multiple prompts in a batch crashes or doesn’t work reliably. We believe this is related to the mps backend in PyTorch. This is being resolved, but for now we recommend to iterate instead of batching.

The only other downsides are extra memory usage and results that aren't entirely deterministic.
Testing indicates that determinism isn't affected by the seeds or prompts of other images in the same batch. The same batch size with the same settings will produce the same image for any single or multiple images in it; only the batch size itself causes the odd behavior. I'm not yet sure whether this is a bug in diffusers or in this implementation.

@NullSenseStudio NullSenseStudio added the enhancement New feature or request label Dec 29, 2022
@NullSenseStudio

I've found that VAE slicing allows for a much higher batch size, and I've gathered more data on how much time certain batch sizes save on a GTX 1070.
[image: batch size timing results]
Denoising is strictly the time spent in the denoising loop, whereas total also includes VAE decoding. The time savings might be even higher if I factored in the full round-trip time from requesting images to their being available on the frontend.

All testing was done with VAE slicing enabled. With VAE slicing disabled, performance was actually worse at a batch size of just 4, though I assume it could be better on cards with more VRAM (a larger batch size will likely save more time than a batch size small enough to allow VAE slicing to be disabled).

I've expanded the scope of this PR to include an optimization toggle for VAE slicing and will also attempt to implement batching in upscaling since it requires so many tiles.
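For reference, diffusers exposes VAE slicing on its pipelines via `pipe.enable_vae_slicing()`. Conceptually, it decodes latents one image at a time instead of all at once, capping peak decode memory at a single image's worth. A rough sketch of the idea, with `decode_one` standing in for the real VAE decoder:

```python
def decode_latents_sliced(decode_one, latents):
    """Decode each latent individually rather than as one big batch;
    peak memory stays at one image's worth at the cost of a little speed."""
    return [decode_one(latent) for latent in latents]
```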

@NullSenseStudio

Seeing similar gains in upscaling, though not as sharp a performance increase as prompt_to_image.
[image: upscaling timing results]
Using the same optimization settings and the default 128 tile size.

The VAE only runs in float32, which seems to be the main reason I can't reach as high a batch size. I can't even bump the tile size up to 192 without running out of VRAM when decoding latents.
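Batching the upscale tiles amounts to grouping them before each model call. A minimal sketch (hypothetical helper; `tiles` is any sequence of tile tensors):

```python
def batch_tiles(tiles, batch_size):
    """Yield successive groups of up to batch_size tiles, so several tiles
    go through the upscale model per step instead of one at a time."""
    for i in range(0, len(tiles), batch_size):
        yield tiles[i:i + batch_size]
```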

@NullSenseStudio

Just found out that in image_to_image.py the line `image = init_image or image` was caching the init_image value from the first call and continuing to use it regardless of which image was passed later. I didn't realize I'd accidentally fix a bug by removing it.
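A hedged reproduction of the bug pattern (names hypothetical, not the actual dream-textures code): once a truthy init_image has been captured, `or` makes every later call prefer the stale capture over the image actually passed in.

```python
def patch_pipeline(init_image):
    def prepare(image):
        # Bug: init_image was captured when the pipeline was patched, so
        # `or` always short-circuits to the stale value while it's truthy.
        image = init_image or image
        return image
    return prepare

prepare = patch_pipeline("first.png")
prepare("second.png")  # still yields "first.png"
```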

@NullSenseStudio

I'm having trouble implementing batching properly for the Stability SDK. It accepts a list of prompts, but I can only get it to generate images based on the first one. I'll leave it as simple iteration.

@NullSenseStudio NullSenseStudio marked this pull request as ready for review January 6, 2023 23:03
@carson-katri carson-katri changed the base branch from iterations-fix to main January 8, 2023 15:44
Owner

@carson-katri carson-katri left a comment


Overall looks good, and it also worked on mps. The only thing to look out for is huggingface/diffusers#1909: some schedulers don't handle a list of generators properly yet. Could it be adjusted to pass a single generator when the batch size is 1?
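The suggested adjustment could look something like this (a sketch under the assumption that the seeds are already known; the helper name is hypothetical):

```python
import torch

def make_generator(seeds, device="cpu"):
    """Build one torch.Generator per seed, but return a bare generator for
    batch size 1 so schedulers that can't handle a list of generators still
    work (see huggingface/diffusers#1909)."""
    generators = [torch.Generator(device).manual_seed(seed) for seed in seeds]
    return generators[0] if len(generators) == 1 else generators
```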

@NullSenseStudio

I believe that's all the necessary changes. I'll also note that, unlike the other actions, upscaling doesn't use a list of generators when batching, since I don't think seed reproducibility is as important there.

@carson-katri carson-katri merged commit f70c30d into main Jan 13, 2023
@carson-katri carson-katri added this to the v0.0.10 milestone Jan 15, 2023
@NullSenseStudio NullSenseStudio deleted the batch-size branch January 15, 2023 04:20
NullSenseStudio added a commit that referenced this pull request Jan 22, 2023
carson-katri added a commit that referenced this pull request Jan 30, 2023
* proof of concept

* other actions

* add package release workflow

* Update .github/workflows/package-release.yml

Co-authored-by: Carson Katri <Carson.katri@gmail.com>

* fix seamless axes

* half precision

* fix model download

* half precision fixes

inpainting, upscaling, depth with color

* more half precision fixes

* modify revision selection

* remove obsolete patches

* remove #486 upscale timing

* validate snapshot before selection

Co-Authored-By: Carson Katri <Carson.katri@gmail.com>

* tqdm update missing in pipelines

* fix upscaling non-square images

fixes #528

---------

Co-authored-by: Carson Katri <carson.katri@gmail.com>
Development

Successfully merging this pull request may close these issues.

Compositing Dream Textures renders the same frame over and over again (animation and image).