Memory-efficient attention (also code cleaned up and a colab added) #103

Closed
wants to merge 12 commits

Conversation

neonsecret

Uses less memory; can now generate 576x1280 images with 6 GB of VRAM.
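For context, a minimal sketch of one common way to trade compute ordering for peak memory in attention, in the spirit of this PR; the function name, chunk count, and shapes are illustrative assumptions, not the PR's actual diff:

    import torch
    from torch import einsum

    def chunked_attention(q, k, v, scale, chunks=4):
        # q, k, v: (batch*heads, tokens, dim). Processing queries in slices
        # means the full (tokens x tokens) similarity matrix never has to
        # live in VRAM all at once.
        out = torch.empty_like(q)
        step = max(q.shape[1] // chunks, 1)
        for i in range(0, q.shape[1], step):
            sim = einsum('b i d, b j d -> b i j', q[:, i:i + step], k) * scale
            sim = sim.softmax(dim=-1)
            out[:, i:i + step] = einsum('b i j, b j d -> b i d', sim, v)
        return out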

@Disa-Kizonda

Disa-Kizonda commented Sep 3, 2022

Now I can run 704x704 images on my 1050 Ti, and 640x640 in turbo.

@basujindal
Owner

Hi, a big thank you for the commits. I am travelling right now and will merge these changes as soon as I get stable internet. Thanks again!

@scottmudge

It would probably be better to remove the cleanups and "beautifying" commits from this PR. Along with the other unrelated changes (Add diffusers, colab notebooks, etc).

They're unrelated to the attention.py change and all the whitespace and code-rearrangements are going to make future PRs and code merges a major pain.

@ZeroCool22

We just replace the code in attention.py with the new one, and that's all, right?

LICENSE Outdated
Copyright (c) 2022 Robin Rombach and Patrick Esser and contributors

CreativeML Open RAIL-M

?????


From the CompVis SD GitHub. This looks like it was merged there, then brought over here.

CompVis@69ae4b3

@zoru22

zoru22 commented Sep 4, 2022

I would not merge this PR without combing through it. There's a lot of bullshit in here. Gotta love the Rick Astley image, which makes me wonder if the attention.py changes even work.

Unfortunately I recently (accidentally) nuked my main install, so it may be a bit before I can port the relevant changes over one by one :v

@qdot

qdot commented Sep 4, 2022

> I would not merge this PR without combing through it. There's a lot of bullshit in here. Gotta love the Rick Astley image, which makes me wonder if the attention.py changes even work.

The license update, Rick Astley image, etc. are all from the original SD GitHub. It looks like the original committer made their changes, then rebased the CompVis repo on top of them for some reason.

Core SD rickrolls you when you trigger its NSFW filter.

https://github.com/CompVis/stable-diffusion/blob/main/scripts/txt2img.py#L79

@satvikpendem

Maybe it's better for @basujindal to merge the changes from the original SD repo first, and then @neonsecret can add their changes as a PR? Otherwise, when going through the git history, it will seem as if @neonsecret changed the license and added a Rickroll image when in fact it was the original repo that changed first. Generally, PRs should be kept clean of anything not specific to the main change being made.

@neonsecret
Author

> Maybe it's better for @basujindal to merge the changes from the original SD repo first, and then @neonsecret can add their changes as a PR? Otherwise, when going through the git history, it will seem as if @neonsecret changed the license and added a Rickroll image when in fact it was the original repo that changed first. Generally, PRs should be kept clean of anything not specific to the main change being made.

Well, I can roll back those changes to just the essential ones; then we can merge mine, and after that merge the original SD.

LICENSE Outdated

2. Grant of Copyright License. Subject to the terms and conditions of this License, each Contributor hereby grants to You a perpetual, worldwide, non-exclusive, no-charge, royalty-free, irrevocable copyright license to reproduce, prepare, publicly display, publicly perform, sublicense, and distribute the Complementary Material, the Model, and Derivatives of the Model.
3. Grant of Patent License. Subject to the terms and conditions of this License and where and as applicable, each Contributor hereby grants to You a perpetual, worldwide, non-exclusive, no-charge, royalty-free, irrevocable (except as stated in this paragraph) patent license to make, have made, use, offer to sell, sell, import, and otherwise transfer the Model and the Complementary Material, where such license applies only to those patent claims licensable by such Contributor that are necessarily infringed by their Contribution(s) alone or by combination of their Contribution(s) with the Model to which such Contribution(s) was submitted. If You institute patent litigation against any entity (including a cross-claim or counterclaim in a lawsuit) alleging that the Model and/or Complementary Material or a Contribution incorporated within the Model and/or Complementary Material constitutes direct or contributory patent infringement, then any patent licenses granted to You under this License for the Model and/or Work shall terminate as of the date such litigation is asserted or filed.


> sell, sell, import

typo

"offer to sell" and "sell" are distinct

@pastuh

pastuh commented Sep 4, 2022

Can someone explain why I see this line everywhere:

    demo.launch(share=True)

I don't think a person wants to share by default, or to connect to the Gradio website.

So my real suggestion is to add only the improvements, not random code :E

@neonsecret
Author

> Can someone explain why I see this line everywhere: demo.launch(share=True)
>
> I don't think a person wants to share by default, or to connect to the Gradio website.

It's so you can access your Gradio app remotely.
Don't worry, no one will have access to it except you (or those you give the link to).
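For reference, a minimal sketch of the difference (the toy interface below is an assumption for illustration, not code from this repo): share=False serves the app only on the local machine, while share=True additionally requests a temporary public link through Gradio's share servers.

    import gradio as gr

    # Hypothetical toy app, just to illustrate the launch options.
    demo = gr.Interface(fn=lambda text: text[::-1], inputs="text", outputs="text")

    # Local only: typically served at http://127.0.0.1:7860,
    # reachable just from this machine.
    demo.launch(share=False)

    # Public tunnel: also creates a temporary shareable URL via Gradio's servers.
    # demo.launch(share=True)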

neonsecret changed the title from "Memory-efficient attention.py" to "Memory-efficient attention (also code cleaned up and a colab added)" on Sep 4, 2022
@jspraul

jspraul commented Sep 4, 2022

47f8784 seems to be the entirety of tightening up memory usage, very nice!

    git fetch origin pull/103/head:pull103
    git cherry-pick 47f8784

If it doesn't affect runtime or output (discussed elsewhere), this should probably be upstreamed.

@neonsecret
Author

Up to you guys which changes to apply.

@Doggettx

Doggettx commented Sep 4, 2022

There's another optimization you can do for less memory in there by changing

    sim = einsum('b i d, b j d -> b i j', q, k) * self.scale
    del q, k

into

    sim = einsum('b i d, b j d -> b i j', q, k)
    sim *= self.scale
    del q, k

saves quite a bit of memory
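Why the in-place version helps, as a hedged aside: `sim * self.scale` materializes a second full-size tensor before the first can be freed, whereas `sim *= self.scale` reuses the existing buffer. The numbers below are illustrative, assuming a 512x512 generation at the first attention block in half precision:

    # Illustrative only: 2 (cond/uncond) x 8 heads, 64*64 = 4096 latent tokens.
    batch_heads, tokens = 16, 4096
    bytes_per_el = 2  # fp16

    sim_bytes = batch_heads * tokens * tokens * bytes_per_el
    print(f"one sim tensor: {sim_bytes / 2**20:.0f} MiB")  # 512 MiB

    # Out-of-place `sim * scale` briefly needs two such buffers at peak;
    # in-place `sim *= scale` avoids the extra allocation.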

@neonsecret
Author

> There's another optimization you can do for less memory in there by changing
>
>     sim = einsum('b i d, b j d -> b i j', q, k) * self.scale
>     del q, k
>
> into
>
>     sim = einsum('b i d, b j d -> b i j', q, k)
>     sim *= self.scale
>     del q, k
>
> saves quite a bit of memory

I don't think it will be noticeable.
The main problem is the einsum; I don't know how to replace it, because PyTorch doesn't have an alternative.
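For what it's worth, this particular einsum does have a stock-PyTorch equivalent as a batched matrix multiply; the shapes in this hedged sanity check are illustrative:

    import torch

    q = torch.randn(16, 256, 40)  # illustrative (batch*heads, tokens, dim)
    k = torch.randn(16, 256, 40)

    sim_einsum = torch.einsum('b i d, b j d -> b i j', q, k)
    sim_bmm = torch.bmm(q, k.transpose(1, 2))  # same contraction, no einsum

    assert torch.allclose(sim_einsum, sim_bmm, atol=1e-5)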

@Doggettx

Doggettx commented Sep 4, 2022

> I don't think it will be noticeable. The main problem is the einsum; I don't know how to replace it, because PyTorch doesn't have an alternative.

For me it allows going up a few steps in resolution; I can go completely insane if I split the softmax up even more:

    sim[:4] = sim[:4].softmax(dim=-1)
    sim[4:8] = sim[4:8].softmax(dim=-1)
    sim[8:12] = sim[8:12].softmax(dim=-1)
    sim[12:] = sim[12:].softmax(dim=-1)

But I'm not sure if it's always 16?

I do run on a 3090 though...
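A hedged generalization of the slicing above, since the leading dimension may not always be 16: derive the step from the tensor's actual size. Because softmax(dim=-1) is computed independently per row, slicing along dim 0 changes peak memory, not the result. The helper name is an assumption for illustration:

    import torch

    def sliced_softmax_(sim, slices=4):
        # Softmax over the last dim, applied in chunks along dim 0 so the
        # softmax temporaries only ever cover a slice of `sim` at a time.
        step = max(sim.shape[0] // slices, 1)
        for i in range(0, sim.shape[0], step):
            sim[i:i + step] = sim[i:i + step].softmax(dim=-1)
        return sim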

@neonsecret
Author

For me, at bigger resolutions the problem is with other modules, but I don't think rewriting the entire module is possible.

@fpierfed

fpierfed commented Sep 4, 2022

FWIW I am seeing a significant performance degradation (I cherry-picked 47f8784) on a 16GB M1 Mac mini: from <3 s/it to >7 s/it. Others might want to double-check and make sure this is not just my environment.

@Disa-Kizonda

> FWIW I am seeing a significant performance degradation (I cherry-picked 47f8784) on a 16GB M1 Mac mini: from <3 s/it to >7 s/it. Others might want to double-check and make sure this is not just my environment.

For me, speed is the same.

    attn = sim.softmax(dim=-1)
    # attention, what we cannot get enough of, by halves
    sim[4:] = sim[4:].softmax(dim=-1)
    sim[:4] = sim[:4].softmax(dim=-1)

@patrickvonplaten commented Sep 4, 2022


This leads to different similarity results though, no? -> do the outputs still look good without retraining the model?

@neonsecret
Author

Exactly the same.
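That matches the math: softmax(dim=-1) normalizes each row independently, so computing it slice-by-slice along dim 0 is the same operation, not an approximation. A quick sanity check on illustrative shapes:

    import torch

    sim = torch.randn(16, 256, 256)  # illustrative shape

    full = sim.softmax(dim=-1)

    halved = sim.clone()
    halved[4:] = halved[4:].softmax(dim=-1)
    halved[:4] = halved[:4].softmax(dim=-1)

    assert torch.allclose(full, halved)  # same values; only peak memory differs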

@neonsecret
Author

Okay guys, you know what, I'm going to make a separate branch with only the attention change and open a PR for it separately.

@neonsecret
Author

#117

@neonsecret
Author

neonsecret commented Sep 4, 2022

Closing this because #117 will be more relevant.

neonsecret closed this on Sep 4, 2022
@TheEnhas

TheEnhas commented Sep 4, 2022

The inpaint mask fix should have a pull request too, IMO; that's fairly important.

@tusharbhutt

Apologies... where is the Colab?

@tusharbhutt

Oh, I tried that (I thought the basujindal fork was a Colab now), and it fails to launch an xterm window at the end of setting up the Colab.

@scottmudge

    sim = einsum('b i d, b j d -> b i j', q, k)
    sim *= self.scale
    del q, k

I just tried this and unfortunately observed no changes to memory utilization, and a very slight decrease in performance.
