Memory-efficient attention (also code cleaned up and a colab added) #103

Closed
wants to merge 12 commits

Conversation

neonsecret

Uses less memory; can now generate 576x1280 images with 6 GB of VRAM.
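For context, a minimal sketch of one common way to trade compute ordering for peak memory in attention, in the spirit of this PR; the function name, chunk count, and shapes are illustrative assumptions, not the PR's actual diff:

    import torch
    from torch import einsum

    def chunked_attention(q, k, v, scale, chunks=4):
        # q, k, v: (batch*heads, tokens, dim). Processing queries in slices
        # means the full (tokens x tokens) similarity matrix never has to
        # live in VRAM all at once.
        out = torch.empty_like(q)
        step = max(q.shape[1] // chunks, 1)
        for i in range(0, q.shape[1], step):
            sim = einsum('b i d, b j d -> b i j', q[:, i:i + step], k) * scale
            sim = sim.softmax(dim=-1)
            out[:, i:i + step] = einsum('b i j, b j d -> b i d', sim, v)
        return out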

@Disa-Kizonda

Disa-Kizonda commented Sep 3, 2022

Now I can run 704x704 images on my 1050 Ti, and 640x640 in turbo.

@basujindal
Owner

Hi, a big thank you for the commits. I am travelling right now and will merge these changes as soon as I get stable internet. Thanks again!

@scottmudge

It would probably be better to remove the cleanups and "beautifying" commits from this PR. Along with the other unrelated changes (Add diffusers, colab notebooks, etc).

They're unrelated to the attention.py change and all the whitespace and code-rearrangements are going to make future PRs and code merges a major pain.

@ZeroCool22

We just replace the code in attention.py with the new one, and that's all, right?

LICENSE Outdated
Copyright (c) 2022 Robin Rombach and Patrick Esser and contributors

CreativeML Open RAIL-M

?????


From the CompVis SD GitHub. This looks like it was merged there, then brought over here.

CompVis@69ae4b3

@zoru22

zoru22 commented Sep 4, 2022

I would not merge this PR without combing through it. There's a lot of bullshit in here. Gotta love the Rick Astley image, which makes me wonder if the attention.py changes even work.

Unfortunately I recently (accidentally) nuked my main install, so it may be a bit before I can port the relevant changes over one by one :v

@qdot

qdot commented Sep 4, 2022

> I would not merge this PR without combing through it. There's a lot of bullshit in here. Gotta love the Rick Astley image, which makes me wonder if the attention.py changes even work.

The license update, Rick Astley image, etc. are all from the original SD GitHub. It looks like the original committer made their changes, then rebased the CompVis repo on top of them for some reason.

Core SD rickrolls you when you trigger its NSFW filter.

https://github.com/CompVis/stable-diffusion/blob/main/scripts/txt2img.py#L79

@satvikpendem

Maybe it's better for @basujindal to merge the changes from the original SD repo first, and then @neonsecret can add their changes as a PR? Otherwise, when going through the git history, it will seem as if @neonsecret changed the license and added a Rickroll image when in fact it was the original repo that changed first. Generally, PRs should be kept clean of anything not specific to the main change being made.

@neonsecret
Author

> Maybe it's better for @basujindal to merge the changes from the original SD repo first, and then @neonsecret can add their changes as a PR? Otherwise, when going through the git history, it will seem as if @neonsecret changed the license and added a Rickroll image when in fact it was the original repo that changed first. Generally, PRs should be kept clean of anything not specific to the main change being made.

Well, I can roll back those changes to just the essential ones; then we can merge mine, and after that merge the original SD.

LICENSE Outdated

2. Grant of Copyright License. Subject to the terms and conditions of this License, each Contributor hereby grants to You a perpetual, worldwide, non-exclusive, no-charge, royalty-free, irrevocable copyright license to reproduce, prepare, publicly display, publicly perform, sublicense, and distribute the Complementary Material, the Model, and Derivatives of the Model.
3. Grant of Patent License. Subject to the terms and conditions of this License and where and as applicable, each Contributor hereby grants to You a perpetual, worldwide, non-exclusive, no-charge, royalty-free, irrevocable (except as stated in this paragraph) patent license to make, have made, use, offer to sell, sell, import, and otherwise transfer the Model and the Complementary Material, where such license applies only to those patent claims licensable by such Contributor that are necessarily infringed by their Contribution(s) alone or by combination of their Contribution(s) with the Model to which such Contribution(s) was submitted. If You institute patent litigation against any entity (including a cross-claim or counterclaim in a lawsuit) alleging that the Model and/or Complementary Material or a Contribution incorporated within the Model and/or Complementary Material constitutes direct or contributory patent infringement, then any patent licenses granted to You under this License for the Model and/or Work shall terminate as of the date such litigation is asserted or filed.


> sell, sell, import

typo

"offer to sell" and "sell" are distinct

@pastuh

pastuh commented Sep 4, 2022

Can someone explain why I see this line everywhere:

    demo.launch(share=True)

I don't think a person wants to share by default, or to connect to the Gradio website.

So my real suggestion is to add only the improvements, not random code :E

@neonsecret
Author

> Can someone explain why I see this line everywhere: demo.launch(share=True)
>
> I don't think a person wants to share by default, or to connect to the Gradio website.

It's so you can access your Gradio app remotely.
Don't worry, no one will have access to it except you (or those you give the link to).
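For reference, a minimal sketch of the difference (the toy interface below is an assumption for illustration, not code from this repo): share=False serves the app only on the local machine, while share=True additionally requests a temporary public link through Gradio's share servers.

    import gradio as gr

    # Hypothetical toy app, just to illustrate the launch options.
    demo = gr.Interface(fn=lambda text: text[::-1], inputs="text", outputs="text")

    # Local only: typically served at http://127.0.0.1:7860,
    # reachable just from this machine.
    demo.launch(share=False)

    # Public tunnel: also creates a temporary shareable URL via Gradio's servers.
    # demo.launch(share=True)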

neonsecret changed the title from "Memory-efficient attention.py" to "Memory-efficient attention (also code cleaned up and a colab added)" on Sep 4, 2022
@jspraul

jspraul commented Sep 4, 2022

47f8784 seems to be the entirety of tightening up memory usage, very nice!

    git fetch origin pull/103/head:pull103
    git cherry-pick 47f8784

If it doesn't affect runtime or output (discussed elsewhere), this should probably be upstreamed.

@neonsecret
Author

Up to you guys which changes to apply.

@Doggettx

Doggettx commented Sep 4, 2022

There's another optimization you can do for less memory in there by changing

    sim = einsum('b i d, b j d -> b i j', q, k) * self.scale
    del q, k

into

    sim = einsum('b i d, b j d -> b i j', q, k)
    sim *= self.scale
    del q, k

saves quite a bit of memory
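Why the in-place version helps, as a hedged aside: `sim * self.scale` materializes a second full-size tensor before the first can be freed, whereas `sim *= self.scale` reuses the existing buffer. The numbers below are illustrative, assuming a 512x512 generation at the first attention block in half precision:

    # Illustrative only: 2 (cond/uncond) x 8 heads, 64*64 = 4096 latent tokens.
    batch_heads, tokens = 16, 4096
    bytes_per_el = 2  # fp16

    sim_bytes = batch_heads * tokens * tokens * bytes_per_el
    print(f"one sim tensor: {sim_bytes / 2**20:.0f} MiB")  # 512 MiB

    # Out-of-place `sim * scale` briefly needs two such buffers at peak;
    # in-place `sim *= scale` avoids the extra allocation.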

@neonsecret
Author

> There's another optimization you can do for less memory in there by changing
>
>     sim = einsum('b i d, b j d -> b i j', q, k) * self.scale
>     del q, k
>
> into
>
>     sim = einsum('b i d, b j d -> b i j', q, k)
>     sim *= self.scale
>     del q, k
>
> saves quite a bit of memory

I don't think it will be noticeable.
The main problem is the einsum; I don't know how to replace it, because PyTorch doesn't have an alternative.
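For what it's worth, this particular einsum does have a stock-PyTorch equivalent as a batched matrix multiply; the shapes in this hedged sanity check are illustrative:

    import torch

    q = torch.randn(16, 256, 40)  # illustrative (batch*heads, tokens, dim)
    k = torch.randn(16, 256, 40)

    sim_einsum = torch.einsum('b i d, b j d -> b i j', q, k)
    sim_bmm = torch.bmm(q, k.transpose(1, 2))  # same contraction, no einsum

    assert torch.allclose(sim_einsum, sim_bmm, atol=1e-5)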

@Doggettx

Doggettx commented Sep 4, 2022

> I don't think it will be noticeable. The main problem is the einsum; I don't know how to replace it, because PyTorch doesn't have an alternative.

For me it allows going up a few steps in resolution; I can go completely insane if I split the softmax up even more:

    sim[:4] = sim[:4].softmax(dim=-1)
    sim[4:8] = sim[4:8].softmax(dim=-1)
    sim[8:12] = sim[8:12].softmax(dim=-1)
    sim[12:] = sim[12:].softmax(dim=-1)

But I'm not sure if it's always 16?

I do run on a 3090 though...
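A hedged generalization of the slicing above, since the leading dimension may not always be 16: derive the step from the tensor's actual size. Because softmax(dim=-1) is computed independently per row, slicing along dim 0 changes peak memory, not the result. The helper name is an assumption for illustration:

    import torch

    def sliced_softmax_(sim, slices=4):
        # Softmax over the last dim, applied in chunks along dim 0 so the
        # softmax temporaries only ever cover a slice of `sim` at a time.
        step = max(sim.shape[0] // slices, 1)
        for i in range(0, sim.shape[0], step):
            sim[i:i + step] = sim[i:i + step].softmax(dim=-1)
        return sim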

@neonsecret
Author

For me, at bigger resolutions the problem is with other modules, but I don't think rewriting the entire module is possible.

@fpierfed

fpierfed commented Sep 4, 2022

FWIW I am seeing a significant performance degradation (I cherry-picked 47f8784) on a 16GB M1 Mac mini: from <3 s/it to >7 s/it. Others might want to double-check and make sure this is not just my environment.

@Disa-Kizonda

> FWIW I am seeing a significant performance degradation (I cherry-picked 47f8784) on a 16GB M1 Mac mini: from <3 s/it to >7 s/it. Others might want to double-check and make sure this is not just my environment.

For me, speed is the same.

    attn = sim.softmax(dim=-1)
    # attention, what we cannot get enough of, by halves
    sim[4:] = sim[4:].softmax(dim=-1)
    sim[:4] = sim[:4].softmax(dim=-1)

@patrickvonplaten commented Sep 4, 2022


This leads to different similarity results though, no? -> do the outputs still look good without retraining the model?

@neonsecret
Author

Exactly the same.
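That matches the math: softmax(dim=-1) normalizes each row independently, so computing it slice-by-slice along dim 0 is the same operation, not an approximation. A quick sanity check on illustrative shapes:

    import torch

    sim = torch.randn(16, 256, 256)  # illustrative shape

    full = sim.softmax(dim=-1)

    halved = sim.clone()
    halved[4:] = halved[4:].softmax(dim=-1)
    halved[:4] = halved[:4].softmax(dim=-1)

    assert torch.allclose(full, halved)  # same values; only peak memory differs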

@neonsecret
Author

Okay guys, you know what, I'm going to make a separate branch with only the attention change and open a PR for it separately.

@neonsecret
Author

#117

@neonsecret
Author

neonsecret commented Sep 4, 2022

Closing this because #117 will be more relevant.

neonsecret closed this on Sep 4, 2022
@TheEnhas

TheEnhas commented Sep 4, 2022

The inpaint mask fix should have a pull request too, IMO; that's fairly important.

@tusharbhutt

Apologies... where is the Colab?

@tusharbhutt

Oh, I tried that (I thought the basujindal fork was a Colab now), and it fails to launch an xterm window at the end of setting up the Colab.

@scottmudge

    sim = einsum('b i d, b j d -> b i j', q, k)
    sim *= self.scale
    del q, k

I just tried this and unfortunately observed no changes to memory utilization, and a very slight decrease in performance.
