Some feedback #6
Thanks for your feedback. I tried AnimateDiff following https://github.com/guoyww/AnimateDiff. Below are results without cherry-picking, comparing FreeU with different factors:
Maybe it works better for cartoon-style images and animations because they naturally lack high frequencies... What factors did you use above? I can try with anime models.
This was just a simple attempt following the AnimateDiff readme. We will be providing more results on the FreeU page and in the paper. We appreciate your continued interest.
@rkfg I also had poor results with FreeU, until I started switching s1/s2/b1/b2 back to 1.0 at some point during the denoising process. The global features seem to be mostly settled early, so you can transition back to normal values somewhere between about 30% and 75% of the way through your steps, and the results are greatly improved. I've tested it thoroughly: most of the FreeU "fixes" are kept while the fine details still shine through at the end. This is basically what I'm doing:

```python
steps = 30
unet.freeu.sd21()

def cb(step, _, __):
    if step == int(steps * 0.5):
        unet.freeu.ones()

output = pipe(prompt, num_inference_steps=steps, callback=cb)
```
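The switch-back schedule above can be sketched as a plain function. Note that `unet.freeu` is the commenter's own patch, not a library API; the SD 2.1 factor values below are the ones listed in the FreeU README, but double-check them against the repo before relying on them:

```python
# FreeU backbone (b1, b2) and skip (s1, s2) factors for SD 2.1,
# as listed in the FreeU README (verify against the current repo).
FREEU_SD21 = {"b1": 1.4, "b2": 1.6, "s1": 0.9, "s2": 0.2}
NO_FREEU = {"b1": 1.0, "b2": 1.0, "s1": 1.0, "s2": 1.0}

def freeu_factors(step, total_steps, switch_frac=0.5):
    """Return the FreeU factors to apply at `step` of `total_steps`.

    FreeU shapes the global structure early in denoising; after
    `switch_frac` of the steps, fall back to all-1.0 factors so the
    fine high-frequency detail is preserved at the end.
    """
    if step < int(total_steps * switch_frac):
        return FREEU_SD21
    return NO_FREEU
```

A denoising callback would then just look up `freeu_factors(step, steps)` and push the result into whatever FreeU patch is in use.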
Mario is noticeably improved, yes, but I prefer the vanilla waterfall: it's more detailed and interesting even though the contrast is a bit lower. SD 2.1 isn't that good in general, even with finetunes. Can you try the best 1.5 models, both cartoon and realistic? It would be interesting to see whether this method can improve the output over what we can get without it.
Can you share the full code snippet?
@ChenyangSi @rkfg
I describe my changes in the diffusers repo: huggingface/diffusers#5164 (comment)
To fix this, you can simply cast x to float for this operation in the first line of the Fourier_filter function.
@ChenyangSi Hi, can you share how to add the FreeU code in T2V, e.g. AnimateDiff?
ykk648/AnimateDiff-I2V@0842585 |
@ykk648 Thank you for your code
Disclaimer: I'm not an AI researcher, so I could've done something wrong.

So I got it working after some minor changes and import fixes. Now when I run it at 512x768 resolution, the issue is:

```
RuntimeError: cuFFT only supports dimensions whose sizes are powers of two when computing in half precision, but got a signal size of [12, 8]
```
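A minimal check of where the `[12, 8]` comes from, assuming the filtered feature map sits at 1/64 of the image resolution (the exact divisor depends on which UNet block FreeU patches, so `factor=64` is an assumption for illustration):

```python
def feature_hw(height, width, factor=64):
    # Spatial size of the feature map at the filtered UNet level:
    # image dimensions divided by the cumulative downsampling factor.
    return height // factor, width // factor

def is_pow2(n):
    # A positive integer is a power of two iff it has one set bit.
    return n > 0 and (n & (n - 1)) == 0

# 512x512 -> (8, 8): both powers of two, half-precision cuFFT works.
# 768x512 -> (12, 8): 12 is not a power of two, hence the error above.
```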
I suppose that's the corresponding layer size, which becomes rectangular because the base resolution is. 512x512 results in an [8, 8] array here and everything works fine. As expected, running A1111 with `--no-half` makes it work, but the speed is much worse.

I used the parameters for SD 1.4 and simply hardcoded them to quickly test whether it works at all. On a fine-tuned model (epiCRealism naturalSin), FreeU makes the images worse: they become more saturated and the skin texture turns into plastic (maybe because we suppress exactly the high-frequency features?). The output starts looking more like the base models or the early fine-tuned models.

Original:
![1](https://private-user-images.githubusercontent.com/184066/270100262-a2f7e8bb-a49e-4bff-8182-1019f833f004.jpg?jwt=eyJhbGciOiJIUzI1NiIsInR5cCI6IkpXVCJ9.eyJpc3MiOiJnaXRodWIuY29tIiwiYXVkIjoicmF3LmdpdGh1YnVzZXJjb250ZW50LmNvbSIsImtleSI6ImtleTUiLCJleHAiOjE3MjE1OTE2MzksIm5iZiI6MTcyMTU5MTMzOSwicGF0aCI6Ii8xODQwNjYvMjcwMTAwMjYyLWEyZjdlOGJiLWE0OWUtNGJmZi04MTgyLTEwMTlmODMzZjAwNC5qcGc_WC1BbXotQWxnb3JpdGhtPUFXUzQtSE1BQy1TSEEyNTYmWC1BbXotQ3JlZGVudGlhbD1BS0lBVkNPRFlMU0E1M1BRSzRaQSUyRjIwMjQwNzIxJTJGdXMtZWFzdC0xJTJGczMlMkZhd3M0X3JlcXVlc3QmWC1BbXotRGF0ZT0yMDI0MDcyMVQxOTQ4NTlaJlgtQW16LUV4cGlyZXM9MzAwJlgtQW16LVNpZ25hdHVyZT1lNmQyMzdkMGMwZDgxMTJlYjFkNTlkNzBkNDU1ZWUwYjQzMWE2ZGQ1N2U1ZjA4MjA5YjVjN2MyZTI3N2FlOTUwJlgtQW16LVNpZ25lZEhlYWRlcnM9aG9zdCZhY3Rvcl9pZD0wJmtleV9pZD0wJnJlcG9faWQ9MCJ9.feix-5KJ1bstAfo6_LR5bbU1UI73ms6PXAh0wcQKQew)
FreeU:
![2](https://private-user-images.githubusercontent.com/184066/270100271-98b89d50-0354-46d8-a2cc-f3c31929b6f3.jpg?jwt=eyJhbGciOiJIUzI1NiIsInR5cCI6IkpXVCJ9.eyJpc3MiOiJnaXRodWIuY29tIiwiYXVkIjoicmF3LmdpdGh1YnVzZXJjb250ZW50LmNvbSIsImtleSI6ImtleTUiLCJleHAiOjE3MjE1OTE2MzksIm5iZiI6MTcyMTU5MTMzOSwicGF0aCI6Ii8xODQwNjYvMjcwMTAwMjcxLTk4Yjg5ZDUwLTAzNTQtNDZkOC1hMmNjLWYzYzMxOTI5YjZmMy5qcGc_WC1BbXotQWxnb3JpdGhtPUFXUzQtSE1BQy1TSEEyNTYmWC1BbXotQ3JlZGVudGlhbD1BS0lBVkNPRFlMU0E1M1BRSzRaQSUyRjIwMjQwNzIxJTJGdXMtZWFzdC0xJTJGczMlMkZhd3M0X3JlcXVlc3QmWC1BbXotRGF0ZT0yMDI0MDcyMVQxOTQ4NTlaJlgtQW16LUV4cGlyZXM9MzAwJlgtQW16LVNpZ25hdHVyZT1hODQ0M2YyODAwMjA0ZWM2NzhhMzJjMDFjNTdlYjEyZWZlMWFlYzUyZGYyYzFkYTVlY2JiOTM4YWQyZjM5MzAxJlgtQW16LVNpZ25lZEhlYWRlcnM9aG9zdCZhY3Rvcl9pZD0wJmtleV9pZD0wJnJlcG9faWQ9MCJ9.p0wVJ_LX4KyMUvKwLH3jjFCVYTJAA4PfWW0awSOLaro)
AnimateDiff doesn't seem to work in `--no-half` mode; it throws a CUDA error. So we're limited to 512x512. Same symptoms of oversaturation; the skin-quality issue doesn't show because of grain and artifacts. However, FreeU added a third hand. I tried two slightly different prompts.

Original:
![02086-751880090](https://private-user-images.githubusercontent.com/184066/270101034-ec3f1532-ac8e-44d0-be9d-4acc7d7b5bde.gif?jwt=eyJhbGciOiJIUzI1NiIsInR5cCI6IkpXVCJ9.eyJpc3MiOiJnaXRodWIuY29tIiwiYXVkIjoicmF3LmdpdGh1YnVzZXJjb250ZW50LmNvbSIsImtleSI6ImtleTUiLCJleHAiOjE3MjE1OTE2MzksIm5iZiI6MTcyMTU5MTMzOSwicGF0aCI6Ii8xODQwNjYvMjcwMTAxMDM0LWVjM2YxNTMyLWFjOGUtNDRkMC1iZTlkLTRhY2M3ZDdiNWJkZS5naWY_WC1BbXotQWxnb3JpdGhtPUFXUzQtSE1BQy1TSEEyNTYmWC1BbXotQ3JlZGVudGlhbD1BS0lBVkNPRFlMU0E1M1BRSzRaQSUyRjIwMjQwNzIxJTJGdXMtZWFzdC0xJTJGczMlMkZhd3M0X3JlcXVlc3QmWC1BbXotRGF0ZT0yMDI0MDcyMVQxOTQ4NTlaJlgtQW16LUV4cGlyZXM9MzAwJlgtQW16LVNpZ25hdHVyZT1mZDAwZTRjZGRlYWNhOWMwMGM4NDk0N2YyYjI1YmYzODczYWMxMzBlYmNkMGI3ZGQ0NGU4ODg5M2Q3YjQ3YTI3JlgtQW16LVNpZ25lZEhlYWRlcnM9aG9zdCZhY3Rvcl9pZD0wJmtleV9pZD0wJnJlcG9faWQ9MCJ9.Dr6xXLwF3qozS4w8RMTx0lw9lbXatCf__8mtOgZidQE)
FreeU:

![02085-751880090](https://private-user-images.githubusercontent.com/184066/270101036-bccb06bd-2667-40d8-b164-41a0b3d1b834.gif?jwt=eyJhbGciOiJIUzI1NiIsInR5cCI6IkpXVCJ9.eyJpc3MiOiJnaXRodWIuY29tIiwiYXVkIjoicmF3LmdpdGh1YnVzZXJjb250ZW50LmNvbSIsImtleSI6ImtleTUiLCJleHAiOjE3MjE1OTE2MzksIm5iZiI6MTcyMTU5MTMzOSwicGF0aCI6Ii8xODQwNjYvMjcwMTAxMDM2LWJjY2IwNmJkLTI2NjctNDBkOC1iMTY0LTQxYTBiM2QxYjgzNC5naWY_WC1BbXotQWxnb3JpdGhtPUFXUzQtSE1BQy1TSEEyNTYmWC1BbXotQ3JlZGVudGlhbD1BS0lBVkNPRFlMU0E1M1BRSzRaQSUyRjIwMjQwNzIxJTJGdXMtZWFzdC0xJTJGczMlMkZhd3M0X3JlcXVlc3QmWC1BbXotRGF0ZT0yMDI0MDcyMVQxOTQ4NTlaJlgtQW16LUV4cGlyZXM9MzAwJlgtQW16LVNpZ25hdHVyZT03NTE1NDEyNjViZGM2OWVhYTkwNTZmZTIyMTM5M2E3ODY4ZDA3ZTI4Mjc2NDQzNzU2ZmUyM2UxYTE5MzA0NGU3JlgtQW16LVNpZ25lZEhlYWRlcnM9aG9zdCZhY3Rvcl9pZD0wJmtleV9pZD0wJnJlcG9faWQ9MCJ9.Z5mxDe4SUAEj8mT3_vOThDx-lNnxUcllM7R1aPXzfQM)
The faces are garbled in all cases, but to be honest I much prefer the results without FreeU. The colors are better, the anatomy is better, and the skirt is much more detailed and moves more naturally.
My patch, applied to `stable-diffusion-webui/repositories/stable-diffusion-stability-ai`:

The simplest way to switch between FreeU and vanilla is to change `if True:` to `if False:`. Again, it's just a hack to test whether it works.

In conclusion, if everything is correct on my end, it's probably not worth it for the best fine-tuned models. Worse, to make it work in all cases you have to run in full 32-bit precision at a roughly 3x slowdown and get images that look worse than without it. The base models surely benefit from it, but honestly, who uses them except researchers and LoRA trainers?
I hope I made a mistake somewhere and these results are all wrong. After all, I just copied the part that differs from the original code and fixed the errors to make it work, but who knows.