Stable Cascade #2785
Comments
New Stability model. Wondering if we could get native ComfyUI support for it. |
I'm waiting for support too! Thanks |
It's inevitably gonna be supported, just be patient.
It is obviously exciting enough that it will be supported soon. Just be very patient. It takes a while to analyze the new architecture, create new nodes, and figure it all out. Let's not rush or demand anything! 😉 By the way, Stable Cascade isn't even finished yet. It's still in "early development" research/training. Their codebase is changing constantly. The large model currently uses 20 GB VRAM (!) and they think they can optimize this to 1/2 or 1/3 of that when they are done. Furthermore, the diffusers library isn't even ready, and the code is far from reaching the quality needed for merging. So let's breathe. Update: One of the Stability employees commented on Reddit that Comfy will have support in a week or two. |
Well, there's a basic node which doesn't implement anything itself; it just wraps the official code in a ComfyUI node. No support for anything special like controlnets, prompt conditioning, or anything else really. It's just a basic wrapper for some prompt strings and a seed. It doesn't support model loading or unloading, so it will hog your VRAM. I also see that it has a bug (it loads one of the models as float16 instead of bfloat16). But hopefully it's still good enough for impatient people like @GaleBoro in the meantime. |
Alright. Here's another option to experiment with it locally in the meantime. It's Pinokio's tweak of the unofficial Gradio web UI, with the need for a Hugging Face token / login removed. Needs Python 3.11. The two largest models need ~15.5 GB of VRAM at 1024x1024, or ~18.0-20.0 GB of VRAM at 1536x1536 with the bfloat16 model format.

```shell
git clone https://huggingface.co/spaces/cocktailpeanut/stable-cascade pinokio
cd pinokio
python -m venv .venv
. .venv/bin/activate
pip install gradio spaces itsdangerous
pip install -r requirements.txt
python app.py
```
|
Even with ComfyUI VRAM optimizations, it seems that unless you have a GPU that supports bfloat16 or has 16 GB of VRAM, you can only use the "lite" version of the "Stage C" model, right? https://huggingface.co/stabilityai/stable-cascade/tree/main edit: Apparently, it could theoretically be possible to split the model for VRAM optimization. https://news.ycombinator.com/item?id=39360106
|
@JorgeR81 The CEO of Stability has made statements that SD 1.0 originally used more than 20 GB of VRAM too, and that they are confident they can reduce Stable Cascade to 1/2 or 1/3 of the current VRAM requirements. He didn't go into detail about what techniques they'd use to achieve that, but it sounds good to me, because the 20 GB VRAM usage right now is very painful. Edit: The statements. Take them with a big grain of salt. But their hope is to reach 8 GB of VRAM usage.
Edit: And another statement that I found really interesting, saying that RTX 40-series and newer cards will become increasingly required for future AI networks, because they support fp8 and optical flow: |
I think it's worth waiting for the official release from comfyanonymous, but there is the huggingface space, this colab from Camenduru (Gradio UI, runs on T4 - 80s for 1 image, slow): https://github.com/camenduru/stable-cascade-jupyter & another one i put together on launch day (runs on A100 - 14 seconds for 4 images, fast): https://github.com/MushroomFleet/StableCascade-text2image personally i can't wait to get this into Comfy, and there is a diffusers custom node here, if you can't wait! |
It's implemented in the main repo now, you can use this workflow until I write an examples page for it. https://gist.github.com/comfyanonymous/0f09119a342d0dd825bb2d99d19b781c |
stage, stage_b16, stage_lite, stage_lite_b16: is there a big gap between them when it comes to generating images? |
I get the following error, that's weird. Edit: Solved! stage_c wasn't loaded properly. |
Is there any workflow example for img-to-img or controlnet? |
Getting an error when running the example workflow for Stable Cascade with both bf16 and lite models:
Might be a problem on my end since I am running on AMD ROCm but wanted to leave this here if it crops up anywhere else. Setting the clip type in "Load CLIP" to |
Guys, sorry, where to get a proper CLIP G SDXL BF16? |
Rename "model.safetensors" to "clip_g_sdxl.fp16.safetensors" or simply select "model.safetensors". |
I'm using the full-size models, but it's worth trying [stage_c_lite] as well, since the results are very different, like a different model. The bf16 models look the same as the regular models. The full-size models are usable even on a GTX 1070 (8 GB). stage_c ( default workflow settings ) stage_c_lite ( same seed ) stage_c_lite ( better seed ) |
Thank you! |
This should be fixed now. |
Thanks for the insanely fast turnaround. Works like a charm now. |
The [stage c] steps can be reduced from 20 to 10, at least with my simple test prompt, for a generation time of 84 sec. stage_c ( 10 steps ) ( same seed as the first image in the post above ) stage_c ( 10 steps ) |
Yes, I also noticed that skin detail is not as good as the latest SDXL community finetunes, but I'm sure this will be improved. In the Hugging Face model card, one of the stated "Limitations" is that "Faces and people in general may not be generated properly", so this is already much better than what I was expecting. The only downside is that a single set of cascade models at full size is 20 GB! |
on amd directml mode and cpu mode i get the error unet_dtype() got an unexpected keyword argument 'supported_dtypes' File "E:\AI\ComfyUI\execution.py", line 152, in recursive_execute |
ah it was some of my custom nodes, idk which one(s) tho lol |
@CraftMaster163 Did you update Comfy? This issue is supposedly fixed with a recent commit (to this specified custom node). Edit: Seems like I indeed misread the issue I mentioned. Looks like beyond main Comfy any modules replacing the affected modules will have to be updated, too. |
i updated my comfyui. got to update nodes too tho |
When we start getting Stable Cascade models from the community, I'm going to run out of space very fast on my main drive. But by editing [ extra_model_paths.yaml ], I was able to load the models in the "unet" folder from another drive in my PC. https://github.com/comfyanonymous/ComfyUI/blob/master/extra_model_paths.yaml.example Instructions are in this discussion: The only new step is to add the "unet" folder to the example provided, like this:
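As a rough sketch, the edited extra_model_paths.yaml could look something like this. The section name and paths below are illustrative (adjust them to your own drive and folder layout), and this is based on the linked example file rather than a verified config:

```yaml
# Hypothetical extra_model_paths.yaml section; paths are illustrative.
comfyui:
    base_path: d:/ComfyUI/models/
    checkpoints: checkpoints/
    unet: unet/
```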
If it works, you should see extra lines in the cmd line for each new path when opening ComfyUI, like this: "Adding extra search path unet d:/ComfyUI/models/unet/" |
SDXL is broken now for some reason. Stable Cascade is working.
|
There was a small fix to the Stage B sampling; if you update ComfyUI and use the updated workflows, you should see a small quality increase. |
I noticed the img2img workflow uses less compression by default. This makes sense. Since the model has a picture as reference, we are less likely to get distortions from low compression when changing the aspect ratio (e.g. a long neck in portrait aspect ratio). So we can have the benefits of lower compression, like crisper images, without the drawbacks. |
Yes, in the new version, I noticed that the dark circumference around the iris is slimmer, which is more realistic. But was there also a change to the shift parameter? After this update, when I use the shift node on Stage C, the image is very different. I managed to get a more similar look with shift = 3.2, but it's still quite different. The braid is gone now. So do we just need to play with the shift value, or is it perhaps another setting you can add to the node? EDIT: It was this commit: c6b7a15. I made the code changes manually, and I can confirm it was this. With this commit, changing the shift parameter affects the image much more. The differences created by this commit when the shift parameter is changed are comparable to the differences between some of the current schedulers and samplers. These are 10 + 10 steps: |
How are the ComfyUI checkpoints different from the others? Do the ComfyUI checkpoints include the VAE? |
Is super resolution controlnet supported yet? |
Hi guys, I updated ComfyUI and downloaded both models, but when I run the default example from stable_cascade__text_to_image.png, ComfyUI shows this error:
Can you help me please? |
About the comfyui checkpoints, looking at the examples I presume they contain the following: stable_cascade_stage_c.safetensors
stable_cascade_stage_b.safetensors
|
Hey, I'm getting the same message (along with a lot of leftover keys), but everything seems to work fine. The missing CLIP is probably from loading Stage B, since it does not have a CLIP embedded. |
Got it working, I used the workflow below: |
The Inspire Pack has a custom loader for cascade models. |
It seems that at some pixel sizes, the generated image will have strange artifacts, especially in non-realistic art styles like digital art or anime. Has anyone else encountered this? I suspect it is related to some interpolation issue when Stage B takes Stage C's latent as the prior condition, since previewing Stage C's latent in pixel space shows it's too blurry to produce artifacts. I'm trying to find the rule and make a smallest reproduction workflow. |
No, but I've found significant artifacts while changing the cfg, when there is a large image variation between 2 close cfg values. This was with the unet workflow, at cfg 2.9. After the latest updates, it's a little better, and it happens only at cfg 2.95, so it's harder to find. |
|
I downloaded the [ previewer.safetensors ] from here, and put it in the "vae" folder, that's inside the "models" folder. And enabled the preview, via Comfy UI Manager. You can set Preview method to "Auto". Also, this may be working correctly as it is. |
Sometimes I get a black image result, but when I re-run it (same seed and everything) it proceeds to work. It seems to happen just before it finishes (so probably Stage A?):
I should get the preview working, so I can better see what's going on. Edit: Just noticed I was using older txt2img workflow (load VAE, some conditioning nodes etc), so maybe the black image problem was related to that? |
If I connect a VAE decoder to the Stage C sampler, I can get a small "preview" image with better colors! This may be more useful than the sampler preview for deciding if we have a good seed and it's worth enabling Stage B. The sampler preview will always be useful for debugging, but in Stable Cascade it may be less useful for artistic purposes, since the Stage C preview is too low resolution, and the Stage B preview shows an almost "finished" image from the start, with very few variations between steps. |
Those simple previews are just a matrix multiplication, so that's why they suck, but they are very cheap and better than nothing.
|
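For anyone curious what "just a matrix multiplication" means here, below is a minimal sketch: each latent "pixel" is mapped straight to RGB with a small fixed factor matrix, with no decoder network involved. The channel count and factor values are made up for illustration; they are not the real Stable Cascade preview factors.

```python
# Hypothetical per-latent-channel RGB factors (one row per latent channel).
# Real previewers use tuned factors and more channels; these are illustrative.
LATENT_RGB_FACTORS = [
    [0.30, 0.20, 0.10],
    [0.10, 0.30, 0.20],
    [-0.20, 0.10, 0.30],
    [0.10, -0.10, 0.10],
]

def latent_pixel_to_rgb(latent_px):
    """Map one latent 'pixel' (a list of channel values) to an (r, g, b) tuple
    via a single matrix multiplication -- cheap, but lossy."""
    return tuple(
        sum(c * LATENT_RGB_FACTORS[i][k] for i, c in enumerate(latent_px))
        for k in range(3)
    )
```

Applying this per latent position gives a tiny, washed-out RGB image, which is why such previews look rough compared to a real VAE decode.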
I suspect that the preview does work, but the latter steps aren't doing anything visible.
I have noticed this too. With some art styles, such as line work on a black background, the output from Stage B/A can become very noisy. As a workaround, I found that stopping stage_b diffusion early reduces the issue, for example doing only 10 steps out of 30 with a KSampler. |
it works, thanks a lot. |
Here is my reproduction: using a modified prompt of the official text2img workflow to emphasize the outline, with seed and other settings fixed. At compression ratio 42, sizes 1008~1048 all produce the same 24*24 Stage C latent, so the basic content won't change; the only difference should come from Stage B.
- size 1008*1008 (OK)
- size 1016*1016 (artifacts)
- size 1024*1024 (OK)
- size 1032*1032 (artifacts)
- size 1040*1040 (OK)
- size 1048*1048 (artifacts)
My observation: the pixel size seems to need to be a multiple of 16 to avoid artifacts, instead of 8. |
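The observation above can be checked with a tiny script. The floor division for the Stage C latent side is an assumption about how the empty-latent node computes it, and the multiple-of-16 rule is just the empirical finding from this thread, not a documented constraint:

```python
def stage_c_latent_side(pixels, compression=42):
    """Assumed Stage C latent side length for a given pixel size (floor division)."""
    return pixels // compression

def likely_artifact_free(pixels):
    """Empirical rule from this thread: pixel size should be a multiple of 16."""
    return pixels % 16 == 0

# Sizes 1008..1048 (step 8) all share the same 24x24 Stage C latent...
assert all(stage_c_latent_side(s) == 24 for s in range(1008, 1049, 8))
# ...yet only the multiples of 16 rendered cleanly in the tests above.
```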
This is the line that resizes the latent C to half of the latent B size (e.g. latent C is resized to 128x128 for a 1024x1024 image): https://github.com/comfyanonymous/ComfyUI/blob/master/comfy/ldm/cascade/stage_b.py#L246 |
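Putting the sizes together, here is a hedged sketch of the latent sides involved for a square image. The helper name and the exact rounding are assumptions; only the 1024 -> 256 -> 128 relationship comes from the comment above:

```python
def cascade_latent_sizes(image_side, compression=42):
    """Approximate latent side lengths for a square image of image_side pixels."""
    stage_b = image_side // 4            # Stage B latent, e.g. 1024 -> 256
    stage_c = image_side // compression  # Stage C latent, e.g. 1024 -> 24
    resized_c = stage_b // 2             # latent C resized for Stage B, e.g. 128
    return stage_c, stage_b, resized_c

print(cascade_latent_sizes(1024))  # (24, 256, 128)
```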
Can someone post recommended sizes for vertical images? I always get this error: Error occurred when executing KSampler: pixel_unshuffle expects height to be divisible by downscale_factor, but input.size(-2)=297 is not divisible by 2 File "K:\ComfyUI\ComfyUI\execution.py", line 152, in recursive_execute |
What can I do to avoid this problem? |
https://huggingface.co/stabilityai/stable-cascade/tree/main/controlnet
|
I had similar findings from some tests, and I made my own latent node with multiples of 64 and the possibility to lock the aspect ratio: https://github.com/Guillaume-Fgt/ComfyUI_StableCascadeLatentRatio If I compare with your observations, I avoid the 1016, 1032 and 1048 pixel sizes, so it looks good. If anyone wants to test and give feedback, I can modify it. |
This also happens in realistic images: |
I was wondering the same; we probably have to wait. I'm sure it's being worked on. |
https://huggingface.co/stabilityai/stable-cascade