Stable-cascade support #1982

Open
bmaltais opened this issue Feb 18, 2024 · 46 comments
Labels
enhancement New feature or request

Comments

@bmaltais
Owner

I have started work on supporting Stable Cascade in the GUI... I hope it will not be too much of a pain to implement. Let's discuss it here.

@bmaltais bmaltais added the enhancement New feature or request label Feb 18, 2024
@bmaltais bmaltais pinned this issue Feb 18, 2024
@GamingDaveUk

Happy to see this, not much I can add to the discussion other than a thank you for undertaking this. People like yourself creating and maintaining these tools are the reason so much content exists. Thank you

@futureflix87

Thank you so much!!

@bmaltais
Owner Author

bmaltais commented Feb 18, 2024

Can someone share a toml config file for a simple one-concept finetuning? I never do finetuning, and apparently using a .toml file is the way to go now... and I have no clue how to configure it ;-)

My first task in building the GUI is to properly finetune a Stable Cascade model... and I need a proper .toml to run this example command:

& accelerate launch --mixed_precision bf16 --num_cpu_threads_per_process 2 stable_cascade_train_stage_c.py `
  --mixed_precision bf16 --save_precision bf16 --max_data_loader_n_workers 0 --persistent_data_loader_workers `
  --gradient_checkpointing --learning_rate 1e-4 `
  --optimizer_type adafactor --optimizer_args "scale_parameter=False" "relative_step=False" "warmup_init=False" `
  --max_train_epochs 10 --save_every_n_epochs 1 --save_precision bf16 `
  --output_dir e:\model\test --output_name sc_test `
  --stage_c_checkpoint_path "E:\models\stable_cascade\stage_c_bf16.safetensors" `
  --effnet_checkpoint_path "E:\models\stable_cascade\effnet_encoder.safetensors" `
  --previewer_checkpoint_path "E:\models\stable_cascade\previewer.safetensors" `
  --dataset_config "D:\kohya_ss\examples\stable_cascade\test_dataset.toml" `
  --sample_every_n_epochs 1 --sample_prompts "D:\kohya_ss\examples\stable_cascade\prompt.txt" `
  --adaptive_loss_weight

Once I am successful I will be in a better place to judge how to put the GUI together... At first I thought I would just extend the finetuning tab to support Stable Cascade... but I think it might just be better to create a dedicated tab for it... still unsure...

@bmaltais
Owner Author

I have figured it out...

[[datasets]]
resolution = 1024        # training resolution
batch_size = 4
keep_tokens = 1          # caption tokens kept in place when shuffling
enable_bucket = true     # enable aspect-ratio bucketing

  [[datasets.subsets]]
  image_dir = 'd:\kohya_ss\examples\stable_cascade\test_dataset'
  num_repeats = 10       # times each image is repeated per epoch
  class_tokens = 'toy'   # used as the caption when no caption file exists
  caption_extension = '.txt'

@bmaltais
Owner Author

bmaltais commented Feb 18, 2024

Looks like I was successful in finetuning...

image

Finetuning with zxc as the class token for toy, and prompting with zxc toy posing at the beach --W 800 --H 1200... so there is hope.

Looks like the best epoch was 7... after that it went downhill.

@bmaltais
Owner Author

I have shared the test dataset in the stable_cascade branch. Look under the examples folder. You can play with it for now.

@bmaltais
Owner Author

I tested the results of the model in ComfyUI and they are not great... sort of washed out... Most certainly bad training parameters... It will take a while to figure out proper SC finetuning parameters...

image

@bmaltais
Owner Author

If you find parameters that give better results, please share. Training SC is hugely VRAM-intensive.

@311-code

311-code commented Feb 19, 2024

I have a 4090 with 24 GB. I'll dive into this today and report back.

How many photos do you recommend I use to test this in Cascade?

@bmaltais
Owner Author

I did my test with 8… I don’t think the disappointing result is due to that… I tried using other optimizers but I don’t have enough VRAM.

@gesen2egee
Contributor

image
Maybe it is because of this.

@bmaltais
Owner Author

Using the latest updated code in sd-scripts produces better results... still not perfect... kohya is working on allowing stage_b training... hoping this will fix the issue with the final look:

image

@311-code

311-code commented Feb 21, 2024

Ok, I feel like I'm close, but I'm not familiar with this new code. Is there any basic info you can provide on where to put the training images and the format of the sample .json for Cascade? It's very different.

@bmaltais
Owner Author

I did provide everything in the stable_cascade branch. Look in the examples folder in that branch. You will find the dataset, the toml file for the dataset, etc. The new way of configuring the images for finetuning in the latest sd-scripts code is to use a .toml file... this is what the new SC Finetuning tab is configured to use...

@311-code

311-code commented Feb 22, 2024

Thank you, I completely missed your examples folder.

I just read everything here also: https://github.com/kohya-ss/sd-scripts/tree/stable-cascade as per your main page link. Went and read the stable cascade branch (good info, plus the docs folder for general fine-tuning), but I had to translate the Japanese.

After replacing the examples folder with the additions/images/toml, where do I place all of the files? I am assuming I either leave them there or move them to your empty "dataset" folder. Edit/Update: the .toml file in the examples folder controls the dataset location.

@bmaltais
Owner Author

The dataset can be anywhere. Simply edit the toml file to point to it and specify the repeats, resolution, etc.

@311-code

311-code commented Feb 22, 2024

Ok, so the SC fine tuning tab always looks at the examples folder for the toml file. I will edit the toml with the images path.

Trying this out again today!

@bmaltais
Owner Author

Ok, so the SC fine tuning tab always looks at the examples folder for the toml file, got it. I will edit the toml with the images path.

Trying this out again today!

Actually it does not. Just make sure to put the path to your toml in the SC Finetuning tab and it should work. It does not need to be in the examples folder.

@311-code

311-code commented Feb 22, 2024

Ok, got it. I see it under SC Finetuning tab > Folders > Dataset toml path (it looks for .json by default, so I selected all file types), then chose the model files for each field, then the .toml in the \examples\stable_cascade folder. It's training now.

Not sure if you have plans to map the .toml file to gradio interface input fields with some instructions in there, but I think it would help a lot for novices like me getting into this.

Thanks again for working on this btw; it's pretty huge for the community to have easy-to-use training like this imo.

Edit: I keep editing my posts because my brain can't think straight the last few days and I want the info to be as clear as possible for users.

@bmaltais
Owner Author

No worries, I keep editing mine too :-)

As far as a GUI to manage and create the toml dataset file goes, it might be possible, but I feel it might just be easier to create one by hand. The complexity of building a gradio interface for that is beyond my current knowledge… but I am sure someone could do it.

If someone wants to take a crack at creating a toml dataset gradio GUI class, I will gladly add it to the interface.
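
In case it helps anyone who wants to take that on, here is a minimal sketch of what such a tab could look like. This is not kohya_ss code; the build_toml helper and all widget names are hypothetical, and it just mirrors the [[datasets]] / [[datasets.subsets]] layout from the example toml above (assumes pip install gradio toml):

# Hypothetical sketch only, not part of kohya_ss.
import gradio as gr
import toml

def build_toml(image_dir, resolution, batch_size, num_repeats, class_tokens, output_path):
    # Mirror the [[datasets]] / [[datasets.subsets]] structure from the example toml
    config = {
        "datasets": [{
            "resolution": int(resolution),
            "batch_size": int(batch_size),
            "keep_tokens": 1,
            "enable_bucket": True,
            "subsets": [{
                "image_dir": image_dir,
                "num_repeats": int(num_repeats),
                "class_tokens": class_tokens,
                "caption_extension": ".txt",
            }],
        }]
    }
    with open(output_path, "w") as f:
        toml.dump(config, f)      # write the dataset config to disk
    return toml.dumps(config)     # echo it back into the UI for review

with gr.Blocks() as demo:
    image_dir = gr.Textbox(label="Image directory")
    resolution = gr.Number(label="Resolution", value=1024)
    batch_size = gr.Number(label="Batch size", value=4)
    num_repeats = gr.Number(label="Repeats", value=10)
    class_tokens = gr.Textbox(label="Class tokens", value="toy")
    output_path = gr.Textbox(label="Output .toml path", value="dataset.toml")
    preview = gr.Textbox(label="Generated toml", lines=14)
    gr.Button("Write toml").click(
        build_toml,
        inputs=[image_dir, resolution, batch_size, num_repeats, class_tokens, output_path],
        outputs=preview,
    )

demo.launch()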

@311-code

311-code commented Feb 23, 2024

I had some luck with slightly higher-quality outputs and a few issues; here are some samples:

[four sample images]

The first one has a prompt censorship joke if you look closely haha.

Problems: the likeness is not completely there; samples during training look good at 800 steps (I should be telling you epochs to make this easier, apologies) but not as good at 800 steps in ComfyUI, so I used the 1800-step checkpoint. Another issue is that samples are stuck at 192x192.
[image: samples during training]

For some reason I have to use an overtrained checkpoint with a text_model for the CLIP node from fewer steps than the checkpoint to get decent results, or even mixing in a text_model from another training of Ted somehow gets better results.

@bmaltais
Owner Author

bmaltais commented Feb 23, 2024

Thank you for sharing this. I will test it out later tonight after work and family stuff ;-) I will update the content of the branch with your updates so it can help others who want to cut their teeth on this ;-)

The sample you provided is actually pretty great. Probably a combination of your parameters, source data, and training the text encoder.

@bmaltais
Owner Author

Interesting results...

UNet and TE:

image

TE only:

image

UNet only:

image

Conclusion... the TE matters most as far as likeness goes... but without the trained UNet the result is quite fuzzy...

@311-code

311-code commented Feb 23, 2024

That first one has much better likeness than I ever got.

I really had to fight with text encoder/UNet model combinations. Maybe increasing the text encoder learning rate a bit could help, since it has the biggest impact?

@bmaltais
Owner Author

Looks like the TE is overfitting while the UNet is way underfitted. Maybe increasing the UNet LR to 0.0001 would help balance learning between the two and prevent overfitting.

@311-code

311-code commented Feb 26, 2024

Ok, I spent a couple more days testing. I tried a few things: no captions, changing classes, generic tokens, training myself with 60 photos like I would do on SDXL.

Overall it's very difficult to figure out the right combination of unet model and text encoder model to use in ComfyUI, or what number of steps is best for 60 photos in Cascade. Maybe this will change with future diffusers updates? To complicate things, the 13 Ted photos look good at 800 steps in the samples, then fall off, but get decent again at 1800 steps. It makes me wonder if the Ted likeness would improve if I did more epochs.

It seems to take a long time to fully finetune Cascade at the time of writing, and I'm struggling to figure it out. It didn't look like me overall and looked pretty undertrained at 3400 steps at batch size 3 with 60 photos. I'm thinking this is going to need a lot more steps, which doesn't seem in line with it "training faster than SDXL". I could increase the learning rate on everything here again, but in SDXL that always seemed to make the results worse.

This guy is getting pretty decent results on his cat, though, at 8000 steps (but overfitting) and is using a very large batch size of 7: https://www.reddit.com/r/StableDiffusion/comments/1azmhte/my_cat_in_different_stylesstablecascade_stagec/ with the kohya scripts directly.

I ran out of disk space though, because I fell asleep and it was saving a checkpoint every 300 steps, which is too often.

@vgaggia

vgaggia commented Feb 26, 2024

I'm actually busy trying to train Stable Cascade with a dataset of around 180k images, although I am using OneTrainer because it seems to be less memory-intensive for some reason.

I have also noticed that the training constantly gets better and worse as it trains. It's going to be a while for my training to finish on a single GPU, so no clue when I can actually show some results.

@vgaggia

vgaggia commented Feb 26, 2024

I sure will find out if it's a massive fail!

Have you considered trying a very high learning rate? Maybe it trains differently than we're used to; it is supposed to be easier to train, if I remember right.

@betterftr

For me it generates 192x192 samples during training; I'm trying to figure out why, since I set --w and --h to 1024.

@bmaltais
Owner Author

The small samples are related to how the sd-scripts code is actually being used. Nothing I can do. This is something only kohya can address… but given how heavy creating samples is, I suspect this was by design.

@betterftr

Well, as a temporary solution, one can increase --w and --h to 4096 for 4x size :D
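
For example, a line in the sample prompt file would then look something like this (the prompt itself just reuses the zxc toy example from upthread):

zxc toy posing at the beach --w 4096 --h 4096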

@jordoh

jordoh commented Mar 1, 2024

Someone posted a workflow here for converting the unet models to work with the official ComfyUI workflow (to get rid of that error). Simple enough. I've been out of town but will try it when I get back.

comfyanonymous/ComfyUI#2893 (comment)

Note that this only loads the unet, not the clip, so you aren't able to utilize the (more effective) text encoder training.

@311-code

311-code commented Mar 1, 2024

Thanks for the info. Can we convert the clip model also and just drag it into the positive and negative prompts then (with a load clip node in the official ComfyUI workflow)?

And I'm wondering if there's any point in doing this over just using the unet; I was hoping it might give better results.

@jordoh

jordoh commented Mar 1, 2024

Thanks for the info. Can we convert the clip model also and just drag it into the positive and negative prompts then (with a load clip node in the official ComfyUI workflow)?

Maybe? I've been trying this with a model trained by the original Stable Cascade repo code and get errors, as the model it produces isn't loadable as a clip model (I don't have a separate text encoder model from that process). It might work for kohya-ss-trained models though - I'd be very interested to know if it does.

And I'm wondering if there's any point in doing this over just using the unet; I was hoping it might give better results.

Yes, there's definitely a point; see this comment upthread for comparisons. For person-trained models, I'm unable to achieve any likeness with just the UNet (vs. generating with the Stable Cascade repo, which uses the trained CLIP).

@311-code

311-code commented Mar 2, 2024

Yup, saw that before. Sorry for the confusion; I meant: is there any point in using the official ComfyUI workflow vs. the unet workflow for this? I wonder if there would be a difference.

@jordoh

jordoh commented Mar 3, 2024

Yup, saw that before. Sorry for the confusion; I meant: is there any point in using the official ComfyUI workflow vs. the unet workflow, if we got the unet and clip working in both workflows? I wonder if there would be a difference.

Oh, thanks for clarifying. I think I understand what you mean now: is there any difference between saving off a checkpoint with the trained unet and then using that saved checkpoint, vs. loading the trained unet directly? It seems unlikely that would affect the output, as it's the same model, clip, and VAE either way, but it might save some memory or load time to use the saved-off checkpoint.

@311-code

311-code commented Mar 4, 2024

Yes, thanks for the info.

Something I just discovered that I never knew about the kohya GUI: you can edit the prompt.txt in the samples folder while it's training to change the samples.

This was pretty helpful. It's useful if you are saving a lot of checkpoints every however many epochs/steps and want to see something different.
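
For reference, each line of that prompt file is one sample, with per-line options; a small sketch (the prompt text is illustrative, the flags are the usual sd-scripts sample-prompt options):

# lines starting with # are ignored
# --n negative prompt, --w/--h size, --s steps, --l CFG scale, --d seed
zxc toy posing at the beach --n blurry, low quality --w 1024 --h 1024 --s 28 --l 4.0 --d 42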

@segalinc

segalinc commented Mar 4, 2024

Will this feature work on multiple GPUs?

@sapkun

sapkun commented Mar 5, 2024

When will the ControlNet training script be released for Stable Cascade?

@paboum
Contributor

paboum commented Mar 8, 2024

Looks like the TE is overfitting while the UNet is way underfitted. Maybe increasing the UNet LR to 0.0001 would help balance learning between the two and prevent overfitting.

Please try adaptive optimizers, e.g. Prodigy. I'm a newbie here and have never even used those LR parameters. Also, I hope this new feature will work fine with them, so at least one test is in order.
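
For anyone who wants to try, a hedged sketch of how the optimizer flags in the stage-C command from earlier in this thread might be swapped for Prodigy. This is untested here; it assumes the stable-cascade branch exposes the same optimizer options as mainline sd-scripts, and it requires the prodigyopt package. With Prodigy the learning rate is typically set to 1.0, since the optimizer adapts it:

# Replace the Adafactor lines in the command above with something like:
  --optimizer_type prodigy --learning_rate 1.0 `
  --optimizer_args "weight_decay=0.01" "decouple=True" "use_bias_correction=True" `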

@311-code

311-code commented Mar 12, 2024

I will need to look into Prodigy also; I've heard good things.

Just want to give an update though: I tried to train a '60s celebrity with 74 photos on Cascade, and tried a ton of settings and text encoder model/unet model combinations, LR settings, and step counts.

I can't get SDXL DreamBooth or full-finetuning-level results with a trained human. Tried a ton of stuff over like 8 hours. I think now that SD3 is coming out I may just wait it out.

@segalinc

segalinc commented Mar 12, 2024 via email

@311-code

311-code commented Mar 13, 2024

I was thinking of trying that out, but I heard it may not train the text encoder like this does. Edit: never mind, I believe it can.

I will give it a go though, just to see how it compares, thanks!

@311-code

311-code commented Mar 15, 2024

Some info from the Kohya Cascade branch, since things have stagnated here, if anyone wants to try:

The official default learning rate for Cascade is 1e-4 (0.0001), and the official settings use bf16 for training.

The first time, specify --text_model_checkpoint_path and --save_text_model to save the Text Encoder weights. From then on, specify --text_model_checkpoint_path to load the saved weights.
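
A sketch of how that might look appended to the stage-C command from earlier in this thread (the path is a placeholder):

# First run: also save the Text Encoder weights
  --text_model_checkpoint_path "E:\models\stable_cascade\text_model.safetensors" `
  --save_text_model `

# Subsequent runs: load the saved weights
  --text_model_checkpoint_path "E:\models\stable_cascade\text_model.safetensors" `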

Note:

A quick clarification, Stable Cascade uses Stage A & B to compress images and Stage C is used for the text-conditional
learning. Therefore, it makes sense to train a LoRA or ControlNet only for Stage C. You also don't train a LoRA or
ControlNet for the Stable Diffusion VAE right?

If your GPU allows for it, you should definitely go for the large Stage C, which has 3.6 billion parameters.
It is a lot better and was finetuned a lot more. Also, the ControlNet and Lora examples are only for the large Stage C at the moment.
For Stage B the difference is not so big. The large Stage B is better at reconstructing small details,
but if your GPU is not so powerful, just go for the smaller one.

I finally got OneTrainer working to compare; I will report back.

Edit: I'm comparing against the Kohya GUI but had a side issue. OneTrainer seems to have a custom-made diffusers-to-.safetensors converter that runs after training, and it's not great imo. If comparing, I would recommend doing a manual conversion of a backup, from a diffusers loader node to a checkpoint save node in ComfyUI.

@mhaines94108

I tested the results of the model in ComfyUI and they are not great... sort of washed out... Most certainly bad training parameters... It will take a while to figure out proper SC finetuning parameters...

image

I have spent several weeks trying to fine-tune Stable Cascade on a dataset of ~50K photos, and my results have a very similar finger-painted look. I've been using the sample code straight from Stable Diffusion. I guess I'll try Kohya's scripts.

@3blackbar

3blackbar commented May 21, 2024

Are the bear settings from the examples the best ones currently? The other one saves every 100 steps, which is way too early.
