
Finetune SDXL #35

Open
dydxdt opened this issue May 17, 2024 · 3 comments

dydxdt commented May 17, 2024

Thanks for your great work. I used the provided SDXL weights to fine-tune on my own data, but the loss doesn't seem to converge, and I wonder whether the provided weights were trained at 1024 resolution. When I test the fine-tuned model, it fails to learn the style of the training data. Do you have any advice? Thanks
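One thing worth checking when a diffusion loss "doesn't converge": each training step computes the loss at a randomly sampled noise timestep, so the raw per-step value is typically very noisy and the curve can look flat even while training is progressing. Below is a minimal sketch for smoothing logged loss values before judging the trend. The helper names `ema_smooth` and `mean_drop` are hypothetical (not part of BrushNet or TensorBoard), and the example loss values are made up for illustration.

```python
from typing import Iterable, List


def ema_smooth(values: Iterable[float], weight: float = 0.99) -> List[float]:
    """Return a bias-corrected exponential moving average of `values`.

    `weight` acts like a smoothing slider: closer to 1 means smoother.
    """
    if not 0.0 <= weight < 1.0:
        raise ValueError("weight must be in [0, 1)")
    smoothed: List[float] = []
    running = 0.0
    for step, value in enumerate(values, start=1):
        running = weight * running + (1.0 - weight) * value
        # Divide out the bias toward the zero initial value of `running`.
        smoothed.append(running / (1.0 - weight ** step))
    return smoothed


def mean_drop(values: List[float], window: int) -> float:
    """Mean of the first `window` values minus the mean of the last `window`.

    A positive result means the loss went down on average.
    """
    if window <= 0 or 2 * window > len(values):
        raise ValueError("need 0 < window <= len(values) // 2")
    head = sum(values[:window]) / window
    tail = sum(values[-window:]) / window
    return head - tail


# Hypothetical per-step losses, for illustration only (not from this thread).
losses = [0.20, 0.05, 0.18, 0.04, 0.15, 0.03]
print(ema_smooth(losses, weight=0.9)[-1])
print(mean_drop(losses, window=2))
```

In practice the window should span many thousands of steps; a small positive `mean_drop` over a long run is a more reliable signal than eyeballing the raw curve.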

@huiyang865

I also encountered a similar problem: when training the SDXL model at 1024 resolution, the loss does not seem to converge.

The model training configuration is as follows:

accelerate launch examples/brushnet/train_brushnet_sdxl.py \
--pretrained_model_name_or_path /disk1/BrushNet/data/ckpt/anything-xl \
--brushnet_model_name_or_path /disk1/BrushNet/data/ckpt/random_mask_brushnet_ckpt_sdxl_v0 \
--output_dir runs/logs/selfdata_brushnetsdxl_1024 \
--train_data_dir /disk1/data/self_developed_animate_data \
--resolution 1024 \
--max_train_steps 100000 \
--learning_rate 1e-5 \
--train_batch_size 1 \
--gradient_accumulation_steps 4 \
--tracker_project_name brushnet \
--report_to tensorboard \
--resume_from_checkpoint latest \
--validation_steps 1000 \
--checkpointing_steps 1000 \
--random_mask
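For reference, the effective batch size implied by this configuration is the per-device batch size times the gradient accumulation steps times the number of processes. A minimal sketch of that arithmetic follows. The GPU count is not stated in the thread, so `num_processes=1` is an assumption, and treating each of `--max_train_steps` as one optimizer update is an assumption about the script's step counting; `training_budget` is a hypothetical helper.

```python
from typing import Tuple


def training_budget(train_batch_size: int,
                    gradient_accumulation_steps: int,
                    num_processes: int,
                    max_train_steps: int) -> Tuple[int, int]:
    """Return (effective batch size, total training samples processed).

    Assumes each of `max_train_steps` is one optimizer update, i.e. one full
    round of gradient accumulation across all processes.
    """
    effective_batch = train_batch_size * gradient_accumulation_steps * num_processes
    return effective_batch, effective_batch * max_train_steps


# Values from the command above; num_processes=1 is an assumption (GPU count not stated).
effective, samples = training_budget(train_batch_size=1,
                                     gradient_accumulation_steps=4,
                                     num_processes=1,
                                     max_train_steps=100000)
print(f"effective batch size: {effective}, samples seen: {samples}")
# prints "effective batch size: 4, samples seen: 400000"
```

If the job runs on several GPUs, both numbers scale with the process count.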

The training log shows the following:
[Screenshot of the training log (image not preserved)]

@dydxdt, how did you eventually solve the problem?

dydxdt commented Jun 6, 2024

@huiyang865 Haven't figured it out yet, unfortunately. Still hoping for some helpful advice -_-

huiyang865 commented Jun 13, 2024

Thanks for your reply.

Do the SDXL and SD1.5 versions of BrushNet have the same structure? Looking at the code, I found that BrushNet features are not injected into the Refiner module of XL. Is that right?

Could these factors limit the convergence of the XL version? I look forward to your reply. Thank you very much. @dydxdt
