
Feature/ Better LoRA : Dropout, Conv2d #133

Merged
merged 8 commits into from Jan 15, 2023

Conversation

cloneofsimo
Owner

@cloneofsimo cloneofsimo commented Jan 13, 2023

Unlike classical LLMs, LDMs have many other modules as well. Arguably, many of the "important" features come from the ResNet features. This is clearly demonstrated by, for example, the plug-and-play prior:

[image]

Check out the details on this paper : https://pnp-diffusion.github.io/

A natural question to ask is: does Dreambooth yield fine-grained details, such as eyes or skin, because it is able to tune the ResNets? Are Q, K, V, O simply not enough?

In this PR I will try to answer these questions with a bunch of experiments.

Here are some initial results :

This result is LoRA rank 4, with "CrossAttention", "Attention", and "GEGLU" replaced:

[image: pr_no]

Now, this is LoRA rank 1, with "ResnetBlock2D", "CrossAttention", "Attention", and "GEGLU" replaced. The number of parameters is now about 2 times higher: as high as 9.7 MB. Of that, only 2 MB is the LoRA for "CrossAttention", "Attention", and "GEGLU", so if this helps, we might use a rank-4 LoRA for the transformer modules and a rank-1 LoRA for the ResNets.

[image: pr_res]

All were trained with the same number of steps and sampled with the same parameters. In this case it looks like a tie. I'll try other models as well. I also think it is a good time to start implementing fidelity metrics, instead of relying only on the CLIP alignment score.
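(For readers following along, here is a minimal, illustrative sketch of what a LoRA-wrapped Conv2d can look like, with the rank left as a parameter so that attention/MLP linears and ResNet convs can get different ranks, as suggested above. This is a simplified stand-in, not the exact class added in this PR.)

```python
import torch.nn as nn


class LoRAConv2d(nn.Module):
    """Illustrative LoRA wrapper for a Conv2d layer (not this PR's exact class).

    The pretrained convolution stays frozen; a low-rank residual branch is
    added: a rank-r convolution with the same spatial kernel, followed by a
    1x1 convolution back up to the original channel count.
    """

    def __init__(self, base: nn.Conv2d, r: int = 1, scale: float = 1.0):
        super().__init__()
        self.base = base
        self.scale = scale
        self.lora_down = nn.Conv2d(
            base.in_channels, r, base.kernel_size,
            stride=base.stride, padding=base.padding, bias=False,
        )
        self.lora_up = nn.Conv2d(r, base.out_channels, 1, bias=False)
        nn.init.normal_(self.lora_down.weight, std=1.0 / r)
        nn.init.zeros_(self.lora_up.weight)  # the branch starts as a no-op
        self.base.requires_grad_(False)      # only the LoRA branch trains

    def forward(self, x):
        return self.base(x) + self.scale * self.lora_up(self.lora_down(x))
```

With something like this, the ResNet convs could be wrapped with r=1 while the attention/GEGLU linears keep r=4, which is the mixed-rank setup suggested above.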

@cloneofsimo cloneofsimo changed the base branch from master to develop January 13, 2023 21:09
@cloneofsimo cloneofsimo self-assigned this Jan 13, 2023
@brian6091
Collaborator

Excellent!

@cloneofsimo
Owner Author

I've been testing with rank 4 on the ResNets as well, but the result seems only marginally better. Of course this is totally subjective, so it just isn't worth the size increase.
The next step would be to compare all the module sets, MLP, Attention, ResNet, training them one at a time and comparing the results.

[image]

@hafriedlander
Collaborator

Definitely interesting experiments.

What are we optimizing for: total binary size, training speed, multiple-concept accuracy, something else?

(The reason I ask is that my first thought was "how does this compare to just rank 16 on CrossAttention / Attention / GEGLU?")

Is there some way we can auto-detect when the rank is insufficient? (Maybe a flat gradient during training while the error rate is still high?)

@hafriedlander
Collaborator

Also, just from an end-user POV, the greatest strength of LoRA for me is how easily the strength of the various adaptations can be adjusted. It'd be interesting to see how useful post-training adjustment of individual per-module LoRA weights is.
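(A hedged sketch of what that post-training adjustment could look like, assuming the LoRA wrappers expose a `scale` attribute multiplying the low-rank branch, as in the Conv2d sketch earlier in this thread; the repo has its own knob for this, so treat the helper name and the name filter as hypothetical.)

```python
from typing import Optional

import torch.nn as nn


def set_lora_scale(model: nn.Module, scale: float, name_filter: Optional[str] = None):
    """Change LoRA strength after training, optionally per module group.

    Layers are recognized by their `lora_up` attribute, and `name_filter`
    (e.g. "attn" or "resnets") restricts the change to matching module names,
    so attention and ResNet adaptations can be dialed independently.
    """
    for name, module in model.named_modules():
        if hasattr(module, "lora_up") and (name_filter is None or name_filter in name):
            module.scale = scale
```

For example, `set_lora_scale(unet, 0.5, name_filter="resnets")` (with hypothetical names) would halve only the ResNet adaptations while leaving the attention ones untouched.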

@cloneofsimo
Owner Author

So I think it is kind of a constrained optimization at this point. We don't want the output to be too large, but we want quality as high as Dreambooth's. The objective is also mixed: distortion, perceptual fidelity, and editability are all things we want, but they trade off against one another. Perceptual fidelity is also somewhat ill-defined, as the CLIP score doesn't seem to represent it very well, contrary to what the Custom Diffusion or Textual Inversion papers would suggest.

@cloneofsimo
Owner Author

It seems like what many people are looking for can be described simply: identity preservation + editability.
The community seems to have a strong preference for use on faces, so facial identity preservation is one thing they really want.
Second, it has to have high editability as well, which I think the CLIP score can describe well.
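(As a reference point for the editability side, a minimal way to compute a CLIP image-text alignment score with the Hugging Face transformers CLIP model; the checkpoint choice here is just an example and this is not tied to the repo's code.)

```python
import torch
from PIL import Image
from transformers import CLIPModel, CLIPProcessor

clip = CLIPModel.from_pretrained("openai/clip-vit-base-patch32")
processor = CLIPProcessor.from_pretrained("openai/clip-vit-base-patch32")


def clip_alignment(image: Image.Image, prompt: str) -> float:
    """Cosine similarity between the CLIP image embedding and text embedding."""
    inputs = processor(text=[prompt], images=image, return_tensors="pt", padding=True)
    with torch.no_grad():
        out = clip(**inputs)
    img = out.image_embeds / out.image_embeds.norm(dim=-1, keepdim=True)
    txt = out.text_embeds / out.text_embeds.norm(dim=-1, keepdim=True)
    return float((img * txt).sum())
```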

@cloneofsimo
Owner Author

Now @brian6091 and I had an idea: apply the LoRA on the ResNet part only in the upsample U-Net layers, because the downsample parts are mostly used to compress the representation, not to generate with fidelity. So we'll see if that works better. @brian6091, are you going to leave a PR, or is it just your own thing?
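(A small sketch of the upsample-only idea, assuming the diffusers U-Net where the decoder stages are named "up_blocks"; the filter itself is just illustrative.)

```python
import torch.nn as nn


def upsample_resnet_convs(unet: nn.Module):
    """Collect Conv2d layers that live in the U-Net's upsampling path only.

    In diffusers' UNet2DConditionModel the decoder stages are named
    "up_blocks"; "down_blocks" and "mid_block" are skipped here, on the idea
    that they mostly compress the representation rather than add fidelity.
    """
    return [
        (name, module)
        for name, module in unet.named_modules()
        if isinstance(module, nn.Conv2d) and "up_blocks" in name
    ]
```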

@cloneofsimo
Owner Author

I've added dropout as well, so this PR is no longer only about the conv layers. I'll rename it.

@cloneofsimo cloneofsimo changed the title Feat : LoRA for convolutional layer as well Feature/ Better LoRA : Dropout, Conv2d Jan 14, 2023
@cloneofsimo
Owner Author

Looks like adding dropout helps! Just like in the paper, I've used a dropout of 0.1. Same seed, everything else identical.

[image: pr_res_dp]
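(For context, a sketch of where dropout typically sits in a LoRA layer: on the input of the low-rank branch only, as in the reference LoRA implementation, while the frozen base path is untouched. The exact placement in this PR's code may differ.)

```python
import torch.nn as nn


class LoRALinearWithDropout(nn.Module):
    """Linear layer with a LoRA branch and dropout on that branch (sketch)."""

    def __init__(self, base: nn.Linear, r: int = 4, dropout_p: float = 0.1, scale: float = 1.0):
        super().__init__()
        self.base = base
        self.scale = scale
        self.dropout = nn.Dropout(dropout_p)  # p=0.1, as used above
        self.lora_down = nn.Linear(base.in_features, r, bias=False)
        self.lora_up = nn.Linear(r, base.out_features, bias=False)
        nn.init.normal_(self.lora_down.weight, std=1.0 / r)
        nn.init.zeros_(self.lora_up.weight)
        self.base.requires_grad_(False)

    def forward(self, x):
        # Dropout regularizes only the trainable low-rank path.
        return self.base(x) + self.scale * self.lora_up(self.lora_down(self.dropout(x)))
```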

@ExponentialML

ExponentialML commented Jan 14, 2023

Your work and contributions are very underrated. Great stuff @cloneofsimo!

Edit: I'm doing some testing of my own with these changes, and "Better LoRA" is a very big understatement.
I genuinely believe that you are well on your way to making this the new standard for fine-tuning SD. Well done.

@brian6091
Collaborator

brian6091 commented Jan 14, 2023

Yeah, I think the ability to do ablation experiments will be super interesting. There are subjective differences between the different components (CrossAttention and FFN, for example) that may be hard to capture with objective metrics (but that come out with more complex prompting). From the end-user perspective, though, I think being able to define your own objective and having the tools to achieve it is ideal.

@brian6091
Collaborator

brian6091 commented Jan 14, 2023

I'll work the scale/nonlinearity code back in once you've stabilized this (PR #111). The effects are also subtle, but worth it IMO for the trivial cost.

@brian6091
Collaborator

> Now @brian6091 and I had an idea: apply the LoRA on the ResNet part only in the upsample U-Net layers, because the downsample parts are mostly used to compress the representation, not to generate with fidelity. So we'll see if that works better. @brian6091, are you going to leave a PR, or is it just your own thing?

I can leave a PR.

@cloneofsimo
Owner Author

I've done various experiments, but more needs to be done. I've got mixed results in my case, so I'll add these features as optional for now. Using dropout does make a difference, though.

@cloneofsimo cloneofsimo merged commit be1ef54 into develop Jan 15, 2023
@cloneofsimo
Owner Author

Note:

I've found that training the ResNets requires a very low learning rate: something like 5e-6 for me.
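(One way to act on that: give the ResNet/conv LoRA parameters their own optimizer group with a much lower learning rate. A sketch, assuming diffusers-style parameter names where ResNet weights contain "resnets"; the helper itself is hypothetical.)

```python
import torch


def build_lora_optimizer(unet: torch.nn.Module, base_lr: float = 1e-4, resnet_lr: float = 5e-6):
    """Put ResNet LoRA params in a low-LR group, everything else in the default group."""
    resnet_params, other_params = [], []
    for name, param in unet.named_parameters():
        if not param.requires_grad:  # skip frozen base weights
            continue
        (resnet_params if "resnets" in name else other_params).append(param)
    return torch.optim.AdamW([
        {"params": other_params, "lr": base_lr},
        {"params": resnet_params, "lr": resnet_lr},
    ])
```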

@cloneofsimo cloneofsimo deleted the convlora branch January 16, 2023 00:47
@okaris

okaris commented Jan 16, 2023

@cloneofsimo First of all, thank you for creating this repo. I've been tinkering with LoRA but can't say it's faster than (let alone twice as fast as) Dreambooth. Could you share the args you used for the above examples, please?

@cloneofsimo
Owner Author

Is this related to the conv LoRAs, or just lora_pti in general? @okaris

@okaris

okaris commented Jan 16, 2023

The comment is about lora_pti in general; I can open a new issue for that. As for the question, I'd really like to know the settings you used for the above trainings and how long they took. Thanks!

@cloneofsimo
Owner Author

It's been a while since I've measured the training time for these models, but they take < 6 min in general. I think they aren't as fast as the previous ones (currently in the training scripts folder), because lora_pti is not optimized for speed and memory: 8-bit Adam + xformers haven't been tested with it. It is the textual inversion part that takes a long time, since it is currently done in full precision.

@cloneofsimo
Owner Author

I am continuing to optimize for perceptual performance first, and the README is a bit misleading because its examples were not based on the lora_pti scripts. I'd better fix that.

@ExponentialML

ExponentialML commented Jan 17, 2023

Hey @cloneofsimo. Is nn.Conv2d needed here when using the extended unet?

`model, target_replace_module, search_class=[nn.Linear]`

@cloneofsimo
Owner Author

Ah, these and some other tools currently don't support the conv2d LoRA. The rest are coming as a feature soon.
Meanwhile, patch_pipe will take care of it with no problem.
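(For reference, the quoted search would need nn.Conv2d in its search_class to pick up conv layers once that is supported; a self-contained sketch of the idea, with a hypothetical helper name rather than the repo's actual function.)

```python
import torch.nn as nn


def find_target_children(model: nn.Module, target_replace_module,
                         search_class=(nn.Linear, nn.Conv2d)):
    """Yield (parent, child_name, child) for layers that would get a LoRA.

    Walks modules whose class name is in `target_replace_module` and yields
    their Linear/Conv2d children, so conv layers are matched alongside linears.
    """
    for parent in model.modules():
        if parent.__class__.__name__ in target_replace_module:
            for child_name, child in parent.named_modules():
                if isinstance(child, search_class):
                    yield parent, child_name, child
```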

@ExponentialML

Sweet, thanks!
