Weighted learning of TIs and HNs #6700
Conversation
Some interesting links to get weight maps:
This is similar to the Attention Map option of the DreamArtist extension, but it is only for embeddings, not for HNs. Did you look at that?
I have not, but I will check it out. Thanks for the reference.
What learning rate do you suggest for teaching a face with the default method? @Shondoit
Currently there's no consensus. I've found the default (0.005) to be good enough. |
This is the latest video on the subject, worth watching:
@mykeehu in that video, how to determine the number of vectors is still not explained. He gives some numbers but no backing for how he came up with them. I really need to understand how vectors are utilized. Also, he used image descriptions, but those prompts are not even used during training lol :)
In TI training, we use an average of 5 images per vector. You can set up more vectors, in which case the embedding will be stronger. We have talked a lot about the details in this and this thread.
It shows the last used prompt. In my case it didn't read the image captions generated by preprocessing; it just used the image name and the model name. Can you verify that?
This rule of thumb doesn't make sense at all. Whether you have 10 pictures or 1000 pictures, if they're all of the same subject, we want it to learn the same result. More training steps per image in the dataset does make sense though, because it needs more time to come to a consensus.
OK, now it makes more sense. So for a single person we can say 2 vectors at most is sufficient.
Other than that, I tested this and it works as described. I haven't used TI or Hypernetworks much recently, so I can't say anything about training speed, but it should be obvious that a weighted loss trains faster. I've used this lib to generate the masks: https://github.com/1adrianb/face-alignment I guess if this gets merged, people can create extensions for the different mask generation methods, so there's no need to include a specific method in this PR.
It's definitely different between 512 and 768 models (a 64 vs 96 latent grid).
Yeah, that was the idea from the start. Keep this simple and let people do whatever they want. Tooling will either be external or extensions. (Or perhaps one very basic one in the preprocessing tab.) |
@AUTOMATIC1111 Is there something I should do or consider before you want to implement this? Pre-empting the question "can't this be an extension?": it's 99% based on the normal TI/HN learning code; only the loss function is tweaked. I think it would be a waste to copy/paste everything into an extension, and I'm not sure it's easy to hijack only one line of WebUI code in the TI/HN training functions. Besides, it has no impact on users who don't use the function.
@Shondoit thanks for this... May I ask, do we get a real improvement when training faces with this? I'm trying to train faces but I'm not getting the results I expect.
This is just another feature that can be used for training. It's not magically going to fix everything. |
Thank you very much for your answer, I'm going to give it a try. Looks promising. |
You can watch my video, I got pretty good results already: How To Do Stable Diffusion Textual Inversion (TI) / Text Embeddings By Automatic1111 Web UI Tutorial
I did, great video btw... my training got better but still has issues. I'm still trying different datasets and parameters and running tests, but I only have 6 GB of VRAM (1660 Ti)... I have to spend a lot of time on these tests. :p
Thank you, I'm going to try when I get some free time from work and will let you know the results after the test. But honestly, I'm waiting for this PR to get merged so I can try this option.
@Shondoit can we make Stable Diffusion support transparent images? It is extremely problematic to use img2img and some other features if there are semi-transparent pixels.
Not entirely sure what you're trying to ask here. What do you mean exactly when you say "it's problematic"?
Let me show an example once it's upscaled here
Yeah, SD has no concept of transparency, so if you're importing an image into the model the transparency will be lost. There's currently no way around it.
@Shondoit is the first message up to date? ty so much
@FurkanGozukara The PR has been added as-is, with a minor fix, so it should work the same. If you need any more details or explanation DM me on Discord.
@Shondoit Hello, a little bit embarrassed to ask this, but how do I put my weight map into an alpha channel and get Photoshop to actually save it? I was able to create an alpha channel and drew my weight map, but every time I save the PNG the alpha channel seems to disappear. When I open the PNG in Photoshop again, there is no alpha channel anymore, and I'm not sure if this is how it's supposed to be or if it's actually not saving the alpha channel. I would really appreciate your help. I've been creating different TIs for some time now, and while my style TIs worked out quite well, I'm now trying to create one for specific clothing, and I think the alpha channel as loss weight option could improve the TI's learning.
Can you expand on how step 3 works? And is there a link to your Discord?
Just Google "export png with alpha Photoshop"; there are plenty of tutorials and examples.
I've since spoken to Shondoit, and since step 3 is a little bit open to interpretation, I'd like to clarify for everyone who might have problems with the last step. In step 3 you are supposed to put your "map" into the alpha channel of your PNG. It is not important to have the alpha channel saved as a channel that you can edit again after reopening the picture in Photoshop/GIMP, but it is important to save the transparency. The transparency is the key element here, so your "map" should not remove complete parts of your image if they hold any valuable information. So when drawing or generating the map, it makes sense not to use black but values between white and gray. The "Use PNG alpha channel as loss weight" feature removes the transparency during the learning process but uses its information afterwards for the loss.
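For illustration, a minimal Pillow/NumPy sketch that remaps a black-to-white map into the gray-to-white range described above (the filenames are hypothetical):

```python
import numpy as np
from PIL import Image

mask = np.asarray(Image.open("map.png").convert("L"), dtype=np.float32)

# Keep every pixel at least half-weighted so no region of the image
# is discarded entirely; 255 stays full weight, 0 becomes gray (128).
mask = 128 + (mask / 255.0) * 127

Image.fromarray(mask.astype(np.uint8), mode="L").save("map-remapped.png")
```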
Can you show what images you prepared and what you gave to the web UI? Screenshots? Also, did you get better results?
You may as well use Pillow for a Pythonic way; here's an example.
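(A minimal sketch of the idea, assuming a separate greyscale mask file; the image-mask.png name is illustrative, and -0000 is just the output suffix mentioned below.)

```python
import numpy as np
from PIL import Image

# Load the training image (RGB) and its greyscale weight map.
image = Image.open("image.png").convert("RGB")
mask = Image.open("image-mask.png").convert("L")

# Scale the mask into the 8-bit 0-255 range before using it as alpha,
# otherwise values over 255 would get cut off (see the note below).
weights = np.asarray(mask, dtype=np.float32)
weights = 255 * weights / max(weights.max(), 1)
alpha = Image.fromarray(weights.astype(np.uint8), mode="L")

# Put the weight map into the alpha channel and save as an RGBA PNG.
rgba = image.copy()
rgba.putalpha(alpha)
rgba.save("image-0000.png")
```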
Just remember to scale the alpha pixels to 8-bit before converting the image mode, or values over 255 may get cut off. For the filename, -0000 is just the output format I chose for the masking procedure.
Forgive me for my ignorance but... is a PNG with an alpha channel the same as a PNG with transparency? I've tried different save and export methods in Photoshop, and when reopening or previewing externally, all have lost the masked information. In other words, everything that the "original alpha channel" masks is empty in the exported RGB channels. Is there a way to keep the RGB channels intact and add the alpha channel to the PNG? The only thing I have found while searching is that it should be saved in 32 bits (8 bits per channel, RGBA), but Photoshop only allows me 24 bits with transparency. If this were true, when training a textual inversion, would it still be necessary to describe the content that the transparency hides?
@polsetes Be aware that Layer Mask and Alpha Channel are different things. I'll be honest, I use GIMP for photo editing, so I'm not familiar with Photoshop.
I'm not familiar with how PNG is implemented. How can I verify that the transparency in my PNGs is in the format expected by the A1111 web UI?
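One quick way to check with Pillow, assuming the feature expects an 8-bit RGBA alpha channel as noted in the PR description below (a sketch, not an official validator):

```python
from PIL import Image

img = Image.open("image.png")
print(img.mode)  # "RGBA" means an 8-bit-per-channel alpha is present

if img.mode == "RGBA":
    alpha = img.getchannel("A")
    lo, hi = alpha.getextrema()
    # Intermediate values indicate a real weight map rather than
    # simple on/off transparency.
    print(f"alpha range: {lo}-{hi}")
```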
Hello, thanks for the good work! I'm trying to apply the same approach to fine-tuning the SD model with LoRA. Thanks in advance!
What this pull request is trying to achieve.
This will add the ability to train Embeddings and Hypernetworks using a weighted loss.
It will add the option to load PNG alpha channels as weight masks that will be used during loss calculation, in effect focusing the attention towards the masked subject.
During my training I've found that focusing the attention in this manner results in much faster learning of subjects, as well as learning more detailed features of the subjects.
Example of what is being emphasized during learning. (Regular is for demonstration only)
Note that this does not interfere with Dreambooth or regular image generation; the adjusted loss is applied only during the loss calculation for TIs and HNs.
In the future this could be extended to load side-by-side greyscale files, for example depth maps.
I.e. if you have image.png and image.weight.png in the same folder, it will load one as the weight for the other.
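A minimal sketch of how such a side-by-side lookup might work (the .weight.png convention is taken from the example above; load_weight_map is a hypothetical helper):

```python
from pathlib import Path
from PIL import Image

def load_weight_map(image_path):
    """Return the side-by-side greyscale weight file if one exists."""
    p = Path(image_path)
    weight_path = p.with_name(p.stem + ".weight.png")  # image.png -> image.weight.png
    if weight_path.exists():
        return Image.open(weight_path).convert("L")
    return None  # fall back to unweighted loss
```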
Additional notes and description of changes
I've added an extra hijack that adds a weighted_forward function next to the regular forward function.
This function does a regular forward pass, but before doing so it substitutes the get_loss function and attaches the weight tensors to the model so the weighted_loss function can apply them.
Before calculating the mean, it multiplies the individual latent losses by the weight mask, resulting in more or less loss per pixel depending on the weight. Afterwards it returns the mean to the training function.
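In essence, the weighted loss amounts to the following simplified sketch (not the PR's exact code; mask_to_latent and the specific loss function are assumptions). The mask is downscaled to the latent grid, e.g. 64x64 for 512 models and 96x96 for 768 models:

```python
import torch
import torch.nn.functional as F

def weighted_loss(pred, target, weight):
    """Per-element loss in latent space, scaled by the weight mask before the mean.

    pred, target: latent tensors of shape (B, C, H, W)
    weight:       mask of shape (B, 1, H, W) at latent resolution
    """
    loss = F.mse_loss(pred, target, reduction="none")  # keep per-element losses
    loss = loss * weight                               # emphasize masked regions
    return loss.mean()

def mask_to_latent(alpha, latent_hw=(64, 64)):
    """Downscale an image-resolution alpha mask to the latent grid."""
    return F.interpolate(alpha, size=latent_hw, mode="bilinear")
```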
Environment this was tested in
How to use this
Create a weight map, for example:
a. Generate a depth map using MiDaS
b. Generate an attention map using CLIPSeg (see the sketch after this list)
c. Manually draw a map
d. Create a generic gradient with falloff to the image edges.
Put the map into the image's alpha channel:
a. Using a fairly simple Python script and Pillow
b. Manually in Photoshop or GIMP
Note: be sure that the alpha channel is 8-bit, not single color transparency.
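For option (b) above, a hedged sketch of what generating a CLIPSeg attention map could look like, using the Hugging Face transformers port (the prompt and filenames are illustrative):

```python
import torch
from PIL import Image
from transformers import CLIPSegProcessor, CLIPSegForImageSegmentation

processor = CLIPSegProcessor.from_pretrained("CIDAS/clipseg-rd64-refined")
model = CLIPSegForImageSegmentation.from_pretrained("CIDAS/clipseg-rd64-refined")

image = Image.open("image.png").convert("RGB")
inputs = processor(text=["a photo of a face"], images=[image], return_tensors="pt")

with torch.no_grad():
    logits = model(**inputs).logits  # low-resolution heatmap (352x352)

heatmap = torch.sigmoid(logits).squeeze()     # map logits into 0..1
mask = Image.fromarray((heatmap.numpy() * 255).astype("uint8"), mode="L")
mask = mask.resize(image.size)                # back to image resolution
mask.save("image-weightmap.png")  # then embed as the alpha channel, as described above
```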
In the future I would like to release a script that would do these steps automatically.
(Perhaps even add MiDaS generation to the preprocess step)
Screenshots
Results
I haven't found time for extensive training runs and comparison images, but I have received positive feedback from multiple people on Discord.
Timing:
No significant impact.
Without: loading dataset 4.70 it/s, training 2.95 it/s
With: loading dataset 4.70 it/s, training 2.94 it/s