
Weighted learning of TIs and HNs #6700

Merged
merged 5 commits on Feb 19, 2023

Conversation

Shondoit
Contributor

@Shondoit Shondoit commented Jan 13, 2023

What this pull request is trying to achieve.

This will add the ability to train Embeddings and Hypernetworks using a weighted loss.
It will add the option to load PNG alpha channels as weight masks that will be used during loss calculation, in effect focusing the attention towards the masked subject.

During my training I've found that focusing the attention in this manner results in much faster learning of subjects, as well as learning more detailed features of the subjects.

Example of what is being emphasized during learning (Regular is for demonstration only):

  1. Regular - 2. Depth - 3. CLIPSeg

Note that this does not interfere with Dreambooth or regular image generation.
The weighted loss is only applied during the loss calculation for TIs and HNs.

In the future this could be extended to load side-by-side greyscale files, for example depth maps.
I.e. if you have image.png and image.weight.png in the same folder, it will load one as the weight for the other.

Additional notes and description of changes

I've added an extra hijack that provides a weighted_forward function alongside the regular forward function.
This function does a regular forward pass, but before doing so it substitutes the get_loss function and attaches the weight tensors to the model so the weighted loss can use them.
Before calculating the mean, it multiplies the individual latent losses with the weight mask, resulting in more or less loss per pixel depending on its weight. Afterwards it returns the mean to the training function.
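
For illustration, the core of the weighted loss looks roughly like this (a minimal sketch, not the exact code of this PR; weight here is assumed to be the alpha mask already downscaled to the latent resolution):

import torch.nn.functional as F

def weighted_loss(model_output, target, weight):
    # Per-element loss, no reduction yet
    loss = F.mse_loss(model_output, target, reduction="none")
    # Scale each latent "pixel" by its weight before averaging,
    # so masked regions contribute more to the gradient
    return (loss * weight).mean()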

Environment this was tested in

  • OS: Windows
  • Browser: Firefox
  • Graphics card: NVIDIA RTX 2080 8GB

How to use this

  1. Preprocess images normally.
  2. Choose one of the following:
    a. Generate a depth map using MiDaS
    b. Generate an attention map using CLIPSeg
    c. Manually draw a map
    d. Create a generic gradient with falloff to the image edges.
  3. Add the map from step 2 as an alpha channel to the image from step 1.
    a. Using a simple Python script and Pillow (see the sketch at the end of this section)
    b. Manually in Photoshop or GIMP
    Note: be sure that the alpha channel is a full 8-bit channel, not single-color transparency.

In the future I would like to release a script that would do these steps automatically.
(Perhaps even add MiDaS generation to the preprocess step)
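
In the meantime, as a minimal sketch of step 3a (the filenames here are hypothetical), a greyscale map can be attached as the alpha channel with Pillow:

from PIL import Image

img = Image.open("image.png").convert("RGB")
weight = Image.open("weight_map.png").convert("L")  # 8-bit greyscale map, same size as the image
img.putalpha(weight)  # store the map in the alpha channel (image becomes RGBA)
img.save("image_weighted.png")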

Screenshots

[screenshot: weighted-learning-ui]

Results

I haven't found time for extensive training runs and comparison images yet, but I have received positive feedback from multiple people on Discord.

Timing:
No significant impact.

Without:
Loading dataset: 4.70it/s
Training: 2.95it/s

With:
Loading dataset: 4.70it/s
Training: 2.94it/s

@Shondoit Shondoit changed the title from "Weighted learning" to "Weighted learning of TIs and HNs" on Jan 13, 2023
@mykeehu
Contributor

mykeehu commented Jan 14, 2023

This is similar to the Attention Map option of the DreamArtist extension, though that one is only for embeddings, not for HNs. Did you look at that?

@Shondoit
Contributor Author

This is similar to the Attention Map option of the DreamArtist extension, though that one is only for embeddings, not for HNs. Did you look at that?

I have not, but I will check it out. Thanks for the reference.

@FurkanGozukara

With the default settings, what learning rate do you suggest for teaching a face? @Shondoit

@Shondoit
Contributor Author

Shondoit commented Jan 16, 2023

Currently there's no consensus. I've found the default (0.005) to be good enough.

@mykeehu
Contributor

mykeehu commented Jan 16, 2023

With the default settings, what learning rate do you suggest for teaching a face? @Shondoit

This is the latest video on the subject, worth watching:
https://www.youtube.com/watch?v=2ityl_dNRNw

@FurkanGozukara

@mykeehu In that video, how to determine the number of vectors is still not explained. He gives some numbers but no backing for how he came up with them. I really need to understand how vectors are utilized.

Also, he used image descriptions, but during training those prompts are not even used, lol :)

@mykeehu
Contributor

mykeehu commented Jan 16, 2023

In TI training, we use an average of 5 images per vector. You can set up more vectors, in which case the embedding will be stronger. We have talked a lot about the details in this and this thread.
Keywords are extracted from the original image, so you have something to compare against what the system still sees in the image. You can then type the keywords in the prompt afterwards, but whatever is not covered by them is what the training remembers.

@FurkanGozukara

In TI training, we use an average of 5 images per vector. You can set up more vectors, in which case the embedding will be stronger. We have talked a lot about the details in this and this thread. Keywords are extracted from the original image, so you have something to compare against what the system still sees in the image. You can then type the keywords in the prompt afterwards, but whatever is not covered by them is what the training remembers.

It shows the last used prompt. In my case it didn't read the image captions generated by preprocessing; it just used the image name and the model name. Can you verify that?

@Shondoit
Contributor Author

Shondoit commented Jan 16, 2023

@mykeehu

In TI training, we use an average of 5 images per vector. You can set up more vectors, in which case the embedding will be stronger.

This rule of thumb doesn't make sense at all. Whether you have 10 pictures or 1000 pictures, if they're all of the same subject, we want it to learn the same result.
It makes more sense to say one vector per unit of subject complexity. I.e., a complex subject might need an extra vector, or an extra subject might need 2x the vectors.
Keep in mind that "Emma Watson" consists of just 2 vectors and is trained on many thousands of images.

More training steps per image in the dataset does make sense though, because it needs more time to come to a consensus.

@FurkanGozukara

@mykeehu

In TI training, we use an average of 5 images per vector. You can set up more vectors, in which case the embedding will be stronger.

This rule of thumb doesn't make sense at all. Whether you have 10 pictures or 1000 pictures, if they're all of the same subject, we want it to learn the same result. It makes more sense to say one vector per unit of subject complexity. I.e., a complex subject might need an extra vector, or an extra subject might need 2x the vectors. Keep in mind that "Emma Watson" consists of just 2 vectors and is trained on many thousands of images.

More training steps per image in the dataset does make sense though, because it needs more time to come to a consensus.

OK, now it makes more sense. So for a single person we can say 2 at most is sufficient.

@Shondoit Shondoit marked this pull request as ready for review January 17, 2023 16:07
@Shondoit Shondoit force-pushed the weighted-learning branch 4 times, most recently from a75fa06 to 8891d3d on January 20, 2023 07:59
@kidol

kidol commented Jan 21, 2023

One issue I see with this PR: The latent sample shape is always the same (I think?), so why not hardcode it and leave the sampling method logic unchanged?

Other than that, I tested this and it works as described. I haven't used TI or Hypernetworks much recently, so I can't say anything about training speed, but it should be obvious that a weighted loss trains faster.

I've used this lib to generate the masks: https://github.com/1adrianb/face-alignment
It can detect faces and extract the landmarks: landmarks -> binary face hull -> hull expansion -> blur = 4th channel.
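
That pipeline could look roughly like this (a sketch; it assumes the landmark points have already been extracted, e.g. with face-alignment, and uses OpenCV for the hull, expansion and blur, with arbitrary expansion/blur sizes):

import cv2
import numpy as np

def landmarks_to_weight_mask(landmarks, height, width, expand_px=30, blur_px=51):
    # Binary hull covering the detected face landmarks
    mask = np.zeros((height, width), dtype=np.uint8)
    hull = cv2.convexHull(np.asarray(landmarks, dtype=np.int32))
    cv2.fillConvexPoly(mask, hull, 255)
    # Expand the hull a bit, then feather the edge with a blur
    mask = cv2.dilate(mask, np.ones((expand_px, expand_px), np.uint8))
    mask = cv2.GaussianBlur(mask, (blur_px, blur_px), 0)  # blur_px must be odd
    return mask  # use this as the image's 4th (alpha) channel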

I guess if this gets merged, people can create extensions for the different mask generation methods, so no need to include a specific method in this PR.

@Shondoit
Contributor Author

Shondoit commented Jan 21, 2023

One issue I see with this PR: The latent sample shape is always the same (I think?), so why not hardcode it and leave the sampling method logic unchanged?

It's definitely different between 512 and 768 models (64 vs 96).
Also, I'm not entirely sure what happens if you use non-square aspect ratios. But it's good to be future proof; if a 1024 model ever comes out, it won't break.

I guess if this gets merged, people can create extensions for the different mask generation methods, so no need to include a specific method in this PR.

Yeah, that was the idea from the start. Keep this simple and let people do whatever they want. Tooling will either be external or extensions. (Or perhaps one very basic one in the preprocessing tab.)

@Shondoit
Contributor Author

@AUTOMATIC1111 Is there anything I should do or consider before you merge this?

Pre-empting the question "can't this be an extension?": it's 99% based on the normal TI/HN learning code. Only the loss function is tweaked. I think it would be a waste to copy/paste everything into an extension, and I'm not sure it's easy to hijack only one line of WebUI code in the TI/HN training functions. Besides, it has no impact for users that don't use the feature.

@Nacurutu

@Shondoit thanks for this... May I ask, with this we get a real improvement when training faces? Im trying to train faces but I'm not getting the results I expect.

@Shondoit
Contributor Author

@Nacurutu

thanks for this... May I ask, with this we get a real improvement when training faces? Im trying to train faces but I'm not getting the results I expect.

This is just another feature that can be used for training. It's not magically going to fix everything.
That being said, I've found that by setting the focus on the face, it trains less on the background, so you need fewer tricks to coerce the training to focus on the face (i.e., proper captioning, masking out the background, etc.). With this you can mask out the object/person you want to train on and it will focus its training on that.

@Nacurutu

This is just another feature that can be used for training. It's not magically going to fix everything. That being said, I've found that by setting the focus on the face, it trains less on the background, so you need fewer tricks to coerce the training to focus on the face (i.e., proper captioning, masking out the background, etc.). With this you can mask out the object/person you want to train on and it will focus its training on that.

Thank you very much for your answer, I'm going to give it a try. Looks promising.

@FurkanGozukara

@Shondoit thanks for this... May I ask, with this we get a real improvement when training faces? Im trying to train faces but I'm not getting the results I expect.

You can watch my video; I got pretty good results already.

How To Do Stable Diffusion Textual Inversion (TI) / Text Embeddings By Automatic1111 Web UI Tutorial

@Nacurutu

@Shondoit thanks for this... May I ask, with this we get a real improvement when training faces? Im trying to train faces but I'm not getting the results I expect.

You can watch my video; I got pretty good results already.

How To Do Stable Diffusion Textual Inversion (TI) / Text Embeddings By Automatic1111 Web UI Tutorial

I did, great video btw... My training got better but still has issues. I'm still trying different datasets and parameters and making tests, but I only have 6 GB of VRAM (1660 Ti)... I have to spend a lot of time on these tests.. :p

@Nacurutu

Nacurutu commented Jan 24, 2023

It could be due to settings as well

I have shared the dataset in this video, I wonder if you can replicate it: https://youtu.be/dNOpWt-epdQ

Thank you, I'm going to try when I get some free time from work. Will let you know the results after the test.

But honestly, I'm waiting for this PR to get merged and try this option.

@FurkanGozukara

@Shondoit can we make stable diffusion to support transparent images?

It is extremely problematic to use img2img and some other stuff if there are semi transparent pixels

@Shondoit
Contributor Author

@FurkanGozukara

Can we make stable diffusion to support transparent images?
It is extremely problematic to use img2img and some other stuff if there are semi transparent pixels

Not entirely sure what you're trying to ask here.
Currently, when an alpha image is loaded for img2img, the transparency is removed and filled with a default color. (Settings > Stable Diffusion > With img2img, fill image's transparent parts with this color)

What do you mean exactly when you say "it's problematic"?

@FurkanGozukara

@FurkanGozukara

Can we make stable diffusion to support transparent images?
It is extremely problematic to use img2img and some other stuff if there are semi transparent pixels

Not entirely sure what you're trying to ask here. Currently, when an alpha image is loaded for img2img, the transparency is removed and filled with a default color. (Settings > Stable Diffusion > With img2img, fill image's transparent parts with this color)

What do you mean exactly when you say "it's problematic"?

Let me show an example.
The original image, which has semi-transparent pixels and a background:

[image]

Once upscaled:

[image]

@Shondoit
Contributor Author

Shondoit commented Feb 1, 2023

Yeah, SD has no concept of transparency. So if you're importing an image into the model the transparency will be lost. Currently no way around it.

@AUTOMATIC1111 AUTOMATIC1111 merged commit e452fac into AUTOMATIC1111:master Feb 19, 2023
AUTOMATIC1111 added a commit that referenced this pull request Feb 19, 2023
@FurkanGozukara

@Shondoit Is the first message up to date?
I would like to prepare a tutorial for this.

Thanks so much.

@Shondoit
Contributor Author

@FurkanGozukara The PR has been added as-is, with a minor fix, so it should work the same. If you need any more details or explanation DM me on Discord.

@Vynavo

Vynavo commented Feb 20, 2023

@Shondoit Hello , a little bit embarrassed to ask this, but how do I put my weight map into an alpha channel and get Photoshop to actually save this?

I was able to create an alpha channel and drew my weight map, but everytime I save the png it seems like the alpha channel just disappears. So when I open the png in photoshop again, there is no alpha channel anymore and I'm not sure if this is how it's supposed to be or if it's actually not saving the alpha channel.

I would really appreciate your help, since I've been creating different TIs for some time now and while my style TIs worked out quite well, I'm trying to create one for specific clothing right now and I think the alpha channel as loss weight option, could improve the TIs learning.

@llMUFASAll

llMUFASAll commented Feb 24, 2023

Can you expand on how step 3 works? And is there a link to your discord?
Also the "use png alpha as loss weight" option is not there.

@zrichz

zrichz commented Feb 26, 2023

@Shondoit Hello , a little bit embarrassed to ask this, but how do I put my weight map into an alpha channel and get Photoshop to actually save this?

I was able to create an alpha channel and drew my weight map, but everytime I save the png it seems like the alpha channel just disappears. So when I open the png in photoshop again, there is no alpha channel anymore and I'm not sure if this is how it's supposed to be or if it's actually not saving the alpha channel.

I would really appreciate your help, since I've been creating different TIs for some time now and while my style TIs worked out quite well, I'm trying to create one for specific clothing right now and I think the alpha channel as loss weight option, could improve the TIs learning.

Just Google "export png with alpha Photoshop" there are plenty of tutorials and examples

@Vynavo

Vynavo commented Feb 26, 2023

@Shondoit Hello , a little bit embarrassed to ask this, but how do I put my weight map into an alpha channel and get Photoshop to actually save this?
I was able to create an alpha channel and drew my weight map, but everytime I save the png it seems like the alpha channel just disappears. So when I open the png in photoshop again, there is no alpha channel anymore and I'm not sure if this is how it's supposed to be or if it's actually not saving the alpha channel.
I would really appreciate your help, since I've been creating different TIs for some time now and while my style TIs worked out quite well, I'm trying to create one for specific clothing right now and I think the alpha channel as loss weight option, could improve the TIs learning.

Just Google "export png with alpha Photoshop" there are plenty of tutorials and examples

I've since spoken to Shondoit, and since step 3 is a little bit open to interpretation, I'd like to clarify for everyone who might have some problems with the last step.

In step 3 you are supposed to put your "map" into the alpha channel of your PNG. It is not important to have the alpha channel saved as a channel that you can edit again after reopening the picture in Photoshop/GIMP, but it is important to save the transparency. The transparency is the key element here, so your "map" should not remove complete parts of your image if they contain any valuable information. So when drawing or generating the map it makes sense not to use pure black but values between white and gray.

The feature "Use PNG alpha channel as loss weight" removes the transparency during the learning process, but uses its information afterwards for the loss.

@FurkanGozukara

@Shondoit Hello , a little bit embarrassed to ask this, but how do I put my weight map into an alpha channel and get Photoshop to actually save this?
I was able to create an alpha channel and drew my weight map, but everytime I save the png it seems like the alpha channel just disappears. So when I open the png in photoshop again, there is no alpha channel anymore and I'm not sure if this is how it's supposed to be or if it's actually not saving the alpha channel.
I would really appreciate your help, since I've been creating different TIs for some time now and while my style TIs worked out quite well, I'm trying to create one for specific clothing right now and I think the alpha channel as loss weight option, could improve the TIs learning.

Just Google "export png with alpha Photoshop" there are plenty of tutorials and examples

I've since spoken to Shondoit, and since step 3 is a little bit open to interpretation, I'd like to clarify for everyone who might have some problems with the last step.

In step 3 you are supposed to put your "map" into the alpha channel of your PNG. It is not important to have the alpha channel saved as a channel that you can edit again after reopening the picture in Photoshop/GIMP, but it is important to save the transparency. The transparency is the key element here, so your "map" should not remove complete parts of your image if they contain any valuable information. So when drawing or generating the map it makes sense not to use pure black but values between white and gray.

The feature "Use PNG alpha channel as loss weight" removes the transparency during the learning process, but uses its information afterwards for the loss.

Can you show what images you prepared and what you gave to the web UI? Screenshots?

Also, did you get better results?

@Totoro-Li

Totoro-Li commented Feb 28, 2023

@Shondoit Hello , a little bit embarrassed to ask this, but how do I put my weight map into an alpha channel and get Photoshop to actually save this?

I was able to create an alpha channel and drew my weight map, but everytime I save the png it seems like the alpha channel just disappears. So when I open the png in photoshop again, there is no alpha channel anymore and I'm not sure if this is how it's supposed to be or if it's actually not saving the alpha channel.

I would really appreciate your help, since I've been creating different TIs for some time now and while my style TIs worked out quite well, I'm trying to create one for specific clothing right now and I think the alpha channel as loss weight option, could improve the TIs learning.

You may as well use Pillow for a Pythonic way; here's an example:

import os
from PIL import Image

src_dir="train_data/orig"
mask_dir="train_data/mask"
dest_dir="train_data/train"

# Create dest_dir if it does not exist
if not os.path.exists(dest_dir):
    os.makedirs(dest_dir)
for file in os.listdir(src_dir):
    if file.endswith(".png"):
        if os.path.exists(os.path.join(mask_dir, file.replace(".png", "-0000.png"))):
            img = Image.open(os.path.join(src_dir, file))
            mask = Image.open(os.path.join(mask_dir, file.replace(".png", "-0000.png")))
            # Use the mask as the alpha channel of the image:
            # make sure the image is RGBA and the mask is 8-bit greyscale ("L")
            if img.mode != "RGBA":
                img = img.convert("RGBA")
            if mask.mode != "L":
                # Scale the values of the mask image to fit within 0-255 range
                max_value = max(mask.getextrema())
                scale_factor = 255 / max_value
                mask = mask.point(lambda x: x * scale_factor)
                # Convert the (e.g. 16-bit) mask down to 8-bit greyscale
                mask = mask.convert("L")

            # Use mask as alpha channel of img
            img.putalpha(mask)
            img.save(os.path.join(dest_dir, file))
            # save mask
            # mask.save(os.path.join(dest_dir, file.replace(".png", "-0000.png")))

Just remember to scale the alpha pixels to 8-bit before converting the image mode, or values over 255 may get cut off. As for the filename, the -0000 suffix is just the output format I chose for my masking procedure.

@polsetes

polsetes commented Mar 7, 2023

Forgive me for my ignorance but...

Is a PNG with alpha channel the same as a PNG with transparency?

I've tried different save and export methods in Photoshop, and when reopening or previewing externally, all of them have lost the masked information. In other words, everything that the "original alpha channel" masks is empty in the exported RGB channels. Is there a way to keep the RGB channels intact and add the alpha channel to the PNG? The only thing I have found while searching is that it should be saved in 32 bits (8 bits per channel RGBA), but Photoshop only allows me 24 bits with transparency.

If this is the case, when training a textual inversion, would it still be necessary to describe the content of what the transparency hides?

@Shondoit
Contributor Author

Shondoit commented Mar 8, 2023

@polsetes Be aware that Layer Mask and Alpha Channel are different things. I'll be honest, I use GIMP for photo editing, so I'm not familiar with Photoshop.

@baobabKoodaa

I'm not familiar with how PNG is implemented. How can I verify if the transparency in my PNGs is in the correct format expected by A1111 webui?

@ghost

ghost commented Jun 24, 2023

I followed these steps.

  1. Preprocess images normally.
  2. Manually draw a map
  3. Manually in Photoshop or GIMP

Here's my PNG, is this correct or incorrect?

[image: 01]

I tried training, but my losses are the same as without enabling "PNG alpha channel as loss weight".

@ksai2324

Hello,

Thanks for the good work! I'm trying to apply the same approach to fine-tuning an SD model with LoRA.
I'm stuck at resizing the weight map to match the latent sample. Could you please explain your thinking process behind that?
Following your approach, I'm getting a shape error, because the latent sample is of size (4, 64, 64) and my mask map is (1, 512, 512), meaning they have a different number of elements.

Thanks in advance!
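
For anyone hitting the same shape mismatch: one way to bring a pixel-space weight map down to the latent grid (a sketch, assuming the usual 8x VAE downscaling, so a 512x512 mask becomes 64x64) is to interpolate it and let it broadcast over the latent channels:

import torch.nn.functional as F

def resize_weight_to_latent(weight, latent):
    # weight: (1, H, W) in pixel space; latent: (B, 4, H/8, W/8)
    weight = weight.unsqueeze(0)  # (1, 1, H, W), since interpolate expects a batch dimension
    weight = F.interpolate(weight, size=latent.shape[-2:], mode="bilinear", align_corners=False)
    return weight  # (1, 1, h, w): broadcasts over the 4 latent channels when multiplied with the loss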
