
Weighted learning of TIs and HNs #6700

Merged
merged 5 commits on Feb 19, 2023

Conversation

Shondoit
Contributor

@Shondoit Shondoit commented Jan 13, 2023

What this pull request is trying to achieve.

This will add the ability to train Embeddings and Hypernetworks using a weighted loss.
It will add the option to load PNG alpha channels as weight masks that will be used during loss calculation, in effect focusing the attention towards the masked subject.

During my training I've found that focusing the attention in this manner results in much faster learning of subjects, as well as learning more detailed features of the subjects.

Example of what is being emphasized during learning (Regular is for demonstration only):

  1. Regular - 2. Depth - 3. CLIPSeg

Note that this does not interfere with Dreambooth or regular image generation.
The weighted loss is only applied during the loss calculation for TIs and HNs.

In the future this could be extended to load side-by-side greyscale files, for example depth maps.
I.e. if you have image.png and image.weight.png in the same folder, it will load one as the weight for the other.

Additional notes and description of changes

I've added an extra hijack that provides a weighted_forward function alongside the regular forward function.
This function does a regular forward pass, but before doing so it substitutes the get_loss function and attaches the weight tensors to the model so the weighted loss can use them.
Before calculating the mean, it multiplies the individual latent losses with the weight mask, resulting in more or less loss per pixel depending on its weight. Afterwards it returns the mean to the training function.
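
For illustration, the core of the weighted loss looks roughly like this (a minimal sketch, not the exact code of this PR; weight here is assumed to be the alpha mask already downscaled to the latent resolution):

import torch.nn.functional as F

def weighted_loss(model_output, target, weight):
    # Per-element loss, no reduction yet
    loss = F.mse_loss(model_output, target, reduction="none")
    # Scale each latent "pixel" by its weight before averaging,
    # so masked regions contribute more to the gradient
    return (loss * weight).mean()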

Environment this was tested in

  • OS: Windows
  • Browser: Firefox
  • Graphics card: NVIDIA RTX 2080 8GB

How to use this

  1. Preprocess images normally.
  2. Choose one of the following:
    a. Generate a depth map using MiDaS
    b. Generate an attention map using CLIPSeg
    c. Manually draw a map
    d. Create a generic gradient with falloff to the image edges.
  3. Add the map from step 2 as an alpha channel to the image from step 1.
    a. Using a simple Python script and Pillow (see the sketch at the end of this section)
    b. Manually in Photoshop or GIMP
    Note: be sure that the alpha channel is a full 8-bit channel, not single-color transparency.

In the future I would like to release a script that would do these steps automatically.
(Perhaps even add MiDaS generation to the preprocess step)
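
In the meantime, as a minimal sketch of step 3a (the filenames here are hypothetical), a greyscale map can be attached as the alpha channel with Pillow:

from PIL import Image

img = Image.open("image.png").convert("RGB")
weight = Image.open("weight_map.png").convert("L")  # 8-bit greyscale map, same size as the image
img.putalpha(weight)  # store the map in the alpha channel (image becomes RGBA)
img.save("image_weighted.png")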

Screenshots

[screenshot: weighted-learning-ui]

Results

I haven't found time for extensive training runs and comparison images yet, but I have received positive feedback from multiple people on Discord.

Timing:
No significant impact.

Without:
Loading dataset: 4.70it/s
Training: 2.95it/s

With:
Loading dataset: 4.70it/s
Training: 2.94it/s

@Shondoit Shondoit changed the title from "Weighted learning" to "Weighted learning of TIs and HNs" on Jan 13, 2023
@mykeehu
Contributor

mykeehu commented Jan 14, 2023

This is similar to the Attention Map option of the DreamArtist extension, though that one is only for embeddings, not for HNs. Did you look at that?

@Shondoit
Contributor Author

This is similar to the Attention Map option of the DreamArtist extension, though that one is only for embeddings, not for HNs. Did you look at that?

I have not, but I will check it out. Thanks for the reference.

@FurkanGozukara

With the default settings, what learning rate do you suggest for teaching a face? @Shondoit

@Shondoit
Contributor Author

Shondoit commented Jan 16, 2023

Currently there's no consensus. I've found the default (0.005) to be good enough.

@mykeehu
Contributor

mykeehu commented Jan 16, 2023

With the default settings, what learning rate do you suggest for teaching a face? @Shondoit

This is the latest video on the subject, worth watching:
https://www.youtube.com/watch?v=2ityl_dNRNw

@FurkanGozukara

@mykeehu In that video, how to determine the number of vectors is still not explained. He gives some numbers but no backing for how he came up with them. I really need to understand how vectors are utilized.

Also, he used image descriptions, but during training those prompts are not even used, lol :)

@mykeehu
Contributor

mykeehu commented Jan 16, 2023

In TI training, we use an average of 5 images per vector. You can set up more vectors, in which case the embedding will be stronger. We have talked a lot about the details in this and this thread.
Keywords are extracted from the original image, so you have something to compare against what the system still sees in the image. You can then type the keywords in the prompt afterwards, but whatever is not covered by them is what the training remembers.

@FurkanGozukara

In TI training, we use an average of 5 images per vector. You can set up more vectors, in which case the embedding will be stronger. We have talked a lot about the details in this and this thread. Keywords are extracted from the original image, so you have something to compare against what the system still sees in the image. You can then type the keywords in the prompt afterwards, but whatever is not covered by them is what the training remembers.

It shows the last used prompt. In my case it didn't read the image captions generated by preprocessing; it just used the image name and the model name. Can you verify that?

@Shondoit
Contributor Author

Shondoit commented Jan 16, 2023

@mykeehu

In TI training, we use an average of 5 images per vector. You can set up more vectors, in which case the embedding will be stronger.

This rule of thumb doesn't make sense at all. Whether you have 10 pictures or 1000 pictures, if they're all of the same subject, we want it to learn the same result.
It makes more sense to say one vector per unit of subject complexity. I.e., a complex subject might need an extra vector, or an extra subject might need 2x the vectors.
Keep in mind that "Emma Watson" consists of just 2 vectors and is trained on many thousands of images.

More training steps per image in the dataset does make sense though, because it needs more time to come to a consensus.

@FurkanGozukara

@mykeehu

In TI training, we use an average of 5 images per vector. You can set up more vectors, in which case the embedding will be stronger.

This rule of thumb doesn't make sense at all. Whether you have 10 pictures or 1000 pictures, if they're all of the same subject, we want it to learn the same result. It makes more sense to say one vector per unit of subject complexity. I.e., a complex subject might need an extra vector, or an extra subject might need 2x the vectors. Keep in mind that "Emma Watson" consists of just 2 vectors and is trained on many thousands of images.

More training steps per image in the dataset does make sense though, because it needs more time to come to a consensus.

OK, now it makes more sense. So for a single person we can say 2 at most is sufficient.

@Shondoit Shondoit marked this pull request as ready for review January 17, 2023 16:07
@Shondoit Shondoit force-pushed the weighted-learning branch 4 times, most recently from a75fa06 to 8891d3d on January 20, 2023 07:59
@kidol

kidol commented Jan 21, 2023

One issue I see with this PR: The latent sample shape is always the same (I think?), so why not hardcode it and leave the sampling method logic unchanged?

Other than that, I tested this and it works as described. I haven't used TI or Hypernetworks much recently, so I can't say anything about training speed, but it should be obvious that a weighted loss trains faster.

I've used this lib to generate the masks: https://github.com/1adrianb/face-alignment
It can detect faces and extract the landmarks: landmarks -> binary face hull -> hull expansion -> blur = 4th channel.
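
That pipeline could look roughly like this (a sketch; it assumes the landmark points have already been extracted, e.g. with face-alignment, and uses OpenCV for the hull, expansion and blur, with arbitrary expansion/blur sizes):

import cv2
import numpy as np

def landmarks_to_weight_mask(landmarks, height, width, expand_px=30, blur_px=51):
    # Binary hull covering the detected face landmarks
    mask = np.zeros((height, width), dtype=np.uint8)
    hull = cv2.convexHull(np.asarray(landmarks, dtype=np.int32))
    cv2.fillConvexPoly(mask, hull, 255)
    # Expand the hull a bit, then feather the edge with a blur
    mask = cv2.dilate(mask, np.ones((expand_px, expand_px), np.uint8))
    mask = cv2.GaussianBlur(mask, (blur_px, blur_px), 0)  # blur_px must be odd
    return mask  # use this as the image's 4th (alpha) channel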

I guess if this gets merged, people can create extensions for the different mask generation methods, so no need to include a specific method in this PR.

@Shondoit
Contributor Author

Shondoit commented Jan 21, 2023

One issue I see with this PR: The latent sample shape is always the same (I think?), so why not hardcode it and leave the sampling method logic unchanged?

It's definitely different between 512 and 768 models (64 vs 96).
Also, I'm not entirely sure what happens if you use non-square aspect ratios. But it's good to be future proof; if a 1024 model ever comes out, it won't break.

I guess if this gets merged, people can create extensions for the different mask generation methods, so no need to include a specific method in this PR.

Yeah, that was the idea from the start. Keep this simple and let people do whatever they want. Tooling will either be external or extensions. (Or perhaps one very basic one in the preprocessing tab.)

@Shondoit
Contributor Author

@AUTOMATIC1111 Is there anything I should do or consider before you merge this?

Pre-empting the question "can't this be an extension?": it's 99% based on the normal TI/HN learning code. Only the loss function is tweaked. I think it would be a waste to copy/paste everything into an extension, and I'm not sure it's easy to hijack only one line of WebUI code in the TI/HN training functions. Besides, it has no impact for users that don't use the feature.

@Nacurutu

@Shondoit thanks for this... May I ask, with this we get a real improvement when training faces? Im trying to train faces but I'm not getting the results I expect.

@Shondoit
Contributor Author

@Nacurutu

thanks for this... May I ask, with this we get a real improvement when training faces? Im trying to train faces but I'm not getting the results I expect.

This is just another feature that can be used for training. It's not magically going to fix everything.
That being said, I've found that by setting the focus on the face, it trains less on the background, so you need fewer tricks to coerce the training to focus on the face (i.e., proper captioning, masking out the background, etc.). With this you can mask out the object/person you want to train on and it will focus its training on that.

@Nacurutu

This is just another feature that can be used for training. It's not magically going to fix everything. That being said, I've found that by setting the focus on the face, it trains less on the background, so you need fewer tricks to coerce the training to focus on the face (i.e., proper captioning, masking out the background, etc.). With this you can mask out the object/person you want to train on and it will focus its training on that.

Thank you very much for your answer, I'm going to give it a try. Looks promising.

@FurkanGozukara

@Shondoit thanks for this... May I ask, with this we get a real improvement when training faces? Im trying to train faces but I'm not getting the results I expect.

You can watch my video; I got pretty good results already.

How To Do Stable Diffusion Textual Inversion (TI) / Text Embeddings By Automatic1111 Web UI Tutorial

@Nacurutu

@Shondoit thanks for this... May I ask, with this we get a real improvement when training faces? Im trying to train faces but I'm not getting the results I expect.

You can watch my video; I got pretty good results already.

How To Do Stable Diffusion Textual Inversion (TI) / Text Embeddings By Automatic1111 Web UI Tutorial

I did, great video btw... My training got better but still has issues. I'm still trying different datasets and parameters and making tests, but I only have 6 GB of VRAM (1660 Ti)... I have to spend a lot of time on these tests.. :p

@Nacurutu

Nacurutu commented Jan 24, 2023

It could be due to settings as well

I have shared the dataset in this video, I wonder if you can replicate it: https://youtu.be/dNOpWt-epdQ

Thank you, I'm going to try when I get some free time from work. Will let you know the results after the test.

But honestly, I'm waiting for this PR to get merged and try this option.

@FurkanGozukara

@Shondoit can we make stable diffusion to support transparent images?

It is extremely problematic to use img2img and some other stuff if there are semi transparent pixels

@Shondoit
Contributor Author

@FurkanGozukara

Can we make stable diffusion to support transparent images?
It is extremely problematic to use img2img and some other stuff if there are semi transparent pixels

Not entirely sure what you're trying to ask here.
Currently, when an alpha image is loaded for img2img, the transparency is removed and filled with a default color. (Settings > Stable Diffusion > With img2img, fill image's transparent parts with this color)

What do you mean exactly when you say "it's problematic"?

@FurkanGozukara

@FurkanGozukara

Can we make stable diffusion to support transparent images?
It is extremely problematic to use img2img and some other stuff if there are semi transparent pixels

Not entirely sure what you're trying to ask here. Currently, when an alpha image is loaded for img2img, the transparency is removed and filled with a default color. (Settings > Stable Diffusion > With img2img, fill image's transparent parts with this color)

What do you mean exactly when you say "it's problematic"?

Let me show an example.
The original image, which has semi-transparent pixels and a background:

[image]

Once upscaled:

[image]

@Shondoit
Contributor Author

Shondoit commented Feb 1, 2023

Yeah, SD has no concept of transparency. So if you're importing an image into the model the transparency will be lost. Currently no way around it.

@AUTOMATIC1111 AUTOMATIC1111 merged commit e452fac into AUTOMATIC1111:master Feb 19, 2023
AUTOMATIC1111 added a commit that referenced this pull request Feb 19, 2023
@FurkanGozukara

@Shondoit Is the first message up to date?
I would like to prepare a tutorial for this.

Thanks so much.

@Shondoit
Contributor Author

@FurkanGozukara The PR has been added as-is, with a minor fix, so it should work the same. If you need any more details or explanation DM me on Discord.

@Vynavo

Vynavo commented Feb 20, 2023

@Shondoit Hello , a little bit embarrassed to ask this, but how do I put my weight map into an alpha channel and get Photoshop to actually save this?

I was able to create an alpha channel and drew my weight map, but everytime I save the png it seems like the alpha channel just disappears. So when I open the png in photoshop again, there is no alpha channel anymore and I'm not sure if this is how it's supposed to be or if it's actually not saving the alpha channel.

I would really appreciate your help, since I've been creating different TIs for some time now and while my style TIs worked out quite well, I'm trying to create one for specific clothing right now and I think the alpha channel as loss weight option, could improve the TIs learning.

@llMUFASAll

llMUFASAll commented Feb 24, 2023

Can you expand on how step 3 works? And is there a link to your discord?
Also the "use png alpha as loss weight" option is not there.

@zrichz

zrichz commented Feb 26, 2023

@Shondoit Hello , a little bit embarrassed to ask this, but how do I put my weight map into an alpha channel and get Photoshop to actually save this?

I was able to create an alpha channel and drew my weight map, but everytime I save the png it seems like the alpha channel just disappears. So when I open the png in photoshop again, there is no alpha channel anymore and I'm not sure if this is how it's supposed to be or if it's actually not saving the alpha channel.

I would really appreciate your help, since I've been creating different TIs for some time now and while my style TIs worked out quite well, I'm trying to create one for specific clothing right now and I think the alpha channel as loss weight option, could improve the TIs learning.

Just Google "export png with alpha Photoshop" there are plenty of tutorials and examples

@Vynavo

Vynavo commented Feb 26, 2023

@Shondoit Hello , a little bit embarrassed to ask this, but how do I put my weight map into an alpha channel and get Photoshop to actually save this?
I was able to create an alpha channel and drew my weight map, but everytime I save the png it seems like the alpha channel just disappears. So when I open the png in photoshop again, there is no alpha channel anymore and I'm not sure if this is how it's supposed to be or if it's actually not saving the alpha channel.
I would really appreciate your help, since I've been creating different TIs for some time now and while my style TIs worked out quite well, I'm trying to create one for specific clothing right now and I think the alpha channel as loss weight option, could improve the TIs learning.

Just Google "export png with alpha Photoshop" there are plenty of tutorials and examples

I've since spoken to Shondoit, and since step 3 is a little bit open to interpretation, I'd like to clarify for everyone who might have some problems with the last step.

In step 3 you are supposed to put your "map" into the alpha channel of your PNG. It is not important to have the alpha channel saved as a channel that you can edit again after reopening the picture in Photoshop/GIMP, but it is important to save the transparency. The transparency is the key element here, so your "map" should not remove complete parts of your image if they contain any valuable information. So when drawing or generating the map it makes sense not to use pure black but values between white and gray.

The feature "Use PNG alpha channel as loss weight" removes the transparency during the learning process, but uses its information afterwards for the loss.

@FurkanGozukara

@Shondoit Hello , a little bit embarrassed to ask this, but how do I put my weight map into an alpha channel and get Photoshop to actually save this?
I was able to create an alpha channel and drew my weight map, but everytime I save the png it seems like the alpha channel just disappears. So when I open the png in photoshop again, there is no alpha channel anymore and I'm not sure if this is how it's supposed to be or if it's actually not saving the alpha channel.
I would really appreciate your help, since I've been creating different TIs for some time now and while my style TIs worked out quite well, I'm trying to create one for specific clothing right now and I think the alpha channel as loss weight option, could improve the TIs learning.

Just Google "export png with alpha Photoshop" there are plenty of tutorials and examples

I've since spoken to Shondoit, and since step 3 is a little bit open to interpretation, I'd like to clarify for everyone who might have some problems with the last step.

In step 3 you are supposed to put your "map" into the alpha channel of your PNG. It is not important to have the alpha channel saved as a channel that you can edit again after reopening the picture in Photoshop/GIMP, but it is important to save the transparency. The transparency is the key element here, so your "map" should not remove complete parts of your image if they contain any valuable information. So when drawing or generating the map it makes sense not to use pure black but values between white and gray.

The feature "Use PNG alpha channel as loss weight" removes the transparency during the learning process, but uses its information afterwards for the loss.

Can you show what images you prepared and what you gave to the web UI? Screenshots?

Also, did you get better results?

@Totoro-Li

Totoro-Li commented Feb 28, 2023

@Shondoit Hello , a little bit embarrassed to ask this, but how do I put my weight map into an alpha channel and get Photoshop to actually save this?

I was able to create an alpha channel and drew my weight map, but everytime I save the png it seems like the alpha channel just disappears. So when I open the png in photoshop again, there is no alpha channel anymore and I'm not sure if this is how it's supposed to be or if it's actually not saving the alpha channel.

I would really appreciate your help, since I've been creating different TIs for some time now and while my style TIs worked out quite well, I'm trying to create one for specific clothing right now and I think the alpha channel as loss weight option, could improve the TIs learning.

You may as well use Pillow for a Pythonic way; here's an example:

import os
from PIL import Image

src_dir="train_data/orig"
mask_dir="train_data/mask"
dest_dir="train_data/train"

# Create dest_dir if it does not exist
if not os.path.exists(dest_dir):
    os.makedirs(dest_dir)
for file in os.listdir(src_dir):
    if file.endswith(".png"):
        if os.path.exists(os.path.join(mask_dir, file.replace(".png", "-0000.png"))):
            img = Image.open(os.path.join(src_dir, file))
            mask = Image.open(os.path.join(mask_dir, file.replace(".png", "-0000.png")))
            # Use the mask as the alpha channel of the image:
            # make sure the image is RGBA and the mask is 8-bit greyscale ("L")
            if img.mode != "RGBA":
                img = img.convert("RGBA")
            if mask.mode != "L":
                # Scale the values of the mask image to fit within 0-255 range
                max_value = max(mask.getextrema())
                scale_factor = 255 / max_value
                mask = mask.point(lambda x: x * scale_factor)
                # Convert the (e.g. 16-bit) mask down to 8-bit greyscale
                mask = mask.convert("L")

            # Use mask as alpha channel of img
            img.putalpha(mask)
            img.save(os.path.join(dest_dir, file))
            # save mask
            # mask.save(os.path.join(dest_dir, file.replace(".png", "-0000.png")))

Just remember to scale the alpha pixels to 8-bit before converting the image mode, or values over 255 may get cut off. As for the filename, the -0000 suffix is just the output format I chose for my masking procedure.

@polsetes

polsetes commented Mar 7, 2023

Forgive me for my ignorance but...

Is a PNG with alpha channel the same as a PNG with transparency?

I've tried different save and export methods in Photoshop, and when reopening or previewing externally, all of them have lost the masked information. In other words, everything that the "original alpha channel" masks is empty in the exported RGB channels. Is there a way to keep the RGB channels intact and add the alpha channel to the PNG? The only thing I have found while searching is that it should be saved in 32 bits (8 bits per channel RGBA), but Photoshop only allows me 24 bits with transparency.

If this is the case, when training a textual inversion, would it still be necessary to describe the content of what the transparency hides?

@Shondoit
Contributor Author

Shondoit commented Mar 8, 2023

@polsetes Be aware that Layer Mask and Alpha Channel are different things. I'll be honest, I use GIMP for photo editing, so I'm not familiar with Photoshop.

@baobabKoodaa

I'm not familiar with how PNG is implemented. How can I verify if the transparency in my PNGs is in the correct format expected by A1111 webui?

@ghost

ghost commented Jun 24, 2023

I followed these steps.

  1. Preprocess images normally.
  2. Manually draw a map
  3. Manually in Photoshop or GIMP

Here's my PNG, is this correct or incorrect?

[image: 01]

I tried training, but my losses are the same as without enabling "PNG alpha channel as loss weight".

@ksai2324

Hello,

Thanks for the good work! I'm trying to apply the same approach to fine-tuning an SD model with LoRA.
I'm stuck at resizing the weight map to match the latent sample. Could you please explain your thinking process behind that?
Following your approach, I'm getting a shape error, because the latent sample is of size (4, 64, 64) and my mask map is (1, 512, 512), meaning they have a different number of elements.

Thanks in advance!
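
For anyone hitting the same shape mismatch: one way to bring a pixel-space weight map down to the latent grid (a sketch, assuming the usual 8x VAE downscaling, so a 512x512 mask becomes 64x64) is to interpolate it and let it broadcast over the latent channels:

import torch.nn.functional as F

def resize_weight_to_latent(weight, latent):
    # weight: (1, H, W) in pixel space; latent: (B, 4, H/8, W/8)
    weight = weight.unsqueeze(0)  # (1, 1, H, W), since interpolate expects a batch dimension
    weight = F.interpolate(weight, size=latent.shape[-2:], mode="bilinear", align_corners=False)
    return weight  # (1, 1, h, w): broadcasts over the 4 latent channels when multiplied with the loss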
