Class Explained

ArrowM edited this page Mar 20, 2023 · 3 revisions

When you train with captions, the model associates each input image with every token in its caption. This is undesirable if you are training a specific subject, such as a person.

For example, let's say I am making a model of Arnold Schwarzenegger. I have 3 instance images of him with the following captions:

(1) Arnold Schwarzenegger, man, sunglasses, black coat, muscular, scene from Terminator
(2) Arnold Schwarzenegger, man, suit, red tie, smiling
(3) Arnold Schwarzenegger, muscular, smiling, man, flexing, black and white

If I trained on these images without class images, the output model could probably make good images of Arnold. However, the other words in my captions would also be affected. Prompting the output model with man would produce men who look like Arnold. The more often a word appears across my captions, the more it is affected. Rarer words are more susceptible to training, so a common word like man should take more training to be affected than a rare one like Governator.

This is where class images come in. Let's say I reuse the same captions with the following configuration:

  • Instance Token = Arnold Schwarzenegger
  • Class Token = man
  • Instance Prompt = [filewords]
  • Class Prompt = [filewords]
  • # Class Images = 1

Your captions will be processed into the following class captions:

(1) man, sunglasses, black coat, muscular, scene from Terminator
(2) man, suit, red tie, smiling
(3) muscular, smiling, man, flexing, black and white
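The caption processing above can be pictured with a short sketch. This is only an illustration that reproduces the example captions, not the extension's actual code: the function name class_caption is hypothetical, and it assumes captions are comma-separated tags and that the class caption is simply the instance caption with the Instance Token removed.

```python
def class_caption(instance_caption: str, instance_token: str) -> str:
    """Return the caption with the instance token removed (hypothetical helper)."""
    # Assumption: captions are comma-separated tags.
    tags = [tag.strip() for tag in instance_caption.split(",")]
    # Keep every non-empty tag except the instance token itself.
    kept = [tag for tag in tags if tag and tag.lower() != instance_token.lower()]
    return ", ".join(kept)


instance_captions = [
    "Arnold Schwarzenegger, man, sunglasses, black coat, muscular, scene from Terminator",
    "Arnold Schwarzenegger, man, suit, red tie, smiling",
    "Arnold Schwarzenegger, muscular, smiling, man, flexing, black and white",
]

class_captions = [class_caption(c, "Arnold Schwarzenegger") for c in instance_captions]
for caption in class_captions:
    print(caption)
```

Note that the Class Token (man) stays in each class caption, which is what lets the class images keep pulling that word back towards the source model's idea of a man.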

The extension will generate 1 class image for each instance image. These class image and caption pairs will be fed into dreambooth alongside your instance images. Now the word man will be trained on Arnold and the other 3 (presumably random) men in the class images. This should help retain the source model's concept of a man.

If you have a small dataset or do a large amount of training, 1 class image per instance image may be insufficient. In the Arnold example, man would start looking like Arnold or the men in the class images. If all the men in the class images are bald, man would be biased towards bald men. By raising # Class Images, we can include more pictures of randomly generated men, which will capture more of the source model's concept of a man and reduce the amount of bias in the trained model.

Lastly, a couple of important notes on how class images are implemented. During an epoch, each instance image is fed into dreambooth together with 1 of its associated class images (there's some added complexity when using multiple image buckets, but you can ignore this). Increasing the number of class images will not increase the ratio of class to instance images that are fed into dreambooth. The weight of class images is controlled by the Prior Weight Loss field in the Advanced section of the Settings tab.

  • Prior Weight Loss = 1.0 (probably too high, stuff will mostly look like your class images)
  • Prior Weight Loss = 0.01 (too low, class images will be ignored)
  • Prior Weight Loss = 0.15 - 0.4 (this is typically the range I stay in)
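To make the two notes above concrete, here is a rough sketch, not the extension's actual training loop. It assumes one class image is picked at random per instance image each epoch, and that the class (prior) loss is scaled by the prior weight and added to the instance loss, as in the usual Dreambooth prior-preservation setup. The names pairs_for_epoch and combined_loss, the file names, and the loss values are all made up for illustration.

```python
import random


def pairs_for_epoch(class_images_by_instance, rng):
    """Pick one class image per instance image for a single epoch (assumed behaviour)."""
    return [
        (instance, rng.choice(candidates))
        for instance, candidates in class_images_by_instance.items()
    ]


def combined_loss(instance_loss, prior_loss, prior_weight):
    """Instance loss plus the class (prior) loss scaled by the prior weight."""
    return instance_loss + prior_weight * prior_loss


rng = random.Random(0)

# Hypothetical file names: 3 instance images with 1 or 5 class images each.
one_each = {f"arnold_{i}.png": [f"man_{i}_0.png"] for i in range(3)}
five_each = {f"arnold_{i}.png": [f"man_{i}_{j}.png" for j in range(5)] for i in range(3)}

# The number of pairs per epoch is the same either way: one per instance image.
print(len(pairs_for_epoch(one_each, rng)), len(pairs_for_epoch(five_each, rng)))

# Made-up loss values, only to show how the weight scales the class contribution.
for weight in (0.01, 0.25, 1.0):
    print(weight, round(combined_loss(0.10, 0.20, weight), 4))
```

More class images give each epoch a wider pool to draw from, while the weight alone decides how strongly the class images pull on the result.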