ELI5 Training

Zuxier edited this page Jan 21, 2023 · 9 revisions

Preamble

These settings are a starting point and are probably not optimal. For better results, read the rest of the wiki and do your own testing to get a better understanding of how to create datasets and configure the settings.
The settings on this page are for ≥ 12GB of VRAM. See the Low VRAM page if you have less.

0. Installation

Installation page.

1. Create a dataset

This is the most important step - no settings can compensate for a bad dataset. Place all of your images in one directory.

Source Images

What makes good training images?

  • High resolution - images downscaled to 512px squares (or similar, see Bucketing page).
  • Unobstructed view of the training subject
  • Simple composition
  • Variety - If you are training a person, more background/lighting/clothing/facial expression/pose variations are better

What makes bad training images?

  • Low resolution - Do not upscale training images or use images with visible compression.
  • Bad cropping - if your inputs are close-ups, the model outputs will be close-ups
  • Multiple subjects
  • Duplicates or high similarity
  • Images where the subject is not the main focus

DreamBooth will "learn" the whole image (objects, colors, sharpness, background, etc.) and associate it with your keyword. Your goal is to maximize the overlap of your subject across the images and minimize the overlap of everything else.
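One concrete way to enforce the "no low resolution" rule is to scan your dataset directory and flag images whose shorter side is below 512px before you train on them. This is a minimal stdlib-only sketch (the helper names are hypothetical, not part of the extension); it reads the width/height straight from each PNG header:

```python
import struct
from pathlib import Path

def png_size(path):
    """Read width/height from a PNG's IHDR chunk (bytes 16-23 of the file)."""
    with open(path, "rb") as f:
        header = f.read(24)
    if header[:8] != b"\x89PNG\r\n\x1a\n":
        raise ValueError(f"{path} is not a PNG")
    width, height = struct.unpack(">II", header[16:24])
    return width, height

def flag_small_images(dataset_dir, min_side=512):
    """Return (name, w, h) for PNGs whose shorter side is below min_side.

    These are candidates to remove - upscaling them would bake
    compression/blur artifacts into the model.
    """
    flagged = []
    for p in sorted(Path(dataset_dir).glob("*.png")):
        w, h = png_size(p)
        if min(w, h) < min_side:
            flagged.append((p.name, w, h))
    return flagged
```

Anything this flags should be replaced rather than upscaled. For JPEGs you would need a library like Pillow instead of reading the header by hand.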

Preprocess

⚠️ This is the simplest way of creating training sets. New commits have added bucketing, which allows you to use any resolution of input images. The process outlined here is just one option.

Caption each photo in a text file that matches the photo name (e.g. dog1.png & dog1.txt). We recommend the following process:

  1. Open the Train tab of the A1111 webui, then go to the Preprocess images section.
  2. Set your source and destination directories (don't use the same path for both).
  3. Pick your settings:
    1. Create flipped copies - unchecked.
    2. Split oversized images - check to split long photos into multiple squares.
    3. Auto focal point crop - check to focus crops around faces.
    4. Use BLIP for caption - check to auto-caption real pics, produces captions like man standing next to chair, red chair.
    5. Use deepbooru for caption - check to auto-caption anime pics, produces captions like 1girl,solo,looking at viewer,hairclip. (may be useful for real pics too).
  4. Hit Preprocess.
  5. Go to your destination directory and review the results:
    1. Remove or redo bad crops.
    2. Tweak bad captions. Caption words will also be trained, like your keyword, so avoid heavy repetition. See Captioning page for more details.
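The pairing convention from step 1 (dog1.png & dog1.txt) is easy to break when you delete bad crops, so it is worth verifying before training. A small sketch (hypothetical helper, not part of the extension) that lists images missing a same-named caption file:

```python
from pathlib import Path

def missing_captions(dataset_dir, exts=(".png", ".jpg", ".jpeg")):
    """List image files that lack a matching .txt caption.

    e.g. dog1.png is paired with dog1.txt in the same directory.
    """
    missing = []
    for img in sorted(Path(dataset_dir).iterdir()):
        if img.suffix.lower() in exts and not img.with_suffix(".txt").exists():
            missing.append(img.name)
    return missing
```

Run it on your destination directory after reviewing crops; an empty list means every image has a caption.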

2. Create a model

Create tab

Set Name to your model's name (this is separate from the keyword you are training).
Select Source Checkpoint. This is the model your custom model is branching from.
Hit Create Model. This will add a new model to the training model list on the right and select it.
Switching between models in the training model list will not load their settings. To load settings, select a model and then hit Load Settings.

3. Settings

Settings tab

Hit Performance Wizard.

General

Uncheck Generate Classification Images Using txt2img

Intervals

Set Training Steps Per Image (Epochs) to the result of 10,000/(# dataset pics). This is how many additional epochs will run when you hit train.
Set Amount of time to pause between Epochs to 0.
Set Save Preview(s) Frequency (Epochs) to 0.
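The 10,000/(# dataset pics) rule of thumb above is just arithmetic: the wiki is budgeting roughly 10,000 total training steps and splitting them evenly across your images. A one-line sketch (the function name is mine, not the extension's):

```python
import math

def suggested_epochs(num_images, step_budget=10_000):
    """Epochs per the wiki's rule of thumb: ~10,000 steps total,
    so each image is repeated step_budget / num_images times."""
    return math.ceil(step_budget / num_images)
```

So a 20-image dataset gets 500 epochs, while a 100-image dataset only needs 100. The larger your dataset, the fewer repeats of each image are required.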

Learning Rate

Set Learning Rate to 1e-6.

Advanced

Check Use EMA.
Set Step Ratio of Text Encoder Training to 0.20 if you are training a style, or 0.65 if you are training a subject.

Concepts tab

DO NOT hit training wizard on this tab.

Directories

Set Dataset Directory to your dataset path.

Filewords

Set Instance Token. This will be the keyword you use in prompts after completing your model. This should be a unique word or phrase. Here are some examples:
  • alexa - bad. Your source model already has many associations with alexa.
  • sks123 - bad. This is actually interpreted as sks 1 2 3, so you may as well pick something like alexa.
  • timwalz - okay. Combining first and last names gives a more unique string than a first name alone and is still easy to remember.
  • ohwx - great. Taken from the bottom of this list. Totally unique, but hard to remember. But if you don't use it, you will regret it.

Set Class Token. This is the initializer for your token. If you are training on pictures of your dog, you would put dog.

Prompts

Set Instance Prompt to [filewords].
Set Class Prompt to [filewords].

Image Generation

Set Class Images Per Instance Image to 5.

If you are training multiple concepts, repeat these steps on the other Concept tabs with different tokens.

Saving tab

Check all the boxes in the Checkpoints section. Uncheck all the boxes in the Diffusion Weights section.

4. Train

Training is VRAM intensive. If you have used the WebUI for anything besides Dreambooth, it is recommended that you restart the A1111 project before training (just the backend, you don't need to refresh the webpage).

Hit the orange Train button, the red is for nukes!

Training will stop when it reaches the # of epochs specified in Training Steps Per Image (Epochs). You can also hit Cancel to stop it early. At the time of writing, the UI may freeze, but the training process will continue to run. You can check the true progress in the console running the A1111 project. If there is an error that stops training, consult the Troubleshooting section of the wiki.

5. Test results

After training completes, refresh your checkpoint list. You should see your new checkpoint(s) named with the pattern {modelname}_{step #}. If you used Save Frequency, you will see multiple checkpoints with different step counts. More steps = more training.
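If Save Frequency left you with several checkpoints, it can help to order them by step count rather than alphabetically (where e.g. dog_500 would sort after dog_1500). A small sketch, assuming the {modelname}_{step #} naming pattern described above (the extension itself provides no such helper):

```python
import re

def sort_by_steps(filenames, model_name):
    """Sort checkpoint filenames like 'dog_1500.ckpt' by their numeric step count."""
    pattern = re.compile(rf"^{re.escape(model_name)}_(\d+)")
    matched = []
    for name in filenames:
        m = pattern.match(name)
        if m:
            matched.append((int(m.group(1)), name))
    return [name for _, name in sorted(matched)]
```

The last entry is your most-trained checkpoint; earlier entries are the less-trained fallbacks to try if the final one is overtrained.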

Select your model from the checkpoint list, create a prompt with your keyword (e.g. "a photo of timwalz"), and generate several images. Repeat this process for a few prompts.

If the output does not resemble your training subject, you may need to train for longer (select your dreambooth model, load settings, set Training Steps Per Image (Epochs) to 10 or 20, train, check results, and repeat until you are happy).

If the output has weird textures or everything in the images starts to look like your training subject, you trained too much. Check one of your checkpoints with fewer steps. You can also check the rest of this wiki for more ideas on how to improve your settings.

6. Other info

Getting a great model may take multiple attempts. For the best results, read the rest of the wiki which will tell you how to improve your dataset, finetune your settings, and troubleshoot.
You can generate a checkpoint anytime with the Generate Checkpoint button. You can also use your custom checkpoints as Source Checkpoints if you want to resume training from a previous step count.