
Model Training ‐ Comparison ‐ Introduction


So, I've spent countless hours training models and generating images. However, unlike others who have embarked on this journey, I won't just share the final conclusions with you. I'll also demonstrate all the intermediate steps and provide my subjective insights. But the ultimate conclusion will be up to you!

During the comparison process, we will:

  • understand the meaning of each of the discussed parameters and how changing them affects the results, the training time, and the system resource consumption;

  • see how the base model performs and try to improve its quality by identifying the most optimal and versatile parameter values.

By the base model I mean a model trained with an empirically determined and partially Googled set of parameters. At the time of starting this guide I didn't understand the meaning of some of them. However, the results consistently turned out to be good with this approach.

You can view these settings here, but you are unlikely to need them, since we will optimize them further.

All the models trained in this guide have the same Seed to ensure that the models can be reproduced and the results are less affected by randomness. However, there was one nuance.

I trained two models using these base settings. You might wonder why two. Initially, I trained one model. However, the training process for all the subsequent models with modified parameters extended over quite some time. At one point, their characteristics started to deviate significantly from the base model. I trained a second model with the same settings, and my suspicions were confirmed: it produced slightly different results and had slightly different characteristics. This is partly due to the nature of adaptive optimizers, which can yield different results even with the same seed. It just seemed strange to me that this divergence occurred so suddenly: for several days the base model remained consistent with the newer ones, and then it started to deviate from them significantly. In any case, to somewhat mitigate the difference between the two models with identical parameters, we will look at both of them side by side on most of the graphs and grids in the following comparisons.


Considering the complexity of the topic and the interrelatedness of almost all the terms and parameters, let's start by discussing the key ones.

We will train the standard LoRA model (also known as LoRA for Linear Layers). However, there are other types of these models developed by the LyCORIS project. Some of them are more suitable for training a specific style rather than a character, while others solve more specialized tasks. In my experience and in the guides I've read, these alternative models didn't outperform the standard ones in terms of quality. Given that they also require separate configuration, we will skip them to simplify the comparison.

The next important parameter that requires its own configuration and directly impacts the quality of the result is the optimizer. The optimizer is responsible for finding the best values for the parameters of the trainable model. In simple terms, optimizers differ in the algorithms they use to search for these best values. There are two major groups of optimizers: adaptive and non-adaptive. The former dynamically adjust the Learning Rate (LR) based on various parameters, while the latter use a constant LR. LR, in simple terms, controls the model's learning strength. Drawing a parallel with real life: with a low LR you can spend the whole semester preparing for an exam, studying one topic at a time every day, while with a high LR you can try to cram all the material in one night. In theory, the result should be the same. So, to avoid explicitly setting this critically important parameter manually, we will use the adaptive optimizer DAdaptAdam. It has performed well in my experiments and is generally recommended.
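Out of curiosity, this is roughly what constructing such an optimizer looks like in plain PyTorch, assuming the dadaptation package that implements DAdaptAdam; the Linear layer is just a placeholder for the LoRA's trainable weights, not the actual training setup:

```python
import torch
import dadaptation

# Stand-in for the LoRA's trainable linear layers; purely illustrative.
model = torch.nn.Linear(768, 768)

# With adaptive optimizers the configured "learning rate" acts as a multiplier
# on the internally estimated step size, so it is conventionally left at 1.0.
optimizer = dadaptation.DAdaptAdam(
    model.parameters(),
    lr=1.0,
    d0=1e-6,  # initial estimate of the dynamic LR (DLR), discussed below
)
```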


We will start by comparing the key parameters that either directly influence the result or raise questions. In another section I will separately highlight some important parameters that don't require comparison. You are unlikely to need to adjust the remaining unexamined parameters, but you can read about all of them in the Kohya SS documentation.

ABOUT COMPARISON

In every comparison I will take N values of one parameter. The base model's value will be marked as B. If the parameter has a default value and it is involved in the comparison, it will be marked as D. Parameters other than the one under consideration won't be changed and will keep their values from the base settings.

We will conduct the comparison using graphs from Tensorboard as well as grids of images at a resolution of 768x768 from Stable Diffusion itself.

GRAPHS

DLR(step)

As it turns out, despite there being no need to set the LR manually, adaptive optimizers still require monitoring the dynamic LR (DLR). The initial value of the DLR is set by the optimizer parameter d0, which defaults to 1e-6, or 0.000001, meaning that training in all cases starts from this DLR value.

A step corresponds to one pass through Batch Size training images.

So, this graph will demonstrate how the learning strength changes over the course of the training.
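Where does this curve come from? As far as I can tell, the plotted DLR is the product of the optimizer's internal estimate d (which starts at d0) and the configured LR. Here is a minimal sketch of logging it to Tensorboard; the "d" key and the lr/d*lr tag reflect my reading of the dadaptation package and Kohya's logging, so treat them as assumptions:

```python
from torch.utils.tensorboard import SummaryWriter

writer = SummaryWriter("logs/dlr_example")  # hypothetical log directory

def log_dlr(optimizer, step):
    group = optimizer.param_groups[0]
    dlr = group["d"] * group["lr"]           # starts at d0, i.e. 1e-6 with the defaults above
    writer.add_scalar("lr/d*lr", dlr, step)  # produces the DLR(step) curve
```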

Loss(epoch)

Loss is the measure of inaccuracy in reproducing the training images.

An epoch corresponds to one pass through all the training images, Repeats times.

So, this graph will demonstrate how the character similarity changes over the course of the training. However, this graph doesn't make sense without context, since a change in loss doesn't necessarily have a direct positive or negative impact on the result. So, we will only take it into account when drawing the final comparison conclusions, looking at all the graphs and grids together.
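To tie steps and epochs together, here is the back-of-the-envelope arithmetic implied by the two definitions above; the dataset numbers are made up purely for illustration:

```python
num_images = 30   # images in the training dataset (hypothetical)
repeats    = 10   # Repeats
batch_size = 2    # Batch Size
epochs     = 10   # total epochs (hypothetical)

# One epoch passes over every image Repeats times, Batch Size images per step.
steps_per_epoch = num_images * repeats // batch_size  # 150
total_steps     = steps_per_epoch * epochs            # 1500

print(f"{steps_per_epoch=} {total_steps=}")
```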

GRIDS

For the grids, we will generate images using Dreamshaper 8 (DS) and Epic Realism - Pure Evolution V1 (ER). DS covers graphic (stylized) images, while ER covers photorealistic images.

You can save the model every N epochs or steps during training. I will use this feature to save the model every 10% of the training. This will allow us to see how fast and how well the model learns.
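As a trivial sketch of what "every 10%" means in terms of epochs (the epoch count is hypothetical and just reuses the example above):

```python
total_epochs = 10                                # hypothetical
save_every_n_epochs = max(1, total_epochs // 10)
checkpoints = list(range(save_every_n_epochs, total_epochs + 1, save_every_n_epochs))
print(checkpoints)  # [1, 2, ..., 10] - the ten intermediate models shown in the grids
```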

So, on both DS and ER checkpoints we will generate these 3 grids most of the time:

  1. fixed seed, simple prompt - 1 image for models saved every 10% of the training;
  2. fixed seed, complex prompt - 1 image for models saved every 10% of the training;
  3. random seed, random prompt - 9 images for fully trained models.

Every grid will also have the base model's results highlighted with a red frame.

The Fixed Seed implies that the base images in these comparisons will not change. This allows us to analyze how they react to changes in different training parameters throughout the whole guide.

Images with a Random Seed are necessary for a fairer comparison because it's possible that due to various circumstances, high-quality results with a fixed seed may only be achieved with the base settings. Therefore, we eliminate potential bias.

The Simple Prompt is just (masterpiece:1.3), portrait, closeup. Mostly, it will lead to a straightforward reproduction of the character. The simple prompt is the same for both checkpoints.

The Complex Prompt is (masterpiece:1.3), portrait, closeup, lavander field, sunrise, pink shirt for ER and (masterpiece:1.3), portrait, closeup, green mountains background, japan street, black hoodie for DS. It will help us analyze how the model reproduces the character in a context that doesn't exist in the training dataset.

Random Prompts will help us to analyze how the model behaves in a completely random context. Additionally, for Random Prompts, I won't specify the prompts themselves, as we are primarily interested in the quality of the results relative to the base model's results.

Images with fixed seeds can be downloaded here. You can import them into the AUTOMATIC1111 PNG Info tab to view all the settings and reproduce the results yourself.


You can download both base models here. Due to cloud storage limits, I will upload only the fully trained models.


Next

Short Way - Model Training ‐ Comparison - Brief

Long Way - Model Training ‐ Comparison - [Growth Rate]
