
Image Generation


Installation

So, for generating images, we will need a GUI for working with Stable Diffusion. I will use the most popular one - AUTOMATIC1111. You can use any other, but keep in mind that GUIs have both similarities and differences, so some of the methods and features I describe here may not work elsewhere.

The installation and setup of the GUI are beyond the scope of this guide and involve many nuances. Once again, you can use Google Colab or RunPod if your GPU is slow. You can either install a clean GUI following the instructions provided on its page or use a third-party installer. I can recommend the A1111 Web UI Installer, which I personally use. However, note that it is outdated, and its creators suggest using Stability Matrix instead. Nevertheless, it still works and is less overloaded with features than Stability Matrix, so it can still be a good choice.

Also, make sure to read about possible command-line arguments and pay special attention to optimizations. Unfortunately, optimization involves not only launch parameters but also certain system settings. Among all the launch parameters for optimization, I use --opt-sdp-no-mem-attention. Additionally, I had to disable Hardware Acceleration in Windows settings. Make sure to update your GPU drivers as well!
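On Windows, launch arguments usually go into webui-user.bat. With the optimization mentioned above, the relevant line might look like this (a minimal example - add any other arguments you need to the same line):

set COMMANDLINE_ARGS=--opt-sdp-no-mem-attention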


GUI Preparation

Go to the Settings tab, click Show All Pages, and then use Ctrl + F to search for quicksettings. Select two items - sd_vae and CLIP_stop_at_last_layers. Afterward, click Apply Settings at the top and then Reload UI.
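Depending on your version, the Quicksettings list is either a dropdown where you pick items or a comma-separated text field; either way, it should end up containing the following (sd_model_checkpoint is there by default):

sd_model_checkpoint, sd_vae, CLIP_stop_at_last_layers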

After reloading the UI, you will see that two new settings have appeared at the top of the page.

  • Clip Skip has been discussed in the comparison, so simply set it to 2.

  • VAE is responsible for decoding the final image. You need to download vae-ft-mse-840000-ema-pruned and place it in stable-diffusion-webui\models\VAE, then click on the icon to the right of SD VAE to update the list and select the downloaded VAE.

  • Stable Diffusion Checkpoint is the checkpoint that will be used for image generation. We will dig deeper a little later.


Parameters Review

Prompt (also known as Positive Prompt - PP) is a description of what you want to generate. Negative Prompt (NP) is a description of what you DO NOT want to generate. In both prompts, it's better to list words or phrases separated by commas rather than write full sentences as you would in Midjourney. Additionally, you can enclose any word or phrase in parentheses and assign a weight to change its influence on the result. For example, (masterpiece:1.3) or (blue eyes:0.8) in the PP, or (nsfw:1.5) in the NP. We will explore more complex things that can be specified in both prompts a little later.
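For example, a minimal pair of prompts in this style might look like this (the tags are just placeholders illustrating the comma-separated format and weighting):

PP: (masterpiece:1.3), portrait of a woman, (blue eyes:0.8), soft lighting
NP: (nsfw:1.5), lowres, bad anatomy, extra fingers, blurry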


Sampling Method (also known as Sampler) determines how noise is transformed into an image over the Sampling Steps. The sampler directly affects both the result and the generation time. Increasing the number of steps also increases the overall generation time. Furthermore, for some samplers, changing the number of steps may noticeably alter the image itself, while others gradually improve quality without significantly changing the overall scene.

DPM++ 2M Karras is considered the best sampler, but in the recent 1.6.0 update, AUTOMATIC1111 added many new samplers, so the winner may change soon. For comparing samplers, I can recommend a good video and a good article.


CFG Scale determines how strictly the neural network will follow your prompt. The higher this value, the less the neural network will deviate from your prompt. The optimal range is from 3 to 8; going beyond it can produce artifacts and reduce overall quality.


The Seed allows you to reproduce a result. If you specify any non-negative number and generate again without changing other settings, you will get the same image as before. If you specify -1, a random seed will be used under the hood.

The button with the dice icon sets the seed to -1, and the button with the recycle icon takes the seed from the last generated image.


The Width and Height settings determine the resolution of the final image. The maximum resolution at which stable results can be obtained on SD1.5 checkpoints is 768x768. Some checkpoints may produce relatively stable results at 1024x1024, but there will still be more artifacts than usual.

Since obtaining stable results in high resolution right away is not feasible, upscaling is used to increase the resolution. Hires. fix allows us to upscale images during generation, and Upscaler is the model used for upscaling. I also suggest reading upscale.wiki to find more upscalers. Here are some interesting ones I can recommend:

All the upscalers must be placed in stable-diffusion-webui\models\ESRGAN.

Hires steps determines the number of steps for upscaling (0 means the same value as in Sampling steps). Upscale By specifies how much to increase the image size. Denoising Strength controls how much the image will change during upscaling.

If you want the scene and appearance to change minimally, use a low Denoising Strength. If you want more changes in the image, increase it.
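As a rough starting point (my own habit, not a hard rule): values around 0.3-0.4 keep the scene nearly intact, around 0.5 give a noticeable redraw, and 0.7 and above can change the image dramatically.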

You can also upscale any images using the img2img and Extras tabs. While the second option is relatively straightforward, there are many complexities and nuances to consider with img2img. It requires separate discussion. Additionally, I recommend using Gigapixel AI for upscaling. In any case, it's better to upscale images after generation to avoid upscaling bad ones and slowing down the generation process.


You can also generate multiple images at once. Batch Size determines the number of images generated simultaneously, while Batch Count determines the total number of such batches. So, with Batch Size = 9 and Batch Count = 50, you will generate 9 * 50 = 450 images in total.

Prompt Preparation

So, to start with, we'll need to install a few extensions. The first one is Dynamic Prompts. It will help us randomize prompts.

To install the extension, go to Extensions -> Install from URL. Reload UI after the installation.

Ok, prompt randomization. The extension allows us to add dynamic elements to prompts that will change randomly. Essentially, within curly braces, you list options separated by |, and during generation, a random option will be selected. For example, {day|night} city in {summer|winter|autumn|spring}. Additionally, when listing options like this, you can increase the frequency for specific ones. For instance, the prompt {9::day|night} will choose the option day in 9 out of 10 cases. The frequency is calculated relative to the total sum of coefficients. If a coefficient is not specified, it defaults to 1. For example, in the prompt {3::summer|4::winter|5::autumn|spring}, the option summer will have a frequency of 3 / (3 + 4 + 5 + 1).
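So, for that last prompt, the full distribution is: summer - 3/13, winter - 4/13, autumn - 5/13, and spring - 1/13.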

Additionally, to avoid cluttering the prompt with long options, you can place all the options in a separate file, where each option starts on a new line. You need to put the file in stable-diffusion-webui\extensions\sd-dynamic-prompts\wildcards. For example, you can create a file named season.txt with the following content at the specified path.

summer
winter
autumn
spring

As a result, instead of {summer|winter|autumn|spring}, you can use __season__. You can view all your files in the Wildcards Manager tab. This way, you can describe many scenes, store them in a separate file, and then use it in your prompts. ChatGPT handles generating such lists quite well. You can try my scenes to begin with.
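For example, the city prompt from above becomes {day|night} city in __season__.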

Among interesting prompts, I can also mention a list of names of various artists, which you can also include in your prompts.

Also, with this extension, you can enable the so-called Magic Prompt. Its essence is to add random words to COMPLETE your prompt using a model trained to generate prompts. To make it work, you have to leave an incomplete part at the end of your prompt, such as in the, and the model will complete the location or add something else. I recommend trying it as well! Sometimes it can be more convenient than specifying a bunch of your own dynamic prompt parts.


The second extension is Randomizer Keywords. It will help us set generation parameters directly in the prompt. Do you already see how these two extensions work in combination? :) For example, <cfg_scale:{3|4|5|6|7}>. But the most convenient part is that we can specify the checkpoint directly in the prompt and let it choose one of several random options!

This extension is little-known and abandoned, so I forked it and fixed some annoying problems as best I could. While there is a better-supported and better-known extension - Unprompted - I still think the first one is much better. Unprompted is overloaded with features most people will never use, while some of its core features are quite buggy. And its syntax is awful. All of this made it easier to fix the first extension, which is dead simple in a good sense, than to deal with the second spaghetti monster.


It's time to revisit the choice of the checkpoint. As I mentioned earlier in the comparison, LoRA works differently on different checkpoints: sometimes worse, sometimes better. Therefore, I've compiled a list of checkpoints that provide the most stable and high-quality results. I've divided this list into two categories: one for photo and one for graphic. You can view them here.

You should place the checkpoints themselves in the folder stable-diffusion-webui\models\Stable-diffusion. I also highly recommend organizing the checkpoints for photo and graphic into separate subfolders.

Now, we can create wildcard files for them. If you placed the checkpoints in the photo and graphic folders, you can create a folder named checkpoints in stable-diffusion-webui\extensions\sd-dynamic-prompts\wildcards and add two files: photo.txt and graphic.txt. In each of these files, you need to specify the relative paths to the corresponding checkpoints. For example, graphic\28DSTABLEBESTVERSION_28dv5.safetensors or photo\absolutereality_v181.safetensors. Here you can download the corresponding files for the list of checkpoints I mentioned.
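For example, photo.txt might look like this (the first entry is from my list; the second is a placeholder for any other checkpoint you add to the folder):

photo\absolutereality_v181.safetensors
photo\your_other_checkpoint.safetensors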

The lists of links and templates will be periodically updated.

In the end, we can use a random checkpoint like this: <checkpoint:{__checkpoints/photo__|__checkpoints/graphic__}>. Once again, you can assign weights if you want to get photo more often than graphic or vice versa.


To apply a trained model, you need to place it in the stable-diffusion-webui\models\Lora folder and add <lora:*model_filename*:*weight*> to the prompt (for example, <lora:final_28d28:0.9>). The weight determines how strongly the model will influence the result.

I don't recommend going beyond the range from 0.8 to 1.2.

Also, you need to add a Trigger Prompt to the prompt (for example, (blh woman:1.2)).

As you can see, you shouldn't overdo the weight for this prompt either.
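Putting it all together, the model-specific part of a prompt looks like this (using the example filename and Trigger Prompt from above):

<lora:final_28d28:0.9>, (blh woman:1.2), (masterpiece:1.3), portrait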


Another interesting thing you can use in both prompts is so-called Textual Inversion models. In essence, these models accumulate knowledge about what should/shouldn't be generated. For example, you can train a model on images of poorly drawn hands and name it bad-hands. Then, instead of a negative prompt like bad hands, extra fingers..., you can use the name of this model, bad-hands, in your prompt. As a result, you'll get images with bad hands much more rarely.
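For example, with the hypothetical bad-hands model from above, a negative prompt could look like this:

bad-hands, (nsfw:1.5), lowres, blurry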

These models should be placed in the stable-diffusion-webui\embeddings folder. I recommend using at least these:

In the end, with time, you will develop your ideal prompt.

X/Y/Z Plot

This is a very convenient feature that allows you to make various comparisons. For example, it's useful for selecting the best model epoch, as I did in the comparisons.

To start, set all the generation parameters, including the positive and negative prompts; don't forget to specify the LoRA and the Trigger Prompt. Now, let the magic begin. Let's say we have 10 models that we want to compare. It's crucial that they have names that fit a common template, so you'll have to rename the last model from something like model to model-000010. For example, in the prompt, we use the following LoRA: <lora:model-000001:0.9>.

Open Script -> X/Y/Z Plot. In the field with the type, choose Prompt S/R. S/R stands for Search and Replace. Its essence is that it looks for the first value from the list in the prompt and, at each subsequent step, replaces it with the next one. Thus, the prompt will first contain <lora:model-000001:0.9>, then <lora:model-000002:0.9>, and so on. If you want to change something in the prompt, use Prompt S/R; if it's a generation parameter, look for it in the list of types. You can vary up to 3 parameters at once - one per axis.
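For our example with 10 epochs, the values field would look like this (the first entry is the search string, and each subsequent one replaces it in turn - list every name explicitly, separated by commas):

model-000001, model-000002, model-000003, model-000004, model-000005, model-000006, model-000007, model-000008, model-000009, model-000010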

Be sure to give it a try!

Image Generation

Don't forget to turn off X/Y/Z Plot and let's proceed with automation.

So, let's assume we trained models on both checkpoints and selected epochs 8, 9, and 10 for each. Then, the final prompt might look something like this:

<lora:blh_final_{rv20|28d28} ({8|9|10}):{0.9|1.0|1.1}>
<checkpoint:{__checkpoints/graphic__|__checkpoints/photo__}>
<height:{9::768|1024}>
<width:{9::768|1024}>
{<sampler_name:Euler a><steps:50>|3::<sampler_name:DPM++ 2M Karras><steps:25>}
<cfg_scale:{3|4|5|6|7}>

(blh woman:1.2)

(masterpiece:1.3), portrait, closeup, __scene__, (by __artist__:{3::0|1.5})

Therefore, we will select one of the three epochs for one of the two models. For this model, we will choose one of the three weights. We also randomly select a checkpoint. In one out of every ten cases, we increase the height or width of the image to 1024 pixels. We also randomly choose a sampler with the number of steps and CFG Scale. In the main part of the prompt, a random scene from the file will be selected, and with a 25% probability, the style of an artist will be applied.

It's very important to set Batch Size to a value greater than 1 (I recommend 6 or 9). This matters because the extension will constantly switch between checkpoints, and each switch loads a checkpoint into VRAM, which takes time. As a result, if you generate images one by one, changing the checkpoint may take as long as the image generation itself.

Now all that's left is to right-click the Generate button, select Generate Forever to enable infinite generation, and wait until you've had enough :) Use the Cancel generate forever option to stop it. Alternatively, you can set a high Batch Count. That's it - we've automated the process as best we could!


Next - Useful Links
