# **1. Install Dependencies**

The following packages are required to run the training and testing:
- `diffusers`, Reference: https://github.com/huggingface/diffusers, https://huggingface.co/docs/diffusers/en/index
- `kohya_ss`, Reference: https://github.com/bmaltais/kohya_ss
- `xformers` and the libraries listed in `requirements.txt` (adapted from the `kohya_ss` package)

**Step 1: Create and switch to a new virtual environment in this directory for installation (Recommended).**
   - Install `Anaconda` or `Miniconda`.
   - Run the command: `conda create --prefix ./env python=3.10`
   - After the environment is created, choose `env` as the working kernel.
   
**Step 2: Install and set up all required libraries and configuration files.**
   - If you encounter an initialization problem, `switch the kernel to another environment, and then switch it back to the newly created one`.
   - Restart the kernel after installation.
   - You can comment out the installation lines afterwards so the installation is not repeated.

**Notice: If the installation process fails, please manually `git clone` the `diffusers` and `kohya_ss` repos into this project's `Root Directory` and follow the commands below.**

### **1-1: Huggingface Diffusers package**
This is the official `Hugging Face Diffusers` package, which provides many useful built-in components for building our own diffusion-model pipelines.

In [None]:
!git clone https://github.com/huggingface/diffusers
# Note: `!cd` only changes directory inside a temporary subshell, so install from the cloned folder directly
%pip install ./diffusers

### **1-2: Kohya_ss package**
`kohya_ss` is a well-known training toolkit for Stable Diffusion models. You can either set up and use its UI, or use the pipelines provided in this project.

In [None]:
!git clone --recurse-submodules https://github.com/bmaltais/kohya_ss.git

### **1-3: PyTorch and CUDA**
Please install the latest stable versions of `PyTorch` and `CUDA` for your hardware (https://pytorch.org/get-started/locally). The command below was used on `Windows OS` and installs the build for CUDA 12.1 (`cu121`).

In [None]:
%pip install torch torchvision torchaudio --index-url https://download.pytorch.org/whl/cu121

### **1-4: Other libraries and installation**
The `xformers` library provides memory-efficient attention, which can substantially speed up training and testing and reduce GPU memory usage.

Please also note that running diffusion models is demanding on hardware, especially the GPU.

This project was run with `CPU: AMD Ryzen 7 7800X3D 8-Core Processor` and `GPU: NVIDIA GeForce RTX 4070 / 12 GB VRAM`.

In [None]:
%pip install matplotlib
%pip install xformers
%pip install -U peft
%pip install omegaconf
%pip install -r requirements.txt

**Notice: Once everything is installed, please restart the kernel and run the notebook again.**
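To confirm the installation after restarting the kernel, a quick check like the sketch below can help. It uses only the standard library (`importlib.metadata`) to look up installed distribution versions. The package list is our assumption, based on this section and on the imports used later (we assume `accelerate`, `tensorboard`, and `huggingface-hub` arrive via `requirements.txt` or as dependencies), so adjust it to match your `requirements.txt`.

```python
from importlib.metadata import PackageNotFoundError, version

# Distributions used by this notebook (assumed list; extend it with requirements.txt entries)
REQUIRED = [
    "torch", "torchvision", "diffusers", "xformers", "peft", "omegaconf",
    "matplotlib", "accelerate", "tensorboard", "huggingface-hub",
]


def installed_versions(packages):
    """Map each distribution name to its installed version, or None if it is missing."""
    found = {}
    for name in packages:
        try:
            found[name] = version(name)
        except PackageNotFoundError:
            found[name] = None
    return found


for name, ver in installed_versions(REQUIRED).items():
    print(f"{name:<16} {ver or 'NOT INSTALLED'}")
```

Any line reading `NOT INSTALLED` points to a package that still needs to be installed before continuing.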

### **1-5: Customize the `accelerate` configuration file**
Please run the default command below to create a file called `default_config.yaml` in your cache folder. It is normally located at `~/.cache/huggingface/accelerate`; if the environment variable `HF_HOME` is set, it is `HF_HOME` suffixed with `accelerate`; otherwise, if `XDG_CACHE_HOME` is set, it is `XDG_CACHE_HOME` suffixed with `huggingface/accelerate`.

In [None]:
# Uncomment to create default_config.yaml
# !accelerate config default

Please copy and paste these configs (without the leading `##` comment markers) to overwrite the `default_config.yaml` file in your cache folder.

Feel free to modify `existing items` if your hardware differs from these settings, but do not add any items that are not in the provided template.

In [None]:
# Modified setting for default_config.yaml:

## compute_environment: LOCAL_MACHINE
## debug: false
## distributed_type: 'NO'
## downcast_bf16: 'no'
## gpu_ids: all
## machine_rank: 0
## main_training_function: main
## mixed_precision: fp16
## num_machines: 1
## num_processes: 1
## rdzv_backend: static
## same_network: true
## tpu_env: []
## tpu_use_cluster: false
## tpu_use_sudo: false
## use_cpu: false
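As an alternative to copying the template by hand, the sketch below writes it to the location described above. The path logic mirrors the rule stated earlier (`HF_HOME`, else `XDG_CACHE_HOME/huggingface`, else `~/.cache/huggingface`, followed by `accelerate/default_config.yaml`). The helper names are hypothetical and not part of `accelerate`. (`accelerate` itself also offers `accelerate.utils.write_basic_config(mixed_precision="fp16")`, but that generates its own basic config rather than this exact template.)

```python
import os
from pathlib import Path

# The template above with the leading '## ' markers removed
ACCELERATE_CONFIG = """\
compute_environment: LOCAL_MACHINE
debug: false
distributed_type: 'NO'
downcast_bf16: 'no'
gpu_ids: all
machine_rank: 0
main_training_function: main
mixed_precision: fp16
num_machines: 1
num_processes: 1
rdzv_backend: static
same_network: true
tpu_env: []
tpu_use_cluster: false
tpu_use_sudo: false
use_cpu: false
"""


def default_accelerate_config_path(environ=None):
    """Resolve the default_config.yaml location from HF_HOME / XDG_CACHE_HOME / ~/.cache."""
    env = os.environ if environ is None else environ
    cache_home = env.get("XDG_CACHE_HOME", os.path.join("~", ".cache"))
    hf_home = env.get("HF_HOME", os.path.join(cache_home, "huggingface"))
    return Path(os.path.expanduser(hf_home)) / "accelerate" / "default_config.yaml"


def write_accelerate_config(path):
    """Overwrite the accelerate config at `path` with the template above."""
    path = Path(path)
    path.parent.mkdir(parents=True, exist_ok=True)
    path.write_text(ACCELERATE_CONFIG, encoding="utf-8")
    return path


# Uncomment to overwrite the default config in place:
# write_accelerate_config(default_accelerate_config_path())
```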

**Congratulations! Once you are done with the setup, we are ready for the next step.**

# **2. Dataset and Pre-processing**

The provided dataset contains 20 selected pictures `(.jpg)` and their corresponding captions `(.txt)` of `Geralt of Rivia`, a very well-known character from the series `The Witcher`.

The pictures were mostly sampled from the game `The Witcher 3: Wild Hunt`, which was first released in 2015 and is still the latest installment of the story. The data collection process is as follows:

**Step 1: Search and download gaming pictures.**
   - Searched `Wallhaven` (https://wallhaven.cc), which hosts many high-quality pictures.
   - Selected and downloaded 35 pictures showing different aspects of `Geralt`, e.g. his face, upper body only, full body, different poses, different angles, etc.
   - Carefully picked 20 of these pictures (7 face, 10 upper-body, and 3 full-body shots) to create the dataset.

**Step 2: Pre-processing for the image data**
   - Used the `Birme` tool (https://www.birme.net) to crop, resize, and convert the image format in batches.
   - Cropped out unrelated elements, other characters, and complicated objects so that `Geralt` is the main subject of each picture.

**Step 3: Pre-processing for the caption data**
   - Used the `wd14-tagger` plugin for `webUI` (a well-known interface for running various tasks with Stable Diffusion models) to auto-generate tag captions for each picture.
   - Manually added missing captions and deleted inappropriate generated captions for all 20 pictures.
   - Note that we set the `trigger word` for our LoRA to `geralt of rivia`, which means we expect the LoRA to have a strong effect when the `trigger word` is part of the prompt.
   - Reference: `wd14-tagger` (https://github.com/picobyte/stable-diffusion-webui-wd14-tagger), `webUI` (https://github.com/AUTOMATIC1111/stable-diffusion-webui).

**Step 4: Place the dataset in the folder and finalize the pre-processing**
   - Created a folder named `100_geralt of rivia man` (already provided in this project).
   - Placed all `.jpg` and `.txt` files in this folder as the dataset for LoRA training.
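Before training, it can be worth sanity-checking the folder. The sketch below assumes the `kohya_ss` folder convention in which the number before the first underscore is the repeat count (`100` here, which the step calculation in Section 3 relies on), and checks that every `.jpg` has a matching `.txt` caption. Requiring each caption to contain the trigger word is our own assumption, and `parse_kohya_folder_name` / `check_dataset` are hypothetical helper names.

```python
from pathlib import Path


def parse_kohya_folder_name(folder_name):
    """Split a kohya-style folder name '<repeats>_<name>' into (repeats, name)."""
    repeats, _, name = folder_name.partition("_")
    return int(repeats), name


def check_dataset(folder, trigger_word="geralt of rivia"):
    """Return (images without a caption file, captions missing the trigger word)."""
    folder = Path(folder)
    missing_captions, missing_trigger = [], []
    for img in sorted(folder.glob("*.jpg")):
        caption_file = img.with_suffix(".txt")
        if not caption_file.exists():
            missing_captions.append(img.name)
        elif trigger_word not in caption_file.read_text(encoding="utf-8").lower():
            missing_trigger.append(caption_file.name)
    return missing_captions, missing_trigger


print(parse_kohya_folder_name("100_geralt of rivia man"))  # (100, 'geralt of rivia man')
```

For this project, `check_dataset("./dataset/img/100_geralt of rivia man")` returns two empty lists when every picture has a caption that mentions `geralt of rivia`.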

### **2-1: Import dependencies**
We import the following dependencies for this project, including PyTorch, Tensorboard, etc.

In [None]:
import torch
import matplotlib.pyplot as plt
import matplotlib.image as mpimg
from tensorboard import notebook
from diffusers import StableDiffusionPipeline, DPMSolverMultistepScheduler
from huggingface_hub import snapshot_download
from xformers.ops import MemoryEfficientAttentionFlashAttentionOp

### **2-2: Show pre-processed dataset**
We show 6 selected pictures (numbers 1, 5, 11, 13, 16, and 20) as a demo of what the dataset actually looks like, including the `pictures of Geralt` and their `corresponding captions`.

In [None]:
# Define the image and text file paths
img_indices = [1, 5, 11, 13, 16, 20]
img_files = [f"./dataset/img/100_geralt of rivia man/{i}.jpg" for i in img_indices]
txt_files = [f"./dataset/img/100_geralt of rivia man/{i}.txt" for i in img_indices]

# Create a figure with 3 x 2 subplots
fig, axs = plt.subplots(3, 2, figsize=(12, 18))  # Increase the figure size

for i in range(3):
    for j in range(2):
        # Calculate the index of the image and text file
        index = i * 2 + j

        # Read the image file
        img_data = mpimg.imread(img_files[index])

        # Read the text file
        with open(txt_files[index], 'r') as file:
            prompt = file.read()

        # Display the image
        axs[i, j].imshow(img_data)
        axs[i, j].set_xticks([])
        axs[i, j].set_yticks([])

        # Display the prompt below the image
        axs[i, j].text(0.5, -0.25, prompt, wrap=True, horizontalalignment='center', fontsize=12, transform=axs[i, j].transAxes)

plt.subplots_adjust(wspace=1)
plt.show()

# **3. Training Process and Results**

A few points to note before starting this part:

- The purpose of this project is to `prepare a whole dataset from scratch` and to train `LoRA adapters` to learn a specific character, here `Geralt of Rivia`.

- We then test how powerful the `LoRA technique` is for `learning a specific character (or specific style)`.

- Instead of the newer `SDXL` model, we trained the LoRA adapter on `Stable Diffusion v1.5`, whose training process is more stable and reliable.

Let's dive deeper into the training process!

### **3-1: Open Tensorboard (re-run the code here if you want to monitor results)**
We first open `Tensorboard` to monitor the LoRA training process (if you choose to run it) and to view the results from the log files located at `./output/log/`.

Training takes a long time (around 40 minutes on my hardware), so:

- If you prefer not to run it, the trained model and its `log files` are included in this project, so you can simply open `Tensorboard` to look at the recorded training results!

- This `.ipynb` file was also executed in a clean run, with all the results kept for your reference.

In [None]:
# Open a Tensorboard instance
%load_ext tensorboard
%tensorboard --logdir ./output/log --port 6006

In [None]:
# List current Tensorboard instances
notebook.list()

In [None]:
# Display the Tensorboard (Tensorboard will display here)
notebook.display(port=6006, height=800)

### **3-2: Close Tensorboard**
Comment out the code below if you do not want to close `Tensorboard` automatically.

Leaving this code uncommented avoids initialization problems on the first run after restarting the kernel, since `Tensorboard does not close when the kernel restarts`.

To open the `Tensorboard` again, please re-run the code blocks at `3-1`.

In [None]:
# Close existing TensorBoard instances (these are Windows commands)
!taskkill /IM "tensorboard.exe" /F
!rmdir /S /Q %temp%\.tensorboard-info

### **3-3: Run training script**
The main idea of LoRA training is to freeze the base model's weights and insert small trainable rank-decomposition matrices into the layers of each transformer block. Instead of updating a full weight matrix `W` of size `d x k`, LoRA learns an update `ΔW = B A`, where `B` is `d x r`, `A` is `r x k`, and the rank `r` is much smaller than `d` and `k`. As a result, far fewer weights need to be trained in total, and the base model itself stays unchanged.
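To make the idea concrete, here is a minimal NumPy sketch of a LoRA-wrapped linear layer. It is illustrative, not the `kohya_ss` implementation: `W` stays frozen, only `A` (`down`) and `B` (`up`) would be trained, and the update is scaled by `alpha / rank`, which we assume mirrors how `kohya_ss` applies `network_alpha`. `B` starts at zero, so the wrapped layer initially behaves exactly like the base layer. The sizes (`1280 x 1280`, one of the U-Net's channel widths, with rank `128` as in this project) are chosen for illustration.

```python
import numpy as np


def lora_param_counts(d_out, d_in, rank):
    """Trainable weights: full fine-tuning of W vs. a rank-r LoRA update B @ A."""
    full = d_out * d_in
    lora = rank * (d_in + d_out)  # A: (rank, d_in), B: (d_out, rank)
    return full, lora


class LoRALinear:
    """Frozen linear layer W plus a low-rank update scaled by alpha / rank."""

    def __init__(self, weight, rank, alpha, seed=0):
        rng = np.random.default_rng(seed)
        d_out, d_in = weight.shape
        self.weight = weight                                  # W: frozen, never updated
        self.down = rng.normal(0.0, 0.01, size=(rank, d_in))  # A: trainable, random init
        self.up = np.zeros((d_out, rank))                     # B: trainable, zero init
        self.scale = alpha / rank

    def forward(self, x, multiplier=1.0):
        # h = W x + multiplier * (alpha / r) * B A x, for a batch of row vectors x
        base = x @ self.weight.T
        update = (x @ self.down.T) @ self.up.T
        return base + multiplier * self.scale * update


full, lora = lora_param_counts(1280, 1280, 128)
print(full, lora)  # 1638400 327680
```

With these sizes, full fine-tuning would train 1,638,400 weights for this one matrix, while the rank-128 LoRA trains 327,680, one fifth as many. The `multiplier` argument plays a role similar to the LoRA weight used in the testing section.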

We used the existing standard LoRA training script for Stable Diffusion v1.5 provided by the `kohya_ss` library, `train_network.py` (located at `./kohya_ss/sd-scripts/train_network.py`), and customized its parameters for this project's training process.

Some important training parameters are listed below:

1. `pretrained_model_name_or_path`: Set to `Stable Diffusion v1.5`, the base model that the LoRA adapter is trained on and applied to.

2. `mixed_precision`: Set to `fp16` to use half-precision data types during training, which reduces memory usage and speeds up training.

3. `resolution`: Set to `512 x 512`, since the base model `Stable Diffusion v1.5` was trained on images of this size, and we want our dataset to be consistent with it.

4. `seed`: Fixing the seed makes a training run more reproducible, although results may still differ slightly between runs (for example, because some GPU operations are non-deterministic).

5. `cache_latents`: Encodes every training image with the VAE once and caches the resulting (much smaller) latents in memory, so the VAE does not have to run at every step. This speeds up training and reduces VRAM usage.

6. `unet_lr`: Kept at the default value `0.0001`. This is the learning rate for the LoRA modules in the U-Net, the network that performs denoising during generation.

7. `text_encoder_lr`: Kept at the default value `0.00005` (`5e-05`). This is the learning rate for the LoRA modules in the text encoder (in `SD v1.5`, the text encoder of the CLIP model `CLIP ViT-L/14`).

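8. `loss_type`: Set to `L2` loss (MSE loss), which penalizes errors quadratically: small errors contribute little, while large errors dominate the loss, so the model is pushed hardest to correct large noise-prediction errors. (The `--huber_c` and `--huber_schedule` flags in the command below only apply to the `huber` and `smooth_l1` loss types, so they have no effect with `l2`.)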

9. `lr_scheduler`, `lr_scheduler_num_cycles`, `lr_warmup_steps`: These adjust the learning rate over the course of training to help avoid both overfitting and underfitting. We select the `cosine_with_restarts` scheduler, which varies the learning rate cyclically, with `10%` of the total steps (`400` steps) as warm-up and `4` cycles over the whole training run.

10. `train_batch_size`: Set to `2`, so 2 images are processed per batch.

11. `max_train_steps`: Set to `4000` steps, which also determines the total number of epochs. The training script reads the dataset folder name and uses its prefix `100` as the number of repeats for each image. With 20 images and a batch size of 2, one epoch is `100 x 20 / 2 = 1000` steps, so a maximum of `4000` steps gives `4` training epochs in total.

12. `save_every_n_epochs`: Set to `1` to save an intermediate LoRA checkpoint after every epoch, so we can compare them in the testing part.

13. `network_dim`: The rank (dimension) of the LoRA matrices. We select a relatively high value (`128`) so the LoRA adapter can learn as many features as possible; since we filtered the pictures, the dataset contains few unrelated elements for it to pick up.

14. `network_alpha`: In `kohya_ss`, the LoRA output is scaled by `network_alpha / network_dim`, so alpha controls how strongly the learned update is applied; an alpha lower than the dimension shrinks the update and acts like a smaller effective learning rate. We set it to a high value equal to `network_dim` (`128`), which gives a scale of `1.0`, i.e. the LoRA update is applied without damping.

15. `optimizer_type`: Set to `AdamW8bit`. Like AdamW, it decouples weight decay from the gradient update and applies it directly to the weights. In addition, it stores the optimizer states (the moment estimates) in 8 bits instead of 32-bit floats (via `bitsandbytes`), which substantially reduces the optimizer's memory usage.
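The step and schedule arithmetic above can be checked with a few lines of plain Python. The schedule function is a sketch that reproduces the formula of Hugging Face's `get_cosine_with_hard_restarts_schedule_with_warmup` (linear warm-up, then a cosine decay that restarts `num_cycles` times); we assume this is what `cosine_with_restarts` maps to in `kohya_ss`. It returns a multiplier that is applied to the base learning rate.

```python
import math

# Values taken from the training command in this notebook
NUM_IMAGES = 20          # .jpg files in "100_geralt of rivia man"
NUM_REPEATS = 100        # prefix of the folder name
BATCH_SIZE = 2           # --train_batch_size
MAX_TRAIN_STEPS = 4000   # --max_train_steps
WARMUP_STEPS = 400       # --lr_warmup_steps
NUM_CYCLES = 4           # --lr_scheduler_num_cycles
BASE_LR = 1e-4           # --unet_lr


def steps_per_epoch(num_images, repeats, batch_size):
    # Each image is seen `repeats` times per epoch, grouped into batches
    return math.ceil(num_images * repeats / batch_size)


def cosine_with_restarts_multiplier(step, warmup, total, cycles):
    """LR multiplier: linear warm-up, then cosine decay with hard restarts."""
    if step < warmup:
        return step / max(1, warmup)
    progress = (step - warmup) / max(1, total - warmup)
    if progress >= 1.0:
        return 0.0
    return max(0.0, 0.5 * (1.0 + math.cos(math.pi * ((cycles * progress) % 1.0))))


epoch_steps = steps_per_epoch(NUM_IMAGES, NUM_REPEATS, BATCH_SIZE)
num_epochs = MAX_TRAIN_STEPS // epoch_steps
print(f"steps per epoch: {epoch_steps}, epochs: {num_epochs}")

for step in [0, 200, 400, 850, 1299, 1300, 4000]:
    m = cosine_with_restarts_multiplier(step, WARMUP_STEPS, MAX_TRAIN_STEPS, NUM_CYCLES)
    print(f"step {step:>4}: lr = {BASE_LR * m:.2e}")
```

With these settings each cycle lasts `(4000 - 400) / 4 = 900` steps, so the learning rate restarts at its peak at steps 400, 1300, 2200, and 3100.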

**Here is the code to start the training script. You can skip this part and go to the next code block to download my trained LoRA checkpoints.**

**Uncomment the code if you would like to try running it!**

**Notice: The process may take some time to complete and depends on the hardware used.**

In [None]:
# To train the model, remove the triple quotes around the command below
'''
!accelerate launch --mixed_precision="fp16" --num_processes=1 --num_machines=1 --num_cpu_threads_per_process=2 ".\kohya_ss/sd-scripts/train_network.py" \
--pretrained_model_name_or_path="runwayml/stable-diffusion-v1-5" \
--train_data_dir="./dataset/img" \
--output_name="lora-train" \
--output_dir="./output/model" \
--logging_dir="./output/log" \
--resolution="512,512" \
--caption_extension=".txt" \
--seed="1234" \
--mixed_precision="fp16" \
--bucket_no_upscale \
--bucket_reso_steps=64 \
--cache_latents \
--gradient_checkpointing \
--huber_c="0.1" \
--huber_schedule="snr" \
--learning_rate="0.0001" \
--unet_lr=0.0001 \
--text_encoder_lr=5e-05 \
--loss_type="l2" \
--lr_scheduler="cosine_with_restarts" \
--lr_scheduler_num_cycles="4" \
--lr_warmup_steps="400" \
--max_grad_norm="1" \
--mem_eff_attn \
--min_timestep=0 \
--network_alpha="128" \
--network_dim=128 \
--network_module=networks.lora \
--optimizer_type="AdamW8bit" \
--max_train_steps="4000" \
--train_batch_size="2" \
--max_data_loader_n_workers="0" \
--save_every_n_epochs="1" \
--save_model_as=safetensors \
--save_precision="fp16" \
--xformers
'''

**Here you can choose to `download the trained LoRA checkpoints` from my Hugging Face repository (by default).**

The 4 trained LoRA checkpoints (`.safetensors`) are on my Hugging Face repository (https://huggingface.co/kevin-chu/sd15-lora-geralt-of-rivia), details are as below:
- `lora-train-000001.safetensors`: 1 epoch checkpoint
- `lora-train-000002.safetensors`: 2 epochs checkpoint
- `lora-train-000003.safetensors`: 3 epochs checkpoint
- `lora-train.safetensors`: 4 epochs checkpoint (complete training)

Please put the checkpoints in the `./output/model` directory for further access.
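The file names follow a pattern: intermediate checkpoints are saved as `<output_name>-<epoch as 6 digits>` and the final one as `<output_name>`. We infer this from the file list above and assume it matches `kohya_ss` naming with `save_every_n_epochs=1`. The hypothetical helpers below rebuild the expected names and report any that are missing from `./output/model`.

```python
from pathlib import Path


def kohya_checkpoint_names(output_name, num_epochs, ext=".safetensors"):
    """Intermediate checkpoints are '<name>-<epoch:06d>', the final one is '<name>'."""
    names = [f"{output_name}-{epoch:06d}{ext}" for epoch in range(1, num_epochs)]
    names.append(f"{output_name}{ext}")
    return names


def missing_checkpoints(model_dir, names):
    """List the expected checkpoint files that are not present in model_dir."""
    return [n for n in names if not (Path(model_dir) / n).is_file()]


expected = kohya_checkpoint_names("lora-train", 4)
print(expected)
print("missing:", missing_checkpoints("./output/model", expected))
```

An empty `missing` list means all 4 checkpoints are in place for the testing section.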

**Notice: If Canvas allows large file uploads, the 4 LoRA checkpoints will already be in the `./output/model` directory, and you do not need to download them again.**

In [None]:
# Download the trained LoRA checkpoints from my Hugging Face repository (skip this part if you want to run the training script)
snapshot_download(
    repo_id="kevin-chu/sd15-lora-geralt-of-rivia",
    allow_patterns="*.safetensors",
    local_dir="./output/model",
    local_dir_use_symlinks=False
)

**If you ran the training script and have reached this point, your training is complete. You can scroll up and refresh Tensorboard to see your training results!**

# **4. Testing Section**

### **4-1: Testing hyperparameters and helper functions**
Several helper functions are defined for the testing section; they provide a more systematic way to demonstrate the results and compare the images.

In [None]:
# Set the hyperparameters for the testing
steps = [50]
scale = 7
width = 512
height = 512
num_images = 1
generator = torch.Generator(device="cuda")
clip_skip = 2

# Create a list of weight names
weight_names = ["lora-train-000001.safetensors", "lora-train-000002.safetensors", "lora-train-000003.safetensors", "lora-train.safetensors"]

In [None]:
# Function for loading LoRA adapters into the base or pre-trained model
def load_lora_adapters(pipe, weight_names, lora_source="kevin-chu/sd15-lora-geralt-of-rivia"):
    # `lora_source` can be the Hugging Face repo (default) or a local folder such as "./output/model"
    adapter_names = []

    # Load each checkpoint as a separate adapter, named lora-1 to lora-4
    for i, weight_name in enumerate(weight_names, 1):
        adapter_name = f"lora-{i}"
        pipe.load_lora_weights(lora_source, weight_name=weight_name, adapter_name=adapter_name)
        adapter_names.append(adapter_name)

    return adapter_names

In [None]:
# Function for generating images with each adapter at different LoRA weights
def generate_images_with_adapter(pipe, positive_prompt, negative_prompt, seed_num, inference_steps_num, adapter_names, lora_weights,
                                 is_adapter_merged=False, added_adapter_name=None, adapters_merged_weights=None):

    # Create an empty list for images
    images = []

    # Iterate through all combinations of steps, LoRA weights, and adapter names
    for step in inference_steps_num:
        for lora_weight in lora_weights:
            for adapter_name in adapter_names:

                # Activate the specified adapter
                if is_adapter_merged:
                    # Combine the extra adapter with this one, using the provided adapter weights
                    pipe.set_adapters([added_adapter_name, adapter_name], adapters_merged_weights)
                else:
                    # Use this adapter alone
                    pipe.set_adapters(adapter_name)

                # Generate an image (uses the global width, height, scale, clip_skip, num_images, and generator)
                image = pipe(positive_prompt, negative_prompt=negative_prompt, width=width, height=height,
                             num_inference_steps=step, guidance_scale=scale, clip_skip=clip_skip,
                             num_images_per_prompt=num_images, generator=generator.manual_seed(seed_num),
                             cross_attention_kwargs={"scale": lora_weight}).images[0]

                # Append the generated image to the list
                images.append(image)

    return images

In [None]:
# Function for image display (LoRA adapters with different weights)
def display_images_lora_weights(images, adapter_names, lora_weights):

    # Create a figure with LoRA weights as rows and LoRA adapter names as columns (squeeze=False keeps axs 2-D)
    fig, axs = plt.subplots(len(lora_weights), len(adapter_names), figsize=(20, 5 * len(lora_weights)), squeeze=False)

    # Display the images
    for i, lora_weight in enumerate(lora_weights):
        for j, adapter_name in enumerate(adapter_names):

            # Images are stored row by row (one row per LoRA weight)
            ax = axs[i, j]
            ax.imshow(images[i * len(adapter_names) + j])

            # Turn off the axis ticks
            ax.set_xticks([])
            ax.set_yticks([])

            # Label columns with the adapter name (top row) and rows with the LoRA weight (first column)
            if i == 0:
                ax.set_title(f"{adapter_name} ({j + 1} epoch)", fontsize=14, fontweight='bold')
            if j == 0:
                ax.set_ylabel(f"lora-weight: {lora_weight}", fontsize=14, fontweight='bold')

    plt.show()

In [None]:
# Function for image display (different steps)
def display_images_steps(images, test_steps, pipe):

    # Create a figure with LoRA adapter names as rows and steps as columns (squeeze=False keeps axs 2-D)
    active_adapters = pipe.get_active_adapters()
    fig, axs = plt.subplots(len(active_adapters), len(test_steps), figsize=(20, 5 * len(active_adapters)), squeeze=False)

    # Display the images
    for i, adapter_name in enumerate(active_adapters):
        for j, step in enumerate(test_steps):

            # Images are stored row by row (one row per adapter)
            ax = axs[i, j]
            ax.imshow(images[i * len(test_steps) + j])

            # Turn off the axis ticks
            ax.set_xticks([])
            ax.set_yticks([])

            # Label columns with the step count (top row) and rows with the adapter name (first column)
            if i == 0:
                ax.set_title(f"inference steps: {step}", fontsize=14, fontweight='bold')
            if j == 0:
                ax.set_ylabel(f"{adapter_name} (lora weight: 1.0)", fontsize=14, fontweight='bold')

    plt.show()

In [None]:
import gc

# Function for freeing CUDA memory (the pipeline must be deleted before emptying the cache)
def free_cuda_memory():
    globals().pop("sd_pipeline", None)
    globals().pop("test_images", None)
    gc.collect()  # release any remaining Python references before emptying the cache
    torch.cuda.empty_cache()

# Show GPU memory status (in MB)
def memory_stats():
    print(f"allocated: {torch.cuda.memory_allocated() / 1024**2:.1f} MB")
    print(f"reserved:  {torch.cuda.memory_reserved() / 1024**2:.1f} MB")

### **4-2: Test 1 - Comparisons of generated images with the base model and the LoRA-trained adapters**
We would like to compare the generated images `before` and `after` applying the LoRA-trained adapters to the base model `Stable Diffusion v1.5`, and see what interesting results we can find! (Reference: `Stable Diffusion v1.5` https://huggingface.co/runwayml/stable-diffusion-v1-5)

The testing process is as follows:
- Step 1: Load base model
- Step 2: Load LoRA adapters to the base model
- Step 3: Generate and show the result images
- Step 4: Free GPU memory

#### **Step 1: Load base model**

In [None]:
# Load base model (Stable Diffusion v1.5) for comparison (the safety checker is disabled so that low-quality images are not falsely flagged and blacked out)
sd_pipeline = StableDiffusionPipeline.from_pretrained("runwayml/stable-diffusion-v1-5", torch_dtype=torch.float16, use_safetensors=True, safety_checker=None).to("cuda")

# Set the scheduler to DPMSolver
sd_pipeline.scheduler = DPMSolverMultistepScheduler.from_config(sd_pipeline.scheduler.config)

# Optimize the pipeline for memory-efficient attention
sd_pipeline.enable_xformers_memory_efficient_attention(attention_op=MemoryEfficientAttentionFlashAttentionOp)
sd_pipeline.vae.enable_xformers_memory_efficient_attention(attention_op=None)

#### **Step 2: Load LoRA adapters to the base model**
We set the `LoRA weights` to `0, 0.25, 0.5, 0.75, 1.0`. The LoRA weight controls how strongly the LoRA affects the resulting image: `0` disables it and `1` applies it at full strength.

In [None]:
# Create lists of adapter names and LoRA weights
lora_names = []
lora_weights = [0, 0.25, 0.5, 0.75, 1.0]

# Load LoRA adapters to the base model
lora_names = load_lora_adapters(sd_pipeline, weight_names)

#### **Step 3: Generate and show the result images**
We choose a simple `positive prompt` with `no negative prompt`, and fix the `seed at 1` so the same generation process can be reproduced for comparison.

In [None]:
# Set the test prompts
test_positive_prompt = "geralt of rivia"
test_negative_prompt = ""
seed = 1

# Generate images with the specified adapter names and LoRA weights
test_images = generate_images_with_adapter(sd_pipeline, test_positive_prompt, test_negative_prompt, seed, steps, lora_names, lora_weights)

In [None]:
# Display the images
display_images_lora_weights(test_images, lora_names, lora_weights)

#### **Step 4: Free GPU memory**
- The process of generating images takes up a lot of GPU VRAM.
- We have already enabled `xformers` in the model pipeline, but it is still good practice to free GPU memory before the next test case (unless we want to reuse the pipeline).

In [None]:
# Free cuda memory
free_cuda_memory()

# Show current GPU memory status
memory_stats()

### **4-3: Test 2 - Apply the different styles of LoRA adapters on the base model**
- We combine a `pixel` style LoRA with each of the LoRA adapters we trained, apply each combination to the base model `Stable Diffusion v1.5`, and compare the generated images.
- The pixel-style LoRA is from `CivitAI` (https://civitai.com/models/44960/mpixel, file name: `pixel_f2.safetensors`) and generates decent pixel-style images.

#### **Step 1: Download the adapter and load LoRA adapters to the base model**
 - You can download this model using `wget`.
 - If you have not installed `wget` (most likely on Windows OS), you can download the model directly from the `CivitAI` link above.
 - Remember to save it in the base-model folder `./based-model`.
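If `wget` is not available, the Python standard library can download the file instead. This is a sketch: `download_file` is a hypothetical helper that creates `./based-model` if needed (`wget -O` does not create missing folders). CivitAI may require you to be logged in or to pass an API token for some downloads; in that case, downloading manually from the link above is the simplest option.

```python
import os
import urllib.request


def download_file(url, dest_path):
    """Download `url` to `dest_path`, creating the destination folder first."""
    folder = os.path.dirname(dest_path)
    if folder:
        os.makedirs(folder, exist_ok=True)
    # urlretrieve follows HTTP redirects, which download endpoints commonly use
    saved_path, _headers = urllib.request.urlretrieve(url, dest_path)
    return saved_path


# Example (uncomment to run):
# download_file("https://civitai.com/api/download/models/52870?type=Model&format=SafeTensor",
#               "./based-model/pixel_f2.safetensors")
```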

In [None]:
# Download the pre-trained LoRA (pixel style)
!wget "https://civitai.com/api/download/models/52870?type=Model&format=SafeTensor" -O ./based-model/pixel_f2.safetensors

# Load base model (Stable Diffusion v1.5) and scheduler
sd_pipeline = StableDiffusionPipeline.from_pretrained("runwayml/stable-diffusion-v1-5", torch_dtype=torch.float16, use_safetensors=True, safety_checker=None).to("cuda")
sd_pipeline.scheduler = DPMSolverMultistepScheduler.from_config(sd_pipeline.scheduler.config)

# Optimize the pipeline for memory-efficient attention
sd_pipeline.enable_xformers_memory_efficient_attention(attention_op=MemoryEfficientAttentionFlashAttentionOp)
sd_pipeline.vae.enable_xformers_memory_efficient_attention(attention_op=None)

# Load LoRA adapters to the base model
lora_names = load_lora_adapters(sd_pipeline, weight_names)
sd_pipeline.load_lora_weights("./based-model", weight_name="pixel_f2.safetensors", adapter_name="pixel")

#### **Step 2: Generate and show the result images**
Settings for this test are as follows:
- `LoRA weights`: 0, 0.25, 0.5, 0.75, 1.0
- `Positive prompt`: geralt of rivia, pixel (`pixel` is the trigger word for the downloaded LoRA)
- `Negative prompt`: None
- `merged_weights`: [0.5, 0.5], where the first value is for `the pixel LoRA` and the second value is for `each of our trained LoRAs`.
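Because both the LoRA weight from the grid and the adapter weights in `merged_weights` are applied, it helps to see the combined strength of each adapter. The sketch below assumes that, with the PEFT backend, diffusers multiplies each adapter's `set_adapters()` weight by the `cross_attention_kwargs["scale"]` value. Under that assumption, the `0` row disables both the pixel LoRA and our LoRA, and the `1.0` row applies each at half strength.

```python
# Assumption: with the PEFT backend, diffusers scales each active adapter by
# (its set_adapters() weight) x (cross_attention_kwargs["scale"]).
grid_lora_weights = [0, 0.25, 0.5, 0.75, 1.0]  # rows of the image grid
pair_weights = [0.5, 0.5]                       # [pixel LoRA, our trained LoRA]


def effective_adapter_scales(lora_weight, adapter_weights):
    """Combined strength of each active adapter for one LoRA-weight setting."""
    return [lora_weight * w for w in adapter_weights]


for lw in grid_lora_weights:
    pixel, ours = effective_adapter_scales(lw, pair_weights)
    print(f"lora weight {lw:<4}: pixel = {pixel:.3f}, ours = {ours:.3f}")
```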

In [None]:
# Set the test prompts for the pre-trained model
test_positive_prompt = "geralt of rivia, pixel"
test_negative_prompt = ""
seed = 924608315
merged_weights = [0.5, 0.5]
added_adapter_name = "pixel"

# Generate images with the specified adapter names and LoRA weights
test_images = generate_images_with_adapter(sd_pipeline, test_positive_prompt, test_negative_prompt, seed, steps, lora_names, lora_weights, True, added_adapter_name, merged_weights)

In [None]:
# Display the images
display_images_lora_weights(test_images, lora_names, lora_weights)

#### **Step 3: Free GPU memory**

In [None]:
# Free cuda memory
free_cuda_memory()

# Show current GPU memory status
memory_stats()

### **4-4: Test 3 - Apply LoRA adapters on other pre-trained checkpoint models**
- Show different cases of combining our LoRA adapters with other pre-trained models in different styles.
- All `three` pre-trained style models are based on `Stable Diffusion v1.5`.
- The styles are `anime`, `comic`, and `realistic`.

#### **Case 1: LoRA merges with an anime-style pre-trained model**
- We would like to compare the generated images of applying our LoRA adapters to the pre-trained model with `anime style`.
- This checkpoint model is from `Hugging Face` (https://huggingface.co/gsdf/Counterfeit-V3.0, file name: `Counterfeit-V3.0_fix_fp16.safetensors`).
- You can also find it on `CivitAI` (https://civitai.com/models/4468/counterfeit-v30?modelVersionId=57618).
- This checkpoint model can generate high-quality anime-style images.

##### **Step 1: Download the checkpoint model and load LoRA adapters**
If you download it manually, put it in the base-model folder `./based-model` and set `model_dir` to `./based-model/Counterfeit-V3.0_fix_fp16.safetensors`.

In [None]:
# Download the pre-trained model (anime style)
snapshot_download(
    repo_id="gsdf/Counterfeit-V3.0",
    allow_patterns="Counterfeit-V3.0_fix_fp16.safetensors",
    local_dir="./based-model"
)
model_dir = "./based-model/Counterfeit-V3.0_fix_fp16.safetensors"

# Load the pre-trained model and scheduler
sd_pipeline = StableDiffusionPipeline.from_single_file(model_dir, torch_dtype=torch.float16).to("cuda")
sd_pipeline.scheduler = DPMSolverMultistepScheduler.from_config(sd_pipeline.scheduler.config)
sd_pipeline.safety_checker = None

# Optimize the pipeline for memory-efficient attention
sd_pipeline.enable_xformers_memory_efficient_attention(attention_op=MemoryEfficientAttentionFlashAttentionOp)
sd_pipeline.vae.enable_xformers_memory_efficient_attention(attention_op=None)

# Load LoRA adapters to the base model
lora_names = load_lora_adapters(sd_pipeline, weight_names)

##### **Step 2: Generate and show the result images**
Settings for this test are as follows:
- `LoRA weights`: 0, 0.25, 0.5, 0.75, 1.0
- `Positive prompt`: 1boy, geralt of rivia
- `Negative prompt`: None

In [None]:
# Set the test prompts for the pre-trained model
test_positive_prompt = "1boy, geralt of rivia"
test_negative_prompt = ""
seed = 629389646

# Generate images with the specified adapter names and LoRA weights
test_images = generate_images_with_adapter(sd_pipeline, test_positive_prompt, test_negative_prompt, seed, steps, lora_names, lora_weights)

In [None]:
# Display the images
display_images_lora_weights(test_images, lora_names, lora_weights)

##### **Step 3: Free GPU memory**

In [None]:
# Free cuda memory
free_cuda_memory()

# Show current GPU memory status
memory_stats()

#### **Case 2: LoRA merges with a comic-style pre-trained model**
- We would like to compare the generated images of applying our LoRA adapters to the pre-trained model with `comic style`.
- This checkpoint model is from `CivitAI` (https://civitai.com/models/35960/flat-2d-animerge?modelVersionId=266360, file name: `flat2DAnimerge_v45Sharp.safetensors`).
- This checkpoint model can generate high-quality comic-style images.

##### **Step 1: Download the checkpoint model and load LoRA adapters**
If you download it manually, put it in the base-model folder `./based-model`.

In [None]:
# Download the pre-trained model (comic style)
!wget "https://civitai.com/api/download/models/266360?type=Model&format=SafeTensor&size=pruned&fp=fp16" -O ./based-model/flat2DAnimerge_v45Sharp.safetensors
model_dir = "./based-model/flat2DAnimerge_v45Sharp.safetensors"

# Load the pre-trained model and scheduler
sd_pipeline = StableDiffusionPipeline.from_single_file(model_dir, torch_dtype=torch.float16).to("cuda")
sd_pipeline.scheduler = DPMSolverMultistepScheduler.from_config(sd_pipeline.scheduler.config)
sd_pipeline.safety_checker = None

# Optimize the pipeline for memory-efficient attention
sd_pipeline.enable_xformers_memory_efficient_attention(attention_op=MemoryEfficientAttentionFlashAttentionOp)
sd_pipeline.vae.enable_xformers_memory_efficient_attention(attention_op=None)

# Load LoRA adapters to the base model
lora_names = load_lora_adapters(sd_pipeline, weight_names)

##### **Step 2: Generate and show the result images**
Settings for this test are as follows:
- `LoRA weights`: 0, 0.25, 0.5, 0.75, 1.0
- `Positive prompt`: geralt of rivia, 1boy, masterpiece, best quality
- `Negative prompt`: lowres, bad anatomy, bad hands, text, error, missing fingers, extra digit, fewer digits, cropped, worst quality, low quality, normal quality, jpeg artifacts, signature, watermark, username, blurry

This time we add some `standard` quality-related positive and negative prompts to guide image generation.
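In Stable Diffusion pipelines, the negative prompt takes the place of the empty (unconditional) prompt in classifier-free guidance. Each denoising step is therefore pushed toward the positive prompt and away from the negative one.

Below is a toy sketch of the guidance formula on plain lists. The numbers are made up, and `guided_noise` is a hypothetical helper. For reference, the default `guidance_scale` of diffusers' `StableDiffusionPipeline` is 7.5.

```python
def guided_noise(noise_uncond, noise_cond, guidance_scale):
    """Classifier-free guidance: uncond + g * (cond - uncond), element-wise.

    With a negative prompt, `noise_uncond` is the noise prediction conditioned
    on the negative prompt instead of on an empty prompt.
    """
    return [u + guidance_scale * (c - u) for u, c in zip(noise_uncond, noise_cond)]


neg = [0.2, -0.1, 0.4]  # toy prediction for the negative prompt
pos = [0.5, 0.1, 0.0]   # toy prediction for the positive prompt
print(guided_noise(neg, pos, 7.5))
```

A scale of 0 returns the negative-prompt prediction, and 1 returns the positive-prompt prediction. Larger scales extrapolate further away from whatever the negative prompt describes.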

In [None]:
# Set the test prompts for the pre-trained model
test_positive_prompt = "geralt of rivia, 1boy, masterpiece, best quality"
test_negative_prompt = "lowres, bad anatomy, bad hands, text, error, missing fingers, extra digit, fewer digits, cropped, worst quality, low quality, normal quality, jpeg artifacts, signature, watermark, username, blurry"
seed = 3330428146

# Generate images with the specified adapter names and LoRA weights
test_images = generate_images_with_adapter(sd_pipeline, test_positive_prompt, test_negative_prompt, seed, steps, lora_names, lora_weights)

In [None]:
# Display the images
display_images_lora_weights(test_images, lora_names, lora_weights)

##### **Step 3: Free GPU memory**

In [None]:
# Free cuda memory
free_cuda_memory()

# Show current GPU memory status
memory_stats()

#### **Case 3: LoRA merges with a realistic-style pre-trained model**
- We compare the images generated by applying our LoRA adapters to a pre-trained model with a `realistic style`.
- This checkpoint model is from `CivitAI` (https://civitai.com/models/4201/realistic-vision-v60-b1?modelVersionId=130072, file name: `realisticVisionV60B1_v51VAE.safetensors`).
- This checkpoint model can generate high-quality realistic-style images.

##### **Step 1: Download the checkpoint model and load LoRA adapters**
If you download the checkpoint manually, place it in the base-model directory `./based-model`.
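Multi-gigabyte checkpoints are easy to truncate during a download. If the model page lists a SHA256 hash for the file, a quick check like the one below confirms that a downloaded or manually placed checkpoint matches before you load it. It uses only the standard library, and `sha256_of_file` is a hypothetical helper.

```python
import hashlib


def sha256_of_file(path, chunk_size=1 << 20):
    """Return the hex SHA-256 digest of a file, reading it in 1 MiB chunks."""
    digest = hashlib.sha256()
    with open(path, "rb") as f:
        for chunk in iter(lambda: f.read(chunk_size), b""):
            digest.update(chunk)  # stream the file instead of loading it whole
    return digest.hexdigest()


# Hypothetical usage: paste the SHA256 value shown on the model page.
# expected_sha256 = "<hash from the model page>"
# actual = sha256_of_file("./based-model/realisticVisionV60B1_v51VAE.safetensors")
# assert actual.lower() == expected_sha256.lower()
```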

In [None]:
# Download the pre-trained model (realistic style)
!wget "https://civitai.com/api/download/models/130072?type=Model&format=SafeTensor&size=pruned&fp=fp16" -O ./based-model/realisticVisionV60B1_v51VAE.safetensors
model_dir = "./based-model/realisticVisionV60B1_v51VAE.safetensors"

# Load the pre-trained model and scheduler
sd_pipeline = StableDiffusionPipeline.from_single_file(model_dir, torch_dtype=torch.float16).to("cuda")
sd_pipeline.scheduler = DPMSolverMultistepScheduler.from_config(sd_pipeline.scheduler.config)
sd_pipeline.safety_checker = None

# Optimize the pipeline for memory-efficient attention
sd_pipeline.enable_xformers_memory_efficient_attention(attention_op=MemoryEfficientAttentionFlashAttentionOp)
sd_pipeline.vae.enable_xformers_memory_efficient_attention(attention_op=None)

# Load LoRA adapters to the base model
lora_names = load_lora_adapters(sd_pipeline, weight_names)

##### **Step 2: Generate and show the result images**
Settings for this test are as follows:
- `LoRA weights`: 0, 0.25, 0.5, 0.75, 1.0
- `Inference steps`: 100
- `Positive prompt`: 1boy, geralt of rivia, full body, RAW photo, subject, 8k uhd, dslr, soft lighting, high quality, film grain, Fujifilm XT3
- `Negative prompt`: deformed iris, deformed pupils, semi-realistic, cgi, 3d, render, sketch, cartoon, drawing, anime, mutated hands and fingers, deformed, distorted, disfigured, poorly drawn, bad anatomy, wrong anatomy, extra limb, missing limb, floating limbs, disconnected limbs, mutation, mutated, ugly, disgusting, amputation

This time we use more `complicated` positive and negative prompts, as recommended in this checkpoint model's documentation.
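Long prompts like these can hit a hard limit. The CLIP text encoder used by Stable Diffusion 1.x reads at most 77 tokens, including the start and end tokens, and diffusers truncates anything longer with a warning.

The exact count needs the CLIP tokenizer itself, for example `len(sd_pipeline.tokenizer(test_negative_prompt).input_ids)`. The sketch below only computes a cheap lower bound, using a simplified, ASCII-only copy of CLIP's pre-tokenization pattern. It can flag prompts that are certainly too long, but it cannot prove that a prompt fits. `min_clip_tokens` and `surely_truncated` are hypothetical helpers.

```python
import re

# Simplified, ASCII-only version of CLIP's pre-tokenization pattern: common
# contractions, runs of letters, single digits and runs of other symbols.
# Each piece maps to at least one BPE token, so counts are lower bounds.
_PIECES = re.compile(r"'s|'t|'re|'ve|'m|'ll|'d|[a-z]+|[0-9]|[^\sa-z0-9]+")
CLIP_MAX_TOKENS = 77  # includes the start and end tokens


def min_clip_tokens(prompt):
    """Lower bound on the CLIP token count of `prompt`, including start/end tokens."""
    return len(_PIECES.findall(prompt.lower())) + 2


def surely_truncated(prompt):
    """True only when even the lower bound exceeds CLIP's 77-token window."""
    return min_clip_tokens(prompt) > CLIP_MAX_TOKENS


print(min_clip_tokens("1boy, geralt of rivia"))  # 8
```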

In [None]:
# Set the test prompts for the pre-trained model
test_positive_prompt = "1boy, geralt of rivia, full body, RAW photo, subject, 8k uhd, dslr, soft lighting, high quality, film grain, Fujifilm XT3"
test_negative_prompt = "deformed iris, deformed pupils, semi-realistic, cgi, 3d, render, sketch, cartoon, drawing, anime, mutated hands and fingers, deformed, distorted, disfigured, poorly drawn, bad anatomy, wrong anatomy, extra limb, missing limb, floating limbs, disconnected limbs, mutation, mutated, ugly, disgusting, amputation"
seed = 3599032598
steps = [100]

# Generate images with the specified adapter names and LoRA weights
test_images = generate_images_with_adapter(sd_pipeline, test_positive_prompt, test_negative_prompt, seed, steps, lora_names, lora_weights)

In [None]:
# Display the images
display_images_lora_weights(test_images, lora_names, lora_weights)

GPU memory is not freed after this case, because Case 4 reuses this realistic-style pipeline.

#### **Case 4: Generate images with different inference steps for denoising**
- We found that the number of `inference steps` is crucial for generating good images: `too few inference steps` leave massive noise in the generated images.
- More inference steps require more GPU computation and time, so every task has to decide on this trade-off.

We test the `realistic`-style checkpoint because it appears to require more `inference steps` than the styles in Cases 1 and 2 (Case 3 used 100 steps). This is most likely because high-resolution, photorealistic images demand more detail, and the number of steps needed depends on the visual effect the style pursues.
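The pipeline uses `DPMSolverMultistepScheduler`, which needs one UNet evaluation per inference step. Generation time should therefore grow roughly linearly with the step count.

The sketch below estimates the relative cost of the step counts tested in this case. It assumes UNet evaluations dominate the runtime, since prompt encoding and VAE decoding happen only once per image. `relative_denoising_cost` is a hypothetical helper.

```python
def relative_denoising_cost(step_counts, baseline=None):
    """Relative cost of each step count versus `baseline` (default: the first entry).

    Assumes one UNet evaluation per inference step, as with DPM-Solver multistep,
    and that these evaluations dominate the total runtime.
    """
    base = baseline if baseline is not None else step_counts[0]
    return [round(n / base, 2) for n in step_counts]


print(relative_denoising_cost([30, 40, 50, 60, 70]))  # [1.0, 1.33, 1.67, 2.0, 2.33]
print(relative_denoising_cost([100], baseline=30))     # [3.33]
```

Under this assumption, the 100 steps used in Case 3 cost about 3.3 times as much as 30 steps. Finding the lowest step count that still removes the noise is therefore worth the effort.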

##### **Step 1: Generate and show the result images**
- We `reuse` the realistic-style pipeline loaded in Case 3, so we do not need to download any new checkpoints.
- We select `lora-4`, the adapter trained for the `most epochs`, and set its `LoRA weight` to the maximum tested value, `1.0`. This lets us find the minimum number of `inference steps` needed to generate a high-quality image without residual noise.

Settings for this test are as follows:
- `LoRA name`: lora-4 (trained for the full 4 epochs)
- `LoRA weight`: 1.0
- `Inference steps`: 30, 40, 50, 60, 70
- `Positive prompt`: geralt of rivia, 1boy, masterpiece, best quality
- `Negative prompt`: lowres, bad anatomy, bad hands, text, error, missing fingers, extra digit, fewer digits, cropped, worst quality, low quality, normal quality, jpeg artifacts, signature, watermark, username, blurry
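The same `seed` is passed for every step count, so all images should start from the same initial latent noise and differ only in the number of denoising steps. In diffusers, this is typically achieved by passing `generator=torch.Generator("cuda").manual_seed(seed)` to the pipeline call. Whether `generate_images_with_adapter` does exactly this is an assumption.

The sketch below shows the same principle with Python's standard-library RNG. `initial_latent_noise` is a hypothetical helper.

```python
import random


def initial_latent_noise(seed, size=8):
    """Toy stand-in for a latent noise tensor: `size` Gaussian samples from a seeded RNG."""
    rng = random.Random(seed)  # private generator, independent of global random state
    return [rng.gauss(0.0, 1.0) for _ in range(size)]


demo_seed = 3599032598  # the seed set in Case 3 and reused in Case 4
same_start = initial_latent_noise(demo_seed) == initial_latent_noise(demo_seed)
different_start = initial_latent_noise(demo_seed) != initial_latent_noise(demo_seed + 1)
print(same_start, different_start)  # True True
```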

In [None]:
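# We choose the LoRA adapter with the longest training period (lora-4 adapter), set the weight to maximum (1.0), and only test on different steps
test_steps = [30, 40, 50, 60, 70]
lora_names = ["lora-4"]
lora_weights = [1.0]

# Set the prompts listed in the settings above explicitly; otherwise the Case 3 prompts would be reused
# The seed (3599032598) is still the one set in Case 3
test_positive_prompt = "geralt of rivia, 1boy, masterpiece, best quality"
test_negative_prompt = "lowres, bad anatomy, bad hands, text, error, missing fingers, extra digit, fewer digits, cropped, worst quality, low quality, normal quality, jpeg artifacts, signature, watermark, username, blurry"

# Generate images with the specified adapter names and LoRA weights
test_images = generate_images_with_adapter(sd_pipeline, test_positive_prompt, test_negative_prompt, seed, test_steps, lora_names, lora_weights)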

In [None]:
# Display the images
display_images_steps(test_images, test_steps, sd_pipeline)

##### **Step 2: Free GPU memory**

In [None]:
# Free cuda memory
free_cuda_memory()

# Show current GPU memory status
memory_stats()

# **Ending: Special Thanks**

<p style="font-size:24px;"> Thank you for reading this project. This concludes the project, which is based on the course EECE 570 - Fundamentals of Visual Computing (2023 Winter Session Term 2) at the University of British Columbia, Canada. </p>

<p style="font-size:24px;"> I extend my deepest gratitude to Professor Xiaoxiao Li and all the teaching assistants who have enriched this course with their extensive knowledge of the latest research and developments in Computer Vision. </p>

<p style="font-size:24px;"> As AI/ML technologies improve at an incredibly fast pace, their potential to significantly change our lives becomes more evident each day. This project serves as a starting point for further exploration of the Computer Vision and Generative AI fields. I am excited about the potential directions of this research and look forward to expanding the project to cover valuable future topics. </p>

<p style="font-size:24px;"> Special thanks to the teams at Stability AI and Hugging Face, the authors of open-source projects (Kohya, webUI, etc.), and all the researchers and developers contributing to the ML field. I am optimistic about the bright future that these technological innovations will bring us. </p>
