# Custom Depth-to-Image Model Playground

#### made by [なんか](https://twitter.com/_determina_)

[![Open In Colab](https://colab.research.google.com/assets/colab-badge.svg)](https://colab.research.google.com/github/discus0434/custom-depth2image-playground/blob/main/Custom_Depth_to_Image_Playground.ipynb)

---

### This notebook:
  - Applies task arithmetic to adapt a depth-to-image model to a specified domain
  - Lets you play with the resulting model (powered by [AUTOMATIC1111's WebUI](https://github.com/AUTOMATIC1111/stable-diffusion-webui))


---


### THIS NOTEBOOK FORCES THE RUNTIME TO CRASH AT **SECTION 3.1**!!! 
### IF YOU RUN ALL CELLS AT ONCE, THE RUNTIME WILL HALT THERE. RUN SECTION 3.2 AFTER THE CRASH. 
### THIS IS 100% INTENDED BEHAVIOR.

# 0. Allocate GPU

In [None]:
!nvidia-smi

# 1. Setup

## 1.1 Install Requirements

In [None]:
!pip install -e git+https://github.com/CompVis/taming-transformers.git@master#egg=taming-transformers
!pip install pytorch_lightning tensorboard omegaconf einops taming-transformers transformers kornia test-tube matplotlib pandas
!pip install diffusers invisible-watermark

In [None]:
import os
import gc
import copy
import torch
import numpy as np
import pandas as pd
import matplotlib.pyplot as plt

## 1.2 Clone AUTOMATIC1111's WebUI

In [None]:
%cd /content/
!git clone https://github.com/AUTOMATIC1111/stable-diffusion-webui
%cd /content/stable-diffusion-webui

# 2. Task Arithmetic

## 2.1 Download Model Weights (base + task-specified models)

### Base model & depth model
- Stable Diffusion V2.0 base
- Stable Diffusion V2.0 depth

In [None]:
!wget https://huggingface.co/stabilityai/stable-diffusion-2-base/resolve/main/512-base-ema.ckpt \
  -O /content/stable-diffusion-webui/models/Stable-diffusion/512-base-ema.ckpt
!wget https://huggingface.co/stabilityai/stable-diffusion-2-depth/resolve/main/512-depth-ema.ckpt \
  -O /content/stable-diffusion-webui/models/Stable-diffusion/512-depth-ema.ckpt

### A model of your choice

In [None]:
#@markdown #### MODEL MUST BE FINETUNED FROM STABLE DIFFUSION V2.0 OR ITS DESCENDANTS!!
SPECIFIED_MODEL_URL = "https://huggingface.co/hakurei/waifu-diffusion-v1-4/resolve/main/wd-1-4-anime_e2.ckpt"  # @param {type: "string"}
SPECIFIED_MODEL_NAME = SPECIFIED_MODEL_URL.split("/")[-1]
SPECIFIED_MODEL_PATH = f"/content/stable-diffusion-webui/models/Stable-diffusion/{SPECIFIED_MODEL_NAME}"

!wget {SPECIFIED_MODEL_URL} -O {SPECIFIED_MODEL_PATH}

## 2.2 Perform Task Operations
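
The cell below combines three checkpoints per weight as `base + DEPTH * (depth - base) + SPECIFIED_MODEL * (specified - base)`. Here is a minimal sketch of that arithmetic on toy data, assuming plain Python lists stand in for tensors. The function name `merge_task_vectors` and the keys `unet.mid` and `unet.input` are hypothetical and only for illustration. The sketch skips mismatched layers with a length check, whereas the actual cell skips the first input block by its key name.

```python
def merge_task_vectors(base, depth, specified, depth_scale, specified_scale):
    """Toy task arithmetic over dicts of flat float lists (stand-ins for tensors).

    For each key: base + depth_scale * (depth - base) + specified_scale * (specified - base).
    Keys that are missing from a model, or whose lengths differ, keep the depth weights.
    """
    merged = {}
    for key, depth_w in depth.items():
        base_w = base.get(key)
        spec_w = specified.get(key)
        shapes_match = (
            base_w is not None
            and spec_w is not None
            and len(base_w) == len(depth_w) == len(spec_w)
        )
        if not shapes_match:
            # e.g. an input layer with 5 channels in the depth model and 4 elsewhere
            merged[key] = list(depth_w)
            continue
        merged[key] = [
            b + depth_scale * (d - b) + specified_scale * (s - b)
            for b, d, s in zip(base_w, depth_w, spec_w)
        ]
    return merged


# Hypothetical toy checkpoints
base = {"unet.mid": [1.0, 2.0], "unet.input": [0.0] * 4}
depth = {"unet.mid": [2.0, 2.0], "unet.input": [0.5] * 5}
specified = {"unet.mid": [1.0, 4.0], "unet.input": [0.1] * 4}

merged = merge_task_vectors(base, depth, specified, depth_scale=0.5, specified_scale=0.5)
print(merged["unet.mid"])         # both task vectors added on top of the base weights
print(len(merged["unet.input"]))  # mismatched layer is kept from the depth model
```

With both multipliers set to `0`, every matching layer falls back to the base weights. Raising one multiplier moves the result toward that model.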

In [None]:
# @markdown ### Optional Configuration

# @markdown ---

# @markdown #### 1. VAE replacement
# @markdown Keep this checked to replace the original SD VAE with the specified model's VAE.
REPLACE_VAE = True  # @param {type: "boolean"}

# @markdown ---

# @markdown #### 2. Set multipliers for task vectors
# @markdown If `DEPTH = 0.5` and `SPECIFIED_MODEL = 0.1`, the result WILL NOT reflect the specified model.

# @markdown If `DEPTH = 0.1` and `SPECIFIED_MODEL = 0.8`, the result WILL reflect the specified model, but not depth.

# @markdown (I recommend `DEPTH = 0.45` and `SPECIFIED_MODEL = 0.75` for Waifu Diffusion.)
DEPTH = 0.45  # @param {type: "number"}
SPECIFIED_MODEL = 0.75  # @param {type: "number"}

# @markdown ---
# @markdown #### 3. Set custom depth model name
# @markdown The customized depth model is saved to `/content/stable-diffusion-webui/models/Stable-diffusion/{MODEL_NAME}.ckpt`.

# @markdown To use it locally, download it together with the `yaml` file that has the same name as the model.

MODEL_NAME = "custom-depth"  # @param {type: "string"}
MODEL_PATH = f"/content/stable-diffusion-webui/models/Stable-diffusion/{MODEL_NAME}.ckpt"
MODEL_YAML_PATH = f"/content/stable-diffusion-webui/models/Stable-diffusion/{MODEL_NAME}.yaml"

# load models
base = torch.load(
    "/content/stable-diffusion-webui/models/Stable-diffusion/512-base-ema.ckpt",
    weights_only=True,
    map_location="cuda",
)
task_specified = torch.load(
    SPECIFIED_MODEL_PATH, 
    weights_only=True,
    map_location="cuda",
)

# to save RAM, remove unnecessary weights
for key in set(task_specified["state_dict"].keys()) | set(base["state_dict"].keys()):
    if "cond_stage_model" in key:
        # pop from each dict separately so a key missing from one is still removed from the other
        task_specified["state_dict"].pop(key, None)
        base["state_dict"].pop(key, None)
    elif "model_ema" in key:
        base["state_dict"].pop(key, None)

# make task vector of specified model
for key in set(task_specified["state_dict"].keys()):
    if "model.diffusion_model" in key:
        task_specified["state_dict"][key] = task_specified["state_dict"][key] - base["state_dict"][key]

depth = torch.load(
    "/content/stable-diffusion-webui/models/Stable-diffusion/512-depth-ema.ckpt",
    map_location="cpu",
)

# perform task operation
for key in set(depth["state_dict"].keys()) & set(task_specified["state_dict"].keys()):
    # replace weight of VAE with specified weight, if REPLACE_VAE is True
    if "first_stage_model" in key:
        if REPLACE_VAE:
            depth["state_dict"][key] = task_specified["state_dict"][key].cpu()
    # skip the first input block: the depth model takes 5 input channels, so its shape differs from the other models
    elif "model.diffusion_model.input_blocks.0.0" in key:
        pass
    # otherwise, add task weight
    elif "model.diffusion_model" in key:
        task_depth = depth["state_dict"][key] - base["state_dict"][key].cpu()
        depth["state_dict"][key] = base["state_dict"][key].cpu() + (task_depth * DEPTH + task_specified["state_dict"][key].cpu() * SPECIFIED_MODEL)

del task_specified, base
gc.collect()

# save customized model
torch.save(depth, MODEL_PATH)

del depth
gc.collect()

# make yaml file to use depth model in webui
with open(MODEL_YAML_PATH, "w") as f:
  f.write(
"""
model:
  base_learning_rate: 5.0e-07
  target: ldm.models.diffusion.ddpm.LatentDepth2ImageDiffusion
  params:
    linear_start: 0.00085
    linear_end: 0.0120
    num_timesteps_cond: 1
    log_every_t: 200
    timesteps: 1000
    first_stage_key: "jpg"
    cond_stage_key: "txt"
    image_size: 64
    channels: 4
    cond_stage_trainable: false
    conditioning_key: hybrid
    scale_factor: 0.18215
    monitor: val/loss_simple_ema
    finetune_keys: null
    use_ema: False

    depth_stage_config:
      target: ldm.modules.midas.api.MiDaSInference
      params:
        model_type: "dpt_hybrid"

    unet_config:
      target: ldm.modules.diffusionmodules.openaimodel.UNetModel
      params:
        use_checkpoint: True
        image_size: 32 # unused
        in_channels: 5
        out_channels: 4
        model_channels: 320
        attention_resolutions: [ 4, 2, 1 ]
        num_res_blocks: 2
        channel_mult: [ 1, 2, 4, 4 ]
        num_head_channels: 64 # need to fix for flash-attn
        use_spatial_transformer: True
        use_linear_in_transformer: True
        transformer_depth: 1
        context_dim: 1024
        legacy: False

    first_stage_config:
      target: ldm.models.autoencoder.AutoencoderKL
      params:
        embed_dim: 4
        monitor: val/rec_loss
        ddconfig:
          #attn_type: "vanilla-xformers"
          double_z: true
          z_channels: 4
          resolution: 256
          in_channels: 3
          out_ch: 3
          ch: 128
          ch_mult:
            - 1
            - 2
            - 4
            - 4
          num_res_blocks: 2
          attn_resolutions: [ ]
          dropout: 0.0
        lossconfig:
          target: torch.nn.Identity

    cond_stage_config:
      target: ldm.modules.encoders.modules.FrozenOpenCLIPEmbedder
      params:
        freeze: True
        layer: "penultimate"
"""
  )

# remove models
os.remove("/content/stable-diffusion-webui/models/Stable-diffusion/512-base-ema.ckpt")
os.remove("/content/stable-diffusion-webui/models/Stable-diffusion/512-depth-ema.ckpt")
os.remove(SPECIFIED_MODEL_PATH)

# 3. Launch WebUI

## 3.1 Crash it!

To free the RAM and VRAM used by the task operation, the next cell deliberately forces the runtime to crash.

You DON'T need to restart and re-execute the cells above. After the crash, just run the cell in section 3.2.
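
Because the crash discards everything held in memory, it can be worth confirming first that the merged checkpoint and its `yaml` file were written to disk. Below is a minimal sketch of such a guard. The helper names `outputs_saved` and `kill_runtime_if_saved` are hypothetical and not part of this notebook, and `signal.SIGKILL` assumes a Unix-like runtime such as Colab.

```python
import os
import signal


def outputs_saved(paths):
    """Return True only if every path is an existing, non-empty file."""
    return all(os.path.isfile(p) and os.path.getsize(p) > 0 for p in paths)


def kill_runtime_if_saved(paths):
    """Kill the current process, but only after the outputs are confirmed on disk."""
    if not outputs_saved(paths):
        raise FileNotFoundError(f"Missing or empty output file among: {list(paths)}")
    # Same effect as os.kill(os.getpid(), 9): the process dies immediately
    os.kill(os.getpid(), signal.SIGKILL)


# Usage (commented out so that running this sketch does not kill the process):
# kill_runtime_if_saved([MODEL_PATH, MODEL_YAML_PATH])
```

Note that `MODEL_PATH` and `MODEL_YAML_PATH` only exist while the section 2.2 cell's variables are still in memory, so a check like this has to run before the crash, not after.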

In [None]:
os.kill(os.getpid(), 9)

## 3.2 Run AUTOMATIC1111's WebUI

The WebUI's URL is printed after `Running on public URL`.

I recommend using CLIP's penultimate layer.

---

#### Some notes on prompting

When using Waifu Diffusion V1.4's task vector, you may get better results with `((masterpiece, best quality))` as the prompt prefix.

Here is an example negative prompt that I picked up somewhere:
```
lowres, ((bad anatomy)), ((bad hands)), text, missing finger, extra digits, fewer digits, blurry, ((mutated hands and fingers)), (poorly drawn face), ((mutation)), ((deformed face)), (ugly), ((bad proportions)), ((extra limbs)), extra face, (double head), (extra head), ((extra feet)), monster, logo, cropped, worst quality, low quality, normal quality, jpeg, humpbacked, long body, long neck, ((jpeg artifacts)), ((bad composition))
```

In [None]:
# run webui
%cd /content/stable-diffusion-webui
!COMMANDLINE_ARGS="--share --gradio-debug --no-half-vae" REQS_FILE="requirements.txt" python launch.py