# Mega similarity check #

### Abstract ###

- Distance metric is not meaningful, **but identity is.**
- However, I still prefer L2 distance ~~because it looks like I'm doing an ML task~~.
- Objective: Try to explain the *subjective* experience of model differences, especially whether any components have been **changed**, ignoring how much they have changed.
- Thanks ["RC"](https://github.com/CCRcmcpe) for providing the initial script (and the idea).
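
As a minimal sketch of the "identity over distance" idea: the question is only whether two weight sets are the same, not how large the gap is. Plain Python lists stand in for tensors here (an assumption to keep the sketch dependency-free); the notebook itself uses `torch.equal` and `torch.dist` for these two roles. The values and the helper names `is_changed` and `l2_distance` are hypothetical.

```python
import math

def is_changed(a, b):
    """True when the two flattened weight lists are not identical."""
    return a != b

def l2_distance(a, b):
    """Euclidean (L2) distance between two equal-length weight lists."""
    return math.dist(a, b)

base = [0.5, -1.0, 2.0]
same = [0.5, -1.0, 2.0]
tuned = [0.5, -1.0, 2.001]  # hypothetical tiny fine-tune

# The magnitude of the distance is ignored; only "zero or not" matters.
print(is_changed(base, same), l2_distance(base, same))
print(is_changed(base, tuned), l2_distance(base, tuned) > 0)
```

Any non-zero distance is read as "this component was touched", which is why the choice between L1 and L2 hardly matters for this purpose.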

### Input ### 
- See the next cell: paths of the models and any abbreviations you like.

### Output ###
- TONS of JSON, showing `(layer_name, distance_between_2_models)`
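
A small sketch of how one of these JSON files might be read back, assuming the layout written by `cmp_json` (`kv` for shared layers, `da` / `db` for layers present in only one model). The helper `changed_layers` and the sample contents are hypothetical.

```python
import json

def changed_layers(report):
    """Return layer names whose recorded distance is not exactly zero."""
    changed = []
    for name, value in sorted(report["kv"].items()):
        # "nan" marks a shape mismatch, which also counts as a change.
        if value == "nan" or value != 0:
            changed.append(name)
    return changed

# Hypothetical report contents, for illustration only
sample = json.loads("""
{
  "kv": {
    "decoder.conv_in.weight": 0.0123,
    "encoder.conv_in.weight": 0.0,
    "quant_conv.weight": "nan"
  },
  "da": [],
  "db": ["model_ema.decay"]
}
""")

print(changed_layers(sample))
```

For a real file, replace `sample` with `json.load(open("json/vae_wd1_wd2.json"))` or any other output path.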

### Special case for comparison ###
- Text encoder for model `nai`: `"cond_stage_model.transformer", "cond_stage_model.transformer.text_model"`
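
The rename can be sketched as a key mapping. This version matches on the prefix rather than using `str.replace` as the notebook does, so that keys already in the SD form are left alone; that is a design choice of this sketch, not the original behaviour. The function name and the example keys are hypothetical.

```python
OLD_PREFIX = "cond_stage_model.transformer"
NEW_PREFIX = "cond_stage_model.transformer.text_model"

def nai_to_sd_key(key):
    """Map a NAI text-encoder key to the SD-style name; leave other keys untouched."""
    is_text_encoder = key.startswith(OLD_PREFIX + ".")
    already_mapped = key.startswith(NEW_PREFIX + ".")
    if is_text_encoder and not already_mapped:
        return NEW_PREFIX + key[len(OLD_PREFIX):]
    return key

# Hypothetical key, for illustration only
print(nai_to_sd_key("cond_stage_model.transformer.embeddings.token_embedding.weight"))
```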

### Some layer names to interpret ###
- `first_stage_model`: VAE
- `cond_stage_model`: Text Encoder
- `model.diffusion_model`: Diffusion model
- `model_ema`: EMA model for training
- `cumprod`, `betas`, `alphas`: Noise schedule constants of the diffusion process (precomputed buffers, not trained weights)
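
To read the JSON output by component, the prefixes above can be turned into a small lookup. This is only a sketch: the helper names and the example keys are hypothetical, and anything without a listed prefix is simply reported as "other".

```python
from collections import Counter

# Prefixes taken from the list above; the labels are just display names.
COMPONENT_PREFIXES = [
    ("first_stage_model.", "VAE"),
    ("cond_stage_model.", "Text Encoder"),
    ("model.diffusion_model.", "Diffusion model"),
    ("model_ema.", "EMA model"),
]

def component_of(key):
    """Return the component label for a state-dict key, or "other"."""
    for prefix, label in COMPONENT_PREFIXES:
        if key.startswith(prefix):
            return label
    return "other"

def count_by_component(keys):
    """Count how many keys fall under each component."""
    return Counter(component_of(k) for k in keys)

# Hypothetical keys, for illustration only
example_keys = [
    "first_stage_model.decoder.conv_in.weight",
    "cond_stage_model.transformer.text_model.final_layer_norm.weight",
    "model.diffusion_model.out.2.weight",
    "model_ema.decay",
    "betas",
]
print(count_by_component(example_keys))
```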

In [1]:
# Set the paths here.
ofp_folder = "json/"
cmp_mapping = [
    [
        "vae", [
            [
                ["full", "../../stable-diffusion-webui/tmp/VAE/kl-f8-full.ckpt"],
                ["wd1", "../../stable-diffusion-webui/tmp/VAE/kl-f8-anime.ckpt"]
            ],
            [
                ["full", "../../stable-diffusion-webui/tmp/VAE/kl-f8-full.ckpt"],
                ["sd84k", "../../stable-diffusion-webui/tmp/VAE/vae-ft-mse-840000-ema-pruned.ckpt"]
            ],
            [
                ["wd1", "../../stable-diffusion-webui/tmp/VAE/kl-f8-anime.ckpt"],
                ["wd2", "../../stable-diffusion-webui/tmp/VAE/kl-f8-anime2.ckpt"]
            ],
            [
                ["sd84k", "../../stable-diffusion-webui/tmp/VAE/vae-ft-mse-840000-ema-pruned.ckpt"],
                ["wd2", "../../stable-diffusion-webui/tmp/VAE/kl-f8-anime2.ckpt"]
            ],
            [
                ["sd84k", "../../stable-diffusion-webui/tmp/VAE/vae-ft-mse-840000-ema-pruned.ckpt"],
                ["nai", "../../stable-diffusion-webui/tmp/VAE/animevae.pt"]
            ],
            [
                ["wd2", "../../stable-diffusion-webui/tmp/VAE/kl-f8-anime2.ckpt"],
                ["nai", "../../stable-diffusion-webui/tmp/VAE/animevae.pt"]
            ]
        ]            
    ],
    [
        "sd1", [
            [
                ["sd12", "../../stable-diffusion-webui/models/Stable-diffusion/sd-v1-2-full-ema.ckpt"],
                ["sd14", "../../stable-diffusion-webui/models/Stable-diffusion/sd-v1-4-full-ema.ckpt"]
            ]
        ]
    ],
    [
        "nai", [           
            [
                ["sd147g", "../../stable-diffusion-webui/models/Stable-diffusion/sd-v1-4-full-ema.ckpt"],
                ["nai7g", "../../stable-diffusion-webui/tmp/nodelaileak/animefull-latest.ckpt"]
            ],
            [
                ["sd144g", "../../stable-diffusion-webui/models/Stable-diffusion/sd-v1-4.ckpt"],
                ["nai4g", "../../stable-diffusion-webui/models/Stable-diffusion/animefull-final-pruned.ckpt"]
            ],
            [
                ["nai7g", "../../stable-diffusion-webui/tmp/nodelaileak/animefull-latest.ckpt"],
                ["nai4g", "../../stable-diffusion-webui/models/Stable-diffusion/animefull-final-pruned.ckpt"]
            ],
            [        
                ["sd144g", "../../stable-diffusion-webui/models/Stable-diffusion/sd-v1-4.ckpt"],
                ["ac", "../../stable-diffusion-webui/models/Stable-diffusion/ACertainty.ckpt"]
            ]
        ]
    ],
    [
        "any3", [
            [
                ["nai4g", "../../stable-diffusion-webui/models/Stable-diffusion/animefull-final-pruned.ckpt"],
                ["any3", "../../stable-diffusion-webui/models/Stable-diffusion/Anything-V3.0-pruned-fp32.ckpt"]
            ],
            [
                ["mmke", "../../stable-diffusion-webui/models/Stable-diffusion/momoko-e.ckpt"],
                ["any3", "../../stable-diffusion-webui/models/Stable-diffusion/Anything-V3.0-pruned-fp32.ckpt"]
            ], 
            [
                ["nai4g", "../../stable-diffusion-webui/models/Stable-diffusion/animefull-final-pruned.ckpt"],
                ["mmke", "../../stable-diffusion-webui/models/Stable-diffusion/momoko-e.ckpt"]
            ],      
            [
                ["any3", "../../stable-diffusion-webui/models/Stable-diffusion/Anything-V3.0-pruned-fp32.ckpt"],
                ["basil", "../../stable-diffusion-webui/models/Stable-diffusion/basil_mix.ckpt"]
            ],
            [
                ["any3", "../../stable-diffusion-webui/models/Stable-diffusion/Anything-V3.0-pruned-fp32.ckpt"],
                ["any4", "../../stable-diffusion-webui/models/Stable-diffusion/anything-v4.0-pruned.safetensors"]
            ]
        ]
    ],
    [
        "nmfsan", [
            [  
                ["ao", "../../stable-diffusion-webui/models/Stable-diffusion/sample-nd-epoch59.safetensors"],
                ["cc", "../../stable-diffusion-webui/models/Stable-diffusion/bp_nman_e29.safetensors"]
            ],
            [  
                ["any3", "../../stable-diffusion-webui/models/Stable-diffusion/Anything-V3.0-pruned-fp32.ckpt"],
                ["cc", "../../stable-diffusion-webui/models/Stable-diffusion/bp_nman_e29.safetensors"]
            ]
        ]
    ],
    [
        "sd2", [
            [  
                ["sd14", "../../stable-diffusion-webui/models/Stable-diffusion/sd-v1-4.ckpt"],
                ["sd21", "../../stable-diffusion-webui/tmp/SD2/sd-21-768-ema-pruned.ckpt"]
            ],
            [  
                ["sd20", "../../stable-diffusion-webui/tmp/SD2/sd-20-768-v-ema.ckpt"],
                ["sd21", "../../stable-diffusion-webui/tmp/SD2/sd-21-768-ema-pruned.ckpt"]
            ],
            [  
                ["sd20", "../../stable-diffusion-webui/tmp/SD2/sd-20-768-v-ema.ckpt"],
                ["wd14", "../../stable-diffusion-webui/tmp/SD2/wd-1-4-anime_e1.ckpt"]
            ],
            [  
                ["sd21", "../../stable-diffusion-webui/tmp/SD2/sd-21-768-ema-pruned.ckpt"],
                ["cjd211", "../../stable-diffusion-webui/tmp/SD2/cjd-v2-1-1.safetensors"]
            ],
            [  
                ["wd14", "../../stable-diffusion-webui/tmp/SD2/wd-1-4-anime_e1.ckpt"],
                ["p1at", "../../stable-diffusion-webui/tmp/SD2/wd-1-4-sd-2-1-025-text-encoder.ckpt"]
            ],
            [  
                ["wd14e1", "../../stable-diffusion-webui/tmp/SD2/wd-1-4-anime_e1.ckpt"],
                ["wd14e2", "../../stable-diffusion-webui/tmp/SD2/wd-1-4-anime_e2.ckpt"]
            ],
            [  
                ["sd21", "../../stable-diffusion-webui/tmp/SD2/sd-21-768-ema-pruned.ckpt"],
                ["jrd", "../../stable-diffusion-webui/tmp/SD2/sd2_1_ucg_autotagger_ruminant_notte_12_21.ckpt"]
            ]        
        ]  
    ]
]

In [2]:
from pathlib import Path
import click
import torch
import json 
from safetensors.torch import load_file

import time

In [3]:
g_device = "cpu"

In [4]:
def load_model(path: Path, device: str, print_ptl_info=False) -> dict[str, torch.Tensor]:
    if ".safetensors" in path.suffixes:
        return load_file(path, device=device)
    else:
        ckpt = torch.load(path, map_location=device)
        if print_ptl_info and "epoch" in ckpt and "global_step" in ckpt:
            print(f"[I] {path.name}: epoch {ckpt['epoch']}, step {ckpt['global_step']}")
        return ckpt["state_dict"] if "state_dict" in ckpt else ckpt


def check_equal_shape(a: torch.Tensor, b: torch.Tensor, fn):
    if a.shape != b.shape:
        return "nan" #"DIFFERENT SHAPE"

    return fn(a.type(torch.float),b.type(torch.float))
    #return fn(a.reshape(-1), b.reshape(-1))


METRIC_MAP = {
    "equal": torch.equal,
    "l1": lambda a, b: check_equal_shape(a, b, lambda a, b: torch.dist(a, b, p=1)),
    "l2": lambda a, b: check_equal_shape(a, b, lambda a, b: torch.dist(a, b, p=2)),
    "cossim": lambda a, b: check_equal_shape(a, b, lambda a, b: torch.mean(torch.cosine_similarity(a, b, dim=0)))
}

In [5]:
def cmp_c(a_path, b_path, device, metric, no_ptl_info):
    metric_fn = METRIC_MAP[metric]
    
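    # Paths may arrive as bytes; decode those and leave str / Path values as-is.
    if isinstance(a_path, bytes):
        a_path = a_path.decode('UTF-8')
    if isinstance(b_path, bytes):
        b_path = b_path.decode('UTF-8')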

    a = load_model(Path(a_path), device, not no_ptl_info)
    b = load_model(Path(b_path), device, not no_ptl_info)

    ak = set(a.keys())
    bk = set(b.keys())
    
    keys_inter = ak.intersection(bk)
    da = list(ak.difference(bk))
    db = list(bk.difference(ak))
    kv = {}
    for k in keys_inter:
        rt = metric_fn(a[k], b[k])
        # Tensors become plain Python numbers; "nan" and True / False are kept as-is.
        if isinstance(rt, torch.Tensor):
            rt = rt.numpy().tolist()
        kv[k] = rt

    #Special case: NAI renamed the TE (claimed using GPT-2), so map its keys onto the SD names.
    old_prefix, new_prefix = "cond_stage_model.transformer", "cond_stage_model.transformer.text_model"
    a_is_nai, b_is_nai = "animefull" in str(a_path), "animefull" in str(b_path)
    if a_is_nai != b_is_nai:
        # (NAI state dict, its unmatched keys, the other state dict)
        nai, nai_only, other = (a, da, b) if a_is_nai else (b, db, a)
        for k in nai_only:
            mapped = k.replace(old_prefix, new_prefix)
            # Skip keys without a renamed counterpart instead of raising KeyError.
            if old_prefix in k and mapped in other:
                rt = metric_fn(nai[k], other[mapped])
                # rt is a tensor, or "nan" / bool when the metric returns no tensor.
                kv["nai." + k] = rt.numpy().tolist() if isinstance(rt, torch.Tensor) else rt

    return kv, da, db

In [6]:
def cmp_json(pa,pb,ofp,d="l2",npi=True):
    kv, da, db = cmp_c(Path(pa), Path(pb), g_device, d, npi)
    dj = {'kv':kv, 'da':da, 'db':db}
    with open(ofp, "w") as outfile:
        json.dump(dj, outfile, indent=4, sort_keys=True)

In [7]:
# Testing: Obvious result
#cmp_json("./cosmoany.safetensors", "./cosmoany.safetensors", "json/test.json")

In [8]:
cmp_count = 0
ts = time.time()
for cm0 in cmp_mapping:
    ofp0 = cm0[0]
    for pab in cm0[1]: 
        pak = pab[0][0]
        pav = pab[0][1]
        pbk = pab[1][0]
        pbv = pab[1][1]
        ofp = "{}{}_{}_{}.json".format(ofp_folder, ofp0, pak, pbk)
        print(ofp)
        cmp_count = cmp_count + 1
        cmp_json(pav, pbv, ofp)
te = time.time()
print("Compare: {}, time: {} sec".format(cmp_count, int(te - ts)))

json/vae_full_wd1.json
json/vae_full_sd84k.json
json/vae_wd1_wd2.json
json/vae_sd84k_wd2.json
json/vae_sd84k_nai.json
json/vae_wd2_nai.json
json/sd1_sd12_sd14.json
json/nai_sd147g_nai7g.json
json/nai_sd144g_nai4g.json
json/nai_nai7g_nai4g.json
json/nai_sd144g_ac.json
json/any3_nai4g_any3.json
json/any3_mmke_any3.json
json/any3_nai4g_mmke.json
json/any3_any3_basil.json
json/any3_any3_any4.json
json/nmfsan_ao_cc.json
json/nmfsan_any3_cc.json
json/sd2_sd14_sd21.json
json/sd2_sd20_sd21.json
json/sd2_sd20_wd14.json
json/sd2_sd21_cjd211.json
json/sd2_wd14_p1at.json
json/sd2_wd14e1_wd14e2.json
json/sd2_sd21_jrd.json
Compare: 25, time: 148 sec


### Findings (VAE) ###
- `kl-f8` vs SD prune: Some layers are pruned.
- `kl-f8` vs WD1: Both `encoder` and `decoder` are trained
- WD1 vs WD2: Only `decoder` is trained
- `kl-f8` vs NAI: Only `decoder` is trained. **However, it is the same as the one bundled with SD v1.4. See below.**

### Findings (NAI) ###
- SD 7G vs SD 4G: EMA pruned. *Applies to all models*.
- SD 7G vs NAI 7G: **Same "text encoder" (renamed layer) and "VAE".**
- ACertainty: "Seriously fine-tuned from SD with tons of (NAI) AIGC." **confirmed.** However, the VAE decoder is different.

### Findings (SD Variant) ###
- momoko-e: **Dreambooth**. Text encoder is **partially changed.**
- Anything v3 / BasilMix / Anything v4 etc.: **Merged model. All layers are changed.**
- NAI: Some `cumprod` layers are dropped
- ANY3: Same as NAI (merged)
- AC: Same as NAI (???)

### Findings (NMFSAN) ###
- NMFSAN: There is no `cond_stage_model`, so the "last text encoder" is loaded, or `None` (which will generate glitched images). There is no `first_stage_model`, so a VAE must be loaded (`same_model_name.vae.pt`).
- Currently called "negative textual inversion": freeze the TE and train the UNET > make a TI > freeze the TE and train the UNET again, i.e. no TE and no VAE.
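
A quick way to spot such a stripped checkpoint is to look for the component prefixes in its keys. A minimal sketch, assuming the keys have already been read from the state dict; the helper name and the example keys are hypothetical.

```python
# Components a standalone checkpoint is expected to carry, keyed by prefix.
REQUIRED = {
    "cond_stage_model.": "text encoder",
    "first_stage_model.": "VAE",
}

def missing_components(keys):
    """Return the labels of components with no key under their prefix."""
    keys = list(keys)
    return [
        label
        for prefix, label in REQUIRED.items()
        if not any(k.startswith(prefix) for k in keys)
    ]

# Hypothetical key lists, for illustration only
unet_only = ["model.diffusion_model.input_blocks.0.0.weight"]
full = unet_only + [
    "cond_stage_model.transformer.text_model.final_layer_norm.weight",
    "first_stage_model.decoder.conv_in.weight",
]

print(missing_components(unet_only))
print(missing_components(full))
```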

### Findings (SD 2.x) ###
- WD v1.4: Text encoder and VAE encoder are changed.
- CJD v2.1.1: VAE encoder is changed.
- J's RD: Text encoder and VAE are **unchanged**.
- P1at's merge: VAE is unchanged, but everything else is changed.
- SD 1.x vs 2.x: Text encoder is entirely switched. Some layers' dimensions are changed.
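
Layers with changed dimensions show up as `"nan"` in the JSON, because the distance is only computed for equal shapes. A sketch of listing them directly, assuming the shapes were collected beforehand (for example with `{k: tuple(v.shape) for k, v in state_dict.items()}`); the helper name, key and shape values below are illustrative assumptions.

```python
def shape_mismatches(shapes_a, shapes_b):
    """Return {key: (shape_a, shape_b)} for shared keys whose shapes differ."""
    shared = shapes_a.keys() & shapes_b.keys()
    return {
        k: (shapes_a[k], shapes_b[k])
        for k in sorted(shared)
        if shapes_a[k] != shapes_b[k]
    }

# Illustrative shapes only, not read from real checkpoints
KEY = "model.diffusion_model.input_blocks.1.1.transformer_blocks.0.attn2.to_k.weight"
sd1_shapes = {KEY: (320, 768), "model.diffusion_model.out.2.bias": (4,)}
sd2_shapes = {KEY: (320, 1024), "model.diffusion_model.out.2.bias": (4,)}

print(shape_mismatches(sd1_shapes, sd2_shapes))
```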

## Discussion ##
- "Ignoring prompts" (where is `astolfo`? No human!) is caused by **bias in the text encoder?**
- "Missing details" (some elements of the entity are present, e.g. `astolfo` has `pink_hair` but comes out as `1girl`) is caused by **bias in the UNET?**
- Is that why Anything v3 (20 / 20 momoco style with minimal negative prompts) is popular: the task (waifu AIGC) is a narrow objective which favours bias?
- BPModel is described as "hard to use" because it relies on the original SD text encoder and VAE? However, is its diversity maximal, with most art styles successfully trained? The original SD has such "artist prompts", e.g. Vincent Van Gogh.
- Why does ACertainty look like NAI? AIGC dataset as informal Reinforcement Learning?
- Is SD 2.x so broken because of the CLIP? Or the UNET? Why does WD 1.4 E1 ignore prompts (where's `astolfo`? No `1boy`!) but start listening to prompts in E2 (must include `quality:0`, but `pink_hair` and `1boy` are OK)?
- Does J's RD fail just because the original SD 2.x text encoder is so bad? Does that apply to CJD as well (where's `astolfo`? No `1boy`!)?
- Why does BasilMix work (a nice merge with chosen hyperparameters)?