# Mega similarity check (V2b) #

### Abstract ###

- The distance value itself is not very meaningful, **but identity (zero distance) is.**
- However I still prefer L2 distance ~~because it looks like I'm doing an ML task~~.
- Objective: Try to explain the *subjective* experience of model differences, especially whether any components have been **changed**, regardless of how much they have changed.
- Thanks ["CC"](https://github.com/crosstyan), ["RC"](https://github.com/CCRcmcpe) and ["AO"](https://github.com/AdjointOperator) for providing the initial script (and the idea).

### Changelog ###

- `v2b`: Use [concurrent.futures with tqdm](https://stackoverflow.com/questions/51601756/use-tqdm-with-concurrent-futures) to speed up the process. However, it is CPU only.
- `v2`: Scan a folder of models of one fixed type (e.g. SD 1.x) at a fixed path, then build a "distance matrix" and visualize it.
- `v1`: Custom model groups / paths. Can compare models of interest, but features are limited.

### Required libraries ###

- ~~Should be the common ML pack we're using. Also with [SD webui's dependency](https://github.com/AUTOMATIC1111/stable-diffusion-webui).~~

- [scikit-learn](https://scikit-learn.org/stable/install.html)
- [NetworkX](https://networkx.org/documentation/stable/release/release_3.0.html)
- [safetensors](https://huggingface.co/docs/safetensors/index)
- [pytorch](https://pytorch.org/get-started/locally/#windows-python)
- [matplotlib](https://matplotlib.org/stable/api/matplotlib_configuration_api.html)
- [numpy](https://numpy.org/)

### Input ### 
- See the next cell: paths of the models and an abbreviation (keyword) of your choice.

### Output ###
- TONS of JSON, showing `(layer_name, distance_between_2_models)`
- TONS of IMG, showing `(pair_of_model, distance_for_each_type_of_diffusion_layer)`

### Special case for comparison ###
- Text encoder for model `nai`: `"cond_stage_model.transformer", "cond_stage_model.transformer.text_model"`

### Some layer names to interpret ###

- The whole model combined is called `DiffusionPipeline` in [Diffusers](https://huggingface.co/docs/diffusers/index).

|Layer name|Description|Class name in Diffusers|
|---|---|---|
|`first_stage_model`|VAE|`AutoencoderKL`|
|`cond_stage_model`|Text Encoder (SD1, SD2)|`CLIPTextModel`|
|`conditioner.embedders.0`|Text Encoder 1 (SDXL)|`CLIPTextModel`|
|`conditioner.embedders.1`|Text Encoder 2 (SDXL)|`CLIPTextModelWithProjection`|
|`model.diffusion_model`|UNET|`UNet2DConditionModel`|
|`model_ema`|EMA model for training|n/a|
|`cumprod`, `betas`, `alphas`|Noise schedule buffers (fixed, not trained)|n/a|

### Some notation (Useful in the bin chart) ###
- `attn1`: `sattn` = *Self attention*
- `attn2`: `xattn` = *Cross attention*
- `ff`: *Feed forward*
- `norm`: [Normalisation layer](https://pytorch.org/docs/stable/generated/torch.nn.LayerNorm.html). `elementwise_affine=True` introduces trainable `bias` and `weight`. 
- `proj`: *Projection*
- `emb_layers`: *Embedding layers*
- `mlp`: *Multilayer perceptron*
- `others`: `ff` + `norm` + `proj` + `emb_layers`

Path configuration. Make sure the model file names sort in ascending order.
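Plain string sorting compares character by character, so numbered names can end up out of sequence. A minimal sketch, assuming hypothetical file names (not the real model files), of why zero-padded numbers keep the intended order:

```python
# Hypothetical file names, for illustration only.
unpadded = ["x2.safetensors", "x10.safetensors", "x1.safetensors"]
padded = ["x002.safetensors", "x010.safetensors", "x001.safetensors"]

# Lexicographic order: "x10" sorts before "x2" because "1" < "2".
sorted_unpadded = sorted(unpadded)
# Zero padding makes lexicographic order agree with numeric order.
sorted_padded = sorted(padded)

print(sorted_unpadded)
print(sorted_padded)
```

If renaming files is not an option, a numeric sort key would be an alternative, but zero padding needs no extra code.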

In [1]:
# Set the paths here.
cmp_keyword = "dgmla3"

ofp_folder = {
    "json": "./json_v2_{}/".format(cmp_keyword),
    "img": "./img_v2_{}/".format(cmp_keyword)
}

# Model group folder
model_folder = "F:/NOVELAI/astolfo_mix/sdxl/raw/" #"../../stable-diffusion-webui/tmp/astolfo_mix/sdxl/raw/"

# Model group key (see v1 for example)
model_group_key = cmp_keyword

# Note: this is an output file, not an input.
model_map_o = "./json_v2_{}/model_map_v2.json".format(cmp_keyword)

- `am0`: Merge process of [AstolfoMix](../../ch05/README.md). No CLIP / TE reset.
- `am1`: Merge process of [AstolfoMix](../../ch05/README.md). With CLIP / TE reset.
- `am2`: Recipe model of [AstolfoMix](../../ch05/README.md).
- `am3`: Merge process of [AstolfoMix](../../ch05/README.md). Extended from `am0`
- `am4`: Recipe model of [AstolfoMix](../../ch05/README.md). Extended from `am2`
- `am5`: Merge process of [AstolfoMix](../../ch05/README.md). `am0` with `am3`.
- `am6`: Recipe model (UNET) of [AstolfoMix-SD2](../../ch05/README_SD2.md).
- `am7`: `am6` but removed 2 outliers.
- `am7b`: Recipe model (TE) of [AstolfoMix-SD2](../../ch05/README_SD2.md).
- `am8`: Merge process of [AstolfoMix-SD2](../../ch05/README_SD2.md)(209b).
- `am9`: Merge process of [AstolfoMix-SD2](../../ch05/README_SD2.md)(210b).
- `21b`: Recipe model of [AstolfoMix "21b"](../../ch05/README.md).
- `dgmla2b`: Recent merges of [AstolfoMix SDXL DGMLA-216](../../ch05/README_XL.md)(116a to 215a).
- `main0`: Same as `main`, but with [dreamlike-art/dreamlike-photoreal-2.0](https://huggingface.co/dreamlike-art/dreamlike-photoreal-2.0) as an outlier.
- `main`: All the models below, but selecting only the representatives of each group.
- `cwd`: Dataset for [ThePioneer/CoolerWaifuDiffusion](https://huggingface.co/ThePioneer/CoolerWaifuDiffusion).
- `sd2`: All SD 2.x based models.
- `sdxl0`: All SDXL based models (10 out of 40, including the original SDXL 1.0).
- `sdxl1`: All SDXL based models (10 out of 40, including the original SDXL 1.0).
- `sdxl2`: All SDXL based models (10 out of 40, including the original SDXL 1.0).
- `sdxl3`: All SDXL based models (10 out of 40, including the original SDXL 1.0).
- `sdxl4`: All SDXL based models (10 handpicked outliers, including the original SDXL 1.0).
- `sdxl5`: All SDXL based models (10 out of 52, including the original SDXL 1.0).
- `sdxl6`: All SDXL based models (52 out of 52, including the original SDXL 1.0). Done on a 512GB RAM workstation.
- `pony1`: All SDXL based models (118 models). Workstation upgraded to 4TB.  **Should use around 780GB of RAM.**
- `dgmla3`: All SDXL based models (220 models), with `dgmla2b` (12 models). **Should use around 1.5TB of RAM.**
- `wdacnai0`: Same as `wdacnai`, but with SD 1.2 and 1.5 also.
- `wdacnai`: Dataset for [hakurei/waifu-diffusion-v1-3](https://huggingface.co/hakurei/waifu-diffusion-v1-3), [JosephusCheung/ACertainty](https://huggingface.co/JosephusCheung/ACertainty) and [CompVis/stable-diffusion-v-1-4-original](https://huggingface.co/CompVis/stable-diffusion-v-1-4-original)
- `evt`: Dataset for [haor/Evt_V4-preview](https://huggingface.co/haor/Evt_V4-preview)
- `merge`: Dataset for a bunch of merged models with their believed recipes.
- `any3`: Dataset for [AdamOswald1/anything-v5.0](https://huggingface.co/AdamOswald1/anything-v5.0).
- `nnai`: Dataset for [JosephusCheung/ACertainty](https://huggingface.co/JosephusCheung/ACertainty).
- `aocc`: Dataset for [AnnihilationOperator/ABPModel](https://huggingface.co/AnnihilationOperator/ABPModel) vs  [Crosstyan/BPModel](https://huggingface.co/Crosstyan/BPModel)
- `aobp`: Dataset for [AnnihilationOperator/ABPModel](https://huggingface.co/AnnihilationOperator/ABPModel)
- `ccbp`: Dataset for [Crosstyan/BPModel](https://huggingface.co/Crosstyan/BPModel)

Load libraries.

In [2]:
import time
import os
import json
import re
from pathlib import Path
import math

import numpy as np
import matplotlib as mpl
import torch
from safetensors.torch import load_file #safe_open

import networkx as nx

from matplotlib import pyplot as plt
from tqdm import tqdm
from tqdm.contrib.concurrent import thread_map

In [3]:
torch.__version__

'2.4.0+cu124'

In [4]:
# Fix for OMP: Error #15
os.environ["KMP_DUPLICATE_LIB_OK"] = "TRUE"

Some operations.

In [5]:
# TODO: Support 'cuda', but 'cpu' is already fast.
g_device = "cpu"
# Currently for generating graph only.
g_seed = 114514
# How many threads for this process
g_threads = 48 if g_device == "cpu" else 1

In [6]:
# Create output folder
for v in ofp_folder.values():
    os.makedirs(os.path.dirname(v), exist_ok=True)  

In [7]:
# Sort explicitly: os.listdir() returns entries in arbitrary order.
model_list = sorted(os.listdir(model_folder))
# Exclude yaml.
model_list = [p for p in model_list if p.endswith((".ckpt", ".safetensors", ".bin"))]
if len(model_list) < 2:
    raise Exception("Need at least 2 models for comparison.")

Expect $O(N^2)$ comparisons for the [pairwise distances of a distance matrix](https://scikit-learn.org/stable/modules/generated/sklearn.metrics.pairwise_distances.html): $N$ models give $N(N-1)/2$ unordered pairs. For example, $6 \times 5 / 2 = 15$ pairs for 6 models.
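The pair count can be checked with a short sketch. The helper name `count_pairs` and the model keys are hypothetical. The last line uses 232 models (220 plus 12, as listed for `dgmla3`), which is consistent with the pair count printed by this run.

```python
from itertools import combinations

def count_pairs(n: int) -> int:
    """Number of unordered pairs among n models: n * (n - 1) / 2."""
    return n * (n - 1) // 2

# Hypothetical model keys; combinations() yields each unordered pair once.
models = ["m{}".format(i) for i in range(6)]
pairs = list(combinations(models, 2))

print(len(pairs), count_pairs(6))
print(count_pairs(232))
```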

In [8]:
def make_cmp_mapping(models):
    # Build every unordered pair (i0 < i1) of the given model file names.
    i = 0
    cm = [[model_group_key, []]]
    for i0 in range(len(models)):
        for i1 in range(i0 + 1, len(models)):
            i = i + 1
            # Do not modify the model key: it is used to build the distance matrix!
            cm[0][1].append([
                ["m{}".format(i0), "{}{}".format(model_folder, models[i0])],
                ["m{}".format(i1), "{}{}".format(model_folder, models[i1])],
            ])
    print("{} model pairs to go.".format(i))
    return cm

In [9]:
cmp_mapping = make_cmp_mapping(model_list)

26796 model pairs to go.


Dump result first. Will dump the distance matrix later.

In [10]:
cmj = {
    "ml": model_list,
    "cm": cmp_mapping
}
with open(model_map_o, "w") as fj:
    json.dump(cmj, fj, indent=4, sort_keys=True)

In [11]:
# V1 only. I don't expect to handcraft the mapping here.
if False:
    # For "micro_cmp", go to the cell with cmp_json()
    cmp_mapping = []
    try:
        with open(model_map_o, "r") as mmf:
            read_content = mmf.read()
            cmp_mapping = json.loads(read_content)
    except (OSError, json.JSONDecodeError):
        print("Error when loading model map {}. There won't be mass scale comparison.".format(model_map_o))

Distance matrix for model components. Default zero (identical).
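A minimal sketch of how one pair result could land in such a matrix, using plain nested lists instead of `torch.zeros` and hypothetical helper names (`index_from_key`, `record`). Note that an off-diagonal zero then means either "identical" or "never compared".

```python
import re

def index_from_key(key: str) -> int:
    """Extract the first integer from a model key such as 'm12'."""
    return int(re.findall(r"\d+", key)[0])

def record(dm, key_a, key_b, dist):
    """Write one distance into both halves of the symmetric matrix."""
    ia, ib = index_from_key(key_a), index_from_key(key_b)
    dm[ia][ib] = dist
    dm[ib][ia] = dist

n = 3
# Zero everywhere by default; the diagonal stays zero (a model equals itself).
dm = [[0.0] * n for _ in range(n)]
record(dm, "m0", "m2", 1.5)

for row in dm:
    print(row)
```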

In [12]:
cmp_dm = {
    "te": torch.zeros((len(model_list), len(model_list))),
    "te0": torch.zeros((len(model_list), len(model_list))),
    "te1": torch.zeros((len(model_list), len(model_list))),
    "vae": torch.zeros((len(model_list), len(model_list))),
    "unet": torch.zeros((len(model_list), len(model_list))),
    "other": torch.zeros((len(model_list), len(model_list)))
}

model_cache = {}

Functions inside the compare loop.

In [13]:
def load_model(path: Path, device: str, print_ptl_info=False) -> dict[str, torch.Tensor]:
    if ".safetensors" in path.suffixes:
        return load_file(path, device=device)
    else:
        ckpt = torch.load(path, map_location=device)
        if print_ptl_info and "epoch" in ckpt and "global_step" in ckpt:
            print(f"[I] {path.name}: epoch {ckpt['epoch']}, step {ckpt['global_step']}")
        return ckpt["state_dict"] if "state_dict" in ckpt else ckpt

# Reminder: Dodge different shape!
def check_equal_shape(a: torch.Tensor, b: torch.Tensor, fn):
    if a.shape != b.shape:
        raise Exception("DIFFERENT SHAPE")
        #print("DIFFERENT SHAPE: return -1.0")
        #return -1.0
    return fn(a.type(torch.float),b.type(torch.float))

TENSOR_METRIC_MAP = {
    #"equal": torch.equal,
    "l0": lambda a, b: check_equal_shape(a, b, lambda a, b: torch.dist(a, b, p=0)),    
    "l1": lambda a, b: check_equal_shape(a, b, lambda a, b: torch.dist(a, b, p=1)),
    "l2": lambda a, b: check_equal_shape(a, b, lambda a, b: torch.dist(a, b, p=2)),
    "cossim": lambda a, b: check_equal_shape(a, b, lambda a, b: torch.mean(torch.cosine_similarity(a, b, dim=0)))
}

FIG_METRIC_MAP = {
    #"equal": lambda v: np.linalg.norm(v, 0), 
    "l0": lambda v: torch.linalg.norm(v, 0),    
    "l1": lambda v: torch.linalg.norm(v, 1),
    "l2": lambda v: torch.linalg.norm(v, 2),
    #I don't know how to make this meaningful...
    "cossim": lambda v: torch.linalg.norm(v, None)
}

Read a pair of models, extract the key paths, compare for difference, and return all the intermediate data (useful for next step).

As a Venn diagram: `(da (kv) db)`, plus `err` for the shared keys that failed to compare. Obvious?
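The key bookkeeping can be sketched with plain sets. This is an illustration only: the two dictionaries are hypothetical, and plain floats with an absolute difference stand in for tensors and the real metric.

```python
# Hypothetical state dicts; floats stand in for tensors.
a = {"unet.w": 1.0, "vae.w": 2.0, "model_ema.w": 3.0}
b = {"unet.w": 1.5, "vae.w": 2.0, "te.w": 4.0}

ak, bk = set(a), set(b)
shared = ak & bk          # keys present in both models
da = sorted(ak - bk)      # keys only in model A
db = sorted(bk - ak)      # keys only in model B

# One distance per shared key; 0.0 means the layer is identical.
kv = {k: abs(a[k] - b[k]) for k in shared}

print(da, db, kv)
```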

In [14]:
def substring_in_d(k, da):
    for d in da:
        if k in d:
            return True
    return False

def cmp_c(a_path, b_path, device, metric, no_ptl_info):
    metric_fn = TENSOR_METRIC_MAP[metric]
    
    try:
        a_path = a_path.decode('UTF-8')
        b_path = b_path.decode('UTF-8')
    except AttributeError:
        # Already str / Path: nothing to decode.
        pass

    # Load from model cache
    if a_path not in model_cache:
        model_cache[a_path] = load_model(Path(a_path), device, not no_ptl_info)
    if b_path not in model_cache:
        model_cache[b_path] = load_model(Path(b_path), device, not no_ptl_info)
    
    a = model_cache[a_path] 
    b = model_cache[b_path]

    ak = set(a.keys())
    bk = set(b.keys())
    
    keys_inter = ak.intersection(bk)
    da = list(ak.difference(bk))
    db = list(bk.difference(ak))
    kv = {}
    err = []
    for k in keys_inter:
        try:
            rt = metric_fn(a[k], b[k])
            #rt = rt.numpy().tolist()
            # Wow infinity.
            if math.isinf(rt):
                raise Exception("Infinity is found.")
            if math.isnan(rt):
                raise Exception("NaN is found.")
            kv[k] = rt
        except Exception:
            #Silenced: shape mismatch, Inf or NaN. The key is reported in err instead.
            #"nan" or True / False
            #print("DIFFERENT SHAPE at key {}. Ignored.".format(k))
            err.append(k)

    #Special case: NAI renamed the TE (claimed using GPT-2)
    #Another special case: Somehow it still shows no key...
    if not (("animefull" in str(a_path)) and ("animefull" in str(b_path))):
        if "animefull" in str(a_path):
            nda = substring_in_d("cond_stage_model.transformer.text_model", db)
            for dak in da:
                if ("cond_stage_model.transformer" in dak) and nda:
                    try:
                        kv["nai." + dak] = metric_fn(a[dak], b[dak.replace("cond_stage_model.transformer", "cond_stage_model.transformer.text_model")]) #.numpy().tolist()
                    except:
                        err.append(dak)
        elif "animefull" in str(b_path):
            ndb = substring_in_d("cond_stage_model.transformer.text_model", da)
            for dbk in db:
                if ("cond_stage_model.transformer" in dbk) and ndb:
                    try:
                        kv["nai." + dbk] = metric_fn(b[dbk], a[dbk.replace("cond_stage_model.transformer", "cond_stage_model.transformer.text_model")]) #.numpy().tolist()
                    except:
                        err.append(dbk)
    
    return kv, da, db, err, a, b

Plot graph from the results above.

In [15]:
# https://www.geeksforgeeks.org/python-extract-numbers-from-string/
def index_from_model_path_key(txt):
    temp = re.findall(r'\d+', txt)
    res = list(map(int, temp))
    return res[0]

def cmp_attn(pak, pbk, kv, a, b, ofi, d):
    dmia = index_from_model_path_key(pak)
    dmib = index_from_model_path_key(pbk)
    tmfn = TENSOR_METRIC_MAP[d]
    fmfn = FIG_METRIC_MAP[d]
    diffs = {}
    dlabel = d.upper() #L2

    no_unet = True

    # Ensure there is UNET.
    for k in kv.keys():
        if k.startswith('model.diffusion_model'):
            no_unet = False
            break

    if no_unet:
        return

    for k in kv.keys():
        #This time I look for components instead of layers
        #But I must dodge NAI renamed TE (identical to SD 1.4 so do it separately if needed)
        if k.startswith('nai.cond_stage_model'):
            continue
        
        delta = kv[k] # Already computed in cmp_c() with the same metric.
     
        if k.startswith('cond_stage_model'):
            c = 'te'
        elif k.startswith('conditioner.embedders.0'):
            c = 'te0'
        elif k.startswith('conditioner.embedders.1'):
            c = 'te1'
        elif k.startswith('first_stage_model'):
            c = 'vae'
        elif k.startswith('model.diffusion_model'):
            c = 'unet'
        else:
            c = 'other'            
        diffs.setdefault(c, []).append(delta)

    for k in diffs:
        # Stack the 0-dim distances into one 1-D tensor, so p-norms act as vector norms.
        diffs[k] = torch.stack(diffs[k]) #float

    lsp = len(diffs.keys())
    if (lsp == 0):
        raise Exception("Why others is absent?")

    # 240611: Too many images.
    if False:
        fig, axs = plt.subplots(lsp, 1, figsize=(16, lsp * 4), sharex=False)
        fig.tight_layout(pad=4.0)
        for i, (k, v) in enumerate(diffs.items()):
            #bins=len(v) for finding outliers
            #v: numpy.array. 80 layers for attn, 526 for others.
            dval = fmfn(v)

            # Note: In general, a distance matrix is a weighted adjacency matrix of some graph. (wiki)
            cmp_dm[k][dmia][dmib] = dval
            cmp_dm[k][dmib][dmia] = dval

            # I hate this bug
            aaxs = axs if lsp == 1 else axs[i]
            aaxs.hist(v, bins=len(v), density=False)
            aaxs.set(xlabel=dlabel, ylabel='a.u.')
            aaxs.xaxis.labelpad = 20
            aaxs.set_yscale('log')
            aaxs.set_title(f'{k}: ${dlabel}={dval:.4f}$')
        
        plt.savefig(ofi, bbox_inches='tight')
        #WTF the plot retains? Why?
        plt.close()

    for i, (k, v) in enumerate(diffs.items()):
        #bins=len(v) for finding outliers
        #v: numpy.array. 80 layers for attn, 526 for others.
        dval = fmfn(v)

        # Note: In general, a distance matrix is a weighted adjacency matrix of some graph. (wiki)
        cmp_dm[k][dmia][dmib] = dval
        cmp_dm[k][dmib][dmia] = dval

Procedure of a comparison. The original script has [custom garbage collection](https://docs.python.org/3/library/gc.html), but the default setting is fine for me. Also, $O(N^2)$ comparisons are harsh.

Variables explanation for "nice guys":

|var|text|
|---|---|
|pak|Key of model A (`m` + index), encoding its index in the model list.|
|pbk|Key of model B (`m` + index), encoding its index in the model list.|
|pav|Path of model A.|
|pbv|Path of model B.|
|ofp|File path for the output JSON report.|
|ofi|File path for the output PNG plot.|
|d|Distance method: [p-norm](https://en.wikipedia.org/wiki/Norm_(mathematics)) or [Cosine Similarity](https://en.wikipedia.org/wiki/Cosine_similarity). See [pytorch](https://pytorch.org/docs/stable/generated/torch.dist.html), [numpy](https://numpy.org/doc/stable/reference/generated/numpy.linalg.norm.html).|
|npi|`no_ptl_info`. When `True`, skip printing the checkpoint's `epoch` and `global_step` (presumably PyTorch Lightning info).|
|kv|[Key-Value Pairs of intersection.](https://www.w3schools.com/js/js_json_objects.asp)|
|da|Keys found only in model A.|
|db|Keys found only in model B.|
|err|Intersecting layers which throw errors. Usually they have different shapes.|
|a|Instance of model A.|
|b|Instance of model B.|
|dj|Data for output JSON file.|
|fj|File path for output JSON file.|

In [16]:
# 240611: Too many files.
def cmp_json(pak, pbk, pav, pbv, ofp, ofi, d, npi):
    kv, da, db, err, a, b = cmp_c(Path(pav), Path(pbv), g_device, d, npi)
    dj = {'kv':kv, 'da':da, 'db':db, 'err': err}
    #with open(ofp, "w") as fj:
    #    json.dump(dj, fj, indent=4, sort_keys=True)
    cmp_attn(pak, pbk, kv, a, b, ofi, d)

Test / Manual operation for a single comparison.

~~Also as example for the above variables.~~

In [17]:
# Nothing to test here. 
if False:
    # Testing: Obvious result
    # cmp_json() takes the two model keys first, then the two paths.
    cmp_json(
        "m0",
        "m1",
        "../../stable-diffusion-webui/tmp/SD1/aobp/ABPModel-ep59.safetensors", 
        "../../stable-diffusion-webui/models/Stable-diffusion/sample-nd-epoch59.safetensors",
        "./json/test.json",
        "./img/test.png",
        "l2",
        True
    )

The compare loop. `tqdm` may not work, at least for me.
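`thread_map` behaves like a thread pool `map` with a progress bar attached. The standard-library sketch below shows the same pattern without the progress bar. The worker `compare` and the pair list are hypothetical placeholders, not the real comparison.

```python
from concurrent.futures import ThreadPoolExecutor

def compare(pair):
    # Placeholder for the real comparison: only builds the pair label.
    a, b = pair
    return "{}_{}".format(a, b)

pairs = [("m0", "m1"), ("m0", "m2"), ("m1", "m2")]

# map() returns results in input order, even if workers finish out of order.
with ThreadPoolExecutor(max_workers=2) as pool:
    results = list(pool.map(compare, pairs))

print(results)
```

Threads share memory, so a shared model cache stays usable across workers, which a process pool would not give for free.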

In [18]:
def exec_cmp(argv):
    # Nasty unpacking but it works.
    pab = argv[0] 
    ofp0 = argv[1]

    pak = pab[0][0]
    pav = pab[0][1]
    pbk = pab[1][0]
    pbv = pab[1][1]
    ofjp = "{}{}_{}_{}.json".format(ofp_folder['json'], ofp0, pak, pbk)
    ofip = "{}{}_{}_{}.png".format(ofp_folder['img'], ofp0, pak, pbk)

    #print("{} vs {}".format(pav, pbv))
    cmp_json(pak, pbk, pav, pbv, ofjp, ofip, "l2", True)
    return ofp0

In [19]:
ts = time.time()
print("Too many files to generate: No intermediate files will be generated.")
cmp_count = len(cmp_mapping[0][1])
ofp0 = cmp_mapping[0][0]
print("{} model pairs to be compared.".format(cmp_count))
print("Parallel mode. Progress bar may not show linear progress.")
res_cmp = thread_map(exec_cmp, [(pab, ofp0) for pab in cmp_mapping[0][1]], max_workers=g_threads)

Too many files to generate: No intermediate files will be generated.
26796 model pairs to be compared.
Parallel mode. Progress bar may not show linear progress.


  0%|          | 0/26796 [00:00<?, ?it/s]

End of the comparison loop. Dump result again.

In [20]:
# https://stackoverflow.com/questions/26646362/numpy-array-is-not-json-serializable
class NumpyEncoder(json.JSONEncoder):
    def default(self, obj):
        if torch.is_tensor(obj):
            obj = obj.cpu().numpy()
        if isinstance(obj, np.ndarray):
            return obj.tolist()
        return json.JSONEncoder.default(self, obj)

In [21]:
cmj["dm"] = cmp_dm
with open(model_map_o, "w") as fj:
    json.dump(cmj, fj, indent=4, sort_keys=True, cls=NumpyEncoder)

Now here is the "part 2": Visualize the distance matrix.
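Graph layouts generally treat a larger edge weight as a stronger connection, which is the opposite of a distance. A minimal sketch of the inversion, assuming a hypothetical helper `inverse_weight` and made-up distances; zero distance is kept as zero rather than divided by.

```python
def inverse_weight(distance: float) -> float:
    """Turn a distance into an affinity; zero distance stays zero."""
    return 0.0 if distance == 0.0 else 1.0 / distance

# Made-up pairwise distances, for illustration only.
distances = {("m0", "m1"): 0.5, ("m0", "m2"): 4.0, ("m1", "m2"): 0.0}
affinity = {pair: inverse_weight(d) for pair, d in distances.items()}

# Closer models get the larger affinity.
print(affinity)
```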

In [22]:
# https://discuss.dizzycoding.com/drawing-a-graph-or-a-network-from-a-distance-matrix/
# https://thispointer.com/python-how-to-convert-a-list-to-dictionary/
# https://stackoverflow.com/questions/46784028/edge-length-in-networkx
# https://stackoverflow.com/questions/60397606/how-to-round-off-values-corresponding-to-edge-labels-in-a-networkx-graph
# https://stackoverflow.com/questions/75078531/edge-length-based-on-distance/75083544#75083544
# https://stackoverflow.com/questions/3567018/how-can-i-specify-an-exact-output-size-for-my-networkx-graph
# https://stackoverflow.com/questions/26691442/how-do-i-add-a-new-attribute-to-an-edge-in-networkx
# https://stackoverflow.com/questions/18911994/visualize-distance-matrix-as-a-graph

def plot_vg(dmk, dmv, d):
    
    mgm = { i : model_list[i] for i in range(0, len(model_list) ) }
    initialpos = { i : (i,i) for i in range(0, len(model_list) )}

    dmv2 = dmv.cpu().numpy() if torch.is_tensor(dmv) else dmv

    G = nx.from_numpy_array(dmv2)

    #All zeros = no such comparison = can skip.
    if nx.number_of_edges(G) == 0:
        return

    G = nx.relabel_nodes(G, mgm)    
    og = "{}{}_{}_vg.png".format(ofp_folder['img'], model_group_key, dmk)

    # Argh the weight is inverse...
    for eg in G.edges(data=True):
        #print(eg)
        #print(eg[2])
        egv =  0.0 if eg[2]['weight'] == 0.0 else (1.0 / eg[2]['weight'])
        #print(egv)
        eg[2]['inverse_weight'] = egv

    # Using the unnormalized Laplacian, the layout shows possible clusters of nodes which are an approximation of the ratio cut.
    #pos = nx.spring_layout(G, weight='inverse_weight', pos=initialpos, seed=g_seed)
    pos = nx.spectral_layout(G, weight='inverse_weight')
    edge_labels = dict([((u,v,), f"{d['weight']:.4f}") for u,v,d in G.edges(data=True)])

    plt.figure(1, figsize=(16,12)) 

    nx.draw_networkx(G, pos)
    nx.draw_networkx_labels(G, pos)
    nx.draw_networkx_edge_labels(G, pos, edge_labels=edge_labels)
   
    plt.title("{} Distance of {} in model group {}".format(d.upper(), dmk.upper(), model_group_key))
    plt.savefig(og)
    #plt.show()
    plt.close()

In [23]:
for cdmk in cmp_dm.keys():
    plot_vg(cdmk, cmp_dm[cdmk], "l2")

In [24]:
def plotline(dmk, dmv, d):
    #Assume models are arranged in sequence
    x_arr = [m[:10] for m in model_list]
    y_arr = dmv[0] 

    og = "{}{}_{}_xy.png".format(ofp_folder['img'], model_group_key, dmk)

    plt.figure(1, figsize=(16,12)) 

    plt.plot(x_arr, y_arr, label='line')

    plt.xlabel("Model")
    plt.ylabel(d.upper())

    plt.title("{} Distance of {} in model group {}".format(d.upper(), dmk.upper(), model_group_key))
    plt.savefig(og)
    #plt.show()
    plt.close()

In [25]:
for cdmk in cmp_dm.keys():
    plotline(cdmk, cmp_dm[cdmk], "l2")

In [26]:
te = time.time()
print("Compare: {}, time: {} sec".format(cmp_count, int(te - ts)))

Compare: 26796, time: 110933 sec


### Findings (VAE) ###
- `kl-f8` vs SD prune: Some layers are pruned.
- `kl-f8` vs WD1: Both `encoder` and `decoder` are trained
- WD1 vs WD2: Only `decoder` is trained
- `kl-f8` vs NAI: Only `decoder` is trained. **However, it is the same as the one bundled with SD v1.4. See below.**

### Findings (NAI) ###
- SD 7G vs SD 4G: EMA pruned. *Applies to all models*.
- SD 7G vs NAI 7G: **Same "text encoder" (renamed layer) and "VAE".**
- ACertainty: "Seriously fine-tuned from SD with tons of (NAI) AIGC." **confirmed.** However VAE Decoder is different.

### Findings (SD Variant) ###
- momoko-e: **Dreambooth**. Text encoder is **partially changed.**
- Anything v3 / BasilMix / Anything v4 etc.: **Merged model. All layers are changed.**
- NAI: Some `cumprod` layers dropped
- ANY3: Same as NAI (merged)
- AC: Same as NAI (???)

### Findings (NMFSAN) ###
- NMFSAN: No `cond_stage_model` = Load "last text encoder" or `None` (will generate glitched images). No `first_stage_model` = Must load VAE (`same_model_name.vae.pt`).
- Currently called "negative textual inversion". Freeze TE train UNET > Make TI > Freeze TE train UNET again. i.e. No TE no VAE.

### Findings (BPModel) ###
- I have internal versions started from "Stupid Diffusion", another internal version of ACertainty.
- This is not a strict and formal proof, but I expect the L2 distances to align with an **almost flat plane**, which would give such a comparison some meaning.
- This **cannot be exactly the same plane** because the BPModel was trained with a changing dataset and configuration (for example, ARB setting / adding subsets of datasets / "negative-TI" trick). However, the iterations should show a somewhat "clear way of improvement".

|Model A|Model B|others|attn1|attn2|
|---|---|---|---|---|
|AC|mk0|96.0301|23.2504|26.0299|
|mk0|mk3|51.6394|16.5754|13.6055|
|mk3|mk5|20.2971|8.6033|4.6885|
|mk5|nman|22.4227|7.5967|5.7737|
|L1(A)|L1(B)|190.3893|56.0258|50.0976|
|L2(A)|L2(B)|**113.1510**|**30.7742**|**30.2982**|
|AC|nman|**113.7759**|**35.8125**|**30.5798**|

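- The `L1` and `L2` rows are the sum and the root sum of squares of the four step distances above. The `L2` row almost matches the direct AC vs nman distance, so somehow L2 can reflect the "direction between the models".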
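The two summary rows of the table can be recomputed from the four step distances. A small sketch for the `others` column, with the values copied from the table. One possible reading, stated as an assumption: if all steps pointed the same way, the direct distance would equal the L1 sum; if they were mutually orthogonal, it would equal the L2 value.

```python
import math

# Step distances for the "others" column, copied from the table:
# AC -> mk0 -> mk3 -> mk5 -> nman.
steps = [96.0301, 51.6394, 20.2971, 22.4227]
direct = 113.7759  # AC -> nman, also from the table

l1 = sum(steps)                            # steps along one straight line
l2 = math.sqrt(sum(s * s for s in steps))  # steps mutually orthogonal

print(f"L1 = {l1:.4f}, L2 = {l2:.4f}, direct = {direct:.4f}")
```

The direct distance sits far below the L1 sum and close to the L2 value, which is what the near-orthogonal reading would predict.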

### Findings (SD 2.x) ###
- WD v1.4: Text encoder and VAE encoder are changed.
- CJD v2.1.1: VAE encoder is changed.
- J's RD: Text encoder and VAE are **unchanged**.
- P1at's merge: VAE is unchanged, but everything else is changed.
- SD 1.x vs 2.x: Text encoder is entirely switched. Some layers' dimensions are changed.

## Discussion ##
- "Ignoring prompts" (where is `astolfo`? No human!) is caused by **bias in text encoder?**
- "Missing details" (given some element of the entity is present, e.g. `astolfo` has `pink_hair` but `1girl`) is caused by **bias in UNET?**
- That's why Anything v3 (20 / 20 momoco style with minimal negative prompts) is popular because the task (waifu AIGC) is a narrow objective which favours bias?
- BPModel is commented as "hard to use" because it relies on the original SD text encoder and VAE? However, diversity is maximal, with most art styles successfully trained? Original SD has such "artist prompts", e.g. Vincent Van Gogh.
- Why ACertainty looks like NAI? AIGC dataset as informal Reinforcement Learning?
- SD 2.x is so broken because of the CLIP? Or the UNET? Why does WD 1.4 E1 ignore prompts (where's `astolfo`? No `1boy`!) but start listening to prompts in E2 (must include `quality:0` but `pink_hair`, `1boy` is OK)?
- J's RD fails just because the original SD 2.x text encoder is so bad? Applies for CJD also (where's `astolfo`? No `1boy`!)?
- Why BasilMix works (nice merge with chosen hyperparameters)?

# Further work #
- Compare a set of models **with a clear relation**. For example, [merging ratio](https://huggingface.co/ThePioneer/CoolerWaifuDiffusion) and [training epochs](https://huggingface.co/AnnihilationOperator/ABPModel)