# AstolfoMix E2E merger #

Super rare N-to-1 SDXL merger with merging algorithms. ~~SDXL only.~~

## Abstract ##

- Base code: [Examples in sd-mecha](https://github.com/ljleb/sd-mecha/blob/main/examples)
- Using [sd-mecha](https://github.com/ljleb/sd-mecha) as the main library. **Thank you [@ljleb](https://github.com/ljleb/) for the codebase, and [@illyaeater](https://github.com/Enferlain) and [@silveroxides](https://github.com/silveroxides) for alpha testing.**
- Each generated model will have its own model metadata and a `*.mecha` ~~assembly like~~ [recipe](https://github.com/ljleb/sd-mecha/blob/main/examples/recipes/test_split_unet_text_encoder.mecha). Open it with a text editor.
- **No need to waste 1TB+ of disk space on pairwise merging and iterating with WebUI.** However, you should know your "model pool"; otherwise the merge is likely to result in a worse model.

## Required libraries ##

- `torch>=2.0.1`
- `tensordict`
- `sd-mecha` (I prefer to [clone](https://github.com/6DammK9/sd-mecha/tree/main) the source code in place; the current version, as of 241011, is branch [della](https://github.com/6DammK9/sd-mecha/tree/della)), until [this PR](https://github.com/ljleb/sd-mecha/pull/41) has been merged.
- [safetensors](https://huggingface.co/docs/safetensors/index)
- [diffusers](https://huggingface.co/docs/diffusers/installation)
- [pytorch](https://pytorch.org/get-started/locally/#windows-python)

## Supported merging algorithms ##

- `ALGO_DARE` as [DARE (ICML2024)](https://arxiv.org/abs/2311.03099) and [DELLA](https://arxiv.org/abs/2406.11617), "TIES w/ DARE", along with DROP only (no rescale)
- `ALGO_TIES` as [TIES (NeurIPS2023)](https://arxiv.org/abs/2306.01708), along with [TIES-SOUP](https://github.com/6DammK9/nai-anime-pure-negative-prompt/blob/main/ch01/ties.md#spinoff-ties-soup) modified by me
- `ALGO_AVERAGE`: [ModelSoup](https://arxiv.org/abs/2203.05482) as "Averaging with filtered components". Running average, merging pairwise. Slower, but the most memory efficient. Can fit inside a GPU.
- `ALGO_NAVG`: [ModelSoup](https://arxiv.org/abs/2203.05482) as "Averaging with filtered components". Brutally merges all models at once.
- `ALGO_MEDIAN`: part of [RFA (IEEE2022)](https://arxiv.org/abs/1912.13445), as "Geometric Median". Iterative deterministic approach. Slow for algebra, fast for gradient descent.

### Notes on MEDIAN ###

- $O(N)$ for space complexity. See the table below for my actual running time / RAM usage. *My WS became laggy because of the high latency of [PMem (always overtemperature)](https://www.servethehome.com/intel-optane-dc-persistent-memory-guide-for-pmem-100-pmem-200-and-pmem-300-optane-dimms/)*.
- $O(N \log N)$ for time complexity, but more like $O(N)$ with a high constant. It requires **40 hours** with limited 16 threads. Currently limited to a single CPU. [Intel OpenMP hints at how to utilize many CPUs, but I don't know how to operate it.](https://pytorch.org/tutorials/recipes/recipes/tuning_guide.html#intel-openmp-runtime-library-libiomp)
- [List comprehension](https://www.w3schools.com/python/python_lists_comprehension.asp) is fast and efficient, but has a large memory footprint.
- DELLA / DARE / TIES are all compatible with [Geometric Median](https://en.wikipedia.org/wiki/Geometric_median). Weiszfeld's algorithm, which is an $O(N)$ approximation, is already used.

|Date|Algo|Model counts|Threads|RAM Usage (TB, FP64)|Time used (Hours, Xeon 8358 x2)|
|---|---|---|---|---|---|
|240607|TGMD|117|16|1.214|14|
|240622|TGMD|133|8|< 1.0|12.5|
|240830|TGMD|192|8|1.446|41.5|
|241002|DGMLA|192|16|1.452|39.1|
|241006|DGMLA|20|48|0.358|2.33|
|241011|DGMLA|216|48|3.500|36.2|
|250228|DGMLA|256|24|2.600|39.0|

It takes **1.214TB of RAM** with limited 16 threads (116 models).
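
For reference, the Weiszfeld iteration mentioned above can be sketched in plain Python. This is a minimal illustration and not the `sd-mecha` implementation: the function name `geometric_median` is hypothetical, and the exact meaning given to `eps`, `maxiter`, and `ftol` here is an assumption modeled on the parameter names used in this notebook.

```python
import math

def geometric_median(points, eps=1e-6, maxiter=100, ftol=1e-20):
    """Hypothetical helper: Weiszfeld's iteratively re-weighted mean."""
    dim = len(points[0])
    # Start from the arithmetic mean of all points.
    median = [sum(p[d] for p in points) / len(points) for d in range(dim)]
    prev_obj = sum(math.dist(p, median) for p in points)
    for _ in range(maxiter):
        # Weight each point by the inverse of its distance to the estimate.
        # eps keeps the weight finite when the estimate sits on a point.
        weights = [1.0 / max(math.dist(p, median), eps) for p in points]
        total = sum(weights)
        median = [sum(w * p[d] for w, p in zip(weights, points)) / total
                  for d in range(dim)]
        obj = sum(math.dist(p, median) for p in points)
        # Assumed stopping rule: relative change of the objective below ftol.
        if abs(prev_obj - obj) <= ftol * obj:
            break
        prev_obj = obj
    return median

# Three toy "models" with two weights each; the third one is an outlier.
points = [[0.0, 0.0], [1.0, 0.0], [100.0, 100.0]]
mean = [sum(p[d] for p in points) / len(points) for d in range(2)]
gm = geometric_median(points)
print("mean  :", mean)  # pulled far toward the outlier
print("median:", gm)    # stays much closer to the two agreeing points
```

Each iteration touches every model once, which is where the $O(N)$ cost per step comes from; the list comprehensions also show why all $N$ tensors have to be held in memory at the same time.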

### Notes on DELLA / DARE / TIES ###

- After trying, I think filtering models is not useful. Therefore enabling only `MODE_RAW` is good enough.
- $O(N)$ space complexity. For merging 102 SDXL models, it requires **410GB for TIES** and **450GB for DARE** (with limited 16 threads). Full 128 threads will OOM for 512GB system RAM.
- $O(N)$ time complexity as well. For merging 102 SDXL models, it requires **75 minutes for TIES** and **105 minutes for DARE** (with limited 16 threads). Speedup is low beyond 8 threads.
- `sd-mecha` has already been heavily optimized; however, `conditioner.embedders.1.model.token_embedding.weight` is a huge layer of size `[49408, 1280]`. Merging this single layer took 390GB for 102 SDXL models. Unless there is an in-place merge, this issue will limit the size of the model pool.
- DELLA is $O(N \log N)$ because of sorting. Unit tests project around 4-6x more time to merge.
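
As a rough back-of-envelope check on the token embedding figure above, the snippet below computes how large that single layer is when all 102 copies are stacked in FP64. It assumes that "GB" means $10^9$ bytes and that the merge runs in `torch.float64`; the ratio at the end is only an estimate, and reading it as "several temporary copies are alive at once" is one plausible interpretation, not something measured.

```python
# Shape of conditioner.embedders.1.model.token_embedding.weight, from the note above.
rows, cols = 49408, 1280
n_models = 102
bytes_per_fp64 = 8

layer_bytes = rows * cols * bytes_per_fp64  # one copy of the layer in FP64
stacked_bytes = layer_bytes * n_models      # all 102 copies held at once

print(f"one layer          : {layer_bytes / 1e9:.3f} GB")
print(f"stacked, 102 models: {stacked_bytes / 1e9:.1f} GB")
print(f"observed / stacked : {390e9 / stacked_bytes:.1f}x")
```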

### Notes on NAVG / MODELSTOCK ###

- Required time: 26 (RAW) for 116 models. May be useful if you don't have GPU, or you want minimal error (within FP64!) by using single merging step.
- CPU usage: **100% with AVX2.**
- RAM usage: **Around 241GB**. $O(N)$ of space complexity.

### Notes on averaging with filtered models ###

- Required time: 11 (RAW) + 70 * 3 * 1.66 (TE) + 67 (SELECTED_TE) + 70 * 2.83 (UNET) + 43.9 (FINAL) + 102.5 (E2E) minutes = **12.85 hours** for 70 models 
- CPU usage: **100% with AVX2.**
- RAM usage: *Around 32GB*.
- VRAM usage: *Around 4GB*. 
- Storage usage: $5N+3$ SDXL models, including $N$ raw models. For $N=70$, it will use **2.24TB** for the most efficient approach.
- I intentionally make it into Python notebook because I need to switch mode this time.
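
The time and storage figures above can be reproduced with a few lines of arithmetic. The per-step minutes and the $5N+3$ formula are taken from the list; the "GB per model" figure is only the average implied by 2.24TB (assuming decimal units) and is not a measured file size.

```python
n = 70  # number of raw models

minutes = (
    11              # RAW
    + n * 3 * 1.66  # TE: three CLIP variants per raw model
    + 67            # SELECTED_TE
    + n * 2.83      # UNET: one variant per raw model
    + 43.9          # FINAL
    + 102.5         # E2E
)
models_on_disk = 5 * n + 3  # 5N + 3, as stated in the list above

print(f"{minutes:.1f} minutes = {minutes / 60:.2f} hours")
print(f"{models_on_disk} models, about {2.24e12 / models_on_disk / 1e9:.2f} GB each")
```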

## Model naming schema ##

- `RAW` as `_x001`: Place all raw models. Will generate `x102a` as the averaged model, regardless of components.
- `CLIP` as `_x001te`: Will generate all models as `x102a` with the TE replaced by `_x001`'s TE. There will be a set of `te0`, `te1`, `te2`. Use these models for model selection.
- `UNET` as `_x102a-x099te0x099te1`: *Requires selected TEs.* Will generate all models with `_102`'s UNET and the average of the selected `te0` and `te1`. VAE will be `x25a`.
- `FINAL` as `e2e`: Final model as `x102`.

## Recommended directories to make ##

- `raw`: Store the raw $N$ models
- `clip`: Store $3N$ models for CLIP selection
- `unet`: Store $N$ models for UNET selection

## Operation Mode ##

- [`RAW`, `CLIP`, `UNET`, `FINAL`]. Procedure will be *mutually exclusive*. I will keep restarting the whole notebook.

## Limitation ##

- ~~VAE remains unmanaged.~~ VAE can be picked from one of the raw models.
- SDXL models only. I don't need this for SD1 and SD2.
- Safetensors only. 

## WTF why and will it work? ##

- Yes. [It is part of my research](./README_XL.md).
- Image comparisons will be listed there.

## Appendix: Pseudocode of sd-mecha ##

- Note that the core concept is different from WebUI or supermerger. It focuses on [serialization](https://www.geeksforgeeks.org/serialization-in-java/), along with *multiple merging methods* and *custom applied areas*.

- It will pick `model_b_as_recipe` for every merge key and `model_a` for every passthrough key.

- Sample code: `model_b_as_recipe = sd_mecha.merge_methods(model_a, model_b_as_recipe, alpha, beta, etc) #returns model_a`

- For example, in `n_average`, `alpha` tends to `1`, instead of `0` in WebUI. 

- Also, `pick_vae` will show a special case on "bake VAE": `model_a_instead = sd_mecha.merge_methods(model_a, model_b_as_recipe, alpha=1) #returns model_a`

Algorithm `SD-MECHA`:

------

- Let

$\{model_A, model_B, model_C\} \in models$, where $arch_{models} \in arch_{SD}$ and the architecture is consistent (i.e. $arch_{model_A}=arch_{model_B}$)

$\{SumAverage,AddDiff,Rotate,ReBasin, etc.\} \in merge$

$\{CLIP, UNET, VAE\} \in models$, but $\{CLIP, UNET\} \in \alpha, \{VAE\} \notin \alpha$

$\alpha \in [0,1], \alpha=0 \implies model_A, \alpha=1 \implies model_B$

- Repeat:

$model_A, model_B, merge, \alpha, \beta, etc. \leftarrow deserialize(recipe)$ or user defined

$model_A \leftarrow merge(model_A, model_B, \alpha, \beta, etc.)$

$model_B \leftarrow model_A$

$recipe \leftarrow serialize(model_B)$

- Until no more $model_B$

- Return $recipe$

------
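
The loop above can be illustrated with a toy recipe tree. This is a sketch of the idea only: `ModelNode`, `MergeNode`, `serialize`, and `evaluate` are hypothetical names, the text format printed here is not the real `*.mecha` format, and each "model" is reduced to a single number per key.

```python
from dataclasses import dataclass
from typing import Dict, Union

Weights = Dict[str, float]  # toy "state dict": key -> single weight

@dataclass(frozen=True)
class ModelNode:   # hypothetical leaf: refers to a model by name only
    name: str

@dataclass(frozen=True)
class MergeNode:   # hypothetical inner node: a deferred weighted sum
    a: "Node"
    b: "Node"
    alpha: float

Node = Union[ModelNode, MergeNode]

def serialize(node: Node) -> str:
    """Turn the recipe tree into text without touching any weights."""
    if isinstance(node, ModelNode):
        return f'model "{node.name}"'
    return f"weighted_sum({serialize(node.a)}, {serialize(node.b)}, alpha={node.alpha})"

def evaluate(node: Node, pool: Dict[str, Weights]) -> Weights:
    """Walk the tree and merge for real; alpha=0 keeps a, alpha=1 keeps b."""
    if isinstance(node, ModelNode):
        return pool[node.name]
    a, b = evaluate(node.a, pool), evaluate(node.b, pool)
    return {k: (1.0 - node.alpha) * a[k] + node.alpha * b[k] for k in a}

pool = {"A": {"w": 0.0}, "B": {"w": 1.0}, "C": {"w": 4.0}}
# The result of one merge becomes an input of the next one.
recipe = MergeNode(MergeNode(ModelNode("A"), ModelNode("B"), 0.5), ModelNode("C"), 0.25)
print(serialize(recipe))
print(evaluate(recipe, pool))  # {'w': 1.375}
```

Building and serializing the tree is cheap because no tensor is loaded until the recipe is evaluated, which is why the recipe cells in this notebook can always run regardless of the activated mode.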

## Importing libraries ##

In [1]:
# Built-in
import time
import os
import math

# Is dependency fufilled?
import torch

from tqdm import tqdm

In [2]:
torch.__version__

'2.4.0+cu124'

In [3]:
# Import the main module.
import sd_mecha

sd_mecha.set_log_level() #INFO

In [4]:
# Fix for OMP: Error #15
os.environ["KMP_DUPLICATE_LIB_OK"] = "TRUE"

## User input session starts here. ##

Specify all the paths.

In [5]:
DIR_BASE = "F:/NOVELAI/astolfo_mix/sdxl/" #To set up merger

DIR_RAW = "raw/" #To load N models
DIR_CLIP = "clip/"  #To write 3N models
DIR_UNET = "unet/" #To write N models
DIR_FINAL = "./" #To write 1 model

Quick check on directory and make the model name prefix.

In [6]:
MECHA_RECIPE_EXT = ".mecha"
MECHA_MODEL_EXT = ".safetensors"

MODEL_LIST_RAW = os.listdir("{}{}".format(DIR_BASE,DIR_RAW))
# Exclude yaml.
MODEL_LIST_RAW = list(filter(lambda p: p.endswith(MECHA_MODEL_EXT), MODEL_LIST_RAW)) #p.endswith(".ckpt") or p.endswith(".safetensors") or p.endswith(".bin")
if len(MODEL_LIST_RAW) < 2:
    raise Exception("Need at least 2 models for merge.")
#model_list = list(map(lambda p: os.path.splitext(os.path.basename(p))[0], model_list))

In [7]:
print("{} raw models found.".format(len(MODEL_LIST_RAW)))

256 raw models found.


Define model selection. Indices start at 1. Check the model list for ordering! *If you are using merging algorithms, you can ignore the list.*
```log
te0: --,--,--,--,--,--,--,--,--,--,--,--,--,--,--,--,--,--,--,--,--,--,--,--,--,--,--,--,--,--,--,--=0
te1: 02,--,04,07,09,--,--,15,--,--,--,--,30,--,37,39,40,46,47,49,54,55,57,59,60,61,63,67,68,69,70,71=-24
te2: 02,--,04,07,--,--,--,15,--,--,--,--,--,31,--,--,--,--,--,--,--,--,--,--,--,--,--,--,--,--,--,--=-5
=sd: --,03,--,--,--,10,12,--,16,18,19,25,--,--,--,--,--,--,--,--,--,--,--,--,--,--,--,--,--,--,--,--=-7

te0: =0
te1: 01,03,09,10,12,14,15,16,18,19,25,29,31,34,40,44,47,48,49,50,53,56,58,60,61,64,65,66=-28
unet: 02,03,05,14,16,18,19,24,28,29,34,39,41,44,45,47,48,50,51,52,53,55,56,58,60,61,64,65,66,67,71,72,73=-33
```

In [8]:
# 241006: Cleared since I am not going to use them anymore.
MODEL_SELECTION_TE0 = [i+1 for i in range(len(MODEL_LIST_RAW)) if i+1 not in []] 
MODEL_SELECTION_TE1 = [i+1 for i in range(len(MODEL_LIST_RAW)) if i+1 not in []] 
MODEL_SELECTION_UNET = [i+1 for i in range(len(MODEL_LIST_RAW)) if i+1 not in []]

#001 is the Original SDXL
MODEL_SELECTION_OG = 1
#002 has the "FP16 fixed VAE"
MODEL_SELECTION_VAE = 2

In [9]:
print("TE0:{},TE1:{},UNET:{}".format(len(MODEL_SELECTION_TE0),len(MODEL_SELECTION_TE1),len(MODEL_SELECTION_UNET)))

TE0:256,TE1:256,UNET:256


Specify all the keywords (I'll avoid hardcoding because they will be everywhere).

Note that I only run 1 merging algorithm at a time, otherwise my code will explode (too many items to consider).

Merging algorithms usually use all models, i.e. `MODE_RAW` only. The other modes are dedicated to `ALGO_AVERAGE`.

In [10]:
MODE_RAW = 'MODE_RAW'
MODE_CLIP = 'MODE_CLIP'
MODE_UNET = 'MODE_UNET'
MODE_FINAL = 'MODE_FINAL'
ALGO_AVERAGE = 'ALGO_AVERAGE' #Uniform Soup in Model Soup (implied), running average
ALGO_NAVG = 'ALGO_NAVG' #Uniform Soup in Model Soup (implied), single shot
ALGO_TIES = 'ALGO_TIES' #TIES merge (Algo 1)
ALGO_DARE = 'ALGO_DARE' #TIES w/ DARE
ALGO_MODELSTOCK = 'ALGO_MODELSTOCK' #ModelStock (mergekit approach)
ALGO_MEDIAN = 'ALGO_MEDIAN' #Geometric Median

ALGO_ACTIVATED = ALGO_DARE #ALGO_AVERAGE, ALGO_NAVG, ALGO_TIES, ALGO_DARE, ALGO_MODELSTOCK, ALGO_MEDIAN
MODE_ACTIVATED = [MODE_RAW] #[MODE_RAW,MODE_CLIP,MODE_UNET,MODE_FINAL]

After several generations of AstolfoMix, I am tired of deleting 600x unused files. Therefore I am going to skip the recipe generation.

In [11]:
SKIP_GEN_MANUAL_FILTER = (MODE_ACTIVATED == [MODE_RAW])

- AstolfoMix is a huge mix, so I am opposed to introducing too many hyperparameters.
- Optimization is painful; see [AutoMBW-rt](https://github.com/6DammK9/auto-MBW-rt), which involves 27 parameters and a set of testing prompts.
- Therefore I suggest not adjusting any numbers until you have generated your "baseline model". Note that "averaging" is parameterless.
- Since $\lambda$ kicks in as `add_diff`, we can use a separate script to make models with different $\lambda$, so there is no need to rerun this time-consuming script.
- Instead, we rerun the script for a *picked* $\lambda$. $k$ is obviously 1.0 because 0.2 is not good.
- DARE adds $p$ to the equation, which makes things complicated. Currently there is no discussion of how it behaves; we only know it improves performance on some tasks (however, I have no specific task here).
- Model Stock proposes an idea to "drag mean to center (median?)". However, `mergekit`'s approach is "mean of cosine similarity" due to the difficulty of the N-model case. It can be applied to TIES as `apply_stock`, since $\tau_m$ is an average (mean) of filtered weights. $\epsilon$ is considered for implementation only.
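
To make $\lambda$ and $p$ from the list above concrete, here is a toy sketch of the two building blocks on plain Python lists. It is a simplification under stated assumptions: `dare_drop` and `ties_merge` are hypothetical names, trimming is skipped (equivalent to $k=1.0$), and `vote_sgn`, `apply_stock`, `apply_median`, and DELLA's magnitude ranking are not modeled.

```python
import random

def dare_drop(delta, p, rescale, rng):
    """Hypothetical helper: drop each entry with probability p.
    With rescale=True the survivors are divided by (1 - p), as in DARE."""
    kept = [0.0 if rng.random() < p else d for d in delta]
    if rescale:
        kept = [d / (1.0 - p) for d in kept]
    return kept

def ties_merge(base, models, lam=1.0):
    """Hypothetical helper: TIES without trimming (k = 1.0).
    Elect one sign per weight, then average only the agreeing deltas."""
    merged = []
    for i, b in enumerate(base):
        deltas = [m[i] - b for m in models]
        sign = 1.0 if sum(deltas) >= 0 else -1.0         # sign election by total mass
        agree = [d for d in deltas if d * sign > 0]      # deltas matching the elected sign
        tau = sum(agree) / len(agree) if agree else 0.0  # disjoint mean
        merged.append(b + lam * tau)                     # lambda scales the merged delta
    return merged

base = [0.0, 0.0]
models = [[1.0, 2.0], [3.0, -1.0], [-1.0, 2.0]]
print(ties_merge(base, models))  # [2.0, 2.0]

rng = random.Random(250228)
print(dare_drop([1.0, 2.0, 3.0, 4.0], p=0.1, rescale=False, rng=rng))
```

In this toy case the plain average would be `[1.0, 1.0]`, while the sign election discards the disagreeing deltas and yields `[2.0, 2.0]`. Because $\lambda$ only scales the final delta, it can be swept afterwards without repeating the expensive part.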

This is the old merging log.

- DARE-TIES-SOUP: `24050903`. Best result from the previous iteration. `p=0.1,k=1.0,alpha=0.9,vote_sgn=1.0`
- NAVG: `24060301`. Control Test.
- MODELSTOCK: `24060302`. Failed with `nan` error. `cos_eps=1e-6`
- MEDIAN: `24060303`. Params kept as in the source. `eps=1e-6,maxiter=100,ftol=1e-20`
- DARE-TIES-STOCK: `24060701`. Looks like STOCK did nothing. `p=0.1,k=1.0,alpha=0.9,vote_sgn=1.0,apply_stock=1.0,cos_eps=1e-6`
- DARE-TIES-GM: `24060702`. Surprisingly, looks like MEDIAN. `p=0.1,k=1.0,alpha=0.9,vote_sgn=1.0,apply_median=1.0,eps=1e-6,maxiter=100,ftol=1e-20`
- DARE-TIES-GM: `24062201`. Bumped from 116 to 133 models. Same as `24060702`.

This is the *recent* merging log.
- DGMLA: `24100201`. 192 models. `p=0.1,eps=-0.05`. Baseline.
- DGMLA: `24100601`. 20 models. `p=0.1,eps=-0.05`. Still works.
- DGMLA: `24100602`. 20 models. `p=0.3,eps=-0.30`. Doesn't work.
- DGMLA: `24100603`. 20 models. `p=0.1,eps=0.05`. Works in a different way.
- DGMLA: `24100604`. 20 models. `p=0.1,eps=-0.1`. Works in the same way.
- DGMLA: `24101101`. 216 models. `p=0.1,eps=-0.1`. Works as expected.
- DGMLA: `25022801`. 256 models. `p=0.1,eps=-0.1`. **Kinda works, but the images are very abstract. Needs more study.**

In [12]:
DARE_PROB = 0.1 #By my experience. Paper has no recommended value, but code default at 0.1
DELLA_EPS = -0.1 #By my experience. Paper has no recommended value, but code default at 0.07 with p=0.35
DARE_DROP = 0.0 #1.0 for full DARE, 0.0 for Dropout only 
TIES_TOP_K = 1.0 #From ljleb, verified by me, must be 1.0, even paper suggest 0.2 as "Top 20%"
TIES_VOTE_SGN = 1.0 #0.0 for plain TIES, > 0.0 for TIES-SOUP
TIES_LAMBDA = 1.0 #1.0 From ljleb and paper, use other script to split into different lambda
TIES_MODELSTOCK = 0.0 #1.0 for applying "t" to voted weight (kind of average)
TIES_MEDIAN = 1.0 #1.0 for applying geometric median instead of arithmetic mean in the last step.
MODELSTOCK_EPS = 1e-6 #As torch's default.
MEDIAN_EPS = 1e-6 #As torch's default.
MEDIAN_MAXITER = 100 #As author
MEDIAN_FTOL = 1e-20 #As author

Insert version number, and the... *"AstolfoMix"*.

If you want to make multiple versions of AstolfoMix with different algorithms, I suggest modifying the `MODEL_NAME_KEYWORD`.

In [13]:
MODEL_NAME_SUFFIX = "25022801-1458190" #yymmddxx-commit
MODEL_NAME_KEYWORD = "AstolfoMix"

Change if your PC is in trouble.

My WS: [Xeon 8358 ES, X12DPI-N6, 4TB DDR4 w/ PMem, 2x RTX2080ti 22G, P4510 4TB *2](https://github.com/6DammK9/nai-anime-pure-negative-prompt/blob/main/ch04/ice_lake_ws.md). ~~Overkill~~ Suitable for a merger.

In [14]:
g_seed = 250228 #For reproducible result
g_device = "cuda:1" if ALGO_ACTIVATED == ALGO_AVERAGE else "cpu"  #I have 2 GPUS and this is the CPU slot
g_precision_while_merge = torch.float64 if "cuda" in g_device else torch.float64 #I have RAM
g_precision_final_model = torch.float16 if "cuda" in g_device else torch.float16 #fp16

#240407: 2**34 will throw NaN issue. Stay with default = 2**28
g_total_buffer_size=2**28
#240507: (Not effective) DARE requires single thread to prevent OOM
#240603: I have my machine upgraded. And... OS becomes unstable.
g_threads = 24 #16 if ALGO_ACTIVATED == ALGO_DARE else None

## User input should end here. ##

Define output model name. I want to keep the format, however I need to manage the name manually.

In [15]:
FORMAT_BYPASS = "{}"

In [16]:
# Auto zfill under total model count (uses the digit count, so exact powers of 10 are padded consistently)
def az(n):
    return str(n).zfill(len(str(len(MODEL_LIST_RAW))))

In [17]:
N_CLIP = ("te0","te1","te2")
N_RAW = "_x{}"
N_ITR = "{}a"
N_PICKED = "x{}"

MODEL_NAME_RAW_PREFIX = (N_ITR.format(N_PICKED)).format(az(len(MODEL_LIST_RAW)-1)) #x102a

MODEL_NAME_TE = ("{}{}{}{}".format(N_PICKED,N_CLIP[0],N_PICKED,N_CLIP[1])).format(az(len(MODEL_SELECTION_TE0)-1),az(len(MODEL_SELECTION_TE1)-1)) #x102te0x102te1
MODEL_NAME_FINAL_PREFIX = N_PICKED.format(az(len(MODEL_SELECTION_UNET)-1)) #x102

MODEL_NAME_RAW = "{}-{}-{}".format(MODEL_NAME_RAW_PREFIX,MODEL_NAME_KEYWORD,MODEL_NAME_SUFFIX) #x102a-AstolfoMix-e2e-240507-4edc67c
MODEL_NAME_TE_ITR = "{}-{}-{}{}-{}".format(MODEL_NAME_RAW_PREFIX,MODEL_NAME_KEYWORD,FORMAT_BYPASS,FORMAT_BYPASS,MODEL_NAME_SUFFIX) #x102a-AstolfoMix-_x102te0-e2e-240507-4edc67c
MODEL_NAME_SELECTED_TE = "{}-{}-{}-{}".format(MODEL_NAME_RAW_PREFIX,MODEL_NAME_KEYWORD,MODEL_NAME_TE,MODEL_NAME_SUFFIX) #x102a-AstolfoMix-x102te0x102te1-e2e-240507-4edc67c
MODEL_NAME_UNET_ITR = "_{}-{}-{}-{}".format(FORMAT_BYPASS,MODEL_NAME_KEYWORD,MODEL_NAME_TE,MODEL_NAME_SUFFIX) #_x102a-AstolfoMix-x102te0x102te1-e2e-240507-4edc67c
MODEL_NAME_FINAL = "{}-{}-{}-{}".format(MODEL_NAME_FINAL_PREFIX,MODEL_NAME_KEYWORD,MODEL_NAME_TE,MODEL_NAME_SUFFIX) #x102-AstolfoMix-x102te0x102te1-e2e-240507-4edc67c
MODEL_NAME_E2E = "{}-{}-{}-e2e-{}".format(MODEL_NAME_FINAL_PREFIX,MODEL_NAME_KEYWORD,MODEL_NAME_TE,MODEL_NAME_SUFFIX) #x102-AstolfoMix-x102te0x102te1-e2e-240507-4edc67c

In [18]:
print("Naive average model:                     {}".format(MODEL_NAME_RAW))
print("CLIP models to iterate:                  {}".format(MODEL_NAME_TE_ITR))
print("Naive average model with selected CLIP:  {}".format(MODEL_NAME_SELECTED_TE))
print("UNET models to iterate:                  {}".format(MODEL_NAME_UNET_ITR))
print("Final merged model (staged):             {}".format(MODEL_NAME_FINAL))
print("Final merged model (e2e):                {}".format(MODEL_NAME_E2E))

Naive average model:                     x255a-AstolfoMix-25022801-1458190
CLIP models to iterate:                  x255a-AstolfoMix-{}{}-25022801-1458190
Naive average model with selected CLIP:  x255a-AstolfoMix-x255te0x255te1-25022801-1458190
UNET models to iterate:                  _{}-AstolfoMix-x255te0x255te1-25022801-1458190
Final merged model (staged):             x255-AstolfoMix-x255te0x255te1-25022801-1458190
Final merged model (e2e):                x255-AstolfoMix-x255te0x255te1-e2e-25022801-1458190
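
The `{}` placeholders that survive in the output above come from formatting in two stages. A minimal standalone sketch of the trick, reusing the literal values printed above (the variable names `template` and `name` are only for illustration):

```python
FORMAT_BYPASS = "{}"

# Stage 1: fill in the fixed parts, but pass "{}" through for the parts decided later.
template = "{}-{}-{}{}-{}".format(
    "x255a", "AstolfoMix", FORMAT_BYPASS, FORMAT_BYPASS, "25022801-1458190"
)
print(template)  # x255a-AstolfoMix-{}{}-25022801-1458190

# Stage 2: fill in the per-model parts inside the loop.
name = template.format("_x" + str(2).zfill(3), "te1")
print(name)      # x255a-AstolfoMix-_x002te1-25022801-1458190
```

One caveat of this approach: it breaks if the keyword or suffix itself contains a brace, because the second `format` call would treat it as a placeholder.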


## Setting up merge recipe and merge scheduler ##

- I'm still a bit anxious about hardcoding. Getters / setters will be fine. ~~No, you won't see OOP in python notebook.~~
- Will always run. `MODE_ACTIVATED` controls the actual merge process only.

In [19]:
def rmk_raw():
    return 'RAW'
def rmk_ste():
    return 'SELECTED_TE'
def rmk_f():
    return 'FINAL'
def rmk_e2e():
    return 'E2E'
def rmk_te(i,j):
    return 'CLIP{}_TE{}'.format(i,j) #CLIP1_TE0
def rmk_unet(i):
    return 'UNET{}'.format(i)  #UNET1

In [20]:
recipe_mapping = {}

def set_rmk(k, v):
    recipe_mapping[k] = v

In [21]:
def reset_rm():
    set_rmk(rmk_raw(), None)
    set_rmk(rmk_ste(), None)
    set_rmk(rmk_f(), None)

    for i in range(len(MODEL_LIST_RAW)):
        set_rmk(rmk_unet(i+1), None)
        for j in range(3):
            set_rmk(rmk_te(i+1,j), None)

In [22]:
reset_rm()

Single merger should be fine.

In [23]:
scheduler = sd_mecha.RecipeMerger(
    models_dir=DIR_BASE,
    default_device=g_device,
    default_dtype=g_precision_while_merge,
)

Define recipe extension, and make the model output path (Note that it is still being formatted)

In [24]:
OS_MODEL_PATH_RAW = "{}{}{}".format(DIR_BASE,MODEL_NAME_RAW,MECHA_MODEL_EXT)
RECIPE_PATH_RAW = "{}{}{}".format(DIR_BASE,MODEL_NAME_RAW,MECHA_RECIPE_EXT)
MECHA_MODEL_PATH_RAW = "{}{}".format(DIR_FINAL,MODEL_NAME_RAW)

OS_MODEL_PATH_TE_ITR = "{}{}{}{}".format(DIR_BASE,DIR_CLIP,MODEL_NAME_TE_ITR,MECHA_MODEL_EXT)
RECIPE_PATH_TE_ITR = "{}{}{}{}".format(DIR_BASE,DIR_CLIP,MODEL_NAME_TE_ITR,MECHA_RECIPE_EXT)
MECHA_MODEL_PATH_TE_ITR =  "{}{}{}".format(DIR_FINAL,DIR_CLIP,MODEL_NAME_TE_ITR)

OS_MODEL_PATH_SELECTED_TE = "{}{}{}".format(DIR_BASE,MODEL_NAME_SELECTED_TE,MECHA_MODEL_EXT)
RECIPE_PATH_SELECTED_TE = "{}{}{}".format(DIR_BASE,MODEL_NAME_SELECTED_TE,MECHA_RECIPE_EXT)
MECHA_MODEL_PATH_SELECTED_TE =  "{}{}".format(DIR_FINAL,MODEL_NAME_SELECTED_TE)

OS_MODEL_PATH_UNET_ITR = "{}{}{}{}".format(DIR_BASE,DIR_UNET,MODEL_NAME_UNET_ITR,MECHA_MODEL_EXT)
RECIPE_PATH_UNET_ITR = "{}{}{}{}".format(DIR_BASE,DIR_UNET,MODEL_NAME_UNET_ITR,MECHA_RECIPE_EXT)
MECHA_MODEL_PATH_UNET_ITR =  "{}{}{}".format(DIR_FINAL,DIR_UNET,MODEL_NAME_UNET_ITR)

OS_MODEL_PATH_FINAL = "{}{}{}".format(DIR_BASE,MODEL_NAME_FINAL,MECHA_MODEL_EXT)
RECIPE_PATH_FINAL = "{}{}{}".format(DIR_BASE,MODEL_NAME_FINAL,MECHA_RECIPE_EXT)
MECHA_MODEL_PATH_FINAL = "{}{}".format(DIR_FINAL,MODEL_NAME_FINAL)

OS_MODEL_PATH_E2E = "{}{}{}".format(DIR_BASE,MODEL_NAME_E2E,MECHA_MODEL_EXT)
RECIPE_PATH_E2E = "{}{}{}".format(DIR_BASE,MODEL_NAME_E2E,MECHA_RECIPE_EXT)
MECHA_MODEL_PATH_E2E = "{}{}".format(DIR_FINAL,MODEL_NAME_E2E)

In [25]:
# Better test the ugly full file path
def get_te_itr_path(s: str,i: int,j: int):
    return s.format(N_RAW.format(az(i+1)),N_CLIP[j])

def get_unet_itr_path(s: str,i: int):
    return s.format((N_ITR.format(N_PICKED)).format(az(i+1)))

In [26]:
print("Sample TE_ITR recipe path:   {}".format(get_te_itr_path(RECIPE_PATH_TE_ITR, 1, 1)))
print("Sample TE_ITR model path:    {}".format(get_te_itr_path(OS_MODEL_PATH_TE_ITR, 1, 1)))
print("Sample UNET_ITR recipe path: {}".format(get_unet_itr_path(RECIPE_PATH_UNET_ITR, 1)))
print("Sample UNET_ITR model path:  {}".format(get_unet_itr_path(OS_MODEL_PATH_UNET_ITR, 1)))
print("Does RAW model exists:       {}".format(os.path.isfile(OS_MODEL_PATH_RAW)))
print("Final model path:            {}".format(MECHA_MODEL_PATH_FINAL))

Sample TE_ITR recipe path:   F:/NOVELAI/astolfo_mix/sdxl/clip/x255a-AstolfoMix-_x002te1-25022801-1458190.mecha
Sample TE_ITR model path:    F:/NOVELAI/astolfo_mix/sdxl/clip/x255a-AstolfoMix-_x002te1-25022801-1458190.safetensors
Sample UNET_ITR recipe path: F:/NOVELAI/astolfo_mix/sdxl/unet/_x002a-AstolfoMix-x255te0x255te1-25022801-1458190.mecha
Sample UNET_ITR model path:  F:/NOVELAI/astolfo_mix/sdxl/unet/_x002a-AstolfoMix-x255te0x255te1-25022801-1458190.safetensors
Does RAW model exists:       False
Final model path:            ./x255-AstolfoMix-x255te0x255te1-25022801-1458190


### Right before the merging stuff, I need to clear up some hardcoded values. ###

In [27]:
MECHA_IS_SDXL = "sdxl"
MECHA_TXT1_IS_VITG = "txt"
MECHA_TXT2_IS_VITL = "txt2"
MECHA_UNET_IS_UNET = "unet"

# 240407: Seems that safe to be float
TAKE_MODEL_A = 0.0
TAKE_MODEL_B = 1.0

# 240407: New syntax proposed by ljleb
TE0_ALPHA = (sd_mecha.default(MECHA_IS_SDXL, [MECHA_TXT1_IS_VITG], TAKE_MODEL_A) | sd_mecha.default(MECHA_IS_SDXL, [MECHA_TXT2_IS_VITL, MECHA_UNET_IS_UNET], TAKE_MODEL_B))
TE1_ALPHA = (sd_mecha.default(MECHA_IS_SDXL, [MECHA_TXT2_IS_VITL], TAKE_MODEL_A) | sd_mecha.default(MECHA_IS_SDXL, [MECHA_TXT1_IS_VITG, MECHA_UNET_IS_UNET], TAKE_MODEL_B))
TE2_ALPHA = (sd_mecha.default(MECHA_IS_SDXL, [MECHA_TXT1_IS_VITG, MECHA_TXT2_IS_VITL], TAKE_MODEL_A) | sd_mecha.default(MECHA_IS_SDXL, [MECHA_UNET_IS_UNET], TAKE_MODEL_B))
TE0_ALPHA_INVERTED = (sd_mecha.default(MECHA_IS_SDXL, [MECHA_TXT2_IS_VITL, MECHA_UNET_IS_UNET], TAKE_MODEL_A) | sd_mecha.default(MECHA_IS_SDXL, [MECHA_TXT1_IS_VITG], TAKE_MODEL_B))
TE1_ALPHA_INVERTED = (sd_mecha.default(MECHA_IS_SDXL, [MECHA_TXT1_IS_VITG, MECHA_UNET_IS_UNET], TAKE_MODEL_A) | sd_mecha.default(MECHA_IS_SDXL, [MECHA_TXT2_IS_VITL], TAKE_MODEL_B))

CASTED_VAE_MODEL = sd_mecha.model("{}{}".format(DIR_RAW,MODEL_LIST_RAW[MODEL_SELECTION_VAE - 1]), MECHA_IS_SDXL)
CASTED_OG_MODEL = sd_mecha.model("{}{}".format(DIR_RAW,MODEL_LIST_RAW[MODEL_SELECTION_OG - 1]), MECHA_IS_SDXL)

FALLBACK_AS_OG_MODEL = CASTED_OG_MODEL

# 240603: Yes my algorithms are stackable. "TSD Merge" will have tons of parameters.
KWARGS_NAVG = { 'dtype': g_precision_while_merge, 'device': g_device }
KWARGS_MODELSTOCK = { 'cos_eps': MODELSTOCK_EPS, **KWARGS_NAVG }
KWARGS_MEDIAN = { 'eps': MEDIAN_EPS, 'maxiter': MEDIAN_MAXITER, 'ftol': MEDIAN_FTOL, **KWARGS_NAVG }
KWARGS_TIES = { 'alpha': TIES_LAMBDA, 'k': TIES_TOP_K, 'vote_sgn': TIES_VOTE_SGN, 'apply_stock': TIES_MODELSTOCK, 'apply_median': TIES_MEDIAN, **KWARGS_MODELSTOCK, **KWARGS_MEDIAN }
KWARGS_DARE = { 'probability': DARE_PROB, 'della_eps': DELLA_EPS, 'rescale': DARE_DROP, 'seed': g_seed, **KWARGS_TIES }

### Pick VAE ###
- It will pick `cur_model` for every merge key and `vae_model` for every passthrough key.
- Note that `cur_model` is already casted as `sd_mecha.model()`.

In [28]:
def pick_vae(cur_model):
    return sd_mecha.weighted_sum(CASTED_VAE_MODEL, cur_model, alpha=TAKE_MODEL_B, dtype=g_precision_while_merge, device=g_device)

### Naive Average ###

- Pay attention to the `alpha`. It is the opposite of A1111: It is "A merge to B" instead of "Merge A with B".
- Also, the recipe is a set of `RecipeNode` in a [tree structure](https://en.wikipedia.org/wiki/Tree_(data_structure)). Therefore you can expect quite a lot of recursion (returning itself).

In [29]:
def make_recipe_naive_merge():
    models = list(map(lambda p: "{}{}".format(DIR_RAW,p),MODEL_LIST_RAW))

    if ALGO_ACTIVATED == ALGO_AVERAGE:
        recipe = models[0]
        casted_recipe = sd_mecha.model(recipe, MECHA_IS_SDXL)
        for i, model in enumerate(models[1:], start=2):
            casted_model = sd_mecha.model(model, MECHA_IS_SDXL)
            casted_recipe = sd_mecha.weighted_sum(casted_model, casted_recipe, alpha=(i-1)/i, dtype=g_precision_while_merge, device=g_device)
    elif ALGO_ACTIVATED == ALGO_NAVG:
        casted_models = [sd_mecha.model(model, MECHA_IS_SDXL) for model in models]
        casted_recipe = sd_mecha.n_average(*casted_models, **KWARGS_NAVG)        
    elif ALGO_ACTIVATED == ALGO_MEDIAN:
        casted_models = [sd_mecha.model(model, MECHA_IS_SDXL) for model in models]
        casted_recipe = sd_mecha.geom_median(*casted_models, **KWARGS_MEDIAN)     
    elif ALGO_ACTIVATED == ALGO_MODELSTOCK:
        casted_models = [sd_mecha.model(model, MECHA_IS_SDXL) for model in models]
        casted_recipe = sd_mecha.model_stock_n_models(CASTED_OG_MODEL, *casted_models, **KWARGS_MODELSTOCK)
    elif ALGO_ACTIVATED == ALGO_TIES:
        casted_models = [sd_mecha.model(model, MECHA_IS_SDXL) for model in models]
        casted_recipe = sd_mecha.add_difference_ties(CASTED_OG_MODEL, *casted_models, **KWARGS_TIES)
    elif ALGO_ACTIVATED == ALGO_DARE:
        casted_models = [sd_mecha.model(model, MECHA_IS_SDXL) for model in models]
        casted_recipe = sd_mecha.ties_with_dare(CASTED_OG_MODEL, *casted_models, **KWARGS_DARE)
    else:
        raise Exception("Algorithm {} is not supported.".format(ALGO_ACTIVATED))
    
    return pick_vae(casted_recipe)        
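
The `ALGO_AVERAGE` branch above chains pairwise merges with `alpha=(i-1)/i`. The following standalone sketch checks on toy lists that this chain equals the one-shot mean; the local `weighted_sum` is a stand-in that assumes the convention stated earlier ($\alpha=0$ keeps the first argument, $\alpha=1$ keeps the second), not the library function itself.

```python
def weighted_sum(a, b, alpha):
    """Stand-in for the merge method: (1 - alpha) * a + alpha * b, element-wise."""
    return [(1.0 - alpha) * x + alpha * y for x, y in zip(a, b)]

models = [[1.0, 10.0], [2.0, 20.0], [3.0, 30.0], [6.0, 60.0]]

running = models[0]
for i, model in enumerate(models[1:], start=2):
    # The new model gets weight 1/i, the running result keeps (i-1)/i.
    running = weighted_sum(model, running, alpha=(i - 1) / i)

one_shot = [sum(col) / len(models) for col in zip(*models)]
print(running)
print(one_shot)
```

The two results agree only up to floating-point rounding, since the running version rounds after every step. That is the trade-off between `ALGO_AVERAGE` (two models in memory at a time) and `ALGO_NAVG` (all models at once, single rounding step).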

In [30]:
set_rmk(rmk_raw(), make_recipe_naive_merge())
sd_mecha.serialize_and_save(recipe_mapping[rmk_raw()], RECIPE_PATH_RAW)

INFO: Saving recipe to F:\NOVELAI\astolfo_mix\sdxl\x255a-AstolfoMix-25022801-1458190.mecha


### CLIP Models to test ###
- Note that `alpha` is using a special operator `|` which is "Bitwise OR".
- Also recall "TE0 use ViT-G / `txt1`" and "TE1 use ViT-L / `txt2`" and "TE2 use both"
- *I don't like the new syntax.*
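
The effect of combining two per-component defaults with `|` can be mimicked with plain dictionaries. This is only an analogy: the `default` helper below is a hypothetical stand-in, whether `sd_mecha.default` returns a plain `dict` is an assumption, and the dict union operator `|` needs Python 3.9 or later. Each component is reduced to a single number.

```python
TAKE_MODEL_A, TAKE_MODEL_B = 0.0, 1.0

def default(components, value):
    """Hypothetical stand-in: map each component name to one alpha value."""
    return {c: value for c in components}

# Same shape as TE0_ALPHA: "txt" comes from model A, everything else from model B.
te0_alpha = default(["txt"], TAKE_MODEL_A) | default(["txt2", "unet"], TAKE_MODEL_B)

def weighted_sum(a, b, alpha):
    """Per-component (1 - alpha) * a + alpha * b on toy one-number components."""
    return {c: (1.0 - alpha[c]) * a[c] + alpha[c] * b[c] for c in a}

clip_donor = {"txt": 10.0, "txt2": 20.0, "unet": 30.0}  # model A: one raw model
averaged = {"txt": 1.0, "txt2": 2.0, "unet": 3.0}       # model B: the averaged model

print(te0_alpha)
print(weighted_sum(clip_donor, averaged, te0_alpha))
# {'txt': 10.0, 'txt2': 2.0, 'unet': 3.0}
```

With every alpha set to exactly 0.0 or 1.0, the "merge" degenerates into picking whole components, which is how the averaged model gets a single text encoder swapped in for testing.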

In [31]:
def make_recipe_te_itr(p):
    clip_model = "{}{}".format(DIR_RAW, p)
    unet_model = "{}{}".format(MECHA_MODEL_PATH_RAW, MECHA_MODEL_EXT)
    casted_clip_model = sd_mecha.model(clip_model, MECHA_IS_SDXL)
    casted_unet_model = sd_mecha.model(unet_model, MECHA_IS_SDXL)
    recipe_te0 = sd_mecha.weighted_sum(casted_clip_model, casted_unet_model, alpha=TE0_ALPHA, dtype=g_precision_while_merge, device=g_device)
    recipe_te1 = sd_mecha.weighted_sum(casted_clip_model, casted_unet_model, alpha=TE1_ALPHA, dtype=g_precision_while_merge, device=g_device)
    recipe_te2 = sd_mecha.weighted_sum(casted_clip_model, casted_unet_model, alpha=TE2_ALPHA, dtype=g_precision_while_merge, device=g_device)
    return (pick_vae(recipe_te0), pick_vae(recipe_te1), pick_vae(recipe_te2))

In [32]:
# 3N models
if SKIP_GEN_MANUAL_FILTER:
    print("Skipped generating unused recipes.")
else:
    for i in range(len(MODEL_LIST_RAW)):
        rte = make_recipe_te_itr(MODEL_LIST_RAW[i])
        for j in range(len(N_CLIP)):    
            set_rmk(rmk_te(i+1,j), rte[j])
            sd_mecha.serialize_and_save(recipe_mapping[rmk_te(i+1,j)], get_te_itr_path(RECIPE_PATH_TE_ITR,i,j))

Skipped generating unused recipes.


### Picked CLIP Model ###

In [33]:
def make_recipe_ste():
    raw_models = list(map(lambda p: "{}{}".format(DIR_RAW,p),MODEL_LIST_RAW))
    models_te0 = [raw_models[i-1] for i in MODEL_SELECTION_TE0]
    models_te1 = [raw_models[i-1] for i in MODEL_SELECTION_TE1]

    unet_model = "{}{}".format(MECHA_MODEL_PATH_RAW, MECHA_MODEL_EXT)
    casted_unet_model = sd_mecha.model(unet_model, MECHA_IS_SDXL)

    if ALGO_ACTIVATED == ALGO_AVERAGE:
        recipe_te0 = models_te0[0]
        casted_recipe_te0 = sd_mecha.model(recipe_te0, MECHA_IS_SDXL)
        for i, model in enumerate(models_te0[1:], start=2):
            casted_model = sd_mecha.model(model, MECHA_IS_SDXL)
            casted_recipe_te0 = sd_mecha.weighted_sum(casted_model, casted_recipe_te0, alpha=(i-1)/i, dtype=g_precision_while_merge, device=g_device)
        
        recipe_te1 = models_te1[0]
        casted_recipe_te1 = sd_mecha.model(recipe_te1, MECHA_IS_SDXL)
        for i, model in enumerate(models_te1[1:], start=2):
            casted_model = sd_mecha.model(model, MECHA_IS_SDXL)
            casted_recipe_te1 = sd_mecha.weighted_sum(casted_model, casted_recipe_te1, alpha=(i-1)/i, dtype=g_precision_while_merge, device=g_device)
    elif ALGO_ACTIVATED == ALGO_NAVG:
        casted_models_te0 = [sd_mecha.model(model, MECHA_IS_SDXL) for model in models_te0]
        casted_recipe_te0 = sd_mecha.n_average(*casted_models_te0, **KWARGS_NAVG)
        casted_models_te1 = [sd_mecha.model(model, MECHA_IS_SDXL) for model in models_te1]
        casted_recipe_te1 = sd_mecha.n_average(*casted_models_te1, **KWARGS_NAVG)
    elif ALGO_ACTIVATED == ALGO_MEDIAN:
        casted_models_te0 = [sd_mecha.model(model, MECHA_IS_SDXL) for model in models_te0]
        casted_recipe_te0 = sd_mecha.geom_median(*casted_models_te0, **KWARGS_MEDIAN)
        casted_models_te1 = [sd_mecha.model(model, MECHA_IS_SDXL) for model in models_te1]
        casted_recipe_te1 = sd_mecha.geom_median(*casted_models_te1, **KWARGS_MEDIAN)
    elif ALGO_ACTIVATED == ALGO_MODELSTOCK:
        casted_models_te0 = [sd_mecha.model(model, MECHA_IS_SDXL) for model in models_te0]
        casted_recipe_te0 = sd_mecha.model_stock_n_models(CASTED_OG_MODEL, *casted_models_te0, **KWARGS_MODELSTOCK)
        casted_models_te1 = [sd_mecha.model(model, MECHA_IS_SDXL) for model in models_te1]
        casted_recipe_te1 = sd_mecha.model_stock_n_models(CASTED_OG_MODEL, *casted_models_te1, **KWARGS_MODELSTOCK)
    elif ALGO_ACTIVATED == ALGO_TIES:
        casted_models_te0 = [sd_mecha.model(model, MECHA_IS_SDXL) for model in models_te0]
        casted_recipe_te0 = sd_mecha.add_difference_ties(CASTED_OG_MODEL, *casted_models_te0, **KWARGS_TIES)
        casted_models_te1 = [sd_mecha.model(model, MECHA_IS_SDXL) for model in models_te1]
        casted_recipe_te1 = sd_mecha.add_difference_ties(CASTED_OG_MODEL, *casted_models_te1, **KWARGS_TIES)
    elif ALGO_ACTIVATED == ALGO_DARE:
        casted_models_te0 = [sd_mecha.model(model, MECHA_IS_SDXL) for model in models_te0]
        casted_recipe_te0 = sd_mecha.ties_with_dare(CASTED_OG_MODEL, *casted_models_te0, **KWARGS_DARE)
        casted_models_te1 = [sd_mecha.model(model, MECHA_IS_SDXL) for model in models_te1]
        casted_recipe_te1 = sd_mecha.ties_with_dare(CASTED_OG_MODEL, *casted_models_te1, **KWARGS_DARE)
    else:
        raise Exception("Algorithm {} is not supported.".format(ALGO_ACTIVATED))  

    casted_unet_model = sd_mecha.weighted_sum(casted_recipe_te0, casted_unet_model, alpha=TE0_ALPHA, dtype=g_precision_while_merge, device=g_device)
    casted_unet_model = sd_mecha.weighted_sum(casted_recipe_te1, casted_unet_model, alpha=TE1_ALPHA, dtype=g_precision_while_merge, device=g_device)

    return pick_vae(casted_unet_model)

In [34]:
if SKIP_GEN_MANUAL_FILTER:
    print("Skipped generating unused recipes.")
else:
    set_rmk(rmk_ste(), make_recipe_ste())
    sd_mecha.serialize_and_save(recipe_mapping[rmk_ste()], RECIPE_PATH_SELECTED_TE)

Skipped generating unused recipes.


### UNET Models to test ###

In [35]:
def make_recipe_unet_itr(p):
    unet_model = "{}{}".format(DIR_RAW, p)
    clip_model = "{}{}".format(MECHA_MODEL_PATH_SELECTED_TE, MECHA_MODEL_EXT)
    casted_unet_model = sd_mecha.model(unet_model, MECHA_IS_SDXL)
    casted_clip_model = sd_mecha.model(clip_model, MECHA_IS_SDXL)
    recipe_te2 = sd_mecha.weighted_sum(casted_clip_model, casted_unet_model, alpha=TE2_ALPHA, dtype=g_precision_while_merge, device=g_device)
    return pick_vae(recipe_te2)

In [36]:
# N models
if SKIP_GEN_MANUAL_FILTER:
    print("Skipped generating unused recipes.")
else:
    for i in range(len(MODEL_LIST_RAW)):
        set_rmk(rmk_unet(i+1), make_recipe_unet_itr(MODEL_LIST_RAW[i]))
        sd_mecha.serialize_and_save(recipe_mapping[rmk_unet(i+1)], get_unet_itr_path(RECIPE_PATH_UNET_ITR,i))

Skipped generating unused recipes.


### Final model ###

- Two models are produced for validation. The true E2E merge and the staged merge should yield the same model, up to floating-point error.
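
One way to check that two merge results agree within floating-point error is to compare them key by key. The sketch below is only an illustration and not part of the notebook: `max_abs_diff` and `same_within` are hypothetical helpers, the state dicts are plain `dict`s mapping a key name to a flat list of floats, and the tolerance is an arbitrary choice. Real checkpoints would first have to be loaded and flattened (for example with `safetensors`).

```python
def max_abs_diff(state_a, state_b):
    """Return the largest element-wise absolute difference between two state dicts.

    Both arguments map a key name to a flat sequence of floats (hypothetical
    stand-ins for flattened tensors). Key sets and lengths must match.
    """
    if state_a.keys() != state_b.keys():
        differing = sorted(set(state_a) ^ set(state_b))
        raise KeyError("Key sets differ: {}".format(differing))
    worst = 0.0
    for key, values_a in state_a.items():
        values_b = state_b[key]
        if len(values_a) != len(values_b):
            raise ValueError("Length mismatch at key {}".format(key))
        for a, b in zip(values_a, values_b):
            worst = max(worst, abs(a - b))
    return worst


def same_within(state_a, state_b, tol=1e-3):
    """True when no element differs by more than tol (tol is an assumed value)."""
    return max_abs_diff(state_a, state_b) <= tol


# Toy data standing in for the staged and the E2E checkpoints.
staged = {"unet.w": [0.5, -0.25], "te0.w": [1.0]}
e2e = {"unet.w": [0.5, -0.2501], "te0.w": [1.0]}
print(same_within(staged, e2e))
```

A maximum absolute difference is a strict criterion. For models saved in half precision, a looser tolerance may be more appropriate.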

In [37]:
def make_recipe_final():
    raw_models = list(map(lambda p: "{}{}".format(DIR_RAW,p),MODEL_LIST_RAW))
    models_unet = [raw_models[i-1] for i in MODEL_SELECTION_UNET]

    clip_model = "{}{}".format(MECHA_MODEL_PATH_SELECTED_TE, MECHA_MODEL_EXT)
    casted_clip_model = sd_mecha.model(clip_model, MECHA_IS_SDXL)

    if ALGO_ACTIVATED == ALGO_AVERAGE:
        recipe_unet = models_unet[0]
        casted_recipe_unet = sd_mecha.model(recipe_unet, MECHA_IS_SDXL)
        for i, model in enumerate(models_unet[1:], start=2):
            casted_model = sd_mecha.model(model, MECHA_IS_SDXL)
            casted_recipe_unet = sd_mecha.weighted_sum(casted_model, casted_recipe_unet, alpha=(i-1)/i, dtype=g_precision_while_merge, device=g_device)
    elif ALGO_ACTIVATED == ALGO_NAVG:
        casted_models_unet = [sd_mecha.model(model, MECHA_IS_SDXL) for model in models_unet]
        casted_recipe_unet = sd_mecha.n_average(*casted_models_unet, **KWARGS_NAVG)
    elif ALGO_ACTIVATED == ALGO_MEDIAN:
        casted_models_unet = [sd_mecha.model(model, MECHA_IS_SDXL) for model in models_unet]
        casted_recipe_unet = sd_mecha.geom_median(*casted_models_unet, **KWARGS_MEDIAN)
    elif ALGO_ACTIVATED == ALGO_MODELSTOCK:
        casted_models_unet = [sd_mecha.model(model, MECHA_IS_SDXL) for model in models_unet]
        casted_recipe_unet = sd_mecha.model_stock_n_models(CASTED_OG_MODEL, *casted_models_unet, **KWARGS_MODELSTOCK)
    elif ALGO_ACTIVATED == ALGO_TIES:
        casted_models_unet = [sd_mecha.model(model, MECHA_IS_SDXL) for model in models_unet]
        casted_recipe_unet = sd_mecha.add_difference_ties(CASTED_OG_MODEL, *casted_models_unet, **KWARGS_TIES)
    elif ALGO_ACTIVATED == ALGO_DARE:
        casted_models_unet = [sd_mecha.model(model, MECHA_IS_SDXL) for model in models_unet]
        casted_recipe_unet = sd_mecha.ties_with_dare(CASTED_OG_MODEL, *casted_models_unet, **KWARGS_DARE)
    else:
        raise Exception("Algorithm {} is not supported.".format(ALGO_ACTIVATED))  

    final_model = sd_mecha.weighted_sum(casted_clip_model, casted_recipe_unet, alpha=TE2_ALPHA, dtype=g_precision_while_merge, device=g_device)

    return pick_vae(final_model)

def make_recipe_e2e():
    raw_models = list(map(lambda p: "{}{}".format(DIR_RAW,p),MODEL_LIST_RAW))

    models_te0 = [raw_models[i-1] for i in MODEL_SELECTION_TE0]
    models_te1 = [raw_models[i-1] for i in MODEL_SELECTION_TE1]
    models_unet = [raw_models[i-1] for i in MODEL_SELECTION_UNET]
   
    if ALGO_ACTIVATED == ALGO_AVERAGE:
        recipe_te0 = models_te0[0]
        casted_recipe_te0 = sd_mecha.model(recipe_te0, MECHA_IS_SDXL)
        for i, model in enumerate(models_te0[1:], start=2):
            casted_model = sd_mecha.model(model, MECHA_IS_SDXL)
            casted_recipe_te0 = sd_mecha.weighted_sum(casted_model, casted_recipe_te0, alpha=(i-1)/i, dtype=g_precision_while_merge, device=g_device)
        recipe_te1 = models_te1[0]
        casted_recipe_te1 = sd_mecha.model(recipe_te1, MECHA_IS_SDXL)
        for i, model in enumerate(models_te1[1:], start=2):
            casted_model = sd_mecha.model(model, MECHA_IS_SDXL)
            casted_recipe_te1 = sd_mecha.weighted_sum(casted_model, casted_recipe_te1, alpha=(i-1)/i, dtype=g_precision_while_merge, device=g_device)
        recipe_unet = models_unet[0]
        casted_recipe_unet = sd_mecha.model(recipe_unet, MECHA_IS_SDXL)
        for i, model in enumerate(models_unet[1:], start=2):
            casted_model = sd_mecha.model(model, MECHA_IS_SDXL)
            casted_recipe_unet = sd_mecha.weighted_sum(casted_model, casted_recipe_unet, alpha=(i-1)/i, dtype=g_precision_while_merge, device=g_device)
    elif ALGO_ACTIVATED == ALGO_NAVG:
        casted_models_te0 = [sd_mecha.model(model, MECHA_IS_SDXL) for model in models_te0]
        casted_recipe_te0 = sd_mecha.n_average(*casted_models_te0, **KWARGS_NAVG)
        casted_models_te1 = [sd_mecha.model(model, MECHA_IS_SDXL) for model in models_te1]
        casted_recipe_te1 = sd_mecha.n_average(*casted_models_te1, **KWARGS_NAVG)
        casted_models_unet = [sd_mecha.model(model, MECHA_IS_SDXL) for model in models_unet]
        casted_recipe_unet = sd_mecha.n_average(*casted_models_unet, **KWARGS_NAVG)
    elif ALGO_ACTIVATED == ALGO_MEDIAN:
        casted_models_te0 = [sd_mecha.model(model, MECHA_IS_SDXL) for model in models_te0]
        casted_recipe_te0 = sd_mecha.geom_median(*casted_models_te0, **KWARGS_MEDIAN)
        casted_models_te1 = [sd_mecha.model(model, MECHA_IS_SDXL) for model in models_te1]
        casted_recipe_te1 = sd_mecha.geom_median(*casted_models_te1, **KWARGS_MEDIAN)
        casted_models_unet = [sd_mecha.model(model, MECHA_IS_SDXL) for model in models_unet]
        casted_recipe_unet = sd_mecha.geom_median(*casted_models_unet, **KWARGS_MEDIAN)
    elif ALGO_ACTIVATED == ALGO_MODELSTOCK:
        casted_models_te0 = [sd_mecha.model(model, MECHA_IS_SDXL) for model in models_te0]
        casted_recipe_te0 = sd_mecha.model_stock_n_models(CASTED_OG_MODEL, *casted_models_te0, **KWARGS_MODELSTOCK)
        casted_models_te1 = [sd_mecha.model(model, MECHA_IS_SDXL) for model in models_te1]
        casted_recipe_te1 = sd_mecha.model_stock_n_models(CASTED_OG_MODEL, *casted_models_te1, **KWARGS_MODELSTOCK)
        casted_models_unet = [sd_mecha.model(model, MECHA_IS_SDXL) for model in models_unet]
        casted_recipe_unet = sd_mecha.model_stock_n_models(CASTED_OG_MODEL, *casted_models_unet, **KWARGS_MODELSTOCK)
    elif ALGO_ACTIVATED == ALGO_TIES:
        casted_models_te0 = [sd_mecha.model(model, MECHA_IS_SDXL) for model in models_te0]
        casted_recipe_te0 = sd_mecha.add_difference_ties(CASTED_OG_MODEL, *casted_models_te0, **KWARGS_TIES)
        casted_models_te1 = [sd_mecha.model(model, MECHA_IS_SDXL) for model in models_te1]
        casted_recipe_te1 = sd_mecha.add_difference_ties(CASTED_OG_MODEL, *casted_models_te1, **KWARGS_TIES)
        casted_models_unet = [sd_mecha.model(model, MECHA_IS_SDXL) for model in models_unet]
        casted_recipe_unet = sd_mecha.add_difference_ties(CASTED_OG_MODEL, *casted_models_unet, **KWARGS_TIES)
    elif ALGO_ACTIVATED == ALGO_DARE:
        casted_models_te0 = [sd_mecha.model(model, MECHA_IS_SDXL) for model in models_te0]
        casted_recipe_te0 = sd_mecha.ties_with_dare(CASTED_OG_MODEL, *casted_models_te0, **KWARGS_DARE)
        casted_models_te1 = [sd_mecha.model(model, MECHA_IS_SDXL) for model in models_te1]
        casted_recipe_te1 = sd_mecha.ties_with_dare(CASTED_OG_MODEL, *casted_models_te1, **KWARGS_DARE)
        casted_models_unet = [sd_mecha.model(model, MECHA_IS_SDXL) for model in models_unet]
        casted_recipe_unet = sd_mecha.ties_with_dare(CASTED_OG_MODEL, *casted_models_unet, **KWARGS_DARE)
    else:
        raise Exception("Algorithm {} is not supported.".format(ALGO_ACTIVATED))  

    e2e_model = "{}{}".format(DIR_RAW,MODEL_LIST_RAW[MODEL_SELECTION_VAE - 1]) 
    casted_e2e_model = sd_mecha.model(e2e_model, MECHA_IS_SDXL)

    casted_e2e_model = sd_mecha.weighted_sum(casted_e2e_model, casted_recipe_unet, alpha=TE2_ALPHA, dtype=g_precision_while_merge, device=g_device)
    casted_e2e_model = sd_mecha.weighted_sum(casted_e2e_model, casted_recipe_te0, alpha=TE0_ALPHA_INVERTED, dtype=g_precision_while_merge, device=g_device)
    casted_e2e_model = sd_mecha.weighted_sum(casted_e2e_model, casted_recipe_te1, alpha=TE1_ALPHA_INVERTED, dtype=g_precision_while_merge, device=g_device)

    # Note: pick_vae is not called here; the merge already starts from the model chosen by MODEL_SELECTION_VAE.
    return casted_e2e_model

In [38]:
if SKIP_GEN_MANUAL_FILTER:
    print("Skipped generating unused recipes.")
else:
    set_rmk(rmk_f(), make_recipe_final())
    sd_mecha.serialize_and_save(recipe_mapping[rmk_f()], RECIPE_PATH_FINAL)
    set_rmk(rmk_e2e(), make_recipe_e2e())
    sd_mecha.serialize_and_save(recipe_mapping[rmk_e2e()], RECIPE_PATH_E2E)

Skipped generating unused recipes.
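
The `ALGO_AVERAGE` branches above build an N-model average out of pairwise merges by passing `alpha=(i-1)/i` at step `i`. The sketch below checks that arithmetic on plain floats. It assumes that `weighted_sum(a, b, alpha)` computes `(1 - alpha) * a + alpha * b`; the local `weighted_sum` is a stand-in written for this illustration, not the `sd_mecha` function.

```python
def weighted_sum(a, b, alpha):
    # Assumed convention: alpha is the weight given to b.
    return (1.0 - alpha) * a + alpha * b


def running_average(values):
    """Fold values pairwise, mirroring the loop used for ALGO_AVERAGE."""
    recipe = values[0]
    for i, value in enumerate(values[1:], start=2):
        # The new value enters with weight 1/i, the running mean keeps (i-1)/i.
        recipe = weighted_sum(value, recipe, alpha=(i - 1) / i)
    return recipe


weights = [1.0, 2.0, 6.0, 7.0]
print(running_average(weights), sum(weights) / len(weights))
```

After step `i` the running recipe holds the mean of the first `i` models, which is why only two models need to be held at a time.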


## Time for action ##

In [39]:
ts = time.time()
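
The merge cells in this section share one pattern: skip the merge when the output file already exists, otherwise run it, then print the elapsed time. A minimal sketch of that pattern follows. `merge_if_missing` and `fake_merge` are hypothetical names introduced for illustration only; in the notebook the real work is done by `scheduler.merge_and_save`.

```python
import os
import tempfile
import time


def merge_if_missing(output_path, merge_fn):
    """Run merge_fn() only when output_path does not exist yet.

    Returns (ran, elapsed_seconds). merge_fn is a hypothetical zero-argument
    callable that is expected to write output_path itself.
    """
    start = time.time()
    if os.path.isfile(output_path):
        print("Merged model is present. Skipping.")
        ran = False
    else:
        merge_fn()
        ran = True
    return ran, int(time.time() - start)


with tempfile.TemporaryDirectory() as tmp:
    target = os.path.join(tmp, "demo.safetensors")

    def fake_merge():
        # Stand-in for the real merge call: just create the output file.
        with open(target, "wb") as handle:
            handle.write(b"\0")

    first_ran, _ = merge_if_missing(target, fake_merge)
    second_ran, _ = merge_if_missing(target, fake_merge)
    print(first_ran, second_ran)
```

Because the check is on the output file, an interrupted session can be rerun without repeating merges that already finished.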

### Naive Average ###

In [40]:
tss = time.time()

if MODE_RAW in MODE_ACTIVATED:
    if os.path.isfile(OS_MODEL_PATH_RAW):
        print("Merged model is present. Skipping.")
    else:
        scheduler.merge_and_save(recipe_mapping[rmk_raw()], output=MECHA_MODEL_PATH_RAW, fallback_model=FALLBACK_AS_OG_MODEL, save_dtype=g_precision_final_model, total_buffer_size=g_total_buffer_size, threads=g_threads)
else:
    print("This session is not activated. Skipping.")

tse = time.time()
print("Merge time: {} sec".format(int(tse - tss)))    

INFO: Saving to F:\NOVELAI\astolfo_mix\sdxl\x255a-AstolfoMix-25022801-1458190.safetensors
Merging recipe: 100%|██████████| 2515/2515 [39:03:56<00:00, 55.92s/it, key=model.diffusion_model.output_blocks.7.0.out_layers.3.weight, shape=[320, 320, 3, 3]]                       


Merge time: 140645 sec


### CLIP Models to test ###

In [41]:
tss = time.time()

if MODE_CLIP in MODE_ACTIVATED:
    # 3N models
    #for i in tqdm(range(len(MODEL_LIST_RAW)), desc="Iterating raw model list to swap TEs"):
    #    for j in tqdm(range(len(N_CLIP)), desc="Making models with swapped raw TEs"):
    for i in range(len(MODEL_LIST_RAW)):
        for j in range(len(N_CLIP)):    
            if os.path.isfile(get_te_itr_path(OS_MODEL_PATH_TE_ITR,i,j)):
                print("Merged model is present. Skipping.")
            else:        
                scheduler.merge_and_save(recipe_mapping[rmk_te(i+1,j)], output=get_te_itr_path(MECHA_MODEL_PATH_TE_ITR,i,j), fallback_model=FALLBACK_AS_OG_MODEL, save_dtype=g_precision_final_model, total_buffer_size=g_total_buffer_size, threads=g_threads)
else:
    print("This session is not activated. Skipping.")

tse = time.time()
print("Merge time: {} sec".format(int(tse - tss)))    

This session is not activated. Skipping.
Merge time: 0 sec


### Picked CLIP Model ###

In [42]:
tss = time.time()

if MODE_CLIP in MODE_ACTIVATED:
    if os.path.isfile(OS_MODEL_PATH_SELECTED_TE):
        print("Merged model is present. Skipping.")
    else:
        scheduler.merge_and_save(recipe_mapping[rmk_ste()], output=MECHA_MODEL_PATH_SELECTED_TE, fallback_model=FALLBACK_AS_OG_MODEL, save_dtype=g_precision_final_model, total_buffer_size=g_total_buffer_size, threads=g_threads)
else:
    print("This session is not activated. Skipping.")

tse = time.time()
print("Merge time: {} sec".format(int(tse - tss)))    

This session is not activated. Skipping.
Merge time: 0 sec


### UNET Models to test ###

In [43]:
tss = time.time()

if MODE_UNET in MODE_ACTIVATED:
    # N models
    #for i in tqdm(range(len(MODEL_LIST_RAW)), desc="Iterating raw model list to swap UNETs"):
    for i in range(len(MODEL_LIST_RAW)):
        if os.path.isfile(get_unet_itr_path(OS_MODEL_PATH_UNET_ITR,i)):
            print("Merged model is present. Skipping.")
        else:
            scheduler.merge_and_save(recipe_mapping[rmk_unet(i+1)], output=get_unet_itr_path(MECHA_MODEL_PATH_UNET_ITR,i), fallback_model=FALLBACK_AS_OG_MODEL, save_dtype=g_precision_final_model, total_buffer_size=g_total_buffer_size, threads=g_threads)
else:
    print("This session is not activated. Skipping.")

tse = time.time()
print("Merge time: {} sec".format(int(tse - tss)))    

This session is not activated. Skipping.
Merge time: 0 sec


### Final model ###

In [44]:
print(MECHA_MODEL_PATH_SELECTED_TE, MECHA_MODEL_PATH_FINAL)

./x255a-AstolfoMix-x255te0x255te1-25022801-1458190 ./x255-AstolfoMix-x255te0x255te1-25022801-1458190


In [45]:
tss = time.time()

if MODE_FINAL in MODE_ACTIVATED:
    if os.path.isfile(OS_MODEL_PATH_FINAL):
        print("Merged model is present. Skipping.")
    elif os.path.isfile(OS_MODEL_PATH_SELECTED_TE):
        scheduler.merge_and_save(recipe_mapping[rmk_f()], output=MECHA_MODEL_PATH_FINAL, fallback_model=FALLBACK_AS_OG_MODEL, save_dtype=g_precision_final_model, total_buffer_size=g_total_buffer_size, threads=g_threads)
    else:
        print("Selected TE is not present. Skipping.")
else:
    print("This session is not activated. Skipping.")

tse = time.time()
print("Merge time: {} sec".format(int(tse - tss)))    

This session is not activated. Skipping.
Merge time: 0 sec


In [46]:
tss = time.time()

if MODE_FINAL in MODE_ACTIVATED:
    if os.path.isfile(OS_MODEL_PATH_E2E):
        print("Merged model is present. Skipping.")
    else:
        scheduler.merge_and_save(recipe_mapping[rmk_e2e()], output=MECHA_MODEL_PATH_E2E, fallback_model=FALLBACK_AS_OG_MODEL, save_dtype=g_precision_final_model, total_buffer_size=g_total_buffer_size, threads=g_threads)
else:
    print("This session is not activated. Skipping.")

tse = time.time()
print("Merge time: {} sec".format(int(tse - tss)))    

This session is not activated. Skipping.
Merge time: 0 sec


Full operation time.

In [47]:
te = time.time()
print("Total time: {} sec".format(int(te - ts)))

Total time: 140645 sec
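
Raw second counts of this size are hard to read at a glance. The small helper below converts them to `H:MM:SS`; `format_duration` is a hypothetical name and is not used in the notebook itself.

```python
def format_duration(total_seconds):
    """Format a duration in seconds as H:MM:SS."""
    hours, remainder = divmod(int(total_seconds), 3600)
    minutes, seconds = divmod(remainder, 60)
    return "{}:{:02d}:{:02d}".format(hours, minutes, seconds)


print(format_duration(140645))  # 39:04:05
```

The result is in line with the roughly 39-hour figure shown by the progress bar of the merge.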
