<a href="https://colab.research.google.com/github/SushantSingh-23-01/Image_Inference/blob/main/SDXL_optimizations/SDXL_Quantizer_debloat.ipynb" target="_parent"><img src="https://colab.research.google.com/assets/colab-badge.svg" alt="Open In Colab"/></a>

# Quantize SDXL

Inspired By: https://civitai.com/articles/10417

`Note`: If the repository has no unet folder and only a merged checkpoint is available, this notebook's first section is useful for separating the unet and CLIP files.
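As a rough illustration of what separating a merged checkpoint involves, the sketch below groups tensors by key prefix. The prefixes are an assumption based on the commonly seen SDXL single-file layout, not something taken from this notebook, and `split_checkpoint` is a hypothetical helper name. Verify the prefixes against the keys of your own file before relying on it.

```python
from collections import defaultdict

# ASSUMPTION: typical key prefixes of a merged SDXL checkpoint.
PREFIXES = {
    "unet": "model.diffusion_model.",
    "clip_l": "conditioner.embedders.0.",
    "clip_g": "conditioner.embedders.1.",
    "vae": "first_stage_model.",
}


def split_checkpoint(state_dict):
    """Group entries by component and strip the component prefix from each key."""
    parts = defaultdict(dict)
    for key, tensor in state_dict.items():
        for name, prefix in PREFIXES.items():
            if key.startswith(prefix):
                parts[name][key[len(prefix):]] = tensor
                break
        else:
            # Keys that match no known prefix are kept untouched.
            parts["other"][key] = tensor
    return dict(parts)
```

With `safetensors`, the input could come from `load_file("merged.safetensors")` (a hypothetical filename) and each group could be written back with `save_file`. The stripped keys keep the original checkpoint naming, which may differ from the diffusers layout found in a *unet* folder.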

HuggingFace: https://huggingface.co/John6666

`Note`: Most SDXL, Pony, and Illustrious models are available on this account.

### Importing Libraries

In [None]:
from dataclasses import dataclass
from huggingface_hub import hf_hub_download
import logging
import os
from safetensors.torch import load_file, save_file
import torch

### Main Config

`Config Parameters`:
- **repo_id**: Hugging Face repository.
- **filename**: Path of the UNet weights inside the repository. Check the Hugging Face repository; if there is no *unet* folder, or the *safetensors* file has a different name, change this accordingly.
- **filename_prefix**: Desired name for the quantized model. Example: *cyberrealistic_xl, epicrealism_xl* etc.
- **output_dir**: Directory where the quantized model will be stored.
- **quant_type**: Type of quantization to use. Example: *Q4_K_S, Q4_K_M, Q5_K_S, Q5_K_M, Q8_0* etc.
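To check which value **filename** should take, one option is to list the repository's files and look for the UNet weights. The sketch below is only an illustration: `find_unet_file` and `list_files` are hypothetical helpers, and the fallback rule (any `.safetensors` file under `unet/`) is an assumption about how such repositories are usually laid out.

```python
def find_unet_file(repo_files, preferred="unet/diffusion_pytorch_model.safetensors"):
    """Return the UNet weights path from a list of repo file paths, or None."""
    if preferred in repo_files:
        return preferred
    # Fallback: any safetensors file inside the unet folder (e.g. an fp16 variant).
    candidates = sorted(
        f for f in repo_files
        if f.startswith("unet/") and f.endswith(".safetensors")
    )
    return candidates[0] if candidates else None


def list_files(repo_id):
    """Fetch the file listing of a Hugging Face repository (needs network access)."""
    from huggingface_hub import list_repo_files  # imported lazily on purpose
    return list_repo_files(repo_id)
```

For example, `find_unet_file(list_files(config.repo_id))` returns a path to assign to `filename`, or `None` when the repository only ships a merged checkpoint.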


`Note`: You still need to download the CLIP text encoder models and the VAE. Download the following files:
1. *hf_repo_id/text_encoder/model.safetensors* and rename it to *filename_prefix_clip_l*. (~246 MB)
2. *hf_repo_id/text_encoder_2/model.safetensors* and rename it to *filename_prefix_clip_g*. (1.4 GB)
3. Place both files in the clip folder of the models directory.
4. The VAE only needs to be downloaded once for SDXL, as all versions ship the same VAE. It is available at *hf_repo_id/vae/diffusion_pytorch_model.safetensors*.
5. Download it, rename it to *sdxl_vae*, and place it in the vae folder.
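The manual steps above could also be scripted. The following is a minimal sketch, not part of the original notebook: `component_plan` and `fetch_components` are hypothetical helper names, and keeping the `.safetensors` extension on the renamed files is an assumption. It copies the files into a single output directory; moving them into the right model folders is left to you.

```python
import os
import shutil


def component_plan(filename_prefix):
    """Map repository paths to the local file names described in the note."""
    return {
        "text_encoder/model.safetensors": f"{filename_prefix}_clip_l.safetensors",
        "text_encoder_2/model.safetensors": f"{filename_prefix}_clip_g.safetensors",
        "vae/diffusion_pytorch_model.safetensors": "sdxl_vae.safetensors",
    }


def fetch_components(repo_id, filename_prefix, output_dir):
    """Download each component and copy it under its new name (needs network access)."""
    from huggingface_hub import hf_hub_download  # imported lazily on purpose

    os.makedirs(output_dir, exist_ok=True)
    saved = []
    for repo_path, local_name in component_plan(filename_prefix).items():
        cached = hf_hub_download(repo_id=repo_id, filename=repo_path)
        dst = os.path.join(output_dir, local_name)
        shutil.copyfile(cached, dst)  # copy out of the Hugging Face cache
        saved.append(dst)
    return saved
```

A call such as `fetch_components(config.repo_id, config.filename_prefix, config.output_dir)` would then return the three saved paths.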

In [None]:
@dataclass
class config:
    # Type annotations are required for @dataclass to register these as fields
    repo_id: str = 'John6666/cyberrealistic-pony-v85-sdxl'
    filename: str = 'unet/diffusion_pytorch_model.safetensors'
    filename_prefix: str = 'cyberrealistic-sdxl-pony'
    output_dir: str = '/content/components'
    quant_type: str = 'Q5_K_S'

In [None]:
os.makedirs(config.output_dir, exist_ok=True)

### Downloading Model

In [None]:
unet = hf_hub_download(repo_id=config.repo_id, filename=config.filename)

The secret `HF_TOKEN` does not exist in your Colab secrets.
To authenticate with the Hugging Face Hub, create a token in your settings tab (https://huggingface.co/settings/tokens), set it as secret in your Google Colab and restart your session.
You will be able to reuse this secret in all of your notebooks.
Please note that authentication is recommended but still optional to access public models or datasets.


diffusion_pytorch_model.safetensors:   0%|          | 0.00/5.14G [00:00<?, ?B/s]

### Setting up llama.cpp
This cell sets up llama.cpp, which is needed for the GGUF conversion.

In [None]:
# Clone the llama.cpp repository
!git clone https://github.com/ggerganov/llama.cpp

# Install gguf-py
!pip install llama.cpp/gguf-py

# Change to the llama.cpp directory
%cd llama.cpp

Cloning into 'llama.cpp'...
remote: Enumerating objects: 48449, done.
remote: Counting objects: 100% (383/383), done.
remote: Compressing objects: 100% (269/269), done.
remote: Total 48449 (delta 272), reused 120 (delta 114), pack-reused 48066 (from 4)
Receiving objects: 100% (48449/48449), 102.88 MiB | 23.89 MiB/s, done.
Resolving deltas: 100% (34773/34773), done.
Processing ./llama.cpp/gguf-py
  Installing build dependencies ... done
  Getting requirements to build wheel ... done
  Preparing metadata (pyproject.toml) ... done
Building wheels for collected packages: gguf
  Building wheel for gguf (pyproject.toml) ... done
  Created wheel for gguf: filename=gguf-0.16.0-py3-none-any.whl size=80598 sha256=dd9cd5f887e237d0f99ad345b067c7cd83f731710324fbdc0803b2bbfcef37be
  Stored in directory: /root/.cache/pip/wheels/a1/6c/c6/6dbfb804e7a1607174676026fc9bf5d1006ceff85ba5c680b6
Successfully built gguf
Installing collected packages: 

### Download conversion files and patching
This cell patches llama.cpp to recognize the SDXL architecture. The convert.py script and the patch come from city96's repo.

In [None]:
# Download convert.py
!wget -O convert.py "https://raw.githubusercontent.com/city96/ComfyUI-GGUF/main/tools/convert.py"

# Download convert_g.py for clip_g
!wget -O convert_g.py "https://huggingface.co/Old-Fisherman/SDXL_Finetune_GGUF_Files/resolve/main/convert_g.py"

# Download lcpp patch
!wget -O lcpp.patch "https://raw.githubusercontent.com/city96/ComfyUI-GGUF/main/tools/lcpp.patch"

# Patching lcpp
!git checkout tags/b3600
!git apply lcpp.patch

# Create the build directory
!mkdir build

# Change to the build directory
%cd build

# Run cmake to configure the build
!cmake ..

# Build the target with cmake
!cmake --build . --config Debug -j10 --target llama-quantize

# Change back to the previous directory
%cd ..

--2025-04-11 11:48:51--  https://raw.githubusercontent.com/city96/ComfyUI-GGUF/main/tools/convert.py
Resolving raw.githubusercontent.com (raw.githubusercontent.com)... 185.199.108.133, 185.199.109.133, 185.199.110.133, ...
Connecting to raw.githubusercontent.com (raw.githubusercontent.com)|185.199.108.133|:443... connected.
HTTP request sent, awaiting response... 200 OK
Length: 12007 (12K) [text/plain]
Saving to: ‘convert.py’


2025-04-11 11:48:52 (18.0 MB/s) - ‘convert.py’ saved [12007/12007]

--2025-04-11 11:48:52--  https://huggingface.co/Old-Fisherman/SDXL_Finetune_GGUF_Files/resolve/main/convert_g.py
Resolving huggingface.co (huggingface.co)... 3.165.160.11, 3.165.160.12, 3.165.160.61, ...
Connecting to huggingface.co (huggingface.co)|3.165.160.11|:443... connected.
HTTP request sent, awaiting response... 200 OK
Length: 5077 (5.0K) [text/plain]
Saving to: ‘convert_g.py’


2025-04-11 11:48:53 (1.84 GB/s) - ‘convert_g.py’ saved [5077/5077]

--2025-04-11 11:48:53--  https://raw.githu
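Before moving on to the conversion, it can help to confirm that the build produced the `llama-quantize` binary. The check below is a small sketch; `quantize_binary_ready` is a hypothetical helper, and the default path is the one the conversion cell calls, relative to the llama.cpp directory.

```python
import os


def quantize_binary_ready(path="build/bin/llama-quantize"):
    """Return True if the quantize binary exists and is executable."""
    return os.path.isfile(path) and os.access(path, os.X_OK)
```

If `quantize_binary_ready()` returns `False`, re-check the `git apply` and `cmake` steps before running the conversion.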

### Conversion from fp16 UNet Safetensors to Quant_xxx GGUF

In [None]:
class Quantizer:
    """Convert an fp16 safetensors UNet to an F16 GGUF, then quantize it."""

    def __init__(self, input_path, output_dir, filename_prefix, quant_type):
        self.input_path = input_path
        self.output_dir = output_dir
        self.filename_prefix = filename_prefix
        self.quant_type = quant_type
        # Intermediate F16 GGUF shared by both steps
        self.f16_path = os.path.join(output_dir, f"{filename_prefix}-F16.gguf")

    def convert_fp16_gguf16(self):
        # Returns True if the conversion command was launched
        if not os.path.exists(self.input_path):
            print(f"Error: Source file not found at {self.input_path}")
            return False
        command = f"python convert.py --src {self.input_path} --dst {self.f16_path}"
        print(f"Running command: {command}")
        !{command}
        return True

    def convert_gguf16_to_qx(self):
        src = self.f16_path
        dst = os.path.join(self.output_dir, f"{self.filename_prefix}_{self.quant_type}.gguf")
        quant_type = self.quant_type
        !./build/bin/llama-quantize {src} {dst} {quant_type}

    def __call__(self):
        # Skip quantization if the source file was missing
        if self.convert_fp16_gguf16():
            self.convert_gguf16_to_qx()


quantizer = Quantizer(unet, config.output_dir, config.filename_prefix, config.quant_type)
quantizer()

Running command: python convert.py --src /root/.cache/huggingface/hub/models--John6666--cyberrealistic-pony-v85-sdxl/snapshots/3e34631448a938e007831dbca9156d6352f3c4f9/unet/diffusion_pytorch_model.safetensors --dst /content/components/cyberrealistic-sdxl-pony-F16.gguf
* Architecture detected from input: sdxl
add_embedding.linear_1.bias                                               torch.float16 --> F16, shape = {1280}
add_embedding.linear_1.weight                                             torch.float16 --> F16, shape = {2816, 1280}
add_embedding.linear_2.bias                                               torch.float16 --> F16, shape = {1280}
add_embedding.linear_2.weight                                             torch.float16 --> F16, shape = {1280, 1280}
conv_in.bias                                                              torch.float16 --> F16, shape = {320}
conv_in.weight                                                            torch.float16 --> F16, shape = {256, 45}
conv
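The two conversion steps rely on the notebook-only `!` syntax. As a hedged alternative for running them from a plain Python script, the sketch below builds the same two commands as argument lists and runs them with `subprocess`. `build_commands` and `run_pipeline` are hypothetical names. The sketch assumes the working directory is the llama.cpp checkout containing `convert.py` and `build/bin/llama-quantize`.

```python
import os
import subprocess
import sys


def build_commands(src, output_dir, filename_prefix, quant_type,
                   quantize_bin="./build/bin/llama-quantize"):
    """Return the two commands: safetensors -> F16 GGUF, then F16 GGUF -> quantized GGUF."""
    f16 = os.path.join(output_dir, f"{filename_prefix}-F16.gguf")
    out = os.path.join(output_dir, f"{filename_prefix}_{quant_type}.gguf")
    return [
        [sys.executable, "convert.py", "--src", src, "--dst", f16],
        [quantize_bin, f16, out, quant_type],
    ]


def run_pipeline(src, output_dir, filename_prefix, quant_type):
    """Run both steps in order, stopping at the first failure."""
    if not os.path.exists(src):
        raise FileNotFoundError(f"Source file not found at {src}")
    for cmd in build_commands(src, output_dir, filename_prefix, quant_type):
        print("Running command:", " ".join(cmd))
        subprocess.run(cmd, check=True)  # raises CalledProcessError on a non-zero exit
```

Passing the arguments as lists avoids shell quoting problems when paths contain spaces, and `check=True` prevents the quantize step from running on a missing or partial F16 file.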