# Quantize SDXL

Inspired By: https://civitai.com/articles/10417

**Main Difference**: This version skips the UNet/CLIP separation step, since most checkpoints already have a standalone UNet file available on Hugging Face.

`Note`: If a separate UNet file is not available and only a merged checkpoint exists, the first section of this [Colab notebook](https://colab.research.google.com/drive/1xRwSht2tc82O8jrQQG4cl5LH1xdhiCyn?usp=sharing) can be used to separate the UNet and CLIP files.

HuggingFace: https://huggingface.co/John6666

`Note`: Most SDXL models are available on this account.

### Importing Libraries

In [None]:
from dataclasses import dataclass
from huggingface_hub import hf_hub_download
import logging
import os
from safetensors.torch import load_file, save_file
import torch

### Main Config

`Config Parameters`:
- **repo_id**: Hugging Face repository.
- **filename**: Path of the UNet file inside the repository. Check the Hugging Face repository first: if there is no *unet* folder, or the *safetensors* file has a different name, change this value accordingly.
- **filename_prefix**: Desired name for the quantized model. Example: *cyberrealistic_xl, epicrealism_xl* etc.
- **output_dir**: Directory where the quantized model will be stored.
- **quant_type**: Type of quantization to use. Example: *Q4_K_S, Q4_K_M, Q5_K_S, Q5_K_M, Q8_0* etc.
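
The check on **filename** can be partly automated. The sketch below assumes the repository follows the usual diffusers layout with a *unet* folder. `pick_unet_file` and `find_unet_in_repo` are hypothetical helpers, not part of the original notebook; `list_repo_files` is the `huggingface_hub` call that lists the files of a repository.

```python
def pick_unet_file(repo_files):
    """Return the most likely UNet weights file from a list of repo paths.

    Hypothetical helper: prefers the standard diffusers file name, then an
    fp16 variant, then any other safetensors file inside `unet/`.
    Returns None if the repo has no safetensors file in a `unet/` folder.
    """
    candidates = [
        f for f in repo_files
        if f.startswith("unet/") and f.endswith(".safetensors")
    ]
    if not candidates:
        return None
    preferred = "unet/diffusion_pytorch_model.safetensors"
    if preferred in candidates:
        return preferred
    fp16 = [f for f in candidates if "fp16" in f]
    # Sort so the choice is deterministic when several files qualify
    return sorted(fp16 or candidates)[0]


def find_unet_in_repo(repo_id):
    """List a Hugging Face repo and pick its UNet file (needs network access)."""
    from huggingface_hub import list_repo_files  # imported lazily
    return pick_unet_file(list_repo_files(repo_id))
```

If `find_unet_in_repo(config.repo_id)` returns `None`, the repository probably only ships a merged checkpoint and the separation step from the Colab notebook linked above is needed.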


`Note`: You still need to download the CLIP text encoders and the VAE separately:
1. Download *hf_repo_id/text_encoder/model.safetensors* and rename it to *filename_prefix_clip_l* (~246 MB).
2. Download *hf_repo_id/text_encoder_2/model.safetensors* and rename it to *filename_prefix_clip_g* (~1.4 GB).
3. Place both files in the clip folder of the models directory.
4. The VAE only needs to be downloaded once for SDXL, since all versions ship the same VAE. It is available at *hf_repo_id/vae/diffusion_pytorch_model.safetensors*.
5. Download it, rename it to *sdxl_vae*, and place it in the vae folder.
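
The manual steps above could also be scripted. This is a minimal sketch, not part of the original notebook: `component_plan` and `fetch_components` are hypothetical helpers, the target folders are passed in explicitly because they depend on your installation, and the `.safetensors` extension on the renamed files is an assumption.

```python
import os
import shutil


def component_plan(filename_prefix, clip_dir, vae_dir):
    """Map each file in the repo to the local path it should be copied to."""
    return {
        "text_encoder/model.safetensors":
            os.path.join(clip_dir, f"{filename_prefix}_clip_l.safetensors"),
        "text_encoder_2/model.safetensors":
            os.path.join(clip_dir, f"{filename_prefix}_clip_g.safetensors"),
        "vae/diffusion_pytorch_model.safetensors":
            os.path.join(vae_dir, "sdxl_vae.safetensors"),
    }


def fetch_components(repo_id, filename_prefix, clip_dir, vae_dir):
    """Download the text encoders and VAE, then copy them under their new names."""
    from huggingface_hub import hf_hub_download  # imported lazily, needs network

    plan = component_plan(filename_prefix, clip_dir, vae_dir)
    for repo_file, dst in plan.items():
        if os.path.exists(dst):
            continue  # e.g. the shared SDXL VAE was already downloaded
        os.makedirs(os.path.dirname(dst), exist_ok=True)
        cached = hf_hub_download(repo_id=repo_id, filename=repo_file)
        shutil.copy(cached, dst)  # copies the file contents out of the HF cache
```

Skipping existing destinations means the VAE is only fetched the first time, which matches step 4.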

`Note`: Once the config is filled in carefully, all of the following cells can be run together.

`Warning`: At the moment this does not support Pony and Illustrious versions.

In [None]:
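@dataclass
class config:
    # Type annotations make these real dataclass fields; the defaults are
    # still readable directly on the class, e.g. config.repo_id
    repo_id: str = 'John6666/cyberrealistic-pony-v85-sdxl'
    filename: str = 'unet/diffusion_pytorch_model.safetensors'
    filename_prefix: str = 'cyberrealistic-sdxl-pony'
    output_dir: str = '/content/components'
    quant_type: str = 'Q5_K_S'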
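
Because every later cell depends on these values, a quick sanity check before running them can save a long failed run. The sketch below is an optional addition: `check_config` is a hypothetical helper, and `KNOWN_QUANT_TYPES` lists type names used by upstream llama-quantize (where the 8-bit type is spelled `Q8_0`). Whether the patched build handles every one of them for SDXL is an assumption to verify.

```python
# Type names from upstream llama-quantize; support in the patched
# SDXL build is an assumption, not something this check can confirm.
KNOWN_QUANT_TYPES = {
    "F16", "BF16",
    "Q4_0", "Q4_1", "Q5_0", "Q5_1", "Q8_0",
    "Q2_K", "Q3_K_S", "Q3_K_M", "Q3_K_L",
    "Q4_K_S", "Q4_K_M", "Q5_K_S", "Q5_K_M", "Q6_K",
}


def check_config(repo_id, filename, quant_type):
    """Raise ValueError on obviously malformed config values."""
    if repo_id.count("/") != 1:
        raise ValueError(f"repo_id should look like 'user/repo', got {repo_id!r}")
    if not filename.endswith(".safetensors"):
        raise ValueError(f"filename should be a .safetensors file, got {filename!r}")
    if quant_type not in KNOWN_QUANT_TYPES:
        raise ValueError(f"unrecognized quant_type {quant_type!r}")
```

It can be called as `check_config(config.repo_id, config.filename, config.quant_type)` right after the config cell.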

In [None]:
os.makedirs(config.output_dir, exist_ok=True)

### Downloading Model

In [None]:
unet = hf_hub_download(repo_id=config.repo_id, filename=config.filename)
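
Since the conversion below expects an fp16 UNet, it can help to confirm the dtype of the downloaded file before converting. This sketch reads only the JSON header of the safetensors file (an 8-byte little-endian length followed by that many bytes of JSON), so no tensor data is loaded. `safetensors_dtypes` is a hypothetical helper name.

```python
import json
import struct
from collections import Counter


def safetensors_dtypes(path):
    """Count tensor dtypes by reading only the safetensors JSON header."""
    with open(path, "rb") as f:
        # First 8 bytes: unsigned little-endian 64-bit length of the header
        (header_len,) = struct.unpack("<Q", f.read(8))
        header = json.loads(f.read(header_len))
    # "__metadata__" is an optional free-form entry, not a tensor
    return Counter(
        entry["dtype"] for name, entry in header.items() if name != "__metadata__"
    )
```

Running `print(safetensors_dtypes(unet))` on an fp16 file should report only the `F16` dtype; anything else suggests the file is stored in a different precision.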

### Setting up Llama.cpp

This cell sets up llama.cpp, which is needed for the GGUF conversion.

In [None]:
# Clone the llama.cpp repository
!git clone https://github.com/ggerganov/llama.cpp

# Install gguf-py
!pip install llama.cpp/gguf-py

# Change to the llama.cpp directory
%cd llama.cpp

### Download conversion files and patching

This cell patches llama.cpp so that it recognizes the SDXL architecture. Both convert.py and the patch come from city96's ComfyUI-GGUF repository.

In [None]:
# Download convert.py
!wget -O convert.py "https://raw.githubusercontent.com/city96/ComfyUI-GGUF/main/tools/convert.py"

# Download convert_g.py for clip_g
!wget -O convert_g.py "https://huggingface.co/Old-Fisherman/SDXL_Finetune_GGUF_Files/resolve/main/convert_g.py"

# Download lcpp patch
!wget -O lcpp.patch "https://raw.githubusercontent.com/city96/ComfyUI-GGUF/main/tools/lcpp.patch"

# Patching lcpp
!git checkout tags/b3600
!git apply lcpp.patch

# Create the build directory
!mkdir build

# Change to the build directory
%cd build

# Run cmake to configure the build
!cmake ..

# Build the target with cmake
!cmake --build . --config Debug -j10 --target llama-quantize

# Change back to the previous directory
%cd ..
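
A failed build is easy to miss in notebook output, so it may be worth checking that the binary exists before continuing. This is a small optional sketch: `quantize_binary_ready` is a hypothetical helper, and the default path is the one the quantization step uses later in this notebook.

```python
import os


def quantize_binary_ready(path="./build/bin/llama-quantize"):
    """Return True if the llama-quantize binary exists and is executable."""
    return os.path.isfile(path) and os.access(path, os.X_OK)


if not quantize_binary_ready():
    print("llama-quantize was not found; check the cmake output above for errors.")
```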

### Conversion from fp16 Unet Safetensors to Quant_xxx GGUF

In [None]:
class Quantizer:
    """Convert an fp16 UNet safetensors file to an F16 GGUF, then quantize it."""

    def __init__(self, input_path, output_dir, filename_prefix, quant_type):
        self.input_path = input_path
        self.output_dir = output_dir
        self.filename_prefix = filename_prefix
        self.quant_type = quant_type

    def convert_fp16_gguf16(self):
        # Step 1: safetensors -> intermediate F16 GGUF using convert.py
        dst = os.path.join(self.output_dir, f"{self.filename_prefix}-F16.gguf")
        if not os.path.exists(self.input_path):
            # Stop here so the quantize step does not run without its input
            raise FileNotFoundError(f"Source file not found at {self.input_path}")
        command = f"python convert.py --src {self.input_path} --dst {dst}"
        print(f"Running command: {command}")
        !{command}

    def convert_gguf16_to_qx(self):
        # Step 2: F16 GGUF -> quantized GGUF using the patched llama-quantize
        src = os.path.join(self.output_dir, f"{self.filename_prefix}-F16.gguf")
        dst = os.path.join(self.output_dir, f"{self.filename_prefix}_{self.quant_type}.gguf")
        quant_type = self.quant_type
        !./build/bin/llama-quantize {src} {dst} {quant_type}

    def __call__(self):
        self.convert_fp16_gguf16()
        self.convert_gguf16_to_qx()


quantizer = Quantizer(unet, config.output_dir, config.filename_prefix, config.quant_type)
quantizer()
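
After quantization, the output file can be inspected to see which tensor types it actually contains. The sketch below uses `GGUFReader` from the `gguf` package installed earlier from llama.cpp/gguf-py. `summarize_tensor_types` and `inspect_gguf` are hypothetical helpers, and it is an assumption that the reader handles files produced by the patched build.

```python
from collections import Counter


def summarize_tensor_types(tensors):
    """Count tensors per GGML type name (for example F16 or Q5_K)."""
    return Counter(t.tensor_type.name for t in tensors)


def inspect_gguf(path):
    """Open a GGUF file and count its tensors by quantization type."""
    from gguf import GGUFReader  # imported lazily; installed from llama.cpp/gguf-py
    return summarize_tensor_types(GGUFReader(path).tensors)
```

For example, `inspect_gguf(os.path.join(config.output_dir, f"{config.filename_prefix}_{config.quant_type}.gguf"))` returns a counter of type names. A mix of types is not necessarily a problem, since quantizers often keep some tensors at higher precision.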